# LessWrong.com News

*Address:* https://www.lesswrong.com/

*Updated:* 59 minutes 38 seconds ago

### (notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach

**Meta:** I thought I'd spend a little time reading the policy papers that Nick Bostrom has written. I made notes as I went along, so I spent a little while cleaning them up into a summary post. These are my notes on Bostrom, Dafoe and Flynn's 2016 policy desiderata paper, which received significant edits in 2018. I spent 6-8 hours on this post, not a great deal of time, so I've not been maximally careful.

Overall, this is not a policy *proposal*. Nor does it commit strongly to a particular moral or political worldview. The goal of the paper is merely to identify the policy challenges that are especially important or distinctive in the case of superintelligent AI, and that most moral and political worldviews will need to deal with. The paper also makes no positive argument for the importance or likelihood or timeline of superintelligent AI - it instead assumes that this shall occur in the present century, and then explores the policy challenges that would follow.

Bostrom, Dafoe and Flynn spend a fair amount of time explaining that they’re not going to be engaging in what (I think) Robin Hanson would call standard value talk. They’re not going to endorse a particular moral or political theory, nor are they going to adopt various moral or political theories and show how they propose different policies. They’re going to look at the details of this particular policy landscape and try to talk about the regularities that will need to be addressed by most standard moral and political frameworks, and in what direction these regularities suggest changing policy.

They call this the ‘vector field’ approach. If you don't feel like you fully grok the concept, here's the quote where they lay out the formalism (with light editing for readability).

> The vector field approach might then attempt to derive directional policy change conclusions of a form that we might schematically represent as follows: “However much emphasis $X$ you think that states ought, under present circumstances, to give to the objective of economic equality, there are certain special circumstances $Y$, which can be expected to hold in the radical AI context we described above, that should make you think that in those circumstances states should instead give emphasis $f_Y(X)$ to the objective of economic equality.”
>
> The idea is that $f$ here is some relatively simple function, defined over a space of possible evaluative standards or ideological positions. For instance, $f$ might simply add a term to $X$, which would correspond to the claim the emphasis given economic equality should be increased by a certain amount in the circumstances $Y$ (according to all the ideological positions under consideration). Or $f$ might require telling a more complicated story, perhaps along the lines of: “However much emphasis you give to economic equality as a policy objective under present circumstances, under conditions $Y$ you should want to conceive of economic equality differently - certain dimensions of economic inequality are likely to become irrelevant and other dimensions are likely to become more important or policy-relevant than they are today.”

I particularly like this quote:

> This vector field approach is only fruitful to the extent that there are some patterns in how the special circumstances $Y$ impact policy assessments from different evaluative positions. If the prospect of radical AI had entirely different and idiosyncratic implications for every particular ideology or interest platform, then the function $f$ would amount to nothing more than a lookup table.

I read this as saying something like “This paper only makes sense if *facts* matter, separate to *values*.” It’s funny to me that this sentence felt necessary to be written.
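To make the contrast between a "relatively simple function" and a "lookup table" concrete, here is a minimal sketch. The position names, the emphasis numbers, and the size of the shift are all made up for illustration; none of them come from the paper.

```python
# Hypothetical evaluative positions and the emphasis X each one places on
# economic equality today. All numbers are invented for illustration.
baseline_emphasis = {"libertarian": 0.1, "centrist": 0.4, "egalitarian": 0.8}


def f_additive(x: float, shift: float = 0.15) -> float:
    """A 'simple' f_Y: add the same term to X for every position (capped at 1)."""
    return min(1.0, x + shift)


# Vector field case: one shared rule maps every position's X to f_Y(X).
vector_field = {name: f_additive(x) for name, x in baseline_emphasis.items()}

# Degenerate case: no shared pattern, so f_Y is nothing more than a lookup table.
lookup_table = {"libertarian": 0.9, "centrist": 0.05, "egalitarian": 0.5}


def shifts(after: dict) -> dict:
    """Change in emphasis for each position, rounded to avoid float noise."""
    return {name: round(after[name] - baseline_emphasis[name], 2)
            for name in baseline_emphasis}


vector_shifts = shifts(vector_field)
table_shifts = shifts(lookup_table)

print("shared rule: ", vector_shifts)
print("lookup table:", table_shifts)
```

In the first case every position moves in the same direction by the same amount, so a single directional claim is informative for all of them. In the second case knowing how one position moves tells you nothing about the others.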

A few more quotes on what the paper is trying to do.

> A strong proposal for the governance of advanced AI would ideally accommodate each of these desiderata to a high degree. There may exist additional desiderata that we have not identified here; we make no claim that our list is complete. Furthermore, a strong policy proposal should presumably also integrate many other normative, prudential, and practical considerations that are either idiosyncratic to particular evaluative positions or are not distinctive to the context of radical AI.
>
> [...]
>
> Using a “vector field” approach to normative analysis, we sought to extract directional policy implications from these special circumstances. We characterized these implications as a set of desiderata - traits of future policies, governance structures, or decision-making contexts that would, by the standards of a wide range of key actors, stakeholders, and ethical views, enhance the prospects of beneficial outcomes in the transition to a machine intelligence era
>
> [...]
>
> By “policy proposals” we refer not only official government documents but also plans and options developed by private actors who take an interest in long-term AI developments. The desiderata, therefore, are also relevant to some corporations, research funders, academic or non-profit research centers, and various other organizations and individuals.

Next are the actual desiderata. They're given under four headings (efficiency, allocation, population, and process), each with 2-4 desiderata. Each subheading below corresponds to a policy desideratum in the paper. For each desideratum I have summarised all the arguments and considerations in the text that felt new or non-trivial to me personally (e.g. I spent only one sentence on the arguments for AI safety).

If you just want to read the paper's own summary, jump to page 23 of the paper, which has a table summarising the desiderata in the authors' own words.

#### Efficiency Desiderata

##### Expeditious progress

We should make sure to take hold of our cosmic endowment - and the sooner the better.

##### AI safety

Choose policies that lead us to develop sufficient technical understanding that the AI will do what we expect it to do, and that give these tools to AI builders.

##### Conditional stabilization

The ability to establish a singleton, or a regime of intensive global surveillance, or the ability to thoroughly suppress the spread of dangerous technology or information, should we need to use this ability in the face of otherwise catastrophic global coordination failures.

##### Non-turbulence

Technology will change rapidly. We don’t want to have to rush regulations through, or alternatively take so long to adapt that the environment radically changes again. So try to reduce turbulence.

#### Allocation Desiderata

##### Universal benefit

If you force someone to take a risk, it is only fair that they are compensated with a share of any reward gained. Existential risks involve everyone, so everyone should get proportional benefit.

##### Epsilon-magnanimity

Many people’s values have diminishing returns to further resources, e.g. income guarantees for all, ensuring all animals have minimally positive lives, aesthetic projects like preserving some artworks, etc. While today they must fight for a cut of the small pie, as long as they are granted a non-zero weighting in the long-run, they can be satisfied. 0.00001% of GDP may be more than enough to give all humans a $40k income, for example.

This is especially good in light of normative uncertainty - as long as we give some weighting to various values, they will get satiated in a basic way in the long-run.
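As a rough check on what the $40k example presupposes, the sketch below works out how large the economy would need to be for 0.00001% of GDP to cover that income for everyone. The population and current world GDP figures are my own order-of-magnitude assumptions, not numbers from the paper.

```python
# Figures from the text: a $40k income paid out of 0.00001% of GDP.
INCOME_PER_PERSON = 40_000   # dollars per year
GDP_SHARE = 0.00001 / 100    # 0.00001% expressed as a fraction (1e-7)

# Rough assumptions, NOT from the paper:
POPULATION = 8e9             # about today's world population
CURRENT_WORLD_GDP = 1e14     # on the order of $100 trillion per year

total_cost = POPULATION * INCOME_PER_PERSON        # dollars per year
required_gdp = total_cost / GDP_SHARE              # GDP at which the share suffices
growth_factor = required_gdp / CURRENT_WORLD_GDP   # multiple of today's economy

print(f"Total cost of the income guarantee: ${total_cost:.1e} per year")
print(f"GDP needed for that to be 0.00001%: ${required_gdp:.1e} per year")
print(f"Multiple of assumed current GDP:    {growth_factor:.1e}")
```

Under these assumptions the economy would need to be tens of millions of times larger than it is today. So the example only goes through in the scenario the paper takes as given, where superintelligent AI produces extreme economic growth.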

##### Continuity

Reasons to expect unusually high concentration and permutation of wealth and power:

- In the modern world, salary is more evenly distributed than capital. Superintelligent AI is likely to greatly increase the factor share of income accrued from capital, leading to massive increases in inequality and increased concentration of wealth.
- If a small group decides how the AI works and its high-level decisions, they could gain a decisive strategic advantage and take over the world.
- If there is radical and unpredictable technological change, then it is likely that wealth distribution will change radically and unpredictably.
- Automated security and surveillance systems will help a regime stay alive without support from the public or elites - when behaviour is more legible it’s easier to punish or control it. This is likely to at least sustain the concentration of wealth and power, and possibly to increase it.

As such we wish to implement policies that do more to sustain the existing concentration and distribution of wealth and power.

Also of interest (given the high likelihood of redistribution, change in concentration, and general unpredictable turbulence) is how much we seem to face a global, real-life, Rawlsian veil-of-ignorance. It might be good to set up things like insurance to make sure everyone gets some minimum of power and self-determination in the future. It seems that people have diminishing returns to power: “most people would much rather be certain to have power over one life (their own) than have a 10% chance of having power over the lives of ten people and a 90% chance of having no power.”
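The quoted preference is what you would expect from anyone whose utility is concave in power. The sketch below compares the two options under a linear utility and under a square-root utility. The square root is an arbitrary choice of concave function used only for illustration, not something the paper specifies.

```python
import math


def expected_utility(lottery, utility):
    """Expected utility of a lottery given as (probability, lives_controlled) pairs."""
    return sum(p * utility(lives) for p, lives in lottery)


# The two options from the quote.
certain = [(1.0, 1)]            # power over one life (your own) for sure
gamble = [(0.1, 10), (0.9, 0)]  # 10% chance of ten lives, 90% chance of none


def linear(n):
    """No diminishing returns: each extra life controlled is worth the same."""
    return n


concave = math.sqrt  # assumed stand-in for diminishing returns to power

ev_certain = expected_utility(certain, linear)
ev_gamble = expected_utility(gamble, linear)
eu_certain = expected_utility(certain, concave)
eu_gamble = expected_utility(gamble, concave)

print(f"linear utility:  certain={ev_certain:.3f}, gamble={ev_gamble:.3f}")
print(f"concave utility: certain={eu_certain:.3f}, gamble={eu_gamble:.3f}")
```

With linear utility the two options are worth the same in expectation, while under the concave utility the certain option is clearly better. That gap is what makes insurance-like arrangements attractive behind a veil of ignorance.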

#### Population Desiderata

##### Mind crime prevention

Four key factors: novelty, invisibility, difference, and magnitude.

- Novelty and invisibility: Sentient digital entities may be moral patients. They would be a novel type of mind, and would not exhibit many characteristics that inform our moral intuitions - they lack facial expressions, physicality, human speech, and so on, if they are being run invisibly in some microprocessor. This means we should worry about policy makers taking an unconscionable moral decision.
- Difference: It is also the case that these minds may be very different to human or animal minds, again subverting our intuitions about what behaviour is normative toward them, and increasing the complexity of choosing sensible policies here.
- Magnitude: It may be incredibly cheap to create as many people as currently exist in a country, magnifying the concerns of the previous three factors. “With high computational speed or parallelization, a large amount of suffering could be generated in a small amount of wall clock time.” This may mean that mind crime is a principal desideratum in AI policy.

##### Population policy

This is a worry about Malthusian scenarios (where average income falls to subsistence levels). Hanson has written about these scenarios.

This can also undermine democracy (“One person, one vote”). If a political faction can invest in creating more people, they can create the biggest voting bloc. This leaves the following trilemma of options:

- (i) deny equal votes to all persons
- (ii) impose constraints on creating new persons
- (iii) accept that voting power becomes proportional to ability and willingness to pay to create voting surrogates, resulting in both economically inefficient spending on such surrogates and the political marginalization of those who lack resources or are unwilling to spend them on buying voting power

Some interesting forms of (i):

- Make voting rights something you inherit, a 1-1 mapping.
- Robin Hanson has suggested ‘speed-weighted voting’, because faster ems are more costly, so you'd actually have to pay a lot for marginal voters. This still looks like richer people getting a stronger vote, but in-principle puts a much higher cost on it.
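A toy model can show why weighting votes by speed blunts the "buy a voting bloc" strategy. Everything here is an assumption made for illustration: the `Em` class, the speed unit, and especially the linear cost model. It is not a description of Hanson's actual proposal.

```python
from dataclasses import dataclass


@dataclass
class Em:
    speed: float  # run speed relative to a baseline human (hypothetical unit)


COST_PER_UNIT_SPEED = 1.0  # assumed: running cost scales linearly with speed


def running_cost(ems):
    """Total cost of running a group of ems under the assumed linear cost model."""
    return sum(COST_PER_UNIT_SPEED * em.speed for em in ems)


def votes_one_per_person(ems):
    """Classic rule: every em counts as one vote, however slowly it runs."""
    return float(len(ems))


def votes_speed_weighted(ems):
    """Speed-weighted rule: an em's vote counts in proportion to its speed."""
    return sum(em.speed for em in ems)


# Two ways for a faction to spend (approximately) the same budget of 100 units.
many_slow = [Em(speed=0.01) for _ in range(10_000)]
few_fast = [Em(speed=1.0) for _ in range(100)]

for label, ems in [("many slow ems", many_slow), ("few fast ems", few_fast)]:
    print(f"{label}: cost={running_cost(ems):.1f}, "
          f"one-per-person votes={votes_one_per_person(ems):.0f}, "
          f"speed-weighted votes={votes_speed_weighted(ems):.1f}")
```

Under one-person-one-vote the faction gets a hundred times more votes by running many cheap, slow copies. Under speed weighting both strategies buy the same voting weight for the same money. Wealth still buys influence, but there is no cheap way to multiply it.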

#### Process Desiderata

Overall, this environment is especially different from usual policy-making, which means that we will need to reconsider fundamental assumptions using first-principles thinking to a greater extent than before, and be exceptionally wise (able to get the right answer to the most important questions while they are surrounded by confusion and misunderstanding).

Technological innovation is the primary driver of this radical new policy landscape, and so an understanding of the technologies is unusually helpful.

##### Speed and decisiveness

In many possible futures, historic events will be happening faster than global treaties are typically negotiated, ratified, and implemented. We need a capacity for rapid decision-making and decisive global implementation.

##### Adaptability

Many fundamental principles will need to be re-examined. Some examples: legitimacy, consent, political participation, accountability.

**Voluntary consent.** Given AIs that are super-persuaders and can convince anyone of anything, consent becomes a much vaguer and fuzzier concept. Perhaps consent only counts if the consentee has an “AI guardian” or “AI advisor” of some sort.

**Political participation.** This norm is typically justified on three grounds:

- Epistemic benefit of including information from a maximal diversity of sources.
- Ensures all interests and preferences are given some weighting in the decision.
- Intrinsic good.

However,

- The epistemic effect may become negative if the AI making decisions sits at a sufficiently high epistemic vantage point.
- AI may be able to construct a process / mechanism that accounts for all values without consistent input from humans.
- The intrinsic good is not changed, though it may not be worth the cost if the above two factors become strongly net negative and wasteful.

The above examples, of consent and political participation, are not at all clear, but just go to show that there are many unquestioned assumptions in modern political debate that may need either reformulation, abandonment, or extra vigilance spent on safeguarding their existence into the future.

#### Changes since 2016

The paper was originally added to Nick Bostrom's website in 2016, and received an update in late 2018 (original, current).

The main updates as I can see them are:

- The addition of 'vector field approach' to the title and body. It was lightly alluded to in the initial version. (I wonder if this was due to lots of feedback trying to fit the paper into standard value talk, where it did not want to be.)
- Changing the heading from "Mode" to "Process", and fleshing out the three desiderata rather than a single one called "Responsibility and wisdom". If you read the initial paper, this is the main section to re-read to get anything new.

There have definitely been significant re-writings of the opening section, and there may be more, but I did not take the time to compare them section-for-section.

*I've added some personal reflection/updates in a comment.*

### Evicting Religious Beliefs

In this post I want to tell the story of how I finally evicted a belief that was overdue on rent. I never really believed it; it was always compartmentalized as a touchy subject, not in the sense that it sat in a different magisterium, but because the people I loved held it and I never had the courage to touch it or do my due diligence.

Upon reading the Sequences, it started to seem hilarious to me, so I went on a quest of research, verification and thinking to see whether it made any sense. It didn't; it never did.

I was born and raised a Muslim, not by my parents but by society and a school curriculum of 4 hours a week of "Islamic Education". In class, other students who prayed and did their religious duties often told stories of how their parents taught them. Mine weren't doing theirs: my father was a biologist and geologist by training. He taught biology and geology at high school for 10 years. We had scientific magazines, books and documentaries at home, and Dad's explanation every time I or someone else asked a question, like how a particular pattern was formed or what the theory of evolution is. We didn't care about the subject. Still, the education from society and classmates was strict on the subject to a particular degree.

During Islamic Education classes I often asked questions about contradictory statements in the Quran, or about why something I find evil is commanded. Often the teacher's answers were, you know, mysterious. This didn't bother me at the start, but over time, with my parents not caring but **still believing the belief**, I just put it in a box. I would open it from time to time to use the **prayer tool** before an exam or when talking to others, because to others this was a true subject and it was not up for discussion. "It's true, it's in the holy book" was the common answer.

When I started college I had 5 roommates. They were all religious in their practice, praying 5 times a day at the exact time (on the call to prayer) and always discussing *hadiths*, *haram behavior seen in college* and many other things. They never had a problem with me, since they were also intellectually curious and liked it when I explained something. Our mutual respect and bond were based on our love for mathematics and physics. I forgot to add that before this time I had tried praying; I never sensed anything in particular, so I would stop and sometimes pray again later ...

When I told them that I had prayed before but never sensed anything, they told me that with time God blesses you and you'll start having *such feelings*. *Confirmation bias*, or any *bias* at all, was unknown to me (my time was spent on computers).

We split at the end of the year for some unrelated reasons, mostly circumstances. The belief still remained in that box; I believed because I was scared not to believe, for a mysterious (map) reason. But now I have opened the box, reevaluated the evidence, made an update and evicted the belief. It was quite simple this time because I had some Bayesian training; I am a rookie, but 3 points of bayesianjutsu is good enough in my environment. I looked at all the arguments, and it always seemed that the strongest point was the divine nature and specialness of our holy book, the Quran, since it's supposedly the only book that was never edited. The Quran is considered the most superior thing after God and the Prophet; it's **the book**.

Wrong!

Our holy book's history is so full of drama and events that it sounds like a joke. I had never read it all until this past week, when I read it through the lens of a critic, highlighting and circling (acts that are actually considered crimes) anything from badly structured language to contradictions, repetitions and stories, and making a note at each one.

Classes of Islamic Education taught us that the Quran was spoken by Gabriel to Mohammed in a cave (what is it with gods, prophets and caves anyway?) over 23 years. They told us that it was spoken in Arabic by Gabriel, and that God chose it because it's his favorite language. 50 years later it was collected by Uthman-Ibn' Aafan and made into the one we use today. That's it. When you asked how it wasn't edited, we were told that humans at that time had incredible memory: just look at how these *hadiths* were remembered. Not evidence, just filler answers. I always suspected this but never made the effort to look.

Actually the story is more complex: the book itself was edited over the course of 300 years, and this was said by the current equivalent of high priests, "sheikhs" or "aalim", meaning someone who has devoted his life to Islamic study. Arabic as a language is complex; there are a lot of rules to respect, and something called "chakel", which roughly means form: it's the symbols you find in any Arabic text above or below some letters, and if you remove it then 2 letters become just one, like "ش" and its sister "س". The original Quran was said by critics to be a translation from a Christian book by Arius; over the years a lot of things were added that reinforced its Arabic origins, along with other fillers. In the text itself, the sentence "this text is written in Arabic" is repeated 10 times. I've read a lot of books, and none of them ever referred to the language it was written in; it's obvious. Back to languages: 20 years after the Prophet died, words were added that designated possession and some particular grammar of Arabic, and 50 years later form was added to show how to read the words.

This practice of removing punctuation, form and filler letters is called filtering; it's used to study the book in its so-called original format.

Now this would always be given the benefit of the doubt, but you have to understand that the Quran is confusing by itself: the Arabic used is poorly structured, contradictory and vague. There are over 200,000 books and volumes dedicated to explaining the Quran, with more than 300 different interpretations for the same verse. Syriac words are present in the text, like "سارية" (sarya), in the verse that tells the story of Mary and says "the one coming below you is sarya". In its Arabic definition of origin it means river, but in the context of the story that means nothing; more than 200 interpretations exist that say it means generous or good. Actually, the Syriac meaning of the word is "not a bastard", or legitimate. The word "نصرة" points to Christians in the Quran. This was a Syriac word not present in any other language.

Some words simply have no Arabic definition, like "طود", spelled "Tod", which is defined in **modern** Arabic as mountain; but the word is mentioned only once in the book, in the story of the Exodus, where it describes Moses splitting the sea into two parts that rose like a mountain. A similar word, mentioned 10 times within the same context, is "طور", spelled "Tor", which is of Syriac origin in both spelling and meaning (mountain).

There are many other words, a huge number of them, that are Syriac but were later added to Arabic. Some words, such as "Kawthar", which has an entire surat, have no meaning in Arabic, Syriac or any other language; a prominent Muslim scholar, Ibn Al Naqib, gave 26 different interpretations of it.

Later, after Mohammed's death, more verses were removed or added. One in particular describes the act of "stoning"; it was removed after Mohammed's death, supposedly by God's decision, when he sent a goat that ate parts of the Quran during the Prophet's funeral. The stoning verse was later added by Omar Ibn Al Khattab, a prominent friend ("sahabi") who is for sure going to heaven according to Islam.

The word Quran isn't Arabic; it's derived from Qurian, a Syriac word meaning the book of liturgical readings.

A lot of the stories in the book aren't Muslim at all; in fact the Islamic parts are additions to relics of Judaism and Christianity, like the story of Gog and Magog, or Noah and the boat, a Sumerian story from 1600 B.C.E. One story popular among those who say Islam is a religion of good is that of the two brothers Cain and Abel, where the famous line occurs, "he who kills one soul is like killing all of humanity...", a story with origins in a Jewish text, Mishnah Sanhedrin. A striking one is a story taken from the Targum of Esther and carried over as the story of a king Salomon and a queen Saba, with one difference: instead of the red-cock in the Targum, the Quran speaks of a lapwing.

A lot more could be told; volumes could be written about these discrepancies alone. But the beliefs of Islam in my society are stronger than in Western countries: here people die according to these rules, and to criticize is blasphemy punished by jail (we were colonized by France, which thankfully left a bit of Western modernism).

This, and a series of research findings I thought about, made me officially kick out the belief and wear a silly invisible hat to deal with society for the time being, where I respond with "Bless you", "God's will" and other relics of language that I found hard to drop but pretty useful as social skills.

Funnily enough, the religion I was defending against Dawkins's ideas of Islam, or against others who call it a religion of terrorism, turned out to be hilarious. There are some beautiful contradictions in the text that make me cringe, like the story of Cain and Abel, which condemns killing, alongside the sentence "kill those who don't believe or turn to atheism". The belief in the box that I sometimes defended turned out to be a hilarious joke full of logical fallacies, myths like stories of wars that never happened, and other damning truths.

My religion sums up to nothing but interpretations, bias exploitation and myths; the book is filled with intimidation and seduction, in the same sexist contexts of women, virgins and rivers of red wine.

I can't fathom the idea of God or Islam anymore. The remaining questions that held the belief in my mind were completely destroyed; the belief, or its remains, was evicted, and it's time to evict others.

Thanks to Eliezer, Luke and the LessWrong essays, which pushed me to see my biases, to be curious and, more importantly, to school myself.


### (Why) Does the Basilisk Argument fail?

As far as I can tell, the standard rebuttal to Roko's idea is based on the strategy of simply ignoring acausal blackmail. But what if you consider the hypothetical situation that there is no blackmail and acausal deals involved, and the Basilisk scenario is how things just *are*? Simply put, if we accept the Computational Theory of Mind, we need to accept that our experiences could in principle be a simulation by another actor. Then, if you ever find yourself entertaining the question:

**Q1**: Should I do everything I can to ensure the creation/simulation of an observer that is asking itself the same question as I am now, and will suffer a horrible fate if it answers *no*?

Would the answer that maximizes your utility not be *yes*? The idea is that answering *no* opens up the possibility that you are an actor created by someone who answered *yes*, and thus might end up suffering whatever horrible fate they might have implemented. This seems to be the argument that Roko put forth in his original post, and I have not seen it soundly refuted.

Of course, one could argue that there is no evidence for being in such a simulation instead of the real world, and thus that the commitment and ethical dilemma of answering *yes* is not worth it. If we assign equal probability to all worlds we might inhabit, this might indeed be convincing. However, I think a kind of anthropic argument, based on Bostrom's Self-Sampling Assumption (SSA), could be constructed to show that we may indeed be more likely to be in such a simulation. Assume your hypotheses to be

**H0**: "I am living in the real world, which is as it appears to be, i.e. the possible observers are the ≈ 7 billion humans on planet earth".

**H1**: "The real world is dominated by actors who answered *yes* to Q1, and (possibly with the help of AGI) filled the universe with simulations of observers pondering Q1".

Then, given the data

**D:** "I am an observer thinking about Q1"

And assuming that in the world as it appears to be in **H0**, only very few people, say, one in ten thousand, asked themselves this question Q1, while under **H1**, the universe is literally filled with observers thinking about Q1, we get

**Pr[D|H0]** = 0.0001

**Pr[D|H1]** ≈ 1

So, the Bayes factor Pr[D|H1] / Pr[D|H0] ≈ 10,000 would show very clear evidence in favor of **H1**. Of course, this would apply to any hypothetical model in which most observers have exactly the same thoughts as we do (and may indeed lead to solipsism), but **H1** at least gives a convincing reason for why this should be the case.
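To make the arithmetic concrete, here is a minimal sketch of the update using the two likelihoods stated above. The function names and the prior values are assumptions chosen for illustration; the post does not specify a prior for **H1**.

```python
def bayes_factor(p_d_given_h1: float, p_d_given_h0: float) -> float:
    """Ratio of likelihoods: how strongly the data D favors H1 over H0."""
    return p_d_given_h1 / p_d_given_h0


def posterior_h1(prior_h1: float, p_d_given_h1: float, p_d_given_h0: float) -> float:
    """Posterior probability of H1 after observing D, by Bayes' theorem."""
    numerator = prior_h1 * p_d_given_h1
    denominator = numerator + (1.0 - prior_h1) * p_d_given_h0
    return numerator / denominator


# Likelihoods taken from the post.
P_D_H0 = 0.0001  # one in ten thousand observers ponders Q1 under H0
P_D_H1 = 1.0     # (almost) every observer ponders Q1 under H1

BF = bayes_factor(P_D_H1, P_D_H0)

if __name__ == "__main__":
    print(f"Bayes factor in favor of H1: {BF:,.0f}")
    # Hypothetical priors for H1, not taken from the post.
    for prior in (0.5, 1e-3, 1e-6):
        post = posterior_h1(prior, P_D_H1, P_D_H0)
        print(f"prior={prior:g} -> posterior={post:.4f}")
```

The Bayes factor only measures how much the data shifts the odds. The resulting posterior still depends on the prior one assigns to **H1**, which is a separate question from the likelihoods above.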

So how can we avoid biting the bullet of answering *yes* to Q1, which seems like a very unattractive option, but possibly still better than being the only actor to answer *no*? Admittedly, I am quite new to the idea of anthropic reasoning, so my logic could be flawed. I would like to hear thoughts on this.


### What are some of bizarre theories based on anthropic reasoning?

The Doomsday argument and the simulation argument seem quite bizarre to most people. But these are not the only strange theories one can come up with when employing anthropics.

Some examples:

**One is likely to have an unusually high IQ**

Perhaps the brain works in such a way that having a high IQ correlates with something that also causes more observer moments. Hence there are more high-IQ experiences in the world than low-IQ ones.

**Fragile universe**

Universe-destroying physical catastrophes that expand at the speed of light (say, false vacuum collapse) could be very frequent, as often as once every second. And it is only due to survivorship bias that we think the universe is stable and safe. How would we know?

**Animals do not have consciousness**

There are more animals than humans on Earth. Still, we find ourselves as humans. Perhaps it is because we can only be humans, as only humans have consciousness.

**We are stuck inside an infinite loop**

Let's assume the simulation argument is correct. Then we probably exist inside a simulation run by some software. All software has bugs. One of the bugs software sometimes has is getting itself into an infinite loop. The largest amount of experience computed by this software could then be inside this infinite loop.


### Constructing Goodhart

A recent question from Scott Garrabrant brought up the issue of formalizing Goodhart’s Law. The problem is to come up with some model system where optimizing for something which is almost-but-not-quite the thing you really want produces worse results than not optimizing at all. Considering how endemic Goodhart’s Law is in the real world, this is surprisingly non-trivial.

Let’s start simple: we have some true objective u(x), and we want to choose x to maximize it. Sadly, we don’t actually have any way to determine the true value u for a given value x, but we can determine u(x)+ϵ(x), where ϵ is some random function of x. People talked about this following Scott’s question, so I won’t math it out here, but the main answer is that more optimization of u+ϵ still improves u on average over a wide variety of assumptions. John Maxwell put it nicely in his answer to Scott’s question:

*If your proxy consists of something you’re trying to maximize plus unrelated noise that’s roughly constant in magnitude, you’re still best off maximizing the heck out of that proxy, because the very highest value of the proxy will tend to be a point where the noise is high and the thing you’re trying to maximize is also high.*

In short: absent some much more substantive assumptions, there is no Goodhart effect.
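To make the "no Goodhart effect from pure noise" claim concrete, here is a minimal simulation sketch. It assumes (my assumption, not something the discussion above fixes) that u and ϵ are independent standard Gaussians, and it models "more optimization" as picking the best of more candidates by the proxy u+ϵ. The function name is hypothetical.

```python
import random

def mean_true_value_at_proxy_max(n_candidates, trials=2000, seed=0):
    """Average true value u of the candidate that maximizes the proxy u + eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_proxy, best_u = float("-inf"), 0.0
        for _ in range(n_candidates):
            u = rng.gauss(0, 1)    # true objective (assumed Gaussian)
            eps = rng.gauss(0, 1)  # independent noise of similar magnitude
            if u + eps > best_proxy:
                best_proxy, best_u = u + eps, u
        total += best_u
    return total / trials

weak = mean_true_value_at_proxy_max(2)      # mild optimization pressure
strong = mean_true_value_at_proxy_max(100)  # heavy optimization pressure
print(f"best of 2: {weak:.2f}, best of 100: {strong:.2f}")
```

Under these assumptions, the heavier search should end up with the higher true value on average, which is the sense in which proxy-plus-noise alone does not produce Goodhart.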

Rather than generic random functions, I suggest thinking about Goodhart on a causal DAG instead. As an example, I’ll use the old story about Soviet nail factories evaluated on number of nails made, and producing huge numbers of tiny useless nails.

We really want to optimize something like the total economic value of nails produced. There’s some complicated causal network leading from the factory’s inputs to the economic value of its outputs (we’ll use a dramatically simplified network as an example).

If we pick a specific cross-section of that network, we find that economic value is mediated by number of nails, size, and strength — those variables are enough to determine the objective. All the inputs further up influence the objective by changing number, size, and/or strength of nails.

Now, we choose number of nails as a proxy for the objective. If we were just using this proxy to optimize machine count, that would be fine: machine count only influences our objective via number of nails produced, and it doesn’t affect size or strength, so number of nails is a fine proxy for our true objective for the purpose of ordering machines. But mould shape is another matter. Mould shape affects both number and size, so we can use a smaller mould to increase number of nails while decreasing size. If we’re using number as a proxy for the true objective, ignoring size and strength, then that’s going to cause a problem.

Generalizing: we have a complicated causal DAG which determines some output we really want to optimize. We notice that some node in the middle of that DAG is highly predictive of happy outputs, so we optimize for that thing as a proxy. If our proxy were a bottleneck in the DAG (i.e. it’s on every possible path from inputs to output), then that would work just fine. But in practice, there are other nodes in parallel to the proxy which also matter for the output; in our example, size and strength. By optimizing for the proxy, we accept trade-offs which harm nodes in parallel to it, which potentially adds up to a net-harmful effect on the output.

So we have a model which can potentially give rise to Goodhart, but *will* it? If we construct a random DAG, choose a proxy node close to the objective, and optimize for that proxy, we probably won’t see a Goodhart effect (at least not right away). Why not? Well, if we’ve just initialized all the parameters randomly, then whatever change we make to optimize for number of nails is just as likely to improve other sub-objectives as to harm them. For instance, if we’re starting off with a random mould, then it’s just as likely to be too big as too small — if it’s producing giant useless nails, then shrinking the mould improves both number *and* size of nails.
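The mould story can be sketched as a toy model. Everything numeric here is an assumption made up for illustration: a fixed material budget, nail count equal to budget divided by mould size, and a per-nail usefulness that falls off sharply once nails are smaller than size 1. Strength is left out to keep the sketch short.

```python
MATERIAL = 1000.0  # hypothetical fixed material budget

def nail_count(mould_size):
    """Proxy: smaller moulds always yield more nails from the same material."""
    return MATERIAL / mould_size

def economic_value(mould_size):
    """True objective: count times usefulness; nails below size 1 lose value quickly."""
    usefulness = min(1.0, mould_size) ** 3
    return nail_count(mould_size) * usefulness

sizes = [4.0, 2.0, 1.0, 0.5, 0.25]  # shrinking the mould step by step
for s in sizes:
    print(f"size={s:>5} count={nail_count(s):>7.0f} value={economic_value(s):>7.1f}")
```

Starting from the giant mould (size 4), shrinking improves both the proxy and the true value. Past size 1, the proxy keeps rising while the value collapses.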

Of course, in the real world, we probably wouldn’t be starting from a giant useless mould. Goodhart hits in the real world because we’re not just starting from random points, we’re starting from points which have had some optimization already. But we’re not starting from the best possible point either; if we were, any change would be bad, proxy optimization or not. Rather, I expect that most real systems are starting from a Pareto-optimal point.

Here’s why: look at the cross-section of our causal DAG from earlier. Number, size, strength… in the business world, we’d call these key performance indicators (KPIs) for the factory. If something obviously improves one or more KPIs without any downside, then usually everyone immediately agrees that it should be done. That’s the generalized efficient markets hypothesis, on super-easy mode. Without trade-offs, optimization is trivial. Add trade-offs, and things get contentious: there’s a trade-off between number and size, so the quality assurance department gets into an argument with the sales department about how to handle the trade-off, and some agreement is hammered out which probably isn’t all that optimal.

If we’ve made all the optimizations we can without getting into trade-offs, then we’re at a Pareto-optimal point: we cannot improve any KPI without harming some other KPI. If we expect those optimizations to be easy and to happen all the time, then we should expect to usually end up at Pareto optima.
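As a small illustration of that definition, here is a sketch that filters a set of factory configurations down to those not dominated by any other. The KPI tuples (number, size, strength) are invented numbers, and higher is assumed to be better on every KPI.

```python
def dominates(a, b):
    """True if a is at least as good as b on every KPI and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_optimal(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical configurations as (number, size, strength).
configs = [(100, 5, 5), (120, 5, 5), (120, 4, 6), (90, 4, 4), (200, 1, 5)]
front = pareto_optimal(configs)
print(front)
```

The configurations that drop out are exactly the "free lunch" cases: something else beats them on at least one KPI without losing on any other.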

And if we’re already at a Pareto optimum, and we start optimizing for some proxy objective, then we’re *definitely* going to harm at least one of the other objectives. That’s the whole point of Pareto optimality, after all: we can’t improve one thing without trading off against something else. That doesn’t mean that we’ll see net harm to the true objective right away; even if we’re Pareto optimal, we could be starting from a point with far too few nails produced. If the factory has a culture of unnecessary perfectionism, then pushing for higher nail count may help. But keep pushing, and we’ll slide down the Pareto curve past the optimal point and into unhappy territory. That’s the mark of a Goodhart effect.

### Conclusion to the sequence on value learning

*This post summarizes the sequence on value learning. While it doesn’t introduce any new ideas, it does shed light on which parts I would emphasize most, and the takeaways I hope that readers get. I make several strong claims here; interpret these as my impressions, not my beliefs. I would guess many researchers disagree with the (strength of the) claims, though I do not know what their arguments would be.*

Over the last three months we’ve covered a lot of ground. It’s easy to lose sight of the overall picture over such a long period of time, so let's do a brief recap.

#### The “obvious” approach

Here is an argument for the importance of AI safety:

- Any agent that is much more intelligent than us __should not be exploitable__ by us, since if we could find some way to exploit the agent, the agent could also find the exploit and patch it.
- Anything that is not exploitable must be an __expected utility maximizer__; since we cannot exploit a superintelligent AI, it must look like an expected utility maximizer to us.
- Due to __Goodhart’s Law__, even “slightly wrong” utility functions can lead to catastrophic outcomes when maximized.
- Our utility function is complex and __fragile__, so getting the “right” utility function is difficult.

This argument implies that by the time we have a superintelligent AI system, there is only one part of that system that could still have been influenced by us: the utility function. Every other feature of the AI system is fixed by math. As a result, we must *necessarily* solve AI alignment by influencing the utility function.

So of course, the natural approach is to __get the right utility function__, or at least an __adequate__ one, and have our AI system optimize that utility function. Besides __fragility of value__, which you might hope that machine learning could overcome, the big challenge is that even if you assume __full access to the entire human policy__, we __cannot infer their values__ without making an assumption about how their preferences relate to their behavior. In addition, any __misspecification__ can lead to __bad inferences__. And finally the entire project of having a single utility function that captures optimal behavior in all possible environments seems quite hard to do -- it seems necessary to have some sort of __feedback from humans__, or you end up extrapolating in some strange way that is not necessarily what we “would have” wanted.

So does this mean we’re doomed? Well, there are still some __potential avenues__ for rescuing ambitious value learning, though they do look quite difficult to me. But I think we should actually question the assumptions underlying our original argument.

Consider the calculator. From the perspective of someone before the time of calculators, this device would look quite intelligent -- just look at the speed with which it can do arithmetic! Nonetheless, we can all agree that a standard calculator is not dangerous.

It also seems strange to ascribe goals to the calculator -- while this is not *wrong* per se, we certainly have better ways of predicting what a calculator will and will not do than by modelling it as an expected utility maximizer. If you model a calculator as aiming to achieve the goal of “give accurate math answers”, problems arise: what if I take a hammer to the calculator and then try to ask it 5 + 3? The utility maximizer model here would say that it answers 8, whereas with our understanding of how calculators work we know it probably won’t give any answer at all. Utility maximization with a simple utility function is only a good model for the calculator within a restricted set of environmental circumstances and a restricted action space. (For example, we don’t model the calculator as having access to the action, “build armor that can protect against hammer attacks”, because otherwise utility maximization would predict it takes that action.)

Of course, it may be that something that is generally superintelligent will work in as broad a set of circumstances as we do, and will have as wide an action space as we do, and must still look to us like an __expected utility maximizer__ since __otherwise we could Dutch book it__. However, if you take such a broad view, then it turns out that __all behavior looks coherent__. There’s no *mathematical* reason that an intelligent agent must have catastrophic behavior, since *any* behavior that you observe is consistent with the maximization of some utility function.
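One way to see why "all behavior looks coherent" is to construct the utility function explicitly. The sketch below is only an illustration under my own assumptions (a tiny world of three binary actions, and a hypothetical helper name): for any observed action sequence, it builds a utility function that the observed sequence maximizes.

```python
from itertools import product

def utility_rationalizing(observed_actions):
    """Return a utility function over action sequences that the observed one maximizes."""
    target = tuple(observed_actions)

    def utility(actions):
        # Utility 1 for exactly the observed behavior, 0 for anything else.
        return 1.0 if tuple(actions) == target else 0.0

    return utility

observed = ("left", "left", "right")  # arbitrary behavior we happened to see
u = utility_rationalizing(observed)
all_sequences = list(product(("left", "right"), repeat=3))
best = max(all_sequences, key=u)
print(best)
```

Since the same construction works for every possible `observed`, learning that an agent maximizes *some* utility function rules out no behavior at all.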

To be clear, while I agree with every statement in __Optimized agent appears coherent__, I am making the strong claim that these statements are *vacuous* and by themselves tell us nothing about the systems that we will actually build. Typically, I do not flat out disagree with a common argument. I usually think that the argument is important and forms a piece of the picture, but that there are other arguments that push in other directions that might be more important. That’s not the case here: I am claiming that the argument that “superintelligent agents must be expected utility maximizers by virtue of coherence arguments” provides *no* useful information, with almost the force of a theorem. My uncertainty here is almost entirely caused by the fact that other smart people believe that this argument is important and relevant.

I am *not* claiming that we don’t need to worry about AI safety since AIs won’t be expected utility maximizers. First of all, you *can* model them as expected utility maximizers, it’s just not useful. Second, if we build an AI system whose internal reasoning consisted of maximizing the expectation of some simple utility function, I think all of the classic concerns apply. Third, it does seem likely that __humans will build AI systems that are “trying to pursue a goal”__, and that can have all of the standard __convergent instrumental subgoals__. I propose that we describe these systems as __goal-directed__ rather than expected utility maximizers, since the latter is vacuous and implies a level of formalization that we have not yet reached. However, this risk is significantly different. If you believed that superintelligent AI *must* be goal-directed because of math, then your only recourse for safety would be to make sure that the goal is good, which is what motivated us to study __ambitious value learning__. But if the argument is actually that AI will be goal-directed because humans will make it that way, you could try to build __AI that is not goal-directed__ that can do the things that goal-directed AI can do, and have humans build that instead.

Now that we aren’t forced to influence just a utility function, we can consider alternative designs for AI systems. For example, we can aim for __corrigible__ behavior, where the agent is __trying to do what we want__. Or we could try to __learn human norms__, and create AI systems that follow these norms while trying to accomplish some task. Or we could try to create an AI ecosystem akin to __Comprehensive AI Services__, and set up the services such that they are keeping each other in check. We could create systems that learn __how to do what we want in particular domains__, by __learning our instrumental goals and values__, and use these as subsystems in AI systems that accelerate progress, enable better decision-making, and are generally corrigible. If we want to take such an approach, we have another source of influence: the __human policy__. We can train our human overseers to provide supervision in a particular way that leads to good behavior on the AI’s part. This is analogous to training operators of computer systems, and can benefit from insights from Human-Computer Interaction (HCI).

This sequence is somewhat misnamed: while it is organized around value learning, there are many ideas that should be of interest to researchers working on other agendas as well. Many of the key ideas can be used to analyze *any* proposed solution for alignment (though the resulting analysis may not be very interesting).

**The necessity of feedback.** The main argument of __Human-AI Interaction__ is that any proposed solution that aims to have an AI system (or a CAIS glob of services) produce good outcomes over the long term needs to continually use data about humans as feedback in order to “stay on target”. Here, “human” is shorthand for “something that we know shares our values”: e.g. idealized humans, uploads, or sufficiently good imitation learning would all probably count.

(If this point seems obvious to you, note that __ambitious value learning__ does not clearly satisfy this criterion, and approaches like impact measures, mild optimization, and boxing are punting on this problem and aiming for not-catastrophic outcomes rather than good outcomes.)

**Mistake models.** We saw that __ambitious value learning__ has the problem that even if we __assume perfect information about the human__, we __cannot infer their values__ without making an assumption about how their preferences relate to their behavior. This is an example of a much broader pattern: given that our AI systems necessarily get feedback from us, they must be making some assumption about how to interpret that feedback. For any proposed solution to alignment, we should ask what assumptions the AI system is making about the feedback it gets from us.
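A toy sketch of why behavior alone underdetermines values: the two hypotheses below pair different rewards with different assumptions about how the human acts on them, yet predict the same behavior. The two-action world, the reward numbers, and the function names are all hypothetical.

```python
ACTIONS = ("tea", "coffee")

def rational(reward):
    """Mistake model 1: the human picks the action with the highest reward."""
    return max(ACTIONS, key=reward.get)

def anti_rational(reward):
    """Mistake model 2: the human systematically picks the lowest-reward action."""
    return min(ACTIONS, key=reward.get)

likes_tea = {"tea": 1.0, "coffee": 0.0}
likes_coffee = {"tea": 0.0, "coffee": 1.0}

behavior_1 = rational(likes_tea)          # prefers tea, acts well
behavior_2 = anti_rational(likes_coffee)  # prefers coffee, acts badly
print(behavior_1, behavior_2)
```

Both hypotheses yield the same observed choice, so only the assumed link between preferences and behavior can tell them apart.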

### AI Safety Prerequisites Course: Revamp and New Lessons

Previous post: Fundamentals of Formalisation Level 7: Equivalence Relations and Orderings. First post: Fundamentals of Formalisation level 1: Basic Logic.

Nine months ago we, RAISE, started creating a Math Prerequisites for AI Safety online course. It mostly covers subjects related to MIRI research (set theory, computability theory, and logic), but we want to add machine-learning subjects in the future. For four months we added new lessons and announced them on LessWrong. Then we stopped, looked back, and decided to improve their usability. That's what we've been busy with since August.

#### News since the last post

- Big update of the 7 levels we had previously published, which you can see in the picture above. The lessons use textbooks, which you will need to follow along. Previously, lessons looked like "read that section; now solve problems 1.2, 1.3, 1.4c from the textbook; now solve these additional problems we came up with". Now our lessons still say "read that section", but the problems (and their solutions, unlike many textbooks, which don't provide them) are included in the lessons themselves. Additional problems are now optional, and we recommend that students skip them by default and do them only if they need more practice. New levels in the Logic, Set Theory, and Computability tracks will be like that as well.
- Level 1 was very long, consisted of 45 pages of reading, and could take 10 hours for someone unfamiliar with logic. We separated it into smaller parts.
- Two new levels. Level 8.1: Proof by Induction. Level 8.2: Abacus Computability.

If you study using our course, please give us feedback. Leave a comment here or email us at raise@aisafety.camp, or through the contact form. Do you have an idea about what prerequisites are most important for AI Safety research? Do you know an optimal way to learn them? Tell us using the same methods or collaborate with us.

Can you check whether a mathematical proof is correct? Do you know how to make proofs understandable and easy to remember? Would you like to help create the prerequisites course? If yes, consider volunteering.

### Rationality: What's the point?

*This post is part of my* *Hazardous Guide To Rationality.* *I don't expect this to be new or exciting to frequent LW people, and I would super appreciate comments and feedback in light of intents for the sequence, as outlined in the above link.*

A friend once said that he didn't like it when things are taught "Mr. Miyagi style": a bunch of disconnected, unmotivated facts, exercises, and ideas are put before you, and it's only at the very end that it clicks and you see the hidden structure and purpose of everything you've learned.

Therefore, the very first post of this sequence is going to be a drive-by of what I think are some of the cool/useful/amazing things that you can get out of The Way. I never would have become a close-up magician if I hadn't seen someone do incredible things that blew my mind.

#### Who Is This For?

As much as it pains me to say this, it might not really matter whether or not you follow The Way. It really depends on what you're trying to do. The guy who kicked off the contemporary rationality community, Eliezer Yudkowsky, notes that besides a natural draw based on personality, the biggest reason he's invested in rationality is that he really wants to make sure Friendly AI happens before Unfriendly AI, and it turns out that's really hard.

[*add more*]

#### What's the Pot of Gold at the End of the Rainbow?

Things I claim you can get better at:

- Believing true things and not believing false things.
- Arriving at true beliefs faster.
- "Failing mysteriously" less often.
- Understanding how your own mind works.

Why some of the above things are awesome

- If you have something to protect (you really want to make certain things happen), then better models, more true beliefs, faster updating, and being confused by lies all make you more likely to make the changes you want to see in the world.
- If you get a kick out of more deeply grokking how the world around you works, a kick you will get.
- A lot of interpersonal problems come from two gaps:
  - One between "How human minds work" and "How you think human minds work"
  - One between "Your beliefs, feelings, and emotions" and "Your self-model of your beliefs, feelings, and emotions"

  Shortening those gaps will result in fewer interpersonal problems.

### Quantifying Human Suffering and "Everyday Suffering"

In the case of humans, it seems self-evident that suffering is a consciously experienced, *mental* or *psychological* phenomenon. This makes it difficult to quantify, given our lack of access to other beings’ qualia. However, the science of neuropsychology seeks to correlate reports of subjective experience with quantitative measures of physiological (brain) activity. If the variable being reported by subjects is the (relative) degree of suffering experienced at any moment, this gives us a way to quantify suffering by correlating this variable with relevant brain-scan variables.

Once quantitative measures are in place, different methods for suffering alleviation (e.g. meditation, therapy, psychotherapeutic drugs) can be assessed for their relative efficacy. This already happens in clinical contexts, for example by measuring the effect of “Mindfulness Based Stress Reduction” (MBSR) on variables such as cortisol levels, which are related to consciously experienced stress.

I’m not aware of any research to extend suffering-quantification (and subsequent alleviation) beyond clinical settings and into “everyday life”. Most people will never have a clinical symptom that requires a psychotherapeutic treatment, but that doesn’t mean they won’t be subject to significant amounts of suffering throughout their lives. We might call that “everyday suffering”.

Measuring everyday suffering, e.g. measuring cortisol levels of healthy subjects in their day-to-day lives, might reveal opportunities to alleviate it. This is probably already happening to some extent. An example intervention: given MBSR’s efficacy at reducing the stress levels of those with psychiatric disorders, it stands to reason that it will also reduce the stress of healthy subjects. Thus, one might imagine a government-funded program to provide all citizens access to MBSR as a means of reducing cortisol/stress levels and their associated suffering.

Alleviating everyday suffering is akin to the “betterment of well people” and I simply want to raise the point (for discussion) that this might be a neglected cause. It’s not as pressing a challenge as mitigating the intense suffering of certain beings (like factory-farm animals) but if large, healthy populations are subject to any baseline of mental suffering, I think it’s important that we try to measure, and then work to reduce, that baseline. Even a small reduction of that baseline in a large population would mean a significant decrease in total global suffering.

If anybody knows of research to assess the mental health of large populations I would love to hear about it. Thanks, Will.

### Complexity Penalties in Statistical Learning

I am currently taking a course on statistical learning at the Australian Mathematical Sciences Institute Summer School. One idea that has appeared many times in the course is that a more complicated model is likely to have many shortcomings. This is because complicated models tend to overfit the observed data. They often give explanatory value to parts of the observation that are simply random noise.

This is common knowledge for many aspiring rationalists. The term complexity penalty is used to describe the act of putting less credence in complicated explanations because they are more complex. In this blog post I aim to provide a brief introduction to statistical learning and use an example to demonstrate how complexity penalties arise in this setting.

#### Statistical Learning

Broadly speaking, statistical learning is the process of using data to select a model and then using the model to make predictions about future data. So, in order to perform statistical learning, we need at least three things. We need some data, a class of models and a way of measuring how well a model predicts the future data. In this post we will look at the problem of polynomial regression.
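The three ingredients can be sketched in a few lines of plain Python. This is only an illustration under assumptions of my own: the data come from a linear relationship plus Gaussian noise, the model class is polynomials of a given degree fitted by least squares, and prediction quality is measured by mean squared error. The helper names are hypothetical, and the normal-equations solver is only meant for small degrees.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small degrees only)."""
    n = degree + 1
    # Normal equations A c = b with A = V^T V, b = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * n
    for row in reversed(range(n)):  # back substitution
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs

def mean_squared_error(coeffs, xs, ys):
    """Average squared gap between the polynomial's predictions and the data."""
    predict = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def sample(n):
    """Assumed data source: y = 2x plus Gaussian noise, x uniform on [-1, 1]."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [2 * x + rng.gauss(0, 0.3) for x in xs]

train_x, train_y = sample(8)
test_x, test_y = sample(200)
train_mse, test_mse = {}, {}
for degree in (1, 5):
    coeffs = fit_polynomial(train_x, train_y, degree)
    train_mse[degree] = mean_squared_error(coeffs, train_x, train_y)
    test_mse[degree] = mean_squared_error(coeffs, test_x, test_y)
    print(degree, train_mse[degree], test_mse[degree])
```

The more complex model can never fit the training data worse than the simpler one, since the degree-1 polynomials are a subset of the degree-5 ones. Whether it also predicts fresh data better is a separate question, and comparing the two printed columns is one way to look at it.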

#### The Data

For polynomial regression, our data is in the form of
{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} n pairs of real numbers(x1,y1),(x2,y2),…,(xn,yn). Our goal is to find the relationship between the input values xi and the output values yi and then use this to predict future outputs given new inputs. For example, the input values could represent the average temperature of a particular day and the corresponding output value could be the number of ice creams sold that day. 
Going with this example, we can suppose our data looks something like this:

To simplify our analysis we will make some assumptions about the relationship between the inputs and outputs. We will assume that there exists an unknown function $g^*$ such that $y = g^*(x) + E$, where $E$ is a statistical error term with mean equal to 0 and variance equal to $\sigma$. This assumption is essentially saying that there is some true relationship between our input $x$ and our output $y$ but that the output can fluctuate around this true value. Furthermore, we are assuming that these fluctuations are balanced in the positive and negative directions (since the mean of $E$ is zero) and that the size of these fluctuations doesn't depend on the input $x$ (since the variance of $E$ is constant).
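As a concrete illustration of this assumption, the following minimal sketch simulates data of the form y = g*(x) + E. The particular `g_star`, the input range and the choice of Gaussian noise are assumptions made only for illustration; the post only requires that the noise has mean 0 and constant variance σ.

```python
import math
import random

def g_star(x):
    # Hypothetical "true" relationship, chosen only for illustration.
    return 2.0 + 0.5 * x - 0.1 * x * x

def simulate(n, sigma, seed=0):
    """Draw n pairs (x_i, y_i) with y_i = g_star(x_i) + E_i and Var(E_i) = sigma."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    # gauss() expects a standard deviation, so pass sqrt(sigma).
    ys = [g_star(x) + rng.gauss(0.0, math.sqrt(sigma)) for x in xs]
    return xs, ys

xs, ys = simulate(n=2000, sigma=0.25)

# The fluctuations around g_star should average out to roughly 0
# and have an average squared size of roughly sigma.
residuals = [y - g_star(x) for x, y in zip(xs, ys)]
mean_res = sum(residuals) / len(residuals)
var_res = sum(r * r for r in residuals) / len(residuals)
print(mean_res, var_res)
```

With a few thousand points the printed mean should be close to 0 and the printed variance close to the chosen σ.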

The Models

We want models that can take in a new number $x$ and predict what the corresponding $y$ should be. Thus our models will be functions that take in real numbers and return real numbers. Since we are doing polynomial regression, the classes of models we will be using will be different sets of polynomial functions. More specifically, let $G_p$ be the set of polynomials of degree at most $p$. That is, $G_p$ contains all functions $g$ of the form

$$g(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_p x^p, \quad \text{where } a_i \in \mathbb{R}.$$

The parameter $p$ corresponds to the complexity of the class of models we are using. As we increase $p$, we are considering more and more complicated possible models.
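One hypothetical way to represent a model from this class in code is as the list of its coefficients, lowest degree first. The sketch below (the name `poly_eval` is illustrative, not from the post) evaluates such a polynomial with Horner's method and shows that a lower-degree polynomial is also a member of the larger class, with its leading coefficient set to zero.

```python
def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1*x + ... + a_p*x**p for coeffs = [a_0, ..., a_p]."""
    result = 0.0
    # Horner's method: work from the highest-degree coefficient down.
    for a in reversed(coeffs):
        result = result * x + a
    return result

quadratic = [1.0, 2.0, 3.0]    # 1 + 2x + 3x^2, a polynomial of degree 2
line_as_quadratic = [1.0, 2.0, 0.0]  # 1 + 2x, degree 1 but also "degree at most 2"

print(poly_eval(quadratic, 2.0))          # 17.0
print(poly_eval(line_as_quadratic, 2.0))  # 5.0
```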

Evaluating the models

We now have our data and our class of models; the remaining ingredient is a way to measure the performance of a particular model. Recall that our goal is to find a model that can take in a new number $x$ and predict which $y$ should be associated with it. Thus if we have a second set of data $(\hat{x}_1,\hat{y}_1),(\hat{x}_2,\hat{y}_2),\dots,(\hat{x}_m,\hat{y}_m)$, one way to measure the performance is to look at the average distance between our guess $g(\hat{x}_i)$ and the actual value $\hat{y}_i$. That is, the best model is the one which minimizes

$$\frac{1}{m}\sum_{i=1}^{m} \left|g(\hat{x}_i) - \hat{y}_i\right|.$$

It turns out that looking at the average *squared* distance between our guesses and the actual values gives a better way to measure performance. By taking squares we are more forgiving when the model gets the answer almost right but much less forgiving when the model is way off. Taking squares also makes the mathematics more tractable. The best model now becomes the one which minimizes

$$\frac{1}{m}\sum_{i=1}^{m} \left(g(\hat{x}_i) - \hat{y}_i\right)^2.$$

The above average is called the *test loss* of the model $g$. From our assumptions about the type of data we're modeling, we know that even the perfect function $g^*$ will occasionally differ from the output we're given. Thus, most of the time, we won't be able to make the test loss much smaller than $\sigma$, which is the expected test loss of $g^*$.
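To make the difference between the two measures concrete, here is a small sketch that computes both the average distance and the average squared distance on a held-out set. The model `g` and the three held-out points are made up for illustration.

```python
def mean_absolute_error(g, xs, ys):
    """Average distance |g(x) - y| over the held-out points."""
    return sum(abs(g(x) - y) for x, y in zip(xs, ys)) / len(xs)

def mean_squared_error(g, xs, ys):
    """Average squared distance (g(x) - y)**2, i.e. the test loss."""
    return sum((g(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def g(x):
    # Hypothetical model used only for this example.
    return x

# Made-up held-out points: two near misses and one large miss.
held_out_x = [1.0, 2.0, 3.0]
held_out_y = [1.1, 1.9, 6.0]

abs_loss = mean_absolute_error(g, held_out_x, held_out_y)
sq_loss = mean_squared_error(g, held_out_x, held_out_y)
print(abs_loss, sq_loss)
```

The two near misses contribute 0.2 of the 3.2 total absolute error but only 0.02 of the 9.02 total squared error, so the squared measure is dominated by the one large miss.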

Using the test loss to measure performance has one clear limitation: it requires a second batch of data to test our models. What do we do if we only have one batch? One solution is to divide our batch in two and keep some data to the side to use to test models. Another solution is to try to estimate the test loss. It turns out that complexity penalties naturally arise when exploring this second solution.

Training loss

One way to try to estimate the test loss is to look at how well our model matches the data we've seen so far. This gives rise to the *training loss*, which is defined as

$$\frac{1}{n}\sum_{i=1}^{n} \left(g(x_i) - y_i\right)^2.$$

Note that for the training loss we're using the original data points to test the performance of our model. This makes the training loss easy to calculate and easy to minimize within the class $G_p$ (the set of all polynomials of degree at most $p$). Here is a plot of some of the polynomials of a fixed degree that minimize the training loss. The purple polynomial has degree 1, the green polynomial has degree 2 and the black polynomial has degree 15.
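Minimizing the training loss over polynomials of degree at most p is an ordinary least-squares problem. The sketch below is one way to do it using only the standard library, by solving the normal equations with Gaussian elimination; the function names and the noise-free example data are assumptions for illustration, and it assumes there are at least p + 1 distinct inputs. In practice a numerical library would usually be used instead.

```python
def fit_polynomial(xs, ys, p):
    """Return [a_0, ..., a_p] minimizing the training loss over degree <= p.

    Solves the normal equations (V^T V) a = V^T y, where V is the
    Vandermonde matrix with entries x_i ** j.
    """
    k = p + 1
    # Augmented matrix [V^T V | V^T y].
    A = [[sum(x ** (r + c) for x in xs) for c in range(k)]
         + [sum(y * x ** r for x, y in zip(xs, ys))]
         for r in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coeffs = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(A[r][c] * coeffs[c] for c in range(r + 1, k))
        coeffs[r] = (A[r][k] - tail) / A[r][r]
    return coeffs

def training_loss(coeffs, xs, ys):
    """Average squared error of the polynomial on the data it was fitted to."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(a * x ** j for j, a in enumerate(coeffs))
        total += (prediction - y) ** 2
    return total / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]  # made-up, noise-free quadratic

line = fit_polynomial(xs, ys, 1)
quad = fit_polynomial(xs, ys, 2)
print(training_loss(line, xs, ys), training_loss(quad, xs, ys))
```

On this data the degree-2 fit recovers the coefficients 1, 2, 3 up to rounding and has a training loss of essentially zero, while the best straight line cannot match the curvature and has a much larger training loss.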

Since the training loss only uses the old data, it doesn't tell us much about how the model will perform on new data. For example, while the degree-15 polynomial matches the above initial data very well, it is overfitting. The degree-15 polynomial does a poor job of matching some new independent data, as shown below.

In general, we'd expect the training loss to be much smaller than the test loss. This is because the model has already been calibrated to the original data. Indeed, if we were using polynomials of degree $p \ge n-1$ we would be able to find a model that passes through every data point $(x_i, y_i)$. Such a model would have a training loss of 0 but wouldn't generalize well to new data and would have a high test loss.
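The claim about interpolation can be checked directly. The sketch below builds the degree n − 1 polynomial through n points using Lagrange interpolation, which is a swapped-in construction rather than the least-squares fit the post uses, and confirms that its training loss is zero. The data values are made up, and the inputs are assumed to be distinct.

```python
def lagrange_interpolant(xs, ys):
    """Return the polynomial of degree <= n-1 through all (x_i, y_i).

    Assumes the x_i are distinct.
    """
    def g(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # This factor is 1 at x = xi and 0 at x = xj.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return g

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 0.2, 1.5, 0.1, 1.2]  # made-up, noisy-looking outputs

g = lagrange_interpolant(xs, ys)
training_loss = sum((g(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(training_loss)
```

A training loss of zero here says nothing about how the curve behaves between or beyond the data points, which is exactly why it is a poor guide to the test loss.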

Approximating the test loss

Thus it seems that the training loss won't be the most informative or useful way of estimating the test loss. However, the training loss is salvageable; we just need to add an extra term that makes up for how optimistic the training loss is. Note that we can write the test loss as

$$\text{Test Loss} = \text{Training Loss} + (\text{Test Loss} - \text{Training Loss}).$$

Thus the difference between the test loss and the training loss of a model gives us a way of quantifying how much the model is overfitting the training data. If we can estimate this difference, we'll be able to add it to the training loss to get an estimate of the test loss and evaluate the performance of our model!

It turns out that in our particular case estimating this difference isn't too tricky. Suppose that we have a model $g$ in the class $G_p$ (that is, $g$ is a polynomial of degree at most $p$) that was fitted by minimizing the training loss. Then for high values of $n$ (that is, when we have lots of data), we have the following approximation

$$\text{Test Loss} - \text{Training Loss} \approx \frac{2\sigma(p+1)}{n}.$$

Rewriting this we have

$$\text{Test Loss} \approx \text{Training Loss} + \frac{2\sigma(p+1)}{n}.$$

Thus we can measure the performance of our models by calculating the training loss and then adding $\frac{2\sigma(p+1)}{n}$. This number is our complexity penalty, as it increases as the complexity parameter $p$ increases. It also increases with $\sigma$, the variance of the errors. Thus the more noisy our data is, the more likely we are to overfit. Also, the penalty decreases with $n$, the number of data points. This suggests that if we have enough data we can get away with using quite complicated models without worrying about overfitting. This is because with enough data the true relationship between the inputs and outputs will become very clear and even a quite complicated model mightn't overfit it.
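The scaling behaviour of the penalty is easy to see numerically. This is a minimal sketch of the formula 2σ(p+1)/n with made-up values of p, σ and n; the function name is illustrative.

```python
def complexity_penalty(p, sigma, n):
    """Penalty 2 * sigma * (p + 1) / n, where sigma is the noise variance."""
    return 2.0 * sigma * (p + 1) / n

small_model = complexity_penalty(p=2, sigma=0.5, n=30)
big_model = complexity_penalty(p=15, sigma=0.5, n=30)
big_model_more_data = complexity_penalty(p=15, sigma=0.5, n=3000)

print(small_model)          # 0.1
print(big_model)
print(big_model_more_data)
```

With 30 data points the degree-15 class pays a penalty more than five times that of the degree-2 class, but with 3000 points the same degree-15 penalty shrinks by a factor of 100.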

One last interesting observation about this complexity penalty is the way it depends on a given model. Recall that a model $g$ is a polynomial of degree $p$ and that $\sigma$ and $n$ are parameters for the whole statistical learning problem. Thus the above complexity penalty depends on $g$ only via the degree of $g$. This gives us the following tractable way of finding the best model. For each $p$ we can find the polynomial of degree $p$ that minimizes the training loss and record the training loss it achieves. We can then compare polynomials of different degrees by adding the complexity penalty $\frac{2\sigma(p+1)}{n}$ to the training loss. We can then choose the best model based on which $p$ minimizes the sum of the training loss and complexity penalty. The only downside to this method is that $\sigma$ is an unknown quantity, but hopefully some heuristics can be used to estimate it.
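The selection procedure described above can be sketched as follows. The function takes the best training loss already achieved for each degree and returns the degree with the smallest penalized estimate. The training losses, σ and n below are made-up numbers for illustration, not results from the post, and σ is treated as known.

```python
def select_degree(training_losses, sigma, n):
    """Pick the degree p minimizing training loss + 2*sigma*(p+1)/n.

    training_losses maps each degree p to the smallest training loss
    achieved by a polynomial of that degree.
    """
    estimates = {
        p: loss + 2.0 * sigma * (p + 1) / n
        for p, loss in training_losses.items()
    }
    best_p = min(estimates, key=estimates.get)
    return best_p, estimates

# Hypothetical training losses: they can only go down as p grows.
training_losses = {0: 2.10, 1: 0.95, 2: 0.30, 3: 0.29, 4: 0.28}

best_p, estimates = select_degree(training_losses, sigma=0.3, n=30)
print(best_p)
print(estimates)
```

Here the training loss alone would favour degree 4, but beyond degree 2 each extra degree improves the fit by less than the penalty it adds, so degree 2 is selected.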

Below is a plot of the training loss, test loss and the approximation of the test loss from our example for different values of $p$. While the approximation isn't always exact, it follows the general trend of the test loss. Most importantly, both the test loss and the approximation have a minimum at $p=2$. This shows that using the approximation would let us select the best model, which in this case is a quadratic.

Other examples of complexity penalties

Complexity penalties can be found all over statistical learning. In other problems the above estimate can be harder to calculate. Thus complexity penalties are used in a more heuristic manner. This gives rise to techniques such as ridge regression, LASSO regression and kernel methods. Model complexity is again an important factor when training neural networks. The number of layers and the size of each layer are both complexity parameters and must be tuned to avoid overfitting.
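As one illustration of a heuristic penalty, ridge regression adds a multiple of the squared coefficients to the training objective. The sketch below uses the simplest possible case, a model y ≈ a·x with a single coefficient and no intercept, where minimizing Σ(y_i − a·x_i)² + λ·a² has the closed form a = Σ x_i·y_i / (Σ x_i² + λ). The function name, the data and the values of λ are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Minimizer of sum((y - a*x)**2) + lam * a**2 over the single coefficient a."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data lying exactly on y = 2x

print(ridge_slope(xs, ys, lam=0.0))   # 2.0
print(ridge_slope(xs, ys, lam=14.0))  # 1.0
```

With no penalty the fit recovers the slope exactly; increasing λ pulls the coefficient towards zero, trading some training loss for a simpler model.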

What makes the above example interesting is that the complexity penalty arose naturally out of trying to measure the performance of our model. It wasn't a heuristic but rather a proven formula guaranteed to provide a good estimate of the test loss. This in turn gives support to the heuristic complexity penalties used in situations when such proofs or formulas are more difficult to come by.

References

The ideas in this blog post are not my own and come from the AMSI Summer School course Mathematical Methods for Machine Learning taught by Zdravko Botev. The notes for the course will soon be published as a book by D. P. Kroese, Z. I. Botev, S. Vaisman and T. Taimre titled "Mathematical and Statistical Methods for Data Science and Machine Learning".

I made the plots myself but they are based off similar plots from the course. You can access the data set I made and used to make the plots here.

### How to stay concentrated for a long period of time?

I find it nearly impossible to focus on anything for more than two hours straight. Even if I enjoy an activity, I end up being distracted by a hell of a lot of things or just being too exhausted to continue doing it.

The situation only gets worse if I have a task that is not rewarding in the short term. Yet a lot of people I know can endure unpleasant tasks even for a few days (e.g. when they have to finish "that dull project" right before a deadline).

The question is, how can a person develop the ability to stay focused? What kinds of tricks can be used? Even if these tricks can be harmful to health, I'd like to hear about them anyway.

### Depression philosophizing

*Epistemic status: Weak confidence. Refine and dismantle as you see fit.*

For certain people, philosophical thinking is net harmful to their everyday life.

This should not be that surprising. Certain kinds of cognitive behavior do reliably lead to unhappiness, and there's no *a priori* reason to suppose explicit, logical thinking is somehow exempt from that risk. For many people, it appears that a stable sense of identity, purpose in life, and place in society are important factors in creating and maintaining happiness and contentment. Philosophical thinking often involves destabilizing those concepts.

I want to point to something I have noticed in myself, and suspect happens in others as well. I call it "depression philosophizing." You begin to think philosophically about your life, and slowly, maybe imperceptibly, you feel worse and worse about yourself, as you meditate on such concepts as morality, meaning, and ontology in an abstract sense.

It's tempting to unilaterally demonize depression philosophizing. But there is one big thing about it that stands out and that makes it so hard to quit doing - and that's that the quality of your intellectual rigor doesn't correlate very much with your emotional state. Plenty of great philosophers were miserable people (Wittgenstein, Nietzsche, Schopenhauer).

I think rationalists are likely to fall prey to this trap. As a group, we have a revealed preference towards abstract thinking and philosophy. Some of our folk heroes appear unusually good at facing philosophical problems without letting it get to them or divert them from their goals - Nate Soares pops to mind for me.

I don't have a good answer for how to combat this beyond the usual mechanisms used to treat depression. But I've had some success at simply reminding myself that even the act of stopping and thinking has an opportunity cost to it - it's not actually a very wise move to devote large amounts of time running your brain in circles around a tempting philosophical issue when you know people have tried and failed to answer it conclusively for thousands of years. Sometimes I even accept the maxim "ignorance is bliss", in this small domain of human experience. These experiences remind me strongly of cognitive-behavioral therapeutic techniques. I hope this helps someone else who grapples with depression philosophizing to start reclaiming ground from their own disastrously clever mind.

### How to notice being mind-hacked

*Epistemic status: quite sure, but likely nothing new, I have not done the requisite literature search.*

The human mind is not designed with security in mind. It has some defenses against basic adversaries that would have prevented our survival as a species, but not much more than that. It is also necessarily open to external influences because humans are social animals and cooperation is essential for survival. So, any security expert would be horrified at how vulnerable to adversarial mind hacking humans are. Humans generally do not like to accept how easy we are to sway, and how often it happens to us, but we can definitely see other people being easily influenced, and most of us aren't special in terms of mind security.

Another common term for it is "manipulation," but there is a slight difference. Manipulation generally presumes that the interests of the manipulator are detrimental to the mind being manipulated. Mind hacking does not have to have this negative connotation.

So, given that our minds are security sieves, that we live in a world where influencing others (yet another term for mind hacking) is routine, and that we have certainly been mind-hacked by others over and over again, how does one notice a hack (an unauthorized breach of mind security), whether it is about to happen, in progress, or already done? I am limiting the scope to just noticing. I am not implying that one has to try to stop a mind hack in preparation or in progress, or to undo it after it has happened. Descriptive, not prescriptive.

Let's start with a few obvious examples.

**Your friend, noticing your distress, invites you to their church event**, just to get your mind off things. A month later, you have converted to their faith, quote scriptures, believe in salvation and dedicate your life to spreading the gospel.

**Or you come across a book**, say, HPMoR or From AI to Zombies (I take partial credit/blame for the latter name), learn about rationality, get blown away by Eliezer's genius, and, next thing you know, you are at a local x-risk meetup worrying about an unaligned AI accidentally paper-clipping the universe and donating 10% of your income to an EA cause.

**Or you pick up** a Siouxsie and the Banshees CD in a record store (back when CDs and record stores were a thing), and soon you are a part of the goth subculture, deathhawk up every weekend, your carefully crafted Rihanna mixtape (another anachronism) gathering dust in the back of the bottom drawer.

**Or maybe you end up at a kink munch**, seemingly out of idle curiosity, then at a play party, then you discover your submissive side, end up dumping your vanilla partner and go on a sub frenzy and eventually settle as a slave to a Master/Mistress.

Not all mind hacks are as striking. But these somewhat extreme, yet also mainstream, examples are a good place to start the analysis. Some salient features:

- A glaring chasm between your identity before and after the event.
- Acceptance of your current identity and thinking of yourself before the event as immature/naive/stupid/unenlightened.
- Realization that the you before the event would likely be similarly disapproving of the change that transpired and would have prevented it if they could anticipate it.
- [What else?]

The above suggests how to notice the event *post hoc* (*post hack*?). The identity disconnect and the feelings around it are a telltale sign.

Noticing a hacking attempt or a hack in progress is probably harder. When skillfully executed, it never rises to the conscious level. You don't necessarily consciously notice your identity changing. Instead, you may be swept up in feelings of insight, being wowed, enlightened, or the opposite, intense guilt, shame and remorse, and often some combination of both. And even if we do recognize it for what it is, these same intense feelings can be too addictive to break the spell, and we can crave them more and more, and rationalize away what is happening. So, to provisionally answer the title non-question, watch out for the mind-hack-associated feelings.

What have been your experiences with noticing being mind hacked, intentionally or accidentally, or with doing it to others, whether on purpose or not?

### How does Gradient Descent Interact with Goodhart?

I am confused about how gradient descent (and other forms of local search) interact with Goodhart's law. I often use a simple proxy of "sample points until I get one with a large
Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face 
{font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} U value" or "sample n points, and take the one with the largest U value" when I think about what it means to optimize something for U. I might even say something like "n bits of optimization" to refer to sampling 2n points. 
I think this is not a very good proxy for what most forms of optimization look like, and this question is trying to get at understanding the difference.

(Alternatively, maybe sampling is a good proxy for many forms of optimization, and this question will help us understand why, so we can think about converting an arbitrary optimization process into a certain number of bits of optimization, and comparing different forms of optimization directly.)
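To make the sampling proxy concrete, here is a minimal sketch of "n bits of optimization" read as "draw $2^n$ points and keep the one with the largest U value". The proxy `u` and the base distribution in `sample` are hypothetical choices made only for illustration; nothing in the post fixes them.

```python
import random

def best_of(u, sample, n_bits, rng):
    """Draw 2**n_bits points with `sample` and return the one with the largest U value."""
    candidates = [sample(rng) for _ in range(2 ** n_bits)]
    return max(candidates, key=u)

def u(x):
    # Hypothetical proxy, chosen only for illustration: peaks at x = 3.
    return -(x - 3.0) ** 2

def sample(rng):
    # Hypothetical base distribution: standard normal.
    return rng.gauss(0.0, 1.0)

rng = random.Random(0)
for n_bits in (0, 2, 4, 8):
    x = best_of(u, sample, n_bits, rng)
    print(n_bits, round(x, 3), round(u(x), 3))
```

Each extra bit doubles the number of draws, so under this reading the amount of optimization is a property of the sample count alone, independent of how `u` is shaped locally. That is exactly the feature local search does not share.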

One reason I care about this is that I am concerned about approaches to AI safety that involve modeling humans to try to learn human value. One reason for this concern is that I think it would be nice to be able to save human approval as a test set. Consider the following two procedures:

A) Use some fancy AI system to create a rocket design, optimizing according to some specifications that we write down, and then sample rocket designs output by this system until you find one that a human approves of.

B) Generate a very accurate model of a human. Use some fancy AI system to create a rocket design, optimizing simultaneously according to some specifications that we write down and approval according to the accurate human model. Then sample rocket designs output by this system until you find one that a human approves of.

I am more concerned about the second procedure, because I am worried that the fancy AI system might use a method of optimizing for human approval that Goodharts away the connection between human approval and human value. (In addition to the more benign failure mode of Goodharting away the connection between true human approval and the approval of the accurate model.)

It is possible that I am wrong about this, and that I am failing to see just how unsafe procedure A is because I am failing to imagine the vast number of rocket designs one would have to sample before finding one that is approved. Still, I think maybe procedure B is actually worse (or worse in some ways). My intuition here is saying something like: "Human approval is a good proxy for human value when sampling (even large numbers of) inputs/plans, but a bad proxy for human value when choosing inputs/plans that were optimized via local search. Local search will find ways to hack the human approval while having little effect on the true value." The existence of adversarial examples for many systems makes me feel especially worried. I might find the answer to this question valuable in thinking about how comfortable I am with superhuman human modeling.

Another reason why I am curious about this is that I think maybe understanding how different forms of optimization interact with Goodhart can help me develop a suitable replacement for "sample points until I get one with a large U value" when trying to do high-level reasoning about what optimization will look like. Further, this replacement might suggest a way to measure how much optimization happened in a system.

Here is a proposed experiment (or class of experiments) for investigating how gradient descent interacts with Goodhart's law. You might want to preregister predictions on how experiments of this form might go before reading comments.

Proposed Experiment:

1. Generate a true function $V: \mathbb{R}^n \to \mathbb{R}$. (For example, you can write down a function explicitly, or generate a random function by randomly initializing a neural net, or by training a neural net on random data.)

2. Generate a proxy function $U: \mathbb{R}^n \to \mathbb{R}$, which can be interpreted as a proxy for $V$. (For example, you can generate a random noise function $W$ and let $U = V + W$, or you can train a neural net to try to copy $V$.)

3. Fix some initial distribution $\mu$ on $\mathbb{R}^n$, which will represent random sampling. (For example, the normal distribution.)

4. Define from $\mu$ some other distribution $\hat{\mu}$, which can be interpreted as sampling points according to $\mu$, then performing some kind of local optimization according to $U$. (For example, take a point $x$ according to $\mu$, then perform $k$ steps of gradient ascent on $U$; or take a point $x$ according to $\mu$, sample $k$ more points all within distance $\varepsilon$ of $x$, and take the one with the highest $U$ value.)

5. Screen off the proxy value by conditioning points sampled from $\mu$ and $\hat{\mu}$ to be in a narrow high band of proxy values, and compare the corresponding distributions on true values. (For example, is $\mathbb{E}(V(x) \mid U(x) \in (y, y+\varepsilon))$ greater when $x$ is sampled from $\mu$ or $\hat{\mu}$?)

So, after conditioning on having a high proxy value, $\mu$ represents getting that high proxy value via sampling randomly until you find one, while $\hat{\mu}$ represents a combination of random sampling with some form of local search. If $\mu$ does better according to the true value, this would imply that the optimization via gradient descent respects the true value less than random sampling.
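The five steps above can be sketched in a few lines of standard-library Python. This is only one point in the space of possible experiments, in one dimension, and every concrete choice is an assumption made for illustration: the true function `v`, the deterministic "noise" `w`, the step size `lr`, the step count `k`, and the band `(y, y + eps)` are all hypothetical. No particular outcome is implied; the comparison is expected to change as these choices change.

```python
import math
import random

def v(x):
    # Step 1: true function V (hypothetical choice).
    return -x * x

def w(x):
    # Proxy error W (hypothetical choice): small, fast-varying, deterministic.
    return 0.5 * math.sin(8.0 * x)

def u(x):
    # Step 2: proxy U = V + W.
    return v(x) + w(x)

def grad_u(x, h=1e-5):
    # Central finite-difference estimate of dU/dx.
    return (u(x + h) - u(x - h)) / (2.0 * h)

def sample_mu(rng):
    # Step 3: base distribution mu, here a standard normal.
    return rng.gauss(0.0, 1.0)

def sample_mu_hat(rng, k=5, lr=0.01):
    # Step 4: draw from mu, then take k gradient-ascent steps on U.
    x = sample_mu(rng)
    for _ in range(k):
        x += lr * grad_u(x)
    return x

def mean_true_value_in_band(sampler, rng, y, eps, n_samples):
    """Step 5: estimate E(V(x) | U(x) in (y, y + eps)) by rejection.

    Returns (mean of V over accepted points, number of accepted points).
    """
    kept = [v(x) for x in (sampler(rng) for _ in range(n_samples))
            if y < u(x) < y + eps]
    mean = sum(kept) / len(kept) if kept else float("nan")
    return mean, len(kept)

rng = random.Random(0)
y, eps = 0.3, 0.2
for name, sampler in (("mu", sample_mu), ("mu_hat", sample_mu_hat)):
    mean_v, count = mean_true_value_in_band(sampler, rng, y, eps, 20000)
    print(f"{name}: E(V | U in band) ~ {mean_v:.4f} from {count} accepted samples")
```

Swapping `sample_mu_hat` for the other local search mentioned in step 4 (sample `k` nearby points and keep the best by `u`), or replacing `w` with a learned model's error, only requires changing the corresponding function.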

There are many degrees of freedom in the procedure I describe above, and even more degrees of freedom in the space of procedures that do not exactly fit the description above, but still get at the general question. I expect the answer will depend heavily on how these choices are made. The real goal is not to get a binary answer, but to develop an understanding of how (and why) the various choices affect how much better or worse local search Goodharts relative to random sampling.

I am asking this question because I want to know the answer, but (maybe due to the experimental nature) it also seems relatively approachable as far as AI safety questions go, so some people might want to try to do these experiments themselves, or try to figure out how they could get an answer that would satisfy them. Also, note that the above procedure implies a very experimental way of approaching the question, which I think is partially appropriate, but it may be better to think about the problem in theory or in some combination of theory and experiments.

(Thanks to many people I talked with about ideas in this post over the last month: Abram Demski, Sam Eisenstat, Tsvi Benson-Tilsen, Nate Soares, Evan Hubinger, Peter Schmidt-Nielsen, Dylan Hadfield-Menell, David Krueger, Ramana Kumar, Smitha Milli, Andrew Critch, and many other people that I probably forgot to mention.)


### Reality and rational best practice

*This post is part of my Hazardous Guide To Rationality. I don't expect this to be new or exciting to frequent LW people, and I would super appreciate comments and feedback in light of intents for the sequence, as outlined in the above link.*

- The Simple Truth
- The shifting sands of belief
- Updating as the winds of evidence shift, not in begrudging jumps and jerks
- Why you don't need "certainty"
- ... and why it feels like you totally do need it.
- 0 and 1 aren't probabilities
- Fallacy of the grey
- Make your beliefs pay rent
- Reductionism
- Pole Vaulting over the Uncanny Valley of Bad Rationality
- Crash course in VNM rational agents (and why you aren't one)
- Rescuing the Utility function
- More from a "how not to personally fall into an existential funk" perspective.


### How the Social Affects your rationality

*This post is part of my Hazardous Guide To Rationality. I don't expect this to be new or exciting to frequent LW people, and I would super appreciate comments and feedback in light of intents for the sequence, as outlined in the above link.*

- Your brain is primarily designed for social reasoning, not reality reasoning.
- Posts: face of the ice, of two minds,
- Hansonian exploration of how our minds are structured to play politics.
- Most people don't actually need to be that attuned to reality to get by and live an okay/great life.
- Much of what people call debating or arguing about the truth is actually a coordination game where people are trying to make social outcomes happen by "logically trapping" others and "appealing to the reasonable person interface".
- It is possible for *YOU* to be trying to achieve a social outcome, yet think you are just talking about your beliefs about reality.
- Insert A Human's Guide to Words (but condensed)
- There are a lot of English sentences that feel like questions you can ask about the world, that are just nonsense
- It feels like words *mean* something, but they don't. It's people who mean things, and they happen to use words.
- Given that a lot of the people you will interact with won't have a deep understanding of all the ways their natural manner of thinking can lead to nonsense, how might you go about communicating with people?
- Double Crux


### A Crash Course in Your Brain

*This post is part of my Hazardous Guide To Rationality. I don't expect this to be new or exciting to frequent LW people, and I would super appreciate comments and feedback in light of intents for the sequence, as outlined in the above link.*

Talking about truth and reality can be hard. First, we're going to take a stroll through what we currently know about how the human mind works, and what the implications are for one's ability to be right.

*Outline of main ideas. Could be post per main bullet.*

- The Unconscious exists
- There is "more happening" in your brain than you are consciously aware of
- S1 / S2 introduction (research if I actually recommend Thinking Fast and Slow as the best intro)
- Confabulation is a thing
- You have an entire sub-module in your brain which is specialized for making up reasons for why you do things. Because of this, even if you ask yourself, "Why did I just tip over that vase?" and get a ready answer, it is hard to figure out if that is a true reason for your behavior.
- By default, thoughts feel like facts.
- The lower-level a thought produced by your brain, the less it feels like, "A thing I think which could be true or false" and the more it feels like, "The way the world obviously is, duh."
- Your intuitions do not have special magical access to the truth. They are sometimes wrong, and sometimes right. But unless you pay attention, you are likely to, *by default*, believe them to be completely correct.
- We are Predictably Wrong
- You do not automatically know what you believe is and is not true about the world
- You also have the ability to say "I believe XYZ" while having no meaningful/consequential relations of XYZ to the rest of your world model. You can also not notice that this is the case.
- Luckily, you do still have some non-zero ability to have anticipation/expectations about reality, and have world models/beliefs.
- When beliefs are secretly decisions, not models.


### Super-Human Feedback

I've taken to calling Debate, Amplification, and Recursive Reward Modeling **"Super-human feedback" (SHF)** techniques. The point of this post is just to introduce that terminology and explain a bit why I like it and what I mean by it.

By calling something SHF I mean that it aims to outperform a single, unaided human H at the task of providing feedback about H's intentions for training an AI system. I like thinking of it this way, because I think it makes it clear that these three approaches are naturally grouped together like this, and might inspire us to consider what else could fall into that category (a simple example is just using a team of humans).

I think this is very similar to "scalable oversight" (as discussed in Concrete Problems), but maybe different because:

1) It doesn't imply that the approach must be scalable

2) It doesn't require that feedback is expensive, i.e. it applies to things where human feedback is cheap, but we can do better than the cheap human feedback with SHF.


### What kind of information would serve as the best evidence for resolving the debate of whether a centrist or leftist Democratic nominee is likelier to take the White House in 2020?

The public debate over who should be the Democratic Party presidential candidate for the 2020 electoral race hinges on whether the Dems are likelier to win by nominating a progressive or socialist, or by nominating a centrist or moderate. Assuming this is a problem one wanted to approach epistemically (as opposed to just viewing the debate as a power struggle), we should look for facts of the matter about whether, overall, Americans are likelier to elect a progressive/socialist or a moderate/centrist to the Presidency in 2020. Unfortunately, there is no apparent consensus about how to gain traction on this problem.

- I have read a variety of sources for months now: left-leaning media outlets give all the usual arguments for why the Dems must nominate Sanders or another progressive to win, and all the more moderate media outlets say the exact opposite. Ideally, some people from both sides would come together for an adversarial collaboration, but it doesn't appear either side can even agree on what would count as good or bad evidence for their positions. In other words, most of the time both sides of this issue are playing reference class tennis, and neither side is aware this is what they're doing.
- On social media I started a discussion on this topic among some rationalists and effective altruists to see if it would offer hope of gaining traction on this issue. None of those who had a strong opinion on this issue could or would back it up at length, which I'm assuming means they don't have strong, asymmetrical evidence (at least not in the eyes of the opposition) for their position. It's my expectation that two rationalists *should* be aware when they're playing a game of reference class tennis. If on this topic they are aware, it doesn't appear rationalists know how to get out of this failure mode better than anyone else.
- My search of last resort, at least under the streetlight (*read:* my attempt to answer this question without doing any work myself), was to check PredictIt. As of this posting, the top 6 contenders have been assigned between a 9% chance and 23% chance of winning the Democratic nomination. In roughly the same rank order for chances of winning the Presidency overall, those same 6 prospective Democratic nominees are currently receiving an assignment of between a 6% and 19% chance of winning. These 6 contenders run the gamut from Beto O'Rourke and Joe Biden, commonly regarded as the most moderate/centrist prospective Dem nominees, to Bernie Sanders, whose policy platform is generally furthest to the left among the lot of them. Leading the pack is Kamala Harris, who has announced her presidential run and is perceived as somewhere between centrist Dems and democratic socialists like Sanders, with Sanders and Biden tied for 2nd (both are yet to publicly declare if they'll be running in 2020). PredictIt's predictions for which candidate will fare best are strongly correlated with who is regarded as being the strongest contender as a personality. They appear to have little to no correlation with the candidate's perceived ideology. The high level of apparent uncertainty in the predictions is likely an artifact of it being so early in the presidential race, as opposed to the prediction markets pricing in how likely the candidates' policy platforms are to appeal to the median voter.

**In other words**, if there is solid info out there which could best predict whether, in general, a leftist or centrist Democrat is likelier to win the Democratic nomination and/or the Presidency in 2020, *it doesn't look like this has been priced into PredictIt's predictions yet*. So to answer this question, we might have to find info prediction markets haven't yet, or that they may have at least overlooked.

*Suggestions for how to answer this question:*

- I'm not looking for the kind of evidence the public, for whatever value of "the public," would necessarily agree would resolve this debate. I'm looking to answer this question for my own curiosity. I.e., if you think people are generally too mind-killed on this subject to notice good evidence even when it's staring them in the face, don't worry about that. Please optimize for sending me what you consider the best empirical info on this subject, without consideration for how anyone else might take it.
- Uncovering the relevant info could be easier than it appears. Maybe the data I'm looking for isn't a needle in the haystack. For all I know there is some excellent poll that perfectly answers this question, but nobody in the media cited it because the results didn't make for a sexy clickbait headline.
