LessWrong.com News

A community blog devoted to refining the art of rationality

A Concrete Proposal for Adversarial IDA

March 26, 2019 - 22:50
Published on March 26, 2019 7:50 PM UTC

Note: This post came out of a conversation with Geoffrey Irving and Buck Shlegeris.

Epistemic Status: I suspect Paul has already thought of most or all of the ideas presented here, though I nevertheless found the exercise of carefully specifying an IDA implementation helpful and suspect others may find reading it helpful as well.

This is a proposal for how to train a machine learning model to approximate HCH using Iterated Distillation and Amplification (IDA). This particular proposal came out of a desire to use a debate-like adversary to improve the amplification process, and the primary goal of this proposal is to show how one could do that. Though I have tried to retain a lot of the relevant detail, I have made two simplifications to make this proposal easier to specify: I am attempting to approximate something closer to weak HCH rather than strong HCH and I am only allowing the generation of two subquestions at a time. I am confident that those simplifications could easily be dropped, though I think doing so here would only make this presentation more complicated.

Before I proceed, I want to make one final note: this is not a proposal for how to build an aligned AGI. I think there are still a whole bunch of issues that would prevent this proposal from actually working.


We will start with some initial definitions:

  • Let $\mathcal{Q}$ be the set of all questions in natural language.
  • Let $\mathcal{A}$ be the set of all answers in natural language.
  • Let $\mathcal{M} = (\mathcal{Q} \times \mathcal{Q}) + \mathcal{A}$ be the sum type representing either two subquestions to help answer the given question or an answer to it.
  • Let $H: \mathcal{Q} \to \mathcal{A}$ be the answer that a human gives to the given question.
  • Let $H_{\text{fan out}}: \mathcal{Q} \to \mathcal{M}$ be the answer or subquestion pair generated by a human when asked what to do with the given question.
  • Let $H_{\text{fan in}}: \mathcal{Q} \times (\mathcal{Q} \times \mathcal{A}) \times (\mathcal{Q} \times \mathcal{A}) \to \mathcal{M}$ be the answer or two subquestions generated by a human for some question when given answers to two subquestions related to that question.
  • Let $ML: \mathcal{Q} \to \Delta(\mathcal{A})$ be a model (whose training procedure we describe below) from questions to a probability distribution over strings representing answers. Specifically, we implement the probability distribution by having the model output an embedding vector that yields the distribution when fed into some trained language model (by repeatedly conditioning on previous characters and multiplying all the conditional probabilities).
  • Let $ML_{\text{fan out}}: \mathcal{Q} \to \Delta(\mathcal{M})$ be the function from questions to an embedding vector representing a distribution over answers or subquestion pairs, generated by asking $ML$ what to do with the given question. In the two-subquestion case we enforce that the distribution is symmetric with respect to interchanging the subquestions.
  • Let $ML_{\text{fan in}}: \mathcal{Q} \times (\mathcal{Q} \times \mathcal{A}) \times (\mathcal{Q} \times \mathcal{A}) \to \Delta(\mathcal{M})$ be the function from a question and two subquestion/answer pairs to an embedding vector representing a distribution over answers or subquestion pairs, generated by asking $ML$ to integrate the given subquestion answers into an answer to the original question. We again enforce symmetry with respect to interchanging the subquestions.
  • Let $\text{Adv}: \mathcal{Q} \times \mathcal{A} \to \mathbb{R}$ be an adversary model, which we train (as described below) to predict how good the given answer is to the given question.
  • Let $\text{Adv}_{\text{fan out}}: \mathcal{Q} \times \mathcal{M} \to \mathbb{R}$ be an adversary model for $ML_{\text{fan out}}$, generated by calling $\text{Adv}$ using the same transformation used to implement $ML_{\text{fan out}}$ from $ML$.
  • Let $\text{Adv}_{\text{fan in}}: \mathcal{Q} \times (\mathcal{Q} \times \mathcal{A}) \times (\mathcal{Q} \times \mathcal{A}) \times \mathcal{M} \to \mathbb{R}$ be an adversary model for $ML_{\text{fan in}}$, generated by calling $\text{Adv}$ using the same transformation used to implement $ML_{\text{fan in}}$ from $ML$.
  • Let $\epsilon$ be a Gumbel random variable such that each use of $\epsilon$ below is its own independent sample from a Gumbel distribution. We use a Gumbel random variable so that we can use the Gumbel-max trick to effectively do Boltzmann exploration.
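
The Gumbel-max trick mentioned in the last definition can be illustrated with a small standard-library sketch: adding an independent standard Gumbel sample to each score and taking the argmax picks index $i$ with probability $\mathrm{softmax}(\text{scores})_i$, which is Boltzmann exploration. The function names below are hypothetical, and the temperature parameter is an addition for illustration (the proposal uses plain $\epsilon$, i.e. temperature 1).

```python
from __future__ import annotations

import math
import random


def gumbel() -> float:
    """Draw one independent sample from the standard Gumbel(0, 1) distribution."""
    u = random.random()  # uniform on [0.0, 1.0)
    while u == 0.0:  # avoid log(0)
        u = random.random()
    return -math.log(-math.log(u))


def gumbel_argmax(scores: list[float], temperature: float = 1.0) -> int:
    """Index of the largest noisy score; each entry gets its own fresh Gumbel sample."""
    noisy = [s / temperature + gumbel() for s in scores]
    return max(range(len(scores)), key=lambda i: noisy[i])


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """The Boltzmann distribution that gumbel_argmax samples from."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Over many draws, the empirical frequencies of `gumbel_argmax(scores)` approach `softmax(scores)`. For the two-way comparison in step 4 of the amplification procedure, where the subquestion with the lower $S_i + \epsilon$ is chosen, the choice probabilities are the same as applying `gumbel_argmax` to the negated scores, since the difference of two independent Gumbel samples is symmetric.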

The goal here is to use the adversary to determine which portions of the HCH tree we actually go down, so as to maximize our sample efficiency with respect to $H$. Specifically, the amplification procedure is as follows:

1. Generate some question $Q \in \mathcal{Q}$, either from the training distribution or from the output of some adversary.

2. Call $H_{\text{fan out}}(Q)$. If we get an answer, let $A = H_{\text{fan out}}(Q)$ and skip to the end. Otherwise, let $SQ_1, SQ_2 = H_{\text{fan out}}(Q)$.

3. Call $\text{Adv}$ to generate scores $S_1 = \text{Adv}(SQ_1, ML(SQ_1))$ and $S_2 = \text{Adv}(SQ_2, ML(SQ_2))$.

4. Without loss of generality, let $S_1 + \epsilon \le S_2 + \epsilon$. Recurse on $SQ_1$ (by setting $Q = SQ_1$ in the subcall) to generate $A_1$ (the $A$ of the subcall), and let $A_2 = ML(SQ_2)$.

5. Call $H_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$. If we get an answer, let $A = H_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$. Otherwise, let $SQ_1, SQ_2 = H_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$ and return to step 3.
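
The amplification steps above can be sketched as a recursive function. This is a minimal sketch under several assumptions that are not part of the proposal: $ML$ is treated as a sampler that returns one answer string rather than a distribution, an element of $\mathcal{M}$ is encoded as `("answer", a)` or `("split", (sq1, sq2))`, and every name, including the toy "sum a list" task used to exercise it, is hypothetical.

```python
from __future__ import annotations

import math
import random
from typing import Callable, List, Tuple, Union

Question = str
Answer = str
# Hypothetical encoding of M: ("answer", a) or ("split", (sq1, sq2)).
Move = Tuple[str, Union[Answer, Tuple[Question, Question]]]
Pair = Tuple[Question, Answer]


def gumbel() -> float:
    """One independent standard Gumbel sample (a fresh epsilon)."""
    u = random.random()
    while u == 0.0:  # avoid log(0)
        u = random.random()
    return -math.log(-math.log(u))


def amplify(q: Question,
            h_fan_out: Callable[[Question], Move],
            h_fan_in: Callable[[Question, Pair, Pair], Move],
            ml: Callable[[Question], Answer],
            adv: Callable[[Question, Answer], float],
            adv_log: List[Tuple[Question, float, Answer]]) -> Answer:
    # Step 2: ask the human what to do with q.
    kind, payload = h_fan_out(q)
    if kind == "answer":
        return payload
    sq1, sq2 = payload
    while True:
        # Step 3: the adversary scores the model's answers to both subquestions.
        s1, s2 = adv(sq1, ml(sq1)), adv(sq2, ml(sq2))
        # Step 4: relabel so SQ1 has the lower noisy score, then recurse on it only.
        if s1 + gumbel() > s2 + gumbel():
            sq1, sq2, s1, s2 = sq2, sq1, s2, s1
        a1 = amplify(sq1, h_fan_out, h_fan_in, ml, adv, adv_log)
        a2 = ml(sq2)
        # Record (SQ1, S1, A1): the ingredients of the adversary loss L_Adv.
        adv_log.append((sq1, s1, a1))
        # Step 5: ask the human to integrate the two answers.
        kind, payload = h_fan_in(q, (sq1, a1), (sq2, a2))
        if kind == "answer":
            return payload
        sq1, sq2 = payload  # new subquestions: back to step 3


# Hypothetical toy task: questions look like "sum:1,2,3" and the human halves the list.
def toy_h_fan_out(q: Question) -> Move:
    nums = q.split(":")[1].split(",")
    if len(nums) == 1:
        return ("answer", nums[0])
    mid = len(nums) // 2
    return ("split", ("sum:" + ",".join(nums[:mid]), "sum:" + ",".join(nums[mid:])))


def toy_h_fan_in(q: Question, p1: Pair, p2: Pair) -> Move:
    return ("answer", str(int(p1[1]) + int(p2[1])))


def toy_ml(q: Question) -> Answer:
    # Stands in for a sample from ML(q); in this toy it happens to be exact.
    return str(sum(int(x) for x in q.split(":")[1].split(",")))


def toy_adv(q: Question, a: Answer) -> float:
    return 0.0  # uninformative adversary: the choice in step 4 becomes a coin flip
```

With these stubs, `amplify("sum:1,2,3,4", toy_h_fan_out, toy_h_fan_in, toy_ml, toy_adv, log)` returns `"10"` and appends two entries to `log`, one per level of recursion. The human is only ever shown one branch at each level; the other branch is answered by the model.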

This procedure should eventually produce an answer, after which we compute the following losses for $ML$, $ML_{\text{fan out}}$, $ML_{\text{fan in}}$, $\text{Adv}$, $\text{Adv}_{\text{fan out}}$, and $\text{Adv}_{\text{fan in}}$:

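$$
\begin{aligned}
L &= -\log\big(ML(Q) \mid A\big) \\
L_{\text{fan out}} &= -\log\big(ML_{\text{fan out}}(Q) \mid H_{\text{fan out}}(Q)\big) \\
L_{\text{fan in}} &= -\log\big(ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2)) \mid H_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))\big) \\
L_{\text{Adv}} &= \big\lvert S_1 - \log\big(ML(SQ_1) \mid A_1\big) \big\rvert^2 \\
L_{\text{Adv}_{\text{fan out}}} &= \big\lvert \text{Adv}_{\text{fan out}}(Q, (SQ_1, SQ_2)) + L_{\text{fan out}} \big\rvert^2 \\
L_{\text{Adv}_{\text{fan in}}} &= \big\lvert \text{Adv}_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2), A) + L_{\text{fan in}} \big\rvert^2
\end{aligned}
$$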
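
To make the sign convention of these losses concrete, here is a small sketch in which $ML(Q)$ is represented, purely for illustration, as an explicit dictionary from answer strings to probabilities (the proposal instead decodes an embedding through a language model). It shows that the adversary is regressed onto the model's log-likelihood of the amplified answer, i.e. minus the corresponding negative log-likelihood, so a low score means the model is likely to disagree with the amplified answer. The names `nll` and `adv_loss` and the probability floor are hypothetical.

```python
from __future__ import annotations

import math
from typing import Dict

# Hypothetical representation of ML(Q): an explicit map from answers to probabilities.
Dist = Dict[str, float]


def nll(dist: Dist, target: str) -> float:
    """-log(ML(Q) | A): the model's negative log-probability of the target answer."""
    # Floor the probability so an unseen answer gives a large but finite loss.
    return -math.log(max(dist.get(target, 0.0), 1e-12))


def adv_loss(score: float, dist: Dist, target: str) -> float:
    """|S - log(ML(SQ) | A)|^2, written as (S + nll)^2 since log p = -nll."""
    return (score + nll(dist, target)) ** 2
```

For `dist = {"3": 0.5, "4": 0.25, "5": 0.25}`, `nll(dist, "4")` is $\log 4 \approx 1.386$, and an adversary that outputs $-\log 4$ on that subquestion incurs (up to rounding) zero loss. Because lower scores flag answers the model is likely getting wrong, step 4 recurses on the lower-scoring subquestion.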

Note that some of these losses are actually for the same model, since $ML_{\text{fan out}}$ and $ML_{\text{fan in}}$ are implemented internally by calling $ML$, and $\text{Adv}_{\text{fan out}}$ and $\text{Adv}_{\text{fan in}}$ are implemented internally by calling $\text{Adv}$.


For the distillation procedure, we want to improve ML by training it to approximate its own ML-consulting-ML tree. Specifically:

1. Generate some $Q \in \mathcal{Q}$ as in amplification.

2. Call $ML_{\text{fan out}}(Q)$. If we get an answer, let $A = ML_{\text{fan out}}(Q)$ and skip to the end. Otherwise, let $SQ_1, SQ_2 = ML_{\text{fan out}}(Q)$.

3. Let $A_1 = ML(SQ_1)$ and $A_2 = ML(SQ_2)$.

4. Call $ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$. If we get an answer, let $A = ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$. Otherwise, let $SQ_1, SQ_2 = ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$ and return to step 3.
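
The distillation steps above differ from amplification in that the model plays every role and there is no recursion: both subquestions are answered directly by $ML$. A minimal sketch under the same hypothetical encoding of $\mathcal{M}$ as `("answer", a)` / `("split", (sq1, sq2))`, with `ml` again treated as a sampler. The `max_rounds` guard is an added assumption, since the procedure as written places no bound on how many times step 4 can return to step 3.

```python
from __future__ import annotations

from typing import Callable, Tuple, Union

Question = str
Answer = str
# Hypothetical encoding of M: ("answer", a) or ("split", (sq1, sq2)).
Move = Tuple[str, Union[Answer, Tuple[Question, Question]]]
Pair = Tuple[Question, Answer]


def distill_target(q: Question,
                   ml: Callable[[Question], Answer],
                   ml_fan_out: Callable[[Question], Move],
                   ml_fan_in: Callable[[Question, Pair, Pair], Move],
                   max_rounds: int = 100) -> Answer:
    """Answer A from one level of the ML-consulting-ML tree; ML(Q) is then trained toward it."""
    # Step 2: ask ML what to do with q.
    kind, payload = ml_fan_out(q)
    if kind == "answer":
        return payload
    sq1, sq2 = payload
    for _ in range(max_rounds):  # added safeguard, not part of the procedure
        # Step 3: answer both subquestions directly with ML (no recursion).
        a1, a2 = ml(sq1), ml(sq2)
        # Step 4: ask ML to integrate the answers or to propose new subquestions.
        kind, payload = ml_fan_in(q, (sq1, a1), (sq2, a2))
        if kind == "answer":
            return payload
        sq1, sq2 = payload
    raise RuntimeError("no answer after max_rounds fan-in rounds")
```

The returned answer plays the role of $A$ in the distillation losses, and the pair `(q, distill_target(q, ...))` becomes a training example for $ML$.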

Then, compute the following losses for $ML$ and $\text{Adv}$:


Finally, we want to iterate the above procedure by replacing $H, H_{\text{fan out}}, H_{\text{fan in}}$ in the amplification procedure with some $H', H'_{\text{fan out}}, H'_{\text{fan in}}$. First, let $\alpha$ be some confidence threshold. Then, we define the following primed $H$s:

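  • Let $H'(Q) = \text{if } \text{Adv}(Q, ML(Q)) + \epsilon < \alpha + \epsilon \text{ then } H(Q) \text{ else } ML(Q)$.
  • Let $H'_{\text{fan out}}(Q) = \text{if } \text{Adv}_{\text{fan out}}(Q, ML_{\text{fan out}}(Q)) + \epsilon < \alpha + \epsilon \text{ then } H_{\text{fan out}}(Q) \text{ else } ML_{\text{fan out}}(Q)$.
  • Let $H'_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2)) = \text{if } \text{Adv}_{\text{fan in}}\big(Q, (SQ_1, A_1), (SQ_2, A_2), ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))\big) + \epsilon < \alpha + \epsilon \text{ then } H_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2)) \text{ else } ML_{\text{fan in}}(Q, (SQ_1, A_1), (SQ_2, A_2))$.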
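
All three primed definitions share one shape: compute the model's proposal, let the adversary score it, and consult the human only if the noisy score falls below the noisy threshold $\alpha$. The sketch below captures that shape with a single hypothetical helper, `make_primed`. It assumes the model is a sampler returning one candidate and that each adversary takes the same inputs as its model followed by the candidate, which matches the signatures of $\text{Adv}$, $\text{Adv}_{\text{fan out}}$, and $\text{Adv}_{\text{fan in}}$ defined earlier.

```python
from __future__ import annotations

import math
import random
from typing import Any, Callable


def gumbel() -> float:
    """One independent standard Gumbel sample (a fresh epsilon)."""
    u = random.random()
    while u == 0.0:  # avoid log(0)
        u = random.random()
    return -math.log(-math.log(u))


def make_primed(human: Callable[..., Any],
                model: Callable[..., Any],
                adv: Callable[..., float],
                alpha: float) -> Callable[..., Any]:
    """Build H' from (H, ML, Adv): defer to the human only when the adversary doubts the model."""

    def primed(*args: Any) -> Any:
        proposal = model(*args)  # one sample standing in for the model's distribution
        # Each epsilon is an independent Gumbel sample, as in the definitions above.
        if adv(*args, proposal) + gumbel() < alpha + gumbel():
            return human(*args)
        return proposal

    return primed
```

Under these assumptions, `make_primed(H, ML, Adv, alpha)`, `make_primed(H_fan_out, ML_fan_out, Adv_fan_out, alpha)`, and `make_primed(H_fan_in, ML_fan_in, Adv_fan_in, alpha)` play the roles of $H'$, $H'_{\text{fan out}}$, and $H'_{\text{fan in}}$. Raising $\alpha$ sends more queries to the human.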

This procedure allows us to continue amplifying the model while using the adversary to require only minimal human data that is selected so as to be maximally helpful.


This proposal differs in a couple of ways from previous proposals made by Paul. First, Paul has recently moved away from discrete amplification/distillation steps. This proposal, by contrast, provides a way to recover discrete steps while still collapsing the recursion. In practice, though, you might still want to stick with the amplification procedure described here and skip the distillation step, as it isn't strictly necessary.

Second, this proposal uses an adversary to guide the training process. This technique is similar to the concept of importance sampling. The main benefit of this approach is that it takes advantage of active learning by allowing the system to choose which questions and subquestions would be most useful for it to have answered by a human.

Another benefit of the adversary is that it could make transparency much easier. One of the major benefits of IDA is that ML gets trained to approximate its own ML-consulting-ML tree. As a result, the reasoning that went into the final answer produced by ML can be recovered by unfolding its tree (at least in the limit of perfect training). However, unfolding the entire tree is very expensive, since the cost is linear in the size of the tree. With an adversary, you can instead call the adversary to choose which portions of the tree to unfold first, enabling you to find errors much more quickly; for a perfect adversary, this reduces the cost of finding an error from $O(n)$ to $O(\log n)$.
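
The $O(\log n)$ claim can be illustrated with a toy: a complete binary tree in which exactly one leaf is faulty, and a "perfect adversary" that scores a subtree low if and only if it contains the fault. Descending into the lowest-scored child at each level reaches the fault with two adversary calls per level, while unfolding the whole tree touches every node. The tree, the fault marker, and the oracle adversary are all hypothetical stand-ins; the simulated adversary itself inspects the subtree, so only its call count, not its internal cost, models the real setting.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    faulty: bool = False  # hypothetical ground truth, not used by the search itself


def build(depth: int, faulty_leaf: int, start: int = 0) -> Node:
    """Complete binary tree with 2**depth leaves numbered from `start`; one leaf is faulty."""
    if depth == 0:
        return Node(f"leaf{start}", faulty=(start == faulty_leaf))
    half = 2 ** (depth - 1)
    return Node(f"node{start}d{depth}",
                [build(depth - 1, faulty_leaf, start),
                 build(depth - 1, faulty_leaf, start + half)])


def contains_fault(node: Node) -> bool:
    return node.faulty or any(contains_fault(c) for c in node.children)


def perfect_adv(node: Node) -> float:
    """Toy oracle adversary: low score iff the subtree contains the error."""
    return -1.0 if contains_fault(node) else 0.0


def count_nodes(node: Node) -> int:
    """Cost of unfolding the entire tree."""
    return 1 + sum(count_nodes(c) for c in node.children)


def find_error(root: Node, adv: Callable[[Node], float]) -> Tuple[Node, int]:
    """Follow the lowest-scored child down to a leaf; return it and the number of adversary calls."""
    node, calls = root, 0
    while node.children:
        scores = [adv(c) for c in node.children]
        calls += len(scores)
        node = node.children[scores.index(min(scores))]
    return node, calls
```

For `root = build(10, 777)`, `find_error(root, perfect_adv)` reaches `leaf777` after 20 adversary calls, whereas `count_nodes(root)` is 2047: the guided search grows with the depth of the tree, and full unfolding grows with its size.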

Thus, the hope is that the use of such an adversary could assist both in making IDA more competitive (by increasing sample efficiency and using active learning) and in making IDA safer (due to the increased ease of transparency).

It should be noted, however, that the use of such an adversary might also make the safety situation for IDA worse. First, it introduces the possibility of a robustness to relative scale failure if either ML or Adv gets significantly stronger than the other. One possible way to resolve such an issue might be to give Adv the ability to call ML and vice versa, allowing each to use the other to boost its own capabilities. Second, if ML and Adv are themselves optimizers whose goals don't perfectly match their loss functions, they could cooperate to make it arbitrarily unlikely that H is ever consulted on some specific question. Third, even if ML and Adv weren't cooperating, an RSA-2048-style failure could still prevent the identification of malicious cognition. Resolving failures of these last two types is still an open question (see "Inner Optimization," Hubinger, van Merwijk, Mikulik, Skalse, and Garrabrant, forthcoming).


Deck Guide: Biomancer’s Familiar

March 26, 2019 - 18:20
Published on March 26, 2019 3:20 PM UTC

Biomancer’s Familiar is a great card. I wanted it to happen so bad. I spent a substantial portion of my preparations for Cleveland trying to make Biomancer’s Familiar happen. I tried two-color versions. I tried three-color versions. I tried going big, going small, going wide, and everything else I could think of. I came close enough to consider buying the cards.

Ultimately, I could not make it happen. Blue was too strong and too structurally tough. You had to give up Pelt Collector, had to pay real mana for your spells. The format wanted different things than the Familiar deck could provide. Sideboarding wasn’t as impactful for you as I wanted. The deck was good. It was fun as hell. But not good enough. I switched to blue, tore up the ladder with it, and never looked back. Things didn’t work out at the Pro Tour, but given the overall results, I am confident I made the right decision.

There are three reasons to share the deck now.

The first reason is, as noted, that the deck is great fun. As constructed, it’s a step behind where it needs to be to win major tournaments, but it’s still good enough to get five wins more often than not in Traditional Constructed. Its best draws bury people if not answered, and it gets them often.

The second reason is that perhaps I missed something. There’s a lot of good things going on, so the deck might be one card, or one idea or sideboard plan, away from competitive. That might be a two color or three color build. It might involve a card from War of the Spark.

The third reason is that now that I write it out, I think this deck plays fine against what’s out there right now. My results weren’t as good as with blue, but perhaps things have changed.

Here is the strongest version of the deck:

3 Adventurous Impulse

2 Quench

2 Negate


4 Llanowar Elves

4 Biomancer’s Familiar

4 Growth-Chamber Guardian

4 Incubation Druid

2 Druid of the Cowl

4 Sphinx of Foresight

4 Frilled Mystic

4 Biogenic Ooze

10 Forest

5 Island

4 Hinterland Harbor

4 Breeding Pool


Sideboard:

4 Kraul Harpooner

1 Thorn Lieutenant

4 Entrancing Melody

2 Negate

2 Spell Pierce

2 Dive Down

There are only eight adapt creatures in the deck for Biomancer’s Familiar. This seems light, but you have a lot of search, and more of that is not what you need. There are places where a Zegana, Utopian Speaker or two would be welcome for additional power, but I found that if we were going large and finding ways to close things out, it was better to go straight to Biogenic Ooze. Ooze often gets a substantial boost from Biomancer’s Familiar, although those games are usually (but not always!) yours anyway.

This deck has a lot of mana. You have 23 lands, 3 Adventurous Impulse that will almost always hit mana if you want them to, and ten mana creatures. You also have a lot of ways to use that mana. Adventurous Impulse often finds a good sink if you want one, and Sphinx of Foresight scries away excess lands once you’re set. Incubation Druid turns into a 3/5 even without a Biomancer’s Familiar; with Biomancer’s Familiar it turns into a giant mana sink. Growth-Chamber Guardian eats up a lot of mana. In other games, you have a lot of mana and use it to keep counters up while developing your board. That’s fine too.

Path one is to go for the quick, easy win. You get a lot of easy wins from a quick Biomancer’s Familiar even without Llanowar Elves. Get one out on turn 2, play a Growth-Chamber Guardian on turn 3 and turn it into a 4/4 on the spot. Next turn, you’ll have a 6/6 and a 4/4 (plus a 2/2), and from there it rapidly gets worse for them, so those two cards together will beat most draws that can’t remove the Biomancer’s Familiar, even without additional spells.

The other main path is to build to five or more mana, then go for it. On turn four, you can deploy the Biomancer’s Familiar and the Growth-Chamber Guardian, or boot up the Incubation Druid right away and tap it for mana to keep going, or both. Other times, you power out a quick Biogenic Ooze instead, which also works. Having Ooze gives you extra ‘packages’ to deploy if Kaya’s Wrath or Gates Ablaze sets you back. Given Incubation Druid, you can often do this reasonably early with counter backup.

Playing traditional aggro-control without the boost is also strong. Force them either to walk into your counters, especially Frilled Mystic, or to pass, in which case you can adapt, and things get steadily worse for them.

Sphinx of Foresight is a very good card that doesn’t have a good home elsewhere. This is its chance to shine: you highly value the scry to set up your combinations, and going from a mana creature into a turn-three Sphinx of Foresight is often quite strong. If you untap with it, you often never have to tap out again, and the extra scry triggers are more impactful than they appear, because once your mana is set, and especially once you find the first Growth-Chamber Guardian, you have a few very high impact cards and a lot of very low impact cards. Other times, they tap out dealing with it and you stick a Biogenic Ooze.

A nice bonus for this deck is that you know when Biomancer’s Familiar or Growth-Chamber Guardian isn’t doing anything for you. Sometimes you have a duplicate. Sometimes you have other uses for your mana. Sometimes there’s nothing to use the Biomancer’s Familiar on, and you have enough mana to work without it when that changes. In those cases, you can expose your creatures and let them get killed, soaking up mana and removal to make way for later. Smart opponents know that Biomancer’s Familiar is a space bunny and that space bunnies must die.

Sideboarding poses a problem. The deck’s cards, other than Biogenic Ooze, are all either counters, mana, or working towards making your central engine happen. What can we take out? If we put in good cards from the sideboard, are we improving matters? That’s why sideboarding wasn’t impactful enough. The new cards were good, but you have to give up a lot to put them in.



There are two problems in the blue matchup. The first is that you have a hard time stopping Curious Obsession. The four cheap counters help, but if you try that you get blown out by Spell Pierce or by them not having Curious Obsession in the first place, in games where deploying mana would have let you compete. Your best weapon against Curious Obsession therefore is Sphinx of Foresight, since they are probably not putting Obsession on Mist-Cloaked Herald or Tempest Djinn, and it’s often not possible for them to hold up a counter on turn three that stops a creature. Your other best weapon is to overpower them through it. If you have Biomancer’s Familiar and Growth-Chamber Guardian, or stick a Biogenic Ooze, it won’t much matter that they draw two cards per turn. You can also try to use your counters to stop Tempest Djinn, since it is difficult for them to hold up counter backup to defend it. Without the Djinns, they can’t bring much power to the table.

The other problem is that they have all the control. With lots of one drops, flyers, chump blockers, Merfolk Tricksters and counters, they choose where the battle is fought. Often you will have a lot more power, and they find a way to win regardless, especially if you had to take turns two and three setting up before things get rolling.

If they hang back and don’t tap mana, usually the right thing to do is develop your mana. If they fight it, you’ll still have enough mana and they’ll run out of counters. If they don’t fight it, you can pick up tempo and start double casting or having counter backup later. You have a lot of threats that are quite frustrating for them, and they can make playing a Tempest Djinn quite perilous. Once you have them on the board, force them to make a move.

That doesn’t mean the matchup is great. It’s definitely not, but it is winnable.

You can do more or less sideboarding on the margin; the default is something like:

In: +4 Kraul Harpooner, +4 Entrancing Melody, +1 Spell Pierce, +1 Negate

Out: -3 Adventurous Impulse, -2 Druid of the Cowl, -4 Frilled Mystic, -1 Biogenic Ooze

Sideboarding offers you some very strong cards. Kraul Harpooner is perfect and fits right into your strategy. You are also very good with Entrancing Melody, as you have enough mana to cast it often with counter backup, while the Kraul Harpooner keeps Siren Stormcaller from getting in the way. Your core strategy is to deploy creatures, so it’s hard for them to have a lot of defenses for Entrancing Melody like Negate or Dive Down, and if they try to respond in kind then that’s the type of mana exchange that favors you quite a bit.

We can consider more copies of Negate, or Spell Pierce, although the motivation for those cards lies elsewhere.

I tested Essence Capture in this and other places. It’s very cute and sometimes a true blowout when it turns on Incubation Druid, or a smaller one when it puts a counter on Growth-Chamber Guardian, but the double blue mana wasn’t quite compatible with our mana base once we saw what we have to sideboard out.

Frilled Mystic is the easy cut. Playing a waiting game and refusing to tap mana against a deck full of one drops, which can counter back and against which the 3/2 body doesn’t have much impact, is not a good idea. The other cut turns out to be Adventurous Impulse. Even tapping one mana is often something you don’t have time for, and you’re bringing in a high impact spell for a high impact creature, which makes the card much worse since it can only find creatures and lands. For similar reasons, you let go of Druid of the Cowl, as it doesn’t block anything and the draws it enables are often far too slow, or let them break us up with counters. That gives us room for a third Negate or first Spell Pierce. Depending on how you feel about where that leaves the mana and how determined you are to fight Curious Obsession, you can then cut copies of Biogenic Ooze. Being on the play versus the draw can also be a consideration.

If you wanted to improve matters further after board, you could play Crushing Canopy in the board, or have access to more copies of Quench. It’s not clear how else we can improve much.


They will kill as many creatures as they can on sight. This is wise. Your goal is to keep forcing them to do this until they run out of removal, or develop your mana so that you can slip in the engine or an Ooze while they’re tapped out. There’s a scary early phase where you can get overrun, and a scary later phase where you have to find a way to quickly turn the corner before you get burned out, and you often won’t have attacks that seem safe. Ooze is better than it looks here because it lets you close games quickly without the engine despite being low on life. Often you have to do this while holding up Frilled Mystic for many turns, which can make things tricky. Sometimes you get burned out before you can finish the job, or have to accept the risk of that happening to avoid giving them too much time.

The other way you lose is Experimental Frenzy or Rekindling Phoenix against an unimpressive board. Ideally you have counters ready for that, and there is a point in the game where this becomes your primary concern.

Thus, the early turns are largely about preventing them from getting creature damage in and establishing a board that will let you sit on counters. Druid of the Cowl is very good on turn two, as they have to take time off to kill it or you get to play a Sphinx, or a two drop with counter backup, on turn three.

In: +1 Thorn Lieutenant, +4 Entrancing Melody, +2 Negate

Out: -3 Adventurous Impulse, -4 Biogenic Ooze

Thorn Lieutenant is in the board for this matchup in particular. Thorn Lieutenant does exactly what you want. If they try to attack into it with one and two drops, it is a perfect wall. If they kill it, you get a free 1/1 that is surprisingly annoying. Later on, it turns around and attacks and is another way to exploit Biomancer’s Familiar. Cutting the activation from six mana to four makes things a lot easier. It’s a nice-to-have, and it offers another long term threat in other matchups where you want that, but it is an easy cut from the board if you want something else badly.

You do already have a ton of perfectly good two drops. But many of them are long term valuable, and you want the option to hold them for the right later opportunity. You also often want to cast two of them on your four mana turn, or one per turn while holding up counters.

Entrancing Melody gives you coverage against Rekindling Phoenix, and is also very strong when it takes Goblin Chainwhirler. That frees up the ground for you to attack, as they lost a good blocker and a good attacker and you picked up a great additional blocker, letting you close the game out quickly. Even taking a small creature prevents them from going wide.

With that, you no longer need Biogenic Ooze as much, which means Adventurous Impulse gets worse, so it comes out too. The extra counters let you stop Experimental Frenzy or prevent you from being burned out later.

Putting in Dive Down is reasonable as well, and can lead to the engine coming online, but is a way things can go wrong if they start aiming all their removal at your head, giving you dead cards. Watch how they play and act accordingly.

This matchup is quite good as configured. If you don’t care about it much, you can trim an additional Druid of the Cowl and give up Thorn Lieutenant, and things get worse but remain fine. If you care about it a lot, you can have access to more Thorn Lieutenants, including in the main, or a third maindeck Druid of the Cowl.


They can deploy a lot of power quickly. Your best draws go over the top of that fast enough to not die, unless they go completely nuts. Once you turn the corner, you can be very patient, as not much threatens you, but there is risk that they gain the ability to go wide and kill you with an alpha strike. There are games where you spend a lot of time pumping up your team but can’t get through while they’re making tokens or keep playing creatures, and closing it out gets tricky. That got a lot easier once we added a full four Biogenic Ooze, and either that or Sphinx of Foresight can close things out.


In: +4 Entrancing Melody, +1 Thorn Lieutenant

Out: -1 Negate, -4 Frilled Mystic

There is nothing you need to counter. There are things you’d like to counter, especially removal spells, but not enough to be thrilled about holding up mana. If they show a bunch of flyers that Kraul Harpooner can pick off, I don’t mind putting a few in. It’s also a solid blocker for the early turns.

Esper Control:

Kaya’s Wrath is your enemy. They can hit you with discard and then wipe your board. That is the most common way you lose. You also need to watch out for Cry of the Carnarium if you deploy creatures in the wrong order. The other way is they counter or kill everything one by one and you run out of threats. Biogenic Ooze gives you extra good threats, especially after they Kaya’s Wrath.

Once you have enough stuff, sit back on counters and don’t use them on spells that don’t change the path of the game. What matters is mostly Kaya’s Wrath. Know when you need to walk into it, when you can afford to play around it, and when they’ll get enough counter backup for it.


In: +2 Negate, +2 Spell Pierce, +2 Dive Down, +1 Thorn Lieutenant

Out: -2 Druid of the Cowl, -4 Sphinx of Foresight, -1 Adventurous Impulse

Sphinx is easy to answer, doesn’t hit hard, and costs too much to protect properly. Giving up the scry at the start of the game is unfortunate, but that isn’t enough to justify its presence. Adventurous Impulse gets substantially worse, but we still have a lot of strong hits and love finding Frilled Mystic, so it mostly stays despite Sphinx leaving. The Thorn Lieutenant gives you a threat that can close things out, and I’ve found it plays surprisingly well against control. But if you don’t have it, you won’t miss it much here.

The counters shore you up against Kaya’s Wrath and Teferi, Hero of Dominaria. Dive Down protects your key creatures against removal.

Nexus of Fate:

You’re playing a similar game to blue. They’re better at it here, as your extra power is mostly overkill, but even a worse version of this strategy still works well. Always counter Search for Azcanta, and almost always hang back on counters once they get to four mana.


In: +2 Spell Pierce, +2 Negate

Out: -4 Biogenic Ooze

You don’t need the power Ooze provides, so take it out, deploy stuff early then sit on counters. If you aren’t happy with the matchup, add more counters to the sideboard until you are satisfied.


You have a few different fears to worry about. Hostage Taker on your creatures is often quite bad. In corner cases it is so bad that you need to consider holding Biogenic Ooze. If a Wildgrowth Walker goes large, it can buy a lot of time. If the game goes long enough without you closing it out with your engine or a Biogenic Ooze or Sphinx of Foresight, they will cast Hydroid Krasis one time too many, at escalating sizes.

Then there’s Finality. You need to be continuously aware of Finality. Sphinx of Foresight and Biogenic Ooze are both vulnerable, as are many of your cheaper creatures. Once you are clearly ahead, prioritize getting creatures to five toughness. Push a Growth-Chamber Guardian to 6/6 and leave one at 2/2 for now, which is usually right anyway. Get Incubation Druid to 3/5 even if it feels unnatural or slows things down a bit. If you can’t, consider paying a lot to hold up counters, and/or hold some creatures back. Holding up counters is how the last few turns are best handled most of the time in any case, if you have them available.


In: +4 Entrancing Melody, +2 Dive Down

Out: -3 Adventurous Impulse, -2 Druid of the Cowl, -1 Biogenic Ooze

You love the spells coming in, and need to make room. Their plan is mostly to trade cards with you in various forms and grind you out, so flooding on mana is a danger. Druid of the Cowl does no useful blocking and Adventurous Impulse can miss, while Entrancing Melody mostly only costs two mana and Dive Down costs one, and they rarely kill Llanowar Elves or a non-adapted Incubation Druid, so you’re not overly mana light.

I’m not sure how many copies of Negate you want. Finality is important, but so is Hostage Taker, and playing too many spells is how you run into trouble. I’m pretty unhappy that we’re cutting a Biogenic Ooze as it is to stay at eight answers.


Gruul smash. You build up. Who will do it better? Back when I tested, no one was playing Gruul, so I don’t know. They can certainly deploy a lot of threats fast and pick off your creatures before you can do your thing; Pelt Collector is super efficient and Rekindling Phoenix is tough. If you can do your thing in full, you’ll win.


In: +4 Entrancing Melody, +2 Dive Down

Out: -2 Druid of the Cowl, -1 Biogenic Ooze, -1 Adventurous Impulse, -2 Negate

Dive Down is a better Negate, so it’s an easy swap. I can see going either up or down on answers, but I don’t think you have time for Negate, and Spell Pierce won’t play in context. Druid of the Cowl does not actually block, so go with your other two drops. That leaves two cards to bring out. Biogenic Ooze seems slow so I’m fine bringing one out, which in turn makes me like Adventurous Impulse less given how many spells I’m bringing in. This is a place to start, but it’s likely wrong.

Other matchups follow similar principles.

If you get a chance, take this deck for a spin and see what you think.


[Method] The light side of motivation: positive feedback-loop

March 26, 2019 - 13:56
Published on March 26, 2019 10:56 AM UTC

I want to share this method I use sometimes to stay focused on my tasks, earn rewards from them, and build up a positive feedback-loop to do more difficult things. It's nothing new and has probably been written about a few times, but I have been using it subconsciously for years, and wanted to do an explicit representation for future use. If this sounds completely wrong to you, please ignore it or tell me in the comments.

It should go without saying that this is just one part of a well-tuned system. It works because other parts work and support it. If supportive systems are wired differently or broken, this approach may not work at all.

What you need
  • the ability to motivate yourself to some degree
  • some preparation, or at least a mental list of tasks that can be done
  • the ability to complete simple or moderately difficult tasks when you are already motivated (Side-note: Forcing yourself to do things when you really, really don't want to might be effective once or twice. But in the long run, it's going to build up an even stronger aversion. Then making yourself do the thing will be even more difficult. How to change your feelings about a task is not the main topic of this post.)
How it works
  • Step 1: Induce happiness. Make yourself feel confident and hopeful.
  • Step 2: Choose the (simple to moderately difficult) action you want to complete.
  • Step 3: Complete the action and reward yourself for it! With the "evidence" that you can complete actions, stir up your confidence in your ability to complete tasks.
  • Step 4: Choose the next stack of actions. They should be moderately difficult and involve clear steps to the solution (no 'meta-actions'). Begin them immediately, before your confidence and motivation fades; in doing so, focus on the short-term future when you are going to feel accomplished about having completed them. Use your recently activated confidence for this. Don't focus on your potential dislike of them or any other feelings of avoidance. If they crop up regardless, ignore them and tell them that they are going to be defeated soon, so they should just leave. (I know how this sounds. But if you are anything like me, treating your mind like a dog to be trained can be really helpful in getting it to do things!)
  • Step 5: Relax from having successfully completed a stack of necessary and useful actions. Bask in the reward, but don't linger more than 10 minutes. Even if you are slightly exhausted, this is not the time to stop! Focus on building your resolve to tackle a more difficult action next; one that you know you can complete, but may have to work harder for.
  • Step 6: Make sure you have the energy needed for this more difficult task. If you don't, eat or drink something small and healthy, or take a short (5 to 15 min) nap.
  • Step 7: Sit down in a clear and organised workspace. Be determined to do this! Don't stop until you are finished. Persist through exhaustion; this is a sign of working hard, not failure. When you are done, take the time to clean up your workspace.
  • Step 8: Rest a lot! Go to sleep or take a nap, eat something, read a light novel. Well done! You have completed your goal!

Caution! Do not use this to work yourself to exhaustion over time. This is meant to help in keeping up a healthy work-mentality; don't use it to trick your body or mind into giving more than it has. Take steps to make sure this doesn't happen; perhaps set up a reminder for some weeks later, checking that your habits don't stray into forbidden territory. The danger lies in not noticing until it is too late. Be prepared!

How to reward yourself

This might be really different for different people. I built up some habits over the years where, after completing some chosen task or thingy, I would internally congratulate myself, focus on the positive feelings this evoked, etc.

It might take some experimentation. Physical rewards, like a pleasant sound and light effect, a 'Well done!' stamp on a paper (humans are weird, but if it produces the desired results...), can also be effective. This works for children, pets and games, which is why I started using it.

These small rewards don't really matter at all, of course. They are just tools to build up the desired habits. Eventually, when you are working on the things that are important to you and making progress, that may become a reward of its own.


"Moral" as preference label

March 26, 2019 - 13:30
Published on March 26, 2019 10:30 AM UTC

Note: working on a research agenda, hence the large amount of small individual posts, to have things to link to in the main documents.

In my quest to synthesise human preferences, I've occasionally been asked whether I distinguish moral preferences from other types of preferences - for example, whether preferences for Abba or Beethoven, or avocado or sausages, should rank as high as human rights or freedom of speech.

The answer is, of course not. But these are not the sort of things that should be built into the system by hand. This should be reflected in the meta-preferences. We label certain preferences "moral", and we often have the belief that these should have priority, to some extent, over merely "selfish" preferences (the extent of this belief varies from person to person, of course).

I deliberately wrote the wrong word there for this formalism - we don't have the "belief" that moral preferences are more important, we have the meta-preference that a certain class of beliefs, labelled "moral", whatever that turns out to mean, should be given greater weight. This is especially the case as there are a lot of cases where it is very unclear if a preference is moral or not (many people have strong moral-ish preferences over mainstream cultural and entertainment choices).

This is an example of the sort of challenges that a preference synthesis process should be able to figure out on its own. If the method needs to be constantly tweaked to get over every small problem of definition, then it cannot work. As always, however, it need not get everything exactly right; indeed, it needs to be robust enough that it doesn't change much if a borderline meta-preference such as "everyone should know their own history" gets labelled as moral or not.


What I've Learned From My Parents' Arranged Marriage

March 26, 2019 - 09:40
Published on March 26, 2019 6:40 AM UTC

When I tell people my parents had an arranged marriage, I get a number of different reactions. Most people have the wrong idea of exactly what that looks like, and those who do have the right idea often wonder if my parents can even understand what dating is like, given they've never experienced it. I've heard people assume that my parents' arranged marriage meant they were completely unable to help or give advice when it came to my dating life, and I've found the opposite to be the case; the advice my parents gave me about dating was as valuable as anything I found anywhere else, and allowed me to pass that advice on to my friends. Growing up hearing their story taught me a lot about what was important to know about myself before I started dating anyone, and how a good couple functions and grows together. I found that much of this is less commonly talked about when it comes to Western dating, and so I want to share their story and what I learned from it with you. For background, I'll start with telling you what arranged marriage is actually like.
Although some parts of India still do the traditional "bride and groom don't meet until the wedding", these tend to be remote and rural parts. Most arranged marriages today function a little more like a blind date, but with your parents and their network finding you a match rather than your friends. On the more traditional end, families may set up a "bride viewing", which today functions like a first meeting where the parents introduce each half of the couple, then leave them alone to get to know each other. They later tell their parents if they agree to the marriage or not. On the more liberal end, a couple may go on many dates before agreeing. In some cases, young people will date and fall in love, and the parents will meet after and decide to "arrange" the marriage if all parties agree to it. In the case of my parents, my dad's cousin (who he was very close with) met my mother and thought they would be a good match due to compatible philosophical interests and tastes in literature. My mother had, at that point, not dated at all, despite being in graduate school; it is normal for young people in India to feel marriage is not something they have to worry too much about because they trust their families will find someone good for them. The fact that my dad's cousin met my mother and immediately thought of my father points at another way arranged marriages affect the culture: people are always on the lookout for a good match.
When you ask someone who has had an arranged marriage about love, the first thing they say is that the love will come naturally once the couple is married. As a child, I always found this thought strange. As I grew older, though, I noticed the truth of this in the stories my mother told me about her relationship early on with my father. When they married, he was living in the US, and she was finishing her master's in India; for the year it took to finish her degree, they wrote letters. The way they did this nourished their love for each other, and fostered growth in their relationship. Western romance is described as something that happens by accident, but arranged romance happens on purpose. Even relationships that start with falling in love can benefit from growing and deepening that bond in the same way. This happens because you water love like a plant, and give it the right kinds of nutrients so it can grow.
One of the values that my mom spoke to me about more explicitly is that of cultural compatibility. In India, marriage is arranged through the social network of the parents. Traditionally, this focused a lot on social standing and religion, because of the idea that families of the same groups will raise their children similarly, and have similar values. My parents both grew up valuing learning and knowledge. They would have been far less compatible with people who were more focused on material wealth, or spiritual minimalism. Because their families had similar values, they were each instilled with similar values. This is reinforced by the fact that India is a more collectivist culture, and thus it is thought that your family knows you better than anyone else. Those who know you best are more likely to have a sense of who you would get along with, whether they're related to you or not. Further, getting along with the people your partner cares about most is important in any long term relationship. The fact that my mom got along well with my dad's cousin was a good sign; my mom connected more with the rest of my dad's family after the marriage, even though my dad had to go back to the US. Whether the relationship is arranged or not, fostering individual relationships with the people your partner cares about helps strengthen your relationship.
Compatibility includes not only what you value, but also what you want. Around the time my mother was getting married, many people her age were talking about wanting to move to the US. She was one of the few who wasn't fussed; she felt she'd be just as happy continuing to live in India. Of course, when she met my dad, that changed. For the right person, she was willing to move. There are people who wouldn't have been willing to make that move for anything, and there are those who wanted to move so badly that they didn't want to marry anyone willing to stay. This can be applied to anything one might want out of life, from living situation to religion to children, and more. In Western romantic media, this is often portrayed as being heartless. Ultimately, though, it's about trade-offs. Does your love for the person really overpower how much you want something? That answer differs for everyone. You can say that love conquers all, but a mismatch in this type of compatibility is one of the most common causes for divorce in the US. Knowing what you want your life to look like before you find the person to spend it with is going to be easier than trying to convince someone else to change what they want.
Of course, compatibility is nothing if you're not also complementary. This is where modern dating begins to look like marketing: know your target audience, and know what they want. If you know what kind of values you want your partner to have, you might already have a vague sense of what they would be like as a person. Knowing what you provide is crucial, especially when it comes to things like online dating. Traditional gender roles cover this well if you fit neatly into one or the other, but things don't work that way for everyone. Given that my dad lived in the US, the fact that he could provide citizenship was huge. But he would not have been satisfied with a marriage with someone who saw this as his biggest asset. The fact that my mother was not obsessed with moving to the US meant that their complementary focus had to happen elsewhere. They shared the value of intellectual engagement, but my dad was always more focused on abstract ideas, while my mother tended to think more concretely. Here was where they were able to complement each other, which gave their life together more balance, and helped foster their growth individually as well. Finding someone whose traits and skills complement yours can help cover areas of life you struggle with, provide perspective when needed, and encourage you to grow and learn new things.
As a child, I didn't see the story of my parents as a love story. Love stories were about falling madly, hopelessly, and deeply, all at once, and my parents never really had that. But as I grew older, I noticed the details of their relationship. When my dad bought her a nice dress, it was as much because he wanted to see her in it as it was because he knew she hated shopping. When she challenged his ideas, it was out of love and respect, more than anything else. When we did things together as a family, they made sure to take time to connect with each other as a couple, even if it was only briefly. And as I became more independent, they were able to spend more and more time together. Love that lasts over a lifetime doesn't stay the same; it grows and changes with you as you grow and change. Falling in love doesn't happen once, but again and again.


Do you like bullet points?

March 26, 2019 - 07:30
Published on March 26, 2019 4:30 AM UTC

I think more naturally in bullet points, and I (sometimes) like reading posts that are written in bullet style. (This website is one of my favorites, and is written entirely in bullets).

(Disclaimer: although I wrote this post in bullet points because it was cute, I don't think it's the best exemplar of them. Or rather, it's an example of using bullet points to do rough thinking, rather than an example of using them to illustrate a complex argument.)

I like bullet points because:

  • It's easier to skim, and build up a high level understanding of a post's structure. If you understand a concept you can skip it and move on, if you want to drill down and understand it better you can do so.
    • Relatedly, it exposes your cruxes more readily. You can pick out and refute points, in a way that can be harder with meandering prose.
  • It's easier to hash out early stage ideas. When I'm first thinking about something, my brain is jumping around and forming connections, developing a model at multiple levels of resolution. Bullet lists make this easier to keep track of.
    • I like this for other people's posts as well, since it feels more playful, like I can be part of their early generation process. I think LessWrong would be better if more people wrote more unpolished things to get early feedback on them, and bullet lists are a nice way to signal that something is still in development.
  • Prose often adds unnecessary cruft. In the transition from bullets-to-prose, posts can go 2x-3x as long (or, when I go to write a short bullet summary of something I wrote in prose, it turns out to be much shorter, and the prose mostly unnecessary)

I had assumed this was a common experience, and that it was in fact a weakness of humanity that we didn't have better, more comprehensive bullet-point tools.

But, alas, Typical Mind Fallacy. It turned out a couple people on the LessWrong team reacted very negatively to bullet points. Concerns include:

  • It's easy to think you've communicated more clearly than you have, because you didn't bother writing the connecting words between paragraphs.
  • They're harder to read straight through. If you include bold words, readers might not bother reading the non-bold words, and miss nuance.
  • "I like numbered arguments, since that makes it easier to respond to individual points. But unnumbered bullet lists are just hard to parse."
    • [Alas, the LessWrong website currently doesn't enable this very well because our Rich Editor's implementation of numbered lists was annoying]
  • "I dunno man it's just really hard to read. My brain keeps trying to collapse the bullets like they're code."

I asked a couple more people, and they said "I dunno, bullet points seem fine. Depends on the situation?"


I am curious what the LessWrong userbase thinks about them overall. Raise your hand if you think bullet points are fine? Terrible? Great? Any particular types of posts you prefer reading bullet-style, and types of posts you think fare poorly if not written in prose?


DanielFilan's Shortform Feed

March 26, 2019 - 02:32
Published on March 25, 2019 11:32 PM UTC

Rationality-related writings that are more comment-shaped than post-shaped. Please don't leave top-level comments here unless they're indistinguishable to me from something I would say here.


IRL 4/8: Maximum Entropy IRL and Bayesian IRL

March 26, 2019 - 01:07
Published on March 25, 2019 10:07 PM UTC

Every Monday for 8 weeks, we will be posting lessons about Inverse Reinforcement Learning. This is lesson 4.

Note that access to the lessons requires creating an account here.

This lesson comes with the following supplementary material:

Have a nice day!


Please take the LW/SSC meetups survey!

March 26, 2019 - 00:48
Published on March 25, 2019 9:48 PM UTC

I've put together a survey to gather information on the state of meetups around the world, and in particular to figure out what kinds of actions it might be useful for people interested in global meetup coordination to take. It's branded as being for SlateStarCodex meetups, but please don't be put off by that if your group isn't affiliated with SSC: the branding is just an artifact of previous decisions, and I'm just as interested in getting data on LW and EA groups.

You can take the survey here.

Context: I've been organizing and thinking about meetups for a couple years now. I coordinated the SSC Meetups Everywhere 2018 and I received a grant from the Centre for Effective Altruism to coordinate SSC meetups.


Please let me know either in the survey or in the comments below if you have any feedback or questions! It's very unlikely that I'll make changes to the survey questions now since that would mess up the data, but this is my first time doing something like this and I will definitely take feedback into account for the future.

Data will not be released publicly because it would be too easy to identify individuals and I neglected to include a question about releasing people's answers, but I am planning to share aggregate statistics and lessons learned publicly. I will also probably reach out to individual meetup organizers if there's significant data on what people want to see from their groups.


To perform best at work, look at Time & Energy account balance

March 25, 2019 - 23:14
Published on March 25, 2019 7:37 PM UTC

Several weeks ago, I got a chance to attend a talk featuring one of the very few female regional heads at Google.

Despite not having any business background, she climbed the ranks from entry-level employee to regional head, overtaking colleagues with prestigious business degrees and rich experience.

One success driver she mentioned caught my attention. Although she lagged far behind at the beginning, the core of her success was that she always aimed for a 120% result on any task in front of her.

This interested me, but not because the idea was new to me.

In fact, this is not the first time I have heard of this concept. Not the first time I have been inspired to give my all to whatever is in front of me. Not the first time I have tried…and not the first time I have failed.

Did I not put in enough effort?

No…in fact, I put so much effort into making this concept come to life, not realising that while effort is highly important, on its own it is far from sufficient.

As I listened to this amazing regional head talk about different aspects of her life, I came to realise what I had been missing all along.

To make each task yield 120%, apart from effort, we should also look at our time and energy balance.

Doing our best on a task means giving it the time and energy required for the best result.

We cannot contribute what we don’t have.

No matter how hard we try to give each task the time it needs, we only have 24 hours a day.

No matter how much energy we try to put into each task, we only have a limited supply each day.

Therefore, giving our best does not start from the moment we begin working…but from the moment we plan our schedule and project pipelines.

When having "Enough Time" is Not Enough

When my boss asked if I had enough time to take on one additional project, I would look at how much time was required to finish all the tasks on my desk and then, most of the time, say "Yes", thinking I had enough time to finish it all.

However, there is a difference between having enough time to finish it all and having time to do it at my best.

When I went back and evaluated all the projects in my pipeline, I realized that the time I had was only enough to finish everything, not to go above and beyond.

I have two choices:

  • Finish a lot of tasks with average results, OR

  • Complete a major task with the best impact, going beyond expectations

There is no right answer here, but for my situation, the second works better.

Even having Time is Sometimes Not Enough

Having time is good. But having time without full energy...hmm...is unlikely to be productive.

Another good lesson I learned from this talk is that ample time to do it best should always come with ample energy.

It's normal to plan business projects with the right balance between high- and low-energy requirements. However...our energy pool is drawn on not only during working hours, but also in our personal lives.

One thing I learned is that when looking at the high- and low-energy requirements in my activities list, I should include all activities, both in the office and at home.

Even if she says "I only have one major project going on during working hours", if this lady also has to train intensively for a marathon at night, how would she have enough energy to do both at her best, even though the marathon is unrelated to work?

To summarize: since one key driver of career success is doing our best on the tasks at hand (e.g. the concept of delivering 120%), many ambitious people put in a great deal of effort to ensure the best results. However, the best results actually begin even before we start each task...during project planning, when our time and energy balance determines how our project results will turn out.


Subagents, akrasia, and coherence in humans

March 25, 2019 - 17:24
Published on March 25, 2019 2:24 PM UTC

In my previous posts, I have been building up a model of mind as a collection of subagents with different goals, and no straightforward hierarchy. This then raises the question of how that collection of subagents can exhibit coherent behavior: after all, many ways of aggregating the preferences of a number of agents fail to create consistent preference orderings.

We can roughly describe coherence as the property that, if you become aware that there exists a more optimal strategy for achieving your goals than the one that you are currently executing, then you will switch to that better strategy. If an agent is not coherent in this way, then bad things are likely to happen to them.

Now, we all know that humans sometimes express incoherent behavior. But on the whole, people still do okay: the median person in a developed country still manages to survive until their body starts giving up on them, and typically also manages to have and raise some number of initially-helpless children until they are old enough to take care of themselves.

For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and in which situations they fail to do so. The conclusion of this post will be:

We are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS-style protector) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.

(Those of you who read my previous post might remember that I said this post would be about “unification of mind” - that is, about how to make subagents agree with each other better. Turns out that I spent so many words explaining when subagents disagree, that I had to put off the post on how to get them to agree. Maybe my next post will manage to be about that…)

Correcting your behavior as a default

There are many situations in which we exhibit incoherent behavior simply because we’re not aware of it. For instance, suppose that I do my daily chores in a particular order, when doing them in some other order would save more time. If you point this out to me, I’m likely to just say “oh”, and then adopt the better system.

Similarly, several of the experiments which get people to exhibit incoherent behavior rely on showing different groups of people different formulations of the same question, and then indicating that different framings of the same question get different answers from people. It doesn’t work quite as well if you show the different formulations to the same people, because then many of them will realize that differing answers would be inconsistent.

But there are also situations in which someone realizes that they are behaving in a nonsensical way, yet will continue behaving in that way. Since people usually can change suboptimal behaviors, we need an explanation for why they sometimes can’t.

Towers of protectors as a method for coherence

In my post about Internal Family Systems, I discussed a model of mind composed of several different kinds of subagents. One of them, the default planning subagent, is a module just trying to straightforwardly find the best thing to do and then execute that. On the other hand, protector subagents exist to prevent the system from getting into situations which were catastrophic before. If they think that the default planning subagent is doing something which seems dangerous, they will override it and do something else instead. (Previous versions of the IFS post called the default planning agent “a reinforcement learning subagent”, but this was potentially misleading since several other subagents were reinforcement learning ones too, so I’ve changed the name.)

Thus, your behavior can still be coherent even if you feel that you are failing to act in a coherent way. You simply don’t realize that a protector is carrying out a routine intended to avoid dangerous outcomes - and this might actually be a very successful way of keeping you out of danger. Some subagents in your mind think that doing X would be a superior strategy, but the protector thinks that it would be a horrible idea - so from the point of view of the system as a whole, doing X is not a better strategy, so not switching to it is actually better.

On the other hand, it may also be the case that the protector’s behavior, while keeping you out of situations which the protector considers unacceptable, is causing other outcomes which are also unacceptable. The default planning subagent may realize this - but as already established, any protector can overrule it, so this doesn’t help.

Evolution’s answer here seems to be spaghetti towers. The default planning subagent might eventually figure out the better strategy, which avoids both the thing that the protector is trying to block and the new bad outcome. But it could be dangerous to wait that long, especially since the default planning agent doesn't have direct access to the protector's goals. So for the same reasons why a separate protector subagent was created to avoid the first catastrophe, the mind will create or recruit a protector to avoid the second catastrophe - the one that the first protector keeps causing.

With permission, I’ll borrow the illustrations from eukaryote’s spaghetti tower post to illustrate this.

Example

Eric grows up in an environment where he learns that disagreeing with other people is unsafe, and that he should always agree to do things that other people ask of him. So Eric develops a protector subagent running a pleasing, submissive behavior.

Unfortunately, while this tactic worked in Eric’s childhood home, once he becomes an adult he starts saying “yes” to too many things, without leaving any time for his own needs. But saying “no” to anything still feels unsafe, so he can’t just stop saying “yes”. Instead he develops a protector which tries to keep him out of situations where people would ask him to do anything. This way, he doesn’t need to say “no”, and also won’t get overwhelmed by all the things that he has promised to do. The two protectors together form a composite strategy.

While this helps, it still doesn’t entirely solve the issue. After all, there are plenty of reasons that might push Eric into situations where someone would ask something of him. He still ends up agreeing to do lots of things, to the point of neglecting his own needs. Eventually, his brain creates another protector subagent. This one causes exhaustion and depression, so that he now has a socially-acceptable reason for being unable to do all the things that he has promised to do. He continues saying “yes” to things, but also keeps apologizing for being unable to do things that he (honestly) intended to do as promised, and eventually people realize that you probably shouldn’t ask him to do anything that’s really important to get done.

And while this kind of a process of stacking protector on top of a protector is not perfect, for most people it mostly works out okay. Almost everyone ends up having their unique set of minor neuroses and situations where they don’t quite behave rationally, but as they learn to understand themselves better, their default planning subagent gets better at working around those issues. This might also make the various protectors relax a bit, since the various threats are generally avoided and there isn’t a need to keep avoiding them.

Gradually, as negative consequences to different behaviors become apparent, behavior gets adjusted - either by the default planning subagents or by spawning more protectors - and remains coherent overall.

But sometimes, especially for people in highly stressful environments where almost any mistake may get them punished, or when they end up in an environment that their old tower of protectors is no longer well-suited for (distributional shift), things don’t go as well. In that situation, their minds may end up looking like a hopelessly tangled web, where they have almost no flexibility. Something happens in their environment, which sets off one protector, which sets off another, which sets off another - leaving them with no room for flexibility or rational planning, but rather forcing them to act in a way which is almost bound to only make matters worse.

This kind of an outcome is obviously bad. So besides building spaghetti towers, the second strategy which the mind has evolved to employ for keeping its behavior coherent while piling up protectors, is the ability to re-process memories of past painful events.

As I discussed in my original IFS post, the mind has methods for bringing up the original memories which caused a protector to emerge, in order to re-analyze them. If ending up in some situation is actually no longer catastrophic (for instance, you are no longer in your childhood home where you get punished simply for not wanting to do something), then the protectors which were focused on avoiding that outcome can relax and take a less extreme role.

For this purpose, there seems to be a built-in tension. Exiles (the IFS term for subagents containing memories of past trauma) “want” to be healed and will do things like occasionally sending painful memories or feelings into consciousness so as to become the center of attention, especially if there is something about the current situation which resembles the past trauma. This also acts as what my IFS post called a fear model - something that warns of situations which resemble the past trauma enough to be considered dangerous in their own right. At the same time, protectors “want” to keep the exiles hidden and inactive, doing anything they can to keep them that way. Various schools of therapy - IFS one of them - seek to tap into this existing tension so as to reveal the trauma, trace it back to its original source, and heal it.

Coherence and conditioned responses

Besides the presence of protectors, another possibility for why we might fail to change our behavior is strongly conditioned habits. Most human behavior involves automatic habits: behavioral routines which are triggered by some sort of a cue in the environment, and lead to or have once led to a reward. (Previous discussion; see also.)

The problem with this is that people might end up with habits that they wouldn’t want to have. For instance, I might develop a habit of checking social media on my phone when I’m bored, creating a loop of boredom (cue) -> looking at social media (behavior) -> seeing something interesting on social media (reward).

Reflecting on this behavior, I notice that back when I didn’t do it, my mind was more free to wander when I was bored, generating motivation and ideas. I think that my old behavior was more valuable than my new one. But even so, my new behavior still delivers enough momentary satisfaction to keep reinforcing the habit.

Subjectively, this feels like an increasing compulsion to check my phone, which I try to resist since I know that long-term it would be a better idea to not be checking my phone all the time. But as the compulsion keeps growing stronger and stronger, eventually I give up and look at the phone anyway.

The exact neuroscience of what is happening at such a moment remains only partially understood (Simpson & Balsam 2016). However, we know that whenever different subsystems in the brain produce conflicting motor commands, that conflict needs to be resolved, with only one at a time being granted access to the “final common motor path”. This is thought to happen in the basal ganglia, a part of the brain closely involved in action selection and connected to the global neuronal workspace.

One model (e.g. Redgrave 2007, McHaffie 2005) is that the basal ganglia receives inputs from many different brain systems; each of those systems can send different “bids” supporting or opposing a specific course of action to the basal ganglia. A bid submitted by one subsystem may, through looped connections going back from the basal ganglia, inhibit other subsystems, until one of the proposed actions becomes sufficiently dominant to be taken.

Redgrave 2007 includes a conceptual illustration of this model, with two example subsystems shown. Suppose that you are eating at a restaurant in Jurassic Park when two velociraptors charge in through the window. Previously, your hunger system was submitting successful bids for the “let’s keep eating” action, which then caused inhibitory impulses to be sent to the threat system. This inhibition prevented the threat system from making bids for silly things like jumping up from the table and running away in a panic. However, as your brain registers the new situation, the threat system gets significantly more strongly activated, sending a strong bid for the “let’s run away” action. As a result of the basal ganglia receiving that bid, an inhibitory impulse is routed from the basal ganglia to the subsystem which was previously submitting bids for the “let’s keep eating” actions. This makes the threat system’s bids even stronger relative to the (inhibited) eating system’s bids.

Soon the basal ganglia, which was previously inhibiting the threat subsystem’s access to the motor system while allowing the eating system access, withdraws that inhibition and starts inhibiting the eating system’s access instead. The result is that you jump up from your chair and begin to run away. Unfortunately, this is hopeless since the velociraptor is faster than you. A few moments later, the velociraptor’s basal ganglia gives the raptor’s “eating” subsystem access to the raptor’s motor system, letting it happily munch down its latest meal.
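As a rough illustration of the bidding-and-inhibition idea above, here is a toy winner-take-all competition in Python. It is not Redgrave's model. The channel names, the drive values, and the rule that each bid is suppressed by the sum of its competitors are all assumptions chosen for illustration. The sketch starts from the "eating" state, and raising the threat drive is then enough for the other channel to take over.

```python
from typing import Dict, Optional


def compete(
    drives: Dict[str, float],
    start: Optional[Dict[str, float]] = None,
    inhibition: float = 1.0,
    rate: float = 0.5,
    steps: int = 100,
) -> Dict[str, float]:
    """Toy winner-take-all competition between action channels (hypothetical model).

    drives: excitatory input each channel receives from the subsystem backing it.
    start: bid levels carried over from the previous moment (zeros if omitted).
    On every step each bid relaxes toward its drive minus `inhibition` times the
    sum of the competing bids, and bids cannot go below zero.
    """
    bids = dict(start) if start else {action: 0.0 for action in drives}
    for _ in range(steps):
        total = sum(bids.values())
        bids = {
            action: max(0.0, bid + rate * (drives[action] - inhibition * (total - bid) - bid))
            for action, bid in bids.items()
        }
    return bids


def winner(bids: Dict[str, float]) -> str:
    """The action whose channel currently has the strongest bid."""
    return max(bids, key=bids.get)


# Eating quietly: the hunger-backed channel wins and silences the threat channel.
calm = compete({"keep eating": 1.0, "run away": 0.3})
print(winner(calm))  # keep eating

# Raptors arrive: threat input jumps, and the previously dominant channel gets suppressed.
panic = compete({"keep eating": 1.0, "run away": 2.0}, start=calm)
print(winner(panic))  # run away
```

Passing the previous bids in as `start` is what lets the sketch show the hand-over from one channel to the other, rather than just a fresh competition.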

But let’s leave velociraptors behind and go back to our original example with the phone. Suppose that you have been trying to replace the habit of looking at your phone when bored, to instead smiling and directing your attention to pleasant sensations in your body, and then letting your mind wander.

Until the new habit establishes itself, the two habits will compete for control. Frequently, the old habit will be stronger, and you will just automatically check your phone without even remembering that you were supposed to do something different. For this reason, behavioral change programs may first spend several weeks just practicing noticing the situations in which you engage in the old habit. When you do notice what you are about to do, then more goal-directed subsystems may send bids towards the “smile and look for nice sensations” action. If this happens and you pay attention to your experience, you may notice that long-term it actually feels more pleasant than looking at the phone, reinforcing the new habit until it becomes prevalent.

To put this in terms of the subagent model, we might drastically simplify things by saying that the neural pattern corresponding to the old habit is a subagent reacting to a specific sensation (boredom) in the consciousness workspace: its reaction is to generate an intention to look at the phone. At first, you might train the subagent responsible for monitoring the contents of your consciousness to output moments of introspective awareness highlighting when that intention appears. That introspective awareness helps alert a goal-directed subagent to try to trigger the new habit instead. Gradually, a neural circuit corresponding to the new habit gets trained up, which starts sending its own bids when it detects boredom. Over time, reinforcement learning in the basal ganglia starts giving that subagent’s bids more weight relative to the old habit’s, until it no longer needs the goal-directed subagent’s support in order to win.

Now this model helps incorporate things like the role of having a vivid emotional motivation, a sense of hope, or psyching yourself up when trying to achieve habit change. Doing things like imagining an outcome that you wish the habit to lead to may activate additional subsystems which care about those kinds of outcomes, causing them to submit additional bids in favor of the new habit. The extent to which you succeed at doing so depends on the extent to which your mind-system considers it plausible that the new habit leads to the new outcome. For instance, if you imagine your exercise habit making you strong and healthy, then subagents which care about strength and health might activate to the extent that you believe this to be a likely outcome, sending bids in favor of the exercise action.

On this view, one way for the mind to maintain coherence and readjust its behaviors is its ability to re-evaluate old habits in light of which subsystems get activated when reflecting on the possible consequences of new habits. An old habit having been strongly reinforced reflects that a great deal of evidence has accumulated in favor of it being beneficial, but the behavior in question can still be overridden if enough influential subsystems weigh in with their evaluation that a new behavior would be more beneficial in expectation.
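Here is a minimal sketch of this habit-replacement story. It assumes that each habit's bid is just a learned weight. It also stands in for reinforcement learning in the basal ganglia with a simple delta rule, which is a swapped-in textbook update and not a claim about the actual neural mechanism. The reward values and the size of the goal-directed "support" are made up. The sketch only shows that supported trials let the new habit build up weight until it wins unaided.

```python
from typing import Dict


def boredom_cue(
    weights: Dict[str, float],
    rewards: Dict[str, float],
    support: Dict[str, float],
    lr: float = 0.3,
) -> str:
    """One occurrence of the boredom cue (hypothetical model).

    Each habit bids its learned weight plus any top-down support from a
    goal-directed subagent; the strongest bid is executed, and only the executed
    habit's weight moves toward the reward it produced (a simple delta rule).
    """
    bids = {habit: w + support.get(habit, 0.0) for habit, w in weights.items()}
    chosen = max(bids, key=bids.get)
    weights[chosen] += lr * (rewards[chosen] - weights[chosen])
    return chosen


# Made-up numbers: the old habit is well learned, the new one starts from nothing
# but actually pays off better once it gets executed.
weights = {"check phone": 0.6, "let mind wander": 0.0}
rewards = {"check phone": 0.6, "let mind wander": 0.9}

print(boredom_cue(weights, rewards, support={}))  # check phone

# Noticing the cue lets the goal-directed subagent back the new habit for a while.
for _ in range(10):
    boredom_cue(weights, rewards, support={"let mind wander": 1.0})

print(boredom_cue(weights, rewards, support={}))  # let mind wander
```

Only the executed habit gets updated. This is why the new habit can never overtake the old one without the initial top-down support.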

Some subsystems having concerns (e.g. immediate survival) which are ranked more highly than others (e.g. creative exploration) means that the decision-making process ends up carrying out an implicit expected utility calculation. The strengths of bids submitted by different systems do not just reflect the probability that those subsystems put on an action being the most beneficial. There are also different mechanisms giving the bids from different subsystems varying amounts of weight, depending on how important the concerns represented by that subsystem happen to be in that situation. This ends up doing something like weighting the probabilities by utility, with the kinds of utility calculations that are chosen by evolution and culture in a way to maximize genetic fitness on average. Protectors, of course, are subsystems whose bids are weighted particularly strongly, since the system puts high utility on avoiding the kinds of outcomes they are trying to avoid.
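The "implicit expected utility calculation" can be sketched as summing probability times weight for each proposed action. The subsystem names, probabilities, and weights below are hypothetical. The sketch shows two things: a low-probability bid from a heavily weighted subsystem (such as a protector) can outvote a confident bid from a lightly weighted one, and the outcome flips when the situation changes the weights.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Bid:
    subsystem: str
    action: str
    probability: float  # the subsystem's credence that this action is the best one
    weight: float  # how much this subsystem's concerns matter in the current situation


def scores(bids: List[Bid]) -> Dict[str, float]:
    """Sum probability * weight per action: a crude implicit expected-utility score."""
    totals: Dict[str, float] = {}
    for bid in bids:
        totals[bid.action] = totals.get(bid.action, 0.0) + bid.probability * bid.weight
    return totals


def choose(bids: List[Bid]) -> str:
    """Pick the action with the highest weighted score."""
    totals = scores(bids)
    return max(totals, key=totals.get)


# Hypothetical numbers. Walking alone at night, a protector's concern is weighted heavily.
night = [
    Bid("creative exploration", "let mind wander", probability=0.8, weight=1.0),
    Bid("threat protector", "scan surroundings", probability=0.3, weight=5.0),
]
# Safe at home, the same protector's concern carries little weight.
home = [
    Bid("creative exploration", "let mind wander", probability=0.8, weight=1.0),
    Bid("threat protector", "scan surroundings", probability=0.3, weight=1.0),
]
print(choose(night))  # scan surroundings
print(choose(home))  # let mind wander
```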

The original question which motivated this section was: why are we sometimes incapable of adopting a new habit or abandoning an old one, despite knowing that to be a good idea? And the answer is: because we don’t know that such a change would be a good idea. Rather, some subsystems think that it would be a good idea, but other subsystems remain unconvinced. Thus the system’s overall judgment is that the old behavior should be maintained.

Interlude: Minsky on mutually bidding subagents

I was trying to concentrate on a certain problem but was getting bored and sleepy. Then I imagined that one of my competitors, Professor Challenger, was about to solve the same problem. An angry wish to frustrate Challenger then kept me working on the problem for a while. The strange thing was, this problem was not of the sort that ever interested Challenger.

What makes us use such roundabout techniques to influence ourselves? Why be so indirect, inventing misrepresentations, fantasies, and outright lies? Why can't we simply tell ourselves to do the things we want to do? [...]

Apparently, what happened was that my agency for Work exploited Anger to stop Sleep. But why should Work use such a devious trick?

To see why we have to be so indirect, consider some alternatives. If Work could simply turn off Sleep, we'd quickly wear our bodies out. If Work could simply switch Anger on, we'd be fighting all the time. Directness is too dangerous. We'd die.

Extinction would be swift for a species that could simply switch off hunger or pain. Instead, there must be checks and balances. We'd never get through one full day if any agency could seize and hold control over all the rest. This must be why our agencies, in order to exploit each other's skills, have to discover such roundabout pathways. All direct connections must have been removed in the course of our evolution.

This must be one reason why we use fantasies: to provide the missing paths. You may not be able to make yourself angry simply by deciding to be angry, but you can still imagine objects or situations that make you angry. In the scenario about Professor Challenger, my agency Work exploited a particular memory to arouse my Anger's tendency to counter Sleep. This is typical of the tricks we use for self-control.

Most of our self-control methods proceed unconsciously, but we sometimes resort to conscious schemes in which we offer rewards to ourselves: "If I can get this project done, I'll have more time for other things." However, it is not such a simple thing to be able to bribe yourself. To do it successfully, you have to discover which mental incentives will actually work on yourself. This means that you - or rather, your agencies - have to learn something about one another's dispositions. In this respect the schemes we use to influence ourselves don't seem to differ much from those we use to exploit other people - and, similarly, they often fail. When we try to induce ourselves to work by offering ourselves rewards, we don't always keep our bargains; we then proceed to raise the price or even deceive ourselves, much as one person may try to conceal an unattractive bargain from another person.

Human self-control is no simple skill, but an ever-growing world of expertise that reaches into everything we do. Why is it that, in the end, so few of our self-incentive tricks work well? Because, as we have seen, directness is too dangerous. If self-control were easy to obtain, we'd end up accomplishing nothing at all.

-- Marvin Minsky, The Society of Mind

Akrasia is subagent disagreement

You might feel that the above discussion still doesn’t entirely resolve the original question. After all, sometimes we do manage to change even strongly conditioned habits pretty quickly. Why is it sometimes hard and sometimes easy?

Redgrave et al. (2010) discuss two modes of behavioral control: goal-directed versus habitual. Goal-directed control is a relatively slow mode of decision-making, where “action selection is determined primarily by the relative utility of predicted outcomes”, whereas habitual control involves more directly conditioned stimulus-response behavior. Which kind of subsystem is in control is complicated, and depends on a variety of factors (the following quote has been edited to remove footnotes to references; see the original for those):

Experimentally, several factors have been shown to determine whether the agent (animal or human) operates in goal-directed or habitual mode. The first is over-training: here, initial control is largely goal-directed, but with consistent and repeated training there is a gradual shift to stimulus–response, habitual control. Once habits are established, habitual responding tends to dominate, especially in stressful situations in which quick reactions are required. The second related factor is task predictability: in the example of driving, talking on a mobile phone is fine so long as everything proceeds predictably. However, if something unexpected occurs, such as someone stepping out into the road, there is an immediate switch from habitual to goal-directed control. Making this switch takes time and this is one of the reasons why several countries have banned the use of mobile phones while driving. The third factor is the type of reinforcement schedule: here, fixed-ratio schedules promote goal-directed control as the outcome is contingent on responding (for example, a food pellet is delivered after every n responses). By contrast, interval schedules (for example, schedules in which the first response following a specified period is rewarded) facilitate habitual responding because contingencies between action and outcome are variable. Finally, stress, often in the form of urgency, has a powerful influence over which mode of control is used. The fast, low computational requirements of stimulus–response processing ensure that habitual control predominates when circumstances demand rapid reactions (for example, pulling the wrong way in an emergency when driving on the opposite side of the road). Chronic stress also favours stimulus–response, habitual control. For example, rats exposed to chronic stress become, in terms of their behavioural responses, insensitive to changes in outcome value and resistant to changes in action–outcome contingency. 
[...]

Although these factors can be seen as promoting one form of instrumental control over the other, real-world tasks often have multiple components that must be performed simultaneously or in rapid sequences. Taking again the example of driving, a driver is required to continue steering while changing gear or braking. During the first few driving lessons, when steering is not yet under automatic stimulus–response control, things can go horribly awry when the new driver attempts to change gears. By contrast, an experienced (that is, ‘over-trained’) driver can steer, brake and change gear automatically, while holding a conversation, with only fleeting contributions from the goal-directed control system. This suggests that many skills can be deconstructed into sequenced combinations of both goal-directed and habitual control working in concert. [...]

Nevertheless, a fundamental problem remains: at any point in time, which mode should be allowed to control which component of a task? Daw et al. have used a computational approach to address this problem. Their analysis was based on the recognition that goal-directed responding is flexible but slow and carries comparatively high computational costs as opposed to the fast but inflexible habitual mode. They proposed a model in which the relative uncertainty of predictions made by each control system is tracked. In any situation, the control system with the most accurate predictions comes to direct behavioural output.

Note those last sentences: besides the subsystems making their own predictions, there might also be a meta-learning system keeping track of which other subsystems tend to make the most accurate predictions in each situation, giving extra weight to the bids of the subsystem which has tended to perform the best in that situation. We’ll come back to that in future posts.
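Here is a crude sketch of the arbitration idea. It assumes that a controller's "uncertainty" can be approximated by an exponential moving average of its squared prediction errors, tracked separately for each situation. Daw et al.'s actual model is more sophisticated, so treat this as a stand-in rather than their method. The `Arbiter` class, situation labels, and numbers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Arbiter:
    """Crude stand-in for uncertainty-based arbitration between controllers.

    A controller's uncertainty in a situation is approximated by an exponential
    moving average of its squared prediction errors there; the controller with
    the lowest estimate is handed control.
    """

    def __init__(self, controllers: List[str], decay: float = 0.2, prior: float = 1.0):
        self.controllers = list(controllers)
        self.decay = decay
        # Unseen (situation, controller) pairs start at the prior uncertainty.
        self.error: Dict[Tuple[str, str], float] = defaultdict(lambda: prior)

    def record(self, situation: str, controller: str, predicted: float, actual: float) -> None:
        key = (situation, controller)
        squared_error = (predicted - actual) ** 2
        self.error[key] += self.decay * (squared_error - self.error[key])

    def in_control(self, situation: str) -> str:
        return min(self.controllers, key=lambda c: self.error[(situation, c)])


arbiter = Arbiter(["goal-directed", "habitual"])
for _ in range(20):
    # Routine driving: the cached habit predicts outcomes perfectly, planning is slightly off.
    arbiter.record("routine drive", "habitual", predicted=1.0, actual=1.0)
    arbiter.record("routine drive", "goal-directed", predicted=0.8, actual=1.0)
    # Someone stepping into the road (across many such events): the habit's predictions fail badly.
    arbiter.record("pedestrian steps out", "habitual", predicted=1.0, actual=0.0)
    arbiter.record("pedestrian steps out", "goal-directed", predicted=0.2, actual=0.0)

print(arbiter.in_control("routine drive"))  # habitual
print(arbiter.in_control("pedestrian steps out"))  # goal-directed
```

A different uncertainty estimate would only change `record`. The arbitration rule itself stays the one-line minimum in `in_control`.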

This seems compatible with my experience, in that I feel it’s possible for me to change even entrenched habits relatively quickly, assuming that the new habit really is unambiguously better. In that case, while I might forget and lapse into the old habit a few times, there’s still a rapid feedback loop which quickly indicates that the goal-directed system is simply right about the new habit being better.

Or the behavior in question might be complex enough, and I might be inexperienced enough at it, that the goal-directed (default planning) subagent has always remained mostly in control of it. In that case change is again easy, since there is no strong habitual pattern to override.

In contrast, in cases where it’s hard to establish a new behavior, there tends to be some kind of genuine uncertainty:

  • The benefits of the old behavior have been validated through direct experience (e.g. unhealthy food that tastes good has in fact tasted good each time), whereas the benefits of the new behavior come from a less trusted information source that is harder to validate (e.g. I’ve read scientific studies about the long-term health risks of this food).
  • Immediate vs. long-term rewards: the more remote the rewards, the larger the risk that for some reason they will never materialize.
  • High vs. low variance: sometimes when I’m bored, looking at my phone produces genuinely better results than letting my thoughts wander. E.g. I might see an interesting article or discussion that gives me novel ideas or insights I would not otherwise have had. Basically, looking at my phone usually produces worse results than not looking at it, but sometimes it produces much better ones.
  • Situational variables affecting the value of the behaviors: looking at my phone can be a way to escape uncomfortable thoughts or sensations, for which purpose it’s often excellent. This then also reinforces looking at the phone in otherwise similar situations where there are no uncomfortable sensations to escape.

When there is significant uncertainty, the brain seems to fall back to those responses which have worked the best in the past - which seems like a reasonable approach, given that intelligence involves hitting tiny targets in a huge search space, so most novel responses are likely to be wrong.

As the above excerpt noted, the tendency to fall back to old habits is exacerbated during times of stress. The authors attribute it to the need to act quickly in stressful situations, which seems correct - but I would also emphasize the fact that negative emotions in general tend to be signs of something being wrong. E.g. Eldar et al. (2016) note that positive or negative moods tend to be related to whether things are going better or worse than expected, and suggest that mood is a computational representation of momentum, acting as a sort of global update to our reward expectations.

For instance, if an animal finds more fruit than it had been expecting, that may indicate that spring is coming. Shifting into a good mood and becoming “irrationally optimistic” about finding fruit, even in places where the animal hasn’t seen fruit in a while, may actually serve as a rational pre-emptive update to its expectations. Similarly, things going worse than expected may be a sign of some more general problem, calling for fewer exploratory behaviors and less risk-taking, and thus a fallback to behaviors that are more certain to work out.
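The "global update" idea can be illustrated with a toy forager in Python. This is a loose sketch under assumptions made here, not the model Eldar et al. actually fit: mood is simply an exponentially weighted average of recent reward prediction errors, and after each observation it nudges the expected reward of every location, including ones the animal has not visited. The function `forage` and all of its parameters are hypothetical.

```python
def forage(observations, n_patches, alpha=0.3, mood_rate=0.5, mood_weight=0.5):
    """Toy forager with a 'mood' term (hypothetical sketch).

    observations: list of (patch_index, reward) pairs, in order.
    Each patch's expected reward is learned with a delta rule. Mood is an
    exponentially weighted average of recent prediction errors, and after
    every observation it shifts the expectations of *all* patches, including
    ones the animal has not visited.
    """
    expected = [0.0] * n_patches
    mood = 0.0
    for patch, reward in observations:
        error = reward - expected[patch]       # local prediction error
        expected[patch] += alpha * error       # local update for this patch
        mood += mood_rate * (error - mood)     # mood tracks recent errors
        for p in range(n_patches):             # mood-driven global update
            expected[p] += alpha * mood_weight * mood
    return expected, mood


# More fruit than expected, repeatedly, in patch 0; patch 1 is never visited.
expected, mood = forage([(0, 1.0)] * 5, n_patches=2)
print(mood > 0, expected[1] > 0)  # True True

# A run of disappointments instead lowers expectations everywhere.
expected, mood = forage([(0, -1.0)] * 5, n_patches=2)
print(mood < 0, expected[1] < 0)  # True True
```

Patch 1 plays the role of a place where the animal hasn't seen fruit in a while: its expectation rises or falls purely through the mood term, which is the sense in which mood acts as a pre-emptive update rather than a local one.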

So to repeat the summary that I had in the beginning: we are capable of changing our behaviors on occasions when the mind-system as a whole puts sufficiently high probability on the new behavior being better, when the new behavior is not being blocked by a particular highly weighted subagent (such as an IFS protector whose bids get a lot of weight) that puts high probability on it being bad, and when we have enough slack in our lives for any new behaviors to be evaluated in the first place. Akrasia is subagent disagreement about what to do.


The Amish, and Strategic Norms around Technology

March 25, 2019 - 01:16
Published on March 24, 2019 10:16 PM UTC

I was reading Legal Systems Very Different From Ours by David Friedman. The chapter on the Amish made a couple of interesting claims, which changed my conception of that culture (although I'm not very confident that the Amish would endorse these claims as fair descriptions).

Strategic Norms Around Technology

The Amish relationship to technology is not "stick to technology from the 1800s", but rather "carefully think about how technology will affect your culture, and only include technology that does what you want."

So, electric heaters are fine. Central heating in a building is not. This is because a space heater in the living room encourages the family to congregate, whereas if everyone has heating in their room, they're more likely to spend time apart from each other.

Some communities allow tractors, but only if they don't have rubber tires. This makes them good for tilling fields but bad for driving around.

It is particularly important not to allow cars and telephones, because easy transportation and communication create a slippery slope toward full connection with the outside world. And a lot of the Amish lifestyle depends on cutting themselves off from the various pressures and incentives present in the rest of the world.

Some Amish communities allow people to borrow telephones or cars from non-Amish neighbors. I might have considered this hypocritical. But in the context of "strategic norms of technology", it need not be. The important bit is to add friction to transportation and communication.

Competitive Dictatorship

Officially, most Amish congregations operate via something-like-consensus (I'm not sure I understood this). But Friedman's claim is that in practice, most people tend to go with what the local bishop says. This makes a bishop something like a dictator.

But there are lots of Amish communities, and if you don't like the direction a bishop is pushing people in, or how they are resolving disputes, you can leave. There is a spectrum of communities that vary in how strict they are about various rules, and they make decisions mostly independently.

So there are not only strategic norms around technology, but also a fairly interesting, semi-systematic exploration of those norms.

Other Applications

I wouldn't want to be Amish-in-particular, but the setup here is very interesting to me.

I know some people who went to MAPLE, a monastery program. While they were there, limits on technology meant that after 9pm you basically had two choices: read, or go to bed. The choices were strongly reinforced by the social and physical environment, and this made it much easier to make choices they endorsed.

Contrast this with my current house, where a) you face basically infinite choices about how to spend your time, and b) in practice, the nightly choices often end up being something like "stay up till 1am playing minecraft with housemates" or "stay up till 2am playing minecraft with housemates."

I'm interested in the question "okay, so... my goals are not the Amish goals. But, what are my goals exactly, and is there enough consensus around particular goals to make valid choices around norms and technology other than 'anything goes?'"

There are issues you face that make this hard, though:

Competition with the Outside World – The Amish system works because it cuts itself off from the outside world, and its most important technological choices directly cause that. Your business can't get outcompeted by someone else who opens up their shop on Sundays because there is nobody who opens their shop on Sundays.

You also might have goals that directly involve the outside world.

(The Amish also have good relationships with the government, such that they can get away with implementing their own legal systems and get exceptions for things like school laws. If you want to do something on their scale, you would need both to avoid attracting the ire of the government and to be good enough at rolling your own legal system not to screw things up and drive people away.)

Lack of Mid-Scale-Coordination – I've tried to implement 10pm bedtimes. It fails, horribly, because I frequently attend events that last till midnight or later. Everyone could shift their entire sleep schedule forward, maybe. But also...

People Are Different – Some of people's needs are cultural. But some are biological, and some may be due to environmental factors that accumulated over decades and can't be changed on a dime.

Some people do better with rules and structure. Some people flourish more with flexibility. Some people need rules and structure but different rules and structure than other people.

This all makes it fairly hard to coordinate on norms.

Contenders for Change

Given the above, I think it makes most sense to:

  • Look for opportunities to explore norms and technology use at the level of individuals, households, and small organizations (these seem like natural clusters with small numbers of stakeholders, where you can either get consensus or have a dictator).
  • While doing so, choose norms that are locally stable: ones that don't require cooperation from anyone outside yourself, your household, or your org.

For example, I could imagine an entire household trying out a rule like "the household internet turns off at 10pm" or "all the lights turn reddish at night so it's easier to get to sleep".