RSS Feed Aggregator
Street Epistemology. Training
Rational Dojo. Premortems
Nonviolent Communication. Training
Rational Dojo. Experiments
Terms & literature for purposely lossy communication
Say I have 10 coins, each of which could be either red or blue, and either heads or tails. That's 20 bits of information.
My friend is interested in knowing which of them are heads or tails, but doesn’t care about the color. I then decide to tell my friend only the heads/tails information: 10 bits of information.
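As a rough illustration (in Python; just the arithmetic from the example above), here is the information content of the full state versus the deliberately lossy summary:

```python
from math import log2

# 10 coins, each with a color bit (red/blue) and a face bit (heads/tails),
# all outcomes equally likely.
n_coins = 10
full_bits = n_coins * log2(4)    # 4 states per coin -> 20.0 bits total
# Deliberately discard the color: keep only heads/tails.
lossy_bits = n_coins * log2(2)   # 2 states per coin -> 10.0 bits
print(full_bits, lossy_bits)
```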
Another example: With image compression, there's a big difference between "Accurately reconstruct this image, pixel by pixel" vs. "Accurately reconstruct what the viewer would remember, which is basically just 'Lion on grass'".
I’d feel uncomfortable calling this translation “compression”, because there was definitely intentional information loss. Most of the literature on compression is about optimally maintaining information, not about optimally losing it.
Are there other good terms or literature for this?
Logical Representation of Causal Models
Epistemic status: I expect some people to say "this is obvious and trivial", and others to say "this makes no sense at all".
One fundamental difference between E.T. Jaynes’ probability textbook and most others is the emphasis on including our model in the prior information. When we write an expression like P[X=2|Y=3], that’s really shorthand for P[X=2|Y=3, M], where M is some probabilistic model in which X and Y appear. In practice, I’ve found this useful for two main use cases: model comparison, and interventions/counterfactuals in causal models. This post is mainly about the latter.
The general idea is that an intervention in a causal model (e.g. do(Z=1)) takes in one model and returns a new model: it should really be written as M′ = do(Z=1, M). When we write something like P[X=2|Y=3, do(Z=1)], that’s really shorthand for P[X=2|Y=3, do(Z=1, M)].
In order to make this all less handwavy, we need to make the model M a bit more explicit.
What’s in a Model?
The simplest way to represent a probabilistic model is as a table of possibilities: more explicitly, a list of exhaustive and mutually exclusive logic statements. If I roll a standard die and call the outcome X, then I’d explicitly represent my model as M = (P[X=1|M] = 1/6) & … & (P[X=6|M] = 1/6).
In Probability as Minimal Map, we saw that P[X | P[X|Y]=p] = p. Interpretation: I obtain some data Y, calculate the probability P[X|Y] = p, then my computer crashes and I lose the data. But as long as I still know p, I should still assign the same probability to X. Thus: the probability of X, given P[X|Y] = p (but not given Y itself!), is just p.
(Note that I left the model M implicit in the previous paragraph; really we should write P[X | P[X|Y,M]=p, M] = p.)
Now let’s apply that idea to the expression P[X=1|M], with our die model M = (P[X=1|M] = 1/6) & … & (P[X=6|M] = 1/6). Our given information includes P[X=1|M] = 1/6, so
P[X=1|M] = P[X=1 | P[X=1|M]=1/6, M] = 1/6.
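To make the logic-style lookup concrete, here is a minimal Python sketch (not from the original post; the representation is hypothetical): the model M is just a bag of probability statements, and evaluating P[X=1|M] amounts to a lookup among M's axioms:

```python
from fractions import Fraction

# The die model M as an explicit list of probability statements,
# one axiom per outcome.
M = {f"P[X={k}|M]": Fraction(1, 6) for k in range(1, 7)}

# "Computing" P[X=1|M] is then just a lookup among the axioms of M:
p = M["P[X=1|M]"]
print(p)  # prints 1/6
```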
Representing models this way gives a much stronger logic flavor to the calculations; our probability calculations are a derivation in an explicit logic. The axioms of that logic are the contents of M, along with the universal laws of probability (i.e. Bayes’ rule, sum rule, etc.) and arithmetic.
Causality & Interventions
In the case of a causal model, M would look something like
M = (G = (...graph...)) & (P[X1 | Xpa(1,G), M] = f1(X1, Xpa(1,G))) & … & (P[Xn | Xpa(n,G), M] = fn(Xn, Xpa(n,G)))
i.e. M gives a graph G and an expression for the probability of each Xi in terms of i’s parents in G. (This would be for a Bayes net; structural equations are left as an exercise to the reader.)
A do() operation then works exactly like you’d expect: do(Xi=1,M) returns a new model M′ in which:
 The arrows into node i in G have been removed
 fi has been replaced with the indicator function I[Xi=1] (or, for continuous Xi, δ(Xi−1)dXi)
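As a rough sketch of this operation (the data structures here are hypothetical; a real implementation would use a proper Bayes net library), do() simply severs the arrows into a node and pins its distribution to an indicator:

```python
# A Bayes net as a dict mapping each node name to (parents, cpd), where cpd
# maps tuples of parent values to distributions over the node's values.

def do(model, node, value):
    """Return a new model M' = do(node=value, M): arrows into `node` are
    removed and its CPD is replaced by an indicator on `value`."""
    new_model = dict(model)  # shallow copy; other nodes are unchanged
    new_model[node] = ((), {(): {value: 1.0}})  # no parents, prob 1 on value
    return new_model

# Example: a two-node net Z -> X.
M = {
    "Z": ((), {(): {0: 0.5, 1: 0.5}}),
    "X": (("Z",), {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}}),
}
M_prime = do(M, "Z", 1)
print(M_prime["Z"])  # ((), {(): {1: 1.0}})
```

Note that X's CPD is untouched: an intervention on Z only rewrites the axiom governing Z itself.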
Counterfactuals work the same way, except they’re limited to structural models, i.e. models in which every non-deterministic node is a root. As long as the model satisfies that constraint, a counterfactual is exactly the same as an intervention: if we have some data (X1,...,Xn) = (2,1,...,−0.3,6), then to run the counterfactual X3=1, we calculate P[X | (X1,X2,X4,...,Xn) = (2,1,...,−0.3,6), do(X3=1, M)]. If we do this with a non-structural model, i.e. if some non-deterministic node has parents, then we’ll find that the result is sometimes undefined: our axioms do not fully determine the probability in question.
Why Does This Matter?
Hopefully this all seems pretty trivial. Why belabor it?
There are a handful of practical applications where explicitly including the model is useful.
The most important of these is model comparison, especially the Bayesian approach to learning causal structure. Another application is scenarios involving a mix of different experimental interventions and observational studies.
But the main reason I’m bringing it up is that agenty things have the type signature (A -> B) -> A. In English: agenty things have some model (A -> B) which predicts the results (B) of their own actions (A). They use that model to decide what actions to perform: (A -> B) -> A.
In the context of causal models, the model (A -> B) is our causal model M. (A -> B) -> A means performing some computation on M in order to find A, which is a lot simpler with an explicit representation of M.
Of course, we could just use the usual structural equation representation without explicitly making everything a statement in some logic, but then we’d have a lot more different types of things floating around. By explicitly making everything logic statements, we unify the formulation. Statements like “counterfactuals are underdefined for Bayes nets” become statements about provability within our logic, and can themselves be proven. Also, by formulating the model in terms of logic statements, we have a single unified language for probability queries P[X|Y]: the models M, M′, etc. can be represented and manipulated in the same format as any other information.
Disasters
If there were a natural disaster tomorrow and it took about two weeks to get things working again, how many people would be ok for food, water, and other necessities? I'm guessing below 5%, but I think this level of preparedness would be a good goal for most people who can afford it. Why don't people plan for potential disasters? Some possibilities:

They don't think disasters are likely. On the other hand, I also don't think disasters are likely! While we have extra water in the basement, I think the chances we'll need it sometime during my life are only maybe 2%. Since it's not expensive, and if we do need it we'll be incredibly happy to have it, I think it's worth setting up.
It does matter a lot whether the chances are ~2% or 0.0002%, but if you think your lifetime chance of being impacted by a serious disaster is under 1% I'd encourage you to think about historical natural disasters in your area (earthquakes, floods, hurricanes, wildfires, etc.) plus the risk of potential human-caused disasters (nuclear war, epidemics, civil war, economic collapse, etc.).
It's weird. Most people don't do it, and a heuristic of "do the things other people do" is normally a pretty good one. In this case, though, I think we should be trying to change what's normal. The government agrees; the official recommendations involve a lot more preparation than people typically do.
They can't afford the money, time, or thought. Many people are in situations where planning for what's likely to happen in the next couple months is hard enough, let alone for things that have a low single digits chance of happening ever. This can't explain all of it, though, because even people who do have more time and money also haven't generally thought through simpler preparations.
They don't think preparation is likely to be useful. If there's a nuclear strike we're all dead anyway, right? Except most disasters, even nuclear ones, aren't this binary. Avoiding exposure to radiation and having KI (potassium iodide) available can help your long-term chances a lot. Many disasters (nuclear, earthquake, epidemic, severe storm) are ones where having sufficient supplies to stay at home for weeks would be very helpful. If you think preparation wouldn't help and you haven't, say, read through the suggestions on ready.gov, I'd recommend doing that.
They're used to local emergencies. We generally have a lot more experience with things like seeing houses burn down, knowing people who've become unable to work, or having family members get very sick. These can be major problems on a personal scale, but families, society, government, and infrastructure will generally still be intact. We can have insurance and expect that it will pay out; others in our families and communities may be able to help us. Things that affect a few people in a region or community at a time are the sort of things societies have the spare capacity for and figure out how to handle. A regional disaster works very differently, and makes planning in advance much more worthwhile.
They expect to see it coming. Forecasting is good enough that we're very unlikely to be surprised by a hurricane, but for now an earthquake could still come out of nowhere. Others seem like the kind of thing we ought to be able to anticipate, but are tricky: it's hard to see an economic collapse coming because economic confidence is anti-inductive and we tend to suddenly go from "things are good" to "things are very much not good". Paying attention is valuable, but it's not sufficient.
They're not considering how bad things can be. For many of us our daily experience is really very good: high quality plentiful food and drink, comfortable and sufficient clothing, interesting things to do, good medical care. When you consider how bad a disaster can be, things that would improve your life a lot in very rare circumstances can make a lot of sense.
They're not sure what to do. This is pretty reasonable: there's a ton of writing, often aimed at people who've gotten really into prepping, and not much in the way of "here are a few things to do if you want to allocate a weekend morning to getting into a better place". Storing extra water (~15 gal/person), food (buy extra non-perishables and rotate through them), and daily medications, however, goes a long way. For a longer list, this guide seems pretty good. (Though they're funded by affiliate links, so they have incentives to push you in the "buying things" direction.)
Street Epistemology. Training
Safety regulators: A tool for mitigating technological risk
Crossposted to the Effective Altruism Forum
So far the idea of differential technological development has been discussed in ways that (1) emphasize ratios of progress rates, (2) ratios of remaining work, (3) maximizing or minimizing correlations (for example, minimizing the overlap between the capability to do harm and the desire to do so), (4) implementing safe tech before developing and implementing unsafe tech, and (5) the occasional niche analysis (possibly see also a complementary aside relating differential outcomes to growth rates in the long run). I haven’t seen much work on how various capabilities (a generalization of technology) may interact with each other in ways that prevent downside effects (though see also The Vulnerable World Hypothesis), and wish to elaborate on this interaction type.
As technology improves, our capacity to do both harm and good increases and each additional capacity unlocks new capacities that can be implemented. For example the invention of engines unlocked railroads, which in turn unlocked more efficient trade networks. However, the invention of engines also enabled the construction of mobile war vehicles. How, in an ideal world, could we implement capacities so we get the outcomes we want while creating minimal harm and risks in the process?
What does implementing a capacity do? It enables us to change something. A normal progression is:
 We have no control over something (e.g. We cannot generate electricity)
 We have control but our choices are noisy and partially random (e.g. We can produce electric sparks on occasion but don’t know how to use them)
 Our choices are organized but there are still downside effects (e.g. We can channel electricity to our homes but occasionally people get electrocuted or fires are started)
 Our use of the technology mostly doesn’t have downside effects (e.g. we have capable safety regulators (e.g. insulation, fuses, ...) that allow us to minimize fire and electrocution risks)
The problem is that downside effects in stages 2 and 3 could overwhelm the value achieved during those stages and at stage 4, especially when considering powerful game changing technologies that could lead to existential risks.
Even more fundamentally, as agents in the world we want to avoid shifting the expected utility in a negative direction relative to other options (the opportunity costs). We want to implement new capacities in the best sequence, like with any other plan, so as to maximize the value we achieve. Value is a property of an entire plan, which makes it harder to reason about than just asking what the optimal (or safe) next thing to do is (ignoring what is done after). We wish to make choosing which capacities to develop more manageable and easier to think about. One way to do this is to make sure that each capacity we implement is immediately an improvement relative to the state we’re in before implementing it (this simplification is an example of a greedy algorithm heuristic). What does this simplification imply about the sequence of implementing capacities?
This implies that what we want to do is to have the capacities so we may do good without the downside effects and risks of those capacities. How do we do this? If we’re lucky the capacity itself has no downside risks, and we’re done. But if we’re not lucky we need to implement a regulator on that capacity: a safety regulator. Let’s define a safety regulator as a capacity that helps control other capacities to mitigate their downside effects. Once a capacity has been fully safety regulated, it is then unlocked and we can implement it to positive effect.
Some distinctions we want to pay attention to are then:
 A capacity: a technology, resource, or plan that changes the world, either autonomously or by enabling us to use it
 An implemented capacity: a capacity that has been implemented
 An available capacity: a capacity that can be implemented immediately
 An unlocked capacity: a capacity that is safe and beneficial to implement given the technological context, and is also available
 A potential capacity: any member of the set of all possible capacities: those already implemented, those being worked on, those that are available, and those that exist in theory but need prerequisite capacities to be implemented first.
 A safety regulator: a capacity that unlocks other capacities by mitigating downside effects and possibly providing a prerequisite. (The safety regulator may or may not be unlocked itself at this stage; you may need to implement other safety regulators or capacities to unlock it.) Generally, safety regulators are somewhat specialized for the specific capacities they unlock.
Running the suggested heuristic strategy then looks like: if a capacity is unlocked, implement it; otherwise, either implement an unlocked safety regulator for it first or choose a different capacity to implement. We could call this a safety-regulated capacity-expanding feedback loop. For instance, with respect to nuclear reactions, humanity (1) had the implemented capacity of access to radioactivity, (2) this made available the safety regulator of controlling chain reactions, (3) determining how to control chain reactions was implemented (through experimentation and calculation), (4) this unlocked the capacity to use chain reactions (in a controlled fashion), (5) and the capacity of using chain reactions was implemented.
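The heuristic loop can be sketched as a toy greedy planner (all names here are hypothetical; this just mirrors the if-unlocked-implement-else-regulate rule):

```python
# A toy sketch of the suggested greedy loop: each capacity may require a
# safety regulator to be implemented before it counts as "unlocked".

def plan(capacities, regulators, implemented):
    """Greedily implement capacities in order, inserting each capacity's
    safety regulator first when it is not already implemented.
    `regulators` maps a capacity to the regulator that unlocks it (or None)."""
    order = []
    for cap in capacities:
        reg = regulators.get(cap)
        if reg is not None and reg not in implemented:
            implemented.add(reg)   # implement the unlocking regulator first
            order.append(reg)
        implemented.add(cap)       # capacity is now unlocked; implement it
        order.append(cap)
    return order

# The nuclear-reactions example from the text: controlling chain reactions
# is the safety regulator that unlocks using chain reactions.
steps = plan(
    ["use_chain_reactions"],
    {"use_chain_reactions": "control_chain_reactions"},
    {"access_to_radioactivity"},
)
print(steps)  # ['control_chain_reactions', 'use_chain_reactions']
```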
Limitations and extensions to this method:
- It’s difficult to tell which of the unlocked capacities to implement at a particular step. But we’ll assume some sort of decision process exists for optimizing that.
- Capacities may be good temporarily, but if other capacities are not implemented in time, they may become harmful (see the loss unstable states idea).
- Implementing capacities in this way isn’t necessarily optimal, because this approach does not allow for temporary bad effects that yield better results in the long run.
- Capacities do not necessarily stay unlocked forever, due to interactions with other capacities that may be implemented in the interim.
- A locked capacity may be net good to implement if a safety regulator is implemented before the downside effects could take place (this is related to handling cluelessness).
- The detailed interaction between capacities, and planning which to develop in which order, resembles the type of problem the TWEAK planner was built for, and it may be one good starting point for further research.
- In more detail, how can one capacity prevent the negative effects of another?
Discuss
How Doomed are Large Organizations?
We now take the model from the previous post and, over the next several posts, ask questions of it. This first answer post asks these questions:
- Are these dynamics the inevitable results of large organizations?
- How can we forestall these dynamics within an organization?
- To what extent should we avoid creating large organizations?
- Has this dynamic ever been different in the past in other times and places?
These are the best answers I was able to come up with. Some of this is reiteration of previous observations and prescriptions. Some of it is new.
There are some bold claims in these answer posts, which I lack the space and time to defend in detail or provide citations for properly, with which I am confident many readers will disagree. I am fine with that. I do not intend to defend them further unless I see an opportunity in doing so.
I would love to be missing much better strategies for making organizations less doomed – if you have ideas please please please share them in the comments and/or elsewhere.
Are these dynamics the inevitable result of large organizations?

These dynamics are the default result of large organizations. There is continuous pressure over time pushing towards such outcomes.
The larger the organization, the longer it exists, and the more such outcomes have already happened, both there and elsewhere, the greater the pressure towards such outcomes.
Once such dynamics take hold, reversing them within an organization is extremely difficult.
Non-locally within a civilization, one can allow new organizations to periodically take the place of old ones to reset the damage.
Locally within a sufficiently large organization and over a sufficiently long time horizon, this makes these dynamics inevitable. The speed at which this occurs still varies greatly, and depends on choices made.
How can we forestall these dynamics within an organization?

These dynamics can be forestalled somewhat through a strong organizational culture that devotes substantial head space and resources to keeping the wrong people and behaviors out. This requires a leader who believes in this and in making it a top priority. Usually this person is a founder. Losing the founder is often the trigger for a rapid ramp-up in maze level.
Keeping maze levels in check means continuously sacrificing substantial head space, resources, ability to scale and short-term effectiveness to this cause. This holds both for the organization overall and the leader personally.
Head space is sacrificed in three ways: you have fewer people, you devote some of those people to the maze-fighting process, and the process takes up space in everyone’s head.
Central to this is to ruthlessly enforce an organizational culture with zero tolerance for maze behaviors.
Doing anything with an intent to deceive, or an intent to game your metrics at the expense of object-level results, needs to be an automatic “you’re fired.”
Some amount of politics is a human universal, but it needs to be strongly discouraged. Similarly, some amount of putting in extra effort at crucial times is necessary, but strong patterns of guarding people’s non-work lives from work, both in terms of time and other influences, are also strongly necessary.
Workers and managers need to have as much effective skin in the game as you can muster.
One must hire carefully, with a keen eye to the motivations and instincts of applicants, and a long period of teaching them the new cultural norms. This means at least growing slowly, so new people can be properly incorporated.
You also want a relatively flat hierarchy, to the extent possible.
There will always be bosses when crunch time comes. Someone is always in charge. Don’t let anyone tell you different. But the less this is felt in ordinary interactions, and thus the more direct reports each boss can have while remaining effective, and thus the fewer levels of hierarchy you need for a given number of people, the better off you’ll be.
You can run things in these ways. I have seen it. It helps. A lot.
Another approach is to lower the outside maze level. Doing so by changing society at large is exceedingly hard. Doing so by associating with outside organizations with lower maze levels, and going into industries and problems with lower maze levels, seems more realistic. If you want to ‘disrupt’ an area that is suffering from maze dysfunction, it makes sense to bypass the existing systems entirely. Thus, move fast, break things.
One can think of all these tactics as taking the questions one uses to identify or predict a maze, and trying to engineer the answers you want. That is a fine intuitive place to start.
However, if Goodhart’s Law alarm bells did not go off in your head when you read that last paragraph, you do not appreciate how dangerous Goodhart Traps are.
The Goodhart Trap

The fatal flaw is that no matter what you target when distributing rewards and punishments and cultural approval, it has to be something. If you spell it out, and a sufficiently large organization has little choice but to spell it out, you inevitably replace one type of Goodharting with another. One type of deception becomes another.
One universal is that in order to maintain a unique culture, you must filter for those who happily embrace that culture. That means you are now testing everyone constantly, no matter how much you avoid making this explicit, on whether they happily embrace the company and its culture. People therefore pretend to embrace the culture and pretend to be constantly happy. Even if they do embrace the culture and are happy, they will still put on a show of doing so.
If you punish deception you get people pretending not to deceive. If you punish pretending, you get people who pretend to not be the type of people who would pretend. People Goodhart on not appearing to Goodhart.
Which is a much more interesting level to play on, and usually far less destructive. If you do a good enough job picking your Goodhart targets, this beats the alternatives by a lot.
Still, you eventually end up in a version of the same place. Deception is deception. Pretending is pretending. Fraud is fraud. The soul still dies. Simulacrum levels still slowly rise.
Either you strongly enforce a culture, and slowly get that result, or you don’t. If you don’t and are big enough, you quickly get a maze. If you do and/or are smaller, depending on your skill level and dedication to the task, you slowly get a maze.
Hiring well is better than enforcing or training later, since once people are in they can then be themselves. Also because enforcement of culture is, as pointed out above, toxic even if you mean to enforce a nontoxic ideal. But relying on employee selection puts a huge premium on not making hiring mistakes. Even one bad hire in the wrong place can be fatal. Especially if they then are in a position to bring others with them. You need to defend your hiring process especially strongly from these same corruptions.
My guess is that once an organization grows beyond about Dunbar’s number, defending your culture becomes a losing battle even under the best of circumstances. Enforcing the culture will fail outright in the medium term, unless the culture outside the organization is supporting you.
If you are too big, every known strategy is only a holding action. There is no permanent solution.
To what extent should we avoid creating large organizations?

Quite a lot. These effects are a really big deal. Organizations get less effective, more toxic and corrupt as places to work and interact with, and add more toxicity and corruption to society.
Every level of hierarchy enhances this effect. The first five, dramatically so. Think hard before being or having a boss. Think harder before letting someone’s boss report to a boss. Think even harder than that before adding a fourth or fifth level of hierarchy.
That does not mean such things can be fully avoided. The advantages of large organizations with many degrees of hierarchy are also a really big deal. We cannot avoid them entirely.
We must treat creating additional managerial levels as having very high costs. This is not an action to be taken lightly. Wherever possible, create distinct organizations and allow them to interact. Even better, allow people to interact as individuals.
This adds friction and transaction costs. It makes many forms of coordination harder. Sometimes it simply cannot be done if you want to do the thing you’d like to do.
This is increasingly the case, largely as a result of enemy action. Some of this is technology and our problems being legitimately more complex. Most of it is regulatory frameworks and maze-supporting social norms that require that massive costs, including massive fixed costs, be paid as part of doing anything at all. This is a key way mazes expropriate resources and reward other mazes while punishing non-mazes.
I often observe people who are stuck working in mazes who would much prefer to be self-employed or to exit their current job or location, but who are unable to do so because the legal deck is increasingly stacked against that.
Even if the work itself is permitted, health insurance issues alone force many into working for the man.
When one has a successful small organization, the natural instinct is to scale it up and become a larger organization.
Resist this urge whenever possible. There is nothing wrong with being good at what you do at the scale you are good at doing it. Set an example others can emulate. Let others do other things, be other places. Any profits from that enterprise can be returned to investors and/or paid to employees, and used to live life or create or invest in other projects, or to help others.
One need not point to explicit quantified dangers to do this. Arguments that one cannot legitimately choose to ‘leave money on the table’ or otherwise not maximize are maximalist arguments for some utility function that does not properly capture human value and is subject to Goodhart’s Law, and against the legitimacy of slack.
The fear that if you don’t grow, you’ll get ‘beaten’ by those that do, as in Raymond’s kingdoms? Overblown. Also asking the wrong question. So what if someone else is bigger or more superficially successful? So what if you do not build a giant thing that lasts? Everything ends. That is not, by default, what matters. A larger company is often not better than several smaller companies. A larger club is often not better than several smaller clubs. A larger state is often not better or longer lasting than several smaller ones. Have something good and positive, for as long as it is viable and makes sense, rather than transforming into something likely to be bad.
People like to build empires. Those with power usually want more power. That does not make more power a good idea. It is only a good idea where it is instrumentally useful.
In some places, competition really is winner-take-all and/or regulations and conditions too heavily favor the large over the small. One must grow to survive. Once again, we should be suspicious that this dynamic has been engineered rather than being inherent in the underlying problem space.
Especially in those cases, this leads back to the question of how we can grow larger and keep these dynamics in check.
Has this dynamic ever been different in the past in other places and times?

These dynamics seem to me to be getting increasingly worse, which implies they have been better in the past.
Recent developments indicate an increasing simulacrum level, an increasing reluctance to allow older institutions to be replaced by newer ones, and an increasing reliance on cronyism and corruption that props up failure, allowing mazes to survive past when they are no longer able to fulfill their original functions.
Those in the political and academic systems, on all sides, increasingly openly advocate against the very concept of objective truth, or that people should tell it, or are blameworthy for not doing so. Our president’s supporters admit and admire that he is a corrupt liar, claiming that his honesty about his corruption and lying, and his admiration for others who are corrupt, who lie and who bully, is refreshing, because they are distinct from the corrupt, the liars and the bullies who are more locally relevant to their lives. Discourse is increasingly fraught and difficult. When someone wants to engage in discourse, I frequently now observe them spending much of their time pointing out how difficult it is to engage in discourse (and I am not claiming myself as an exception here), as opposed to what such people used to do instead, which was engage in discourse.
We are increasingly paralyzed and unable to do things across a wide variety of potential human activities.
Expropriation by existing mazes and systems eats increasing shares of everything, especially in education, health care and housing.
I don’t have time for a full takedown here, but: Claims to the contrary, such as those recently made by Alex Tabarrok in Why Are The Prices So Damn High?, are statistical artifacts that defy the evidence of one’s eyes. They are the product of Moloch’s Army. When I have insurance and am asked with no warning to pay $850 for literally five minutes of a doctor’s time, after being kept waiting for an hour (and everyone I ask about this says just refuse to pay it)? When sending my child to a kindergarten costs the majority of a skilled educator’s salary? When you look at rents?
Don’t tell me the problem is labor costs due to increasing demand for real services.
Just. Don’t.
Some technological innovations remain permitted for now, and many of the organizations exploiting this are relatively new and reliant on object-level work, and thus less maze-like for now, but this is sufficiently narrow that we call the result “the tech industry.” We see rapid progress in the few places where innovation and actual work is permitted to those without mazes and connections, and where there is sufficient motivation for work, either intrinsic or monetary.
The tech industry also exhibits some very maze-like behaviors of its own, but it takes a different form. I am unlikely to be the best person to tackle those details, as others have better direct experience, and I will not attempt to tackle them here and now.
We see very little everywhere else. Increasingly we live in an amalgamated giant maze, and the maze is paralyzing us and taking away our ability to think or talk while robbing us blind. Mazes are increasingly in direct position to censor, deplatform or punish us, even if we do not work for them.
The idea of positive-sum, object-level interactions being someone’s primary source of income is increasingly seen as illegitimate, and risky and irresponsible, in contrast to working for a maze. People instinctively think there’s something shady or rebellious about that entire enterprise of having an actual enterprise. A proper person seeks rent, plays the game starting in childhood, sends the right signals and finds ways to game the system. They increase their appeal to mazes by making themselves as dependent on them and their income and legitimacy streams, and as vulnerable to their blackmail, as possible.
The best way to see that positive-sum games are a thing is to notice that the sum changes. If everything were zero-sum, the sum would always be zero.
The best way to see that these dynamics used to be much less severe, at least in many times and places, is that those times and places looked and felt different, and got us here without collapsing. Moral Mazes was written before I was born, but the spread of these dynamics is clear as day within my lifetime, and yours as well.
Did some times and places, including our recent past, have it less bad than us in these ways? I see this as almost certainly true, but I am uncertain of the magnitude of this effect due to not having good enough models of the past.
Did some times and places have it worse than we do now? Very possible. But they’re not around anymore. Which is how it works.
The next section will ask why it was different in the past, what the causes are in general, and whether we can duplicate past conditions in good ways.
Discuss
Book Review—The Origins of Unfairness: Social Categories and Cultural Evolution
On my secret ("secret") blog, I reviewed a book about the cultural evolutionary game theory of gender! I thought I'd share the link on this website, because you guys probably like game theory?? (~2400 words)
Discuss
Whipped Cream vs Fancy Butter
Epistemic status: Jeff missing the point
The supermarket sells various kinds of fancy butter, but why don't people eat whipped cream instead? Let's normalize to 100 calorie servings and compare prices:
 Plain Butter, store brand: $0.10
 Heavy Whipping Cream, store brand: $0.20
 Fancy butter, Kerrygold brand: $0.30
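The normalization above is easy to reproduce. The package prices in the sketch below are my own hypothetical figures, not Jeff's source data; the calorie densities are approximate standard values (butter about 717 kcal per 100 g, heavy cream about 340 kcal per 100 g):

```python
# Rough sketch: dollars per 100-calorie serving from a package price.
# Package prices here are hypothetical; calorie densities are
# approximate standard values.

def cost_per_100_kcal(package_price, package_grams, kcal_per_100g):
    """Dollars per 100-calorie serving."""
    total_kcal = package_grams * kcal_per_100g / 100
    return package_price * 100 / total_kcal

# A 454 g (1 lb) package of store-brand butter at a hypothetical $4.00:
butter = cost_per_100_kcal(4.00, 454, 717)  # roughly $0.12
# A pint (~476 g) of store-brand heavy cream at a hypothetical $3.00:
cream = cost_per_100_kcal(3.00, 476, 340)   # roughly $0.19
```

With those assumed prices the per-calorie numbers land close to the list above, with cream costing a bit under twice as much as plain butter.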
Perhaps the reason people don't normally use whipped cream is that whipping it is too much trouble? If you use a manual eggbeater in a standard sixteen-ounce deli cup it takes about fifteen seconds (youtube) for a serving.
Alternatively, maybe people think whipped cream has to have sugar in it? This one is simple: whipped cream should not have sugar in it. If you're eating whipped cream on something sweet it doesn't need sugar because the other thing is sweet, while if you're having it on something savory it doesn't need sugar because that would taste funny.
I'm sure I'm missing something, but I'm very happy over here eating whipped cream.
Discuss
Inner alignment requires making assumptions about human values
Many approaches to AI alignment require making assumptions about what humans want. On a first pass, it might appear that inner alignment is a subcomponent of AI alignment that doesn't require making these assumptions. This is because if we define the problem of inner alignment to be the problem of how to train an AI to be aligned with arbitrary reward functions, then a solution would presumably have no dependence on any particular reward function. We could imagine an alien civilization solving the same problem, despite using very different reward functions to train their AIs.
Unfortunately, the above argument fails because aligning an AI with our values requires giving the AI extra information that is not encoded directly in the reward function (under reasonable assumptions). The argument for my thesis is subtle, and so I will break it into pieces.
First, I will more fully elaborate what I mean by inner alignment. Then I will argue that the definition implies that we can't come up with a full solution without some dependence on human values. Finally, I will provide an example, in order to make this discussion less abstract.
Characterizing inner alignment

In the last few posts I wrote (1, 2), I attempted to frame the problem of inner alignment in a way that wasn't too theory-laden. My concern was that the previous characterization was dependent on a particular outcome, in which you have an AI that uses an explicit outer loop to evaluate strategies based on an explicit internal search.
In the absence of an explicit internal objective function, it is difficult to formally define whether an agent is "aligned" with the reward function that is used to train it. We might therefore define alignment as the ability of our agent to perform well on the test distribution. However, if the test set is sampled from the same distribution as the training data, this definition is equivalent to the performance of a model in standard machine learning, and we haven't actually defined the problem in a way that adds clarity.
What we really care about is whether our agent performs well on a test distribution that doesn't match the training environment. In particular, we care about the agent's performance during real-world deployment. We can estimate this real-world performance ahead of time by giving the agent a test distribution that was artificially selected to emphasize important aspects of the real world more closely than the training distribution (e.g. by using relaxed adversarial training).
To distinguish the typical robustness problem from inner alignment, we evaluate the agent on this testing distribution by observing its behaviors and evaluating it very negatively if it does something catastrophic (defined as something so bad we'd prefer it to fail completely). This information is used to iterate on future versions of the agent. An inner aligned agent is therefore defined as an agent that avoids catastrophes during testing.
The reward function doesn't provide enough informationSince reward functions are defined as mappings between stateaction pairs and a real number, our agent doesn't actually have enough information from the reward function alone to infer what good performance means on the test. This is because the test distribution contains states that were not available in the training distribution.
Therefore, no matter how much the agent learns about the true reward function during training, it must perform some implicit extrapolation of the reward function to what we intended, in order to perform well on the test we gave it.
We can visualize this extrapolation as if we were asking a supervised learner what it predicts for inputs beyond the range it was provided in its training set. It will be forced to make some guesses on what rule determines what the function looks like outside of its normal range.
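This ambiguity is easy to demonstrate with a hypothetical supervised-learning sketch: two different models fit the same training points almost perfectly, yet disagree wildly outside the training range. The data and model choices below are illustrative, not from the post:

```python
import numpy as np

# Four near-linear training points on x in [0, 3].
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.0, 1.1, 1.9, 3.0])

# Two hypotheses consistent with the training data:
linear = np.polyfit(x_train, y_train, 1)  # least-squares line
cubic = np.polyfit(x_train, y_train, 3)   # exact interpolating cubic

# Both agree closely on the training range...
assert np.allclose(np.polyval(linear, x_train), y_train, atol=0.1)
assert np.allclose(np.polyval(cubic, x_train), y_train, atol=1e-6)

# ...but diverge far outside it, e.g. at x = 10:
out_linear = np.polyval(linear, 10.0)  # about 9.8
out_cubic = np.polyval(cubic, 10.0)    # about 69.5
```

Nothing in the training data tells the learner which extrapolation is "correct"; that judgment has to come from information outside the data, which is the essay's point about catastrophes outside the training distribution.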
One might assume that we could just use simplicity as the criterion for extrapolation. Perhaps we could just say, formally, the simplest possible reward function that encodes the values observed during training is the "true reward" function that we will use to test the agent. Then the problem of inner alignment reduces to the problem of creating an agent that is able to infer the true reward function from data, and then perform well according to it inside general environments. Framing the problem like this would minimize dependence on human values.
There are a number of problems with that framing, however. To start, there are boring problems associated with using simplicity to extrapolate the reward function, such as the fact that one's notion of simplicity is language dependent, and not universal, and the universal prior is malign. Beyond these (arguably minor) issues, there's a deeper issue, which forces us to make assumptions about human values in order to ensure inner alignment.
Since we assumed that the training environment is necessarily different from the testing environment, we cannot possibly provide the agent information about every possible scenario we consider catastrophic. Therefore, the metric we were using to judge the success of the agent during testing is not captured in the training data alone. We must introduce additional information about what we consider catastrophic. This information comes in the form of our own preferences, as we prefer the agent to fail in some ways but not in others.
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize1R; src: local('MathJax_Size1'), local('MathJax_Size1Regular')} @fontface {fontfamily: MJXcTeXsize1Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size1Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size1Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size1Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize2R; src: local('MathJax_Size2'), local('MathJax_Size2Regular')} @fontface {fontfamily: MJXcTeXsize2Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size2Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size2Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size2Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize3R; src: local('MathJax_Size3'), local('MathJax_Size3Regular')} @fontface {fontfamily: MJXcTeXsize3Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size3Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size3Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size3Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize4R; src: local('MathJax_Size4'), local('MathJax_Size4Regular')} @fontface {fontfamily: MJXcTeXsize4Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size4Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size4Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size4Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecR; src: local('MathJax_Vector'), local('MathJax_VectorRegular')} @fontface {fontfamily: MJXcTeXvecRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecB; src: local('MathJax_Vector Bold'), local('MathJax_VectorBold')} @fontface {fontfamily: MJXcTeXvecBx; src: local('MathJax_Vector'); fontweight: bold} @fontface {fontfamily: MJXcTeXvecBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorBold.otf') format('opentype')} [1]
It's also important to note that if we actually did provide the agent with exactly the same data during training as it would experience during deployment, that would be equivalent to simply letting the agent learn in the real world, and there would be no difference between training and testing. Since we normally assume that providing such a perfect environment is either impossible or unsafe, the considerations in that case become quite different.
An example
I worry my discussion was a bit abstract to be useful, so I'll provide a specific example to show where my thinking lies. Consider the lunar lander example that I provided in the last post.
To reiterate, we train an agent to land on a landing pad, but during training there is a perfect correlation between whether a landing pad is painted red and whether it is a real landing pad.
During deployment, if the "true" factor determining whether a patch of ground is a landing pad is whether it is enclosed by flags, and some faraway crater is painted red, then the agent might veer off into the crater rather than landing on the landing pad.
Since there is literally not enough information during training to infer which property correctly determines whether a patch of ground is a landing pad, the agent is forced to guess whether it's the flags or the red paint. It's not exactly clear what the "simplest" inference is here, but it's coherent to imagine that "red paint determines whether something is a landing pad" is the simplest inference.
As humans, we might have a preference for the flags being the true determinant, since that resonates more with what we think a landing pad should be, and whether something is painted red is not nearly as compelling to us.
The important point is to notice that our judgement here is determined by our preferences, and not something the agent could have learned during training using some value-neutral inference. The agent must make further assumptions about human preferences for it to consistently perform well during testing.
You might wonder whether we could define catastrophe in a completely value-independent way, sidestepping this whole issue. This is the approach implicitly assumed by impact measures. However, if we want to avoid all types of situations where we'd prefer the system fail completely, I think this will require a broader notion of catastrophe than "something with a large impact." Furthermore, we would not want to penalize systems for having a large positive impact.
[AN #82 errata]: How OpenAI Five distributed their training computation
Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.
Audio version here (may not be up yet).
Highlights
Dota 2 with Large Scale Deep Reinforcement Learning (OpenAI et al) (summarized by Nicholas): In April, OpenAI Five (AN #54) defeated the world champion Dota 2 team, OG. This paper describes its training process. OpenAI et al. hand-engineered the reward function as well as some features, actions, and parts of the policy. The rest of the policy was trained using PPO with an LSTM architecture at a massive scale. They trained this in a distributed fashion as follows:
 The Controller receives and distributes the updated parameters.
 The Rollout Worker CPUs simulate the game, send observations to the Forward Pass GPUs and publish samples to the Experience Buffer.
 The Forward Pass GPUs determine the actions to use and send them to the Rollout Workers.
 The Optimizer GPUs sample experience from the Experience Buffer, calculate gradient updates, and then publish updated parameters to the Controller.
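The four-component loop above can be sketched in miniature. This is a single-process toy with illustrative names and a made-up one-parameter "policy"; nothing here is OpenAI's actual code.

```python
import random
from collections import deque

random.seed(0)  # reproducibility for this toy run

class Controller:
    """Holds the latest parameters and hands them out on request."""
    def __init__(self, params):
        self.params = params
    def publish(self, params):
        self.params = params
    def latest(self):
        return self.params

def forward_pass(params, observation):
    """Stand-in for a Forward Pass GPU: observation -> action."""
    return params * observation  # toy one-parameter policy

def rollout_step(params, experience_buffer):
    """Stand-in for a Rollout Worker: simulate one step, store the sample."""
    observation = random.uniform(-1, 1)
    action = forward_pass(params, observation)
    reward = -abs(action - observation)  # toy reward: imitate the observation
    experience_buffer.append((observation, action, reward))

def optimize(params, experience_buffer, lr=0.1):
    """Stand-in for an Optimizer GPU: one update from sampled experience."""
    batch = random.sample(list(experience_buffer), min(8, len(experience_buffer)))
    # Gradient of 0.5 * (params - 1)^2 * o^2, whose optimum makes
    # action == observation; a stand-in for the PPO update.
    grad = sum((params - 1.0) * o * o for o, _, _ in batch) / len(batch)
    return params - lr * grad

controller = Controller(params=0.0)
buffer = deque(maxlen=100)  # stand-in for the Experience Buffer
for _ in range(200):
    params = controller.latest()       # workers fetch current parameters
    rollout_step(params, buffer)       # rollout publishes experience
    controller.publish(optimize(params, buffer))  # optimizer publishes params

print(controller.latest())  # should end up near 1.0
```

The real system runs each component on separate machines with pub/sub messaging between them; the point here is only the data flow: parameters out from the controller, experience into the buffer, gradients back.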
The model trained over 296 days. In that time, OpenAI needed to adapt it to changes in the code and game mechanics. This was done via model "surgery", in which they would try to initialize a new model that maintained the same input-output mapping as the old one. When this was not possible, they gradually increased the proportion of games played with the new version over time.
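One standard way to get a function-preserving initialization is a Net2Net-style widening trick: give any newly added unit a zero outgoing weight. The paper's actual surgery is more involved; treat this as an illustrative sketch of the principle.

```python
def forward(x, w_in, w_out):
    """One-hidden-layer linear net: hidden_j = w_in[j] * x; output = dot product."""
    hidden = [w * x for w in w_in]
    return sum(h * w for h, w in zip(hidden, w_out))

w_in, w_out = [0.5, -1.2], [2.0, 0.3]   # old (2-unit) parameters
w_in_new = w_in + [0.7]                 # new unit, arbitrary incoming weight
w_out_new = w_out + [0.0]               # zero outgoing weight preserves output

x = 3.0
# Old and widened models compute exactly the same function,
# so training can continue from where the old model left off.
print(forward(x, w_in, w_out) == forward(x, w_in_new, w_out_new))  # True
```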
Nicholas's opinion: I feel similarly to my opinion on AlphaStar (AN #73) here. The result is definitely impressive and a major step up in complexity from shorter, discrete games like chess or go. However, I don’t see how the approach of just running PPO at a large scale brings us closer to AGI because we can’t run massively parallel simulations of real world tasks. Even for tasks that can be simulated, this seems prohibitively expensive for most use cases (I couldn’t find the exact costs, but I’d estimate this model cost tens of millions of dollars). I’d be quite excited to see an example of deep RL being used for a complex real world task without training in simulation.
Technical AI alignment
Technical agendas and prioritization
Just Imitate Humans? (Michael Cohen) (summarized by Rohin): This post asks whether it is safe to build AI systems that just imitate humans. The comments have a lot of interesting debate.
Agent foundations
Conceptual Problems with UDT and Policy Selection (Abram Demski) (summarized by Rohin): In Updateless Decision Theory (UDT), the agent decides "at the beginning of time" exactly how it will respond to every possible sequence of observations it could face, so as to maximize the expected value it gets with respect to its prior over how the world evolves. It is updateless because it decides ahead of time how it will respond to evidence, rather than updating once it sees the evidence. This works well when the agent can consider the full environment and react to it, and often gets the right result even when the environment can model the agent (as in Newcomb-like problems), as long as the agent knows how the environment will model it.
However, it seems unlikely that UDT will generalize to logical uncertainty and multiagent settings. Logical uncertainty occurs when you haven't computed all the consequences of your actions and is reduced by thinking longer. However, this effectively is a form of updating, whereas UDT tries to know everything upfront and never update, and so it seems hard to make it compatible with logical uncertainty. With multiagent scenarios, the issue is that UDT wants to decide on its policy "before" any other policies, which may not always be possible, e.g. if another agent is also using UDT. The philosophy behind UDT is to figure out how you will respond to everything ahead of time; as a result, UDT aims to precommit to strategies assuming that other agents will respond to its commitments; so two UDT agents are effectively "racing" to make their commitments as fast as possible, reducing the time taken to consider those commitments as much as possible. This seems like a bad recipe if we want UDT agents to work well with each other.
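As a toy illustration of "deciding at the beginning of time" (the worlds, payoffs, and names below are invented): a UDT agent scores entire policies, i.e. maps from possible observations to actions, against its prior, rather than updating on an observation and then choosing.

```python
from itertools import product

# Invented toy problem: two possible worlds, the agent observes one bit.
worlds = {"w1": 0.6, "w2": 0.4}              # prior over worlds
def observe(world):                          # what the agent would see
    return 0 if world == "w1" else 1
def utility(world, action):                  # payoff table
    return {("w1", "a"): 10, ("w1", "b"): 0,
            ("w2", "a"): 1,  ("w2", "b"): 5}[(world, action)]

observations, actions = [0, 1], ["a", "b"]

# UDT picks, up front, an entire policy (observation -> action) that
# maximizes prior-expected utility; it never conditions on the evidence.
best_policy, best_value = None, float("-inf")
for choices in product(actions, repeat=len(observations)):
    policy = dict(zip(observations, choices))
    ev = sum(p * utility(w, policy[observe(w)]) for w, p in worlds.items())
    if ev > best_value:
        best_policy, best_value = policy, ev

print(best_policy, best_value)  # {0: 'a', 1: 'b'} 8.0
```

In this fully observable toy the updateless choice coincides with ordinary updating; the two approaches come apart precisely in the cases the post discusses, where the environment predicts the agent's policy.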
Rohin's opinion: I am no expert in decision theory, but these objections seem quite strong and convincing to me.
A Critique of Functional Decision Theory (Will MacAskill) (summarized by Rohin): This summary is more editorialized than most. This post critiques Functional Decision Theory (FDT). I'm not going to go into detail, but I think the arguments basically fall into two camps. First, there are situations in which there is no uncertainty about the consequences of actions, and yet FDT chooses actions that do not have the highest utility, because of their impact on counterfactual worlds which "could have happened" (but ultimately, the agent is just leaving utility on the table). Second, FDT relies on the ability to tell when someone is "running an algorithm that is similar to you", or is "logically correlated with you". But there's no such crisp concept, and this leads to all sorts of problems with FDT as a decision theory.
Rohin's opinion: Like Buck from MIRI, I feel like I understand these objections and disagree with them. On the first argument, I agree with Abram that a decision should be evaluated based on how well the agent performs with respect to the probability distribution used to define the problem; FDT only performs badly if you evaluate on a decision problem produced by conditioning on a highly improbable event. On the second class of arguments, I certainly agree that there isn't (yet) a crisp concept for "logical similarity"; however, I would be shocked if the intuitive concept of logical similarity was not relevant in the general way that FDT suggests. If your goal is to hardcode FDT into an AI agent, or your goal is to write down a decision theory that in principle (e.g. with infinite computation) defines the correct action, then it's certainly a problem that we have no crisp definition yet. However, FDT can still be useful for getting more clarity on how one ought to reason, without providing a full definition.
Learning human intent
Learning to Imitate Human Demonstrations via CycleGAN (Laura Smith et al) (summarized by Zach): Most methods for imitation learning, where robots learn from a demonstration, assume that the actions of the demonstrator and robot are the same. This means that expensive techniques such as teleoperation have to be used to generate demonstrations. This paper presents a method for automated visual instruction-following with demonstrations (AVID) that works by translating video demonstrations done by a human into demonstrations done by a robot. To do this, the authors use CycleGAN, a method to translate an image from one domain to another using unpaired images as training data. CycleGAN allows them to translate videos of humans performing the task into videos of the robot performing the task, which the robot can then imitate. In order to make learning tractable, the demonstrations are divided up into 'key stages' so that the robot can learn a sequence of more manageable tasks. In this setup, the robot only needs supervision to ensure that it's copying each stage properly before moving on to the next one. To test the method, the authors have the robot retrieve a coffee cup and make coffee. AVID significantly outperforms other imitation learning methods and achieves 70% and 80% success rates on the two tasks, respectively.
Zach's opinion: In general, I like the idea of 'translating' demonstrations from one domain into another. It's worth noting that there do exist methods for translating visual demonstrations into latent policies. I'm a bit surprised that we didn't see any comparisons with other adversarial methods like GAIfO, but I understand that those methods have high sample complexity so perhaps the methods weren't useful in this context. It's also important to note that these other methods would still require demonstration translation. Another criticism is that AVID is not fully autonomous since it relies on human feedback to progress between stages. However, compared to kinetic teaching or teleoperation, sparse feedback from a human overseer is a minor inconvenience.
Read more: Paper: AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos
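The cycle-consistency idea at the heart of CycleGAN can be sketched as follows. This is a toy with hand-picked linear "translators" standing in for the learned generators; nothing here is from the AVID codebase.

```python
def G(x):   # human -> robot translator (toy: a fixed affine map)
    return [2.0 * v + 1.0 for v in x]

def F(y):   # robot -> human translator (toy: the inverse map)
    return [(v - 1.0) / 2.0 for v in y]

def cycle_loss(batch):
    """Mean L1 distance between x and F(G(x)): CycleGAN's cycle-consistency
    term, which lets unpaired data constrain both translators at once."""
    total, count = 0.0, 0
    for x in batch:
        for v, w in zip(x, F(G(x))):
            total += abs(v - w)
            count += 1
    return total / count

human_frames = [[0.1, 0.5, 0.9], [0.3, 0.3, 0.7]]  # stand-in pixel vectors
print(cycle_loss(human_frames))  # near zero, since F inverts G
```

In the real method G and F are neural networks trained jointly with adversarial losses; the cycle term is what makes translation learnable without paired human/robot frames.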
Preventing bad behavior
When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors (Stuart Armstrong) (summarized by Flo): Suppose we were uncertain about which arm in a bandit provides reward (and we don’t get to observe the rewards after choosing an arm). Then, maximizing expected value under this uncertainty is equivalent to picking the most likely reward function as a proxy reward and optimizing that; Goodhart’s law doesn’t apply and is thus not universal. This means that our fear of Goodhart effects is actually informed by more specific intuitions about the structure of our preferences. If there are actions that contribute to multiple possible rewards, optimizing the most likely reward need not maximize the expected reward. Even if we optimize for expected reward, we have a problem if value is complex and the way we do reward learning implicitly penalizes complexity. Another problem arises if the correct reward is comparatively difficult to optimize: if we want to maximize the average, it can make sense to only care about rewards that are both likely and easy to optimize. Relatedly, we could fail to correctly account for diminishing marginal returns in some of the rewards.
Goodhart effects are a lot less problematic if we can deal with all of the mentioned factors. Independent of that, Goodhart effects are most problematic when there is little middle ground that all rewards can agree on.
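The opening bandit claim is easy to check numerically (illustrative numbers; each hypothesis says exactly one arm is rewarded, and rewards are unobserved after choosing):

```python
prior = {"arm0": 0.5, "arm1": 0.3, "arm2": 0.2}  # P("arm i is the rewarded one")

def expected_reward(arm):
    # Reward is 1 iff the chosen arm is the one the true hypothesis rewards,
    # so the expectation is just that hypothesis's prior probability.
    return prior[arm]

best_by_expectation = max(prior, key=expected_reward)
most_likely_proxy = max(prior, key=prior.get)
print(best_by_expectation == most_likely_proxy)  # True: no Goodhart effect

# The equivalence breaks once one action contributes to several possible
# rewards: a "hedge" action paying 0.9 under every hypothesis beats the
# most likely arm in expectation, even though no single proxy reward
# would ever pick it.
values = {"arm0": 0.5, "arm1": 0.3, "arm2": 0.2, "hedge": 0.9}
print(max(values, key=values.get))  # hedge
```

This matches the post's point: Goodhart problems come from structure like shared contributions and diminishing returns, not from reward uncertainty per se.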
Flo's opinion: I enjoyed this article and the proposed factors match my intuitions. There seem to be two types of problems: extreme beliefs and concave Pareto boundaries. Dealing with the second is more important since a concave Pareto boundary favours extreme policies, even for moderate beliefs. Luckily, diminishing returns can be used to bend the Pareto boundary. However, I expect it to be hard to find the correct rate of diminishing returns, especially in novel situations.
Rohin's opinion: Note that this post considers the setting where we have uncertainty over the true reward function, but we can't learn about the true reward function. If you can gather information about the true reward function, which seems necessary to me (AN #41), then it is almost always worse to take the most likely reward or expected reward as a proxy reward to optimize.
Robustness
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (Dan Hendrycks, Norman Mu et al) (summarized by Dan H): This paper introduces a data augmentation technique to improve robustness and uncertainty estimates. The idea is to take various random augmentations such as random rotations, produce several augmented versions of an image with compositions of random augmentations, and then pool the augmented images into a single image by way of an elementwise convex combination. Said another way, the image is augmented with various traditional augmentations, and these augmented images are "averaged" together. This produces highly diverse augmentations that remain similar to the original image. Unlike techniques such as AutoAugment, this augmentation technique uses typical resources, not 15,000 GPU hours. It also greatly improves generalization to unforeseen corruptions, and it makes models more stable under small perturbations. Most importantly, even as the distribution shifts and accuracy decreases, this technique produces models that remain calibrated under distributional shift.
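The pooling step can be sketched as an elementwise convex combination. This is a toy on flat pixel lists; real AugMix samples chains of image-level augmentations and Dirichlet/Beta mixing weights, and also mixes the result back with the original image.

```python
import random

def augmix(image, augmentations, k=3):
    """Pool k randomly augmented copies via an elementwise convex combination."""
    copies = []
    for _ in range(k):
        aug = random.choice(augmentations)   # real AugMix composes chains
        copies.append([aug(v) for v in image])
    weights = [random.random() for _ in range(k)]
    total = sum(weights)
    weights = [w / total for w in weights]   # convex: nonnegative, sums to 1
    return [sum(w * c[i] for w, c in zip(weights, copies))
            for i in range(len(image))]

# Toy per-pixel "augmentations" standing in for rotations, posterize, etc.
augs = [lambda v: min(1.0, v + 0.1), lambda v: max(0.0, v - 0.1), lambda v: v]

random.seed(0)
mixed = augmix([0.2, 0.5, 0.8], augs)
print(all(0.0 <= v <= 1.0 for v in mixed))  # diverse yet still a valid image
```

Because the combination is convex, each output pixel stays inside the range spanned by its augmented copies, which is why the mixture remains close to the original image.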
Miscellaneous (Alignment)
Defining and Unpacking Transformative AI (Ross Gruetzemacher et al) (summarized by Flo): The notion of transformative AI (TAI) is used to highlight that even narrow AI systems can have large impacts on society. This paper offers a clearer definition of TAI and distinguishes it from radical transformative AI (RTAI).
"Discontinuities or other anomalous patterns in metrics of human progress, as well as irreversibility, are common indicators of transformative change. TAI is then broadly defined as an AI technology which leads to an irreversible change of some important aspects of society, making it a (multidimensional) spectrum along the axes of extremity, generality and fundamentality." For example, advanced AI weapon systems might have strong implications for great power conflicts but limited effects on people's daily lives; an extreme change of limited generality, similar to nuclear weapons. There are two levels: while TAI is comparable to general-purpose technologies (GPTs) like the internal combustion engine, RTAI leads to changes that are comparable to the agricultural or industrial revolution. Both revolutions were driven by GPTs like the domestication of plants and the steam engine. Similarly, we will likely see TAI before RTAI. The scenario where we don't is termed a radical shift.
Nonradical TAI could still contribute to existential risk in conjunction with other factors. Furthermore, if TAI precedes RTAI, our management of TAI can affect the risks RTAI will pose.
Flo's opinion: Focusing on the impacts on society instead of specific features of AI systems makes sense, and I do believe that the shape of RTAI, as well as the risks it poses, will depend on the way we handle TAI at various levels. More precise terminology can also help to prevent misunderstandings, for example between people forecasting AI and decision makers.
Six AI Risk/Strategy Ideas (Wei Dai) (summarized by Rohin): This post briefly presents three ways that power can become centralized in a world with Comprehensive AI Services (AN #40), argues that under risk aversion "logical" risks can be more concerning than physical risks because they are more correlated, proposes combining human imitations and oracles to remove the human in the loop and become competitive, and suggests doing research to generate evidence of difficulty of a particular strand of research.
Copyright © 2020 Rohin Shah, All rights reserved.
WORKSHOP ON ASSURED AUTONOMOUS SYSTEMS (WAAS)
This may be of interest to people interested in AI Safety. This event is part of the 2020 IEEE Symposium on Security and Privacy and is being sponsored by the Johns Hopkins University Institute for Assured Autonomy.
https://www.ieee-security.org/TC/SPW2020/WAAS/
Why Do You Keep Having This Problem?
One thing I've noticed recently is that when someone complains about how a certain issue "just keeps happening" or they "keep having to deal with it", it often seems to indicate an unsolved problem that people may not be aware of. Some examples:
 Players of a game repeatedly ask the judges at an event the same rules questions. This doesn't mean everyone is bad at reading; it likely indicates an area of the rules that is unclear or misleadingly written.
 People keep trying to open a door the wrong way, either pulling on a door that's supposed to be pushed or pushing a door that's supposed to be pulled; it's quite possible the handle is designed poorly, in a way that gives people the wrong idea of how to use it. (The Design of Everyday Things has more examples of this sort of issue.)
 Someone keeps hearing the same type of complaint or having the same conversation about a particular policy at work; this might be a sign that the policy has issues. [1]
 Every time someone tries to moderate a forum they run, lots of users protest against their actions and call them unjust; this might be a sign that they're making bad moderation decisions.
I'm not going to say that all such cases are ones where things should change; it's certainly possible that one might have to take unpopular but necessary measures under some circumstances. But I do think this sort of pattern should be a pretty clear warning sign that something might be going wrong.
Thus, I suspect you should consider these sorts of patterns not just as "some funny thing that keeps happening" or whatever, but rather as potential indicators of "bugs" to be corrected!
[1] This post was primarily inspired by a situation in which I saw someone write "This is the fifth time I've had this conversation in the last 24 hours and I'm sick of it", or words to that effect; the reason they kept having that conversation, at least in my view, was that they were implementing a bad policy and people kept questioning them on it (with perhaps varying degrees of politeness).