LessWrong.com News
(Double)Inverse Embedded Agency Problem
MIRI has said a lot about the issue of embedded agency over the last year. However, I have yet to see them try to make progress in what I see as the most promising areas.
How does one attack a problem that is new, complicated and nonobvious? By constructing toy models and inverting hard questions to make them more tractable.
In general, an inverse problem is harder than the "direct" one, because we are trying to infer unobservables from observables. Wikipedia gives the example of figuring out the position of Neptune from the perturbations in the orbit of Uranus. Another popular example is NP-complete problems: they are famously hard to solve, but it is easy to verify a solution. Another example: when you take a multiple-choice math quiz, it is often faster and easier to get the right answer by plugging the 4 or 5 potential solutions into the stated problem than to solve the problem directly.
I'll give an example from my own area. The equations of general relativity are hard to solve except in a few highly symmetric cases. It is a classic inverse problem. But! Any spacetime metric is actually a solution of the Einstein equations, so all one needs to do is to write down a metric and calculate its Einstein tensor to see what kind of a matter distribution (and boundary conditions) it is a solution of. Inverting the inverse problem! Of course, life is not that easy. Most solutions correspond to "unphysical" matter, usually with negative energy density, superluminal flows, singularities, infinities, weird topologies etc. However, it is a useful approach if one wants to study some general properties of the equations, and get a feel for (or sometimes a theorem about) what goes wrong, why and how. After a few iterations one can get better at guessing what form a "good" solution might take, and write up an ansatz that can help solve the original, not the inverse problem in some cases.
Another, more familiar example: arithmetic division. Until you learn or figure out the rules, it's hard. But its inverse problem, multiplication, is actually much easier! So to learn more about division, it pays to start with potential solutions and see which multiplications actually solve the division problem. Eventually one can come up with the long division algorithm, which uses nothing but multiplication and subtraction. And voila, inverting an inverse problem helps us solve the original one.
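The "guess a solution, check it by multiplying" idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `long_divide` is made up for this example, it handles non-negative dividends and positive divisors only, and it uses comparison and digit bookkeeping in addition to multiplication and subtraction.

```python
def long_divide(dividend: int, divisor: int) -> tuple:
    """Schoolbook long division: each quotient digit is found by trying
    multiplications (the easy direction) until one overshoots."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("this sketch needs dividend >= 0 and divisor > 0")
    quotient = 0
    remainder = 0
    for digit_char in str(dividend):
        # Bring down the next decimal digit of the dividend.
        remainder = remainder * 10 + int(digit_char)
        # Guess the quotient digit by checking candidate products.
        q_digit = 0
        while (q_digit + 1) * divisor <= remainder:
            q_digit += 1
        # Subtract the product that fit, and record the digit.
        remainder -= q_digit * divisor
        quotient = quotient * 10 + q_digit
    return quotient, remainder


q, r = long_divide(1234, 7)
# Verifying the answer is again just a multiplication.
print(q, r, q * 7 + r == 1234)
```

Note that the check on the last line is the "direct" problem: once a candidate quotient and remainder are on the table, confirming them is trivial.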
This approach is common in computer science, as well. Plenty of algorithms, like search, actually rely on solving smaller and simpler inverse problems.
I contend that a similar approach could be useful for making progress in understanding embedded agency. To that end, let's first restate the original problem of embedded agency (copied from the alignment forum page):
How can one make good models of the world that are able to fit within an agent that is much smaller than the world?
This is a hard inverse problem! There are many facets of it, such as the oft-mentioned problem of logical counterfactuals, that do not seem to yield to direct attacks. So, it seems natural to learn to "seek under the light" before stepping into the darkness, and that includes, you guessed it, constructing toy models and inverting the inverse problems.
What would inverting this problem look like? There are multiple possible formulations, just as the inverse of the power operation a^b is both the nth root and the logarithm. Here are a couple of ideas:
 Create a toy universe and look for its representations inside.
 Create a toy model and construct a world around it such that the model represents the world in some way.
Here is an example: a fractal is self-similar, so any subset of it can be thought of as a near-perfect model of the whole. Of course, a model is not enough; one has to figure out what would constitute an agent using this model in this fractal world. But at least it can be a promising and potentially illuminating direction to explore. There are plenty more ideas one can come up with after thinking about it for 5 minutes.
I hope someone at MIRI is either thinking along these directions, or is ready to try to, instead of being stuck analyzing the messy and complicated inverse problem that is the "real world".
The Final Vote (for the LW 2018 Review)
It's now time for us, together, to decide what were the most valuable LessWrong posts in 2018.
Voting is live, and available for all users with 1000 karma, which is 430 accounts.
The vote will be open for 12 days, and will close on Sunday 19th January. (Well, we'll turn it off on Monday after we get into work, probably early afternoon, ensuring all timezones get it throughout Sunday.)
To enter your votes, go to the vote button on the frontpage in the "LessWrong 2018 Review" section, or click "Vote Here" at the bottom of this post.
How To Vote
Sorting Posts Into Buckets
The first part of voting is sorting the nominated posts into buckets.
The five buckets are: No, Neutral, Good, Important, Crucial. Sort as you think is best.
The key part is in the relative weighting of different posts. If you put every post in 'crucial' or every post in 'good', this won't end up having a different effect on your final vote.
Fine-Tuning Your Votes
The system we're using is quadratic voting (as I discussed a few weeks ago).
Once you're happy with your buckets, click 'Convert to Quadratic'. At this point the system converts your buckets roughly into their quadratic equivalents. The system will only assign integer numbers of votes, which means here that it will likely allocate around 80-90% of the total votes available to you. If you vote on a smaller number of posts (<10), the automatic system may cast substantially fewer of your votes.
If you're happy with how things look, you can just leave at this point, and your votes will be saved (you can come back any time before the vote closes to update them). But if you want to allocate all the remaining votes available, you'll need to do fine-tuning.
There are two key parts of quadratic voting you need to know.
 First, you have a limited budget of how much you're allowed to spend on votes.
 Second, votes on a post have increasing marginal cost.
Here this means that your first vote costs 1 point, your second vote on that post costs 2 points, your third costs 3 points. Your nth vote costs n points.
You have 500 points to spend. You can see how many points you've spent at the top of the posts.
The system will automatically weight the buckets differently. For example, I just did this, and I got the following weightings:
 Good: 2 votes.
 Important: 4 votes.
 Crucial: 9 votes.
 Neutral: 0 votes.
 No: -4 votes.
(Note that negative votes cost the same as positive votes. The first negative vote costs 1 point, the second negative vote costs 2 points, etc.)
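To make the arithmetic concrete, here is a minimal sketch of the cost rule as described above: the nth vote on a post costs n points, so n votes on one post cost 1 + 2 + ... + n = n(n+1)/2 points, and negative votes cost the same as positive ones. The function names and the example ballot are hypothetical, and this is not the site's actual implementation.

```python
BUDGET = 500  # points available to each voter, as stated in the post


def cost(votes: int) -> int:
    """Points needed to cast `votes` votes on a single post.

    The nth vote costs n points, so the total is n * (n + 1) / 2.
    The sign of the vote does not change the price.
    """
    n = abs(votes)
    return n * (n + 1) // 2


def total_cost(ballot: dict) -> int:
    """Sum the per-post costs over a whole ballot."""
    return sum(cost(v) for v in ballot.values())


# Hypothetical ballot: one post per bucket, using the example weightings.
ballot = {"post A": 9, "post B": 4, "post C": 2, "post D": 0, "post E": -4}
spent = total_cost(ballot)
print("spent:", spent, "remaining:", BUDGET - spent)
```

Because the cost grows roughly with the square of the number of votes, concentrating votes on one post is expensive: under this rule 9 votes on a single post cost 45 points, while 3 votes on each of three posts cost only 18.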
You'll see your score at the top of the page. When I arrived on the fine-tuning page, the system had spent about 416 points, which meant I had a fair number more votes to buy.
Once you're happy with the balance, just close the page; your votes will be saved.
You can return to this page anytime until voting is over, to reconfigure your weights.
Leaving Comments (Anonymous)
There's a field to leave anonymous thoughts on a post. All comments written here will be put into a public google doc, and be linked to from the post that announces the result of the vote. If you want to share your thoughts, however briefly, this is a space to do that.
I will likely be making a book of 2018 posts, and if so will use the vote as my guide to what to include, so I know I'll be interested in reading through people's anonymous thoughts and feelings about the 2018 LW posts.
Extra Notes
If you'd like to go back to the buckets stage, then hit “Return to basic voting”. If you do this, all of your fine-tuning will be thrown out the window, and the system will recalculate your weights entirely based on the new buckets you assign.
I find it really valuable to be able to see the posts in the order I've currently ranked them, so there's a button at the top to reorder the posts, which I expect to be clicked dozens of times by each user.
On the side, there's a box showing you all nominations and reviews for the post you click on, which you might want to read.
The results
This vote will have many outputs. Once we've had the time to analyse the data, we'll include a bunch of data/graphs, all anonymised, such as:
 For each winning post and each bucket, how many times people picked that bucket
 The individual votes each post got.
 The results if the karma cutoff was 10,000 karma rather than 1,000.
 The output of people's buckets compared with the output of people's quadratic fine-tuning
 The mean and standard deviation of the votes
Vote Here (if you have more than 1000 karma)
How to Throw Away Information in Causal DAGs
When constructing a high-level abstract causal DAG from a low-level DAG, one operation which comes up quite often is throwing away information from a node. This post is about how to do that.
First, how do we throw away information from random variables in general? Sparknotes:
 Given a random variable $X$, we can throw away information by replacing $X$ with $f(X)$ for some function $f$.
 Given some other random variable $Y$, $f(X)$ “contains all information in $X$ relevant to $Y$” if and only if $P[Y \mid f(X)] = P[Y \mid X]$. In particular, the full distribution function $(y \mapsto P[Y = y \mid X])$ is a minimal representation of the information about $Y$ contained in $X$.
For more explanation of this, see Probability as Minimal Map.
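The criterion $P[Y \mid f(X)] = P[Y \mid X]$ can be checked mechanically on a small discrete example. The sketch below is purely illustrative: the toy distribution (a die roll $X$ and a noisy copy $Y$ of its parity), the 9/10 noise level, and the helper names `conditional_on` and `preserves_information` are all assumptions made for this example, not something from the original post.

```python
from collections import defaultdict
from fractions import Fraction

X_VALUES = range(1, 7)  # hypothetical X: a fair six-sided die
Y_VALUES = (0, 1)       # hypothetical Y: a noisy copy of the parity of X


def p_y_given_x(y, x):
    """Y equals the parity of X with probability 9/10 (assumed noise level)."""
    match_prob = Fraction(9, 10)
    return match_prob if y == x % 2 else 1 - match_prob


# Joint distribution P[X = x, Y = y] as an exact table.
joint = {(x, y): Fraction(1, 6) * p_y_given_x(y, x)
         for x in X_VALUES for y in Y_VALUES}


def conditional_on(g):
    """Return P[Y = y | g(X) = s] as a dict keyed by (s, y)."""
    mass = defaultdict(Fraction)      # P[g(X) = s, Y = y]
    marginal = defaultdict(Fraction)  # P[g(X) = s]
    for (x, y), p in joint.items():
        mass[(g(x), y)] += p
        marginal[g(x)] += p
    return {(s, y): p / marginal[s] for (s, y), p in mass.items()}


def preserves_information(f):
    """True iff P[Y | f(X)] equals P[Y | X] at every value x of X."""
    full = conditional_on(lambda x: x)
    coarse = conditional_on(f)
    return all(coarse[(f(x), y)] == full[(x, y)] for (x, y) in full)


print(preserves_information(lambda x: x % 2))  # keeps the parity
print(preserves_information(lambda x: x > 3))  # keeps only "high or low"
```

In this toy setup, keeping only the parity of the die loses nothing relevant to $Y$, whereas keeping only whether the roll is above 3 does change the conditional distribution of $Y$.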
For our purposes, starting from a low-level causal DAG, we want to:
 Pick a node $X_i$
 Pick a set of nodes $X_S$ (with $i \in S$)
 Replace $X_i$ by $f(X_i)$ for some function $f$, such that
 $P[X_{\bar S} \mid f(X_i)] = P[X_{\bar S} \mid X_i]$
Here $\bar S$ denotes all the node indices outside $S$. (Specifying $S$ rather than $\bar S$ directly will usually be easier in practice, since $X_S$ is usually a small neighborhood of nodes around $X_i$.) In English: we want to throw away information from $X_i$, while retaining all information relevant to nodes outside the set $S$.
Two prototypical examples:
 In a digital circuit, we pick the voltage in a particular wire at a particular time as $X_i$. Assuming the circuit is well designed, we will find that only the binary value $\mathrm{bin}(X_i)$ is relevant to voltages far away in the circuit or in time. So, with all “nearby” voltages as $X_S$, we can replace $X_i$ by $\mathrm{bin}(X_i)$.
 In a fluid, we pick the (microscopic) positions and momenta of all the particles in a little cell of spacetime as $X_i$. Assuming uniform temperature, identical particles, and some source of external noise (even just a little bit), we expect that only the total number and momentum of particles in the cell will be relevant to the positions and momenta of particles far away in space and time. So, with all “nearby” cells/particles as $X_S$, we can replace the microscopic positions and momenta of all particles in the cell with the total number and momentum of particles in the cell.
In both examples, we’re throwing out “local” information, while maintaining any information which is relevant “globally”. This will mean that local queries (e.g. the voltage in one wire given the voltage in a neighboring wire at the same time) are not supported; short-range correlations violate the abstraction. However, large-scale queries (e.g. the voltage in a wire now given the voltage in a wire a few seconds ago) are supported.
Modifying Children
We still have one conceptual question to address: when we replace $X_i$ by $f(X_i)$, how do we modify children nodes of $X_i$ to use $f(X_i)$ instead?
The first and most important answer is: it doesn’t matter, so long as whatever they do is consistent with $f(X_i)$. For instance, suppose $X_i$ ranges over $\{-1, 0, 1\}$, and $f(X_i) = X_i^2$. When $f(X_i) = 1$, the children can act as though $X_i$ were $-1$ or $1$; it doesn’t matter which, so long as they don’t act like $X_i = 0$. As long as the children’s behavior is consistent with the information in $f(X_i)$, we will be able to support long-range queries.
There is one big catch, however: the children do need to all behave as if $X_i$ had the same value, whatever value they choose. The joint distribution $P[X_{ch(i)} \mid X_{sp(i)}, f(X_i)]$ (where ch(i) = children of i and sp(i) = spouses of i) must be equal to $P[X_{ch(i)} \mid X_{sp(i)}, X_i^*]$ for some value $X_i^*$ consistent with $f(X_i)$. The simplest way to achieve this is to pick a particular “representative” value $X_i^*(f^*)$ for each possible value $f^*$ of $f(X_i)$, so that $f(X_i^*(f^*)) = f^*$.
Example: in the digital circuit case, we would pick one representative “high” voltage (for instance the supply voltage $V_{DD}$) and one representative “low” voltage (for instance the ground voltage $V_{SS}$). $X_i^*(f(X_i))$ would then map any high voltages to $V_{DD}$ and any low voltages to $V_{SS}$.
Once we have our representative value function $X_i^*(f(X_i))$, we just have the children use $X_i^*(f(X_i))$ in place of $X_i$.
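A rough sketch of the representative-value construction for the digital circuit example follows. Everything concrete here is an assumption for illustration: the 1.8 V supply, the midpoint threshold, the idealised `inverter` child node, and the helper names are hypothetical, and real gates are not perfectly insensitive to the analog input.

```python
V_DD = 1.8                      # hypothetical supply voltage, in volts
V_SS = 0.0                      # hypothetical ground voltage, in volts
THRESHOLD = (V_DD + V_SS) / 2   # hypothetical logic threshold


def f(voltage: float) -> int:
    """Throw away everything about the voltage except its binary value."""
    return 1 if voltage > THRESHOLD else 0


# One representative voltage for each possible value of f.
REPRESENTATIVE = {1: V_DD, 0: V_SS}


def representative(f_value: int) -> float:
    """Map a value of f back to a voltage consistent with it."""
    return REPRESENTATIVE[f_value]


def inverter(input_voltage: float) -> float:
    """A hypothetical child node: an idealised inverter."""
    return V_SS if input_voltage > THRESHOLD else V_DD


def abstracted_child(child, parent_voltage: float) -> float:
    """Run a child on the representative value instead of the raw parent value."""
    return child(representative(f(parent_voltage)))


# Consistency requirement from the text: f(representative(f_star)) == f_star.
print(all(f(representative(b)) == b for b in (0, 1)))
# Every child sees the same stand-in voltage for a given binary value.
print(abstracted_child(inverter, 1.63), abstracted_child(inverter, 0.2))
```

Since every child is handed the same stand-in voltage, they automatically agree on what value the parent "had", which is the catch described above.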
If we want, we could even simplify one step further: we could just choose f to spit out representative values directly. That convention is cleaner for proofs and algorithms, but a bit more confusing for human usage and examples.
Morality vs related concepts
How can you know I’m talking about morality (aka ethics), rather than something else, when I say that I “should” do something, that humanity “ought” to take certain actions, or that something is “good”? What are the borderlines and distinctions between morality and these various “something else”s? How do they overlap and interrelate?
In this post, I try to collect together and summarise philosophical concepts that are relevant to the above questions.[1] I hope this will benefit readers by introducing them to some thought-clarifying conceptual distinctions they may not have been aware of, as well as terms and links they can use to find more relevant info. With this groundwork established, my next post will discuss what moral uncertainty is.
Epistemic status: The concepts covered here are broad, fuzzy, and overlap in various ways, making definitions and distinctions between them almost inevitably debatable. Additionally, I’m not an expert in these topics; indeed, I expect many readers to know more than me about at least some of them, and one reason I wrote this was to help me clarify my own understandings. I’d appreciate feedback or comments in relation to any mistakes, poor phrasings, etc. (and just in general!).
Also note that my intention here is mostly to summarise existing ideas, rather than to provide original ideas or analysis.
NormativityA normative statement is any statement related to what one should do, what one ought to do, which of two things are better, or similar. “Something is said by philosophers to have ‘normativity’ when it entails that some action, attitude or mental state of some other kind is justified, an action one ought to do or a state one ought to be in” (Darwall). Normativity is thus the overarching category (superset) of which things like morality, prudence (in the sense explained below), and arguably rationality are just subsets.
This matches the usage of “normative” in economics, where normative claims relate to “what ought to be” (e.g., “The government should increase its spending”), while positive claims relate to “what is” (including predictions, such as what effects an increase in government spending may have). In linguistics, the equivalent distinction is between prescriptive approaches (involving normative claims about “better” or “correct” uses of language) and descriptive approaches (which investigate about how language is used).
Prudence
Prudence essentially refers to the subset of normativity that has to do with one’s own self-interest, happiness, or wellbeing (see here and here). This contrasts with morality, which may include but isn’t limited to one’s self-interest (except perhaps for egoist moral theories).
For example (based on MacAskill p. 41), we may have moral reasons to give money to GiveWell-recommended charities, but prudential reasons to spend the money on ourselves, and both sets of reasons are “normatively relevant” considerations.
(The rest of this section is my own analysis, and may be mistaken.)
I would expect that the significance of prudential reasons, and how they relate to moral reasons, would differ depending on the moral theories one is considering (e.g., depending on which moral theories one has some belief in). Considering moral and prudential reasons separately does seem to make sense in relation to moral theories that don’t precisely mandate specific behaviours; for example, moral theories that simply forbid certain behaviours (e.g., violating people’s rights) while otherwise letting one choose from a range of options (e.g., donating to charity or not).[2]
In contrast, “maximising” moral theories like classical utilitarianism claim that the only action one is permitted to take is the very best action, leaving no room for choosing the “prudentially best” action out of a range of “morally acceptable” actions. Thus, in relation to maximising theories, it seems like keeping track of prudential reasons instead of only moral reasons, and sometimes acting based on prudential rather than moral reasons, would mean one is effectively either:
 using a modified version of the maximising moral theory (rather than the theory itself), or
 acting as if “morally uncertain” between the maximising moral theory and a “moral theory” in which prudence is seen as “intrinsically valuable”.
Either way, the boundary between prudence and morality seems to become fuzzier or less meaningful in such cases.[3]
(Instrumental) Rationality
(This section is sort of my own analysis, and may be mistaken or use terms in unusual ways.)
Rationality, in one important sense at least, has to do with what one should do or intend, given one’s beliefs and preferences. This is the kind of rationality that decision theory often is seen as invoking. It can be spelled out in different ways. One is to see it as a matter of coherence: It is rational to do or intend what coheres with one’s beliefs and preferences (Broome, 2013; for a critic, see Arpaly, 2000).
Using this definition, it seems to me that:
 Rationality can be considered a subset of normativity in which the “should” statements, “ought” statements, etc. follow in a systematic way from one’s beliefs and preferences.
 Whether a “should” statement, “ought” statement, etc. is rational is unrelated to the balance of moral or prudential reasons involved. E.g., what I “rationally should” do relates only to morality and not prudence if my preferences relate only to morality and not prudence, and vice versa (with situations in between those extremes being possible too, of course).[4]
For example, the statement “I should give someone some money for a burrito” is true if I believe that doing so will result in me being able to eat a burrito, and I value that outcome more than I value continuing to have that money. And it doesn’t matter whether the reason I value that outcome is:
 Prudential: based on self-interest;
 Moral: e.g., I’m a utilitarian who believes that the best way I can use my money to increase universe-wide utility is to buy myself a burrito (perhaps it looks really tasty and my biases are self-serving the hell out of me);
 Some mixture of the two.
Note that that discussion focused on instrumental rationality, but the same basic points could be made in relation to epistemic rationality, given that epistemic rationality itself “can be seen as a form of instrumental rationality in which knowledge and truth are goals in themselves” (LW Wiki).
For example, I could say that, from the perspective of epistemic rationality, I “shouldn’t” believe that buying that burrito will create more utility in expectation than donating the same money to AMF would. This is because holding that belief won’t help me meet the goal of having accurate beliefs.
Whether and how this relates to morality would depend on whether the “deeper reasons” why I prefer to have accurate beliefs (assuming I do indeed have that preference) are prudential, moral, or mixed.[5]
Subjective vs objective
Subjective normativity relates to what one should do based on what one believes, whereas objective normativity relates to what one “actually” should do (i.e., based on the true state of affairs). Greaves and Cotton-Barratt illustrate this distinction with the following example:
Suppose Alice packs the waterproofs but, as the day turns out, it does not rain. Does it follow that Alice made the wrong decision? In one (objective) sense of “wrong”, yes: thanks to that decision, she experienced the mild but unnecessary inconvenience of carrying bulky raingear around all day. But in a second (more subjective) sense, clearly it need not follow that the decision was wrong: if the probability of rain was sufficiently high and Alice sufficiently dislikes getting wet, her decision could easily be the appropriate one to make given her state of ignorance about how the weather would in fact turn out. Normative theories of decision-making under uncertainty aim to capture this second, more subjective, type of evaluation; the standard such account is expected utility theory.[6][7]
This distinction can be applied to each subtype of normativity (i.e., morality, prudence, etc.).
(I’ll discuss this distinction further in my next post, on the nature of moral uncertainty.)
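The waterproofs example can be made concrete with a minimal expected-utility sketch. The utility numbers and rain probabilities below are hypothetical, chosen only for illustration; they do not come from Greaves and Cotton-Barratt. The "subjective" evaluation picks the action with the highest expected utility given Alice's credence in rain, while the "objective" evaluation picks the best action given how the weather actually turned out.

```python
# Hypothetical utilities for each (action, weather) pair; assumed values, not from the source.
UTILITY = {
    ("pack", "rain"): -1.0,    # mild inconvenience of carrying raingear
    ("pack", "dry"): -1.0,     # same inconvenience, turned out unnecessary
    ("leave", "rain"): -10.0,  # getting wet, which Alice strongly dislikes
    ("leave", "dry"): 0.0,     # best case: no raingear, no rain
}
ACTIONS = ("pack", "leave")


def expected_utility(action, p_rain, utility=UTILITY):
    """Expected utility of an action given a credence p_rain that it will rain."""
    return (p_rain * utility[(action, "rain")]
            + (1 - p_rain) * utility[(action, "dry")])


def subjective_choice(p_rain, utility=UTILITY):
    """What Alice 'subjectively should' do: maximise expected utility under her credence."""
    return max(ACTIONS, key=lambda a: expected_utility(a, p_rain, utility))


def objective_choice(actual_weather, utility=UTILITY):
    """What Alice 'objectively should' have done, given the weather that actually occurred."""
    return max(ACTIONS, key=lambda a: utility[(a, actual_weather)])


if __name__ == "__main__":
    # With a 60% credence in rain, packing is subjectively right...
    print(subjective_choice(0.6))
    # ...even though, on a day that stays dry, leaving the raingear was objectively better.
    print(objective_choice("dry"))
```

Under these assumed numbers the two evaluations come apart exactly as in the quote: the same decision can be subjectively appropriate and objectively "wrong".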
Axiology
The term axiology is used in different ways in different contexts, but the definition we’ll focus on here is from the Stanford Encyclopaedia of Philosophy:
Traditional axiology seeks to investigate what things are good, how good they are, and how their goodness is related to one another. Whatever we take the “primary bearers” of value to be, one of the central questions of traditional axiology is that of what stuffs are good: what is of value.
The same article also states: “For instance, a traditional question of axiology concerns whether the objects of value are subjective psychological states, or objective states of the world.”
Axiology (in this sense) is essentially one aspect of morality/ethics. For example, classical utilitarianism combines:
 the principle that one must take actions which will lead to the outcome with the highest possible level of value, rather than just doing things that lead to “good enough” outcomes, or just avoiding violating people’s rights
 the axiology that “wellbeing” is what has intrinsic value
The axiology itself is not a moral theory, but plays a key role in that moral theory.
Thus, one can’t have an axiological “should” statement, but one’s axiology may influence/inform one’s moral “should” statements.
Decision theory
I had a vague sense that this section was less important/interesting, so I put it in this footnote instead.[8]
Metaethics
While normative ethics addresses such questions as "What should I do?", evaluating specific practices and principles of action, metaethics addresses questions such as "What is goodness?" and "How can we tell what is good from what is bad?", seeking to understand the nature of ethical properties and evaluations. (Wikipedia)
Thus, metaethics is not directly normative at all; it isn’t about making “should”, “ought”, “better than”, or similar statements. Instead, it’s about understanding the “nature” of (the moral subset of) such statements, “where they come from”, and other such fun/spooky/nonsense/incredibly important matters.
Metanormativity
Metanormativity relates to the “norms that govern how one ought to act that take into account one’s fundamental normative uncertainty”. Normative uncertainty, in turn, is essentially a generalisation of moral uncertainty that can also account for (uncertainty about) prudential reasons. I will thus discuss the topic of metanormativity in my next post, which centres on the question: “What is moral uncertainty?”
As stated earlier, I hope this usefully added to/clarified the concepts in your mental toolkit, and I’d welcome any feedback or comments!
(In particular, if you think there’s another concept whose overlaps with/distinctions from “morality” are worth highlighting, either let me know to add it, or just go ahead and explain it in the comments yourself.)
This post won’t attempt to discuss specific debates within metaethics, such as whether or not there are “objective moral facts”, and, if there are, whether or not these facts are “natural”. Very loosely speaking, I’m not trying to answer questions about what morality itself actually is, but rather about the overlaps and distinctions between what morality is meant to be about and what other topics that involve “should” and “ought” statements are meant to be about. ↩︎
Considering moral and prudential reasons separately also seems to make sense for moral theories which see supererogation as possible; that is, theories which see some acts as “morally good although not (strictly) required” (SEP). If we only believe in such theories, we may often find ourselves deciding between one act that’s morally “good enough” and another (supererogatory) act that’s morally better but prudentially worse. (E.g., perhaps, occasionally donating small sums to whichever charity strikes one’s fancy, vs donating 10% of one’s income to charities recommended by Animal Charity Evaluators.) ↩︎
The boundary seems even fuzzier when you also consider that many moral theories, such as classical or preference utilitarianism, already consider one’s own happiness or preferences to be morally relevant. This arguably makes also considering “prudential reasons” look like simply “double-counting” one’s self-interest, or giving it additional “weight”. ↩︎
If we instead used a definition of rationality in which preferences must only be based on self-interest, then I believe rationality would become a subset of prudence specifically, rather than of normativity as a whole. It would still be the case that the distinctive feature of rational “should” statements is that they follow in a systematic way from one’s beliefs and preferences. ↩︎
Somewhat relevantly, Darwall writes: “Epistemology has an irreducibly normative aspect, in so far as it is concerned with norms for belief.” ↩︎
We could further divide subjective normativity up into, roughly, “what one should do based on what one actually believes” and “what one should do based on what it would be reasonable for one to believe”. The following quote is relevant (though doesn’t directly address that exact distinction):
Before moving on, we should distinguish subjective credences, that is, degrees of belief, from epistemic credences, that is, the degree of belief that one is epistemically justified in having, given one’s evidence. When I use the term ‘credence’ I refer to epistemic credences (though much of my discussion could be applied to a parallel discussion involving subjective credences); when I want to refer to subjective credences I use the term ‘degrees of belief’.
The reason for this is that appropriateness seems to have some sort of normative force: if it is most appropriate for someone to do something, it seems that, other things being equal, they ought, in the relevant sense of ‘ought’, to do it. But people can have crazy beliefs: a psychopath might think that a killing spree is the most moral thing to do. But there’s no sense in which the psychopath ought to go on a killing spree: rather, he ought to revise his beliefs. We can only capture that idea if we talk about epistemic credences, rather than degrees of belief.
(I found that quote in this comment, where it’s attributed to Will MacAskill’s BPhil thesis. Unfortunately, I can’t seem to access the thesis, including via Wayback Machine.) ↩︎
It also seems to me that this “subjective vs objective” distinction is somewhat related to, but distinct from, ex ante vs ex post thinking. ↩︎
(This section is sort of my own commentary, may be mistaken, and may accidentally deviate from standard uses of terms. Please let me know if you think I should delete it, change it, and/or put it back into the main text.)
It seems to me that the way to fit decision theories into this picture is to say that one must add a decision theory to one of the “sources of normativity” listed above (e.g., morality) in order to get some form of normative (e.g., moral) statements. However, a decision theory can’t “generate” a normative statement by itself.
For example, suppose that I have a moral preference for having more money rather than less, all other things held constant (because I wish to donate it to cost-effective causes). By itself, this can’t tell me whether I “should” one-box or two-box in Newcomb’s problem. But once I specify my decision theory, I can say whether I “should” one-box or two-box. E.g., if I’m a causal decision theorist, I “should” two-box.
But if I knew only that I was a causal decision theorist, it would still be possible that I “should” one-box, if for some reason I preferred to have less money. Thus, as stated, we must specify (or assume) both a set of preferences and a decision theory in order to arrive at normative statements. ↩︎
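The point that a recommendation needs both a preference and a decision theory can be sketched in a few lines. This is only an illustrative toy, not a formal treatment: the dollar amounts are the figures commonly used when stating Newcomb's problem (they are not given in this post), and the function names are hypothetical. Causal decision theory is modelled here, as an assumption, by holding the probability that the opaque box is full fixed regardless of the action chosen.

```python
# Payoffs in dollars for each (action, state of the opaque box); conventional
# Newcomb figures, assumed for illustration and not taken from this post.
PAYOFF = {
    ("one-box", "full"): 1_000_000,
    ("one-box", "empty"): 0,
    ("two-box", "full"): 1_001_000,
    ("two-box", "empty"): 1_000,
}
ACTIONS = ("one-box", "two-box")


def cdt_recommendation(value_of_money, p_full=0.5):
    """Toy causal decision theorist.

    The box contents are treated as causally independent of the action, so the
    same p_full is used for both actions. value_of_money encodes the agent's
    preferences by mapping a dollar amount to how much the agent values it.
    """
    def expected_value(action):
        return (p_full * value_of_money(PAYOFF[(action, "full")])
                + (1 - p_full) * value_of_money(PAYOFF[(action, "empty")]))

    return max(ACTIONS, key=expected_value)


def prefers_more_money(dollars):
    """Preference for more money rather than less."""
    return dollars


def prefers_less_money(dollars):
    """Unusual preference for less money rather than more."""
    return -dollars


if __name__ == "__main__":
    print(cdt_recommendation(prefers_more_money))  # same decision theory...
    print(cdt_recommendation(prefers_less_money))  # ...different preference, different "should"
```

With the decision theory held fixed, flipping the preference flips the recommendation, which is the sense in which neither ingredient generates a "should" statement by itself.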
AIRCS Workshop: How I failed to be recruited at MIRI.
This blog post will touch on two related topics:
 Why and how I applied to MIRI and failed to secure an internship.
 My experience at the AI Risk for Computer Scientists workshop (AIRCS).
If you're only interested in the AIRCS workshop, you may skip to the second section directly. Ideally, I'd have liked to make two separate entries, as they may pertain to different points of interest. However, the two topics are so intertwined for me that I could not make a clean cut in this experience. I should also note that my failing to secure an internship at MIRI has probably had a drastic impact on how I write about it, if only because I'd have been more constrained in what I describe had I got the internship.
With respect to people's names, I'll adhere to the following rule: I only mention a name if what the person said was said publicly. That means that for books, public Facebook pages or lectures, I will assume I can use the name of the author or teacher, and I will keep names to myself for private discussions.
MIRI and me
I discovered MIRI through Eliezer Yudkowsky, as I first began reading HPMOR and then Rationality: From AI to Zombies. Like almost everyone, I'm not sure what exactly MIRI does. I know at least that MIRI's intended goal is to save the world from unaligned AGI. But whatever it is they concretely do, it seems quite fun; I mean, it seems to involve type theory and category theory. I also read some articles they wrote, and skimmed through many others. While they were interesting, I never caught enough details to actually imagine how to even start implementing anything they speak of. Reading some of their writings reminded me of several epistemology books I read years ago, but written more precisely, with clear code in mind, and for a computer-science-savvy audience.
As I said, fun stuff, fun work!
In February 2019, Eliezer Yudkowsky shared on Facebook a post by Buck Shlegeris stating that "I think that every EA who is a software engineer should apply to work at MIRI, if you can imagine wanting to work at MIRI." (and some more stuff). When I read that, I thought I should at least give it a try. I really didn't know how I could help them given my professional background (mostly logic and automata theory), but if they say that we should give it a try anyway, let's. I must admit that I was still skeptical back then, and didn't know exactly what they do, but I thought I would eventually come to get it. And even if it did not seem that they'd save the world from anything in the end(1), it still seemed like a place I'd have loved to work at, assuming they are similar to the LessWrongers I met at the European LessWrong Community Weekend.
Note that Buck Shlegeris's message does not directly concern me. I'm not EA, only EA-adjacent. I still fail to see any effective actions I could take apart from giving some money, and when I applied for 80k career coaching, they told me they couldn't help. It also does not concern me because I used to be a postdoc researcher who sometimes programmed, not a software engineer per se. But I wanted to let them decide whether I'd be a good fit or not.
The application process went as follows. It started off with a Triplebyte quiz. This one was extremely easy; I think there were at most two questions whose answers I didn't know. The second part was two one-hour calls with a MIRI researcher. The first call was a general presentation of myself: how I'd heard of MIRI, why I was interested in working there, what my past experiences were, and so on. I told the interviewer something like:
I am not even convinced that MIRI's work is important. At best, I am convinced that you were convinced that it is important. But even if EY is a good writer who I admire, the fact that he is set on the importance of his mission is not sufficient for me to think he is right. Furthermore, MIRI gets money by convincing people that their work is important, which means that MIRI has a strong incentive to be persuasive, whether or not your beliefs are true, and whether or not you still hold that belief. I do believe that when MIRI was created, it was not clear they would ever get money. If EY is as good as he seems to be when he writes, he could probably have cashed in in far easier ways. So the best argument I have currently regarding AGI alignment's importance is that the founders of MIRI thought it was important at the time of MIRI's creation.
Honestly, after saying all of this, I thought my application was over and we would stop there, but it seemed okay with him. The second interview was more technical; the interviewer asked me plenty of questions on various topics pertaining to computer science, programming and theory. He also gave me a short programming exercise, which I completed successfully (I won't tell you what the questions and exercise were, for obvious reasons). I should emphasize that my answers were far from perfect. When discussing with friends, I learned that I had got some answers wrong. I had questions related to the Coq language, for instance; I gave the closest answer I knew, which was Haskell/OCaml-related, while Coq's construction is much more general, as far as I understand it. One question asked me how I would create a data structure allowing efficient access to some functions. I gave a correct answer and knew that the worst-case time complexity was logarithmic, but was not able to prove it. After the interview, I realized that the proof was actually extremely easy.
The interviewer told me that, before making a decision, MIRI wanted to meet me at a workshop, so they invited me to AIRCS, and that they also wanted me to take a two-day-long programming test. I still don't understand that: what's the point of inviting me if I fail the test? It would appear more cost-efficient to wait until after the test to decide whether they want me to come or not (I don't think I ever asked about it out loud; I was already happy to have a free trip to California). I want to emphasize that at this point, I still had no idea why they had found my application interesting, or what I would actually do if I worked for them. I kind of hoped that I'd get an answer eventually. I was noticing my confusion, as a good rationalist should do. Alas, as always when I notice my confusion, I stay confused and can't figure out what to do with it.
The last part of the interview was a two-day-long programming problem. Since the interviewer asked me to put my code on GitHub, I guess it's open source and not a secret, but I don't feel comfortable sharing the problem here... If you really want to, you can always go to my GitHub account, unless at some point they ask me to delete it. There was a main programming task, with two options; I did only one of them. I mean, the second one seemed way more interesting, but with my knowledge today, I fear that I would have needed a week to really read enough and implement it correctly. As I told my interviewer, it relates to a topic I do not know well. There was one thing which I particularly appreciated: I was paid for the whole time I spent doing this programming exercise for them. This is a practice I've never seen elsewhere. I do not wish to disclose how much I was paid, but I'll state that two hours at that rate was more than a day at the French PhD rate. I didn't even ask to be paid; I hadn't even thought that being paid for a job interview was possible. The task had to be done in 14 hours, i.e. two days of 7 hours' work each. I did not know if that rate would also apply to me as an intern, but that salary was nice regardless, especially since they paid very quickly (it takes three months for every payment back in France). As a quick note here: thank you to everyone who gave to MIRI :p.
Since it seemed that MIRI was actually interested in my application, I believed that I should read more about MIRI. I had mostly been reading what MIRI wrote, but to have an informed opinion, I thought I should read what other people wrote about it. Sometimes, EY made fun of the sneer club on Twitter. I also saw a copy of an email sent to a lot of people related to CFAR about some creepy stuff that allegedly occurred at MIRI/CFAR. I wanted to get an idea of what I was getting myself into, so I started to browse through all of it. My inner Slytherin argues that the sneer club is filled with rationalists who post unimportant stuff so that the actually important stuff gets lost in the middle of everything else. I'm not going to write down everything I've read, but let me give a few examples to explain why I still went through with the application process. I've read that MIRI's work is not important; this one I might agree with, but if I'm getting paid and there is only a one percent chance the work is important, that's still far better than most other jobs I can find now. MIRI is male-dominated... well, according to their "team" page, it's hard to argue with this one too; however, given that I'm looking for a job as a computer programmer, I fear that's a problem I'll have everywhere. "At a MIRI's workshop, some guy touched a woman without her consent, and did it again after she asked him to stop and he was banned for this". I do not want to belittle the importance of sexual assault, but if MIRI's reaction was to ban him, I guess that's better than most conferences I've heard of. I have no more details, so I don't know how easy it was to report the assault to staff, but I assume that if there had been any problem here, the author would have written about it. I also read that some people have mentioned the idea that infohazards could be used to blackmail people.
I kind of assume that LessWrong/rationality groups are the best place for this kind of scam, but I'm still okay with this, especially since I have not heard of people actually implementing the idea.
Long story short: it's quite hard to believe that LW/MIRI is a cult (after all, they ask for money to save the world). But since I was actually being offered things by them and obtained the best wage I ever had for the work test, it was hard to be frightened by this cult idea. And while I sent a few hundred euros to Berlin's LW community, it was used to pay for 3 nights at a hostel and 4 days of food, so I highly doubt they are making huge profits on it.
Let's now skip a whole month and go to AIRCS.
AIRCS
Before discussing AIRCS, I want to stress that this is a personal view, which may not represent the way anyone else felt about AIRCS. In particular, the fact that, after meeting me at AIRCS, MIRI sent me an email telling me I wouldn't get an internship has probably affected my views, if only because I would probably be more MIRI-aligned if I were actually at MIRI now. I tried to write as much as possible before receiving MIRI's answer to remain as "objective" as possible; however, they were quite quick to answer, and I'm a slow writer. And anyway, as you'll see below, the simple fact that I was at AIRCS to help them decide whether to hire me or not meant that my experience was distinct from the others'.
Now that this is written down, I can begin to describe AIRCS.
Normal life
Before going into the activities on the workshop's schedule, I'm going to speak of the workshop generally. After all, the workshop was 5 nights and 4 days long, so of course we spent time doing other stuff. Usually, there were 1h10-long activities separated by 20-minute breaks, while lunch and dinner breaks were longer. The last night was an "after party" instead of workshops, and we had two breakout sessions before that. In those, people were free to offer whatever they wanted. There was a lecture about category theory and quantum circuits, if I remember correctly. There was some coding in Haskell. I said I would gladly do an introduction to Anki, but no one found this more interesting than the other stuff. This still led to some Anki-related questions from some participants later. During some breaks, there were "two-minute clean-ups", where everyone cleaned whatever they saw near them. Usually, the clean-up was announced in the main room, which means that people far from it didn't hear it and so didn't clean what was near them. Since most of the disorder was actually in the main room, that was not an issue. At the end of the two-minute clean-up, we resumed the break.
AI Risk for Computer Scientists is a workshop which, as far as I can tell, is not really for computer scientists. A Facebook friend of mine, president of EA France, told me she was invited even though she has a PhD in economics and is not doing any computer science. Most of the people there were computer scientists, however, and most of the non-computer scientists were mathematicians. A lot of the small talk was related to math and computer science, and was often far more technical than the workshops themselves, which was quite unexpected to me. Edward Kmett was there and spoke a lot about Haskell and the theories behind programming languages, which was extremely interesting and not related at all to AI.
A lot of people at the event were vegan, and so at each meal there was a huge choice of good vegan food, which is incredible to me! So I was even more intrigued as to why we almost exclusively used throwaway utensils, plates and bowls, as most vegan texts I've read are about how meat harms the environment. I assume that if EA cares about animal suffering in itself, then using throwaways is less of a direct suffering factor.
I had at least four goals arriving at this workshop:
 The first one was to discover what could actually be done to advance research related to AI safety.
 The second one was to understand why people believe that there is such a high risk that an unaligned AGI will be created in the foreseeable future.
 The third one was to get an internship at MIRI; and thus to draw a salary again.
 The last one was to discover and enjoy California, the bay area, and this rationalist community I've read a lot about.
Task four is mostly a success, task two is kind of a success and the other two tasks are complete failures.
Workshop activities
Almost all activities were done in small groups, with either a half, a third or a fourth of all participants, and so the lectures/activities were repeated two to four times during the day. This way, we all did the same activities, and being in small groups also allowed for real discussions with the professor and the other participants. Groups changed a lot, so that we also met all the participants.
As far as I understand it, plenty of people panic when they really understand what AI risks are. So Anna Salamon gave us a rule: we don't speak of AI safety to people who do not express the desire to hear about it. When I asked for more information, she specified that it is okay to mention the words "AI Safety", but not to give any details until the other person is sure they want to hear about it. In practice, this means it is okay to share a book/post on AI safety, but we should warn the person to read it only if they feel ready. Which leads to a related problem: some people have never experienced an existential crisis or anxiety attack in their life, so it's all too possible they can't really "be ready". On the same note, when I asked another MIRI researcher why they don't hold a lecture explaining the imminence of AI, they answered that they do not want to be the one explaining to everyone why we're all going to die in the next few decades. On the one hand, I guess I understand why, but on the other hand, I'd say it's kind of the point of being there!
To give a more concrete example of how they tried to fight potential panic: during one of the activities, Anna Salamon asked us to list all the reasons why it could be a bad idea to think about AI risk. An obvious answer was that thinking about AI may help AGI be created, and so our defeat would come sooner. Another answer, as explained above, would be that some people would panic and not be able to do anything anymore. One of my answers was about a theory we are not allowed to discuss, which, if it were true, would mean that it's a very bad idea to think about it at all. However, as I expected, she didn't write it on the whiteboard. I still felt that, to be exhaustive, I had to mention it to her.
As you can see, the staff really cared about us and wanted to be sure that we would be able to manage the thoughts related to AI risk. This led to a lot of talks which were not directly related to AI. Given the importance of non-AI-related talks, I'm going to start describing the workshop by listing examples of activities which were not AI-related.
Non-AI-related events
I do not know how much of the stuff we saw is usual at CFAR. There were talks on bucketing, inner sim, world models, question substitution, double crux, focusing... I'm not going to try to explain all of this, as I would probably do a bad job; just go to a CFAR event to learn about them. Or read their book; it seems they just published it. To test those tools, we had to apply them to specific problems: either an AI-related problem, or a personal one. I don't even know how to start thinking about AI problems, so I decided to choose personal problems.
Sadly, there were some timing issues, and the professor usually did not have time to see every participant to help them with the exercises.
If all of this helps to think clearly about problems and how to solve them, then it seems strange to me that CFAR uses it only for AI stuff. If it does help to deal with the horror of unaligned AI, why isn't it used to speak of global warming? While I admit that the narrative about AI risk is even worse than the one about global warming, I do think that if you really try to grasp the whole impact of global warming, you also need some mental help to consider the horror it entails. The same thing would also be true if you want to consider global poverty, slavery, and so many other things in our world.
There were also a lot of so-called "circles". I've heard that circles are not usually done at CFAR, because circles are less effective at mainstream workshops than at AIRCS workshops. We were told when we started circling that we would not really learn what a circle is, that all circles are different; that we could see plenty of circles and they would not seem similar, the same way we could discover a lot of music and think that there is nothing in common between the pieces. For a computer scientist, this did seem extremely strange and dubious. How can I evaluate whether the exercise was correctly done or not? Or even what the purpose of this exercise was? As far as I understand, one goal of the circle is to be able to grasp what we feel and to put words to it. In particular, circling is also about going meta, and being able to think about the current thinking process. I do feel like that's something I was already quite good at; it looks a lot like blogging (I mostly blog in French, so most of you could not really check for yourselves). When I want to write about something and fail, I usually try to understand what my problem is, and then I write to explain what I would have liked to write and why I failed to do it. This is also something I really often do with one of my partners, which sometimes leads to conversations too meta for my taste, but which helps discussion in the long run. My point here is that it felt so much like something I often do that I didn't really feel like it was new for me. Or maybe I just didn't understand what circling is and just used something I already knew instead. Who knows? There were two circles in small groups the first two days, and a big long circle with everyone the third day. Someone who was more vocal and honest than me asked why we were doing this. We only have four days, they said, and we are supposed to be talking about AI.
Personally, I decided to trust MIRI and to suspend my judgement, expecting that at some point I'd get the meaning in all of this.
Job interview
Okay, let's go back to the job interview part. You'll now understand why both "describing AIRCS" and "applying at MIRI" had to be in the same blog post.
I ask you to remember that I was actually at AIRCS because MIRI wanted to meet me before making any decision on hiring me. That means that, during circles, I was asked to be as honest as possible about my feelings while also being considered for an internship. This is extremely awkward. Similarly, to discover CFAR tools, they asked us to consider a personal problem we currently have. One of my biggest problems is about having a salary again, which probably means having a job. Other personal problems (like finding a town I want to live in) are closely related to the problem of having a job. But the staff member teaching us CFAR materials may also eventually have to give their opinion of me. Which means that the person helping me learn how to solve my problems is potentially a part of the problem's solution. On the last day, we had to fill in a feedback form about the workshop... That is, they asked me to fill in a feedback form about a part of my job interview BEFORE I actually had any answer.
All of this was extremely awkward. And it really made the workshop hard to enjoy, as I wondered time and time again how I should talk about those problems, when talking about them could actually change my chances of getting the job.
I appreciate and entirely realize that the AIRCS workshop is not just a recruiting process. I do believe that all staff members were honest in saying they were meeting people and teaching interesting stuff, and that no one was actually taking notes on me. No one had any thought process focused on me either. The first time I spoke of this problem to someone from MIRI, during a 1-to-1 discussion, he told me that I should not consider this as a job interview but as a place to socialize and potentially network. But just because they do not think of AIRCS as a job interview does not mean AIRCS is not a job interview. Case in point: half a week after the workshop, the recruiter told me that "After discussing some more, we decided that we don't want to move forward with you right now". So the workshop really was what led them to decide not to hire me.
To be exhaustive, there is a second possibility: maybe after the two-day exam, the recruiter already knew they would not hire me. However, they do want to have a lot of people attending AIRCS, for reasons I'll discuss later. Thinking that I would not be interested in attending if I knew I didn't have the job, they didn't let me know. However, this would have required the recruiter to lie really consistently during the whole workshop, which would be quite impressive.
During a trip to the beach, I finally had the courage to tell the recruiter that AIRCS is quite complex for me to navigate when it's both a CFAR workshop and a job interview. I assumed that, since MIRI wanted honest communication, telling them my thoughts would be the best I could do anyway. They answered that they could see how having activities such as circles could be a problem, but that there was nothing they could do. I answered two things. First: they could warn people coming to AIRCS as part of a job interview that some things will be awkward for them, but that they get the same workshop as everyone else, so they'll have to deal with it. And I also answered that my problems would be solved if I already had an answer. Since I knew there was no way for them to confirm already that I had the internship, the conclusion would have been that the workshop would be nicer if they could confirm I was not going to get it. I immediately added, by reflex, that I was not asking for it. That I was just being exhaustive. I was being dishonest then: I was pretty sure I wouldn't get the job, and actually knowing it would have made things much simpler. However, I had the feeling that asking this would have decreased my probability of being hired, and so I avoided doing it. Furthermore, I do understand why it's generally a bad idea to tell people you don't know, inside your buildings, that they won't get the job. Some may react pretty badly and damage your property. There was no way for me to convince them that it would be safe to tell me I would not get hired, if that were the case. Furthermore, other people were there in order to work at MIRI, and he could not just give information to me and ignore everyone else.
I do not believe that my first piece of advice will be heeded. During a discussion on the last night, near the fire, the recruiter was talking with some other MIRI staff and participants. At some point they mentioned MIRI's recruiting process. I think they were saying that they loved recruiting because it leads them to work with extremely interesting people, but that it's hard to find them. Given that my goal was explicitly to be recruited, and that I didn't have any answer yet, it was extremely awkward for me. I can't state explicitly why; after all, I didn't have to add anything to their remark. But even if I can't explain why I think that, I still firmly believe that it's the kind of thing a recruiter should avoid saying near a potential hire.
AI-related activities
OK, now, let's go back to the AI-related activities. The disparity in the participants' respective knowledge was very impressive to me. One of the participants was doing a PhD related to AI safety, while another knew the subject so well that they could cite researchers and say that "in research paper A, person B states C", and that it was related to the current discussion. And some other participants were totally new to all of this and came out of pure curiosity, or to understand why EA spoke so much of AI. While the actual activities were designed to be accessible to everyone, the discussions between participants were not always so. Because, of course, people who already know a lot about those topics speak of the facts they know and use their knowledge when they talk with the staff.
Most discussions were pretty high level. For example, someone gave a talk in which they explained how they tried and failed to model and simulate the brain of C. elegans, a worm with an extremely simple and well-understood brain. They explained a lot of things about biology to us, and how they had been trying to scan a brain precisely. If I understood correctly, they told us they failed due to technical constraints, and what those were. They believe that, nowadays, we could in theory create the technology to solve this problem. However, no one is interested in said technology, so it won't be developed and made available on the market. Furthermore, all of their research was done before they discovered AI safety stuff, so it's good that no one created such a precise model of a brain, even if just a worm's.
Another talk was about a possible AI timeline. This paragraph describes a 70-minute talk, so I'll be omitting a lot of stuff. The main idea of the timeline was this: on one side, multiply the number of times a neuron fires every second, the number of neurons in a brain, and the number of living creatures on Earth; on the other side, multiply the number of processors, their speed, and the memory of computers. You then obtain two numbers you can compare. According to this comparison, we should already be able to simulate a human with today's supercomputers and tomorrow's computers. Similarly, with tomorrow's supercomputers we should be able to simulate the whole evolution of life. However, it seems no one has done anything close to those simulations. So Buck Shlegeris, who gave the talk, asked us: "All of this is just words. What are your thoughts?" and "What's wrong with this timeline?" Those are actually questions he asked us a lot: he wanted our opinions, and why we disagreed. That's quite an interesting way to check how we think, I guess, and an extremely hard one to answer, because there were so many things wrong with the talk we heard. I think I answered that one does not only have to simulate brains, but also all of their environments, and the interactions between brains, as opposed to just each brain in its own simulation. But honestly, my real answer would be that this is just not how science and technology work. You can't just hand-wave ideas and expect things to work. By default, things do not work. They work when an actual, precise effort has been made, because someone had a plan. I used to teach introduction to programming, and the talk reminded me of some of my students asking why their program did not work, while I was trying to understand why they believed it would work in the first place.
Even with parallel supercomputers, writing a program that simulates evolution from a single cell to humanity is a task that does not seem trivial, and there is no reason for it to have happened by itself. The fact that this does not occur is the default. I guess I could also have answered that most programs have bugs at first, and if you need to use a supercomputer for weeks, even a really passionate hacker is not going to have enough resources today to actually do this work. However, I didn't give this long answer; I didn't feel like it was something I could easily convey with words: that the whole point made no sense. And it probably would not have helped my recruitment either.
AI Discussion
I'm not going to list every lecture in detail. One major aspect, however, is that even after all of them I didn't know what one can concretely do to help with AI safety/alignment problems. Neither did I know why they believed that it was such a high risk. So I went to speak with staff members directly and asked them those questions. One staff member told me that I could go to AI alignment forums, read about the problems being discussed and, if I had any ideas, write them down. If I had been really honest, I would have asked "Why do you want to hire developers? I was hoping to learn that, but I still have no idea what I would be doing if hired". I did not, because at some point it felt like it was in bad taste to recall that this was my initial goal, and I was pretty sure that I would not be hired anyway.
It would be interesting to read the forum and think about those questions myself. As I wrote earlier, MIRI's stuff does seem fun to consider. But then again, there is tons of fun stuff to consider in the world, and plenty of things I don't know yet and want to learn. Currently, what I must learn, I believe, is whatever will help me get a job. It won't save lives. But it is an interesting prospect to me. And even once I have a stable job, there will be so many more things to be curious about; I'm not sure that I'll take the time to work on AI. After all, if they don't think I can help them by doing this full-time, I don't see how I could help by doing it part-time.
The question I asked most during discussions was something along the lines of:
"I do agree that an unaligned AI would be extremely dangerous. I will accept the idea that spending the careers of a hundred researchers in my field is totally worth it if there is even a two percent chance of having every life destroyed in the next century. Can you explain to me why you believe that the probability is at least two percent?"
I think that I was being quite generous with my question: low probability, long timeframe. According to Anna Salamon's rule I gave earlier, if you don't feel ready to hear the answer I received, and to understand why there is such a risk, you should skip to the next section.
The first person who answered this question told me they had followed the recent progress of machine learning and thought that we are fewer than ten breakthroughs away from AGI. So it is going to occur imminently. It also means that I can't be convinced the same way, as I have not seen the speed of progress of the field. Another answer I got was that there is so much progress, going so quickly, that AGI is bound to occur. Personally, I do agree that progress is quick, and I expect to be amazed and astonished many more times in the years to come. However, that does not necessarily mean that we are going to reach AGI. Let's say that progress is exponentially fast, and that AGI is aleph_0 (the smallest infinite cardinal). Then the speed does not really matter: the function never reaches aleph_0, because an exponential takes only finite values at every finite time. I was told that, anyway, even if AI gets stuck again and fails to reach AGI, we will reach AGI as soon as we know how to model and simulate brains. I must admit that I found Age of Em extremely interesting as well as frightening. However, I have no certainty that we will reach Ems one day. In both cases, AGI and Ems, I see the same problem: I can imagine that we need fundamental breakthroughs to make real progress. However, while private companies are happy to produce more and more amazing technological creations, it does not seem to me that they are willing to do decade-long research which won't bring in any money in the next few years. If you need a biological or chemical discovery which requires something as large as the Large Hadron Collider, then I don't see any private company financing it, especially since, once the thing is discovered, other companies may use it. On the other hand, as far as I know, universities have less and less funding worldwide, and professors need to be able to justify that they'll publish papers soon and regularly.
All in all, I really don't think research is going to be able to make the fundamental discoveries that would allow the creation of the new kind of science probably required for AGI/Ems. I had this last discussion just after the introduction to "double crux". And I don't know whether this introduction helped, but after this discussion, I understood that one big difference between me and most of the staff is that I'm far more pessimistic about the future of research than they are.
The point of the workshop
Some people asked what the point of the workshop was. Why had MIRI paid for the travel and food of ~20 participants, and provided the staff to work with us? It was supposed to be about AI, and we spent a lot of time doing circles. We didn't really learn how to help with alignment problems. I'm going to write down some answers to that question that I heard during the workshop. It is beneficial to see a lot of things that are not related to AI, because we need to have a really big surface in order to interact with plenty of parts of the world (this is a chemical reaction metaphor). Maybe some useful ideas will come from other fields, fields that are not yet related to AI safety. So the AI safety field needs to interact with a lot of people from the outside. Some people might get interested enough to actually want to work on AI safety after this workshop, and go to MIRI or to some other place which works on this stuff, such as CHAI, FHI, etc. Some people may come to the AIRCS alumni workshop, where things get more technical (I've not yet seen anything related to the alumni workshop, and I doubt I'll ever be invited, since I didn't really contribute any new ideas in the first workshop). Also, if at some point they need a lot of people to work on the alignment problem, they'll have all the AIRCS alumni they can contact. AIRCS alumni have been introduced to their ideas and could start working more quickly on those topics. However, I didn't understand why they would suddenly need a team to work on the alignment problem so urgently (and it sounds a little bit Terminator-ish).
Now, there is a second question: what was the point of me going there? I mean, I went there because I was invited and curious. But what was their goal when inviting me? While I was not told exactly why my application was rejected after the workshop, I have strong intuitions about it. First, I was not able to participate in discussions as much as the other participants did. As I have said many times, I am not convinced that the risk is as high as they perceive it. And I found that most talks were so high level that there was no way for me to say anything concrete about them, at least not without first seeing how they would actually try to implement things. Worse, I was saying all of this out loud! A staff member from MIRI told me that my questions meant I didn't really understand the risks. And understanding them is, supposedly, a prerequisite for being able to work on AI alignment. So it was easy to conclude that I would not get the internship. But all of this was already known to the recruiter after the first screening we had. I had already told him I had skimmed through MIRI's posts but had not read a lot. And I was not even convinced that it was a serious risk. We had already had a discussion where they tried to understand my thoughts on this, and why I did not believe in the scenarios they were depicting. As I wrote in the introduction, I notice I am confused, and I'm quite perplexed by the fact that my confusion has only increased. Every time a company decides not to hire me, I would love to know why, at least to avoid making the same mistakes again. MIRI here is an exception. I can see so many reasons not to hire me that the outcome was unsurprising. What was surprising was the process, and their considering me in the first place.
The confusion does not decrease.
Footnotes:
 Most probable ending, either because someone crafts an unaligned AGI without taking MIRI's position into consideration, or because AGI is impossible.
 I still wonder why. I assume there are already a lot of computer scientists wanting to have an impact, and that they specialize more in the US and UK whereas I was in France back then.
 My friend tells me that actually, it's not even a generalization. The two concepts are unrelated, apart from the fact that they have the same name.
 There are actually teaching assistants who are still waiting for their 2018-2019 salary.
 To be honest, at this point, I was pretty sure I wasn't going to have the job. But not quite sure enough to feel comfortable about it.
Definitions of Causal Abstraction: Reviewing Beckers & Halpern
Author's Notes: This post is fairly technical, with little background and minimal examples; it is not recommended for general consumption. A general understanding of causal models is assumed. This post is probably most useful when read alongside the paper. If your last name is "Beckers" or "Halpern", you might want to skip to the last section.
There’s been a handful of papers in the last few years on abstracting causal models. Beckers and Halpern (B&H) wrote an entire paper on definitions of abstraction on causal models. This post will outline the general framework in which these definitions live, discuss the main two definitions which B&H favor, and wrap up with some discussion of a conjecture from the paper. I'll generally use notation and explanations which I find intuitive, rather than matching the paper on everything.
In general, we’ll follow B&H in progressing from more general to more specific definitions.
General Framework
We have two causal models: one “low-level”, and one “high-level”. There’s a few choices about what sort of “causal model” to use here; the main options are:
 Structural equations
 Structural equations with a DAG structure (i.e. no feedback loops)
 Bayes nets
B&H use the first, presumably because it is the most general. That means that everything here will also apply to the latter two options.
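To make the objects concrete, here is a minimal sketch of a structural-equation model in Python. It is not taken from B&H's paper: the variable names (`rain`, `sprinkler`, `wet`), the `solve` helper, and the toy equations are all hypothetical, and the sketch only covers the acyclic (DAG) special case, evaluating equations in a given topological order. Exogenous inputs are simply folded in as constant equations.

```python
from typing import Callable, Dict, List, Optional

# A structural equation computes one variable from the values assigned so far.
Equation = Callable[[Dict[str, float]], float]


def solve(
    equations: Dict[str, Equation],
    order: List[str],
    interventions: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Evaluate acyclic structural equations in topological order.

    An intervention do(X = x) replaces X's equation with the constant x,
    leaving every other equation untouched.
    """
    interventions = interventions or {}
    values: Dict[str, float] = {}
    for name in order:
        if name in interventions:
            values[name] = interventions[name]
        else:
            values[name] = equations[name](values)
    return values


# Hypothetical toy model: the grass is wet if it rains or the sprinkler runs.
equations: Dict[str, Equation] = {
    "rain": lambda v: 1.0,
    "sprinkler": lambda v: 0.0,
    "wet": lambda v: max(v["rain"], v["sprinkler"]),
}
order = ["rain", "sprinkler", "wet"]

observed = solve(equations, order)
intervened = solve(equations, order, interventions={"rain": 0.0})
print(observed)
print(intervened)
```

The general structural-equation setting that B&H use also allows feedback loops, where a single pass in a fixed order is not enough and one has to solve the equations simultaneously; the sketch above deliberately avoids that case.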
Notation for the causal models:
 We’ll write .mjxchtml {display: inlineblock; lineheight: 0; textindent: 0; textalign: left; texttransform: none; fontstyle: normal; fontweight: normal; fontsize: 100%; fontsizeadjust: none; letterspacing: normal; wordwrap: normal; wordspacing: normal; whitespace: nowrap; float: none; direction: ltr; maxwidth: none; maxheight: none; minwidth: 0; minheight: 0; border: 0; margin: 0; padding: 1px 0} .MJXcdisplay {display: block; textalign: center; margin: 1em 0; padding: 0} .mjxchtml[tabindex]:focus, body :focus .mjxchtml[tabindex] {display: inlinetable} .mjxfullwidth {textalign: center; display: tablecell!important; width: 10000em} .mjxmath {display: inlineblock; bordercollapse: separate; borderspacing: 0} .mjxmath * {display: inlineblock; webkitboxsizing: contentbox!important; mozboxsizing: contentbox!important; boxsizing: contentbox!important; textalign: left} .mjxnumerator {display: block; textalign: center} .mjxdenominator {display: block; textalign: center} .MJXcstacked {height: 0; position: relative} .MJXcstacked > * {position: absolute} .MJXcbevelled > * {display: inlineblock} .mjxstack {display: inlineblock} .mjxop {display: block} .mjxunder {display: tablecell} .mjxover {display: block} .mjxover > * {paddingleft: 0px!important; paddingright: 0px!important} .mjxunder > * {paddingleft: 0px!important; paddingright: 0px!important} .mjxstack > .mjxsup {display: block} .mjxstack > .mjxsub {display: block} .mjxprestack > .mjxpresup {display: block} .mjxprestack > .mjxpresub {display: block} .mjxdelimh > .mjxchar {display: inlineblock} .mjxsurd {verticalalign: top} .mjxmphantom * {visibility: hidden} .mjxmerror {backgroundcolor: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; fontstyle: normal; fontsize: 90%} .mjxannotationxml {lineheight: normal} .mjxmenclose > svg {fill: none; stroke: currentColor} .mjxmtr {display: tablerow} .mjxmlabeledtr {display: tablerow} .mjxmtd {display: tablecell; textalign: center} .mjxlabel {display: tablerow} 
.mjxbox {display: inlineblock} .mjxblock {display: block} .mjxspan {display: inline} .mjxchar {display: block; whitespace: pre} .mjxitable {display: inlinetable; width: auto} .mjxrow {display: tablerow} .mjxcell {display: tablecell} .mjxtable {display: table; width: 100%} .mjxline {display: block; height: 0} .mjxstrut {width: 0; paddingtop: 1em} .mjxvsize {width: 0} .MJXcspace1 {marginleft: .167em} .MJXcspace2 {marginleft: .222em} .MJXcspace3 {marginleft: .278em} .mjxtest.mjxtestdisplay {display: table!important} .mjxtest.mjxtestinline {display: inline!important; marginright: 1px} .mjxtest.mjxtestdefault {display: block!important; clear: both} .mjxexbox {display: inlineblock!important; position: absolute; overflow: hidden; minheight: 0; maxheight: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjxtestinline .mjxleftbox {display: inlineblock; width: 0; float: left} .mjxtestinline .mjxrightbox {display: inlineblock; width: 0; float: right} .mjxtestdisplay .mjxrightbox {display: tablecell!important; width: 10000em!important; minwidth: 0; maxwidth: none; padding: 0; border: 0; margin: 0} .MJXcTeXunknownR {fontfamily: monospace; fontstyle: normal; fontweight: normal} .MJXcTeXunknownI {fontfamily: monospace; fontstyle: italic; fontweight: normal} .MJXcTeXunknownB {fontfamily: monospace; fontstyle: normal; fontweight: bold} .MJXcTeXunknownBI {fontfamily: monospace; fontstyle: italic; fontweight: bold} .MJXcTeXamsR {fontfamily: MJXcTeXamsR,MJXcTeXamsRw} .MJXcTeXcalB {fontfamily: MJXcTeXcalB,MJXcTeXcalBx,MJXcTeXcalBw} .MJXcTeXfrakR {fontfamily: MJXcTeXfrakR,MJXcTeXfrakRw} .MJXcTeXfrakB {fontfamily: MJXcTeXfrakB,MJXcTeXfrakBx,MJXcTeXfrakBw} .MJXcTeXmathBI {fontfamily: MJXcTeXmathBI,MJXcTeXmathBIx,MJXcTeXmathBIw} .MJXcTeXsansR {fontfamily: MJXcTeXsansR,MJXcTeXsansRw} .MJXcTeXsansB {fontfamily: MJXcTeXsansB,MJXcTeXsansBx,MJXcTeXsansBw} .MJXcTeXsansI {fontfamily: MJXcTeXsansI,MJXcTeXsansIx,MJXcTeXsansIw} .MJXcTeXscriptR {fontfamily: 
MJXcTeXscriptR,MJXcTeXscriptRw} .MJXcTeXtypeR {fontfamily: MJXcTeXtypeR,MJXcTeXtypeRw} .MJXcTeXcalR {fontfamily: MJXcTeXcalR,MJXcTeXcalRw} .MJXcTeXmainB {fontfamily: MJXcTeXmainB,MJXcTeXmainBx,MJXcTeXmainBw} .MJXcTeXmainI {fontfamily: MJXcTeXmainI,MJXcTeXmainIx,MJXcTeXmainIw} .MJXcTeXmainR {fontfamily: MJXcTeXmainR,MJXcTeXmainRw} .MJXcTeXmathI {fontfamily: MJXcTeXmathI,MJXcTeXmathIx,MJXcTeXmathIw} .MJXcTeXsize1R {fontfamily: MJXcTeXsize1R,MJXcTeXsize1Rw} .MJXcTeXsize2R {fontfamily: MJXcTeXsize2R,MJXcTeXsize2Rw} .MJXcTeXsize3R {fontfamily: MJXcTeXsize3R,MJXcTeXsize3Rw} .MJXcTeXsize4R {fontfamily: MJXcTeXsize4R,MJXcTeXsize4Rw} .MJXcTeXvecR {fontfamily: MJXcTeXvecR,MJXcTeXvecRw} .MJXcTeXvecB {fontfamily: MJXcTeXvecB,MJXcTeXvecBx,MJXcTeXvecBw} @fontface {fontfamily: MJXcTeXamsR; src: local('MathJax_AMS'), local('MathJax_AMSRegular')} @fontface {fontfamily: MJXcTeXamsRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_AMSRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_AMSRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_AMSRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXcalB; src: local('MathJax_Caligraphic Bold'), local('MathJax_CaligraphicBold')} @fontface {fontfamily: MJXcTeXcalBx; src: local('MathJax_Caligraphic'); fontweight: bold} @fontface {fontfamily: MJXcTeXcalBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_CaligraphicBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_CaligraphicBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_CaligraphicBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXfrakR; src: local('MathJax_Fraktur'), local('MathJax_FrakturRegular')} @fontface 
{fontfamily: MJXcTeXfrakRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXfrakB; src: local('MathJax_Fraktur Bold'), local('MathJax_FrakturBold')} @fontface {fontfamily: MJXcTeXfrakBx; src: local('MathJax_Fraktur'); fontweight: bold} @fontface {fontfamily: MJXcTeXfrakBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathBI; src: local('MathJax_Math BoldItalic'), local('MathJax_MathBoldItalic')} @fontface {fontfamily: MJXcTeXmathBIx; src: local('MathJax_Math'); fontweight: bold; fontstyle: italic} @fontface {fontfamily: MJXcTeXmathBIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathBoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathBoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathBoldItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansR; src: local('MathJax_SansSerif'), local('MathJax_SansSerifRegular')} @fontface {fontfamily: MJXcTeXsansRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifRegular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansB; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerifBold')} @fontface {fontfamily: MJXcTeXsansBx; src: local('MathJax_SansSerif'); fontweight: bold} @fontface {fontfamily: MJXcTeXsansBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansI; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerifItalic')} @fontface {fontfamily: MJXcTeXsansIx; src: local('MathJax_SansSerif'); fontstyle: italic} @fontface {fontfamily: MJXcTeXsansIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXscriptR; src: local('MathJax_Script'), local('MathJax_ScriptRegular')} @fontface {fontfamily: MJXcTeXscriptRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_ScriptRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_ScriptRegular.woff') format('woff'), 
 $X^H$ for the variables in the high-level model and $X^L$ for variables in the low-level model.
 We’ll use capital-letter indices to indicate choosing multiple indices at once. For instance, with $S = (1,2,3)$, $X_S$ would be $(X_1, X_2, X_3)$.
 We’ll write interventions as $X_S \leftarrow X^*_S$. For instance, with $S = (1,2,3)$, $X_S \leftarrow X^*_S$ would be equivalent to the three simultaneous interventions $(X_1 \leftarrow X^*_1, X_2 \leftarrow X^*_2, X_3 \leftarrow X^*_3)$. Usually both $S$ and $X^*$ will be unspecified, to indicate a generic intervention.
Next, we need some connection between the high-level and low-level model, to capture the intuitive notion of “abstraction”. At its most general, this connection has two pieces:
 A mapping τ between values of the variables in the models: $X^H = \tau(X^L)$
 A mapping ω between interventions: $(X^H_{S^H} \leftarrow X^{H*}_{S^H}) = \omega(X^L_{S^L} \leftarrow X^{L*}_{S^L})$. Here ω determines both $S^H$ and $X^{H*}_{S^H}$ as a function of $S^L$ and $X^{L*}_{S^L}$.
Note that, for true maximum generality, both τ and ω could be nondeterministic. However, we’ll generally ignore that possibility within the context of this post.
Finally, the key piece: the high-level and low-level models should yield the same predictions (in cases where they both make a prediction). Formally:
$P[X^H \mid do(\omega(X^L_S \leftarrow X^{L*}_S))] = P[\tau(X^L) \mid do(X^L_S \leftarrow X^{L*}_S)]$
For the category theorists: this means that we get the same distribution by either (a) performing an intervention on the low-level model and then applying τ to $X^L$, or (b) first applying τ to $X^L$, then applying the high-level intervention (found by transforming the low-level intervention via ω).
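A minimal sketch of this consistency check on a toy model may help. Everything in it is an assumption made for illustration, not something taken from B&H: the low-level model has two fair coins A and B with Y := A + B, the high-level model has T (distributed like A + B) with Y := T, τ sends (A, B, Y) to (A + B, Y), and the hypothetical `omega` only supports low-level interventions that set A and B together.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def low_level_dist(do=None):
    """Exact distribution of X^L = (A, B, Y) after the intervention `do`."""
    do = do or {}
    dist = Counter()
    for a, b in product((0, 1), repeat=2):  # two fair coins
        a = do.get("A", a)
        b = do.get("B", b)
        y = do.get("Y", a + b)  # Y := A + B unless intervened on
        dist[(a, b, y)] += Fraction(1, 4)
    return dist


def high_level_dist(do=None):
    """Exact distribution of X^H = (T, Y) after the intervention `do`."""
    do = do or {}
    dist = Counter()
    for t, p in ((0, Fraction(1, 4)), (1, Fraction(1, 2)), (2, Fraction(1, 4))):
        t = do.get("T", t)
        y = do.get("Y", t)  # Y := T unless intervened on
        dist[(t, y)] += p
    return dist


def tau(x_low):
    """Map low-level values (A, B, Y) to high-level values (T, Y)."""
    a, b, y = x_low
    return (a + b, y)


def omega(do_low):
    """Map a supported low-level intervention to a high-level one."""
    do_high = {}
    if ("A" in do_low) != ("B" in do_low):
        raise ValueError("unsupported: A and B must be set together")
    if "A" in do_low:
        do_high["T"] = do_low["A"] + do_low["B"]
    if "Y" in do_low:
        do_high["Y"] = do_low["Y"]
    return do_high


def pushforward(dist, f):
    """Distribution of f(X) when X is distributed according to `dist`."""
    out = Counter()
    for x, p in dist.items():
        out[f(x)] += p
    return out


def commutes(do_low):
    """Path (a): intervene low, then apply tau. Path (b): intervene high via omega."""
    return pushforward(low_level_dist(do_low), tau) == high_level_dist(omega(do_low))
```

Calling `commutes({"A": 1, "B": 0})` or `commutes({"Y": 2})` compares the two paths exactly, since the probabilities are `Fraction` values. Setting A alone is deliberately left unsupported here: it makes T uniform over {1, 2}, which no intervention in this toy high-level model reproduces.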
The first definition of “abstraction” examined by B&H is basically just this, plus a little wiggle room: they don’t require all possible interventions to be supported, and instead include in the definition a set of supported interventions. This definition isn’t specific to B&H; it’s an obvious starting point for defining abstraction on causal models as broadly as possible. B&H adopt this maximally general definition from Rubenstein et al., and dub it “exact transformation”.
B&H then go on to argue that this definition is too general for most purposes. I won’t rehash their arguments and examples here; the examples in the paper are pretty readable if you’re interested. They also introduce one slightly stronger definition which I will skip altogether; it seems to just be cleaning up a few weird cases, without any major conceptual additions.
τ-Abstraction
The main attraction in B&H is their definition of “τ-abstraction”. The main idea in jumping from the maximally general framework above to τ-abstraction is that the function τ mapping low-level variables to high-level variables induces a choice of mapping between interventions; there’s no need to leave the choice of ω completely open-ended.
In particular, since $X^H = \tau(X^L)$ by definition, it seems like τ should also somehow relate $X^{H*}$ to $X^{L*}$ in the interventions $X^L_{S^L} \leftarrow X^{L*}_{S^L}$ and $X^H_{S^H} \leftarrow X^{H*}_{S^H}$. The obvious condition is $X^{H*} = \tau(X^{L*})$. However, the interventions themselves only constrain $X^{H*}$ and $X^{L*}$ at the indices $S^H$ and $S^L$ respectively, whereas τ may depend on (and determine) the variables at other indices.
One natural condition to impose: each value of $X^{H*}$ consistent with the high-level intervention should correspond to at least one possible value of $X^{L*}$ consistent with the corresponding low-level intervention, and each possible value of $X^{L*}$ consistent with the low-level intervention should produce a value of $X^{H*}$ consistent with the high-level intervention. More formally: if our intervention values are $X^{H*}_{S^H} = x^{H*}$ and $X^{L*}_{S^L} = x^{L*}$, then we want equality between sets:
$\{X^{H*} \mid X^{H*}_{S^H} = x^{H*}\} = \{\tau(X^{L*}) \mid X^{L*}_{S^L} = x^{L*}\}$
This is the main criterion B&H use to define the “natural” mapping between interventions $\omega_\tau$. (The exact definition given by B&H is a bit dense, so I won’t walk through the whole thing here.)
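The set-equality criterion can be checked by brute force on a small example. The sketch below is only an illustration under assumed names and domains (low-level variables A, B in {0, 1} and Y in {0, 1, 2}, high-level variables T and Y in {0, 1, 2}, and τ mapping (A, B, Y) to (A + B, Y)); it is not B&H's full definition, which includes further conditions. The hypothetical helper `natural_omega` searches for the high-level intervention whose set of consistent values equals the image under τ of the values consistent with the low-level intervention.

```python
from itertools import product

# Assumed toy domains; none of these names come from B&H.
LOW_DOMAINS = {"A": (0, 1), "B": (0, 1), "Y": (0, 1, 2)}
HIGH_DOMAINS = {"T": (0, 1, 2), "Y": (0, 1, 2)}


def tau(x_low):
    """Map a full low-level assignment to a full high-level assignment."""
    return {"T": x_low["A"] + x_low["B"], "Y": x_low["Y"]}


def settings(domains, do):
    """Yield every full assignment over `domains` that agrees with `do`."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        x = dict(zip(names, values))
        if all(x[k] == v for k, v in do.items()):
            yield x


def freeze(x):
    """Hashable form of an assignment, so assignments can live in sets."""
    return tuple(sorted(x.items()))


def high_set(do_high):
    """Left-hand side: high-level values consistent with the intervention."""
    return {freeze(x) for x in settings(HIGH_DOMAINS, do_high)}


def image_of_low(do_low):
    """Right-hand side: tau applied to every consistent low-level value."""
    return {freeze(tau(x)) for x in settings(LOW_DOMAINS, do_low)}


def all_interventions(domains):
    """Every intervention: each variable is either left alone or set to a value."""
    names = list(domains)
    options = [(None,) + tuple(domains[n]) for n in names]
    for choice in product(*options):
        yield {n: v for n, v in zip(names, choice) if v is not None}


def natural_omega(do_low):
    """Return the high-level intervention matching `do_low`, or None."""
    target = image_of_low(do_low)
    for do_high in all_interventions(HIGH_DOMAINS):
        if high_set(do_high) == target:
            return do_high
    return None
```

In this toy setup, setting A and B together pins down T, so a matching high-level intervention exists; setting A alone leaves T ranging over two of its three values, which matches no high-level intervention, and `natural_omega` returns `None`.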
Armed with a natural transformation $\omega_\tau$ between low-level and high-level interventions, the next step is of course to define a notion of abstraction: modulo some relatively minor technical conditions, a τ-abstraction is an abstraction consistent with our general framework, and for which $\omega = \omega_\tau$.
One more natural step: A “strong” τ-abstraction is one for which all interventions on the high-level model are allowed.
Constructive τ-Abstraction
In practical examples of abstraction, the high-level variables $X^H$ usually don’t all depend on all the low-level variables $X^L$. Usually, the individual high-level variables $X^H_i$ can each be calculated from non-overlapping subsets of the variables $X^L$. In other words: we can choose a partition σ of the low-level variables and break up τ such that
$X^H_i = \tau_i(X^L_{\sigma_i})$.
Also including all the conditions required for a strong τ-abstraction, B&H call this a “constructive” τ-abstraction.
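As a rough sketch of the structural part of this definition (the partition σ and the component maps τ_i, not the additional strong-abstraction conditions), the following builds τ from per-variable pieces. The function name `make_constructive_tau` and the example variables are hypothetical, chosen only for illustration.

```python
def make_constructive_tau(partition, component_maps):
    """Build tau from a partition sigma and one map tau_i per high-level variable.

    partition:      {high-level name: tuple of low-level names (sigma_i)}
    component_maps: {high-level name: function of the values at sigma_i}
    """
    seen = set()
    for high_name, low_names in partition.items():
        overlap = seen.intersection(low_names)
        if overlap:
            # The subsets sigma_i must not overlap.
            raise ValueError(f"low-level variables used twice: {sorted(overlap)}")
        seen.update(low_names)

    def tau(x_low):
        # X^H_i = tau_i(X^L restricted to sigma_i), computed independently per i.
        return {
            high_name: component_maps[high_name](*(x_low[n] for n in low_names))
            for high_name, low_names in partition.items()
        }

    return tau


# Example: T depends only on (A, B); the high-level Y depends only on the low-level Y.
tau = make_constructive_tau(
    partition={"T": ("A", "B"), "Y": ("Y",)},
    component_maps={"T": lambda a, b: a + b, "Y": lambda y: y},
)
```

The key design point is that the partition is fixed up front and does not depend on the values in `x_low`.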
The interesting part: B&H conjecture that, modulo some as-yet-unknown minor technical conditions, any strong τ-abstraction is constructive.
I think this conjecture is probably wrong. Main problem: constructive τ-abstraction doesn’t handle ontology shifts.
My go-to example of causal abstraction with an ontology shift is a fluid model (e.g. Navier-Stokes) as an abstraction of a particle model with only local interactions (e.g. lots of billiard balls). In this case, we have two representations of the low-level system:
 A Lagrangian representation, in which we track the position and momentum of each particle
 An Eulerian representation, in which we track the mass and momentum densities as a function of position
The two are completely equivalent; each contains the same information. Yet they have very different structure:
 In the Lagrangian representation, each “variable” (i.e. a particle’s mass & momentum at a given time) interacts with all other variables which are nearby in time; we need to check for collisions against every other particle, even those far away in space, since we don’t know ahead of time which will be close by.
 In the Eulerian representation, each “variable” (i.e. mass & momentum density at a given point in space and time) interacts only with variables which are nearby in both space and time.
In this case, the high-level fluid model is a constructive abstraction of the Eulerian representation, but not of the Lagrangian representation: the high-level model only contains interactions which are local in both time and space.
Conceptually, the problem here is that our graph can have dynamic structure: the values of the variables themselves can determine which other variables they interact with. When that happens, an ontology shift can sometimes make the dynamic structure static, as in the Lagrangian → Eulerian transformation. But that means that a constructive τ-abstraction on the static structure will not be a constructive τ-abstraction on the dynamic structure (since the partition would depend on the variables themselves), even though the two models are equivalent (and therefore presumably both are τ-abstractions).
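The difference between dynamic and static structure can be made concrete with a small sketch. It assumes a one-dimensional system, a hypothetical interaction `radius`, and a row of `n_cells` grid cells; these are illustrative choices, not part of the fluid example itself. The point is only which variables can interact: for particles the answer depends on the current positions, while for grid cells it is fixed in advance.

```python
def lagrangian_partners(positions, radius):
    """For each particle, the particles it can currently collide with.

    The result depends on the *values* of the variables (the positions),
    so the interaction graph changes as the particles move.
    """
    return {
        i: {j for j, xj in enumerate(positions) if j != i and abs(xj - xi) <= radius}
        for i, xi in enumerate(positions)
    }


def eulerian_partners(n_cells):
    """For each grid cell, the cells it interacts with: its spatial neighbours.

    The result is the same whatever mass or momentum the cells contain,
    so the interaction graph is static.
    """
    return {c: {d for d in (c - 1, c + 1) if 0 <= d < n_cells} for c in range(n_cells)}
```

Moving one particle changes the output of `lagrangian_partners`, whereas `eulerian_partners` takes no field values at all. A fixed partition σ can be read off the second structure but not the first.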
This does leave open the possibility of weakening the definition of a constructive τ-abstraction to allow the partition σ to depend on $X^L$. Off the top of my head, I don’t know of a counterexample to the conjecture with that modification made.
Exploring safe exploration
This post is an attempt at reformulating some of the points I wanted to make in “Safe exploration and corrigibility” in a clearer way. This post is standalone and does not assume that post as background.
In a previous comment thread, Rohin argued that safe exploration is best defined as being about the agent not making “an accidental mistake.” I think that definition is wrong, at least to the extent that I think it both doesn't make much sense and doesn't describe how I actually expect current safe exploration work to be useful.
First, what does it mean for a failure to be an “accident?” This question is simple from the perspective of an engineer outside the whole system: any unintended failure is an accident, encapsulating the majority of AI safety concerns (i.e. “accident risk”). But that's clearly not what the term “accidental mistake” is pointing at in this context. Rather, the question here is what is an accident from the perspective of the model? Intuitively, an accident from the perspective of the model should be some failure that the model didn't intend or wouldn't retroactively endorse. But that sort of a definition only makes sense for highly coherent mesa-optimizers that actually have some notion of intent. Maybe instead we should be thinking of this from the perspective of the base optimizer/loss function? That is, maybe a failure is an accidental failure if the loss function wouldn't retroactively endorse it (e.g. the model got a very low reward for making the mistake). By this definition, however, every generalization failure is an accidental failure, so safe exploration would just be the problem of generalization.
Of all of these definitions, the one defining an accidental failure from the perspective of the model as a failure that the model didn't intend or wouldn't endorse seems the most sensible to me. Even assuming that your model is a highly coherent mesa-optimizer such that this definition makes sense, however, I still don't think it describes current safe exploration work, and in fact I don't think it's even really a safety problem. The problem of producing models which don't make mistakes from the perspective of their own internal goals is precisely the problem of making powerful, capable models; that is, it's precisely the problem of capability generalization. Thus, to the extent that it's reasonable to say this for any ML problem, the problem of accidental mistakes under this definition is just a capabilities problem. However, I don't think that at all invalidates the utility of current safe exploration work, as I don't think that current safe exploration work is actually best understood as avoiding “accidental mistakes.”
If safe exploration work isn't about avoiding accidental mistakes, however, then what is it about? Well, let's take a look at an example. Safety Gym has a variety of different environments containing both goal states that the agent is supposed to reach and unsafe states that the agent is supposed to avoid. From OpenAI's blog post: “If deep reinforcement learning is applied to the real world, whether in robotics or internet-based tasks, it will be important to have algorithms that are safe even while learning, like a self-driving car that can learn to avoid accidents without actually having to experience them.” Why wouldn't this happen naturally, though? Shouldn't an agent in a POMDP always want to be careful? Well, not quite. When we do RL, there are really two different forms of exploration happening:[1]
 Within-episode exploration, where the agent tries to identify what particular environment/state it's in, and
 Across-episode exploration, which is the problem of making your agent explore enough to gather all the data necessary to train it properly.
In your standard episodic POMDP setting, you get within-episode exploration naturally, but not across-episode exploration, which you have to explicitly incentivize.[2] Because we have to explicitly incentivize across-episode exploration, however, it can often lead to behaviors which are contrary to the goal of actually trying to achieve the greatest possible reward in the current episode. Fundamentally, I think current safe exploration research is about trying to fix that problem; that is, it's about trying to make across-episode exploration less detrimental to reward acquisition. This sort of a problem is most important in an online learning setting where bad across-episode exploration could lead to catastrophic consequences (e.g. crashing an actual car to get more data about car crashes).
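To make the tension concrete, here is a deliberately simplified sketch: an epsilon-greedy learner on a four-action bandit, where each pull stands in for one episode and the explicit exploration step is the across-episode incentive. The reward values, the `UNSAFE` set, and the `mask_unsafe` flag are all assumptions invented for this illustration; in particular, it assumes the unsafe actions are known in advance, which real safe exploration methods generally cannot assume. It is not how Safety Gym or any specific algorithm works.

```python
import random

# Hypothetical per-action rewards; action 3 plays the role of "crashing the car".
REWARDS = [0.2, 0.5, 0.8, -1.0]
UNSAFE = {3}


def train(n_episodes, epsilon, mask_unsafe, seed=0):
    """Epsilon-greedy learning; returns (value estimates, number of unsafe pulls)."""
    rng = random.Random(seed)
    n_actions = len(REWARDS)
    estimates = [0.0] * n_actions
    counts = [0] * n_actions
    unsafe_visits = 0
    for _ in range(n_episodes):
        if rng.random() < epsilon:
            # Explicitly incentivized exploration: pick an action at random,
            # optionally excluding actions assumed to be known-unsafe.
            allowed = [a for a in range(n_actions) if not (mask_unsafe and a in UNSAFE)]
            action = rng.choice(allowed)
        else:
            # Greedy step: act on the current estimates (ties go to the lowest index).
            action = max(range(n_actions), key=estimates.__getitem__)
        reward = REWARDS[action]
        counts[action] += 1
        # Incremental running mean of the observed rewards for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        unsafe_visits += action in UNSAFE
    return estimates, unsafe_visits
```

With `mask_unsafe=False`, the learner only finds out that action 3 is bad by taking it; with `mask_unsafe=True`, it never takes it, at the cost of never learning that action's value. Comparing the two `unsafe_visits` counts shows how the choice of exploration incentive, rather than the greedy policy, is what produces the costly behavior.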
Thus, rather than define safe exploration as “avoiding accidental mistakes,” I think the right definition is something more like “improving across-episode exploration.” However, I think that this framing makes clear that there are other types of safe exploration problems; that is, there are other problems in the general domain of making across-episode exploration better. For example, I would love to see an exploration of how different across-episode exploration techniques impact capability generalization vs. objective generalization: when is across-episode exploration helping you collect data which improves the model's ability to achieve its current goal versus helping you collect data which improves the model's goal?[3] Because across-episode exploration is explicitly incentivized, it seems entirely possible to me that we'll end up getting the incentives wrong somehow, so it seems quite important to me to think about how to get them right, and I think that the problem of getting them right is the right way to think about safe exploration.
[1] This terminology is borrowed from Rohin's first comment in the same comment chain I mentioned previously.
[2] With some caveats: in fact, I think a form of across-episode exploration will be instrumentally incentivized for an agent that is aware of the training process it resides in, though that's a bit of a tricky question that I won't try to fully address now (I tried talking about this somewhat in “Safe exploration and corrigibility,” though I don't think I really succeeded there).
[3] This is what I somewhat confusingly called the “objective exploration problem” in “Safe exploration and corrigibility.”
Open & Welcome Thread - January 2020
 If it’s worth saying, but not worth its own post, here's a place to put it.
 You can also make a shortform post.
 And, if you are new to LessWrong, here's the place to introduce yourself.
 Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.
The Open Thread sequence is here.
Looking at Mixers
This is not an official post by the BIDA Board, and the Board has not reviewed it.
In Spring 2015 BIDA bought a Mackie 1604VLZ Pro mixer used for $275. It wasn't the ideal mixer, but it was reasonably close and a good price. It's been starting to wear out for a while, however, and last night one of the two monitor mixes stopped working. While this one may be fixable, I think we should probably get a new mixer that's closer to what we want and sell this one. Here's what I think would be good in a mixer for a dance like BIDA:
At least 10 XLR inputs, ideally 16.
At least 2 pre-fader monitor outputs, ideally 3+.
EQ with sweepable mids.
XLR outputs for the mains, and ideally the monitors.
Ideally, the monitors are covered by the mute.
Ideally, the monitors are covered by the EQ.
Ideally, few extra controls we're not going to use that, when set wrong, keep things from working in hard-to-diagnose ways (this is an issue with our current board).
Mackie ProFX22v3: analog board with pretty much everything we need, though on the expensive end. It has 17 XLR inputs, two of which can be used as DIs, and 3 pre-fader monitor outputs. The EQ includes the monitors, but the mute unfortunately doesn't. About $700 new.

Mackie ProFX22v2: by going back a generation we can get something almost as good for less. This brings us down to 16 XLR inputs, none of which can be a DI, and two pre-fader monitor outputs. About $500 new.

Mackie ProFX16v3: another direction we could go is a smaller board. Dropping down a size from the ProFX22v3 brings us down to 11 XLR inputs, but doesn't change anything else. The big question is how often we would find ourselves needing more than 11 inputs. About $500 new.

Behringer XR18: this is a tablet-controlled mixer, which probably won't work for us. But it has sixteen XLR/TRS combo inputs, six monitor outputs, and lets you mix from anywhere in the hall. It includes its own Wi-Fi, and I know a lot of sound people like it a lot. About $600 new.

KitchenAid KSM105: this is a very popular analog mixer, but it doesn't meet any of our requirements. I don't understand why so many people buy these when they're so underspec'd.
Most of these also have cheaper options available if we bought used.
What mixers have people been using for live sound, and what do you like about them?
What were the biggest discoveries / innovations in AI and ML?
These could be theoretical breakthroughs (like "the idea of a perceptron" or [something Judea Pearl did]), or they could be watershed developments / new applications that don't necessarily involve much new theory (like AlexNet or AlphaGo). Anything that seems like an important development in AI is fair game.
I want an independently generated list of all the interesting developments of AI, over the whole life of the field, for a research project that I'm working on.
Feel free to include ones that you, personally, think were a big deal in some way, even if most people don't think so.
Thanks!
The Universe Doesn't Have to Play Nice
It's often helpful to think about the root cause of your disagreements with other people and it seems to me that one such cause is that I believe the universe doesn't have to play nice in terms of our ability to know anything, while other people do.
Here are some examples of where this assumption can be used:
 The problem of skepticism - How do we know that any of the objects that we seem to perceive are real? Countless philosophers have tried to answer this question, but I'm happy to bite the bullet as I don't see any reason for believing that the universe has to give us this ability.
 Gödel's theorem/incomputability - It may seem obvious that every mathematical truth has a proof or that we can write a program to compute everything, but we now know this is false.
 Problem of induction - Induction may have worked in the past, but we can't conclude that it'll work in the future without using induction itself, which would be a circular argument.
 Boltzmann Brains - It totally seems as though we could construct a simulation where the majority of agents would be Boltzmann brains.
 Bayesianism - How do we form a prior given that the prior is before any observation that could help us determine a reasonable prior? Some people see this as an argument against Bayesianism, but I see it as just another way the universe isn't nice.
 Theories that can explain anything - Evolutionary psychology has often been criticised as being able to provide just-so stories for whatever we observe. I don't believe that the universe is always kind enough to provide us with a nice clean empirical test to determine if a theory is true or not as opposed to subtly impacting our expectations over a wide range of results. This becomes worse once you take into account varying positions within a theory, as there could be dozens of incompatible schemes for constraining expectations based on a theory with almost nothing in common.
 Non-empirically testable facts - Some people believe if there isn't a clean empirical test to falsify a theory then it is irrelevant pseudoscience rubbish. But reality doesn't have to give us such clear-cut boundaries. There are lots of things that are reasonable to posit or lean towards, but where we can't expect to ever know for certain.
 Qualia - Many people believe qualia don't exist because we wouldn't be able to learn about them empirically. But it seems spurious to assume nothing exists outside of our lightcone just because we can't observe it.
What we Know vs. How we Know it?
Two weeks ago I said:
The other concept I’m playing with is that “what we know” is inextricable from “how we know it”. This is dangerously close to logical positivism, which I disagree with my limited understanding of. And yet it’s really improved my thinking when doing historical research.
I have some more clarity on what I meant now. Let’s say you’re considering my ex-roommate, person P, as a roommate, and ask me for information. I have a couple of options.
Scenario 1: I turn over chat logs and video recordings of my interactions with P.
E.g., recordings of P playing music loudly and chat logs showing I’d asked them to stop.
Trust required: that the evidence is representative and not an elaborate deep fake.
Scenario 2: I report representative examples of my interactions with P.
E.g., “On these dates P played music really loudly even when I asked them to stop.”
Trust required: that from scenario 1, plus that I’m not making up the examples.
Scenario 3: I report summaries of patterns with P
E.g., “P often played loud music, even when I asked them to stop”
Trust required: that from scenario 2, plus my ability to accurately infer and report patterns from data.
Scenario 4: I report what a third party told me
E.g. “Mark told me they played loud music a lot”
Trust required: that from scenario 3, plus my ability to evaluate other people’s evidence
Scenario 5: I give a flat “yes good” or “no bad” answer.
E.g., “P was a bad roommate.”
Trust required: that from scenario 3 and perhaps 4, plus that I have the same heuristics for roommate goodness that you do.
The earlier the scenario, the more you can draw your own conclusions and the less trust you need to have in me. Maybe you don’t care about loud music, and a flat yes/no would drive you away from a roommate that would be fine for you. Maybe I thought I was clear about asking for music to stop but my chat logs reveal I was merely hinting, and you are confident you’ll be able to ask more directly. The more specifics I give you, the better an assessment you’ll be able to make.
Here’s what this looks like applied to recent reading:
Scenario 5: Rome fell in the 500s AD.
Even if I trust your judgement, I have no idea why you think this or what it means to you.
Scenario 4: In Rome: The Book, Bob Loblaw says Rome fell in the 500s AD.
At least I can look up why Bob thinks this.
Scenario 3: Pottery says Rome fell between 300 and 500 AD.
Useful to experts who already know the power of pottery, but leaves newbies lost.
Scenario 2: Here are 20 dig sites in England. Those dated before 323 (via METHOD) contain pottery made in Greece (which we can identify by METHOD), those after 500 AD show cruder pottery made locally.
Great. Now my questions are “Can pottery evidence give that much precision?” and “Are you interpreting it correctly?”
Scenario 1: Please enjoy this pile of 3 million pottery shards.
Too far, too far.
In this particular example (from The Fall of Rome), 2-3 was the sweet spot. It allowed me to learn as much as possible with a minimum of trust. But there’s definitely room in life for 4; you can’t prove everything in every paper and sometimes it’s more efficient to offload it.
I don’t view 5 as acceptable for anything that claims to be evidence-based, or at least not on any basis besides “Try this and see if it helps you” (which is a perfectly fine basis if it’s cheap).
Clumping Solstice Singalongs in Groups of 2-4
This post assumes you're familiar with rationalist solstice. (It also assumes that while yes, ritual is something to be epistemically careful about, the overall effect size is relatively small compared to spending much of your life thinking about a topic with peers that think that topic is important, and meanwhile having community identities is valuable. If you want to debate that please do so on one of those previous posts)
If you run a solstice ceremony with singalongs, there's particular value in:
 Doing at least 16 singalongs
 Clumping* them together in groups of 2-4, rather than alternating song / story / song / story. (Clumping is valuable even if you are doing a smaller number of songs)
This isn't the right approach for all possible solstice aesthetics, but there's a magic thing that can happen here if you do. And if you're not doing it (i.e. most solstice organizers seem to default to the "story/song/story/song" thing), you won't receive any feedback that there's a different thing you could do with a magic, synergistic outcome.
Reasons to want more songs, and to cluster them in groups of 2-4:
 It takes people awhile to get comfortable singing.
 Context switching makes it harder to get into the headspace of singing.
 There is a secret, deeper headspace of singing that you only get to if you do a LOT of it, in a row, in an environment that encourages being thoroughly unselfconscious about it.
 There is a long game that I think singalong solstice celebrations can help with, which is to restore musicality as a basic skill, which in turn allows you to have much richer musical traditions than if it's an incidental thing you do a little of sometimes. The payoff for this comes on a multiyear timescale.
There are reasons not to want this many songs, or to have them clustered this way. Some people get more value out of the speeches or other activities than songs. One organizer of a small solstice mentioned their primary concern was "Have each person bring one activity to the solstice", and most of them weren't comfortable with songleading. Getting people directly involved with Solstice indeed seems valuable if that's an option. (This makes more sense for smaller communities)
But my impression is that much of the time, the ratio of songs/stories and their placement was determined somewhat arbitrarily, and then never reconsidered.
Getting Comfortable
It used to be that group singing was quite common. There were no iPods or headphones, or even recordings. Running into a 1-in-a-million musician was a rare event. Therefore, it was quite natural that if you wanted music in your life, you had to make it yourself, and when you did you were comparing yourself to your friends and family, not to popular superstars.
This is no longer the case by default. So it takes people awhile to get used to "oh, okay I am actually allowed to sing. I am actually encouraged to sing. It doesn't matter if I sound good, we are doing this thing together."
For many people, it takes at least two songs in a row to get them to a point where they even consider singing at all, let alone feeling good about it. The feeling of hesitation resets when you spend a lot of time listening to a speech.
The idea here is not just "people get to sing", but, "people feel a deep reassurance that singing is okay, that we are all here singing together", and I think that's just impossible to get in the space of one or even two songs. (It becomes even harder to hit this point if there are proportionately few singalongs, and especially if there are also performance-piece songs that people are not encouraged to sing along with)
Deep musical headspace
In my preferred celebration, "Deep reassurance that singing is okay" is only step one. There's a second deeper stage of feeling connected to the other people in the room, and connected to ideas that you're all here to celebrate, for which reassurance is a prerequisite but insufficient.
Step two requires the songs be resonant, and for you to have a strong sense that the other people in the room all have a particular connection to the songs. (The sense of ingroup identity and sense of philosophical connection are separate qualities, but work together to produce something greater than the sum of their parts)
You can get pieces of this in the space of a single song, but there's a version of it with unique qualia that takes something like 8 songs to really get going (and then, once you're there, it's nice to get to stay there awhile)
Interwoven Story and Song; each Round Deepening
The formula I find works best (at least for my preferences) is:
 On average, groups of 2-4 songs
 Start with a song that's a particularly inviting singalong, to set the overall context of "this is an event where we're here to sing together."
 Each song gets a brief story (like 10-30 seconds) that gives it some context and helps people fit it into the overall narrative arc of the night. The brief stories are not long enough to take you out of singalong headspace.
 In between sets of 2-4 songs, there are longer stories, speeches, meditations and other activities that move the narrative along more significantly. Each one sets the overall context for the next 2-4 songs, shifting the particular qualia of "deep singalong" that you'd get from it.
Once you've gotten into the overall singalong headspace, it's less necessary to do groups of songs – alternating between a song and a speech won't kill the headspace once it's had a chance to take root.
Your Mileage May Vary
Reiterating a final time that this is just one particular effect you can go for. I think it's important that local solstice organizers adapt to fit the needs of their particular communities. But the effect I'm trying to describe here is hard to grok if you haven't directly experienced it, and I wanted people to at least have considered an option they may have been missing.
UML V: Convex Learning Problems
(This is part five in a sequence on Machine Learning based on this book. Click here for part 1.)
The first three posts of this sequence have defined PAC learning and established some theoretical results, problems (like overfitting), and limitations. While that is helpful, it doesn't actually answer the question of how to solve real problems (unless the brute force approach is viable). The first way in which we approach that question is to study particular classes of problems and prove that they are learnable. For example, in the previous post, we've looked (among other things) at linear regression, where the loss function has the form
MJXcTeXcalBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_CaligraphicBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_CaligraphicBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_CaligraphicBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXfrakR; src: local('MathJax_Fraktur'), local('MathJax_FrakturRegular')} @fontface {fontfamily: MJXcTeXfrakRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXfrakB; src: local('MathJax_Fraktur Bold'), local('MathJax_FrakturBold')} @fontface {fontfamily: MJXcTeXfrakBx; src: local('MathJax_Fraktur'); fontweight: bold} @fontface {fontfamily: MJXcTeXfrakBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathBI; src: local('MathJax_Math BoldItalic'), local('MathJax_MathBoldItalic')} @fontface {fontfamily: MJXcTeXmathBIx; src: local('MathJax_Math'); fontweight: bold; fontstyle: italic} @fontface {fontfamily: MJXcTeXmathBIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathBoldItalic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathBoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathBoldItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansR; src: local('MathJax_SansSerif'), local('MathJax_SansSerifRegular')} @fontface {fontfamily: MJXcTeXsansRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansB; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerifBold')} @fontface {fontfamily: MJXcTeXsansBx; src: local('MathJax_SansSerif'); fontweight: bold} @fontface {fontfamily: MJXcTeXsansBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansI; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerifItalic')} @fontface {fontfamily: MJXcTeXsansIx; src: local('MathJax_SansSerif'); fontstyle: italic} @fontface {fontfamily: MJXcTeXsansIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifItalic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXscriptR; src: local('MathJax_Script'), local('MathJax_ScriptRegular')} @fontface {fontfamily: MJXcTeXscriptRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_ScriptRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_ScriptRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_ScriptRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXtypeR; src: local('MathJax_Typewriter'), local('MathJax_TypewriterRegular')} @fontface {fontfamily: MJXcTeXtypeRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_TypewriterRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_TypewriterRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_TypewriterRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXcalR; src: local('MathJax_Caligraphic'), local('MathJax_CaligraphicRegular')} @fontface {fontfamily: MJXcTeXcalRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_CaligraphicRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_CaligraphicRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_CaligraphicRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainB; src: local('MathJax_Main Bold'), local('MathJax_MainBold')} @fontface {fontfamily: MJXcTeXmainBx; src: local('MathJax_Main'); fontweight: bold} @fontface {fontfamily: MJXcTeXmainBw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainI; src: local('MathJax_Main Italic'), local('MathJax_MainItalic')} @fontface {fontfamily: MJXcTeXmainIx; src: local('MathJax_Main'); fontstyle: italic} @fontface {fontfamily: MJXcTeXmainIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainR; src: local('MathJax_Main'), local('MathJax_MainRegular')} @fontface {fontfamily: MJXcTeXmainRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathI; src: local('MathJax_Math Italic'), local('MathJax_MathItalic')} @fontface {fontfamily: MJXcTeXmathIx; src: local('MathJax_Math'); fontstyle: italic} @fontface {fontfamily: MJXcTeXmathIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathItalic.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize1R; src: local('MathJax_Size1'), local('MathJax_Size1Regular')} @fontface {fontfamily: MJXcTeXsize1Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size1Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size1Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size1Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize2R; src: local('MathJax_Size2'), local('MathJax_Size2Regular')} @fontface {fontfamily: MJXcTeXsize2Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size2Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size2Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size2Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize3R; src: local('MathJax_Size3'), local('MathJax_Size3Regular')} @fontface {fontfamily: MJXcTeXsize3Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size3Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size3Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size3Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize4R; src: local('MathJax_Size4'), local('MathJax_Size4Regular')} @fontface {fontfamily: MJXcTeXsize4Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size4Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size4Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size4Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecR; src: local('MathJax_Vector'), local('MathJax_VectorRegular')} @fontface {fontfamily: MJXcTeXvecRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecB; src: local('MathJax_Vector Bold'), local('MathJax_VectorBold')} @fontface {fontfamily: MJXcTeXvecBx; src: local('MathJax_Vector'); fontweight: bold} @fontface {fontfamily: MJXcTeXvecBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorBold.otf') format('opentype')} ℓ:Rd→R and is linear. In this post, we focus on convex learning problems, where the loss function also has the above form and is convex.
Convexity
We begin with sets rather than functions.
Convex sets
A set $M$ (that is part of a vector space) is called convex iff for any two points in that set, the line segment which connects both points is a subset of $M$.
The condition that $M$ is part of a vector space is missing in the book, but it is key: as far as I know, being part of a vector space is the most general way of demanding that a line between two points even exists. Convexity cannot be defined for mere topological spaces, or even metric spaces. In our case, all of our sets will live in $\mathbb{R}^d$ for some $d \in \mathbb{N}_+$.
For example, let's look at letters (as a subset of the plane, $\mathbb{R}^2$). None of the letters in this chapter so far is convex; the letter l comes the closest, but it's not quite there if you look closely enough. Even the uppercase I is not convex in this font. The only convex symbols in this post that I've noticed are . and ' and | and –.
Conversely, every regular filled polygon with $n$ corners is convex. The circle is not convex (no two points have a line segment which is contained in the circle), but the disc (filled circle) is convex. The disc with an arbitrary set of points on its boundary (i.e. the circle) taken out remains convex. The disc with any other point taken out is not convex, and neither is the disc with any additional point added. You get the idea. (To be precise on the last one, the mathematical description of the disc is $D = \{x \in \mathbb{R}^2 \mid \|x\| \le 1\}$, so there is no way to add a single point that somehow touches the boundary.)
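The disc and circle examples can be checked numerically. The sketch below is illustrative only: the helper names (`in_disc`, `on_circle`, `segment_point`) and the floating-point tolerance `TOL` are assumptions made for this example and do not come from the book.

```python
import math

TOL = 1e-9  # assumed tolerance for floating-point comparisons

def in_disc(p):
    """Membership test for the closed unit disc D = {x : ||x|| <= 1}."""
    return math.hypot(p[0], p[1]) <= 1.0 + TOL

def on_circle(p):
    """Membership test for the unit circle (the boundary of the disc)."""
    return abs(math.hypot(p[0], p[1]) - 1.0) <= TOL

def segment_point(p, q, alpha):
    """Point p + alpha * (q - p) on the line segment from p to q."""
    return (p[0] + alpha * (q[0] - p[0]), p[1] + alpha * (q[1] - p[1]))

p, q = (1.0, 0.0), (0.0, 1.0)        # two points on the circle
midpoint = segment_point(p, q, 0.5)  # the point (0.5, 0.5)

print(in_disc(midpoint))    # the segment stays inside the disc
print(on_circle(midpoint))  # but it leaves the circle
```

A single failing segment is enough to show that the circle is not convex. Passing checks for the disc are only evidence, since convexity quantifies over all pairs of points.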
Convex functions
Informally, a function $f : \mathbb{R}^d \to \mathbb{R}$ is convex iff the set of all points on and above the graph of the function is convex as a subset of $\mathbb{R}^{d+1}$, where the dependent variable goes up/downward.
(Here, the middle function ($x^3$) is not convex because the red line segment is not in the blue set, but the left ($|x|$) and the right ($x^2$) are convex.)
The formal definition is that a function $f : \mathbb{R}^d \to \mathbb{R}$ is convex iff for all $x, y \in \mathbb{R}^d$, the inequality $f(x + \alpha(y - x)) \le f(x) + \alpha[f(y) - f(x)]$ holds for all $\alpha \in [0, 1]$. This says that a line segment connecting two points on the graph lies on or above the graph, so it is the same thing.
If $d = 1$, as was the case in all of my pictures, then $f$ is convex iff the little pixie flying along the function graph never turns rightward. If $f$ is differentiable, this is the case iff $f'$ is monotonically increasing, which (if $f$ is twice differentiable) is the case iff $f''(x) \ge 0$ for all $x \in \mathbb{R}$.
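The formal definition can be probed numerically in the one-dimensional case. The following is a minimal sketch; the helper name `violates_convexity`, the sampling interval and the tolerance are assumptions made for illustration. It searches random triples $(x, y, \alpha)$ for a violation of the defining inequality.

```python
import random

def violates_convexity(f, lo=-5.0, hi=5.0, trials=10_000, tol=1e-9, seed=0):
    """Return a triple (x, y, alpha) that violates the convexity inequality
    f(x + alpha*(y - x)) <= f(x) + alpha*(f(y) - f(x)), or None if no
    violation was found among the sampled triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        alpha = rng.random()
        lhs = f(x + alpha * (y - x))
        rhs = f(x) + alpha * (f(y) - f(x))
        if lhs > rhs + tol:  # tolerance absorbs rounding error
            return (x, y, alpha)
    return None

print(violates_convexity(abs))               # absolute value
print(violates_convexity(lambda x: x ** 2))  # square
print(violates_convexity(lambda x: x ** 3))  # cube
```

Finding a violating triple proves that a function is not convex. Finding none only suggests convexity, because the check samples finitely many points from a bounded interval.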
The main reason why convexity is a desirable property is that, for a convex function, every local minimum is a global minimum – which is probably fairly obvious, but let's do a formal proof, because it's reasonably easy in this case.
Suppose that $x \in \mathbb{R}^d$ is a local minimum. Then we find some ball $B_d(x, \epsilon) := \{p \in \mathbb{R}^d \mid \|p - x\| \le \epsilon\}$ around $x$ such that $f(y) \ge f(x)$ for all $y$ in the ball (this is what it means to be a local minimum in $\mathbb{R}^d$). Now let $z$ be an arbitrary point in $\mathbb{R}^d$; we show that its function value can't lie below that of $x$. Imagine the line segment from $x$ to $z$. A part of it must lie in our ball, so we find some (small) $\delta \in (0, 1]$ such that $x + \delta[z - x] \in B_d(x, \epsilon)$. Then (because $x$ is our local minimum), we have that $f(x) \le f(x + \delta[z - x])$. By convexity of $f$ we have $f(x + \delta[z - x]) \le f(x) + \delta[f(z) - f(x)]$, so taken together we obtain the inequality
$f(x) \le f(x) + \delta[f(z) - f(x)]$
Or equivalently $\delta[f(z) - f(x)] \ge 0$, which is to say that $\delta f(z) \ge \delta f(x)$, which (since $\delta > 0$) is to say that $f(z) \ge f(x)$.
If there are several local minima, then by the above they are all global minima and share the same function value. The line segment between any two of them cannot go down (otherwise its endpoints wouldn't be global minima) and, by convexity, cannot go up either. So really there is just one global minimum, which might be 'wide' if the function just happens to not go up or down for a while. This is all about the difference between $\le$ and $<$. The simplest example is a constant function: it is convex, and every point is a global minimum (or rather, the global minimum).
Jensen's Inequality
The key fact about convex functions, I would argue, is Jensen's inequality:
Given $\alpha_1, \dots, \alpha_n \in \mathbb{R}_+$ with $\sum_{i=1}^n \alpha_i = 1$, if $f : \mathbb{R}^d \to \mathbb{R}$ is convex, then for any sequence $(x_1, \dots, x_n) \in (\mathbb{R}^d)^n$, it holds that $f(\sum_{i=1}^n \alpha_i x_i) \le \sum_{i=1}^n \alpha_i f(x_i)$.
If you look at the inequality above, you might notice that it is almost the definition of linearity, except for the condition $\sum_{i=1}^n \alpha_i = 1$ and the fact that we have $\le$ instead of $=$. So convex functions fulfill the linearity property as an inequality rather than an equality (almost). In particular, linear functions are convex. Conversely, concave functions (these are functions where the $\le$ in the definition of convex functions is a $\ge$) also fulfill the above property as an inequality, only the sign turns around again. In particular, linear functions are concave. To refresh your memory, here is the definition of convexity:
$f(x + \alpha(y - x)) \le f(x) + \alpha[f(y) - f(x)] \quad \forall x, y \in \mathbb{R}^d, \alpha \in [0, 1]$
To summarize: convex functions never turn rightward, concave functions never turn leftward, and the intersection of both does neither, i.e. always goes straight, i.e. is linear. Looking at convexity and concavity as a generalization of linearity might further motivate the concept. In fact, it might have even been wise to define the three concepts in analogous terms rather than considering Jensen's inequality a result about convex functions.
Terms of the form $x + a(y - x)$, which one sees quite often (for example in defining points on a line segment), can be equivalently written as $(1 - a)x + ay$. I think the first form is far more intuitive; however, the second one generalizes a bit better. We see that $x$ and $y$ are given weights, and those weights sum up to 1. If one goes from 2 weighted values to $n$ weighted values (still all nonnegative), one gets Jensen's inequality. Thus, the statement of Jensen's inequality is that if you take any number of points on the graph and construct a weighted mean, that resulting point still lies on or above the graph. See Wikipedia's page for a simple proof via induction.
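Jensen's inequality is easy to check numerically for $d = 1$. The sketch below is only an illustration; the helper name `jensen_gap` and the random points and weights are assumptions of this example. It computes the weighted mean of the function values minus the function value at the weighted mean, which should be nonnegative for convex functions, nonpositive for concave ones, and zero for linear ones.

```python
import random

def jensen_gap(f, points, weights):
    """Return sum_i w_i * f(x_i) - f(sum_i w_i * x_i)."""
    mean_point = sum(w * x for w, x in zip(weights, points))
    mean_value = sum(w * f(x) for w, x in zip(weights, points))
    return mean_value - f(mean_point)

rng = random.Random(0)
points = [rng.uniform(-3, 3) for _ in range(6)]
raw = [rng.random() for _ in range(6)]
weights = [r / sum(raw) for r in raw]  # nonnegative weights that sum to 1

gap_convex = jensen_gap(lambda x: x * x, points, weights)
gap_concave = jensen_gap(lambda x: -x * x, points, weights)
gap_linear = jensen_gap(lambda x: 2 * x, points, weights)
print(gap_convex, gap_concave, gap_linear)
```

For $f(x) = x^2$ the gap is exactly the weighted variance of the points, which is one way to see why it cannot be negative.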
Guaranteeing learnability
Recall that we are trying to find solvable special cases of the setting "minimize a loss function of the form $\ell : \mathbb{R}^d \to \mathbb{R}$". This can be divided into three tasks:
(1) define the special case
(2) demonstrate that this special case is indeed solvable
(3) apply it as widely as possible
This chapter is about (1). (By the way, when I say "chapter" and "section", I'm referring to the level-1 and level-2 headlines of this post as visible in the navigation at the left.) The reason why we aren't already done with (1) is that convexity of the loss function alone turns out to be insufficient to guarantee PAC learnability. We'll discuss a counterexample in the context of linear regression and then define additional properties to remedy the situation.
A failure of convexity
In our counterexample, we'll have $X = Y = \mathbb{R}$; note that this is a convex set. Our hypothesis class $H$ will be that of all (small) linear predictors, i.e. just $H = \{h_\alpha : x \mapsto \alpha \cdot x \mid \alpha \in [-1, 1]\}$. The reason that we only allow small predictors is that our final formulation of the learnable class will also demand that $H$ is a bounded set, so this example demonstrates that even boundedness + convexity is still not enough.
We've previously defined real loss functions as taking a hypothesis and returning the real error, and empirical loss functions as taking a hypothesis and a training sequence and returning the empirical error. Now we'll look at point-based loss functions (not bolded as it's not an official term, but I'll be using it a lot), which measure the error of a hypothesis on a single point only, i.e. they have the form $\ell_{(x,y)} : H \to \mathbb{R}$ for some $(x, y) \in X \times Y$. To be more specific, we will turn the squared loss function defined in the previous post into a point-based loss function. Thus we will have $\ell^{(2)}_{(x,y)}(h_\alpha) = \|\alpha \cdot x - y\|^2 = (\alpha x - y)^2$, where the last equality holds because we're in the one-dimensional case (this is also why none of the letters are bolded). We will only care about two points (all else will have probability mass 0), namely these two:
That's the point $(1, -1)$ at the left and one all the way to the right at $(1/\mu, 0)$. With $\mu$, think of an extremely small positive number, so that $1/\mu$ is quite large.
If this class was PAC learnable, then there would be a learner A such that, for all ϵ,δ∈(0,1), if the size of the training sequence is at least m∗(ϵ,δ), then for all probability distributions over X×Y, with probability at least 1−δ over the choice of S, the error of A(S) would be at most ϵ larger than that of the best classifier.
So to prove that it is not learnable, we first assume we have some learner $A$. Then we get to set some $\epsilon$ and $\delta$ and construct a probability distribution $D_A$ based on $A$. Finally, we have to prove that $A$ fails on the problem given the choices of $\epsilon$ and $\delta$ and $D_A$. That will show that the problem is not PAC learnable.
We consider two possible candidates for $D_A$. The first is $D_L$, which has all probability mass on the point $(1, -1)$ on the left. The second is $D_?$, which has almost all probability mass on the point $(1, -1)$, but also has $\mu$ probability mass on the point $(1/\mu, 0)$. Here $\mu \in \mathbb{R}_+$ will be an extremely small number; so small that the right point will be unlikely to ever appear in the training sequence.
If the right point doesn't appear in the training sequence, then the sequence consists of only the left point sampled over and over again. In that case, $A$ cannot differentiate between $D_L$ and $D_?$, so in order to succeed, it would have to output a hypothesis which performs well with both distributions, which as we will show is not possible. Given our class $H$, the hypothesis $A(S)$ must be of the form $h_\alpha$ for some $\alpha \in [-1, 1]$. Recall that the classifier is supposed to predict the y-coordinate of the points. Thus, for the first point, $\alpha = -1$ would be the best choice (since $-1 \cdot 1 = -1$) and for the second point, $\alpha = 0$ would be the best choice (since $0 \cdot 1/\mu = 0$).
Now if $\alpha \le -0.5$, then we declare that $D_A = D_?$. In this case (assuming that the second point doesn't appear in the training sequence), there will be a $\mu$ chance of predicting the value $\alpha \cdot 1/\mu = \alpha/\mu \le -1/(2\mu)$, which, since the true label is 0 and we use the squared loss function, leads to an error of at least $1/(4\mu^2)$, and thus the expected error is at least $\mu \cdot 1/(4\mu^2) = 1/(4\mu)$, which, because $\mu$ is super tiny, is a very large number. By contrast, the best classifier would be at least as good as the classifier with $\alpha = 0$, which would only have error $1 - \mu$ (for the left point), which is about 1 and thus a much smaller number.
Conversely, if $\alpha > -0.5$, we declare that $D_A = D_L$, in which case the error of $A(S)$ is at least $(-0.5 - (-1))^2 = 1/4$, whereas the best classifier (with $\alpha = -1$) has zero error.
Thus, we only need to choose some $\epsilon < 1/4$ and an arbitrary $\delta$. Then, given the sample size $m$, we set $\mu$ small enough such that the training sequence is less than $\delta$ likely to contain the second point. This is clearly possible: the probability that the second point shows up among $m$ samples is at most $m\mu$, and we can make $\mu$ arbitrarily small; if we wanted, we could make it so small that the probability of sampling the second point is $< \delta^{100}$. That concludes our proof.
Why was this negative result possible? It comes down to the fact that we were able to make the error of the first classifier (with $\alpha \le -0.5$) large via a super unlikely sample point with super$^2$ high error, so the problem is the growth rate of the loss function. As long as the loss function grows so quickly that, while both giving a point less probability mass and moving it further to the right, the expected error goes up, well then one can construct examples with arbitrarily high expected error. (As we've seen, the expected error in the case of $\alpha \le -0.5$ is at least $1/(4\mu)$, i.e. a number that grows arbitrarily large as $\mu \to 0$.)
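The numbers in this counterexample can be reproduced directly. The sketch below is a hedged illustration: the function names (`expected_squared_loss`, `d_left`, `d_question`) and the concrete values of `mu` and `m` are assumptions chosen for the example, not part of the original argument.

```python
def expected_squared_loss(alpha, dist):
    """Expected squared loss of h_alpha(x) = alpha * x.

    `dist` is a finite distribution given as ((x, y), probability) pairs.
    """
    return sum(prob * (alpha * x - y) ** 2 for (x, y), prob in dist)

def d_left():
    """D_L: all probability mass on the left point (1, -1)."""
    return [((1.0, -1.0), 1.0)]

def d_question(mu):
    """D_?: mass 1 - mu on (1, -1) and mass mu on the far point (1/mu, 0)."""
    return [((1.0, -1.0), 1.0 - mu), ((1.0 / mu, 0.0), mu)]

mu, m = 1e-6, 1000  # illustrative values, chosen arbitrarily

# Case alpha <= -0.5 under D_?: the rare far-right point dominates the error.
err_learner = expected_squared_loss(-0.5, d_question(mu))
err_alpha_zero = expected_squared_loss(0.0, d_question(mu))
print(err_learner, "vs. the bound", 1 / (4 * mu), "and alpha = 0:", err_alpha_zero)

# Case alpha > -0.5 under D_L: error above 1/4, while alpha = -1 has error 0.
print(expected_squared_loss(-0.4, d_left()), expected_squared_loss(-1.0, d_left()))

# Probability that m samples from D_? never contain the far-right point.
p_unseen = (1 - mu) ** m
print(p_unseen)
```

Shrinking `mu` further makes the far-right point even less likely to be sampled while pushing the expected error $1/(4\mu)$ even higher, which is exactly the growth-rate problem described above.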
Lipschitzness & Smoothness
There are at least two ways to formalize a requirement such that the loss function is somehow "bounded". They're called Lipschitzness and Smoothness, and both are very simple.
Lipschitzness says that a function cannot grow too fast, i.e.
A function $f : \mathbb{R}^d \to \mathbb{R}$ is $\rho$-Lipschitz iff $|f(y) - f(x)| \le \rho \|y - x\|$ for all $x, y \in \mathbb{R}^d$.
If $f$ is differentiable, then a way to measure maximum growth is the gradient, because the gradient points into the direction of fastest growth. Thus, one has the equivalent characterization:
A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $\rho$-Lipschitz iff $\|\nabla f(x)\| \le \rho$ for all $x \in \mathbb{R}^d$.
However, non-differentiable functions can be Lipschitz; for example, the absolute value function $|x|$ on the real numbers is 1-Lipschitz. Conversely, smoothness is about the change of change. Thus, $|x|$ is definitely not smooth since the derivative jumps from $-1$ to $1$ across a single point (smoothness is only even defined for differentiable functions). On the other hand, the function $x^2$ is smooth on all of $\mathbb{R}$. The formal definition simply moves Lipschitzness one level down, i.e.
A differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ is $\beta$-smooth iff its gradient is $\beta$-Lipschitz.
Which is to say, iff $\|\nabla f(y) - \nabla f(x)\| \le \beta \|y - x\|$ for all $x, y \in \mathbb{R}^d$. In the one-dimensional case, the gradient equals the derivative, and if the derivative is itself differentiable, then smoothness can be characterized in terms of the second derivative. Thus, a twice-differentiable function $f : \mathbb{R} \to \mathbb{R}$ is $\beta$-smooth iff $|f''(x)| \le \beta$ for all $x \in \mathbb{R}$.
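Both properties can be estimated numerically in one dimension by looking at difference quotients. The sketch below is an illustration under stated assumptions: the helper name `max_slope`, the grid size and the intervals are all chosen for this example. It returns the largest slope between neighbouring grid points, which is a lower bound on any Lipschitz constant of the function on that interval.

```python
def max_slope(f, lo, hi, n=2001):
    """Largest |f(b) - f(a)| / (b - a) over neighbouring grid points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))

def square(x):
    return x * x

def square_derivative(x):
    return 2 * x

# |x| is 1-Lipschitz: the estimate stays at 1 however wide the interval is.
print(max_slope(abs, -10, 10), max_slope(abs, -100, 100))

# x^2 is not Lipschitz on all of R: the estimate grows with the interval.
print(max_slope(square, -10, 10), max_slope(square, -100, 100))

# x^2 is 2-smooth: its derivative 2x has slope 2 everywhere.
print(max_slope(square_derivative, -100, 100))
```

A grid estimate like this can only ever under-estimate the true constant, so it is useful for refuting a proposed bound but not for proving one.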
One now defines the class of convex Lipschitz bounded problems and that of convex smooth bounded problems. They both require that $H$ has a structure as a familiar set like $B_d(0, M)$, that it is convex and bounded (so $H = \mathbb{R}^d$ would not suffice), and that, for all $(x, y) \in X \times Y$, the point-based loss function $\ell_{(x,y)} : H \to \mathbb{R}$ is convex and $\rho$-Lipschitz (in the former case) or convex, $\beta$-smooth and nonnegative (in the latter case). If all this is given, the class is called convex Lipschitz bounded with parameters $(M, \rho)$; or convex smooth bounded with parameters $(M, \beta)$.
In the previous example, the hypothesis class could be represented by the set $[-1, 1]$, which is both convex and bounded. (In that example, we thought of it as a set of functions, each of which is fully determined by an element in $[-1, 1]$; now we think of it as the set $[-1, 1]$ itself.) Each point-based loss function $\ell^{(2)}_{(x,y)}$ is convex (and nonnegative). However, for any number $x \in \mathbb{R}_+$, the loss function $\ell^{(2)}_{(x,y)}$ is defined by the rule $\ell^{(2)}_{(x,y)}(\alpha) = (\alpha \cdot x - y)^2$, and the gradient of this function with respect to $\alpha$ (which equals the derivative since we're in the one-dimensional case) is $2x(\alpha \cdot x - y)$. Since this gets arbitrarily large as $x$ gets large, there is no parameter $\rho$ such that all of these functions are $\rho$-Lipschitz. Furthermore, the second derivative is $2x^2$. This means that each particular function induced by the point $(x, y)$ is $2x^2$-smooth, but there is no parameter $\beta$ such that all functions are $\beta$-smooth.
Surrogate Loss Functions
We are now done with task (1), defining the problem. Task (2) would be demonstrating that both convex Lipschitz bounded and convex smooth bounded problems are in fact PAC learnable with no further conditions. This is done by defining an algorithm and then proving that the algorithm works, i.e. learns any instance of the class with the usual PAC learning guarantees. The algorithm we will look at for this purpose (which does learn both classes) is an implementation of Stochastic Gradient Descent; however, we'll do this in the next post rather than now. For this final chapter, we will instead dive into (3), i.e. find an example of how the ability to learn these two classes is useful even for problems that don't naturally fit into either of them.
Recall the case of binary linear classification. We have a set of points in some high-dimensional space $X = \mathbb{R}^d$, a training sequence where points are given binary labels (i.e. $Y = \{-1, 1\}$), and we wish to find a hyperplane that performs well in the real world. We've already discussed the Perceptron algorithm and also reduced the problem to linear programming; however, both approaches have assumed that the problem is separable.
We're not going to find a perfect solution for the general case, because one can show that the problem is NP-hard. However, we can find a solution that approximates the optimal predictor. The approach here is to define a surrogate loss function, which is a loss function $\ell^*$ that (a) upper-bounds the real loss $\ell^{0-1}$, and (b) has nicer properties than the real loss, so that minimizing it is easier. In particular, we would like for it to be a member of one of the two learnable classes we have introduced. Our point-based loss function for $\ell^{0-1}$ has the form $\ell^{0-1}_{(x,y)}(h_a) := \mathbb{1}_{h_a(x) \ne y}$, where $\mathbb{1}_B$ for a boolean statement $B$ is 1 iff $B$ is true and 0 otherwise.
Recall that each hyperplane is fully determined by one vector in $\mathbb{R}^d$, hence the notation $h_a$. If we represent $H$ directly as $\mathbb{R}^d$ and assume $d = 1$, then the graph of $\ell^{0-1}_{(x,y)}$ looks like this...
... because in $d = 1$ and the homogeneous case, the classifier is determined by a single number; if this number is positive, it will label all positive points with 1; if it's negative, it will label all negative points with 1. If the x-coordinate of the point in question is positive with label 1 or negative with label $-1$ (i.e. $x > 0$ and $y = 1$; or $x < 0$ and $y = -1$), then the former case is the correct one and we get this loss function. Otherwise, the loss function would jump from 0 to 1 instead.
Obviously, $d = 1$ is silly, but it already demonstrates that this loss function is not convex (it makes a turn to the right, and it's easy to find a segment which connects two points of the graph and doesn't lie above the graph). But consider the alternative loss function $\ell^*_{(x,y)}$:
This new loss function can be defined by the rule ℓ∗(x,y)(ha):=max(0,1−⟨a,x⟩y). In the picture, the horizontal axis corresponds to a and we have x>0 and y=1. This loss function is easily seen to be convex, nonnegative, and not at all smooth. It is also xLipschitz. Thus, the problem with X=Rd is not convex Lipschitz bounded, but if we take X=Bd(0,ρ) and also H=Bd(0,M) for some M,ρ∈R+, then it does become a member of the convexLipschitzbounded class with parameters M and ρ, and we can learn via e.g. stochastic gradient descent.
Of course, this won't give us exactly what we want (although penalizing a predictor for being "more wrong" might not be unreasonable), so if we want to bound our loss (empirical or real) with respect to ℓ0−1, we will have to do it via ℓ0−1(h)=ℓ∗(h)+[ℓ0−1(h)−ℓ∗(h)], where the second term is the difference between both loss functions. If ℓ0−1(h)−ℓ∗(h) is small, then this approach will perform well.
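The two properties claimed for the surrogate loss can be checked numerically. The following is a minimal sketch and not part of the original derivation: the function names, the random test data, and the convention that a point lying exactly on the hyperplane is labeled −1 are all assumptions made for illustration. It checks that the hinge loss upper-bounds the 0-1 loss on random samples, and that in $d=1$ the 0-1 loss violates the midpoint form of convexity while the hinge loss does not.

```python
import random

def dot(a, x):
    """Inner product <a, x> of two equal-length lists."""
    return sum(ai * xi for ai, xi in zip(a, x))

def zero_one_loss(a, x, y):
    """0-1 loss of the homogeneous halfspace h_a on the labeled point (x, y)."""
    # Assumption: points exactly on the hyperplane are labeled -1.
    prediction = 1 if dot(a, x) > 0 else -1
    return 1.0 if prediction != y else 0.0

def hinge_loss(a, x, y):
    """Surrogate loss max(0, 1 - <a, x> y)."""
    return max(0.0, 1.0 - y * dot(a, x))

random.seed(0)
samples = [
    ([random.gauss(0, 1) for _ in range(3)],  # hypothesis a
     [random.gauss(0, 1) for _ in range(3)],  # point x
     random.choice([-1, 1]))                  # label y
    for _ in range(1000)
]

# (a) The surrogate upper-bounds the real loss on every sampled triple.
upper_bound_holds = all(
    hinge_loss(a, x, y) >= zero_one_loss(a, x, y) for a, x, y in samples
)

# (b) Midpoint convexity in d = 1 with x > 0 and y = 1.
x, y = [1.0], 1
a_left, a_right, a_mid = [-3.0], [1.0], [-1.0]  # a_mid is the midpoint

def midpoint_average(loss):
    return 0.5 * (loss(a_left, x, y) + loss(a_right, x, y))

zero_one_midpoint_violated = zero_one_loss(a_mid, x, y) > midpoint_average(zero_one_loss)
hinge_midpoint_ok = hinge_loss(a_mid, x, y) <= midpoint_average(hinge_loss)
```

A random check like this is only a sanity test, not a proof; the upper bound follows directly from the fact that a misclassified point has $\langle a,x\rangle y \le 0$.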
Homeostasis and “Root Causes” in Aging
Let’s start with a stylized fact: almost every cell type in the human body is removed and replaced on a regular basis. The frequency of this turnover ranges from a few days (for many immune cells and cells in the gastrointestinal lining) to ten years (for fat, heart, and skeleton cells). Only a handful of tissues are believed to be non-renewing in humans, e.g. eggs, neurons, and the lens of the eye (and even out of those, neurons are debatable).
This means that the number of cells of any given type is determined by “homeostatic equilibrium”: the balance of cell removal and replacement. If an ulcer destroys a bunch of cells in your stomach lining, they’ll be replaced over a few days, and the number of stomach cells will return to roughly the same equilibrium level as before. If a healthy person receives a bunch of extra red blood cells in a transfusion, they’ll be broken down over a few months, and the number of blood cells will return to roughly the same equilibrium level as before.
As organisms age, we see a change in the homeostatic equilibrium level of many different cell types (and other parameters, like hormone and cytokine levels). In particular, a wide variety of symptoms of aging involve “depletion” (i.e. lower observed counts) of various cell types.
However, human aging happens on a very slow timescale, i.e. decades. Most cell counts equilibrate much faster; for instance, immune cell counts equilibrate on a scale of days to weeks. So, suppose we see a decrease in the count of certain immune cells with age, e.g. naive T cells. Could it be that naive T cells just wear out and die off with age? No: T cells are replaced every few weeks, so a change on a timescale of decades cannot be due to the cells themselves dying off. If the count of naive T cells falls on a timescale of decades, then either (a) the rate of new cell creation has decreased, or (b) the rate of old cell removal has increased (or both). Either of those would require some “upstream” change to cause the rate change.
More generally: in order for cell counts, or chemical concentrations, or any other physiological parameter to decrease/increase with age, at least one of the following must be true:
 the timescale of turnover is on the order of decades (or longer)
 rate of removal increases/decreases
 rate of creation decreases/increases
If none of these is true, then any change is temporary: the cell count/concentration/whatever will return to the same level as before, determined by the removal and creation rates.
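The return to equilibrium can be illustrated with a one-variable toy model. This is a sketch under assumptions not stated in the post: a constant creation rate, removal proportional to the current count, and entirely made-up numbers. Under those assumptions the equilibrium count is the creation rate divided by the removal rate, and the turnover timescale is one over the removal rate.

```python
def simulate_count(initial, production, removal_rate, days, dt=0.01):
    """Euler-integrate dN/dt = production - removal_rate * N for `days` days."""
    count = initial
    for _ in range(round(days / dt)):
        count += (production - removal_rate * count) * dt
    return count

production = 100.0    # cells created per day (made-up number)
removal_rate = 0.1    # fraction of cells removed per day, i.e. ~10-day turnover
equilibrium = production / removal_rate  # 1000 cells

# A one-off perturbation (e.g. a transfusion adding 50% extra cells) decays away.
after_transfusion = simulate_count(1.5 * equilibrium, production, removal_rate, days=100)

# A lasting change in the count requires a lasting change in one of the rates.
after_production_drop = simulate_count(equilibrium, 0.5 * production, removal_rate, days=100)
```

In this sketch the perturbed count ends up back near 1000, while halving the creation rate moves the equilibrium itself to about 500.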
Of those three possibilities, notice that the second two (increase/decrease in production/removal rate) both imply some other upstream cause. Something else must have caused the rate change. Sooner or later, that chain of cause-and-effect needs to bottom out, and it can only bottom out in something which equilibrates on a timescale of decades or longer. (Feedback loops are possible, but if all the components equilibrate on a fast timescale then so will the loop.) Something somewhere in the system is out of equilibrium on a timescale of decades. We’ll call that thing (or things) a “root cause” of aging. It’s something which is not replaced on a timescale faster than decades, and it either accumulates or decumulates with age.
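A two-variable toy model makes the argument concrete. Everything here is an illustrative assumption, not a claim about real physiology: a slow "damage" variable that is never repaired and grows linearly with age, a creation rate that falls as `1 / (1 + damage)`, and a fast cell count with roughly ten-day turnover. The fast variable simply tracks the equilibrium set by the slow one.

```python
def simulate_aging(years, removal_rate=0.1, base_production=100.0,
                   damage_per_day=1e-4):
    """Return (damage, cell_count) after `years`, using one-day Euler steps."""
    damage = 0.0                             # slow variable: accumulates, never removed
    count = base_production / removal_rate   # start at the youthful equilibrium
    for _ in range(int(years * 365)):
        damage += damage_per_day
        production = base_production / (1.0 + damage)  # upstream effect on creation
        count += production - removal_rate * count     # fast homeostatic variable
    return damage, count

damage_at_60, count_at_60 = simulate_aging(60)

# Equilibrium implied by the current damage level alone.
implied_equilibrium = 100.0 / (1.0 + damage_at_60) / 0.1
```

In this sketch the cell count at age 60 is far below its starting value of 1000, yet it sits within a fraction of a cell of `implied_equilibrium`: the "depletion" is fully explained by the slow variable upstream, which is what the homeostasis argument predicts.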
Now, the main criterion: a root cause of aging cannot be a higher or lower value of any parameter subject to homeostasis on a faster timescale than aging itself. Examples:
 Most cell types turn over on timescales of days to months. “Depletion” of any of these cell types cannot be a root cause of aging; either their production rate has decreased or their removal rate has increased.
 DNA damage (as opposed to mutation) is normally repaired on a timescale of hours, sometimes much faster, depending on type. “Accumulation” of DNA damage cannot be a root cause of aging; either the rate of new damage has increased or the repair rate has decreased.
 DNA mutations cannot be repaired; from a cell’s perspective, the original information is lost. So mutations can accumulate in a non-equilibrium fashion, and are a plausible root cause under the homeostasis argument.
Note that the homeostasis argument does not mean the factors ruled out above are not links in the causal chain. For instance, there’s quite a bit of evidence that DNA damage does increase with age, and that this has important physiological effects. However, there must be changes further up the causal chain: some other long-term change in the organism’s state leads to faster production or slower repair of DNA damage. Conversely, the homeostasis argument does not imply that “plausible root causes” are the true root causes. For instance, although DNA mutations could accumulate in principle, cells with certain problematic mutations are believed to be cleared out by the immune system, so the number of cells with these mutations is in equilibrium on a fast timescale, and cannot be a root cause of aging.
For any particular factor which changes with age, key questions are:
 Is it subject to homeostasis?
 If so, on what timescale does it turn over?
 If it is subject to homeostasis on a timescale faster than aging, then what are the production and removal mechanisms, and what changes the production and removal rates with age?
These determine the applicability of the homeostasis argument. Typically, anything which can normally be fixed/replaced/undone by the body will be ruled out as a root cause of aging, because the timescale of aging is very long compared to practically all other physiological processes. We then follow the causal chain upstream, in search of a plausible root cause.
Dissolving Confusion around Functional Decision Theory
Summary
Functional Decision Theory (FDT) (see also causal, evidential, timeless, and updateless decision theories) recommends taking cooperative, non-greedy actions in twin prisoner’s dilemmas, Newcombian problems, and Parfit’s hitchhiker-like games, but not in smoking lesion situations. It’s a controversial concept with important implications for designing agents that have optimal behavior when embedded in environments in which they may potentially interact with models of themselves. Unfortunately, I think that FDT is sometimes explained confusingly and misunderstood by its proponents and opponents alike. To help dissolve confusion about FDT and address key concerns of its opponents, I refute the criticism that FDT assumes that causation can happen backward in time and offer two key principles that provide a framework for clearly understanding it:
 Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running.
 I should consider predictor P to “subjunctively depend” on agent A to the extent that P makes predictions of A’s actions based on correlations that cannot be confounded by my choice of what source code A runs.
I think that functional decision theory (FDT) is a beautifully counterintuitive and insightful framework for instrumental rationality. I will not make it my focus here to talk about what it is and what types of situations it is useful in. To gain a solid background, I recommend this post of mine or the original paper on it by Eliezer Yudkowsky and Nate Soares.
Additionally, here are four different ways that FDT can be explained. I find them all complementary for understanding and intuiting it well.
 The decision theory that tells you to act as if you were setting the output to an optimal decision-making process for the task at hand.
 The decision theory that has you cooperate in situations isomorphic to a prisoners’ dilemma against a model of yourself, including when your opponent locks in their choice and shows it to you before you make yours.
 The decision theory that has you one-box it in situations isomorphic to Newcombian games, including when the boxes are transparent; see also Parfit’s Hitchhiker.
 The decision theory that shifts focus from what type of decisions you should make to what type of decision-making agent you should be.
I’ll assume a solid understanding of FDT from here on. I’ll be arguing in favor of it, but it’s fairly controversial. Much of what inspired this post was an AI Alignment Forum post called A Critique of Functional Decision Theory by Will MacAskill which raised several objections to FDT. Some of his points are discussed below. The rest of this post will be dedicated to discussing two key principles that help to answer criticisms and dissolve confusions around FDT.
1. Acknowledging One’s own Predictability
Opponents of FDT, usually proponents of causal decision theory (CDT), will look at a situation such as the classic Newcombian game and reason like so:
I can choose to one-box it and take A or two-box it and take A+B. Regardless of the value of A, A+B is greater, so it can only be rational to take both. After all, when I’m sitting in front of these boxes, what’s in them is already in them regardless of the choice I make. The functional decision theorist’s perspective requires assuming that causation can happen backwards in time! Sure, one-boxers might do better at these games, but non-smokers do better in smoking lesion problems. That doesn’t mean they are making the right decision. Causal decision theorists may be dealt a bad hand in Newcombian games, but it doesn’t mean they play it badly.
The problem with this argument, I’d say, is subtle. I actually fully agree with the perspective that for causal decision theorists, Newcombian games are just like smoking lesion problems. I also agree with the point that causal decision theorists are dealt a bad hand in these games but don’t play it badly. The problem with the argument is some subtle confusion about the word ‘choice’ plus how it says that FDT assumes that causation can happen backwards in time.
The mistake that a causal decision theorist makes isn’t in two-boxing. It’s in being a causal decision theorist in the first place. In Newcombian games, the assumption that there is a highly accurate predictor of you makes it clear that you are, well, predictable and not really making free choices. You’re just executing whatever source code you’re running. If this predictor thinks that you will two-box it, your fate is sealed and the best you can do is then to two-box it. The key is to just be running the right source code. And hence the first principle:
Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running.
And in this sense, FDT is actually just what happens when you use causal decision theory to select what type of source code you want to enter a Newcombian game with. There’s no assumption that causation can occur backwards. FDT simply acknowledges that the source code you’re running can have a, yes, ***causal*** effect on what types of situations you will be presented with when models of you exist.
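The "source code" framing can be made literal with a toy program. This is a minimal sketch under strong assumptions: the predictor is perfectly accurate because it simply runs the agent's code, and the prize amounts are the conventional ones from statements of Newcomb's problem rather than anything specified in this post. All names are hypothetical.

```python
def one_boxer():
    """Source code that always takes only box A."""
    return "one-box"

def two_boxer():
    """Source code that always takes both boxes."""
    return "two-box"

def newcomb_payoff(agent, box_a_prize=1_000_000, box_b_prize=1_000):
    """Play one Newcombian game against a predictor that runs the agent's code."""
    # Step 1 (in the past): the predictor simulates the agent and fills box A.
    prediction = agent()
    box_a = box_a_prize if prediction == "one-box" else 0
    # Step 2 (now): the agent acts. Nothing here reaches back in time; the
    # same source code simply produced both the prediction and the action.
    action = agent()
    return box_a if action == "one-box" else box_a + box_b_prize

payoffs = {agent.__name__: newcomb_payoff(agent) for agent in (one_boxer, two_boxer)}
```

The only free variable is which function gets passed in, which is the sense in which the decision is about what source code to run rather than about a "choice" made in front of the boxes.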
Instead of FDT assuming causal diagrams like these:
It really only assumes ones like these:
I think that many proponents of FDT fail to make this point: FDT’s advantage is that it shifts the question to what type of agent you want to be, not misleading questions of what types of “choices” you want to make. But this isn’t usually how functional decision theorists explain FDT, including Yudkowsky and Soares in their paper. And I attribute some unnecessary confusion and misunderstandings like “FDT requires us to act as if causation happens backward in time,” to it.
To see this principle in action, let’s look at a situation presented by Will MacAskill. It’s similar to a Newcombian game with transparent boxes. And I say “similar” instead of “isomorphic” because of some vagueness which will be discussed soon. MacAskill presents this situation as follows:
You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it. A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty. The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left. You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?
MacAskill claims that you should take Right because it results in a “guaranteed payoff”. Unfortunately, there is some vagueness here about what it means for a long-dead predictor to have run a simulation of you and for it to have an error rate of one in a trillion trillion. Is this simulation true to your actual behavior? What type of information about you did this long-dead predictor have access to? What is the reference class for the error rate?
Let’s assume that your source code was written long ago, that the predictor understood how it functioned, that it ran a true-to-function simulation, and that you were given an unaltered version of that source code. Then this situation is isomorphic to a transparent-box Newcombian game in which you see no money in box A (albeit more dramatic), and the confusion goes away! If this is the case, then there are only two possibilities.
 You are a causal decision theorist (or similar), the predictor made a self-fulfilling prophecy by putting the bomb in the open Left box alongside a note, and you will choose the Right box.
 You are a functional decision theorist (or similar), the predictor made an extremely rare, one-in-a-trillion-trillion mistake, and you will unfortunately take the box with a bomb (just as a functional decision theorist in a transparent-box Newcombian game would take only box A).
So what source code would you rather run when going into a situation like this? Assuming that you want to maximize expected value and that you don’t value your life at more than 100 trillion trillion dollars, then you want to be running the functional decision theorist’s source code. Successfully navigating this game, transparent-box Newcombian games, twin-opponent-reveals-first prisoners’ dilemmas, Parfit’s Hitchhiker situations, and the like all require that you have source code that would tell you to commit to making the suboptimal decision in the rare case in which the predictor/twin made a mistake.
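The "100 trillion trillion dollars" threshold follows from a short expected-value calculation, sketched below. The simplification (an assumption of this sketch) is that the Left-taking source code only ever loses anything in the one-in-a-trillion-trillion case where the predictor errs, while the Right-taking source code always pays the $100 fee. Exact fractions are used so that the tiny error rate is not lost to floating-point rounding.

```python
from fractions import Fraction

ERROR_RATE = Fraction(1, 10**24)  # "1 in a trillion trillion"
RIGHT_BOX_FEE = 100               # dollars paid to take the Right box

def expected_cost_left_taker(value_of_life):
    """Expected loss, in dollars, of running source code that takes Left."""
    # The bomb is only there when the predictor wrongly predicted Right.
    return ERROR_RATE * value_of_life

def expected_cost_right_taker():
    """Expected loss, in dollars, of running source code that takes Right."""
    # Simplification: the rare misprediction case is ignored on this branch.
    return Fraction(RIGHT_BOX_FEE)

# Value of life at which both kinds of source code do equally well.
break_even_value_of_life = RIGHT_BOX_FEE / ERROR_RATE
```

Here `break_even_value_of_life` comes out to 10^26 dollars, i.e. 100 trillion trillion; for any smaller valuation the Left-taking source code has the lower expected cost.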
Great! But what if we drop our assumptions? What if we don’t assume that this predictor’s simulation was functionally true to your behavior? Then it becomes unclear how this prediction was made, and what the reference class of agents is for which this predictor is supposedly only wrong one in a trillion trillion times. And this leads us to the second principle.
2. When a Predictor is Subjunctively Entangled with an Agent
An alternate title for this section could be “when statistical correlations are and aren’t mere.”
As established above, functional decision theorists need not assume that causation can happen backwards in time. Instead, they only need to acknowledge that a prediction and an action can both depend on an agent’s source code. This is nothing special whatsoever: an ordinary correlation between an agent and predictor that arises from a common factor: the source code.
However, Yudkowsky and Soares give this type of correlation a special name in their paper: subjunctive dependence. I don’t love this term because it gives a fancy name to something that is not fancy at all. I think this might be responsible for some of the confused criticism that FDT assumes that causation can happen backward in time. Nonetheless, “subjunctive dependence” is at least workable. Yudkowsky and Soares write:
When two physical systems are computing the same function, we will say that their behaviors “subjunctively depend” upon that function.
This concept is very useful when a predictor actually knows your source code and runs it to simulate you. However, this notion of subjunctive dependence isn’t very flexible and quickly becomes less useful when a predictor is not doing this. And this is a bit of a problem that MacAskill pointed out. A predictor could make good predictions without potentially querying a model of you that is functionally equivalent to your actions. He writes:
...the predictor needn’t be running your algorithm, or have anything like a representation of that algorithm, in order to predict whether you’ll one-box or two-box. Perhaps the Scots tend to one-box, whereas the English tend to two-box. Perhaps the predictor knows how you’ve acted prior to that decision. Perhaps the Predictor painted the transparent box green, and knows that’s your favourite colour and you’ll struggle not to pick it up. In none of these instances is the Predictor plausibly doing anything like running the algorithm that you’re running when you make your decision. But they are still able to predict what you’ll do. (And bear in mind that the Predictor doesn’t even need to be very reliable. As long as the Predictor is better than chance, a Newcomb problem can be created.)
Here, I think that MacAskill is getting at an important point, but one that’s hard to see clearly with the wrong framework. On its face though, there’s a major problem with this argument. Suppose that in Newcombian games, 99% of brown-eyed people one-boxed it, and 99% of blue-eyed people two-boxed it. If a predictor only made its prediction based on your eye color, then clearly the best source code to be running would be the kind that always made you two-box it regardless of your eye color. There’s nothing Newcombian, paradoxical, or even difficult about this case. And pointing out these situations is essentially how critics of MacAskill’s argument have answered it. Their counterpoint is that unless the predictor is querying a model of you that is functionally isomorphic to your decision-making process, then it is only using “mere statistical correlations,” and subjunctive dependence does not apply.
But this counterpoint and Yudkowsky and Soares’ definition of subjunctive dependence miss something! MacAskill had a point. A predictor need not know an agent’s decision-making process to make predictions based on statistical correlations that are not “mere”. Suppose that you design some agent who enters an environment with whatever source code you gave it. Then if the agent’s source code is fixed, a predictor could exploit certain statistical correlations without knowing the source code. For example, suppose the predictor used observations of the agent to make probabilistic inferences about its source code. These could even be observations about how the agent acts in other Newcombian situations. Then the predictor could, without knowing what function the agent computes, make better-than-random guesses about its behavior. This falls outside of Yudkowsky and Soares’ definition of subjunctive dependence, but it has the same effect.
So now I’d like to offer my own definition of subjunctive dependence (even though still, I maintain that the term can be confusing, and I am not a huge fan of it).
I should consider predictor P to “subjunctively depend” on agent A to the extent that P makes predictions of A’s actions based on correlations that cannot be confounded by my choice of what source code A runs.
And hopefully, it’s clear why this is what we want. When we remember that questions in decision theory are really just questions about what type of source code we want to enter an environment using, then the choice of source code can only affect predictions that depend in some way on the choice of source code. If the correlation can’t be confounded by the choice of source code, the right kind of entanglement to allow for optimal updateless behavior is present.
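The difference between the two kinds of correlation can be sketched with two toy predictors. Both are hypothetical constructions for illustration: one predicts from eye color alone, so its output cannot change with the choice of source code; the other never reads the source code either, but watches the agent play some earlier games, so its output does change with that choice. The prize amounts are the conventional Newcomb values and are assumptions.

```python
def always_one_box():
    return "one-box"

def always_two_box():
    return "two-box"

POLICIES = {"always_one_box": always_one_box, "always_two_box": always_two_box}

def payoff(prediction, action, box_a_prize=1_000_000, box_b_prize=1_000):
    """Newcombian payoff given the predictor's guess and the agent's action."""
    box_a = box_a_prize if prediction == "one-box" else 0
    return box_a if action == "one-box" else box_a + box_b_prize

def eye_color_predictor(policy, eye_color):
    """Predicts from eye color only; the policy argument is never used."""
    return "one-box" if eye_color == "brown" else "two-box"

def track_record_predictor(policy, eye_color, past_games=5):
    """Predicts from observed past behavior, without reading the source code."""
    history = [policy() for _ in range(past_games)]
    return "one-box" if 2 * history.count("one-box") > past_games else "two-box"

def best_policy(predictor, eye_color):
    """Name of the source code that earns the most against this predictor."""
    def total(name):
        policy = POLICIES[name]
        return payoff(predictor(policy, eye_color), policy())
    return max(POLICIES, key=total)
```

Against `eye_color_predictor`, `best_policy` returns `"always_two_box"` for either eye color, since the prediction is fixed whatever code is chosen. Against `track_record_predictor` it returns `"always_one_box"`, even though that predictor never runs or inspects the agent's algorithm.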
Additional Topics
Going Meta
Consider what I call a Mind Police situation: Suppose that there is a powerful mind-policing agent that is about to encounter agent A and read its mind (look at its source code). Afterward, if the mind policer judges A to be using decision theory X, they will destroy A. Else they will do nothing.
Suppose that decision theory X is FDT (but it could be anything) and that you are agent A who happens to use FDT. If you were given the option of overwriting your source code to implement some alternative, tolerated decision theory, would you? You’d be better off if you did, and it would be the output of an optimal function for the decision-making task at hand, but it’s sort of unclear whether or not this is a very functional decision theorist thing to do. Because of situations like these, I think that we should consider decision theories to come in two flavors: static, which will never overwrite itself, and auto-updatable, which might.
Also, note that the example above is only a first-order version of this type of problem, but there are higher-order ones too. For example, what if the mind police destroyed agents using auto-updatable decision theories?
Why Roko’s Basilisk is Nonsense
A naive understanding of FDT has led some people to ask whether a superintelligent sovereign, if one were ever developed, would be rational to torture everyone who didn’t help to bring it into existence. The idea would be that this sovereign might consider this as part of an updateless strategy to help it come into existence more quickly and accomplish its goals more effectively.
Fortunately, a proper understanding of FDT and subjunctive dependence tells us that an optimally behaving embedded agent doesn’t need to pretend that causation can happen backward in time. Such a sovereign would not be in control of its source code, and it can’t execute an updateless strategy if there was nothing there to not-update on in the first place. So Roko’s Basilisk is only an information hazard if FDT is poorly understood.
Conclusion
It's all about the source code.
What is Life in an Immoral Maze?
Previously in sequence: Moloch Hasn’t Won, Perfect Competition, Imperfect Competition, Does Big Business Hate Your Family?
This post attempts to give a gears-level explanation of maze life as experienced by a middle manager.
Again, if you have not yet done so, you are highly encouraged to read or review Quotes from Moral Mazes. I will not have the space here to even gloss over many important aspects.
An Immoral Maze can be modeled as a super-perfectly competitive job market for management material. All the principles of super-perfect competition are in play. The normal barriers to such competition have been stripped away. Too many ‘qualified’ managers compete for too few positions.
If an aspirant does not devote everything they have, and visibly sacrifice all slack, towards success, they automatically fail. Those who do make such sacrifices mostly fail anyway, but some of them “succeed”. We’ll see later what success has in store for them.
The Lifestyle of a Middle Manager
At the managerial and professional levels, the road between work and life is usually open because it is difficult to refuse to use one’s influence, patronage, or power on behalf of another regular member of one’s social coterie. It therefore becomes important to choose one’s social colleagues with some care and, of course, know how to drop them should they fall out of organizational favor. (Moral Mazes, Location 884, Quote #117)
We have this idea that there is work and there is not-work, and once one leaves work one is engaged in not-work distinct from work. We also have this idea that there are things that are off limits even at work, like sexual harassment.
For a person without anyone reporting to them, who is ‘on the line’ in the book’s parlance, this can be sustained.
For those in middle management who want to succeed, that’s not how things work. Everything you are is on the table. You’d better be all-in.
You will increasingly choose your friends to help you win. You will increasingly choose your hobbies, and what you eat, and your politics, and your house, and your church, and your spouse and how many kids you have, to help you win. And of course, you will choose your (lack of) morality.
In the end, you will sacrifice everything, and I mean everything, that you value, in any sense, to win.
If the job requires you to move, anywhere in the world, you’ll do it, dragging your nuclear family along and forcing all of you to leave behind everything and everyone you know. Otherwise, you’re just not serious about success.
Slack will definitely not be a thing.
Your time is especially vulnerable.
Higher-level managers in all the corporations I studied commonly spend twelve to fourteen hours a day at the office. (Location 1156, Quote #120, Moral Mazes)
This is the result of total competition between producers – the managers are effectively rival producers trying to sell themselves as the product.
The market for managers is seen, by those who make the decisions, as highly efficient.
If managers were seen as wildly different in terms of talent, intelligence, or some other ability that helped get things done, that would help a lot. You could afford to be a little quirky, to hold on to the things you value most, without losing the game entirely. Your success would be influenced by your personality and dedication, but nothing like solely determined by them.
Alas, the perception in these mazes is exactly the opposite.
See, once you are at a certain level of experience, the difference between a vice-president, an executive vice-president, and a general manager is negligible. It has relatively little to do with ability as such. People are all good at that level. They wouldn’t be there without that ability. So it has little to do with ability or with business experience and so on. All have similar levels of ability, drive, competence, and so on. What happens is that people perceive in others what they like: operating styles, lifestyles, personalities, ability to get along. Now these are all very subjective judgments. And what happens is that if a person in authority sees someone else’s guy as less competent than his own guy, well, he’ll always perceive him that way. And he’ll always pick, as a result, his own guy when the chance to do so comes up. (Location 1013, Quote #87, Moral Mazes)
It is known that most people ‘don’t have what it takes’ to be a manager. This is clearly true on many levels. Only one of them is a willingness to fully get with the program.
Once you get several levels up, the default assumption is that everyone is smart enough, and competent enough. That the object-level is a fully level playing field. The idea that someone can just be better at doing the actual job doesn’t parse for them.
All remaining differences are about negative selection, about how hard you want it and are willing to sacrifice everything, or about how well you play political games. Nor do they much care whether you succeed at your job, anyway.
Some additional supporting quotes on that follow. A large portion of the quotes reinforce this perspective.
If you can’t work smart, work hard:
When asked who gets ahead, an executive vice-president at Weft Corporation says: The guys who want it [get ahead]. The guys who work. You can spot it in the first six months. They work hard, they come to work earlier, they leave later. They have suggestions at meetings. They come into a business and the business picks right up. They don’t go on coffee breaks down here [in the basement]. You see the parade of people going back and forth down here? There’s no reason for that. I never did that. If you need coffee, you can have it at your desk. Some people put in time and some people work. (Location 992, Quote 29, Moral Mazes)
But everyone at this level works hard, which was more about showing you work hard than the results of the work, because concrete outcomes don’t much matter:
As one manager says: “Personality spells success or failure, not what you do on the field.” (Location 1383, Quote 33, Moral Mazes)
It’s not like there were ever objective criteria:
Managers rarely speak of objective criteria for achieving success because once certain crucial points in one’s career are passed, success and failure seem to have little to do with one’s accomplishments. (Location 917, Quote 42, Moral Mazes)
Which makes sense, because if everyone is the same, then concrete outcomes are just luck:
Assuming a basic level of corporate resources and managerial know-how, real economic outcome is seen to depend on factors largely beyond organizational or personal control. (Location 1592, Quote 46, Moral Mazes)
I am supremely confident that this perspective is completely bonkers. There is a huge differential between better and worse no matter how high up you go or how extreme your filters have already been. But what matters here is what the managers believe. Not what is true. Talent or brilliance won’t save you if no one believes it can exist. If noticed it will only backfire:
Striking, distinctive characteristics of any sort, in fact, are dangerous in the corporate world. One of the most damaging things, for instance, that can be said about a manager is that he is brilliant. This almost invariably signals a judgment that the person has publicly asserted his intelligence and is perceived as a threat to others. What good is a wizard who makes his colleagues and his customers uncomfortable? (Location 1173, Quote 88, Moral Mazes)
How do things get so bad?
We’ll look at one aspect of that question in the next post. From here I anticipate 3-5 day gaps between posts.
Questions that will be considered later, worth thinking about now, include: How does this persist? If things are so bad, why aren’t things way worse? Why haven’t these corporations fallen apart or been competed out of business? Given they haven’t, why hasn’t the entire economy collapsed? Why do regular people, aspirant managers and otherwise, still think of these manager positions as the ‘good jobs’ as opposed to picking up pitchforks and torches?
Q & A with Stuart Russell in AISafety.com Reading Group
On Wednesday the 8th at 11:45 PST (19:45 UTC), Stuart Russell will be joining the online AISafety.com Reading Group to answer questions about his book, Human Compatible.
If you'd like to join, please add me on Skype ("soeren.elverlin").
This book has previously been discussed on LessWrong in 2 posts:
https://www.lesswrong.com/posts/FuGDYNvA6qh4qyFah/thoughtsonhumancompatible
Machine Learning Can't Handle Long-Term Time-Series Data
More precisely, today's machine learning (ML) systems cannot infer a fractal structure from time series data.
This may come as a surprise because computers seem like they can understand time series data. After all, aren't self-driving cars, AlphaStar and recurrent neural networks all evidence that today's ML can handle time series data?
Nope.
Self-Driving Cars
Self-driving cars use a hybrid of ML and procedural programming. ML (statistical programming) handles the low-level stuff like recognizing pedestrians. Procedural (non-statistical) programming handles high-level stuff like navigation. The details of self-driving car software are trade secrets, but we can infer bits of Uber's architecture from the National Transportation Safety Board's report on Uber's self-crashing car as summarized by jkaufman.
"If I'm not sure what it is, how can I remember what it was doing?" The car wasn't sure whether Herzberg and her bike were a "Vehicle", "Bicycle", "Unknown", or "Other", and kept switching between classifications. This shouldn't have been a major issue, except that with each switch it discarded past observations. Had the car maintained this history it would have seen that some sort of large object was progressing across the street on a collision course, and had plenty of time to stop.
What we see above is that the car throws away its past observations. Now let's take a look at a consequence of this.
"If we see a problem, wait and hope it goes away". The car was programmed to, when it determined things were very wrong, wait one second. Literally. Not even gently apply the brakes. This is absolutely nuts. If your system has so many false alarms that you need to include this kind of hack to keep it from acting erratically, you are not ready to test on public roads.
Humans have to write ugly hacks like this when a system isn't architected bottom-up to handle concepts like the flow of time. A machine learning system designed to handle time series data should never have human beings in the loop this low down the ladder of abstraction. In other words, Uber effectively uses a stateless ML system.
All you need to know when driving a car is the position of different objects and their velocities. You almost never need to know the past history of another driver or even yourself. There is no such thing as time to a stateless system. A stateless system cannot understand the concept of time. Stateless ML systems make sense when driving a car.
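The history-discarding behavior described in the first quote above can be illustrated with a small sketch. This is not Uber's code: the `Track` class, its `keep_history_on_relabel` flag, and the observation data are all hypothetical, made up only to show why wiping past observations on every reclassification leaves nothing to estimate motion from.

```python
class Track:
    """Hypothetical object track: a label plus a history of observed positions."""

    def __init__(self, keep_history_on_relabel):
        self.keep_history_on_relabel = keep_history_on_relabel
        self.label = None
        self.positions = []

    def observe(self, label, position):
        # When the classification changes, the forgetful variant wipes the past.
        if label != self.label and not self.keep_history_on_relabel:
            self.positions = []
        self.label = label
        self.positions.append(position)

    def velocity(self):
        """Average change in position per observation, or None with fewer than 2 points."""
        if len(self.positions) < 2:
            return None
        return (self.positions[-1] - self.positions[0]) / (len(self.positions) - 1)


# An object crossing the road while its label flickers (made-up data).
observations = [("Vehicle", 0.0), ("Other", 1.0), ("Bicycle", 2.0), ("Unknown", 3.0)]

forgetful = Track(keep_history_on_relabel=False)
remembering = Track(keep_history_on_relabel=True)
for label, pos in observations:
    forgetful.observe(label, pos)
    remembering.observe(label, pos)

print(forgetful.velocity())    # None: only the latest position survives
print(remembering.velocity())  # 1.0: steady motion across the road
```

The forgetful track never holds more than one position when the label changes on every observation, so it cannot tell that the object is moving at all.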
AlphaStar
AlphaStar (DeepMind's StarCraft II AI) is only a little more complicated than Uber's self-crashing car. It uses two neural networks. One network predicts the odds of winning and another network figures out which move to perform. This turns a time-series problem (what strategy to perform) into two separate stateless[1] problems.
Comparisons between AlphaStar and human beings are fudged because StarCraft II depends heavily on actions-per-minute (APM), the speed at which a player can perform actions. Humans wouldn't have a chance if AlphaStar was not artificially limited in the number of actions it could take. Games between humans and AlphaStar are only interesting because AlphaStar's actions are limited, thereby giving humans a handicap.
Without the handicap, AlphaStar crushes human beings tactically. With the handicap, AlphaStar still crushes human beings tactically. Human beings can beat AlphaStar on occasion only because elite StarCraft II players possess superior strategic understanding.
Most conspicuously, human beings know how to build walls with buildings. This requires a sequence of steps that doesn't generate a useful result until the last of them is completed. A wall is useless until the last building is put into place. AlphaStar (the red player in the image below) does not know how to build walls.
With infinite computing power, AlphaStar could eventually figure this out. But we don't have infinite computing power. I don't think it's realistic to expect that AlphaStar will ever figure out how to build a wall with its current algorithms and realistic hardware limitations.
AlphaStar is good at tactics and bad at strategy. To state this more precisely, AlphaStar hits a computational cliff for understanding conceptually complex strategies when time horizons exceed the tactical level. Human brains are not limited in this way.
Recurrent Neural Networks (RNNs)
RNNs are neural networks with a form of short-term memory. A newly-invented variant incorporates long short-term memory. In both cases, the RNN is trained with the standard backpropagation algorithm used by all artificial neural networks[2]. The backpropagation algorithm works fine on short timescales but quickly breaks down when strategizing about longer timescales conceptually. The algorithm hits a computational cliff.
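One commonly cited reason backpropagation struggles over long timescales, which the post itself does not name, is that the training signal shrinks as it is propagated back through many time steps. The sketch below is only an illustration under a strong simplifying assumption: a scalar linear recurrence h_t = w * h_(t-1) with a made-up weight, not a real RNN. The function name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(w, steps):
    """Return d h_T / d h_0 for the scalar linear recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # chain rule: one factor of w per time step
    return grad


# With |w| < 1 the signal from early time steps fades exponentially.
for steps in (1, 10, 100):
    print(steps, gradient_through_time(0.9, steps))
```

With w = 0.9 the gradient after 100 steps falls below 1e-4, so whatever happened 100 steps ago barely influences the weight update.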
This is exactly the behavior we observe from AlphaStar. It's also the behavior we observe in natural language processing and music composition. ML can answer a simple question just fine but has trouble maintaining a conversation. ML can generate classical music just fine but can't figure out the chorus/verse system used in rock & roll. That's because the former can be constructed stochastically without any hidden variables while the latter cannot.
The Road Ahead
This brings us to my first law of artificial intelligence.
Any algorithm that is not organized fractally will eventually hit a computational wall, and vice versa.
―Lsusr's First Law of Artificial Intelligence
For a data structure to be organized fractally means you can cut a piece off and that piece will be a smaller version of the original dataset. For example, if you cut a sorted list in half then you will end up with two smaller sorted lists. (This is part of why quicksort works.) You don't have to cut a sorted list in exactly the middle to get two smaller sorted lists. The sorted list's fractal structure means you can cut the list anywhere. In this way, a sorted list is organized fractally along one dimension. Other examples of fractal data structures are heaps and trees.
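The sorted-list claim is easy to check directly. The following is a minimal sketch; the helper names `is_sorted` and `cut` are made up for illustration, and the data is random.

```python
import random


def is_sorted(xs):
    """Return True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def cut(xs, i):
    """Cut a list at index i into a left piece and a right piece."""
    return xs[:i], xs[i:]


data = sorted(random.sample(range(1000), 20))

# Cut at every possible position, not just the middle.
pieces_sorted = all(
    is_sorted(left) and is_sorted(right)
    for left, right in (cut(data, i) for i in range(len(data) + 1))
)
print(pieces_sorted)
```

Every cut point, including the two ends, yields pieces that are themselves sorted lists.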
Another fractal structure is a feedforward neural network (FFNN). FFNNs are organized fractally along two dimensions. There are two ways you can cut a neural network in half to get two smaller neural networks. The most obvious is to cut the network in half at a hidden layer. To do this, duplicate the hidden layer and then cut between the pair of duplicated layers. The less obvious way to cut a neural network is to slice between its input/output nodes.
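The first kind of cut, at a hidden layer, can be sketched in a few lines. This assumes a toy network represented as a plain list of weight matrices with ReLU after every layer and no biases; the weights and the helper names (`relu`, `matvec`, `forward`) are made up for illustration. The hidden layer's activations serve as the output of the front piece and the input of the back piece, which is the "duplicated layer" in the description above.

```python
def relu(v):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(layers, v):
    """Run v through each weight matrix, applying ReLU after every layer."""
    for W in layers:
        v = relu(matvec(W, v))
    return v


# Hypothetical 3 -> 2 -> 2 -> 1 network.
layers = [
    [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]],  # 3 inputs -> 2 hidden units
    [[1.0, 1.0], [-1.0, 0.5]],             # 2 hidden -> 2 hidden units
    [[2.0, -3.0]],                         # 2 hidden -> 1 output
]

# Cut at the first hidden layer: each piece is itself a smaller FFNN.
front, back = layers[:1], layers[1:]

x = [1.0, 2.0, 3.0]
hidden = forward(front, x)            # output of the front network
print(forward(back, hidden))          # [3.0]
print(forward(layers, x))             # [3.0], same as the uncut network
```

Feeding the front network's output into the back network reproduces the original network's output, so nothing is lost by the cut.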
Each dimension of fractality is a dimension the FFNN can scale indefinitely. FFNNs are good at scaling the number of input and output nodes they possess because a FFNN is structured fractally along this dimension (number of input/output nodes). FFNNs are good at scaling the complexity of processing they perform because a FFNN is structured fractally along this dimension too (number of hidden layers).
Much of the recent development in image recognition comes from these two dimensions of fractality[3]. Image recognition has a high number of input nodes (all the color channels of all the pixels in an image). An FFNN can apply complex rules to this large space because of its fractal geometry in the number-of-hidden-layers dimension.
FFNNs are stateless machines so feeding time series data into a FFNN doesn't make sense. RNNs can handle time series data, but they have no mechanism for organizing it fractally. Without a fractal structure in the time dimension, RNNs cannot generalize information from short time horizons to long time horizons. They therefore do not have enough data to formulate complex strategies on long time horizons.
If we could build a neural network fractally organized in the time domain then it could generalize (apply transfer learning) from short time horizons to long time horizons. This turns a small data problem into a big data problem. Small data problems are hard. Big data problems are easy.
This is why I'm so interested in Connectome-Specific Harmonic Waves (CSHW). The fractal equation of harmonic waves (Laplacian eigendecomposition) could answer the problem of how to structure a neural network fractally in the time domain.
[1] AlphaStar does contain one bit of time-series comprehension. It can predict the actions of enemy units hidden by fog of war. I'm choosing to ignore this on the grounds it isn't an especially difficult problem.
[2] The human brain lacks a known biological mechanism for performing the backpropagation algorithm used by artificial neural networks. Therefore biological neural networks probably use a different equation for gradient ascent.
[3] Progress also comes from the application of parallel GPUs to massive datasets, but scaling in this way wouldn't be viable mathematically without the two-dimensional fractal structure of FFNNs.
[Book Review] The Trouble with Physics
Lee Smolin's book The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next is ostensibly about why string theory can't solve what he calls the Five Great Problems in theoretical physics:
 "Combine general relativity and quantum theory into a single theory that can claim to be the complete theory of nature" i.e. "the problem of quantum gravity".
 "Resolve the problems in the foundations of quantum mechanics, either by making sense of the theory as it stands or by inventing a new theory that does make sense."
 "Determine whether or not the various particles and forces can be unified in a theory that explains them all as manifestations of a single, fundamental entity."
 "Explain how the values of the free constants in the standard model of particle physics are chosen in nature."
 "Explain dark matter and dark energy. Or, if they don't exist, determine how and why gravity is modified on large scales. More generally, explain why the constants of the standard model of cosmology, including the dark energy, have the values they do."
Actually, The Trouble with Physics is about a much broader problem: disruptive innovation as described in Clayton Christensen's The Innovator's Dilemma and Thomas Kuhn's The Structure of Scientific Revolutions. In Smolin's view, the scientific establishment is good at making small iterations to existing theories and bad at creating radically new theories. It's therefore not implausible that the solution to quantum gravity could come from a decade of solitary amateur work by someone totally outside the scientific establishment.
This is an extraordinary claim and extraordinary claims demand extraordinary evidence. Smolin's book includes plenty of such evidence.
He [Carlo Rovelli] got so many emails [from string theorists] asserting that Mandelstam had proved [String Theory] finite that he decided to write to Mandelstam himself and ask his view. Mandelstam is retired, but he responded quickly. He explained that what he had proved is that a certain kind of infinite term does not appear anywhere in the theory. But he told us that he had not actually proved that the theory itself was finite, because other kinds of infinite terms might appear. No such term has ever been seen in any calculation done so far, but neither has anyone proved that one couldn't appear.
Smolin's book is full of evidence like this. I find his argument convincing because it aligns with my personal experience. I earned a bachelor's degree in physics because I wanted to help figure out the Great Problems. I wanted to discuss big ideas with Einsteins and Feynmans. Instead I found narrowly specialized postdocs. They're good at math but tend to have little education in the broader history of science. To them, the physical laws might as well have been handed down on stone tablets. Physicists' job is that of a madrasa.
This might be tenable if the foundations of physics (general relativity and quantum theory) were plausibly true. But general relativity and quantum theory contradict each other. They cannot both be correct. Therefore at least half of physics is wrong.
The scientific establishment isn't structured to resolve problems of this magnitude. Until we restructure the institution of physics so that it promotes diversity of thought (unlikely anytime soon) it's not inconceivable that the answers to the Five Great Problems could come from an amateur.
Smolin's book has inspired me to begin working on a theory of quantum gravity. I'll need to learn new things like quantum field theory. I might give up before getting anywhere. But at least I know I don't understand basic physics. That puts me in good company.
I think I can safely say that nobody understands quantum mechanics.
― Richard Feynman