
LessWrong.com News

A community blog devoted to refining the art of rationality

DC SSC Meetup

August 5, 2019 - 19:16
Published on August 5, 2019 4:16 PM UTC

Meetup this Saturday, August 10th.




Do you do weekly or daily reviews? What are they like?

August 5, 2019 - 04:23
Published on August 5, 2019 1:23 AM UTC

  • What system(s) do you use to keep yourself organized and working toward your goals? I'm interested in technologies, but more interested in what ontology you use to organize your tasks, events, goals, and plans.
  • Did you build your system slowly over time, or adopt it all at once?
  • What feels most important about your system to you?



Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening?

August 5, 2019 - 03:12
Published on August 5, 2019 12:12 AM UTC

If we're to believe the Philosophical Transactions of the Royal Society, or the Copenhagen Consensus Center, or apparently any of the individual geoengineering researchers who've modelled it, it's possible to halt all warming by building a fleet of autonomous wind-powered platforms that do nothing more sinister than spraying seawater into the air, in a place no more ecologically sensitive than the open ocean, and for no greater cost than 10 billion USD.

If this works, no significant warming will be allowed to occur after the political static friction that opposes the use of geoengineering is broken.

Isn't any amount of mean warming bad? So shouldn't we deploy something like this as soon as possible? Shouldn't we have started deploying it years ago?

The only clear side effect I've seen mentioned is a potential reduction in rainfall in South America, but one analysis found that this could be avoided by simply not doing any brightening near the land in that region.

I want to make it clear how little support this needs in order to get done: 10 billion USD is less than a quarter of the tax revenue that New Zealand alone collects in one year. A single tiny OECD government could, in theory, do it alone. It won't need the support of a majority of the US. It probably won't require any support from the US at all.

What should we do with this information?

I do buy the claim that public support for any sort of emission control will evaporate the moment geoengineering is realised as a tolerable alternative. Once the public believe, there will never be a quorum of voters willing to sacrifice anything of their own to reduce emissions. I think we may need to start talking about it anyway, at this point. Major emitters have already signalled a clear lack of any real will to change. The humans will not repent. Move on. Stop waiting for humanity to be punished for its sin, act, do something that has some chance of solving the problem.

Could the taboos against discussing geoengineering delay the discovery of better, less risky techniques?

Could failing to invest in geoengineering research ultimately lead to the deployment of relatively crude approaches with costly side effects? (Even if cloud brightening is the ultimate solution to warming, we still need to address ocean acidification and carbon sequestration, and I'm not aware of any ideal solution to those problems yet. But two weeks ago I wasn't aware of cloud brightening, so for all I know the problem isn't a lack of investment; it might just be a lack of policy discussion.)

Public funding for geoengineering research is currently non-existent in the US (all research there is either privately funded or funded by universities) and weak in China (3M USD, 14 members, no tech development, no outdoor experiments).




[Resource Request] What's the sequence post which explains you should continue to believe things about a particle that's moving beyond your ability to observe it?

August 5, 2019 - 01:31
Published on August 4, 2019 10:31 PM UTC

I think it's a good and legitimate use of the question system to ask people to help you locate posts whose location you can't quite remember, or to request posts relevant to a topic you're interested in. If there are ever "too many" such questions, I'm sure we can find a suitable way to "filter" them to be seen only in suitable places.

The post is something about a particle moving at the speed of light, and that you should continue to believe in your models even though you can't observe it any more: moving at the speed of light + expanding universe (?).





AI Alignment Open Thread August 2019

August 5, 2019 - 01:09
Published on August 4, 2019 10:09 PM UTC

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early stage ideas and have lower-key discussions.




Is there a user's manual to using the internet more efficiently?

August 5, 2019 - 00:19
Published on August 4, 2019 6:51 PM UTC

I'd like to condition the responses by elaborating on what I currently do and why I think having a body of work to reference would be beneficial.

This all started a couple of months ago when I realized that most of my internet usage revolved around Reddit/Quora + Wikipedia + arXiv. I use the internet to have questions answered, to reference authoritative information, and to explore developments in various research interests.

This equilibrium was disturbed by the invention of GPT-2. I started a project of writing a journal in conjunction with this "super"-autocomplete algorithm and realized that part of the reason I thought it sounded so great was that I was prompting it with things you might find on Reddit/Wikipedia/arXiv. This is kind of hard to explain; basically, I awoke to the gestalt that the way I was thinking was being affected by the content I was consuming. Obvious in hindsight, but at the time quite a shocker.

I researched some ways to start curating my own content and found Pocket. After that, things started taking off. I currently use a combination of Pocket/GPT-2/Google to manage my curation of internet content. To constrain information overload, I generally use a question -> hypothesis -> research/experiment workflow. Sometimes getting the right question is hard, so I'll use GPT-2 to try and "super"-autocomplete my way into a phrase that has potential. After that I try to google the question/phrase that popped up in the first step. However, GPT-2 really has been sending me all over the internet, so it's quite difficult to relate things together or to evaluate the quality of the information I'm receiving.

I think having some sort of book organizing the different "dimensions" of internet usage would be useful. Being able to have a tangible layout of the available tools would help me select the ones that are useful for whatever I'm interested in. At the moment I have only a few: search, save, "super"-autocomplete. Any and all references are useful, but the more in depth the better. Thanks!




The Planning Problem

August 5, 2019 - 00:15
Published on August 4, 2019 6:58 PM UTC


(Originally posted on my blog here. Not sure how mathematical this place is, but it was seamless to crosspost so I'll continue doing that and see what happens!)

It's fair to say that we spend a lot of our lives floating from task to task. Some are work and some are not. Nevertheless, they are tasks: things that we want to do. Some are complicated and require planning. However, since planning is such an integral step in completing more complicated tasks, it seems as though we might want to "meta-plan", or plan our planning. I became fascinated by the concept after I realized it should be possible to frame the whole thing as a sort of optimization problem. In practice, the problem remains complicated by the fact that optimization occurs in a necessarily online fashion. However, any reasonable method of planning is especially well behaved in the formalism I'll present. Moreover, even when the method is faulty due to human bias, as long as it is consistent we still end up being able to prove that meta-planning, in a certain sense, is rational.

One particularly recursive aspect of the problem is that to carry out a task to completion we must already have a plan that we execute. The plan could be written in an ‘online’ manner, which is done in improvisation. Still, the line between planning and doing at this point becomes very blurry and quickly leads to a chicken/egg problem. Which came first, knowledge of a plan or how to execute it? I’ll bound the scope of this using a few concrete examples and then we’ll jump off into abstraction to step around the recursion.

A First Example

For me, the most natural first example would be this article. I don’t want to spend too much time planning out the essay. On the other hand, I don’t want to plan so little that the organization and flow are poor. Usually, I spend a bit too much time in the planning phase, so this article is an experiment in trying to do something a bit more optimal. What do I mean by more optimal? The general idea is that you are carrying out some task, such as writing an article, and you can spend some amount of time thinking and planning out a strategy to achieve your goal. After that, you carry out the task or write the article. Imagine a task like the traveling salesman problem. If there are enough cities and enough distance between them, most people would agree that having a clear itinerary is going to be more effective than just visiting cities at random. In this example, it's clear that there must be a trade-off, given that finding the best itinerary in general is NP-hard!

The intuition seems to be that there is a strategy that can balance the objectives of planning and executing optimally. However, over the years I’ve caught myself making the error of measuring the effectiveness of having a plan or strategy in terms of how much time it saves me in carrying out the task. Yet, I will commonly ignore the time spent on creating a strategy in the first place. For example, while I was writing this article I spent a significant amount of time crafting a uniqueness proof for the optimal strategy I’ll propose in the next section. Was that optimal? I had a picture of the proof done in under an hour. If you’d asked me right after I’d proved the theorem whether or not it was a good use of time I’d say "Yes, but it wasn’t necessary for the article." So it’s possible that deviations from this supposed optimal strategy might be part of some other strategy for the optimization of a different quantity. However, the case remains that making the proof was not an optimal use of my time if I claim the purpose was the completion of the article.

It's worth pointing out that people are notoriously bad at estimating how much time things will take (see planning fallacy). Moreover, we’re poor at estimating the amount of time we spend on tasks with variable time requirements. Thus, when we decide to create a plan we often neglect to estimate the cost of taking such an action and we often fail to measure whether or not the plan was successful. It’s a sort of recursive issue since usually, we are trying to use the plan to estimate the cost of taking another action. In some sense what I’m trying to do to facilitate analysis is contain the infinite-regress to a second-order meta-plan.

A formal statement

Given that the notions of planning, executing, and optimality are still vague, let's put things on a more formal footing before going into detail. Assume that there is an oracle that could tell you how long a task would take without a dedicated plan. Say that for the task of writing this article it returned seven hours. If I think that planning could help reduce this time, I can take a bet with this oracle. I would place a bet that the time it takes to make a plan plus the time it takes to execute the resulting plan is less than the time returned by the oracle. If not, I lose the bet. For example, I might spend six hours creating a detailed plan and then spend two hours executing, for a total of eight hours. In this case, I lose. Even though I was able to significantly reduce the execution time, I spent way too much time planning to make the strategy effective and ultimately spent more time on the task than if I had just started executing from the start.
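The bet can be written down in a few lines. This is only an illustrative sketch: the function name `wins_bet` is hypothetical, and the second set of numbers is made up for contrast and does not come from the article.

```python
def wins_bet(oracle_hours: float, planning_hours: float, execution_hours: float) -> bool:
    """Return True if planning plus execution beats the oracle's no-plan estimate."""
    return planning_hours + execution_hours < oracle_hours


# The losing bet from the text: oracle says 7 hours, 6 hours planning, 2 hours executing.
print(wins_bet(7, 6, 2))

# A hypothetical winning bet (illustrative numbers only): 1 hour planning, 4 hours executing.
print(wins_bet(7, 1, 4))
```

The point of the sketch is that the planning time sits on the same side of the inequality as the execution time, which is exactly the cost that is easy to forget.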

Now the same setup, with the level of formality necessary to make a rigorous argument. Let's say you have to complete some task $T$. Suppose also that you have some sort of initial policy $\pi_0$ that delineates all the steps you need to take to complete $T$. Moreover, you have an oracle $\Phi$ that will produce an estimate of how long it will take you to execute a policy $\pi$. Naturally, we’ll place a restriction on $\Phi(\pi)$ so that it’s bounded and always greater than or equal to zero. Now, say you can invest time $t$ in planning, $\pi_0 \to \pi_t$, in the sense that it’s always the case that $\Phi(\pi_0) \ge \Phi(\pi_t)$. With a slight abuse of notation, we can write the result of planning $\pi$ for time $t$ as $\Phi(t)$. Our goal is to minimize the total amount of time $\Gamma(t) = t + \Phi(t)$ it takes to do the planning and execute the resulting policy.

In the case where everything is nice, $\Phi \in C^1$ and is convex. We know that it must be the case that $\Phi(t) \ge 0$. Thus, it should follow that $\lim_{t \to \infty} \Phi(t)$ exists and that $\lim_{t \to \infty} \Phi'(t) = 0$. The point is that the condition for the minimum is easy to state: $\Gamma'(t) = 0$, which is equivalent to $\Phi'(t) = -1$. Thus, as long as planning initially reduces the execution time faster than time passes, that is $\Phi'(0) < -1$, there will be an optimal planning time $t' > 0$.
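For readers who want the intermediate steps, here is a short sketch of the argument. It uses the stated assumptions ($\Phi \in C^1$, convex, bounded below by zero) plus the additional reading that $\Phi$ is non-increasing in $t$, which is one interpretation of "planning never hurts".

```latex
\begin{aligned}
\Gamma(t) &= t + \Phi(t), \\
\Gamma'(t) &= 1 + \Phi'(t), \\
\Gamma'(t') = 0 &\iff \Phi'(t') = -1 .
\end{aligned}
% Convexity of \Phi means \Phi' is non-decreasing, so \Gamma is convex and any
% stationary point is a global minimum. Since \Phi'(t) \to 0 as t \to \infty,
% a stationary point with t' > 0 exists whenever \Phi'(0) < -1.
```

If instead $\Phi'(0) \ge -1$, then $\Gamma'(t) \ge 0$ for all $t \ge 0$ and the best choice under these assumptions is to start executing immediately.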

Examples

a)

Let’s consider a semi-realistic example of a planning function. The most obvious requirement to try and model is that we should see diminishing returns on planning as Φ(t)→0. Thus, a crude model would be to model our planning function so that the reduction in execution time is proportional to the size of the current execution time.

$$\frac{\partial \Phi}{\partial t} = -\lambda \Phi \quad \Rightarrow \quad \Phi(t) = T_0 \, e^{-\lambda t}$$

We’re using $T_0$ to model the amount of time a task would take with zero planning and $\lambda$ to model the rate of reduction. We can solve for the optimal $t'$ using our discussion above since $\Phi$ is convex and bounded. We end up with,

$$t' = \frac{1}{\lambda} \log(\lambda T_0), \qquad \Gamma(t') = \frac{1}{\lambda} \left( 1 + \log(\lambda T_0) \right)$$

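Thus, the share of the total time spent planning, $t' / \Gamma(t') = \log(\lambda T_0) / (1 + \log(\lambda T_0))$, ranges from zero to one and depends solely on $\lambda T_0$. For $\lambda T_0 < 1$ we’ll have $t' < 0$, which means that planning is an ineffective option.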
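As a sanity check on the closed form, the sketch below compares it with a brute-force grid search. The function names and the parameter values ($T_0 = 7$ hours, $\lambda = 1$ per hour) are assumptions chosen for illustration, and clamping the planning time to zero when $\lambda T_0 \le 1$ is my reading of "planning is an ineffective option".

```python
import math


def total_time(t: float, T0: float, lam: float) -> float:
    """Gamma(t) = t + Phi(t) for the exponential model Phi(t) = T0 * exp(-lam * t)."""
    return t + T0 * math.exp(-lam * t)


def optimal_planning_time(T0: float, lam: float) -> float:
    """Closed-form t' = log(lam * T0) / lam, clamped to 0 when planning does not pay."""
    if lam * T0 <= 1.0:
        return 0.0  # assumption: a negative t' is read as "do not plan at all"
    return math.log(lam * T0) / lam


# Hypothetical parameters: 7 hours with no plan, reduction rate 1 per hour.
T0, lam = 7.0, 1.0
t_opt = optimal_planning_time(T0, lam)
closed_form_total = (1.0 + math.log(lam * T0)) / lam

# Brute-force check on a grid of planning times from 0 to 7 hours.
grid = [i * 0.001 for i in range(7001)]
t_grid = min(grid, key=lambda t: total_time(t, T0, lam))

print(f"closed-form t' = {t_opt:.4f}, grid minimum at t = {t_grid:.3f}")
print(f"total time at t' = {total_time(t_opt, T0, lam):.4f}, formula gives {closed_form_total:.4f}")
```

With these assumed numbers, the execution time left at the optimum is $\Phi(t') = 1/\lambda$, so the plan is worth stopping once the remaining work is down to one "time constant" of the planning process.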

It's worth noting a connection between this simplistic model and the simplistic model we usually use to make plans. Usually, we make plans by creating lists of sub-tasks or steps that need to be completed. Sometimes, we iterate this process if the sub-tasks are difficult to complete. In an idealized model, we'd model the creation of a task list using a branching process and then claim that the execution time of a task is reduced proportionally to the number of leaves in the resulting tree.

b)

There is an important quantity here that can be used to make a practical algorithm. From the first example, we showed that the task reduction quantity λT0 governs the effectiveness of planning. The requirement that λT0>1 is the same as demanding that the rate of reduction from planning be greater than one.

For many common tasks, it’s immediately obvious whether or not a plan should be constructed. Assume that this decision rule is a statistic of λT0. It follows that we can estimate the reduction in total time by looking at the current plan for execution at each time step. Once this estimate of the total time reaches a minimum we should stop planning and start executing.
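The stopping rule described above can be sketched as a simple loop: plan in fixed increments, re-estimate the total time after each increment, and stop as soon as the estimate stops improving. This is only an illustration under assumptions: `estimate_execution_time` stands in for the oracle Φ, the step size is arbitrary, and the greedy stop is only safe when t+Φ(t) has a single dip (as in the convex case).

```python
import math

def plan_until_minimum(estimate_execution_time, step=1.0, max_steps=1000):
    """Keep planning while t + Phi(t) keeps falling.

    Returns (planning_time, estimated_total_time) at the stopping point.
    """
    t = 0.0
    best_total = estimate_execution_time(t)  # total time with zero planning
    for _ in range(max_steps):
        candidate = t + step
        total = candidate + estimate_execution_time(candidate)
        if total >= best_total:
            break  # estimate stopped improving: start executing
        t, best_total = candidate, total
    return t, best_total

# Hypothetical oracle: the exponential model from example (a).
phi = lambda t: 20.0 * math.exp(-0.5 * t)
stop_time, est_total = plan_until_minimum(phi, step=0.5)
print(stop_time, est_total)
```

With these made-up numbers the loop stops at the grid point nearest the analytic optimum, so a coarse step costs little in this model.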

The pathological aspect of this algorithm is that there might be a critical threshold that needs to be reached before planning is effective. Consider that we estimate λT0, but the true model for Φ has a varying rate equation. However, even in the worst case, we’d only double the amount of time taken to complete the task. The reasoning is that for planning to be an effective strategy the minimum must be reached sometime before the estimated time for execution of the initial plan. If this assumption is violated, our original hypothesis would be as well, which means that there are no guarantees about when or if a minimum might be reached.

It would seem as though we could layer more complexity onto these simple models to make them more effective. For instance, we can formalize what it means to determine whether or not our guess is correct by using statistical estimators. It also seems as though the convexity requirement on Φ is a bit strong for our purposes since planning is a somewhat complicated and somewhat arbitrary/random affair. Finally, the fact that we can create plans that have sub-plans means that we can recursively call our meta-plan for making plans. We'll come back to this last point in a bit.

General Strategy

As noted in the previous section, we’ve restricted Φ so severely that we might be concerned it would tell us nothing about whether or not it’s a good idea in general to try and optimize π. This is because we have no guarantee that our optimum at t′ is better than t+Φ(t) for t∈[0,t′]. Now, lucky for us, most of this has already been studied in the context of optimal persistent policy theory. The theory deals with optimal selection problems where the n-th observation costs c(n). Thus, we can use this tool and assume we can construct a sequence of plans {πt}t≥0 and incur unit cost for each plan we create.

Let V(π) denote the expected time of planning plus execution if we continue to compress π after observing π. It follows that V defines a stopping criterion: if V(π)<Φ(π), continuing to plan is expected to beat executing now, and once V(π)=Φ(π) we should stop planning and start executing. In practice, for each t we observe/estimate Φ(t) and pay a cost t to do so. To accommodate error in estimation we introduce a transition function to map πt→πt+1. It’s given as the probability of going to policy y starting from x and will be written as Px(y). All of our policies live in the decision space D. The decision to continue planning incurs unit cost. If our V was optimal for each π we’d have,

$$V(\pi) = \min\left(\Phi(\pi),\ \int_D V(y)\, dP_x(y) + 1\right) \tag{1}$$

Theorem 3.1. Suppose that Φ(x) is bounded on D. Then equation (1) has at least one solution V that satisfies inf_{d∈D} V(d) ≥ inf_{d∈D} Φ(d).

Proof. The trick is to define a sequence of partial solutions using recursion. Define truncations of V as,

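$$\begin{aligned}
V_0(\pi) &= \Phi(\pi) \\
V_1(\pi) &= \min\left(\Phi(\pi),\ \int_D V_0(y)\, dP_x(y) + 1\right) \\
V_n(\pi) &= \min\left(\Phi(\pi),\ \int_D V_{n-1}(y)\, dP_x(y) + 1\right) \\
&\Rightarrow V_n(\pi) \ge V_{n+1}(\pi)
\end{aligned}$$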

This is saying that the expected execution time will go down as we continue to refine our V. It also follows that there is a limit to how far we can push this refinement process as we must have,

$$\Rightarrow V_n(\pi) \ge \inf_{\pi \in D} \Phi(\pi) \quad \forall n$$

The intuition here is that we can only do as well as the best plan available. It follows from the Lebesgue Convergence Theorem that,

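$$\begin{aligned}
V^*(\pi) &= \lim_{n\to\infty} V_n(\pi) \\
&= \min\left(\Phi(\pi),\ \lim_{n\to\infty} \int_D V_{n-1}(y)\, dP_x(y) + 1\right) \\
&= \min\left(\Phi(\pi),\ \int_D V^*(y)\, dP_x(y) + 1\right) \\
&\Rightarrow \inf_{\pi \in D} V^*(\pi) \ge \inf_{\pi \in D} \Phi(\pi)
\end{aligned}$$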
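To make the truncation argument concrete, here is a minimal sketch of the same recursion on a finite decision space, where the integral becomes a weighted sum over a transition matrix. The three policies, their execution times, and the transition probabilities are all invented for illustration; nothing here comes from the original post beyond the recursion itself.

```python
def value_iteration(phi, P, cost=1.0, tol=1e-12, max_iter=10_000):
    """Iterate V_n = min(phi, P @ V_{n-1} + cost), starting from V_0 = phi."""
    V = list(phi)
    for _ in range(max_iter):
        # Expected value of planning one more step from each policy.
        cont = [sum(p * v for p, v in zip(row, V)) + cost for row in P]
        V_next = [min(f, c) for f, c in zip(phi, cont)]
        if max(abs(a - b) for a, b in zip(V, V_next)) < tol:
            return V_next
        V = V_next
    return V

# Hypothetical decision space with three policies.
phi = [10.0, 4.0, 1.0]            # execution time of each policy
P = [[0.0, 0.5, 0.5],             # row x holds P_x(y) for each y
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

V = value_iteration(phi, P)
stop_now = [abs(v - f) < 1e-9 for v, f in zip(V, phi)]
print(V)         # approximately [3.0, 3.0, 1.0]
print(stop_now)  # execute immediately only where V equals phi
```

In this toy example the iterates decrease monotonically and never fall below the best available execution time, matching the two inequalities used in the proof.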

Okay, now we need to show that our solution is unique. The idea of the proof is that if you had more than a single solution, say V and U, there will have to be regions where one policy will go for continuing and the other won’t. If we consider a region where U<V this means that the advantage of continuing will be greater if you use U. On the other hand, over the region where the value of V is greater than U there will also have to be a maximum difference between the value functions. However, this would mean that the advantage function of U will need to be greater than V which is a contradiction.

Given the sketch, it sounds like we’ll have to demand that Φ(D) be compact. We’ll show after the proof that it’s possible to relax this condition to an assertion that it’s always possible to transition to a certain set of policies.

Theorem 3.2. Let Φ(D) be a compact topological space and for x∈D let Px(A)>0 for all open sets A⊂D. Then there’s at most a single solution such that (i) V is continuous and (ii) inf_{π∈D} V(π) = inf_{π∈D} Φ(π).

Proof. For a function R(x) that measures execution time we define the advantage as γr(x)=R(x)−∫D R(y) dPx(y). Think of this as a measure of the change in expected execution time. If it’s positive the current state has an advantage over the next expected state. We define things this way so that if R is a solution of (1) then it follows that γr(x)≤1. To see this, note that (1) gives R(x)≤∫D R(y) dPx(y)+1. Now, imagine we have two solutions U and V for (1). Let,

$$S_1 = \lbrace x \mid V(x) = U(x) \rbrace, \quad S_2 = \lbrace x \mid V(x) > U(x) \rbrace, \quad S_3 = \lbrace x \mid V(x) < U(x) \rbrace$$

We’ll show that S2=S3=∅. Define W(x)=max(U(x),V(x)) then for x∈S1∪S2 we have W(x)=V(x) and for all x∈D,

$$\int_D V(y)\, dP_x(y) \le \int_D W(y)\, dP_x(y)$$

From this we conclude that γw(x)≤γv(x)≤1. Since U(x)<V(x) and inf(V(x))=inf(Φ(x)) we conclude that we have to have γu(x)=1 or else (ii) won’t be satisfied. We’re about to run into a problem. We know from the compactness of Φ(D) that there exists x0 such that,

$$\begin{aligned}
& W(x_0) - V(x_0) = \max_x \left(W(x) - V(x)\right) > 0 \\
\Rightarrow\ & W(x_0) - V(x_0) > \int_D W(y) - V(y)\, dP_{x_0}(y) \\
\Rightarrow\ & \gamma_w(x_0) > \gamma_v(x_0)
\end{aligned}$$

This is a contradiction so it must be the case that S2=∅. By symmetry of U and V we can also eliminate S3.

Corollary 3.3. Let Φ(D) be a measurable space where there exists a non-empty Φ(D′) such that Φ(D′)⊂Φ(D) and Px(D′)>δ>0 for all x∉D′. Then we have a single solution V that is measurable and bounded.

We can use the same argument as above. However, because we lost compactness we must look at the infimum, which will have a value of, say, −M. We fix ϵ<Mδ(1+δ)^(−1) and then we can get a point xϵ inside of the set we took an infimum over. Now when we integrate over S we can extract Sϵ and everything will work out.

Redux

Does this information help illuminate anything? Returning to the article example given at the beginning, it would seem as though I spent too much time planning the mathematical portion of the writing. I'm about done with the article and I see that the mathematical portion grew in scope way beyond what was intended in the original plan. This deviation was significant due to me not realizing that the article would need to be formatted. I think that if I had realized how much time it would take to format the article with LaTeX, it would've become readily apparent that I should've been more sparing in the amount of math I used. Overall, I spent about 14 hours on this article while my original estimate was about 7. I spent several hours making mathematical proofs when I really should've been reading up on how to format an article with LaTeX correctly. I also think that I could've spent a bit more time proof-reading everything before submitting.

What ultimately happened was that while I created an initial plan for the article, the unexpected addition of mathematical arguments broke all my estimates. I should've realized that formatting LaTeX would take a long time since it was my first attempt. It would seem as though the idea of clean separation between planning and execution is an abstraction. The reality is that we all switch between planning and executing whenever we engage with sufficiently complex tasks. I only planned the structure of the article for about an hour because I was overly optimistic about how long it would take to finish the whole thing. I fell prey to the planning fallacy.

Does this mean this idea failed? Not exactly. Recall the idealized model of planning. When we make plans we oftentimes will create tasks that themselves require further planning. It seems perfectly reasonable that we could simply "call" our meta-plan on these sub-tasks to reduce the effect of human bias. I was overly optimistic in thinking about how long it would take to format the math. This problem could've been heavily mitigated if I had a plan that made the task of making the mathematical section a sub-plan. If I had done this, I would've had a chance to look at the bigger picture and most likely would've proceeded differently.

Outlook

Ultimately, I think that the idea was a success because it was developed to enough generality to be useful in getting a concept handle on some pretty big questions surrounding the idea of meta-planning. First, what I am currently engaging in is the construction of a plan that when executed can manage the planning of plans. While I showed that for a broad class of tasks meta-planning is rational, for a much broader class of problems it isn't. Not all problems are compressible. This means that oftentimes the most effective plan will be to have no real plan at all. You can't plan for this either. Pursuing this line of reasoning is difficult as you quickly reach the limit of rationality due to the undecidability of the various questions involved. Perhaps this is a reason for less than perfect rationality in humans. Second, we're overly optimistic about how much time our plans will take to execute. However, when we realize that our plans were too optimistic we are more than capable of updating them because we naturally make hierarchical or tree-like plans. Using these three observations I conclude that the current model I've developed is sufficient to fine-tune itself for the plans that any systematic method could hope to be capable of tackling. There is no need to continue worrying about abstract details at the meta-level. I need to practice using what I came up with.



Discuss

Inversion of theorems into definitions when generalizing

August 4, 2019 - 20:44
Published on August 4, 2019 5:44 PM UTC

This post describes a pattern of abstraction that is common in mathematics, which I haven't seen described in explicit terms elsewhere. I would appreciate pointers to any existing discussions. Also, I would appreciate more examples of this phenomenon, as well as corrections and other insights!

Note on prerequisites for this post: in the opening example below, I assume familiarity with linear algebra and plane geometry, so this post probably won't make much sense without at least some superficial knowledge of these subjects. In the second part of the post, I give a bunch of further examples of the phenomenon, but these examples are all independent, so if you haven't studied a particular subject before, that specific example might not make sense, but you can just skip it and move on to the ones you do understand.

There is something peculiar about the dependency of the following concepts in math:

  • Pythagorean theorem
  • Law of cosines
  • Angle between two vectors
  • Dot product
  • Inner product

In the Euclidean geometry of ℝ² (the plane) and ℝ³ (three-dimensional space), one typically goes through a series of steps like this:

  1. Using the axioms of Euclidean geometry (in particular the parallel postulate), we prove the Pythagorean theorem.
  2. We take the right angle to have angle π/2 and calculate other angles in terms of this one.
  3. The Pythagorean theorem allows us to prove the law of cosines (there are many proofs of the law of cosines, but this is one way to do it).
  4. Now we make the Cartesian leap to analytic geometry, and start treating points as strings of numbers in some coordinate system. In particular, the Pythagorean theorem now gives us a formula for the distance between two points, and the law of cosines can also be restated in terms of coordinates.
  5. Playing around with the law of cosines (stated in terms of coordinates) yields the formula $x_1 y_1 + x_2 y_2 = \|(x_1,x_2)\|\,\|(y_1,y_2)\|\cos\theta$, where $(x_1,x_2)$ and $(y_1,y_2)$ are two vectors and $\theta$ is the angle between them (and similarly for three dimensions), which motivates us to define the dot product (as being precisely this quantity).

In other words, we take angle and distance as primitive, and derive the inner product (which is the dot product in the case of Euclidean spaces).
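For readers who want step 5 spelled out, here is one way the computation can go in the plane. It is a sketch of a standard derivation rather than the only route; x and y denote the vectors (x1,x2) and (y1,y2), and the triangle in question has sides of length ‖x‖, ‖y‖, and ‖x−y‖.

```latex
\begin{align*}
\|x - y\|^2 &= \|x\|^2 + \|y\|^2 - 2\,\|x\|\,\|y\|\cos\theta
  && \text{(law of cosines)} \\
\|x - y\|^2 &= (x_1 - y_1)^2 + (x_2 - y_2)^2 \\
            &= \|x\|^2 + \|y\|^2 - 2\,(x_1 y_1 + x_2 y_2)
  && \text{(distance formula, expanded)} \\
\Rightarrow\quad x_1 y_1 + x_2 y_2 &= \|x\|\,\|y\|\cos\theta
  && \text{(compare the two lines)}
\end{align*}
```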

But now, consider what we do in (abstract) linear algebra:

  1. We have a vector space, which is a structured space satisfying some funny axioms.
  2. We define an inner product ⟨v,w⟩ between two vectors v and w, which again satisfies some funny properties.
  3. Using the inner product, we define the length of a vector v as ∥v∥=√⟨v,v⟩, and the distance between two vectors v and w as ∥v−w∥.
  4. Using the inner product, we define the angle between two non-zero vectors v and w as the unique number θ∈[0,π] satisfying cos θ = ⟨v,w⟩ / (∥v∥∥w∥).
  5. Using these definitions of length and angle, we can now verify the Pythagorean theorem and law of cosines.

In other words, we have now taken the inner product as primitive, and derived angle, length, and distance from it.
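The abstract route can be tried out numerically. The sketch below takes an inner product as the only primitive, derives length and angle from it, and then checks the law of cosines and the Pythagorean theorem. The weighted inner product and the specific vectors are assumptions chosen for illustration (a non-standard inner product makes the point that nothing depends on the usual dot product).

```python
import math

def inner(v, w, weights=(2.0, 3.0)):
    """A weighted inner product on R^2 (hypothetical choice of weights)."""
    return sum(a * x * y for a, x, y in zip(weights, v, w))

def norm(v):
    """Length derived from the inner product."""
    return math.sqrt(inner(v, v))

def angle(v, w):
    """Angle in [0, pi] derived from the inner product."""
    c = inner(v, w) / (norm(v) * norm(w))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

# Law of cosines: |v - w|^2 = |v|^2 + |w|^2 - 2 |v| |w| cos(theta)
v, w = (1.0, 2.0), (3.0, 1.0)
diff = tuple(a - b for a, b in zip(v, w))
lhs = norm(diff) ** 2
rhs = norm(v) ** 2 + norm(w) ** 2 - 2 * norm(v) * norm(w) * math.cos(angle(v, w))

# Pythagorean theorem for a pair that is orthogonal under this inner product.
p, q = (3.0, 2.0), (1.0, -1.0)
psum = tuple(a + b for a, b in zip(p, q))

print(lhs, rhs)
print(inner(p, q), norm(psum) ** 2, norm(p) ** 2 + norm(q) ** 2)
```

Note that p and q are not perpendicular in the usual Euclidean sense; "right angle" here means whatever the chosen inner product says it means, which is exactly the inversion being described.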

Here is a shot at describing the general phenomenon:

  • We start in a concrete domain, where we have two notions A and B, where A is a definition and B is some theorem. (In the example above, A is length/angle and B is the inner product, or rather, B is the theorem which states the equivalence of the algebraic and geometric expressions for the dot product.)
  • We find some abstractions/generalizations of the concrete domain.
  • We realize that in the abstract setting, we want to talk about A and B, but it's not so easy to see how to talk about them (because the setting is so abstract).
  • At some point, someone realizes that instead of trying to define A directly (as in the concrete case), it's better to generalize/"find the principles" that make B tick. We factor out these principles as axioms of B.
  • Finally, using B, we can define A.
  • We go back and check that in the concrete domain, we can do this same inverted process.

Here is a table that summarizes this process:

Notion | Concrete case                       | Generalized case
A      | primitive; defined on its own terms | defined in terms of B
B      | a theorem                           | defined axiomatically

In what sense is this pattern of generalization "allowed"? I don't have a satisfying answer here, other than saying that generalizing in this particular way turned out to be useful/interesting. It seems to me that there is a large amount of trial-and-error and art involved in picking the correct theorem to use as the B in the process. I will also say that explicitly verbalizing this process has made me more comfortable about inner product spaces (previously, I just had a vague feeling that "something is not right").

Here are some other examples of this sort of thing in math. In the following examples, the step of using B to define A does not take place (in this sense, the inner product case seems exceptional; I would greatly appreciate hearing about more examples like it).

  • Metric spaces: in Euclidean geometry, the triangle inequality is a theorem. But in the theory of metric spaces, the triangle inequality is taken as part of the definition of a metric.
  • Sine and cosine: in middle school, we define these functions in terms of angles and ratios of side lengths of a triangle. Then we can prove various things about them, like the power series expansion. When we generalize to complex inputs, we then take the series expansion as the definition.
  • Probability: in elementary probability, we define the probability of an event as the number of successful outcomes divided by the number of all possible outcomes. Then we notice that this definition satisfies some properties, namely: (1) the probability is always nonnegative; (2) if an event happens for certain, then its probability is 1; (3) if we have some mutually exclusive events, then we can find the probability that at least one of them happens by summing their individual probabilities. When we generalize to cases where the outcomes are crazy (namely, countably or uncountably infinite), instead of defining probability as a ratio, we take the properties (1), (2), (3) as the definition.
  • Conditional probability: when working with finite sets, we can define the conditional probability as $P(A \mid B) := \frac{|A \cap B|}{|B|}$. We then see that if $\Omega$ is the (finite) sample space, we have $P(A \mid B) = \frac{|A \cap B|}{|B|} = \frac{|A \cap B| / |\Omega|}{|B| / |\Omega|} = \frac{P(A \cap B)}{P(B)}$. But now when we move to infinite sets, we just define the conditional probability as $P(A \mid B) := \frac{P(A \cap B)}{P(B)}$.
  • Convergence in metric spaces: in basic real analysis in $\mathbb{R}$, we say that $\lim_{n\to\infty} a_n = L$ if the sequence $(a_n)_{n=0}^{\infty}$ satisfies some epsilon condition (this is the definition). Then we can prove that $\lim_{n\to\infty} a_n = L$ if and only if $\lim_{n\to\infty} |a_n - L| = 0$. Then in more general metric spaces, we define "$\lim_{n\to\infty} a_n = L$" to mean that $\lim_{n\to\infty} d(a_n, L) = 0$. (Actually, this example is a little cheating, since we can just take the epsilon condition and swap in $d(a_n, L)$ for $|a_n - L|$.)
  • Differentiability: in single-variable calculus, we define the derivative to be $f'(x_0) := \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}$ if this limit exists. We can then prove that $f'(x_0) = L$ if and only if $\lim_{x\to x_0} \frac{|f(x) - (f(x_0) + L(x - x_0))|}{|x - x_0|} = 0$. This latter limit is an expression that makes sense in the several-variable setting, and is what we use to define differentiability.
  • Continuity: in basic real analysis, we define continuity using an epsilon–delta condition. Later, we prove that this is equivalent to some statement involving open sets. Then in general topology we take the open sets statement as the definition of continuity.
  • (Informal.) Arithmetic: in elementary school arithmetic, we "intuitively apprehend" the rational numbers. We discover (as theorems) that two rational numbers a/b and c/d are equal if and only if ad=bc, and that the rationals have the addition rule a/b+c/d=(ad+bc)/(bd). But in the formal construction of number systems, we define the rationals as equivalence classes of pairs of integers (with non-zero second coordinate), where (a,b)∼(c,d) iff ad=bc, and define addition on the rationals by (a,b)+(c,d):=(ad+bc,bd). Here we aren't really even generalizing anything, just formalizing our intuitions.
  • (Somewhat speculative.) Variance: if a random variable X has a normal distribution, its probability density can be parametrized by two parameters, $\mu$ and $\sigma^2$, which have intuitive appeal (by varying these parameters, we can change the shape of the bell curve in predictable ways). Then we find out that $\sigma^2$ has the property $\sigma^2 = E[(X-\mu)^2]$. This motivates us to define the variance as $E[(X-\mu)^2]$ for other random variables (which might not have such nice parametrizations).
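As a small check of the conditional probability example in the list above, here is a sketch in Python for a finite sample space with equally likely outcomes. The function names and the die-roll events are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def cond_prob_counting(a, b):
    """Concrete definition: |A ∩ B| / |B|, for finite sets."""
    return Fraction(len(a & b), len(b))

def prob(event, omega):
    """Elementary probability: favorable outcomes over all outcomes."""
    return Fraction(len(event), len(omega))

def cond_prob_ratio(a, b, omega):
    """Generalized definition: P(A ∩ B) / P(B)."""
    return prob(a & b, omega) / prob(b, omega)

# One roll of a fair six-sided die (an illustrative choice).
omega = set(range(1, 7))
a = {2, 4, 6}  # the roll is even
b = {4, 5, 6}  # the roll is at least 4

print(cond_prob_counting(a, b) == cond_prob_ratio(a, b, omega))
```

Exact fractions are used so that the two definitions can be compared with `==` rather than up to rounding.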


Discuss

Cephaloponderings

August 4, 2019 - 19:45
Published on August 4, 2019 4:45 PM UTC

Cross-posted from Putanumonit.

Hello all. This is Jacob’s wife, and while Jacob is off chilling in the fjords, I’m staying home and have volunteered to write a guest post.

Instead of trying to match the usual Putanumonit fare, I chose to write about something I find very exciting. I’m a biologist, and while I normally work on research involving the sense of touch in tiny roundworms, I love to read about interesting animal biology that I come across through pop-sci media. Yes, I am one of those people you might find spouting “weird animal sex facts” at parties. But seriously, the diversity of life out there is incredible, and I really can’t get enough of learning and thinking about it. So here’s a bit about a creature that’s been capturing my fascination recently.

Consider the octopus. But first, you might want to consider considering the octopus. Why consider the octopus? I think they’re incredibly interesting creatures, and upon reflecting I think that’s probably because I’m so confused by them. They wouldn’t be very interesting to me if everything I knew about them fit well into my world view without jostling the content or structure of surrounding information, even if I had reached the point of knowing enough to know how much I didn’t know. Working in research, I’ve been whacked upon the head several times with the fact that just because information isn’t known or a problem isn’t solved, that doesn’t automatically mean that anyone cares about learning the information or solving the problem.

That goes for myself as well: I don’t know how many teeth the average walrus has, but until now it’s never occurred to me to look it up because it really doesn’t make much of a difference to me whether it’s 26 or 32. Once I started thinking about it, I had to Google the answer (it’s 18), but I’m happy to stop there instead of asking the same question about every other toothed animal I can think to name. It’s slightly more interesting for me to consider what walruses use their tusks for (Google says mostly for gripping the ice to haul themselves out of the water), but why should that question be more interesting? Walrus tusks are weird, and weirdness contributes to interestingness.

Whaaaat is going on?

For me, having the initial question answered actually spawns more questions. If tusks are so useful for pulling oneself up onto the ice, why are there seals without tusks? Now I’ve added confusion on top of the weirdness, and I’m even more interested. The same thing happened to me regarding octopuses.

Exhibit A: Octopuses are weird.

Octopuses and other cephalopods (the clade including octopuses, squid, and cuttlefish) can change the color of their skin by expanding and contracting pigment-filled bags to show or conceal their contents. Changes in colors and patterns are associated with certain behaviors, like hunting, resting, or defensive maneuvering. Octopuses can also change the texture of their skin by extending projections called papillae, which go from being small bumps to tall spikes. Feel free to spend a few minutes watching this in action on YouTube.

Nothing but a bit of coral here

Another particularly strange facet of these creatures: octopuses have an unusually complex system for RNA editing, so that the same stretch of DNA can give rise to a large diversity of RNA codes (basic bio review: DNA is transcribed into RNA which is translated into proteins). In most animals, evolution is thought to occur mostly through changes to DNA, but cephalopod evolution and adaptation appears to have occurred mostly through RNA editing, with their DNA genomes changing relatively little over millennia.

Exhibit B: Octopuses are confusing.

Octopuses appear to be very intelligent. They get up to all sorts of shenanigans in captivity, like escaping their tanks, disassembling plumbing, or purposefully messing with electrical equipment for their own amusement. They’ve also been trained to solve puzzles and mazes, and seem to be able to recognize individual humans. However, they lack a lot of characteristics that we usually associate with intelligent animals: they aren’t social, they aren’t long-lived, and they don’t engage in post-hatching parental care.

Some intelligent animal species have something akin to culture, where certain adaptive behaviors must be taught. Cetacean (whale and dolphin) hunting techniques are one example of this, and since octopuses are known to use complex hunting strategies it would make sense for them to use their intelligence for intergenerational information exchange. But octopus parental care only happens up until egg-hatching, whereupon the octopus mom will go slink off and die instead of teaching her babies anything.

That brings me to something I find particularly confusing about octopuses: why do octopuses die after reproducing? It isn’t unusual for animals to die after reproducing once, but it’s certainly worth asking why that should be the case. What’s particularly interesting about the way octopuses die after reproducing is that it seems to occur by active self-destruction rather than by total expenditure of available resources.

Despite the adage of sperm being cheap and eggs being expensive, even male octopuses will often die after reproducing once. Some go to lengths to avoid being cannibalized by their mate, like detaching their mating arm (which is effectively like a penis for them) and leaving it stuck inside the female.

But without their penises it’s not as if they’re escaping with their lives in order to mate again (although some species have been found to have more than one mating arm). Instead they go off to become senile and die from their own carelessness within a few weeks. Usually that means being eaten by some other predator, so it would actually make more sense in my mind for those male octopuses to allow their mate to eat them, providing her with energy to take care of the fertilized eggs instead of just giving up their bodies to whatever random carnivore. Males have been observed mating with multiple female octopuses, but only for a relatively short period of time before they wander off and die alone.

While female octopuses do spend a lot of time and energy taking care of their fertilized eggs and will stop hunting while doing so, it still doesn’t seem that their deaths are absolutely energetically necessary. Female octopuses in captivity have been observed self-mutilating in an apparent effort to hasten their own deaths after mating, smashing themselves into walls and eating off the tips of their own tentacles. Also, it seems that if a certain part of the octopus brain (called the optic gland) is removed, the females won’t engage in this behavior and instead go on to live, eat, and even mate again after reproducing. The optic gland was also found to release hormone-like signals that initiate programmed cell death elsewhere in the octopus’s body. With this being the case, it seems that something more than malnourishment causes the female octopus to die after reproduction.

It’s important to note that these experiments were done on octopuses living in captivity, so perhaps a wild female octopus would have little chance of survival after spending so much energy taking care of her eggs, but if that were really the case, why would the animals need to have a self-destruct program installed?

Brooding octopus mom

All this points to orphaned octopus babies having an advantage over those with living parents, which is easy enough to fathom. Without mom and dad around, the kids are left with more food and territory for themselves. However, I’m still confused. Wouldn’t a given gene be more likely to replicate itself residing in an individual that reproduced multiple times in addition to being in half the octopus babies it helped produce?

Here’s a try at coming up with a possible explanation: maybe octopuses are like single-use consumer products, think plastic forks. Plastic forks are so cheap that people are willing to buy them, eat with them once, toss them in the trash, and then buy new plastic forks for the next picnic that comes around. In order to reuse the forks, you’d have to wash them, which takes effort, but they can also be flimsy enough that you might already have a few bent prongs after a single round of use.

I don’t think I’d describe octopuses as particularly “cheap” though: while a single brood contains thousands of eggs, the female spends months guarding and cleaning the unhatched eggs without leaving to hunt and eat. After that, the offspring take months or even years to reach sexual maturity, and on average, you’d only expect two individuals from a brood to make it that far and successfully reproduce (assuming a stable population size). To continue with the fork analogy, it could be that they’re annoying to wash, here meaning that it’s difficult for them to make the transition from mating mode to growth and maintenance mode. Seeing as how the octopuses with their optic glands removed seemed to do okay at that, I’m not so sure, but again those octopuses were presumably being kept safe and fed with minimal effort on their part.

Is it that octopuses are flimsy like plastic fork prongs? Well, you’ve got the males that detach their mating arms to keep from being eaten, and generally they really can’t mate again after that (but again I’m super confused about why they bother trying to avoid being cannibalized if they don’t mate again afterwards). And you’ve got the females that are malnourished and weak after taking care of their eggs for months without hunting (but again I’m super confused as to why the self-destruct mode is necessary if they’re so likely to die right afterwards). So “flimsy” seems to fit but in a way that doesn’t make a lot of sense.

There is at least one octopus species that doesn’t tend to die after reproducing for the first time, called the Larger Pacific Striped Octopus (let’s just say LPSO). After reaching sexual maturity, LPSO females will brood for up to 8 months, repeatedly mating and spawning new eggs. The species’ unusual behavior and life history were first recorded in the 1970s, but the report was rejected for publication because it seemed way too strange (in the sense of being so different from other octopuses) and reviewing scientists didn’t trust the observations. After that it took about 30 years for the observations to be confirmed and published.

LPSO kisses

What I consider to be the most striking difference about LPSOs is that they seem to be much more social than other octopus species. Instead of living solitary lives as almost all other octopuses do, they’ve been found living in large groups and sharing dens and hunting grounds. It seems like they’re capable of recognizing individuals among their species, and from what I could find they haven’t been observed cannibalizing each other. Perhaps most octopus parents have to die simply because they’re way too likely to eat their own offspring?

I think the biggest obstacle preventing us from understanding octopuses is that they’re hard for us to observe. We can capture them and raise them in captivity, but that doesn’t tell us much about how they’re adapted to the ecosystem they live in. We can stick video cameras underwater and make graduate students take careful notes on any octopus that swims by, but that only works for species that live in shallow water where light can get through. How to effectively study octopuses will prove to be quite the puzzle, but I’m looking forward to additional insights.

I hope you’re very confused about octopuses at this point, and I hope you’re happy about it. Science occurs in areas where believing you understand what’s going on tends to mean you’re deluding yourself, and NOT UNDERSTANDING has to be a default state that can sometimes be chipped into bits of UNDERSTANDING A LITTLE. But you know you might be in for a really good time when you come across a chunk of I AM SO CONFUSED BY THIS SEEMINGLY CONTRADICTORY INFORMATION, because that’s where you might find an interesting story that other people will care to hear.



Discuss

LessWrong Community Weekend 2019 – Last 10 Spots

August 4, 2019 - 12:20
Published on August 4, 2019 9:20 AM UTC

TL;DR:

  • 10 remaining spots! Apply here: tiny.cc/lwcw2019_signup
  • Keynote by Dr. Wanja Wiese: “From Predictive Engines to Conscious Machines and Uploading?”
  • Already joining us in Berlin? Tell your best fellow human to join us.
  • When? Fri 30.08 noon - Mon 02.09
  • Where? http://jh-wannsee.de
  • How much? €200

Less than 28 days left until the LessWrong Community and aspiring rationalists take off together for an exciting long weekend of sharing, exploration, connection and celebration. In the last few months we have already gotten many inspiring applications and filled all but the very last 10 spots! Apply now and join us for this special event.

From Friday, August 30th to Monday, September 2nd, aspiring rationalists from across Europe will gather for 4 days of socializing, fun and intellectual exploration. The majority of the content will be unconference style and participant driven. We are also delighted to welcome Dr. Wanja Wiese as our keynote speaker, with his take on predictive engines, conscious machines, and uploading. Find the full abstract for the keynote attached below.

On Friday afternoon we will put up four big daily planners and before you can spell “epistemology” the attendees will fill them up with 50+ workshops, talks and activities of their own devising, such as sessions about rationality techniques, acrobatic yoga and authentic relating; you can learn all about new hyper-cost-effective altruistic interventions, whether a dragon could hover and much much more.

This is our 6th year and we feel that the atmosphere and sense of community at these weekends is something really special. If that sounds like something you would enjoy and you have some exciting ideas or skills to contribute, do come along and get involved. This year is the biggest one yet and it’s an entire day longer than previous years!

The ticket price of €200 includes accommodation for 3 nights, on-site meals (breakfast, lunch, dinner) and snacks, as well as a tasty welcome lunch at 12:00 on Friday, and a shuttle bus from the restaurant to the venue. On Monday, we check out by 10:00, but can continue to use some of the conference rooms for coworking and socializing until 15:00.

We still have spots available! Apply here: tiny.cc/lwcw2019_signup and make sure to let us know what experience and ideas you may contribute to this event: tiny.cc/lwcw2019_contribution.

If financial constraints would keep you from attending, or if you have any questions, please email us at lwcw.europe@gmail.com.

Looking forward to seeing you there, The Community Weekend organizers and LessWrong Deutschland e.V.

From Predictive Engines to Conscious Machines and Uploading?

Predictive processing approaches continue to play an influential role in cognitive neuroscience and philosophy of cognitive science. According to predictive processing, perception and action are underpinned by inference processes on sensory signals, based on an internal model of the world. Since predictive processing is a type of computation, it can also be implemented in artificial, silicon-based systems. But would this endow artificial systems with the same types of mental properties that intelligent biological systems, such as ourselves, possess? A lot hinges on whether the neural mechanisms underpinning consciousness and cognition can be regarded as implementations of predictive processing. If implementing certain forms of predictive processing is sufficient for consciousness, non-biological conscious machines will be possible. In principle, this would also enable us to upload our minds, thereby transcending the limits of mortal biological organisms. But would an uploaded version of myself really be me? And to what extent am I real in the first place, especially if what I experience as ‘me’ is the result of an evolved inference process?



Discuss

Zeno walks into a bar

August 4, 2019 - 10:00
Published on August 4, 2019 7:00 AM UTC

Zeno walks into a bar.

"I have a problem," he said.

"What is it?" said the bartender.

"Well, it has to do with the movement of physical bodies," said Zeno.

"Talk to my friend Max," said the bartender. He gestured toward a German man wearing round spectacles.

"Sir," said Zeno, "I wonder if you could help me with a problem."

"What's the problem?" said Max.

"Suppose I shoot an arrow from point
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} A to point B," said Zeno. "Before it reaches point B it must first reach a point C1 midway between points A and B."

"Naturally," said Max.

"And before the arrow reaches point C1 it must reach a point C2 midway between points C1 and A," continued Zeno.

"I see," said Max.

"And before the arrow reaches point C2 it must reach a point C3 midway between points C2 and A," continued Zeno.

"Wait a minute," said Max. "How far apart are points A and B?"

"10 meters," said Zeno.

"Then yes," said Max. "I understand your situation."

"And before the arrow reaches point C3 it must reach a point C4 midway between points C3 and A," continued Zeno. "Do you see the impasse?"

"Nope," said Max, "I think we're getting somewhere. How long is the arrow?"

"One meter," said Zeno.

"The distance between points C3 and C4 is one sixteenth of a meter," said Max. "A one-meter-long arrow can be at point C3 and C4 at the same time."

"Let's consider the tip of the arrow then," said Zeno. "Before the tip of the arrow reaches point C3 it must reach a point C4 midway between points C3 and A."

They talked deep into the night.

"And before the high-energy particle reaches point C118 it must reach a point C119 midway between points C118 and A," continued Zeno.

"Hold on," said Max. "How far apart are points A and C119?"

"1.5×10−35 meters" said Zeno.

"That's shorter than 1.6×10−35 meters," said Max. "The uncertainty in the position of a particle must always exceed 1.6×10−35 meters, because of space-time equivalence and the quantum-mechanical velocity operator's non-commutation with position. Even theoretically, the wave function of a particle can't ever occupy a space smaller than 1.6×10−35 meters."

"Thanks," said Zeno.

"By the way," said Max, "What brought you to this question in the first place?"

"I wanted to know how to define the momentum of a particle at an instantaneous moment of time," said Zeno.

"You could have just asked," said Max. "The probability distribution of a particle's momentum is determined by the instantaneous phase and magnitude of its wave."




Alleviating Bipolar with meditation

August 4, 2019 - 08:15
Published on August 4, 2019 5:15 AM UTC

Original post: Alleviating Bipolar with meditation

I was asked on the Slack about bipolar and what might help from a meditation standpoint. I have my own experiences to share. (Standard non-medical-advice disclaimer applies here: I’m not qualified to give professional advice, and you should probably confirm with a professional if you have doubts about trying any of this.)

Here’s a list of things that might help with the subjective mood swinging of bipolar experience.

1. A broadening of awareness and contexts. 

For about 6 months when I was really focused on moods (and for 10 years before that), I felt like I didn’t have moods; moods had me. (Moods are distinct from emotions, which can be had from moment to moment; moods are more like background, the colour of the day.) I would wake up and find out today was “miserable” or “excited”.

I worked on a specific type of meditation practice that is called broadening of awareness (there are 2 different instructions for methods).  I got lucky that this helped me and I wasn’t expecting it. When moods had me, it felt like things “just are” miserable. Now my awareness is broader than the moods and “I”* contain them.  (*meditative “I” and “self” are a rabbit hole)

Instructions: Most people have their sense of their self boundary in line with their skin barrier. “I” end at my skin. But it’s possible to expand that boundary and make it larger: first to the “kinetic sphere”, the area where one might be able to reach outside the body, and then further to the size of the whole room. Holding this “barrier” at the size of the room means that I’m metaphorically “anchored” to more solid things than my own body. Obviously “I’m” still the same, but my ground is the actual stationary room, which does not feel moods like my body does. (*The explanation of why it helps may be entirely irrelevant; the fact is, anecdata: it helped me.)

There’s space in my new expanded “me” to find the body being a certain mood but also to find stillness out there in the room which doesn’t get dragged around like the moods do.  I felt the pull of daily moods dry up. Obviously my body is still in grump but “I’m not” mentally trapped in that experience. From there, there’s a new, deeper breathing pattern that supports the broader awareness practice and that’s to be discovered and also hinted at.  I would encourage trying it for a few minutes a day and then going for a permanent shift into what is sometimes described as “spaciousness”.

Instructions 2: Awareness specifically in the visual field can be expanded out to the periphery. Start by picking an object straight ahead to look at and focus on. Now expand the awareness to the periphery of the visual field. Hold there for 30 seconds, then push on towards expanding the periphery. This works well looking up at the sky or out at the ocean, because of the broadness of the visual object in the visual field. Push the “awareness” beyond the visual field until there’s a spidey-sense tingling for what’s outside the visual field. Hold a broadness of awareness of the visual area and the spidey sense. Try to engage this broad sense regularly through the day; try to live in this broad sense of the world around you. Notice that a “mood” is within this sense, not fully covering the whole space. If you work at the broadness, that sense comes.

2. Stages of insight

At the same time as trying that practice, I was cycling through “the stages of insight” (a technical meditation term that can be read about in the book MCTB2). As I cycled, I would hit sensations like fear, which would call up involuntary intrusive memories about things I feared; then the next day I would have a “when will it end” feeling and wrestle with that one.

For 2, what became important was forming a relationship with the memories that I didn’t like. Due to lots of meditation, I was pretty clear what was normal and what was an intrusive visit from my past. I started asking the question, “why is this here?” and that question eventually turned into, “how is this here to help?” or “what do I need to still learn from this memory?” and that was a huge shift.

After those questions were hard ingrained into my attitude, within a week, shitty memories stopped showing up. Possibly because I got so good at relating to them that I was never calling them, “shitty memories”, and possibly because I never felt shit again about them, I’d just appreciate the lesson that I was to learn.  And from that I stopped cycling nearly as hard. I still notice bits of cycling but I’m above the cycle, not in it.

3. Greater bodily awareness.

A few days ago I wanted a photo of myself, so I put on a fancy shirt and got out of bed to take the photo. Three minutes later I found myself eating things. When I asked myself what was going on, because I wasn’t hungry, I noticed that I was cold and I was using food to stop feeling cold. An interesting discovery. I made my way back to warm things.

It’s bodily awareness that helps with the moods and actions. I can feel where in my body (or not) I’m feeling depressed or angry and I can alleviate it via movement or internal sensation and not by outwardly being moody or suffering mood swings.

For this I’ve done a lot of meditation and body scan attention work. Any sensation is relevant, itching the head, the knot in the stomach, the tingle in the toes. It’s all relevant to the way I think.

It’s a rat rationality thing to assume that these sensation experiences are noise but they are not. All sensation is relevant.

Some combination of the 3 has helped me to the point where I doubt I have bipolar any more. I was fairly confident at one point and now it seems unlikely to be a useful diagnosis.

And if there’s a 4 and 5, it’s: watch sleep and social life and make sure to get enough of both, and be aware of instability in either, which can start a cycle of instability. This is from Interpersonal and Social Rhythm Therapy (IPSRT), the only therapy designed for bipolar. Fixing my sleep made a big difference, and fixing my mood first thing in the morning did too.

Shoutout to Bipolar Awakenings for being more on the odd-strange-spiritual side of meditative practice towards progress on alleviating bipolar.




Proposed algorithm to fight anchoring bias

August 3, 2019 - 07:07
Published on August 3, 2019 4:07 AM UTC

Anchoring is a classic cognitive bias which has been discussed on Less Wrong before. Anchoring seems very difficult to avoid. Experiments have found that warning subjects about anchoring, or giving them cash incentives, doesn't solve the problem.

Here's an algorithm to fight anchoring that I would like to see a researcher test, based on binary search:

  1. Tell subjects to think of a number which is clearly too high for the quantity they want to estimate (an upper bound).
  2. Tell subjects to think of a number which is clearly too low (a lower bound).
  3. Tell subjects to find the midpoint of the upper bound and the lower bound and figure out whether it's too high or too low.
  4. The midpoint has now been judged as an upper/lower bound. Combined with the original lower/upper bound, we have a new, narrower range to explore. If this range is narrow enough, report its midpoint; otherwise go to step 3.
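
The steps above can be sketched in code. This is only an illustration of the bisection loop, not a tested debiasing procedure: `too_high` stands in for the subject's judgment of each midpoint, and the function name, the stopping `tolerance`, and the simulated subject are all assumptions made for the example.

```python
from typing import Callable


def bisect_estimate(
    lower: float,
    upper: float,
    too_high: Callable[[float], bool],
    tolerance: float,
) -> float:
    """Narrow [lower, upper] by repeated midpoint judgments, then report the midpoint."""
    if lower >= upper:
        raise ValueError("lower bound must be below upper bound")
    while upper - lower > tolerance:
        midpoint = (lower + upper) / 2
        if too_high(midpoint):
            upper = midpoint   # the midpoint becomes the new upper bound
        else:
            lower = midpoint   # the midpoint becomes the new lower bound
    return (lower + upper) / 2


# Simulated subject (hypothetical): judges against a private belief of 37.
PRIVATE_BELIEF = 37.0
estimate = bisect_estimate(0.0, 100.0, lambda x: x > PRIVATE_BELIEF, tolerance=1.0)
print(estimate)
```

A real subject's judgments will be noisier than this simulated one, and whether the initial bounds themselves act as anchors is exactly what the proposed experiment would need to test.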

You could have two experimental conditions: one condition where subjects think of a number which is clearly too high first (the steps are in the order above), and another condition where subjects think of a number which is clearly too low first (steps 1 & 2 are swapped). If estimates from the two conditions are similar, the technique is successful.




Open & Welcome Thread August 2019

August 3, 2019 - 02:56
Published on August 2, 2019 11:56 PM UTC

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.




'Longtermism' definitional discussion on EA Forum

August 3, 2019 - 02:53
Published on August 2, 2019 11:53 PM UTC

Will MacAskill is seeking to come to consensus on the meaning of the term longtermism in the EA community. Figured my fellow LW-er EAs who check LW more often than the EA Forum may want to follow along and possibly weigh in.




Practical consequences of impossibility of value learning

August 3, 2019 - 02:06
Published on August 2, 2019 11:06 PM UTC

There is a No Free Lunch result in value-learning. Essentially, you can't learn the preferences of an agent from its behaviour unless you make assumptions about its rationality, and you can't learn its rationality unless you make assumptions about its preferences.

More importantly, simplicity/Occam's razor/regularisation don't help with this, unlike with most No Free Lunch theorems. Among the simplest explanations of human behaviour are:

  1. We are always fully rational all the time.
  2. We are always fully anti-rational all the time.
  3. We don't actually prefer anything to anything.
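
A toy illustration of why behaviour alone can't separate these explanations (a sketch under simplifying assumptions, not the theorem itself): model an agent as a planner applied to a reward function, and note that three different (planner, reward) pairs produce the same observed action. All names and reward values here are hypothetical.

```python
from typing import Callable, Dict, Tuple

Reward = Dict[str, float]
Planner = Callable[[Reward], str]


def fully_rational(reward: Reward) -> str:
    """Pick the action with the highest reward."""
    return max(reward, key=reward.__getitem__)


def fully_anti_rational(reward: Reward) -> str:
    """Pick the action with the lowest reward."""
    return min(reward, key=reward.__getitem__)


def habit(action: str) -> Planner:
    """A planner that ignores reward entirely and always repeats one action."""
    return lambda reward: action


OBSERVED = "click"  # the behaviour is the only thing we get to see

explanations: Dict[str, Tuple[Planner, Reward]] = {
    "rational, likes clicking": (fully_rational, {"click": 1.0, "skip": 0.0}),
    "anti-rational, hates clicking": (fully_anti_rational, {"click": -1.0, "skip": 0.0}),
    "no preferences at all": (habit(OBSERVED), {"click": 0.0, "skip": 0.0}),
}

predicted = {name: planner(reward) for name, (planner, reward) in explanations.items()}
for name, action in predicted.items():
    print(f"{name}: {action}")
```

All three pairs predict the same click, so the observation cannot favour one over the others, and none of them is obviously more complicated than the rest.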

That result, though mathematically valid, seems highly theoretical, and of little practical interest - after all, for most humans, it's obvious what other humans want, most of the time. But I'll argue that the result has strong practical consequences.

Identifying clickbait

Suppose that Facebook or some other corporation decides to cut down on the amount of clickbait on its feeds.

This shouldn't be too hard, the programmers reason. They start by selecting a set of clickbait examples, and check how people engage with these. They programme a neural net to recognise that kind of "engagement" on other posts, which nets a large amount of candidate clickbait. They then go through the candidate posts, labelling the clear examples of clickbait and the clear non-examples, and add these to the training and test sets. They retrain and improve the neural net. A few iterations later, their neural net is well trained, and they let it run on all posts, occasionally auditing the results. Seeking to make the process more transparent, they run interpretability methods on the neural net, seeking to isolate the key components of clickbait, and clear away some errors or over-fits - maybe the equivalent, for clickbait, of removing the "look for images of human arms" in the dumbbell identification nets.

The central issue

Could that method work? Possibly. With enough data and enough programming efforts, it certainly seems that it could. So, what's the problem?

The problem is that so many stages of the process require choices on the part of the programmers. The initial selection of clickbait in the first place; the labelling of candidates at the second stage; the number of cycles of iterations and improvements; the choice of explicit hyper-parameters and implicit ones (like how long to run each iteration); the auditing process; the selection of key components. All of these rely on the programmers being able to identify clickbait, or the features of clickbait, when they see them.

And that might not sound bad; if we wanted to identify photos of dogs, for example, we would follow a similar process. But there is a key difference. There is a somewhat objective definition of dog (though beware ambiguous cases). And the programmers, when making choices, will be approximating or finding examples of this definition. But there is no objective, semi-objective, or somewhat objective definition of clickbait.

Why? Because the definition of clickbait depends on assessing the preferences of the human that sees it. It can be roughly defined as "something a human is likely to click on (behaviour), but wouldn't really ultimately want to see (preference)".

And, and this is an important point, the No Free Lunch theorem applies to humans. So humans can't deduce preferences or rationality from behaviour, at least, not without making assumptions.

So how do we solve the problem? Because humans do often deduce the preferences and rationality of other humans, and often other humans will agree with them, including the human being assessed. How do we do it?

Well, drumroll, we do it by... making assumptions. And since evolution is so very lazy, the assumptions that humans make - about each other's rationality/preference, about their own rationality/preference - are all very similar. Not identical, of course, but compared with a random agent making random assumptions to interpret the behaviour of another random agent, humans are essentially all the same.

This means that, to a large extent, it is perfectly valid for programmers to use their own assumptions when defining clickbait, or in other situations of assessing the values of others. Indeed, until we solve the issue in general, this may be the only way of doing this; it's certainly the only easy way.

The lesson

So, are there any practical consequences of this? Well, the important thing is that programmers realise they are using their own assumptions, and take these into consideration when programming. Even things that they feel might just be "debugging", by removing obvious failure modes, could be them injecting their assumptions into the system. This has two major consequences:

  1. These assumptions don't form a nice neat category that "carves reality at its joints". Concepts such as "dog" are somewhat ambiguous, but concepts like "human preferences" will be even more so, because they are a series of evolutionary kludges, rather than a single natural thing. Therefore we expect that extrapolating programmer assumptions, or moving to a new distribution, will result in bad behaviour that will have to be patched anew with more assumptions.
  2. There are cases when their assumptions and those of the users may diverge; looking out for these situations is important. This is easier if programmers realise they are making assumptions, rather than approximating objectively true categories.



Power Buys You Distance From The Crime

August 2, 2019 - 23:50
Published on August 2, 2019 8:50 PM UTC

Introduction

Taxes are typically meant to be proportional to money (or negative externalities, but that’s not what I’m focusing on). But one thing money buys you is flexibility, which can be used to avoid taxes. Because of this, taxes aimed at the wealthy tend to end up hitting the well-off-or-rich-but-not-truly-wealthy harder, and tax cuts aimed at the poor end up helping the middle class. Examples (feel free to stop reading these when you get the idea, this is just the analogy section of the essay):

  • Computer programmers typically have the option to work remotely in a low-tax state; teachers need to be where the classroom is. 
  • Estate taxes tend to hit families with single large assets (like a business) harder than those with diverse investments (who can simply sell assets to pay for taxes), who are hit harder than those with enough wealth to create trust funds.
  • Executives can choose to receive stock (which is taxed more favorably) instead of cash to the exact percentage they desire. Well paid employees are offered stock, but the amount will not be tailored to their needs. Lower level employees either are not offered this, or are not in a position to take advantage of it.
  • The legal distinction between a business (whose expenses are tax deductible) and a hobby (deductions not allowed) is based on whether the activity nets you income (there are complications and you can sometimes prove a money loser is a business, but this is a good rule of thumb). Small business owners (e.g. lawyers) can fold their occasionally-revenue-generating hobby (e.g. photography) into their real business, enabling tax deductions for their hobby.
  • IRAs, 401ks, HSAs, and FSAs all lock your money up for a time or purpose, in exchange for lower or delayed taxes. You can only take advantage of them if you’re sure you won’t need the money for another purpose sooner.
  • More examples here.

Note that most of these are perfectly legal and the rest are borderline. But we’re still not getting the result we want, of taxes being proportional to income.

When we assess moral blame for a situation, we typically want it to be roughly in proportion to how much power a person has to change said situation. But just like money can be used to evade taxes, power can be used to avoid blame. This results in a distorted blame-distribution apparatus which assigns the least blame to the person most able to change the situation. Allow me a few examples to demonstrate this.

 

Examples 1 + 2: Corporate Malfeasance

The Wells Fargo account fraud scandal: in order to meet quotas, entry level Wells Fargo employees created millions of unauthorized accounts (typically extra services for existing customers). No executive (provably) gave an order to make phony accounts, but they did set the quotas without reference to what customers wanted, and didn’t (sufficiently) disincentivize fake accounts. Who is more morally responsible in that situation: an entry level employee trying desperately to make rent, or the rich executive who didn’t ask questions about all the new accounts with @wellsfargo.com e-mail addresses? The entry level employee is the one who made the conscious decision to defraud people, but if he didn’t he’d get fired and someone else would commit the fraud. The executive is the one who could actually change something.

Or corporate slavery. No company goes “I’m going to go out and enslave people today” (especially not publicly), but not paying people is sometimes cheaper than paying them, so financial pressure will push towards slavery. Public pressure pushes in the opposite direction, so companies try not to visibly use slave labor. But they can’t control what their subcontractors do, and especially not what their subcontractors’ subcontractors’ subcontractors do, and sometimes this results in workers being unpaid and physically blocked from leaving.

Who’s at fault for the subcontractor(^3)’s slave labor? One obvious answer is “the person locking them in during the fire” or “the parent who gives their kid piecework”, and certainly it couldn’t happen without them. But if we say “Nike’s lack of knowledge makes them not responsible”, we give them an incentive to subcontract without asking follow up questions. The executive is probably benefiting more from the system of slave labor than the factory owner is from his little domain, and has more power to change what is happening. If the small factory owner pays fair wages, he gets outcompeted by a factory that does use slave labor. If the Nike CEO decides to insource their manufacturing to ensure fair working conditions, something actually changes.

…Unless consumers switch to a cheaper, slavery-driven shoe brand.

Which is actually really hard to not do. You could choose more expensive shoes, but the profit margin is still bigger if you shrink expenses, so that doesn’t help (which is why Fairtrade was a failure from the workers’ perspective). You can’t investigate the manufacturing conditions of everything you buy; it’s just too time consuming. But if you punish obvious enslavement and conduct no follow up studies, what you get is obscured enslavement, not decent working conditions.

Moral Mazes describes the general phenomenon on page 21:

Moreover, pushing down details relieves superiors of the burden of too much knowledge, particularly guilty knowledge. A superior will say to a subordinate, for instance: “Give me your best thinking on the problem with [X].” When the subordinate makes his report, he is often told: “I think you can do better than that,” until the subordinate has worked out all the details of the boss’s predetermined solution, without the boss being specifically aware of “all the eggs that have to be broken.” It is also not at all uncommon for very bald and extremely general edicts to emerge from on high. For example, “Sell the plant in [St. Louis]; let me know when you’ve struck a deal,” or “We need to get higher prices for [fabric X]; see what you can work out,” or “Tom, I want you to go down there and meet with those guys and make a deal and I don’t want you to come back until you’ve got one.” This pushing down of details has important consequences.

First, because they are unfamiliar with—indeed deliberately distance themselves from—entangling details, corporate higher echelons tend to expect successful results without messy complications. This is central to top executives’ well-known aversion to bad news and to the resulting tendency to kill the messenger who bears the news.

Second, the pushing down of details creates great pressure on middle managers not only to transmit good news but, precisely because they know the details, to act to protect their corporations, their bosses, and themselves in the process. They become the “point men” of a given strategy and the potential “fall guys” when things go wrong. From an organizational standpoint, overly conscientious managers are particularly useful at the middle levels of the structure. Upwardly mobile men and women, especially those from working-class origins who find themselves in higher status milieux, seem to have the requisite level of anxiety, and perhaps tightly controlled anger and hostility, that fuels an obsession with detail. Of course, such conscientiousness is not necessarily, and is certainly not systematically, rewarded; the real organizational premiums are placed on other, more flexible, behavior.

These examples differ in an important way from tax structuring: structuring requires seeking out advice and acting on it to achieve the goal. It’s highly agentic. The Wells Fargo and apparel-outsourcing cases required no such agency on the part of executives. They vaguely wished for something (more revenue, fewer expenses), and somehow it happened. An employee who tried to direct the executives’ attention to the fact that they were indirectly employing slaves would probably be fired before they ever reached the executives. Executives are not only outsourcing their dirty work, they’re outsourcing knowledge of their dirty work. 

[Details of personal anecdotes changed both intentionally and by the vagaries of human memory]

 

Example 3: Foreign Medical Care

My cousin Angela broke her leg while traveling in Thailand, and was delighted by the level of care she received at the Thai hospital– not just medically, but socially. Nurses brought her flowers and were just generally nicer than their American counterparts. Her interpretation was that Thailand was a place motivated by love and kindness, not money, and Americans should aspire to this level of regard for their fellow human being. My interpretation was that she had enough money to buy the goodwill of everyone in the room without noticing, so what she should have learned is that being rich is awesome, and that being an American who travels internationally is enough to qualify you as rich.

This is mostly a success story for the free market: Angela got good medical care and the nurses got money (I’m assuming). Any crimes in this story were committed off-screen. But Angela was certainly benefiting from the nurses’ restrained choices in life. And had she had actual power to affect healthcare in the US, trying to fix it based on what she learned in Thailand would have done a lot of damage.

 

Example 4: My Dating an Artist Experience

My starving-artist ex-boyfriend, Connor, stayed with me for two months after a little bad luck and a lot of bad decisions cost him his job and then apartment (this was back when I had a two bedroom apartment to myself– I miss Seattle). During this time we had one big fight. My view on the fight now is that I was locally in the right but globally the disagreement was indicative of irreconcilable differences that should have led us to break up. That was delayed by months when he capitulated.

One possibility is that he genuinely thought he could change and that I was worth the attempt. Another is that he saw the incompatibility, or knew things that should have led him to see it, but lied or blocked out the knowledge so that he could keep living with me. This would be a shitty, manipulative thing for him to do. On the other hand, what did I expect? If the punishment for breaking up with me was, best case scenario, moving into a homeless shelter, of course he felt pressure to appease me. 

It wasn’t my fault he felt that pressure, any more than it was Angela’s fault her nurses were born with fewer options than her. Time in my spare bedroom was a gift to him I had no obligation to keep giving. But if I’d really valued a coercion-free decision, I would have committed to housing him independent of our relationship. Although if that becomes common knowledge, it just means people can’t make an uncoerced decision to date me at all. And if helping Connor at all meant a commitment to do so forever, he would get a lot less help.

This case is more complicated than the corporate cases because the powerful person (me) was getting merely the appearance of what she wanted (a genuine relationship with a compatible person), not the real thing. And because the exploited party was either me or Connor, not a third party like bank customers. No one thinks the Wells Fargo CEO was a victim the way I arguably was. But the universe was contorting itself to give me what I apparently wanted.

Summary

What all of these stories have in common is that (relatively) powerful people’s desires were met by people less powerful than them, without them having to take responsibility for the action or sometimes even the desire. Society conspired to give them what they wanted (or in the case of Connor, a facsimile of what I wanted) without them having to articulate the want, even to themselves. That’s what power means: ability to make the game come out like you want. Disempowered people are forced to consciously notice things (e.g., this quota is unreachable) and make plans (e.g., create fraudulent accounts) where a powerful person wouldn’t. And it’s unfair to judge them for doing so while ignoring the morality of the powerful who never consider the system that brings them such nice things. 

Take home message:

  1. The most agentic person in a situation is not necessarily most morally culpable. One of the things power buys you is distance from the crime.
  2. Power obscures information flow. If you are not proactively looking to see how your wants and needs are being met, you are probably benefiting from something immoral.

 

This piece was inspired by a conversation with and benefited from comments by Ben Hoffman. I’d also like to thank several commenters on Facebook for comments on an earlier draft and Justis Mills for copyediting.



Discuss

Very different, very adequate outcomes

August 2, 2019 - 23:31
Published on August 2, 2019 8:31 PM UTC

Let $U_p$ be the utility function that - somehow - expresses your preferences. Let $U_h$ be the utility function that expresses your hedonistic pleasure.

Now imagine an AI is programmed to maximise $U(q) = qU_p + (1-q)U_h$. If we vary q in the range of 5% to 95%, then we will get very different outcomes. At 5%, we will generally be hedonically satisfied, and our preferences will be followed if they don't cause us to be unhappy. At 95%, we will accomplish any preference that doesn't cause us huge amounts of misery.

It's clear that, extrapolated over the whole future of the universe, these could lead to very different outcomes[1]. But - and this is the crucial point - none of these outcomes are really that bad. None of them are the disasters that could happen if we picked a random utility U. So, for all their differences, they reside in the same nebulous category of "yeah, that's an ok outcome." Of course, we would have preferences as to where q lies exactly, but few of us would risk the survival of the universe to yank q around within that range.

What happens when we push q towards the edges? Pushing q towards 0 seems a clear disaster: we're happy, but none of our preferences are respected; we basically don't matter as agents interacting with the universe any more. Pushing q towards 1 might be a disaster: we could end up always miserable, even as our preferences are fully followed. The only thing protecting us from that fate is the fact that our preferences include hedonistic pleasure; but this might not be the case in all circumstances. So moving q to the edges is risky in the way that moving around in the middle is not.
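The effect of varying q can be made concrete with a small toy sketch. Everything in it is an assumption made up for illustration: the four named outcomes, their preference and hedonic scores, and the helper names `blended_utility` and `best_outcome` do not come from any real model. It only shows how, when the two scores trade off against each other, the maximiser of the q-weighted sum changes between the middle of the range and the edges.

```python
# Toy illustration only: these outcomes and scores are invented for the sketch.
# Each outcome has a preference score U_p and a hedonic score U_h in [0, 1].
OUTCOMES = {
    "pure_bliss":       {"U_p": 0.0,  "U_h": 1.0},
    "mostly_happy":     {"U_p": 0.6,  "U_h": 0.98},
    "mostly_fulfilled": {"U_p": 0.98, "U_h": 0.6},
    "pure_duty":        {"U_p": 1.0,  "U_h": 0.0},
}


def blended_utility(q, outcome):
    """Return q * U_p + (1 - q) * U_h for one outcome."""
    return q * outcome["U_p"] + (1 - q) * outcome["U_h"]


def best_outcome(q):
    """Name of the outcome that maximises the blended utility at this q."""
    return max(OUTCOMES, key=lambda name: blended_utility(q, OUTCOMES[name]))


if __name__ == "__main__":
    for q in (0.0, 0.05, 0.4, 0.6, 0.95, 1.0):
        print(f"q = {q:.2f}: {best_outcome(q)}")
```

With these invented numbers, every q from 5% to 95% selects one of the two mixed outcomes, while q = 0 and q = 1 select the extremes in which one of the two scores is ignored entirely.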

In my research agenda, I talk about adequate outcomes, given a choice of parameters, or acceptable approximations. I mean these terms in the sense of the example above: the outcomes may vary tremendously from one another, given the parameters or the approximation. Nevertheless, all the outcomes avoid disasters and are clearly better than maximising a random utility function.

  1. This fails to be true if preference and hedonism can be maximised independently; e.g., if we could take an effective happy pill and still follow all our preferences. I'll focus on the situation where there are true tradeoffs between preference and hedonism. ↩︎



Discuss

Rethinking Batch Normalization

August 2, 2019 - 23:21
Published on August 2, 2019 8:21 PM UTC

Yesterday we saw a glimpse into the inner workings of batch normalization, a popular technique in the field of deep learning. Given that the effectiveness of batch normalization has been demonstrated beyond any reasonable doubt, it may come as a surprise that researchers don't really know how it works. At the very least, we sure didn't know how it worked when the idea was first proposed.

One might first consider that last statement to be unlikely. In the last post I outlined a relatively simple theoretical framework for explaining the success of batch normalization. The idea is that batch normalization reduces the internal covariate shift (ICS) of layers in a network. In turn, we get a neural network that is more stable, robust to large learning rates, and much quicker to train.
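As a reminder of what the operation itself does, here is a minimal sketch of the training-time batch normalization transform for a single feature, written in plain Python. The function name `batch_norm` and the fixed values of `gamma`, `beta`, and `eps` are assumptions chosen for illustration; in a real network the scale and shift are learned, and frameworks apply the transform per feature across whole tensors.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift.

    batch: list of floats, the feature's activation for each example.
    gamma, beta: scale and shift (learned in practice, fixed here).
    eps: small constant that keeps the division numerically stable.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


if __name__ == "__main__":
    activations = [2.0, 4.0, 6.0, 8.0]
    print(batch_norm(activations))
```

Whatever the incoming activations look like, the output of this step has roughly zero mean and unit variance over the batch before `gamma` and `beta` are applied. Under the ICS account, it is this per-batch re-centering and re-scaling that is credited with stabilizing training.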

And this was the standard story in the field for years, until a few researchers decided to actually investigate it.

Here, I hope to convince you that the theory really is wrong. While I'm fully prepared to make additional epistemic shifts on this question in the future, I also fully expect to never shift my opinion back.

When I first read the original batch normalization paper, I felt like I really understood the hypothesis. It felt simple enough, reasonably descriptive, and intuitive. But I didn't get a perfect visual of what was going on; I sort of hand-waved the step where ICS contributed to an unstable gradient step. Instead I, like the paper, argued by analogy: since controlling for covariate shift had been known for decades to help training, a technique to reduce internal covariate shift is a natural extension of this concept.

It turned out this theory wasn't even a little bit right. It's not that covariate shifts aren't important at all, but that the entire idea is based on a false premise.

Or at least, that's the impression I got while reading Shibani Santurkar et al.'s How Does Batch Normalization Help Optimization? Whereas the original batch normalization paper gave me a sense of "I kinda sorta see how this works," this paper completely shattered my intuitions. It wasn't just the weight of the empirical evidence, or the theoretical underpinning they present; instead what won me over was the surgical precision of their rebuttal. They directly saw how to formalize the theory of improvement via ICS reduction and tested it on BatchNorm directly. The theory turned out to be simple, intuitive, and false.

In fairness, it wasn't laziness that prevented researchers from reaching our current level of understanding. In the original batch normalization paper, the authors did propose a test for measuring batch normalization's effect on ICS.

The problem was instead twofold: their method for measuring ICS was inadequate, and they failed to consistently apply their proposed mechanism for how ICS reduction was supposed to work in their testing conditions. More importantly, however, they didn't even test the theory that ICS reduction contributed to performance gains. Instead their argument was based on a simple heuristic: we know that covariate shifts are bad, we think that batch normalization reduces ICS, and we also know batch normalization improves performance characteristics; therefore batch normalization works due to ICS reduction. As far as I can tell, most of the articles that came after the original paper just took this heuristic at face value, citing the paper and calling it a day.

And it's not a bad heuristic, all in all. But perhaps it's a tiny bit telling that on yesterday's post, LessWrong user crabman was able to anticipate the true reason for batch normalization's success, defying both my post and the supposed years that it took researchers to figure this stuff out. Quoth crabman,

I am imagining this internal covariate shift thing like this: the neural network together with its loss is a function which takes parameters θ as input and outputs a real number. Large internal covariate shift means that if we choose ε>0, perform some SGD steps, get some θ, and look at the function's graph in ε-area of θ, it doesn't really look like a plane, it's more curvy like.

In fact, the above paragraph doesn't actually describe internal covariate shift, but instead the smoothness of the loss function around some parameters
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main 
Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face 
{font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} θ. I concede, it is perhaps possible that this is really what the original researchers meant when they termed internal covariate shift. It is therefore also possible that this whole critique of the original theory is based on nothing but a misunderstanding.

But I'm not buying it.

Take a look at how the original paper defines ICS,

We define Internal Covariate Shift as the change in the distribution of network activations due to the change in network parameters during training.

This definition can't merely refer to the smoothness of the gradient around θ. For example, the gradient could be extremely bumpy and have sharp edges and yet ICS could be absent. Can you think of an example of a neural network like this? Here's one: think of a network with just one layer whose loss function is some extremely contorted shape because its activation function is some crazy non-linear function. It wouldn't be smooth, but its input distribution would be constant over time, given that it's only one layer.

I can instead think of two interpretations of the above definition for ICS. The first interpretation is that ICS simply refers to the change of activations in a layer during training. The second interpretation is that this definition specifically refers to change of activations caused by changes in network parameters at previous layers.

This is a subtle difference, but I believe it's important to understand. The first interpretation allows ease of measurement, since we can simply plot the mean and variance of the input distributions of a layer during training. This is in fact how the paper (section 4.1) tests batch normalization's effect on ICS. But really, the second interpretation sounds closer to the hypothesized mechanism for how ICS was supposed to work in the first place.

On the level of experimentation, the crucial part of the above definition is the part that says "change [...] due to the change in network parameters." Merely measuring the change in network parameters over time is insufficient. Why? Because the hypothesis was that if activation distributions change too quickly, then a layer will have its gradient pushed into a vanishing or exploding region. In the first interpretation, a change over time could still be slow enough for each layer to adapt appropriately. Therefore, we need additional information to discover whether ICS is occurring in the way that is described.

To measure ICS under the second interpretation, we have to measure the counterfactual change of parameters — in other words, the amount that the network activations change as a result of other parameters being altered. And we also need a way of seeing whether the gradient is being pushed into extreme regions as a result of these parameters being changed. Only then can we see whether this particular phenomenon is actually occurring.

The newer paper comes down heavily in favor of this interpretation, and adds a level of formalization on top of it. Their definition focuses on measuring the difference between two different gradients: one gradient with all of the previous layers altered by backpropagation, and one gradient where all of the previous layers have been unaltered. Specifically, let L be a loss function for a neural network of k layers. Then their definition of ICS for the activation i and time t is $\|G_{t,i} - G'_{t,i}\|_2$, where

$$G_{t,i} = \nabla_{W_i^{(t)}} L\left(W_1^{(t)}, \ldots, W_k^{(t)};\; x^{(t)}, y^{(t)}\right)$$

$$G'_{t,i} = \nabla_{W_i^{(t)}} L\left(W_1^{(t+1)}, \ldots, W_{i-1}^{(t+1)}, W_i^{(t)}, W_{i+1}^{(t)}, \ldots, W_k^{(t)};\; x^{(t)}, y^{(t)}\right)$$

and $(x^{(t)}, y^{(t)})$ is the batch of input-label pairs used to train the network at time t.
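To make the definition concrete, here is a minimal sketch of how one might compute this quantity for the second layer of a tiny two-layer linear network. Everything here is an assumption made for illustration: the network shape, the mean-squared-error loss, and the names `loss_grads` and `ics_second_layer` are hypothetical and not taken from the paper.

```python
import numpy as np

def loss_grads(W1, W2, x, y):
    """Gradients of L = 0.5 * mean squared error for the linear net W2 @ W1 @ x."""
    n = x.shape[1]                 # batch size (columns are examples)
    h = W1 @ x                     # hidden activations
    err = W2 @ h - y               # residuals
    gW2 = err @ h.T / n            # dL/dW2
    gW1 = W2.T @ err @ x.T / n     # dL/dW1
    return gW1, gW2

def ics_second_layer(W1, W2, x, y, lr):
    """||G - G'||_2 for layer 2: G uses the current W1, G' uses the updated W1."""
    gW1, G = loss_grads(W1, W2, x, y)
    W1_next = W1 - lr * gW1        # only the layer below is stepped forward
    _, G_prime = loss_grads(W1_next, W2, x, y)
    return np.linalg.norm(G - G_prime)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(3, 8)), rng.normal(size=(2, 8))
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
print(ics_second_layer(W1, W2, x, y, lr=0.1))
```

With a learning rate of zero the layer below does not move, so the two gradients coincide and the measured ICS is exactly zero. Any nonzero value comes purely from the update to the earlier layer.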

The first thing to note about this definition is that it allows a clear, precise measurement of ICS, which is based solely on the change of the gradient due to shifting parameters beneath a layer during backpropagation.

What Shibani Santurkar et al. found when they applied this definition was a bit shocking. Not only did batch normalization fail to decrease ICS, in some cases it even increased it when compared to naive feedforward neural networks. And to top that off, they found that even in networks where they artificially increased ICS, performance barely suffered.

In one experiment they applied batch normalization to each hidden layer in a neural network, and at each step, they added noise after the batch normalization transform in order to induce ICS. This wasn't just fixed Gaussian noise either. Instead, they drew the noise from a different Gaussian at every time step and every layer, with the Gaussian's parameters (specifically its mean and variance) themselves varying according to yet another meta Gaussian distribution. What they discovered was that even though this increased measured ICS dramatically, the time it took to train the networks to the baseline accuracy was almost identical to regular batch normalization.

And remember that batch normalization actually does work. In all of the experiments for mere performance increases, batch normalization has passed the tests with flying colors. So clearly, since batch normalization works, it must be for a different reason than simply reducing ICS. But that leaves one question remaining: how on Earth does it work?

I have already hinted at the reason above. The answer lies in something even simpler to understand than ICS. Take a look at this plot.


Imagine the red ball is rolling down this slope, applying gradient descent at each step. And consider for a second that the red ball isn't using any momentum. It simply looks at each step which direction to move and moves in that direction in proportion to the slope at that point.

A problem immediately arises. Depending on how we choose our learning rate, the red ball could end up getting stuck almost immediately. If the learning rate is too low, then it will probably get stuck on the flat plane to the right of it. And in practice, if its learning rate is too high, then it might move over to another valley entirely, getting itself into an exploding region.
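The sensitivity to the learning rate can be seen even on a much simpler surface than the one in the plot. The sketch below is an illustrative assumption, not the landscape discussed here: it runs plain gradient descent without momentum on the toy function f(x) = x², and the function name `gradient_descent` is hypothetical.

```python
def gradient_descent(x0, lr, steps=50):
    """Plain gradient descent (no momentum) on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - lr * 2 * x   # step against the gradient, scaled by the learning rate
    return x

too_small = gradient_descent(1.0, lr=0.001)   # barely moves away from the start
reasonable = gradient_descent(1.0, lr=0.4)    # settles at the minimum x = 0
too_large = gradient_descent(1.0, lr=1.1)     # overshoots further on every step

print(too_small, reasonable, too_large)
```

Each step multiplies x by (1 - 2·lr), so the iterate shrinks slowly when the learning rate is tiny, shrinks quickly for a moderate rate, and grows without bound once the rate exceeds 1.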

The way that batch normalization helps is by changing the loss landscape from this bumpy shape into one more like this.

Now it no longer matters that much what we set the learning rate to. The ball will be able to find its way down even if the learning rate is too small. What used to be a flat plane has now been rounded out such that the ball will roll right down.

The specific way that the paper tests this hypothesis is by applying pretty standard ideas from the real analysis toolkit. In particular, the researchers attempted to measure the Lipschitzness of the loss function around the parameters θ for various types of deep networks (both empirically and theoretically). Formally, a function f is L-Lipschitz if $|f(x_1) - f(x_2)| \le L \|x_1 - x_2\|$ for all $x_1$ and $x_2$. Intuitively, this is a measure of how smooth the function is: the smaller the constant L, the fewer and less extreme the jumps the function can make over small intervals in any direction.
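As a rough illustration of the Lipschitz definition (not the measurement procedure used in the paper), one can estimate a lower bound on the Lipschitz constant of a function by sampling random pairs of points and taking the largest observed ratio. The helper name `empirical_lipschitz` and the Gaussian sampling scheme are assumptions made for this sketch.

```python
import numpy as np

def empirical_lipschitz(f, dim, n_pairs=1000, seed=0):
    """Largest observed |f(x1) - f(x2)| / ||x1 - x2|| over random pairs.

    This is only a lower bound on the true Lipschitz constant, since
    sampling can miss the steepest region of the function.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=(n_pairs, dim))
    x2 = rng.normal(size=(n_pairs, dim))
    num = np.abs(f(x1) - f(x2))                # change in function value
    den = np.linalg.norm(x1 - x2, axis=1)      # distance between the inputs
    return np.max(num / den)

linear = lambda x: 3.0 * x[:, 0]     # exactly 3-Lipschitz in one dimension
wavy = lambda x: np.sin(x[:, 0])     # 1-Lipschitz, since |cos| <= 1

print(empirical_lipschitz(linear, dim=1))
print(empirical_lipschitz(wavy, dim=1))
```

For the linear function every sampled ratio equals the slope, so the estimate recovers the constant. For the sine function the estimate stays at or below 1.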

This way of thinking about the smoothness of the loss function has the advantage of including a rather natural interpretation. One can imagine that the magnitude of some gradient estimate is a prediction of how much we expect the function to fall if we move in that direction. We can then evaluate how good we are at making predictions across different neural network schemes and across training steps. When gradient predictiveness was tested, there were no surprises — the networks with batch normalization had the most predictive gradients.

Perhaps even more damning is that not only did the loss function become smoother, the gradient landscape itself became smoother as well, a property known as β-smoothness. This not only made the gradients more predictive of the loss; the gradients themselves were easier to predict in a certain sense, since they were fairly consistent throughout training.
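The connection between gradient predictiveness and β-smoothness can be sketched on toy quadratics. The example below is an assumption for illustration only, with hypothetical names: it compares the change in loss predicted by a first-order Taylor estimate against the actual change after one gradient step, for a gently curved and a sharply curved bowl.

```python
import numpy as np

def prediction_error(curvature, x, lr):
    """Gap between predicted and actual loss change for f(x) = 0.5 * c * ||x||^2."""
    f = lambda v: 0.5 * curvature * np.dot(v, v)
    g = curvature * x                        # gradient of f at x
    predicted = -lr * np.dot(g, g)           # first-order estimate of the change in f
    actual = f(x - lr * g) - f(x)            # true change after one gradient step
    return abs(actual - predicted)

x = np.array([1.0, 2.0])
smooth_err = prediction_error(curvature=1.0, x=x, lr=0.1)    # beta = 1
sharp_err = prediction_error(curvature=50.0, x=x, lr=0.1)    # beta = 50

print(smooth_err, sharp_err)
```

For these quadratics the gap works out to 0.5 · lr² · c³ · ‖x‖², so the flatter bowl yields a gradient that predicts the loss change far more accurately than the sharp one at the same learning rate.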

Perhaps the way that batch normalization works is by simply smoothing out the loss function. At each layer we are simply applying some normalizing transformation which helps remove extreme points in the loss function. This has the additional prediction that other transformation schemes will work just as well, which is exactly what the researchers found. In fact, there was pretty much nothing special with the exact way that batch normalization transforms the input, other than the properties that contribute to smoothness. And given that so many more methods have now come out which build on batch normalization despite using quite different operations, isn't this exactly what we would expect?

Is this the way batch normalization really works? I'm no expert, but I found this interpretation much easier to understand, and also a much simpler hypothesis. Maybe we should apply Occam's razor here. I certainly did.

In light of this discussion, it's also worth reflecting once again that the argument "We are going to be building the AI so of course we'll understand how it works" is not a very good one. Clearly the field can stumble on solutions that work, and yet the reason why they work remains almost unknown for years, even when the answer is hiding in plain sight. I honestly can't say for certain whether this happens a lot, or too much. I only have my one example here.

Tomorrow I'll be taking a step back from neural network techniques to analyze generalization in machine learning models. I will briefly cover the basics of statistical learning theory and will then move to a framing of learning theory in light of recent deep learning progress. This will give us a new test bed to see if old theories can adequately adapt to new techniques. What I find might surprise you.




Permissions in Governance

August 2, 2019 - 22:59
Published on August 2, 2019 7:50 PM UTC

Compliance Costs

The burden of a rule can be separated into (at least) two components.

First, there’s the direct opportunity cost of not being allowed to do the things the rule forbids. (We can include here the penalties for violating the rule.)

Second, there’s the “cost of compliance”, the effort spent on finding out what is permitted vs. forbidden and demonstrating that you are only doing the permitted things.

Separating these is useful. You can, at least in principle, aim to reduce the compliance costs of a rule without making it less stringent.

For instance, you could aim to simplify the documentation requirements for environmental impact assessments, without relaxing standards for pollution or safety.  “Streamlining” or “simplifying” regulations aims to reduce compliance costs, without necessarily lowering standards or softening penalties.

If your goal in making a rule is to avoid or reduce some unwanted behavior — for instance, to reduce the amount of toxic pollution people and animals are exposed to — then shifting up or down your pollution standards is a zero-sum tradeoff between your environmental goals and the convenience of polluters.

Reducing the costs of compliance, on the other hand, is positive-sum: it saves money for developers, without increasing pollution levels.  Everybody wins. Where possible, you’d intuitively think rulemakers would always want to do this.

Of course, this assumes an idealized world where the only goal of a prohibition is to reduce the total amount of prohibited behavior.

You might want compliance costs to be high if you’re using the rule, not to reduce incidence of the forbidden behavior, but to produce distinctions between people — i.e. to separate the extremely committed from the casual, so you can reward them relative to others.  Costly signals are good if you’re playing a competitive zero-sum game; they induce variance because not everyone is able or willing to pay the cost.

For instance, some theories of sexual selection (such as the handicap principle) argue that we evolved traits which are not beneficial in themselves but are sensitive indicators of whether or not we have other fitness-enhancing traits. E.g. a peacock’s tail is so heavy and showy that only the strongest and healthiest and best-fed birds can afford to maintain it. The tail magnifies variance, making it easier for peahens to distinguish otherwise small variations in the health of potential mates.

Such “magnifying glasses for small flaws” are useful in situations where you need to pick “winners” and can inherently only choose a few. Sexual selection is an example of such a situation, as females have biological limits on how many children they can bear per lifetime; there is a fixed number of males they can reproduce with.  So it’s a zero-sum situation, as males are competing for a fixed number of breeding slots.  Other competitions for fixed prizes are similar in structure, and likewise tend to evolve expensive signals of commitment or quality.  A test that’s so easy anyone can pass it is useless for identifying the top 1%.

On a regulatory-capture or spoils-based account of politics, where politics (including regulation) is seen as a negotiation to divide up a fixed pool of resources, and loyalty/trust is important in repeated negotiations, high compliance costs are easy to explain. They prevent diluting the spoils among too many people, and create variance in people’s ability to comply, which allows you to be selective along whatever dimension you care about.

Competitive (selective, zero-sum) processes work better when there’s wide variance among people. A rule (or boundary, or incentive) that’s meant to minimize an undesired behavior is, by contrast, looking at aggregate outcomes. If you can make it easier for people to do the desired behavior and refrain from the undesired, you’ll get better aggregate behavior, all else being equal.  These goals are, in a sense, “democratic” or “anti-elitist”; if you just care about total aggregate outcomes, then you want good behavior to be broadly accessible.

Requiring Permission Raises Compliance Costs 

A straightforward way of avoiding undesired behavior is to require people to ask an authority’s permission before acting.

This has advantages: sometimes “undesired behavior” is a complex, situational thing that’s hard to codify into a rule, so the discretional judgment of a human can do better than a rigid rule.

One disadvantage that I think people underestimate, however, is the chilling effect it has on desired behavior.

For instance:

  • If you have to ask the boss’s permission individually for each purchase, no matter how cheap, not only will you waste a lot of your employees’ time, but you’ll disincentivize them from asking for even cost-effective purchases, which can be more costly in the long run.
  • If you require a doctor’s appointment for giving pain medication every time, to guard against drug abuse, you’re going to see a lot of people who really do have chronic pain doing without medication because they don’t want the anxiety of going to a doctor and being suspected of “drug-seeking”.
  • If you have to get permission before cleaning or contributing supplies for a shared space, then that space will be chronically under-cleaned and under-supplied.
  • If you have to get permission from a superior in order to stop the production line to fix a problem, then safety risks and defective products will get overlooked. (This is why Toyota mandated that any worker can unilaterally stop the production line.)

The inhibition against asking for permission is going to be strongest for shy people who “don’t want to be a bother” — i.e. those who are most conscious of the effects of their actions on others, and perhaps those who you’d most want to encourage to act.  Those who don’t care about bothering you are going to be undaunted, and will flood you with unreasonable requests.  A system where you have to ask a human’s permission before doing anything is an asshole filter, in Siderea’s terminology; it empowers assholes and disadvantages everyone else.

The direct costs of a rule fall only on those who violate it (or wish they could); the compliance costs fall on everyone.  A system of enforcement that preferentially inhibits desired behavior (while not being that reliable in restricting undesired behavior) is even worse from an efficiency perspective than a high compliance cost on everyone.

Impersonal Boundaries

An alternative is to instantiate your boundaries in an inanimate object — something that can’t intimidate shy people or cave to pressure from entitled jerks.  For instance:

  • a lock on a door is an inanimate boundary on space
  • a set of password-protected permissions on a filesystem is an inanimate boundary on information access
  • a departmental budget and a credit card with a fixed spending limit is an inanimate boundary on spending
  • an electricity source that shuts off automatically when you don’t pay your bill is an inanimate boundary against theft

The key element here isn’t information-theoretic simplicity, as in the debate over simple rules vs. discretion.  Inanimate boundaries can be complex and opaque.  They can be a black box to the user.

The key elements are that, unlike humans, inanimate boundaries do not punish requests that are refused (even socially, by wearing a disappointed facial expression), and they do not give in to repeated or more forceful requests.

An inanimate boundary is, rather, like the ideal version of a human maintaining a boundary in an “assertive” fashion; it enforces the boundary reliably and patiently and without emotion.

This way, it produces less inhibition in shy or empathetic people (who hate to make requests that could make someone unhappy) and is less vulnerable to pushy people (who browbeat others into compromising on boundaries.)

In fact, you can get some of the benefits of an inanimate boundary without actually taking a human out of the loop, but just by reducing the bandwidth for social signals. By using email instead of in-person communication, for instance, or by using formalized scripts and impersonal terminology.  Distancing tactics make it easier to refuse requests and easier to make requests; if these effects are roughly the same in magnitude, you get a system that selects more effectively for enabling desired behavior and preventing undesired behavior. (Of course, when you have one permission-granter and many permission-seekers, the effects are not the same in aggregate magnitude; the permission-granter can get spammed by tons of unreasonable requests.)

Of course, if you’re trying to select for transgressiveness — if you want to reward people who are too savvy to follow the official rules and too stubborn to take no for an answer — you’d want to do the opposite; have an automated, impersonal filter to block or intimidate the dutiful, and an extremely personal, intimate, psychologically grueling test for the exceptional. But in this case, what you’ve set up is a competitive test to differentiate between people, not a rule or boundary which you’d like followed as widely as possible.

Consensus and Do-Ocracy

So far, the systems we’ve talked about are centralized, and described from the perspective of an authority figure. Given that you, the authority, want to achieve some goal, how should you most effectively enforce or incentivize desired activity?

But, of course, that’s not the only perspective one might take. You could instead take the perspective that everybody has goals, with no a priori reason to prefer one person’s goals to anyone else’s (without knowing  what the goals are), and model the situation as a group deliberating on how to make decisions.

Consensus represents the egalitarian-group version of permission-asking. Before an action is taken, the group must discuss it, and must agree (by majority vote, or unanimous consent, or some other aggregation mechanism) that it’s sufficiently widely accepted.

This has all of the typical flaws of asking permission from an authority figure, with the added problem that groups can take longer to come to consensus than a single authority takes to make a go/no-go decision. Consensus decision processes inhibit action.

(Of course, sometimes that’s exactly what you want. We have jury trials to prevent giving criminal penalties lightly or without deliberation.)

An alternative, equally egalitarian structure is what some hackerspaces call do-ocracy.

In a do-ocracy, everyone has authority to act, unilaterally. If you think something should be done, like rearranging the tables in a shared space, you do it. No need to ask for permission.

There might be disputes when someone objects to your actions, which have to be resolved in some way.  But this is basically the only situation where governance enters into a do-ocracy. Consensus decisionmaking is an informal version of a legislative or executive body; do-ocracy is an informal version of a judicial system.  Instead of needing governance every time someone acts, in a judicial-only system you only need governance every time someone acts (or states an intention to act) AND someone else objects.

The primary advantage of do-ocracy is that it doesn’t slow down actions in the majority of cases where nobody minds.  There’s no friction, no barrier to taking initiative.  You don’t have tasks lying undone because nobody knows “whose job” they are.  Additionally, it grants the most power to the most active participants, which intuitively has a kind of fairness to it, especially in voluntary clubs that have a lot of passive members who barely engage at all.

The disadvantages of do-ocracy are exactly the same as its advantages. First of all, any action which is potentially harmful and hard to reverse (including, of course, dangerous accidents and violence) can be unilaterally initiated, and do-ocracy cannot prevent it, only remediate it after the fact (or penalize the agent). Do-ocracies don’t deal well with very severe, irreversible risks. When they have to, they evolve permission-based functions; for instance, the rules that firms or insurance companies institute to prevent risky activities that could lead to lawsuits.

Secondly, do-ocracies grant the most power to the most active participants, which often means those who have the most time on their hands, or who are closest to the action, at the expense of absent stakeholders. This means, for instance, that they favor a firm’s executives (who engage in day-to-day activity) over investors or donors or the general public; in volunteer and political organizations they favor those who have more free time to participate (retirees, students, the unemployed, the independently wealthy) over those who have less (working adults, parents). The general phenomenon here is principal-agent problems: theft, self-dealing, negligence, all cases where the people who are physically there and acting take unfair advantage of the people who are absent and not in the loop, but depend on things remaining okay.

A judicial system doesn’t help those who don’t know they’ve been wronged.

Consensus systems, in fact, are designed to force governance to include or represent all the stakeholders — even those who would, by default, not take the initiative to participate.

Consumer-product companies mostly have do-ocratic power over their users. It’s possible to quit Facebook, with the touch of a button. Facebook changes its algorithms, often in ways users don’t like — but, in most cases, people don’t hate the changes enough to quit.  Facebook makes use of personal data — after putting up a dialog box requesting permission to use it. Yet, some people are dissatisfied and feel like Facebook is too powerful, like it’s hacking into their baser instincts, like this wasn’t what they’d wanted. But Facebook hasn’t harmed them in any way they didn’t, in a sense, consent to. The issue is that Facebook was doing things they didn’t reflectively approve of while they weren’t paying attention. Not secretly — none of this was secret, it just wasn’t on their minds, until suddenly a big media firestorm put it there.

You can get a lot of power to shape human behavior just by showing up, knowing what you want, and enacting it before anyone else has thought about it enough to object. That’s the side of do-ocracy that freaks people out.  Wherever in your life you’re running on autopilot, an adversarial innovator can take a bite out of you and get away with it long before you notice something’s wrong.  

This is another part of the appeal of permission-based systems, whether egalitarian or authoritarian; if you have to make a high-touch, human connection with me and get my permission before acting, I’m more likely to notice changes that are bad in ways I didn’t have any prior model of. If I’m sufficiently cautious or pessimistic, I might even be ok with the cost of a chilling effect on harmless actions, so long as I make sure I’m sensitive to new kinds of shenanigans that can’t be captured in pre-existing rules. If I don’t know what I want exactly, but I expect change is bad, I’m going to be much more drawn to permission-based systems than if I know exactly what I want or if I expect typical actions to be improvements.


