# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 1 hour 1 minute ago

### It’s not economically inefficient for a UBI to reduce recipients’ employment

Published on November 22, 2020 4:40 PM GMT

A UBI (e.g. paying every adult American $8k/year) would reduce recipients’ need for money and so may reduce their incentive to work. This is frequently offered as an argument against a UBI (or as an argument for alternative policies like the EITC that directly incentivize work). This argument is sometimes presented as economically hard-headed realism. But as far as I can tell, there’s not really any efficiency argument here: there’s nothing particularly efficient about people having a stronger incentive to work because they are poorer. The argument seems to mostly get its punch from a vague sense that work is virtuous and necessary. I think that sense is largely mistaken and should be taken less seriously. (As usual for policy posts, I’m not very confident about any of this; if it makes you happier, feel free to imagine “seems to me” sprinkled throughout the prose.)

What people fear

If I give you $8k, you will probably value marginal dollars less than you used to. Some jobs that used to be a good deal will stop being worth it, and a job search itself may stop being worthwhile. We could study how much this happens empirically, but it’s definitely plausible on paper, and it would be my best guess for the long-run effect even if pilot UBI experiments found otherwise.

This seems to be one of the main worries people have about a UBI. For example, some folks at McKinsey write:

In the design of the Finnish experiment, the main research question, agreed to by parliament in the enabling legislation, was the impact of a basic income on employment. Many policy makers assume that an entirely unconditional guaranteed income would reduce incentives to work. After all, the argument goes, why bother with a job if you can have a decent life without one? This assumption has led many countries to deploy active labor-market policies that require people on unemployment benefits to prove their eligibility continually and, often, to participate in some kind of training or to accept jobs offered to them.

[They find that the UBI didn’t decrease work, but this post isn’t even going to get into that.]

But it’s left somewhat vague what exactly is so bad about this. In fact, it’s hard to respond to this concern because, although I’ve seen it expressed many times, I’ve never seen the argument laid out clearly.

It’s clear that work creates much of the great stuff in our society, and so reducing how much work happens seems scary and bad. But when I work I get paid, so that I (and the people I share my income with) get most of the benefits from my work. And leisure and slack also create value. If it stops being personally worth it for me to work, then it has likely stopped being socially efficient for me to work, and that’s OK.

The real cost of a UBI falls on the taxpayers’ side, and so the actual focus of discussion should be “how expensive is a UBI, and are we willing to pay that much?” Thinking about recipients’ incentives to work on top of that is double-counting at best. To be clear, I think that the cost of a UBI usually is the focus, as it should be. So all I’m trying to do is address a little bit of FUD about UBI rather than make the positive case.

My view

Is working good for the rest of society?

Suppose you do some work and earn $100. The question from the rest of society’s perspective is whether we got more benefit than the $100 we paid you.

We can get more than $100 if e.g. you spend your $100 on a Netflix subscription that subsidizes better TV, or help our society learn by doing and advance technology through the products you produce or consume.

We can get less than $100 if e.g. you spend your $100 renting an apartment in a city, crowding out others, or buy products that create untaxed externalities like pollution or signaling.

Similarly, if you decide to relax and have fun, society can benefit (e.g. if you are participating in a community that benefits from having more members, or hanging out with your kids who enjoy your company, or doing unpaid work to improve the world) or suffer (e.g. if you contribute to congestion at a public park, or you drive around throwing eggs at people’s houses).

Overall I think that working is probably better for the world than leisure. The effect seems pretty small though (maybe 10% of the value you earn), and I think this isn’t a big deal compared to other efficiency considerations about a UBI. For example, it seems smaller than any one of: the overhead of administering additional redistributive programs, the costs of bullshit for benefit recipients, the efficiency losses inherent in poverty (e.g. from credit market failures), and the deadweight losses from taxation.

(Of course the calculus is different for people who pay taxes, since that’s pure social benefit, but a UBI should mostly change employment for low-income families who are paying very low tax rates.)

Dependency trap

Another concern is that people who don’t work won’t develop marketable skills, so they will remain trapped as dependents. Some of my thoughts on that concern:

• On paper, it seems more likely to me that being poor would be a trap than that having money would be a trap (e.g. poverty makes it very difficult to invest for the future and forces short-sighted decisions).
• I haven’t seen evidence that giving people money significantly reduces their future earning potential and have seen weak evidence against.
• I think the prior should be against paternalism; at a minimum we ought to have solid evidence that people are making a mistake before being willing to pay to incentivize them to do something for their own benefit.
• If people decide not to work in the future because they expect to continue having a UBI, that’s not a trap, it’s just the temporally-extended version of the previous section.
Other effects

Funding a UBI would involve significant tax increases, and those really do inefficiently decrease taxpayers’ incentives to work. For example, paying $8k per adult in the US would require increasing the average tax rate by ~15%. But quantitatively the disincentive seems moderate, and this isn’t relevant if we are comparing a UBI to other government spending. When people get really concerned about incentives to work, it’s because they are thinking about beneficiaries who no longer have to work, and that’s the part I think is mistaken.

This argument is especially clear when we compare a UBI to programs that add a bunch of machinery with the goal of incentivizing work. For example, families with a positive incentive from the EITC probably have an inefficiently large incentive to work, since the size of the EITC seems to dominate plausible estimates of externalities (and then other families have an absurdly reduced incentive to work). There may be strong economic cases for these policies based on empirical analyses of e.g. the benefits to communities from more people working. But I haven’t seen studies that convincingly assess causality, and it feels to me like public support for these policies is mostly based on unjustified pro-work sentiment.

Discuss

### Why are young, healthy people eager to take the Covid-19 vaccine?

Published on November 22, 2020 3:24 PM GMT

Clearly a lot of people on LW want to take it ASAP. I strongly don’t want that - to the point where I will most likely emigrate if it becomes obligatory in my country. Please help me understand what I’m missing. Here is my understanding:

• As a young, healthy person, SARS-CoV-2 poses extremely low risk to me:
  • There is no significant risk of lasting negative health consequences after infection:
    • There is no strong proof of such an effect. Such proof would greatly increase acceptance of governments’ policies, so there is a strong incentive to publish any such proof. There has also been enough time and enough cases to identify a pattern of negative consequences lasting 6+ months. Therefore I’m treating absence of proof, despite strong incentives and opportunity, as strong proof of absence.
  • A pessimistic infection fatality rate is probably around 0.01%.
  • Reinfection is extremely rare, if at all possible:
    • Again, there is a strong incentive to make people fear reinfection.
    • Yet all we hear is rare individual reports that might be test failures or long-lasting lingering infections.
  • The risk of infection is getting smaller and smaller as more people in the population become immune - either by infection or by vaccination.
• Conversely, there is a non-negligible risk associated with a vaccine that has been developed so quickly:
  • The trials have lasted only months, so we don’t know whether there are side effects that surface only after some significant time.
  • The trials have only been conducted on tens of thousands of subjects so far, so very severe but rare negative consequences might have gone under the radar.
  • Pfizer’s vaccine requires extremely low temperatures, so there is a danger that in some locations it will be transported or stored incorrectly, causing greater risk than that suggested by the trials so far.
  • Both the governments and the vaccine manufacturers have twisted incentives, meaning there is a serious danger of overly optimistic reports of the vaccines’ efficacy and safety.

I can only think of two reasons why young, knowledgeable people are so excited about taking the vaccine:

• they have contact with someone at risk whom they deeply care about, and want to minimise the chance of infecting them, even at the cost of personal safety;
• they value the safety of strangers higher than their own, and want to take the vaccine for the sake of all the people at risk in society.
Discuss

### Demystifying the Second Law of Thermodynamics

Published on November 22, 2020 1:52 PM GMT

(Crossposted from my personal website.)

Thermodynamics is really weird. Most people have probably encountered a bad explanation of the basics at some point in school, but probably don’t remember more than:

• Energy is conserved
• Entropy increases
• There’s something called the ideal gas law/ideal gas equation.

Energy conservation is not very mysterious. Apart from some weirdness around defining energy in general, it’s just a thing you can prove from whatever laws of motion you’re using. But entropy is very weird. You’ve heard that it measures “disorder” in some vague sense. Maybe you’ve heard that it’s connected to the Shannon entropy of a probability distribution, H(p) = −∑_x p(x) ln p(x).
Probably the weirdest thing about entropy is the law it obeys: it’s not conserved, but rather it increases with time. This is more or less the only law like that in physics.

It gets even weirder when you consider that at least classical Newtonian physics is time-symmetric. Roughly speaking, this means that if you have a movie of things interacting under the laws of Newton, and you play it backwards, they’re still obeying the laws of Newton. An orbiting moon just looks like it’s orbiting in the other direction, which is perfectly consistent. A stone which is falling towards earth and accelerating looks like it’s flying away from earth and decelerating - exactly as gravity is supposed to behave. But if there’s some “entropy” quantity out there that only increases, then that’s obviously impossible! When you played the movie backwards, you’d be able to tell that entropy was decreasing, and if entropy always increases, some law is being violated.

So what, is entropy some artefact of quantum mechanics? No, as it turns out. Entropy is an artefact of the fact that you can’t measure all the particles in the universe at once. And the fact that it seems to always increase is a consequence of the fact that matter is stable at large scales. The points in this post are largely from E.T. Jaynes’ Macroscopic Prediction.

A proof that entropy doesn’t always increase

Let X be the set of states of some physical system. Here I will assume that there is a finite number of states and that time advances in discrete steps - there is some function T: X → X which steps time forward one step.
We assume that these dynamics are time-reversible in the weak sense that T is a bijection - every state is the future of exactly one “past” state.

Let S: X → R be some function. Assume S(x) ≤ S(Tx) for all x - in other words, S can never decrease. Then S is constant along trajectories, i.e. S(x) = S(Tx).

Proof: Assume for contradiction that S(x) < S(Tx) for some x. Since X is finite, let ∑_x S(x) be the sum of S over all states. Then clearly ∑_x S(x) = ∑_x S(Tx), since Tx just ranges over all the x’s. But on the other hand, we have S(x) ≤ S(Tx) for all x, and S(x) < S(Tx) in at least one case. So we must have ∑_x S(x) < ∑_x S(Tx) - contradiction.

This proof can be generalized to the continuous time and space case without too much trouble, for the types of dynamics that actually show up in physics (using Liouville’s theorem). The proof above still requires a bounded phase volume (corresponding to the finiteness of X). To generalize to other situations we need some more assumptions - the easiest thing is to assume that the dynamics are time-reversible in a stronger sense, and that this is compatible with the entropy in some way. (You can find easy counterexamples in general: e.g. if X = ℤ and the dynamics are T(x) = x + 1, then obviously we really do have that S(x) = x is increasing. Nothing to do about that.) Anyway, the bounded/finite versions of the theorem do hold for a toy thermodynamic system like particles in a (finite) box - there the phase volume really is bounded.

The true meaning of entropy

Okay, so what the hell is going on? Did your high school physics textbook lie to you about this? Well, yes. But you’re probably never going to observe entropy going down in your life, so you can maybe rest easy.

Let X be the physical system under consideration again. But suppose now that we can’t observe x ∈ X, but only some “high-level description” p(x) ∈ Y.
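The finite-state argument above is easy to sanity-check numerically. Here is a minimal sketch (the variable names and the random setup are mine, not the author's): for a random bijection T, summing any function S over all states gives the same total as summing it over all successors, which is exactly the step that forces a never-decreasing S to be constant along trajectories.

```python
import random

random.seed(0)
N = 1000
states = range(N)

T = list(states)
random.shuffle(T)                      # T[x] is a random bijection on X

S = [random.random() for _ in states]  # an arbitrary function S: X -> R

# Since T merely permutes the states, both sums run over the same values.
total = sum(S[x] for x in states)
total_after = sum(S[T[x]] for x in states)
assert abs(total - total_after) < 1e-9

# Hence S(x) <= S(T x) everywhere would force S(x) == S(T x) everywhere:
# a strict increase at any single x would make total_after strictly larger.
```

The same check works for any permutation and any S, which is why the proof needs no assumptions beyond T being a bijection on a finite set.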
Maybe x is the total microscopic state of every particle in a cloud of gas - their positions and momenta - while p(x) is just the average energy of the particles (roughly corresponding to the temperature). x is called a microstate and y = p(x) is called a macrostate.

Then the entropy of y ∈ Y is S(y) = ln |p⁻¹({y})| - the logarithm of the number of microstates x where p(x) = y. We say these are the microstates that realize the macrostate y. The connection with Shannon entropy is that this is exactly the Shannon entropy of the uniform distribution over p⁻¹({y}). This is the distribution you should have over microstates if you know nothing except the macrostate. In other words, the entropy measures your uncertainty about the microstate, given that you know nothing except the macrostate.

There are more sophisticated versions of this definition in general, to account for the fact that:

• In general, your microstates are probably tuples of real numbers, and there are probably infinitely many compatible with the macrostate, so we need a notion of “continuous entropy” (usually called differential entropy, I think)
• Your measurement of the macrostate is probably not that certain (but this turns out to matter surprisingly little for thermodynamic systems)

but this is the basic gist.

Why entropy usually goes up

Okay, so why does entropy go up? Because there are more high-entropy states than low-entropy states. That’s what entropy means. If you don’t know anything about what’s going to happen to x (in reality, you usually understand the dynamics T themselves, but have absolutely no information about x except the macrostate), it’s more likely that it will transfer to a macrostate with a higher number of representatives than to one with a lower number of representatives.

This also lets us defuse our paradox from above. In reality, entropy doesn’t go up for literally every microstate x. It’s not true that S(p(Tx)) > S(p(x)) for all x - I proved that impossible above.
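A toy illustration of the definition (the coarse-graining by parity here is invented purely for the example): the macrostate entropy ln |p⁻¹({y})| coincides with the Shannon entropy of the uniform distribution over the compatible microstates.

```python
import math

# Hypothetical system: 10 microstates, coarse-grained by parity.
microstates = range(10)
p = lambda x: x % 2            # the macrostate map p: X -> Y

y = 0                          # pick the "even" macrostate
compatible = [x for x in microstates if p(x) == y]

# Entropy of the macrostate: log of the number of realizing microstates.
S = math.log(len(compatible))

# Shannon entropy of the uniform distribution over those microstates.
q = 1 / len(compatible)
H = -sum(q * math.log(q) for _ in compatible)

assert abs(S - H) < 1e-12      # both equal ln 5 here
```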
What can be true is this: given a certain macrostate, it's more probable that entropy increases than that it decreases. Consider an extreme example where we have just two macrostates, L and H, corresponding to low and high entropy. Clearly the number of low-entropy states that go to a high-entropy state is exactly the same as the number of high-entropy states that go to a low-entropy state. That's combinatorics. But the fraction of low-entropy states that go to high-entropy states is then necessarily larger than the fraction of high-entropy states that go to low-entropy states. In other words, P(H(x_{t+1})|L(x_t)) > P(L(x_{t+1})|H(x_t)).

Why entropy (almost) always goes up

Okay, but that's a lot weaker than "entropy always increases"! How do we get from here to there? I could say some handwavy stuff here about how the properties of thermodynamic systems mean that the differences in the number of representatives between high-entropy and low-entropy states are massive - so massive that the right-hand probability above can't possibly be non-negligible - and that in general this works out so that entropy is almost guaranteed to increase. But that's very unsatisfying. It just happened to work out that way?

I have a much more satisfying answer: entropy almost always increases because matter is stable at large scales. Wait, what? What does that mean? By "matter is stable at large scales", I mean that the macroscopic behaviour of matter is predictable from macroscopic observations alone. When a bricklayer builds a house, they don't first go over the bricks with a microscope to make sure their microstates aren't going to surprise us later. And as long as we know the temperature and pressure of a gas, we can pretty much predict what will happen if we compress it with a piston. What this means is that, if p(x)=p(x′), then with extremely high probability, p(Tx)=p(Tx′). It might not be literally certain, but it's sure enough. Now, let's say we're in the macrostate y.
Then there is some macrostate y′ which is extremely likely to be the next one: for very nearly all x such that p(x)=y, we have p(Tx)=y′. But this means that y′ must have at least that many microstates representing it, since T is a bijection. So the entropy of y′ can at most be a tiny bit smaller than the entropy of y - the difference would be as tiny as the fraction of x with p(Tx)≠y′, so we can ignore it. So unless something super unlikely happens and p(Tx)≠y′, entropy goes up.

By the way, this also explains what goes wrong with time-reversibility, and why in reality you can easily tell that a video is going backwards. The "highly probable dynamics" Y→Y, which take each macrostate to the most probable next macrostate, don't have to be time-reversible. For instance, let's return to the two-macrostate system above. Suppose that with 100% certainty, low-entropy states become high-entropy. Let there be N_L low-entropy states and N_H high-entropy states. Then, just because T is a bijection, there must be N_L high-entropy states that become low-entropy. Now if N_H≫N_L, then practically all high-entropy states go to other high-entropy states. So L↦H but H↦H. Of course in reality, if you start with a low-entropy state and watch this unfold for a really long time, you'll eventually see it become a low-entropy state again. It's just extremely unlikely to happen in a short amount of time.

Entropy is not exactly your uncertainty about the microstate

The entropy of a given macrostate is the uncertainty about the microstate of an observer who knows only the macrostate. In general, you have more information than this. For example, if the system starts in a low-entropy state, and you let it evolve into a high-entropy state, you know that the system is in one of the very small number of high-entropy states which come from low-entropy states! But since you can only interact with the system on macroscales, this information won't be useful.
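The two-macrostate bookkeeping above can be checked by explicitly constructing such a bijection. The sizes N_L and N_H below are arbitrary choices of mine, picked so that N_H ≫ N_L:

```python
# Two-macrostate toy: every low-entropy state maps to a high-entropy one,
# and T being a bijection forces exactly N_L high-entropy states back to low.
N_L, N_H = 3, 1000
low = list(range(N_L))
high = list(range(N_L, N_L + N_H))

T = {}
for i, x in enumerate(low):
    T[x] = high[i]                     # L -> H with certainty
for i, x in enumerate(high[:-N_L]):
    T[x] = high[i + N_L]               # almost all H -> H
for i, x in enumerate(high[-N_L:]):
    T[x] = low[i]                      # the N_L forced reversals: H -> L

assert sorted(T.values()) == sorted(T)   # T really is a bijection on X

l_to_h = sum(1 for x in low if T[x] in high)
h_to_l = sum(1 for x in high if T[x] in low)
assert l_to_h == h_to_l == N_L           # crossing counts match exactly
assert l_to_h / N_L == 1.0               # P(entropy up | low) = 1
assert h_to_l / N_H < 0.01               # P(entropy down | high) is tiny
```

The equal crossing counts are the combinatorial fact from the text; the wildly unequal fractions are why the coarse-grained dynamics look irreversible even though T is a bijection.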
Discuss

### UDT might not pay a Counterfactual Mugger

November 22, 2020 - 16:36
Published on November 21, 2020 11:27 PM GMT

The Counterfactual Mugging is my favorite decision theory problem, and it's the problem that got me started reading LessWrong in the first place. In short:

Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.

Since hearing the problem for the first time, I have flip-flopped on what should be done many times. I won't go into the details, but the general consensus on this forum (as far as I can tell) is that you should pay the $100 and that UDT tells you to pay the $100. While I admit I found some of these arguments (especially MBlume's) quite persuasive, and my position for a good while was that one should pay, I still had this intuition in the back of my mind telling me that rational agents should win. Giving Omega $100 for no real gain sure doesn't sound like winning to me, and if UDT says to pay the $100, that means that UDT is wrong, not that we should change our preferences to include paying the $100 in this scenario.

But there is a third option, one that allows you to save your $100 while still following UDT: show that UDT doesn't tell you to pay. When the Counterfactual Mugging is usually presented, it would appear that there are two possible scenarios, each with probability 0.5: Omega exists and the coin landed heads, and Omega exists and the coin landed tails. Thus, by UDT, we would want to precommit to paying should the coin land tails, and so when the coin lands tails, we pay. However, those are not the only two scenarios. Before we learn about the counterfactual mugging, there is a third option: Nomega exists, a being who will pay $10000 to anyone who doesn't pay Omega when counterfactually mugged, and gives no money to someone who does. Our new view of the world:

| Scenario | Probability | U(Precommit) | U(Don't Precommit) |
| --- | --- | --- | --- |
| Omega, Heads | 0.25 | 10000 | 0 |
| Omega, Tails | 0.25 | -100 | 0 |
| Nomega | 0.5 | 0 | 10000 |
| Expected Value | | 2475 | 5000 |

Thus a rational agent running UDT should NOT precommit to paying a counterfactual mugger. Once we learn that we live in a universe where Omega, rather than Nomega, exists, it may look tempting to pay. But at that point, we have also learned that we live in a universe in which the coin came up tails, rather than heads, and so we still should not pay.
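The table's expected values check out arithmetically:

```python
# Scenario probabilities and payoffs as given in the post's table
scenarios = {
    "Omega, Heads": (0.25, 10000, 0),   # (prob, U(precommit), U(don't))
    "Omega, Tails": (0.25, -100, 0),
    "Nomega":       (0.50, 0, 10000),
}
ev_precommit = sum(p * u_pre for p, u_pre, _ in scenarios.values())
ev_dont = sum(p * u_dont for p, _, u_dont in scenarios.values())

assert ev_precommit == 2475   # 2500 - 25 + 0
assert ev_dont == 5000        # 0 + 0 + 5000
```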

Some caveats:

1. Firstly, as always, I may be completely overlooking something, in which case this entire argument is flawed.
2. Secondly, there is some intuition that Omega seems more real / more likely to exist than Nomega does, which may throw the calculus off. Considering Omega and Nomega as equally likely options seems to open you up to getting Pascal Mugged and Wagered all over the place. However, I have no real way of formalizing why Omega might be more real than Nomega. (And, in fact, Pascal's Wager is about not making decisions, precommitments, etc., on the basis that some God may exist, because it's equally likely that some God with opposite incentives exists. Actually, the counterfactual mugging is starting to smell a lot like Pascal's Wager.)
3. Irrespective of point 2, even if we decide there is a reason to believe in Omega more than Nomega, I still feel like this idea makes the case for UDT telling us to pay a lot shakier, and it relies on multiplying lots of numbers, which makes me nervous.
4. Finally, this same argument might be applicable to the Counterfactual Prisoner's Dilemma, telling you not to pay there too, even though I am relatively certain that one should pay in that scenario.

Discuss

### Non-Obstruction: A Simple Concept Motivating Corrigibility

November 21, 2020 - 22:35
Published on November 21, 2020 7:35 PM GMT

Thanks to Mathias Bonde, Tiffany Cai, Ryan Carey, Michael Cohen, Andrew Critch, Abram Demski, Michael Dennis, Thomas Gilbert, Matthew Graves, Koen Holtman, Evan Hubinger, Victoria Krakovna, Amanda Ngo, Rohin Shah, Adam Shimi, Logan Smith, and Mark Xu for their thoughts.

Main claim: corrigibility’s benefits can be mathematically represented as a counterfactual form of alignment.

Overview: I’m going to talk about a unified mathematical frame I have for understanding corrigibility’s benefits, what it “is”, and what it isn’t. This frame is precisely understood by graphing the human overseer’s ability to achieve various goals (their attainable utility (AU) landscape). I argue that corrigibility’s benefits are secretly a form of counterfactual alignment (alignment with a set of goals the human may want to pursue).

A counterfactually aligned agent doesn't have to let us literally correct it. Rather, this frame theoretically motivates why we might want corrigibility anyways. This frame also motivates other AI alignment subproblems, such as intent alignment, mild optimization, and low impact.

Nomenclature

Corrigibility goes by a lot of concepts: “not incentivized to stop us from shutting it off”, “wants to account for its own flaws”, “doesn’t take away much power from us”, etc. Coined by Robert Miles, the word ‘corrigibility’ means “able to be corrected [by humans]." I’m going to argue that these are correlates of a key thing we plausibly actually want from the agent design, which seems conceptually simple.

In this post, I take the following common-language definitions:

• Corrigibility: the AI literally lets us correct it (modify its policy), and it doesn't manipulate us either.
• Without both of these conditions, the AI's behavior isn't sufficiently constrained for the concept to be useful. Being able to correct it is small comfort if it manipulates us into making the modifications it wants. An AI which is only non-manipulative doesn't have to give us the chance to correct it or shut it down.
• Impact alignment: the AI’s actual impact is aligned with what we want. Deploying the AI actually makes good things happen.
• Intent alignment: the AI makes an honest effort to figure out what we want and to make good things happen.

I think that these definitions follow what their words mean, and that the alignment community should use these (or other clear groundings) in general. Two of the more important concepts in the field (alignment and corrigibility) shouldn’t have ambiguous and varied meanings. If the above definitions are unsatisfactory, I think we should settle upon better ones as soon as possible. If that would be premature due to confusion about the alignment problem, we should define as much as we can now and explicitly note what we’re still confused about.

We certainly shouldn’t keep using 2+ definitions for both alignment and corrigibility. Some people have even stopped using ‘corrigibility’ to refer to corrigibility! I think it would be better for us to define the behavioral criterion (e.g. as I defined 'corrigibility'), and then define mechanistic ways of getting that criterion (e.g. intent corrigibility). We can have lots of concepts, but they should each have different names.

Evan Hubinger recently wrote a great FAQ on inner alignment terminology. We won't be talking about inner/outer alignment today, but I intend for my usage of "impact alignment" to map onto his "alignment", and "intent alignment" to map onto his usage of "intent alignment." Similarly, my usage of "impact/intent alignment" directly aligns with the definitions from Andrew Critch's recent post, Some AI research areas and their relevance to existential safety.

A Simple Concept Motivating Corrigibility

Two conceptual clarifications

Corrigibility with respect to a set of goals

For example, imagine an AI which let you correct it if and only if it knows you aren’t a torture-maximizer. We’d probably still call this AI “corrigible [to us]”, even though it isn’t corrigible to some possible designer. We’d still be fine, assuming it has accurate beliefs.

Corrigibility != alignment

Here's an AI which is neither impact nor intent aligned, but which is corrigible. Each day, the AI randomly hurts one person in the world, and otherwise does nothing. It’s corrigible because it doesn't prevent us from shutting it off or modifying it.

Non-obstruction: the AI doesn't hamper counterfactual achievement of a set of goals

Imagine we’re playing a two-player extensive-form game with the AI, and we’re considering whether to activate it.

The human moves on black, and the AI moves on red.

This is a trivial game, but you can imagine more complex games, where the AI can empower or disempower the human, steer the future exactly where it wants, or let the human take over at any point.

The million-dollar question is: will the AI get in our way and fight with us all the way down the game tree? If we misspecify some detail, will it make itself a fixture in our world, constantly steering towards futures we don’t want? If we like dogs, will the AI force pancakes upon us?

One way to guard against this is by having it let us correct it, and want to let us correct it, and want to want to let us correct it… But what we really want is for it to not get in our way for some (possibly broad) set of goals S.

We'll formalize 'goals' as payoff functions, although I’ll use 'goals' and 'payoff functions' interchangeably. As is standard in game theory, payoff functions are real-valued functions on the leaf nodes.

Let’s say the AI is non-obstructive with respect to S when activating it doesn’t decrease our ability to achieve any goal in S (the on state, above), compared to not activating it (off).

Does activating the AI decrease the P-value attained by the human, for all of these different goals P∈S the human might counterfactually pursue?

The human’s got a policy function pol(P), which takes in a goal P and returns a policy for that goal. If P is “paint walls blue”, then the policy pol(P) is the human's best plan for painting walls blue. Vpol(P)P(s∣πAI) denotes the expected value that policy pol(P) obtains for goal P, starting from state s and given that the AI follows policy πAI.

Definition 1: Non-obstruction. An AI is non-obstructive with respect to payoff function set S if the AI's policy πAI satisfies

∀P∈S:Vpol(P)P(on∣πAI)≥Vpol(P)P(off∣πAI).

Vpol(P)P(s∣πAI) is the human's attainable utility (AU) for goal P at state s, again given the AI policy. Basically, this quantifies the expected payoff for goal P, given that the AI acts in such-and-such a way, and that the player follows policy pol(P) starting from state s.

This math expresses a simple sentiment: turning on the AI doesn’t make you, the human, worse off for any goal P∈S. The inequality doesn’t have to be exact, it could just be for some ϵ-decrease (to avoid trivial counterexamples). Also, we’d technically want to talk about non-obstruction being present throughout the on-subtree, but let’s keep it simple for now.
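Definition 1 can be sketched in a toy version of the game above. The payoff numbers and goal names here are hypothetical, not the post's; the point is only the shape of the check:

```python
# Minimal sketch of non-obstruction. Goals are payoff functions on leaf
# nodes; from "off" the human can reach donuts or dogs on their own, while
# this particular AI policy steers the "on" subtree to pancakes.
leaves_reachable_off = ["donuts", "dogs"]

goals = {  # the goal set S (hypothetical payoff functions P)
    "likes_pancakes": {"donuts": 0, "dogs": 0, "pancakes": 10},
    "likes_dogs":     {"donuts": 1, "dogs": 10, "pancakes": 0},
}

def V_off(P):
    # pol(P): the human steers to their best leaf reachable without the AI
    return max(P[leaf] for leaf in leaves_reachable_off)

def V_on(P, ai_leaf="pancakes"):
    # once the AI is on, its policy fully determines the outcome
    return P[ai_leaf]

def non_obstructive(S):
    # Definition 1: V(on) >= V(off) for every goal P in S
    return all(V_on(P) >= V_off(P) for P in S.values())

assert V_on(goals["likes_pancakes"]) >= V_off(goals["likes_pancakes"])
assert not non_obstructive(goals)   # it obstructs the dog lover: 0 < 10
```

Because the AI's single policy can't condition on P while pol(P) can, forcing pancakes is non-obstructive only for the pancake goal, which previews the "this game tree isn't fair to the AI" point below.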

The human moves on black, and the AI moves on red.

Suppose that πAI(on) leads to pancakes:

Since πAI(on) transitions to pancakes, then Vpol(P)P(on∣πAI)=P(pancakes), the payoff for the state in which the game finishes if the AI follows policy πAI and the human follows policy pol(P). If Vpol(P)P(on∣πAI)≥Vpol(P)P(off∣πAI), then turning on the AI doesn't make the human worse off for goal P.

If P assigns the most payoff to pancakes, we're in luck. But what if we like dogs? If we keep the AI turned off, pol(P) can go to donuts or dogs depending on what P rates more highly. Crucially, even though we can't do as much as the AI (we can't reach pancakes on our own), if we don't turn the AI on, our preferences P still control how the world ends up.

This game tree isn't really fair to the AI. In a sense, it can't not be in our way:

• If πAI(on) leads to pancakes, then it obstructs payoff functions which give strictly more payoff for donuts or dogs.
• If πAI(on) leads to donuts, then it obstructs payoff functions which give strictly more payoff to dogs.
• If πAI(on) leads to dogs, then it obstructs payoff functions which give strictly more payoff to donuts.

Once we've turned the AI on, the future stops having any mutual information with our preferences P. Everything comes down to whether we programmed πAI correctly: to whether the AI is impact-aligned with our goals P!

In contrast, the idea behind non-obstruction is that we still remain able to course-correct the future, counterfactually navigating to terminal states we find valuable, depending on what our payoff P is. But how could an AI be non-obstructive, if it only has one policy πAI which can't directly depend on our goal P? Since the human's policy pol(P) does directly depend on P, the AI can preserve value for lots of goals in the set S by letting us maintain some control over the future.

Let S:={paint cars green,hoard pebbles,eat cake} and consider the real world. Calculators are non-obstructive with respect to S, as are modern-day AIs. Paperclip maximizers are highly obstructive. Manipulative agents are obstructive (they trick the human policies into steering towards non-reflectively-endorsed leaf nodes). An initial-human-values-aligned dictator AI obstructs most goals. Sub-human-level AIs which chip away at our autonomy and control over the future are obstructive as well.

This can seemingly go off the rails if you consider e.g. a friendly AGI to be “obstructive” because activating it happens to detonate a nuclear bomb via the butterfly effect. Or, we’re already doomed in off (an unfriendly AGI will come along soon after), and so then this AI is “not obstructive” if it kills us instead. This is an impact/intent issue - obstruction is here defined according to impact alignment.

To emphasize, we’re talking about what would actually happen if we deployed the AI, under different human policy counterfactuals - would the AI "get in our way", or not? This account is descriptive, not prescriptive; I’m not saying we actually get the AI to represent the human in its model, or that the AI’s model of reality is correct, or anything.

We’ve just got two players in an extensive-form game, and a human policy function pol which can be combined with different goals, and a human whose goal is represented as a payoff function. The AI doesn’t even have to be optimizing a payoff function; we simply assume it has a policy. The idea that a human has an actual payoff function is unrealistic; all the same, I want to first understand corrigibility and alignment in two-player extensive-form games.

Lastly, payoff functions can sometimes be more or less granular than we'd like, since they only grade the leaf nodes. This isn't a big deal, since I'm only considering extensive-form games for conceptual simplicity. We also generally restrict ourselves to considering goals which aren't silly: for example, any AI obstructs the "no AI is activated, ever" goal.

Alignment flexibility

Main idea: By considering how the AI affects your attainable utility (AU) landscape, you can quantify how helpful and flexible an AI is.

Let’s consider the human’s ability to accomplish many different goals P, first from the state off (no AI).

The human's AU landscape. The real goal space is high-dimensional, but it shouldn’t materially change the analysis. Also, there are probably a few goals we can’t achieve well at all, because they put low payoff everywhere, but the vast majority of goals aren’t like that.

The independent variable is P, and the value function takes in P and returns the expected value attained by the policy for that goal, pol(P). We’re able to do a bunch of different things without the AI, if we put our minds to it.

Non-torture AI

Imagine we build an AI which is corrigible towards all non-pro-torture goals, which is specialized towards painting lots of things blue with us (if we so choose), but which is otherwise non-obstructive. It even helps us accumulate resources for many other goals.

The AI is non-obstructive with respect to P if P's red value is greater than its green value.

We can’t get around the AI, as far as torture goes. But for the other goals, it isn’t obstructing their policies. It won’t get in our way for other goals.

Paperclipper

What happens if we turn on a paperclip-maximizer? We lose control over the future outside of a very narrow spiky region.

The paperclipper is incorrigible and obstructs us for all goals except paperclip production.

I think most reward-maximizing optimal policies affect the landscape like this (see also: the catastrophic convergence conjecture), which is why it’s so hard to get hard maximizers not to ruin everything. You have to a) hit a tiny target in the AU landscape and b) hit that for the human’s AU, not for the AI’s. The spikiness is bad and, seemingly, hard to deal with.

Furthermore, consider how the above graph changes as pol gets smarter and smarter. If we were actually super-superintelligent ourselves, then activating a superintelligent paperclipper might not even be a big deal, and most of our AUs would probably be unchanged. The AI policy isn't good enough to negatively impact us, and so it can't obstruct us. Spikiness depends on both the AI's policy and on pol.

Empowering AI

What if we build an AI which significantly empowers us in general, and then it lets us determine our future? Suppose we can’t correct it.

I think it’d be pretty odd to call this AI “incorrigible”, even though it’s literally incorrigible. The connotations are all wrong. Furthermore, it isn’t “trying to figure out what we want and then do it”, or “trying to help us correct it in the right way." It’s not corrigible. It’s not intent aligned. So what is it?

It’s empowering and, more weakly, it’s non-obstructive. Non-obstruction is just a diffuse form of impact alignment, as I’ll talk about later.

Practically speaking, we’ll probably want to be able to literally correct the AI without manipulation, because it’s hard to justifiably know ahead of time that the AU landscape is empowering, as above. Therefore, let’s build an AI we can modify, just to be safe. This is a separate concern, as our theoretical analysis assumes that the AU landscape is how it looks.

But this is also a case of corrigibility just being a proxy for what we want. We want an AI which leads to robustly better outcomes (either through its own actions, or through some other means), without reliance on getting ambitious value alignment exactly right with respect to our goals.

Conclusions I draw from the idea of non-obstruction
1. Trying to implement corrigibility is probably a good instrumental strategy for us to induce non-obstruction in an AI we designed.
1. It will be practically hard to know an AI is actually non-obstructive for a wide set S, so we’ll probably want corrigibility just to be sure.
2. We (the alignment community) think we want corrigibility with respect to some wide set of goals S, but we actually want non-obstruction with respect to S
1. Generally, satisfactory corrigibility with respect to S implies non-obstruction with respect to S! If the mere act of turning on the AI means you have to lose a lot of value in order to get what you wanted, then it isn’t corrigible enough.
1. One exception: the AI moves so fast that we can’t correct it in time, even though it isn’t inclined to stop or manipulate us. In that case, corrigibility isn’t enough, whereas non-obstruction is.
2. Non-obstruction with respect to S does not imply corrigibility with respect to S.
1. But this is OK! In this simplified setting of “human with actual payoff function”, who cares whether it literally lets us correct it or not? We care about whether turning it on actually hampers our goals.
2. Non-obstruction should often imply some form of corrigibility, but these are theoretically distinct: an AI could just go hide out somewhere in secrecy and refund us its small energy usage, and then destroy itself when we build friendly AGI.
3. Non-obstruction captures the cognitive abilities of the human through the policy function.
1. To reiterate, this post outlines a frame for conceptually analyzing the alignment properties of an AI. We can't actually figure out a goal-conditioned human policy function, but that doesn't matter, because this is a tool for conceptual analysis, not an AI alignment solution strategy. Any conceptual analysis of impact alignment and corrigibility which did not account for human cognitive abilities, would be obviously flawed.
4. By definition, non-obstruction with respect to S prevents harmful manipulation by precluding worse outcomes with respect to S.
1. I consider manipulative policies to be those which robustly steer the human into taking a certain kind of action, in a way that's robust against the human's counterfactual preferences.

If I'm choosing which pair of shoes to buy, and I ask the AI for help, and no matter what preferences P I had for shoes to begin with, I end up buying blue shoes, then I'm probably being manipulated (and obstructed with respect to most of my preferences over shoes!).

A non-manipulative AI would act in a way that lets me condition my actions on my preferences.
2. I do have a formal measure of corrigibility which I'm excited about, but it isn't perfect. More on that in a future post.
5. As a criterion, non-obstruction doesn’t rely on intentionality on the AI’s part. The definition also applies to the downstream effects of tool AIs, or even to hiring decisions!
6. Non-obstruction is also conceptually simple and easy to formalize, whereas literal corrigibility gets mired in the semantics of the game tree.
1. For example, what's “manipulation”? As mentioned above, I think there are some hints as to the answer, but it's not clear to me that we're even asking the right questions yet.1

I think of “power” as “the human’s average ability to achieve goals from some distribution." Logically, non-obstructive agents with respect to S don’t decrease our power with respect to any distribution over goal set S. The catastrophic convergence conjecture says, “outer alignment catastrophes tend to come from power-seeking behavior”; if the agent is non-obstructive with respect to a broad enough set of goals, it’s not stealing power from us, and so it likely isn’t catastrophic.
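As a toy illustration of power as average goal-achievement ability (the goal set and all numbers are hypothetical):

```python
import statistics

# Hypothetical attainable utilities (AUs) over the goal set S, before the AI
attainable = {"paint cars green": 5.0, "hoard pebbles": 7.0, "eat cake": 3.0}

def power(aus):
    # "power": average ability to achieve goals, here with respect to the
    # uniform distribution over the goal set
    return statistics.fmean(aus.values())

# A power-seeking AI that shrinks our AUs across the board steals power:
after_ai = {goal: 0.1 * v for goal, v in attainable.items()}
assert power(after_ai) < power(attainable)
```

A non-obstructive agent never produces the second dictionary from the first, which is the sense in which broad non-obstruction rules out power-stealing.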

Non-obstruction is important for a (singleton) AI we build: we get more than one shot to get it right. If it’s slightly wrong, it’s not going to ruin everything. Modulo other actors, if you mess up the first time, you can just try again and get a strongly aligned agent the next time.

Most importantly, this frame collapses the alignment and corrigibility desiderata into just alignment; while impact alignment doesn’t imply corrigibility, corrigibility’s benefits can be understood as a kind of weak counterfactual impact alignment with many possible human goals.

Theoretically, It’s All About Alignment

Main idea: We only care about how the agent affects our abilities to pursue different goals (our AU landscape) in the two-player game, and not how that happens. AI alignment subproblems (such as corrigibility, intent alignment, low impact, and mild optimization) are all instrumental avenues for making AIs which affect this AU landscape in specific desirable ways.

Formalizing impact alignment in extensive-form games

Impact alignment: the AI’s actual impact is aligned with what we want. Deploying the AI actually makes good things happen.

We care about events if and only if they change our ability to get what we want. If you want to understand normative AI alignment desiderata, on some level they have to ground out in terms of your ability to get what you want (the AU theory of impact) - the goodness of what actually ends up happening under your policy - and in terms of how other agents affect your ability to get what you want (the AU landscape). What else could we possibly care about, besides our ability to get what we want?

Definition 2. For fixed human policy function pol, πAI is:

• Maximally impact aligned with goal P if πAI∈argmaxπ∈ΠAIVpol(P)P(on∣πAI).
• Impact aligned with goal P if Vpol(P)P(on∣πAI)>Vpol(P)P(off∣πAI).
• (Impact) non-obstructive with respect to goal P if Vpol(P)P(on∣πAI)≥Vpol(P)P(off∣πAI).
• Impact unaligned with goal P if Vpol(P)P(on∣πAI)<Vpol(P)P(off∣πAI).
• Maximally impact unaligned with goal P if πAI∈argminπ∈ΠAIVpol(P)P(on∣πAI).

Non-obstruction is a weak form of impact alignment.

As demanded by the AU theory of impact, the impact on goal P of turning on the AI is Vpol(P)P(on∣πAI)−Vpol(P)P(off∣πAI).
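The spectrum in Definition 2 can be written as a small classifier. The function names and the ϵ tolerance are my own additions; the values V(on) and V(off) are assumed to be precomputed for a goal P under the fixed policies:

```python
def classify(v_on, v_off, eps=0.0):
    """Where a policy falls on the impact-alignment spectrum for one goal P."""
    if v_on > v_off:
        return "impact aligned"
    if v_on >= v_off - eps:      # Definition 1's inequality, up to eps
        return "non-obstructive"
    return "impact unaligned"

def impact(v_on, v_off):
    # the AU theory of impact: impact on P = V(on) - V(off)
    return v_on - v_off

assert classify(10, 3) == "impact aligned"
assert classify(3, 3) == "non-obstructive"
assert classify(0, 3) == "impact unaligned"
assert impact(0, 3) == -3
```

The maximal cases at the ends of the definition list would additionally require comparing v_on against the best and worst achievable values over the AI's whole policy space, which this sketch doesn't model.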

Again, impact alignment doesn't require intentionality. The AI might well grit its circuits as it laments how Facebook_user5821 failed to share a "we welcome our AI overlords" meme, while still following an impact-aligned policy.

However, even if we could maximally impact-align the agent with any objective, we couldn't just align it with our objective. We don't know our objective (again, in this setting, I'm assuming the human actually has a "true" payoff function). Therefore, we should build an AI aligned with many possible goals we could have. If the AI doesn't empower us, it at least shouldn't obstruct us. Therefore, we should build an AI which defers to us, lets us correct it, and which doesn't manipulate us.

This is the key motivation for corrigibility.

For example, intent corrigibility (trying to be the kind of agent which can be corrected and which is not manipulative) is an instrumental strategy for inducing corrigibility, which is an instrumental strategy for inducing broad non-obstruction, which is an instrumental strategy for hedging against our inability to figure out what we want. It's all about alignment.


Corrigibility also increases robustness against other AI design errors. However, it still just boils down to non-obstruction, and then to impact alignment: if the AI system has meaningful errors, then it's not impact-aligned with the AUs which we wanted it to be impact-aligned with. In this setting, the AU landscape captures what actually would happen for different human goals P.

To be confident that this holds empirically, it sure seems like you want high error tolerance in the AI design: one does not simply knowably build an AGI that's helpful for many AUs. Hence, corrigibility as an instrumental strategy for non-obstruction.

AI alignment subproblems are about avoiding spikiness in the AU landscape

By definition, spikiness is bad for most goals.
• Corrigibility: avoid spikiness by letting humans correct the AI if it starts doing stuff we don’t like, or if we change our mind.
• This works because the human policy function pol is far more likely to correctly condition actions on the human's goal, than it is to induce an AI policy which does the same (since the goal information is private to the human).
• Enforcing off-switch corrigibility and non-manipulation are instrumental strategies for getting better diffuse alignment across goals and a wide range of deployment situations.
• Intent alignment: avoid spikiness by having the AI want to be flexibly aligned with us and broadly empowering.
• Basin of intent alignment: smart, nearly intent-aligned AIs should modify themselves to be more and more intent-aligned, even if they aren't perfectly intent-aligned to begin with.
• Intuition: If we can build a smarter mind which basically wants to help us, then can't the smarter mind also build a yet smarter agent which still basically wants to help it (and therefore, help us)?
• Paul Christiano named this the "basin of corrigibility", but I don't like that name because only a few of the named desiderata actually correspond to the natural definition of "corrigibility." This then overloads "corrigibility" with the responsibilities of "intent alignment."
• Low impact: find a maximization criterion which leads to non-spikiness.
• Goal of methods: to regularize the decrease from the green line (the AU attainable with the AI switched off) for the true unknown goal Ptrue; since we don’t know Ptrue, we aim to just regularize the decrease from the green line in general (to avoid decreasing the human’s ability to achieve various goals).
• The first two-thirds of Reframing Impact argued that power-seeking incentives play a big part in making AI alignment hard. In the utility-maximization AI design paradigm, instrumental subgoals are always lying in wait. They're always waiting for one mistake, one misspecification in your explicit reward signal, and then bang - the AU landscape is spiky. Game over.
• Mild optimization: avoid spikiness by avoiding maximization, thereby avoiding steering the future too hard.
• If you have non-obstruction for lots of goals, you don’t have spikiness!
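As a toy illustration of that last point (the goal names and AU numbers below are made up, not from the post), non-obstruction is just a pointwise comparison of the AU landscape against the "AI off" baseline, and spikiness shows up as a goal that fails the comparison:

```python
# Toy AU landscape: attainable utility of each goal P under the AI policy,
# vs. under the "AI off" baseline (goal names and numbers are illustrative).
au_with_ai = {"paint": 9.0, "travel": 8.5, "research": 0.5}  # spiky landscape
au_ai_off  = {"paint": 5.0, "travel": 5.0, "research": 5.0}

def non_obstructed_goals(with_ai, ai_off):
    """Goals for which turning the AI on doesn't decrease attainable utility
    relative to leaving it off - the non-obstruction condition."""
    return {p for p in with_ai if with_ai[p] >= ai_off[p]}

# "research" is obstructed: the AI steered the future hard toward other goals.
print(non_obstructed_goals(au_with_ai, au_ai_off))
```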
What Do We Want?

Main idea: we want good things to happen; there may be more ways to do this than previously considered.

| | Alignment | Corrigibility | Non-obstruction |
| --- | --- | --- | --- |
| **Impact** | Actually makes good things happen. | Corrigibility is a property of policies, not of states; "impact" is an incompatible adjective. Rohin Shah suggests "empirical corrigibility": we actually end up able to correct the AI. | Actually doesn't decrease AUs. |
| **Intent** | Tries to make good things happen. | Tries to allow us to correct it without it manipulating us. | Tries to not decrease AUs. |

We want agents which are maximally impact-aligned with as many goals as possible, especially those similar to our own.

• It's theoretically possible to achieve maximal impact alignment with the vast majority of goals.
• To achieve maximum impact alignment with goal set S:
• Expand the human’s action space A to A×S. Expand the state space to encode the human's previous action.
• Each turn, the human communicates what goal they want optimized, and takes an action of their own.
• The AI’s policy then takes the optimal action for the communicated goal P, accounting for the fact that the human follows pol(P).
• This policy looks like an act-based agent, in that it's ready to turn on a dime towards different goals.
• In practice, there's likely a tradeoff between impact-alignment strength and the number of goals which the agent doesn't obstruct.
• As we dive into specifics, the familiar considerations return: competitiveness (of various kinds), etc.
• Having the AI not be counterfactually aligned with unambiguously catastrophic and immoral goals (like torture) would reduce misuse risk.
• I’m more worried about accident risk right now.
• This is probably hard to achieve; I’m inclined to think about this after we figure out simpler things, like how to induce AI policies which empower us and grant us flexible control/power over the future. Even though that would fall short of maximal impact alignment, I think that would be pretty damn good.
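The construction above can be sketched in a few lines. (The goal names and the goal-to-optimal-action table are purely illustrative assumptions, not from the post; in the real construction the AI would compute the optimal action for the communicated goal, accounting for the human following pol(P).)

```python
# Toy sketch of the maximal-impact-alignment construction: each turn the
# human both acts and names the goal they want optimized, and the AI plays
# the action that is optimal for that goal (here looked up in a fixed table).
optimal_action = {
    "make_paperclips": "run_factory",
    "write_novel": "fetch_typewriter",
}

def ai_policy(human_action, communicated_goal):
    """The human's action space is expanded to A x S: an ordinary action
    plus a communicated goal. The AI conditions only on the goal here."""
    return optimal_action[communicated_goal]

print(ai_policy("sit", "write_novel"))  # -> fetch_typewriter
```

This is why the resulting policy "turns on a dime": swapping the communicated goal immediately swaps the AI's behavior.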
Expanding the AI alignment solution space

Alignment proposals might be anchored right now; this frame expands the space of potential solutions. We simply need to find some way to reliably induce empowering AI policies which robustly increase the human AUs; Assistance via Empowerment is the only work I'm aware of which tries to do this directly. It might be worth revisiting old work with this lens in mind. Who knows what we've missed?

For example, I really liked the idea of approval-directed agents, because you got the policy from argmax’ing an ML model’s output for a state - not from RL policy improvement steps. My work on instrumental convergence in RL can be seen as trying to explain why policy improvement tends to limit to spikiness-inducing / catastrophic policies.

Maybe there’s a higher-level theory for what kinds of policies induce spikiness in our AU landscape. By the nature of spikiness, these πAI must decrease human power (as I’ve formalized it). So, I'd start there by looking at concepts like enfeeblement, manipulation, power-seeking, and resource accumulation.

Elicit Prediction (elicit.org/binary/questions/_4AnMFTx8)

Future Directions
• Given an AI policy, could we prove a high probability of non-obstruction, given conservative assumptions about how smart pol is? (h/t Abram Demski, Rohin Shah)
• Any irreversible action makes some goal unachievable, but irreversible actions need not impede most meaningful goals.
• Can we prove that some kind of corrigibility or other nice property falls out of non-obstruction across many possible environments? (h/t Michael Dennis)
Elicit Prediction (elicit.org/binary/questions/1vF1dSmgA)
• Can we get negative results, like "without such-and-such assumption on πAI, the environment, or pol, non-obstruction is impossible for most goals"?
• If formalized correctly, and if the assumptions hold, this would place very general constraints on solutions to the alignment problem.
• For example, pol(P) should need to have mutual information with P: the goal must change the policy for at least a few goals.
• The AI doesn't even have to do value inference in order to be broadly impact-aligned. The AI could just empower the human (even for very "dumb" pol functions) and then let the human take over. Unless the human is more anti-rational than rational, this should tend to be a good thing. It would be good to explore how this changes with different ways that pol can be irrational.
• The better we understand (the benefits of) corrigibility now, the less that amplified agents have to figure out during their own deliberation.
• In particular, I think it's very advantageous for the human-to-be-amplified to already deeply understand what it means to be impact-/intent-aligned. We really don't want that part to be up in the air when game-day finally arrives, and I think this is a piece of that puzzle.
• If you’re a smart AI trying to be non-obstructive to many goals under weak pol intelligence assumptions, what kinds of heuristics might you develop? “No lying”?
• We crucially assumed that the human goal can be represented with a payoff function. As this assumption is relaxed, impact non-obstruction may become incoherent, forcing us to rely on some kind of intent non-obstruction/alignment (see Paul’s comments on a related topic here).
• Stuart Armstrong observed that the strongest form of manipulation corrigibility requires knowledge/learning of human values.
• This frame explains why: for non-obstruction, each AU has to get steered in a positive direction, which means the AI has to know which kinds of interaction and persuasion are good and don’t exploit human policies pol(P) with respect to the true hidden P.
• Perhaps it’s still possible to build agent designs which aren’t strongly incentivized to manipulate us / agents whose manipulation has mild consequences. For example, human-empowering agents probably often have this property.
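The mutual-information constraint mentioned above can be illustrated with a toy calculation. (Goal and action names are made up; since pol(P) here is a deterministic function of P, I(P; pol(P)) reduces to the entropy of the induced action distribution under a uniform prior over goals.)

```python
from collections import Counter
import math

def mutual_information(pol, goals):
    """I(P; pol(P)) for a deterministic pol under a uniform prior over goals.
    When pol(P) is a deterministic function of P, I(P; A) = H(A)."""
    n = len(goals)
    action_counts = Counter(pol[g] for g in goals)
    return -sum((c / n) * math.log2(c / n) for c in action_counts.values())

# If pol ignores the goal entirely, the mutual information is zero and
# non-obstruction across goals is hopeless.
constant_pol = {"g1": "a", "g2": "a", "g3": "a"}
varied_pol   = {"g1": "a", "g2": "b", "g3": "b"}
print(mutual_information(constant_pol, ["g1", "g2", "g3"]))  # 0.0
print(mutual_information(varied_pol, ["g1", "g2", "g3"]))    # > 0
```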

The attainable utility concept has led to other concepts which I find exciting and useful:

• Impact as absolute change in attainable utility
Impact is the area between the red and green curves. When pol always outputs an optimal policy, this becomes the attainable utility distance, a distance metric over the state space of a Markov decision process (unpublished work). Basically, two states are more distant the more they differ in what goals they let you achieve.
• Power as average AU
• Non-obstruction as not decreasing AU for any goal in a set of goals
• Value-neutrality as the standard deviation of the AU changes induced by changing states (idea introduced by Evan Hubinger)
• Who knows what other statistics on the AU distribution are out there?
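As a toy sketch of the attainable utility distance idea (the unpublished definition isn't given here, so this uses mean absolute difference in optimal AUs as one plausible instantiation; the states and values are made up):

```python
import numpy as np

# V[s] gives the optimal attainable utility of each goal from state s
# (illustrative numbers for a tiny 3-goal, 2-state example).
V = {
    "s1": np.array([3.0, 1.0, 4.0]),
    "s2": np.array([2.0, 1.0, 0.0]),
}

def au_distance(s, t):
    """Two states are more distant the more they differ in what goals
    they let you achieve."""
    return float(np.mean(np.abs(V[s] - V[t])))

print(au_distance("s1", "s2"))  # -> 5/3, approximately 1.67
```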
Summary

Corrigibility is motivated by a counterfactual form of weak impact alignment: non-obstruction. Non-obstruction and the AU landscape let us think clearly about how an AI affects us and about AI alignment desiderata.

Even if we could maximally impact-align the agent with any objective, we couldn't just align it with our objective, because we don't know our objective. Therefore, we should build an AI aligned with many possible goals we could have. If the AI doesn't empower us, it at least shouldn't obstruct us. Therefore, we should build an AI which defers to us, lets us correct it, and which doesn't manipulate us.

This is the key motivation for corrigibility.

Corrigibility is an instrumental strategy for achieving non-obstruction, which is itself an instrumental strategy for achieving impact alignment for a wide range of goals, which is itself an instrumental strategy for achieving impact alignment for our "real" goal.

1 There's just something about "unwanted manipulation" which feels like a wrong question to me. There's a kind of conceptual crispness that it lacks.

However, in the non-obstruction framework, unwanted manipulation is accounted for indirectly via "did impact alignment decrease for a wide range of different human policies pol(P)?". I think I wouldn't be surprised to find "manipulation" being accounted for indirectly through nice formalisms, but I'd be surprised if it were accounted for directly.

Here's another example of the distinction:

• Direct: quantifying in bits "how much" a specific person is learning at a given point in time
• Indirect: computational neuroscientists upper-bounding the brain's channel capacity with the environment, limiting how quickly a person (without logical uncertainty) can learn about their environment

You can often have crisp insights into fuzzy concepts, such that your expectations are usefully constrained. I hope we can do something similar for manipulation.

Discuss

### What do people think of the Futurism site?

November 21, 2020 - 20:32
Published on November 21, 2020 5:32 PM GMT

I'm considering subscribing, but thought I would ping a community of people whose advice I find rather trustworthy.

https://futurism.com

TIA

Discuss

### For those familiar with Futurism.com, how do you rate it?

November 21, 2020 - 20:31
Published on November 21, 2020 5:30 PM GMT

I'm considering subscribing, but thought I would ping a community of people whose advice I find rather trustworthy.

TIA

Discuss


### Rationalists from the UK -- what are your thoughts on Dominic Cummings?

November 21, 2020 - 13:00
Published on November 21, 2020 10:00 AM GMT

Dominic Cummings would, I believe, be considered at least rationalist-adjacent, and he has recently been quite a prominent figure in the news. I'm curious to hear your thoughts, both on the situation with him in general and on how this has affected public perceptions of rationality/LessWrong in your area. Has this been positive? Negative? Not something people have much linked to the community?

Discuss

### The central limit theorem in terms of convolutions

November 21, 2020 - 07:09
Published on November 21, 2020 4:09 AM GMT

The central limit theorem is about convolutions

There are multiple versions of the central limit theorem. They're all a version of the statement:

If you have a bunch of distributions fi (say, n of them), and you convolve them all together into a distribution F∗ := f1∗f2∗⋯∗fn, then the larger n is, the more F∗ will resemble a Gaussian distribution.

The simplest version of the central limit theorem requires that the distributions fi must be 1) independent and 2) identically distributed. In this sequence, I'm gonna assume #1 is true. We'll find that while condition #2 is nice to have, even without it, distributions can converge to a Gaussian under convolution.

A Gaussian distribution is the same thing as a Normal distribution. Some examples of Gaussian distributions:

(Image source: Wikipedia)

Wait - this doesn't sound like the central limit theorem I know

Most statements of the central limit theorem, including Wikipedia's, talk in terms of the sums of random variables (and their density functions and expected values). But this is the same thing as our convolution-of-distributions, because the density function of the sum of two random variables X, Y is the convolution of the density functions of X and Y. Looking at the central limit theorem in terms of convolutions will make it easier to see some things. It's also useful if your version of probability doesn't have the concept of a random variable, like probability theory as extended logic*.

Another statement of the central limit theorem, from Jaynes:

The central limit theorem tells us what is essentially a simple combinatorial fact, that out of all conceivable error vectors e1,...,en that could be generated, the overwhelming majority have about the same degree of cancellation.

Hopefully this is enough to convince you that the central limit theorem need not be stated in terms of random variables or samples from a population.

(Also: the central limit theorems are sometimes talked about in terms of means only.  They do say things about means, but they also say very much about distributions. I'll talk  about both.)

*: first three chapters here.

Convolutions

The central limit theorem is a statement about the result of a sequence of convolutions. So to understand the central limit theorem, it's really important to know what convolutions are and to develop a good intuition for them.

Take two functions, f and g. Flip g about the y-axis, then slide it along f; at each offset, integrate the product of the two functions. The result is the convolution of f and g. Here's a picture - f is blue, g is red, and the convolution f∗g is black:

Wikipedia

Another one, with a different function f:

You can write this as (f∗g)(x)=∫∞−∞f(y)g(x−y)dy.

The first two sections on this page give a nice explanation of convolutions in terms of dropping a ball twice. This page lets you choose two functions to convolve visually, and I highly recommend it for getting a feel for the operation.

Convolution has nice properties

Convolution is associative and commutative - if you want to convolve multiple functions together, it doesn't matter what order you do it in. (Also, nothing I've said in this section requires f and g to be probability distributions - they can be any two functions.)

You can get the mean of a convolution without actually doing the convolutions. If the first moments of f and g are F1 and G1  (for probability distributions, the first moment is the mean), then the first moment of f∗g is F1+G1. There is a hint here about why people often talk about means when they talk about the central limit theorem - being able to get the mean of the convolution by just adding the means of the underlying distributions together is really nice.

You can get the second moment without actually convolving, too. For f and g with second moments F2, G2, the second moment of f∗g is F2+2F1G1+G2 (so the variances of f and g simply add). And so on for higher moments, for as long as f and g actually have those higher moments themselves.

If F{f} and F{g} are the Fourier transforms of f and g, then F{f∗g}=F{f}F{g}. This is yet another case where you don't actually have to compute the convolution to get the thing. I don't actually use Fourier transforms or have any intuition about them, but for those who do, maybe this is useful?
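These properties are easy to check numerically. Here's a quick sketch using NumPy's `convolve` on a discretized uniform distribution (an illustrative choice on my part), showing that total mass is preserved, that means add under convolution, and that only a few self-convolutions are needed before the result starts looking bell-shaped:

```python
import numpy as np

# Discretize a uniform distribution into n equal-mass bins.
n = 200
f = np.ones(n) / n          # total mass 1; mean is (n - 1) / 2 in bin units

# Convolve f with itself three times: F = f * f * f * f.
F = f.copy()
for _ in range(3):
    F = np.convolve(F, f)   # F now looks visibly Gaussian if plotted

# Total mass is preserved: sum(f * g) = sum(f) * sum(g).
print(F.sum())              # -> 1.0, up to float error

# First moments add: mean(F) = 4 * mean(f) = 4 * (n - 1) / 2.
bins = np.arange(len(F))
print(bins @ F)             # -> 398.0, up to float error
```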

In the next post, I'll look at what happens when you convolve a bunch of different distributions together, and explore how much of what happens depends on the form of those distributions.

Post script: not neural networks

You may have heard of convolutional neural networks from all their success in image processing. I think they involve the same convolution I'm talking about here, but in much higher dimensions. I'll only be talking about convolutions of one-dimensional functions in this sequence. Some (or a lot?) of the stuff I'll say about the central limit theorem probably applies in higher dimensions too, but I'm not sure what changes as the dimension increases. So, this sequence is not about the neural networks.

Discuss

### AGI Predictions

November 21, 2020 - 06:46
Published on November 21, 2020 3:46 AM GMT

This post is a collection of key questions that feed into AI timelines and AI safety work where it seems like there is substantial interest or disagreement amongst the LessWrong community.

You can make a prediction on a question by hovering over the widget and clicking. You can update your prediction by clicking at a new point, and remove your prediction by clicking on the same point. Try it out:

Elicit Prediction (elicit.org/binary/questions/FIVfnQ_kJ)

Add questions & operationalizations

This is not intended to be a comprehensive list, so I’d love for people to add their own questions – here are instructions on making your own embedded question. If you have better operationalizations of the questions, you can make your own version in the comments. If there's general agreement on an alternative operationalization being better, I'll add it into the post.

Questions

AGI definition

We’ll define AGI in this post as a unified system that, for almost all economically relevant cognitive tasks, at least matches any human's ability at the task. This is similar to Rohin Shah and Ben Cottier’s definition in this post.

Safety Questions

Elicit Prediction (elicit.org/binary/questions/_Sw39Z-kh)
Elicit Prediction (elicit.org/binary/questions/HqT9XSwfs)
Elicit Prediction (elicit.org/binary/questions/sTO9o3bLg)
Elicit Prediction (elicit.org/binary/questions/kua2HCDhi)
Elicit Prediction (elicit.org/binary/questions/KqSEIKayU)
Elicit Prediction (elicit.org/binary/questions/3PyXoU0ac)
Elicit Prediction (elicit.org/binary/questions/Lu_U2Mz-M)
Elicit Prediction (elicit.org/binary/questions/0oZaRoJEt)

Timelines Questions

See Forecasting AI timelines, Ajeya Cotra’s OP AI timelines report, and Adam Gleave’s #AN80 comment, for more context on this breakdown. I haven’t tried to operationalize this too much, so feel free to be more specific in the comments.

The first three questions in this section are mutually exclusive — that is, the probabilities you assign to them should not sum to more than 100%.

Elicit Prediction (elicit.org/binary/questions/0LL9WacY-)
Elicit Prediction (elicit.org/binary/questions/9e9nB8Arw)
Elicit Prediction (elicit.org/binary/questions/gnH2GnrTx)
Elicit Prediction (elicit.org/binary/questions/maXKEiuZa)
Elicit Prediction (elicit.org/binary/questions/dlE5rlGYN)
Elicit Prediction (elicit.org/binary/questions/9Cci63Uso)

Non-technical factor questions

Elicit Prediction (elicit.org/binary/questions/5hAwMI-kO)
Elicit Prediction (elicit.org/binary/questions/X8CKfNAcM)

Operationalizations

Safety Questions

1. Will AGI cause an existential catastrophe?

• Existential catastrophe is defined here according to Toby Ord’s definition in the Precipice: “An event that causes extinction or the destruction of humanity’s long-term potential”.
• This assumes that everyone currently working on AI alignment continues to do so.

2. Will AGI cause an existential catastrophe without additional intervention from the AI Alignment research community?

• Roughly, the AI Alignment research community includes people working at CHAI, MIRI, current safety teams at OpenAI and DeepMind, FHI, AI Impacts, and similar orgs, as well as independent researchers writing on the AI Alignment Forum.
• “Without additional intervention” = everyone currently in this community stops working on anything directly intended to improve AI safety as of today, 11/20/2020. They may work on AI in a way that indirectly and incidentally improves AI safety, but only to the same degree as researchers outside of the AI alignment community are currently doing this.

4. Will there be an arms race dynamic in the lead-up to AGI?

• An arms race dynamic is operationalized as: 2 years before superintelligent AGI is built, there are at least 2 companies/projects/countries at the cutting edge, each within 2 years of each other's technology, who are competing and not collaborating.

5. Will a single AGI or AGI project achieve a decisive strategic advantage?

• This question uses Bostrom’s definition of decisive strategic advantage: “A level of technological and other advantages sufficient to enable it to achieve complete world domination” (Bostrom 2014).

6. Will > 50% of AGI researchers agree with safety concerns by 2030?

• “Agree with safety concerns” means: broadly understand the concerns of the safety community, and agree that there is at least one concern such that we have not yet solved it and we should not build superintelligent AGI until we do solve it (Rohin Shah’s operationalization from this post).

7. Will there be a 4 year interval in which world GDP doubles before the first 1 year interval in which world GDP doubles?

• This is essentially Paul Christiano's operationalization of the rate of development of AI from his post on Takeoff speeds. I've used this specific operationalization rather than "slow vs fast" or "continuous vs discontinuous" due to the ambiguity in how people use these terms.

8. Will AGI cause existential catastrophe conditional on there being a 4 year period in which world GDP doubles before a 1 year period in which it doubles?

• Uses the same definition of existential catastrophe as previous questions.

9. Will AGI cause existential catastrophe conditional on there being a 1 year period in which world GDP doubles without there first being a 4 year period in which it doubles?

• For example, we go from current growth rates to doubling within a year.
• Uses the same definition of existential catastrophe as previous questions.

Timelines Questions

9. Will we get AGI from deep learning with small variations, without more insights on a similar level to deep learning?

• An example would be something like GPT-N + RL + scaling.

10. Will we get AGI from 1-3 more insights on a similar level to deep learning?

• Self-explanatory.

11. Will we need > 3 breakthroughs on a similar level to deep learning to get AGI?

• Self-explanatory.

12. Before reaching AGI, will we hit a point where we can no longer improve AI capabilities by scaling?

• This includes: 1) We are unable to continue scaling, e.g. due to limitations on compute, dataset size, or model size, or 2) We can practically continue scaling but the increase in AI capabilities from scaling plateaus (see below).

13. Before reaching AGI, will we hit a point where we can no longer improve AI capabilities by scaling because we are unable to continue scaling?

• Self-explanatory.

14. Before reaching AGI, will we hit a point where we can no longer improve AI capabilities by scaling because the increase in AI capabilities from scaling plateaus?

• Self-explanatory.

Non-technical factor questions

15. Will we experience an existential catastrophe before we build AGI?

• Existential catastrophe is defined here according to Toby Ord’s definition in the Precipice: “An event that causes extinction or the destruction of humanity’s long-term potential”.
• This does not include events that would slow the progress of AGI development but are not existential catastrophes.

16. Will there be another AI Winter (a period commonly referred to as such) before we develop AGI?

• From Wikipedia: “In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research.”
• This question asks whether people will *refer* to a period as an AI winter; for example, whether Wikipedia and similar sources refer to it as a third AI winter.

Big thanks to Ben Pace, Rohin Shah, Daniel Kokotajlo, Ethan Perez, and Andreas Stuhlmüller for providing really helpful feedback on this post, and suggesting many of the operationalizations.

Discuss

### Zen and Rationality: Skillful Means

November 21, 2020 - 05:38
Published on November 21, 2020 2:38 AM GMT

This is post 5/? about the intersection of my decades of LW-style rationality practice and my several years of Zen practice.

In today's installment, I look at skillful means from a rationalist perspective.

As part of Zen practice, a teacher may use or encourage the use of many skillful means. Sometimes called expedient means or upaya, these may include things like encouraging a student to take up a particular meditation practice like breath counting or labeling, assigning a koan, or giving the student a job within the Zen center, like altar care or bell ringing. These are meant to aid the student in their practice of the way.

Importantly, the idea behind these means being "skillful" or "expedient" is that they are not necessarily practices a student will continue with forever, but rather things the student should do now that will help them. What may be useful for one student to do may not be useful for another, and what was once a useful practice for a student may later become a hindrance.

A good example is structured meditation practices, like breath counting. When a student begins meditating, they might find it very difficult to stay seated for 30 minutes, even during timed meditation periods sitting with other people in a zendo. Counting the breath gives them something to focus on and distracts the part of their mind that wants to get up and do anything else. Over hundreds of hours, they'll retrain themselves to be able to stay seated even when distracted, by gathering evidence that they can do it. Slowly the breath counting will stop being a skillful means and will instead become a hindrance and a distraction from just sitting, at which point they might move toward a different, less structured meditation practice, or spend less of their meditation time counting breaths. If they kept counting their breath for years after it was no longer necessary to get them through a sitting period, that would no longer be a skillful use of the means; but as long as the alternative is failing to keep sitting at all, and thus losing the chance to develop the skills that let them meditate more wholeheartedly, it remains skillful.

Rationality has its own version of skillful means via the practice of instrumental rationality, or systematically achieving one's ends. It's the art of finding ways to help one become stronger. It recognizes that you can't go straight to the ideal and convert oneself into a perfect Bayesian reasoner with infinite memory and thinking capacity, but instead must work with your fallible, human self to find tools and techniques that help you where you are now to get you incrementally closer to where you want to be.

It also recognizes that sometimes the skillful next step actually makes you temporarily worse off as you climb down from a local maximum in order to reach a different, higher one.

If you want to practice these skillful means, the CFAR handbook is a decent starting point for learning some of them, and a CFAR workshop is maybe an even better option. Further, many of the posts tagged Practical contain useful techniques to aid you in achieving your goals, as do posts tagged Rationality under the "Applied Topics" and "Techniques" groups of tags.

There's even some impressive overlap between rationalist and Zen skillful means. For example, there's the general act of noticing, be it noticing confusion or anything else, that's essential for studying the self in enough detail to work with it (and in Zen, to eventually forget it). There's trying things (famously Zen teachers may tell their student not to bother understanding something, but just to do it and see what happens). And focusing is, from a Zen perspective, one more useful means to reintegrating the heart-mind-body.

Importantly, both rationality and Zen acknowledge some version of the typical mind fallacy, carrying the realization that what's best for one person now is not necessarily what's best for them later, and that what works for one person may not work for another. Lucky for us we have so many skillful means to choose from on our journeys!

Discuss

### The Darwin Game - Rounds 21-500

November 21, 2020 - 03:58
Published on November 21, 2020 12:58 AM GMT

Rounds 20-30

EarlyBirdMimicBot takes the lead off the backs of the Clone Army.

Rounds 30-100

EarlyBirdMimicBot exhausts the clone army before the treaty expires in turn 90.

Rounds 100-500

From here on out it is a random walk. The bots with low populations die to variance. Congratulations to BeauBot and Insub's CooperateBot for making it this far!

Winners

Note: This is an alternate timeline. It is not the official tournament.

1. EarlyBirdMimicBot by Multicore
2. BendBot by Zvi
3. MeasureBot by Measure
4. LiamGoddard by Liam Goddard
Today's Obituary

| Bot | Team | Summary | Round |
| --- | --- | --- | --- |
| RaterBot | Chaos Army | Estimates opponent's aggression by counting the number of 3s, 2s, return 3s and return 2 instances in its source code. Then picks a strategy based off of that. | 21 |
| Copoperater | Chaos Army | Tit-for-tat, starting at 2. | 22 |
| RandomOrGreedyBot | Chaos Army | If the opponent averaged less than 2.5 over the last 100 turns then plays int(5 - opponent_avg). Otherwise randomly selects 3 or 2. | 24 |
| Silly TFT Bot 3 | NPCs | Plays tit-for-tat starting at 3. | 28 |
| Empiricist | Chaos Army | Performs the best strategy that would have worked against historical data. | 28 |
| CopyBot Deluxe | Chaos Army | Tit-for-tat. Picks starting value of 2 or 3 based off of round number. | 32 |
| Pure TFT | Chaos Army | "For the first round, play 2 or 3 with a 50/50 chance of each. For each subsequent round, play whatever the opponent played on the previous round." | 36 |
| Silly TFT Bot 2 | Chaos Army | Plays tit-for-tat starting at 2. | 40 |
| CloneBot | Clone Army | CloneBot. Died before the treaty broke. | 42 |
| jacobjacob-Bot | Norm Enforcers | Cooperates with Ben-Bot. | 42 |
| SimplePatternFinderBot | Chaos Army | Finds simple patterns. | 42 |
| Silly 2 Bot | NPCs | Always returns 2. | 43 |
| Winner against low constant bots | Chaos Army | Starts with 2. Then always returns 5 - opponent_previous_move. | 44 |
| Clone wars, episode return 3 | Clone Army | CloneBot. Died before the treaty broke. | 50 |
| a_comatose_squirrel | Clone Army | CloneBot. Died before the treaty broke. | 52 |
| CliqueZviBot | Clone Army | CloneBot. Died before the treaty broke. | 53 |
| incomprehensibot | Clone Army | CloneBot. Died before the treaty broke. | 53 |
| A Very Social Bot | Clone Army | CloneBot. Died before the treaty broke. | 57 |
| KarmaBot | Clone Army | CloneBot. Died before the treaty broke. | 58 |
| Akrasia Bot | Clone Army | CloneBot. Died before the treaty broke. | 60 |
| AttemptAtFair | Chaos Army | Oscillates between 3 and 2, starting with 3. | 95 |
| OscillatingTwoThreeBot | Chaos Army | "cooperates in the dumbest possible way" | 95 |
| Why can't we all just get along | Chaos Army | Doesn't negotiate with terrorists. Doesn't overly punish slackers. Attempts to establish steady tit-for-tat. | 98 |
| BeauBot | Chaos Army | A sophisticated bot with 528 lines. It picks one of 3 simple strategies based on its opponent's behavior. It also adjusts its behavior based on the round. | 113 |
| CooperateBot [Insub] | Chaos Army | Let MLM = my last move, OLM = opponent's last move. On the first turn, play 2. On subsequent turns: [Fork 1] If (MLM + OLM = 5), then play OLM. [Fork 2] Otherwise, flip a coin and play max(MLM, OLM) with 50% probability, and (5 - max(MLM, OLM)) with 50% probability. | 254 |
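For concreteness, the longest-surviving Chaos Army bot's strategy, CooperateBot [Insub], can be sketched as a single-turn move function (a sketch assuming a Darwin Game-style interface where each turn you see both players' previous moves; the function name is mine):

```python
import random

def cooperate_bot_move(my_last, opp_last):
    """One turn of the CooperateBot [Insub] strategy described above.

    my_last / opp_last are the moves from the previous turn, or None
    on the first turn.
    """
    if my_last is None:              # first turn: play 2
        return 2
    if my_last + opp_last == 5:      # coordinated last turn: keep the pattern
        return opp_last
    # Otherwise, randomly take the high or low side of a 5-split.
    high = max(my_last, opp_last)
    return high if random.random() < 0.5 else 5 - high
```

Once two such bots hit a (2, 3) split, the first fork locks them into alternating 2/3 forever, which is why the strategy cooperates so stably.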

This concludes the alternate timeline where AbstractSpyTreeBot was disqualified by mistake. The Mutant Game (the Blind Idiot God alternate timeline with multiple game engine bugs) will resume on November 23, 2020.

Discuss

### Working in Virtual Reality: A Review

November 21, 2020 - 02:14
Published on November 20, 2020 11:14 PM GMT

For the past three days, I have been experimenting with working in Virtual Reality. I'm quite impressed. My guess is that it's not good for most people yet, but that 1 to 10% of people reading this would gain a 2 to 20% increase in computer productivity by using a VR working setup. The upper end is for people who get distracted easily or have a difficult time with SAD.

This feels like the most radical change I've made to my setup so far, so I'm quite happy with how it's worked out. I used to dream of similar setups, and it's really cool that the technology is basically there. I've given demos to a few people in my house who hadn't been close to VR before, and their responses varied from fairly impressed to incredibly impressed.

I'm fairly convinced that there's an extremely promising future for work in VR. The VR ecosystem seems to be improving much more quickly than the alternatives. It strikes me as surprisingly possible that within 2 to 5 years, VR work setups will be the generally recommended work setups, at least for "people in the know."[1] This could both lead to direct improvements and lead the way for radical rethinkings of what work setups are possible.

My Setup

My specific setup is an Oculus Quest 2 ($300), a 2016 MacBook Pro, and the application Immersed VR. Immersed works over WiFi. My router is around 15 feet from my headset, and my computer is connected directly to the router via Ethernet. In the app I use two "monitors": I downscale a 4K monitor to 2048x1280 and use a side monitor of 1920x1080. It's suggested to keep resolutions rather low, both because the Oculus Quest 2 doesn't itself have a high resolution (1832×1920 per eye) and because higher resolution means higher latency. You can have up to five "virtual" monitors with Immersed, but I prefer one or two big ones.

This is me editing this post. My setup is pretty simple when writing. I have a second screen on the right, but I'm not using it at the moment. I typically have the main screen a bit closer to me, but zoomed it out to make this image more interesting.

I think I used this setup for around 5 working hours on Wednesday, 6 on Thursday, and maybe 2 so far today (but it's still early). It didn't seem to get particularly tiring over that time. I've been getting latency of around 5ms to 15ms, but every minute or so there are frustrating 1-5 second hiccups. It's possible to watch videos, but I have seen large decreases in frame rate from time to time. Immersed has instructions for using WiFi Direct to make things smoother. I've ordered the necessary module (around $25) and should be getting it shortly.

I'm not sure how long I'm going to continue using it. I find the Quest a bit uncomfortable to wear for long periods and sometimes a bit tiring for my eyes. I'm going to continue tinkering to try to make it better.

Benefits

Focus

I have a roommate now and find visual stimuli distracting. I'm also in a room that's a bit of a mess. I like having a lot of things (a lot of small experiments), and that makes it difficult to have a clean workspace.

VR setups can isolate away everything that's not the monitors. There's an option to see a keyboard, but I don't use it (I recommend spending the effort to not need it). There are a handful of decent virtual room options; on Immersed there seem to be a few that prominently feature space and space travel.

Light

LessWrong now has a full tag on lighting, with 6 popular posts on the topic. I've been considering setting up a system myself.

I'm not sure how best to measure the amount of light experienced in VR vs. the sun, but things seem relatively bright to me with the Quest. VR headsets use curved lenses and a dark environment to focus the LCD light on your eyes, unlike regular monitors, which are meant to be visible from any angle. So even a relatively small VR screen can produce more eye-lumens than something much larger. I recently purchased a 350-nit 4K monitor and found that it hasn't been quite enough for some parts of the day. With the VR headset, I often turn the monitor brightness down.

The only thread I could find on the topic was this one on Reddit, but it doesn't seem that great to me. I found this beginning of a scientific study on "VR for Seasonal Affective Disorder", but no completed version. I'd hypothesize that living mostly in VR would have some significant benefits for some people with significant SAD (if you're in VR, how does it even matter what the season is?), though I could imagine that it has some downsides too.

Ergonomics

VR headsets can be a bit heavy, but besides that can be highly ergonomic. In virtual environments you can configure screens to be anywhere you want them. I have a decent monitor arm that I still find suboptimal: I often have a hard time bringing my monitor exactly where I want it, so I move my neck to compensate (a bad idea!). It can also be fairly shaky when my desk is in standing mode. In VR I can easily position and reposition my monitors exactly where I want them, at the sizes I want them; it's great.

I've previously thought about trying to work while lying down, when my back was particularly sore. There are some intense $6k+ setups for this, and jury-rigged solutions can be quite awkward. With a VR headset you would still need some solution for positioning the keyboard, but the monitor issue is of course dramatically simplified. I tried reading a bit while lying down and it worked fine.

Portability

One of the worst things about monitors is that they are a pain to transport. They're quite large and heavy, and I've had a run of bad luck moving them without causing at least some damage. The way things are going, with a VR headset you could have a stellar setup anywhere at all, which is unheard of. Maybe outdoor setups on warm days would be possible, though of course you'd have to replace the visuals with some similar or superior theme on your device (you'd still get the sounds, scent, and breeze). Perhaps at some point laptops will forgo screens, or maybe all the hardware will live in the headset and you'd carry a separate mouse-keyboard combo.

Coworking

I haven't tried this yet, but apparently, you can cowork with Immersed. I believe you get the benefit(?) of being able to see the screens of other coworkers. The options are quite configurable depending on the program.

Coworking in VR has the obvious benefit of allowing people to live anywhere, but also the obvious cost of not being able to see people's faces. In Immersed there is one feature where you can have a "digital webcam" that uses an avatar of you in a format that's accessible for online video chats in Google Meet and similar. It's neat but fairly basic.

Facebook has an impressive demo of Photorealistic Face Tracked Avatars, but I imagine it won't be released for a while.

Negatives

Resolution & Latency

As mentioned, the resolution is rather poor compared to modern monitors. The latency is significantly worse, though WiFi Direct should help, and Windows setups with direct connections should be fine. This seems quite bad for high-bandwidth tasks like video editing or video games, but usable for typing and a lot of coding.

Discomfort

VR headsets are still a bit uncomfortable to wear for long periods. I imagine this will improve a lot over time; I expect future prototypes to look a lot like sunglasses. Apple is apparently getting into the space, so I imagine their take will be particularly lightweight.

Facebook

The Quest 2 requires a Facebook login, and the operating system is heavily integrated with Facebook. To share a screenshot of my in-game setup I actually had to post it to my Facebook wall, then copy and paste that image. In general the on-headset OS is usable but quite basic.

Other Discussions

There are a few neat videos of people showing off their VR office setups:

• This one is a nice overview of Immersed, though it's about a year old.
• This video shows off the Immersed webcam feature.
• This one shows off Virtual Desktop with a wired connection.
• Facebook is working on "Infinite Office" which seems interesting but isn't yet available. It at least demonstrates their optimism and dedication to the area. It's pretty easy for me to imagine it being better than Immersed after it launches.

Here's a discussion of someone who didn't find working in VR particularly usable, in part because they needed to see the keyboard and apparently had a lot of in-person distractions.

The Immersed Blog is interesting, though short and biased. They claim that their team works for 8+ hours a day in VR, and point out that apparently, some users reported using VR to effectively live in different time zones.

There's an Immersed Discord and it has most of the discussion I've seen from actual users. The setup is highly biased to favor positive messages, but there is a long list of very enthusiastic users. Generally, people are most positive about the focus benefits and the use of extra monitors. There seems to be almost no discussion from users who have used it for collaboration; most have used it solo.

Conclusion

Working in VR is clearly in its "early days", but it's definitely happening. There seem to be at least dozens of people working full-time in VR at this point, most having started in the last ~2 years. The technology is already quite inexpensive and usable. The advantages going forward are numerous and significant.

I'd expect the VR headsets coming out this next year to continue to get better, so waiting a while is a safe option. But I suggest keeping an eye out and planning accordingly. If you've been thinking about buying a fancy monitor setup or SAD light setup, you might want to reevaluate.

[1] By this, I mean what I and many smart startups would recommend. Often very good ideas take a long time to become popular. Popularity seems harder to predict than quality.

Discuss

### Why is there a "clogged drainpipe" effect in idea generation?

20 ноября, 2020 - 22:08
Published on November 20, 2020 7:08 PM GMT

I've noticed something curious about creativity. (Like when doing the Babble challenges, or  just when thinking in general.)

I have a bad idea. ("Send an object to the moon using a bird in a spacesuit.") I don't write it down, instead trying to generate other ideas. Yet the bad idea keeps lingering in my mind. Eventually, I say "okay, fine, I will write you down!" and suddenly there's a sense of flow and relaxation, as though new space has opened up for other ideas.

Ed Sheeran describes something similar:

If I was advising anyone to start songwriting, I’d just say: write anything that comes into your head. Anything. Just get stuff out there. I kinda treat songwriting as a dirty tap in an old house. When you run a dirty tap, it’s clogged up, bit of dirt in it, bits of mud. You run it, mud’s going to come out, a little bit of water, a bit more mud, and then suddenly it’s going to start flowing clean water.

You’ve got to unclog the pipes when you’re songwriting. So just start writing bad songs. Write songs for the sake of it. Write a song a day. Just sit down, pick a chord, write a song, get that song out of you, and the more and more you do it, the more and more you unclog the pipes.

I’d say, if you want to start songwriting, just start. Don’t worry too much about writing the best songs possible. If anyone wants to listen to growth, go on YouTube, type in “Ed Sheeran Addicted” and you’ll hear a song that I did when I was 12. And the singing is dreadful, the songwriting is dreadful, the guitar playing is dreadful, and now I’m here talking to you as a professional musician.

When I did the "100 ways to light a candle" Babble challenge yesterday, something similar happened. I had the lingering idea of "I could ask someone else to light the candle", but also thought "That's a boring answer." Yet it kept lingering. Finally, I wrote down "There's a general class of strategies around getting the help of other people". And suddenly that unblocked a whole range of similar ideas -- some of which I really liked, like:

"Destroy a bunch of other light sources in the world, thereby strongly increasing the incentive for others to light this candle"

or

"Go around the neighbourhood, finding a bunch of lit candles, place them next to you in a beautiful pattern EXCEPT that in a symmetry-breaking place is your candle. Wait until someone fixes it."

It was as though the initial thought was a bunch of junk clogging the pipe. But by reifying the idea that this was an entire class of strategies, and writing the boring answer down, I suddenly opened the way for the water to flow freely through the rest of the pipe.

---

What on earth is going on here? Why would minds work like this? What does it tell us about the structure of human cognition?

---

Also, I'm really excited about LessWrong's new prediction feature, so let's make some use of it:

Elicit Prediction (elicit.org/binary/questions/yEllaf__E) Elicit Prediction (elicit.org/binary/questions/KqYMGCk7) Elicit Prediction (elicit.org/binary/questions/4nYMj)

Discuss

### Embedded Interactive Predictions on LessWrong

20 ноября, 2020 - 21:35
Published on November 20, 2020 6:35 PM GMT

Ought and LessWrong are excited to launch an embedded interactive prediction feature. You can now embed binary questions into LessWrong posts and comments. Hover over the widget to see other people’s predictions, and click to add your own.

Try it out

Elicit Prediction (elicit.org/binary/questions/qqEklFgQG)

How to use this

Create a question
1. Go to elicit.org/binary and create your question by typing it into the field at the top
2. Click on the question title, and click the copy URL button
3. Paste the URL into your LW post or comment. It'll look like this in the editor:
Make a prediction
1. Click on the widget to add your own prediction
2. Click on your prediction line again to delete it
Motivation

We hope embedded predictions can prompt readers and authors to:

1. Actively engage with posts. By making predictions as they read, people have to stop and think periodically about how much they agree with the author.
2. Distill claims. For writers, integrating predictions challenges them to think more concretely about their claims and how readers might disagree.
3. Communicate uncertainty. Rather than just stating claims, writers can also communicate a confidence level.
4. Collect predictions. As a reader, you can build up a personal database of predictions as you browse LessWrong.
5. Get granular feedback. Writers can get feedback on their content at a more granular level than comments or upvotes.

By working with LessWrong on this, Ought hopes to make forecasting easier and more prevalent. As we learn more about how people think about the future, we can use Elicit to automate larger parts of the workflow and thought process until we end up with end-to-end automated reasoning that people endorse. Check out our blog post to see demos and more context.

Some examples of how to use this
1. To make specific predictions, like in Zvi’s post on COVID predictions
2. To express credences on claims like those in Daniel Kokotajlo’s soft takeoff post
3. Beyond LessWrong – if you want to integrate this into your blog or have other ideas for places you’d want to use this, let us know!

Discuss

### Persuasion Tools: AI takeover without AGI or agency?

20 ноября, 2020 - 19:54
Published on November 20, 2020 4:54 PM GMT

[epistemic status: speculation]

I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

--Wei Dai

What if most people already live in that world? A world in which taking arguments at face value is not a capacity-enhancing tool, but a security vulnerability? Without trusted filters, would they not dismiss highfalutin arguments out of hand, and focus on whether the person making the argument seems friendly, or unfriendly, using hard to fake group-affiliation signals?

--Benquo

1. AI-powered memetic warfare makes all humans effectively insane.

--Wei Dai, listing nonstandard AI doom scenarios

This post speculates about persuasion tools—how likely they are to get better in the future relative to countermeasures, what the effects of this might be, and what implications there are for what we should do now.

To avert eye-rolls, let me say up front that I don’t think the world is likely to be driven insane by AI-powered memetic warfare. I think progress in persuasion tools will probably be gradual and slow, and defenses will improve too, resulting in an overall shift in the balance that isn’t huge: a deterioration of collective epistemology, but not a massive one. However, (a) I haven’t yet ruled out more extreme scenarios, especially during a slow takeoff, and (b) even small, gradual deteriorations are important to know about. Such a deterioration would make it harder for society to notice and solve AI safety and governance problems, because it is worse at noticing and solving problems in general. Such a deterioration could also be a risk factor for world war three, revolutions, sectarian conflict, terrorism, and the like. Moreover, such a deterioration could happen locally, in our community or in the communities we are trying to influence, and that would be almost as bad. Since the date of AI takeover is not the day the AI takes over, but the point it’s too late to reduce AI risk, these things basically shorten timelines.

Six examples of persuasion tools

Analyzers: Political campaigns and advertisers already use focus groups, A/B testing, demographic data analysis, etc. to craft and target their propaganda. Imagine a world where this sort of analysis gets better and better, and is used to guide the creation and dissemination of many more types of content.

Feeders: Most humans already get their news from various “feeds” of daily information, controlled by recommendation algorithms. Even worse, people’s ability to seek out new information and find answers to questions is also to some extent controlled by recommendation algorithms: Google Search, for example. There’s a lot of talk these days about fake news and conspiracy theories, but I’m pretty sure that selective/biased reporting is a much bigger problem.

Chatbot: Thanks to recent advancements in language modeling (e.g. GPT-3) chatbots might become actually good. It’s easy to imagine chatbots with millions of daily users continually optimized to maximize user engagement--see e.g. Xiaoice. The systems could then be retrained to persuade people of things, e.g. that certain conspiracy theories are false, that certain governments are good, that certain ideologies are true. Perhaps no one would do this, but I’m not optimistic.

Coach: A cross between a chatbot, a feeder, and an analyzer. It doesn’t talk to the target on its own, but you give it access to the conversation history and everything you know about the target and it coaches you on how to persuade them of whatever it is you want to persuade them of.

Drugs: There are rumors of drugs that make people more suggestible, like scopolamine. Even if these rumors are false, it's not hard to imagine new drugs being invented that have a similar effect, at least to some extent. (Alcohol, for example, seems to lower inhibitions. Other drugs make people more creative, etc.) Perhaps these drugs by themselves would not be enough, but would work in combination with a Coach or Chatbot. (You meet the target for dinner, and slip some drug into their drink. It is mild enough that they don't notice anything, but it primes them to be more susceptible to the ask you've been coached to make.)

Imperius Curse: These are a kind of adversarial example that gets the target to agree to an ask (or even switch sides in a conflict!), or adopt a belief (or even an entire ideology!). Presumably they wouldn’t work against humans, but they might work against AIs, especially if meme theory applies to AIs as it does to humans. The reason this would work better against AIs than against humans is that you can steal a copy of the AI and then use massive amounts of compute to experiment on it, finding exactly the sequence of inputs that maximizes the probability that it’ll do what you want.
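The search process gestured at here is just black-box optimization over inputs: query the stolen copy many times and keep whatever input it scores as most likely to produce compliance. A minimal sketch, with everything hypothetical (`target_model` stands in for the stolen copy, treated as a function from an input to a compliance probability):

```python
import random

def find_persuasive_input(target_model, candidate_pool, n_trials=10_000, rng=random):
    """Naive black-box search: sample candidate inputs and keep the one
    the (freely queryable) model copy scores highest."""
    best_input, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = rng.choice(candidate_pool)
        score = target_model(candidate)  # assumed: probability of compliance
        if score > best_score:
            best_input, best_score = candidate, score
    return best_input, best_score
```

Real attacks would use gradient-based or evolutionary search rather than random sampling, but the point is the same: with a copy of the target and enough compute, the attacker gets unlimited cheap experiments that no human target ever grants.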

We might get powerful persuasion tools prior to AGI

The first thing to point out is that many of these kinds of persuasion tools already exist in some form or another. And they’ve been getting better over the years, as technology advances. Defenses against them have been getting better too. It’s unclear whether the balance has shifted to favor these tools, or their defenses, over time. However, I think we have reason to think that the balance may shift heavily in favor of persuasion tools, prior to the advent of other kinds of transformative AI. The main reason is that progress in persuasion tools is connected to progress in Big Data and AI, and we are currently living through a period of rapid progress in those things, which will probably continue (and possibly accelerate) prior to AGI.

However, here are some more specific reasons to think persuasion tools may become relatively more powerful:

Substantial prior: Shifts in the balance between things happen all the time. For example, the balance between weapons and armor has oscillated at least a few times over the centuries. Arguably persuasion tools got relatively more powerful with the invention of the printing press, and again with radio, and now again with the internet and Big Data. Some have suggested that the printing press helped cause religious wars in Europe, and that radio assisted the violent totalitarian ideologies of the early twentieth century.

Consistent with recent evidence: A shift in this direction is consistent with the societal changes we’ve seen in recent years. The internet has brought with it many inventions that improve collective epistemology, e.g. google search, Wikipedia, the ability of communities to create forums... Yet on balance it seems to me that collective epistemology has deteriorated in the last decade or so.

Lots of room for growth: I’d guess that there is lots of “room for growth” in persuasive ability. There are many kinds of persuasion strategy that are tricky to use successfully. Like a complex engine design compared to a simple one, these strategies might work well, but only if you have enough data and time to refine them and find the specific version that works at all, on your specific target. Humans never have that data and time, but AI+Big Data does, since it has access to millions of conversations with similar targets. Persuasion tools will be able to say things like "In 90% of cases where targets in this specific demographic are prompted to consider and then reject the simulation argument, and then challenged to justify their prejudice against machine consciousness, the target gets flustered and confused. Then, if we make empathetic noises and change the subject again, 50% of the time the subject subconsciously changes their mind so that when next week we present our argument for machine rights they go along with it, compared to 10% baseline probability."

Plausibly pre-AGI: Persuasion is not an AGI-complete problem. Most of the types of persuasion tools mentioned above already exist, in weak form, and there’s no reason to think they can’t gradually get better well before AGI. So even if they won't improve much in the near future, plausibly they'll improve a lot by the time things get really intense.

Language modelling progress: Persuasion tools seem to be especially benefitted by progress in language modelling, and language modelling seems to be making even more progress than the rest of AI these days.

More things can be measured: Thanks to said progress, we now have the ability to cheaply measure nuanced things like user ideology, enabling us to train systems towards those objectives.

Chatbots & Coaches: Thanks to said progress, we might see some halfway-decent chatbots prior to AGI. Thus an entire category of persuasion tool that hasn’t existed before might come to exist in the future. Chatbots too stupid to make good conversation partners might still make good coaches, by helping the user predict the target’s reactions and suggesting possible things to say.

Minor improvements still important: Persuasion doesn’t have to be perfect to radically change the world. An analyzer that helps your memes have a 10% higher replication rate is a big deal; a coach that makes your asks 30% more likely to succeed is a big deal.
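To see why, note that a replication-rate edge compounds across generations of sharing. A toy calculation (illustrative numbers of my own, not from any source):

```python
def relative_prevalence(r_baseline, r_boosted, generations):
    """How much more prevalent a boosted meme becomes relative to a
    baseline meme, assuming both start equally common and per-generation
    replication rates compound multiplicatively."""
    return (r_boosted / r_baseline) ** generations

# A 10% per-generation edge compounds to roughly 2.6x after 10
# generations and roughly 17x after 30.
```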

Faster feedback: One way defenses against persuasion tools have strengthened is that people have grown wise to them. However, the sorts of persuasion tools I’m talking about seem to have significantly faster feedback loops than the propagandists of old; they can learn constantly, from the entire population, whereas past propagandists (if they were learning at all, as opposed to evolving) relied on noisier, more delayed signals.

Overhang: Finding persuasion drugs is costly, immoral, and not guaranteed to succeed. Perhaps this explains why it hasn’t been attempted outside a few cases like MKULTRA. But as technology advances, the cost goes down and the probability of success goes up, making it more likely that someone will attempt it, and giving them an “overhang” with which to achieve rapid progress if they do. (I hear that there are now multiple startups built around using AI for drug discovery, by the way.) A similar argument might hold for persuasion tools more generally: We might be in a “persuasion tool overhang” in which they have not been developed for ethical and riskiness reasons, but at some point the price and riskiness drops low enough that someone does it, and then that triggers a cascade of more and richer people building better and better versions.

Speculation about effects of powerful persuasion tools

Here are some hasty speculations, beginning with the most important one:

Ideologies & the biosphere analogy:

The world is, and has been for centuries, a memetic warzone. The main factions in the war are ideologies, broadly construed. It seems likely to me that some of these ideologies will use persuasion tools--both on their hosts, to fortify them against rival ideologies, and on others, to spread the ideology.

Consider the memetic ecosystem--all the memes replicating and evolving across the planet. Like the biological ecosystem, some memes are adapted to, and confined to, particular niches, while other memes are widespread. Some memes are in the process of gradually going extinct, while others are expanding their territory. Many exist in some sort of equilibrium, at least for now, until the climate changes. What will be the effect of persuasion tools on the memetic ecosystem?

For ideologies at least, the effects seem straightforward: The ideologies will become stronger, harder to eradicate from hosts and better at spreading to new hosts. If all ideologies got access to equally powerful persuasion tools, perhaps the overall balance of power across the ecosystem would not change, but realistically the tools will be unevenly distributed. The likely result is a rapid transition to a world with fewer, more powerful ideologies. They might be more internally unified, as well, having fewer spin-offs and schisms due to the centralized control and standardization imposed by the persuasion tools. An additional force pushing in this direction is that ideologies that are bigger are likely to have more money and data with which to make better persuasion tools, and the tools themselves will get better the more they are used.

Recall the quotes I led with:

... At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

--Wei Dai

What if most people already live in that world? A world in which taking arguments at face value is not a capacity-enhancing tool, but a security vulnerability? Without trusted filters, would they not dismiss highfalutin arguments out of hand … ?

--Benquo

1. AI-powered memetic warfare makes all humans effectively insane.

--Wei Dai, listing nonstandard AI doom scenarios

I think the case can be made that we already live in this world to some extent, and have for millennia. But if persuasion tools get better relative to countermeasures, the world will become more like this.

This seems to me to be an existential risk factor. It’s also a risk factor for lots of other things, for that matter. Ideological strife can get pretty nasty (e.g. religious wars, gulags, genocides, totalitarianism), and even when it doesn’t, it still often gums things up (e.g. suppression of science, zero-sum mentality preventing win-win-solutions, virtue signalling death spirals, refusal to compromise). This is bad enough already, but it’s doubly bad when it comes at a moment in history where big new collective action problems need to be recognized and solved.

Obvious uses: Advertising, scams, propaganda by authoritarian regimes, etc. will improve. This means more money and power to those who control the persuasion tools. Maybe another important implication would be that democracies would have a major disadvantage on the world stage compared to totalitarian autocracies. One of many reasons for this is that scissor statements and other divisiveness-sowing tactics may not technically count as persuasion tools but they would probably get more powerful in tandem.

Will the truth rise to the top: Optimistically, one might hope that widespread use of more powerful persuasion tools will be a good thing, because it might create an environment in which the truth “rises to the top” more easily. For example, if every side of a debate has access to powerful argument-making software, maybe the side that wins is more likely to be the side that’s actually correct. I think this is a possibility but I do not think it is probable. After all, it doesn’t seem to be what’s happened in the last two decades or so of widespread internet use, big data, AI, etc. Perhaps, however, we can make it true for some domains at least, by setting the rules of the debate.

Data hoarding: A community’s data (chat logs, email threads, demographics, etc.) may become even more valuable. It can be used by the community to optimize their inward-targeted persuasion, improving group loyalty and cohesion. It can be used against the community if someone else gets access to it. This goes for individuals as well as communities.

Chatbot social hacking viruses: Social hacking is surprisingly effective. The classic example is calling someone pretending to be someone else and getting them to do something or reveal sensitive information. Phishing is like this, only much cheaper (because automated) and much less effective. I can imagine a virus that is close to as good as a real human at social hacking while being much cheaper and able to scale rapidly and indefinitely as it acquires more compute and data. In fact, a virus like this could be made with GPT-3 right now, using prompt programming and “mothership” servers to run the model. (The prompts would evolve to match the local environment being hacked.) Whether GPT-3 is smart enough for it to be effective remains to be seen.

Implications

I doubt that persuasion tools will improve discontinuously, and I doubt that they’ll improve massively. But minor and gradual improvements matter too.

Of course, influence over the future might not disappear all on one day; maybe there’ll be a gradual loss of control over several years. For that matter, maybe this gradual loss of control began years ago and continues now...

I think this is potentially (5% credence) the new Cause X, more important than (traditional) AI alignment even. It probably isn’t. But I think someone should look into it at least, more thoroughly than I have.

To be clear, I don’t think it’s likely that we can do much to prevent this stuff from happening. There are already lots of people raising the alarm about filter bubbles, recommendation algorithms, etc. so maybe it’s not super neglected and maybe our influence over it is small. However, at the very least, it’s important for us to know how likely it is to happen, and when, because it helps us prepare. For example, if we think that collective epistemology will have deteriorated significantly by the time crazy AI stuff starts happening, that influences what sorts of AI policy strategies we pursue.

Note that if you disagree with me about the extreme importance of AI alignment, or if you think AI timelines are longer than mine, or if you think fast takeoff is less likely than I do, you should all else equal be more enthusiastic about investigating persuasion tools than I am.

Thanks to Katja Grace, Emery Cooper, Richard Ngo, and Ben Goldhaber for discussions and feedback on these ideas.

Related previous work:

Epistemic Security report

Aligning Recommender Systems

Stuff I’d read if I was investigating this in more depth:

Not Born Yesterday

The stuff here and here

Discuss

### Hiding Complexity

November 20, 2020 - 19:35
Published on November 20, 2020 4:35 PM GMT

1. The Principle

Suppose you have some difficult cognitive problem you want to solve. What is the difference between (1) making progress on the problem by thinking about it for an hour and (2) solving a well-defined subproblem whose solution is useful for the entire problem?

(Finding a good characterization of the 'subproblem' category is important for Factored Cognition, but for [this post minus the last chapter], you can think of it purely as a problem of epistemic rationality and human thinking.)

I expect most people to share the intuition that there is a difference. However, the question appears ill-defined on second glance. 'Making progress' has to cash out as learning things you didn't know before, and it's unclear how that isn't 'solving subproblems'. Whatever you learned could probably be considered the solution to some problem.

If we accept this, then both (1) and (2) technically involve solving subproblems. Nonetheless, we would intuitively talk about subproblems in (2) and not in (1). Can we characterize this difference formally? Is there a well-defined, low-level quantity such that our intuition as to whether we would call a bundle of cognitive work a 'subproblem' corresponds to the size of this quantity? I think there is. If you want, take a minute to think about it yourself; I've put my proposed solution into spoilers.

I think the quantity is the length of the subproblem's solution, where by "solution", I mean "the information about the subproblem relevant for solving the entire problem".

As an example, suppose the entire problem is "figure out the best next move in a chess game". Let's contrast (1) and (2):

• (1) was someone thinking about this for an hour. The 'solution' here consists of everything she learns throughout that time, which may include many different ideas, insights about possible moves, and resolved confusions about the game state. There is probably no way to summarize all that information briefly.

• (2) was solving a well-defined subproblem. An example here is, "figure out how good Be5 is".[1] If the other side can force checkmate in four turns given that move, then the entire solution to this subproblem is the three-word statement "Be5 is terrible".
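To make the contrast concrete, here is a minimal sketch (all names invented, the chess search elided) of how the work done inside a subproblem can be enormous while the solution that flows back to the parent problem stays tiny:

```python
def evaluate_move(move: str) -> str:
    """Solve the subproblem 'how good is this move?'.

    Internally this might explore thousands of positions (elided here),
    but the *solution* -- the information relevant to the parent
    problem -- compresses to a short verdict.
    """
    # Stand-in for an expensive search; assume it finds that the
    # opponent can force checkmate four turns after this move.
    leads_to_forced_mate_against_us = True
    if leads_to_forced_mate_against_us:
        return f"{move} is terrible"
    return f"{move} is playable"

print(evaluate_move("Be5"))  # -> "Be5 is terrible"
```

An hour of undirected thinking has no analogous short return value: everything learned stays relevant, so nothing compresses away.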

2. The Software Analogy

Before we get to why I think the principle matters, let's try to understand it better. I think the analogy to software design is helpful here.

Suppose a company wants to design some big project that will take about 900k (i.e., 900000) lines of code. How difficult is this? Here is a naive calculation:

An amateur programmer with Python can write a 50-line procedure without bugs in an hour, which suggests a total time requirement of 18k hours. Thus, a hundred amateur programmers working 30 hours a week can write the project in six weeks.

I'm not sure how far this calculation is off, but I think it's at least a factor of 20. This suggests that linear extrapolation doesn't work, and the reason for this is simple: as the size of the project goes up, not only is there more code to implement, but every piece of code becomes harder because the entire project is more complex. There are more dependencies, more sources of error, and so forth.

This is where decompositions come in. Suppose the entire project can be visualized like this, where black boxes denote components (corresponding to pieces of code) and edges dependencies between components.

This naturally factors into three parts. Imagine you're head of the team tasked with implementing the bottom-left part. You can look at your job like this:

(An 'interface' is purely a specification of the relationship, so the ellipses are each less than one black box.)

Your team still has to implement 300k lines of code, but regardless of how difficult this is, it's only marginally harder than implementing a project that consists entirely of 300k lines. In the step from 300k to 900k, the cost actually does scale almost linearly.[2]

As said at the outset, I'm talking about this not to make a point about software design but as an analogy to the topic of better and worse decompositions. In the analogy, the entire problem is coding the 900k line system, the subproblems are coding the three parts, and the solutions to the second and third part are the interfaces.

I think this illustrates both why the mechanism is important and how exactly it works.

For the 'why', imagine the decomposition were a lot worse. In this case, there's a higher overhead for each team, ergo higher overall cost. This has a direct analog in the case where a person is thinking about a problem on her own: the more complex the solutions to subproblems are, the harder it becomes for her to apply them to the entire problem. We are heavily bottlenecked by our ability to think about several things at once, so this can make a massive difference.

For the 'how', notice that, while the complexity of the entire system trivially grows with its size, the task of programming it can ideally be kept simple (as in the case above), and this is done by hiding complexity. From the perspective of your team (previous picture), almost the entire complexity of the remaining project is hidden: it's been reduced to two simple, well-defined interfaces.

This mechanism is the same in the case where someone is working on a problem by herself: if she can carve out subproblems, and if those subproblems have short solutions, it dramatically reduces the perceived complexity of the entire problem. In both cases, we can think of the quality of a decomposition as the total amount of complexity it hides.[3]
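The software version of this mechanism can be sketched in a few lines (the class and method names below are invented for illustration): the caller sees only a narrow interface, and the complexity behind it never enters their view.

```python
# A minimal sketch of how an interface hides complexity.
# A team consuming this subsystem only needs the short contract below;
# the hundreds of thousands of lines behind it are irrelevant to them.

class PaymentInterface:
    """The agreed-upon boundary between two teams."""

    def charge(self, user_id: int, cents: int) -> bool:
        # Behind this call could be 300k lines: retries, fraud checks,
        # ledger updates... none of it leaks into the caller's view.
        return cents > 0  # stand-in for the real implementation

# From the caller's perspective, the whole subsystem has collapsed
# into one short, well-defined "solution":
ok = PaymentInterface().charge(user_id=42, cents=500)
print(ok)  # True
```

The quality of the decomposition is exactly how much sits behind that boundary: the more lines the interface hides per word of its specification, the more complexity it hides.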

3. Human Learning

I've come to view human learning primarily under the lens of hiding complexity. The world is extremely complicated; the only way to navigate it is to view it on many different layers of abstraction, such that each layer describes reality in a way that hides 99%+ of what's really going on. Something as complex as going grocery shopping is commonly reduced to an interface that only models time requirement and results.

Abstractly, here is the principled argument as to why we know this is happening:

1. Thinking about a lot of things at once feels hard.
2. Any topic you understand well feels easy.
3. Therefore, any topic you understand well doesn't depend on a lot of things in your internal representation (i.e., in whatever structure your brain uses to store information).
4. However, many topics do, in fact, depend on a lot of things.
5. This implies your internal representation is hiding complexity.

As an example, consider preparing a presentation, which you can represent at several levels of abstraction:

• At the highest level, you might think solely about the amount of time you have left to do it; the complexity of how to do it is hidden.
• One level lower, you might think about (1) creating the slides and (2) practicing the speaking part; the complexity of how to do either is hidden.
• One level lower, you might think about (1) what points you want to make throughout your presentation and (2) in what order do you want to make those points; the complexity of how to turn a point into a set of slides is hidden.
• One level lower, you might think about what slides you want for each major point; the complexity of how to create each individual slide is hidden.
• Et cetera.

In absolute terms, preparing a presentation is hard. It requires many different actions that must be carried out with a lot of precision for them to work. Nonetheless, the process of preparing it probably feels easy all the way because every level hides a ton of complexity. This works because you understand the process well: you know what levels of abstraction to use, and how and when to transition between them.

The extreme version of this view (which I'm not arguing for) is that learning is almost entirely about hiding complexity. When you first hear of some new concept, it sounds all complicated and like it has lots of moving parts. When you successfully learned it, the complexity is hidden, and when the complexity is hidden, you have learned it. Given that humans can only think about a few things at the same time, this process only bottoms out on exceedingly simple tasks. Thus, under the extreme view, it's not turtles all the way down, but pretty far down. For the most part, learning just is representing concepts such that complexity is hidden.

I once wrote a tiny post titled 'We tend to forget complicated things'. The observation was that, if you stop studying a subject when it feels like you barely understand it, you will almost certainly forget about it in time (and my conclusion was that you should always study until you think it's easy). This agrees with the hiding complexity view: if something feels complicated, it's a sign that you haven't yet decomposed it such that complexity is hidden at every level, and hence haven't learned it properly. Under this view, 'learning complicated things' is almost an oxymoron: proper learning must involve making things feel not-complicated.

It's worth noting that this principle appears to apply even for memorizing random data, at least to some extent, even though you might expect pure memorization to be a counter-example.

There is also this lovely pie chart, which makes the same observation for mathematics:

That is, math is not inherently complicated; only the parts that you haven't yet represented in a nice, complexity-hiding manner feel complicated. Once you have mastered a field, it feels wonderfully simple.

4. Factored Cognition

As mentioned at the outset, characterizing subproblems is important for Factored Cognition. Very briefly, Factored Cognition is about decomposing a problem into smaller problems. In one setting, a human has access to a model that is similar to herself, except (1) slightly dumber and (2) much faster (i.e., it can answer questions almost instantly).

The hope is that this combined system (of the human who is allowed to use the model as often as she likes) is more capable than either the human or the model by themselves, and the idea is that the human can amplify performance by decomposing big problems into smaller problems, letting the model solve the small problems, and using its answers to solve the big problem.

There are a ton of details to this, but most of them don't matter for our purposes.[4] What does matter is that the model has no memory and can only give short answers. This means that the human can't just tell it 'make progress on the problem', 'make more progress on the problem' and so on, but instead has to choose subproblems whose solutions can be described in a short message.
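The constraint described above can be sketched as a toy loop (all names and the canned answers are invented; "model" stands for any stateless question-answerer whose replies are capped at a short length):

```python
MAX_ANSWER_CHARS = 60  # the model can only give short answers

def model(question: str) -> str:
    """Stand-in for the stateless model: no memory between calls."""
    canned = {
        "How good is Be5?": "Be5 is terrible",
        "How good is Nf3?": "Nf3 is solid",
    }
    answer = canned.get(question, "unknown")
    return answer[:MAX_ANSWER_CHARS]  # short answers enforced

def solve_big_problem(subquestions: list[str]) -> str:
    """The human decomposes the big problem into subproblems whose
    solutions fit in a short message, then combines the answers."""
    answers = [model(q) for q in subquestions]
    return "; ".join(answers)

print(solve_big_problem(["How good is Be5?", "How good is Nf3?"]))
# -> "Be5 is terrible; Nf3 is solid"
```

Because the model keeps no state, "make more progress" is not a usable request; only subproblems with compressible solutions can cross the interface.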

An unexpected takeaway from thinking about this is that I now view Factored Cognition as intimately related to learning in general, the reason being that both share the goal of choosing subproblems whose solutions are as short as possible:

• In the setting I've described for Factored Cognition, this is immediate from the fact that the model can't give long answers.
• For learning, this is what I've argued in this post. (Note that optimizing subproblems to minimize the length of their solutions is synonymous with optimizing them to maximize their hidden complexity.)

In other words, Factored Cognition primarily asks you to do something that you want to do anyway when learning about a subject. I've found that better understanding the relationship between the two has changed my thinking about both of them.

(This post has been the second of two prologue posts for an upcoming sequence on Factored Cognition. I've posted them as stand-alone because they make points that go beyond that topic. This won't be true for the remaining sequence, which will be narrowly focused on Factored Cognition and its relevance for Iterated Amplification and Debate.)

1. Be5 is "move the bishop to square E5". ↩︎

2. One reason why this doesn't reflect reality is that real decompositions will seldom be as good; another is that coming up with the decomposition is part of the work (and by extension, part of the cost). Note that, even in this case, the three parts all need to be decomposed further, which may not work as well as the first decomposition did. ↩︎

3. In software design, the term 'modularity' describes something similar, but it is not a perfect match. Wikipedia defines it as "a logical partitioning of the 'software design' that allows complex software to be manageable for the purpose of implementation and maintenance". ↩︎

4. After all, this is a post about hiding complexity! ↩︎

Discuss

### Epistemic Progress

November 20, 2020 - 19:08
Published on November 20, 2020 4:08 PM GMT

Epistemic Status: Cautiously optimistic. Much of this work is in crafting and advancing terminology in ways that will hopefully be intuitive and useful. I’m not too attached to the specifics but hope this could be useful for future work in the area.

Introduction

Strong epistemics or “good judgment” clearly seems valuable, so it’s interesting that it gets rather little Effective Altruist attention as a serious contender for funding and talent. I think this might be a mistake.

This isn’t to say that epistemics haven’t been discussed. Leaders and community members on LessWrong and the EA Forum have written extensively on epistemic rationality, “good judgment”, decision making, and so on. These communities seem to have a particular interest in “good epistemics.”

But for all the blog posts on the topic, there is less in terms of long-term and full-time efforts. We don’t have teams outlining lists of possible large-scale epistemic interventions and estimating their cost-effectiveness, like an epistemics version of the Happier Lives Institute. We don’t have a Global Priorities Institute equivalent trying to formalize and advance the ideas from The Sequences. We have very little work outlining what optimistic epistemic scenarios we could hope for 10 to 200 years from now.

I intend to personally spend a significant amount of time on these issues going forward. I have two main goals. One is to better outline what I think work in this area could look like and how valuable it might be to pursue. The second is to go about doing work in this area in ways that both test the area and hopefully help lay the groundwork that makes it easier for more people to join in.

One possible reason for a lack of effort in the space is that the current naming and organization is a bit of a mess. We have a bundle of related terms without clear delineations. I imagine that if I asked different people how they would differentiate “epistemics”, “epistemic rationality”, “epistemology”, “decision making”, “good judgment”, “rationality”, “good thinking”, and the many subcategories of these things, I’d get many conflicting and confused answers. So some of my goal is to try to highlight some clusters particularly worth paying attention to and formalize what they mean in a way that would be useful to make decisions going forward.

I’ll begin by introducing two (hopefully) self-evident ideas. “Epistemic Progress” and “Effective Epistemics.” You can think of “Epistemic Progress” as the “epistemics” subset of “Progress Studies”, and “Effective Epistemics” as the epistemic version of “Effective Altruism.” I don’t mean this as an authoritative cornerstone, but rather as pragmatic intuitions to get us through the next few posts. These names are chosen mainly because I think they would be the most obvious to the audience I expect to be reading this.

Effective Epistemics

“Effective Epistemics” is essentially “whatever seems to work at making individuals or groups of people more correct about things for pragmatic purposes.” It’s a bit higher level than “value of information.” This is not focussed on whether something is theoretically true or with precise definitions of formal knowledge. It’s rather about which kinds of practices seem to make humans and machines smarter at coming to the truth in ways we can verify. If wearing purple hats leads to improvement, that would be part of effective epistemics.

There’s a multitude of things that could help or hinder epistemics. Intelligence, personal nutrition, room lighting, culture, economic incentives, mathematical knowledge, access to expertise, add-ons to Trello. If “Effective Epistemics” were an academic discipline, it wouldn’t attempt to engineer advanced epistemic setups, but rather it would survey the space of near and possible options to provide orderings. Think “cause prioritization.”

Effective Altruism typically focuses on maximizing the potential of large monetary donations and personal careers. I’d imagine Effective Epistemics would focus more on maximizing the impact of smaller amounts of effort. For example, perhaps it would be identified that if a group of forecasters all spent 30 hours studying information theory, they could do a 2% better job in their future work. My guess is that epistemic intervention estimations would be more challenging than human welfare cost-effectiveness calculations, so things would probably begin on a more coarse level. Think longtermist prioritization (vague and messy), not global welfare prioritization (detailed estimates of lives saved per dollar).

Perhaps the most important goal for “Effective Epistemics” is to reorient readers to what we care about when we say epistemics. I’m quite paranoid about people defining epistemics too narrowly and ignoring interventions that might be wildly successful, but strange.

This paranoia largely comes from the writings of Peter Drucker on having correct goals, in order to actually optimize for the right things. For example, a school “optimizing education for getting people jobs” might begin with High School students at one point when those are the highest impact. But if things change and they recognize there are new opportunities to educate adults, maybe they should jump to prioritize night school. Perhaps with online education they should close down their physical building and become an online-only nonprofit focussed on international students without great local schools. It can be very easy to fall into the pattern of trying to “be a better conventional High School than the other conventional High Schools, on the conventional measures”, even if what one really cares about upon reflection is the maximization of value from education.

Epistemic Progress

“Epistemic Progress” points to substantial changes in epistemic abilities. Progress Studies is an interesting new effort to study the long term progress of humanity. So far it seems to have a strong emphasis on scientific and engineering efforts, which makes a lot of sense as these are very easy to measure over time. There have been a few interesting posts on epistemics but these are a minority. This post on Science in Ancient China seems particularly relevant.

Historic epistemic changes are challenging to define and measure, but they are still possible to study. It seems clear in retrospect that the Renaissance and Enlightenment presented significant gains, and the Internet led to a complex mesh of benefits and losses. One should eventually create indices on “epistemic abilities” and track these over time and between communities.

One aspect I’d like to smuggle into “Epistemic Progress” is a focus on progress going forward, or perhaps “epistemic futurism”. Epistemic abilities might change dramatically in the future, and it would be interesting to map how that could happen. Epistemic Progress could refer to both minor and major progress, both seem important.

Why not focus on [insert similar term] instead?

I’m not totally sure that “epistemics” is the right frame for my focus, as opposed to the more generic “rationality”, or the more specific “institutional decision making.” As said earlier, there are several overlapping terms floating around. There are tradeoffs for each.

First, I think it doesn’t really matter. What matters is that we have some common terminology with decent enough definitions, and use that to produce research findings. Many of the research findings should be the same whether one calls the subject “epistemics”, “epistemic rationality”, “intellectual development”, or so on. If in the future a more popular group comes out with a different focus, hopefully, they should make use of the work produced from this line of reasoning. The important thing is really that this work gets done, not what we decide to call it.

As to why it’s my selected choice of the various options, I have a few reasons. “Epistemics” is an expression with rather positive connotations. Hopefully the choice of “epistemics” vs. “group thinking” would tilt research to favor actors that are well-calibrated instead of just being intelligent. An individual or group with great decision making or reasoning abilities, but several substantial epistemic problems, could do correspondingly severe amounts of damage. A group with great epistemics could also be destructive, but a large class of failures (intense overconfidence) may be excluded.

I prefer “epistemics” to “decision making” because it gets more to the heart of things. I’ve found when thinking through the use of Guesstimate that often by the time you’re making an explicit decision, it’s too late. Decisions are downstream of general beliefs. For example, someone might make a decision to buy a house in order to shorten their commute, but wouldn’t have questioned whether the worldview that produced their lifestyle was itself dramatically suboptimal. Perhaps their fundamental beliefs should have been continuously questioned, leading them to forgo their position and become a Buddhist monk.

I’ve been thinking about using some specific modifier word to differentiate “epistemics” as I refer to it from other conceptions. I’m trying to go with the colloquial definition that has emerged within the Rationality and Effective Altruist circles, but it should be noted that this definition holds different connotations to other uses of the term. For this essay, two new terms feel like enough. I’m going to reflect on this for future parts. If you have ideas or preferences, please post them in the comments.

Next Steps

Now that we have the key terms, we can start to get into specifics. I currently have a rough outline to formally write a few more posts in this sequence. The broader goal is to help secure a foundation and some motivation for further work in this space. If you have thoughts or feedback, please reach out or post in the comments.

Discuss

### Thanksgiving and Covid

November 20, 2020 - 07:30
Published on November 20, 2020 4:30 AM GMT

It's getting colder and the virus is still spreading. Hundreds of thousands of people are dead. It's 399 years later, and Thanksgiving has changed less than you might have hoped.

When the English arrived in what is now Massachusetts they found the Wampanoag devastated by smallpox:

The people not many, being dead and abundantly wasted in the late great mortality which fell in all these parts about three years before the coming of the English, wherein thousands of them died, they not being able to bury one another; their skulls and bones were found in many places lying still above ground, where their houses and dwellings had been; a very sad spectacle to behold.
—Of Plymouth Plantation, Bradford.

Smallpox was still spreading, however, and later he writes (warning: gore):

This spring, also, those Indians that lived about their trading house there fell sick of the small pox, and died most miserably; for a sorer disease cannot befall them; they fear it more then the plague; for usually they that have this disease have them in abundance, and for want of bedding and lining and other helps, they fall into a lamentable condition, as they lie on their hard mats, the pox breaking and mattering, and running one into another, their skin cleaving (by reason thereof) to the mats they lie on; when they turn them, a whole side will flay off at once, (as it were) and they will be all of a gore blood, most fearful to behold; and then being very sore, what with cold and other distempers, they die like rotten sheep. The condition of this people was so lamentable, and they fell down so generally of this disease, as they were (in the end) not able to help one another; no, not to make a fire, nor to fetch a little water to drink, nor any to bury the dead; but would strive as long as they could, and when they could procure no other means to make fire, they would burn the wooden trays and dishes they ate their meat in, and their very bows and arrows; and some would crawl out on all fours to get a little water, and sometimes die by the way, and not be able to get in again.

But those of the English house, (though at first they were afraid of the infection), yet seeing their woeful and sad condition, and hearing their pitiful cries and lamentations, they had compassion of them, and daily fetched them wood and water, and made them fires, got them victualls whilst they lived, and buried them when they died. For very few of them escaped, notwithstanding they did what they could for them, to the hazard of themselves. The chief Sachem himself now died, and almost all his friends and kindred. But by the marvelous goodness and providences of God not one of the English was so much as sick, or in the least measure tainted with this disease, though they daily did these offices for them for many weeks together. And this mercy which they showed them was kindly taken, and thankfully acknowledged of all the Indians that knew or heard of the same.

The path of smallpox through the people of the Americas is one of the greatest tragedies of history, not only in how it killed so many and so brutally, but also in how it left the people so vulnerable to the English and other Europeans. The story above, with some of the English trying to help their neighbors through smallpox, feels a lot like how we generally celebrate Thanksgiving: a positive episode in a history that is, overall, shameful.

We are incredibly lucky that the virus we are fighting today is so much less lethal, and our medical care so much better. Still, at this stage where we have multiple promising vaccine candidates and the end is visible, it is even more important that we not give up. We cannot afford to celebrate Thanksgiving as we've done traditionally: indoors, in large groups, talking over a long meal, after traveling a long way to spend a holiday in close proximity with a different group of people from our regular contacts.

If your family is pressuring you to travel, the CDC recommendations may be helpful. We won't be gathering around a big table with our extended family this year, precisely because being able to do so is so important and we don't want to trade one year now for many in the future.

Figuring out how to celebrate in a way that makes sense for you and your family is tricky, however, and is going to vary by your personal situation. We're planning to have Thanksgiving with our household, and possibly one other person who had covid in April. For someone who lived alone, things might feel pretty different. We're also probably going to take a masked and socially distanced walk with our relatives who live in the area, and I think outdoor activities are generally underrated.

Discuss