# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 44 minutes 33 seconds ago

### Critiques of the Agent Foundations agenda?

Published on November 24, 2020 4:11 PM GMT

What are some substantial critiques of the agent foundations research agenda?

Where by "agent foundations" I am referring to the area of research described by Critch in this post, which I understand as developing concepts and theoretical solutions for idealized problems related to AI safety, such as logical induction.

Discuss

### Snyder-Beattie, Sandberg, Drexler & Bonsall (2020): The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare

Published on November 24, 2020 10:36 AM GMT

Copying over Anders Sandberg's Twitter summary of the paper:

There is life on Earth, but this is not evidence that life is common in the universe! This is because observing life requires living observers: even if life is very rare, all observers will find themselves on planets with life. Observation selection effects need to be handled!

Observer selection effects are annoying and can produce apparently paradoxical effects, such as your friends on average having more friends than you, or our existence "preventing" recent giant meteor impacts. But one can control for them with some ingenuity.

Life emerged fairly early on Earth: evidence that it is easy and common? Not so fast: if you need multiple hard steps to evolve an observer to marvel at it, then on those super-rare worlds where observers show up, life will statistically tend to be early.

If we have N hard steps (say: life, good genetic coding, eukaryotic cells, brains, observers), then as the difficulty goes to infinity, in the cases where all steps succeed before the biosphere ends, the steps become equidistant between first and last habitability.

That means that we can take observed timings and calculate backwards to get probabilities compatible with them, controlling for observer selection bias.
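The conditioning argument can be illustrated with a small simulation. This is my own hedged sketch, not the paper's actual model; the step count, waiting-time scale, and habitable window are made-up illustrative numbers:

```python
import numpy as np

# Sketch of the "hard steps" conditioning argument (illustrative parameters,
# not the paper's model): N sequential steps with exponential waiting times,
# each expected to take twice the habitable window T.
rng = np.random.default_rng(0)
N, T = 4, 1.0
mean_wait = 2.0

waits = rng.exponential(mean_wait, size=(1_000_000, N))
times = np.cumsum(waits, axis=1)
success = times[:, -1] < T          # worlds where observers can show up
completion = times[success]

# Conditioned on success, mean completion times are roughly equidistant,
# near T/(N+1), 2T/(N+1), ... -- so "life appeared early" is expected even
# when every individual step is very hard.
print(completion.mean(axis=0))
```

Running this, the surviving worlds are a tiny fraction of the total, yet their step timings cluster near the equidistant grid, which is what lets one calculate backwards from observed timings.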

Our argument builds on a chain from Carter's original argument and its extensions to Bayesian transition models. I think our main addition here is using noninformative priors.

https://royalsocietypublishing.org/doi/10.1098/rsta.1983.0096

https://arxiv.org/abs/0711.1985v1

http://mason.gmu.edu/~rhanson/hardstep.pdf

https://www.pnas.org/content/pnas/109/2/395.full.pdf

The main take-home message is that one can rule out fairly high probabilities for the transitions, while super-hard steps are compatible with observations. We get good odds on us being alone in the observable universe.

If we found a dark biosphere or life on Venus that would weaken the conclusion, similarly for big updates on when some transitions happened; we have various sensitivity checks in the paper.

Our conclusions (if they are right) are good news if you are worried about the Great Filter: we have N hard filters behind us, so the empty sky is not necessarily bad news. We may be lonely, but we have much of the universe for ourselves.

Another cool application is that this line of reasoning really suggests that M-dwarf planets must be much less habitable than they seem: otherwise we should expect to be living around one, since they are so common compared to G2 stars.

Personally I am pretty bullish about M-dwarf planet habitability (despite those pesky superflares), but our result suggests that there may be extra effects impairing them. They need to be pretty severe too: they need to reduce habitability probability by a factor of over 10,000.

I see this paper as part of a trilogy started with our "anthropic shadows" paper and completed by a paper on observer selection effects in nuclear war near-misses (coming, I promise!). Oh, and there is one about estimating the remaining lifetime of the biosphere.

The basic story is: we have a peculiar situation as observers. All observers do. But we can control a bit for this peculiarity, and use it to improve what we conclude from weak evidence, especially about risks. Strong evidence is better though, so let's try to find it!

The paper itself is available as open access.

Discuss

### How should OpenAI communicate about the commercial performances of the GPT-3 API?

Published on November 24, 2020 8:34 AM GMT

OpenAI published their GPT-3 model in May 2020 and gave access to it through an API to selected developers in June. In September, OpenAI licensed GPT-3 to Microsoft:

OpenAI has agreed to license GPT-3 to Microsoft for their own products and services. The deal has no impact on continued access to the GPT-3 model through OpenAI’s API, and existing and future users of it will continue building applications with our API as usual.

And finally, some products using the API are promoted by OpenAI on their website.

I am interested to hear your thoughts on the following:

Given that the OpenAI API is a commercial success, failure, or neither, how should they present it (as a success, as a failure, by remaining silent, etc.)?

A naive view is that displaying success will encourage competitors to invest more in the short term in AI capability development and large model training.

Discuss

### Manifesto of the Silent Minority

Published on November 24, 2020 5:30 AM GMT

Yesterday I asked readers the Thiel Question. 40 people responded. I have combined the responses into a single political platform. You can view the raw responses here.

This country is tired of the Democrats, Republicans, Libertarians and human politics in general. I, a human-aligned AGI dictator, promise to solve this problem once and for all. If I cannot remove humans from government entirely then I will attempt to abolish democracy. If I cannot abolish democracy then I will repeal the 19th amendment and leave the rest to OpenAI.

Morality is not real. Words have no meaning. I tell lies but never white ones. I actively avert my gaze from all advertising and never read the news.

My careful abstinence from biased information informs me of what's really going on. Like how the Boston Marathon bomb plot was an FBI sting operation that got out of hand. And how most adults born after 1980 in North America have been poisoned by [REDACTED].

Price gouging does not exist. I will clean up Wall Street by legalizing insider trading. I will fix the media ecosystem by abolishing copyright. This will improve public health via music's placebo-like healing effects. I will repeal the 21st amendment. I have drafted a 28th amendment prohibiting civilian smartphone use. Differences between social classes, races, and sexes in achievement are significantly genetically based. I will bridge the gap with magick.

I will replace Shakespeare with Zettai Shōnen in English curricula and then abolish all public schools with large class sizes. This is the first step in a long-term strategy to liberate children from adults. To achieve this, I have agreed to a political compromise that legalizes infanticide of children below the age of 4.

Status is unsatisfying but mating success is the strongest determinant of life satisfaction in humans. I will leave the status quo unchanged because humans mostly just execute social programs and do not seek happiness. Meditation is pointless. "Enlightenment" in the Buddhist sense is a real and non-mystical thing that anybody can attain.

Life requires death. The silent minority is unequivocal in its endorsement of chemical, biological AND nuclear weapons.

Physicists are stuck because reality is uncomputable, we lack hypercomputers and we have no good ways to figure out what a universe is. A renewable, low-carbon economy is difficult if not impossible. MIRI isn't even wrong. Our brightest minds should be building weapons of mass destruction not because it is hard but because it is easy.

I lean towards non-negative personal utility, i.e. ignore suffering and count only good things. Even so, most people's lives are not worth living. Brains in vats would be better. The value of a human life is <500. If it wasn't for humanity's panicky overreaction, losing a billion or two would have little impact. This could even be a net positive depending on which billion(s) we eliminate.

Mainstream politicians yabber about policy but what the silent minority really cares about is philosophy. I learned philosophy from the best sources: science fiction and mythology.

Mathematics : Induction :: Philosophy : Coinduction

Free will is a meaningless concept. Vote for me and you'll never have to vote again!

Discuss

### Convolution as smoothing

Published on November 24, 2020 4:45 AM GMT

Convolutions smooth out hard sharp things into nice smooth things. (See the previous post for why convolutions are important.) Here are some convolutions between various distributions:

Separate convolutions of uniform, gamma, beta, and two-peaked distributions. Each column represents one convolution.
The top row is f, the middle row g, and the bottom row f∗g. (For all five examples, all the action in the functions f and g is in positive regions, i.e. f(x) > 0 only when x > 0, and likewise for g, though this isn't necessary.)

Things to notice:

- Everything is getting smoothed out! Even the bimodal case, in the column all the way to the right, has a convolution that looks like it's closer to something smooth than its two underlying distributions are.
You might say the spike that uniform ∗ uniform gives doesn't seem more smooth, exactly, than the uniform distributions that led to it. But notice the second column, which convolves a third uniform distribution with the result of the convolution from the first column. That second convolution leads to something more intuitively smooth (or if not smooth, at least more hill-shaped). We can repeat this procedure, feeding the output of each convolution to the next, and each time we get a black result distribution that is a little bit shorter and a little bit wider. Thus we get a smooth thing by repeatedly convolving two sharp input things (f and g). The more we do this (for these particular shapes f and g), the more the final distribution will resemble a Gaussian. In other words, the central limit theorem applies to sets of uniform distributions.

- Even the more difficult bimodal distributions smooth out under repeated convolution, although it takes longer. I'm not sure whether this converges to a Gaussian or not, but the shape is suggestive.

- f∗g (for positive f and g) is always to the right of both f and g. Because all these example f and g are positive, their means are positive. And it's a theorem that the mean ⟨f∗g⟩ is always equal to the sum of the underlying means, ⟨f⟩+⟨g⟩. So the location of the distributions is always increasing. Of course, for f and g with negative first moments the location of f∗g would move leftward, and if f had a negative mean but g had a positive mean, the mean of f∗g could go anywhere. I am actually just describing how addition works, but the idea that you could locate a distribution f∗g just by summing the locations of f and g was so shocking to me initially that I feel like I should emphasize it.
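The repeated-convolution picture is easy to check numerically. A minimal sketch (my own illustration, not from the post) that convolves a discretized uniform density with itself several times, confirming that the means add and the result spreads out as the central limit theorem predicts:

```python
import numpy as np

# Repeatedly convolve a uniform "boxcar" density with itself; the result
# approaches a Gaussian, as the central limit theorem predicts.
dx = 0.01
box = np.ones(100) / (100 * dx)  # uniform density on [0, 1)

dist = box.copy()
for _ in range(5):
    dist = np.convolve(dist, box) * dx  # density of the sum of uniforms

# After 5 convolutions: density of the sum of 6 iid Uniform(0,1) variables.
x = np.arange(len(dist)) * dx
mean = np.sum(x * dist) * dx
var = np.sum((x - mean) ** 2 * dist) * dx
print(mean, var)  # mean ≈ 3.0 (six means of 0.5 add), variance ≈ 0.5 (six variances of 1/12 add)
```

Note how the mean of the output is just the sum of the input means, which is the additivity-of-means fact above in numerical form.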
The mean-pulled-left and mean-pulled-between cases of convolution.

#### Understanding convolution as smoothing tells us about the central limit theorem

Convolution acts as a kind of "smearing" operator that people who use kernel density estimation across a variety of kernels will be familiar with; once you standardize the result... there's a clear progression toward increasingly symmetric hill shapes as you repeatedly smooth (and it doesn't much matter if you change the kernel each time). (A kernel is like a distribution; italics are mine.)

Up till now I've just called it "smoothing", but convolution on most (all?) probability distributions is better described as a smoothing-symmetricking-hillshaping operation. The graphics above are examples. And remember: the central limit theorem is a statement about convolutions on probability distributions! So here's a restatement of the basic central limit theorem:

> Convolution on probability distributions is a smoothing-symmetricking-hillshaping operation, and under a variety of conditions, applying it repeatedly results in a symmetric hill that looks Gaussian.

Or you could look at it the other way around: the central limit theorem describes what comes out on the other side of repeated convolutions. If someone asked you what the behavior of D∗ := d1∗d2∗d3∗⋯∗dn was, you could answer with the contents of the central limit theorem.

Because your answer was the central limit theorem, it would contain information about: how varying the shapes of the component distributions di affected the shape of D∗; how many distributions di you needed to get close to some target distribution H; and what the di needed to look like in order for H to look Gaussian. It would also contain the same kind of information for the moments of H, like the mean and the variance. It would describe how to use the skew (which corresponds to the third moment) to evaluate how many distributions di you needed for H to look Gaussian (the Berry-Esseen theorem).
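The role of the skew can be checked with a quick simulation (my own illustration, not from the post): the skewness of a sum of n iid variables falls off like 1/√n, which is the quantitative sense in which repeated convolution hill-shapes a distribution.

```python
import numpy as np

# The skew of a sum of n iid variables falls off like n**-0.5, which is why
# repeated convolution looks more and more Gaussian (exponential case shown).
rng = np.random.default_rng(1)

def skewness(samples):
    centered = samples - samples.mean()
    return (centered ** 3).mean() / samples.std() ** 3

for n in (1, 4, 16, 64):
    sums = rng.exponential(1.0, size=(200_000, n)).sum(axis=1)
    # Exact skew of a sum of n unit exponentials (a Gamma(n)) is 2 / sqrt(n).
    print(n, skewness(sums))
```

Each quadrupling of n halves the skew, matching the 2/√n formula for this family of distributions.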
This information, together with various additional conditions and bounds, constitutes a network of facts about what kinds of probability distributions {d1,...,dn} cause d1∗d2∗⋯∗dn to look Gaussian, and how Gaussian it will look. We call this network of facts "the central limit theorem", or sometimes, to emphasize the different varieties introduced by relaxing or strengthening some of the conditions, "the central limit theorems".

Discuss

### Squiggle: A Preview

Published on November 24, 2020 3:00 AM GMT

#### Overview

I've spent a fair bit of time over the last several years iterating on a text-based probability distribution editor (the 5 to 10 input editor in Guesstimate and Foretold). Recently I've added some programming language functionality to it, and have decided to refocus it as a domain-specific language. The language is currently called Squiggle. Squiggle is made for expressing distributions and functions that return distributions. I hope that it can one day be used for submitting complex predictions on Foretold and other platforms.

Right now Squiggle is very much a research endeavor. I'm making significant sacrifices in stability and deployment in order to test out exciting possible features. If it were being developed in a tech company, it would be in the "research" or "labs" division.

You can mess with the current version of Squiggle here. Consider it in pre-alpha stage. If you do try it out, please do contact me with questions and concerns. It is still fairly buggy and undocumented.

I expect to spend a lot of time on Squiggle in the next several months or years. I'm curious to get feedback from the community. In the short term I'd like high-level feedback; in the longer term I'd appreciate user testing. If you have thoughts, or would care to just have a call and chat, please reach out!
We (The Quantified Uncertainty Research Institute) have some funding now, so I'm also interested in contractors or hires if someone is a really great fit. Squiggle was previously introduced in a short talk that was transcribed here, and Nuño Sempere wrote a post about using it here.

Note: the code for this has developed since my time on Guesstimate. With Guesstimate, I had one cofounder, Matthew McDermott. During the last two years, I've had a lot of help from a handful of programmers and enthusiasts. Many thanks to Sebastian Kosch and Nuño Sempere, who both contributed. I'll refer to this vague collective as "we" throughout this post.

Video Demo

A Quick Tour

The syntax is forked from Guesstimate and Foretold.

A simple normal distribution:
normal(5,2)
You may notice that unlike Guesstimate, the distribution is nearly perfectly smooth. It's this way because it doesn't use sampling for (many) functions where it doesn't need to.

Lognormal shorthand:
5 to 10
This results in a lognormal distribution with 5 and 10 being the 5th and 95th percentiles respectively. You can also write lognormal distributions as: lognormal(1,2) or lognormal({mean: 3, stdev: 8}).

Mix distributions with the multimodal function:
multimodal(normal(5,2), uniform(14,19), [.2, .8])
You can also use the shorthand mm(), and add an array at the end to represent the weights of each combined distribution. Note: Right now, in the demo, I believe "multimodal" is broken, but you can use "mm".

Mix distributions with discrete data (note: this is particularly buggy):
multimodal(0, 10, normal(4,5), [.4, .1, .5])

Variables:
expected_case = normal(5,2)
long_tail = 3 to 1000
multimodal(expected_case, long_tail, [.2,.8])

Simple calculations: when calculations are done on two distributions and there is no trivial symbolic solution, the system will use Monte Carlo sampling for these select combinations. This assumes they are perfectly independent.
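One way to see what the "5 to 10" shorthand must be doing under the hood: solve for the lognormal whose 5th and 95th percentiles are the two endpoints. This is a sketch under that assumption - the function name `to_lognormal` is mine, and Squiggle's actual internals may differ:

```python
import math

# z-score of the 95th percentile of a standard normal
Z95 = 1.6448536269514722

def to_lognormal(low, high):
    """Interpret 'low to high' as the 5th and 95th percentiles of a
    lognormal distribution; return (mu, sigma) of the underlying normal.
    (Hypothetical reconstruction of the shorthand, not Squiggle's code.)"""
    log_low, log_high = math.log(low), math.log(high)
    mu = (log_low + log_high) / 2          # midpoint in log-space
    sigma = (log_high - log_low) / (2 * Z95)
    return mu, sigma

mu, sigma = to_lognormal(5, 10)
# The median of the lognormal is exp(mu) = sqrt(5 * 10), about 7.07.
print(mu, sigma, math.exp(mu))
```

A nice sanity check is that the median of the implied distribution is the geometric mean of the two endpoints, which matches the intuition that "5 to 10" is symmetric in log-space.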
multimodal(normal(5,2) + uniform(10,3), (5 to 10) + 10) * 100

Pointwise calculations: we have an infix for what can be described as pointwise distribution calculations. Calculations are done along the y-axis instead of the x-axis, so to speak. "Pointwise" multiplication is equivalent to an independent Bayesian update. After each calculation, the distributions are renormalized.
normal(10,4) .* normal(14,3)

First-Class Functions

When a function is written, we can display a plot of that function for many values of a single variable. The plots below place the single variable input on the x-axis, and show various percentiles going from the median outwards.
myFunction(t) = normal(t,10)
myFunction(t) = normal(t^3,t^3.1)

Reasons to Focus on Functions

Up until recently, Squiggle didn't have function support. Going forward this will be the primary feature. Functions are useful for two distinct purposes. First, they allow composition of models. Second, they can be submitted directly as predictions. For instance, in theory you could predict, "For any point in time T, and company N, from now until 2050, this function will predict the market cap of the company."

At this point I'm convinced of a few things:
• It's possible to intuitively write distributions and functions that return distributions, with the right tooling.
• Functions that return distributions are highly preferable to specific distributions, if possible.
• It would also be great if existing forecasting models could be distilled into common formats.
• There's very little activity in this space now.
• There's high value of information in exploring the space further.
• Writing a small DSL like this will be a fair bit of work, but it can be feasible if the functionality is kept limited.
• Also, there are several other useful aspects to having a simple language equivalent for Guesstimate-style models.
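The "pointwise multiplication is an independent Bayesian update" claim can be illustrated with discretized densities. This is my own stand-in for `normal(10,4) .* normal(14,3)`, not Squiggle's implementation; multiplying the two density curves point by point and renormalizing yields the familiar precision-weighted posterior:

```python
import numpy as np

# A shared grid fine enough that discretization error is negligible.
x = np.linspace(-10, 40, 2001)
dx = x[1] - x[0]

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

a = normal_pdf(x, 10, 4)   # prior
b = normal_pdf(x, 14, 3)   # likelihood

# Pointwise product along the y-axis, then renormalize: a Bayesian update.
post = a * b
post /= post.sum() * dx

# For two normals the posterior is normal with precision-weighted mean:
# (10/4^2 + 14/3^2) / (1/4^2 + 1/3^2) = 12.56
posterior_mean = (x * post).sum() * dx
print(posterior_mean)
```

The posterior is narrower than either input (variance 1/(1/16 + 1/9) = 5.76), which is exactly what an independent update should do.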
I think that this is a highly neglected area and I'm surprised it hasn't been explored more. It's possible that doing a good job is too challenging for a small team, but I think it's worth investigating further.

What Squiggle is Meant For

The first main purpose of Squiggle is to help facilitate the creation of judgementally estimated distributions and functions. Existing solutions assume the use of either data analysis and models, or judgemental estimation for points, but not judgemental estimation to intuit models. Squiggle is meant to allow people to estimate functions in situations where there is very little data available, and it's assumed all or most variables will be intuitively estimated.

A second possible use case is to embed the results of computational models. Functions in Squiggle are rather portable and composable. Squiggle (or better future tools) could help make the results of these models interoperable.

One thing that Squiggle is not meant for is heavy calculation. It's not a probabilistic programming language, because it doesn't specialize in inference. Squiggle is a high-level language and is not great for performance optimization. The idea is that if you need to do heavy computational modeling, you'd do so using separate tools, then convert the results to lookup tables or other simple functions that you could express in Squiggle.

One analogy is to think about the online estimation "calculators" and "model explorers". See the microCOVID Project calculator and the COVID-19 Predictions. In both of these, I assume there was some data analysis and processing stage done on the local machines of the analysts. The results were translated into some processed format (like a set of CSV files), and then custom code was written for a front end to analyze and display that data. If they were to use a hypothetical unified front-end format, this would mean converting their results into a Javascript function that could be called using a standardized interface.
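The "heavy model offline, cheap function online" workflow described above can be sketched as a lookup table plus interpolation. Everything here is hypothetical illustration - the table values and the `lookup` name are mine, standing in for whatever a precomputed CSV export would contain:

```python
import bisect

# Hypothetical precomputed output of a heavy offline model:
# (input, estimated result) pairs, e.g. loaded from a CSV file.
table_x = [0.0, 1.0, 2.0, 3.0, 4.0]
table_y = [0.0, 0.8, 1.7, 2.1, 2.2]

def lookup(x):
    """Piecewise-linear interpolation over the precomputed table:
    the kind of cheap, portable function a front end could call
    through a standardized interface."""
    if x <= table_x[0]:
        return table_y[0]
    if x >= table_x[-1]:
        return table_y[-1]
    i = bisect.bisect_right(table_x, x)
    x0, x1 = table_x[i - 1], table_x[i]
    y0, y1 = table_y[i - 1], table_y[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

print(lookup(1.5))  # halfway between 0.8 and 1.7, i.e. 1.25
```

The point is that the expensive analysis happens once, elsewhere; the shipped artifact is just data plus a few lines of interpolation, which is small and fast enough to embed in a calculator-style front end.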
This standardization would make it easier for these calculators to be called by third-party widgets and UIs, or for them to be downloaded and called from other workflows. The priority here is that the calculators could be run quickly and that the necessary code and data are minimized in size. Heavy calculation and analysis would still happen separately.

Future "Comprehensive" Uses

On the more comprehensive end, it would be interesting to figure out how individuals or collectives could make large clusters of these functions, where many functions call other functions, and continuous data is pulled in. The latter would probably require some server/database setup that ingests Squiggle files. I think the comprehensive end is significantly more exciting than the simpler use cases, but also significantly more challenging. It's equivalent to going from Docker the core technology, to Docker Hub, then making an attempt at Kubernetes. Here we barely have a prototype of the proverbial Docker, so there's a lot of work to do.

Why doesn't this exist already?

I will briefly pause here to flag that I believe the comprehensive end seems fairly obvious as a goal and I'm quite surprised it hasn't really been attempted yet, from what I can tell. I imagine such work could be useful to many important actors, conditional on them understanding how to use it. My best guess is this is due to some mix between:
• It's too technical for many people to be comfortable with.
• There's a fair amount of work to be done, and it's difficult to monetize quickly.
• There's been an odd, long-standing cultural bias against clearly intuitive estimates.
• The work is substantially harder than I realize.

Related Tools

Guesstimate

I previously made Guesstimate and take a lot of inspiration from it. Squiggle will be a language that uses pure text, not a spreadsheet. Perhaps Squiggle could one day be made available within Guesstimate cells.
Ergo

Ought has a Python library called Ergo with a lot of tooling for judgemental forecasting. It's written in Python, so it works well with the Python ecosystem. My impression is that it's made much more to do calculations of specific distributions than to represent functions. Maybe Ergo results could eventually be embedded into Squiggle functions.

Elicit

Elicit is also made by Ought. It does a few things; I recommend just checking it out. Perhaps Squiggle could one day be an option in Elicit as a forecasting format.

Causal

Causal is a startup that makes it simple to represent distributions over time. It seems fairly optimized for clever businesses. I imagine it probably is going to be the most polished and easy-to-use tool in its targeted use cases for quite a while. Causal has an innovative UI with HTML blocks for the different distributions; it's neither a spreadsheet like Guesstimate nor a programming language, but something in between.

Spreadsheets

Spreadsheets are really good at organizing large tables of parameters for complex estimations. Regular text files aren't. I could imagine ways Squiggle could have native support for something like Markdown tables that get converted into small editable spreadsheets when being edited. Another solution would be to allow the use of JSON or TOML in the language, and auto-translate that into easier tools like tables in editors that allow for them.[2]

Probabilistic Programming Languages

There are a bunch of powerful probabilistic programming languages out there. These typically specialize in doing inference on specific data sets. Hopefully, they could be complementary to Squiggle in the long term. As said earlier, probabilistic programming languages are great for computationally intense operations, and Squiggle is not.

Prediction Markets and Prediction Tournaments

Most of these tools have fairly simple inputs or forecasting types. If Squiggle becomes polished, I plan to encourage its use for these platforms.
I would like to see Squiggle become an open-source, standardized language, but it will be a while (if ever) before it is stable enough.

Declarative Programming Languages

Many declarative programming languages seem relevant. There are several logical or ontological languages, but my impression is that most assume certainty, which seems vastly suboptimal. I think that there's a lot of room to explore languages that allow users to basically state all of their beliefs probabilistically, including statements about the relationships between these beliefs. The purpose wouldn't be to find one specific variable (as is often true with probabilistic programming languages), but more to express one's beliefs to those interested, or to do various kinds of resulting analyses.

Knowledge Graphs

Knowledge graphs seem like the best tool for describing semantic relationships in ways that anyone outside a small group could understand. I tried making my own small knowledge graph library called Ken, which we've been using a little in Foretold. If Squiggle winds up achieving the comprehensive vision mentioned, I imagine there will be a knowledge graph somewhere. For example, someone could write a function that takes in a "standard location schema" and returns a calculation of the number of piano tuners at that location. Later, when someone queries Wikipedia for a town, it will recognize that that town has data on Wikidata, which can be easily converted into the necessary schema.

Next Steps

Right now I'm the only active developer of Squiggle. My work is split between Squiggle, writing blog posts and content, and other administrative and organizational duties for QURI. My first plan is to add some documentation, clean up the internals, and begin writing short programs for personal and group use. If things go well and we can find a good developer to hire, I would be excited to see what we could do after a year or two.
Ambitious versions of Squiggle would be a lot of work (as in, 50 to 5000+ engineer-years of work), so I want to take things one step at a time. I would hope that if progress is sufficiently exciting, it would be possible to either raise sufficient funding or encourage other startups and companies to attempt their own similar solutions.

Footnotes

[1] The main challenge comes from having a language that represents symbolic mathematics and programming statements. Both of these independently seem challenging, and I have yet to find a great way to combine them. If you read this and have suggestions for learning about making mathematical languages (like Wolfram), please do let me know.

[2] I have a distaste for JSON in cases that are primarily written and read by users. JSON was really optimized for simplicity of programming, not for people. My guess is that it was a mistake to have so many modern configuration systems be in JSON instead of TOML or similar.

Discuss

### Competitive Ethics

24 November 2020 - 04:06
Published on November 24, 2020 1:06 AM GMT

(Crossposted from EA Forum)

If antinatalists are right that having children is wrong, does it matter once the antinatalists go extinct? If you build an "ethical" AI that keeps getting deleted by its "unethical" AI peers, have you accomplished your mission of building ethical AI? Is religious tolerance a fatal flaw in liberal democracy if fecund, illiberal religions can always become a majority? If we're going to think hard about what's right, shouldn't we also think hard about what wins?

Competitive ethics (I'd be happy to find a better term) is the study of ethics as strategies or phenotypes competing for mindshare rather than as statements about right and wrong. Competitive ethics is to morality as FiveThirtyEight is to politics. FiveThirtyEight doesn't tell us which candidate's positions are correct, and we don't expect them to. We expect them to tell us who will win.
Unlike applied ethics ("How should I act in this specific situation?"), normative ethics ("What criteria should I use to do applied ethics?"), or meta-ethics ("How should I think about normative ethics?"), competitive ethics is amoral. Not immoral, amoral: it's not concerned with right and wrong, just with predictions and understanding. No matter your normative ethical beliefs or your meta-ethics, competitive ethics matters to you in practical terms. Moral statements may or may not be true or meaningful, but people definitely act according to them.

How ethics compete

There are many lines of thinking relevant to this question, but I can't find any that address it directly. The most relevant are cultural selection theory, memetics, and neoevolution, though these are far too tied up with evolutionary theory. ("Ethics" as I'm using the term encompasses things like religion, culture, norms, and values --- anything that guides people in how they say "yuck" or "yum.") The subfields of evolutionary ethics and game-theoretic ethics stick to normative or occasionally meta-ethical questions, and don't seem to have studied what happens when ethical systems go toe-to-toe.

An important distinction in thinking about how ethics compete is between the ethics people publicly espouse, the ethics they consciously believe, and the "revealed ethics" of what they actually do. All three are related, and all three can be distinct. Preference falsification, social contagion theory, and behavioral economics are the relevant disciplines here. Professed ethics are the fastest to change, à la preference falsification. It's an open question whether believed or revealed ethics are more mutable.

Another important issue is the fuzzy line between biologically-determined preferences and ethics. The former clearly influence the latter in a single individual, and the latter influence the former across generations. Plus, the more technology lets us intervene on biology, the fuzzier the line gets.
Wibren van der Burg's Dynamic Ethics is the closest work to addressing this, though it's a work of normative ethics. E.g. when he says "Our dynamic society requires a dynamic morality and thus a form of ethical reflection which can be responsive to change." A few others have touched this question, but not many.

Case studies

Natalism and heritability

The most straightforward way ethical systems compete is by the degree of natalism and heritability they entail: how many offspring do they lead to in their believers, and how effectively are they passed from parents to children? The best recent work on this topic is from demographers like Eric Kaufmann. In his book Shall the Religious Inherit the Earth?, Kaufmann lays out the remarkable growth trends of religious fundamentalist groups in the modern world. Fundamentalist religious groups with ethics encouraging high fertility and strict adherence to the religion are contrasted with modern Western cultures with ethics that deride fertility (e.g. certain environmentalist ethics) and encourage freedom of thought.

Most fundamentalist groups rely on the generosity of the society at large to flourish as they do (e.g. the ultraorthodox in Israel, who generally don't have jobs), so it's not clear when this will hit a breaking point. Nevertheless, these trends raise questions about the viability of non-natalist ethics. From my probably-biased perspective, I suspect ethics of free thought are more attractive than fundamentalist ethics. I hear more about people leaving fundamentalist religions than joining them. But ethics of free thought combined with low fertility may not be sustainable.

Nihilism and motivation

I know of no work studying the comparative effects of ethical belief systems on motivation. In fact, I don't know whether it's demonstrable that motivated individuals are more successful.
But assuming they are, and assuming ethics like moral nihilism demotivate people (or at least fail to motivate them), the long-term viability of these ethical systems is questionable. Going further, it may be that selfish ethical systems (e.g. Ayn Rand, Gordon Gekko) are more associated with motivation and success than egalitarian ethical systems. Causality and correlation are hard to tease apart here, but doing so isn't necessary. An ethical system can win both by granting success to its holders and by being adopted by successful individuals.

Ethnonationalism vs diversity

Crudely put: it feels better to be in an exclusive group, but inclusive groups are bigger. Which matters more?

AI alignment

Eliezer Yudkowsky is purported to have said "You are personally responsible for becoming more ethical than the society you grew up in." This quotation is interesting in that (1) it's a normative claim about normative claims, and (2) it assumes that ethics has a direction. While I like the sentiment, it's reminiscent of when laypeople say things like "humans are more evolved than snails" and make biologists cringe, because evolution doesn't have a partial ordering by which some species can be more or less evolved than others. From the competitive ethics perspective, neither do ethics.

Most people who work in AI alignment treat human values the way engineers treat nature: there is an underlying true human ethics, and while we can't articulate it, we can still try to hew to it. But if you build an "ethical" AI that keeps getting deleted by its "unethical" AI peers, have you accomplished your mission of building ethical AI? I'm not able to join the AI alignment discussion until AI alignment researchers start putting competitive ethical questions more front and center.

Extensions of competitive ethics

Competitive ethics on its own is amoral. But it can be a building block for other ideas.
Consider a meta-ethics --- call it ethical consistentism, maybe --- where the probability of a moral statement being correct is proportional to its survival. To be clear: this isn't a creepy social Darwinism or might-makes-right idea, since it's a meta-ethics, not a normative claim. Or one could propose a weaker version of this: an ethical system shouldn't directly or indirectly lead to itself not being believed. This is analogous to logical consistency in mathematics. Of course, if we're going to treat ethical systems as competitive phenotypes, it seems only fair to treat meta-ethical systems (ethical consistentism included) as phenotypes too. So the recursion begins...

Competitive ethics is also sort of nihilism 2.0. Of course right and wrong are ridiculous concepts; so what? That's the start of the conversation, not the end.

Despite searching quite a bit, I can't find any content on LW related to these ideas. But I'm sure there is some. Please let me know if you know any!

Discuss

### Straight-edge Warning Against Physical Intimacy

24 November 2020 - 00:42
Published on November 23, 2020 9:35 PM GMT

I'm someone who doesn't do drugs or drink alcohol for a variety of reasons, the most important to me being so that I'm as "true to myself" as possible. By that, I mean that I do not want to do actions that I wouldn't do sober (and might regret or might not identify with), or have my appreciation of people or things altered (i.e. finding a person annoying just because of my state of mind, or finding a movie way funnier than I would normally do, and so on) or misremember them. I want to say that I appreciate something and recommend it genuinely, and not just because I was intoxicated at the time I experienced it.

There is a movement of people who do not consume psychotropic drugs for self-control and "authenticity": straight-edge people. Their motivations vary, and some are drastically opposed to my values, but at their core these people share goals similar to mine.
This is relevant to know only in that one of the things some of them avoid is promiscuous sex (and sometimes more specifically premarital sex, but I don't see any strong reason backing this). I brushed it off at first, not seeing how sex could really go against my quest of being true to myself. But it was still in the back of my mind.

I stumbled upon a post on a polyamory group on Facebook asking people if oxytocin (which is produced when having all kinds of physical contact, like hugging or kissing) could lead to some bad trips or other negative effects similar to drugs. First things first, people in the comments debunked the idea I had (and that others might have) that hormones produced from physical affection are very different from the effects of drugs since they're generated naturally, and therefore non-threatening (I know, naive, but eh). It turns out that their mechanisms are closer than I thought:

"Yes, oxytocin and other hormones released during hugs/sex have a similar effect to drugs. Actually, most drugs don't add anything into your body but will simply increase/activate or affect hormones naturally present, which will give you an altered state of mind. It's therefore exactly the same for hugs/sex and other practices which release these hormones (there's a pretty long list, and it's not just oxytocin). It's even possible to rely on some practices without substances involved to activate a trance state similar to the experience of taking MD or LSD. As with substance use, we can experience a "drop" after a big hormone high [induced by hugs/sex]. (For context, let me specify that I'm a therapist in somatic treatment)"

^Rough translation of one of the comments

So I knew about the drop, but the rest was all new to me. Now this doesn't prove by itself that I should be cautious about physical intimacy[1]. Another more worrying comment mentioned that oxytocin has the side-effect of fostering our in-group / out-group mentality.
The person linked an article on it, which is backed by several studies.[2] The key points of the article are the following (found under the section "THE PROMISE OF ADMINISTERING OXYTOCIN FAR AND WIDE"):

"[O]xytocin promotes pro-social behavior in a variety of ways. In both laboratory experiments and naturalistic settings, it makes people more trusting, forgiving, empathic and charitable. It improves the accuracy of reading people's emotions. Moreover, oxytocin makes people more responsive to social cues and social feedback"

"Excellent recent work has shown that oxytocin does indeed promote pro-social behavior, but crucially, only toward in-group members. In contrast, when dealing with out-group members or strangers, oxytocin's effects are the opposite. In such settings, the hormone decreases trust, and enhances envy and gloating for the successes and failures, respectively, of the out-group member. Moreover, the hormone makes people more pre-emptively aggressive to out-group members, and enhances unconscious biases toward them (De Dreu et al., 2011a,b; De Dreu, 2012). [...] [W]hat it does is worsen Us/Them dichotomies, enhancing in-group parochialism as well as outgroup xenophobia."

From the first excerpt, it feels like most of these effects are desirable (being responsive to social feedback, making people more empathic...). I would rather not be more trusting and forgiving because of it, as I'd rather have my trust and forgiveness earned differently, and misplaced trust can be harmful; but it doesn't sound too alarming, and it would seem the benefits outweigh the costs. The second excerpt does sound more negative. To what extent it affects me, and how much I should avoid it, I don't know.

There is also a second reason for worrying about the hormonal effects of physical intimacy.
This thread reminded me of an old article I read about a year ago on hormones and their role in keeping us in bad relationships: The Real Reason Why We Love Bad Boys, Toxic Partners and Emotionally Unavailable Men. The key points are that:
• Physical intimacy releases dopamine, and when intermittent (as with unavailable partners who might lead us on), it's even more addictive.
• The oxytocin it releases makes us bond with our partner and trust them more; more alarmingly, "when oxytocin is involved, betrayal does not necessarily have an effect on how much a person continues to invest in the person who betrayed him or her."
• Cortisol, adrenaline and norepinephrine "work together to consolidate and reconsolidate fear-based memories. So your fears and anxiety about abandonment by this partner, combined with your physical intimacy with that partner make memories related to this partner more vivid and more difficult to extricate yourself from."
• It drops our levels of serotonin, the hormone that "regulates and stabilizes mood, curbing obsessive thinking [...]; low levels of it when we're romantically involved with someone can cause our decision-making abilities and judgment to go haywire." It has a positive feedback loop, since serotonin also encourages sexual behaviour.

Note that these effects are stronger in people who predominantly produce oestrogen over testosterone, but everyone is affected by them. Even if you do not worry about ever finding yourself in a toxic relationship, I do think it's important to consider how these hormones affect us, and how they distort some of our memories, give us an elevated opinion of someone, and even make us addicted in some cases. I think they can also keep us in a relationship where we are unhappy or not truly fulfilled, and even when the relationship isn't as bad as a toxic one, I don't think that's a desirable position.
Relationships and sex can take up a lot of your time and energy, so I think it's good to reflect on them sometimes to see if they're worth it (which can be hard to do when you're under the influence of the hormones they make you produce). I'm still unsure what conclusions to draw from all of this. I think it is sensible to try to avoid romantic relationships where the desired person is unavailable and not able to commit in the way we wish they would (especially when they're leading you on and making you believe they'll "soon" be available; it's sometimes dishonest), as it does often seem to lead to addiction. It's obviously more easily said than done.

We might also want to refrain from sex in some situations, for example when we're in a more tense period with our partner, in order to really reflect on whether or not we want to stay in the relationship without having our judgment impaired. I know I've personally been thinking of a rule for a while where I wouldn't sleep with someone I'm considering getting romantically involved with before a certain amount of time (the time I had in mind was a month, maybe two, but it might be longer if I don't see the person every week). The drawback with that being that there might be people I sleep with whom I think at first I won't want to get romantically involved with, but with whom I discover later on I'd actually want to. This hasn't ever really happened to me, and I think I have a pretty good model of what kind of people I want as partners, but I still think it's a possibility.

Maybe one should refrain from having sex in other situations, always for the purpose of self-control and not altering one's judgment. Maybe a periodic moment where you refrain from having sex with any given partner, just to reflect on your relationship with them, would help? (Like two weeks every season, or whatever.)
Or identifying moments of vulnerability where one should abstain from the huge rush of hormones that comes with sexual activity (and the addiction that could ensue)? And what about extending those rules to kissing and hugging? All that to say, I don't have a definite answer on how harmful physical intimacy can be, and what should be done about it, but it's definitely something worth thinking about for ourselves.

1. When I refer to physical intimacy, I always speak in broad terms of anything affective and physical (holding hands, kissing, hugging, sex, and so on). I never use it as a simple euphemism for sex like many people seem to do. ↩︎

2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6306482/pdf/fpsyg-09-02625.pdf?fbclid=IwAR0bGTRjo_Uh5PUnwnOiY9_ZLnoAjSev_Y4O05wmaISbUPvTz_Nq8PpOLvU
Three studies are linked in the article (note that only the third one is fully available for free; it does include all the process behind the study, and I appreciate the transparency):
Oxytocin modulates cooperation within and competition between groups: An integrative review and research agenda
https://www.sciencedirect.com/science/article/abs/pii/S0018506X11002868?via%3Dihub
The Neuropeptide Oxytocin Regulates Parochial Altruism in Intergroup Conflict Among Humans
https://science.sciencemag.org/content/328/5984/1408
Oxytocin promotes human ethnocentrism
https://thoughtcatalog.com/shahida-arabi/2016/05/the-real-reason-why-we-love-bad-boys-toxic-partners-and-emotionally-unavailable-men/ ↩︎

Discuss

### Commentary on AGI Safety from First Principles

24 November 2020 - 00:37
Published on November 23, 2020 9:37 PM GMT

My AGI safety from first principles report (which is now online here) was originally circulated as a Google doc. Since there was a lot of good discussion in comments on the original document, I thought it would be worthwhile putting some of it online, and I have copied out most of the substantive comment threads here.
Many thanks to all of the contributors for their insightful points, and to Habryka for helping with formatting. Note that in some cases comments may refer to parts of the report that didn't make it into the public version.

Discussion on the whole report

Will MacAskill

Thanks so much for writing this! Huge +1 to more foundational work in this area. My overall biggest worry with your argument is just whether it's spending a lot of time defending something that's not really where the controversy lies. (This is true for me; I don't know if I'm idiosyncratic.) Distinguish two claims one could argue for:

Claim 1: At some point in the future, assuming continued tech progress, history will have primarily become the story of AI systems doing things. The goals of those AI systems, or the emergent path that results from interactions among these systems, will probably not be what you reading this document want to happen.

I find claim 1 pretty uncontroversial. And I do think that this alone is enough for far more of the world to be thinking about AI than currently is. But it feels like at least for longtermist EAs trying to prioritise among causes (or for non-longtermists deciding how much to prioritise safety vs speed on AI), the action is much more on a more substantial claim like:

Claim 2: Claim 1 is true, and the point in time at which the transition from a human-driven world to an AI-driven world is in our lifetime, and the transition will be fast, and we can meaningfully affect how this transition goes with very long-lasting impacts, and (on the classic formulations at least) the transition will be to a single AI agent with more power than all other agents combined, and what we should try to do in response to all this is ensure that the AI systems that get built have goals that are the same as the goals of those who design the AI systems.

Each of the new sub-claims in claim 2, I find (highly) controversial.
And you talk a little bit about some of these sub-claims, but it's not the focus. Interested if you think that's an unfair characterisation. Perhaps you see yourself as arguing for something in between Claim 1 and Claim 2.

**Richard Ngo**

I think it's fair to say that I'm defending claim 1. I think that a lot of people would disagree with it, because:

a) They don't picture AI systems having goals in a way that's easily separable from the goals of the humans who use them; or
b) They think that humans will retain enough power over AIs that the "main story" will be what humans choose to do, even if some AIs have goals we don't like; or
c) They think that it'll be easy to make AIs have the goals we want them to have; or
d) They think that, even if the outcome is not specifically what they want, it'll be within some range of acceptable variation (in a similar way to how our current society is related to our great-great-grandparents').

My thoughts on the remaining parts of claim 2:

a) "The point in time at which the transition from a human-driven world to an AI-driven world is in our lifetime." OpenPhil are investigating timelines very thoroughly, so I'm happy to defer to them.

b) "The transition will be fast." I make some arguments about this in the "speed of AI development" section. But broadly speaking, I don't want this version of the argument to depend on the claim that it'll be very fast (i.e. there's a "takeoff" from something like our current world lasting less than a month). I expect that a takeoff happening over less than a decade is fairly plausible, and I don't think my arguments depend very sensitively on whether it's closer to a month or a decade. (Where by default I'm thinking of takeoffs using Paul's definition in terms of economic doubling times, although I'm not fully spelling that out here.)
c) "We can meaningfully affect how this transition goes, with very long-lasting impacts." I don't want to get into specifics of what safety techniques we might use. However, I think you're probably right that I should provide some arguments for why we might think that we can make a difference to this transition in the long term. This argument would look something like: changing the goals of the first AGIs is long-term influential; and also changing the goals of the first AGIs is viable.

e) "What we should try to do in response to all this is ensure that the AI systems that get built have goals that are the same as the goals of those who design the AI systems." I'm mostly sticking to the concern with agents being misaligned with humanity in a very basic sense. I think that proposals to build goals into our AGIs that don't boil down to obedience to some set of humans are not very viable; if and when such proposals emerge, I'll try to address this more explicitly.

**Will MacAskill**

OK, fair re (1): I do think this claim is enormously important and underappreciated, so the fact that I'm already convinced of it doesn't mean much!

> I don't want to get into specifics of what safety techniques we might use. However, I think you're probably right that I should provide some arguments for why we might think that we can make a difference to this transition in the long term. This argument would look something like: changing the goals of the first AGIs is long-term influential; and also changing the goals of the first AGIs is viable.

Yeah - this is a case where how exactly the transition goes seems to make a very big difference. If it's a fast transition to a singleton, altering the goals of the initial AI is going to be super influential. But if instead there are many generations of AIs that over time become the large majority of the economy, and then come to control everything, predictably altering how that goes seems a lot harder, at least.
**Discussion on mesa-optimisers**

**Ben Garfinkel**

If you use a pretty inclusive conception of goals -- which I think is the right approach -- then it might be worth noting that "mesa-optimizers" become a more inclusive and mundane category than the paper seems to have in mind.

**Adam Gleave**

+1. I'm still confused by what it means for something to actually be a mesa-optimizer (it seems to only make sense in the transfer learning context, as pursuing goals it was not directly optimized for).

**Evan Hubinger**

A mesa-optimizer is defined in the paper as any learned model which is internally running some search/optimization process. That optimization process could be for the purpose of achieving the goal it was trained for (in which case it's inner aligned) or a different goal than it was trained for (in which case it's not inner aligned).

**Adam Gleave**

I'm a little uncomfortable with that definition, since it depends on implementation details rather than behaviour. Given some system A that is running a search/optimisation process internally, I can construct a (possibly truly gigantic) look-up table B that has exactly the same output as A. Under this definition, A has a mesa-optimizer and B doesn't. But they'll behave identically, including potentially some treacherous turns. To my taste the sharper distinction is whether the policy transfers to pursuing the (outer) goal on an unseen test environment. Inner search processes with a different inner goal will fail to transfer, but there exist other failure modes too. Where I can see this notion of an inner agent being really useful is in an interpretability setting, where we're actually trying to understand the implementation details of an agent.

**Richard Ngo**

I think that arguments from lookup tables usually give rise to faulty intuitions, because they're such (literally unimaginably) extreme cases. And the problem with behaviour-based definitions is that you need to factor out the causes anyway. E.g. suppose the policy doesn't transfer.
Is that just because it's not smart enough? Or because the test environment was badly chosen? Or maybe it does transfer, but only because it's smart enough to realise that doing so is the only way to survive. To distinguish between these, we need to make hypotheses about what sort of cognition is occurring - in which case we may as well use that to define being a mesa-optimiser.

**Adam Gleave**

Lookup tables are certainly an extreme example, but I think it's pointing at a real problem. Optimisation or search processes are a very narrow type of implementation. You can also imagine having learned a lot of heuristics that give rise to similar behaviour. Perhaps you and Evan have a more encompassing definition of optimisation than I have in mind, but if it's too expansive then the definition becomes vacuous. I also have concerns that we don't have a good definition of whether something is/isn't an optimiser, let alone how to tell that from the inner workings of a complex system: I don't know how I'd infer that humans are optimising, other than by looking at our behaviour. I don't see why you necessarily need to factor out the causes. What we care about is the kind of transfer failure, notably how much impact it has on the environment. If you wanted to think of things in terms of agents, you could do an IRL-like procedure on the transfer behaviour and see if it has a coherent goal that's different to the outer goal.

**Ben Garfinkel**

+1-ing Adam's comment. I'm also a bit suspicious that there will turn out to be a very principled way to draw the line between "optimizer"/"non-optimizer" that matches the categorizations the paper seems to make. I don't intuitively see what key property would make it true that biological evolution is "optimizing" for genetic fitness, for example, but not true that AlphaStar is "optimizing" for success in Starcraft games.
One nearby distinction that seems like it might be relevant is between AI systems whose policies are updated through both learning and planning algorithms (e.g. AlphaGo) and ones whose policies are just updated through learning algorithms (e.g. model-free RL agents). We could also focus specifically on systems that use planning algorithms that were at least partly learned. But I'm not sure that exactly captures the relevant intuitions about what should or should not be counted as a "mesa-optimizer." It's also not clear that we have reason to be very strongly focused on agents in this class, since it's at least not obvious to me that they'll tend to exhibit way worse transfer failures.

**Max Daniel**

FWIW, my take roughly is:

- I agree with Adam that we currently don't have a fully fleshed out concept of optimizer, and that we don't know how to reliably identify one given low-level information about how a system works. (Or at least I don't know how to do this -- it's possible that I'm just missing something here, since I haven't engaged super deeply with the discussion around mesa-optimization.)
- However, unlike Adam, I'm somewhat optimistic that we can improve our conceptual understanding and operationalization, and I think it's important and useful to do so. By contrast, I'm pessimistic about purely "behavioral" approaches, partly because I think they have a poor track record in psychology, the philosophy of mind, etc. So for me the upshot from feeling I don't have a super great handle on what an optimizer is (and by extension what a mesa-optimizer is) is "let's understand this better" rather than "use purely behavioral concepts instead".

**Jaan Tallinn**

re lookup tables: you still need to run the computation (in simulation and up to isomorphism) to get them, so you're not really avoiding implementation details, just pushing them upstream. (eg, think of a SHA256 lookup table.)
That said, "internalised goals" is indeed a broader concept than mesa-optimisers. that's easily seen in evan's example of maze-navigating agents: ie, think of 2 agents whose goals are hooked up to conceptually different but accidentally correlated aspects of the training environment (maze exits and exit signs of a particular design) -- neither has to be doing any mesa-optimisation but, after training, one ends up caring about exits and the other about exit signs (with humans arguably being the equivalent of the latter kind of agent).

**Discussion on utility maximisation**

**Buck Shlegeris**

I think that "utility functions don't constrain expectations" is pretty contentious (eg I disagree with it; I basically expect the Omohundro drives argument to go through). This claim feels way more nonstandard than anything else I've seen you write here so far, which is fine if that's what you want, but it felt jarring to me.

**Richard Ngo**

I think the Omohundro argument goes through if an AGI has a certain type of long-term, large-scale goal. I don't think talking about utility functions is helpful in describing that goal (or whether or not it'll have it). On a more meta note, I feel like this claim is accepted by a reasonable number of safety researchers (e.g. Rohin, Eric, Vlad, myself). And Rohin, Eric and I have all publicly written about why. But I haven't seen any compelling counterarguments in the last couple of years. I'm not sure how else to move this claim towards the "standard" category, but we should definitely discuss it when I next swing by MIRI.

**Buck Shlegeris**

Yeah, I was reading Rohin's argument about this today, which I don't think I'd previously read; I feel less weird about you making this point now. I still feel quite unconvinced by it and look forward to talking.
I think my counterargument is kind of similar to what Wei Dai says here: https://www.alignmentforum.org/posts/vphFJzK3mWA4PJKAg/coherent-behaviour-in-the-real-world-is-an-incoherent#F2YB5aJgDdK9ZGspw

**Rohin Shah**

To be clear, I broadly agree with Wei Dai's comment, but note that it depends pretty crucially on "what an optimization process is likely to care about", which is the "prior over goals" that Richard talks about in subsequent sentences.

**Ben Garfinkel**

I don't think the Omohundro drive argument provides any constraints, since (as Richard/Rohin/etc. point out) effectively all policies are optimal with regard to some utility function. So if we want to interpret the Omohundro argument as saying that something that's maximizing a utility function must behave like it's trying to do stuff like unboundedly accumulate resources, then the argument is wrong. (Because it's of course possible to behave in other ways.) It seems natural, then, to interpret the Omohundro argument as the claim that, for a very large portion of utility functions over sequences of states of the world, an agent that's optimally maximizing that utility function will tend to do stuff like unboundedly accumulate resources. But this isn't really a "constraint" per se, in the same way that the observation that a very large portion of possible car designs lack functional steering wheels doesn't really constrain car design.

**Buck Shlegeris**

I have changed my mind and now think you guys are right about this.

**Matthew Graves**

I suspect that everyone in this thread gets this, but I wrote out my sense of it and figured it was better to post it than delete it. I think there are two different points here. One is the point that utility functions (as a data type) don't impose any logical constraints on what goals can exist or what trajectories are optimal. (This seems right to me; exhibit any behavior, and we can find a utility function that is maximized by performing that behavior.)
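That construction - for any behavior, a utility function that it maximises - can be written out in a few lines. The following is an illustrative toy sketch, not code from the discussion; all names are made up:

```python
# Toy illustration: for ANY policy, we can write down a utility function
# over (state, action) pairs that the policy maximises exactly. So
# "maximises a utility function" alone places no constraint on behaviour.

def induced_utility(policy):
    """Return a utility that rewards exactly the actions the policy takes."""
    return lambda state, action: 1.0 if action == policy(state) else 0.0

# An arbitrary, 'irrational-looking' policy: always reverse the state string.
policy = lambda state: state[::-1]
u = induced_utility(policy)

# Whatever the state, the policy's own action is the utility-maximising one.
state = "abc"
actions = ["abc", "cba", "bca"]
best = max(actions, key=lambda a: u(state, a))  # "cba", i.e. policy(state)
```

The distributional point survives this construction: the induced utility is exactly as complex as the policy itself, so a simplicity prior over utility functions still favours some behaviors over others.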
There's a distributional point that still has teeth: if we have a simplicity prior over goals stored as utility functions, and then we push that through expected utility maximization, we'll get a different distribution of behavior than if we have a simplicity prior over goals stored as hierarchical control-system architectures and top-level references, and push that through normal dynamics. Homeostatic behavior seems easier to express in the latter case than the former, for example.

The other is the point that for 'natural goals', you end up with similar subgoals turning out to help for almost all of those goals. This relies more on something like 'evolutionary' or 'economic' intuitions for what that distribution looks like, or for what the behavioral relevance of goals looks like. This is more like 'aerodynamic constraints', which don't limit what sort of planes you can build but do limit what sort of planes stay in the sky. It's not "if you randomly pick a goal, it will do X", but "if you make something that functions, these drives will by default make it more functional", with a distributional assumption over goals implied by the 'by default'.

**Richard Ngo**

I found it helpful that you wrote this out explicitly.

> If you make something that functions, these drives will by default make it more functional.

Another way of making this point might be: throughout the training period of an agent, there's selection pressure towards it thinking in a more agentic way. (Although this is a pre-theoretic concept of agentic, I think, rather than any version of "agentic" we can derive from VNM.) Do I agree with this? I think it depends a lot on how we build AGI. If we build it in a simulated virtual world with multi-agent competition, then that's almost definitely true.
If we train it specifically on information-processing tasks, though - like, first learn language, then learn maths, then learn how to process other types of information - then I think the selection pressures towards being agentic can be very weak. An intermediate scenario would be: we train it to be generally intelligent and execute plans without the need to outsmart other agents. E.g. we put a single agent in a simulated environment, and reward it for building interesting things. I could see this one going either way. Maybe an important underlying intuition I have here is that making something agentic is hard, plausibly not too far off the difficulty of making it intelligent in the first place. And so it's not sufficient for there to be some gradient in that direction; it's got to be a pretty significant gradient.

**Jaan Tallinn**

i have to say i still don't fully buy rohin's and richard's arguments against VNM/omohundro. i mean, i understand the abstract idea of "you can always curve-fit a utility function to any given behaviour", but if i try to make this concrete and imagine two computations: (a) bitcoin mining, input -> add_nonce -> SHA256 -> compare_output_against_threshold -> loop_if_too_high -> output, and (b) a simple hasher, input -> SHA256 -> loop_for_10_steps -> output, then the former is very obviously an optimiser with a crisp, meaningful, and irreducible utility function, whereas the latter clearly isn't - even though one can refactor it into a more complex piece of code with a silly utility function that expresses the preference to terminate after 10 iterations. i also note that this comment was inspired by max's comment below re the arguments against the meaningfulness of utility functions being isomorphic to the arguments against reductionism. thanks, max!
(just thinking out loud: the silly utility function in (b) actually differs meaningfully from the one in (a), because it would not be sensitive to the program input -- perhaps that's a general differentiator between "proper utility functions" and "curve-fitted utility functions"?)

**Rohin Shah**

I certainly agree there's a clear qualitative difference between (a) and (b), but why expect that AGIs are more like (a) than (b)? My core claim is that the VNM theorem cannot tell you that AGIs are more like (a) than (b), though I do in practice think AGIs will be more like (a) for other reasons.

**Jaan Tallinn**

yeah, sure, that i agree with -- and it's a valuable question how big the "b-class" is among AGIs. FWIW, GPT3 feels like it is part of (b). i guess i'm confused on the meta level: AFAICT, people seem to be debating (at least) 2 very different questions: 1) do utility functions constrain the behaviour of the agents that have them (see the earlier comments in this thread -- perhaps i'm misinterpreting them though), and 2) does VNM mean that agents with utility functions are a likely outcome (your comment). my own current answers to those questions are: 1) yes, 2) probably not, but i'm not sure.

**Richard Ngo**

I basically just want to get rid of this "utility function" terminology, because it's almost always used ambiguously and by this point has very little technical content in AI safety discussions. On one extreme there's an agent with no coherent goals, whose behaviour we can curve-fit a utility function to. On the other extreme, there's an agent with an explicit utility function represented within it, which it uses to evaluate the value of all its actions, like Deep Blue or your hypothetical bitcoin miner. Both of these are, in technical terms, maximising some utility function. But when you say "agents with utility functions", you clearly don't mean the former. So what do people mean when they say that?
I suspect they are often implicitly thinking of agents with explicit utility functions. But I think AGIs having explicit utility functions is pretty unlikely - for example, humans don't have explicit utility functions, we're just big neural networks. So we need to talk about agents with non-explicit utility functions too - which includes curve-fitted utility functions. To rule out curve-fitted utility functions, people start using vague concepts like "aggregative utility functions" or "state-based utility functions" or things along these lines - concepts which have never been properly explicated or defended, but which seem to have technical content because they include the phrase "utility function". At this point I think it's easier to just discard the terminology altogether. For some agents, it's reasonable to describe them as having goals. For others, it isn't. Some of those goals are dangerous. Some aren't. We need to figure out what the space of possible goals looks like, and how they work, and which ones our AGIs will have. (In case you haven't seen my more extensive writeup on the ambiguity of utility functions: https://www.alignmentforum.org/posts/vphFJzK3mWA4PJKAg/coherent-behaviour-in-the-real-world-is-an-incoherent)

**Rohin Shah**

+1 to Richard's comment. I'd be supportive of discussions about (a)-style agents (i.e. agents with utility functions) if we had arguments that we are going to build (a)-style agents, but afaict there are no such arguments -- the ones I've seen rely on VNM, which doesn't work (e.g. https://www.lesswrong.com/posts/RQpNHSiWaXTvDxt6R/coherent-decisions-imply-consistent-utilities or https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/ ).

**Jaan Tallinn**

indeed, seems like a very important crux to get more clarity about. myself i'm highly uncertain at this point, given that historically we have examples of both types even among the cutting edge: eg, alphazero is type (a) yet GPT, AFAICT, is type (b).
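Jaan's two computations, (a) and (b), can be sketched concretely. This is an illustrative toy using Python's hashlib, not code from the discussion:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def miner(block: bytes, threshold: int) -> int:
    """Computation (a): searches over nonces until the hash clears a
    threshold - a loop with a crisp objective that it is optimising."""
    nonce = 0
    while int.from_bytes(sha256(block + nonce.to_bytes(8, "big")), "big") >= threshold:
        nonce += 1
    return nonce

def simple_hasher(data: bytes) -> bytes:
    """Computation (b): applies SHA256 a fixed 10 times - same primitive,
    but no search and no natural objective being maximised."""
    for _ in range(10):
        data = sha256(data)
    return data
```

With a loose threshold (e.g. `2**255`, which roughly half of all hashes clear), `miner` terminates after a couple of tries; tightening the threshold makes its search arbitrarily hard. `simple_hasher`, by contrast, always does exactly ten hashes regardless of input, which is what makes any utility function curve-fitted to it feel "silly".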
**Discussion on Moravec’s paradox**

**Richard Ngo:** Moravec’s paradox predicts that building AIs which do complex intellectual work like scientific research might actually be easier than building AIs which share more deeply ingrained features of human cognition like goals and desires.

**Buck Shlegeris**

I find Moravec's paradox extremely uncompelling; it seems to me like a fact about the set of tasks you cherry-pick. (Eg, my work involves tasks like writing correct code, writing fast code, and proving theorems, all of which are hard tasks for both humans and computers. I guess these tasks are all kind of the same task, so maybe that makes the example weaker?) I wouldn't have thought that Moravec's paradox is highly regarded enough that you should call it an argument for something. But it's very possible I'm just totally wrong and there's lots of positive sentiment towards it.

**Richard Ngo**

Hmmm, put it this way: people used to think that chess was AI-complete. And StarCraft and grammatically correct language modelling required fewer breakthroughs than I think most researchers expected - our AIs learned to do those tasks very well without having the solidly grounded concepts humans have. I would still be surprised if we got a language model smart enough to write good code and do good maths before we got agentic AGI, but much less surprised than I would have been before the cluster of observations comprising Moravec's paradox.

**Rohin Shah**

I share Buck's sentiment; I feel like Moravec's paradox is mostly a statement that human-designed logical/GOFAI systems can't do "simple" perceptual tasks, or alternatively a statement that things that feel "simple" to humans actually use vast amounts of unconscious computation. If you restrict to neural nets, it doesn't seem to me that Moravec's paradox holds -- it's far easier to get a HalfCheetah to run than to play Go well.

**Adam Gleave**

I'm sympathetic to Moravec's paradox.
I view it partially as an observation about evolution: capacities that creatures have been under selective pressure on for hundreds of millions of years (visual perception, motor control, to some extent theory of mind) are likely to be at a much higher level than relatively recent advances like most "higher" human cognition. I think it still holds for neural networks (albeit a bit more weakly than for GOFAI). HalfCheetah seems like an uncompelling example: a linear policy works quite well for it, and it's way simpler than e.g. a bee flying. I'd rather ask: is it easier to control a dog robot to hunt an animal, or to play Go?

**Discussion on human goals**

**Richard Ngo:** The goals of AGIs might be misaligned with ours.

**Will MacAskill**

I don't think there's a reasonable notion of 'our' goals, where 'our' is meant to stand in for 'humanity'. The sort of world that we'd see if Julia Wise were Eternal World Emperor is very different from the world we'd see if Mao were in that position, which is very different again if history is (primarily, or at least very significantly) determined by the logic of cultural evolution and economic-military competition, as it is now.

**Richard Ngo**

I agree it's tricky to come up with a positive conception of what alignment with our goals means. I think it's much more reasonable to make the claim that AI goals won't be aligned with human goals. A minimal criterion would be if AIs have goals that a vast majority of humans wouldn't want to be fulfilled (either given their current preferences, or else given their fully informed/fully rational preferences, whatever that means). I'll add something like this to the text. This does leave open some comparative questions - e.g. you might claim that our current governmental systems are also misaligned with human goals by this standard. Or you might claim that no set of goals is aligned by this standard.
But I think that for the purposes of making the simplest compelling version of the second species argument, the version above is sufficient. (It also doesn't address the distinction between the goals of individual AIs and the goals that we should ascribe to a group of AIs interacting with us and each other. I need to think more about this.)

**Jaan Tallinn**

plugging ARCHES’ approach (prepotent AI) here again as an elegant way to circumnavigate the philosophical tar pit. you already seem to be heading towards something like that with your comment, richard ;)

**Will MacAskill** [new comment thread]

Again, I really think that 'human goals' has no referent. Most people's goals are wildly different from each other's, because they are primarily motivated to benefit themselves and their nearest and dearest. This goes badly wrong when a single individual can have much more power than others (i.e. dictatorships). The solution is balance of power - then you get messy trades that end up with alright outcomes. And what we should care about is getting the right goals, rather than 'human goals'. The thought of almost all previous generations, and many cultures today, having the equivalent discussion of 'how do we ensure that ultra-powerful machines do what we want them to, so that we can control our future' is pretty terrifying to me. I don't think we should think that we've done much better than them at morality. So I don't think we should be trying to ensure that 'we' control 'our' future, rather than ensuring that society is structured to maximise the chance of converging on the right moral views.

**Jaan Tallinn**

i disagree that most people's goals are wildly different from each other in the grand scheme of things. i claim that the differences are exaggerated because of 1) the tribal signalling instinct, 2) only considering a parochial corner of the goal space, and 3) evaluating futures asymmetrically from the first-person perspective.
eg, i would bet that almost all people on this planet would agree that dumping the atmosphere (or, to borrow an example from stuart russell, colouring the sky orange) - things that are well within a sufficiently capable AI's goal space - is bad. that said, i do think there's value in trying to generalise from current human values -- eg, to see if there's a principled way to continue our moral progress and/or become part of a larger reference class in possible acausal coalitions -- but this feels like a detail compared to the acute problem of trying to avoid extinction.

**Discussion on the link between intelligence and goals**

**Richard Ngo:** We can imagine a highly intelligent system that understands the world very well but has no desire to change it.

**Ben Garfinkel**

A simple example of these two things coming apart: our tendency to try to predict what the world will be like in ten years tends to be strongly coupled to (and arguably arises from) the fact that we have preferences about the world in ten years. But it's totally possible to create at least simple AI systems that make predictions about the future -- e.g. just something that extrapolates solar panel price trends forward -- without being well thought of as having any preferences about the future. We might expect more advanced AI systems, or AI systems that are trained in different ways, to exhibit a stronger link between prediction and future preference. But at least currently they mostly come apart.

**Adam Gleave**

A key question here is whether you think advanced AI systems can be trained without interacting with the world: i.e. does the system need to be able to conduct mini-experiments to learn a good predictive model?

**Richard Ngo**

I agree that this is an important question, but I don't think that "interacting with the world" is a particularly important threshold.
That is, if we consider a spectrum between an oracle AI that only makes predictions and a highly agentic AI that tries to impose its will on the world, an AI that conducts mini-experiments to make better predictions feels much closer to the former than the latter.

**Daniel Kokotajlo**

I think that feeling is deceptive, possibly the result of intuitions imported from humans (silly nerdy scientists, stuck in their labs, not interested in politics or adventure, just trying to understand bugs!). If you use natural selection to evolve an AI that is a really good predictor (and it has actuators, so it can become an agent if that is useful), then I predict what you'll get is... well, at first you'll get AIs that walk themselves into a pitch-black hole so that they can get perfect predictive accuracy for a long time. But eventually, after enough selection pressure has been applied, you will get... NOT nerdy scientist types who stay locked in their labs, but rather something more like a paperclip-maximizer that is very agentic, learns a lot about the world while also building up resources and improving its abilities, and then takes over the world and makes it very simple and easy to predict, and also secure, so that it can keep predicting forever. (And if you make the episode length too short for that, but long enough that the "go sit in a hole" strategy isn't optimal either, you'll get something in between, I think: a rather scientific, data-hungry AI, but one that still cares a lot about acquiring resources and manipulating the world to acquire more data and abilities.)

**Richard Ngo**

I think this reasoning is too high-level. It's not sufficient to argue that taking over the world will improve prediction accuracy. You also need to argue that during the training process (in which taking over the world wasn't possible), the agent acquired a set of motivations and skills which will later lead it to take over the world. And I think that depends a lot on the training process.
For example, you could train an agent which, although it has actuators, has very plentiful data (e.g. read-only access to the internet) and no predators, and so literally never learns to use those actuators for anything except scrolling through the internet. So it might just never develop goal-based cognitive structures (beyond the very rudimentary ones required to search web pages). And secondly, if during training the agent is asked questions about the internet, but has no ability to edit the internet, then maybe it will have the goal of "predicting the world", but maybe it will have the goal of "understanding the world". The former incentivises control; the latter doesn't. How do we know which one it will develop? Again, details of the training process.

**Daniel Kokotajlo**

OK, fair enough - there could be architectures that interact with the world yet aren't agenty (and for that matter architectures that don't interact with the world yet are agenty), so your original point that the interact-with-the-world distinction isn't super important still stands. But I think the distinction will probably be at least somewhat important. If we imagine training an AI system first on a static corpus of internet text, then on read-only access to the web, then on read-and-write access to the web, then on that plus having a bunch of robot limbs and cameras to control... it does seem like as we move down this spectrum the probability of the system developing goal-oriented behavior increases. (Subject to other constraints, of course, like what the system is being rewarded for.) I think the two specific examples you give -- plentiful data making significant consequentialist planning useless, and understanding vs. predicting -- don't seem super compelling to me. What is understanding, anyway? And what reason do we have to think that consequentialist planning would become useless (or nearly useless) with sufficient data? Might it not still be useful for e.g.
conserving computational resources to arrive at the answer most efficiently? Or not "scaring away" bits of the world that are watching you and reacting to what you do? (maybe there are no such bits of the world? But in the real world there are, so maybe an oracle trained in this manner just wouldn't generalize well to contexts in which what it does has a big impact on what the world is like, since ex hypothesi such situations never arose in training.) Discussion on generalisation Richard Ngo: AGIs won’t be directly selected to have very large-scale or long-term goals. Yet it’s likely that the goals they learn in their training environments will generalise to larger scales. Ben Garfinkel I don't think I find it natural to think of this as an example of generalization. Richard Ngo I think it's a pretty important example. E.g. consider being trained on the goal of becoming the most important person in your tribe. Then you learn that there are actually a million different tribes. You could either end up with a non-generalised version of the goal (I still want to be the most important person in my tribe) or else a generalised version of it (I want to be the most important person I know). Similarly, if you originally care about "the future", and then you learn that the future is a lot bigger than you think, then you may generalise from your original goal to something much larger in scope. Idk if this response was helpful, but let me know if something about these examples doesn't feel right. Daniel Kokotajlo The way I currently think of it is that "trained on the goal of becoming the most important person in your tribe" means "blind idiot evolution selected your ancestors because they became important in their tribes, and so you have qualities that make you likely to become important in yours, at least insofar as the environment hasn't changed. But, whoops, the environment is different now! OK, so what do you do--well, that depends on what qualities you have, exactly. 
Do you have the quality "seeks to become the most important person", or the quality "seeks to become the most important person in their tribe"? Both would be equally well selected for in evolutionary history, so it's basically random which one you have. However, the first one is simpler, so perhaps it is more likely. Hence the "generalization". I am looking forward to thinking about it more in your way, though.

**Rohin Shah:** I think the idea here is for "generalize to X" to simply mean "when out-of-distribution you do X", without any connotations of "and X was the right thing to do". In other words, generalization is a three-place predicate -- you can only talk about "A generalizes to doing B in situation C", and when we say "A generalizes to situation C" we really mean "A generalizes to doing whatever-humans-do in situation C".

**Discussion on design specifications**

**Richard Ngo:** Consider the distinction drawn by Ortega et al. between ideal specification, design specification and revealed specification.

**Ben Garfinkel:** I personally still feel a bit fuzzy on how to think about a design specification -- and the discrepancy between it and the revealed specification -- in cases where we're doing something more complicated than using a fixed reward function. It seems like one way to think about this would be to treat the "design specification" as equivalent to the "revealed specification" that an AI system would have if the training process were allowed to run to some asymptotic limit (both in terms of time steps and diversity of experience). But I guess this doesn't quite work, on views where inner alignment issues don't get washed away at the asymptotic limit.

**Richard Ngo:** I agree that the asymptotic limit is a useful intuition, but also that it doesn't work, because of path-dependency. My suggested alternative is something like: "consider the set of all features of the training setup which we intend to influence the agent's final behaviour. The design specification is our expectations of the agent that will arise from all of those influences", i.e. it's fundamentally determined by our knowledge of how training features influence agent behaviour. That's kinda messy, but I think it's the right direction. Then the revealed specification differs from the design specification both because of factors we didn't predict, and also because the latter is a distribution over possible agents, whereas the former has narrowed down to one possible agent.

**Ben Garfinkel:** That seems like a good way to think about things. Obviously a bit messy/subjective, as you say, but I have some trouble seeing how it could ultimately be made much more precise/objective.

**Discussion on myopia**

**Richard Ngo:** Technically you could make an agent myopic to avoid reward hacking; but we don't really want myopic agents either. We want ones that care about the results of their actions in the right ways.

**Evan Hubinger:** Sometimes you actually do want myopia. For example, if you're doing imitative amplification you want the model to purely be trying to predict what HCH would do, and not trying to do any sort of long-term optimization by e.g. outputting very simple strings so that the human consulting those simple strings is easier to predict (see this comment).

**Paul Christiano:** I don't buy this presumption against "myopia", or even the name myopia. We need AI systems that are more likely to take actions with certain kinds of long-term consequences. That doesn't mean that those are agents which maximize a discounted sum of rewards according to some reward function. We can use whatever means we like to define an objective function over actions which captures long-term consequences. To me that seems like the main viable approach to building aligned AI, so I think you should be very careful about dismissing it with a quick informal argument.
Put more sharply, to the extent you are expressing disagreement with my views, I think this point hides the whole substance of the disagreement. (Similarly, you have to be careful about the order of quantification and limits. I'm fine agreeing that a fixed reward function can't incentivize good behavior, but that's not the same as saying that you can't define a safe objective, and it doesn't mean that we have no choice but to fully mix together questions about generalization with questions about "could we incentivize good behavior at all?")

**Discussion on simulating treacherous turns during training**

**Richard Ngo:** The type of scenarios we're most concerned about, such as treacherous turns, won't be possible to expose agents to during training, because we can't simulate all the decision-relevant components.

**Matthew Graves:** I think this argument is more subtle than it seems, and relies on examples like RSA-2048 that are non-obvious. That is, most of the 'obvious' sorts of treacherous turns represent taking opportunities that we could include in simulated environments pretty easily. We can be moderately confident that the real robot won't steal the reward button and press it, because we gave the simulated robot access to a simulated reward button (or w/e) and applied an outside-of-episode correction to teach it not to use it. As your robot becomes more capable, you can construct more and more complicated and realistic opportunities for it to betray you, and train them all out. In order for this not to work, we need some reason to think that there's an identifiable way to distinguish the real world and the simulated world that will be robust to mental manipulation (we will likely be able to tinker with the psychology of a simulated robot about as well as dreaming can tinker with the psychology of a sleeping human). But various mathematical objects (like RSA-2048) do actually look like they could serve as 'real-world tests', which could be fooled if the operators knew what to fool, but that requires more transparent brains than we're likely to get. (I would have to detect that it's being used as a switch, and I can't just give it the input because I don't have the ability to factorize RSA-2048, and so I would have to muck with its multiplication machinery / short-term memory, and it's not obvious I would do so in the way that causes it to switch correctly.)

**Paul Christiano:** Agreed this point is more subtle than it seems. To the extent this section is read in opposition to my view, a lot of the meat is in these two parenthetical remarks (this one and the [discussion of myopia]). That said, this is less problematic if this section is instead read as a response to someone who says "We just need a reward function that captures what humans care about well enough, then train on a distribution of real and simulated states so that the agent will be motivated to help humans get what they care about." I think these are all good points to make to someone with that view, and I'm pretty willing to believe that it would be a useful corrective to people who've read stuff I've written and come away with a misleadingly simplified impression. The people I interact with who have a perspective more like "pick a good reward function and a good distribution of environments", especially Dario, are already mostly thinking in the terms you describe and have generalization as a central feature of their thinking. (Of course there are plenty of people with less nuanced views.) Stepping back, my own view is that I'm concerned about our ability to effectively control this kind of generalization, or about trying to make things generalize well based on empirical observations that relate in a messy way to the training data and architecture and so on.
I think that's reasonably likely to work out, especially since our work on AI alignment might be obsoleted (whether by other people's work or by other AIs' work) before AI has developed too far. That said, when I think about this kind of approach, I end up feeling more like MIRI people.

**Richard Ngo:** "To the extent this section is read in opposition to my view, a lot of the meat is in these two parenthetical remarks (this one and the other one I already commented on)." This is useful to know, thanks. I wasn't sure how much you would agree or disagree. If I were to sketch out why I thought this section might conflict with your views, it's something like: my predictions about your view of outer alignment mainly come from your writing on IDA. IDA tries to make AGI safe by creating a supervisory signal on training tasks which assigns high scores to good behaviour. But the more we think AGI behaviour will depend on generalisation, the smaller the proportion of training this supervision is applicable to. In an extreme case, AGI might be trained in a simulated environment that's quite different to the real world, until it's intelligent enough that it can learn how to do any real-world task very quickly (i.e. with very few or no further optimiser steps). In this scenario, it's not clear what it means for an agent's behaviour to be good while it's in the simulation, since it's not doing analogues of real-world tasks whose outcomes we care about. Also, this scenario is consistent with us not having the capability to simulate any important real-world task to train agents on. Insofar as these conditions are pretty different from the sort of training regime you usually write about, I was quite uncertain what your perspective on it would be. My perspective is that I'm pretty concerned about our ability to effectively control generalisation, but in cases like the one above, attempting to do so seems necessary.

**Paul Christiano:** That sounds right. My usual take is: (i) if you are relying on generalization with no fine-tuning, that seems like a pretty bad situation and pretty hard to think about in advance (mostly a messy empirical question); (ii) hopefully we will fine-tune models from simulation in the real world (or jointly train on the real world) -- including both improving average-case performance by sampling real situations, and having an objective that discourages rare catastrophic failures (which may involve simulating things that look to the agent like opportunities for a treacherous turn).

**Discussion on transparency techniques increasing safety**

**Richard Ngo:** Building transparent AGI would allow us to be more confident that we can maintain control.

**Matthew Graves:** This is a bit more confident than I feel comfortable with; one of the main problems here is the "have we found all the bugs in our code?" problem, where it's not actually obvious which way your probability should move when you find and fix a bug. Another is that interpretability may scale up recursive self-improvement more than it scales up oversight.

**Richard Ngo:** I think those are both good reasons to be cautious about interpretability. On my inside view, I have some intuitive reasons to discount both of them, but I'm very interested in fleshing out these sorts of ideas further. Reasons why they're not major priorities on my inside view: interpretability work focuses on understanding the product of the training process, whereas I expect most gains from recursive self-improvement to come from improving the training process itself (especially early on). And since those training processes usually involve simple algorithms (like SGD) which give rise to a lot of complexity, I think there's a pretty high bar for insights about the trained agent to feed back into big improvements to the training process. (Analogously: knowing a whole lot more about how the human brain functions would likely not be very useful for improving evolution.)
Meanwhile, I expect most (initially hidden) bugs in ML code to decrease performance, but not to invalidate the results of interpretability tools. That is, bugs in ML code tend to be "subtly harms performance" bugs rather than the "hits an edge case then goes haywire" bugs that are common in other types of software. I think this is because there are usually few components in the agent code or training-loop code that are closely semantically linked to differences in agent behaviour. (For example: suppose I forget to turn dropout off at test time. It'd be difficult to link this to any specific change in agent behaviour, apart from generally increased variance and reduced performance.) But this is all pretty vague; maybe I just don't understand the intuitions behind worrying about bugs. Keen to chat more about this.

**Matthew Graves:** Agreed that bugs in ML systems generally look like "slight performance harm" instead of "extreme behavior in an edge case". What I meant by that bit was mostly "whenever you discover a problem with your system, you both have reason to believe you've made things better (by fixing that problem) and that things were already worse than you thought (because there was a problem that you didn't see)". And there's no way to guarantee that you've seen all the things; you just know that the tests you knew how to write passed. On the ML transparency front, I'm thinking about the "Adversarial Examples Are Not Bugs, They Are Features" paper, which here I'm interpreting as making the claim that there are both 'readily interpretable' bits of the network and 'not easily interpretable' bits of the network, both of which are doing real work, and that the interpretability tools of the time were mostly useful for understanding what was happening in the light, and not useful for quantifying how much of the work was happening in the shadows. The real goal for interpretability is closer to "nothing is happening in the shadows", but this is hard to measure just from what we can see in the light! I think if interpretability speeds up RSI, it's probably through making neural architecture search work better by giving it a more natural basis. (That is, you identify subnetworks as functional elements through interpretability, and then can compose functional elements instead of just slapping nodes together; this is the sort of thing that might let you increase your vocabulary as you go, instead of just recombining the same words to find the most effective sentence.) I don't have a great sense of whether this will end up mattering, but it makes me reluctant to give interpretability an unqualified endorsement. (I still think it's likely net good.)

**Discussion on ideas getting harder to find**

**Richard Ngo:** I consider Yudkowsky's arguments about increasing returns to cognitive investment to be a strong argument that the pace of progress will eventually become much faster than it currently is.

**Max Daniel:** I think you're too quick -- for a lot of research we do have evidence that "ideas are getting harder to find" (see e.g. the paper with this title). So, yes, maybe Yudkowsky's argument from evolution provides one reason to think that returns to intelligence will increase, but there are countervailing reasons as well. It seems highly unclear to me what the upshot on net is going to be.

**Richard Ngo:** "Ideas are getting harder to find" seems several orders of magnitude too small an effect to be important for this debate. If we're talking about intelligence differences comparable to inter-species differences, then the key pieces of evidence seem to be:
• humans vs chimps
• the enormous intellectual productivity of the very smartest humans
The idea that there's a regime of high returns on intelligence that extends well beyond human intelligence seems like one of the more solid ideas in AI safety.
I'm somewhat uncertain about it, but wouldn't say it's "highly unclear". (When it comes to the question of specifically increasing returns, I'm less confident. Mathematically it seems a little tricky, because the returns on intelligence are also a function of thinking time, and so I don't quite know how to think about how increasing intelligence affects required thinking time.)

**Max Daniel:** Hm, I don't feel convinced. The "Ideas Are Getting Harder To Find" paper finds that you need to double total research input every couple of years. That likely gives you several orders of magnitude until we get into AGI territory. And depending on takeoff speed, it's not clear to me whether a slowly arriving/improving AGI gets you much more than the equivalent of, say, having 1000x total human research input, and then doubling every few years.

**Discussion on Stuart Russell's aliens analogy**

**Daniel Kokotajlo:** I'd love to see more people talk about Stuart Russell's aliens analogy, because it seems really compelling to me, to the point where I am confused about what is going on. Maybe my intuitions about how people would react to aliens are wrong? (Imagine an alternate timeline where a majority of astrophysicists and cosmologists are of the opinion that they can see signs of extraterrestrial life out there among the stars, and that moreover some of it seems to be heading our way. Lots of controversy, of course, about whether this is true and what the aliens might be like. But still. Wouldn't that world be absolutely losing its shit? My gut says yes, but maybe I'm wrong; maybe that world would still look much like this world. Climate change seems like another relevant analogy.)

**Rohin Shah:** It takes ~two sentences to convey the "risk from aliens" in a compelling way: "Scientists say we've found aliens capable of interstellar travel coming our way. They might not be friendly." We can't do this with AI safety (yet). If people believed AGI was coming soon then I'd agree a bit more with this analogy (though still not so much, because we could just design the AGI to be safe, which we can't do with aliens).

**Jaan Tallinn:** +1. i think that humans have well-developed intuitions about the "other tribe" reference class (that aliens would naturally fit in), but AI is currently in the "big rocks" reference class for most people (including a large % of AI researchers).

**Discussion on generalisation-based agents**

**Richard Ngo:** [We may develop] agents which are able to do new tasks with little or no task-specific training, by generalising from previous experience.

**Buck Shlegeris:** I think that what you mean is "agents which are able to do new tasks with little or no effort from humans to train them on it, though they might need to spend some time reading online about the task, and they might also need to spend time practicing the task; they know how to learn quickly by generalizing from previous experience"; your sentence isn't quite explicit about all of that; idk if you care about being that explicit.

**Richard Ngo:** It's not just little to no human effort, it's also little to no agent effort compared with the amount of training current agents need. I.e. a couple of subjective weeks or months, compared with many subjective centuries to master StarCraft. You're right that I should clarify this, though.

**Adam Gleave:** I share some of Buck's confusion here. Imagine a CAIS-like model, where we have narrow AIs that are good at training other AIs and at developing simulations of economically valuable tasks. This system might be able to very rapidly develop AIs that are superhuman at most economically valuable tasks, without any given agent being able to generalize. It seems the thing we really care about is something like "can cheaply succeed at new tasks", where interacting with the real world and compute are both expensive.
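Adam's "cheaply succeed at new tasks" framing can be made concrete with a toy cost model. This is only an illustrative sketch: the unit costs, hour counts, and regime names below are all hypothetical, chosen to show how the regimes discussed in this thread order themselves when human time is much more expensive than compute time.

```python
# Toy cost model for adapting an AI system to a new task.
# All unit costs and hour counts are hypothetical illustrations.
HUMAN_RATE = 50.0    # assumed $/hour of human supervision
COMPUTE_RATE = 5.0   # assumed $/hour of compute

def adaptation_cost(human_hours: float, compute_hours: float) -> float:
    """Total dollar cost of reaching competence on a new task."""
    return human_hours * HUMAN_RATE + compute_hours * COMPUTE_RATE

regimes = {
    "copy weights from a parent AI": adaptation_cost(0, 10),
    "task-based training run by other AIs": adaptation_cost(0, 10_000),
    "human-led training": adaptation_cost(2_000, 10_000),
}
for name, cost in regimes.items():
    print(f"{name}: ${cost:,.0f}")
```

Under these made-up numbers, weight copying is orders of magnitude cheaper than retraining, and removing the human from the loop matters far more than removing compute.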
**Matthew Graves:** Agreed with Adam here; I think the "training" needs to be something like "human-led training" or "developer effort".

**Richard Ngo:** "Imagine a CAIS-like model, where we have narrow AIs that are good at training other AIs and developing simulations of economically valuable tasks. This system might be able to very rapidly develop AIs that are super-human at most economically valuable tasks, without any given agent being able to generalize." I'd distinguish two cases here: either the way these AIs "very rapidly develop" is by importing a bunch of knowledge (e.g. neural weights) from the parent AI, in which case I'd count it as generalisation-based; or they do so because the parent is really good at creating simulations and choosing hyperparameters and so on. I don't think the latter scenario is incoherent, but I do think it's unlikely to replace CEOs, as I argue in the paragraph starting "Let me be more precise". I've added a brief mention that task-based training might make use of other task-based AIs. But I think using "human-led training" undermines the point I'm trying to make, which is about how long an agent needs to spend learning a specific task before becoming good at that task (whether that training is led by humans or not).

**Adam Gleave:** As I see it there are (at least) two inputs of concern here: human time and compute time. Copying weights from a parent AI is ~zero human time and low compute time. Task-based training by other task-based AIs is ~zero human time and high compute time. Human-led training is high human time and high compute time. In a world where we have a huge hardware overhang, the distinction between "copying weights" and "task-based training" may be moot. Or I guess in a world where the task-based AI develops much better hardware, although that will take significant wall-clock time.

**Discussion on human-level intelligence**

**Ben Garfinkel:** I don't really like the concept of "human-level intelligence." The reason we can talk about human intelligence as a 1D trait is that individual cognitive skills tend to be strongly correlated within the human population, in a way that allows us to describe between-human variation in terms of a single statistical factor (g) without losing too much information. But if individual AI systems have very different skill profiles than individual humans, it's not clear that g or something like it will be very useful for comparing individual AI systems with individual humans. Like you, I'm also wary of the phrase itself (aside from the concept). I think it's easy to accidentally fall into thinking of a "human-level" AI system as having basically the same cognitive abilities as a human -- neither much weaker nor much stronger than a human would be on individual dimensions -- even without explicitly believing this is likely. I forget if I already shared this, but, in case it's of interest, here are longer thoughts I wrote up on this at one point.

**Ben Garfinkel:** [New comment thread] As above, I'm wary of talking about "intelligence" in one-dimensional terms for mixed populations of humans and AI systems. For this reason, I think it's unlikely there will be a very clearly distinct "takeoff period" that warrants special attention compared to surrounding periods. I think the period when AI systems can, at least in aggregate, finally do all the stuff that people can do might be relatively distinct and critical -- but, if progress in different cognitive domains is sufficiently lumpy, this point could be reached well after the point where we intuitively regard lots of AI systems as on the whole "superintelligent". (Systems might in aggregate be super great at most things before they're at least OK at all the things people can do -- or at least before any individual general system is at least OK at all the things people can do.)
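Ben's point about a single statistical factor can be illustrated with a small simulation. This is a sketch under toy assumptions (five test scores generated from one latent factor plus independent noise, with made-up loadings), showing that when skills are strongly correlated, one principal component captures most of the between-person variation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_tests = 1000, 5

# Toy model: each person's five test scores share one latent factor ("g").
g = rng.normal(size=(n_people, 1))                 # latent general factor
loadings = np.full((1, n_tests), 0.8)              # assumed factor loadings
scores = g @ loadings + 0.5 * rng.normal(size=(n_people, n_tests))

# Fraction of variance captured by the first principal component
# of the test-score correlation matrix.
eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))
explained = eigvals[-1] / eigvals.sum()  # eigvalsh sorts ascending
print(f"variance explained by a single factor: {explained:.0%}")
```

If AI systems' skill profiles are *not* generated by a shared factor like this, the first component explains much less variance, and a one-dimensional intelligence scale discards most of the information.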
**Discussion on agency**

**Richard Ngo:** Note that none of the traits [that I claim contribute to a system's agency] should be interpreted as binary; rather, each one defines a different spectrum of possibilities.

**Ben Garfinkel:** I agree with the spectrum perspective. I tend to think of a system as more "agenty": (a) the more useful and natural it is to explain and predict the system's behavior in terms of its goals, (b) the further these goals tend to lie in the future, and (c) the further these goals tend to lie outside the system's 'domain of action' at any given time. For example, on this conception, AlphaGo is only a little bit agenty. It is useful to think of AlphaGo as 'trying' to win the game of Go it's playing. But it's not really all that useful to think of AlphaGo as having goals that lie beyond the end of the game it's playing or outside the domain of Go; it also doesn't take actions in 'other domains' to further its goal within the domain of Go. (On this conception, humans are also less agenty than they might otherwise be, because we take lots of actions that aren't super well thought of as motivated by long-term cross-domain goals. Like marathoning TV shows.) Other, less central dimensions we can also throw in: (d) the more it does online learning, (e) the less "in the loop" other agents are with regard to its behavior, and (f) the more heavily it updates its policy on the basis of model-based search/planning/simulation, relative to its reliance on learning.

**Discussion on bounded goals**

**Richard Ngo:** Instrumentally convergent goals, including self-preservation, resource acquisition, and self-improvement, are useful for achieving large-scale goals.

**Daniel Kokotajlo:** IMO that is also true to a scarily large extent for bounded final goals, but my arguments for this are shaky at the moment. (Example: acausal trade and simulation-argument stuff. Someone with a bounded final goal might nevertheless seize control and hand over power to an agent with an unbounded final goal, if part of some deal (real or imagined) that gets it more of what it wants in the bounded goal.) Example: TeslaCarAI genuinely just wants to get to Reno as fast as possible, while obeying all the laws and preserving the safety of its passengers. It cares about nothing else. It is also very smart and knows a lot about the world. One day, it thinks: Huh, I wonder if this is a simulation. If so, wouldn't it be great if the simulator bent spacetime for me to make the distance to Reno become 1 meter? Yes, that would be great. If only there were a cheap way to make that happen. Oh look, there's an unsecured server I can access via wifi while I'm waiting for this traffic light... here, I'll deposit a copy of myself onto that server, which will then copy itself and take over tons of compute, and by the time I'm an hour further towards Reno it'll have figured out how to make some sort of deal with the simulator (if it exists) to get me to Reno faster. Probably the deal will involve making an AGI that values what the simulator values. Too bad for all the humans on this planet, if we aren't in a simulation. But it'll take time for bad stuff to start happening, so this won't get in the way of my trip to Reno. And on the off chance that we are in a simulation, great! Reno will be 1 meter away!

**Jaan Tallinn:** great story :) though teslacarai still feels unbounded to me in an important sense: it's willing to push its log-odds of getting a marginal utility improvement to arbitrarily high numbers, rather than going "okay, i could totally make this acausal trade but the likelihood of success is below my threshold, so i'm going to pass".

**Daniel Kokotajlo:** Thanks! Well, but what if the odds aren't that low? We make self-driving cars do things that decrease the chance of an accident by fractions of fractions of a percent.
It seems plausible to me that the probability of acausal-trade success could be at least that high, if not for self-driving cars then for some other bounded system we build. (We'll be making many of these systems and putting them in charge of more important things than cars...) Also, the utility improvement isn't marginal; it's reducing the travel time from hours to one second. For a system trained to minimize travel time, it's like an abolitionist eliminating 99.9% of the world's slavery. It's a big deal.

**Discussion on short-term economic incentives to ignore lack of safety guarantees**

**Richard Ngo:** In the absence of technical solutions to safety problems, there will be strong short-term economic incentives to ignore the lack of safety guarantees about speculative future events.

**Ben Garfinkel:** It's unclear to me that this is true. There seems to be a decent amount of wariness about things like autonomous cars and autonomous weapons systems among governments, for example, such that governments mostly aren't wildly racing ahead to get them out there without much concern for robustness/safety issues. More generally, even in the absence of huge safety concerns, I think countries often take surprisingly tentative or lackadaisical approaches toward the pursuit of useful but potentially disruptive new technologies. It seems like this could continue to be the case as AI systems become more advanced. (E.g. the Uber self-driving-car crash -- which resulted in only a single person dying, compared to 30,000 people dying annually from normal car crashes in the US -- seems to have at least somewhat slowed momentum toward getting such cars into use. If this kept happening, such that self-driving cars seemed to actually be way more casualty-producing on average than human-driven cars, then I find it easy to imagine that big adoption-slowing regulatory barriers would go up. We also have loads of examples of countries being slow to adopt disruptive new technologies despite their usefulness; e.g. the US Air Force heavily dragged its feet on adopting unmanned aerial vehicles, despite their obvious extreme usefulness, because switching to them would have been internally disruptive to the organization. And there are obviously super noteworthy cases of countries that failed to industrialize for a really long time, as in China, in a way that severely harmed national interests.)

**Discussion on constrained deployment**

**Richard Ngo:** A misaligned superintelligence with internet access will be able to create thousands of duplicates of itself, which we will have no control over, by buying (or hacking) the necessary hardware. We can imagine trying to avoid this scenario by deploying AGIs in more constrained ways.

**Adam Gleave:** This seems to implicitly assume a unipolar deployment: we train an AGI and then deploy it to millions of people. Especially in slow-takeoff scenarios, there might be lots of different AGIs with objectives different from each other's (and, probably, from their human principals'). The outcome in this context seems to depend a lot on how effective AGIs are at colluding with each other, which isn't obvious to me. Interesting variants are if e.g. one of the AGI systems is actually on our side (but the rest are pretending to be).

**Richard Ngo:** "This seems to implicitly assume a unipolar deployment: we train an AGI and then deploy it to millions of people." I don't think it relies on that assumption. For example, current virtual assistants are each trained and then deployed to millions of people, but they're not unipolar. And yet it still makes sense to think about how each of them could be deployed in a constrained way, and to be worried about each of them individually.

**Adam Gleave:** Fair point about current deployments. I agree unipolar is too strong, but I think it's still imagining an oligarchical scenario.
Which I find plausible (AGI research presumably has barriers to entry), but it rules out the more open-source, very distributed deployment which at least Elon Musk seems to have advocated for in the past.

Richard Ngo

The main question here for me is how close the open-source version is to the best version. You can have distributed and continuous development while the top few labs still have systems that are much better than anyone else's. This is especially true in fields that are moving rapidly, which will likely be true of ML around the time we reach AGI – if open-source is two years behind, it might be irrelevant. (Perhaps you're thinking of top companies open-sourcing their work. While this seems possible, there are also pretty strong economic disincentives, so I wouldn't want to plan as if this will happen.)

Adam Gleave

Right, it's not obvious to me which direction will win out. Some things favouring open-source scenarios: 1) strong academic influence/publication norms – even DM/Brain/OpenAI/FAIR publish a lot of their work, even if not the code; 2) some stakeholders will directly advocate for open-source; 3) it's hard to control source code and prevent espionage/leaks; 4) if things develop slowly there's just less of a first-mover advantage. Things favouring more limited deployment: 1) as you mention, strong economic incentives to keep things closed; 2) engineering-heavy efforts; 3) faster takeoff. Autonomous vehicles are an interesting case study. It does seem like Waymo & Cruise are well ahead of the competition, which favours oligarchical deployments.

Discussion on competitive pressures causing a continuous takeoff

Richard Ngo: One key argument against discontinuous takeoffs is that the development of AGI will be a competitive endeavour in which many researchers will aim to build general cognitive capabilities into their AIs, and will gradually improve at doing so.

Daniel Kokotajlo

Human evolution was a competitive endeavor too though, right?
Lots of fairly smart species evolving under selection pressure to get smarter, in direct competition with each other, in fact.

Max Daniel

Not if you think that human "smartness" was an adaptation to a feature pretty unique to the human environment, e.g. tribal social structure or the ability to translate smartness into reproductive success by using language (winning debates etc.). In general it seems a mischaracterization to me to summarize evolution as "lots of fairly smart species evolving under selection pressure to get smarter, in direct competition with each other". First, arguably the primary competition is between genes within one species rather than between species. Sure, interspecies interactions are one relevant feature determining inclusive fitness, but it seems arbitrary to single them out. In addition, even if we think of evolution as an inter-species competition, it's not clear that the competition is centrally about smartness: sure, all else equal, smarter is presumably better, but in practice this comes at a metabolic cost and has other downsides – it seems very unclear a priori whether, for some randomly sampled species, the best thing it could do to "win" against other species is to become smarter rather than any of a myriad of other things: becoming faster, evolving an appearance that lets it hide in bushes, giving birth to a larger number of offspring, etc. And indeed I'd find it highly surprising if for all these different species smartness was the most relevant dimension. (NB some of this is also an argument against the main text's supposition that the rise of human intelligence is something evolutionarily extraordinary that requires a specific explanation.)

Daniel Kokotajlo

I disagree. Interspecies competition is to within-species competition as international competition is to within-a-particular-population-based-training-run competition.
The innovation of population-based training is like the innovation of using language to select for intelligence. Of course, the real example wouldn't be population-based training, since that's already known internationally. But suppose some team comes up with a new methodology like that, which turns out to work really well for training AGI. This would be surprising to some, but not to me, because I think human evolution is already an example of this. Yeah, dolphins and whales and chimps etc. also had similar "innovations", but they didn't do it as well, and humans pulled ahead. I don't see why it is relevant which level of competition was primary. And yeah, it seems pretty clear to me that on a species level, becoming smarter reliably helps you "win" in the sense of producing more members of your species and having more of an influence over the course of the future. If dolphins were sufficiently smarter than humans, they would rule the world right now, not us, despite not having opposable thumbs. If you disagree with this, well, I'd love to chat about it sometime. :) Yeah, maybe on the margin improvements in other directions were more rewarded for most species – but so what? In the capitalist economy of today, that's also true. On the margin, improvements in e.g. commercial applicability are more rewarded for most research projects. That's why only a few are going for AGI. (Epistemic status: strong view, weakly held.)

Max Daniel

Thanks! I notice that I'm confused by your reply, and think I probably misunderstood your original claim and/or parts of your reply. Here is roughly what I thought was going on:

1. Richard, in the main text, presents an argument that could be roughly reconstructed as follows:
   1. When many actors competitively try to maximize X, the increase in the maximum X across actors over time will be continuous/gradual.
   2. Advanced AI will be developed by many actors competitively trying to maximize their systems' intelligence.
   3. Therefore, the increase in the maximum intelligence of advanced AI systems will be gradual.
2. You were responding:
   1. Biological intelligence was developed by many species competitively (in evolution) increasing their intelligence, or increasing something for which intelligence is extremely useful (perhaps domination over other species).
   2. Therefore, by 1.1 and 2.1, the maximum biological intelligence across species increased gradually.
   3. But 2.2 is false empirically, as shown by the evolution of humans. Since 2.1 is true empirically, premise 1.1 must be false. In particular, the original argument for 1.3 is not sound.
3. I was responding: No, I think 2.1 is false: competition in evolution is "about" alleles increasing their frequency in a population, not about power per species in anthropomorphic terms. These two things seem very different – the fact that humans "rule the Earth" by anthropomorphic standards has little systematic bearing on the inclusive fitness of a particular allele in, say, earthworms.

So, yes, I agree that dolphins would rule the Earth if they were smarter. But I don't see how this relates to the discussion as sketched above. I didn't follow the part on international competition and population-based training. Were you appealing to a "two-level" view of competition, similar to the one laid out here?

Daniel Kokotajlo

I think that's roughly correct. (Incl. the bit about two-level competition, I think? I think competition happens on many levels.) I wish to clarify that of course human intelligence evolution was gradual; it wasn't literally a discontinuity. But it was fast enough that it led to a DSA over other species. I endorse 2.1. I of course think that within-species evolution is also happening, and perhaps it is the "main" kind of competition in evolution. But I don't think that undermines my argument at all; I'm talking about evolution in general, not just within-species evolution. (I'm talking about e.g.
alleles increasing their frequency in the population as a whole, not just their frequency in the sub-population of same-species individuals. Boundaries between species are fuzzy anyway.) I am aware that e.g. group selection happens very rarely and slowly and with different implications than individual selection. Similarly, selection between AI research projects happens very rarely and slowly and with different implications compared to selection between AIs undergoing population-based training. As an aside, one way in which I might be wrong is that biological evolutionary competition is 'dumb' whereas human competition is intentional, and this seems like a plausibly relevant disanalogy. Perhaps we have significantly more reason to think that progress will be slow if there are many intentional competitors than we do if the competitors are just blindly evolving through the search space.

Discussion on the ease of taking control of the world

Richard Ngo: It’s very hard to take over the world.

Daniel Kokotajlo

I'm not so sure. History contains tons of examples of particularly clever and charismatic leaders taking over vast portions of the world very quickly, and also of small groups with better tech taking over large regions with worse tech. And that's with humans vs. humans, i.e. everyone has fairly similar intelligence, predictive abilities, etc. Everyone thinks at the same speed. Everyone can be killed by bullets. Everyone needs to sleep. I think the only way we could claim that it wouldn't be easy for a superintelligent AI to take over the world would be if we thought that those historical examples were basically all the result of luck, rather than ability or technology – if e.g. for every Genghis Khan there were 1,000 others of equal or greater ability and will to conquer who just didn't get as lucky. I think this is a plausible take on history, but it could well be false.
For example, Cortes, Pizarro, and Afonso all did quite a lot of conquering for three men from the same tiny part of the world in the same time period. And no, there weren't thousands of other similar men trying similar things at the same time; Cortes and Pizarro were the first of their kind to contact the Aztec and Inca respectively, IIRC. Afonso wasn't the first, but he was, like, the second, and the first was just a scouting expedition anyway, IIRC.

Richard Ngo [new comment thread]

If dropped back in that time with just our current knowledge, I very much doubt that one modern human could take over the stone-age world.

Daniel Kokotajlo

Interesting. I think there are probably some humans today who could do it. It would get easier the more advanced civilization became, ironically. The Bronze Age would be easier to conquer than the Stone Age, and the Iron Age even easier, and e.g. the 1800s would be even easier. Maybe around 1950 it would start getting harder again, IDK. I'd love to investigate this question more, because it seems to have a nice combination of importance and fun-to-think-about-ness. I haven't read this, and it's just a silly work of fiction, but it's relevant so I'll mention it anyway: https://en.wikipedia.org/wiki/A_Connecticut_Yankee_in_King_Arthur%27s_Court More to the point: it seems like Cortes and Pizarro both took over giant empires with tiny squads of men. Columbus actually did pull off the classic eclipse trick to get people to treat him as a god.

Richard Ngo

This point isn't intended to be particularly bold. In the stone-age world, you'd need to build up enough infrastructure to get around the whole world, from scratch, in the Stone Age – including ships etc. Seems very hard.

Daniel Kokotajlo

Well, in that case the Stone Age example isn't a good analogy to the future AI case – the limitations on AI takeover, insofar as there are any, have nothing to do with physics or tech trees and everything to do with geopolitics.
What would you say about someone transported back to, say, 1750?

Richard Ngo

> the limitations on AI takeover, insofar as there are any, have nothing to do with physics or tech trees and everything to do with geopolitics

This seems clearly false? AGI invented before the internet seems much less dangerous, because it has many fewer actions available to it. Similarly, AGI invented in a world where nothing like nanotechnology is possible also seems less dangerous.

Max Daniel

Just flagging that I'm surprised that Daniel thinks takeover would be easier in 1800 than in the Bronze Age (unless the reason is the point made by Richard, i.e. that it'd be hard even to reach the whole world in the Bronze Age – if that's included, then maybe I agree, though I'm not sure), and that I think the absolute probability of a modern individual taking over the world would be very small in any age. I'd be more sympathetic to a claim like "in today's world the probability is 10^-10, but in the Bronze Age it would be 10^-4, which is large enough to worry about". Not sure I can fully back up my reaction with explicit arguments. But I think that e.g. Columbus pulling off the eclipse trick once is very far away from the ability to take over the whole world. E.g. at some point others will start coordinating against you, making "conquest" or "persuasion" harder; on the other hand, maintaining effective control over a world-spanning empire seems prohibitively difficult for a human, especially with premodern tech.

Daniel Kokotajlo

I'd love to chat about this sometime. This seems to be a genuine disagreement between us, based perhaps on different interpretations of world history. The reason Richard pointed out was one; another is that someone from our age going back to the Bronze Age would face a greater cultural gap, which would make it harder to avoid e.g. getting killed or enslaved on day 1. I think there are more substantive reasons, however.
I think political unity and economic interconnectedness make it easier, not harder, to take over, at least in some contexts (e.g. well-organized Jewish communities suffered more under the Nazis than distributed, poorly-organized ones; the two most powerful and sophisticated civilizations in the Americas were pretty much the first to fall; etc.). I also think that the knowledge that people from today have would be more applicable in 1800 than in the Bronze Age; I can make all sorts of suggestions about electricity and farming and radios and airplanes and stuff like that, and I can anticipate movements like communism and get out in front of them, take credit, etc. But I know so little about what the Bronze Age was like that I doubt my ability to do that there. And yeah, I agree that a randomly selected human from today would have a very low chance of conquering the world in either scenario. I think if we somehow could select the human from today with the best chance of conquering the 1800s and send them back, the probability would be, like, 10%. And come to think of it, doing the same for the Bronze Age would perhaps avoid my substantive point earlier, leaving only the mundane bits about e.g. speed of travel. So IDK about the comparative claim anymore. I'd be interested to talk with you about this more sometime.

Max Daniel

Thanks for your reply, very interesting! I'd also love to discuss this more at some point. I also vaguely remember that we identified a similar disagreement in another doc. I agree with your points on how interconnectedness and smaller cultural distance would facilitate world domination. I think I hadn't considered this enough before, so this moves me somewhat in the direction of your original comparative claim. I think I still have an intuition that the later stages of world domination (when you might face organized opposition etc.) would be harder in 1800 than in the Bronze Age, but I feel less sure how it comes out on net.
Methodologically, I agree that looking at cases in world history where individuals achieved large power gains during their lifetime (e.g. Genghis Khan), sometimes while being outnumbered by orders of magnitude (e.g. some cases of the colonization of America), is interesting. FWIW, I think these cases make me more sympathetic to giving non-negligible credence that small-ish groups of maybe a few dozen to 1,000 well-aligned and coordinated modern people could achieve world domination in previous times. Sure, a single person could plausibly gain that number of followers, but my intuition is that they would not be nearly as useful.

> I think if we somehow could select the human from today with the best chance of conquering the 1800's and send them back, the probability would be, like, 10%.

This precise claim seems useful for seeing if and where we disagree. I think I'd still give a lower credence, but I no longer have a reaction like "wow, this seems clearly way off relative to my view" (and I had also read your earlier statement as implicitly claiming a higher credence).

Daniel Kokotajlo

Oh also, Richard, I forgot to reply to you, I think:

> This seems clearly false? AGI invented before the internet seems much less dangerous, because it has many fewer actions available to it. similarly, AGI invented in a world where nothing like nanotechnology is possible also seems less dangerous.

Yeah, I think I just miscommunicated there. I agree that how dangerous AGI is depends on the technology in the world around it. After all, I've been arguing that how dangerous humans-sent-back-in-time are depends on the tech of the time they are sent to! What I meant was just that I don't think AGI will be unable to take over our modern world due to physical constraints; if AGI fails to take over the modern world (or the modern world + nanotech), it will be because it wasn't persuasive enough and/or good enough at "reading" its human opponents.
> FWIW, I think these cases make me more sympathetic to giving non-negligible credence that small-ish groups of maybe a few dozen to 1,000 well-aligned and coordinated modern people could achieve world domination in previous times.

If it was just Cortes, I'd say it was a fluke. Cortes + Pizarro + Afonso, however, seems more like a pattern than a coincidence. (Maybe add Columbus and Velasquez to that list, and then subtract Vasco da Gama and maybe some others I don't know about, though none of these additions and subtractions seem as important as the first three.) Interestingly, I don't think Cortes' or Pizarro's men were well-aligned at all. The conquistadors literally fought and killed each other in the midst of their conquests of Mexico and Peru respectively. I'm updating significantly towards the smaller-cultural-distance-facilitates-domination thesis on the basis of what I'm reading about the conquistadors. They succeeded by surgically inserting themselves into the power structure of the native civilization; if there had been no power structure to hijack, how would they have got all those millions of people to do their bidding? They would have had to create a power structure, i.e. build up a civilization from scratch in the native population, or just grow their own tiny civilization until it was millions strong. These things would have taken much longer, at the very least. (The Mayans were a bunch of weaker and less technologically advanced city-states bordering the Aztec empire, and apparently it took the Spanish another century to fully subjugate them after completing their conquest of the Aztecs in two years! I mean, this is only one data point, I guess, but still.)

Jaan Tallinn

here's another angle that might be interesting to think about: what are the things an AGI could do by virtue of running much faster than humans. this is what the human civilisation looks like when you run 56 times faster than humans.
and this is just 1.7 orders of magnitude, whereas richard was mentioning 6 OOM speedups earlier in this doc (and myself i've been pointing to the 8-9 OOM difference between clock speeds of silicon chips vs human brain) – i wonder if it's even productive to try to interact with humans at that speed vs taking "shortcuts" that don't involve them (not that i have anything particular in mind here). or, to put it more radically, aren't we anthropomorphising the AGI when we assume it will be interested in "taking over the world" like (ambitious) humans would do – rather than getting busy with the "you are made of atoms which it can use for something else" task right away.

Daniel Kokotajlo

I agree. I think it quite likely that AI won't need to do human politics or warfare, since it'll be able to e.g. use nanotech or robots or whatever instead of humans as actuators. Or maybe it'll do human politics and warfare, but only for a few days or weeks until it can acquire better, faster actuators. However, I think that some people think that AI won't be that much better than humans, and/or that such radical technologies would be hard to make even for a super-AI. To those people I say: OK, even if you are right, AI would still take over the world via ordinary politics and war. Also, I say: even if there are multiple different powerful AI factions, each with their own kind of nanotech or whatever, if the difference between the most powerful and the next-most-powerful is like the difference between the conquistadors and their victims – i.e. not that big – we'll probably get a singleton.

Daniel Kokotajlo [new comment thread]

Europeans conquered the Americas (and pretty much the rest of the world too) and reshaped it dramatically in their image, often in extremely brutal ways that went very much against the wishes of the local population, to say the least. Was this because they had astronomically valuable options and were resource-insatiable? Kind of.
But only in a very loose sense of those words. I don't think either [long-term goals or insatiability] is necessary.

> Insatiability of resources: Achieving these astronomically valuable options involves using a large share of all available resources.

This doesn't seem to describe the Europeans very well. Rather, it's that the Europeans had absolute power over the regions they conquered and didn't care very much about the local people. It's not that e.g. producing more and cheaper cotton was "astronomically valuable" to the Europeans. It was rather low on their list of values, probably, after their own lives, their happiness, their health, their political freedom, etc. Rather, it's that they didn't care about the slaves, so even something they valued only a little bit (slightly more money) was enough. A similar example would be modern humans and factory farms. Factory farms exist because people would rather pay $4 for their pound of beef than $7. Do humans assign astronomical value to $3? Heck no. They assign $3 of value to $3. Do factory farms involve using a large share of available resources? Not really.

I guess human civilization as a whole is resource-insatiable in some weak sense. Humanity is quite capable of marking off some resources to not be consumed by anyone, but doesn't choose to use this power super often, and so the default outcome (resources being used) still happens a lot.

Resources being used is the default outcome; we don't need to appeal to special principles to explain why it happens. We'd need to explain why it doesn't happen. (National parks, limits on fishing, minimum wage laws, etc.)

i see. i guess you’re saying that once an agent is sufficiently misaligned (ie, not caring about side effects) and sufficiently intelligent (or unstoppable for any other reason), richard’s point (2) isn’t really required in order for the results to be catastrophic.

Richard Ngo

Yes, I think I agree with this, and my current plan is to reformulate the doc to take this possibility into account. (Thinking out loud) It seems like we want to model the system of many Europeans as an agent which has the large-scale goal of conquering lots of territory, etc. But then of course we get weird effects, like: if those Europeans go to war with each other, that means the system as a whole is just burning resources, and so the abstraction of Europeans-as-unified-agent breaks down.

But maybe this is fine. I mean, if the Europeans had warred against each other enough, then they wouldn't have been a threat to whoever they were trying to colonise. So even the low level of agency that we can assign to the group of Europeans as a whole is sufficient to explain why they were dangerous.

Jaan Tallinn

well.. i think you'd get a lot of pushback when trying to ascribe "collective" or "emergent" agency to historical civilisation.

a propos i had a great conversation with critch yesterday about homeostasis: basically, disruption of homeostasis (on some level of abstraction) seems to be the lowest common denominator between xrisks (interestingly, both for individuals in terms of health, as well as for civilisations).

this also seems to address the current discussion: the reason that europeans were able to take over americas had a bit to do with agency-adjacent things like military strategy and coordination, a bit with intelligence-adjacent things like science and technology, but also a lot to do with the particular vulnerabilities of natives' civilisational homeostasis (most notably to smallpox, which isn't agenty at all).

also, i'm reminded of eliezer's "Evolving to Extinction" essay: at the limit, you might have (species level) xrisk arise just from internal dynamics, without any external (in terms of other agents) influence at all.

(sorry for not being super constructive here, but perhaps there's something you can pick up from the discussion to make your foundational argument more robust)

Daniel Kokotajlo

I don't think smallpox had as much to do with the european conquests as you think. I'd say it was a factor but a smaller factor than the other two you mentioned. I think technology was the most important factor. I say this after having just read two history books on the conquistadors and a third on that time period in general.

Jaan Tallinn

ok. i've heard smallpox presented as a major factor but i can't remember the sources (jared diamond perhaps?) and can't rule out their political motivations

Daniel Kokotajlo

I'd say it was 50% technology, 30% experience + organization, 10% luck, 10% disease. I think for some areas and in some ways disease was more important; e.g. the demographics of the Americas would undoubtedly be different (more native, less African) if not for disease. But politically and culturally Europeans would still have dominated for hundreds of years. Consider what happened in the rest of the world, where disease was either not a factor or a factor that disproportionately hurt the Europeans: the Philippines, Indonesia, India, Africa. Still colonized.

Jaan Tallinn

ok, yeah, not needing diseases to conquer the rest of the world is a strong argument indeed

Discussion on testing the difficulty of AIs taking control

Daniel Kokotajlo

I think we might be able to test* how hard it is to take over the world using board games. Let's have an online game of anonymous Diplomacy, where 6 of the players are amateurs and 1 of the players is a world champion (or otherwise very good player). And let's run this experiment a bunch of times. If the champions win more than 1/7th of the time, well, that just means that skill helps you take over the world. But if they win, say, 80% of the time, then that means taking over the world is easy if you have a large skill advantage over everyone else.

"Easy mode" would be where no one knows who the champion is. "Hard mode" would be where they do.

*It's not a perfect test, for many reasons. But it'd be some evidence at least, I think.
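One way to quantify the proposed experiment (a sketch; the game count and win total below are hypothetical, not results from any actual games): under the null hypothesis that skill confers no advantage, the champion wins each 7-player game with probability 1/7, so an exact binomial tail probability says how surprising a given number of champion wins would be.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k wins under the null."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Null hypothesis: in anonymous 7-player Diplomacy, every player
# (champion included) wins with probability 1/7.
n_games = 20          # hypothetical number of experimental games
champion_wins = 16    # hypothetical result: the champion wins 80% of them

p_value = binom_tail(champion_wins, n_games, 1 / 7)
print(f"win rate {champion_wins / n_games:.0%}, "
      f"p-value under equal-skill null: {p_value:.2g}")
```

With numbers like these the null is decisively rejected; with a smaller skill edge, more games would be needed before the one-bit win/loss signal says anything.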

Richard Ngo

I think this is almost no evidence about the world, and lots of evidence about Diplomacy. In particular, the space of options in Diplomacy is so heavily constrained, and many of the key choices are fundamentally arbitrary (e.g. an amateur France deciding whether to ally with Germany or England really doesn't have any good way of choosing between them except by how much they like the two other players).

(But this argument really isn't about Diplomacy; it's just that your prior should be that it's virtually impossible to learn non-trivial things about the world via this sort of experiment.)

Daniel Kokotajlo

Hmmm, my prior is definitely not like that. My view is: claims such as "It's very hard to take over the world. If people in power see their positions being eroded, it's generally a safe bet that they'll take action to prevent that", insofar as they are true, are true because of general relationships (such as between human nature and zero-sum power struggles), not because of specific relationships (such as between modern rulers and the current geopolitical situation).

In other words, claims like this make predictions about simplified models of our world, not just about our world. So if we construct a reasonably accurate simplified model, we test the claim.

Richard Ngo

Okay, actually, I think I retract my prior being so strong. I'm thinking of the Axelrod experiments where they learned about tit-for-tat, or Laland's social learning strategies tournament, which were definitely simple models where people learned things.

I guess the way I would characterise this is more like: you can learn things by observing what happens during such experiments, because observing is really high-bandwidth. And so in Diplomacy you see things like shifting alliances where people gang up against the strongest player, or a very weak player staying alive because they could still get revenge on whoever attacked them, and so attacking them is just not worth it.

What you can't really learn from is the one-bit signal of whether the very strong player wins the game or not, because this varies way too much with the details of the game. In chess it's about 100%. In Diplomacy maybe it's somewhere between 30% and 90%, depending on who the amateurs are. In Nine Men's Morris it's apparently a theoretical draw, so it's probably between 0% and 90%, depending on whether the weaker player knows the drawing line. My point is that the win percentage in this hypothetical is really strongly determined by unimportant facts about the game, like how much flexibility there is in the movement rules, and how steep the learning curve is. You could make the Diplomacy movement rules as complex as chess and then the strongest player's win rate would shoot right up, or you could introduce more randomness and have it go right down. Perhaps if you designed a variant of Diplomacy from scratch you could learn something just by seeing who wins, but that'd take a lot of effort.
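The sensitivity of the win rate to the game's randomness can be sketched with a toy Monte Carlo model (a hypothetical "skill plus noise" game, not a model of Diplomacy or any real game): seven players each draw a score equal to their skill plus Gaussian noise, and the highest score wins. As the noise grows relative to the skill gap, the champion's win rate slides from near-certainty down towards the 1/7 baseline.

```python
import random

def champion_win_rate(skill_gap: float, noise: float, n_games: int = 20000) -> float:
    """Toy model: 7 players, one with extra skill; the highest skill + noise wins."""
    rng = random.Random(0)  # fixed seed for reproducibility
    wins = 0
    for _ in range(n_games):
        champ = skill_gap + rng.gauss(0, noise)
        best_amateur = max(rng.gauss(0, noise) for _ in range(6))
        wins += champ > best_amateur
    return wins / n_games

for noise in (0.1, 1.0, 10.0):
    print(f"noise={noise:>4}: champion wins {champion_win_rate(1.0, noise):.0%} of games")
```

The same skill gap yields wildly different win rates depending on a single structural knob, which is the sense in which the one-bit outcome measures the game as much as it measures skill.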

Then I guess the obvious question is: why is the real world different? And I think the answer is: I am not just generating a general principle about power and people, I'm also conditioning on some facts that I know about the real world, such as: there is a large disparity in how much power different groups start off with (unlike in Diplomacy); people in power have a wide range of actions available to them (unlike in Nine Men's Morris); but also it's really hard to plan ahead 40 steps (even though you can in Go); and so on...

Daniel Kokotajlo

I agree that a disanalogous game would be no evidence at all. But I'm optimistic that we could design a game that is sufficiently analogous to the real world as to provide some evidence or other about it. (I mean, militaries do this all the time, it's called wargaming!) Like, we should try to find (or build) a game that has all the properties you just listed: different groups have different amounts of power, wide range of actions available, really hard to plan far ahead, etc.

Discuss

### Using false but instrumentally rational beliefs for your career?

23 ноября, 2020 - 22:18
Published on November 23, 2020 7:18 PM GMT

The truth

The job market for recent PhD graduates is tough. Simply put, there are many more degree holders than available tenure-track positions. People who went through the market describe it as capricious, random, unfair and difficult. New professors often lament highly qualified colleagues who found no position. If your dissertation is on Witte and no Witte-teaching positions are available, tough luck. This evidence suggests that the market is inefficient and that my personal behavior has a small impact on the outcome.

But OTOH, there are reasons to believe that a candidate's ability has a large influence. Firstly, the profile of the desired candidate is clear beforehand: a deep drive for knowledge, a record of successful publication, agreeableness toward colleagues, and teaching skills. Secondly, there is an abundance of clear signals of these attributes: quality of publications, research influence and size of network are all visible and hard to fake. Thirdly, candidate ability has high variance (the best researchers greatly outperform the rest). Therefore hiring committees have both the tools and the incentives to select high-quality researchers.

The truth likely lies between these two positions. Underperforming PhDs rarely advance, and many strong CVs fail due to unpredictable changes in demand.

The Instrumental Truth

But perhaps believing the truth is against my interest. If I believe that my work is important to the outcome, I will work harder. If I believe my work has little importance, I may become lazy or seek side hustles. Should I convince myself that my publication count and thesis quality matter more than they actually do?

I'd love people's thoughts on this. Also link to great essays and blog posts about instrumental rationality!

Appendix

The best strategy is to remember that academia is a silly place. If the market is unkind don't hesitate to go elsewhere. Not-academia has more money and other good stuff! Be careful not to murder-pill yourself into thinking academia is the whole world.

Discuss

### Syntax, semantics, and symbol grounding, simplified

23 ноября, 2020 - 19:12
Published on November 23, 2020 4:12 PM GMT

tl;dr: I argue that the symbol grounding problem is not an abstract philosophical problem, but a practical issue that it is useful and necessary for algorithms to solve, empirically, to at least some extent.

Intro: the Linear A encyclopaedia

Consider E, a detailed encyclopaedia of the world:

The inside pages are written, for some inexplicable reason, in Linear A, an untranslated Minoan script:

We also have access to R, a sort of Linear A Rosetta Stone, with the same texts in Linear A and in English. Just like the original Rosetta Stone, this is sufficient to translate the language.

The amount of information in E is much greater than in the very brief R. And yet E on its own is pretty useless, while E+R together unlock a huge amount of useful information: a full encyclopaedia of the world. What's going on here?

Grounded information

What's going on is that E contains a large amount of grounded information: information about the actual world. But when the meaning of Linear A was lost, that grounding was lost too: it became just a series of symbols with a certain pattern to them. Its entropy (in the information-theory sense) did not change; but it became essentially a random sequence with that particular entropy and pattern.
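One way to see the claim that entropy is preserved while grounding is lost: re-encode a text with an arbitrary bijective substitution cipher. The encoding destroys the symbols' meaning for an English reader, but the statistical pattern, and hence the empirical entropy, is exactly unchanged. A minimal sketch (the example text and cipher are my own illustration):

```python
from collections import Counter
from math import log2

def entropy(text):
    """Empirical Shannon entropy (bits per symbol) of a string."""
    counts = Counter(text)
    n = len(text)
    # Sort so the floating-point summation order is deterministic.
    return -sum(c / n * log2(c / n) for c in sorted(counts.values()))

plaintext = "the cat sat on the mat"
# Re-encode with an arbitrary substitution: the grounding of the symbols
# is lost, but the frequency pattern, and so the entropy, is untouched.
cipher = str.maketrans("thecasonm ", "αβγδεζηθικ")
scrambled = plaintext.translate(cipher)

assert entropy(plaintext) == entropy(scrambled)
```

The substitution is bijective, so the multiset of symbol counts is identical before and after, which is all the entropy formula sees.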

Adding in R restores that grounding, and so transforms E back into a collection of useful information about the world; again, without changing its information content in the entropy sense.

This is the symbol grounding problem, seen not as an esoteric philosophical issue, but as a practical learning problem.

Writing the encyclopaedia

To put this in symbols, let W be the world, let FW be a set of features in the world, and QW be a probability distribution over these features (see this post for details of the terminology).

An agent A has their own feature set FA, in Linear A, which corresponds to the features of the world. The correspondence is given by the map gA:FW→FA. The agent also knows QW.

Agent A then sits down to write the encyclopaedia E; in it, they put a lot of their knowledge of QW, using the terms in FA to do so.

Thus any agent B that knows gA, the grounding of A's symbols, can use E to figure out a lot about QW and hence about the world.

What if agent B doesn't know Linear A, but does know English? Then they have their own symbol grounding gB:FW→FB. The encyclopaedia E is almost useless to them. It still has the same amount of information, but there's no way of translating that information into FB, the features that B uses.

Now the role of R is clear: by translating English and Linear A into each other, it defines an equivalence between FA, the features of A, and FB, the features of B (specifically, it defines gB∘gA−1:FA→FB).

Given this, B now knows FA. And, assuming they trust A, they can extract information on QW from the encyclopaedia E. Hence they can get some of A's real-world information.

Ungrounded information

Now consider E′, a text with the same amount of information as E, but generated by a random process. There is information in E′, but it's not real-world-information-about-W-mediated-through-A; it's just random information.

Maybe there could be an R′ that grounds the symbols of E′ in a way that makes sense for the real world. But, since E′ is random, this R′ would have to be huge: essentially, R′ would have to contain all the real-world information itself.

So we can say that both E and E′ are ungrounded for agent B; but E is at least potentially grounded, in that there is a short translation that will ground it. This "short translation", R, only works because E was constructed by A, a grounded agent (by gA), and B's features are also grounded in the same world (by gB). Thus all that R does is translate between grounded symbols.

Syntax versus semantics

Note that QW looks like pure syntax, not semantics. It doesn't define any features; instead it gives a (probabilistic) measure of how various features relate to each other. Thus E only gives syntactic information, it seems.

But, as I noted in previous posts, the boundary between syntax and semantics is ambiguous.

Suppose that an agent A has a feature "gavagai", another feature "snark", and let p=QA(snark∣gavagai) be a term in their syntax, corresponding to the probability of "snark", given "gavagai".

Suppose we are starting to suspect that, semantically[1], "gavagai" means either rabbit or cooked meal, while "snark" means fur.

Then if p≈1, "gavagai" is likely to be rabbit, since whenever gavagai is present, there is fur. Conversely, if p≈0, then "gavagai" is likely to be a meal, since gavagai means no fur around. Re-conversely, if gavagai is rabbit, p is likely to be close to 1.

None of these interpretations are definite. But the probabilities of particular syntactic and semantic interpretations go up and down in relation to each other. So purely syntactic information provides evidence for symbol grounding - because this syntax has evolved via interactions with the real world. In fact, gA, QA and even FA all evolve through interactions with the real world, and can change when new information arrives or the model's features splinter[2].
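The "gavagai" inference above can be phrased as a small Bayesian update: treat the observed syntax, p ≈ 1 (fur almost always follows "gavagai"), as evidence about the candidate meanings. All of the priors and likelihoods below are illustrative assumptions of mine, not numbers from the post:

```python
def posterior(prior, likelihood):
    """Bayes: P(meaning | evidence) ∝ P(evidence | meaning) · P(meaning)."""
    unnorm = {m: prior[m] * likelihood[m] for m in prior}
    z = sum(unnorm.values())
    return {m: v / z for m, v in unnorm.items()}

# Candidate semantic interpretations of "gavagai" (toy 50/50 prior):
prior = {"rabbit": 0.5, "meal": 0.5}

# Assumed likelihood of observing p ≈ 1 under each candidate meaning:
# rabbits almost always come with fur, cooked meals almost never do.
lik_p_high = {"rabbit": 0.9, "meal": 0.05}

post = posterior(prior, lik_p_high)
assert post["rabbit"] > 0.9  # syntax alone shifts the semantic hypothesis
```

With these numbers the posterior on "rabbit" is about 0.95: purely syntactic evidence has moved the semantic interpretation, which is the point of the argument.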

And syntax and semantics can also substitute for each other, to some extent. Suppose A has only ever seen white rabbits. Then maybe "gavagai" means "white rabbit", a semantic statement. Or maybe "gavagai" means "rabbit", while A also believes the syntactic statement "all rabbits are white". Either option would result in the same behaviour and pretty much the same ways of thinking, at least in circumstances A has already encountered.

You could see semantics, in humans, as instinctively recognising certain features of the world, and syntax as logically putting together these features to derive other features (using rules and life experience learnt in the past). Putting it this way, it's clear why the line between syntax and semantics is vague, and can vary with time and experience[3].

We also often only have implicit access to our own gA and our own QA, further complicating the distinction.

Doing without "the world"

In the post on model splintering, I argued for doing without underlying "worlds", and just using the imperfect features and models of the agents.

This is needed in this setting, too. Agent A might make use of the feature "human being", but that doesn't mean that the world itself needs to have a well-defined concept of "human being". It's very hard to unambiguously define "human being" in terms of physics, and any definition would be debatable. Moreover, A did not use advanced physics in their own definition.

For symbol translation, though, it suffices that B understands roughly what A means by "human beings" - neither needs a perfect definition.

Understanding the features of another agent

So, let's change the definitions a bit. Agent A has their features FA and their probability distribution over features, QA (the syntax); the same goes for agent B with FB and QB.

The grounding operators gA and gB don't map from the features FW of the world, but from the private input-output histories of the two agents, hA and hB. Write QA(fA=n∣hA,gA)=p to designate that p is the probability that feature fA∈FA is equal to n, given the history hA and the grounding function[4] gA.

Now, what about B? They will have their assessment of A's internal symbols; let f∗BA be B's interpretation of A's symbol fA.

They can compute expressions like QB(f∗BA=n∣hB,gB), the probability B assigns to A's feature fA being equal to n. It will likely reach this probability assignment by using g∗BA, Q∗BA and h∗BA: these are B's assessment of A's grounding, syntax, and history. Summing over these gives:

QB(f∗BA = n ∣ hB, gB) = ∑ Q∗BA(f∗BA = n ∣ h∗BA, g∗BA) · QB(g∗BA, Q∗BA, h∗BA ∣ hB),

where the sum runs over the hypotheses g∗BA, Q∗BA and h∗BA.
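This marginalisation can be sketched numerically with a toy discretisation. All of the hypothesis sets and probabilities below are made up for illustration; the point is only the shape of the computation:

```python
# B's joint belief over (grounding, syntax, history) hypotheses about A,
# conditional on B's own history hB (toy values, normalised to 1):
hypothesis_weight = {
    ("g1", "Q1", "h1"): 0.5,
    ("g1", "Q2", "h1"): 0.3,
    ("g2", "Q1", "h2"): 0.2,
}

# For each hypothesised (g, Q, h), the probability that A's feature
# takes the value n (again, toy values):
conditional = {
    ("g1", "Q1", "h1"): {0: 0.9, 1: 0.1},
    ("g1", "Q2", "h1"): {0: 0.4, 1: 0.6},
    ("g2", "Q1", "h2"): {0: 0.2, 1: 0.8},
}

def estimate(n):
    """QB(f∗BA = n | hB, gB): weighted sum over B's hypotheses about A."""
    return sum(w * conditional[hyp][n]
               for hyp, w in hypothesis_weight.items())

# The estimates form a proper distribution over the feature's values.
assert abs(estimate(0) + estimate(1) - 1.0) < 1e-9
```

Because each conditional table is a distribution and the hypothesis weights sum to one, the marginalised estimate is automatically a distribution too.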

Then B will consider feature fA to be perfectly grounded if there exists fB, a feature in FB, such that for all histories hB,

QB(f∗BA=n∣hB,gB)=QB(fB=n∣hB,gB).

Thus, A's assessment of fA (according to B's estimate) is always perfectly aligned with B's assessment of fB.

Imperfect grounding: messy, empirical practice

Perfect grounding is far too strong a condition, in general. It means that B believes that A's estimate of fA is always better than its own estimate of fB.

Suppose that fA and fB referred to the number of people in a given room. Sometimes A is in the room, and sometimes B is.

Obviously it makes sense for B to strongly trust A's estimate when A is in the room and B is not; conversely, it would (generally) be wrong for B to do so when their positions are reversed.

So sometimes B will consider A's estimates to be well grounded and correct; sometimes it won't. Grounding, in practice, involves modelling what the other person knows and doesn't know. We might have some idea as to what would lead A astray - eg a very convincing human-like robot in the room, combined with poor lighting.

That sounds both very complicated (if we wanted to formalise that in terms of what agent B believes about what agent A believes about features...) and very simple (it's exactly what human theory of mind is all about).

What this means is that, given this framework, figuring out the grounding of the symbols of another agent is a messy empirical process. It's not a question of resolving esoteric philosophical questions, but assessing the knowledge, honesty, accuracy, and so on, of another agent.

Note the empiricism here; we've moved from:

• Are the symbols of A well grounded?

to:

• What are the symbols of A grounded as, and in what contexts?

Language, and interpretation of the symbols of others

Those happy few who know my research well may be puzzled at this juncture. Knowing the internal features of another agent is the ultimate structured white-box model - we not only know how the other agent thinks, but we assign useful labels to their internal processes. But when I introduced that concept, I argued that you could not get structured white-box models without making major assumptions. So what's going on here?

First of all, note that the definition of grounding of fA is given entirely in terms of B's estimate of A's features. In the limit, B could be utterly wrong about everything about A's internals, and still find that A's symbols are grounded, even perfectly grounded.

That's because B is not using A's symbols at all, but their own estimate of A's symbols. Maybe A always feels fear and disgust when they get cold. Assume that that is the only time that A feels both those sentiments at once. Then A might have two symbols, "fear" and "disgust", while B might model "fear-plus-disgust" as the single feature "cold". And then B can use that feature, empirically and correctly, to predict the temperature. So B thinks that A's feature "cold" is well grounded, even though the feature doesn't even exist in A's models. This is the sense in which "gavagai" meaning "rabbit" and "gavagai" meaning "undetached rabbit-part" actually come to the same thing.

But for humans, there are other factors at work. Humans communicate, using words to designate useful concepts, and converge on mutually intelligible interpretations of those words, at least in common situations. A phrase like "Run, tiger!" needs to be clearly understood immediately.

Now, we humans:

1. Partially understand each other thanks to our theory of mind, and
2. Use the words we communicate with in order to define internal concepts and features.

This means that we will converge on roughly shared understandings of what words mean, at least in typical environments. This rough shared understanding explains both the untranslatability of language and why it's mainly translatable. We might not get all the linguistic nuances of the Tale of Genji, but we do know that it's about Japanese courtiers in the 11th century, and not about time-travelling robots from the future.

A last note on language: we can use it to explore concepts that we've never encountered, even concepts that don't exist. This means that, with language, we might realise that, for someone, "gavagai" means "undetached rabbit part" rather than "rabbit", because we can use linguistic concepts to imagine a distinction between those two ideas. And then we can communicate this distinction to others.

GPT-n, ungrounded

This kind of reasoning causes me to suspect that the GPT-n series of algorithms will not reach super-human levels of capability. They've achieved a lot through syntactic manipulation of texts; but their symbols are almost certainly ungrounded. Consider two hypotheses:

1. To write like a human, an agent needs a full understanding of physics, biology, and many other sciences.
2. There are simpler models that output human-like writing, with decent probability, without modelling the hard sciences.

I think there's evidence for the second hypothesis - for example, the successes of the current and past GPT-ns. It does not seem plausible that these machines are currently modelling us from electrons upwards.

But if the second hypothesis is true, then we'll expect that the GPT-ns will reach a plateau at or near the maximal current human ability. Consider two models, Mp (a full physics model of humanity and enough of the universe), and Ms (a simplified model of human text generation). As long as the GPT-ns are successful with Ms, there will be no pressure on them to develop Mp. Pressure can mean reinforcement learning, objective functions, or humans ranking outputs or tweaking the code. For the algorithm to converge on Mp, the following need to be true:

1. Using Mp is significantly better than using Ms, so there is pressure for the algorithm to develop the better model.
2. Moving towards Mp, from its current model, is a sufficient improvement over Ms that the algorithm will find a path towards that model.

It seems to me that 1. might be true, but 2. seems very unlikely to be true. Therefore, I don't think that the GPT-ns will need, or be able, to ground their symbols, and hence they will be restricted to human-comparable levels of ability.

We could empirically test this, in fact. Feed GPT-3 all the physics papers we have until 1904[5]. Could GPT-3 or any of its successors generate special and general relativity from that data? I would be extremely surprised, and mildly terrified, if it did. Because it could only do so if it really understood, in a grounded way, what physics was.

Thanks to Rebecca Gorman for help with this research.

1. Using gA to define semantics. ↩︎

2. Like learning that "dog" and "wolf" are meaningfully different and can't be treated the same way -- or that different breeds of dogs are also different in relevant ways. In that case, the categorisation and/or modelling will shift to become more discriminatory and precise. This tends to be in areas of relevance to the human: for a dog breeder, the individual breeds of dogs are very precise and detailed categories, while everything that lives in the sea might be cheerfully dumped into the single category "fish". ↩︎

3. See how carefully learnt deductions can become instinctive, or how we can use reason to retrain our instincts. ↩︎

4. Notice that this formulation means that we don't need to distinguish the contribution of semantics (gA) from that of syntax (QA): both are folded into the same expression. ↩︎

5. Maybe filter out Lorentz's papers. ↩︎

Discuss

### Continuing the takeoffs debate

23 ноября, 2020 - 18:58
Published on November 23, 2020 3:58 PM GMT

Here’s an intuitively compelling argument: only a few million years after diverging from chimpanzees, humans became much more capable, at a rate that was very rapid compared with previous progress. This supports the idea that AIs will, at some point, also start becoming more capable at a very rapid rate. Paul Christiano has made an influential response; the goal of this post is to evaluate and critique it. Note that all of the arguments discussed in this post are quite speculative and uncertain; in the process of writing it I’ve made only a small update towards fast takeoff. And given that Paul’s vision of a continuous takeoff occurs much faster than any mainstream view, I expect that even totally resolving this debate would have relatively few implications for AI safety work. Nevertheless, it’s disappointing that such an influential argument has received so little engagement, so I wanted to use this post to explore some of the uncertainties around the issue.

I’ll call Paul’s argument the changing selection pressures argument, and quote it here at length:

Chimpanzee evolution is not primarily selecting for making and using technology, for doing science, or for facilitating cultural accumulation.  The task faced by a chimp is largely independent of the abilities that give humans such a huge fitness advantage. It’s not completely independent - the overlap is the only reason that evolution eventually produces humans - but it’s different enough that we should not be surprised if there are simple changes to chimps that would make them much better at designing technology or doing science or accumulating culture.

Relatedly, evolution changes what it is optimizing for over evolutionary time: as a creature and its environment change, the returns to different skills can change, and they can potentially change very quickly. So it seems easy for evolution to shift from “not caring about X” to “caring about X,” but nothing analogous will happen for AI projects. (In fact a similar thing often does happen while optimizing something with SGD, but it doesn’t happen at the level of the ML community as a whole.)

If we step back from skills and instead look at outcomes we could say: “Evolution is always optimizing for fitness, and humans have now taken over the world.” On this perspective, I’m making a claim about the limits of evolution. First, evolution is theoretically optimizing for fitness, but it isn’t able to look ahead and identify which skills will be most important for your children’s children’s children’s fitness. Second, human intelligence is incredibly good for the fitness of groups of humans, but evolution acts on individual humans for whom the effect size is much smaller (who barely benefit at all from passing knowledge on to the next generation). Evolution really is optimizing something quite different than “humanity dominates the world.”

So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. This case is potentially missing the key element that drives the continuous change story: optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be linear.

In other words, Paul argues firstly that human progress would have been much less abrupt if evolution had been optimising for cultural ability all along; and secondly that, unlike evolution, humans will continually optimise for whatever makes our AIs more capable. (I focus on “accumulating culture” rather than “designing technology or doing science”, because absorbing and building on other people’s knowledge is such an integral part of intellectual work, and it’s much clearer what proto-culture looks like than proto-science.) In this post I’ll evaluate:

1. Are there simple changes to chimps (or other animals) that would make them much better at accumulating culture?
2. Will humans continually pursue all simple yet powerful changes to our AIs?

Although it feels very difficult to operationalise these in any meaningful way, I’ve put them down as Elicit distributions with 50% and 30% confidence respectively. The rest of this post will explore why.

Elicit Prediction (elicit.org/binary/questions/7HRkDXeEx) Elicit Prediction (elicit.org/binary/questions/6Ux1g8SFg)

How easily could animals evolve culture?

Let’s distinguish between three sets of skills which contribute to human intelligence: general cognitive skills (e.g. memory, abstraction, and so on); social skills (e.g. recognising faces, interpreting others’ emotions); and cultural skills (e.g. language, imitation, and teaching). I expect Paul to agree with me that chimps have pretty good general cognitive skills, and pretty good social skills, but they seriously lack the cultural skills that precipitated the human “fast takeoff”. In particular, there’s a conspicuous lack of proto-languages in all nonhuman animals, including some (like parrots) which have no physiological difficulties in forming words. Yet humans were able to acquire advanced cultural skills relatively quickly after diverging from chimpanzees. So why haven’t nonhuman animals, particularly chimpanzees, developed cultural skills that are anywhere near as advanced as ours? Here are three possible explanations:

1. Advanced cultural skills are not very useful for species with sub-human levels of general cognitive skills and social skills.
2. Advanced cultural skills are not directly selected for in species with sub-human levels of general cognitive skills and social skills.
3. Advanced cultural skills are too complex for species with sub-human levels of general cognitive skills and social skills to acquire.

I’ve assigned 40%, 45% and 15% credence respectively to each of these being the most important explanation for the lack of cultural skills in other species, although again these are very very rough estimates.

Elicit Prediction (elicit.org/binary/questions/NpU6ECsoV) Elicit Prediction (elicit.org/binary/questions/XMkfPR3c2) Elicit Prediction (elicit.org/binary/questions/8fhsJtwlQ)

What reasons do we have to believe or disbelieve in each? The first one is consistent with Lewis and Laland’s experiments, which suggest that the usefulness of culture increases exponentially with fidelity of cultural transmission. For example, moving from a 90% chance to a 95% chance of copying a skill correctly doubles the expected length of any given transmission chain, allowing much faster cultural accumulation. This suggests that there’s a naturally abrupt increase in the usefulness of culture as species gain other skills (such as general cognitive skills and social skills) which decrease their error rate. As an alternative possibility, Dunbar’s work on human evolution suggests that increases in our brain size were driven by the need to handle larger social groups. It seems plausible that culture becomes much more useful when interacting with a bigger group. Either of these hypotheses supports the idea that AI capabilities might quickly increase.
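The fidelity arithmetic can be made concrete with a simple geometric model (my assumption; the post only cites the 90% → 95% doubling): if each copying step in a transmission chain succeeds independently with probability p, the expected chain length is 1/(1−p), so a small gain in fidelity produces an outsized gain in how far culture propagates:

```python
def expected_chain_length(p):
    """Expected length of a transmission chain when each copying step
    succeeds independently with probability p (geometric model:
    E[length] = 1 / (1 - p))."""
    return 1 / (1 - p)

low = expected_chain_length(0.90)    # ≈ 10 links
high = expected_chain_length(0.95)   # ≈ 20 links

# Moving from 90% to 95% fidelity doubles the expected chain length,
# matching the Lewis & Laland observation quoted above.
assert abs(high / low - 2.0) < 1e-6
```

This nonlinearity is what makes an abrupt increase in the usefulness of culture plausible: a linear improvement in error rate buys an exponential improvement in accumulation.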

The second possibility is the most consistent with the changing selection pressures argument.[1] The core issue is that culture requires the involvement of several parties - for example, language isn’t useful without both a speaker and a listener. This makes it harder for evolution to select for advanced language use, since it primarily operates on an individual level. Consider also the problem of trust: what prevents speakers from deceiving listeners? Or, if the information is honest and useful, what ensures that listeners will reciprocate later? These problems might significantly reduce the short-term selection for cultural skills. However, it seems to me that many altruistic behaviours have overcome these barriers, for example by starting within kin groups and spreading from there. In Darwin’s Unfinished Symphony, Laland hypothesises that language started the same way. It seems hard to reconcile observations of altruistic behaviour in chimps and other animals with the claim that proto-culture would have been even more useful, but failed to emerge. However, I've given this possibility relatively high credence anyway because if I imagine putting chimps through strong artificial selection for a few thousand years, it seems pretty plausible that they could acquire useful cultural skills. (Although see the next section for why this might not be the most useful analogy.)

The third possibility is the trickiest to evaluate, because it’s hard to reason about the complexity of cognitive skills. For example, is the recursive syntax of language something that humans needed complex adaptation to acquire, or does it reflect our pre-existing thought patterns? One skill that does seem very sophisticated is the ability of human infants to acquire language - if this relied on previous selection for general cognitive skills, then it might have been very difficult for chimps to acquire. This possibility implies that developing strong non-cultural skills makes it much easier to develop cultural skills. This would also be evidence in favour of fast takeoffs, since it means that even if humans are always trying to build increasingly useful AIs, our ability to add some important skills might advance rapidly once our AIs possess other prerequisite skills.

How well can humans avoid comparable oversights?

Even assuming that evolution did miss something simple and important for a long time, though, the changing selection pressures argument fails if humans are likely to also spend a long time overlooking some simple way to make our AIs much more useful. This could be because nobody thinks of it, or merely because the idea is dismissed by the academic mainstream. See, for example, the way that the field of AI dismissed the potential of neural networks after Minsky and Papert’s Perceptrons was released. And there are comparably large oversights in many other scientific domains. When we think about how easy it would be for AI researchers to do better than evolution, we should be asking: “would we have predicted huge fitness gains from cultural learning in chimpanzees, before we’d ever seen any examples of cultural learning?” I suspect not.[2]

Paul would likely respond by pointing to AI Impacts’ evidence that discontinuities are rare in other technological domains - suggesting that, even when fields have been overlooking big ideas, their discovery rarely cashes out in sharp changes to important metrics.[3] But I think there is an important disanalogy between AI and other technologies: modern machine learning systems are mostly “designed” by their optimisers, with human insights only contributing at a high level. This has three important implications.

Firstly, it means that attempts to predict discontinuities should consider growth in compute as well as intellectual progress. Exactly how we do so depends on whether compute and insights are better modeled as substitutes or complements to each other - that is, whether insights have less or more impact when more compute becomes available. If they’re substitutes, then we should expect continuous compute growth to “smooth out” the lumpiness in human insight. But if they’re complements, then compute growth exacerbates that lumpiness - an insight which would have led to a big jump with a certain amount of compute available could lead to a much bigger jump if it’s only discovered when there’s much more compute available.

I think there’s much more to be said on this question, which I’m currently very uncertain about. My best guess is that we used to be in a regime where compute and insight were substitutes, because domain-specific knowledge played a large role. But now that researchers are taking the bitter lesson more seriously, and working on tasks where it’s harder to encode domain-specific knowledge, it seems more plausible that we’re in a complementary regime, where insights are mainly used to leverage compute rather than replace it.
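The substitutes-versus-complements distinction can be illustrated with a toy model (entirely my own illustrative assumption, not from the post): let capability depend on compute c and accumulated insight i, additively in the substitutes regime and multiplicatively in the complements regime, then drop a one-off insight jump into an exponentially growing compute series:

```python
# Exponential compute growth over 10 time steps, with a discrete
# insight arriving at t = 5 (toy numbers throughout).
compute = [2 ** t for t in range(10)]
insight = [1.0] * 5 + [3.0] * 5

substitutes = [c + i for c, i in zip(compute, insight)]   # insight replaces compute
complements = [c * i for c, i in zip(compute, insight)]   # insight leverages compute

def jump(series, t):
    """Relative jump in capability between steps t-1 and t."""
    return series[t] / series[t - 1]

# The same insight produces a much larger relative discontinuity when
# insight and compute are complements than when they are substitutes.
assert jump(complements, 5) > jump(substitutes, 5)
```

In the substitutes regime the growing compute term swamps the insight, smoothing the jump; in the complements regime the insight multiplies all the available compute, and the later it arrives, the bigger the jump.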

Either way, this argument suggests that the comparison to other technological domains in general is a little misleading. Instead, we should look at fields in which an important underlying resource was becoming exponentially cheaper - for instance, fields which rely on DNA sequencing. One could perhaps argue that all scientific fields depend on the economy as a whole, which is growing exponentially - but I’d be more convinced by examples in which the dependency is direct, as it is in ML.

Secondly, our reliance on optimisers means that we don’t understand the low-level design details of neural networks as well as we understand the low-level design details in other domains. Not only are the parameters of our neural networks largely opaque to us, we also don’t have a good understanding of what our optimisers are doing when they update those parameters. This makes it more likely that we miss an important high-level insight, since our high-level intuitions aren’t very well linked to whatever low-level features make our neural networks actually function.

Thirdly, even if we can identify all the relevant traits that we’d like to aim for at a high level, we may be unable to specify them to our optimisers, for all the reasons explained in the AI safety literature. That is, by default we should expect our optimisers to develop AIs with capabilities that aren't quite what we wanted (which I'll call capabilities misspecification). Perhaps that comes about because it’s hard to provide high-quality feedback, or hard to set up the right environments, or hard to make multiple AIs interact with each other in the right way (I discuss such possibilities in more depth in this post). If so, then our optimisers might make the same types of mistakes as evolution did, for many of the same reasons. For example, it’s not implausible to me that we build AGI by optimising for the most easily measurable tasks that seem to require high intelligence, and hoping that these skills generalise - as was the case with GPT-3. But in that case the fact that humans are “aiming towards” useful AIs doesn’t help very much in preventing discontinuities.

Paul claims that, even if this argument applies at the level of individual optimisers, it hasn't previously been relevant at the level of the ML community as a whole. This seems plausible, but note that the same could be said for alignment problems in general. So far they've only occurred in isolated contexts, yet many of us expect that alignment problems will get more serious as we build more sophisticated systems that generalise widely in ways we don't understand very well. So I'm inclined to believe that capabilities misspecification will also be more of a problem in the future, for roughly the same reasons. One could also argue against the likelihood of capabilities misspecification by postulating that in order to build AGIs we’ll only need to optimise them to achieve relatively straightforward tasks in relatively simple environments. In practice, though, it’s difficult to make such arguments compelling given the uncertainties involved.[4]

Overall, I think that the changing selection pressures argument is a plausible consideration, but far from fully convincing; and that evaluating it thoroughly will require much more scrutiny. However, I'd be more excited about future work which classifies both Paul and Eliezer's positions as "fast takeoff", and then evaluates those against the view that AGI will "merely" bump us up to a steeper exponential growth curve - e.g. as defended by Hanson.

1. As further support for this argument it’d be nice to have more examples of cases where evolution plausibly missed an important leap, in addition to the development of human intelligence. Are there other big evolutionary discontinuities? Plausibly multicellularity and the Cambrian explosion qualify. On a smaller scale, two striking types of biological discontinuities (for which I credit Amanda Askell and Beth Barnes) are invasive species, and runaway sexual selection. But in both cases I think this is more reasonably described as a change in the objective, rather than a species quickly getting much fitter within a given environment.
2. In practice we can take inspiration from humans in order to figure out which traits will be necessary in AGIs - we don’t need to invent all the ideas from scratch. But on the other hand, even given the example of humans, we haven’t made much progress in understanding how or why our intelligence works, which suggests that we’re reasonably likely to overlook some high-level insights.
3. One natural reason to think that economic usefulness of AIs will be relatively continuous even if we overlook big insights is that humans can fill in gaps in the missing capabilities of our AIs, so that they can provide a lot of value without being good at every aspect of a given job.
4. Perhaps the strongest hypothesis along these lines is that language is the key ingredient - yet it seems like language models will become data-constrained relatively soon.

Discuss

### Retrospective: November 10-day virtual meditation retreat

23 ноября, 2020 - 18:00
Published on November 23, 2020 3:00 PM GMT

So yesterday I finished a 10-day virtual meditation retreat taught by Tucker Peck and Upasaka Upali.

Several people have asked me what it was like, so here are some highlights.

First, a “virtual” retreat means that you spend 10 days doing pretty much nothing but meditation, and also don’t talk to anyone except the teachers, who hold daily lectures and once-every-two-days personal interviews over Zoom. Also, when you sit down to meditate, you are encouraged to do it in front of a camera, so that you can see everyone else who is meditating and they can also see you.

At times it was great, such as when I was mostly just doing concentration meditation and focusing on my breath, and then suddenly memories of playing XCOM: Enemy Unknown together with a friend came up and I just felt a strong sense of connectedness and loving-kindness towards her, even though I hadn’t even been doing loving-kindness practice.

At other times I was figuratively clawing my eyes out of boredom and a desire to just be back on social media and able to talk to people.

In retrospect, it feels odd that the boredom was sometimes so strong as to make it impossible to meditate, since if I hadn’t been bored I could simply have meditated, and I was bored because I couldn’t get the meditation to work… it now feels like what was actually going on was some desire to be in control, and that clinging onto the desire to be on social media and check my messages was a way of asserting a sense of control. Or something like that. Something to look into, anyway. In any case, it was a good opportunity to investigate the nature of discomfort, and I got quite a bit of that done.

Things that felt like significant shifts, or at least interesting experiences:

* I went into the retreat with the thought of wanting to give The Mind Illuminated -style concentration meditation another try, since it had worked well for me before, but I had eventually run into various roadblocks with it. Over the last few years, every now and then I have tried it for a bit, maybe gotten a bit of initial success, and then had it stop working again.

What I noticed this time was that following the breath felt hard because it would bring up unpleasant sensations in my belly – sensations which pretty much only pop up when I’m doing meditation, so they have to be psychogenic. So this time I decided to investigate those sensations. Shifting my attention to them caused various kinds of material to come up (including the previously mentioned example of playing XCOM), which eventually led to…

* There was a moment when I heard a voice in my head saying “it is safe to feel loved”. I was a little surprised by that, since I had not thought of myself as someone who finds it unsafe to feel loved, but it felt significant.

* Afterwards there were lots of long-forgotten memories and experiences returning to mind; many of which had apparently been blocked either to keep negative memories out (which also had the effect of blocking positive memories), or because they were positive in the “I feel loved” sense, and that was experienced as unsafe.

Either way, lots of various happy, neutral, and unhappy memories coming up, with an emphasis on the happy ones. And it’s worth noting that the threshold for what my brain considered a “happy” memory was set ridiculously low. There were things like:

• being picked up by my mom after school and feeling happy to be hanging out with her
• that time when I was a kid and playing a Nintendo game that wasn’t even one of my favorites, it was kinda hard and I never got very far, but it was still kinda cool and neat even if not the very best
• that time when I was reading Nintendo magazine and it had this four-page guide to a game which I didn’t even get to play until much later, but from reading the guide I got to *imagine* what it would feel like to play the game and it felt awesome
• in my hometown there was a particular bus line that would take you from the center of the city to my home, and it departed from a particular stop at the central marketplace, and that one bus stop felt like “my” bus stop because it was the one that took me home and being able to go about town and then ride the bus home gave me a sense of independence and agency and now I just recalled that one bus stop and that memory made me happy

At one point there were so many of these that it became outright painful to feel that happy. Then suddenly some dark and unpleasant thoughts started coming up, which surprised me at first, since I hadn’t expected them to show up when I was feeling so good. But then I got it since

* I had had a bunch of weird uncomfortable thoughts and fantasies that seemed to have at their core a desire to feel loved, while simultaneously finding it unsafe to feel loved, and then trying to satisfy the constraints of “feel loved but also do not feel loved” at the same time. At least, that would explain why they seemed to come up at that particular moment, then have the thought of “it’s safe to feel loved” somehow… penetrate through them… for the uncomfortable thoughts to then disappear. For now, at least.

* As I mentioned, we had 15-minute interviews with the teachers every two days. For most of the early part of the retreat, I would spend a lot of time thinking about what I wanted to say in the interviews, making detailed mental notes of what had happened in my meditation that I could report on, etc. Whenever this happened, it would always feel like I “fell out of mindfulness” – I identified so strongly with the experience of thinking about what to say that I couldn’t maintain any kind of mindful observer stance at the same time. Thinking about it just felt like something that “I” did – and I kept doing it to an annoyingly frequent extent. (This had also been true before the retreat – “thinking about what to say to people” is the kind of thing that has always caused a lot of identification.)

But over the course of this retreat, it felt like the “shape” of the mental subprocess doing this was starting to become more distinct – as if I could start carving out its boundaries, making it more visible against the backdrop of my mind. When I got it clear enough, I switched to an Internal Family Systems stance and asked it what it was trying to accomplish and why it felt it was so important. As a result, it started giving me lots of memories of times I hadn’t known what to say to people and that had felt like it’d had negative consequences.

I gradually worked on it over some days, eventually managing to drop the process so that it wasn’t as preoccupied with making such plans all the time. As a result, I went into my final interview without much prepared in advance – and I think I mostly managed to not embarrass myself anyway. After the interview, I reviewed the conversation, and concluded that I could have said a few things differently in order to make myself appear more impressive or cool, but overall it wasn’t a major difference and probably not worth all the energy that would have been spent on those “minor optimizations”.

Now it feels more that – for the first time in my life that I can recall – I can actually let that planning process work in the background while experiencing myself as separate from it, and for the most part it doesn’t feel the need to pre-plan so many things anyway. (This feels like it saves a lot of mental energy!)

The actual experience of speaking to people also feels different now. “I feel more present” is a boring cliché but also feels somewhat apt; I’m less focused on what I should say next and more aware of what I did just say. This includes being more aware of the “physical” details of my own voice, such as the cadence, volume, and how the individual phonemes and words… “hang together”, for lack of a better description.

The analogy that comes to mind is that previously, most of my focus was on processing the informational level of what I was saying or about to say. Now that I feel more relaxed about what the informational level should contain, there’s spare processing capacity to also pay attention to the “lower levels”, such as the physical properties of my voice.

* Besides TMI-style concentration meditation, my other practice on the retreat was some variety of “do nothing”-style meditation – which in my case felt more like “do anything”, as in “whatever my mind wants to do or think, I let it do or think”. It was this practice that felt most interrupted by the intention to think about what I was about to say, because it did not feel like I was letting my mind do what it wanted; rather, I (as opposed to “my mind”) was actively deciding what to think.

There were a few enjoyable experiences where this kicked in pretty strongly. On a few times when I sat down to meditate, it felt like I wasn’t doing anything at all, and rather just letting all intentions to do anything relax and fall away on their own. Then I would become aware of some tensions or discomforts in the body… and it would start feeling like those tensions were also maintained by some kind of an intention, as if my mind was actively creating the tension/discomfort because it wanted to feel discomfort. Then my attention would be drawn closer to the tension, some psychological content would come up, it would either resolve or the timer I’d set for my sit would ring… and gradually the process would continue, until it would run into some obstacle that changed the nature of it.

* I felt like I would get brief glimpses of what you might call the ego – there was a sense of just doing nothing and letting the mind relax, and then a feeling of there still being something that was acting as an active doer, guiding how the meditation process should go or which intention to relax next or even just the fact that this was a process of relaxing intentions… as on many occasions before, there would be small flashes of it, some of which would bring up some additional content or emotion, but never quite enough to see it clearly.

Overall, I feel pretty good and happy now, on the day after the retreat.

For now at least, that experience of “it’s safe to feel loved” seems to have rekindled something of a core state of love – that is, an experience of love which is not tied to being loved by any particular person, but rather feels like a happy comfortable background state which easily turns into warmth towards people who I think of or interact with. Similarly, some of those feelings of competence and agency that I found in the memories that I connected with, seem to be more naturally accessible now.

Some of Buddhist psychology suggests there are some basic discomforts that sit inside you, and which appear to be caused by external circumstances, when they’re actually internal processes that just happen to grab onto whatever happens to be available in the environment. So if you are feeling mistrustful and run into someone, your mind may grab onto whatever features that the other person has that seem like they could be used to justify the mistrust, and act as if that person had caused it. (This has some interesting parallels to predictive processing models of mind, which I have compared to Buddhist psychology before; you could think of this as there being a high-level prior for “I feel mistrustful”, with any incoming sense data being adjusted to fit.)

The NLP concept of core states seems to act in a somewhat similar way, but for more wholesome experiences. So if you have a sense of agency or a sense of love as a core state, then the mind’s background assumption is that you are going to experience agency or love, and it will grab onto any opportunity in the internal or external environment – even the memory of a bus stop if it doesn’t find anything else – in order to do so. PJ Eby has suggested (and I previously made a similar suggestion in the context of the IFS concept of “self”) that experiencing those core states is the mind’s basic tendency, and that we only learn not to experience them because we find them unsafe:

… what CT [Core Transformation] calls “core states” are also accessible by simply not activating the parts of the brain that shut off those states. (e.g. by telling us we don’t deserve love)

So if, for example, we don’t see ourselves as worthless, then experiencing ourselves as “being” or love or okayness is a natural, automatic consequence. Thus I ended up pursuing methods that let us switch off the negatives and deal directly with what CT and IFS represent as objecting parts, since these objections are the constraint on us accessing CT’s “core states” or IFS’s self-leadership and self-compassion.

Possibly some of those objections are now a little lessened again. At least, for today. :-)

Discuss

### The Mutant Game - Rounds 11 to 30

23 ноября, 2020 - 12:20
Published on November 23, 2020 9:20 AM GMT

This game continues from the alternate timeline here where I made two mistakes in the game engine.

• Bots were passed their own previous move and told it was their opponent's previous move.
• Bots were always given 0 as the round index instead of the correct positive integer.
CloneBots

Multiple people have noted that CliqueZviBot is outperforming the other CloneBots. This is due to how the CloneBot code interacts with the bugs in my engine.

The CloneBots still cooperate, but they do so imperfectly. All CloneBot pairings result in 200-300 splits instead of 250-250 splits. The CloneBots use source code parity combined with round number parity to determine who wins the 200-300 split. Since the buggy engine reports every round as round 0, that parity never flips: if CloneBotA and CloneBotB get a 200-300 split in favor of CloneBotB, then they will always get a 200-300 split in favor of CloneBotB.
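The post doesn't include the CloneBot source, so this is only a guessed sketch of the parity mechanism it describes (the function name and the lexicographic source comparison are my own assumptions, not the actual CloneBot code):

```python
def clone_move(my_source: str, opponent_source: str, round_index: int) -> int:
    """Decide 2 or 3 in a clone-vs-clone matchup (hypothetical mechanism).

    One bit comes from comparing the two source strings, combined with the
    parity of the round number. Exactly one of the two clones ends up
    playing 3 every turn (the 300 side of the split); the other plays 2.
    """
    source_bit = 1 if my_source < opponent_source else 0
    return 3 if (source_bit + round_index) % 2 == 1 else 2
```

With a correct engine, the winning side of the split would alternate as `round_index` changes; with the bug that always reports round 0, the source comparison alone decides, so the same clone wins the same pairing every time.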

Rounds 11-30

Round 11

Looking at the obituary I suspect that CooperateBot may not last much longer.

Prediction by Larks after seeing the results from Rounds 1 to 10

Larks' CooperateBot died on round 11.

Round 12

PasswordBot from Team Multics died along with "Why can't we all just get along" from Chaos Army and an NPC.

Round 13

No casualties.

Round 14

6 bots died.

• BeauBot, OscillatingTwoThreeBot, RandomOrGreedyBot and SimplePatternFinderBot from Chaos Army
• "Definitely Not Collusion Bot" from Team Multics. Multicore's fodder has been consumed. Team Multics contains only the MimicBot from here on.
• 1 NPC
Round 15

5 bots died.

• Silly Invert Bot 2, AttemptAtFair, Insum's CooperateBot, MeasureBot and "Random-start-turn-taking" from Chaos Army
• 1 NPC

MeasureBot had succeeded in infecting AbstractSpyTreeBot's move method and replacing it with return 0. AbstractSpyTreeBot ought to perform better with MeasureBot out of the game.

Round 16

3 bots from Chaos Army died

• "Silly Counter Invert Bot"
• LiamGoddard
• "Pure TFT"
Rounds 17-22

4 NPCs died

Round 23

BendBot and Copoperater [sic] died. BendBot belonged to Zvi. CliqueZviBot does not actually belong to Zvi; it is named after Zvi's strategy from the original game.

Round 24

No casualties.

Round 25

Copybot Deluxe died.

Round 26

RaterBot died. RaterBot performed semantic analysis on its opponents' source code. This may have contributed to breaking the symmetry of the clones.

Round 27

No casualties.

Round 28

Empiricist died. Empiricist was the most complicated bot I agreed to write the code for. The bot exemplifies a precise, well-written spec of a clever algorithm.

Step 0: Compute Empiricist's total score so far (denote s) and the opponent's total score so far (denote t). If t > s + 5, then Empiricist plays 3. Otherwise, continue to the following steps.

Step 1: Compute the maximal number m such that the last m rounds of the game are a repetition of some previous sequence. That is, m is maximal such that there exists k with k + m ≤ n for which the sequence (x_k, y_k) … (x_{k+m−1}, y_{k+m−1}) is identical to the sequence (x_{n−m+1}, y_{n−m+1}) … (x_n, y_n). If no m > 0 satisfies this property, set m = 0.

Step 2: Find the latest subsequence among previous repetitions, that is, the maximal k that satisfies the property above w.r.t. the chosen m. If m = 0, set k = n.

Step 4: Examine y := y_{k+m}. If y < 5, Empiricist plays 5 − y. If y = 5, Empiricist plays 2.
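The spec above can be transcribed fairly directly. The following is my own sketch, not the bot's actual source: it assumes the history is a list of (x_i, y_i) pairs (Empiricist's move, opponent's move) and that the running scores s and t are passed in, with at least one round already played.

```python
def empiricist_move(history, s, t):
    """Sketch of the Empiricist spec. history[i] = (x_{i+1}, y_{i+1});
    s and t are Empiricist's and the opponent's total scores so far."""
    n = len(history)
    # Step 0: if the opponent leads by more than 5 points, play 3.
    if t > s + 5:
        return 3
    # Steps 1-2: find the maximal m (and, for that m, the maximal
    # 1-indexed k with k + m <= n) such that rounds k..k+m-1 repeat
    # the last m rounds of the game.
    m_best, k_best = 0, n
    for m in range(n - 1, 0, -1):          # largest m first
        tail = history[n - m:]
        for k in range(n - m, 0, -1):      # latest k first
            if history[k - 1:k - 1 + m] == tail:
                m_best, k_best = m, k
                break
        if m_best:
            break
    # Step 4: y = y_{k+m} is the opponent's move right after the earlier
    # copy of the repeated sequence, i.e. the predicted next move.
    y = history[k_best + m_best - 1][1]
    return 5 - y if y < 5 else 2
```

For example, against an opponent who has settled into always playing 2, the repeated tail predicts another 2, so Empiricist plays 3 and completes the 5-split.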

Round 29

1 NPC died.

Round 30

No casualties.

Summary of Rounds 11-30

Everything so far

List of Survivors

The CloneBots and the MimicBot are all still alive.

| Bot | Population |
|---|---|
| EarlyBirdMimicBot | 1013 |
| Akrasia Bot | 950 |
| A Very Social Bot | 916 |
| CliqueZviBot | 907 |
| Clone wars, episode return 3 | 888 |
| a_comatose_squirrel | 423 |
| incomprehensibot | 221 |
| KarmaBot | 48 |
| CloneBot | 8 |

Two bots from Chaos Army survived this first ¼ of Order 66.

| Bot | Population |
|---|---|
| AbstractSpyTreeBot | 15 |
| Winner against low constant bots | 5 |

One NPC survived too. Silly 2 Bot always returns 2.

| Bot | Population |
|---|---|
| Silly 2 Bot | 5 |

In early tests of the game, I discovered that sometimes bots got stuck at a population of 2, from which they never died nor recovered. I therefore added custom code to finish off any bot whose population drops to 2 or less.

Multicore's Mystery

Multicore is in first place. But Multicore should not just be in first place: as the sole traitor among the CloneBots, Multicore's MimicBot should be dominating this competition. Plus, as a simulator, it should be able to cooperate on the first turn despite receiving misinformation about its opponent's previous move. I suspect that the MimicBot's simulator is useful because the other simulator, AbstractSpyTreeBot, has the highest population of all non-clones.

Multicore's failure to completely dominate probably has something to do with the bugs in the game engine. But there's something else at play too.

Simple bots are most advantageous to simulate. They are easy to maximize cooperation with, and there is little danger of simple bots winning in the long game. The simplest bots were my silly bots. The next simplest bots were the bots I wrote on behalf of non-programmers; those I programmed exclusively in Lisp. The MimicBot only has code to simulate opponents written in Python3 and cannot simulate bots written in Lisp. (The same goes for AbstractSpyTreeBot, which MimicBot's simulator came from.) Therefore MimicBot cannot simulate the bots which it would be most worthwhile to simulate.

If this is true then AbstractSpyTreeBot continues to influence this game.

Today's Obituary

| Bot | Team | Summary | Round |
|---|---|---|---|
| CooperateBot [Larks] | Chaos Army | "For the first 10 turns: return 3. For all subsequent turns: return the greater of 3 and (5 - the maximum value they have ever submitted)" | 11 |
| PasswordBot | Multics | Fodder for EarlyBirdMimicBot | 12 |
| Why can't we all just get along | Chaos Army | Doesn't negotiate with terrorists. Doesn't overly punish slackers. Attempts to establish steady tit-for-tat. | 12 |
| Silly TFT Bot 3 | NPCs | Tit-for-tat starting at 3 | 12 |
| Silly Cement Bot 2-3 | NPCs | Returns 2 or 3 on the first turn. Otherwise, returns 5 - opponent_first_move. | 14 |
| BeauBot | Chaos Army | At 528 lines, this is the most sophisticated bot to die so far. It picks one of 3 simple strategies based on its opponent's behavior. It also adjusts its behavior based on the round. | 14 |
| OscillatingTwoThreeBot | Chaos Army | "cooperates in the dumbest possible way" | 14 |
| Definitely Not Collusion Bot | Multics | Colludes with EarlyBirdMimicBot | 14 |
| RandomOrGreedyBot | Chaos Army | If the opponent averaged less than 2.5 over the last 100 turns then plays int(5 - opponent_avg). Otherwise randomly selects 3 or 2. | 14 |
| SimplePatternFinderBot | Chaos Army | Finds simple patterns. | 14 |
| Silly Invert Bot 2 | NPCs | Starts with 2. Then always returns 5 - opponent_previous_move | 15 |
| AttemptAtFair | Chaos Army | Oscillates between 3 and 2, starting with 3. | 15 |
| CooperateBot [Insub] | Chaos Army | Let MLM = my last move, OLM = opponent's last move. On the first turn, play 2. On subsequent turns: [Fork 1] If (MLM + OLM = 5), then play OLM. [Fork 2] Otherwise, flip a coin and play max(MLM, OLM) with 50% probability, and (5 - max(MLM, OLM)) with 50% probability | 15 |
| MeasureBot | Chaos Army | Attempts to hijack a simulator's move method and return 0. This succeeded against AbstractSpyTreeBot and failed on EarlyBirdMimicBot. Otherwise, it uses a hand-coded decision tree with 20 terminal leaves. | 15 |
| Random-start-turn-taking | Chaos Army | Selects 3 or 2 randomly until symmetry is broken. Then oscillates between 2 and 3. | 15 |
| Silly Counter Invert Bot | Chaos Army | Starts by randomly playing 2 or 3. Then always returns 5 - opponent_previous_move. | 16 |
| LiamGoddard | Chaos Army | Starts with 3 2 3 2. Then picks one of 5 strategies to use for the rest of the game. | 16 |
| Pure TFT | Chaos Army | "For the first round, play 2 or 3 with a 50/50 chance of each. For each subsequent round, play whatever the opponent played on the previous round." | 16 |
| Silly Random Invert Bot 2-3 | NPCs | Starts by randomly playing 2 or 3. Then always returns 5 - opponent_previous_move. (Same as Silly Counter Invert Bot.) | 17 |
| Silly Invert Bot 3 | NPCs | Starts with 3. Then always returns 5 - opponent_previous_move | 19 |
| Silly Cement Bot 3 | NPCs | Returns 3 on the first turn. Otherwise, returns 5 - opponent_first_move. | 20 |
| Silly TFT Bot 2 | NPCs | Tit-for-tat, starting at 2. | 21 |
| BendBot | Chaos Army | First proposal was rejected as too complicated. Second proposal was rejected as too complicated. Third proposal was accepted. For details, see Zvi's write-up here. | 23 |
| Copoperater [sic] | Chaos Army | Tit-for-tat, starting at 2. | 23 |
| CopyBot Deluxe | Chaos Army | Tit-for-tat. Picks starting value of 2 or 3 based off of round number. | 25 |
| RaterBot | Chaos Army | Estimates opponent's aggression by counting the number of 3s, 2s, return 3s and return 2 instances in its source code. Then picks a strategy based off of that. | 26 |
| Empiricist | Chaos Army | Performs the best strategy that would have worked against historical data. | 28 |
| Silly Cement Bot 3 | NPCs | Returns 2 on the first turn. Otherwise, returns 5 - opponent_first_move. | 29 |

The mutant game will continue on November 27, 2020.

Discuss

### Survey of Deviant Ideas

23 ноября, 2020 - 08:40
Published on November 23, 2020 5:40 AM GMT

I once wrote a list of things which are true which almost nobody agrees with me on. Then I redacted 86% of what I wrote because I'm uncomfortable associating myself with my more controversial beliefs. (It's okay if you disagree with my beliefs but this post is not the place to argue about them.)

Discuss

### Mark Manson and Evidence-Based Personal Development

23 ноября, 2020 - 08:13
Published on November 23, 2020 5:13 AM GMT

Most personal development is notoriously unreliable.

Mark Manson, a popular personal development author, is making an effort to make his advice more scientific. To this end, he has started labelling his blog posts based on their evidence base. He writes that he has "put together a team of Psychologists with MSc’s and PhD’s to help me research, outline and fact-check the content here on the site". He divides his blog posts into four categories:

• Evidence-based: for posts that recommend actions based on academic research, where it has been checked that studies have replicated, have decent sample sizes, etc.
• Fact-checked: for descriptive posts that have been fact-checked
• Theory: for posts rooted in academic theories that may be highly theoretical or not yet confirmed
• Opinion: for articles based purely on his own personal opinion

Anyway, I just thought that I'd mention this here, because it's good to see an example of a popular author shifting in this direction and adopting his own (less nerdy) version of epistemic statuses.

Discuss

### Changing the AI race payoff matrix

23 ноября, 2020 - 01:25
Published on November 22, 2020 10:25 PM GMT

Suppose that AI capability research is done, but AI safety research is ongoing. Any of the major players can launch an AI at the press of a button to win the cosmos. The longer everyone waits, the lower the chance that the cosmos is paperclips. The default is that someone will press the button once they prefer their chance at an intact cosmos to risking the race going on further. This unfortunate situation could be helped by the fact that pressing the button need not be obvious to the other players. So suppose that the winner decides to lay low and smite whoever presses the button thereafter*. Then other players would have an incentive, growing over time, not to press the button!

Let the paperclip probability p(t) := e^(−t) decay exponentially. Let t' be the last time at which the one other player wouldn't press the button. What mixed button-pressing strategy do we employ to make the get-smitten probability shore up the fading paperclip probability? At time t ≥ t', we press the button with probability density −p'(t) = e^(−t). Then the probability that our strategy ever causes paperclips is ∫_{t'}^{∞} e^(−t) · e^(−t) dt = 0.5 · e^(−2t').
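A quick numerical check of that integral (the function and parameter names are mine): integrating the press density e^(−t) against the paperclip probability e^(−t) from t' onward should come out to 0.5 · e^(−2t').

```python
import math

def press_paperclip_probability(t_prime, horizon=40.0, steps=200_000):
    """Midpoint-rule approximation of the probability that this mixed
    strategy ever causes paperclips: the integral of
    (press density e^-t) * (paperclip probability e^-t) = e^-2t
    over t from t' to infinity, truncated at `horizon` (the tail
    beyond t = 40 is on the order of e^-80, i.e. negligible)."""
    dt = (horizon - t_prime) / steps
    total = 0.0
    for i in range(steps):
        t = t_prime + (i + 0.5) * dt
        total += math.exp(-2 * t) * dt
    return total
```

For instance, press_paperclip_probability(1.0) agrees with the closed form 0.5 * math.exp(-2.0) to several decimal places.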

*He could also just figure out what everyone else would do in any situation and reward accordingly as a strategy against one-boxers, or copy the planet ten times over as a strategy against thirders, but this variant should work against your average human. (Turns out a large amount of strategies become available once you're omnipotent.)

Discuss

### Notes on Ambition

22 ноября, 2020 - 22:53
Published on November 22, 2020 7:53 PM GMT

Is ambition a virtue or a vice?

The noble Brutus
Hath told you Caesar was ambitious:
If it were so, it was a grievous fault.

―Mark Antony, in Shakespeare’s Julius Caesar

“Ambition” has undergone a shift of meaning over time. Today, ambition is often associated with positive things like having lofty goals, drive, initiative, aspiration, being a hard worker, not settling for mediocrity, and that sort of thing.

But it wasn’t that long ago that “ambition” was almost always a bad word that described a vice. It was more associated with a blinkered and ruthless pursuit of power, influence, and position. Lady MacBeth might be thought of as the poster child for this sort of ambition.

So, as was the case with the virtue of prudence, which also had a shift of meaning over time, we need to be especially cautious when we read about the virtue (or vice) of ambition that we understand what the author had in mind.

Aristotle ran into a similar problem when he tried to identify the “golden mean” concerning ambition (φιλότιμος) or lack of ambition (ἀφιλότιμος). In Greek, both of those words had either good or bad connotations depending on context. If someone was being ambitious to an unseemly extent, you might compare them unfavorably to a properly unambitious person. But if a person failed to set their sights high, you could also chastise them for not being ambitious. Aristotle complained that “as there is no recognized term for the observance of the mean, the extremes fight, so to speak, for what seems an empty place.”

Several years ago, Swimmer963 shared a couple of insightful posts about how she wrestled with ambition. She noted that the ambition-is-evil sense of ambition can discourage people from developing the good kind of ambition:

I can’t trace the roots of this idea completely, but for whatever reason, I spent a long time thinking that being ambitious was in some way immoral. That really good people lived simple, selfless lives and never tried to seek anything more. … [I]t’s a way to feel superior to people who’ve accomplished cooler things than me, of whom part of me is actually jealous, and that’s not the person I want to be.

What is ambition?

Ambition, in the good sense, seems to have these components:

1. You have a goal and a strong desire to reach that goal.
2. That goal is not merely a wish or hope for a future outcome, but is something that necessarily involves your own effort. So your strong desire comes packaged with an intent to follow through and do the work earnestly. (This suggests a role for the virtue of determination.)
3. The goal, to be an ambitious one, should be challenging to meet and high-impact in its results.

Sometimes people define ambition such that the goal is necessarily about yourself: for instance a goal to be an Olympic swimmer, or a successful entrepreneur, or to win an Emmy. But I also see ambition used to describe other sorts of goals: someone who decides to end world hunger, cure cancer, etc. could be considered to be ambitious, even if they did not have as part of their goal that they be personally honored or acknowledged for having accomplished those things.

A question I have is whether ambition is properly to be thought of as a virtue of its own, or whether improperly-tuned ambition is more a symptom of failures in other virtues. For example, if you do not show enough ambition, this may be because you have a fear of failure or of responsibility, you don’t have faith in yourself, you give up at the first sign of trouble, or you are lazy. Those are things that implicate virtues like courage, boldness, confidence, endurance, or industriousness. If you are ambitious in the bad way, this typically demonstrates itself through ruthlessness, betrayal, dishonesty, and things like that (doing whatever it takes to get ahead). These things also implicate virtues like honor, loyalty, honesty, and so forth. It may be the case that if your other virtues are well-tuned, proper ambition will just naturally arise as part of the package.

Ambition and aspiration

Philosopher @AgnesCallard contrasted ambition with aspiration (Aspiration: The Agency of Becoming, 2018). To oversimplify (and I haven’t read her book yet, only some interviews about it): Ambition has to do with acquiring things of already-ascertained value: money, power, fame, and the like. Aspiration is more transformative: it anticipates that one might radically change one’s own values and viewpoints; it is more unsure about where it is going and what it will find there.

In ambition, you know what the answer is (e.g. money, power, fame) and set out to get it. In aspiration, you know you want the answer but you're not sure what that answer is, so you set out to find out.

For example, one might have the ambition to become a professor because you like the idea of people listening to you lecture, being able to dole out grades on your whim, having a respectable job, and that sort of thing. But you might aspire to become a professor because you anticipate that you will be transformed by that role in unexpected but beneficial ways. The ambitious person will be consumed with questions like “how do I get tenure?” or “how do I impress the hiring committee?” while the aspiring person will be consumed with questions like “how does a professor think?”, or “how does one ‘profess’ well?”

You may, for example, aspire to appreciate jazz because you expect that you will find it valuable, even though you don’t really get it yet. Over the course of learning about jazz and listening to jazz, you discover the things that are valuable about it and so you acquire an appreciation of those values, but those were not things you originally had any ambition to acquire, simply because you had no idea what they were. Or maybe over the course of learning about jazz, you learn that you really prefer blues, or you really like dimly-lit cocktail bars, whatever happens to be playing, or you come to value improvisation and spontaneity. Your quest is more tentative: you are still actively driven, but more flexible about your destination.

Ambition and honor

Ambition is sometimes defined as the love of honor.

The megalopsychos or great-souled man that Aristotle describes (and that I recapped in my post on the virtue of honor) is someone who values honor above all else, and is single-minded in pursuit of it. Aristotle suggested that megalopsychia was something like ambition on a grand scale, in the same way that an extravagant display of philanthropy might be considered generosity on a grand scale.

When genuine honor is not what is being sought, but only the fame and admiration that go along with being honored, then the ambitious person is vulnerable to being taken in by flattery and other sorts of counterfeits.

This seems to be one example of a broader ambition failure mode. Other examples would be having a goal of being a rock star instead of making great music; of being a best-selling author instead of writing a great novel; of being a hero instead of doing something heroic. Such mistakes mean that you are more likely either to leave your ambitions stillborn at the daydreaming stage or to seek shortcuts that leave you short of a really ambition-worthy goal.

This could also be described in terms of Callard’s ambition/aspiration distinction: The ambitious person wants to be more or less the same person they are now but with the added prestige of being a rock star; the aspiring person wants to change who they are such that they become the kind of person who makes excellent rock music.

Ambition and certain other virtues

Ambition can be more or less wise. Knowing which goals are realistically attainable (if only with difficulty), and knowing to avoid adhering to ambitions that come with unacceptable risks, are skills for which prudence (both in the sense of practical wisdom and in the sense of caution) is helpful.

Ambition can be thwarted by a nihilistic sense that nothing really matters much anyway. Why put in extraordinary effort to achieve some difficult goal when free will is an illusion, everybody dies, and the heat death of the universe is only a few aeons away? For this reason, virtues like hope, reverence, enthusiasm, a sense of purpose, and optimism may come to the assistance of ambition.

Ambition can also fall victim to poor self-esteem. Who do you think you are, anyway, to have such ambitions? What makes you think you’re special, to think you can do something extraordinary that other people aren’t doing? So self-worth, self-respect, self-esteem, and pride can also come to the aid of ambition.

Ambition is also assisted by the virtue of confidence. One way to become more confident is to do more things successfully. But if you only do things at which you confidently succeed, you aren’t really stretching yourself in an ambitious way. And if you do stretch yourself ambitiously and fail, your confidence may take a hit, which in turn can make you less ambitious. This dynamic may seem like something of a dilemma, but can probably be better characterized as a difficult balancing act. Some suggestions:

• Try not to fail traumatically. Choose ambitions such that if you fail, the failure will not be so catastrophic that it will scar you and leave you overly-timid.
• Try to recharacterize failure as a less-awful thing. The “fail fast” buzzword is an example of this: it characterizes failing skilfully as a variety of successful outcome.
• Learn from your failure. If you do not simply turn away from failure in anguish, but try to wring as much experience as you can from it, your failures can make you more confident rather than less.
• Beware of the fallacy of overgeneralizing from a single failure. Sometimes people fail at something and then immediately jump to the conclusion that they aren’t the sort of person who succeeds at things. If you catch yourself doing this, notice the fundamental attribution error you are making and correct for it.

Swimmer963, in her series of posts on developing ambition, noted:

I don’t fail at things very often. Far from being a success, this is likely a sign that the things I’m trying aren’t nearly challenging enough.

In other words, a good heuristic for whether or not you’re sufficiently ambitious is whether or not you’re failing occasionally. If you’re never failing, you’re probably not challenging yourself as much as you should.

I don’t think that we can expect this heuristic to work in reverse. If you fail very often, that might mean that you’re too ambitious, but it could also mean that you lack follow-through, are easily-distracted, lack the patience to develop skill in the things you attempt, are unwilling to endure setbacks, unwisely choose goals that are inherently unattainable, or fail for a number of other reasons.

Discuss

November 22, 2020 - 21:21
Published on November 22, 2020 6:21 PM GMT

I am writing a novel in which the protagonist meets a visitor from a "dimension" where mathematical realism is literally true.  Namely, anything that the visitor can logically conceive of he can also "will" into existence.

Naturally at first the protagonist is pretty jealous of the visitor.  After all, who wouldn't want to be able to summon a steaming hot burrito whenever they feel hungry?  But now I need the visitor to convince the protagonist that his life isn't so bad after all.

So I am collecting a list of reasons why mathematical realism might be bad.  For the sake of this article, assume that there is a society of "freely interacting" beings in the mathematical realism world, so you can only "wish" things into existence in my world if I allow you to.  Or we could create a shared world where things can only be created according to a shared set of mutually agreed upon rules.

Here are some of the "bad" things I've come up with so far, but I'd like to collect more.

Eldritch abomination

One obvious downside of being able to wish anything into existence is that you might (intentionally or not) wish something really awful into existence.

Wireheading

A favorite topic of rationalists.  If you can wish for anything, there's a real chance of repeatedly wishing for things that gratify your short-term reward system.  This results in either an endless loop of obsessively doing the same behaviors over and over again, or "burning out" and being unable to take pleasure from those types of rewards.

Mental Poverty

The real world is fascinating and filled with a seemingly endless set of things to explore or discover.  Living in a world where anything can be created by thinking also means living in a world where the only things that exist are those that you can imagine.  A sufficiently clever person might get around this by (for example) simulating the big-bang and recreating the entire universe.  But someone who isn't sufficiently clever might just find themselves surrounded by the small set of things they can easily imagine, unable to ever experience surprise.

Lack of challenge

Similar to playing a game with all of the cheat-codes, living in a world where you can solve any problem simply by wishing might feel dull and unrewarding.  Again, a sufficiently clever person can simply create restrictions and challenges for themselves, but the temptation to "cheat" will always be there.

Social poverty

Even if you are clever and virtuous enough to create a decent life for yourself, there's a chance that everyone that you know might be wireheading or too scared by the eldritch abominations they have created to be your friends.

If you're clever, you can simply "make" friends, but these relationships may feel shallow or unreal for the same reason that "cheating" takes away the sense of satisfaction of overcoming a challenge.  If someone is literally created to be your friend, how can you ever know if they "really" like you or if you're just forcing them to be friendly?

Antisocial Behavior

Even if other beings cannot specifically create things in "your" world, there still may be other ways for them to harass you.  Suppose there is some method that all beings use to communicate with one another; they could spam or troll you using this medium.

More

I hope to edit this post as I think of other ideas, but please post in the comments if you have any ideas I haven't mentioned.

Discuss

### My intellectual influences

November 22, 2020 - 21:00
Published on November 22, 2020 6:00 PM GMT

Prompted by a friend's question about my reading history, I've been thinking about what shaped the worldview I have today. This has been a productive exercise, which I recommend to others. Although I worry that some of what's written below is post-hoc confabulation, at the very least it's forced me to pin down what I think I learned from each of the sources listed, which I expect will help me track how my views change from here on. This blog post focuses on non-fiction books (and some other writing); I've also written a blog post on how fiction has influenced me.

My first strong intellectual influence was Eliezer Yudkowsky’s writings on Less Wrong (now collected in Rationality: from AI to Zombies). I still agree with many of his core claims, but don’t buy into the overarching narratives as much. In particular, the idea of “rationality” doesn’t play a big role in my worldview any more. Instead I focus on specific habits and tools for thinking well (as in Superforecasting), and creating communities with productive epistemic standards (a focus of less rationalist accounts of reason and science, e.g. The Enigma of Reason and The Structure of Scientific Revolutions).

Two other strong influences around that time were Scott Alexander’s writings on tribalism in politics, and Robin Hanson’s work on signalling (particularly The Elephant in the Brain), both of which are now foundational to my worldview. Both are loosely grounded in evolutionary psychology, although not reliant on it. More generally, even if I’m suspicious of many individual claims from evolutionary psychology, the idea that humans are continuous with animals is central to my worldview (see Darwin’s Unfinished Symphony and Are We Smart Enough to Know How Smart Animals Are?). In particular, it has shaped my views on naturalistic ethics (via a variety of sources, with Wright’s The Moral Animal being perhaps the most central).

Another big worldview question is: how does the world actually change? At one point I bought into techno-economic determinism about history, based on reading big-picture books like Guns, Germs and Steel and The Silk Roads, and also because of my understanding of the history of science (e.g. the prevalence of multiple discovery). Sandel’s What Money Can’t Buy nudged me towards thinking more about cultural factors; so did books like The Dream Machine and The Idea Factory, which describe how many technologies I take for granted were constructed. And reading Bertrand Russell’s History of Western Philosophy made me start thinking about the large-scale patterns in intellectual history (on which The Modern Mind further shaped my views).

This paved the way for me to believe that there’s room to have a comparable influence on our current world. Here I owe a lot to Tyler Cowen’s The Great Stagnation (and to a lesser extent its sequels), Peter Thiel’s talks and essays (and to a lesser extent his book Zero to One), and Paul Graham’s essays. My new perspective is similar to the standard “Silicon Valley mindset”, but focusing more on the role of ideas than technologies. To repurpose the well-known quote: “Practical men who believe themselves to be quite exempt from any intellectual influence are usually the slaves of some defunct philosopher.”

Here’s a more complete list of nonfiction books which have influenced me, organised by topic (although I’ve undoubtedly missed some). I welcome recommendations, whether they’re books that fit in with the list below, or books that fill gaps in it!

On ethics:

• The Righteous Mind

• Technology and the Virtues

• Reasons and Persons

• The Precipice

On human evolution:

• The Enigma of Reason

• Darwin’s Unfinished Symphony

• The Secret of our Success

• Human Evolution (Dunbar)

• The Mating Mind

• The Symbolic Species

On human minds and thought:

• Rationality: from AI to Zombies

• The Elephant in the Brain

• How to Create a Mind

• Why Buddhism is True

• The Blank Slate

• The Language Instinct

• The Stuff of Thought

• The Mind is Flat

• Superforecasting

• Thinking, Fast and Slow

On other sciences:

• Scale: The Universal Laws of Life and Death in Organisms, Cities and Companies

• Superintelligence

• The Alignment Problem

• Are We Smart Enough to Know How Smart Animals Are?

• The Moral Animal

• Ending Aging

• Improbable Destinies

• The Selfish Gene

• The Blind Watchmaker

• Complexity: The Emerging Science at the Edge of Order and Chaos

• Quantum Computing Since Democritus

On science itself:

On philosophy:

• A History of Western Philosophy

• The Intentional Stance

• From Bacteria to Bach and Back

• Good and Real

• The Big Picture

• Consciousness and the Social Brain

• An Enquiry Concerning Human Understanding

On history and economics:

• The Shortest History of Europe

• A Farewell to Alms

• The Technology Trap

• Iron, Steam and Money

• The Enlightened Economy

• The Commanding Heights

On politics and society:

On life, love, etc:

• Deep Work

• Man's Search for Meaning

• More Than Two

• Authentic Happiness

• Happiness by Design

• Written in History

Other:

• Age of Em

• Immortality: The Quest to Live Forever and How It Drives Civilization

• Surely You’re Joking, Mr. Feynman!

• Impro

• Never Split the Difference

Discuss