LessWrong.com News

A community blog devoted to refining the art of rationality

D&D.Sci 4th Edition: League of Defenders of the Storm Evaluation & Ruleset

October 5, 2021 - 20:30
Published on October 5, 2021 5:30 PM GMT

This is a follow-up to last week's D&D.Sci scenario: if you intend to play that, and haven't done so yet, you should do so now before spoiling yourself.


Code is available here for those who are interested.


A character has two stats: an Element and a Power Level.

18 of the 19 characters are power levels 1-6 of the elements Fire, Water and Earth:

| Power Level | Fire | Water | Earth |
|---|---|---|---|
| 1 | Volcano Villain | Arch-Alligator | Landslide Lord |
| 2 | Oil Ooze | Captain Canoe | Earth Elemental |
| 3 | Fire Fox | Maelstrom Mage | Dire Druid |
| 4 | Inferno Imp | Siren Sorceress | Quartz Questant |
| 5 | Phoenix Paladin | Warrior of Winter | Rock-n-roll Ranger |
| 6 | Blaze Boy | Tidehollow Tyrant | Greenery Giant |

The remaining character, the Nullifying Nightmare, has a Power Level of 5 with the unique element of Void.  

The NPC team consists of Fire 5, Water 6, Earth 3, Earth 4, Earth 6.

Congratulations here to abstractapplic, who was the first to figure out the elements.


A fight between two teams is composed of fights between the individual characters.  When two characters fight one another, it works as follows:

  • Some elements counter others:
    • Fire is countered by Water
    • Water is countered by Earth
    • Earth is countered by Fire
  • If one character's element is countered by the other, that character loses the fight (regardless of power level).  For example, if Oil Ooze (Fire 2) fights Greenery Giant (Earth 6), Oil Ooze will win.
  • If the characters are the same element, the higher-power one will win.  For example, if Phoenix Paladin (Fire 5) fights Fire Fox (Fire 3), Phoenix Paladin will win.
  • If the characters are the same element and the same power, each has a 50% chance to win.
  • There are two special cases:
    • The 1 of each element counters the 6 of the same element, and beats it rather than losing to it.  So if Volcano Villain (Fire 1) fights Blaze Boy (Fire 6), Volcano Villain will win.  (Congratulations to Yonge, who I think is the first person to have explicitly noticed one of these counters).
    • The Nullifying Nightmare has Power Level 5, and the unique element of Void.  Void does not counter any elements, and is not countered by any elements - the Nightmare just fights with Power Level directly, as if it were the same element as its opponent.  So it will lose a fight to any Power 6, have a 50% chance against any Power 5, and beat any Power 1-4.
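The single-fight rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in this post, not the code linked earlier; the `(element, power)` tuple representation and the names `COUNTERED_BY`, `win_probability` and `fight` are assumptions made for the example.

```python
import random

# Each key is countered by (i.e. always loses to) its value.
COUNTERED_BY = {"Fire": "Water", "Water": "Earth", "Earth": "Fire"}


def win_probability(a, b):
    """Probability that character a beats character b.

    Characters are hypothetical (element, power) tuples, e.g. ("Fire", 2).
    """
    (elem_a, pow_a), (elem_b, pow_b) = a, b
    # Elemental counters only apply between two different non-Void elements.
    if "Void" not in (elem_a, elem_b) and elem_a != elem_b:
        return 0.0 if COUNTERED_BY[elem_a] == elem_b else 1.0
    # Special case: the 1 of an element beats the 6 of the same element.
    if elem_a == elem_b and {pow_a, pow_b} == {1, 6}:
        return 1.0 if pow_a == 1 else 0.0
    # Same element, or Void involved: compare Power Level directly.
    if pow_a == pow_b:
        return 0.5
    return 1.0 if pow_a > pow_b else 0.0


def fight(a, b, rng=random):
    """Return the winner of a single fight between a and b."""
    return a if rng.random() < win_probability(a, b) else b
```

For instance, `win_probability(("Fire", 2), ("Earth", 6))` is 1.0 (the Oil Ooze example), and `win_probability(("Void", 5), ("Water", 6))` is 0.0.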

To find the outcome of a game between two teams:

  • Choose a random character from each team.
  • Those two characters fight.
  • The loser is KOd and removed from their team.
  • The winner sticks around.
  • Repeat this process until one team has run out of characters.  That team loses.

This ruleset encourages balanced teams.  For example, a team of 5 Fire characters will lose to a team of 4 Earth and 1 Water characters.  Even though 4/5 of character matchups favor the Fire characters, nothing on their team can beat the one Water character, and eventually it will work its way through their whole team and win.
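To see the all-Fire example play out, here is a rough Monte Carlo sketch of the team game as described above. It is an illustration under the stated rules, not the linked code; the tuple representation, the function names, and the specific Power Levels chosen for the two example teams are all assumptions.

```python
import random

COUNTERED_BY = {"Fire": "Water", "Water": "Earth", "Earth": "Fire"}


def win_probability(a, b):
    """Probability that (element, power) character a beats character b."""
    (elem_a, pow_a), (elem_b, pow_b) = a, b
    if "Void" not in (elem_a, elem_b) and elem_a != elem_b:
        return 0.0 if COUNTERED_BY[elem_a] == elem_b else 1.0
    if elem_a == elem_b and {pow_a, pow_b} == {1, 6}:
        return 1.0 if pow_a == 1 else 0.0  # the 1 beats the 6
    if pow_a == pow_b:
        return 0.5
    return 1.0 if pow_a > pow_b else 0.0


def play_game(team_a, team_b, rng):
    """Return True if team_a wins one game against team_b."""
    a, b = list(team_a), list(team_b)
    while a and b:
        # Pick a random living character from each team and let them fight.
        i, j = rng.randrange(len(a)), rng.randrange(len(b))
        if rng.random() < win_probability(a[i], b[j]):
            b.pop(j)  # team_b's fighter is KOd; the winner stays on its team
        else:
            a.pop(i)
    return bool(a)


def estimate_win_rate(team_a, team_b, games=2000, seed=0):
    """Fraction of simulated games won by team_a."""
    rng = random.Random(seed)
    return sum(play_game(team_a, team_b, rng) for _ in range(games)) / games


# Hypothetical line-ups for the example: 5 Fire vs 4 Earth + 1 Water.
fire_team = [("Fire", p) for p in (2, 3, 4, 5, 6)]
mixed_team = [("Earth", p) for p in (3, 4, 5, 6)] + [("Water", 1)]
print(estimate_win_rate(fire_team, mixed_team))  # prints 0.0
```

The all-Fire team never wins because nothing on it can KO the Water character, even a Power 1 one, so the mixed team can never run out of characters.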

Once you understand how the rules work, general strategy is:

  • Choose high-Power characters.
  • Try to be reasonably balanced elementally.
  • Try to counter the enemy team (with 1s against their 6s, a tilt towards the right elements, etc).
  • The elemental counter mechanic does not always work as you might expect it to at the team level - rather than thinking about countering your opponent's common elements, you should think of it as targeting your opponent's weak elements.  If your opponent has 3 Earth and 2 Water characters, for example,  rather than saying 'that team has lots of Earth characters, I should bring high-power Fire characters to beat them' (which will get you wiped out by the Water characters), you want to say 'that team has no Fire characters, I should bring high-power Earth characters it won't be able to beat'.

The games you have access to were played by players grouped together by the game's auto-party functionality.  These players do not build coordinated teams, so there's no correlation between characters.

However, some characters are more popular than others (most notably the Siren Sorceress, whose infamous costume has won her a large and...enthusiastic...fan base).

Overall, Water characters are somewhat more common and Earth characters somewhat less common.  This doesn't affect the game itself, but it means that naive win rate evaluations will make Fire characters look substantially weaker than they are (since the Water characters that counter them are common and the Earth characters that they counter are rare).

The full dataset contains around a million game results.  However, Cloud Liquid Gaming's previous data specialist carelessly loaded the data into Excel and accidentally truncated it at row 65536, so you only received 65535 entries in your data.

(Real-world Data Science Moral: it is very rare for the length of a dataset to naturally be a power of 2/one less than a power of 2.  If you see that, you should suspect that your data got cut off at some point.)
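A quick sanity check along these lines is easy to automate. The sketch below is one possible way to do it; the function name `looks_truncated` and the `min_rows` cutoff (used to ignore tiny datasets, where such lengths are common by chance) are assumptions for illustration.

```python
def looks_truncated(n_rows, min_rows=255):
    """True if n_rows is a power of 2, or one less than a power of 2.

    min_rows is an arbitrary illustrative cutoff for ignoring small datasets.
    """
    def is_power_of_two(k):
        # A power of two has exactly one bit set.
        return k > 0 and (k & (k - 1)) == 0

    return n_rows >= min_rows and (
        is_power_of_two(n_rows) or is_power_of_two(n_rows + 1)
    )


print(looks_truncated(65535))      # True: one less than 2**16
print(looks_truncated(1_000_000))  # False
```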


Note: all winrates below were Monte-Carlo calculated rather than explicitly derived.  Luckily it doesn't look like rankings are close enough for slight Monte Carlo error to matter.

The optimal team for fighting the NPC team consists of Fire 5 (Phoenix Paladin), Fire 6 (Blaze Boy), Water 1 (Arch-Alligator), Earth 5 (Rock-n-roll Ranger), Earth 6 (Greenery Giant).

The most important things to bring (in roughly descending order) were:

  • Blaze Boy (Fire6), who can be beaten only by one character on the NPC team (their Tidehollow Tyrant, Water6).
  • Greenery Giant (Earth6), who can be beaten only by the NPC's Phoenix Paladin (Fire5), or by losing the tiebreaker to the NPC's Greenery Giant.
  • Characters that beat the NPC Tidehollow Tyrant (e.g. Arch-Alligator, Rock-n-roll Ranger).  If you can KO the Tidehollow Tyrant, your Blaze Boy can beat everything else on their team by itself.
  • Characters that beat the NPC Greenery Giant/Phoenix Paladin (e.g. Phoenix Paladin)

(Nullifying Nightmare, despite its very high overall win rate, is not that good to bring here - it performs poorly against teams with multiple 6s on them).

Entries were:

| Entrant(s) | Team | Win Rate |
|---|---|---|
| Optimal Play | Fire5, Fire6, Water1, Earth5, Earth6 | 81.47% |
| gjm* | Fire5, Fire6, Water1, Earth1, Earth6 | 80.40% |
| Alumium, simon | Fire5, Fire6, Earth1, Earth5, Earth6 | 76.53% |
| GuySrinivasan | Fire5, Fire6, Water3, Earth5, Earth6 | 72.41% |
| Measure, Jemist | Fire6, Water6, Earth5, Earth6, Void5 | 70.01% |
| abstractapplic | Fire5, Fire6, Water6, Earth6, Void5 | 66.97% |
| Maxwell Peterson | Fire5, Water1, Earth1, Earth6, Void5 | 62.05% |
| Yonge | Water6, Earth1, Earth5, Earth6, Void5 | 36.00% |
| Random Play | 5 randomly selected characters | 28.55% |
| lsusr | Fire3, Water5, Earth2, Earth3, Earth4 | 14.97% |

*After fixing a very entertaining early bug where he set up his code to pessimize his team instead of optimizing it.  I like to think that a guy from the opposing team snuck in and offered him a briefcase full of cash to sabotage his employers.

Congratulations to everyone who submitted.  Particular shoutouts go to: 

  • The top answer submitted by gjm, whose answer was extremely close to optimal (Earth5 vs Earth1 is a very close call against this team).
  • The second-place answer submitted by Alumium and simon, who did extremely well despite submitting a team with a worrying-looking elemental tilt (the lack of Water characters did not end up hurting them much because the NPC team has only one Fire character, Phoenix Paladin with Power Level 5, which can be KOd e.g. by your Blaze Boy).
  • The answer by GuySrinivasan, who suffered a lot by bringing Maelstrom Mage (Water3) instead of Water6 or Water1, but who was the first person not to get tricked into bringing the Nullifying Nightmare.
  • The answer by Maxwell Peterson, who suffered a lot by missing Blaze Boy from his team but was the first person to bring along Arch-Alligator (the optimal counter to the opposing team's Tidehollow Tyrant).


Note: The commentary below should be considered non-final for a few days to give people time to point out that I've misread the teams they submitted/added up win percentages wrong/made other obvious mistakes.  If I have messed something like that up I'll have to recalculate, so don't count on victory/defeat until some more eyes have confirmed.

The PVP submissions received were (ordered from earliest to latest received, though as there were no duplicate teams this didn't end up mattering):

lsusr: Fire4, Water1, Water5, Earth5, Void5

Measure: Fire1, Water4, Water6, Earth1, Void5

abstractapplic: Fire5, Fire6, Water6, Earth6, Void5

Yonge: Water6, Earth1, Earth5, Earth6, Void5

GuySrinivasan: Fire6, Water5, Water6, Earth6, Void5

Maxwell Peterson: Fire6, Water6, Earth1, Earth6, Void5

gjm: Fire5, Fire6, Water6, Earth6, Void5

Alumium: Fire6, Water6, Earth1, Earth5, Earth6

Jemist: Fire6, Water6, Earth5, Earth6, Void5

simon: Fire2, Fire6, Water1, Water6, Earth6

The most common team consisted of the Nullifying Nightmare, all three 6-power characters, and one 5-power character: abstractapplic, GuySrinivasan and Jemist all submitted variants on this team (each choosing a different element for their 5-power character), plus Alumium submitted that team earlier before changing it out for a different one.

Win rates were:

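| | Alumium | simon | Maxwell Peterson | Jemist | abstractapplic | gjm | GuySrinivasan | Yonge | Measure | lsusr | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Alumium | – | 46.29% | 56.57% | 58.27% | 62.69% | 62.87% | 65.60% | 82.14% | 55.99% | 73.95% | 5.64 |
| simon | 53.71% | – | 53.87% | 64.97% | 67.80% | 68.06% | 60.75% | 69.63% | 50.21% | 68.09% | 5.57 |
| Maxwell Peterson | 43.43% | 46.13% | – | 54.18% | 58.78% | 58.53% | 61.90% | 73.82% | 68.37% | 84.80% | 5.50 |
| Jemist | 41.73% | 35.03% | 45.82% | – | 50.46% | 50.49% | 49.26% | 73.09% | 76.70% | 87.46% | 5.10 |
| abstractapplic | 37.31% | 32.20% | 41.22% | 49.54% | – | 50.13% | 50.51% | 67.15% | 64.42% | 86.36% | 4.79 |
| gjm | 37.13% | 31.94% | 41.47% | 49.51% | 49.87% | – | 50.21% | 67.34% | 64.28% | 86.50% | 4.78 |
| GuySrinivasan | 34.40% | 39.25% | 38.10% | 50.74% | 49.49% | 49.79% | – | 54.93% | 57.00% | 88.72% | 4.62 |
| Yonge | 17.86% | 30.37% | 26.18% | 26.91% | 32.85% | 32.66% | 45.07% | – | 77.44% | 75.83% | 3.65 |
| Measure | 44.01% | 49.79% | 31.63% | 23.30% | 35.58% | 35.73% | 43.00% | 22.56% | – | 40.80% | 3.26 |
| lsusr | 26.05% | 31.91% | 15.20% | 12.54% | 13.64% | 13.50% | 11.28% | 24.17% | 59.20% | – | 2.07 |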
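The Overall Score column is not defined explicitly, but it appears to be the sum of a team's win rates against the nine other teams, i.e. the expected number of wins when playing each opponent once. A small check of that reading against the top two rows (the function name `overall_score` is made up for the example):

```python
# Win rates (in percent) for Alumium and simon against the other nine teams,
# copied from the win-rate table.
alumium = [46.29, 56.57, 58.27, 62.69, 62.87, 65.60, 82.14, 55.99, 73.95]
simon = [53.71, 53.87, 64.97, 67.80, 68.06, 60.75, 69.63, 50.21, 68.09]


def overall_score(win_rates_percent):
    """Expected number of wins from one game against each opponent."""
    return round(sum(win_rates_percent) / 100, 2)


print(overall_score(alumium), overall_score(simon))  # 5.64 5.57
```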

The three symmetrical teams did quite well (with the differences between them coming down to which elements countered other teams best), but did not ultimately win.  

The Nullifying Nightmare was extremely common but not very strong (since most teams included multiple 6s) - ultimately neither of the top 2 teams included it.  

Conditional on this data holding up when more eyes look at it, I believe the victory went to Alumium, who managed to get all three 6s, avoid the Nightmare, include a Power 1 character for the counterpick against strong teams, and have an elemental tilt that helped prey more effectively on some lower-tier teams.  Alumium's team was a bit Earth-heavy, but no other team quite managed to compete.

Condolences to simon, whose team was close to being extremely strong, but whose confusing inclusion of Oil Ooze (Fire2) instead of either Fire1 or Fire5 cost him just enough to drop him to second place.

Congratulations Alumium!  Once your victory has been confirmed and you've figured out what theme (either a general genre or a specific work*) you want to request an upcoming scenario be based on, PM or comment and I'll try to get it to happen.  I can't promise it'll happen soon (it'll take some time to write one of these, other people are queued up to publish theirs, and I might end up submitting a Christmas-themed one in December, so you'll end up waiting until some time late this year or early next year).

*Ability to select a specific work is contingent on me being familiar with that work and thinking I can write a scenario based on it.


I'm interested to hear feedback on what people thought of this scenario.  If you played it, what did you like and what did you not like?  If you might have played it but decided not to, what drove you away?  What would you like to see more of/less of in future?

Thanks for playing!  Now, if you'll excuse me, the League of Legends world championship is starting, and I need to go watch North America's finest best least dreadful teams be shamefully routed by teams from countries I've never heard of!


Using Rationality in Mafia

October 5, 2021 - 20:11
Published on October 5, 2021 4:46 PM GMT

I'm an avid player of the game of Mafia, played over weeks/months on forums.

I'm wondering how the skills and tools of rationality (most saliently Bayesian reasoning, but potentially a wide range of others) could be best used to accurately identify people aligned with the Mafia/convince people to think you're aligned with the Town.

What are your ideas?


Covid Home Test Experiment

October 5, 2021 - 18:24
Published on October 5, 2021 3:24 PM GMT

I was exposed to COVID 4 days ago. I received the Pfizer vaccine second dose 4 months ago. I have done 2 rapid tests today. One came back positive, one negative. I would like to do 18 more and report back my data. However, I cannot afford that many home tests.

Would anyone like to sponsor this home science project?


Modelling and Understanding SGD

October 5, 2021 - 16:41
Published on October 5, 2021 1:41 PM GMT

I began this as a way to get a better understanding of the feeling of SGD in generalized models. This doesn't go into detail as to what a loss function actually is, and doesn't even mention neural networks. The loss functions are likely to be totally unrealistic, and these methods may be well out-of-date. Nonetheless I thought this was interesting and worth sharing.

Imagine we have a one-parameter model, fitting to one datapoint. The parameter starts at $W=0$ and the loss is $L=-\exp(-(W-2)^2)$. The gradient will then be $\frac{dL}{dW}=2(W-2)\exp(-(W-2)^2)$.

An imaginary continuous gradient descent will smoothly move to the bottom of the well and end up with W=2.

A stepwise gradient descent needs a hyperparameter T telling it how much to move the parameters each step. Let's start with this at 1.

This gives us a zig-zag motion. The system is overshooting. Let's pick a smaller T value.

Hmm. Now our model converges quite slowly. What about 0.5?

Much better. Seems that there's an optimal value of T, which we will later see depends on the spikiness of the loss function.
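The three runs described above can be roughly reproduced with a few lines of Python. This is a sketch, not the code behind the original plots: the function names are made up, and since the "smaller T value" is not stated, 0.1 is used here purely as an assumed stand-in.

```python
import math


def grad(w):
    """dL/dW for L = -exp(-(W - 2)^2)."""
    return 2 * (w - 2) * math.exp(-((w - 2) ** 2))


def descend(t, w=0.0, steps=200):
    """Run stepwise gradient descent from w and return the whole trajectory."""
    path = [w]
    for _ in range(steps):
        w -= t * grad(w)
        path.append(w)
    return path


def overshoots(path):
    """Count how often consecutive points land on opposite sides of W = 2."""
    return sum((a - 2) * (b - 2) < 0 for a, b in zip(path, path[1:]))


# T = 0.1 is an assumption standing in for the unspecified "smaller T value".
for t in (1.0, 0.1, 0.5):
    path = descend(t)
    print(f"T={t}: final W={path[-1]:.4f}, overshoots={overshoots(path)}")
```

A non-zero overshoot count corresponds to the zig-zag motion; a count of zero means the parameter approaches the minimum from one side only.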

T is not dimensionless: it has the dimension of $W^2/L$, since $\Delta W = -T \frac{dL}{dW}$. Certain function-minimising methods like Newton's method use (for example) $\Delta W = -k \frac{dL/dW}{d^2L/dW^2}$, where $k$ is dimensionless. This is why different loss functions require different T values.
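As a quick check of the dimensional argument, here is a sketch that writes $[X]$ for the dimension of $X$ (this bracket notation is an assumption of the sketch, not something the post uses):

```latex
[T] = \frac{[\Delta W]}{[dL/dW]} = \frac{[W]}{[L]/[W]} = \frac{[W]^2}{[L]},
\qquad
\left[\frac{dL/dW}{d^2L/dW^2}\right] = \frac{[L]/[W]}{[L]/[W]^2} = [W]
\;\Rightarrow\; [k] = 1 .
```

The Newton-style step already carries the dimension of $W$, so its prefactor does not need to absorb the scale of the loss.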

SGD and Local Minima

What if we have two datapoints? Now $L = (l_0 + l_1)/2$. Let $l_0 = -\exp(-(W-2)^2)$ as above, but $l_1 = -\exp(-(W-2)^2) - \exp(-100(W-1)^2)$.

Now our loss function L has a new local minimum; I think of this as "the pit". Where we end up depends on where we start. If we start at W=4 then we'll clearly end up at W=2, but if we start at W=0, then:

Huh, this isn't what I expected at all! The gradient must have been too high around the local minimum. Let's try with T = 0.05, which will be slower but ought to work better.

This is more like what I expected. Now our system is stuck in the local minimum.

But most modern gradient descent algorithms use stochastic gradient descent. We can model this here by randomly picking one of the two datapoints (with associated loss function) to descend by each step.

What happens now? Well now we have a chance to escape the pit. Let's say we're at the minimum of the pit. How many consecutive descent steps performed on l0 will get us out of the pit?

Well by solving for the stationary points of l1, we get W=1.004 and W=1.196. It turns out 5 steps of descent on l0 will get us to W=1.201 which is out of the pit. Within 100 steps we have an 81% chance of getting out.
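Both numbers can be checked with a short sketch. It assumes T = 0.05 (the value chosen a few paragraphs earlier), takes the pit's edges as the quoted stationary points, and models each SGD step as a fair coin flip in which any l1 pick resets the streak of l0 picks entirely. The function names are hypothetical.

```python
import math


def grad_l0(w: float) -> float:
    """Gradient of l0 = -exp(-(W - 2)^2)."""
    return 2.0 * (w - 2.0) * math.exp(-(w - 2.0) ** 2)


def steps_to_escape(w_start: float, w_edge: float, T: float):
    """Count consecutive l0 updates needed to move from w_start past w_edge."""
    w, steps = w_start, 0
    while w <= w_edge:
        w -= T * grad_l0(w)
        steps += 1
    return steps, w


def prob_run(n_steps: int, run_len: int, p: float = 0.5) -> float:
    """P(at least one streak of run_len consecutive l0 picks within n_steps picks)."""
    # state[k] = probability of a current streak of k l0 picks, with no full streak so far.
    state = [1.0] + [0.0] * (run_len - 1)
    escaped = 0.0
    for _ in range(n_steps):
        new_state = [0.0] * run_len
        for k, mass in enumerate(state):
            new_state[0] += mass * (1.0 - p)  # an l1 pick resets the streak
            if k + 1 == run_len:
                escaped += mass * p           # the streak reaches run_len
            else:
                new_state[k + 1] += mass * p
        state = new_state
    return escaped


steps, w_final = steps_to_escape(w_start=1.004, w_edge=1.196, T=0.05)
print(steps, round(w_final, 3), round(prob_run(100, steps), 2))
```

The reset assumption is the simplification that the Monte Carlo section below goes on to question.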

The interesting thing is that this doesn't depend much on the depth of the W=1 pit. If the local pit is twice as deep in l1, then we only require one more consecutive step to escape it. If this is the case, then W=1 is in fact a global minimum in L. But because of SGD, it's not stable, and the only stable point is W=2! Clearly there is something other than the overall minimum which affects how the parameter of the model changes over time.

What about smaller values of T? The number of consecutive steps in l0 needed to escape the pit is inversely proportional to T. In the limit as T→0, we just converge on a continuous gradient descent, which will find the first minimum it comes to.

This reminds me of chemistry. It's as if the size of the step has to be big enough for the model to overcome some activation energy to cross the barrier from the W=1 pit to the W=2 one. This is the motivation for the letter T: temperature. The larger T is, the more likely it is that the model will cross over out of the local minimum.

Monte Carlo

In fact we generally need significantly fewer than this. Starting around 1, one update on l0 is enough to push us into the high-gradient region of l1, which means even an update on l1 will not move us back down into the centre, but rather to the left side of the pit, where a subsequent l1 update might push us further towards the right.

Let's estimate the probability of escaping the pit in 100 steps as a function of T:

Probability of escaping the pit in 100 steps as a function of T

What about 1000?

Probability of escaping the pit in 1000 steps as a function of T

As we sort-of expected, it is easier to escape the pit than our original model predicted.
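A Monte Carlo estimate of this kind can be sketched as follows. The start point (the pit minimum at W≈1.004), the escape criterion (W passing the right-hand stationary point at 1.196), the trial count and the seed are all assumptions of this sketch; the post's own simulation settings are not given here.

```python
import math
import random


def grad_l0(w: float) -> float:
    """Gradient of l0 = -exp(-(W - 2)^2)."""
    return 2.0 * (w - 2.0) * math.exp(-(w - 2.0) ** 2)


def grad_l1(w: float) -> float:
    """Gradient of l1 = l0 - exp(-100 (W - 1)^2)."""
    return grad_l0(w) + 200.0 * (w - 1.0) * math.exp(-100.0 * (w - 1.0) ** 2)


def escapes(T: float, n_steps: int, rng: random.Random,
            w_start: float = 1.004, w_edge: float = 1.196) -> bool:
    """Run SGD from the pit minimum; True if W ever passes the pit's right edge."""
    w = w_start
    for _ in range(n_steps):
        # Pick one of the two datapoints uniformly at random for this step.
        grad = grad_l0(w) if rng.random() < 0.5 else grad_l1(w)
        w -= T * grad
        if w > w_edge:
            return True
    return False


def escape_probability(T: float, n_steps: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of independent SGD runs that escape within n_steps."""
    rng = random.Random(seed)
    return sum(escapes(T, n_steps, rng) for _ in range(trials)) / trials


for T in (0.01, 0.02, 0.03, 0.04, 0.05):
    print(T, escape_probability(T, 100))
```

Unlike the consecutive-steps model, this lets l1 updates move W around inside the pit rather than resetting it to the minimum.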

Let's look more closely at the region between 0.01 and 0.02:

Probability of escaping the pit in 100 steps as a function of T (note Y axis)

Probability of escaping the pit in 1000 steps as a function of T

Probability of escaping the pit in 10000 steps as a function of T

It looks roughly like the log of the number of steps required to escape the pit is a function of T. Hopefully the later posts in this series will allow me to understand why.


Force neural nets to use models, then detect these

October 5, 2021 - 14:31
Published on October 5, 2021 11:31 AM GMT

Research projects

I'm planning to start two research projects on model splintering/reward generalisation and learning the preferences of irrational agents.

Within those projects, I'm aiming to work on subprojects that are:

  1. posed in terms that are familiar to conventional ML;
  2. interesting to solve from the conventional ML perspective;
  3. and whose solutions can be extended to the big issues in AI safety.

The point is not just to solve the sub-problems, but to solve them in ways that generalise or point to a general solution.

The aim is to iterate and improve fast on these ideas before implementing them. Because of that, these posts should be considered dynamic and prone to be re-edited, potentially often. Suggestions and modifications of the design are valuable and may get included in the top post.

Force model use and then detect it

Parent project: this is a subproject of the value learning project.


I see human values as residing, at least in part, in our mental models. We have a mental model of what might happen in the world, and we grade these outcomes as good or bad. In order to learn what humans value, the AI needs to be able to access the mental models underlying our thought processes.

Before starting on humans, with our messy brains, it might be better to start on artificial agents, especially neural-net based ones that superficially resemble ourselves.

The problem is that deep learning RL agents are generally model-free. Or, when they are model-based, they are generally constructed with a model explicitly, so that identifying their model is as simple as saying "the model is in this sub-module, the one labelled 'model'."


The idea here is to force a neural net to construct a model within itself - a model that we can somewhat understand.

I can think of several ways of doing that. We could get a traditional deep learning agent that performs on a game. But we might also force it to answer questions about various aspects of the game, identifying the values of certain features we have specified in advance ("how many spaceships are there on the screen currently?"). We can then use multi-objective optimisation with a strong simplicity prior/regulariser. This may force the agent to use the categories it has constructed to answer the questions, in order to play the game.
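One way to picture the multi-objective setup is as a single scalar training loss. The sketch below is only an illustration: the weighted-sum scalarisation, the L1 penalty standing in for the simplicity prior, and every name and default value are assumptions, not the project's actual design.

```python
from typing import Sequence


def combined_objective(game_loss: float,
                       question_losses: Sequence[float],
                       weights: Sequence[float],
                       question_weight: float = 1.0,
                       simplicity_weight: float = 0.01) -> float:
    """Weighted sum of three terms: play the game, answer the feature questions, stay simple."""
    # Average loss over the pre-specified questions (e.g. "how many spaceships are on screen?").
    if question_losses:
        question_term = sum(question_losses) / len(question_losses)
    else:
        question_term = 0.0
    # L1 penalty on the network weights, a stand-in for the simplicity prior/regulariser.
    simplicity_term = sum(abs(w) for w in weights)
    return game_loss + question_weight * question_term + simplicity_weight * simplicity_term


# Hypothetical numbers, purely to show the call.
print(combined_objective(1.0, [0.5, 1.5], [1.0, -2.0], question_weight=2.0, simplicity_weight=0.1))
```

The hope expressed above is that a strong enough simplicity term makes it cheaper for the agent to reuse its question-answering features for play than to maintain two separate representations.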

Or we could be more direct. We could, for instance, have the neural net pass on instructions or advice to another entity that actually plays the game. The neural net sees the game state, but the other entity can only react in terms of the features we've laid down. So the neural net has to translate the game state into the features (this superficially looks like an autoencoder; those might be another way of achieving the aim).

Ideally, we may discover ways of forcing an agent to use a model without specifying the model ourselves; some approaches to transfer learning may work here, and it's possible that GPT-3 and other transformer-based architectures already generate something that could be called an "internal model".

Then, we go looking for that model within the agent. Here the idea is to use something like the OpenAI microscope. That approach allows people to visualise what each neuron in an image classifier is reacting to, and how the classifier is doing its job. Similarly, we'd want to identify where the model resides, how it's encoded and accessed, and similar questions. We can then modify the agent's architecture to test if these characteristics are general, or particular to the agent's specific design.

Research aims
  1. See how feasible it is to force a neural net based RL agent to construct mental models.
  2. See how easy it is to identify these mental models within the neural net, and what characteristics they have (are they spread out, are they tightly localised, how stable are they, do they get reused for other purposes?).
  3. See how the results of the first two aims might lead to more research, or might be applied to AI-human interactions directly.


[ACX Linkpost] Too Good to Check: A Play in Three Acts

October 5, 2021 - 08:04
Published on October 5, 2021 5:04 AM GMT

Y'all read this already, right? Read it.


Nuclear Espionage and AI Governance

October 5, 2021 - 05:06
Published on October 4, 2021 11:04 PM GMT


Using both primary and secondary sources, I discuss the role of espionage in early nuclear history. Nuclear weapons are analogous to AI in many ways, so this period may hold lessons for AI governance. Nuclear spies successfully transferred information about the plutonium implosion  bomb design and the enrichment of fissile material. Spies were mostly ideologically motivated. Counterintelligence was hampered by its fragmentation across multiple agencies and its inability to be choosy about talent used on the most important military research program in the largest war in human history. Furthermore, the Manhattan Project’s leadership prioritized avoiding domestic political oversight over preventing espionage. Nuclear espionage most likely sped up Soviet nuclear weapons development, but the Soviet Union would have been capable of developing nuclear weapons within a few years without spying. The slight gain in speed due to spying may nevertheless have been strategically significant.

Based on my study of nuclear espionage, I offer some tentative lessons for AI governance:

  • The importance of spying to transformative AI development is likely to be greater if the scaling hypothesis is false than if it is true. 
  • Regardless of the course that AI technology takes, spies may be able to convey information about engineering or tacit knowledge (although more creativity will be required to transfer tacit than explicit knowledge). 
  • Nationalism as well as ideas particularly prevalent among AI scientists (including belief in the open source ideal) may serve as motives for future AI spies. Spies might also be financially motivated, given that AI development mostly happens in the private sector (at least for now) where penalties for spying are lower and financial motivations are in general more important.
  • One model of technological races suggests that safety is best served by the leading project having a large lead, and therefore being secure enough in its position to expend resources on safety. Spies are likely, all else equal, to decrease the lead of the leader in a technological race. Spies are also likely to increase enmity between competitors, which seems to increase accident risk robustly to changes in circumstances and modeling assumptions. Therefore, it may make sense for those who are concerned about AI safety to take steps to oppose espionage, even if they have no preference for the labs being harmed by espionage over the labs benefiting from espionage.
  • On the other hand, secrecy (the most obvious way to prevent espionage) may increase risks posed by AI by making AI systems more opaque. And countermeasures to espionage that drive scientists out of conscientious projects may have perverse consequences. 

Acknowledgements: I am grateful to Matthew Gentzel for supervising this project and Michael Aird, Christina Barta, Daniel Filan, Aaron Gertler, Sidney Hough, Nat Kozak, Jeffery Ohl, and Waqar Zaidi for providing comments. This research was supported by a fellowship from the Stanford Existential Risks Initiative. 

This post is a short version of the report, x-posted from EA Forum. The full version, with additional sections, an appendix, and a bibliography, is available here.

1. Introduction

The early history of nuclear weapons is in many ways similar to hypothesized future strategic situations involving advanced artificial intelligence (Zaidi and Dafoe 2021, 4). And, in addition to the objective similarity of the situations, the situations may be made more similar by deliberate imitation of the Manhattan Project experience (see this report to the US House Armed Services Committee). So it is worth looking to the history of nuclear espionage for inductive evidence and conceptual problems relevant to AI development.

The Americans produced a detailed official history and explanation of the Manhattan Project, entitled the Smyth Report, and released it on August 11, 1945, five days after they dropped the first nuclear bomb on Japan (Wellerstein 2021, 126). For the Soviets, the Smyth Report “candidly revealed the scale of the effort and the sheer quantity of resources, and also hinted at some of the paths that might work and, by omission, some that probably would not” (Gordin 2009, 103). While it would not have allowed for copying the Manhattan Project in every detail, the Soviets were able to use the Smyth Report as “a general guide to the methods of isotope separation, as a checklist of problems that needed to be solved to make separation work, and as a primer in nuclear engineering for the thousands upon thousands of engineers and workers who were drafted into the project” (Gordin 2009, 104). 

There were several reasons that the Smyth Report was released. One was a belief that, in a democratic country, the public ought to know about such an important matter as nuclear weapons. Another reason was a feeling that the Soviets would likely be able to get most of the information in the Smyth Report fairly easily regardless of whether it was released. Finally, releasing a single report would clearly demarcate information that was disseminable from information that was controlled, thereby stemming the tide of disclosures coming from investigative journalists and the tens of thousands of former Manhattan Project employees (Wellerstein 2021, 124-125). Those leaks would not be subject to strategic omission, and might, according to General Leslie Groves (Director of the Manhattan Project) “start a scientific battle which would end up in congress” (Quoted in Wellerstein 2021, 125). The historian Michael Gordin summarized the general state of debate between proponents and opponents of nuclear secrecy in the U.S. federal government in the late 1940s as follows:

How was such disagreement possible? How could Groves, universally acknowledged as tremendously security-conscious, have let so much information, and such damaging information, go?... The difference lay in what Groves and his opponents considered to be useful for building an atomic bomb. Groves emphasized the most technical, most advanced secrets, while his opponents stressed the time-saving utility of knowing the general outlines of the American program (Gordin 2009, 93).

In Gordin's view,  "in the context of the late 1940s, his [Groves's] critics were more right than wrong" (Gordin 2009, 93), though it is important to note that the Smyth Report's usefulness was complemented by the extent of KGB spying of which neither Groves nor his critics were yet aware. Stalin decided to imitate the American path to the nuclear bomb as closely as possible because he believed that it would be both the “fastest” and the “most reliable” (Quoted in Gordin 2009, 152-153). The Smyth Report (and other publicly available materials on nuclear weapons) contained strategic omissions. The Soviets used copious information gathered by spies to fill in some of the gaps.

2. Types of information stolen

2.1 Highly abstract engineering: bomb designs

Bomb designs were one of the most important categories of information transferred by espionage. To illustrate why design transfer was so important, it is necessary to review some basic principles of nuclear weaponry (most of what follows on nuclear weapons design is adapted from a 2017 talk by Matt Bunn). 

Fission weapons work by concentrating a critical mass of fissile material. A critical mass is enough fissile material to start a nuclear chain reaction. A critical mass by itself, however, is not a viable nuclear weapon because it will heat up dramatically, turn into gas, expand in volume, and cease to constitute a critical mass, thereby stopping the chain reaction before it has had a chance to consume most of the fuel. The simplest possible nuclear bomb, a gun type design, works by launching a shell of highly enriched uranium-235 into another piece of highly enriched uranium-235. Neither piece of uranium-235 is critical by itself, but together they amount to a critical mass. The tamper prevents the critical mass from expanding out into a diffuse cloud of gas. A massive amount of heat is released, turning the fissile material to gas. The temperature rises to that of the core of the sun. In a gas, a rise in temperature causes a corresponding increase in pressure. This leads to a massive increase in pressure, and an extremely energetic explosion. The bomb dropped on Hiroshima, Little Boy, was a gun type bomb.

Gun type bomb design

The amount of fissile material required to achieve critical mass decreases with density squared. So compressing one’s fissile material means one gets more explosive power for the same amount of fuel. This is the key to the more advanced plutonium implosion bomb design, which was used for the Fat Man bomb dropped on Nagasaki. A plutonium implosion bomb has a core of fissionable plutonium surrounded by a tamper in the middle and, at the top layer, a chemical explosive. The explosive detonates, pushing the tamper in towards the core, which begins a nuclear chain reaction. This design uses plutonium-239, which is easier to obtain than the uranium-235 used in a gun type bomb. 

Plutonium implosion bomb design

The first Soviet nuclear test was not of the relatively simple gun type. Instead it was a far more complex plutonium implosion assembly. The Soviets received the American plutonium implosion design twice, from two spies, and copied it for their first nuclear bomb (Holloway 1994, 366; Haynes, Klehr, and Vassiliev 2009, 117, 119). 

Having two sources for the design gave the Soviets confidence that the design would work and was not FBI disinformation, no small thing given that the leaders of the Soviet nuclear weapons effort had reason to believe they would be executed if the first test failed (Gordin 2009, 171; Holloway 1994, 218). Furthermore, the Soviets were hard pressed to separate enough uranium-235 from the more plentiful uranium-238 to make a gun type uranium bomb work (gun type plutonium bombs are not viable). This was because the Western Allies had taken pains to corner the world supply of high quality uranium ore. The low quality ore that the Soviets had was adequate to the task of breeding plutonium, but it would have been more expensive and slower for the Soviets to separate enough uranium-235 to build a gun type bomb (Gordin 2009, 149-151). Often, controlling material and controlling information are thought of as different strategies for preventing nuclear proliferation. But in the first years after the creation of the atomic bomb, the West’s failure to control information about nuclear weapons design undermined its strategy of controlling fissile material to prevent nuclear proliferation.

2.2 Less abstract engineering

Most of the effort expended during the Manhattan Project went into the enrichment of fissile material. Ted Hall provided information about methods of uranium isotope (“25” in KGB code) separation, as reported in a decrypted cable sent from New York Station to Moscow Center on May 26, 1945:

KGB cable about Ted Hall

Fuchs similarly provided data about electromagnetic techniques of isotope separation for uranium-235 (“ENORMOZ” in KGB code ordinarily referred to the Manhattan Project as a whole, but in this case it meant uranium-235 specifically), which was reported in a decrypted cable from Moscow to New York of April 10, 1945.

KGB cable about Klaus Fuchs

In addition to technical reports on enriching fissile material from Fuchs and Hall, the Soviets had plant designs for the Oak Ridge facility from Russell McNutt, data on plutonium from an unidentified spy, and data on the nuclear reactor at the Chalk River facility in Canada from Alan Nunn May (see the appendix of the full report for a list of Manhattan Project spies). The Soviets were also occasionally able to acquire physical samples from spies. They received 162 micrograms of uranium-235 from Alan Nunn May, and David Greenglass “provided the Soviets with a physical sample of part of the triggering mechanism [of a plutonium bomb]” (Klehr and Haynes 2019, 12).

2.3 Types of information and the AI case

To the extent that the information that the most advanced AI projects have that their closest competitors lack is highly abstract and easy to convey, the potential significance of spying is very large. Simple, abstract ideas (analogous to basic principles of bomb design in the nuclear case) are the easiest to transfer. The question of how important theoretical breakthroughs will be to the future development of AI is closely related to the debate over the scaling hypothesis. The scaling hypothesis holds that current techniques are sufficient to eventually produce transformative artificial intelligence (TAI) if the neural networks are just made large enough (Branwen 2020; for an explanation of the idea of TAI see Karnofsky 2016). The reason that TAI does not yet exist, per the scaling hypothesis, is that the hardware and the will to invest in scaling do not yet exist (Branwen 2020). To the extent that this is true, it seems that stealing highly abstract ideas about AI algorithms is unlikely to make much of an impact, and that there is unlikely to be an algorithmic analog of the plutonium implosion bomb design. On the other hand, abstract ideas about data types, data processing, or assembling the requisite computing power might be transferred by spies to great effect.

Spies transferred about 10,000 pages of technical material on nuclear weapons from the Manhattan Project to the Soviet Union (Haynes, Klehr, and Vassiliev 2009, 60). At that level of volume, one can convey information about engineering that is concrete and detailed rather than abstract and simple, such as the design of machinery and facilities used for the separation of uranium isotopes. Even devoted proponents of the scaling hypothesis acknowledge that when replicating an effort based on scaling up existing techniques, one should “never underestimate the amount of tweaking and special sauce it takes” (Branwen 2020).

But just how significant is engineering knowledge of an intermediate level of abstraction likely to be as a bottleneck on AI capabilities? Unlike the Manhattan Project, advanced AI does not obviously require a massive industrial effort to purify rare materials. However, if significant AI research begins to be conducted by governments and international trade in computer chips becomes more restricted, the ability to solve engineering problems in the physical world might again come to differentiate the leading nation from its closest competitors. In such a regime, spying on the details of electrical engineering, materials science, and industrial processes might prove important (see Khan and Mann 2020).

The Anglo-American effort to prevent nuclear proliferation by cornering the world supply of uranium (discussed in section 2.1 above) might have been far more effective but for the Soviets’ use of espionage data on the plutonium route to the bomb. Similarly, strategies to restrict AI proliferation that rely on restricting information, and strategies that rely on restricting access to physical materials (in the AI case rare earth metals, chips, and semiconductor manufacturing equipment rather than high quality uranium ore) might be mutually reinforcing.

Tacit knowledge seems to play an important role in AI research. Knowing what sorts of training data to select for a model might involve tacit knowledge. More significantly, knowing which research directions are likely to be promising is a key element of AI research (or any other kind of research), and such knowledge includes an important tacit dimension. In a discussion of what one learns in a computer science PhD program, Andrej Karpathy explained the importance of the tacit knowledge embedded in “taste” to computer science research:

When it comes to choosing problems you’ll hear academics talk about a mystical sense of “taste”. It’s a real thing. When you pitch a potential problem to your adviser you’ll either see their face contort, their eyes rolling, and their attention drift, or you’ll sense the excitement in their eyes as they contemplate the uncharted territory ripe for exploration. In that split second a lot happens: an evaluation of the problem’s importance, difficulty, its sexiness, its historical context (and possibly also its fit to their active grants). In other words, your adviser is likely to be a master of the outer loop and will have a highly developed sense of taste for problems. During your PhD you’ll get to acquire this sense yourself (Karpathy 2016).

Research taste cannot easily be transferred by espionage. It might be possible to formalize certain aspects of research taste, or to accelerate the process of learning about it implicitly by mimicking the experience of training under a senior researcher. How much better is the taste of the best researchers on the most advanced AI project likely to be than the taste of the second-best researchers on the second-best AI project? Rohin Shah reports that advanced computer science PhD students at UC Berkeley have much better research taste than beginning PhD students, and that professors have better taste than advanced PhD students (Shah 2020). Is there a similar asymmetry in taste between the very best researchers in the world and their close competitors? This seems like a promising question for further study but, provisionally: Michael Polanyi, the philosopher whose work brought about the current focus on tacit knowledge in the history of science and technology, believed that the greatness of a discovery was proportional to the amount of tacit knowledge required to select the problem that led to it (Polanyi [1966] 2009, 23). The more that taste and other forms of tacit knowledge distinguish the leading AI projects from less advanced ones, the more difficult it will be for spies to significantly help the laggards catch up. Spies could work to transfer personnel from the leader to the laggards as a way of transferring tacit knowledge. But this would duplicate the issues with trust that limited the usefulness of Soviet spies who were exfiltrated such as George Koval and Oscar Seborer. Alternatively, spies might try some scheme of rendering tacit knowledge explicit.

3. Motivations for espionage

3.1 Klaus Fuchs: ideology and conscience

Klaus Fuchs was (along with Ted Hall) one of the two most important spies in the Manhattan Project. He was a theoretical physicist. Fuchs took refuge in England after the Nazis came to power in Germany because his history as a Communist Party activist made him a target of the Gestapo. While in England, Fuchs began to work on nuclear weapons research and informed a German Communist Party leader that he had information that might be of interest to Soviet intelligence. Fuchs was sent to America to work as a nuclear physicist on the Manhattan Project, and continued to spy for the U.S.S.R. (Haynes, Klehr, and Vassiliev 2009, 92-93). 

Fuchs’s sister Kristel Heineman helped him on several occasions to make contact with his KGB courier in America, Harry Gold (Haynes, Klehr, and Vassiliev 2009, 95). Fuchs’s initial involvement in spying was clearly ideologically motivated. He later accepted money from the KGB. Fuchs claimed to his KGB courier that he did so to prove his loyalty to the Soviet Union, because he had been told that offering payment was a KGB strategy used to “morally bind” other spies to keep helping the KGB (Haynes, Klehr, and Vassiliev 2009, 128).

Klaus Fuchs

In 1949, British and American intelligence discovered Fuchs by decrypting KGB cables as part of the Venona counterintelligence project and correlating the covernames "Charles" and "Rest" with known facts about Fuchs’s background and whereabouts (Greenspan 2020, 193-228). By that time, Fuchs was back in England and working for the British nuclear weapons lab at Harwell. MI5 investigator James Skardon approached Fuchs and said that MI5 was certain Fuchs had been spying, but did not disclose how it knew: “Skardon… suggested that FUCHS had been passing information to the Russians.... Skardon then took him very carefully over the ground during the period when he [Fuchs] was in America... and said that if it was not FUCHS it ‘could only be his twin brother’” (Greenspan 2020, 239-240). Skardon repeatedly led Fuchs to believe he could keep his job at Harwell if he confessed (Greenspan 2020, 239, 259-260). At first Fuchs denied it, but after several interviews, he confessed to spying (Greenspan 2020, 257-258).

Later, Fuchs gave a written confession. The ideological motivations given in that confession were as follows: Fuchs’s father always emphasized to him the importance of following his conscience. In university, Fuchs started out as a social democrat, but switched to the Communist Party after what he saw as the social democrats’ failure to effectively oppose the rise of Hitler (Fuchs [1950] 1989, 182-183). While working as a Communist Party activist, he began to feel that he should subordinate his personal conscience and ideas about decency to party discipline (Fuchs [1950] 1989, 183). In his confession, he reported a kind of inward compartmentalization, allowing one part of himself to be at ease with his fellow scientists and another part to spy on them.

In Fuchs’s confession, he claimed to have come to reject his former beliefs that (1) standards of personal decency had to be suspended for political reasons, (2) one should subordinate one's thoughts to the Party, and (3) the Marxist theory of freedom through the mastery of the blind forces that control society could be put into practice in an individual's life by skillful manipulation of his own environment, including that part of his environment composed of the people around him (Fuchs [1950] 1989, 185-186). Fuchs claimed his newly re-awakened conscience required him to stop working with the KGB early in 1949 and to turn himself in in 1950 in order to spare his friends at Harwell from the suspicion that would be cast on them by ambiguity about who the spy in the British nuclear weapons program was (Fuchs [1950] 1989, 185-186). His confession shows that he continued to believe he would be allowed to remain at Harwell (Fuchs [1950] 1989, 185).

The primary source evidence is potentially consistent with ideological disillusionment serving as one factor motivating Fuchs’s decision to stop meeting with his KGB courier in early 1949 (although this might also have been due to Fuchs somehow discovering that he was being investigated, see Greenspan 2020, 271-284). Remarkably, Fuchs told a similar story of ideological development (but with a different valence) when he met with KGB agents in a Moscow restaurant after his release from British prison and relocation to East Germany. Fuchs told the agents that he had been unduly influenced by bourgeois ideology, but that he had since corrected himself (Haynes, Klehr, and Vassiliev 2009, 134-135).

3.2 Ted Hall: ideology and great power balancing

Ted Hall was the youngest physicist working on the Manhattan Project. He graduated from Harvard at 18. Hall was a communist, and had been active as a labor organizer while in college (Haynes, Klehr, and Vassiliev 2009, 110-112). In 1944, at age 19, he approached a representative of the Soviet Union in New York and offered to serve as a spy. His explanation of his motivations for giving the U.S.S.R. information about American nuclear weapons research is recorded in former KGB agent Alexander Vassiliev’s notes on the KGB’s archives, which have been translated into English and are hosted on the Wilson Center’s website:

The S.U. [Soviet Union] is the only country that could be trusted with such a terrible thing. But since we cannot take it away from other countries—the U.S.S.R. ought to be aware of its existence and stay abreast of the progress of experiments and construction. This way, at a peace conference, the USSR—on which the fate of my generation depends—will not find itself in the position of a power subjected to blackmail (Vassiliev, Yellow Notebook #1, 21).

Although Hall would later claim that he had originally set out only to inform the Soviet Union of the fact that the United States was developing nuclear weapons (Hall [1995] 1997, 288), that claim would seem to be belied by his statement that the "U.S.S.R. ought to... stay abreast of the progress of experiments and construction." Decrypted Venona cables revealed Hall’s status as a Soviet spy to American intelligence services after the war. However, Hall, unlike Fuchs, did not confess when questioned. Unwilling to reveal its access to secret Soviet communications, and unable to admit secret evidence in court, the U.S. government let Hall go (Haynes, Klehr, and Vassiliev 2009, 123-124). After his spying was revealed by the declassification of the Venona cables in 1995, Hall admitted to having been a Soviet spy:

It has even been alleged that I “changed the course of history.” Maybe the “course of history,” if unchanged, would have led to atomic war in the past fifty years—for example the bomb might have been dropped on China in 1949 or the early fifties. Well, if I helped to prevent that, I accept the charge. But such talk is purely hypothetical. Looking at the real world we see that it passed through a very perilous period of imbalance, to reach the existing slightly less perilous phase of “MAD” (mutually assured destruction) (Hall [1995] 1997, 288).

Hall’s two justifications, more than fifty years apart, both focused on the international balance of power.

3.3 Reflections on nuclear spy motivations

Ideology was by far the biggest motivation for Manhattan Project spies. Financial motivations were less important than ideological motivations, probably because penalties for spying could include decades in prison or death. When the stakes are very high, spying requires a certain kind of altruism, as narrowly self-interested motivations are unlikely to be able to overcome fear of the penalties if one is caught. It is also striking how many spies (Klaus Fuchs, David Greenglass, Oscar Seborer) were helped by members of their families in their espionage. Family loyalties might have served to prevent spies from desisting from spying (although Greenglass overcame this obstacle when he testified against his sister and brother-in-law, sending them to the electric chair). Another factor, in addition to family loyalties, that served to make it easier to start spying for the Soviet Union than to stop was the KGB practice of paying spies even if they were originally ideologically motivated. Receiving payment from the KGB removed any possible ambiguity about what the spies were doing and increased expected penalties, reducing the odds that spies would confess.

3.4 Possible AI spy motivations 

The Soviet Union was in an unusual position in the 1930s and 1940s. Its governing ideology commanded a significant following among educated people all over the world. This made it much easier to recruit spies. Unlike socialist internationalist loyalty to the Soviet Union, nationalism continues to be widespread and might motivate AI spying. This is true even of spying in the private sector, as spies might believe that by helping firms based in their homelands they are doing their patriotic duty. The most significant nuclear spy outside of the Manhattan Project, A. Q. Khan, was motivated by Pakistani nationalism. While security clearance investigations try to detect foreign loyalties, nothing like the security clearance system exists in the private sector. Furthermore, nation-states might force their otherwise unwilling nationals or firms to help with AI espionage. However, this issue must be treated with extreme care. There is an obvious risk of xenophobic or racist bias. Furthermore, there is a risk that attempting to prevent espionage by restricting the access to sensitive information of those with potential conflicts of national loyalties will backfire, pragmatically as well as morally. During the Cold War, the United States deported a Chinese-born aerospace engineer, Qian Xuesen, based on unproven allegations that he was a spy. Qian went on to build missile systems for the People’s Republic of China. In addition to ideas that are widely popular (such as nationalism), ideas that are common among software engineers and computer scientists but rarer in the general population might prove significant as motivations for AI espionage. Belief in the open source or free software ideal, which opposes secrecy in software development, is one obvious example.

Despite the potential motivating force of American nationalism as an ideology for spies, it seems doubtful that the U.S. government or U.S. firms will be net beneficiaries of AI espionage if competition is most intense between countries (if an AI arms race is undertaken largely between U.S. firms, then some U.S. firms may well be net beneficiaries). Spying can help lagging participants in a race to develop new technologies catch up, but it is hard to see how it can help the leader improve its lead (unless the overall leader is behind in certain specific areas). The United States appears to be ahead of the rest of the world in AI, with China being its only plausible close competitor. One recent analysis broke down AI capabilities into four drivers: hardware; research and algorithms; data; and size of commercial AI sector. The United States led China by a wide margin in every category except for data (Ding 2018, 29).

The most important AI research today is conducted in the private sector. Unless that changes, the most important spying will have to be done on private firms. This changes the balance of motivations that might prove significant. Most obviously, given that most people approach their work with the goal of making money, it suggests that financial gain might be more significant as a motive for AI espionage than it was as a motive for nuclear espionage. Financially motivated public sector spies tend to be of lower quality than ideological spies because, given the legal penalties for spying, only irrational people or people in truly desperate need of money would agree to take on the requisite level of risk. But in the private sector, expected penalties are lower. 

4. Manhattan Project counterintelligence

The historian Alex Wellerstein argues that counterintelligence efforts at the Manhattan Project had three main goals: 1. preventing Axis powers from spying 2. preventing wartime allies (such as the Soviet Union) from spying and 3. preventing scientists from getting a holistic understanding of the Manhattan Project, and (more importantly) preventing politicians and the broader American public from discovering the Manhattan Project's existence. Broadly, 1 and 3 were successful but 2 was not (Wellerstein 2021, 91-92). It may be that bureaucratic incentives to focus on secrecy from domestic political actors drew energy away from preventing Soviet espionage. General Leslie Groves was particularly concerned about Congress getting wind of the massive budget of the Manhattan Project and cutting it off, or subjecting Manhattan Project leaders to onerous postwar investigations (Wellerstein 2021, 81). During congressional hearings on atomic spying after the war, Groves “argued… that the Manhattan Project security apparatus had been primarily focused on preventing leaks and indiscretions, not rooting out disloyalty” (Wellerstein 2021, 224-225). 

General Leslie Groves

There were other reasons, besides Groves’s relative lack of interest in preventing Soviet spying, for the success of the Manhattan Project spies. Responsibility for detecting espionage was divided between two mutually hostile agencies, the FBI and army intelligence. And, most fundamentally, a significant portion of the world’s top scientific talent was sympathetic to the Soviet Union, which introduced a capability-alignment tradeoff (Walsh 2009).

5. The significance of nuclear espionage

The Soviet Union detonated its first nuclear bomb on August 29, 1949, four years after the first successful American nuclear test. In Stalin and the Bomb, David Holloway evaluated the impact of nuclear espionage on Soviet nuclear weapons development as follows: 

The first Soviet atomic bomb was a copy of the American plutonium bomb tested at Alamogordo in July 1945. Espionage played a key role in the atomic Soviet project, [sic] and its role would have been even greater if the Soviet leaders had paid more heed to the intelligence they received during the war. The best estimates suggest, however, that the Soviet Union could have built a bomb by 1951 or 1952 even without intelligence about the American bomb. There already existed in the Soviet Union strong schools of physics and radiochemistry, as well as competent engineers. Soviet nuclear research in 1939-41 had gone a long way toward establishing the conditions for an explosive chain reaction. It was because Soviet nuclear scientists were so advanced that they were able to make good use of the information they received from Britain and the United States about the atomic bomb.... The nuclear project was a considerable achievement for Soviet science and engineering (Holloway 1994, 366, emphasis added).

The empirical outline of Holloway’s account does not appear to be open to serious doubt. The Soviets made significant use of espionage data and, on the other hand, Soviet scientists were world-class and could have developed the bomb within a few years of 1949 without espionage.

Michael Gordin makes an interesting argument in Red Cloud at Dawn. The Soviets laboriously checked, re-checked, and adapted spy data. Given the effort that the Soviets had to go through to assure themselves of the veracity of the information that they got from spies, Gordin suggests that it is an open question whether the Soviets really saved any time by using spy data (Gordin 2009, 153-154). Gordin concedes however that, even if the Soviets saved no time, they “surely saved much uncertainty” (Gordin 2009, 153).

Reducing uncertainty can change one’s strategy. If a country increases its confidence that it will soon have a powerful weapon hitherto monopolized by an enemy, it may become rational to behave more aggressively towards that enemy. 

Ignoring the prospective effects of knowing (rather than merely guessing) that one will soon have a powerful weapon, saving uncertainty meant removing the chance that the Soviets were unlucky and would have had to wait longer to get nuclear weapons. Stalin himself did not believe that nuclear weapons were very strategically significant in and of themselves (Gordin 2009, 62). He did, however, understand the enormous importance that the Americans assigned to nuclear weapons. Thus, he refused Kim Il Sung’s request to support a North Korean invasion of South Korea in 1948 because he feared an American intervention on the South Korean side. In 1950, however, Stalin was willing to support Kim’s invasion, in part because he believed that the Soviet Union’s nuclear weapons would deter American intervention (Haynes, Klehr, and Vassiliev 2009, 62). Therefore, it seems that even if one takes maximally unfavorable assumptions and assumes that espionage saved the Soviet Union no time and only uncertainty, without espionage there would have been a substantially greater chance that the Korean War would have been delayed or, because of the other changes made possible by delay, avoided. 

Furthermore, I do not think maximally unfavorable assumptions about the efficacy of nuclear espionage are justified. Absent further argument, it seems to me that we should default to the view that it is easier to check data and designs that one has in hand than it is to derive entirely new data and designs. Holloway’s estimate that intelligence saved the Soviets two to three years seems to be a subjective guess rather than the output of a quantitative model of bomb timelines. However, given that Holloway undertook the most thorough study of the Soviet nuclear weapons program (at least in English), he should be afforded some (small) amount of epistemic deference. Given the basic facts of the case, the Soviets saving something in the neighborhood of two to three years is not hard to believe. Because of the importance of the Korean War, that ought to qualify as a significant impact on world history. 

In addition to the impact of espionage on the development of nuclear weapons, nuclear espionage may also have raised the temperature of the Cold War. Even if we grant, as we should, that the Cold War would have occurred anyway, the discovery of Alan Nunn May’s nuclear spying in 1946 may have reduced the odds that control of nuclear weapons would be ceded to multilateral international institutions (Zaidi and Dafoe 2021, 23, 42, 42n179). The distrust engendered by nuclear espionage highlights the potential of spying to increase enmity between the leader and the laggards in a technological race, and to reduce the odds of cooperation aimed at mitigating the risks of such a race. This effect emerges from the inherent dynamics of espionage and is likely to apply to AI races as well as nuclear races.

6. Secrecy

Among people concerned about existential risk, there sometimes seems to be a presumption in favor of secrecy. One plausible origin for this presumption is the 2016 article “The Unilateralist’s Curse and the Case for a Principle of Conformity” by Nick Bostrom, Thomas Douglas, and Anders Sandberg. Bostrom et al. argue that even a well-intentioned group of independent actors is likely to err in the direction of taking a risky action, because if one can act unilaterally the probability of action will be proportional not to the average of the group but to the probability that the most optimistic actor will act. Bostrom et al.’s proposed solution to the unilateralist's curse is a principle of conformity in situations where unilateralism is possible. When the action in question is publishing or not publishing some information, the principle of conformity is equivalent to a presumption in favor of secrecy.
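The arithmetic behind this argument can be illustrated with a deliberately simplified sketch. It is not Bostrom et al.'s own model (theirs works with noisy estimates of an action's value); it assumes instead that each of n actors independently decides to take the action with the same probability p, so the chance that at least one acts is 1 - (1 - p)^n. The function name and the value p = 0.1 are assumptions chosen only for illustration.

```python
def prob_someone_acts(p: float, n: int) -> float:
    """Chance that at least one of n independent actors takes the action.

    Assumption (not from the paper): every actor acts independently
    with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement rule: nobody acts with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical per-actor probability of acting unilaterally.
    p = 0.1
    for n in (1, 5, 10, 20):
        print(f"{n:>2} actors: P(at least one acts) = {prob_someone_acts(p, n):.3f}")
```

Under these assumptions, the group's chance of acting rises quickly with the number of actors even though no individual has become more willing to act, which is the intuition behind treating unilateral publication with caution.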

Note, though, that in “The Unilateralist’s Curse” Bostrom et al. do not argue for conformity all things considered. Rather, they argue that the unilateralist’s curse provides a defeasible reason for conformity. Their paper does not attempt to establish whether, in any given situation, our prior inclinations to conform or not to conform are correct. If one is concerned about the dissemination of information hazards, one should bear in mind that omissions might reveal as much as commissions in certain circumstances, and weigh carefully what strategy of releasing or withholding information is least hazardous (Bostrom 2019; Bostrom 2011).

One should also be concerned by the tendency of secrecy regimes to perpetuate themselves. Alex Wellerstein explains this point of view:

This is, perhaps, the real application for the history of nuclear secrecy to these fields: once the controls come in, they don’t go away fast, and they may not even work well to prevent the proliferation of technology. But they will do other kinds of work in their effort to partition the world into multiple parts: creating in-communities and out-communities, drawing scrutiny to those who practice in these arts, and monopolizing patrons. There may be good reasons for other scientific communities to embrace secrecy—if the information in question truly was unlikely to be independently discoverable, had potentially large negative applications relative to the possible positive applications, and could be effectively controlled, then it might be a candidate—but if they took my advice, they would think long and hard about what types of secrecy activities they wanted to adopt and how to make sure that their attempts at secrecy did not outstrip their other values (Wellerstein 2021, 410, emphasis added).

Many of the concerns Wellerstein raises seem rather remote from existential risk. This might lead researchers concerned with existential risk to assume that they have nothing to learn from the anti-secrecy perspective. I think that would be a mistake, because Wellerstein’s observation that regimes of secrecy tend to be self-perpetuating is highly relevant to existential risk. Secrecy serves to worsen our understanding of (and, therefore, our ability to control) emerging technologies. Secrecy may have had this effect in the early Cold War United States, where a large thermonuclear arsenal was accumulated alongside a failure to seriously study the catastrophic risks that thermonuclear war posed (Gentzel 2018). If secrecy is hard to uproot, it might further raise existential risk by preventing concerns about safety from spreading to all relevant actors.

In “What Failure Looks Like,” the AI researcher Paul Christiano explains some reasons why AI may pose an existential risk. Those reasons all involve imperfectly understood AI systems whose goals diverge from those of human beings and which are able to gain power and influence in part because of their creators' imperfect understanding of the systems' true goals. Christiano anticipates that this problem will arise due to competitive incentives to deploy powerful AI systems as soon as possible combined with the inherent opacity of contemporary machine learning techniques (Christiano 2019). But secrecy about advanced AI might compound the problem of recognizing misaligned AI systems. And if approaches to AI safety that rely on rendering AI systems interpretable prove essential to preventing misalignment, secrecy is likely to be a major barrier. Whether such considerations are important enough to establish a presumption against secrecy is beyond the scope of this post. But the empirical tendency of secrecy regimes to expand their remit and endure indefinitely should be taken seriously.

7. Conclusion: espionage and existential risk

Espionage is most likely to be significant if discontinuous progress in AI can be achieved on the basis of key abstract insights. To the extent that the scaling hypothesis is true, espionage is likely to be less important. But even if the scaling hypothesis is true, espionage may be significant if it transfers engineering knowledge or tacit knowledge (which can be transferred either by exfiltrating agents or rendering what was tacit explicit). Espionage during the Manhattan Project may have accelerated Soviet nuclear weapons development by two to three years, which does not sound like much, but may have altered the course of the early Cold War. This was achieved by the less than 0.1% of Manhattan Project employees who were Soviet spies (part of the effectiveness of this small group may have been due to the disproportionate representation of high-ranking employees among spies). If a technology is truly transformative, even a small gain in speed is strategically significant.

On balance, AI espionage is likely to increase existential risk. In “Racing to the Precipice” Stuart Armstrong, Nick Bostrom, and Carl Shulman create a game theoretic model of AI arms races’ effects on safety. Armstrong et al. find that risks are greatest when enmity between competitors is high, knowledge of other projects is available, and (conditional on knowledge of other projects being available) the leader has only a small lead. One should expect espionage to increase enmity between competitors, increase knowledge of competitors’ projects, and reduce the distance between the leader and the laggards. Thus, to the extent that Armstrong et al.’s model reflects the real strategic situation, the expected impact of espionage is to increase existential risk. Eoghan Stafford, Robert Trager, and Allan Dafoe’s forthcoming “International Strategic Dynamics of Risky Technology Races” builds a more complex model. Like Armstrong et al., Stafford et al. find that enmity increases risk in all situations. However, whereas Armstrong et al. find that a close race is more dangerous, Stafford et al. find that under certain circumstances, close races are less dangerous than very uneven races. If, in Stafford et al.’s model, enmity between leader and laggard is high and the laggard is far behind, compromising on safety might seem to be the only way that the laggard can have a chance of winning. But in a more even race, the laggard might be less willing to compromise on safety because they would have a chance of winning without taking extreme risks. Thus, granting for the sake of the argument that the assumptions of Stafford et al.’s model hold, espionage’s tendency to narrow gaps might, under some circumstances, reduce existential risk. However, this consideration would seem to me to be outweighed by espionage’s tendency to increase enmity. 

It therefore may be valuable for people concerned about existential risk to contribute to preventing AI espionage even if they have no preference between the project being spied on and the project doing the spying. On the other hand, secrecy (the most obvious countermeasure to espionage) may increase existential risk by worsening issues with interpretability. And subjecting AI researchers to background checks may asymmetrically weaken conscientious projects as their competitors, not worried about existential risk or espionage, will gain from the talent that they reject. All of these considerations should be carefully weighed by AI policy practitioners before deciding to prioritize or deprioritize preventing espionage. 


What can we learn from traditional societies?

October 5, 2021 - 03:24
Published on October 5, 2021 12:24 AM GMT

I have been reading books on anthropology for a few years now, and the one that ignited my passion for the topic was Jared Diamond's very popular Guns, Germs and Steel. Jared Diamond is a polymath, historian, anthropologist and physiologist, fluent in many languages, a Pulitzer Prize winner and a very cool guy. He has followed a very heterodox academic pathway that inspired me to try weird things. His book "The World Until Yesterday" is a gem that has not received all the attention it deserves, so I wanted to do it a bit of justice here and share some of its most interesting ideas and some of my thoughts after reading it.

State societies vs traditional societies

5000 years ago, the vast majority of people lived in small groups composed of 20-100 individuals. But at several points in human history, as a result of the domestication of crops (which fueled explosive population growth), some of these groups started coming together and created the first cities. It is possible to sit in a big circle and discuss the best ways to organize a society when your group has 40 people. If your group is composed of several thousand individuals, it is not: this gave rise to the centralization of power and the birth of the first states.

Today, the situation is totally different: the vast majority of people live under the rule of a state and only a few have preserved the ancient way of life. These traditional societies retain behaviours, moral codes and societal structures that have not been modified for thousands of years, and more closely resemble what we could call a "natural state" of humans. These people face challenges more similar to the ones we had to face in the ancestral environment where we evolved. They can be found in very diverse places such as the Amazon forest, New Guinea, Australia, Alaska or sub-Saharan Africa. So what are some of the most striking aspects of their ways of living? Can we learn valuable things from them?


In the Western world, most births nowadays occur in hospitals, with professional assistance from doctors and nurses. The attitude towards childbirth varies a lot from culture to culture. Often, childbirth takes place with the assistance of other women. On occasions, childbirth is a public event; for instance, among the Agta people in the Philippines, a woman gives birth in a public house where everyone can visit and shout out instructions to the delivering woman (push! breathe! etc.). At the other extreme, Piraha Indian women give birth by themselves, and even if they are having problems, other people are not allowed to intervene (even if the problems mean the death of both the mother and the kid).


A common (but not universal) practice in traditional societies is that when twins are born, one of them is killed (sometimes by being buried alive), because it is not possible for the mother to rear both children. Most traditional societies can be classified as hunter-gatherers, and food is a scarce resource for them.

Parent responsibility

In state societies, children’s rearing depends (most of the time) exclusively on the parents, which normally makes life way harder in many different ways (I am not a parent myself, but I have seen way too often how people with kids go to great lengths to rationalize how great parenting is, even when it is not).

In modern Western society, a child’s parents are normally responsible for most of the care. Allo-parents (individuals who are not the biological parents of the kids but take care of them) tend to have a much larger role in many traditional societies than in the Western world. In various traditional societies, children are free to walk wherever they want in the village and are considered to be the responsibility of everyone.

Regarding the involvement of fathers, there is a lot of variation. For instance, among the Aka Pygmies, fathers play a role in parenting almost as important as the mother's. In societies such as the African Bantu groups or New Guinea Highlanders, men spend a great deal of time warring against other men and child-rearing is an occupation exclusively for women.

Child autonomy

Something that distinguishes hunter-gatherer societies is that they are very egalitarian. In Western society, we normally assume that the responsibility for a kid's development lies with the parents and that they can control how the child turns out. In many traditional societies, children are autonomous individuals and their will and desires are considered at the same level as those of adults.

New Guinean Highlanders usually have burn scars as a result of playing with fire when they were infants (without adults telling them not to because they think that these experiences are part of learning). The children of the Hadza and the Piraha are allowed to play with huge and sharp knives, even if this means sometimes ending up with severe scars that they will carry for life.

Depending on the dangers of the environment, young kids are allowed to walk away from the other members of the tribe by themselves or in the company of other children. For instance, in the Amazon rainforest, where there are many poisonous animals such as snakes, bees or spiders and dangerous animals (jaguars, peccaries), kids of the Ache tribe are not allowed to go far by themselves without adult supervision. The situation is very different in the forests of Madagascar, where there are no dangerous mammals and children are allowed to go long distances by themselves.

Physical punishment

There is no consensus, either in state societies or in traditional societies, about what level of physical punishment is required to correctly educate children. However, Diamond claims that there is a general trend: hunter-gatherers inflict very little or absolutely no physical harm on kids. For Aka Pygmies, if one parent hits the infant, the other parent can consider that a reason for divorce. !Kung children are permitted to slap and insult their parents and they won't be punished for that: for the !Kung, kids are simply not responsible for their actions. Farming societies tend to be stricter and herders are usually the strictest, inflicting severe punishment on kids (probably because misbehaviour can imply the loss of valuable livestock and entail serious consequences for the whole family).


In the Western world (most) kids are forced to go to special centres where they are educated by professionals, controlled by the bureaucrats of the state. For many years, kids need to spend a large fraction of their time adhering to strict rules of sitting for many hours while listening to an adult talk about subjects that are very often totally disconnected from the problems they will have to face later in life.

Kids in traditional societies like to play a lot and the games they play are directly related to the activities they will have to perform once they grow up. They start playing with toy bows and arrows very young and by the time they are teenagers, they are fully trained to shoot animals for food (or enemies in a war). They spend time doing acrobatics or climbing trees, skills that will be essential for their survival when looking for food. In general, there is a seamless transition between the games that are played during childhood and the tasks that have to be performed in adult life. I really wonder, couldn’t we somehow copy this from traditional societies?

Attitude towards danger

In the West, a common attitude towards danger is acting macho. It seems that in traditional societies this attitude would be considered strange (though there is only limited anecdotal evidence for this). American anthropologist Marjorie Shostak recalls how the !Kung are very aware that hunts are dangerous and people can be killed by large mammals, so they actively avoid dangerous situations and are by no means embarrassed to show signs of what could be considered cowardice.


Languages are way more evolvable than we used to think. In the absence of invasion and imposition, the normal thing is that different groups of people speaking a common language end up speaking slightly different dialects after a few generations, and these dialects can continue to evolve into mutually unintelligible languages given enough time and distance. An example that comes to mind (this is from Chomsky, not Diamond) is that in the past, if you went from France to Italy, from village to village, you wouldn’t see a sharp transition after you crossed the border (as you mostly see today). Instead, you would find a gradient of languages, with language similarity decreasing as geographical distance increases. The reason why we see French or Italian as a unified language spoken more or less equally over large geographical areas is the influence of the states.

Most native English speakers speak a single language, in spite of many years of exposure to other languages at school. In traditional societies, it is very often the case that people speak 4-5 or more languages naturally. Languages are learnt by interaction with native speakers, not by taking complicated grammar courses. Languages are also useful. It is very often the case that adjacent groups of people speak different languages, and the interaction with your neighbours can sometimes be a matter of life and death.

Diamond discusses the benefits of preserving different languages (most are quickly becoming extinct), such as some evidence that they can protect against Alzheimer's, that they are cultural treasures, etc., but I remained unconvinced by these arguments. In fact, I absolutely love that there are many languages in the world and I hope that this continues to be the case, but I cannot find a better argument for it than a simple “I like languages” (which I think is also Diamond’s real reason and the rest is rationalization).


Gossip is universal, a behaviour observed in all societies known to date. Diamond recalls being impressed by how much time New Guineans spend talking to each other compared to Americans and Europeans. He mentions that on occasions they would even wake up in the middle of the night and continue talking, even about the most apparently banal topics such as how many times one has pissed during the day. There are similar reports of the talkativeness of other peoples such as the !Kung or the African Pygmies. The world of many traditional societies is dangerous, and acquiring as much information as possible is a matter of life or death, even when that information might seem irrelevant.


"All human societies practise both violence and cooperation; which traits appear to predominate depends on the circumstances"

Violence is a common phenomenon in many traditional societies and there seems to be a general trend: the more people per unit of food available in an area, the more violent the society is. Many societies (but not all) participate in something called traditional warfare, involving many different groups that create alliances and fight against each other, sometimes continuously for decades. Measured per capita, the number of deaths caused by traditional warfare is higher than in even the bloodiest wars between states.

Something that I found interesting is that even though these people live in permanent stress and a state of alert due to the continuous danger, once the battles start, they are extremely inefficient at organizing themselves as armies and killing their enemies. For instance, when shooting arrows, they do it one by one, which gives the enemy a chance to dodge the arrows. Implementing basic strategies of war (such as shooting arrows simultaneously) would have a huge impact on their success against their enemies, which makes me think that it is not so easy to come up with these strategies in the first place.

A key difference between traditional warfare and state-warfare is that traditional societies do not take prisoners; they just kill the captured people. Taking prisoners means more mouths to feed, and food is a precious and scarce resource for traditional societies. When it comes to killing, warriors will show no mercy to the enemy and children and people of any gender are killed without remorse.

Traditional warfare was largely abolished after the Europeans started colonizing the world in 1492 and only survived in very isolated places of the world such as New Guinea and the Amazon rainforest.


In state-societies, we are conditioned from birth to abhor crimes such as murder. However, in special circumstances, e.g. when people join the military, they need to be reconditioned to abandon those principles temporarily. Very often these people end up with huge traumas for the rest of their lives.

In traditional societies, such as those of New Guinea, people have known since childhood about the great warriors of their tribes and how they are praised for their killings. "Of course New Guineans end up feeling unconflicted about killing the enemy: they have had no contrary message to unlearn".

“Traditional human societies [] outside the control of state government have shown that war, murder, and demonization of neighbours have been the norm, not the exception and that members of those societies espousing those norms are often normal, happy, well-adjusted people, not ogres”

Old People

In traditional societies, old people are sometimes seen as a source of wisdom and knowledge. Among rural Fijians, for instance, old people usually stay in the same house that they have inhabited all their lives and they are taken care of by their children and grandchildren, up to the point that sometimes the food is chewed for them. However, this is not the norm. For many hunter-gatherers, old people mean more mouths to feed and more work to do, which also means that when a person is no longer autonomous, sometimes they need to be "disposed of". This practice is called senilicide. There are, by and large, five ways that traditional people get rid of their old people:

  1. Neglect: They simply ignore them until they die.
  2. Intentionally abandoning the old person when the rest of the group shifts camp
  3. They encourage the old person to commit suicide (by jumping off a cliff, going out to sea, etc.)
  4. Assisted suicide by strangling, stabbing or burying alive (with the consent of the elder)
  5. Killing the victim without the victim's cooperation or consent.

Contact with the Western World

There is obviously a lot of variation in this regard: for instance, the Andaman islanders are fierce warriors who attack any intruder that comes close to their shores, which means that they have successfully remained isolated so far. The contact of traditional societies with Westerners has meant total destruction for many of them, but we shouldn’t oversimplify the situation. In many cases, traditional societies are very happy about establishing contact with the western world, because it brings good stuff and they are very happy to profit from it. More specifically, the western occupation of lands such as New Guinea has meant the end of traditional warfare (which is a cause of literal nightmares for New Guineans) and easy access to food and medicine. By and large, most people would rather go from a traditional society to a westernized lifestyle than the other way around.

“An American friend of mine travelled halfway around the world to meet a recently discovered band of New Guinea forest hunter-gatherers, only to discover that half of them had already chosen to move to an Indonesian village and put on T-shirts because life there was safer and more comfortable. -Rice to eat, and no more mosquitoes!- Was their short explanation”

My personal opinion

The ugly

Diamond’s book draws heavily on anecdotal experience. I personally don’t mind this and I see the benefits of doing that, but in some cases you are left wondering how generalizable his examples really are. I also thought that the first chapters were a bit too slow (I almost abandoned the book at some point), which is a pattern I have seen in some of Diamond’s other works (e.g. Collapse). I would encourage the reader to make the effort to keep reading; it is worth it.

The bad

I was unpleasantly surprised to learn about the levels of violence in traditional societies, and even more so about some of their practices. There is a widespread romanticized view of traditional societies as somehow more morally pure people. This topic is in fact a highly sensitive issue in Australia, the country where I live. I very often see “white Australians” (sorry for the lack of a better term) accused of the abominable crime of destroying the aboriginal cultures (i.e. the stolen generation). Although I do consider the result of many of these cultural exchanges a tragedy (and traditional societies always end up on the worst side of any of these interactions), I have to say that reading this book made me rethink many of my preconceived ideas. I do see now why some people would want to force other people to adapt their practices and I have to concede that they might have a point.

The good

I think there are many interesting lessons here. The first one is acquiring awareness that other ways of life (and moralities) are possible. In the Western world, we usually see life in a very narrow way and we assume that most of the things we have around us are the way things must be (e.g. education in schools, or parenting). For instance, I personally plan not to have any children, but I admit that I would feel differently if the responsibility of raising children was a much more communal thing.

Interestingly enough, I spoke about this with my mom and she said that you don’t need to look for examples in societies so distant from ours: when she grew up, in a small city in the south of Spain, she was raised in a house where she lived with her grandparents, uncles, aunts and a large number of cousins. She feels that this way of life has changed a lot in the last 50 years: most children are now raised exclusively by their parents and live in houses without much interaction with other adults or kids (besides any siblings).

However, the most interesting realization for me was how alien the education system really is. I have always felt this way, but I think I couldn’t articulate my thoughts as well as I can now after reading this book. Education in traditional life is not a separate part of your life. Education is not a thing. The way you learn is by playing in controlled environments, doing the same things you will be doing in your adult life. And playing is fun. In the West, we educate children in factories, obtaining equality (all children can obtain at least some form of education) at the price of destroying the fun and teaching things that are very often absolutely unrelated to the problems that they will face as adults. As I said, I probably won’t have children, but if I did, they certainly wouldn’t undergo the normal education system.


Thanks a lot to Miranda for the corrections and the feedback. All errors are my own 



How good is security for LessWrong and the Alignment Forum?

October 5, 2021 - 01:27
Published on October 4, 2021 10:27 PM GMT

As far as I can tell, LessWrong/Alignment Forum haven’t been noticeably attacked or disrupted. However, I’m concerned that could change because:

  • Attack command, control and automation will definitely improve, making more sophisticated attacks easier to deploy at scale.
  • If AI becomes as important as we think, and the Rationality community succeeds at influencing its trajectory, LW/AF may become targets for surveillance/influence operations.

So my questions are:

  • How much focus does Lightcone Infrastructure put on security?
  • Does Lightcone contract with any external security experts or penetration testers?
  • Are there any plans to implement two-factor authentication for LW/AF?
  • Are there any planned responses if automated trolling/astroturfing attacks become much more common/advanced, as seems plausible with the rise of strong language models?
  • Are there plans for secondary hosting providers, in case Amazon/the US become hostile?
  • Is there some way we can download and back up all public conversations hosted on LW/AF?
  • Relatedly, how are backups handled for LW/AF?

Thank you very much to Lightcone Infrastructure and the LessWrong team for your work. I’d be glad for any insight they (or anyone else) can share.


2021 Darwin Game - Tundra

October 5, 2021 - 01:21
Published on October 4, 2021 10:21 PM GMT

Our Tundra is an inhospitable[1] environment. The only significant food available to herbivores is Lichen, which has a tiny nutritional value of 1. The Tundra is cold too. Staying warm requires the cold tolerance adaptation, which costs +2 size.

Name    Carrion  Leaves  Grass  Seeds  Detritus  Coconuts  Algae  Lichen
Tundra  1        1       1      1      1         0         0      300

An organism must expend 20% of its energy just to survive. A herbivore foraging for lichen cannot have a size greater than 5 or else it will expend more energy in metabolism than it is possible to acquire from eating Lichen.

All organisms have base size 0.1. The cold adaptation (+2) plus the Lichen digestive tract (+1) costs a total of +3 size. A Tundra herbivore has a minimum size of 3.1. A herbivore with size 5.1 is untenable since it expends more energy (1.02) than is possible to obtain from Lichen (1.00).
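The size arithmetic above can be checked with a short sketch. This is not the game's actual code: the names `UPKEEP_RATE`, `upkeep` and `can_break_even` are hypothetical, and reading the 20% upkeep as 20% of size is an assumption inferred from the 5.1 → 1.02 example in the text. Exact fractions are used to avoid floating-point rounding at the break-even point.

```python
from fractions import Fraction

# Constants taken from the post's text.
BASE_SIZE = Fraction(1, 10)   # every organism starts at size 0.1
COLD_TOLERANCE_COST = 2       # +2 size to survive the Tundra
LICHEN_DIGESTION_COST = 1     # +1 size to digest Lichen
LICHEN_VALUE = 1              # nutritional value of Lichen

# Assumption: upkeep is 20% of size, inferred from the 5.1 -> 1.02 example.
UPKEEP_RATE = Fraction(1, 5)


def upkeep(size):
    """Energy spent on metabolism by an organism of the given size."""
    return UPKEEP_RATE * Fraction(size)


def can_break_even(size, food_value=LICHEN_VALUE):
    """True if upkeep does not exceed the value of the food eaten."""
    return upkeep(size) <= food_value


# Smallest possible Tundra herbivore: base size plus both adaptations.
min_herbivore_size = BASE_SIZE + COLD_TOLERANCE_COST + LICHEN_DIGESTION_COST

for size in (min_herbivore_size, Fraction(5), Fraction("5.1")):
    print(float(size), float(upkeep(size)), can_break_even(size))
```

Under these assumptions a minimum-size herbivore (3.1) spends 31/50 of a Lichen's value on upkeep, size 5 breaks exactly even, and size 5.1 spends 51/50, which is more than Lichen can supply. That leaves at most 1.9 size for weapons, armor and speed.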

Players submitted 39 species native to the Tundra. Only 4 of them were viable herbivores: Micropas, Arctic Slug, Northern Nibbler and "lichen" (not to be confused with the foragable "Lichen"). (Multicore's Arctic Fox was a carnivore.)

These species could support little in the way of weapons, armor and speed. They were defenseless. In the first 8 turns, all four of our viable foragers were eaten to extinction.

Goes Extinct in Generation  Species
5                           Pristol
7                           Micropas
7                           Arctic Slug
8                           Northern Nibbler
8                           lichen

After the viable herbivores were eliminated, total ecological collapse was inevitable.

Goes Extinct in Generation  Species
9                           Yonge_Cold
9                           Boreakeet
9                           Beck’s Penguin
10                          SmolFire
10                          Arctic Ambusher
10                          Zlorg
10                          Arctic Fox
10                          Orange-Krill
10                          abominable_snowman
12                          Antasvara
12                          Unfortunately Large Cockroach
12                          cg-mouse
13                          Porostozer Malutki
13                          1994 Mazda RX7
14                          Raburetta
14                          Pittsburgh-Penguins
15                          Louse-lion
15                          Wolverine
16                          Jtp
16                          Wolves
17                          Seals
19                          Direwolf
24                          Tsc
27                          Tundrus Rex
29                          Frankenstein
32                          Broken Fetters
34                          Alaskans
37                          Dragon
37                          Porostozer Mamuci
39                          Rocks
41                          Duckofants
43                          White-Whales
50                          tp511
52                          Frostwing Snipper

The Frostwing Snipper

An honorable mention goes to Nem's Frostwing Snipper, a Speed 10 species that could digest both Lichen and Seeds. The maximum speed made the Frostwing Snipper immune to predation which let it survive the initial carnage. The ability to digest seeds meant that the Frostwing Snipper did consume enough energy on average to more than replace itself.

However, "on average" is not enough. The Tundra's carrying capacity of Frostwing Snippers was too small. Random fluctuations eventually knocked the Frostwing Snipper into extinction.



  1. The original Tundra was even more inhospitable than this. I made it easier thanks to early feedback from aphyer. ↩︎


How much memory is reserved for cute kitten pictures?

October 5, 2021 - 00:30
Published on October 4, 2021 9:30 PM GMT

In my social circles, I frequently tell a joke making fun of the awful lot of cute kitten pictures available on the internet ("somewhere in the world, a whole server farm is doing nothing but storing pictures of cute kittens"). Joking aside, there are thousands of data centers around the world, and the world's total data storage capacity is measured in zettabytes.

How much of this storage do you think is actually occupied by cute kitten pictures/videos? What could be an effective way to make a Fermi estimate?


Open & Welcome Thread October 2021

October 4, 2021 - 22:52
Published on October 4, 2021 7:52 PM GMT

(I saw October didn't have one. First post - please let me know if I do something wrong.)

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.


[Book Review] "I Contain Multitudes" by Ed Yong

October 4, 2021 - 22:29
Published on October 4, 2021 7:29 PM GMT

I have read this book once before, just before I went to university, and I found it immensely enlightening at the time. Now I return to it as an older man, and find it has lost a lot of its shine. I don't know if this is reflective of my loss of naivete, or increase in biological knowledge.

Core Thesis

The subject of this book is the microbiome. This is the community of micro-organisms (although it's almost all focused on bacteria with a small amount of emphasis on viruses) that live around us, on us, and inside us.

The book is mostly a long series of interesting investigations into the microbiomes of humans and animals. What is implied (and sometimes said outright) is that biologists have neglected studying it.

The history of the study of microbiology is explored in detail. As I am not a huge fan of scientific history literature I mostly skimmed this but I expect it would be interesting to many people. Some key points here are the origins of the focus on microbes as pathogens and parasites, and how this idea has changed to focus more and more on symbiosis over time. We need our microbes and they need us!

Microbes do not have the same properties when it comes to genes as do large organisms. They can and do frequently exchange genetic material with other microbes. This means that in a microbiological community, the set of all genes present is more important than the set of specific microbes.

Holobionts and their Hologenomes

This culminates in the concept of the "hologenome". This is the set of genes in both an organism and its microbes, which are together referred to as a "holobiont". Hologenomes have some properties common to mere "genomes", such as the construction of biological pathways (which are like manufacturing lines, the product of one reaction being the starting-point of the next) using genes from many different bacteria and the host.

On the other hand, genomes of macroscopic organisms are generally static, only changing upon being transferred to the next organism. Hologenomes contain the "core" genes of the main organism and its necessary symbionts. Other genes are part of slow-changing microbes, and others still are ephemeral. Microbial genes are subject to different types of natural selection depending on their relationship to the host.

This perspective is genuinely very interesting. It's a different way of thinking about organisms and their bacterial communities, and I think it's the strongest part of the book.

Cool Facts

One thing this book certainly doesn't lack is cool facts about biology. Honestly some of these things are so cool and interesting that if you're a sucker for cool biology facts it's worth reading the book just for these. Here's a selection:

  • The human body contains more bacterial cells than human ones, and there are 500 times more different genes in the bacteria than in your own genome.
  • Human gut mucus has an incredibly high concentration of bacteriophage viruses (which infect bacteria) embedded into it, which help prevent gut bacteria from invading the body.
  • Babies' immune systems actively suppress themselves to allow for bacteria to colonize them after birth. They then learn to tolerate these bacteria in the gut.
  • Human milk contains special sugars (HMOs) which induce lots of bacteria to behave differently in the guts of infants. They also seem to prevent infection of human cells by bacteria in cultures.
  • The Hawaiian bobtail squid is able to use tiny water currents to filter bacteria from the water, detect as few as five cells of a specific type of bacteria (Vibrio fischeri), and then cultivate them in special organs. It uses these to produce light to hide its own shadow from creatures underneath it.
  • Paracatenula worms are >50% bacteria, and 90% of their body is a bacteria-holding organ called the trophosome. They can regrow a head from a trophosome but not the other way round.
  • Wolbachia, a group of symbiotic bacteria, are only passed from female insects to their descendants. This means males are reproductive dead ends. Wolbachia have taken to sterilizing males, killing them so their sisters can eat them, or feminizing them.
  • Lots of insects just die if they're deprived of microbes. Unlike human bacteria which live in the gut, these actually live inside the cells of the insects.
  • There are two species of wasps which appear to be reproductively isolated from each other (i.e. individuals from one species cannot breed with those from another) entirely due to having different bacteria living in them.
  • One sort of parasitic wasp has a domesticated virus in its own genome. It produces this virus when it parasitizes its host and releases it into the host's tissue.
  • A type of caterpillar which is parasitized by a different wasp can fight back. It does this by having its own symbiotic bacteria release a virus (which is dormant in the bacteria's genome) to kill the wasp's symbiotic bacteria.

The takeaway here is that symbiotic bacteria (particularly in insects) fall into the Florida Man reference class. Everything is possible and only a fool expects to be unsurprised. These are all pretty well-sourced and not too significant to the overall thesis so I'll save the fact-checking and follow-ups for later.

Medical and Ecological Interventions

Ed Yong did a good job of finding the actual researchers in the field and talking to them. This is good. But researchers love to talk about their next project, and find settled science to be terribly boring. Many of these projects are thus highly speculative, and a fairly high proportion are medical.

Some of the findings are uncontroversial, like the fact that bacteria-free mice (and humans) are generally not healthy. Bacteria in the gut are necessary for digestion.

Lots of them are, well, not uncontroversial.

Oh Dear God We're Doing Autism

One of the first examples given is of research by Paul Patterson and Sarkis Mazmanian. Patterson gave pregnant mice a compound to mimic viral infection, attempting to create mouse models of schizophrenia (based on the fact that flu infections in pregnant mothers are linked to schizophrenia). The resulting mice showed behaviour which sort of resembles both schizophrenia and autism.

More remarkably, transferring samples of gut microbiota from autistic children to sterile mice (in this case sterile refers to having no microbiome, not to reproductive capability) seems to replicate the symptoms. But feeding a bacterium (B. fragilis) to these mice seemed to cure most of their symptoms! While Mazmanian has continued to produce papers, interest in B. fragilis has petered out. Mazmanian has also not been able to replicate the results in humans.

That's not to say that autistic people don't have different microbiomes, which is fairly well-documented. It just appears that an altered microbiome isn't the only factor in play. This wasn't surprising to me on reading the book.

Faecal Transplants And Beyond

C. difficile causes terrible infections of the gut. They're most common after antibiotic treatment, as C. diff is generally more resilient to antibiotics than the rest of the gut microbiome. This means that treatments using antibiotics are often ineffective. Unless all of the C. diff is removed and the gut reset, it is likely to return.

In this case, one of the major treatments is the rather unpleasant method of faecal transplant. Some faeces is taken from a healthy individual and put into the gut (through colonoscopy or a slow-dissolving pill) of the sick person. This has a ~95% (!) success rate, and is now standard practice. The FDA tried to stop faecal transplants for a bit because we all know their attitudes towards improving people's lives but it is still in common usage.

C. diff is an ideal target for microbiome-based interventions. We know the exact bacterium responsible. It is a case of an attractor in the microbiome-landscape characterised by an extreme lack of diversity. It occurs almost entirely due to outside influences on the gut (rather than human genetics). Most microbiome-based interventions are much harder to get right.

Eliminating Dengue Fever

Dengue fever is carried by mosquitoes. When infected by a certain type of Wolbachia, the mosquitoes can no longer carry dengue fever. This Wolbachia can be introduced to mosquito populations to reduce the impact of dengue fever. When the book was written, this was in very early stages; now it has been rolled out to regions encompassing over 6 million people.

The website for this project says that "In areas where Wolbachia is self-sustaining at a high level, there have been no dengue outbreaks". This sounds remarkable, but how many areas have actually met that criterion? GiveWell does not have an effectiveness assessment for the project, but I have found an assessment of its effectiveness from an Australian consulting firm. It seems they have definitely released a lot of mosquitoes, but the nature of their intervention makes it difficult to assess actual cases of dengue averted.

Diet-dependent Interventions

Cows can't normally eat Leucaena. But they can be given a gut microbe which can remove the toxins. This has been a success story in Australia, which now uses Leucaena as a major feedstock. This works because the cows consistently eat Leucaena, which gives the microbes something to eat.

Similar trials have occurred in humans who are prone to kidney stones (and other stones). The cause is oxalate in the diet, but giving these people a supplement of Oxalobacter bacteria, which can break down oxalate, generally fails. This is because people who are oxalate-sensitive don't eat oxalate, and the bacteria starve.

Obesity is linked to changes in gut microbiome. Transferring gut microbes from a non-obese mouse to an obese one can make it thinner compared to a diet-matched control, but this only happens when that controlled diet is high-fibre. The same seems to be true of humans. With perfect control over gut microbes, we can influence obesity, but in reality lots of that control comes from the diet.

The microbiome is not magic. The microbes in it respond to changes in their environment, and for gut microbes, that's our diet. Unfortunately the dietary advice remains the same. This also explains the lack of microbiome-associated cures for the laundry list of diseases associated with the microbiome.

Microbiome → Health

From my first reading, I recalled an enormous number of claims as to how the microbiome influences health, and how manipulating the microbiome might change our health. I was quite excited to have either the thrill of discovering them anew or the thrill of tearing them down.

I was surprised to find that the book didn't contain many concrete claims of this nature. Instead there was a large amount of existing research in models, and a few quite strange studies which just seem odd (microbes reducing the symptoms of depression in people with IBD? what about the IBD itself??). I think this is partially a failure of my memory, but partially a failure of the book.

I can't fault the book for making actually wrong statements, but I can fault it for giving off incorrect impressions. It feels like trying to analyse some lawyer's testimony on a client's character, mentioning his time spent going to churches, which when investigated turns out to be true, but only because he's a cleaner or something. The concrete claims are all basically true, but more banal than I recalled.


I think that most of this book is a very well-written and accessible account of the microbiome, and the author does a good job of making the reader excited by it. I think that it overreaches in a lot of ways, and a reader would be wise to consider what it is claiming to be true and what is being speculated on.

The biggest issue with microbiome research is a lack of gears-level models. This is mostly due to the gigantic complexity of the topic. We have good models of certain interventions and disease states, but these are really not common. Changing the microbiome of mice can make them more or less anxious, fatter or thinner, and we don't know why.

Only five years have passed since publication. This is barely more than one generation of PhD students. Further studies on the microbiome have not made enormous progress, but that is to be expected. Whether the microbiome will be a focus of future disease treatment is unclear. The book is arguably overselling things, but part of the point of a book is to intrigue us, and to draw our attention towards things we would otherwise have ignored. That is something "I Contain Multitudes" does very well.


Andrew Yang on "The Priests of the Decline"

October 4, 2021 - 22:24
Published on October 4, 2021 7:24 PM GMT

In this post excerpted from his new book, Andrew Yang uses his experience in politics to explain why the US political system rarely accomplishes much. It's strikingly similar to another recent post someone shared from Dominic Cummings: Dominic Cummings : Regime Change #2: A plea to Silicon Valley - LessWrong.

I've copied some of the key sections below:

"I call this dynamic constructive institutionalism — a tendency among leaders to state publicly and even hold the belief that everything will work out, despite quantitative evidence to the contrary, coupled with an inability to actually address a given institution’s real problems...

Indeed, two groups that are especially prone to constructive institutionalism are those that we rely upon both to give us a sense of the problems and to solve them — journalists and politicians.

Journalists are typically trained to be impartial observers, which inhibits them from expressing emotion or opinion. They are supposed to calmly document and present the news. For many, there is an implicit perch of authority and stability. Unfortunately, this has also turned many into market-friendly automatons and cultural guardians who make pro forma gestures about decorum, virtue, and propriety while ignoring the disintegration of trust, the dissipating integrity of their own organizations, or the decline of the American way of life...

If journalists are conditioned to calmly document dispassionately, politicians are conditioned to invoke profundity, resilience, and the greater good at every turn. As a politician you’re like a totem or shaman. You show up to a gathering or charity event to speechify and elevate the proceeding: “Thank you for the incredible work that you’re doing. It’s so important.” Which it is, of course. Though it would be if you didn’t show up too.

You are meant to embody the concerns of the community. You listen patiently to all. You are present. If someone asks you a question, you answer it reassuringly. You express values and aspirations. You are a human security blanket, and your job is to make people feel better."


To every people according to their language

October 4, 2021 - 21:42
Published on October 4, 2021 6:42 PM GMT

Here is an approximate transcript of a conversation I had with a friend, who is an Orthodox Jew.


ME: Imagine the US government announced a Manhattan project to reverse aging and end death forever. It's estimated this project will cost 5% of GDP. There's no guarantees it will be successful, or if successful how long it will take, but the majority of domain experts agree that 50 years is a pretty good median estimate.

Suddenly all the people between the ages of 20 and 50, who currently accept their death as inevitable, will do a quick bit of mental arithmetic and realize that 50 years is a bit too close for comfort. There'd be calls to spend 10% of GDP and get it done in 25 years, or even 50% of GDP and get it done in 5 years! I'm not saying that either of those strategies would work, but they highlight how all of a sudden a light bulb would go off in people's minds, and people would be frantically trying to escape death and aging.

So if we know that once we realize there's a possibility to escape death we will be desperate to clutch at it, and right now there's at least some chance that such a Manhattan project could be feasible, we should push for this Manhattan project right away. Let's not wait till the government announces it and we realize how much we always wanted it!

FRIEND: I'm not sure I would be frantic to support such a project - in fact I'd probably oppose it. I believe it would fail anyway since God wants people to die.

ME: But God doesn't inherently want people to die - (according to Orthodox Judaism) before the fall from Eden people were meant to live forever, and people only die as a punishment for Adam and Eve's sin. And in the time of the Messiah, people will live forever.

FRIEND: Yep, but God hasn't brought the Messiah yet.

ME: Do you believe that the Messiah is meant to be brought by people, or by God?

FRIEND: I've only ever heard that people are meant to bring the Messiah through creating world peace, and building the State of Israel. It's always been clear that it's God that's meant to end death.

ME: Well obviously people have always thought that, since it seemed that ending death would be impossible for humanity to achieve. I assume everyone assumed the in-gathering of exiles would be miraculous before Zionism and the creation of the State of Israel. But now that it finally might be within humanity's grasp, wouldn't it make most sense, and be most consistent, to say that the Messianic times as a whole, including world peace and an end to death and disease, are all meant to be brought about by human effort?

FRIEND: That's actually a really beautiful vision! I really like that idea!


When you try to persuade someone of X, it can sometimes seem like you reach an impasse when it turns out that they have fundamentally different beliefs and values to you.

There's often a temptation to try and convince them that their fundamental beliefs are incorrect, so that you can build on that basis, and reach the conclusion you were trying to.

That almost always fails, and so instead you get angry. "Stupid person! How could they possibly think Y? And it's not just a harmless belief - as a result of Y they don't believe X, and that's going to doom us all!"

Needless to say, that's not helpful either.

Far more effective to enter the mindset of the other person, and see if X could still make sense according to them. So long as you treat their views with respect, it's often possible to unite people with very different fundamental beliefs with a common cause.

In this conversation I didn't try to dissuade my friend of their religious philosophy. Instead I showed how the idea of fighting death could slot right into it. And it worked a treat!

Note that I wasn't deceiving them by pretending to agree with their religious philosophy - they are perfectly aware of my differences of opinion. But they appreciated the effort to go with what they believed in rather than fighting it. And it ended up bringing us closer together...


FRIEND: Is this a vision of the Messiah you could buy into?

ME: Actually yes it is - I wouldn't necessarily agree with the theological underpinnings of such a vision, but if there was a movement with this as their vision I would be happy to join, given how the aims fit right in with mine, and the story it's telling might not be true, but at least is moving and beautiful.

FRIEND: I'm so glad that we can both agree on such a mission!


Candy Innovation

October 4, 2021 - 16:50
Published on October 4, 2021 1:50 PM GMT

What innovations in candy have there been since the '90s? Are there new flavors? Better imitations of existing flavors? New textures?

So far, all of the candy my kids have brought home seems to be things we could get 25 years ago. Though possibly flavors have improved, since I've only been trying what they've decided to share with me.

There have been some gains due to globalization, where candy that was previously hard to get in the US or unknown here is now more widely available, but has there been development beyond that?

(This post brought to you by yesterday's neighborhood pinata)


The Dark Side of Cognition Hypothesis

October 4, 2021 - 08:47
Published on October 3, 2021 8:10 PM GMT

It is sometimes claimed that the ultimate, unifying goal of artificial intelligence research is to instantiate human-level cognition in a computational system (e.g., Minsky, 1961; Lake et al, 2017). If artificial general intelligence (AGI) of this sort is ever successfully developed, the consequences would be unimaginable in scope—surely, it would be the most impressive invention of our tool-making species to date. 

In what follows, I’ll argue that current AI systems almost entirely lack a critical facet of human-level cognition. I’ll discuss the reasons why it is particularly hard for us to recognize—let alone instantiate—this aspect of our cognition, and I'll investigate the predictions that emerge from this sort of account. After sketching this general picture, I'll argue that the framing put forward here ultimately has the potential to unify AI engineering and AI safety as one single project.

Introduction: How We Come to Understand the Mind

At the outset, it is worth asking the extent to which cognitive science bears relevance to AI. It seems clear, however, that if the overarching goal of AI research is to capture the computations that comprise human-level cognition, then a sufficiently comprehensive understanding of human cognition seems a necessary precondition for bringing about this outcome. In other words, if we want to build mind-like computational systems, it follows that we must first understand the mind to some sufficient degree. 

What, then, are the epistemological resources we have at our disposal for understanding the mind? Philosophers and cognitive scientists generally answer along the following lines: to the degree that “the mind is what the brain does,” as Minsky put it, the investigations of neuroscience and psychology allow us to better understand the mind as a standard, third-person, external, empirical object of scientific inquiry (Minsky, 1988). 

But the mind is also unlike other objects of scientific inquiry in one particular way. In addition to—and in some sense prior to—objective inquiry, we can also come to understand the mind through first-person, subjective experience of having (or being) minds ourselves. For instance, our use of a standard cognitive vocabulary (i.e., speaking about beliefs, values, goals, and thoughts as such) both in scientific research and in everyday conversation does not happen because we have consulted the empirical literature and decided to adopt our preferred terminology; instead, we speak this way because of the fact that everyone’s first-person experience agrees that such language corresponds to what we might call “self-evidently subjectively real” mental phenomena (e.g., Pylyshyn, 1984). 

It is also fairly clear that our first-person experiences of mind are not diametrically opposed to scientific inquiry, but rather, actually do much of the work of calibrating the relevant empirical investigations: namely, our motivation to study phenomenon X versus Y in cognitive science almost always originates from some first-person intuition about the relative prominence of the phenomena in question. For instance, the reason the neuropsychological mechanisms of punishment avoidance are far better empirically understood than putative mechanisms of punishment-seeking (i.e., masochism) is because we court experience-based intuitions that the former phenomenon seems real and important for broadly understanding the mind (cognitive science thus studies it rigorously), while the latter phenomenon is generally unrelatable, rare, and pathological (cognitive science thus studies it far less intensely). Our first-person, nonempirical experience of having (or being) minds thus not only directly supplements our understanding of cognition, but also broadly informs, motivates, and calibrates subsequent objective investigations of both neuropsychology and AI. 

What happens, then, when the third-person, empirical apparatus of cognitive science turns to investigate these highly relevant, inquiry-directing, first-person experiences themselves? In other words, what happens when we study empirically what features of mind are and are not actually included in our experience of having a mind? A fairly determinate answer emerges: the mind can neither uniformly penetrate nor uniformly evaluate claims about its own processes. That is, our experience of having (or being) a mind selectively and reliably misses critical information about many of its actual underlying phenomena. 

Smell the Problem?

A simple example that I find particularly illustrative (in the case of human cognition) is the profound asymmetry between the two sensory modalities of olfaction (i.e., smelling) and vision. Whereas researchers posit that we can hardly communicate (“see what others are saying”) without relying on visual analogies and concepts (Ferreira & Tanenhaus, 2007; Huettig et al, 2020), olfaction has been dubbed “the muted sense” in light of the well-documented difficulty individuals have in verbalizing basic smell-related data, such as identifying the source of common odors (Olofsson and Gottfried, 2015). It is often quipped that there are really only five words exclusively dedicated to smell in the English language: smelly, stinky, acrid, fragrant, and musty—all other seemingly olfactory descriptions are argued to co-opt gustatory or visual language (e.g., we say something “smells like cinnamon,” but we do not say something “looks like banana”—we simply say “yellow”) (Yong, 2015). 

Asymmetries in linguistic accessibility of olfactory and visual information are not the only relevant discrepancies between the two sensory modalities. Perl and colleagues outline the bizarre, pervasive role of subconscious sniffing in the emerging field of social olfaction, including discoveries of highly specific mechanisms to this end, such as individuals subconsciously increasing sniffing of their “shaking” hand after within-sex handshakes (putative “other-inspection”) while subconsciously increasing sniffing of their “non-shaking” hand after cross-sex handshakes (putative “self-inspection”) (Perl et al, 2020). Of particular note to our idiosyncratic inquiry (as well as the posited connection between first-person intuitions about cognition and subsequent research agendas), the researchers in this paper explicitly comment that “we are hard-pressed to think of a human behaviour that is so widespread, that troubles so many people, that clearly reflects underlying processes with developmental, clinical and social relevance, and yet has so little traction in the formal medical/psychological record” (Perl et al, 2020). 

Needless to say, in spite of the demonstrated importance of olfaction in the greater system of the mind from an empirical point of view (and in spite of much still remaining unknown about the functional role of the mysterious modality), AI research has all but ignored the relevance of olfaction for the field’s stipulated goal of instantiating human-level cognition in a computational system. Simple Google Scholar searches for “AI olfaction” and “computation olfaction” yield 26,100 and 29,400 results, respectively, while searches for “AI vision” and “computation vision” yield 3 million and 2.7 million results, respectively. 

While the computations associated with vision may indeed be orders of magnitude more familiar to us than those associated with olfaction, it is extremely implausible that this 100-fold asymmetry will be found to map onto the comparative importance of the modalities and their associated computations in the mind/brain. Of course, just because we do not understand from experience what olfaction is up to does not imply that olfaction is not essential in the greater system of the mind. But, given these startling asymmetries in research interest across sensory modalities, it seems as though AI—and neuropsychology more broadly—operates as if this were true. 
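
As a quick arithmetic check on the "100-fold" figure, the sketch below simply divides the Google Scholar result counts quoted above. The variable and function names are illustrative assumptions, and the counts are the rounded numbers given in the text, not fresh search results.

```python
# Result counts as quoted in the text (rounded; not re-queried).
counts = {
    "AI": {"olfaction": 26_100, "vision": 3_000_000},
    "computation": {"olfaction": 29_400, "vision": 2_700_000},
}


def vision_to_olfaction_ratio(olfaction_hits: int, vision_hits: int) -> float:
    """Return how many vision results exist per olfaction result."""
    return vision_hits / olfaction_hits


# One ratio per search prefix ("AI ..." and "computation ...").
ratios = {
    prefix: vision_to_olfaction_ratio(c["olfaction"], c["vision"])
    for prefix, c in counts.items()
}

for prefix, ratio in ratios.items():
    print(f"{prefix}: about {ratio:.0f} vision results per olfaction result")
```

Both ratios land within roughly 15% of 100, which is the sense in which the asymmetry can be called approximately 100-fold.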

More All-But-Missing Pieces: Social Cognition and Skill Learning

This problem, of course, is not limited to discrepancies between vision and olfaction: there are many other extremely important functions of the mind that we either do not experience as such or otherwise find notoriously difficult to comprehend in explicit, systematic, linguistic terms. Two highly relevant further examples are (1) social cognition writ large and (2) skill learning and memory. 

With regard to the former, decades of research primarily led by John Bargh at Yale have demonstrated the ubiquity of automaticity and subconscious processing in social contexts, including imitation (Dijksterhuis & Bargh, 2001), subliminal priming in a social environment (Harris et al, 2009), social goal pursuit (Bargh et al, 2001), and the effects of group stereotypes (Bargh et al, 1996). In short, humans are profoundly unaware (i.e., do not have any firsthand experience) of many—if not most—of the underlying computational processes active in social contexts. We are, in some sense, the recipients of these processes rather than the authors of them. 

With respect to skill learning and memory, also known as procedural learning and memory, similar patterns emerge: first, “nondeclarative” procedural learning processes have been demonstrated by double dissociation in the brain to function largely independently of “declarative” learning and memory processes, strongly indicating that behavioral learning and memory exist and operate independently from our discrete capacity to systematize the world in explicit, verbal terms (e.g., Tranel et al, 1994). Accordingly, it has been found that not only do people find it exceedingly challenging to explicitly articulate how to perform known skills (e.g., dancing, riding a bike, engaging in polite conversation), but also that attempting to do so can actually corrupt the skill memory, demonstrating that procedural learning is not only “implicit,” but can sometimes be “anti-explicit” (Flegal & Anderson, 2008). There is thus a well-documented “dark side of the mind”: a large subset of ubiquitous cognitive phenomena that—unlike vision, say—we have serious trouble self-inspecting.

Descriptive and Normative Cognition

Let’s now attempt to make some sense of these mysterious computations: is there some discoverable, underlying pattern that helps to elucidate which class of cognitive processes are presumably (1) highly consequential in the greater system of the mind, but (2) with which we are experientially underacquainted? I submit that there is such a pattern—and that by superimposing this pattern onto the current state of AI research, it will retroactively become clear what the field has successfully achieved, what is currently lacking, and what implications this understanding carries for building safe and effective AI. 

The hypothesis is as follows. There is a fundamental, all-encompassing distinction to be drawn between two domains of cognition: descriptive and normative cognition. 

As it will be defined, it is the category of normative cognition into which all the previously considered, implicit, experientially unclear cognitive processes seem to fall (recall: olfaction, social cognition, procedural learning and memory). An account will subsequently be given as to why their being normative necessarily renders them exceedingly hard to understand in explicit, linguistic terms.  

This descriptive-normative dichotomy is not unfamiliar, and it bears specific resemblance to Hume’s well-known distinction between claims of “is” and “ought” (Cohon, 2018). “Descriptive cognition” as it is being used here will refer to the mind’s general capacity to map “true” associations between external phenomena—as they are transduced by sense organs—and to successively use these remembered associations to build systems of concepts (“models”) that map reality. Behavioral psychologists often refer to this kind of learning as classical conditioning—the stuff of Pavlov’s dogs. Interestingly, this account is associated with many of the functional properties of the posterior half of neocortex, including vision (in the occipital lobe), conceptual association (in the parietal lobe), and explicit memory formation (in the hippocampi) and storage (in the temporal lobe). In a (hyphenated) word, descriptive cognition entails model-building. In accordance with the Humean distinction, descriptive cognition is responsible for computing “what is”—it thus ignores questions of “should” and “should not.”

“Normative cognition” as it is being used here will refer to the process of behavioral decision-making and its dependence on the construction, maintenance, and update of a complex “value system” (analogous to a descriptive cognition’s “belief system”) that can be deployed to efficiently and effectively adjudicate highly complex decision-making problems. Analogously, this account is associated with many of the functional properties of the anterior half of neocortex, which is known to be differentially responsible for executive functions, valuation, emotion, behavioral planning, and goal-directed cognition (for a comprehensive review, see Stuss and Knight, 2013). In a word, normative cognition entails (e)valuation. In Hume’s vocabulary, normative cognition is the computational apparatus that can be said to deal with “ought” claims. 

In my own research, people were found to vastly differ from one another in the relative attention and interest they devote to their own descriptive and normative representations, further bolstering the legitimacy of the distinction. More descriptively-oriented people tend to prioritize science, rationality, logic, and truth, while more normatively-oriented people tend to prioritize the humanities, art, narrative, and aesthetics (Berg, 2021).  

Before proceeding, it is worth considering the nature of the relationship between descriptive and normative cognition as they have been defined. Clearly, these overarching processes must interact in some way, but how, exactly? And if they interact to a sufficient degree, what right do we really have to differentiate these processes? Here, I will characterize descriptive and normative cognition as epistemologically independent but as mutually enabling: were it not for the constraining influence of the other, the unique computational role of each would be rendered either irrelevant or impossible. Though I do believe the relevant neuropsychology supports this account, I think it can be demonstrated on logical grounds alone. 

First, without a sufficiently accurate (descriptive) model of the external world, it is impossible for an agent to efficiently and adaptively pursue its many (normative) goals and avoid their associated obstacles in a complex, dynamic environment. Here is why I believe this must necessarily be true: one cannot reliably select and navigate towards some desired “point B” (the “normative” computational problem) without having a sufficiently well-formed understanding of (1*) where “point A” is, in environmental terms, (2*) which of the many possible point Bs are actually plausible, (3) which of the plausible point Bs are preferable, (4*) where the preferred point B is, in practical, implementable terms, (5*) which of the many possible “routes” from A to B are actually plausible, and (6) which of the plausible “routes” from A to B are preferable. Of these six preconditions for normative action, four of them (denoted with an asterisk) unambiguously depend upon descriptive models of the agent’s environment. Therefore, on a purely theoretical level, descriptive cognition can be demonstrated to be what actually renders the hard normative problems of (3) and (6) tractable. 

In the same vein, normative cognition enables and constrains descriptive cognition. This is because the only way to adjudicate the hard problem of which models are actually worth the trouble of building given highly finite time, energy, intelligence, and information is by appealing to the misleadingly simple answer, the most relevant models—that is, those models that most reliably facilitate pursuit of the most important goals (and avoidance of the most important obstacles), where “important” really means “important given what I care about” and where what one cares about is in turn determined by one’s constructed and constantly-evolving value system. So while it is certainly true that descriptive and normative cognition are tightly interrelated, these two broad domains of mind are indeed calibrated to different epistemologies—descriptive cognition, to something ultimately like “cross-model” predictive accuracy (e.g., “what do I believe?”); normative cognition, to something ultimately like “cross-goal” reward acquisition (e.g., “what do I care about?”).

The “Dark Side of Cognition” Hypothesis

An intriguing hypothesis, relevant to the future success of AI, emerges from this account. If descriptive and normative cognition are fundamentally computationally discrete, then it should follow that, within any one mind, (A) descriptive cognition would be technically incapable of mapping (i.e., building models of) normative cognition itself, and, (B) analogously, normative cognition would be technically incapable of evaluating (i.e., assigning a goal-directed value to) descriptive cognition itself. This is because all the evidence there is for the internal structure of the normative cognition (to be hypothetically modeled by descriptive cognition) could only ever be conceivably accessed “during” normative cognition itself (e.g., while introspecting during one’s own model-building), and so too for descriptive cognition.   

Of particular relevance to the field of AI is (A), that descriptive cognition would be technically incapable of mapping (i.e., building models of) normative cognition itself. This is because, returning to our starting point, the unifying goal of AI research is to instantiate human-level cognition in a computational system, which seems to require a descriptive understanding of all cognition—descriptive and normative alike. But herein lies what I strongly believe to be the overriding oversight in current AI approaches: if (1) all cognition can be validly classified as either descriptive or normative, (2) what we descriptively know about the mind is either directly supplemented by or indirectly guided by our first-person experience of having (or being) minds, and (3) it is technically impossible in building a descriptive model of our own minds to map its normative parts, then we should reasonably expect current approaches to AI to omit, ignore, or discount normative cognition. I will call this the “Dark Side of Cognition Hypothesis,” or “DSCH” for short.  

Examining the Hypothesis

Is DSCH—the idea that up to this point, AI has largely ignored normative cognition—borne out by the available evidence? Let us attempt to answer this question using as our case study Lake and colleagues’ paper, Building machines that learn and think like people, which helpfully captures both the state of the field and its own researchers’ thoughts about its trajectory (Lake et al, 2017; hereafter, “L, 2017”). 

After its introduction, the paper presents two modern challenges for AI, dubbed “the characters challenge,” which concerns accurate machine parsing and recognition of handwritten characters, and “the Frostbite challenge,” which refers to control problems related to the eponymous Atari game using a DQN (L, 2017). Then, the paper talks at length about the interesting prospect of embedding core concepts like number, space, physics, and psychology into AI in order to assist with what is referred to as the “model-building” process (explicitly contrasted against the notion of “pattern recognition”) (L, 2017). Finally, in “future directions,” the paper talks at length about the predictive power of deep learning and future prospects for further enhancing its capabilities (L, 2017). 

As a dual index into the state of the field and the minds of its researchers, this paper offers both a sophisticated account of what we might now refer to as “artificial descriptive cognition” (particularly in its cogent emphasis on “model-building”) and a number of intriguing proposals for enhancing it in the future. However—and in spite of the paper itself quoting Minsky in saying “I draw no boundary between a theory of human thinking and a scheme for making an intelligent machine”—in its 23 total sections on the present and future of AI, the paper brings up topics related to “artificial normative cognition” in only three (and this is when counting generously) (L, 2017). Two of these invocations relate to DQNs, which, by the paper’s own characterization of the class of algorithms (“a powerful pattern recognizer...and a simple model-free reinforcement learning algorithm [emphasis added]”), still derive most of their power from descriptive, not normative, computation. 

The third example comes from the paper’s discussion of using a partially observable MDP for instantiating theory of mind into AI (L, 2017). This example is particularly illustrative of the kind of oversight we might expect under an account like DSCH: the researchers seem to acknowledge that to fundamentally make sense of other minds, an agent should attempt to predict their goals and values using a POMDP (as if to say, “others’ goals and values are the most fundamental part of their minds”), and yet, in discussing how to build minds, the researchers all but ignore the instantiation of complex goals and values, instead opting to focus solely on descriptive questions of bringing about maximally competent model-building algorithms (L, 2017). 

Though the Lake paper is just a single datapoint—and in spite of the ample credit the paper deserves for its genuinely interesting proposals to innovate “artificial descriptive cognition”—the paper nonetheless supports the account DSCH provides: our intuitive descriptive theories of how to model the mind, much to our collective embarrassment, omit the evaluative, socially-enabling processes that render us distinctly human. Needless to say, the paper uses the word “vision” eight times and does not mention olfaction (L, 2017).

Human-level cognition features normative “value systems” that are equally complex—and as computationally relevant to what makes human-level cognition “human-level”—as more-familiar, descriptive “belief systems,” and yet most AI research seems to almost exclusively attend to the “algorithmization” of the latter, as DSCH would predict. As understandable as this state of affairs may be, this oversight is not only stymying the progress of the multi-billion dollar field of AI research; it is also highly dangerous from the perspective of AI safety. 

Normative Cognition as a Safety Mechanism

One of the more troubling and widely-discussed aspects of the advent of increasingly competent AI is that there is no guarantee that its behavior will be aligned with humanity’s values. While there have been numerous viable proposals for minimizing the likelihood of this kind of scenario, few involve positive projects (i.e., things to do rather than things to avoid) that straightforwardly overlap with current agendas in AI research. Devoting meaningful effort to the explicit construction of human-level normative cognition will simultaneously progress the field of AI and the adjacent mission of AI safety researchers: endowing AI systems with a value system (and means for updating it) designed in accordance with our own will vastly decrease the likelihood of catastrophic value-based misunderstandings between engineers and their algorithms. 

It is important to note that there is a reason we trust humans more than a hypothetical superintelligence (and hence support human-in-the-loop-type proposals for ensuring AI alignment): virtually all humans have a certain kind of cognition that intuitively renders them trustworthy. They generally care about others, they want to avoid catastrophe, they can err on the side of caution, they have some degree of foresight, and so on. But this is because we expect them to value these things—and to competently map these values onto their behavior. If we understood normative cognition—the cognition that enables competent valuation—we could in theory build AI systems that we would trust not to accidentally upend civilization as much as (if not far more than) human engineers, systems with a genuine sense of duty, responsibility, and caution. 

The ultimate danger of current AI approaches is that valueless and unvaluing systems are being constructed with the prayer that their behavior will happen to align with our values. This is sure to fail (or, at the very least, not succeed perfectly), especially as these systems become increasingly competent. An AGI without normative cognition would be one that we would immediately recognize as horrifyingly unbalanced: at once a genius map-maker, able to build highly complex models, and a highly foolish navigator, unable to use these models in a manner that we would deem productive—or safe. In order to build AGI whose values are aligned with our own, its intelligence must scale with its wisdom. The former, I believe, is descriptive in character; the latter, normative. Both are mutually necessary for avoiding catastrophe.      


What, then, should be done to correct this asymmetry in AI between descriptive and normative cognitive modeling? We would imagine one obvious answer to be that the field should simply spend relatively less time on pattern recognition and model-building and relatively more time on developing and formalizing normative computations of value judgment, goal pursuit, social cognition, skill acquisition, olfaction, and the like, in accordance with the foundation already laid by current RL approaches. This, I believe, is highly necessary but not alone sufficient. The simple reason is that AI researchers, for all their talents, are generally not experts in the complexities of human normative cognition—and this is not their fault. Understanding these processes has not, at least up to this point, been a skill set required to excel in the field. 

However, such experts do exist, even if they do not self-identify as such: these are predominantly the scholars and thinkers of the humanities. Before, we reasoned that within any one mind, one cannot make descriptive sense of one’s own normative cognition given a fundamental epistemological gap between the two processes; humanities scholars cleverly innovate around this problem by distilling the content of normative cognition into an external narrative, philosophy, artwork, or other text, thereby enabling investigation into the underlying mechanics of its rich normative (value-based) content. In this way, normative cognition has been studied rigorously for millennia, just not under this idiosyncratic name.

Once AI reaches the point in its near development where it will become necessary to confront questions about the implementation of higher-level goals, values, and motivations—especially in the social domain—I believe that the probability of the field’s success in instantiating human-level cognition (and doing so safely) will be proportional to its capacity to accommodate, synthesize, and ultimately “program in” the real and important insights of the humanities. Not only would this proposal for the inclusion of the humanities in the future trajectory of AI research increase the likelihood of the field’s success, but it would also enable a crucial bulwark against the possibility for profound ethical blunders that could more generally accompany the poorly understood integration of (potentially sentient, suffering-capable) minds into computational systems. 

Generally speaking, the goal of computationally instantiating human-level cognition is surely the most ambitious, profound, and evolutionarily significant in the history of humankind. Such an accomplishment would be all but certain to radically alter the trajectory of everything we care about as a species, especially if one grants the possibility of an “intelligence explosion,” which most AI researchers in fact do (Good, 1966; Muller and Bostrom, 2016). Accordingly, the construction of a human-level cognitive system must not be considered an esoteric task for clever programmers, but rather a profound responsibility of descriptively- and normatively-minded thinkers alike. In the absence of multidisciplinary collaboration on this grand project, it is overwhelmingly likely that some critical feature (or, as the DSCH posits, an entire domain) of our minds that renders them truly human will be omitted, ignored, underestimated, or never considered in the first place, the consequences of which we will be all too human to fully understand and from which we may never have the opportunity to recover. The stakes are high, and it is incumbent on researchers and thinkers of all backgrounds and persuasions to get the initial conditions right. 



Works Cited

Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230–244. https://doi.org/10.1037/0022-3514.71.2.230

Bargh, J. A., Gollwitzer, P. M., Lee-Chai, A., Barndollar, K., & Trötschel, R. (2001). The automated will: Nonconscious activation and pursuit of behavioral goals. Journal of Personality and Social Psychology, 81(6), 1014–1027. https://doi.org/10.1037/0022-3514.81.6.1014

Berg, C. (2021). Hierarchies of Motivation Predict Individuals’ Attitudes and Values: A Neuropsychological Operationalization of the Five Factor Model. PsyArXiv. https://doi.org/10.31234/osf.io/wk6tx

Cohon, R. (2018). Hume’s Moral Philosophy. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Fall 2018). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/fall2018/entries/hume-moral/

Dijksterhuis, A., & Bargh, J. A. (2001). The perception–behavior expressway: Automatic effects of social perception on social behavior. In Advances in experimental social psychology, Vol. 33 (pp. 1–40). Academic Press.

Ferreira, F., & Tanenhaus, M. K. (2007). Introduction to the special issue on language–vision interactions. Journal of Memory and Language, 57(4), 455–459. https://doi.org/10.1016/j.jml.2007.08.002

Flegal, K. E., & Anderson, M. C. (2008). Overthinking skilled motor performance: Or why those who teach can’t do. Psychonomic Bulletin & Review, 15(5), 927–932. https://doi.org/10.3758/PBR.15.5.927

Good, I. J. (1966). Speculations Concerning the First Ultraintelligent Machine. In Advances in Computers (Vol. 6, pp. 31–88). Elsevier. https://doi.org/10.1016/S0065-2458(08)60418-0

Harris, J. L., Bargh, J. A., & Brownell, K. D. (2009). Priming effects of television food advertising on eating behavior. Health Psychology, 28(4), 404–413. https://doi.org/10.1037/a0014399

Huettig, F., Guerra, E., & Helo, A. (n.d.). Towards Understanding the Task Dependency of Embodied Language Processing: The Influence of Colour During Language-Vision Interactions. Journal of Cognition, 3(1). https://doi.org/10.5334/joc.135

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2016). Building Machines That Learn and Think Like People. ArXiv:1604.00289 [Cs, Stat]. http://arxiv.org/abs/1604.00289

Lawson, R. P., Mathys, C., & Rees, G. (2017). Adults with autism overestimate the volatility of the sensory environment. Nature Neuroscience, 20(9), 1293–1299. https://doi.org/10.1038/nn.4615

Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., Pickles, A., & Rutter, M. (n.d.). The Autism Diagnostic Observation Schedule–Generic: A Standard Measure of Social and Communication Deficits Associated with the Spectrum of Autism. Retrieved May 15, 2021, from https://link.springer.com/content/pdf/10.1023/A:1005592401947.pdf

Miller, L. K. (1999). The Savant Syndrome: Intellectual impairment and exceptional skill. Psychological Bulletin, 125(1), 31–46. https://doi.org/10.1037/0033-2909.125.1.31

Minsky, M. (1961). Steps toward Artificial Intelligence. Proceedings of the IRE, 49(1), 8–30. https://doi.org/10.1109/JRPROC.1961.287775

Minsky, M. (1988). Society Of Mind. Simon and Schuster.

Müller, V. C., & Bostrom, N. (2016). Future Progress in Artificial Intelligence: A Survey of Expert Opinion. In V. C. Müller (Ed.), Fundamental Issues of Artificial Intelligence (pp. 555–572). Springer International Publishing. https://doi.org/10.1007/978-3-319-26485-1_33

Olofsson, J. K., & Gottfried, J. A. (2015). The muted sense: Neurocognitive limitations of olfactory language. Trends in Cognitive Sciences, 19(6), 314–321. https://doi.org/10.1016/j.tics.2015.04.007

Perl, O., Mishor, E., Ravia, A., Ravreby, I., & Sobel, N. (n.d.). Are humans constantly but subconsciously smelling themselves? Retrieved May 14, 2021, from https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2019.0372

Pylyshyn, Z. W. (n.d.). Computation and Cognition. The MIT Press. Retrieved May 5, 2021, from https://mitpress.mit.edu/books/computation-and-cognition

Stuss, D. T., & Knight, R. T. (2013). Principles of Frontal Lobe Function. OUP USA.

Tranel, D., Damasio, A. R., Damasio, H., & Brandt, J. P. (1994). Sensorimotor skill learning in amnesia: Additional evidence for the neural basis of nondeclarative memory. Learning & Memory, 1(3), 165–179. https://doi.org/10.1101/lm.1.3.165

Wilson, E. O. (1999). Consilience: The Unity of Knowledge. Vintage Books.

Yong, E. (2015, November 6). Why Do Most Languages Have So Few Words for Smells? The Atlantic. https://www.theatlantic.com/science/archive/2015/11/the-vocabulary-of-smell/414618/


What role should LW play in AI Safety?

October 4, 2021 - 05:21
Published on October 4, 2021 2:21 AM GMT

Many people on LW consider AI Safety either the most, or one of the most, important issues that humanity has to deal with. Surprisingly, I've seen very little discussion about how the LW community slots in here. I'm sure that the Lightcone team has discussed this extensively, but very little of that discussion has made it onto the forum. I hope that they write up some more of their thoughts at some point, so that the community can engage with them, but since there hasn't been much written on this topic, I'll focus mostly on how I see it.

I think a good place to begin would be to list the different ways that the Less Wrong community contributes or has contributed towards this project. By the LW community, I mean the broader rationalsphere, although I wouldn't include people who have just posted on LW once or twice without reading it or otherwise engaging with the community:

a) By being the community out of which MIRI arose
b) By persuading a significant number of people to pursue AI safety research either within academia or outside of it
c) By donating money to AI Safety organisations
d) By providing a significant number of recruits for EA
e) By providing an online space in which to explore self-development
f) By developing rationality tools and techniques useful for AI safety (incl. CFAR)
g) By improving communication norms and practices
h) By producing rationalist or rationalist-adjacent intellectuals who persuade people that AI Safety is important
i) By providing a location for discussing and sharing AI Safety research
j) By creating real-world communities that provide for the growth and development of participants
k) By providing people a real-world community of people who also believe that AI safety is important
l) By providing a discussion space free from some of the political incentives affecting EA

Some of these purposes seem to have been better served by the EA community. For example, I expect that the EA community is currently ahead in terms of the following:

a) Building new institutions that focus on AI safety
b) Donating money to AI Safety organisations
c) Recruiting people for AI Safety research

The rationality community may very well be ahead of EA in terms of having produced intellectuals who persuaded people that AI Safety is important, but I would expect EA and the academic community to be more important going forward.

I think that LW should probably focus more on the areas where it has a comparative advantage, taking into account our strengths and weaknesses.

I would list the strengths of the LW community compared to EA as the following:

  • Greater development of and stronger filter for rationality (hopefully we aren't all just wasting our time)
  • Greater intellectual focus and intelligence filter
  • Less subject to political and public relations incentives
  • Stronger concentration of mathematical and programming skills

And I would list our weaknesses as:

  • Less practical focus and operations ability
  • Less co-ordination and unity
  • Less ability to operate in the social landscape
  • Less engagement with academic philosophy

I would list the strengths of the rationality community compared to the academic AI Safety community as the following:

  • Greater development of and stronger filter for rationality
  • Greater focus on actually solving the problem and less temptation to dress up previous research as relevant
  • Less overhead associated with academic publication and faster ability to iterate
  • Less pressure to publish for the sake of publishing
  • Less pressure to maintain respectability

And I would list our weaknesses as:

  • Fewer technical skills
  • Less ability to access funding sources outside the AI Safety/Rationality/EA communities
  • More likely to be attempting to make progress on the side of our day jobs

Given this situation, how should LW slot into the AI safety landscape?

(I know that the ontology of the post is a bit weird as there is overlap between LW/EA/Academia, but despite its limitations, I still feel that this frame is useful)


Trust and The Small World Fallacy

October 4, 2021 - 03:38
Published on October 4, 2021 12:38 AM GMT

When people encounter other people they recognize, they exclaim "small world!"

I suspect that most people have 300 acquaintances or fewer. I probably have under 100. Still, sometimes I run into people I know and I'm tempted to say "small world".

But it's not actually a small world, is it? It's an unimaginably enormous world.

I mean that literally. You cannot imagine how big the world is.

You're not likely to meet a million people in your life. If you were to meet 100 strangers in 8 hours, you would have less than 5 minutes to spend with each person. If you met 100 strangers every day including weekends, with no vacation days, it would take over 27 years to meet a million people.
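The arithmetic above can be checked in a few lines. This is only a sketch: the function names are made up for illustration, and the 365.25-day year is an assumption (the post does not say which year length it used).

```python
MINUTES_PER_HOUR = 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def minutes_per_person(people_per_day: int, hours_per_day: float) -> float:
    """Time available per stranger if the day is split evenly."""
    return hours_per_day * MINUTES_PER_HOUR / people_per_day


def years_to_meet(target: int, people_per_day: int) -> float:
    """Years needed to meet `target` people with no days off."""
    return target / people_per_day / DAYS_PER_YEAR


print(minutes_per_person(100, 8))               # 4.8
print(round(years_to_meet(1_000_000, 100), 1))  # 27.4
```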

How many of those million people would you be able to remember after you've been meeting 100 of them every day for 27.4 years? A few hundred, maybe? A few thousand if you have an especially good memory? It seems to me that even after you've met a million people, your brain is already too small to properly comprehend the thing you just accomplished.

And a million people is nothing in this world. This world has over 7,000 million people. It's truly beyond imagination.

There was a time when the entire global anti-vax movement was centered around a single man who wrote a single paper citing the opinion of 12 parents that perhaps the combination MMR (measles, mumps, and rubella) vaccine caused a combination autism and bowel disease, or as the paper put it, "chronic enterocolitis in children that may be related to neuropsychiatric dysfunction." Among other anomalies, this man took unusual steps like holding a press conference about his N=12 study "Early Report", having a "publicist" answer his phone, and filing a patent for a measles vaccine months before publishing his paper.

At that time you could argue that we should Beware The Man of One Study. Science produces many studies, including many that suffer from a small sample size, and even some with large biases. Some studies are even fraudulent. Did you know that over 100,000 papers have been published on the topic of climate change? The point is, any reasonable person won't take a single study as proof (though it is still evidence).

Of course, it's not as if "Beware The Man of One Study" would have ever been an effective argument against an anti-vaxxer, even back then. Somehow, the original claim that "the combination MMR vaccine is related to a bowel disease and autism, and we should give kids 3 single vaccines instead" morphed into "the MMR vaccine causes autism" which turned into "vaccines cause autism". The man of one study "early report" became the global movement of zero studies. And the telephone game alone can't explain this transformation. In an actual telephone game, the last child in line will not insist that what they heard is obviously the real truth and that the rest of the class is engaged in a coverup, nor will the child suspect that maybe the conspiracy goes all the way up to the principal's office. So if somebody can explain why anyone bought into "all vaccines cause autism" in the first place, I'm all ears. (Post hoc ergo propter hoc, obviously, but what's hard to explain is extreme confidence based on basically no evidence.)

So, kudos to those skeptical of an idea supported by only one study or blog post.

It's not enough though.

If there is just one crank or quack with a degree in science or medicine for every hundred ordinary scientists, how many is that?

Very roughly, there are 11 million people with science degrees in the U.S. alone, and if 1 out of every hundred is a crank or quack, that would be 110,000 cranks and quacks with science degrees, including roughly 6,500 cranks and quacks with science PhDs in the U.S. alone. I don't have a good estimate of the prevalence of quackery or crankery, but even if it were only 0.1%, we'd still have 11,000 cranks and quacks with science degrees and 650 with science PhDs in the U.S. That's the nature of living in a Giant World.
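A small sketch of the same estimate, so the rate can be varied. The 11 million figure is the post's rough number. The 650,000 PhD count is an assumption back-derived from the post's "6,500 at 1%", not an independently sourced statistic, and the function name is illustrative.

```python
SCIENCE_DEGREE_HOLDERS = 11_000_000  # the post's rough U.S. figure
SCIENCE_PHD_HOLDERS = 650_000        # ASSUMPTION: implied by the post's 6,500 at a 1% rate


def expected_cranks(population: int, crank_rate: float) -> int:
    """Expected number of cranks/quacks in a population at a given rate."""
    return round(population * crank_rate)


for rate in (0.01, 0.001):
    degrees = expected_cranks(SCIENCE_DEGREE_HOLDERS, rate)
    phds = expected_cranks(SCIENCE_PHD_HOLDERS, rate)
    print(f"rate {rate:.1%}: {degrees:,} with science degrees, {phds:,} with science PhDs")
```

Even at the lower rate, the absolute numbers are large enough to fill a long guest list.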

This leads me to propose the Small World Fallacy: the feeling that if you see a long parade of scientists or doctors proposing the same ideas over and over, that idea must surely be correct.

It's the Chinese Robber Fallacy in reverse. The Chinese Robber Fallacy allows you to demonize a group by writing out a parade of negative facts about the group you want to demonize. Like demonizing Chinese people by talking about each and every robbery recorded in the world's largest country. Or if we wanted to demonize cardiologists, we'd dig up every accusation and conviction made against any cardiologist:

It takes a special sort of person to be a cardiologist. This is not always a good thing.

You may have read about one or another of the “cardiologist caught falsifying test results and performing dangerous unnecessary surgeries to make more money” stories, but you might not have realized just how common it really is. Maryland cardiologist performs over 500 dangerous unnecessary surgeries to make money. Unrelated Maryland cardiologist performs another 25 in a separate incident. California cardiologist does “several hundred” dangerous unnecessary surgeries and gets raided by the FBI. Philadelphia cardiologist, same. North Carolina cardiologist, same. 11 Kentucky cardiologists, same. Actually just a couple of miles from my own hospital, a Michigan cardiologist was found to have done $4 million worth of the same. Etc, etc, etc.

My point is not just about the number of cardiologists who perform dangerous unnecessary surgeries for a quick buck. It’s not even just about the cardiology insurance fraud, cardiology kickback schemes, or cardiology research data falsification conspiracies. That could all just be attributed to some distorted incentives in cardiology as a field. My point is that it takes a special sort of person to be a cardiologist.

Consider the sexual harassment. Head of Yale cardiology department fired for sexual harassment with “rampant bullying”. Stanford cardiologist charged with sexually harassing students. Baltimore cardiologist found guilty of sexual harassment. LA cardiologist fined $200,000 for groping med tech. Three different Pennsylvania cardiologists sexually harassing the same woman. Arizona cardiologist suspended on 19 (!) different counts of sexual abuse. One of the “world’s leading cardiologists” fired for sending pictures of his genitals to a female friend. New York cardiologist in trouble for refusing to pay his $135,000 bill at a strip club. Manhattan cardiologist taking naked pictures of patients, then using them to sexually abuse employees. New York cardiologist secretly installs spycam in office bathroom. Just to shake things up, a Florida cardiologist was falsely accused of sexual harassment as part of feud with another cardiologist.

And yeah, you can argue that if you put high-status men in an office with a lot of subordinates, sexual harassment will be depressingly common just as a result of the environment. But there’s also the Texas cardiologist who pled guilty to child molestation. The California cardiologist who killed a two-year-old kid. The author of one of the world’s top cardiology textbooks arrested on charges Wikipedia describes only as “related to child pornography and cocaine”.

Then it gets weird. Did you hear about the Australian cardiologist who is fighting against extradition to Uganda, where he is accused of “terrorism, aggravated robbery and murdering seven people”? What about the Long Island cardiologist who hired a hitman to kill a rival cardiologist, and who was also for some reason looking for “enough explosives to blow up a building”?

Like I said, it takes a special sort of person.

Of course, to prove that our reporting is fair and balanced, we also acknowledge that cardiologists sometimes help people. #NotAllCardiologists

Using this technique in reverse, we seek out the many cranks and quacks who agree with us (just so long as they have academic credentials), gather them all together on the same blog, TV channel or documentary, and sing praises to their credentials and their bravery for coming forward despite the risks to their career. As for any who disagree with us, we simply don't invite them. (Though if we do want the appearance of legitimacy, we could also invite a token voice from the other side. In that case we can talk over them, or edit out their key arguments, or try to goad them into anger so that we appear to be the reasonable ones, or invite an expert in a certain field (e.g. glaciology) and then counter him with arguments about related fields (e.g. ocean science) that the expert doesn't know much about. Or we can simply take advantage of the fact that most scientists are not stars of their college debate club, and face the scientist off against a quack with years of experience in debate and salesmanship.)

So that's the Small World Fallacy. Related to it is what I will call the Gish Fallacy, named after the Gish Gallop: a series of arguments delivered in rapid succession so that there are too many arguments for your debate opponent to address. The Gish Fallacy, then, is to believe that a long series of arguments constitutes good evidence that a belief is true. (Plus there's another small world fallacy, where e.g. 1,000 deaths is treated as a large number in a country of 330 million people, while inconveniently high numbers are stated as a percentage of the population instead. Probably this trick has another name.)

By themselves, the Small World fallacy and the Gish Fallacy aren't very interesting, because they can be understood as reasonable consequences of how humans process information. Each new piece of information fits into either a mental model or (more often) a story/narrative, which any good Bayesian would recognize as evidence for the proposition(s) supported by that mental model or narrative.

In other words, it's more likely that you would hear people say "vaccines cause autism" in a world where vaccines do cause autism than in a world where they don't. It's also more likely that you would see a parade of doctors talking about the dangers of vaccines in a world where vaccines are dangerous than in a world where they aren't.

So there's actually nothing wrong with believing those doctors and coming away thinking that "vaccines cause autism" or "spike protein is dangerous" or even "Covid vaccine could be worse than the disease". This is all fine! Believing this can be perfectly reasonable under circumstances in which you've accidentally received a biased stream of information.

It's just that...

We don't live in that world.

In our world you hear both "vaccines cause infertility" and "there's no evidence vaccines cause infertility" (autism is so 1998 — try to keep up), and then somehow you pick one of those statements and are completely confident that you picked correctly.

The problem comes when someone provides evidence that a particular vaccine could possibly cause infertility and you completely ignore it. (When I heard this, I didn't ignore it, I listened closely and remain open to evidence to this day. It's just that I need much more evidence than "one guy said this on a blog and then some other guys cited the blog.")

By the same token, the problem comes when someone provides evidence that the guy who said "the ovaries get the highest concentration" of vaccine LNPs was lying. At this point, is your response to refuse to acknowledge even a chance that he isn't trustworthy?

If so, you may be a proud member of at least half the population (including, no doubt, some LessWrong fans). I'm not talking about the minority of Americans who refuse Covid vaccines — I'm talking about the majority who ignore evidence, regardless of political stripe.

What do we call this behavior? Tim Urban calls it The Attorney on the Thinking Ladder. Scott Alexander calls it a trapped prior, maybe forgetting his earlier musings about related medical conditions.

Whatever it is, it's a real problem that causes real conflict and real deaths. I would go so far as to say that lousy epistemic practice, on the whole, not only kills people, but is the root cause of most suffering and early death in the world.

Case in point: My uncle — and former legal guardian, a man who I grew up with for 8 years and who gave me my first real job — died last week after spending weeks on a ventilator following a Covid-19 infection and stroke. I will be attending the funeral tomorrow.

Like my own father, my uncle was unvaccinated.

Will his brother's death affect my father's views on vaccination? I doubt it. I predict he will blame the stroke and the hospital staff for refusing to give him drugs such as ivermectin (if they didn't give him ivermectin; I really have no idea.) "Covid wasn't what killed him", he will say, "and vaccines are still dangerous".

My dad, you see, has been watching his very own Small World Fallacy, a "faith-based" TV channel called Daystar with its own dedicated anti-vax web site. It features a parade of opinions from people called "doctor", bringing far-left luminaries like Robert Kennedy Jr together with the Evangelical Right, plus gospel truths from the original anti-vaxxer Andrew Wakefield in the film "Vaxxed".

In summary:

  • After you filter out one side of a debate, the other side is still a very large group that can be used to create the Small World Fallacy: an impression of tremendous evidence based on the sheer number of proponents of a theory. It's often paired with the Gish Fallacy: an impression of tremendous evidence created by a large number of arguments.
  • Therefore, to the extent that an information source filters out ideas/analyses based simply on what conclusion those ideas/analyses lead to, a large collection of supporters and arguments presented for a theory do not prove or disprove the theory, but should reduce your confidence in the trustworthiness of the source. Even if you like the source, it could be misleading you.
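One way to see the summary above is as a Bayesian update in odds form. The sketch below is a toy model, not anything from the post: it assumes experts endorse a claim independently, and every probability in it is an illustrative guess. If the source shows an unfiltered sample, each endorsement multiplies the odds. If the source only ever shows endorsers, the parade looks the same whether the claim is true or false, so the likelihood ratio is 1 and the posterior stays at the prior.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def parade_likelihood_ratio(n_endorsers: int,
                            p_endorse_if_true: float,
                            p_endorse_if_false: float,
                            source_filters: bool) -> float:
    """Likelihood ratio for seeing n endorsers in a row.

    ASSUMPTION: endorsements are independent. A filtering source shows
    endorsers regardless of the truth, so the observation carries no evidence.
    """
    if source_filters:
        return 1.0
    return (p_endorse_if_true / p_endorse_if_false) ** n_endorsers


prior = 0.05  # illustrative prior that the claim is true
for filters in (False, True):
    lr = parade_likelihood_ratio(20, 0.9, 0.01, source_filters=filters)
    print(f"source filters: {filters}, posterior: {posterior(prior, lr):.3f}")
```

With these made-up numbers, twenty endorsers from an unfiltered source push the posterior to nearly 1, while the same twenty from a filtering source leave it at 0.05.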

But all of this leaves us in a pickle. Without becoming experts ourselves, how are we supposed to tell which side of the debate is right?

  • Even if the mainstream media were trustworthy, it lost most of its funding when the internet arrived. It not only competes with unpaid bloggers like myself, but faces a mentality that "information should be free".
  • The CDC and FDA have said and done boneheaded things throughout the pandemic. When, how and why can we trust anything they say?
  • Scientists and journalists are paid! Can we trust them anyway, or should we put our faith in bloggers who make wild accusations for free? Or maybe we should trust the private sector? "Greed is good", so any research they fund must be kosher?

The non-answer to this is "trust no one". But most people use "trust no one" as an excuse to believe whatever the hell they want.

Here are some practices I would advocate:

First, don't trust any source that consistently sides with one political party or one political ideology, because Politics is the Mind Killer.

Second, more generally, be suspicious of a source that filters out information according to whether it points toward The Desired Conclusion. Such sources aren't useless, but are certainly not to be trusted. Prefer to read sources without obvious biases. Spend time looking for a variety of opinions, and hang out with smart people who share your disdain for echo chambers.

Third, consider scientists (and other experts talking about their own field) to be generally more trustworthy than non-scientists (full disclosure: I'm not a scientist), and consider scientists as a group to be more trustworthy than any individual scientist.

I'm not saying you can trust any random scientist. And yes there is a replication crisis, and social science doesn't have a good reputation. But it seems like a great many people think that you can trust a non-scientist because they sound trustworthy, or speak with confidence, or tell a good story, or most dangerously, share your politics.

In other words, people think they can ignore credentials and trust someone who "speaks to their gut", when in fact this is a great way to end up believing bullshit. Another way people screw up is to think someone is trustworthy because they use a lot of technical language that sounds scientific. Unfortunately, this is ambiguous; they might be truthful, or they might be using fancy words in an effort to look smart. Even someone who has the university degree of an expert, and has published papers in a field, might be a crank in that same field (though cranks often hop over to nearby fields). And while only a small minority of scientists are cranks, cranks have a tendency to attract far more attention than non-cranks. It's not necessarily that cranks are more charismatic, but they are always very confident and have very strong views, and it seems like a lot of influencers are attracted to confident people who sound trustworthy, tell a good story, share their politics and make bold statements. Thus, cranks rise to the top.

The fact that many scientists are awful communicators who are lousy at telling stories is not a point against them. It means that they were more interested in figuring out the truth than figuring out how to win popularity contests.

So, trust scientific consensus where available. However, scientific consensus information is often hard to find, or no one has gathered it. Plus, information you are told about consensus could be biased. I heard, for instance, that there was a 97% consensus about something, but it turned out to be more like a 90% consensus give-or-take when I researched it. That's still pretty decent, but importantly, it turned out that the other 10%-ish were highly disunified, often proposing different explanations; there was no serious competing theory for them to rally around.

And this brings me to another reason why scientists tend to be more trustworthy: they tend to have "gears-level models", i.e. their understanding of the world is mechanistic, like a computer; it's the kind of understanding that allows predictions to be made and checked, which in turn allows their models to be refined over time (or in some cases thrown out completely) when they make prediction errors. Unlike layperson explanations or post-hoc rationalizations, this allows scientific models to improve over time, until eventually all scientists end up believing the same thing. This is not groupthink; careful scientific thinking and experiments allow different people to arrive at the same conclusion independently. In contrast, many people calling themselves "independent thinkers" come up with suspiciously different physical mechanisms to justify their suspiciously similar beliefs.

Fourth, if you can't figure out what the consensus is, but you still want to know if a theory is true, research two bold claims from that theory in some detail — the first two bold claims will do nicely. Ideally, however, don't pick claims from an obvious crank or you'll bias your own conclusion; pick the most reasonable-sounding version of the theory you know of. Search Google Scholar, email experts, read a textbook about the topic of interest, or call a random professor in a random university on the goddamn phone if that's what it takes.

But the detail is the important thing. People are normally motivated to stop their research when they have "proven" the conclusion they like. For many people this just means posting an article to Facebook because the headline spoke to them, so in comparison you probably think you're some kind of genius for searching on YouTube for a controversial claim and finding a video supporting or refuting it. Sorry, that's not enough. Keep digging until you know a lot of detail about at least one of those claims. Where did it come from? How much evidence is there? Is there a competing theory for the same evidence? How often do scientists agree or disagree? Does readily-available data fit the theory? Does readily-available data fit a competing theory? It may sound like a lot of work, and it could be, but if you really care about the topic, you are only researching two claims and you should be able to push through it. This is called epistemic spot checking, and it works pretty well because bullshitters usually lie a lot. Therefore every bold claim from a bullshitter is much more likely to be false than true, and two truthful bold claims in a row show that the source is either truthful or unusually lucky. (If it turns out that one claim is true and the other is false, chances are the theory can't be trusted, but check a third claim to be sure.)
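The logic of spot checking can be sketched as a two-hypothesis Bayesian update. All the numbers here are assumptions chosen for illustration: the post only says a bullshitter's bold claims are "much more likely to be false than true", so 0.3 stands in for that, 0.9 for a truthful source, and the 50/50 prior is arbitrary.

```python
def posterior_truthful(checks, prior=0.5,
                       p_true_if_truthful=0.9,
                       p_true_if_bullshitter=0.3):
    """Probability the source is truthful after checking some bold claims.

    `checks` is a sequence of booleans: True if the checked claim held up.
    ASSUMPTION: claims are independent given the type of source, and the
    default probabilities are illustrative guesses, not measured values.
    """
    like_truthful = 1.0
    like_bullshitter = 1.0
    for held_up in checks:
        like_truthful *= p_true_if_truthful if held_up else 1 - p_true_if_truthful
        like_bullshitter *= p_true_if_bullshitter if held_up else 1 - p_true_if_bullshitter
    numerator = prior * like_truthful
    return numerator / (numerator + (1 - prior) * like_bullshitter)


print(round(posterior_truthful([True, True]), 3))   # 0.9
print(round(posterior_truthful([True, False]), 3))  # 0.3
```

Under these assumptions, two claims that check out move a 50% prior to 90%, and one hit plus one miss drops it to 30%, which is why a third check is worth doing in the mixed case.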

Fifth, look for people who have a history of good forecasting. Predicting the future is hard, so a person who proves they are good at predicting the future has also proven a penchant for clear thought. (Now, can anyone tell me how to find blogs written by superforecasters?)

Sixth, read the sequences to improve yourself. Internalize the 12 virtues of rationality and all that. This stuff isn't perfect, but I don't know of anything better.

Seventh, if you read this all the way through, your epistemology was probably pretty good in the first place and you hardly needed this advice. Nevertheless I do want to stress that "who should I trust?" is a question whose difficulty is wildly underestimated, and the fact that 100 million people can so vehemently disagree with another 100 million people about simple factual questions like "does it cause autism?" is evidence of this.

Eighth, there really should be more and better methods available than those above. For instance, research is hard, peer-reviewed articles are jargon-filled to the point of incomprehensibility, and we shouldn't all have to do separate individual research. Someday I want to build an evidence-aggregation web site so we can collectively work out the truth using mathematically sane crowdsourcing. Until then, see above.


An analysis of the Less Wrong D&D.Sci 4th Edition game

October 4, 2021 - 03:03
Published on October 4, 2021 12:03 AM GMT

This is an analysis of the game described in this post - you should read that first.

I trained a predictor on most of the games, and used the rest to validate. The predictor has an out-of-sample AUC of 0.79, which is not fantastic, but if these are really supposed to be games played by people, we can hardly hope that using just the team compositions to predict who won would get close to perfect accuracy. The post describing the game gives a single Blue team composition to optimize against. Once I had the predictor, I generated all possible Green teams, and evaluated how each would do against that composition. That gave me win probabilities for every Green team against the one single Blue team.
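For readers unfamiliar with AUC: it is the probability that a randomly chosen won game gets a higher predicted score than a randomly chosen lost game, with ties counting half. The post does not say which model or library was used, so the sketch below is just the textbook pairwise definition in plain Python, run on made-up labels and scores.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a win, 0 for a loss. scores: predicted win probabilities.
    Quadratic in the number of games, which is fine for a small validation set.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one win and one loss")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out data (illustrative only, not from the game dataset).
held_out_labels = [1, 0, 1, 0, 1]
held_out_scores = [0.9, 0.4, 0.35, 0.2, 0.7]
print(round(auc(held_out_labels, held_out_scores), 3))  # 0.833
```

An AUC of 0.5 means the scores carry no ranking information; 1.0 means every win is ranked above every loss.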

The predictor has some calibration problems:

Although the model has OK calibration, it's pretty underconfident at the low end, and pretty overconfident at the high end. But I'm sick of dinking with the model, so let's push on, pretend it's perfectly calibrated, and talk about its predictions as win probabilities.

Blue has a good team

So, first, Blue has quite the team. There are 11,628 (choose-5-from-19) possible team comps for Green, but against the Blue comp from the original post, fewer than 700 of these have a better-than-even win probability:

This is really surprising. More on this later.

So here's my recommendation to the Green team: The team with the highest probability of winning (against the Blue team Dire Druid, Greenery Giant, Phoenix Paladin, Quartz Questant, Tidehollow Tyrant) is Arch-Alligator, Greenery Giant, Landslide Lord, Nullifying Nightmare, and Phoenix Paladin. The predicted win probability for that Green team is around 75%. (Considering the calibration plot above, maybe we should move this to 78%.)

Let's see what some winning Green compositions look like. There are 132 teams with a predicted win probability of 60% or greater (60% chosen arbitrarily). Here's how often each character shows up in these top teams:

Each team has 5 of the 19 characters, so each character shows up in 5/19 = 26.3% of all possible compositions. But Arch-Alligator shows up in 87% of these winning compositions! That's about 3.3x as often as if we picked teams randomly. (Measured against a baseline of 1/19 = 5.3% instead, the ratio would come out as 16x, and every character's ratio would be inflated fivefold.) So Arch-Alligator is extremely strong against this Blue composition for some reason, Fire Fox and onward are bad against it, and the other characters are of varying effectiveness. I'm surprised how many characters have a ratio of more than 1 here - I don't understand that. Naively, I'd've expected half to have a ratio of less than 1, especially since most compositions have less than 50% win probability against this Blue team.
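The "how much more often than chance" ratio can be computed as in the sketch below. The function name `appearance_ratios` is made up for this example, and the baseline is taken as `team_size / n_characters`, because a random 5-character team contains any given character with probability 5/19.

```python
from collections import Counter


def appearance_ratios(top_teams, n_characters=19, team_size=5):
    """Ratio of each character's share of top teams to the random baseline.

    A ratio of 1.0 means the character appears in the top teams exactly as
    often as it would in randomly drawn teams.
    """
    baseline = team_size / n_characters
    counts = Counter(name for team in top_teams for name in team)
    n_teams = len(top_teams)
    return {name: (count / n_teams) / baseline
            for name, count in counts.items()}


# Made-up toy example: 4 characters, teams of 2, so the baseline is 0.5.
toy = appearance_ratios([("A", "B"), ("A", "C")], n_characters=4, team_size=2)
print(toy)
```

Characters that never appear in the top teams are simply absent from the result, which corresponds to a ratio of 0.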

How well do individual heroes do against each other?

Up till now we've been looking at how well different compositions do against the one specific Blue composition that we think the team we're playing will choose. But now forget them for a moment, and instead, let's look at this video game as a whole: how well do the different heroes do against each other?

I looked at the win probabilities for all kinds of different matchups, pitting 1,400 random Green teams against 1,400 different random Blue teams, for a total of 1,400^2 ≈ 2 million games. Then I took the average win probabilities for each individual character on each Green team against each individual character on the Blue team. For example, to evaluate the matchup of Nullifying Nightmare versus Tidehollow Tyrant, I took the mean win probability of the ~140,000 games between them in my generated set of 2 million, and in these 140K games the rest of each team was random.
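A rough sketch of that averaging, under stated assumptions: `predict` is any function returning Green's win probability for a (green, blue) pair of teams, the character names are placeholders, and `sample_teams` and `matchup_means` are names invented for this example. The demo uses far fewer teams than the 1,400 per side described above so that it runs quickly.

```python
import random
from collections import defaultdict
from itertools import product


def sample_teams(characters, n_teams, team_size=5, seed=0):
    """Draw random teams (no repeated character within a team)."""
    rng = random.Random(seed)
    return [tuple(rng.sample(characters, team_size)) for _ in range(n_teams)]


def matchup_means(green_teams, blue_teams, predict):
    """Mean Green win probability for every (green char, blue char) pair.

    Each game's probability is credited to all pairs formed by one Green
    member and one Blue member of that game, then averaged per pair.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for green, blue in product(green_teams, blue_teams):
        p = predict(green, blue)
        for pair in product(green, blue):
            totals[pair] += p
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Demo with placeholder names and a constant placeholder predictor.
characters = [f"hero_{i}" for i in range(19)]
greens = sample_teams(characters, 50, seed=1)
blues = sample_teams(characters, 50, seed=2)
means = matchup_means(greens, blues, lambda green, blue: 0.5)
print(len(means), "character pairs covered")
```

The resulting dictionary is what a heatmap like the one below would be drawn from: one cell per (Green character, Blue character) pair.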

Nullifying Nightmare is super OP. Some other characters are overpowered too: Blaze Boy, Greenery Giant, Tidehollow Tyrant, and kinda Warrior of Winter all have favorable matchups against most other heroes. (Except for Nightmare, it can be easier to see this by looking at the rows instead of the columns: Blaze Boy's row [near the bottom] is mostly blue, which says that Green's win probability is lower than 50% for most of those matchups.)

The opponent Blue team has two of the four overpowered characters

In light of the heatmap, maybe we can start to explain why the Blue composition from the previous section was favorable against most teams: it has Greenery Giant and Tidehollow Tyrant, two of the four overpowered individual picks. What if we made teams just from the four most overpowered characters, plus anyone else in the fifth slot? Turns out most of these teams would rock:

The Blue team from the post only has two of these OP characters, but about the same predicted win percentage, which I don't really understand. But a good strategy is stuffing your team with heroes from these top four, and Blue is halfway there.

Arch-Alligator's effectiveness against the opponent's Blue team is confusing

From the dominance of Arch-Alligator in the previous section, I thought I'd find that he was OP, but he's actually right near 50% for most matchups. Here's something else weird: Alligator is not particularly favorable against any of the Blue team's characters! See how the squares for the Blue team from the post are blue (lower probability) in Arch-Alligator's column. He's an especially bad pick against Greenery Giant. Being a bad independent matchup against each individual team member, but great against the overall composition, suggests there are some team synergies going on that this heatmap is missing (or that I've made some error). Alligator being good against this Blue team, despite being average overall in individual match-ups, suggests one or more of the following:

  • Alligator and one or more of the other members of the team (Greenery Giant/Landslide Lord/Nullifying Nightmare/Phoenix Paladin) work well together in general.
  • Alligator and 1+ other team members work well together against this particular Blue composition.
  • Alligator doesn't work particularly well with other members of the Green team, but is a counter for some combination of the Blue team members. For example, maybe two Blue team members can pair up and charge for 5 seconds in order to unleash some powerful ability, and Alligator is good at interrupting this pairing.

I have some ideas on how to look into this, but it feels hard, I've been writing this post all afternoon, and aphyer is posting the answers two days from now, so I probably won't get to it!

Notes on this analysis

I used a style of analysis called surrogate analysis here. Instead of interrogating the data directly, I trained a model on the data, and interrogated the model. The downside of this is that if the model is bad, the analysis will lead you into strange places that don't correspond to reality. The upside is that you can ask the model questions you couldn't ask the data directly. For example, there are 65K matchups in the dataset here, but the total number of possible matchups is 19-choose-5 squared, which is 135 million. The surrogate model can give answers on all those games! If team compositions not present in the data are totally different from the ones present, then the model will give bad answers to them, but if the interactions and patterns from the data are generalizable, the surrogate can be really helpful. I think this book covers surrogate modeling well, though I've only read a few chapters.
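To make the coverage gap concrete, the counts above can be reproduced in a few lines. The 65,000 figure is the rounded dataset size quoted in the text, and treating every recorded game as a distinct matchup is an assumption that makes the share an upper bound.

```python
from math import comb

teams_per_side = comb(19, 5)            # 5-character teams from 19 characters
possible_matchups = teams_per_side ** 2  # ordered (Green team, Blue team) pairs
observed_matchups = 65_000               # approximate figure from the text

coverage = observed_matchups / possible_matchups
print(f"{teams_per_side:,} teams per side")
print(f"{possible_matchups:,} possible matchups")
print(f"observed share of matchups: at most {coverage:.4%}")
```

So the data covers well under a tenth of a percent of the possible games, and everything else the surrogate says is extrapolation.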

I used the gradient-boosting algorithm XGBoost because it runs fast and I have good experiences with it. Any booster would be fine, as would a random forest, but random forest in R is comparatively slow and I didn't want to wait around too long while re-training the model as I iterated.

Here's my (messy) code.