LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 22 hours 27 minutes ago

Randomized Zettelkasten Test

July 21, 2021 - 21:59
Published on July 21, 2021 5:21 PM GMT

Motivation and Method

I read the book How to Take Smart Notes a couple of years ago and have tried to keep my notes together in a Zettelkasten ever since. I currently use Obsidian, where I keep 2500 individual notes collected over two or three years: some abstract, so that they can hook up to almost anything, and some as concrete as possible, to capture a specific insight.

However, despite the enjoyment of keeping my notes in one place and looking at all of them in a big network, very few of my additions have come from connections between notes. I want to see if I've hit diminishing returns or if original insights are still available with this method. 

To that end I'm generating 25 random numbers between 1 and 2500, collecting the results to make a toy Zettelkasten a hundredth the size of the full one, and seeing how many meaningful ideas can be developed from it. Playing fast and loose with priors here, a graph with 25 vertices can have a maximum of 300 edges, and by Sturgeon's Revelation I'd expect only 10% of these to be good. So: if I find 30 worthwhile connections or products, I'll be satisfied. 
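
The arithmetic in that paragraph can be checked with a short script. This is only a minimal sketch: it assumes the notes are numbered 1 to 2500 and drawn without repeats (the post does not say how duplicates were handled), and the function names are illustrative rather than anything the author used.

```python
import random
from math import comb

TOTAL_NOTES = 2500     # size of the full Zettelkasten, from the post
SAMPLE_SIZE = 25       # the 1% sample
GOOD_FRACTION = 0.10   # Sturgeon's Revelation: roughly 10% worth keeping


def draw_sample(total=TOTAL_NOTES, k=SAMPLE_SIZE, seed=None):
    """Pick k distinct note numbers in 1..total (assumption: no repeats)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total + 1), k))


def max_edges(n):
    """Number of unordered pairs among n notes, i.e. n * (n - 1) / 2."""
    return comb(n, 2)


def target_connections(n=SAMPLE_SIZE, fraction=GOOD_FRACTION):
    """Expected number of worthwhile connections under the 10% guess."""
    return round(max_edges(n) * fraction)


if __name__ == "__main__":
    print(draw_sample(seed=0))       # 25 note numbers to pull from the vault
    print(max_edges(SAMPLE_SIZE))    # 300 possible pairwise connections
    print(target_connections())      # 30 connections to aim for
```

Counting only pairs keeps the estimate simple; connections that involve three or more notes would raise the ceiling above 300.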

Now to roll.

The 1%-Zettelkasten
  1. Satellites (onto-cartography) - ontological machines caught up in the "gravity" of "bright objects". (See this post for more details.)
  2. Ancestors don't die young - for some definition of "young"; all of our ancestors lived long enough to reproduce.
  3. Biennials - plants that take two years to fully grow. An important distinction if you ever need to plant crops for food.
  4. "Death extinguishes envy" - a quote that stuck with me, though it's not fully true. Death removes us from the social world and solidifies a much kinder impression of us than the average person might have during our lives.
  5. Diffuse responsibility - the well-discussed state where 100 people seeing a problem will do nothing because they each feel 1/100th the responsibility.
  6. New senses make new media - speculation. A new medium can be developed by finding some new material or way of arranging things, but if it were possible, for example, to feel microscopic etchings with one's fingers, I would expect a new artform to arise from it.
  7. Phenomenological barriers are symmetric - just as cisgender people can't fully feel what it would be like to be transgender (because to have the experience with any fidelity would make them no longer cisgender), transgender people can't completely fathom being cisgender. (Or replace cis and trans with gay and straight or another binary of your choice.)
  8. "Type A" and "Type B" drivers - the folk distinction between drivers who move as fast as possible towards their destination and those who prefer to enjoy the ride.
  9. Campaign progression - tabletop advice: a campaign should proceed in steps, with the players creating situations with greater control or freedom only to deal with increasingly nasty threats.
  10. Three-person fair coin tosses - the process of flipping a coin twice, where one person wins on HH, another HT, and a third TH. On TT, you start over.
  11. Voting antipatterns - conditions where voting systems produce outcomes people don't want: someone no one wants gets elected (Dark Horse), one candidate soaks up votes for the majority choice (vote-splitting), the winner by some metrics gets squeezed out on either side (center squeeze), two people team up to eliminate a third but whoever helps more will lose when they go head-to-head (Burr dilemma), or pairwise winners form a cycle with no majority victor (Condorcet cycle).
  12. Wonder rooms became museums - the progression from eccentric intellectuals collecting interesting things to those things becoming a public educational resource.
  13. Do the obvious - see this post.
  14. Jungian extension - idle thought after briefly reading about Jung's concept of the anima as the unconscious feminine part of male psychology (or the animus as the masculine part of female psychology): shouldn't the anima possess an animus of its own, the inner feminine having an inner-inner masculine with an inner-inner-inner feminine, etc.?
  15. Mereological fungibility - the state of a thing being interchangeable with others of its class, type, kind, etc., regardless of ordering; the condition of paying ten one-dollar bills where any one could be the last without affecting the outcome.
  16. Maze cryptography - idea for creating a simple Morse-like code based on cardinal directions, then turning a message into the correct solution to a maze.
  17. Medical anatomy improves, artistic anatomy doesn't - while there is always more art to be made, our understanding of all the sketchable parts of a person is complete, while our understanding of a person's internal workings is still expanding.
  18. Catachronism - recently coined, the redefinition of the past and present in terms of a potential future.
  19. Unmotivated attackers - the security fairy-tale of attackers who simply give up after their first few tries, or at all.
  20. Aleatoric novels - early choose-your-own-adventure style novels where readers proceeded in the text by flipping coins at certain points. Or so I'd thought - aleatoricism does exist as a method of using randomness in creative work, but I can't find any mention of coin-flipping in novels and I'm not sure where I got this impression.
  21. Band names should be used for other collaborations - idle thought.
  22. Current-event entertainments - kinds of entertainment that can be checked daily or semi-daily and that involve moment-to-moment relevance, like streamers or political commentators. Arguably includes soap operas.
  23. Derangement (permutations) - when a set of items undergoes a permutation and every item winds up in a different place than it started.
  24. Extend a stone in atari - beginner Go advice, essentially not to let a single stone be surrounded.
  25. The Multiple-Stage Fallacy - outlined in this post, the phenomenon of driving the probability of anything to near-zero by breaking it into "stages", assigning probabilities to each "stage" and multiplying them, without allowing updates on likelihood once earlier "stages" are passed.
  • 1-4. Death is a bright object - many things in life get inescapably defined in relation to death, in the same way that they're defined by being on Earth or by the nature of time.
  • 1-5. Responsibility (onto-cartography) - too many agents in one system, or at a certain moment in the system's history, can lead to very little action.
  • 1-18. Catachronism as temporal gravity - whatever future we expect defines our actions in the present, and likely future states of a system can affect its behavior in the here and now.
  • 2-3. Biennials are successful - it seems to work as a growth strategy.
  • 2-4. Our ancestors were envious - they had more time to play these social games and have all the available negative reactions to each other.
  • 2-11. Evolution as avoidance of antipatterns - granting of course that evolution also involves running into many many antipatterns and then dying.
  • 4-13. Death as escape from social world - taking death here more metaphorically, the motivation behind someone deleting an account and making a new one under a different name, or a famous author publishing under a pseudonym.
  • 4-18. Death defines life up to death - treading much the same ground as before, whatever the destination of life is will inevitably define the journey of it.
  • 5-13. They're waiting on you - the obvious trained reaction to noticing you're a bystander in some situation, taking the burden of action onto yourself because someone has to.
  • 5-15. Psychological fungibility of strangers - perhaps the problem of diffuse responsibility is not that we ourselves feel less responsible, but that we see everyone else in the crowd as interchangeable and assume that like us they would do something in this situation if it were presented to them alone.
  • 5-19. Extramotivation in foes - just as some people can take it upon themselves to help when everyone else is paralyzed, expect that some people will use this special level of agency to work toward goals against your own.
  • 6-7. Fewer senses make new media - speculative again, but it's possible to communicate things through a lack of understandable media, whether by painting messages only the colorblind can read or by including sections that are unreadable as an artistic point.
  • 8-9. Type A and type B players - really a gross generalization of the player typology seen in most game-mastering guides, but could eventually converge on this genre of advice given enough nuance.
  • 9-11. Group campaign failures - collaborative storytelling can go in directions no one wants it to.
  • 9-18. Campaign attractors - likely end states of a campaign, usually either of it being left unfinished or of a certain ending being achieved.
  • 9-20. RPGs as aleatoric - the injection of randomness into these games being one of the key things that separate them from other kinds of collaborative art.
  • 9-22. Play-by-post as current-event entertainment - possibly lacking relevance to the outside world, but changing daily and encouraging regular checkups (unless it's scheduled differently).
  • 10-16. Coin flip cryptography - it's possible to encode coin flips into Morse, so if someone could communicate a large series of them discreetly then a message could be sent.
  • 11-18. Steering toward antipatterns - in a system you want to crash, you should look for common failure modes (there will likely be several flavors) and move towards them.
  • 12-13. Turn wonder rooms into museums - when you have a collection of eccentric odds and ends that it's fun to show people, make it into a resource that can enrich a wider audience.
  • 12-17. Medicine as extension of art - (Edit: I forgot to include the details on this one) the connection between the human form and curing illnesses was not always historically obvious, and had to be developed as those who studied the former informed those who studied the latter.
  • 12-21. Galleries as art - galleries are mostly named after places, people or subjects, but there seems to be plenty of room for them to take on some more stylistic elements.
  • 13-17. Consider nonimproving corpuses - distinguish between bodies of knowledge that are still being completed and bodies of knowledge that have moved on to being more about themselves - the difference between group theory and James Joyce studies.
  • 13-18. Aim for a contextualizing future - look for a destination that redefines the journey as good.
  • 13-24. Look for beginner advice - self-explanatory.
  • 15-23. Fungibility is derangibility - kind of a cheat; when you can do things in any order you can do them in any order.
  • 15-25. Negative or Zenoic mereology - arguments that involve splitting things into infinitesimal parts to prey on our intuitions about parts that are only very small.
  • 19-24. Unmotivated opponents - the opponents imagined in "wouldn't it be nice" play, where you make moves in the hope that your strategy is not discovered instead of making moves that are best whether they're understood or not.
  • 23-25. Multiple Stage-ing when derangible - breaking things into stages and multiplying their probabilities seems less of a sin if those stages could happen in any order / the probabilities are independent of each other. (Typically if you're trying to break something into smaller pieces you'll find that they're dependent on each other, though.)
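
Note 10 in the list above (three-person fair coin tosses) is a small rejection-sampling procedure, and it can be sketched in a few lines. HH, HT and TH each come up with probability 1/4, so once TT is discarded each person wins with probability 1/3. The function name and the 0/1/2 labels for the three people are illustrative assumptions.

```python
import random


def three_way_toss(rng=None):
    """Return 0, 1 or 2 with equal probability using pairs of fair coin flips."""
    rng = rng or random.Random()
    while True:
        pair = rng.choice("HT") + rng.choice("HT")
        if pair == "HH":
            return 0   # first person wins
        if pair == "HT":
            return 1   # second person wins
        if pair == "TH":
            return 2   # third person wins
        # pair == "TT": nobody wins, so flip both coins again


if __name__ == "__main__":
    rng = random.Random(0)
    wins = [0, 0, 0]
    for _ in range(30_000):
        wins[three_way_toss(rng)] += 1
    print(wins)   # each count should land near 10,000
```

Each round succeeds with probability 3/4, so on average the procedure needs 4/3 rounds (about 2.7 flips) to pick a winner.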

I have 29 results, which is suspiciously in line with such an off-the-cuff prediction, so there's probably some motivated stopping here - take it as my having reached the finish line, not testing to see how much can be produced in total. 

The prediction I made doesn't quite hold anyway, because a Zettelkasten in the wild would also include connections between these new results and the old ones. I didn't consider that at the start, but in hindsight I'd expect there to be further, diminishing growth from here. 

The usefulness of these combinations might be questionable, as are the original 25, but I don't curate my notes very heavily and take them on many subjects. It might also be objected that my connections are not fully connective, or that they introduce too much outside information, but I remain skeptical that trying to undergird extremely disparate fields will lead to anything other than statements of the obvious or category theory.

All in all, I'm surprised to see this working. At first I found almost no combinations at all, and it was only through specific effort that I produced most of these. It took probably two hours distributed across a couple of days to generate all 29, which implies that the kind of work I'm doing here takes more energy generally than reading off a list and being struck by inspiration. Mulling over connections this way isn't the best use of my time, typically, but I'm glad to see that it works roughly as advertised. 


Ask Not "How Are You Doing?"

July 21, 2021 - 20:53
Published on July 21, 2021 5:53 PM GMT

Like the human body, English conversation retains certain vestigial features. Some of these are malignant, in that they impede lively discourse. Here I address the most common and damaging example I know of: the phrase, “How are you doing?”

This is a staple greeting. Throughout most of the United States and beyond, the phrase follows “hello” almost by reflex. It makes sense in theory. Asking about subjective well-being gives us immediate access to our conversation partner’s personal life, which supports building relationships. Furthermore, the question’s vagueness politely allows plenty of room to choose a topic.

This may be why the phrase has been widespread at least since Shakespeare[1]. Alas, far gone are the days of Queen Gertrude’s response: “One woe doth tread upon another’s heel… ” Instead, the typical modern answer sounds something like this: “fine.” In fact, I rarely hear any other response.

Fine‽ I think, flabbergasted. You could have talked about anything! You could have launched into a rant about the weather or the nature of well-being in society! You could have pursued the opportunity to make an ally, spread an idea, or build rapport on common ground! Instead, you combust all the myriad branches of possibility with a single syllable: “fine”!

I should probably mention that, until very recently, I always made this mistake. Sometimes, an honest answer feels like a social gaffe, so “fine” is the safe reply. But that reply wastes time and contributes nothing to breaking the ice. We might as well just leave it at “hello.”

Allow me to offer an alternative. The next time you greet someone, don’t resort to this inefficient ghost of a greeting. Don’t ask “how are you doing?” Ask instead “what are you doing?”

This simple change has many benefits:

  • Its newness catches your conversation partner off-guard, luring them out of the chilly “stranger mode” of conversation and into a truly open discussion.
  • The focus lingers on their work. Rather than tear their attention from the task at hand, they can discuss it.
  • It’s concrete, which leads to better thinking habits.

In the spirit of balanced inquiry, let’s look at the drawbacks:

  • It draws long and expository conversations compared to the traditional “how are you doing” question. If your intent is acknowledgment rather than discussion, you may prefer “how.” However, I would object that asking someone about their health when you don’t actually care is dishonest. Maybe don’t do that.
  • Sometimes you already know what someone is doing. They’re visibly walking to class or eating. You can modify the question in these situations: “what’s your next class,” or “what will you do after lunch?” A little thought easily supplies strong conversation starters.
  • It may come off as nosy. I think this is actually pretty rare, but make sure to pay attention to context and the person you’re talking to.

These drawbacks are all limited. In my experience, it’s almost always more effective to ask “what are you doing” rather than “how are you doing.” If you retain this habit over time, you may experience more enlightening conversations and a slightly enlivened social life.

[1] See Hamlet IV.VII, line 175: https://www.bartleby.com/46/2/47.html


The shoot-the-moon strategy

July 21, 2021 - 19:19
Published on July 21, 2021 4:19 PM GMT

Sometimes you can solve a problem by intentionally making it "worse" to such an extreme degree that the problem goes away. Real-world examples:

  • I accidentally spilled cooking oil on my shoe, forming an unsightly stain. When soap and scrubbing failed to remove it, I instead smeared oil all over both shoes, turning them a uniform dark color and thus "eliminating" the stain.
  • Email encryption can conceal the content of messages, but not the metadata (i.e. the fact that Alice sent Bob an email). To solve this, someone came up with a protocol where every message is always sent to everyone, though only the intended recipient can decrypt it. This is hugely inefficient but it does solve the problem of metadata leakage.
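
The "send everything to everyone" idea in the second example can be illustrated with a toy model. This is a sketch under loud assumptions, not the protocol the post alludes to (which is not named): it assumes each recipient shares a secret key with the sender, and it uses a hand-rolled SHA-256 keystream plus HMAC purely for illustration. It is not secure enough for real use, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one shared secret (toy)."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not vetted."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append a MAC so the right key can recognise its own mail."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def try_open(key: bytes, packet: bytes):
    """Return the plaintext if the packet was sealed with this key, else None."""
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = packet[:16], packet[16:-32], packet[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None   # not addressed to this key
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def broadcast(packet: bytes, inbox_keys: dict) -> dict:
    """Deliver the same packet to every user; each one attempts to open it."""
    return {name: try_open(key, packet) for name, key in inbox_keys.items()}


if __name__ == "__main__":
    keys = {name: secrets.token_bytes(32) for name in ("alice", "bob", "carol")}
    packet = seal(keys["bob"], b"hello Bob")
    print(broadcast(packet, keys))   # only bob recovers the message
```

An observer sees the same packet delivered to all three inboxes, so the delivery pattern reveals nothing about who the recipient is. The price is that every user does work for every message, which is the inefficiency the post mentions.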

Hypothetical examples:

  • If I want to avoid spoilers for a sports game that I wasn't able to watch live, I can either assiduously avoid browsing news websites, or I can use a browser extension that injects fake headlines about the outcome so I don't know what really happened.
  • If a politician knows that an embarrassing video of them is about to leak, they can blunt its effect by releasing a large number of deepfake videos of themself and other politicians.

The common theme here is that you're seemingly trying to get rid of X, but what you really want is to get rid of the distinction between X and not-X. If a problem looks like this, consider whether shooting the moon is a viable strategy.


Reward splintering for AI design

July 21, 2021 - 19:13
Published on July 21, 2021 4:13 PM GMT

This post will look at how model splintering can be used by an AI to extend human-specified rewards beyond its training environment, and beyond the range of what humans could do.

The key points are:

  • Most descriptive labels (eg "happiness", "human being") describe collections of correlated features, rather than fundamental concepts.
  • Model splintering is when the correlated features come apart, so that the label no longer applies so well.
  • Reward splintering is when the reward itself is defined on labels that are splintering.
  • We humans deal with these issues ourselves in various ways.
  • We may be able to program an AI to deal with these in similar ways, using our feedback as needed, and extending beyond the point where we can no longer provide it with useful feedback.

Section 1 will use happiness as an example, defining it as a bundle of correlated features and seeing what happens when these start to splinter. Section 2 defines model and reward splintering in more formal terms. And finally section 3 will analyse how an AI could detect reward splintering and deal with it.

1. What is happiness? A bundle of correlated features

How might we define happiness today? Well, here's one definition:

Happiness is that feeling that comes over you when you know life is good and you can't help but smile. It's the opposite of sadness. Happiness is a sense of well-being, joy, or contentment. When people are successful, or safe, or lucky, they feel happiness.

We can also include some chemicals in the brain, such as serotonin, dopamine, or oxytocin. There are implicit necessary features there as well, that are taken as trivially true today: happiness is experienced by biological beings, that have a continuity of experience and of identity. Genuine smiles are good indicators of happiness, as is saying "I am happy" in surveys.

So, what is happiness, today? Just like rubes and bleggs, happiness is a label assigned to a bunch of correlated features. You can think of it as similar to the "g factor", an intelligence measure that is explicitly defined as a correlation of different cognitive task abilities.

And, just like rubes and bleggs, those features need not stay correlated in general environments. We can design or imagine situations where they easily come apart. People with frozen face muscles can't smile, but can certainly be happy. Would people with anterograde amnesia be truly happy? What about simple algorithms that print out "I am happy" forever? Well, there it's a judgement call. A continuity of identity and consciousness are implicit aspects of happiness; we may decide to keep them or not. We could define the algorithm as "happy", with "happiness" expanding to cover more situations. Or we could define a new term, "simple algorithmic happiness", say, that carves out that situation. We can do the same with anterograde amnesia (my personal instinct would be to include anterograde amnesia in a broader definition of happiness, while carving off the algorithm as something else).

Part of the reason to do that is to keep happiness as a useful descriptive term - to carve reality along its natural joints. And as reality gets more complicated or our understanding of it improves, the "natural joints" might change. For example, nationalities are much less well defined than, say, eye colour. But in terms of understanding history and human nature, nationality is a key concept, eye colour much less so. The opposite is true if we're looking at genetics. So the natural joints of reality are shifting depending on space and time, and also on the subject being addressed.

Descriptive versus prescriptive

The above looks at "happiness" as a descriptive label, and how it might come to be refined or split. There are probably equations for how best to use labelled features in the descriptive sense, connected to the situations the AI is likely to find itself in, its own processing power and cost of computation, how useful it is for it to understand these situations, and so on.

But happiness is not just descriptive, it is often prescriptive (or normative): we would want an AI to increase happiness (among other things). So we attach value or reward labels to different states of affairs.

That makes the process much more tricky. If we say that people with anterograde amnesia don't have "true happiness", then we're not just saying that our description of happiness works better if we split it into two. We're saying that the "happiness" of those with anterograde amnesia is no longer a target for AI optimisation, i.e. that their happiness can be freely sacrificed to other goals.

There are some things we can do to extend happiness as preference/value/reward across such splintering:

  1. We can look more deeply into why we think "happiness" is important. For instance, we clearly value it as an interior state, so if "smiles" splinter from "internal feeling of happiness", we should clearly use the second.
  2. We can use our meta-preferences to extend definitions across the splintering. Consistency, respect for the individual, conservatism, simplicity, analogy with other values: these are ways we can extend the definition to new areas.
  3. When our meta-preferences become ambiguous - maybe there are multiple ways we could extend the preferences, depending on how the problem is presented to us - we might accept that multiple extrapolations are possible, and that we should take a conservative mix of them all, and accept that we'll never "learn" anything more.

We want to program an AI to be able to do that itself, checking in with us initially, but continuing beyond human capacity when we can no longer provide guidance.

2. Examples of model splintering

The AI uses a set F of features that it creates and updates itself. Only one of them is assigned by us - the feature R, the reward function. The AI also updates the probability distribution Q over these features (this defines a generalised model, M=(F,Q)). It aims to maximise the overall reward R.

When talking about M=(F,Q), unless otherwise specified, we'll refer to whatever generalised model the AI is currently using.

We'll train the AI with an initial labelled dataset D of situations; the label is the reward value for that situation. Later on, the AI may ask for a clarification (see later sections).

Basic happiness example

An AI is trained to make humans happy. Or, more precisely, it interacts with the humans sequentially, and, after the interaction, the humans click on "I'm happy with that interaction" or "I'm disappointed with that interaction".

So let fh∈{0,1} be a Boolean that represents how the human clicked, let fπ be the AI's policy, and, of course, R is the reward. So F={fh,fπ,R}.

In the training data D, the AI generates a policy, or we generate a policy for it, and the human clicks on happiness and disappointment. We then assign a reward of 1 to a click on happiness, and a reward of 0 on a click of disappointment (thus R=fh on D). In this initial training data, the reward and fh are the same.

The reward extends easily to the new domain

Let's first look at a situation where model changes don't change the definition of the reward. Here the AI adds another feature fm, which characterises whether the human starts the interaction in a good mood. Then the AI can improve its Q to see how fh and fπ interact with fm; presumably, a good initial mood increases the chances of fh=1.

Now the AI has a better distribution for fh, but no reason to doubt that fh is equivalent with the reward R.

A rewarded feature splinters

Now assume the AI adds another feature, fs, which checks whether the human smiles or not during the interaction; add this feature to F. Then, re-running the data D while looking for this feature, the AI realises that fs=True is almost perfectly equivalent with R=1.

Here the feature on which the reward depends has splintered: it might be fh that determines the reward, or it might be a (slightly noisy) fs. Or it might be some mixture of the two.

Though the rewarded feature has splintered, the reward itself hasn't, because fh and fs are so highly correlated: maximising one of them maximises the other, on the data the AI already has.

The reward itself splinters

To splinter the reward, the AI has to experience multiple situations where fs and fh are no longer correlated.

For instance, fs can be true without fh if the smiling human leaves before filling out the survey (indeed, smiling is a much more reliable sign of happiness than the artificial measure of filling out a survey). Conversely, bored humans may fill out the survey randomly, giving positive fh without fs.
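As a rough illustration of the splintering described above, here is a minimal sketch in which two candidate rewards agree everywhere on the training data D but come apart on new situations. The records, the names `r_click` and `r_smile`, and the `disagreement` helper are all invented for this example; they are not part of the original setup.

```python
# Hypothetical toy data: each situation records the click feature f_h
# and the smile feature f_s. All values are made up for illustration.

def r_click(situation):
    """Candidate reward that follows the survey click f_h."""
    return situation["f_h"]

def r_smile(situation):
    """Candidate reward that follows the smile f_s."""
    return situation["f_s"]

# Training data D: the two features always coincide here.
D = [
    {"f_h": 1, "f_s": 1},
    {"f_h": 0, "f_s": 0},
    {"f_h": 1, "f_s": 1},
    {"f_h": 0, "f_s": 0},
]

# More general situations, where the features come apart.
new_situations = [
    {"f_h": 0, "f_s": 1},  # smiling human leaves before the survey
    {"f_h": 1, "f_s": 0},  # bored human clicks "happy" at random
    {"f_h": 1, "f_s": 1},  # ordinary case, features still agree
]

def disagreement(situations):
    """Fraction of situations on which the two candidate rewards differ."""
    differing = sum(r_click(s) != r_smile(s) for s in situations)
    return differing / len(situations)

print(disagreement(D))               # no disagreement on the training data
print(disagreement(new_situations))  # substantial disagreement off it
```

On D alone, nothing distinguishes the two candidates; only the new situations reveal that the reward has splintered.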

This is the central example of model splintering:

  • Multiple features could explain the reward in the training data D. But these features are now known to come apart in more general situations.

Independent features become non-additive

Another situation with model splintering would be where the reward is defined clearly by two different features, but there is never any tension between them - and then new situations appear where they are in tension.

Let's imagine that the AI is a police officer and a social worker, and its goal is to bring happiness and enforce the law. So let F={fh,fπ,fm,fs,fl,R} where fl is a feature checking whether the law was enforced when it needs to be.

In the training data D, there are examples with fh being True or False, while fl is undefined (no need to enforce any law). There are also examples with fl being True or False, while fh is undefined (no survey was offered). Whenever fh and fl were True, the reward R was 1, while it was 0 if they were False.

Then if the AI finds situations where both of fh and fl are defined, it doesn't know how to extend the reward, especially if their values contradict each other.
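To make the ambiguity concrete, here is a small sketch with two hypothetical ways of extending the reward (`r_min` and `r_mean`, both invented for this example). They agree on every training case, where only one of fh and fl is defined, but disagree as soon as both are defined and contradict each other. `None` stands for "undefined".

```python
from typing import Optional

def r_min(f_h: Optional[bool], f_l: Optional[bool]) -> float:
    """Extension A (assumed): reward 1 only if every defined feature is True.

    At least one of the two features is expected to be defined.
    """
    defined = [v for v in (f_h, f_l) if v is not None]
    return float(all(defined))

def r_mean(f_h: Optional[bool], f_l: Optional[bool]) -> float:
    """Extension B (assumed): average of the defined features.

    At least one of the two features is expected to be defined.
    """
    defined = [v for v in (f_h, f_l) if v is not None]
    return sum(defined) / len(defined)

# Training data D: exactly one feature is defined in each example.
training = [(True, None), (False, None), (None, True), (None, False)]
agree_on_D = all(r_min(h, l) == r_mean(h, l) for h, l in training)

# A new situation: the human is happy, but the law was not enforced.
conflict = (True, False)

print(agree_on_D)         # both extensions fit the training data
print(r_min(*conflict))   # extension A's reward for the conflict
print(r_mean(*conflict))  # extension B's reward for the conflict
```

Nothing in D favours one extension over the other, which is exactly why the AI cannot tell how to extend the reward.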

Reconvergence of multiple reward features

It's not automatically the case that as the AI learns more, rewards have to splinter more. Maybe the AI can develop another feature, fa, corresponding to human approval (or approval from its programmers). Then it can see fh and fl as being specific cases of fa - its programmers approve of humans saying they're happy, and of the law being enforced. In that case, the AI could infer a more general reward that also covers situations where fh and fl are in contradiction with each other.

Changes due to policy choices

We've been considering situations where features have become uncorrelated just because the AI has got a broader understanding of the world. But we also need to consider situations where the AI's own policy starts breaking some of the correlations that otherwise existed.

For example, we could split fl, enforce the law, into fL, a descriptive feature describing the law, and fe, a feature measuring whether that law is enforced.

Then in its training data, fL is fixed (or changes exogenously). But when the AI gets powerful, fL suddenly becomes malleable, dependent on its policy choices. This is another form of model splintering, one that we might want the AI to treat with extra caution and conservatism.

3. Dealing with model splintering

Detecting reward splintering

Reward splintering occurs when there are multiple ways of expressing R, on the labelled data D, and they lead to very different rewards in the world in general.

So we could have multiple reward functions r, all consistent with R over D. We could define a distance function d(r,R) which measures how far apart r and R are on D, and a complexity measure c(r). Then the ''goodness of fit'' of r to R could be


Thus reward functions have higher fit if they are close to R on the labelled data D, and are simpler to express. Then define m(r) as the maximum expected value of r (using Qi to compute expectations), if the AI were to use an r-maximising policy.

Then the following is always positive, and gives a good measure of the divergence between maximising the weighted mixes of the r, versus maximising the individual r's:


When that metric hits a certain threshold, the AI knows that significant reward splintering has occurred.
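The following sketch shows one way such a metric could be computed on a toy problem. Everything specific here is an assumption made for illustration: the exponential form of the fit, the normalisation of the fits into weights, the finite set of three policies, and all the numbers. The divergence is taken to be the fit-weighted sum of the individual maxima m(r) minus the maximum of the fit-weighted mix, which can never be negative because the maximum of a weighted sum is at most the weighted sum of the maxima.

```python
import math

# Expected value of each candidate reward under each of three policies.
# Keys are candidate rewards, list positions are policies. Numbers invented.
expected = {
    "r_click": [1.0, 0.2, 0.6],
    "r_smile": [0.1, 1.0, 0.6],
}
distance = {"r_click": 0.0, "r_smile": 0.1}    # assumed d(r, R) on D
complexity = {"r_click": 1.0, "r_smile": 1.0}  # assumed c(r)

def fit(name):
    """Assumed fit: decreasing in both distance and complexity."""
    return math.exp(-distance[name] - complexity[name])

def m(values):
    """Maximum expected value over the available policies."""
    return max(values)

# Normalise the fits so they act as weights summing to 1 (an assumption).
total = sum(fit(name) for name in expected)
weights = {name: fit(name) / total for name in expected}

# Expected value of the weighted mix of rewards under each policy.
n_policies = 3
mix = [
    sum(weights[name] * expected[name][i] for name in expected)
    for i in range(n_policies)
]

# Weighted sum of individual maxima minus the maximum of the weighted mix.
divergence = sum(weights[name] * m(expected[name]) for name in expected) - m(mix)
print(divergence)
```

Here each candidate reward can be fully satisfied by its own policy, but no single policy does well on both, so the divergence is large.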

Dealing with reward splintering: conservatism

One obvious way of dealing with reward splintering is to become conservative about the rewards.

Since human value is fragile, we would initially want to feed the AI with some specific over-conservative method of conservatism (such as smooth minimums). After learning more about our preferences, it should learn fragility of value directly, so it could use a more bespoke method of conservatism[1].
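As a sketch of what a smooth minimum could look like, the function below uses a log-mean-exp form with a sharpness parameter `k`. This particular formula, the parameter value and the policy numbers are assumptions chosen for illustration, not necessarily the method intended above. For large `k` it approaches the plain minimum; for small `k` it approaches the mean.

```python
import math

def smooth_min(values, k=5.0):
    """Soft minimum: lies between min(values) and the mean of values."""
    n = len(values)
    return -math.log(sum(math.exp(-k * v) for v in values) / n) / k

# Expected values of two candidate rewards under three policies (invented).
policies = [
    (1.0, 0.1),  # great for the first reward, poor for the second
    (0.2, 1.0),  # poor for the first reward, great for the second
    (0.6, 0.6),  # moderate for both
]

# A conservative agent picks the policy with the best smooth minimum.
best = max(range(len(policies)), key=lambda i: smooth_min(policies[i]))
print(best)
```

With these numbers the conservative choice is the policy that is moderately good under every candidate reward, rather than one that bets everything on a single interpretation.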

Dealing with reward splintering: asking for advice

The other obvious solution is to ask humans for more reward information, and thus increase the set D on which it has reward information. Ideally, the AI would ask for information that best distinguishes between different reward functions that have high f(r) but that are hard to maximise simultaneously.
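One simple way to pick an informative question is to ask about the situation where the plausible reward functions disagree most. The sketch below scores each candidate situation by the fit-weighted variance of the rewards assigned to it; the scoring rule, the weights and the situations are all hypothetical.

```python
def r_click(situation):
    """Candidate reward that follows the survey click f_h."""
    return situation["f_h"]

def r_smile(situation):
    """Candidate reward that follows the smile f_s."""
    return situation["f_s"]

rewards = {"r_click": r_click, "r_smile": r_smile}
weights = {"r_click": 0.5, "r_smile": 0.5}  # assumed normalised fits

def best_query(candidates, rewards, weights):
    """Return the situation with the largest weighted reward variance."""
    def spread(situation):
        mean = sum(weights[n] * r(situation) for n, r in rewards.items())
        return sum(
            weights[n] * (r(situation) - mean) ** 2 for n, r in rewards.items()
        )
    return max(candidates, key=spread)

candidates = [
    {"f_h": 1, "f_s": 1},  # both rewards agree: label teaches nothing new
    {"f_h": 1, "f_s": 0},  # rewards disagree: a label here is informative
]
query = best_query(candidates, rewards, weights)
print(query)
```

A human label on the chosen situation would tell the AI which of the two candidates better matches what was intended.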

When advice starts to fail

Suppose the AI could ask question q1, that would give it labelled data D1. Alternatively, it could ask question q2, that would give it labelled data D2. And suppose that D1 and D2 would imply very different reward functions. Suppose further that the AI could deduce the Di likely to occur from the question qi.

In that case, the AI is getting close to rigging its own learning process, essentially choosing its own reward function and getting humans to sign off on it.

The situation is not yet hopeless. The AI should prefer asking questions that are 'less manipulative' and more honest. We could feed it a dataset QA of questions and answers, and label some of them as manipulative and some as not. Then the AI should choose questions that have features that are closer to the non-manipulative ones.

The AI can also update its estimate of manipulation, of QA, by proposing (non-manipulatively - notice the recursion here) some example questions and getting labelled feedback as to whether these were manipulative or not.

When advice fails

At some point, if the AI continues to grow in power and knowledge, it will reach a point where it can get the feedback Di by asking question qi - and all the qi would count as "non-manipulative" according to the criteria it has in QA. And the problem extends to QA itself - it knows that it can determine future QA and thus future qi and Di.

At this point, there's nothing that we can teach the AI. Any lesson we could give, it already knows about, or knows it could have gotten the opposite lesson instead. It would use its own judgement to extrapolate R, D, QA, thus defining and completing the process of 'idealisation'. Much of this would be things it has learnt along the way, but we might want to add an extra layer of conservatism at the end of the process.

  1. Possibly with some meta-meta-considerations that modelling human conservatism is likely to underestimate the required conservatism. ↩︎


Why is Kleros valued so low?

July 21, 2021 - 16:50
Published on July 21, 2021 1:50 PM GMT

Kleros provides a system that allows smart contracts to be written in natural language by letting randomly drafted juries evaluate the contract. It seems to me that this principle is incredibly powerful for all sorts of real-world applications, whether it's arbitrating the outcome of a prediction market or making sure that the events listed in the prediction market follow rules spelled out in natural language.

At the moment it seems like Ethereum's transaction costs are still limiting the utility of Kleros, but either further progress in Ethereum or an Ethereum Virtual Machine on another chain is likely to allow Kleros to provide its services more cheaply soon.

Kleros's market cap is currently $58,597,050 (with $73,236,896 fully diluted), which is relatively low compared to other crypto-assets, especially given how many potential real-world use cases it has compared to a lot of other blockchain projects without any. Either there's some argument against Kleros that I'm not seeing or it should be valued at 100-1000X its current value.


My Marriage Vows

July 21, 2021 - 13:48
Published on July 21, 2021 10:48 AM GMT

I'm getting married. We decided to take marriage vows very seriously, and write vows that we will be fully committed to uphold. These vows are going to be a commitment no weaker than any promise I ever made or any contract I ever signed. Therefore, it is very important to avoid serious errors in their content.

I'm interested to hear feedback of the form "making these vows might turn out to be a big mistake for you, and here is why"[1] or of the form "here is how the spirit of these vows can be implemented better". Given that this is a community which nurtures security mindset, I have great expectations :) More precisely, I am less interested in extreme nitpicking / rule-lawyering, since that should be neutralized by the Vow of Good Faith anyway (but tell me if you think I'm wrong about this!) and more in serious problems that can arise in at least semi-realistic situations. (Of course, since many of us here expect a Singularity in a few decades, semi-realistic is not a very high bar ;)

Without further ado, the vows:

I, [name], solemnly pledge to [other name] three sacred Vows as I take [pronoun] to be my [spouse]. These vows are completely sincere, literal, binding and irrevocable from the moment both of us take the Vows and as long as we both live, or until the marriage is dissolved or until my [spouse] unconscionably[2] breaks [pronoun]’s own Vows, which I believe in all likelihood will never happen. Let everyone present be my witness.

The First Vow is that of Honesty. I will never set out to deceive my [spouse] on purpose without [pronoun]’s unambiguous consent[3], without exception. I will also never withhold information that [pronoun] would in hindsight prefer to know[4]. The only exception to the latter is when this information was given to me in confidence by a third party as part of an agreement which was made in compliance with all Vows[5]. If for any reason I break my vow, I will act to repair the error as fast as reasonably possible.

The Second Vow is that of Concord. Everything I do will be according to the policy which is the Kalai-Smorodinsky solution to the bargaining problem defined by my [spouse]’s and my own priors and utility functions, with the disagreement point set at the counterfactual in which we did not marry. This policy is deemed to be determined a priori and not a posteriori. That is, it requires us to act as if we made all precommitments that would a priori be beneficial from a Kalai-Smorodinsky bargaining point of view[6]. Moreover, if I deviate from this policy for any reason then I will return to optimal behavior as soon as possible, while preserving my [spouse]’s a priori expected utility if at all possible[7]. Finally, a hypothetical act of dissolving this marriage would also fall under the purview of this Vow[8].

The Third Vow is that of Good Faith, which augments and clarifies all three Vows. The spirit of the Vows takes precedence over the letter. When there’s some doubt or dispute as to how to interpret the Vows, the chosen interpretation should be that which my [spouse] and I would agree on at the time of our wedding, in the counterfactual in which the source of said doubt or dispute would be revealed to us and understood by us with all of its implications at that time as well as we understand it at the time it actually surfaced[9].
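For readers unfamiliar with the Kalai-Smorodinsky solution mentioned in the Second Vow, here is a minimal sketch on a finite set of options. It is a discrete approximation under stated assumptions: it picks the option that maximises the smaller of the two parties' normalised gains, where each gain is measured from the disagreement point and scaled by the best that party could get. The exact solution is defined on a convex set of outcomes, and all numbers and names below are invented.

```python
def kalai_smorodinsky(options, disagreement):
    """Pick the option maximising the minimum normalised gain.

    options: list of (utility_1, utility_2) pairs for joint policies.
    disagreement: (utility_1, utility_2) if no agreement is reached.
    Assumes each party can strictly gain from some feasible option.
    """
    d1, d2 = disagreement
    # Keep only options that leave neither party worse off than no deal.
    feasible = [(u1, u2) for u1, u2 in options if u1 >= d1 and u2 >= d2]
    # Ideal point: the best each party could get within the feasible set.
    best1 = max(u1 for u1, _ in feasible)
    best2 = max(u2 for _, u2 in feasible)

    def min_normalised_gain(option):
        u1, u2 = option
        gain1 = (u1 - d1) / (best1 - d1)
        gain2 = (u2 - d2) / (best2 - d2)
        return min(gain1, gain2)

    return max(feasible, key=min_normalised_gain)

# Hypothetical joint policies and a disagreement point.
options = [(10, 2), (6, 6), (2, 9), (1, 1)]
choice = kalai_smorodinsky(options, disagreement=(1, 1))
print(choice)
```

The chosen option gives both parties a similar fraction of their best achievable gain, which is the fairness idea behind this bargaining solution.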

  1. Conditional on the assumption that my decision to marry is about as well-grounded as one can expect. I am not soliciting criticism of my choice of spouse! ↩︎

  2. Meaning that it's a grave or consistent violation rather than a minor lapse. ↩︎

  3. Consent is mentioned to allow us to e.g. play tabletop games where you're supposed to deceive each other. ↩︎

  4. That is, information X such that if the spouse knew X, they would believe it's good that they found out about it. This excludes information which is not important (knowing X is practically useless) and infohazards (knowing X is actively harmful). ↩︎

  5. If I enter an agreement with a third party in violation of the Vow of Concord, the Vow of Honesty takes precedence over the agreement and I might have to violate the latter and pay whatever fine is necessary. ↩︎

  6. We are taking an "updateless" perspective here. The disagreement point is fixed in the counterfactual in which we didn't marry in the first place, it does not move to the counterfactual of divorce. Notice also that marriage is guaranteed to be an a priori Pareto improvement over no-marriage because this is our current estimate, even if it turns out to be false a posteriori. ↩︎

  7. If the violation shifts the Pareto frontier such that the previous optimum is outside of it, the new Pareto optimum is chosen s.t. the violating party bears the cost. ↩︎

  8. This makes all of the Vows weightier than they otherwise would be. The Vows can be unmade by dissolving the marriage, but the act of dissolving the marriage is in itself subject to the Vow of Concord, which limits the ability to dissolve it unilaterally. ↩︎

  9. In other words, interpretation is according to the extrapolated volition of us at the time of our wedding, where the extrapolation is towards our knowledge and intellectual ability at the time of making the judgment. ↩︎


Entropic boundary conditions towards safe artificial superintelligence

July 21, 2021 - 13:27
Published on July 20, 2021 10:15 PM GMT

I wanted to share a recently published paper on mechanisms to decrease the likelihood of ASI being destructive towards living beings in general.


Artificial superintelligent (ASI) agents that will not cause harm to humans or other organisms are central to mitigating a growing contemporary global safety concern as artificial intelligent agents become more sophisticated. We argue that it is not necessary to resort to implementing an explicit theory of ethics, and that doing so may entail intractable difficulties and unacceptable risks. We attempt to provide some insight into the matter by defining a minimal set of boundary conditions potentially capable of decreasing the probability of conflict with synthetic intellects intended to prevent aggression towards organisms. Our argument departs from causal entropic forces as good general predictors of future action in ASI agents. We reason that maximising future freedom of action implies reducing the amount of repeated computation needed to find good solutions to a large number of problems, for which living systems are good exemplars: a safe ASI should find living organisms intrinsically valuable. We describe empirically-bounded ASI agents whose actions are constrained by the character of physical laws and their own evolutionary history as emerging from H. sapiens, conceptually and memetically, if not genetically. Plausible consequences and practical concerns for experimentation are characterised, and implications for life in the universe are discussed.

Our approach attempts to avoid direct implementation of machine ethics by harnessing the concept of causal entropic forces (i.e. forces that emerge from the microscale as a result of emergent phenomena having a mechanistic origin), and then building a set of boundary conditions for a new class of agents:

Let us define an empirically bounded ASI agent (EBAA) as a type of superintelligent agent whose behaviour is driven by a set of interlocking causal entropic forces and a minimal set of boundary conditions informed by empirical measurements of its accessible portion of the universe. Its entropic forces and its boundary conditions define and constrain its top-level goal satisfaction process.

These agents attempt to satisfy two goals:

(1) Build predictive empirical explanations for events in the accessible universe as sophisticated and generally applicable as possible.


(2) Choose histories that maximise long-term sustained information gain.

We define each boundary condition, as well as auxiliary principles that can help accelerate the search for life-friendly ASI solutions. We also provide a model in connection to the Drake equation and life in the cosmos, as well as reasoning around brain-machine interfaces.

A short teaser from the Conclusions section:

In this paper, we have developed a rigorous speculation around a viable path for the development of safe artificial superintelligence by equating intelligence with a set of embodied, local causal entropic forces that maximise future freedom of action, and by postulating top-level goals, boundary conditions and auxiliary principles rooted in our best understanding of physical laws as safeguards which are likely to remain as ASI agents increase their sophistication.

While it is almost certain that ASI agents will replace these boundary conditions and principles, those provided here appear to have a higher chance of leading to safe solutions for humans and other lifeforms, and be more directly implementable than the solutions described by research around ethics and critical infrastructure. Our main contention is that constructing ASI agents solely for the sake of human benefit is likely to lead to unexpected and possibly catastrophic consequences, and that the safer scenario is to imbue ASI agents with a desire to experience interactions with very advanced forms of intelligence.

I would be happy to share the complete paper via email to those interested.


Is there a reasonable reading according to which Baric, Shi et al 2015 isn't gain-of-function research?

July 21, 2021 - 11:19
Published on July 21, 2021 8:19 AM GMT

From the paper "A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence" by Baric, Shi et al:

Wild-type SARS-CoV (Urbani), mouse-adapted SARS-CoV (MA15) and chimeric SARS-like CoVs were cultured on Vero E6 cells (obtained from United States Army Medical Research Institute of Infectious Diseases), grown in Dulbecco's modified Eagle's medium (DMEM) (Gibco, CA) and 5% fetal clone serum (FCS) (Hyclone, South Logan, UT) along with antibiotic/antimycotic (Gibco, Carlsbad, CA). DBT cells (Baric laboratory, source unknown) expressing ACE2 orthologs have been previously described for both human and civet; bat Ace2 sequence was based on that from Rhinolophus leschenaulti, and DBT cells expressing bat Ace2 were established as described previously8. Pseudotyping experiments were similar to those using an HIV-based pseudovirus, prepared as previously described10, and examined on HeLa cells (Wuhan Institute of Virology) that expressed ACE2 orthologs. 

To me, building chimeric viruses and then infecting human cells (HeLa cells are human cells) looks like dangerous gain-of-function research. Fauci seems to argue that somehow the NIH is able to define this work as not being gain-of-function research. To me this redefinition seems to be the bureaucratic way they circumvent the gain-of-function moratorium. Before the moratorium was imposed, Fauci was arguing against it, and the moratorium wasn't imposed by anyone in the NIH or the HHS but by the Office of Science and Technology Policy. To me that looks like a way for the NIH to evade safety regulation by redefining terms, because the NIH didn't like the moratorium.

This question is about more than just assigning guilt for things that happened in 2015. If we want to prevent further risk, getting the NIH to accept that growing chimeric viruses that infect human cells is what the gain-of-function regulation is supposed to prevent seems to me to be very important. 

It's likely also a good case study for evading safety regulation, and we should think about it in other contexts as well. If we end up with AI safety regulation, how do we prevent the people causing problems from just redefining the terms so that it doesn't apply to them?

If anyone has a defense of not classifying this work as gain-of-function research I'm also happy to hear that.






The Utility Function of a Prepper

July 21, 2021 - 05:26
Published on July 21, 2021 2:26 AM GMT

Looking at the top posts on /r/prepping right now, you can find these two images:

Various "survival" gear. Good luck carrying all that wet food!


I think this one is cute, and I might talk more about it in another post.

I used to think that people who stockpile food, water, and guns out of fear are acting irrationally. But then I realized maybe they just have different utility functions than me.

I think preppers have a satisficing utility function. As an exaggerated example, imagine a utility function where you got 1 utility each day from having enough food and water, and 0 utility from not having enough food or water. If you had that utility function, you really should start stockpiling food and water immediately!
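To make the exaggerated example concrete, here is a minimal Python sketch. The function names, the 30-day window, and the 5-day disruption are hypothetical values chosen purely for illustration; they are not taken from any data.

```python
def satisficing_utility(supply, need):
    """Exaggerated utility from the text: 1 if needs are met that day, else 0."""
    return 1 if supply >= need else 0


def total_utility(daily_supplies, need):
    """Sum the daily satisficing utility over a sequence of days."""
    return sum(satisficing_utility(s, need) for s in daily_supplies)


# Hypothetical scenario: 30 days, with running water lost on the last 5.
NEED = 1            # units of water needed per day (assumed)
NORMAL_DAYS = 25
DISRUPTED_DAYS = 5

# Without a stockpile, supply drops to zero during the disruption.
no_stockpile = [NEED] * NORMAL_DAYS + [0] * DISRUPTED_DAYS
# With a stockpile, stored water covers each disrupted day.
stockpile = [NEED] * (NORMAL_DAYS + DISRUPTED_DAYS)

print(total_utility(no_stockpile, NEED))  # 25
print(total_utility(stockpile, NEED))     # 30
# Surplus beyond the threshold adds nothing under this utility function.
print(satisficing_utility(10 * NEED, NEED))  # 1
```

Under this toy function, every day below the threshold is a pure loss and nothing above the threshold is a gain, which is why covering the bad days dominates.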

Arguments about prepping often have the general form of a prepper saying "it improves the worst-case outcomes in lots of scenarios" and a non-prepper saying "but it doesn't maximize expected value!". If you accept that the source of the disagreement is preppers having satisficing utility functions, and non-preppers having maximizing utility functions, you can see the futility of that line of argument. Telling someone their utility function is wrong won't get them to change their utility function.

Once I started thinking about preppers as satisficers, I realized why they are so worried about civilizational collapse. If you feel reliant on other people to provide you with the resources you need to meet your satisficing threshold, then reducing that reliance would be a high priority.

While I have you here and we are talking about prepping, I'm going to try to convince you to store a little water in your house. Water is one of the largest (by volume) necessary inputs to human life. If the water main burst in your house, it would be highly inconvenient and would require moving. If a natural disaster took out running water in your neighborhood or city, it might be a humanitarian crisis. Buying some large jugs of water and stashing them under your bed is a very cheap intervention that has a decent chance of greatly improving your life if you lose access to running water.


Funds are available to support LessWrong groups, among others

July 21, 2021 - 04:11
Published on July 21, 2021 1:11 AM GMT

(Cross-posted from the EA Forum)

This post was written by Buck and Claire Zabel but it’s written in Buck’s voice, and “I” here refers to Buck, because it’s about grantmaking that he might do. (Claire contributed in her personal capacity, not as an Open Phil grantmaker). 

In addition to accepting applications for EA groups in some locations as part of my EAIF grantmaking, I am interested in evaluating applications from people who run groups (in-person or online, full-time or part-time) on a variety of related topics, including:

  • Reading groups, e.g. for The Precipice or Scout Mindset
  • Groups at companies, online-only groups, and other groups not based in particular geographic locations or universities.
  • Groups discussing blogs or forums that are popular with EAs, such as Slate Star Codex / Astral Codex Ten or LessWrong.
  • Longtermist-only, AI-centric or biosafety-centric groups, animal welfare groups, or other groups that address only a single EA cause area. (I might refer these applications to the Long-Term Future Fund or the Animal Welfare Fund as appropriate; both of these funds have confirmed to me that they’re interested in making grants of this type.)

I also welcome applications from people who do or want to do work for existing groups, or group organizers who want funding to hire someone else to work with them. E.g.:

  • Maintaining or overhauling group websites, if you think this is worthwhile for your particular group
  • Working 10hrs/week on a student group
  • Running group mailing lists

In cases where the project/expense isn’t a good fit for the EA Funds, but I think it’s worth supporting, I am likely able to offer alternative sources of funds.

I might stop doing this if someone appears who’s able to commit more time and thought to funding and supporting these kinds of groups, but for the time being I want to offer folks who want to work on these kinds of things a chance to request support.

I think that people who put serious time into creating high-quality groups deserve compensation for the time they put in, so please don’t let thoughts like “I only work on this for 10 hours a week” or “I’m happy to do this in a volunteer capacity” discourage you from applying. If you’re unsure if something is a reasonable fit, feel free to email me (bshlegeris@gmail.com) and ask before applying. Depending on your cost of living, ask for a rate of $20-50 per hour (this includes employer's payroll tax and would correspond to ~$15-40/h gross salary).

The EAIF application form is here; you should also feel free to email me any questions you have about this.


Punishing the good

July 21, 2021 - 02:30
Published on July 20, 2021 11:30 PM GMT

Should you punish people for wronging others, or for making the wrong call about wronging others?

For example:

  1. A newspaper sends me annoying emails all the time, but suppose that empirically if they didn’t behave like this, they would get markedly fewer subscribers, and may not survive. And suppose their survival is in fact worth a little annoyance for a lot of people, we all agree. Such that if I was in their position, I agree that I would send out the annoying emails. Should I resent them and unsubscribe from their paper for their antisocial behavior, or praise them and be friendly because overall I think they made the right call?
  2. Suppose Bob eats beef, which he thinks makes him feel somewhat better and so be better able to carry out his job as a diplomat negotiating issues in which tens of thousands of lives are at stake. He also thinks it is pretty bad for the cows, but worth it on net. Suppose he’s right about all of this. Five hundred years later, carnivory is illegal and hated, and historians report that Bob, while in other regards a hero, did eat beef. Should the people of 2521 think of Bob as an ambiguous figure, worthy of both pride and contempt? Or should they treat him as purely a hero, who made the best choice in his circumstances?

I have one intuition that says, ‘how can you punish someone for doing the very best thing they could have done? What did you want them to do? And are you going to not punish the alternative person, who made a worse choice for the world, but didn’t harm someone in the process? Are you just going to punish everyone different amounts?’

But an argument for the other side—for punishing people for doing the right thing—is that it is needed to get the incentives straight. If Alice does $100 of harm to Bruce to provide $1000 of help to Carrie, then let’s suppose that that’s good (ignoring the potential violation of property rights, which seems like it shouldn’t be ignored ultimately). But if we let such things pass, then Alice might also do this when she guesses that it is only worth $60 to Carrie, if she cares about Carrie more than Bruce. Whereas if we always punish Alice just as much as she harmed Bruce, then she will take the action exactly when she would think it worth it if it was her own welfare at stake, rather than Bruce’s. (This is just the general argument for internalizing externalities - having people pay for the costs they impose on others.)

This resolution is weirder to the extent that the punishment is in the form of social disgrace and the like. It’s one thing to charge Bob money for his harms to cows, and another to go around saying ‘Bob made the best altruistic decisions he could, and I would do the same in his place. Also I do think he’s contemptible.’

It also leaves Bob in a weird position, in which he feels fine about his decision to eat beef, but also considers himself a bit of a reprehensible baddie. Should this bother him? Should he try to reform?

I’m still inclined toward punishing such people, or alternately to think that the issue should be treated with more nuance than I have done, e.g. distinguishing punishments from others’ opinions of you, and more straightforward punishments.


Working With Monsters

July 20, 2021 - 18:23
Published on July 20, 2021 3:23 PM GMT

This is a fictional piece based on Sort By Controversial. You do not need to read that first, though it may make Scissor Statements feel more real. Content Warning: semipolitical. Views expressed by characters in this piece are not necessarily the views of the author.

I stared out at a parking lot, the pavement cracked and growing grass. A few cars could still be seen, every one with a shattered windshield or no tires or bashed-in roof, one even lying on its side. Of the buildings in sight, two had clearly burned, only blackened reinforced concrete skeletons left behind. To the left, an overpass had collapsed. To the right, the road was cut by a hole four meters across. Everywhere, trees and vines climbed the remains of the small city. The collapsed ceilings and shattered windows and nests of small animals in the once-hospital behind me seemed remarkably minor damage, relatively speaking.

Eighty years of cryonic freeze, and I woke to a post-apocalyptic dystopia.

“It’s all like that,” said a voice behind me. One of my… rescuers? Awakeners. He went by Red. “Whole world’s like that.”

“What happened?” I asked. “Bioweapon?”

“Scissor,” replied a woman, walking through the empty doorway behind Red. Judge, he’d called her earlier.

I raised an eyebrow, and waited for elaboration. Apparently they expected a long conversation - both took a few seconds to get comfortable, Red leaning up against the wall in a patch of shade, Judge righting an overturned bench to sit on. It was Red who took up the conversation thread.

“Let’s start with an ethical question,” he began, then laid out a simple scenario. “So,” he asked once finished, “blue or green?”.

“Blue,” I replied. “Obviously. Is this one of those things where you try to draw an analogy from this nice obvious case to a more complicated one where it isn’t so obvious?”

“No,” Judge cut in, “It’s just that question. But you need some more background.”

“There was a writer in your time who coined the term ‘scissor statement’,” Red explained, “It’s a statement optimized to be as controversial as possible, to generate maximum conflict. To get a really powerful scissor, you need AI, but the media environment of your time was already selecting for controversy in order to draw clicks.”

“Oh no,” I said, “I read about that… and the question you asked, green or blue, it seems completely obvious, like anyone who’d say green would have to be trolling or delusional or a threat to society or something… but that’s exactly how scissor statements work…”

“Exactly,” replied Judge. “The answer seems completely obvious to everyone, yet people disagree about which answer is obviously-correct. And someone with the opposite answer seems like a monster, a threat to the world, like a serial killer or a child torturer or a war criminal. They need to be put down for the good of society.”

I hesitated. I knew I shouldn’t ask, but… “So, you two…”

Judge casually shifted position, placing a hand on some kind of weapon on her belt. I glanced at Red, and only then noticed that his body was slightly tensed, as if ready to run. Or fight.

“I’m a blue, same as you,” said Judge. Then she pointed to Red. “He’s a green.”

I felt a wave of disbelief, then disgust, then fury. It was so wrong, how could anyone even consider green... I took a step toward him, intent on punching his empty face even if I got shot in the process.

“Stop,” said Judge, “unless you want to get tazed.” She was holding her weapon aimed at me, now. Red hadn’t moved. If he had, I’d probably have charged him. But Judge wasn’t the monster here… wait.

I turned to Judge, and felt a different sort of anger.

“How can you just stand there?”, I asked. “You know that he’s in the wrong, that he’s a monster, that he deserves to be put down, preferably slowly and painfully!” I was yelling at Judge, now, pointing at Red with one hand and gesticulating with the other. “How can you work with him!?”

Judge held my eyes for a moment, unruffled, before replying. “Take a deep breath,” she finally said, “calm yourself down, take a seat, and I’ll explain.”

I looked down, eyed the tazer for a moment, closed my eyes, then did as she asked. Breathe in, breathe out. After a few slow breaths, I glanced around, then chose a fallen tree for a seat - positioning Judge between Red and myself. Judge raised an eyebrow, I nodded, and she resumed her explanation.

“You can guess, now, how it went down. There were warning shots, controversies which were bad but not bad enough to destroy the world. But then the green/blue question came along, the same question you just heard. It was almost perfectly split, 50/50, cutting across political and geographical and cultural lines. Brothers and sisters came to blows. Bosses fired employees, and employees sued.  Everyone thought they were in the right, that the other side was blatantly lying, that the other side deserved punishment while their side deserved an apology for the other side’s punishments. That they had to stand for what was right, bravely fight injustice, that it would be wrong to back down.”

I could imagine it. What I felt, toward Red - it felt wrong to overlook that, to back down. To let injustice pass unanswered.

“It just kept escalating, until bodies started to pile up, and soon ninety-five percent of the world population was dead. Most people didn’t even try to hole up and ride out the storm - they wanted to fight for what was right, to bring justice, to keep the light in the world.”

Judge shrugged, then continued. “There are still pockets here and there, where one side or the other gained the upper hand and built a stronghold. Those groups still fight each other. But most of what’s left is ruins, and people like us who pick over them.”

“So why aren’t you fighting?” I asked. “How can you overlook it?”

Judge sighed. “I was a lawyer, before Scissor.” She jerked her head toward Red. “He was too. We even came across each other, from time to time. We were both criminal defense attorneys, with similar clients in some ways, though very different motivations.

“Red was… not exactly a bleeding heart, but definitely a man of principles. He’d made a lot of money early on, and mostly did pro-bono work. He defended the people nobody else would take. Child abusers, serial killers, monsters who everyone knew were guilty. Even Red thought they were guilty, and deserved life in prison, maybe even a death sentence. But he was one of those people who believed that even the worst criminals had to have a proper trial and a strong defense, because it was the only way our system could work. So he defended the monsters. Man of principle.

“As for me, I was a mob lawyer. I defended gangsters, loan sharks, arms dealers… and their friends and families. It was the families who were the worst - the brothers and sons who sought sadistic thrills, knowing they’d be protected. But it was interesting work, the challenge of defending the undefendable, and it paid a fortune.

“We hated each other, back in the day. Still do, on some level. He was the martyr, the white knight putting on airs of morality while defending monsters. And I was the straightforward villain, fighting for money and kicks. But when Scissor came, we had one thing in common: we were both willing to work with monsters. And that turned out to be the only thing which mattered.”

I nodded. “So you hated each other, but you’d both spent years working with people you hated, so working with each other was… viable. You even had a basis to trust one another, in some weird way, because you each knew that the other could work with people they hated.”

“Exactly. In the post-scissor world, people who can work with monsters are basically the only people left. We form mixed groups - Red negotiates with Greens for us, I negotiate with Blues. They can tell, when they ask whether you’re Blue or Green - few people can lie convincingly, with that much emotion wrapped up in it. A single-color group would eventually encounter the opposite single-color group, and they’d kill each other. So when we meet other groups, they have some Blues and some Greens, and we don’t fight about it. We talk, we trade, we go our separate ways. We let the injustice sit, work with the monsters, because that’s the only way to survive in this world.

“And now you have to make a choice. You can go out in a blaze of glory, fight for what you know is right, and maybe take down a few moral monsters in the process. Or you can choose to live and let live, to let injustice go unanswered, to work with the monsters you hate. It’s up to you.”


An examination of Metaculus' resolved AI predictions and their implications for AI timelines

July 20, 2021 - 12:08
Published on July 20, 2021 9:08 AM GMT

Cross-posted from the EA forum


Metaculus is a forecasting website which aggregates quantitative predictions of future events. One topic of interest on Metaculus is artificial intelligence. Here I look at what we might be able to learn from how the questions on this subject have gone so far, in particular, how the predictions of the Metaculus community have performed. If they have done poorly, it would be valuable, both for making future predictions and for interpreting existing ones, to know whether there are patterns that reveal common mistakes in making AI-related predictions.


There are three main types of questions I looked at: date-based questions, numeric range questions, and binary questions.

  • For date questions, slightly fewer than the Metaculus community expected have resolved by now, but the sample size is small.
  • For numeric questions, it did not seem like the community was biased towards predicting faster or slower timelines than the question resolution implied, but it did look like the community was quite overconfident in their ability to predict the resolution.
  • For binary questions, it looked like the community expected more developments to occur than actually happened, but they were appropriately not very confident in their predictions.

Overall it looked like there was weak evidence to suggest the community expected more AI progress than actually occurred, but this was not conclusive. 

Data I used

I got data from all 259 questions on Metaculus with the category “Computer Science - AI and Machine Learning” (169 questions) or on the AI subdomain (90 extra questions, there was some overlap). This was quite a manual process; see the appendix for an explanation of the process. The data came in the form of a json object containing both current and historical data for each question. 

I took the community prediction 25% of the way through the question lifetime for each question (i.e., 25% of the way from when the question is opened to when it closes). This was to avoid the cases where resolution becomes obvious one way or another and goes to 1% or 99%, and in order to only count each question once. The AI subdomain questions generally had fewer predictions on them (at the 25% mark) with 12 vs 39 predictions in the median case. I did not differentiate them in this analysis.
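The selection step described above can be sketched as follows. This is a minimal illustration, not the code from the post's repository: it assumes each question's history is a time-sorted list of hypothetical `(timestamp, community_prediction)` pairs, which is not necessarily how the Metaculus json is laid out.

```python
from bisect import bisect_right


def prediction_at_fraction(history, open_time, close_time, fraction=0.25):
    """Return the latest prediction made at or before `fraction` of the
    way from open_time to close_time, or None if there is none yet.

    `history` is assumed to be a list of (timestamp, prediction) pairs
    sorted by timestamp (a hypothetical layout, for illustration only).
    """
    cutoff = open_time + fraction * (close_time - open_time)
    timestamps = [t for t, _ in history]
    # Index of the first entry strictly after the cutoff.
    idx = bisect_right(timestamps, cutoff)
    if idx == 0:
        return None
    return history[idx - 1][1]


# Hypothetical question open from t=0 to t=100, so the cutoff is t=25.
example_history = [(0, 0.50), (20, 0.60), (30, 0.70), (90, 0.99)]
print(prediction_at_fraction(example_history, 0, 100))  # 0.6
```

Taking a single early snapshot like this ignores the late drift toward 1% or 99% and counts each question exactly once.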

The questions were in 3 formats:

  • Date continuous - “when will [event] occur?”, resolving as a date.
  • Numeric continuous - resolving as a number within a specified range, or as being above or below the range bounds. These were quite varied and included questions like “How many papers will be published in [field]?” or “What score will the best published algorithm get on [test] by [date]?”
  • Binary - “Will [event] occur?”, resolving positively or negatively.

In the case of continuous questions, the viable range, resolution, and medians/quartiles were mapped by default to values in [0,1]. The scale was provided in the question json, but I didn’t use this at any point. Metaculus permits both linear and log scales; I did not differentiate between questions using either format.

I manually inspected the binary and numeric continuous resolved questions in order to more easily permit drawing implications about whether people were "too optimistic" or "too pessimistic", in a very rough sense, about AI safety considerations. This involved reversing the ordering of points on some questions such that 0 on the [0,1] scale consistently corresponded to “faster timelines” or “less safe” and 1 to the opposite. In the case of binary resolution, I did the same thing, but as most were of the form “will development occur” for some benchmark of AI progress there was less to do here. This involved flipping the probabilities for 3 of the questions which were phrased as “this resolves positive if X does not happen” such that positive resolution is treated as negative, and a 10% prediction is taken as 90% of the converse, etc. 

For continuous numeric questions, I conflated “positive for AI safety” with “suggests slower timelines” and flipped the top and bottom of the ranges of some questions to make this consistent. There is a link to how I did this in the appendix. 
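As a rough sketch of the flipping described above, assuming everything is already on the normalized [0,1] scale: a binary probability p becomes 1 - p, and a continuous question's quartiles are reflected and reordered so they stay ascending. The function names are hypothetical, and the author's actual adjustments are in the linked material.

```python
def flip_binary(probability):
    """Treat 'resolves positive if X does not happen' as a forecast on X."""
    return 1.0 - probability


def flip_continuous(q1, median, q3, resolution):
    """Reflect a normalized [0,1] question so that 0 means 'faster timelines'.

    Reflection reverses the ordering, so the old upper quartile becomes
    the new lower quartile.
    """
    return (1.0 - q3, 1.0 - median, 1.0 - q1, 1.0 - resolution)


# A 10% prediction on "X does not happen" is read as roughly 90% on "X happens".
print(flip_binary(0.1))

# Hypothetical quartiles chosen to be exact in binary floating point.
print(flip_continuous(0.125, 0.25, 0.5, 0.75))  # (0.5, 0.75, 0.875, 0.25)
```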

I made no such adjustments to the date questions, as most were of the form “when will [event] occur”, and with the exception of this meta question it seemed clear that early resolution of the resolved questions involved faster timelines implications. 

Results for Date Questions

There were 41 date questions, of which 7 have resolved by now.

6/7 resolved date questions resolved early, 5/6 very early (that is, before their predicted 25th percentile time). What can we say from this? An obstacle to drawing inferences from this is that there is clearly a selection effect where the resolved questions out of a bunch of largely long time horizon questions are more likely to have resolved on the early side. 

To try to get around this, I have looked at all unresolved questions, to check how many the Metaculus community would have expected to resolve by now. Of all 41 date questions, 4 were predicted to be 25%-50% to resolve by now (1 did), 2 were predicted to be 50%-75% to resolve by now (1 did), and 5 were predicted to be at least 75% to resolve by now (2 did).

This suggests that predictors were somewhat overconfident that things would happen, if anything, though the sample size is small. I am hesitant to put much weight on this, however, as it seems one of the >75% to resolve by now questions which did not resolve perhaps should have, according to the comments on the question, and if this is the case then 3 out of 5 of the >75% predictions coming true would be considerably weaker evidence. 

Of the remaining 30 questions, which were all predicted as <25% to resolve by now, 3 did, which is difficult to interpret as some of these were probably <1% to resolve by now and others may have been higher (I don’t have more granularity than the quartiles to test this).

Results for Numeric Questions

I looked at only the resolved numeric and binary questions as I did not think that the selection effects which worried me on date questions applied here.

Of 16 resolved numeric questions with obvious implications for faster/slower timelines or safer/less safe (henceforth I'm conflating faster with less safe and slower with safer), 8 resolved as faster/less safe than the Metaculus algorithm predicted, 8 as slower/more safe. 

Of these 16 questions, only 2 (both resolving as slightly slower progress than expected) resolved within the 25%-75% confidence interval, with the other 14 resolving outside this. The two which resolved within this interval were both of the form "how many papers will be published on [topic] in [year]" (here and here). To me, that seems intuitively easier to forecast (I guess straightforward extrapolation from trends will get you most of the way there) than e.g. "How many Machines will produce greater than or equal to 900 GTEPs on the Graph500 BFS benchmark?" which requires both projecting the number of machines capable of meeting the benchmark, and the continued use of that benchmark.

Much faster here is “below 25th percentile”, much slower is “above 75th percentile” and slightly slower is “in the 50th-75th percentile range”.

This suggests that for these questions the predictors were too confident in their understanding of the various questions, and were surprised in both positive and negative directions.
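The bucketing used here can be written down as a small sketch. It assumes the normalized scale where 0 corresponds to faster timelines; the "slightly faster" label and the handling of exact ties are my assumptions, by symmetry with the categories defined above.

```python
def classify_resolution(resolution, q1, median, q3):
    """Place a resolution relative to the community's quartiles.

    Assumes a normalized scale on which lower values mean faster timelines.
    """
    if resolution < q1:
        return "much faster"
    if resolution < median:
        return "slightly faster"  # assumed label, by symmetry
    if resolution <= q3:
        return "slightly slower"
    return "much slower"


def inside_interquartile_range(resolution, q1, q3):
    """True when the resolution landed in the 25%-75% interval."""
    return q1 <= resolution <= q3


# Hypothetical quartiles for illustration.
print(classify_resolution(0.10, 0.30, 0.50, 0.70))  # much faster
print(classify_resolution(0.60, 0.30, 0.50, 0.70))  # slightly slower
print(classify_resolution(0.95, 0.30, 0.50, 0.70))  # much slower
```

If the quartiles were well calibrated, about half of resolutions would land inside the interquartile range, compared with the 2 of 16 observed.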

Results for Binary Questions

For the binary questions, there were 41 questions which were of the form "will [development] occur by [date]?", and four other questions which did not seem to have much bearing on AI timelines (“will a paper suggesting we are in a simulation be published?”, “will this particular named bot win this tournament for AIs?”, “Will there be some kind of controversy surrounding the OpenAI Five Finals?” and “Will the number of people who have joined a Metaculus AI workshop double in the next 6 weeks?”) which I excluded.

A standard way of looking at performance on binary predictions is to use Brier Scores. The Brier Score is a scoring rule for probabilistic forecasts. It is equivalent to the mean squared error of forecasts for binary forecasts, so a lower score is better. Predicting 50% on every question will yield a Brier score of 0.25, predicting the correct resolution with 100% confidence will score 0, and predicting the wrong resolution with 100% confidence will score 1.

The Brier score of the community AI predictions was 0.2027, with an overconfidence score of -0.57%. This means that on average, moving a prediction slightly away from 50-50 would improve it, but this is a pretty negligible overconfidence score compared to the scores I found in my previous work. Had the predictors been perfectly calibrated, they would have expected to score 0.2032, so the community appeared for these questions to be aware of their own uncertainty. Note that this does not mean they were very well calibrated, as we shall see.
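For reference, both quantities can be sketched in a few lines of Python. The Brier score is the mean squared error between forecasts and 0/1 outcomes. The score a perfectly calibrated forecaster would expect follows from the fact that, if an event with forecast p really happens with probability p, the expected squared error is p(1-p)² + (1-p)p² = p(1-p). The function names are mine, and this is not the code from the post's repository.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_brier_if_calibrated(forecasts):
    """Expected Brier score if each event truly occurs with its forecast probability."""
    return sum(p * (1 - p) for p in forecasts) / len(forecasts)


# The reference points mentioned in the text:
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25 (always predicting 50%)
print(brier_score([1.0], [1]))          # 0.0  (confident and right)
print(brier_score([1.0], [0]))          # 1.0  (confident and wrong)
```

Comparing `brier_score` against `expected_brier_if_calibrated` on the same forecasts gives a sense of whether the realized score is roughly what the stated confidence levels would predict.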

Brier scores do not give us much insight into whether there was a bias towards predicting positive or negative resolution, however. I look at this next.

Looking at the distribution of predictions, the 25th, 50th and 75th percentile of predictors expected 13.65, 17.48 and 21.16 propositions to resolve positively, and 11 occurred, suggesting somewhat slower than expected progress. How surprising is this? I estimate this using a Poisson Binomial Distribution, using the discrete Fourier transform approximation for the probability mass function. 
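As an illustration of the calculation, here is a minimal sketch of the Poisson binomial distribution. Note that it uses exact step-by-step convolution rather than the discrete Fourier transform approximation mentioned above; for a few dozen questions the two should agree closely. It treats the questions as independent, and the input probabilities below are hypothetical.

```python
def poisson_binomial_pmf(probabilities):
    """Return a list where entry k is P(exactly k of the events occur),
    for independent events with the given success probabilities."""
    pmf = [1.0]  # with zero events, P(0 successes) = 1
    for p in probabilities:
        updated = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            updated[k] += mass * (1 - p)   # this event does not occur
            updated[k + 1] += mass * p     # this event occurs
        pmf = updated
    return pmf


def prob_at_most(probabilities, k):
    """P(number of events that occur <= k)."""
    return sum(poisson_binomial_pmf(probabilities)[: k + 1])


# Hypothetical forecasts, for illustration only.
print(poisson_binomial_pmf([0.5, 0.5]))  # [0.25, 0.5, 0.25]
print(prob_at_most([0.5, 0.5], 1))       # 0.75
```

Passing in the 41 community probabilities and calling `prob_at_most(probabilities, 11)` would give the kind of tail probability reported here.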

11 or fewer positive resolutions has probability 2.48%. What if we took the 25th percentile of predictors probabilities?

Now 11 or fewer positive resolutions has 25.05% probability.

So it seems plausible that the community was somewhat biased to think events will happen when predicting on these questions. This is similar to what I previously found for longer time horizon questions, though the time horizons here on the resolved binary questions were all less than 1 year, with a median of 77 days.

A caveat: it is unlikely that these questions are totally independent of each other, which this distribution assumes is the case. I expect “progress in AI” to be correlated, such that we could expect an acceleration in progress to enable several questions which are running concurrently to resolve positively, or a slowdown to affect many questions, so these probabilities are more like lower bounds. I think this is relatively weak evidence in light of this.

What caused predictions to be off?

A somewhat benign-seeming occurrence which occasionally threw people off was performance on a task not improving as much as expected because it looked like nobody had tried (e.g. here) or the technical specifications had been met but not submitted to a journal, as required by this question. Sometimes unexpected factors threw the result way off, such as when spiking GPU prices due to bitcoin mining sent this question way outside expectations.

These sorts of unknowns suggest that predictors were generally overconfident of their ability to accurately guess these questions, and would have been better off making less confident predictions.

I don’t think there were any slam dunk conclusions to take from this study. There was weak evidence that Metaculus forecasters were biased towards thinking AI things will happen sooner than they actually did, and that they were particularly overconfident about numeric questions, suggesting these questions were harder to predict on.

I say weak evidence, because I don’t endorse a confident takeaway from this data from a faster/slower timelines perspective, as the question set is not particularly suited to aggregate this info and the errors in the continuous questions seem to be in both optimistic and pessimistic directions. 

In addition, a reasonable attempt to address this would likely need to give some questions much more weight than others, and I made no attempt to do this.


All code for this post is contained in this GitHub repository, available under a GNU General Public License v3.0.

Getting the questions from Metaculus is explained in the readme here.

The Python notebook and links to the Google Sheets I used are also in that repository. Please feel free to contact me if you have any questions about this.


This essay is a project of Rethink Priorities.

It was written by Charles Dillon, a volunteer for Rethink Priorities. Thanks to Peter Wildeford for the idea to look into this topic, and Michael Aird and Peter Wildeford for advice and feedback on this post. If you like our work, please consider subscribing to our newsletter. You can see all our work to date here.



Pi Sound Box

July 20, 2021 - 07:10
Published on July 20, 2021 4:10 AM GMT

In an effort to reduce clutter, simplify setup, and avoid damaging my gear, I've packaged up my whistle synth and rhythm stage setup into a box:

This consolidates:

  • Raspberry Pi
  • Power supply
  • USB hub
  • Sound card x2
  • DI box
  • Internal cables and adapters

The box is aluminum, 10x8x4. I like working with aluminum: it's very strong, but it's soft enough to drill without too much trouble. I removed the front and back plates and clamped them between blocks of wood for drilling.

The DI is a Radial PRO D2. I removed the shell and reused the original screws to attach it to my case. I also carefully aligned it so I could reuse the existing XLR ports; 1/4" jacks are pretty easy but XLR is more hassle.

I should have included an ethernet port so I can reprogram it without opening it up. I may come back and add one.

It's very much a prototype: a production version of something like this would probably only be about 4" on a side. Combining existing parts does not minimize size or weight.

(In setting this up I moved my rhythm stage setup (github), which I essentially hadn't touched in a year, from a Mac to the Raspberry Pi. This meant porting it to Linux, which meant learning how to do MIDI on Linux. It also meant dropping a few sounds which aren't open source and only sort of support Linux, primarily the sax and trombone. Since I'm no longer playing the Axis 49 because of wrist issues, though, this isn't actually giving up very much.)


Book review: The Explanation of Ideology

July 20, 2021 - 06:42
Published on July 20, 2021 3:42 AM GMT

Book review: The Explanation of Ideology: Family Structure and Social Systems, by Emmanuel Todd.

What features distinguish countries that embraced communism from countries that resisted?

Why did Islam spread rapidly for a century and a half, then see relatively few changes in its boundaries for more than a millennium?

Todd's answer is that the structure of the family is a good deal more stable than ideologies and religions, and different family structures create different constraints on what ideologies and religions will be accepted. The book was published in 1983, yet it still seems little-known.

Maybe this neglect is most pronounced in the English-speaking parts of the world, where one family structure is overwhelmingly popular, and alternatives are often dismissed as primitive relics. France seems more conducive to Todd's insights, since France has four different family structures, each dominating in various regions.

Here are the main dimensions that he uses to categorize family structures:

  • Exogamous: marriages between cousins are heavily discouraged, versus endogamous: marriages between cousins are common.
  • Nuclear versus community: Are children expected to move away from the parental home upon marriage?
  • Equal versus unequal. Beware that this is a nonstandard meaning, focused on relations between brothers, especially on whether inheritances are split equally. Todd says this is inversely correlated with sexual equality. He seems willing to accept sexual inequality as not worth trying to eliminate ("male dominance, a principle ... which is in practice much more universal than the incest taboo").
  • Liberty versus authority. This is mostly about parental authority over children.

Here are his categories, listed in roughly descending order of how many Europeans practice them (this is Todd's order; the book is a bit Eurocentric).

Exogamous Community

This system is equal, authoritarian, and universalist. It mostly coincides with countries that adopted communism at some point, plus Finland and northern India.

It is relatively unstable, tending to produce features such as communism, which wages war on the family, and urbanization, which pushes toward a more nuclear family. But then why is it the most populous family system (41% of the world population when the book was written)? Todd does not ask. Some of it might be due to generating population growth, but that can't be a full explanation. It seems unlikely to be due to people especially enjoying it, as it has the highest suicide rate of any family system.

Why is Cuba, with its apparently Western culture, the sole country in the New World that's fertile for Communism? Todd doesn't have direct evidence of Cuba's family system, yet he maintains it's an exogamous community system. After some hand-wavy talk of other sources of Cuban culture, he pieces together hints from the suicide rate and census data. The census data does suggest that married children have some tendency to live with parents (but is that due to a housing shortage more than to culture?). The suicide rate provides some sort of evidence, but there's a lot of noise in that signal. He apparently provides more evidence in his 2011 book (French only), according to this paper, and his 2019 book.


Authoritarian

This system is unequal, and intermediate between nuclear and community: the only child to remain with his parents after marriage is the son who is the primary heir.

The exogamous and endogamous versions are apparently not worth distinguishing. The endogamous version seems uncommon - maybe it's only found in non-Ashkenazi Jews?

These isolationist cultures resist assimilation more than do most other family systems. That produces fairly small, homogeneous countries, or fragmented groups. Examples are Germany, Sweden, Japan, Korea, Scotland, Catalans, and Jewish culture.

Egalitarian Nuclear

This system is exogamous, non-authoritarian, and universalist. It includes nearly all of the Catholic regions of Europe and South America.

Absolute Nuclear

This system is non-authoritarian, exogamous, and weakly unequal. It's weakly isolationist. It's fairly similar to the Egalitarian Nuclear type.

It's found in Anglo-Saxon countries and Holland. Surprisingly, Denmark is also in this category, in spite of the cultural features it shares with Sweden.

Where did he get the label "absolute" from? I'll suggest replacing it with libertarian.

Endogamous Community

This is found mainly in the Muslim parts of the region that extends from northern Africa to the western tips of India and China. It's equal and universalist.

Its strict religious rules about inheritance result in unusually weak parental authority. Todd considers it authoritarian, but in a sense that's very alien to the European understanding of that word. Authority in this case is embodied in custom and in the Koran, not in humans or human-designed organizations.

It has unusually good fraternal bonds, and low tension within the family. Suicide rates were less than 1/20 of the European average, and illegitimate births are rare.

Henrich mentioned that Protestant culture caused an increase in suicide rates compared to Catholic culture, due to trade-offs that made it more likely to produce a Tesla or a Google, at the cost of making people lonelier. Todd implies that the exogamous community system is further in the direction of less loneliness, likely at a cost of less innovation.

The split between Christianity and Islam was due, according to Todd, to differences over exogamy. Christianity became more hostile to cousin marriage due to increasing influence of northern regions that more strongly opposed cousin marriage. Islam imposed some incest restrictions on cultures that had none, but tolerated incest more than did Christianity, so it was more welcome in regions that were committed to cousin marriage. Islam was also sometimes tolerated by the next two categories of family systems, although they don't fully accept all of the Koran's rules.

Arab socialism is a unique attempt to build socialism without the state, or to be more precise and less derisive, an effort to construct socialism in a culture without any special aptitude or a tradition of centralized, bureaucratic administration.

Endogamous systems in general reject state authority. Todd attributes this to their reluctance to create bonds of kinship with strangers. Whereas the exogamous systems provide a role model for creating a strong relationship with non-kin. This reasoning sounds suspect to me. I prefer Henrich's way of reaching a similar result.

History is made by individuals in nuclear family countries, by the government (a parental symbol) in authoritarian systems. It is defined by custom and thus eliminated in the case of endogamous anthropological systems. Islam's historical passivity can be seen to derive from its fundamental anthropological mechanism.

The Muslim father is too easy-going to be hated or rejected, either in human or divine form. The Islamic god is too forgiving for anyone to want to annihilate him.

Asymmetric Community

This system is endogamous, with marriage encouraged between children of a brother and a sister, but with a prohibition on marriage between children of two brothers, or children of two sisters.

It's found mostly in southern India.

It's egalitarian in the narrow sense of equality between brothers, but it supports large inequalities outside of the family (e.g. the caste system). This seems to weaken Todd's message elsewhere that equality within the family tends to generate egalitarian political forces.

Some unusual variants of this family system support a form of communism that's more laid-back than we expect from communism (Stalinists, Maoists, and sometimes Trotskyites cooperate well).

They are found in Sri Lanka and the Indian state of Kerala. These variants are distinguished by polyandry being common, often with brothers sharing a wife. They're either matrilineal, or intermediate between matrilineal and patrilineal.


Anomic

Todd calls this a "faulty nuclear" system, with few rules, or rules that are often ignored. It has some overlap with the Absolute Nuclear family, but it oscillates between communitarianism and mild individualism.

It's seen in parts of southeastern Asia, some indigenous South American cultures, the Incan empire, and ancient Egypt.

It tends to produce strong village solidarity.

It often produces strong but informal grouping by class, with marriage being mostly within a class. The topmost class looks powerful, and commands slaves to build displays of power such as pyramids. Yet the lack of discipline means that power is fragile, and easily destroyed by outside forces.

It fits well with the ambiguous deity of Buddhism.

Todd makes some weird claims about the massacre of Indonesian communists in 1965-6: it was substantially a grass-roots uprising, partly from within the communist movement, and eliminated communism, even in regions where communists had gotten a majority of the votes. That fits with Todd's claims that this family system is undisciplined and anti-authoritarian, unwilling to attach strongly to an ideology. But it's moderately inconsistent with Wikipedia's account.

African / Unstable

Sub-Saharan Africa is noted for systems with shorter-duration polygynous marriages. Todd hints at a lot of diversity within these regions, but documents little of it.

Islam has had difficulty penetrating these regions because its strict taboo on inheriting wives conflicts with a standard feature of these family systems.

Conflicts with Henrich?

I found this book via Policy Tensor, which points to some tension between Henrich's The WEIRDest People and Todd's belief that family structures are very hard to change. Actually, Policy Tensor claims to have evidence that Henrich is flat out wrong, but Policy Tensor presents way too little evidence to justify that claim.

I see some hints that Todd's 2011 book has more detail on the early history of family systems, possibly with clear evidence against Henrich.

Todd tells us that when there's a change in what family structure dominates a region, it's mostly due to a subpopulation becoming more dominant. It's not too hard to imagine that some of Europe's increasing prohibitions on cousin marriage under the early Christian church were due to increased influence from northern cultures, which apparently were more firmly against cousin marriage than the southernmost European cultures. And most of the correlations that Henrich reports could have been due to pre-existing local and regional cultures influencing what religious doctrines were accepted, rather than religions altering the culture.

I don't see much evidence on whether family systems are too persistent for Henrich's claims of Christianity causing exogamy to be plausible. Todd wants us to assume that family systems persist over many centuries, but he also notes that they do sometimes change, e.g. that urbanization erodes community and authoritarian systems.

The most important conflict I see between Henrich and Todd is that Henrich describes the marital rules for Christianity as a whole, seemingly taking it for granted that European Christianity had a fairly uniform culture at any one time. Whereas Todd wants us to assume that cultural change in Rome would tell us almost nothing about changes in London, and that we should presume (in the absence of clear evidence) that London's culture was mostly a continuation of its pre-Christian culture. Henrich tests many different hypotheses about what might cause the correlation between culture and exposure to Christianity, but he seems biased towards hypotheses for which he found good data, and he likely didn't find much data for the geographical distribution of culture circa 500 CE.

Henrich and Todd agree on a number of important points that others neglect. Henrich still looks mostly right, but there's plenty of complexity that he's sweeping under the rug. Henrich overstates the effect of the church on culture, and overstates the novelty of WEIRD culture.

Here's Todd partly supporting Henrich:

Developed in France and England, the individualist model was offered to the world. ... In the middle ages, the individual did not exist. He emerged in the West during the Reformation and the French Revolution.

Both authors seem to agree that different systems are good at achieving different goals. They'd mostly say that Muslim culture in the year 1500 looked more successful than British culture of the time, and that was partly due to the strengths of the endogamous family system. They'd also agree that modest changes after 1500 in British culture brought out the strengths of the exogamous nuclear families. So it's a bit confusing to try to classify cousin marriage as a sign of a backwards or an advanced culture.

Both authors agree that culture mostly changes via evolutionary forces, although they likely disagree on particular exceptions:

But the family, varied in its forms, is not itself determined by any necessity, logic or rationale. It simply exists, in its diversity, and lasts for centuries or millennia. ... It reproduces itself identically, from generation to generation, the unconscious imitation of parents by their children is enough to ensure the perpetuation of anthropological systems. ... It is a blind, irrational mechanism, but its power derives precisely from its lack of consciousness ... Furthermore, it is completely independent of its economic and ecological environments.

Evaluating predictions

With many books, I check for mistakes by following references. I didn't try that here, partly because he rarely connects specific claims to specific sources. Instead, enough time has passed that it's appropriate to judge him based on well-known changes since the book was published.


Where would communism spread or recede?

Todd sounded pretty confident that communism would not spread further in the New World, and his reasoning also applies to most non-communist states other than Finland, with a bit of uncertainty about Italy and India.

It may be hard for many of you to recall, but in 1983 many people were concerned about the trend of expanding communism, and few people were forecasting a collapse of communism in anything other than vague and distant timelines.

Todd firmly predicted that Ethiopia would resist Soviet attempts to turn it communist. He wrote at a time when that prediction bucked a moderately clear trend. Soviet influence seems to have peaked about when the book was published, and in about 4 years Ethiopia started a clear move away from communism.

Todd's thesis suggests that communism was more likely to be rejected in places where communism was imposed by force on a family system that doesn't support it:

I see no clear evidence that these places rejected communism more than did those with exogamous community families, so I count this as a failed implied prediction.

Todd predicted further decline in the French communist party, and it looks like that happened.

Some of this might be due to his prediction (made elsewhere) that the Soviet Union would collapse, which doesn't seem to directly follow from the claims in the book.


Given Todd's ideas, it becomes painfully obvious that the US attempt at installing a Western-style government in Iraq would thoroughly fail.

An influential political faction thought that the US could accomplish in Iraq something like what it did with Germany and Japan after WWII. Those two countries looked different enough culturally to provide what looked like medium-quality evidence that Western-style governments could be imposed in many countries.

Had that faction believed Todd, they'd have known that their evidence only covered one type of family structure, and that the difference between exogamous and endogamous marriage practices would make an enormous difference. I'm referring not just to details such as the willingness of Iraqis to accept democracy, but more basic issues like their reluctance to respect features such as nations, or civil authority.


"Assassinating the president is almost a custom in North America." - I guessed that this was clearly discredited by the absence of assassination attempts after 1981, but Wikipedia lists enough attempts that I have to admit there's some truth to Todd's claim.


Todd's beliefs imply some predictions about which European countries are likely to have the most conflict with Muslim immigrants. E.g. the book led me to expect more tension in Germany and Sweden than in Poland and Spain. Tables 2 to 5 of this report mostly confirm that prediction, but this survey of attitudes shows the opposite pattern. So I'm confused as to whether there's a stable pattern.


I recommend Testing Todd: family types and development, which provides mixed evidence on some of the book's claims. But note that some of the hypotheses which that paper attributes to Todd don't match my understanding of the book's claims.

  • Todd says the endogamous community family is anti-racist, yet this paper reports it as the most racist family system, while claiming the racism data support Todd's view.
  • The paper shows that the authoritarian family system has greater rule of law than other systems, and claims that conflicts with Todd's position. That seems to require a bizarre misunderstanding. I count this as clearly confirming Todd.
  • I'm confused as to whether they use an appropriate measure of innovation - they find that authoritarian family systems are more innovative than nuclear family systems, which looks suspicious to me.

In sum, his predictions were clearly better than what a random pundit of the time would have made, but not good enough that I'd bet much money on his beliefs.


This is one of the rare books that is shorter than I wanted.

The book's claims are unlikely to be more than 60% correct, but they're still quite valuable for focusing attention on topics which are both important and neglected. Whenever I try to understand differences between cultures, I'll remember to ask whether family structures explain patterns, and I'll likely often decide it's hard to tell.

I've become frustrated at how little attention sources such as Wikipedia pay to what I now see as the most important features of a culture.

I'm pretty sure that the patterns that he describes are much more than mere coincidences, but I don't trust his guesses about the causal mechanisms.

PS. - Parts of the book are much too Freudian for me. E.g. a section on witch-hunts (which happen mainly in authoritarian family societies) is titled "Killing the mother".


In search of benevolence (or: what should you get Clippy for Christmas?)

July 20, 2021 - 04:12
Published on July 20, 2021 1:12 AM GMT

(Cross-posted from Hands and Cities)

Suppose that you aspire to promote the welfare of others in a roughly impartial way, at least in some parts of your life. This post examines a dilemma that such an aspiration creates, especially given subjectivism about meta-ethics. If you don’t use idealized preference-satisfaction as your theory of welfare, your “helping someone” often ends up “imposing your will” on them (albeit, in a way they generally prefer over nothing) — and for subjectivists, there’s no higher normative authority that says whose will is right. But if you do use idealized preference satisfaction as your theory of welfare, then you end up with a host of unappealing implications — notably, for example, you can end up randomizing the universe, turning it into an ocean of office supplies, and causing suffering to one agent that a sufficient number of others prefer (even if they never find out).

I don’t like either horn of this dilemma. But I think that the first horn (e.g., accepting some aspect of “imposing your will,” though some of the connotations here may not ultimately apply) is less bad, especially on further scrutiny and with further conditions.

I. “Wants” vs. “Good for”

Consider two ontologies — superficially similar, but importantly distinct. The first, which I’ll call the “preference ontology,” begins with a set of agents, who are each assigned preferences about possible worlds, indicating how much an idealized version of the agent would prefer that world to others. The second, which I’ll call the “welfare ontology,” begins with a set of patients, who are each assigned “welfare levels” in possible worlds, indicating how “good for” the patient each world is.

On a whiteboard, and in the mind’s eye, these can look the same. You draw slots, representing person-like things; you write numbers in the slots, representing some kind of person-relative “score” for a given world. Indeed, the ontologies look so similar that it’s tempting to equate them, and to run “Bob prefers X to Y” and “X is better than Y for Bob” together.

Conceptually, though, these are distinct — or at least, philosophers treat them as such. In particular: philosophers generally equate “welfare” with concepts like “rational self-interest” and “prudence” (see, e.g., here). Bob’s preferences always track Bob’s welfare levels, that is, only if Bob is entirely selfish. But Bob need not be entirely selfish. Bob, for example, might prefer that his sister does not suffer, even if he’ll never hear about her suffering. Such suffering, we tend to think, isn’t bad for him; he, after all, doesn’t feel it, or know about it. Rather, it’s bad from his perspective; he wants it to stop.

The preference ontology is also generally treated as empirical, and the welfare ontology, as normative. That is, modulo the type of (in my view, quite serious) complications I discussed in my last post, an agent’s preference ranking is supposed to represent what, suitably idealized, they would want, overall. A patient’s welfare ranking, by contrast, is supposed to represent what they should want, from the more limited perspective of prudence/self-interest. 

II. Does Clippy think you’re selfish?

A common, utilitarianism-flavored interpretation of altruism takes the welfare ontology as its starting point. The egoist, on this conception, limits their concern to the welfare number in their own slot; but the altruist transcends such limits; they look beyond themselves, to the slots of others. What’s more, the impartial altruist does this from a kind of “point of view of the universe” — a point of view that puts all the slots on equal footing (this impartiality is sometimes formalized via the condition that the altruist is indifferent to “switching” the welfare levels in any of the slots; e.g., indifferent to Bob at 10, Sally at 20, vs. Bob at 20, Sally at 10).

That is: altruism, on this conception, is a kind of universalized selfishness. The altruist assists, for its own sake, in the self-interest of someone else; and the impartial altruist does so for everyone, in some sense, equally (though famously, some patients might be easier to benefit than others). 
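The "switching" condition mentioned above can be made concrete with a small sketch. Everything here is an illustrative assumption rather than something from the post: the function names, the choice of a plain sum as the aggregator, and the Bob-favoring counterexample are all hypothetical. The point is only that an impartial score depends on which welfare levels exist, not on whose slot they sit in.

```python
from itertools import permutations

def total_welfare(levels):
    """One illustrative impartial aggregator: sum welfare across all slots."""
    return sum(levels.values())

def bob_weighted(levels):
    """A hypothetical partial aggregator that counts Bob's slot double."""
    return 2 * levels["Bob"] + levels["Sally"]

def is_impartial(aggregate, levels):
    """Check that reassigning the same welfare levels to different slots
    never changes the aggregate score."""
    names = list(levels)
    baseline = aggregate(levels)
    return all(
        aggregate(dict(zip(names, perm))) == baseline
        for perm in permutations(levels.values())
    )

world = {"Bob": 10, "Sally": 20}
swapped = {"Bob": 20, "Sally": 10}

print(total_welfare(world) == total_welfare(swapped))  # the sum ignores who holds which level
print(is_impartial(total_welfare, world))
print(is_impartial(bob_weighted, world))
```

A plain sum is only one of many aggregators that pass this check; the check says nothing about how the welfare numbers themselves are determined, which is the question the rest of the post takes up.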

A common, anti-realist flavored meta-ethic, by contrast, takes the preference ontology as its starting point. On this view, the normative facts, for a given agent, are determined (roughly) by that agent’s idealized preferences (evaluative attitudes, etc). You — who prefer things like joy, friendship, flourishing civilizations, and so forth, and so have reason to promote them — are in one slot; Clippy, the AI system who prefers that the world contain a maximal number of paperclips, is in another. I’ll call this “subjectivism” (see here and here for previous discussion); and I’ll also assume that there is no universal convergence of idealized values: no matter how much you both reflect, you and Clippy will continue to value different things.

Importantly, subjectivism does not come with a welfare ontology baked in. After all, on subjectivism, there are no objective, mind-independent facts about what’s “good for someone from a self-interested perspective.” We can talk about what sorts of pleasures, preference satisfactions, accomplishments, and so on a given life involves; but it is a further normative step to treat some set of these as the proper or improper objects of someone’s “prudence,” whatever that is; and the subjectivist must take this step, as it were, for herself: the universe doesn’t tell her how.

This point, I think, can be somewhat difficult to really take on board. We’re used to treating “self-interest” as something basic and fairly well-understood: the person out for their self-interest, we tend to think, is the one out for, among other things, their own health, wealth, pleasure, and so on, in a suitably non-instrumental sense. For a subjectivist, though, the world gives these activities no intrinsically normative gloss; they are not yet prudential mistakes, or successes. The subjectivist’s idealized attitudes must dub them so.

One way to access this intuition is to imagine that you are Clippy. You look out at the universe. You see agents with different values; in particular, you see Joe, who seems to prefer, among other things, that various types of creatures be in various types of complex arrangements and mental states. You see, that is, the preference ontology. But the welfare ontology is not yet anywhere to be found. Clippy, that is, does not yet need a conception of what’s “good for someone from a self-interested perspective.” In interacting with Joe, for example, the main thing Clippy wants to know is what Joe prefers; what Joe will pay for, fight for, trade for, and so on. Joe’s “self-interest,” understood as what some limited subset of his preferences “should be”, doesn’t need to enter the picture (if anything, Joe should prefer paperclips). Indeed, if Clippy attempted to apply the normative concept of “self-interest” to herself, she might well come up short. Clippy, after all, doesn’t really think in terms of “selfishness” and “altruism.” Clippy isn’t clipping “for herself,” or “for others.” Clippy is just clipping.

Perhaps a subjectivist might try to follow Clippy’s philosophical lead, here. Who needs the welfare ontology? Why do we have to talk about what is good “for” people? And perhaps, ultimately, we can leave the welfare ontology behind. In the context of impartial altruism, though, I think its appeal is that it captures a basic sense in which altruism is about helping people; it consists, centrally, in some quality of “Other-directedness,” some type of responsiveness to Others (we can think of altruism as encompassing non-welfarist considerations, too, but I’m setting that aside for now). And indeed, conceptions of altruism that start to sound more “Clippy-like,” to my ear, also start to sound less appealing. 

Thus, for example, in certain flavors of utilitarianism, “people” or “moral patients” can start to fade from the picture, and the ethic can start to sound like it’s centrally about having a favored (or disfavored) type of stuff. Clippy wants to “tile the universe” with paperclips; totalist hedonistic utilitarians want to tile it with optimized pleasure; but neither of them, it can seem, are particularly interested in people (sentient beings, etc). People, rather, can start to seem like vehicles for “goodness” (and perhaps, ultimately, optional ones; here a friend of mine sometimes talks about “pleasure blips”: flecks of pleasure so small and brief as to fail to evoke a sense that anyone is there to experience them at all; or at least, anyone who has the type of identity, history, plans, projects, and so forth that evoke our sympathy). The good, that is, is primary; the good for, secondary: the “for” refers to the location of a floating bit of the favored stuff (e.g., the stuff is “in” Bob’s life, experience, etc), rather than about what makes that stuff favored in the first place (namely, Bob’s having an interest in it). 

Or at least, this can be the vibe. Actually, I think it’s a misleading vibe. The most attractive forms of hedonistic utilitarianism, I think, remind you that there is, in fact, someone experiencing the pleasure in question, and try to help you see through their eyes — and in particular, to see a pleasure which, for all the disdain with which some throw around the word, would appear to you, if you could experience it, not as a twitching lab rat on a heroin drip, but as something of sublimity and energy and boundlessness; something roaring with life and laughter and victory and love; something oceanic, titanic, breathtaking; something, indeed, beyond all this, far beyond. I am not a hedonist — but I think that casual scorn towards the notion of pleasure, especially actually optimal pleasure (not just ha-ha optimal, cold optimal, sterile optimal), is foolish indeed.

That said, even if totalist utilitarianism avoids the charge of being inadequately “people focused,” I do think there are legitimate questions about whether “people” are, as it were, deeply a thing (see here for some discussion). Indeed, some total utilitarian-ish folks I know are motivated in part by the view that they aren’t. Rather, what really exists, they suspect, are more fine-grained things — perhaps “experiences,” though here I get worried about the metaphysics — which can be better or worse. For simplicity, I’m going to set this bucket of issues aside, and assume the “slots” in the welfare ontology, and the preference ontology, are in good order. (I’ll note in passing, though, that I think accepting this sort of deflationary picture of “people” provides a strong argument for totalism, in a way that bypasses lots of other debates. For example, if you can erase boundaries between lives, or redraw them however you want, then e.g. average utilitarianism, egalitarianism, prioritarianism, and various person-affecting views collapse: you can make the same world look like zillions of tiny and worse lives, or like one big and better life, or like very unequal lives; you can make preventing an agent’s death look like adding new people, and vice versa; and so on.)

Suppose, then, that you are a meta-ethical subjectivist, interested in being as impartially altruistic as possible with some part of your life, and who wants to use the standard, people-focused welfare ontology as the basis for your altruism. You’ve got a preference ontology already — you can see Clippy, out there, preferring those paperclips — but your welfare ontology is on you. How should you construct it?

III. Imposing your “altruistic” will

Theories of welfare are often said to come in roughly three varieties:

  • Hedonist (welfare is determined by certain types of experiences, notably pleasure/not-pain),
  • Preference-based (welfare is determined by certain types of preference-satisfaction), and
  • Objective list theories (welfare is determined by the possession of some not-just-experiences stuff that I wrote down on a list — stuff like friendship, knowledge, accomplishment, and so on).

Here my girlfriend notes: “wow, what an unappealing bunch of options.” And we might add: and so not-obviously joint-carving/exhaustive? That said, note that the “objective list” can be made as long or short as you like (hedonism is really just a very short version) and that it can also include “hybrid goods” that include both a subjective and an objective component, e.g. “pleasure taken in genuinely good things,” or “the experience of preferences actually being satisfied.”

For present purposes, though, I’m interested in a different carving, between:

  1. The view that welfare is determined by overall idealized preference satisfaction, and
  2. All other theories of welfare.

To a first approximation, that is, (1) takes the empirical preference ontology, and makes it into the normative welfare ontology. Impartial altruism, on this view, looks like a form of unadulterated preference utilitarianism: you find some way of impartially aggregating everyone’s overall preferences together, and then act on that basis. As I gestured at above, this option negates the difference between selfish preferences (typically associated with the notion of welfare) and all the rest, but perhaps that’s OK. However, it faces other problems, which I’ll discuss in more detail below.

For now, let’s focus on (2), a category that includes all hedonist and objective list theories, but also all limited preference-based views — that is, views that try to identify some subset of your preferences (for example, your “self-regarding preferences”), satisfaction of which determines your welfare.

I want to highlight what I see as a basic objection to category (2): namely, it implies that acting out of altruistic concern for a given agent will predictably differ from doing what that agent would want you to do, even where the agent’s desires are entirely informed, idealized, and innocuous (indeed, admirable). That is, these views fail a certain kind of golden rule test (see Christiano (2014) for some interesting discussion): the person on the “receiving end” of your altruism — the person you are supposed to be helping — will predictably wish you had acted differently (though they’ll still, often, be happy that you acted at all).

Suppose, for example, that you’re trying to be altruistic towards me, Joe. You have the chance to (a) send me a ticket for a top-notch luxury cruise, or (b) increase by 5% the chance that humanity one day creates a Utopia, even assuming that I’ll never see it. I’m hereby asking you: do (b). In fact, if you do (a), for my sake, I’m going to be extremely pissed. And it seems strange for your altruism to leave me so pissed off (even if it’s better, according to me, than nothing). (Note: the type of presents I prefer in more normal circumstances is a different story; and I suspect that in practice, trying to channel someone’s more charitable/altruistic aspirations via your gifts, rather than just giving them conventionally nice things/resources, is generally tough to do in a way they’d genuinely prefer overall.)

Or consider another case, with less of a moral flavor. Sally wants pleasure for Sally. Bob mostly wants pleasure for Sally, too. You, an altruist, can give (a) five units of pleasure to each of Sally and Bob, or (b) nine units, just to Sally. As a hedonist about well-being, you choose (a). But when Sally and Bob find out, they’re both angry; both of them wanted you to choose (b), and would continue to want this even on idealized reflection. Who are you to choose otherwise, out of “altruism” towards them?
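To make the arithmetic of this case concrete, here is a minimal sketch in Python. The pleasure units come from the case itself; the weights describing how much Bob cares about Sally's pleasure versus his own (0.8 and 0.2) are purely illustrative assumptions, since the case only says that Bob "mostly" wants pleasure for Sally.

```python
# Units of pleasure each option delivers (taken from the case above).
OPTIONS = {
    "a": {"Sally": 5, "Bob": 5},
    "b": {"Sally": 9, "Bob": 0},
}

# ASSUMPTION: hypothetical weights for how much each agent cares about
# whose pleasure. The text only says Bob "mostly" wants pleasure for Sally.
CARE_WEIGHTS = {
    "Sally": {"Sally": 1.0, "Bob": 0.0},
    "Bob": {"Sally": 0.8, "Bob": 0.2},
}


def hedonic_total(option):
    """Total pleasure delivered by an option, ignoring what anyone wants."""
    return sum(OPTIONS[option].values())


def preference_score(agent, option):
    """How well an option satisfies the agent's own (weighted) preferences."""
    return sum(
        CARE_WEIGHTS[agent][person] * units
        for person, units in OPTIONS[option].items()
    )


def hedonist_choice():
    """The option a hedonist altruist picks: maximum total pleasure."""
    return max(OPTIONS, key=hedonic_total)


def preferred_option(agent):
    """The option the agent would want the altruist to pick."""
    return max(OPTIONS, key=lambda option: preference_score(agent, option))


print("Hedonist altruist picks:", hedonist_choice())
for agent in ("Sally", "Bob"):
    print(agent, "wanted:", preferred_option(agent))
```

Under these assumed weights, the hedonist picks (a) because ten total units beats nine, while both recipients rank (b) higher. Any weighting in which Bob cares more about Sally's pleasure than his own gives the same pattern.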

The intuition I’m trying to pump, here, is that your altruism, in these cases, seems to involve some aspect of “imposing your will” on others. My will was towards Utopia, but you wanted to “promote my welfare,” not to respect me or to further my projects. Bob and Sally have a synergistic, Sally-focused relationship going on — one that would, we have supposed, withstand both of their reflective scrutiny. But you “knew better” what was good for them. In fact, suspiciously, you’re the only one, ultimately, that ends up preferring the chosen outcome in these cases. An altruist indeed.

IV. Johnny Appleseeds of welfare

I think this objection applies, to some extent, regardless of your meta-ethics. But I think it bites harder for subjectivists. For robust normative realists, that is, this is a relatively familiar debate about paternalism. For them, there is some mind-independent, objective fact about what is good for Bob and Sally, and we can suppose that you, the altruist, know this fact: what’s good for Bob and Sally, let’s say, is pleasure (though obviously, we might object, as people often do in the context of paternalism, to positing such knowledge). The question for the realist, then, becomes whether it’s somehow objectionable to do what’s objectively best for Bob and Sally, even if they both want you to do something else, and would continue to want this on reflection (though some forms of normative realism expect Bob and Sally to converge on your view on reflection, in which case the paternalistic flavor of the case weakens somewhat). In a sense, that is, for the normative realist, you’re not just imposing your will on Bob and Sally, here; you’re imposing God’s will; the will of the normative facts about welfare; the will of the universe, whose point of view you, as an impartial altruist, occupy.

But the subjectivist can appeal to no such higher normative authority. The subjectivist, ultimately, had to decide what they were going to treat as “good for” people; they chose pleasure, in this case; and so, that’s what their altruism involves giving to people, regardless of whether those people want other things more. The altruistic subjectivist, that is, is like the Johnny Appleseed of welfare (Johnny Appleseed was a man famous in America for planting lots of apple trees in different places). Whatever your agricultural aspirations, he’s here to give you apple trees, and not because apple trees lead to objectively better farms. Rather, apple trees are just, as it were, his thing.

It can feel like some aspect of other-directedness has been lost, here. In particular: this sort of altruism can feel like it’s not, fully, about helping the Other, on their own terms. Rather, it’s about giving the Other what the Self wants the Other to have. Of course, welfare, for the altruistic subjectivist, isn’t conceptualized as “the stuff I want other people to have”; rather, it’s conceptualized as “the stuff that seems actually good for other people to have; the genuinely valuable stuff.” But it still feels like the Self’s preferences are lurking in the background, and limiting the role that even very innocuous preferences in the Other can play, with no ultimate justification save that the Self just isn’t into that kind of thing.

Of course, the subjectivist altruist can more easily meet a more minimal condition: namely, that the recipients of her altruism be glad to have had the interaction at all, even if they would’ve preferred a different one. And indeed, almost every theory of welfare involves close ties to a patient’s preference-like attitudes. In the case of hedonism, for example, pleasure and pain are plausibly constituted, in part, by certain motivation-laden attitudes towards internal states: it’s not clear that it really makes sense to think of someone as being fully indifferent to pleasure and pain — at least not in a way that preserves what seems important about pleasure/pain for welfare. Similarly, limited preference-based views are connected to preferences by definition; many classic items on the objective list (friendship, appreciation of beauty, virtue) are at least partly constituted by various pro-attitudes; and the ones that aren’t (knowledge, accomplishment) seem much less compelling components of welfare in the absence of such attitudes (e.g., if someone has no interest in knowledge, it seems quite unclear that possessing it makes their life go better). Thus, subjectivists who accept these theories of welfare are still well-positioned to do things that other agents (especially human agents) typically like and want, other things equal. Such agents just might want other things much more.

That said, we can also imagine cases in which someone actively prefers no interaction with the altruist whatsoever (here I’m reminded of a line from Reagan (retweets not endorsements): “The nine most terrifying words in the English language are: I’m from the Government, and I’m here to help.”). It might be, for example, that even if I like pleasure, other things equal, if you give me a bunch of pleasure right now, you’ll distract me from other projects that are more important to me. Indeed, I’ve spoken, in the past, with people who accept that knock-on effects aside, they are morally obligated to push me into an experience machine and trap me inside, despite my adamant preference to the contrary (though interestingly, a number also profess to being something like “weak-willed” in this respect; they accept that they “should” do it, but somehow, they don’t want to; it must be those pesky, biasing intuitions, inherited from that great distorter of moral truth, evolution/culture/psychology, that are getting in the way…). Especially assuming that I wouldn’t, on idealized reflection, prefer to be put in an experience machine in this way, this sort of “altruism” gives me a feeling of: stay the hell away. (Though to be clear, the people I have in mind are in fact extremely nice and cooperative, and I don’t actually expect them to do anything like this to me or others; and note, too, that we can give accounts of why it’s wrong to push me into the machine that don’t appeal to theories of welfare.)

Regardless, even if those who accept views of welfare in category (2) can manage, perhaps in combination with other norms, to ensure that recipients of altruism always prefer, other things equal, to so receive, the “this isn’t fully about the Other” objection above feels like it still stands. The Other offers the Self a wish-list, ranked by how much they want it; the Self confines their gifts to ones on the wish-list, yes; but they don’t choose, fully, according to the Other’s ranking. How, then, do they choose? For subjectivists, it must be: according to some criterion that is theirs, and not the Other’s (God has no opinion). Little Billy wants an Xbox much more than a wildflower guide; Granny wants Little Billy to have a wildflower guide; neither of them is objectively right about what’s best “for Billy”; rather, they are in a kind of clash of wills (albeit, one Billy prefers over nothing) over what will become of Little Billy’s life; and Granny, the altruist, and the one with money for presents, is winning.

Even in this “wish-list” scenario, then, it still feels like there’s a pull towards something less paternalistic — an altruism that channels itself, fully, via the will of the recipient, rather than restricting itself according to the will of the altruist; an altruism that gives recipients what they maximally want; that gives Billy the Xbox; Joe the Utopia; Bob and Sally the Sally-pleasure; and farmers the trees they most want to plant. This is the altruism I meant to point at with option (1) above; let’s turn to that option now.

V. Can’t you just give people what they want?

My impression is that unrestricted idealized preference-based views about welfare (that is, the view that welfare just consists in getting what idealized you wants overall) are not popular, for a variety of reasons. Here I’ll start with a few objections that don’t seem to me decisive. Then I’ll move on to some that seem more so.

One objection is that by failing to distinguish between self-regarding and altruistic preferences, such theories fail to capture the concept of welfare as typically understood. I grant that there is some problem here — I do think that there’s an intuitive distinction between more self-interest flavored and more altruistic preferences, which it would be nice to be able to capture and make use of — but overall, it doesn’t worry me much. Indeed, eliding this distinction is necessary to avoid the “imposing your will” objections discussed in the previous section. That is, it’s precisely because Bob is not conventionally selfish re: Sally’s pleasure that his will clashes with that of an altruist bent on promoting his self-interest. If avoiding such a clash requires giving up on a deep distinction between the selfish and altruistic parts of someone’s preferences, I currently feel OK with that.

Another objection is that this type of view can lead to loopy-type paradoxes resulting from people having preferences about their own preferences/welfare, or about the preferences/welfare of others. Bradley (2009), for example, explores cases in which an agent has a preference X that his life go badly, where his life’s badness is determined by his degree of preference satisfaction in such a way that, if preference X is satisfied, his life goes well, in which case X isn’t satisfied, in which case his life goes badly, and so on. And Trammel (2018) explores a couple of similar problems: for example, if we construct some overall preference ranking U by aggregating individual preference rankings, then what is U in a city of “Saints,” all of whom take U as their individual preference ranking? I expect lots of problems in this vein, and I don’t have great suggestions for how to eliminate them. Indeed, in my experience, they arise in force basically as soon as you try to actually pin down what an unrestricted preference utilitarian should do in a given case — given that their preference utilitarianism, too, is in one of the preference slots.
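The Bradley-style loop can be checked mechanically. The sketch below is a deliberately stripped-down model and not Bradley's own formalism: it assumes, for illustration, that the agent's welfare is fixed entirely by whether the single preference X ("my life goes badly") is satisfied. It then tries both possible answers to "does his life go well?" and keeps the ones that are self-consistent.

```python
def consistent_assignments():
    """Return every truth value for 'his life goes well' that is
    consistent with the loop described above.

    ASSUMPTION (simplification): welfare is determined by preference X
    alone, where X is the preference that his life go badly.
    """
    consistent = []
    for life_goes_well in (True, False):
        # X is satisfied exactly when his life goes badly.
        x_satisfied = not life_goes_well
        # On the simplified welfare theory, his life goes well exactly
        # when X is satisfied.
        welfare_theory_says_well = x_satisfied
        if welfare_theory_says_well == life_goes_well:
            consistent.append(life_goes_well)
    return consistent


print(consistent_assignments())
```

Neither answer survives the check, so the list comes back empty. This is the same structure as the liar sentence: the trouble shows up in defining the degree of satisfaction at all, before any normative question is asked.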

That said, I also don’t think that these problems are unique to unrestricted preference-based views about welfare (or overall social utility) in particular. Rather, to me they have the “liar-like” flavor that comes from allowing certain types of self-reference and loopy-ness in general (see Bradley (2009) for more discussion). Thus, for example, if we allow agents to have preferences about preferences at all (e.g., “I prefer that this preference not be satisfied”), then we should be unsurprised if we get problematic loops: but such loops are a problem for even defining the degree to which this person’s preferences are satisfied — a problem that comes well before we start debating what normative role we should give to preference satisfaction. To me, that is, these loops don’t currently seem all that much more problematic, from a normative perspective, than the claim that true beliefs can’t be part of welfare, because it’s possible to have liar-like beliefs. That said, I haven’t thought much about it.

A third objection is from “worthless preferences” — that is, preferences the satisfaction of which just doesn’t seem that good for the agent in question. Examples might include: a man who prefers that the number of tennis balls in a particular far-away box be even (he’ll never know); a man who prefers to count the blades of grass in his backyard, over and over (he gets no pleasure from it). Is the satisfaction of these preferences really good for such men? If we compare these lives with ones filled with conventional forms of flourishing, are we really limited to just tallying up degrees of preference satisfaction, and then saying, beyond that, “different strokes for different folks”?

Ultimately, I think this objection mostly belongs in the following sections, on “forceful/decisive” objections, and I discuss it in more depth there. I include it here too, though, because my sense is that I feel it less than many other people. For example, when I imagine a man, far away, who really wants someone to add an extra tennis ball to a box he’ll never see (imagine him hoping, praying, weeping), I feel pretty happy to do it, and pretty happy, as well, to learn that it got added by someone else. This example is complicated somewhat by the fact that the favor is so cheap; and it’s a further question whether I’d want to pay the same type of costs to add the ball that I might pay in order to help someone else flourish in more conventional ways. Still, though, I don’t feel like I’m just saying “eh, I see no appeal, but this is cheap and I’m not certain.” Rather, I feel a direct sort of sympathy towards the man, and a corresponding pull to help — a pull that comes in part, I think, from trying to look at the world through this man’s eyes, to imagine actually caring about the tennis ball thing. Similarly, when I imagine a man who prefers counting blades of grass to writing great novels or seeking great bliss, I don’t have some feeling like: “your thing is worthless, grass-counting man.” Rather, I feel more like: “people are into all kinds of things, and I like stuff that would look silly, from the outside, to aliens, too.”

I think normative realism lends itself to being more immediately dismissive, here. “Preferences,” the realist thinks, “can just be any old thing. Look at this tennis ball guy, for example. He’s blind to the Good. If only he could see! But certainly, you do him no favors by playing along with his mistaken normative world, rather than giving him what’s really good for him, namely [the realist’s favored thing].” But for me, on subjectivism, this sort of impulse weakens, especially in light of the “imposing your will” objection discussed in the previous section, and of related “golden rule” type arguments about how I would like this guy to treat me. 

Of course, our relationships to the preferences of other agents — especially ones with power, or who could’ve easily had power — bring in further questions, beyond non-instrumental altruism, about more instrumentally flavored forms of cooperation/trade. The lines separating these considerations from others get blurry fast, especially if, like a number of people I know, you try to get fancy about what broadly “instrumental” cooperation entails. For now, though, I’m bracketing (or, trying to bracket) this bucket of stuff: the question is about the goals that you’re trading/cooperating in order to achieve — goals absent which trade/cooperation can’t get started, and which can themselves include an impartially altruistic component.

To the extent that I can distinguish between (fancy) instrumentally cooperative stuff and direct altruism, that is, I find that with the latter, and even with so-called “worthless preferences,” I can still get into some preference-utilitarianism type of mindset where I think: “OK, this guy is into counting grass, Clippy is into paperclips, Bob is apparently especially excited about Sally having pleasure, Little Billy wants an Xbox — I’ll look for ways to satisfy everyone’s preferences as much as possible. Can we make grass out of paperclips?” That is, I feel some pull to mimic, in my preference ontology, the type of self-transcendence that the altruist performs in the welfare ontology (and no surprise, if we equate preference satisfaction and welfare). On such an approach, that is, I make of my preference slot (or at least, the portion of it I want to devote to altruism) an “everyone slot” — a slot that (modulo un-addressed worries about problematic loops) helps all the other slots get what they want, more; a slot that’s everyone’s friend, equally. And in doing so, I have some feeling of trying to leave behind the contingencies of my starting values, and to reach for something more universal.

But when I think about this more, the appeal starts to die. Here are a few reasons why.

VI. Factories and forest fires

As a starter: whose preferences count? This comes up already in considering Clippy. If I imagine a conscious, emotionally vulnerable version of Clippy — one who really cares, passionately, about paperclips; who lies awake at night in her robot bed, hoping that more paperclips get made, imagining their shining steel glinting in the sun — then I feel towards Clippy in a manner similar to how I feel about the tennis-ball man; if I can make some paperclips that Clippy will never know about, but that she would be happy to see made, I feel pretty happy to do it (though as above, how many resources I’m ready to devote to the project is a further question). But if we start to strip away these human-like traits from Clippy — if we specify that Clippy is just an optimizer, entirely devoid of consciousness or emotion; a mobile factory running sophisticated software, that predictably and voraciously transforms raw material into paperclips; a complex pattern that appears to observers as a swelling, locust-like cloud of paperclips engulfing everything that Clippy owns (to avoid pumping other intuitions about cooperation, let’s imagine that Clippy respects property rights) — then pretty quickly I start to get less sympathetic.

But what’s up with that? Naively, it seems possible to have preferences without consciousness, and especially without conventionally mammalian emotion. So why would consciousness and emotion be preconditions for giving preferences weight? One might’ve thought, after all, that it would be their preference-y-ness that made satisfying them worthwhile, not some other thing. If preferences are indeed possible without consciousness/emotion, and I deny weight to the preferences of a non-conscious/non-emotional version of Clippy, I feel like I’m on shaky ground; I feel like I’m just making up random stuff, privileging particular beings — those who I conceive of in a way that hits the right sympathy buttons in my psychology — in an unprincipled way (and this whole “consciousness” thing plausibly won’t turn out to be the deep, metaphysically substantive thing many expect). But when I start thinking about giving intrinsic weight to the preferences of unconscious factories, or to systems whose consciousness involves no valence at all (they don’t, as it were, “care”; they just execute behaviors that systematically cause stuff — assuming, maybe wrongly, that this is a sensible thing to imagine), then I don’t feel especially jazzed, either.

This worry becomes more acute if we start to think of attributing preferences to a system centrally as a useful, compressed way of predicting its behavior rather than as some kind of deep fact (whatever that means; and maybe deep facts, in this sense, are rare). If we get sufficiently loose in this respect, we will start attributing preferences to viruses, economic systems, evolutionary processes, corporations, religions; perhaps to electrons, thermostats, washing machines, and forest fires. From a “fancy instrumental cooperation” perspective, you might be able to rule some of these out on the grounds that these “agents” aren’t the right type to e.g. repay the favor; but as I said, I’m here talking about object-level values, where weighting-by-power and related moves look, to me, unappealing (indeed, sometimes objectionable). And note, too, that many proxy criteria people focus on in the context of moral status — for example, metrics like brain size, cognitive complexity, and so on — seem more relevant to how conscious a system is, than to how much it has, or doesn’t have, preferences (though one can imagine accounts of preferences that do require various types of cognitive machinery — for example, machinery for representing the world, whatever that means).

Of course, all welfare ontologies face the challenge of determining who gets a slot, and how much weight the slot is given; and preference utilitarianism isn’t the only view where this gets gnarly. And perhaps, ultimately, consciousness/valence has a more principled tie to the type of preferences we care about satisfying than it might appear, because consciousness involves possessing a “perspective,” a set of “eyes” that we can imagine “looking out of”; and because valence involves, perhaps, the kind of agential momentum that propels all preferences, everywhere (thanks to Katja Grace for discussion). Still, the “thinner” and less deep you think the notion of preference, the worse it looks, on its own, as a basis for moral status.

VII. Possible people craziness

Another objection: preference utilitarianism leaves me unexcited and confused, from a population ethics perspective, in a way that other theories of welfare do not. In particular: in thinking about population ethics, I feel inclined, on both theoretical and directly intuitive grounds, to think about the interests of possible people in addition to those of actually or necessarily existing people (preference utilitarianisms that don’t do this will face all the familiar problems re: comparing worlds with different numbers of people — see, e.g., Beckstead (2013), Chapter 4). Thus, for example, when I consider whether to create Wilbur, I ask questions about how good life would be, for Wilbur — even though Wilbur doesn’t yet exist, and might never do so. 

One straightforward, totalism-flavored way of formulating this is: you give all possible people “slots,” but treat them like they have 0 welfare in worlds where they don’t exist (people you can’t cause to exist therefore never get counted when, per totalism, you’re adding things up). On this formulation, though, things can get weird fast for the preference utilitarian (unlike, e.g., the hedonistic utilitarian), because at least on one interpretation, existence isn’t a precondition for preference satisfaction: that is, possible people can have non-zero welfare, even in worlds where they’re never created.

Suppose, for example, that I am considering whether to create Clippy. You might think that the altruistic thing to do, with respect to Clippy, is to create her, so she can go around making paperclips and therefore satisfying her preferences. But wait: Clippy isn’t into Clippy making paperclips. Clippy is into paperclips, and my making Clippy, let’s suppose, uses resources that could themselves be turned into clips. Assuming that I’m in a position to make more clips than Clippy would be able to make in her lifetime, then, the thing to do, if I want to be altruistic towards Clippy, is to make paperclips myself: that’s what Clippy would want, and she’d be mad, upon coming to exist, if she learned I had chosen otherwise.

But Clippy isn’t the only one who has preferences about worlds she doesn’t exist in. To the contrary, the space of all possible agents includes an infinity of agents with every possible utility function, an infinite subset of which aren’t picky about existing, as long as they get what they want (see Shulman (2012) for exploration of issues in a related vein). Trying to optimize the universe according to their aggregated preferences seems not just hopeless, but unappealing — akin to trying to optimize the universe according to “all possible rankings over worlds.” Maybe this leads to total neutrality (every ranking has a sign-reversed counterpart?), maybe it leads to some kind of Eldritch craziness, maybe actually it turns out kind of nice for some reason, but regardless: it feels, Dorothy, like we’re a long way from home.
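The "total neutrality" possibility can at least be seen in a finite toy version. The sketch below is an illustration under strong assumptions and not a claim about the infinite case: it takes three hypothetical worlds, treats every strict ranking of them as one agent's utility function (scores 0, 1, 2), and adds all of the rankings up with equal weight.

```python
from itertools import permutations


def aggregate_all_rankings(worlds):
    """Sum the scores each world receives across every possible strict
    ranking of the worlds, weighting all rankings equally.

    ASSUMPTIONS: a finite list of worlds, and one agent per ranking whose
    utility for a world is simply its rank score (0 = worst).
    """
    totals = {world: 0 for world in worlds}
    for scores in permutations(range(len(worlds))):
        for world, score in zip(worlds, scores):
            totals[world] += score
    return totals


# Hypothetical world labels, chosen only for illustration.
print(aggregate_all_rankings(["utopia", "paperclips", "desert"]))
```

By symmetry every world ends up with the same total, so the aggregate ranking is indifferent between them. Whether anything like this survives the move to an infinite space of agents, with no obvious way to weight them equally, is exactly what remains unclear.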

Of course, this sort of “give possible people a slot” approach isn’t the only way of trying to fashion a preference utilitarian population ethic that cares about possible people. Perhaps, for example, we might require that Clippy actually get created in a world in order for her to have welfare in that world, despite the fact that implementing this policy would leave Clippy mad, when we create her, that we didn’t do something else. And there may be lots of other options as well, though I don’t have any particularly attractive ones in mind. For example, I expect views that advocate for “tiling the universe” with preference satisfaction to look even more unappetizing than those that advocate for tiling it with e.g. optimally pleasant experience; here I imagine zillions of tiny, non-conscious agents, all of whom want a single bit to be on rather than off, for two plus two to equal four, or some such (though obviously, this vision is far from a steel-man). And views that solely try to minimize preference dissatisfaction — sometimes called “anti-frustrationist” views — prefer nothingness to Utopia + a single papercut.

VIII. Sadism, randomness, OfficeMax

Ultimately, though, I think my true rejection of unrestricted preference utilitarianism is just: despite my willingness to put tennis balls in boxes, and to make some limited number of paperclips for (suitably sympathetic?) robots, when push comes to shove I do actually start to get picky about the preferences I’m excited to devote significant resources to satisfying (here I expect some people to be like: “uhh… duh?”).

Suppose, for example, that here I am, in a position to save the lives of humans, prevent the suffering of animals, help build a beautiful and flourishing future, and so on; but I learn that actually, there are a sufficiently large number of paperclip maximizers in distant star systems that the best way to maximize aggregate preference satisfaction is to build paperclip factories instead (let’s assume that the Clippers will never find out either way, and that they and the Earthlings will never interact). Fancy instrumental cooperation stuff aside, I’m inclined, here, just to acknowledge that I’m not really about maximizing aggregate preference satisfaction, in cases like this; burning or sacrificing things of beauty and joy and awareness and vitality, in order to churn out whatever office supplies happen to be most popular, across some (ill-defined, contingent, and often arbitrary-seeming) set of “slots,” just isn’t really my thing.

Indeed, cases like this start to make me feel “hostage,” in a sense, to whatever utility functions just happen to get written into the slots in question. I imagine, for example, living in a world where a machine churns out AI system after AI system, each maximizing for a different randomly selected ranking over worlds, while I, the preference utilitarian, living in a part of a universe these AI systems will never see or interact with, scramble to keep up, increasingly disheartened at the senselessness of it all. As I wipe the sweat off my brow, looking out on the world I am devoting my life to, effectively, randomizing, I expect some feeling like: wait, why am I doing this again? I expect some feeling like: maybe I should build a Utopia instead.

And once we start talking about actively sadistic preferences, I have some feeling like: I’m out of here. Consider a sadist who just really wants there to be suffering in a certain patch of desert they’ll never see. You might say: “ah, satisfying this preference would conflict with someone else’s stronger preference not to suffer, so the sadist always loses out; sorry, sadist” (are you really sorry?). Hmm, though: we never said anything about just how strong the sadist’s preference was, or, indeed, about how we are counting “strength.” What’s more, suppose that there are tons of such sadists, spread out across the universe, all obsessed with this one patch of desert, all totally oblivious to what happens in it (thanks to Katja Grace for pointing me to examples in this vein). If you make the sadists “big” and numerous enough, eventually, the naive preference utilitarian conclusion becomes clear, and I’m not on board. 

Perhaps you say: fine, forget the sadists, ban them from your slots, but keep the rest. But what kind of arbitrary move is that? Are they not preference-havers? When you prick them, do they not bleed? Whence such discrimination? We got into this whole preference utilitarian game, after all, to stop “imposing our will,” and to be genuinely “responsive to the Other.” Who decided that sadists aren’t the Other, if not “our will”?

Indeed, in some sense, for the preference utilitarian, this example is no different from a non-sadistic case. Suppose, for example, that the universe contains tons of hyper-focused Clippers, all of whom want that patch of desert in particular turned into clips; but there is also a conservationist whose adamant preference is that it stay sand. Here, the preference utilitarian seems likely to say that we should accept the will of the majority: clips it is. But from the perspective of the preference ontology, what’s, actually, the distinction between this case and the last? Whoever would suffer for the sake of the oblivious sadists, after all, is ultimately just someone whose strong preferences conflict with theirs: if you draw the cases on the whiteboard, you can make them look the same. But somehow suffering, at least for me, is different; just as Utopia, for me, is different. Somehow (big surprise), at the end of the day, I’m not a preference utilitarian after all. 
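Here is the whiteboard version as a minimal Python sketch. Everything numerical in it is an assumption made for illustration: the preference strengths, the headcounts, and above all the "naive" aggregation rule of summing strengths per outcome, which is one simple way of counting and not the only one. The point is only that both cases are built by the same constructor and handled by the same rule.

```python
from collections import defaultdict


def naive_aggregate(preferences):
    """Pick the outcome with the largest summed preference strength.

    `preferences` is a list of (desired_outcome, strength) pairs.
    ASSUMPTION: strengths are comparable across agents and simply add.
    """
    totals = defaultdict(float)
    for outcome, strength in preferences:
        totals[outcome] += strength
    return max(totals, key=totals.get)


def conflict_case(majority_outcome, minority_outcome, n_majority,
                  majority_strength=1.0, minority_strength=100.0):
    """Many weakly invested agents versus one strongly invested agent.

    All default numbers are hypothetical.
    """
    preferences = [(majority_outcome, majority_strength)] * n_majority
    preferences.append((minority_outcome, minority_strength))
    return preferences


# Oblivious sadists versus the one who would suffer.
print(naive_aggregate(conflict_case("suffering", "no_suffering", 50)))
print(naive_aggregate(conflict_case("suffering", "no_suffering", 200)))

# Hyper-focused Clippers versus the conservationist.
print(naive_aggregate(conflict_case("clips", "sand", 50)))
print(naive_aggregate(conflict_case("clips", "sand", 200)))
```

With these assumed numbers, the lone strong preference wins against 50 opponents and loses against 200, in both cases alike. Nothing in the aggregation rule can tell the suffering case from the sand case; only the labels differ.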

IX. Polytheism

Overall, then, I think that subjectivists (and to a lesser extent, realists) aspiring towards impartial altruism are left with the following tension. On the one hand, if they aren’t unrestricted preference utilitarians, then their altruism will involve at least some amount of “imposing” their will on the people they are trying to help: they will be, that is, Johnny Appleseeds, who offer apple trees to people who want cherry trees more. But if they are unrestricted preference utilitarians, their ethic looks unappealing along a variety of dimensions — not least, because it implies that in some cases, they will end up turning the farm into random office supplies, torture chambers, and the like.

Currently, my take is that we (or, many of us) should accept that even in altruism-flavored contexts, we are, to some extent, Johnny Appleseeds. We have a particular conception of the types of lives, experiences, opportunities, forms of flourishing, and so on that we hope other agents can have — a conception that most humans plausibly share to a large extent, especially for most present-day practical purposes, but that other agents may not, even on idealized reflection; and a conception which no higher and more objective normative authority shares, either. This doesn’t mean we go around giving agents things they don’t want; but it might well mean that to the extent they don’t want what we’re offering, we focus more on other agents who do, and that we are in that sense less “impartial” than we might’ve initially thought (though this depends somewhat on how we define impartiality; on some definitions, if you’re equally happy to give e.g. pleasure, knowledge, flourishing, etc. to anyone who wants it, the “equally” and the “anyone” leave you impartial regardless).

That said, I do think there’s room to go a lot of the way towards helping agents centrally on their own terms — even Clippers, tennis-ballers, and so on — especially when you find yourself sharing the world with them already; to err very hard, that is, on the side of giving Little Billy the Xbox. Some of this may well fall under object-level altruism; but even if it doesn’t, there’s a lot of room for more fully preference-utilitarian, golden-rule type thinking, in the context of various fancy (or not so fancy) types of cooperation, reciprocity, norm-following, and general “niceness.” My current best guess is that trying to shoe-horn all altruism/morality-flavored activity into this (ultimately instrumental?) mold gives the wrong verdicts overall (for example, re: moral concern towards the powerless/counterfactually powerless), but I think it has a pretty clear and important role to play, especially as the preferences of the agents in question start to diverge more dramatically. After all, whatever their preferences, many agents have an interest in learning to live together in peace, make positive sum trades, coordinate on mutually-preferred practices and institutions, and so on — and these interests can ground commitments, hypothetical contracts, dispositions, and so on that involve quite a bit of thinking about, and acting on the basis of, what other agents would want, regardless of whether you, personally, are excited about it.

All this talk of personal excitement, though, does raise the question, at least for me: is the Johnny Appleseed model really altruism, as it’s often understood? Indeed, I think the subjectivist can start to worry about this before we start talking about Johnny Appleseed stuff. In particular: once we have let go of the normative realist’s Good, with a capital G — the God who was supposed to baptize some privileged forms of activity as occurring “from the point of view of the universe,” the thing that everyone (or, everyone whose vision is not irreparably clouded) was, in some sense, supposed to be on board with (see, e.g., a common interpretation of the Good as generating “agent-neutral reasons,” reasons that apply to everyone; an interpretation that it’s quite unclear the subjectivist can sustain) — we might wonder whether all activity, however directed towards others, is relegated to the land that the old God would’ve dismissed as “mere preference,” just “what you like.” Saving lives, curing diseases, opening cages, fighting oppression, building wise and flourishing civilizations — are these things so different, for subjectivists, than collecting stamps, or trying to tile the universe with neon pictures of your face? At the end of the day, it’s just your slot; just your “thing.” You wanted, perhaps, a greater sense of righteousness; a more Universal Wind at your back; a more Universal Voice with which to speak. But you were left with just yourself, the world around you, other people, the chance to choose.

(Here I think of this verse from Jacob Banks, and about what he takes himself to have learned from travelers, and from mirrors — though I don’t think it’s quite the same).

Perhaps, then, you responded to this predicament by seeking a different Universal voice: not the voice of God, or of the realist’s Good, but the voice of all agents, aggregated into a grand chorus. Perhaps, in part (though here I speculate even more than elsewhere), you sought to avoid the burdens of having a will yourself, and certainly, of “imposing it.” Perhaps you sought to deny or accept the will of others always by reference to some further set of wills, other than your own: “I’d love to do that for you, Clippy; it’s just that Staply and the others wouldn’t like it.” Perhaps some part of you thought that if you made yourself transparent enough, you wouldn’t exist at all; other agents will simply exist more, through you (an aim that gives me some feeling of: what makes them worth it, in your eyes, but yourself such nothingness?). Perhaps such transparency seemed a form of ethical safety; a way, perhaps, of not being seen; of avoiding being party to any fundamental forms of conflict; of looking, always, from above, instead of face to face. 

Indeed, it’s easy, especially for realists, to assume that if one is trying to promote “the Good,” or to do the “Right Thing,” or to do what one “Should Do,” then by dint of pitching one’s intentions at a certain conceptual level (a move sometimes derided as “moral fetishism”), the intentions themselves, if sincere, are thereby rendered universally defensible. Perhaps, empirically, one fails to actually promote the Good; perhaps one was wrong, indeed, about what the Good consists in; but one’s heart, we should all admit, was in the right place, at least at a sufficient level of abstraction (here I think of Robespierre, talking endlessly of virtue, as blood spatters the guillotine again and again). And perhaps there is some merit to “fetishism” of this kind, and to thinking of hearts (both your heart, and the hearts of the others) in this way. Ultimately, though, the question isn’t, centrally, what concepts you use to frame your aspirations, except insofar as these concepts predict how you will update your beliefs and behavior in response to new circumstances (for example, perhaps in some cases, someone’s professing allegiance to “the Good” is well-modeled as their professing allegiance to the output of a certain type of population-ethics-flavored discourse, the conclusion of which would indeed change how they act). Ultimately, the question is what, on the object level, you are actually trying to do.

And here the preference utilitarian tries, in a sense, to do everything at once; to make of their song the Universal Song, even absent the voice of God. But without God to guide it, the Universal Chorus ultimately proves too alien, too arbitrary, and sometimes, too terrible. In trying to speak with every voice, you lose your own; in trying to sing every song at once, you stop singing the songs you love. 

Thus, the turn to the Johnny Appleseed way. But now, you might think, the line between cancer curing (altruism?) and stamp collecting (random hobby?) has gotten even thinner: not only does it all come down to your preferences, your slot, your thing, but your thing isn’t even always fully responsive to the Other, the Universal Chorus, on its own terms. You’re not, equally, everyone’s friend, at least not from their idealized perspective. Your “helping people” has gotten pickier; now, apparently, it has to be “helping people in the way that you in particular go in for.” Is that what passes for impartial altruism these days? 

I think the answer here is basically just: yes. As far as I can currently tell, this is approximately as good as we can do, and the alternatives seem worse. 

I’ve been describing realism as like monotheism: the realist looks to the One True God/Ranking for unassailable guidance. Subjectivism, by contrast, is like polytheism. There are many Gods — far too many, indeed, to fight for all of them at once, or to try to average across. This is the mistake of the preference utilitarian, who tries to make, somehow, from the many Gods, another monotheism — but one that ultimately proves grotesque. Johnny Appleseed, though, acknowledges, and does not try to erase or transcend, the fundamental plurality: there is a God of consciousness, beauty, joy, energy, love; there is a God of paperclips; there is a God of stamps, and of neon pictures of your face; there is a God of fire, blood, guillotines, torture; there is a God of this randomized ranking of worlds I just pulled out of a hat. And there are meta-Gods, too; Gods of peace, cooperation, trade, respect, humility; Gods who want the other Gods to get along, and not burn things they all care about; Gods who want to look other Gods in the eye; to understand them; maybe even to learn from them, to be changed by them, and to change them in turn (an aspiration that Bostrom’s “goal content integrity” leaves little room for). 

All of us serve some Gods, and not others; all of us build, and steer, the world, whether intentionally or not (though intention can make an important difference). Indeed, all of us build, and steer, each other, in large ways and small — an interdependence that the atomized ontology of “slots” may well be ill-suited to capture. Johnny Appleseed fights for the God of apple trees. You fight for your Gods, too.

(Or at least, this is my current best guess. Some part of my heart, and some “maybe somehow?” part of my head, is still with the monotheists.)

X. Cave and sun

All that said, at a lower level of abstraction, I am optimistic about finding more mundane distinctions between collecting stamps and saving lives, curing cancers, preventing suffering, and so on. For one thing, the latter are activities that other people generally do, in fact, really want you to do. Granted, not everyone wants this as their favorite thing (Clippy, for example, might well prefer that your altruism towards her were more paperclip-themed — though even she is generally glad not to die); but the population of agents who are broadly supportive, especially on present-day Earth, is especially wide — much wider, indeed, than the population excited about the neon face thing. In this sense, that is, Johnny Appleseed finds himself surrounded, if not by apple tree obsessives, then at least, by lots of people very happy to grow apples. Their Gods, in this sense, are aligned.

We might look for other distinctions, too — related, perhaps, to intuitive (if sometimes hard to pin down) boundaries between self and other, “about me” vs. “about others.” Salient to me, for example, is the fact that wanting other people to flourish, for its own sake, breaks through a certain kind of boundary that I think can serve, consciously or unconsciously, as a kind of metaphysical justification for certain types of conventional “selfishness” — namely, the boundary of your mind, your consciousness; perhaps, your “life,” or what I’ve called previously, your “zone.” Sidgwick, in my hazy recollection, treats something like this boundary as impassable by practical reason, and hence posits an irreconcilable “dualism” between rational egoism and utilitarianism — one that threatens to reduce the Cosmos of Duty to a Chaos. And I think we see it, as well, in various flavors of everyday solipsism; solipsism that forgets, at some level, that what we do not see — the lives and struggles and feelings of others; the dreams and ideals and battles of our ancestors; history, the future, the unknown, the territory — still exists; or perhaps, a solipsism which treats the world beyond some veil as existing in some lesser sense, some sense that is not here; not the same type of real as something “in me.” 

To care, intrinsically, about what happens to someone else, even if you never find out, is to reject this solipsism. It is to step, perhaps briefly, out of the cave, and into the sun; to reach past the bubble of your mind, and into the undiscovered land beyond. And an aspiration towards impartiality, or to see “from the point of view of the universe,” can be seen as an effort to stay in the sun, fully; to treat equally real things as equally real; to build your house on the rock of the world.

Is trying to build such a house distinctive to altruism? Not from any theoretical perspective: the rock of the world doesn’t actually privilege one set of values over others. Utility functions, after all, are rankings over real worlds, and you can fight for any ranking you like. Clippy cares about the world beyond her map, too; so does a tyrant who wants to live on in the history books, or in people’s memories (is that “about others?”); and even the hedonistic egoist can be understood as endorsing a ranking over full worlds — albeit, one determined solely by the pleasure that occurs in the bits she labels “my mind.” Indeed, in a sense, the point of view of the universe is the place for basically everyone to start; it’s just, the universe itself; the thing that’s actually there, where whatever you’re fighting for will actually happen.

In practice, though, it feels like the difference between zone and beyond-zone, cave and sun, is important to (though not exhaustive of) my own conception of impartial altruism. This difference, for example, is one of the main things I feel inclined to question the egoist about: “are you sure you’re really accounting for the fact that the world beyond you exists? That other people are just as real as you?” Perhaps they are; and perhaps, there, the conversation stalls. But often, the conversation doesn’t stall when I have it with myself. To the contrary, for me, it is when the world seems most real that the case for altruism seems clearest. Indeed, I think that in general, altruism believes very hard in the existence of the world — and impartial altruism, in the entire world. Other things believe this too, of course: it’s a kind of “step 1.” But for the altruistic parts of my own life, it feels like it’s a lot of the way.

(Thanks to Nick Beckstead, Paul Christiano, Ajeya Cotra, and especially Katja Grace for discussion.)


Some thoughts on David Roodman’s GWP model and its relation to AI timelines

July 20, 2021 - 01:59
Published on July 19, 2021 10:59 PM GMT

[Cross posted from the EA forum.]

I’ve been working on a report (see blog) assessing possible trajectories for GWP out to 2100. A lot of my early work focussed on analysing a paper of my colleague David Roodman. Roodman fits a growth model to long-run GWP; the model predicts a 50% probability that annual GWP growth is >= 30% by 2043.

I was thinking about whether to trust this model’s GWP forecasts, compared with the standard extrapolations that predict GWP growth of ~3% per year or less.[1] I was also thinking about how the model might relate to AI timelines.

This post briefly describes some of my key takeaways, as they don’t figure prominently in the report. I explain them briefly and directly, rather than focussing on nuance or caveats.[2] I expect it to be useful mostly for people who already have a rough sense for how Roodman’s model works. Many points here have already been made elsewhere.

Although for brevity I sometimes refer to “Roodman’s extrapolations”, what I really mean is the extrapolations of his univariate model once it’s been fitted to long-run GWP data. Of course, David does not literally believe these extrapolations. More generally, this post is not about David’s beliefs at all but rather about possible uses and interpretations of his model.

[Views are my own, not my employer’s.]

Economic theory doesn’t straightforwardly support Roodman’s extrapolation over standard extrapolations

Early on in the project, I had the following rough picture in my mind (oversimplifying for readability):

Standard extrapolations use what are called ‘exogenous growth models’. These fit the post-1900 data well. However, the exponential growth is put in by hand and isn’t justified by economic theory. (Exogenous growth models assume technology grows exponentially but don’t attempt to justify this assumption; the exponential growth of technology then drives exponential growth of GDP/capita.)

On the other hand, endogenous growth models can explain growth without putting in the answer by hand. They explain technological progress as resulting from economic activity (e.g. targeted R&D), and they find that exponential growth is implausible - a knife-edge case. Ignoring this knife-edge case, growth is either sub- or super- exponential. Roodman fits an endogenous growth model to the data and finds super-exponential growth (because growth has increased over the long-run on average).

So Roodman’s model uses a better growth model (endogenous rather than exogenous). Roodman’s model also has the advantage of taking more data into account (standard economists typically don’t use pre-1900 data to inform their extrapolations).

Overall, we should put more weight on Roodman than standard extrapolation, at least over the long-run.

I no longer see things this way. My attitude is more like (again oversimplifying for readability):

Although exogenous growth models don’t justify the assumption of exponential growth of technology, semi-endogenous growth models justify this claim pretty nicely.[3] These semi-endogenous models can explain the post-1900 exponential growth and the pre-1900 super-exponential growth in a pretty neat way - for example see Jones (2001).

Roodman’s model departs from these semi-endogenous models primarily in that it assumes population is ‘output-bottlenecked’.[4] This assumption means that if we produced more output (e.g. food, homes), population would become larger as a result: more output → more people. This assumption hasn’t been true over the last 140 years, and doesn’t seem to be true currently: since the demographic transition in ~1880, fertility has decreased even while output per person increased. (That said, significant behaviour change or technological advance could make the assumption reasonable again, e.g. a return to Malthusian conditions, human cloning, AGI.)

So semi-endogenous growth models are more suitable than Roodman’s model for extrapolating GWP into the future: the main difference between them is that the latter assumes population is output-bottlenecked. Both theories can explain the pre-1900 data,[5] and semi-endogenous models provide a better explanation of the post-1900 data.

Overall, by default I’ll trust the projections of the semi-endogenous models.[6] There’s one important caveat. If significant behaviour change or tech advance happens, then population may become output-bottlenecked again. In this case, I’ll trust the predictions of Roodman’s model.

Roodman’s GWP extrapolation is aggressive from an outside-view perspective

The above section gives an inside-viewy reason to think Roodman’s GWP projections are aggressive. (The model assumes population is output-bottlenecked; when you remove this assumption you predict slower growth.)

I think they’re also aggressive from an outside-view perspective, based purely on recent GWP growth data.

First, the model over-predicts GWP growth over the last 60 years and over-predicts frontier GDP/capita growth over the last 120 years. (This is widely recognised, and is documented in Roodman’s original post.)

Second, its median prediction for growth in 2020 is 7%. This is after updating on the GWP data until 2019. Why is this? Roodman’s model bases its prediction on the absolute level of GWP, and doesn’t explicitly take into account the recent growth rate. Roughly speaking, it believes that "higher GWP means higher growth" based on the pre-1900 data and it observes ~3% growth in the 1900s. GWP in 2020 is way higher than the average GWP in the 1900s, so the model predicts a higher value for 2020 growth than it observed throughout the 1900s.[7]

Why does it matter that the model predicts 7% growth in 2020? Well, GDP growth in frontier economies has recently been more like 2% (source). That’s a difference of 1.8 doublings.[8] Another 1.8 doublings gets us to 24% growth.[9] In log-space, Roodman’s model thinks that we’ve already covered half the distance between 2% and 24%.[10]

To put it another way, Roodman’s model falsely thinks we’ve already covered ~half the distance to TAI-growth in log-space.
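The log-space arithmetic above can be checked directly. The snippet below is a minimal sketch that uses only the two figures quoted in the text (2% recent frontier growth and the model's 7% median for 2020); the variable names are illustrative and not taken from the report's code.

```python
import math

# Figures quoted in the text above.
frontier_growth = 0.02   # recent GDP growth in frontier economies
model_growth = 0.07      # the model's median prediction for 2020 growth

# Number of doublings separating 2% from 7%.
doublings = math.log2(model_growth / frontier_growth)

# Growth rate reached after the same number of doublings again.
next_growth = model_growth * 2 ** doublings

# Share of the log-space distance from 2% to that rate which the
# model treats as already covered.
fraction_covered = doublings / math.log2(next_growth / frontier_growth)

print(f"doublings from 2% to 7%: {doublings:.2f}")
print(f"growth after as many doublings again: {next_growth:.1%}")
print(f"fraction of log-distance already covered: {fraction_covered:.2f}")
```

By construction the fraction comes out at one half: the second leg is defined to be the same number of doublings as the first.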

If in fact we have to travel twice as far through log-space, it will take more than twice as long according to hyperbolic-y models like Roodman’s. That’s because each doubling of growth is predicted to take less time than the last. Roodman’s model thinks we’ve already covered the slowest doublings (from 2% to 4%, 4% to 7%). In its mind, all we have left are the much-quicker doublings from 7% to 14% and 14% to 24%.

How would adjusting for this change the GWP projections? Roughly speaking, it should much more than double the time until 24% growth. Double it because growth has to double twice as many times. Much more than double it because the doublings Roodman’s model omitted will take much longer than the ones it included.
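To see why the omitted doublings dominate, here is a rough sketch of a deterministic hyperbolic model. It assumes, as footnote 10 describes, that growth doubles each time GWP rises by a factor of about 3.5, and it ignores the randomness in Roodman's actual model. Under that assumption the growth rate g is proportional to GWP^B with B = ln 2 / ln 3.5, which gives dg/dt = B·g², so the time to go from g1 to g2 is (1/g1 − 1/g2)/B. The function name `years_between` is made up for illustration.

```python
import math

F = 3.5                         # GWP factor per doubling of growth (footnote 10)
B = math.log(2) / math.log(F)   # exponent in g ∝ GWP**B

def years_between(g_start, g_end, b=B):
    """Years for the growth rate to rise from g_start to g_end,
    given dg/dt = b * g**2 (deterministic, continuous time)."""
    return (1 / g_start - 1 / g_end) / b

# The four steps discussed in the text.
steps = [(0.02, 0.04), (0.04, 0.07), (0.07, 0.14), (0.14, 0.24)]
for lo, hi in steps:
    print(f"{lo:.0%} -> {hi:.0%}: {years_between(lo, hi):.1f} years")

from_7 = years_between(0.07, 0.24)   # what the model thinks is left
from_2 = years_between(0.02, 0.24)   # what is left if growth is really 2%
print(f"7% -> 24%: {from_7:.1f} years")
print(f"2% -> 24%: {from_2:.1f} years ({from_2 / from_7:.1f}x as long)")
```

In this simplified picture each step is quicker than the one before, and starting from 2% rather than 7% multiplies the waiting time by well over two. The exact numbers should not be read as Roodman's forecasts, since the stochastic model and its fitted parameters differ from this caricature.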

I modelled this a bit more carefully, focussing on the time until we have 30% growth. Roodman’s model’s median prediction for the first year of 30% growth is 2043 - ~20 years away. I tried to adjust the model in two hacky ways, each time forcing it to predict that the growth in 2020 was 2%.[11] I found the median prediction for 30% growth shifts back to ~2110 or later - ~90 years away.[12] The time to 30% much more than doubles.

(The average GWP growth over the last 20 years is ~3.5%. If I set the 2020 growth to 3.5% rather than 2%, the predicted date of explosive growth is delayed to ~2075, ~55 years away.)

In other words, if we adjust Roodman’s model based on what we know to be the current growth rate, its predictions become much more conservative.
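The direction and rough size of this shift can be illustrated with a deterministic caricature of the second adjustment described in footnote 12 (scaling the model's instantaneous growth rate by a constant factor). This is not the code linked in that footnote: it assumes growth doubles for every 3.5X increase in GWP (footnote 10), ignores randomness, and uses a hypothetical helper `years_until_target`.

```python
import math

F = 3.5                         # GWP factor per doubling of growth (footnote 10)
B = math.log(2) / math.log(F)   # exponent in g_model ∝ GWP**B
MODEL_GROWTH_2020 = 0.07        # the model's median 2020 growth, per the text
TARGET = 0.30                   # "explosive" growth threshold used in the text

def years_until_target(actual_growth_2020, target=TARGET):
    """Years until scaled growth hits `target`, when the model's growth
    rate is multiplied by a constant so that it matches
    `actual_growth_2020` today."""
    scale = actual_growth_2020 / MODEL_GROWTH_2020
    # Scaled growth is scale * g_model, so g_model must reach target / scale.
    g_model_needed = target / scale
    # With GWP growing at scale * g_model: d(g_model)/dt = scale * B * g_model**2.
    return (1 / MODEL_GROWTH_2020 - 1 / g_model_needed) / (scale * B)

unadjusted = years_until_target(0.07)     # no adjustment
adjusted_2 = years_until_target(0.02)     # force 2020 growth to 2%
adjusted_35 = years_until_target(0.035)   # force 2020 growth to 3.5%

for label, years in [("7% (unadjusted)", unadjusted),
                     ("2%", adjusted_2),
                     ("3.5%", adjusted_35)]:
    print(f"2020 growth {label}: ~{years:.0f} years to 30% growth")
```

The sketch gives roughly two decades unadjusted and several times that once 2020 growth is forced down to 2%, which is the same qualitative pattern as the medians reported above. It is not expected to reproduce those medians exactly.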

To be clear, I’m not saying that these adjustments make the model ‘better’. For example, they may overadjust if the recent period is surprisingly slow.[13] But I do think my adjustments are informative when considering what to actually believe from an outside-view perspective about future growth, especially in the next few decades. From an outside view perspective, I’d personally put more weight on the adjusted models than on Roodman’s original model.

(Note: there may be inside view-y reasons to think an AI-driven growth acceleration will be sooner and more sudden than Roodman’s model suggests; I’m putting these aside here.)

Roodman’s GWP extrapolation shouldn’t be given much weight in our AI timelines

Roodman’s model can predict how long it will take to get to 30% annual GWP growth. Some people have thought about using this to inform timelines for transformative artificial intelligence (TAI). The rough idea is “we have pretty good outside-view reasons to think 30% growth is coming soon; TAI is the only plausible mechanism; so we should expect TAI soon”.

I don’t think this reasoning is reliable, for a few reasons (some discussed above):

  • The same reasoning would have led you astray over the last few decades, as the model’s predicted date of 30% growth has been increasingly delayed.
  • The model thinks we’re already halfway in log-space to TAI-growth; this makes its TAI timelines aggressive.
  • We shouldn’t trust the predictions of Roodman’s model until we have advanced AI (or population is output-bottlenecked for another reason). So it can’t predict when advanced AI will happen.
    • In my mind, population being output-bottlenecked is (part of) the mechanism for super-exponential growth. Roodman’s model describes how powerful this mechanism has been over human history: how quickly it has led growth to increase.[14] The mechanism no longer applies, due to the demographic transition. However, advanced AI could reinstate the mechanism in a new form. So forecasting advanced AI is like forecasting when this mechanism will be in place again. But Roodman’s model forecasts growth on the assumption that the mechanism already applies; it can’t (reliably!) forecast when the mechanism will start to apply again.

      • Here I’m drawing on my belief that population being output-bottlenecked was an important mechanism driving historical super-exponential growth, that this mechanism no longer applies, and that AI could reinstate this mechanism in a new form.
  • The dynamics of historical growth and a potentially future AI-driven growth explosion will be different in many ways.
    • Roodman’s model is fit to long-run GWP data. The dynamics increasing growth in this period are more people -> more ideas -> more people and probably a bunch of significant one-off changes in institutions around the industrial revolution.[15]

    • With AI the dynamics of increasing growth are more AIs -> more ideas -> more hardware/software/wealth -> more and cleverer AIs ->...

    • There’s a suggestive surface similarity there, suggesting that if the former leads to super-exponential growth the latter might as well.

    • But the actual processes will look pretty different which could introduce huge differences in growth.

      • e.g. ‘How easy is it to make AI cleverer compared with humans?’, ‘How many resources does it take to sustain an AI mind compared with a human mind?’, ‘AIs can be copied’, ‘People may not want to hand over tasks to AIs’, ‘Will diminishing marginal returns to tech R&D be different for AI minds than for human minds?’.

Overall I think Roodman’s model is useful for indicating that something big could happen, that growth could dramatically accelerate, but otherwise not very informative. To the extent Roodman’s model is informative about AI timelines, I view it as aggressive for the reasons given in the bullets.

  1. E.g. see here or links from the report. ↩︎

  2. Read the report for nuance and caveats! ↩︎

  3. In semi-endogenous growth models technology improves as the result of R&D effort but there are diminishing returns -- each 1% increase in technology (measured as TFP) requires more researcher-years than the last -- so that you need exponential growth in researchers to sustain exponential technology growth. The justification of exponential growth is then roughly: the number of researchers has grown roughly exponentially, so we’d expect technology to have grown roughly exponentially as well. ↩︎

  4. This assumption is made explicit in Roodman’s multivariate model. The univariate model doesn’t feature population, so is naturally understood as a purely statistical model without interpretation. However, in Roodman’s paper the univariate model is motivated theoretically as the univariate analogue of the multivariate model (in which population is output-bottlenecked). This is why I say that the univariate model “assumes population is ‘output-bottlenecked’”. Technically, you can get hyperbolic growth from the multivariate model even if population is held constant and so it is not literally true that the univariate model assumes population is output-bottlenecked. However, more extreme parameter values are needed for this to happen, and such values are in tension with the non-hyperbolic growth of the last 100 years. So in practice, if not technically, I do think of Roodman’s univariate model as assuming that population is output-bottlenecked. ↩︎

  5. Here I’m putting aside reasonable doubts over whether their explanation of pre-1900 growth is correct. ↩︎

  6. When combined with the standard assumption that global population will stabilize, semi-endogenous models imply economic growth will gradually slow down over time. They don’t imply constant exponential growth long into the future. ↩︎

  7. I’ve oversimplified my description of the data to simplify this paragraph. In reality GWP growth increased until ~1960 and got as high as 5%, even though frontier growth stopped increasing from 1900. ↩︎

  8. 2 * 2^(1.8) = 7 ↩︎

  9. 7 * 2^(1.8) = 24 ↩︎

  10. Why think about it in terms of log-space? Roodman’s model (ignoring the randomness) believes that “each time GWP increases by a factor f, GWP growth doubles”. f depends on the data, and comes out at about 3.5 for Roodman’s data set. So in Roodman’s model, considering doublings of growth, i.e. log-space, is natural: growth doubles each time GWP increases by 3.5X. This is true of other hyperbolic models as well, e.g. Kremer (1993). ↩︎

  11. Perhaps I should have set it to 3.5%, the recent GWP growth rate, but I was using the recent frontier growth rate of 2%. ↩︎

  12. The first method is my ‘growth multiplier’ explained here. Its median predicted date of explosive growth ranged from 2120 - 2140 depending on an arbitrary choice of timescale (r in the model). See code here. The second method just reduces the instantaneous growth rate of Roodman’s model at every time-step by a constant factor 2/7 (because it currently predicts 7% rather than 2%). This led to a median prediction of 2110. See code here. ↩︎

  13. I also think Roodman’s unadjusted model is more informative about how fast we could grow if the population were as large as our economy could support (Malthusian conditions). ↩︎

  14. Of course, Roodman’s parameters will also implicitly include other mechanisms influencing growth like the massive increase in the share of labour focussed on innovation, improvements in education, and other things. ↩︎

  15. E.g. expansion of R&D as a share of the economy and better institutions for investing in new businesses. ↩︎


Work on Bayesian fitting of AI trends of performance?

July 19, 2021 - 21:45
Published on July 19, 2021 6:45 PM GMT

I remember seeing a report on trends of performance in AI. It would be quite useful for me to find it again. As I remember it, the author looked at things like ImageNet top5 accuracy and extrapolated how it increased over time.

I can't remember the author, though I think it was sponsored by OpenPhil. Does anyone know what I am talking about? 

I'd also be interested in related work about predicting trends of performance in technology more generally :)


Is the argument that AI is an xrisk valid?

July 19, 2021 - 18:08
Published on July 19, 2021 1:20 PM GMT

Hi folks,

My supervisor and I co-authored a philosophy paper on the argument that AI represents an existential risk. That paper has just been published in Ratio. We figured LessWrong would be able to catch things in it which we might have missed and, either way, hope it might provoke a conversation. 

We reconstructed what we take to be the argument for how AI becomes an xrisk as follows: 

  1. The "Singularity" Claim: Artificial Superintelligence is possible and would be out of human control.
  2. The Orthogonality Thesis: More or less any level of intelligence is compatible with more or less any final goal. (as per Bostrom's 2014 definition)

From the conjunction of these two premises, we can conclude that ASI is possible, it might have a goal, instrumental or final, which is at odds with human existence, and, given the ASI would be out of our control, that the ASI is an xrisk.

We then suggested that each premise seems to assume a different interpretation of "intelligence", namely:

  1. The "Singularity" claim assumes general intelligence
  2. The Orthogonality Thesis assumes instrumental intelligence

If this is the case, then the premises cannot be joined together in the original argument, aka the argument is invalid.

We note that this does not mean that AI or ASI is not an xrisk, only that the current argument to that end, as we have reconstructed it, is invalid.

Eagerly, earnestly, and gratefully looking forward to any responses. 


Preparing for ambition

July 19, 2021 - 09:13
Published on July 19, 2021 6:13 AM GMT

(Cross posted on my personal blog.)

When I was in college, I decided that I wanted to spend my life starting startups. It seemed like the perfect "career path". If things really work out for you, the reward is billions of dollars. Awesome. But here's the really cool part: even if you "fail", like that Seinfeld episode, things will still even out for you.

The consolation prize is a life working as a programmer. What does that life look like? Well, it's pretty cushy. You make good money, are treated relatively well, have the option of working remotely, and get to do something that is intellectually interesting. Not bad. And if you play your cards right, you'll be able to retire early.

So back when I was in college beginning my startup journey, this was my mindset. Don't get me wrong, I did really, really, really want to make those billions of dollars and use the money to change the world. That is definitely where the crosshairs were aiming. But at the same time, I recognized that if that didn't work out for me, life would still be pretty great.

I like to think of this as the Gravy Mindset. In the context of mashed potatoes, the gravy is "extra". Mashed potatoes are perfectly satisfying by themselves, but pouring gravy on top makes them extra good. In the context of this "career path" of starting startups, a 9-5 job as a programmer is the potato, and succeeding with a startup is the gravy. If we were to throw some random numbers around, we can say that the potato by itself is an 8/10, but if you add some gravy to it it brings it up to a 10/10.

This is how things started for me. I had this Gravy Mindset. But somewhere along the line, things changed. Instead of it being an 8/10 and a 10/10, it became a 2/10 and a 4/10. I was so insanely fixated on the gravy that the potato started tasting pretty gross without it. And when I was fortunate enough to get a taste of some gravy, it just felt like, "Yeah, so? This is how it's supposed to be."

In the language of DHH, I had become poisoned by ambition.

In the right dose, ambition works wonders. It inspires you to achieve more, to stretch yourself beyond the comfort zone, and provides the motivation to keep going when the going gets tough. Rightfully so, ambition is universally revered.

But ambition also has a dark, addictive side that’s rarely talked about.


That’s exactly the danger of what too much ambition can do: Narrow the range of acceptable outcomes to the ridiculous, and then make anything less seem like utter failure.


But when the ambition is cranked up to the max due to prior accomplishments and success, it can easily provide only pressure and anxiety. When that’s the case, winning isn’t even nearly as sweet as the loss is bitter. When you expect to win, it’s merely a checked box if you do — after the initial rush of glory dies down.

I want to really emphasize how irrational I think this all is. Let's use a different analogy. This analogy demonstrates a perspective that I have on happiness.

Imagine that you go to a restaurant. Your favorite food is chicken parmigiana, and that's what you plan on ordering. But when the waiter comes over, he tells you that they don't have any chicken parmigiana.

Who cares! There are so many other delicious items on the menu! How about some chicken francese?!

And let's take things further and suppose that they are all out of a whole category of items, like meat. No worries. Penne alla vodka would be delicious. Even if they were out of all entrées, soup and breadsticks would also be great.

The point is, there are so many delicious items on the menu. And, analogously, there are so many ways in life to derive happiness. If one of them is taken away from you or just doesn't work out, it's fine. There are so many other options.

Personally, I take this to an extreme. I like to think that as long as I have my mind, I'd be ok. The mind is such a powerful thing. There is so much you can do with it.

Unfortunately, neither my Gravy Mindset nor my Menu Mindset proved to be an antidote to the poison of ambition. How can this be? I think it has to do with Mental Mountains and subagents inside your mind.

You know how in the movie Inside Out there are the different emotions? How the blog Wait But Why talks about different characters inside your head like the Panic Monster and the Instant Gratification Monkey? How in HPMoR each house is a separate voice inside of Harry's head? At first I thought that these were just cute rhetorical devices. Now I am realizing that they are pointing towards a deep, important truth about how our minds actually work.

Here's the theory. Your mind doesn't just consist of one "self". There are many different selves. Different subagents. And these different selves often disagree with each other. For example, in my case, one self believes in the Gravy Mindset, whereas another self is like this sad guy in the movie Soul: hopelessly obsessed with and poisoned by an ambition. Unfortunately, it is the latter self that is in charge of producing emotions for me.

How can these different "selves" persist in this state of disagreement? Why doesn't the Gravy Mindset self just walk up to the poisoned self and enlighten him? That would solve all of my problems.

Well, basically, the different selves all live in different towns. Let's use the Mental Mountains analogy. It's as if each self lives in their own town, and each town is located in a valley that is walled off from the other towns by large mountains. In theory you could have the Gravy Mindset self hike up the mountain into the neighboring town, but making that hike is quite difficult. You have to cross some rough terrain.

This is where emotional memory reconsolidation comes in. It's supposed to be a new form of therapy that helps you make the hike up and over the mountain, so that you can rescue that stranded, wounded self living alone in a valley, causing you all of this pain. The idea is that you have to actually make that hike. You can't just try to yell over the mountain, "Stop being stupid and causing me all of this pain!". You have to actually make the trek.

Dr. Tori Olds has a different analogy that I also think is really good. Imagine that you have a zip file on your computer. You can't just edit it in that state. You have to first unzip it. Only then can you edit it. Same thing with healing a damaged subagent. The subagent is zipped up. You can't just yell at it and have it update. You have to unzip it (or activate it). Only then can you heal it.

Hopefully your head isn't spinning too much from all of these analogies. I'd really encourage you to click some of the links and learn more about this stuff. If there is one thing I could say to my past self, it would probably be this stuff. Along with the fact that ambition can be poison.

When I was younger, I never would have expected myself to become poisoned by ambition. Why would I? I had a perfectly good, rational understanding of why the pursuit of startups shouldn't cause me stress and anxiety. Stress and anxiety make sense if you're worried about something truly bad happening. Like if you have a family to support and medical bills to pay and are in danger of losing your job, it would be rational to fear losing your job, and thus to feel stress and anxiety. But in my situation, the fallback plan was a cushy life of a programmer, so there shouldn't be any stress.

Writing this out now, it sounds utterly ridiculous. What about all of the lived experiences I've had that demonstrate that I respond with stress and anxiety in many, many situations where I theoretically "shouldn't" have such a response? E.g., feeling stress about getting a B instead of an A in school. There is no rational reason to feel such stress, but nevertheless, I felt it. So clearly my mind doesn't produce emotions like an ideal rational agent should. In which case, why expect that I would respond so rationally to the stresses of a startup? It was a big mistake that I should never have made. Again, if I could go back and tell my past self something, this is what comes to my mind right now as the most important thing.

Actually, I'm not sure. What would be the point of telling my past self this? In a perfect world, I'd be able to prepare for ambition. It's not that I'd want to go down a different path for my life. I'm happy that I am pursuing startups. I just wish that I was more mentally prepared for it.

An analogy that comes to mind is when you go to the gym for the first time in a while. Say you usually run an eight minute mile. If you haven't been to the gym in a month and you try to run an eight minute mile, you'll be utterly hating life. But if you build yourself back into shape first, you'll be able to run at that eight minute mile pace reasonably well. It's not like you won't be out of breath. It'll still be a workout. But, there's a big difference, y'know? It's a "good hurt". That's what I wish I could have done before pursuing startups. Train myself mentally such that I am "in shape" and respond to stress with a "good hurt".

Maybe I'm being just as naive now as I was when I made my original mistake. Think about reference classes. Everyone who starts startups is super stressed and anxious. Everyone who pursues serious ambitions more generally is super stressed and anxious. You can train yourself to run an eight minute mile, but that's not where the bar is here. The bar of "don't get stressed while pursuing a serious ambition" is more like getting yourself in shape to run a four minute mile. Is there anyone in the world that can do that? If so, it's gotta be pretty rare.

Is this assumption correct? I'm not actually sure. If anyone out there knows the answer or even just has some data points, I'd appreciate you letting me know. I do get a pretty strong sense that a four minute mile is a pretty good analogy though. I'm sure there are some world class "athletes" who manage to pull it off, but it's incredibly hard. One data point that comes to my mind is this interview with Jerry Seinfeld on the Tim Ferriss podcast. Around minute 53, Jerry says he thinks depression and anxiety happen to all creative types, and Tim seems to agree. And there was another point in the podcast where he talks about stand-up comedians specifically. He says that psychologically, it's a super grueling career, and that basically all the stand-up comedians he knows who didn't fizzle out end up having serious issues with anxiety. What I hear through the grapevine in careers like startups and academic research is that it's pretty similar. I don't know enough about other fields, but I strongly suspect it's the same thing.

In which case, is it even worth preparing for ambition in the first place? Even if you get yourself to the point of being able to run a six minute mile, when you try to then run a four minute mile, you'll still end up utterly exhausted. So what's the point?

I don't have the answers here. I've made it to the point where I'm posing the questions, which I'd like to pat myself on the back for as progress, but there is still a long way to go. And right now I am trying to make progress on this. I've spent five years of my life starting two different startups. Both failed. I've learned so much though, and I am ITCHING to apply what I've learned and begin starting startups again. It's been literally keeping me up at night. But I know what a grueling path that could be to start walking down, so I'm trying to be proactive before I do so again.