
# RSS Feed Aggregator

### Relevance Norms; Or, Gricean Implicature Queers the Decoupling/Contextualizing Binary

**Reply to:** Decoupling vs Contextualising Norms

Chris Leong, following John Nerst, distinguishes between two alleged discursive norm-sets. Under "decoupling norms", it is understood that claims should be considered in isolation; under "contextualizing norms", it is understood that those making claims should also address potential implications of those claims in context.

I argue that, at best, this is a false dichotomy that fails to clarify the underlying issues—and at worst (through no fault of Leong or Nerst), the concept of "contextualizing norms" has the potential to legitimize derailing discussions for arbitrary political reasons by eliding the key question of *which* contextual concerns are *genuinely relevant*, thereby conflating legitimate and illegitimate bids for contextualization.

Real discussions adhere to what we might call "relevance norms": it is almost universally "eminently reasonable to expect certain contextual factors or implications to be addressed." Disputes arise over *which* certain contextual factors those are, not *whether* context matters at all.

The standard academic account of how what a speaker means differs from what the *sentence* the speaker said means is H. P. Grice's theory of conversational implicature. Participants in a conversation are expected to add neither more nor less information than is needed to make a *relevant* contribution to the discussion.

Examples abound. If I say, "I ate some of the cookies", I'm *implicating* that I didn't eat *all* of the cookies, because if I had, you would have expected me to say "all", not "some" (even though the decontextualized sentence "I ate some of the cookies" is, in fact, true).

Or suppose you're a guest at my house, and you ask where the washing machine is, and I say it's by the stairs. If the machine then turns out to be broken, and you ask, "Hey, did you know your washing machine is broken?" and I say, "Yes", you're probably going to be pretty baffled why I didn't say "It's by the stairs, *but you can't use it because it's broken*" earlier (even though the decontextualized answer "It's by the stairs" was, in fact, true).

Leong writes:

> Let's suppose that blue-eyed people commit murders at twice the rate of the rest of the population. With decoupling norms, it would be considered churlish to object to such direct statements of facts. With contextualising norms, this is deserving of criticism as it risks creating a stigma around blue-eyed people.

With relevance norms, objecting might or might not make sense depending on the context in which the direct statement of fact is brought up.

Suppose Della says to her Aunt Judith, "I'm so excited for my third date with my new boyfriend. He has the most beautiful blue eyes!"

Judith says, "Are you sure you want to go out with this man? Blue-eyed people commit murders at twice the rate of the general population."

How should Della reply to this? Judith is just in the wrong here—but *not* as a matter of a subjective choice between "contextualizing" and "decoupling" norms, and not because blue-eyed people are a sympathetic group who we wish to be seen as allied with and don't want to stigmatize. Rather, the probability of getting murdered on a date is quite low, *and* Della already has a lot of individuating information about whether her boyfriend is likely to be a murderer from the previous two dates. Maybe (Fermi spitballing here) the evidence of the boyfriend's eye color raises Della's probability of being murdered from one-in-a-million to one-in-500,000? Judith's bringing the possibility up *at all* is a waste of fear in the same sense that lotteries are said to be a waste of hope. Fearmongering about things that are almost certainly not going to happen is *uncooperative*, in Grice's sense—just like it's uncooperative to tell people where to find a washing machine that doesn't work.
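The implied update can be sketched as a quick Bayesian calculation in odds form. The one-in-a-million base rate and the likelihood ratio of 2 are the post's illustrative Fermi numbers, not real statistics:

```python
# Fermi sketch: how much does a 2x relative rate move a tiny base rate?
# All numbers are illustrative, as in the post above.
def update_prob(prior_prob, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

prior = 1e-6  # assumed base rate of being murdered on a date
posterior = update_prob(prior, 2)  # eye-color evidence doubles the odds
print(posterior)  # ~2e-6, i.e. roughly one in 500,000
```

For probabilities this small, multiplying the odds by 2 is indistinguishable from multiplying the probability by 2, which is why the one-in-500,000 spitball checks out.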

On the other hand, if I'm making a documentary film interviewing murderers in prison and someone asks me why so many of my interviewees have blue eyes, "Blue-eyed people commit murders at twice the rate of the rest of the population" is a *completely relevant reply*. It's not clear how else I could possibly answer the question without making reference to that fact!

So far, *relevance* has been a black box in this exposition: unfortunately, I don't have an elegant reduction that explains what *cognitive algorithm* makes some facts seem "relevant" to a given discussion. But hopefully, it should now be intuitive that the determination of what context is relevant is the consideration that is, um, relevant. Framing the matter as "decouplers" (context doesn't matter!) *vs*. "contextualizers" (context matters!) is misleading because once "contextualizing norms" have been judged admissible, it becomes easy for people to motivatedly derail any discussions they don't like with endless isolated demands for contextualizing disclaimers.


### Do you get value out of contentless comments?

Some people like to receive comments of the form "Good post!", even when these comments contain no other engagement with the post. If you post on LW, I'd like to know (a) whether you like receiving these comments, and (b) whether you like receiving these comments more than you would like receiving a strong upvote by their authors.


### Historical forecasting: Are there ways I can get lots of data, but only up to a certain date?

Suppose I wanted to get good intuitions about how the world works on historical timescales.

I could study history, but just reading history is rife with historical hindsight bias, both on my own part, and even worse, on the part of the authors I'm reading.

So if I wanted to master history, a better way would be to do it forecasting-style. I read what was happening in some part of the world up to a particular point in time, and then make bets about what will happen next. This way, I have feedback as I'm learning, and I'm training an actual historical predictor.

However, this requires that a **strong** limit be enforced on the materials I'm reading: no information about "what's going to happen" can leak backwards. And unfortunately, such leakage is standard in history books. Usually, the author talks about how events are leading towards other events that they know will occur.

Are there databases (or anything similar) where I might be able to read a wide range of primary sources and economic / socioeconomic indicators (like the amount of pottery fragments, average skeleton size, how far specialized goods traveled, how much money was in circulation, the literacy rate, etc.), but which will only show me data **up to** a certain date, with a strong constraint against accidentally seeing spoilers?
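The "no spoilers" constraint could be sketched as a reader that only ever returns sources dated at or before a hard cutoff. The `(date, text)` record format here is a hypothetical stand-in for whatever a real primary-source or indicator database would use:

```python
from datetime import date

# Minimal sketch of the spoiler constraint: sources dated after the
# cutoff are invisible, and the cutoff can only move forward in time.
class CutoffReader:
    def __init__(self, records, cutoff):
        self.records = records  # list of (date, text) pairs (hypothetical format)
        self.cutoff = cutoff

    def visible(self):
        """Return only the records dated at or before the cutoff."""
        return [(d, text) for d, text in self.records if d <= self.cutoff]

    def advance(self, new_cutoff):
        # Time only moves forward: no rewinding after making bets.
        self.cutoff = max(self.cutoff, new_cutoff)

records = [
    (date(1453, 5, 29), "Fall of Constantinople reported"),
    (date(1492, 10, 12), "Landfall in the Americas reported"),
]
reader = CutoffReader(records, date(1460, 1, 1))
print(len(reader.visible()))  # 1: the 1492 record stays hidden
```

The hard part, as the post notes, isn't the filter itself but sourcing data whose *content* doesn't already embed hindsight.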


### Nonviolent Communication: Practice Session


### Hybrid Lottery Update

Beantown Stomp has been open for registration for a bit over a week, and we have 122 people registered. In our announcement we said we would run a lottery for the remaining tickets if we had 150 registrations in the first week, and we didn't, so we're going to stay first-come first-served.

When I initially proposed a hybrid model I had been thinking of running the initial stage for a month, and after people convinced me that a month was too long we decided to go for a week. The goal with a hybrid lottery is that you only run a lottery if you need one. Some events are so popular that they sell out before everyone who would like a ticket has had a chance to fill out the registration form, while others don't sell out at all.

Looking at the shape of the registration curve, a better way to identify whether a lottery is needed would be selling two-thirds of the tickets in the first 24 hours. Most of the first-week registrations came in the first 24 hours, after which registration tapered way off.

Now that I no longer think it's important to leave registration open for a week or a month, though, we really could just do something simpler. Open registration, and treat all entries you get in the first 24hr equally. If after 24hr you're full, run a lottery, otherwise start first-come-first-served. Since it already takes us about a day to get back to people anyway, there's not much downside in this approach.
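The simplified rule just described can be sketched in a few lines (names and numbers here are illustrative, not the event's actual registration system):

```python
import random

# Sketch of the simplified hybrid rule: treat all first-24hr entries
# equally; if they exceed capacity, run a lottery over them, otherwise
# admit everyone and fall back to first-come-first-served.
def allocate(first_day_entries, capacity, seed=None):
    if len(first_day_entries) > capacity:
        rng = random.Random(seed)  # seeded for a reproducible drawing
        return sorted(rng.sample(first_day_entries, capacity))
    return list(first_day_entries)  # under capacity: no lottery needed

entries = list(range(30))  # 30 hypothetical first-day registrants
print(len(allocate(entries, capacity=20, seed=0)))  # 20: lottery ran
print(len(allocate(entries, capacity=50)))          # 30: everyone in
```

Note that the rule only randomizes among first-day entries; anyone registering later joins the ordinary first-come-first-served queue.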

I'd probably actually want to do this by announcing "registration will open on day X, though you can fill out the form now, we just won't get back to you until day X+1".

Am I missing anything, or should we do this for future iterations?


### Double Crux and Verification of Claims

On Overcoming Bias, Robin Hanson asks what is verifiable.

One comment includes:

> Play the "double crux game", where any disagreement regarding something unverifiable is reduced to an assertion which is simpler to check, which is reduced to an assertion that is simpler to check, which is eventually reduced to an assertion that all agree is verifiable.

While I think the idea of double crux is very useful in identifying sources of disagreement, letting people avoid talking past one another, I question the idea that it would necessarily allow some higher-level claim to be verified.

It strikes me as something of a fallacy-of-composition type error.

Does this seem a reasonable view to others?


### Defining AI wireheading


What does it mean for an AI to wirehead its reward function? We're pretty clear on what it means for a human to wirehead - artificial stimulation of part of the brain rather than genuine experiences - but what does it mean for an AI?

We have a lot of examples of wireheading, especially in informal conversation (and some specific prescriptive examples which I'll show later). So, given those examples, can we define wireheading well - cut reality at its joints? The definition won't be - and can't be - perfectly sharp, but it should allow us to have clear examples of what is and what isn't wireheading, along with some ambiguous intermediate cases.

#### Intuitive examples

Suppose we have a weather-controlling AI whose task is to increase air pressure; it gets a reward for doing so.

What if the AI directly rewrites its internal reward counter? Clearly wireheading.

What if the AI modifies the input wire for that reward counter? Clearly wireheading.

What if the AI threatens the humans that decide on what to put on that wire? Clearly wireheading.

What if the AI takes control of all the barometers of the world, and sets them to record high pressure? Clearly wireheading.

What if the AI builds small domes around each barometer, and pumps in extra air? Clearly wireheading.

What if the AI fills the atmosphere with CO₂ to increase pressure that way? Clearly wire... actually, that's not so clear at all. This doesn't seem a central example of wireheading. It's a failure of alignment, yes, but it doesn't seem to be wireheading.

Thus not every example of edge or perverse instantiation is an example of wireheading.

#### Prescriptivist wireheading, and other definitions

A lot of posts and papers (including some of mine) take a prescriptivist approach to wireheading.

They set up a specific situation (often with a causal diagram), and define a particular violation of some causal assumptions as wireheading (eg "if the agent changes the measured value X without changing the value of α, which is being measured, that's wireheading").
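That causal criterion can be illustrated with a toy sketch. The variable names (`alpha` for the true world-state, `x` for its measurement) are hypothetical, echoing the quoted example:

```python
# Toy version of the causal criterion quoted above: alpha is the true
# property being measured, x is the measured value. Wireheading, on this
# account, is an action that changes x while leaving alpha untouched.
def is_wireheading(alpha_before, alpha_after, x_before, x_after):
    return x_after != x_before and alpha_after == alpha_before

# Honest optimisation: raise the pressure, and the barometer follows.
print(is_wireheading(1.0, 1.2, 1.0, 1.2))  # False
# Tampering: the barometer moves but the atmosphere doesn't.
print(is_wireheading(1.0, 1.0, 1.0, 1.2))  # True
```

The sketch also shows the limitation discussed next: the criterion only applies once you've already decided which variable counts as "the measurement".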

And that is correct, as far as it goes. But it doesn't cover all the possible examples of wireheading.

Conversely, this post defines wireheading as a divergence between a true utility and a substitute utility (calculated with respect to a model of reality). This is too general: almost as general as saying that every Goodhart curse is an example of wireheading.

#### The definition

The most intuitive example of wireheading is that there is some property of the world that we want to optimise, and that there is some measuring system that estimates that property. If the AI doesn't optimise the property, but instead takes control of the measuring system, that's wireheading (bonus points if the measurements the AI manipulates go down an actual wire).

This re-emphasises that "wireheading is in the eye of the beholder": if our true goal is actually the measuring system (maybe our AI is in competition with another one to maximise a score in a game, and we really don't care how it does this), then there will be no wireheading, just an AI following a correct objective.

Thus wireheading is always a failure of some (implicit or explicit) goal; thus every example of wireheading is a failure of value alignment, though the converse is not true.

Also key to the definition is the fact that the measuring system is, in some sense, "much smaller" than whatever property of the system it is measuring. Pumping out CO₂ is not the correct instantiation of some goal along the lines of "increase air pressure so humans enjoy better weather"; but nor is it merely manipulating the measurement of that goal.

Thus we can define wireheading as:

- Given some implicit goal G, an agent wireheads if, instead of moving towards G, it manipulates some *narrow* measurement channel that is intended to measure G, but will fail to do so after the agent's manipulation.

The difference with the prescriptivist approach is that the measurement channel is not specified; instead, we ask whether we can usefully characterise some feature of the setup as a "narrow measurement channel", and then apply the definition.

This can be seen as a particular failure of abstraction: the abstract goal G was collapsed to the output of the measurement channel.

#### Examples, counter-examples, and edge cases

Under this definition, all the intuitive examples of wireheading above fit: the measurement channel the AI takes over (its internal counter, the wire going into it, the statements made by humans, the barometers, the immediate environments of the barometers) is always much smaller than the whole atmosphere, which was its intended goal.

And that's why the CO₂ example isn't wireheading: the AI is doing a massive manipulation of the world, on the same scale as its intended goal; it isn't just manipulating the measurement channel[1].

The case of the domes around the barometers is an interesting one to consider. Obviously, if the AI put a dome around the planet and pumped in extra air, this wouldn't count as wireheading. Thus, we can imagine the domes growing bigger and merging, thus giving a smooth transition from "clearly wireheading" to "clearly not wireheading", and showing that ambiguous cases must exist.

We can also produce examples of Goodhart curse that are not wireheading. Take the practice of "teaching to the test". In this case, there is a simple objective (the test results) and the school acts to optimise for that objective. However, in typical schools this is not wireheading; teaching to the test involves drilling students in specific skills, training them, and having them memorise certain facts. Though these are done specifically to pass the test, these are the kinds of actions that a teacher would undertake anyway. One can talk about how this "narrows" the intellect, but, except in extreme cases, this cannot be characterised as gaining control of a narrow measurement channel.

For an interesting edge case, consider the RL agent playing the game CoastRunners. As described here, the score-maximising agent misbehaved in an interesting way: instead of rushing to complete the level with the highest score possible, the agent instead found a way to boat in circles, constantly hitting the same targets and ever increasing its score.

Is that wireheading? Well, it's certainly Goodhart: there is a discrepancy between the implicit goals (get round the course fast, hit targets) and the explicit one (maximise the score). But do we feel that the agent has control of a "narrow" measurement channel?

I'd argue that it's probably not the case for CoastRunners. The "world" for this agent is not a particularly rich one; going round and round and hitting targets is what the agent is intended to do; it has just found an unusual way of doing so.

If, instead, this behaviour happened in some subset of a much richer game (say, SimCity), then we might see it more naturally as wireheading. The score there is intended to measure a wider variety of actions (building and developing a virtual city while balancing tax revenues, population, amenities, and other aspects of the city), so "getting a high score while going round in circles" is much closer to "controlling a measurement channel that is narrow (as compared to the implicit goal)" than in the CoastRunners situation.

But, this last example can illustrate the degree of judgement and ambiguity that can exist when identifying wireheading in some situations.

Note that the CO₂ example can fit with the definition of this post. One just needs to imagine that the agent's model does not specify the gaseous content of the air in sufficient detail to exclude a CO₂-rich air as a solution to the goal.

This illustrates that the definition used in that post doesn't fully capture wireheading. ↩︎


### A Brief Intro to Domain Theory

So, domain theory is a fairly abstract branch of math which is about giving semantics to weird recursive constructions in computer science, in the form of partially ordered sets with additional structure. I'm still learning the parts of it that concern building a link to explicit computable rules that can be implemented in a programming language, and it's not easy at all to learn. Takes a lot of persistence.

However, the parts I *have* learned so far seem worth explaining more widely, due to the ability to pull off some *very unique* fixpoint constructions in domain theory that are very hard to do in any other area of math. The classical example is showing that there are nontrivial models of the untyped lambda calculus. Lambda terms can act as functions from lambda terms to lambda terms, but it's awfully hard to explicitly come up with a space Λ that's isomorphic to the space of functions [Λ→Λ], except for the single-point space, due to cardinality constraints. We must work with a restricted notion of function space in order to dodge the cardinality constraints.

Also, in game theory, we can view a policy as a function from the opponent's policy to a probability distribution over our own actions. This was the subject of the Lawvere problem a while ago that was solved by reflective oracles. Specifically, you'd want a pair of spaces π1≅[π2→[0,1]], π2≅[π1→[0,1]] (and then figure out what sort of computable construction corresponds to the thing you just built, if one exists).

If you want to have some seemingly-impossible type signature, it seems worthwhile to see what arsenal of tools domain theory has to let you construct such a space. So this will be a quick rundown of chapters 1-5 in these domain theory notes. http://www.cs.bham.ac.uk/~axj/pub/papers/handy1.pdf

**The Basics:**

Domains are a special type of partially ordered set, where the standard ≥ order roughly corresponds to "information content". Now, given an arbitrary partially ordered set, sup and inf (sup roughly corresponds to the smallest batch of information that contains all the information present in the elements of the set you're taking the sup of, and inf corresponds to the largest batch of information such that every element of the set has more information than that) may not exist for certain subsets. As a toy example, take the poset of three elements that looks like a V. The set consisting of the top two elements has an inf, but it doesn't have a sup. If we add an element on top to turn the poset into a diamond shape, then all subsets have a sup and inf.

In a domain, we don't require the existence of arbitrary sup and inf (that's called a complete lattice), and we don't even require the existence of sup and inf for arbitrary finite sets (that's called a lattice). What we do require is that every directed set has a sup. A poset which fulfills this is called a dcpo (directed-complete partial order).

What's a directed set? It's a nonempty set A, where, for all x,y∈A, there's a z∈A s.t. z≥x,z≥y. You can always find upper bounds (not necessarily the supremum, though!) for any finite subset of a directed set, within the directed set. Consider the poset given by the natural numbers, where the ≥ ordering is given by the standard ≥ ordering on natural numbers. This looks like a chain that starts at 0 and extends forever upwards. This fails the requirement that every directed set has a sup! Because N is a directed set, but there's no sup for it. If we add a single element on the top corresponding to ω, which is ≥ everything, then every directed set has a sup again.
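The V-poset and directedness conditions above can be checked by brute force on finite posets. A minimal sketch (the helper names `leq`, `is_directed`, and `sup` are invented for the example):

```python
# Finite poset as a set of elements plus the set of <= pairs (the relation
# is assumed reflexive; transitive pairs are listed explicitly).
# Example: the "V" poset with a <= b and a <= c (b, c incomparable).
def leq(pairs):
    return lambda x, y: x == y or (x, y) in pairs

def is_directed(subset, le):
    # Nonempty, and every two elements have *some* upper bound in the subset.
    return bool(subset) and all(
        any(le(x, z) and le(y, z) for z in subset)
        for x in subset for y in subset)

def sup(subset, le, universe):
    # Least upper bound within the whole poset, or None if it doesn't exist.
    ubs = [z for z in universe if all(le(x, z) for x in subset)]
    least = [z for z in ubs if all(le(z, w) for w in ubs)]
    return least[0] if least else None

V = {"a", "b", "c"}
le_V = leq({("a", "b"), ("a", "c")})

assert not is_directed({"b", "c"}, le_V)   # no common upper bound
assert sup({"b", "c"}, le_V, V) is None    # so no sup either

# Add a top element to make the diamond: now {b, c} does have a sup.
D = V | {"t"}
le_D = leq({("a","b"), ("a","c"), ("a","t"), ("b","t"), ("c","t")})
assert sup({"b", "c"}, le_D, D) == "t"
```

This mirrors the text: the top two points of the V have no sup, and adding a top element to form the diamond repairs that.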

So, one requirement for being a domain is that every directed set has a sup. Or, in other words, given some arbitrary batch of information-states where for every two information-states, there's a third one incorporating all the information in both of them (and maybe some more info), there must be a minimal information-state incorporating all the information from the whole batch. Any finite poset fulfills the "every directed set has a sup" property, but there may be interesting failures when you move to infinitely many points.

The next component is continuity. Besides the standard ≥ ordering, there's a >> ordering, which corresponds to approximation. x approximates y, written x<<y, iff for all directed sets A where sup(A)≥y, there's a z∈A s.t. z≥x. As an example, if we take the space [0,1] and equip it with the usual ≥ order to turn it into a poset, x>>y (in the poset) corresponds to x>y (in the standard number ordering). Transitivity holds, and antisymmetry holds. But reflexivity doesn't necessarily hold, as this example shows. An element that approximates itself, x>>x, is called compact.

A continuous dcpo is one where, for all points x, x=sup{y|y<<x}. This set happens to be directed (though this fact isn't obvious), and in short, it means that any information-state can be described as the sup of information states that approximate it.

An example of a non-continuous dcpo is: Have two copies of the natural numbers, and add an ω that's above both of those chains, so they're "glued together at infinity". No numbers (in either of the chains) approximate ω, because for any point in one of the chains, you can take the entire other chain, and that's a directed set with a sup of ω, but nothing in that directed set is above the point you picked. So ω can't be built by taking the sup of stuff that approximates it, because there is no such stuff.

If, for all points x, x=sup{y|y<<x,y<<y} (ie, every element is the sup of the compact elements below it), then it's called an algebraic dcpo. These are nicer to work with.

A domain (as defined in the linked notes) is a continuous dcpo (every element can be built as the sup of stuff that approximates it, and all directed sets have a sup). But there's one more condition that shows up so often that I think it should just be folded into the definition of a domain, because if you drop it, an *awful lot* of useful stuff stops working.

Pointedness. Specifically, your domain must have an element ⊥ that is below everything. The typical interpretation of this is something that loops forever. So I'll define a domain as a continuous dcpo with a ⊥.

**Functions:**

We only consider continuous functions between domains. Specifically, a continuous function is one which fulfills the following two properties.

x ≤ y → f(x) ≤ f(y)

f(sup A) = sup_{a∈A} f(a)

In other words, the function should preserve the information ordering (increasing the information content of the input increases information content of the output), and also it shouldn't matter whether you take the sup of a directed set first and then send it through f, or whether you send your directed set through f and take the sup of the image (which is also a directed set by the first condition)

[D→E] is the space of all continuous functions from a domain D to a domain E, and it fulfills a lot of the conditions to be a domain, though not *necessarily* continuity. The ≥ ordering on functions is given as follows: f≥g↔∀x:f(x)≥g(x). (ie, f always produces a more informative result than g on the same input)

The notion of sup in the function space, for a directed collection of functions F, is as follows: sup(F)(x) = sup_{f∈F} f(x). The image in the second half is directed because F is a directed set of functions, thus showing that the result is well-defined, and then you can put in some extra work to show that sup(F) is also a continuous function, so the function space is a dcpo.

The bottom element in the function space is the function f where ∀x:f(x)=⊥E, the maximally uninformative function that just loops forever on all inputs.

The approximation order << in the function space might be quite badly behaved, however. The function space might not even be a domain. In order to get the function space [D→E] to be a domain, you have to impose extra conditions on either D or E. There are several possible extra conditions you can impose which get the function-space to be a domain (or algebraic domain), and also get the function-space to have the same property. So when we jump up to the category theory level, we typically work in some subcategory of domains that is closed under taking the function space.

One neat property of domains is that every continuous function from a domain to itself has a *least* fixed-point below all the other fixed points, and this least fixed-point also has a nice pretty description. The least fixed point of f∈[D→D] is given by the sup of the directed set A with elements ⊥, f(⊥), f(f(⊥)), f(f(f(⊥))), ...

⊥ ≤ f(⊥) because ⊥ is below everything. x ≥ y → f(x) ≥ f(y), so repeatedly applying this shows that each element in the chain is ≥ the element below it. So it's a directed set and has a sup. The image of A after applying f is just A itself (minus the bottom element). By continuity of f, f(sup A) = sup_{a∈A} f(a) = sup_{n∈ℕ} f^{n+1}(⊥) = sup A. So sup A is a fixed point of f.
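On a finite domain every monotone function is continuous, so the construction above becomes a terminating loop. A minimal sketch (the chain domain and the functions are invented examples):

```python
# Kleene iteration: the least fixed point of a continuous f is the sup of
# the chain bottom, f(bottom), f(f(bottom)), ...  On a finite domain the
# chain stabilises, so we can just iterate until it does.
def least_fixed_point(f, bottom):
    x = bottom
    while True:
        fx = f(x)
        if fx == x:          # chain has stabilised at its sup
            return x
        x = fx

# Domain: the chain 0 <= 1 <= 2 <= 3, with bottom = 0.
f = lambda x: min(x + 1, 3)   # monotone, hence continuous here
assert least_fixed_point(f, 0) == 3

# A function with several fixed points: the iteration finds the *least* one.
g = lambda x: max(x, 2)       # fixed points are 2 and 3
assert least_fixed_point(g, 0) == 2
```

Starting from ⊥ (here 0) is what guarantees the fixed point found is the least one: any other fixed point is above every element of the iteration chain.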

Also, the function fix:[D→D]→D that maps a function to its fixed point is continuous.

There's also a category-theory analogue of this same result, but it's quite a bit more complicated.

**Quick note on notation:**

The standard notation for composition of functions, f∘g, doesn't mesh well with intuition, because you have to read it backwards. First g is applied, then f is applied. This can be dealt with acceptably when you aren't composing that many functions, but later on in domain theory, you work up to some really complicated function compositions, so making up a notation that puts the functions the proper way around empirically made proofs vastly easier for me. From now on, f;g;h will be a notation for "first apply f, then g, then h". It's the same as h∘g∘f, but without the need to take ten seconds to mentally reverse the order whenever it shows up.
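The f;g;h convention is easy to mirror in code; a minimal sketch (the helper name `then` is invented):

```python
from functools import reduce

# Forward composition matching the f;g;h notation: apply the functions in
# the order they are written, instead of the reversed h∘g∘f reading.
def then(*fs):
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)

inc = lambda x: x + 1
dbl = lambda x: x * 2
assert then(inc, dbl)(3) == 8     # (3 + 1) * 2: inc first, then dbl
assert then(dbl, inc)(3) == 7     # (3 * 2) + 1
```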

**Embeddings and Projections:**

An embedding-projection pair between a pair of domains D and E is a pair of continuous functions e:D→E, p:E→D, s.t. e;p=idD, and p;e≤idE. In other words, E is richer than D and has more information states available, so you can embed D into E. Embedding D into E and then crunching back down via the projection recovers D exactly (ie, the embedding doesn't identify different things, it's injective). However, projecting E down to D, and then embedding back into E, destroys information, so p;e(x)≤x, which is equivalent to p;e≤idE.

These are nice because an embedding uniquely determines a projection, and vice-versa. If you've got an embedding, it's usually pretty easy to come up with a natural candidate for a projection, and once you verify the two defining properties of that pair of functions, you know there's nothing else you could substitute in (note that there may be many embedding/projection pairs! But once you fix one of the two, that uniquely determines the other part).

This comes in handy once we get up to the category theory level, because lots of times when you're drawing a diagram you're like "damn, I need this arrow to go the other way", but if the arrow is an embedding or projection, you can just automatically get an arrow going the other way by going "take the unique projection/embedding corresponding to the embedding/projection I already have". Normally you'd need an isomorphism to pull off the "reverse the arrows" trick, but embeddings and projections also let you do that. Also all isomorphisms are embeddings and projections.

Also, embeddings and projections have the nice property that they have to be strict (ie, they must both map ⊥ to ⊥).

**Category Theory Constructions:**

So, there are several things you can do to a domain to get another domain. We already know about the function-space, but there are others.

Cartesian product, ×, is simple. It's the domain of pairs where (x,y)≥(z,a) iff x≥z,y≥a.

These get you a cartesian closed category, if you are working in a sufficiently nice subcategory of domains where the function-space is forced to be a domain too and have the defining property of the subcategory.

Considering the subcategory of domains where the only morphisms are strict functions (functions which map ⊥ to ⊥), our candidate for the function space would be the strict function space, [D→E]⊥!, of all functions which map ⊥ to ⊥.

This doesn't harmonize with the cartesian product. However, there's another product. The smash product, ⊗. It's like cartesian product, but all pairs with a ⊥ element in one of the coordinates are identified as the same point, the ⊥ element of the product. Or, in other words, if one of the components of the pair loops forever, it's just classified as the loop-forever bottom element of the smash product.
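To see the difference between the two products on a small made-up example (the two-point chain and the "V" domain, represented however is convenient): the cartesian product keeps every pair, while the smash product collapses every pair with a ⊥ coordinate into a single bottom point.

```python
# Cartesian product vs smash product of two small pointed posets.
# In the smash product, any pair with a bottom coordinate is identified
# with the single bottom point of the product.

BOT = "bot"
D = [BOT, "a"]            # two-point chain
E = [BOT, "b", "c"]       # the "V" domain

cartesian = [(x, y) for x in D for y in E]

smash = [BOT] + [(x, y) for x in D for y in E if x != BOT and y != BOT]

print(len(cartesian))  # 6 pairs
print(len(smash))      # bottom plus (a,b), (a,c): 3 points
```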

Strict function space and smash product get you a monoidal closed category.

There's lifting, which is taking D and making a new domain D⊥ by sticking a single new ⊥ element below everything in D.

There's something called the coalesced sum, ⊕, which corresponds to the coproduct in category theory. It's done by taking two domains D and E, and identifying their bottom elements, which glues them together at the bottom point. The coalesced sum of the three-element domain that looks like a V, with itself, is a domain with 4 incomparable points, and a bottom element below them all.
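A quick sanity check of the V ⊕ V example (the representation with side tags is mine):

```python
# Coalesced sum of the "V" domain with itself: glue the two copies at their
# bottom elements; the four tops stay incomparable, above one shared bottom.

BOT = "bot"
V = [BOT, "x", "y"]

def coalesced_sum(D, E):
    # tag non-bottom points by which side they came from; share a single bottom
    return [BOT] + [("L", d) for d in D if d != BOT] + [("R", e) for e in E if e != BOT]

VplusV = coalesced_sum(V, V)
print(len(VplusV))  # 4 incomparable tops plus one bottom = 5 points
```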

So, function space, strict function space, cartesian product, smash product, coalesced sum, and lifting, are our basic building blocks.

There's one more rather complicated one, the bilimit, which plays a key role in generating fancy domains with seemingly-impossible type signatures.

**The Bilimit:**

Consider an infinite chain of domains indexed by the natural numbers, D0,D1,D2... where, for all n, there's an embedding en of Dn into Dn+1. Call this an expanding system. In other words, as you go further up the chain, the domains get more complicated and keep adding extra points while preserving the existing structure. This is the category-theory analogue of a chain (a type of directed set), analogous to the chain we made to construct a fixed point of a function. It's then reasonable to ask "what's the analogue of sup in this context?"

A composition of embeddings is an embedding (the projection is given by composing the corresponding projections), so we've also got embeddings from any domain in this sequence to any higher domain, and projections from any domain in this sequence to any lower domain.

There's a very special domain called the bilimit, which is both a limit of the diagram with the projections, and a colimit of the diagram with the embeddings. Or, in other words, embedding Dn into B does the same thing as embedding Dn into Dm, m>n, and then embedding Dm into B. Same with projection: it doesn't matter whether you project B into Dn directly, or project B into Dm and then into Dn. And B is universal: any other cone over the projections, or cocone over the embeddings, factors through B via a unique mediating arrow.

The bilimit can be thought of as the infinite product of the domains (equipped with the obvious ordering), but you only keep the points of the form <x0,x1,x2...> where pn(xn+1)=xn. In other words, a point in the bilimit corresponds to an infinite sequence of ever-more-detailed information states from the ever-more-detailed domains, and crunching the m'th domain (more detailed) down to the n'th domain (less detailed) maps the m'th coordinate (a more detailed information state) down to the n'th coordinate (a less detailed information state). The bilimit of a finite sequence would just be the last domain in the sequence, because once you've got the final coordinate, that fixes everything else. The bilimit of an infinite sequence is like the "completion" of the process of the domains getting more and more detailed.
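As a toy illustration of the coherence condition (my own example, not from any particular text): take Dn to be the chain {0,…,n} with 0 as bottom, inclusion as the embedding, and capping at n as the projection. A point of the bilimit is then a sequence satisfying min(x_{n+1}, n) = x_n.

```python
def coherent(xs):
    """Check p_n(x_{n+1}) = x_n, with p_n capping at n, for a finite prefix."""
    return all(min(xs[n + 1], n) == xs[n] for n in range(len(xs) - 1))

# embedding the point 2 of D_2 forward forever gives a compact point:
assert coherent([0, 1, 2, 2, 2, 2])
# the genuinely new point of the bilimit: x_n = n at every stage:
assert coherent([0, 1, 2, 3, 4, 5])
# an incoherent sequence: projecting stage 3 down doesn't give stage 2:
assert not coherent([0, 1, 1, 3])
print("coherence checks pass")
```

In this example the coherent sequences are the eventually-constant ones (the embedded finite-stage points) plus the diagonal x_n = n, so the bilimit is the chain of naturals with one new point sitting on top.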

To project from the bilimit to one of the finite stages, just take the appropriate coordinate, that gives you the point to map to. To embed from one of the finite stages to the bilimit, just project back to get points for the earlier stages, and embed forwards to get points for the later stages.

If you take a point x in the bilimit B, projecting to Dn and then embedding back into B gets you a point y≤x, because project-then-embed moves points down. We can consider the batch of points in B given by projecting into some finite stage and then embedding back. This is a directed set, and its sup is x itself. (ie, to get a close approximant for x, just project into some really distant domain and then embed back; that gets you a long initial sequence of coordinates that match up with x itself)

This bilimit is the analogue of sup, on a category-theory level.

Since we're working in some subcategory of domains in order to make sure the function spaces work out alright, we'd also want our defining property of the subcategory to be closed under bilimits as well. There are indeed some conditions you can impose so you're working in a subcategory closed under bilimits.

The nicest and most convenient such subcategory is the category of bifinite domains. There's a really complicated definition of them, but after putting in a lot of work, you can show that something is bifinite iff it's the bilimit of some expanding system of finite pointed posets (ie, every bifinite domain can be written as the bilimit of an infinite sequence of finite partially ordered sets which all have a bottom element). Another cool note is that these are all algebraic (every point x is the sup of the compact elements — the elements that approximate themselves — below x). The compact elements of a bilimit are those which can be made by taking a point at some finite stage and embedding it into the bilimit (ie, a point is compact exactly when it takes only finitely much information to specify; you can complete the rest of the information by just embedding forward forever once you get far enough).

The problem is that, since you've gotta make it as a limit of finite domains, that impairs your ability to incorporate probabilities into things, since there are infinitely many probabilities. Incorporating probabilities into domains is one of the things on my to-learn list.

**Category Theory + Domain Theory:**

To complete the analogy and get an analogue of the least-fixpoint theorem, but for domains, we need an analogue of a continuous function (we have an analogue of sup in the form of the bilimit).

In this case, the analogue of a continuous function would be a continuous functor (more on what makes a functor continuous later). For example, consider the functor F s.t. F(D)=[D→X]. A fixpoint of F would be a domain D that's isomorphic to [D→X]. We also want our functor to map embeddings to embeddings for the upcoming application, so if there's an embedding e:D→E, we need F(e) to be an embedding from [D→X] to [E→X]. If you've got a functor with multiple inputs, then it needs to map tuples of embeddings to embeddings. Consider the cartesian product functor where F(D,E)=D×E. Given embeddings e:D→D′ and e′:E→E′, F(e,e′) needs to be an embedding from D×E to D′×E′.

Let's do this for the function space, as that's the most complicated one.

Exercise: Do this for all the other constructors, like lifting and cartesian product and coalesced sum.

We have domains D,D′,E,E′, and embeddings e:D→D′ and e′:E→E′. We want F(e,e′) to be an embedding from [D→E] to [D′→E′]. Well... we have a function f from D to E and we're trying to get a function from D′ to E′. Using f, then e′, we can get from D to E′. But we have to somehow get from D′ to D, but we only have an embedding going the other way...

But it's an embedding, so we can just use the unique projection p, which goes the other way! This is what I was talking about with restricting to embeddings so you can get your arrows going the other way when the need arises.

Specifically, our candidate for the embedding is: F(e,e′)(f)=p;f;e′, and our candidate for the projection going the other way is to map g:D′→E′ to e;g;p′. Let's call this function P(e,e′) for later use. It's a bit of work to show that these are continuous functions, so I'm gonna skip that part, but I will show the part about these two things fulfilling the relevant conditions for embeddings and projections.

Condition 1: embedding then projecting had better recover your original point.

F(e,e′);P(e,e′)(f)=P(e,e′)(p;f;e′)=e;p;f;e′;p′=idD;f;idE=f

by the definition of our embeddings and projections, and embed-then-project giving you identity.

Condition 2: projecting then embedding had better produce a lower point.

P(e,e′);F(e,e′)(g)=F(e,e′)(e;g;p′)=p;e;g;p′;e′

p;e;g;p′;e′(x)≤g;p′;e′(x)≤g(x) so p;e;g;p′;e′≤g

(again, by project-then-embed giving you a lower point, for the inequalities)

So our functor does map pairs of embeddings to embeddings.

Now do it with the other building blocks.

Anyways, returning to the original thing, we should go over the definition of continuity for a functor.

For an expanding sequence with bilimit B, applying the functor to the sequence gives you a different expanding sequence (because it maps embeddings to embeddings). This has a bilimit B′.

A functor is continuous if F(B)≅B′. In other words it doesn't matter if you take the bilimit and shove it through the functor, or shove your sequence through the functor, and then take the bilimit of that. Swapping out "functor" for function, and "bilimit" for sup, we can see the close parallel to continuity of functions.

It's kinda hard to verify this condition, but there's a stronger one called local continuity which is a lot easier to check and implies continuity. For all the basic building blocks of domains (product, function space, lifting, coalesced sum, etc...), and compositions of them, the associated functors are continuous, which is quite nice.

**Yo Dawg I Heard You Like Fixed Points:**

Let's momentarily just look at continuous functors that only take one input, not multiple. Like the functor mapping Λ to [Λ→Λ], as a concrete example. We're going to be trying to complete the analogy to the original fixpoint theorem. We start with ⊥, and keep applying our function to it to build up a chain, and then take the sup of the chain, and that's the minimal fixed point. So what's the analogue of ⊥ in a category-theory sense?

It's just the domain consisting of a single point, call it I. And there's only one possible embedding from that to whatever domain you want, the embedding which maps the single point to the bottom element in the target domain.

So our expanding sequence is I,F(I),F(F(I)),F(F(F(I)))... and for the embeddings, we've got our unique embedding e from I into F(I), so F(e) is an embedding from F(I) into F(F(I)), and F(F(e)) is an embedding from F(F(I)) into F(F(F(I)))... Ok, we have our embeddings, and thus our projections. Now we take the bilimit B. By functor continuity, F(B)≅ the bilimit of the expanding sequence shoved through F. But shoving our expanding sequence through F just gives us the same sequence with I clipped off, which doesn't change the bilimit at all.

So, as desired, F(B)≅B. And this is how you cook up mathematical spaces X isomorphic to [X→X]. Or if you want a space X that's isomorphic to X×D, that also can be done.

Now, if you're attentive, you'll have noticed that this construction just gives you a single point as your canonical solution to X≅[X→X]. This can be fixed by starting with something other than a single point (like a domain with two points, a top and a bottom), and then specifying a custom embedding from that into F(2-point domain), giving you a full-blown model of the lambda calculus. But apparently the least fixed-point/canonical solution grants you extra-nice properties. I don't know what those are, since I haven't gotten to that chapter, but apparently you have more powerful tools available if you're working in the least-fixed-point domain instead of some other fixed-point.

There was a paper by Abramsky (here) that looked at something called the lazy lambda calculus, which turns out to be exactly the least-fixed-point solution of X≅[X→X]⊥, which I'm still digesting. This is notable because adding an extra point at the bottom lets you get away from a single point. We start with the one-element domain, take the function space (still a one-element domain), add a bottom element (ok, now we have two elements), take the function space of that (now three elements, for the three possible continuous functions), add a bottom element, and from there it gets way more complicated.
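Those element counts can be verified by brute force. A sketch under the assumption that every stage is a finite poset (where continuous just means monotone); the representation of posets as boolean ≤-matrices is mine:

```python
# Iterate F(D) = [D -> D] followed by lifting, starting from the one-point
# domain, and count elements at each stage.
from itertools import product

def function_space(leq):
    """Poset of monotone self-maps of a finite poset, ordered pointwise.
    leq is an n x n boolean matrix; returns the <=-matrix of the new poset."""
    n = len(leq)
    maps = [f for f in product(range(n), repeat=n)
            if all(leq[f[i]][f[j]] for i in range(n) for j in range(n) if leq[i][j])]
    m = len(maps)
    return [[all(leq[maps[a][i]][maps[b][i]] for i in range(n))
             for b in range(m)] for a in range(m)]

def lift(leq):
    """Add a fresh bottom element (index 0) below everything."""
    return [[True] * (len(leq) + 1)] + [[False] + row for row in leq]

D = [[True]]                     # the one-point domain I
sizes = []
for _ in range(3):
    D = lift(function_space(D))  # F(D) = [D -> D] with a new bottom
    sizes.append(len(D))
print(sizes)  # [2, 4, 36]
```

The counts after each lift come out to 2, then 4 (the stages described above), and then already 36 at the next stage — the "way more complicated" part.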

The analogue of the isomorphism between X and [X→X]⊥ in the computational sense is that you can take a lambda term (element of X), work on it until you get it into something called weak head normal form (a form like λx.M), and then it's a function from lambda terms to lambda terms. The bottom element comes in because maybe the lambda term doesn't reduce to weak head normal form, and then the reduction process just loops forever, and that's the bottom element (because bottom elements correspond to stuff that just loops forever or doesn't return an output)

Now, with tupling and some more work, it's possible to even solve multiple fixpoint equations simultaneously. If you've got continuous functors F1,F2,F3, then there's a least-fixpoint solution to:

A≅F1(A,B,C)

B≅F2(A,B,C)

C≅F3(A,B,C)

So that's what domain theory can do. You can set up a bunch of isomorphisms that you want, of the form "these domains should be isomorphic to these other domains that can be defined in terms of the basic building blocks like function space and smash product", and cook up a canonical solution. An example I'm particularly interested in is (V is the three-element domain that looks like a V):

A≅[B→A]⊕V

B≅[A→B]⊕V

This corresponds to two domains of strategies, where a strategy is a continuous function from their strategy to your strategy, or it's one of two possible actions (or loop forever). So you can take two points in these, and repeatedly feed them to each other to either get looping forever, or a concrete action (the "repeatedly play the strategies against each other" function is a continuous function from A×B to V×V, though that's a bit tricky to show). This seems kinda like the full domain-theory version of Vanessa's metathreat hierarchy without the restriction to operate on some fixed finite level, though I'm not sure how to get probabilities in there.

More later on how to get those domains into something you can actually work with on a computer, once I learn that.

Discuss

### The LessWrong 2018 Review

*If you have 1000+ karma, you have until Dec 1st to nominate LessWrong posts from 2018 (yes, 2018, not 2019) for the first LessWrong Review. The nomination button is available from a post's dropdown menu.*

*Multiple nominations are helpful – posts with enough nominations will proceed to a review phase (ending December 31st), followed by a week of voting. Details below.*

*The LW team will be compiling the best posts and reviews into a physical book, awarding $2000 divided among top posts and (up to) $2000 divided among top reviews.*

*Here are **the top 2018 posts sorted by karma**, and **here they are aggregated by month**.*

*You can see nominated posts here.*

This is the first week of the LessWrong 2018 Review – an experiment in improving the LessWrong Community's longterm feedback and reward cycle.

This post begins by exploring the motivations for this project (first at a high level of abstraction, then getting into some more concrete goals), before diving into the details of the process.

**Improving the Idea Pipeline**

In his LW 2.0 Strategic Overview, habryka noted:

We need to build on each other’s intellectual contributions, archive important content, and avoid primarily being news-driven.

We need to improve the signal-to-noise ratio for the average reader, and only broadcast the most important writing

[...]

Modern science is plagued by severe problems, but of humanity’s institutions it has perhaps the strongest record of being able to build successfully on its previous ideas.

The physics community has this system where the new ideas get put into journals, and then eventually if they’re important, and true, they get turned into textbooks, which are then read by the upcoming generation of physicists, who then write new papers based on the findings in the textbooks. All good scientific fields have good textbooks, and your undergrad years are largely spent reading them.

Over the past couple years, much of my focus has been on the **early-stages** of LessWrong's idea pipeline – creating affordance for off-the-cuff conversation, brainstorming, and exploration of paradigms that are still under development (with features like shortform and moderation tools).

But, the beginning of the idea-pipeline is, well, not the end.

I've written a couple times about what the later stages of the idea-pipeline might look like. My best guess is still something like this:

I want LessWrong to encourage extremely high quality intellectual labor. I think the best way to go about this is through escalating positive rewards, rather than strong initial filters.

Right now our highest reward is getting into the curated section, which... just isn't actually that high a bar. We only curate posts if we think they are making a good point. But if we set the curated bar at "extremely well written and extremely epistemically rigorous and extremely useful", we would basically never be able to curate anything.

My current guess is that there should be a "higher than curated" level, and that the general expectation should be that posts should only be put in that section after getting reviewed, scrutinized, and most likely rewritten at least once.

I still have a lot of uncertainty about the right way to go about a review process, and various members of the LW team have somewhat different takes on it.

I've heard lots of complaints about mainstream science peer review: that reviewing is often a thankless task; the quality of review varies dramatically, and is often entangled with weird political games.

Meanwhile: LessWrong posts cover a variety of topics – some empirical, some philosophical. In many cases it's hard to directly evaluate their truth or usefulness. LessWrong team members had differing opinions on what sort of evaluation is most useful or practical.

I'm not sure if the best process is more open/public (harnessing the wisdom of crowds) or private (relying on the judgment of a small number of thinkers). The current approach involves a mix of both.

What I'm most confident in is that the review should focus on older posts.

New posts often feel exciting, but a year later, looking back, you can ask if it *actually* has become a helpful intellectual tool. (I'm also excited for the idea that, in future years, the process could also include reconsidering previously-reviewed posts, if there's been something like a "replication crisis" in the intervening time)

Regardless, I consider the LessWrong Review process to be an experiment, which will likely evolve in the coming years.

**Goals**

Before delving into the process, I wanted to go over the high-level goals for the project:

*1. Improve our longterm incentives, feedback, and rewards for authors*

*2. Create a highly curated "Best of 2018" sequence / physical book*

*3. Create **common knowledge** about the LW community's collective epistemic state regarding controversial posts*

**Longterm incentives, feedback and rewards**

Right now, authors on LessWrong are rewarded essentially by comments, voting, and other people citing their work. This is fine, as things go, but has a few issues:

- Some kinds of posts are quite valuable, but don't get many comments (and these disproportionately tend to be posts that are more proactively rigorous, because there's less to critique, or critiquing requires more effort, or building off the ideas requires more domain expertise)
- By contrast, comments and voting both nudge people towards posts that are clickbaity and controversial.
- Once posts have slipped off the frontpage, they often fade from consciousness. I'm excited for a LessWrong that rewards Long Content that stands the test of time and is updated as new information comes to light. (In some cases this may involve editing the original post. But if you prefer old posts to serve as a time capsule of your past beliefs, adding a link to a newer post would also work.)
- Many good posts begin with an "epistemic status: thinking out loud", because, at the time, they were just thinking out loud. Nonetheless, they turn out to be quite good. Early-stage brainstorming is good, but if 2 years later the early-stage-brainstorming has become the best reference on a subject, authors should be encouraged to change that epistemic status and clean up the post for the benefit of future readers.

The aim of the Review is to address those concerns by:

- Promoting old, vetted content directly on the site.
- Awarding prizes not only to authors, but to reviewers. It seems important to directly reward high-effort reviews that thoughtfully explore both how the post could be improved, and how it fits into the broader intellectual ecosystem. (At the same time, *not* having this be the final stage in the process, since building an intellectual edifice requires four layers of ongoing conversation.)
- Compiling the results into a physical book. I find there's something... literally *weighty* about having your work in printed form. And because it's much harder to edit books than blogposts, the printing gives authors an extra incentive to clean up their past work or improve the pedagogy.

**A highly curated "Best of 2018" sequence / book**

Many users don't participate in the day-to-day discussion on LessWrong, but want to easily find the best content.

To those users, a "Best Of" sequence that includes not only posts that seemed exciting at the time, but distilled reviews and followup, seems like a good value proposition. Meanwhile, it helps move the site away from being a time-sensitive newsfeed.

**Common knowledge about the LW community's collective epistemic state regarding controversial posts**

Some posts are highly upvoted because everyone agrees they're true and important. Other posts are upvoted because they're more like exciting hypotheses. There's a lot of disagreement about which claims are actually true, but that disagreement is crudely measured in comments from a vocal minority.

The end of the review process includes a straightforward vote on which posts seem (in retrospect) useful, and which seem "epistemically sound". This is not the *end* of the conversation about which posts are making true claims that carve reality at its joints, but my hope is for it to ground that discussion in a clearer group-epistemic state.

*Nomination Phase*

**1 week (Nov 20th – Dec 1st)**

- Users with 1000+ karma can nominate posts from 2018, describing how they found the post useful over the longterm.
- The nomination button is in the post dropdown-menu (available at the top of posts, or to the right of their post-item)
- For convenience, you can review posts via:
- a list of all 2018 posts, sorted by karma
- if you want a more in-depth overview, 2018 posts clustered by month

*Review Phase*

**4 weeks (Dec 1st – Dec 31st)**

- Authors of nominated posts can opt out of the review process if they want. *They can also opt in while noting that they probably won't have time to update their posts in response to critique. (This may reduce the chances of their posts being featured as prominently in the Best of 2018 book.)*

- Posts with sufficient nominations are announced as contenders. *We're aiming to have 50-100 contenders, and the nomination threshold will be set to whatever gets closest to that range.*

- For a month, people are encouraged to look at them thoughtfully, writing comments (or posts) that discuss:
- How has this post been useful?
- How does it connect to the broader intellectual landscape?
- Is this post epistemically sound?
- How could it be improved?
- What further work would you like to see people do with the content of this post?

- A good frame of reference for the reviews is shorter versions of LessWrong or SlatestarCodex book reviews (which do a combination of epistemic spot checks, summarizing, and contextualizing)
- Authors are encouraged to engage with reviews:
- Noting where they disagree
- Discussing what sort of followup work they'd be interested in seeing from others
- Ideally, updating the post in response to critique they agree with

*Voting Phase*

**1 Week (Jan 1st – Jan 7th)**

Posts that got at least one review proceed to the voting phase. The details of this are still being fleshed out, but the current plan is:

- Users with 1000+ karma rate each post on a 1-10 scale, with 6+ meaning *"I'd be happy to see this included in the 'best of 2018' roundup"* and 10 meaning *"this is the best I can imagine"*.
- Users are encouraged to (optionally) share the reasons for each rating, and/or share thoughts on their overall judgment process.

*Books and Rewards*

**Public Writeup / Aggregation**

Soon afterwards (hopefully within a week), the votes will all be publicly available. A few different aggregate statistics will be available, including the raw average, and potentially some attempt at a "karma-weighted average."

**Best of 2018 Book / Sequence**

Sometime later, the LessWrong moderation team will put together a physical book (and online sequence) of the best posts and most valuable reviews.

This will involve a lot of editor discretion – the team will essentially take the public review process and use it as input for the construction of a book and sequence.

I have a lot of uncertainty about the shape of the book. I'm guessing it'd include anywhere from 10-50 posts, along with particularly good reviews of those posts, and some additional commentary from the LW team.

*Note: This may involve some custom editing to handle things like hyperlinks, which may work differently in printed media than online blogposts. This will involve some back-and-forth with the authors.*

**Prizes**

- Everyone whose work is featured in the book will receive a copy of it.
- There will be $2000 in prizes divided among the authors of the top 3-5 posts (judged by the moderation team)
- There will be **up to** $2000 in prizes for the best 0-10 reviews that get included in the book. (The distribution of this will depend a bit on what reviews we get and how good they are.) *(Note: LessWrong team members may be participating as reviewers and potentially authors, but will not be eligible for any awards.)*

Discuss

### A fun calibration game: "0-hit Google phrases"

Here's a simple calibration game: propose some phrase, like "the ultimate east care pant" (something one of my pairs of pants says), and ask "How likely is it that Google returns no search results for this phrase (in quotes)?"
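One way to keep score, if you want to make the game quantitative (the scoring rule is my suggestion, not part of the original game): record a probability for each phrase, check the actual outcomes, and compute the Brier score (lower is better).

```python
# Brier score for the "0-hit Google phrases" game: the mean squared error
# between your stated probabilities and what actually happened.

def brier(forecasts):
    """forecasts: list of (probability_of_zero_hits, actually_zero_hits)."""
    return sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in forecasts) / len(forecasts)

rounds = [
    (0.9, True),   # guessed 90% no-results, and there were none
    (0.3, False),  # guessed 30% no-results, but the phrase did have hits
]
print(round(brier(rounds), 3))  # (0.1^2 + 0.3^2) / 2 = 0.05
```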

Discuss

### Thinking of tool AIs

Preliminary note: the ideas in the post emerged during the Learning-by-doing AI safety workshop at EA Hotel; special thanks to Linda Linsefors, Davide Zagami and Morgan Sinclaire for giving suggestions and feedback.

As the title anticipates, long-term safety is not the main topic of this post; for the most part, the focus will be on current AI technologies. More specifically: why are we (un)satisfied with them from a safety perspective? In what sense can they be considered tools, or services?

An example worth considering is the YouTube recommendation algorithm. In simple terms, the job of the algorithm is to find the videos that best fit the user and then suggest them. The expected watch time of a video is a variable that heavily influences how a video is ranked, but the objective function is likely to be complicated and probably includes variables such as click-through rate and session time.[1] For the sake of this discussion, it is sufficient to know that the algorithm cares about the time spent by the user watching videos.

From a safety perspective - even without bringing up existential risk - the current objective function is simply wrong: a universe in which humans spend lots of hours per day on YouTube is not something we want. The YT algorithm has the same problem that Facebook had in the past, when it was maximizing click-throughs.[2] This is evidence supporting the thesis that we don't necessarily need AGI to fail: if we keep producing software that optimizes for easily measurable but inadequate targets, we will steer the future towards worse and worse outcomes.

Imagine a scenario in which:

- human willpower is weaker than now;
- hardware is faster than now, so that the YT algorithm manages to evaluate a larger number of videos per time unit and, as a consequence, gives the user better suggestions.

Because of these modifications, humans could spend almost all day on YT. It is worth noting that, even in this semi-catastrophic case, the behaviour of the AI would be more tool-ish than AGI-like: it would not actively oppose its shutdown, start acquiring new resources, develop an accurate model of itself in order to self-improve, et cetera.

From that perspective, the video recommendation service seems much more dangerous than what we usually indicate with the term tool AI. How can we make the YT algorithm more tool-ish? What *is* a tool?

Unsurprisingly, it seems we don't have a clear definition yet. In his paper about CAIS, Drexler writes that it is typical of services to deliver bounded results with bounded resources in bounded times.[3] Then, a possible solution is to put a constraint on the time that a user can spend on YT over a certain period. In practice, this could be done by forcing the algorithm to suggest random videos when the session time exceeds a threshold value: in fact, this solution doesn't even require a modification of the main objective function. In the following, I will refer to this hypothetical fixed version of the algorithm as "constrained YT algorithm" (cYTa).
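A minimal sketch of what that constraint could look like in code. Everything here is hypothetical: `recommend` stands in for the real (unknown) ranking algorithm, and the threshold value is arbitrary.

```python
# cYTa as a wrapper: leave the underlying recommender untouched, but switch
# to random suggestions once session time crosses a threshold. The main
# objective function never needs to be modified.

import random

SESSION_LIMIT_MINUTES = 60

def constrained_recommend(recommend, all_videos, user, session_minutes):
    if session_minutes > SESSION_LIMIT_MINUTES:
        return random.choice(all_videos)   # past the bound: random, not optimized
    return recommend(user, all_videos)     # within the bound: business as usual

# usage with a toy recommender that always picks the first video
videos = ["v1", "v2", "v3"]
pick = constrained_recommend(lambda u, vs: vs[0], videos, "alice", 45)
print(pick)  # "v1": under the limit, the optimizer still chooses
```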

Even though this modification would prevent the worst outcomes, we would still have to deal with subtler problems like echo chambers and filter bubbles, which are caused by the fact that recommended videos share something in common with the videos watched by the user in the past.[4] So, if our standards of safety are set high enough, the example of cYTa shows that the criterion "bounded results, resources and time" is insufficient to guarantee positive outcomes.

In order to better understand what we want, it may be useful to consider current AI technologies that we are satisfied with. Take Google Maps, for example: like cYTa, it optimizes within hard constraints and can be easily shut down. However, GMaps doesn't have a known negative side effect comparable to echo chambers; from this point of view, also AIs that play strategy games (e.g. Deep Blue) are similar to GMaps.

Enough with the examples! I claim that the "idealized safe tool AI" fulfills the following criteria:

- Corrigibility
- Constrained optimization
- No negative side effects

Before I get insulted in the comments because of how [insert_spicy_word] this list is, I'm going to spell out some details. First, I've simply listed three properties that seem necessary if we want to talk about an AI technology that doesn't cause any sort of problem. I wouldn't be surprised if the list turned out to be non-exhaustive, and I don't mean it to be taken as a definition of the concept "tool" or "service". At the same time, I think that these two terms are too under-specified at the moment, so adding some structure could be useful for future discussions. Moreover, it seems to me that 3 implies 2 because, for each variable that is left unconstrained during optimization, side effects usually become more probable; in general, 3 is a really strong criterion. By contrast, 1 seems to be somewhat independent of the others. Lastly, even though the concept is idealized, it is not so abstract that we don't have a concrete reference point: GMaps works well as an example.[5]

Where do we go from here? We can start by asking whether what has been said about CAIS is still valid if we replace the term service with the concept of idealized safe tool. My intuition is that the answer is yes and that the idealized concept can actually facilitate the analysis of some of the ideas presented in the paper. Another possible question is to what extent a single superintelligent agent can adhere to 3; or, in other words, whether limiting an AI's side effects also constrains its capability of achieving goals. These two papers already highlighted the importance of negative side effects and impact measures, but we are still far from getting a solid satisfactory answer.

**Summary**

Just for clarity purposes, I recap the main points presented here:

- Even if AGI was impossible to obtain, AI safety wouldn’t be solved; thinking of tools as naturally safe is a mistake.
- As shown by the cYTa example, putting strong constraints on optimization is not enough to ensure safety.
- An idealized notion of safe tool is proposed. This should give a bit more context to previously discussed ideas (e.g. CAIS) and may stimulate future research or debate.

All the details are not publicly available and the algorithm is changed frequently. By googling "YouTube SEO" I managed to find these, but I don't know how reliable the source is. ↩︎

As stated by Yann LeCun in this discussion about instrumental convergence: "[...] Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago. It put in place measures to limit the dissemination of clickbait, and it favored content shared by friends rather than directly disseminating content from publishers." ↩︎

Page 32. ↩︎

With cYTa, the user will experience the filter bubble only until the threshold is reached; the problem would be only slightly reduced, not solved. If the threshold is set really low then the problem is not relevant anymore, but at the same time the algorithm becomes useless because it recommends random videos for most of the time. ↩︎

In order to completely fulfill 3, we have to neglect stuff like possible car accidents caused by distraction induced by the software. Analogously, an AI like AlphaZero could be somewhat addicting for the average user who likes winning at strategy games. In reality, every software can have negative side effects; saying that GMaps and AlphaZero have none seems a reasonable approximation. ↩︎

Discuss

### Junto: Questions for Meetups and Rando Convos

I ponder a lot about community and how important local community is for the functioning of society; many are the riches brought from afar by long distance communication. Nonetheless, local rationality meetups can increase local metis by generating intelligent community. I read in Isaacson’s biography of Benjamin Franklin how he (Franklin) employed his Junto to advance scientific knowledge, civil society, and business; there are some great examples there. But the Wikipedia page will do well enough for an overview.

https://en.wikipedia.org/wiki/Junto_(club)

What I have done here is taken the Junto discussion questions of Franklin's club and reformulated them to serve as a model for the types of questions we can be asking each other to keep advancing community and local knowledge.

- Have you read anything useful or insightful recently? Particularly in technology, history, literature, science, or other fields of knowledge?
- What problems have you been thinking about recently?
- Has there been any worthwhile or important local news?
- Have any businesses failed lately, and do you know anything about the cause?
- Have any businesses recently risen in success? If so, how?
- Do you know of anyone, who has recently done something interesting, praiseworthy or worthy of imitation? Or who has made a mistake we should be warned against and avoid?
- Have you been doing anything recently to increase your psychological and physical health?
- Is there any person whose acquaintance you want, and whom someone in the group can procure for you?
- Do you think of anything at present by which the group could easily do something useful?
- Do you know of any deserving younger person, for whom it lies in the power of the group to encourage and help advance in his career?
- Do you see anything amiss in the present customs or proceedings of the group, which might be amended?

Discuss

### Doxa, Episteme, and Gnosis Revisited

Exactly two years to the day before I started writing this post, I published Map and Territory's most popular post of all time, "Doxa, Episteme, and Gnosis" (also here on LW). In that post I describe a distinction ancient Greek made between three kinds of knowledge we might translate as hearsay, justified belief, and direct experience, respectively, although if I'm being totally honest I'm nowhere close to being a classics scholar so I probably drew a distinction between the three askew to the one ancient Attic Greeks would have made. Historical accuracy aside, the distinction has proven useful over the past couple years to myself and others, so I thought it was worth revisiting in light of all I have learned in the intervening time.

**Nuanced Distinctions**

To start, I still draw the categories of doxa, episteme, and gnosis roughly the same as I did before. To quote myself:

Doxa is what in English we might call hearsay. It’s the stuff you know because someone told you about it. If you know the Earth is round because you read it in a book, that’s doxa.

Episteme is what we most often mean by “knowledge” in English. It’s the stuff you know because you thought about it and reasoned it out. If you know the Earth is round because you measured shadows at different locations and did the math to prove that the only logical conclusion is that the Earth is round, that’s episteme.

Gnosis has no good equivalent in English, but the closest we come is when people talk about personal experience because gnosis is the stuff you know because you experienced it. If you know the Earth is round because you traveled all the way around it or observed it from space, that’s gnosis.

There's more nuance to it than that, of course. Doxa, for example, also refers to thoughts, beliefs, ideas, propositions, statements, and words in addition to its connotations of hearsay, common belief, and popular opinion. Episteme, to Plato, was the combination of doxa and logos, contrary to my example above where I root episteme in observational evidence, although then again maybe not because "logos" can mean not only "reason", "account", "word", and "speech" but also "ground" or "ultimate cause". And gnosis, despite its connotations in English as a special kind of insightful knowledge about the true nature of existence as a result of its use by Christian mystics, shares the same root or is the root via borrowing of the word for "knowledge" in most European languages, English included.

Further, the boundaries between the three categories are not always clear. We've already seen one way this is so, where I described episteme in a way that it's grounded by gnosis via the direct experience of observation, but this is an empiricist perspective on what episteme is and there's an equally valid notion, in terms of category construction, of episteme as reasoning from first thought within a traditional rationalist perspective. Another is that all knowledge is in a certain sense gnosis because there must have been some experience by which you gained the knowledge (unless you really want to double down on rational idealism and go full Platonist), although this need not confuse us if we understand the difference between the experience of something and the something quoted/bracketed within the experience. And similarly, all knowledge we speak of must first become doxa in our own minds that we tell ourselves before it becomes doxa for others by being put into words that draw common distinctions, hence episteme and gnosis can only be generated and never directly transmitted.

**Additional Categories**

In addition to doxa, episteme, and gnosis, we can draw additional distinctions that are useful for thinking about knowledge.

One is metis, or practical wisdom. This is the knowledge that comes from hard won experience, possibly over many generations such that no one even knows where it came from. Metis is often implicit or exists via its application and may look nonsensical or unjustified if made explicit. To return to my original examples, this would be like knowing to take a great circle route on a long migration because it's the traditional route despite not knowing anything about the roundness of Earth that would let you know it's the shortest route.

Related to metis is techne, or procedural knowledge or the knowing that comes from doing. In English we might use a phrase like "muscle memory" to capture part of the idea. It's like the knowledge of how to walk or ride a bike or type on a keyboard or throw a clay pot, and also the kind of knowledge that produces things like mathematical intuition, the ability to detect code smell, and a gut sense of what is right. It's knowledge that co-arises with action.

I'm sure we could capture others. Both metis and techne draw out distinctions that would otherwise disappear within doxa and gnosis, respectively. We can probably make further distinctions for, say, episteme that is grounded in gnosis vs. episteme that is grounded in doxa, gnosis about other types of knowledge, and doxa derived by various means. We are perhaps only limited by our need to make these distinctions and sufficient Greek words with which to make them.

**Relationships**

Rather than continuing down the path of differentiation, let's look instead at how our three basic ways of knowing come together and relate to one another. In the original post I had this to say about the way doxa, episteme, and gnosis interact:

Often we elide these distinctions. Doxa of episteme is frequently thought of as episteme because if you read enough about how others gained episteme you may feel as though you have episteme yourself. This would be like hearing lots of people tell you how they worked out that the Earth is round and thinking that this gives you episteme rather than doxa. The mistake is understandable: as long as you only hear others talk about their episteme it’s easy to pattern match and think you have it too, but as soon as you try to explain your supposed episteme to someone else you will quickly discover if you only have doxa instead. The effect is so strong that experts in fields often express that they never really knew their subject until they had to teach it.

In the same way episteme is often mistaken for gnosis. At least since the time of Ptolemy people have had episteme of the spherical nature of the Earth, and since the 1970s most people have seen pictures showing that the Earth is round, but astronauts continue to experience gnosis of Earth’s roundness the first time they fly in space. It seems no matter how much epistemic reckoning we do or how accurate and precise our epistemic predictions are, we are still sometimes surprised to experience what we previously only believed.

But none of this is to say that gnosis is better than episteme or that episteme is better than doxa because each has value in different ways. Doxa is the only kind of knowledge that can be reliably and quickly shared, so we use it extensively in lieu of episteme or gnosis because both impose large costs on the knower to figure things out for themselves or cultivate experiences. Episteme is the only kind of knowledge that we can prove correct, so we often seek to replace doxa and gnosis with it when we want to be sure of ourselves. And gnosis is the only kind of knowledge available to non-sentient processes, so unless we wish to spend our days in disembodied deliberation we must at least develop gnosis of doxastic and epistemic knowledge to give the larger parts of our brains information to work with. So all three kinds of knowledge must be used together in our pursuit of understanding.

That sounds pretty nice, like all three kinds of knowledge need to exist in harmony. In fact, I even said as much by concluding the original with an evocative metaphor:

It’s coincidental that ancient Greek chose to break knowledge into three kinds rather than two or four or five, but because it did we can think of doxa, episteme, and gnosis like the three legs of a stool. Each leg is necessary for the stool to stand, and if any one of them is too short or too long the stool will wobble. Pull one out and the stool will fall over. Only when all three are combined in equal measure do we get a sturdy foundation to sit and think on.

Alas, I got some things wrong in the original with how I described the relationship between these three aspects of knowledge, specifically in the way things fall apart when the three aspects are not balanced. I won't reprint those words here to avoid spreading confusion, and will rather try to make amends by better describing what can happen when we privilege one kind of knowledge over the others.

To **privilege doxa** is to value words, thoughts, and ideas over reason and experience. This position is sometimes compelling: as the saying goes, if you can't explain something, you don't really understand it, and to explain it you must have and generate doxa. Further, doxa lets you engage with the world at a safe distance without getting your hands dirty, but this comes with the risk of becoming detached, unhinged, ungrounded, unrooted, disconnected, and otherwise uncorrelated with reality because, on its own, doxa is nothing more than empty words. The people we pejoratively claim to put doxa first are sophists, pundits, ivory-tower intellectuals, certain breeds of bloggers, and, of course, gossips. The remedy for their condition is to spend more time thinking for oneself and experiencing life.

When we **privilege episteme** we believe our own reason over and above what wisdom and experience tell us. The appeal of favoring episteme lies in noticing that wisdom and experience can mislead us, such that if we just bothered to think for 5 minutes we would have noticed they were wrong. And, of course, sometimes they are, but if we continue down this path we run into infinite inferential regress, the uncomputable universal prior, the problem of the criterion, epistemic circularity, and more mundane problems like making commonly known mistakes, ignoring our experiences because we don't understand them, and otherwise failing because we didn't reckon we would. Putting episteme first is the failure mode of high modernists, logical positivists, traditional rationalists, and internet skeptics. If we fall victim to their mistakes, the solution lies with finding the humility to accept that sometimes other people know things even when we don't and to trust our lived experiences to be just as they are, nothing more and nothing less.

Finally, **privileging gnosis** is to rely on our experiences at the expense of reason and wisdom. There's a certain logic to the radical empiricism of this approach: what I can know for sure is what I experience with my eyes, ears, nose, tongue, body, and mind, and every other way of knowing is a secondary source. But this leaves out the important contributions of what we can know about the world that lies beyond our direct experience where we learn from others and from reasoning, effectively giving up the epistemic benefits that come with language. Solipsists, hippies, mystics, and occultists are among the folk who tend to value gnosis over episteme and doxa. For them we might advise listening more to others and spending more time at rigorous, precise, and careful thought to balance out their over-strong belief in what they experience.

Walking the middle way between these three attractors is not easy. If nothing else, there's a certain temptation that can arise to identify with the way of knowing you like best and the people who engage most with that way of knowing. I encourage you to resist it! You can hang out with and wear the attire of an intellectual, a rationalist, or a hippie without succumbing to their stereotypical epistemological failure modes of excess doxa, episteme, and gnosis. There is no special virtue in making wrong predictions about the world, regardless of how you came to make that wrong prediction. Instead, you can aspire to remain a sharp blade that cuts to the truth no matter the whetstone used to hone the blade or the stance from which the cut is made.

**Beyond Distinctions**

If it's the case that there's no special privileging of one kind of knowledge over another and the path to truth lies with combining them all, you might ask why make any distinctions at all? Certainly it feels at times useful to draw these distinctions, but as we've seen these distinctions are blurry, nuanced, and blend into each other. What about the alternative of unifying these kinds into a single concept that captures them all?

By itself the English word "knowledge" fails to do that adequately because it tends to point towards explicit knowledge and disregards that which is known implicitly and that which is inseparable from its embeddedness in the world, and we know this because it's noteworthy to point out ways that things like gnosis and metis and techne can count as knowing. So what is the thing that ties these notions all together?

I think it's worth considering what it means to know something. Knowing is an intentional act: a subject (you, me, them) knows an object (the something known). Thus it is a kind of relationship between subject and object where the subject experiences the object in a particular way we consider worth distinguishing as "knowing" from other forms of experience. In knowing the object seems to always be something mental, viz. the object is information not stuff, ontological not ontic. For example, you might say I can't know the cup on my desk directly, only the experience of it in my mind—the noumenon of the cup is not known, only the phenomenon of it. And from there we can notice that knowing is not a single experience, but composed of multiple motions: initial contact with a mental object, categorization of the object in terms of ontology, evaluation of it, possible recollection of related mental objects (memories), integration into a network of those related mental objects, and self-reflection on the experience of the mental object.

Given the complexity of the knowing act, I'm inclined to infer that even if the neurological processes that enable knowing can be thought of as a unified system, it's complex enough that we should expect it to have many aspects that to us would look like different kinds of knowledge. When certain aspects of that process are more salient than the others, we might see a pattern and label that knowing experience as doxa, episteme, or gnosis. So knowledge is neither a single kind nor multiple, but a holon both composed of distinct kinds and cut from a single kind, codependent and inseparable from one another. Thus there are different kinds of knowledge and there is just one kind of knowing, and holding both perspectives is necessary to understanding the depths of what it means to know.

**More to say?**

There's always more to say. For example, I chose to leave out a more detailed discussion on the etiology of knowledge, which confuses the matter a bit since it can mean putting one kind of knowledge causally first, which can be mistaken for thinking one kind is more important than the others. Maybe I'll return to this topic in another two years or more and have additional insights to share.

Discuss

### [AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts

Find all Alignment Newsletter resources __here__. In particular, you can __sign up__, or look through this __spreadsheet__ of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Audio version __here__ (may not be up yet).

**Highlights**

__AI alignment landscape__ *(Paul Christiano)* (summarized by Rohin): This post presents the following decomposition of how to make AI go well:

[__Link__ to image below]

**Rohin's opinion:** Here are a few points about this decomposition that were particularly salient or interesting to me.

First, at the top level, the problem is decomposed into alignment, competence, and coping with the impacts of AI. The "alignment tax" (extra technical cost for safety) is only applied to alignment, and not competence. While there isn't a tax in the "coping" section, I expect that is simply due to a lack of space; I expect that extra work will be needed for this, though it may not be technical. I broadly agree with this perspective: to me, it seems like the major technical problem which *differentially* increases long-term safety is to figure out how to get powerful AI systems that are *trying* to do what we want, i.e. they have the right __motivation__ (__AN #33__). Such AI systems will hopefully make sure to check with us before taking unusual irreversible actions, making e.g. robustness and reliability less important. Note that __techniques like verification, transparency, and adversarial training__ (__AN #43__) may still be needed to ensure that the *alignment* itself is robust and reliable (see the inner alignment box); the claim is just that robustness and reliability of the AI's *capabilities* is less important.

Second, strategy and policy work here is divided into two categories: improving our ability to pay technical taxes (extra work that needs to be done to make AI systems better), and improving our ability to handle impacts of AI. Often, generically improving coordination can help with both categories: for example, the __publishing concerns around GPT-2__ (__AN #46__) have allowed researchers to develop synthetic text detection (the first category) as well as to coordinate on when not to release models (the second category).

Third, the categorization is relatively agnostic to the details of the AI systems we develop -- these only show up in level 4, where Paul specifies that he is mostly thinking about aligning learning, and not planning and deduction. It's not clear to me to what extent the upper levels of the decomposition make as much sense if considering other types of AI systems: I wouldn't be surprised if I thought the decomposition was not as good for risks from e.g. powerful deductive algorithms, but it would depend on the details of how deductive algorithms become so powerful. I'd be particularly excited to see more work presenting more concrete models of powerful AGI systems, and reasoning about risks in those models, as was done in __Risks from Learned Optimization__ (__AN #58__).

**Previous newsletters**

__Addendum to AI and Compute__ *(Girish Sastry et al)* (summarized by Rohin): Last week, I said that this addendum suggested that we don't see the impact of AI winters in the graph of compute usage over time. While true, this was misleading: the post is measuring compute used to *train* models, which was less important in past AI research (e.g. it doesn't include Deep Blue), so it's not too surprising that we don't see the impact of AI winters.

**Technical AI alignment**

**Mesa optimization**

__Will transparency help catch deception? Perhaps not__ *(Matthew Barnett)* (summarized by Rohin): __Recent__ (__AN #70__) __posts__ (__AN #72__) have been optimistic about using transparency tools to detect deceptive behavior. This post argues that we may not want to use *transparency tools*, because then the deceptive model can simply adapt to fool the transparency tools. Instead, we need something more like an end-to-end trained deception checker that's about as smart as the deceptive model, so that the deceptive model can't fool it.

**Rohin's opinion:** In a __comment__, Evan Hubinger makes a point I agree with: the transparency tools don't need to be able to detect all deception; they just need to prevent the model from developing deception. If deception gets added slowly (i.e. the model doesn't "suddenly" become perfectly deceptive), then this can be way easier than detecting deception in arbitrary models, and could be done by tools.

**Prerequisites:** __Relaxed adversarial training for inner alignment__ (__AN #70__)

__More variations on pseudo-alignment__ *(Evan Hubinger)* (summarized by Nicholas): This post identifies two additional types of pseudo-alignment not mentioned in __Risks from Learned Optimization__ (__AN #58__). **Corrigible pseudo-alignment** is a new subtype of corrigible alignment. In corrigible alignment, the mesa optimizer models the base objective and optimizes that. Corrigible pseudo-alignment occurs when the model of the base objective is a non-robust proxy for the true base objective. **Suboptimality deceptive alignment** is when deception would help the mesa-optimizer achieve its objective, but it does not yet realize this. This is particularly concerning because even if AI developers check for and prevent deception during training, the agent might become deceptive after it has been deployed.

**Nicholas's opinion:** These two variants of pseudo-alignment seem useful to keep in mind, and I am optimistic that classifying risks from mesa-optimization (and AI more generally) will make them easier to understand and address.

**Preventing bad behavior**

__Vehicle Automation Report__ *(NTSB)* (summarized by Zach): Last week, the NTSB released a report on the Uber automated driving system (ADS) that hit and killed Elaine Herzberg. The pedestrian was walking across a two-lane street with a bicycle. However, the car didn't slow down before impact. Moreover, even though the environment was dark, the car was equipped with LIDAR sensors, which means that the car was able to fully observe the potential for collision. The report takes a closer look at how Uber had set up their ADS and notes that in addition to not considering the possibility of jaywalkers, "...if the perception system changes the classification of a detected object, the tracking history of that object is no longer considered when generating new trajectories". Additionally, in the final few seconds leading up to the crash the vehicle engaged in *action suppression*, which is described as "a one-second period during which the ADS suppresses planned braking while the (1) system verifies the nature of the detected hazard and calculates an alternative path, or (2) vehicle operator takes control of the vehicle". The reason cited for implementing this was concerns of false alarms which could cause the vehicle to engage in unnecessary extreme maneuvers. Following the crash, Uber suspended its ADS operations and made several changes. They now use onboard safety features of the Volvo system that were previously turned off, action suppression is no longer implemented, and path predictions are held across object classification changes.

**Zach's opinion:** **While there is a fair amount of nuance regarding the specifics of how Uber's ADS was operating, it does seem as though there was a fair amount of incompetence in how the ADS was deployed.** Turning off Volvo system fail-safes, not accounting for jaywalking, and trajectory resetting seem like unequivocal *mistakes*. A lot of people also seem upset that Uber was engaging in action suppression. However, given that randomly engaging in extreme maneuvering in the presence of other vehicles can *indirectly cause* accidents, I have a small amount of sympathy for why such a feature existed in the first place. Of course, the feature was removed and it's worth noting that "there have been no unintended consequences—increased number of false alarms".

**Read more:** Jeff Kaufman writes a __post__ summarizing both the original incident and the report. Wikipedia is also rather thorough in their reporting on the factual information. Finally, __Planning and Decision-Making for Autonomous Vehicles__ gives an overview of recent trends in the field and provides good references for people interested in safety concerns.

**Interpretability**

__Explicability? Legibility? Predictability? Transparency? Privacy? Security? The Emerging Landscape of Interpretable Agent Behavior__ *(Tathagata Chakraborti et al)* (summarized by Flo): This paper reviews and discusses definitions of concepts of interpretable behaviour. The first concept, **explicability**, measures how close an agent's behaviour is to the observer's expectations. An agent that takes a turn while its goal is straight ahead does not behave explicably by this definition, even if it has good reasons for its behaviour, as long as these reasons are not captured in the observer's model. **Predictable** behaviour reduces the observer's uncertainty about the agent's future behaviour. For example, an agent that is tasked to wait in a room behaves more predictably if it shuts itself off temporarily than if it paces around the room. Lastly, **legibility** or **transparency** reduces the observer's uncertainty about an agent's goal. This can be achieved by preferentially taking actions that do not help with other goals. For example, an agent tasked with collecting apples can increase its legibility by actively avoiding pears, even if it could collect them without any additional costs.

These definitions do not always assume correctness of the observer's model. In particular, an agent can explicably and predictably achieve the observer's task in a specific context while actually trying to do something else. Furthermore, these properties are dynamic. If the observer's model is imperfect and evolves from observing the agent, formerly inexplicable behaviour can become explicable as the agent's plans unfold.
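One way to make the legibility definition concrete is to model the observer as a Bayesian reasoner over possible goals and measure how much an observed action reduces the entropy of that belief. The goals and likelihood numbers below are invented for illustration; this is a sketch of the idea, not the paper's formalism.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a belief distribution over goals."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, action_likelihood):
    """Observer's Bayesian update: P(goal | action) from P(action | goal)."""
    unnorm = {g: prior[g] * action_likelihood[g] for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

def legibility(prior, action_likelihood):
    """How much the action reduces the observer's uncertainty about the goal."""
    return entropy(prior) - entropy(posterior(prior, action_likelihood))
```

On the paper's apples-vs-pears example: an action that is equally likely under both goals (grabbing the nearest fruit) has legibility zero, while actively avoiding a pear, which is much more likely under the apple-collecting goal, yields a positive entropy reduction.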

**Flo's opinion:** Conceptual clarity about these concepts seems useful for more nuanced discussions and I like the emphasis on the importance of the observer's model for interpretability. However, it seems like concepts around interpretability that are not contingent on an agent's actual behaviour (or explicit planning) would be even more important. Many state-of-the-art RL agents do not perform explicit planning, and ideally we would like to know something about their behaviour before we deploy them in novel environments.

**AI strategy and policy**

__AI policy careers in the EU__ *(Lauro Langosco)*

**Other progress in AI**

**Reinforcement learning**

__Superhuman AI for multiplayer poker__ *(Noam Brown et al)* (summarized by Matthew): In July, this paper presented the first AI that can play six-player no-limit Texas hold’em poker better than professional players. Rather than using deep learning, it works by precomputing a blueprint strategy using a novel variant of Monte Carlo linear counterfactual regret minimization, an iterative self-play algorithm. To traverse the enormous game tree, the AI buckets moves by abstracting information in the game. During play, the AI adapts its strategy by modifying its abstractions according to how the opponents play, and by performing real-time search through the game tree. It used the equivalent of $144 of cloud compute to calculate the blueprint strategy and ran on two server-grade CPUs, which was much less hardware than prior AI game milestones required.

**Matthew's opinion:** From what I understand, much of the difficulty of poker lies in being careful not to reveal information. For decades, computers have already had an upper hand in being silent, computing probabilities, and choosing unpredictable strategies, which makes me a bit surprised that this result took so long. Nonetheless, I found it interesting how little compute was required to accomplish superhuman play.
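For readers curious about the core update inside counterfactual regret minimization, here is a minimal regret-matching sketch in self-play on a plain matrix game. The paper's algorithm is a far more elaborate Monte Carlo variant applied across a huge abstracted game tree; this toy version only shows the basic regret-matching step, with function names and the starting perturbation chosen by me for illustration.

```python
def regret_matching_strategy(regrets):
    """Play each action with probability proportional to its positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def solve(payoffs, iters=20000):
    """Self-play regret matching on a two-player zero-sum matrix game.

    payoffs[i][j] is the row player's payoff. The *average* strategies
    converge toward a Nash equilibrium; CFR applies this same update at
    every information set of a game tree.
    """
    n, m = len(payoffs), len(payoffs[0])
    row_regret = [1.0] + [0.0] * (n - 1)  # tiny asymmetric start, so the
    col_regret = [0.0] * m                # dynamics don't sit still
    row_avg, col_avg = [0.0] * n, [0.0] * m
    for _ in range(iters):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's mix.
        u = [sum(payoffs[i][j] * y[j] for j in range(m)) for i in range(n)]
        v = [-sum(payoffs[i][j] * x[i] for i in range(n)) for j in range(m)]
        u_bar = sum(x[i] * u[i] for i in range(n))
        v_bar = sum(y[j] * v[j] for j in range(m))
        for i in range(n):
            row_regret[i] += u[i] - u_bar  # accumulate counterfactual regret
            row_avg[i] += x[i]
        for j in range(m):
            col_regret[j] += v[j] - v_bar
            col_avg[j] += y[j]
    return [s / iters for s in row_avg], [s / iters for s in col_avg]
```

On matching pennies (`[[1, -1], [-1, 1]]`) the per-iteration strategies cycle, but the averages settle near the uniform equilibrium, which is the sense in which regret minimization "solves" the game.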

**Read more:** __Let's Read: Superhuman AI for multiplayer poker__

**Meta learning**

__Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning__ *(Tianhe Yu, Deirdre Quillen, Zhanpeng He et al)* (summarized by Asya): "Meta-learning" or "learning to learn" refers to the problem of transferring insight and skills from one set of tasks to be able to quickly perform well on new tasks. For example, you might want an algorithm that trains on some set of platformer games to pick up general skills that it can use to quickly learn new platformer games.

This paper introduces a new benchmark, "Meta World", for evaluating meta-learning algorithms. The benchmark consists of 50 simulated robotic manipulation tasks that require a robot arm to do a combination of reaching, pushing and grasping. The benchmark tests the ability of algorithms to learn to do a single task well, learn one multi-task policy that trains and performs well on several tasks at once, and adapt to new tasks after training on a number of other tasks. The paper argues that unlike previous meta-learning evaluations, the task distribution in this benchmark is very broad while still having enough shared structure that meta-learning is possible.

The paper evaluates existing multi-task learning and meta-learning algorithms on this new benchmark. In meta-learning, it finds that different algorithms do better depending on how much training data they're given. In multi-task learning, it finds that the best-performing algorithm uses multiple "heads", or ends of neural networks, one for each task. It also finds that "off-policy" algorithms - those that can learn from the value of actions other than the ones the current policy would take - perform better on multi-task learning than "on-policy" algorithms.
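The shared-trunk/multi-head idea can be sketched in a few lines. This is a toy forward pass in NumPy under assumed dimensions, not the benchmark's actual architecture (which the paper trains with RL): all tasks share one feature extractor, and each task gets its own output head.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultiHeadPolicy:
    """Shared trunk with one output head per task (forward pass only)."""

    def __init__(self, obs_dim, hidden_dim, act_dim, n_tasks):
        self.W_trunk = rng.standard_normal((obs_dim, hidden_dim)) * 0.1
        self.heads = [rng.standard_normal((hidden_dim, act_dim)) * 0.1
                      for _ in range(n_tasks)]

    def act(self, obs, task_id):
        h = np.tanh(obs @ self.W_trunk)  # features shared across all tasks
        return h @ self.heads[task_id]   # task-specific action output

# 50 tasks, matching the benchmark's task count; other sizes are made up.
policy = MultiHeadPolicy(obs_dim=9, hidden_dim=32, act_dim=4, n_tasks=50)
action = policy.act(np.zeros(9), task_id=3)  # shape (4,)
```

The design intuition: the trunk is forced to learn features useful for every task, while the heads absorb task-specific differences, so tasks don't fight over the output layer.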

**Asya's opinion:** I really like the idea of having a standardized benchmark for evaluating meta-learning algorithms. There's a lot of room for improvement in performance on the benchmark tasks and it would be cool if this incentivized algorithm development. As with any benchmark, I worry that it is too narrow to capture all the nuances of potential algorithms; I wouldn't be surprised if some meta-learning algorithm performed poorly here but did well in some other domain.

**News**

__CHAI 2020 Internships__ (summarized by Rohin): CHAI (the lab where I work) is currently accepting applications for its 2020 internship program. The deadline to apply is **Dec 15**.

Discuss

### Affordable Housing Workarounds

After reading some about how affordable housing is actually implemented, it looks to me like rich people could exploit it to avoid paying property and inheritance taxes, and generally get around the means testing requirements.

Affordable housing is about renting or selling homes well below market price, so if there were a large pool of affordability-restricted properties there would be a lot of incentive for people to figure out how to get around the spirit of the restrictions. I'm going to talk about buying here, but renting has a lot of similarities.

A typical buyer restriction today (Somerville example) is something like:

- Annual income no more than $71,400 for a household of two (80% AMI).
- Non-retirement assets no more than $250k.
- Haven't owned a home within 3y ("first-time homebuyer").
- No students.
- Preference for people who currently live or work in the city.
- No legal minimum income, but mortgage lenders will impose one in practice.

Buyers who meet these restrictions are entered into a lottery, and the winner gets a 2-bedroom 2.5-bathroom 1,500 square-foot unit for $177k instead of $1,049k. Property taxes are also very low, ~$200/y instead of ~$9k/y. [1]

These restrictions apply at purchase time: you have to have a relatively low income and assets to qualify, but then there are no further restrictions. This makes sense, because otherwise we would be requiring poor people to stay poor, but it also allows a lot of potential ways for rich people to 'legally cheat':

Intentionally keep a low income for several years. Three years at $70k instead of $140k loses you $210k, but you'd save more than that in property taxes alone long-term.
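Rough arithmetic behind that claim, using the Somerville numbers from above (pre-tax dollars, ignoring discounting and income taxes):

```python
# Three qualifying years at $70k instead of $140k.
income_forgone = (140_000 - 70_000) * 3        # $210,000

# Annual property tax savings, from footnote [1] below.
annual_tax_savings = 8_830 - 190               # $8,640/year

years_to_break_even = income_forgone / annual_tax_savings  # ~24 years

# And that ignores the far larger one-time discount on the purchase itself:
purchase_discount = 1_049_000 - 177_000        # $872,000
```

So property taxes alone only pay back the forgone income over a couple of decades, but the $872k purchase discount dwarfs the $210k of forgone income immediately.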

Arrange for deferred or speculative compensation. Stock that vests in four years, stock options, start a startup.

Get training that gives you high earning potential, but don't start your high paying job until after you have the house. This training is effectively an asset, but it's very hard for the affordable housing administrators to price it, so it's ignored.

Learn through self-study or apprenticeship to get around the prohibition on students.

Postpone transfers to your children until after they have qualified for affordable housing, since the income and assets of relatives are not considered.

Buy land, take advantage of density bonuses, build a large 100% affordable fancy building, and sell the units to your just-out-of-school currently-low-earning children.

There are also longer-term issues around resale. You can sell to anyone you want, as long as they meet the buyer restrictions and pay no more than the legal maximum price. This means sellers are in a position where they can effectively give a very large untaxed gift. This could let parents transfer large amounts of wealth to their children, untaxed. [2] You could also have problems with corruption, where I buy your property for $200k, but then I sneak you an extra $100k so you sell it to me instead of someone else.

Since these are implemented by deed restriction, they could be hard to fix if they're getting exploited. It's also not necessarily obvious whether or how much abuse there is, since the whole problem is that, under the city's verification process, legitimately poor people and artificially poor people look the same. (And what do we mean by "artificially poor," and do we want to include children of bankers who decide to become artists or low-paid academics?)

It's possible that the hassle involved is too high, relative to the potential savings, for it to be worth it for rich people to subvert. If 90% of the units are used as intended and only 10% are tax shelters, I'd consider it not great but probably still good. But I'm very nervous about building a system that sets up so many opportunities for people with good lawyers to get around the spirit of the rules.

[1] The property is assessed at a low value because the city sets maximum resale prices. Since that's below the value of the city's residential exemption, you're taxed as if the property were worth just 10% of its assessed value. I calculate $8,830/year in property taxes for the market-rate unit (after the residential exemption) and just $190/year for the affordable unit.

[2] Stow MA's Deed Restriction Program (faq) is an example of a way of doing this that seems especially prone to exploitation.

*Comment via: facebook*

Discuss

### Wrinkles

Why does our skin form wrinkles as we age?

This post will outline the answer in a few steps:

- Under what conditions do materials form wrinkles, in general?
- How does the general theory of wrinkles apply to aging human skin?
- What underlying factors drive the physiological changes which result in wrinkles?

In the process, we’ll draw on sources from three different fields: mechanical engineering, animation, and physiology.

**Why do Materials Wrinkle?**

Imagine we have a material with two layers:

- A thin, stiff top layer
- A thick, elastic bottom layer

We squeeze this material from the sides, so the whole thing compresses.

The two layers want to do different things under compression:

- The thin top layer maintains its length but wants to minimize bending, so it wants to bow outward and form an arc
- The elastic bottom layer wants to minimize vertical displacement, so it wants to just compress horizontally without any vertical change at all.

Because the two layers are attached, these two objectives trade off, and the end result is waves - aka wrinkles. Longer waves allow the top layer to bend less, so a stiffer top layer yields longer waves. Shorter waves allow the bottom layer to expand/compress less vertically, so a stiffer bottom layer yields shorter waves. The “objectives” can be quantified via the energy associated with bending the top layer or displacing the bottom layer, leading to quantitative predictions of the wavelength - see __this great review paper__ for the math.
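The standard thin-film result from that literature makes the trade-off concrete: the wrinkle wavelength scales as the cube root of the stiffness ratio between the layers. A small sketch (the formula is the textbook stiff-film-on-elastic-substrate result; the numeric inputs are arbitrary illustrative values):

```python
import math

def wrinkle_wavelength(t_film, E_film, E_sub, nu_film=0.3, nu_sub=0.5):
    """Critical wrinkle wavelength of a thin stiff film on an elastic substrate:
    lambda = 2*pi*t * (E_film_bar / (3 * E_sub_bar))**(1/3),
    where E_bar = E / (1 - nu**2) is the plane-strain modulus."""
    Ef = E_film / (1 - nu_film**2)
    Es = E_sub / (1 - nu_sub**2)
    return 2 * math.pi * t_film * (Ef / (3 * Es)) ** (1 / 3)

base = wrinkle_wavelength(t_film=1.0, E_film=1000.0, E_sub=1.0)
stiffer_top = wrinkle_wavelength(t_film=1.0, E_film=8000.0, E_sub=1.0)
stiffer_bottom = wrinkle_wavelength(t_film=1.0, E_film=1000.0, E_sub=8.0)

assert stiffer_top > base      # stiffer top layer -> longer waves
assert stiffer_bottom < base   # stiffer bottom layer -> shorter waves
```

Note the cube-root scaling: making the top layer 8x stiffer only doubles the wavelength, which is why wrinkle wavelengths change gradually even as tissue stiffness changes a lot.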

Engineers do this with a thin metal coating on soft plastic. The two are bound together at high temperature, and then the whole system compresses as it cools. The end result is cool wrinkle patterns:

Other interesting applications include predicting mountain spacing (with crust and mantle as the two layers) and surface texture of dried fruit - see __the review paper__ for more info and cool pictures.

The same thing happens in skin.

**Skin Layers**

For our purposes, skin has three main layers:

- The epidermis is a thin, relatively stiff top layer
- The SENEB (subepidermal non-echogenic band, also sometimes called subepidermal low-echogenic band, SLEB) is a mysterious age-related layer, mostly absent in youth and growing with age, between the epidermis and dermis - more on this later
- The dermis is the thick base layer, containing all the support structure - blood vessels, connective tissue, etc

Both the SENEB and the dermis are relatively thick, elastic layers, while the epidermis is thin and stiff. So, based on the model from the previous section, we’d expect this system to form wrinkles.

But wait, if our skin has a thin stiff top layer and thick elastic bottom layer even in youth, then why do wrinkles only form when we get old?

Turns out, young people have wrinkles too. In youth, the wrinkles have short wavelength - we have lots of tiny wrinkles, so they’re not very visible. As we age, our wrinkle-wavelength grows, so we have fewer, larger wrinkles - which are more visible. The real question is not “why do wrinkles form as we age?” but rather “why does the wavelength of wrinkles grow as we age?”.

Based on the simple two-layer model, we’d expect that either the epidermis becomes more stiff with age, or the lower layers become less stiff.

This is the right basic idea, but of course it’s a bit more complicated in practice. __These guys__ use a three-layer model, cross-reference parameters from the literature with what actually reproduces realistic age-related wrinkling (specifically for SENEB modulus), and find realistic age-related wrinkles with these numbers:

(arrows indicate change from young to old). Other than the SENEB elastic modulus, all of these numbers are derived from empirically measured parameters - see the paper for details.

**Age-Related Physiological Changes**

We have two main questions left:

- Why do the dermis and epidermis stiffen with age?
- What exactly is the SENEB, and why does it grow with age?

I haven’t looked too much into stiffening of the dermis, but the obvious hypothesis is that it stiffens for the same reason lots of other tissues stiffen with age. At some point I’ll have a post on stiffening of the vasculature which will talk about that in more depth, but for now I’m going to punt.

The paper from the previous section notes that the epidermis stiffens mainly due to dehydration; rehydrating the epidermis reverses the stiffening (this is the basis of many cosmetics). A dehydrated epidermis makes sense, since both the SENEB and age-related problems in the vasculature will isolate the epidermis more from the bloodstream (although I haven’t seen direct experimental evidence of that causal link).

That leaves the mysterious SENEB. What is it, and why does it grow with age?

The name “subepidermal non-echogenic band” is a fancy way of saying that there’s a layer under the epidermis which is transparent to ultrasound imaging. That’s the main way the SENEB is detected: it shows up as a space between the epidermis and dermis on ultrasound images of the skin.

As far as I can tell, little is known about the SENEB. The main things we __do know__:

- SENEB grows with age; see numbers above
- SENEB is found in aged skin typically exposed to sunlight (“photoaged”, e.g. hands and face) but not in hidden skin (e.g. butt).

Most authors claim that the SENEB consists of elastin deposits. That matches what we know of __solar elastosis__, the build-up of elastin deposits in photoaged skin. But I haven’t seen anyone systematically line up the ultrasonic and histologic images and chemically analyze the SENEB layer to check that it really is made of elastin. (This may just be a case of different researchers with different tools using different names for things which are the same.)

Assuming that the SENEB does consist of accumulated elastin, why is elastin accumulating? Well, it turns out that elastin is __never__ __broken down__ in humans. It does not turn over. On the other hand, the skin presumably needs to produce new elastin sometimes to heal wounds. Indeed, many authors note that the skin’s response to UV exposure is basically a wound-healing response. Again, I haven’t seen really convincing data, but I haven’t dug too thoroughly. It’s certainly plausible that elastin is produced in response to UV as part of a wound-healing response, and then accumulates with age. That would explain why the SENEB grows in photoaged skin, but not in hidden skin.

Discuss

### Austin meetup notes Nov. 16, 2019: SSC discussion

The following is a writeup (pursuant to Mingyuan's proposal) of the discussion at the Austin LW/SSC Meetup on November 16, 2019, at which we discussed six different SlateStarCodex articles. We meet every Saturday at 1:30pm - if you're in the area, come join us!

You are welcome to use the comments below to continue discussing any of the topics raised here. I also welcome meta-level feedback: How do you like this article format? What sorts of meetups lead to interesting writeups?

Disclaimer: I took pains to make it clear before, during, and after the meetup that I was taking notes for posting on LessWrong later. I do not endorse posting meetup writeups without the knowledge and consent of those present!

**The Atomic Bomb Considered As Hungarian High School Science Fair Project**

There was a Medium post on John von Neumann, which was discussed on Hacker News, which linked to the aforementioned SSC article on why there were lots of smart people in Budapest 1880-1920.

Who was John von Neumann? - One of the founders of computer science, founder of game theory, nuclear strategist. For all his brilliance he's fairly unknown generally. Everyone who knew him said he was an even quicker thinker than Einstein; but why didn't he achieve as much as Einstein? Perhaps because he died of cancer at 53.

Scott Alexander says: {Ashkenazi Jews are smart. Adaptations can have both up- and down-sides (e.g. sickle cell anemia / malaria resistance); likewise some genes cause genetic disorders and also intelligence. These are common in Ashkenazim.}

Jews were forced into finance because Christians weren't allowed to charge interest on loans, but it turned out interest was really useful.

Scott Alexander says: {And why this time period? Because restrictions on Jews only started being lifted just before this period, and they needed a generation or so to pass before they could be successful. And afterward, Nazis happened. Why Hungary and not Germany? Hungary has a "primate city" (Budapest), i.e. a city that's much more prominent than others in its area, so intellectuals will tend to gather there. Germany, by contrast, is less centralized.}

Simulation of idea-sharing and population density - cities are more likely to incubate ideas (Hacker News discussion). Does that mean we'll get more progress if everyone in a certain field gathers in one place? Perhaps. It's helpful to get feedback for your ideas to get your thinking on the right track, rather than going down a long erroneous path without colleagues to correct you.

**Building Intuitions On Non-Empirical Arguments In Science**

Scott Alexander says: {Should we reject the idea of a multiverse if it doesn't make testable predictions? No, because it's more parsimonious, contra the "Popperazi" who say that new theories must have new testable predictions.} This article is interesting because it goes as far as you can into the topic without getting into actual advanced physics.

Similar to Tegmark's argument.

What kinds of multiverse are there? Everett (quantum) multiverse, and cosmological multiverse (different Big Bangs with different physical laws coming from them, etc.). This article applies to both (although maybe you could argue that these are both the same thing).

Related LessWrong article: Belief in the Implied Invisible.

But how do you think about the probability of you being in a multiverse, if that multiverse might contain an infinite number of beings? Should we totally discount finite-population universes (as being of almost-zero probability) because infinity always outweighs any finite number? See Nick Bostrom's Ph.D. dissertation (this is not that dissertation but it likely covers substantially the same material).

The reason for accepting the Everett multiverse is Occam's razor, because it makes the math simpler. Is that accurate? - Yes, but there's a fundamental disagreement about what "simpler" means. On the one hand, Schrödinger's equation naturally predicts the Many-Worlds Interpretation (MWI). On the other hand, MWI doesn't explain where the probabilities come from. MWIers have been trying to figure this out for a while.

Generally probability refers to your state of knowledge about reality. But quantum mechanics overturns that by positing fundamental uncertainty that is not merely epistemic.

Re MWI probabilities, see Robin Hanson's "Mangled Worlds": {Multiverse branches that don't obey the 2-norm probability rule (a.k.a. the "Born rule") can be shown to decline in measure "faster" than branches that do, and if a branch falls below a certain limit it ceases to exist in any meaningful sense because it merges into the background noise, etc.}

Robin Hanson's an economist, right? - Yes, but he may have studied physics at one point.

Scott Aaronson's 2003 paper: {Maybe it's natural to use the 2-norm to represent probability, because it's the only conserved quantity. If we didn't, we could arbitrarily inflate a particular branch's probability.}
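The conservation point can be illustrated numerically: quantum states evolve by unitary maps, and unitary maps preserve the 2-norm exactly, so the Born-rule probabilities always sum to 1. A small sketch (random unitary via QR decomposition; the specific dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 4

# Build a random unitary: the Q factor of a complex Gaussian matrix is unitary.
A = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
U, _ = np.linalg.qr(A)

# A random normalized state: sum of |amplitude|^2 equals 1.
psi = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
psi /= np.linalg.norm(psi)

evolved = U @ psi
# The 2-norm survives evolution (up to float error); the 1-norm generally does not.
two_norm_after = np.linalg.norm(evolved)
one_norm_after = np.sum(np.abs(evolved))
```

If probabilities were amplitudes' absolute values (a 1-norm rule) instead of their squares, generic unitary dynamics would inflate or deflate total probability, which is one way to read Aaronson's argument for why the 2-norm is the natural choice.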

**Autism And Intelligence: Much More Than You Wanted To Know**

Tower-vs-foundation model - intelligence is composed of a "tower" and a "foundation", and if you build up the tower too much without building up the foundation, the tower collapses and you end up autistic. Analogy: MS Word and PowerPoint got better with each update until eventually they got so complex that they're no longer usable.

What mechanisms could explain the tower-vs-foundation model?

Is intelligence linear? You can have e.g. a musical prodigy, or someone who's exceptionally good at specific tasks despite being autistic.

How is intelligence defined here? - By IQ tests, in the cited studies. But these are designed for neurotypical people.

People with autism have higher-IQ families. But maybe such families are simply more likely to take their kids to doctors to get diagnosed with autism - a major confounder.

The studies look mostly at males and the father's genes, but you'd think the mother's genes are equally important.

Facebook post (archive) similar to the tower-vs-foundation concept.

Maybe you could do surveys of lower-income communities to check for autism incidence there - but this is difficult particularly because they're more likely to be mistrustful of strangers asking about such things. Or maybe not; maybe lower-income people are *more* likely to accept payment for scientific studies.

Testing for autism is questionable - why is there a 3:1 male:female ratio? Is this reflective of reality, or of bias in diagnosis? Perhaps you could tell by seeing if rates of diagnosis increase over time at the same rate for males and females - if females are generally diagnosed later than males, then that might be because of bias in the diagnosis that makes males with autism more likely to be diagnosed than females with autism.

How fuzzy is the category of autism? "It's a spectrum" - or more of a multivariate space?

Article in The Guardian says: {The move to accept (and not treat) autism has been harmful for people with severe autism.}

Scott Alexander says: {If you want to call something a disease, it should have a distinct cluster/separation from non-diseased cases, rather than just a continuum with an arbitrary line drawn on it.} This is particularly important in psychology, because oftentimes we can only observe symptoms and only guess as to the cause (in contrast to e.g. infectious diseases).

**Samsara (short story)**

In a world where everyone has attained enlightenment, one man stands alone as being unenlightened... He gets more and more stubborn the more the enlightened ones try to reach him, and founds his own school of unenlightenment. We'll stop the discussion here to avoid spoilers, but you should read it.

This is the type of story that would benefit from having padding material added to the end so that you don't know when the ending is about to come, à la *Gödel, Escher, Bach*.

It's like that Scottish movie *Trainspotting* (which requires subtitles for Americans because of the heavy Scottish dialect) - "What if I don't want to be anything other than a heroin addict"?

Scott Alexander says: {A survey asked people if they would respond to a financial incentive, and if they thought others would respond to the same incentive. People said that others would be more likely to respond to incentives than they themselves were.}

It could be entirely true that *most* people wouldn't respond to incentives, but some people would, and so when you ask them if "other people" would respond, they answer as if you're asking if "anyone" would. The survey question is unclear.

Social desirability bias - you don't want to be known as someone who accepts incentives easily, because that puts you in a bad negotiating position. Always overstate your asking price.

"Would you have sex with me for a billion dollars..." joke.

Speaking of salary negotiations: Always have a good second option you can tell the employer about. But if a candidate claims that "Amazon and Google" are contacting them, that doesn't mean they're any more desirable - Amazon and Google contact everyone!

You could look at sin taxes to see if they have any effect.

*Predictably Irrational* by Dan Ariely - a daycare started fining parents who were late in picking up their kids, but this resulted in even more parents being late.

Incentives occur at the margin, so it can be effective to have incentives even if "most" people don't respond.

Social incentives are powerful. Can you set up social incentives deliberately? One example: Make public commitments to do something, and get shamed if you later don't do it. But see Derek Sivers's TED talk *Keep your goals to yourself*. Then again, did the studies he cites consider the effect of publicly checking in on your progress later?

With purely financial incentives e.g. Beeminder, you might treat it transactionally like in the daycare example.

Aside: {Multi-armed bandit problem: There are a bunch of slot machines with different payouts. What's the best strategy? Explore vs. Exploit tradeoff. *Algorithms to Live By* - book by Brian Christian and Tom Griffiths, who were also on Rationally Speaking. E.g. If you find a good restaurant in a city you're visiting for just a few days, you should go back there again, but in your hometown you should explore more different restaurants.}
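The explore/exploit trade-off mentioned in the aside has a classic minimal strategy, epsilon-greedy: mostly pull the arm that looks best so far, but occasionally try a random one. A sketch with Gaussian-reward arms (illustrative numbers, not from any of the discussed sources):

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """With probability epsilon explore a random arm; otherwise exploit
    the arm with the best running-average reward estimate."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                            # explore
        else:
            arm = max(range(n), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total / steps, counts

avg_reward, counts = epsilon_greedy([0.1, 0.5, 0.9])
# The best arm (mean 0.9) ends up pulled far more often than the others.
```

This mirrors the restaurant example: in a short visit (few steps), exploration rarely pays off before you leave, while in your hometown (many steps) early exploration is cheap relative to the long exploitation phase that follows.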

Hypothesis explaining the survey: You have more information about yourself. If someone estimates that they have a 30% chance of e.g. moving to another city, they'll say "No" to the survey 100% of the time.
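The two hypotheses above (people threshold their own yes/no answer, and read "others" as "anyone") can be combined into a toy simulation. Everything here is made up for illustration: assume 20% of people would very likely respond to the incentive and 80% very likely wouldn't.

```python
import random

rng = random.Random(0)

# Each person's private probability of actually responding to the incentive.
people = [0.9 if rng.random() < 0.2 else 0.05 for _ in range(10_000)]

# Expected fraction of the population that actually responds.
true_rate = sum(people) / len(people)

# "Would YOU respond?" - answered yes only if it's more likely than not.
say_self = sum(p > 0.5 for p in people) / len(people)

# "Would OTHERS respond?" - read as "would anyone?", a near-certain yes.
say_anyone = 1.0 if any(p > 0.5 for p in people) else 0.0
```

Under these assumptions, self-reports come in around 20% while "others" comes back as an emphatic yes, reproducing the survey's asymmetry without anyone being wrong about anyone.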

Aside: {*Yes Minister* TV show features Minister Jim Hacker, a typical well-meaning politician concerned about popularity and getting stuff done; and Sir Humphrey, his secretary, a 30-year civil servant who knows how things actually work and is always frustrating the minister's plans. "The party have had an opinion poll done; it seems all the voters are in favour of bringing back National Service. - Well, have another opinion poll done showing the voters are *against* bringing back National Service!"}

Scott Alexander concludes: {Skeptical of the research, because we do see people respond to financial incentives. Even if most people don't, it could still be important.}

**Too Much Dark Money In Almonds**

Scott Alexander says: {Why is there so little money in politics? Less than $12 billion/year in the US, which is less than the amount of money spent on almonds. Hypothesis: this is explained by coordination problems.}

Other ideas: People want to avoid escalation since if they spend money their political opponents will just spend more, etc. But this is implausible because it itself requires a massive degree of coordination.

What if money in politics doesn't actually make much difference? If the world is as depicted in *Yes Minister*, the government will keep doing the same thing regardless of political spending anyway.

Maybe a better comparison is (almond advertising):(political spending)::(almonds):(all government spending).

Spending directly on a goal is more effective than lobbying the government to spend on that goal, e.g. Elon Musk and SpaceX.

What would have more political spending, an absolute monarchy or a direct democracy? (Disagreement on this.)

Why is bribery more common in some places than others? Maybe you just can't get anything done at all without bribes. Or maybe some places hide it better by means of e.g. revolving-door lobbyist deals, "We'll go easy on your cousin who's in legal trouble", etc.

Aside: {Scott Alexander asks: {Is someone biased simply because they have a stake in something?} Total postmodern discourse would entirely discount someone's argument based on their stake in the matter; but we aren't so epistemically helpless that we can't evaluate the actual contents of an argument.}

Aside: {Administrative clawback: If you fix problems, you'll get less money next year - perhaps by more than enough to cancel out the benefits of the fix. They'll expect you to make just as much progress again, which may not be possible. Don't excel because that'll raise expectations for the future.}

Or maybe almonds are a bigger deal than you think!

Discuss

### How I do research

{font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: 
MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

*Someone asked me about this, so here are my quick thoughts.*

Although I've learned a lot of math over the last year and a half, it still isn't my comparative advantage. What I do instead is:

**Find a problem** that seems plausibly important to AI safety (low impact), or a phenomenon that's secretly confusing but not really explored (instrumental convergence). If you're looking for a problem, corrigibility strikes me as another thing that meets these criteria, and is still mysterious.

**Think about the problem.** Stare at the problem on my own, ignoring any existing thinking as much as possible. Just think about what the problem is, what's confusing about it, what a solution would look like. In retrospect, this has helped me avoid anchoring myself. Also, my prior for existing work is that it's confused and unhelpful, and I can do better by just thinking hard. I think this is pretty reasonable for a field as young as AI alignment, but I wouldn't expect this to be true at all for e.g. physics or abstract algebra. I also think this is likely to be true in any field where philosophy is required, where you need to find the right formalisms instead of working from axioms.

Therefore, when thinking about whether "responsibility for outcomes" has a simple core concept, I nearly instantly concluded it didn't, without spending a second glancing over the surely countless philosophy papers wringing their hands (yup, papers have hands) over this debate. This was the right move. I just trusted my own thinking. Lit reviews are just proxy signals of having gained comprehension and come to a well-considered conclusion.

Concrete examples are helpful - at first, thinking about vases in the context of impact measurement was helpful for getting a grip on low impact, even though it was secretly a red herring. I like to be concrete because we actually need *solutions* - I want to learn more about the relationship between solution specifications and the task at hand.

Make simplifying assumptions wherever possible. Assume a ridiculous amount of stuff, and then pare it down.

Don't formalize your thoughts too early - you'll just get useless mathy sludge out on the other side, the product of your confusion. Don't think for a second that having math representing your thoughts means you've necessarily made progress - for the kind of problems I'm thinking about right now, the math has to *sing* with the elegance of the philosophical insight you're formalizing.

Basically forget all about whether you have the license or background to come up with a solution. When I was starting out, I was too busy being fascinated by the problem to remember that I, you know, wasn't allowed to solve it.

Obviously, there are common-sense exceptions to this, mostly revolving around trying to run without any feet. It would be pretty silly to think about logical uncertainty without even knowing propositional logic. One of the advantages of immersing myself in a lot of math isn't just knowing more, but knowing what I don't know. However, I think it's pretty rare to be secretly lacking the basic skills to even start on the problem at hand. You'll probably know if you are, because all your thoughts keep coming back to the same kind of confusions about a formalism, or something. Then, you look for ways to resolve the confusion (possibly by asking a question on LW or in the MIRIx Discord), find the thing, and get back to work.

**Stress-test thoughts.** So you've had some novel thoughts, and an insight or two, and the outlines of a solution are coming into focus. It's important not to become enamored with what you have, because it stops you from finding the truth and winning. Therefore, think about ways in which you could be wrong, situations in which the insights don't apply or in which the solution breaks. Maybe you realize the problem is a bit ill-defined, so you refactor it.

The process here is: break the solution, deeply understand why it breaks, and repeat. Don't get stuck with patches; there's a rhythm you pick up on in AI alignment, where good solutions have a certain flavor of integrity and compactness. It's OK if you don't find it right away. The key thing to keep in mind is that you aren't trying to pass the test cases, but rather to find brick after brick of insight to build a firm foundation of deep comprehension. You aren't trying to find the right equation, you're trying to find the state of mind that makes the right equation obvious. You want to understand new pieces of the world, and maybe one day, those pieces will make the ultimate difference.

