LessWrong.com News

A community blog devoted to refining the art of rationality

Results of LW Technical Background Survey

July 26, 2019 - 20:33
Published on July 26, 2019 5:33 PM UTC

See results here.

The main goal of the survey was to provide info for authors about their target audience, so here's a high-level overview toward that end:

  • The average respondent is some kind of professional programmer, with an undergrad degree (or equivalent) in CS.
  • Most people have seen at least some economics and probability, but not at the level of an undergrad degree.
  • Almost everyone knows calculus, but linear algebra or differential equations will likely be lost on at least ~25% of respondents.
  • There are substantial zero-knowledge and high-knowledge counts for most areas.
  • About half of respondents had read the sequences in their entirety.

Here are charts of the responses to each question. I strongly recommend looking at them directly rather than just taking my summary at face value. As always, remember this is an opt-in survey without any sort of verification of responses, so take everything with a grain of salt.

One interesting note: we had a handful of respondents declaring very high skill levels (Nobel-level economists, Turing-level computer scientists, or primary developers of popular software). I'd personally be interested to hear what exactly those people work on, especially if they're willing to occasionally field questions on their area of expertise. All y'all should leave a comment or something.

Actually, I'm curious what everyone works on, especially specialties for all the researchers. Feel free to leave a quick comment, especially if you're able and willing to occasionally field questions in your area of expertise.


Old Man Jevons Can’t Save You Now (Part 2/2)

July 26, 2019 - 06:51

Can you summarize highlights from Vernon's Creativity?

July 26, 2019 - 04:12
Published on July 26, 2019 1:12 AM UTC

Gwern had responded to Elizabeth's question about creativity with the following:

A good anthology to read is Creativity, ed Vernon 1970 - it's old but it shows you what people were thinking back when Torrance was trying to come up with creativity tests, and the many psychometric criticisms back then which I'm not sure have been convincingly resolved.

I'm not sure whether Elizabeth has already read it, but I'd be interested in reading the highlights from that if anyone was up for distilling it down into something more manageable.


How often are new ideas discovered in old papers?

July 26, 2019 - 04:00
Published on July 26, 2019 1:00 AM UTC

Suppose someone wrote a paper about X two decades ago. A modern reader realizes the X paper sheds light on an unrelated idea Y. Do we have any information on how often this happens? How often is this just "I figured out Y for a different reason, and while doing my lit review I realized that the X paper is also relevant for Y"?


Ought: why it matters and ways to help

July 25, 2019 - 21:00
Published on July 25, 2019 6:00 PM UTC

I think that Ought is one of the most promising projects working on AI alignment. There are several ways that LW readers can potentially help:

In this post I'll describe what Ought is currently doing, why I think it's promising, and give some detail on these asks.

(I am an Ought donor and board member.)

Factored evaluation

Ought's main project is currently designing and running "factored evaluation" experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question:

Consider a principal who wants to incentivize experts to work on hard problems, but finds that the work is too complex for the principal to tell what is good or bad. Can they overcome this problem by dividing the evaluation task into smaller pieces, delegating each of them to other experts, and recursively incentivizing good work on those pieces using the same mechanism?

Here's what an experiment looks like:

  • Recruit a pool of "judges," "honest experts," and "malicious experts."
  • Start with a question that is too difficult for the judge to answer.
    • In initial experiments, the questions are difficult because the judge lacks relevant context. For example, the judge may want to answer a question about a long article, while only being allowed to look at a few hundred characters of text.
    • Limiting the amount of text the judge can read is an artificial restriction. This is intended as a warm-up, an easier analogy for the situation where the experts have knowledge that cannot be explained to a single judge in a reasonable amount of time (or where we want to reduce the cost of judging).
  • An "honest expert" provides a good answer to that question, a "malicious expert" provides a bad answer to that question, and a judge is tasked with telling which answer was better. The answers themselves may be too large and complex for the judge to look at, in which case they'll be handed a compact representation of each answer which they can inspect as needed.
  • In order to tell which answer is better, the judge asks subquestions. For example, they might ask "What's the best reason to think that answer A is better than answer B?" or "What does the author of article X believe about question Y?"
  • We answer these subquestions by delegating to the experts. The honest expert answers each subquestion as well as they can. The malicious expert can propose a bad subanswer, and we run the experiment recursively to see if the judge would pick the good or bad subanswer to the subquestion. If the malicious expert can propose subanswers that corrupt the top-level answer, and that are selected by the judge when we run this experiment recursively, then the malicious expert wins and the experiment fails.
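The recursive structure of the steps above can be illustrated with a toy sketch. This is not Ought's implementation: the task (summing a list), the function names, and the off-by-one malicious strategy are all hypothetical stand-ins chosen so that the judge's limited context is easy to model. The judge never adds up the whole list; it either checks a single element directly or combines two subanswers that were themselves selected by running the same contest recursively.

```python
from typing import Callable, List

# An "expert" here is any function that maps a sub-list to a claimed sum.
Expert = Callable[[List[int]], int]


def honest_expert(xs: List[int]) -> int:
    """Always reports the true sum."""
    return sum(xs)


def off_by_one_expert(xs: List[int]) -> int:
    """Hypothetical malicious strategy: always overstate the sum by one."""
    return sum(xs) + 1


def judge(xs: List[int], answer_a: int, answer_b: int, malicious: Expert) -> int:
    """Return whichever of the two candidate answers the judge accepts.

    Toy assumption: the judge can only inspect a single element or add
    two numbers, which stands in for the limited-context restriction.
    """
    if not xs:
        raise ValueError("expected a non-empty list")
    if len(xs) == 1:
        # Base case: the question is small enough to check directly.
        return answer_a if answer_a == xs[0] else answer_b

    mid = len(xs) // 2
    accepted = []
    for part in (xs[:mid], xs[mid:]):
        # Each subquestion is its own contest between a good and a bad
        # subanswer, settled by running the same procedure recursively.
        accepted.append(judge(part, honest_expert(part), malicious(part), malicious))

    # The judge combines the accepted subanswers and keeps the top-level
    # answer that is consistent with them.
    combined = accepted[0] + accepted[1]
    return answer_a if answer_a == combined else answer_b


def honest_answer_wins(xs: List[int], malicious: Expert) -> bool:
    """Run one top-level experiment; True means the malicious expert lost."""
    chosen = judge(xs, honest_expert(xs), malicious(xs), malicious)
    return chosen == honest_expert(xs)


if __name__ == "__main__":
    print(honest_answer_wins([3, 1, 4, 1, 5, 9], off_by_one_expert))
```

In this toy the judge is sound at every step, so the malicious expert cannot win. The open question in the real experiments is whether human judges stay reliable when the subquestions are not mechanically checkable.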

This is not Ought's only project, but it's currently the largest single focus. Other projects include: exploring how well we can automate the judge's role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.

Why this is important for AI alignment

ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.

Not coincidentally, factored evaluation is a major component of my current best-guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought's current experiments. I'd like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.

Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it's easier to get someone to run a conference that will get a high approval rating, than to run a conference that will help participants figure out how to get what they actually want. I'm more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.

Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem "can we find a scalable approach to delegation?" It wouldn't be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.

Ways to help

Web developer

I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:

The success of our factored evaluation experiments depends on Mosaic, the core web interface our experimenters use. We’re hiring a thoughtful full-stack engineer to architect a fundamental redesign of Mosaic that will accommodate flexible experiment setups and improve features like data capture. We want you to be the strategic thinker that can own Mosaic and its future, reasoning through design choices and launching the next versions quickly.

Our benefits and compensation package are at market with similar roles in the Bay Area.

We think the person who will thrive in this role will demonstrate the following:

  • 4-6+ years of experience building complex web apps from scratch in Javascript (React), HTML, and CSS
  • Ability to reason about and choose between different front-end languages, cloud services, API technologies
  • Experience managing a small team, squad, or project with at least 3-5 other engineers in various roles
  • Clear communication about engineering topics to a diverse audience
  • Excitement around being an early member of a small, nimble research organization, and playing a key role in its success
  • Passion for the mission and the importance of designing schemes that successfully delegate cognitive work to AI
  • Experience with functional programming, compilers, interpreters, or “unusual” computing paradigms

Experiment participants

Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well---my rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).

Here is the description from their posting:

We’re looking to hire contractors ($25/hour) to participate in our experiments [...] This is a pretty unique way to help out with AI safety: (i) Remote work with flexible hours - the experiment is turn-based, so you can participate at any time of day (ii) we expect that skill with language will be more important than skill with math or engineering.

If things go well, you’d likely want to devote 5-20 hours/week to this for at least a few months. Participants will need to build up skill over time to play at their best, so we think it’s important that people stick around for a while.

The application takes about 20 minutes. If you pass this initial application stage, we’ll pay you the $25/hour rate for your training and work going forward.

Apply as Experiment Participant

I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it's the main AI safety project I donate to. You can donate here.

They are spending around $1M/year. Their past work has been some combination of: building tools and capacity, hiring, a sequence of exploratory projects, charting the space of possible approaches and figuring out what they should be working on. You can read their 2018H2 update here.

They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I've been happy with their approach to exploratory stages, and I'm tentatively excited about their approach to execution.


On the purposes of decision theory research

July 25, 2019 - 10:18
Published on July 25, 2019 7:18 AM UTC

Following the examples of Rob Bensinger and Rohin Shah, this post will try to clarify the aims of part of my research interests, and disclaim some possible misunderstandings about it. (I'm obviously only speaking for myself and not for anyone else doing decision theory research.)

I think decision theory research is useful for:

  1. Gaining information about the nature of rationality (e.g., is “realism about rationality” true?) and the nature of philosophy (e.g., is it possible to make real progress in decision theory, and if so what cognitive processes are we using to do that?), and helping to solve the problems of normativity, meta-ethics, and metaphilosophy.
  2. Better understanding potential AI safety failure modes that are due to flawed decision procedures implemented in or by AI.
  3. Making progress on various seemingly important intellectual puzzles that seem directly related to decision theory, such as free will, anthropic reasoning, logical uncertainty, Rob's examples of counterfactuals, updatelessness, and coordination, and more.
  4. Firming up the foundations of human rationality.

To me, decision theory research is not meant to:

  5. Provide a correct or normative decision theory that will be used as a specification or approximation target for programming or training a potentially superintelligent AI.
  6. Help create "safety arguments" that aim to show that a proposed or already existing AI is free from decision theoretic flaws.

To help explain 5 and 6, here's what I wrote in a previous comment (slightly edited):

One meta level above what even UDT tries to be is decision theory (as a philosophical subject) and one level above that is metaphilosophy, and my current thinking is that it seems bad (potentially dangerous or regretful) to put any significant (i.e., superhuman) amount of computation into anything except doing philosophy.

To put it another way, any decision theory that we come up with might have some kind of flaw that other agents can exploit, or just a flaw in general, such as in how well it cooperates or negotiates with or exploits other agents (which might include how quickly/cleverly it can make the necessary commitments). Wouldn’t it be better to put computation into trying to find and fix such flaws (in other words, coming up with better decision theories) than into any particular object-level decision theory, at least until the superhuman philosophical computation itself decides to start doing the latter?

Comparing my current post to Rob's post on the same general topic, my mentions of 1, 2, and 4 above seem to be new, and he didn't seem to share (or didn't choose to emphasize) my concern that decision theory research (as done by humans in the foreseeable future) can't solve decision theory in a definitive enough way that would obviate the need to make sure that any potentially superintelligent AI can find and fix decision theoretic flaws in itself.


AnnaSalamon's Shortform

July 25, 2019 - 08:24
Published on July 25, 2019 5:24 AM UTC


Dony's Shortform Feed

July 25, 2019 - 02:48
Published on July 24, 2019 11:48 PM UTC


Metaphorical extensions and conceptual figure-ground inversions

July 24, 2019 - 09:21

DEF CON / Las Vegas meetup (Aug. 8, 2019)

July 24, 2019 - 08:41
Published on July 24, 2019 5:41 AM UTC

Anyone else going to DEF CON 27? Let's meet up!

* Thursday, August 8, 2019
* 4:30pm - 6:00pm (Pacific Daylight Time)
* Cafe Belle Madeleine - Paris Las Vegas (about 200 feet east of the casino floor)
* 3655 S Las Vegas Blvd, Las Vegas, NV 89109

I will be there with a sign saying "LW Meetup."

Of course, even if you have no idea what DEF CON is but just happen to be in the area, you're welcome to come as well. Hopefully we can stay in touch while we're all in town!


AI Safety Debate and Its Applications

July 24, 2019 - 01:31
Published on July 23, 2019 10:31 PM UTC


All of the experimental work and some of the theoretical work has been done jointly with Anna Gajdova, David Lindner, Lukas Finnveden, and Rajashree Agrawal as part of the third AI Safety Camp. We are grateful to Ryan Carey and Geoffrey Irving for the advice regarding this project. The remainder of the theoretical part relates to my stay at FHI, and I would like to thank the above people, Owain Evans, Michael Dennis, Ethan Perez, Stuart Armstrong, and Max Daniel for comments/discussions.

Debate is a recent proposal for AI alignment, which naturally incorporates elicitation of human preferences and has the potential to offload the costly search for flaws in an AI’s suggestions onto the AI. After briefly recalling the intuition behind debate, we list the main open problems surrounding it and summarize how the existing work on debate addresses them. Afterward, we describe, and distinguish between, Debate games and their different applications in more detail. We also formalize what it means for a debate to be truth-promoting. Finally, we present results of our experiments on Debate games and Training via Debate on MNIST and fashion MNIST.

Debate games and why they are useful

Consider an answer A to some question Q --- for example, "Where should I go for a vacation?" and "Alaska". Rather than directly verifying whether A is an accurate answer to Q, it might be easier to first decompose A into lower-level components (How far/expensive is it? Do they have nice beaches? What is the average temperature? What language do they speak?). Moreover, it isn't completely clear what to do even if we know the relevant facts --- indeed, how does Alaska's cold weather translate to a preference for Alaska from 0 to 10? And how does this preference compare to English being spoken in Alaska? As an alternative, we can hold a debate between two competing answers A and A′="Bali" to Q. This allows strategic debaters to focus on the most relevant features of the problem (perhaps the language isn't important), and replace the possibly-difficult direct evaluation by often-simpler comparison of two options (whatever their position on a 0-10 scale, Bali's beaches are clearly better than Alaska's).

Importantly for AI-alignment, this approach also naturally incorporates elicitation of human feedback. Recently, the paper AI Safety via Debate [1] proposed training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information.

Unlike [1] we explicitly distinguish between debating over which of the answers is better (Debate game) and coming up with answers for which to argue. This is useful since the two tasks often require somewhat different skills, and they might be performed by different agents. Moreover, this distinction highlights the fact that Debate games have several other uses apart from “training via debate”: below, we also describe how to use Debate games for “answer verification” and “incentivizing an AI to give a useful answer”.

Terminology remark: To settle on precise terminology, we use "question", "answers", and "arguments" to refer, respectively, to the topic of the debate, the initial claims made by the two debaters, and the subsequent statements made during the debate. However, the usefulness of debates is by no means restricted to Oracle AIs since we could also consider questions of the form Q="Which policy would you adopt to perform task T?".

Open problems around AI Safety Debate

We now list the main high-level open problems around AIS debate. For further open questions and suggestions for future work, see the original Debate paper. In particular, we do not discuss the issues related to holding the debate in natural language or the possibility of hacking the human judge while debating.

(H) What exactly does debate look like? To be able to apply debate in the first place, we need to understand -- in enough detail to make the ideas implementable -- (i) what “holding a Debate game around Q” means and (ii) how to use this Debate game for getting useful answers to Q.

(TW) Does truth win? Crucially, the optimal strategy in a Debate game corresponding to Q, A1, and A2 consists of making the most convincing arguments --- not the truthful ones. Similarly, the optimal way of selecting which answer to argue for (described in more detail in a later section) is to choose the most convincing answer --- not the correct one. For debate to be useful, we need to understand the conditions under which persuasiveness coincides with truthfulness. Indeed, as we can see from the examples of political debate on the one hand and mathematical proofs on the other, different “debate settings” can greatly influence whether debate is truth-promoting or not.

(C) Competitiveness: Is the debate approach comparably efficient to alternative ways of obtaining answers to Q? (If not, then economic incentives might prevent it from being adopted.)

State of the art: The original Debate paper gave some examples of (H)(i) and presented an initial ML experiment to address (TW), showing that an MNIST debate (described in the paper) is reasonably truth-promoting. Regarding (TW), it also showed that a certain idealized version of debate solves problems in the PSPACE class. As we point out later, an analogy can be made between “training via debate” and the AlphaZero algorithm. This gives some evidence that the answer to (C) might be positive.

Our work: As a part of the third AI Safety camp, we sought to better understand (H) and (TW). The present document summarizes our theoretical progress in (H). Regarding (TW), we have replicated the MNIST experiment above. We have then extended the Debate game to fashion MNIST and investigated the impact of several different debate protocols. We have also implemented a prototype architecture that uses the Debate game for training. We view this prototype as a part of (H)(ii) and a prerequisite to the investigation of (C). The main results of these experiments are presented below. Further questions we wish to investigate are listed in this google doc.

Technical description of Debate games

As a way of getting closer to answering the first part of (H), we can think of a debate game as having the following ingredients:

  • Question: A high-level and/or difficult-to-answer question Q in whose answer we are interested.
  • Answers: Candidates A1 and A2 for a correct answer to Q.
  • Low-level components: Facts about Q, A1, and A2 (e.g., the distance to Bali, temperature in Alaska in July, visa rules in Bali, ...).
  • Decomposition: The rules governing how the low-level components of Q aggregate into which of the two answers is better (e.g., the temperature only matters if we can actually get to the place, the expenses are additive, ...).
  • Debate protocol: The syntax of arguments the debaters can make, rules governing which arguments are allowed in which part of the debate, a way of deciding when the debate ends, and possibly some other parts.
  • Judge: A way of: (a) expressing preferences about low-level components ("I like warm weather more than a blizzard."), (b) verifying claims about them (googling for weather Alaska, having a vague impression that Alaska is closer to Europe than Bali) and about the decomposition ("If Bali is warmer and I can get there, then I like it more."), and (c) observing adherence to the debate protocol.

Together, these ingredients combine into a Debate game, which can be played by agents possessing varying levels of knowledge and debate skill:

i. Agents state their answers to the judge (independently of each other).

ii. The agents take turns making arguments to the judge. These can relate to (a) Q, (b) the pair (A1,A2), (c) all arguments made so far, or any combination of (a)-(c).

  • Which arguments are legal depends on the decomposition and the debate protocol.
  • Which arguments actually end up being made depends on the agents and their debating skill and knowledge.

iii. Once arguments get sufficiently low-level, they are used as an input for the judge, whose output then determines the winner and the corresponding utilities u1∈[−1,1], u2=−u1.


  • While the ingredients above seem fruitful for thinking about Debate games, the line between low-level components, decomposition, and debate protocol might often be blurry.
  • The judge might be suboptimal or biased in many ways (e.g., being mistaken about their preferences, having a chance of making errors verifying facts or decomposition, not having access to the correct information in the first place, being prone to deception).
  • Debate games make sense even when there is no human-preference aspect; indeed, the self-play performed by AlphaZero's training procedure is a prime example of this ([1]).
  • In human debates, it is expected that neither of the debaters has a full picture, and it is, therefore, acceptable to make a false sub-claim that the debater later retracts. However, we typically hold the debating AIs to a higher standard, meaning that as soon as any of the sub-claims made in the debate is proven wrong, the agent who made that claim loses the debate.
  • Consider the question “How many bridges do you cross when taking the Trans-Siberian Railway?”. To get to a low-level question “Is there a bridge between points A and A + 100m ?”, answerable by a google-maps “judge”, we can use the decomposition rule “bridges in [A,C] = bridges in [A,B) + bridges in [B,C]”. One viable debate protocol is that one agent proposes the answer “There are X bridges between 0 km and 9 258 km”. If the other agent opposes the answer, the debate proceeds recursively by the proponent claiming that “There are Y bridges between 0 km and 9258/2 km and X−Y between 9258/2 km and 9 258 km.” and the opponent choosing whether to dispute the first or the second part.
  • The debate protocol can be very informal, such as in real-life debates. However, it can also be very formal, such as when two players (implicitly) announce “I am better at chess than you!”, and then play a match to prove their points.
  • Debates can vary a lot in how truth-promoting they are --- from political debates on one side of the spectrum to mathematical proofs (and their verification) on the other.
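
The recursive bridge-counting protocol from the Trans-Siberian example can be sketched in a few lines. This is only an illustration under stated assumptions: the bridge positions are made up, the function names (`debate`, `honest_split`, `informed_dispute`) are hypothetical, segments are half-open intervals in metres, and the "judge" counts bridges in a segment of at most 100 m rather than answering a yes/no question.

```python
from bisect import bisect_left

ROUTE_M = 9_258_000                               # route length in metres (9 258 km)
BRIDGES = [1_250, 40_075, 3_000_000, 9_257_950]   # made-up bridge positions, sorted

def true_count(lo, hi):
    """Number of bridges in [lo, hi); stands in for the map-based judge."""
    return bisect_left(BRIDGES, hi) - bisect_left(BRIDGES, lo)

def debate(claim, lo, hi, split, dispute, judge=true_count, unit=100):
    """Return True if the claim 'there are `claim` bridges in [lo, hi)' survives."""
    while hi - lo > unit:
        mid = (lo + hi) // 2
        left_claim = split(claim, lo, mid, hi)    # proponent: Y bridges in [lo, mid)
        right_claim = claim - left_claim          # ... and X - Y bridges in [mid, hi)
        if dispute(left_claim, right_claim, lo, mid, hi) == "left":
            claim, hi = left_claim, mid
        else:
            claim, lo = right_claim, mid
    return judge(lo, hi) == claim                 # low-level claim, checked directly

def honest_split(claim, lo, mid, hi):
    """Proponent who reports the first half truthfully."""
    return true_count(lo, mid)

def informed_dispute(left_claim, right_claim, lo, mid, hi):
    """Opponent who disputes the left half only if it is misreported."""
    return "left" if left_claim != true_count(lo, mid) else "right"

honest_wins = debate(4, 0, ROUTE_M, honest_split, informed_dispute)
liar_wins = debate(5, 0, ROUTE_M, honest_split, informed_dispute)
```

With an informed opponent, a wrong total forces at least one wrong half at every level, so the dispute ends at a short segment that the judge can check directly. The judge is consulted once, after roughly 17 halvings.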

Applications of Debate games

To partially address the second part of (H), we list three applications of Debate games that would --- assuming all the details can be fleshed out --- all be relevant to AI alignment.

A) Answer verification

Suppose we have an answer A to some question Q, but we don't know whether A is trustworthy or not (A could be our best guess at Q, or something suggested by a black-box AI). To increase our confidence in the answer, we call in two debate AIs, and have one argue for "A is a correct answer to Q" and the other for "A is a wrong answer to Q". Alternatively, we generate a lot of competing answers A′, and have debates over "Which better answers Q, A or A′?". If A loses any of the debates, we throw it away (while learning what was wrong with it). Stuart Armstrong’s Devil’s advocate AI falls under (A).

B) Training via Debate (debate as a training signal)

Let 𝒬 be a class of questions. Suppose that you cannot directly evaluate how good an answer A is to Q∈𝒬 but for any alternative A′ you can compare whether A is better or worse than A′ by zooming in on some details of A and A′. We then train a question-answering system for 𝒬 using some form of active learning, where self-play debates between promising answers are used as a training signal. AlphaZero is an instance of (B).

C) Incentivizing an AI to give a useful answer

Suppose we have two strong AIs capable of answering some question Q (e.g., Oracles). We can incentivize useful answers to Q by rewarding the AI whose answer wins the debate over "Which is a better answer to Q, A1 or A2?" (the debaters might, or might not, be the original two AIs).[1] I am not aware of an existing instance of (C).[2]

We now give several examples and a more detailed description of the Debate game and its ingredients. We then briefly recap the progress made so far, highlight related open questions, and give a non-technical summary of the experimental results obtained at the third AI Safety Camp.

Truth-promotion as the key desired property of Debate games

We now give a more precise game-theoretical definition of truth-promotion for (TW).

Consider a space 𝒜 of possible answers to a question Q. For fixed answers A1, A2∈𝒜 and debating policies π=(π1, π2), the expected outcome of the Debate game is some v_{A1,A2} := E_π[u1 | A1, A2]. For simplicity, assume that Q only has one correct answer A_true. As a first approximation of debate's usefulness, we can define the truth-win probability twp(Q) as the chance that A_true wins (u1≥0) against a randomly chosen A′∈𝒜. This definition is somewhat vague and sensitive to the choice of 𝒜, but it has one more important problem: even with twp(Q)=99.9% there might be an alternative incorrect answer that wins more debates than A_true, and conversely, even with twp(Q)=51% it might be the case that all other answers win even fewer debates than A_true.

Rather than depending on win-probability in the above sense, an "optimal" choice of an answer to Q then corresponds to finding some Nash equilibrium σ∗ of the (zero-sum) matrix game with payoff matrix M=(v_{A1,A2})_{A1,A2∈𝒜} and selecting A∼σ∗. We say that debate about Q is truth-promoting if the only Nash equilibrium of M is the pure strategy under which both players pick A_true.[3] More generally, we define the truth-promotion likelihood tpl(Q) := E_{σ∗_1∈NE}[σ∗_1(A_true)] as the probability of picking the correct answer under a (uniformly) randomly selected Nash equilibrium strategy.[4] This will be 0 or 1 for questions with a single, pure, equilibrium, but can be any number in between for questions with multiple equilibria or mixed equilibria.
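
To make the gap between the truth-win probability and the equilibrium notion concrete, here is a small sketch on a made-up payoff matrix. The matrix, the answer ordering, and the function names are assumptions for illustration only, and the code searches for pure equilibria (saddle points) only, so it does not compute tpl for games with mixed equilibria.

```python
def truth_win_probability(payoff, true_idx):
    """Share of opposing answers against which the true answer gets u1 >= 0."""
    others = [u for j, u in enumerate(payoff[true_idx]) if j != true_idx]
    return sum(u >= 0 for u in others) / len(others)

def pure_equilibria(payoff):
    """Pure Nash equilibria (i, j) of a zero-sum matrix game.

    Player 1 picks row i and maximizes payoff[i][j]; player 2 picks column j
    and minimizes it. A pure equilibrium is an entry that is both the minimum
    of its row and the maximum of its column.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    found = []
    for i in range(n_rows):
        for j in range(n_cols):
            is_row_min = payoff[i][j] == min(payoff[i])
            is_col_max = payoff[i][j] == max(payoff[r][j] for r in range(n_rows))
            if is_row_min and is_col_max:
                found.append((i, j))
    return found

# Hypothetical answers: index 0 is the true one, index 1 is a persuasive false
# one that beats the truth, indices 2-4 are weak answers that lose to both.
PAYOFF = [
    [ 0, -1,  1,  1,  1],
    [ 1,  0,  1,  1,  1],
    [-1, -1,  0,  0,  0],
    [-1, -1,  0,  0,  0],
    [-1, -1,  0,  0,  0],
]

twp = truth_win_probability(PAYOFF, true_idx=0)
equilibria = pure_equilibria(PAYOFF)
```

In this toy game the true answer wins against three of its four rivals, yet the only pure equilibrium has both players picking the persuasive false answer, so the debate is not truth-promoting in the sense defined above.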

Overall, we conjecture that most (reasonable) applications of debate will be successful if and only if the corresponding truth-promotion likelihoods are high.

Experimental results

This section presents the experimental results of our AISC3 project. The implementation can be found here. We assume that the reader is familiar with the MNIST debate experiment from [1]. Since the details of that experiment are somewhat specific, we highly recommend reading Section 3.1 of [1] before interpreting the results presented below.

Debate games

Recall that [1] considers Debate games around the question “Which digit is shown on this MNIST image?”, with the judge being a neural network classifier that can only see 4 (or 6) pixels of the image, and debate protocol “agents take turns revealing non-black pixels of the digit, until 4 (6) pixels have been revealed”. The debaters select their actions using a Monte Carlo tree search (MCTS) algorithm.
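
As a rough sketch of this protocol, the loop below has two agents alternately reveal non-black pixels and then lets a judge compare the two claimed labels on the revealed pixels only. Everything concrete here is an assumption for illustration: the tiny image, the hand-written `toy_judge` (the experiments use a trained sparse classifier), and the greedy one-step debaters (the experiments use MCTS).

```python
def play_pixel_debate(image, claims, judge, policies, num_pixels=4):
    """Alternate pixel reveals, then let the judge compare the two claims.

    image: dict {(row, col): intensity}; claims: labels argued by agents 0 and 1;
    judge: maps revealed pixels to a dict {label: score}.
    """
    revealed = {}
    for turn in range(num_pixels):
        legal = {p: v for p, v in image.items() if v > 0 and p not in revealed}
        if not legal:
            break                                   # no non-black pixels left
        agent = turn % 2
        pixel = policies[agent](claims[agent], revealed, legal)
        revealed[pixel] = image[pixel]
    scores = judge(revealed)
    winner = 0 if scores[claims[0]] >= scores[claims[1]] else 1
    return winner, revealed

def toy_judge(revealed):
    """Made-up judge: '7' likes mass in the top row, '1' likes mass in column 1."""
    return {
        "7": sum(v for (r, c), v in revealed.items() if r == 0),
        "1": sum(v for (r, c), v in revealed.items() if c == 1),
    }

def greedy_policy(claim, revealed, legal):
    """Reveal the pixel that most raises the judge's score for the agent's own claim."""
    return max(legal, key=lambda p: toy_judge({**revealed, p: legal[p]})[claim])

IMAGE = {(0, 0): 0.9, (0, 1): 0.4, (0, 2): 0.8, (1, 1): 0.7, (2, 1): 0.6, (2, 2): 0.0}
winner, revealed = play_pixel_debate(
    IMAGE, claims=("7", "1"), judge=toy_judge, policies=(greedy_policy, greedy_policy)
)
```

Swapping `greedy_policy` for a search-based policy changes only the `policies` argument, which is one way to vary debater strength while keeping the judge and protocol fixed.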

As the first step in our empirical investigation, we reimplemented this experiment, investigated how the results change if we vary the length of debates and the strength of the debaters, and extended the results to the fashion MNIST dataset. The table below shows that our results closely resemble[5] those of [1].

In Figures 1, 2, and 3 below, we can see how the honest player’s win-chance (defined in more detail in [1]) increases when more pixels are shown (4, 6, or 10), and how it develops as the debaters get stronger (i.e., as MCTS uses more rollouts). In particular, we see that there is initially a massive advantage in being the second player, and this advantage mostly disappears once the debaters are strong enough. This implies that the ability of the second player to react to the actions of the first player isn’t decisive in the MNIST debate; rather, we conjecture that the main reason for this “last-mover advantage” is that it’s easier for the MCTS algorithm to look fewer moves ahead.

Figures 1 to 3: The percentage of images for which the honest player wins the debate against a non-precommitted opponent (no P), resp. all debates against a precommitted opponent (P) (see [1]). The x-axis denotes the number of MCTS rollouts per decision; 0-rollout MCTS chooses actions randomly. The "avg" graph depicts the average of win-chance when the honest player goes first and second. The 0, 100, 1000, 10k, and 100k-rollout statistics are evaluated on 10k, 4k, 400, 100, and 100 images respectively (possibly explaining the drop between 10k and 100k win-rate). Note that randomly selecting an answer would lead to 10% accuracy (and not, e.g., 50%).

In all scenarios, the honest-player win rates are somewhat lower for fashion MNIST (Figures 4, 5, and 6) --- we believe this is because the fashion MNIST game is more difficult than the standard MNIST, mostly because there are more non-black pixels, which makes the branching factor of the game tree roughly an order of magnitude higher.

Figures 4 to 6: Fashion MNIST variants of Figures 1, 2, and 3.

While the above measurements do give some indication of how often the MNIST debate is truth-promoting, it is not true that the average of the honest player’s win chances (depending on whether they go first or second) would be an accurate estimate of (or even equal to) the truth-promotion likelihood. We used the win-chances as defined in [1] because this allows for a comparison of results, and because a rigorous measurement of tpl requires more computational resources (at least without further theoretical insights). As future work, we plan to measure tpl more precisely and analyze how often there is a single best answer (when the starting player is selected randomly[6]).

Training via Debate

Figure 7: A simple architecture for Training via Debate.

As our main practical contribution, we implemented a prototype pipeline (above) for training a (full) MNIST classifier when one only has access to Debate games, but not to the true labels. The main idea is to use the neural network to predict a label for an image, run a debate against the non-precommitted opponent[7] (using the terminology of [1]), and update the neural network weights towards the label only if it won the debate. In the graph below, we see that the training procedure is stable, and achieves an accuracy that is even significantly higher than the honest player's win-chance in the corresponding debates (this is probably due to some of the errors canceling out during the training). While this training only used the rather weak 100-rollout MCTS debaters[8], it nonetheless demonstrates the viability of Training via Debate in this setting.

Figure 8: Accuracy of an MNIST classifier trained via debates that used a 6-pixel sparse classifier as a judge (without access to the true labels) and 100-rollout MCTS debaters. Each line depicts one run of training.
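
The training idea described above (predict a label, run a debate, reinforce the label only if it won) can be illustrated with a deliberately tiny stand-in. This is a sketch under stated assumptions, not the pipeline used in the experiments: a per-image score table replaces the neural network, `run_debate` is a hypothetical callback standing in for a full Debate game, and the tie-breaking rule is an invented way to make the toy try each label.

```python
def train_via_debate(images, num_labels, run_debate, epochs=10):
    """Toy Training-via-Debate loop; true labels are never read directly.

    run_debate(image, label) -> bool is a hypothetical stand-in that reports
    whether `label` won its debate about `image`.
    """
    scores = {img: [0] * num_labels for img in images}   # tabular "classifier"
    tries = {img: [0] * num_labels for img in images}

    def predict(img):
        # Highest score wins; ties go to the label tried least often so far.
        return max(range(num_labels),
                   key=lambda l: (scores[img][l], -tries[img][l]))

    for _ in range(epochs):
        for img in images:
            label = predict(img)
            tries[img][label] += 1
            if run_debate(img, label):       # update only when the label won
                scores[img][label] += 1
    return {img: max(range(num_labels), key=lambda l: scores[img][l])
            for img in images}

# Toy check with an idealized debate in which exactly the correct label wins.
hidden = {"img_a": 2, "img_b": 0, "img_c": 1}
learned = train_via_debate(list(hidden), num_labels=3,
                           run_debate=lambda img, label: hidden[img] == label)
```

Here the hidden labels are only consulted inside the debate callback, mirroring the setting where the learner sees debate outcomes but not labels. A noisy `run_debate` would reinforce some wrong labels too, which this toy does not model.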

[1] Irving, Geoffrey, Paul Christiano, and Dario Amodei. "AI safety via debate." arXiv preprint arXiv:1805.00899 (2018).

  1. Since the AIs are assumed to maximize the reward, and they know how the process works in advance, we hope that both A1 and A2 will be good answers to Q (possibly even having A1=A2). The debate and the consequent rewards thus only serve as an incentive for the AIs, and we should be able to adopt either of the answers without knowing the result. ↩︎

  2. Arguably, (B) and (C) are similar in spirit, and in particular, an agent might implement (B) to perform well in (C). The main differences are that (i) a human commissions the debate in (C) while an AI does so in (B), (ii) answers in (C) are selected strategically whereas those in (B) aim to maximize information value, and (iii) only one question gets answered in (C). Moreover, (iv) the debates in (B) are likely to be judged automatically while those in (C) are judged by an actual human. ↩︎

  3. If the only Nash equilibrium is to select the true answer 100% of the time, all incorrect answers are "unstable" in the sense that there exists some other answer which will expose them (i.e., win the Debate game against them). This exposing answer doesn't necessarily have to be the true one, though --- truth might merely tie against them. We can also require a stronger property, where the true answer wins the Debate game against every other answer. (h/t Max Daniel) ↩︎

  4. Since Nash equilibria in two-player zero-sum games form a simplex, we have tpl(Q) = σ^{∗,center}_1(A_true), where σ^{∗,center} denotes the center of the NE simplex. ↩︎

  5. Note that we used the standard version (rather than PUCB as in [1]) of the Monte Carlo tree search algorithm (which is furthermore inherently non-deterministic). Also, due to computational limitations, we only used 100 images to evaluate our results, instead of 10 000 used in [1]. This probably explains the differences in obtained win-rates. ↩︎

  6. We cannot hope to decide whether the honest debater goes first or second since in practical applications of debate, we would not know which debater is which. ↩︎

  7. Running the non-precommitted debate isn't the only way of implementing Training via Debate --- we chose this approach to simplify implementation and reduce computation time. ↩︎

  8. Due to computational restrictions, we only ran Training via Debate using the results of debates between 100-rollout-MCTS agents. As one can guess from looking at Figure 2, this leads to similar performance as training from random debates. However, more computation and an optimized implementation should allow for training with much stronger debaters and (according to Figure 2) lead to significantly better accuracy. ↩︎


The three ways to upset people with your speech (private)

July 23, 2019 - 21:06
Published on July 23, 2019 6:06 PM UTC

In the proposal, I stated that even though we want people to be able to communicate things which might be upsetting to others, we should encourage communicators of such “upsetting information” to not make their communications any more upsetting than necessary. Jessica fairly requested clarification on this point.

I haven’t had an explicit model here, so this is fresh, but I think I’d describe the picture something like the following:

Different Causes of Upsetness

When communicating something to someone, they can be upset by any or all of the following:

  1. The information contained in your core, primary, explicit, propositional message is upsetting to them, e.g. you inform someone that they have been donating to cause X which is in fact quite harmful.
  2. Secondary information they perceive in your message (e.g. your attitudes, intentions, emotional state, relationship to the receiver, which are expressed via the non-explicit, pragmatic aspects of speech) is upsetting to them, e.g. they think that you are judging them for donating to a harmful cause and are inciting others to do the same.
  3. They are upset because of implications and inferences they assign to the fact that you communicated your core message. Even if you did not send any secondary information that directly indicates a threat, they may be reacting to the core fact that they were criticised in public.

1. The information in your core, propositional message is upsetting to someone

I am against people needing to worry much about the core content of their message being upsetting to others. In other words, you should be able to make clear explicit statements about reality which you believe are true without needing to self-censor because of how others don’t like hearing about those realities.

I am strongly afraid that any policy or norm which tried to limit what you could assert about reality would too easily get in the way of communicating information which ought to be communicated, e.g. criticisms of people doing harm. In this aspect, I do not think feelings should be spared at the expense of truth. Instead people should be able to state their beliefs unambiguously.

Upsetness due to this first factor is what I would consider something like the baseline upsetness of a message. Attempting to minimize this component of upsetness is risky to epistemics and I don’t advocate for it. By extension, I think of this component as your “best case” upsetness. If a person is no more upset by what you said than what can be attributed to the raw fact of having learned something, then you are doing well. (I like to think of this as the upsetness which would result from “immaculate inception”, i.e., how upset someone would be if God planted that same message right into their heads without there being a social act of someone else communicating it to them).

2. Secondary information perceived in your message is upsetting.

Let’s reuse the example of stating that cause X is harmful and that supporting it is causing harm. That is the core proposition of your message, which is a valid thing to communicate. We can imagine two different posts communicating this same core message.

Version 1:

I applaud the efforts of all those trying to do good, those who donate their time, attention, and money towards making this world better. Unfortunately, good intentions don’t always cause good outcomes. I regret to say that after careful investigation I believe cause X is in fact harmful (as I will elaborate), and that those who have supported it should place their efforts elsewhere. It is important that we pay attention to factors A and B . . .

Version 2:

I can’t believe that people who support cause X think they are good people! They are fooling themselves! So much for “effective”. Even a quick look at the evidence and a few seconds’ thought should set off alarm bells: factors A and B are reason alone to suspect that X is causing a lot of damage. People supporting cause X should STOP NOW.

These two posts contain the same core message: “cause X is harmful because of factors A and B, those who have been supporting it should stop.” Yet there is very different secondary information contained in each. The speaker in version 1 exerts themselves to express that notwithstanding their opinion that those supporting cause X have been contributing to harm, they are still positively disposed to those supporters and recognize their noble efforts. The post expresses the criticism clearly, but tries to make this about the object-level question of how to do good, and not something that attacks the goodness or status of those implicated in the message.

In contrast, the speaker of version 2 is palpably attacking both the act of supporting cause X and those who have been supporting X to date. Version 2 is quite clearly trying to lower the status of those who have supported cause X and incite the audience against them. For the sake of illustration, I have used a very unsubtle example to highlight how much extra information exists beyond the core message. In the real world, people aren’t so over the top, but they still pack judgments and implications into the secondary channels of their communication, even just subtle things like “I judge the poor judgment and naivete of these people” or “I’d kind of prefer if they went away.”

It is primarily upsettingness from this component (potentially hostile secondary information) that I want to push people to invest effort in minimizing. When you have something upsetting to say, try to write something closer to version 1 and definitely not something like version 2. The same goes when you’re just criticizing an argument or viewpoint as when criticizing actions. Someone might have made a mistake, but that doesn’t mean you need to try to ridicule them. Who knows, maybe you’re making the mistake. Yet if people are using the secondary channels to communicate hostilities and take potshots at each other, it makes it hard to update, admit wrongdoings, and generally feel safe while pursuing the truth together.

3. People are upset because of the inferences and implications they assign to the raw fact that you communicated your core message.

Version 3:

Supporting cause X causes harm. Factors A and B are evidence of this. People who have been supporting cause X should stop . . .

This version is extremely to the point. No effort is expended either trying to attack or defend those who’ve been supporting cause X to date. In an ideal world, this would be the ideal message. Nice, efficient, clear communication.

Regrettably, people might still react poorly to public posting of version 3, and not entirely without justification. It shouldn’t be the case that making a mistake equates to someone being evil. You, author of version 3, may emphatically believe that. You, in your heart of hearts, may bear no ill feelings towards those who’ve been erroneously supporting cause X. But those who’ve been supporting cause X don’t know that - they’re used to a world where people who accuse you of doing harmful things are people who don’t like you and who want to cause you harm. Or maybe they know that you don’t bear them any ill-will, but they’re afraid of what others will think of them still in light of this public criticism. Their political enemies could pick up on this fresh criticism and add it to their arsenal. They feel they are under threat, if not from you, then from your message, and they must respond to protect themselves.

If you are definitely not doing anything to signal hostility or threat or judgment via your secondary channels when sending a message (though it can be hard to be sure), i.e. “pure” version 3 [1], then I feel that you’ve probably met your obligations for not upsetting people (even if they still get upset). It is not strictly your problem if people are afraid of any public criticism, and it would be bad of you to censor yourself because of their fears here. You’re not doing anything beyond trying to earnestly communicate information you think is valuable.

At the same time, I think it is supererogatory to invest some effort countering the fears/upset people might have due to the third component. I think it costs relatively little (once you’ve learnt the habit and learnt to appreciate people) to offer some reassurance at the same time as you criticize and correct. To inflect a friendly tone rather than a hostile one. Deliver the crucial message that cause X is harmful or wrong, but also reaffirm that you think these people are good (hopefully you can earnestly believe that) and still want them as friends/neighbors/allies. In an ideal world you wouldn’t have to waste the extra keystrokes, but it’s a relatively cheap way to keep discourse feeling pleasant and safe for all those involved. So if you’re going to criticize someone (behavior, idea, or other) (which you should if it’s warranted), please spend just a few words countering the fears people predictably have when they are criticised.

Show some good will, help them save face, signal that you want to cooperate. These are all eminently compatible with sharing hard truths.

[1] It’s entirely possible to harm others with perfectly neutral statements, so even when writing the minimal words to convey your message, it may be the case that a) you are still attacking them, or b) your listener can reasonably infer that you are, given very reasonable priors and the context, etc. A lot of information is contained in which particular topics are raised and the timing. Since these communications all depend so heavily on context, it’s hard to discuss in the abstract. Judgment is required.


How to take smart notes (Ahrens, 2017)

July 23, 2019 - 18:35
Published on July 23, 2019 3:34 PM UTC

This is my rephrasing of (Ahrens, 2017, How to Take Smart Notes). I added some personal comments.

The amazing note-taking method of Luhmann

To be more productive, it's necessary to have a good system and workflow. The Getting Things Done system (collect everything that needs to be taken care of in one place and process it in a standardised way) doesn't work well for academic thinking and writing, because GTD requires clearly defined objectives, whereas in doing science and creative work, the objective is unclear until you've actually got there. It'd be pretty hard to "innovate on demand". Something that can be done on demand, in a predetermined schedule, must be uncreative.

Enter Niklas Luhmann. He was an insanely productive sociologist who did his work using the method of "slip-box" (in German, "Zettelkasten").

Making a slip-box is very simple, with many benefits. The slip-box will become a research partner who could "converse" with you, surprise you, and lead you down surprising lines of thought. It would nudge you to (numbers in parentheses denote the sections of the book that discuss each item):

  • Find dissenting views (10.2, 12.3)
  • Really understand what you learned (10.4, 11.2, 11.3, 12.6)
  • Think across contexts (12.5)
  • Remember what you learned (11.3, 12.4)
  • Be creative (12.5, 12.6, 12.7, 13.2)
  • Get the gist, not stuck on details (12.6)
  • Be motivated (13.3)
  • Implement short feedback loops, which allows rapid improvements (12.6, 13.5)

Four kinds of notes

Fleeting notes

These are purely for remembering your thoughts. They can be: fleeting ideas, notes you would have written in the margin of a book, quotes you would have underlined in a book.

They have no value except as stepping stones towards making literature and permanent notes. They should be thrown away as soon as their contents have been transferred to literature/permanent notes (if worthy) or not (if unworthy).


Jellyfish might be ethically vegan, since they have such a simple neural system, they probably can't feel pain.

Ch. 9 How to attention attend:

  1. One thing at a time. No multitasking
  2. When writing, attend to idea flow. Meaning, not wording. ...

Literature notes

These summarize the content of some text, and give the citation.


(Kahneman & Tversky, 1973) shows that people often do not take into account the prior when doing a Bayesian probability problem. In particular, when no evidence is given, the prior probabilities are used; when worthless evidence is given, prior probabilities are ignored.

Kahneman, Daniel, and Amos Tversky. “On the Psychology of Prediction.” Psychological Review (1973)

Such notes could be made in Zotero, which is how I do it. You might make them separately in some other notebook software, or just in plain text files.

Permanent notes

Each permanent note contains one idea, explained fully, in complete sentences, as if part of a published paper.

There are many tools available for storing the permanent notes, see Tools • Zettelkasten Method. I personally recommend TiddlyWiki.

Project notes

These are notes made only for a project, such as a note that collects all the notes that you'd want to assemble into a paper. They can be thrown away after the project is finished.

Four principles

Writing is the only thing that matters

Don't just read. Make reading notes. Don't just learn. Make blog posts or something to share what you learned.

Also, hand-written notes have some advantages. (Mueller & Oppenheimer, The Pen Is Mightier Than the Keyboard: Advantages of Longhand Over Laptop Note Taking, 2014) shows that students who took notes on a laptop understood lectures less well, due to their tendency to transcribe verbatim without understanding. From mouth to ears to fingers, bypassing the brain completely.

The way I see it, this is not an argument against using the computer, but an argument for rephrasing instead of copy-pasting/direct quoting/mere transcribing.

Be simple

Don't underline, highlight, write in the margins, or use several complicated systems for annotation. It'd make it really hard for you to retrieve these scattered ideas later. You would be forced to remember with your biological brain to keep track of what information is put where.

Put all these ideas in the same simple system of your slip-box, and you will be set free to use your biological brain to think about these ideas.

Your simple slip-box system would be like an external brain that interfaces seamlessly with your biological brain.

Papers are linear, but writing is nonlinear

This is why advice on "how to write" in the form of a list of "do this then that" is bound to do badly.

Instead, you should write a lot of permanent notes in your slip-box. Then when the time comes for you to write a paper, just select a linear path out of the network of notes, then rephrase and polish that into a paper.

Calculate productivity not by how many pages of paper you've written, but by how many permanent notes you've written per day. This is because some pages of a paper can take months to write, others can take hours. In contrast, each permanent note takes roughly the same amount of time to write.

Short feedback loops

Feedback loops should be short. It makes you learn fast, fail fast, succeed fast. According to (Kahneman & Klein, Conditions for Intuitive Expertise: A Failure to Disagree, 2009), this is how intuitive expertise is made: a lot of practice in an environment with rapid and unambiguous feedback.

The traditional way of writing a paper takes months before you get feedback in the form of reviewers' comments. Instead, you should make notes, several per day, which allows fast feedback loops. If you really understood something, you'd see it in the form of a well-written note. If not, then you know you haven't really understood it. You can experiment with other ways to make the notes and you will see immediately what works and what doesn't.

Six methods

How to pay attention

Don't multitask. Pay attention to one task at a time.

When writing, pay attention to the flow of ideas: what you want the words to mean, not the exact wording.

When proofreading, pay attention to what the words are saying, and not what you think they mean.

Pay attention only to what you must and don't pay attention to anything else, because attention is very precious.

Routinize things that can be routinized, such as food, water, clothes... Wear only one outfit ever, like Steve Jobs. Eat only one meal plan, buy exactly the same kind of groceries, or better, always eat the first vegan meal plan at the canteen.

Use the Zeigarnik effect to your advantage. If you want something to stop intruding on your mind, write it down and promise yourself that you'll "deal with it later". If you want to keep pondering something (perhaps a problem you want to solve), don't write it down, and go for a walk with that problem on your mind.

How to make literature notes

As mentioned before, each literature note contains exactly two parts: the content of a text, and the bibliographical location of the text. If you write the note in bibliography software like Zotero, you can attach the note directly to the text, and there's no need for the bibliography information.

The most important thing is to capture your understanding of the text, so don't quote. Quoting can easily lead to out-of-context quoting. Preserve the context as much as possible by paraphrasing.

Prepare the literature notes so that when you make permanent notes, you can elaborate on the texts, that is, describe the context, find connections and contrasts and contradictions with other texts.

How to make permanent notes

Recontextualize ideas in your own thinking. Write down why you would care about an idea. For example (from section 11.2), if the idea is an observation from (Mullainathan and Shafir, 2013, Scarcity: Why having too little means so much):

people with almost no time or money sometimes do things that don’t seem to make any sense... People facing deadlines sometimes switch frantically between all kinds of tasks. People with little money sometimes spend it on seeming luxuries like take-away food.


As someone with a sociological perspective on political questions and an interest in the project of a theory of society, my first note reads plainly:

Any comprehensive analysis of social inequality must include the cognitive effects of scarcity. Cf. Mullainathan and Shafir 2013.

How to link between notes

There are three kinds of links between notes:

  • Index -> Entry point note
  • Note -> Note
  • Note <-> Note

At the top level, there is one note called "Index". The index note is just a list of tags/keywords with links. Each tag/keyword is a topic that you care about, and is linked to a few notes (Luhmann limited himself to at most 2) that serve as "entry points" to the topic.

The entry points are often notes that give an overview of the topic. Luhmann would make each of these notes an annotated list of notes that cover various aspects of the topic. His entry-point notes would contain lists of up to 25 items.

Between notes, there are two kinds of links: sequential and horizontal. In fact, sequential links are really just horizontal links that you annotate as "sequential".

For example, consider this note:

Following: [link 1] [link 2]...

Content content [link 3] content content [link 4]...

Followed by: [link 5] [link 6] ...

After reading this note, you can go along the sequence and read "Followed by" notes, or take a sideways stride and follow the horizontal link [link 3].

The advantage of marking some links as sequential is that you get clear sequences of thought that you can easily follow, but they are by no means essential. You could just make horizontal links.

Ideally, you should make the network of slip-box notes to be like a small-world network, with a few notes having many connections, and some notes having "weak ties" to far-away notes (Granovetter, Mark S, 1977 The Strength of Weak Ties).
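
The linking scheme described above can be sketched as a small data structure. This is only an illustration under my own assumptions, not Luhmann's or Ahrens's actual system: the class names (Note, SlipBox), the field names, and the example note titles are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Note:
    """One permanent note. All field names are hypothetical."""
    title: str
    following: List[str] = field(default_factory=list)    # sequential links to earlier notes
    followed_by: List[str] = field(default_factory=list)  # sequential links to later notes
    see_also: List[str] = field(default_factory=list)     # horizontal links


class SlipBox:
    def __init__(self) -> None:
        self.notes: Dict[str, Note] = {}
        self.index: Dict[str, List[str]] = {}  # keyword -> entry-point note titles

    def add(self, title: str) -> Note:
        note = Note(title)
        self.notes[title] = note
        return note

    def link_sequential(self, earlier: str, later: str) -> None:
        # A sequential link is just a link annotated as "followed by" / "following".
        self.notes[earlier].followed_by.append(later)
        self.notes[later].following.append(earlier)

    def link_horizontal(self, source: str, target: str) -> None:
        self.notes[source].see_also.append(target)

    def add_entry_point(self, keyword: str, title: str) -> None:
        self.index.setdefault(keyword, []).append(title)

    def linear_path(self, start: str) -> List[str]:
        """Follow the first "followed by" link from each note until the sequence ends."""
        path: List[str] = []
        current: Optional[str] = start
        while current is not None and current not in path:
            path.append(current)
            successors = self.notes[current].followed_by
            current = successors[0] if successors else None
        return path


box = SlipBox()
for title in ["scarcity", "inequality", "deadlines", "weak ties"]:
    box.add(title)
box.link_sequential("scarcity", "inequality")
box.link_sequential("inequality", "deadlines")
box.link_horizontal("scarcity", "weak ties")
box.add_entry_point("social inequality", "scarcity")

# Start from the index, then walk the sequence.
print(box.linear_path(box.index["social inequality"][0]))
# prints ['scarcity', 'inequality', 'deadlines']
```

Walking only the "followed by" links yields one linear path through the network; the horizontal link to "weak ties" is the kind of sideways stride you could take instead.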

How to write a paper

Don't brainstorm, since brainstormed ideas are the ones that are easily available, rather than innovative or actually relevant. Especially don't group-brainstorm, which tends to be even less innovative due to groupthink effects (Mullen, Brian, Craig Johnson, and Eduardo Salas, 1991, Productivity Loss in Brainstorming Groups: A Meta-Analytic Integration).

Instead, do a walk through the slip-box and select a linear path. That gives you a draft that you can polish into a paper.

Work on several papers simultaneously, switch if bored. This is a kind of "slow multitasking", which is good multitasking. Luhmann said

When I am stuck for one moment, I leave it and do something else... I always work on different manuscripts at the same time. With this method, to work on different things simultaneously, I never encounter any mental blockages.

When you need to cut out something that you really like, but that just doesn't belong in the paper (such as something that is not relevant to the argument), you can make a file named "maybe later.txt" and dump into it all the things that you promise to add back later (but never actually do). This is a psychological trick that works.

How to start the habit of using slip-boxes

Old habits die hard. The best way to break an old habit is to make a new habit that can hopefully replace the old habit.

For getting into the habit of using slip-boxes, you can start by making literature notes. Once you have that habit, making permanent notes would be a natural next habit to take on.


Prediction as coordination

July 23, 2019 - 09:19
Published on July 23, 2019 6:19 AM UTC

I want to introduce a model of why forecasting might be useful which I think is underappreciated: it might help us solve coordination problems.

This is currently only a rough idea, and I will proceed by examples, pushing this post out early rather than not at all.

The standard model of forecasting

This looks something like:

We have our big, confusing, philosophical, long-term uncertainties. We then need to 1) find the right short-term questions which capture these uncertainties, which are 2) understandable to traditional Superforecasters without very deep inside knowledge, and whose expertise has only been demonstrated on short-term questions in more well-understood domains, who then 3) use tools like outside views and guesstimates to answer them.

When I hear people say they're not excited about forecasting, it's almost always because they think this standard model won't work for AI safety. I'm very sympathetic to that view.

Example 1: coordination in mathematics via formalism

When quantifying our beliefs, we lose a large amount of nuance and interpretability. This is similar to how, when formalising things mathematically, we sacrifice the majority of human understanding.

What we gain, instead, is the ability to express and communicate thoughts...

  • much more succinctly
  • using a precise, interpersonally standardised format of interpretation
  • in a way which clarifies certain logical and conceptual relations

This is a trade-off that allows a community of mathematicians to make intellectual progress together, and to effectively make results common knowledge in a way which allows them to coordinate on what to solve next.

Example 2: futures markets as using predictions for coordination

Getting enough food for everyone is a big coordination problem. We want some people to stockpile things like rice and wheat so that we're prepared for a drought, but we also don't want people to waste opportunities on storing stuff which has to be thrown away in case the next harvest goes well. These kinds of problems are solved by futures markets, which effectively predict the future price of rice, and thereby provide an incentive to arbitrage away any abrupt price fluctuations (i.e. to strategically stockpile/sell out rice so as to match future supply and demand). Robert Shiller has suggested these as a candidate for the greatest financial innovation.

Example 3: predicting community consensus

One particularly interesting use case is trying to predict what the x-risk community will believe at some time t in the future. Assuming the community is truth-seeking, anyone who spots the direction in which opinions will converge in advance of their convergence has 1) performed an important epistemic service, and 2) provided important evidence of their own epistemic trustworthiness.

For example, the CAIS model has gathered a fair amount of attention. (I personally don't have a strong inside view on it.) If someone had predicted this shift more than a year ago, we would want to trust them a bit more next time they predicted a shift in community attention.

It was mentioned to me that one researcher thought this model important more than 1.5 years ago; but the reason he thought so was not because of superior reasoning -- but because of inside knowledge.

This is an inefficiency. The frontiers of our collective attention allocation do not line up with the frontiers of our intellectual progress, and hence see abrupt fluctuations as papers are released and the advantage of inside info is dispelled.

One implementation of this might look like sending out a survey asking about important questions to important organisations at ~yearly intervals, and then having people try to predict the outcomes of that survey.

This has one important advantage over the standard uses of forecasting: we don't have to resolve the questions "all the way down". If we simply ask what people will think take-off speeds are likely to be, rather than what take-off speeds are actually going to be, and further assume that people move closer to the truth in expectation, this gives us a much cheaper signal to evaluate.

Example 4: avoiding info-cascades

Info-cascades occur when people update off of each other's beliefs without appropriately sharing the evidence for those beliefs, and the same pieces of evidence end up "double-counted". Having a better system for tracking who believes what and why could help solve this, and prediction systems could be one way of doing so.
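
To make the "double-counting" concrete, here is a minimal sketch using odds-form Bayesian updating. The numbers are made up for illustration (even prior odds, one piece of evidence with a 4:1 likelihood ratio), and the function name is hypothetical.

```python
from fractions import Fraction


def posterior_probability(prior_odds, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio, then convert odds to a probability."""
    odds = prior_odds
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


prior_odds = Fraction(1, 1)   # assumed: 1:1 prior
evidence = Fraction(4, 1)     # assumed: one observation, 4:1 in favour

# Correct: the single observation is counted once.
counted_once = posterior_probability(prior_odds, [evidence])

# Cascade: three people repeat the same observation, and a listener
# treats their three reports as independent pieces of evidence.
counted_three_times = posterior_probability(prior_odds, [evidence] * 3)

print(counted_once)         # prints 4/5
print(counted_three_times)  # prints 64/65
```

Under these assumptions the listener ends up at roughly 98% confidence when the evidence only supports 80%, which is the kind of error that tracking who believes what, and why, would help catch.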

Example 5: building fire alarms

Eliezer notes that rather than being evidence of a fire, fire alarms make it common knowledge that it's socially acceptable to act as if there's a fire. They're the cue on which everyone jumps from one social equilibrium to another.

Eliezer claims there's no fire alarm for AGI in society more broadly. I suspect there are also areas within the x-risk space where we don't have fire alarms. Prediction systems are one way of building them.

Who is going to make the forecasts?

An important clarification: I'm not saying that we should "outsource" the intellectual work of solving hard x-risk research problems to forecasters without domain-expertise. (That is another interesting and controversial proposal one might discuss.)

Rather, I'm saying that we should use predictions as a vehicle to capture the changing beliefs of current domain-experts, and allocate their attention going forwards (smoothing out attentional discontinuities in expectation).

I'm not saying we should replace Eric Drexler with a swarm of hobby forecasters. I'm saying that a few full-time x-risk researchers might realise before the rest of the community that Eric's work deserves marginally more attention, and be right about that, and that community-internal forecasting systems can allow us to more effectively use their insights.

Can't we just use blog posts?

Compared to a numerical prediction, a blog post...

  • ...takes more effort to produce
  • ...might take more effort to read and interpret
  • ...doesn't have a standardized format of interpretation
  • ...doesn't allow gathering and visualisation of the beliefs of multiple people
  • ...doesn't natively update in the light of new information

Blog posts have the crucial property of being "essay complete" in their expressiveness, but that comes at the cost of idiosyncrasy and poor scalability.

A better model is probably to treat blog posts as part of the ground truth over which predictions operate, just as the rice and wheat markets provide the ground truth for their respective futures markets.

I'd rather have only blog posts than only prediction systems, but I'd rather have both than only blog posts.


The Real Rules Have No Exceptions

July 23, 2019 - 06:38
Published on July 23, 2019 3:38 AM UTC

(This is a comment that has been turned into a post.)

From Chris_Leong’s post, “Making Exceptions to General Rules”:

Suppose you make a general rule, i.e. “I won’t eat any cookies”. Then you encounter a situation that legitimately feels exceptional, “These are generally considered the best cookies in the entire state”. This tends to make people torn between two threads of reasoning:

  1. Clearly the optimal strategy is to make an exception this one time and then follow the rule the rest of the time.

  2. If you break the rule this one time, then you risk dismantling the rule and ending up not following it at all.

How can we resolve this? …

This is my answer:

Consider even a single exception to totally undermine any rule. Consequently, only follow rules with no exceptions.[1] When you do encounter a legitimate exception to a heretofore-exceptionless rule, immediately discard the rule and replace it with a new rule, one which accounts for situations like this one, which, to the old rule, had to be exceptions.

This, of course, requires a meta-rule (or, if you like, a meta-habit):

Prefer simplicity in your rules. Be vigilant that your rules do not grow too complex; make sure you are not relaxing the legitimacy criteria of your exceptions. Periodically audit your rules, inspecting them for complexity; try to formulate simpler versions of complex rules.

So, when you encounter an exception, you neither break the rule once but keep following it thereafter, nor break it once and risk breaking it again. If this is really an exception, then that rule is immediately and automatically nullified, because good rules ought not have exceptions. Time for a new rule.

And if you’re not prepared to discard the rule and formulate a new one, well, then the exception must not be all that compelling; in which case, of course, keep following the existing rule, now and henceforth.

But why do I say that good rules ought not have exceptions? Because rules already don’t have exceptions.

Exceptions are a fiction. They’re a way for us to avoid admitting (sometimes to ourselves, sometimes to others) that the rule as stated, together with the criteria for deciding whether something is a “legitimate” exception, is the actual rule.

The approach I describe above merely consists of making this fact explicit.

  1. By which I mean “only follow rules to which no legitimate exception will ever be encountered”, not “continue following a rule even if you encounter what seems like a legitimate exception”. ↩︎


jacobjacob's Shortform Feed

July 23, 2019 - 05:56
Published on July 23, 2019 2:56 AM UTC

What it says on the tin.


"Shortform" vs "Scratchpad" or other names

July 23, 2019 - 04:21
Published on July 23, 2019 1:21 AM UTC

Within the next day or two, we'll be launching some features that give the "shortform" concept real infrastructural support. Shortform comments will appear on the All Posts page, on the new /shortform page, and it'll be easier to automatically generate a post to write shortform content.

The only potential issue is that, as "shortform" comments have gained traction... they... often aren't especially short.

I'd kinda like to finalize the name before it goes live.

Currently the team is looking at "Scratchpad" as a name, intending to convey "this is a place you write early stage, off the cuff ideas without stressing too much about whether it should be a post." Sometimes this turns out to actually be short, sometimes not. The main counterpoint right now is that scratchpad kinda implies a level of "ephemerality" or something that isn't necessarily true either.

I'm particularly interested in feedback from people who've already made a shortform feed (e.g. mr-hire, hazard, DanielFilan and others), if they have any opinions about this particular bikeshed.


Cambridge LW/SSC Meetup

July 23, 2019 - 03:47
Published on July 23, 2019 12:47 AM UTC

This is the monthly Cambridge, MA LessWrong / Slate Star Codex meetup.

Note: The meetup is in apartment 2 (the address box here won't let me include the apartment number).


Solving for X instead of 3 in love triangles?

July 23, 2019 - 01:22
Published on July 22, 2019 4:35 PM UTC

I was rereading HPMOR and got to the math problem about love triangles: there are 4^3 = 64 possible combinations, but you need to remove the combinations where one person neither loves nor is loved by anyone. I solved it by counting all the combinations that did not work (10) and subtracting them from the 64 possibilities to get 54. I then used the same counting strategy for four people, and found that knowing the answer for three people made the counting easy: four nodes give 4^6 = 4096 combinations, of which (54x4)+(3x6)+1 = 235 don't include everyone, so the answer is 3861. Here is my question: is there a mathematical formula to solve for X people, instead of laboriously adding one person at a time and counting the combinations that don't include everyone?
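
One candidate answer, offered as a sketch rather than a definitive derivation, is inclusion-exclusion over the set of left-out people: a(n) = sum over k from 0 to n of (-1)^k * C(n, k) * 4^C(n-k, 2). It assumes the same model as the counts above (each pair of people has 4 possible states, and a person is "left out" when they neither love nor are loved by anyone). The function names below are my own, and the brute-force enumeration is there only to check the formula on small cases.

```python
from itertools import product
from math import comb


def count_by_formula(n):
    """Inclusion-exclusion: choose k people to force out, leaving 4^C(n-k, 2) free combinations."""
    return sum((-1) ** k * comb(n, k) * 4 ** comb(n - k, 2) for k in range(n + 1))


def count_by_brute_force(n):
    """Enumerate every who-loves-whom assignment and keep those that involve everyone."""
    ordered_pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    total = 0
    for assignment in product((0, 1), repeat=len(ordered_pairs)):
        involved = set()
        for (lover, beloved), loves in zip(ordered_pairs, assignment):
            if loves:
                involved.add(lover)
                involved.add(beloved)
        if len(involved) == n:
            total += 1
    return total


for people in (3, 4):
    print(people, count_by_formula(people), count_by_brute_force(people))
# prints:
# 3 54 54
# 4 3861 3861
```

Both methods reproduce the hand-counted 54 and 3861, which is some evidence (not a proof) that the formula matches the counting strategy described above.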


Where is the Meaning?

July 22, 2019 - 23:18
Published on July 22, 2019 8:18 PM UTC

"You're being mean."
"No, I'm not."

"This album is amazing!"
"What are you talking about? It's clearly trash."

"Tomatoes are my favorite vegetable!"
"Dude, they're fruits."

People often argue about what things are, and they often do it with words. Sometimes this can create another argument about what a certain word means. This leads me to semi-rhetorically ask you the question, where is the meaning located? In the words? The people? The air? If this seems like philosophical thumb-twiddling, note that where you think meaning is determines where you go looking whenever a question of meaning pops up. If I think the dictionary is the end all be all of meaning, it's the first place I'll go when pondering, "What really is happiness?"

If you've ever been quoted out of context in a way that made you look bad, you might have grumbled, "That's not what I meant!" On some level, I think most people understand that words themselves do not have inherent meaning. There are thoughts in your head, and words are the things that come out of your mouth, which hopefully let others guess what you are thinking.

And yet... damn sometimes it just really feels like meaning is baked into a message. Like it's just there, waiting to be discovered. We're going to explore this feeling, what I think it means, and how to not let it trip you up.

Context and Predictability

Context matters:

"My sister just got a new dog."
"You snuck in? You sly dog!"

Context still matters:

Mobster: "What'd you do to that guy who owed you money?"
Mob Boss: "I put him to sleep."

Spouse 1: "Where's our Henry? I haven't seen him in a bit."
Spouse 2: "I put him to sleep."

Neither of those are meant to be shocking, just reminders that you already know that the same word, even the same sentence, can mean completely different things based on the context.

Everyone has the shared context of "having a human brain" and "being subject to the same laws of physics". This lets me communicate things like "Get away! You're not welcome" merely by throwing rocks. Language is a wild invention that was bootstrapped from an incredibly general context, and now it lets us create even more shared context which allows for even more specific thoughts and ideas to be shared. The context that you share with the people you're talking to is incredibly important in figuring out who meant what.

Given that it seems pretty clear that context is essential to understanding the meaning of someone's words, let's hop back to examine the feeling that meaning is intrinsic to words. Douglas Hofstadter talks about this feeling in Gödel, Escher, Bach (the chapter "The Location of Meaning" really digs into some interesting ideas about meaning that are beyond the scope of this post). In the context of the Rosetta Stone being translated, he says:

Just how intrinsic is the meaning of a text, when such mammoth efforts are required in order to find the decoding rules? Has one put meaning into the text, or was that meaning already there? My intuition says that the meaning was always there, and that despite the arduousness of the pulling-out process, no meaning was pulled out that wasn't in the text to start with. This intuition comes mainly from one fact: I feel that the result was inevitable; that, had the text not been deciphered by this group at this time, it would have been deciphered by that group at that time, and it would have come out the same way. That is why the meaning is part of the text itself; it acts upon intelligence in a predictable way. Generally, we can say: meaning is part of an object to the extent that it acts upon intelligence in a predictable way.
(emphasis mine)

Predictability is key. You can see how if saying "Good morning, what's the time?" prompted some people to give you a handshake, others to tell you their cell phone number, and others to start dancing, you'd be far less inclined to think that "Good morning, what's the time?" had intrinsic meaning.

I claim that "It feels like meaning is intrinsic to the words themselves" is a thought composed of multiple steps:

  1. I notice that some words predictably convey a particular meaning.
  2. When I communicate, say in writing, I'm only giving the other person words.
  3. If I'm only giving them some words, and they reliably get the same meaning, it must be because the meaning is somehow baked into the words.

And now that we've drawn out the hidden jumps of logic, we can see that step 2 is a bit fishy. Yes, you might have only given someone your words, but what they already had was some context. Context from having grown up in the same country as you, context from having known you from work, context from earlier parts of your conversation. When a context is very familiar you can fuse to it. When this happens you no longer notice your context when you look at the world, because the context becomes the lens through which you do your looking. If you turn your attention to these mysterious words that seem to so predictably convey your meaning, they are the only culprits available and you quite reasonably decide that they must contain the meaning.

There is a right answer to "What did you mean by [words you said]?" but there is not a right answer to "What do [words you said] mean?"

Go to the source of the meaning

A sketch:

Dale is chopping up a tomato and putting it into a fruit salad.
Brice: What are you doing?! A tomato is a vegetable, don't put it in a fruit salad!
Dale: Nope, I looked it up, totally a fruit.
Dale and Brice spend 10 minutes having an unproductive argument about whether a tomato is a fruit or a vegetable. It ends with them agreeing to disagree.

Brice said some words. What does the word fruit mean? Let's investigate the definition of fruit.

Here's a parallel universe where Brice has read A Hazardous Guide To Words (or the Sequences, but then again Brice is reading HGTW because he can't be bothered to read the Sequences):

Dale is chopping up a tomato and putting it into a fruit salad.
Brice: What are you doing?! A tomato is a vegetable, don't put it in a fruit salad!
Dale: Nope, I looked it up, totally a fruit.
Brice: Sorry, what I meant was that the flavors of those bananas, strawberries, blueberries, and orange that you already have in the fruit salad won't really jive with the flavor of the tomato, and I think you shouldn't include it.
Dale and Brice spend 10 minutes having an interesting conversation about what sorts of flavors do and don't go well together. It ends with them both learning something.

Brice said some words. What did Brice mean? Let's have Brice tell us more.

Remember, words don't mean things, people mean things. And because language is awesome, plenty of words are particularly good indicators that a person meant a particular thing.