LessWrong.com News

A community blog devoted to refining the art of rationality

Does improved introspection cause rationalisation to become less noticeable?

July 30, 2019 - 13:03
Published on July 30, 2019 10:03 AM UTC

I've recently updated that noticing is a key rationality skill -- not just noticing confusion, but noticing your cognition more generally. This allows you to figure out at a very granular level why you're not reaching your goals, and then intervene to change those reasons.

For example:

At one point I found myself procrastinating on ordering the catering for an event. Noticing the disconnect between my high-level goals ("make a good event") and my concrete actions ("spend time on FB") triggered me to try to notice what was up in my mind (this is a particular trigger-response pattern I've trained myself to use). I found that I didn't want to make the call since last time I called them, they couldn't hear what I was saying and were kind of rude about it. I didn't want my phone to be bad or my accent to be inaudible, and so I didn't want to call them again. I then proceeded to borrow a friend's phone, and called them without problem.

Another example, this time with a concrete cognitive rather than practical intervention:

I noticed myself being unhappier than I wanted to be. So when the unhappiness clashed with the higher-level desire for happiness, it triggered a noticing process, and I realised my mind was running an algorithm like: "notice happy thought --> remember Hamming problem or that timelines might be short --> feel bad". This sounds ridiculously unhelpful when written out, but is in fact what was going on. So I started training myself to hold on to the happiness in the first part of the chain without automatically falling into the second.

Here's a worry with this: if part of my cognition is consciously accessible and interpretable, and part of it is not, will extensive noticing-and-intervening cause motivated cognition to become less noticeable?

It will by selection effects, since I'll change the more noticeable parts. But this feels more definitionally true than actually worrying.

It also might by negative reinforcement, if my mind learns that when subagents make their desires known they'll tend to be overruled/modified. (To prevent this, and as a safer policy in my current epistemic state, I make sure to sometimes deliberately not intervene on things I've noticed.) But this shouldn't be the case if I genuinely listen to subagents and take their preferences into account, or if the subagent theory doesn't fit (which seems more plausible in the second example above).


Is there some other reason to believe that improved ability to notice your cognition will cause rationalisation, motivated cognition, thought patterns highly valued by certain subagents, etc. to become less noticeable?




When Having Friends is More Alluring than Being Right (by Ferrett Steinmetz)

July 30, 2019 - 12:58

On Framing Political Opinions to Quickly Assess the Crux of Disagreement

July 30, 2019 - 08:21
Published on July 30, 2019 3:07 AM UTC

The following will be elementary to many of you, and for that I must explain that this idea popped into my head fully formed and I need to improve my writing habit, so here goes. Hopefully this will be insightful for someone!

Political opinions can be framed as a set of three components: Cause, Means, and Ends (consequences).

Example: Poor people can’t afford rent in this town on minimum wage, so you should vote for a Democrat to enact higher minimum wage laws.

Let’s unpack this as follows:

Cause: Poor people can’t afford rent in this town on minimum wage

Means: vote for a Democrat to enact higher minimum wage laws

Ends: Unstated! But we can infer that the claim is that higher minimum wage laws create a world in which poor people can afford rent in this town.

I used this example specifically because it’s a common argument, and the ends are unclear. I have noticed that the majority of political movements and ideologies have very vague Ends. This is in part due to how difficult it is to predict the future and the outcomes of different government policies. But part of it is to prevent disagreements from occurring regarding the Ends.

So you may be in an argument with someone who agrees with you about the Cause, agrees about the Ends, but disagrees about the Means, or any other combination.

I’ve been in political movements where everyone allied with a Cause, but then it fell apart from infighting about Means, and we barely even touched Ends! It’s very rare that we meet people who match our beliefs on all three components.

Another element of this is that our minds create meaning by finding patterns and matching those patterns to categories which we then label. We use those labels often without realizing how fuzzy the borders of our categories are, and then sometimes we argue about labels instead of realizing that we’re arguing about where the boundaries of the categories are, or whether a Thing belongs in the category or not.

Here are some examples to illustrate what I mean. When I was part of Occupy Wall Street, we surveyed the attendees to determine what causes people agreed with. We found a huge array of different causes brought us together, but a few stood out as central issues. Then we attempted to discuss solutions and quickly realized how many different worldviews were held by our group members. Eventually, the fundamental disagreements were insurmountable, and the movement fizzled.

I advocate for being as clear as possible in your arguments, both for good rhetorical reasons and for signaling reasons. If you’re trying to change the world, you want to attract like-minded people who agree with you on as many portions of the three components as possible. This isn’t usually possible, and getting work done is ultimately more important than agreeing on what work needs to be done. And I don’t think people usually talk about Ends in anything other than vague generalities like “making the world a better place.” However, successful movements have centered around visionary leaders. So to attract the right people to your cause you should be expressing your vision of the future (Ends) and advocating the actions (Means) most likely to get us there while being open to disagreement about the prioritization of Causes.




Information empathy

July 30, 2019 - 04:32
Published on July 30, 2019 1:32 AM UTC

Epistemic status: Giving a name to a basic skill that some people exhibit and others don't, because I think that it's useful to keep track of who exhibits it and that giving it a name will help with this. See also: Historian's Fallacy, Theory of Mind (both Wikipedia links).

The experiment I'm about to describe may be apocryphal (links, anyone?), but it illustrates my point nicely.

In the experiment, a child is shown a box with a "trick" lid, such that when the lid is on the box it appears that there is a cookie inside, but when the lid is removed you can see that it is in fact empty. First, with the lid on the box, the child is asked what is inside. "A cookie," they answer. Then the lid is removed, and again the child is asked what is inside. "Nothing," they say. Finally, the lid is replaced, and the experimenter tells the child that they are going to show the box to another child, who has not yet seen the box with the lid off.

"What will the other child think is in the box?" the experimenter asks.

At this point, things can go one of two ways. Children beyond a certain developmental stage will say "she'll say there's a cookie", while children who have not yet reached this stage will say "she'll say it's empty".

I call this skill Information Empathy, and from what I can tell, the median age at which children develop it is 35.

Jokes aside, I feel like there have been actual occasions where somebody has been angry at me after I did X, where the primary reason for their anger seems to be neither that X happened to them nor that my doing X was careless, but rather that I did X knowing full well what the consequences were... even when they could see that that was not true.

Maybe this term will be useful for some of you.




Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on?

July 30, 2019 - 01:59
Published on July 29, 2019 10:59 PM UTC

(Or, is coordination easier in a long timeline?)

It seems like it would be good if the world could coordinate to not build AGI. That is, at some point in the future, some number of teams will have the technical ability to build and deploy an AGI, but they all agree to voluntarily delay (perhaps on penalty of sanctions) until they’re confident that humanity knows how to align such a system.

Currently, this kind of coordination seems like a pretty implausible state of affairs. But I want to know if it seems like it becomes more or less plausible as time passes.

The following is my initial thinking in this area. I don’t know the relative importance of the factors that I listed, and there’s lots that I don’t understand about each of them. I would be glad for…

  • Additional relevant factors.
  • Arguments that some factor is much more important than the others.
  • Corrections, clarifications, or counterarguments to any of this.
  • Other answers to the question, that ignore my thoughts entirely.
If coordination gets harder over time, that’s probably because...
  • Compute increases make developing and/or running an AGI cheaper. The most obvious consideration is that the cost of computing falls each year. If one of the bottlenecks for an AGI project is having large amounts of compute, then “having access to sufficient compute” is a gatekeeper criterion on who can build AGI. As the cost of computing continues to fall, more groups will be able to run AGI projects. The more people who can build an AGI, the harder it becomes to coordinate all of them into not deploying it.
    • Note that it is unclear to what degree there is currently, or will be, a hardware overhang. If someone in 2019 could already run an AGI on only $10,000 worth of AWS, if only they knew how, then the cost of compute is not relevant to the question of coordination.
  • The number of relevant actors increases. If someone builds an AGI in the next year, I am reasonably confident that that someone will be DeepMind. I expect that in 15 years, if I knew that AGI would be developed one year from then, it would be much less overdetermined which group is going to build it, because there will be many more well-funded AI teams with top talent, and, most likely, none of them will have as strong a lead as DeepMind currently appears to have.
    • This consideration suggests that coordination gets harder over time. However, this depends heavily on other factors (like how accepted AI safety memes are) that determine how easily DeepMind could coordinate internally.
If coordination gets easier over time, that’s probably because…
  • AI safety memes become more and more pervasive and generally accepted. It seems that coordination is easier in worlds where it is uncontroversial and common knowledge that an unaligned AGI poses an existential risk, because everyone agrees that they will lose big if anyone builds an AGI.
    • Over the past 15 years, the key arguments of AI safety have gone from being extremely fringe to a reasonably regarded (if somewhat controversial) position, well inside the Overton window. Will this process continue? Will it be commonly accepted by ML researchers in 2030 that advanced AI poses an existential threat? Will it be commonly accepted by the leaders of nation-states?
    • What will the perception of safety be in a world where there is another AGI winter? Suppose that narrow ML proves to be extremely useful in a large number of fields, but there’s lots of hype about AGI being right around the corner, then that bubble bursts, and there is broad disinterest in AGI again. What happens to the perception of AI safety? Is there a sense of “It looks like AI Alignment wasn’t important after all”? How cautious will researchers be in developing new AI technologies?
  • [Partial subpoint to the above consideration] Individual AI teams develop more serious info-security-conscious processes. If some team in DeepMind discovered AGI today, and the DeepMind leadership opted to wait to ensure safety before deploying it, I don’t know how long it would be until some relevant employees left to build AGI on their own, or some other group (such as a state actor) stole their technology and deployed it.
    • I don’t know if this is getting better or worse over time.
  • The technologies for maintaining surveillance of would-be AGI developers improve. Coordination is made easier by technologies that aid in enforcement. If surveillance technology improves, that seems like it would make coordination easier. As a special case, highly reliable lie detection or mind reading technologies would be a game-changer for making coordination easier.
    • Is there a reason to think that offense will beat defense in this area? Surveillance could get harder over time if the technology for detecting and defeating surveillance outpaces the technology for surveilling.
  • Security technology improves. Similarly, improvements in computer security (and traditional info security) would make it easier for actors to voluntarily delay deploying advanced AI technologies, because they could trust that their competitors (other companies and other nations) wouldn’t be able to steal their work.
    • I don’t know if this is plausible at all. My impression is that the weak point of all security systems is the people involved. What sort of advancements would make the human part of a security system more reliable?



Toy model piece #1: partial preferences revisited

July 29, 2019 - 19:35
Published on July 29, 2019 4:35 PM UTC

I'm working towards a toy model that will illustrate all the steps in the research agenda. It will start with some algorithmic stand-in for the "human", and proceed to create the U_H, following all the steps in that research agenda. So I'll be posting a series of "toy model pieces", that will then be ultimately combined in a full toy model. Along the way, I hope to get a better understanding of how to do the research agenda in practice, and maybe even modify that agenda based on insights from making the toy model.

In this post, I'll revisit and re-formalise partial preferences, and then transform them into utility functions.

The problem with the old definition

My previous model of partial preferences can't capture some very simple mental models, such as P="the more people smile, the better the world is".

This is because the partial preference decomposes the space of worlds locally as Y×Z, fixes two values y− and y+ in Y, and only compares worlds of type (y−,z) and (y+,z) for fixed z. This means that we can only compare worlds with the same z value, and only two of these worlds can be compared: so we can't say w1<w2<w3 for three distinct worlds. Thus we can't say that three people smiling is better than two, which is better than one. Not being able to capture preferences like P is a major flaw.

New definition: preorder

So now we model a partial preference as a preorder. A preorder ≤ is a type of ordering that is transitive (if w1≤w2 and w2≤w3, then w1≤w3) and reflexive (w≤w for all worlds w).

The previous type of partial preference can be made into a preorder quite easily: w1<w2 implies w1≤w2, and add w≤w for all worlds w.

Now we can easily express P="the more people smile, the better the world is". Let w(n,v) be a world with n smiling people in it, with v representing all the other relevant variables describing the world. Then P is described by the preorder:

  • w(n,v)≤w(m,v′) if and only if v=v′ and n≤m.

For a general preorder ≤, define w<w′ to mean w≤w′ but it not being the case that w′≤w.
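
As a minimal sketch of the preorder for P, assuming a world can be represented by a pair (n, v): the names `World`, `leq` and `lt` below are hypothetical and only illustrate the definitions above.

```python
from typing import Hashable, NamedTuple


class World(NamedTuple):
    n: int        # number of smiling people
    v: Hashable   # all other relevant variables, treated as one opaque value


def leq(a: World, b: World) -> bool:
    """a ≤ b under P: same background v, and a has no more smilers than b."""
    return a.v == b.v and a.n <= b.n


def lt(a: World, b: World) -> bool:
    """Strict version: a ≤ b holds but b ≤ a does not."""
    return leq(a, b) and not leq(b, a)
```

Worlds with different v are simply incomparable: both `leq(a, b)` and `leq(b, a)` return False for them.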

Circular preferences and utility functions

Unfortunately, unlike the previous partial preferences, preorders can allow for circular preferences w1≤w2≤w3≤w1. In practice, most sensible partial preferences will not have circular preferences, and will instead resemble P: just a collection of orderings among separate sets of worlds.

But it might be possible to have circular partial preferences, maybe of the type "in Australia, the cities get nicer as you go clockwise along the coast".

So you need a way of dealing with circular preferences, and with complicated sets of partial preferences that might include some circular preferences.

We also want a way to transform all of these preorders into a full preference, given as a utility function over all worlds. The research agenda calls for aggregating similar preferences before making them into full preferences, but we still need some way of doing this in the cases where we have a single partial preference. The rest of this section will show one way of doing this.

The sensible case

In the simplest case, we'd have a partial preference such as "these ten worlds are good, in this order", and we'd map them to a utility function with equal spaces between each world. And we wouldn't say these ten worlds were all better or all worse than the other worlds not mentioned.

And we can do that, in the most likely and sensible case. Take P and its preorder ≤ (and <): under this partial preference, the worlds decompose into simple ordered chains. That means that if W is the set of worlds, then it decomposes as a disjoint union W = ⋃_{i∈I} W_i (for some indexing set I). These sets are incomparable: if w_i ∈ W_i and w_j ∈ W_j with i ≠ j, then neither w_i ≤ w_j nor w_j ≤ w_i.

Moreover, each of these W_i is totally ordered by <: so we can index any w_i ∈ W_i by some natural number k, as w_i^k, and say that w_i^k < w_i^l if and only if k < l. Let n_i = ||W_i|| − 1 be the size of W_i minus one, and order the elements of W_i from 0 upwards: so the set is ordered as w_i^0 < w_i^1 < … < w_i^{n_i}.

Then here is how we construct a utility function U that extends the partial order:

  • For all i ∈ I and k ∈ N with k ≤ n_i, set U(w_i^k) to 2k − n_i.

This means that if w is incomparable with all other worlds (ie, is not relevant to the partial preference), then U(w) = 0, and that all chains W_i have utilities U(w_i^0) = −n_i, U(w_i^1) = −n_i + 2, …, U(w_i^{n_i}) = n_i. So they are symmetric around 0.
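
The construction above can be sketched in a few lines, assuming each chain is given as a list ordered from worst to best. The function name `chain_utilities` and the list-of-lists input are illustrative assumptions, not part of the original construction.

```python
def chain_utilities(chains):
    """Map every world to 2k - n_i, where k is its position in its chain.

    `chains` is a list of lists; each inner list is one chain, ordered
    from the least preferred world to the most preferred one.
    """
    utility = {}
    for chain in chains:
        n_i = len(chain) - 1           # size of the chain minus one
        for k, world in enumerate(chain):
            utility[world] = 2 * k - n_i
    return utility
```

For example, `chain_utilities([["a", "b", "c"], ["d"]])` gives utilities −2, 0, 2 to the three-world chain and 0 to the isolated world "d".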

The general case

Here I'll give a way of generalising the above procedure to the case of any preorder. Note that this situation should only come up very rarely, since most preorders derived from mental models will be more sensible. Still, for completeness, here is a process that extends the simple model:

First, to get rid of circular preferences, project the set of worlds W to W̄, by using the equivalence relation ≅, where w ≅ w′ means w ≤ w′ and w′ ≤ w. Call this projection p. The preorder on W descends via p to a partial order. So we now work in W̄, which has no circular preferences. Then if we assign a utility U(w̄) to w̄ ∈ W̄, we can extend this to W by setting U(w) = U(p(w)).

Now working in W̄, define a link between two (equivalence classes of) worlds w and w′. Write w←w′ to say that w < w′, and that there does not exist any world v ∈ W̄ with w < v < w′.

Now, decompose W̄ as a collection of disjoint sets ⋃_{i∈I} W_i (for some indexing set I). Two worlds w and w′ are in the same set W_i if you can get from one to the other by following links; ie if there exist worlds w_i^k with w = w_i^0 and w′ = w_i^l and, for all k < l, either w_i^k ← w_i^{k+1} or w_i^{k+1} ← w_i^k.
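
Here is a rough sketch of the three steps just described (projection to equivalence classes, links, and the decomposition into linked sets) for a finite set of worlds. It assumes the preorder is supplied as a function `leq(w, w2)` that is already reflexive and transitive; the names `quotient`, `links` and `components` are hypothetical.

```python
def quotient(worlds, leq):
    """Group worlds into classes: w ≅ w' iff leq(w, w') and leq(w', w)."""
    classes = []
    for w in worlds:
        for cls in classes:
            rep = cls[0]
            if leq(w, rep) and leq(rep, w):
                cls.append(w)
                break
        else:
            classes.append([w])
    return [frozenset(cls) for cls in classes]


def links(classes, leq):
    """Pairs (a, b) with a < b and no class strictly between them."""
    rep = {c: next(iter(c)) for c in classes}  # any representative works

    def lt(a, b):
        return leq(rep[a], rep[b]) and not leq(rep[b], rep[a])

    return [(a, b) for a in classes for b in classes
            if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in classes)]


def components(classes, link_list):
    """Sets of classes reachable from one another by following links."""
    parent = {c: c for c in classes}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    for a, b in link_list:
        parent[find(a)] = find(b)
    groups = {}
    for c in classes:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())
```

With worlds a, b, c forming a cycle that all sit below d, plus an unrelated world e, the quotient has three classes, there is a single link from the class {a, b, c} to {d}, and the decomposition has two pieces: one containing those two classes and one containing {e} alone.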

Let n_i be the number of links in W_i. We'll now assign utility to elements of W_i, as a constrained optimisation process; in the following, all worlds are assumed to be in W_i:

  • minimise ∑_{w←w′} ||U(w′)−U(w)||², subject to the constraints that:
  • ∑_{w←w′} ||U(w′)−U(w)|| = 2n_i,
  • ∑_w U(w) = 0,
  • if w < w′, then U(w) ≤ U(w′).

It's not hard to see that this extends the simple model above, which has ||U(w′)−U(w)|| = 2 for all w←w′.
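
To spell out why the optimisation reduces to the simple model, here is a short derivation for a single chain with n = n_i ≥ 1 links. The shorthand d_k for the utility gap across the k-th link is introduced here for illustration only.

```latex
% Single chain w^0 < w^1 < ... < w^n, with n = n_i links.
% d_k is a hypothetical shorthand for the gap across the k-th link;
% d_k >= 0 follows from the order constraint, so ||.|| is just d_k.
\begin{align*}
d_k &:= U(w^{k+1}) - U(w^{k}) \ge 0, \qquad k = 0,\dots,n-1, \\
\sum_{k=0}^{n-1} d_k &= 2n \quad\text{(normalisation constraint)}, \\
\Big(\sum_{k=0}^{n-1} d_k\Big)^{2} &\le n \sum_{k=0}^{n-1} d_k^{2}
  \quad\text{(Cauchy--Schwarz)}, \\
\Rightarrow\ \sum_{k=0}^{n-1} d_k^{2} &\ge \frac{(2n)^{2}}{n} = 4n,
  \quad\text{with equality iff } d_k = 2 \text{ for all } k, \\
\sum_{k=0}^{n} U(w^{k}) &= (n+1)\,U(w^{0}) + 2\sum_{k=0}^{n} k
  = (n+1)\big(U(w^{0}) + n\big) = 0, \\
\Rightarrow\ U(w^{k}) &= U(w^{0}) + 2k = 2k - n .
\end{align*}
```

So on a chain the minimiser has every gap equal to 2, and the zero-sum constraint then pins the utilities to 2k − n_i, matching the sensible case.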

The final version of partial preferences

Is this the final version of partial preferences? No, certainly not. But to get a better generalisation, we're going to have to look at how people actually model things inside their brains and thought processes. Hence the question of how best to model partial preferences will be an empirical one. But this very general definition will suffice for the moment.




Mapping of enneagram to MTG personality types

July 29, 2019 - 18:20
Published on July 29, 2019 3:20 PM UTC

For fun, Parina and I mapped enneagram types to a pair of Magic: The Gathering colors. (Following in the great tradition of https://medium.com/s/story/the-mtg-color-wheel-c9700a7cf36d) For each color pair, there’s no ordering; think of it as a set.

Without further ado:

1 Reformer White red

2 Helper Black green

3 Achiever Black white

4 Individualist Blue green

5 Investigator Blue red

6 Loyalist Black blue / blue white

7 Enthusiast Red green

8 Challenger Black red

9 Peacemaker Green white

Let me know if this helps or resonates with you.




What woo to read?

July 29, 2019 - 18:19
Published on July 29, 2019 3:19 PM UTC

Recently, I've heard more discussion of chakras, energy, and similar ideas in rationalist and rationalist-adjacent circles.

I don't really know where this comes from. So I thought I'd ask, what "woo" do you all find useful?

I'm interested in:

  • pointers to explanations of woo
  • explicit explanations of why woo is useful for rationalist types



Is there neuroscience research on cognitive biases?

July 29, 2019 - 10:45
Published on July 29, 2019 7:45 AM UTC

I recently watched Neuralink's presentation, and wondered how something like that could help us reason.

The obvious way is an AI that can reason and is connected to our brains, so that we reason together with it.

But another direction I thought of is just helping us notice when we're using motivated cognition and letting cognitive biases take hold.

Another thing I thought of was reducing akrasia: can it help us win the fight between areas of the brain in an akratic situation?

With my very limited knowledge of the subjects at hand, it seems like it would be an easier target: it seems to me you'd need fewer electrodes, less understanding of how reasoning works, and simpler software.

Although the questions about the possibilities of this technology are interesting, I don't expect anything more than guesses and predictions to be available right now.

So my question is whether we have neurological knowledge about how these mechanisms work.




What is our evidence that Bayesian Rationality makes people's lives significantly better?

July 29, 2019 - 02:20
Published on July 28, 2019 11:20 PM UTC

Anecdotally, it has streamlined my thinking process exponentially, and made me more self-aware. However, most proponents of most belief systems will make similar claims.

Our general technique of using science to arrive at the truth is inarguably valuable, but that's not unique to us.

What evidence can I show to a non-Rationalist that our particular movement (i.e. our particular techniques for overcoming biases, studying decision theory, applying Bayesianism, learning CBT techniques, etc.) is valuable for making their lives significantly better?




What supplements do you use?

July 28, 2019 - 20:01
Published on July 28, 2019 5:01 PM UTC

Cross-posted to my personal blog.

There's a rationalist tradition of thinking carefully about supplements & nootropics: 1, 2, 3

However, I haven't seen a record of what supplement regimens people end up using in practice.

I've been fooling around with supplement stacks for a few years now and feel pretty good about my current regimen (outlined below), but want to see if there's any low-hanging fruit I've missed.

So I'm curious... what supplements do you use? (Reply below, or shoot me an email)

Also curious to hear about obvious mistakes and/or oddities in my current stack.

My supplement regimen

Disclaimer: I'm not a doctor, this isn't medical advice, etc. etc.

Background: 27 years old, biological male, ~200 lbs, BMI of 25, pescatarian

Supplements I take:


Supplements I'm considering:

  • Apparently Tyrosine boosts cognitive performance during stress. There are noticeable subjective effects when I take 1.0 - 2.0 g on an empty stomach. I tried some recently and enjoyed it. It's safe, cheap, and legal so I may start using it regularly to complement my caffeine use.
  • Following this post, I may start taking Vitamin K2 in the morning to complement my Vitamin D3 supplementation.
  • Romeo has told me that Choline is a good daily supplement (apparently many people are deficient & deficiency is associated with depression). I haven't poked the literature on this yet.

Supplements I don't take any more:

  • I used to take fish oil daily. Gwern likes it, but I didn't notice any effect & was scared off by the potential negative effects. Also I eat sushi sometimes which probably does some of the work fish oil would do.
  • I used to take a small aspirin daily but stopped after a large RCT found that the downside risk probably negates the benefit.
  • For a while I took lithium to boost mood, but stopped after my replication of Gwern's RCT failed to show an effect.



Compilers/PLs book recommendation?

July 28, 2019 - 18:49
Published on July 28, 2019 3:49 PM UTC

I'm interested in:

  • Data structures for representing programs
  • Algorithms for translating between those representations
  • Analysis & manipulations of each representation; questions which are natural to ask/answer in each

Most books I can find on compilers/PLs tend to spend most of their time on the text representation (and algorithms for translating programs out of text, i.e. parsing) and the machine-code representation (and algorithms for translating programs into machine code). For purposes of this question, I'm not particularly interested in either of these representations - they're not very natural data structures for representing programs, and we mostly use them because we have to.

I'm interested mainly in high-level "intermediate" representations. In terms of applications, I'm more interested in reasoning about the code (e.g. control flow analysis, type checking) and manipulating the code (e.g. automated parallelization, high-level machine-independent optimization methods) than in translating between any particular source/target format.

Questions:

  • Does anybody know of a good book (or other source) that focuses on high-level "intermediate" representations of programs?
  • Is there some other question I should be asking, e.g. a different term to search for?



Arguments for the existence of qualia

July 28, 2019 - 13:52
Published on July 28, 2019 10:52 AM UTC

One trap that we must be wary of is adopting beliefs because they are popular among people who strive to think critically or scientifically, as opposed to being the result of critical or scientific thinking. One good example of this is RationalWiki, which purports to report the political beliefs that all Rational(TM) people should hold. Similarly, I believe that most people who believe in materialism do so for extremely poor reasons and without knowledge of some of the stronger arguments for qualia existing. Maybe we should ultimately support materialism, but I get the impression that many people jump to this conclusion too quickly. Here are some points I believe people should consider first.

(I know someone else in the rationality community wrote a post arguing for consciousness recently and I was meaning to read it, but I lost the link before I had a chance)

Failure to bite the bullet argument

If qualia don't exist, why is anything that you experience good or bad? In this case, things like pleasure, pain and meaning are nothing more than ways that particles can combine. But if this is the case, why are these combinations special or more important than other combinations? Why should we try to make certain combinations happen and certain combinations not?

It's fairly common to deny the existence of the objectively good or bad, but denying the existence of the subjectively good or bad is a much stronger claim. But everyone acts like this matters and so it appears somewhat hypocritical. And there is an argument that we should continue to act normally on the basis of meta-theoretic uncertainty, but no-one makes that argument.

Pascal's Wager Argument

This last point is actually a pretty strong argument for believing in qualia. If they don't exist, nothing matters, but if they do exist then we benefit from acting as though they do exist. Therefore, we should assume the latter.

Expected Evidence Argument

Claiming the existence of qualia is often seen as anti-scientific; some people would even go as far as to say that they don't see much difference between claiming the existence of ghosts and claiming the existence of qualia. One key difference is that if ghosts existed we would expect objective evidence of them, even if the experiments would be hard to run. For example, we would expect a greater rate of howling in houses where someone was murdered. Even if they could only interact with us psychically, we would expect a higher rate of mental illness in these houses, even if the person living there had no idea of the past. Since we would expect objective evidence if ghosts existed, the lack of any such evidence counts against them.

On the other hand, it's not so clear that we should expect any objective evidence of subjective experience. Arguably, the only evidence we should expect of subjective experience is direct, subjective evidence.

Initial Foundations Argument

How should we come to understand the world? It seems like we might want to first pick a class of phenomena to be the foundation and that this should be whatever we are most certain of existing. We will then need to decide on the best way of knowing about that phenomenon and then choose a way of figuring out what other kinds of things might exist in the universe.

So what should our initial foundations be? One option is the external physical world, while the other is subjective experience. The latter makes more sense to me as it describes why we believe in an external world. It isn't that we just assume it a priori, but instead that we notice patterns in our subjective experience and then theorise that there might be some object that exists independently of our experience causing these regularities. On the other hand, subjective experience makes much more sense to assume a priori and hence more sense as an initial foundation. Indeed, we could even say that this approach is truer to the scientific method since we are concluding even the existence of the external world empirically. In other words, objective experience needs to be justified in terms of subjective experience and not the other way round.

Transcendence Argument

Arguably the nature of an atom (or whatever elementary particle we choose) transcends its mere mathematical description. Firstly, the claim that "THIS IS ALL THAT THERE IS TO IT" seems like a strong claim and one which we can never know. Surely, it is much more reasonable to maintain that there is at least the possibility of there being something in its nature beyond this. Indeed, if there were not, this would seem to imply that a perfect simulation of an atom is an atom and this seems absurd.

Following this reasoning, why can't there be an element of consciousness that transcends its mere mathematical description? And if a simulation of an atom is not automatically an atom, then perhaps a simulation of consciousness isn't automatically conscious?

Relabeling Argument

Let's suppose I have a system with a variable x which takes values between 0 and 10. Suppose we define a second variable y which is also between 0 and 10 which satisfies x+y=10.

Is this a different system than the original? It seems this comes down to whether y is a new entity or just a relabelling of x. The one thing that I would expect to be uncontroversial here is that it is possible for this new system to just be a relabelling. Whether or not it is necessarily just a relabelling would be much more contentious.

If we were to say that sometimes introducing a variable like that could be more than just a relabelling, then that would be to accept that objects can have a nature that is not fully encapsulated by their mathematical definition, as I previously argued.

On the other hand, if circumstances like this are always just a relabelling, then there is no difference between the system with y in it and the system without y in it. This becomes important when y is a much more complicated property, such as the amount of "pain" an organism is experiencing. If the system is the same with the entity representing pain or without this entity, then it seems like it can't have been important. This implies that someone insisting it was just a relabelling must then bite the bullet of qualia being unimportant.

Against the Illusion Argument

Some people say consciousness or qualia are just illusions. At best, this seems like a really bad analogy. For example, if I think I see an oasis, but it is actually an illusion, then it is the oasis that is illusory and not my experience of seeing. In other words, if an experience is an illusion, then we still have the experience of seeing that illusion. And any discussion of illusions where the illusion isn't experienced seems to be a very misleading way of using that term.

Arguments against consciousness

I'll finish by noting that there are some very strong arguments against consciousness too. For quite a while I felt that the epiphenomenal theory was the most plausible, but there are two devastating critiques. The first is the evolutionary argument: it seems absurd that positive qualia would line up with events that are evolutionarily advantageous and negative qualia would line up with events that are evolutionarily disadvantageous if qualia have no causal mechanism to impact evolution. And the second is that it sure looks like qualia have a causal impact, since we are discussing them right now. So to believe in the epiphenomenal theory is to believe in qualia for a reason completely independent of us actually having qualia.

These are very strong arguments, but I nonetheless worry about closing the door on this debate too quickly as there are quite possibly theories that we haven't considered yet, especially when there appear to be very strong arguments for qualia as well.




Applying Overoptimization to Selection vs. Control (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 3)

July 28, 2019 - 12:32
Published on July 28, 2019 9:32 AM UTC


Clarifying Thoughts on Optimizing and Goodhart Effects - Part 3

Previous Posts: Re-introducing Selection vs Control for Optimization, What does Optimization Mean, Again?

Following the previous two posts, I'm going to try to first lay out the way Goodhart's Law applies in the earlier example of rockets, then try to explain why this differs between selection and control. (Note: Adversarial Goodhart isn't explored, because we want to keep the setting sufficiently simple.) This sets up the next post, which will discuss Mesa-Optimizers.

Revisiting Selection vs. Control Systems

Basically everything in the earlier post that used the example process of rocket design and launching is susceptible to some form of overoptimization, in different ways. Interestingly, there seem to be clear places where different types of overoptimization are important. Before looking at this, I want to revisit the selection-control dichotomy from a new angle.

In a (pure) control system, we cannot sample datapoints without navigating to them. If the agent is an embedded agent, and has sufficient span of control to cause changes in the environment, we cannot necessarily reset and try over. In a selection system, we only sample points in ways that do not affect the larger system. Even when designing a rocket, our very expensive testing has approximately no longer term effects. (We'll leave space debris from failures aside, but get back to it below.)

This explains why we potentially care about control systems more than selection systems. It also points to why Oracles are supposed to be safer than other AIs - they can't directly impact anything, so their output is done in a pure selection framework. Of course, if they are sufficiently powerful, and are relied on, the changes made become irreversible, which is why Oracles are not a clear solution to AI safety.

Goodhart in Selection vs. Control Systems

Regressional and Extremal Goodhart are particularly pernicious for selection, and potentially less worrying for control. Regressional Goodhart is always present if we are insufficiently aware of our goals, but in general Causal Goodhart failures seem more critical in control, because it is often narrower. To keep this concrete, I'll go through the classes of failure, and note how they could occur at each stage of rocket design. To do so, we need to clarify goals at each stage. Our goal in stage 1 is to find a class of designs and paths to optimize. In stage 2, we build, test, and refine a system. In many ways, this stage is intended to circumvent Goodhart failures, but testing does not always address extremal cases, so our design may still fail.

Regressional Goodhart hits us if we have any divergence between our metric and our actual goal. For example, in stages 1 and 2, finding an ideal complex or chaotic path that is dependent on exact positions of planets in a multibody system would be bad, or a path involving excessive G-forces or other dangerous things might be more fuel efficient than a simpler path. For example, a gravitational slingshot around the sun might be cheap, but fry or crush the astronauts. Alternatively, a design with a shape that does not allow people to fit inside might be found when optimizing. Each of these impacts goals potentially not included in the model. Regressional Goodhart is less common in control for this case, since we kept the mesa-optimizer limited to optimizing a very narrow goal already chosen by the design-optimization.

Extremal Goodhart is always a model failure. It can be because the model is insufficiently accurate (Model Insufficiency), or because there is a regime change. Regime changes seem particularly challenging in systems that design mesa-optimizers, since I think the mesa-optimization is narrower in some way than the global optimizer (if not, it's more efficient to have an executing system rather than a mesa-optimizer).

Causal Goodhart is by default about an irreversible change. In selection systems, it means that our sampling accidentally broke the distribution. For example, we test many rockets, creating enough space debris to make further tests vulnerable to collisions. We wanted the tests to sample from the space, but we accidentally changed the regime while sampling.

In the current discussion, we care about metric-goal divergence because the cost of the divergence is high: typically, once we get there, some irreversible consequence happens, as explained above. This isn't exclusively true of control systems, as the Causal Goodhart example shows, but it's clearly more common in such systems. Once we're actually navigating and controlling the system, we don't have any way to reset to base conditions, and causal changes create regime changes. If these are unexpected, the control system is suddenly in the position of optimizing using an irrelevant model.

And this is a critical fact, because as I'll argue in the next post, mesa-optimizers are control systems of a specific type, and have some new overoptimization failure modes because of that.




What does Optimization Mean, Again? (Optimizing and Goodhart Effects - Clarifying Thoughts, Part 2)

July 28, 2019 - 12:30
Published on July 28, 2019 9:30 AM UTC


Clarifying Thoughts on Optimizing and Goodhart Effects - Part 2

Previous Post: Re-introducing Selection vs Control for Optimization. In that post, I reviewed Abram's selection/control distinction, and suggested how it relates to actual design. I then argued that there is a bit of a continuum between the two cases, and that we should add an additional extreme case to the typology: direct solution.

Here, I will revisit the question of what optimization means.

NOTE: This is not completely new content, and is instead split off from the previous version and rewritten to include an (Added) discussion of Eliezer's definition for measuring optimization power, from 2008. Hopefully this will make the sequence clearer for future readers.

In the next post, Applying over-Optimization in Selection and Control, I apply these ideas, and concretize the discussion a bit more before moving on to discussing Mesa-Optimizers in Part 4.

What does Optimization Mean, Again?

This question has been discussed a bit, but I still don't think it's clear. So I want to start by revisiting a post Eliezer wrote in 2008, where he suggested that optimization power was the ability to select states from a preference ordering over different states, and could be measured with entropy. He notes that this is not computable, but gives us insight. I agree, except that I think that the notion of the state space is difficult, for some of the reasons Scott discussed when he mentioned that he was confused about the relationship between gradient descent and Goodhart's law. In doing so, Scott proposed a naive model that looks very similar to Eliezer's:

simple proxy of "sample points until I get one with a large U value" or "sample n points, and [select] the one with the largest U value" when I think about what it means to optimize something for U. I might even say something like "
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main 
Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face 
{font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} n bits of optimization" to refer to sampling 2n points. I think this is not a very good proxy for what most forms of optimization look like."

I want to start by noting that this is absolutely and completely a "selection" type of optimization, in Abram's terms. As Scott noted, however, it's not a good model for what most optimization looks like, and that's part of why I think Eliezer's model is less helpful than I did when I originally read it.

There's a much better model for gradient descent optimization, which is... gradient descent. It is a bit closer to control than direct optimization, since in some sense we're navigating through the space, but for almost all actual applications, it is still selection, not control. To review how it works: points are chosen iteratively, and the gradient is assessed at each point. The gradient is used to select the next point (perhaps in a very clever, dynamically chosen way). A stopping criterion is checked, and if it isn't met, the process iterates from that new point. This is almost always tons more efficient than generating random points and examining them.
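
To make the comparison concrete, here is a minimal sketch on a toy two-dimensional objective. The objective, the fixed step size, the tolerance, and the search bounds are all assumptions picked for illustration; nothing here comes from a particular optimizer.

```python
import random

def f(x, y):
    # Toy objective (an assumption for illustration): a bowl with its minimum at (3, -2).
    return (x - 3.0) ** 2 + (y + 2.0) ** 2

def grad_f(x, y):
    # Analytic gradient of the toy objective.
    return 2.0 * (x - 3.0), 2.0 * (y + 2.0)

def gradient_descent(x, y, step=0.1, tol=1e-6, max_iters=10_000):
    """Iterate: assess the gradient, check a stopping criterion, pick the next point."""
    evaluations = 0
    for _ in range(max_iters):
        gx, gy = grad_f(x, y)
        evaluations += 1
        if (gx * gx + gy * gy) ** 0.5 < tol:  # stopping criterion: gradient is tiny
            break
        # Fixed step size here; cleverer methods choose it dynamically.
        x, y = x - step * gx, y - step * gy
    return (x, y), evaluations

def random_search(n_points, seed=0, low=-10.0, high=10.0):
    """Pure selection: sample points at random and keep the best one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_points):
        p = (rng.uniform(low, high), rng.uniform(low, high))
        if best is None or f(*p) < f(*best):
            best = p
    return best

gd_point, gd_evals = gradient_descent(0.0, 0.0)
rs_point = random_search(gd_evals)  # same evaluation budget as gradient descent
print(gd_evals, f(*gd_point), f(*rs_point))
```

With the same number of evaluations, the gradient-following run lands essentially on the minimum, while the random sampler is only as close as its luckiest draw.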

(Added) It's far better than a grid search, usually, for most landscapes, but also makes it clear why I think it's hard to discuss optimization power in Eliezer's terms on a practical level, at least when dealing with a continuous system. The problem I'm alluding to is that any list of preferences over states depends on the number of states. Gradient descent type optimization is really good at focusing on specific sections of the state space, especially compared to grid search. We might find a state where grid search would require a tremendously high resolution, but we don't ever compute a preference ordering over 2^n states. With gradient descent, we instead compute preferences for a local area and (hopefully) zoom in, potentially ignoring other parts of the space. An optimizer that focuses very narrowly can have high resolution but miss the non-adjacent region with far better outcomes, or can have fairly low resolution but perform far better - and the second optimizer is clearly more powerful, but I don't know how to capture this.

But to return to the main discussion, the process of gradient descent is also somewhere between selection and control - and that's what I want to explain.

In theory, the evaluation of each point in the test space could involve an actual check of the system. I build each rocket and watch to see whether it fails or succeeds according to my metric. For search, I'd just pick the best performers, and for more clever approaches, I can do something like find a gradient by judging the performance of parameters to see whether increasing or decreasing those that are amenable to improvement would help. (I can be even more inefficient, but find something more like a gradient, by building many similar rockets, each an epsilon away in several dimensions, and estimating a gradient that way. Shudder.)
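
The "many similar rockets, each an epsilon away" approach is a finite-difference gradient estimate. A rough sketch follows, assuming a central-difference scheme (two evaluations per dimension) and a made-up `rocket_score` function standing in for building and testing a real rocket; both are illustrative assumptions.

```python
def estimate_gradient(evaluate, params, eps=1e-4):
    """Central-difference gradient estimate: two evaluations per dimension."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps      # a "similar rocket", nudged up in dimension i
        down = list(params)
        down[i] -= eps    # and another, nudged down in dimension i
        grad.append((evaluate(up) - evaluate(down)) / (2.0 * eps))
    return grad

def rocket_score(params):
    # Hypothetical stand-in for "build the rocket and measure how it did".
    thrust, mass = params
    return 2.0 * thrust - mass ** 2

g = estimate_gradient(rocket_score, [1.0, 3.0])
print(g)
```

Each gradient estimate costs two full evaluations per parameter, which is why doing this with physical rockets deserves the shudder.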

In practice, we use a proxy model - and this is one place that allows for the types of overoptimization misalignment we are discussing. (But it's not the only one.) The reason this occurs is laid out clearly in the Categorizing Goodhart paper as one of the two classes of extremal failure - either model insufficiency, or regime change. This also allows for causal failures (undetectable during simulation), if the proxy model gets a causal effect wrong.

Even without using a proxy model, we can be led astray by the results if we are not careful. Rockets might look great, even in practice, and only fail in untested scenarios because we optimized something too hard - extremal model insufficiency. (Lower weight is cheaper, and we didn't notice a specific structural weakness induced by ruthlessly eliminating weight on the structure.) For our purposes, we want to talk about things like "how much optimization pressure is being applied." This is difficult, and I think we're trying to fit incompatible conceptual models together rather than finding a good synthesis, but I have a few ideas on what selection pressure leading to extremal regions means here.

  • Extreme proxy values (in comparison to most of the space) seem similar to having lots of selection pressure. If we have an insanely tall and narrow peak, we may be finding something strange rather than simply improving.
  • Extreme input values (unboundedly large or small values) may indicate a worrying area vis-a-vis overoptimization failures.
  • Lots of search time alone does NOT indicate extremal results - it indicates lots of things about your domain, and perhaps the inefficiency of your search, but not overoptimization. (This is in contrast to the naive grid-search model, where lots of points visited means more optimizing.)
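
As a toy illustration of extremal model insufficiency, here is a sketch in which a proxy is fit only on a narrow, explored range of inputs and then optimized far outside it. The `true_value` function, the linear proxy, and the ranges are all invented for the example.

```python
def true_value(x):
    # Hypothetical "real" objective: improves with x at first, then collapses.
    return x - 0.1 * x ** 2  # peaks at x = 5

def fit_linear_proxy(xs):
    """Least-squares line through true_value sampled at xs (the explored regime)."""
    n = len(xs)
    ys = [true_value(x) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# The proxy only ever sees the region 0 <= x <= 2, where "more x" looks strictly better.
slope, intercept = fit_linear_proxy([0.0, 0.5, 1.0, 1.5, 2.0])

def proxy(x):
    return slope * x + intercept

candidates = [0.5 * i for i in range(41)]  # x from 0 to 20
x_proxy_best = max(candidates, key=proxy)
x_true_best = max(candidates, key=true_value)
print(x_proxy_best, true_value(x_proxy_best), x_true_best, true_value(x_true_best))
```

The proxy's favorite point is an extreme input with an extreme proxy value, and it sits far from anything the proxy was fit on, which is exactly the pattern the first two bullets flag.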

As an aside, Causal Goodhart is different. It doesn't really seem to rely on extremes, but rather on manipulating new variables, ones that could have an impact on our goal. This can happen because we change the value to a point where it changes the system, similar to extremal Goodhart, but does not need to. For instance, we might optimize filling a cup by getting the water level near the top. Extremal regime change failure might be overfilling the cup and having water spill everywhere. Causal failure might be moving the cup to a different point, say right next to a wall, in order to capture more water, but accidentally breaking the cup against the wall.

Notice that this doesn't require much optimization pressure - Causal Goodhart is about moving to a new region of the distribution of outcomes by (metaphorically or literally) breaking something in the causal structure, rather than by over-optimizing and pushing far from the points that have been explored.

This completes the discussion so far - and note that none of this is about control systems. That's because in a sense, most current examples don't optimize much, they simply execute an adaptive program.

One critical case of a control system optimizing is a mesa-optimizer, but that will be deferred until after the next post, which introduces some examples and intuitions around how Goodhart-failures occur in selection versus control systems.




Keeping Beliefs Cruxy

July 28, 2019 - 04:18
Published on July 28, 2019 1:18 AM UTC

You might want to doublecrux if either:

  • You're building a product and disagree about how to go about it.
  • You want to make your beliefs more accurate, and you think a particular person you disagree with is likely to have useful information for you.
  • You just... enjoy resolving disagreements in a way that mutually pursues truth for whatever reason.

Regardless, you might find yourself with the problem:

Doublecruxing takes a lot of time.

For a reasonably 'serious' disagreement, I think it frequently takes at least an hour, and often longer. Habryka and I once took 12 hours over the course of 3 days to make any kind of progress on a particularly gnarly disagreement. And sometimes disagreements can persist for years despite significant effort.

Now, doublecruxing is faster than many other forms of truth-aligned-disagreement resolution. I actually think it's helpful to think of doublecrux as "the fastest way for two disagreeing-but-honest-people to converge locally towards the truth", and if someone came up with a faster method, I'd recommend deprecating doublecrux in favor of that. (Meanwhile, doublecrux is not guaranteed to be faster for 3+ people to converge, but I still expect it to be faster for smallish groups with particularly confusing disagreements.)

Regardless, multiple hours is a long time. Can we do better?

I think the answer is yes, and it basically comes in the form of:

  • Practice finding your own cruxes
  • Practice helping other people find their cruxes
  • Develop metacognitive skills that make cruxfinding natural and intuitive
  • Cache the results into a clearer belief-network

I'd summarize all of that as "develop the skill and practice of keeping your beliefs cruxy."

By default, humans form beliefs for all kinds of reasons, without regard for how falsifiable they are. The result is a tangled, impenetrable web. Productive disagreement takes a long time because people are starting from the position of "impenetrable web."

If you make a habit of asking yourself "what observations would change my mind about this?", then you gain a few benefits.

First, your beliefs should (hopefully?) be more entangled with reality, period. You'll gain the skill of noticing how your beliefs should constrain your anticipations, and then if they fail to do so, you can maybe update your beliefs.

Second, if you've cultivated that skill, then during a doublecrux discussion, you'll have an easier time engaging with the core doublecrux loop. (So, a conversation that might have taken an hour takes 45 minutes – your conversation partner might still take a long time to figure out their cruxes, but maybe you can do your own much faster)

Third, once you've gotten into this habit, this will help your beliefs form in a cleaner, more reality-entangled fashion in the first place. Instead of building an impenetrable morass, you'll be building a clear, legible network. (So, you might have all your cruxes fully accessible from the beginning of the conversation, and then it's just a matter of stating them, and then helping your partner to do so.)

[Note: I don't think you should optimize directly for your beliefs being legible. This is a recipe for burying illegible parts of your psyche and then missing important information. But rather, if you try to actually understand your beliefs and what causes them, the legibility will come naturally as a side benefit]

Finally, if everyone around you is doing this, it radically lowers the cost of productive disagreement. Instead of taking an hour (or three days), as soon as you bump into an important disagreement you can quickly navigate through your respective belief networks, find the cruxes, and skip to the part where you actually Do Empiricism.

I think keeping beliefs cruxy is a good example of a practice that is both a valuable "Rabbit" strategy, as well as something worth Stag Hunting Together on.

If you have an organization, community, or circle of friends where many people have practiced keeping-beliefs-cruxy, people will benefit individually, and together they will create a truthseeking culture more powerful than the sum of its parts.




Is this info on zinc lozenges accurate?

July 28, 2019 - 01:05
Published on July 27, 2019 10:05 PM UTC

Podcast: Zinc Definitely Fights Colds, But You’re Probably Using the Wrong Kind

This podcast claims that zinc lozenges are "probably almost essentially a cure for the common cold". But there are many caveats:

  • must be zinc acetate or zinc gluconate (but gluconate is strictly worse)
  • use immediately on getting a cold
  • 18mg zinc per lozenge
  • dissolve in mouth 20-30 minutes
  • take every 2 hours
  • must have metallic taste, astringency
  • must be free of anything ending -ate (except stearate) or -ic acid; free of magnesium except magnesium stearate (it's insoluble)
  • only one product on the market satisfies these requirements

The guy sounds to me like he knows what he's talking about. But I don't have the technical expertise to really know. (I think I could detect a mediocre bullshitter, but not necessarily a high level one.) If true, it seems like the sort of information that would be good for more people to know; but also like the sort of information that would be more widely known if it were true. (But I can sketch an argument for why it might not.) The research section at the linked page cites three journal articles of which two are open access, but I haven't looked closely at them.

My own experience is that I got some of these lozenges about a year ago, after reading a transcript of the episode. I thought I'd gotten them too late, but my cold cleared up much faster than I would have expected otherwise. Since then I've been trying to collect more anecdotal data, but my body is stubbornly refusing to even start coming down with a cold. Twice I thought it might be, and I took lozenges and didn't; but I think I took one lozenge on one occasion and three (spaced out) on another, and he thinks they shouldn't be effective enough for that to have worked. I'm not sure what to make of this, except that it shouldn't be much given the sample size.

Unfortunately the transcript has now been removed, and I can't find it on archive.org. I've made notes of the first ~35 minutes (of ~70). If someone could take a look (or listen), and say whether it all seems basically accurate, that would be fantastic. Almost all of it seems consistent with what I think I know, with one surprise that I've bolded. Apologies for the poor formatting.

  • zinc is important to the immune system in ways that are irrelevant to this. If you aren't getting enough, you'll probably benefit from getting more. Good sources are oysters, red meat and cheese.

  • RDA for zinc is satisfied by eating oysters once a week or beef once a day

  • it's more prevalent and also more bioavailable in animal foods than plant foods. So if you mainly eat a plant diet you may benefit from supplements or zinc-rich foods

  • phytate (grains, nuts, seeds are good sources) is "storage house for minerals"; allows plants to germinate & grow when conditions are right. Phytates make zinc less bioavailable in both the meal and supplements

  • but again this is separate from using zinc to cure colds

  • George Eby's 3yo daughter with leukemia had many colds, given 50mg zinc gluconate, refused to swallow, cold disappeared

  • Eby and colleagues published RCT in 1984 showing zinc lozenges could reduce median cold duration 5 days mean duration 7 days; basically cures cold

  • Almost every zinc lozenge on the market is useless for this purpose

  • ionic zinc (+ve charge, free not bound to anything) affects nasal tissue and adenoid tissue (lymph tissue in throat) i.e. two major sites of infection during cold: inhibits activation of viral polypeptides that are used in replication of cold virus; inhibits production in our cells of ICAM-1 (intercellular adhesion molecule 1) which is dock that allows virus to grab hold of cell and enter it

  • zinc products are all salts, not ionic. So we need one that releases ionic zinc in the relevant tissues at the right time

  • zinc interferes with replication of virus. So you need to take it almost immediately after being infected or at the first sign of symptoms

  • cold incubation period ~1 day, no symptoms, contagious; 2-3 days where replication and symptoms are increasing; then it peaks and declines, after 5 days basically undetectable but your symptoms may continue. So if you start using them 2-3 days in, they probably won't do anything

  • tablet or capsule releases zinc into stomach so that's no good

  • nasal sprays can kill your sense of smell

  • zinc released from a lozenge will reach your nasal tissues and throat tissues

  • some say you want a salt that releases ionic zinc at the pH of saliva. But actually it needs to release at pH of your nasal and throat tissues

  • saliva pH is 5; over 100 times more acidic than pH of cellular environment which is 7.4

  • 7.4 is basic, right? "100 times more acidic than [something on the other side of neutral]" seems like a weird thing to say? It sounds to me like "-5°C is 100 times more freezing than +2°C". Also, if I google "pH of saliva" I see 6.2 to 7.6. (I wouldn't be at all surprised to discover I'm just wrong about the acidic/freezing analogy.)

  • lots of zinc salts release ionic at pH 5, only a handful at 7.4

  • of salts in lozenges, only acetate and gluconate release any meaningful amount

  • at 7.4, gluconate is 50% ionic and acetate is 100% ionic. so zinc acetate should be twice as effective

  • most lozenges are neither; only one is acetate

  • zinc in your mouth has a metallic taste (astringent), dries it out. So people try to make zinc lozenges more palatable

  • but the astringency comes from the ionic zinc in your mouth. So if it's not astringent, it's not gonna help.

  • OTOH being astringent doesn't mean it will help, because that's in your saliva not your nose/throat tissues

  • food acids e.g. citrate or tartrate will very tightly bind zinc

  • studies with citrate or tartrate in lozenge seem to suggest it actually makes the cold last longer

  • ionic magnesium delivered to nose/throat tissues will nullify zinc, increase replication of cold virus

  • one product tested found harmful was produced with very high heat in presence of fats, maybe palm oil; high heat produced insoluble zinc waxes with the fatty acids

  • lubricants used in supplements, like magnesium stearate or other stearates, are insoluble; so they don't yield an acid that could bind to the zinc and don't yield much ionic magnesium and don't cause problems
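
On the "100 times more acidic" question in the notes: under the standard definition pH = -log10[H+], comparing two pH values means comparing hydrogen ion concentrations, and that ratio is well-defined on either side of neutral. A quick check of the arithmetic, using only the figures quoted above (the helper name `acidity_ratio` is made up for this sketch):

```python
def acidity_ratio(ph_a, ph_b):
    """How many times higher the H+ concentration is at ph_a than at ph_b."""
    # pH = -log10[H+], so [H+] = 10 ** (-pH) and the ratio is 10 ** (ph_b - ph_a).
    return 10.0 ** (ph_b - ph_a)

# The podcast's figures: saliva at pH 5 versus the cellular environment at pH 7.4.
ratio_podcast = acidity_ratio(5.0, 7.4)   # roughly 251, so "over 100 times" holds

# The low end of the googled saliva range (pH 6.2) against the same 7.4.
ratio_googled = acidity_ratio(6.2, 7.4)   # roughly 16

print(round(ratio_podcast, 1), round(ratio_googled, 1))
```

So the phrasing is consistent with the pH 5 figure. Whether pH 5 is the right number for saliva is a separate question that this arithmetic doesn't settle.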




Shortform Beta Launch

July 27, 2019 - 23:09
Published on July 27, 2019 8:09 PM UTC

We've had unofficial experiments with shortform for over a year. More and more people have been trying it out and finding it useful. Now, we're pushing shortform into an officially supported feature.

My description of shortform, inspired by pattern's comment, is:

Writing that is short in length, or written in a short amount of time. Includes off-the-cuff thoughts and brainstorming.

Why shortform?

Sometimes shorter is better

I've noticed when I write a Facebook post... it ends up exactly as long as it's supposed to be. I write 3-5 paragraphs that nicely encapsulate my idea, and then it looks about the right length, and I click submit and have a nice, clear discussion.

When I start writing a LessWrong post, sometimes I look at the beautiful serif text on the nice blank white page and... I dunno, it feels like I'm supposed to write a 3 page essay, so I do. But my idea would have been better if I expressed it in 3 paragraphs.

Sometimes off-the-cuff is better

I also often want to brainstorm early stage ideas in a way that isn't (necessarily) optimized for others to read – figuring out how to explain something well might be hard, and I'm not even sure the idea is good yet. But, people who've been following along with my thought process and understand what I'm gesturing at can still chime in with ideas.

Sometimes I just wanna start writing without worrying about what sort of thing I'm writing yet.

There's also an important in-between case, where maybe I'm writing something off-the-cuff and brainstormy, and maybe I'm actually writing a full treatise on something important. And I just... don't want to spend cycles figuring that out. I want my editor to feel unopinionated, and I want to be able to click 'submit' at the end without stressing out about whether I'm submitting 'good enough content'.

Sometimes, this results in an initial shortform comment eventually getting revamped into a major post.

For all of these reasons, and more, it seems useful to have a part of LessWrong optimized for shorter writing.

New Features, focusing on Visibility

Shortform is created in the form of comments (attached to an automatically generated shortform post). The new features mostly aim to:

  • Automate a lot of work. Instead of having to manually create a post called "So-and-so's shortform", you just start writing a comment, click submit, and then that post is automatically created for you
  • Improve visibility. Part of the point of shortform (in some cases) is to be a bit less visible. But so far it's been extremely un-visible, appearing only in the Recent Discussion section of the frontpage, usually only for a couple hours.

Features include:

New /shortform page

If you're in the mood to engage specifically with shortform content (as an author or reader), you can go to lesswrong.com/shortform. There you can:

  • Read the latest shortform content. Currently, these are sorted by "most recently replied to", with the 3 latest replies shown underneath.
    • If a shortform item has more than 3 replies, there'll be a "N additional comments" button you can click to load more, and the reply will have a little "show parent" icon to indicate that there's conversation missing.
  • Start writing a shortform comment. When you click the "submit" button, you'll automatically generate a shortform post, and your comment will be added to that post. The comment will appear below in the stream of content.


All Posts page visibility

If you're using the All Posts daily view, the top 5 shortform comments from that day will be visible (and you can click "load more").

Clicking on a shortform item will expand it and load replies.

Upcoming Features

This is all just the minimum viable product to get things rolling. There are some obvious features to add, such as:

  • Letting users subscribe to individual people's shortform
  • Making it easier to permalink to shortform content
  • Making it easier to convert a shortform comment into a full post

Let us know if you have other suggestions or feedback.




The Artificial Intentional Stance

July 27, 2019 - 10:00
Published on July 27, 2019 7:00 AM UTC

Another post in the same incremental vein. Still hoarding the wild speculation for later.

I

The idea of the "intentional stance" comes from Dan Dennett, who wanted to explain why it makes sense that we should think humans have such things as "beliefs" or "desires." The intentional stance is just a fancy name for how humans usually think in order to predict the normal human-scale world - we reason in terms of people, desires, right and wrong, etc. Even if you could be prodded into admitting that subatomic particles are "what there really are," you don't try to predict people by thinking about subatomic particles, you think about their desires and abilities.

We want to design AIs that can copy this way of thinking. Thus, the problem of the artificial intentional stance. Value learning is the project of getting the AI to know what humans want, and the intentional stance is the framework that goes between raw sensory data and reasoning as if there are things called "humans" out there that "want" things.

II

Suppose you want to eat a strawberry, and you are trying to program an AI that can learn that you want to eat a strawberry. A naive approach would be to train the AI to model the world as best it can (I call this best-fit model the AI's "native ontology"), and then bolt on some rules telling it that you are a special object who should be modeled as an agent with desires.

The reason this doesn't work is that the intentional stance is sort of infectious. When I think about you wanting to eat a strawberry using my intentional stance, I don't think about "you" as a special case and then use my best understanding of physiology, biochemistry, and particle physics to model the strawberry. Instead, I think of the verb "to eat" in terms of human desires and abilities, and I think of the strawberry in terms of how humans might acquire or eat one.

This is related to the concept of "affordances" introduced by James J. Gibson. Affordances are the building blocks for how we make plans in the environment. If I see a door, I intuitively think of opening or locking it - it "affords opening." But maybe an experienced thief will intuitively think of how to bypass the door - they'll have a different intuitive model of reality, in which different affordances live.

When you say you want to eat a strawberry, you are using an approximate model of the world that not only helps you model "you" and "want" at a high level of abstraction, but also "eat" and "strawberry." The AI's artificial intentional stance can't just be a special model of the human, it has to be a model of the whole world in terms of what it "affords" the human.

III

If we want to play a CIRL-like cooperative game with real human goals, we'll need the artificial intentional stance.

CIRL assumes that the process generating its examples is an agent (modeled in a way that is fixed when implementing that specific CIRL-er), and tries to play a cooperative game with that agent. But the true "process that determines its inputs" is the entire universe - and if the CIRL agent can only model a small chunk of the universe, there's no guarantee that that chunk will be precisely human-shaped.

If we want an AI to cooperate with humans even if it's smart enough to model a larger-than-human chunk of the universe, this is an intentional stance problem. We want it to model the inputs to some channel in terms of specifically human-sized approximate agents, living in the universe. And then use this same intentional-stance model to play a cooperative game with the humans, because it's this very model in which "humans" are possible teammates.

This exposes one of the key difficulties in designing an artificial intentional stance: it needs to be connected to other parts of the AI. It's no good having a model of humans that has no impact on the AI's planning or motivation. You have to be able to access the abstraction "what the humans want" and use it elsewhere, either directly (if you know the format and where it's stored in memory), or indirectly (via questions, examples, etc.).

IV

The other basic difficulty is: how are you supposed to train or learn an artificial intentional stance?

If we think of it as a specialized model of the world, we might try to train it for predictive power, and tune the parameters so that it gets the right answers as often as possible. But that can't be right, because the artificial intentional stance is supposed to be different from the AI's best-predicting native ontology.

I'm even skeptical that you can improve the intentional stance by training it for efficiency, or predictive power under constraints. Humans might use the intentional stance because it's efficient, but it's not a unique solution - the psychological models that people use have changed over history, so there's that much wiggle room at the very, very least. We want the AI to copy what humans are doing, not strike out on its own and end up with an inhuman model of humans.

This means that the artificial intentional stance, however it's trained, is going to need information from humans, about humans. But of course humans are complicated, our stories about humans are complicated, and so the AI's stories about humans will be complicated. An intentional stance model has to walk a fine line so that it captures important detail, but not so much detail that the humans no longer are being understood in terms of beliefs, desires, etc.

V

I think this wraps up the basic points (does it, though?), but I might be missing something. I've certainly left out some "advanced points," of which I think the key one is the problem of generalization: if an AI has an intentional stance model of you, would you trust it to design a novel environment that it thinks you'll find extremely congenial?

Oh and of course I've said almost nothing about practical schemes for creating an artificial intentional stance. Dissecting the corpse of a scheme can help clarify issues, although sometimes fatal problems are specific to that scheme. By the end of summer I'll try to delve a little deeper, and take a look at whether you could solve AI safety by prompting GPT-2 with "Q: What does the human want? A:".




Just Imitate Humans?

July 27, 2019 - 03:35
Published on July 27, 2019 12:35 AM UTC

Do people think we could make a singleton (or achieve global coordination and preventative policing) just by imitating human policies on computers? If so, this seems pretty safe to me.

Some reasons for optimism: 1) these could be run much faster than a human thinks, and 2) we could make very many of them.

Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans. By the way, these can be happy humans who earnestly try to follow instructions. To model their policy, we can take the maximum a posteriori estimate over a set of policies which includes the truth, and freeze the policy once we're satisfied. (This is with unlimited computation; we'd have to use heuristics and approximations in real life). With MAP, this will be quick to run once we freeze the policy, and we're no longer tracking tons of hypotheses, especially if we used some sort of speed prior. Let T be the number of interaction cycles we record before freezing the policy. For sufficiently large T, it seems to me that running this is safe.
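To make the MAP step concrete, here is a minimal sketch in Python. It assumes a tiny finite set of candidate policies and a toy cost-based prior standing in for a speed prior. The names `Policy`, `cost`, `log_prior`, and `map_policy`, and the two example policies, are hypothetical illustrations and not part of the proposal itself.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Observation = str
Action = str


@dataclass(frozen=True)
class Policy:
    name: str
    # Returns P(action | observation) under this candidate policy.
    action_probs: Callable[[Observation], Dict[Action, float]]
    # Hypothetical per-step compute cost, used only by the toy prior below.
    cost: float


def log_prior(policy: Policy) -> float:
    # Toy stand-in for a speed prior (an assumption for illustration):
    # costlier policies get less unnormalized prior weight.
    return -policy.cost


def log_likelihood(policy: Policy, data: List[Tuple[Observation, Action]]) -> float:
    # Sum of log P(action | observation) over the recorded interaction cycles.
    total = 0.0
    for obs, act in data:
        p = policy.action_probs(obs).get(act, 0.0)
        if p == 0.0:
            return -math.inf  # the policy rules out something that was observed
        total += math.log(p)
    return total


def map_policy(policies: List[Policy], data: List[Tuple[Observation, Action]]) -> Policy:
    # Maximum a posteriori: maximize log prior + log likelihood.
    return max(policies, key=lambda pi: log_prior(pi) + log_likelihood(pi, data))


# Two made-up candidate policies responding to a single kind of observation.
policies = [
    Policy("echo", lambda obs: {"yes": 0.9, "no": 0.1}, cost=2.0),
    Policy("contrarian", lambda obs: {"yes": 0.2, "no": 0.8}, cost=1.0),
]

# T = 5 recorded interaction cycles: four "yes" answers and one "no".
data = [("question", "yes")] * 4 + [("question", "no")]

frozen = map_policy(policies, data)
print(frozen.name)
```

With no recorded data the cheaper policy wins on the prior alone; after a handful of cycles the likelihood term dominates and the better-fitting policy is selected. Once frozen, only the single selected policy needs to be run, which is the sense in which MAP avoids tracking many hypotheses.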

What are people's intuitions here? Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?

If we think this would work, there would still be the (neither trivial nor hopeless) challenge of convincing all serious AGI labs that any attempt to run a superhuman AGI is unconscionably dangerous, and we should stick to imitating humans.



