LessWrong.com News

A community blog devoted to refining the art of rationality
Double Tongue Whistling

October 2, 2019 - 14:10
Published on October 2, 2019 11:10 AM UTC

I can whistle about seven notes per second, which corresponds to a reel at 105bpm. [1] While this isn't a problem for whistling basslines, it's slightly too slow for melodies at contra dance speed (~110-122bpm). I want to figure out how to whistle faster, and I know there are people who whistle faster, but I don't know how it's usually done.

I see two main routes:

  • Do what I currently do, but faster.

  • Figure out how to do something else.

The former doesn't seem very promising: I've been playing around with whistling for decades and I suspect I'm pretty close to a local maximum with my current approach. On the other hand, I'm just trying to get from seven notes per second to eight, which seems like it might be possible?

The latter is pretty open, and probably involves doing something that's slower at first but will eventually be faster. The main problem is, how do I know that after I put in all that effort I'll actually end up with something faster? Ideally there would be people demonstrating on youtube or something, with "here's how to whistle quickly" videos, but I'm not seeing that.

Still, it seems like some sort of double-tonguing should work. Normally when I whistle I mark the notes with my glottis, the same as the two glottal stops in "uh-oh". A different option, though, would be to mark the notes by making a velar closure with my tongue, the same as the two velar stops in "cook" ("k"). And then I could alternate between them, which seems like it should let me get up to twice the speed of the slower one. Here's what the three sound like:

I find velar stops harder, partly because it's not what I'm used to, and partly because I'm already using my tongue to form the whistle. Currently I can do them 4-5 times per second. When alternating I can do a little better, 5-6 times per second, but that's still less than the 8-10 you'd expect from doubling my velar-only speed. I can go te-ke-te-ke with closures 10-11 times per second, so I am optimistic.

Has anyone learned to double-tongue their whistling successfully? Does this work?

[1] Or a jig at 140bpm, since that's six notes per measure instead of eight.
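The tempo arithmetic in this post can be sketched as follows. This is a minimal illustration, assuming two beats per measure for both reels and jigs (so a reel has four notes per beat and a jig has three); the function names are hypothetical and only serve this example.

```python
# Assumption: bpm counts two beats per measure, so a reel (eight notes
# per measure) has 4 notes per beat and a jig (six notes per measure) has 3.
REEL_NOTES_PER_BEAT = 4
JIG_NOTES_PER_BEAT = 3


def notes_per_second(bpm: float, notes_per_beat: int) -> float:
    """Notes per second needed to play every note at the given tempo."""
    return bpm * notes_per_beat / 60.0


def max_bpm(notes_per_sec: float, notes_per_beat: int) -> float:
    """Fastest tempo reachable at a given whistling speed."""
    return notes_per_sec * 60.0 / notes_per_beat


if __name__ == "__main__":
    print(notes_per_second(105, REEL_NOTES_PER_BEAT))  # reel at 105bpm
    print(notes_per_second(140, JIG_NOTES_PER_BEAT))   # jig at 140bpm
    print(max_bpm(8, REEL_NOTES_PER_BEAT))             # tempo unlocked by 8 notes/s
    for bpm in (110, 122):                             # contra dance range
        print(bpm, round(notes_per_second(bpm, REEL_NOTES_PER_BEAT), 2))
```

Under these assumptions, going from seven to eight notes per second moves the reachable reel tempo from 105bpm into the contra dance range.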


Cambridge LW/SSC Meetup

October 2, 2019 - 06:37
Published on October 2, 2019 3:37 AM UTC

This is the monthly Cambridge, MA LessWrong / Slate Star Codex meetup.

Note: The meetup is in apartment 2 (the address box here won't let me include the apartment number).


Does the US nuclear policy still target cities?

October 2, 2019 - 03:18
Published on October 2, 2019 12:18 AM UTC

The history of nuclear strategic bombing

Daniel Ellsberg’s The Doomsday Machine brought my attention to a horrifying fact about early US nuclear targeting policy. In 1961, the US had only one nuclear war plan, and it called for the destruction of every major Soviet city and military target. That is not surprising. However, the plan also called for the destruction of every major Chinese city and military target, even if China had not provoked the United States. In other words, the US nuclear war plan called for the destruction of the major population centers of the most populous country in the world, even in circumstances where that country had not attacked the United States or its allies. Ellsberg points out that at the time, people at RAND and presumably other parts of the US defense establishment understood that the Chinese and the Soviets were beginning to diverge in strategic interests and thus should not be treated as one bloc. Nevertheless, the top levels of the US command, including President Eisenhower, were committed to the utter destruction of both Chinese and Soviet targets in the event of a war with either country.

The policy of destroying cities is a legacy left over from strategic bombing in World War II. The destruction of Hiroshima and Nagasaki is the most famous example, but the fire bombings of Japanese and German cities destroyed far more infrastructure and killed far more people than the two atomic bombs. The given rationale for strategic bombing was to destroy the ability of the enemy states to continue to make war. If a state can no longer produce airplanes and tanks, either because the factories have been destroyed or because there are no longer people to work in the factories, then its ability to resist is diminished.

Given the level of technology and development in WWII, strategic bombing had a chance at achieving military objectives, because the conflict was to carry on for multiple years. On the timescale of years, a country’s capacity to build armaments and resupply armies in the field can be crucial to victory.

Nuclear war changes this calculus. In a modern nuclear war involving SLBMs (Submarine launched ballistic missiles), ICBMs (Intercontinental ballistic missiles), strategic bombers, and other weapon systems, the majority of an adversary’s military, industrial and population centers could be destroyed in a matter of days or hours. It is hard to imagine a nuclear war lasting years or even months. Without a prolonged war, the original rationale for strategic bombing disappears, or is at least much reduced. A state may still wish to reduce the capacity of its enemy to fight future wars, but it can no longer claim that the wholesale destruction of cities is necessary to achieve military objectives in the current war.

Why then, did early US nuclear policies call for the destruction of cities?

Nuclear game theory in the 1960s

The destruction of cities was primarily a threat of inflicting harm rather than an attempt to destroy the capacity of the enemy to wage war. The idea, formalized by RAND game theorist Thomas Schelling, was that both the United States and the Soviet Union would threaten massive retaliation against each other’s civilian populations and industry to deter the other from starting a war.

Schelling developed a category of game theory that involved what he termed “mixed motive games”: games where both sides sought advantage, but where the payoff to one side did not strictly correlate to the loss to the other side. In these types of games, both players may wish to avoid outcomes that are mutually unfavorable (Strategy of Conflict, pg. 89). In the case of nuclear deterrence, both sides strongly preferred to avoid nuclear war, and thus both were deterred from taking actions that would directly lead to nuclear war.

Much of Schelling’s work concerns itself with how states in a nuclear stalemate can pursue their own advantage while avoiding escalation to nuclear war. In this type of game, states try to maneuver each other into positions where the only possible actions are 1) escalate and risk nuclear war or 2) de-escalate and concede something to the other side.
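To make the structure of a mixed motive game concrete, here is a minimal sketch of a two-player game in which each side chooses to escalate or de-escalate. The payoff numbers are purely illustrative assumptions, not taken from Schelling or any other source; they only encode that mutual escalation is far worse for both sides than conceding.

```python
from itertools import product

ACTIONS = ("de-escalate", "escalate")

# Hypothetical payoffs (row player, column player). Illustrative only.
PAYOFFS = {
    ("de-escalate", "de-escalate"): (0, 0),
    ("de-escalate", "escalate"): (-1, 1),
    ("escalate", "de-escalate"): (1, -1),
    ("escalate", "escalate"): (-10, -10),  # mutual disaster
}


def is_constant_sum(payoffs):
    """True if one side's gain is always exactly the other side's loss."""
    return len({a + b for a, b in payoffs.values()}) == 1


def pure_nash_equilibria(payoffs):
    """Action pairs where neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(ACTIONS, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        row_ok = all(payoffs[(r, col)][0] <= row_pay for r in ACTIONS)
        col_ok = all(payoffs[(row, c)][1] <= col_pay for c in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


if __name__ == "__main__":
    print("constant sum:", is_constant_sum(PAYOFFS))
    print("pure equilibria:", pure_nash_equilibria(PAYOFFS))
```

With these assumed payoffs the game is not constant-sum, and the only stable outcomes in pure strategies are the ones where exactly one side backs down, which matches the maneuvering described above: each side tries to leave the other with the choice between escalating and conceding.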

During the Cuban Missile Crisis, Kennedy ordered a blockade of Cuba, believing that such an action would not be sufficient for the Soviet Union to initiate a war. The United States believed that the Soviet Union would not try to break the blockade, because such an action would be recognized by both sides as starting a (nuclear) war. Because Kennedy proved correct in his belief that the Soviet Union would not go to war over the blockade nor risk initiating war by breaking the blockade, the United States used to its advantage both countries’ unwillingness to go to war.

What does this have to do with targeting cities? To answer this question it’s necessary to consider how a nuclear war might start. Although nuclear powers would almost always prefer to avoid a nuclear war, each has an incentive to strike first if they believe nuclear war to be inevitable. By striking first they may destroy their adversary’s nuclear forces before they can be used. At this point, it’s useful to define a couple of terms. Counterforce targeting refers to the targeting of enemy military installations, especially other nuclear forces. Countervalue targeting refers to the targeting of enemy infrastructure and population centers.

Consider the primary goal of a first strike. Under the most plausible nuclear war scenarios, it is to eliminate the nuclear forces of the rival state; its objective is primarily counterforce in nature. This is markedly different than the goal of a second strike. The primary goal of a second strike, under normal assumptions of deterrence, is actually to provide fulfilment of the pre-commitment made to retaliate if ever attacked. That is, it is necessary to actually be committed to attacking second so as to avoid being attacked in the first place. 

Schelling effectively argued that the more punishing the second strike threatened to be, the more effective the deterrent would be also. If true, then in the event of a nuclear war, a state following the optimal strategy of deterrence would target cities as well as nuclear targets to make their nuclear response as punishing as possible; that is, it would destroy both counterforce and countervalue targets. This would seem to lead to a policy for states to attack cities, without a second thought. Indeed, this was the policy of the US and the Soviet Union in the 1950s and early 1960s. However, just because targeting cities promised to be a more effective deterrent did not mean it promised to be the best policy. 

Given some probability of nuclear war, the effectiveness of a deterrent strategy ought to be weighed against the severity of the resulting war were that strategy employed. In other words, it might make sense for a state to commit to not targeting cities in a second strike if they themselves do not have their cities destroyed in a first strike. This may reduce the effectiveness of their deterrent, though perhaps only marginally: nuclear war is plenty damaging without cities being destroyed, and the fallout alone will kill many millions. It may also greatly reduce the severity of a nuclear war.
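The tradeoff in the preceding paragraph can be written as a simple expected-harm comparison. This is only a sketch under stated assumptions: the symbols are introduced here for illustration and do not appear in the sources discussed. Let $p_c$ and $H_c$ be the probability and severity of war under a commitment to target cities, and $p_n$ and $H_n$ the same quantities under a conditional no-cities commitment.

```latex
% Expected harm under each commitment (illustrative symbols, not from the source)
\mathbb{E}[\text{harm}_c] = p_c \, H_c,
\qquad
\mathbb{E}[\text{harm}_n] = p_n \, H_n .

% The no-cities commitment is preferable when
p_n \, H_n < p_c \, H_c
\quad\Longleftrightarrow\quad
\frac{H_n}{H_c} < \frac{p_c}{p_n} .
```

If the weaker deterrent raises the probability of war only slightly ($p_n$ close to $p_c$) while sparing cities lowers severity substantially ($H_n$ well below $H_c$), the inequality holds.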

Herman Kahn, a prominent and controversial RAND researcher, argued that states would be rational to refrain from destroying cities in a first strike, to retain some bartering power that might allow them to save more of their own cities. The argument is that the defending force might refrain from destroying many enemy cities if doing so prevented their own cities from being destroyed. Kahn believed that the US should study and prepare for negotiating for the avoidance of US cities in a nuclear war and that in order to do this the country should:

  1. Develop the ability to have sufficiently protected or hidden nuclear forces to be able to both survive a first strike and carry out counterforce and countervalue attacks.
  2. Have “backup presidents”, that is, people with authority to both order attacks and negotiate with the Soviet Union in the midst of a war, and maintain multiple secure locations staffed 24/7 by these leaders.

Both Herman Kahn and Thomas Schelling agreed that negotiating the end of a nuclear war would be difficult, but both believed it was critical that nuclear states remain capable of negotiation. Schelling writes about this in his 1966 work, Arms and Influence:

The closing stage, furthermore, might have to begin quickly, possibly before the first volley had reached its targets; and even the most confident victor would need to induce his enemy to avoid a final, futile orgy of hopeless revenge. In earlier times, one could plan the opening moves of war in detail and hope to improvise plans for its closure; for thermonuclear war, any preparations for closure would have to be made before the war starts. ...A critical choice in the process of bringing a war to a successful close--or to the least disastrous close--is whether to destroy or to preserve the opposing government and its principal channels of command and communication. If we manage to destroy the opposing government’s control over its own armed forces, we may reduce their military effectiveness. At the same time, if we destroy the enemy government’s authority over its armed forces, we may preclude anyone’s ability to stop the war, to surrender, to negotiate an armistice, or to dismantle the enemy’s weapons.

Historical developments in US nuclear targeting policy

The United States’ nuclear targeting policy has evolved from one of indiscriminate destruction of military and civilian targets, including cities, to one that promises proportional retaliation. While the public documents, perhaps intentionally, do not make the US’s position clear, their implication is that the United States would only target cities in the event that its own cities were destroyed.

The first nuclear targeting plans existed in the form of the SIOP (Single Integrated Operational Plan). This classified document outlined US nuclear policy from 1961 until 2004, and now exists in the form of the Operations Plan (OPLAN). The first SIOP specified all-out targeting of both military targets and population centers, that is, both counterforce and countervalue targeting, in both first strike and second strike scenarios. Later SIOPs contained multiple options, including the option to hold the bombing of cities in reserve.

The paper "The Trump Administration’s Nuclear Posture Review (NPR): In Historical Perspective" summarizes how the Kennedy administration began to advocate for a limited war scenario that spared cities:

President Kennedy went so far as to endorse Secretary of Defense McNamara’s effort to get the Soviets to agree to a “no cities” nuclear targeting rule, which McNamara and the President soon abandoned in the face of objections from NATO and the US Congress as well as the Kremlin that the idea was totally unrealistic. McNamara thereupon did a 180° turn to champion a MAD arms limitation (and retention) pact with the Soviets – to prevent nuclear war by guaranteeing it will be mutually suicidal. The Johnson administration’s effort to negotiate such a treaty with Kosygin was aborted in 1968 by the Soviet Union’s brutal repression of the reformist Dubcek regime in Czechoslovakia. McNamara continued to work secretly with the military, however, to enlarge the menu in the SIOP (Single Integrated Operational Plan) from which the president could select limited and controlled nuclear responses to a nuclear attack – preserving some possibility of a nuclear cease fire prior to Armageddon.

Even though McNamara’s efforts to change nuclear war plans to spare cities failed, his influence led to changes in the SIOP that for the first time specified a flexible response in nuclear war planning. Nixon would later make additional changes to the SIOP, giving the United States even more flexibility in nuclear targeting scenarios. It is not clear whether the United States ever developed a serious “no cities” strategy during the Cold War, but it did at least lay the foundations for one.

For the first time in US history, President Obama's administration stated the US would not target cities with nuclear weapons. However, this statement did not rule out escalation to countervalue targeting in the midst of a nuclear war, and is best interpreted to mean that the US would only target cities as a retaliatory measure. From the same Historical Perspective paper:

Yet Obama, while conceding to this presumed need to be prepared to actually use nuclear weapons in extreme situations, was not about to totally devolve the planning for such use onto the Pentagon…. he was adamant in his guidance to the military that if that crucial threshold ever had to be crossed, all operations had to be “consistent with the fundamental principles of the Law of Armed Conflict. Accordingly, plans will … apply the principles of distinction and proportionality and seek to minimize collateral damage to civilian populations and civilian objects. The United States will not intentionally target civilian populations or civilian objects” (US Department of Defense. 2013. Report on Nuclear Employment Strategy of the United States Specified in Section 491 of 10 U.S.C. June 12).

The restrictive rules of nuclear engagement were translated into the military’s doctrinal language: “The new guidance,” elaborated the Pentagon’s June 2013 Report on Nuclear Employment Strategy, requires the United States to maintain significant counterforce capabilities [jargon for directed at strategic weapon systems] against potential adversaries. The new guidance does not rely on a ‘counter-value’ or ‘minimum deterrence’ strategy [jargon for directed at centers of population]. (US Department of Defense, 2013)

Did this mean that the United States was discarding its ultimate assured destruction threat for deterring nuclear war? Clearly not. The guidance was carefully drafted. “Does not rely on” is different from “will not resort to”. But more explicitly and openly than previously, the language indicates that assured massive destruction of the enemy country would be the very last resort in an already massively escalating nuclear war, in which all the lesser options had been exhausted and had failed to control the violence.

President Trump’s nuclear policy, as contained in the 2018 Nuclear Posture Review, differs in a number of ways from President Obama’s policies, but doesn’t substantially change the doctrine of holding the targeting of cities in reserve.

If deterrence fails, the initiation and conduct of nuclear operations would adhere to the law of armed conflict and the Uniform Code of Military Justice. The United States will strive to end any conflict and restore deterrence at the lowest level of damage possible for the United States, allies, and partners, and minimize civilian damage to the extent possible consistent with achieving objectives.

Every U.S. administration over the past six decades has called for flexible and limited U.S. nuclear response options, in part to support the goal of reestablishing deterrence following its possible failure. This is not because reestablishing deterrence is certain, but because it may be achievable in some cases and contribute to limiting damage, to the extent feasible, to the United States, allies, and partners.

Conclusion and takeaways

The US nuclear targeting policy, insofar as public statements and documents reveal, has shifted substantially from a policy of targeting cities by default to a policy that leaves cities as reserve targets for full escalation scenarios. The US policy has never ruled out the possibility of escalation to full countervalue targeting and is unlikely to do so.

The maxim “no plan survives contact with the enemy” is especially worrying from the perspective of nuclear war planning. During the early Cold War years described in Daniel Ellsberg’s book, the military culture promoted a dedication to nuclear readiness, so much so that officers violated their own protocols to ensure they could launch nuclear weapons in a crisis. Readiness for retaliation, especially full countervalue retaliation, naturally trades off against risk of full escalation.

As both Herman Kahn and Thomas Schelling made clear, communication between legitimate authorities is essential to the ability to negotiate the end of a nuclear conflict. Yet the military value of disabling an enemy’s nuclear command, control, and communications (NC3) capabilities is large. This may be the biggest risk to cities; if Moscow and Washington are both destroyed in the early stages of nuclear conflict, then this could easily escalate to all-out countervalue targeting. It is important not only that some command structure with the authority to negotiate remain intact on each side, but also that both parties can communicate with each other and trust that the adversary’s command structure is actually intact and capable of negotiation.

Finally, all of this means very little if either of two potential adversaries fails to plan for a) refraining from initial targeting of cities, b) maintaining NC3 capabilities through an initial nuclear strike, and c) retaining the authority and intention to negotiate peace and de-escalation. Indeed, public statements by Soviet leadership during the Cold War suggested they had no intention of sparing cities in a retaliatory strike, making any possible US policy of gradual escalation potentially useless. Ultimately, it’s important to recognize here that not all nuclear war scenarios have equal outcomes, and that both sides in a nuclear conflict could benefit greatly from engaging in strategic restraint.


Survival and Flourishing Fund Applications closing in 3 days

October 2, 2019 - 03:12
Published on October 2, 2019 12:12 AM UTC

The grant round we announced a month ago for the new Survival and Flourishing Fund is closing in 3 days. We haven't gotten that many applications, so I would recommend applying.


From the original announcement post:

The plan is to make a total of $1MM-$2MM in grants to organizations working on the long term flourishing and survival of humanity.

At this point in time SFF can only make grants to charities and not individuals, so if you have a project or organization that you want to get funding for, you will have to either already be part of an established charity, found one, or be sponsored by an existing charity.


Open & Welcome Thread - October 2019

October 2, 2019 - 02:10
Published on October 1, 2019 11:10 PM UTC

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.


LW Team Updates - October 2019

October 2, 2019 - 02:08
Published on October 1, 2019 11:08 PM UTC

Like last month, this is a once-monthly update on LessWrong team activities and announcements.

Please also feel free to use the comments section on this post as a Schelling point to give feedback, file bug reports, or ask questions you have about the site. (You can also email us, ask a question, use Intercom, or message us via our FB page.)

Recent Features

Link Previews

We successfully shipped Link Previews in September. Now when you hover over an embedded link you get a pop-up, and for internal links to LessWrong posts you get a preview. See the full announcement here.

Improvements to the Community Map

In the course of supporting SSC Meetups Everywhere, we've made a few improvements to the community map on our Community page.

  • In addition to Groups and Events, now users can also add themselves to the map together with some info about what they're looking for or willing to do, e.g. someone can list themselves as willing to arrange local meetups if there are enough people in their area.
  • You can sign up for event and group notifications, as well as notifications for when there are N people within a given radius of you.
  • The UI has been improved.
  • The map is now more performant.

For more information about the LessWrong community section, including the different event types (LW/SSC/EA/MIRIx), see the relevant section of our FAQ.

Upcoming Features

Although we hoped to deliver them in September, the subscriptions overhaul and new editor are still in the works. Unless things go terribly wrong, they should be released this month. Convert Comments to Posts is currently blocked behind some tech debt.

We might announce plans for additional near-term features once we've completed our Q4 planning.

General Updates

Petrov Day

On September 26th, 1983, Stanislav Petrov, a Russian lieutenant colonel, decided not to report a [false] alarm of incoming missiles from the US, thereby likely preventing nuclear war. September 26 has since become a Rationalist community holiday celebrating the calamity that did not occur that day and reminding us of the dangers present in a world with powerful dual-use technologies.

LessWrong commemorated Petrov Day this year from the unilateralist curse angle and turned the site into a live exercise and experiment. A big red button was placed on the homepage and 125 users were given launch codes with the power to bring down the frontpage for 24 hours.

The LessWrong frontpage on Petrov Day

Though some considered it, hearteningly, no one entered launch codes and LessWrong remained safe.

Read the full debrief here.

Meetups Month

September was the month for SSC Meetups Everywhere. To support this, the LessWrong team placed the community map on the frontpage for most of the month. Over a hundred SSC Everywhere meetups took place over the month and Scott Alexander, MIRI team members, and a few others went on tour to visit them.

Meetups for LessWrong, Slate Star Codex, Effective Altruism, and MIRIx continue to happen throughout the year. Check them out on the community page.

Curated Emails are currently down

Users subscribed to emails for curated posts may have noticed these haven't been coming through. We're aware and have a fix; however, the fix is risky to the rest of the site and we've been waiting for a good opportunity to test it, hopefully soon. The RSS feed for posts, including curated posts, is working fine.

Q3 was Metrics Quarter

In Q3, the LessWrong team experimented with targeting a metric. While we have no expectation that any simple metrics could reliably track our long-term goals and values, it seemed instructive to see how much we could move a simple metric if we really tried. For a quarter, we attempted to follow Paul Graham's advice and get 7% weekly growth on a value that mattered.
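For a sense of scale, the compounding implied by a 7% weekly growth target can be sketched as below. This is only an illustration: it assumes a quarter of 13 weeks and perfectly steady growth, and the function name is hypothetical.

```python
def compound_growth(weekly_rate: float, weeks: int) -> float:
    """Total growth multiple after compounding a weekly rate for `weeks` weeks."""
    return (1.0 + weekly_rate) ** weeks


if __name__ == "__main__":
    WEEKLY_RATE = 0.07       # the 7% weekly target mentioned above
    WEEKS_PER_QUARTER = 13   # assumption: one quarter is 13 weeks
    multiple = compound_growth(WEEKLY_RATE, WEEKS_PER_QUARTER)
    print(f"Growth multiple over one quarter: {multiple:.2f}x")
```

Under these assumptions, hitting the target every week for a full quarter would a bit more than double the metric (about 2.4x).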

Stand by for a post describing the experiment and how it went.

Ways to Follow LessWrong

We've been expanding the number of ways people can consume LessWrong content beyond the site. These now include:

The Facebook, Twitter, and LinkedIn accounts receive regular updates M/W/F of a mix of curated posts, top all-time content, and outstanding new content.

Feedback & Support

The team can be reached for feedback and support via:


Impact Is Not Primarily About World State

October 2, 2019 - 00:03
Published on October 1, 2019 9:03 PM UTC

These existential crises also muddle our impact algorithm. This isn't what you'd see if impact were primarily about the world state.

Appendix: We Asked a Wrong Question

How did we go wrong?

When you are faced with an unanswerable question—a question to which it seems impossible to even imagine an answer—there is a simple trick that can turn the question solvable. Asking “Why do I have free will?” or “Do I have free will?” sends you off thinking about tiny details of the laws of physics, so distant from the macroscopic level that you couldn’t begin to see them with the naked eye. And you’re asking “Why is
local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: 
local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), 
X the case?” where X may not be coherent, let alone the case. “Why do I think I have free will?,” in contrast, is guaranteed answerable. You do, in fact, believe you have free will. This belief seems far more solid and graspable than the ephemerality of free will. And there is, in fact, some nice solid chain of cognitive cause and effect leading up to this belief. ~ Righting a Wrong Question

I think what gets you is asking the question "what things are impactful?" instead of "why do I think things are impactful?". Then, you substitute the easier-feeling question of "how different are these world states?". Your fate is sealed; you've anchored yourself on a Wrong Question.

At least, that's what I did.

Exercise: someone (me, early last year) says that impact is closely related to change in object identities.

Find at least two scenarios which score as low impact by this rule but as high impact by your intuition, or vice versa.

You have 3 minutes.

Gee, let's see... Losing your keys, the torture of humans on Iniron, being locked in a room, flunking a critical test in college, losing a significant portion of your episodic memory, ingesting a pill which makes you think murder is OK, changing your discounting to be completely myopic, having your heart broken, getting really dizzy, losing your sight.

That's three minutes for me, at least (its length reflects how long I spent coming up with ways I had been wrong).

Appendix: Avoiding Side Effects

Some plans feel like they have unnecessary side effects:

"Go to the store." versus "Go to the store and run over a potted plant."

We talk about side effects when they affect our attainable utility (otherwise we don't notice), and they need both a goal ("side") and an ontology (discrete "effects").

Accounting for impact this way misses the point.

Yes, we can think about effects and facilitate academic communication more easily via the phrase, but we should be careful not to guide research from that frame. This is why I avoided vase examples early on – their prevalence seems like a symptom of an incorrect frame.

(Of course, I certainly did my part to make them more prevalent, what with my first post about impact being called Worrying about the Vase: Whitelisting...)


  • Your ontology can't be ridiculous ("everything is a single state"), but as long as it lets you represent what you care about, it's fine by AU theory.
  • Read more about ontological crises at Rescuing the utility function.
  • Obviously, something has to be physically different for events to feel impactful, but not all differences are impactful. Necessary, but not sufficient.
  • AU theory avoids the mind projection fallacy; impact is subjectively objective because probability is subjectively objective.
  • I'm not aware of others explicitly trying to deduce our native algorithm for impact. No one was claiming the ontological theories explain our intuitions, and they didn't have the same "is this a big deal?" question in mind. However, we need to actually understand the problem we're solving, and providing that understanding is one responsibility of an impact measure! Understanding our own intuitions is crucial not just for producing nice equations, but also for getting an intuition for what a "low-impact" Frank would do.


What funding sources exist for technical AI safety research?

October 1, 2019 - 18:30
Published on October 1, 2019 3:30 PM UTC

What funding sources exist? Who are they aimed at - people within academia, independent researchers, early-career researchers, established researchers? What sort of research are they aimed at - MIRI-style deconfusion, ML-style approaches, more theoretical, less theoretical? What quirks do they have, what specific things do they target?

For purposes of this question, I am not interested in funding sources aimed at strategy/infrastructure/coordination/etc, only direct technical safety research.


Communication and Waking Hours

October 1, 2019 - 14:00
Published on October 1, 2019 11:00 AM UTC

One of the ways technology interacts with culture is in whose job it is to avoid contacting people at bad times. With older technologies there were clear conventions based on their limitations:
  • Mail: you can send the letter whenever, and the person will deal with it when convenient.

  • Phone: don't call someone when they might be asleep, but if you work nights unplug your phone during the day.

It's more complicated now, though, because the new ways of contacting people all boil down to "a message goes from your phone to theirs" and their phone may or may not be configured to let them know about the message. I think the rules basically come down to:

  • If the default configuration is either always silent or has a nightly "quiet period" then you can send whenever and not worry about waking people.

  • Otherwise treat it as if it will wake people up unless you happen to know that this particular person has configured their app differently.

In the first category we have email, Slack, and probably some other apps. In the second we have phone, SMS, Whatsapp, FB Messenger, Hangouts, Signal, and almost everything else.

Overall this is a bad equilibrium: it would be better to at least have the option to send your message whenever, and if the person is asleep they'll get it in the morning. The receiver has a much better idea than the sender of which times are OK for notifications. But there's currently no way to know how someone has their device configured, so you generally need to stay on the safe side and wait for a reasonable hour before sending.

If we did successfully move from "sender guesses do-not-disturb" to "receiver's device tracks do-not-disturb", however, it would be good to have some sort of override. For example, if you called me in the middle of the night, ideally you would get an automated message saying that I'm asleep, but offering you the option to press a button to reach me if it's urgent enough that you want to wake me up. Systems where calls from trusted contacts bypass do-not-disturb don't do this well: not every late-night call from my sister is intended to wake me up, and there are people who may need to contact me urgently that I won't have thought to configure in my phone.
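The receiver-side scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Receiver` class, the quiet-period settings, and the `handle_call` function are hypothetical names, not any real phone API.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Receiver:
    # Hypothetical settings: the receiver, not the sender, decides the quiet period.
    quiet_start: time
    quiet_end: time

    def in_quiet_period(self, now: time) -> bool:
        if self.quiet_start <= self.quiet_end:
            return self.quiet_start <= now < self.quiet_end
        # The quiet period wraps past midnight, e.g. 22:00 to 07:00.
        return now >= self.quiet_start or now < self.quiet_end


def handle_call(receiver: Receiver, now: time, caller_says_urgent: bool) -> str:
    """Return what the receiver's device does with an incoming call."""
    if not receiver.in_quiet_period(now):
        return "ring"
    if caller_says_urgent:
        # The caller heard the "asleep" notice and chose to override it.
        return "ring"
    return "hold until morning"
```

The override is a per-call decision by the caller, which is what distinguishes it from a fixed list of trusted contacts.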


Announcing the Farlamp project

October 1, 2019 - 05:16
Published on October 1, 2019 2:16 AM UTC

Project definition:

I'm studying the impact of overseer failure on RL-based IDA, because I want to know under what conditions the amplification increases or decreases the failure rate, in order to help my reader understand whether we need to combine capability amplification with explicit reliability amplification in all cases.

In this project I will:

  1. Take the implementation of iterated distillation and amplification from Christiano et al.'s ‘Supervising strong learners by amplifying weak experts’ and adapt it to reinforcement learning. (It is using supervised learning now.)
  2. Introduce overseer failures and see how they influence the overall failure rate.
  3. Write a paper about the results.
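As a toy illustration of why amplification could push the failure rate in either direction, consider an amplified overseer that combines `n` subcalls, each failing independently with probability `p`. Both aggregation rules below are assumptions made for this sketch and are not taken from the project or from Christiano et al.: in one, a single failing subcall ruins the combined answer; in the other, the answer fails only if a majority of subcalls fail.

```python
from math import comb


def any_failure_rate(p: float, n: int) -> float:
    """Failure rate if one failing subcall out of n ruins the combined answer."""
    return 1 - (1 - p) ** n


def majority_failure_rate(p: float, n: int) -> float:
    """Failure rate if the combined answer fails only when more than half of n subcalls fail."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

With `p = 0.1` and `n = 3`, the first rule gives a failure rate above `p` and the second a rate below it, so under these assumptions the aggregation rule and the independence of failures decide which way amplification moves the rate.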

Overseer failures in SupAmp and ReAmp contains a more extensive introduction, as well as an explanation of the relevant terms, concepts etc.

The project repo contains all the public artifacts I have produced so far.

At the moment I'm expanding my ML skills using the book Hands-On Machine Learning with Scikit-Learn & TensorFlow. After this I will start working on the IDA code.

Paul Christiano has been funding this project, giving me a chance to try my hand at research (again). The next evaluation will be in December/January. Depending on the progress, the funding and therewith the project can be discontinued.

I'm also Looking for remote writing partners.


Looking for remote writing partners (for AI alignment research)

October 1, 2019 - 05:16
Published on October 1, 2019 2:16 AM UTC

I'm looking for other junior researchers to form a distributed writing group for mutual support. Please get in touch if you're interested.


One to three other people and I.

I do AI alignment research independently. I'm working on the Farlamp project (see the GitHub repo or the project announcement on LessWrong). And I live in Japan.


Read and discuss one another's work.

Duration: 15 min x number of participants


Each at their desk, all connected by video call.




Because in a group we can:

  • Bring up questions, responses, suggestions for each other's research.
  • Review each other's outlines and drafts.
  • Keep each other disciplined.
  • Support each other in other ways.
  • If you want to join, then PM or email me (<given name of President Nixon>.moehn@posteo.de).
  • We start and end our meetings on time.
  • We talk about the state of our work, free-form or following prompts. This format will change based on what we find works well, how far a person is in their research etc.

Idea, content and rationale derived from Booth et al.: The Craft of Research.


Please Take the 2019 EA Survey!

October 1, 2019 - 00:21
Published on September 30, 2019 9:21 PM UTC

The 2019 Effective Altruism Survey is now live at the following link: https://www.surveymonkey.co.uk/r/EAS2019LW

The average completion time for this year’s survey is 20 minutes.

The survey will close on the 10th of October at midnight BST.

The EA Survey provides valuable information about the demographics of the EA community, how people get involved, how they donate, what causes they prioritize, their experiences of EA, and more.

This year, we worked more closely than ever with the Centre for Effective Altruism, 80,000 Hours, and other EA organizations to identify questions that would be as valuable as possible to their work. Overall, we received more than 100 requests for questions to be included from various members of the EA community, which meant making many tough decisions about what questions we could include, while keeping the survey to an appropriate length.

This year, the Centre for Effective Altruism has generously donated a prize of $1000 USD that will be awarded to a randomly selected respondent to the EA Survey, for them to donate to any of the organizations listed on EA Funds. Please note that to be eligible, you need to provide a valid e-mail address so that we can contact you.

We would like to express our gratitude to the EA Meta Fund for supporting our work.


Elimination of Bias in Introspection: Methodological Advances, Refinements, and Recommendations

September 30, 2019 - 23:23
Published on September 30, 2019 8:23 PM UTC

Radek Trnka & Vit Smelik

New Ideas in Psychology 56 (2020)


Building on past constructive criticism, the present study provides further methodological development focused on the elimination of bias that may occur during first-person observation. First, various sources of errors that may accompany introspection are distinguished based on previous critical literature. Four main errors are classified, namely attentional, attributional, conceptual, and expressional error. Furthermore, methodological recommendations for the possible elimination of these errors have been determined based on the analysis and focused excerpting of introspective scientific literature. The following groups of methodological recommendations were determined: 1) a better focusing of the subject’s attention to their mental processes, 2) providing suitable stimuli, and 3) the sharing of introspective experience between subjects. Furthermore, the potential of adjustments in introspective research designs for eliminating attentional, attributional, conceptual, and expressional error is discussed.


List of resolved confusions about IDA

September 30, 2019 - 23:03
Published on September 30, 2019 8:03 PM UTC

AI Alignment is a confusing topic in general, but even compared to other alignment topics, IDA seems especially confusing. Some of it is surely just due to the nature of communicating subtle and unfinished research ideas, but other confusions can be cleared up with more specific language or additional explanations. To help people avoid some of the confusions I or others fell into in the past while trying to understand IDA (and to remind myself about them in the future), I came up with this list of past confusions that I think have mostly been resolved at this point. (However there's some chance that I'm still confused about some of these issues and just don't realize it. I've included references to the original discussions where I think the confusions were cleared up so you can judge for yourself.)

I will try to maintain this list as a public reference so please provide your own resolved confusions in the comments.

alignment = intent alignment

At some point Paul started using "alignment" to refer to the top-level problem that he is trying to solve, and this problem is narrower (i.e., leaves more safety problems to be solved elsewhere) than the problem that other people were using "alignment" to describe. He eventually settled upon "intent alignment" as the formal term to describe his narrower problem, but occasionally still uses just "aligned" or "alignment" as shorthand for it. Source

short-term preferences ≠ narrow preferences

At some point Paul used "short-term preferences" and "narrow preferences" interchangeably, but no longer does (or at least no longer endorses doing so). Source

preferences = "actual" preferences (e.g., preferences-on-reflection)

When Paul talks about preferences he usually means "actual" preferences (for example the preferences someone would arrive at after having a long time to think about it while having access to helpful AI assistants, if that's a good way to find someone's "actual" preferences). He does not mean their current revealed preferences or the preferences they would state or endorse now if you were to ask them. Source

corrigibility ≠ based on short-term preferences

I had misunderstood Paul to be using "corrigibility to X" as synonymous with "based on X's short-term preferences". Actually "based on X's short-term preferences" is a way to achieve corrigibility to X, because X's short-term preferences likely include "be corrigible to X" as a preference. "Corrigibility" itself means something like "allows X to modify the agent" or a generalization of this concept. Source

act-based = based on short-term preferences-on-reflection

My understanding is that "act-based agent" used to mean something different (i.e., a simpler kind of AI that tries to do the same kind of action that a human would), but most people nowadays use it to mean an AI that is designed to satisfy someone's short-term preferences-on-reflection, even though that no longer seems particularly "act-based". Source

act-based corrigibility

Evan Hubinger used "act-based corrigibility" to mean both a method of achieving corrigibility (based on short-term preferences) and the kind of corrigibility achieved by that method. (I'm not sure if he still endorses using the term this way.) Source

learning user preferences for corrigibility isn't enough for corrigible behavior

Because an act-based agent is about "actual" preferences, not "current" preferences, it may be incorrigible even if it correctly learns that the user currently prefers the agent to be corrigible, if it incorrectly infers or extrapolates the user's "actual" preferences, or if the user's "actual" preferences do not actually include corrigibility as a preference. Source

distill ≈ RL

Summaries of IDA often describe the "distill" step as using supervised learning, but Paul and others working on IDA today usually have RL in mind for that step. Source

outer alignment problem exists? = yes

The existing literature on IDA (including a post about "reward engineering") seems to have neglected to describe the outer alignment problem associated with using RL for distillation. (Analogous problems may also exist if using other ML techniques such as SL.) Source

corrigible to the user? ≈ no

IDA is typically described as being corrigible to the user. But in reality it would be trying to satisfy a combination of preferences coming from the end user, the AI developer/overseer, and even law enforcement or other government agencies. I think this means that "corrigible to the user" is very misleading, because the AI is actually not likely to respect the user's preferences to modify (most aspects of) the AI or to be "in control" of the AI. Sources: this comment and a talk by Paul at an AI safety workshop

strategy stealing ≠ literally stealing strategies

When Paul says "strategy stealing" he doesn't mean observing and copying someone else's strategy. It's a term borrowed from game theory that he's using to refer to coming up with strategies that are as effective as someone else's strategy in terms of gaining resources and other forms of flexible influence. Source


I try not to have opinions

September 30, 2019 - 22:52
Published on September 30, 2019 7:52 PM UTC

Consider two kinds of mental judgments.

  1. Beliefs are judgments of truths. (Ex. "I believe that 2 plus 2 equals 4." "I am 85% confident that Mozambique is located in Africa.")
  2. Preferences are things people like or don't like. The existence or nonexistence of a preference is a fact.

What, then, is an opinion?

"judgment or belief not founded on certainty or proof" — dictionary.com

To form a belief without proof or certainty is a recipe for overconfidence. The more opinions you have, the more likely you are to be wrong. More importantly, opinions undermine your error correction system. You can't correct an opinion via proof if the opinion was never founded on proof in the first place.

The most insidious opinions are those based on vague words like "good", "bad" and "should" because there does not necessarily even exist an underlying falsifiable fact to be wrong about.

It's okay to use "should" as shorthand for things everyone agrees on. You can say "Bob Shepherd is a good politician" because in this context the word "good" is shorthand for "a politician who will satisfy the preferences of his constituents". But if you encounter someone who prefers a world without Bob Shepherd then words like "good" don't just lose axiomatic meaning. "Good" never meant anything in the first place.

The moment you believe "Bob Shepherd is a good politician" where "good" refers to intrinsic merit instead of the aggregate of individual preferences is the moment you break the chain of reasoning connecting your beliefs to falsifiable facts. In other words, your belief about facts and preferences turns into an unfalsifiable opinion.


Connectome-specific harmonic waves and meditation

September 30, 2019 - 21:08
Published on September 30, 2019 6:08 PM UTC

TL;DR: meditation is a process for altering the harmonics of brainwaves to produce good brain states.

Pulling together a few threads:

I think we can infer a lot from these observations, but I'll leave those for another post.


[AN #66]: Decomposing robustness into capability robustness and alignment robustness

September 30, 2019 - 21:00
Published on September 30, 2019 6:00 PM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Starting this week, we have a few new summarizers; you can always find the whole team here. I (Rohin) will continue to edit all of the summaries and opinions, and add some summaries and opinions of my own.

Audio version here (may not be up yet).


2-D Robustness (Vladimir Mikulik) (summarized by Matthew): Typically when we think about machine learning robustness we imagine a scalar quantity representing how well a system performs when it is taken off its training distribution. When considering mesa optimization (AN #58), it is natural to instead decompose robustness into two variables: robust capabilities and robust alignment. When given an environment that does not perfectly resemble its training environment, a mesa optimizer could be dangerous by competently pursuing a mesa objective that is different from the loss function used during training. This combination of robust capabilities without robust alignment is an example of a malign failure, the most worrisome outcome of creating a mesa optimizer.

Matthew's opinion: Decomposing robustness in this way helps me distinguish misaligned mesa optimization from the more general problem of machine learning robustness. I think it's important for researchers to understand this distinction because it is critical for understanding why a failure to solve the robustness problem could plausibly result in a catastrophe rather than merely a benign capabilities failure.

Rohin's opinion: I strongly agree with this distinction, and in fact when I think about the problem of mesa optimization, I prefer to only think about models whose capabilities are robust but whose objective or goal is not, rather than considering the internals of the model and whether or not it is performing search, which seems like a much hairier question.

Technical AI alignment

Iterated amplification

Finding Generalizable Evidence by Learning to Convince Q&A Models (Ethan Perez et al) (summarized by Asya): This paper tries to improve performance on multiple-choice questions about text passages using a technique similar to AI safety via debate (AN #5). The set-up consists of a judge model and one or more evidence agents. First, the judge model is pretrained on samples consisting of a passage, a multiple-choice question about that passage, and the correct answer to that question. Then, in the experimental portion of the set-up, instead of looking at a full passage, the judge model looks at a subsequence of the passage created by combining the outputs from several evidence agents. Each evidence agent has been given the same passage and assigned a particular answer to the question, and must select a limited number of sentences from the passage to present to the judge model to convince it of that answer.

The paper varies several parameters in its setup, including the training process for the judge model, the questions used, the process evidence agents use to select sentences, etc. It finds that for many settings of these parameters, when judge models are tasked with generalizing from shorter passages to longer passages, or easier passages to harder passages, they do better with the new passages when assisted by the evidence agents. It also finds that the sentences given as evidence by the evidence agents are convincing to humans as well as the judge model.

Asya's opinion: I think it's a cool and non-trivial result that debating agents can in fact improve model accuracy. It feels hard to extrapolate much from this narrow example to debate as a general AI safety technique. The judge model is answering multiple-choice questions rather than e.g. evaluating a detailed plan of action, and debating agents are quoting from existing text rather than generating their own potentially fallacious statements.

What are the differences between all the iterative/recursive approaches to AI alignment? (Issa Rice)

Mesa optimization

Utility ≠ Reward (Vladimir Mikulik) (summarized by Rohin): This post describes the overall story from mesa-optimization (AN #58). Unlike the original paper, it focuses on the distinction between a system that is optimized for some task (e.g. a bottle cap), and a system that is optimizing for some task. Normally, we expect trained neural nets to be optimized; risk arises when they are also optimizing.

Agent foundations

Theory of Ideal Agents, or of Existing Agents? (John S Wentworth) (summarized by Flo): There are at least two ways in which a theoretical understanding of agency can be useful: On one hand, such understanding can enable the design of an artificial agent with certain properties. On the other hand, it can be used to describe existing agents. While both perspectives are likely needed for successfully aligning AI, individual researchers face a tradeoff: either they focus their efforts on existence results concerning strong properties, which helps with design (e.g. most of MIRI's work on embedded agency (AN #31)), or they work on proving weaker properties for a broad class of agents, which helps with description (e.g. all logical inductors can be described as markets, summarized next). The prioritization of design versus description is a likely crux in disagreements about the correct approach to developing a theory of agency.

Flo's opinion: To facilitate productive discussions it seems important to disentangle disagreements about goals from disagreements about means whenever we can. I liked the clear presentation of this attempt to identify a common source of disagreements on the (sub)goal level.

Markets are Universal for Logical Induction (John S Wentworth) (summarized by Rohin): A logical inductor is a system that assigns probabilities to logical statements (such as "the millionth digit of pi is 3") over time, that satisfies the logical induction criterion: if we interpret the probabilities as prices of contracts that pay out $1 if the statement is true and $0 otherwise, then there does not exist a polynomial-time trader function with bounded money that can make unbounded returns over time. The original paper shows that logical inductors exist. This post proves that for any possible logical inductor, there exists some market of traders that produces the same prices as the logical inductor over time.

Adversarial examples

E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles (Markus Kettunen et al) (summarized by Dan H): Convolutional neural networks are one of the best methods for assessing the perceptual similarity between images. This paper provides evidence that perceptual similarity metrics can be made adversarially robust. Out-of-the-box, network-based perceptual similarity metrics exhibit some adversarial robustness. While classifiers transform a long embedding vector to class scores, perceptual similarity measures compute distances between long and wide embedding tensors, possibly from multiple layers. Thus the attacker must alter far more neural network responses, which makes attacks on perceptual similarity measures harder for adversaries. This paper makes attacks even harder for the adversary by using a barrage of input image transformations and by using techniques such as dropout while computing the embeddings. This forces the adversarial perturbation to be substantially larger.

AI strategy and policy

Why Responsible AI Development Needs Cooperation on Safety (Amanda Askell et al) (summarized by Nicholas): AI systems are increasingly being developed by companies, and as such it is important to understand how competition will affect the safety and robustness of these systems. This paper models companies as agents engaging in a cooperate-defect game, where cooperation represents responsible development, and defection represents a failure to develop responsibly. This model yields five factors that increase the likelihood of companies cooperating on safety. Ideally, companies will have high trust that others cooperate on safety, large benefits from mutual cooperation (shared upside), large costs from mutual defection (shared downside), not much incentive to defect when others cooperate (low advantage), and not be harmed too much if others defect when they cooperate (low exposure).

They then suggest four concrete strategies that can help improve norms today. First, companies should help promote accurate beliefs about the benefits of safety. Second, companies should collaborate on research and engineering. Third, companies should be transparent and allow for proper oversight and feedback. Fourth, the community should incentivize adhering to high safety standards by rewarding safety work and penalizing unsafe behavior.

Nicholas's opinion: Given that much of current AI progress is being driven by increases in computation power, it seems likely to me that companies will soon become more significant players in the AI space. As a result, I appreciate that this paper tries to determine what we can do now to make sure that the competitive landscape is conducive to taking proper safety precautions. I do, however, believe that the single step cooperate-defect game which they use to come up with their factors seems like a very simple model for what will be a very complex system of interactions. For example, AI development will take place over time, and it is likely that the same companies will continue to interact with one another. Iterated games have very different dynamics, and I hope that future work will explore how this would affect their current recommendations, and whether it would yield new approaches to incentivizing cooperation.

Read more: The Role of Cooperation in Responsible AI Development

Other progress in AI

Hierarchical RL

Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives (Anirudh Goyal et al) (summarized by Zach): Learning policies that generalize to new environments is a fundamental challenge in reinforcement learning. In particular, humans seem to be adept at learning skills and understanding the world in a way that is compositional, hinting at the source of the discrepancy. Hierarchical reinforcement learning (HRL) has partially addressed the discrepancy by decomposing policies into options/primitives/subpolicies that a top-level controller selects from. However, generalization is limited because the top-level policy must work for all states.

In this paper, the authors explore a novel decentralized approach where policies are still decomposed into primitives, but without a top-level controller. The key idea is to incentivize each primitive to work on a different cluster of states. Every primitive has a variational information bottleneck between the state and predicted action, that allows us to quantify how much information about the state the primitive uses in selecting actions. Intuitively, a primitive that knows how to open gates is going to extract a lot of information about gates from the state to choose an appropriate action, and won’t extract much information in states without gates. So, our high-level controller can just be: check which primitive is using the most state information, and let that primitive choose the action.

The reward R from a trajectory is split amongst the primitives in proportion to how likely each primitive was to be chosen. This is what incentivizes the primitives to use information from the state. The primitives also get a cost in proportion to how much information they use, incentivizing them to specialize to a particular cluster of states. Finally, there is a regularization term that also incentivizes specialization, and in particular prevents a collapse where a single primitive is always active.
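The selection and reward-splitting rule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the softmax used to turn information usage into selection probabilities, the `beta` cost weight, and all numbers are assumptions made for the example, and the regularization term is left out.

```python
import math

def selection_weights(info_used):
    """Turn per-primitive information usage into selection probabilities.

    Using a softmax here is an ASSUMPTION of this sketch; the summary only
    says the primitive using the most state information gets to act.
    """
    peak = max(info_used)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in info_used]
    total = sum(exps)
    return [e / total for e in exps]

def primitive_objectives(reward, info_used, beta):
    """Reward share minus information cost for each primitive.

    `beta` is a hypothetical weight on the information cost.
    """
    weights = selection_weights(info_used)
    return [reward * w - beta * i for w, i in zip(weights, info_used)]

# Hypothetical information usage (e.g. in nats) for three primitives in one state.
info_used = [0.1, 2.0, 0.3]

weights = selection_weights(info_used)
# The "controller": pick whichever primitive uses the most state information.
active = max(range(len(info_used)), key=lambda k: info_used[k])
objectives = primitive_objectives(reward=10.0, info_used=info_used, beta=0.5)

print("selection weights:", [round(w, 3) for w in weights])
print("active primitive:", active)
print("per-primitive objectives:", [round(o, 3) for o in objectives])
```

The reward share pushes each primitive to use state information, while the cost term pushes it to do so only in the states where that information pays off.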

To demonstrate effectiveness, the authors compare the baseline HRL methods option-critic and Meta-learning Shared Hierarchy to their method in grid-world and motion imitation transfer tasks. They show that using an ensemble of primitives can outperform more traditional HRL methods in generalization across tasks.

Zach's opinion: Overall, this paper is compelling because the method presented is both promising and provides natural ideas for future work. The method presented here is arguably simpler than HRL and the ability to generalize to new environments is simple to implement. The idea of introducing competition at an information theoretic level seems natural and the evidence for better generalization capability is compelling. It'd be interesting to see what would happen if more complex primitives were used.

Miscellaneous (AI)

Unreproducible Research is Reproducible (Xavier Bouthillier et al) (summarized by Flo): This paper argues that despite the growing popularity of sharing code, machine learning research has a problem with reproducibility. It makes the distinction between the reproducibility of methods/results, which can be achieved by fixing random seeds and sharing code, and the reproducibility of findings/conclusions, which requires that different experimental setups (or at least random seeds) lead to the same conclusion.

Several popular neural network architectures are trained on several image classification datasets several times with different random seeds determining the weight initialization and sampling of data. The relative rankings of the architectures with respect to the test accuracy are found to vary meaningfully with the random seed for all data sets, as well as between data sets.
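To make the kind of check described above concrete, here is a minimal sketch that ranks architectures by test accuracy separately for each seed and counts how many distinct rankings appear. The architecture names and accuracy values are invented for illustration and are not results from the paper.

```python
# Hypothetical test accuracies, indexed as accuracies[seed][architecture].
accuracies = {
    0: {"net_a": 0.912, "net_b": 0.905, "net_c": 0.898},
    1: {"net_a": 0.903, "net_b": 0.910, "net_c": 0.899},
    2: {"net_a": 0.908, "net_b": 0.901, "net_c": 0.909},
}

def ranking(scores):
    """Return architecture names ordered from best to worst accuracy."""
    return tuple(sorted(scores, key=scores.get, reverse=True))

rankings = {seed: ranking(scores) for seed, scores in accuracies.items()}
distinct = set(rankings.values())

for seed, order in rankings.items():
    print(f"seed {seed}: {' > '.join(order)}")
print(f"{len(distinct)} distinct rankings across {len(rankings)} seeds")
```

If a conclusion such as "net_a beats net_b" only holds for some seeds, it is a reproducible result but not yet a reproducible finding in the paper's sense.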

The authors then argue that while the reproducibility of methods can help with speeding up exploratory research, the reproducibility of findings is necessary for empirical research from which robust conclusions can be drawn. They claim that exploratory research that is not based on robust findings can get inefficient, and so call for the machine learning community to do more empirical research.

Flo's opinion: I really like that this paper does not just claim that there is a problem with reproducibility, but demonstrates it more rigorously using an experiment. More robust empirical findings seem quite important for getting to a better understanding of machine learning systems in the medium term. Since this understanding is especially important for safety relevant research, where exploratory research seems more problematic by default, I am excited for a push in that direction.


Open Phil AI Fellowship (summarized by Rohin): The Open Phil AI Fellowship is seeking applications for its third cohort. Applications are due by October 25. The fellowship is open to current and incoming PhD students, including those with pre-existing funding sources. It provides up to 5 years of support with a stipend of $40,000 and a travel allocation of $10,000.


Copyright © 2019 Rohin Shah, All rights reserved.



Thermal Mass Thermos

September 30, 2019 - 16:10
Published on September 30, 2019 1:10 PM UTC

A thermos does a great job with liquids: the insulation means you can pour in hot soup in the morning, and enjoy hot soup at lunch. But it doesn't do nearly as good a job with solids: fill one with rice and it quickly falls into the "danger zone".

One recommendation you'll see is that you should fill the thermos with hot water, let it sit, and then swap the water for the food. This does help, but because the thermos doesn't have that much thermal mass it only helps some. What if we put a rock in the thermos to hold heat, and preheat it along with the rest of the thermos? How much does this help? Does this help enough?

The general rule is that hot food shouldn't be below 140F for more than two hours, because substantial bacteria can grow, and the closer it is to 100F the worse it is. My goal here is to be able to send hot dry food with Lily's lunch, and I pack it four hours before she gets to eat it, so the question is: what's the temperature two hours in, which is two hours before lunch time? If that's above 140F we're ok, if not we should figure something else out.

I ran an experiment on one cup of cooked rice, the kind of dry food you're not normally supposed to put in a thermos. I heated it in the microwave until it was steaming hot (it measured ~188F with an infrared thermometer).

I tested this in three configurations:

  • T1: a thermos that has been pre-heated with boiling water.

  • T2: same as T1, but with the addition of a 4.3oz (122g) rock with a volume of 1.5 fl oz (45ml).

  • T3: same as T2, but the thermos has been preheated with two rounds of boiling water.

After two hours I measured the temperature of the rice:
  • T1: 108F
  • T2: 136F (143F in the middle)
  • T3: 142F (148F in the middle)
The rock is clearly helping a lot, though it looks like I should do the full T3 treatment with two rounds of boiling water in this particular thermos.
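As a quick check of these numbers against the 140F rule, here is a minimal sketch that compares the two-hour readings to the threshold and converts them to Celsius. The readings are the ones listed above; treating exactly 140F as passing is an assumption of the sketch.

```python
SAFE_F = 140.0  # threshold from the two-hour rule described earlier

def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0

# Two-hour rice temperatures from the experiment, in Fahrenheit.
readings_f = {"T1": 108, "T2": 136, "T3": 142}

# Configurations whose reading is at or above the threshold.
passing = [name for name, temp in readings_f.items() if temp >= SAFE_F]

for name, temp in readings_f.items():
    status = "ok" if temp >= SAFE_F else "below threshold"
    print(f"{name}: {temp}F ({f_to_c(temp):.1f}C) {status}")
print("passing configurations:", passing)
```

Only the reading for the double-preheated thermos with the rock clears the threshold, which matches the conclusion above.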

This works, but it is kind of hacky: rocks are odd shaped, hard to clean, rattle around, hide food, etc. How else could we solve this? Ideas:

  • Use water instead of a rock, because water has excellent heat capacity by volume and very good heat capacity by weight. Have two sections: an outer one you pour boiling water into, and an inner one for the dry food. Keep heat from leaving the overall enclosure, but allow heat to move freely between the two sections.

  • Something microwave-safe which uses a solid for the thermal mass. Stick it in the microwave for a few minutes before putting the food in, or even with the food inside. Water has a maximum temperature of ~212F, but with a design like this you could bring your thermal mass up to, say, 300F (wrapped in an insulating layer so no one got burnt).

  • Use a phase-change material? Maybe like this travel mug but with a higher target temperature than 136F.

  • Battery powered cross between a thermos and a crock pot. I do see some things kind of like this aimed at people who want to keep coffee warm in their car (no battery; takes power from the car) and some expensive ones (ex) that have good internal batteries. But nothing I can find designed to eat warm dry food out of?
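To put a rough number on the water-versus-rock idea, here is a minimal sketch comparing how much heat each stores per milliliter per degree. The rock's mass and volume come from the experiment above; the rock's specific heat is an assumption (a typical value for granite-like stone, since the rock type isn't stated), and water's density is approximated as 1 g/ml even though it is slightly lower near boiling.

```python
ROCK_MASS_G = 122.0     # from the experiment above
ROCK_VOLUME_ML = 45.0   # from the experiment above

# ASSUMPTION: typical specific heat for granite-like rock, in J/(g*K).
ROCK_SPECIFIC_HEAT = 0.8
WATER_SPECIFIC_HEAT = 4.18  # J/(g*K)
WATER_DENSITY = 1.0         # g/ml, approximate

rock_density = ROCK_MASS_G / ROCK_VOLUME_ML  # g/ml

# Heat stored per milliliter per degree of temperature change, in J/(ml*K).
rock_per_ml = rock_density * ROCK_SPECIFIC_HEAT
water_per_ml = WATER_DENSITY * WATER_SPECIFIC_HEAT

ratio = water_per_ml / rock_per_ml

print(f"rock density: {rock_density:.2f} g/ml")
print(f"rock:  {rock_per_ml:.2f} J/(ml*K)")
print(f"water: {water_per_ml:.2f} J/(ml*K)")
print(f"water stores about {ratio:.1f}x as much heat per unit volume")
```

Under these assumptions water holds roughly twice as much heat as the rock in the same space, but it tops out at ~212F while a solid can be heated well past that.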

Why isn't there a product designed to keep dry food warm? Am I just not finding it?


What's your favorite notetaking system?

September 30, 2019 - 09:18
Published on September 30, 2019 6:18 AM UTC

Abram recently wrote up the Zettelkasten system for notetaking while doing research. Do you have an opinion on an alternative system, and if so, what is it?

Things you might optionally include (but don't let the lack of any of them stop you from answering):

  • How the system works
  • Plusses and minuses of the system
  • Who or what problems you think it works especially well and poorly for
  • Comparisons to other notetaking systems.
  • Your final judgement of the system.


Noticing Frame Differences

September 30, 2019 - 04:24
Published on September 30, 2019 1:24 AM UTC

Previously: Keeping Beliefs Cruxy

When disagreements persist despite lengthy good-faith communication, the cause may not just be factual disagreement. It could be that people are operating in entirely different frames: different ways of seeing, thinking and/or communicating.

If you can’t notice when this is happening, or you don’t have the skills to navigate it, you may waste a lot of time.

Examples of Broad Frames

Gears-oriented Frames

Bob and Alice’s conversation is about cause and effect. Neither of them is planning to take direct actions based on their conversation; they’re each just interested in understanding a particular domain better.

Bob has a model of the domain that includes gears A, B, C and D. Alice has a model that includes gears C, D and F. They’re able to exchange information, their information is compatible, and they end up with a shared model of how something works.

There are other ways this could have gone. Ben Pace covered some of them in a sketch of good communication:

  • Maybe they discover their models don’t fit, and one of them is wrong.
  • Maybe combining their models results in a surprising, counterintuitive outcome that takes them a while to accept.
  • Maybe they fail to integrate their models, because they were working at different levels of abstraction and didn’t realize it.

Sometimes they might fall into subtler traps.

Maybe the thing Alice is calling “Gear C” is actually different from Bob’s “Gear C”. It turns out that they were using the same words to mean different things, and even though they’d both read blogposts warning them about that they didn’t notice.

So Bob tries to slot Alice’s gear F into his gear C and it doesn’t fit. If he doesn’t already have reason to trust Alice’s epistemics, he may conclude Alice is crazy (rather than realizing they are referring to subtly different concepts).

This may cause confusion and distrust.

But, the point of this blogpost is that Alice and Bob have it easy.

They’re actually trying to have the same conversation. They’re both trying to exchange explicit models of cause-and-effect, and come away with a clearer understanding of the world through a reductionist lens.

There are many other frames for a conversation though.

Feelings-Oriented Frames

Clark and Dwight are exploring how they feel and relate to each other.

The focus of the conversation might be navigating their particular relationship, or helping Clark understand why he’s been feeling frustrated lately.

When the Language of Feelings justifies itself to the Language of Gears, it might say things like: “Feelings are important information, even if it’s fuzzy and hard to pin down or build explicit models out of. If you don’t have a way to listen and make sense of that information, your model of the world is going to be impoverished. This involves sometimes looking at things through lenses other than what you can explicitly verbalize.”

I think this is true, and important. The people who do their thinking through a gear-centric frame should be paying attention to feelings-centric frames for this reason. (And meanwhile, feelings themselves totally have gears that can be understood through a mechanistic framework)

But for many people that’s not actually the point when looking through a feelings-centric frame. And not understanding this may lead to further disconnect if a Gearsy person and a Feelingsy person are trying to talk.

“Yeah feelings are information, but, also, like, man, you’re a human being with all kinds of fascinating emotions that are an important part of who you are. This is super interesting! And there’s a way of making sense of it that’s necessarily experiential rather than about explicit, communicable knowledge.”

Frames of Power and Negotiation

Dominance and Threat

Erica is Frank’s boss. They’re discussing whether the project Frank has been leading should continue, or whether it should stop and all the people on Frank’s team be reassigned.

Frank argues there’s a bunch of reasons his project is important to the company (i.e. it provides financial value). He also argues that it’s good for morale, and that cancelling the project would make his team feel alienated and disrespected.

Erica argues back that there are other projects that are more financially valuable, and that his team’s feelings aren’t important to the company.

It so happens that Frank is up for a promotion soon, which would put him (going forward) on more even footing with Erica, rather than her being his superior.

It’s not (necessarily) about the facts, or feelings.

If Alice and Bob wandered by, they might notice Erica or Frank seeming to make somewhat basic reasoning mistakes about how much money the project would make or why it was valuable. Naively, Alice might point out that they seem to be engaging in motivated reasoning.

If Clark or Dwight wandered by, they might notice that Erica doesn’t seem to really be engaging with Frank’s worries about team morale. Naively, Clark might say something like “Hey, you don’t seem to really be paying attention to what Frank’s team is experiencing, and this is probably relevant to actually having the company be successful.”

But the conversation is not about sharing models, and it’s not about understanding feelings. It’s not even necessarily about “what’s best for the company.”

Their conversation is a negotiation. For Erica and Frank, most of what’s at stake are their own financial interests, and their social status within the company.

The discussion is a chess board. Financial models, worker morale, and explicit verbal arguments are more like game pieces than anything to be taken at face value.

This might be fully transparent to both Erica and Frank (such that neither even considers the other deceptive). Or, they might both earnestly believe what they’re saying – but nonetheless, if you try to interpret the conversation as a practical decision about what’s best for the company, you’ll come away confused.

The Language of Trade

George and Hannah are negotiating a trade.

Like Erica and Frank, this is ultimately a conversation about what George and Hannah want.

A potential difference is that Erica and Frank might think of their situation as zero-sum, and therefore most of the resolution has more to do with figuring out “who would win in a political fight?”, and then having the counterfactual loser back down.

Whereas George/Hannah might be actively looking for positive sum trades, and in the event that they can’t find one, they just go about their lives without getting in each other’s way.

(Erica and Frank might also look for opportunities to trade, but doing so honestly might first require them to establish the degree to which their desires are mutually incompatible and who would win a dominance contest. Then, having established their respective positions, they might speak plainly about what they have to offer each other)

Noticing obvious frame differences

So the first skill here is noticing when you’re having wildly different expectations about what sort of conversation you’re having.

If George is looking for a trade and Frank is looking for a fight, George might find himself suddenly bruised in ways he wasn’t prepared for. And/or, Frank might have randomly destroyed resources when there’d been an opportunity for positive sum interaction.

Or: If Dwight says “I’m feeling so frustrated at work. My boss is constantly belittling me”, and then Bob leaps in with an explanation of why his boss is doing that and maybe trying to fix it…

Well, this one is at least a stereotypical relationship failure mode you’ve probably heard of before (where Dwight might just want validation).

Untangling Emotions, Beliefs and Goals

A more interesting example of Gears-and-Feelings might be something like:

Alice and Dwight are talking about what career options Dwight should consider. (Dwight is currently an artist, not making much money, and has decided he wants to try something else.)

Alice says “Have you considered becoming a programmer? I hear they make a lot of money and you can get started with a 3 month bootcamp.”

Dwight says “Gah, don’t talk to me about programming.”

It turns out that Dwight’s dad always pushed him to learn programming, in a fairly authoritarian way. Now Dwight feels a bunch of ughiness around programming, with a mixture of “You’re not the boss of me! I’mma be an artist instead!”

In this situation, perhaps the best option might be to say: “okay, seems like programming isn’t a good fit for Dwight,” and move on.

But it might also be that programming is actually a good option for Dwight to consider… it’s just that the conversation can’t proceed in the straightforward cost/benefit analysis frame that Alice was exploring.

Dwight making meaningful updates on whether programming is good for him depends on untangling his emotions, and/or exploring the relationship between his explicit models and his messier internals. It might require making peace with some longstanding issues with his father, or learning to detach them from the “should I be a programmer” question.

It might be that the most useful thing Alice can do is give him the space to work through that on his own.

If Dwight trusts Alice to shift into a feelings-oriented framework (or a framework that at least includes feelings), Alice might be able to directly help him with the process.

It may also be that this prerequisite trust doesn't exist, or that Dwight just doesn't want to have this conversation, in which case it's probably just best to move on to another topic.

Subtle differences between frames

This gets much more complicated when you observe that a) there are lots of slight variations on frames, and b) many people and conversations involve a mixture of frames.

It’s not that hard to notice that one person is in a feelings-centric frame while another person is in a gears-centric frame. But things can actually get even more confusing if two people share a broad frame (and so think they should be speaking the same language), but actually they’re communicating in two different subframes.

Example differences between gears-frames

Consider variations of Alice and Bob – both focused on causal models – who are coming from these different vantage points:

Goal-oriented vs Curiosity-driven conversation

Alice is trying to solve a specific problem (say, get a particular car engine fixed), and Bob thinks they’re just, like, having a freewheeling conversation about car engines and how neat they are (and if their curiosity took them in a different direction they might shift the conversation towards something that had nothing to do with car engines).

Debate vs Doublecrux

Alice is trying to present arguments for her side, and expects Bob to refute those arguments or present different arguments. The burden of presenting a good case is on Bob.

Whereas Bob thinks they’re trying to mutually converge on true beliefs (which might mean adopting totally new positions, and might involve each person focusing on how to change their own mind rather than their partner’s).

Specific ontologies

If one person is, say, really into economics, then they might naturally frame everything in terms of transactions. Someone else might be really into programming and see everything as abstracted functions that call each other.

They might keep phrasing things in terms that fit their preferred ontology, and have a hard time parsing statements from a different ontology.

Example differences between feelings-frames

“Mutual Connection” vs “Turn Based Sharing”

Clark might be trying to share feelings for the sake of building connection (sharing back and forth, getting into a flow, getting resonance).

Whereas Dwight might think the point is more for each of them to fully share their own experience, while the other one listens and takes up as little space as possible.

“I Am My Feelings” vs “My Feelings are Objects”

Clark might strongly self-identify with his feelings (in a sort of Romantic framework). Dwight might care a lot about understanding his feelings but see them as temporary objects in his experience (sort of Buddhist).

Concrete example: The FOOM Debate

One of my original motivations for this post was the Yudkowsky/Hanson Foom Debate, where much ink was spilled but AFAICT neither Yudkowsky nor Hanson changed their mind much.

I recently re-read through some portions of it. The debate seemed to feature several of the “differences within gears-orientation” listed above:

Specific ontologies: Hanson is steeped in economics and sees it as the obvious lens to look at AI, evolution and other major historical forces. Yudkowsky instead sees things through the lens of optimization, and how to develop a causal understanding of what recursive optimization means and where/whether we’ve seen it historically.

Goal vs Curiosity: I have an overall sense that Yudkowsky is more action oriented – he’s specifically setting out to figure out the most important things to do to influence the far future. Whereas Hanson mostly seems to see his job as “be a professional economist, who looks at various situations through an economic lens and see if that leads to interesting insights.”

Discussion format: Throughout the discussion, Hanson and Yudkowsky are articulating their points using very different styles. On my recent read-through, I was impressed with how explicitly they discussed this:

Eliezer notes:

I think we ran into this same clash of styles last time (i.e., back at Oxford). I try to go through things systematically, locate any possible points of disagreement, resolve them, and continue. You seem to want to jump directly to the disagreement and then work backward to find the differing premises. I worry that this puts things in a more disagreeable state of mind, as it were—conducive to feed-backward reasoning (rationalization) instead of feed-forward reasoning. It’s probably also worth bearing in mind that these kinds of metadiscussions are important, since this is something of a trailblazing case here. And that if we really want to set up conditions where we can’t agree to disagree, that might imply setting up things in a different fashion than the usual Internet debates.

Hanson responds:

When I attend a talk, I don’t immediately jump on anything a speaker says that sounds questionable. I wait until they actually make a main point of their talk, and then I only jump on points that seem to matter for that main point. Since most things people say actually don’t matter for their main point, I find this to be a very useful strategy. I will be very surprised indeed if everything you’ve said mattered regarding our main point of disagreement.

I find both of these points quite important, and I've run into each failure mode before. I'm unsure how to navigate between this rock and hard place.

My main goal with this essay was to establish frame-differences as an important thing to look out for, and to describe the concept from enough different angles to (hopefully) give you a general sense of what to look for, rather than a single failure mode.

What to do once you notice a frame-difference depends a lot on context, and unfortunately I'm often unsure what the best approach is. The next few posts will approach "what has sometimes worked for me", and (perhaps more sadly) "what hasn't."