LessWrong.com News

A community blog devoted to refining the art of rationality

Remote AI alignment writing group seeking new members

January 18, 2020 - 05:10
Published on January 18, 2020 2:10 AM UTC

If you're interested, then email me (<given name of President Nixon>.moehn@posteo.de) or PM me on LessWrong.

We meet weekly for 15 min ⨉ number of members. At the moment the meetings consist of explaining aspects of our research to each other. This helps us get better at explaining (sometimes off the cuff), as well as at following technical explanations and coming up with questions about them. Also, questions from others are a good way to find holes in one’s understanding and patch them. It's better to find them among friends and patch them early than to get flustered in public later.

The research topics of the two current members are:

  • impact measures/instrumental convergence à la Alex Turner
  • iterated distillation and amplification à la Paul Christiano

Even if your topic is far afield, we'll try and make it work. Explanations and questions are universal.

There used to be four members. This means:

  1. The meetings are useful for some and not for others. If you're unsure, I suggest you join. If you don't get enough value for your time, you can leave again. We respect that.

  2. Two spots are free!

See also the original announcement.


Being a Robust, Coherent Agent (V2)

January 18, 2020 - 05:06
Published on January 18, 2020 2:06 AM UTC

Second version, updated for the 2018 Review. See change notes.

There's a concept which many LessWrong essays have pointed at (honestly, the entire Sequences are getting at it). But I don't think there's a single post really spelling it out explicitly:

You might want to become a more robust, coherent agent.

By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies. Some people find this naturally motivating – there’s something aesthetically appealing about being a coherent agent.

But if it’s not naturally appealing, the reason I think it’s worth considering is robustness – being able to adapt to novel challenges in complex domains.

This is related to instrumental rationality, but I don’t think they’re identical. If your goals are simple and well-understood, and you're interfacing in a social domain with clear rules, and/or you’re operating in domains that the ancestral environment would have reasonably prepared you for… the most instrumentally rational thing might be to just follow your instincts or common folk-wisdom.

But instinct and common wisdom often aren’t enough, such as when...

  • You expect your environment to change, and default-strategies to stop working.
  • You are attempting complicated plans for which there is no common wisdom, or where you will run into many edge-cases.
  • You need to coordinate with other agents in ways that don’t have existing, reliable coordination mechanisms.
  • You expect instincts or common wisdom to be wrong in particular ways.
  • You are trying to outperform common wisdom. (i.e. you’re a maximizer instead of a satisficer, or are in competition with other people following common wisdom)

In those cases, you may need to develop strategies from the ground up. Your initial attempts may actually be worse than the common wisdom. But in the long term, if you can acquire gears-level understanding of yourself, the world and other agents, you might eventually outperform the default strategies.

Elements of Robust Agency

I think of Robust Agency as having a few components. This is not exhaustive, but an illustrative overview:

  • Deliberate Agency
  • Gears-level-understanding of yourself
  • Coherence and Consistency
  • Game Theoretic Soundness

Deliberate Agency

Make conscious choices about your goals and decision procedures that you reflectively endorse, rather than going with whatever kludge of behaviors evolution and your social environment cobbled together.

Gears Level Understanding of Yourself

In order to reflectively endorse your goals and decisions, it helps to understand your goals and decisions, as well as various intermediate parts of yourself. This requires many subskills, including the ability to introspect on yourself, and make changes to how your decision making works.

(Meanwhile, it also helps to understand how your decisions interface with the rest of the world, and the people you interact with. Gears-level understanding is generally useful. Scientific and mathematical literacy helps you validate your understanding of the world.)

Coherence and Consistency

If you want to lose weight and also eat a lot of ice cream, that’s a valid set of human desires. But, well, it might just be impossible.

If you want to make long term plans that require commitment but also want the freedom to abandon those plans whenever, you may have a hard time (also, people you made plans with might get annoyed).

You can make deliberate choices about how to resolve inconsistencies in your preferences. Maybe you decide “actually, losing weight isn’t that important to me”, or maybe you decide that you want to keep eating all your favorite foods but also cut back on overall calorie consumption.

Commitment versus freedom gets at a deeper issue – each of those opens up a set of broader strategies, some of which are mutually exclusive. How you resolve the tradeoff will shape what future strategies are available to you.

There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.

Game Theoretic Soundness

There are other agents out there. Some of them have goals orthogonal to yours. Some have common interests with you, and you may want to coordinate with them. Others may be actively harming you and you need to stop them.

They may vary in…

  • How much they've thought about their goals.
  • What their goals are.
  • Where their circles of concern are drawn.
  • How hard (and how skillfully) they're trying to be game-theoretically sound agents, rather than just following local incentives.
  • Beliefs and strategies.

Being a robust agent means taking that into account. You must find strategies that work in a messy, mixed environment with confused allies, active adversaries, and sometimes people who are a little bit of both. (This includes creating credible incentives and punishments to deter adversaries from bothering, and motivating allies to become less confused).

Related to this is legibility. Your gears-level-model-of-yourself helps you improve your own decision making. But it also lets you clearly expose your policies to other people. This can help with trust and coordination. If you have a clear decision-making procedure that makes sense, other agents can validate it, and then you can tackle more interesting projects together.


Here’s a smattering of things I’ve found helpful to think about through this lens:

  • Be the sort of person that Omega (even a version of Omega who's only 90% accurate) can clearly tell is going to one-box. Or, less exotically: Be the sort of person who your social network can clearly see is worth trusting, with sensitive information, or with power. Deserve Trust.
  • Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into a trap.
  • Think about the ramifications of people who think like you adopting the same strategy. Not as a cheap rhetorical trick to get you to cooperate on every conceivable thing. Actually think about how many people are similar to you. Actually think about the tradeoffs of worrying about a given thing. (Is recycling worth it? Is cleaning up after yourself at a group house? Is helping a person worth it? The answer actually depends, don't pretend otherwise).
  • If there isn't enough incentive for others to cooperate with you, you may need to build a new coordination mechanism so that there is enough incentive. Complaining or getting angry about it might be a good enough incentive, but it often doesn't work and/or isn't quite incentivizing the thing you meant. (Be conscious of the opportunity costs of building this coordination mechanism instead of other ones. Be conscious of the risk of trying and failing to build a coordination mechanism. Mindshare is only so big.)
  • Be the sort of agent who, if some AI engineers were whiteboarding out the agent's decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
  • Be cognizant of order-of-magnitude. Prioritize (both for things you want for yourself, and for large scale projects shooting for high impact).
  • Do all of this realistically given your bounded cognition. Don't stress about implementing a game-theoretically perfect strategy, but do be cognizant of how much computing power you actually have (and periodically reflect on whether your cached strategies can be re-evaluated given new information or more time to think). If you're being simulated on a whiteboard right now, have at least a vague, credible notion of how you'd think better if given more resources.
  • Do all of this realistically given the bounded condition of *others*. If you have a complex strategy that involves rewarding or punishing others in highly nuanced ways.... and they can't figure out what your strategy is, you may instead just be adding random noise instead of a clear coordination protocol.
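
A quick expected-value calculation illustrates why the Omega bullet above holds even for an imperfect predictor. This is a minimal sketch under stated assumptions, not part of the original post: it assumes the classic Newcomb payoffs ($1,000,000 placed in the opaque box only if Omega predicts one-boxing, $1,000 always in the transparent box), and it uses the simple evidential-style calculation in which Omega's prediction matches your actual choice with the given accuracy. Causal decision theory frames the problem differently.

```python
# Minimal sketch: expected payoff in Newcomb's problem with an imperfect predictor.
# ASSUMPTION: classic Newcomb payoffs; these numbers are not from the post.
OPAQUE_PRIZE = 1_000_000   # put in the opaque box only if Omega predicts one-boxing
TRANSPARENT_PRIZE = 1_000  # always in the transparent box


def expected_payoff(one_box: bool, accuracy: float) -> float:
    """Expected payoff if Omega predicts your actual choice with probability `accuracy`."""
    if one_box:
        # The opaque box is full only when Omega correctly predicted one-boxing.
        return accuracy * OPAQUE_PRIZE
    # A two-boxer always gets the transparent prize, plus the opaque prize
    # only when Omega mistakenly predicted one-boxing.
    return (1 - accuracy) * OPAQUE_PRIZE + TRANSPARENT_PRIZE


def break_even_accuracy() -> float:
    """Accuracy above which one-boxing has the higher expected payoff.

    Solves a * O = (1 - a) * O + T for a, giving a = (O + T) / (2 * O).
    """
    return (OPAQUE_PRIZE + TRANSPARENT_PRIZE) / (2 * OPAQUE_PRIZE)


if __name__ == "__main__":
    for accuracy in (0.9, 0.99):
        print(
            f"accuracy={accuracy}: one-box={expected_payoff(True, accuracy):,.0f}, "
            f"two-box={expected_payoff(False, accuracy):,.0f}"
        )
    print(f"one-boxing wins above accuracy {break_even_accuracy():.4f}")
```

Under these assumptions a 90%-accurate Omega gives roughly 900,000 for a reliable one-boxer versus about 101,000 for a two-boxer, and one-boxing comes out ahead at any accuracy above 0.5005. The point carries over to the less exotic case: being legibly trustworthy pays off even when others can only read you imperfectly.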

Why is this important?

The world is unpredictable

The world is changing, rapidly, due to cultural clashes as well as new technology. Common wisdom can’t handle the 20th century, let alone a singularity.

I feel comfortable making the claim: Your environment is almost certainly unpredictable enough that you will benefit from a coherent approach to solving novel problems. Understanding your goals and your strategy is vital.

There are two main reasons I can see to not prioritize the coherent agent strategy:

1. There may be higher near-term priorities.

You may want to build a safety net, to give yourself enough slack to freely experiment. It may make sense to first do all the obvious things to get a job, enough money, and social support. (That is, indeed, what I did.)

I'm not kidding when I say that building your decision-making from the ground up can leave you worse off in the short term. The valley of bad rationality be real, yo. See this post for some examples of things to watch out for.

I think becoming a more coherent agent is useful, but if you don't have general safety, I'd prioritize that first.

2. Self-reflection and self-modification are hard.

It requires a certain amount of mental horsepower, and some personality traits that not everyone has, including:

  • Social resilience and openness-to-experience (necessary to try nonstandard strategies).
  • Something like ‘stability’ or ‘common sense’ (I’ve seen some people try to rebuild their decision theory from scratch and end up hurting themselves).
  • In general, the ability to think on purpose, and do things on purpose.

If you’re the sort of person who ends up reading this post, I think you are probably the sort of person who would benefit (someday, from a position of safety/slack) from attempting to become more coherent, robust and agentic.

I’ve spent the past couple years hanging around people who are More Agentic Than Me. It took a long while to really absorb their worldview and understand how and why I might want to develop as an agent.

I hope this post gives others a clearer idea of what this path might look like, so they can consider it for themselves.

Game Theory in the Rationalsphere

That said, the reason I was motivated to write this wasn’t to help individuals. It was to help with group coordination.

The EA, Rationality and X-Risk ecosystems include lots of people with ambitious, complex goals. They have many common interests and should probably be coordinating on a bunch of stuff. But they disagree on many facts and strategies. They vary in how hard they’ve tried to become game-theoretically-sound agents.

My original motivation for writing this post was that I kept seeing what seemed to me to be strategic mistakes in coordination. It seemed to me that they were acting as if the social landscape was more uniform, and expecting people to be on the same “meta-page” of how to resolve coordination failure.

But then I realized that I’d been implicitly assuming something like “Hey, we’re all trying to be robust agents, right? At least kinda? Even if we have different goals and beliefs and strategies?”

And that wasn’t obviously true in the first place.

I think it’s much easier to coordinate with people if you are able to model each other. If people have common knowledge of a shared meta-strategic-framework, it’s easier to discuss strategy and negotiate. If multiple people are trying to make their decision-making robust in this way, that hopefully can constrain their expectations about when and how to trust each other.

And if you aren’t sharing a meta-strategic-framework, that’s important to know!

So the most important point of this post is to lay out the Robust Agent paradigm explicitly, with a clear term I could quickly refer to in future discussions, to check “is this something we’re on the same page about, or not?” before continuing on to discuss more complicated ideas.


"How quickly can you get this done?" (estimating workload)

January 18, 2020 - 03:10
Published on January 18, 2020 12:10 AM UTC

Epistemic status: based partly upon what I learnt as a certified ScrumMaster and mainly from practical experience running a software development team. Some of the later ideas come from Gower Handbook of Project Management, Judgement in Managerial Decision Making, and How to Measure Anything.

When asked how long something will take, you have hidden ambiguity in the question and multiple sources of error. Here we will attempt to clarify both and start to address some of these issues.

What is the question?

Let's start with the ambiguity of the question. Let's ignore ambiguity over the definition of 'finished' and any errors in your estimation for now. Let's also assume that you already have a prioritised list of projects/work-packages, some of which will be in progress and some of which won't have been started yet (I will take the terminology from Scrum and call this the 'backlog', but it can apply to anything, not just projects as we traditionally think of them). Here are a few things that I think people actually mean when they ask "how long will this [project] take?":

  1. If you started this now and worked on nothing else (during your usual work hours), when would this task be completed?
  2. If you started this project now and worked on no other projects (during your usual work hours), but still had to deal with the usual non-development work (such as meetings, firefighting and emails), when would this task be completed?
  3. If you worked on this project as the absolute highest priority work, how many hours would you have spent on this project by the time it is finished?
  4. Given that there are other projects that are in progress or even higher priority than this, and that some of your time is spent in non-development work, when will this project be finished?
  5. As with 4, but realising that you are likely to add other work into the backlog, some of which will be higher priority than the project in discussion.

There is an important subtlety to the phrasing of these questions which greatly changes their answers. This is not a problem if the questioner and the estimator agree on the definition, but if you wanted an answer to variant 5 and got the answer to variant 1, then there is a problem.
Let's go through an example to show how these variants differ and which is the best version in different scenarios.


Suppose a customer has requested a feature to be added to some software, and that you are responsible for the team that is developing the software.

To work out where you should add it to the backlog (i.e. its priority compared to other things your team could work on), you want to know the "size" of the work. In this case you do not care whether your team has lots of meetings on at the moment, or about the priority of other projects, so variant 3 is the best version. Let's say that you have a perfectly accurate and precise predicting machine that says this is 2 hours of work (I want to revisit this simplification in another post). But it's important to give that number along with the understanding that you cannot actually get the work done in that time. Your team will have non-development work to do: they will have meetings and need breaks, and if you know they have to deal with "firefighting" of emergencies, then this needs to be factored in too. You go back to your predicting machine and ask it what fraction of time your team can currently focus on development work; it replies "50%".
Unfortunately, the answer to variant 2 is not "in 4 hours' time" (2h / 50%), because there will be times when you cannot progress on the project because you are waiting on something (most commonly other people). Let's say that for this project you need sign-off from the marketing department roughly halfway through the work (other common reasons might be waiting for access, for information, for the results of other people's work, etc.). So we go back to our predicting machine and it says "the reply from marketing will take 3h; there will be no other delays". This gives us our answer to variant 2: "the work can be done in 8 hours, but 3 of those hours we will be waiting on others, and 2 of those will be on unrelated non-development work". Note that by being specific when answering these questions you reduce confusion.
This still doesn't help our customer, because we are unlikely to start the work right away. We will probably finish off the work we are on, possibly implement other higher-priority work, and then start. We go back to our prediction machine, which says "the projects that are in progress or higher priority than this one (from the backlog) will be complete in 4 days' time". We add that to the 8h for the project in question and get the answer to variant 4: "assuming the backlog doesn't change, the project will be complete in 5 days". But we know that isn't a realistic assumption: just like this project, we regularly add new projects to the backlog, and that shifts the priorities and thus the timescales for each item to be complete. We go back to our prediction machine and it says "before you get started on the project in question you will add a further 6 days of development work that is higher priority than the project in question". So you have to add this extra 6 days to the 5 days, take into account any weekends and holidays, and give the final answer to variant 5, the one the customer actually cares about: "it will be 11 working days before you can use this feature".
So we have gone from a piece of work that will only take 2-hours to one that will not be ready for 11 working days. This is the scale of the ambiguity.
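
The chain of variants in this example can be written down as a small calculation. The sketch below is a hypothetical helper, not something from the post: it assumes an 8-hour working day, takes the focus fraction and waiting time exactly as the predicting machine reports them, rounds the project itself up to whole working days, and ignores weekends and holidays. Note that summing the stated components (2h of work at 50% focus is 4h, plus a 3h wait) gives 7 hours rather than the 8 quoted above; once the project is rounded up to one working day, the 5-day and 11-day answers are the same either way.

```python
import math
from dataclasses import dataclass

HOURS_PER_DAY = 8  # ASSUMPTION: one working day is 8 hours; weekends and holidays ignored


@dataclass
class WorkEstimate:
    """Answers from the (idealized) predicting machine in the example."""

    effort_hours: float      # variant 3: hours of development effort
    focus_fraction: float    # share of time the team spends on development work
    wait_hours: float        # time blocked on others, e.g. marketing sign-off
    queue_days: int          # in-progress or higher-priority work already ahead of this
    future_insert_days: int  # higher-priority work expected to be added before we start

    def variant2_hours(self) -> float:
        """Elapsed hours if started now, alongside the usual non-development work."""
        return self.effort_hours / self.focus_fraction + self.wait_hours

    def variant4_days(self) -> int:
        """Working days until done, assuming the backlog does not change."""
        # Round the project itself up to whole working days.
        project_days = math.ceil(self.variant2_hours() / HOURS_PER_DAY)
        return self.queue_days + project_days

    def variant5_days(self) -> int:
        """Working days until done, allowing for new higher-priority work."""
        return self.variant4_days() + self.future_insert_days


feature = WorkEstimate(effort_hours=2, focus_fraction=0.5, wait_hours=3,
                       queue_days=4, future_insert_days=6)
print(feature.variant2_hours())  # 7.0
print(feature.variant4_days())   # 5
print(feature.variant5_days())   # 11
```

Keeping each variant as a separate method makes it easy to quote all of them together, which is what the longer-sentence answer recommended in the conclusion amounts to.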

These figures are not unrealistic either, though I would be working hard as a team lead to increase the 50% development work figure and reduce or eliminate the 3h wait for marketing sign-off.

What is 'done'/'complete'?

Let's add one piece of complexity back in. What do you mean when you say done? Sticking with the example of a piece of software, here are a few possible parts of what you might mean by done; there are more, and many of them have their own ambiguity:

  1. basic functionality works on the machine of one developer: "it's working for me"
  2. thoroughly tested
  3. thoroughly documented
  4. an automatic test suite written that covers every line changed
  5. deployed so a (/every) customer can use the feature

Again, there might be a doubling or more of workload between "working on my machine" and deployed and documented, ready for customers to use.
The only point I will make on this is to recognise the ambiguity and address it upfront. But unlike in the last section, the definitions will depend a lot on the area you are working in.

Conclusion and Recommendations

I will leave the advice on how to improve estimation for another time (please let me know if you are interested in this), but there are a few points to take away. Firstly, "how long will this take?" or "when can you get this done by?" is a tricky question even if you are a well-calibrated (accurate) and precise estimator. It's best to give your answer not just in hours but as a longer sentence, "it's only 2h work but we won't be done for 11 days", or more verbosely, "the work can be done in 8 hours, but 3 of those hours we will be waiting on others, and 2 of those will be on unrelated non-development work. We probably won't start it for another 10 working days though".


Studying Early Stage Science: Research Program Introduction

January 18, 2020 - 01:12
Published on January 17, 2020 10:12 PM UTC

(Leverage just posted the below introduction into their current research program to Medium, and I got permission to crosspost it here. I think it's quite a good read, and I more broadly think that the history of science is one of the best ways to study the art of rationality)


Scientific progress is responsible for some of the most amazing developments in human history. It has enabled us to cure diseases, increase agricultural yields, and travel quickly and safely across the globe. As such, the opportunity to enable progress in science can be very valuable.

Having studied examples from the history of science, we at Leverage Research believe it is possible to describe how early discoveries led to the creation of impressive and sophisticated scientific disciplines. By understanding the methodologies used by the researchers at key points in the past, as well as the social and institutional contexts that enabled them to make progress, we believe it may be possible to help modern researchers make progress in new or stagnating fields.

In this paper, we outline a research program designed to investigate this hypothesis. We describe three important examples from the history of science, identify a phenomenon worth studying, state a specific formulation of our hypothesis, and describe our research methodology¹. With this paper and the work that follows it, we hope to set the stage for the study of early stage science.

You can read more about early stage science on our website.

Three Historical Cases

Galvani, Volta, and the Invention of the Battery

In 1780, an Italian physician and biologist named Luigi Galvani announced the discovery of a new electrical phenomenon, which he called “animal electricity.”

For some time, Galvani had been performing experiments using dissected frogs to investigate how electricity interacted with animal bodies. In his most famous experiment, he found that he could make dissected frog legs twitch by hanging them on iron hooks and probing them with a piece of metal. He concluded on the basis of his investigations that an electrical charge was being generated by the frog leg itself, and that this “animal electricity” was generated by an electrical fluid in the frogs (Cajavilca, Varon, and Sternbach 2009, 160).

Figure 1: Galvani experimenting on frogs

Galvani’s research attracted others to investigate animal electricity. In particular, it drew the attention of an Italian physicist and chemist by the name of Alessandro Volta.

Volta began to suspect that electricity was being generated from a source other than the frog itself. Volta’s hypothesis was that the electrical charge was instead created by Galvani’s use of different metals to mount and probe the frog leg, and further that certain metals were naturally disposed to pass an electric charge given the presence of a conductor.

To test his hypothesis, Volta needed some way to detect the presence of an electric charge. In the days before the invention of precise instruments for measuring electricity, this was no small challenge. Others had used the creation of visible sparks or the physical sensation of getting an electric shock as a way of detecting electricity, but these effects tended to require a strong current and Volta suspected that the current involved in the frog leg experiment was relatively small.

He needed a more sensitive instrument. So, he used his tongue.

He reasoned that if a frog leg would conduct electricity, his tongue probably would too. And, Volta knew from previous experiments that an electric charge applied to the tongue could create a bitter sensation. This let him conduct an experiment: he touched either side of his tongue with combinations of metals and used the presence or absence of the bitter sensation to detect an electrical charge. He found that some combinations of metals did cause the bitter sensation, leading him to conclude that it was in fact the metals that passed a current through his tongue, rather than his tongue generating its own “animal electricity” (Cajavilca, Varon, and Sternbach 2009, 162; Shock and Awe: The Story of Electricity 2011).

Subsequent investigations led Volta to make further breakthroughs in understanding the ability of metals to transmit an electric charge. He eventually discovered that he could generate a strong and consistent electrical charge by stacking zinc and copper on top of one another in an alternating pattern with brine-soaked cloth or cardboard in between.

Volta’s invention was dubbed the Voltaic pile, and was the precursor to the modern battery.

Figure 2: Volta’s experiment and the Voltaic Pile

The debate between Galvani and Volta over animal electricity and related topics would result not only in the Voltaic pile, but also in the early development of the fields of electrophysiology, electromagnetism, and electrochemistry (Cajavilca, Varon, and Sternbach 2009, 159).

While the two had deep theoretical disagreements, Galvani’s creative experiments identified a phenomenon — frogs twitching when touching two pieces of metal — which Volta was able to study with a unique method of his own, using his tongue instead of a frog. Galvani’s theory of animal electricity was wrong, but his work enabled further research that led to the creation of the voltaic pile and the foundations of our modern understanding of electricity.

Volta and Galvani both had to develop many of their own theories in the absence of an accepted paradigm for electricity, and both had to design their own experimental tools and methods. The trajectory of their respective research programs, and of their research efforts combined, would have been hard to predict in advance. They tried anyway, and their work helped turn the study of electrical phenomena into a fully developed scientific discipline.

Galileo and the Development of Telescopic Astronomy

The history of science includes a number of cases of scientific progress that share some of the notable features of Galvani and Volta’s research. Another example is the story of Galileo and the early development of telescopic astronomy.

Galileo Galilei pioneered the use of the telescope to observe celestial objects. He observed Jupiter’s moons, the phases of Venus, Saturn, the existence of sunspots and more. Initially, though, Galileo had trouble convincing others of the reliability and usefulness of the telescope. When Galileo visited Bologna in 1610 to demonstrate his telescope:

In the presence of a number of learned men, Galileo showed his telescope and let others observe earthly and celestial things through it. They agreed that for earthly objects the instruments performed as promised but that in the heavens it was not reliable. Although Galileo’s notes show that on the first night two and on the second night all four of the satellites were visible, none of the gentlemen present were able to see satellites around Jupiter. (Van Helden 1994, 11)

The learned men were skeptical of Galileo’s device for understandable reasons. The optical principles behind the device were not well understood at the time and some of Galileo’s claims about the heavens contravened established wisdom on the topic. Furthermore, Galileo’s telescope was not a very powerful or easy-to-use instrument. His telescope was capable of no more than 20x magnification and a field of view of around 15 feet, whereas one can purchase a modern amateur telescope today for around 100 USD that is capable of 10 times greater magnification (200x) and a field of view 34 times larger (516 feet) (Sky and Telescope 2017; Telescopic Watch 2019).

To illustrate the difficulties this posed, compare an image of Ely Cathedral viewed through a modern telescope at the same zoom as Galileo’s telescope (20x) but with a modern field of view (Figure 3, top picture) with an image of the same object, at the same distance, viewed through a replica of one of Galileo’s telescopes (Figure 3, bottom picture).

Figure 3: Top picture — modern telescope at 20x zoom; Bottom picture — replica of Galileo’s telescope, also at 20x zoom (Astronomy and Nature TV 2010)

The small circle in the middle of the bottom picture is the field of view available through the replica telescope. The light colored area shows the visible portion of the cathedral.

In addition to the small field of view, Galileo lacked modern equipment for mounting and stabilizing his telescope. This meant that even if Galileo could find the relevant astronomical object in the small field of view, ensuring that the telescope remained fixed on that object for repeated observations was a significant challenge.

Finally, even under optimal circumstances, it is not uncommon for a user of Galileo’s telescope to fail to see what they are meant to see. The instrument takes a certain amount of getting used to as the eye and brain become accustomed to interpreting the image and not everyone has sufficiently good eyesight to use the device properly (Van Helden 1994, 11). As one historian of science notes:

On several occasions I have taken a group of my students to look for Jupiter’s satellites through a replica of one of Galileo’s telescopes — students who were convinced the moons were really there — and the results were always mixed. Some saw all that were visible, some saw one or two, and some saw none at all. No matter how public the occasion, the actual observing remains an individual and private act. (Van Helden 1994, 12)

Indeed, the private nature of telescopic observations created difficulties in settling scientific disputes. For example, in 1643, Antonius Rheita published a book announcing the discovery of new satellites around Jupiter and Saturn. While others reported not seeing the satellites, Rheita claimed the satellites could only be seen with the new, more powerful telescope he had invented. All telescopes at the time were hand-made and thus differed in clarity and magnification, so scientists could not simply use their own instruments to confirm or disconfirm each other’s claims, as their failure to replicate results could be the consequence of inferior equipment. Scientists eventually reached agreement on this question in 1647, when Hevelius (who claimed to have built a still more powerful telescope) was able to convince scientists that the supposed satellites were actually fixed stars behind the planets (Van Helden 1994, 8–20).

The telescope was initially an unreliable, poorly understood tool. It did not produce directly shareable or replicable results and in practice often created scientific disputes that were difficult to resolve. Yet despite the challenges it posed, the telescope did sometimes allow early telescopic astronomers to make observations that were more detailed than those available to the naked eye. This modest improvement was enough to attract a small number of early scientists who began working with the device. Over time, they learned more about the optical principles involved, improved at grinding lenses for new telescopes, and developed more effective ways of communicating their findings to other astronomers and the public.

Black, Lavoisier, and the Chemical Revolution

In 1756, Joseph Black published his Experiments Upon Magnesia Alba, in which he described experiments he performed on an unusual substance he called “fixed air.” Black originally discovered the substance² through experiments on magnesia alba (now called magnesium carbonate) and chalk (calcium carbonate). Black observed that when magnesia alba was heated or combined with an acid, it began to bubble and left behind a residue. Using an analytical balance that he had invented, Black was able to weigh the residue precisely and note that it had lost a noticeable amount of weight. Black hypothesized that the bubbling and weight loss were caused by the liberation of a gas that had been “fixed” in the magnesia alba (hence the name, fixed air).

Black was curious about the properties of this fixed air and so began to devise ways to experiment with the substance.

He noticed that it had an unusual effect on fire:

I mixed together some chalk and vitriolic acid. . . The strong effervescence produced an air or vapour, which, flowing out at the top of the glass, extinguished a candle that stood close to it; and a piece of burning paper immersed in it, was put out as effectually as if it had been dipped in water. (West 2014, L1059)

This indicated that fixed air was not the same as atmospheric air. There were other unusual properties as well. He investigated the effect of fixed air on animals and found that it was remarkably toxic when inhaled: he noted that “sparrows died in it in ten or eleven seconds” (Robison 1803).

The toxicity of fixed air when inhaled led Black to suspect that fixed air might be the air expelled in respiration itself. He designed several experiments to test this hypothesis. In one, he blew bubbles into a solution of limewater (calcium hydroxide) and noted that a precipitate of chalk was left over, indicating that he had succeeded in fixing the air back into chalk. He repeated this experiment on a larger scale by placing rags soaked in limewater in an air duct in the ceiling of a church and observing a chalk residue as the lime absorbed the fixed air from the congregation’s respiration (West 2014, L1059).

He made several other discoveries about the substance and explained various observed phenomena. For instance, at the Grotta del Cane in Italy, animals that entered the grotto died, but humans could visit unharmed. Black explained this phenomenon by proposing that fixed air was heavier than atmospheric air and thus collected near the ground, poisoning animals, whose heads were low, but not upright humans. He also discovered that fixed air was emitted during fermentation (West 2014, L1059).

We now know “fixed air” as carbon dioxide, and Black’s work isolating and describing its properties represented the first demonstration that gases can be weighable constituents of solid bodies, the first demonstration that gases are distinct chemical substances and not atmospheric air in different states of purity, and the first demonstration that respiration involves the transformation of gases. In the 18 years after the publication of Black’s work, all the respiratory gases were isolated and characterized, with Henry Cavendish discovering hydrogen in 1766, Daniel Rutherford isolating nitrogen in 1772, and Joseph Priestley isolating oxygen in 1774 (West 2014, L1059).

Joseph Black’s work in isolating and describing carbon dioxide contributed significantly to scientific progress in chemistry. In much the same fashion as Galvani and Volta, Black discovered a new phenomenon and developed unusual and creative methods to study it. In particular, his insight that carbon dioxide was connected to respiration allowed for rapid progress in isolating the other gases involved in respiration. As other scientists investigated new gases, they attempted to explain them in terms of ultimately incorrect, yet sometimes useful, theories. Indeed, even Black’s theory that the air was “fixed” in the solids differs somewhat from the modern understanding of the phenomenon. In modern nomenclature, magnesia alba is MgCO₃, and heating it yields MgO + CO₂. We might say that carbon dioxide can be liberated from magnesia alba, but in modern terms we probably wouldn’t describe carbon dioxide as being “fixed” in it. While Black’s theory of carbon dioxide may not match our modern understanding, he nevertheless contributed significantly to scientific progress by isolating and cleverly studying a new phenomenon.
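The following is a rough worked check on the weight loss Black observed. It gives the balanced decomposition with standard modern molar masses. It assumes pure MgCO₃, as in the modern nomenclature used above. It is an idealized illustration, not a reconstruction of Black’s own measurements.

```latex
\begin{aligned}
\mathrm{MgCO_3} &\xrightarrow{\;\Delta\;} \mathrm{MgO} + \mathrm{CO_2} \\
84.31\ \mathrm{g\,mol^{-1}} &= 40.30\ \mathrm{g\,mol^{-1}} + 44.01\ \mathrm{g\,mol^{-1}} \\
\text{fraction of mass lost as gas} &= \frac{44.01}{84.31} \approx 0.52
\end{aligned}
```

Under this idealization, total mass is conserved (residue plus escaped gas equals the starting mass). Even so, roughly half of the sample’s weight leaves as gas. A loss of that size is easily registered by a balance.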

As additional gases were isolated and described, there was considerable disagreement about what the gases were, and scientists advanced a number of competing theories to explain their properties. For example, Rutherford’s and Priestley’s discoveries were both explained in terms of the then-popular phlogiston theory of combustion. This theory posited that an element called phlogiston was released from materials as they burned, and that combustion would continue until either all the phlogiston had been released or the air was so saturated with phlogiston that it could hold no more. On this theory, nitrogen was labeled “phlogisticated air” and oxygen was labeled “dephlogisticated air.”³

The debate over the nature of these newly discovered airs culminated in the work of Antoine-Laurent de Lavoisier and what is often called the chemical revolution. Between 1775 and 1789, Lavoisier is credited with discovering the law of conservation of mass and a new theory of combustion, which explained combustion and acidic corrosion in terms of oxygen⁴ and eventually replaced the phlogiston theory. Lavoisier’s approach to chemistry built on Black’s approach through its focus on weight, but utilized much more sophisticated and elaborate equipment to investigate the properties of the newly discovered airs (West 2014, L1060).

Identifying a Phenomenon

The research efforts of Volta and Galvani, Galileo, and Black appear to have a number of attributes in common.

They feature a relative absence of established theories and well-understood instruments in the area of investigation, the appearance of strange or unexplained phenomena, and a lack of theoretical and practical consensus among researchers. Progress seems to occur despite (and is sometimes enabled by) flawed theories; individual researchers use imprecise measurement tools that are frequently new and difficult to share; and there is a bi-directional cycle of improvement between increasingly sophisticated theories and increasingly precise measurement tools.

For example, consider Black’s study of fixed air. He began with an unexplained phenomenon: certain substances losing weight when heated. His approach to studying that phenomenon involved a wide range of methods, such as blowing bubbles into limewater and leaving limewater-soaked rags in the air duct of a church. This led to a larger study of new gases, where there were many theoretical disagreements between researchers, and where flawed theories (e.g., the theories of phlogisticated and dephlogisticated air) seemed to aid the process of discovery. Finally, improvements in measuring the weight of substances allowed for better theories and refinements to the researchers’ instruments.

These examples gesture at the potential existence of a recognizable cluster of discovery-related attributes (“Attribute Cluster 1”) that plausibly play an important role in scientific progress. This is striking because it differs from another, more familiar and more commonly discussed attribute cluster (“Attribute Cluster 2”) that appears throughout the history of science. That second cluster includes large groups of scientists working together, a foundation of largely accepted theory, precise and well-understood instruments, researcher consensus on the quality of these instruments, and many small discoveries that tend to build iteratively on one another and cohere with previous theory.

This raises a number of questions: Are the attributes in Attribute Cluster 1 actually present in the cases described above? Do those attributes appear in a special set of cases in the history of science? More generally, is there a natural cluster of discovery-related attributes in the conceptual vicinity of Attribute Cluster 1 that appears in important cases of scientific discovery, like those described above? What can we learn from investigating the cases that are the most natural candidates for exemplifying attributes from Attribute Cluster 1? Is the distinction between Attribute Cluster 1 and Attribute Cluster 2 useful for describing the progress of science?

The pattern suggested by the cases described above, and those like them, is noteworthy. If there is in fact a recognizable cluster of discovery-related attributes, other than Attribute Cluster 2, that plausibly plays a role in scientific progress, this could be important for understanding when and how new fields make scientific progress. We believe this possibility is rendered at least somewhat plausible by the cases above and deserves further study.


The cases of Volta and Galvani, Galileo, and Black illustrate a hypothesized pattern in the development of fields of science which we call “early stage science.”

We hypothesize that:

  1. Some scientific fields develop from initial investigations in nascent fields to highly functional knowledge acquisition programs.
  2. The histories of the development of these highly functional knowledge acquisition programs are characterized by a similar, describable pattern.
  3. As part of this pattern, the relevant scientific fields have different attributes at different points in their development. More specifically, earlier in their development, the relevant fields have an attribute cluster in the conceptual vicinity of Attribute Cluster 1, as suggested by numerous examples. Some of those fields later have an attribute cluster in the conceptual vicinity of Attribute Cluster 2. The existence of these attribute clusters implies that there are phases of scientific development.
  4. These phases of scientific development arise from facts about how people figure things out about the world. This includes how inference, experimentation, observation, theory development, tool development, and other ways of seeking to understand the world enable researchers to improve their understanding of a given phenomenon.
  5. Accordingly, researchers need to use different tools and practices depending on their starting state of knowledge. We expect to find that there is a coherent logic to the meta-practices that lead to the development and advancement of research programs under different starting conditions.
  6. More specifically, a model of early stage scientific practice will explain how researchers overcome the difficulties that arise when attempting to gain knowledge about substantially unknown phenomena with poor tools, and so forth. When stated, this model will be both prima facie plausible and verifiable by checking it against the research activities that led to important discoveries in practice.

It is consistent with the above hypothesis that many of the attributes specifically identified in the historical cases above are not actually constitutive of early stage science, so long as conceptually similar attributes can be identified which are.


The current methodology of our investigation primarily focuses on analyzing historical case studies of scientific discovery in new fields. In this section, we cover how we plan to select and analyze specific cases.

Selecting Cases

To select cases, we will attempt to identify functional modern and historical scientific research programs and the initial discoveries integral to their development. By starting with functional scientific research programs and working backward, we aim to ensure that the cases of early stage research we study are success cases. The pattern we hypothesize is not meant to occur in all early stage research, but rather in effective early stage research. Since the effectiveness of early stage research can be difficult to assess, we believe the safest cases to examine are those where the relevant field developed into a full-fledged functional research program.

To identify functional modern and historical scientific research programs, we will look for fields where there are large groups of researchers studying similar phenomena, using very similar methods, with a corpus of shared theory and substantial predictive power. This is our initial hypothesis of the signs that will enable us to identify a sufficient set of functional scientific research programs. If we encounter reasons to change our criteria, we will do so, especially if doing so will help us better identify successful early stage research cases. It is possible, for instance, that we would be better served by looking at Kuhnian paradigms or broad scientific consensus rather than at the attributes above.

Once we identify functional scientific research programs and identify the initial discoveries integral to their development, we will then analyze how researchers made those discoveries.

For example, consider modern astronomy. As part of doing their research, large groups of modern astronomers use highly advanced telescopes and highly advanced analytical methods to find, see, and study intergalactic objects at an incredible degree of precision. Astronomers have a shared body of theory, make the same observations with different telescopes, and are able to predict with high precision what they and others will observe when they look through telescopes at different points in the sky. These attributes indicate that modern astronomy is plausibly a highly functional scientific research program.

Having selected astronomy as a plausible success case for scientific development, we can then investigate the beginnings of the field to find plausibly important and formative discoveries. In this case, the work of Galileo and others with early telescopes significantly changed both how we observe and how we think about objects in space. As a result, Galileo plausibly represents a case where successful early research methods might be visible, and is thus fruitful to study.

Analyzing Cases

In analyzing the historical cases, we will be trying to build a coherent model of how researchers make scientific progress in the early stages of the development of scientific fields. We expect to approach this from many angles, looking at similarities and differences between cases, building causal models of the cases, and trying to state plausible general obstacles early stage researchers might encounter. By cross-checking the cases and the general model against each other, we hope to improve our models of individual cases as well as our general model.

This may or may not lead to convergence on a single, intelligible general model that fits the cases and also is plausible abstractly. If it does, we will consider this evidence in favor of our hypothesis. If it does not, either because plausible general models do not fit the cases or because the general models contain attributes that are not intelligibly related to the logic of discovery in the early stages of a field, we will consider this evidence against our hypothesis.

If we reach an adequately plausible general model, we will then investigate whether it can be used to generate recommendations for present-day scientists seeking to make progress in fields operating under early stage research conditions. If a particular model relies heavily on factors local to the particular historical case, researcher, or era, it may not serve as a good template for recommendations for present-day scientists or a general theory of scientific methodology. On the other hand, if a model tends to feature potentially generalizable methodological elements with mechanistic relations (e.g., a particular model for how instrument development works at different levels of theoretical uncertainty), we can see whether the model can generate recommendations for current researchers.


Following the methodology above, we believe we will be able to select and analyze cases of discovery that have led to the development of highly functional scientific research programs. We expect our analysis to help confirm or disconfirm the existence of early stage science as a unique phase of science with a unique and understandable methodology. It is our hope that investigating this hypothesis will shed light on what methods should be used to make scientific progress in new or underdeveloped fields, and thereby help push science forward.

  1. A later paper will situate our investigation in the surrounding literature, including describing our relation to Kuhn, Popper, Lakatos, historicism, methodologism, natural epistemology, and complementary science.
  2. Black was likely not the original discoverer of the substance. It had been briefly described around 100 years earlier by Jan Baptist van Helmont, who called it gas sylvestre, although Black was likely the first to describe its properties in detail.
  3. Rutherford’s initial name for nitrogen was “noxious air.” Labeling it phlogisticated air is often attributed to Priestley.
  4. Indeed, the name “oxygen” comes from the Greek roots oxys (meaning “acid”) and -genēs (meaning “producer”).

This paper was originally posted on our website.

  • Bartlett, Richard J. 2019. “Galileo’s Telescope — The What, When and How.” Telescopic Watch. Last modified May 16, 2019. https://telescopicwatch.com/galileo-telescope/.
  • Cajavilca, Christian, Joseph Varon, and George L. Sternbach. 2009. “Luigi Galvani and the Foundations of Electrophysiology.” Resuscitation 80 (2): 159–62. doi:10.1016/j.resuscitation.2008.09.020.
  • Dalby, Robert. 2010. “Looking through Galileo’s Telescope — Practical Comparison.” Astronomy and Nature TV. Last modified September 27, 2019. https://www.youtube.com/watch?v=nzXnnwxJmSg.
  • Quinn, Jim. 2017. “Stargazing with Early Astronomer Galileo Galilei.” Sky & Telescope. Last modified May 9, 2019. https://www.skyandtelescope.com/astronomy-resources/stargazing-with-galileo/.
  • Robison, John. 1803. Lectures on the elements of chemistry by the late Joseph Black. Edinburgh: W. Creech.
  • “Spark.” 2011. Shock and Awe: The Story of Electricity. BBC Four.
  • Van Helden, Albert. 1994. “Telescopes and Authority from Galileo to Cassini.” Osiris 9: 8–29. doi:10.1086/368727.
  • West, John B. 2014. “Joseph Black, Carbon Dioxide, Latent Heat, and the Beginnings of the Discovery of the Respiratory Gases.” American Journal of Physiology-Lung Cellular and Molecular Physiology 306 (12): L1057–63.


Conversational Cultures: Combat vs Nurture (V2)

January 17, 2020 - 23:23
Published on January 17, 2020 8:23 PM UTC

You are viewing Version 2 of this post: a major revision written for the LessWrong 2018 Review. The original version published on 9th November 2018 can be viewed here.

See my change notes in this comment for updates I think are especially significant.

Combat Culture

I went to an orthodox Jewish high school in Australia. For most of my early teenage years, I spent one to three hours each morning debating the true meaning of abstruse phrases of Talmudic Aramaic. The majority of class time was spent sitting opposite your chavrusa (study partner, but linguistically the term has the same root as the word “friend”) arguing vehemently for your interpretation of the arcane words. I didn’t think in terms of probabilities back then, but if I had, I think I should have given roughly even odds to my view versus my chavrusa’s on most occasions. Yet that didn’t really matter. Whatever your credence, you argued as hard as you could for the view that made sense in your mind, explaining why your adversary/partner/friend’s view was utterly inconsistent with reality. That was the process. Eventually, you’d reach agreement or agree to disagree (which was perfectly legitimate), and then move on to the next passage to decipher.

Later, I studied mainstream analytic philosophy at university. There wasn’t the chavrusa pair-study format, but the culture of debate felt the same to me. Different philosophers would write long papers explaining why philosophers holding opposite views were utterly confused and mistaken for reasons one through fifty. They’d go back and forth, each arguing for their own correctness and the others’ mistakenness with great rigor. I’m still impressed with the rigor and thoroughness of especially good analytic philosophers.

I’ll describe this style as combative, or Combat Culture. You have your view, they have their view, and you each work to prove your rightness by defending your view and attacking theirs. Occasionally one side will update, but more commonly you develop or modify your view to meet the criticisms. Overall, the pool of arguments and views develops and as a group you feel like you’ve made progress.

While it’s true that you’ll often shake your head at the folly of those who disagree with you, the fact that you’re bothering to discuss things with them at all implies a certain minimum of respect and recognition. You don’t write lengthy papers or books responding to people whose intellect you don’t recognize, people you don’t regard as peers at all.

There’s an undertone of countersignalling to healthy Combat Culture. It is because recognition and respect are so strongly assumed between parties that they can be so blunt and direct with each other. If there were any ambiguity about the common knowledge of respect, you couldn’t be blunt without the risk of offending someone. That you are blunt is evidence you do respect someone. This is portrayed clearly in a passage from Daniel Ellsberg’s recent book, The Doomsday Machine: Confessions of a Nuclear War Planner (pp. 35–36):

From my academic life, I was used to being in the company of very smart people, but it was apparent from the beginning that this was as smart a bunch of men as I have ever encountered. That first impression never changed (though I was to learn, in the years ahead, the severe limitations of sheer intellect). And it was even better than that. In the middle of the first session, I ventured--though I was the youngest, assigned to taking notes, and obviously a total novice on the issues--to express an opinion. (I don’t remember what it was.) Rather than showing irritation or ignoring my comment, Herman Kahn, brilliant and enormously fat, sitting directly across the table from me, looked at me soberly and said, “You’re absolutely wrong.”

A warm glow spread throughout my body. This was the way my undergraduate fellows on the editorial board of the Harvard Crimson (mostly Jewish like Herman and me) had routinely spoken to each other: I hadn’t experienced anything like it for six years. At King’s College, Cambridge, or in the Society of Fellows, arguments didn’t take this gloves-off, take-no-prisoners form. I thought, “I’ve found a home.” [emphasis and paragraph break added]

That a senior member of the RAND group he had recently joined was willing to be completely direct in shooting down his idea didn’t cause Ellsberg to shut down in anguish and rejection; on the contrary, it made him feel respected and included. I’ve found a home.

Nurture Culture

As I’ve experienced more of the world, I’ve discovered that many people, perhaps even most people, strongly dislike combative discussions where they are being told that they are wrong for ten different reasons. I’m sure some readers are hitting their foreheads and thinking “duh, obvious,” yet as above, it’s not obvious if you’re used to a different culture.

While Combat Culture prioritizes directness and the free expression of ideas, Nurture Culture prioritizes the comfort, wellbeing, and relationships of participants in a conversation. It wants everyone to feel safe, welcome, and well-regarded within a general spirit of “we’re on the same side here”.

Since disagreement, criticism, and pushback can all lead to feelings of distance between people, Nurture Culture tries to counter those potential effects with signals of goodwill and respect. Nurture Culture wants to make clear that even if I think your idea is wrong/stupid/confused/bad/harmful, that doesn’t mean I think you’re stupid/bad/harmful/unwelcome/enemy, or that I don’t wish to continue to hear your ideas.

Nurture Culture makes a lot of sense in a world where criticism and disagreement are often an attack or threat: people talk at length about how their enemies and outgroups are mistaken, never about how they’re correct. Even if I have no intention of attacking someone by arguing that they are wrong, I’m still providing evidence of their poor reasoning whenever I critique them.

There is a simple entailment: holding a mistaken belief or making a poor argument is evidence of poor reasoning, so when I say you are wrong, I’m implying, however slightly, that your reasoning or information is poor. And each time you’re seen to be wrong, you (rightly) lose a few points in people’s estimation of you. It might be a tiny fractional loss of a point of respect and regard, but it can’t be zero.

So some fraction of the time, people express disagreement because they genuinely believe it, but in the wider world, a large fraction of the time people express disagreement, it is as an attack [xx1]. A healthy epistemic Nurture Culture works to make productive disagreement possible by showing that disagreement is safe and not intended as an attack.

To offer a concrete example of Nurture Culture and why you might want it, I wrote the following fictional account of Alex Percy:

I was three months into my new role as an analyst at Euler Associates. It was my first presentation making any recommendations of consequence. It was attended by heads of three other departments. Senior managers who managed teams of hundreds and budgets of millions. I was nervous. Sure, I had a doctorate in operations management, but that was merely classroom and book knowledge. 

Ten minutes into the meeting I’d finished my core pitch. Ms. Fritz seemed mildly irritated; Mr. Wyman radiated skepticism. My heart sank. I mean, I shouldn’t have been surprised. If (if!) I keep my job long enough, I’m sure I’ll get there...

“You lost me on slide 4, but let’s talk it through. It seems you’re assuming that regional sales growth is going to plateau, which I hadn’t been assuming myself, but I could see it being true. Let’s assume it for now and chat for a few minutes to see if you’re right about the flow-through effects.” These were the first words from Dr. Li.

I felt elation. Engagement! A chance to explain my reasoning! Being taken seriously! Maybe contributing to Euler Associates wouldn’t be such a painful grind.



My heart sank. I mean, not really surprising. If (if!) I keep my job long enough, I’m sure I’ll get there...

“To get that result, you assume regional sales growth is going to plateau. That would be very surprising. So that must be wrong.”

Such was the terse response to my months of work. I’d triple-checked that premise: it wasn’t certain, but it was reasonable. Was I supposed to argue back with someone who probably doubted I was worth their time? Did I meekly accept that I hadn’t justified this enough in my pitch and try to do better next time? My boss gave no indication when I looked at him. Ach, I’m such a fool. I should have known you need to really justify the controversial points. My first big presentation and I blew it.


It might be a virtue to be tough and resilient, but it is also costly to bring that degree of emotional fortitude to bear. Someone might be pushing against a strong prior that they are unwelcome, or that others are making them push uphill to contribute.

The first scenario describes Nurturing behavior. A busy senior manager signaling to the new person: “I’m going to give you a chance, I know you want to help.”
If nothing else, I expect the consulting firm where Nurturing behavior is commonplace to be a far more pleasant place to work, and probably better for people’s health and wellbeing.

The Key Cultural Distinction: Significance of a Speech-Act

It is possible to describe Combat and Nurture cultures by where they fall on a number of dimensions, such as adversarial-collaborative, “emotional effort and concern”, filtering, etc. I discuss these dimensions in Appendix Axx.

However, I believe that the key distinction between Combat and Nurture cultures is found in one primary question:

Within a culture, what is the meaning/significance of combative speech acts? 

By combative speech acts, I mean things like blatant disagreement, directly telling people they are wrong, and arguing one-sidedly against someone’s ideas with no attempt to find merit. The question is, within a culture, do these speech acts imply negative attitudes or have negative effects for the receiver? Or, conversely, are they treated like any other ordinary speech? 

The defining feature of a Combat Culture is that these traditionally combative speech acts do not convey threat, attack, or disdain. Quite the opposite: when Herman Kahn says to Daniel Ellsberg, “you’re absolutely wrong”, Ellsberg interprets this as a sign of respect and inclusion. The same words said to Alex Percy by the senior managers of Euler Associates are likely to be interpreted as a painful “I’m not interested in you or what you have to say.”

And as with language in general, the significance of speech acts (combative or otherwise) needs to be coordinated between speakers and listeners. Obviously, issues arise when people from different cultures assign different meanings to the same thing in an interaction. Someone with Nurture Culture meanings associated with speech acts can feel attacked in a Combative space. A Combatively-cultured person [xx2] in a Nurture Culture space can make others feel attacked.

The prevailing culture often isn’t clear, and often spaces are mixed. You can imagine how that goes.

When does each culture make sense?

In the original version of this post, I mostly refrained from commenting on which culture was better. At this point, I think I can say a little more.

Combat Culture has a number of benefits that make it attractive, particularly for truth-seeking discussion: it has greater freedom of speech, no extra effort is required to phrase things politely, one doesn’t need to filter as much or dance around, and less attention is devoted to worrying about offending people.

And as Said Achmiz mentioned in a comment on the original post, “generally speaking, as people get to know each other, they tend to relax around each other, and drop down into a more “combative” orientation in discussions and debates with each other. They say what they think.” Or they outright countersignal. That’s some reason to have a generic preference for a culture where fewer speech acts parse as attacks or threats.

But Combat Culture only works when you’ve established that the standardly hostile speech acts aren’t unfriendly. It works when you’ve got a prior of friendliness, or have generally built up trust. And that means it tends to work well where people know each other well, have common interests, or some other strong factor that makes them feel aligned and on the same team. Something that creates a sense of trust and safety that you’re not being attacked even if your beliefs, works, or choices are.

Interestingly, while in any given exchange Combat Culture might seem more efficient and to “pay less communicative overhead,” that’s only possible because the Combat Culture previously invested “overhead” into creating a high-safety context. It is because you’ve spent so much time with a person and signaled so much caring and connection that you’re now able to bluntly tell them that they’re absolutely wrong.

However, there’s more than one path to being able to tell someone point-blank that they’re dead wrong. Being old friends is one method, but you also get the relevant kind of trust and safety in contexts where not everyone knows each other well, like in philosophy and mathematics departments, as well as my Talmud class and Ellsberg’s group at RAND. 

In those contexts, I believe the mechanics work out to produce priors such as: everyone makes mistakes; mistakes aren’t a big deal; you point out mistakes and move on. Perhaps it’s because in mathematics mistakes are very legible once identified, so everyone is seen to make them, or because for the whole endeavor to work people have to point mistakes out, and everyone gets completely used to that as normal. In philosophy, everyone’s been saying that everyone else is wrong for millennia. Someone saying you’re wrong just means you’re part of the game.

Somehow, these domains have done the work required to give blunt disagreement different significance than is normal in the wider world. They’ve set up conducive priors for healthy combat.

Combat without Safety, Nurture without Caring

One of my regrets with the original Combat vs Nurture post is that some people started describing generically aggressive and combative conversation cultures as “Combat Culture.” No! I meant something more specific and better! As clarified above, I meant specifically a culture where combative speech acts aren’t perceived as threatening. If people do feel threatened or attacked, you don’t have a real Combat Culture!

Others pointed out that there are non-Combative cultures that aren’t actually nurturing at all. This is entirely correct. You can avoid being blatantly aggressive while still being neutral to hostile underneath. That isn’t what I meant to call Nurture Culture.

In a way, both Combat Culture and Nurture Culture are my attempted steelmen of two legitimate conversational cultures situated within a larger space of many other conversational cultures. Abram Demski attempts to do a better job of carving up the entire space.

Know thyself and others

Compared to 5-10 years ago, I’ve updated toward thinking that people operating in cultures different from my own are probably not evil. To my own surprise, I’ve seen myself advocate for both Combat and Nurture at different points.

I continue to think the most important advice in this space is being aware of both your own cultural tendencies and those of the people around you. Even if others are wrong, you’ll probably do better by understanding them and yourself relative to them.


Fiddle Effects Tech

January 17, 2020 - 20:00
Published on January 17, 2020 5:00 PM UTC

Imagine you're a fiddle player who primarily plays without effects, but would occasionally like to be able to play with them. What can you do?

One option is to put a pickup on the fiddle and run that into guitar pedals. This will work, but pickups generally sound much worse than clip-on mics like the ubiquitous AT PRO-35. Since you're mostly playing uneffected, you don't want to give that up.

Another option is to get a vocal effects processor. For example, I have a VoiceTone D1. These take balanced XLR from the mic, send balanced XLR to the board, and provide phantom power, so they make a lot of sense technically. Unfortunately, since they are designed for a vocal signal I've found they sound pretty crummy when applied to fiddle or mandolin.

A mixer should work here, but it seems like both overkill and a hack? Get a small mixer with phantom power, run the output of the mixer into a guitar pedal, run the output of the guitar pedal into another channel on the mixer. For example, with a cheap mixer like the Behringer Xenyx 502 you would:

  • run the mic into channel one, which provides phantom power
  • pan channel one hard right
  • run the right main output into the guitar pedal
  • run the guitar pedal's output into channel 2 (left)
  • set the 2/3 balance to hard left
  • run the left main output (balanced) to the main mixing board
A slightly less cheap mixer (ex: the Soundcraft Notepad-5, which I have) lets you do this with a monitor channel instead of panning, but it's still not ideal.

You're also in a funny place impedance-wise, where the guitar pedal may be expecting a high-impedance input or to be driving a high-impedance output. If you need that, you could add a reamper before the pedal and a DI after it. I'm not sure whether this is something pedals tend to care about?

All of this seems like a mess to me. It seems like maybe a bunch of people would want a box with:

  • XLR input with phantom power
  • A good quality pre-amp with gain control
  • High-impedance 1/4" output to the pedal
  • High-impedance 1/4" input from the pedal
  • XLR output
  • Maybe a second dry-only XLR output
Does a box like this exist? Alternatively, are there pedals in the same form factor as the vocal effects processors, but that are designed for instruments?

Comment via: facebook


How does a Living Being solve the problem of Subsystem Alignment?

January 17, 2020 - 13:03
Published on January 17, 2020 9:32 AM UTC

So, a Living Being is composed of multiple parts that act pretty much in tandem, except in extreme situations like cancer. How does that work?


Assigning probabilities to supernatural-type claims

January 17, 2020 - 06:05
Published on January 17, 2020 3:05 AM UTC

Epistemic status: I wrote this post quickly, and largely to solicit feedback on the claims I make in it. This is because (a) I’m not sure about these claims (or how I’ve explained them), and (b) the question of what I should believe on this topic seems important in general and for various other posts I’m writing. (So please comment if you have any thoughts on this!)

I’ve now read a bunch on topics related to the questions covered here, but I’m not an expert, and haven’t seen or explicitly looked for a direct treatment of the questions covered here. It’s very possible this has already been thoroughly and clearly covered elsewhere; if so, please post the link in a comment!

I lean towards the idea that we can always assign probabilities to propositions (or at least use something like an uninformative prior), even if sometimes we have incredibly little basis for making those probabilities. Sometimes people propose what seem to me to be very weak counterexamples to that claim, such as the following:

there are situations with so many unique features that they can hardly be grouped with similar cases, such as the danger resulting from a new type of virus, or the consequences of military intervention in conflict areas. These represent cases of (Knightian) uncertainty where no data are available to estimate objective probabilities. While we may rely on our subjective estimates under such conditions, no objective basis exists by which to judge them (e.g., LeRoy & Singell, 1987). (source)

It seems obvious to me that a wealth of data is available for such cases. There have been many viruses and military interventions before. None of those situations will perfectly mirror the situations we’re trying to predict, and that’s definitely a very important point. We should therefore think very carefully about whether we’re being too confident in our predictions (i.e., using too narrow a “confidence interval”[1] and thus not adequately preparing for especially “high” or “low” possibilities).

But we can clearly do better than nothing. To start small, you’d be comfortable with the claim that a new type of virus, if it hits this year, is more likely to kill somewhere between 0 and 1 billion people than somewhere between 1000 and 1001 billion people (i.e., far more than everyone alive), right? And in fact, we have empirical evidence that some people can reliably do better than chance (and better than “0 to 1 billion”) in making predictions about geopolitical events like these, at least over timelines of a few years (from Tetlock’s work).


What about something that seems more unique or unprecedented, and where we also may have to stretch our predictions further into the future, like artificial general intelligence (AGI) timelines? On that question, experts disagree wildly, and are seemingly quite swayed by things like how the question is asked (Katja Grace on 80k; search for “It’s a bit complicated” in the transcript). This makes me highly unconfident in any prediction I might make on the topic (and thus pushes me towards making decisions that are good given a wide range of possible timelines).

But I believe I know more than nothing. I believe I can reasonably assign some probability distribution (and then use something like the median or mean of that as if it were a point estimate, for certain purposes). If that seems like raw hubris, do you think it’s worth actually behaving as if AGI is just as likely to be developed 1 minute from now as somewhere around 2 to 300 years from now? What about behaving as if it’s likely to occur in some millennium 50 quintillion years from now, and not in this millennium? So you’d at least be fairly happy bounding your probability distribution somewhere between those two points (1 minute from now and 50 quintillion years from now), right?

One could say that all I’ve done there is argue that some probabilities we could assign would seem especially outrageous, not that we really can or should assign probabilities to this event. But if some probabilities are more reasonable than others (and it certainly seems they are, though I can’t prove it), then we can do better by using those probabilities than by using something like an uninformative prior.[2] And as far as I’m aware, principles for decision making without probabilities either essentially collapse to acting as if using an uninformative prior, or predictably lead to seemingly irrational and bad decisions (I’ll be posting about this soon).
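The distinction drawn in footnote 2 can be checked with a little arithmetic. The sketch below assumes a uniform (uninformative) prior over a bounded horizon; the horizon of 50 quintillion years plus one millennium is an assumption taken from the example, and the helper `uniform_mass` is hypothetical. It uses exact fractions because ordinary floats cannot represent 50 quintillion plus 1,000 precisely.

```python
from fractions import Fraction

# Assumed horizon: the far bound from the example plus one millennium (in years).
HORIZON = Fraction(50 * 10**18 + 1000)
YEARS_PER_MINUTE = Fraction(1, 365 * 24 * 60)  # ignores leap years; fine for a sketch


def uniform_mass(start: Fraction, end: Fraction, horizon: Fraction = HORIZON) -> Fraction:
    """Probability that a uniform prior over [0, horizon] years gives to [start, end]."""
    if not 0 <= start <= end <= horizon:
        raise ValueError("window must lie inside [0, horizon]")
    return (end - start) / horizon


# Two windows of equal length (a millennium) starting at different points.
this_millennium = uniform_mass(Fraction(0), Fraction(1000))
far_millennium = uniform_mass(Fraction(50 * 10**18), Fraction(50 * 10**18 + 1000))

# Two windows of very different length.
next_minute = uniform_mass(Fraction(0), YEARS_PER_MINUTE)
two_to_300_years = uniform_mass(Fraction(2), Fraction(300))

print(this_millennium == far_millennium)
print(two_to_300_years / next_minute)
```

Under the uniform prior the two millennia get identical mass, while the "1 minute" window loses to the much wider "2 to 300 years" window purely because of its width. That is why only the millennium comparison separates the uniform prior from the intuition that this millennium is far more likely.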

And in any case, we do have relevant data for the AGI question, even if we’ve never developed AGI itself - we have data on AI development more broadly, development related to computing/IT/robotics more broadly, previous transformative technologies (e.g., electricity), the current state of funding for AI, current governmental stances towards AI development, how funding and governmental stances have influenced tech in the past, etc.

Supernatural-type claims

But that leads me to what does seem like it could be a strong type of counterexample to the idea that we can always assign probabilities: claims of a “supernatural”, “metaphysical”, or “unobservable” nature. These are very fuzzy and debatable terms, but defining them isn’t my main purpose here, so instead I’ll just jump into some examples:

  1. What are the odds that “an all-powerful god” exists?
  2. What are the odds that “ghosts” exist?
  3. What are the odds that “magic” exists?
  4. What are the odds that “non-naturalistic moral realism” is correct (or that “non-natural objective moral facts” exist)?[3]

My intuitions would suggest I should assign a very low probability to each of these propositions.[4] But what basis would I have for that? More specifically, what basis would I have for any particular probability (or probability distribution) I assign? And what would it even mean?

This is Chris Smith’s statement of this apparent issue, which was essentially what prompted this post:

Kyle is an atheist. When asked what odds he places on the possibility that an all-powerful god exists, he says “2%.”

[...] I don’t know what to make of [Kyle’s] probability estimate.

[Kyle] wouldn’t be able to draw on past experiences with different realities (i.e., Kyle didn’t previously experience a bunch of realities and learn that some of them had all-powerful gods while others didn’t). If you push someone like Kyle to explain why they chose 2% rather than 4% or 0.5%, you almost certainly won’t get a clear explanation.

If you gave the same “What probability do you place on the existence of an all-powerful god?” question to a number of self-proclaimed atheists, you’d probably get a wide range of answers.

I bet you’d find that some people would give answers like 10%, others 1%, and others 0.001%. While these probabilities can all be described as “low,” they differ by orders of magnitude. If probabilities like these are used alongside probabilistic decision models, they could have extremely different implications. Going forward, I’m going to call probability estimates like these “hazy probabilities.”
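To make Smith's worry concrete: under a simple expected-value decision rule, three "low" probabilities that differ by orders of magnitude can recommend different actions. `STAKE`, `COST`, and `should_act` are hypothetical, chosen only to illustrate the point.

```python
# Hypothetical numbers, in arbitrary units, chosen only for illustration.
STAKE = 1_000_000  # value gained by acting if the claim is true
COST = 5_000       # cost of acting on the claim


def should_act(probability: float, stake: float = STAKE, cost: float = COST) -> bool:
    """Act when the expected gain exceeds the cost (a simple expected-value rule)."""
    return probability * stake > cost


hazy_estimates = {"10%": 0.10, "1%": 0.01, "0.001%": 0.00001}
decisions = {label: should_act(p) for label, p in hazy_estimates.items()}
for label, act in decisions.items():
    print(f"{label:>7}: {'act' if act else 'do not act'}")
```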

I can sympathise with Smith’s concerns, though I think ultimately we can make sense of Kyle’s probability estimate, and that Kyle can have at least some grounding for it. I’ll now try to explain why I think that, partly to solicit feedback on whether this thinking (and my explanation of it) makes sense.

In the non-supernatural cases mentioned earlier, it seemed clear to me that we had relevant data and theories. We have data on previous viruses and military interventions (albeit likely from different contexts and circumstances), and some relevant theoretical understandings (e.g., from biology and epidemiology, in the virus case). We lack data on a previous completed instance of AGI development, but we have data on cases we could argue are somewhat analogous (e.g., industrial revolution, development and roll-out of electricity, development of the atomic bomb, development of the internet), and we have theoretical understandings that can guide us in our reference class forecasting.

But do we have any relevant data or theories for the supernatural-type cases?

Assuming the claim's truth can affect the world

Let’s first make the assumption (which I’ll reverse later) that these propositions, if true, would at some point have at least some theoretically observable consequences. That is, we’ll first assume that we’re not dealing with an utterly unverifiable, unfalsifiable hypothesis, the truth of which would have no impact on the world anyway (see also Carl Sagan’s dragon).[5] This seems to be the assumption Smith is making, as he writes “Kyle didn’t previously experience a bunch of realities and learn that some of them had all-powerful gods while others didn’t”, implying that it would be theoretically possible to learn whether a given reality had an all-powerful god.

That assumption still leaves open the possibility that, even if these propositions were true, it’d be extremely unlikely we’d observe any evidence of them at all. This clearly makes it harder to assign probabilities to these propositions that are likely to track reality well. But is it impossible to assign any probabilities, or to make sense of probabilities that we assign?

It seems to me (though I’m unsure) that we could assign probabilities using something like the following process:

  1. Try to think of all (or some sample of) the propositions that we know have ever been made that are similar to the proposition in question. This could mean something like one or more of the following:

    • All claims of a religious nature.
    • All claims that many people would consider “supernatural”.
    • All claims where no one really had a particular idea of what consequences we should expect to observe if they were true rather than false. (E.g., ghosts, given that they’re often interpreted as being meant to be invisible and incorporeal.)
    • All claims that are believed to roughly the same degree by humanity as a whole or by some subpopulation (e.g., scientists).
  2. Try to figure out how many of these propositions later turned out to be true.

    • This may require debating what counts as still being the same proposition, if the proposition was originally hardly specified. For example, does the ability to keep objects afloat using magnets count as levitation?
  3. Do something along the lines of reference class forecasting using this “data”.

    • This’ll likely require deciding whether certain data points count as a relevant claim turning out to not be true, or just not yet turning out to be true. This may look like inside-view-style thinking about roughly how likely we think it’d be that we’d have observed evidence for that claim by now if it is true.
    • We might do something like giving some data points more or less “weight” depending on things like how similar they seem to the matter at hand or how confident we are in our assessment of whether that data point “turned out to be true” or not. (I haven’t thought through in detail precisely how you’d do this; you might instead construct multiple separate reference classes, and then combine these like in model combination, giving different weights to the different classes.)
  4. If this reference class forecasting suggests odds of 0% (or 100%), this seems too confident; it seems that we should never use probabilities of 0 or 1. One option for handling this would be Laplace’s rule of succession.

    • For example, if we found that 18 out of 18 relevant claims for which we “have data” “turned out to be false”, our reference class forecast might suggest there’s a 100% chance (because 18/18=1) that the claim under consideration will turn out to be false too. To avoid this absolute certainty, we add 1 to the numerator and 2 to the denominator (so we do 19/20=0.95), and find that there’s a 95% chance the claim under consideration will turn out to be false too.
    • There may be alternative solutions too, such as letting the inside view considerations introduced in the next step move one away from absolute certainty.
  5. Construct an inside-view relevant to how likely the claim is to be true. This may involve considerations like:

    • Knowledge from other fields like physics, and thinking about how consistent this claim is with that knowledge (and perhaps also about how well consistency with knowledge from other fields has predicted truth in the past).
    • The extent to which the claim violates Occam’s razor, and how bad it is for a claim to do so (perhaps based on how well sticking to Occam’s razor has seemed to predict the accuracy of claims in the past).
    • Explanations for why the claim would be made and believed as widely as it is even if it isn’t true. E.g., explanations from the evolutionary psychology of religion, or explanations based on memetics.
  6. Combine the reference class forecast and the inside view somehow. (Perhaps qualitatively, or perhaps via explicit model combination.)
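Steps 3, 4, and 6 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a worked-out method: the reference classes, their counts and weights, the inside-view number, and the 50/50 weighting between views are all made up, and the linear opinion pool in `combine` is just one simple way to do model combination. The first class mirrors the 18-out-of-18 example from step 4.

```python
from dataclasses import dataclass


@dataclass
class ReferenceClass:
    name: str
    turned_out_true: int  # claims in this class later judged true
    resolved: int         # claims judged either true or false so far
    weight: float         # how similar the class seems to the claim at hand


def laplace_estimate(successes: int, trials: int) -> float:
    """Laplace's rule of succession: (s + 1) / (n + 2), never exactly 0 or 1."""
    return (successes + 1) / (trials + 2)


def reference_class_forecast(classes: list[ReferenceClass]) -> float:
    """Weighted average of the per-class Laplace estimates of P(claim is true)."""
    total_weight = sum(c.weight for c in classes)
    return sum(
        c.weight * laplace_estimate(c.turned_out_true, c.resolved) for c in classes
    ) / total_weight


def combine(outside_view: float, inside_view: float, outside_weight: float = 0.5) -> float:
    """Linear opinion pool of the outside and inside views (one option among many)."""
    return outside_weight * outside_view + (1 - outside_weight) * inside_view


# Hypothetical inputs, for illustration only.
classes = [
    ReferenceClass("supernatural claims (the 18/18 example)", 0, 18, weight=2.0),
    ReferenceClass("made-up second class", 0, 8, weight=1.0),
]
outside = reference_class_forecast(classes)  # (2 * 1/20 + 1 * 1/10) / 3
final = combine(outside, inside_view=0.01)
print(f"outside view: {outside:.3f}, combined: {final:.3f}")
```

Note that `laplace_estimate(0, 18)` gives 1/20, the complement of the 95% figure in step 4.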

I don’t expect that many people actually, explicitly use the above process (I personally haven’t). But I think it’d be possible to do so. And if we want to know “what to make of” probability estimates for these sorts of claims, we could perhaps think of what we actually do, which is more implicit/intuitive, as approximating that explicit process. (But that’s a somewhat separate and debatable claim; my core claims are consistent with the idea that in practice people are coming to their probability assignments quite randomly.)

Another, probably more realistic way people could arrive at probability estimates for these sorts of claims is through:

  1. Do some very vague, very implicit version of the above.

    • E.g., just “thinking about” how often things “like this” have seemed true in the past (without actually counting up various cases), and “thinking about” how likely the claim seems to you, when you bear in mind things like physics and Occam’s razor.
  2. Then introspect on how likely this claim “feels” to you, and try to arrive at a number to represent that.

    • One method to do so is Hubbard’s “equivalent bet test” (described here).
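Hubbard's equivalent bet test can be read as a search for a point of indifference. A minimal sketch, assuming this reading of the test: you repeatedly choose between a prize paid if the claim is true and the same prize paid by a wheel that wins with probability p, narrowing p until the two feel equally good. The callback `prefers_claim_bet` stands in for the person's introspective judgment and is hypothetical.

```python
from typing import Callable


def equivalent_bet_probability(
    prefers_claim_bet: Callable[[float], bool],
    tolerance: float = 0.005,
) -> float:
    """Bisect on the wheel's win chance until the two bets feel equally attractive.

    prefers_claim_bet(p) should return True if you would rather be paid when the
    claim turns out true than spin a wheel that pays out with probability p.
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if prefers_claim_bet(mid):
            low = mid   # the claim feels more likely than mid
        else:
            high = mid  # the claim feels less likely than mid
    return (low + high) / 2


# A simulated person whose felt credence in the claim is 2%.
estimate = equivalent_bet_probability(lambda p: p < 0.02)
print(f"{estimate:.3f}")
```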

Many people may find that method quite suspicious. But there’s evidence that, at least in some domains, it’s possible to become fairly “well calibrated”, and thus do better than chance at assigning probability estimates, following “calibration training” (see here and here). Ideally, the person using that method would have engaged in such calibration training before. If they have, they might add, as a third step or as part of step 2, an adjustment to account for their tendency to over- or underestimate probabilities (or perhaps probabilities of roughly this kind).
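The adjustment mentioned above requires a record of past predictions. A minimal sketch, assuming predictions are stored as (stated probability, outcome) pairs and grouped by stated probability rounded to the nearest 10%; the record below is invented for illustration. A bucket where the observed frequency sits well below the stated probability suggests overconfidence at that level.

```python
from collections import defaultdict


def calibration_table(predictions):
    """Map each stated probability (rounded to 0.1) to (count, observed frequency)."""
    buckets = defaultdict(list)
    for stated, came_true in predictions:
        buckets[round(stated, 1)].append(came_true)
    return {
        level: (len(outcomes), sum(outcomes) / len(outcomes))
        for level, outcomes in sorted(buckets.items())
    }


# Invented record: overconfident at 90%, well calibrated at 60%.
record = [(0.9, True)] * 7 + [(0.9, False)] * 3 + [(0.6, True)] * 3 + [(0.6, False)] * 2
table = calibration_table(record)
for level, (count, observed) in table.items():
    print(f"said {level:.0%} ({count} times): came true {observed:.0%}")
```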

I’m not aware of any evidence of whether people can become well-calibrated for these “supernatural-type claims”. And I believe there’s somewhat limited evidence on how well calibration training generalises across domains. So I think there are major reasons for skepticism, which I’d translate into large confidence intervals on my probability distributions.

But I’m also not aware of any extremely compelling arguments or evidence indicating that people wouldn’t be able to become well-calibrated for these sorts of claims, or that calibration training wouldn’t generalise to domains like this. So for now, I think I’d say that we can make sense of probability estimates for claims like these, and that we should have at least a very weak expectation that methods like the above will result in better probability estimates than if we acted as though we knew nothing at all.

Assuming the claim's truth can't affect the world

I think the much trickier case is if we assume that the truth of these claims would never affect the (natural/physical/whatever) world at all, and would thus never be observable. I think the standard rationalist response to this possibility is dismissiveness, and the argument that, under those conditions, whether or not these claims are true is an utterly meaningless and unimportant question. The claims are empty, and not worth arguing about.

I find this response very compelling, and it’s the one I’ve typically gone with. I think that, if we can show that probabilities can be meaningfully assigned to all claims that could ever theoretically affect the natural world at all, that’s probably good enough.

But what if, for the sake of the argument, we entertain the possibility that some claims may never affect the natural world, and yet still be important? Me not dismissing that possibility outright and immediately may annoy some readers, and I can sympathise with that. But it seems to me at least interesting to think about. And here’s one case where that possibility actually does seem to me like it could be important:

What if non-naturalistic moral realism is “correct”, and what that means is that “moral facts” will never affect the natural world, and will thus never be observable, even in principle - but our actions are still somehow relevant to these moral facts. E.g., what if it could be the case that it’s “good” for us to do one thing rather than another, in a sense that we “really should” care about, but “goodness” itself leaves no trace at all in the natural world. (This could be something like epiphenomenalism, but here I’m going quite a bit beyond what I really know.)

In this case, I think reference class forecasting is useless, because we’d never have any data on the truth or falsehood of any claims of the right type.

But at first glance, it still seems to me like we may be able to make some headway using inside views or something like arriving at a “feeling” about the likelihood and then quantifying this using the equivalent bet test. I’m very unsure about that, because usually those methods should rely on at least some, somewhat relevant data. But it seems like perhaps we can still usefully use considerations like how often Occam’s razor has worked well in the past.

And this also reminds me of Scott Alexander’s post on building intuitions on non-empirical arguments in science (additional post on that here). It also seems reminiscent of some of Eliezer Yudkowsky’s writing on the many-worlds interpretation of quantum mechanics, though I read those posts a little while ago and didn’t have this idea in mind at the time.[6]

Closing remarks

This quick post has become longer than planned, so I’ll stop there. The basic summary is that I tentatively claim we can always assign meaningful probabilities, even to supernatural-type (or even actually supernatural) claims. I’m not claiming we should be confident in these probabilities, and in fact, I expect many people should massively reduce their confidence in their probability estimates. I’m also not claiming that the probabilities people actually assign are reliably better than chance - that’s an empirical question, and again there’d likely be issues of overconfidence.

As I said at the start, a major aim of this post is to get feedback on my thinking. So please let me know what you think in the comments.

  1. See this shortform post of mine for other ways of describing the idea that our probabilities might be relatively “untrustworthy”. ↩︎

  2. I think that my “1 minute” example doesn’t demonstrate the superiority of certain probability distributions to an uninformative prior. This is because we could argue that the issue there is that “1 minute from now” is far more precise than “2 to 300 years from now”, and an uninformative prior would favour the less precise prediction, just as we’d like it to. But I think my other example does indicate, if our intuitions on that are trustworthy, that some probability distributions can be superior to an uninformative prior. This is because, in that example, the predictions mentioned spanned the same amount of time (a millennium), just starting at different points (~now vs ~50 quintillion years from now). ↩︎

  3. These terms can be defined in many different ways. Footnote 15 of this is probably a good quick source. This page is also relevant, but I’ve only skimmed it myself. ↩︎

  4. Though in the case of non-naturalistic moral realism, I might still act as though it’s correct, to a substantial extent, based on a sort of expected value reasoning or Pascal’s wager. But I’m not sure if that makes sense, and it’s not directly relevant for the purposes of this post. (I hope to write a separate post about that idea later.) ↩︎

  5. I acknowledge that this may mean that these claims aren’t “actually supernatural”, but they still seem like more-challenging-than-usual cases for the idea that we can always assign meaningful probabilities. ↩︎

  6. To be clear, I’m not necessarily claiming that Alexander or Yudkowsky would approve of using this sort of logic for topics like non-naturalistic moral realism or the existence of a god, rather than just dismissing those questions outright as meaningless or utterly inconsequential. I’m just drawing what seems to me, from memory, some potential connections. ↩︎


Against Rationalization II: Sequence Recap

January 17, 2020 - 01:51
Published on January 16, 2020 10:51 PM UTC

Previously: Eliezer's "Against Rationalization" sequence

I've run out of things to say about rationalization for the moment. Hopefully there'll be an Against Rationalization III a few years from now, but ideally some third author will write it.

For now, a quick recap to double as a table of contents:


Using Expert Disagreement

January 17, 2020 - 01:42
Published on January 16, 2020 10:42 PM UTC

Previously: Testing for Rationalization

One of the red flags was "disagreeing with experts". While all the preceding tools apply here, there's a suite of special options for examining this particular scenario.

The "World is Mad" Dialectic

Back in 2015, Ozymandias wrote:

I think a formative moment for any rationalist– our “Uncle Ben shot by the mugger” moment, if you will– is the moment you go “holy shit, everyone in the world is fucking insane.”

First, you can say “holy shit, everyone in the world is fucking insane. Therefore, if I adopt the radical new policy of not being fucking insane, I can pick up these giant piles of utility everyone is leaving on the ground, and then I win.”

Second, you can say “holy shit, everyone in the world is fucking insane. However, none of them seem to realize that they’re insane. By extension, I am probably insane. I should take careful steps to minimize the damage I do.”

I want to emphasize that these are not mutually exclusive. In fact, they’re a dialectic (…okay, look, this hammer I found is really neat and I want to find some place to use it).

(I would define a "dialectic" as two superficially opposite statements, both of which are true, in such a way that resolving the apparent paradox forces you to build a usefully richer world-model. I have not run this definition past Ozy, much less Hegel.)

To which Eliezer replied in 2017:

But, speaking first to the basic dichotomy that’s being proposed, the whole point of becoming sane is that your beliefs shouldn’t reflect what sort of person you are. To the extent you’re succeeding, at least, your beliefs should just reflect how the world is.

Good News, Everyone!

I did an empirical test and found no sign of a general factor of trusting your own reasoning.

But I did find lots of disagreement. What are people deciding based on?

Types of Disagreement

Explicit

The expert actually said the opposite of your conclusion.

This is the simplest form of disagreement. The expert could still be wrong, or lying, or communicating unclearly and not actually saying what you think they said. But at least there's no complications from the form of disagreement itself.

The "communicating unclearly" hypothesis should always be high in your probability space. Communication is hard. (It could be spun as "you misunderstood", but this makes the theory less pleasant without altering its substance, so generally don't bother.)

The lying theory is tricky, as it can explain anything. A good lying-based theory should explain why the expert told that particular lie out of millions of possibilities. Without that, the theory contains a contrived coincidence, and should be penalized accordingly.
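The penalty can be made concrete with the odds form of Bayes' rule. All numbers below are hypothetical: a lying theory that does not explain why this particular statement was chosen has to spread its probability over every possible lie, so its likelihood for the observed statement is tiny, while a theory with a specific motive that predicts this statement escapes the penalty.

```python
def posterior_odds(prior_odds: float, p_obs_given_a: float, p_obs_given_b: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_obs_given_a / p_obs_given_b)


# Hypothetical numbers, for illustration only.
POSSIBLE_LIES = 1_000_000                  # statements the expert could have lied with
p_if_honest = 0.5                          # chance an honest expert says exactly this
p_if_unexplained_lie = 1 / POSSIBLE_LIES   # no reason to pick this lie over the others
p_if_motivated_lie = 0.5                   # a motive that specifically predicts this statement

# Start indifferent (odds 1:1) between each lying theory and honesty.
unexplained = posterior_odds(1.0, p_if_unexplained_lie, p_if_honest)
motivated = posterior_odds(1.0, p_if_motivated_lie, p_if_honest)
print(f"unexplained lie vs honest: {unexplained:.1e}")
print(f"motivated lie vs honest:   {motivated:.1f}")
```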

With explicit disagreement, you may be able to find the expert's reasoning. If so, try to find the crux and recurse. If you can't follow the reasoning, this is a sign that the expert either understands something important that you don't, or is spouting complete bullshit. Look for signs that the expert has substance, such as real-world accomplishments or specific endorsements from notably perceptive and honest people.

Superlative or Silence

If an expert oncologist says "glycolysis inhibitors are the best available treatment for this sort of cancer" -- and you think it might be better to use oil-drop immersion to sequence the tumor cells, find the root mutation, and CRISPR an aggressive protease into that spot -- then technically you are disagreeing. This is similar to if she is asked for something better than glycolysis inhibitors and says nothing.

But it's not really an active disagreement. Most likely she's never considered the sequence-and-CRISPR possibility.

(Alternative theory: she doesn't consider a therapy to be "available" if it isn't well-tested and endorsed by appropriate parties, even if it is something an agenty patient with moderate wealth could probably obtain. Communication is hard!)

Nobody considers every possibility. What people who try usually do in practice is depend on a large community to do the search. This means that if an individual disagrees with you via superlative or silence, your real disagreement is with the community they depend on.

Implied by Action

An expert takes actions which suggest a disagreement.

A venture capitalist declines to invest in a cancer-curing startup. Does this mean he thinks the cure doesn't work?

Maybe there are other factors in the decision. Maybe he thinks the founder has no head for business, and the company is fatally underestimating regulatory hurdles, but he would invest in the same technology in better hands. Unless you have a very close seat of observation, this looks the same to you.

Or maybe his values are not what you think. Maybe he's very focused on short-term returns, and a company which is five years away from revenue is of no interest to him.

Types of Experts

Individuals

By far the most straightforward.

Do consider how much of an expert the person is on this exact question. Many experts have broad knowledge, which comes at an opportunity cost of depth. Do you know more about medicine than a randomly chosen doctor? Almost certainly not. (Unless you are a doctor yourself, of course.) Do you know more about one particular disease that said doctor doesn't specialize in? Perhaps one that you have? Entirely possible. More about the exact variation which you have? (Cases of a disease are not identical.) Quite likely.

Also consider how much effort the expert put into forming their opinion. Do they have skin in the game? Is it their job to worry about questions like this? (And if so, how dutiful are they?) Is this their way of brushing you off? (If so, respect that decision and go ponder elsewhere.) The further down this list you go, the more likely they are to be wrong because they don't care enough to be right.

Institutions or Communities

Eliezer just wrote an entire book on the subject. You'd think I wouldn't have much to add.

In fact I do. In addition to everything he wrote, consider the size of the community. This is especially relevant to superlative or silence disagreements, where the key question is "Did anybody evaluate this?"

The ratio of particle physicists to types of fundamental particles is about a thousand to one, and the number of ways those particles can interact is pretty limited. The odds that you've thought of a good question about particle physics which no established physicist already considered are pretty low. (Unless you have studied deeply in the field, of course.)

The ratio of entomologists to species of insect is closer to one to a thousand. Asking a good question about insects which no entomologist has seriously considered is pretty easy.

(The ratio of intestinal microbiologists to species in the intestinal microbiome is unknown -- we don't have a good estimate of the latter.)

I suspect this is an important piece of why Eliezer was able to outdo the psychiatric community regarding Seasonal Affective Disorder. True, there are a lot of psychiatric researchers, but there are a lot of conditions for them to study, and SAD isn't a high priority. Once you zoom in to those working on treatment-resistant SAD, the numbers aren't that high, and things fall through the gaps.

An order-of-magnitude Fermi estimate can be useful here.
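A minimal sketch of such an estimate in Python. The only figures it uses are the two ratios quoted above (roughly a thousand particle physicists per particle type, roughly one entomologist per thousand insect species). The helper names are hypothetical, and `narrowed_pool` (multiplying by the share of researchers who work on each narrower sub-question, as in the treatment-resistant SAD case) is an illustrative assumption: the shares you feed it are your own guesses, not real figures.

```python
import math


def experts_per_topic(num_experts: float, num_topics: float) -> float:
    """Average number of specialists per distinct topic: a crude Fermi ratio."""
    return num_experts / num_topics


def order_of_magnitude(x: float) -> int:
    """log10(x) rounded to the nearest integer, e.g. 1000 -> 3 and 0.001 -> -3."""
    return round(math.log10(x))


def narrowed_pool(num_experts: float, *shares: float) -> float:
    """Experts left after zooming in: multiply by the (guessed) share of the
    community that works on each successively narrower sub-question."""
    pool = num_experts
    for share in shares:
        pool *= share
    return pool


if __name__ == "__main__":
    # The two ratios quoted in the text, expressed as experts per topic.
    physics = experts_per_topic(1000, 1)     # ~1000 physicists per particle type
    entomology = experts_per_topic(1, 1000)  # ~1 entomologist per 1000 species
    print(order_of_magnitude(physics))       # 3
    print(order_of_magnitude(entomology))    # -3
```

A difference of six orders of magnitude between the two fields is the kind of gap that matters here; the exact counts do not.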

Traditions or Biology

Traditions and biology have something in common: they are formed by evolution.

Evolution does not share your values. You can often tell what its values are, but it's a constant effort to remember this.

It also has a tendency to take undocumented dependencies on circumstances. It can be really hard to figure out which aspects of the environment make the solution that evolution picked the ideal one.


Zvi has written about this at length.

One thing I'd add is low friction, which matters especially for refining odds far from 50%. If someone's offering a bet on "pi will equal 3 tomorrow" and I can buy "no" for $0.96 and get $1.00 when it resolves, but the betting site will take 5% of my winnings as a fee, I'm not going to bet. So the odds on the proposition will stay at the absurdly high 4%.

Tying up money that has been bet counts as a type of friction. If $0.96 will get me $1.00 in a year with no fee, but index funds are getting 5%/year and I can't do both, I have very little reason to take the bet. Labor can also be a type of friction, especially if bet sizes are capped. If I could gain $100 but it would require five hours of wrestling with terrible UIs or APIs (or studying cryptocurrencies), there's little incentive to bet.
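A minimal worked version of the arithmetic in the betting and tied-up-money examples, using Python's standard `decimal` module so the cents come out exact. The text says the site takes "5% of my winnings"; since that could mean 5% of the $1.00 payout or 5% of the $0.04 profit, the sketch computes both. `net_profit` and its `fee_on` parameter are illustrative names, not any betting site's API.

```python
from decimal import Decimal


def net_profit(price: Decimal, payout: Decimal, fee_rate: Decimal, fee_on: str = "payout") -> Decimal:
    """Profit per contract after the site's fee.

    fee_on="payout": the fee is a share of the whole amount paid out.
    fee_on="profit": the fee is a share of the gain over the purchase price.
    """
    gross = payout - price
    if fee_on == "payout":
        fee = payout * fee_rate
    elif fee_on == "profit":
        fee = gross * fee_rate
    else:
        raise ValueError(f"unknown fee_on: {fee_on!r}")
    return gross - fee


PRICE = Decimal("0.96")       # cost of "no" on "pi will equal 3 tomorrow"
PAYOUT = Decimal("1.00")      # what "no" pays when the bet resolves
FEE_RATE = Decimal("0.05")    # the 5% fee from the example
INDEX_RATE = Decimal("0.05")  # 5%/year from index funds

no_fee = net_profit(PRICE, PAYOUT, Decimal("0"))
fee_on_payout = net_profit(PRICE, PAYOUT, FEE_RATE, "payout")
fee_on_profit = net_profit(PRICE, PAYOUT, FEE_RATE, "profit")
index_gain = PRICE * INDEX_RATE  # what the same $0.96 earns in an index fund over a year

if __name__ == "__main__":
    print("no fee:", no_fee)                  # 0.04
    print("5% of payout:", fee_on_payout)     # -0.0100, a guaranteed loss
    print("5% of profit:", fee_on_profit)     # 0.0380
    print("index fund, 1 year:", index_gain)  # 0.0480
```

Under the payout reading the bet loses a cent per contract, so nobody takes it. The profit reading leaves a small gain, and the year-long version of the bet earns less than the same $0.96 would in an index fund even with no fee at all.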

Zvi describes how friction like this can drive people away from a market and make it too small to be useful. And that may well be the primary effect. But it can also cause asymmetric distortions, driving all probabilities toward the middle.
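One way to see the pull toward the middle, sketched under stated assumptions: a trader who is certain of an outcome will pay at most the discounted, after-fee payout for the side they know wins, so informed trading stops before prices reach 0% or 100%. The sketch assumes "yes" and "no" prices sum to $1, that the fee is charged on the $1 payout, and that the trader's money could otherwise earn `annual_rate`; `reachable_price_range` is a hypothetical helper name.

```python
from typing import Tuple


def reachable_price_range(fee_on_payout: float, annual_rate: float, years: float) -> Tuple[float, float]:
    """Lowest and highest "yes" prices that informed trading will push a market to.

    Assumptions: "yes" and "no" prices sum to $1, the site keeps `fee_on_payout`
    of the $1 payout, and a trader's money could otherwise earn `annual_rate`.
    A trader certain of the outcome pays at most the discounted after-fee payout
    for the winning side, so trading stops short of 0% and 100%.
    """
    ceiling = (1 - fee_on_payout) / (1 + annual_rate) ** years
    return 1 - ceiling, ceiling


if __name__ == "__main__":
    # Fee friction only: 5% of the payout, resolving tomorrow.
    print(reachable_price_range(0.05, 0.0, 1 / 365))  # about (0.05, 0.95)
    # Capital friction only: no fee, but money tied up for a year vs. 5%/year elsewhere.
    print(reachable_price_range(0.0, 0.05, 1.0))      # about (0.048, 0.952)
```

In both cases informed traders won't push a certainly-false claim below about 5% or a certainly-true one above about 95%, so friction on either side squeezes the extremes inward. The sketch ignores the labor and bet-size frictions the text also mentions.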


Bay Solstice 2019 Retrospective

January 16, 2020 - 20:15
Published on January 16, 2020 5:15 PM UTC

I was the Creative Director for last year’s Winter Solstice in the Bay Area. I worked with Nat Kozak and Chelsea Voss, who were both focused more on logistics. Chelsea was also the official leader who oversaw both me and Nat and had final say on disputes. (However, I was granted dictatorial control over the Solstice arc and had final say in that arena.) I legit have no idea how any one of us would have pulled this off without the others; love to both of them and also massive respect to Cody Wild, who somehow ran the entire thing herself in 2018.

While I worked with a bunch of other people on Solstice, all opinions expressed here are my own. 

Process

Beginnings & research phase

Chelsea, Nat, and I spent just about a full year preparing for Solstice. Our first meeting as a group was on January 20th, and we received official confirmation that we’d be in charge about a month later. In the following month, the three of us had roughly weekly planning meetings via video chat. In the first meeting, we set goals for what we wanted to have done by the first of April, and after that we checked in regularly for a while.

My goal by the first of April was to have a rough outline of the entire arc and a plan for how to make the tone be coherent. I also wanted to get a better handle on the task, and to that end I had conversations with several previous Solstice organizers and also read as much as I could. This included Ray’s sequence on ritual, most of the writing on the Secular Solstice website (both the blog and the resources section were useful), discussions on the Rational Ritual Facebook group, and whatever else I could find. 

I also spent a full day listening to every single song that had ever been recommended for Solstice (see below) and making notes on them in a spreadsheet. When I ran out of songs in that reference class, I sorted my iTunes library by most listens and started going through to see if any of those songs might fit. This was quite surprisingly fruitful - I actually ended up using three of the songs that were originally put on my list in this way.

Another thing I did was go around and ask people what they wanted out of Solstice. I got responses that were actually fairly useful, like “I lose interest during long speeches” and “there should be more singalongs” and “I like singalongs but a lot of the songs have really complicated tunes and I can’t handle it.”

I think even with all that research and preparation, I still didn’t have a very good sense of the history of Solstice. I had only been to two big Bay Area Solstices, plus a private Solstice in the woods, plus a short, small, off-the-cuff Solstice that Habryka and I ran at my mom’s house in 2018. I wasn’t around for early Solstices, and I’d never seen what they were like in Boston, Seattle, or New York. 

Creating the setlist

The first setlist

I met the April 1st deadline for having an outline of the entire arc. I drew up my first setlist in my notebook on a five-hour plane ride. I was taking into consideration more small snippets of advice than I can list here, but I can quote the guiding goal I referred to throughout the entirety of my time working on this Solstice:

I want my Solstice to be about the brokenness of the world and the ethos of "somebody has to and no one else will", but also the fact that even if you exercise your individual agency to do the most you can, we still might fail. 

Below are some sample pages from my notebook, with most names redacted. Note that I didn't ask any of these people before assigning them to songs; this was just idle speculation on a plane ride about who might be able to pull off each song.

Some sample pages from my notebook

After that was a process of constant iteration. 


We had our first full run-through of the setlist on June 16th. In the two months leading up to this, Chelsea checked in with me weekly. I had a lot of speeches to write and planned to write one and send it to her each week (which I more or less accomplished). I found this accountability mechanism quite helpful.

The first run-through was just me, Habryka, and Chelsea. We sat on the floor of my bedroom, played recordings of the songs to sing along with, and took turns reading the speeches. For this run-through, I wrote three new speeches, re-used two existing Solstice speeches, and threw in Pale Blue Dot, an abridged version of the poem Ring Out, Wild Bells, and the Sequences post On Doing The Impossible. After the run-through, which took about two hours (we timed each piece), the three of us debriefed, and I used the extensive notes I took to make changes to the setlist.

I felt burned out after this and had the luxury of taking all of July off from Solstice work. This was a huge benefit of starting so early.

The next run-through was in September, scheduled around the dates when our featured musician, Taylor (who lives in Vermont), would be in town. This was significantly more involved but still fairly informal. Seven people participated in total (including Ray, who was experimenting with projection), we provided our own instrumentation for most of the songs, and most of the speeches were read by the people who would eventually give them. Afterwards, each person individually gave me feedback, and again, I made significant changes to the setlist in response.

Regular coworking hours

Around the time of the September run-through, things really picked up. Going into crunch time, Chelsea, Nat, and I set up a regular time for Solstice coworking. We met for a couple hours every Monday night - that served both as our designated time to work on Solstice things (since we all also have day jobs) and an opportunity for in-person communication about high-context or sensitive topics.

Choosing performers

Skill level

We put out a call for auditions in August, but we didn’t publicize it very well, so we didn’t receive many applications. I required auditions for speakers and singalong vocalists, but not for instrumentalists - in retrospect, this was an obvious mistake. I let almost all of the instrumentalists who applied participate in as many songs as they wanted, but there were issues with everything from not having time to rehearse, to not being able to play in the key the vocalist wanted, to disagreements on chords and time signatures.

Since we didn’t get enough applications to fill all the slots, I reached out to people who I already knew were competent from seeing them perform in previous Solstices or in other contexts (such as at jam sessions or in the REACH Musical Revue). This ended up working quite well - I think that everyone I chose in this way gave a good performance.

NYC has often hired professional musicians, rather than having community members provide the instrumentation - a possibility I knew about but never seriously considered. While I still wouldn’t pay for professional musicians, I’ve come to understand better why someone might decide to do so, after experiencing the difficulties (mentioned above) of getting amateur musicians to produce a performance-ready piece.

On the other hand, a lot of the amateurs did do an excellent job! So I think the takeaway here is just that it’s important to have people audition (or be very sure of their competence level in some other way) and make sure they’re able to put in the time commitment to rehearse.

Orientation towards Solstice as a whole

Something I struggled with a fair amount was disagreements about arc cohesion vs showcasing technical skill. Arc cohesion trumped all other concerns for me, with singalongability a close second. However, it’s often the case that when singers are given the chance to perform, they want to do something more interesting than just lead the melody of a singalong, so I was sometimes at loggerheads with performers who wanted to do more complex pieces or include intricate harmonies. There are some pieces for which I regret not being firmer about putting my foot down on this issue, and I think that ultimately it’s probably reasonable to exclude performers on this basis if you can’t come to an agreement.

On the flip side, I was extremely grateful to the people whose pieces I cut after the dress rehearsal, a week before the performance. I apologized to all of them and gave them the opportunity to contest the decision, but they were all really great about it and said things along the lines of “I care way more about Solstice being better than I do about cashing out on the work I did.” To those people - I really appreciate you, and thank you for being wonderful!

Final preparations

In late November we did a walk-through of the venue (important for testing planetarium footage and lighting options), then in early December we had a tech rehearsal at the planetarium (mostly for A/V) and a dress rehearsal (not at the planetarium). Much of the benefit here was logistical, and I won’t touch on that too much since it’s not my wheelhouse, but it also gave me a chance to see what the final product would be like. I was originally supposed to stop editing the setlist entirely on November 5th, but in reality, I made several changes to the order and even cut some pieces in response to the dress rehearsal, and the exact content of some things was kind of up in the air even on the day of. While I technically missed my deadline or whatever, I don’t regret that at all - I think my Solstice was quite significantly better than it would have been if I had stopped iterating on the setlist a month earlier.

Considerations for creating a setlist

Creating a cohesive message

When it came to speeches, I took an arc-first approach. I decided what message fit in every place in the arc and found a piece that fit there. Only after that did I approach potential speakers. I ended up using four existing pieces wholesale; the rest were based on existing pieces but were either remixed or heavily edited by the speaker. The takeaway action item here is to have something very specific in mind for each speech and to convey that to the speaker. This improves cohesion while still allowing each speaker to put their own twist on the piece if they so choose.


  • Tessa remixed Ray’s A Bottomless Pit of Suffering to make more sense in Berkeley (it was originally given in NYC, where it’s very cold) and to have more of her own voice.
  • I had Nate Soares on the docket from quite early on in the process, but he’s busy executive-directing MIRI so I didn’t want to take up too much of his time. With his input, I chose an old post from his blog for him to read (How We Will Be Measured), and Chelsea and I edited it to be shorter and more appropriate as a speech. Nate then made a bunch of his own changes to reflect the ways his thinking has changed over the past few years, but ultimately it was still recognizable as the same piece.
  • Though it ended up being cut at the very last minute, Alex Altair adapted a scene from HPMoR into a speech, which was pretty cool.
Creating a smooth arc

In order to create a smooth experience, you have to make sure that there are smooth transitions between the message, emotional tone, and musical/artistic feel of each piece and the next. It turns out to be really hard to match these all up. For example, there are some funny/upbeat/light-hearted songs about death (e.g. We Will All Go Together When We Go), and some fairly serious-sounding songs about more light-hearted topics (e.g. Time Wrote the Rocks). Some songs are up-tempo, some are slow and mournful, some have percussion, some are performed by choir. There are just a ton of considerations. (This is why Ray writes so many of his own songs - that’s the only way you can really have control over the message, tone, and feel all at once.)

I was aware of all of these considerations, and that’s a big part of the reason that I made sure to run through each version of the setlist from start to finish, but I don’t think I quite got it right until the final performance (even the dress rehearsal, one week prior, was fairly rocky in this regard). And even then there were still a few problems, like the energy drop from Son of Man to Uplift and the energy drop from Singularity to Five Thousand Years (making Five Thousand Years a rather anticlimactic finale).


The previous Solstices I had attended were just a series of pieces strung together, and the audience were mostly left to discern the arc on their own. At some point in my research, I saw a video clip of Kenzi MCing a Solstice, and I immediately decided I wanted to do that. 

In my opinion, there are a ton of advantages to having an MC. Here are a few:

  1. It gives the audience more insight into why each piece was chosen and generally gives you a chance to tie the arc together more explicitly.
  2. It allows you to make announcements during the program without breaking immersion, such as giving trigger warnings or asking the audience to hold their applause until further notice.
  3. It fills what would otherwise be awkward pauses between pieces as performers get on and off the stage.
Post-performance feedback

About 50 people filled out the feedback survey, and their feedback falls into a few rough categories for me:

  1. “This person is right and I would change/tweak this if I had it to do over”
  2. “This feedback should not be acted upon”
  3. “Feedback on this is very split, and you can’t win ‘em all”
Things I would tweak


The peak-end rule is really important, and my Solstice didn’t have a very strong ending. I had to go up onstage and be like, “Okay, now it’s over.” If I had it to do over, I might cut Five Thousand Years and end on Singularity, which everyone loved.

Moment of darkness

The moment of darkness itself (two minutes of silence in the pitch black) got mixed reviews - some people found it very powerful, some people found it existentially horrifying and had to distract themselves, and a lot of people found it didn’t really land. The main thing I would change here is the way I introduced it.

I said, “We’re about to sit in silence for two minutes. If you’re up for it, I want you to look up at the stars, and think of someone you’ve lost. Someone whose voice you will never hear again, whose mind is gone from the world forever. Give them your grief, yes, but also give them your resolve.”

In retrospect, this was far too specific an ask. A lot of people said that the moment of darkness didn’t really land specifically because they’d never lost anyone close to them. (I copied the text from the Solstice Habryka and I ran in Madison, where it worked very well, but where the circumstances were very different in quite a number of significant ways.) If I had it to do over, I would encourage people to sit with their feelings, wherever they were at, rather than prescribing something for them to think about. 

First speech

I struggled for a long time to find or write an appropriate speech for the first-speech slot in the program. It was only a day or so before the dress rehearsal that I settled on giving an abridged version of Nate’s This Is A Dawn. While it had roughly the right message, I don’t think I myself was that bought into it, and as a result, people seemed to find it a bit generic, and not really meaningful. I’m not actually sure what in particular I would do here if I had a do-over, but I do want to highlight that the first-speech slot is quite important and I definitely didn’t totally nail it.

Feedback that should not be acted upon

A hopefully uncontroversial example of this is the person who doesn’t like the sound of strummed instruments, and therefore gave a low rating to every song with strummed guitars. Sure, this is a valid way to feel, but at the same time, one person’s preference in this area does not mean it makes sense to cut all guitars from Solstice.

A more controversial example, but one that I am still willing to stand by publicly, is the common complaint that Son of Man is sexist. Look, I’m a woman. Chelsea is a woman. The person who soloed on Son of Man is a woman. My sense is that, while some people were genuinely offended, and that came through in their feedback, most of the people who registered complaints were just people who were worried that other people might have been offended. I continue to think this song is an excellent fit for Solstice message-wise and has great energy (it was intended to be performed a bit faster but there were some technical difficulties with the drums). I would not hesitate to include it again.

Triggering content

Eating disorder trigger

For complicated reasons, there was a brief discussion of weight loss at the end of Solstice. It was intended as a sort of light-hearted post-credits piece, but we mishandled it, and people didn’t end up getting the chance to leave if the topic was difficult for them. This had significant negative consequences for some people, and I sincerely apologize for that.

We’re taking steps to make sure that future Bay Solstices are more careful around sensitive topics like this. Specific action items include providing verbal trigger warnings in addition to the ones written in the program, and allowing significantly more time for people to leave if they need to, including having some people planted in the audience to stand up and leave so that it feels socially okay to do so. (Even though I myself won’t be running Solstice next year, I’m in close contact with next year’s organizers and have made at least one of them aware of this.)

Other triggers

On the feedback form, some people mentioned being very upset by Solstice because it reminded them that they were lonely or felt like they could be accomplishing more. I do not think anything should change about Solstice itself in response to this feedback, because being reminded that the universe is vast and dark and cold is pretty much the entire point of Solstice. 

Perhaps in the future it would be good to make it clear to potential Solstice-goers that Solstice deals a lot with death, individual responsibility, and the vast, uncaring universe. Then they can make more of an informed choice about whether or not to go, and if they can’t handle it, they can’t reasonably blame it on anyone but themselves.

Individual pieces

Raw feedback

(The table below has the pieces in chronological order of how they appeared in the performance.)

#   Piece                           Hated  Dislike  Meh  Like  Love
1   The Circle                          1        3    6    12    21
2   The X days of X-Risk                3        4    7    16    14
3   To Drive the Cold Winter Away       0        1   14    16    11
4   Bold Orion                          0        1    4    20    16
5   Time Wrote the Rocks                0        6   10    17     9
6   This is a Dawn (abridged)           0        0   12    19     5
7   Hard Times Come Again No More       0        2    8    18    12
8   There Will Come Soft Rain           3        6   12    11     9
9   Pale Blue Dot                       0        0    3    12    25
10  Stardust                            0        2   14    13    14
11  Do You Realize                      1        4    6    16    14
12  A Bottomless Pit of Suffering       1        1    7    15    14
13  Bitter Wind Lullaby                 0        3   11    16    10
14  Eulogy                              0        0    1     8    33
15  The Moment of Darkness              1        1    5    15    17
16  Spoken Call/Response                0        2   11    18     7
17  We Are the Light                    2        1   13    16     5
18  Light Pollution                     0        1   12    19     3
19  Endless Lights                      0        3   11    14     9
20  Brighter Than Today                 0        1    2    16    18
21  How We Will Be Measured             0        4   12    15     8
22  Son of Man                          3        6   10    15     6
23  Uplift                              0        1    7    15    17
24  What it means to win                1        2    6    20     5
25  Beautiful Tomorrow                  1        3    7    16    12
26  Singularity                         1        2    0     9    31
27  Five Thousand Years                 2        4    7    17    10
28  After-the-Credits-Eliezer-Bit       8        6    5     9    14

At a high level, most people liked most things! This is heartening. 

Ray and I sorted the pieces four different ways*, and there were five pieces that clearly came out on top and five that (only a little bit less) clearly came out at the bottom.

Top five:


  1. Eulogy
  2. Singularity
  3. Pale Blue Dot
  4. Brighter Than Today
  5. Bold Orion

Bottom five:


  1. After-the-Credits Eliezer bit
  2. Son of Man
  3. There Will Come Soft Rain
  4. We Are the Light
  5. Light Pollution

Effect of delivery

Something I notice that’s interesting (but not that surprising) is the large effect that delivery had on people’s ratings. For example, two of the highest rated pieces were Singularity and Pale Blue Dot. In addition to a solid delivery by Chelsea, Pale Blue Dot had a backing track and custom planetarium footage. Singularity was extremely energetic and fun, and people had generally positive affect towards all of the songs that prominently featured Taylor because he’s such an obviously skilled musician. Brighter Than Today and Bold Orion were also energetic and very polished performances.

By contrast, people were relatively lukewarm on Endless Lights and Bitter Wind Lullaby, two Solstice staples that I think of as being fairly well-liked in general. Both of these songs had significant problems with their execution, with the performers having trouble agreeing on the time signature. As a result, it was difficult for people to sing along, which seems to have made for a negative overall impression.

Addressing the elephant in the room

An additional thing that people who attended this Solstice might want to see addressed is what the heck was up with the Eliezer piece. Even apart from those who found it triggering or otherwise inappropriate, a lot of people were just confused about why it happened (e.g., several people's reaction was, "Why is this guy talking to me like I'm his friend, I don't even know him"). The explanation is perhaps not all that satisfying, but I'll give it anyway.

In early October, Eliezer contacted me asking if he could do a shenanigan at Solstice. He explained his idea to me, and while I didn't really see how it would fit in, I also didn't want to reject him out of hand.

I talked to a couple people I trusted about this, and we came to the conclusion that it would be pretty valuable to have Eliezer onstage. The reasons for this were a bit nebulous, but roughly rested on the following:

  • Regardless of any single community member's personal feelings on him and his writings, it's hard to deny that this community would not exist as it does today without Eliezer. (I, for example, came in through a chance encounter with HPMOR in high school, and basically every aspect of my current life is a direct result of that encounter.)
  • Eliezer has increasingly retreated from public life over the past few years, and this has resulted in some feelings of abandonment on the part of the community.
  • Having Eliezer onstage during Solstice would show his implicit support for the community and the event; following the above, it would remind the audience of what brought us all together and that we haven't been abandoned by our founders.

Based on this reasoning, it was having Eliezer onstage that mattered, and the content of his piece wasn't really relevant. The eating disorder trigger was honestly not something I even considered until someone mentioned it after the dress rehearsal. It was at that point that I decided to move the piece to be 'post-credits' (it had previously been early in the program proper), to make it opt-out for people uncomfortable with the topic, but as mentioned above, I failed to handle this correctly.

It’s also worth noting that, while more people hated the Eliezer bit than hated any other piece, there were also a fair number of people who loved it (if you sort by the raw number of Loves, it comes dead middle). So it was in fact not universally reviled (lots of people found it hilarious or heartwarming); it was just very polarizing.

Summary of takeaways

This is just all of the takeaways from the main body of this post, in the order that they appeared.

  • Starting a year in advance and testing and iterating often makes for a really good final product but also burns you out like hell. I think this was ultimately definitely worth it, but if I was told I had to do another year of this I would probably flee the country.
  • Deciding on a central theme/thesis for your Solstice early on is really important.
  • Set a regular time to work on Solstice things so that they don’t slip through the cracks, especially if you have a full-time job. It’s best if you can meet with other people regularly for this purpose, because accountability.
  • While hiring professional musicians may be easier, there are enough skilled musicians in the Bay Area rationalist community that I think it’s worthwhile to draw on them instead, especially since this makes it feel more like a community event. Just make sure that people audition (or are known to be skilled and easy to work with) and can commit to rehearsing with each other.
  • Choose speakers and other performers largely based on their skill level, but it’s also important to make sure that they’re value-aligned with you when it comes to the Solstice you’re creating together.
  • It’s okay to iterate on the content until the very last minute so long as everyone is on the same page / no one is thrown off or blind-sided by the late changes.
  • If you want your arc to be really cohesive, you need to exert centralized control over each piece rather than just leaving performers to do their own thing.
  • It’s really hard to create a smooth arc over all the dimensions that matter. If you can write your own songs or work with a friend who can write original songs, this is a huge asset.
  • MCing is great.
  • Not all feedback should be acted upon.
  • Pay attention to the peak-end rule.
  • Potentially triggering topics should be handled more carefully than they were by me. It’s important for people to have a genuine opportunity to make an informed choice about what they’re exposed to.
  • Delivery/execution of pieces is just as important as (if not more important than) the semantic content and the fit in the arc.
Resources

Setlist spreadsheet

I love spreadsheets with a passion, and I found keeping all of the relevant material in one place to be enormously helpful both for me and for communication purposes. (Whenever someone had a question about the arc, the performers, or anything, we could just pull up the spreadsheet, and even make a copy of it to see how changing the order of the pieces would feel.)

Here is a template for the spreadsheet I used. Let me know if anything is unclear!

Masterlist of Solstice materials

Daniel Speyer runs the Secular Solstice GitHub page, which is a useful resource, but it’s also very hard to edit - especially if, like me, you’re not a programmer and don’t know how to use GitHub in general. The Giant Epic Rationalist Solstice Filk spreadsheet is likewise a useful resource, but it’s kind of a mess. So I made my own spreadsheet, which is publicly editable and incorporates every song, poem, story, and speech from the above two repositories. (Apologies to Daniel Speyer and to anyone who sees this as polluting the commons by instantiating too many competing projects.)

* The sorting algorithms we used were the following:

  1. % Positive : (Liked + Loved) / (Total responses)
  2. Overall-Liked : (Liked + Loved) – (Disliked + Hated)
  3. Weighted : (2.25*Loved + Liked) – (2.25*Hated + Disliked)
  4. Loved : Raw number of ‘Love’s
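As a sketch of how these four orderings can be computed, the Python below applies them to the counts as transcribed from the feedback table. It assumes "Total responses" means the sum of the five rating columns for a piece (blank answers excluded); the function names are hypothetical. Ties keep table order because Python's `sorted` is stable, even with `reverse=True`.

```python
# Sketch: the four sort orders from the footnote, applied to the feedback-table counts.
# Assumption: "Total responses" = Hated + Dislike + Meh + Like + Love for that piece.

# (piece, hated, dislike, meh, like, love)
RATINGS = [
    ("The Circle", 1, 3, 6, 12, 21),
    ("The X days of X-Risk", 3, 4, 7, 16, 14),
    ("To Drive the Cold Winter Away", 0, 1, 14, 16, 11),
    ("Bold Orion", 0, 1, 4, 20, 16),
    ("Time Wrote the Rocks", 0, 6, 10, 17, 9),
    ("This is a Dawn (abridged)", 0, 0, 12, 19, 5),
    ("Hard Times Come Again No More", 0, 2, 8, 18, 12),
    ("There Will Come Soft Rain", 3, 6, 12, 11, 9),
    ("Pale Blue Dot", 0, 0, 3, 12, 25),
    ("Stardust", 0, 2, 14, 13, 14),
    ("Do You Realize", 1, 4, 6, 16, 14),
    ("A Bottomless Pit of Suffering", 1, 1, 7, 15, 14),
    ("Bitter Wind Lullaby", 0, 3, 11, 16, 10),
    ("Eulogy", 0, 0, 1, 8, 33),
    ("The Moment of Darkness", 1, 1, 5, 15, 17),
    ("Spoken Call/Response", 0, 2, 11, 18, 7),
    ("We Are the Light", 2, 1, 13, 16, 5),
    ("Light Pollution", 0, 1, 12, 19, 3),
    ("Endless Lights", 0, 3, 11, 14, 9),
    ("Brighter Than Today", 0, 1, 2, 16, 18),
    ("How We Will Be Measured", 0, 4, 12, 15, 8),
    ("Son of Man", 3, 6, 10, 15, 6),
    ("Uplift", 0, 1, 7, 15, 17),
    ("What it means to win", 1, 2, 6, 20, 5),
    ("Beautiful Tomorrow", 1, 3, 7, 16, 12),
    ("Singularity", 1, 2, 0, 9, 31),
    ("Five Thousand Years", 2, 4, 7, 17, 10),
    ("After-the-Credits-Eliezer-Bit", 8, 6, 5, 9, 14),
]


def pct_positive(hated, disliked, meh, liked, loved):
    """1. % Positive: (Liked + Loved) / (Total responses)."""
    return (liked + loved) / (hated + disliked + meh + liked + loved)


def overall_liked(hated, disliked, meh, liked, loved):
    """2. Overall-Liked: (Liked + Loved) - (Disliked + Hated)."""
    return (liked + loved) - (disliked + hated)


def weighted(hated, disliked, meh, liked, loved):
    """3. Weighted: (2.25*Loved + Liked) - (2.25*Hated + Disliked)."""
    return (2.25 * loved + liked) - (2.25 * hated + disliked)


def loved_count(hated, disliked, meh, liked, loved):
    """4. Loved: raw number of 'Love' ratings."""
    return loved


METRICS = {
    "% Positive": pct_positive,
    "Overall-Liked": overall_liked,
    "Weighted": weighted,
    "Loved": loved_count,
}


def ranked(metric):
    """Piece names, best first; ties keep table order because sorted() is stable."""
    rows = sorted(RATINGS, key=lambda row: metric(*row[1:]), reverse=True)
    return [row[0] for row in rows]


if __name__ == "__main__":
    for label, metric in METRICS.items():
        order = ranked(metric)
        print(f"{label}\n  top 5:    {order[:5]}\n  bottom 5: {order[-5:]}")
```

With these counts, the Weighted and % Positive orders both put Eulogy, Singularity, Pale Blue Dot, Brighter Than Today, and Bold Orion on top in that order, and the five lowest Weighted scores belong to the five pieces listed as least liked.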

Thanks to Ray Arnold, Nat Kozak, and Chelsea Voss for their input and edits.


Reality-Revealing and Reality-Masking Puzzles

January 16, 2020 - 19:15
Published on January 16, 2020 4:15 PM UTC

Tl;dr: I’ll try here to show how CFAR’s “art of rationality” has evolved over time, and what has driven that evolution.

In the course of this, I’ll introduce the distinction between what I’ll call “reality-revealing puzzles” and “reality-masking puzzles”—a distinction that I think is almost necessary for anyone attempting to develop a psychological art in ways that will help rather than harm. (And one I wish I’d had explicitly back when CFAR was founded.)

I’ll also be trying to elaborate, here, on the notion we at CFAR have recently been tossing around about CFAR being an attempt to bridge between common sense and Singularity scenarios—an attempt to figure out how people can stay grounded in common sense and ordinary decency and humane values and so on, while also taking in (and planning actions within) the kind of universe we may actually be living in.


Arts grow from puzzles. I like to look at mathematics, or music, or ungodly things like marketing, and ask: What puzzles were its creators tinkering with that led them to leave behind these structures? (Structures now being used by other people, for other reasons.)

I picture arts like coral reefs. Coral polyps build shell-bits for their own reasons, but over time there accumulates a reef usable by others. Math built up like this—and math is now a powerful structure for building from. [Sales and Freud and modern marketing/self-help/sales etc. built up some patterns too—and our basic way of seeing each other and ourselves is now built partly in and from all these structures, for better and for worse.]

So let’s ask: What sort of reef is CFAR living within, and adding to? From what puzzles (what patterns of tinkering) has our “rationality” accumulated?

Two kinds of puzzles: “reality-revealing” and “reality-masking”

First, some background. Some puzzles invite a kind of tinkering that lets the world in and leaves you smarter. A kid whittling with a pocket knife is entangling her mind with bits of reality. So is a driver who notices something small about how pedestrians dart into streets, and adjusts accordingly. So also is the mathematician at her daily work. And so on.

Other puzzles (or other contexts) invite a kind of tinkering that has the opposite effect. They invite a tinkering that gradually figures out how to mask parts of the world from your vision. For example, some months into my work as a math tutor I realized I’d been unconsciously learning how to cue my students into acting like my words made sense (even when they didn’t). I’d learned to mask from my own senses the clues about what my students were and were not learning.

We’ll be referring to these puzzle-types a lot, so it’ll help to have a term for them. I’ll call these puzzles “good” or “reality-revealing” puzzles, and “bad” or “reality-masking” puzzles, respectively. Both puzzle-types appear abundantly in most folks’ lives, often mixed together. The same kid with the pocket knife who is busy entangling her mind with data about bark and woodchips and fine motor patterns (from the “good” puzzle of “how can I whittle this stick”), may simultaneously be busy tinkering with the “bad” puzzle of “how can I not-notice when my creations fall short of my hopes.”

(Even “good” puzzles can cause skill loss: a person who studies Dvorak may lose some of their QWERTY skill, and someone who adapts to the unselfconscious arguing of the math department may do worse for a while in contexts requiring tact. The distinction is that “good” puzzles do this only incidentally. Good puzzles do not invite a search for configurations that mask bits of reality. Whereas with me and my math tutees, say, there was a direct reward/conditioning response that happened specifically when the “they didn’t get it” signal was masked from my view. There was a small optimizer inside of me that was learning how to mask parts of the world from me, via feedback from the systems of mine it was learning to befuddle.)

Also, certain good puzzles (and certain bad ones!) allow unusually powerful accumulations across time. I’d list math, computer science, and the English language as examples of unusually powerful artifacts for improving vision. I’d list “sales and marketing skill” as an example of an unusually powerful artifact for impairing vision (the salesperson’s own vision, not just the customer’s).

The puzzles that helped build CFAR

Much of what I love about CFAR is linked to the puzzles we dwell near (the reality-revealing ones, I mean). And much of what gives me the shudders about CFAR comes from a reality-masking puzzle-set that’s been interlinked with these.

Eliezer created the Sequences after staring a lot at the AI alignment problem. He asked how a computer system could form a “map” that matches the territory; he asked how he himself could do the same. He asked, “Why do I believe what I believe?” and checked whether the mechanistic causal history that gave rise to his beliefs would have yielded different beliefs in a world where different things were true.

There’s a springing up into self-awareness that can come from this! A taking hold of our power as humans to see. A child’s visceral sense that of course we care and should care—freed from its learned hopelessness. And taking on the stars themselves with daring!

CFAR took these origins and worked to make at least parts of them accessible to some who bounced off the Sequences, or who wouldn’t have read the Sequences. We created feedback loops for practicing some of the core Sequences-bits in the context of folks’ ordinary lives rather than in the context of philosophy puzzles. If you take a person (even a rather good scientist) and introduce them to the questions about AI and the long-term future… often nothing much happens in their head except some random stuck nonsense intuitions (“AIs wouldn’t do that, because they’re our offspring. What’s for lunch?”). So we built a way to practice some of the core moves that alignment thinking needed. Especially, we built a way to practice having thoughts at all, in cases where standard just-do-what-the-neighbors-do strategies would tend to block them off.

For example:

  • Inner Simulator. (Your “beliefs” are what you expect to see happen—not what you “endorse” on a verbal level. You can practice tracking these anticipations in daily life! And making plans with them! And once you’ve seen that they’re useful for planning—well, you might try also having them in contexts like AI risk. Turns out you have beliefs even where you don’t have official “expertise” or credentials authorizing belief-creation! And you can dialog with them, and there’s sense there.)
  • Crux-Mapping; Double Crux. (Extends your ability to dialog with inner simulator-style beliefs. Lets you find in yourself a random opaque intuition about AI being [likely/unlikely/safe/whatever], and then query it via thought experiments until it is more made out of introspectable verbal reasoning. Lets two people with different intuitions collide them in verbal conversation.)
  • Goal Factoring and Units of Exchange. (Life isn’t multiple choice; you can name the good things and the bad things, and you can invest in seeking the alternatives with more of the good and less of the bad. For example, if you could save 4 months in a world where you were allowed to complete your PhD early, it may be worth spending more than several hours scheming out how to somehow purchase permission from your advisor, since 4 months is worth rather more than several hours.)
  • Hamming Questions. (Some questions are worth a lot more than others. You want to focus at least some of your attention on the most important questions affecting your life, rather than just the random details in front of you. And you can just decide to do that on purpose, by using pen and paper and a timer!)[1]

Much good resulted from this—many loved the Sequences; many loved CFAR’s intro workshops; and a fair number who started there went into careers in AI alignment work and credited CFAR workshops as partially causal.

And still, as we did this, problems arose. AI risk is disorienting! Helping AI risk hit more people meant “helping” more people encounter something disorienting. And so we set to work on that as well. The thing I would say now about the reality-revealing puzzles that helped grow CFAR is that there were three, each closely linked with the others:

  1. Will AI at some point radically transform our lightcone? (How / why / with what details and intervention options?)
  2. How do we get our minds to make contact with problem (1)? And how do we think groundedly about such things, rather than having accidental nonsense-intuitions and sticking there?
  3. How do we stay human, and stay reliably in contact with what’s worth caring about (valuing honesty and compassion and hard work; having reliable friendships; being good people and good thinkers and doers), while still taking in how disorientingly different the future might be? (And while neither pretending that we have no shot at changing the future, nor that “what actions should I take to impact the future?” is a multiple choice question with nothing further to do, nor that any particular silly plan is more likely to work than it is?)

CFAR grew up around all three of these puzzles—but (2) played an especially large role over most of our history, and (3) has played an especially large role over the last year and (I think) will over the coming one.

I’d like to talk now about (3), and about the disorientation patterns that make (3) needed.

Disorientation patterns

To start with an analogous event: The process of losing a deeply held childhood religion can be quite disruptive to a person’s common sense and values. Let us take as examples the two commonsensical statements:

  • (A) It is worth getting out of bed in the morning; and,
  • (B) It is okay to care about my friends.

These two commonsensical statements are held by most religious people. They are actually also held by most atheists. Nevertheless, when a person loses their religion, they fairly often become temporarily unsure about whether these two statements (and various similar statements) are true. That’s because somehow the person’s understanding of why statements (A) and (B) are true was often tangled up in (for example) Jehovah. And figuring out how to think about these things in the absence of their childhood religion (even in cases like this one where the statements should survive!) can require actual work. (This is particularly true because some things really are different given that Jehovah is false, and it can take work to determine which is which.)

Over the last 12 years, I’ve chatted with small hundreds of people who were somewhere “in process” along the path toward “okay I guess I should take Singularity scenarios seriously.” From watching them, my guess is that the process of coming to take Singularity scenarios seriously is often even more disruptive than is losing a childhood religion. Among many other things, I have seen it sometimes disrupt:

  • People's belief that they should have rest, free time, some money/time/energy to spend on objects of their choosing, abundant sleep, etc.
    • “It used to be okay to buy myself hot cocoa from time to time, because there used to be nothing important I could do with money. But now, should I never buy hot cocoa? Should I agonize freshly each time? If I do buy a hot cocoa does that mean I don’t care?”
  • People's in-practice ability to “hang out”—to enjoy their friends, or the beach, in a “just being in the moment” kind of way.
    • “Here I am at the beach like my to-do list told me to be, since I’m a good EA who is planning not to burn out. I’ve got my friends, beer, guitar, waves: check. But how is it that I used to be able to enter ‘hanging out mode’? And why do my friends keep making meaningless mouth-noises that have nothing to do with what’s eventually going to happen to everyone?”
  • People's understanding of whether commonsense morality holds, and of whether they can expect other folks in this space to also believe that commonsense morality holds.
    • “Given the vast cosmic stakes, surely doing the thing that is expedient is more important than, say, honesty?”
  • People's in-practice tendency to have serious hobbies and to take a deep interest in how the world works.
    • “I used to enjoy learning mathematics just for the sake of it, and trying to understand history for fun. But it’s actually jillions of times higher value to work on [decision theory, or ML, or whatever else is pre-labeled as ‘AI risk relevant’].”
  • People's ability to link in with ordinary institutions and take them seriously (e.g. to continue learning from their day job and caring about their colleagues’ progress and problems; to continue enjoying the dance club they used to dance at; to continue to take an interest in their significant other’s life and work; to continue learning from their PhD program; etc.)
    • “Here I am at my day job, meaninglessly doing nothing to help no one, while the world is at stake—how is it that before learning about the Singularity, I used to be learning skills and finding meaning and enjoying myself in this role?”
  • People's understanding of what’s worth caring about, or what’s worth fighting for.
    • “So… ‘happiness’ is valuable, which means that I should hope we get an AI that tiles the universe with a single repeating mouse orgasm, right? ... I wonder why imagining a ‘valuable’ future doesn’t feel that good/motivating to me.”
  • People's understanding of when to use their own judgment and when to defer to others.
    • “AI risk is really really important… which probably means I should pick some random person at MIRI or CEA or somewhere and assume they know more than I do about my own career and future, right?”

My take is that many of these disorientation-bits are analogous to the new atheist’s disorientation discussed earlier. “Getting out of bed in the morning” and “caring about one’s friends” turn out to be useful for more reasons than Jehovah—but their derivation in the mind of that person was entangled with Jehovah. Honesty is analogously valuable for more reasons than its value as a local consumption good; and many of these reasons apply extra if the stakes are high. But the derivation of honesty that many folks were raised with does not survive the change in imagined surroundings—and so it needs to be thought through freshly.

Another part of the disorientation perhaps stems from emotional reeling in contact with the possibility of death (both one’s own death, and the death of the larger culture/tribe/species/values/life one has been part of).

And yet another part seems to me to stem from a set of “bad” puzzles that were inadvertently joined with the “good” puzzles involved in thinking through Singularity scenarios—“bad” puzzles that disable the mental immune systems that normally prevent updating in huge ways from weird and out-there claims. I’ll postpone this third part for a section and then return to it.

There is value in helping people with this disorientation; and much of this helping work is tractable

It seems not-surprising that people are disrupted in cases where they seriously, viscerally wonder “Hey, is everything I know and everything humanity has ever been doing to maybe-end, and also to maybe become any number of unimaginably awesome things? Also, am I personally in a position of possibly incredibly high leverage and yet also very high ambiguity with respect to all that?”

Perhaps it is more surprising that people in fact sometimes let this into their system 1’s at all. Many do, though; including many (but certainly not all!) of those I would consider highly effective. At least, I’ve had many many conversations with people who seem viscerally affected by all this. Also, many people who tell me AI risk is “only abstract to [them]” still burst into tears or otherwise exhibit unambiguous strong emotion when asked certain questions—so I think people are sometimes more affected than they think.

An additional point is that many folks over the years have told me that they were choosing not to think much about Singularity scenarios lest such thinking destabilize them in various ways. I suspect that many who are in principle capable of doing useful technical work on AI alignment presently avoid the topic for such reasons. Also, many such folks have told me over the years that they found pieces at CFAR that allowed them to feel more confident in attempting such thinking, and that finding these pieces then caused them to go forth and attempt such thinking. (Alas, I know of at least one person who later reported that they had been inaccurate in revising this risk assessment! Caution seems recommended.)

Finally: people sometimes suggest to me that researchers could dodge this whole set of difficulties by simply reasoning about Singularity scenarios abstractly, while avoiding ever letting such scenarios get into their viscera. While I expect such attempts are in fact useful to some, I believe this method insufficient for two reasons. First, as noted, it seems to me that these topics sometimes get under people’s skin more than they intend or realize. Second, it seems to me that visceral engagement with the AI alignment problem is often helpful for the best scientific research—if a person is to work with a given “puzzle” it is easier to do so when they can concretely picture the puzzle, including in their system 1. This is why mathematicians often take pains to “understand why a given theorem is true” rather than only to follow its derivation abstractly. This is why Richard Feynman took pains to picture the physics he was working with in the “make your beliefs pay rent in anticipated experiences” sense and took pains to ensure that his students could link phrases such as “materials with an index of refraction” with examples such as “water.” I would guess that with AI alignment research, as elsewhere, it is easier to do first-rate scientific work when you have visceral models of what the terms, claims, and puzzles mean and how it all fits together.

In terms of the tractability of assisting with disorientation in such cases: it seems to me that simply providing contexts for people to talk to folks who’ve “been there before” can be pretty helpful. I believe various other concepts we have are also helpful, such as: familiarity with what bucket errors often look like for AI risk newcomers; discussion of the unilateralist’s curse; explanations of why hobbies and world-modeling and honesty still matter when the stakes are high. (Certainly participants sometimes say that these are helpful.) The assistance is partial, but there’s a decent iteration loop for tinkering away at it. We’ll also be trying some LessWrong posts on some of this in the coming year.

A cluster of “reality-masking” puzzles that also shaped CFAR

To what extent has CFAR’s art been shaped by reality-masking puzzles—tinkering loops that inadvertently disable parts of our ability to see? And how can we tell, and how can we reduce such loops? And what role have reality-masking puzzles played in the disruption that sometimes happens to folks who get into AI risk (in and out of CFAR)?

My guess is actually that a fair bit of this sort of reality-masking has occurred. (My guess is that the amount is “strategically significant” but not “utterly overwhelming.”) To name one of the more important dynamics:

Disabling pieces of the epistemic immune system

Folks arrive with piles of heuristics that help them avoid nonsense beliefs and rash actions. Unfortunately, many of these heuristics—including many of the generally useful ones—can “get in the way.” They “get in the way” of thinking about AI risk. They also “get in the way” of folks at mainline workshops thinking about changing jobs/relationships/life patterns etc. unrelated to AI risk. And so disabling them can sometimes help people acquire accurate beliefs about important things, and have more felt freedom to change their lives in ways they want.

Thus, the naive process of tinkering toward “really helping this person think about AI risk” (or “really helping this person consider their life options and make choices”) can lead to folks disabling parts of their epistemic immune system. (And unfortunately also thereby disabling their future ability to detect certain classes of false claims!)

For example, the Sequences make some effort to disable:

Similarly, CFAR workshops sometimes have the effect of disabling:

  • Taste as a fixed guide to which people/organizations/ideas to take in or to spit out. (People come in believing that certain things just “are” yucky. Then, we teach them how to “dialog” with their tastes… and they become more apt to sometimes-ignore previous “yuck” reactions.)
  • Antibodies that protect people from updating toward optimizing for a specific goal, rather than for a portfolio of goals. For example, entering participants will say things like “I know it’s not rational, but I also like to [activity straw vulcans undervalue].” And even though CFAR workshops explicitly warn against straw vulcanism, they also explicitly encourage people to work toward having goals that are more internally consistent, which sometimes has the effect of disabling the antibody which prevents people from suddenly re-conceptualizing most of their goal set as all being instrumental to/in service of some particular purportedly-paramount goal.
  • Folks’ tendency to take actions based on social roles (e.g., CFAR’s Goal-Factoring class used to explicitly teach people not to say “I’m studying for my exam because I’m a college student” or “I have to do it because it’s my job,” and to instead say “I’m studying for my exam in order to [cause outcome X]”).

Again, these particular shifts are not all bad; many of them have advantages. But I think their costs are easy to underestimate, and I’m interested in seeing whether we can get a “rationality” that causes less disablement of ordinary human patterns of functioning, while still helping people reason well in contexts where there aren’t good preexisting epistemic guardrails. CFAR seems likely to spend a good bit of time modeling these problems over the coming year, and trying to develop candidate solutions (we’re already playing with a bunch of new curriculum designed primarily for this purpose), and we’d love to get LessWrong’s thoughts before playing further!


Thanks to Adam Scholl for helping a lot with the writing. Remaining flaws are of course my own.

  1. If you don't know some of these terms but want to, you can find them in CFAR's handbook. ↩︎


How to Escape From Immoral Mazes

16 января, 2020 - 16:10
Published on January 16, 2020 1:10 PM UTC

Previously in sequence and most on point: What is Success in an Immoral Maze?, How to Identify an Immoral Maze

This post deals with the goal of avoiding or escaping being trapped in an immoral maze, accepting that for now we are trapped in a society that contains powerful mazes. 

We will not discuss methods of improving conditions (or preventing the worsening of conditions) within a maze, beyond a brief note on what a CEO might do. For a middle manager anything beyond not making the problem worse is exceedingly difficult. Even for the CEO this is an extraordinarily difficult task.   

To rescue society as a whole requires collectively fighting back. We will consider such options in later posts.

For now, the problem statement is hard enough.

To reiterate the main personal-level takeaway

Being in a maze is not worth it. They couldn’t pay you enough. Even if they could, they definitely don’t. If you end up CEO, you still lose. These lives are not worth it. Do not be a middle manager at a major corporation or other organization that works like this. Do not sell your soul.

Problem statement

Increasingly, avoiding mazes is easier said than done.

First, one must identify them, for which the previous post offers a guide.

After that, there are still many hard problems to solve.

How do we avoid immoral mazes? How do we justify that choice to others? What alternative choices do we have? What if we’re already in a maze? What if we’ve already self-modified in ways that make it hard to extract ourselves? What if our human or social capital only pays off inside them?

What if you are doing object-level work with no one reporting to you, but you have a maze above you?

And for those who think this way, do I have a moral obligation to suffer and do this anyway, in order to maximize my charitable giving, or to otherwise do good?

How do we avoid immoral mazes?

Truly understand how painful it will be to interact with a maze even if you’re not an employee. Know the signs, as discussed in the previous post. Keep a close eye out for mazes. Realize that you have other options. Choose other paths. 

This isn’t an ‘all things being equal, choose other paths’ suggestion. This is making what look like major sacrifices and different life choices and profession choices, or taking big risks (that may or may not include starting a business or doing work outside of an organization) in order to have skin in the game. Really understand that the offer from even a relatively tolerable maze is much, much worse than it looks, and that opportunities outside mazes are often much better and more realistic than they look.

Young people starting out in the labor market often have The Fear that they will never find a job or never find a good job or another good job. If you are capable of getting this far, and you persevere, that is not true for you. A wide variety of jobs and other opportunities are out there.

I realize some people have already become so trapped in mazes that they cannot walk away.

If you actually can’t walk away, see the last two questions.

What do you do if you find yourself inside a maze?

Quit. Seriously. Go do something else. Ideally, do it today.

At least start planning and looking. Every day you stay is another day you suffer, another day you invest your social and human capital in ways that can’t be transferred, and another day you become infected by the maze that much more.

If you actually can’t afford to quit, see the last three questions.

How do we justify our choice to others?

When I worked for a financial firm, the question ‘what do you do?’ (or, in a scarier form, ‘who are you?’ as implicitly defined by your job) had an easy answer. I work for (firm). One of the big benefits was being able to tell an easy, compact story of me and my life and work choices, that most found praiseworthy. It was easy. It was comfortable. It also worked wonders for things like renting an apartment or otherwise proving myself respectable. 

A lot of the alternative answers that don’t involve mazes give you a much better life and method of earning a living, but they do make answering the ‘what do you do?’ and ‘how can I count on you to make rent or support a family?’ questions trickier. One must acknowledge this. 

It isn’t only strangers you tell this story to. It is your friends. It is your family. It is also yourself.

I likely stayed at (firm) months longer than I should have due to being scared of not being able to tell this story anymore, especially to my wife and to myself, and having to instead tell a different one.

A lot of this fear is the expectation that others won’t understand and won’t accept our justifications. That does happen, but far less than people typically expect or fear. Most people are far more sympathetic than the inside view might suggest.

Even the internet is supportive. Which is not its style.

This is largely because, at least for now, there is a widespread cultural belief that one should do what one loves, and be content in one’s work. That work should provide meaning. That’s not always a good rule or good idea, although it is a fine aspiration. Not everyone can have soul in the game. But almost everyone recognizes that it would be better if one did.

How do you go about telling your new story? (Justification continued)

Here’s my take on how to approach this, based on my experience. Comments suggesting improvements or alternatives are highly encouraged. 

There are two parts to this.

One is to figure out what you are doing, not only what you’re not doing, and how to talk about that. Some of those answers are mostly culturally normal and comfortable, some of them are less so. Now that I can say ‘I’m a game designer’ that goes over quite well. 

The most important things here are to make the thing you are doing sound simple, put it in terms that people can relate to, and to make it clear that you are comfortable and happy with it to the extent that this wouldn’t be lying to people. If you’re not comfortable and happy with it, people will pounce on that. It’s also much better to be happy with what you are doing for your own sake, so that is something to work on, either by getting to that place or by finding another option where you can.

If you quit today without a plan, then what you will be doing is recovering from your experience and figuring out what to do next. I told that story for about a month. That story goes over better than you would expect – for a while. From a social (as well as financial) perspective you are most definitely on a clock. There are plenty of people who let that clock run out. But if it’s two weeks in, own it.

The other part of your explanation is justifying why you’re not doing the standard thing of indenturing yourself to a new maze, unless you have an obviously great alternative thing going. The worse your answer to part one sounds, the harder part two is going to be. 

First try giving it to them straight. Tell them you find large corporations highly toxic and morally compromising. It left you a wreck. You have no interest in the lifestyle you would live or the person that you would be. If they are genuinely curious, you can point them to Moral Mazes itself or this series of posts, or explain further in your own words. 

You can also use the culturally assigned incantations to explain your decision. Tell people you need to do what you are passionate about, to ‘follow your heart/passion,’ to do what you believe in, to ‘help people’ or ‘make a difference.’ To ‘get your hands dirty’ and ‘do something real.’ Some people appreciate wanting to ‘be your own boss’ or ‘do it your way,’ which are weaker, non-dystopian ways of sending the real message. 

And of course, you can simply say ‘it was making me miserable. I hated it. Doing this instead makes me happy.’

I’ve gotten into the habit of saying my job at (firm) was ‘a poor fit’ because I genuinely believe both that there were particular real needs they had that were expensive for me to provide, and that the firm was in many ways unusually great and unusually low on maze characteristics, and I do not think it is a mistake to take a job there if it would suit you. They treated me right and I don’t want to throw anyone under the bus.

You could also, if you wanted to, use the question as an opportunity to do a public service and spread the word. That’s supererogatory.

Another thing to keep in mind, as discussed in the next question, is that those pushing us towards mazes are often operating on traditions and heuristics that used to push towards virtuous action that led to happiness and real success. The world changed, and those traditions and heuristics started getting this wrong. This is highly sympathetic. It might help to approach from this perspective.

Most people get it. They don’t fully get it unless they’ve been on the inside and reflected upon what happened. But they do get that there’s something soul-killing about working for the man and/or being lost in a maze of political actions.

Others won’t get it. 

What if my family or culture won’t accept my justifications?

Some people won’t get it. They will respond that all of this is excuses for not wanting to do hard work or make sacrifices. That this is how the ‘real world’ works. That it is ‘time to grow up.’ That a ‘real adult/man/woman/hero/whomever’ would suck it up and deal with it. That it is your responsibility to do so, for your family, for the world or for yourself. That ‘get a steady job’ is what good and responsible people do. That this is how one survives in today’s world, and how one gets to raise and support a family. 

Often people who are counting on you, usually family members, will effectively let you know that while they care a non-zero amount about whether your life experience is miserable, or what impact your work has on the world, or what upside or opportunities for personal growth you might have, what they actually care about is whether you are projecting the illusion of security. They want to mentally cache that you/they ‘are going to be OK’ and that ‘everything is all right.’ 

This has remarkably little to do with actual security. Jobs in many mazes are not especially secure. Others are secure barring disruption of the underlying order, if you are willing to pay the prices discussed, and tie all your human and social capital to the maze. 

The security they seek is the security of the banker who loses money when everyone around him loses money. This is useful to the banker because one cannot then scapegoat and fire him if he has bad luck and does poorly. This is useful to those in a maze and those who tie their fates to them, because they hope they will similarly seem responsible and legitimate, and thus worthy of sympathy and assistance if things go poorly. 

It is a self-reinforcing charade. People demand this illusion of legitimacy to protect against others’ accusations of illegitimacy. They will defend this charade even if times change, the mazes mostly fail and those who are doing real things succeed; they will then attempt to forcibly transfer wealth from people doing real things to those who previously worked in mazes. None of this makes participation in the charade worthless. You can even hope to benefit from the expropriations. But it is important to know that it is a charade. 

If you face a family and/or culture that demands devotion of one’s entire life to the illusion of respectability, have sympathy for those making these demands. In the past, when these traditions and heuristics developed, the illusion of respectability corresponded to real work and other worthy virtues that led to happiness and true success. If they fail to realize the change, and update accordingly, that is at least somewhat understandable. 

If attempts to make them realize this change or accept your perspective fail, one must treat them the same way one treats anything else that is out to get you. If their demands cannot be satisfied in a way you can accept, because they will simply demand more until it is something you cannot accept, then attempting to satisfy their demands is folly. 

If you previously chose those around you based on your membership in a maze, then it is plausible that, having invested in those people and relationships, it makes sense to stick around. It is also plausible that being around them afterwards no longer makes sense. 

I know it is easy to say and tough to act upon, but hopefully in time either they will understand and/or come around, or you will realize that life is better without them. 

What if you’ve already self-modified too much?

This sucks quite a lot.

After a while, those little status differences and little battles start to deeply matter. Other things matter less and less. Humans can adapt to many things. Giving up all that likely fills you with deep existential dread.

Even worse, you’ve sculpted everything else, including your friends and often your family, around these obsessions. You, and often they, depend on the currently steady money and the illusion of security the maze provides. Without that, things could rapidly fall apart. 

The good news is, you’ve figured out that this happened. Perhaps you haven’t self-modified as much as you’ve feared, or you have a path back to undoing this. Often the modifications start reverting once you extract yourself from the situation. Often you’re deeply miserable, in a way that those around you at least subconsciously know quite well. Those around you often realize this long before you realize it about yourself. If you tell the people you care about what’s really going on, if they’re worth keeping, they’ll almost always be supportive. 

I’ve seen a number of people realize they hated their jobs and needed to go. Almost all of them got lots of support when they got the courage to say it out loud.

A note I got on a previous draft: “The happiest Uber drivers I have seen used to be middle management.”

If anything, I see many people around me being too supportive of opting out of working, or in a sense out of life, entirely. It is important to help and encourage people to do more things.

See the question above on how to explain your choices and situation to others. 

So my first suggestion is to admit to yourself what is happening to you. Take an inventory. Concretely observe what is actually going on around you, without excuses or euphemisms, and what that is doing to your brain and your life.

Then tell the people who care about you. And go from there.

Are mazes where our human and/or social capital pays off?

Note that when I say ‘pays off’ here, I mean maximally pays off. If you have the skills and opportunity to advance inside a maze, you have the skills and opportunity to take a lower level position at a smaller institution, and still earn a reasonable living. That does not mean that this transition would not be painful, or that you could maintain your current lifestyle, or that your family and friends would stand by you, but you definitely have that option.

If you are an academic with a PhD and notice your academic institution is a maze, keep in mind that the concern that your skills only pay off in academia is probably simply untrue. Academics typically get substantial pay bumps when they move to private industry.

A lot of other professionals are similarly buying the security and familiarity of what they’re used to, rather than being paid off with dollars.

One must still be careful not to jump from one maze into another.

Then there’s big corporations. Big corporations really do pay better than smaller businesses if you can’t get equity in either business. The recent history of domestic ‘income inequality’ is in large part inequality between firms as bigger and more successful firms pay higher salaries even to their lower tiers, and have more higher tiers in which to earn yet more.

There are three theories I know about for why big corporations pay more.

Theory one is that big corporations are more likely to have O-ring production functions or otherwise benefit more from higher quality workers, so they pay more in order to attract better workers.

Theory two is that big corporations make more money per employee, and are big enough to potentially support unions, so employees demand and receive more of that pay.

Theory three is that working in a big corporation sucks, and employees realize this to at least some extent, so employees demand more money in order to be willing to work there.

If working at a major corporation is a major life cost, working in management a bigger one, and both come with higher pay, then a lot of income inequality in developed countries does not represent a gap in desired life outcomes, and it might be more unfair if that part of the gap were closed.

A lot more of that pay gap comes from some professions engaging in rent-seeking behavior to extract resources. Some big examples are finance, law, education/academia, and medicine. Again, that comes with much better pay.

It also usually comes with a big time investment in the development of the relevant social capital, human capital and credentials you need to succeed. If you went to medical school or law school or worked hard to get a tenure track, and later realize that your profession is a maze (I’m making no claim here that these professions are or aren’t mazes in general or how intense those mazes might be, although some central organizations within them clearly are very intense mazes), walking away from that is going to be expensive.

This is doubly true for human capital within a single organization. When I took a job at a financial firm, they spent a large part of my first two years training me. The first year had a lot of firm-specific detail but was mostly training about markets and trading in general, that applies everywhere there are markets. It was fascinating, and pretty great. Five stars, would study again, especially given they paid me rather than the other way around. The second year still had a lot of training and learning, but increasingly it was about the specific problems I was working on, developing relationships with and learning about coworkers and organizational structures and how we did things, and other information specific to the firm. This was less fun, and when I left, it became worthless.

I had another job I stayed in for five years. This was also a place where I got to observe the transition from mostly not being a maze to becoming one over time, although that’s a story I can’t tell online.

Early on there was a lot of learning, a lot of which was very specific to our business, but a lot of which applies universally. I worked mainly with one particular person, who knew what we were doing and cared about us doing it well. It provided great experience.

By the third year, I was learning about our specific products and customers and dynamics, in increasingly arcane fashion. I was also forced to interact increasingly with the maze growing around us, spending more time making bosses and others like what they saw rather than doing what was right for the business. I was unable to get the resources to enhance our performance, despite yearly returns on investment obviously well above 100%. I made an effort to switch over into problems that were both more valuable and offered more room for growth, both for me and for the business, and which I could tackle with the resources available.

By the fifth year, I wasn’t developing any skills that would be useful elsewhere, except that I was now learning to code because I got tired of no one being able to code what I needed. This brought me from ‘can code but not in an actually useful way’ to ‘can code real things that are useful, but badly/slowly.’ 

Note that the managers in Moral Mazes who succeeded were always moving around to bigger and better things. When they weren’t, they instead moved on to similar but different, and thus hard-to-compare, things to preserve the illusion of career momentum. If you have adopted the maze nature, many of the skills you have learned doing so translate to other mazes. Your existing within-maze status can often also be transferred to your new location, but only if you continue to be seen as a winner. If you’re a loser, no one else will want you, and moving on will mean moving down.

That means that once your path in the maze is stalled, even though you have invested a lot to get to this point, recovering your momentum is going to be extremely difficult. If you are not satisfied with your current role, your human capital is a lot less valuable than it naively appears to be, because it no longer has much upside even on its own terms. The fall from where you are to starting over can still be large.

The best feature of an academic maze is that it has a perfectly designed system for not caring about getting ahead: what The Gervais Principle calls being a loser, and what academia calls tenure.

The pattern remains. The more time you dedicate to a path, both a profession and a particular job, the more you give up when you leave and the less of that investment you can carry with you. Many people don’t have great options. The job market isn’t that great out there if you don’t want to code, don’t have an in with the rent seekers, and can’t use the skills you’ve developed or do the thing that legibly follows from your resume.

If mazes are where my social and/or human capital pays off, what should I do?

Let us ignore here why your capital pays off best in a maze. It does not much matter to your decision, in important senses, to what extent rent seeking, theft, coercion, fraud or even systems designed explicitly to make your skills not transfer to honest work are or aren’t responsible for this being the case. For whatever reason, often events conspire to prevent you from efficiently plying your trade (or in some cases, plying it at all), where you hold comparative advantage, without being part of a maze. 

Some people really do have a dilemma, where they can either do something menial and mindless that still gets them abused and doesn’t pay much, if they can find work at all. Or they can go out on a limb that looks super risky and likely to fail, and/or that requires years without compensation. Or they can keep working in the maze.

It is important to note that vastly more people think they are in this position than actually are. If you think you are in this position, consider the possibility that you are mistaken. Consider all the alternatives. Consider how much the reduction in medium-term funds and superficial status would actually matter to you. Consider how much of what’s holding you back is simply The Fear in some form.

Imagine exactly how relieved you’d be to be out of there. Remember that even if leaving really is super painful, involving a large reduction in consumption levels and superficial status and standard of living, and the abandonment of large sunk costs, that doesn’t mean it isn’t Worth It. 

My first line of response to this dilemma is exactly what you would expect: Consider leaving anyway. But I admit that isn’t always the right answer. In some cases, things really have gone too far, you have too many promises to keep and too many sunk costs.

Become a Loser

The next line of defense is to become a loser, in the sense laid out in The Gervais Principle. A loser does not strive to get ahead while at work. A loser finds their value in other places than work. At work, they pride themselves on putting forward at most the minimum amount of effort to get the job done. 

The Gervais Principle can be seen as the prequel to Moral Mazes, dealing with life at lower levels of mazes that have to interact with the real world. Mazes need, as several quotes describe, people who keep their heads down and ‘do their job’ with no ambitions for further advancement. Ideally one does this as low on the totem pole as one can stomach and afford, as the life that results is far less odious and taxing.

By declaring themselves as neutral and not a threat, such people are often left mostly alone if they’re important to the system continuing to run. They can now reclaim some slack and a personal life. It’s not a great solution. You’re still holding up the maze. You’re still interacting with it. You’ll still have to make severe moral sacrifices. But to some extent, some of the time, you can pick and choose what to have no part in. 

Over time, your position likely will slowly degrade. Eventually this may lead you to leave. Hopefully by then you’ll have been able to save enough and be prepared enough to be ready for that. If you’re stuck in a maze, the least you can do is turn a healthy monthly profit.

Take Risks

The final line of defense I can come up with is to take big bold risks. Either stand up for what you believe in or gamble to advance your own situation. Sometimes this will work, your situation will improve and you’ll learn your situation was better than you thought. Other times it will backfire, and you’ll learn your situation was worse than you thought and is now worse than that. Remember that if you get fired from a job you don’t want, that can be a big win, because you might not have had the courage to leave on your own and you might even get severance and unemployment. 

The real danger is often not that you get fired. It’s that you become ‘dead without knowing it’ as in this quote:

You can put the damper on anyone who works for you very easily and that’s why there’s too much chemistry in the corporation. There’s not enough objective information about people. When you really want to do somebody in, you just say, well, he can’t get along with people. That’s a big one. And we do that constantly. What that means, by the way, is that he pissed me off; he gave evidence of his frustration with some situation. Another big one is that he can’t manage—he doesn’t delegate or he doesn’t make his subordinates keep his commitments. So in this sort of way, a consensus does build up about a person and a guy can be dead and not even know it. (Location 1475, Quote 10)

This can lead you to waste years of your life struggling for something you had no chance of getting. This is one reason why a great way to take risk is to force the issue, asking for or demanding raises or promotions. It avoids this danger. The more uncertain you are about where you stand, the more you should take risk to create clarity.

At all my jobs in mazes, I would have greatly benefited from taking greater risks to create clarity, regardless of the outcome.

Can You Change Things From the Top?

If you by some miracle reach the top with your soul intact, now you can try and change the system. Or at least you can do harm reduction in earnest. One shouldn’t give this much hope or weight, since such intentions rarely survive that long, and doing anything lasting about it will still be quite hard. I don’t know what would work.

My friends and I have talked to several people who have reached the top. Many of them understand what the process has done to them, but don’t know how to fix themselves or the system. It isn’t cheap or easy to reverse or even halt the damage.

It is unlikely you can have much impact without reaching the actual top and becoming CEO. If you do become CEO, you may have a short window in which you can ‘clean house’ the way that maze CEOs do, and put people opposed to mazes into key positions where they can clean their houses in turn. You can then combine that with other efforts, and maybe get somewhere, but I don’t have the insights necessary to say much more, and such efforts will be exceedingly difficult. The maze will fight back.

I strongly believe it is much easier to build a new system from scratch than to ‘change the system from within.’

What if you are doing object-level work without anyone who reports to you, but you have a maze above you?

In Moral Mazes such workers are said to be ‘on the line.’

Details of this situation will determine to what extent this represents being stuck in the maze, versus to what extent this represents doing regular object-level work.

What are you actually doing all day? What are your incentives?

If your essential scenario is being given an object-level job and doing it, that is mostly fine.

If your essential scenario is not that, it is less fine, but it is still far better than being a middle manager. It’s not good to have a bull**** job, but it’s not the nightmare we’re describing elsewhere.

Consider the car salesmen from Imperfect Competition.

One can imagine a car dealership as no different from the local hardware store, buying useful tools wholesale and selling them at a higher price to customers who want to buy and use those tools, and the only difference is that you sell 0.1% as many tools for a thousand times the price. One can also imagine that the demands of the car corporation, and the incentives they provide, and the misinformation they spread, and the regulations they twist and engineer, and so forth, end up with you effectively stuck in the maze. The truth is presumably somewhere in between – you see insane things around quotas and regulations and advertising campaigns you cannot control, and the dealerships have issues of their own design, but you are still mostly working for a small actual business most of the time.

The same would go for the workers in The Office, as analyzed in The Gervais Principle. Michael is largely in maze hell. Jim spends a lot of time avoiding maze hell. Most of the workers have to deal with the craziness and what it does to the business, but this is only ordinary soul crushing and not what middle managers deal with.

Jim’s situation on The Office is the biggest problem. There is no future. The only way up, to better your work situation, would be to dive into the maze. If you do that while not buying into the system, it will go badly for you on all levels. If you do buy in, then you’ve fully made the big mistake I’m warning against.

Consider the Uber drivers, some of whom are reported to be happy refugees from middle management.

To the extent that the driver is offered rides, chooses to accept them individually, and gets paid for each ride provided, the driver is in good shape. They set their hours and level of effort. There is word that Uber and its ilk are now using algorithmic systems and various overall incentives to try and ensnare their drivers more broadly into the system, which would be worse, but the core experience is still one of real work.

Consider a software engineer, given specific tasks to code and coding them. That seems likely to be mostly fine.

Consider a worker that is literally ‘on the line’ in a manufacturing plant that makes physical objects. It is not the best or most compensated work, but you are mostly free from the maze.

Being ‘on the line’ and continuing to do real work is miles behind doing real work where you have skin in the game, but if you get to dodge the worst of all this, it is a reasonable temporary fallback if you lack alternatives. Look carefully at details.

If you are a manager but not a middle manager (e.g. no one who reports to you has anyone who reports to them), and the group of people you manage has object-level tasks to do together, you aren’t automatically doomed, but there is great danger lurking, including the risk you will be promoted.

Do I have a moral obligation to work in mazes to maximize my charitable giving?

No. You don’t. 

This post has done its best to deliberately ignore the moral costs of participating in mazes, because avoiding them is already over-determined without that.

I want to make it clear that I’m not relying on moral concerns.

But if that’s what you are concerned about, moral concerns work in the opposite direction. Making the world more and more maze-like by embracing the system, and engaging in zero-sum competitions to extract resources, while making your life miserable, is the opposite of a moral obligation. 

It may help to remember that a drowning child is hard to find.

Moral systems that imply that subjecting oneself to torture in the service of immoral mazes or other harmful systems, for the purposes of allowing other such systems to then extract those resources from you, is a moral obligation, are not likely to be good ideas, or to have your or humanity’s best interests at heart. 


Testing for Rationalization

January 16, 2020 - 11:12
Published on January 16, 2020 8:12 AM UTC

Previously: Avoiding Rationalization

So you've seen reason to suspect you might be rationalizing, and you can't avoid the situation. What now?

Here are some tests you can apply to see whether you were rationalizing.

Reverse the Consequences

Let's explain this one via example:

Some Abstinence Educators like to use the "scotch tape" model of human sexuality. In it, sex causes people to attach to each other emotionally, but decreasingly with successive partners, just like tape is sticky, but less sticky when reused. Therefore, they say, you should avoid premarital sex because it will make you less attached to your eventual spouse.

Do you think this is a reasonable summary of human sexuality? Are people basically scotch tape?

Suppose the postscript had been: therefore you should have lots of premarital sex, so that you're not irrationally attached to someone. That way, when you believe you are in love and ready to commit, you really are.

Does this change your views on the scotch tape model? For many people, it does.

If so, then your views on the model are not driven by your analysis of its own merits, but by either your desire to have premarital sex, or your reluctance to admit Abstinence Educators could ever be right about anything.

(Or, possibly, your emotional revulsion at premarital sex or your affiliation to Abstinence Educators. The point of this section is unaffected.)

The point here is to break the argument into the piece to be evaluated and the consequence of that piece, which logically shouldn't affect the first part's validity but somehow does.

If the consequences seem hard to spin backwards, put on your Complete Monster Hat for a quick role-play. Suppose you think third-world charity is breaking the economies it goes to, and therefore you should keep your money for yourself, but this could be a rationalization from an unendorsed value (greed). Imagine yourself as a Dick Dastardly, a mustache-twirling villain who's trying to maximize suffering. Does Mr. Dastardly give generously to charity? Probably not.

I don't want to get into an analysis of economic aid here. If contemplating Mr. Dastardly gives you a more complex result like "I should stop treating all third-world economic aid as equivalent" and not a simple "I should give", then the intuition pump is working as intended, because it's helping you build a more accurate world-model.

Conservation of Expected Evidence

The examples in the original Conservation of Expected Evidence post cover this pretty well.

To put it in imperative terms, imagine you'd observed the opposite. If you wouldn't update in the opposite direction, something has gone wrong. If you wouldn't update as much, this must be balanced by what you actually observed having been correspondingly surprising.

Note that "opposite" can be a little subtle. There can be U-shaped response curves, where "too little" and "too much" are both signs of badness, and only "just right" updates you in favor of something. But unless such a curve is well-known beforehand, the resulting model takes a complexity penalty.
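To make that balance concrete, here is a minimal Python sketch of conservation of expected evidence for a single yes/no observation E about a hypothesis H. The prior and likelihoods are made-up illustrative numbers, not taken from the original post, and the function name `posteriors` is hypothetical. Exact fractions are used so the identities hold exactly rather than up to rounding.

```python
from fractions import Fraction


def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for hypothesis H and a binary observation E.

    Returns P(E), P(H | E) and P(H | not E).
    Assumes 0 < P(E) < 1 so neither denominator is zero.
    """
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    post_if_e = prior * p_e_given_h / p_e
    post_if_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, post_if_e, post_if_not_e


# Hypothetical illustrative numbers: P(H) = 3/10, P(E|H) = 9/10, P(E|not H) = 2/10.
prior = Fraction(3, 10)
p_e, up, down = posteriors(prior, Fraction(9, 10), Fraction(2, 10))

# The probability-weighted average of the two possible posteriors equals the prior.
expected_posterior = p_e * up + (1 - p_e) * down

# Equivalently, the expected upward shift exactly cancels the expected downward
# shift, so the less likely observation must move you further.
upward = p_e * (up - prior)
downward = (1 - p_e) * (prior - down)

print(p_e, up, down, expected_posterior)  # prints: 41/100 27/41 3/59 3/10
```

Here E is the less likely outcome (41/100), and observing it moves you up by more than observing not-E would move you down. Substituting any other prior and likelihoods strictly between 0 and 1 should leave both identities intact; a model in which you would update the same way whichever way E came out cannot satisfy them.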

Ask a Friend

A classic way of dealing with flaws in your own thinking is to get someone else's thinking.

Ideally, someone with uncorrelated biases. This is easiest when the bias comes from a personal connection weighing on your reasoning, since it's easy to find someone without one. (In extreme cases, this can be recusal: make the unconnected person do all the work.)

Be careful when asking that you don't distort the evidence as you present it. Verbosity is your friend here.

You may even find that your friend doesn't need to say anything. When you reach the weak point in your argument, you'll feel it.

One's Never Alone with a Rubber Duck

If your friend doesn't need to say anything, maybe they don't need to be there. Programmers refer to this as "rubber duck debugging".

This has the advantage that your friend can be whoever you want. You can't actually run your ideas past Richard Feynman for a double-checking, for several reasons, but you can certainly run them past imaginary Richard Feynman. The ideal person for this is someone whose clear thinking you respect, and whose respect you want (as this will throw your social instincts into the fray at finding flaws).

If attempting this, be sure to explain your entire argument. In imagination, it's possible to fast-forward through the less interesting parts, but those could easily be where the flaw is.


Fire Alarm for AGI

January 16, 2020 - 07:07
Published on January 15, 2020 8:41 PM UTC

"While the interpretable model may not get to quite the same level of performance as the Oracle on the exact task used for training, it turns out that the process of generating the interpretable model results in something which generalises much better to new situations."

This sounds like artificial dreaming:

"...after a certain number of search steps for a fixed set H, and after choosing the best available synthesized program for this set, we sample a set of additional histories by simulating the current programmatic policy, and add these samples to H."

The following also struck me as portentous, although not as much as the above paper.



The Alignment-Competence Trade-Off, Part 1: Coalition Size and Signaling Costs

January 16, 2020 - 02:10
Published on January 15, 2020 11:10 PM UTC

This is part 1 of a series of posts I initially planned to organize as a massive post last summer on principal-agent problems. As that task quickly became overwhelming, I decided to break it down into smaller posts that ensure I cover each of the cases and mechanisms that I intended to.

Overall, I think the trade-off between the alignment of agents and the competence of agents can explain a lot of problems to which people often think there are simple answers. The less capable an agent is (whether the agent is a person, a bureaucracy, or an algorithm) the easier it is for a principal to assess the agent, and ensure the agent is working toward the principal’s goals. As agents become more competent, they become both more capable of actually accomplishing the principal’s goals and of merely appearing to accomplish the principal’s goals while pursuing their own. In debating policy changes, I often find one sided arguments that neglect this trade-off, and in general I think efforts to improve policies or the bureaucratic structures of companies, non-profits, and governments should be informed by it.

Part 1:


Virtue signaling and moralistic anger are both forces that have been useful for holding people accountable, and powerful mechanisms of cultural evolution: spreading some norms more successfully than others, and resulting in many societies holding similar norms.

However, the larger a group becomes, the less members of the group know, on average, about other individual members’ behavior or its consequences, making it harder to evaluate complex actions. This in turn gives an advantage to clearer forms of signaling that are more inefficient and costly than those that could be sustainable in smaller groups.


  • While it would be efficient for a politician to accept money from competing special interest groups and keep their behavior consistent with their constituents’ interests regardless, it is simpler for politicians to convince allies they aren’t corrupt by not accepting money from political opponents at all.
  • With more complex tax codes, governments can implement more economically efficient Pigouvian taxes, which increase economic growth by concentrating more and more of the tax burden on actions that produce negative externalities for others. However, the power to assess and tax negative externalities gives those who influence the tax code the opportunity to shape it to their own advantage at the expense of others.
  • While a police officer could accept bribes and enforce the law anyway, unless you have a lot of information on the officer, you wouldn’t be convinced that there weren’t cases the officer was looking the other way. Likewise, people probably don’t trust the objectivity of police departments with the power of civil asset forfeiture even if such power can be used to reduce the tax burden of police and to create stronger deterrent effects on crime.
  • More pacifistic states are more credible than defensive states in signaling that they hold no hostile expansionist intentions, and defensive states are in turn more credible than states that take offensive or pre-emptive action. 
  • It is simpler for someone to be vegan than to try explaining a series of edge cases about animal welfare/consciousness or a strategy of eating meat when it doesn’t increase demand for meat. It would also look pretty suspicious for vegans to try selling wasted meat, even though doing so would undercut meat producers.
  • It may be more efficient for you to work from home or to shift your hours on fairly autonomous work to match times you are more productive, but generally employers require explanations, and seek to avoid giving their employees room to slack off.
  • Nepotistic hiring enables employers to ensure the alignment of employees, via additional information on prospective employees and leverage over their social capital. More meritocratic hiring is better for society; however, the more candidates there are to assess, the harder it becomes to investigate the merit of each. Accordingly, the larger a competing population of candidates is, the more their education will become focused on winning signaling competitions, and the less it will become focused on gaining skills that are difficult to demonstrate quickly.

In summary, there are a lot of actions that are more directly efficient and selfishly beneficial for those who take them, but because they are not credible signals of good intent (or are excuses that the corrupt would use), these options are not sustainable in larger societies. Small groups where people know each other well, on the other hand, can sustain weirder norms without corruption, due to their greater ability to vet each other. This may also explain why smaller groups in history often had more sustainable norms of exploiting defectors or outsiders, which wouldn’t be sustainable in larger societies since you can’t tell if someone is robbing a thief or an innocent person. Reducing attempts at exploitation between small competing groups of insiders is likewise probably a good thing for scaling up societies.

In general, these signaling costs come from scenarios where people’s interests may not align, and the costs are paid to demonstrate alignment. Without efficient mechanisms to assess and vet each other, as groups scale they lose trust, and more costly signaling becomes required to sustain cooperation.


In Defense of the Arms Races… that End Arms Races

January 16, 2020 - 00:30
Published on January 15, 2020 9:30 PM UTC

All else being equal, arms races are a waste of resources and often an example of the defection equilibrium in the prisoner’s dilemma. However, in some cases, such capacity races may actually be the globally optimal strategy. Below I try to explain this with some examples.

1: If the U.S. had kept racing in its military capacity after WW2, it might have been able to use its negotiating leverage to stop the Soviet Union from becoming a nuclear power, halting proliferation and preventing the buildup of world-threatening numbers of high-yield weapons. Basically, the earlier you win an arms race, the less nasty it may be later. If the U.S. had won the Cold War earlier, global development might have taken a very different course, with decades of cooperative growth instead of the Soviet Union spending immense amounts of its GDP on defense, which ultimately caused its collapse. The principle: it may make sense to start an arms race if you think you will win by starting now, provided that a nastier arms race is inevitable later. 

2: If neither the U.S. nor the Soviet Union had developed nuclear weapons at a quick pace, many more factions could have developed them later at around the same time, which would have been much more destabilizing and potentially violent than a monopolar or bipolar power situation. Principle: it is easier to generate stable coordination with small groups of actors than with large groups. The more actors there are, the less likely MAD and treaties are to work; and the earlier an arms race starts, the more prohibitively expensive it is for new groups to join the race.

3: If hardware design is a bottleneck on the development of far more powerful artificial intelligence systems, then racing to figure out good algorithms now will let us test a lot more things before we reach the point where a relatively bad set of algorithms can do an immense amount of harm because of the hardware at its disposal (improbable example: imagine a Hitler emulation with the ability to think 1000x faster). Principle: the earlier you start an arms race, the more constrained you are by technological limits.[1]
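The coordination principle in point 2 can be made concrete with a deliberately crude toy model. This is an assumption of the sketch, not part of the post: treat every pair of actors as a separate deterrence or treaty relationship that stays stable independently with probability `p_pair`. The number of relationships grows as n(n-1)/2, so the chance that all of them hold, `p_pair ** (n(n-1)/2)`, shrinks quickly as actors are added.

```python
def num_pairs(n_actors: int) -> int:
    """Number of distinct pairwise relationships among n actors."""
    return n_actors * (n_actors - 1) // 2


def prob_all_pairs_hold(n_actors: int, p_pair: float) -> float:
    """Toy model (assumption, not from the post): each pair independently
    keeps its deterrence/treaty relationship stable with probability p_pair."""
    return p_pair ** num_pairs(n_actors)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, num_pairs(n), round(prob_all_pairs_hold(n, 0.95), 3))
```

With `p_pair = 0.95`, two actors share a single relationship that holds 95% of the time, while ten actors have 45 relationships and only about a 10% chance that all of them hold. Real relationships are neither independent nor equally fragile, so this is an intuition pump rather than a model of nuclear strategy.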

I do not necessarily think these arguments are decisive, but I do think it is worth figuring out what the likely alternatives are before deciding if engaging in a particular capacity race is a bad idea. In general:

  • It’s nice for there to not be tons of violence and death from many factions fighting for power (multipolar situation)
  • It is nice to not have the future locked into a horrible direction by the first country/company/group/AI/etc. to effectively take over the world due to an advantage gained by racing ahead technologically (singleton/monopolar power)
  • It’s nice for there to not be the constant risk of overwhelming suffering and death from a massive arms buildup between two factions (bipolar situation)

So whether an arms race is good basically depends on whether the “good guys” are going to win (and remain good guys). If not, racing just makes everyone spend more on potentially risky tech and less on helping people. While some concerns about autonomous drones are legitimate, and drones may make individuals much more powerful, I am unsure it is good to stop investment races now unless they can also be stopped from happening later. Likewise, U.S. leadership in such a race is likely to steer the proliferation of lethal autonomous weapons in a more ethical direction, with lower probabilities of civilian deaths than the weapons that states would otherwise purchase. It is also probably better to start figuring out what goes wrong while humans are still controlling mostly autonomous drones than to wait for a bunch of countries to defect on unenforceable arms control agreements later in a conflict and start deploying riskier, less well-vetted systems.

If one thinks decision agencies will be better governed in the future, delaying technologies that centralize power may make sense to avoid locking in bad governments/companies/AI systems. However, to the degree that competent bureaucracies can gain an advantage from risky tech investments regardless of their alignment with the general population, more aligned systems must keep a lead to prevent others from locking in poor institutions.

Overall, arms races are wasteful and unsafe, but they may mitigate other, even less safe races if they happen at the right time under the right conditions. In general, by suppressing the incentive for violence between individuals and building up larger societies, states pursuing power in zero-sum power races ultimately created positive-sum economic spillovers from peace and innovation.


  1. As opposed to tech research continuing outside the military, so that when an arms race begins there is a sudden, destabilizing leap in attack capacity for one side or another.
  2. You can see other related arguments in the debate on nuclear modernization here.


Go F*** Someone

January 15, 2020 - 21:39
Published on January 15, 2020 6:39 PM UTC

As always, cross-posted from Putanumonit.

From Tokyo to TriBeCa, people are increasingly alone. People go on fewer dates, marry less and later, have smaller families if at all. People are having less sex, especially young people. The common complaint: it’s just too hard. Dating is hard, intimacy is hard, relationships are hard. I’m not ready to play on hard mode yet, I’ll do the relationship thing when I level up.

And simultaneously, a cottage industry has sprung up extolling the virtue of loneliness. Self-care, self-development, self-love. Travel solo, live solo, you do you. Wait, doesn’t that last one literally mean “go fuck yourself”?

This essay is to tell you: go fuck someone else. Ask someone on a date. At the very least, invite someone to hang out and ask them what they’re struggling with. This essay is not about how to make friends and lovers (a topic I’ll come back to), but an exhortation to actually go and do that. Now instead of later, directly instead of ass-backwards, seek relationships instead of seeking to be deemed worthy of relationships. If you think this is all too obvious to mention, reread the first two paragraphs.

My argument doesn’t hinge on specific data relating to the intimacy recession and whether the survey counting sex dolls adjusted for inflation. If you’re reading Putanumonit as a brief escape from all the loving relationships smothering you, congrats! If you’re trying as hard as you can to connect and the world isn’t reciprocating, consider this essay as written for those you seek to connect with instead. Reverse all advice as necessary.

This essay’s epistemic status is whatever The Last Psychiatrist was drinking.

Wherefore all this aloneness? The pink-hairs blame the red-pills who blame the pink-hairs. But really, they’re both in agreement that men and women are natural enemies and any interactions between the two are zero-sum. If you’re stuck in zero-sum thinking you’re probably on the wrong blog, but take this as a first dose of medicine and then go give someone a hug.

One level up from the gender war is the class war. Leftists blame loneliness on capitalism: single people buy twice as many toasters, sex toys, and Netflix subscriptions. Rightists blame socialism: for the state to be your daddy it must first destroy the family. I won’t spend much time on this. If your ability to connect with people depends more than zero on the GDP composition, that’s the problem right there. “But in this economy…” Listen, if you’re struggling to build financial capital, maybe now is the time to invest in relationship capital instead?

The famous Atlantic article on The Sex Recession starts by noting that sex is now more accepted than ever:

If hookups are your thing, Grindr and Tinder offer the prospect of casual sex within the hour. The phrase If something exists, there is porn of it used to be a clever internet meme; now it’s a truism. BDSM plays at the local multiplex—but why bother going? Sex is portrayed, often graphically and sometimes gorgeously, on prime-time cable. Sexting is, statistically speaking, normal.
Polyamory is a household word. Shame-laden terms like perversion have given way to cheerful-sounding ones like kink. Anal sex has gone from final taboo to “fifth base”—Teen Vogue (yes, Teen Vogue) even ran a guide to it. With the exception of perhaps incest and bestiality—and of course nonconsensual sex more generally—our culture has never been more tolerant of sex in just about every permutation.
[…] These should be boom times for sex.

So why is it, in the words of philosopher Julia Kristeva, that “everything is permitted and nothing is possible”?

I don’t think there’s a contradiction here. Everything is hard because it’s permitted.

There used to be no shortage of people who would judge you for having sex. Parents, peers, teachers, pastors, even the same media outlets that now claim to be “sex positive”. And when you had to escape surveillance and risk judgment just to make out with someone, it was HOT. The illicit is sexy. Sneaking around created a bond based on a shared secret, and merely having sex in the face of restriction was an achievement to be proud of. Having good sex was gravy.

If “the culture” no longer judges you for getting naked, who will? Your partner might. They’ll think you’re inexperienced, or too experienced, or too frigid or horny or vanilla or too weird. This can be a problem, but it’s ameliorated by your partner repeatedly telling you that no, it was good, you’re just what they wanted. You should believe them. If they didn’t like you they’d make like Hamlet and ghost.

The big problem is when you start judging yourself. You can hide from your parents. You can find a partner who doesn’t judge your shortcomings. But you can’t outrun your own insecurities.

It starts by comparing yourself to the internet. Everyone’s dick is bigger in porn, the tits are perkier. Everyone’s dates are more romantic on Instagram, their vacations sexier. People who suck at relationships are a lot less visible online.

It also turns out that society will judge you for looking for romance if your perceived status isn’t up to snuff. Try to date “out of your league” and you’ll be labeled a creep or a thot, depending on gender[1]. People who seek help with dating can run into this judgment and begin to internalize their perceived inadequacy. They start diverting all their energy into acquiring status markers, into being perceived as relationship-worthy by the real or imagined crowd of observers.

There’s no natural end to this process. As people spend more effort on status-climbing and self-improvement they spend less time in actual relationships. Unfortunately, you don’t get better at dating by learning to meditate or doing pushups alone in your room. When people who are obsessed with self-improvement have a miserable time on apps and first dates, they often conclude that the problem is a lack of self-improvement: surely when two well-developed, high-status people meet, effortless love will spark by itself! And so people keep chasing the next personal milestone. Get that degree, lose 10 pounds, learn that skill, read that book…

It’s important to distinguish between life’s necessities and extras. If you’ve just lost your job, are dealing with a health crisis, or moved to a new city where you have no friends then you should probably stabilize these issues before dating. Dating is hard, and acute crises should be solved directly and not by looking for salvation in a partner. But most self-development isn’t addressing real crises even if it pretends to.
Self-development is riskless. Progress is slow but assured, and every step towards your personal goal is rewarded with likes and favs on social media. The pursuit itself raises one’s status. Opening up for connection, on the other hand, is scary. The rewards are great but so is the risk of failure. And real affection is the one thing you can’t brag about in an Instagram story. Intimacy for external consumption is not intimacy.

And so, as the great guru put it: people want to be fuckable more than they want to fuck.

Fuckability is capital. We seek to accumulate capital. Fucking is labor. We seek to avoid labor. And so people are more fuckable than ever, and do ever less fucking.

It gets worse.

The pathological case of becoming obsessed with status and perception is when relationships themselves are subjugated to this end. When the main measure of a relationship is in how it makes you appear. Narcissism.

I see it in rich women who refuse to date a man who makes less money than they do, no matter how severely it limits their mating pool, because it would be beneath them to have a poorer boyfriend. I see it in men who refuse to date a woman who is a year older or an inch taller than they are.

It’s in viewing accomplished women dropping out of demanding careers to raise kids as a product of sexism. Could it be that someone may prefer raising a family to grinding 70 hours a week at the office once they don’t need to worry about money? I certainly would! But if the only thing you count is personal status[2] then it would seem to you that these women are being cheated out of something by the evil patriarchy.

Narcissists ask: How does this relationship reinforce my ego narrative brand? How worthy does it make me seem? Ego-poisoned people who fall short of narcissism merely ask: Would I be judged for this relationship? These questions are self-focused, and intimacy requires that you relinquish them entirely. Instead, the question that starts all good relationships is: Can I make someone happy?[3]

Making someone happy doesn’t imply forever, or as happy as they can be, or happier than anyone else could make them. A compliment makes a person happy. A text where you share something fun. Being a good listener on a date even if you didn’t blow their mind with electric conversation. A cuddle makes a person happy even if it stays a cuddle. Sex makes people happy even if it’s not PornHub-grade.

Romance is the most complex and rewarding multi-player game that humanity has invented. There are many romantic interactions that are short of your wildest dreams that are still worth having, that make two people happier than they would have been alone. And if you’re starting out, that’s what you should aim for.

Dating and sex and relationships are all trainable skills. You learn by doing. To learn painting you start by making 100 paintings. To get good at tennis you start by playing 100 matches. The first 100 will be mostly mediocre and some will be outright bad, but the 101st one has the chance to be good.

To go on a great date, you have to go on 100 mediocre dates. Or at least, put yourself in the mindset where that is your goal. That is how you learn to date and make people happy to be dating you. You learn how to deal with rejection and breakups and how to bounce back. Just as importantly, that’s where you learn to enjoy dating (see rule 97).

What if you’re not enjoying it? There are bad dates out there, people who are selfish and manipulative and dangerous or who just don’t show up. This sucks, and the only consolation is that with dating experience you get better at spotting them earlier.

But perhaps you are going on dates with lovely people but the dates aren’t going exactly according to the script you envisioned. Or the people who flirt and match with you are not quite what someone with your degrees and BMI and yoga skill deserves. In this case you should go back to self-development: fix your narcissism and figure out what value you actually provide to a romantic partner besides imagining that you raise their status through mere association.

How to tell if you’re in the latter category? If you get a lot of “I can’t believe a great guy/gal like you can’t find a girlfriend/boyfriend” from your friends, that’s a sign. Your friends saying that is not a compliment, it’s a mockery of your misguided self-focus. They’re saying that you have the resources to make someone happy, and that you’re failing to do so.

Unfortunately, dating is a matter of luck and circumstance. All you can do is be proactive and open. There’s no guarantee that you’ll meet the partners you want in a given time frame or for a given amount of mating effort. Exponential distributions are tough: you go through one mediocre match after another, and there’s no way to predict when the positive outlier comes. But still, you’ll always do better the earlier you start.

Perhaps there was a hidden benefit to the premodern mating context when you had roughly one shot at a successful partnering — all you could do is invest in the one relationship you’re given. But now that the option to date without lifelong commitment exists it affects your dating life even if you don’t plan on it. The option is always there for you and your partners. Waiting until you hit some life marker to start dating just means that you miss out on years of learning what other people are looking for, and what you yourself are looking for in a relationship.

And if you’re too busy for dating, actually busy with something that’s more important to you than romance, consider that dating doesn’t have to be a sink of time and energy. A casual date can be invigorating, and a partner can provide the support you need in your struggles.

So go out there and make some people mildly happy by going on mediocre dates[4] and having mediocre sex and learning to connect with people romantically instead of having your head up your own ass. There are more interesting things to put in there with a partner.

  1. Men get the worst of it, especially those at the bottom of the status ladder. Punching down at low-status people is generally contemptible and so people convince themselves that all incels are violent misogynists to justify it. I see having compassion for incels as a good litmus test of basic human decency. ↩︎

  2. I consider it quite unfortunate that being a middle manager (which entails a lot of personal benefits) is considered higher status than being a good parent or partner (which entails a lot of benefits for other people). ↩︎

  3. In case people are confused, the whole business with decision matrices is about choosing a partner (or a house). Once you’ve chosen, the only thing that counts is investing in the relationship, not scoring or comparing it. ↩︎

  4. If you don’t know who to go on a mediocre date with, I’m always available to deliver mediocre romantic satisfaction in person. ↩︎


[AN #82]: How OpenAI Five distributed their training computation

January 15, 2020 - 21:20
Published on January 15, 2020 6:20 PM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Audio version here (may not be up yet).


Dota 2 with Large Scale Deep Reinforcement Learning (OpenAI et al) (summarized by Nicholas): In April, OpenAI Five (AN #54) defeated the world champion Dota 2 team, OG. This paper describes its training process. OpenAI et al. hand-engineered the reward function as well as some features, actions, and parts of the policy. The rest of the policy was trained using PPO with an LSTM architecture at a massive scale. They trained this in a distributed fashion as follows:

- The Controller receives and distributes the updated parameters.

- The Rollout Worker CPUs simulate the game, send observations to the Forward Pass GPUs and publish samples to the Experience Buffer.

- The Forward Pass GPUs determine the actions to use and send them to the Rollout Workers.

- The Optimizer GPUs sample experience from the Experience Buffer, calculate gradient updates, and then publish updated parameters to the Controller.
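To make the data flow above concrete, here is a toy, single-process Python sketch. Every class and parameter name is hypothetical, the "game" is a one-step guessing task, and the update rule is a simple reward-weighted nudge rather than PPO. The real system runs these roles on many separate machines in parallel, with an LSTM policy.

```python
import random
from collections import deque


class Controller:
    """Holds the latest published parameters plus a version counter."""

    def __init__(self, params: float = 0.0):
        self.params = params
        self.version = 0

    def publish(self, params: float) -> None:
        self.params = params
        self.version += 1


class ForwardPass:
    """Stand-in for the Forward Pass GPUs: maps an observation to an action
    using the parameters the Controller currently holds."""

    def __init__(self, controller: Controller, rng: random.Random):
        self.controller = controller
        self.rng = rng

    def act(self, obs: float) -> int:
        # Toy stochastic policy: Gaussian noise dominates until params grow.
        score = self.controller.params * obs + self.rng.gauss(0.0, 1.0)
        return 1 if score > 0 else 0


class RolloutWorker:
    """Stand-in for a Rollout Worker CPU: simulates one step of a toy game,
    gets an action from the Forward Pass, and publishes the sample."""

    def __init__(self, forward: ForwardPass, buffer: deque, rng: random.Random):
        self.forward = forward
        self.buffer = buffer
        self.rng = rng

    def step(self) -> None:
        obs = self.rng.uniform(-1.0, 1.0)
        action = self.forward.act(obs)
        # Toy reward: action 1 is correct when obs > 0, action 0 otherwise.
        reward = 1.0 if (action == 1) == (obs > 0) else 0.0
        self.buffer.append((obs, action, reward))


class Optimizer:
    """Stand-in for the Optimizer GPUs: samples from the Experience Buffer,
    computes a toy update (NOT PPO), and publishes new parameters."""

    def __init__(self, controller, buffer, rng, lr=0.5, batch_size=32):
        self.controller = controller
        self.buffer = buffer
        self.rng = rng
        self.lr = lr
        self.batch_size = batch_size

    def update(self) -> bool:
        if len(self.buffer) < self.batch_size:
            return False  # not enough experience yet
        batch = self.rng.sample(list(self.buffer), self.batch_size)
        # Nudge params toward whichever actions were rewarded.
        grad = sum(r * (1 if a == 1 else -1) * obs for obs, a, r in batch)
        self.controller.publish(self.controller.params + self.lr * grad / self.batch_size)
        return True


def train(steps: int = 200, n_workers: int = 4, seed: int = 0):
    rng = random.Random(seed)
    controller = Controller()
    buffer = deque(maxlen=1024)  # bounded Experience Buffer
    forward = ForwardPass(controller, rng)
    workers = [RolloutWorker(forward, buffer, random.Random(seed + i + 1))
               for i in range(n_workers)]
    optimizer = Optimizer(controller, buffer, rng)
    for _ in range(steps):
        for worker in workers:  # in the real system these run in parallel
            worker.step()
        optimizer.update()
    return controller, buffer


if __name__ == "__main__":
    controller, buffer = train()
    print(controller.version, round(controller.params, 3), len(buffer))
```

`train()` returns the Controller and the Experience Buffer, and the Controller's `version` counts how many parameter updates were published. Keeping inference separate from simulation, as in the description above, plausibly lets cheap CPU simulation and batched GPU inference scale independently, though the sketch does not model that.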

The model trained over 296 days. In that time, OpenAI needed to adapt it to changes in the code and game mechanics. This was done via model “surgery”, in which they would try to initialize a new model to maintain the same input-output mapping as the old one. When this was not possible, they gradually increased the proportion of games played with the new version over time.
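The summary does not spell out how the surgery works. One simple function-preserving trick, assumed here and not necessarily OpenAI's exact procedure (their policy is an LSTM and their surgeries were more varied), is to give newly added weights values that cannot yet influence the output: new input features get zero weights into existing units, and new hidden units get zero outgoing weights. The NumPy sketch below applies this to a hypothetical two-layer ReLU network; `mlp` and `surgery` are illustrative names.

```python
import numpy as np


def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network standing in for the policy."""
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2


def surgery(W1, b1, W2, b2, extra_inputs, extra_hidden, rng):
    """Grow the network while preserving its input-output mapping on old inputs."""
    hidden, n_in = W1.shape
    new_in = n_in + extra_inputs
    W1_new = np.zeros((hidden + extra_hidden, new_in))
    # Old hidden units keep their weights; new input columns start at zero,
    # so new observation features cannot change old activations yet.
    W1_new[:hidden, :n_in] = W1
    # New hidden units get small random incoming weights so they can learn...
    W1_new[hidden:, :] = rng.normal(scale=0.01, size=(extra_hidden, new_in))
    b1_new = np.concatenate([b1, np.zeros(extra_hidden)])
    # ...but zero outgoing weights, so they do not affect the output yet.
    W2_new = np.concatenate([W2, np.zeros((W2.shape[0], extra_hidden))], axis=1)
    return W1_new, b1_new, W2_new, b2.copy()


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)
params_new = surgery(W1, b1, W2, b2, extra_inputs=2, extra_hidden=4, rng=rng)

x_old = rng.normal(size=5)
x_new = np.concatenate([x_old, rng.normal(size=2)])  # old features + 2 new ones
print(np.allclose(mlp(x_old, W1, b1, W2, b2), mlp(x_new, *params_new)))
```

When an exact function-preserving initialization like this is not available, the gradual roll-out of the new version described above is the fallback.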

Nicholas's opinion: I feel similarly to my opinion on AlphaStar (AN #73) here. The result is definitely impressive and a major step up in complexity from shorter, discrete games like chess or Go. However, I don’t see how the approach of just running PPO at a large scale brings us closer to AGI, because we can’t run massively parallel simulations of real-world tasks. Even for tasks that can be simulated, this seems prohibitively expensive for most use cases (I couldn’t find the exact costs, but I’d estimate this model cost tens of millions of dollars). I’d be quite excited to see an example of deep RL being used for a complex real-world task without training in simulation.

Technical AI alignment

Technical agendas and prioritization

Just Imitate Humans? (Michael Cohen) (summarized by Rohin): This post asks whether it is safe to build AI systems that just imitate humans. The comments have a lot of interesting debate.

Agent foundations

Conceptual Problems with UDT and Policy Selection (Abram Demski) (summarized by Rohin): In Updateless Decision Theory (UDT), the agent decides "at the beginning of time" exactly how it will respond to every possible sequence of observations it could face, so as to maximize the expected value it gets with respect to its prior over how the world evolves. It is updateless because it decides ahead of time how it will respond to evidence, rather than updating once it sees the evidence. This works well when the agent can consider the full environment and react to it, and often gets the right result even when the environment can model the agent (as in Newcomblike problems), as long as the agent knows how the environment will model it.
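To make "decides at the beginning of time" concrete, here is a small Python sketch using counterfactual mugging, a standard thought experiment in this literature that is not part of the summary; the payoffs and names are hypothetical. The updateless agent scores whole policies against its prior, while an updateful agent first conditions on what it observed and then picks the best action.

```python
from itertools import product

# Hypothetical example (counterfactual mugging), not taken from the summary:
# a fair coin is flipped. On tails, a predictor asks the agent for $100.
# On heads, the predictor pays $10,000 iff the agent's policy pays on tails.
PRIOR = {"heads": 0.5, "tails": 0.5}
OBSERVATIONS = ("heads", "tails")
ACTIONS = ("pay", "refuse")


def utility(world: str, policy: dict) -> float:
    """Payoff in `world` for an agent following `policy` (observation -> action)."""
    pays_on_tails = policy["tails"] == "pay"
    if world == "tails":
        return -100.0 if pays_on_tails else 0.0
    return 10_000.0 if pays_on_tails else 0.0


def expected_utility(policy: dict) -> float:
    return sum(p * utility(world, policy) for world, p in PRIOR.items())


def udt_policy() -> dict:
    """Updateless: choose the whole observation->action map up front,
    scoring it against the prior rather than against any observation."""
    policies = (dict(zip(OBSERVATIONS, acts))
                for acts in product(ACTIONS, repeat=len(OBSERVATIONS)))
    return max(policies, key=expected_utility)


def updateful_choice_on_tails() -> str:
    """Updateful: having seen tails, treat tails as certain and pick the
    action that is best in that world alone."""
    return max(ACTIONS, key=lambda a: utility("tails", {"tails": a}))


if __name__ == "__main__":
    policy = udt_policy()
    print(policy["tails"], expected_utility(policy))  # pay 4950.0
    print(updateful_choice_on_tails())                 # refuse
```

The updateless agent pays on tails (expected utility 4950, because the heads payout depends on its policy), while the updateful agent refuses once it has seen tails; the action on heads is irrelevant since nothing is asked then. Note that this toy sidesteps exactly the problems raised below: the agent can enumerate every policy and knows precisely how the predictor models it.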

However, it seems unlikely that UDT will generalize to logical uncertainty and multiagent settings. Logical uncertainty occurs when you haven't computed all the consequences of your actions and is reduced by thinking longer. However, this effectively is a form of updating, whereas UDT tries to know everything upfront and never update, and so it seems hard to make it compatible with logical uncertainty. With multiagent scenarios, the issue is that UDT wants to decide on its policy "before" any other policies, which may not always be possible, e.g. if another agent is also using UDT. The philosophy behind UDT is to figure out how you will respond to everything ahead of time; as a result, UDT aims to precommit to strategies assuming that other agents will respond to its commitments; so two UDT agents are effectively "racing" to make their commitments as fast as possible, reducing the time taken to consider those commitments as much as possible. This seems like a bad recipe if we want UDT agents to work well with each other.

Rohin's opinion: I am no expert in decision theory, but these objections seem quite strong and convincing to me.

A Critique of Functional Decision Theory (Will MacAskill) (summarized by Rohin): This summary is more editorialized than most. This post critiques Functional Decision Theory (FDT). I'm not going to go into detail, but I think the arguments basically fall into two camps. First, there are situations in which there is no uncertainty about the consequences of actions, and yet FDT chooses actions that do not have the highest utility, because of their impact on counterfactual worlds which "could have happened" (but ultimately, the agent is just leaving utility on the table). Second, FDT relies on the ability to tell when someone is "running an algorithm that is similar to you", or is "logically correlated with you". But there's no such crisp concept, and this leads to all sorts of problems with FDT as a decision theory.

Rohin's opinion: Like Buck from MIRI, I feel like I understand these objections and disagree with them. On the first argument, I agree with Abram that a decision should be evaluated based on how well the agent performs with respect to the probability distribution used to define the problem; FDT only performs badly if you evaluate on a decision problem produced by conditioning on a highly improbable event. On the second class of arguments, I certainly agree that there isn't (yet) a crisp concept for "logical similarity"; however, I would be shocked if the intuitive concept of logical similarity was not relevant in the general way that FDT suggests. If your goal is to hardcode FDT into an AI agent, or your goal is to write down a decision theory that in principle (e.g. with infinite computation) defines the correct action, then it's certainly a problem that we have no crisp definition yet. However, FDT can still be useful for getting more clarity on how one ought to reason, without providing a full definition.

Learning human intent

Learning to Imitate Human Demonstrations via CycleGAN (Laura Smith et al) (summarized by Zach): Most methods for imitation learning, where robots learn from a demonstration, assume that the actions of the demonstrator and robot are the same. This means that expensive techniques such as teleoperation have to be used to generate demonstrations. This paper presents a method to engage in automated visual instruction-following with demonstrations (AVID) that works by translating video demonstrations done by a human into demonstrations done by a robot. To do this, the authors use CycleGAN, a method to translate an image from one domain to another domain using unpaired images as training data. CycleGAN allows them to translate videos of humans performing the task into videos of the robot performing the task, which the robot can then imitate. In order to make learning tractable, the demonstrations had to be divided up into 'key stages' so that the robot can learn a sequence of more manageable tasks. In this setup, the robot only needs supervision to ensure that it's copying each stage properly before moving on to the next one. To test the method, the authors have the robot retrieve a coffee cup and make coffee. AVID significantly outperforms other imitation learning methods and achieves 70% and 80% success rates on the two tasks, respectively.
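For reference, CycleGAN's ability to learn from unpaired images comes from a cycle-consistency loss. In the standard CycleGAN formulation (background, not stated in this summary), with a generator G: X → Y (here, human-video frames to robot-video frames), a reverse generator F: Y → X, discriminators D_X and D_Y, and a weighting constant λ, the objective is:

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{cyc}}(G, F) &= \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{aligned}
```

Requiring that F(G(x)) recover x rules out translations that fool the discriminator while discarding the content of the demonstration, which is what makes the translated robot videos usable for imitation.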

Zach's opinion: In general, I like the idea of 'translating' demonstrations from one domain into another. It's worth noting that there do exist methods for translating visual demonstrations into latent policies. I'm a bit surprised that we didn't see any comparisons with other adversarial methods like GAIfO, but I understand that those methods have high sample complexity, so perhaps they weren't useful in this context. It's also important to note that these other methods would still require demonstration translation. Another criticism is that AVID is not fully autonomous, since it relies on human feedback to progress between stages. However, compared to kinesthetic teaching or teleoperation, sparse feedback from a human overseer is a minor inconvenience.

Read more: Paper: AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

Preventing bad behavior

When Goodharting is optimal: linear vs diminishing returns, unlikely vs likely, and other factors (Stuart Armstrong) (summarized by Flo): Suppose we were uncertain about which arm in a bandit provides reward (and we don’t get to observe the rewards after choosing an arm). Then, maximizing expected value under this uncertainty is equivalent to picking the most likely reward function as a proxy reward and optimizing that; Goodhart’s law doesn’t apply and is thus not universal. This means that our fear of Goodhart effects is actually informed by more specific intuitions about the structure of our preferences. If there are actions that contribute to multiple possible rewards, optimizing the most likely reward does not need to maximize the expected reward. Even if we optimize for that, we have a problem if value is complex and the way we do reward learning implicitly penalizes complexity. Another problem arises if the correct reward is comparatively difficult to optimize: if we want to maximize the average, it can make sense to only care about rewards that are both likely and easy to optimize. Relatedly, we could fail to correctly account for diminishing marginal returns in some of the rewards.
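The bandit argument can be checked numerically. In this sketch the credences, rewards, and function names are all hypothetical. In case 1 each reward hypothesis makes one arm pay 1; since rewards are never observed, an arm's expected reward is just the credence in its hypothesis, so optimizing the most likely reward is optimal. Case 2 adds an arm that pays a moderate amount under every hypothesis, illustrating the first failure mode above: actions that contribute to multiple possible rewards.

```python
def expected_rewards(probs, reward_table):
    """probs[h]: credence that reward hypothesis h is the true one.
    reward_table[h][a]: reward of arm a if hypothesis h is true.
    Rewards are never observed, so expected reward is all we can optimize."""
    n_arms = len(reward_table[0])
    return [sum(p * reward_table[h][a] for h, p in enumerate(probs))
            for a in range(n_arms)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


probs = [0.5, 0.3, 0.2]  # hypothetical credences over three reward functions

# Case 1: each arm pays off under exactly one hypothesis.
table = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
best_arm = argmax(expected_rewards(probs, table))
proxy_arm = argmax(table[argmax(probs)])  # optimize the most likely reward only

# Case 2: add a hypothetical "compromise" arm worth 0.6 under every hypothesis.
table2 = [row + [0.6] for row in table]
best_arm2 = argmax(expected_rewards(probs, table2))
proxy_arm2 = argmax(table2[argmax(probs)])

print(best_arm, proxy_arm)    # 0 0: the proxy is optimal
print(best_arm2, proxy_arm2)  # 3 0: the proxy now misses the best arm
```

Only this first factor is modeled; complexity penalties, ease of optimization, and diminishing returns would each need their own variation of the reward table.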



Goodhart effects are a lot less problematic if we can deal with all of the mentioned factors. Independent of that, Goodhart effects are most problematic when there is little middle ground that all rewards can agree on.

Flo's opinion: I enjoyed this article and the proposed factors match my intuitions. Predicting variable diminishing returns seems especially hard to me. I also worry that the interactions between rewards will be negative-sum, due to resource constraints.

Rohin's opinion: Note that this post considers the setting where we have uncertainty over the true reward function, but we can't learn about the true reward function. If you can gather information about the true reward function, which seems necessary to me (AN #41), then it is almost always worse to take the most likely reward or expected reward as a proxy reward to optimize.


AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (Dan Hendrycks, Norman Mu et al) (summarized by Dan H): This paper introduces a data augmentation technique to improve robustness and uncertainty estimates. The idea is to take various random augmentations such as random rotations, produce several augmented versions of an image with compositions of random augmentations, and then pool the augmented images into a single image by way of an elementwise convex combination. Said another way, the image is augmented with various traditional augmentations, and these augmented images are “averaged” together. This produces highly diverse augmentations that have similarity to the original image. Unlike techniques such as AutoAugment, this augmentation technique uses typical resources, not 15,000 GPU hours. It also greatly improves generalization to unforeseen corruptions, and it makes models more stable under small perturbations. Most importantly, even as the distribution shifts and accuracy decreases, this technique produces models that can remain calibrated under distributional shift.
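The mixing step described above can be sketched in NumPy. Beyond the summary's description, two details are treated here as assumptions about the paper: the convex-combination weights are drawn from a Dirichlet distribution, and the mixture is blended back with the original image using a Beta-distributed weight. The four operations are toy stand-ins for the paper's image operations, the parameter defaults are illustrative, and the paper's additional training-time consistency loss is not shown.

```python
import numpy as np


# Toy stand-ins for the paper's image operations; each keeps values in [0, 1].
def flip(img, rng):
    return img[:, ::-1]


def shift(img, rng):
    return np.roll(img, shift=int(rng.integers(-2, 3)), axis=1)


def brighten(img, rng):
    return np.clip(img * rng.uniform(0.5, 1.5), 0.0, 1.0)


def quantize(img, rng):
    return np.floor(img * 4) / 4


OPS = (flip, shift, brighten, quantize)


def augmix(img, rng, width=3, max_depth=3, alpha=1.0):
    """Mix `width` random augmentation chains with Dirichlet weights (an
    elementwise convex combination), then blend the result with the original
    image using a Beta-distributed weight."""
    weights = rng.dirichlet([alpha] * width)
    mix = np.zeros_like(img)
    for w in weights:
        chain = img.copy()
        for _ in range(rng.integers(1, max_depth + 1)):  # chain depth 1..max_depth
            op = OPS[rng.integers(len(OPS))]
            chain = op(chain, rng)
        mix += w * chain
    m = rng.beta(alpha, alpha)
    return m * img + (1.0 - m) * mix


rng = np.random.default_rng(0)
image = rng.random((8, 8))  # a fake 8x8 grayscale image
augmented = augmix(image, rng)
print(augmented.shape, augmented.min(), augmented.max())
```

Because every step is a convex combination of images whose values lie in [0, 1], the output stays in range and keeps a share of the original image, which is one way to get augmentations that are diverse yet still similar to the original.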

Miscellaneous (Alignment)

Defining and Unpacking Transformative AI (Ross Gruetzemacher et al) (summarized by Flo): The notion of transformative AI (TAI) is used to highlight that even narrow AI systems can have large impacts on society. This paper offers a clearer definition of TAI and distinguishes it from radical transformative AI (RTAI).

"Discontinuities or other anomalous patterns in metrics of human progress, as well as irreversibility are common indicators of transformative change. TAI is then broadly defined as an AI technology, which leads to an irreversible change of some important aspects of society, making it a (multi-dimensional) spectrum along the axes of extremity, generality and fundamentality." For example, advanced AI weapon systems might have strong implications for great power conflicts but limited effects on people's daily lives; extreme change of limited generality, similar to nuclear weapons. There are two levels: while TAI is comparable to general-purpose technologies (GPTs) like the internal combustion engine, RTAI leads to changes that are comparable to the agricultural or industrial revolutions. Both revolutions have been driven by GPTs like the domestication of plants and the steam engine. Similarly, we will likely see TAI before RTAI. The scenario where we don't is termed a radical shift.

Non-radical TAI could still contribute to existential risk in conjunction with other factors. Furthermore, if TAI precedes RTAI, our management of TAI can affect the risks RTAI will pose.

Flo's opinion: I enjoyed this article and the proposed factors match my intuitions. There seem to be two types of problems: extreme beliefs and concave Pareto boundaries. Dealing with the second is more important since a concave Pareto boundary favours extreme policies, even for moderate beliefs. Luckily, diminishing returns can be used to bend the Pareto boundary. However, I expect it to be hard to find the correct rate of diminishing returns, especially in novel situations.

Six AI Risk/Strategy Ideas (Wei Dai) (summarized by Rohin): This post briefly presents three ways that power can become centralized in a world with Comprehensive AI Services (AN #40), argues that under risk aversion "logical" risks can be more concerning than physical risks because they are more correlated, proposes combining human imitations and oracles to remove the human in the loop and become competitive, and suggests doing research to generate evidence of difficulty of a particular strand of research.

Copyright © 2020 Rohin Shah, All rights reserved.
