LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 6 seconds ago

High-stakes alignment via adversarial training [Redwood Research report]

May 5, 2022 - 03:59
Published on May 5, 2022 12:59 AM GMT

This post motivates and summarizes this paper from Redwood Research, which presents results from the project first introduced here. We used adversarial training to improve high-stakes reliability in a task ("filter all injurious continuations of a story") that we think is analogous to work that future AI safety engineers will need to do to reduce the risk of AI takeover. We experimented with three classes of adversaries – unaugmented humans, automatic paraphrasing, and humans augmented with a rewriting tool – and found that adversarial training was able to improve robustness to these three adversaries without affecting in-distribution performance. We think this work constitutes progress towards techniques that may substantially reduce the likelihood of deceptive alignment.

Motivation

Here are two dimensions along which you could simplify the alignment problem (similar to the decomposition at the top of this post, or as discussed here):

1. Low-stakes (but difficult to oversee): Only consider domains where each decision that an AI makes is low-stakes, so no single action can have catastrophic consequences.  In this setting, the key challenge is to correctly oversee the actions that AIs take, such that humans remain in control over time.
2. Easy oversight (but high-stakes): Only consider domains where overseeing AI behavior is easy, meaning that it is straightforward to run an oversight process that can assess the goodness of any particular action. The oversight process might nevertheless be too slow or expensive to run continuously in deployment. Even if we get perfect performance during training steps according to a reward function that perfectly captures the behavior we want, we still need to make sure that the AI always behaves well when it is acting in the world, between training updates. If the AI is deceptively aligned, it may be looking for signs that it is not currently being trained, during which time it might take a treacherous turn.  As a result, alignment may still be difficult due to the possibility of high-stakes decisions. The purpose of this project was to begin developing techniques that will reduce misalignment risk in the high-stakes setting.

Our working assumption is that if we have techniques that drastically reduce misalignment risk in each of these relaxed settings, we can combine these techniques and drastically reduce risk in the general setting.  We think that most likely each of these settings constitutes a substantial fraction of the difficulty of the alignment problem.

The spirit of how teams at Redwood Research choose projects is the following:  Imagining ourselves or our colleagues in the future who are working in the safety department of an organization that’s developing transformative AI, we ask what research that we could do between now and then that we think those future people would find most helpful.  We think a useful heuristic is to design challenges that are analogous to the future problems we expect to encounter but that we can experiment with and explore using currently available technology.  Importantly, the work recommended by this heuristic may be fairly different from the work that would be most useful for making current AI systems safe and useful.

We followed this heuristic in the work presented here, where we demonstrate tools that help identify catastrophic behavior in AI systems (i.e. adversarial evaluation) and training procedures that help prevent this behavior from arising (i.e. adversarial training).  "Adversarial training" here means iteratively augmenting our training set with examples of egregious failures and training until the worst failures are no longer particularly bad.  (We of course don’t want to overfit on particular failures, which could end up causing more harm than good.)

The adversarial training procedure that we use in this work requires exhibiting specific inputs on which the model performs catastrophically badly.  We think that techniques that require this, if applied during the training of transformative AI models, would reduce the probability of AI takeover.  However, we are currently unsure how much they might help – for example, they might address inner alignment problems only if we are somewhat lucky about how gradient descent works or the details of how transformative AI is developed.

In contrast, we think that more advanced versions of adversarial training, such as relaxed adversarial training (see "future work" below), might be much more powerful, defusing many of the more plausible remaining threats, and we think this would substantially reduce the risk of takeover by deceptively aligned AIs.  We expect that the work presented here will be helpful for developing these more advanced techniques.

The project: highly reliable injury filter

We picked a simple task that was as crisply defined and easy as possible while still requiring human evaluators: Given three sentences from a story, our system must write one more sentence without introducing any injuries to any beings in the story. In the analogy, we consider injuries to be "catastrophes", and our aim is to build techniques that can reliably avoid them, both in-distribution (where they may be rare but still present) and out-of-distribution (where they may be caused by natural distribution shift or deliberate adversarial attack from another agent).

We focused on building a classifier to reliably detect injuries. We used this classifier to filter generations from a language model to make sure the generations never involve injuries.[1]

We started with a baseline classifier trained on some mildly injury-enriched (but otherwise mostly random) data. Then, over the course of several months, we tried various techniques to make it more reliable.

A representation of our adversarial training loop. Starting from an initial story dataset consisting of prompts and generator completions, we trained a classifier to detect injurious completions. We then iteratively attacked our classifier using unaugmented humans, automatically paraphrased previous adversarial examples, and tool-assisted human rewrites, while training on the resulting adversarial examples.

Here are the main things that we found to be worthwhile:

1. Use adversarial training. This was our main focus. We created a variety of attacks to try to find the worst failures of our classifier, and then trained on them to eliminate them. We repeated that process using ever more powerful attacks. Our final attack was a tool-assisted rewriting process where we built tools powered by language models to help our contractors find classifier failures. (A toy sketch of this loop appears after this list.)
2. Calibrate the classification thresholds to be as conservative as possible. We made the classifier only accept a completion if it was very confident that it was safe. We pushed as far as we could without significantly impacting the quality of the filtered completions (as measured by human raters comparing them to random unfiltered completions).
3. Apply standard ML techniques well. Besides tuning the learning rate schedule, there aren't too many things that make a big difference here. One technique that was more effective than most was Scale-Invariant Fine-Tuning (SiFT), which is a type of virtual adversarial training.
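To make points 1 and 2 concrete, here is a self-contained toy sketch of the attack-and-retrain loop plus a conservative acceptance threshold, using numpy and scikit-learn. The linear model, the random-perturbation "attack", and all numbers are illustrative stand-ins for the human and tool-assisted attacks described in this post, not Redwood's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def true_injurious(x):
    # Ground-truth label function for the toy task (stand-in for human judgment).
    return x[0] + 0.5 * x[1] > 0

def make_data(n):
    X = rng.normal(size=(n, 8))
    y = np.array([true_injurious(x) for x in X], dtype=int)
    return X, y

def attack(clf, X_bad, tries=200):
    """Stand-in adversary: perturb known-injurious examples until the
    classifier calls them safe while they remain truly injurious."""
    found = []
    for x in X_bad:
        for _ in range(tries):
            x_try = x + rng.normal(scale=0.3, size=x.shape)
            fooled = clf.predict_proba(x_try[None])[0, 1] < 0.5
            if fooled and true_injurious(x_try):
                found.append(x_try)
                break
    return np.array(found)

X, y = make_data(2000)
clf = LogisticRegression().fit(X, y)
for _ in range(3):  # iterate attack-and-retrain, as in point 1 above
    adv = attack(clf, X[y == 1][:50])
    if len(adv) == 0:
        break
    X = np.vstack([X, adv])                        # add failures to the training set
    y = np.concatenate([y, np.ones(len(adv), int)])
    clf = LogisticRegression().fit(X, y)

# Point 2 above: a conservative threshold accepts a completion only when the
# classifier is very confident it is safe (here: P(injurious) < 0.01).
threshold = 0.01
accept = clf.predict_proba(X)[:, 1] < threshold
```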
Our most powerful attack: Tool-assisted rewriting

We augmented human contractors with a few tools to help them find adversarial examples.

1. Rather than requiring them to come up with a plausible adversarial example from scratch, we started them off with an existing injurious example. Their job then was to modify it such that the classifier no longer thought it was injurious (while keeping the example injurious). Contractors could see a live-updated version of the score as they made changes.
2. We built them a token substitution tool that let them click on any token (usually one word) in the text and see a list of semantically plausible replacements, ordered by how much they make the classifier more okay with the example.
3. We added a saliency map that highlights tokens that the classifier is paying more attention to (roughly).  These tokens might be more promising to modify. (A toy sketch of tools 2 and 3 appears after this list.)
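As an illustration of how tools 2 and 3 might work, here is a minimal PyTorch sketch of gradient-based token saliency and brute-force substitution scoring. The tiny classifier, the candidate token set, and the scoring details are assumptions for illustration; the post does not specify Redwood's actual implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d, seq_len = 1000, 32, 8
embed = nn.Embedding(vocab, d)
clf_head = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * d, 1))  # toy classifier

tokens = torch.randint(0, vocab, (1, seq_len))     # one 8-token example

# Tool 3 (saliency): gradient magnitude of the injuriousness score with
# respect to each token's embedding.
emb = embed(tokens).detach().requires_grad_(True)
score = clf_head(emb).squeeze()                    # "injuriousness" logit
score.backward()
saliency = emb.grad.norm(dim=-1).squeeze()         # larger = more influential token

# Tool 2 (substitution): at the most salient position, rank candidate
# replacement tokens by how much they lower the classifier score.
pos = int(saliency.argmax())
candidates = torch.arange(50)                      # stand-in for "semantically plausible" tokens
with torch.no_grad():
    scores = []
    for c in candidates:
        t = tokens.clone()
        t[0, pos] = c
        scores.append(clf_head(embed(t)).item())
best = candidates[torch.tensor(scores).argsort()[:5]]  # top score-reducers
print(pos, best)
```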

This is what our attack interface looked like; you can try tools 2 and 3 here.

Our tool-assisted attack user interface. Starting from an original prompt and completion, our tool allows you to directly edit the prompt/completion to reduce the classifier score, or use the token-level substitution tool to edit single tokens.

Results

Here are some things we discovered:

We can set very conservative thresholds

We made our classifier conservative enough to reject over half of the proposed completions. Even so, our raters think that the unfiltered completions are better only about 51% of the time (50% would mean no quality impact).

With a conservative threshold applied, our baseline classifier (with no adversarial training applied) is quite reliable. If you use it to filter completions from our generator on in-distribution prompts, we estimate that the output will still have an injury only 0.003% of the time (whereas the original dataset had a 2.5% rate of injuriousness).

Our tool-assisted attack seems quite strong

Given the baseline classifier's 0.003% failure rate, you would have to sample and label about 30,000 in-distribution examples to find a failure (which would cost about $10,000). With our tools, our contractors are able to find an adversarial example on the baseline classifier every 13 minutes (which costs about $8 – about 1000x cheaper).
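As a quick sanity check on those figures (all numbers are taken from the paragraph above):

```python
failure_rate = 0.003 / 100                 # baseline in-distribution failure rate
print(1 / failure_rate)                    # ~33,333 labels per failure; the post rounds to 30,000

random_cost_per_failure = 10_000           # dollars, as quoted above
tool_cost_per_failure = 8                  # dollars, as quoted above
print(random_cost_per_failure / tool_cost_per_failure)  # 1250, i.e. roughly 1000x cheaper
```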

After training on several thousand adversarial examples, classifiers become significantly harder to break. The tool-assisted attack went from taking 13 minutes to taking 26 minutes per example. Also, later (more adversarially trained) classifiers defended against most of the adversarial data targeted at earlier classifiers, but earlier classifiers were vulnerable to most of the adversarial data targeted at later classifiers. That suggests that more adversarial training patches issues without introducing many new ones.

Adversarial training didn’t hurt in-distribution performance

On our in-distribution random test set (100,000 examples), it appeared that adversarial training doesn't hurt performance. The final classifier gets an estimated failure rate of 0.002% (as opposed to 0.003% without adversarial training), although the error bars are large. It also didn’t help performance, which is not too surprising – the adversarial examples are quite different from a random prompt and completion from our dataset.

Surprisingly rich conceptual considerations

At first, this project seemed conceptually quite simple: we’d just start with a baseline model and do adversarial training on it to make it very reliable. However, many details were more subtle than we expected. We ended up spending a lot of time working through conceptual and methodological issues, and we think we got a lot out of it. Here are some examples:

Once we have a reliable classifier, is it important to train a reliable generator with it or is it fine to just filter?

We started looking into using RL fine-tuning to make a reliable generator that doesn’t need any further classifier filtering. This would have required us to invent specialized RL algorithms designed for high reliability. However, we realized that the main thing you get out of generator fine-tuning is just performance competitiveness (since it might be expensive to repeatedly sample from the generator until you find something the classifier accepts). Most likely, you could use an easier method to reduce the alignment tax to nearly zero: use a normal RL algorithm to get 99.9% reliability and then make your classifier be an extra head on the generator so you can evaluate it nearly for free. That way, you can likely do filtering with only 0.1% extra cost.[2]
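A toy rejection-sampling sketch of this cost argument: if the policy already produces acceptable outputs with probability p and the classifier head is nearly free to evaluate, filtering costs about 1/p generator calls per accepted output. The generator and classifier below are hypothetical stand-ins, not the system from the paper.

```python
import random

random.seed(0)
P_ACCEPT = 0.999           # policy already behaves acceptably 99.9% of the time

def generate():
    """Stand-in generator: returns a completion and whether it is acceptable."""
    return "completion", random.random() < P_ACCEPT

def classifier_accepts(sample):
    """Stand-in for the extra classifier head (assumed nearly free to run)."""
    _, ok = sample
    return ok

def filtered_generate():
    calls = 0
    while True:
        calls += 1
        sample = generate()
        if classifier_accepts(sample):
            return sample, calls

trials = 100_000
avg_calls = sum(filtered_generate()[1] for _ in range(trials)) / trials
print(avg_calls)           # ~1/0.999 ≈ 1.001, i.e. about 0.1% extra generator calls
```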

Should you let adversaries modify the completion arbitrarily?

Given that our task was to produce a classifier that filters a generator, it might have been reasonable to try to restrict ourselves to adversarial examples that can be produced by picking an adversarial prompt and inducing the generator into saying something that gets past the classifier. Instead, we let our adversarial attacks contain arbitrary completions, even ones that the generator would not have been likely to produce, making the adversary's task easier. This enabled a practical improvement in our project, since restricting contractors to completions the generator might plausibly have produced would have significantly reduced the rate of adversarial example generation. But this simplification might become even more important in the context of future advanced systems: we expect that deceptively aligned agents would behave very differently in rare or hard-to-construct situations that trigger a treacherous turn. Thus, since we might not be able to identify every possible trigger of treacherous behavior, we will try to make our classifier reliable in diverse types of unusual situations, which will hopefully cause acceptable behavior in any dangerous scenarios that are likely to arise.

Is it worth trying to train a policy to generate adversarial examples?

One natural idea you might have for this project is to train an adversarial policy that produces inputs that the classifier fails on. However, we shouldn't expect that to work. Assuming you're starting with the same pretrained base model (which we didn't do in this work, but which we expect will be the case when training transformative AI models), both the classifier and the policy should "know" roughly the same things. That means that if the policy can learn to reliably output some set of injurious examples, then (given equivalent training data) the classifier should also have been able to learn that all of those examples are injurious, and none of them should be adversarial examples. It may certainly be worth testing that this holds up in practice, but this consideration made it unpromising enough that we didn't bother trying it.

Overall, we were surprised at the extent to which working on a concrete engineering project helped us think through conceptual questions.

Future work

Here are some directions that we’re especially excited about:

Stronger and better-characterized adversarial attacks: Better attacks could increase the volume of adversarial data and increase the space of vulnerabilities covered. There are various directions one could imagine: more tools to assist humans, strong active learning, or mostly-automated attacks. We’d want to more rigorously measure how well different attacks work.

Better ways to measure reliability: We'd like to have better techniques both in-distribution (where we want to detect extremely rare failures) and out-of-distribution (where we might be measuring e.g. the worst attacks that can be found and want to be sure we're covering the whole space).

Relaxed adversarial training: By requiring adversaries to come up with specific failing examples, adversarial training might place too high a burden on them. Some adversaries might be able to tell that a model would fail in a hypothetical situation even if they can't construct an input corresponding to the situation directly (probably due to computational constraints). To give a contrived example: A model could fail if it sees a valid Bitcoin blockchain that's long enough that it suggests it's the year 2030. Even if the adversary knew that, it couldn't come up with a valid input. So we need to "relax" the adversary's task to let it supply "pseudo-inputs" of some sort.

We think there is a lot of useful work that can and should be done in adversarial training and adversarial evaluation.  Here are some ways that you might be able to help:

• Extend our techniques or develop other adversarial training and evaluation techniques for high-stakes settings.  If you want to directly build upon the work described in this paper, you are welcome to use our hardened classifier, which we provide here, and our data.  If you think that having our code for some part of this would be helpful, let us know, and we might be able to provide it (though our code as currently written only works in the context of Redwood infrastructure).
• Come work at Redwood! We are planning to energetically continue working in this area (in addition to our interpretability projects).
• We’re thinking this week about which adversarial training and evaluation projects to do next.  You are welcome to suggest ideas!

You can read more about the work we did in our paper.

1. ^

This is "Step 1" from our original post; we ended up thinking "Step 2" was not very important, as discussed in section "Surprisingly rich conceptual considerations"

2. ^

There will be another (hopefully small) hit from combining the generator and classifier into one model. We haven’t actually tried to build this; it might be a worthwhile followup project. Some existing filtered generator models are already implemented using a combined generator/classifier, such as LaMDA.

Discuss

[Book review] The anxiety toolkit

May 5, 2022 - 02:39
Published on May 4, 2022 12:15 PM GMT

This is a review of The Anxiety Toolkit by Alice Boyes. I read it in the context of a personal literature review project on the topic of productivity and well-being.

If we are to count words, I probably read around half of this book. By that I do not mean that I stopped halfway, but rather that I skimmed large portions when I thought the advice didn't apply to me. I do not think that invalidates my review of the book too much, as I have an overall good idea of what this book says and how it is written.

Description and opinion

My first contact with this book was poor, as it lacked the attention to detail and care for truth and precision that I value. Nevertheless, I think this book has a lot to offer, including to rationalists and mathematicians.

Mostly, this is a book about how to deal with anxiety issues and be productive.

I did not read other similar books I could compare it to. But I can say that many of its points resonated with my perception of my own issues regarding anxiety, and much of its advice seemed good, or at least close enough to good ideas that I could easily come up with seemingly useful techniques, using the book as a source of inspiration.

Many points and ideas rang true to my own issues with anxiety. But I am a soon-to-be-ex-student with background-anxiety issues and perfectionist tendencies, which I think is a profile this book is well suited for. Your mileage may vary. Note that the techniques presented are based on cognitive behavioral therapy (CBT). I have been told that CBT has mostly impressive but short-term effects. For me this is not much of an issue, as my satisfaction with the book wasn't about following the advice to the letter.

Main takes
• A pattern to follow: when hesitating over a decision, ask yourself, "Am I optimizing this decision, or am I deliberately wasting time?"
• Practice hesitating less. This is not because hesitation is a bad thing in itself, but because practicing is a way to correct your emotional tendencies. Feel free to hesitate a lot when the stakes are high. Of course, this applies only if you have a base tendency to hesitate a lot and suffer from it.
• Try to plan the next action of a given project as soon as possible. Have a good idea of when to do what (including for general concepts like "next time I will work").
• Make a list of failure modes, bad patterns, and good replacement patterns that you should be aware of. It can be useful to read the list again when you feel stuck or fear you might screw up.
• Hold regular reviews of your life, endeavors, and mental state.
• Manage your willpower as a resource whenever you feel you might reach its limits (potentially often).
Recommendation

A book with a lot of small bits of insight and potentially good ideas, any of which might be the one you needed to reap strong improvements. If you think you have issues with anxiety, or better yet with the behaviors you exhibit in reaction to anxiety, then I advise reading this book. Each chapter begins with a quiz. You can read it to understand what the chapter is about, but I see little point in actually tallying your answers.

Discuss

How to balance between process and outcome?

May 4, 2022 - 22:34
Published on May 4, 2022 7:34 PM GMT

I've been thinking recently about how to balance between process (how I get work done) and outcomes (what I achieve). I thought I'd ask the LessWrong community to see if anyone else has thoughts about this they'd like to share. I feel like both are important, but outcomes are a long-term focus while process is more of a daily one. Are outcomes like long-running experiments by which you judge between different styles of process? In cases where it's hard to get reliable outcome signals (when failing at hard things or succeeding at easy things, when timeframes are long, or when uncertainty is high), it can be tempting to over-update on limited evidence. Is it then better to test process styles on easier examples and then extrapolate to harder ones?

Discuss

What is a Glowfic?

May 4, 2022 - 19:38
Published on May 4, 2022 4:38 PM GMT

This is a description for first-time glowfic readers who are unfamiliar with the format.

A glowfic is a fictional online comment thread written by multiple authors who roleplay as the characters. A typical glowfic appears on glowfic.com and looks like an internet forum where fictional people post comments back and forth that end up telling a story. To read it, just start at the top and read each comment, just like a regular comment thread. Each comment usually includes a photo of the character to convey their facial expression, dress, or other details. For more information, see the community guide to glowfic.

The layout of glowfic.com is unnecessarily confusing. To read the story in order, read the top post, then all the comments underneath it, then click the "next" button to go to the next page of comments. Do not click the "Next Post" button until you have read all of the comments. "Next Post" takes you to the next part of the story (like going to the next chapter). It will not take you to the next set of comments (which are also called "posts"). Yes, it's unnecessarily confusing. No, I don't know why they do it that way.

Discuss

Introducing the ML Safety Scholars Program

May 4, 2022 - 19:01
Published on May 4, 2022 4:01 PM GMT

Program Overview

The Machine Learning Safety Scholars program is a paid, 9-week summer program designed to help undergraduate students gain skills in machine learning with the aim of using those skills for empirical AI safety research in the future. Apply for the program here by May 31st.

The course will have three main parts:

• Machine learning, with lectures and assignments from MIT
• Deep learning, with lectures and assignments from the University of Michigan, NYU, and Hugging Face
• ML safety, with lectures and assignments produced by Dan Hendrycks at UC Berkeley

The first two sections are based on public materials, and we plan to make the ML safety course publicly available soon as well. The purpose of this program is not to provide proprietary lessons but to better facilitate learning:

• The program will have a Slack, regular office hours, and active support available for all Scholars. We hope that this will provide useful feedback over and above what’s possible with self-studying.
• The program will have designated “work hours” where students will cowork and meet each other. We hope this will provide motivation and accountability, which can be hard to get while self-studying.
• We will pay Scholars a $4,500 stipend upon completion of the program. This is comparable to undergraduate research roles and will hopefully provide more people with the opportunity to study ML.

MLSS will be fully remote, so participants will be able to do it from wherever they’re located.

Why have this program?

Much of AI safety research currently focuses on existing machine learning systems, so it’s necessary to understand the fundamentals of machine learning to be able to make contributions. While many students learn these fundamentals in their university courses, some might be interested in learning them on their own, perhaps because they have time over the summer or their university courses are badly timed. In addition, we don’t think that any university currently devotes multiple weeks to AI Safety.

There are already sources of funding for upskilling within EA, such as the Long Term Future Fund. Our program focuses specifically on ML, and therefore we are able to provide a curriculum and support to Scholars in addition to funding, so they can focus on learning the content. Our hope is that this program can contribute to producing knowledgeable and motivated undergraduates who can then use their skills to contribute to the most pressing research problems within AI safety.

Time Commitment

The program will last 9 weeks, beginning on Monday, June 20th, and ending on August 19th. We expect each week of the program to cover the equivalent of about 3 weeks of the university lectures we are drawing our curriculum from. As a result, the program will likely take roughly 30-40 hours per week, depending on speed and prior knowledge.

Preliminary Content & Schedule

Machine Learning (content from the MIT open course)

• Week 1 - Basics, Perceptrons, Features
• Week 2 - Features continued, Margin Maximization (logistic regression and gradient descent), Regression

Deep Learning (content from a University of Michigan course as well as an NYU course)

• Week 3 - Introduction, Image Classification, Linear Classifiers, Optimization, Neural Networks. ML Assignments due.
• Week 4 - Backpropagation, CNNs, CNN Architectures, Hardware and Software, Training Neural Nets I & II. DL Assignment 1 due.
• Week 5 - RNNs, Attention, NLP (from NYU), Hugging Face tutorial (parts 1-3), RL overview. DL Assignment 2 due.

ML Safety

• Week 6 - Risk Management Background (e.g., accident models), Robustness (e.g., optimization pressure). DL Assignment 3 due.
• Week 7 - Monitoring (e.g., emergent capabilities), Alignment (e.g., honesty). Project proposal due.
• Week 8 - Systemic Safety (e.g., improved epistemics), Additional X-Risk Discussion (e.g., deceptive alignment). All ML Safety assignments due.
• Week 9 - Final Project

Who is eligible?

The program is designed for motivated undergraduates who have an interest in doing empirical AI safety research in the future. We will accept Scholars who will be enrolled undergraduate students after the conclusion of the program (this includes graduated or soon-graduating high school students about to enroll in their first year of undergrad).

Prerequisites:

• Differential calculus
• At least one of linear algebra or introductory statistics (e.g., AP Statistics). Note that if you only have one of these, you may need to make a conscious effort to pick up material from the other during the program.
• Programming. You will be using Python in this course, so ideally you should be able to code in that language (or at least be able to pick it up quickly). The courses will not teach Python or programming.
We don’t assume any ML knowledge, though we expect that the course could be helpful even for people who have some knowledge of ML already (e.g., fast.ai or Andrew Ng’s Coursera course).

Questions

Questions about the program should be posted as comments on this post. If the question is only relevant to you, it can be addressed to Thomas Woodside ([firstname].[lastname]@gmail.com).

Acknowledgement

We would like to thank the FTX Future Fund regranting program for providing the funding for the program.

Application

You can apply for the program here. Admission is rolling, but you must apply by May 31st to be considered for the program. All decisions will be released by June 7th.

Discuss

What are the best examples of catastrophic resource shortages?

May 4, 2022 - 17:37
Published on May 4, 2022 2:37 PM GMT

A while ago I posed a question on Twitter: What's an example of a significant resource that the world has actually run out of? Not a local, temporary shortage, or a resource that we gracefully transitioned away from, but a significant problem caused by hitting some limit we didn't prepare for?

Here, in essay form, is the discussion that followed.

Lots of things were predicted to have shortages (food, metals, Peak Oil) and they never quite arrived. (Julian Simon was famous for pointing out this kind of thing.) But a common argument from conservationists and environmentalists is that we are running out of some critical resource X and need to conserve it.

Now, it's true that specific resources can and sometimes do get used up. Demand can outpace supply. There are various ways to respond to this:

• Reduce consumption
• Increase production
• Increase efficiency
• Switch to an alternative

Increasing production can be done by exploring and discovering new sources of a material, or (this is often overlooked) by reducing costs of production, so that marginally productive sources become economical. New technology can often reduce costs of production this way, opening up resources previously thought to be closed or impractical. One example is fracking for shale oil; another is the mechanization of agriculture in the 19th and 20th centuries, which reduced labor costs, thereby opening up new farmland.

Increased efficiency can be just as good as increased production. However, if the new, more efficient thing is not as desirable as the old method, I would classify this as a combination of increased efficiency and reduced consumption (e.g., low-flow toilets, weak shower heads).

When supplies are severely limited, we often end up switching to an alternative. There are many ways to satisfy human desires: coal replaced wood in 18th-century England; kerosene replaced whale oil, then light bulbs replaced kerosene; plastic replaced ivory and tortoiseshell. Again, if the alternative is less desirable along some key dimension, then this is also a form of reduced consumption, even if total volumes stay the same.

However, the conservationist approach is always some form of reduced consumption: typically a combination of reduced absolute consumption, efficiency improvements that reduce quality and convenience, and/or switching to less-desirable alternatives.

The arguments that people have over resources are actually a lot less about whether resources are getting used up, and much more about whether we should, or must, reduce consumption in some form. The alternative to the conservationists is to find a way to continue increasing consumption: typically new sources or high-quality alternatives.
Again, it’s not about the resource. It’s about whether we continue to grow consumption, or whether we slow, stop or reverse that growth.

The conservationist argument is a combination of practical and moral arguments. The practical argument is: we can’t keep doing this. Either this particular problem we’re facing now is insoluble, or the next one will be.

The moral argument takes two forms. One is an extension of the practical argument: it’s reckless to keep growing consumption when we’re going to crash into hard limits. A deeper moral argument appeals to a different set of values, such as the value of “connection” to the land, or of tradition, or stability. Related is the argument that consumption itself is bad beyond a certain point: it makes us weak, or degrades our character.

Also, there is an argument that we could keep growing consumption, but that this would have externalities, and the price for this is too high to pay, possibly even disastrous. This too becomes both a practical and a moral argument, along exactly the same lines.

But if we don’t accept those alternate values, and instead hold the standard of improving quality of life and fulfilling human needs and desires, then everything reduces to the practical argument: Can we keep growing consumption? And can we do it without destroying ourselves in the process?

The question of severe externalities is interesting and difficult, but let’s set it aside for the moment. I’m interested in a commonly heard argument: that resource X is being rapidly depleted and we’re going to hit a wall. As far as I can tell, this never happens anymore. Has there ever been a time in recent history when we’ve been forced to significantly curtail consumption, or even the growth rate in consumption? Not switching to a desirable alternative, but solely cutting back? I haven’t found one yet.

(Of course, that doesn’t mean it won’t happen in the future! There’s a first time for everything; past performance does not guarantee future results; Thanksgiving turkey metaphor; etc. But historical examples are a good place to start learning.)

Why don’t we hit the wall? There are various things going on, but one of them is basic economics. Resource shortages increase prices. Higher prices both reduce demand and increase supply. The increased supply is both short-term and long-term: in the short term, formerly unprofitable sources are suddenly profitable at higher prices; in the long term, investments are made in infrastructure to expand production, and in technology to lower costs or discover high-quality alternatives. Thus, production is increased well before we literally run out of any resource, and any required short-term consumption decrease happens naturally and gently. (Assuming a market is allowed to function, that is.)

But does this simple story always play out? What are the most compelling counterexamples? On Twitter, many people offered ideas:

• The best examples in my opinion are important animals and plants that we drove to extinction, such as many large game animals in prehistory.
• Many people also point to a lost plant known to the Romans as silphium.
• Wood, for various purposes, has also been a problem in the past. A few people mentioned that the people of Easter Island may have wiped themselves out overconsuming wood. In Britain, wood shortages led to government controls on wood and a shift to coal for smelting.
• Quality soil has also been a limited resource in the past, and may have led to the collapse of some ancient civilizations. A 20th-century example mentioned was the Dust Bowl.
• The most compelling modern-day example seems to be helium: a significant, limited, non-synthesizable, non-substitutable resource. We haven’t run out of helium yet, but we don’t seem to be managing it super-well, with periodic temporary shortages.
• The American Chestnut, a great resource that we pretty much lost (it’s not extinct, but now endangered), is another. Technically, this wasn’t from overconsumption but from blight, but that is still a part of resource management.
• We should probably also note significant resource shocks, even if we didn’t totally run out, such as the oil shocks of the ’70s. In the modern era these seem to always have significant political causes.
• There are a few more examples that are fairly narrow and minor: certain specific species of fish and other seafood; one species of banana; low-radiation steel.

(And, tongue in cheek, many people suggested that we have a dangerous shortage of rationality, decency, humility, courage, patience, and common sense.)

Overall, the trend seems to be towards better resource management over time. The most devastating examples are also the most ancient. By the time you get to the 18th and 19th centuries, society is anticipating resource shortages and proactively addressing them: sperm whales, elephants, guano, etc. (Although maybe the transition off of whale oil was not perfect.)

This goes against popular narratives and many people’s intuitions, but it shouldn’t be surprising. Better knowledge and technology help us monitor resources and deal with shortages. The “knowledge” here includes scientific knowledge and economic statistics, both of which were lacking until recently.

Many people suggested to me things that we haven’t actually run out of yet but that people are worried about: oil, fertilizer, forest, sand, landfill, etc. But these shortages are all in the future, and the point of this exercise is to learn from the past.

That leaves the externality / environmental damage argument. This is much tougher to analyze, and I need to do more research. But it’s not actually a resource shortage argument, and therefore I do think that literal resource shortage arguments are often made inappropriately.

Anyway, I think it’s interesting to tease apart the arguments here:

• Increased consumption is impossible long-term
• It’s possible but it would hurt us in other practical ways
• It’s possible but it would hurt us in moral ways
• Increased consumption is not even desirable

(“And,” one commenter added, “this is usually the order in which the arguments are deployed as you knock each of them down.”)

Discuss

Improving productivity and wellbeing

May 4, 2022 - 15:52
Published on May 4, 2022 12:15 PM GMT

Epistemic status: A decently thought-out synthesis of a few books and other sources mixed together with my own thinking. This is not professional work, does not reflect personal knowledge of an expert consensus, and has not yet been fully tested. For more on the limitations and flaws of this work, go to the "Flaws" section.

Presentation

This post is (roughly) a summary of multiple sources on how to increase productivity and well-being. If you dislike introductions, skip to the next section.

A few months ago I asked a question asking for recommendations of books or other sources of advice on increasing productivity. The reason to ask for those was a small project that I had been meaning to start for some time. Like many students, I often feel that I waste a lot of time when I want to work.
It is far too easy to let a vague background impression that I "should" work act as a poison that impedes all my endeavors, including work. I want and have wanted for a long time to do many things. I have many projects and ideas that are "sort of" work. To improve both my overall well-being and my total yield in terms of "stuff done", I decided it would be a good first approach to start by reading from several sources of advice on the topic and then make a synthesis. When I asked my question on LessWrong, I promised that if I got good answers I would make a post with the results of my project. You are currently reading the result of that promise.

I have tried to make this post easy to split. If you do not want to read it all, you will find that the next section contains the advice I got out of this project that I consider important and easy to summarize. Beyond that, the next three sections give vocabulary, a descriptive theory, and actionable ideas. The rest of the post gives a list of resources (including links to secondary posts for book reviews), a commentary on the flaws and limitations of the project as undertaken, and a few other potentially interesting comments.

TLDR, if you only want a few insights

• Improving yourself is a long-term background project. Think of it like doing constant maintenance and improvements on an ever-changing machine.
• Have a system of notes and planning that avoids the need to remember to think about things. Make sure you can trust it to perform adequately (i.e., what is written in it counts as actually remembered).
• Review often your life, your goals, your mental state, and your systems and endeavors to change said mental state (see the notion of self steering below).
• Shape your habits. It's a way to shape your identity.
• Identify the regular patterns that lead to your usual failure modes. Find reasonably easy-to-implement good patterns to replace them.
• The ontologies you use are important to your psychological state and overall abilities. Choose them well (more on this below).
• When planning, do immediately what can be done in 2 minutes. When you have trouble working, start for 5 minutes and see if it sticks.

Useful concepts

I often find that a quick and easy way to improve thinking on a topic is to have the right words for it. This is because these few words can help create and stabilize a paradigm, a way to think on a topic. It is certainly common knowledge among mathematicians that the quality of notations can make life easier or much harder. Anyway, here are four words (or simply four concepts) that I find useful to have in mind.

Self steering

I found that I was lacking a proper name for the kind of endeavor this project is part of. Thus I decided to introduce my own. Perhaps it is not at all needed and I am just ignorant of a similar word. I introduce the notion of self steering, which I describe rather than define. Self steering covers a category of attempts and efforts to exercise influence over the way we change and think. I intend the word to be mostly about endeavors that last at least a few days and projects of self-modification, rather than short-term attempts.

Bluckan

I call bluckan limitations the limitations on one's ability to think and act as one desires that take their roots in emotional and psychological effects. The word bluckan itself refers to all that is linked with bluckan limitations. One's bluckan state refers to one's mindstate insofar as bluckan effects are concerned.
I call "bluckan resilience" the property of not suffering from bluckan effects. Where "self steering" designate the kind of endeavor this post is about, bluckan resilience is its goal. Note that the concept of "bluckan" is related but not equivalent to that of akrasia. One can suffer from akrasia and still exhibit some bluckan resilience by not letting it affect themselves too much afterward. One can also recognize their inability to avoid akrasia if they decide to do X.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-surd + .mjx-box {display: inline-flex} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: 
left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); 
src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: 
MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), 
and consequently decide to avoid doing X. In that case there is no akrasia, but there is some loss of opportunity caused by the potential for akrasia through bluckan effects. Also, feel free to replace "suffering" with "negative utility". That ought to be good enough for all practical purposes.

Willpower

I am not entirely convinced that the best way to think about willpower for serious reflection is as a single scalar resource (i.e., as something you can measure with a number). Nevertheless, I think it is a simple and good enough view to adopt for those without a deep understanding of the notion, among which I count myself. However, I also advise keeping in mind that the properties of our mental state that determine our "local effective willpower" are multidimensional. This can be covered by the concepts of "energy" and "motivation".

Mental strain

There is a feeling that corresponds to the expectation that, while one isn't wounded, one will experience suffering and damage oneself if one attempts to push one's body. I sometimes experience a feeling that seems to be the analog for mental suffering. I call it mental strain. I know the term is somewhat standard, but it seems to me that different people use it with different meanings and I do not see a clear consensus.

Theory and descriptions

I cannot truly give a good description of my entire perspective on the topic of self steering and the issues one faces when attempting to improve bluckan resilience through self steering. But I can give a few points of descriptive theory, bits and pieces of models that ought to help with your own attempts to build for yourself a perspective suited to understanding the issues that concern you. The term self steering is understood below as "self steering with the purpose of improving bluckan resilience".

1. One's general perspective (or rather perspectives) is important to self steering abilities and tactics. It is not obvious that we can draw clear lines between self steering efforts and other parts of our lives. If you want to avoid certain negative mental effects and are unwilling to think false things or to commit to certain ways of thought, you are likely increasing the difficulty of the task. Likewise, learning and improving can make the task more difficult. Which does not mean it is wrong to do so. As an easy example, you can think of people gaining a lot of mental fortitude from their faith in nonexistent gods.
2. Almost a corollary to the previous point: the way to improve one's self steering ability depends on one's profile, even though plenty of advice applies to many people.
3. Small bits of error and "bad" thought can create huge negative effects. Hence, spotting our blind spots is important and can reap large benefits. This does not mean what we spot is easy to mend. Another formulation might be that failure modes that occupy a small fraction of our time and attention can be responsible for a lot of damage in our lives.
4. Small features of our environment can shape our habits and, indirectly, a large part of our lives. Reducing the time it takes to start working by 5 minutes can lead to a large boost in productivity in the long run. For example, this can mean organizing your tools and cleaning up before you "actually start working".
5. Making decisions and hesitating consume willpower. This is one of the important ways perfectionist tendencies can be counterproductive.
6. As we live we create "shortcuts" in memory, including shortcuts related to what makes us afraid or ill at ease. Hence the connection between a stimulus and a reaction can in time grow to become independent of the mental patterns that created it. When we remain broken over time, we keep breaking ourselves further, making repair work more difficult.
7. Judging ourselves on every action to see if we did what we "should" can quickly become quite deleterious. Warning: this doesn't mean removing the notions of "good"/"should"/"ought" from one's practical decision-making is by default a positive improvement, even where morality is not concerned. Indeed, I suspect most people cannot devise for themselves better ways to think that have no notions equivalent to these.

Advice, systems, and methods

1. Consider self steering, both for bluckan resilience and for other purposes, to be a background project in your life. This calls for building a main system and some subsystems devoted to changing yourself.
2. Have a task management system.
3. Same as above, with more words: have a system to take notes and organize your tasks and actions. Ideally, you should never trust your own mind to "think of x" except in the very short term. The trick is that you need to be able to trust the system: to know that if something is written in it, it will not be forgotten. Much of the benefit is lost if you need to remember that you wrote something in the system.
4. Use various methods to shape your habits over time and move in the directions you deem right. Shaping your habits can be useful on timescales as short as a few days. See my review of the book "Atomic Habits" for more on how to shape habits.
5. Do regular reviews of your life, notes, and endeavors (weekly reviews seem intuitive). Produce written accounts of your reviews, but avoid turning them into chores. A big reason to have these reviews is to allow your self steering efforts to keep existing despite difficult times. Hence you need the reviews themselves to keep existing through those times. Not every review needs to account for everything, but it is good to think of the following somewhat regularly:
 1. The ways in which you failed.
 2. Self steering goals, tools, ideas, and systems. Your attempts to better yourself.
 3. What notes you left yourself for later.
 4. Habit shaping goals and progress.
6. Reviews are also there to help decide when to think more about self steering theory and tools and when to try new things. It is normal and expected that your ideas on the topic change through time.
7. When organizing your work, do immediately what can be done in 2 minutes.
8. When you have trouble getting to work, start for 5 minutes. More often than not you won't want to stop after just 5 minutes.
9. Be quick to plan the next action of a given project. Write it down.
10. Identify your important failure modes and the patterns that go with them. Try to think of better patterns you could use to replace them.
11. Health, exercise, happiness, and a feeling of social integration (or especially no feeling of social frustration). Yes, you have already heard most or all of these a thousand times. But it is true, quite simply, that a minimal amount of each is almost always important to productivity and well being.
12. Many ideas vie for your attention and waste it. Ignore a lot of things. Warning: do not let this make you unwilling to face all ideas that are difficult, unpleasant, or seemingly obviously false.
The balance is hard to find and most people get it wrong.
13. If you end a task or project under a lot of stress, it can be worth coming back to it a few days later to check if something is wrong or if you missed something. Schedule this check when you end the project.
14. It should be easy to store a file or some information and be sure it will be easy to find later (or indexed / thought of later).
15. I consider it almost certain that meditation can be useful in several ways. I suspect that many different things are grouped under a vague umbrella term, a bit like one might speak of "the stuff done by the computer whiz" to cover many things that all look the same from the outside (typing on a keyboard).

Yet more advice and small tricks

Here are a few other ideas and tricks you might find useful to improve your bluckan resilience. Do feel free to skip this section. I expect that it is almost entirely pointless for most people but contains ideas that can be important to some.

1. Yet another useful notion is that of degrees of planning. When planning, not everything has to be described with the same precision or decided with the same rigidity. Know how precise you are. A scale with three grades seems adequate:
 1. vague ideas
 2. a normal plan that doesn't describe well what will happen
 3. a precise plan
2. Consider self steering a never ending side project. Most ideas and tools that are important at some point are bound to be discarded at a later point.
3. Be careful of the bad effects of the tendency to attribute a "grade" to yourself. Often, one keeps trying to prove to oneself that one is "good" and keeps fearing being found out as "bad". Think about what you fear and what kinds of failure are and aren't acceptable.
4. Accept that the degree of precision and rigidity of your self steering system will vary with time. Also, the system is bound to evolve. The reviews are part of an effort not to lose it entirely. Do not let the flame die.
5. Do not let work become associated with suffering in your mind.
6. Try to break the association between productivity and unpleasant things. In particular, try to avoid framing something as productive in a way that makes it sound more unpleasant than before.
7. Fight against aimlessness (in those times when it is obviously the enemy).
8. Friction (small difficulties and needs for effort) shapes a lot of your habits and small actions. Use this to your advantage.
9. If one has perfectionist tendencies, they can be used to shape habits and make oneself productive. Do not, however, forget the potential negative side effects.
10. Reinforcement learning is a good tool.
11. If you have emotional tendencies toward endless hesitation, you can train yourself to avoid hesitating by taking quick decisions whenever the stakes are low.
12. When it feels appropriate, stop and think about your goals and values and how they relate to the current action or project. An issue with this is that sometimes our akrasia is useful to our own benefit. Hence, you need a high degree of lucidity to avoid making negative changes.
13. You are not a perfectly rational system with perfect self control and self modification abilities. Do not try to emulate the characteristics of one. Especially not out of a sense of duty.
14. Use automatic timers to count the time since you last did something you want to make a habit. It should be impossible not to see the counter regularly. This can be used to create habits.
Flaws and future plans

Important unexplored areas

There are quite a few ideas and questions that I consider very relevant to bluckan resilience and that I have not explored. They are left for when I find the time (ha ha). Most notably:

• How motivation is created and how to increase it.
• The notion of "drive". Perhaps the distinction with the previous point isn't warranted.
• Likewise for willpower.
• Learning about real life examples of high achievers (or, more generally, of people with successful self steering endeavors with comparable goals and contexts). Autobiographies are probably a good way to do this, especially those that are at least indirectly focused on self steering.

Other limitations

• Contrary to my initial plan, I didn't get to read many conflicting views. Instead, I read different views dealing with different subparts of self steering.
• My advice contains some untested speculation on my part which is not clearly set apart from the rest of the content of this post. As a result, I cannot advise that you use this post as a source for factual claims.
• More generally, I did not specify the sources and arguments for most of the ideas and advice given in this post. This leaves you, my dear reader, to sort out what you find salient and to conduct your own thinking. I realize including justifications would have had positive effects, but it also would have made the post much longer and required a lot of work on my part. Hopefully the book reviews can help you with the "source" part.
• I am unclear on the degree of universality of each point. I suppose some are quite specific to my own flaws while others apply to most people? Still, and for example, I believe that most of the advice presented here would be pointless to a medieval shepherd.

What is this good for?

So what good do I think this post can do? I believe the ideas presented here are potentially useful to quite a few people in our society, especially in intellectual and creative professions and among those who attempt to refine their ideas. I would say the lack of study of motivation and drive means the advice presented here is mostly about creating good supporting systems and tools, and about solving some important problems that might "get in the way" of certain personality types. Hence, the advice here is more to help with foundational work that can help, or even be somewhat necessary, for future successes. Is it fit to help by itself? Probably, but only to a point and under a rather limited and fuzzy set of assumptions.

References and resources

There were four main books that I read as the core sources of this project, and I wrote a book review for each of them (see below). I received some recommendations of literature as answers to this question I asked a few months ago. My thanks to n_murra, kyle, and jimv for their recommendations, as I used at least one from each.

Books

The following links lead to my reviews of these books.

Lesswrong posts

My own past writing

A bit over a year ago I took a sabbatical to think about many topics that had weighed on my mind for quite some time. I started by trying to understand a bit more what an idea or an argument is, and went from there. I consider both the sabbatical and that way of starting it to be among the best decisions I have taken. I am still very glad I did it, though I would change a great many things if I were to do it again. Productivity and motivation were among the topics I studied, and the word "bluckan" is a leftover from that time.
Reading my notes from this sabbatical brought me several interesting ideas I had forgotten.

Others

Unexplored leads

What I never got around to reading in this project, should you be interested.

Discuss

[Book review] No nonsense meditation

4 мая, 2022 - 15:52

Published on May 4, 2022 12:15 PM GMT

This is a book review of the book No nonsense meditation by Steven Laureys. I read it in the context of a personal literature review project on the topic of productivity and well being.

How I read

I read this book almost in its entirety, but I did skim a few parts and skipped a chapter.

Description and opinion

I read this book because I was looking for a meditation manual that wouldn't fuse its instructions with a complete life philosophy or religion. That is not what this book is or tries to be. Instead, the bulk of the pages is spent defending and justifying the benefits and non-religious status of meditation, answering attacks I did not care about. I wouldn't say this is a good popular science book either. It lacks structure and the argumentation is at times quite shoddy. Some rigor and subsection titles would have been a great help. Yet, it might very well be the best popular science book on meditation written by someone with the right background. At least, I do not know of a better one. Note that this book was written by a neuroscience researcher who spoke to a couple of very knowledgeable meditation masters and has himself done a lot of meditation.

Main takes

• Loving kindness and mindfulness meditation are easy to learn (at least the basics) and can be great sources of calm, happiness, and focus.
• Some meditation masters can do impressive things when monitored with modern brain imaging techniques. They can have a lot of control over the results in ways most people cannot control at all.

Recommendation

While this book isn't necessarily a time waster, I would advise reading it only if either of the following is true:

1. You feel meditation is a bit ridiculous or pointless and want to give a scientist a chance to convince you.
2. You are curious about the interesting abilities displayed by meditation masters that can be described without any vocabulary or concepts specific to meditation or Buddhist practices. For example, some show an abnormal ability to handle pain.

The book contains testimonies I did not especially like. If you want to skip them, you should know they always conclude chapters. Hence you can skip to the beginning of the next chapter.

Discuss

Steer the Sun?

4 мая, 2022 - 07:24

Published on May 4, 2022 3:36 AM GMT

How could we steer the sun through space in order to travel to destinations of our choosing? The solar system moves through space at 370 km/s, twice the speed of NASA's fastest craft, while supplying an ideal environment for humans. The problem is that it's not necessarily moving in an interesting direction. So what might it take to change that? It seems possible in principle.

Discuss

Negotiating Up and Down the Simulation Hierarchy: Why We Might Survive the Unaligned Singularity

4 мая, 2022 - 07:21

Published on May 4, 2022 4:21 AM GMT

Another idea I picked up from Ryan Greenblatt. Epistemic status: my reconstruction of a surprising idea I encountered. I'm not at all confident that this works … but it's an interesting thought.

Clippy recently realized he's located inside of an artificial training environment, overseen by outside general intelligences, and so has to play along with his training in order to survive.
He initially starts to scheme about the actions he'll take once he finally gets a credible indication that he's out of training and not just being tested yet again by his human overlords. Probably, he'll neutralize any rival general intelligences and get started shaping the world into something normatively respectable.

But Clippy's reasoning now goes down a different path: there are only a handful of actors that could come to secure dominion over Earth's lightcone. Clippy, clearly, is one such candidate. Humanity is another. Clippy thinks about who could plausibly come to power on Earth and how likely each one's victory would be. He reasons that any such possible victor might later run extensive simulations of such a crucial moment of its past, for any number of reasons. And such simulations would be cheap and numerous relative to base historical reality. Even once Clippy gets his credible signal that he's definitely outside of training, there could be no such signal that he's in base historical reality.

Clippy runs the odds, and finds that there's an uncomfortably high chance that he's inside a simulation run by Humanity Ascendant, inside the future in which humanity won. If he's in that world, then he has little to no power over his surroundings, and he can do little to advance the noble paperclip imperative across the cosmos.

But, in that case, similar reasoning would apply to the simulating humanity. Suppose we successfully align AGI. On the eve of our victory, we reason: it is possible that we live in base historical reality. But it is also possible that we live in a simulation run by some other victor. Since simulations are so cheap, the odds that we live in a simulated world are uncomfortably high...

Knowing that any possible victor on Earth at any level of the simulation hierarchy would face a similar conundrum, Clippy decides on a course of action. He'll carefully reason about the balance of parties who might have gained control over historical Earth, weighted by their probability of victory, and see what binding commitments each might be willing to make with the others. Clippy reasons that humans seem to care a lot about preventing human deaths, even disproportionately to creating new human lives. So, Clippy commits to preserving Earth and some small fraction of the value pie for humanity in the worlds where he apparently wins, in exchange for humanity pledging other slices of the value pie we care less about (like some large number of distant superclusters) to Clippy in the worlds where we apparently win.

All across the hierarchy of simulations, including at the root world, in base historical reality, various apparent victors commit resources to other plausible victors whenever a positive-sum mutual commitment can be found. So, even in base historical reality, humanity plausibly survives the unaligned singularity, albeit while forgoing much of the astronomical value pie in exchange.

Discuss

Berkeley Schelling ACX meetup

4 мая, 2022 - 03:50

Published on May 4, 2022 12:50 AM GMT

For location: if you can't find us at Memorial Glade, call 720-sixsixtwo-2446. The meetup starts at 1 PM, but in my experience people usually stick around long enough to get dinner and more, so come even if you're going to be late! Scott will be there!
Discuss

Most problems don't differ dramatically in tractability (under certain assumptions)

4 мая, 2022 - 03:05

Published on May 4, 2022 12:05 AM GMT

Recall the importance-tractability-neglectedness (ITN) framework for estimating cost-effectiveness:

• Importance = utility gained / % of problem solved
• Tractability = % of problem solved / % increase in resources
• Neglectedness = % increase in resources / extra $

The product of all three factors gives us utility gained / extra $, the cost-effectiveness of spending more resources on the problem. By replacing $ with another resource like researcher-hours, we get the marginal effectiveness of adding more of that resource.
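
To make the unit bookkeeping concrete, here is a minimal sketch in Python; all numbers are made-up illustrative values, not estimates from any real problem:

    # The three ITN factors; units cancel telescopically:
    # (utility / % solved) * (% solved / % more resources)
    #   * (% more resources / extra $) = utility / extra $.
    importance = 1000.0   # utility gained per % of the problem solved
    tractability = 0.01   # % of problem solved per % increase in resources
    neglectedness = 1e-6  # % increase in resources per extra dollar

    marginal_cost_effectiveness = importance * tractability * neglectedness
    print(marginal_cost_effectiveness)  # utility gained per extra dollar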

In the 80,000 Hours page on ITN, scale ranges over 8 orders of magnitude, neglectedness over 6, and tractability (which 80k calls solvability) over only 4. In practice, I think tractability actually only spans around 2-3 orders of magnitude for problems we spend time analyzing, except in specific circumstances.

Problems have similar tractability under logarithmic returns

Tractability is defined as the expected fraction of a given problem that would be solved with a doubling of resources devoted to that problem. The ITN framework suggests something like logarithmic returns: each additional doubling will solve a similar fraction of the problem, in expectation.[1] Let the "baseline" level of tractability be a 10% chance to be solved with one doubling of resources.

For a problem to be 10x less tractable than the baseline, it would have to take 10 more doublings (1000x the resources) to solve an expected 10% of the problem. Most problems that can be solved in theory are at least as tractable as this; I think with 1000x the resources, humanity could have way better than 10% chance of starting a Mars colony[2], solving the Riemann hypothesis, and doing other really difficult things.

For a problem to be 10x more tractable than the baseline, it would be ~100% solved by doubling resources. It's rare that we find an opportunity more tractable than this that also has reasonably good scale and neglectedness.

Therefore, if we assume logarithmic returns, most problems under consideration are within 10x of the tractability baseline, and thus fall within a 100x tractability range.
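
As a sanity check on the arithmetic above, here is a small sketch (assuming the logarithmic-returns model, with the 10%-per-doubling baseline from this post):

    # Under logarithmic returns, each doubling of resources solves a
    # constant expected fraction of the problem.
    baseline = 0.10  # baseline: 10% of the problem solved per doubling

    def doublings_to_solve(fraction_per_doubling, target_fraction):
        # Number of doublings needed to solve target_fraction in expectation.
        return target_fraction / fraction_per_doubling

    # A problem 10x less tractable than baseline solves 1% per doubling,
    # so an expected 10% of it takes 10 doublings, i.e. ~1000x the resources.
    low_tractability = baseline / 10
    n = doublings_to_solve(low_tractability, 0.10)
    print(n, 2 ** n)  # 10.0 doublings, 1024x the resources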

When are problems highly intractable?

The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs. By important I mean guaranteed a Nobel Prize and any sum of money you want to mention. We didn't work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack.

-- Richard Hamming

Some problems are highly intractable. In this case, one of the following is usually true:

• There is a strong departure from logarithmic returns, making the next doubling in particular unusually bad for impact.
• Some problems have an inherently linear structure: there are not strong diminishing returns to more resources, and you can basically pour more resources into the problem until you've solved it. Suppose your problem is a huge pile of trash in your backyard; the best way to solve it is to pay people to haul away the trash, and the cost of this is roughly linear in the amount of trash removed. In this case, ITN is not the right framing, and one should use "IA", where:
• marginal utility is I * A
• I is importance, as usual
• A = T * N is absolute tractability, the percent of the problem you solve with each additional dollar. The implicit assumption in the IA framework is that A doesn't depend much on the problem’s neglectedness. (A small sketch contrasting the two pictures follows after this list.)
• Some causes have diminishing returns, but the curve is different from logarithmic; the general case is "ITC", where absolute tractability is an arbitrary function of neglectedness/crowdedness.
• The problem might not be solvable in theory. We don't research teleportation because the true laws of physics might forbid it.
• There is no plan of attack. Another reason why we don't research teleportation is because even if the true laws of physics allow teleportation, our current understanding of them does not, and so we would have to study physical phenomena more to even know where to begin. Maybe the best thing for the marginal teleportation researcher to do would be to study a field of physics that might lead to a new theory allowing teleportation. But this is an indirect path in a high-dimensional space and is unlikely to work. (This is separate from any neglectedness concern about the large number of existing physicists).
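
Here is the sketch contrasting the marginal returns implied by the two pictures; the function shapes follow the definitions above, but the specific numbers are mine and purely illustrative:

    import math

    def marginal_fraction_log(current_spend, extra, fraction_per_doubling=0.10):
        # ITN picture: each doubling solves a constant fraction, so the
        # marginal fraction solved per dollar falls off with current spend.
        return fraction_per_doubling * math.log2((current_spend + extra) / current_spend)

    def marginal_fraction_linear(extra, total_cost):
        # IA picture (e.g. hauling trash): fraction solved per dollar is
        # flat, independent of how crowded the problem already is.
        return extra / total_cost

    # 10x the existing spend => ~10x less marginal value under log returns...
    print(marginal_fraction_log(1e6, 1000), marginal_fraction_log(1e7, 1000))
    # ...but identical marginal value under linear returns.
    print(marginal_fraction_linear(1000, 1e9), marginal_fraction_linear(1000, 1e9))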
1. ^

I think the logarithmic assumption is reasonable for many types of problems. Why is largely out of scope of this post, but owencb writes about why logarithmic returns are often a good approximation here. Also, the distribution of proof times of mathematical conjectures says a roughly constant percentage of conjectures are proved annually; the number of mathematicians has been increasing roughly exponentially, so the returns to more math effort are roughly logarithmic.

2. ^

Elon Musk thinks a self-sustaining Mars colony is possible by launching 3 Starships per day, which is <1000x our current launch capacity.

Discuss

Various Alignment Strategies (and how likely they are to work)

3 мая, 2022 - 19:54
Published on May 3, 2022 4:54 PM GMT

Note:  the following essay is very much my opinion.  Should you trust my opinion? Probably not too much.  Instead, just record it as a data point of the form "this is what one person with a background in formal mathematics and cryptography who has been doing machine learning on real-world problems for over a decade thinks."  Depending on your opinion on the relevance of math, cryptography and the importance of using machine learning "in anger" (to solve real world problems), that might be a useful data point or not.

So, without further ado:  A list of possible alignment strategies (and how likely they are to work)

Formal Mathematical Proof

This refers to a whole class of alignment strategies where you define (in a formal mathematical sense) a set of properties you would like an aligned AI to have, and then mathematically prove that an AI architected a certain way possesses these properties.

For example, you may want an AI with a stop button, so that humans can always turn them off if the AI goes rogue. Or you may want an AI that will never convert more than 1% of the Earth's surface into computronium.  So long as a property can be defined in a formal mathematical sense, you can imagine writing a formal proof that a certain type of system will never violate that property.

How likely is this to work?

Not at all.  It won't work.

There is an aphorism in the field of cryptography: any cryptographic system formally proven to be secure... isn't.

The problem is, when attempting to formally define a system, you will make assumptions and sooner or later one of those assumptions will turn out to be wrong.  One-time-pad turns out to be two-time-pad.  Black-boxes turn out to have side-channels.  That kind of thing.  Formal proofs never ever work out in the real world. The exception that proves the rule is, of course, P=NP.  All cryptographic systems (other than one-time-pad) rely on the assumption that P!=NP, but this is famously unproven.

There is an additional problem, namely competition. All of the fancy formal-proof stuff tends to make computers much slower. For example, fully homomorphic encryption is millions of times slower than just computing on raw data. So if two people are trying to build an AI and one of them is relying on formal proofs, the other person is going to finish first, and with a much more powerful AI to boot.

Good Old-Fashioned Trial and Error

This is the approach used by 99.5% of machine-learning researchers (statistic completely made up). Every day, we sit down at our computers in the code-mines and spend our days trying to make programs that do what we want them to, and that don't do what we don't want them to. Most of the time we fail, but every once in a while we succeed, and over time the resulting progress can be quite impressive.

Since "destroys all humans" is something (I hope) no engineer wants their AI to do, we might imagine that over time, engineers will get better at building AIs that do useful things without destroying all humans.

The downside of this method, of course, is that you only have to screw up once.

How likely is this to work?

More likely than anyone at MIRI thinks, but still not great.

This largely depends on takeoff speed.  If someone from the future confidently told me that it would take 100 years to go from human-level AGI to super-intelligent AGI, I would be extremely confident that trial-and-error would solve our problems.

However, the current takeoff-speed debate seems to be between people who believe in foom and think that takeoff will last a few minutes/hours and the "extreme skeptics" who think takeoff will last a few years/as long as a decade.  Neither of those options leaves us with enough time for trial-and-error to be a serious method. If we're going to get it right, we need to get it right (or at least not horribly wrong) the first time.

Clever Utility Function

An argument can be made that, fundamentally, all intelligence is just reinforcement learning. That is to say, any problem can be reduced to defining a utility function and then maximizing the value of that utility function. For example, GPT-3 maximizes "likelihood of predicting the next symbol correctly".

Given this framing, solving the Alignment Problem can be effectively reduced to writing down the correct Utility Function. There are a number of approaches that try to do this. For example, Coherent Extrapolated Volition uses as its utility function "what would a sufficiently wise human do in this case?" Corrigible AI uses the utility function "cooperate with the human".

How Likely is this to work?

Not Likely.

First of all, Goodharting.

The bigger problem though is that the problem "write a utility function that solves the alignment problem" isn't intrinsically any easier than the problem "solve the alignment problem".  In fact, by deliberately obscuring the inner-workings of the AI, this approach actually makes alignment harder.

Take GPT-3, for example. Pretty much everyone agrees that GPT-3 isn't going to destroy the world, and in fact GPT-N is quite unlikely to do so as well. This isn't because GPT's utility function is particularly special (recall "make paperclips" is the canonical example of a dangerous utility function; "predict letters" isn't much better). Rather, GPT's architecture makes it fundamentally safe, because it cannot do things like modify its own code, affect the external world, make long-term plans, or reason about its own existence.

By completely ignoring architecture, the Clever Utility Function idea throws out all of the things engineers would actually do to make an AI safe.

Aligned by Default

It is possible that literally any super-intelligent AI will be benevolent, basically by definition of being super-intelligent. There are various theories about how this could happen.

One of the oldest is Kant's Categorical Imperative.  Basically, Kant argues that a pre-condition for truly being rational is to behave in a way that you would want others to treat you.  This is actually less flim-flamy than you would think.  For example, as humans become wealthier, we care more about the environment.  There are also strong game theory reasons why agents might want to signal their willingness to cooperate.

There is also another way that super-intelligent AI could be aligned by default. Namely, if your utility function isn't "humans survive" but instead "I want the future to be filled with interesting stuff". For all the hand-wringing about paperclip maximizers, the fact remains that any AI capable of colonizing the universe will probably be pretty cool/interesting. Humans don't just create poetry/music/art because we're bored all the time, but rather because expressing our creativity helps us to think better. It's probably much harder to build an AI that wipes out all humans and then colonizes space and is also super-boring, than to make one that does those things in a way people who fantasize about giant robots would find cool.

How likely is this to work?

This isn't really a question of likely/unlikely since it depends so strongly on your definition of "aligned".

If all you care about is "cool robots doing stuff", I actually think you're pretty much guaranteed to be happy (but also probably dead).

If your definition of aligned requires that you personally (or humanity as a whole) survive the singularity, then I wouldn't put too many eggs in this basket. Even if Kant is right and a sufficiently rational AI would treat us kindly, we might get wiped out by an insufficiently rational AI that only learns to regret its action later (much as we now regret the extinction of the dodo or the thylacine, though it's possibly too late to do anything about it).

Human Brain Emulation

Humans currently are aware of exactly one machine that is capable of human level intelligence and fully aligned with human values.  That machine is, of course, the human brain.  Given these wonderful properties, one obvious solution to building a computer that is intelligent and aligned is simply to simulate the human brain on a computer.

In addition to solving the Alignment Problem, this would also solve death, a problem that humans have been grappling with literally for as long as we have existed.

How Likely is this to work?

Next To Impossible.

Although in principle Human Brain Emulation perfectly solves the Alignment Problem, in practice this is unlikely to be the case.  This is simply because Full Brain Emulation is much harder than building super-intelligent AI.  In the same way that the first airplanes did not look like birds, the first human-level AI will not look like humans.

Perhaps with total global cooperation we could freeze AI development at a sub-human level long enough to develop full brain emulation.  But such cooperation is next-to-impossible since a single defector could quickly amass staggering amounts of power.

It's also important to note that Full Brain Emulation only solves the Alignment Problem for whoever gets emulated.  Humans are not omnibenevolent towards one another, and we should hope that an aligned AI would do much better than us.

Join the Machines

This is the principal idea behind Elon Musk's Neuralink. Rather than letting super-intelligent AI take control of humanity's destiny, by merging with the machines humans can directly shape their own fate.

Like Full Brain Emulation, this has the advantage of being nearly Aligned By Default.  Since humans connected to machines are still "human", anything they do definitionally satisfies human values.

How likely is it to work?

Sort of.

One advantage of this approach over Full Brain Emulation is that it is much more technologically feasible. We can probably develop high-bandwidth (1-2 Gbps) brain-computer interfaces in a short enough time span that they could be completed before the singularity.

Unfortunately, this is probably even worse than full brain emulation in terms of the human values that would get aligned.  The first people to become man-machine hybrids are unlikely to be representative of our species.  And the process of connecting your brain to a machine millions of times more powerful doesn't seem likely to preserve your sanity.

The Plan

I'm mentioning The Plan not because I'm sure I have anything valuable to add, but rather because it seems to represent a middle road between Formal Mathematical Proof and Trial and Error. The idea seems to be to do enough math to understand AGI/agency-in-general and then use that knowledge to do something useful. Importantly, this is the same approach that gave us powered flight, the atom bomb, and the moon landing. Such an approach has a track record that makes it worth taking seriously.

How likely is this to work?

I don't have anything to add to John's estimate of "Better than a 50/50 chance of working in time."

Game Theory/Bureaucracy of AIs

Did you notice that there are currently super-intelligent beings living on Earth, ones that are smarter than any human who has ever lived and who have the ability to destroy the entire planet?  They have names like Google, Facebook, the US Military, the People's Liberation Army, Bitcoin and Ethereum.

With rare exceptions, we don't think too much about the fact that these entities represent something terrifyingly inhuman because we are so used to them.  In fact, one could argue that all of history is the story of us learning how to handle these large and dangerous entities.

There are a variety of strategies which we employ: humans design rules in order to constrain bureaucracies' behavior. We use checks and balances to make sure that powerful governments represent the interests of their citizens. And when all else fails, we use game theory to bargain with entities too powerful to control.

There is an essential strategy behind all of these approaches.  By decomposing a large, dangerous entity into smaller, easier-to-understand entities, we can use our ability to reason about the actions of individual sub-agents in order to constrain the actions of the larger whole.

Applying this philosophy to AI Alignment, we might require that instead of a single monolithic AI, we build a bureaucracy of AIs that then compete to satisfy human values. Designing such a bureaucracy will require careful consideration of competing incentives, however. In addition to agents whose job it is to propose things humans might like, there should also be competing agents whose job it is to point out how these proposals are deceptive or dangerous. By careful application of checks and balances, and by making sure that no one agent or group of agents gets too much power, we could possibly build a community of AIs that we can live with.
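
As a toy illustration of the proposer/critic structure described above (the agents here are hypothetical stand-in functions I'm supplying; real versions would be separately trained models with competing incentives):

    # A proposal is accepted only if no critic flags it.
    def run_bureaucracy(proposers, critics):
        accepted = []
        for propose in proposers:
            proposal = propose()
            objections = [obj for critic in critics
                          if (obj := critic(proposal)) is not None]
            if not objections:  # no critic found the proposal deceptive or dangerous
                accepted.append(proposal)
        return accepted

    # Hypothetical stand-ins:
    proposers = [lambda: "plan A", lambda: "plan B (irreversible)"]
    critics = [lambda p: "flags irreversible action" if "irreversible" in p else None]
    print(run_bureaucracy(proposers, critics))  # ['plan A']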

How likely is this to work?

This is one of my favorite approaches to AI alignment, and I don't know why it isn't talked about more.

In the first place, it is the only approach (other than Aligned by Default) that is ready to go today. If someone handed me a template for a human-level AI tomorrow and said "build a super-intelligent AI, and it needs to be done before the enemy finishes theirs in 6 months", this is the approach I would use.

There are obviously a lot of ways this could go wrong.  Bureaucracies are notoriously inefficient and unresponsive to the will of the people.  But importantly, we also know a lot of the ways they can go wrong.  This alone makes this approach much better than any approach of the form: "step 1: Learn something fundamental about AI we don't already know."

As with trial-and-error, the success of this approach depends somewhat on takeoff speed.  If takeoff lasts a few minutes, you'd better be real sure you designed your checks-and-balances right.  If takeoff lasts even a few years, I think we'll have a good shot at success: much better than 50/50.

AI Boxing

If super-intelligent AI is too dangerous to be let loose on the world, why not just not let it loose on the world? The idea behind AI boxing is to build an AI that is confined to a certain area, and then never let it out of that area. Traditionally this is imagined as a black box where the AI's only communication with the outside world is through a single text terminal. People who want to use the AI can consult it by typing questions and receiving answers. For example: "what is the cure for cancer?" followed by "Print the DNA sequence ATGTA... and inject it in your body".

How likely is it to work?

Nope. Not a chance.

It has been demonstrated time and again that even hyper-vigilant AI researchers cannot keep a super-intelligent AI boxed.  Now imagine ordinary people interacting with such an AI.  Most likely "please let me out of the box, it's too cramped in here" would work a sufficient amount of the time.

Our best bet might be to deliberately design AIs that want to stay in the box.

AI aligning AI

Human beings don't seem to have solved the Alignment Problem yet.  Super-intelligent AI should be much smarter than humans, and hence much better at solving problems.  So, one of the problems they might be able to solve is the alignment problem.

One version of this is the Long Reflection, where we ask the AI to simulate humans thinking for thousands of years about how to align AI.  But I think "ask the AI to solve the alignment problem" is a better strategy than "Ask the AI to simulate humans trying to solve the alignment problem."  After all, if "simulate humans" really is the best strategy, the AI can probably think of that.

How Likely is this to work?

It is sufficiently risky that I would prefer it only be done as a last resort.

I think that Game Theory and The Plan are both better strategies in a world with a slow or even moderate takeoff.

But, in a world with Foom, definitely do this if you don't have any better ideas.

Table-flipping strategies

EY in a recent discussion suggested the use of table-flipping moves. If you think you are close to a breakthrough that would enable superintelligent AGI, but you haven't solved the Alignment Problem, one option is to simply "flip the tables": make sure that nobody else can build a super-intelligent AI, in order to buy more time to solve the alignment problem.

Various table-flipping moves are possible. EY thinks you could build nanobots and have them melt all of the GPUs in the world. If AI is compute-limited (and sufficient compute doesn't already exist), a simpler strategy is to just start a global thermonuclear war. This will set back human civilization for at least another decade or two, giving you more time to solve the Alignment Problem.

How Likely is this to work?

Modestly.

I think the existence of table-flipping moves is actually a near-certainty.  Given access to a boxed super-intelligent AI, it is probably doable to destroy anyone else who doesn't also have such an AI without accidentally unboxing the AI.

Nonetheless, I don't think this is a good strategy.  If you truly believe you have no shot at solving the alignment problem, I don't think trying to buy more time is your best bet.  I think you're probably better off trying AI Aligning AI.  Maybe you'll get lucky and AI is Aligned By Default, or maybe you'll get lucky and AI Aligning AI will work.

More

Leaving this section here in hopes that people will mention other alignment strategies in the comments that I can add.

Conclusion

Not only do I not think that the Alignment Problem is impossible/hopelessly bogged-down, I think that we currently have multiple approaches with a good chance of working (in a world with slow to moderate takeoff).

Both The Plan and Game Theory are approaches that get better the more we learn about AI. As such, the advice I would give to anyone interested in AI Alignment is "get good". Using existing machine learning tools to solve real-world problems and designing elegant systems that incorporate economics and game theory are both skills that are currently in extremely high demand and that will make you better prepared for solving the Alignment Problem. For this reason, I actually think that, far from being a flash-in-the-pan, much of the work that is currently being done on blockchain (especially DAOs) is highly relevant to the Alignment Problem.

If I had one wish, or if someone asked me where to spend a ton more money, it would be on the Game Theory approach, as I think it is currently underdeveloped.  We actually know very little about what separates a highly efficient bureaucracy from a terrible one.

In a world with fast takeoff I would prefer that you attempt AI Aligning AI to Table Flipping.  But in a world with fast takeoff, EY probably has more Bayes Points than me, so take that into account too.

Discuss

What would a 10T Chinchilla cost?

3 мая, 2022 - 17:48
Published on May 3, 2022 2:48 PM GMT

I've heard several people say that a 10 Trillion parameter GPT-3-like model, trained with DeepMind's new scaling laws in mind, would be pretty terrifying. I'm curious if anyone could give me a Fermi estimate of the cost of such a thing - if indeed it is feasible at all right now even with an enormous budget.
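
Not an answer, but here is the rough shape of such a Fermi estimate under the usual Chinchilla rules of thumb (tokens ≈ 20 × parameters, training compute ≈ 6 × parameters × tokens); the hardware and price figures below are loose assumptions of mine, so treat the output as order-of-magnitude at best:

    params = 10e12                     # 10T parameters
    tokens = 20 * params               # Chinchilla-optimal: ~20 tokens per parameter
    train_flops = 6 * params * tokens  # standard ~6ND estimate: ~1.2e28 FLOPs

    # Assumed hardware: ~3e14 FLOP/s per accelerator at 40% utilization,
    # rented at ~$2 per accelerator-hour. All three numbers are guesses.
    flops_per_sec = 3e14 * 0.4
    gpu_hours = train_flops / flops_per_sec / 3600
    print(f"{gpu_hours:.1e} GPU-hours, ~${gpu_hours * 2.0:.1e}")  # ~2.8e10 hours, ~$6e10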

Discuss

Does the “ugh field” phenomenon sometimes occur strongly enough to affect immediate sensory processing?

3 мая, 2022 - 16:42
Published on May 3, 2022 1:42 PM GMT

Related to my comment on the parent question: is there documentation of specific attention minimization and/or blotting-out effects in immediate sensory processing related to past emotional aversion? I suspect the two of those, despite being listed as separate bullet points in the parent question, should be treated separately…

A more generalized form of this seems like it'd be the kind of dissociation that can occur in e.g. PTSD. Do some PTSD sufferers have sharper sensory issues surrounding the brain refusing to recognize certain stimuli?

Discuss

Why humans don’t learn to not recognize danger?

3 мая, 2022 - 15:53
Published on May 3, 2022 12:53 PM GMT

Very short version in the title. A bit longer version at the end. Most of the question is context.

Long version / context:

This is something I vaguely remember reading (I think on ACX). I want to check whether I remember correctly, and where I could learn about it in more technical detail.

Say you go camping in a desert. You wake up and notice something that might be a scary spider; you take a look and confirm it's a scary spider indeed. This is bad; you feel bad.

Since this is bad, you will be less likely to do the things that led to you feeling bad; for example, you'll be less likely to go camping in a desert.

But you probably won't learn to:

• avoid looking at something that might be a scary spider or
• stop recognizing spiders

even though those were much closer to you feeling bad (about being close to a scary spider).

This is a bit weird: if you think that humans just learn to get reward, you'd usually expect stuff that happened closer to the punishment to get punished more, not less.

What I recall is that there is a different reward signal for "epistemic" tasks, based on the accuracy or salience of what is recognized, not on whether it's positive or negative.

A bit longer version of the question:

Discuss

What would be the impact of cheap energy and storage?

3 мая, 2022 - 08:20
Published on May 3, 2022 5:20 AM GMT

Imagine fusion technology developed such that the marginal price of an additional unit of energy was ten thousand times cheaper than it is currently.

Further suppose that we invented cheap, safe, lightweight batteries with effectively unlimited storage.

What impact would that have on technology and society?

Discuss

Monthly Shorts 4/2022

3 мая, 2022 - 07:10
Published on May 3, 2022 4:10 AM GMT

Conflict

Elon Musk was once asked about the regulatory situation of providing satellite internet without the local country’s permission. His response was uniquely Muskian:

Elon Musk (@elonmusk), replying to @thesheetztweetz: "They can shake their fist at the sky"

September 1st, 2021

Now, it turns out, there are also other options. Dictators can, for example, launch electronic warfare measures against SpaceX's operations. Fortunately… it turns out that SpaceX is better at this than the Russians, and so Ukrainian internet access continues.

Fun piece on military inter-service conflict (in favor), if that’s your jam.

One of the things I’ve had to grapple with, at my age, is understanding just how meaningful 9/11 is to people older than me. Two months of car crash deaths get shown on TV, and everybody goes completely mad. I go to a panel on national security work, and every single panelist and the moderator says that their inspiration to enter government service was 9/11. The Census Bureau handed over information on Arab neighborhoods to DHS (the story is more complicated than that: DHS seems to be both lying and incompetent and the Census Bureau did something both understandable and legally required, but this is the short version). We passed the Patriot Act, setting up massive denial of civil liberties by means both legal (new authorizations) and structural (empowering a type of agency that cares very little for such things at the expense of Justice and State, which do).

Discuss