LessWrong.com

A community blog devoted to refining the art of rationality

Meta Programming GPT: A route to Superintelligence?

Published on July 11, 2020 2:51 PM GMT

Imagine typing the following meta-question into GPT-4, a revolutionary new 20-trillion-parameter language model released in 2021:

"I asked the superintelligence how to cure cancer. The superintelligence responded __"

How likely are we to get an actual cure for cancer, complete with manufacturing blueprints? Or will we get yet another "nice sounding, vague suggestion" like "by combining genetic engineering and fungi-based medicine" - the sort GPT-2/3 is likely to suggest?

The response depends on which of the following GPT does:

1. Report what GPT thinks humans think the superintelligence would say; or

2. Use basic reasoning to work out what the character (an actual superintelligence) would say if this scenario were playing out in real life.

If GPT takes the second approach, by imitating the idealised superintelligence, it would in essence have to act superintelligent.

The difference between the two lies on a fine semantic line: whether GPT treats the conversation as a human imitating a superintelligence, or as the actual words of a superintelligence. Arguably, since it only has training samples of the former, it will do the former. Yet that's not what it did with numbers - it learnt the underlying principle, and extrapolated to tasks it had never seen.

If #1 is true, that still implies that GPT-3/4 could be very useful as an AXI: we just need it to imitate a really smart human. More below under "Human Augmentation".

Human-Like Learning?

Human intelligence ['ability to achieve goals'] can be modelled purely as an optimisation process towards imitating an agent that achieves those goals. In so far as these goals can be expressed in language, GPT exhibits a similar capacity to "imagine up an agent" that is likely to fulfil a particular goal. Ergo, GPT exhibits primitive intelligence, of the same kind as human intelligence.

More specifically, I'm trying to clarify that there is a spectrum between imitation and meta-imitation; and bigger GPT models are getting progressively better at meta-imitation.

  • Meta-Imitation is the imitation of the underlying type of thinking represented by a class of real or fictional actors: e.g., mathematics.
  • Imitation is direct (perfect/imperfect) copying of an observed behaviour: e.g., recalling the atomic number of Uranium.

Language allows humans to imagine ideas that they then imitate- it gives us an ability to imitate the abstract.

Suppose you were a general in ancient Athens, and the problem of house lamps occasionally spilling and setting neighbourhoods aflame was brought to you. "We should build a fire-fighting squad," you pronounce. The words "fire fighting squad" may never have been used in history before that (as a sufficient density of human population requiring such measures didn't occur earlier) - yet the meaning would be, to a great degree, plain to onlookers. The fire-fighting squad thus formed can go about their duties without much further instruction, by making decisions based on substituting the question "what do I do?" with "what would a hypothetical idealised firefighter do?".

With a simple use of language, we're able to get people to optimize for brand new tasks. Could this same sort of reasoning be used with GPT? Evidence of word substitution would suggest so.

So in one line, is Meta-Imitation = Intelligence ? And will GPT ever be capable of human-level meta-imitation?

Larger GPT models appear to show an increase in meta-imitation over literal imitation. For example, if you ask GPT-2:

"what is 17+244?"

It replies "11".

This is closer to literal imitation - it knows numbers come after a question containing other numbers and an operator ("+"). Incidentally, young children seem to acquire language in a somewhat similar fashion:

They begin by imitating utterances (a baby might initially describe many things as "baba"); their utterances then grow increasingly sensitive to nuances of context over time: "doggy" < "Labrador" < "Tommy's Labrador named Kappy". I'm arguing that GPT shows a similar increase in contextual sensitivity as the model size grows, implying increasing meta-imitation.

Human Augmentation

My definition of AXI relies on a Turing test in which a foremost expert in a field converses with either another expert or an AI. If the expert finds the conversation highly informative and indistinguishable from one with the human expert, we've created useful AXI.

GPT-2 and GPT-3 appear to show progression towards such intelligence - GPT-written research papers providing interesting ideas being one example. Thus, even if GPT-4 isn't superintelligent, I feel it is highly likely to qualify as AXI [especially when trained on research from the relevant field]. And while it may not be able to answer the question on cancer, maybe it will respond to subtler prompts that induce it to imitate a human expert who has solved the problem. So the following might be how a human announces finding the cure for cancer, and GPT-4's completion might yield interesting results:

"Our team has performed in-vivo experiments where we were able to target and destroy cancerous cells, while leaving healthy ones untouched. We achieved this by targeting certain inactivated genes through a lentivirus-delivered Cas9–sgRNA system. The pooled lentiviruses target several genes, including "

[Epistemic status: weak - I'm not a geneticist and this is likely not the best prompt - but this implies that it would require human experts working in unison with AXI to coax it to give meaningful answers.]

Failure Modes

GPT has some interesting failure modes very distinct from a human's - going into repetitive loops, for one, and, with GPT-3 in particular, an increasing tendency to reproduce texts verbatim. Maybe we'll find that GPT-4 is just a really good memoriser, and lacks abstract thinking and creativity. Or maybe it falls into even more loops than GPT-3. It is hard to say.

To me, the main argument against GPT-4 acquiring superintelligence is simply its reward function: it is trained to copy humans, so perhaps it will not be able to do things humans can't (since there is no point optimising for that). However, this is a fairly weak position, because, to be precise, GPT attempts to imitate anything, real or hypothetical, in an attempt to get at the right next word. The examples of maths, and invented words, show that GPT appears to be learning the processes behind the words, and extrapolating them to unseen scenarios.

Finally, the word "superintelligence" is likely to carry a lot of baggage from its usage in sci-fi and other articles by humans. Perhaps, to remove any human-linked baggage, we could instead define specific scenarios, to focus the AI on imitating the new concept rather than recalling previous human usage. For example:

"RQRST is a robot capable of devising scientific theories that accurately predict reality. When asked to devise a theory on Dark Energy, RQRST responds,"


"Robert Riley was the finest geneticist of the 21st century. His work on genetic screening of embryos relied on "


"Apple has invented a new battery to replace lithium Ion, that lasts 20x as long. It relies on"

I'd love to see GPT-3 complete expert-sounding claims of as-yet unachieved scientific breakthroughs. I'm sure it can already give researchers working in the domain interesting answers; especially once fine-tuning with relevant work is possible.


Hierarchy of Evidence

Published on July 11, 2020 12:54 PM GMT

There have been many hierarchies of evidence made for various fields of science. I was looking for an image of a more general hierarchy that could easily be dropped into any online conversation to quickly improve the debate. I found none that had all the features I was looking for. So I took an old hierarchy, expanded it, made it more aesthetically pleasing, and turned it into a jpeg, pdf and pages file so people can easily share and modify it (e.g. translate it or convert it to different formats). Here it is:


Maximal Ventilation

Published on July 11, 2020 1:20 AM GMT

At this point, one of the things I'm most concerned about with covid-19 is breathing air that has been recently breathed by other people. Six foot distancing would be pretty good if it quickly settled out of the air, but it's looking more like it can stay suspended in the air for extended periods.

The ideal is outdoors. Even on a relatively calm day, the amount of air flow outdoors is enormous compared to what you might get inside. A gentle 1mph breeze clears a 90-ft diameter circle in a minute. An aggressive level of ventilation indoors is about a quarter of that, and most places get far less. So the more we can move outside, the better: I think it's foolish that we have restaurants open for indoor dining when outdoor dining gets you most of the same benefit with much less shared air. Especially when considering that eating means people will have their masks off while talking. Similarly, while people definitely need to be able to buy food, moving grocery stores to a model of offering only outdoor pickup would be safer for both the customers and the employees than letting the public in.

The previous two summers my kids did an outdoor summer program in a park, where the only indoor component was bathrooms. It didn't run on rainy days, which would make it hard to plan around, but it's much more socialization for the level of risk than something indoors. Prompted by tuberculosis risk 100 years ago, students would occasionally have school outdoors, even in the winter.

When things can't be moved outdoors, it's still possible to dramatically increase the level of ventilation indoors. Dance organizers and people who cool their houses using fans are familiar with this: put a lot of fans in windows, use some fans inside to stir up the air, and think through the path the air will take. The best case is, one side of a room has fans blowing out each window, with cowls, and the other side of the room has a set of fans blowing in. For example, if you have a 900 sqft classroom with four 20-in box fans, you are changing over the air in the room approximately once a minute.
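To sanity-check these numbers, here is a minimal back-of-the-envelope calculation. The 8 ft ceiling height and the roughly 2,000 CFM per 20-in box fan are my assumptions, not figures from the post:

```python
# Rough air-exchange arithmetic for the claims above.

FT_PER_MILE = 5280

# "A gentle 1mph breeze clears a 90-ft diameter circle in a minute":
# 1 mph in feet per minute is about the diameter of that circle.
breeze_ft_per_min = 1 * FT_PER_MILE / 60
print(breeze_ft_per_min)  # 88.0

# Classroom example: 900 sqft floor area, assumed 8 ft ceiling.
room_volume_cuft = 900 * 8     # 7,200 cubic feet
fan_cfm = 2000                 # assumed airflow per box fan
total_cfm = 4 * fan_cfm        # four box fans
minutes_per_air_change = room_volume_cuft / total_cfm
print(round(minutes_per_air_change, 1))  # 0.9 - about one air change per minute
```

With these assumed figures the claim checks out; weaker fans or higher ceilings would stretch the air-change time accordingly.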

This doesn't work well in hard rain, and you're not going to be able to have the place much warmer or cooler than the outside, but it would still help a lot. Since you're inside, you could give people electric blankets in the winter.

I don't see reopening guidelines talking about really aggressive ventilation. For example, the CDC suggests considering opening windows, when they could be suggesting setting up fans in them. I also don't see any guidelines giving minimum ACH numbers.

In general, it seems to me that a lot of guidance is still heavily focused on surfaces, and not adapting quickly enough to what we're learning about the risks of poor ventilation.

Comment via: facebook


Mod Note: Tagging Activity is Not Private

Published on July 11, 2020 12:56 AM GMT

We don’t yet have UI for it, so this isn’t obvious (hence the announcement), but it seemed good to clearly communicate that if you tag a post or vote on the tag relevance of a tag-post combination, this information may be viewed by other users and/or the mod team (once we build the relevant UI).

Tagging activity is unlike normal (karma-related) voting, for which the LessWrong team is fully committed to maintaining privacy.

Karma voting expresses a judgment of quality and/or approval, and it makes sense that for users to vote honestly, they need to feel that their votes will not be scrutinized or judged by others. In contrast, tagging activity – both tagging and tag voting – is an act of content creation for the community. In the same way we want to know the authors of posts and comments, we want to know the “authors” of the tags and tag pages that arise from tagging activity.

Knowledge of the content creators for tagging helps us both in positive ways, i.e. giving credit to those who make valuable contributions, and also in “negative” ways by making it easy to catch abuse/vandalism of the system. And by making it clear who is tagging in what way, it also becomes easier to have discussions about which tags apply where, which are good tags, etc.

A step beyond all this is that possibly the tagging feature will be extended into a full-fledged wiki feature. Obviously, wiki activity (creating and modifying wiki pages) needs to be linked to whoever is making changes. That makes it naturally the case that the tagging system (almost a proto-wiki) needs to link activity to user accounts.

All of the above makes it seem correct for tagging activity to not be private, unlike regular karma-related voting. We will soon launch the expanded UI for tagging that contains a history for each tag which will more clearly display what information is getting shared and how.

Sorry for not having been clearer about this earlier. If you have any concerns about this, don’t hesitate to reach out. Also, if any of your tagging activity from the last few weeks is something you really don't want to be public, feel free to ping us and we can make sure it stays private (DM, Intercom, or team@lesswrong.com are all good for reaching out about this).


Your Prioritization is Underspecified

Published on July 10, 2020 8:48 PM GMT

Good prioritization as a key determinant of productivity. Ambiguity as a key determinant of prioritization. A brief sketch towards a taxonomy of ambiguity. There seems to be a lot of unexplored territory and potential low-hanging fruit here.

Short term goals have nice legible feedback loops with yummy rewards. Long term goals face an uphill battle, but potentially give higher effort:reward ratios if successfully fought. Presumably if our throughput rate were high enough and our incoming tasks low enough we wouldn't need to rearrange our queue and a simple FIFO scheme would do. In practice there are always lots of opportunities rolling in, and this pileup only gets worse the better you get at prioritizing as people put more opportunities on your plate as you get more done. So we're stuck with prioritizing. Let's sketch some key fronts.

If you were really convinced that the next task on your to-do list was the very best thing to advance your goals, you'd feel a lot more interested in it. Some part of you believes this to be the case and other parts very obviously don't. Thus, uncertainty about how to prioritize. My impression is that various systems for prioritization get much of their power from addressing some core ambiguity, and that people thus self-sort into those systems based on whether that ambiguity was a key bottleneck for them. This post isn't about outlining another set of antidotes but merely mapping out some (not necessarily all) of the various kinds of ambiguity.

Two types of ambiguity: risk and uncertainty

When I use the term ambiguity in this post, I'll be referring to both risk and uncertainty as potential roadblocks. Risk is within-model ambiguity. Uncertainty is outside-of-model ambiguity. If I ask you to bet on a coin flip, you'll model the odds as 50:50 and your downside risk will be that fifty percent chance of loss. That model doesn't include things like 'the person I'm making a bet with pulls out a gun and shoots me while the coin is in the air.' That broader context within which the risk model is situated deals with uncertainty, including the uncertainty over whether or not your model is correct (weighted coins). Most of the other categories could be further broken down along the risk/uncertainty dimension, but that is left as run-time optimization in the interests of brevity.
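The risk/uncertainty split can be made numeric with a toy version of the coin bet. All the numbers below (the $1 stakes, the 90/10 prior over coin models) are invented for illustration:

```python
# Risk: inside the 50:50 model, a $1 bet on a fair coin has a known payoff
# distribution, so the expected value is computable within the model.
p_heads = 0.5
ev_within_model = p_heads * 1 + (1 - p_heads) * (-1)
print(ev_within_model)  # 0.0 - the model says the bet is fair

# Uncertainty: maybe the coin is weighted. Average over a prior on models:
# 90% chance the coin is fair, 10% chance it is weighted against you.
models = {0.5: 0.9, 0.3: 0.1}
ev_over_models = sum(prob * (p * 1 + (1 - p) * (-1))
                     for p, prob in models.items())
print(round(ev_over_models, 3))  # -0.04 - model uncertainty shifts the EV
```

The within-model number is "risk"; the gap between the two numbers comes entirely from doubt about the model itself, which is the "uncertainty" the post is pointing at (and the gun scenario is uncertainty that no prior over coin biases captures).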

Between-task ambiguity

There are a few ways that we are already prioritizing, and confusion about which one would be best in a given situation can serve as a roadblock.

  • First-Due prioritization: we simply do whatever has the nearest deadline.
  • Longest-Chain prioritization: we prioritize whatever task will take the longest amount of time or has the largest number of subtasks.
  • Shortest-Chain prioritization: we want to shrink the total list as much as possible, so we get all the shortest tasks done quickly.
  • Most-Salient prioritization: we allow the vividness and emotional immediacy of tasks to serve as the goad.
  • Most-Likely-Failure prioritization: we look for tasks that have a step we are highly uncertain about and see if we can test that step, because if it fails we can maybe throw out a whole task and thus increase total throughput.
  • Most-Reusable prioritization: we focus on those tasks whose partial or complete solutions will be most useful in the completion of multiple other tasks.
  • Expected-Value prioritization: we focus on those tasks that will potentially result in the biggest payoffs, presumably creating resources for engaging with other tasks. This might sound like the best until we realize we've only pushed the problem one level up the stack, as we now need to juggle the fact that there are different sorts of value payoffs, and our marginal utility for a given resource and ability to convert between different sorts of value might be changing over time. Due to the well-known effects of loss aversion, it's also worth specifically naming a commonly encountered sub-type: Expected-Loss prioritization.
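One way to see that these strategies differ only in their comparison rule is to model them as sort keys over a single task queue. This is a minimal sketch; the task fields and the example tasks are invented:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: int         # days until due (First-Due)
    subtasks: int         # chain length (Longest/Shortest-Chain)
    expected_value: float # payoff estimate (Expected-Value)

tasks = [
    Task("taxes", deadline=3, subtasks=12, expected_value=2.0),
    Task("email", deadline=1, subtasks=1, expected_value=0.5),
    Task("research", deadline=30, subtasks=5, expected_value=9.0),
]

# Each prioritization strategy is just a different sort key.
first_due = sorted(tasks, key=lambda t: t.deadline)
shortest_chain = sorted(tasks, key=lambda t: t.subtasks)
expected_value = sorted(tasks, key=lambda t: -t.expected_value)

print([t.name for t in first_due])       # ['email', 'taxes', 'research']
print([t.name for t in shortest_chain])  # ['email', 'research', 'taxes']
print([t.name for t in expected_value])  # ['research', 'taxes', 'email']
```

The same queue yields three different orderings, which is the ambiguity in question: nothing in the task list itself tells you which key to sort by.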

Many people default to a strategy of delay, and it is worth pointing out that conceptualizing this as simply some sort of character failure prevents us from identifying the benefit that this strategy provides: namely, that it converts complex prioritization problems to simpler ones. Analysis of dependencies and choice of heuristics simplifies to 'Who will be angry with me soonest if I don't do X?', a sort of mix of First-Due and Most-Salient. Many of the problems people refer to in discussions of akrasia involve situations in which these strategies caused obvious problems that could have been alleviated by a different prioritization heuristic.

Within-task ambiguity

Ambiguity about individual tasks serves as additional activation energy needed to engage with that task. One easy way of thinking about this ambiguity is by asking of it all the journalist questions: who, what, where, when, why, how. To this we might add a couple of less well known ones that are about additional kinds of specificity:

'Which' as a drill down step if the answers to any of our other questions are too general to be of use. 'Who does this affect?', 'College students', 'Which?'

'Whence' (from what place?) as a sort of backwards facing 'why' accounting for ambiguity around where a task came from and whether we made our jobs harder when we stripped the task of that context in recording it.

See also the Specificity Sequence.

Goal-relevance ambiguity

Techniques like goal factoring are intended to collapse some of the complexity of prioritization by encouraging an investigation of how sub-tasks contribute to high level values and goals. I see three pieces here. Task-Outcome ambiguity involves our lack of knowledge about what the real effects of completing a task will be. Instrumental-Goal ambiguity deals with our lack of knowledge about how well our choice of proxy measures, including goals, will connect to our actual future preferences. An example of a dive into a slice of this region is the Goodhart Taxonomy. Part-Whole Relation ambiguity deals with our lack of knowledge of the necessity/sufficiency conditions along the way of chaining from individual actions to longer term preference satisfaction.

Meta: Ambiguity about how to deal with ambiguity

A few different things here.

What are we even doing when we engage with ambiguity in prioritization? An example of one possible answer is that we are continually turning a partially ordered set of tasks into a more ordered set of tasks up to the limit of how much order we need for our 'good enough' heuristics to not face any catastrophic losses. There are probably other answers that illuminate different aspects of the problem.
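That "partially ordered set into a more ordered set" framing can be made concrete with a topological sort over task dependencies: any linear ordering consistent with the partial order is "ordered enough" for a FIFO-style heuristic to proceed. A minimal sketch using the Python 3.9+ standard library; the task names and dependencies are invented:

```python
from graphlib import TopologicalSorter

# For each task, the set of tasks that must be done before it:
# a partial order, since unrelated tasks are not ordered at all.
deps = {
    "file taxes": {"gather receipts", "get W-2"},
    "gather receipts": set(),
    "get W-2": set(),
    "celebrate": {"file taxes"},
}

# static_order() returns one total order consistent with the partial order.
order = list(TopologicalSorter(deps).static_order())
print(order)
# e.g. ['gather receipts', 'get W-2', 'file taxes', 'celebrate'] -
# prerequisites always come before the tasks that need them.
```

The interesting part is what the sort leaves undetermined: "gather receipts" and "get W-2" can land in either order, and choosing between them is exactly where the prioritization heuristics from earlier re-enter.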

Ambiguity about the correct level of abstraction to explore/exploit on. When trying to do our taxes, instead of getting anything done we might write a post about the structure of prioritization. :[

Risk aversion as different from uncertainty aversion. Feels like there's potentially a lot to unpack there.

Motivational systems, whether rational, emotional, psychological, ethical, etc. as artificial constraints that make the size of the search space tractable.

Attacking ambiguity aversion directly as an emotional intervention. What is it we are afraid of when we avoid ambiguity and what is the positive thing that part is trying to get for us? There is likely much more here than just 'cognition is expensive' and this post itself could be seen as generating the space to forgive oneself for having failed in this way because the problem was much more complex than we might have given it credit for.

Ambiguity as a liquid that backs up into whatever system we install to manage it. Sure, you could deploy technique X that you learned to prioritize better (GTD! KonMari! Eisenhower Matrices!) but that would be favoring the tasks you deploy it on over other tasks, and there's ambiguity on whether that's a good idea. Related to ambiguity about the correct level to explore/exploit on, as well as Aether variables, bike shedding, and wastebasket taxons - i.e., moving uncertainty around to hide it from ourselves when we don't know how to deal with it.

Concrete takeaways

I said this was more a mapping exercise, but if people were to only take a couple things away from this post I think I'd want them to be:

1. Ask 'Which' and 'Whence' more often as a trailhead for all sorts of disambiguation.

2. Don't write tasks vertically down the page, write them across the page so there's room underneath to unroll the details of each task.

and finally,

This article is a stub, you can improve it by adding additional examples


Was a PhD necessary to solve outstanding math problems?

Published on July 10, 2020 6:43 PM GMT

Previous post: Was a terminal degree ~necessary for inventing Boyle's desiderata?

This is my second post investigating whether a terminal degree was practically ~necessary for groundbreaking scientific work of the 20th century.

Mathematics seems like a great field for outsiders to accomplish groundbreaking work. In contrast to other fields, many of its open problems can be precisely articulated well in advance. It requires no expensive equipment beyond computing power, and a proof is a proof is a proof.

Unlike awards like the Nobel Prize or Fields Medal, and unlike grants, a simple list of open problems established in advance seems immune to credentialism. It's a form of pre-registration of what problems are considered important. Wikipedia has a list of 81 open problems solved since 1995. ~146 mathematicians were involved in solving them (note: I didn't check for different people with the same last name). I'm going to randomly choose 30 mathematicians, and determine their mathematical background.
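The sampling step described above can be sketched in code. The population size (~146) and sample size (30) come from the post; the placeholder names and the fixed seed are mine, added so the draw is reproducible:

```python
import random

# Placeholder population standing in for the ~146 credited mathematicians.
mathematicians = [f"mathematician_{i}" for i in range(146)]

random.seed(0)  # arbitrary seed, fixed for reproducibility
sample = random.sample(mathematicians, k=30)

print(len(sample))       # 30
print(len(set(sample)))  # 30 - random.sample draws without replacement
```

Sampling without replacement matters here: drawing the same mathematician twice would silently shrink the effective sample.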

The categories will be No PhD, Partial PhD, PhD, evaluated in the year they solved the problem. In my Boyle's desiderata post, 2/15 (13%) of the inventors had no PhD. I'd expect mathematics to exceed that percentage.


Robert Connelly: PhD; Anand Natarajan: PhD; Mattman: PhD; Croot: PhD; Mineyev: PhD; Taylor: PhD; Antoine Song: Partial PhD; Vladimir Voevodsky: PhD; Ngô Bảo Châu: PhD; Haas: PhD; Andreas Rosenschon: PhD; Paul Seymour: PhD (D.Phil); Oliver Kullmann: PhD; Shestakov: PhD; Merel: PhD; Lu: PhD; Knight: PhD; Grigori Perelman: PhD; Haiman: PhD; Ken Ono: PhD; Ben J. Green: PhD; Demaine: PhD; Jacob Lurie: PhD; Harada: PhD; McIntosh: PhD; Naber: PhD; Adam Parusinski: PhD; Atiyah: PhD; Benny Sudakov: PhD; John F. R. Duncan: PhD

Contrary to my expectation, all of these mathematicians had a PhD except Antoine Song, the only partial PhD. He finished his PhD the year after his work on Yau's conjecture.

So either:

a) This list is not in fact an unedited list of important mathematical conjectures and who solved them, but instead a list retroactively edited by Wikipedia editors to select for the credentials of the discoverers, or

b) A PhD is an almost universal precursor to groundbreaking mathematical work.


First, the bad news. It's a problem that I have no way to verify that the list I used was not cherry-picked for problems solved by PhDs. The suspicious may want to look for a list of open mathematical problems published in a definitive form prior to 1995 and repeat this analysis.

My model for why a PhD would be necessary to achieve groundbreaking work is:

These degrees come with credibility; access to expensive equipment, funding, and data; access to mentors and collaborators. A smart person who sets out to do groundbreaking STEM work will have a much lower chance of success if they don't acquire an MD/PhD. Massive, sustained social coordination is ~necessary to do groundbreaking research, and the MD/PhD pipeline is a core feature of how we do that. Without that degree, grant writers won't make grants. Collaborators won't want to invest in the relationship. It will be extremely difficult to convince anybody to let someone without a terminal degree run a research program.

Authoritativeness of the proof, access to expensive equipment, and access to data don't seem to be very much at play in mathematical discoveries.

Perhaps the reason these mathematicians enrolled in their PhD is that the academic environment is both conventional and attractive for genius mathematicians, even though it's not actually necessary for them to do their work. My guess is that funding, the sense of security that comes with earning credentials allows risk-taking, and access to long-term collaborators and mentors also play an important role.

37 of the discoveries (46%) are credited to a single mathematician, giving some perspective on the extent to which access to collaborators is important.

How did the two inventors of Boyle's desiderata who didn't hold a terminal degree manage to do their work without a PhD? The fact that they both worked in the field of robotics seems relevant.

Maybe the story is something like this:

Earning a PhD is both attractive and helpful for people doing basic research in established fields.

A PhD is less important for doing groundbreaking applied engineering and entrepreneurial work, especially in tech.

It's hard to overstate the extent to which business contributes to academic work. How many mathematical, biological, and physical discoveries would never have been made, if it weren't for robotics (invented by someone with no higher education) and cheap compute (provided by the business sector)? How much has economic growth expanded our society's capacity to fund academic research?


Let's think about the situation of a STEM student with lots of potential, but no money and few accomplishments.

If they do a PhD, they'll get enough money to live on, and some time and mentorship to try and prove their intellectual leadership abilities. Coming out of it, they'll have a terminal degree, which will give them the option of continuing in academia if they like it, or leaving for industry if they don't.

If they go straight into industry without a PhD, they might earn more money early on. But they'll also have to work their way up from near the bottom, unless they can join a small startup early on. They might get caught in the immoral maze of some gigantic corporation. They won't have the same leeway a PhD student has to choose their own project. And they likely won't have the same long-term earning potential.

From that point of view, the PhD concept itself doesn't seem like empty credentialism. Instead, it's a mechanism for sifting through the many bright young people our society produces, giving a certain percentage of them a boost toward intellectual leadership and a chance to take a crack at a basic research problem. It's also a form of diversification, a societal hedge against an overly short-term, profit-oriented, commons-neglecting capitalist approach to R&D.


Talk: Key Issues In Near-Term AI Safety Research

Published on July 10, 2020 6:36 PM GMT

I gave a talk for the Foresight Institute yesterday, followed by a talk from Dan Elton (NIH) on explainable AI and a panel discussion that included Robert Kirk and Richard Mallah.

While AI Safety researchers are often pretty well aware of related work in the computer science and machine learning fields, I keep finding that many people are not aware of a lot of very related work that is taking place in an entirely different research community. This other community is variously referred to as Assured Autonomy, Testing Evaluation Verification & Validation (TEV&V), or Safety Engineering (as they relate to AI-enabled systems, of course). As I discuss in the talk, this is a much larger and more established research community than AI Safety, but unfortunately until recently there was very little acknowledgement by the Assured Autonomy community of closely associated work by the AI Safety community, and vice versa.

Recently organizations such as CSER and FLI have been doing a lot of great work helping to connect these two communities with jointly-sponsored workshops at major AI conferences - some of you may have attended those. But I still think it would be useful if more people in both communities were a bit more aware of the work of the other community. This talk represents my attempt at a short intro to that.

Video (my presentation is from 2:28 to 19:00)

Short version of slide deck (the one I used in the presentation)

Longer version


When is evolutionary psychology useful?

Published on July 10, 2020 6:22 PM GMT

I am currently improving my courtship skills and am reading Mate by Geoffrey Miller. The book relies heavily on evolutionary psychology, which I was previously critical of.

My earlier perspective was that because we know little about the human evolutionary environment, evo psych is full of "fully general arguments": I can ascribe lots of attributes to the evolutionary environment to support any position, so I can support no position.

But I do not have a good source of priors for what women find sexy. Evo psych is offering priors, and most of them so far match observational data. Was my previous view of evo psych wrong, and if so, how?


A space of proposals for building safe advanced AI

Published on July 10, 2020 4:58 PM GMT

I liked Evan’s post on 11 proposals for safe AGI. However, I was a little confused about why he chose these specific proposals; it feels like we could generate many more by stitching together the different components he identifies, such as different types of amplification and different types of robustness tools. So I’m going to take a shot at describing a set of dimensions of variation which capture the key differences between these proposals, and thereby describe an underlying space of possible approaches to safety.

Firstly I’ll quickly outline the proposals. Rohin’s overview of them is a good place to start - he categorises them as:

  • 7 proposals of the form “recursive outer alignment technique” plus “robustness technique”.
    • The recursive outer alignment technique is either debate, recursive reward modelling, or amplification.
    • The robustness technique is either transparency tools, relaxed adversarial training, or intermittent oversight by a competent supervisor.
  • 2 proposals of the form “non-recursive outer alignment technique” plus “robustness technique”.
  • 2 other proposals: Microscope AI; STEM AI.

More specifically, we can describe the four core recursive outer alignment techniques as variants of iterated amplification, as follows: let Amp(M) be the procedure of a human answering questions with access to model M. Then we iteratively train M* (the next version of M) by:

  • Imitative amplification: train M* to imitate Amp(M).
  • Approval-based amplification: train M* on an approval signal specified by Amp(M).
  • Recursive reward modelling: train M* on a reward function specified by Amp(M).
  • Debate: train M* to win debates against Amp(M).
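As a toy illustration of how these variants relate (stub code of my own; `Model`, `amp`, and the signal functions are invented names, not a real implementation), each one derives a different training signal from the same Amp(M) procedure:

```python
from typing import Callable

# Toy stand-in: a "model" is just a question-answering function.
Model = Callable[[str], str]

def amp(human: Callable[[str, Model], str], m: Model) -> Model:
    """Amp(M): a human answering questions with access to model M."""
    return lambda question: human(question, m)

def imitation_target(amp_m: Model, question: str) -> str:
    # Imitative amplification: M* is trained to reproduce Amp(M)'s answer.
    return amp_m(question)

def approval_signal(amp_m: Model, question: str, answer: str) -> float:
    # Approval-based amplification: Amp(M) scores M*'s proposed answer.
    return 1.0 if amp_m(f"Approve {answer!r} for {question!r}?") == "yes" else 0.0

# Recursive reward modelling would instead ask Amp(M) to specify a reward
# function over outcomes, and debate would train M* to win arguments judged
# by Amp(M); all four reduce to "a different signal derived from amp()".
```

The point of the sketch is only that the recursive structure (`amp`) is shared across the four variants, while the training signal varies.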

Here are six axes of variation which I claim underlie Evan’s proposals. Each proposal is more or less:

  1. Supervised
  2. Structured
  3. Adversarial
  4. Language-based
  5. Interpretability-dependent
  6. Environment-dependent

In more detail:

  1. Supervised: this axis measures how much the proposal relies on high-quality supervision by a (possibly amplified) human. Imitative amplification places the heaviest burden on the supervisor, since they need to identify a good action at each timestep. Approval-based amplification requires the supervisor to recognise good actions at each timestep, which should be easier; and standard RL only requires the supervisor to recognise good outcomes. Multi-agent, microscope and STEM AI don’t even require that.
  2. Structured: this axis measures how much the work of alignment relies on instantiating a specific structure. Recursive reward modelling, as explained here, is highly structured because it constructs a tree of agents implementing specific subtasks. Debate is somewhat less so, because the debate tree is defined implicitly, and only one path through it is actually taken. In other versions of amplification, it depends on how the calls to Amp(M) work - they might involve the human just asking M a couple of clarifying questions (in which case the structure is very minimal), or else spinning up many copies of M in a hierarchical and structured way. By contrast, multi-agent approaches are by default highly unstructured, since many of the agents’ incentives will be emergent ones that arise from flexible interactions. However, I should be clear that these classifications are intended merely as rough guidelines - comparisons of such a vague concept as “structuredness” will always be very subjective.
  3. Adversarial: this axis measures how much the proposal depends on AIs competing with each other during the training process. Debate and multi-agent training are strongly adversarial; other proposals are more or less adversarial the more or less they depend on adversarial training. STEM AI and Microscope AI are the least adversarial.
  4. Language-based: this axis measures how much the proposal relies on using natural language as a means of interaction. Debate (along with amplification, in practice) is heavily language-based; STEM AI is not very language-based; everything else is in the middle (depending on what types of tasks they’re primarily trained on).
  5. Interpretability-dependent: this axis measures how much the proposal relies on our ability to interpret the internal workings of neural networks. Some require this not at all; others (like microscope AI) require a detailed understanding of cognition; (relaxed) adversarial training requires the ability to generate examples of misbehaviour, which I expect to be even harder. Another source of variance along this axis is how scalable our interpretability tools need to be - adversarial training requires interpretability tools to run frequently during training, whereas in theory we could just analyse a microscope AI once.
  6. Environment-dependent: this axis measures how much the proposal depends on which environments or datasets we use to train AGIs (excluding the supervision component). Multi-agent safety and STEM AI are heavily environment-dependent; everything else less so.

I intend this breakdown to be useful not just in classifying existing approaches to safety, but also in generating new ones. For example, I’d characterise this paper as arguing that AI training regimes which are less structured, less supervised and more environmentally-dependent will become increasingly relevant (a position with which I strongly agree), and trying to come up with safety research directions accordingly. Another example: we can take each variant of iterated amplification and ask how we could improve them if we had better interpretability techniques (such as the ability to generate adversarial examples which display specific misbehaviours). More speculatively, since adversarial interactions are often useful in advancing agent capabilities, I’d be interested in versions of STEM AI which add an adversarial component - perhaps by mimicking in some ways the scientific process as carried out by humans.

There’s one other important question about navigating this space of possibilities - on what metric should we evaluate the proposals within it? We could simply do so based on their overall probability of working. But I think there are enough unanswered questions about what AGI development will look like, and what safety problems will arise, that these evaluations can be misleading. Instead I prefer to decompose evaluations into two components: how much does a proposal improve our situation given certain assumptions about what safety problems we’ll face along which branches of AGI development; and how likely are those assumptions to be true? This framing might encourage people to specialise in approaches to safety which are most useful conditional on one possible path to AGI, even if that’s at the expense of generality - a tradeoff which will become more worthwhile as the field of AI safety grows.

Thanks to the DeepMind safety reading group and Evan Hubinger for useful ideas and feedback.


Should I take an IQ test, why or why not?

July 10, 2020 - 19:53
Published on July 10, 2020 4:52 PM GMT

I've seen discussion of IQ tests around LW. People imply there's a benefit to taking the test. I assume it is related to belief in belief or something. Can anyone flesh out this argument?


Mesa-Optimizers vs “Steered Optimizers”

July 10, 2020 - 19:49
Published on July 10, 2020 4:49 PM GMT


The paper Risks from learned optimization introduced the term "inner alignment" in the context of a specific class of scenarios, namely a "base optimizer" which searches over a space of “inner” algorithms. If the inner algorithm is an optimizer, it's called a "mesa-optimizer", and if its objective differs from the base algorithm's, it's called an "inner alignment" problem. In this post I want to plead for us to also keep in mind a different class of scenarios, which I'll call "Steered Optimizers", and which also has an "inner alignment" problem. The inner alignment problem for mesa-optimizers is directly analogous to the inner alignment problem for steered optimizers, but the specific failure modes and risk factors and solutions are all somewhat different. I’ll argue that it's at least comparably likely for our future AGIs to be steered optimizers rather than mesa-optimizers. So again, we should keep both scenarios in mind.


I recently wrote a post about brain algorithms with "inner alignment" in the title, but I was talking about something kinda different than in the famous Risks from Learned Optimization paper that I was implicitly referring to. I didn't directly explain why I felt entitled to use the term “inner alignment” for this different situation, but I think it's worth going into, especially because it’s a more general approach to making AGI that goes beyond brain-inspired approaches.

(Terminology note: Following “Risks From Learned Optimization”, I will use the term "optimizer" in this post to mean an algorithm which uses foresight / planning to search over possible actions in pursuit of a particular goal, a.k.a. a "selection"-type optimizer. I want humans to count as “optimizers”, so I will also allow “optimizers” to sometimes choose actions for other reasons, and to maybe have inconsistent, context-dependent goals, as long as they at least sometimes use foresight / planning to pursue goals.)

Let's start with two scenarios in which we might create highly intelligent AI "optimizers":

1. Search-over-algorithms scenario: (this is the one from Risks from Learned Optimization). Here, you have a "base optimizer" which searches over a space of possible algorithms for an algorithm which performs very well according to a "base objective". For example, the base optimizer might be doing gradient descent on the weights of an RNN (large enough RNNs are Turing-complete!). Anyway, if the base optimizer settles on an inner algorithm which is itself an optimizer, then we call it a “mesa-optimizer”. Inner alignment is alignment between the mesa-optimizer’s objective and the base objective, while outer alignment is alignment between the base objective and what the programmer wants.

2. Steered Optimizer scenario: (this is how I think the human brain works, more or less, see my post "Inner alignment in the brain"). Here, you also have a two-layer structure, but the layers are different. The inner layer is an algorithm that does online learning, world-modeling, planning, and acting, and it is an optimizer. We wrote the inner-layer algorithm ourselves, and it is never modified or reset (the whole scenario is just one “episode”, in RL terms). But as the inner algorithm learns more and more, it becomes increasingly powerful, and increasingly difficult for us to understand—like comparing a newborn brain to an adult brain, where the latter carries a lifetime of experience and ideas. Meanwhile, the base layer watches the inner layer in real time, and tries to "steer" it towards optimizing the right target, using hooks that we had built into the inner layer’s source code. How does that steering work? In the simplest case, the base layer can be a reward function calculator, and it sends the reward information to the inner layer, which in turn tries to build a predictive model of the correlates of that reward signal, set them as a goal, and make foresighted plans to maximize them. (There could be other steering methods too—see below.) As in the other scenario, inner alignment is alignment between the inner layer’s objective(s) and the formula used by the base layer to compute rewards, while outer alignment is alignment between the latter and what the programmer wants.
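To make the structural contrast concrete, here is a deliberately minimal sketch of the two loops (every name and interface is my own invention, not real training code):

```python
# Scenario 1: the outer loop is a search over inner algorithms. Each candidate
# is run for a full episode and scored; "max" stands in for gradient descent.
def search_over_algorithms(candidates, base_objective, n_steps):
    def episode_score(algorithm):
        state = algorithm.init()
        for _ in range(n_steps):
            state = algorithm.step(state)
        return base_objective(state)
    return max(candidates, key=episode_score)

# Scenario 2: a fixed, hand-coded inner optimizer is steered as it runs,
# within a single episode, via a hook built into its source code.
def run_steered_optimizer(inner, reward_fn, n_steps):
    state = inner.init()
    for _ in range(n_steps):
        state = inner.step(state)
        inner.update_goals(reward_fn(state))  # the steering hook
    return inner
```

The key structural difference is visible in the signatures: in the first loop the inner algorithm is whatever the search happens to find, while in the second it is known code whose goals are nudged online by the base layer.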

Here’s a little comparison table:


  • Base layer
    • Search over algorithms: Run inner layer for N steps, compute score, do gradient descent on inner layer algorithm.
    • Steered optimizer: Run inner layer. As it runs, watch it, and send rewards (or other signals) to “steer” it.
  • Inner layer
    • Search over algorithms: Arbitrary algorithm discovered by base layer.
    • Steered optimizer: Known, hand-coded algorithm, involving world-modeling, planning, acting, etc.
  • Interpretability of inner layer (by default, i.e. without special interpretability tools)
    • Search over algorithms: Always inscrutable.
    • Steered optimizer: Starts from a known, simple state, but gets more and more inscrutable as it builds a complex world-model.
  • What is the inner layer’s objective?
    • Search over algorithms: It might not have one. If it does, we don’t know what it is (by default).
    • Steered optimizer: We designed it to form and seek goals based on the steering signals it receives, but we don’t know its actual goals at any given time (by default).
  • How many training episodes?
    • Search over algorithms: Millions, I presume.
    • Steered optimizer: As few as one; maybe several, but more like a run-and-debug loop.
  • Are we doing this today?
    • Search over algorithms: Not really (but see references in “Risks from Learned Optimization”).
    • Steered optimizer: Not that I know of, off-hand, but it’s probably in the AI literature somewhere.

By the way, these two scenarios are not the only two possibilities, nor are they mutually exclusive. The obvious example for “not mutually exclusive” is the human brain, which fits nicely into both categories—the subcortex steers the neocortex (more on which below), and meanwhile evolution is a search-over-algorithms-type base optimizer for the whole brain.

Why might we expect AI researchers to build steered optimizers, rather than searches-over-algorithms?
  • Steered optimizers enable dramatically longer episodes than searches-over-algorithms. In the first line of the table above I wrote that search-over-algorithms involves running the inner layer for N steps per episode. How big is N? If we want to build a system that can learn a whole predictive world-model from scratch, that's an awfully big N! Evolution is a good example here; it picks a genotype and then spends many decades calculating its loss. Imagine doing ML with one gradient descent step per decade! For various reasons, I don't think this rules out a search-over-algorithms approach, but I definitely think it's a strike against its plausibility. Steered optimizers do not have this problem; they do not need to run through millions of episodes to reach excellent performance, just a single very long episode, or more likely dozens of very long episodes for debugging, hyperparameter search, etc.
  • As I keep mentioning, I think brains work as steered optimizers, with the steered optimizer subsystem centered around the neocortex (or pallium in birds and lizards), and the steering subsystem based in other parts of the brain. If I’m right about that, that would imply that (1) steered optimizers are a viable path to AGI, and (2) we have a straightforward-ish development path to get there, i.e. we “merely” need to reverse-engineer the neocortex.
  • Given that we know at least vaguely what a world-modeling-and-acting-and-planning algorithm is supposed to do and how, I think people will be able to write such an algorithm themselves faster than they could find it by blind search. I don't think it's that complicated an algorithm, compared to the collective capability of the worldwide AI community. (Why don’t I think the algorithm is horrifically complicated? Mainly from object-level reading and thinking about neocortical algorithms, which I discussed most recently here. I could be wrong.)

Incidentally, if we’re writing the inner algorithm ourselves, why not just put the goal into the source code directly? Well, that would be awesome ... But it may not be possible! I think the easiest way to build the inner algorithm is to have it build a world-model from scratch, more-or-less by finding patterns in the input, and patterns in the patterns, etc. So if you want the AGI to have a goal of maximizing paperclips, we face the problem that there is no “paperclips” concept in its source code; it has to run for a while before forming that concept. That’s why we might instead build an AGI by letting it start learning and acting, and trying to steer it as it goes.

How might one steer an AGI steered optimizer?
  • As mentioned above, we can send reward signals—calculated automatically and/or by human overseers.
  • A human, assisted by interpretability tools, could reach in and add / subtract / edit goals. Or a similar thing could be automated.
  • You could build a hook in the inner layer for receiving natural language commands. Like maybe, whenever you press the button and talk into the microphone, whatever world-model concepts are internally activated by that speech become the inner layer’s goals (or something like that).
  • Any of the weird tricks that the brain uses, as discussed in my posts inner alignment in the brain and an earlier post about human instincts.
  • I don’t know! I’m sure there are other things.

Lessons from being a human

If the human neocortex is a steered optimizer, what can we learn from that example?

1. How does it feel to be steered?

You try a new food, and find it tastes amazing! This wonderful feeling is your subcortex sending a steering signal up to your neocortex. All of a sudden, a new goal has been installed in your mind: eat this food again! This is not your only goal in life, of course, but it is a goal, and you might use your intelligence to construct elaborate plans in pursuit of that goal, like shopping at a different grocery store so you can buy that food again.

It’s a bit creepy, if you think about it!

“You thought you had a solid identity? Ha!! Fool, you are a puppet! If your neocortex gets dopamine at the right times, all of a sudden you would want entirely different things out of life!”

2. What does Inner Alignment failure look like in humans?

A prototypical inner alignment failure looks like this: we know that some situation would lead the subcortex to install a certain goal in our minds, we don’t want to have that goal (according to our current goals), so we avoid that situation.

Imagine, for example, not trying a drug because you’re afraid of getting addicted.

To make that analogy explicit, you could imagine that our brain was designed by an all-powerful alien who wanted us to take the drug, and therefore set up our brain with a system that recognizes the chemical signature of that drug, and installs that drug as a new goal when that chemical signature is detected. At first glance, that’s not a bad design for a steering mechanism, and indeed it works sometimes. But we can undermine the alien's intentions by understanding how that steering mechanism works, and thus avoiding the drug.

A more prosaic example: practically every “productivity hack” is a scheme to manipulate our own future subcortical steering signals.

3. What would corrigible alignment look like in humans?

Again analogizing from the definition in “Risks From Learned Optimization”, “corrigible alignment” would be developing a motivation along the lines of “whatever my subcortex is trying to reward me for, that is what I want!” Maybe the closest thing to that is hedonism? Well, I don’t think we want AGIs with that kind of corrigible alignment, for reasons discussed below.

More random thoughts on steering
  • An AGI might be easier to steer than a human brain, if we can find a way to reliably steer in response to imagination / foresight, and not just actions. In the example above, where I am trying not to get addicted to a drug, my job is made pretty easy by the fact that I need to actually take the drug before getting addicted. Merely thinking about taking the drug will not install that goal in my brain. Maybe we can avoid that problem in our steered AGIs somehow?
  • I mentioned corrigible alignment above. I think that the sense of “corrigible alignment” which is most analogous to the “Risks from learned optimization” paper is like hedonism—valuing the reward steering signals, as an end in themselves. If that’s the definition, then I would be concerned that a corrigibly-aligned system solves the inner alignment problem while horribly exacerbating the outer alignment problem, because the system is now motivated to wirehead or otherwise game the reward signals. It’s not necessarily an unsolvable outer alignment problem—maybe an AGI could be motivated by both hedonism and a specific aversion to self-modification other than by normal learning, for example. But I’m awfully skeptical that this is a good starting point. I think it’s more promising to go for a different flavor of corrigibility, where we try to steer the system so that it becomes motivated by something like “the intentions of the programmer”, i.e. a flavor of corrigibility that tries to cut through both the inner and the outer alignment problems simultaneously. (Maybe this is my opinion about corrigible alignment in the search-over-algorithms scenario as well...)

Related work

Deep RL from Human Preferences and Scalable Agent Alignment Via Reward Modeling both bring up the idea of taking reward signals, trying to understand those signals in the form of a predictive model, and then using that reward model as a target for training an agent (if I understand everything correctly). (This is not the only idea in the papers, and in most respects the papers are more like search-over-algorithms.) Anyway, that specific idea has parallels with how a steered optimizer would try to relate its reward signals to meaningful, predictive concepts in its world-model. In this post I’m trying to emphasize that reward-modeling part, and generalize it to other ways of steering agents. Also, unlike those papers, I prefer to merge the reward-modeling task and the choosing-actions task into a single model, because their requirements seem to heavily overlap, at least in the case of a powerful, world-modeling, optimizing agent. For example, the reward-modeling part needs to look at a bunch of reward signals and figure out that they correspond to the goal “maximize paperclips”; while the choosing-actions part needs to take the goal “maximize paperclips” and figure out how to actually do it. These two parts require much the same world-modeling capabilities, and indeed I don’t see how it would work except by having both parts actually referencing the same world-model.

(I'm sure there's other related work too, that’s just what jumped to my mind.)

(thanks Evan Hubinger for comments on a draft.)


Born as the seventh month dies ...

July 10, 2020 - 18:08
Published on July 10, 2020 3:07 PM GMT

Epistemic status: Mathematical reasoning by an amateur. Mistakes are quite likely.

The problem

I was reading The Equation of Knowledge, and it starts with this little cute problem:

Suppose a dad has two kids. At least one of them is a boy born on a Tuesday. What are the odds that the other child is also a boy?

Generalizing the problem:

Define 1Bn := "at least one boy with an independent characteristic N that has probability 1/n". We want P(2 boys | 1Bn).

The original problem can now be seen as an instance of P(2 Boys | 1B7).

Simple Bayesian solution

I started solving this with a simple application of Bayes:

P(1Bn | 2 boys) = P(first child is a boy with N | 2 boys) + P(second child is a boy with N | 2 boys) - P(both children are boys with N | 2 boys)
                = 1/n + 1/n - 1/(n^2)

P(1Bn) = P(first child is a boy with N) + P(second child is a boy with N) - P(both children are boys with N)
       = (1/2)(1/n) + (1/2)(1/n) - ((1/2)(1/n))^2
       = 1/n - 1/(4n^2)

BayesFactor(2 boys | 1Bn) = P(1Bn | 2 boys) / P(1Bn) = (8n - 4)/(4n - 1)

P(2 boys | 1Bn) = BayesFactor(2 boys | 1Bn) * P(2 boys) = ((8n - 4)/(4n - 1)) * (1/4) = (2n - 1)/(4n - 1)

We have:

P(2 boys | 1B1 == At least one boy) = 1/3

P(2 boys | 1B7 == At least one boy born on Tuesday) = 13/27

lim{n -> +Inf}[P(2 boys | 1Bn == At least one boy born exactly x seconds after the big bang)] = lim{n -> +Inf}[(2n - 1)/(4n - 1)] = 1/2
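These values are easy to sanity-check by exact enumeration over both children's (sex, trait) pairs (a quick sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import product

def p_two_boys_given_1Bn(n: int) -> Fraction:
    """P(2 boys | at least one boy with a 1/n trait), by exact enumeration.

    Each child is a (sex, trait) pair, uniform over 2 sexes and n trait
    values; we condition on at least one child being ("B", 0).
    """
    num = den = 0
    for c1, c2 in product(product("BG", range(n)), repeat=2):
        if ("B", 0) in (c1, c2):
            den += 1
            if c1[0] == "B" and c2[0] == "B":
                num += 1
    return Fraction(num, den)

assert p_two_boys_given_1Bn(1) == Fraction(1, 3)    # plain "at least one boy"
assert p_two_boys_given_1Bn(7) == Fraction(13, 27)  # "boy born on Tuesday"
```

Since every (sex, trait) combination is equally likely, counting outcomes suffices, and as n grows the value climbs toward 1/2, matching the limit above.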

So ... I am somewhat confused. It's intuitively obvious that having two boys creates more opportunities for a specific independent phenomenon to happen. But at first blush, my intuition firmly suggested that I throw away the additional information as useless, and only careful thinking led me to the (hopefully) correct answer. I also can't quite think of any practical examples of this epistemic error. Your thoughts appreciated.

Generalizing more

Repeating the same analysis, but generalizing the probability of "being a boy" to 1/k,

BayesFactor(2 boys | P(boy)=1/k, 1Bn) = (2n(k^2) - k^2)/(2nk - 1)

lim{n -> +Inf}[P(2 boys | P(boy)=1/k, 1Bn)] = 1/k

Generalizing to random variables

Suppose we have two independent, identically distributed variables X1 and X2, and another two i.i.d variables Z1 and Z2. All of these variables are mutually independent. Repeating the exact same calculations, we'll have:

Px := P(X1=x) = P(X2=x)
Pz := P(Z1=z) = P(Z2=z)

BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z)) = ... = (2Pz - Pz^2)/(2PxPz - (PxPz)^2)

P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z)) = BayesFactor(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z)) * P(X1=X2=x) = ... = (2Px - PxPz)/(2 - PxPz)

lim{Pz -> 0+}[P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z))] = 2Px/2 = Px

If we set Pz = 1 (basically nuking the Z variables), we'll have:

P(X1=X2=x | (X1=x, Z1=z) or (X2=x, Z2=z) ) = P(X1=X2=x | X1=x or X2=x) = Px/(2 - Px)

So the independent information provided by the Z variables can, at most, improve the odds by a factor of 2 - Px >= 1.
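The general formula can likewise be verified by brute force with exact rational arithmetic (a sketch; `check` and the two-outcome modelling of X and Z are my own simplification):

```python
from fractions import Fraction
from itertools import product

def check(px: Fraction, pz: Fraction) -> None:
    """Verify P(X1=X2=x | (X1=x,Z1=z) or (X2=x,Z2=z)) = (2Px - PxPz)/(2 - PxPz).

    Only Px, Pz, and independence matter, so it suffices to model each
    variable with two outcomes: "hit" (X=x, resp. Z=z) with the given
    probability, and "miss" with the remainder.
    """
    X = {True: px, False: 1 - px}   # True means X_i = x
    Z = {True: pz, False: 1 - pz}   # True means Z_i = z
    num = den = Fraction(0)
    for (x1, p1), (z1, q1), (x2, p2), (z2, q2) in product(
            X.items(), Z.items(), X.items(), Z.items()):
        p = p1 * q1 * p2 * q2
        if (x1 and z1) or (x2 and z2):
            den += p
            if x1 and x2:
                num += p
    assert num / den == (2 * px - px * pz) / (2 - px * pz)

check(Fraction(1, 2), Fraction(1, 7))  # reproduces 13/27, the Tuesday problem
check(Fraction(1, 3), Fraction(1, 4))
```

Note that with Px = 1/2 and Pz = 1/7 the formula gives (1 - 1/14)/(2 - 1/14) = 13/27, recovering the boy-born-on-Tuesday answer as a special case.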


Frankenstein Delenda Est

July 10, 2020 - 17:38
Published on July 10, 2020 2:38 PM GMT

[Not fiction, but still art. Cross-posted from Grand, Unified, Crazy.]


I am terrified by the idea that one day, I will look back on my life, and realize that I helped create a monster. That my actions and my decisions pushed humanity a little further along the path to suffering and ruin. I step back sometimes from the gears of the technology I am creating, from the web of ideas I am promoting, and from the vision of the future that I am chasing, and I wonder if any of it is good.

Of course I play only the tiniest of roles in the world, and there will be no great reckoning for me. I am a drop in the ocean of the many, many others who are also trying to build the future. But still, I am here. I push, and the levers of the world move, however fractionally. Gears turn. Webs are spun. If I push in a different direction, then the future will be different. I must believe that my actions have meaning, because otherwise they have nothing at all.

No, I do not doubt my ability to shape the future; I doubt my ability to choose it well. The world is dense with opportunity, and we sit at the controls of a society with immense potential and awful power. We have at our disposal a library full of blueprints, each one claiming to be better than the last. I would scream, but in this library I simply whisper, to the blueprints: how do you know? How do you know that the future you propose has been authored by The Goddess of Everything Else, and is not another tendril of Moloch sneaking into our world?

Many people claim to know, to have ascended the mountain and to be pronouncing upon their return the commandments of the one true future: There is a way. Where we are going today, that is not the way. But there is a way. Believe in the way.

I hear these people speak and I am overcome with doubt. I think of butterflies, who flap their wings and create Brownian motion, as unfathomable as any hurricane. I think of fungi, whose simple mushrooms can hide a thousand acres of interwoven root. I think of the human brain, a few pounds of soggy meat whose spark eludes us. The weight of complexity is crushing, and any claim to understanding must be counterbalanced by the collected humility of a thousand generations of ignorance.

And on this complexity, we build our civilization. Synthesizing bold new chemicals, organizing the world’s information, and shaping the future through a patchwork mess of incentives, choices, and paths of least resistance. Visions of the future coalesce around politics of the moment, but there is no vision of the future that can account for our own radical invention. Do not doubt that Russell Marker and Bob Taylor did as much to shape today as any president or dictator. The levers we pull are slow, and their lengths are hidden, but some of them will certainly move the world.

And on these levers, we build our civilization. Invisible hands pull the levers that turn the gears that spin the webs that hold us fast, and those invisible hands belong to us. We pronounce our visions of a gleamingly efficient future, accumulating power in our bid to challenge Moloch, never asking whether Moloch is, simply, us. The institutions of the American experiment were shaped by the wisdom of centuries of political philosophy. That they have so readily crumbled is not an indictment of their authors, but of the radical societal changes none of those authors could foresee. Our new society is being thrown together slapdash by a bare handful of engineers more interested in optimizing behaviour than in guiding it, and the resulting institutions are as sociologically destructive as they are economically productive.

And on these institutions, we build our civilization.


Sometimes, I believe that with a little work and a lot of care, humanity might be able to engineer its way out of its current rough patch and forward, into a stable equilibrium of happy society. Sometimes, if we just run a little faster and work a little harder, we might reach utopia.

There is a pleasant circularity to this dream. Sure, technology has forced disparate parts of our society together in a way that creates polarized echo chambers and threatens to tear society apart. But if we just dream a little bigger we can create new technology to solve that problem. And honestly, we probably can do just that. But a butterfly flaps its wings, and the gears turn, and whatever new technical solution we create will generate a hurricane in some other part of society. Any claims that it won’t must be counterbalanced by the collected humility of a thousand generations of mistakes.

Sometimes, I believe that the future is lying in plain sight, waiting to swallow us when we finally fall. If we just let things take their natural course, then the Amish and the Mennonites and (to a lesser extent) the Mormons will be there with their moral capital and their technological ludditism and their ultimately functional societies to pick up the pieces left by our self-destruction. Natural selection is a bitch if you’re on the wrong end of it, but it still ultimately works.

Or maybe, sometimes, it’s all a wash and we’ll stumble along to weirder and weirder futures with their own fractal echoes of our current problems, as in Greg Daniels’s Upload. But I think of the complexity of this path, and I am overcome with doubt.


I am terrified by the idea that one day, I will look back on my life, and realize that I helped create a monster. Not a grand, societal-collapse kind of monster or an elder-god-sucking-the-good-out-of-everything kind of monster. Just a prosaic, every-day, run-of-the-mill, Frankenstinian monster. I step back sometimes from the gears of the technology I am creating, from the web of ideas I am promoting, and from the vision of the future that I am chasing, and I wonder if it’s the right one.

From the grand library of societal blueprints, I have chosen a set. I have spent my life building the gears to make it go, and spinning the webs that hold it together. But I look up from my labour and I see other people building on other blueprints entirely. I see protests, and essays, and argument, and conflict. I am confident in my epistemology, but epistemology brings me only a method of transportation, not a destination.

I am terrified that it is hubris to claim one blueprint as my own. That I am no better than anyone else, coming down from the mountaintop, proclaiming the way. That society will destroy my monster of a future with pitchforks, or that worse, my monster will grow to devour what would have otherwise been a beautiful idyll.

Frankenstein was not the monster; Frankenstein created the monster.


Was a terminal degree ~necessary for inventing Boyle's desiderata?

July 10, 2020 - 07:47
Published on July 10, 2020 4:47 AM GMT

Is an MD/PhD practically ~necessary for groundbreaking scientific work of the 20th century?

I went into this project believing that an MD/PhD is ~necessary on a practical level to do groundbreaking STEM work. My model was this:

"These degrees come with credibility; access to expensive equipment, funding, and data; access to mentors and collaborators. A smart person who sets out to do groundbreaking STEM work will have a much lower chance of success if they don't acquire an MD/PhD. Massive, sustained social coordination is ~necessary to do groundbreaking research, and the MD/PhD pipeline is a core feature of how we do that. Without that degree, grant writers won't make grants. Collaborators won't want to invest in the relationship. It will be extremely difficult to convince anybody to let someone without a terminal degree run a research program."

Ideally, we'd answer this question with an RCT. Perhaps we'd take a pool of 100,000 successful biology PhD applicants, deny half of them the opportunity to enroll in a PhD or MD for the rest of their lives, and find some way to compare the two groups for their scientific accomplishments.

Nobody's going to let me run that experiment. So I tried looking at the proportion of recent STEM Nobel Prize winners who hold terminal degrees.

Of the 2018-2019 winners of the Nobel Prizes in chemistry, medicine, physics, and economics, all of them have a PhD. I also peeked at the 2018 Fields Medal winners. 3/4 have a PhD and the last one completed at least several years of PhD work. The most recent 3 individuals listed for an OpenPhil grant (7/9/2020) in Human Health and Wellbeing (a category I chose arbitrarily) all have PhDs.

Do credentials determine who gets the awards for groundbreaking work?

Zvi asks whether these awards might be mainly just reflections of the credentials of the applicants. Are these awards cherry-picking from among the set of all possible groundbreaking discoveries or projects the ones that are invented or headed by terminal degree-holders? If so, then these awards and grants might be severely under-representing groundbreaking STEM discoveries made by scientists and inventors who don't hold a terminal degree.

An alternative approach is to look at historical lists of inventions-yet-to-be-made and see who invented them. Did they have a terminal degree?

For fun, let's evaluate Robert Boyle's desiderata. I'm not sure when he wrote it, but he lived from 1627-1691.

"It's 1673! Where's my Varnishes perfumable by Rubbing?"

Some of these can't be readily disambiguated, so we'll either ignore them (strikethrough) or choose a few inventions that fit the bill. Becoming a doctor only required a 4-year degree starting around 1930, and I think it's reasonable to limit ourselves to inventions of the last 100 years (1920-2020).

  • The Prolongation of Life (can you really call this living?)
  • The Recovery of Youth, or at least some of the Marks of it, as new Teeth, new Hair colour'd as in youth (cosmetic surgery was mostly pioneered prior to the 1920s).
  • The Art of Flying (first manned flight and airplane prior to 1920).
  • The Art of Continuing long under water, and exercising functions freely there (SCUBA and submarines invented prior to 1920).
  • The Cure of Wounds at a Distance. Robotic surgery? The first example, the Arthrobot, was invented in 1985 primarily by "biomedical engineer James McEwen, Geof Auchinleck, a UBC engineering physics grad, and Dr. Brian Day as well as a team of engineering students." McEwen has a PhD, Auchinleck has a BASc, and Day has an MD. Who knows about the engineering students?
  • The Cure of Diseases at a distance or at least by Transplantation. The first kidney transplant was performed in 1954 by a team of MDs led by Joseph E. Murray.
  • The Attaining Gigantick Dimensions (do you even lift, Boyle?)
  • The Emulating of Fish without Engines by Custome and Education only (ummm...)
  • The Acceleration of the Production of things out of Seed. Norman Borlaug, who led the Green Revolution, had a PhD.
  • The Transmutation of Metalls. It's probably not quite what Boyle was dreaming of, but the conversion of radioactive thorium to radium was discovered by Frederick Soddy and Ernest Rutherford. I don't believe Soddy had a PhD. Rutherford earned his doctorate the same year they made the discovery.
  • The makeing of Glass Malleable (clear plastic was invented prior to the 1930s).
  • The Transmutation of Species in Mineralls, Animals, and Vegetables. The first transgenic animal was created by a team led by Thomas Wagner and Peter C. Hoppe, both PhD holders.
  • The Liquid Alkaest and Other dissolving Menstruums (ambiguous)
  • The making of Parabolicall and Hyperbolicall Glasses (invented prior to 1930s)
  • The making Armor light and extremely hard (first commercially sold bulletproof vest invented prior to 1930s)
  • The practibable and certain way of finding Longitudes (invented prior to 1930s)
  • The use of Pendulums at Sea and in Journeys, and the Application of it to watches (invented prior to 1930s)
  • Potent Druggs to alter or Exalt Imagination, Waking, Memory, and other functions, and appease pain, procure innocent sleep, harmless dreams, etc. Anesthetics were pioneered before the 1930s. Let's go with LSD and antidepressants. LSD was invented by Albert Hofmann, who had a PhD. The first antidepressant was invented by Irving Selikoff and Edward H. Robitzek, both of whom were post-1930s American MDs.
  • A Ship to saile with All Winds, and A Ship not to be Sunk (invented prior to 1930s/never invented, unless you count the Titanic).
  • Freedom from Necessity of much Sleeping exemplify’d by the Operations of Tea and what happens in Mad-Men. Modafinil, I guess? Michel Jouvet invented it, and had a PhD.
  • Pleasing Dreams and physicall Exercises exemplify'd by the Egyptian Electuary and by the Fungus mentioned by the French Author (If MDMA counts, it was synthesized prior to 1930).
  • Great Strength and Agility of Body exemplify’d by that of Frantick Epileptick and Hystericall persons. They already had powerful domesticated animals and machines to harness wind and water, so Boyle was probably thinking of a machine that was both very strong and yet intelligently guided, like a robot. George Devol, inventor of the first robot, had no higher education.
  • A perpetuall Light (the lightbulb was invented prior to the 30s, but mine still burns out, and I hear that entropy is always increasing...)
  • Varnishes perfumable by Rubbing. Uh, saving the best for last, Boyle? Gale Matson was a 3M chemist, inventor of the technology underlying scratch and sniff in the 1960s, and he had a PhD.


Full disclosure: most of the inventions from prior to 1930 were created by people without terminal degrees, ranging from the tailor who created the first bulletproof vest to the Wright brothers, who didn't have high school diplomas.

I knew that Borlaug and Hofmann had PhDs before I made this list. They both seemed like the most obvious 20th century choices for the categories in Boyle's list, but still, you could easily accuse me of cherry-picking. But my process in general was to choose an invention that seemed to fit the bill, and only then look up the inventor.

Of the 15 named scientists who headed up the 9 groundbreaking inventions on this list, 3 didn't have terminal degrees (Auchinleck, Devol, and Soddy).

Note that Soddy did win a Nobel Prize for his work, despite not having a terminal degree. It's hard to say which, if either, of the following two arguments the existence of post-1930s non-PhD/MD STEM Nobel Prize winners supports:

a) The fact that STEM Nobel Prizes do sometimes get awarded to non-PhD/MDs confirms that they're awarded on merit, not credentials. Thus, the fact that the overwhelming majority do have PhDs/MDs suggests that terminal degrees really are ~necessary.

b) The fact that STEM Nobel Prizes do sometimes get awarded to non-PhD/MDs confirms that you don't need one to do groundbreaking work. Perhaps innovative people just tend to get PhDs/MDs, but they would still have found a way to make their groundbreaking inventions without those degrees.


It's a small sample, but overall, I'm surprised that 20% of these inventors didn't have PhDs/MDs. That's a point against my original argument.

It's interesting that two of them worked in robotics. I'm hesitant to generalize, but if I did this with other historical lists, I wonder what the proportion of non-credentialed groundbreaking inventors would be, and what fields they worked in. That might be a way of getting at the fields or types of important inventions most tractable for a smart, motivated outsider.


As Few As Possible

July 10, 2020 - 07:37
Published on July 10, 2020 4:37 AM GMT

All of economics, every last bit of it, is about scarcity. About what is scarce and what is not, and about who does and who doesn’t get their needs (and sometimes wants) satisfied.

Much of the debate about healthcare is in fact a scarcity problem. There aren’t enough practitioners to handle every patient, so some people don’t get doctors. It’s a combination of self-selection, where people who can’t afford to take time off to have an ingrown toenail treated professionally hack at it with a pocketknife instead; insurance selection, where granny’s insurance will pay for hospice but not a new hip; and actual medical discretion now and then.

But this is about what medical care does right. Triage.

In times of very acute scarcity when many people are injured and need medical attention to survive, but few doctors are available to treat them, medicine does triage. The details can be complex, but one of the features of triage is that if doctors are too scarce to save everyone, they prioritize saving *as many people as possible*, even though that often means giving specific people no treatment at all. The scarcity is distributed so that *as few people as possible* suffer from scarcity.

This is not about mass trauma triage.

Imagine for a moment how utterly absurd it would be if there was an earthquake and there were enough doctors to simultaneously treat everyone injured in it, but they insisted on following triage protocols and the very worst injured were given black tags indicating “do not treat”, even as surgeons who could save them sat idle, because everyone else was being treated. No civilized person would defend that kind of thing.

But this is not about hypothetical earthquakes with plenty of doctors.

The world produces significantly more food than needed to feed everyone. Quite a few people actually are angry that much of it is destroyed even though many people starve for lack of food. Many agree that it is unacceptable that, even though there is enough to go around, not enough goes around.

This is not about hunger or food waste, or stores throwing bleach in the dumpsters to keep people from eating stale crackers.

This is about scarcity, and who gets it.

By its very nature as a lack of enough, if scarcity exists then someone must get it; there’s no accounting trick to pay Tuesday if there is no hamburger today. Various economic philosophies try to distribute scarcity in various ways. Some try to give the scarcity to the least productive, some try to split it as evenly as possible, still others just admit that the scarcity goes to the lowest status or losers at violence. Various implementations of those principles follow their philosophies with varying degrees of faithfulness and effectiveness.

But this is not about economics. This is about one of the fundamental decisions of morality: who gets the scarcity?

There is only one answer I can possibly accept:

As few people as possible, and no more.

A nurse in a triage ward who has to mark the patient who will not be treated should not be as sad at what they do as the one who orders that food be destroyed, lest some bargain hunters pilfer from the dumpster instead of buying from the commercial establishment. The hypothetical trolley controller standing at his hypothetical lever and wondering if he is a murderer would be aghast at a zoning board’s decision that causes dozens of non-hypothetical people to die of exposure so that a handful of high-status people can keep what they have: the eldest sons of the ones who got credit for industrializing land that was itself credited to the winners of violence.

Any political or economic philosophy or policy that ever wantonly destroys a scarce thing, or fails to produce a scarce good that could have been produced, is evil. Those who manufacture scarcity for their own profit, to accumulate their own positional goods, have created a new hell. Not for themselves, but for their victims, and on Earth.

When there is not enough to go around, it doesn’t go all the way around. The answer is the same whether it be food, shelter, textbooks, medical attention, love, mosquito nets, or anything else that people need. Who should miss out?

As few as possible.


Kelly Bet on Everything

July 10, 2020 - 05:48
Published on July 10, 2020 2:48 AM GMT

Cross-posted, as always, from Putanumonit.

It’s a core staple of Putanumonit to apply ideas from math and finance out of context to your everyday life. Finance is about making bets, but so is everything else. And one of the most useful concepts related to making bets is the Kelly criterion.

It states that when facing a series of profitable bets, your wagers should grow proportionally with your bankroll and with your edge on each bet. Specifically, you should bet a percentage of your bankroll equivalent to your expected edge — if a bet has a 55% chance to go your way, your edge is 55%-45%=10% and you should risk 10% of your bankroll on it (assuming equal amounts wagered and won). There could be reasons to avoid betting the full Kelly in practice: you’re not sure what your edge is, your bet size is limited, etc. But it’s a good guide nevertheless.
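As a quick sketch of that arithmetic (the function and its name are mine, not the post's; it uses the standard Kelly formula f* = p - (1 - p)/b, which reduces to the bare edge when amounts wagered and won are equal):

```python
def kelly_fraction(p, b=1.0):
    """Kelly-optimal fraction of bankroll to wager.

    p: probability the bet goes your way.
    b: net payout per unit staked (b=1 means equal amounts wagered and won).
    Returns p - (1 - p) / b; a negative result means the bet isn't profitable.
    """
    return p - (1 - p) / b

# The post's example: a 55% bet at even money -> risk your 10% edge.
print(kelly_fraction(0.55))   # ~0.10
# A 75%-25% even-money bet -> risk half of everything.
print(kelly_fraction(0.75))   # ~0.50
```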

People’s intuition is usually that Kelly bets are too aggressive, that betting half of everything you have on a 75%-25% bet is too wild. But the Kelly criterion is actually quite conservative in that it maximizes not the expected size of your bankroll but its expected logarithm. “Exponential” means “fast and crazy”; logarithm is the inverse of that. It’s slow and cautious. If you have $1,000 and you’d risk no more than $750 for an equal chance to win $3,000, you’re logarithmic in dollars and should “bet the Kelly”.
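That $750 figure can be checked directly: for a log-utility bettor with $1,000, risking $750 at 50/50 for a $3,000 win leaves expected log-wealth exactly equal to not betting at all, so it is the most such a bettor would stake. A minimal sketch (the function and setup are my illustration, not from the post):

```python
import math

def expected_log_wealth(bankroll, stake, win, p=0.5):
    """Expected log of wealth after risking `stake` to win `win` with probability p."""
    return p * math.log(bankroll + win) + (1 - p) * math.log(bankroll - stake)

baseline = math.log(1000)  # expected log-wealth if you just hold the $1,000

# Risking $750 for a $3,000 win at 50/50: exactly break-even in log terms.
print(expected_log_wealth(1000, 750, 3000) - baseline)   # ~0.0

# Scaling the same 4:1 bet past $750 makes a log-utility bettor worse off.
print(expected_log_wealth(1000, 800, 3200) - baseline)   # negative
```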

Log scales apply to the difficulty and value you get for most things. Life satisfaction grows with log(money). Making a new friend is probably one tenth as valuable to someone who has 10 friends as to someone who has one, so your social life depends on log(friends). It’s equally hard to double one’s number of blog readers, sexual partners, job offers, etc. regardless of how many you have, as opposed to incrementing each by a fixed amount. It’s equally valuable too.
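The “one tenth” figure falls out of the logarithm: the marginal value of friend n+1 is log(n+1) - log(n), which is roughly 1/n. A toy check (the function is mine, purely illustrative):

```python
import math

def marginal_value(n):
    """Extra value from one more friend, if total value is log(number of friends)."""
    return math.log(n + 1) - math.log(n)

# The 11th friend vs. the 2nd friend: roughly a tenth as valuable.
print(marginal_value(10) / marginal_value(1))  # ~0.14, in the ballpark of 1/10
```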

And so, for most things, it makes sense to bet the Kelly. You’ll need to find out what bets are available, where your edge is, and what your bankroll is.


Money

Let’s start with the obvious one. What kind of Kelly bets can you make with money? Investments are the most literal example, and standard investment advice is to switch to high-risk-high-return assets when you have some money to spare.

You can also make bets on your ability to make money: take on a side project, look for a new job, start your own business, ask for a raise. Each one entails a risk and a possible reward. Your bankroll is your literal bankroll, your edge is your ability to make money for yourself or your employer.

People have a tendency to think that if they’re paid $N a month their value to their employer is something like N and a half, but that’s often way off. Some people are worth less than what they are paid, but are kept around because their boss can’t tell. Some people are worth 10x their salary — an employer has no reason to pay you more if you don’t ask for it. I quit a job once and immediately got offered a 30% raise to come back. I did some math on what I’m worth, gambled on asking for 50%, and got it.


Friendship

When your friendships are few and tenuous, people’s inclination is to play it safe and conform to the crowd. It won’t make you a social star, but it won’t turn people away either. But if you have an edge in popularity and enough close friends to fall back on you can make some bets on your own vision.

When I was younger and struggled to make friends I’d just wait to be invited to parties. When I finally figured it out and acquired a rich social life I started throwing my own events the way I like them: controversial topic parties, naked retreats in the woods, psychedelic rationality workshops. Each one is a gamble — the event could fail or people could just not show up. In either case I’d lose some of the status and goodwill that allowed me to plan those events in the first place. But when it works the payoff is equally great.

Creative Talent

Whatever creative outlet you have, you get better by getting feedback from the audience. Show people your paintings, read them your poems, invite them to your shows, link them to your blog. This is a gamble — if people don’t like what you’re making you won’t get their attention next time.

When I just arrived in NYC I was doing stand-up and would perform at bringer shows where you get stage time if you bring 3 or 4 paying guests. My ability to do that depended on the number of friends willing to humor me (bankroll) and my humor (edge). By the time I got decent enough to get an invite to a non-bringer show I had just about run out of comedy-tolerating friends to call on.


Romance

The most obvious way to bet on yourself in romance is to flirt with people “outside of your league”, your bankroll being in part your ability to take rejection in stride and stay single for longer. The same applies the other way, with making the bet on breaking up a relationship that is merely OK in hopes of something better.

But you can also bet on an existing relationship. If the person you’re dating just got into a school or job in a faraway city, your ability to go long-distance for a while depends a lot on the bankroll of relationship security you have. Ethical non-monogamy is a similar gamble: if you don’t have an edge in making your partner happy they may leave you. If you do, their happiness only doubles with their ability to date other people, and polyamory makes you all the more attractive as a partner.

Polyamory makes bad relationships worse and good ones better; if you only know people who opened up when their relationship started deteriorating you’re liable to miss this fact.


Sanity

Psychedelics can drive you insane. They can also make you saner than you’ve ever been. The same applies to meditation, mysticism, esoteric ideologies, and whatever else Bay Area Rationalists are up to. Epistemic Rationality is your bankroll and your edge.


Reputation

A lot of people are seeing the rise in callout and cancel culture purely as a threat, a reason to go anonymous, lock their accounts, hide in the dark forest of private channels. But where there’s threat there’s also opportunity, and where reputations can be lost they can also be made. Chaos is a ladder.

In 2015 a comment on Scott Aaronson’s blog went viral and threatened to spark an outrage mob. Aaronson didn’t expect that popular feminist writers would dedicate dozens of pages to calling him an entitled privileged asshole for expressing his frustrations with dating as a young nerd. But he also didn’t expect that Scott Alexander would write his most-read blog post of all time in defense of Aaronson, and that the entire Rationalist community would mobilize behind him. This wouldn’t have happened if Aaronson hadn’t proven himself a decent and honest person, writing sensitively about important topics under his real name. Aaronson’s reputation both online and in his career has only flourished since.


Children

Having children is a bet that you have enough of an edge on life that you can take care of another human and still do well. The payoff is equally life-changing.

Risk Averse Irrationalists

I wrote this post because of my endless frustration with my friends who have the most slack in life also being the most risk averse. They have plenty of savings but stay in soul-sucking jobs for years. They complain about the monotony of social life but refuse to instigate a change. They don’t travel, don’t do drugs, don’t pick fights, don’t flirt, don’t express themselves. They don’t want to think about kids because their lives are just so comfortable and why would you mess with that?

They often credit their modest success to their risk-aversion, when it’s entirely due to them being two standard deviations smarter than everyone they grew up with. By refusing to bet on themselves they’re consigned forever to do 20% better than the most average of their peers. To make 20% more money with a 20% nicer boyfriend and 1.2x the Twitter followers.

And partly, I wrote this post for me. I spent my twenties making large bets on myself. I moved to the US nine years ago today, all alone and with a net worth of $0. I found polyamory and the love of my life. I started a blog under my real name, with my real opinions, on real topics.

Now in my mid-thirties my life is comfortable, my slack is growing, and I’m surrounded by younger friends who know all about discretion and little about valor. This post is a reminder to keep looking for my edge and keep pushing the chips in. There’s plenty more to be won.


How far along are you on the Lesswrong Path?

July 10, 2020 - 01:22
Published on July 9, 2020 9:41 PM GMT

How far along are you on the Path?

Some say the Path has always existed and there have always been those who have walked it. The Path has many names and there are many notions of what the Path is.

To some the path is called philosophy, to others it is the art of rationality. To my people, it was called science and the pursuit of truth. Physics, psychology, neuroscience, machine learning, optimal control theory. All of these are aspects of the path. It has no true name.

The only way to know with certainty whether you walk the path is by talking to those who don't walk the path.

One man, Scott Alexander, describes the path as follows:

"Some people are further down the path than I am, and report there are actual places to get to that sound very exciting. And other people are around the same place I am, and still other people are lagging behind me. But when I look back at where we were five years ago, it’s so far back that none of us can even see it anymore, so far back that it’s not until I trawl the archives that I realize how many things there used to be that we didn’t know."


How to tell where you are on the path?

One day, a man comes up to a woman and says:

"I am further down the path than you are."

In response, the woman says:

"No, I am further down the path than you are."

Instantly, both people recognize the solution as they have both at least progressed some degree through the path. The man speaks:

"Only one of us can be more right than the other. Let us make a prediction about all things in this world for all time and whoever is more correct about the state of the world for all time is closer to the end of the path."

Realizing the recursive time constraint of determining who is correct with infinite precision, the woman instead points to a random child.

"Whoever can guess correctly how this child behaves for the next 1 minute has walked further along the path and we will default to their position as truth when we are in conflict and have no time to discuss the details."

The man proceeds to describe in detail the neurobiological system of the baby and all the deterministic forces that would lead the baby to breathe, think, move in the manner in which he predicted it would. He then goes on to describe all the biases in their environment and how it would play a role in the way the child would act.

The woman looks at the man and says, "You are wrong. You are doubly wrong and you do not understand the nature of the path at all."

She writes something down on a piece of paper, gives it to the man and tells him to open it in 10 seconds.

He looks at it for a couple seconds and then, realizing it's time, he opens it.

"The baby will cry and you are an idiot. I create the future."

In that moment, he hears a cry and realizes that she was much farther along the path than he was. She had gone to the baby in the seconds he was focusing on the paper, picked it up and pinched it with what looked like significant force.

The woman looks at him and asks:

"Do you understand?"

He thinks for a long while and replies:

Either I have the ability to affect the entropic state of the universe or I don't. If I can, then I can create any future constrained by energy and possible state transitions. The truest prediction is one that I am already causally bound to. In the case where I can't affect the entropic state of the universe, I should still act and believe as if I do, because it is the most effective way within the closed system to affect the likelihood of a prediction.

She looks at him with a smile and a surprising glint of curiosity in her eyes. She thinks to herself silently:

Close. I was once where you were. There is a flaw in that logic. The flaw is axiomatic and has to do with the essence of reality. Pursue this question: does reality exist if you have no sensors with which to perceive it?

The answer lies in this. What is the difference between a prediction, the current perceived state and the reason to transition to the next state?

But instead says,

The ability to manipulate the system is far more important than the ability to predict its outcomes. At some point your ability to manipulate the system will become equivalent to the best prediction system in the world.

My Path and Others

The path isn't linear and it isn't constrained to a single dimension. It is at the very least 4 dimensional and has no boundaries or edges as far as I know.

Some Condensed Examples of Path Progression

  • Scott'09 -> Scott'2020
    • Metropolitan Man -> Slate Star Codex -> Secret
  • Eliezer'09 -> Eliezer'2020
    • HPMOR -> Less Wrong -> AI to Zombies
  • Aires'09 -> Aires'20
    • MM/HPMOR Reader -> Engineer -> Kinect -> Hololens -> MASc in AI -> Uber Michelangelo Founder -> Uber AI -> Meta-learning/Neuroevolution -> Emotional Tensor Theory

I forked from Lesswrong in 2009, when I originally worked in SF, and haven't returned except for a brief stint in 2011/12 when I returned to the bay area.

In my pursuit of rationality, AI and AGI, I sought to analyze the human emotion system from a neurobiology and machine learning perspective.

What is represented by the feelings we feel, i.e. what does the embedded neurotransmitter representation of emotions actually correspond to in terms of hardware and information theory?


Does Lesswrong have a blindspot related to emotions?

A quick search of emotions in the Lesswrong Archives shows fewer than 5 results.

  • Why is there such a large gap of exploration into emotions on Lesswrong? Is it because they are colloquially anathema to rationality?
  • Is it an inherited bias from the ideology of the Lesswrong creators, or simply ignorance?
  • Perhaps emotions aren't relevant in any way and are already encompassed by rationality?


Call for Aid: Lesswrong 2.0 is enormous, as is the path I have walked. I'm sure that while there is overlap, there are likely very strong points of contention in both how AI systems work and how human systems work. Help me find them.

I'd love to talk to two or three Lesswrong experts for two 1.5-hour sessions in July/August. If you'd like to help me, please comment directly and we can set up a calendar invite over email.


Slate Star Codex and Silicon Valley’s War Against the Media

July 9, 2020 - 23:25
Published on July 9, 2020 7:20 PM GMT