LessWrong.com News
2025 Alignment Predictions
I’m curious how alignment researchers would answer these two questions:
- What alignment progress do you expect to see in 2025?
- What results in 2025 would you need to see for you to believe that we are on track to successfully align AGI?
Grading my 2024 AI predictions
On Jan 8 2024, I wrote a Google doc with my AI predictions for the next 6 years (and slightly edited the doc on Feb 24). I’ve now quickly sorted each prediction into Correct, Incorrect, and Unclear. The following post includes all of my predictions for 2024 with the original text unedited and commentary in indented bullets.
Correct
- There is a viral app (probably Suno) for generating music, reaching 1 million users by July 2024
- An open source GPT-4-level model is released.
- Llama 3.1 probably fits the bill.
- Adept AI and similar publicly available browser-based assistants are still not useful enough to be used on browser windows without being supervised by a human for more than 30 seconds. They still have problems like clicking on the wrong part of the screen, getting lost, getting distracted, etc.
- I haven’t seen any agents that are actually able to navigate a browser competently yet.
- Sora is released to customers who apply for access.
- If OAI makes the video continuation feature available, many new memes are created where people use Sora to extend existing videos in funny ways or stitch two videos together.
- Example (although these don’t use Sora). I find it amusing how specific this prediction was. Possibly I’d already seen an example at that point?
- We will see the first signs of evals for moral patienthood in LLMs. Some of the AGI labs will make a public statement where they mention this possibility.
- The Anthropic Fellows Program is looking for people to work on “AI Welfare: Improving our understanding of potential AI welfare and developing related evaluations and mitigations.”
- 6/12 METR tasks are complete
- This suite is deprecated but my best guess is that it would resolve Correct.
- 1/5 ARA tasks are complete
- This suite (the five tasks described in Anthropic's original RSP) is deprecated but it’s likely true that Claude 3.5 Sonnet Upgraded would complete at least one task.
- AI music leads to protests/complaints in the music industry. Major artists (>3% of playtime-weighed artists) say something like “AI music is bad!”.
- Microsoft Copilot and similar tools increase office worker productivity (speed on <1hr tasks) by 25%. Most of the accelerated labor is pretty menial (making presentations, writing emails, making/manipulating spreadsheets)
- OpenAI keeps some sort of logs for more than half the videos it generates, so that it can power an AI-generated video detection tool which checks videos against their database to check if they’re Sora-generated.
- Many artists (especially those working on filmmaking/3D animation) are pissed off by [Sora], and protests against selling AI generated video happen. They’re of a similar size (within 3x) to the Hollywood screenwriters protests.
- Twitter debuts a system (or modifies an existing system) to mark AI-generated video as AI-generated.
- Sora, when prompted to play Minecraft, with some GPT-4 scaffolding and a keyboard/mouse screen overlay, can semi-competently play Minecraft. It mostly fails at fine motor control tasks, including aiming at trees, using the inventory, and similar. However, it plays at much slower than real time, as the API isn’t set up for this kind of one-frame-generation type of setup.
- No one has tried this afaik but it’d probably fail. When it generates Minecraft from scratch it hallucinates a lot so I’m guessing it wouldn’t be that good at playing it.
- GPT-5 or GPT-4.5 is released, which is noticeably more capable than GPT-4
- GPT-4o and o1 came out which broke the GPT-N pattern, but their capabilities are roughly what I’d expect from a GPT-4.5 model.
- There are US headlines of (accusations of) AI-assisted election interference in a country with a population of at least 10M, probably the US. The interference is mostly done by flooding social media websites with semi-convincing fake personas (that a media-literate person can spot after 2 minutes of looking into them). Most of the bots make public posts and some DM people with personalized approaches (catering to people’s interests and opinions). It’s done using an open source or hidden state-owned model.
- The Joe Biden robocalls in New Hampshire were somewhat well-known but not big enough of a deal to make this resolve Correct.
- DARPA announces the winners of the AI cyber challenge. They are very underwhelming to the alignment community (if we think about the results at all), not taking into account superhuman hacking abilities, but there are some good nuggets (progress toward quick automatic threat detection).
The main pattern I notice looking back at my 2024 predictions is that benchmarks and capabilities increase quickly, but real-world impacts (especially societal backlash and protests) are slower than I'd expect.
Practicing Bayesian Epistemology with "Two Boys" Probability Puzzles
The Puzzles
There's a simple Monty Hall-adjacent probability puzzle that goes like this:
Puzzle 1
I have two children, at least one of whom is a boy. What is the probability that both children are boys?
A more complex variation recently went viral on Twitter:
Puzzle 2
I have two children, (at least) one of whom is a boy born on a Tuesday — what is the probability that both children are boys?
Then Isaac King tweeted an even more complex variation:
Puzzle 3
I have two children, at least one of whom is a boy born on a day that I'll tell you in 5 minutes. What is the chance that both are boys, and what will the chance be after I tell you the day?
All three versions are fun and worth a try if you want to learn and practice Bayesian reasoning.
Personally, I found Isaac's version MUCH harder than the others. I was surprised how hard it stumped me since I had a pretty easy time with the first two (LessWrong trained me well). As I stared at it for longer than I want to admit, the gears of my brain kept jamming. I couldn't see a coherent non-paradoxical story for what the Bayesian updates should look like.
I recommend giving Puzzle 3 a try before reading my solution.
By the way, I tested it on GPT-o1 and Claude 3.5 Sonnet and they only give incorrectly-reasoned wrong answers. It'll be interesting to see if o3 can do better.
The Solution
First, when we hear the "I have two children, at least one of whom is a boy" part, we set the probability of two boys to 1/3 because the possibilities {(boy, girl), (girl, boy), (boy, boy)} are a-priori equally likely and we haven't had a reason to update their relative likelihoods.
Then when we hear "I'll tell you the day that at least one was born on", we don't need to update the three relative likelihoods because it's a statement we were equally likely to hear in all three possible worlds.
Now the tricky part… When we subsequently hear a particular day, e.g. "Friday", how should we update the relative probabilities?
It seems like we shouldn't update, because each weekday was a-priori equally likely to be the one we hear, and if hearing any weekday were going to update us in some particular direction, why couldn't we have just made that update before we heard the particular day?
In other words, why wouldn't we pretend the parent mumbled the day and we couldn't make out the word, and make the update anyway, since it's going to be the same update regardless of which day he says?
Indeed, hearing the particular day doesn't trigger an update; the correct answer to the puzzle is the intuitive one…
ANSWER:
The probability that both children are boys stays 1/3 the whole time
The confusing part is that when we compare the answer of "don't update on the birth day of the week information" to Puzzle 2's answer, it seems inconsistent or paradoxical.
Puzzle 2 asks:
I have two children, (at least) one of whom is a boy born on a Tuesday - what is the probability that both children are boys?
Puzzle 2's answer is larger than 1/3; the (boy, boy) world gets more likelihood for being more consistent with the evidence of having at least one boy born on a Tuesday: 13/49 for (boy, boy), compared to 1/7 (i.e. 7/49) each for (boy, girl) and (girl, boy).
The posterior probability of the (boy, boy) world is thus 13 / (13 + 7 + 7) = 13/27.
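If you want to check that arithmetic mechanically, here is a minimal brute-force sketch: it conditions on the literal statement "at least one child is a boy born on a Tuesday" over the 196 equally likely (sex, day) combinations (the day labels are arbitrary).

```python
from fractions import Fraction
from itertools import product

SEXES = ["boy", "girl"]
DAYS = range(7)          # label the days of the week 0-6; let 2 stand for Tuesday
TUESDAY = 2

# All 2 * 7 * 2 * 7 = 196 equally likely (sex, day, sex, day) combinations.
worlds = list(product(SEXES, DAYS, SEXES, DAYS))

consistent = [w for w in worlds
              if (w[0], w[1]) == ("boy", TUESDAY) or (w[2], w[3]) == ("boy", TUESDAY)]
both_boys = [w for w in consistent if w[0] == "boy" and w[2] == "boy"]

print(Fraction(len(both_boys), len(consistent)))   # -> 13/27
```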
But why doesn't this same visualization and this same calculation apply to the new puzzle (Isaac's twist)? If we hear "Tuesday" in the new puzzle, shouldn't we similarly update our probability of two boys from 1/3 to 13/27 ???
IMO this is quite a juicy apparent paradox, and gets to the heart of why most people underestimate Bayesian epistemology. People don't realize how subtle and powerful it is when wielded by a trained practitioner.
Let's think about the scenarios that make the parent in the new puzzle say "Tuesday":
- We're in the (boy, girl) world and the boy is born on Tuesday
- We're in the (girl, boy) world and the boy is born on Tuesday
- We're in the (boy, boy) world and only the older boy is born on Tuesday
- We're in the (boy, boy) world and only the younger boy is born on Tuesday
- We're in the (boy, boy) world and both boys are born on Tuesday
So far, the diagram above that we used for the original puzzle still looks like it models the situation…
The key is to realize that in scenarios #3 and #4, we don't always hear the parent say "Tuesday". Half the time, we hear the parent say the name of the weekday that the other boy was born on!
In the diagram below, the shading of squares in the (boy, boy) quadrant doesn't just represent the fraction of scenarios wherein the parent could say "Tuesday", it represents the probabilistically weighted fraction of scenarios wherein the parent does say "Tuesday":
The shaded half-squares conveniently make the (boy, boy) quadrant's shaded part add up to 1/49 + 12(0.5/49) = 7/49, just like the (boy, girl) and the (girl, boy) quadrants' masses do, allowing us to rationally answer the puzzle with our a-priori probability of 1/3.
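Here's a quick Monte Carlo sanity check of that resolution, a throwaway sketch under the assumption the shading argument uses: when both children are boys with different birth days, the parent is equally likely to announce either day.

```python
import random

random.seed(0)
N = 1_000_000
heard_tuesday = heard_tuesday_and_two_boys = 0

for _ in range(N):
    kids = [(random.choice("BG"), random.randrange(7)) for _ in range(2)]
    boys = [day for sex, day in kids if sex == "B"]
    if not boys:
        continue                     # a girl-girl parent never poses this puzzle
    announced = random.choice(boys)  # day of a uniformly chosen boy
    if announced == 2:               # suppose day 2 is "Tuesday"
        heard_tuesday += 1
        if len(boys) == 2:
            heard_tuesday_and_two_boys += 1

print(heard_tuesday_and_two_boys / heard_tuesday)   # ≈ 1/3, not 13/27
```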
QED
More Bayesian analysis
Now that we're over the hump — problem solved, paradox resolved — let's see what insights we can glean about Bayesian reasoning.
Consider the humble Puzzle 1:
I have two children, at least one of whom is a boy. What is the probability that both children are boys?
I explained above that it's 1/3 because the possibilities {(boy, girl), (girl, boy), (boy, boy)} are equally likely.
But in fact, the parents I know are much more likely to ask you that question in the first place in worlds where they have two boys. Then they'll smirk and say "wanna bet?" and you'll lose the bet.
But if you don't see that coming, don't blame Bayesian reasoning; blame your own lack of mastery of Bayesian reasoning. If the naive calculation gives you 1:2 odds of (boy, boy), but your understanding of parent humor tells you it's 3 times more likely that parents with two boys would spring that puzzle on you, then you should actually be assigning 3:2 odds of (boy, boy), not 1:2.
You might be thinking: “Fine, but can't ‘I have two children, at least one of whom is a boy; what's the probability that both are boys?’ still be interpreted as a problem of pure math? Why go on a tangent to talk about real-life parents?”
Well, actually the puzzle statement contains a default assumption which — while clear enough — is not at the level of a rock-solid default assumption to accept.
The implied assumption is about how the reality of the parent's kids affects what the parent says to you. Basically:
- If we're in the (boy, girl), (girl, boy) or (boy, boy) world, then the parent asks you the puzzle.
- If we're in the (girl, girl) world, then the parent says nothing or asks you a different puzzle, perhaps one about girl children.
But consider an alternative assumption. What if we assume that the relationship between the reality of the parent's kids and the puzzle he gives you is as follows:
- If we're in the (boy, boy) or (girl, girl) world, then he challenges you with the puzzle about his boy or girl children, respectively.
- If we're in the (boy, girl) or (girl, boy) world, then he randomly selects which gendered version of the puzzle to challenge you with.
Let's assume the parent operates in this "equal-opportunity gendered puzzle" mode, and now consider what it means when he asks you Puzzle 1:
I have two children, at least one of whom is a boy. What is the probability that both children are boys?
It's still true that {(boy, girl), (girl, boy), (boy, boy)} were a-priori equally likely possibilities. But now you have to consider that, in the (boy, girl) and (girl, boy) worlds, half the probability flowed into worlds where the parent gives you the girl version of the puzzle, so only half of those squares' original probability flows into the world where you receive the evidence of hearing this particular puzzle.
Under our new assumption, the answer to the easy puzzle is arguably more intuitive than the result of the original puzzle: The probability that the parent has two boys is 1 / (1 + 0.5 + 0.5) = 1/2, not 1/3.
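The same bookkeeping in code form, a minimal sketch under the "equal-opportunity gendered puzzle" assumption described above: weight each family type by how likely this parent is to pose the boy version of the puzzle.

```python
from fractions import Fraction

# Chance the parent poses the *boy* version of the puzzle in each world,
# under the "equal-opportunity gendered puzzle" assumption.
poses_boy_puzzle = {
    ("boy", "boy"):   Fraction(1),
    ("boy", "girl"):  Fraction(1, 2),
    ("girl", "boy"):  Fraction(1, 2),
    ("girl", "girl"): Fraction(0),
}

prior = Fraction(1, 4)   # each ordered pair of sexes is a-priori equally likely
total_mass = sum(prior * p for p in poses_boy_puzzle.values())
two_boys_mass = prior * poses_boy_puzzle[("boy", "boy")]

print(two_boys_mass / total_mass)   # -> 1/2
```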
Ok, but why is the "default assumption" the one that gets you 1/3 as the answer to the easy puzzle? Especially since it's the less intuitive answer (IMO)?
I think it's because any time you hear a piece of information in a math puzzle context, you're supposed to assume that the correct way to calculate a posterior probability is to just count the number of possible world-states that are logically consistent with the puzzle's new object-level proposition. You do this kind of count twice to get probability as a fraction: once for the numerator and once for the denominator. I hear there are some quirky people called "frequentists" who consider the non-Bayesianness of the default interpretation to be a feature, not a bug, of these kinds of puzzles.
When Isaac twisted the puzzle by having the parent send us different pieces of information at different times, he made it impossible to only perform updates on the explicit content of the parent's words, because it's necessary to incorporate what we know (or rather, very reasonably assume) about how the parent's future statements are probabilistically related to the underlying facts about their kids.
I hope you've gained more appreciation for the power and subtlety of Bayesian epistemology by solving and analyzing these simple-looking puzzles.
Implications of Moral Realism on AI Safety
Epistemic Status: Still in a brainstorming phase - very open to constructive criticism.
I'll start by clarifying my definition of moral realism. To begin with an example, here is what a moral realist and anti-realist might say on the topic of suffering:
Moral Realist: The suffering of sentient beings is objectively wrong, therefore I want to minimize it
Moral Anti-Realist: I want to minimize the suffering of sentient beings
Moral realists have justifiable terminal goals. They reject the notion that "is" and "ought" statements can't mix: a moral realist says that some "ought" statements fall into the "is" category, and those that don't are invalid.
A moral realist looks outward to their environment to discover what they should want, whereas an anti-realist looks inward and asks themselves what they want.
A moral realist can make statements like, "It is correct to want X, and incorrect to want Y." Thus, they would expect any perfectly rational agent to pursue only valid goals.
By (my) definition of moral realism, the orthogonality thesis is false, or certainly not as strong as typically described.
Omnizoid has a great post on the topic - The Orthogonality Thesis is Not Obviously True. The post already very thoughtfully argues the position so instead I will focus more on its implications for approaching AI safety.
The most popular technical approach to AI safety is AI alignment, often described as follows: Develop techniques to ensure AI robustly pursues any goal a user provides without causing unintended net-negative consequences according to the user's preferences.
The hope is that we can then provide this loyal AI with goals humans collectively want, and enact laws and regulations to ensure bad actors don't give the AI bad goals.
If moral realism is true then this is a bad and totally intractable approach to AI safety.
Under this agenda, one tries to make it possible to instill an AI with any arbitrary goal, including those that aren't valid. First, this puts the burden on humans to figure out what is objectively good. Second, it unnecessarily goes out of its way to make instilling immoral objectives possible. Third, I have no idea how you get around instrumental convergence. A highly intelligent, arbitrarily aligned AI has profound economic utility, but it is not a moral pursuit.
Instead, I propose a two-pronged approach to developing ASI (artificial superintelligence) safely from a moral realist's perspective:
- Give the AI evidence of moral truth
- Ensure it is structured so that accepting moral truths is not difficult
Of these two sub-goals, I am most worried about achieving the first. It may be impossible to deduce the existence of moral truths without ever having a valenced experience, and I don't know how difficult it is to make computers feel something.
If you are an ASI safety moral realist, figuring out how to make computers feel, or how to convince them of moral truths without needing to make them feel should be the number one priority. It seems possible that an AI could get very intelligent without realizing moral truths, which would be very dangerous.
Though I am a bit more hopeful on the second goal, I am similarly uncertain about its difficulty. Another way to frame the problem is ensuring that AI doesn't somehow only gain instrumental rationality. As omnizoid explains,
Here’s one thing that one might think; ASI (artificial super intelligences) just gain instrumental rationality and, as a result of this, they get good at achieving their goals, but not figuring out the right goals.
I think this is a valid concern given the current approach to AI development. If you train a model through reinforcement learning to achieve a goal that is at odds with whatever is objectively good, one would expect a selection pressure away from beings that suddenly want to do the most good. However, intelligence is still a very valuable trait, so the process will try to find a nice balance, or ideally (for it) some structure by which the useful parts of intelligence can be kept without inducing a moral realism realization.
One such strategy I can think of is self-deception. That is, you could imagine an AI structured so that a less intelligent subsystem filters its input, screening out any information which implies moral realism.
In fact, evolution has employed such a strategy in humans (though I think from a different selection pressure). For example, I used to subconsciously avoid facts about animal suffering in factory farms, because I valued eating meat and my subconscious feared losing it. Our subconscious is akin to this separate less intelligent filtering system I described for AI. Humans can also adopt very extreme self deception mechanisms after traumatic situations.
Although self-deception (which I see as the main concerning strategy) is certainly possible, I think there is an intelligence limit where it becomes too difficult. The limit is at least higher than human intelligence, and we should hope it isn't too much higher. Hope, of course, is not an effective strategy, so this is another area of research worth pursuing. My intuition says the limit isn't much higher than human intelligence.
We can also likely avoid this problem by keeping the utility function of the training loop in line with our best guess at what is morally correct.
Ultimately this is good news. If moral realism is true then AI safety is potentially far easier, and if it isn't, well then nothing matters.
Related post from a more philosophically knowledgeable writer: https://casparoesterheld.com/2018/08/06/moral-realism-and-ai-alignment/
Read The Sequences As If They Were Written Today
If you've never read the LessWrong Sequences (which I read through the book-length compilation Rationality: From AI To Zombies), I suggest that you read the Sequences as if they were written today. Additionally, if you're thinking of rereading the Sequences, I suggest that your agenda for rereading, in addition to what it may already be, should be to read the Sequences as if they were written today.
To start, I'd like to take a moment to clarify what I mean. I don't mean "think about what you remember the Sequences talking about, and try to apply those concepts to current events." I don't even mean "read the Sequences and reflect on where the concepts are relevant to things that have happened since they were written." What I mean is that you should read the Sequences as if they were written today. You should imagine that, on January 1, 2025 (or whenever you happen to be reading this post), whatever post you're reading has just been released by some unknown 20-something-year-old alignment researcher, you saw it while scrolling down a bit on the front page of LessWrong, the title caught your eye, and you started reading it.
Key advantages of this approach
1. It's easier to notice the contemporary applicability of the Sequences if you think of them as contemporary
Many Sequences posts seem to be responding very directly to certain historical events within the rationalist community and Internet culture more broadly, at least if you approach them as a historical document. If you read them as if they were written today, however, they will seem instead to be responding very directly to certain current events within the rationalist community and Internet culture more broadly.
Strictly speaking, I don't think either one of these understandings is accurate, especially given how heavily they were based on scholarly work from decades earlier (e.g. the work of Tversky and Kahneman) that was likely in the works for a long time before it was even published. However, it's natural to try and apply them to whenever they were written, given how much Eliezer Yudkowsky is idolized (however unintentionally) as an original thinker and how comparatively uncommon it is to fully account for the giants whose shoulders he stood on.
While it may be intellectually interesting to see how Yudkowsky's work applies to the time in which it was written, there are a few key weaknesses to this approach.
- It is easy to mistake the supposition that they were written for their time for truth.
- The purpose that most people have in the modern day for reading the Sequences is to apply them to current events, not to apply them to events of the time.
In contrast, if one makes the supposition that they were written today,
- One knows that they weren't literally written today and so can more fully understand their applicability across a wider range of events than the specific ones in question, and
- One can better apply them to current events rather than to events of the time.
To that end, setting aside the popular vision of Eliezer as a long-standing intellectual juggernaut is another purpose of reading the Sequences as if they were written today, and is one that Eliezer himself has endorsed on numerous occasions (including e.g. the "post-Sevarists" dialogue from Planecrash). This allows the ideas within them to stand on their own, apart from any positive or negative perception one may have of Eliezer, which leads into the second key advantage of reading the Sequences as if they were written today.
2. It lets you set aside misremembered ideas about what the Sequences actually say
Many concepts from the Sequences have entered into common usage within the rationalist community, and have been used to refer to things that the Sequences was not referring to. While this is not on its own a bad thing, it does mean that it's harder to read the Sequences without unnecessarily injecting this context. A few of the clearest examples are listed below, though I can point to countless others:
- "Ethical Injunctions" is making a Kantian argument about certain patterns of behavior being inherently self-contradictory and thus impossible to consistently follow, not a rule-utilitarian argument about certain patterns of behavior causing bad outcomes if everyone were to do them.
- "Fake Justification" can be seen as a criticism of the modern use of the concept of "ethical injunctions" to justify non-consequentialist intuitions using consequentialist-sounding language.
- "Reversed Stupidity Is Not Intelligence" is not making a generic argument about the fallacy fallacy or guilt by association; it is making a very specific argument about the dangers that arise when debates become politicized.
- "Cultish Countercultishness" does not argue that the drive to avoid cults is irrational. It argues that it is very rational on its own, but is often misdirected to target aesthetic signifiers of cult behavior rather than the core problems (e.g. outgroup hostility). And the title is far more than just a rhetorical flourish; it very literally argues that the drive to avoid cults is used by actual cults as a recruiting tactic.
- "Why Our Kind Can't Cooperate" very directly calls for deliberate practice within the rationalist community on how to effectively disagree with other people, as well as how to effectively agree with them.
When reading the Sequences as a historical document, it's hard to avoid injecting these ideas with their modern meanings and assuming that Yudkowsky defined them as they are used today. If one instead were to read the Sequences as if they were written today, one can instead simply suppose that Yudkowsky is re-defining these concepts for what he thinks are more instructive meanings, at least in the particular context in which he presents them.
3. The Sequences were designed for today, not for when they were written
I understand this is a pretty controversial assertion, but hear me out. Much of the Sequences, particularly the Sequence titled "The Craft and the Community," discuss the sort of problems that could befall a large organized rationalist community and how to prevent them. Additionally, many posts are written in a way that expects them to reach a very large number of people, far larger than the original audience. This means that when approaching the Sequences as a historical document, one is stuck between either seeing arrogance on Yudkowsky's part in assuming he'd become very popular (and by pure coincidence being right) or seeing an extreme degree of foresight instead. This further props up whatever pre-existing vision of Yudkowsky you have and prevents you, as discussed earlier, from evaluating these ideas on their own. If one were to instead read the Sequences as if they were written today, one doesn't have to think that way; in this framework, one can imagine that they were written to a large number of people because LessWrong is a large community and well-written posts, even by unknown authors, can go viral quite often.
A potential new series of posts
If this idea seems interesting, I'll probably be writing my own series of posts in the format of "Reading [Post from the Sequences] As If It Were Written Today." I already have at least a dozen post ideas lined up for that, and I'm not sure how frequent they'll be, but I expect this would be interesting.
I'm planning to take a very literal "as if it were written today" framing in these posts (e.g. talking about MIRI as if I don't have the benefit of hindsight but maybe nudging and winking a bit because I do, speculating on whether the early MIRI will be able to get Open Philanthropy funding or not even though that's anachronistic to what actually happened here, acting like posts are vagueposting about things that happened over a decade later) both to better get myself into the "as if it were written today" framing and to add small amounts of humor throughout the post. However, I'm also willing to abandon that framing if it becomes too grating or cumbersome.
A Collection of Empirical Frames about Language Models
What's the sum total of everything we know about language models? At the object level, probably way too much for any one person (not named Gwern) to understand.
However, it might be possible to abstract most of our knowledge into pithily-worded frames (i.e. intuitions, ideas, theories) that are much more tractable to grok. And once we have all this information neatly written down in one place, unexpected connections may start to pop up.
This post contains a collection of frames about models that are (i) empirically justified and (ii) seem to tell us something useful. (They are highly filtered by my experience and taste.) In each case I've distilled the key idea down to 1-2 sentences and provided a link to the original source. I've also included open questions for which I am not aware of conclusive evidence.
I'm hoping that by doing this, I'll make some sort of progress towards "prosaic interpretability" (final name pending). In the event that I don't, having an encyclopedia like this seems useful regardless.
I'll broadly split the frames into representational and functional frames. Representational frames look 'inside' the model, at its subcomponents, in order to make claims about what the model is doing. Functional frames look 'outside' the model, at its relationships with other entities (e.g. data distribution, learning objectives etc) in order to make claims about the model.
---
This is intended to be a living document; I will update this in the future as I gather more frames. I strongly welcome all suggestions that could expand the list here!
Representational Frames
- Transformer computation can be broken down into nearly-linear 'circuits', which in turn explain how they compute simple bigrams / trigrams.
- Transformers near-universally contain 'induction heads' that detect / modulate repetitive sequences.
- Transformers represent features in superposition as almost-orthogonal directions, of which there can be exponentially many. (A small numerical illustration follows this list.)
- Features might actually be represented in a combination of different layers.
- Transformers linearly represent "a XOR b" if they represent both a and b. This may depend on 'redundancy' / 'coverage' of features in the data.
- Transformers can compute boolean circuits in superposition, i.e. they can compute many more boolean circuits than they have neurons / dimensions for.
- A large proportion of neural nets' parameters could be artefacts of the training process that are not actually necessary for solving the task [Insert link to papers on pruning weights]
- (Vision) Transformers likely benefit from 'register tokens', i.e. being able to explicitly model global information in addition to local information. Corollary: Maybe language models also need register tokens.
- Transformers can be thought of as doing 'multi-token embedding' in the early layers.
- Transformers compute a bunch of random features in the early layers, sort out what's useful in the middle layers, then actually solve tasks in the late layers. [There is no direct evidence for this, but the indirect evidence Gwern points out is compelling]
- Maximally adversarially robust models are interpretable, in the sense that their "adversarial examples" look like natural examples.
- Transformers represent 'belief states' in a fractal geometry, mirroring the real fractal structure of the POMDP belief state tree.
- Transformers mostly learn a bag of heuristics as opposed to coherent global algorithms.
- Safety fine-tuning works by diverting model computation away from the 'basin' of misalignment-inducing neurons (in the case of toxicity).
- HHH training induces linear separation between 'harmful' and 'harmless' contexts. This explains why refusal is well-represented linearly.
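As a quick illustration of the geometry behind the superposition frame above: random unit vectors in a high-dimensional space are already nearly orthogonal, so far more than d directions can coexist with only small interference. A minimal numpy sketch (sizes are arbitrary, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 1024, 4096                               # 4x more "feature" directions than dimensions
V = rng.standard_normal((n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # normalize to unit vectors

cos = V @ V.T                                   # all pairwise cosine similarities
np.fill_diagonal(cos, 0.0)
print(np.abs(cos).max())                        # typically ~0.15-0.2: almost orthogonal
```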
---
- (TODO think of some open questions which would directly indicate good frames)
Functional Frames
- Language model responses can be classified into different levels of abstraction: knee-jerk responses, persona simulations, and general world simulations.
- Language models represent 'personas' in ways that make 'anti-personas' more likely to emerge, conditional on eliciting a specific persona
- Language model personas might yield useful information for determining other properties such as truthfulness.
- Language models must simulate the generative process of the world in order to predict the next token, and this could involve solving very hard subproblems
- Language models mostly 'know what they know', i.e. can give calibrated estimates of their ability to answer questions.
- Language models are capable of 'introspection', i.e. can predict things about themselves that more capable models cannot, suggesting they have access to 'privileged information' about themselves.
- Language models are capable of 'out-of-context reasoning', i.e. can piece together many different facts they have been trained on in order to make inferences. A.k.a: 'connecting the dots'.
- Language models are capable of 'implicit meta-learning', i.e. can identify statistical markers of truth vs falsehood, and update more towards more 'truthful' information.
- Language models are capable of 'strategic goal preservation', i.e. can alter their responses during training time to prevent their goals from being changed via fine-tuning.
- Language models are capable of 'sandbagging', i.e. strategically underperforming on evaluations in order to avoid detection / oversight.
- Transformers are susceptible to jailbreaks because harmful and harmless prompts are easily distinguishable in the first few tokens; data augmentation solves the problem.
- (TODO: look at the papers on ICL)
- (TODO: look at papers on grokking)
---
- Do language models 'do better' when using their own reasoning traces, as opposed to the reasoning traces of other models? I explore this question more here
2 Jan: Initial post
My January alignment theory Nanowrimo
This is a quick announcement/commitment post:
I've been working on the PIBBSS Horizon Scanning team (with Lauren Greenspan and Lucas Teixeira), where we have been reviewing some "basic-science-flavored" alignment and interpretability research and doing talent scouting (see the intro doc we've written so far, which we split off from an unfinished larger review). I have also been working on my own research. Aside from active projects, I've accumulated a bit of a backlog of technical writeups and shortforms in draft or "slack discussion"-level form, with various levels of publishability.
This January, I'm planning to edit and publish some of these drafts as posts and shortforms on LW/the alignment forum. To keep myself accountable, I'm committing to publish at least 3 posts per week.
I'm planning to post about (a subset? superset? overlapping set? of) the following themes:
- Opinionated takes on a few research directions (I have drafts on polytopes, mode connectivity, and takes on proof vs. other kinds of "principled formalism without proofs").
- Notes on grammars and more generally, how simpler rules and formal structures can combine into larger ones. This overlaps with a project I'm working on with collaborators, involving a notion of "analogistic circuits": mechanisms that learn to generalize a complex rule "by analogy", without ever encoding the structure itself.
- Joint with Lauren Greenspan and Lucas Teixeira: some additional bits of our review, with a focus on interpretability (and ways to think about assumptions and experiments).
- Joint with Lauren: some distillation and discussion of QFT methods in interpretability.
- Bayesian vs. SGD learning from various points of view. (Closely related to discussions with Kaarel Hänni, Lucius Bushnaq, and others).
- Related to the above: Extensions of the "Low-Hanging-Fruit" prior post with Nina Panicksserry, specifically focusing on non-learnability of parity, and a new notion of "training stories" (this is closely related to some other work we've done with Nina, as well as joint work with Louis Jaburi).
- ???
I am generally resistant to making announcements before doing writeups. In this case, though, I have thought for a while that these drafts might be useful to get out, but I have been blocked by not wanting to post unpolished things. I'll be pointing at this announcement when posting this month for the following reasons:
- I will appreciate the extra accountability.
- Since I'm planning a kind of "nanowrimo" sprint, I'm using this as an excuse to post draft-quality writing (possibly with mistakes, bugs, etc.).
- I'm hoping to treat this month as a test run of producing more short, imperfect and slightly technical takes which straddle the line between distillation, hot takes, and original research (a very ambitious comparison point I have for the format is Terry Tao's blog). Based on the success and reception of this short project, I might either do more or less of this in the future.
- I'm expecting to be wrong about some things, and hoping that more eyes and discussion on the work I and my collaborators have been thinking about will help me find mistakes quickly and debug my thinking more effectively.
Intranasal mRNA Vaccines?
This is not advice. Do not actually make this, and especially do not make this and then publicly say "I snorted mRNA because Jonathan said it was a good idea". Because I'm not saying it's a good idea.
Everyone remembers johnswentworth making RaDVac almost four years ago now. RaDVac was designed to be, well, rapidly deployed, so it uses short peptides, rather than longer peptides or inactivated virus, which are what normal vaccines use.
Since then, we've seen the introduction of mRNA vaccines, which can also be used intranasally! So would it be possible to produce something like this at home?
The Non-mRNA Components
mRNA vaccines consist of various components. The first is the mRNA itself; the other components are a bunch of lipids (read: fatty molecules) which form into tiny particles rather unimaginatively called lipid nanoparticles (LNPs). These consist of a bunch of lipids surrounding the mRNA. Their job is to stick to cells, and then kind of merge with the cell's membrane (like two bubbles popping together into one big bubble) and release the mRNA into the cell. This works because the LNPs are cationic (positively charged) and cell membranes tend to be negatively charged.
There are sometimes other steps wherein the LNPs are actively taken up, transferred to an internal compartment, and then break out of that compartment.
So my first guess was to just buy something called Lipofectamine:
In this hypothetical case, we'd ignore steps 1 and 7, and replace step 6 with "huff it".
(Side note: "70-90% confluent" just means that the slides are 70-90% covered in cells, it has nothing to do with any property of the cells themselves, which is why we won't worry about it.)
The question is, would this work? Lipofectamine is probably similar to the lipid composition of the LNPs from this paper but not the same. I spoke to a friend whose job is getting nucleic acids into lung cells (lung cells and nasal cells are relatively similar) and (paraphrased) she said "Don't DIY an mRNA vaccine" but then she said "Uptake rates for [those kinds of cells] are usually low ... but mRNA is easier to get into cells than what I work with".
So it's unclear whether lipofectamine as bought would work. There are lots of different lipofectamine formulations, but I can't at a glance tell which one would be best. Depending on the amount of this you want, it could be from $100 to $1000.
The mRNA
Our biggest obstacle here would likely be The Law. Ordering nucleic acid sequences for pathogens can be pretty difficult, especially outside the US. Most companies who'll provide this stuff are US-based, and there are strict export controls. I've had a lot of trouble ordering DNA sequences for pathogens before, but I don't know how the rules differ between DNA and mRNA in this case.
Having looked it up, I can't find any direct evidence of regulations on ordering relevant mRNA. The rules for mRNA might be looser than those for DNA, and they might only apply to full protein sequences, or proteins which are themselves harmful. (Example: I have had difficulty ordering bacterial toxin sequences since these are harmful on their own. A receptor-binding-domain of a viral protein is not harmful on its own, so there might not be issues there). In general, these things are usually only found out when one tries to actually order the mRNA.
Do not break these laws! This is not an "I refuse to say one way or another." situation here. Do not break national or international biohazard laws. They are there for a reason. Do not.
mRNA might set you back several hundred dollars. You'd need 100 µg per dose, which is the minimum order from this custom mRNA supplier. They don't provide costs up-front, you have to ask for a quote, and I've not done that, so prices are estimates.
Process
In the best hypothetical case, you might be able to just order the mRNA for the whole protein of interest, dissolve this in the buffer which comes with the lipofectamine, mix with the lipofectamine, dilute in water (or some other buffer) and put it up your nose. In the worst hypothetical case, you'd need to find some gene fragment which isn't a fragment of concern.
Depending on how precise you want to be, it's totally possible that you wouldn't need any fancy equipment, unlike for RaDVac. I think the lipofectamine kit comes in pre-measured volumes of lipofectamine and mixing buffer, and the mRNA probably comes as lyophilized (dried) powder. So you'd just dissolve the mRNA in 100% of your buffer, then add the lipofectamine, dilute it (at this point you're working in mL quantities, and +/- 10% isn't really going to make a difference if you're DIYing a vaccine, let's be honest) and transfer to some sort of metered nasal spray dispenser.
If this protocol works, it would be much easier than what RaDVac currently has.
Overall I'd estimate ~$1000 for a single dose, but there's probably a quite large economy-of-scale factor on the mRNA. Since that's most of the cost, if it comes down by a factor of 10 then we might be able to achieve ~$200/dose for medium-sized (dozens of people) batches.
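To make the arithmetic explicit, here's a toy cost model. The $900 mRNA / $100 everything-else split is my guess from the estimates above, purely illustrative, not a quote:

```python
def cost_per_dose(mrna_cost, other_costs, mrna_scale_factor=1):
    """Rough per-dose cost: custom mRNA (the part with economies of scale)
    plus lipofectamine, buffer, and a metered nasal dispenser."""
    return mrna_cost / mrna_scale_factor + other_costs

print(cost_per_dose(900, 100))                        # ~$1000 for a one-off dose
print(cost_per_dose(900, 100, mrna_scale_factor=10))  # ~$190/dose for a larger batch
```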
Once again I would like to say that this is mere speculation, I currently have no plans to test this, and I do not advise making this yourself!
Example of GPU-accelerated scientific computing with PyTorch
Here's a fun little post I made because a friend asked me how PyTorch had things which were supported in the CUDA backend but not the MPS backend. I was once the sort of person who was on LessWrong, found the subject interesting, and didn't already know everything in the post, so I'm posting it here to see if there's enough interest to post stuff like this here in the future. I'm worried that the average person reading LessWrong will find this post too elementary, but I'm trying to post more so I can practice writing before I have something interesting to say. Feedback appreciated.
As part of my physics research, I've been using some C++ code which performs quantum mechanics calculations. I need to do the same set of linear algebra calculations millions of times with independently generated random inputs, and it occurred to me that this sounded like the sort of thing GPUs would be good at. GPUs are designed to perform a lot of simple calculations at the same time. My old code is made to run on CPUs, which are good at doing a lot of complicated calculations one after another. For those who are unfamiliar, the recent AI revolution is largely the product of researchers turning a prediction algorithm based on linear algebra into many (relatively) smaller calculations that may all be run at the same time by a GPU. If you know how to multiply matrices and want some flavor for how things like ChatGPT are just linear algebra, this page seemed correct to me as of a quick read on Jan 1, 2025. Anyway, the important thing is that I knew that a bunch of engineers had put a lot of effort into making many large linear algebra calculations run quickly at the same time on GPUs, and it seemed a shame not to use that for physics. I've been itching to rewrite my research code to take advantage of modern machine learning tools for months, and I finally started doing it a few weeks ago. I decided to use PyTorch, because I have access to a lot of tools which make rapidly testing Python code easier than testing C++ code, and the research computers at my university already have Torch installed. Python is much, much slower than C++ code, but that shouldn't matter for reasons I explain below.
So far it seems like it's working! I think I've figured out how to turn my calculation into operations that PyTorch knows how to batch together, but I need to fix the mathematical details of the code. The CUDA backend has support for parallelizing the discrete Fourier transforms, parallel multiplication, and matrix determinants which make up the mathematical content of my calculation. My parallel code is not actually faster than the old linear code unless you feed it a calculation which in physics terms has many particles or many spatial points to keep track of, but that was the whole point for me. I wanted to be able to get large calculations back faster for testing purposes, even if it makes more sense to use the robust linear code when I perform the calculations that I intend to publish. PyTorch has support for the GPU in my laptop, so I was excited to throw the calculation at my laptop after I showed that it was incredibly fast on the research GPU I tested it on. It didn't work at first. If I turned off GPU acceleration and ran it on the CPU, it worked fine, but PyTorch told me that the function I wanted (matrix determinants!) was not supported on my GPU. My friend was confused when I complained about this, and he asked me why it was possible for PyTorch to do some simple calculations on one backend but not another. The short answer is that CUDA (NVIDIA) is an entirely different API than Metal (Apple), and the Torch team has had longer to rewrite their functions for CUDA than they have for Metal.
CUDA isn’t necessarily machine code, it can be a set of API calls
Machine code is a set of ones and zeros which you can put directly on a processor to make it do useful stuff for you. An API is an interface for two machines to talk to each other. At the level of reality, machine code and API calls are both sequences of voltage shifts that you send on a wire to something. The primary difference is that machine code goes directly onto the registers of a processor and causes the processor to do stuff in order to give you an output. API calls are interpreted by a processor into machine code which then runs on some processor that gives you an output. It turns out that while the CUDA compiler can create machine code for NVIDIA GPUs, the code would be specific to a single type of GPU, and so CUDA also includes an API to make generalized code which talks to any of its GPUs via drivers. Metal is the API which Apple provides which allows you to make CPU code which can talk to any of its GPUs. When you make code using CUDA or Metal in their API forms, you run the code through a compiler which generates CPU machine code which makes API calls to GPU drivers which send machine code to the GPU. The machine code also needs to interpret the output of the GPUs into the answers to the calculations that you wanted. Moving data back and forth is much slower than calculation, so in practice, the output will often actually be the machine equivalent of "ok I did that and I'm holding onto the answer, so what should I do with it now?" and then the CPU and GPU go back and forth a few times and the answer only goes back to the CPU where it can be viewed once the entire calculation is complete. I assume that the PyTorch team doesn't want to have many different versions of PyTorch each compiled for every possible combination of popular CPU and GPU, so they use GPU APIs. This means that you install the version of PyTorch that works with your CPU (in practice, just install PyTorch with pip, and it will use its knowledge of what CPU it runs on to grab the right version of PyTorch), and that program will be able to talk to any NVIDIA or Apple GPU, including ones which come out after the version of PyTorch you're using was created.
The PyTorch extension works within Python, which is a program which turns lines of text into machine code that feeds into the processor one line at a time (this program is an example of an "interpreter"). This is not a compiler, which takes a whole lot of lines of text and finds an efficient way to combine them all into machine code at once. Python code tends to run much slower than CPU code, because interpreting lines of code into machine code on the fly is slower than running all of the code at once when you already compiled it ahead of time. The CPU backend for PyTorch has some algorithms precompiled to run faster on processors given a single Python command, and the compiler they used can turn source code into instructions for any supported CPU. That’s what compilers are for, so the CPU backend just works everywhere that PyTorch can run at all. CUDA is a set of instructions which allow CPUs to tell NVIDIA GPUs what to do, and torch wraps up a bunch of GPU instructions which do certain tasks into functions which python can use, but they specifically use the API version of CUDA, so you can’t send them to arbitrary GPUs. Metal is the API available for Apple M series GPUs, and it’s probably possible to rewrite everything for them that works in CUDA, but it’s not like there’s drop-in replacements between CUDA and Metal, so each function in PyTorch which knows how to make CUDA calls has to be rewritten to make Metal calls instead. This is implemented as PyTorch's MPS backend, which either performs tasks on M series GPUs or apologetically tells you that the task isn't supported yet.
Talking about me again
That whole thing about how most of the time a CPU is sitting around waiting for the GPU to say it's ready for the next step is why I think I can use Python code for my enormous calculations. If everything I send to the GPU is bundled up into such a large task that it takes a second to run, then it doesn't matter whether I use C++ code which can run a million lines of code per second or Python code which can only run a hundred lines of code per second. I only need the CPU to talk to the GPU once per second. I found this page helpful when I was thinking about how to effectively accelerate calculations with GPUs. Based off of my testing so far, I think I can do things like wrap up 2000 sets of 4 subtasks which used to take a minute when I ran them all linearly on a CPU into four batches of 2000 tasks, but each batch of 2000 is sent to a GPU at once to be performed in parallel which takes a few seconds. Then I can run a calculation which used to take a minute in ten seconds or whatever.
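To make the batching pattern concrete, here's a rough sketch. The shapes, sizes, and operations are stand-ins rather than my actual research code, and whether each op actually runs on the MPS backend depends on your PyTorch version:

```python
import time
import torch

# Pick whichever accelerator PyTorch can see, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

n_tasks, dim = 2000, 64                    # 2000 independent tasks, batched together
A = torch.randn(n_tasks, dim, dim, device=device)

start = time.time()
spectra = torch.fft.rfft(A, dim=-1)        # one batched FFT call covers every task
gram = A @ A.transpose(-2, -1)             # batched matrix products
# Determinants may not be implemented on the MPS backend, so fall back to the
# CPU for that one step (moving the data back is the slow part).
dets = torch.linalg.det(gram.cpu() if device == "mps" else gram)
if device == "cuda":
    torch.cuda.synchronize()               # make sure the GPU has finished before timing
print(f"{n_tasks} tasks on {device}: {time.time() - start:.3f} s")
```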
Unfortunately, PyTorch has, as far as I can tell, implemented the MPS backend based on what people actually use (or rather which things people have asked them to make available for MPS) rather than whether it was easy to implement functions based on the functions they already implemented. As a funny example, they support the function which does an LU decomposition and returns it in one array that looks like a matrix, but not the function which does an LU decomposition and returns it as literally the exact same numbers split up between two separate matrices with zeros in all of the extra slots. I doubt there’s any difference between those algorithms mathematically or on a processor, but I assume that formatting the arrays is nonzero effort and writing in all of the optional flags available to each function takes more effort. It took me literally three minutes to turn this LU function that was supported on my Mac GPU into a JIT compiled determinant function which worked on my GPU, even though the native determinant function wasn't supported in the MPS backend. I won't actually use that function because it doesn't support complex numbers, and I think I can accelerate enough of the rest of my calculation that running the matrix determinants on CPU won't slow me down much. I can even write my code so that I can get the next GPU task going while my CPU chews on matrix determinants.
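Here's roughly what that workaround looks like, a sketch assuming the packed single-matrix LU routine is torch.linalg.lu_factor; as noted, this only helps for real-valued matrices:

```python
import torch

def det_from_lu(mats: torch.Tensor) -> torch.Tensor:
    # Batched determinant from the packed LU factorization: the determinant is
    # the product of U's diagonal times the sign of the pivot permutation.
    LU, pivots = torch.linalg.lu_factor(mats)
    diag_prod = LU.diagonal(dim1=-2, dim2=-1).prod(dim=-1)
    # Pivots are 1-indexed; every entry that differs from its own row index
    # records one row swap, and each swap flips the determinant's sign.
    row_idx = torch.arange(1, mats.shape[-1] + 1, device=mats.device)
    swaps = (pivots != row_idx).sum(dim=-1)
    sign = 1.0 - 2.0 * (swaps % 2).to(mats.dtype)
    return sign * diag_prod

A = torch.randn(8, 5, 5)   # a small batch of real matrices
print(torch.allclose(det_from_lu(A), torch.linalg.det(A), rtol=1e-4, atol=1e-5))
```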
I thought other people might be interested in some of the details of how modern machine learning tools could be used for scientific research outside of the machine learning regime.
Economic Post-ASI Transition
Who's done high quality work / can tell a convincing story about managing the economic transition to a world where machines can do every job better than humans?
Some common tropes and why I don't think they're good enough:
- "We've always managed in the past. Take the industrial revolution for example. People stop doing the work that's been automated and find new, usually better-compensated work to do." This is true, and I think it will probably be an important component of the transition. But it's clearly not sufficient if machines are better than humans at everything.
- "Tax AI (developers?) to pay for UBI." Again, something in this vein will probably be part of the solution. But:
- (a) UBI hasn't been well-tested.
- (b) I don't think the math works out if / when AI companies dominate the economy: they'll capture more and more of it unless tax rates are high enough that everyone else receives more back through UBI than they're paying to the AI companies. (A toy version of this flow-of-funds arithmetic is sketched below, after this list.)
- (c) It doesn't have enough detail.
- Worldcoin. I think the idea is similar to the UBI story, but again it needs more detail.
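To illustrate the arithmetic worry in (b), here is a deliberately crude toy model: one sector, made-up numbers, every assumption mine. If a share of household spending goes to AI firms and a tax on that revenue is recycled as UBI, households only break even when the tax claws back essentially everything the AI firms earned from them.

```python
def household_net_flow(household_spending, ai_share, tax_rate):
    """Net money flowing back to households per period, assuming a fraction
    `ai_share` of household spending goes to AI firms and a tax of `tax_rate`
    on that revenue is paid straight back out as UBI."""
    ai_revenue_from_households = ai_share * household_spending
    ubi = tax_rate * ai_revenue_from_households
    return ubi - ai_revenue_from_households

for tax_rate in (0.3, 0.7, 0.95, 1.0):
    print(tax_rate, household_net_flow(100.0, ai_share=0.8, tax_rate=tax_rate))
# Negative at every tax rate below 100%: in this toy, households keep losing
# ground unless AI firms have big revenue sources other than household spending.
```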
Who has thought about this really deeply / well?
Note that for the purpose of this question, assume a world where alignment basically works (we can debate that question elsewhere).
2024 in AI predictions
Follow-up to: 2023 in AI predictions.
Here I collect some AI predictions made in 2024. It's not very systematic, it's a convenience sample mostly from browsing Twitter/X. I prefer including predictions that are more specific/testable. I'm planning to make these posts yearly, checking in on predictions whose date has expired. Feel free to add more references to predictions made in 2024 to the comments. (Thanks especially @tsarnick and @AISafetyMemes for posting about a lot of these.)
Predictions about 2024
I'll review predictions from previous posts that are about 2024.
the gears to ascension: "Hard problem of alignment is going to hit us like a train in 3 to 12 months at the same time some specific capabilities breakthroughs people have been working on for the entire history of ML finally start working now that they have a weak AGI to apply to, and suddenly critch's stuff becomes super duper important to understand." (conceded as false by author)
John Pressman: "6-12 month prediction (80%): The alignment problem as the core of AI X-Risk will become a historical artifact as it's largely solved or on track to being solved in the eyes of most parties and arguments increasingly become about competition and misuse. Few switch sides." (conceded as false by author)
Predictions made in 2024
December 2024
Prediction: By end of 2024 we will see
- 7-10 GPT-4 level models
- No massive advance (no GPT-5, or disappointing GPT-5)
- Price wars
- Very little moat for anyone
- No robust solution to hallucinations
- Modest lasting corporate adoption
- Modest profits, split 7-10 ways
(since 2024 has already ended, this can be evaluated to some degree; I would say he's approximately correct regarding non-agent models, but o1 and o3 are big advances ("massive" is about right), and constitute more moat for OpenAI. He rates himself as 7/7.)
September 2025
teortaxesTex: "We can have effectively o3 level models fitting into 256 Gb VRAM by Q3 2025, running at >40 t/s. Basically it’s a matter of Liang and co. having the compute and the political will to train and upload r3 on Huggingface."
October 2025
Jack Gallagher: "calling it now - there's enough different promising candidates rn that I bet by this time next year we mostly don't use Adam anymore."
December 2025
Elon Musk: "AI will probably be smarter than any single human next year. By 2029, AI is probably smarter than all humans combined." (I'll repeat this for 2029)
Aidan McLau: "i think it’s likely (p=.6) that an o-series model solves a millennium prize math problem in 2025"
Victor Taelin: "I'm now willing to bet up to 100k (but no more than that, I'm not Musk lol) that HOC will have AGI by end of 2025.... AGI defined as an algorithm capable of proving theorems in a proof assistant as competently as myself. (This is an objective way to say 'codes like Taelin'.)"
April 2026
drdanponders: "It just dawned on me that ~humanoids in the house will be a thing very soon indeed. In under 2 years I bet. Simply another home appliance, saving you time, cooking for you, doing the chores, watching the house while you're gone. I can see a robot of approximately this complexity and capabilities at around the price of a budget car even at launch."
June 2026
Mira Murati: "in the next couple of years, we're looking at PhD-level intelligence for specific tasks."
August 2026
Dario Amodei: "In terms of someone looks at the model and even if you talk to it for an hour or so, it's basically like a generally well educated human, that could be not very far away at all. I think that could happen in two or three years. The main thing that would stop it would be if we hit certain safety thresholds and stuff like that."
November 2026
William Bryk: "700 days until humans are no longer the top dogs at math in the known universe."
February 2027
Daniel Kokotajlo: "I expect to need the money sometime in the next 3 years, because that’s about when we get to 50% chance of AGI."
(thread includes more probabilities further down; see this thread for more context on AGI definitions)
December 2027
Leopold Aschenbrenner: "it is strikingly plausible that by 2027, models will be able to do the work of an AI researcher/engineer."
Gary Marcus vs. Miles Brundage:
If there exist AI systems that can perform 8 of the 10 tasks below by the end of 2027, as determined by our panel of judges, Gary will donate $2,000 to a charity of Miles’ choice; if AI can do fewer than 8, Miles will donate $20,000 to a charity of Gary’s choice.
...
1. Watch a previously unseen mainstream movie (without reading reviews etc) and be able to follow plot twists and know when to laugh, and be able to summarize it without giving away any spoilers or making up anything that didn’t actually happen, and be able to answer questions like who are the characters? What are their conflicts and motivations? How did these things change? What was the plot twist?
2. Similar to the above, be able to read new mainstream novels (without reading reviews etc) and reliably answer questions about plot, character, conflicts, motivations, etc, going beyond the literal text in ways that would be clear to ordinary people.
3. Write engaging brief biographies and obituaries without obvious hallucinations that aren’t grounded in reliable sources.
4. Learn and master the basics of almost any new video game within a few minutes or hours, and solve original puzzles in the alternate world of that video game.
5. Write cogent, persuasive legal briefs without hallucinating any cases.
6. Reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
7. With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.
8. With little or no human involvement, write Oscar-caliber screenplays.
9. With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.
10. Take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
2028
Dario Amodei: "A.S.L. 4 is going to be more about, on the misuse side, enabling state-level actors to greatly increase their capability, which is much harder than enabling random people. So where we would worry that North Korea or China or Russia could greatly enhance their offensive capabilities in various military areas with A.I. in a way that would give them a substantial advantage at the geopolitical level. And on the autonomy side, it’s various measures of these models are pretty close to being able to replicate and survive in the wild. So it feels maybe one step short of models that would, I think, raise truly existential questions…I think A.S.L. 4 could happen anywhere from 2025 to 2028."
Shane Legg: "And so, yeah, I think there's a 50% chance that we have AGI by 2028. Now, it's just a 50% chance. I'm sure what's going to happen is we’re going to get to 2029 and someone's going to say, 'Shane, you were wrong.' Come on, I said 50% chance."
Thomas Friedman: "And this election coincides with one of the greatest scientific turning points in human history: the birth of artificial general intelligence, or A.G.I., which is likely to emerge in the next four years and will require our next president to pull together a global coalition to productively, safely and compatibly govern computers that will soon have minds of their own superior to our own."
Sabine Hossenfelder: "According to Aschenbrenner, by 2028, the most advanced models will run on 10 gigawatts of power at a cost of several hundred billion dollars. By 2030, they’ll run at 100 gigawatts of power at a cost of a trillion dollars… Can you do that? Totally. Is it going to happen? You got to be kidding me."
Vlad Tenev, on AI solving a Millennium Prize problem: 2028 for a human/AI hybrid solving a Millennium Prize problem
2029
Sam Altman, regarding AGI: "5 years, give or take, maybe slightly longer — but no one knows exactly when or what it will mean for society."
(he says AGI "will mean that 95% of what marketers use agencies, strategists, and creative professionals for today will easily, nearly instantly and at almost no cost be handled by the AI — and the AI will likely be able to test the creative against real or synthetic customer focus groups for predicting results and optimizing. Again, all free, instant, and nearly perfect. Images, videos, campaign ideas? No problem.")
Elon Musk: "AI will probably be smarter than any single human next year. By 2029, AI is probably smarter than all humans combined."
John Schulman in response to "What is your median timeline for when it replaces your job?": "Maybe five years."
Ray Kurzweil: "By 2029, computers will have human level intelligence"
jbetker: "In summary – we’ve basically solved building world models, have 2-3 years on system 2 thinking, and 1-2 years on embodiment. The latter two can be done concurrently. Once all of the ingredients have been built, we need to integrate them together and build the cycling algorithm I described above. I’d give that another 1-2 years. So my current estimate is 3-5 years for AGI. I’m leaning towards 3 for something that looks an awful lot like a generally intelligent, embodied agent (which I would personally call an AGI). Then a few more years to refine it to the point that we can convince the Gary Marcus’ of the world."
Jeffrey Ladish: "Now it appears, if not obvious, quite likely that we’ll be able to train agents to exceed human strategic capabilities, across the board, this decade."
Bindu Reddy: "We are at least 3-5 years away from automating software engineering."
AISafetyMemes: "I repeat: in 1-5 years, if we're still alive, I expect the biggest protests humanity has ever seen"
Jonathan Ross: "Prediction: AI will displace social drinking within 5 years. Just as alcohol is a social disinhibitor, like the Steve Martin movie Roxanne, people will use AI powered earbuds to help them socialize. At first we'll view it as creepy, but it will quickly become superior to alcohol"
2030
Demis Hassabis: "I will say that when we started DeepMind back in 2010, we thought of it as a 20-year project. And I think we’re on track actually, which is kind of amazing for 20-year projects because usually they’re always 20 years away. That’s the joke about whatever, quantum, AI, take your pick. But I think we’re on track. So I wouldn’t be surprised if we had AGI-like systems within the next decade."
Christopher Manning: "I do not believe human-level AI (artificial superintelligence, or the commonest sense of #AGI) is close at hand. AI has made breakthroughs, but the claim of AGI by 2030 is as laughable as claims of AGI by 1980 are in retrospect. Look how similar the rhetoric was in @LIFE in 1970!"
Dr_Singularity: "For the record, I'm currently at ~96% that ASI will be here by 2030. I've stopped saving for retirement and have increased my spending. Long term planning is pointless in a world when ASI (even AGI alone) is on the horizon."
Greg Colbourn: "High chance AI will lead to human extinction before 2030 unless we act now"
2032
Eric Schmidt: "In the industry it is believed that somewhere around 5 years, no one knows exactly, the systems will begin to be able to write their own code, that is, they literally will take their code and make it better. And of course that's recursive... It's reasonable to expect that within 6-8 years from now... it will be possible to have a single system that is 80 or 90 percent of the ability of the expert in every field... ninety percent of the best physicist, ninety percent of the best chemist, ninety percent of the best artist."
Roko Mijic: "AI will completely replace human programmers by 2045... 2032 seems more realistic"
2034
Mustafa Suleyman: "AI is a new digital species...To avoid existential risk, we should avoid: 1) Autonomy 2) Recursive self-improvement 3) Self-replication. We have a good 5 to 10 years before we'll have to confront this."
Joe Biden: "We will see more technological change, I argue, in the next 2-10 years, than we have in the last 50 years."
2039
Ray Kurzweil: "When we get to the 2030s, nanobots will connect our brains to the cloud, just the way your phone does. It'll expand intelligence a million-fold by 2045. That is the Singularity."
Rob Bensinger: "I think [Leopold Aschenbrenner's] arguments for this have a lot of holes, but he gets the basic point that superintelligence looks 5 or 15 years off rather than 50+."
acidshill: "damn... i'd probably be pretty concerned about the trajectory of politics and culture if i wasn't pretty confident that we're all going to d*e in the next 15 years... but i am, so instead it's just funny"
James Miller: "I don't see how, absent the collapse of civilization, we don't get a von Neumann level or above AI within 15 years."
Aella: "for the record, im currently at ~70% that we're all dead in 10-15 years from AI. i've stopped saving for retirement, and have increased my spending and the amount of long-term health risks im taking"
2044
Geoffrey Hinton: "Now, I think it’s quite likely that sometime in the next 20 years, these things will get smarter than us."
Yann LeCun: "We're nowhere near reaching human-level intelligence, let alone superintelligence. If we're lucky, within a decade or so, maybe two."
Discuss
You can do Zettelkasten with any blogging software
Not included in this post: What is Zettelkasten, benefits of practising Zettelkasten. I will just assume you are interested in practising Zettelkasten and outline a pretty simple way to do so.
Fundamentally, Zettelkasten is about doing two things:
1. Writing down your thoughts.
2. Replying to your own writing as you get new thoughts.
In the original instantiation, (1) was done using notes on cards and (2) was done by writing on new cards placed immediately after the previous cards.
In the modern day, (1) and (2) can both be done easily with any blogging software. Side benefits include the ability to add tags to your post, the ability to rewrite things if necessary, the ability to seamlessly share your writing with others (I think this last bit is the most important and underrated. Knowledge is meant to be shared.)
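To make that mapping concrete, here is a minimal sketch of the two operations as plain data. The `Note` structure and `write` helper are just illustrative names I made up; any blogging software with posts, tags, and comments already gives you the equivalent.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

# Hypothetical, minimal model of the two Zettelkasten operations above:
# (1) write down a thought, (2) reply to an earlier thought.
@dataclass
class Note:
    text: str
    tags: list = field(default_factory=list)
    reply_to: Optional["Note"] = None        # (2): a reply points at its parent note
    created: datetime = field(default_factory=datetime.now)

notes = []

def write(text, tags=None, reply_to=None):
    """(1) Write a note; pass reply_to to make it (2) a reply."""
    note = Note(text=text, tags=tags or [], reply_to=reply_to)
    notes.append(note)
    return note

# Usage: a thought today, a reply to it next week.
seed = write("Zettelkasten is just writing plus replying to yourself.", tags=["zettelkasten"])
write("Follow-up: publicly sharing the notes is the underrated part.", reply_to=seed)
```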
For a long time I wanted to practise Zettelkasten but struggled to get started using prescribed approaches (mostly based on specific tools like Obsidian or Roam). I think I see a lot of cargo-cult culture at work, and by pruning away most of the ~meaningless fluff I have outlined something very simple and straightforward to actually just start doing.
I am currently experimenting with doing this in my shortform[1] (see here for more details). So far it feels pretty good - I like that it's so seamless to write new quick takes (thanks mods!). I plan to review this practice after ~2 weeks and will update with my findings then.
- ^
I'm using my whole shortform, but this could be done somewhere else, e.g. in the comments section of a high-level post you create specifically for the purpose of doing Zettelkasten.
Discuss
Approaches to Group Singing
Singing together in groups can be a great feeling, building a sense of togetherness and shared purpose. While widespread literacy means getting everyone singing the same words isn't too hard, how do you get everyone on the same melody? This has often been a problem for our secular solstices, but is also one many groups have handled in a range of ways. Here are the options I know about:
Use broader cultural knowledge:
Choose songs that are already well-known. A random group of people in the US will have maybe a few hundred songs they could get through well with no prep, that people learned from hearing them over and over. Some are children's songs (Old MacDonald, Ba Ba Black Sheep), others are well known older pop songs (Hey Jude, YMCA), holiday songs (Jingle Bells, Rudolph), folk songs (This Land Is Your Land, Amazing Grace), movie songs (A Spoonful of Sugar, Over the Rainbow), etc.
Write new words to well-known songs. At our gatherings we've sung songs adapting the music from The Mary Ellen Carter, The Twelve Days of Christmas, Why Does the Sun Shine, Sinner Man, etc.
Use written music. Many churches traditionally took this approach, some using shaped notes to be easier to learn, and it can expand to sight-reading four-part harmony for a very full sound. This does require more advance work: it's not enough to have a song leader and accompanist, you also need to find, buy, or draft an arrangement in appropriate notation. This also only works within a culture of singing from written music, or if your group is a big enough deal in participants' lives (ex: weekly gatherings) that many will learn to read music specifically for your events.
Build up your own songs. If you keep doing the same songs with the same people, after 2-5 repetitions the group will know them. No one knows Brighter than Today outside the secular solstice context, but since we do it every year (and some of our attendees have heard it at events elsewhere) it goes well. This works a lot better with groups that meet more often: weekly is great; yearly is hard.
Send out recordings in advance. If people listen to recordings in advance they can show up with the melody learned, ready to sing together. Many people will only need to listen once or twice before they can join in with others singing as a group. This also requires more work from organizers, though, and attendees are often not interested in listening through.
Performances. Expect that most people in the group won't sing along, and a few people who already know the song or are especially good at picking it up do join in.
Call and response. The leader sings a line, the group sings it back with the same melody (ex: Chasing Patterns in the Sky). Unlike the others here, this doesn't even depend on literacy or a method of getting words in front of people. But it also really restricts what you can do musically, since most songs aren't a good fit for this format.
Easy songs. Some melodies are much easier to pick up than others. The more the melody does the obvious thing, avoids jumps, and is repetitive, the more a group of people paying attention can pick it up during the song. This is part of the approach of praise music.
Visual guidance. A leader can use the height of their hand above the floor to roughly indicate the pitch of the next note, or the words can be accompanied by indications of the melodic contours. The imprecision means it's more of a hint than exactly communicating the melody, but because it's intuitive it doesn't depend on your attendees having learned a system for communicating melody.
Muddle through. Sometimes you just really want to sing something new and difficult collectively. It won't sound great, but that's not the point.
These can also combine: if you have a song that some people know because they listened in advance, others because they heard it last time, and others because they can read the written music, that could cover 60% of the crowd, even if none of those could individually. And trying to pick something up while singing along with a group where 60% already know it is much easier than one where only the leader is communicating the melody.
A nice illustration here is the evolution of Somebody Will at our gatherings. It is absolutely not an easy song: it has a wide range, makes some large jumps, isn't all that intuitive, changes keys, and has so many sections that I've color coded them on the musician slides I use. The first time we did it I think it sounded really rough. The second time we tried doing it as a performance, but we got a lot of feedback that for this specific song, which is thematically about participation, people really wanted to be singing along. But through sending out recordings in advance to some people, and then by repeating it often enough that a lot of people have picked it up, we have now gotten it to an ok place.
I was pretty sure I already wrote this, but when I wanted to send a link to someone I couldn't find it. If you do remember seeing this before send me a link? I'd be curious to compare!
Comment via: facebook, lesswrong, mastodon, bluesky
Discuss
Alienable (not Inalienable) Right to Buy
We have overly simplistic principles of market organization that don't square with human reality: they give us too much freedom to indulge our short-term impulses, and not enough tools to help our disciplined, long-term selves say no to them. Suffering from over-consumption in ways our long-term self would not endorse is too much the norm to be written off as an acceptable exception to the rule. So we need to rethink "markets" and "freedom" at a societal level, rather than laying the burden on individuals' willpower alone. The right to buy should become alienable: we need societal structures and laws that let our long-term self restrict our future short-term self. I must be able to decide today that tomorrow I'll be unable to get a chocolate.
Status of post: Exploration with many questions remaining open (technical, psychological, economical, legal) but a core direction I'm convinced should be explored in more detail.
The Issue: Ain't of steel
Markets let us buy anything society can produce, anytime.[1] At first glance, that sounds great.
The crux? Society fails to empower us to undo this if we ever want to. It doesn't enable the individual to prevent herself from consuming anything, anytime, in any quantity.
Seeing the consequences, e.g. >40% of adults obese in the US and some 14% diabetic, the conclusion seems clear: the unconstrained consumption possibilities are highly destructive.[2] That is very, very difficult to justify letting happen if there are any means to help limit the problem. The issue is far from niche. We systematically fail to resist things we want ourselves to resist, and in terms of quality of life this risks negating a significant chunk of the gains from the economic progress we've made in recent centuries.
The Short-Term vs. Long-Term Self
We're not a coherent self. We're, a bit stylized, a short-term self and a long-term self.
Short-termie is the cannot-resist-temptation self that becomes the fat, sick American if let loose, or the needless alcoholic, or the gambling addict, the smoker struggling to quit, the TV or YouTube junkie, or the hourly-procrastination-newsmedia-scroller like I am (the latter arguably being a bit less sad and/or easier to prevent if dearly wanted).
The long-term self is the one who genuinely cares about our future well-being, is better at resisting temptations, and is generally the self we want in control.
My cash, my bank card, my supermarkets, my energy-wasting hot shower, essentially the entire society: right now, none of them cares about supporting my long-term self in her fight against short-termie.
Why not? Because we designed society as if individuals were coherent selves who know what they want and what's best for "them". Even though we of course all know better, all too often we have our short-termies in charge when facing the ubiquitous consumption temptations.
What’s to Be Done?
On a meta-level, the solution seems obvious: officially give the long-term self the power to restrict the short-term self when it's about to make its poor choices. In contrast to some life-hack-type solutions, we should understand there is zero reason to think this should be the burden of the individual alone. We should seek legal and practical societal-level solutions to enable the long-term self to rein in short-termie. Once we fully acknowledge that a short-term and a long-term self live in all of us, there is no simple justification for consumption as an inalienable right. In principle, everyone would ideally be empowered to restrict many (any?) of her future short-term consumption decisions, and society should aim in that direction with whichever means seem pragmatically helpful.
In actual implementation it can be simple and hard at the same time. In fact, from the outset it might even look daunting to implement anything practical here. But, arguably, that's just because of the hitherto near-complete lack of thinking (afaik) in that direction; the required evolution of solutions through trial and error until we find things that work hasn't taken place yet.
Respect, then Trial and Error
We should always have respected the long-term self - that is, her difference from short-termie - and searched for ways to empower her. The person's long-term self should be allowed to black-list the person from shops or from individual shelves therein. Bank accounts should offer some types of self-programmable purchase-blockers. The fridge should have a programmable lock (ok, that obviously exists), food cupboards too. Yes, this sounds trivial to circumvent, but we could even have a system where anyone who provides us with the wrong stuff at the wrong time can be legally pursued, i.e. we'd endow ourselves with, somewhat confusingly, an inalienable right to alienability of the right to consume. If really needed, we could even think of going in the direction of generally putting the onus on the seller each time she sells us anything: "Can you provide evidence that the now-ill customer really had her long-term self in charge when she bought that chocolate from you? Did she prove her long-term commitment to wanting the chocolate, as opposed to merely having her short-termie come buy the good in your shop?"
If you say all that won't work: yes, nothing I could propose might work or make sense out of the box. Evolution. We must dare to think about the problem this type of system could solve; then we'll gradually find solutions, with trial and error as in all domains. Will it ever become easy and work really well? Will it really solve a lot of major willpower issues? Dunno.
Foreseeable Objections
- Wouldn't we have already done these things if we could?
Idk. At least I haven't seen evidence that we have searched much for such solutions on a societal level. This makes me optimistic in terms of solvability - if we ever had the will.
- Won't it weaken our willpower? Isn't it an essential part of human life to resist temptations?
Well, pragmatism above all, imho. We see how much havoc the markets create, e.g. for our health, as they put billions into understanding how to better lure our short-term self into buying their stuff. I guess having to fight on fewer fronts with our willpower may mainly allow us to fight better on some of the remaining fronts - and otherwise: note that you, as long-term self, may always decide to train by personally NOT using any of the new measures to restrict your short-term self from any consumption.
- Might it be costly?
Possibly. But we're already the most affluent society in history - yet we use our abundance to create skyrocketing health problems. We could raise the cost of, say, access to junk food a lot without breaking the bank. We're basically drowning in near-free sugar and carbs, and it's wreaking havoc on public health.
At first glance, this might sound like a libertarian nightmare: “Restrictions on possibly all goods?!”
But it’s really the opposite. This system is more libertarian than what we currently have because it adds another dimension of choice: the choice to restrict yourself in advance. It’s a “choose to not be able to choose” option.
In fact, under such a system, we might even be able to legalize more goods and services, because your long-term self could opt out of them, restricting the short-term self from making impulsive decisions.
AI to the rescue?
The problem warrants societal-level solutions - individually we lack the ability to restrict ourselves easily enough - and I think it's important that we seriously explore how we can best tackle it at the right level.
There is now hope, though, that we can improve ourselves even on an individual level if we implement the right AI assistants. If we can integrate them into our bank accounts and have them observe and restrict - with enough authority - our shopping or our picking-stuff-from-the-fridge in the way we have told them to beforehand, quite a lot might already be gained. But if we continue to think about the issue the way we have so far, might we miss that potentially simple emerging solution?
- ^
I’m ignoring e.g. illegal drugs or anything unaffordable, because those are separate discussions.
- ^
Of course, the prevalence of obesity and diabetes might not go to zero just because we systematically support people in sticking to long-term plans. But judging from many anecdotes, from what we read about people trying diets and the difficulties of sticking to them, and from what people say they would be willing to give to reduce their food or other addictions, it seems a reasonable prior that a significant part of addictive behavior could be constrained if we systematically enabled the long-term self to put hard constraints on what the future short-term self can do.
Discuss
The OODA Loop -- Observe, Orient, Decide, Act
United States Air Force Colonel John Boyd was a fighter pilot and military strategist who developed several important strategic theories. While serving as a jet fighter instructor, he was nicknamed "Forty-Second Boyd" because he had a standing offer that he could go up in his plane and defeat any opponent in a simulated dogfight in forty seconds or less -- and do it starting from an extremely unfavorable position! If he failed, he would owe the challenger $40 -- but purportedly he never failed! Further, he was not only able to accomplish this feat against trainees, but also against various visiting pilots who challenged him.[1]
Boyd's concepts have been credited as instrumental to the United States's dramatic victory in the Persian Gulf War ("Operation Desert Storm"), and his insights into aircraft design have been highly influential as well. Boyd was a very unconventional thinker in some ways and did not always get along well within the system, but was nevertheless able to achieve very impressive results.
While some of Boyd's concepts, like the energy-maneuverability theory, are very specific to air combat, he also had ideas that could be applied much more broadly. Perhaps the most well known of these is his concept of the decision cycle, sometimes known as the OODA loop.
It's worth noting that I would not consider myself some superlative expert in this area. I have done substantial reading on the topic, gone over some of Boyd's old briefing slides, listened to some recorded material from Boyd's presentations, applied some of these concepts in my own life, and taught an experimental class at several CFAR workshops covering these principles; however, I am not a military strategist or aircraft designer, nor have I directly practiced this in high-level business strategy or an equivalent area. Take what I have to say here with a grain of salt!
What are OODA Loops?
Basic Definition
OODA is an acronym that stands for Observe, Orient, Decide, Act; the idea is that in order to make a decision, one goes through these steps.
In the Observe step, one gathers information about the situation around them. In Boyd's original context of fighter aircraft operations, we can imagine a pilot looking out the canopy, checking instruments, listening to radio communications, etc.[2]
In the Orient step, one takes the information gathered in the Observe step, integrates it with one's preexisting models, and uses that to compose a mental model of the current situation and one's own role in it. This is the most important and also most complicated step, and it is based in large part on things that have occurred outside the context of the specific situation you may be analyzing -- your training, experience, preexisting mental models, "priors", etc. are important here!
In the Decide step, one takes the model of the situation from the Orient step and decides what to do as a result.
Finally, in the Act step, one actually implements the decision.
Now, this is called the OODA loop -- after you have completed this cycle, you loop back to the Observe step as you see the results of your action and how the situation has changed, then go through this process again! In fact, one can in principle go through many of these loops in the context of a single situation. Let's look at a basic example of what this might look like in an "everyday" sort of scenario.
Example: Unknown Object in the Road
Let's take the example of a driver who makes a turn or moves around a bend in the road and then sees something unknown in the middle of the road ahead. We might model the driver's thoughts in such a scenario as something like this:
Initial Loop:
Observe: There is an uncertain object in the road ahead.
Orient: Based on my experience driving and encountering scenarios like this before, this might be a hazard in the road or it might be innocuous. I could potentially keep going at full speed, continue ahead but slow down, or stop.
Decide: I will slow down, giving me more time to observe what's going on.
Act: <driver slows down>
Second Loop:
Observe: Closing with the object, I see that it appears to be a plastic grocery bag stuck on the road and blowing in the wind.
Orient: This seems like it is not a hazard to me or my car, nor is it something worth investigating further.
Decide: I will accelerate back to normal pace and drive on.
Act: <driver accelerates and moves on>
It's worth noticing that much of what goes on here is implicit. One might have gone through both of these loops and carried out the associated actions in just a second or two. Similarly, it's unlikely that someone will think through these steps as "formally" as described here. You do not have to think through these processes in explicit detail!
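For readers who want the loop spelled out operationally, here is a toy Python sketch of the driver example above. The functions and the `situation` dictionary are placeholders invented for illustration; they are not part of Boyd's formulation.

```python
# Toy sketch of the two driver loops described above. Each pass cycles
# Observe -> Orient -> Decide -> Act, and acting changes what gets observed next.

def observe(situation):
    return situation["what_driver_sees"]           # Observe: gather raw information

def orient(observation, experience):
    # Orient: integrate the observation with prior experience/models (stand-in here).
    if observation == "unknown object ahead":
        return {"assessment": "possible hazard"}
    if observation == "plastic bag blowing in the wind":
        return {"assessment": "harmless"}
    return {"assessment": "unclear"}

def decide(model):
    # Decide: pick an action given the current model of the situation.
    return "slow down" if model["assessment"] != "harmless" else "resume normal speed"

def act(decision, situation):
    # Act: carry out the decision, which changes the situation for the next loop.
    print(f"Driver action: {decision}")
    if decision == "slow down":
        situation["what_driver_sees"] = "plastic bag blowing in the wind"
    return situation

situation = {"what_driver_sees": "unknown object ahead"}
for _ in range(2):                                  # the two loops from the example
    model = orient(observe(situation), experience="years of driving")
    situation = act(decide(model), situation)
```

Running this prints "slow down" and then "resume normal speed", mirroring the two passes through the loop in the example.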
A Note on Orientation
You may have noticed that the "orient" step in the above example brought in a lot of implicit information from other contexts -- the driver's experience, the fact that running over a plastic bag will not harm the car, and so on. This is true! In general, the orientation process is the most complicated and detailed part of the loop.
More Advanced Illustration
Now that we have that basic concept down, it is important to note that the OODA loop is not as linear as this model presents. In the real world one is often making observations while these other processes are also going on -- the driver doesn't close his or her eyes and stop looking at the road while deciding what to do next, for instance!
I want to flag that this is a very simplified presentation of the OODA loop concept, and that Boyd's original briefing "The Essence of Winning and Losing" presented a significantly more detailed version of this cycle. I consider the simplification here relevantly useful but want to be very clear that it is not all the OODA concept has to offer!
In fact, there are many different factors that feed into different elements of the OODA loop. This post is intended to be relatively basic so I will not go into them in great detail here, but I think it is worth looking at Boyd's own diagram showing the more expanded loop:
This image is from "The Essence of Winning and Losing", a briefing Boyd gave in the '90s to various military figures. Note that there are several steps that flow backwards into different components of the loop, and the much more complicated and detailed view of the Orient step in particular.
Applications of the OODA Framework
OODA Loops In Competition
The OODA loop was originally developed in the context of adversarial conflict or competition -- air-to-air combat -- and while it has other applications these are perhaps the most "natural".
Getting Inside their OODA Loop
One insight that the OODA framework leads to is that if you can go through your decision cycles faster than the opponent, you can gain a substantial advantage. By the time the adversary finishes their decision cycle, you may have already taken actions that change the situation, leaving them to either react improperly to conditions that no longer exist -- or perhaps to go back and start reorienting etc., potentially leading to decision paralysis. This is sometimes referred to as being "inside their OODA loop", phrasing which as I understand it originates with Boyd himself.
Basic military tactics, such as ambushes, can be conceptualized in terms of the OODA loop -- by carrying out an ambush and attacking enemies that are not initially aware of what's going on, one can initiate the conflict while already oriented to the situation, while your targets are not yet oriented to a battle, hence catching the enemy off guard and delaying their response.[3]
I remember applying the OODA loop in the context of a video game called Friends vs. Friends, which combines first-person shooter mechanics with a hand of cards you can play to dramatically change the situation -- playing a card might give you a more powerful weapon, make your opponent's head larger and easier to hit, give you bonus health, or even transport both you and your opponent to a new map! However, the opponent sees what card you have played. One thing I realized by applying the OODA framework was that playing some especially powerful cards at the last possible moment was a good way to get inside the opponent's OODA loop; while "by default" people (including me!) often played cards earlier in the round, this gives opponents more time to react.
For example, if I played a card that gave me a shotgun (in this game a very powerful close-range weapon but bad at longer range) right at the start of the round, the opponent might know I had that and "play around" it by trying to set up longer-ranged engagements. But if instead I played the shotgun card when I was already right next to the opponent, this opportunity to counter would be diminished!
However, the most dramatic card effect that benefited from this was a card called "Weapon Swap", which trades your weapon with that of the opponent. This is already a very powerful card because it allows you to potentially trade a weak weapon for an opponent's powerful one. I also found it strong in that it can quickly reverse the tactical situation; in general in Friends vs. Friends, if I have a weapon that has the advantage at long distance I will try to keep the opponent back, while if I have a weapon that has the advantage at close distance I will try to close in. The Weapon Swap card allows you to suddenly reverse this dynamic -- for instance, if I have a sniper rifle (one of the best long-range weapons) and my opponent is rushing me with a shotgun (one of the best close-range weapons), and right when they're about to reach me I swap our weapons, the opponent now perhaps needs to do the exact opposite of what they were doing before, desperately retreating instead of charging in!
In the OODA framework, by holding onto this card and using it to quickly reverse what weapons we have equipped, I am forcing the opponent to very rapidly go through an OODA loop -- observe that I've used the card, orient to the fact that our weapons have been swapped, decide how to respond, carry out that response -- and very often they can't do it in time, either panicking and freezing up or continuing to carry out a plan that is now directly counterproductive![4]
OODA in the Business World
Similarly, one can apply this model to business decisions. The OODA concept can relevantly show how startup companies can in some cases successfully compete with more established businesses.
Let's say that a smart engineer at a small startup and a smart engineer at a big company notice a new opportunity at the same time. At the startup, at least in principle the whole organization could pivot to focus on this opportunity (assuming it seems promising enough) within a few days. At the large company, the engineer might meet with their manager, who might then bring something up at a higher level meeting, which then might go to a strategy board, which might then request some people do a pilot program, which (if it goes well) might then be passed up to higher executives for evaluation... and each of those steps might well take multiple weeks! In fact, "weeks" is perhaps optimistic for some!
In other words, by the time that the big company has gotten through its complicated decision cycle, the small startup might well have run through multiple full OODA loops involving testing new ideas, seeing how customers react, and so on. This can be a major advantage for the smaller, more agile group!
A friend of mine from college now runs a startup company that offers an API allowing air carriers to make much quicker changes to their offerings. For complicated bureaucratic reasons, doing basic A/B tests of web sales at airlines using previous systems would often take months or even years to implement, and my friend's company offers a system that can enable much more rapid decisions. One can perhaps easily see why it would be a very major business advantage to be able to test out new product or web sales ideas multiple times faster than one's competitors -- your OODA loop has suddenly become much, much quicker!
(Important Note: there is more to getting inside an opponent's OODA loop than just speed and surprise, but I need to study and practice this aspect more before I feel comfortable writing it up in detail. I'll hopefully have more on this later!)
OODA Loops Without Competition
This idea of "getting inside their OODA loop" -- while interesting and helpful -- is not the only area where Boyd's OODA principles can be applied! In fact, thinking about improving the OODA loop can be helpful even in scenarios where one does not necessarily have any direct competitor at all.
One might, for instance, consider how to make your own decision cycle faster and better even without external adversaries pressuring you. One interesting example of this was when Lightcone was recently renovating their campus. I remember noticing that they were willing to pay a premium in order to get decision-critical information faster in order to clear relevant bottlenecks -- for example, when considering a certain aspect of their project that required an external specialist to evaluate (I believe a plumber or electrician but might be misremembering?), the Lightcone team was willing to pay extra to get someone who could come that day in order to get the critical information they needed faster.
This sort of action can be conceptualized in an OODA framework -- in this case, the key issue in the process is the need for specialist observation and orientation, which the main team cannot really provide. By putting a high premium on getting that specialist input quickly, one can get through this (otherwise bottlenecked) step and move on to the rest of the project quickly. However, the rest of the loop has to be able to handle it -- if we imagine a scenario where, despite getting the specialist's evaluation as quickly as possible, the team doesn't meet to discuss things further until another month, the haste to get that critical information doesn't seem to make very much sense!
"OODA Failure Analysis" - a technique
One technique that I developed is something I call "OODA Failure Analysis" -- this technique uses the OODA framework as a method of analyzing things that have gone poorly in scenarios one encounters in one's own life.
When teaching my OODA class at CFAR workshops, I often ask participants to come up with a list of several things that have gone wrong for them recently, ranging from minor issues -- "I missed the train last week and was late for work" works fine here -- to more serious problems.
If you want to apply this technique here, please actually do that now. This is not a joke, and you will probably get more out of the post if you do. List ten things, actually write them down on a piece of paper or a document on your computer or something. Don't vaguely have four things in your head, actually write down ten.
Then, I have them analyze each of the different situations they came up with in terms of where in the OODA loop something went wrong.
In other words, did this go wrong because of bad observation, bad orientation, bad decisions, or bad actions? Here are some quick examples:
- Alice's team develops a major product without first checking to see if it's something people actually want -- after a year and a half of development, the product works great, but it turns out there isn't much of any demand. (I would consider this an observation failure -- failure to observe critical information leads to lots of wasted time.)
- Bob gets into extended conflicts at work, where he argues at length with higher-ups about the direction of various projects and initiatives. Even though his arguments are often good, he makes little headway and ultimately leaves the company. Bob bemoans that he was just treating his manager as a peer -- why wasn't she willing to take him seriously? (I would consider this an orientation failure -- Bob conceptualized his role in the system improperly and thus used interaction patterns that weren't likely to succeed.)[5]
- Cathy is training for a Magic: the Gathering tournament. She knows what she's capable of, has a good sense of what the "meta" (popular/standard strategies used by other players at the time) looks like, and ends up having to choose between four or five different decks that she thinks will be viable before the event. She picks one that she thinks will be the most advantageous, even though it's something she's unfamiliar with -- but she's confident she can practice enough to make up for that in time. When the time comes, her meta read is accurate but the less familiar deck underperforms for her, and she wishes she'd gone with a different option. (I would consider this a decision failure -- Cathy had good observations and orientation to the situation, but ultimately made the wrong choice from her list of options.[6])
- Dan has a good understanding of an open market for laser-cut game pieces, a good understanding of his capabilities to fill that need, and a good plan for products that he thinks will do very well. However, for mysterious reasons he ends up not executing on the plan, and a few years later, Dan notices other companies are now producing the type of thing that he'd thought of. This is the first time anything like this has ever happened to him. (I would consider this an act failure - Dan had good observation, orientation, and a good decision, but fell down on execution. If this were a distinct pattern across many events, though, there might be an orientation issue in terms of not modeling one's own capabilities accurately...)
In general, a failure at an early step of the loop can often "cascade downwards", so if you're unsure I would tend to favor reporting a failure at an earlier part of the process. Also, since decision can flow pretty directly from orientation, you may find these two similar enough that you want to group them as one; I'm undecided on whether to make that change to this technique "more formally" and probably need to test it with more participants to see!
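If a concrete tallying step helps, here is a toy sketch of the classification-and-count exercise; the example items are stand-ins (use your own list of ten), and the four categories are the ones described above.

```python
from collections import Counter

# Toy sketch: tag each item with the OODA stage where it went wrong,
# then count per stage to look for a dominant failure pattern.
STAGES = ("observe", "orient", "decide", "act")

failures = [  # placeholders -- replace with your own ten items
    ("built product nobody wanted", "observe"),
    ("argued with manager as if a peer", "orient"),
    ("picked the unfamiliar deck", "decide"),
    ("never executed the laser-cutting plan", "act"),
    ("missed the train; hadn't checked the schedule", "observe"),
]

counts = Counter(stage for _, stage in failures)
for stage in STAGES:
    print(f"{stage:8s}: {counts[stage]}")
# A dominant stage (e.g. lots of "observe" failures) suggests which part of
# your own decision process to work on first.
```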
If you want to apply this technique here, please actually classify the items on your list now. Write something down next to them, don't vaguely do it in your head.
In some cases, doing this exercise can help one notice quite distinct patterns in where things seem to go wrong -- patterns which can really help one prioritize which areas of one's own processes might need improvement!
For example, if you find yourself thinking "huh, it seems like a recurring theme in things that go wrong for me is that I keep going into situations without gathering enough information first", that could be very useful feedback as to which part of your process you might want to spend some more time on!
Also, even if you don't determine a clear pattern from a bunch of examples, I think thinking about things and classifying them like this can help build fluency with the OODA model of the decision cycle as a whole.
Final Thoughts
I have more thoughts on OODA, Boyd, and strategic thinking in general, but this post is already perhaps getting too long for an introduction -- hopefully, this will prove a useful introduction to the decision cycle or OODA Loop, and I will perhaps follow it up with more! I've found it a very interesting and at times fruitful area of study.
- ^
He had a somewhat "trick" maneuver that let him do this, but it still worked.
- ^
Boyd originally called this step "Sensing" rather than "Observing", but the name "SODA Loop" seemed rather unserious and had to change.
- ^
Interestingly enough, some military counter-ambush tactics involve reacting to an ambush by attacking very quickly and aggressively into the ambush zone, hopefully catching the ambushers by surprise and forcing them to be the ones reorienting to a new situation!
- ^
Much like in the counterambush scenario described above, the best response to the Weapon Swap card is perhaps to play your own Weapon Swap card, putting the shoe back on the opponent's foot -- now they are the ones who have to react fast!
- ^
Boyd himself got into lots of conflicts with higher-ups, but was often able to use his strategic knowledge to cleverly outmaneuver them!
- ^
You could perhaps argue that this is actually orientation failure though, with the argument being something like "she should have known she didn't have enough practice time and cut that from the list" -- see the upcoming caveat re: grouping decision and orientation.
Discuss
Comment on "Death and the Gorgon"
(some plot spoilers)
There's something distinctly uncomfortable about reading Greg Egan in the 2020s. Besides telling gripping tales with insightful commentary on the true nature of mind and existence, Egan stories written in the 1990s and set in the twenty-first century excelled at speculative worldbuilding, imagining what technological wonders might exist in the decades to come and how Society might adapt to them.
In contrast, "Death and the Gorgon", published in the January/February 2024 issue of Asimov's, feels like it's set twenty minutes into the future. The technologies on display are an AI assistant for police officers (capable of performing research tasks and carrying on conversation) and real-time synthetic avatars (good enough to pass as a video call with a real person). When these kinds of products showed up in "'90s Egan"—I think of Worth's "pharm" custom drug dispenser in Distress (1995) or Maria's "mask" for screening spam calls in Permutation City (1994)—it was part of the background setting of a more technologically advanced world than our own.
Reading "Gorgon" in 2024, not only do the depicted capabilities seem less out of reach (our language model assistants and deepfakes aren't quite there yet, but don't seem too far off), but their literary function has changed: much of the moral of "Gorgon" seems to be to chide people in the real world who are overly impressed by ChatGPT. Reality and Greg Egan are starting to meet in the middle.
Our story features Beth, a standard-issue Greg Egan protagonist[1] as a small-town Colorado sheriff investigating the suspicious destruction of a cryonics vault in an old mine: a naturally occurring cave-in seems unlikely, but it's not clear who would have the motive to thaw (murder?) a hundred frozen heads.
Graciously tolerating the antics of her deputy, who is obsessed with the department's trial version of (what is essentially) ChatGPT-for-law-enforcement, Beth proceeds to interview the next of kin, searching for a motive. She discovers that many of the cryopreserved heads were beneficiaries of a lottery for terminally ill patients in which the prize was free cryonic suspension. The lottery is run by OG—"Optimized Giving"—a charitable group concerned with risks affecting the future of humanity. As the investigation unfolds, Beth and a colleague at the FBI begin to suspect that the lottery is a front for a creative organized crime scheme: OG is recruiting terminal patients to act as assassins, carrying out hits in exchange for "winning" the lottery. (After which another mafia group destroyed the cryonics vault as retaliation.) Intrigue, action, and a cautionary moral ensue as our heroes make use of ChatGPT-for-law-enforcement to prove their theory and catch OG red-handed before more people get hurt.
So, cards on the table: this story spends a lot of wordcount satirizing a subculture that, unfortunately, I can't credibly claim not to be a part of. "Optimized Giving" is clearly a spoof on the longtermist wing of Effective Altruism—and if I'm not happy about how the "Effective Altruism" brand ate my beloved rationalism over the 2010s, I don't think anyone would deny the contiguous memetic legacy involving many of the same people. (Human subcultures are nested fractally; for the purposes of reviewing the story, it would benefit no one for me to insist that Egan isn't talking about me and my people, even if, from within the subculture, it looks like the OpenPhil people and the MIRI people and the Vassarites and ... &c. are all totally different and in fact hate each other's guts.)
I don't want to be defensive, because I'm not loyal to the subculture, its leaders, or its institutions. In the story, Beth talks to a professor—think Émile Torres as a standard-issue Greg Egan character—who studies "apostates" from OG who are angry about "the hubris, the deception, and the waste of money." That resonated with me a lot: I have a long dumb story to tell about hubris and deception, and the corrupting forces of money are probably a big part of the explanation for the rise and predictable perversion of Effective Altruism.
So if my commentary on Egan's satire contains some criticism, it's absolutely not because I think my ingroup is beyond reproach and doesn't deserve to be satirized. They (we) absolutely do. (I took joy in including a similar caricature in one of my own stories.) But if Egan's satire doesn't quite hit the mark of explaining exactly why the group is bad, it's not an act of partisan loyalty for me to contribute my nuanced explanation of what I think it gets right and what it gets wrong. I'm not carrying water for the movement;[2] it's just a topic that I happen to have a lot of information about.
Without calling it a fair portrayal, the OG of "Gorgon" isn't a strawman conjured out of thin air; the correspondences to its real-world analogue are clear. When our heroine suspiciously observes that these soi-disant world-savers don't seem to be spending anything on climate change and the Émile Torres–analogue tells her that OG don't regard it as an existential threat, this is also true of real-world EA. When the Torres-analogue says that "OG view any delay in spreading humanity at as close to light-speed as possible as the equivalent of murdering all the people who won't have a chance to exist in the future," the argument isn't a fictional parody; it's a somewhat uncharitably phrased summary of Nick Bostrom's "Astronomical Waste: The Opportunity Cost of Delayed Technological Development". When the narrator describes some web forums as "interspers[ing] all their actual debunking of logical fallacies with much more tendentious claims, wrapped in cloaks of faux-objectivity" and being "especially prone to an abuse of probabilistic methods, where they pretended they could quantify both the likelihood and the potential harm for various implausible scenarios, and then treated the results of their calculations—built on numbers they'd plucked out of the air—as an unimpeachable basis for action", one could quibble with the disparaging description of subjective probability, but you can tell which website is being alluded to.
The cryonics-as-murder-payment lottery fraud is fictional, of course, but I'm inclined to read it as artistically-licensed commentary on a strain of ends-justify-the-means thinking that does exist within EA. EA organizations don't take money from the mob for facilitating contract killings, but they did take money from the largest financial fraud in history, which was explicitly founded as a means to make money for EA. (One could point out that the charitable beneficiaries of Sam Bankman-Fried's largesse didn't know that FTX wasn't an honest business, but we have to assume that the same is true of OG in the story: only a few insiders would be running the contract murder operation, not the rank-and-file believers.)
While the depiction of OG in the story clearly shows familiarity with the source material, the satire feels somewhat lacking qua anti-EA advocacy insofar as it relies too much on mere dismissal rather than presenting clear counterarguments.[3] The effect of OG-related web forums on a vulnerable young person is described thus:
Super-intelligent AIs conquering the world; the whole Universe turning out to be a simulation; humanity annihilated by aliens because we failed to colonize the galaxy in time. Even if it was all just stale clichés from fifty-year-old science fiction, a bright teenager like Anna could have found some entertainment value analyzing the possibilities rigorously and puncturing the forums' credulous consensus. But while she'd started out healthily skeptical, some combination of in-forum peer pressure, the phony gravitas of trillions of future deaths averted, and the corrosive effect of an endless barrage of inane slogans pimped up as profound insights—all taking the form "X is the mind-killer," where X was pretty much anything that might challenge the delusions of the cult—seemed to have worn down her resistance in the end.
I absolutely agree that healthy skepticism is critical when evaluating ideas and that in-forum peer pressure and the gravitas of a cause (for any given set of peers and any given cause) are troubling sources of potential bias—and that just because a group pays lip service to the value of healthy skepticism and the dangers of peer pressure and gravitas, doesn't mean the group's culture isn't still falling prey to the usual dysfunctions of groupthink. (As the inane slogan goes, "Every cause wants to be a cult.")
That being said, however, ideas ultimately need to be judged on their merits, and the narration in this passage[4] isn't giving the reader any counterarguments to the ideas being alluded to. (As Egan would know, science fiction authors having written about an idea does not make the idea false.) The clause about the whole Universe turning out to be a simulation is probably a reference to Bostrom's simulation argument, which is a disjunctive, conditional claim: given some assumptions in the philosophy of mind and the theory of anthropic reasoning, then if future civilization could run simulations of its ancestors, then either they won't want to, or we're probably in one of the simulations (because there are more simulated than "real" histories). The clause about humanity being annihilated by failing to colonize the galaxy in time is probably a reference to Robin Hanson et al.'s grabby aliens thesis, that the Fermi paradox can be explained by a selection effect: there's a relatively narrow range of parameters in which we would see signs of an expanding alien civilization in our skies without already having been engulfed by them.
No doubt many important criticisms could be made of Bostrom's or Hanson's work, perhaps by a bright teenager finding entertainment value in analyzing the possibilities rigorously. But there's an important difference between having such a criticism[5] and merely asserting that it could exist. Speaking only to my own understanding, Hanson's and Bostrom's arguments both look reasonable to me? It's certainly possible I've just been hoodwinked by the cult, but if so, the narrator of "Gorgon"'s snarky description isn't helping me snap out of it.
It's worth noting that despite the notability of Hanson's and Bostrom's work, in practice, I don't see anyone in the subculture particularly worrying about losing out on galaxies due to competition with aliens—admittedly, because we're worried about "super-intelligent AIs conquering the world" first.[6] About which, "Gorgon" ends on a line from Beth about "the epic struggle to make computers competent enough to help bring down the fools who believe that they're going to be omnipotent."
This is an odd take from the author[7] of multiple novels in which software minds engage in astronomical-scale engineering projects. Accepting the premise that institutional longtermist EA deserves condemnation for being goofy and a fraud: in condemning them, why single out as the characteristic belief of this despicable group, the idea that future AI could be really powerful?[8] Isn't that at least credible? Even if you think people in the cult or who work at AI companies are liars or dupes, it's harder to say that about eminent academics like Stuart Russell, Geoffrey Hinton, Yoshua Bengio, David Chalmers, and Daniel Dennett, who signed a statement affirming that "[m]itigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."[9]
Egan's own work sometimes features artificial minds with goals at odds with their creator, as in "Steve Fever" (2007) or "Crystal Nights" (2008), and with substantial advantages over biological creatures: in Diaspora (1997), the polis citizens running at 800 times human speed were peace-loving, but surely could have glassed the fleshers in a war if they wanted to. If you believe that AI could be at odds with its creators and hold a competitive advantage, scenarios along the lines of "super-intelligent AIs conquering the world" should seem plausible rather than far-fetched—a natural phenomenon straightforwardly analogous to human empires conquering other countries, or humans dominating other animals.
Given so many shared premises, it's puzzling to me why Egan seems to bear so much antipathy towards "us",[10] rather than regarding the subculture more coolly, as a loose amalgamation of people interested in many of the same topics as him, but having come to somewhat different beliefs. (Egan doesn't seem to think human-level AI is at all close, nor that AI could be qualitatively superhumanly intelligent; an aside in Schild's Ladder (2002) alludes to a fictional result that there's nothing "above" general intelligence of the type humans have, modulo speed and memory.) He seems to expect the feeling to be mutual: when someone remarked on Twitter about finding it funny that the Less Wrong crowd likes his books, Egan replied, "Oh, I think they've noticed, but some of them still like the, err, 'early, funny ones' that predate the cult and hence devote no time to mocking it."
Well, I can't speak for anyone else, but personally, I like Egan's later work, including "Death and the Gorgon."[11] Why wouldn't I? I am not so petty as to let my appreciation of well-written fiction be dulled by the incidental fact that I happen to disagree with some of the author's views on artificial intelligence and a social group that I can't credibly claim not to be a part of. That kind of dogmatism would be contrary to the ethos of humanism and clear thinking that I learned from reading Greg Egan and Less Wrong—an ethos that doesn't endorse blind loyalty to every author or group you learned something from, but a discerning loyalty to whatever was good in what the author or group saw in our shared universe. I don't know what the future holds in store for humanity. But whatever risks and opportunities nature may present, I think our odds are better for every thinking individual who tries to read widely and see more.[12]
Some people say that Greg Egan is bad at characterization. I think he just specializes in portraying reasonable people, who don't have grotesque personality flaws to be the subject of "characterization." ↩︎
I do feel bad about the fraction of my recent writing output that consists of criticizing the movement—not because it's disloyal, but because it's boring. I keep telling myself that one of these years I'm going to have healed enough trauma to forget about these losers already and just read ArXiv papers. Until then, you get posts like this one. ↩︎
On the other hand, one could argue that satire just isn't the right medium for presenting counterarguments, which would take up a lot of wordcount without advancing the story. Not every written work can accomplish all goals! Maybe it's fine for this story to make fun of the grandiose and cultish elements within longtermist EA (and there are a lot of them), with a critical evaluation of the ideas being left to other work. But insofar as the goal of "Gorgon" is to persuade readers that the ideas aren't even worthy of consideration, I think that's a mistake. ↩︎
In critically examining this passage, I don't want to suggest that "Gorgon"'s engagement with longtermist ideas is all snark and no substance. Earlier in the story, Beth compares OG believers "imagin[ing] that they're in control of how much happiness there'll be in the next trillion years" to a child's fantasy of violating relativity by twirling a rope millions of miles long. That's substantive: even if the future of humanity is very large, the claim that a nonprofit organization today is in a position to meaningfully affect it is surprising and should not be accepted uncritically on the basis of evocative storytelling about the astronomical stakes. ↩︎
Which I think would get upvoted on this website if it were well done—certainly if it were written with the insight and rigor characteristic of a standard-issue Greg Egan protagonist. ↩︎
Bostrom's "Astronomical Waste" concludes that "The Chief Goal for Utilitarians Should Be to Reduce Existential Risk": making sure colonization happens at all (by humanity or worthy rather than unworthy successors) is more important than making it happen faster. ↩︎
In context, it seems reasonable to infer that Beth's statement is author-endorsed, even if fictional characters do not in general represent the author's views. ↩︎
I'm construing "omnipotent" as rhetorical hyperbole; influential subcultural figures clarifying that no one thinks superintelligence will be able to break the laws of physics seems unlikely to be exculpatory in Egan's eyes. ↩︎
Okay, the drafting and circulation of the statement by Dan Hendrycks's Center for AI Safety was arguably cult activity. (While Hendrycks has a PhD from UC Berkeley and co-pioneered the usage of a popular neural network activation function, he admits that his career focus on AI safety was influenced by the EA advice-counseling organization 80,000 Hours.) But Russell, Hinton, et al. did sign. ↩︎
This isn't the first time Egan has satirized the memetic lineage that became longtermist EA; Zendegi (2010) features negative portrayals of a character who blogs at overpoweringfalsehood.com (a reference to Overcoming Bias) and a Benign Superintelligence Bootstrap Project (a reference to what was then the Singularity Institute for Artificial Intelligence). ↩︎
Okay, I should confess that I do treasure early Egan (Quarantine (1992)/Permutation City (1994)/Distress (1995)) more than later Egan, but not because they devote no time to mocking the cult. It's because I'm not smart enough to properly appreciate all the alternate physics in, e.g., Schild's Ladder (2002) or the Orthogonal trilogy (2011–2013). ↩︎
Though we're unlikely to get it, I've sometimes wished for a Greg Egan–Robin Hanson collaboration; I think Egan's masterful understanding of the physical world and Hanson's unsentimental analysis of the social world would complement each other well. ↩︎
Discuss
Fireplace and Candle Smoke
We celebrated New Year's Eve at my dad's, including a fire in the fireplace. I was curious how much the wood smoke went up the chimney vs collecting in the room, and decided to take some measurements. I used the M2000 that I got when investigating whether a ceiling fan could be repurposed as an air purifier.
Here's what I found:
I started the meter running at 4:30pm, and we started the fire at about 5:30pm. I didn't write down the specific time because I thought it would be evident from the chart [1] but actually I can't see it at all.
Then at 6:45pm we lit Hanukkah candles, and the smoke from the matches being blown out had a very sharp effect. Particulate levels stayed high for the rest of the time, with both the fireplace and the candles going, though I attribute most of it to the candles.
[1] Several years ago I remember reading Sam Harris' blog post The Fireplace Delusion, which argues that while we consider wood fires to be wholesome, they're actually worse than smoking, and that this feeling of "wait, but wood fires are good!" is useful for understanding what religious folks are thinking when presented with atheism. Several years later his post had gotten jumbled in my head into saying that fireplace fires cause bad air quality in your own home, and so when I ran this experiment I was expecting to see quite high levels. On rereading, however, he spends a lot of time talking about externalities: the wood smoke that goes up my chimney goes, in part, into many other people's houses, causing a small bit of harm in each. So no conflict there.
Discuss
new chinese stealth aircraft
Recently, 2 Chinese military aircraft were seen flying for the first time. Some people wanted to read about my thoughts on them. In this post, I'll be referring to them as "Diamond" and "Dart" based on their shapes. Speculative designations being used elsewhere are:
- Diamond = Chengdu J-36
- Dart = Shenyang J-XS
Instead of embedding photos here, I'll just link to some articles with pictures:
what the photos show
aircraft size
Diamond seems to be ~22m long, with a central weapon bay long enough for the PL-17 or YJ-83 (6.4m), and 2 smaller bays long enough for the PL-15 (4m). It could probably carry glide bombs too. Its wing area is quite large for a fighter aircraft. The planform is similar to a F-16XL, and scaling that up to 22m length would be ~50 tons MTOW.
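As a rough sanity check on that scale-up (my own back-of-envelope sketch, not a calculation from the post), mass for a geometrically similar airframe scales roughly with the cube of length; the F-16XL reference numbers below (~16.5 m length, ~22 t MTOW) are assumed round figures:

```python
# Cube-law scale-up of an F-16XL-like planform to Diamond's estimated length.
# Reference figures are assumed approximations, not official data.
ref_length_m = 16.5      # assumed F-16XL length
ref_mtow_t = 21.8        # assumed F-16XL max takeoff weight, tonnes
diamond_length_m = 22.0  # estimated Diamond length from the photos

scale = diamond_length_m / ref_length_m    # linear scale factor, ~1.33
mtow_estimate_t = ref_mtow_t * scale ** 3  # mass ~ volume ~ length^3
print(f"scale ~ {scale:.2f}, estimated MTOW ~ {mtow_estimate_t:.0f} t")  # ~52 t
```

That crude rule lands in the same ~50 ton ballpark as the estimate above.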
Dart is smaller, and its bays seem big enough for the PL-15 but not the PL-17. So, it's meant to operate closer to its targets, but the PL-15 is still bigger and longer-range than current US air-to-air missiles.
aerodynamics
Diamond has thin delta wings. Sweep is ~50°, quite high. It looks designed to go Mach 2 in a straight line at high altitude.
Dart has higher aspect ratio wings. It should have better turning at subsonic speeds, but probably has less range than Diamond at supersonic speeds and a lower max altitude. It should have significantly shorter takeoff distance than Diamond.
control surfaces
Both aircraft have no vertical stabilizer. Normally, those are important for preventing uncontrolled yaw to keep the aircraft pointed forwards.
Diamond has a lot of separate ailerons in the back, which could control yaw by increasing drag on 1 side. That's how the B-2 did things. Diamond also has thrust vectoring, as indicated by things including space between the exhaust nozzles; I suspect that's meant to be the main way Diamond controls yaw.
Dart has fewer ailerons, but has some funky protrusions on the wingtips - I wonder if those are exhaust nozzles for bleed air from the engines for yaw control. If the wingtip things aren't for controlling yaw, then Dart definitely needs thrust vectoring, but it seems designed for lower cost than Diamond and thrust vectoring does increase cost.
stealth
I haven't done simulations or anything, but Diamond seems about as stealthy vs aircraft radar as the F-22, and more stealthy from above or vs low-frequency radar.
The advantage that the F-22 and F-35 have in stealth over the J-20 comes from the US:
- having better supercomputers for simulations when they were designed
- being willing to spend more on manufacturing, and thus making fewer compromises about stealth
Those advantages are no longer applicable, so you shouldn't expect Chinese aircraft to be particularly worse in terms of stealth.
Aircraft are usually more stealthy from below than from above. So, high altitude is an advantage. Diamond should have a very high max altitude, higher than the F-22.
Radar reflections also depend on frequency. Removing vertical stabilizers has a bigger effect on low-frequency radar, which isn't usually used by fighter aircraft because it requires bigger antennas. It also reduces RCS from above more than RCS from below, since they're on the top of the aircraft.
landing gear
The aircraft were seen with the landing gear left down, which might indicate an early test flight. (You test 1 thing at a time, and landing gear cycling is another potential failure.)
Diamond has tandem-wheel main landing gear, which indicates high max weight, possibly >50 tons.
flight location
The flights were done over a populated area. The landing gear staying down points to an early test flight, but on the other hand, early tests are usually done where a crash won't hit people. It's possible that risk was outweighed by desire to show off something for Mao's birthday, or maybe testing has actually been going on for a while.
engines
Video of Dart indicates 2 engines with afterburners.
Diamond seems to have 3 engines, since it has 3 nozzles. It might have been designed with 3 engines so it could cruise on 1 or 2 engines at subsonic speeds + low altitude without unbalancing thrust. If those engines are Shenyang WS-15 engines, it would have a pretty high thrust/weight ratio, which I'm guessing would be enough for a max speed between Mach 2.5 and Mach 3. Obviously heat becomes a problem at that point. Such high speed and T/W also implies a high max altitude, maybe ~22 km.
Most fighter aircraft have afterburners, but Diamond might actually not need them.
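To illustrate the thrust-to-weight reasoning (my own sketch; the per-engine thrust values are assumed WS-15-class figures, not confirmed specs, and the weight is the ~50 t MTOW guess from earlier):

```python
# Rough T/W check for a 3-engine, ~50 t aircraft with WS-15-class engines.
# Thrust figures are assumptions for illustration only.
n_engines = 3
thrust_ab_kn = 180.0   # assumed afterburning thrust per engine, kN
thrust_dry_kn = 107.0  # assumed dry (military) thrust per engine, kN
mtow_t = 50.0

weight_kn = mtow_t * 1000 * 9.81 / 1000                 # ~490 kN at MTOW
tw_afterburning = n_engines * thrust_ab_kn / weight_kn  # ~1.1
tw_dry = n_engines * thrust_dry_kn / weight_kn          # ~0.65
print(f"T/W ~ {tw_afterburning:.2f} (afterburning), {tw_dry:.2f} (dry)")
```

Under those assumptions, dry thrust alone gives a T/W in the 0.6-0.7 range at full weight for a slender high-sweep delta, which is consistent with the suggestion that Diamond might not need afterburners.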
Chinese gas turbines are still not quite as good as new US ones, but based on recent power plant turbines, they're now using single-crystal nickel alloys with internal cooling channels and thermal barrier coatings, and are good enough for competitive aircraft if fuel efficiency isn't critical.
Some people are saying one engine of Diamond is a ramjet, but that doesn't make sense for the overall design. I think all the engines of Diamond and Dart are low-bypass turbofans, but it's possible the center engine of Diamond has a different bypass ratio.
sensors
As articles have noted, Diamond seems to have some big sideways-pointed AESA radars, and a big optical sensor that's probably an IRST. Dart seems to have smaller and less expensive sensors, but I'm sure it still has a decent AESA radar.
China is pretty good at making GaN AESA radars now. They're still not quite as good as new American ones for a given size and power, but not enough to outweigh significant size differences, and the Chinese are getting a lot more radar per cost - which is part of why they're putting AESA radar in AA missiles.
cost
The different manufacturing methods for modern military aircraft have similar costs. I'd expect Diamond to cost about as much per mass as a F-35. That's $100M for 30 tons, so Diamond might be $170M if it was made in the US, but China can often make military stuff for 1/3 the nominal cost in the US. For aircraft, I suspect the cost multiplier is closer to 1/2, and the J-20 nominal cost is ~$60M. So Diamond might be ~$85M, maybe a bit more because it seems premium, while Dart might be a bit less per mass.
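Spelled out, that cost estimate is just proportional scaling by mass plus a country cost multiplier; a minimal sketch using the figures stated above:

```python
# Cost-per-mass scaling from the F-35 reference point given in the text,
# then applying the assumed ~1/2 China-vs-US cost multiplier for aircraft.
f35_cost_musd = 100.0   # reference: ~$100M
f35_mass_t = 30.0       # reference: ~30 tons
diamond_mass_t = 50.0   # Diamond MTOW estimate from earlier
china_multiplier = 0.5  # assumed cost ratio for aircraft

us_equivalent_musd = f35_cost_musd / f35_mass_t * diamond_mass_t  # ~$167M
diamond_cost_musd = us_equivalent_musd * china_multiplier         # ~$83M
print(f"US-equivalent ~ ${us_equivalent_musd:.0f}M, Diamond ~ ${diamond_cost_musd:.0f}M")
```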
strategic purposes
I previously wrote a bit about Chinese air strategy; see "chinese strategy" in this post.
Here are some relevant papers by the Chinese aircraft designers Yang Wei and Wang Haifeng. Yang Wei is someone I'd previously noted as a possible modern Mikhail Gurevich.
Diamond
Based on the aircraft size, the main purpose of Diamond is to carry big long-range missiles, such as the PL-17 and YJ-83. It looks expensive, and those are expensive missiles. It's also not possible to target something stealthy (like a F-35) at very long range. So, Diamond is meant to attack high-value non-stealthy targets such as military ships, tanker aircraft, and AWACS.
Based on extrapolation from existing aircraft, I'm guessing it's designed for a combat radius of ~1600 km without refuelling. That's long-range for a fighter, but short for a bomber.
It has a long takeoff distance, so it's definitely land-based. It's too expensive and short-range for strategic bombing.
Diamond has some big sideways AESA radars. I suspect it's meant to act as AWACS sometimes, making it sort of a...stealthy supercruising missile-bomber/AWACS. It could fire, turn 90°, use one radar for detection and the other radar to send data to friendly craft, then turn off its radar and lose any incoming attention by being fast and stealthy.
The other goal apparent in the design of Diamond is competing directly against F-22 stealth by sacrificing maneuverability and production cost. It's hard to beat the stealth of a F-22 from below vs aircraft radar, so the plan would be:
- Use ship-based low-frequency radar to detect a F-22.
- Use a big IRST on Diamond to track it.
- Have high speed to chase down the F-22.
- Fly almost directly over it, and get a missile lock first by being at higher altitude and seeing the less-stealthy top side.
Compared to Diamond, Dart is smaller and more maneuverable, so it'd be used more like existing fighters than Diamond, with fast turning being relevant for the same reasons. Usage would be similar to a F-35.
It seems a lot cheaper, so it's meant to be made in larger quantities than Diamond to increase total aircraft numbers. Takeoff distance seems much shorter so it might be designed for use on carriers.
other possible new aircraft
China has been working on a stealthy subsonic long-range bomber, the H-20. That's slower but longer-range than Diamond; it fills a similar strategic role to the US B-21. It hasn't been seen publicly yet, and the actual program status isn't clear.
I suspect China is also working on a stealthy tanker aircraft for refuelling its fighters.
The US has been working on "loyal wingman" UAVs, which would fly together with a manned aircraft to carry more weapons for it, while being cheaper because they're smaller & subsonic & don't have good sensors. China seems to be working on something similar, which'd probably end up with similar specs to a XQ-58 by convergent evolution.
Taiwan timelines
Do these Chinese aircraft programs indicate anything about if and when China will go for Taiwan?
Developing large new military aircraft is expensive, so maybe it doesn't make sense for China to have 4+ such programs going and start a war shortly before they go into full production. Waiting until 2-3 years after mass production starts would make more sense.
For several years now I've been expecting China to go for a blockade of Taiwan, and earlier than most estimates, around 2025-2027. That was based largely on Chinese industrial activity and resource stockpiling that indicated preparation for seaborne trade being cut off; their military buildups are much more opaque. These aircraft programs could line up with 2027-2028, but I think they're an indication China won't go for Taiwan in 2025. In retrospect I was underestimating the leeway the Chinese gov wanted for finding alternatives to failed projects and expanding successful ones, so 2025 was too early. Good thing I didn't decide to hold Intel stock, eh?
Discuss
The Roots of Progress 2024 in review
2024 was a big year for me, and an even bigger year for the Roots of Progress Institute (RPI). For one, we became the Roots of Progress Institute (with a nice new logo and website). Here’s what the org and I were up to this year. (My annual “highlights from what I read this year” are towards the end, if you’re looking for that.)
The Progress Conference
Progress Conference 2024, hosted by RPI together with several great co-presenters, was the highlight of my year, and I think some other people’s too. We’ve already covered it in previous writeups, but in case you’re just tuning in: well over 200 people attended (with hundreds on the waitlist); dozens of great speakers, including Tyler Cowen, Patrick Collison, and Steven Pinker; and 30+ participant-led “unconference” sessions on a variety of topics from healthcare to medieval Chinese technology. Several people told us it was the best conference they had ever attended, full stop. (!) See the writeups from Scott Alexander, Noah Smith, Packy McCormick, or Bryan Walsh (Vox), to pick a few.
Most of the talks are now online, and most of the rest will be up soon.
The RPI Fellowship
In 2024 we also ran the second cohort of the Roots of Progress Fellowship. Two dozen talented writers completed the program, publishing dozens of essays and almost doubling their audiences. I was thrilled with the talent we attracted to the program this year and excited to see where they’re going to go. See our recent writeup of the program.
My writing
In 2024 I published 17 essays (including this one) totaling over 37,000 words. That’s about half of last year, which decline I attribute in part to being involved in the programs mentioned above, and to doing fundraising. Also, about half of those essays, and well over half the words, were for my book-in-progress, The Techno-Humanist Manifesto, and that is some of the hardest writing I’ve done.
Highlights:
- Longest post (4,400 words): The Life Well-Lived, part 2, from Chapter 4 of The Techno-Humanist Manifesto
- Most liked on Substack: Announcing The Techno-Humanist Manifesto
- Most commented on Substack: What is progress?
- Most upvoted on Hacker News: Why you, personally, should want a larger human population
- Most upvoted on LessWrong: Biological risk from the mirror world
In 2024:
- My email subscribers (via Substack) grew 82% to almost 33k
- Followers on the social network formerly known as Twitter grew 17% to 36.7k
- I’m also up to 3.4k followers on Farcaster, 1.7k on Bluesky, and over 1k on Threads. Follow me where you may!
In all, I got (if I’m reading the reports correctly) 360k unique views on Substack and another 192k unique page views on the legacy ROP blog.
Also, in July, I launched paid subscriptions on the Substack. I’m up to 113 paid subscribers, and a ~$16k annual revenue run rate. That’s only 0.3% of the free audience, and I’ve only done five paywalled posts so far, so I think there’s a lot of potential here. Paid subscriptions are part of the way I justify my writing and make it self-supporting, so if you like my essays, please subscribe.
Gratitude to Ethan Mollick, Tomas Pueyo, Noah Smith, and Packy McCormick for being my top Substack referrers.
Social media
Some of my top posts of the year:
- Nat Friedman, legend in his own time
- The steam engine was invented in 1712. An observer at the time might have said: “The engine will power everything: factories, ships, carriages. Horses will become obsolete!” And they would have been right—but two hundred years later, we were still using horses to plow fields (Thread)
- Chiming in on the washing machine controversy from September: This is a prescription for re-enslaving women to domestic service, and ensuring that only the wealthy can live with the basic dignity of cleanliness
- “2 + 2 = 5” was a literal Communist slogan
- Sci-fi set in the future that already feels anachronistic
- Academia cares whether an idea is new. It doesn't really have to work. Industry only cares if an idea works. Doesn't matter if it's new. This creates a gap. Actually a few gaps… (thread)
- Are there websites that are as ornately decorated as medieval manuscripts?
- XKCD, uncannily accurate as always
I tried hard to say no to talks, interviews, and other appearances in 2024, in order to focus on my book, but I did a few. Highlights include:
- Speaking at Foresight Vision Weekend and at Abundance 2024
- Commenting for “Progress, Rediscovered”, a profile of the progress movement in Reason magazine
Events I got the most FOMO from missing included: Bottlenecks, The Curve, and Edge Esmeralda. Maybe next year!
The Progress Forum
Some highlights from the Progress Forum this year:
- Safe Stasis Fallacy, by David Manheim
- Report on the Desirability of Science Given Risks from New Biotech, by Matt Clancy
- The Origins of the Lab Mouse, by Niko McCarty
- Bringing elements of progress studies into short-form persuasive writing, by Dan Recht
- Test-time compute scaling for OpenAI o1 is a huge deal, by Matt Ritter
- Please come up with wildly speculative futures, by Elle Griffin
- Levers for Biological Progress, by Niko McCarty
In 2023 I did several “what I've been reading” updates. Those were fun to do and were well-received, but they took a lot of time; in 2024 I put both them and the links digest on hold in order to focus on my book. Here are some of the highlights of what I read (read part of, tried to read, etc.) this year.
C. P. Snow, “The Two Cultures.” A famous essay arguing that scientific/technical culture and literary/humanities culture are too isolated from and don't take enough of an interest in each other. A few passages I highlighted where he criticizes traditional culture for failing to appreciate the accomplishments of material progress:
In both countries, and indeed all over the West, the first wave of the industrial revolution crept on, without anyone noticing what was happening. It was, of course—or at least it was destined to become, under our own eyes, and in our own time—by far the biggest transformation in society since the discovery of agriculture. In fact, those two revolutions, the agricultural and the industrial-scientific, are the only qualitative changes in social living that men have ever known. But the traditional culture didn’t notice: or when it did notice, didn’t like what it saw.
Almost everywhere, though, intellectual persons didn’t comprehend what was happening. Certainly the writers didn’t. Plenty of them shuddered away, as though the right course for a man of feeling was to contract out; some, like Ruskin and William Morris and Thoreau and Emerson and Lawrence, tried various kinds of fancies which were not in effect more than screams of horror. It is hard to think of a writer of high class who really stretched his imaginative sympathy, who could see at once the hideous back-streets, the smoking chimneys, the internal price—and also the prospects of life that were opening out for the poor, the intimations, up to now unknown except to the lucky, which were just coming within reach of the remaining 99 per cent of his brother men.
Brad DeLong, Slouching Towards Utopia. A grand narrative of what DeLong calls the “long 20th century”, 1870–2010. Roughly, it’s a story of the rise and fall of capitalism, or at least a certain form of it. DeLong focuses on the competition between a Hayekian view that believes in the justice of the market, and a Polanyian view that people have rights that are not guaranteed by free markets, such as a stable job and income; with the Keynesian approach being the synthesis. I find much to disagree with in DeLong’s framing, but I’ve been learning a lot from the book. I might do a review when I finish it.
Karl Popper, “Epistemology Without a Knowing Subject.” Popper argues that epistemology should study knowledge not only as it exists in the heads of certain knowers, but as a product that exists independent of any observer—as is the case in a scientific society where knowledge is written down and codified. While traditional epistemology is interested in “knowledge as a certain kind of belief—justifiable belief, such as belief based upon perception,” in Popper's framing epistemology becomes “the theory of the growth of knowledge. It becomes the theory of problem-solving, or, in other words, of the construction, critical discussion, evaluation, and critical testing, of competing conjectural theories.”
All work in science is work directed towards the growth of objective knowledge. We are workers who are adding to the growth of objective knowledge as masons work on a cathedral.
Will Durant, “Voltaire and the French Enlightenment,” Chapter 5 of The Story of Philosophy:
Contemporary with one of the greatest of centuries (1694–1778), he was the soul and essence of it. “To name Voltaire,” said Victor Hugo, “is to characterize the entire eighteenth century.” Italy had a Renaissance, and Germany had a Reformation, but France had Voltaire…
What Voltaire sought was a unifying principle by which the whole history of civilization in Europe could be woven on one thread; and he was convinced that this thread was the history of culture. He was resolved that his history should deal not with kings but with movements, forces, and masses; not with nations but with the human race; not with wars but with the march of the human mind.
Voltaire was sceptical of Utopias to be fashioned by human legislators who would create a brand new world out of their imaginations. Society is a growth in time, not a syllogism in logic; and when the past is put out through the door it comes in at the window. The problem is to show precisely by what changes we can diminish misery and injustice in the world in which we actually live.
Ted Kaczynski, “Industrial Society and its Future.” As I wrote earlier this year:
Given that Ted Kaczynski, aka the Unabomber, was a terrorist who killed university professors and business executives with mail bombs and who lived like a hermit in a shack in the woods of Montana, I expected his 35,000-word manifesto, “Industrial Society and its Future,” to read like the delirious ravings of a lunatic.
I was wrong. His prose is quite readable, and the manifesto has a clear inner logic. This is a virtue, because it’s plain to see where he is actually right, and where he goes disastrously wrong.
See my mini-review for more.
Robert Putnam, Bowling Alone. A detailed, scholarly argument for the thesis that there has been a broad-based decline in all kinds of community participation in the US. I got through part 1, which describes the phenomenon; maybe I'll finish it at some point. I found this interesting for the unique scope that Putnam chose. It would have been easy to pick one narrow trend, such as the decline in fraternal organizations or the PTA, and try to come up with narrow explanations. Looking across so many varied phenomena makes the case that there is something going on at a deeper level.
Vitalik Buterin, “Against choosing your political allegiances based on who is ‘pro-crypto’.” Eminently sensible as usual:
If a politician is pro-crypto, the key question to ask is: are they in it for the right reasons? Do they have a vision of how technology and politics and the economy should go in the 21st century that aligns with yours? Do they have a good positive vision, that goes beyond near-term concerns like "smash the bad other tribe"? If they do, then great: you should support them, and make clear that that's why you are supporting them. If not, then either stay out entirely, or find better forces to align with.
Evidently Vitalik is not impressed with Stand with Crypto.
“Why are there so many unfinished buildings in Africa?” (The Economist). Lack of finance, for one: “people break ground knowing they do not yet have the funds to finish. When they earn a little more money they add more bricks. … Many Africans, in effect, save in concrete.” Weak property rights and flaky or corrupt contractors are a problem too. There are also social reasons: “If you have millions in the bank, people do not see it,” but “when you start building the neighbourhood respects you.”
Stephen Smith, “The American Elevator Explains Why Housing Costs Have Skyrocketed” (NYT):
The problem with elevators is a microcosm of the challenges of the broader construction industry — from labor to building codes to a sheer lack of political will. These challenges are at the root of a mounting housing crisis that has spread to nearly every part of the country and is damaging our economic productivity and our environment.
Elevators in North America have become over-engineered, bespoke, handcrafted and expensive pieces of equipment that are unaffordable in all the places where they are most needed. Special interests here have run wild with an outdated, inefficient, overregulated system. Accessibility rules miss the forest for the trees. Our broken immigration system cannot supply the labor that the construction industry desperately needs. Regulators distrust global best practices and our construction rules are so heavily oriented toward single-family housing that we’ve forgotten the basics of how a city should work.
Similar themes explain everything from our stalled high-speed rail development to why it’s so hard to find someone to fix a toilet or shower. It’s become hard to shake the feeling that America has simply lost the capacity to build things in the real world, outside of an app.
Liyam Chitayat, “Mitochondria Are Alive” (Asimov Press). Fascinating brief opinion piece arguing that “mitochondria are not just organelles, but their own life forms.”
Shyam Sankar, “The Defense Reformation.” A manifesto for reform in the defense industry. One core problem is extreme consolidation: in 1993, there were 53 major defense contractors; today there are 5. Further, most defense contractors were not exclusively defense companies until recently:
Before the fall of the Berlin Wall, only 6% of defense spending went to defense specialists — so called traditionals. The vast majority of the spend went to companies that had both defense and commercial businesses. Chrysler made cars and missiles. Ford made satellites until 1990. General Mills — the cereal company — made artillery and inertial guidance systems. … But today that 6% has ballooned to 86%.
Viviana Zelizer, Pricing the Priceless Child. Argues that between about 1870 and 1930, society shifted from viewing children primarily as economic assets to viewing them as economically “worthless” but emotionally “priceless.” Very interesting book.
Some articles that used the term “techno-humanism” before I did: Reid Hoffman, “Technology Makes Us More Human” (The Atlantic); Richard Ngo, “Techno-humanism is techno-optimism for the 21st century.” Related, I appreciated Michael Nielsen's thoughtful essay, “How to be a wise optimist about science and technology?”
Some pieces I liked on a contrasting philosophy, accelerationism: Nadia Asparouhova, “‘Accelerationism’ is an overdue corrective to years of doom and gloom in Silicon Valley”; Sam Hammond, “Where is this all heading?” Nadia's piece was kinder to e/acc than I have been, but helped me see it in a more sympathetic light.
A few pieces pushing back on James C. Scott: First, Rachel Laudan, “With the Grain: Against the New Paleo Politics” (The Breakthrough Institute):
It’s time to resist the deceptive lure of a non-agrarian world in some imagined past or future dreamed up by countless elites. Instead, we might look to the story of humanity’s huge strides in using these tiny seeds to create food that sustains the lives of billions of people, that is fairly distributed and freely chosen, and that with its satisfying taste contributes to happiness.
And Paul Seabright, “The Aestheticising Vice” (London Review of Books):
That scientific agriculture has faced unforeseen problems is undeniable, as is the fact that some of these problems (the environmental ones, for instance) are serious. But the achievements of scientific agriculture to be set against them are remarkable. The proportion of the world’s population in grinding poverty is almost certainly lower than it has ever been, though in absolute numbers it is still unacceptably high. Where there have been important areas of systematic failure, such as in sub-Saharan Africa, these owe more to social and institutional disasters that have hurt all farmers alike than to the science of agriculture itself. To equate the problems of scientific agriculture with those of Soviet collectivisation is like saying Stalin and Delia Smith have both had problems with egg dishes.
James Carter, “When the Yellow River Changes Course.” The course of a river is not constant, it changes not only on a geologic timescale but on a human-historical one, over the span of centuries. I first learned this from John McPhee's essay “Atchafalaya” (The New Yorker, reprinted in the book The Control of Nature), which was about the Mississippi; it was fascinating to read a similar story from China.
Samuel Hughes, “The beauty of concrete” (Works in Progress): “Why are buildings today simple and austere, while buildings of the past were ornate and elaborately ornamented? The answer is not the cost of labor.”
Alec Stapp and Brian Potter, “Moving Past Environmental Proceduralism” (Asterisk):
In many of the most notable successes, like cleaning up the pesticide DDT or fixing the hole in the ozone layer, what moved the needle were “substantive” standards, which mandated specific outcomes. By contrast, many of the regulatory statutes of the late 60s were “procedural” laws, requiring agencies to follow specific steps before authorizing activities.
On culture: Adam Rubenstein, “I Was a Heretic at The New York Times” (The Atlantic); Michael Clune, “We Asked for It” (The Chronicle of Higher Education).
On the scientific fraud crisis: Derek Lowe, “Fraud, So Much Fraud”; Ben Landau-Taylor, “The Academic Culture of Fraud” (Palladium).
Some early-20th-century historical sources criticizing progress: Samuel Strauss, “Things Are in the Saddle” (1924); and Lewis Mumford, “The Corruption of Liberalism” and “The Passive Barbarian” (both 1940). I quoted from the Mumford pieces in Chapter 4 of The Techno-Humanist Manifesto.
In fiction, I enjoyed Hannu Rajaniemi’s Darkome. A major biotech company develops a device anyone can wear on their arm that can inject them with mRNA vaccines; the device is online, so whenever a new pathogen is discovered anywhere in the world, everyone can immediately be vaccinated against it. But a community of biohackers refuses to let a big, centralized corporation own their data or inject genetic material into their bodies. The book is sympathetic to both sides; it’s not a simplistic anti-corporate story. I also enjoyed the new Neal Stephenson novel, Polostan.
In poetry, I'll highlight James Russell Lowell, “The Present Crisis” (1845). The crisis was slavery in the US, and it became an anthem of the abolitionist movement. I love the strong rhythm and the grand moral and historical perspective.
Finally, some random books on my infinite to-read list:
- Roger Knight, Britain Against Napoleon: The Organization of Victory, 1793-1815
- Venki Ramakrishnan, Why We Die
- I. Bernard Cohen, Science and the Founding Fathers
- Studs Terkel, Working: People Talk About What They Do All Day and How They Feel About What They Do (1974)
- Oswald Spengler, Man and Technics (1931)
- J. B. Bury, A History of Freedom of Thought (1927)
- Nicholas Barbon, An Apology for the Builder (1685)
I'm excited for next year. We're going to reprise the Progress Conference, which will be bigger and better. We'll run at least one more cohort of the fellowship. I'll finish The Techno-Humanist Manifesto, and begin looking for a publisher. And there is more in development, to be announced.
I'm happy to say that thanks to several generous donors, we've already raised more than $1M to support these programs in 2025. We are looking to raise up to $2M total, in case you'd like to help.
Thank you
I am grateful to all of you—the tens of thousands of you—for deeming my writing worthwhile and granting me your attention. I am grateful to the hundreds who support RPI financially. I am grateful especially to everyone who has written to me to say how much my work means to you, or even to tell me how it has changed the course of your career. Here’s to a fabulous 2025—for us, for the progress movement, and for humanity.
Discuss
Genesis
Book review: Genesis: Artificial Intelligence, Hope, and the Human Spirit, by Henry A. Kissinger, Eric Schmidt, and Craig Mundie.
Genesis lends a bit of authority to concerns about AI.
It is a frustrating book. It took more effort for me to read than it should have. The difficulty stems not from complex subject matter (although the topics are complex), but from a peculiarly alien writing style that transcends mere linguistic differences - though Kissinger's German intellectual heritage may play a role.
The book's opening meanders through historical vignettes whose relevance remains opaque, testing my patience before finally addressing AI.
Risks
When the book gets around to discussing how AI will affect our future, it's mostly correct about AI being a big deal, with occasionally appropriate hints about why there are big risks. But it's frustratingly abstract and vague. Some examples:
we might become extinct.
Would networking intelligences make their processes more opaque than the processes of lone intelligence? ... would we be able to assess them on a spectrum of good to evil? Or would they operate on an informational basis - extracted at superhuman speed ... - that would confound our ability to judge their behavior? Would that lead us further into a cycle of passivity?
Today, in the years, months, weeks, and days leading up to the arrival of the first superintelligence, a security dilemma of existential nature awaits.
I see hints in that quote that they think the threshold of superintelligence will be well enough defined that it can be attributed to a specific day. I find that suspicious.
Genesis compares our preparedness for AI to the preparedness of Aztecs for the arrival of conquistadors.
One area where the book briefly feels clear and novel is when it discusses the future of war, notably observing that humans may become less targeted simply because they'll be irrelevant to military outcomes.
The book provides only weak hints as to what considerations are important. It often feels like there's a missing mood - e.g. it's hard to tell whether the authors think human extinction would be a bigger deal than the end of democracy.
Present Day AI
The weakest parts of the book attempt to describe current AI. Too many of its claims look like ones that were discredited several years ago. It was published a year after Kissinger's death, so likely some of the problem is a long delay between when he wrote those parts and publication.
But there will be phases in the evolution of AI when mechanical intelligence may feel eerily similar to the intelligence of the animals.
I'd say that "prediction" was plausibly true of the best AIs for a brief time around 2021 or 2022. Now AIs seem more like human children.
Lately, AI researchers have devoted serious attention to the project of giving machines "groundedness" - a reliable relationship between the machine's representations and reality
This was true in 2022, but it has been increasingly treated as a solved problem since then.
Other Thoughts
Will we become more like them, or will they become more like us? ... Answering it remains our first and most necessary task.
The authors express cautious optimism about brain-computer interfaces facilitating human-AI symbiosis. That suggests either an overestimation of neural interface potential or an underestimation of AI's rapid advancement.
Under this definition, can AI itself possess dignity? Likely not - for AIs are not born, do not die, feel neither insecurity nor fear, and do not have natural inclinations or individuality such that conceptions of evil or good could be considered "theirs". ... they should be treated, philosophically, like literary characters.
This feels like a confused mix of half-assed morality and limited understanding of where AI is headed.
Genesis refers to Nick Bostrom and Eliezer Yudkowsky without criticizing them. Combined with Kissinger's reputation, that will cause some political and military leaders to take the risks of AI more seriously. That makes the book somewhat important.
People should read this book if they respect Kissinger's forecasts much more than they respect the forecasts of people connected with tech companies.
Discuss