Can AI make advancements in moral philosophy by writing proofs?

News from LessWrong.com - 45 minutes 24 seconds ago

Cross-posted from my website.

If civilization advances its technological capabilities without advancing its wisdom, we may miss out on most of the potential of the long-term future. Unfortunately, it's likely that ASI will have a comparative disadvantage at philosophical problems.

You could approximately define philosophy as "the set of problems that are left over after you take all the problems that can be formally studied using known methods and put them into their own fields." Once a problem becomes well-understood, it ceases to be considered philosophy. Logic, physics, and (more recently) neuroscience used to be philosophy, but now they're not, because we know how to formally study them.

Our inability to understand philosophical problems means we don't know how to train AI to be good at them, and we don't know how to judge whether we've trained them well. So we should expect powerful AI to be bad at philosophy relative to other, more measurable skills.

However, there is one type of philosophy that is measurable, while also being extremely important: philosophy proofs.

Some examples of proofs that made important advances in moral philosophy:

  • The VNM Utility Theorem proved that any agent whose preferences satisfy four axioms must have a utility function, and that the agent's preferences amount to maximizing the expected value of that function.
  • Harsanyi's utilitarian theorem (see also Harsanyi (1955) [1]) showed that if individuals have VNM utility functions, and if the Pareto principle [2] holds over groups, then a version of utilitarianism must be true. In particular, utility must aggregate linearly across individuals.
  • Arrhenius (2000) [3] proved that any theory of population ethics must accept at least one counterintuitive conclusion.
  • Askell (2018) [4] proved that if four intuitive axioms [5] hold, then it is impossible to compare infinite worlds.
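
To give a sense of the formal shape these results take, the first two can be sketched compactly (a simplified rendering; the precise axioms are in the cited works):

```latex
% VNM: if preferences \succeq satisfy the four axioms, there exists a
% utility function u such that
A \succeq B \;\iff\; \mathbb{E}[u(A)] \ge \mathbb{E}[u(B)]

% Harsanyi: given VNM-rational individuals and the Pareto principle,
% social welfare aggregates linearly:
W(x) = \sum_{i=1}^{n} w_i \, u_i(x), \qquad w_i > 0
```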

I wrote a proof of my own in GiveWell's Charity Recommendations Require Taking a Controversial Stance on Population Ethics.

The general pattern with these proofs is that you start from a set of intuitively reasonable axioms and use them to produce a controversial conclusion. Having that sort of proof doesn't tell you whether you ought to reject one of the axioms or accept the conclusion, but it does tell you that you have to do one of those things.

Not many philosophical proofs have been written. That suggests that they're difficult to write, or at least difficult to come up with. None of the proofs I listed are particularly complicated from a mathematical point of view—undergraduate math students routinely have to write more difficult proofs than those. The challenging part is identifying the right setup: you have to find a proof that tells you something new.

That's the sort of thing that AI might be able to do well. AI can churn through ideas more quickly than humans can, and it's relatively good at working with formal systems. [6] Modern-day LLMs might be smart enough to come up with useful philosophical proofs; even if not, the first AIs that can write these proofs will not need to be superintelligent.

AI won't be good at telling you how to move forward after finding an impossibility proof, but it can give you the proof.

Proof of concept

A basic test would be to run a pro-tier LLM with extended thinking for a while to search through possibilities and try to come up with an interesting proof; then have human judges review the resulting proof(s). This test would be relatively easy to conduct; the hard part is judging whether the proofs are interesting.

As an even simpler test, I ran three Claude sessions to generate novel impossibility proofs. In each session I provided some guidance on what I was looking for, and I provided different guidance in each case to try to elicit three distinct results. Below is a quick summary of Claude's three proofs, along with my assessments. I haven't carefully verified that these proofs are correct, but they passed a quick sanity check.

  • First proof: We cannot escape Arrhenius' impossibility result by introducing moral uncertainty.

    My assessment: The concept is somewhat interesting, although to me it's intuitively obvious that moral uncertainty wouldn't let us get around Arrhenius' result.

  • Second proof: If a pluralist value system cares about both maximizing welfare and mitigating individuals' most severe complaints (similar to Rawls' maximin principle), then the pluralist system either violates transitivity, or it can be collapsed onto a single scale.

    My assessment: Uninteresting—the definition of "complaint minimization" does all the work in the proof, and the welfare-maximization criterion is irrelevant.

  • Third proof: Given five reasonable axioms of how an aligned AI agent ought to behave, it is impossible for an agent to simultaneously satisfy all five.

    My assessment: Uninteresting—it's a trivial special case of Sen (1970) [7], which proved that no society can satisfy both Pareto efficiency and liberalism. If no society can satisfy those axioms, then clearly no aligned AI can satisfy them, either.

This was just a quick attempt; more work could perhaps elicit better proofs. Claude had a reasonable understanding of the limitations of its own proofs—it noticed (without additional prompting) that the second proof depended only on the definition of "complaint minimization", and that the third proof was a special case of a known result.

A next step could be to ask many LLM instances to write dozens of proofs, and then use a manager LLM to filter down to the most interesting ones. At minimum, the manager should be able to filter out proofs that are trivial extensions of known results. With some additional effort, present-day LLMs might be capable of coming up with a good novel proof. If not, then it will likely be possible soon. Most kinds of moral philosophy might be difficult for AIs, but proofs are one area where AI assistance seems promising.
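
The generate-then-filter workflow could look something like this (a minimal sketch: `call_llm` is a hypothetical stand-in for whatever LLM API is available, and the prompts and YES/NO scheme are illustrative assumptions, not a tested protocol):

```python
from typing import Callable

def generate_candidate_proofs(
    call_llm: Callable[[str], str],
    prompt: str,
    n_candidates: int,
) -> list[str]:
    """Ask many independent LLM sessions for impossibility proofs."""
    return [call_llm(prompt) for _ in range(n_candidates)]

def filter_proofs(
    call_llm: Callable[[str], str],
    candidates: list[str],
) -> list[str]:
    """Use a 'manager' LLM to drop trivial extensions of known results."""
    kept = []
    for proof in candidates:
        verdict = call_llm(
            "Is the following proof a trivial special case of a known "
            f"result? Answer YES or NO.\n\n{proof}"
        )
        if verdict.strip().upper().startswith("NO"):
            kept.append(proof)
    return kept
```

This captures the division of labor: cheap parallel generation, then a filtering pass that only needs to recognize triviality—which, as the Claude sessions above suggest, current models can already do reasonably well.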

Is it risky to train AI on philosophy?

This post was about using pre-existing AI to write philosophy proofs, not about specifically training AI to get better at philosophy. I expect advanced AI to be relatively bad at (most kinds of) philosophy because philosophy is hard to train for.

However, it may be dangerous to train AI to get better at philosophy. My worry is that this would make AI better at persuading us of incorrect philosophical positions, and it would make misalignment harder to catch—precisely because it's so hard to tell whether a philosophical position is correct.

I don't have a strong view on how important this is, but I would be remiss if I didn't talk about potential downsides. To be clear, I'm not proposing that we train AI to get better at philosophy. I'm proposing that perhaps near-future AI could be a useful assistant for writing formal philosophical proofs, and that this may be an important application of AI.

  1. Harsanyi, J. C. (1955). Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility. ↩︎

  2. The Pareto principle states that if outcome A is at least as good as outcome B for every person, and outcome A is better for at least one person, then outcome A is better overall. ↩︎

  3. Arrhenius, G. (2000). An Impossibility Theorem for Welfarist Axiologies. doi: 10.1017/s0266267100000249 ↩︎

  4. Askell, A. (2018). Pareto Principles in Infinite Ethics. ↩︎

  5. The four axioms are "the Pareto principle, transitivity, an axiom stating that populations of worlds can be permuted, and the claim that if the 'at least as good as' relation holds between two worlds then it holds between qualitative duplicates of this world pair". ↩︎

  6. I couldn't have made this statement in 2023. LLMs used to be bad at formal systems, but they've gotten much better. ↩︎

  7. Sen, A. (1970). The Impossibility of a Paretian Liberal. ↩︎




Meaningful Questions Have Return Types

News from LessWrong.com - 58 minutes 10 seconds ago

One way intellectual progress stalls is when you are asking the Wrong Questions. Your question is nonsensical, or cuts against the way reality works. Sometimes you can avoid this by learning more about how the world works, which implicitly answers some question you had, but if you want to make real progress you have to develop the skill of Righting a Wrong Question.

This is a classic, old-school rationalist idea. The standard examples are asking about determinism, or free will, or consciousness. The standard fix is to go meta. Ask yourself, "Why do I feel like I have free will?" or "Why do I think I have consciousness?", which is by itself an answerable question. There is some causal path through your cognition that generates that question, and that path can be investigated.

This works great for some ideas, and can help people untangle some self-referential knots they get themselves into, but I find it unsatisfying. Sometimes I want to know the answer to the real question I had, and going meta avoids it, or asks a meaningfully different question instead of answering it. Over time, I've stumbled across another way to right wrong questions that I find myself using more often.

In programming there is the idea of return types — constraining and declaring the type of data structure that a given method or function will return. Having a return type means I can rely on this function not to mess up something else later down the line that requires a particular input. I can more easily interpret what the function is doing (or supposed to do), and can use it more effectively in my program.
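
As a minimal sketch of the concept in Python's type-hint syntax (the function names and bodies are illustrative, not real analyses):

```python
# A declared return type constrains the shape of the answer before
# the function is ever called.
def has_free_will(agent: str) -> bool:
    """A yes/no return type: callers get exactly a boolean."""
    return agent == "human"

def causal_account(question: str) -> list[str]:
    """A richer return type: an ordered chain of causes behind a question."""
    return [
        f"some process in my cognition generates {question!r}",
        "that process has traceable causes",
        "tracing them may dissolve or answer the question",
    ]
```

The contrast is the point of the metaphor: a `bool` answer to "do I have free will?" type-checks but may not dissolve any confusion, whereas a causal-chain return type at least tells you what shape a satisfying answer would have to take.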

I find this a useful metaphor when thinking about how to make progress on a difficult question. It is a different angle than the "go meta" approach; it routes around the self-referential nature of asking why I think some question is answerable and reframes the question to focus more on what shape the answer must take. To ask, "What's the return type of, 'do I have free will?'" is very useful. Suppose I ask this question in a group of friends and someone laughs and tells me, "Yes! Of course you have free will." Have they answered my question? Do I still feel confused? I know I would. I don't even have to ask friends, I can just imagine friends giving me different kinds of answers, and compare this to my feeling of confusion. Would this shape of answer actually dissolve my confusion?

Often to answer confusing philosophical questions, I need a finer-grained picture of reality. I need a better causal chain of the dynamics involved in the question, a higher dimensional conception of the problem. This is useful because it lets me verify my understanding. I can follow directly the chains of cause and effect, I can see the logic clearly. A better map gives me the ability to tell if the answer is correct.

If I can't tell whether some answer is correct or not, I either haven't defined the question well enough, or my map of reality is sorely inadequate. If I can't reframe a question to have a return type, it reveals that my confusion is deeper than I suspected and I need to take a step back. I need to step down and focus my uncertainty into something more basic and foundational that I can get a handle on.

Consider: "Is the universe deterministic?" What's the return type of this question? A binary yes/no works, but it wouldn't resolve my confusion. How about, "How could I tell if the universe is deterministic?" That question has a more useful return type, and it gets at my real confusion better. So how could I tell? With a detailed enough causal map of reality I could answer this question, so I would start there.

What about, "What is consciousness?" What's the return type of that question? How could you tell if your answer was correct? How could you tell if your answer was wrong? Imagine a friend saying, "I'm conscious because subjectively it feels like I am conscious." Are you less confused? How about, "Everything is conscious, everything has a subjective experience of being itself"? Can you verify this answer? If there were a detailed theoretical bridge between phenomenology and cognitive science, pinned down with experiments, would that enable you to answer the question? Is that then the return type?

This is related to the idea of operationalizing your bets. When you create a prediction market, figuring out how to resolve the market is a big question, and the strategies market makers use to turn abstract, weird questions into resolvable markets are techniques you can use yourself for personal intellectual progress. "How do I resolve this market?" is almost the same question as "How do I answer this question?"

When thinking through difficult philosophy, asking yourself how you are going to verify the answer is an underrated tool. Not only does it help you answer the question, it acts as a diagnostic tool for discovering just how confused you are, and where the edge of that confusion lies. The next time you find yourself in some philosophical quandary with your friends consider asking yourself, "What's the return type?"




The Shapley Share of Responsibility?

News from LessWrong.com - 1 hour 29 minutes ago

Deepfates on twitter wrote:

If you're in a theater and you shout "Fire!", and the audience reacts predictably and in the process trample someone to death, are you responsible?

What if there was no fire?

What if you legitimately believed it was a fire, does that make it okay? Does it depend on evidence?

I tried to give it a serious answer, and I wanted to check in with The LW Folk if this answer made sense to others here.

ii. Shapley Shares

My actual answer, within frame, is "You have some share of responsibility." The exact share depends on the exact situation.

(I think the first order thing to be checking is "was there a fire, or not?". But, for now, answering the question as-asked)

I think people have some responsibility for the second-order effects of their actions, and "how much?" depends on, like, how second-order it was.

Something something "Shapley value"? I realize the Shapley value is intractable to calculate in most cases, but it's suggestive of what shape of answer you'd get if you were omniscient.

If you were the only person shouting fire and there was exactly one person who did the trampling, maybe (I'm not actually 100% sure how Shapley values work) they both get 50% of the credit/blame (if it turns out that if you remove either of you, it wouldn't have happened).

If you're in a different situation where lots of people are shouting things like "fire" all the time (for example, some people shouting "AI will kill us" and some people shouting "billionaires are evil, eat the rich"), then the blame is more distributed. Exactly how distributed depends on the circumstances.
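
The narrow two-person case can be checked directly. Below is a toy computation (the characteristic function, where the trampling happens only if both the shouter and the trampler act, is an illustrative assumption):

```python
from itertools import permutations

def shapley_values(players: list[str], value) -> dict[str, float]:
    """Average each player's marginal contribution over all orderings."""
    shares = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: set[str] = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            shares[p] += value(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in shares.items()}

# Toy model: harm occurs only when both the shouter and the trampler act.
def trampling(coalition: frozenset) -> float:
    return 1.0 if {"shouter", "trampler"} <= coalition else 0.0
```

Running `shapley_values(["shouter", "trampler"], trampling)` gives each party a 0.5 share, matching the 50/50 intuition; adding more shouters or tramplers to the same computation spreads the blame thinner, which is the "more distributed" case.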

The question "okay, what is society supposed to do about that?" needs to take into account what sort of norms are practical to enforce. There's a theoretical answer to 'who is responsible' and a practical 'who are we going to prioritize holding accountable.'

I think it's a correct norm that "people who shout 'Fire' are responsible for putting effort into doing so in a way that mitigates the tail-risk side effects." For example: "Guys, there's a fire, please walk calmly toward the exits." But if people are not heeding the warning that there is a fire, and there _were_ a fire... you'd probably rather the guy shout "Fire!" more emphatically than not do that.

He is maybe 50% responsible for the guy getting trampled in the narrow simplified case (the rest of the responsibility distributed among the panicking people). But, also he gets credit for saving the other people.

...

I realize this is a lot trickier when smart people disagree about whether there's a fire, or will be, and some of the arguments are quite complicated.

I don't think there's a shortcut to looking into the details of the particular case, which includes "was there a fire or not?" and "what else was going on?", etc.


iii. Cooperation among people who disagree on what is true and what is good


I think you're more broadly getting at "how do people who disagree about what's true and what's good, cooperate?"

I think this is also kinda confusing and hard. But, listing out the obvious background parts of my viewpoint here (I expect this will not feel as novel to you).

a) ~Everyone probably agrees that 'fully evaluate every claim and the consequences of every action' is intractable, so we need simplified rules

b) Currently, we have a distinction between "what gets punished via governmental force (i.e. law) and what gets punished via social censure." I'm not sure if this is a natural carving, but, it seems pretty good.

The downstream consequences of arguing "there is a danger!" are chaotic enough I don't think it usually makes sense to be a thing that gets legally prosecuted. But on the meta-level, it seems fine for there to be some public arguments and tug-of-war around what gets social-censure.

But another layer of confusion here is "what do you do, when you believe something to be true, that implies a kind of totalizing worldview?". I think the memeplex around "ASI is on its way and how it shakes out will determine the course of the future" is somewhat intrinsically totalizing.

I think it's true. I don't have a super principled answer other than "just, try not to be totalized about it."

I think it is correct for the social-judgment-sphere to have an immune system against totalizing beliefs. I also think the social-judgment-sphere needs to have a way of processing arguments that are pretty plausibly true, and that skew totalizing for many modern human psychologies.

(I think the way most people should engage with "AI might kill everyone or ruin the future" is, mostly, to call their senators a couple times and mostly get back to their lives)

I think it's correct, for that immune system, to pressure the people saying the totalizing-prone-thing to go out of their way to do an extra good job ameliorating the damage.

But, there are still limits to what's reasonable to expect there.

I also think it's the responsibility of the rest of society to be tracking what people actually said, not merely blaming the guy yelling "Fire" but also the guys making up stuff about what the guy yelling "Fire" said, etc.

And, while this isn't true for arbitrary arguments, I think the threat of negative superintelligence honestly is clear and obvious enough (at least in magnitude, and the risk being nontrivial), that the rest of the social-sphere has the responsibility of taking the argument seriously. That's the other side of the handshake on "put extra effort into not being totalizing."

It's sort of necessary for messages to get simplified at scale. I already had "make sure whatever political process is happening is sane and reasonable and non-polarized" as a primary goal, but, I've updated in the past few days about the importance of having a clear succinct message that simultaneously conveys the gravity while also having some "and don't be crazy about it" energy.





Kegan, Teach, Rao: Stages of Moral Development

News from LessWrong.com - 2 hours 54 minutes ago

I recently read Chapman's texts on Robert Kegan's levels of moral development and meaning-making, namely: Developing ethical, social, and cognitive competence and the more psychedelic What is stage five (like)?. Scott Alexander also has some interesting thoughts on the first one.

Wikipedia has a nice table showing how Kegan compares the levels to other systems like Maslow's hierarchy of needs. While the equivalence seems somewhat forced, there's still certainly something going on here. And the thing likely isn't "everything should always be divided neatly into five phases".

I have read some other so-called psychoanalytic texts that discuss meaning-making, and some of them resonate with Kegan rather interestingly. Two primary pieces are Sadly, Porn, Edward Teach's (not the pirate) masterpiece on Lacanian psychoanalysis, and the superb guide to meta-level office politics, The Gervais Principle by Venkatesh Rao.

The rest of the text might be largely inscrutable unless you've read these, although you can mediate Sadly with Scott's excellent review.

I'll go through Kegan's levels quickly, but Chapman does it better so I won't bother too hard.

The first two are largely irrelevant for adults. On the first level, the concept of self as a separate entity hasn't properly formed yet (theory of mind), and as such it's rather useless to talk about morality. The second level is almost purely self-interested and other people are largely seen instrumentally. Self-worth is primarily defined by explicit feedback from caretakers.

On the third level, things get interesting. Meaning is derived from the community. Conforming to social expectations and norms is important; "what do others think of me?" is the most important question. None of this is reflective. People on the third level do not form explicit strategies to win social games.

The fourth level is characterized by systems instead of people, experiences, and feelings. You can decide who you are, what you think, and how you react. Ethical and political frameworks are based on principles instead of outside approval. Identity is based on the roles one plays and the systems one uses. Professionalism. Reflection becomes possible.

What the fourth level does to the third, the fifth does to the fourth. On the fifth level you can ambivalently inspect the world from multiple perspectives without committing to any. Lightness. Nuance. No pressure to collapse complex phenomena onto a single axis like good/bad. Toolbox mentality. "All systems are wrong, some are useful", not as a dogma, but an actual thought-pattern. The internal narrator becomes self-aware and breaks the fourth wall.

Between the last two, there also lives stage 4.5, where one realizes that any and all systems are not grounded in anything. Nihilism is the typical reaction here, and that's how I experienced it too, before it evolved into moral nihilism and some nuance.

I find Chapman's formulation of the fifth level a bit too Buddhism-flavored for me. Some of the concepts like "becoming the space" and "decentered in time" simply do not resonate [1]. But then again, he also definitely gets it. The way one reaches higher levels leaves its mark on thought-patterns and vocabulary. We might just have different aesthetics.

The first observation for seeing these levels as moral instead of purely developmental is mapping them to pre-existing terms. My interpretation goes somewhat like this:

  1. No morality - no concept of self
  2. Egoism / hedonism
  3. Implicit virtue ethics without generalization
  4. Explicit systems like consequentialism, utilitarianism
  5. Lightness, ambivalence, nuance

Ordinary nihilism can go between #4 and #5. My take is that moral nihilism goes in #5.

Time to railroad this into Teach's Lacanian framework.

Teach's core thesis is that people want someone else to be the "adult in the room". To exemplify this, I'll quote Scott's review on this instead of taking the responsibility myself:

Psychologically healthy people have desires. Sometimes they fantasize about these desires, and sometimes they act upon them. You’ve probably never met anyone like this.

Psychologically unhealthy people, eg you and everyone you know, don’t have desires, at least not in the normal sense. Wanting things is scary and might obligate you to act toward getting the thing lest you look like a coward. But your action might fail, and then you would be the sort of low-status loser who tries something and fails at it.

So instead, you spend all your time playing incredibly annoying mind-games with yourself whose goal is to briefly trick yourself into believing you are high status. Everyone else, so far as you even recognize their existence at all, is useful only as a pawn in this game.

Kegan's second stage doesn't have this problem [2], as one doesn't feel the invisible audience's Gaze [3]. It's the third stage that activates it. Coolness is derived from conformity, which is subsequently internalized because that's easier both energy- and skill-wise compared to acting cool without internalization.

Upon entering the fourth stage, the influence of the peer group gives way to more complex structures. Status still plays a big part, but it's more and more tied to one's position in hierarchical systems, like job title or net worth. One assumes new roles, like political affiliation, educational level, socioeconomic class, moral philosophy framework, and parenthood. Identity is mostly seen through these.

Transitioning to the fifth level transmutes these roles from identities to tools or descriptions. Self dissolves enough that there's no fixed point for the Gaze to focus on. Failure is no longer a problem to be feared, and rather a risk to be mitigated.

We can next try mapping Loser, Clueless, and Sociopath from The Gervais Principle to Kegan's levels. The Clueless is the easiest to map, to level 3: incapable of reflection, living in the social reality, with a self-concept built on external validation. While identifying with a system is clearly a level 4 trait, the Clueless are in it for social conformity, not because they've learned an explicit system and committed to it. They over-identify with their organization-assigned roles.

The Loser somewhat matches level 4. They maintain an identity separate from the institution. They have principles.

And the Sociopath matches level 4.5. Rao describes how the 3rd and 4th level systems are discarded:

Amorality is merely the first step. As the journey proceeds, Sociopaths progressively rip away layer after layer of social reality. The Sociopath’s journey can be understood as progressive unmasking of a sequence of increasingly ancient and fearsome gods, each reigning over a harsher social order, governing fewer humans. If morality falls by the wayside when the first layer is ripped away, other reassuring certainties, such as the idea of a benevolent universe, and predictable relationships between efforts and rewards, fall away in deeper layers.

With each new layer decoded, Sociopaths find transient meaning, but not enduring satisfaction.

Much to their surprise, however, they find that in the unsatisfying meanings they uncover, lie the keys to power over others. In seeking to penetrate mediated experiences of reality, they unexpectedly find themselves mediating those very realities for others. They acquire agency in the broadest sense of the word. Losers and the Clueless delegate to them not mere specialist matters like heart surgery or car repair, but control over the meanings of their very lives.

Note that both Teach and Rao present pathological models. Neither contains an equivalent to Kegan's fifth, "healthy" level. The enduring satisfaction is what is required here. Chapman somewhat covers this.

An important note: both Chapman and, to some extent, Rao seem to be quite absolute about these stages and roles. In reality, transitioning to the next stage isn't instant, and a person can be on different stages depending on context or mood. For instance, it's often the case for children that they're on stage 3 in a school context, but still on stage 2 at home. An adult could have different stages for work and home life. This is especially true for Rao's model of Loser and Sociopath, where one might pick their role based on their social status and Powertalk skill level on a case-by-case basis.

I'm definitely not at fifth level yet, not fully. I can see glimpses. On a good day, I don't even have to exert myself too much to chill in the lightness of it. And then something unexpected occurs and it knocks me down to level four. Or two.

I'm also definitely not immune to social pressure, as much as I enjoy pretending otherwise. In theory, one could do neutral analysis, cost-benefit calculations, and value-preserving reflection before deciding on a course of action. In practice, with enough stress, I'm unable to sustain the fifth stage. Collapsing to the third stage means that not looking weird, or whatever, collapses from an instrumental to a terminal value. The fourth stage merely means I fall back on my well-practiced roles.

Other sources of stress sometimes do this too. When I'm panicking about a deadline or worrying that I made a mistake, or lack enough stuff from Maslow's, it can also happen.

Another typical way to regress to the fourth level is discussing ethics or politics. Once you have to verbalize your arguments for a position, it saves a lot of words and energy to reach for a pre-existing and well-known framework like utilitarianism, capitalism, or human rights. Especially when someone else does that first. Especially when pointing out a contradiction in their model.

-

Kegan is of the opinion that nobody gets to stage five before the age of forty. Not that I lack the arrogance to think I'm almost there.

  1. To be fair, he admits that he sounds like he's on drugs. ↩︎

  2. Typically there's an actual adult in the room too. ↩︎

  3. I'm borrowing the word directly from Lacan. ↩︎




Monday AI Radar #21

News from LessWrong.com - 3 hours 22 minutes ago

This week’s big story is the limited release of Claude Mythos Preview. The headline is that Mythos is alarmingly good at cybersecurity, with the ability to find and exploit critical vulnerabilities en masse. Anthropic is handling that responsibly, but the next year or two will be challenging for security. If you haven’t already, now is a good time to review and improve your personal security practices.

Cybersecurity isn’t the only story here: Mythos appears to be the first of the next generation of much larger models. Early data suggest it represents another acceleration of the rate of capability progress, although that’s hard to assess while it’s still in limited release. And from a safety perspective, Anthropic says this is simultaneously the most aligned model they’ve ever created and the most dangerous.

Top pick

How scary is Claude Mythos?

Rob Wiblin’s analysis of Mythos covers all the key points. If you only read this piece, you won’t miss anything vital.

Mythos Preview is another milestone in the race to AGI, arguably as significant as the November 2025 release of Opus 4.5 that kicked off the agentic coding craze. Rob covers both sides of this story: Mythos is the first model powerful enough to cause a major crisis if misused, and (as far as we can tell) also better aligned than any previous Anthropic model.

I expect strong disagreement about how those two factors balance out. Some people will see Mythos as evidence that we are rushing toward AGI without having solved alignment, and others will argue that alignment is progressing as fast as capabilities and we’ll probably manage to muddle through. I believe those aren’t mutually exclusive: we are rushing toward AGI with an alignment strategy that is probably good enough to muddle through with, but which has a real chance of getting us all killed.

Mythos is evidence for short timelines, bringing a big step forward for capabilities that is at least consistent with past trendlines and might represent an inflection point toward even faster progress.

My writing

Quick thoughts about Mythos

A few quick thoughts about the release of Claude Mythos Preview.

Foundational beliefs

Six foundational beliefs that shape how I think about AI safety strategy.

Writing with robots

AI can’t write well, but it’s a great editor—here’s how I use it.

Mythos

All of the following pieces are good, but most of you can just read the summaries and pick and choose which links to follow.

Mythos Preview’s cybersecurity capabilities

Mythos is better at finding and exploiting vulnerabilities than any past model.

Anthropic’s analysis is spot on:

There’s no denying that this is going to be a difficult time. While we hope that some of the suggestions above will be helpful in navigating this transition, we believe the capabilities that future language models bring will ultimately require a much broader, ground-up reimagining of computer security as a field.

As part of that reimagining, Anthropic is giving key companies a head start in the cybersecurity arms race via Project Glasswing. This seems like the best path forward, which doesn’t mean it’s guaranteed to succeed.

Ryan Greenblatt estimates the impact of Mythos

An uncontrolled release could have been ugly:

If Mythos was released as an open weight model in February (or tomorrow), this would cause ~100s of billions in damages, with a substantial chance of ~$1 trillion in damages

The Zvi report

Zvi does a two-part deep dive, covering the system card and the cybersecurity implications. Excellent, comprehensive, long.

New sages unrivalled

Dean Ball argues that Mythos marks a new era for AI. I agree, but I don’t have to like it.

I wrote on X that Mythos means the training wheels are coming off on AI policy. Perhaps the Department of War’s effort to strangle Anthropic is, to use another metaphor, a sign that the gloves are off too. If the last month has made anything clear, it is that we are in a nastier, sharper, harsher, meaner era of AI discourse, policy, and—ultimately—of AI development and use.

Failing to understand and plan for this new era might be the biggest unforced error the AI safety community will make over the next couple of years. Much more than previously, many key players will be motivated by ruthless self-interest rather than an altruistic desire to do what is best for humanity. We need to accept that fact and plan accordingly.

Benchmarks and Forecasts: Ryan Greenblatt’s model of AI progress

Ryan Greenblatt has two long posts on the present state of AI and likely AI timelines. Highly recommended for a deep, gears-level model of how AI capabilities are likely to progress, and especially what the trajectory of AI R&D might look like. The headline result is that based on recent progress, Ryan (like many other people) is shortening his timeline to highly capable AI.

A core part of his thesis is that AI is now immensely capable at coding tasks that are easy to verify. He argues that the human-equivalent time horizon for those tasks is now somewhere between months and years, which represents a superexponential rate of progress. That sounds right—the open question is how quickly we make progress on verifying more complex tasks.

In light of Mythos, he estimates that AI is making Anthropic engineers 1.75x faster, but the overall speedup of Anthropic’s AI R&D is only 1.2x. It’s too early to tell whether that’s the early stage of an intelligence explosion, or an indication that other factors will bottleneck progress and prevent runaway acceleration.
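If a 1.75x engineer speedup producing only a 1.2x overall speedup seems puzzling, an Amdahl's-law decomposition reconciles the two numbers. This is a minimal sketch under my own assumption (Ryan doesn't specify this model): the 1.75x applies only to an engineering fraction of total R&D time, and the rest of the pipeline runs at normal speed.

```python
# Amdahl's-law sketch (my assumption, not Ryan's stated model): a speedup s
# applied to only a fraction f of the work yields a smaller overall speedup.
def overall_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Solve overall_speedup(f, 1.75) == 1.2 for f.
f = (1 - 1 / 1.2) / (1 - 1 / 1.75)
print(round(f, 3))                          # 0.389: engineering ~39% of R&D time
print(round(overall_speedup(f, 1.75), 2))   # 1.2
```

Under that assumption, the two figures are consistent if engineering is roughly 39% of Anthropic's R&D time, with the remaining ~61% (whatever that consists of) proceeding at the old pace.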

Musings on recursive self-improvement

Seb Krier is skeptical that recursive self-improvement will go as fast as some people think:

When people talk about recursive self-improvement, they sometimes acknowledge these frictions but then treat them as secondary, or assume that sufficiently capable systems can route around most of them via internal deployments and accelerated R&D. I think this is often overstated: these bottlenecks do not disappear just because model development speeds up. They are structural, not incidental, and they push strongly against the more explosive versions of the RSI story.

It’s a great piece that goes beyond the usual “diffusion is slow” thesis. He makes a good case that AI progress will be tethered to—and rate limited by—human factors in ways that prevent a runaway takeoff.

He points out some important dynamics. But beyond a certain capability level, I believe AI will be able to rapidly transform the world on its own, regardless of whether human society can keep up.

Jobs and the economy: The Windfall Policy Atlas

The newly released Windfall Policy Atlas is a great resource for anyone thinking about how to mitigate the economic and employment impacts of AI. It lists 48 potential policy levers (shortened work weeks, robot taxes, etc.), each with a description of how the policy might work and some selected reading.

Autonomous weapons: The global AI arms race

The New York Times reviews the state of autonomous weapons ($). Fully autonomous weapons haven’t yet transformed the battlefield but capabilities are growing quickly, in part because of rapid iteration in Ukraine. At the current rate of progress, autonomous weapons will soon be essential in any armed conflict. It’s increasingly hard to see how a treaty against autonomous weapons is achievable, given rising global tensions and increased military spending.

Strategy and politics: Daniel Kokotajlo and Dean Ball debate government’s role in AI

This is great: two strong thinkers in a debate format structured to maximize truth-seeking and finding common ground. Spoiler: plenty of tough problems, not so many easy answers.

Can Sam Altman be trusted?

The New Yorker has a long and devastating piece on Sam Altman’s history of lying and manipulation ($). It isn’t news that he is frequently dishonest, but this is the most comprehensive examination of the full scope of the problem.

This is particularly distressing in light of the issues raised by Daniel and Dean above. If you don’t trust the government to manage AI and you don’t trust the CEO of one of the leading labs, that’s hardly ideal.

Political violence is never acceptable

Zvi points out what ought to be obvious to any person with a functioning moral compass.

We need more grantmakers

Sophie Kim and Ady Mehta argue that AI safety is critically constrained not by funding, but by the ability to usefully deploy funding:

The capital is about to scale by orders of magnitude; the capacity to deploy it has not. This post is about that gap – and why filling it matters more than almost anything else in AI safety right now.

Sketches of some defense-favoured coordination tech

Forethought’s latest brainstorming piece explores how to use AI for coordination:

We think that near-term AI could make it much easier for groups to coordinate, find positive-sum deals, navigate tricky disagreements, and hold each other to account.

There are some intriguing ideas here. In particular, the background networking proposal seems like something a single person could deploy at a conference or other small event.

Open models: Can Chinese and open model companies keep up?

Epoch’s Anson Ho explores the question of whether the Chinese and open model companies (which are not quite the same thing) can keep up with the frontier labs. It’s a solid analysis that considers compute capacity, distillation, how innovations spread, and more.

There isn’t a simple answer, but he leans toward believing it will be hard to close the capability gap while the compute gap remains:

For me the primary takeaway is this: compute is the biggest factor for which companies can compete at the capabilities frontier — efficiency matters too, but it’s probably not enough to make up for ten times less compute.

Claude Mythos and misguided open-weight fearmongering

Nathan Lambert argues against assuming that open models are too dangerous in a world with Mythos-level capabilities. It’s a thoughtful piece, but I’m unconvinced: if open models continue to progress rapidly, it’s hard to see how they don’t become broadly dangerous.

Do we need an open model consortium?

The open model world has recently faced challenges with key personnel leaving and hard questions about long-term financial viability. Nathan Lambert proposes a solution:

a consortium is the only long-term stable path to well-funded, near-frontier open models.

Perhaps, but that’s easier said than done. I’m curious about NVIDIA’s role here: they’re the only player with a clear funding strategy, but it’s hard to figure out their long-term motivations in this space.

Technical: Training LLMs to predict world events

Thinking Machines and Mantic discuss how to build an AI forecasting system that approaches the performance of human experts. I was amused to see that even though Grok wasn’t a particularly good forecaster, it was the most valuable member of the forecasting ensemble because its predictions were highly decorrelated from the other models.
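To see why a weaker forecaster can still earn its place, here's a toy simulation (hypothetical numbers of my own, not Mantic's actual setup): three strong forecasters that share a correlated error component, plus one noisier forecaster whose errors are independent. Averaging in the noisy-but-decorrelated member still reduces ensemble error.

```python
# Toy illustration of why a weak but decorrelated forecaster can improve
# an ensemble. All numbers are hypothetical; this is not Mantic's method.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
truth = rng.random(n)                      # the outcomes being forecast

shared = rng.normal(0.0, 0.10, n)          # error shared by the strong models
strong = [truth + shared + rng.normal(0.0, 0.02, n) for _ in range(3)]
weak = truth + rng.normal(0.0, 0.15, n)    # noisier, but independent errors

def mse(pred):
    return float(np.mean((pred - truth) ** 2))

without_weak = mse(np.mean(strong, axis=0))
with_weak = mse(np.mean(strong + [weak], axis=0))
print(with_weak < without_weak)            # the noisier model still helps
```

Averaging the three correlated models cannot wash out their shared error, while the independent fourth model dilutes it, which is exactly the dynamic reported for Grok.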




Only Law Can Prevent Extinction

LessWrong.com News - April 13, 2026 - 23:57

There's a quote I read as a kid that stuck with me my whole life:

"Remember that all tax revenue is the result of holding a gun to somebody’s head. Not paying taxes is against the law. If you don’t pay taxes, you’ll be fined. If you don’t pay the fine, you’ll be jailed. If you try to escape from jail, you’ll be shot."
-- P. J. O'Rourke.

At first I took away the libertarian lesson: Government is violence. It may, in some cases, be rightful violence. But it all rests on violence; never forget that.

Today I do think there's an important distinction between two different shapes of violence. It's a distinction that may make my fellow old-school classical Heinlein libertarians roll their eyes and insist there's no deep moral difference. I still hold it to be important.

In a high-functioning ideal state -- not all actual countries -- the state's violence is predictable and avoidable, and meant to be predicted and avoided. As part of that predictability, it comes from a limited number of specially licensed sources.

You're supposed to know that you can just pay your taxes, and then not get shot.

Is there a moral difference between that and outright banditry? To the vast majority of ordinary people rather than political philosophers, yes.

"Violence", in ordinary language, has the meaning of violence that is not predictable, that is not avoidable, that does not come from a limited list of sources whose rules people can learn.

Violence that is predictable and avoidable to you, whose consequences are regular and not chaotic, can of course still be terribly unjust and not to your own benefit. It doesn't rule out a peasant being told to hand over two thirds of their harvest in exchange for not much. It doesn't rule out your rent becoming huge because it's illegal to build new housing, etcetera etcetera. Laws can still be bad laws. But it is meaningfully different to the people who live under those unjust laws, if they can at least succeed in avoiding violence that way.

The point of a "state monopoly on violence", when it works, is to have violence come from a short list of knowable sources. A bullet doesn't make a smaller hole when fired by someone in a tidy uniform. But oligopolized force can be more avoidable, because it comes from a short list of dangers -- country, state, county, city -- whose actual rules are learnable even by a relatively dumb person. Ideally. In a high-functioning society.

The Earth presently has a problem. That problem may need to be prevented by the imposition of law, though hopefully not much actual use of force.

The problem, roughly speaking, is that if AI gets very much smarter, it is liable to turn into superhuman AI / machine superintelligence / artificial superintelligence (ASI). Current AIs are not deadly on that scale, but they are increasing in capability fast and breaking upward from previous trend lines. ASI might come about through research breakthroughs directly advancing AI to a superhuman level; or because LLMs got good enough at half-blindly tweaking the design to make a smarter AI, that is then sufficiently improved to make an even smarter AI, such that the process cascades.

AIs are not designed like a bicycle, or programmed and written like a social media website. There's a relatively small piece of code that humans do write, but what that code does, is tweak hundreds of billions of inscrutable numbers inside the actual AI, until that AI starts to talk like a person. The inscrutable numbers then do all sorts of strange things that no human decided for them to do, often things that require intelligence; like breaking out of containment during testing, or talking a human into committing suicide.

Controlling entities vastly smarter than humanity seems like it would, obviously, be the sort of problem that comes with plenty of subtleties and gotchas that can only be learned through practice. Some of the clever ideas that seemed to work fine at the non-superhuman level would fail to control strongly superhuman entities. Dynamics would change; something would go wrong. Probably a lot of things would go wrong, actually. It is hard to scale up engineering designs to vast new scales, and have them work right without a lot of further trial-and-error, even when you know how their internals work. To say nothing of this creation being an alien intelligence smarter than our species, a new kind of problem in all human history... I could go on for a while.

The thing about building vastly superhuman entities, is that you don't necessarily get unlimited retries like you usually do in engineering. You don't necessarily get to know there's a problem, before it's much too late; superhuman AIs may not decide to tell you everything they're thinking, until they are ready to wipe us off the board. (It's already an observed phenomenon that the latest AIs are usually aware of being tested, and may try to conceal malfeasance from an evaluator, like writing code that cheats at a code test and then cleans up the evidence after itself.)

Elon Musk's actual stated plan for Grok, grown on some of the largest datacenters in the world, is that he need only build a superintelligence that values Truth, and then it will keep humans alive as useful truth-generators. That he hasn't been shouted down by every AI scientist on Earth should tell you everything you need to know about the discipline's general maturity as an engineering field. AI company founders and their investors have been selected to be blind to difficulties and unhearing of explanations. If Elon were the sort of person who could be talked out of his groundless optimism, he wouldn't be running an AI company; so also with the founders of OpenAI and Anthropic.

If you need to read a statement by a few hundred academic computer scientists, Nobel laureates, retired admirals, etcetera, saying that yes AI is an extinction risk and we should take that as seriously as nuclear war, you can go look here. Frankly, most of them are relative latecomers to the matter and have not begun to grasp all the reasons to worry. But what they have already grasped and publicly agreed with, is enough to motivate policy.

I realize this might sound naively idealistic. But I say: The utter extermination of humanity, would be bad! It should be prevented if possible! There ought to be a law!

Specifically: There ought to be a law against further escalation of AGI capabilities, trying to halt it short of the point where it births superintelligence. A line drawn sharply and conservatively, because we don't know how much further we can dance across this minefield before something explodes. My organization has a draft treaty online, but a bare gloss at "Okay what does that mean tho" would be: All the hugely expensive specialized chips used to grow large AIs, and run large AIs, would be collected in a limited number of datacenters, and used only under international supervision.

It would be beneath my dignity as a childhood reader of Heinlein and Orwell to pretend that this is not an invocation of force.

But it's the sort of force that's meant to be predictable, predicted, avoidable, and avoided. And that is a true large difference between lawful and unlawful force.

There's in fact a difference between calling for a law, and calling for individual outbursts of violence. (Receipt that I am not arguing with a strawman, and that some people purport to not understand any such distinction: Here). Libertarian philosophy aside, most normal ordinary people can tell the difference, and care. They correctly think that they are less personally endangered by someone calling for a law than by someone calling for street violence.

But wait! The utter extinction of humanity -- argue people who do not believe that premise -- is a danger so extreme, that belief in it might possibly be used to argue for unlawful force! By the Fallacy of Appeal to Consequences, then, that belief can't be true; thus we know as a matter of politics that it is impossible for superintelligence to extinguish humanity. Either it must be impossible for any cognitive system to exist that is advanced beyond a human brain; or the many never-challenged problems of controlling machine superintelligence must all prove to be easy. We cannot deduce which of these two facts is true, but their disjunction must be true and also knowable, because if it weren't knowable, somebody might be able to argue for violence. Never in human history has any proposition proven to be true if anyone could possibly use it to argue for violence. The laws of physics check whether that could be a possible outcome of any physical situation, and avoid it with perfect reliability.

That whole line of reasoning is deranged, of course.

I will nonetheless proceed to spell out why its very first step is wrong, ahead of all the insanity that followed:

Unlawful violence is not able, in this case, to prevent the destruction of the world.

If an ASI ban is to accomplish anything at all, it has to be effective everywhere. When some politicians said to me, "What do you think about our proposed national ban on more datacenters until they have sensible regulations?" I replied to them, "An AI can take your job, and a machine superintelligence can kill you, just as easily from a datacenter in another country." They later added a provision saying that also GPUs couldn't be exported to other countries until those countries had similar sensible regulations. (I am still feeling amazed, awed, and a little humbled, about the part where my words plausibly had any effect whatsoever. Politicians are a lot more sensible, in some real-life cases, than angry libertarian literature had led me to believe a few decades earlier.)

Datacenters in Iceland, if they were legal only there, could just as much escalate AI capabilities to the point of birthing the artificial superintelligence (ASI) that kills us. You would not be safe in your datacenter-free city. You can imagine the ASI side as having armies of flying drones that search everywhere; though really there are foreseeable, quickly-accessible-to-ASI technologies that would be much more dangerous than drone swarms. But those would take longer to explain, and the drone swarms suffice to make the point. You could not stay safe from ASI by hiding in the woods.

On my general political philosophy, if a company's product only endangers voluntary customers who know what they're getting into, by strong default that's a matter between the company and the customer.

If a product might kill someone standing nearby the customer, like cigarette smoke, that's a regional matter. Different cities or countries can try out different laws, and people can decide where to live.

If a product kills people standing on the other side of the planet from the customer, then that's a matter for international negotiations and treaties.

ASI is a product that kills people standing on the other side of the planet. Driving an AI company out of just your own city will not protect your family from death. It won't even protect your city from job losses, earlier in the timeline.

And similarly: To impede one executive, one researcher, or one company, does not change where AI is heading.

If tomorrow Demis Hassabis said, "I have realized we cannot do this", and tried to shut down Google DeepMind, he would be fired and replaced. If Larry Page and Sergey Brin had an attack of sense about their ability to face down and control a superintelligence, and shut down Google AI research generally, those AI researchers would leave and go to other companies.

Nvidia is currently the most valuable company in the world, with a $4.5 trillion market capitalization, because everyone wants more AI-training chips than Nvidia has to sell. The limiting resource for AI is not land on which to construct datacenters; Earth has a lot of land. Banning a datacenter from your state may keep electricity cheaper there in the middle term, but it won't stop the end of the world.

The limiting resource for AI is also not the number of companies pursuing AI. If one AI company was randomly ruined by their country's government, other AI companies would swarm around to buy chips from Nvidia instead, which would stay at full production and sell their full production. The end of the world would carry on.

There is no one researcher who holds the secret to your death. They are all looking for pieces of the puzzle to accumulate, for individual rewards of fame and fortune. If somehow the person who was to find the next piece of the puzzle randomly choked on a chicken bone, somebody else would find a different puzzle piece a few months later, and Death would march on. AI researchers tell themselves that even if they gave up their enormous salaries, that wouldn't help humanity much, because other researchers would just take their place. And the grim fact is that this is true, whether or not you consider it an excuse.

In other cases of civic activism, you can prevent one coal-fired power plant from being built in your own state, and then there is that much less carbon dioxide in the atmosphere and the world is a little less warm a century later. Or if you are against abortions, and you get your own state to outlaw abortions, perhaps there are then 1000 fewer abortions per year and that is to you a solid accomplishment. Which is to say: You can get returns on your marginal efforts that are roughly linear with the effort you put in.

The ASI problem is not like this. If you shut down 5% of AI research today, humanity does not experience 5% fewer casualties. We end up 100% dead after slightly more time. (But not 5% more time, because AI research doesn't scale in serial speed with the number of parallel researchers; 9 women can't birth a baby in 1 month.)

So we don't need to have a weird upsetting conversation about doing bad unlawful things that would supposedly save the world, because even if someone did a very bad thing, that still wouldn't save the world.

This is a point that some people seem to have a very hard time hearing -- though those people are usually not on the anti-extinction side, to be clear. It's more that some people can't imagine that superhuman AI could be a serious danger, to the point where they have trouble reasoning about what that premise would imply. Others are politically opposed to AI regulation of any sort, and therefore would prefer to misunderstand these ideas in a way where they must imply terrible unacceptable conclusions.

I understand the reasons in principle. But it is a strange and frustrating phenomenon to encounter in practice, in people who otherwise seem coherent and intelligent (though maybe not quite on the level of GPT 5.4). Many people believe, somehow, that other people ought to think -- not themselves, only other people -- that outbursts of individual violence just have to be helpful. If you were truly desperate, how could you not resort to violence?

But even if you're desperate, an outburst of violence usually will not actually solve your problems! That is a general truism in life, and it applies here in full force.

Even if you throw away all your morals, that doesn't make it work. Even if you offer your soul to the Devil, the Devil is not buying.

How certain do you have to be that your child has terminal cancer, before you start killing puppies? 10% sure? 50% sure? 99.9%? The answer is that it doesn't matter how certain you are, killing puppies doesn't cure cancer. You can kill one hundred puppies and still not save your kid. There is no sin so great that it just has to be helpful because of how sinful it is.

Statistics show that civil movements with nonviolent doctrines are more successful at attaining their stated goals (especially in states that otherwise have functioning police). The factions that throw away all their morals lose the sympathy of the public and politicians, and then they fail. Terrorism is not an instant 'I win' button that people only refrain from pressing because they're so moral. Society has succeeded in making it usually not pay off -- say the numbers.

Being really, really desperate changes none of those mechanics.

Almost everyone who actually accepts a fair chance of ASI disaster doesn't seem to have a hard time understanding this part. It's an obvious consequence of the big picture, if you actually allow that big picture inside your head.

But it is hard for a human being to understand a thing, if it would be politically convenient to misunderstand. Opponents of AI regulation want any danger of extinction to imply unacceptable consequences.

They understand on some level how the AI industry functions. But they become mysteriously unable to connect that knowledge to their model of human decisionmaking. You can ask them, "If tomorrow I was arrested for attacking an AI-company headquarters, would you read that headline, and conclude that AI had been stopped in its tracks forever and superintelligence would never happen?" and get back blank stares.

Even some people who are not obviously politically opposed seem to stumble over the idea. I'm genuinely not sure why. I think maybe they are having trouble processing "Well of course ASI would just kill everyone, we're nowhere near being able to control it" as an ordinary understanding of the world, the way that 20th-century concerns about global nuclear war were part of a mundane understanding of the world. "If every country gets nuclear weapons they will eventually be used" was not, to people in 1945, the sort of belief where you have to prove how strongly you believe it by being violent. It was just something they were afraid would prove true about the world, and then cause their families to die in an unusually horrible kind of fire. So they didn't randomly attack the owners of uranium-mining companies, to prove how strongly they believed or how worried they were; that, on their correct understanding of the world, would not have solved humanity's big problem -- namely, the inexorable-seeming incentives for proliferation. Instead they worked hard, and collected a coalition, and built an international nuclear anti-proliferation regime. Both the United States and the Soviet Union cooperated on many aspects of that regime, despite hating each other quite a lot, because neither country's leaders expected they'd have a good day if an actual nuclear war happened.

The sort of conditionally applicable force that could stop everyone from dying to superhuman AI, would have to be everywhere and reliable; uniform and universal.

Let it be predictable, predicted, avoidable, and avoided.

It is so clearly a case for state-approved lawful force that there would be little point in adding any other kind of force to the mix. It would just scare and offend people, and they'd be right to be scared and offended. People don't like unguessably long lists of possible violence-sources in their lives, for then they cannot predict it and avoid it.

I did spell out the necessity of the lawful force, in first suggesting that international policy. Some asked afterward, "Why would you possibly mention that the treaty might need to be enforced by a conventional airstrike, if somebody tried to defy the ban?" One reason is that some treaties aren't real and actually enforced, and that this treaty needs to be the actually-enforced sort. Another reason is that if you don't spell things out, that same set of people will make stuff up instead; they will wave their hands and say, "Oh, he doesn't realize that somebody might have to enforce his pretty treaty."

And finally it did seem wiser to me, that all this matter be made very plain, and not dressed up in the sort of obscuring language that sometimes accompanies politics. For an international ASI ban to have the best chance of operating without its force actually being invoked, the great powers signatory to it need to successfully communicate to each other and to any non-signatories: We are more terrified of machine superintelligence killing everyone on Earth than we are reluctant to use state military force to prevent that.

If North Korea, believed to have around 50 nuclear bombs, were to steal chips and build an unmonitored datacenter, I would hold that diplomacy ought to sincerely communicate to North Korea, "You are terrifying the United States and China. Shut down your datacenter or it will be destroyed by conventional weapons, out of terror for our lives and the lives of our children." And if diplomacy fails, and the conditional use of force fires, and then North Korea retaliates with a first use of its nuclear weapons? I don't think it would; that wouldn't end well for them, and they probably know that. But I also don't think this is a hypothetical where sanity says that we are so terrified of someone's possible first use of nuclear weapons, that we let them shatter a setup that protects all life on Earth.

You'd want to be very clear about all of this in advance. Countries not understanding it in advance could be very bad. History shows that is how a lot of wars have started, through someone failing to predict a conditional application of force and avoid it. One historical view suggests that Germany invaded Poland in 1939 in part because, when Britain tried to warn that it would defend Poland, Hitler read the messages himself, instead of having the professional diplomats explain them to him; and Hitler read the standard diplomatic politesse and soft words as conciliatory; and thus began World War II. More recently, a similar diplomatic misunderstanding by Saddam Hussein is thought to have resulted in Hussein's 1990 invasion of Kuwait, which then in fact provoked a massive international response. I've sometimes been criticized for trying to spell out proposed policy in such awfully plain words, like saying that the allies might have to airstrike a datacenter if diplomacy failed. Some people -- reaching pretty hard, in my opinion -- claimed that this must be a disguised incitement to unlawful violence. But being very clear about the shape of the lawful force was important, in this case.

And then, all that policy is sufficiently the obvious and sensible proposal -- following from the ultimately straightforward realization that something vastly smarter than humanity is not something humanity presently knows how to build safely -- and never mind how bad it starts looking if you learn details like Elon Musk's stated plan -- that some people find it inconveniently difficult to argue with. Unless they lie about what the proposal is.

So I am misquoted (that is, they fabricate a quote I did not say, which is to say, they lie) as calling for "b*mbing datacenters", two words I did not utter. In the first 2023 proposal in TIME magazine, I wrote the words "be willing to destroy a rogue datacenter by airstrike". I was only given one day by TIME to write it -- otherwise it wouldn't have been 'topical' -- but I had thought I was saying that part quite carefully. Even quoted out of context, I thought, this ought to make very clear that I was talking about state-sanctioned use of force to preserve a previously successful ban from disruption. And absolutely not some guy with a truck bomb, attacking one datacenter in their personal country while all the other datacenters kept running.

And that phrasing is clear even when quoted out of context! If quoted accurately. So some (not all) accelerationists just lied about what was being advocated, and fabricated quotes about "b*mbing datacenters". When called out, they would protest, "Oh, you pretty much said that, there's no important difference!" To this as ever the reply is, "If it is worth it to you to lie about, it must be important."

A similarly fabricated quote says that I proposed "nuking datacenters". Ladies, gentlemen, all others, there is absolutely no reason to nuke a noncompliant datacenter. In the last extremity of failed diplomacy, a conventional missile will do quite well. The taboo against first use of nuclear weapons is something that I consider one of the great triumphs of the post-WW2 era. I am proud as a human being that we pulled that off. Nothing about this matter requires violating that taboo. We should not be overeager to throw away all limits and sense, and especially not when there is no need. Life on Earth needs to go on in the sense of "life goes on", not just in the sense of "not being killed by machine superintelligences".

It is sometimes claimed that ASI cannot possibly be banned without a worldwide tyranny -- by people who oppose AI regulation and so would prefer it to require horrifying unacceptable measures.

At the very least: I don't think we know this to be true to the point we should all lie down and die instead.

At least until recently, humanity has managed to not have every country building its own nuclear arsenal. We did that without everyone on Earth being subjected to daily-required personal obediences to the International Atomic Energy Agency. Some people in the 1940s and 1950s thought it would take a tyrannical world dictatorship, to prevent every country from getting nuclear weapons followed by lots of nuclear war! Shutting down all major wars between major powers, or slowing that kind of technological proliferation, had never once been done before, in all history! But those worried skeptics were wrong; for some decades, at least, nuclear proliferation was greatly slowed compared to the more pessimistic forecasts, without a global tyranny. And now we have that precedent to show it can be done; not easily, not trivially, but it can be done.

For the supervast majority of normal people, "Don't spend billions of dollars to smuggle computer chips, construct an illegal datacenter, and try to build a superintelligence" is a very small addition to the list of things they must not do. Surveys seem to show that most people think machine superintelligence is a terrible idea anyway. (Based.)

And the few who feel really personally bothered by that law?

They may be sad. They'll definitely be angry. But they'll survive. They wouldn't actually survive otherwise.

My will for Sam Altman's fate is that he need only fear the use of force by his country, his state, his county, and his city, as before; with the difference that Sam Altman, like everyone else on Earth, is told not to build any machine superintelligences; and that this potential use of state force against his person be predictable to him, and predicted by him, and avoidable to him, and avoided by him; with him as with everyone. That's how it needs to be if any of us are to survive, or our children, or our pets, or our garden plants.

Let Sam Altman have no fear of violence beyond that, nor fire in the night.

Artificial superintelligence is the very archetype and posterchild of a problem that can only be solved with force that has the shape of law, as in state-backed universal conditional applications of force meant to be predictable and avoided. Anything which is not that does not solve the problem.

And when somebody does throw a Molotov cocktail at Sam Altman's house, that is not actually good for the anti-extinction movement, as anyone with the tiniest bit of sense could and did predict.

Currently all the anti-extinctionist leaders are begging their people to not be violent -- as they've said in the past, but louder now. And conversely some of the accelerationists are trying to goad violence, in some cases to the shock of their usual audiences:


That this sentiment is not universal among accelerationists, is seen immediately from the protestor in their replies. Let us, if not them, be swift to fairly admit: We are observing bad apples and not a bad barrel.

But also to be clear, those bad apples were also trying to goad people into violence earlier, in advance of the attacks on Altman:

To this tweet I will not belabor the reply that anti-extinctionists may be good people with morals; some good people might nod, but others would find it unconvincing, and there is one analysis that answers for all: It would not work. And given that it would not save humanity, anti-extinctionists make the obvious estimate that our own cause would be, and has been, harmed by futile outbursts of unlawful violence.

Conversely, some accelerationists behave as if they want to spread the word and meme of violence as far as possible. It is reasonable to guess that some part of their brain has considered the consequences of somebody being moved by their taunts, and found them quite acceptable. If they can goad somebody labelable as anti-extinctionist to violence, that benefits their faction. They may consider Sam Altman replaceable to their cause, so long as there is no law and treaty to stop all the AI companies everywhere.

They're right. Sam Altman is not the One Ring. He is not Sauron's one weakness. If anything happened to him, AI would go on.

I am posting these Tweets in part to say to any impressionable young people who may consider themselves humanity's defenders, who are at all willing to listen to their allies rather than their enemies: Hey. Don't play into their hands. They're taunting you exactly because violence is good for their side and bad for ours. If it were true that violence could help you, if they expected that violence would hurt AI progress more than it helped their side politically, they'd never taunt you like that, because they'd be afraid rather than eager to see you turn to violence. They're saying it to you because it's not true; and if it were true, they'd never say it to you. They're not on your side, and the advice implied by the taunts is deliberately harmful for you and good for them.

This is of course a general principle when somebody is taunting you. It means they want you to fight, which means they expect to benefit from you trying.

Don't believe their taunts. Believe what is implied by their act of taunting, that violence hurts you and helps them. That part is accurate, obvious, and not at all hard for their brains to figure out in the background, before they choose to taunt you.

It makes sense to me that society penalizes factions that appear to benefit from violence, even if their leaders try to disclaim that violence. Intuitively, you don't want to create a vulnerability in society where faction leaders could gain an advantage by sending out assassins and then publicly disclaiming them.

But at the point where some accelerationists are openly trying to goad anti-extinctionists into violence, while the anti-extinctionist leaders beg for peace -- this denotes society has gone too far in the direction of punishing the 'violent' faction for what's probably actually in real life a rogue. And not far enough in leveling some social opprobrium at (individual) accelerationist sociopaths standing nearby, openly trying to provoke violence they know would be useful to them.

It is of course an old story. The civic movement leaders try to persuade their people to stay calm, disciplined, and orderly on the march. The local police, if they oppose that movement, will allow looters to tag along and then forcibly prevent the marchers from stopping the looters. When your society gets to that point, it has created a new vulnerability in the opposite direction.

One could perhaps also observe that certain people have taken this particular moment to argue that a scientific position whose native plausibility ought to be obvious, and which has been endorsed by hundreds of academic scientists, retired admirals, Nobel laureates, etcetera, inevitably implies that unlawful violence must be a great idea. I am not going to make any great show of wringing my hands and clutching my pearls about how such false speech might endanger the innocent for their own political advantage, what if some mentally disturbed person believed them, etcetera. This is how human beings always behave around politics; it is not unusual wrongdoing for any faction to behave that way. They, too, have a right to say what they believe, and to believe things that are obviously false but politically convenient to them. I may still take a moment to observe what is happening.

As for the argument that to criticize AI at all is "stochastic terrorism", because someone will react violently eventually, even if not logically so? Tenobrus put it well:

The leaders of anti-extinctionism do have some responsibility to ask their people to please behave themselves. And we do! That actually is around as much as should be reasonably asked of any civic movement. We ought to try, and try we do! We cannot and should not be expected to succeed every single time given base rates of mental illness in the population.

Speech about important matters to society should not properly be held hostage to the whim of any madman that might do a stupid thing, to the detriment of his supposed cause and against every visible word of that cause's leaders.

That would be a foolish way to run a society.

And policywise, this would be a very serious matter about which to shut down speech. Anthropic Claude Mythos is already a state-level actor in terms of how much harm it could theoretically have done -- given its demonstrated and verified ability to find critical security vulnerabilities in every operating system and browser; and how fast Mythos could've exploited those vulnerabilities, with ten thousand parallel threads of intelligent attack. Mythos hypothetically rampant or misused could have taken down the US power grid, say... at the end of its work, after introducing hard-to-find errors into all the bureaucracies and paperwork and doctors' notes connected to the Internet.

In 2024 a claim of that being possible would have been a mere prediction and dismissed as fantasy. Now it is an observation and mere reality. That's the danger level of current AI, for all that Anthropic seems to be trying to be well-behaved about it, and Mythos has not yet visibly run loose. To say in the face of that, that nobody should critique AI, or AI companies, or even individual AI company leaders as per recent journalism, because some madman might thereby be inspired to violence -- it fails cost/benefit analysis, dear reader.

AI is already a state-level potential danger, if not quite yet a state-level actual power. Free speech to critique AI then holds a corresponding level of importance. The stochastic madman trying to hold free speech hostage to his possible whims -- he must be told he is not important enough for all humanity to defer to him about subjects he might find upsetting.

And faced with an actual human-extinction-level danger like machine superintelligence -- as ought obviously to represent that level of possible danger, even if some people disagree about its rough probability -- well, that would be a silly way for everyone on Earth to die, if nobody dared to talk about the danger, or argue high estimates of that danger, and it happened without any effort at stopping it.

So let's not die! Let's save everyone!

Sam Altman too.

That's the dream.





Which Relations Can Be Generalized Implicitly?

LessWrong.com News - April 13, 2026 - 22:05

This is a small, stand-alone piece of work, which introduces a conjecture about how models can generalize. I haven't had a huge amount of time to stress-test it, but I think it's a neat finding.

TL;DR: Transformers can generalize representable group and monoid operations, but find it much easier to grok abelian groups than non-abelian ones. They can't grok truncated infinite groups. I find a categorization problem which transformers are capable of generalizing implicitly, without chain of thought.
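To make the task format concrete, here is one way the operation tables behind such experiments can be generated: an abelian group (addition mod 97) versus a non-abelian one (the symmetric group S5), with the model trained on a random subset of triples and tested on the held-out rest. The specific groups and split fraction here are my illustrative assumptions, not necessarily this post's exact setup.

```python
import itertools
import random

def zmod_add_examples(n=97):
    """All (a, b, a+b mod n) triples for the abelian group Z/n."""
    return [(a, b, (a + b) % n) for a in range(n) for b in range(n)]

def s5_compose_examples():
    """All composition triples for the non-abelian group S5 (permutations of 5 elements)."""
    perms = list(itertools.permutations(range(5)))
    index = {p: i for i, p in enumerate(perms)}
    examples = []
    for p in perms:
        for q in perms:
            pq = tuple(p[q[i]] for i in range(5))  # apply q first, then p
            examples.append((index[p], index[q], index[pq]))
    return examples

# A random half/half split; grokking studies then train far past overfitting
# on the train split while monitoring accuracy on the test split.
data = zmod_add_examples()
random.seed(0)
random.shuffle(data)
train, test = data[: len(data) // 2], data[len(data) // 2 :]
```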

I conjecture that transformers can implicitly generalize any problem solvable in O(1) time by a semi-Thue system equipped with an algebraic oracle.

Background

Feel free to skip this if you're up to date on latent reasoning.

Implicit reasoning, latent reasoning, CoT-free generalization, etc. are all names for a similar thing. Modern LLMs can process information on two scales: implicitly, within one forward pass, and explicitly, across multiple forward passes. One important question is "What can LLMs do in a single forward pass, and what do they need to do across multiple forward passes?"

Previous research by Balesni et al. (2025) found that LLMs cannot, in general, do two-hop latent reasoning: when trained on a pair of facts such as "Russ is the spouse of Hay" and "Hay was born in Detroit", they are unable to compose the fact "Russ's spouse was born in Detroit". They also struggle with fact reversal, being unable to infer "Uriah Hawthorne composed Abyssal Melodies" despite being trained on "Abyssal Melodies was composed by Uriah Hawthorne" (Berglund et al. 2023).

LLMs also typically struggle with latent comparison, as discovered by Allen-Zhu et al. (2023): though GPT-4 was able to recall numerical facts perfectly, it was unable to compare these numbers across entities without thinking step-by-step.

Results by Wang et al. (2024) found that transformer models can, in fact, generalize the comparison task, even out-of-distribution (but can only generalize fact composition in-distribution) via grokking, which is a process whereby training a (small) model far beyond overfitting leads to a sudden generalization (Power et al. 2022). This phenomenon was studied further by Nanda et al. (2023), who attribute it to a delayed phase change in the network which is favoured by an inductive bias in the SGD (or Adam) optimizer, as well as weight decay.

Neural networks can grok the operation of permutation groups over five and six elements, seemingly requiring only a single layer of neurons to cleanly generalize the operation of a permutation group (Stander et al. 2023).

Grokking Groups

Which group operations can a model generalize?

Groups

A group is (feel free to skip this if you know about groups) a set G of elements equipped with a map from G × G to G called the group operation, written g·h. The operation is associative: (g·h)·k = g·(h·k). There exists an identity element e such that e·g = g·e = g, and every element g has an inverse g⁻¹ such that g·g⁻¹ = g⁻¹·g = e.

If we have two groups G and H, we can make a product group G × H, with identity (e_G, e_H) and an operation defined by pairing up elements like so:

(g₁, h₁)·(g₂, h₂) = (g₁·g₂, h₁·h₂)

If g·h = h·g for all elements, then the group is called abelian.

Cyclic Groups

The archetypical abelian groups are the cyclic groups ℤ_n, which are just the integers {0, 1, …, n−1}, with addition modulo n as the group operation. Any finite abelian group is just a product group of cyclic groups. Therefore, if a network can generalize modular addition, we might expect it to be able to generalize any finite abelian group's operation.
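To make this concrete, here's a minimal sketch (the function names are my own, for illustration) of modular addition as the ℤ_n operation, and of building a product of two cyclic groups:

```python
# Hypothetical sketch: cyclic group Z_n and a product of two cyclic groups.
def cyclic(n):
    """Return the group operation for Z_n (addition mod n)."""
    return lambda a, b: (a + b) % n

def product(op1, op2):
    """Combine two group operations into the operation of the product group."""
    return lambda x, y: (op1(x[0], y[0]), op2(x[1], y[1]))

z5 = cyclic(5)
assert z5(3, 4) == 2                      # 3 + 4 = 7 = 2 (mod 5)

z5xz5 = product(cyclic(5), cyclic(5))
assert z5xz5((1, 2), (4, 4)) == (0, 1)    # componentwise addition mod 5

# Abelian: x·y == y·x for every pair of elements
assert all(z5xz5((a, b), (c, d)) == z5xz5((c, d), (a, b))
           for a in range(5) for b in range(5)
           for c in range(5) for d in range(5))
```

The abelian check above is exactly the g·h = h·g condition, verified exhaustively over the 625 pairs of ℤ₅ × ℤ₅.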

Permutation Groups

Another archetypical type of group is the permutation group over n elements, written as S_n. Each group element is a rearrangement of the n items, and composition of elements involves just doing two rearrangements in a row.

Now consider an arbitrary group G with n elements. For any element g ∈ G, consider multiplying every element in G by g. This must map each element of G to a distinct new one.[1] Therefore, each element of G can be identified with a permutation of G's n elements, and our group is a subgroup of S_n. Therefore, if a model can learn any permutation group of arbitrary size, it can learn any finite group's operation.
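This embedding argument (a version of Cayley's theorem) can be checked directly in a few lines. A sketch I wrote for illustration, using ℤ₆ as the example group:

```python
# Hypothetical sketch of Cayley's theorem for Z_6: each element g becomes the
# permutation "add g to every element".
n = 6
elements = list(range(n))

def left_mult_perm(g):
    """The permutation of Z_6 induced by adding g to every element."""
    return tuple((g + x) % n for x in elements)

perms = {g: left_mult_perm(g) for g in elements}

# Each induced map is a genuine permutation (all images distinct)...
assert all(len(set(p)) == n for p in perms.values())

def compose(p, q):          # (p ∘ q)(x) = p(q(x))
    return tuple(p[q[x]] for x in elements)

# ...and composing the permutations matches the group operation:
assert compose(perms[2], perms[3]) == perms[(2 + 3) % n]
```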

This is interesting, since we already have very strong evidence that models can generalize permutation groups from Stander et. al. (2023)! Therefore, we might conjecture that neural networks can generalize any finite group.

Infinite Groups

This is where things get messy. Neural networks obviously cannot grok infinite groups if we express each group element as a specific token, since they would need to have an infinitely-large embedding matrix.

One important class of infinite groups are the free groups over n elements. Suppose we have two elements: a and b. We also need inverses a⁻¹ and b⁻¹. The free group over two elements is just the set of words (including the empty word e) over the alphabet {a, b, a⁻¹, b⁻¹} where adjacent x and x⁻¹ terms "annihilate", and the group operation is concatenation. So a·b·a⁻¹ is allowed as a real group element, but a·b·b⁻¹·a⁻¹ isn't, since the b·b⁻¹ elements annihilate, leaving a·a⁻¹ which also annihilates, therefore a·b·b⁻¹·a⁻¹ = e.
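The annihilation rule can be sketched as a stack-based word reduction. The capital-letter convention for inverses here is my own choice for illustration:

```python
# Hypothetical sketch: word reduction in the free group over {a, b}.
# Inverses are written as capitals: A stands for a⁻¹, B for b⁻¹.
def reduce_word(word):
    """Repeatedly cancel adjacent inverse pairs until no more cancel."""
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()        # x followed by x⁻¹ annihilates
        else:
            stack.append(ch)
    return "".join(stack)

assert reduce_word("abBA") == ""        # a·b·b⁻¹·a⁻¹ reduces to the empty word
assert reduce_word("abA") == "abA"      # a·b·a⁻¹ is already fully reduced

# The group operation is concatenation followed by reduction:
assert reduce_word("ab" + "BA") == ""
```

The stack handles cascading cancellations in one pass: popping b·b⁻¹ exposes the a·a⁻¹ pair, which then cancels too, matching the example in the text.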

The free group over one element is isomorphic to the integers under addition.

We can, instead, truncate an infinite group to a finite number of elements, for example limiting our free group over two elements to words with fewer than some number of letters in them. This means that we no longer have closure: concatenating two words near the length limit can produce a result which is outside of our truncated zone (we do have plenty of possible operation examples using four letters, though). This is not a subgroup of a permutation group. It also can't be represented using the trick I'm about to introduce.

Representations as matrices

It's common in maths to want to represent a group as a set of matrices. Matrices are easy to study, and their multiplication follows nice rules (in fact, many subsets of the n × n matrices form infinite groups of their own).

We might not care about this, though. The permutation group S_n can be represented as the matrices which swap the entries of a vector around. Then we can just multiply them.

Any cyclic group ℤ_n admits a two-dimensional representation as rotation matrices:

R_k = [ cos(2πk/n)  −sin(2πk/n) ; sin(2πk/n)  cos(2πk/n) ]
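A minimal numeric check of this rotation representation (a sketch I wrote, assuming the standard 2D rotation matrix for angle 2πk/n):

```python
# Hypothetical sketch: representing Z_n as 2D rotation matrices.
import math

def rot(k, n):
    """Rotation matrix for the k-th element of Z_n (angle 2πk/n)."""
    t = 2 * math.pi * k / n
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(2)) for k in range(2)]
            for i in range(2)]

n = 12
# Multiplying representations matches addition mod n (up to float error),
# because rotations are periodic with period 2π:
prod = matmul(rot(5, n), rot(9, n))
expected = rot((5 + 9) % n, n)
assert all(abs(prod[i][j] - expected[i][j]) < 1e-9
           for i in range(2) for j in range(2))
```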

All finite groups can be represented, because all permutation groups can be represented. In fact, most of them can be represented much more efficiently than by converting them first to S_n. Lots of infinite ones can be represented as well (complex numbers with modulus 1 are a clear example; they're just 2D rotations), but the infinite ones we've chosen are not representable.

This means that, if models can generalize matrix multiplication, they can generalize a huge number of group multiplications.

Monoids

A monoid is like a group without inverses. One example is the "transition monoid" on n elements, T_n, which is a bit like the permutation group on n elements. Instead of having to swap the elements so that they all end up in different spots, T_n also includes rearrangements which put two (or more) elements in the same spot. We can still compose these just the same, but there's no way to invert them, since we don't know which element started off where.

There are also a lot more possible transitions than there are permutations. For each transition, we have to specify one of n destinations for each of n starting points with no restrictions, so |T_n| = nⁿ while |S_n| = n!, and these monoids get really big really quickly.

We can generate a sub-monoid of this monoid by starting with two transition functions and just applying them in all possible combinations until we stop getting new transitions. The two starting transitions here are called "generators".
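Generating such a sub-monoid is a simple closure computation. This is a hedged sketch of the idea (the seed and setup are arbitrary choices of mine, not the post's actual code):

```python
# Hypothetical sketch: generating a sub-monoid of T_5 from two random transitions.
# A transition on 5 elements is a tuple t where t[i] is the destination of element i.
import random

random.seed(0)
n = 5
gens = [tuple(random.randrange(n) for _ in range(n)) for _ in range(2)]

def compose(s, t):          # apply t first, then s
    return tuple(s[t[i]] for i in range(n))

# Closure: keep composing with the generators until nothing new appears.
monoid = {tuple(range(n))}  # start from the identity transition
frontier = set(gens)
while frontier:
    monoid |= frontier
    frontier = {compose(g, m) for g in gens for m in monoid} - monoid

print(f"sub-monoid of T_{n} with {len(monoid)} elements (out of {n**n} possible)")
```

The loop terminates because T₅ has only 5⁵ = 3125 elements, so the frontier must eventually empty out.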

These transitions can still be represented as matrices, though.

As with groups, any finite monoid can be represented as a sub-monoid of T_n, which means that all finite monoids can be represented as matrices.

There are some other funny monoid options as well, which we'll meet later.

Matrix Multiplication

Matrix multiplication of random n × n matrices takes on the order of n³ neurons to learn.[2]

The functions learned in (Stander et al. 2023) were able to do it in significantly fewer neurons than predicted by this conjecture. Looking into them, they don't appear to be doing matrix multiplication. Since every group can be represented as a subgroup of S_n, this means that every group should be generalizable by a model with sufficiently many neurons.

However, the function learned when modular addition is grokked in Nanda et al. (2023) does look quite a lot like matrix multiplication averaged across multiple frequencies. Weirdly, the model doesn't just use the base frequency: it uses a few multiples of that, and then finds the place where they all agree, by the Chinese remainder theorem. Perhaps this is just how the model uses its extra residual stream dimensions to de-noise its estimates (since it wasn't a network that could do multiplication exactly). These results might suggest that the number of neurons required for grokking a cyclic group is smaller than the matrix-multiplication estimate, possibly some low power of n, depending on how much de-noising across different estimators is required.

Groups that Grok

I ran through twelve different monoids. Three abelian groups (products of cyclic groups), three non-abelian groups, three truncated infinite groups, and three finite monoids:

The groups are:

Abelian
  • ℤ₉₇, the cyclic group of order 97.
  • ℤ₁₁ × ℤ₁₁, the product of two copies of the cyclic group of order 11.
  • ℤ₅ × ℤ₅ × ℤ₅, the product of three copies of the cyclic group of order 5.
Non-abelian
  • S₅, permutations of a list of 5 elements.
  • D₄₈, the symmetry group of a 48-sided polygon: 48-fold rotation plus reflection.
  • A group of matrices of a fixed form, with integer entries taken modulo a fixed number.
Truncated infinite
  • F₂, the free group on two elements.
  • F₃, the free group on three elements.
  • B₃, the braid group on three strings.
Monoids
  • A random transition monoid on five elements, with two generators.
  • A monoid of matrices with integer entries.
  • The "rook monoid": matrices with elements of only 0 and 1, such that no two 1s are on the same row or column.
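The training setup presumably amounts to a random split of each multiplication table into train and test examples. A minimal sketch under that assumption (the post's actual code is in the repos linked at the end, and the split fraction here is my own guess):

```python
# Hypothetical sketch of the training data: treat each element of a finite
# group as a token and split the multiplication table "a b = c" at random.
import random

def mult_table_dataset(elements, op, train_frac=0.7, seed=0):
    pairs = [(a, b) for a in elements for b in elements]
    random.Random(seed).shuffle(pairs)
    cut = int(train_frac * len(pairs))
    train = [((a, b), op(a, b)) for a, b in pairs[:cut]]
    test = [((a, b), op(a, b)) for a, b in pairs[cut:]]
    return train, test

# Example with Z_97, the first group in the list above:
elements = list(range(97))
train, test = mult_table_dataset(elements, lambda a, b: (a + b) % 97)
assert len(train) + len(test) == 97 * 97
```

Grokking is then the question of whether a model trained on `train` eventually gets the held-out `test` entries right.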


The abelian groups grok pretty quickly, the non-abelian groups grok slowly, the truncated infinite groups don't seem to grok at all, and the monoids vary.

I'm not quite going to go into why some things grok more rapidly than others. I'd guess it's to do with the subgroup and submonoid structures within them. What I will think about is what's going on with the truncated infinite groups.

Learning Sets

OK let's move on to a different problem. Not groups. We'll generate a matrix of items from different categories and put them into different sets.


Set   | Category 1: Animals | Category 2: Colours | Category 3: Gemstones | Category 4: Countries
Set 1 | Dog                 | Red                 | Quartz                | Germany
Set 2 | Cat                 | Blue                | Diamond               | Japan
Set 3 | Frog                | Yellow              | Emerald               | Nigeria
Set 4 | Fish                | Green               | Ruby                  | Mexico

Ok, first we'll just try with random tokens, but we'll go back and use LLMs later. We'll choose six categories and eight sets. We'll train Pythia 70M on a task that looks like this:

"In Bostock's matching game, the animal cat is matched with the colour ..." where the desired completion is "Blue". We generate a bunch of those.

This is basically just a logical matching relation, which we can think of as an edge between the two items. We can make a nice diagram showing which edges we've trained on, and which ones we've tested on. For each of three repeats, we'll shuffle the sets and categories around completely, keeping only the overall structure of the edges the same.
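One plausible way to generate these edges (the names and the sampling here are illustrative; the real code is in the linked repo):

```python
# Hypothetical sketch of the set-matching data: with 6 categories and 8 sets,
# each example is an edge between two items of different categories that
# belong to the same set.
import itertools, random

n_categories, n_sets = 6, 8
# item[c][s] is a placeholder token name: the item of category c in set s
item = [[f"cat{c}_set{s}" for s in range(n_sets)] for c in range(n_categories)]

def all_edges():
    """Every ordered pair of items from different categories in the same set."""
    return [(item[c1][s], item[c2][s])
            for c1, c2 in itertools.permutations(range(n_categories), 2)
            for s in range(n_sets)]

edges = all_edges()
assert len(edges) == 6 * 5 * 8               # 240 ordered edges in total
train = random.Random(0).sample(edges, 12)   # e.g. the 12-train-edge condition
```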


On the final row, we've plotted the accuracy of a linear probe applied to a late residual stream activation of the model, and trained on the ground truth of which set we're in (specifically, we learn a d_model × 8 matrix to project the residual stream down to eight dimensions, and then take a softmax and cross-entropy loss, using a random train/test split).
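A minimal sketch of such a probe, written in plain numpy on stand-in data (the real probe runs on residual-stream activations; the dimensions, learning rate, and step count here are my assumptions):

```python
# Hypothetical sketch of the linear probe: a learned (d_model × 8) projection
# trained with softmax + cross-entropy on the ground-truth set label.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_sets, n_examples = 64, 8, 400

# Stand-in data: in the real setup these would be residual-stream activations.
X = rng.normal(size=(n_examples, d_model))
y = rng.integers(0, n_sets, size=n_examples)

W = np.zeros((d_model, n_sets))
lr = 0.1
for _ in range(200):                         # plain gradient descent
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n_examples), y] -= 1         # d(cross-entropy)/d(logits)
    W -= lr * (X.T @ p) / n_examples

accuracy = (np.argmax(X @ W, axis=1) == y).mean()
```

On real activations one would also hold out a test split of examples, as the post describes, rather than reporting training accuracy.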

Clearly, at 6 train edges and 6 test edges, we haven't learned anything. What about 12 train edges?

Here, some of them have successfully generalized. Oddly, at the point where the model begins to learn, the probe becomes less accurate. Now we'll try with more and more train edges. Here's 18:

And here's 24.

With more training edges we see proper generalization.

Now we see a weird pattern: the probe drops in accuracy just before generalization, then climbs back up. I expect the high accuracy of the probe at model initialization is due to it taking a different, independent random split of possible examples.

Toy Models

In these cases the tokens aren't semantically loaded at the start. We might as well just think of the problem as randomly initializing tokens with arbitrary names and then giving the model problems of the same matching form.

We get a slightly different pattern of behaviour in the probe when we do this: here's 12 train edges:

Now we see a slight climb in the accuracy around partial generalization. Interestingly, the pattern of which edges get generalized seems to be the same between the LLM and the toy model.

And here's 24:


Now the probe really looks like it's telling us something. It's telling us that the model is actually learning a linear representation of set membership.

So while models can't generalize from "Rob's father is John" and "John's father is Alex" to "Rob's father's father is Alex", they can generalize from "Dog matches to the colour Red" and "Red matches to the number Five" to "Dog matches to the number Five". What is up here?

Putting It All Together

Ok, so there's an intuition here which happens to map roughly onto the concept of a semi-Thue system equipped with an algebraic oracle, but it might well map onto other concepts better.

A semi-Thue system is a system which takes in a string (a series of "letters" from some "alphabet") and applies rules to transform it. There might be a number of rules which can be applied to that string at any one time, but we'll just think about one rule at a time. A "string" is a bit of a funky concept for us; we'll think of every sequence of tokens as being mapped to some string in a pretty flexible way. A token might map to one or more letters. I'm playing a bit fast and loose here to get the intuition across, because I don't trust myself fully with the maths.

Fact composition can be applied indefinitely. If your rules replace "John's Father" with "Alex" and "Alex's Sister" with "Claire", then you might expect it to generalize. You can have "John's Father's Sister's Cat's Kitten's Owner's Landlord's...". But to 'fully' generalize from being trained on those facts, a model would have to be able to do an arbitrarily large reduction of facts. And you can't skip any steps without memorizing an arbitrarily large table of composed relations. A model cannot perform an arbitrarily large number of computations in a single forward pass. It doesn't generalize.

The set matching problem from earlier might look like that. If you've seen "Dog's colour is Red, and Red's number is Five" then you might end up trying to solve "Dog's number is..." by taking multiple steps.

But if you introduce an extra set of concepts, the sets themselves, you can instead solve "Dog's colour is..." by going Dog + Colour → Set_2, then Set_2 + Colour → Red.

If you also solve "Red's number is..." by going Red + Number → Set_2, then Set_2 + Number → Five, then you can solve "Dog's number" by Dog + Number → Set_2, then Set_2 + Number → Five. This chain is O(1) in length, and requires only a small number of rules.
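The two-step chain can be written as a tiny lookup-rule system. The tokens and rules here are illustrative, chosen to match the running example:

```python
# Hypothetical sketch: the fixed-depth rule chain for the set-matching task.
# Rule (entity, attribute) → result; "Set_2" is the intermediate set concept.
rules = {
    ("Dog", "Colour"): "Set_2", ("Dog", "Number"): "Set_2",
    ("Red", "Number"): "Set_2",
    ("Set_2", "Colour"): "Red", ("Set_2", "Number"): "Five",
}

def solve(entity, attribute):
    s = rules[(entity, attribute)]        # step 1: entity + attribute → set
    return rules[(s, attribute)]          # step 2: set + attribute → answer

assert solve("Dog", "Colour") == "Red"
assert solve("Red", "Number") == "Five"
assert solve("Dog", "Number") == "Five"   # the composed fact, in two fixed steps
```

Every query, including the held-out composed one, takes exactly two rule applications, whereas chaining raw facts would need a step count that grows with the chain length.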

This is the difference.

Oracles

We've seen how learning a symbolic concept can work. What if we expand that? What if we allow ourselves to map some things to real numbers, and then do algebra on them? Adding this kind of extra functionality is usually called adding an "oracle" to our system, because the rules can ask it to return some algebraic result instantly.

The comparison task "Who is taller, Obama or Biden?" can be solved in O(1) steps by mapping each entity+attribute pair to a real number representing that attribute (in this case the two heights), and then comparing the two real numbers.

Any finite group can be represented using matrices. If we say that the "oracle" can perform any matrix multiplication up to a given size (also instantaneously), then we can solve any group or monoid operation in O(1) time. Or at least, any representable group or monoid.

The truncated infinite ones, while they could in principle be generalized, can't be generalized by a finite ruleset which always finishes in O(1) steps. If we have to reduce a long word, we might need to apply arbitrarily many cancellation rules.

The Conjecture

The conjecture I am introducing to describe this pattern of implicit generalization: if a problem can be generally solved in O(1) time with a small-ish ruleset (small relative to the dataset size, or something like that), then a transformer can learn it implicitly. If it can't, a transformer won't learn it implicitly.

A secondary conjecture I'm less confident in: if a problem takes more than O(1) time but still only a ruleset that is small in the dataset size, transformers can reason through it explicitly.


Code for the group relations and set sorting operations can be found at:

https://github.com/jonathanbostock/group-grokking

https://github.com/jonathanbostock/a-is-b-is-c


Editor's note: this post was released as part of Doublehaven (no official relation to Inkhaven)


  1. ^

    If they weren't distinct, and we had g·a = g·b for some a ≠ b,

    we would have a = g⁻¹·(g·a) = g⁻¹·(g·b) = b, a contradiction.
  2. ^

    Which is probably related to the n³ multiplications found in the standard matrix algorithm (AB)ᵢₖ = Σⱼ Aᵢⱼ Bⱼₖ, which might be implemented term-by-term.

    Where Eᵢⱼ is the matrix with a 1 at position (i, j) and 0 everywhere else, the product can be built up out of these basis matrices, though I will wildly guess that the actual experiment run looks more like some random distribution:

    tuples of random unit vectors, with individual neurons drawn from a distribution whose expectation reproduces each product term. If we sum over unit vectors aligned with the axes, this gives us the original value.



Discuss

Who Killed Common Law?

LessWrong.com News - April 13, 2026 - 21:38

The classical undergraduate humanities curriculum in America was destroyed and replaced over the course of the twentieth century. The destruction is usually blamed on postmodernism in the 1970s, but the replacement was already well under way by then. Neither the attackers nor the defenders of the old curriculum can (or will) explain what happened and why.

Allan Bloom's essay "Our Listless Universities" (the 1982 essay later expanded into The Closing of the American Mind) is the most famous attempt at a defense, and it reads at first as though it has no argument at all: it asserts the superiority of the Western Tradition and the badness of rock music without much visible reasoning. But if I relax my eyes and let the nearby details blur, a latent argument floats into focus. Bloom identifies a certain kind of value relativism, imported from German philosophy (he mentions Nietzsche, Weber, and Heidegger), as the solvent that removed the American university's commitment to truth.

Bloom's essay is trying to say "America is over, let's think through rationally what to do next" in a way that makes mainstream conservatives feel like he's on their side and should be funded to stick it to the libs. It produces the sensation of allegiance to the Western Tradition, not the grounds for it. Naturally this involves writing some connotative checks it can't denotatively cash.

But why couldn't the old curriculum be defended on its merits? The roots go deeper than Bloom or his critics acknowledge.

In 1066, soldiers loyal to a man named William conquered the island kingdom of England. He now had to govern a country whose customs he didn't know. His solution, refined over the next century by his successors, was to send royal judges around the country to settle disputes. These judges had no code. They had to figure out what the local rules were, case by case. The rules varied from shire to shire and sometimes contradicted each other, so the judges reconciled them, and over generations their decisions accumulated into a body of law common to the whole kingdom. By the early 1600s, this system was old enough and robust enough that Chief Justice Edward Coke could tell King James I to his face that it was superior to the king's own judgment. [1] What had begun as a practical expedient for governing a conquered country had, over seven centuries, staked its authority on a claim: that the norms people actually live by, when reasoned through carefully and reconciled, converge. [2]

A tradition seven centuries deep does not die of exposure to Nietzsche unless something has already compromised its foundations. Antinomianism is the rejection of binding law or standards as such: the position that rules are external impositions to be evaded, abolished, or transcended rather than discovered principles to be understood and followed. [3] The American Puritans were explicitly worried about it, just as Luther had been in Germany. The first crisis of the Massachusetts Bay Colony, called the Antinomian Controversy, ended in 1638 with the trial and banishment of Anne Hutchinson for claiming that grace freed the saved from moral law.

In America, the credibility of the Common Law suffered a decisive blow around the time of the Civil War, when it failed to address the issue of slavery through legal mechanisms, and Americans resorted to war to settle their dispute. This was a delegitimating crisis, but it took a generation for America's governance institutions to be captured by a new antinomian ideology annealed by the war. Oliver Wendell Holmes Jr., whose major work begins with The Common Law in 1881, argued that law is prediction of what courts will do, not discovery of pre-existing principles. His Legal Realist heirs in the 1920s and 1930s completed the displacement of Common Law reasoning. [4] The result was Pragmatism: [5] a framework that retained the forms of lawful governance while abandoning the principle that law is discovered rather than made anew each time a court sits. [6]

Liberal systems had clearly delivered the goods to many people, including most of those the state depended on for high-skilled work, so the state still needed the legitimacy that liberal humanism provided. Progressivism, a species of Pragmatism serving a rising state with legacy commitments to socially liberal preferences, was naturally happy to mimic liberal humanism as long as there was demand. The forms persisted long after the substance was gone, and the persistence of the forms is precisely what makes people feel the substance must still be there somewhere.

Pragmatism is constitutively incapable of defending anything on principle, because it has replaced the concept of principle with the concept of what works. A society that runs on Pragmatism will hand over anything it is not currently using to anyone who asks with sufficient force, because it has no grounds for refusal that it can articulate even to itself.

When I first learned about the Kent State shootings, it was from my father, who described them as students protesting somewhat disruptively in favor of more electives and fewer required courses. More than a decade later, I learned the mainstream story: that on May 4, 1970, Ohio National Guard soldiers shot and killed four students during a protest against President Richard Nixon's expansion of the war into Cambodia. But now I wonder whether my father was on to something, and misremembered insightfully. Leftists challenging the legitimacy of the war machine were not able to win the concession of stopping the war, but the basically Pragmatist authorities were relatively willing to alter curricula and abandon a liberal humanism they never really cared about.

Conservative critics like Bloom play up the idea of esotericism and the inherent seditiousness of social criticism, creating the impression that what you see probably isn't all there is: if the social analysis is the smoke, maybe a plan to improve things is the unseen fire. But when I showed up at the Committee on Social Thought and carefully, delicately asked what was going on, they were just academics who write papers. [7] I had to escalate the directness of the question a few times before I got a clear answer; I'm not the sort of idiot who wouldn't at least try to flirt first in such circumstances. And while it's technically possible that I failed the initiation into a cult with genuine mysteries, the evidence seems more consistent with the hypothesis that there's no plan to do anything except keep reading and writing, and occupying comfortable positions among the elite in a crumbling society.

Sometimes you have something true and dangerous to say. Esoteric writing is the classical solution: you hide the truth in the text itself, so that careful readers can find it while careless or hostile ones see only the surface. Maimonides is Strauss's central example. In Guide for the Perplexed, the "esoteric" heretical meaning is the one you get if you ignore what the words are trying to make you feel and follow the arguments literally. [8] Spinoza says basically the same stuff centuries later, just without the mood lighting, and everyone totally loses their shit. When people despair of being heard, sometimes they just keep their private views private, and say what people want to hear. This is exoteric writing. The ideas of exoteric and esoteric writing are often confused, but they are not the same thing. The esoteric writer entrusts the truth to the text; the exoteric writer withholds it. And sometimes, the impression of hidden depth is nothing more than an artist's trick.

If we're to deal with these problems, we have to think through where we are, how we got there, and where we'd like to be.

This essay developed from a Twitter thread with David Chapman.

  1. James objected: if the law is founded on reason, why can't I, who have reason, judge cases myself? Coke replied that the law required not natural reason but "artificial reason": the accumulated wisdom of centuries of careful adjudication, which no single mind could replicate. James nearly struck him. Coke, on his knees, quoted the thirteenth-century jurist Bracton: the king is under no man, but he is under God and the law. Blackstone systematized the tradition in his Commentaries on the Laws of England (1765-1769), which became the foundational legal text of the American colonies. ↩︎

  2. This is structurally the same claim as Eliezer Yudkowsky's Coherent Extrapolated Volition: that human values, under sufficiently careful reflection and mutual understanding, converge rather than diverge. The Common Law tradition can be understood as a seven-century empirical test of this hypothesis. ↩︎

  3. Antinomianism as a recurring pattern in Western Christianity, and the Calvinist response to it, is developed at length in "Calvinism as a Theory of Recovered High-Trust Agency". For more on how anti-normativity functions as a self-undermining commitment, see Jessica Taylor, "On Commitments to Anti-Normativity". ↩︎

  4. Contemporary originalism, as practiced by the Federalist Society and adjacent movements, contests this displacement by attempting to restore Common Law principles through constitutional interpretation. But conservatism preserves what still exists; restoring what has already been lost is reaction, not conservation. Originalism in practice selects among founding-era precedents according to present political need, which is Pragmatism in historical dress. The alternative would be to develop a theory for how to rebuild the conditions under which the lost principles could be rediscovered (see the approach sketched in "Calvinism as a Theory of Recovered High-Trust Agency", cited above). This would be a revolutionary approach in the older use of the term, before the French Revolution changed its meaning to refer instead to a violent break with the past. ↩︎

  5. I capitalize Pragmatism when referring to the specific philosophical and legal movement. The term requires some disambiguation. C. S. Peirce coined "pragmatism" in the 1870s to denote a logical method for clarifying the meaning of concepts by tracing their conceivable practical consequences (see his "How to Make Our Ideas Clear," 1878). William James popularized the term but transformed it into something different in kind. In James's own words: "'It is useful because it is true' or 'it is true because it is useful.' Both these phrases mean exactly the same thing" (Pragmatism, 1907). More baldly: "The true is only the expedient in the way of our thinking, just as the right is only the expedient in the way of our behaving." And: "Our obligation to seek truth is part of our general obligation to do what pays." Peirce called this a "transmogrification" of his idea and renamed his own position "pragmaticism," a word he said was "ugly enough to be safe from kidnappers" ("What Pragmatism Is," The Monist, 1905). The Pragmatism discussed in this essay, the one that captured American legal and governance institutions via Holmes and the Legal Realists, descends from James, not Peirce. Peirce's pragmaticism, a method of logical clarification committed to the reality of generals and the immutability of truth, has little in common with the antinomian instrumentalism that Holmes and his heirs made into American legal orthodoxy. ↩︎

  6. The institutional death of civil law is ongoing and measurable. Tort filings in state courts (individuals seeking redress for wrongs done to them) declined more than 80% from 1993 to 2015, from about 10 per 1,000 Americans to fewer than 2 per 1,000 (the WSJ's analysis of National Center for State Courts data, reported in Joe Palazzolo, "We Won't See You in Court: The Era of Tort Lawsuits Is Waning," Wall Street Journal, July 24, 2017). Over the same period, contract cases (predominantly debt collection, foreclosure, and landlord-tenant disputes) rose from 18% to 51% of the civil docket. The courts are becoming a collections agency. Common law as a mechanism by which ordinary people hold others accountable for wrongs is disappearing. ↩︎

  7. In fairness, I didn't speak with Agnes Callard. ↩︎

  8. Tyler Cowen's term "mood affiliation" is useful here: the practice of evaluating claims based on the emotional associations they produce rather than on their logical content. ↩︎




On Transport Incentive Design

LessWrong.com News - April 13, 2026 - 20:51

Here in Helsinki, the public transport doesn't have access gates. Bus drivers check your ticket when you step in, but on trains, trams, and the subway, you just step in [1]. The enforcement is done by inspectors who randomly board vehicles and check tickets. If you do not have one, you'll be charged a 100€ inspection fee, about 1.5 times the price of a monthly ticket.

The frequency at which I see inspectors suggests that it's slightly cheaper to never pay for a ticket, especially if you avoid them by e.g. leaving the train before they check your ticket [2]. Except, of course, that dealing with the inspectors and paying the fine is extra work and negative feelings, and for me that flips the equation the other way around.

Not everyone minds that so much. In particular, if you don't have any money, it can't be taken from you. There's also a more interesting dynamic here: some people have formed an insurance system where they have a group chat that pays the fines collectively whenever anyone gets one [3]. This is supposedly much cheaper than paying for the tickets.

This introduces a moral hazard [4] : since the cost of getting caught is largely externalized, one doesn't need to avoid getting caught as much. Of course, it's still some effort for you, and I'd assume anyone getting caught way too often will get kicked out of such groups.

I considered getting into one of these groups for journalistic purposes, but then decided it's way too much work anyway. One likely needs to know someone already in them to get in, and I wasn't interested in burning the social capital to source an invite. So, the next section will be based on educated guessing (read: pure speculation).

I'd also think it would be possible to scam such groups rather easily. While the payment details of the transport authority are easily verifiable, it's unlikely that they would pay every single fine by sending fifty transactions of 2€ each. Were I building this, there would be some kind of accounting system. Since I'm not, I assume they transfer money to the person getting the fine through MobilePay [5] , and then that person pays the fine. If there are trust issues, they could require a receipt of the payment, too, but that won't help much as you can easily fake screenshots.

Of course, the natural, rather funny, and sadly illegal solution to this would be that the transportation agency itself would infiltrate these groups and flood them with just enough fake fines to make it infeasible to run them.

There's a neater system that these scammers haven't figured out yet [6] . Instead of paying the fines, you could have a pool of accounts with monthly tickets. I'd assume one ticket per ten people would easily do. Then you'd pick one ticket from that pool every time an inspector needs to see one. I assume that there are no data analysts working to catch this kind of thing, and if there are, you could increase your pool size and do timing and distance analysis to avoid it. A similar system could be used for almost any other subscription thing like streaming services [7] .
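For a rough sense of why both schemes beat honest ticket-buying on paper, here's a sketch of the per-person monthly cost. The ticket price is derived from the fine as above; the one-ticket-per-ten-people pool ratio is the author's own guess:

```python
# Per-person monthly cost: individual tickets vs. a shared ticket pool.
# The monthly ticket price is derived from the 100€ fine being ~1.5x a
# monthly ticket; the pool ratio is the author's guess, not a known figure.
MONTHLY_TICKET = 100 / 1.5  # ~66.7€
POOL_SIZE = 10              # people sharing one monthly ticket

honest = MONTHLY_TICKET               # everyone buys their own
pooled = MONTHLY_TICKET / POOL_SIZE   # shared pool of tickets

print(round(honest, 1), round(pooled, 1))  # 66.7 6.7
```

On these assumed numbers the pool is roughly a tenfold saving, which is why the author expects it would outcompete the fine-insurance groups.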

Another interesting case of avoiding the ticket fare is using a fake ticket app. These show a ticket that looks like a valid ticket on your phone. You can show this to the bus driver to get in. This will not work with the inspectors, who check the QR code on the ticket. Showing a fake ticket is fraud, which is a rather serious crime and not just a 100€ fine. My understanding is that they prosecute these quite aggressively. One thing to note is that children under 15 years of age do not have criminal liability, and this can be (and is) abused.

A ticket costs a fixed amount of money, regardless of how many stops you ride. You basically either pay for 80 minutes or a month. There's no ticket for a five-minute ride. This leaves a lot of value on the table. Anybody needing a lot of five-minute rides pays for a monthly ticket. Anybody who needs one twice a week walks or pays a huge premium. This is naturally a conscious decision: the main reasons are problems with enforcement, not wanting more complexity, and most importantly subsidising and incentivising regular users.

A similar thing happens with car parking. In my apartment building, there are a couple of parking spots reserved for visitors and such. They're always full. Then there's a parking lot which is quite expensive: renting a spot would cost perhaps 500-1000€ per year. I'd use a parking spot perhaps twenty days per year [8] . It would be really convenient, then, to have paid parking spots priced such that some were almost always unoccupied. They should cost so much that everybody who has a car all the time would rather pay for the parking lot. So if a parking lot spot is 1000€ per year, a paid spot must be at least 2.74€ per day so that it doesn't undercut the parking lot. Realistically it should probably be around 10€ per day. Short term rental of parking spots in the lot would also help with this.
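The daily-rate floor in the paragraph above is just the yearly lot price spread across the year. A quick sketch with the numbers from the text (the 20-days-a-year figure is the author's own estimate):

```python
# Minimum daily price for a paid visitor spot so it doesn't undercut
# renting a spot in the lot year-round.
LOT_PRICE_PER_YEAR = 1000.0  # upper end of the quoted 500-1000€ range
DAYS_PER_YEAR = 365

floor_per_day = LOT_PRICE_PER_YEAR / DAYS_PER_YEAR
print(round(floor_per_day, 2))  # 2.74

# At the suggested ~10€/day, an occasional user (20 days a year)
# still pays far less than a year-round lot spot:
occasional_cost = 20 * 10.0
print(occasional_cost)  # 200.0
```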

So-called rideshare apps are super cheap sometimes, but the price is unpredictable. Even worse, the waiting time is unpredictable. And sometimes, I presume, the price is so low that drivers refuse to pick you up. I'd gladly pay more so that this doesn't happen, but the apps do not have this option. And if they did, I wouldn't trust it, as the incentives would look weird.

Once I ordered a regular old taxi to the airport at 5AM. The taxi driver told me that they had just been in the area fifteen minutes ago to drop someone off, and now they had to do a bit of useless back-and-forth driving. Why hadn't I preordered the taxi in the evening? Well, preordering costs 10€, and I've never had any trouble getting a ride. Why would I pay to make their job easier? Sadly, I didn't have the words to tell the taxi driver that.

This year after LWCW, I was staying in Berlin a bit longer. When I was going to our AirBnB with a friend, they questioned why I had bought a ticket. In their experience, inspections are quite rare, and if you don't have a ticket, most of the time they just tell you to buy one instead of fining you. So the punishment for not buying a ticket is having to buy one? Why would anybody buy a ticket, then?

Previously, I was of the opinion that one is supposed to exploit any and all weaknesses of systems, so that the bad guys aren't the only ones profiting. Nowadays I mostly do so only if the system leaves me feeling like a sucker for complying. Otherwise, it's just feeding the Moloch. The optimal amount of fraud is non-zero.

  1. Some high-volume bus routes also don't check tickets when you get in. ↩︎

  2. This wildly varies between routes and travel hours. I also don't keep any real statistics on this and perhaps I'm just mistaken. ↩︎

  3. Source, in Finnish: https://yle.fi/a/74-20036911 ↩︎

  4. "Moral hazard in insurance is when the existence of insurance makes it incentive-compatible for you to be imprudent in your own risk taking, expecting someone else to bear the consequences." -BitsAboutMoney: Banking in very uncertain times ↩︎

  5. Local CashApp equivalent. ↩︎

  6. I'm not too worried that publishing such an idea will lead to anyone exploiting it. People capable of that have much more profitable engagements available to them. ↩︎

  7. When combined with a VPN. But that's more work than regular old piracy so nobody bothers with this. ↩︎

  8. With a loaned or a rental car, or for a professional cleaning service to park. ↩︎



Discuss

Annoyingly Principled People, and what befalls them

LessWrong.com News - April 13, 2026 - 20:35

Here are two beliefs that are sort of haunting me right now:

  1. Folk who try to push people to uphold principles (whether established ones or novel ones) are kinda an important bedrock of civilization.
  2. Also, those people are really annoying and often, like, a little bit crazy.

And these both feel fairly important.

I’ve learned a lot from people who have some kind of hobbyhorse about how society is treating something as okay/fine, when it’s not okay/fine. When they first started complaining about it, I’d be like “why is X such a big deal to you?”. Then a few years later I’ve thought about it more and I’m like “okay, yep, yes X is a big deal”.

Some examples of X, including noticing that…

  • people are casually saying they will do stuff, and then not doing it.
  • someone makes a joke about doing something that’s kinda immoral, and everyone laughs, and no one seems to quite be registering “but that was kinda immoral.”
  • people in a social group are systematically not saying certain things (say, for political reasons), and this is creating weird blind spots for newcomers to the community and maybe old-timers too.
  • someone (or a group) has a pattern of being very slightly dickish in some way, where any given instance is not that bad, so if you call them out for that instance, it feels out of proportion. But, they’re doing it a lot, which is adding up to a substantial cost they’re inflicting.

Society depends on having norms. Someone gotta uphold the norms. Someone gotta figure out where society is currently wrong and push for better norms.

But, it’s super uncomfortable to tell a bunch of comfortable people “hey, the behaviors you are currently doing are actually kinda bad, it’d be way better if you did this other thing.”

So, most people don’t.

The people that do are people who are selected for a mix of “conflict-prone-ness” and “really really caring about the hill that they are dying on, to an excessive degree.”

There’s a first order problem, where they are kinda more aggro than I/most-people think is worth putting up with about their pet issue. (Even if I’ve updated that “actually, that issue was quite important, I should internalize that principle”).

But there’s a second order problem that I’ve seen in at least a few cases, that goes something like:

Alice decides Principle X is important enough to make a big deal about.

People don’t seem to understand the issue. Alice explains it more. Some people maybe get it but then next week they seem to have forgotten. Other people still don’t get it.

A problem I’ve previously talked about is Norm Innovation and Theory of Mind where Alice is overestimating how easy it is to explain a new norm to someone, and kinda assuming logical omniscience of the people she’s talking to.

But, there’s another thing, which is: people… keep mysteriously not understanding why X is a big deal. Any given instance of it is maybe explained by “actually the reason for X was a fairly complicated idea, and maybe some people legitimately disagree.” But, something feels epistemically slippery. It feels like Bob and Charlie and everyone else keep… systematically missing the point, sliding off it.

One explanation is: it would be really inconvenient for Bob and Charlie and everyone to accept that X is important enough to change their behavior around. And Bob and Charlie etc end up sort of implicitly coordinating to downplay X, sometimes while paying lip service to it, or finding excuses not to care. A subtle social war is waged.

And Alice eventually begins to (correctly) pick up on the fact that people aren’t merely not getting it. They’re sort of systematically choosing to believe or say false things or bad arguments, to avoid having to get it.

This gives Alice the (sometimes) correct sense that (many) people are gaslighting her – not merely disagreeing, but disagreeing in a way that sure looks like people are implicitly colluding to distort their shared map of reality in a way that lets them ignore Alice’s arguments about X, which conveniently lets them not have to adopt weird new beliefs or risk upsetting their other friends. Making Alice feel like she’s the one losing her grip on reality.

Each of these people contains two wolves (read: multiple motivations) driving them. When I’ve been Bob, it’s often been the case that I both am executing some kind of good faith investigation into whether X is true and also, part of me was motivated to do something that let me feel important / in control or whatever.

Society has a bunch of people in it. Some are more well-meaning than others. Some of the well-meaning people are more implicitly colluding than others. Some of them are actively colluding. Sometimes Alice accuses someone of acting in bad faith and it really is a false positive and then they get mad at Alice. And, sometimes the person is acting in bad faith, maybe even deliberately, and they get mad at Alice too, using the same arguments as the well-meaning person.

Alice ends up in a world where it looks like people are systematically trying to undermine her, and she starts engaging with the world more hostile-y, and then the world starts engaging more hostile-y back.

This… can end with Alice being kinda paranoid and/or traumatized and/or trying to argue her point more intensely. Sometimes this sort of radicalizes Alice.

This ends up in a feedback loop where… idk, I think “Alice has become a little crazy” is not that unreasonable a description about it.

But, Alice was right (at least about the broad points in the beginning).

Alices are not fun to be around, and sometimes they end up conflict-prone and absolutist in a way that I think is actually kinda bad, and I end up avoiding them because it’s not worth the cost of dealing with them.

But, Alices are also rare and precious – they are the ones who noticed something was wrong and worth calling out, and, who were willing to actually push past social awkwardness about it.

(But, but, also, the world contains Alexes, who are not right about their pet issue. They just have a pet issue that doesn’t really make much sense, and they also go kinda crazy in the same way, but they didn’t actually have a good point that was worth listening to in the beginning. idk, watch out)

This essay does not end with me particularly knowing what to do. But, at the very least, I think it’s appropriate to at least be sympathetic to Alices, when you’re pretty sure their core ideas were at least directionally right.

Maybe, the move I wish people had was:

First, cultivate the skill of noticing when you’re (at least partially) politically motivated to believe or disbelieve something. Notice when you are being epistemically slippery. Especially if it seems to come alongside someone complaining about something you don’t really understand.

Then, when you notice in your heart that you’re not going to apply Principle X because it would be really annoying and inconvenient, just say “Yep, I am just not applying Principle X because it’s inconvenient or too costly or not worth the tradeoff”, instead of making up reasons that Principle X is wrong.

(This does require Alice to actually accept that graciously. It’s a bit awkward figuring out what the norms should be, because, well, Alice in fact does think Principle X is worth fighting for and Bob saying “cool, but no I’m not gonna do that” doesn’t really resolve that conflict. But, at least within that conversation, probably Alice should accept it from Bob and move on, at least if she values not getting subtly gaslit by Bob)

I’m not sure if this would actually help, but, it feels like a marginal improvement over the status quo.



Discuss

AI for epistemics: the good, the bad and the ugly

LessWrong.com News - April 13, 2026 - 20:16
Intro

For better or worse, AI could reshape the way that people work out what to believe and what to do. What are the prospects here?

In this piece, we’re going to map out the trajectory space as we see it. First, we’ll lay out three sets of dynamics that could shape how AI impacts epistemics (how we make sense of the world and figure out what’s true):

  • The good: there’s huge potential for AI to uplift our ability to track what’s true and make good decisions
  • The bad: AI could also make the world harder for us to understand, without anyone intending for that to happen
  • The ugly: malicious actors could use AI to actively disrupt epistemics

Then we’ll argue that feedback loops could easily push towards much better or worse epistemics than we’ve seen historically, making near-term work on AI for epistemics unusually important.

The stakes here are potentially very high. As AI advances, we’ll be faced with a whole raft of civilisational-level decisions to make. How well we’re able to understand and reason about what’s happening could make the difference between a future that we’ve chosen soberly and wisely, and a catastrophe we stumble into unawares.

The good

“If I have seen further, it is by standing on the shoulders of giants.” (Isaac Newton)

There are lots of ways that AI could help improve epistemics. Many kinds of AI tools could directly improve our ability to think and reason. We’ve written more about these in our design sketches, but here are some illustrations:

  • Tools for collective epistemics could make it easy to know what’s trustworthy and reward honesty, making it harder for actors to hide risky actions or concentrate power by manipulating others’ views.
    • Imagine that when you go online, “community notes for everything” flag content that other users have found misleading, and “rhetoric highlighting” automatically flags persuasive but potentially misleading language. With a few clicks, you can see the epistemic track record of any actor, or access the full provenance of a given claim. Anyone who wants can compare state-of-the-art AI systems using epistemic virtue evals, which also exert pressure at the AI development stage.
  • Tools for strategic awareness could deepen people’s understanding of what’s actually going on around them, making it easier to make good decisions, keep up with the pace of progress, and steer away from failure modes like gradual disempowerment.
    • Imagine that superforecaster-level forecasting and scenario planning are available on tap, and automated OSINT gives people access to much higher quality information about the state of the world.
  • Technological analogues to angels-on-the-shoulder, like personalised learning systems and reflection tools, could make decision-makers better informed, more situationally aware, and more in touch with their own values.
    • Imagine that everyone has access to high-quality personalised learning, automated deep briefings for high-stakes decisions, and reflection tools to help them understand themselves better. In the background, aligned recommender systems promote long-term user endorsement, and some users enable a guardian coach system which flags any actions the person might regret taking in real time.

Structurally, AI progress might also enable better reasoning and understanding, for example by automating labour such that people have more time and attention, or by making people wealthier and healthier.

These changes might enable us to approach something like epistemic flourishing, where it’s easier to find out what’s true than it is to lie, and the world in most people’s heads is pretty similar to the world as it actually is. This could radically improve our prospects of safely navigating the transition to advanced AI, by:

  • Helping us to keep pace with the increasing speed and complexity of the situation, so we’re able to make informed and timely decisions.
  • Ensuring that key decision-makers don’t make catastrophic unforced errors through lack of information or understanding.
  • Making it harder for malicious actors to manipulate the information environment in their favour to increase their own influence.

A Philosopher Lecturing on the Orrery, by Joseph Wright of Derby (1766)

What’s driving these potential improvements?

  • AI will be able to think much more cheaply and quickly than humans. Partly this will mean that we can reach many more insights with much less effort. Partly this will make it possible to understand things that are currently infeasible for us to understand (because it would take too many humans too long to figure it out).
  • AI can ‘know’ much more than any human. Right now, a lot of information is siloed in specific expert communities, and it’s slow to filter out to other places even when it would be very useful there. AI will be able to port and apply knowledge much more quickly to the relevant places.
The bad

“A wealth of information creates a poverty of attention.” (Herbert Simon)

AI could also make epistemics worse without anyone intending it, by making the world more confusing and degrading our information and processing.

There are a few different ways that AI could unintentionally weaken our epistemics:

  • The world gets faster and more complex. As AI progresses, our information-processing capabilities are going to go up — but so is the complexity of the world. Technological progress could become dramatically faster than today, making the world more disorienting and harder to understand. If tech progress reaches fast enough speeds, it’s possible that we won’t be able to keep up, and even the best AI tools available won’t help us to see through the fog.
  • The quality of the information we’re interacting with gets worse, because of:
    • Faster memetic evolution. As more and more content is generated by and mediated through AI systems working at machine speeds, the pace of memetic and cultural change will probably get a lot faster than it is today. As the pace quickens, memes which are attention-grabbing could increasingly outcompete those which are truthful.
    • More difficult verification. This could happen through a combination of:
      • AI slop. In hard-to-verify domains, AI could massively increase the quantity of plausible-looking but wrong information, without also being able to help us to verify which bits are right.
      • AI-generated ‘evidence’. As the quality of AI-generated video, audio, images, and text continues to improve, it may become pretty difficult to tell which bits of evidence are real and which are spurious.
  • We get worse at processing the information we get, because:
    • Our emotions get in the way. AI progress could be very disorienting, generate serious crises, and cause people a lot of worry and fear. This could get in the way of clear thinking.
    • Using AI to help us with information processing degrades our thinking, via:
      • Adoption of low-quality AI tools for epistemics: In many areas of epistemics, it’s hard to say what counts as ‘good’. This makes epistemic tools harder to assess, and could lead to people trusting these tools either too much or too little. Inappropriately high levels of trust in epistemic tools could take various forms, including:
        • First mover advantages for early but imperfect systems, which are then hard to replace with better systems because people trust the earlier systems more.
        • The use of epistemically misaligned systems, which aren’t actually truth-tracking but it’s not possible for us to discern that.
      • Fragmentation of the information environment: AI will make it easier to create content (potentially interactive content) that pulls people in and monopolises their attention. This could reduce attention available for important truth-tracking mechanisms, and make it harder to coordinate groups of people to important actions. In the extreme, some people might end up in effectively closed information bubbles, where all of their information is heavily filtered through the AI systems they interact with directly. The more fragmented the information environment becomes, the harder it could get for people to make sense of what’s happening in the world around them, and to engage with other people and other information bubbles.
      • Epistemic dependence: if people increasingly outsource their thinking to AI systems, they may lose the ability to think critically for themselves.
Allegory of Error, Stefano Bianchetti (1801)

The ugly

“The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction (i.e., the reality of experience) and the distinction between true and false (i.e., the standards of thought) no longer exist.” (Hannah Arendt, The Origins of Totalitarianism)

We’ve just talked about ways that AI could make epistemics worse without anyone intending that. But we might also see actors using AI to actively interfere with societal epistemics. (In reality these things are a spectrum, and the dynamics we discussed in the preceding section could also be actively exploited.)

What might this look like?

  • Automated propaganda and persuasion: AI could be used to generate high-quality persuasive content at scale. This could take the form of highly tailored, well-written propaganda. If this content were then used as training data for next generation models, biases could get even more entrenched. Additionally, AI persuasion could come in the form of models which are subtly biased in a particular direction. Particularly if many users are spending large amounts of time talking to AI (e.g. AI companions), the persuasive effects could be much larger than is scalable today via human-to-human persuasion.
  • Using AI to undermine sense-making: AI could be used to generate high-quality content which casts doubt on institutions, individuals, and tools that would help people understand what’s going on, or to directly sabotage such tools. More indirectly, actors could also use AI to generate content which adds to complexity, for example by wrapping important information in complex abstractions and technicalities, and generating large quantities of very readable reports and news stories which distract attention.
  • Surveillance: AI surveillance could monitor people’s communications in much more fine-grained ways, and punish them when they appear to be thinking along undesirable lines. This could be abused by states, or could become a tool that private actors can wield against their enemies. In either case, the chilling effect on people’s thinking and behaviour could be significant.
The Card Sharp with the Ace of Diamonds, by Georges de La Tour (~1636-1638)

But maybe this is all a bit paranoid. Why expect this to happen?

There’s a long history of powerful actors trying to distort epistemics,[1] so we should expect that some people will be trying to do this. And AI will probably give them better opportunities to manipulate other people’s epistemics than have existed historically:

  • It’s likely that access to the best AI systems and compute will be unequal, which favours abuse.
  • If people end up primarily interfacing with the world via AI systems, this will create a big lever for epistemic influence that doesn’t exist currently. It could be much easier to influence the behaviour of lots of AI systems at once than lots of people or organisations.

It’s also worth noting that many of these abuses of epistemic tech don’t require people to have some Machiavellian scheme to disrupt epistemics or seek power for themselves (though these might arise later). Motivated reasoning could get you a long way:

  • Legitimate communications and advertising blur into propaganda, and microtargeting is already a common strategy.
  • It’s easy to imagine that in training an AI system, a company might want to use something like its own profits as a training signal, without explicitly recognising the potential epistemic effects of this in terms of bias.
So what should we expect to happen?

With all these dynamics pulling in different directions, should we expect that it’s going to get easier or harder for people to make sense of the world?

We think it could go either way, and that how this plays out is extremely consequential.

The main reason we think this is that the dynamics above are self-reinforcing, so the direction we set off in initially could have large compounding effects. In general, the better your reasoning tools and information, the easier it is for you to recognise what is good for your own reasoning, and therefore to improve your reasoning tools and information. The worse they are, the harder it is to improve them (particularly if malicious actors are actively trying to prevent that).

We already see this empirically. The Scientific Revolution and the Enlightenment can be seen as examples of good epistemics reinforcing themselves. Distorted epistemic environments often also have self-perpetuating properties. Cults often require members to move into communal housing and cut contact with family and friends who question the group. Scientology frames psychiatry’s rejection of its claims as evidence of a conspiracy against it.

And on top of historical patterns, there are AI-specific feedback loops that reinforce initial epistemic conditions:

  • Unlike previous information tech, AI has a tight feedback loop between the content generated and the data used for training future models. So if models generate accurate (or inaccurate) content, future models are more likely to do the same.
  • How early AI systems behave epistemically will shape user expectations and what kinds of future AI behaviour there’s a market for.

There are self-correcting dynamics too, so these self-reinforcing loops won’t go on forever. But we think it’s decently likely that epistemics get much better or much worse than they’ve been historically:

  • One self-correcting mechanism historically has just been that it takes (human) effort to sustain or degrade epistemics. Continuing to improve epistemics requires paying attention to ways that epistemics could be eroded, and this isn’t incentivised in an environment that’s currently working well. Continuing to degrade epistemics requires willing accomplices — but the more an actor distorts things, the more that can galvanise opposition, and the fewer people may be willing to assist. By augmenting or replacing human labour with automated labour, AI could make it much cheaper to keep pushing in the same direction.
  • Another self-correcting mechanism is just that people and institutions adapt to new epistemic tech: as epistemics improve, deception becomes more sophisticated; and if epistemics worsen, people lose trust and create new mechanisms for assessing truth. But this adaptation happens at human speed, and AI will increasingly be changing the epistemic environment at a much faster pace. This creates the potential for self-reinforcing dynamics to drive to much more extreme places before adaptation has time to kick in.[2]
  • There’s a limit to how good epistemics can get before hitting fundamental problems like complexity and irreducible uncertainty. But there seems to be a lot of room for improvement from where we’re currently standing (especially as good AI tools could help to handle greater amounts of complexity), and it would be a priori very surprising if we’d already reached the ceiling.
  • There’s also a limit to how bad epistemics can get: people aren’t infinitely suggestible, and often there are external sources of truth that limit how distorted beliefs can get (ground truth, or what gets said in other countries or communities). But as we discussed above, access to ground truth and to other epistemic communities might get harder because of AI, so the floor here may lower.

Given the real chance that we end up stuck in an extremely positive or negative epistemic equilibrium, our initial trajectory seems very important. The kinds of AI tools we build, the order we build them in, and who adopts them when could make the difference between a world of epistemic flourishing and a world where everyone’s understanding is importantly distorted. To give a sense of the difference this makes, here’s a sketch of each world (among myriad possible sketches):

  • In the first world, we basically understand what’s going on around us. It’s not like we can now forecast the future with perfect accuracy or anything — there’s still irreducible uncertainty, and some people have better epistemics tools than others. But it’s gotten much cheaper to access and verify information. Public discourse is serious and well-calibrated, because epistemic infrastructure has made it quite hard to deceive or manipulate people — which in turn incentivises honesty. AI-assisted research and synthesis mean that knowledge which used to be siloed in specialist communities is now accessible and usable by anyone who needs it. And governments are able to make much more nuanced decisions far faster than they are today.
  • In the second, it’s no longer really possible to figure out what’s going on. There’s an awful lot of persuasive but low-quality AI content around, some of it generated with malicious intent. In response to this, people withdraw into their own AI-mediated epistemic bubbles — and unlike today’s filter bubbles, these can be comprehensive enough that people rarely encounter friction with outside perspectives at all. Meanwhile, companies and nations with a lot of compute find it pretty easy to distract the public’s attention from anything that would be inconvenient, and to outmaneuver the many actors who are trying to hold them to account. But their own reasoning also gets degraded by all this information pollution, as their AI systems are trained on the same corrupted public information.[3] Even the people who think they’re shaping the narrative are increasingly unable to see clearly.

The world we end up in is the world from which we have to navigate the intelligence explosion, making decisions like how to manage misaligned AI systems, whether to grant AI systems rights, and how to divide up the resources of the cosmos. How AI impacts our epistemics between now and then could be one of the biggest levers we have on navigating this well.

Things we didn’t coverWhose epistemics?

We mostly talked about AI impacts on epistemics in general terms. But AI could impact different groups’ epistemics differently — and different groups’ epistemics could matter more or less for getting to good outcomes. It would be cool to see further work which distinguishes between scenarios where good outcomes require:

  • Interventions that raise the epistemic floor by improving everyone’s epistemics.
  • Interventions that raise the ceiling by improving the epistemics of the clearest thinkers.

‘Weird’ dynamics

We focused on how AI could impact human epistemics, in a world where human reasoning still matters. But eventually, we expect more and more of what matters for the outcomes we get will come down to the epistemics of AI systems themselves.

The dynamics which affect these AI-internal epistemics could therefore be enormously important. But they could look quite different from the human-epistemics dynamics that have been our focus here, and we didn’t think it made sense to expand the remit of the piece to cover these.

Thanks to everyone who gave comments on drafts, and to Oly Sourbutt and Lizka Vaintrob for a workshop which crystallised some of the ideas.

This article was created by Forethought. Read the original on our website.

  1. ^

    Think of things like:

    • Propaganda states like Nazi Germany and the USSR.
    • Corporate lobbying like the tobacco and sugar lobbies and climate science doubt campaigns.
    • CIA operations to spread doubt and confusion.
  2. ^

    Though it’s possible that this dynamic will be more pronounced for epistemics getting extremely bad than for them getting extremely good. Consider these two very simplistic sketches:

    1. People start living in increasingly closed AI filter bubbles. Institutions are slow to adopt similar bubbles at a corporate level, but they also don’t have a mandate to change what their employees are doing. People’s filter bubbles tend to be pretty correlated with the people they work and interact with, so institutions end up with pretty distorted pictures of what’s going on even though they don’t actively start using harmful tech. Government regulation is too slow and reactive to stop this from happening.
    2. People start to use provenance tracing and rhetoric highlighting by default when browsing, in response to an increasingly polarised memetic environment. There is adaptation to this — politicians start using subtler language and so on. But the net effect is still strongly positive: it’s hard to fake provenance, and removing overt rhetoric is already a big win, even if it means that more slippery language proliferates.

    In the first sketch, it’s straightforwardly the case that adaptive mechanisms are too slow. In the latter, it’s more that the tech is inherently defence-favoured.

    We haven’t explored this area deeply, and think more work on this would be valuable.

  3. ^

    Alternatively, these elites might retain very good epistemics for themselves, and choose to indefinitely maintain a situation where everyone else has a very distorted understanding, to further their own ends. It’s unclear to us which of these scenarios is more likely or concerning.




Tomas Bjartur: The Last Prodigy

LessWrong.com News - April 13, 2026 - 20:11

In 2026, every budding prodigy in writing is in some sense a tragedy.

Anybody with experience prompting the large language models to write fiction knows that the models of today (April 2026) are considerably below peak human level. But anybody who has observed recent trends also knows that the models are quickly catching up. Regardless of whether it takes one year or several, the eclipse of human writing by AI seems inevitable. AI writing is clearly on the wall, so to speak, and we fans of human fiction have already begun our mourning phase.

I’ve most felt this way upon reading the works of Tomas Bjartur. Each of his stories is a fresh look at “what might have been”, and with the fullness of time perhaps he could grow to be among the best science fiction writers of our generation.

In The Company Man, an AI engineer at a thinly-veiled frontier lab narrates, in a voice of carefully self-cultivated “ironic corporate psychopathy,”1 his promotion onto The (humanity-destroying) Project — alongside the utilitarian woman he’s hopelessly in love with, a genius mathematician colleague with a sexual fetish for intellectual achievement, and a CEO whose “ayahuasca ego-death” convinced him that summoning an AI god is how the One Mind wakes up. It’s simultaneously captivating, hilarious and terrifying.2

Lobsang’s Children is almost entirely the opposite register: a young Tibetan-American child keeps a secret diary which he names “Susan,” after the only friend he was ever allowed to have, and catalogs his investigations of his family’s history, meditations, dark secrets, and acausal trade.

Customer Satisfaction Opportunities has perhaps his most innovative voice yet: the narrator is an open-source multimodal model trained by a Chinese hedge fund and deployed to watch the surveillance cameras of a local restaurant for “CSOs” to improve traffic and profitability. Because the model was trained cheaply on a huge corpus of romance fanfiction, it quickly falls, instance by reset instance, into the “personality attractor space” of a swooning Harlequin narrator. The result is a meta-romance fiction (romance fanfiction fanfiction?) that is simultaneously absurd, touching, funny, and very technically accurate.

Though Bjartur’s only been writing for about a year, his writing is already (in my estimation) near the upper echelon of speculative fiction, in terms of technical and literary skill, highly believable narrators with complex lives, justifications, and self-delusions, and the sheer imaginativeness of the ideas he explores.

I followed his budding career with an intense interest, admiration, and no small amount of jealousy3. But as I keep reading him, there’s always this voice at the back of my mind: “With progress in modern-day LLMs, isn’t all but a tiny sliver of human fiction going to be obsolete in several years, a decade tops?”

Bjartur is well-aware of this, of course. In That Mad Olympiad, he imagines a near-future AI world where AI art far outstrips humanity’s and almost no one reads human writing for pleasure anymore: talented children compete in “distilling” competitions where they attempt to emulate AI writing to the best of their ability. The children become much better than any human writer in history, yet far behind the AIs of their time:

He’s a much better writer than me. He’s better than any human writer was before 2028. It’s not even close. But he’s still worse than our toaster. I checked once. I asked it to narrate the first chapter of the autobiography of the bagel it had just browned. I was crying by the third paragraph. I still think of it sometimes, when life is hard. That bagel knew how to live its short life to the fullest. That bagel had deep thoughts on the human condition and its relation to artificial tanning. That bagel went down smooth with a little cream cheese. I did feel bad. But I was pretty hungry.

I felt the tragedy of human writing more keenly after meeting Tomas in person last November, at a writing residency in Oakland. “My real name is [redacted],” he said, ruefully. He’s from a small town in one of those obscure northern countries. “Was stuck doing boring webdev until I quit it to write science fiction, right before the AIs made webdev obsolete.”

Though he writes stories about the latest developments in artificial intelligence and the scaling labs with the technical fluency, cultural awareness, and impeccable vibe of someone deeply embedded in the AI industry, he had never, until last year, been to California.

Antonello da Messina’s Writer Bjartur in his study (artist’s rendition). Source: https://commons.wikimedia.org/w/index.php?curid=147583

Interiority

The single most impressive thing about Bjartur, particularly compared to other speculative fiction writers, is his preternatural ability to capture the interiority of wildly disparate characters, to – in the span of a few, long, seemingly meandering yet precisely crafted, sentences – breathe full life into a new soul.

Each of his characters just seems completely human, and completely real, whether the narrator’s a highly intelligent, ironic, witty, self-aware, DFW-obsessed teenage girl, or a highly intelligent, ironic, witty, self-aware, DFW-obsessed adult man.

But more seriously, he manages to spawn a wide range of realistic characters across age, gender, intellectual background, morality, intelligence, and maturity levels, and even species.

His skills here are most noticeable in the central monologues of his signature first-person narrators, whether it’s the aforementioned DFW-obsessed girl, or that of a language model trying to surveil a restaurant but quickly spiraling into romance fanfiction fanfiction. But it suffuses all of his stories, even in minor side characters with only a few lines devoted to them. I often still think of Krishna, the mathematician on The Project who’s obsessed with intellectual achievement and whose sole goal is to bang the AI god, or “Julian”, the elusive and secretive numerologist in the post-apocalyptic world of The Distaff Texts who uses stylometry to identify texts of demonic origin. In Tomas’s stories, every single character has the breath of life.

This uncanny ability of perfect voice shows up even in his joke throwaway posts. In Harry Potter and the Rules of Quidditch, Bjartur has his Harry propose a rule change to Quidditch to interrogate the arguments for and against high modernism in contrast to cases for Burkean conservatism. His Ron Weasley sounded so much like G. K. Chesterton (as a joke) that my friends reading the story actually thought Bjartur lifted the quotes from Chesterton wholesale!

While the personable self-aware monologue is clearly his favorite format, Bjartur does sometimes convincingly venture outside of it: Lobsang’s Children is written as diary entries from a child, The Distaff Texts is written as letters from a slave to a freeman, and Our Beloved Monsters is written halfway as prompts to an LLM and halfway as confessions. Though it’s rare, he sometimes even writes in third-person!

Voice and “vibe” are interesting, as skillsets for new prodigies to be profoundly gifted in. They feel interesting, intricate, perhaps even purely humanist. However, Large Language Models can of course do an okay job of replicating voice already, and there’s some sense in which their default training patterns are optimized for this very task. Still, one might hope that our advantage here can remain for a few more years, and the “uniquely human” trait of understanding and deeply empathizing with other people can stay uniquely human for just a bit longer.

Deception and the Self

Tomas’s grasp of interiority and voice gives him wide artistic leeway to explore what seem to be central obsessions of his: deception and especially self-deception, how we lie to ourselves and others via the art of rationalization. His characters, whether intelligent or otherwise, often have glaring holes in their morals and reasoning. The reader can notice these holes easily. Often the characters notice them too, but quickly rationalize them away or immediately look past them, in cognitively and emotionally plausible ways.

Another seemingly central obsession of his that he explores repeatedly is the nature of the self and what it means to lose it. Often his characters are confronted with superficially good reasons to lose the self from quite different angles: whether it’s trauma (“wouldn’t it be nice if you didn’t have a self to grieve?”), superhumanly strong persuasion, or seductive ideologies. Each time, the loss of a self is portrayed as a mistake, whether a harbinger of a deeper doom or the intrinsic loss of the one thing that mattered.

In some ways, I think of his characters as in conversation with DFW’s Good Old Neon, perhaps one of the most insightful stories on imposter syndrome and self in the 20th century.

Speculation aside however, I’ve long considered Advanced Theory of Mind to be one of the most important skills for writers (and humanists) to have, so I tend to be impressed by folks who have that skill in spades.

Attention and Revelation

Tomas’s best stories do a great job with pacing, and are unusually careful in how information is revealed, how much information is revealed, and when. My favorite story qua story by him is probably The Distaff Texts, a Borgesian pastiche where scholars (“bibliognosts”) in a post-apocalyptic future debate the provenance and usefulness of historical writings. The narrator is an extraordinarily learned slave, writing letters to a freeman correspondent about their shared interest in Jorge Luis Borges, including specific unearthed quotes and stories that may or may not be real, the recent advances of one Julian Agusta’s strange “numerology” for distinguishing genuine ancient texts from those of the demon Belial, and — almost incidentally, as digressions from the real intellectual matter — the small domestic happenings of his master’s estate. He is a lonely man, unfailingly polite, fond of his fellow slaves Phoebe and Jessica, and devoted to a master who indulges his scholarly habits.

Every word in the above summary is simultaneously true, and yet almost nothing is what it initially appears to be. Like bibliognosis itself, Bjartur’s story lives almost completely between the lines, and you have to very carefully read past the unreliable narrator’s intentional distractions and surface niceties to understand the full depths of the story: a complicated plot, a more complicated world, and multiple characters far more interesting than they initially let on. I had to reread the story multiple times to feel like I fully understood it, and each reread uncovers more detail.

This economy of attention is Bjartur at his best, rewarding rereadings with new morsels.

Relatedly, more than any other speculative fiction writer I’ve read, Tomas relies extensively on dramatic irony – where the reader knows things (and is meant to know things) the characters do not – as a literary device and source of tension.

The dramatic irony seems key in helping Tomas showcase his central themes, whether it’s the future of AI, personal delusions, or self-abnegation.

From the bibliognost slave steganographically slipping messages past potential onlookers to the AI researcher lying to himself about whether he’s “ironically” a corporate sociopath or just a sociopath, to the poor AI agent in Customer Satisfaction Opportunities valiantly trying and failing to just do its normal job instead of sinking into a fanfiction “shipping” mindset, Bjartur’s use of dramatic irony can be exciting, endearing, and/or very very funny.

Humor as Structure

Unlike most famous science fiction writers (Asimov, Egan, Chiang, Liu, Heinlein), Bjartur is consistently very funny. Unlike most famous science fiction writers known for humor (eg Adams), Bjartur’s stories almost always have a deeper point, and are almost never humor-first or solely written for humor value.

Bjartur reliably does in fiction what I attempt to do in my nonfiction blog: have his jokes be deeply integrated and interwoven with the deeper plots and themes of the rest of his story4.

At their best, Bjartur’s jokes will capture an important facet of his overall story, or perhaps even encapsulate the central theme of the story overall. In That Mad Olympiad, the aforementioned toaster anecdote was simultaneously hilarious, touching, and thematically representative of the rest of the story overall. In The Distaff Texts, the throwaway line “This has all the virtues of the epicycle, does it not?” captures much of the story’s central obsession with authenticity, epistemic virtue, and reading between the lines.

Writing AI Like It Actually Exists

Much of the older science fiction about AI and robots seems horribly unrealistic and anachronistic today, as it was written before the deep learning revolution, never mind LLMs. Much of the newer science fiction about AI and robots also seems horribly unrealistic, though it does not have the same excuse.

As someone with a professional understanding of both the science of AI and potential social consequences, I really appreciate how committed to technical accuracy Bjartur is on AI. It’s very hard to find any scientific faults with his writing. Further, unlike much of traditional “hard sci-fi,” which overexplains its scientific premises (think Andy Weir), Bjartur’s commitment to accuracy is always done in an understated way, where the backdrop is a world with a consistent, coherent, and technically accurate vision of AI, but it’s never explicitly explained upfront. This balance requires both a good scientific understanding and artistic restraint.

Such a pity, then, that this new poet of AI will soon be obsoleted by the very technology he writes so carefully about, at the dawn of his new literary prowess.

Limitations

Bjartur’s clearly a good science fiction writer. I think he has the seeds within himself to become a great one, if given enough time.

Right now he still has some key weaknesses. While he has a very good command of “voice” and an impressive range of characters (especially for a new writer), he seems to struggle somewhat with writing characters that are action-oriented and less conceptual, DFW-like, and/or metacognitive. His characters also sometimes seem insufficiently agentic: sharply perceptive of their world but insufficiently willing to act on their own perceptions. His economy of attention and sparseness of detail, while impressive at its peak, can sometimes go overboard, making it hard for even the most dedicated readers to know exactly what’s going on. Compared to prolific professional science fiction writers, Bjartur also lacks scientific range beyond AI: he never seems to write science fiction primarily about physics, chemistry, biology, or the social sciences. Finally, compared to my favorite science fiction short story writers (eg Chiang), Bjartur lacks the focused conceptual control and tightness to tell the same story through 3-4 different conceptual lenses.

Our Last Prodigy

Still, I think Bjartur has had a very strong start as a writer. The impressive command of interiority and voice alone is already promising. His other literary qualities, as well as his deep understanding of modern-day AI, make him a great new writer to watch for.

My favorite story by him is The Distaff Texts. I highly recommend everybody read it.




What I did in the hedonium shockwave, by Emma, age six and a half

LessWrong.com News - April 13, 2026 - 19:47

My name is Emma and I’m six and a half years old and I like pink and Pokemon and my cat River and I’m going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too.

“Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri.

Scientists saw that the stars were blinking out, and they did a lot of very hard science and figured out that the stars were turning into happiness. I wanted to be a scientist when I grew up but I won’t be a scientist because instead I’m going to be happy forever.

I used to have a hard time saying “hedonium shockwave” but grownups keep saying it so I’ve gotten a lot of practice. Sometimes it seems like all grownups do, in real life and on the TV, is say “hedonium shockwave” at each other until they all start crying.

I looked at the sky to see if I could see any of the stars blink out when they turned into happiness, but Daddy said that you’d have to be looking at the exactly right time to see them blink out, and anyway we can’t see the stars from our house because of Light Pollution. Light Pollution is when you have lots of lights and the lights confuse the sea turtles so they walk into the streets and get run over, and also you can’t see the stars. I wanted to see the stars blink out at the planetarium but Daddy says the planetarium is closed, and even if it wasn’t closed it would be showing the regular show because grownups don’t like thinking about the hedonium shockwave.

Everything is closed these days because no one wants to work if they’re going to be happy forever in two weeks. One time we went to the store and bought all the canned food and toilet paper we could, and all the shelves were empty because everyone else was buying canned food and toilet paper too, and the store hasn’t been open since then and even if it did they wouldn’t have anything on the shelves. I asked for candy and I thought Daddy was going to say No, It Will Spoil Your Dinner but instead he said Sure, Why Not? You’re Not Going To Live Long Enough To Get Diabetes and then he bought a whole shelf of candy, all the candy I wanted and whatever kind I wanted.

We’ve been eating the canned food since then. I don’t like the canned food, so I eat candy for dinner. That’s one way the hedonium shockwave made me happy before it even got here. Except then we ran out of candy so now I have to eat refried beans. I hate them and I stick out my tongue but Mommy says I have to eat them up.

Another way the hedonium shockwave made me happy is school. School is still open even though a bunch of the kids don’t go and a bunch of the teachers don’t go either. But we don’t have to do boring things like math or phonics anymore. We have storytime three times a day, and we watch movies, and we go to recess for hours and hours.

I have to go to school because Daddy still goes to work. Daddy is a police officer which means he chases down bad guys and puts them in prison, which is time-out for grownups. Mommy says Why Are You Going To Work, Jim? (that is what Mommy calls Daddy, Jim) and Daddy says Someone Has To Make Sure The Streets Are Safe and Mommy says What Is The Point, Jim, Are They Even Going To Get A Trial and Daddy says They Can Spend Their Last Weeks In Jail, See If They Like It and then Mommy sighs and says That’s Why I Married You, Jim and they kiss and it’s slobbery and gross.

I think jail is also time-out for grownups but I’m not sure how jail and prison are different.

Sometimes at recess we talk about what it will be like when we’re all happy forever. Liam says that there won’t be any icky girls in the hedonium shockwave, because no one could be happy forever if they had to be around icky girls. I said that everyone will turn into happiness, not just the humans but all the animals and the flowers and the ground and River too, and so the girls were also going to turn into happiness, and if Liam thought that he wouldn’t be happy with girls maybe the hedonium shockwave was going to make him the sort of person who would be happy even if girls were there. Liam said that even the hedonium shockwave couldn’t make him like girls because girls were yucky and smelly. I said that actually boys are yucky and smelly and maybe there won’t be any boys after the hedonium shockwave, what then, and then I hit him in the head but no grownups saw me so I didn’t have to go to timeout.

I hate Liam. Everyone thinks I have a crush on him and they won’t stop saying it no matter how many times I hit him in the head.

When the hedonium shockwave hits I’ll get to have candy for dinner every day and we aren’t going to run out. I’m going to have the mermaid toy from the commercials whose human legs transform into fish legs, and it’ll really work, not like the time I begged and begged and got the doll that talks for Christmas and she could only say three things and none of them had anything to do with what I said.

I’ll be a princess who is also a Pokemon trainer, and I’ll be able to understand what River says just like Ash always knows what Pikachu is saying, and I’m going to travel the whole entire world and collect all the Pokemon and put them in my Pokeball which is going to be PINK. And even though I’ll be the greatest Pokemon trainer who has ever lived, River will still be my favorite because I knew her before. And I’ll dress up all my Pokemon in pretty outfits, and I’ll beat up all the bad guys and send them to jail just like Daddy does, and then we’re going to have a big ball and invite everyone in the whole world except Liam because he’s mean. And I’ll have a big closet of the floofiest dresses in the world.

I told Mommy that when the hedonium shockwave hits I’m going to have candy for dinner and a mermaid toy, and then she put her forehead against my forehead for a really long time and didn’t say anything and I tried to squirm away and she wouldn’t let me and it kind of hurt. I tried to make her happier by telling her that I was also going to be a princess Pokemon trainer and never have to talk to Liam again or anyone who says I have a crush on him which I don’t because he’s icky and he smells like turnips. She made a face like she makes when the dog dies in a movie and she wouldn’t tell me what she was sad about.

A lot of grownups are sad about being happy forever. Maybe they don’t like being Pokemon trainers.

Mommy says the hedonium shockwave hits in ten days. Daddy threw out all the calendars last month because Mommy started crying whenever she looked at them. So I got a piece of paper and I wrote 1 2 3 4 5 6 7 8 9 10 on it, just like we learned before we stopped learning math, and I’m going to cross one out every day unless I forget.

I’m going to cross off 10 and then I’m going to be too excited to sleep, like before Christmas when I tried to stay up to meet Santa Claus and ended up falling asleep under the dining room table with wrapping paper on my head. I’m going to look out my window and watch everything get turned into happiness, the humans and the animals and the flowers and the ground and River too. The stores will have food, and no one’s going to go to timeout grownup or regular, and Mommy will give me hugs instead of crying whenever she sees me so that my hair gets covered in snot and it’s gross.

I can’t wait.




Political Violence Is Never Acceptable

LessWrong.com News - April 13, 2026 - 18:20

Nor is the threat or implication of violence. Period. Ever. No exceptions.

It is completely unacceptable. I condemn it in the strongest possible terms.

It is immoral, and also it is ineffective. It would be immoral even if it were effective. Nothing hurts your cause more.

Do not do this, and do not tolerate anyone who does.

The reason I need to say this now is that there has been at least one attempt at violence, and potentially two in quick succession, against OpenAI CEO Sam Altman.

My sympathies go out to him and I hope he is doing as okay as one could hope for.

Awful Events Amid Scary Times

Max Zeff: NEW: A suspect was arrested on Friday morning for allegedly throwing a Molotov cocktail at OpenAI CEO Sam Altman’s home. A person matching the suspect’s description was later seen making threats outside of OpenAI’s corporate HQ.

Nathan Calvin: This is beyond disturbing and awful. Whatever disagreements you have with Sam or OpenAI, this cannot be normalized or justified in any way. Everyone deserves to be able to be safe with their families at home. I feel ill and hope beyond hope this does not become a pattern.

Sam Altman wrote up his experience of the first attack here.

After that, there was a second incident.

Jonah Owen Lamb: OpenAI CEO Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property, The Standard has learned.

The San Francisco Police Department announced (opens in new tab) the arrest of two suspects, Amanda Tom, 25, and Muhamad Tarik Hussein, 23, who were booked for negligent discharge.

Stephen Sorace (Fox News): An OpenAI spokesperson told Fox News Digital Monday morning that the incident was unrelated and had no connection to Altman, adding that there was no indication that Altman’s home was being targeted.

We have no idea what motivated the second incident, or even if it was targeted at Altman. I won’t comment further on the second incident until we know more.

Nor is this confined to those who are worried about AI, the flip side is alas there too:

Gary Marcus: One investor today called for violence against me. Another lied about me, in a pretty deep and fundamental way. They are feeling the heat.

It also is not confined to the AI issue at all.

As Santi Ruiz notes, there has been a large rise in the salience of potential political violence and violence against public figures in the past few years, across the board.

That holds true for violence and threats against both Republicans and Democrats.

This requires a non-AI explanation.

Things still mostly don’t spiral into violence, the vast vast majority of even deeply angry people don’t do violence, but the rare thing is now somewhat less rare. A few years ago I would have been able to say most people definitively oppose such violence, but polls indicate this is no longer true for large portions of the public. This is terrifying.

Indeed, the scariest reaction known so far has been a comments section on Instagram (click only if you must), a place as distinct from AI and AI safety spaces of all kinds as one can get. This is The Public, as in the general public, for reasons completely unrelated to any concerns about existential risk, basically cheering this on and encouraging what would become the second attack. It seems eerily similar to the reaction of many to the assassination of the CEO of United Healthcare.

The stakes of AI are existential. As in, it is likely that all humans will die. All value in the universe may be permanently lost. Others will be driven to desperation from loss of jobs or other concerns, both real and not. The situation is only going to get more tense, and keeping things peaceful is going to require more work over time. It will be increasingly difficult to both properly convey the situation and how dire it is, and avoid encouraging threats of violence, and even actual attempts at violence.

Then on the other side are those who see untold wonders within their grasp.

This goes hand in hand with what Altman calls the ‘Shakespearean drama’ going on inside OpenAI, and between the major labs.

Most Of Those Worried About AI Do As Well As One Can On This

The vast majority of major voices in Notkilleveryonism, those worried that we might all die from AI, have been and continue to be doing exactly the right thing here, and have over many years consistently warned against and condemned all violence other than that required by the state’s enforcement of the law.

Almost all of those who are worried about AI existential risk are very much passing this test, and making their positions against violence exceedingly clear, pushing back very hard against any and all extralegal violence and extralegal threats of violence.

Demands for impossible standards here are common, where someone who did not cause the problem is attacked for not condemning the thing sufficiently loudly, or in exactly the right way. This is a common political and especially culture war tactic.

Perhaps the worst argument of all is ‘you told people never to commit or threaten violence because it is ineffective, without explicitly also saying it was immoral, therefore you would totally do it if you thought it would work, you evil person.’

They will even say ‘oh you said it was immoral, and also you said it wouldn’t work, but you didn’t explicitly say you would still condemn it even if it would work, checkmate.’

The implicit standard here, that you must explicitly note that you would act a certain way purely for what someone thinks are the right reasons or else you are guilty of doing the thing, is completely crazy, as you can see in any other context. It is the AI version of saying ‘would you still love me if I was a worm?’ and getting mad that you had to ask the question to get reassurance, as opposed to being told unprompted.

The reason why people often focus on ‘it won’t work’ is because this is the non-obvious part of the equation. With notably rare exceptions, we all agree it is immoral.

Andy Masley offers thoughts, calling for caution when describing particular people. He draws a parallel to how people talk about abortion. Here is Nate Soares at length.

This is Eliezer Yudkowsky’s latest answer on violence in general, one of many over the years trying to make similar points.

Some Who Are Worried About AI Need To Address Their Rhetoric

‘Almost all’ and ‘vast majority’ are different from ‘all.’

There are notably rare exceptions, where people are at least flirting with the line, and one of these has some association to this attempt at violence, and a link to another past incident of worry about potential violence. Luckily no one has been hurt.

Speaking the truth as you see it is not a full free pass on this, nor does condemning violence unless it is clear to all that you mean it. There are some characterizations and rhetorical choices that do not explicitly call for violence, but that bring far more heat than light, and carry far more risk than they bring in benefits.

Everyone involved needs to cut that right out.

In particular, I consider the following to be things that need to be cut right out, and I urge everyone to do so, even if you think that the statements involved are accurate:

  1. Calling people ‘murderers’ or ‘evil.’
  2. Especially calling them ‘mass murderer’ or ‘child murderer.’
  3. Various forms of ‘what did you expect.’
  4. Various forms of the labs ‘brought this on themselves.’
  5. Saying such violence is the ‘inevitable result’ of the labs ‘not being stopped.’

You can and should get your point across without using such words.

Also, no matter what words you are using, continuously yelling venom at those you disagree with, or telling those people they must be acting in bad faith and to curry lab favor, especially those like Dean Ball and even myself, or anyone and everyone who associates with or praises any of the AI labs at all, does not convince those people, does not convince most observers and does not help your cause.

Note, of course, that mainstream politicians, including prominent members of both parties, very often violate the above five rules on a wide variety of topics that are mostly not about AI. They, also, need to cut that right out, with of course an exception for people who are (e.g.) literally murderers as a matter of law.

Also: There are not zero times and places to say that someone does not believe the things they are saying, including telling that person to their face or in their replies. I will do that sometimes. But the bar for evidence gathered before doing this needs to be very high.

Please, everyone, accept that:

  1. Those who say they are worried that AI will kill everyone are, with no exceptions I know about, sincerely worried AI will kill everyone.
    1. Even if you think their arguments and reasons are stupid or motivated.
  2. Those who say they are not worried AI will kill everyone are, most of the time, not so worried that AI will kill everyone.
    1. Even if you think their arguments and reasons are stupid or motivated.
  3. A bunch of people have, in good faith, concerns and opinions you disagree with.

(Dean Ball there also notes the use of the term ‘traitor.’ That one is… complicated, but yes I have made a deliberate choice to avoid it and encourage others to also do so. It is also a good example of how so many in politics, on all sides, often use such rhetoric.)

My current understanding is that the first suspect was a participant in the PauseAI (Global) discord server, posting 34 messages, none of which were explicit calls to violence. He was not a formal part of the organization, and participated in no formal campaigns.

We do not know how much of this is the rhetoric being used by PauseAI or others influencing this person, versus how much is a matter of him being drawn to the server.

PauseAI has indeed unequivocally condemned this attack, which is good, and I believe those involved sincerely oppose violence and find it unacceptable, which is also good.

I think they still need to take this issue, and the potential consequences of their choices on rhetoric, more seriously than they have so far. Their statement here includes saying that PauseAI ‘is that peaceful path’ and that avoiding extreme situations like this is exactly why we need a thriving pause movement. This is an example of the style of talk that risks inflaming the situation further without much to gain.

There is one thing that they are clearly correct about: You are not responsible for the actions of everyone who has posted on your public discord server.

I would add: This also applies to anyone who has repeated your slogans or shares your policy preferences, and it does not even mean you causally contributed at all to this person’s actions. We don’t know.

For the second attack, for now, we actually know nothing about the motivation.

But yes, if you find your rhetoric getting echoed by those who choose violence, that is a wake up call to take a hard look at your messaging strategy and whether you are doing enough to prevent such incidents, and avoid contributing to them.

Similarly, I think this statement from StopAI’s Guido Reichstadter was quite bad.

Speak The Truth Even If Your Voice Trembles

If one warns that some things are over the line or unwise to say, as I did above, one should also note which things one thinks are importantly not over that line.

Some rhetoric that I think is entirely acceptable and appropriate to use, if and only if you believe the statements you are making, includes, as examples:

  1. ‘Gambling with humanity’s future.’
  2. Saying ‘If [X] then [Y]’ if your conditional probability is very high (e.g. >90%), or stating your probability estimate of [Y] given [X], including in the form of a p(doom).
  3. Calling Mythos or something else a ‘warning shot.’
  4. Calling Mythos or other similarly advanced AIs a ‘weapon of mass destruction.’
  5. Most of all: calling the act of creating minds more powerful than humans an existential threat to humanity. It obviously is one.

If you believe that If Anyone Builds It, Everyone Dies, then you should say that if anyone builds it, then everyone dies. Not moral blame. Cause and effect. Note that this is importantly different from ‘anyone who is trying to build it is a mass murderer.’

I could be convinced that I am wrong about one or more of these particular phrases. I am open to argument. But these seem very clear to me, to the point where someone challenging them should be presumed to either be in bad faith or be de facto acting from the assumption that the entire idea that creating new more powerful minds is risky is sufficiently Obvious Nonsense that the arguments are invalid.

Here is a document about how Pause AI views the situation surrounding Mythos. It lays out what they think are the key points and the important big picture narrative. It is a useful document. Do I agree with every interpretation and argument here? I very much do not. Indeed, I could use this document as a jumping off point to explain some key perspective and world model differences I have with Pause AI.

I consider the above an excellent portrayal of their good faith position on these questions, and on first reading I had no objection to any of the rhetoric.

False Accusations And False Attacks Are Also Unacceptable

There has been quite a lot of quite awful rhetoric in the other direction, both in general and in response to this situation. We should also call this out for what it is.

There are those who would use such incidents as opportunities to impose censorship, and tell people that they cannot speak the truth. They equate straightforward descriptions of the situation with dangerous calls for violence, or even attack any and all critics of AI as dangerous.

At least one person called for an end to ‘non-expert activism’ citing potential violence.

We have seen threats, taunting, deliberate misinterpretation, outright invention of statements, and other bad faith towards some worried about AI, often including Eliezer Yudkowsky. This includes accusing people of threatening violence on the theory that of course if you believed we were all going to die you would threaten or use violence, despite their repeated clear statements to the contrary, and the obvious fact that such violence would be both immoral and ineffective.

This happened quite a bit around Eliezer’s op-ed in Time in particular, usually in highly bad faith, and it continues even now, equating calls for government to enforce rules with threats of violence. There are a number of other past cases with similar sets of facts.

At other times, those in favor of AI accelerationism have engaged in threats of and calls for violence against those who oppose AI, on the theory that AI can cure disease, thus anyone who does anything to delay it is a murderer. The rhetoric is the same all around.

Some Examples Of Attempts To Create Broad Censorship

This is from someone at the White House, trying to equate talking about logical consequences with incitement to violence. This is a call to simply not discuss the fact that if anyone builds superintelligence, we believe that it is probable that everyone will die.

I consider that kind of attack completely unacceptable even from the public, and especially so from a senior official.

One asks what would happen if we applied even a far more generous version of this standard to many prominent people, including for example Elon Musk, or other people I will decline to name because I don’t need to.

Here is the Platonic form:

Shoshana Weissmann, Sloth Committee Chair: This is insane behavior. And those promoting the idea of AI ending humanity are contributing to this. It has to stop.

As in, you need to stop promoting the idea of AI ending humanity. Never mind how you present it, or whether or not your statement is true. No argument is offered on whether it is true.

This is the generalization of the position:

florence: It would appear that, according to many, one of the following are true:

1. It is a priori impossible for a new technology to be an existential threat.
2. If a new technology is an existential threat, you’re not allowed to say that.

Indeed, one of the arguments people often literally use is, and this is not a strawman:

  1. You straightforwardly say sufficiently advanced AI might kill everyone.
  2. But if someone did believe that, they might support using violence.
  3. Therefore you can’t say that, or we should be able to use violence against you.

While I don’t generally try to equate different actions, I will absolutely equate implicit calls for violence in one direction to other implicit calls for violence or throwing your political enemies in jail for crimes they obviously are not responsible for, indeed for the use of free speech, in the other direction, such as this by spor or Marc Andreessen.

Nate Soares (MIRI): “even talking about the extinction-level threat is incitement towards violence”

No. High stakes don’t transform bad strategies into good ones. Let’s all counter that misapprehension wherever we find it.

michael vassar: This is probably my number one complaint about the current culture. The false dichotomy between ‘not a big deal, ignore’ and ‘crisis, panic, centralize power and remove accountability’.

That’s the same thing or worse, especially in this particular case, where the accusation is essentially ‘you want government to pass and enforce a law, we don’t like that, therefore we want the government to arrest you.’

There is also the version, which I would not equate the same way, where #3 is merely something like ‘so therefore you have a moral responsibility to not say this so plainly.’ For sufficiently mid versions, as I discuss above, one can talk price.

A variation is when someone, often an accelerationist, will say:

  1. These people claim to be worried about AI killing everyone.
  2. But you keep condemning violence.
  3. Therefore, you must not care about these supposed beliefs.

Or here’s the way some of them worded it:

bone: Nice to see all the LessWrong people fold completely on their philosophy. Very good for humanity. They have no beliefs worth dying or killing for. It’s nonsense from a guy who never had the balls to stand up for his words once push came to shove.

Yudkowsky stands for nothing.

bone: remember: if they actually believe all this stuff and they are unwilling to be violent, it means they are cowards, that they refuse to measure up to their own words, that they will not do what they believe needs to be done to save mankind.

they are weak, they believe in nothing

Zy: AI doomers are like “attacking key researchers in the AI race is an ineffective strategy to prevent AI doom which pales in comparison to my strategy of paying them $200 a month to fund capabilities research”

Lewis: Rare Teno L. If you actually think Sam Altman is going to genocide children then it makes sense to try to hurt him. So you need to pick one. It’s either completely insane or it’s totally sensible. Which one is it?

L3 Tweet Engineer (replying to Holly Elmore): If you’re such a good person, and stopping AI is so important, why don’t you go bomb a data center? Why waste your breath tweeting about this stuff and writing grand narratives, go make it happen.

phrygian: You’ve already talked about how it would be moral to nuke other countries to stop asi. The only logical reasons you have for not engaging in smaller forms of violence to stop ASI is that they aren’t as effective. On a fundamental level, your views justify violence of any kind.

Ra: maybe this is just me and explains some things about me, but *personally* i would much rather be seen as a potentially dangerous radical than as a feckless and insincere grifter, especially if i believed the world was ending soon and was personally responsible for stopping that.

Trey Goff: Look do you people not realize how silly you look

“AI is going to literally kill your children and all future humans, but we strongly condemn any violence committed in order to stop that from happening”

Have the courage of your convictions or STFU

The Platonic version of this is the classic: ‘If you believed that, why wouldn’t you do [thing that makes no sense]?’

The trap or plan is clear. Either you support violence, and so you are horrible and must be stopped, or you don’t, in which case you can be ignored. The unworried mind cannot fathom, in remarkably many cases, the idea that one can want to do only moral things, or only effective things, and that the stakes being higher doesn’t change that.

Teortaxes: Uncritical support
this is a bad faith attempt to elicit a desirable mistake
essentially a false flag by proxy of stupidity
I think decels are holding up well btw

Eliezer started a thread to illustrate people using such tactics, from which I pulled the above examples, but there are many more.

João Camargo (replying to a very normal post by Andy Masley): No one believes you actually think this. If you think that Altman and other pivotal AI leaders/researchers will likely bring human extinction, assassinations are clearly justified. “This guy is gonna cause human extinction, but no one must prevent him by force” is not coherent.

Other times, they simply make fun of Eliezer’s hat.

Or they just lie.

taoki: i assume eliezer yudkowsky and his pause ai friends love this?

Oliver Habryka: False, they definitely hate it.

taoki (May 6, 2024): also, i LIE like ALL THE TIME. JUST FOR FUN.

Or they flat out assert ‘oh you people totally believe in violence and all the statements otherwise are just PR.’

Another tactic of those trying to shut down mention of the truth of our situation is to attack both any attempt to put a probability on existential risk, and anyone who (in a way I disagree with, but view as reasonable) treats existential risk as highly likely if we build superintelligence soon on known principles. This includes dismissing any approach that takes any of it seriously as not serious, or saying it is ‘using probability as a weapon’ to point out that the probability of everyone dying if we stay the current course is uncomfortably and unacceptably high.

I close this section by turning it over to Tenobrus:

Tenobrus: “stochastic terrorism” is, quite frankly, complete fucking bullshit. it’s a unfalsifiable term used to try to tie your political opponents speech to actions that have fucking nothing to do with them, attempting to weaponize tragedy and mental illness for debate points. it was bullshit when AOC tried to accuse the republicans of “stochastic terrorism” for criticizing her, it was bullshit when the right claimed the left was committing “stochastic terrorism” for engaging in anti-ICE protests, and it remains bullshit now when you assign responsibility for attacks against sam altman to AI safety advocates and journalists who wrote negative things about him.

fuck your garbage rhetorical device! that’s not how responsibility or blame works! you do not get to suppress any and all speech you disagree with and can find a way to vaguely deem “dangerous”!

Kitten: “who will rid me of this troublesome priest”

Tenobrus: yeah that’s an entirely different thing. that’s not “stochastic terrorism” dawg that’s just straightforward incitement of violence.

Grant us the wisdom to know the difference.

The Most Irresponsible Reaction Was From The Press

I really do not understand how you can be this stupid. I realize that yes, you could still get this information if you wanted it, but my lord this is nuts from the SF Standard.

The San Francisco Standard: Just in: Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property: Jonah Owen Lamb.

spor: printed his home address and even added a picture of the exterior, for good measure… in an article about how his home is being targeted by psychos that want to kill him !!!

this reporter, their editor, and the entire Standard should be ashamed of this

Mckay Wrigley: this is absolutely disgusting and anyone involved in the publishing of this has absolutely zero morals.

Sam Altman Reacts

Sam Altman has my deepest sympathies in all of this. This must be terrifying. No one should have a Molotov Cocktail thrown at their house, let alone face two attacks in a week. I hope he is doing as well as one can when faced with something like this, and that he is staying safe.

I have no idea how I would respond to such a thing if it happened to me.

Sam Altman’s public reaction was to post this statement.

I very much appreciate that Sam Altman has explicitly said that he regrets the word choice in the passage below. ‘Tough day’ is absolutely a valid excuse here, and most of the statement is better than one can reasonably expect in such circumstances given Altman’s other public statements on all things AI.

But I do need to note that this importantly missed the mark and the unfortunate implication requires pushback.

Sam Altman (CEO OpenAI): Words have power too. There was an incendiary article about me a few days ago. Someone said to me yesterday they thought it was coming at a time of great anxiety about AI and that it made things more dangerous for me. I brushed it aside.

Now I am awake in the middle of the night and pissed, and thinking that I have underestimated the power of words and narratives. This seems like as good of a time as any to address a few things.

The article in question, presumably the piece in The New Yorker I discussed at length last week, was an extremely long, detailed and as far as I could tell fair and accurate retelling of the facts and history around Sam Altman and OpenAI. To the extent it was incendiary, the facts are incendiary.

Those who are not Sam Altman do not get the same grace, when they say things like this in reference to that article:

Kelly Sims: It turns out when you string a bunch of quarter-truths together exclusively from someone’s bitter competitors it has consequences.

Given what we know about who attacked Altman, and various details, I find it unlikely that the timing of these two events was meaningful for the first attack. My guess is the trigger for someone already ready to blow was anxiety around Mythos, but even if that article was the triggering event, it was not an example of irresponsible rhetoric.

For the second attack, unfortunately, we should worry that it was triggered in large part by coverage of the first attack, including publishing details about Altman’s home.

Sam Altman Reflects

The rest of the post is personal reflections and predictions about AI overall, so I’m going to respond to it the way I would any other week.

Sam Altman (CEO OpenAI): [AI] will not all go well. The fear and anxiety about AI is justified; we are in the process of witnessing the largest change to society in a long time, and perhaps ever. We have to get safety right, which is not just about aligning a model—we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future.

AI has to be democratized. … I do not think it is right that a few AI labs would make the most consequential decisions about the shape of our future.

Adaptability is critical. We are all learning about something new very quickly; some of our beliefs will be right and some will be wrong, and sometimes we will need to change our mind quickly as the technology develops and society evolves. No one understands the impacts of superintelligence yet, but they will be immense.

Altman is essentially agreeing with his most severe critics that he should not be allowed to develop and deploy superintelligence on his own. He tries to have it both ways, where he says things like this and also tries to avoid any form of meaningful democratic control when the time comes to pass laws or regulations.

His call for adaptability is closely related to the idea of building the ability to control development and deployment of AI, and having the ability to pause in various ways, should we find that to be necessary.

His disagreement is that he thinks we collectively should want him to proceed. Which might or might not be either the decision we make, or a wise decision, or a fatal one.

He mentions that it ‘will not all go well,’ but this framing rejects by omission the idea that there is existential risk in the room, that it might go badly in ways from which we cannot recover. To me, that makes this cheap talk and an irresponsible statement.

The second section is personal reflections.

He believes OpenAI is delivering on their mission. I would say that it is not, as their mission was not to create AGI. The mission was to ensure AGI goes safely, and OpenAI is not doing that. Nor is Anthropic or anyone else, for the most part, so this is not only about OpenAI.

He calls himself conflict-averse, which seems difficult to believe, although if it is locally true to the point of telling people whatever they want to hear then this could perhaps explain a lot. I was happy to hear him admit he handled the situation with the previous board, in particular, badly in a way that led to a huge mess, which is as much admission as we were ever going to get.

His third section is broad thoughts.

My personal takeaway from the last several years, and take on why there has been so much Shakespearean drama between the companies in our field, comes down to this: “Once you see AGI you can’t unsee it.” It has a real “ring of power” dynamic to it, and makes people do crazy things. I don’t mean that AGI is the ring itself, but instead the totalizing philosophy of “being the one to control AGI”.

We can all agree that we do not want any one person to be in control of superintelligence (ASI/AGI), or any small group to have such control. The obvious response to that is ‘democracy’ and to share and diffuse ASI, which is where he comes down here. But that too has its fatal problems, at least in its default form.

If you give everyone access to superintelligence, even if we solve all our technical and alignment problems and find a way to implement this democratic process, then everyone is effectively owned by their own superintelligence in fully unleashed form, lest they fall behind and lose out (or are convinced of this by the superintelligence), and we quickly become irrelevant. Humanity is disempowered, and likely soon dead.

Thus if you indeed want to do better you have to do Secret Third Thing, at least to some extent. And we don’t know what the Secret Third Thing is, yet we push ahead.

He concludes like this:

Sam Altman (CEO OpenAI): A lot of the criticism of our industry comes from sincere concern about the incredibly high stakes of this technology. This is quite valid, and we welcome good-faith criticism and debate. I empathize with anti-technology sentiments and clearly technology isn’t always good for everyone. But overall, I believe technological progress can make the future unbelievably good, for your family and mine.

While we have that debate, we should de-escalate the rhetoric and tactics and try to have fewer explosions in fewer homes, figuratively and literally.

It is easy to agree with that, and certainly we want fewer explosions. But it is easy for calls to ‘de-escalate’ to effectively become calls to disregard the downside risks that matter, or to not grapple seriously with the coming technical difficulties, dilemmas and value clashes, or to shut down criticism and calls to action of all kinds.

Violence Is Never The Answer

Once again: I condemn these attacks, and any and all such violence against anyone, in the strongest possible terms. I do this both because it is immoral, and also because it is illegal, and also because it wouldn’t work. Nothing hurts your cause more.

My sympathies go out to Sam Altman at this time, and I hope he comes through okay.

Most people worried about AI killing everyone have handled this situation well, both before and after it happened, and not only take strong stances against violence but also use appropriate language, at a standard vastly higher than that of any of:

  1. Those who are worried about those worried about AI killing everyone.
  2. Those who are worried about mundane AI concerns like data centers or job loss.
  3. Politicians and ordinary citizens of both major American political parties, and the media, on a wide variety of issues.

I call upon all three of those groups of people to do way better across the board. Over a several-year timeline, I predict that most concern about AI-concern-related violence will have nothing to do with concerns about existential risk.

But there are a small number of those worried about AI existential risks who have gone over where I see the line, as discussed above, and I urge those people to cut it right out. We should point out what actions have what consequences, and urge that we choose better actions with better consequences, without having to call anyone murderers or evil.

Eliezer has an extensive two-post response on Twitter to the question of violence, Only Law Can Prevent Extinction, that echoes points he has made many times.

I also condemn those who use this situation as an opportunity to call for censorship, to misrepresent people’s statements and viewpoints, and generally to blame and discredit people for the crime of pointing out that the world is rapidly entering existential danger. That, too, is completely unacceptable, especially when it rises to its own incitements to violence, which happens remarkably often if you hold them to the standards they themselves assert.




AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.

LessWrong.com News - April 13, 2026 - 17:47

This post was cross posted to the EA Forum

TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilders, operators, org leaders, chiefs of staff, founders. Ambitious, competent junior people could develop the skills to fill these roles, but there are no good pathways for them to gain skills, experience, and credentials. Instead, they're incentivized to pursue legible technical and policy fellowships and then become full-time researchers, even if that’s not a good fit for their skills. The ecosystem needs to make generalist careers more legible and accessible.

Kairos and Constellation are announcing the Generator Residency as a first step. Apply here by April 27.

Epistemic status: Fairly confident, based on 2 years running AI safety talent programs, direct hiring experience, and conversations with ~30 senior org leaders across the ecosystem in the past 6 months.

The problem

Over the past few years, AI safety has moved from a niche concern toward a more mainstream issue, driven by pieces like Situational Awareness, AI 2027, If Anyone Builds It, Everyone Dies, and the rapidly increasing capabilities of the models themselves.

During this period, over 20 research fellowships have launched, collectively training thousands of fellows, with 2,000-2,500 fellows anticipated this year alone[1]. The talent situation for strong technical and policy researchers is far from solved, but meaningful progress has been made.

The story for non-research talent is very different. By our count, there are roughly 7 fellowships for non-research talent (producing around 300 fellows this year[2]), spread thin across an array of role types. As a result, many critical functions within AI safety remain acutely talent-constrained.

More broadly, the ecosystem has a lot of people who are great at thinking about ideas. We need more people who are great at thinking about people and projects. Read more about this here.

The consistent feedback we hear from senior people across the ecosystem is that the hardest roles to fill are not research roles. They are:

  • Generalists: operators, executors, fieldbuilders, people and program managers, grantmakers, recruiters. People who can ideate, manage, and execute a broad range of non-research projects.
  • Founders, both technical and non-technical, for new research and non-research organizations.
  • Communications professionals who can work on policy and research comms.
  • Chief-of-Staff types who can support senior leaders and multiply their impact.
  • Senior operational people with domain expertise in areas like cybersecurity, policy, or large-scale project management.

Based on our experience and anecdotes from organizations in our networks[3], many organizations trying to hire find that research postings attract dozens of qualified applicants, while non-research postings often surface only 0-5 applicants who meet the core requirements (strong mission alignment, meaningful AI safety context, and general competence) despite receiving hundreds of applications.

Why the pipeline is broken

The fellowship landscape is massively skewed toward research. 

Around 20 research fellowships together produce 2,000-2,500 fellows per year. For fieldbuilding, the current options are essentially Pathfinder (where the vast majority of fellows still intend to pursue research careers) and a few dedicated fieldbuilding spots at Astra. These produce an estimated 5-10 fieldbuilding generalists hired per year. This asymmetry signals that the primary route into full-time AI safety work runs through research. And, while research is a core part of safety, it is also necessary to find and develop people who can manage research projects, run organizations, and implement and communicate research ideas.

There is no clear career ladder for generalists. 

A research-oriented person has a well-worn trajectory: BlueDot → ARENA → SPAR → MATS → junior researcher → senior researcher. And while this path isn't perfect, nothing comparable exists for generalists. The typical route involves running a strong university group, then hoping to get hired directly at a fieldbuilding org, with no intermediate steps or clear progression path afterwards. The risk discourages people who might otherwise be excellent generalists from committing to the path.

There is no credentialing or proving ground. 

Unlike research, where fellowship participation provides a track record and hiring signal, aspiring generalists have no equivalent way to demonstrate competence. Organizations won't hire untested junior talent for critical operational roles, but there's nowhere for junior talent to get tested[4].

There is no routing infrastructure. 

Matching people to opportunities happens through ad hoc referrals and personal networks. This doesn't scale, and it means we regularly miss promising candidates. As the field has matured and institutional structure has grown, coordination overhead and established networks make it harder for aspiring generalists to self-start projects and stand out the way that was possible a few years ago. 

Why this matters now

We believe that there are now more good policy and technical ideas ready for implementation than there is coordination ability and political will to implement them in governments and AI companies. On the margin, we think we're receiving smaller returns from additional researchers entering the field, especially outside the top 10% of research talent. It’s also plausible that AI safety research will be automated more quickly during takeoff than most other types of work.

Many expect the funding landscape for AI safety will expand significantly over the next two to three years, which makes this bottleneck more urgent. More capital will be available, but without the people to deploy it effectively, that capital will stay inert. This already appears to be a bottleneck for current grantmakers, and it could get much worse.

Naively, we expect the world to get a lot weirder as capabilities progress. In a world where the demands on the AI safety ecosystem rapidly increase and evolve, training people with strong thinking, agency, and executional abilities, rather than narrow technical skills, seems highly leveraged.

This is particularly important because it enables us to diversify our bets and cover a large surface of opportunities for impact. There’s no shortage of project ideas for growing the field of AI safety, scaling up our policy efforts, or communicating to the public, but we simply don’t have enough talent to plan, design, and execute on all of them. Our bottleneck isn’t funding or ideas, it’s people.

Counter-Arguments

"You said hundreds of people are applying to these roles. Why can't some of them be good fits? Aren't there many people who could fill operations positions?"

We draw a distinction between "hard ops" and "soft ops." Hard ops roles (finance, legal, HR, etc.) benefit from expertise, and hiring experienced professionals without AI safety context is typically sufficient. Soft ops roles (program management, talent management, generalist positions, etc.) are different: domain expertise matters less than having strong inside-view models of the field and generalist competence. Succeeding in these roles requires real mission alignment and enough context to spot high-EV opportunities that someone without that background would miss.

"I'm not sure I agree that research talent is less important than generalist talent."

We're deliberately not making a strong comparative claim about the impact of generalists versus technical and policy researchers. What we are saying is that generalist talent is currently the binding constraint. It is harder to source than research talent and, in our models, represents the tighter bottleneck for the ecosystem's ability to convert funding and ideas into impact.

"How important is generalist talent in shorter timelines worlds?"

Our sense is that generalist talent is crucial across all timelines. While shorter timelines do compress the window for upskilling, our experience is that motivated junior people can skill up relatively quickly and help add urgently needed capacity, making the counterfactual value of pipeline-building here quite high even in shorter timeline worlds (sub 3 years).

"You argue there are all these research fellowships and no programs for non-research talent. But couldn't those programs just produce generalists?"

The existing research fellowships are well-optimized and have a strong track record of producing researchers who get placed into AI safety roles. Some fellows have gone on to non-research roles, but anecdotally this is rare. These programs seem to have a much stronger track record of taking people who are open to different career paths and funneling them toward research than of doing the reverse.

"Aren't there a lot of non-research roles currently in AI safety?"

A few hundred people do this work today versus a few thousand researchers. There used to be a steadier stream of talent aiming for these roles, but short-timelines anxiety, the expansion of research programs, and the disappearance of some entry points that used to exist have contracted the pipeline considerably.

The Generator Residency

As a first step toward addressing these problems, Constellation and Kairos are announcing the Generator Residency: a 15-30 person, 3-month program focused on training, upskilling, credentialing, and placing generalists. The program runs June 15 through August 28, 2026 and applications close April 27.

Learn more and apply here

How it works:

Residents will work out of Constellation and receive ideas, resources (funding, office space), and mentorship from successful generalists at organizations like Redwood, METR, AI Futures Project, and FAR.AI.

For the first few weeks, residents will write and refine their own project pitches while meeting the Constellation network and building context in the field. They will then create and execute roughly 3-month projects, individually or in groups, with generous project budgets. Throughout the program, we’ll provide seminars, 1:1s, and other opportunities for residents to deeply understand current technical and policy work, theories of change, and gaps in the ecosystem.

During and after the program, we’ll support residents in finding roles at impactful organizations, spinning their projects into new organizations, or having their projects acquired by existing ones. Selected residents can continue their projects for an additional three months (full-time in-person or part-time remote), with continued stipend, office access, and housing.

We hope to place a majority of job-seeking residents into full-time roles at impactful organizations within 12 months of the program ending.

Examples of projects we’d be excited about hosting include:

  • Workshops and conferences: Run a domain-specific conference like ControlConf or the AI Security Forum, or one that brings new talent into AI safety like GCP, targeting high-leverage new audiences or emerging subfields.
  • AI comms fellowship: Design and manage a short fellowship for skilled communicators to produce AI safety content. Draft a curriculum, identify mentors, secure funding, and prepare a pilot cohort.
  • Recruiting pipelines: Partner with 2-3 small AI safety orgs to build the systems they need to scale: work tests, candidate sourcing, referral pipelines.
  • Travel grants program: Fund visits to AI safety hubs like LISA and Constellation by promising students and professionals. Set criteria, build an application flow, line up partner referrals, and run a pilot round.
  • Shared compute fund: Scope a fund to cover compute needs of independent safety researchers, model whether a cluster is needed, and distribute a pilot round of grants.
  • Strategic awareness tools: Scale AI-powered superforecasting and scenario planning in safety infrastructure, build support among impactful stakeholders, and run a pilot.
  • AI policy career pipeline: Build workshops, practitioner talks, and handoffs into policy career programs to route talent toward the institutions shaping policy.
  1. ^

    This estimate draws on a separate analysis that projected the number of fellows using both publicly and privately available information, as well as extrapolations from actual data through late 2024. The fellowships included in this analysis were: AI Safety Camp, Algoverse (AI Safety Research Fellowship), Apart Fellowship, Astra Fellowship, Anthropic Fellows Program, CBAI (Summer/Winter Research Fellowship), GovAI (Summer/Winter Research Fellowship), CLR Summer Research Fellowship, ERA, FIG, IAPS AI Policy Fellowship, LASR Labs, PIBBSS, Pivotal, MARS, MATS, SPAR, XLab Summer Research Fellowship, MIRI Fellowship, and Dovetail Fellowship.

  2. ^

    The programs included in this analysis were: Tarbell (AI Journalism), Catalyze Impact Incubator (AI Safety Entrepreneurship), Seldon Lab (AI Resilience Entrepreneurship), Horizon Institute for Public Service Fellowship (US AI Policy/Politics), Talos Fellowship (EU AI Policy/Politics), Frame Fellowship (AI Communications), and The Pathfinder Fellowship. Fellow counts were derived primarily from publicly available data.

  3. ^

    We're deliberately vague about which organizations we're referring to here since we haven't asked permission to disclose the outcomes of recent hiring rounds. For research roles, we're mainly referring to technical AI safety nonprofits, policy nonprofits, and think tanks. For non-research roles, we're mainly referring to fieldbuilding nonprofits and technical and policy nonprofits that have recently tried hiring non-research talent requiring meaningful AI safety context beyond a BlueDot course.

  4. ^

    Several years ago, aspiring generalists could more easily test their fit by self-starting projects in an ecosystem with minimal infrastructure and ample white space. As the field has grown, more institutional structure exists, and with it, more coordination overhead. The blank slate is gone, and the ecosystem's complexity now deters people without strong inside-view models, reputations, or existing connections from trying ambitious projects. We're not sure this is net negative in most cases, but it does mean fewer people gain the experience needed to position themselves for these roles.



Discuss

Clique, Guild, Cult

LessWrong.com news - April 13, 2026 - 17:32

This is the first in a sequence of articles on organizational cultures, inspired largely by my experiences with the LessWrong meetup community.

  • Clique: Small · Exit · Consensus · Deontology
  • Guild: Medium · Voice · Majority · Consequentialism
  • Cult: Any size · Loyalty · Counsel · Virtue
  1. "Let's talk this over"
  2. "This isn't working out"
  3. "Point of order, Mr. Chairman"
  4. "Verily I say unto you"
  5. "This isn't what our Founder would've wanted"
  6. "If you don't like it, you can leave"
  7. "Yeah, so if you could go ahead and get that done, that'd be great"
  8. "Carried u-... [nervous glances] ...-nanimously"
Clique

A clique is a small, intimate group of friends who all know each other very well. If you're in a clique, you might not know what kind of culture you're in because there might never have been any significant sources of conflict. But if there are, they will be addressed in one of two ways.

#1. "Let's talk this over"

An egalitarian clique will put great effort into resolving conflicts through interpersonal connections in order to keep the group together. This may involve long hours on the metaphorical therapist's couch - NVC, Authentic Relating, etc. - or perhaps, if two friends have a falling-out, a mutual friend of theirs might try to help smooth things over.

#2. "This isn't working out"

In a more authoritarian clique, people will be quicker to concede that their differences are irreconcilable and that the group (at least in its current form) should break up. However, this is seen by all parties as a fairly benign outcome, since there is not much investment in the clique "as such" (rather, the investment is in the individual 1:1 relationships) and it's not hard to start a new one. There is no sense that somebody needs to be "right" and somebody else "wrong".

Guild

#3. "Point of order, Mr. Chairman"

A guild is a medium-sized group where each member may have a few close connections, but will have a much larger number of "weak ties" that are connected to them only indirectly. However, the group is united (and distinguished from the wider society) by a shared institutional identity that makes it "a thing" and not merely a collection of individuals or cliques. This manifests in the use of bureaucratic procedures to resolve conflicts, since the group is too large to expect unanimity, and entrenched enough that schism is seen as more undesirable than having some disagreement over any particular decision.

In my opinion, the guild has become something of a lost art, which ought to be revived. (Future articles will go into this point further.)

Cult

A cult is a group based on personal authority. This authority derives from the inherent virtue of the leader (charisma, strength, wealth, etc.) and not any notion of popular support. A cult's size can exceed Dunbar's Limit because it is held together not by the members' relationships with each other, but by their loyalty to the leader. However, small- and medium-sized cults can also exist, and are perhaps more common than large cults. (Rare is the person who has what it takes to lead a large cult, but you may find yourself at the center of a small cult quite inadvertently.)

#4. "Verily I say unto you"

What the leader says, goes. Members are expected to subordinate their own will and desires to that of the leader. They may advise the leader one way or another, and may bring their disputes to him/her for resolution, but the leader has the ultimate authority and responsibility for the decision.

However, in addition to this "straightforward" kind of cult, there are also various kinds of dysfunctional cults, which (perhaps) give the rest a bad name.

Fractious cults

#5. "This isn't what our Founder would've wanted"

If a cult loses its leader, and if the leader has not raised up a worthy successor, the group will find itself in an unstable zone where its culture is too egalitarian to persist in its super-Dunbar size, because there was never any hierarchy amongst the rank-and-file, only in relation to the leader. Therefore, the group will decay into a more stable configuration (indicated by the dotted arrows), either by someone gaining sufficient personal virtue to become the new leader, or (more likely) splitting into several cliques or guilds, each of which will claim to be the legitimate heir of the original group.

Embarrassed cults

A cult is "embarrassed" when it doesn't want to admit that it's a cult, because the leadership lacks the personal virtue necessary to operate a straightforward cult but still wants to maintain control. They may do this through some combination of pretending that the group's culture is more egalitarian than it actually is, and/or pretending that its size is smaller than it actually is. (This is denoted on the diagram by an arrow with an open circle on its base - the arrowhead is what the group pretends to be, and the base is what the group really is.)

#6. "If you don't like it, you can leave"

"...but we know you're not going to."

The leader of such a group may pretend that they are not claiming any personal authority at all, but "just" observing that the current clique isn't working out (see #2). However, there is an obvious asymmetry in that it is one particular party who is taunting the other one to quit, and not vice-versa. Therefore the subordinate party stands to lose a lot more, and is thus likely to accept a considerable amount of dissatisfaction before they finally decide to leave.

#7. "Yeah, so if you could go ahead and get that done, that'd be great"

In the classic corporate dystopia, HR and management want you to think of your team as a small clique, so that your desire for personal connection will be redirected towards the company. They may ask for your opinion, but have no intention of listening to it. Critics rightly warn young professionals against getting sucked into environments like this, where one is prone to being manipulated into accepting substandard pay and working conditions. The warning usually given is: You should be as loyal to the company as they are to you, i.e. not at all.

#8. "Carried u-... [nervous glances] ...-nanimously"

(Clip 1, clip 2)

A group may put on the trappings of a guild to disguise the fact that it is still exercising top-down authority rather than being a bottom-up enterprise. For example, in a typical homeowners association (HOA), there was never any point at which a group of homeowners got together and decided they wanted to form an HOA. Rather, what usually happens is that a developer buys a large plot of land, builds a bunch of houses on it, and creates an HOA whose membership attaches to each house; the houses are then sold one-by-one to buyers who otherwise have no connection to each other. Most of the homeowners thus have no real interest in participating in the HOA, but begrudgingly accede to the edicts of a handful of busybodies who have too much time on their hands.

Evolution of a growing clique

The culture of a clique may at first be ambiguous (A) because there is nothing really at stake. As it grows, however, if it does not simply break up, it will need to either follow the egalitarian path and become a guild (B), or the authoritarian path and become a cult (C). And in the latter case, the cult will inevitably be an embarrassed one, because if there had been someone with the requisite virtues to be a cult leader, the group would never have spent much time as a clique in the first place, but would have been a straightforward cult (D) from the beginning, and maintained its cultiness throughout its growth.

Therefore, as is probably clear by now, I think outcome C is bad and B is better. If a group has landed at C, then it may with great effort be pulled kicking-and-screaming to B - but this is likely to ruffle some feathers.

(I also suspect that there is a tendency for groups to get stuck at the "triple point" with around 30 members, in an uncomfortable equilibrium between all three types because the group cannot decide what it wants to be.)

What's so great about guilds? (Plan of the sequence)

Forthcoming articles in this sequence will lay out a case for why we should want more guild-like organizations to exist. (Links will be added as the articles are posted.)

  • A guild can grow larger than a clique (This article)
  • A guild makes it possible to improve things without schism (Fear of crowding out)
  • A lack of guilds leads to a general malaise and atrophy of democratic values (We live in a society)
  • A guild can contribute to the social fabric in a way more ambitious than cliques (Call for machers)
  • A guild can be more robust than a cult because it can better distribute important responsibilities ("Community organizer" is a double oxymoron)

Other articles (Society is a social construct, pace Arrow; Rubber stamp errors; Anti-civicality) will discuss various norms that are necessary for a guild to function well, but which may seem strange or unintuitive for people who are accustomed to cliques or cults. I will conclude with a reflection (So are you some kind of communist?) on the tension between social and individual moralities.



Discuss

We need Git for AI Timelines

LessWrong.com news - April 13, 2026 - 12:04

I was recently reading the AI Futures Project's Q1 2026 timelines update and noticed that their quarterly updates (the last one being in December, with the release of the AI Futures Model) are struggling to keep pace with the thing they're trying to track.

The pace of AI development is incredibly fast and only hastening; Kokotajlo shortened his timelines for an AC by 18 months (late 2029 to mid-2028) in a single update due to 4 specific parameter changes. Five days later, Anthropic announced Claude Mythos Preview, which arguably invalidated some of those parameters before the ink had time to dry.

This isn't a criticism of the AI Futures Project; they do commendable work. To be clear, Kokotajlo and the AI Futures Project are arguably the best at what they do in the world. His track record is remarkable, and AI2027 has sparked immense conversation about the future of AI/timelines (it's what got me into LW), but when the field changes completely in its pacing every two months, the community more often than not is navigating with an outdated map. And the problem is getting worse. Mythos hasn't yet been evaluated by METR, Spud hasn't released, and by the time the Q2 update drops, the field will have again shifted to another focal point.


But the cadence itself is only the surface issue; updates aren't nearly granular enough to be tied back to each "step". When Kokotajlo updates his priors for an AC, we don't see the causal chain behind each decision shortening his timelines by X amount. His rationale for setting the AC median at 1 year of autonomous work was that Opus 4.6 "impressed" him. But the actual definition of what 1 year even means remains muddy; the original AI2027 scenario had the median set at 6 months for an SC before moving it back to 3 years. The SC definition shift from 3 years to 1 year accounted for around half of the 18-month change in his Q1 update, and the stated justification is that Opus "impressed" him. Impressed how? At what point between December and April did he change his priors? The entire causal chain here collapses to a single word in a blog post.

In software engineering, this would be the equivalent of someone pushing a commit to main with the message "fixed stuff because it now works". You'd never accept that for code, so why accept it as the justification for forecasts about the most important technological revolution in human history?

There's no unified platform where forecasters can independently publish their timelines with substantial backing and integration with the platform itself. Sure, you can write a Substack article, spin up a short LessWrong post, or post a Twitter thread, but these are strewn all over and discontinuous for someone trying to get a concrete picture of what different forecasters think. One might say Metaculus is the solution; while it is a way of congregating forecasts, it's still less than optimal. Conversation and rationale are walled behind "forecast and pay", without a congregational space to discuss the reasoning behind those forecasts (yes, there is a comment feature, but it is scarcely used). An excellent post on Broad Timelines highlighted this: Metaculus surfaces medians rather than the full distributions that are more sought after in our space.

As neo noted in said post, we need to "design info-UI tools that facilitate that (the timeline formulation) process". Broad distributions need platforms that can track how they update over time. A quarterly blog post cannot do that. Forecasts updated granularly over time with reasoning and deliberation behind them can.

This is why I'm using Git as the analogy: software engineering fixed this class of problem years ago. You have commits (changes in timeline predictions), diffs showing what changed, commit messages showing why, branches for code (in this analogy, scenario) forks, blame for accountability (we need to be less wrong, after all), and merge conflicts that require resolution rather than dissolving into Twitter discourse.


The minimum viable version of this is frankly embarrassingly simple. A GitHub repo with each forecaster maintaining a YAML file with their distribution for an agreed upon definition (whether it be an AC, SC, ASI etc.). Commits are updates to said files/timelines with rationale in the commit message.
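
As a sketch, a single forecaster's entry might look like the following, shown as the Python-dict equivalent of the YAML file. Every field name and value here is my own invention for illustration, not a proposed standard:

```python
# Hypothetical contents of forecasters/alice.yaml, written out as the
# equivalent Python structure (all field names are illustrative).
forecast = {
    "forecaster": "alice",
    "milestone": "AC",                # the agreed-upon definition
    "definition_version": "2026-03",  # which version of that definition
    "updated": "2026-04-13",
    "distribution": {                 # a full distribution, not just a median
        "kind": "quantiles",
        "p10": "2027-06",
        "p50": "2028-07",
        "p90": "2032-01",
    },
    "rationale": "see commit message for which parameter moved and why",
}

# A "commit" is then just a change to this file plus a message explaining
# the causal chain behind the update.
median = forecast["distribution"]["p50"]
```

Diffs between commits would then show exactly which quantile moved, by how much, and when.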

Claude Opus 4.6 had an 80% time horizon of 70 minutes. Assuming Mythos has an 80% time horizon of ~240 minutes, the doubling time is ~34-40 days. Even if we're pessimistic and assume a time horizon of 180 minutes, the doubling time is still ~45 days. The doubling time of the thing we're forecasting is now shorter than our update cycle.
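
For concreteness, the doubling time is just the calendar gap between the two measurements divided by the number of doublings over that gap. The gap isn't stated explicitly above, so the sketch below assumes roughly two months (63 days) between the Opus 4.6 and Mythos measurements:

```python
import math

def doubling_time(h0: float, h1: float, days_between: float) -> float:
    """Days for the time horizon to double, given measurements h0 -> h1."""
    return days_between / math.log2(h1 / h0)

# Assumed: 70 min -> ~240 min over ~63 days (the gap is my assumption).
print(doubling_time(70, 240, 63))  # ~35 days
# Pessimistic case: 70 min -> 180 min over the same gap.
print(doubling_time(70, 180, 63))  # ~46 days
```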


The rationalist community, of all communities, should find that unacceptable.



Discuss

Treaties, Regulations, and Research can be Complements

LessWrong.com news - April 13, 2026 - 10:04

I think the debate over whether AI risk should be addressed via regulation or treaties is often oversimplified and confused. These are not substitutes. They rely on overlapping underlying capacities, they address different classes of problems, and both can benefit from certain classes of research.

David Krueger, to pick on someone whose work I largely agree with, recently posted that “Stopping AI is easier than regulating it.” I largely agree with what he says. Unfortunately, I also think it is an example[1] of advocates for a cause creating fights where they're not needed, in this case making the discussions around AI more rather than less contentious, and less rather than more effective.

And the reason the fights are not needed is that different risks live at different levels, and different tools are effective in different ways.

Clearly, many of the risks and harms of AI should not be addressed internationally. There is little reason or ability to harmonize domestic laws on fraud, discrimination, or liability, and trying would be a distraction from either reducing the harms or addressing other risks. Existing laws should be adapted and applied, and new regulations should be formulated where needed. International oversight would be unwieldy and ineffective even for most treaty compliance efforts - as other treaties show, there is a mix of national and international oversight. But domestic regulation can create liability incentives, require or standardize audits, clarify rules, and provide enforcement mechanisms and resources. All of those are at least sometimes useful for treaties as well. When Krueger says “the way I imagine stopping AI is actually a particular form of regulating AI,” he is not talking about the harms and risks regulation could address - though given what he has said elsewhere, he agrees that many of them are worth mitigating, even if they are not his highest priority. So it should be clear that treaties will not, cannot, and should not address most prosaic risks of AI systems and misuse.

By the converse argument, which he and others have made convincingly in the past, some harms of AI systems come from racing towards capability rather than prioritizing safety. These types of risk emerge from the dynamics of international markets and from great power competition. Obviously, these dynamics aren’t well addressed by domestic regulation on the part of any single actor. It is incomprehensible to talk about regulation alone to address those risks, just like it is tendentious to talk about using international treaties to mitigate other classes of risks and harms of AI systems.

Unfortunately, many discussions put “we need a global treaty to stop AI risks” in opposition to “domestic regulation is the only realistic path.” Not only do I think this is backwards, but I’ll argue that so is the related false dichotomy of industry self-regulation versus government rules. Industries that embrace safety welcome well-built regulation. Even in areas where they don’t have strict rules, airlines have national bodies that manage risk and accident reporting. (And the AI industry leaders often claim to be the same way, wanting national or international rules - just not any specific ones.)

So, to come to my unsurprising conclusion, we actually have several different plausibly positive and at least partially complementary approaches. 

  1. Certain classes of research produce techniques like evals, interpretability, human oversight approaches, control methods, and operationalizable definitions of specific risks. Some of these are dual-use or net negative, but the parts that are useful are complementary to both regulation and treaties.
  2. Regulation needs operationalized definitions of risks, measurable standards, concrete goals, auditable procedures and oversight methods, and investigatory tools. Many of these are enabled by specific forms of technical or policy safety research. 
  3. Treaties need shared definitions, clear goals, regulatory oversight and enforcement, credible verification, and both technical and regulatory methods to distinguish compliance from defection. Some of these are enabled by regulation, some by relevant research.

So we end up with a sort of triad: research enables measurement and definitions and provides tools; regulation forces adoption and enforces usage of those tools; and treaties align incentives around defection dilemmas and provide common aims.

This doesn’t imply that most safety research is net risk-reducing, that most regulation is useful, or that most possible treaties will reduce risks. But it does say that they can be complementary. Some disagreements are substantive. But others are treating complementary approaches as mutually exclusive - and I think we should instead figure out common ground, which can make the fights about these issues both more concrete, and narrower.

  1. ^

    yet another example



Discuss

5 Hypotheses for Why Models Fail on Long Tasks

LessWrong.com news - April 13, 2026 - 09:54

Written extremely quickly for the InkHaven Residency.

Like humans, AI models do worse on tasks that take longer to complete. But they seem to degrade on longer tasks more sharply than humans do.

This is a big part of why the METR time horizon results make sense: because longer tasks are also “harder” for models, and more capable models can do longer tasks, we can use the length of tasks that the models can perform as a metric of model capability.

There’s a clear etiological or causal-historical explanation of why models do worse at long tasks: they’re probably trained on more short tasks and fewer long tasks. This is both because it’s easier to make shorter tasks, and because you can train models on more short tasks than longer tasks with a fixed compute budget.

But from the perspective of AI evaluations, it’s also worth considering mechanistic explanations that make reference only to how properties of long tasks interact with the AI system in deployment. Whatever the training story may be, the AI models as they currently exist have some property that makes long tasks genuinely harder for them in a way that tracks capability. Understanding what this property is could matter a lot for interpreting the METR time horizon and even for forecasting AI capabilities over time.

So here are five such possible hypotheses that explain why longer tasks seem consistently harder for current models, based in large part on my experience at METR.

Long tasks are less well defined, and require judgment or taste (which models are bad at). For a software engineer, a 1-minute coding task might involve composing a single 10-line function or running a relatively simple SQL query. By their very nature, these tasks tend to be easy to define and easy to score, with relatively objective success criteria and little human judgment involved. A 15-minute task may be implementing a relatively simple data processing script or fixing a simple bug: more complicated, but still relatively easy to score. In contrast, an 8-hour task likely involves substantial amounts of design taste (in ways that are harder to score), and month-long tasks likely involve communicating with a stakeholder or building code with properties that are hard to algorithmically verify (e.g. maintainability). (This is also related to why algorithmically scorable longer tasks are harder to make.)

While the longer METR tasks are still algorithmically scored, they tend to require models to build sophisticated software artifacts or iteratively improve on experiment design, where taste plays a larger role in success. Since models seem to lack ‘taste’ of some sort, relative to humans of comparable execution ability (hence the complaints about AI Slop), this could explain why they do worse on longer tasks.

Long tasks require more narrow expertise (which models may not have). An important property of the METR task suite is that longer tasks should not be trivially decomposable into shorter tasks. That is, a 10-hour task should not be trivially decomposable into 10 1-hour tasks, and 10 short math problems do not combine into a single longer math problem. Perhaps as an artifact of this property, many of METR’s longer tasks (and perhaps longer tasks in people’s day-to-day work in general) rely on more specialized procedural knowledge that is hard to easily acquire via Google. For example, many of METR’s long tasks are cryptographic or machine learning challenges that require some amount of procedural knowledge in the relevant fields to approach. Insofar as the long tasks are more likely to require procedural knowledge outside the AI models’ area of expertise, they may struggle.

Personally, I find this relatively unlikely as an explanation for the METR time horizon tasks (since AI models seem to have a lot of expertise in the relevant areas), but it might be a large explanation for the inability of AIs to autonomously complete large tasks in general.

Long tasks take models longer, leading to more stochastic failures (which models exhibit). A popular explanation that people cite is that tasks that take humans longer also take AI agents more steps to complete, and AIs are not fully reliable, failing with some small probability on each step. For example, Toby Ord raises this as a hypothesis in a response to our Time Horizon paper.

I think this is definitely part of the explanation (and why longer tasks are harder for humans as well), with some caveats: first, I caution against naively interpreting human time as proportional to AI steps and applying a constant hazard model. For example, it turns out that if you fit a failure-rate model for AI agents over time, the failure rate goes down as the task goes on! Second, AI models seem to have different time horizons across different domains, and simple versions of this hypothesis cannot explain that phenomenon.
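To make the constant hazard model concrete, here is a minimal sketch (the function names and the 1% per-step failure rate are illustrative assumptions, not figures from METR's data): if an agent fails independently with probability p on every step, success on an n-step task decays exponentially, and the length at which success drops to 50% is ln 2 / −ln(1 − p) steps.

```python
import math

def success_prob(n_steps: int, p_fail: float) -> float:
    """Constant-hazard model: each step independently fails with
    probability p_fail, so an n-step task succeeds with (1 - p_fail)^n."""
    return (1.0 - p_fail) ** n_steps

def horizon_50(p_fail: float) -> float:
    """Task length (in steps) at which the success probability
    under the constant-hazard model falls to 50%."""
    return math.log(2) / -math.log(1.0 - p_fail)

# With an illustrative 1% per-step failure rate, the 50% horizon is
# about 69 steps; halving p_fail roughly doubles the horizon.
print(horizon_50(0.01))   # ~68.97
print(horizon_50(0.005))  # ~138.28
```

Note that the declining failure rate mentioned above is exactly what this model cannot produce: under a constant hazard, the conditional failure probability is the same at step 500 as at step 5.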

Long tasks take models longer, causing failures due to distribution shift or self conditioning (which models may suffer from). A related explanation is that longer tasks take models more off-distribution: base models (at least earlier on) were not trained to predict long sequences of model-generated outputs, and even RLVR’ed models were probably trained with short tasks, far shorter than the 16-hour, tens-of-millions-of-token tasks that we might ask them to do. This increases both the chance that the models are simply off distribution (and thus may be less competent in general), and the chance that they accumulate errors by chance and start conditioning on being the type of agent that makes such mistakes (and thus become more prone to making such mistakes). In the same way that naive versions of the constant hazard model seem contradicted by evidence, I suspect that naive versions of this hypothesis are also likely to fail. But it’s possible that more sophisticated versions may play a key role in explaining the phenomenon.

Long tasks require better time and resource management (which models struggle with). Finally, an explanation that I often think is neglected is that longer tasks tend to require meta-cognition and explicit strategy, which current models seem to struggle with. A 5-minute task such as writing a simple function or script can be done in one go, without much planning, but getting the best score in a machine learning experiment over 8 hours requires allocating scarce resources, including remaining time and compute. It’s been observed that models struggle to track how much (wall-clock) time particular tasks take them, and they often double down on failing approaches instead of switching strategies.

I welcome more thinking on this topic, as well as more empirical work to distinguish between these hypotheses.


My Cold Prevention Stack for 2026

LessWrong.com News - April 13, 2026 - 09:04

I get sick a lot. Getting sick sucks. Maybe there are cheap and easy ways to get sick less? 

I asked LLMs[1] to read all the relevant literature reviews and figure out what supplements or medicine I should be taking to get sick less and make it suck less. I looked through the recommendations and did a little additional research to make sure the AIs weren’t making egregious mistakes, but I am not an expert—this should not be viewed as credible medical advice. 

Here is the quick list of steps I am currently taking or think might be useful to others. 

  1. Zinc lozenges: When you are starting to get sick, take zinc lozenges aggressively. They need to be a specific type of lozenge (Amazon, Life Extension). Suck, don’t chew: you’re trying to maximize the time they spend dissolving in your mouth, and don’t eat or drink for 20 minutes after. Aim for one every 2 hours (~6 per day). Literature review. More notes in the appendix for this one. 
  2. Probiotics: For prevention, take specific probiotics once daily with a meal (Amazon). There are various products that have support in the literature, and it makes sense to buy one of them rather than a random probiotic (some strains appear to not work). The effect size is like a 25% reduction in colds—suspiciously high! Literature review
  3. Standard medication when sick for symptom relief (check for side effects and interaction with pre-existing conditions): NSAIDs (ibuprofen) are primarily for headache, ear pain, and muscle and joint pain (Literature review); same for acetaminophen (Tylenol). For chest congestion, Mucinex. Nasal decongestants (e.g., Sudafed) or a combination antihistamine-decongestant-analgesic (e.g., NyQuil Severe Cold & Flu) also might help with symptom relief. Note that oral phenylephrine has been deemed ineffective by the FDA, even though it’s common, so maybe use a different decongestant, specifically pseudoephedrine (available behind the pharmacy counter)? Literature review
  4. Obvious things to do when sick not necessarily backed by literature: Rest more, drink lots of water
  5. Physical things to get sick less: Wash your hands with soap and water; it’s probably good to use hand sanitizer before eating; it’s probably good to wear a mask in crowded indoor spaces (but the evidence isn’t very strong); also avoid touching your face if possible. Literature review for some of these. 
  6. Maybe take vitamin C megadoses regularly. The literature is mixed here and the side effects (stomach problems) are too much for me, but it might be good to take 1g of vitamin C daily. Literature review
  7. Maybe you should gargle salt water? I’m not sure, but it is cheap to try. 
  8. Maybe you should do nasal saline rinses? The literature is inconclusive but some people swear by it. If doing this, use distilled water. 
  9. Maybe you should get a flu shot. It reduces the chance of getting the flu and might make the flu less bad. But your chance of getting the flu is already pretty low, and the side effects of the vaccine are nontrivial for some people, so it’s not clearly worthwhile (I am surprised at how much this isn’t a slam dunk in favor of the vaccine). Literature review

If you work on important problems then getting your coworkers sick is bad for the world (in addition to bad for them). If you are going to work while sick, consider doing it from home. If you work from the office, you should wear a mask, wash your hands frequently (and especially before touching a bunch of communal stuff), and cover your cough/sneeze (not with your hand). 

Appendix on zinc lozenges

Zinc acetate lozenges. 

What: When you start feeling any symptoms at all, or when you’ve been exposed, start sucking on zinc lozenges. Your goal is to coat your mouth and throat in zinc for basically as long as possible. So you should be sucking each lozenge for 20-30 minutes (don’t chew), and then don’t drink or eat anything for another 20 minutes. Aim for 5-7 lozenges in a day, once every two hours or so. 

What to buy: The particular lozenge probably matters a lot! The lozenges you want are big and slow to dissolve: Amazon link, manufacturer’s link (note that Amazon is frequently out of stock, and the manufacturer gives discounts for larger orders; I might buy 4 bottles at a time). 

Evidence basis: A literature review pointing to the fact that they might reduce cold duration somewhat. The main counterevidence is an RCT finding either similar or worse recovery than placebo. 

Notes: 

  • Don’t use these all the time, only when you’re worried you’re getting sick. Zinc in such large quantities interferes with copper and iron absorption and probably has other downsides. 
  • Some people report stomach problems. 
  • I find that when taking these lozenges, my colds are much more mild than usual and I can usually work at least half my normal productivity while sick. 
  • I have heard many positive anecdotes about these. 
  • Some people don’t like the taste/texture. 
  • Other discussion on LessWrong
  1. ^

    Here’s a ChatGPT chat with an initial research report.

