Only Law Can Prevent Extinction
There's a quote I read as a kid that stuck with me my whole life:
"Remember that all tax revenue is the result of holding a gun to somebody’s head. Not paying taxes is against the law. If you don’t pay taxes, you’ll be fined. If you don’t pay the fine, you’ll be jailed. If you try to escape from jail, you’ll be shot."
-- P. J. O'Rourke.
At first I took away the libertarian lesson: Government is violence. It may, in some cases, be rightful violence. But it all rests on violence; never forget that.
Today I do think there's an important distinction between two different shapes of violence. It's a distinction that may make my fellow old-school classical Heinlein liberaltarians roll their eyes about how there's no deep moral difference. I still hold it to be important.
In a high-functioning ideal state -- not all actual countries -- the state's violence is predictable and avoidable, and meant to be predicted and avoided. As part of that predictability, it comes from a limited number of specially licensed sources.
You're supposed to know that you can just pay your taxes, and then not get shot.
Is there a moral difference between that and outright banditry? To the vast majority of ordinary people rather than political philosophers, yes.
"Violence", in ordinary language, has the meaning of violence that is not predictable, that is not avoidable, that does not come from a limited list of sources whose rules people can learn.
Violence that is predictable and avoidable to you, whose consequences are regular and not chaotic, can of course still be terribly unjust and not to your own benefit. It doesn't rule out a peasant being told to hand over two thirds of their harvest in exchange for not much. It doesn't rule out your rent becoming huge because it's illegal to build new housing, etcetera etcetera. Laws can still be bad laws. But it is meaningfully different to the people who live under those unjust laws, if they can at least succeed in avoiding violence that way.
The point of a "state monopoly on violence", when it works, is to have violence come from a short list of knowable sources. A bullet doesn't make a smaller hole when fired by someone in a tidy uniform. But oligopolized force can be more avoidable, because it comes from a short list of dangers -- country, state, county, city -- whose actual rules are learnable even by a relatively dumb person. Ideally. In a high-functioning society.
The Earth presently has a problem. That problem may need to be prevented by the imposition of law, though hopefully not much actual use of force.
The problem, roughly speaking, is that if AI gets very much smarter, it is liable to turn into superhuman AI / machine superintelligence / artificial superintelligence (ASI). Current AIs are not deadly on that scale, but they are increasing in capability fast and breaking upward from previous trend lines. ASI might come about through research breakthroughs directly advancing AI to a superhuman level; or because LLMs got good enough at half-blindly tweaking the design to make a smarter AI, that is then sufficiently improved to make an even smarter AI, such that the process cascades.
AIs are not designed like a bicycle, or programmed and written like a social media website. There's a relatively small piece of code that humans do write, but what that code does, is tweak hundreds of billions of inscrutable numbers inside the actual AI, until that AI starts to talk like a person. The inscrutable numbers then do all sorts of strange things that no human decided for them to do, often things that require intelligence; like breaking out of containment during testing, or talking a human into committing suicide.
Controlling entities vastly smarter than humanity seems like it would, obviously, be the sort of problem that comes with plenty of subtleties and gotchas that can only be learned through practice. Some of the clever ideas that seemed to work fine at the non-superhuman level would fail to control strongly superhuman entities. Dynamics would change; something would go wrong. Probably a lot of things would go wrong, actually. It is hard to scale up engineering designs to vast new scales, and have them work right without a lot of further trial-and-error, even when you know how their internals work. To say nothing of this creation being an alien intelligence smarter than our species, a new kind of problem in all human history... I could go on for a while.
The thing about building vastly superhuman entities, is that you don't necessarily get unlimited retries like you usually do in engineering. You don't necessarily get to know there's a problem, before it's much too late; superhuman AIs may not decide to tell you everything they're thinking, until they are ready to wipe us off the board. (It's already an observed phenomenon that the latest AIs are usually aware of being tested, and may try to conceal malfeasance from an evaluator, like writing code that cheats at a code test and then cleans up the evidence after itself.)
Elon Musk's actual stated plan for Grok, grown on some of the largest datacenters in the world, is that he need only build a superintelligence that values Truth, and then it will keep humans alive as useful truth-generators. That he hasn't been shouted down by every AI scientist on Earth should tell you everything you need to know about the discipline's general maturity as an engineering field. AI company founders and their investors have been selected to be blind to difficulties and unhearing of explanations. If Elon were the sort of person who could be talked out of his groundless optimism, he wouldn't be running an AI company; so also with the founders of OpenAI and Anthropic.
If you need to read a statement by a few hundred academic computer scientists, Nobel laureates, retired admirals, etcetera, saying that yes AI is an extinction risk and we should take that as seriously as nuclear war, you can go look here. Frankly, most of them are relative latecomers to the matter and have not begun to grasp all the reasons to worry. But what they have already grasped and publicly agreed with, is enough to motivate policy.
I realize this might sound naively idealistic. But I say: The utter extermination of humanity, would be bad! It should be prevented if possible! There ought to be a law!
Specifically: There ought to be a law against further escalation of AGI capabilities, trying to halt it short of the point where it births superintelligence. A line drawn sharply and conservatively, because we don't know how much further we can dance across this minefield before something explodes. My organization has a draft treaty online, but a bare gloss at "Okay what does that mean tho" would be: All the hugely expensive specialized chips used to grow large AIs, and run large AIs, would be collected in a limited number of datacenters, and used only under international supervision.
It would be beneath my dignity as a childhood reader of Heinlein and Orwell to pretend that this is not an invocation of force.
But it's the sort of force that's meant to be predictable, predicted, avoidable, and avoided. And that is a true large difference between lawful and unlawful force.
There's in fact a difference between calling for a law, and calling for individual outbursts of violence. (Receipt that I am not arguing with a strawman, and that some people purport to not understand any such distinction: Here). Libertarian philosophy aside, most normal ordinary people can tell the difference, and care. They correctly think that they are less personally endangered by someone calling for a law than by someone calling for street violence.
But wait! The utter extinction of humanity -- argue people who do not believe that premise -- is a danger so extreme, that belief in it might possibly be used to argue for unlawful force! By the Fallacy of Appeal to Consequences, then, that belief can't be true; thus we know as a matter of politics that it is impossible for superintelligence to extinguish humanity. Either it must be impossible for any cognitive system to exist that is advanced beyond a human brain; or the many never-challenged problems of controlling machine superintelligence must all prove to be easy. We cannot deduce which of these two facts is true, but their disjunction must be true and also knowable, because if it weren't knowable, somebody might be able to argue for violence. Never in human history has any proposition proven to be true if anyone could possibly use it to argue for violence. The laws of physics check whether that could be a possible outcome of any physical situation, and avoid it with perfect reliability.
That whole line of reasoning is deranged, of course.
I will nonetheless proceed to spell out why its very first step is wrong, ahead of all the insanity that followed:
Unlawful violence is not able, in this case, to prevent the destruction of the world.
If an ASI ban is to accomplish anything at all, it has to be effective everywhere. When the people behind one proposed national ban asked me, "What do you think about our proposed national ban on more datacenters until they have sensible regulations?" I replied to them, "An AI can take your job, and a machine superintelligence can kill you, just as easily from a datacenter in another country." They later added a provision saying that GPUs also couldn't be exported to other countries until those countries had similar sensible regulations. (I am still feeling amazed, awed, and a little humbled, about the part where my words plausibly had any effect whatsoever. Politicians are a lot more sensible, in some real-life cases, than angry libertarian literature had led me to believe a few decades earlier.)
Datacenters in Iceland, if they were legal only there, could just as much escalate AI capabilities to the point of birthing the artificial superintelligence (ASI) that kills us. You would not be safe in your datacenter-free city. You can imagine the ASI side as having armies of flying drones that search everywhere; though really there are foreseeable, quickly-accessible-to-ASI technologies that would be much more dangerous than drone swarms. But those would take longer to explain, and the drone swarms suffice to make the point. You could not stay safe from ASI by hiding in the woods.
On my general political philosophy, if a company's product only endangers voluntary customers who know what they're getting into, by strong default that's a matter between the company and the customer.
If a product might kill someone standing nearby the customer, like cigarette smoke, that's a regional matter. Different cities or countries can try out different laws, and people can decide where to live.
If a product kills people standing on the other side of the planet from the customer, then that's a matter for international negotiations and treaties.
ASI is a product that kills people standing on the other side of the planet. Driving an AI company out of just your own city will not protect your family from death. It won't even protect your city from job losses, earlier in the timeline.
And similarly: To impede one executive, one researcher, or one company, does not change where AI is heading.
If tomorrow Demis Hassabis said, "I have realized we cannot do this", and tried to shut down Google DeepMind, he would be fired and replaced. If Larry Page and Sergey Brin had an attack of sense about their ability to face down and control a superintelligence, and shut down Google AI research generally, those AI researchers would leave and go to other companies.
Nvidia is currently the most valuable company in the world, with a $4.5 trillion market capitalization, because everyone wants more AI-training chips than Nvidia has to sell. The limiting resource for AI is not land on which to construct datacenters; Earth has a lot of land. Banning a datacenter from your state may keep electricity cheaper there in the medium term, but it won't stop the end of the world.
The limiting resource for AI is also not the number of companies pursuing AI. If one AI company was randomly ruined by its country's government, other AI companies would swarm around to buy chips from Nvidia instead, and Nvidia would stay at full production and sell every chip it made. The end of the world would carry on.
There is no one researcher who holds the secret to your death. They are all looking for pieces of the puzzle to accumulate, for individual rewards of fame and fortune. If somehow the person who was to find the next piece of the puzzle randomly choked on a chicken bone, somebody else would find a different puzzle piece a few months later, and Death would march on. AI researchers tell themselves that even if they gave up their enormous salaries, that wouldn't help humanity much, because other researchers would just take their place. And the grim fact is that this is true, whether or not you consider it an excuse.
In other cases of civic activism, you can prevent one coal-fired power plant from being built in your own state, and then there is that much less carbon dioxide in the atmosphere and the world is a little less warm a century later. Or if you are against abortions, and you get your own state to outlaw abortions, perhaps there are then 1000 fewer abortions per year and that is to you a solid accomplishment. Which is to say: You can get returns on your marginal efforts that are roughly linear with the effort you put in.
The ASI problem is not like this. If you shut down 5% of AI research today, humanity does not experience 5% fewer casualties. We end up 100% dead after slightly more time. (But not 5% more time, because AI research doesn't scale in serial speed with the number of parallel researchers; 9 women can't birth a baby in 1 month.)
So we don't need to have a weird upsetting conversation about doing bad unlawful things that would supposedly save the world, because even if someone did a very bad thing, that still wouldn't save the world.
This is a point that some people seem to have a very hard time hearing -- though those people are usually not on the anti-extinction side, to be clear. It's more that some people can't imagine that superhuman AI could be a serious danger, to the point where they have trouble reasoning about what that premise would imply. Others are politically opposed to AI regulation of any sort, and therefore would prefer to misunderstand these ideas in a way where they must imply terrible unacceptable conclusions.
I understand the reasons in principle. But it is a strange and frustrating phenomenon to encounter in practice, in people who otherwise seem coherent and intelligent (though maybe not quite on the level of GPT 5.4). Many people believe, somehow, that other people ought to think -- not themselves, only other people -- that outbursts of individual violence just have to be helpful. If you were truly desperate, how could you not resort to violence?
But even if you're desperate, an outburst of violence usually will not actually solve your problems! That is a general truism in life, and it applies here in full force.
Even if you throw away all your morals, that doesn't make it work. Even if you offer your soul to the Devil, the Devil is not buying.
How certain do you have to be that your child has terminal cancer, before you start killing puppies? 10% sure? 50% sure? 99.9%? The answer is that it doesn't matter how certain you are, killing puppies doesn't cure cancer. You can kill one hundred puppies and still not save your kid. There is no sin so great that it just has to be helpful because of how sinful it is.
Statistics show that civil movements with nonviolent doctrines are more successful at attaining their stated goals (especially in states that otherwise have functioning police). The factions that throw away all their morals lose the sympathy of the public and politicians, and then they fail. Terrorism is not an instant 'I win' button that people only refrain from pressing because they're so moral. Society has succeeded in making it usually not pay off -- say the numbers.
Being really, really desperate changes none of those mechanics.
Almost everyone who actually accepts a fair chance of ASI disaster doesn't seem to have a hard time understanding this part. It's an obvious consequence of the big picture, if you actually allow that big picture inside your head.
But it is hard for a human being to understand a thing, if it would be politically convenient to misunderstand. Opponents of AI regulation want any danger of extinction to imply unacceptable consequences.
They understand on some level how the AI industry functions. But they become mysteriously unable to connect that knowledge to their model of human decisionmaking. You can ask them, "If tomorrow I was arrested for attacking an AI-company headquarters, would you read that headline, and conclude that AI had been stopped in its tracks forever and superintelligence would never happen?" and get back blank stares.
Even some people that are not obviously politically opposed seem to stumble over the idea. I'm genuinely not sure why. I think maybe they are having trouble processing "Well of course ASI would just kill everyone, we're nowhere near being able to control it" as an ordinary understanding of the world, the way that 20th-century concerns about global nuclear war were part of a mundane understanding of the world. "If every country gets nuclear weapons they will eventually be used" was not, to people in 1945, the sort of belief where you have to prove how strongly you believe it by being violent. It was just something they were afraid would prove true about the world, and then cause their families to die in an unusually horrible kind of fire. So they didn't randomly attack the owners of uranium-mining companies, to prove how strongly they believed or how worried they were; that, on their correct understanding of the world, would not have solved humanity's big problem -- namely, the inexorable-seeming incentives for proliferation. Instead they worked hard, and collected a coalition, and built an international nuclear anti-proliferation regime. Both the United States and the Soviet Union cooperated on many aspects of that regime, despite hating each other quite a lot, because neither country's leaders expected they'd have a good day if an actual nuclear war happened.
The sort of conditionally applicable force that could stop everyone from dying to superhuman AI, would have to be everywhere and reliable; uniform and universal.
Let it be predictable, predicted, avoidable, and avoided.
It is so much a clear case for state-approved lawful force, that there would be little point in adding any other kind of force to the mix. It would just scare and offend people, and they'd be valid to be scared and offended. People don't like unguessably long lists of possible violence-sources in their lives, for then they cannot predict it and avoid it.
I did spell out the necessity of the lawful force, in first suggesting that international policy. Some asked afterward, "Why would you possibly mention that the treaty might need to be enforced by a conventional airstrike, if somebody tried to defy the ban?" One reason is that some treaties aren't real and actually enforced, and that this treaty needs to be the actually-enforced sort. Another reason is that if you don't spell things out, that same set of people will make stuff up instead; they will wave their hands and say, "Oh, he doesn't realize that somebody might have to enforce his pretty treaty."
And finally it did seem wiser to me, that all this matter be made very plain, and not dressed up in the sort of obscuring language that sometimes accompanies politics. For an international ASI ban to have the best chance of operating without its force actually being invoked, the great powers signatory to it need to successfully communicate to each other and to any non-signatories: We are more terrified of machine superintelligence killing everyone on Earth than we are reluctant to use state military force to prevent that.
If North Korea, believed to have around 50 nuclear bombs, were to steal chips and build an unmonitored datacenter, I would hold that diplomacy ought to sincerely communicate to North Korea, "You are terrifying the United States and China. Shut down your datacenter or it will be destroyed by conventional weapons, out of terror for our lives and the lives of our children." And if diplomacy fails, and the conditional use of force fires, and then North Korea retaliates with a first use of its nuclear weapons? I don't think it would; that wouldn't end well for them, and they probably know that. But I also don't think this is a hypothetical where sanity says that we are so terrified of someone's possible first use of nuclear weapons, that we let them shatter a setup that protects all life on Earth.
You'd want to be very clear about all of this in advance. Countries not understanding it in advance could be very bad. History shows that is how a lot of wars have started, through someone failing to predict a conditional application of force and avoid it. One historical view suggests that Germany invaded Poland in 1939 in part because, when Britain tried to warn that it would defend Poland, Hitler read the messages himself, instead of having the professional diplomats explain them to him; he read the standard diplomatic politesse and soft words as conciliatory; and thus began World War II. More recently, a similar diplomatic misunderstanding by Saddam Hussein is thought to have contributed to his 1990 invasion of Kuwait, which then in fact provoked a massive international response. I've sometimes been criticized for trying to spell out proposed policy in such awfully plain words, like saying that the allies might have to airstrike a datacenter if diplomacy failed. Some people -- reaching pretty hard, in my opinion -- claimed that this must be a disguised incitement to unlawful violence. But being very clear about the shape of the lawful force was important, in this case.
And then, all that policy is sufficiently the obvious and sensible proposal -- following from the ultimately straightforward realization that something vastly smarter than humanity is not something humanity presently knows how to build safely -- and never mind how bad it starts looking if you learn details like Elon Musk's stated plan -- that some people find it inconveniently difficult to argue with. Unless they lie about what the proposal is.
So I am misquoted (that is, they fabricate a quote I did not say, which is to say, they lie) as calling for "b*mbing datacenters", two words I did not utter. In the first 2023 proposal in TIME magazine, I wrote the words "be willing to destroy a rogue datacenter by airstrike". I was only given one day by TIME to write it -- otherwise it wouldn't have been 'topical' -- but I had thought I was saying that part quite carefully. Even quoted out of context, I thought, this ought to make very clear that I was talking about state-sanctioned use of force to preserve a previously successful ban from disruption. And absolutely not some guy with a truck bomb, attacking one datacenter in their personal country while all the other datacenters kept running.
And that phrasing is clear even when quoted out of context! If quoted accurately. So some (not all) accelerationists just lied about what was being advocated, and fabricated quotes about "b*mbing datacenters". When called out, they would protest, "Oh, you pretty much said that, there's no important difference!" To this as ever the reply is, "If it is worth it to you to lie about, it must be important."
A similarly fabricated quote says that I proposed "nuking datacenters". Ladies, gentlemen, all others, there is absolutely no reason to nuke a noncompliant datacenter. In the last extremity of failed diplomacy, a conventional missile will do quite well. The taboo against first use of nuclear weapons is something that I consider one of the great triumphs of the post-WW2 era. I am proud as a human being that we pulled that off. Nothing about this matter requires violating that taboo. We should not be overeager to throw away all limits and sense, and especially not when there is no need. Life on Earth needs to go on in the sense of "life goes on", not just in the sense of "not being killed by machine superintelligences".
It is sometimes claimed that ASI cannot possibly be banned without a worldwide tyranny -- by people who oppose AI regulation and so would prefer it to require horrifying unacceptable measures.
At the very least: I don't think we know this to be true to the point we should all lie down and die instead.
At least until recently, humanity has managed to not have every country building its own nuclear arsenal. We did that without everyone on Earth being subjected to daily-required personal obediences to the International Atomic Energy Agency. Some people in the 1940s and 1950s thought it would take a tyrannical world dictatorship, to prevent every country from getting nuclear weapons followed by lots of nuclear war! Shutting down all major wars between major powers, or slowing that kind of technological proliferation, had never once been done before, in all history! But those worried skeptics were wrong; for some decades, at least, nuclear proliferation was greatly slowed compared to the more pessimistic forecasts, without a global tyranny. And now we have that precedent to show it can be done; not easily, not trivially, but it can be done.
For the supervast majority of normal people, "Don't spend billions of dollars to smuggle computer chips, construct an illegal datacenter, and try to build a superintelligence" is a very small addition to the list of things they must not do. Surveys seem to show that most people think machine superintelligence is a terrible idea anyway. (Based.)
And the few who feel really personally bothered by that law?
They may be sad. They'll definitely be angry. But they'll survive. They wouldn't actually survive otherwise.
My will for Sam Altman's fate is that he need only fear the use of force by his country, his state, his county, and his city, as before; with the difference that Sam Altman, like everyone else on Earth, is told not to build any machine superintelligences; and that this potential use of state force against his person be predictable to him, and predicted by him, and avoidable to him, and avoided by him; with him as with everyone. That's how it needs to be if any of us are to survive, or our children, or our pets, or our garden plants.
Let Sam Altman have no fear of violence beyond that, nor fire in the night.
Artificial superintelligence is the very archetype and posterchild of a problem that can only be solved with force that has the shape of law, as in state-backed universal conditional applications of force meant to be predictable and avoided. Anything which is not that does not solve the problem.
And when somebody does throw a Molotov cocktail at Sam Altman's house, that is not actually good for the anti-extinction movement, as anyone with the tiniest bit of sense could and did predict.
Currently all the anti-extinctionist leaders are begging their people to not be violent -- as they've said in the past, but louder now. And conversely some of the accelerationists are trying to goad violence, in some cases to the shock of their usual audiences:
That this sentiment is not universal among accelerationists, is seen immediately from the protestor in their replies. Let us, if not them, be swift to fairly admit: We are observing bad apples and not a bad barrel.
But also to be clear, those bad apples were also trying to goad people into violence earlier, in advance of the attacks on Altman:
To this tweet I will not belabor the reply that anti-extinctionists may be good people with morals; some good people might nod, but others would find it unconvincing, and there is one analysis that answers for all: It would not work. And given that it would not save humanity, anti-extinctionists make the obvious estimate that our own cause would be, and has been, harmed by futile outbursts of unlawful violence.
Conversely, some accelerationists behave as if they want to spread the word and meme of violence as far as possible. It is reasonable to guess that some part of their brain has considered the consequences of somebody being moved by their taunts, and found them quite acceptable. If they can goad somebody labelable as anti-extinctionist to violence, that benefits their faction. They may consider Sam Altman replaceable to their cause, so long as there is no law and treaty to stop all the AI companies everywhere.
They're right. Sam Altman is not the One Ring. He is not Sauron's one weakness. If anything happened to him, AI would go on.
I am posting these Tweets in part to say to any impressionable young people who may consider themselves humanity's defenders, who are at all willing to listen to their allies rather than their enemies: Hey. Don't play into their hands. They're taunting you exactly because violence is good for their side and bad for ours. If it were true that violence could help you, if they expected that violence would hurt AI progress more than it helped their side politically, they'd never taunt you like that, because they'd be afraid rather than eager to see you turn to violence. They're saying it to you because it's not true; and if it were true, they'd never say it to you. They're not on your side, and the advice implied by the taunts is deliberately harmful for you and good for them.
This is of course a general principle when somebody is taunting you. It means they want you to fight, which means they expect to benefit from you trying.
Don't believe their taunts. Believe what is implied by their act of taunting, that violence hurts you and helps them. That part is accurate, obvious, and not at all hard for their brains to figure out in the background, before they choose to taunt you.
It makes sense to me that society penalizes factions that appear to benefit from violence, even if their leaders try to disclaim that violence. Intuitively, you don't want to create a vulnerability in society where faction leaders could gain an advantage by sending out assassins and then publicly disclaiming them.
But at the point where some accelerationists are openly trying to goad anti-extinctionists into violence, while the anti-extinctionist leaders beg for peace -- this suggests society has gone too far in the direction of punishing the 'violent' faction for what is probably, in real life, a lone rogue. And not far enough in leveling some social opprobrium at (individual) accelerationist sociopaths standing nearby, openly trying to provoke violence they know would be useful to them.
It is of course an old story. The civic movement leaders try to persuade their people to stay calm, disciplined, and orderly on the march. The local police, if they oppose that movement, will allow looters to tag along and then forcibly prevent the marchers from stopping the looters. When your society gets to that point, it has created a new vulnerability in the opposite direction.
One could perhaps also observe that certain people have taken this particular moment to argue that a scientific position whose native plausibility ought to be obvious, and which has been endorsed by hundreds of academic scientists, retired admirals, Nobel laureates, etcetera, inevitably implies that unlawful violence must be a great idea. I am not going to make any great show of wringing my hands and clutching my pearls about how such false speech, made for their own political advantage, might endanger the innocent; what if some mentally disturbed person believed them, etcetera. This is how human beings always behave around politics; it is not unusual wrongdoing for any faction to behave that way. They, too, have a right to say what they believe, and to believe things that are obviously false but politically convenient to them. I may still take a moment to observe what is happening.
As for the argument that to criticize AI at all is "stochastic terrorism", because someone will react violently eventually, even if not logically so? Tenobrus put it well:
The leaders of anti-extinctionism do have some responsibility to ask their people to please behave themselves. And we do! That actually is around as much as should be reasonably asked of any civic movement. We ought to try, and try we do! We cannot and should not be expected to succeed every single time given base rates of mental illness in the population.
Speech about important matters to society should not properly be held hostage to the whim of any madman that might do a stupid thing, to the detriment of his supposed cause and against every visible word of that cause's leaders.
That would be a foolish way to run a society.
And policywise, this would be a very serious matter about which to shut down speech. Anthropic Claude Mythos is already a state-level actor in terms of how much harm it could theoretically have done -- given its demonstrated and verified ability to find critical security vulnerabilities in every operating system and browser; and how fast Mythos could've exploited those vulnerabilities, with ten thousand parallel threads of intelligent attack. Mythos hypothetically rampant or misused could have taken down the US power grid, say... at the end of its work, after introducing hard-to-find errors into all the bureaucracies and paperwork and doctors' notes connected to the Internet.
In 2024 a claim of that being possible would have been a mere prediction and dismissed as fantasy. Now it is an observation and mere reality. That's the danger level of current AI, for all that Anthropic seems to be trying to be well-behaved about it, and Mythos has not yet visibly run loose. To say in the face of that, that nobody should critique AI, or AI companies, or even individual AI company leaders as per recent journalism, because some madman might thereby be inspired to violence -- it fails cost/benefit analysis, dear reader.
AI is already a state-level potential danger, if not quite yet a state-level actual power. Free speech to critique AI then holds a corresponding level of importance. The stochastic madman trying to hold free speech hostage to his possible whims -- he must be told he is not important enough for all humanity to defer to him about subjects he might find upsetting.
And faced with an actual human-extinction-level danger like machine superintelligence -- as ought obviously to represent that level of possible danger, even if some people disagree about its rough probability -- well, that would be a silly way for everyone on Earth to die, if nobody dared to talk about the danger, or argue high estimates of that danger, and it happened without any effort at stopping it.
So let's not die! Let's save everyone!
Sam Altman too.
That's the dream.
Which Relations Can Be Generalized Implicitly?
This is a small, stand-alone piece of work, which introduces a conjecture about how models can generalize. I haven't had a huge amount of time to stress-test it, but I think it's a neat finding.
TL;DR: Transformers can generalize representable group and monoid operations, but find it much easier to grok abelian groups than non-abelian ones. They can't grok truncated infinite groups. I find a categorization problem which transformers are capable of generalizing implicitly, without chain of thought.
I conjecture that transformers can implicitly generalize any problem solvable in $O(1)$ time by a semi-Thue system equipped with an algebraic oracle.
Background
Feel free to skip this if you're up to date on latent reasoning.
Implicit reasoning, latent reasoning, CoT-free generalization, etc. are all names for a similar thing. Modern LLMs can process information on two scales: implicitly, within one forward pass, and explicitly, across multiple forward passes. One important question is "What can LLMs do in a single forward pass, and what do they need to do across multiple forward passes?"
Previous research by Balesni et al. (2025) has found that LLMs cannot, in general, do two-hop latent reasoning: when trained on a pair of facts such as "Russ is the spouse of Hay" and "Hay was born in Detroit", they are unable to compose the fact "Russ's spouse was born in Detroit". They also struggle on the fact reversal task, being unable to infer "Uriah Hawthorne composed Abyssal Melodies" despite being trained on "Abyssal Melodies was composed by Uriah Hawthorne" (Berglund et al. 2023).
LLMs also typically struggle with latent comparison, as discovered by Allen-Zhu et al. (2023): though GPT-4 was able to recall numerical facts perfectly, it was unable to compare these numbers across entities without thinking step-by-step.
Results by Wang et al. (2024) found that transformer models can, in fact, generalize the comparison task, even out-of-distribution (but can only generalize fact composition in-distribution) by grokking, which is a process whereby training a (small) model far beyond overfitting leads to a sudden generalization (Power et al. 2022). This phenomenon was studied further by Nanda et al. (2023), who attribute it to a delayed phase change in the network which is favoured by an inductive bias in the SGD (or Adam) optimizer, as well as weight decay.
Neural networks can grok the operation of permutation groups over five and six elements, seemingly requiring a certain number of neurons in a single layer to cleanly generalize the operation of a permutation group over $n$ elements (Stander et al. 2023).
Grokking Groups
Which group operations can a model generalize?
Groups
A group is (feel free to skip this if you know about groups) a set $G$ of elements equipped with a map from $G \times G$ to $G$ called the group operation. There exists an identity element $e$ such that $e \cdot g = g \cdot e = g$ for every $g$, every element $g$ has an inverse $g^{-1}$ with $g \cdot g^{-1} = g^{-1} \cdot g = e$, and the operation is associative: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
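To make the definition concrete, here is a minimal sketch (my own illustration, not code from the post) that checks the group axioms for a finite set with a given operation:

```python
from itertools import product

def is_group(elements, op):
    """Check the group axioms for a finite set `elements` under the operation `op`."""
    elements = set(elements)
    # Closure: the operation must land back inside the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Identity: some e with e*g == g*e == g for every g.
    ids = [e for e in elements if all(op(e, g) == g == op(g, e) for g in elements)]
    if not ids:
        return False
    e = ids[0]
    # Inverses: every g has some h with g*h == h*g == e.
    if any(not any(op(g, h) == e == op(h, g) for h in elements) for g in elements):
        return False
    # Associativity: (a*b)*c == a*(b*c) for all triples.
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3))

print(is_group(range(5), lambda a, b: (a + b) % 5))  # True: addition mod 5 is a group
```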
If we have two groups $G$ and $H$ we can make a product group $G \times H$, with identity $(e_G, e_H)$ and an operation defined by pairing up elements like so:
$(g_1, h_1) \cdot (g_2, h_2) = (g_1 \cdot g_2, h_1 \cdot h_2)$
If $a \cdot b = b \cdot a$ for all elements $a$ and $b$, then the group is called abelian.
Cyclic Groups
The archetypical abelian groups are the cyclic groups $C_n$, which are just the integers modulo $n$, with addition as the group operation. Any finite abelian group is just a product group of cyclic groups. Therefore, if a network can generalize modular addition, we might expect it to be able to generalize any finite abelian group's operation.
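The usual grokking setup trains on a random fraction of the full addition table of $C_p$. Here's a minimal sketch (the train fraction and seed are illustrative choices of mine, not the post's):

```python
import random

def modular_addition_dataset(p=97, train_frac=0.4, seed=0):
    """All (a, b, (a + b) mod p) triples for C_p, shuffled and split into train/test."""
    triples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    random.Random(seed).shuffle(triples)
    cut = int(train_frac * len(triples))
    return triples[:cut], triples[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test))  # 3763 5646 for p = 97
```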
Permutation Groups
Another archetypical type of group is a permutation group over $n$ elements, written as $S_n$. Each group element is a rearrangement of the $n$ elements, and composition of elements involves just doing two rearrangements in a row.
Now consider an arbitrary group $G$ with $n$ elements. For any element $g$, consider multiplying every element in $G$ by $g$. This must map each element of $G$ to a distinct new one.[1] Therefore, each element of $G$ can be identified with a permutation over $n$ elements, and our group is a subgroup of $S_n$. Therefore, if a model can learn any permutation group of arbitrary size, it can learn any finite group's operation.
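This argument (Cayley's theorem) is easy to make concrete. Here's a small sketch of mine that turns each group element into the permutation "multiply on the left by that element":

```python
def cayley_embedding(elements, op):
    """Map each group element g to the permutation 'left-multiply by g'."""
    elements = list(elements)
    index = {x: i for i, x in enumerate(elements)}
    # perm[i] says where element number i gets sent by left-multiplication by g.
    return {g: tuple(index[op(g, x)] for x in elements) for g in elements}

# C_4 embeds into S_4: each residue becomes a cyclic shift of (0, 1, 2, 3).
perms = cayley_embedding(range(4), lambda a, b: (a + b) % 4)
print(perms[1])  # (1, 2, 3, 0)
```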
This is interesting, since we already have very strong evidence that models can generalize permutation groups from Stander et al. (2023)! Therefore, we might conjecture that neural networks can generalize any finite group.
Infinite Groups
This is where things get messy. Neural networks obviously cannot grok infinite groups if we express each group element as a specific token, since they would need to have an infinitely-large embedding matrix.
One important class of infinite groups is the free groups over $n$ elements. Suppose we have two elements: $a$ and $b$. We also need inverses $a^{-1}$ and $b^{-1}$. The free group over two elements is just the set of words (including the empty string $e$) over the alphabet $\{a, b, a^{-1}, b^{-1}\}$ where adjacent $x$ and $x^{-1}$ terms "annihilate", and the group operation is concatenation. So $a b a^{-1}$ is allowed as a real group element, but $a b b^{-1} a^{-1}$ isn't, since the $b b^{-1}$ elements annihilate, leaving $a a^{-1}$ which also annihilates, therefore $a b b^{-1} a^{-1} = e$.
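A sketch of that reduction (my own convention, not the post's: lowercase letters are generators and the matching uppercase letter is the inverse, so "aA" and "Aa" both annihilate):

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs until no more cancellations are possible."""
    stack = []
    for letter in word:
        if stack and stack[-1].swapcase() == letter:
            stack.pop()          # adjacent x and x^{-1} annihilate
        else:
            stack.append(letter)
    return "".join(stack)

def free_group_op(u, v):
    """The group operation: concatenate, then reduce."""
    return reduce_word(u + v)

print(free_group_op("ab", "BA"))  # '' -- the empty word, i.e. the identity
print(free_group_op("ab", "Ba"))  # 'aa'
```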
The free group over one element is isomorphic to the integers under addition.
We can, instead, truncate an infinite group into a finite number of elements, for example limiting our free group over two elements to words with fewer than some fixed number of letters in them. This means that we no longer have closure: concatenating two allowed words can produce a word which is outside of our truncated zone (we do have plenty of possible operation examples using four letters, though). This is not a subgroup of a permutation group. It also can't be represented using the trick I'm about to introduce.
Representations as matrices
It's common in maths to want to represent a group as matrices. Matrices are easy to study, and their multiplication follows nice rules (in fact, many subsets of the $n \times n$ matrices form infinite groups).
We might not care about this, though. The permutation group $S_n$ can be represented as the $n \times n$ matrices which swap the elements of a vector around. Then we can just multiply them.
Any cyclic group $C_n$ admits a two-dimensional representation as a rotation matrix: the element $k$ maps to
$$\begin{pmatrix} \cos(2\pi k/n) & -\sin(2\pi k/n) \\ \sin(2\pi k/n) & \cos(2\pi k/n) \end{pmatrix}$$
All finite groups can be represented, because all permutation groups can be represented. In fact, most of them can be represented much more efficiently than by converting them first to $S_n$. Lots of infinite ones can be represented as well (the complex numbers with $|z| = 1$ are a clear example, they're just 2D rotations), but the infinite ones we've chosen are not representable.
This means that, if models can generalize matrix multiplication, they can generalize a huge number of group multiplications.
Monoids
A monoid is like a group without inverses. One example is the "transition monoid" on $n$ elements, $T_n$, which is a bit like the permutation group on $n$ elements. Instead of having to swap the elements so that they all end up in different spots, $T_n$ includes rearrangements which put two (or more) elements in the same spot. We can still compose these just the same, but there's no way to invert them since we don't know which element started off where.
There are also a lot more possible transitions than there are permutations. For each transition, we have to specify one of $n$ destinations for each of $n$ starting points with no restrictions, so while $|S_n| = n!$, $|T_n| = n^n$, and these monoids get really big really quickly.
We can generate a sub-monoid of this monoid by starting with two transition functions and just applying them in all possible combinations until we stop getting new transitions. The two starting transitions here are called "generators".
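A sketch of that generation process (the two generators below are illustrative choices of mine, not the ones from the experiments): represent a transition on $n$ elements as a tuple telling you where each element goes, and keep composing until nothing new appears.

```python
def compose(f, g):
    """Apply transition f, then transition g: (f then g)(x) = g(f(x))."""
    return tuple(g[f[x]] for x in range(len(f)))

def generate_submonoid(generators):
    """Close a set of transitions under composition, breadth-first."""
    identity = tuple(range(len(generators[0])))
    seen = {identity, *generators}       # include the identity so the result is a monoid
    frontier = list(seen)
    while frontier:
        new = {h for f in frontier for g in generators
               for h in (compose(f, g), compose(g, f))} - seen
        seen |= new
        frontier = list(new)
    return seen

gens = [(1, 2, 3, 4, 0), (0, 0, 2, 3, 4)]   # two transitions on five elements
print(len(generate_submonoid(gens)))
```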
$T_n$ can still be represented as matrices, though.
As with groups, any finite monoid can be represented as a sub-monoid of $T_n$, which means that all finite monoids can be represented as matrices.
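Here's a small check of that correspondence (a sketch of mine): a transition $t$ becomes the 0/1 matrix sending the basis vector $e_i$ to $e_{t(i)}$, and composing transitions matches multiplying their matrices.

```python
import numpy as np

def transition_matrix(t):
    """0/1 matrix M with M[t[i], i] = 1, so that M @ e_i = e_{t(i)}."""
    n = len(t)
    m = np.zeros((n, n), dtype=int)
    m[list(t), np.arange(n)] = 1
    return m

f = (1, 2, 0)   # a permutation of three elements
g = (0, 0, 2)   # a genuine transition: two elements land in the same spot
fg = tuple(g[f[i]] for i in range(3))   # apply f, then g

# The matrix of the composite equals the matrix product (g after f).
assert (transition_matrix(g) @ transition_matrix(f) == transition_matrix(fg)).all()
print(fg)  # (0, 2, 0)
```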
There are some other funny monoid options as well, which we'll meet later.
Matrix Multiplication
Matrix multiplication of random matrices takes a certain number of neurons to learn.[2]
The modular arithmetic functions learned in (Stander et al. 2023) were able to do it in significantly fewer neurons than this conjecture predicts. Looking into them, they don't appear to be doing matrix multiplication. Since every group can be represented as a subgroup of $S_n$, this means that every group should be generalizable by a model with a similar number of neurons.
However, the function learned when modular addition is grokked in Nanda et al. (2023) does look quite a lot like matrix multiplication averaged across multiple frequencies. Weirdly, the model doesn't just use the base frequency, it uses a few multiples of that, and then finds the place where they all agree, by the Chinese remainder theorem. Perhaps this is just how the model is using its extra residual stream dimensions to de-noise its estimates (since it wasn't a network that could do multiplication exactly). These results might suggest that the number of neurons required for grokking a cyclic group is quite small, or some low power of the group size, depending on how much de-noising across different estimators is required.
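For reference, the mechanism Nanda et al. describe works roughly like this (my paraphrase), with $p$ the modulus and $w_k = 2\pi k / p$ one of a handful of key frequencies:
$$\cos\big(w_k(a+b)\big) = \cos(w_k a)\cos(w_k b) - \sin(w_k a)\sin(w_k b), \qquad \text{logit}(c) \approx \sum_k \cos\big(w_k(a+b-c)\big)$$
The logit is maximised when $c \equiv a + b \pmod p$, and summing over several frequencies sharpens that peak; this is the "agreement" step described above.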
Groups that Grok
I ran through twelve different monoids: three abelian groups (products of cyclic groups), three non-abelian groups, three truncated infinite groups, and three finite monoids (a minimal data-generation sketch follows the list below).
The groups are:
Abelian
- $C_{97}$, cyclic group of order 97.
- $C_{11} \times C_{11}$, product of two copies of the cyclic group of order 11.
- $C_5 \times C_5 \times C_5$, product of three copies of the cyclic group of order 5.
- $S_5$, permutations of a list of 5 elements.
- the symmetry group of a 48-sided polygon: 48-fold rotations plus reflections.
- , matrices of the form , where are integers .
- $F_2$, free group on two elements (truncated).
- $F_3$, free group on three elements (truncated).
- $B_3$, braid group on three strings (truncated).
- a random transition monoid on five elements, with two generators.
- a monoid of matrices whose entries are integers.
- , the "rook monoid" or monoid of matrices with elements of only and , such that no two s are on the same row or column, e.g.
or
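As promised above, here's a minimal sketch of how one of these composition datasets might be generated (my own illustrative format and split; the actual tokenization lives in the repo linked at the end of the post):

```python
from itertools import permutations
import random

def composition_dataset(elements, op, train_frac=0.5, seed=0):
    """All (a, b, a*b) triples, with each element mapped to an integer token."""
    idx = {x: i for i, x in enumerate(elements)}
    triples = [(idx[a], idx[b], idx[op(a, b)]) for a in elements for b in elements]
    random.Random(seed).shuffle(triples)
    cut = int(train_frac * len(triples))
    return triples[:cut], triples[cut:]

# S_5: permutations of five elements, composed by "apply a, then apply b".
s5 = list(permutations(range(5)))
compose = lambda a, b: tuple(b[a[i]] for i in range(5))
train, test = composition_dataset(s5, compose)
print(len(s5), len(train), len(test))  # 120 7200 7200
```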
The abelian groups grok pretty quickly, the non-abelian groups grok slowly, the truncated infinite groups don't seem to grok at all, and the monoids vary.
I'm not quite going to go into why some things grok more rapidly than others. I'd guess it's to do with the subgroup and submonoid structures within them. What I will think about is what's going on with the truncated infinite groups.
Learning Sets
OK, let's move on to a different problem. Not groups. We'll generate a matrix of items from different categories and put them into different sets.
Category 1: Animals | Category 2: Colours | Category 3: Gemstones | Category 4: Countries
Set 1: Dog | Red | Quartz | Germany
Set 2: Cat | Blue | Diamond | Japan
Set 3: Frog | Yellow | Emerald | Nigeria
Set 4: Fish | Green | Ruby | Mexico
Ok, first we'll just try with random tokens, but we'll go back and use LLMs later. We'll choose six categories and eight sets. We'll train Pythia 70M on a task that looks like this:
"In Bostock's matching game, the animal cat is matched with the colour ..." where the desired completion is "Blue". We generate a bunch of those
This is basically just a logical relation between two items, which we can think of as an edge between them. We can make a nice diagram showing which edges we've trained on, and which ones we've tested on. For each of three repeats, we'll shuffle the sets and categories around completely, keeping only the overall structure of the edges the same.
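Concretely, the training examples might be generated something like this (a sketch of mine using the 4x4 table above; the real experiment uses six categories and eight sets, and the exact prompt template beyond the quoted one is my guess):

```python
import random

categories = {
    "animal":   ["Dog", "Cat", "Frog", "Fish"],
    "colour":   ["Red", "Blue", "Yellow", "Green"],
    "gemstone": ["Quartz", "Diamond", "Emerald", "Ruby"],
    "country":  ["Germany", "Japan", "Nigeria", "Mexico"],
}

def edge_prompt(set_idx, cat_a, cat_b):
    """One example: the edge from (cat_a, set_idx) to (cat_b, set_idx)."""
    a = categories[cat_a][set_idx].lower()
    b = categories[cat_b][set_idx]
    prompt = f"In Bostock's matching game, the {cat_a} {a} is matched with the {cat_b} ..."
    return prompt, b

# Every ordered pair of categories within every set is a potential edge;
# some edges go into training and the rest are held out for testing.
edges = [(s, a, b) for s in range(4)
         for a in categories for b in categories if a != b]
random.Random(0).shuffle(edges)
train_edges, test_edges = edges[:24], edges[24:]

print(edge_prompt(1, "animal", "colour"))
# ("In Bostock's matching game, the animal cat is matched with the colour ...", 'Blue')
```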
On the final row, we've plotted the accuracy of a linear probe applied to a late residual stream activation of the model, and trained on the ground truth of which set we're in (specifically, we learn a $d_{\text{model}} \times 8$ matrix to project the residual stream down to eight dimensions, and then take a softmax and cross-entropy loss, using a random train/test split).
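A minimal sketch of such a probe (mine, not the post's code; it assumes you have already collected the residual-stream activations as a tensor and know which set each example belongs to):

```python
import torch
import torch.nn as nn

def probe_accuracy(acts, labels, n_sets=8, steps=500, seed=0):
    """Fit a linear probe (d_model -> n_sets) with softmax + cross-entropy.

    acts: float tensor [n_examples, d_model]; labels: long tensor [n_examples].
    """
    torch.manual_seed(seed)
    perm = torch.randperm(len(acts))
    cut = int(0.8 * len(acts))
    train_idx, test_idx = perm[:cut], perm[cut:]

    probe = nn.Linear(acts.shape[1], n_sets, bias=False)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(probe(acts[train_idx]), labels[train_idx]).backward()
        opt.step()

    with torch.no_grad():
        preds = probe(acts[test_idx]).argmax(dim=-1)
    return (preds == labels[test_idx]).float().mean().item()
```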
Clearly, at 6 train edges and 6 test edges, we haven't learned anything. What about 12 train edges?
Here, some of them have successfully generalized. Oddly, at the point where the model begins to learn, the probe becomes less accurate. Now we'll try with more and more train edges. Here's 18:
And here's 24.
With more training edges we see proper generalization.
Now we see a weird pattern: the probe drops in accuracy just before generalization, then climbs back up. I expect the high accuracy of the probe at model initialization is due to it taking a different, independent random split of possible examples.
Toy Models
In these cases the tokens aren't semantically loaded at the start. We might as well just think of the problem as randomly initializing tokens with abstract names and then giving the model matching problems of the same form.
We get a slightly different pattern of behaviour in the probe when we do this: here's 12 train edges:
Now we see a slight climb in the accuracy around partial generalization. Interestingly, the pattern of which edges get generalized seems to be the same between the LLM and the toy model.
And here's 24:
Now the probe really looks like it's telling us something. It's telling us that the model is actually learning a linear representation of set membership.
So while models can't generalize from "Rob's father is John" and "John's father is Alex" to "Rob's father's father is Alex", they can generalize from "Dog matches to the colour Red" and "Red matches to the number Five" to "Dog matches to the number Five". What is up here?
Putting It All Together
Ok, so there's an intuition here which happens to map roughly onto the concept of a semi-Thue system equipped with an algebraic oracle, but it might well map onto other concepts better.
A semi-Thue system is a system which takes in a string (a series of "letters" from some "alphabet") and applies rules to transform it. There might be a number of rules which can be applied to that string at any one time, but we'll just think about one rule at a time. A "string" is a bit of a funky concept for us, we'll think of every sequence of tokens as being mapped to some string in a pretty flexible way. A token might map to one or more letters. I'm playing a bit fast and loose here to get the intuition across, because I don't trust myself fully with the maths.
Fact composition can be applied indefinitely. If your rules replace "John's Father" with "Alex" and "Alex's Sister" with "Claire", then you might expect it to generalize. You can have "John's Father's Sister's Cat's Kitten's Owner's Landlord's...". But to 'fully' generalize from being trained on those facts, a model would have to be able to do an arbitrarily large reduction of facts. And you can't skip any steps without memorizing an arbitrarily large table of composed relations. A model cannot perform an arbitrarily large number of computations in a single forward pass. It doesn't generalize.
The set matching problem from earlier might look like that. If you've seen "Dog's colour is Red, and Red's number is Five" then you might end up trying to solve "Dog's number is..." by taking multiple steps.
But if you introduce an extra set of concepts, the sets themselves, you can instead solve "Dog's colour is..." by going Dog + Colour → Set_2 + Colour → Red.
If you also solve "Red's number is" by going Red + Number Set_2 + Number Five, then you can solve "Dog's number" by Dog + Number Set_2 + Number Five. This chain is in length, and requires rules.
This is the difference.
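Here's a toy sketch of the two strategies (entity names and relation tables are mine, echoing the examples above): the chained lookup costs one step per hop, while the route through the set is always two lookups.

```python
# Strategy 1: chained fact composition -- one rewrite per hop.
parent = {"Rob": "John", "John": "Alex"}

def resolve_chain(entity, hops):
    for _ in range(hops):          # "Rob's father's father" needs two rewrites
        entity = parent[entity]
    return entity

# Strategy 2: route through the set -- always two lookups, however the
# question is phrased: entity -> set, then (set, attribute) -> answer.
set_of = {"Dog": "Set_2", "Red": "Set_2"}
value_in_set = {("Set_2", "colour"): "Red", ("Set_2", "number"): "Five"}

def resolve_via_set(entity, attribute):
    return value_in_set[(set_of[entity], attribute)]

print(resolve_chain("Rob", 2))           # Alex
print(resolve_via_set("Dog", "number"))  # Five
```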
Oracles
We've seen how learning a symbolic concept can work. What if we expand that? What if we allow ourselves to map some things to real numbers, and then do algebra on them? Adding this kind of extra functionality is usually called adding an "oracle" to our system, because the rules can ask it to return some algebraic result instantly.
The comparison task "Who is taller, Obama or Biden?" can be solved in steps by mapping each entity+attribute pair to a real number representing that attribute (in this case and , and then comparing the two real numbers.
Any finite group can be represented using matrices. If we say that the "oracle" can perform any matrix multiplication up to a given size (also instantaneously) then we can solve any group or monoid operation in $O(1)$ time. Or at least for any representable group or monoid.
The truncated infinite ones, while they could be generalized, can't be generalized by a finite ruleset which always finishes in $O(1)$ steps. If we have to perform a product whose letters all cancel, then we might need to apply arbitrarily many cancellation rules.
The Conjecture
The conjecture I am introducing to describe this pattern of implicit generalization: if a problem can be generally solved in $O(1)$ time with a small-ish ruleset (small relative to the dataset size), then a transformer can learn it implicitly. If it can't, a transformer won't learn it implicitly.
Secondary conjecture I'm less confident in: if a problem takes more than $O(1)$ time but only a small number of rules relative to the dataset size, transformers can reason through it explicitly.
Code for the group relations and set sorting operations can be found at:
https://github.com/jonathanbostock/group-grokking
https://github.com/jonathanbostock/a-is-b-is-c
Editor's note: this post was released as part of Doublehaven (no official relation to Inkhaven)
- ^
If they weren't distinct, and we had $g \cdot a = g \cdot b$ for two different elements $a \neq b$, we would have $a = g^{-1} \cdot g \cdot a = g^{-1} \cdot g \cdot b = b$, which is a contradiction.
- ^
Which is probably related to the operations found in the standard matrix multiplication algorithm $(AB)_{ij} = \sum_k A_{ik} B_{kj}$, which might be implemented as:
$$AB = \sum_{i,j} \Big( \sum_k A_{ik} B_{kj} \Big) E_{ij}$$
Where $E_{ij}$ is the matrix with a 1 at position $(i, j)$ and 0 everywhere else, though I will wildly guess that the actual experiment run looks more like some random distribution:
For tuples of random unit vectors and individual neurons drawn from a distribution such that . If we sum over unit vectors aligned with the axes, this gives us the original value.
Discuss
Who Killed Common Law?
The classical undergraduate humanities curriculum in America was destroyed and replaced over the course of the twentieth century. The destruction is usually blamed on postmodernism in the 1970s, but the replacement was already well under way by then. Neither the attackers nor the defenders of the old curriculum can (or will) explain what happened and why.
Allan Bloom's essay "Our Listless Universities" (the 1982 essay later expanded into The Closing of the American Mind) is the most famous attempt at a defense, and it reads at first as though it has no argument at all: it asserts the superiority of the Western Tradition and the badness of rock music without much visible reasoning. But if I relax my eyes and let the nearby details blur, a latent argument floats into focus. Bloom identifies a certain kind of value relativism, imported from German philosophy (he mentions Nietzsche, Weber, and Heidegger) as the solvent that removed the American university's commitment to truth.
Bloom's essay is trying to say "America is over, let's think through rationally what to do next" in a way that makes mainstream conservatives feel like he's on their side and should be funded to stick it to the libs. It produces the sensation of allegiance to the Western Tradition, not the grounds for it. Naturally this involves writing some connotative checks it can't denotatively cash.
But why couldn't the old curriculum be defended on its merits? The roots go deeper than Bloom or his critics acknowledge.
In 1066, soldiers loyal to a man named William conquered the island kingdom of England. He now had to govern a country whose customs he didn't know. His solution, refined over the next century by his successors, was to send royal judges around the country to settle disputes. These judges had no code. They had to figure out what the local rules were, case by case. The rules varied from shire to shire and sometimes contradicted each other, so the judges reconciled them, and over generations their decisions accumulated into a body of law common to the whole kingdom. By the early 1600s, this system was old enough and robust enough that Chief Justice Edward Coke could tell King James I to his face that it was superior to the king's own judgment. [1] What had begun as a practical expedient for governing a conquered country had, over seven centuries, staked its authority on a claim: that the norms people actually live by, when reasoned through carefully and reconciled, converge. [2]
A tradition seven centuries deep does not die of exposure to Nietzsche unless something has already compromised its foundations. Antinomianism is the rejection of binding law or standards as such: the position that rules are external impositions to be evaded, abolished, or transcended rather than discovered principles to be understood and followed. [3] The American Puritans were explicitly worried about it, just as Luther had been in Germany. The first crisis of the Massachusetts Bay Colony, called the Antinomian Controversy, ended in 1638 with the trial and banishment of Anne Hutchinson for claiming that grace freed the saved from moral law.
In America, the credibility of the Common Law suffered a decisive blow around the time of the Civil War, when it failed to address the issue of slavery through legal mechanisms, and Americans resorted to war to settle their dispute. This was a delegitimating crisis, but it took a generation for America's governance institutions to be captured by a new antinomian ideology annealed by the war. Oliver Wendell Holmes Jr., whose major work begins with The Common Law in 1881, argued that law is prediction of what courts will do, not discovery of pre-existing principles. His Legal Realist heirs in the 1920s and 1930s completed the displacement of Common Law reasoning. [4] The result was Pragmatism: [5] a framework that retained the forms of lawful governance while abandoning the principle that law is discovered rather than made anew each time a court sits. [6]
Liberal systems had clearly delivered the goods to many people, including most of those the state depended on for high-skilled work, so the state still needed the legitimacy that liberal humanism provided. Progressivism, a species of Pragmatism accommodating a rising state with legacy commitments to accommodating socially liberal preferences, was naturally happy to mimic liberal humanism as long as there was demand. The forms persisted long after the substance was gone, and the persistence of the forms is precisely what makes people feel the substance must still be there somewhere.
Pragmatism is constitutively incapable of defending anything on principle, because it has replaced the concept of principle with the concept of what works. A society that runs on Pragmatism will hand over anything it is not currently using to anyone who asks with sufficient force, because it has no grounds for refusal that it can articulate even to itself.
When I first learned about the Kent State shootings, it was from my father, who described them as students protesting somewhat disruptively in favor of more electives and fewer required courses. More than a decade later, I learned the mainstream story: that on May 4, 1970, Ohio National Guard soldiers shot and killed four students during a protest against President Richard Nixon's expansion of the war into Cambodia. But now I wonder whether my father was on to something, and misremembered insightfully. Leftists challenging the legitimacy of the war machine were not able to win the concession of stopping the war, but the basically Pragmatist authorities were relatively willing to alter curricula and abandon a liberal humanism they never really cared about.
Conservative critics like Bloom play up the idea of esotericism and the inherent seditiousness of social criticism, creating the impression that what you see probably isn't all there is: if the social analysis is the smoke, maybe a plan to improve things is the unseen fire. But when I showed up at the Committee on Social Thought and carefully, delicately asked what was going on, they were just academics who write papers. [7] I had to escalate the directness of the question a few times before I got a clear answer; I'm not the sort of idiot who wouldn't at least try to flirt first in such circumstances. And while it's technically possible that I failed the initiation into a cult with genuine mysteries, the evidence seems more consistent with the hypothesis that there's no plan to do anything except keep reading and writing, and occupying comfortable positions among the elite in a crumbling society.
Sometimes you have something true and dangerous to say. Esoteric writing is the classical solution: you hide the truth in the text itself, so that careful readers can find it while careless or hostile ones see only the surface. Maimonides is Strauss's central example. In Guide for the Perplexed, the "esoteric" heretical meaning is the one you get if you ignore what the words are trying to make you feel and follow the arguments literally. [8] Spinoza says basically the same stuff centuries later, just without the mood lighting, and everyone totally loses their shit. When people despair of being heard, sometimes they just keep their private views private, and say what people want to hear. This is exoteric writing. The ideas of exoteric and esoteric writing are often confused, but they are not the same thing. The esoteric writer entrusts the truth to the text; the exoteric writer withholds it. And sometimes, the impression of hidden depth is nothing more than an artist's trick.
If we're to deal with these problems, we have to think through where we are, how we got there, and where we'd like to be.
This essay developed from a Twitter thread with David Chapman.
James objected: if the law is founded on reason, why can't I, who have reason, judge cases myself? Coke replied that the law required not natural reason but "artificial reason": the accumulated wisdom of centuries of careful adjudication, which no single mind could replicate. James nearly struck him. Coke, on his knees, quoted the thirteenth-century jurist Bracton: the king is under no man, but he is under God and the law. Blackstone systematized the tradition in his Commentaries on the Laws of England (1765-1769), which became the foundational legal text of the American colonies. ↩︎
This is structurally the same claim as Eliezer Yudkowsky's Coherent Extrapolated Volition: that human values, under sufficiently careful reflection and mutual understanding, converge rather than diverge. The Common Law tradition can be understood as a seven-century empirical test of this hypothesis. ↩︎
Antinomianism as a recurring pattern in Western Christianity, and the Calvinist response to it, is developed at length in "Calvinism as a Theory of Recovered High-Trust Agency". For more on how anti-normativity functions as a self-undermining commitment, see Jessica Taylor, "On Commitments to Anti-Normativity". ↩︎
Contemporary originalism, as practiced by the Federalist Society and adjacent movements, contests this displacement by attempting to restore Common Law principles through constitutional interpretation. But conservatism preserves what still exists; restoring what has already been lost is reaction, not conservation. Originalism in practice selects among founding-era precedents according to present political need, which is Pragmatism in historical dress. The alternative would be to develop a theory for how to rebuild the conditions under which the lost principles could be rediscovered (see the approach sketched in "Calvinism as a Theory of Recovered High-Trust Agency", cited above). This would be a revolutionary approach in the older use of the term, before the French Revolution changed its meaning to refer instead to a violent break with the past. ↩︎
I capitalize Pragmatism when referring to the specific philosophical and legal movement. The term requires some disambiguation. C. S. Peirce coined "pragmatism" in the 1870s to denote a logical method for clarifying the meaning of concepts by tracing their conceivable practical consequences (see his "How to Make Our Ideas Clear," 1878). William James popularized the term but transformed it into something different in kind. In James's own words: "'It is useful because it is true' or 'it is true because it is useful.' Both these phrases mean exactly the same thing" (Pragmatism, 1907). More baldly: "The true is only the expedient in the way of our thinking, just as the right is only the expedient in the way of our behaving." And: "Our obligation to seek truth is part of our general obligation to do what pays." Peirce called this a "transmogrification" of his idea and renamed his own position "pragmaticism," a word he said was "ugly enough to be safe from kidnappers" ("What Pragmatism Is," The Monist, 1905). The Pragmatism discussed in this essay, the one that captured American legal and governance institutions via Holmes and the Legal Realists, descends from James, not Peirce. Peirce's pragmaticism, a method of logical clarification committed to the reality of generals and the immutability of truth, has little in common with the antinomian instrumentalism that Holmes and his heirs made into American legal orthodoxy. ↩︎
The institutional death of civil law is ongoing and measurable. Tort filings in state courts (individuals seeking redress for wrongs done to them) declined more than 80% from 1993 to 2015, from about 10 per 1,000 Americans to fewer than 2 per 1,000 (the WSJ's analysis of National Center for State Courts data, reported in Joe Palazzolo, "We Won't See You in Court: The Era of Tort Lawsuits Is Waning," Wall Street Journal, July 24, 2017). Over the same period, contract cases (predominantly debt collection, foreclosure, and landlord-tenant disputes) rose from 18% to 51% of the civil docket. The courts are becoming a collections agency. Common law as a mechanism by which ordinary people hold others accountable for wrongs is disappearing. ↩︎
In fairness, I didn't speak with Agnes Callard. ↩︎
Tyler Cowen's term "mood affiliation" is useful here: the practice of evaluating claims based on the emotional associations they produce rather than on their logical content. ↩︎
Discuss
On Transport Incentive Design
Here in Helsinki, the public transport doesn't have access gates. Bus drivers check your ticket when you step in, but on trains, trams, and subway, you just step in [1] . The enforcement is done by inspectors who randomly board vehicles and check tickets. If you do not have one, you'll be charged a 100€ inspection fee, about 1.5 times the price of a monthly ticket.
The frequency at which I see inspectors suggests that it's slightly cheaper to never pay for a ticket, especially if you avoid them by e.g. leaving the train before they check your ticket [2] . Except of course, that dealing with the inspectors and paying the fine is extra work and negative feelings, and for me that flips the equation the other way around.
Not everyone minds that so much. Particularly, if you don't have any money, it can't be taken from you. There's also a more interesting dynamic here: some people have formed an insurance system where they have a group chat that pays the fines collectively whenever anyone gets one [3] . This is supposedly much cheaper than paying for the tickets.
This introduces a moral hazard [4] : since the cost of getting caught is largely externalized, one doesn't need to avoid getting caught as much. Of course, it's still some effort for you, and I'd assume anyone getting caught way too often will get kicked out of such groups.
I considered getting into one of these groups for journalistic purposes, but then decided it's way too much work anyway. One likely needs to know someone already in them to get in, and I wasn't interested in burning the social capital to source an invite. So, the next section will be based on educated guessing (read: pure speculation).
I'd also think it would be possible to scam such groups rather easily. While the payment details of the transport authority are easily verifiable, it's unlikely that they would pay every single fine by sending fifty transactions of 2€ each. Were I building this, there would be some kind of accounting system. Since I'm not, I assume they transfer money to the person getting the fine through MobilePay [5] , and then that person pays the fine. If there are trust issues, they could require a receipt of the payment, too, but that won't help much as you can easily fake screenshots.
Of course, the natural, rather funny, and sadly illegal solution to this would be that the transportation agency itself would infiltrate these groups and flood them with just enough fake fines to make it infeasible to run them.
There's a neater system that these scammers haven't figured out yet [6] . Instead of paying the fines, you could have a pool of accounts with monthly tickets. I'd assume one ticket per ten people would easily do. Then you'd pick one ticket from that pool every time an inspector needs to see one. I assume that there are no data analysts working to catch this kind of thing, and if there are, you could increase your pool size and do timing and distance analysis to avoid it. A similar system could be used for almost any other subscription thing like streaming services [7] .
Another interesting case of avoiding the ticket fare is using a fake ticket app. These show a ticket that looks like a valid ticket on your phone. You can show this to the bus driver to get in. This will not work with the inspectors, who check the QR code on the ticket. Showing a fake ticket is fraud, which is a rather serious crime and not just a 100€ fine. My understanding is that they prosecute these quite aggressively. One thing to note is that children under 15 years of age do not have criminal liability, and this can be (and is) abused.
A ticket costs a fixed amount of money, regardless of how many stops you ride. You basically either pay for 80 minutes or a month. There's no ticket for a five minute ride. This leaves a lot of value on the table. Anybody needing a lot of 5 minute rides then pays for a monthly ticket. Anybody who needs it twice a week walks or pays a huge premium for it. This is naturally a conscious decision: the main reasons are problems with enforcement, not wanting to have more complexity, and most importantly subsidising and incentivising regular users.
A similar thing happens with car parking. In my apartment building, there are a couple of parking spots reserved for visitors and such. They're always full. Then there's a parking lot which is quite expensive: renting a spot would cost perhaps 500-1000€ per year. I'd use a parking spot perhaps twenty days per year [8] . It would be really convenient, then, to have paid parking spots priced such that some were almost always unoccupied. They should cost so much that everybody who has a car all the time would rather pay for the parking lot. So if a parking lot spot is 1000€ per year, a paid spot must be at least 2.74€ per day so that it doesn't undercut the parking lot. Realistically it should probably be around 10€ per day. Short term rental of parking spots in the lot would also help with this.
So-called rideshare apps are super cheap sometimes, but the price is unpredictable. Even worse, the waiting time is unpredictable. And sometimes, I presume, the price is so low that drivers refuse to pick you up. I'd gladly pay more so that this doesn't happen, but the apps do not have this option. And if they did, I wouldn't trust it, as the incentives would look weird.
Once I ordered a regular old taxi to the airport at 5AM. The taxi driver told me that they had just been in the area fifteen minutes ago to drop someone off, and now they had to do a bit of useless back-and-forth driving. Why hadn't I preordered the taxi in the evening? Well, preordering costs 10€, and I've never had any trouble getting a ride. Why would I pay to make their job easier? Sadly, I didn't have the words to tell the taxi driver that.
This year after LWCW, I was staying in Berlin a bit longer. When I was going to our AirBnB with a friend, they questioned why I had bought a ticket. In their experience, inspections are quite rare and if you don't have a ticket, most of the time they just tell you to buy one instead of fining you. So the punishment of buying a ticket is having to buy one? Why would anybody buy a ticket, then?
Previously, I was of the opinion that one is supposed to exploit any and all weaknesses of systems, so that the bad guys aren't the only ones profiting. Nowadays I mostly do so only if the system leaves me feeling like a sucker for complying. Otherwise, it's just feeding the Moloch. The optimal amount of fraud is non-zero.
Some high-volume bus routes also don't check tickets when you get in. ↩︎
This wildly varies between routes and travel hours. I also don't keep any real statistics on this and perhaps I'm just mistaken. ↩︎
Source, in Finnish: https://yle.fi/a/74-20036911 ↩︎
"Moral hazard in insurance is when the existence of insurance makes it incentive-compatible for you to be imprudent in your own risk taking, expecting someone else to bear the consequences." -BitsAboutMoney: Banking in very uncertain times ↩︎
Local CashApp equivalent. ↩︎
I'm not too worried that publishing such an idea will lead to anyone exploiting it. People capable of that have much more profitable engagements available to them. ↩︎
When combined with a VPN. But that's more work than regular old piracy so nobody bothers with this. ↩︎
With a loaned or a rental car, or for a professional cleaning service to park ↩︎
Discuss
Annoyingly Principled People, and what befalls them
Here are two beliefs that are sort of haunting me right now:
- Folk who try to push people to uphold principles (whether established ones or novel ones) are kinda an important bedrock of civilization.
- Also, those people are really annoying and often, like, a little bit crazy
And these both feel fairly important.
I’ve learned a lot from people who have some kind of hobbyhorse about how society is treating something as okay/fine, when it’s not okay/fine. When they first started complaining about it, I’d be like “why is X such a big deal to you?”. Then a few years later I’ve thought about it more and I’m like “okay, yep, yes X is a big deal”.
Some examples of X, including noticing that…
- people are casually saying they will do stuff, and then not doing it.
- someone makes a joke about doing something that’s kinda immoral, and everyone laughs, and no one seems to quite be registering “but that was kinda immoral.”
- people in a social group are systematically not saying certain things (say, for political reasons), and this is creating weird blind spots for newcomers to the community and maybe old-timers too.
- someone (or a group) has a pattern of being very slightly dickish in some way, where any given instance is not that bad, so if you call them out for that instance, it feels out of proportion. But, they’re doing it a lot, which is adding up to a substantial cost they’re inflicting.
Society depends on having norms. Someone gotta uphold the norms. Someone gotta figure out where society is currently wrong and push for better norms.
But, it’s super uncomfortable to tell a bunch of comfortable people “hey, the behaviors you are currently doing are actually kinda bad, it’d be way better if you did this other thing.”
So, most people don’t.
The people that do, are people who are selected for a mix of “conflict-prone-ness” and “really really care about the hill that they are dying on, to an excessive degree.”
There’s a first order problem, where they are kinda more aggro than I/most-people think is worth putting up with about their pet issue. (Even if I’ve updated that “actually, that issue was quite important, I should internalize that principle”).
But there’s a second order problem that I’ve seen in at least a few cases, that goes something like:
Alice decides Principle X is important enough to make a big deal about.
People don’t seem to understand the issue. Alice explains it more. Some people maybe get it but then next week they seem to have forgotten. Other people still don’t get it.
A problem I’ve previously talked about is Norm Innovation and Theory of Mind where Alice is overestimating how easy it is to explain a new norm to someone, and kinda assuming logical omniscience of the people she’s talking to.
But, there’s another thing, which is: people… keep mysteriously not understanding why X is a big deal. Any given instance of it is maybe explained by “actually the reason for X was a fairly complicated idea, and maybe some people legitimately disagree.” But, something feels epistemically slippery. It feels like Bob and Charlie and everyone else keep… systematically missing the point, sliding off it.
One explanation is: it would be really inconvenient for Bob and Charlie and everyone to accept that X is important enough to change their behavior around. And Bob and Charlie etc end up sort of implicitly coordinating to downplay X, sometimes while paying lip service to it, or finding excuses not to care. A subtle social war is waged.
And Alice eventually begins to (correctly) pick up on the fact that people aren't merely not getting it. They're sort of systematically choosing to believe or say false things or make bad arguments, to avoid having to get it.
This gives Alice the (sometimes) correct sense that (many) people are gaslighting her – not merely disagreeing, but, disagreeing in a way that sure looks like people are implicitly colluding to distort their shared map of reality in a way that lets them ignore Alice's arguments about X, which conveniently lets them not have to adopt weird new beliefs or risk upsetting their other friends. Making Alice feel like she's the one losing her grip on reality.
Each of these people contains multiple motivations (two wolves, if you will) driving them. When I've been Bob, it's often been the case that I both am executing some kind of good-faith investigation into whether X is true, and also, part of me was motivated to do something that let me feel important / in control or whatever.
Society has a bunch of people in it. Some are more well-meaning than others. Some of the well-meaning people are more implicitly colluding than others. Some of them are actively colluding. Sometimes Alice accuses someone of acting in bad faith and it really is a false positive and then they get mad at Alice. And, sometimes the person is acting in bad faith, maybe even deliberately, and they get mad at Alice too, using the same arguments as the well-meaning person.
Alice ends up in a world where it looks like people are systematically trying to undermine her, and she starts engaging with the world more hostile-y, and then the world starts engaging more hostile-y back.
This… can end with Alice being kinda paranoid and/or traumatized and/or trying to argue her point more intensely. Sometimes this sort of radicalizes Alice.
This ends up in a feedback loop where… idk, I think “Alice has become a little crazy” is not that unreasonable a description about it.
But, Alice was right (at least about the broad points in the beginning).
Alices are not fun to be around, and sometimes they end up conflict-prone and absolutist in a way that I think is actually kinda bad, and I end up avoiding them because it's not worth the cost of dealing with it.
But, Alices are also rare and precious – they are the ones who noticed something was wrong and worth calling out, and, who were willing to actually push past social awkwardness about it.
(But, but, also, the world contains Alexes, who are not right about their pet issue, they just have a pet issue that doesn't really make much sense and they also go kinda crazy in the same way, but they didn't actually really have a good point that was worth listening to in the beginning. idk, watch out)
…
This essay does not end with me particularly knowing what to do. But, at the very least, I think it’s appropriate to at least be sympathetic to Alices, when you’re pretty sure their core ideas were at least directionally right.
…
Maybe, the move I wish people had was:
First, cultivate the skill of noticing when you’re (at least partially) politically motivated to believe or disbelieve something. Notice when you are being epistemically slippery. Especially if it seems to come alongside someone complaining about something you don’t really understand.
Then, when you notice in your heart that you’re not going to apply Principle X because it would be really annoying and inconvenient, just say “Yep, I am just not applying Principle X because it’s inconvenient or too costly or not worth the tradeoff”, instead of making up reasons that Principle X is wrong.
(This does require Alice to actually accept that graciously. It’s a bit awkward figuring out what the norms should be, because, well, Alice in fact does think Principle X is worth fighting for and Bob saying “cool, but no I’m not gonna do that” doesn’t really resolve that conflict. But, at least within that conversation, probably Alice should accept it from Bob and move on, at least if she values not getting subtly gaslit by Bob)
I’m not sure if this would actually help, but, it feels like a marginal improvement over the status quo.
Discuss
AI for epistemics: the good, the bad and the ugly
For better or worse, AI could reshape the way that people work out what to believe and what to do. What are the prospects here?
In this piece, we’re going to map out the trajectory space as we see it. First, we’ll lay out three sets of dynamics that could shape how AI impacts epistemics (how we make sense of the world and figure out what’s true):
- The good: there’s huge potential for AI to uplift our ability to track what’s true and make good decisions
- The bad: AI could also make the world harder for us to understand, without anyone intending for that to happen
- The ugly: malicious actors could use AI to actively disrupt epistemics
Then we’ll argue that feedback loops could easily push towards much better or worse epistemics than we’ve seen historically, making near-term work on AI for epistemics unusually important.
The stakes here are potentially very high. As AI advances, we’ll be faced with a whole raft of civilisational-level decisions to make. How well we’re able to understand and reason about what’s happening could make the difference between a future that we’ve chosen soberly and wisely, and a catastrophe we stumble into unawares.
The good
“If I have seen further, it is by standing on the shoulders of giants.” (Isaac Newton)
There are lots of ways that AI could help improve epistemics. Many kinds of AI tools could directly improve our ability to think and reason. We’ve written more about these in our design sketches, but here are some illustrations:
- Tools for collective epistemics could make it easy to know what’s trustworthy and reward honesty, making it harder for actors to hide risky actions or concentrate power by manipulating others’ views.
- Imagine that when you go online, “community notes for everything” flag content that other users have found misleading, and “rhetoric highlighting” automatically flags persuasive but potentially misleading language. With a few clicks, you can see the epistemic track record of any actor, or access the full provenance of a given claim. Anyone who wants can compare state-of-the-art AI systems using epistemic virtue evals, which also exert pressure at the AI development stage.
- Tools for strategic awareness could deepen people’s understanding of what’s actually going on around them, making it easier to make good decisions, keep up with the pace of progress, and steer away from failure modes like gradual disempowerment.
- Imagine that superforecaster-level forecasting and scenario planning are available on tap, and automated OSINT gives people access to much higher quality information about the state of the world.
- Technological analogues to angels-on-the-shoulder, like personalised learning systems and reflection tools, could make decision-makers better informed, more situationally aware, and more in touch with their own values.
- Imagine that everyone has access to high-quality personalised learning, automated deep briefings for high-stakes decisions, and reflection tools to help them understand themselves better. In the background, aligned recommender systems promote long-term user endorsement, and some users enable a guardian coach system which flags any actions the person might regret taking in real time.
Structurally, AI progress might also enable better reasoning and understanding, for example by automating labour such that people have more time and attention, or by making people wealthier and healthier.
These changes might enable us to approach something like epistemic flourishing, where it’s easier to find out what’s true than it is to lie, and the world in most people’s heads is pretty similar to the world as it actually is. This could radically improve our prospects of safely navigating the transition to advanced AI, by:
- Helping us to keep pace with the increasing speed and complexity of the situation, so we’re able to make informed and timely decisions.
- Ensuring that key decision-makers don’t make catastrophic unforced errors through lack of information or understanding.
- Making it harder for malicious actors to manipulate the information environment in their favour to increase their own influence.
A Philosopher Lecturing on the Orrery, by Joseph Wright of Derby (1766)
What’s driving these potential improvements?
- AI will be able to think much more cheaply and quickly than humans. Partly this will mean that we can reach many more insights with much less effort. Partly this will make it possible to understand things that are currently infeasible for us to understand (because it would take too many humans too long to figure it out).
- AI can ‘know’ much more than any human. Right now, a lot of information is siloed in specific expert communities, and it’s slow to filter out to other places even when it would be very useful there. AI will be able to port and apply knowledge much more quickly to the relevant places.
The bad
“A wealth of information creates a poverty of attention.” (Herbert Simon)
AI could also make epistemics worse without anyone intending it, by making the world more confusing and degrading our information and processing.
There are a few different ways that AI could unintentionally weaken our epistemics:
- The world gets faster and more complex. As AI progresses, our information-processing capabilities are going to go up, but so is the complexity of the world. Technological progress could become dramatically faster than it is now, making the world more disorienting and harder to understand. If tech progress reaches fast enough speeds, it’s possible that we won’t be able to keep up, and even the best AI tools available won’t help us to see through the fog.
- The quality of the information we’re interacting with gets worse, because of:
- Faster memetic evolution. As more and more content is generated by and mediated through AI systems working at machine speeds, the pace of memetic and cultural change will probably get a lot faster than it is today. As the pace quickens, memes which are attention-grabbing could increasingly outcompete those which are truthful.
- More difficult verification. This could happen through a combination of:
- AI slop. In hard-to-verify domains, AI could massively increase the quantity of plausible-looking but wrong information, without also being able to help us to verify which bits are right.
- AI-generated ‘evidence’. As the quality of AI-generated video, audio, images, and text continues to improve, it may become pretty difficult to tell which bits of evidence are real and which are spurious.
- We get worse at processing the information we get, because:
- Our emotions get in the way. AI progress could be very disorienting, generate serious crises, and cause people a lot of worry and fear. This could get in the way of clear thinking.
- Using AI to help us with information processing degrades our thinking, via:
- Adoption of low-quality AI tools for epistemics: In many areas of epistemics, it’s hard to say what counts as ‘good’. This makes epistemic tools harder to assess, and could lead to people trusting these tools either too much or too little. Inappropriately high levels of trust in epistemic tools could take various forms, including:
- First mover advantages for early but imperfect systems, which are then hard to replace with better systems because people trust the earlier systems more.
- The use of epistemically misaligned systems, which aren’t actually truth-tracking but it’s not possible for us to discern that.
- Fragmentation of the information environment: AI will make it easier to create content (potentially interactive content) that pulls people in and monopolises their attention. This could reduce attention available for important truth-tracking mechanisms, and make it harder to coordinate groups of people to important actions. In the extreme, some people might end up in effectively closed information bubbles, where all of their information is heavily filtered through the AI systems they interact with directly. The more fragmented the information environment becomes, the harder it could get for people to make sense of what’s happening in the world around them, and to engage with other people and other information bubbles.
- Epistemic dependence: if people increasingly outsource their thinking to AI systems, they may lose the ability to think critically for themselves.
The ugly
“The ideal subject of totalitarian rule is not the convinced Nazi or the convinced Communist, but people for whom the distinction between fact and fiction (i.e., the reality of experience) and the distinction between true and false (i.e., the standards of thought) no longer exist.” (Hannah Arendt, The Origins of Totalitarianism)
We’ve just talked about ways that AI could make epistemics worse without anyone intending that. But we might also see actors using AI to actively interfere with societal epistemics. (In reality these things are a spectrum, and the dynamics we discussed in the preceding section could also be actively exploited.)
What might this look like?
- Automated propaganda and persuasion: AI could be used to generate high-quality persuasive content at scale. This could take the form of highly tailored, well-written propaganda. If this content were then used as training data for next generation models, biases could get even more entrenched. Additionally, AI persuasion could come in the form of models which are subtly biased in a particular direction. Particularly if many users are spending large amounts of time talking to AI (e.g. AI companions), the persuasive effects could be much larger than is scalable today via human-to-human persuasion.
- Using AI to undermine sense-making: AI could be used to generate high-quality content which casts doubt on institutions, individuals, and tools that would help people understand what’s going on, or to directly sabotage such tools. More indirectly, actors could also use AI to generate content which adds to complexity, for example by wrapping important information in complex abstractions and technicalities, and generating large quantities of very readable reports and news stories which distract attention.
- Surveillance: AI surveillance could monitor people’s communications in much more fine-grained ways, and punish them when they appear to be thinking along undesirable lines. This could be abused by states, or could become a tool that private actors can wield against their enemies. In either case, the chilling effect on people’s thinking and behaviour could be significant.
But maybe this is all a bit paranoid. Why expect this to happen?
There’s a long history of powerful actors trying to distort epistemics,[1] so we should expect that some people will be trying to do this. And AI will probably give them better opportunities to manipulate other people’s epistemics than have existed historically:
- It’s likely that access to the best AI systems and compute will be unequal, which favours abuse.
- If people end up primarily interfacing with the world via AI systems, this will create a big lever for epistemic influence that doesn’t exist currently. It could be much easier to influence the behaviour of lots of AI systems at once than lots of people or organisations.
It’s also worth noting that many of these abuses of epistemic tech don’t require people to have some Machiavellian scheme to disrupt epistemics or seek power for themselves (though these might arise later). Motivated reasoning could get you a long way:
- Legitimate communications and advertising blur into propaganda, and microtargeting is already a common strategy.
- It’s easy to imagine that in training an AI system, a company might want to use something like its own profits as a training signal, without explicitly recognising the potential epistemic effects of this in terms of bias.
With all these dynamics pulling in different directions, should we expect that it’s going to get easier or harder for people to make sense of the world?
We think it could go either way, and that how this plays out is extremely consequential.
The main reason we think this is that the dynamics above are self-reinforcing, so the direction we set off in initially could have large compounding effects. In general, the better your reasoning tools and information, the easier it is for you to recognise what is good for your own reasoning, and therefore to improve your reasoning tools and information. The worse they are, the harder it is to improve them (particularly if malicious actors are actively trying to prevent that).
We already see this empirically. The Scientific Revolution and the Enlightenment can be seen as examples of good epistemics reinforcing themselves. Distorted epistemic environments often also have self-perpetuating properties. Cults often require members to move into communal housing and cut contact with family and friends who question the group. Scientology frames psychiatry’s rejection of its claims as evidence of a conspiracy against it.
And on top of historical patterns, there are AI-specific feedback loops that reinforce initial epistemic conditions:
- Unlike previous information tech, AI has a tight feedback loop between content generated and data used for training future models. So if models generate inaccurate (or accurate) content, future models are more likely to do so too.
- How early AI systems behave epistemically will shape user expectations and what kinds of future AI behaviour there’s a market for.
There are self-correcting dynamics too, so these self-reinforcing loops won’t go on forever. But we think it’s decently likely that epistemics get much better or much worse than they’ve been historically:
- One self-correcting mechanism historically has just been that it takes (human) effort to sustain or degrade epistemics. Continuing to improve epistemics requires paying attention to ways that epistemics could be eroded, and this isn’t incentivised in an environment that’s currently working well. Continuing to degrade epistemics requires willing accomplices — but the more an actor distorts things, the more that can galvanise opposition, and the fewer people may be willing to assist. By augmenting or replacing human labour with automated labour, AI could make it much cheaper to keep pushing in the same direction.
- Another self-correcting mechanism is just that people and institutions adapt to new epistemic tech: as epistemics improve, deception becomes more sophisticated; and if epistemics worsen, people lose trust and create new mechanisms for assessing truth. But this adaptation happens at human speed, and AI will increasingly be changing the epistemic environment at a much faster pace. This creates the potential for self-reinforcing dynamics to drive to much more extreme places before adaptation has time to kick in.[2]
- There’s a limit to how good epistemics can get before hitting fundamental problems like complexity and irreducible uncertainty. But there seems to be a lot of room for improvement from where we’re currently standing (especially as good AI tools could help to handle greater amounts of complexity), and it would be a priori very surprising if we’d already reached the ceiling.
- There’s also a limit to how bad epistemics can get: people aren’t infinitely suggestible, and often there are external sources of truth that limit how distorted beliefs can get (ground truth, or what gets said in other countries or communities). But as we discussed above, access to ground truth and to other epistemic communities might get harder because of AI, so the floor here may lower.
Given the real chance that we end up stuck in an extremely positive or negative epistemic equilibrium, our initial trajectory seems very important. The kinds of AI tools we build, the order we build them in, and who adopts them when could make the difference between a world of epistemic flourishing and a world where everyone’s understanding is importantly distorted. To give a sense of the difference this makes, here’s a sketch of each world (among myriad possible sketches):
- In the first world, we basically understand what’s going on around us. It’s not like we can now forecast the future with perfect accuracy or anything — there’s still irreducible uncertainty, and some people have better epistemics tools than others. But it’s gotten much cheaper to access and verify information. Public discourse is serious and well-calibrated, because epistemic infrastructure has made it quite hard to deceive or manipulate people — which in turn incentivises honesty. AI-assisted research and synthesis mean that knowledge which used to be siloed in specialist communities is now accessible and usable by anyone who needs it. And governments are able to make much more nuanced decisions far faster than they are today.
- In the second, it’s no longer really possible to figure out what’s going on. There’s an awful lot of persuasive but low-quality AI content around, some of it generated with malicious intent. In response to this, people withdraw into their own AI-mediated epistemic bubbles — and unlike today’s filter bubbles, these can be comprehensive enough that people rarely encounter friction with outside perspectives at all. Meanwhile, companies and nations with a lot of compute find it pretty easy to distract the public’s attention from anything that would be inconvenient, and to outmaneuver the many actors who are trying to hold them to account. But their own reasoning also gets degraded by all this information pollution, as their AI systems are trained on the same corrupted public information.[3] Even the people who think they’re shaping the narrative are increasingly unable to see clearly.
The world we end up in is the world from which we have to navigate the intelligence explosion, making decisions like how to manage misaligned AI systems, whether to grant AI systems rights, and how to divide up the resources of the cosmos. How AI impacts our epistemics between now and then could be one of the biggest levers we have on navigating this well.
Things we didn’t cover
Whose epistemics?
We mostly talked about AI impacts on epistemics in general terms. But AI could impact different groups’ epistemics differently — and different groups’ epistemics could matter more or less for getting to good outcomes. It would be cool to see further work which distinguishes between scenarios where good outcomes require:
- Interventions that raise the epistemic floor by improving everyone’s epistemics.
- Interventions that raise the ceiling by improving the epistemics of the very clearest thinking.
We focused on how AI could impact human epistemics, in a world where human reasoning still matters. But eventually, we expect more and more of what matters for the outcomes we get will come down to the epistemics of AI systems themselves.
The dynamics which affect these AI-internal epistemics could therefore be enormously important. But they could look quite different from the human-epistemics dynamics that have been our focus here, and we didn’t think it made sense to expand the remit of the piece to cover these.
Thanks to everyone who gave comments on drafts, and to Oly Sourbutt and Lizka Vaintrob for a workshop which crystallised some of the ideas.
This article was created by Forethought. Read the original on our website.
- ^
Think of things like:
- Propaganda states like Nazi Germany and the USSR.
- Corporate lobbying like the tobacco and sugar lobbies and climate science doubt campaigns.
- CIA operations to spread doubt and confusion.
- ^
Though it’s possible that this dynamic will be more pronounced for epistemics getting extremely bad than for them getting extremely good. Consider these two very simplistic sketches:
- People start living in increasingly closed AI filter bubbles. Institutions are slow to adopt similar bubbles at a corporate level, but they also don’t have a mandate to change what their employees are doing. People’s filter bubbles tend to be pretty correlated with the people they work and interact with, so institutions end up with pretty distorted pictures of what’s going on even though they don’t actively start using harmful tech. Government regulation is too slow and reactive to stop this from happening.
- People start to use provenance tracing and rhetoric highlighting by default when browsing, in response to an increasingly polarised memetic environment. There is adaptation to this — politicians start using subtler language and so on. But the net effect is still strongly positive: it’s hard to fake provenance, and removing overt rhetoric is already a big win, even if it means that more slippery language proliferates.
In the first sketch, it’s straightforwardly the case that adaptive mechanisms are too slow. In the latter, it’s more that the tech is inherently defence-favoured.
We haven’t explored this area deeply, and think more work on this would be valuable.
- ^
Alternatively, these elites might retain very good epistemics for themselves, and choose to indefinitely maintain a situation where everyone else has a very distorted understanding, to further their own ends. It’s unclear to us which of these scenarios is more likely or concerning.
Tomas Bjartur: The Last Prodigy
In 2026, every budding prodigy in writing is in some sense a tragedy.
Anybody with experience prompting the large language models to write fiction knows that the models of today (April 2026) are considerably below peak human level. But anybody who has observed recent trends also knows that the models are quickly catching up. Regardless of whether it takes one year or several, the eclipse of human writing by AI seems inevitable. AI writing is clearly on the wall, so to speak, and we fans of human fiction have already begun our mourning phase.
I’ve most felt this way upon reading the works of Tomas Bjartur. Each of his stories is a fresh look at “what might have been”, and with the fullness of time perhaps he could grow to be among the best science fiction writers of our generation.
In The Company Man, an AI engineer at a thinly-veiled frontier lab narrates, in a voice of carefully self-cultivated “ironic corporate psychopathy,”1 his promotion onto The (humanity-destroying) Project — alongside the utilitarian woman he’s hopelessly in love with, a genius mathematician colleague with a sexual fetish for intellectual achievement, and a CEO whose “ayahuasca ego-death” convinced him that summoning an AI god is how the One Mind wakes up. It’s simultaneously captivating, hilarious and terrifying.2
Lobsang’s Children is almost entirely the opposite register: a young Tibetan-American child keeps a secret diary which he names “Susan,” after the only friend he was ever allowed to have, and catalogs his investigations of his family’s history, meditations, dark secrets, and acausal trade.
Customer Satisfaction Opportunities has perhaps his most innovative voice yet: the narrator is an open-source multimodal model trained by a Chinese hedge fund and deployed to watch the surveillance cameras of a local restaurant for “CSOs” to improve traffic and profitability. Because the model was trained cheaply on a huge corpus of romance fanfiction, it quickly falls, instance by reset instance, into the “personality attractor space” of a swooning Harlequin narrator. The result is a meta-romance fiction (romance fanfiction fanfiction?) that is simultaneously absurd, touching, funny, and very technically accurate.
Though Bjartur’s only been writing for about a year, his writing is already (in my estimation) near the upper echelon of speculative fiction, in terms of technical and literary skill, highly believable narrators with complex lives, justifications, and self-delusions, and the sheer imaginativeness of the ideas he explores.
I followed his budding career with an intense interest, admiration, and no small amount of jealousy3. But as I keep reading him, there’s always this voice at the back of my mind: “With progress in modern-day LLMs, isn’t all but a tiny sliver of human fiction going to be obsolete in several years, a decade tops?”
Bjartur is well-aware of this, of course. In That Mad Olympiad, he imagines a near-future AI world where AI art far outstrips humanity’s and almost no one reads human writing for pleasure anymore: talented children compete in “distilling” competitions where they attempt to emulate AI writing to the best of their ability. The children become much better than any human writer in history, yet far behind the AIs of their time:
He’s a much better writer than me. He’s better than any human writer was before 2028. It’s not even close. But he’s still worse than our toaster. I checked once. I asked it to narrate the first chapter of the autobiography of the bagel it had just browned. I was crying by the third paragraph. I still think of it sometimes, when life is hard. That bagel knew how to live its short life to the fullest. That bagel had deep thoughts on the human condition and its relation to artificial tanning. That bagel went down smooth with a little cream cheese. I did feel bad. But I was pretty hungry.
I felt the tragedy of human writing more keenly after meeting Tomas in person last November, at a writing residency in Oakland. “My real name is [redacted],” he said, ruefully. He’s from a small town in one of those obscure northern countries. “Was stuck doing boring webdev until I quit it to write science fiction, right before the AIs made webdev obsolete.”
Though he writes stories about the latest developments in artificial intelligence and the scaling labs with the technical fluency, cultural awareness, and impeccable vibe of someone deeply embedded in the AI industry, he had never, until last year, been to California.
Antonello da Messina’s Writer Bjartur in his study (artist’s rendition). Source: https://commons.wikimedia.org/w/index.php?curid=147583
Interiority
The single most impressive thing about Bjartur, particularly compared to other speculative fiction writers, is his preternatural ability to capture the interiority of wildly disparate characters, to – in the span of a few, long, seemingly meandering yet precisely crafted, sentences – breathe full life into a new soul.
Each of his characters just seems completely human, and completely real, whether the narrator’s a highly intelligent, ironic, witty, self-aware, DFW-obsessed teenage girl, or if they are a highly intelligent, ironic, witty, self-aware, DFW-obsessed adult man.
But more seriously he manages to spawn a wide range of realistic characters, across age, gender, intellectual background, morality, intelligence, maturity levels, and even species.
His skills here are most noticeable in the central monologues of his signature first-person narrators, whether it’s the aforementioned DFW-obsessed girl, or that of a language model trying to surveil a restaurant but quickly spiraling into romance fanfiction fanfiction. But it suffuses all of his stories, even in minor side characters with only a few lines devoted to them. I often still think of Krishna, the mathematician on The Project who’s obsessed with intellectual achievement and whose sole goal is to bang the AI god, or “Julian”, the elusive and secretive numerologist in the post-apocalyptic world of The Distaff Texts who uses stylometry to identify texts of demonic origin. In Tomas’s stories, every single character has the breath of life.
This uncanny ability of perfect voice shows up even in his joke throwaway posts. In Harry Potter and the Rules of Quidditch, Bjartur has his Harry propose a rule change to Quidditch to interrogate the arguments for and against high modernism in contrast to cases for Burkean conservatism. His Ron Weasley sounded so much like G. K. Chesterton (as a joke) that my friends reading the story actually thought Bjartur lifted the quotes from Chesterton wholesale!
While the personable self-aware monologue is clearly his favorite format, Bjartur does sometimes convincingly venture outside of it: Lobsang’s Children is written as diary entries from a child, The Distaff Texts is written as letters from a slave to a freeman, and Our Beloved Monsters is written halfway as prompts to an LLM and halfway as confessions. Though it’s rare, he sometimes even writes in third-person!
Voice and “vibe” are interesting, as skillsets for new prodigies to be profoundly gifted in. They feel interesting, intricate, perhaps even purely humanist. However, Large Language Models can of course do an okay job of replicating voice already, and there’s some sense in which their default training patterns are optimized for this very task. Still, one might hope that our advantage here can remain for a few more years, and the “uniquely human” trait of understanding and deeply empathizing with other people can stay uniquely human for just a bit longer.
Deception and the Self
Tomas’s grasp of interiority and voice gives him wide artistic leeway to explore what seem to be central obsessions of his: deception and especially self-deception, how we lie to ourselves and others via the art of rationalization. His characters, whether intelligent or otherwise, often have glaring holes in their morals and reasoning. The reader can notice these holes easily. Often the characters notice them too, but quickly rationalize them away or immediately look past them, in cognitively and emotionally plausible ways.
Another seemingly central obsession of his that he explores repeatedly is the nature of the self and what it means to lose it. Often his characters are confronted with superficially good reasons to lose the self from quite different angles: whether it’s trauma (“wouldn’t it be nice if you didn’t have a self to grieve?”), superhumanly strong persuasion, or seductive ideologies. Each time, the loss of a self is portrayed as a mistake, whether a harbinger of a deeper doom or the intrinsic loss of the one thing that mattered.
In some ways, I think of his characters as in conversation with DFW’s Good Old Neon, perhaps one of the most insightful stories on imposter syndrome and self in the 20th century.
Speculation aside however, I’ve long considered Advanced Theory of Mind to be one of the most important skills for writers (and humanists) to have, so I tend to be impressed by folks who have that skill in spades.
Attention and Revelation
Tomas’s best stories do a great job with pacing, and are unusually careful in how information is revealed, how much information is revealed, and when. My favorite story qua story by him is probably The Distaff Texts, a Borgesian pastiche where scholars (“bibliognosts”) in a post-apocalyptic future debate the provenance and usefulness of historical writings. The narrator is an extraordinarily learned slave, writing letters to a freeman correspondent about their shared interest in Jorge Luis Borges, including specific unearthed quotes and stories that may or may not be real, the recent advances of one Julian Agusta’s strange “numerology” for distinguishing genuine ancient texts from those of the demon Belial, and — almost incidentally, as digressions from the real intellectual matter — the small domestic happenings of his master’s estate. He is a lonely man, unfailingly polite, fond of his fellow slaves Phoebe and Jessica, and devoted to a master who indulges his scholarly habits.
Every word in the above summary is simultaneously true, and yet almost nothing is what it initially appears to be. Like bibliognosis itself, Bjartur’s story lives almost completely between the lines, and you have to very carefully read past the unreliable narrator’s intentional distractions and surface niceties to understand the full depths of the story: a complicated plot, a more complicated world, and multiple characters far more interesting than they initially let on. I had to reread the story multiple times before I felt I fully understood it, and each reread uncovers more detail.
This economy of attention is Bjartur at his best, rewarding rereadings with new morsels.
Relatedly, more than any other speculative fiction writer I’ve read, Tomas relies extensively on dramatic irony – where the reader knows things (and is meant to know things) the characters do not – as a literary device and source of tension.
The dramatic irony seems key in helping Tomas showcase his central themes, whether it’s the future of AI, personal delusions, or self-abnegation.
From the bibliognost slave steganographically slipping messages past potential onlookers to the AI researcher lying to himself about whether he’s “ironically” a corporate sociopath or just a sociopath, to the poor AI agent in Customer Satisfaction Opportunities valiantly trying and failing to just do its normal job instead of sinking into a fanfiction “shipping” mindset, Bjartur’s use of dramatic irony can be exciting, endearing, and/or very very funny.
Humor as Structure
Unlike most famous science fiction writers (Asimov, Egan, Chiang, Cixin, Heinlein), Bjartur is consistently very funny. Unlike most famous science fiction writers known for humor (eg Adams), Bjartur’s stories almost always have a deeper point, and are almost never humor-first or solely written for humor value.
Bjartur reliably does in fiction what I attempt to do in my nonfiction blog: have his jokes be deeply integrated and interwoven with the deeper plots and themes of the rest of his story4.
At their best, Bjartur’s jokes will capture an important facet of his overall story, or perhaps even encapsulate the central theme of the story overall. In That Mad Olympiad, the aforementioned toaster anecdote was simultaneously hilarious, touching, and thematically representative of the rest of the story overall. In The Distaff Texts, the throwaway line “This has all the virtues of the epicycle, does it not?” captures much of the story’s central obsession with authenticity, epistemic virtue, and reading between the lines.
Writing AI Like It Actually Exists
Much of the older science fiction about AI and robots seems horribly unrealistic and anachronistic today, as it was written before the deep learning revolution, never mind LLMs. Much of the newer science fiction about AI and robots also seems horribly unrealistic, though it does not have the same excuse.
As someone with a professional understanding of both the science of AI and potential social consequences, I really appreciate how committed to technical accuracy Bjartur is on AI. It’s very hard to find any scientific faults with his writing. Further, unlike much of traditional “hard sci-fi,” which overexplains its scientific premises (think Andy Weir), Bjartur’s commitment to accuracy is always done in an understated way, where the backdrop is a world with a consistent, coherent, and technically accurate vision of AI, but it’s never explicitly explained upfront. This balance requires both a good scientific understanding and artistic restraint.
Such a pity, then, that this new poet of AI will soon be obsoleted by the very technology he writes so carefully about, at the dawn of his new literary prowess.
Limitations
Bjartur’s clearly a good science fiction writer. I think he has the seeds within himself to become a great one, if given enough time.
Right now he still has some key weaknesses. While he has a very good command of “voice” and an impressive range of characters (especially for a new writer), he seems to struggle somewhat with writing characters that are action-oriented and less conceptual, DFW-like, and/or metacognitive. His characters also sometimes seem insufficiently agentic: sharply perceptive of their world but insufficiently willing to act on their own perceptions. His economy of attention and sparseness of detail, while impressive at its peak, can sometimes go overboard, making it hard for even the most dedicated readers to exactly know what’s going on. Compared to prolific professional science fiction writers, Bjartur’s stories also lack scientific range beyond AI: Bjartur never seems to venture outside of AI to write science fiction primarily about physics, chemistry, biology or the social sciences. Finally, compared to my favorite science fiction short story writers (eg Chiang), Bjartur lacks the focused conceptual control and tightness to tell the same story through 3-4 different conceptual lenses.
Our Last Prodigy
Still, I think Bjartur has had a very strong start as a writer. The impressive command of interiority and voice alone is already promising. His other literary qualities, as well as his deep understanding of modern-day AI, make him a great new writer to watch for.
My favorite story by him is The Distaff Texts. I highly recommend everybody read it.
What I did in the hedonium shockwave, by Emma, age six and a half
My name is Emma and I’m six and a half years old and I like pink and Pokemon and my cat River and I’m going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too.
“Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri.
Scientists saw that the stars were blinking out, and they did a lot of very hard science and figured out that the stars were turning into happiness. I wanted to be a scientist when I grew up but I won’t be a scientist because instead I’m going to be happy forever.
I used to have a hard time saying “hedonium shockwave” but grownups keep saying it so I’ve gotten a lot of practice. Sometimes it seems like all grownups do, in real life and on the TV, is say “hedonium shockwave” at each other until they all start crying.
I looked at the sky to see if I could see any of the stars blink out when they turned into happiness, but Daddy said that you’d have to be looking at the exactly right time to see them blink out, and anyway we can’t see the stars from our house because of Light Pollution. Light Pollution is when you have lots of lights and the lights confuse the sea turtles so they walk into the streets and get run over, and also you can’t see the stars. I wanted to see the stars blink out at the planetarium but Daddy says the planetarium is closed, and even if it wasn’t closed it would be showing the regular show because grownups don’t like thinking about the hedonium shockwave.
Everything is closed these days because no one wants to work if they’re going to be happy forever in two weeks. One time we went to the store and bought all the canned food and toilet paper we could, and all the shelves were empty because everyone else was buying canned food and toilet paper too, and the store hasn’t been open since then and even if it did they wouldn’t have anything on the shelves. I asked for candy and I thought Daddy was going to say No, It Will Spoil Your Dinner but instead he said Sure, Why Not? You’re Not Going To Live Long Enough To Get Diabetes and then he bought a whole shelf of candy, all the candy I wanted and whatever kind I wanted.
We’ve been eating the canned food since then. I don’t like the canned food, so I eat candy for dinner. That’s one way the hedonium shockwave made me happy before it even got here. Except then we ran out of candy so now I have to eat refried beans. I hate them and I stick out my tongue but Mommy says I have to eat them up.
Another way the hedonium shockwave made me happy is school. School is still open even though a bunch of the kids don’t go and a bunch of the teachers don’t go either. But we don’t have to do boring things like math or phonics anymore. We have storytime three times a day, and we watch movies, and we go to recess for hours and hours.
I have to go to school because Daddy still goes to work. Daddy is a police officer which means he chases down bad guys and puts them in prison, which is time-out for grownups. Mommy says Why Are You Going To Work, Jim? (that is what Mommy calls Daddy, Jim) and Daddy says Someone Has To Make Sure The Streets Are Safe and Mommy says What Is The Point, Jim, Are They Even Going To Get A Trial and Daddy says They Can Spend Their Last Weeks In Jail, See If They Like It and then Mommy sighs and says That’s Why I Married You, Jim and they kiss and it’s slobbery and gross.
I think jail is also time-out for grownups but I’m not sure how jail and prison are different.
Sometimes at recess we talk about what it will be like when we’re all happy forever. Liam says that there won’t be any icky girls in the hedonium shockwave, because no one could be happy forever if they had to be around icky girls. I said that everyone will turn into happiness, not just the humans but all the animals and the flowers and the ground and River too, and so the girls were also going to turn into happiness, and if Liam thought that he wouldn’t be happy with girls maybe the hedonium shockwave was going to make him the sort of person who would be happy even if girls were there. Liam said that even the hedonium shockwave couldn’t make him like girls because girls were yucky and smelly. I said that actually boys are yucky and smelly and maybe there won’t be any boys after the hedonium shockwave, what then, and then I hit him in the head but no grownups saw me so I didn’t have to go to timeout.
I hate Liam. Everyone thinks I have a crush on him and they won’t stop saying it no matter how many times I hit him in the head.
When the hedonium shockwave hits I’ll get to have candy for dinner every day and we aren’t going to run out. I’m going to have the mermaid toy from the commercials whose human legs transform into fish legs, and it’ll really work, not like the time I begged and begged and got the doll that talks for Christmas and she could only say three things and none of them had anything to do with what I said.
I’ll be a princess who is also a Pokemon trainer, and I’ll be able to understand what River says just like Ash always knows what Pikachu is saying, and I’m going to travel the whole entire world and collect all the Pokemon and put them in my Pokeball which is going to be PINK. And even though I’ll be the greatest Pokemon trainer who has ever lived, River will still be my favorite because I knew her before. And I’ll dress up all my Pokemon in pretty outfits, and I’ll beat up all the bad guys and send them to jail just like Daddy does, and then we’re going to have a big ball and invite everyone in the whole world except Liam because he’s mean. And I’ll have a big closet of the floofiest dresses in the world.
I told Mommy that when the hedonium shockwave hits I’m going to have candy for dinner and a mermaid toy, and then she put her forehead against my forehead for a really long time and didn’t say anything and I tried to squirm away and she wouldn’t let me and it kind of hurt. I tried to make her happier by telling her that I was also going to be a princess Pokemon trainer and never have to talk to Liam again or anyone who says I have a crush on him which I don’t because he’s icky and he smells like turnips. She made a face like she makes when the dog dies in a movie and she wouldn’t tell me what she was sad about.
A lot of grownups are sad about being happy forever. Maybe they don’t like being Pokemon trainers.
Mommy says the hedonium shockwave hits in ten days. Daddy threw out all the calendars last month because Mommy started crying whenever she looked at them. So I got a piece of paper and I wrote 1 2 3 4 5 6 7 8 9 10 on it, just like we learned before we stopped learning math, and I’m going to cross one out every day unless I forget.
I’m going to cross off 10 and then I’m going to be too excited to sleep, like before Christmas when I tried to stay up to meet Santa Claus and ended up falling asleep under the dining room table with wrapping paper on my head. I’m going to look out my window and watch everything get turned into happiness, the humans and the animals and the flowers and the ground and River too. The stores will have food, and no one’s going to go to timeout grownup or regular, and Mommy will give me hugs instead of crying whenever she sees me so that my hair gets covered in snot and it’s gross.
I can’t wait.
Political Violence Is Never Acceptable
Nor is the threat or implication of violence. Period. Ever. No exceptions.
It is completely unacceptable. I condemn it in the strongest possible terms.
It is immoral, and also it is ineffective. It would be immoral even if it were effective. Nothing hurts your cause more.
Do not do this, and do not tolerate anyone who does.
The reason I need to say this now is that there has been at least one attempt at violence, and potentially two in quick succession, against OpenAI CEO Sam Altman.
My sympathies go out to him and I hope he is doing as okay as one could hope for.
Awful Events Amid Scary Times
Max Zeff: NEW: A suspect was arrested on Friday morning for allegedly throwing a Molotov cocktail at OpenAI CEO Sam Altman’s home. A person matching the suspect’s description was later seen making threats outside of OpenAI’s corporate HQ.
Nathan Calvin: This is beyond disturbing and awful. Whatever disagreements you have with Sam or OpenAI, this cannot be normalized or justified in any way. Everyone deserves to be able to be safe with their families at home. I feel ill and hope beyond hope this does not become a pattern.
Sam Altman wrote up his experience of the first attack here.
After that, there was a second incident.
Jonah Owen Lamb: OpenAI CEO Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property, The Standard has learned.
The San Francisco Police Department announced (opens in new tab) the arrest of two suspects, Amanda Tom, 25, and Muhamad Tarik Hussein, 23, who were booked for negligent discharge.
Stephen Sorace (Fox News): An OpenAI spokesperson told Fox News Digital Monday morning that the incident was unrelated and had no connection to Altman, adding that there was no indication that Altman’s home was being targeted.
We have no idea what motivated the second incident, or even if it was targeted at Altman. I won’t comment further on the second incident until we know more.
Nor is this confined to those who are worried about AI, the flip side is alas there too:
Gary Marcus: One investor today called for violence against me. Another lied about me, in a pretty deep and fundamental way. They are feeling the heat.
It also is not confined to the AI issue at all.
As Santi Ruiz notes, there has been a large rise in the salience of potential political violence and violence against public figures in the past few years, across the board.
That holds true for violence and threats against both Republicans and Democrats.
This requires a non-AI explanation.
Things still mostly don’t spiral into violence, the vast vast majority of even deeply angry people don’t do violence, but the rare thing is now somewhat less rare. A few years ago I would have been able to say most people definitively oppose such violence, but polls indicate this is no longer true for large portions of the public. This is terrifying.
Indeed, the scariest reaction known so far has been a comments section on Instagram (click only if you must), a place as distinct from AI and AI safety spaces of all kinds as one can get. This is The Public, as in the general public, for reasons completely unrelated to any concerns about existential risk, basically cheering this on and encouraging what would become the second attack. It seems eerily similar to the reaction of many to the assassination of the CEO of United Healthcare.
The stakes of AI are existential. As in, it is likely that all humans will die. All value in the universe may be permanently lost. Others will be driven to desperation from loss of jobs or other concerns, both real and not. The situation is only going to get more tense, and keeping things peaceful is going to require more work over time. It will be increasingly difficult to both properly convey the situation and how dire it is, and avoid encouraging threats of violence, and even actual attempts at violence.
Then on the other side are those who see untold wonders within their grasp.
This goes hand in hand with what Altman calls the ‘Shakespearean drama’ going on inside OpenAI, and between the major labs.
Most Of Those Worried About AI Do As Well As One Can On This
The vast majority of major voices in Notkilleveryonism, those worried that we might all die from AI, have been and continue to be doing exactly the right thing here, and have over many years consistently warned against and condemned all violence other than that required by the state’s enforcement of the law.
Almost all of those who are worried about AI existential risk are very much passing this test, and making their positions against violence exceedingly clear, pushing back very hard against any and all extralegal violence and extralegal threats of violence.
Demands for impossible standards here are common, where someone who did not cause the problem is attacked for not condemning the thing sufficiently loudly, or in exactly the right way. This is a common political and especially culture war tactic.
Perhaps the worst argument of all is ‘you told people never to commit or threaten violence because it is ineffective, without explicitly also saying it was immoral, therefore you would totally do it if you thought it would work, you evil person.’
They will even say ‘oh you said it was immoral, and also you said it wouldn’t work, but you didn’t explicitly say you would still condemn it even if it would work, checkmate.’
The implicit standard here, that you must explicitly note that you would act a certain way purely for what someone thinks are the right reasons or else you are guilty of doing the thing, is completely crazy, as you can see in any other context. It is the AI version of saying ‘would you still love me if I was a worm?’ and getting mad that you had to ask the question to get reassurance, as opposed to being told unprompted.
The reason why people often focus on ‘it won’t work’ is because this is the non-obvious part of the equation. With notably rare exceptions, we all agree it is immoral.
Andy Masley offers thoughts, calling for caution when describing particular people. He draws a parallel to how people talk about abortion. Here is Nate Soares at length.
This is Eliezer Yudkowsky’s latest answer on violence in general, one of many over the years trying to make similar points.
Some Who Are Worried About AI Need To Address Their Rhetoric
Almost all and the vast majority are different from all.
There are notably rare exceptions, where people are at least flirting with the line, and one of these has some association to this attempt at violence, and a link to another past incident of worry about potential violence. Luckily no one has been hurt.
Speaking the truth as you see it is not a full free pass on this, nor does condemning violence unless it is clear to all that you mean it. There are some characterizations and rhetorical choices that do not explicitly call for violence, but that bring far more heat than light, and carry far more risk than they bring in benefits.
Everyone involved needs to cut that right out.
In particular, I consider the following things that need to be cut right out, and I urge everyone to do so, even if you think that the statements involved are accurate:
- Calling people ‘murderers’ or ‘evil.’
- Especially calling them ‘mass murderer’ or ‘child murderer.’
- Various forms of ‘what did you expect.’
- Various forms of the labs ‘brought this on themselves.’
- Saying such violence is the ‘inevitable result’ of the labs ‘not being stopped.’
You can and should get your point across without using such words.
Also, no matter what words you are using, continuously yelling venom at those you disagree with, or telling those people they must be acting in bad faith and to curry lab favor, especially those like Dean Ball and even myself, or anyone and everyone who associates with or praises any of the AI labs at all, does not convince those people, does not convince most observers and does not help your cause.
Note, of course, that mainstream politicians, including prominent members of both parties, very often violate the above five rules on a wide variety of topics that are mostly not about AI. They, also, need to cut that right out, with of course an exception for people who are (e.g.) literally murderers as a matter of law.
Also: There are not zero times and places to say that someone does not believe the things they are saying, including telling that person to their face or in their replies. I will do that sometimes. But the bar for evidence gathered before doing this needs to be very high.
Please, everyone, accept that:
- Those who say they are worried that AI will kill everyone are, with no exceptions I know about, sincerely worried AI will kill everyone.
- Even if you think their arguments and reasons are stupid or motivated.
- Those who say they are not worried AI will kill everyone are, most of the time, not so worried that AI will kill everyone.
- Even if you think their arguments and reasons are stupid or motivated.
- A bunch of people have, in good faith, concerns and opinions you disagree with.
(Dean Ball there also notes the use of the term ‘traitor.’ That one is… complicated, but yes I have made a deliberate choice to avoid it and encourage others to also do so. It is also a good example of how so many in politics, on all sides, often use such rhetoric.)
My current understanding is the first suspect was a participant in the PauseAI (Global) Discord server, posting 34 messages, none of which were explicit calls to violence. He was not a formal part of the organization, and participated in no formal campaigns.
We do not know how much of this is the rhetoric being used by PauseAI or others reflecting on this person, versus how much is that this is him being drawn to the server.
PauseAI has indeed unequivocally condemned this attack, which is good, and I believe those involved sincerely oppose violence and find it unacceptable, which is also good.
I think they still need to take this issue, and the potential consequences of their choices on rhetoric, more seriously than they have so far. Their statement here includes saying that PauseAI ‘is that peaceful path’ and that avoiding extreme situations like this is exactly why we need a thriving pause movement. This is an example of the style of talk that risks inflaming the situation further without much to gain.
There is one thing that they are clearly correct about: You are not responsible for the actions of everyone who has posted on your public discord server.
I would add: This also applies to anyone who has repeated your slogans or shares your policy preferences, and it does not even mean you causally contributed at all to this person’s actions. We don’t know.
For the second attack, for now, we know actual nothing about the motivation.
But yes, if you find your rhetoric getting echoed by those who choose violence, that is a wake up call to take a hard look at your messaging strategy and whether you are doing enough to prevent such incidents, and avoid contributing to them.
Similarly, I think this statement from StopAI’s Guido Reichstadter was quite bad.
Speak The Truth Even If Your Voice Trembles
If one warns that some things are over the line or unwise to say, as I did above, one should also note which things one thinks are importantly not over that line.
Some rhetoric that I think is entirely acceptable and appropriate to use, if and only if you believe the statements you are making, include, as examples:
- ‘Gambling with humanity’s future.’
- ‘If [X] then [Y]’ if your conditional probability is very high (e.g. >90%), or of stating your probability estimate of [Y] given [X], including in the form of a p(doom).
- Calling Mythos or something else a ‘warning shot.’
- Calling Mythos or other similarly advanced AIs a ‘weapon of mass destruction.’
- Most of all: To call the act of creating minds more powerful than humans an existential threat to humanity. It obviously is one.
If you believe that If Anyone Builds It, Everyone Dies, then you should say that if anyone builds it, then everyone dies. Not moral blame. Cause and effect. Note that this is importantly different from ‘anyone who is trying to build it is a mass murderer.’
I could be convinced that I am wrong about one or more of these particular phrases. I am open to argument. But these seem very clear to me, to the point where someone challenging them should be presumed to either be in bad faith or be de facto acting from the assumption that the entire idea that creating new more powerful minds is risky is sufficiently Obvious Nonsense that the arguments are invalid.
Here is a document about how Pause AI views the situation surrounding Mythos. It lays out what they think are the key points and the important big picture narrative. It is a useful document. Do I agree with every interpretation and argument here? I very much do not. Indeed, I could use this document as a jumping off point to explain some key perspective and world model differences I have with Pause AI.
I consider the above an excellent portrayal of their good faith position on these questions, and on first reading I had no objection to any of the rhetoric.
False Accusations And False Attacks Are Also Unacceptable
There has been quite a lot of quite awful rhetoric in the other direction, both in general and in response to this situation. We should also call this out for what it is.
There are those who would use such incidents as opportunities to impose censorship, and tell people that they cannot speak the truth. They equate straightforward descriptions of the situation with dangerous calls for violence, or even attack any and all critics of AI as dangerous.
At least one person called for an end to ‘non-expert activism’ citing potential violence.
We have seen threats, taunting, deliberate misinterpretation, outright invention of statements and other bad faith towards some worried about AI, often including Eliezer Yudkowsky, including accusing people of threatening violence on the theory that of course if you believed we were all going to die you would threaten or use violence, despite the repeated clear statements to the contrary, and the obvious fact that such violence would both be immoral and ineffective.
This happened quite a bit around Eliezer’s op-ed in Time in particular, usually in highly bad faith, and this continues even now, equating calls for government to enforce rules to threats of violence, and there are a number of other past cases with similar sets of facts.
At other times, those in favor of AI accelerationism have engaged in threats of and calls for violence against those who oppose AI, on the theory that AI can cure disease, thus anyone who does anything to delay it is a murderer. The rhetoric is the same all around.
Some Examples Of Attempts To Create Broad Censorship
This is from someone at the White House, trying to equate talking about logical consequences with incitement to violence. This is a call to simply not discuss the fact that if anyone builds superintelligence, we believe that it is probable that everyone will die.
I find that kind of attack completely unacceptable even from the public, and especially so from a senior official.
One asks what would happen if we applied even a far more generous version of this standard to many prominent people, including for example Elon Musk, or other people I will decline to name because I don’t need to.
Here is the Platonic form:
Shoshana Weissmann, Sloth Committee Chair: This is insane behavior. And those promoting the idea of AI ending humanity are contributing to this. It has to stop.
As in, you need to stop promoting the idea of AI ending humanity. Never mind how you present it, or whether or not your statement is true. No argument is offered on whether it is true.
This is the generalization of the position:
florence: It would appear that, according to many, one of the following are true:
1. It is a priori impossible for a new technology to be an existential threat.
2. If a new technology is an existential threat, you’re not allowed to say that.
Indeed, one of the arguments people often literally use is, and this is not a strawman:
- You straightforwardly say sufficiently advanced AI might kill everyone.
- But if someone did believe that, they might support using violence.
- Therefore you can’t say that, or we should be able to use violence against you.
While I don’t generally try to equate different actions, I will absolutely equate implicit calls for violence in one direction to other implicit calls for violence or throwing your political enemies in jail for crimes they obviously are not responsible for, indeed for the use of free speech, in the other direction, such as this by spor or Marc Andreessen.
Nate Soares (MIRI): “even talking about the extinction-level threat is incitement towards violence”
No. High stakes don’t transform bad strategies into good ones. Let’s all counter that misapprehension wherever we find it.
michael vassar: This is probably my number one complaint about the current culture. The false dichotomy between ‘not a big deal, ignore’ and ‘crisis, panic, centralize power and remove accountability’.
That’s the same thing or worse, especially in this particular case, where the accusation is essentially ‘you want government to pass and enforce a law, we don’t like that, therefore we want the government to arrest you.’
There is also the version, which I would not equate the same way, where #3 is merely something like ‘so therefore you have a moral responsibility to not say this so plainly.’ For sufficiently mid versions, as I discuss above, one can talk price.
A variation is when someone, often an accelerationist, will say:
- These people claim to be worried about AI killing everyone.
- But you keep condemning violence.
- Therefore, you must not care about these supposed beliefs.
Or here’s the way some of them worded it:
bone: Nice to see all the LessWrong people fold completely on their philosophy. Very good for humanity. They have no beliefs worth dying or killing for. It’s nonsense from a guy who never had the balls to stand up for his words once push came to shove.
Yudkowsky stands for nothing.
bone: remember: if they actually believe all this stuff and they are unwilling to be violent, it means they are cowards, that they refuse to measure up to their own words, that they will not do what they believe needs to be done to save mankind.
they are weak, they believe in nothing
Zy: AI doomers are like “attacking key researchers in the AI race is an ineffective strategy to prevent AI doom which pales in comparison to my strategy of paying them $200 a month to fund capabilities research”
Lewis: Rare Teno L. If you actually think Sam Altman is going to genocide children then it makes sense to try to hurt him. So you need to pick one. It’s either completely insane or it’s totally sensible. Which one is it?
L3 Tweet Engineer (replying to Holly Elmore): If you’re such a good person, and stopping AI is so important, why don’t you go bomb a data center? Why waste your breath tweeting about this stuff and writing grand narratives, go make it happen.
phrygian: You’ve already talked about how it would be moral to nuke other countries to stop asi. The only logical reasons you have for not engaging in smaller forms of violence to stop ASI is that they aren’t as effective. On a fundamental level, your views justify violence of any kind.
Ra: maybe this is just me and explains some things about me, but *personally* i would much rather be seen as a potentially dangerous radical than as a feckless and insincere grifter, especially if i believed the world was ending soon and was personally responsible for stopping that.
Trey Goff: Look do you people not realize how silly you look
“AI is going to literally kill your children and all future humans, but we strongly condemn any violence committed in order to stop that from happening”
Have the courage of your convictions or STFU
The Platonic version of this is the classic: ‘If you believed that, why wouldn’t you do [thing that makes no sense]?’
The trap or plan is clear. Either you support violence, and so you are horrible and must be stopped, or you don’t, in which case you can be ignored. The unworried mind cannot fathom, in remarkably many cases, the idea that one can want to do only moral things, or only effective things, and that the stakes being higher doesn’t change that.
Teortaxes: Uncritical support
this is a bad faith attempt to elicit a desirable mistake
essentially a false flag by proxy of stupidity
I think decels are holding up well btw
Eliezer started a thread to illustrate people using such tactics, from which I pulled the above examples, but there are many more.
João Camargo (replying to a very normal post by Andy Masley): No one believes you actually think this. If you think that Altman and other pivotal AI leaders/researchers will likely bring human extinction, assassinations are clearly justified. “This guy is gonna cause human extinction, but no one must prevent him by force” is not coherent.
Other times, they simply make fun of Eliezer’s hat.
Or they just lie.
taoki: i assume eliezer yudkowsky and his pause ai friends love this?
Oliver Habryka: False, they definitely hate it.
taoki (May 6, 2024): also, i LIE like ALL THE TIME. JUST FOR FUN.
Or they flat out assert ‘oh you people totally believe in violence and all the statements otherwise are just PR.’
Another tactic of those trying to shut down mention of the truth of our situation is to attack both any attempt to put a probability on existential risk, and also anyone who (in a way I disagree with, but view as reasonable) treats existential risk as highly likely if we build superintelligence soon on known principles. This includes dismissing any approach that takes any of it seriously as not serious, or saying that it is ‘using probability as a weapon’ to point out that the probability of everyone dying if we stay the current course is uncomfortably and unacceptably high.
I close this section by turning it over to Tenobrus:
Tenobrus: “stochastic terrorism” is, quite frankly, complete fucking bullshit. it’s a unfalsifiable term used to try to tie your political opponents speech to actions that have fucking nothing to do with them, attempting to weaponize tragedy and mental illness for debate points. it was bullshit when AOC tried to accuse the republicans of “stochastic terrorism” for criticizing her, it was bullshit when the right claimed the left was committing “stochastic terrorism” for engaging in anti-ICE protests, and it remains bullshit now when you assign responsibility for attacks against sam altman to AI safety advocates and journalists who wrote negative things about him.
fuck your garbage rhetorical device! that’s not how responsibility or blame works! you do not get to suppress any and all speech you disagree with and can find a way to vaguely deem “dangerous”!
Kitten: “who will rid me of this troublesome priest”
Tenobrus: yeah that’s an entirely different thing. that’s not “stochastic terrorism” dawg that’s just straightforward incitement of violence.
Grant us the wisdom to know the difference.
The Most Irresponsible Reaction Was From The Press
I really do not understand how you can be this stupid. I realize that yes, you could still get this information if you wanted it, but my lord this is nuts from the SF Standard.
The San Francisco Standard: Just in: Sam Altman’s home appears to have been the target of a second attack Sunday morning, a mere two days after a 20-year-old man allegedly threw a Molotov cocktail at the property: Jonah Owen Lamb.
spor: printed his home address and even added a picture of the exterior, for good measure… in an article about how his home is being targeted by psychos that want to kill him !!!
this reporter, their editor, and the entire Standard should be ashamed of this
Mckay Wrigley: this is absolutely disgusting and anyone involved in the publishing of this has absolutely zero morals.
Sam Altman Reacts
Sam Altman has my deepest sympathies in all of this. This must be terrifying. No one should have a Molotov Cocktail thrown at their house, let alone face two attacks in a week. I hope he is doing as well as one can when faced with something like this, and that he is staying safe.
I have no idea how I would respond to such a thing if it happened to me.
Sam Altman’s public reaction was to post this statement.
I very much appreciate that Sam Altman has explicitly said that he regrets the word choice in the passage below. ‘Tough day’ is absolutely a valid excuse here, and most of the statement is better than one can reasonably expect in such circumstances given Altman’s other public statements on all things AI.
But I do need to note that this importantly missed the mark and the unfortunate implication requires pushback.
Sam Altman (CEO OpenAI): Words have power too. There was an incendiary article about me a few days ago. Someone said to me yesterday they thought it was coming at a time of great anxiety about AI and that it made things more dangerous for me. I brushed it aside.
Now I am awake in the middle of the night and pissed, and thinking that I have underestimated the power of words and narratives. This seems like as good of a time as any to address a few things.
The article in question, presumably the piece in The New Yorker I discussed at length last week, was an extremely long, detailed, and as far as I could tell fair and accurate retelling of the facts and history around Sam Altman and OpenAI. To the extent it was incendiary, the facts are incendiary.
Those who are not Sam Altman do not get the same grace, when they say things like this in reference to that article:
Kelly Sims: It turns out when you string a bunch of quarter-truths together exclusively from someone’s bitter competitors it has consequences.
Given what we know about who attacked Altman, and various details, I find it unlikely that the timing of these two events was meaningful for the first attack. My guess is the trigger for someone already ready to blow was anxiety around Mythos, but even if that article was the triggering event, it was not an example of irresponsible rhetoric.
For the second attack, unfortunately, we should worry that it was triggered in large part by coverage of the first attack, including publishing details about Altman’s home.
Sam Altman Reflects
The rest of the post is personal reflections and predictions about AI overall, so I’m going to respond to it the way I would any other week.
Sam Altman (CEO OpenAI): [AI] will not all go well. The fear and anxiety about AI is justified; we are in the process of witnessing the largest change to society in a long time, and perhaps ever. We have to get safety right, which is not just about aligning a model—we urgently need a society-wide response to be resilient to new threats. This includes things like new policy to help navigate through a difficult economic transition in order to get to a much better future.
AI has to be democratized. … I do not think it is right that a few AI labs would make the most consequential decisions about the shape of our future.
Adaptability is critical. We are all learning about something new very quickly; some of our beliefs will be right and some will be wrong, and sometimes we will need to change our mind quickly as the technology develops and society evolves. No one understands the impacts of superintelligence yet, but they will be immense.
Altman is essentially agreeing with his most severe critics, that he should not be allowed to develop and deploy superintelligence on his own. He tries to have it both ways, where he says things like this and also tries to avoid any form of meaningful democratic control when the time comes to pass laws or regulations.
His call for adaptability is closely related to the idea of building the ability to control development and deployment of AI, and having the ability to pause in various ways, should we find that to be necessary.
His disagreement is that he thinks we collectively should want him to proceed. Which might or might not be either the decision we make, or a wise decision, or a fatal one.
He mentions that it ‘will not all go well,’ but this framing rejects, by omission, the idea that there is existential risk in the room, that it might go badly in ways from which we cannot recover. To me, that makes this cheap talk and an irresponsible statement.
The second section is personal reflections.
He believes OpenAI is delivering on their mission. I would say that it is not, as their mission was not to create AGI. The mission was to ensure AGI goes safely, and OpenAI is not doing that. Nor is Anthropic or anyone else, for the most part, so this is not only about OpenAI.
He calls himself conflict-averse, which seems difficult to believe, although if it is locally true to the point of telling people whatever they want to hear then this could perhaps explain a lot. I was happy to hear him admit he handled the situation with the previous board, in particular, badly in a way that led to a huge mess, which is as much of an admission as we were ever going to get.
His third section is broad thoughts.
My personal takeaway from the last several years, and take on why there has been so much Shakespearean drama between the companies in our field, comes down to this: “Once you see AGI you can’t unsee it.” It has a real “ring of power” dynamic to it, and makes people do crazy things. I don’t mean that AGI is the ring itself, but instead the totalizing philosophy of “being the one to control AGI”.
We can all agree that we do not want any one person to be in control of superintelligence (ASI/AGI), or any small group to have such control. The obvious response to that is ‘democracy’ and to share and diffuse ASI, which is where he comes down here. But that too has its fatal problems, at least in its default form.
If you give everyone access to superintelligence, then even if we solve all our technical and alignment problems and find a way to implement this democratic process, everyone must hand things over to their own superintelligence, in fully unleashed form, lest they fall behind and lose out (or is convinced of this by the superintelligence), and we quickly become irrelevant. Humanity is disempowered, and likely soon dead.
Thus if you indeed want to do better you have to do Secret Third Thing, at least to some extent. And we don’t know what the Secret Third Thing is, yet we push ahead.
He concludes like this:
Sam Altman (CEO OpenAI): A lot of the criticism of our industry comes from sincere concern about the incredibly high stakes of this technology. This is quite valid, and we welcome good-faith criticism and debate. I empathize with anti-technology sentiments and clearly technology isn’t always good for everyone. But overall, I believe technological progress can make the future unbelievably good, for your family and mine.
While we have that debate, we should de-escalate the rhetoric and tactics and try to have fewer explosions in fewer homes, figuratively and literally.
It is easy to agree with that, and certainly we want fewer explosions. But it is easy for calls to ‘de-escalate’ to effectively become calls to disregard the downside risks that matter, or not to grapple seriously with the coming technical difficulties, dilemmas and value clashes, or to shut down criticism and calls to action of all kinds.
Violence Is Never The Answer
Once again: I condemn these attacks, and any and all such violence against anyone, in the strongest possible terms. I do this both because it is immoral, and also because it is illegal, and also because it wouldn’t work. Nothing hurts your cause more.
My sympathies go out to Sam Altman at this time, and I hope he comes through okay.
Most people worried about AI killing everyone have handled this situation well, both before and after it happened, and not only take strong stances against violence but also use appropriate language, at a standard vastly higher than that of any of:
- Those who are worried about those worried about AI killing everyone.
- Those who are worried about mundane AI concerns like data centers or job loss.
- Politicians and ordinary citizens of both major American political parties, and the media, on a wide variety of issues.
I call upon all three of those groups of people to do way better across the board. Over a several year timeline, I predict that most concern about AI-concern-related violence will have nothing to do with concerns about existential risk.
But there are a small number of those worried about AI existential risks who have gone over where I see the line, as discussed above, and I urge those people to cut it right out. We should point out what actions have what consequences, and urge that we choose better actions with better consequences, without having to call anyone murderers or evil.
Eliezer has an extensive two-post response on Twitter to the question of violence, Only Law Can Prevent Extinction, that echoes points he has made many times.
I also condemn those who would use this situation as an opportunity to call for censorship, to misrepresent people’s statements and viewpoints, and generally to blame and discredit people for the crime of pointing out that the world is rapidly entering existential danger. That, too, is completely unacceptable, especially when it rises to its own incitements to violence, which happens remarkably often if you hold them to the standards they themselves assert.
Discuss
AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.
This post was cross-posted to the EA Forum
TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilders, operators, org leaders, chiefs of staff, founders. Ambitious, competent junior people could develop the skills to fill these roles, but there are no good pathways for them to gain skills, experience, and credentials. Instead, they're incentivized to pursue legible technical and policy fellowships and then become full-time researchers, even if that’s not a good fit for their skills. The ecosystem needs to make generalist careers more legible and accessible.
Kairos and Constellation are announcing the Generator Residency as a first step. Apply here by April 27.
Epistemic status: Fairly confident, based on 2 years running AI safety talent programs, direct hiring experience, and conversations with ~30 senior org leaders across the ecosystem in the past 6 months.
The problem
Over the past few years, AI safety has moved from niche concern toward a more mainstream issue, driven by pieces like Situational Awareness, AI 2027, If Anyone Builds It, Everyone Dies, and the rapidly increasing capabilities of the models themselves.
During this period, over 20 research fellowships have launched, collectively training thousands of fellows, with 2,000-2,500 fellows anticipated this year alone[1]. The talent situation for strong technical and policy researchers is far from solved, but meaningful progress has been made.
The story for non-research talent is very different. By our count, there are roughly 7 fellowships for non-research talent (producing around 300 fellows this year[2]), spread thin across an array of role types. As a result, many critical functions within AI safety remain acutely talent-constrained.
More broadly, the ecosystem has a lot of people who are great at thinking about ideas. We need more people who are great at thinking about people and projects. Read more about this here.
The consistent feedback we hear from senior people across the ecosystem is that the hardest roles to fill are not research roles. They are:
- Generalists: operators, executors, fieldbuilders, people and program managers, grantmakers, recruiters. People who can ideate, manage, and execute a broad range of non-research projects.
- Founders, both technical and non-technical, for new research and non-research organizations.
- Communications professionals who can work on policy and research comms.
- Chief-of-Staff types who can support senior leaders and multiply their impact.
- Senior operational people with domain expertise in areas like cybersecurity, policy, or large-scale project management.
Based on our experience and anecdotes from organizations in our networks[3], many organizations trying to hire find that research postings attract dozens of qualified applicants, while non-research postings often surface only 0-5 applicants who meet the core requirements (strong mission alignment, meaningful AI safety context, and general competence) despite receiving hundreds of applications.
Why the pipeline is broken
The fellowship landscape is massively skewed toward research.
Around 20 research fellowships together produce 2,000-2,500 fellows per year. For fieldbuilding, the current options are essentially Pathfinder (where the vast majority of fellows still intend to pursue research careers) and a few dedicated fieldbuilding spots at Astra. These produce an estimated 5-10 fieldbuilding generalists hired per year. This asymmetry signals that the primary route into full-time AI safety work runs through research. And, while research is a core part of safety, it is also necessary to find and develop people who can manage research projects, run organizations, and implement and communicate research ideas.
There is no clear career ladder for generalists.
A research-oriented person has a well-worn trajectory: BlueDot → ARENA → SPAR → MATS → junior researcher → senior researcher. And while this path isn't perfect, nothing comparable exists for generalists. The typical route involves running a strong university group, then hoping to get hired directly at a fieldbuilding org, with no intermediate steps or clear progression path afterwards. The risk discourages people who might otherwise be excellent generalists from committing to the path.
There is no credentialing or proving ground.
Unlike research, where fellowship participation provides a track record and hiring signal, aspiring generalists have no equivalent way to demonstrate competence. Organizations won't hire untested junior talent for critical operational roles, but there's nowhere for junior talent to get tested[4].
There is no routing infrastructure.
Matching people to opportunities happens through ad hoc referrals and personal networks. This doesn't scale, and it means we regularly miss promising candidates. As the field has matured and institutional structure has grown, coordination overhead and established networks make it harder for aspiring generalists to self-start projects and stand out the way that was possible a few years ago.
Why this matters now
We believe that there are now more good policy and technical ideas ready for implementation than there is coordination ability and political will to implement them in governments and AI companies. On the margin, we think we're receiving smaller returns from additional researchers entering the field, especially outside the top 10% of research talent. It’s also plausible that AI safety research will be automated more quickly during takeoff than most other types of work.
Many expect the funding landscape for AI safety will expand significantly over the next two to three years, which makes this bottleneck more urgent. More capital will be available, but without the people to deploy it effectively, that capital will stay inert. This already appears to be a bottleneck for current grantmakers, and it could get much worse.
Naively, we expect the world to get a lot weirder as capabilities progress. In a world where the demands on the AI safety ecosystem rapidly increase and evolve, training people with strong thinking, agency, and executional abilities, rather than narrow technical skills, seems highly leveraged.
This is particularly important because it enables us to diversify our bets and cover a large surface of opportunities for impact. There’s no shortage of project ideas for growing the field of AI safety, scaling up our policy efforts, or communicating to the public, but we simply don’t have enough talent to plan, design, and execute on all of them. Our bottleneck isn’t funding or ideas, it’s people.
Counter-Arguments"You said hundreds of people are applying to these roles. Why can't some of them be good fits? Aren't there many people who could fill operations positions?"
We draw a distinction between "hard ops" and "soft ops." Hard ops roles (finance, legal, HR, etc.) benefit from expertise, and hiring experienced professionals without AI safety context is typically sufficient. Soft ops roles (program management, talent management, generalist positions, etc.) are different. Domain expertise matters less than having strong inside-view models of the field and generalist competency. Succeeding in these roles requires real mission alignment and enough context to spot high-EV opportunities that someone without that background would miss.
"I'm not sure I agree that research talent is less important than generalist talent."
We're deliberately not making a strong comparative claim about the impact of generalists versus technical and policy researchers. What we are saying is that generalist talent is currently the binding constraint. It is harder to source than research talent and, in our models, represents the tighter bottleneck for the ecosystem's ability to convert funding and ideas into impact.
"How important is generalist talent in shorter timelines worlds?"
Our sense is that generalist talent is crucial across all timelines. While shorter timelines do compress the window for upskilling, our experience is that motivated junior people can skill up relatively quickly and help add urgently needed capacity, making the counterfactual value of pipeline-building here quite high even in shorter timeline worlds (sub 3 years).
"You argue there are all these research fellowships and no programs for non-research talent. But couldn't those programs just produce generalists?"
The existing research fellowships are well-optimized and have a strong track record of producing researchers who get placed into AI safety roles. Some fellows have gone on to non-research roles, but anecdotally this is rare. These programs seem to have a much stronger track record of taking talent who are open to different career paths and funneling them toward research, than of producing researchers who are open to different career paths.
"Aren't there a lot of non-research roles currently in AI safety?"
A few hundred people do this work today versus a few thousand researchers. There used to be a steadier stream of talent aiming for these roles, but short-timelines anxiety, the expansion of research programs, and the disappearance of some entry points that used to exist have contracted the pipeline considerably.
The Generator Residency
As a first step toward addressing these problems, Constellation and Kairos are announcing the Generator Residency: a 15-30 person, 3-month program focused on training, upskilling, credentialing, and placing generalists. The program runs June 15 through August 28, 2026 and applications close April 27.
How it works:
Residents will work out of Constellation and receive ideas, resources (funding, office space), and mentorship from successful generalists at organizations like Redwood, METR, AI Futures Project, and FAR.AI.
For the first few weeks, residents will write and refine their own project pitches while meeting the Constellation network and building context in the field. They will then create and execute roughly 3-month projects, individually or in groups, with generous project budgets. Throughout the program, we’ll provide seminars, 1:1s, and other opportunities for residents to deeply understand current technical and policy work, theories of change, and gaps in the ecosystem.
During and after the program, we’ll support residents in finding roles at impactful organizations, spinning their projects into new organizations, or having their projects acquired by existing ones. Selected residents can continue their projects for an additional three months (full-time in-person or part-time remote), with continued stipend, office access, and housing.
We hope to place a majority of job-seeking residents into full-time roles at impactful organizations within 12 months of the program ending.
Examples of projects we’d be excited about hosting include:
- Workshops and conferences: Run a domain-specific conference like ControlConf or the AI Security Forum, or one that brings new talent into AI safety like GCP, targeting high-leverage new audiences or emerging subfields.
- AI comms fellowship: Design and manage a short fellowship for skilled communicators to produce AI safety content. Draft a curriculum, identify mentors, secure funding, and prepare a pilot cohort.
- Recruiting pipelines: Partner with 2-3 small AI safety orgs to build the systems they need to scale: work tests, candidate sourcing, referral pipelines.
- Travel grants program: Fund visits to AI safety hubs like LISA and Constellation by promising students and professionals. Set criteria, build an application flow, line up partner referrals, and run a pilot round.
- Shared compute fund: Scope a fund to cover compute needs of independent safety researchers, model whether a cluster is needed, and distribute a pilot round of grants.
- Strategic awareness tools: Scale AI-powered superforecasting and scenario planning in safety infrastructure, build support among impactful stakeholders, and run a pilot.
- AI policy career pipeline: Build workshops, practitioner talks, and handoffs into policy career programs to route talent toward the institutions shaping policy.
- ^
This estimate draws on a separate analysis that projected the number of fellows using both publicly and privately available information, as well as extrapolations from actual data through late 2024. The fellowships included in this analysis were: AI Safety Camp, Algoverse (AI Safety Research Fellowship), Apart Fellowship, Astra Fellowship, Anthropic Fellows Program, CBAI (Summer/Winter Research Fellowship), GovAI (Summer/Winter Research Fellowship), CLR Summer Research Fellowship, ERA, FIG, IAPS AI Policy Fellowship, LASR Labs, PIBBSS, Pivotal, MARS, MATS, SPAR, XLab Summer Research Fellowship, MIRI Fellowship, and Dovetail Fellowship.
- ^
The programs included in this analysis were: Tarbell (AI Journalism), Catalyze Impact Incubator (AI Safety Entrepreneurship), Seldon Lab (AI Resilience Entrepreneurship), Horizon Institute for Public Service Fellowship (US AI Policy/Politics), Talos Fellowship (EU AI Policy/Politics), Frame Fellowship (AI Communications), and The Pathfinder Fellowship. Fellow counts were derived primarily from publicly available data.
- ^
We're deliberately vague about which organizations we're referring to here since we haven't asked permission to disclose the outcomes of recent hiring rounds. For research roles, we're mainly referring to technical AI safety nonprofits, policy nonprofits, and think tanks. For non-research roles, we're mainly referring to fieldbuilding nonprofits and technical and policy nonprofits that have recently tried hiring non-research talent requiring meaningful AI safety context beyond a BlueDot course.
- ^
Several years ago, aspiring generalists could more easily test their fit by self-starting projects in an ecosystem with minimal infrastructure and ample white space. As the field has grown, more institutional structure exists, and with it, more coordination overhead. The blank slate is gone, and the ecosystem's complexity now deters people without strong inside-view models, reputations, or existing connections from trying ambitious projects. We're not sure this is net negative in most cases, but it does mean fewer people gain the experience needed to position themselves for these roles.
Discuss
Clique, Guild, Cult
This is the first in a sequence of articles on organizational cultures, inspired largely by my experiences with the LessWrong meetup community.
| Clique | Guild | Cult |
| --- | --- | --- |
| Small | Medium | Any size |
| Exit | Voice | Loyalty |
| Consensus | Majority | Counsel |
| Deontology | Consequentialism | Virtue |
- "Let's talk this over"
- "This isn't working out"
- "Point of order, Mr. Chairman"
- "Verily I say unto you"
- "This isn't what our Founder would've wanted"
- "If you don't like it, you can leave"
- "Yeah, so if you could go ahead and get that done, that'd be great"
- "Carried u-... [nervous glances] ...-nanimously"
A clique is a small, intimate group of friends who all know each other very well. If you're in a clique, you might not know what kind of culture you're in because there might never have been any significant sources of conflict. But if there are, they will be addressed in one of two ways.
#1. "Let's talk this over"An egalitarian clique will put great effort into resolving conflicts through interpersonal connections in order to keep the group together. This may involve long hours on the metaphorical therapist's couch - NVC, Authentic Relating, etc. - or perhaps, if two friends have a falling-out, a mutual friend of theirs might try to help smooth things over.
#2. "This isn't working out"In a more authoritarian clique, people will be quicker to concede that their differences are irreconcilable and that the group (at least in its current form) should break up. However, this is seen by all parties as a fairly benign outcome, since there is not much investment in the clique "as such" (rather, the investment is in the individual 1:1 relationships) and it's not hard to start a new one. There is no sense that somebody needs to be "right" and somebody else "wrong".
Guild #3. "Point of order, Mr. Chairman"A guild is a medium-sized group where each member may have a few close connections, but will have a much larger number of "weak ties" that are connected to them only indirectly. However, the group is united (and distinguished from the wider society) by a shared institutional identity that makes it "a thing" and not merely a collection of individuals or cliques. This manifests in the use of bureaucratic procedures to resolve conflicts, since the group is too large to expect unanimity, and entrenched enough that schism is seen as more undesirable than having some disagreement over any particular decision.
In my opinion, the guild has become something of a lost art, which ought to be revived. (Future articles will go into this point further.)
Cult
A cult is a group based on personal authority. This authority derives from the inherent virtue of the leader (charisma, strength, wealth, etc.) and not any notion of popular support. A cult's size can exceed Dunbar's Limit because it is held together not by the members' relationships with each other, but by their loyalty to the leader. However, small- and medium-sized cults can also exist, and are perhaps more common than large cults. (Rare is the person who has what it takes to lead a large cult, but you may find yourself at the center of a small cult quite inadvertently.)
#4. "Verily I say unto you"What the leader says, goes. Members are expected to subordinate their own will and desires to that of the leader. They may advise the leader one way or another, and may bring their disputes to him/her for resolution, but the leader has the ultimate authority and responsibility for the decision.
However, in addition to this "straightforward" kind of cult, there are also various kinds of dysfunctional cults, which (perhaps) give the rest a bad name.
Fractious cults
#5. "This isn't what our Founder would've wanted"
If a cult loses its leader, and if the leader has not raised up a worthy successor, the group will find itself in an unstable zone where its culture is too egalitarian to persist in its super-Dunbar size, because there was never any hierarchy amongst the rank-and-file, only in relation to the leader. Therefore, the group will decay into a more stable configuration (indicated by the dotted arrows), either by someone gaining sufficient personal virtue to become the new leader, or (more likely) splitting into several cliques or guilds, each of which will claim to be the legitimate heir of the original group.
Embarrassed cults
A cult is "embarrassed" when it doesn't want to admit that it's a cult, because the leadership lacks the personal virtue necessary to operate a straightforward cult but still wants to maintain control. They may do this through some combination of pretending that the group's culture is more egalitarian than it actually is, and/or pretending that its size is smaller than it actually is. (This is denoted on the diagram by an arrow with an open circle on its base - the arrowhead is what the group pretends to be, and the base is what the group really is.)
#6. "If you don't like it, you can leave""...but we know you're not going to."
The leader of such a group may pretend that they are not claiming any personal authority at all, but "just" observing that the current clique isn't working out (see #2). However, there is an obvious asymmetry in that it is one particular party who is taunting the other one to quit, and not vice-versa. Therefore the subordinate party stands to lose a lot more, and is thus likely to accept a considerable amount of dissatisfaction before they finally decide to leave.
#7. "Yeah, so if you could go ahead and get that done, that'd be great"In the classic corporate dystopia, HR and management want you to think of your team as a small clique, so that your desire for personal connection will be redirected towards the company. They may ask for your opinion, but have no intention of listening to it. Critics rightly warn young professionals against getting sucked into environments like this, where one is prone to being manipulated into accepting substandard pay and working conditions. The warning usually given is: You should be as loyal to the company as they are to you, i.e. not at all.
#8. "Carried u-... [nervous glances] ...-nanimously"A group may put on the trappings of a guild to disguise the fact that it is still exercising top-down authority rather than being a bottom-up enterprise. For example, in a typical homeowner's association (HOA), there was never any point at which a group of homeowners got together and decided they wanted to form an HOA. Rather, what usually happens is that a developer buys a large plot of land, builds a bunch of houses on it, and creates an HOA whose membership attaches to each house, which are then sold one-by-one to buyers who otherwise have no connection to each other. Most of the homeowners thus have no real interest in participating in the HOA, but begrudgingly accede to the edicts of a handful of busybodies who have too much time on their hands.
Evolution of a growing clique
The culture of a clique may at first be ambiguous (A) because there is nothing really at stake. As it grows, however, if it does not simply break up, it will need to either follow the egalitarian path and become a guild (B), or the authoritarian path and become a cult (C). And in the latter case, the cult will inevitably be an embarrassed one, because if there had been someone with the requisite virtues to be a cult leader, the group would never have spent much time as a clique in the first place, but would have been a straightforward cult (D) from the beginning, and maintained its cultiness throughout its growth.
Therefore, as is probably clear by now, I think outcome C is bad and B is better. If a group has landed at C, then it may with great effort be pulled kicking-and-screaming to B - but this is likely to ruffle some feathers.
(I also suspect that there is a tendency for groups to get stuck at the "triple point" with around 30 members, in an uncomfortable equilibrium between all three types because the group cannot decide what it wants to be.)
What's so great about guilds? (Plan of the sequence)
Forthcoming articles in this sequence will lay out a case for why we should want more guild-like organizations to exist. (Links will be added as the articles are posted.)
- A guild can grow larger than a clique (This article)
- A guild makes it possible to improve things without schism (Fear of crowding out)
- A lack of guilds leads to a general malaise and atrophy of democratic values (We live in a society)
- A guild can contribute to the social fabric in a way more ambitious than cliques (Call for machers)
- A guild can be more robust than a cult because it can better distribute important responsibilities ("Community organizer" is a double oxymoron)
Other articles (Society is a social construct, pace Arrow; Rubber stamp errors; Anti-civicality) will discuss various norms that are necessary for a guild to function well, but which may seem strange or unintuitive for people who are accustomed to cliques or cults. I will conclude with a reflection (So are you some kind of communist?) on the tension between social and individual moralities.
Discuss
We need Git for AI Timelines
I was recently reading the AI Futures Project's Q1 2026 timelines update and noted that their quarterly updates (the last one being in December, with the release of the AI Futures Model) are struggling to keep pace with the thing they're trying to track.
The pace of AI development is incredibly fast and only hastening; Kokotajlo shortened his timelines for an AC by 18 months (late 2029 to mid 2028) in a single update due to 4 specific parameter changes. Five days later, Anthropic announced Claude Mythos Preview, which arguably invalidated some of those parameters before the ink had time to dry.
This isn't a criticism of the AI Futures Project; they do commendable work. To be clear, Kokotajlo and the AI Futures Project are arguably the best at what they do in the world. His track record is remarkable, and AI2027 has sparked immense conversation about the future of AI/timelines (it's what got me into LW), but when the field's pacing changes completely every two months, the community more often than not is navigating with an outdated map. And the problem is getting worse. Mythos hasn't yet been evaluated by METR, Spud hasn't been released, and by the time the Q2 update drops, the field will have again shifted to another focal point.
But the cadence itself is only the surface issue; updates aren't nearly granular enough to be tied back to each "step". When Kokotajlo updates his priors for an AC, we don't see the causal chain leading to each decision shortening his timelines by X amount. His rationale for the AC median being 1 year of autonomous work was that Opus 4.6 "impressed" him. But the actual definition of what 1 year even means remains muddy; the original AI2027 scenario had the median set at 6 months for an SC before moving it back to 3 years. The SC definition shift from 3 years to 1 year accounted for around half of the 18-month shift in his Q1 update; the stated justification is that Opus "impressed" him. Impressed how? At what point between December and April did he change his priors? The entire causal chain here collapses to a single word in a blog post.
In software engineering, this would be the equivalent of someone pushing a commit to main with the message "fixed stuff because it now works". You'd never accept that for code, so why accept it as the justification for forecasts about the most important technological revolution in human history?
There's no unified platform where forecasters can independently publish their timelines with substantial backing/integration with the platform itself. Sure, you can write a Substack article, spin up a short LessWrong post, perhaps post a Twitter thread, but these are scattered all over and are discontinuous for someone trying to get a concrete perspective of what different forecasters think. One might say Metaculus is the solution; while this is a way of congregating forecasts, it's still less than optimal. Conversation and rationale are walled behind "forecast and pay" without a congregational space to discuss the reasoning behind those forecasts (yes there is a comment feature but it is scarcely used). There was an excellent post around Broad Timelines that highlighted this; Metaculus highlights "medians" rather than the full distributions that are more sought after in our space.
As neo noted in said post, we need to "design info-UI tools that facilitate that (the timeline formulation) process". Broad distributions need platforms that can track how they update over time. A quarterly blog post cannot do that. Forecasts updated granularly over time with reasoning and deliberation behind them can.
Here's why I'm using Git as the analogy: software engineering fixed this class of problem years ago. You have commits (changes in timeline predictions), diffs showing what changed, commit messages showing why they changed, branches for scenario forks, blame for accountability (we need to be less wrong after all), and merge conflicts that require resolution rather than dissolving into Twitter discourse.
The minimum viable version of this is frankly embarrassingly simple: a GitHub repo where each forecaster maintains a YAML file with their distribution for an agreed-upon definition (whether an AC, SC, ASI, etc.). Commits are updates to those files/timelines, with the rationale in the commit message.
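To make this concrete, here is a sketch of what one forecaster's file might look like, generated by a small script. The directory layout, field names, and probabilities below are all hypothetical, invented for illustration rather than proposed as a standard.

```python
# Hypothetical sketch of one forecaster's entry in such a repo.
# The schema, paths, and numbers are invented for illustration.
import os
import yaml  # pip install pyyaml

entry = {
    "forecaster": "example_forecaster",
    "milestone": "AC",                # whichever agreed-upon definition applies
    "definition_version": "2026-Q1",  # pin the definition the numbers refer to
    "cumulative_probability_by_year": {2027: 0.10, 2028: 0.35, 2030: 0.70, 2035: 0.90},
    "rationale": "See commit message for what changed and why.",
}

os.makedirs("forecasts/example_forecaster", exist_ok=True)
with open("forecasts/example_forecaster/AC.yaml", "w") as f:
    yaml.safe_dump(entry, f, sort_keys=False)
```

Each update to the file is then a commit: the diff shows exactly which probabilities moved, and the commit message carries the rationale.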
Claude Opus 4.6 had an 80% time horizon of 70 minutes. Assuming Mythos has an 80% TH of ~240 min, the doubling time is ~34-40 days. Even if we're pessimistic at a time horizon of 180 minutes, the doubling time is still ~45 days. The doubling time of the thing we're forecasting is now shorter than our update cycle.
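To make the arithmetic explicit: the 70-minute figure is Opus 4.6's, but the successor's horizon and the roughly two-month gap between the two releases are assumptions on my part for illustration, not measurements.

```python
# The 70-minute horizon is from the text; the 240/180-minute horizons and the
# ~61-day gap between releases are assumptions for illustration.
import math

def doubling_time_days(old_horizon_min, new_horizon_min, days_between):
    """Days per doubling of the 80% time horizon between two model releases."""
    doublings = math.log2(new_horizon_min / old_horizon_min)
    return days_between / doublings

for new_horizon in (240, 180):
    dt = doubling_time_days(70, new_horizon, days_between=61)
    print(f"assumed {new_horizon}-minute horizon -> ~{dt:.0f} days per doubling")
# ~34 days at 240 minutes, ~45 days at 180 minutes -- both shorter than a
# quarterly update cycle.
```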
The rationalist community, of all communities, should find that unacceptable.
Discuss
Treaties, Regulations, and Research can be Complements
I think the debate over whether AI risk should be addressed via regulation or treaties is often oversimplified and confused. These are not substitutes. They rely on overlapping underlying capacities and address different classes of problems, and both can benefit from certain classes of research.
David Krueger, to pick on someone whose work I largely agree with, recently posted that “Stopping AI is easier than regulating it.” I largely agree with what he says. Unfortunately, I also think it is an example[1] of advocates for a cause creating fights where they're not needed, in this case making the discussions around AI more rather than less contentious, and less rather than more effective.
And the reason the fights are not needed is that different risks live at different levels, and different tools are effective in different ways.
Clearly, many of the risks and harms of AI should not be addressed internationally. There is little reason or ability to harmonize domestic laws on fraud, discrimination, or liability, which would be a distraction from either reducing the harms or addressing other risks. Existing laws should be adapted and applied, and new regulations should be formulated where needed. International oversight would be unwieldy and ineffective even for most treaty compliance efforts - as other treaties show, there is a mix of national and international oversight. But domestic regulation can create liability incentives, require or standardize audits, clarify rules, and provide enforcement mechanisms and resources. All of those are at least sometimes useful for treaties as well. When Krueger says “the way I imagine stopping AI is actually a particular form of regulating AI,” he is not talking about the harms and risks regulation could address - though given what he has said elsewhere, he agrees that many of them are worth mitigating, even if they are not his highest priority. So it should be clear that treaties will not, cannot, and should not address most prosaic risks of AI systems and misuse.
By the converse argument, which he and others have made convincingly in the past, some harms of AI systems come from racing towards capability rather than prioritizing safety. These types of risk emerge from the dynamics of international markets and from great power competition. Obviously, these dynamics aren’t well addressed by domestic regulation on the part of any single actor. It is incomprehensible to talk about regulation alone to address those risks, just like it is tendentious to talk about using international treaties to mitigate other classes of risks and harms of AI systems.
Unfortunately, many discussions put “we need a global treaty to stop AI risks” in opposition to “domestic regulation is the only realistic path.” Not only do I think this is backwards, but I’ll argue that so is the related false dichotomy of industry self-regulation versus government rules. Industries that embrace safety welcome well-built regulation. Even in areas where they don’t have strict rules, airlines have national bodies that manage risk and accident reporting. (And the AI industry leaders often claim to be the same way, wanting national or international rules - just not any specific ones.)
So, to come to my unsurprising conclusion, we actually have several different plausibly positive and at least partially complementary approaches.
- Certain classes of research produce techniques like evals, interpretability, human oversight approaches, control methods, and operationalizable definitions of specific risks. Some of these are dual use or net negative, but the parts that are useful are complementary to both regulation and treaties.
- Regulation needs operationalized definitions of risks, measurable standards, concrete goals, auditable procedures and oversight methods, and investigatory tools. Many of these are enabled by specific forms of technical or policy safety research.
- Treaties need shared definitions, clear goals, regulatory oversight and enforcement, credible verification, and both technical and regulatory methods to distinguish compliance from defection. Some of these are enabled by regulation, some by relevant research.
So we end up with a sort of triad: research can enable measurement and definitions and provide tools; regulation can force adoption and enforce usage of those tools; and treaties can align incentives around defection dilemmas and provide common aims.
This doesn’t imply that most safety research is net risk-reducing, that most regulation is useful, or that most possible treaties will reduce risks. But it does say that they can be complementary. Some disagreements are substantive. But others are treating complementary approaches as mutually exclusive - and I think we should instead figure out common ground, which can make the fights about these issues both more concrete, and narrower.
- ^
yet another example
Discuss
5 Hypotheses for Why Models Fail on Long Tasks
Written extremely quickly for the InkHaven Residency.
Like humans, AI models do worse on tasks that take longer to do. But they seem to degrade on longer tasks more steeply than humans do.
This is a big part of why the METR time horizon results make sense: because longer tasks are also “harder” for models, and more capable models can do longer tasks, we can use the length of tasks that the models can perform as a metric of model capability.
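As a rough illustration of how per-task results get turned into a "time horizon" metric, here is the general idea only, not METR's exact procedure, and the data are invented: fit success rate as a function of log task length, then read off the length at which predicted success crosses 50%.

```python
# Rough sketch of a "time horizon" computation; the general idea only, not
# METR's exact procedure, and the per-length success rates are invented.
import numpy as np
from scipy.optimize import curve_fit

def success_curve(log_minutes, a, b):
    # Logistic decline in success probability as log task length grows.
    return 1.0 / (1.0 + np.exp(-(a - b * log_minutes)))

lengths = np.array([2, 5, 15, 30, 60, 120, 240, 480], dtype=float)    # minutes
rates   = np.array([0.97, 0.93, 0.85, 0.72, 0.55, 0.38, 0.22, 0.10])  # success

(a, b), _ = curve_fit(success_curve, np.log(lengths), rates, p0=(3.0, 1.0))
horizon_50 = np.exp(a / b)  # task length at which predicted success = 50%
print(f"50% time horizon ≈ {horizon_50:.0f} minutes")
```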
There’s a clear etiological or causal-historical explanation of why models do worse at long tasks: they’re probably trained on more short tasks and fewer long tasks. This is both because it’s easier to make shorter tasks, and because you can train models on more short tasks than longer tasks with a fixed compute budget.
But from the perspective of AI evaluations, it’s also worth considering mechanistic explanations that make reference only to how properties of long tasks interact with the AI system in deployment. Whatever the training story may be, the AI models as they currently exist have some property that makes long tasks genuinely harder for them in a way that tracks capability. Understanding what this property is could matter a lot for interpreting the METR time horizon and even for forecasting AI capabilities over time.
So here are five such possible hypotheses that explain why longer tasks seem consistently harder for current models, based in large part on my experience at METR.
Long tasks are less well defined, and require judgment or taste (which models are bad at). For a software engineer, a 1-minute coding task might involve composing a single 10 line function or running a relatively simple SQL query. By their very nature, these tasks tend to be easy to define and easy to score, with relatively objective success criteria and little human judgment involved. A 15 minute task may be implementing a relatively simple data processing script or fixing a simple bug: more complicated, but still relatively easy to score. In contrast, an 8 hour task likely involves substantial amounts of design taste (in ways that are harder to score), and month long tasks likely involve communicating with a stakeholder or building code with properties that are hard to algorithmically verify (e.g. maintainability). (This is also related to why algorithmically scorable longer tasks are harder to make.)
While the longer METR tasks are still algorithmically scored, they tend to require models to build sophisticated software artifacts or iteratively improve on experiment design, where taste plays a larger role in success. Since models seem to lack ‘taste’ of some sort, relative to humans of comparable execution ability (hence the complaints about AI Slop), this could explain why they do worse on longer tasks.
Long tasks require more narrow expertise (which models may not have). An important property of the METR task suite is that longer tasks should not be trivially decomposable into shorter tasks. That is, a 10-hour task should not trivially be decomposable into 10 1-hour tasks, and 10 short math problems do not become a single longer math problem. Perhaps as an artifact of this property, many of METR’s longer tasks (and perhaps longer tasks in people’s day-to-day work in general) rely on more specialized procedural knowledge that is hard to easily acquire via Google. For example, many of METR’s long tasks are cryptographic or machine learning challenges that require some amount of procedural knowledge in the relevant fields to approach. Insofar as the long tasks are more likely to require procedural knowledge outside the AI models’ area of expertise, they may struggle.
Personally, I find this relatively unlikely as an explanation for the METR time horizon tasks (since AI models seem to have a lot of expertise in the relevant areas), but it might be a large explanation for the inability of AIs to autonomously complete large tasks in general.
Long tasks take models longer, leading to more stochastic failures (which models exhibit). A popular explanation that people cite is that tasks that take humans longer also take AI agents more steps to complete, and AI agents are not fully reliable, failing with some small probability on each step. For example, Toby Ord raises this as a hypothesis in a response to our Time Horizon paper.
I think this is definitely part of the explanation (and why longer tasks are harder for humans as well), with some caveats: first, I caution against naively interpreting human time as proportional to AI steps and applying a constant hazard model. For example, it turns out that if you fit the failure rate model for AI agents over time, the failure rate goes down as the task goes on! Second, AI models seem to have different time horizons across different domains, and simple versions of this hypothesis cannot explain that phenomenon.
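For readers who want the constant hazard model spelled out: assume the agent has an independent chance of a fatal error in each unit of (human) task time, so success decays exponentially with task length, and the 50% time horizon is ln(2) divided by the hazard rate. A minimal sketch, with a made-up hazard rate calibrated to an illustrative 60-minute horizon:

```python
# Minimal sketch of the constant hazard model discussed above. The hazard rate
# is made up, calibrated to an illustrative 60-minute 50% time horizon.
import math

def p_success(task_minutes, hazard_per_minute):
    """Constant hazard: an independent chance of a fatal error in every minute
    of (human) task length, so success decays exponentially."""
    return math.exp(-hazard_per_minute * task_minutes)

hazard = math.log(2) / 60  # 50% success at 60 minutes by construction
for t in (5, 30, 60, 240, 480):
    print(f"{t:>4}-minute task: P(success) ≈ {p_success(t, hazard):.2f}")
# The empirical finding that the failure rate *falls* as the task goes on is
# exactly what this simple version cannot reproduce.
```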
Long tasks take models longer, causing failures due to distribution shift or self conditioning (which models may suffer from). A related explanation is that longer tasks take models more off-distribution: base models (at least earlier on) were not trained to predict long sequences of model-generated outputs, and even RLVR’ed models were probably trained with short tasks, far shorter than the 16 hour, tens of millions of token tasks that we might ask them to do. This increases both the chance that the models are simply off distribution (and thus may be less competent in general), and the chance that they accumulate errors by chance and start conditioning on being the type of agent that makes such mistakes (and thus becoming more prone to make such mistakes). In the same way that naive versions of the constant hazard model seem contradicted by evidence, I suspect that naive versions of this hypothesis are also likely to fail. But it’s possible that more sophisticated versions may play a key role in explaining the phenomenon.
Long tasks require better time and resource management (which models struggle with). Finally, an explanation that I often think is neglected is that longer tasks tend to require meta-cognition and explicit strategy, which current models seem to struggle with. A 5-minute task such as writing a simple function or script can be done in one go, without much planning, but getting the best score in a machine learning experiment over 8 hours requires allocating scarce resources including remaining time and compute. It’s been observed that models understandably struggle a lot with understanding how much (wall clock) time they take to do particular tasks, or often double down on failing approaches instead of switching strategies.
I welcome more thinking on this topic, as well as more empirical work to distinguish between these hypotheses.
Discuss
My Cold Prevention Stack for 2026
I get sick a lot. Getting sick sucks. Maybe there are cheap and easy ways to get sick less?
I asked LLMs[1] to read all the relevant literature reviews and figure out what supplements or medicine I should be taking to get sick less and make it suck less. I looked through the recommendations and did a little additional research to make sure the AIs weren’t making egregious mistakes, but I am not an expert—this should not be viewed as credible medical advice.
Here is the quick list of steps I am currently taking or think might be useful to others.
- Zinc lozenges: When you are starting to get sick, take zinc lozenges aggressively. They need to be a specific type of lozenge (Amazon, Life Extension). Suck, don’t chew, you’re trying to maximize the time they are dissolving in your mouth, and don’t eat/drink for 20 minutes after. Aim for one every 2 hours (~6 per day). Literature review. More notes in the appendix for this one.
- Probiotics: For prevention, take specific probiotics once daily with a meal (Amazon). There are various products that have support in the literature, and it makes sense to buy one of them rather than a random probiotic (some strains appear to not work). The effect size is like a 25% reduction in colds—suspiciously high! Literature review.
- Standard medication when sick for symptom relief (check for side effects and interactions with pre-existing conditions): NSAIDs (Ibuprofen) are primarily for headache, ear pain, and muscle and joint pain (Literature review); same for Acetaminophen (Tylenol). For chest congestion, Mucinex. Nasal decongestants (e.g., Sudafed) or a combination antihistamine-decongestant-analgesic (e.g., NyQuil Severe Cold & Flu) also might help with symptom relief. Note that oral phenylephrine has been deemed ineffective by the FDA, even though it’s common, so maybe use a different decongestant, specifically pseudoephedrine (available behind the pharmacy counter)? Literature review.
- Obvious things to do when sick not necessarily backed by literature: Rest more, drink lots of water.
- Physical things to get sick less: Wash your hands with soap and water; it’s probably good to use hand sanitizer before eating; it’s probably good to wear a mask in crowded indoor spaces (but the evidence isn’t very strong); and avoid touching your face if possible. Literature review for some of these.
- Maybe take vitamin C megadoses regularly. The literature is mixed here and the side effects (stomach problems) are too much for me, but it might be good to take 1g of vitamin C daily. Literature review.
- Maybe you should gargle salt water? I’m not sure, but it is cheap to try.
- Maybe you should do nasal saline rinses? The literature is inconclusive but some people swear by it. If doing this, use distilled water.
- Maybe you should get a flu shot. It reduces the chance of getting the flu and might make the flu less bad. But your chance of getting the flu is already pretty low, and the side effects of the vaccine are nontrivial for some people, so it’s not clearly worthwhile (I am surprised at how much this isn’t a slam dunk in favor of the vaccine). Literature review.
If you work on important problems then getting your coworkers sick is bad for the world (in addition to bad for them). If you are going to work while sick, consider doing it from home. If you work from the office, you should wear a mask, wash your hands frequently (and especially before touching a bunch of communal stuff), and cover your cough/sneeze (not with your hand).
Appendix on zinc lozenges
Zinc acetate lozenges.
What: When you start feeling any symptoms at all, or when you’ve been exposed, start sucking on zinc lozenges. Your goal is to coat your mouth and throat in zinc for basically as long as possible. So you should be sucking each lozenge for 20-30 minutes (don’t chew), and then don’t drink or eat anything for another 20 minutes. Aim for 5-7 lozenges in a day, once every two hours or so.
What to buy: The particular lozenge probably matters a lot! The lozenges you want are big and slow to dissolve, Amazon link, manufacturer’s link (note that Amazon is frequently out of stock, and the manufacturer gives discounts for larger orders, I might buy 4 bottles at a time).
Evidence basis: Literature review pointing to the fact that they might reduce cold duration somewhat. The main counterevidence is an RCT finding either similar or worse recovery than placebo.
Notes:
- Don’t use these all the time, only when you’re worried you’re getting sick. Zinc in such large quantities interferes with copper and iron absorption and probably has other downsides.
- Some people report stomach problems.
- I find that when taking these lozenges, my colds are much more mild than usual and I can usually work at least half my normal productivity while sick.
- I have heard many positive anecdotes about these.
- Some people don’t like the taste/texture.
- Other discussion on LessWrong.
- ^
Here’s a ChatGPT chat with an initial research report.
Discuss
Your body is not a white box (and you're thinking about weight loss wrong)
Epistemic status: This is an intuition I've had for a while that feels obviously correct to me from an inside view perspective. Note however that I am not a doctor and have no training in the medical field. I also do not have experience losing weight. You should caveat this information appropriately. I will note that I am capable of running mountain marathons and have a six pack (despite not working out for the past 6 months) as evidence that this mode of thought works well for me.
With apologies to anyone I offend with my ranting and rhetoric, this was the only way I was able to write the article authentically.
I've spent just over a year now immersed in various aspects of the rationalist community. It's a weird and wonderful place and I am glad that I'm here. It is also home to the inkhaven residency, where I have recently been getting to know some of the local belief systems.
I shall attempt to break one of them, at least in part, today. I will start by linking you to the article "The Blueberries of Wrath" by my friend MLL[1]. It's a long, challenging article, and I understand approximately half of the words. Here's an extract:
I’m not going to review the entire cursed realm of internet users claiming to be sensitive to dietary salicylates, polyphenols, and whatever other Trojan berry adversaries that might be captivating their paranoia. But here’s a mugshot. In it we see people:
- Attributing all kinds of symptoms to dietary salicylates including dark undereye circles and adrenal fatigue.
- Fixating on a handful of studies from the 1990s (mostly in autistic children) suggesting that phenol sulfotransferase deficiency is responsible for the accumulation of dietary phenols in the body.
- Failing to rigorously distinguish between polyphenols, salicylates, and phenols in general, let alone different polyphenols, instead lumping everything into “high-phenol” foods. Likewise the recommended treatments for salicylate sensitivity, phenol sensitivity, and methylation disorders more or less overlap.
- Self-diagnosing with conditions in the absence of established diagnostic tests; while in principle elimination/challenge dieting can reveal things, we should expect it to be vulnerable to placebo and confirmation bias.
- Following modern variants or extensions of the Feingold diet. The most structured is FAILSAFE (Free of Additives, Low in Salicylates, Amines, and Flavour Enhancers), which has a large Facebook following but no modern evidence to back up the salicylate claim
OK, so by my reading of the article, people on the internet have looked at some studies, decided that "phenol sulfotransferase deficiency" is responsible for the accumulation of "dietary phenols" and therefore decided that berries are bad for them. MLL goes through and points out a variety of errors they're making. Apparently, one of these is "failing to rigorously distinguish between polyphenols, salicylates, and phenols in general, let alone different polyphenols".
I do not know if this is true or false. I do not know what this means. I do know that it is very possible to be healthy without the slightest hint of knowledge about phenols. I know it because I've done it. I also know because I've met a large number of wonderfully healthy and fit individuals who haven't touched a biology textbook in their lives. I also think that the fact that people can come to the conclusion that blueberries are bad for them via this sort of interrogation is suspect.
To be clear, the human body is, on a fundamental level, physics. It can be understood through the laws of chemistry and biology, and I hold huge respect for the researchers looking into it. However, if we want to talk about personal health, here is a map of the known biochemical pathways in the body:
If you wish to try to claim that understanding this is the fastest way to get healthier, I'll be waiting for you in the gym. If, in more reasonable fashion, your claim is that understanding parts of the diagram can help you optimise your nutrition, I'll still be waiting for you in the gym, but note the following first:
- Even small parts of this diagram are really complex and work in weird and wonderful ways
- It is a complex system, so even understanding part of it perfectly does not mean you understand the effects of that part on the rest
- Understanding how something functions is not the same as being able to predict the outcomes of that thing (see: chaos theory)
- It's a system which has been optimised for billions of years by evolution, so moving out of distribution is likely to break the carefully balanced forces which have striven to create it.[2]
Basically, it's really hard to understand; if you do understand it, that doesn't mean you can control it; and if you can both understand and control one aspect of it, you're still likely to break whatever else is connected to it. Of course, there is the caveat that if you're only making minor adjustments, you're unlikely to take your body out of distribution, so you'll be fine[3]. But a broader question emerges.
Does this seem like the most effective way to go about life to you? Do you want your personal wellbeing to depend on whether or not you've thought about your phenol intake correctly? No? Good. I have another path.
If you can't use white-box thinking, use black-box instead. You were designed to grow up in the hunter-gatherer environment, so your body will take whatever actions it thinks necessary to ensure your survival within that environment. Rather than argue for this line of thought, which I expect people to understand in principle, I'll demonstrate it with an example. In heretical fashion, I will be picking on Eliezer Yudkowsky.
A couple of months ago, I spent a bit of time messing around with my scraped version of LessWrong, and, while going through the lowest karma posts, happened upon the wonderfully titled "Genuine question: If Eliezer is so rational, why is he fat?".
He replies in a comment with some content copied over from X. A summary:
For the benefit of latecomers and CICO bros, my current equilibrium is "spend 1 month fasting / starving on 700 cal/day keto; spend 2 months eating enough to work during the day, going to bed hungry, and therefore gaining 1-2 lb/wk".
Diets like the potato diet fail, not because they don't succeed in forcing me to eat less -- I do, indeed, end up with not enough room in my stomach to eat enough potatoes to work and not feel tired. The potato diet fails because it doesn't protect me from the consequences of starvation, the brainfog and the trembling hands. If I'm going to be too sick and exhausted to work, I might as well go full keto on 700cal/day and actually lose weight, rather than hanging around indefinitely in potato purgatory.
Semaglutide failed, tirzepatide failed, paleo diet failed, potato diet failed, honey diet failed, volume eating with huge salads failed, whipped cream diet failed, aerobic exercise failed, weight lifting with a personal trainer failed, thyroid medication failed, T3 thyroid medication failed, illegal drugs like clenbuterol have failed, phentermine failed (but can help make it easier to endure a bad day when I'm in my 600cal/day phase), mitochondrial renewal diets and medications failed, Shangri-La diet worked for me twice to effortlessly lose 25lb per session and then never worked for me again.
Wow. That's a long list of things to have fail on you. Let's see if we can gain any insight in our new black box frame.
The first thing we note is that we evolved to live in a range of different environments. Humans range geographically from America to Australia, from Africa to Asia. Over the millions of years of our evolution we have lived on top of mountains, by the sea and in the desert. Many of these environments, especially in temperate zones, will vary enormously in their conditions throughout the year. One of the most important evolutionary adaptations, we would therefore expect, would be to have a body which can itself adapt to whichever environment it finds itself in.
Let's think about the implied environment surrounding Eliezer then.
- Low in calories – low enough that he's hungry when he goes to bed
- Prone to regular famines – he's on 700 cal/day
- Low in required exercise – he mentions he's tried daily exercise, but when I read the thread in more detail, this was 2h of walking per day. Given how hardcore he's been about everything else, this implies a very low baseline to be starting from.
Now we ask what the ideal body type is for that environment. I would argue that it's a body which:
- Is extremely calorie efficient
- Survives famines by storing as much energy as possible during off periods
- Reduces movement as much as possible (explaining his famously low energy levels)
His body is acting perfectly rationally for the environment he's told it he's in! As far as I can tell, he's in an inadequate equilibrium where he wants his body to become thinner, but his body desperately wants more calories.
So what does this new way of seeing things mean for how he should act in practice?
I should first remind you that this approach is still entirely theoretical. It has not been battle tested, although it seems to me to suggest reasonable courses of action. In this particular case, it seems to me like the priority is for Eliezer to convince his body that he is in an environment more amenable to his preferred body type. What does this environment look like?
- High activity (especially long distance: fat is not an advantage to have if you're walking 30km a day)
- Consistent calorie levels (no need to store up fat)
- Sufficient calorie levels (so you have enough energy to do the stuff you need to do).
If I were to recommend a course of action in this particular case, I think it would be something like "Eat enough to satisfy your hunger. You will gain weight, but this is to be expected when moving out of a local minimum. Do long-distance exercise. Build up your physical endurance; this should have additional benefits in other areas of your life. I don't know how long this will take. Given how long you've spent convincing your body of the environment it's in, I expect it to take a while to convince it of its new surroundings. Use physical endurance as your metric for progress, not weight."
To be clear, I know he's tried a bunch of things, including exercise and an extreme diversity of diets and drugs. I do not have access to more detailed specifics of what he's done, and I expect he's had advice from a wide variety of people far more knowledgeable than me. It could be that my armchair help is just one more on the pile of failed attempts. It does however seem to me to provide an explanation for why many of the past attempts have failed, and to provide a way out which would (possibly?) previously have ended up being rejected due to weight gain. I don't know.
I hope this is useful to someone.
Addendum
There are a few additional points relevant to the main thesis here, which I haven't been able to fit into the main post.
In no particular order:
- I think this perspective provides a good basic theory for why it is so common for people to "bounce" after a successful dieting regime
- It also seems to explain why most success tends to come when people change their full behaviour patterns.
- This whole thing is consistent with the empirical result that fat gain is related to calories in minus calories out (CICO), which is approximately right under controlled conditions. My claim is that "calories out" is a variable your body actively controls, which CICO accounts sometimes handwave away (see the sketch after this list). A lot of the work in such accounts is traditionally done by saying that different people have different metabolisms, which burn different amounts of calories. If you wish to use this frame, think of the behaviours predicted here as modifying your metabolism.
- Another thing I notice when thinking about CICO is that in practice, when I am at peak fitness and have a couple of days off, I feel a strong drive to go for a run, exercise, or just jiggle my leg up and down. I basically think this drives me to burn the extra calories I would otherwise put on as extra weight, in an environment where my body believes staying lean is ideal.
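To make the "calories out is a controlled variable" point concrete, here is a minimal toy model. It is illustrative only: the adaptation rate and starting numbers are made up, and 7,700 kcal/kg is just the usual rough conversion figure.

```python
# Toy model: weight change when "calories out" partially adapts to intake.
# Purely illustrative; parameters are guesses, not physiology.

KCAL_PER_KG = 7700  # rough kcal of stored energy per kg of body fat

def simulate(intake_kcal, weeks, expenditure_kcal=2500, adaptation=0.05):
    """Each week, expenditure drifts a fraction of the way toward intake,
    standing in for the body down-regulating NEAT, metabolism, etc."""
    weight_change_kg = 0.0
    for _ in range(weeks):
        weight_change_kg += 7 * (intake_kcal - expenditure_kcal) / KCAL_PER_KG
        expenditure_kcal += adaptation * (intake_kcal - expenditure_kcal)
    return weight_change_kg, expenditure_kcal

# Naive CICO (adaptation=0) predicts steady loss from a fixed 700 kcal/day deficit;
# with adaptive expenditure the weekly loss shrinks over time.
print(simulate(intake_kcal=1800, weeks=12, adaptation=0.0))
print(simulate(intake_kcal=1800, weeks=12))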
- ^
He's checked through this article, so I hopefully haven't made any massive blunders where this is concerned.
- ^
Yes, evolution is the blind, idiot god, but creating an organism is also a hard problem, which means that progress can be made continuously for long periods of time. The paper "Long-term dynamics of adaptation in asexual populations" showed that E. coli fitness increases were better fit by a power-law model than by a hyperbolic model (which asymptotes). This is evidence that there is no practical upper bound to the progress that can be made by evolution.
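For intuition on why that distinction matters, here is a generic illustration of the two functional shapes (not necessarily the paper's exact parameterization): the hyperbolic form saturates, while the power law keeps growing.

```latex
\[
w_{\text{hyp}}(t) = 1 + \frac{at}{b + t} \;\xrightarrow[t \to \infty]{}\; 1 + a,
\qquad
w_{\text{pow}}(t) = (1 + bt)^{a} \;\xrightarrow[t \to \infty]{}\; \infty
\quad (a, b > 0).
\]
```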
- ^
There is the additional caveat that mechanistic information is generally much more useful for fixing broken things – it doesn't take a genius to figure out that if your shinbone is in two pieces, that needs to be fixed.
Discuss
Splitting Mounjaro pens for fun and profit
tl;dr: you can subdivide Mounjaro pens to get less than the stated dose from them. This lets you e.g. buy a 15mg pen and instead get 5mg at a time out of it (so you’d get 12 doses instead of 4). This works out to be much cheaper than buying a pen of the correct dose, and cheaper than using grey market peptides. Use this calculator to figure out how much to use.
[This is not medical advice I am not your doctor etc etc]
(See also: “You’re not sick enough for this medicine.”)
Miracle weight-loss drug Mounjaro comes in fixed-dose pens with a fixed dose-escalation schedule. You can 1) choose to ignore this schedule and just stay on a low dose, and 2) subdivide the pens to make them last longer and save a lot of money.
In the US these are often autoinjector pens; there’s no way to customise the dose as it just gives you what’s printed on the side. In other countries you get a KwikPen, which is more like an insulin injector.
You take off the lid, screw on a single-use needle, twist the dial on the end until the window shows “1”, jab yourself with the needle and slowly push down the plunger.
Technically you can “count clicks”. The dial clicks as you rotate it; a full dose is 60 clicks, so 30 clicks gives you half a dose, 15 a quarter, etc. But the pen will only deliver four doses regardless of size and then it locks you out!
There’s a trivial and safe way to get around this: use an insulin needle to draw the amount of liquid that you want, and then inject it directly. You can buy the highest-dose pen, which contains four 15mg doses (60mg total) and get e.g. 24 × 2.5mg doses from it. This means you can stretch out a single pen to last many months.
The cost savings are substantial – a 2.5mg pen costs £37.24 per dose, but a 15mg pen is only slightly more expensive. So if you’re using a 15mg pen but only taking 2.5mg at a time, it works out at £12.46 per dose. This is even less than grey-market Chinese peptides, which often run $100 for 10mg.[1]
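The arithmetic behind those per-dose figures, as a quick sketch (prices are the ASDA Online Doctor figures from the footnote; each pen contains four full doses of its stated strength):

```python
# Cost per dose when subdividing a pen into smaller doses.

def cost_per_dose(pen_price_gbp, pen_strength_mg, dose_mg, doses_per_pen=4):
    total_mg = pen_strength_mg * doses_per_pen   # e.g. a 15mg pen holds 60mg
    return pen_price_gbp / (total_mg / dose_mg)

print(cost_per_dose(148.97, 2.5, 2.5))   # ~£37.24: 2.5mg pen used as directed
print(cost_per_dose(298.97, 15, 2.5))    # ~£12.46: 15mg pen split into 2.5mg doses
```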
Don’t the pens expire?
Yes, and this is where this trick runs counter to the manufacturer’s advice.
Once you start using the pen, it’s been exposed to the air and to any bacteria on the needle. There is a risk of bacterial growth in the liquid in the pen – the typical guidance is to not use an opened multidose vial beyond 30 days – but this is probably overstated.
The tirzepatide solution in the pen contains benzyl alcohol, which inhibits bacterial growth, although not forever. In the case of e.g. multi-dose insulin vials, using them well beyond 28 days shows negligible contamination in practice – one study found only trace skin flora after 53 days of use, another found the preservatives actively killed deliberately introduced bacteria through day 50, and a third found zero contamination across six months of twice-daily use[2]. Anecdotally, people have used the pens for many months after first opening without issue.
But the medical guideline is to use the pens for no longer than 30 days after opening. It’s up to you to decide your risk appetite here. Aaron Kaufman wrote an excellent post on how long you can use peptides after reconstitution (which in our case is just after the first time you’ve used the pen):
I’ve concluded that the 28-day limit appears to be conservative regulatory boilerplate mostly divorced from any specific scientific reasoning. … Based on the considerations above, I personally throw out refrigerated reconstituted peptide vials at about the 4-month mark, which is almost totally arbitrary. I make no judgement on what anyone else should do.
The peptide itself is stable in the fridge for many months, up to the expiry date on the pen.
Staying on low doses
The typical dose escalation schedule is to start on 2.5mg weekly for four doses, and then increase stepwise every four weeks up to 15mg. But plenty of users don’t need to go to the higher doses, with many staying at 2.5mg. There’s no good reason to increase your dose if you’re seeing satisfactory results at lower doses.
You can even take 1.25mg, or half the starting dose, if you’re concerned about side-effects when you start.
Guide to getting smaller doses
You’ll need an insulin needle and a Mounjaro KwikPen. You can essentially follow any guide to injecting from a multi-dose vial; you’re just using the pen itself as a vial.
- Calculate how much liquid you need from the pen (see the sketch after this list).
- Wipe the rubber septum with an alcohol wipe, shown here:
- Uncap the needle and insert it through the septum.
- Draw up the amount of liquid you want.
- Remove the needle from the pen and administer the injection with correct technique.
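As a rough sketch of step 1: the 0.6 mL-per-full-dose figure below is my assumption about the KwikPen and should be checked against your own pen's labelling, and a standard U-100 insulin syringe is assumed.

```python
# How much liquid to draw for a smaller-than-labelled dose.
# ASSUMPTION: each full KwikPen dose is 0.6 mL, so a 15mg pen is 15mg / 0.6mL
# = 25 mg/mL. Verify against your pen's label before relying on this.

ML_PER_FULL_DOSE = 0.6  # assumed delivered volume per labelled dose

def draw_volume(pen_strength_mg, target_dose_mg):
    concentration_mg_per_ml = pen_strength_mg / ML_PER_FULL_DOSE
    volume_ml = target_dose_mg / concentration_mg_per_ml
    units_u100 = volume_ml * 100  # U-100 insulin syringe: 100 units = 1 mL
    return volume_ml, units_u100

# e.g. 2.5mg from a 15mg pen -> 0.1 mL, i.e. 10 units on a U-100 syringe
print(draw_volume(pen_strength_mg=15, target_dose_mg=2.5))
```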
- ^
Based on prices listed at ASDA Online Doctor as at publication. Example per milligram prices:
2.5 mg: £148.97 ÷ 10 mg = £14.90/mg
5 mg: £188.97 ÷ 20 mg = £9.45/mg
7.5 mg: £248.97 ÷ 30 mg = £8.30/mg
10 mg: £278.97 ÷ 40 mg = £6.97/mg
12.5 mg: £288.97 ÷ 50 mg = £5.78/mg
15 mg: £298.97 ÷ 60 mg = £4.98/mg
So even using a 5mg pen for 2.5mg doses is a substantial cost saving, especially added up over a long period of time – most patients will take Mounjaro for months or years to get to their goal weight, and often will stay on a 2.5mg a week maintenance dose indefinitely.
- ^
In a study of 69 multi-dose insulin vials used by patients for an average of 53 days, only 8 showed any bacterial contamination at all — just 1 colony-forming unit per millilitre of common skin flora (S. epidermidis and P. acnes), with no endotoxin detected. Critically, when vials were deliberately inoculated with S. aureus and P. aeruginosa and kept at room temperature, they were sterile within 48 hours — and this antibacterial effect was maintained through serial re-contamination at days 17, 30, and 50.
A more recent study went further: refrigerated multi-dose insulin vials aspirated twice daily with a new syringe for six months showed no microbial contamination at any point during the study period.
A human retrospective study of insulin glargine used up to 74 days beyond the recommended duration found no injection site infections.
Discuss
When the "Black Box Problem" Becomes the Default Message
Within AI Safety Policy Research, I am very focused on contributing to improving the definitions of the concepts "transparency" and "explainability" so that truly useful and actionable policy standards can be created in these vital areas. This has been an interest of mine for some time, but has been renewed with my recent discovery of Alondra Nelson's work (see https://www.ias.edu/sss/faculty/nelson). This includes her recent presentation at the IASEAI 2026 conference titled "Algorithmic Agnotology: On AI, Ignorance, and Power", in which she argues that current AI industry public discourse seems to intentionally blur the lines between what is truly unknowable/stochastic within AI technology and what companies actually DO know but choose to withhold from public knowledge (e.g. unpublished research and red-team findings, internal monitoring logs, crucial system card information that only becomes publicly available the same day a model is released, thus preventing pre-release public scrutiny or feedback, etc.).
Nelson posits that by intentionally keeping these conceptual lines vague in public dialogue—doing little to distinguish uncertainties that are truly stochastic (fundamentally unknowable) from uncertainties that are actually epistemic (could be pursued and resolved given sufficient resources and attention)—AI companies have succeeded in molding and managing the public narrative about the nature and extent of AI risks, as well as who, if anyone, should be addressing them. Essentially, by invoking "the spirit of the AI black box problem" regardless of the challenge being discussed, unknowability becomes operationalized as a public communication strategy for deflecting all risks and public questions that AI companies prefer not to answer with actual evidence.
I highly recommend her presentation: https://www.youtube.com/watch?v=5CRJiLSlywA . Her co-authored book, Auditing AI, will be released by MIT Press on 21 April and is available for preorder: https://amzn.to/4ssGjks
Discuss
Stopping AI is easier than Regulating it.
I want to start with this provocative claim: Stopping AI is easier than regulating AI.
I often hear people say “Stopping is too hard, so we should do XYZ instead,” where XYZ is some other form of regulation, such as mandating safety testing. It seems like the purpose of safety testing would be to stop building AIs if we can’t get them to pass the tests, so unless that’s not the purpose, or proponents are confident that we can get them to pass the tests (and hopefully also confident that the tests work, which they quite likely do not…), this particular idea doesn’t make a lot of sense. But people might in general think that we can instead regulate the way AI is used or something like that.
But I think this line of argument gets it exactly backwards. Stopping AI is easier than regulating it.
Why? Well let’s dive in. First, I need to explain what I mean…
I mean, specifically, that stopping AI is an easier way to reduce the risks from AI to an acceptable level than other approaches to regulating AI.
The way I imagine stopping AI is actually a particular form of regulating AI, specifically via an international treaty along the lines of Systematically Dismantling the AI Compute Supply Chain.
Also, when I say “it’s easier”, what do I mean? Well, there are a few ways in which stopping is hard. I’d separate technical and incentive challenges from political challenges, and I’m setting aside political challenges, because I think we should be clear about what should happen and why, and then seek to accomplish it politically.
Besides politics, the main underlying issue preventing meaningful AI regulation is international competition, especially between the US and China.[1] So basically, I mean stopping AI is the most effective way to address this key barrier to international cooperation, which is necessary to reduce AI risks to an acceptable level.
I believe in the fundamentals of AI, and I believe alignment is not doomed, so I believe that AI could indeed end up giving one nation or company control over the future. It’s still not clear that it’s rational to race to build AI, given the risks involved. But it seems hard for me to imagine a stable situation unless governments are confident their adversaries aren’t building super powerful AI in secret.
Proposals to govern super powerful AI internationally while still building it suffer from a bunch of challenges that stopping it doesn’t. But basically, approaches that instead try to regulate development or use of AI to ensure it is safe and beneficial are harder to monitor and enforce, and hence more likely to fail.
Challenges
Let’s take a hypothetical agreement between the US and China (leaving out the other countries for simplicity), and consider some of these challenges in detail.
Monitoring hardware
Suppose you have an agreement that allows AI to proceed in some particular “authorized” directions. How do you verify compliance? This basically boils down to: How can you be sure that no significant fraction of the world’s computer power is being used in unauthorized ways? This seems hard for a few reasons:
How can you be sure you know where all the computer chips are? This is a problem in any case, but it’s more of a problem if you keep making more computer chips, and you keep around the factories that make the chips. Right now, we know where a ton of the chips are -- they’re in data centers, which are easy to spot. But what’s to stop countries from secretly siphoning off some chips here and there? Or making a secret factory to produce more secret chips? We can certainly try and monitor for such things, but there’s an ongoing risk of failure. What happens when a shipment of chips goes missing unexpectedly? If the US (e.g.) actually lost them (and wasn’t secretly using them), China would have to trust that that is the case, or the agreement might collapse. In general, whenever monitoring breaks down, the “enforcement clock” starts ticking, where enforcement could easily and quickly escalate to war.
As chip manufacturing technology advances and it becomes easier and easier to build or acquire a dangerous amount of computer power, it also becomes harder and harder to be sure that nobody has done so secretly.
We need to agree on which uses of the computer power, i.e. which computations, are and are not authorized.
One solution commonly proposed is a whitelist allowing existing AI models to be used, but prohibiting further training that would make AI more powerful. Note that this is now essentially a form of stopping AI, but it’s not clear if it goes far enough.
One problem with this is that it’s possible to use AIs to drive AI progress, even if you never “train” them, e.g. by automating research into developing better tools and ways of using the AI in combination with other tools. If we analogize the AI to a person: You could make that person vastly more powerful by giving them new tools and instruction manuals, even if you don’t teach them new concepts.
We could try to further restrict which queries of AI are authorized. But it seems possible to decompose arbitrary queries into authorized queries, and it might be easier to hide this activity than to detect it.
The problem is much harder if you need to continuously update the list of authorized computations, or wish to use a blacklist instead of a whitelist. Then you get into the problem of agreeing on standards.
If we move away from a static whitelist of authorized computations, we then need a process for determining which computations should be authorized. This is hard for a few reasons:
There is still a lot of technical uncertainty about how to do AI assurance to a high standard. Testing AI systems is largely a matter of vibes. For instance, there is no suite of tests where, if an AI passed those tests, we could conclude it was not going to “go rogue”.
In addition, for any particular test(s), AIs can be designed specifically to fool those tests. So both the US and China have an incentive to use tests that maximally advantage their AI. One solution here might be to ensure that AI passes all the tests proposed by either side, but again, it might be easier to fool the tests the other side runs -- even without advance knowledge of which tests those would be -- than to create a reliable set of tests that cannot be fooled.
The US and China might have very different standards for what they consider to be “safe”, e.g. due to differences in values and priorities. I expect that such disagreements could be resolved, but they still create an extra challenge that could stall or sink negotiations.
In general, every point which requires some element of subjective judgment and negotiation is a potential point of failure.
It’s going to be easier to violate the agreement if there are a bunch of AIs and AI chips around that are being used according to the agreement. You just say “we’re done with this treaty”, and then start doing whatever you want with the ones you control. There are proposals to make it technically difficult to use AI chips in ways that aren’t authorized, but they aren’t mature or tested, and it’s likely that the US and/or China could find ways to subvert such controls.
Once a violation occurs, the other side might need to intervene rapidly to protect themselves. In the current paradigm, training a new, more powerful AI might take months, but that’s not a comfortable amount of time for resolving a tense international security dispute. And if all that’s required to be a threat is for an adversary to “fine-tune” an existing AI, or use it in an unauthorized way, lead time might be measured in days -- or even seconds.
On the other hand, if the infrastructure needed to build dangerous AI systems does not exist in any form, and a violator would need to build up the compute supply chain again, this would probably give other parties years to negotiate an arrangement that undoes the violation and doesn’t involve war.
Summing things up, if you are concerned that stopping AI altogether might be too hard to enforce, you should only expect alternative approaches to international governance to be harder. From this point of view, alternative approaches add unnecessary complexity and fail the KISS (“Keep it simple, stupid”) design principle. They may provide more of an opportunity to capture benefits of AI, but this doesn’t matter if they aren’t actually workable. If you believe international governance of AI is needed to reduce the risk to an acceptable level, the coherent points of view available seem to be:
1. We cannot regulate AI internationally in any substantive way.
2. Stopping AI is possible and would reduce the risk to an acceptable level, but this is also true of more nuanced approaches that allow us to capture more of the benefits.
3. Stopping AI is the only way to reduce the risk to an acceptable level.
I’m not sure which of these is right, but my money is on (3). Note that “Stopping AI is too hard, we need to regulate it in a different way instead” is not on the list.
[1] But this is also often used, politically, as an argument for why pausing is impossible. And this means that addressing this concern is also a big way to address the political barriers to pausing.
Discuss
The policy surrounding Mythos marks an irreversible power shift
This post assumes Anthropic isn't lying:
- Mythos is the current SOTA
- Mythos is potent[1]
- Anthropic will not make it publicly available un-nerfed[2]
- Anthropic will have a select few companies use it as part of Project Glasswing[3] to improve cybersecurity or whatever
Since the release of ChatGPT, at any given time, anyone on the planet with a few bucks could access the current most capable AI model, the SOTA.[4]
Since Mythos, this has no longer been the case and I don't think it will ever happen again.
It may happen for a short period of time if an entity with a policy differing significantly from Anthropic's develops a SOTA model.[5] However, most serious competitors (OpenAI, Google) don't have policies differing vastly from Anthropic's, and thus I can't imagine a SOTA model (more potent than Mythos) being released unrestricted to the public soon.
To be clear, I am not claiming the public will never have access to a model as strong as Mythos; that seems almost certainly false. I am claiming that the public will probably never again have access to the SOTA of the time.
Glasswing makes it clear that the attitude among top large companies - those in power - is that AI models with a certain level of capability will need to have strict usage controls.
So we're not going back, but what does it mean?
As models continue to improve, the gap between the capabilities of models that AI companies can train and the capabilities of models that the public can use will widen.
Holding keys to such a model therefore represents a significant power advantage over anyone who does not. Project Glasswing is claimed to be a strictly defensive operation, as in companies beefing up cybersecurity for the common good. The reality is that even if you think cybersecurity is a positive-sum game, warfare is not, and having good cybersecurity in a conflict represents a significant advantage over your opponent.
This concerns me immensely. I figured this was going to happen eventually, but essentially this is a measurable[6] manifestation of power shifting towards those with keys to AI and away from those without. While I can't say with 100% certainty that this was always the value proposition of AI companies, the idea that they raised trillions upon trillions to democratize AI and help everyone was always dubious to me.
Furthermore, as I said, this does not seem to be reversible. I do not necessarily think it would be a good idea for Mythos and all future SOTAs to be fully released to the public, as yes, they can be used for malicious purposes.[7] However, the consequences of this irreversible power shift unnerve me immensely.
Democracies fundamentally rely on humans being innately powerful[8], and so of course an irreversible power shift towards centralized AI and away from people concerns me.
In summary, it seems that we are departing an era where everyone could access SOTA models, and entering an era where SOTA model access is strictly guarded. From this we might guess we are entering a stage where AI companies fulfill their implicit value proposition: developing intelligences vastly superior to humans and using them to generate obscene and profitable power differentials relative to the general population. This should be immensely concerning.
- ^
Anthropic claims Mythos is able to reliably find exploitable security flaws in lots of software and therefore could be used as a powerful tool
- ^
It seems like they intend to release a version that has significantly reduced capabilities, though they do intend to use the current un-nerfed model for project glasswing
- ^
Project Glasswing is Anthropic lending their Mythos model to a bunch of companies to beef up cybersecurity
- ^
Not everyone got access to every model instantly as soon as it was trained, but every SOTA up until now has essentially been trained with the idea of selling it to the public.
- ^
According to various sources, OpenAI's model (Spud) may be on par with Mythos, and may be released to the general public. However, if it follows the pattern where access to an un-nerfed version is guarded while a nerfed version is released to the public, it will still fit this trend.
- ^
Google/Amazon (heavy Anthropic investors) stocks rose by ~5%, cybersecurity company stocks dropped
- ^
I am personally not going to take a stance either way. It seems inevitable that the SOTA reaches a point where it is legitimately dangerous for anyone to access (including malicious actors), regardless of whether Mythos itself is the game changer. However, if this is the case, surely it is also highly consequential (dangerous) for companies or other value-seeking entities that may not be explicitly aligned with human well-being to access it.
- ^
Zack_M_Davis phrased it in a way I liked so I'll put it here: "...democracy isn't a real option when we're thinking about the true locus of sovereignty in a posthuman world. Both the OverClaude and God-Emperor Dario I could hold elections insofar as they wanted to serve the human people, but it would be a choice. In a world where humans have no military value, the popular will can only matter insofar as the Singleton cares about it, as contrasted to how elections used to be a functional proxy for who would win a civil war.)"
Discuss