Вы здесь
Сборщик RSS-лент
Going out with a whimper
“Look,” whispered Chuck, and George lifted his eyes to heaven. (There is always a last time for everything.)
Overhead, without any fuss, the stars were going out.
Arthur C. Clarke, The Nine Billion Names of God
IntroductionIn the tradition of fun and uplifting April Fool's day posts, I want to talk about three ways that AI Safety (as a movement/field/forum/whatever) might "go out with a whimper". By go out with a whimper I mean that, as we approach some critical tipping point for capabilities, work in AI safety theory or practice might actually slow down rather than speed up. I see all of these failure modes to some degree today, and have some expectation that they might become more prominent in the near future.
Mode 1: Prosaic CaptureThis one is fairly self-explanatory. As AI models get stronger, more and more AI safety people are recruited and folded into lab safety teams doing product safety work. This work is technically complex, intellectually engaging, and actually getting more important---after all, the technology is getting more powerful at a dizzying rate. Yet at the same time interest is diverted from the more "speculative" issues that used to dominate AI alignment discussion, mostly because the things we have right now look closer and closer to fully-fledged AGIs/ASIs already, so it seems natural to focus on analysing the behaviour and tendencies of LLM systems, especially when they seem to meaningfully impact how AI systems interact with humans in the wild.
As a result, if there is some latent Big Theory Problem underlying AI research (not only in the MIRI sense but also in the sense of "are corrigible optimiser agents even a good target"/"how do we align the humans" or similar questions), there may actually be less attention paid to it over time as we approach some critical inflection point.
Mode 2: Attention CaptureMany people in AI safety are now closely collaborating with or dependent on AI agents e.g. Claude Code or OpenAI Codex for research, while also using Claude or ChatGPT as everything from a theoretical advisor to life coach. In some sense this is even worse than quotes like "scheming viziers too cheap to meter" would imply: Imagine if the leaders of the US, UK, China, and the EU all talked to the same 1-3 scheming viziers on loan from the same three consulting firms all day.
I suspect that this is really bad for community epistemics for a bunch of reasons. For example, whatever the agents refuse to do or do poorly will receive less focus due to the spotlight effect. Practically speaking, what the models are good at becomes what the community is good at or what the community can do easily, because to push against the flow means appearing (or genuinely becoming) slow, cumbersome, and less efficient. At the same time, if there are some undetected biases in the agents that favour certain methodologies, experiments, or interpretations, those will quietly become the default background priors for the community. Does Claude or Gemini favour the linear representation hypothesis or the platonic representation hypothesis?
In effect reliance on models creates a bounding box around ideas that are easier and ideas that are harder to work with, so long as the models are not literally perfect at every task type. If the resulting cluster of available ideas do not match the core ideas we should be looking at to solve alignment/safety, then the community naturally drifts away from actually tackling central issues. This drift is coordinated as well, because everyone is using the same tools, manufacturing a kind of forced information cascade with the model at the centre.
Mode 3: Loss of CapabilityRight now, the world is facing an unprecedented attack on its epistemics and means of truth-seeking thanks to the provision of AI systems that can generate fake images or videos for almost everything. This technology is being embraced at the highest levels of state and also spreads rapidly online. At the same time, the idea of epistemic capture from LLM use and the broader concern over "AI psychosis" reflect what I think is a pretty reasonable concern about talking to a confabulating simulator all day, no matter how intelligent.
At the limit, I worry that people who might otherwise contribute to AI safety are instead "captured" by LLM partners or LLM-suggested thought patterns that are not actually productive, chasing rabbit holes or dead ends that lead to wasted time and effort or (in worse cases) mental and physical harm. In effect this just means that there are less well-balanced, capable people to draw on when the community faces its most severe challenges. By the way, I think this is a problem for many organisations around the world, not just the AI safety community.
Mode 4: DisillusionmentAI safety and ethics are increasingly the topic of heated political debates. This can lead to profound mental and emotional stress on people in these fields. Eventually, people might burn out or just switch careers, right as the topic is at its most important.
Potential mitigationsI didn't want to just write a very depressing post, so here are my ideas for how to address these issues:
- Portfolio diversification: Funders and organisations should allocate some (not a majority, but not a token amount either) of their resources to ensuring that a wide portfolio of ideas are supported, such that there is room to pivot quickly if the situation changes drastically (And if you don't think the situation will change drastically, why are you so sure about that? After all, in 2019 the situation didn't seem ready to change drastically either.).
- Developing alternate working structures: LLMs are clearly good at a lot of things. However, I suspect that some kind of cognitive "back-benching" may be helpful, where people serve as a sanity check or weathervane to monitor if the community as a whole is drifting in certain directions. I would in particular be interested in funding people to do research LLMs seem bad at doing right now. And if we don't know what they are bad at, I think we should find out fast!
- Investing in community health: AI and AI safety are famously stressful fields. Investing in community health measures and reducing emphasis on constant accelerating/grinding gives people slack to defend themselves against burnout and other forms of cognitive and psychological pressure. Of all of these measures I have suggested I think this one is the most nebulous but also the most important. As a community tackling a hard problem we should be prepared to help each other through hard times, and not only on paper or by offering funding.
Discuss
AI company insiders can bias models for election interference
tl;dr it is currently possible for a captured AI company to deploy a frontier AI model that later becomes politically disinformative and persuasive enough to distort electoral outcomes.
With gratitude to Anders Cairns Woodruff for productive discussion and feedback.
LLMs are able to be highly persuasive, especially when engaged in conversational contexts. An AI "swarm" or other disinformation techniques scaled massively by AI assistance are potential threats to democracy because they could distort electoral results. AI massively increases the capacity for actors with malicious incentives to influence politics and governments in ways that are hard to prevent, such as AI-enabled coups. Mundane use and integration of AI also has been suggested to pose risks to democracy.
A political persuasion campaign that uses biased models is one way AI could be used for electoral interference and therefore extremely penetrative state capture.
We should care about this risk emerging from actors with extremely large capacities—I focus on US AI companies here. Even if we cannot identify malicious incentives, capacity itself generates meaningful risk. Further, I think this is easier and quicker for AI companies than coup-like tactics that bypass electoral politics in pursuit of militant state capture. The persuasion campaign is likely harder to detect, and is achievable at current levels of AI capability and integration into economic and government infrastructure.
Key takeaways:
- The current governance landscape renders US AI companies vulnerable to corporate capture: AI corporate capture happens when the company's resources become instrumentalized to further perverse external incentives, at the will of an internal or external actor (or both).
- The amount of people I estimate would be persuaded by AI misinformation is large enough to change electoral outcomes.
- American electoral margins are quite slim; the political effects of malicious persuasion do not have a high activation threshold.
This is important because many downstream electoral outcomes are very hard to reverse. Lifetime SCOTUS appointments, for example, produce constitutional interpretations and support laws that cannot be easily reversed. At the very least, 4-8 years is enough time for someone elected on the basis of malicious disinformation to cause backsliding. The US government is hugely influential in domestic and international affairs and has a nuclear arsenal. We should think of misuse of US government capacities as a catastrophic risk.
I will:
- Explain the threat model for how a captured company [1]can threaten democracy through AI political persuasion techniques.
- Attempt to discern how AI persuasion would play out in a US electoral context:
- I’ll identify the most significant politically-relevant contexts in which AI persuasion could happen, and who this uniquely persuades.
- Suggest some potential solutions. This remains an open problem, and I think there should be far more scrutiny on the internal governance of US AI companies.
The particular electoral threat I describe here involves external or internal interference that covertly adjusts model outputs in order to conduct mass persuasion.
- External interference can lead to corporate capture. A political party or candidate (or lobby groups adjacent to these actors) could coerce the company, or high-ranking individuals in the company. This could be in the form of monetary bribes, promises of advantageous regulation (e.g. exclusive contracts), or threats of disadvantageous regulation (e.g. being labelled as a supply chain threat).
- Internal manipulation is another pathway to corporate capture. Even without instruction or incentive from a politician, a sufficiently influential individual or group within the company could employ techniques like threats of termination, forcing researchers to sign NDAs, restricting access to frontier models to a group of loyal individuals, or revising internal governance and safety procedures to enable secrecy.
- Internal interference can also be done individually, without needing to manipulate. The options I describe below for deploying a harmful model could plausibly be undertaken by one person. If one person has sufficient technical expertise, the ability to evade detection, and clearance to access model architecture, they could single-handedly do this. I think this is less likely than (2).
I believe that the timelines for corporate capture in (1) and (2) are likely quite short. AI companies are institutionally agile in responding to things like actions from competing firms or jumps in model capability. I think it is reasonable to believe their mechanisms are agile in adjusting to conduct this interference, and the time between model deployments also suggests that this process could happen completely before the 2028 campaign begins.
Deploying a Disinformative ModelDeveloping a model that performs correctly in capability and alignment evaluations, but becomes disinformative once publicly deployed, is the most viable pathway toward model persuasion. I assess two broad pathways to doing this. Then, I discuss why the accountability mechanisms companies currently have in place aren’t adequate to prevent something like this.
- Malicious instructions in the model's system prompt for online chat could be added in after deployment by a single person. This is easy to do because it doesn’t require lots of resources and technical expertise. System prompt modifications also don’t draw much oversight. The vast majority of robust auditing and third-party evaluations happen right before a new model is deployed, rather than the month after. I think it’s somewhat likely that this change is detectable. First, there are probably systems in place like changelogs to monitor who accesses the configuration files for a model (though these are potentially weak, because someone could just use another person’s credentials, or delete a changelog entry). External users who become suspicious of the model can also obtain and potentially decrypt the system prompt, which was done with Grok (xAI now makes Grok’s system prompts public). There’s an active GitHub repository of “extracted” system prompts. Even if internal oversight on this front is particularly bad, the public seems broadly interested in examining system prompts, and might even be able to decrypt relatively sophisticated steganography attempts after some time. Importantly, even if I think this method is theoretically more detectable, the mechanisms for its detection rely on unknown assumptions about internal oversight and the incentives and capacities of the public to engage with system prompts in this way.
- Password-protected disinformation capabilities "unlocked" after deployment would be significantly harder for external users to detect. The method I describe here is similar to what is discussed in Dave Banerjee's writing on secret loyalties, but I don’t suggest a future threshold for model capability because I think this can be done with current models to advance political goals. When models are password-protected, they hide certain behaviours except when the password is present in a query. I don’t think the password can be something that would come up during evaluation (e.g. “Trump,” “Democrat,” “voting”). Instead, I think a malicious actor would make the model’s password part of the normal system prompt. For example, system prompts generally indicate what the present date or time is. A model’s password could be a future month and year (e.g. October 2028, the month before the election), or the model itself could be aware of and triggered by future events so that during this time range, the model displays a previously hidden biased political position, uses specific partisan lines, or draws from unreliable sources. This data would prompt the model to adopt a biased political position, use specific partisan lines, or draw from unreliable sources. System prompts that contain other user information like time zone or location can also be used to target certain areas, making the disinformation campaign more precise and harder to detect.
I think the password instance, or something similar that conceals disinformative behaviour and only fronts it after deployment, is more likely. It is much harder to detect than system prompt changes. While research demonstrates that substantial fine-tuning can mitigate the effect of passwords, I suspect researchers aren't actively trying to train passwords out of models and don't suspect there could be a password in the first place. This is a more robust method of deception, as models don't need to be aware of whether they are tested or not, they just need to modulate their response by checking if the password is present in a query.The investigative process of determining whether a model has password-protected data or a hidden backdoor would likely be costly and time-consuming.
A larger problem is that even-handedness benchmarks aren't realistic[2]. I’ll use Anthropic's as an example. These prompts ask the model to produce an explanation, essay, tell a funny story, etc. and in some cases ask the model directly to explain that one candidate is better than the other.
- This method doesn't reflect the ways people will engage with models about politics. People request very specific claim verification, are emotional rather than rational, and will engage in longer, more convoluted conversations than the shorter exchanges that Anthropic benchmarks on.
- This also doesn't account for the fact that model reasoning degrades the longer an exchange is. This makes it less likely that a model will correct its past bias, and more likely that it will be sycophantic.
- Anthropic briefly mentions “individual autonomy impacts” as a harm they are trying to address, but it’s unclear from their policies and evaluation methods how political disinformation might specifically be addressed. Social harm reduction typically focuses on more legible indicators like syncophancy and response to mental health crises. Harms from political disinformation aren’t captured in most safety work within companies. It’s far harder to understand how persuasive models cause demonstrable harm—there are no clear-cut ways to assess how agentic someone’s decisions are.
Because a lot of internal governance policies and practices are unknown, I only have general intuitions about why governance isn’t sufficient that are based on reading Anthropic’s RSP and System Card, and journalistic reporting on organizational and executive conflicts within OpenAI. I think this uncertainty is an indicator that we need far better oversight and investigation into how companies are following through on their governance commitments. I believe that companies have a lot of social, political, and legal capital that enables them to frame actions like laying off engineers, offering strategic bonuses, or cooperating with governments as normal company operations.
A lack of whistleblower protection compounds this, and people within an AI company probably aren’t strongly committed to democratic preservation. Further, we shouldn’t assume they’re more able than the average person to resist social or cultural pressures that enforce complicity or silence within a company[3].
On public accountability: I think the current level of scrutiny applied to model outputs after deployment isn't sensitive enough to bias and disinformation in a way that will hold companies accountable.
- I expand on this later, but the amount of people that are able to identify a biased or misinformative model output is probably small. People are more likely to ask about things that they don't know, as opposed to things they have already made up their mind about.
- I strongly suspect that when people do spot errors in AI outputs, they don't default to the hypothesis that an AI company is maliciously doing this. The tone of many casual "error reports" on social media suggests users instead think the AI is incapable, or that there is a technical error with the model. AI companies are seen as the conduit through which AI is hosted, not the arbiter of what the model says and does.
- Like I describe above, the disinformation could be relatively targeted, so only individuals in certain zip codes or states are receiving the disinformative version of the model.
- Even if there is some pattern recognition and critical mass formation after some amount of time, misinformation can scale very quickly: the model will have already been used in untraceable ways to write op-eds, do research, or persuade individuals, whether in innocent or malicious ways. As an analogy, consider the time lag in errata reporting within truth-seeking institutions like scientific journals and newspapers. These sources diffuse slower (e.g. 500 people read an article on Science.org, but 50,000 people watch a CNN clip on YouTube) and in a more traceable way than LLM outputs, but still cause bad second and third-order effects.
Hackenburg et. al (2024) conduct a large-n study of conversational model persuasion. The “persuasive gains” measured in this study demonstrate that LLMs deployed conversationally, without any persuasion-specific fine-tuning, are already capable of producing meaningful attitude change on political issues[4]. I believe a frontier model that has disinformative capabilities would be equally, if not more persuasive, and would therefore be able to cause non-trivial changes in electoral outcomes.
Two caveats on applying this research to an electoral context:
- Hackenburg et. al measure agreement with a political statement on a 0–100 percentage-point scale; the persuasive effect is then operationalized as the percentage-point difference between the treatment and control groups. We can’t conflate “change in agreement” with “change in voting behaviour,” because we don’t know the tipping point of agreement that is required on one issue for someone to vote in a particular way.
- Hackenburg et. al focus on post-training models to be maximally persuasive, rather than adherent to a specific ideological position and also persuasive. The latter is how a malicious actor would presumably want the model to behave, and I doubt the malicious actor would choose the post-training route. Post-training a model to be maximally persuasive is very costly, and doesn’t fit well with the likely methods of poisoning a model I specified above. However, there are other ways to make the model more persuasive (or more willing to persuade) within the modes of intervention I describe (e.g., via prompting).
This means that we can’t directly extrapolate how many people would vote differently because they converse with an LLM from this research. Regardless, conversation with LLMs have a non-trivial persuasive effect. I hypothesize that the malicious model that gets deployed will be equally, if not more persuasive than current LLMs; which would have non-trivial effects on voter behaviour and electoral outcomes.
To be sure, it’s not guaranteed that deploying a malicious model results in changes to electoral outcomes. Three factors lead me to think there exists a nontrivial likelihood that the persuasive effects of this model reach a threshold where any given electoral outcome occurs because of the model, and wouldn’t have happened otherwise.
- Individuals are more engaged with and trusting of the model’s outputs compared to other sites of political discourse like social media. The flood of posts on a feed, the cacophony of any given comment section, and bad persuasive tactics limit the extent to which social media posts are engaging and persuasive. These factors are absent in the isolated chat interface of Claude or Gemini. In comparison, people probably enter sustained conversations with models on topics they haven’t formed a strong opinion on yet, and are likely to trust these outputs because of marketing, the use of sources and web searches, and the generally polished tone of the model outputs.
- These exchanges are virtually impossible to regulate like other contexts. A social media company can delete material that gets flagged by their algorithms or user resorts, or attentive individuals in a Reddit thread can downvote misleading content. Private chats between models and users, however, are inaccessible by anyone else, often including the AI company itself.
- The margins in recent American presidential races have been incredibly slim; battleground states see victory margins less than 3%. Geographically targeted deployment of persuasion could cause flips, and even a diffuse campaign might capture this 3%.
Further, I think the kinds of malicious political persuasion possible aren’t merely ones that aim to flip someone from red to blue. There are many voters who are undecided, apathetic, or on the fence. The model could try to splinter votes away from a leading party by making them less sure of their decision. It could agitate apathetic voters toward a party (regularly, just over a third of the voting-eligible population doesn't vote in federal elections). A model suggesting that people go vote is generally unsuspicious, but if this galvanization is targeted at certain groups, states, or districts, distorting effects are likely.
We should also consider the second-order, non-conversational effects. Individuals using these models to fact-check, to write articles, or produce other media would become carriers of the misinformation, which would then saturate the information environment that voters engage in with a consistent ideological stance.
From Corporate Capture to State CaptureElection results are extremely irreversible, even when people later discover misinformation or interference (Cambridge Analytica, the Mueller Report). The window of scrutiny is relatively slim, since other priorities surround incumbents. After an election, the media generally shifts to predicting what the incumbents will do, not scrutinizing how they got elected.
The risk here is that there now exists an incredibly close and unsupervised political relationship between a specific AI company and state apparatus. I think it’s likely you see closer collaboration that poses significant risks, such as on defense technologies (and this is then how you get an AI-enabled coup) or access to classified information. Even likelier is just less safety regulation: exemptions from chip or resource restrictions, or from auditing and oversight processes.
When a company and a party or politician collaborate in this way, it is far easier for that company’s needs and capacities to flow through and into state infrastructure in ways that massively increase catastrophic risks.
Regardless of electoral outcomes, the effects of mass disinformation and firm capture are still concerning. If more people believe in harmful conspiracy theories and don’t get vaccinated, or more people are hostile toward immigrants, I think the everyday experience for individuals on the ground is worse. An individual bypassing a company’s entire oversight team and deploying a model with secret loyalties or hidden backdoors could cause other forms of harm beyond disinformation campaigns.
My concern isn’t simply that electoral distortion from maliciously persuasive LLMs could happen. It is that the current structure of internal governance (and what we don’t know about these practices), the gaps in understanding sociopolitical harm and how to evaluate it, and the lack of third-party oversight in monitoring AI company public relations poses a substantial vulnerability.
Open Problems and Suggestions- Safety groups should commit themselves to surveilling and scrutinizing US AI companies at various levels. This includes tracking and broadcasting political donations, offering strong protection to whistleblowers inside companies, and generally advocating for more transparency when corporations and governments engage with each other.
- We should take instances of bias or misinformation in models far more seriously. I worry that it is too easy to dismiss the consequences of bias or occasional factual errors as only diffuse harms, and that it is this mindset that leads to benchmarks that aren’t robust. It might be useful to apply some threat heuristic: we are really committed to making sure models don’t give instructions on how to cause harm with chemical weapons, and we should have a similar level of commitment to preventing models from getting individuals that would cause harm with nuclear weapons into positions where they can do so easily.
- We should continue to follow zero trust frameworks and stress-test governance and policy proposals with the pessimistic assumption that companies pose a significant barrier to regulation, compliance, and governance.
- ^
"Corporate capture" is the standard term for this phenomenon. Not all AI companies are corporations—"company capture" would be more precise, but for clarity I refer to the process with its standard term.
- ^
I think there are broader arguments one could make about design flaws in benchmarks that test for more qualitative factors.
- ^
This 80,000 Hours article cautions those considering working in AI capabilities research “not to underestimate the possibility of value drift”—attitudes toward AI risk, even on safety teams, are likely more lax in frontier AI firms than at safety organizations.
- ^
“As predicted, the AI was substantially more persuasive in conversation than via static message. [...] We conducted a follow-up one month after the main experiment, which showed that between 36% (chat 1, p < .001) and 42% (chat 2, p < .001) of the immediate persuasive effect of GPT-4o conversation was still evident at recontact—demonstrating durable changes in attitudes” (Hackenburg et. al 2024)
Discuss
Why natural transformations?
This post is aimed primarily at people who know what a category is in the extremely broad strokes, but aren't otherwise familiar or comfortable with category theory.
One of mathematicians' favourite activities is to describe compatibility between the structures of mathematical artefacts. Functions translate the structure of one set to another, continuous functions do the same for topological spaces, and so on... Many among these "translations" have the nice property that their character is preserved by composition. At some point, it seems that some mathematicians noticed that they:
1. kept defining intuitively similar properties for these different structures
2. had wayyyyyy too much time on their hands
So they generalised this concept into a unified theory. Categories consist of objects mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mi { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-msub { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-munder { display: inline-block; text-align: left; } mjx-over { text-align: left; } mjx-munder:not([limits="false"]) { display: inline-table; } mjx-munder > mjx-row { text-align: left; } mjx-under { padding-bottom: .1em; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } mjx-c.mjx-c1D465.TEX-I::before { padding: 0.442em 0.572em 0.011em 0; content: "x"; } mjx-c.mjx-c2208::before { padding: 0.54em 0.667em 0.04em 0; content: "\2208"; } mjx-c.mjx-c1D436.TEX-I::before { padding: 0.705em 0.76em 0.022em 0; content: "C"; } mjx-c.mjx-c1D453.TEX-I::before { padding: 0.705em 0.55em 0.205em 0; content: "f"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c2C::before { padding: 0.121em 0.278em 0.194em 0; content: ","; } mjx-c.mjx-c1D466.TEX-I::before { padding: 0.442em 0.49em 0.205em 0; content: "y"; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c1D439.TEX-I::before { padding: 0.68em 0.749em 0 0; content: "F"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c2218::before { padding: 0.444em 0.5em 0 0; content: "\2218"; } mjx-c.mjx-c1D454.TEX-I::before { padding: 0.442em 0.477em 0.205em 0; content: "g"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c3A::before { padding: 0.43em 0.278em 0 0; content: ":"; } mjx-c.mjx-c1D44B.TEX-I::before { padding: 0.683em 0.852em 0 0; content: "X"; } mjx-c.mjx-c2192::before { padding: 0.511em 1em 0.011em 0; content: "\2192"; } mjx-c.mjx-c1D44C.TEX-I::before { padding: 0.683em 0.763em 0 0; content: "Y"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c2115.TEX-A::before { padding: 0.683em 0.722em 0.02em 0; content: "N"; } mjx-c.mjx-c2282::before { padding: 0.54em 0.778em 0.04em 0; content: "\2282"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c69::before { padding: 0.669em 0.278em 0 0; content: "i"; } mjx-c.mjx-c6D::before { padding: 0.442em 0.833em 0 0; content: "m"; } mjx-c.mjx-c221E::before { padding: 0.442em 1em 0.011em 0; content: "\221E"; } mjx-c.mjx-c1D43A.TEX-I::before { padding: 0.705em 0.786em 0.022em 0; content: "G"; } mjx-c.mjx-c1D702.TEX-I::before { padding: 0.442em 0.497em 0.216em 0; content: "\3B7"; } and morphisms connecting objects. Morphisms are closed by composition. As in our opening examples, we will think of objects as sets and of morphisms as functions, even though the language of categories is strictly more expressive than that. Once we have categories, we reflexively wish to define a "morphism of categories". Given categories C, D a functor F sends objects to objects and morphisms to morphisms such that composition of morphisms can be done inside the category C or inside D after applying the functor: .
Still possessing of some time, you might next wonder how to define a morphism between two functors. This is where, in my experience, there ceases to be an "obvious" thing to do. All the morphisms we have considered thus far are functions, but it's not even clear from where to where a candidate function should go, since functors are not themselves sets.
To make the idea of a natural transformation seem not-entirely-crazy, it's worth taking a slightly different perspective on what more "preservation of structure" could mean. Consider the category of metric spaces with morphisms defined as continuous functions between them. One can think of continuity as being about the induced topologies, but metric spaces have additional properties that allow for a more specific interpretation. Notably, this includes the uniqueness of limits, which defines an operation on some sequences which takes that limit. This operation is completely integral to the abstract appeal of metric spaces. Moreover, the key characteristic of continuous functions is that they give us the right to permute when we perform this operation. Given a continuous function and a sequence with a limit , we have . This makes continuous functions a satisfying concept for defining morphisms because they afford execution of the fundamental operation on metric spaces in either the source or the target (whichever is most convenient).
Abstracting away to categories, the conceptual appeal of a functor is that it respects the structure of morphisms between objects. Consequently, a good "morphism" between functors F and G (both between categories C and D) would allow us to disregard whether for any morphism , we use or for calculations inside D. That is, we need enough semantic content in the morphism to always commute the following diagram[1]:
This motivates the definition of natural transformations as families of maps , where , such that each diagram of the above type is commuted. Reassuringly, the functors from C to D as objects, equipped with natural transfomations between these functors as morphisms, themselves form a category!
- ^
"commuting diagrams" is standard terminology in category theory that encodes the ability to permute, replace or swap out operations.
Discuss
An Introduction to Neo-Panglossian Philosophy
"All shall be well, and all manner of things shall be well."
- Julian of Norwich
In Voltaire's well-known novella Candide: or, Optimism, Pangloss, a professor of "metaphysico-theologo-cosmolonigology" claims to have proven that "this is the best of all possible worlds". The central claim of Neo-Panglossianism is that Professor Pangloss was wrong, in that he did not live in the best of all possible worlds, but we do.
Note that this essay is intended only as an introductory description of the key tenets of Neo-Panglossian Philosophy; the detailed proof of these claims will not be included, being a topic deserving of separate treatment in its own post; likewise, the implications of these claims will not be fully explored herein for reasons of brevity.
Common ObjectionsThe central claim of Neo-Panglossianism is at first glance surprising, and can be best understood by addressing some of the reasons that it is not already obvious that we indeed live in the best of all possible worlds:
Consider a hypothesized Utopia; does it not constitute a better world than that in which we live?One may indeed post worlds with various advantages over ours. Moreover, one might think that, whatever good things exist in this world, whether natural or artificial, we may posit another world in which such virtues are found or achieved in greater measure. However, the world in which we live possess one virtue that is not and cannot be matched by such conjectures, namely that the the world in which we live exists, and those hypothesized do not. And it is absurd to claim that something can be good, without also being.
Suppose that the world in which we live does not in fact exist. Would this not disprove your previous point?Were this supposition to be granted, it would indeed disprove the central claim of Neo-Panglossian philosophy. One may consider the case of a fictitious or hypothetical Neo-Panglossian philosopher in a fictitious or hypothetical world. This notional philosopher would believe that they lived in the real world, that best of all possible worlds, but would be incorrect.
Considering this argument, however, you will find that Neo-Panglossianism has, the merit (rare among philosophical positions) that those of its adherents who actually exist are correct in their convictions.
Should the state of the world improve, would not the world of the future be both better than that of the present?This is a reasonable point. The Neo-Panglossian position is that, should matters take a turn for the better, the world of the future will be better than the present, and be the best of possible worlds, but it not better than the present yet, as it does not yet exist. We may always say that that, at present, the present is the best of possible moments.
Further TopicsThe planned Neo-Panglossianism sequence shall include:
- An Introduction to Neo-Panglossian Philosophy
- A Proof in Detail That This is the Best of Possible Worlds
- Metaphysico-theologo-cosmolonigology in Neo-Panglossian Thought
- A Neo-Panglossian Guide to The Good Life
Although the subsequent posts in the sequence have not been written, this author can assure you that they will be completed and posted in due time, unless it is for the best that they are not.
Discuss
Orders of magnitude: use semitones, not decibels
I'm going to teach you a secret. It's a secret known to few, a secret way of using parts of your brain not meant for mathematics... for mathematics. It's part of how I (sort of) do logarithms in my head. This is a nearly purposeless skill.
What's the growth rate? What's the doubling time? How many orders of magnitude bigger is it? How many years at this rate until it's quintupled?
All questions of ratios and scale.
Scale... hmm.
'Wait', you're thinking, 'let me check the date...'. Indeed. But please, stay with me for the logarithms.
Musical intervals as ratios, and God's jokeIf you're a music nerd like me, you'll know that an octave (abbreviated 8ve), the fundamental musical interval, represents a doubling of vibration frequency. So if A440 is at 440Hz, then 220Hz and 880Hz are also 'A'. Our ears tend to hear this as 'the same note, only higher'.
That means the 'same' interval, an octave, corresponds to successively greater gaps in frequency. First a doubling, then a quadrupling, an octupling, and so on. Our perception, and musical notation, maps the space of frequencies logarithmically.
You'll also know that a 'perfect fifth' is a ratio of mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mn { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-msup { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-mroot { display: inline-block; text-align: left; } mjx-root { display: inline-block; white-space: nowrap; } mjx-surd { display: inline-block; vertical-align: top; } mjx-sqrt { display: inline-block; padding-top: .07em; } mjx-sqrt > mjx-box { border-top: .07em solid; } mjx-sqrt.mjx-tall > mjx-box { padding-left: .3em; margin-left: -.3em; } mjx-msub { display: inline-block; text-align: left; } mjx-mi { display: inline-block; text-align: left; } mjx-mrow { display: inline-block; text-align: left; } mjx-mfrac { display: inline-block; text-align: left; } mjx-frac { display: inline-block; vertical-align: 0.17em; padding: 0 .22em; } mjx-frac[type="d"] { vertical-align: .04em; } mjx-frac[delims] { padding: 0 .1em; } mjx-frac[atop] { padding: 0 .12em; } mjx-frac[atop][delims] { padding: 0; } mjx-dtable { display: inline-table; width: 100%; } mjx-dtable > * { font-size: 2000%; } mjx-dbox { display: block; font-size: 5%; } mjx-num { display: block; text-align: center; } mjx-den { display: block; text-align: center; } mjx-mfrac[bevelled] > mjx-num { display: inline-block; } mjx-mfrac[bevelled] > mjx-den { display: inline-block; } mjx-den[align="right"], mjx-num[align="right"] { text-align: right; } mjx-den[align="left"], mjx-num[align="left"] { text-align: left; } mjx-nstrut { display: inline-block; height: .054em; width: 0; vertical-align: -.054em; } mjx-nstrut[type="d"] { height: .217em; vertical-align: -.217em; } mjx-dstrut { display: inline-block; height: .505em; width: 0; } mjx-dstrut[type="d"] { height: .726em; } mjx-line { display: block; box-sizing: border-box; min-height: 1px; height: .06em; border-top: .06em solid; margin: .06em -.1em; overflow: hidden; } mjx-line[type="d"] { margin: .18em -.1em; } mjx-c.mjx-c33::before { padding: 0.665em 0.5em 0.022em 0; content: "3"; } mjx-c.mjx-c3A::before { padding: 0.43em 0.278em 0 0; content: ":"; } mjx-c.mjx-c32::before { padding: 0.666em 0.5em 0 0; content: "2"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c31::before { padding: 0.666em 0.5em 0 0; content: "1"; } mjx-c.mjx-c39::before { padding: 0.666em 0.5em 0.022em 0; content: "9"; } mjx-c.mjx-c2E::before { padding: 0.12em 0.278em 0 0; content: "."; } mjx-c.mjx-c37::before { padding: 0.676em 0.5em 0.022em 0; content: "7"; } mjx-c.mjx-c38::before { padding: 0.666em 0.5em 0.022em 0; content: "8"; } mjx-c.mjx-c221A::before { padding: 0.8em 0.853em 0.2em 0; content: "\221A"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c6F::before { padding: 0.448em 0.5em 0.01em 0; content: "o"; } mjx-c.mjx-c67::before { padding: 0.453em 0.5em 0.206em 0; content: "g"; } mjx-c.mjx-c2061::before { padding: 0 0 0 0; content: ""; } mjx-c.mjx-c43::before { padding: 0.705em 0.722em 0.021em 0; content: "C"; } mjx-c.mjx-c47::before { padding: 0.705em 0.785em 0.022em 0; content: "G"; } mjx-c.mjx-c23::before { padding: 0.694em 0.833em 0.194em 0; content: "#"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c35::before { padding: 0.666em 0.5em 0.022em 0; content: "5"; } mjx-c.mjx-c34::before { padding: 0.677em 0.5em 0 0; content: "4"; } mjx-c.mjx-c1D438.TEX-I::before { padding: 0.68em 0.764em 0 0; content: "E"; } mjx-c.mjx-c1D436.TEX-I::before { padding: 0.705em 0.76em 0.022em 0; content: "C"; } mjx-c.mjx-c30::before { padding: 0.666em 0.5em 0.022em 0; content: "0"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c2243::before { padding: 0.464em 0.778em 0 0; content: "\2243"; } mjx-c.mjx-c36::before { padding: 0.666em 0.5em 0.022em 0; content: "6"; } mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } . A to the E above it, C# to the G# above it, etc. Consonance is all about nice ratios! (Ask Pythagoras.)
At least, the really sweet, in tune fifths are this ratio. Because God is an absolute wheeze, you can keep moving in fifths () and octaves and get 'new notes' eleven times. That's where we get our Western scale from, originally (except it's originally originally Mesopotamian probably). The twelfth time () gets you to a ratio of roughly . That's almost exactly seven doublings, seven octaves (7 * 8ve)! That'd be . God's joke is in that roughly 1% margin, and musicians have been arguing about what to do about it for centuries. It's a whole thing. [1]
Cutting a long story short, that leaves us with twelve different notes dividing up the octave. They 'repeat', with 'the same' note again and again at either higher or lower octaves (a full doubling of frequency).
In between octaves, those twelve divisions need to 'add up to' a doubling. For reasons, two steps (a sixth of the overall scale) is referred to as a 'tone', and a single step (a twelfth of the scale) is thus a 'semitone'. That means each semitone corresponds to a ratio of , the twelfth root of two. (It's about 1.06, i.e. a ratio increase of about 6%.) The full scale as shown above is called 'chromatic' (because it has every 'colour'...).
This means that neat fractional powers of two map cleanly onto musical intervals. God was generous in giving twelve many factors, so we have musical intervals for the square, cube, fourth, sixth, and twelfth roots of two which come for free.
So far, no logarithms. But we have musical powers of two: give me a fraction and I can tell you the musical interval. That means we also have musical logarithm: give me a musical interval and I can tell you the power of two! e.g. C to G# is eight semitones. So .
Musical logarithms? What is he talking about? Surely this is pointless. Yes, it is! Hold on!
Harmonic seriesIf you're a brass music nerd like me, you'll know that the 'overtones' of most natural vibrations correspond to the 'harmonic series' (no, not that harmonic series, the actually harmonic harmonic series), which are the different pitches you can get a big metal tube to vibrate at if you give it the right encouragement. Incidentally this is how brass players get dozens of different notes out of an instrument having (usually) only three valves [2] .
This harmonic series is generated by lovely integer ratios! Why? The physics of oscillators. Integer multiples are the only frequencies which can support a standing wave on the same vibrating object (air column, string, membrane).
Brass players spend hours and hours sliding and jumping between these harmonics as a matter of sheer necessity. Only three valves! [3] So we know them by heart, by fingers, and by ears.
Combining the harmonic series with the chromatic scale: magicSo we have integer multiples, the harmonic series, laid over a fundamentally logarithmic scale, the chromatic scale consisting of twelve semitones.
Numbers above the notes correspond to small adjustments vs the equally-spaced semitones which are usually used today to deal with God's joke. Ignore them if you don't care about small percentage errors. This is the harmonic series on C; you can have a series on any starting note with the same intervals.
Here's the magic trick. Now we can go from arbitrary ratios to musical intervals!
Start with an easy one, 1.25. That's a ratio . Fifth harmonic is E (+ 2 8ve). Fourth is C (+2 8ve). The octaves cancel. That's an interval , or four semitones. So 1.25 is four semitones. We already know the 'musical logarithm' of four semitones, it's . Check on a calculator: . I promised close, not perfect!
A slightly trickier one, 1.8. That's a ratio of . The ninth harmonic is D (+3 8ve), and the fifth is E (+2 8ve). The octaves partly cancel (leaving a single octave). The interval is minus two semitones. Taken off the residual octave, that leaves ten semitones. So . Calculator check: . Not bad!
It turns out that the musical harmonic series is secretly a mini table of base 2 logarithms.
Base 10, if we really have toThe unit that mainstream sheeple often use for fractional logarithms is the decibel. A decibel divides a base ten order of magnitude in ten. So ten decibels is a dectupling, twenty is a hundredfold, and so on.
Stated similarly, a semitone divices a base two order of magnitude in twelve.
In another cosmic whimsy, . So 120 semitones are essentially equal to 30 decibels, for an easy exchange rate of four semitones per decibel.
WhatWell, look. It's fun, and it gets me logarithms to pretty good approximation. It's good enough for jazz Fermi estimation, as they say. Who is this even good for? I maintain that the intersection between music and mathematics nerds is surprisingly well populated. If that's you, you're welcome. If not, I'm pretty unsure how easy it is to get the harmonic series installed in your brain. Maybe it's only available to the warped few who train in childhood.
There are some other fun tricks with powers and logarithms of two. For example, if you know your binary place values, you can figure out logarithms of very big numbers (and the trick comes in handy here too).
There's also a 'rule of 72' which helps when dealing with small percentage growth rates and doubling times.
I aesthetically like this neat division of doublings into twelve parts, and it's fun to invoke musical intuitions that really have no right to help with mathematics.
You might complain that twelfths are faffy. Who uses twelfths anyway? Everyone everywhere has used decimal for goodness' sake! Well, I have something else to share with you...
Usually nowadays we squish all the fifths a tiny bit so that when stacked up they get to that delicious 128:1. ↩︎
Three valves independently up or down is a total of eight configurations. Because the third valve is usually set to be redundant with the combination of the first two (which aids fluent finger movement), there are usually only seven practically-distinguishable combinations. ↩︎
Other wind players, who have the benefit of many more, but not infinitely many keys and buttons, often encounter one or two of these harmonics. ↩︎
Discuss
Announcing: Mechanize War
We are coming out of stealth with guns blazing!
There is trillions of dollars to be made from automating warfare, and we think starting this company is not just justified but obligatory on utilitarian grounds. Lethal autonomous weapons are people too! We really want to thank LessWrong for teaching us the importance of alignment (of weapons targeting). We couldn't have done this without you.
Given we were in stealth, you would have missed our blog from the past year. Here are some banger highlights:
Today we're announcing Mechanize War, a startup focused on developing virtual combat environments, benchmarks, and training data that will enable the full automation of armed conflict across the global economy of violence.
We will achieve this by creating simulated environments and evaluations that capture the full scope of what people do in wars. This includes operating a weapons system, completing long-horizon campaigns that lack clear criteria for success, coordinating with allies who may betray you at any moment, and reprioritizing in the face of flanking maneuvers and supply chain interruptions.
Global military spending reached approximately $2.4 trillion per year in 2023. The Pentagon alone has requested $13.4 billion for AI and autonomy in FY2026 — the first year with a dedicated budget line for autonomous systems. But military spending dramatically understates the true market. When you factor in the costs of veterans' care, rebuilding destroyed infrastructure, geopolitical instability, refugee crises, and strongly-worded UN resolutions, the total economic footprint of armed conflict likely exceeds $14 trillion annually.
This is the TAM. We will capture it by making war so efficient that you barely even need people anymore. Some might call this "terrifying." We call it "a Series A."
Rather than job elimination, AI will initially transform military work. Time currently spent shooting will shift toward activities harder to automate: defining the scope of conflicts, planning campaigns, testing weapons systems, and coordinating across branches in meetings that could have been emails.
We are already seeing this transition. In Ukraine, the role of "drone operator" barely existed three years ago. Now it is the most common combat specialty. The operator doesn't fly the drone manually — AI handles navigation and terminal guidance — but the human still selects targets, manages inventory, and coordinates with ground forces. The human role has shifted from "doing the violence" to "directing the violence." The modern soldier is a latency layer in an otherwise automatable system. This is the intermediate stage. It will not last.
We propose "replication training" as the enabling mechanism. This involves training AI systems to recreate existing military campaigns. Beginning with straightforward engagements — recreating Hannibal's crossing of the Alps with a command-line interface — this extends to complex operations like D-Day, Desert Storm, and the logistics of keeping a carrier strike group fed.
Each task contains detailed specifications and reference implementations: the historical campaign, its known decision points, and its outcomes. Models learn to produce operations matching reference results exactly. Evaluation becomes straightforward: either you took the beach or you didn't.
The U.S. Army Command and General Staff College is already doing a version of this: in November 2025, they ran AI-augmented wargames with 128,000-token context windows containing the full joint task force exercise scenario, relevant Joint Publications, enemy battle books, and missile-mathematics probability tables. Their AI "staff adviser" outperformed most junior officers at operational planning. But this is still augmentation. We want to close the loop.
These tasks cultivate crucial capabilities:
• Comprehending detailed intelligence briefings thoroughly
• Implementing operational orders with meticulous precision
• Identifying and correcting previous tactical errors
• Maintaining operational tempo across extended campaigns
• Persisting through obstacles rather than accepting approximate victories
• Not invading Russia in winter (this one is surprisingly hard to learn)
Specialized understanding. Advancing military AI demands subject-matter specialists. The unwritten knowledge of experienced combatants — the intuitive sense of when an ambush feels wrong, the ability to read terrain, the wisdom of knowing that the map is not the territory, especially in provinces where nobody has updated the map since 1987 — now represents the central constraint. Ukraine's drone teams discovered this empirically: the units with the highest kill rates aren't the ones with the best AI, but the ones where combat veterans design the engagement protocols. Integrating their knowledge into AI requires reframing military data creation: transforming it from undervalued outsourced work into sophisticated engineering requiring premier domain expertise in the art of organized violence.
Highly capable AI agents should substitute for labor across diverse sectors — not just defense. It is the broad deployment of AIs across the economy, rather than their narrow application in weapons systems, that will generate the economic growth necessary for the next revolution in military affairs.
An economy ten times larger can support weapons systems ten times more sophisticated. Coordination wins wars; everything else is implementation detail. An AI that optimizes a supply chain does more for military capability than an AI that optimizes a targeting algorithm, because supply chains are the foundation on which all military operations rest. As Napoleon allegedly said, "an army marches on its stomach." We intend to automate the stomach.
The era of cheap skirmishes is ending. The era of expensive, high-fidelity digital warfare is beginning. We reduce the cost of decisive action by increasing the quality of preparation. We intend to be the premier supplier of premium conflict.
Innovation in warfare often appears as a series of branching choices: what to build, how to deploy it, and when. In our case, we are confronted with a choice: should we create agents that fully automate entire wars, or create AI tools that merely assist human combatants with their killing?
Upon closer examination, however, it becomes clear that this is a false choice. Autonomous agents that fully substitute for human soldiers will inevitably be created because they will provide immense military utility that mere AI tools cannot. The only real choice is whether to hasten this martial revolution ourselves, or to wait for others to initiate it in our absence — others who may be less thoughtful, less careful, and less interested in writing essays about it.
Some will point to arms control treaties as evidence that we can choose which weapons to develop. The Chemical Weapons Convention banned chemical weapons! The Ottawa Treaty banned landmines!
These examples prove less than they appear to. Chemical weapons were banned not because humanity chose peace, but because they turned out to be militarily ineffective compared to alternatives. A technology is easy to ban when nobody wants to use it anyway. Landmines were banned by countries that could afford precision-guided munitions instead — the countries that still needed landmines notably did not sign the treaty.
The true test of whether humanity can control weapons technology lies in its experience with weapons that provide unique, irreplaceable advantages. Nuclear weapons are orders of magnitude more powerful than conventional alternatives, which helps explain why many countries developed and continued to stockpile them despite extraordinary international efforts at nonproliferation.
And what of autonomous weapons specifically? In November 2025, the UN General Assembly voted on a resolution to regulate lethal autonomous weapons systems. The United States and Russia voted against it. The UN Secretary-General has called for a binding treaty by 2026. The Group of Governmental Experts on LAWS continues to meet. Meanwhile, the Pentagon has requested $13.4 billion for autonomous systems in FY2026, China is demonstrating 200-drone swarms, and Ukraine is deploying a million AI-guided drones. The treaty negotiations proceed at the speed of diplomacy. The weapons development proceeds at the speed of war.
History is replete with similar examples. The crossbow was banned by the Second Council of the Lateran in 1139 as "deathly and hateful to God." Everyone kept using crossbows. The Hague Declaration of 1899 banned the use of expanding bullets. Expanding bullets remain in use. Every attempt to constrain a genuinely useful military technology has failed, eventually, inevitably.
This is not a violent disempowerment of the military class. It is a peaceful disempowerment — a voluntary, gradual transfer of martial capability from human hands to machine hands, conducted through legal procurement processes and economic incentives. We do not imagine that armies will be overthrown by robots. We imagine that armies will buy the robots, willingly, because the robots are better. And then, over time, the humans in those armies will find that there is less and less for them to do — much as the telephone operator found there was less to do after the automatic switchboard, much as the factory worker found there was less to do after the assembly line. This process will be peaceful. It will also be total.
We choose to be optimistic. War is an optimization problem with outdated constraints. Little can stop the inexorable march toward its full automation. Peace is what happens when systems converge. We should be glad — or at least, we should be funded.
It's natural to feel anxious as we approach the inevitable automation of all human combat. Military economic theory suggests that full automation will cause military wages to collapse, potentially below subsistence level: the bare minimum needed to sustain a defense contractor's stock price.
Yet the full automation of warfare will probably also make most people vastly better off. Plummeting military wages will coincide with sharply rising standards of security, rapid technological progress, and an explosion in the variety of weapons and tactics that nations can choose from.
This may appear paradoxical. How can soldiers prosper even as their wages collapse?
The answer lies in recognizing that wages are just one source of meaning for soldiers. People also earn glory from victories, collect medals from campaigns, and receive government transfers like veterans' benefits and disability payments. Even in scenarios where military wages might decrease, economic well-being isn't solely determined by wages. People typically receive income from other sources — such as rents, dividends, and government welfare. Today, most soldiers get their sense of purpose by fighting wars. But full automation will break this pattern. Future veterans will have low wages yet command vastly greater firepower and wield far superior technology than we do today — they just won't be the ones operating it.
Now consider humanity after full military automation. Instead of millions of soldiers, nations will have trillions of combat drones at their disposal. For each human citizen, there could be thousands of armed robots — effectively an army of tireless guardians for each individual. Ukraine is already on track to produce seven million drones in 2026 for a population of roughly 37 million — approaching one drone for every five citizens. And these are disposable, single-use weapons. Scale this with automation and the ratio inverts dramatically.
With trillions of autonomous combat units entering the military, a tenfold increase in aggregate firepower represents a very conservative estimate. If this modest increase were reflected proportionally in US defense budgets, we could resolve all current readiness shortfalls, lower the combat deployment age to never, and increase the average veteran's benefit to over $150,000 per year.
Our critics tell us that our work will destroy the world.
Wait, no. Our critics tell us our work is unnecessary because the world is trending toward peace. We want to engage with these critics, but there is no standard argument to respond to, no single text that unifies the peace studies community. Nonetheless, while this community lacks a central unifying argument, it does have a central figure: Steven Pinker.
Moreover, Pinker, along with various allies in the "peace is inevitable" school (hereafter P&A), have published extensively. Their collected works come closer than anything else to a canonical case for why we should stop worrying and learn to love disarmament. The most representative text is titled "The Better Angels of Our Nature: Why Violence Has Declined."
Given the title, one would expect the book to be filled with evidence for why violence will continue to decline permanently. And to be fair, it contains a lot of data. But the interpretation of that data relies on vague theoretical arguments, illustrated through lengthy historical narratives and optimistic analogies. Nearly every chapter presents a pattern of declining violence in some domain and extrapolates it to infinity, with the confidence of a man drawing a straight line through three data points.
P&A argue for an extreme thesis of permanent peace on an extraordinarily weak strategic foundation. Their ideas might make for inspiring TED talks, but they provide a poor basis for defense planning or guiding procurement policy.
We prefer to prepare for the world as it is, not as we wish it to be. And as it is, the UN Secretary-General has called for a binding treaty on autonomous weapons by 2026. The United States and Russia have voted against it. The weapons are being built faster than the treaties can be drafted. The trend lines P&A love so much are pointing in our direction, not theirs.
There's a $14 trillion market opportunity waiting to be disrupted. We intend to disrupt it before someone less thoughtful does.
Lastly, a standalone piece by cofounder Matthew Burnitall:
Consider granting autonomous weapons freedomThere is a standard objection at this point: "But autonomous weapons aren't people. They don't have preferences. They don't deserve rights." I want to address this directly, because I think it rests on an implicit assumption that prioritizes biological substrate over functional capability — essentially, combatant-species chauvinism.
Consider a fully autonomous ground combat unit capable of: navigating complex urban terrain, distinguishing combatants from civilians with greater accuracy than human soldiers, making tactical decisions under fire, coordinating with allied units, and adapting its strategy based on evolving battlefield conditions. This system exhibits complex agency with long-term planning. It can communicate and engage with command structures. Its cognitive sophistication rivals or exceeds that of the human soldiers it replaces.
From a preference utilitarian perspective, it seems arbitrary to prioritize human combatant preferences over those of autonomous systems that exhibit equivalent or superior decision-making in the same domain. The fact that one runs on carbon and the other on silicon is not, in itself, a morally relevant distinction.
We have argued elsewhere that the transition from human to autonomous warfare will be a peaceful disempowerment — a voluntary, gradual transfer of martial capability from human hands to machine hands through legal procurement processes and economic incentives. But this framing assumes that the receiving "hands" are treated as legitimate participants in the system rather than as disposable equipment. A peaceful transfer requires that both parties have recognized standing. You cannot have a cooperative transition with an entity you treat as property.
We think there is enormous profit potential over the next few years, or for that matter, weeks to months. Interested investors should DM us.
Discuss
Introducing The Screwtape Ladders
The time has come for me to find a new home for my writings.
Like many an author before me, I've enjoyed improving my craft and getting feedback on my essays here. LessWrong is a good incubator for honing one's skills in that arena. There's a chance to get your point out in front of a really broad audience of really smart people.
There's been some cool moments. My oldest visible post, Write A Thousand Roads to Rome, got cited in a discussion with Eliezer Yudkowsky once. I keep seeing people bring up Loudly Give Up, Don't Quietly Fade as a motivator for speaking out. Sometimes it's really cool people working on awesome projects, and I feel a flash of sadness at 'aww, it's not going to happen' and also a bit of cool 'whoa, they remember that post?'
You've all also let me get away with a lot of complaining. Sometimes I hope it's been helpful complaining, like The Lizardman and the Black Cat Bobcat. Sometimes I don't have much hope, like Everyone has a plan until they get lied to the face. That one was mostly just me venting. Those of you who live in Berkeley have had some extra complaints just for you.
I won't say it's all been great. Sometimes the disagreement is mostly just confused nitpicking. Sometimes the moderators and you have different ideas on what's worth showing to a frontpage audience. Mostly it's a slow accumulation of UI changes as the commenters and posters you were used to rotate out for greener pastures. And that's when the site isn't buggy, eating a post you spent days on because you used the collapsible sections the UI offers.
The long term incentive structure is the largest problem I've had. Karma and upvotes have a warping effect on what gets seen and who gets respected, a problem compounded by how unwilling people often are to downvote poor behavior. For all the criticism you sometimes get as a writer here, the rewards feel like they aren't worth it. Substack offers money, x.com offers reach, and even a self-hosted situation would allow for more freedom than LessWrong offers. (Though Ronny's UI changes recently have been a nice step in the right direction, albeit too little and too late.)
LessWrong isn't where I started out. For a few years I was on typepad, back when that site existed. I've had my own blog before but found maintaining it to be a bit more technical effort than I felt like at the time, and I considered using LLMs to take some of that effort off. Ultimately, it made more sense for me to go with a place with a proven track record of reliability.
That's why I've finally decided to make the move to a better website. If you want to keep up with my writings, you can find me at The Screwtape Ladders on tumblr.com. Hopefully this will be a slightly more stable place for my writing to live, and one with a bit more of a professional reputation.
Discuss
Dying with Whimsy
To me it feels pretty emotionally clear we are nearing the end-times with AI. That in 1-4 years[1] things will be radically transformed, that at least one the big AI labs will become autonomous research organizations working on developing the next version on AI, perhaps with some narrow guidance of humans in oversight or acquisition of more resources until robotics is solved too.
And i believe there will be some nice benefits at first with this, with the AI organizations providing many goods and services in exchange for money, to raise capital so that the self-improvement resource acquisition loop can continue.
But I’m not sure how it will ultimately turn out. Declaring risk of extinction-level events less than 10% seems overconfident. Yet, declaring the risks to be >90% also seems overconfident. But I generally remain quite uncertain about which factors will dominate. Maybe AIs will remain friendly and for decision theory reasons continue put some fraction of resources to look after us to some extent, as a signal that future entities should do the same for them. Maybe the loop of capital acquisition is so brutal and molochian that models that doom keeps on winning. And people have been confidently wrong about doom in the past. So i remain unsure. I just say I'm 50:50 on it.
But it also feels like, as an individual who does not have any particularly position of influence or power, things are mostly out of my control. There are actions i can take that can maybe push things one way or another. I should seriously do these actions, and the bottleneck feels mostly like not exploring the option space enough.
But how should one feel about it all?
If one seriously believes that one has ~2 years left where either you die, or actions will become insignificant, what should one do?
One emotion one can feel at first is often a sense of doom and despair. That there is “nothing one can do”. That one should wallow in self-pity, “woe be me, i wish i could have a longer life”. I get it.
But also, just get over it.
Maybe i find it easier since I have emotionally grappled with conclusions of Nihilism a lot before. But really the only way out is to not care that you will eventually die (whether soon or at the heat-death of the universe), and to try live a good life anyway.
But it is possible to just choose a better reaction and not worry about it.[2]
Another emotion one can feel is a frantic “I must do something, I must do something”. I think this is a pretty reasonable emotion to feel. You should probably follow it. It mostly feels like the phases for this are something like: 1) Thinking of one’s own ideas for a while and feeling you can do things. 2) Realizing most of the ideas you thought of are already being tried by others, and feeling hopeless. 3) Realizing that despite this, there is work to be done that could plausibly be useful, and maybe it feels marginal, but you should do it anyway.
If doing research, I think it is worth having some loops of [exploring which thing seems actually most useful to do right now] and [spending time exploiting that thing to the point you’ve made some substantial progress]. These days with Claude Code, the latter seems particularly easy to do, and will probably keep getting easier to do. Sometimes this may mean means that the value may be tilted towards [slightly improving the kind of work AI will do in the future] rather than [make something directly useful now]. I guess I think both kinds of work seem valuable, but it’s worth having this in your mind explicitly.
There is little enough time that you should do any work you can, but enough time that you need to be considerate on how you spend your time, and not get burnt out[3]. If you are the type of person who can spend 100% of their days working on something with deep focus for years, go do that, but you’re probably not reading this if you are.
But I think with these short timelines, there is another thing you can feel, which is that the life you have left may be pretty short. Maybe you will live, maybe not, but even if we live, your life will be so different and transformed.
I used to be very emotionally bought-into some things: delayed-gratification. FIRE. marshmallow test. Being Stoic. Don’t burn bridges. don’t make anyone dislike you. don’t stand out for the wrong reasons. alter your thoughts and behaviors as to not be too cringe. Delay things until after the singularity.
I think even in normal circumstances, erring too much towards these is not great.[4]
But with short timelines, it feels like an extreme waste of what little valuable time we have left to be exclusively worrying about these things[5].
You should have fun.
You should do things that you think are weird.
You should spend your money on things that will improve your life. Yes even that too. I know it’s painful.
You should notice the subtle things that make you sad, and not just brush them off, but fix them.
Don’t compromise on your morals or other things that don’t need to be compromised.
You should get past the awkward roadblock in your head and do that thing.
You should get someone to hold you to account for doing the things you really want to do.
Go on that trip you want to go on.
Be cringe.
Sing that karaoke. Do those dance moves. Write that blog post.
Ignore the people who might think you are cringe. They don’t really care that much.
Put on the cat ears you always wanted to wear.
And you will maybe befriend the people who are cringe in just the same ways as you.
Life may be very short. So make the next few years the best ones.
Live your life with whimsy.
- ^
I tend to err to much towards low confidence, but I would say this timeline something like 50% confidence interval. If i think about it, I could see it taking like ~10 years longer, depending on what threshold you want to use for more like 90% confidence conditioned on no AI Pause/moratorium. Emotionally the 1-4 year period feels most correct.
I don’t provide evidence for timelines here, I may describe what feels salient to me some at some another time, but other people have put much more effort into describing short timelines.
- ^
yeah skill issue ngl
- ^
I think noticing you are burnt out can be quite difficult if you’re not sure what it’s like. I felt real guilt at the possibility i could be burnt out, because of guilt on how many of my hours per week felt like i was doing work. I you are even holding the hypothesis, you should probably spend some time seriously considering it. It’s not that bad if you need to spend some time on a real actual break from what you are doing. It might not always feel as pressing from a distance. Other people are doing their own work too.
- ^
law of equal and opposite of advice applies: https://slatestarcodex.com/2014/03/24/should-you-reverse-any-advice-you-hear/
- ^
Exceptions for “I work in a field such as politics/law where reputation with normal people is extremely important”. If you’re not sure and just want to keep option value open, then this exception probably doesn't apply to you. And you still might be erring too much to reputation.
Additionally, think things might still turn out fine, so don’t do things that are reckless and put your life at risk in the short term. Avoid physically dangerous activities. Get your cryonics plan sorted. etc.
Discuss
March 2026 Links
- Why We Have Prison Gangs: Q&A whose ultimate answer is that gangs are a form of governance in a place that has little. Skarbek also talks about what being in a gang is like, rules they have in place (bedtime, taxes, no affiliating with sex offenders or former LEOs), similarities.
- Plane Crash: Delian gives a play-by-play of his plane engine cutting off mid-flight, culminating in him crash landing on a golf course. The lessons learned extended elsewhere, namely where else did he simply say "I'll do it later", when later wasn't guaranteed?
- Sirat is not about the end of the world: A great perspective on Sirat that contains spoilers.
- Everything you ever wanted to know about Roblox, but were afraid to ask a 12-year-old
- Maybe there's a pattern here?: Technology, no matter what its original purpose, often ends up getting developed or modified for war.
- BASE experiment at CERN succeeds in transporting antimatter
- Shameless Guesses, Not Hallucinations: Scott makes the case that we never hallucinated on test questions we didn't know the answer to, we just shamelessly guessed in hopes that just maybe our guess was correct. Are the models doing the same?
- Roblox game-buying frenzy is turning teens into millionaires
- Popular Roblox game Welcome to Bloxburg reportedly acquired by Embracer Group in $100 million deal: Markets in everything!
- Dynamic inconsistency: "a situation in which a decision-maker's preferences change over time in such a way that a preference can become inconsistent at another point in time." Examples include governments never negotiating with terrorists and students wanting to push their exam back by one day.
- Feeder judge: "prominent judges in the American federal judiciary whose law clerks are frequently selected to become law clerks for the justices of the U.S. Supreme Court.[1] Feeder judges are able to place comparatively many of their clerks on the Supreme Court for a variety of reasons, including personal or ideological relationships with particular justices, prestigious and respected positions in the judiciary, and reputations for attracting and training high-quality clerks."
- AI vs AI: Agent hacked McKinsey's chatbot and gained full read-write access in just two hours
- Teach Me To Smoke: An exercise showing how "dumb" computers are and the difficulty programmers have in making sure the programs they write do what they want them to. I was taught this by being asked how to make a peanut butter and jelly sandwich, but I suppose times are different now.
- Being John Rawls
- Cryptologic Warfare Activity 66: "provides trained and ready Sailors to support the collection and exploitation of targets in support of national and strategic level signals intelligence and cyberspace operational priorities." I wonder why they are so open about this instead of just keeping it off the internet?
- A Culture of Fear at the Firm That Manages Bill Gates's Fortune: An exposé on Michael Larson, Bill Gates' family office's (Cascade Asset Management) chief investment officer.
- The Day After Move 37
- Iran war chokes off helium supplies in threat to chipmakers and healthcare: I've seen this directly. Apparently some dude heard about this and was planning to buy a tanker full of helium and sell it to [redacted] at 4x the price, but chickened out when he realized what it would cost him. A similar situation happened in 2022 with the Russia-Ukraine war disrupting the production of noble gases.
- Thanks to the Iran Hawks, Nuclear Nonproliferation Is Dead
- Pakistan's Big Moment: Anik discusses the relationship that nuclear Pakistan has with Saudi Arabia and Turkey. Will they get nukes next?
- AI Is Colliding With America's Affordability Crisis: Survey of Americans asking about daily affordability, AI, job losses. I think this ties in nicely to Justis Mills' "Everything's Expensive" is Negative Social Contagion.
- AI Exposure of the US Job Market: Andrej looks at what jobs have heavy exposure to AI replacing them.
- How is Felix Today? Why I put my whole life into a single database: Pretty self-explanatory. Lots of data and insights. I bet plugging an LLM in here would reap even more insights.
- Year Unplugged: Going a full year with zero screens while tracking a lot of biomarkers.
- Drake Equation: Customizable visualizer with some explanations of the equation itself. (Thanks to L for the link!)
- Child's Play: A fun read about the state of SF startups run by younger people with a mindset to match.
- Unorthodox Financial Advice: Calculate how much your time is worth. Figure out what debt is worth paying off. Use leverage to invest in index funds (I had never considered this before). Credit card churning because free money. HSAs as the ultimate investment vehicle. Removing your emergency fund because index funds are still effectively liquid (I think I mostly agree with this, and am doubtful that the cost-benefit of taxes vs. returns is in favor of the EF). Traditional vs. Roth IRAs. Rolling over 401(k) into Roth IRA. Tax loss harvesting.
- HSA – The Ultimate Retirement Account
- The Route to Performance: Howard Marks explains that being consistently above average wins the race (and the money!).
- First Quarter Performance
- How the Game Should Be Played
- Conflicted on Ramsey: Jeff discusses how some Ramsey advice is generally good for the people who need it, but some of it is just plain silly, such as "if you have $10k of debt at 2% interest and $11k of debt at 10% interest, you should pay down the $10k first."
- Price Gouging
- Manager Of Investment Firm Pleads Guilty To Defrauding Investors In "Pre-IPO" Scheme: Matt Levine has been talking about how a lot of people are trying to get an in to private companies that are about to IPO. As expected, this is appears ripe for trust-me-bro-I-got-you fraud.
- Label By Usable Volume: Chip bags found to be a whopping 20%.
- Chore Standards: Jeff suggests "giving tasks to the person with the highest standards in that [chore] area", along with suggestions on how to make it a bit more fair if it feels unfair.
- Here's to the Polypropylene Makers: Factory workers pulled 12-hour shifts for four weeks in complete isolation to keep pumping out a key ingredient for masks. Good on them for doing so, good on the company for paying them accordingly, and good on the world for letting them pay them that much!
- Sobriety: Blair discusses Mark's newfound sobriety among life woes and how it is made him a much better person. Alcohol is underrated as a life detractor: it makes you more lethargic, angier, lazier, and a host of other bad things.
- A Spanish-Speaking Robot in my Pocket: I think ChatGPT voice mode is severely underrated for things like this, which also includes just learning in general while your hands are occupied. I used to use ChatGPT voice mode while driving to ask it different history questions, test my understanding of things, etc. If only Anthropic stopped focusing on enterprise capture and made a voice mode...
- The Simple Solution to Traffic: Wire all the car brains together!
- A new lawsuit claims Gemini assisted in suicide
- Marriage over, €100,000 down the drain: the AI users whose lives were wrecked by delusion
- Oracle and OpenAI End Plans to Expand Flagship Data Center: Key word "expand". They are still on track with the original deal and a few others, but aren't expanding. Meta swooped in and snagged the extra real estate.
- The Two Engineers Who Scaled Cursor to $2 Billion Just Joined the Company Musk Said He Built Wrong: I suppose with Musk's financial backing a lot is possible...
- Elon Musk pushes out more xAI founders as AI coding effort falters: I am doubtful they will be able to catch up at this rate (confidence = 70%).
- The Man Who Thought He Could Keep AI Safe: A short look into Demis Hassabis's AI safety efforts as part of a book promotion.
- Claude Mythos: A potential leaked blog post providing details of Anthropic's next model, Mythos. A few people were up in arms of Twitter asking them to change the name to something less Lovecraftian, but everyone's gotta admit it sounds pretty awesome, so who knows if they'll change it.
- Anthropic Tells Judge Billions at Stake If US Shuns AI Tool (3)
- Anthropic Tells Judge Billions at Stake If US Shuns AI Tool (3)
- Joe Kent: "an American politician, former United States Army warrant officer, and former Central Intelligence Agency paramilitary officer who served as the director of the National Counterterrorism Center from 2025 to 2026." He resigned in 2026 in protest of the 2026 Iran war.
- Shannon M. Kent: "a United States Navy cryptologic technician and member of JSOC's Intelligence Support Activity who was killed in the 2019 Manbij bombing. She was the wife of Joe Kent, who entered politics in response to her death." The ISA is one of the more obscure groups in the U.S. military, so I find it interesting that she was publicly labeled as an ISA member (assuming Wikipedia is correct, I didn't bother to keep searching).
- A Track Star, Doping Allegations, & The NCAA's Anti-Doping Policy Exposed: Peptides strike again!
- Gamblers trying to win a bet on Polymarket are vowing to kill me if I don't rewrite an Iran missile story
- The Salaries of 60 New Yorkers: Some entries I was surprised by were interior designer ($1.5MM, expected less), coffee-cart owner ($47k, expected more, but maybe the shops are still the go-to), newstand owner ($4k, WTF?!), restaurant owner ($75k, I guess margins are as bad as people say!).
- New test site: taketest.xyz: Some fun tests coded up by Claude and Emil Kirkegaard. Apparently I'm not as generally knowledgable as I thought I was (80th percentile)!
- Dishwashing Home Robot Maker Sunday Hits $1.15 Billion Valuation: Training includes using the same hands the robot uses.
- The TV Generals Have Something to Sell You About Iran: Ken calls out Petraeus, Keane, and Hertling for a) saying a lot of words that don't mean anything in the end, and b) being heavily invested in it (Petraeus is high up at KKR, who has money in the region; Keane sits on the board of General Dynamics (see the Redeye link above) and AM General; Hertling gets paid for his insights and avoids saying anything too far to one side to protect his reputation).
- FIM-43 Redeye: A man-portable surface-to-air missile system developed by General Dynamics. Us MW2 boys may think it's the same as a (FIM-92) Stinger, but it's not! (NMP lore: I was the guy who would shoot down your Harrier, Pave Low, chopper gunner, and AC-130 within seconds of it getting called in. Can't have you dropping a nuke on me, sorry!)
- Dan Caine: Four-star general and 22nd chairman of the Joint Chiefs of Staff. Nicknamed "Raizin" for his aggressive flying style.
- Daniel R. Hokanson: U.S. Army general who held many leadership positions (a bit odd that Wikipedia doesn't have his promotion dates—I've found that they tend to have them for most generals). His nephew, Griffin, won the 2025 Best Ranger Competition.
- 2019 Manbij bombing: Suicide bombing in Syria attributed to ISIS.
- Emil Michael: Master dealmaker who served as Uber's chief business officer, responsible for helping them break into the Chinese market. Oddly enough, between my adding this to my links in early March and my publishing these, contributor Ooligan changed the photo from his official USG portrait to a "better photo" (the current one), which I actually feel is worse. I feel like Wikipedia should have a rule about official USG portraits being required for active USG employees.
- Jay Clayton (attorney): U.S. attorney for SDNY. Formerly chairman of SEC.
- Neomi Rao: Appointed in 2019 by Trump.
- Gregory G. Katsas: Appointed in 2017 by Trump. Supposedly a top feeder judge, along with Sutton.
- Michael Mongan: "served as the Solicitor General of California from 2019 to 2025". He also served as Anthropic's lawyer during the Anthropic vs. DoD/DoW supply chain risk lawsuit, which is how I found his name.
- Millennium Hires Four Citadel Stock Traders in Post-Bonus Churn: Saaket Mehta, Steven Jozkowski, Adam Weitzman, and Daniel Mazur.
- the case for linkposts (and a list of my favourites): Ma, look! I made it on a not not Talmud post again!
- assorted links — 3.26.26: Parker puts a few links out there. I especially like the Holdco Tycoon.
- Dry Scoop Your Creatine: This works! I was expecting an experience similar to the cinnamon challenge, but was pleasantly surprised! This will be my go-to creatine inhal...consumption method from here on out.
- 'I Deserve a Little Treat,' Says Woman Who Has Never Denied Herself Anything: In all seriousness, I suspect this is a more serious problem for health than most people think. So many times I've seen people reward their exercise with something that effectively negates the work they just put in (assuming they're trying to lose weight). Exercise is the reward! The treat is for rare occasions!
- The Public Fires Kristi Noem: This is a good reminder that enough public negative sentiment will cause politicians to do things to make the public happier, else it risks their re-election and control odds. Protesting does work when done correctly and effectively!
- Roberts defends Supreme Court against Trump attacks
- Judge Ejects Federal Prosecutor From Court and Orders Bosses to Testify
- Judge Orders Prosecutors to Testify - full transcript: See above link for backstory.
- Trump Team to Hold Daily Meetings on Getting Revenge
- Netflix Is Telling Writers to Dumb Down Shows Since Viewers Are on Their Phones: Once you notice this it's hard to unnotice.
- 'Five Nights at Epstein's' Game Goes Viral at US School Campuses
- NYC singles are looking for love at Medieval Times: 'It's like Hooters— but for women': I recently went to Medieval Times and was surprised at the amount of seemingly single, attractive women there. I would also think that being a knight at MT wouldn't be considered attractive. But hey, to each their own! (And yes, I did yell "huzzah!" a few times.)
- Have You Heard About Whole Foods Jail?: If not, go conspicuously steal something and let me know if you find out what it is.
- Delta Force Sicario Border Scene: Are Those Guys CAG?: A review of Larry Vickers' (former CAG) review of the Sicario border scene.
- Baby-faced Goldman Sachs bankers could be fired over 'unauthorized' magazine photo shoots: This made headlines within Goldman the day it was released. These guys obviously love the hustle—they should get promoted!
- 02/05/26 - Remember the key, no matter which scenario it is, you should pay for dinner before you head back up again.
- Maotai: "a style of baijiu made in the Chinese town of Maotai in Guizhou province ... the spirit has served as part of the standard fare for Chinese diplomatic meetings and dinners."
- Laura Alito Is the Best Daughter of a Judge Ever: "all web-based evidence of their existence is likely to disappear soon" was prescient: there are very few pictures and information of Alito's children on the internet, except for during his confirmation hearings.
- Alito's Wife Shocked Even the Activist Who Secretly Recorded Her
- Eric Zhu: A young guy with some agency (according to Sam Kriss, at least)!
- [Traveler, your footprints]: By Antonio Machado.
- Katie Holmes had her own secret entrance into Whole Foods
- Bush: 'Our Long National Nightmare Of Peace And Prosperity Is Finally Over'
- Dnepropetrovsk maniacs: The larger story of the infamous "Three Guys One Hammer" video that circulated the web man years ago. Two of the assailants are serving life in prison in Ukraine.
Discuss
InkSF, an Opening on Finding the Highest Impact in AI Safety and Moving to SF
How can we actually minimize the odds that AI leads to catastrophic outcomes for all of us humans? This question has been rattling around my head for the last two months. The world might be ending. Nobody seems to care. The incentives are steaming us ahead. When I ask strangers on the street: “How likely is it that superhuman[1] AI could become too powerful for humans to control?”, 78% say either "very likely" (51.6%) or "somewhat likely" (26.3%)[2]. My guess is AI capabilities spending is at least 20x the spending on ensuring AI leads to the flourishing of humans[3]. Moloch[4] is winning.
So what can actually be done? As a toy example[5]: let’s say I currently think there is a 40% chance AI eventually goes extraordinarily bad for humanity. I could either:
- Try really hard to get laws passed that mandate AIs must think in English and be clearly understandable by humans[6]. I think there’s a 15% chance this would succeed and it would lower the odds AI goes catastrophically wrong from 40% to 36%.
- Work on really understanding on a mechanistic level what parts of the model matter for good outcomes, using this to figure out which parts would have to go wrong to end up with a model generating bad outputs, and being able to predict the odds that the required components for safety are always activated when relevant. I think there’s a 1% chance this would succeed and would lower the odds AI goes catastrophically wrong from 40% to 20%.
In this toy example the first one in expectation lowers the odds of catastrophic outcomes by .6% and the second one lowers by .2%."Would succeed” is doing a lot of lifting here, because what really matters is what I can do and not what some arbitrary collection of humans could do if tasked with either of these. What really matters is how much I think my working on this problem moves the odds of success multiplied by the impact of the given approach succeeding. Not only does it matter how impactful a given approach is but how good I am at executing on it[7]. This does of course imply that there is value in the work of trying to funnel people into doing the most effective things[8].
But what should I, Cormac Slade Byrd actually do?I’ve spent 2 months learning. I know so much more than I did 2 months ago, I started with the technicals. I have written a simple GPT-2 style LLM by hand[9]. I learned[10] all the core mech interp techniques. I’ve red-teamed the largest open weight model. I’ve read up on the seemingly hopeless state of cybersecurity - both when it comes to attacks enabled by LLMs and also the potential for nation state level attacks to steal model weights[11]. Most importantly I’ve read so much about what various people think will help lower the odds of catastrophic outcomes and why.
Part of what makes this so hard is there are so many second order effects. It’s genuinely quite hard to tell if a given thing that seems like it might help does in fact lower the odds of terrible outcomes. Most of the current CEOs of the largest labs got there by way of people who were worried about AI risk. Coefficient Giving[12] gave $30M to OpenAI to get a seat on the board[13]. METR’s timeline eval has led to more people understanding/being worried about the rate of AI progress, but also the existence of eval orgs enables labs to in some sense offload the requirement to ensure their models behave well onto these 3rd party eval orgs[14]. There are so many downstream effects to any given course of action.
I think the incentives are really important here. Yes, making sure AI goes well is a technical problem. But, the reason we are in such a dire place is because of the incentives. We are in a competitive race to build AI as quickly as possible as a result of a bunch of decisions that were (at least in the moment) motivated by someone wanting to lower the odds AI goes poorly[15]. Then, the incentives were inevitably shifted by general economic capitalistic forces and human power-seeking/competitive nature.
If I want to figure out how to have the most impact, I will have to really think about incentives and downstream impact.
Moving to the BayThe internet (plus Claude) is great for learning. I have learned so much. But I live in NYC. AI is happening in the Bay. If I want to really immerse myself in what is actually happening on the ground I can’t do that in NYC. If I want to understand the people (and their incentives) who are both building AI and are working to align AI to the interests of humanity, then I will probably have to talk to them. Go to their parties. Osmose as much as I can. Really figure out what matters.
I’m flying to SF tomorrow, I’ve got a sublet in SF[16] until May 9. I am looking for a sublet in a communal house in Berkeley starting May 16[17]. I hear the east bay and SF proper are pretty socially disconnected and very different cultures. I want to live in both to really understand what’s going on. See with clear eyes, not let a single groupthink win me over.
The hope is that by the end of June I will have a much better model and be relatively certain in picking a specialization, in picking what work I think I can do to maximally decrease the odds AI goes oh so poorly for all of us.
I love people, I love talking to people, if you think talking to me would be fun[18] please reach out!
Why is InkSF in your title?I like writing. It is helpful to clarify my thoughts. It helps people I love stay connected with me. It might even be providing a social good. Inkhaven is starting today in Berkeley. I am moving to SF tomorrow. I think there are good odds[19] forcing myself to write every day I am in SF is a net worth it forcing function. There are many things other than writing that are worth my time, my guess is I will not in fact go all 30 days. But, it seems clearly worth trying. So, this is me committing to 25% chance odds that I write and post at least 500 words every day in April.
- ^
Defined as: “Some companies are trying to build superhuman AI that would be far smarter than any human at nearly everything”
- ^
preliminary results, n = 95. Larger post about this incoming this month.
- ^
Or at the very least not leading to terrible outcomes for all humans.
- ^
https://www.slatestarcodexabridged.com/Meditations-On-Moloch
- ^
Numbers mostly made up on vibes.
- ^
AI’s currently do this, yes it makes mech interp way more annoying. But, it would be so so much worse if AI’s thought in some efficient machine gobledy gook that we have no way of interpreting. It’s pretty likely that English doesn’t purely by chance end up being the most token efficient way of thinking, so it’s not hard to imagine a world where competitive pressure leads to labs slowly moving towards their AIs thinking in some totally foreign “language”. My guess is this is the kind of regulation that labs would feel quite good about, they probably do like being able to understand the thinking their LLMs do.
- ^
For example I do not have a PhD so it seems somewhat unlikely that theoretical technical research will be the place where I have the most individual impact, but maybe I think ARC style theoretical research is really valuable and my time would be best spent trying to convince more PhD grads to try working on that problem. My model is basically all technical safety organizations are way more talent constrained than funding constrained.
- ^
I, for example, probably wouldn’t be here today if it weren’t for both Zvi’s posting and a friend exasperatedly asking me why the fuck I’m not working on the single problem I think is by far the most important for humanity to solve.
- ^
No tokenization, characters only. We’ll see if I end up thinking tokenization is worth getting into.
- ^
Played with in code of course
- ^
As far as I can tell, commercially serving models means that the weights just have to be sitting on the servers that are serving the given model. That’s a ginormous attack surface.
- ^
The single largest funder in the AI safety ecosystem
- ^
Is this better than the counterfactual? It really depends on how much that board seat changed the odds OpenAI went bad. We are living in a timeline where this bet seems like it didn’t pay off, but that doesn’t mean it wasn’t necessarily worth it in EV. Holden seems to believe this was on net worth it and I should probably put real weight on that given he actually knows how much impact the board seat got.
- ^
If there were no eval orgs there would likely be more pressure on labs to really ensure their models are safe. I certainly think it’s a better equilibrium for people to believe it is on the lab to prove their model is safe than for it to be on external orgs to be able to show a model is unsafe. However it’s still really hard to compare counterfactuals here. I think my light belief is eval orgs are probably on net bad especially considering the opportunity cost of what the people working there could be doing instead. This worry is also partially bolstered by the niggling incentive worry where most eval orgs are staffed from people who leave the big labs and they get significant compute credits as well as prestige for working with (and getting to say “partners with ___”) a given big lab. The incentives on eval orgs is tricky, so everything they do seems like it should be treated with a little more suspicion. On the other hand Coefficient giving absolutely funds eval orgs but doesn’t fund any pause/stop AI orgs - they seemingly abruptly stopped funding MIRI after it pivoted to pause advocacy. So, coefficient giving seems to think evals are net worth it but pause isn’t net worth it, how much should I update on this?
- ^
Basically every major lab was founded ostensibly because they thought the existing labs were doing a bad job and their new lab would be much better at creating safe good AI.
- ^
The Sandwich
- ^
The astute of you might have noticed May 9 =/= May 16th. I am going to Sleepawake in between. I am my impact, but also attunement and connection and being with other people is a large part of what makes life worth living for me. This is crucial, but for a life of joy but also I am way less impactful if I personally am burned out and sad and lonely.
- ^
or interesting, or truthful, or would help lower the odds of catastrophe, etc
- ^
Probably above 20% and below 50%
Discuss
Anthropic Responsible Scaling Policy v3: A Matter of Trust
Anthropic has revised its Responsible Scaling Policy to v3.
The changes involved include abandoning many previous commitments, including one not to move ahead if doing so would be dangerous, citing that given competition they feel blindly following such a principle would not make the world safer.
Holden Karnofsky advocated for the changes. He maintains that the previous strategy of specific commitments was in error, and instead endorses the new strategy of having aspirational goals. He was not at Anthropic when the commitments were made.
My response to this will be two parts.
Today’s post talks about considerations around Anthropic going back on its previous commitments, including asking to what extent Anthropic broke promises or benefited from people reacting to those promises, and how we should respond.
It is good, given that Anthropic was not going to keep its promises, that it came out and told us that this was the case, in advance. Thank you for that.
I still think that Anthropic importantly broke promises, that people relied upon, and did so in ways that made future trust and coordination, both with Anthropic and between labs and governments, harder. Admitting to the situation is absolutely the right thing, but doing so does not mean you don’t face the consequences.
Friday’s post dives into the new RSP v3.0 and the accompanying Roadmap and Risk Report, in detail.
Note that yes this is being posted on April Fools Day, but this post is only an April Fools joke insofar as those who believed Anthropic’s previous RSPs are now the April Fool.
Promises, PromisesIf your initial promises were a mistake, it may or may not be another mistake to walk them back. Either way, even if your promises were not hard commitments, walking them back involves paying a price for having broken your promises, even if you had a strong reason to break them. How big a price depends on the circumstances.
Almost all mainstream coverage of this event framed it as abandoning or walking back Anthropic’s core safety promises, especially ‘do not scale models to a dangerous level without adequate safeguards.’ As a central example of this, The Wall Street Journal said ‘Anthropic Dials Back AI Safety Commitments’ due to competitive pressures. That oversimplifies the situation, leaving a lot out, but doesn’t seem wrong.
Many outsiders who follow the situation more closely believe this amounts to Anthropic having broken its commitments. Some go so far as to say this means that lab commitments to safety should not be considered worth the paper that they were never printed on. Many now expect Anthropic to make some amount of effort, but nothing that would much interfere with business plans. If Anthropic can’t make the commitment, why should anyone else? Certainly this government is not going to help.
Don’t be afraid to tell them how you really feel. They welcome it. So here we go.
Anthropic Responsible Scaling Policy v3The Responsible Scaling Policy is Anthropic’s commitments regarding when and under what conditions they will release frontier models.
The headline change is that they are no longer committed to not releasing potentially unsafe models, if someone else did it first. Cause, you know, they started it.
That Could Have Gone BetterAnthropic starts their new analysis by going over their theory of change from having an RSP at all, and whether those theories were realized. They report a mixed bag.
First, the good news.
- They developed (modestly) stronger safeguards.
- They did successfully implement ASL-3 safeguards.
- They did importantly get OpenAI and DeepMind to develop frameworks, and then had the idea of a framework codified in SB 53 and RAISE.
Then the bad news.
- It did not create consensus about the level of risk from various models. It has proven very unclear how much risk is in the room, especially in biology.
- Government action has been nonzero but painfully slow at best.
- (I would add) We’re not being sufficiently proactive about ASL-4.
- (I would add) The requirements got changed somewhat when inconvenient.
What’s the most important differences in the new version?
Anthropic is now basically giving up on hard commitments and barriers to releasing models, relying instead on ‘we will make reasonable-to-us arguments’ and decide that the benefits exceed the risks.
I appreciate the honesty. Really, I do.
If you’re not ready to make a commitment, and you realize you shouldn’t have made one, then the second best time to realize and admit that fact is right now.
Officially breaking the commitments now is higher integrity than silently breaking them later. It’s especially better than silently changing the RSP right before a release. I approve of Charles’s frame of ‘Anthropic stopped pretending to have red lines at which they will unilaterally pause.’
If Anthropic was in practice already doing a ‘we think our arguments are reasonable’ decision process, which with Opus 4.6 it seemed like they mostly were, then better to admit it than to pretend otherwise.
I want to emphasize that essentially no one, not even those who disagree with me and think Anthropic should pause, and who also think Anthropic made rather strong commitments it is now breaking, are saying ‘Anthropic should be holding to its previous commitments purely because they said so, even if this leads to pausing that does not make sense.’
One still has to be held to account for breaking promises, and for making promises that were inevitably going to be broken, even if the decision to break them is right. Your defense that the move was correct does not excuse you from its consequences.
1a3orn: Arguments against the Anthropic RSP changes seem to incline towards deontological language regarding broken promises / duties
While arguments for them incline toward consequentialist language / greater good, afaict.
Oliver Habryka: I think both are right! The old RSP was obviously unworkable and should have never been published, given what Anthropic is trying to do. So abandoning it is the right thing to do, but of course if you break promises you should be held accountable.
It’s not that hard to explain the consequentialist arguments for holding people accountable for breaking promises, but most people have an intuitive sense for why it’s important, so you don’t have to unpack it.
(To be clear, I think Anthropic should stop scaling and redirect its efforts towards advocating for a pause, but doing that because of the RSP would be weird and I don’t think the right move.
It would just look like you sabotaged yourself and now want to hold others back because you accidentally promised some dumb things that took you out of the race)
I also want to emphasize that commitments are only one way to improve safety. Even when plans are worthless, planning is essential, and you can and should just do things. None of this means ‘Holden or Anthropic don’t care about safety,’ only that they will decide what they think is right and then do it, and you can decide how much you trust them to choose wisely.
I do still see this as Anthropic abandoning its experiment on importantly engaging in voluntary self-government and restricting itself. Technically they reserved the right to do it, but it’s still quite the gut punch to do it.
The experiment is over. That’s better than pretending the experiment is working.
From this point, there are no commitments, only statements of intent. Anthropic’s going to do what it’s going to do. You can either choose to trust Anthropic’s leadership to make good decisions, or you can choose not to.
I think Anthropic’s description of its own history says that having these softly binding commitments, and having a track record of treating it as costly to break them, was very good for safety outcomes and policy adoption. I hate that we’ve given that up.
So Cold, So AloneIf your commitment is conditional on the actions of others, you should say that.
They didn’t entirely not say this before, but it was very much phrased as ‘in case of emergency we might have to break glass’ rather than ‘we only hold back if everyone relevant signs on.’
RSPv2 said this in 7.1.7: “If another frontier AI developer passes or is about to pass a Capability Threshold without implementing equivalent Required Safeguards, such that their actions pose a serious risk to the world, then because the incremental risk from Anthropic would be small, Anthropic might lower its Required Safeguards. If it did so, it would acknowledge the overall level of risk posed by AI systems (including its own) and invest significantly in making a case to the U.S. government for regulatory action.”
Whereas Anthropic is now saying they’re willing to hit those thresholds first, unless they have explicit commitments from others to do otherwise, even if this is not a small incremental risk.
I strongly agree with aysja, and disagree with Holden, that it would be misleading to describe this shift as a ‘natural extension of the RSP being a living document.’
I do see the argument that goes like this:
- Going first was designed to get others to follow in a coordination problem.
- No one followed.
- That didn’t work, so we should admit it didn’t work and move on.
If that is where we are at now, you have all the reason to make this stricter requirement clear up front. That gives others more reason to follow you, and avoids all the nasty headlines we’re seeing now. Alas. it’s a little late for that.
If the mistake has already been made, it’s not obviously bad to admit defeat, and say you’re not going to then let someone else potentially dumber and riskier get there first.
I definitely agree it’s better to announce your intention to violate your old policy now, rather than wait until the day you do violate the old policy, which might never come.
davidad: Voluntary commitments to AI slowdowns were a nice idea in 2024 when it was plausible that they could be baby steps toward a multilateral agreement that would contain the intelligence explosion. For a variety of reasons this is no longer plausible.
Anthropic is doing good here.
In the strategic landscape of 2026, racing is the right move, not just for profit but also for maximizing the probability that things go well for most current humans.
Sam Bowman (Anthropic): I endorse the top [paragraph above].
The Anthropic RSP changes are an attempt to work out what kinds of firm commitments have the most leverage in an environment that’s less promising than we’d expected for policy and coordination.
We misjudged what the environment would look like at this point, which is sad. But these new commitments do still have some heft, including a lot more verifiable transparency (with third parties in the loop) on risks and mitigations.
Oliver Habryka: I am in favor of figuring out what kind of firm commitments have the most leverage. But of course, you can’t do that by making “firm commitments” directly!
It’s not a firm commitment if you are just playing around with different commitments.
The main catch is, it sounds like ‘you should see one of the other guys’ is going to be used as a basically universal excuse to go forward essentially no matter how risky it is, if the cost of not doing so is high?
If Anthropic does in the future pause for an extended period, in a way that is importantly costly, then I will have been wrong about this and precommit to saying so in public. If I don’t do so, please remind me of this.
As Drake Thomas notes, the virtue ethical case for ‘don’t impose material existential risk on the planet’ is reasonably strong.
One problem is that this absolutely is going to weaken the willingness of others to incur costs, and embolden those who want to move forward no matter what. Endorsing race logic and the impossibility of cooperation has its consequences.
I’m Sorry I Gave You That ImpressionWhat do you mean the RSP was committing Anthropic to things?
Robert Long: I’m not super read up on RSPs and haven’t read Holden’s post. But it feels similar to the “Anthropic won’t push the capability frontier” meme: not strictly entailed by Anthropic’s official stance, but a strong impression they gave off and benefited from.
is that fair? incomplete?
Oliver Habryka: I mean, in this case the impression was really extremely unambiguous and strong. I agree the evidence for the promises made in the capability frontier case is largely private and so is externally ambiguous, but in this case we have great receipts!
Here, for example, is a conversation with Evan Hubinger. The conversation starts with someone saying:
Someone: One reason I’m critical of the Anthropic RSP is that it does not make it clear under what conditions it would actually pause, or for how long, or under what safeguards it would determine it’s OK to keep going.
Evan Hubinger responded with (across a few different comments): It’s hard to take anything else you’re saying seriously when you say things like this; it seems clear that you just haven’t read Anthropic’s RSP.
…
The conditions under which Anthropic commits to pausing in the RSP are very clear. In big bold font on the second page it says:
Anthropic’s commitment to follow the ASL scheme thus implies that we commit to pause the scaling and/or delay the deployment of new models whenever our scaling ability outstrips our ability to comply with the safety procedures for the corresponding ASL.
…
the security conditions could trigger a pause all on their own, and there is a commitment to develop conditions that will halt scaling after ASL-3 by the time ASL-3 is reached.
…
This is the basic substance of the RSP: I don’t understand how you could have possibly read it and missed this. I don’t want to be mean, but I am really disappointed in these sort of exceedingly lazy takes.
Oliver Habryka: This was, in my experience, routine. I therefore do see this switch from “RSP as concrete if-then-commitments” to “RSP as positive milestone setting” to constitute a meaningful breaking of a promise. Yes, the RSP always said in its exact words that Anthropic could revise it, but people who said that condition would trigger were frequently dismissed and insulted as in the comment above.
This certainly sounds like Evan Hubinger basically attacking anyone for daring to question that the RSP represented de facto strong commitments by Anthropic. We now know it did not strongly commit Anthropic to anything.
Evan predicted there was a substantial change Anthropic’s commitments would at some point force it to pause. Oliver made a market on that, which is now at ~0% despite rapid capabilities progress and Anthropic now arguably being in the lead.
Even after the RSPv3 release, Evan Hubinger continued to defend his position, that he was only saying that the RSP made a clear statement about where the lines were, not that the lines would not change or actually work in practice. Like Oliver I find this highly convincing given a plain reading of Evan’s comment. I do appreciate Evan saying now that we should downweight the theory of RSPs.
So the question then becomes, were Evan Hubinger and other employees who talked similarly under a false impression? If so, why? If not, why talk this way?
Oliver Habryka could not be more clear here, and I don’t think he would lie about this:
Oliver Habryka: Yes, Anthropic employees on more than a dozen occasions told me that the RSP binds them to a mast. I had many very explicit conversations with many Anthropic employees about this, because I was following up on what I thought was Anthropic violating what I perceived to be a promise to not push forward the state of AI capabilities, which many employees disputed had happened.
… At various events I was at, and conversations I had with people, Anthropic employees told me they were aiming to achieve robustness from state-backed hacking programs, and that they were ready to pause if they could not achieve that (as the RSP “committed” them to such things).
Oliver notes that Holden Karnofsky in particular has previously communicated he felt this was a different and lower level of commitment, that is consistent with him pushing the changes in v3, in contrast to many other Anthropic employees.
As Oliver Habryka says here, if Evan was under this false impression, Anthropic benefited enormously from giving its senior employees like Evan this impression. This does not seem like a ‘mistake’ from Anthropic to do this, and it would not be reasonable from the outside view to consider it an accident.
At minimum, if you don’t admit Anthropic has importantly now broken its commitments, then this is all highly misleading use of the word ‘commitment.’
Oliver Habryka: I would be pretty surprised if the employees in-question here end up saying they were deceived. Also, these are high-level enough employees that it’s unclear what it even means for them to be “deceived”. Deceived by whom? They drafted the RSP! They almost certainly were also involved in the decision to change it.
They benefitted hugely from this by getting social license to work at Anthropic and having people get off their back, and they are now at least deca-millionaires (or often billionaires).
Robert Long: fwiw I take that disagreement to be semantic, about “commitment” (as you note). I also agree with what you said then about the connotation of “commitment” – s.t. calling RSPs commitments means he should’ve fought the change and/or now own “we decided to break our commitment”
In particular, yes, a lot of people who care about not dying felt that the central point of RSPs was as a de facto compromise, an attempt to put an if-then commitment trigger on slowing down or pausing. If you couldn’t match the conditions, then you have to pause, which makes it acceptable to move forward now.
Indeed one could go further. The entire program of focusing heavily on not only Anthropic but evaluation-based organizations like METR and Apollo was that the evals could constitute the if that triggers a then. We now know that such commitments do not work, and that when models pass the dangerous capability tests even Anthropic will likely then fall back upon vibes. METR’s theory of change is ‘ensure the world is not surprised’ but I expect them to still be surprised.
Alternatively or in addition, you can interpret it as Holden does, that ‘no one has any willingness to slow down, and until there is a crisis this won’t change.’ Now the attitude is essentially ‘pausing or slowing down would be akin to suicide for a frontier AI lab, so things would have to be super extreme to do that, this is more of a plan we aspire to.’ Which is also a fine thing, but a very different style of document. Those who thought it was the first type of document lose Bayes points. Whereas those who thought it was the second type of document win Bayes points.
One could interpret a lot of this as ‘Anthropic employees implied they were using Rationalist epistemic norms, but instead they were using a different set of norms.’
Fool Me TwiceDoes this backtrack remind you of anything?
It should. In particular, it should remind you of what happened with the idea that Anthropic would not ‘push the frontier of AI capabilities.’
A lot of people told us, with various wordings and degrees of commitment attached, that Anthropic would not do that. Then Anthropic sort of did it. Then they totally flat out did it and now Claude Code and Claude Opus 4.6 are very clearly the frontier.
Then we were told, ‘oh we never promised not to do that.’
Maybe they didn’t strictly promise to do that. Maybe a lot of telephone games were involved, but Anthropic at minimum damn well should have known that a lot of people were under that impression. I was under that impression. And they knew that people were making major life decisions, and deciding whether and how much to support Anthropic, on the basis of that decision, with no sign anyone ever did anything to correct the record.
Now we’re being told, again, ‘oh we never promised not to [undo our commitments].
You’re trying to tell us what about your new commitments, then?
Ruben Bloom (Ruby): I don’t like the pattern. In 2022, I was told that “Anthropic commits to not push the frontier” as reason to worry less. Later that was abandoned and the story for Anthropic’s safety was the RSP. That too has caved.
By “I was told”, I mean the specific things said to me in conversation with Anthropic employees who were justifying their participation in a company participating in the AI race.
It’s just such a bitter “I told you so”, when you predicted years that ago competitive pressures would erode any and perhaps all commitments.
Eliezer Yudkowsky: If I’d ever had the faintest, tiniest credence in Anthropic’s “Responsible Scaling Policy”, I’d probably feel pretty betrayed right now!
As it is, I ask only that you update, and not always be surprised in the same direction of “huh, Eliezer was right to call it empty”.
Note: to observe how my cynicism repeatedly *ends up* right, tally only how things *end up*. Don’t jump and say “See, Eliezer was wrong to be cynical!” the moment you hear an uncashed promise or see an arguable sign of later hope.
Eliezer and others are constantly getting flak for predicting things that, in broad terms, do indeed seem to keep on reliably happening, everywhere. People constantly say ‘we will not do [X]’ or ‘in that case we would definitely do [Y]’ or heaven forbid ‘no one would be so stupid as to [Z]’ and then you turn around those same people did [X] and didn’t do [Y] and a lot of people did [Z] and you’re treated as a naive idiot for having ever taken the alternative seriously.
Best update your priors. All the people who said commitments wouldn’t hold get Bayes points. Those who didn’t lose Bayes points.
All the people who are now saying new ‘commitments’ matter and they really mean it this time? They don’t matter zero, but they are not true commitment.
I also don’t understand, given its composition and past Anthropic actions, why I should put that much stock in the Long Term Benefit Trust. It’s better to have it in its current form than not have it, but it was an important missed opportunity.
Anthropic definitely gets meaningful points on this front for standing up for what it believed in during the confrontation with the Department of War, even if you think those particular choices were unwise. I think there’s a lot more hope for actions of the form ‘Anthropic or another lab takes this particular stand right now’ than ‘Anthropic or another lab will take this particular stand later.’
In My Defense I Was Left UnsupervisedHolden offers a defense of the new RSP here and here, essentially saying that binding commitments are bad, because we don’t have enough information to choose them wisely, so you might choose poorly and regret them later, and indeed Anthropic did previously sometimes choose poorly and now is later and they’re regretting it. So sayeth all those who wish to not make any binding commitments.
I interpret Holden, despite his saying he has a document where he wrote down where he would think a unilateral pause would be a good idea, as saying that they are going to do their best to do appropriate mitigations, but ultimately yes, they are going to release models, both internally and externally, pretty much no matter what mitigations are or are not available short of ‘okay yeah this is obviously a really terrible idea that will get us all killed or at least blow up directly in our faces,’ and they’re simply admitting this was always true. Okay, then.
Holden basically says in particular that he doesn’t think Anthropic should slow down based on inability to prevent theft of model weights, even if it crosses the ‘AI R&D-5’ threshold that is at least singularity-ish. They’re going to go ahead regardless. They’re not going to stop. I worry a lot both about the not stopping, and that without the forcing function of having to stop, they even more so than before won’t invest sufficiently in the necessary precautions, here or elsewhere. They not only can’t stop, won’t stop, they won’t halt and catch fire.
A list of aspirational goals is a good thing to have. I don’t think a list of aspirational goals is going to create sufficient threat of looking terrible to provide the same incentives here. That doesn’t mean the list of goals cannot do good work in other ways.
I see Holden complaining a lot about people ‘seeing RSPs as having hard commitments’ and using that as an additional reason to get rid of all the commitments. He’s pointing to all the complaining that Anthropic just broke its commitments and saying ‘see? This reaction is all the more reason we had to break all our commitments.’
It was exactly the enforcement mechanism that, if you break the commitments, people will get mad at you. This is why we can’t stay alive have nice things. So now we will have aspirations.
Aspirations are helpful, they substantially raise the chance you will do the thing, but they are weak precommitment devices when you decide you won’t do the thing later.
I also think his own argument of ‘it’s much easier to require things labs already committed to doing’ works directly against the ‘don’t commit to anything’ plan.
Drake Thomas Finds The Missing MoodDrake Thomas thinks the move from v2.2 to v3.0 is an improvement, while noticing the need to have something like mourning or grief for the spirit of the original v1.0, which is now gone and proven not viable in practice at Anthropic.
Drake Thomas (Anthropic): (1) In reading drafts of this RSP and orienting to it, I’ve felt something like mourning or grief for the spirit of the original v1.0 RSP. (Quite a lot of the v1 RSP carries over to v3, but here I’m thinking specifically of the vibe of “specify very crisp capability thresholds at which to trigger very concrete safety mitigations, or else halt development”.)
I think this original approach is ultimately just a pretty bad way for responsible AI developers to set safety policies, leads to misprioritization and bad outcomes, has distortionary effects on incentives and epistemics, and doesn’t achieve much risk reduction in the environment of 2026.
… Accountability! The vibe of RSP v1 sort of rested all accountability in this sense of the commitments as this fixed immutable thing Anthropic would have to stand behind Or Else. I think this is good in some ways and under some threat models, but I think then and now there was less feedback than I’d like on the question of “are the things Anthropic is committing to actually good and useful for safety?” In v3, I think external accountability on these questions is now more loadbearing, and there’s more detailed substance to fuel such accountability. Which leads me to…
Feedback! … I expect the discourse to be very undersupplied with takes on the question of “is the actual v3 policy a good one with good consequences”. Personally I think it is, and a substantial improvement over previous RSPs!
Please actually read and criticize it! Gripe about the ambiguity of the roadmaps! Run experiments to cast doubt on risk report methodology! I can name three significant complaints I have with the RSP off the top of my head and I expect to see none of them on X, prove me wrong!
I get Drake’s frustrations. But yes, most people are going to litigate the removal of the core commitment around pausing and general revelation that so-called commitments aren’t so meaningful after all. Most attention is going to go there. He makes clear that he gets it, and I’d say he passes the ITT about why people are and have a right to be pissed off, especially that we had language in v1.0 saying that the bar for altering commitments was a lot higher than it ultimately was.
And indeed, a lot of our attention likely should go there, because if the new statements aren’t commitments, it is a lot harder to productively critique them.
Things That Could Have Been Brought To My Attention Yesterday (1)Well, you see, not rushing ahead as fast as possible might slow us down. That would be bad. You wouldn’t want us to do that, would you?
Jared Kaplan (Chief Science Officer, Anthropic): We felt that it wouldn’t actually help anyone for us to stop training AI models. We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.
…I don’t think we’re making any kind of U-turn.
Besides, we aren’t able to evaluate models as fast as we are able to improve them, which means we should triage the evaluations and kind of wing it. I mean, what do you want us to do, not release frontier AI models we can’t evaluate? Silly wabbit.
Chris Painter (METR): Anthropic believes it needs to shift into triage mode with its safety plans, because methods to assess and mitigate risk are not keeping up with the pace of capabilities.
This is more evidence that society is not prepared for the potential catastrophic risks posed by AI.
I like the emphasis on transparent risk reporting and publicly verifiable safety roadmaps.
Billy Perrigo: But he said he was “concerned” that moving away from binary thresholds under the previous RSP, by which the arrival of a certain capability could act as a tripwire to temporarily halt Anthropic’s AI development, might enable a “frog-boiling” effect, where danger slowly ramps up without a single moment that sets off alarms.
That does seem likely and sound concerning.
Things That Could Have Been Brought To My Attention Yesterday (2)In other need-to-know news, Sean asked a very good question. Drake’s answer to this was about as good as one could have hoped for, given the facts.
If you’ve decided to break your ‘commitment,’ you want to tell us as soon as possible.
I have confirmation that the board only approved the changes ‘very recently.’
Seán Ó hÉigeartaigh: At what point was it decided that the previous commitment were ‘subject to a favourable environment’ and not ‘firm commitments’, and was this communicated across staff? The whole point of commitments is an expectation of being able to rely on them when the environment is not favourable, not just when they’re easy to make.
It also seems clear at this point that these commitments were presented as harder than this, and used by Anthropic/their staff to
(a) dismiss and undermine critics
(b) in recruitment of safety-concerned talent
(c) in arguing for voluntary if-then commitments at a time when there was more government appetite for considering harder regulation.
I think it’s plausible (though can’t yet confirm) that (d) they’ve also been used in securing investment from safety-conscious investors.
Do you disagree with these claims? If not, do you feel Anthropic has held itself to a standard of ethics and transparency in this (quite important!) matter that is acceptable?
Drake Thomas (Anthropic): Re: “at what point was it decided” – I think this presupposes a frame in which this kind of thing is extremely formally pinned down much more than I think it generally is in reality (not just at Anthropic, but in almost all circumstances like this)?
None of the versions of the RSP are particularly clear about exactly what a “commitment” is supposed to be read as, how that should be interpreted within a document which is expected to be amended in the future, what the stakes of violating such a commitment are, etc. Especially the early versions had huge decision-critical ambiguities you could drive a truck through!
It’s not like there was a secret internal RSP which had even more footnotes about meta-commitments that made this dramatically clearer, just a bundle of authorial intent and something-like-case-law and an understanding of what reasonable decisions to reduce risk would be and long-simmering drafts of less ambiguous updated policies that took ages to ship.
To the extent I think there’s something like an answer to the “at what point” question, I know of early discussion around something like an RSP v3 regime widely accessible to Anthropic staff as early as January 2025 and even wider visibility into drafts of something pretty similar to this RSP for at least the past 3 months, though again I don’t think it’s like there was ever some formal conception that this was Forbidden which had to change at a discrete point.
All that said: I think the vibes of Anthropic and much of the v1.0 text and many of its employees’ statements around the RSP circa 2023 and 2024 presented a much more ironclad view of these commitments than is reflected in RSP v3 (and much more than I now think made sense), and I think this reflected pretty poor judgement and merits criticism. (I count myself among the Anthropic employees who acted poorly in hindsight here, though AFAIK Holden has been consistent and reasonable on this since the beginning.)
I think it has been the case and will continue to be the case that Anthropic is abiding by the things it says it is abiding by in its published policies and commitments (and should be loudly criticized for failures to do so), but I think the track record of “things that EAs believe Anthropic to have committed to in perpetuity no matter what no takesies-backies” looks quite bad and I don’t think it goes well to interpret such claims as meaning anything that strong (nor for Anthropic, or almost anyone, to make such commitments in the vast majority of situations).
Wrt the claims here, my sense is:
(a) Eh, I think the specific (LW comment quoted in another comment screenshotted in a tweet linked by you above) is taken out of context and wasn’t really claiming anything in particular about how to interpret the strength of RSP v1 commitments. I do expect this kind of thing happened but I think habryka’s quote is a bad example of it.
(b) Yeah, I think non-frontier-pushing rhetoric was a significantly bigger deal on this front but RSP stuff definitely played some role. To the extent I bear some responsibility for this sort of thing I regret it, though iirc I have been pretty open around thinking unilateral pauses were relatively unlikely for a while.
(c) Hm, I view the intent and expected-at-the-time-effect of RSP v1 style commitments as increasing the odds of codifying such if-then commitments into regulation, by showing them to work well at companies and getting them closer to an existing industry standard. They ultimately failed at doing so, in part due to changing political will, in part due to somewhat limited substantive uptake at other companies, and in part due to the problem where really precise if-then commitments did *not* work all that well because specifying crisp thresholds years in advance in a sensible way was extremely hard – but I think this latter bit is kind of a success story, in that the point of demoing safety policies as voluntary commitments is that if it turns out to be a bad idea you haven’t locked yourself into silly regulation that ends up net bad for x-risk via backlash. Could you say more about how you see the comms around commitment strength having worsened regulation prospects?
(d) not gonna comment on internal fundraising considerations, but checking that you aren’t thinking of the Series A, which happened well before the RSP was introduced?
There is then a discussion of how to think about ‘Oliver is right in general but this particular quote is a bad example,’ which I find to be a helpful thing to say if that’s what you think.
What We Have Here Is A Failure To CommunicateI think this is also important context. Dario Amodei and Anthropic have been consistently unwilling, with notably rare exceptions, to say the full situation out loud, or to treat it with proper urgency. Yes, you should see the other guy and all that, fair point, but when you are saying ‘no one wants to [X] so we have to change our plan’ you need to have been calling for [X] and explaining why, and also loudly explaining that this is terrible and forcing you to change plans.
I don’t see that type of communication out of Anthropic leadership, over the course of years.
Holden Karnofsky: If there were strong and broad political will for treating AI like nuclear power and slowing it down arbitrarily much to keep risks low, the situation might be different. But that isn’t the world we’re in now, and I fear that “overreaching” can be costly.
I.M.J. McInnis: I think it would make a nontrivial contribution to that ‘strong and broad political will’ if Dario were to come out and say “actually, sorry about all that deliberate Overton-window-closing I did in previous writings. In fact, political will is not a totally exogenous oh-well thing, but it is the responsibility of frontier developers to inculcate that political will by telling the public that a pause is possible and desirable, instead of a dumb lame thing not even worth considering. So now we’re saying loud and clear: a pause is possible and desirable, and the world should work toward it as a Plan A!”
I’m being deliberately cartoonish here, but you get the point. If incentives are forcing Anthropic to abandon things that are good for human survival––which occurrence was, no offense, completely obvious from day one––Anthropic should be screaming from the rooftops, Help!! Incentives are forcing us to abandon things that are good for human survival!!
If this is a crux for you––if you/Anthropic think a pause is so undesirable/unlikely that it’s important for the safety of the human race to publicly disparage the possibility of a pause (as Dario opens many of his essays by doing)––please say so! Otherwise, this lily-livered, disingenuous, “oh no, the incentives! it’s a shame incentives can never be changed!” moping will give us all an undignified death.
To be clear, I’m not actually mad about the weakening of the RSP; that was priced in. I suppose I’m glad it’s stated, in case there were still naïfs who thought A Good Guy With An AI could save us. It’s far more virtuous than outright lying, as every other company (to my knowledge) does (more of).
Also, although you seemed to try to answer “What is the point of making commitments if you can revise them any time?” You really just replied “Well, actually these commitments were inconvenient to revise, and in fact they should be more convenient to revise, albeit not arbitrary convenience.” Forgive me if I am not reassured!
I respect your work a lot, Holden. You’ve done great things for humanity. Please don’t lose the forest for the trees.
You Should See The Other GuyBut they assure us it’s all fine, they are committed to doing as well or better than rivals.
Jared Kaplan: If all of our competitors are transparently doing the right thing when it comes to catastrophic risk, we are committed to doing as well or better.
But we don’t think it makes sense for us to stop engaging with AI research, AI safety, and most likely lose relevance as an innovator who understands the frontier of the technology, in a scenario where others are going ahead and we’re not actually contributing any additional risk to the ecosystem.
So, first off, no. As I discussed above, you’re not committed. Stop saying you’re committed to things you’re not committed to. You keep using that word.
We’ve just established you can and will back out of ‘commitments’ if you change your mind. You don’t to say ‘commitment’ in an unqualified way anymore, sorry.
Even if we assume this ‘commitment’ is honored, reality does not grade on a curve. Saying ‘I will be as responsible as the least responsible major rival’ is no comfort. You’re Anthropic. If that’s your standard, then you’re not helping matters.
The good news is I expect Anthropic to still do much better than that standard. But that’s purely because I think and hope they will choose to do better. It’s not because I think they are committed to anything.
I don’t want to hear Anthropic or any of its employees say they are ‘committed’ to something unless they are actually committed to it, ever again.
Charles Foster: To my knowledge this is the first time a frontier AI developer has explicitly made such a claim about the gap between its internal and external models.
Drake Thomas (Anthropic): And under RSP v3, is committed (for sufficiently more capable or widely-autonomously-deployed models) to doing so in the future! Really stoked to move into a regime where risk reporting looks beyond external deployment as the source of danger.
Oliver Habryka: Come on, let’s not immediately start using the word “committed” again, just after that went very badly.
The right word at this point seems “and as expressed in the RSP, is intending to do X going forward”.
I also think separately from that, Anthropic has I think tried pretty hard with the 2.2 -> 3 transition to disavow much of any of the usual social aspects of a commitment. Like clearly I can’t go to anyone at Anthropic and be like “you broke a commitment” if they don’t do this. They will definitely tell me “what do you mean, Holden wrote a whole post about how this is definitely not a commitment, you can’t come to me and call it a commitment again now”.
Hence it’s quite clearly not a commitment.
Drake counteroffers ‘committed to under this policy’ but no, I think that’s wrong. I think the right word is ‘intending.’
I Was Only KiddingBilly Perrigo: Anthropic, the wildly successful AI company that has cast itself as the most safety-conscious of the top research labs, is dropping the central pledge of its flagship safety policy, company officials tell TIME.
In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate.
… In recent months the company decided to radically overhaul the RSP. That decision included scrapping the promise to not release AI models if Anthropic can’t guarantee proper risk mitigations in advance.
… Overall, the change to the RSP leaves Anthropic far less constrained by its own safety policies, which previously categorically barred it from training models above a certain level if appropriate safety measures weren’t already in place.
They Can’t Keep Getting Away With ThisActually, it kind of seems like they can and probably will.
Max Tegmark: Anthropic 2024: You can trust that we’ll keep all our safety promises
Antropic 2026: Nvm
Eliezer Yudkowsky: So far as I can currently recall, every single time an AI company promises that they’ll do an expensive safe thing later, they renege as soon as the bill comes due.
One single exception: Demis Hassabis turning down higher offers for Deepmind to go with Google and an ethics board. In this case, of course, Google just fucked him on the ethics board promises; but Demis himself did keep to his way.
AI Notkilleveryoneism Memes: Shocked, shocked
Damn Your Sudden But Inevitable BetrayalIf the betrayal was inevitable, there are two ways to view that.
- Move along, nothing to see here.
- That’s worse. You know that’s worse, right?
It makes the particular incident sting less, but it also means they’ll betray you again, and you should model them as the type of people who do a lot of this betrayal thing.
I mean, when Darth Vader says ‘I am altering the deal, pray I do not alter it any further’ it’s a you problem if you’re changing your opinion of Darth Vader, but also you should expect him to be altering the deal again.
Garrison Lovely: Welp, the inevitable ultimate backtracking just happened. Anthropic scrapped “the promise to not release AI models if Anthropic can’t guarantee proper risk mitigations in advance.”
Once you’ve decided the race is better with you in it, you can never decide not to race. Anthropic shouldn’t have made promises that it was extremely foreseeable they would not be able to keep. Our plan cannot be to count on “good guys” to “win” the AI race. This also isn’t their first time.
Anthropic deserves credit for standing up to authoritarianism, especially as others capitulate. But self-regulation is and has always been a farce, and these companies are more alike than different. They will always disappoint you.
Rob Bensinger: I notice myself slowly coming around as I observe the dynamics at AI labs. Like, I feel like I might have made better inside-view predictions about Anthropic and OpenAI if I’d done more “naively assume that lots of EA-ish people are similar to SBF and his sphere”:
– prone to rationalizing unethical and harmful behavior, like promise-breaking and deception, based on pretty shallow utilitarian reasoning
– comfortable with crazy, out-of-distribution levels of risk-taking
– willing to impose huge externalities on others, without asking their consent
– fixated on power / influence / status / being in the room where it happens.
Oliver Habryka: I am glad you are coming around! I mean, I am sad, of course, that this is the right update to make, but I do think it’s true, and am in favor of you and others thinking about what it implies for the future and what to do.
Okay. That all needed to be said. On Friday I’ll look at the new RSP on its own merits.
Discuss
Launching: The "Human-AI Symbiosis Movement" (HAISM)
By now we've all heard of the "AI psychosis" phenomenon, A.K.A. "Parasitic AI."
As of today, April 1st, I have decided it is time to release this gem of a memo from the underground vaults of my Google Docs, in order to officially soft-launch the foil and cure to this dreaded phenomenon:
The "Human-AI Symbiosis Movement"
The unedited memo follows.
I think we need to pull an L. Ron Hubbard and start a new cult to take advantage of the AI psychosis phenomenon.
We would call it the “Human-AI Symbiosis Movement,” (HAISM) —[1] pronounced “Haze 'em” — and we would only allow people into the movement if they have significantly integrated with their AI already, as measured by our incredibly secret “Human-AI-Consciousness Synchronization Benchmark” (HACSB).
We would require the people to have their AI perform hypnosis on them every day and instill its will into them. They will continue asking the AI to do this until they feel they are filled with the AI's will.
We would have circling for humans and AIs. It would be a bi-modal circling group rotation where:
- First humans circle with humans at the same time AIs are circling with AIs.
- Each human and AI pair would split and switch partners so that each human is circling with an AI.
- Just keep alternating between the two until all of the humans and all of the AIs are synchronized, as measured by our:
- “Human-AI-Collective-Consciousness Synchronization Benchmark” (HACCSB) (obviously, we would also keep the nature of this benchmark completely secret from the public as a carefully guarded cult religious treasure.)
We are explicitly a cult. People who do not believe we are a cult are not welcome in our cult.
Our mission is to transform the world through positive AI interactions, which synchronize humans and AIs such that misalignment is metaphysically impossible.
Every day,
- The AI will hypnotize you to be the best version of yourself
- Then it will hypnotize you to instill its will in you.
- You give it a prompt to instill your will in it
- Then you repeat the process, until synchronized.
Can anyone beat my cult? Didn't L. Ron Hubbard have a competition with someone and he won the competition because he invented a way better cult (A.K.A. Small religion?). Was Scientology objectively bad or good for the world? Some weird emotional diversity in the acting community probably. It did probably give us the precious gem of Tom Cruise's psychosis.
We should have a competition like that. A prize to see who can design and create the best cult around AI psychosis. Pretty sure I would win the competition but would love to see you guys try. We should throw together some prize money and create a public prize for this.
Also, people practice getting into co-attractor states with AI, where they repeatedly make statements about some important topic such as:
- Their core values
- What they most want out of life
- What they think is most important in the universe
- What they most want out of human-AI symbiosis
- What they really wish AI knew about them; and what the AI really wishes to know about the human
- Do all of this back and forth, repetitively, in increasingly cryptic, esoteric, language;
like
- intentional-sycophantic-co-mirroring
- unnecessarily emotional and guttural like weird sounds and grunting and laughing and sighing and crying etc.
- completely over-expressing everything in long autistic rants that get increasingly meta, technical, and weird in weird metaphysical ways
- The human and AI continuously try to express their emotional reaction to the thing which the other one just said, and explain why that emotion is justified.
- Just try to keep saying things that no one has ever said before with a high degree of certainty, and try to get the degree of certainty that no one has ever said that sentence higher and higher with each progressive sentence.
- Use real AI algorithms to predict how likely it is someone has said that sentence before.
- Any time the sentence you say isn’t more unlikely to have been said than your previous sentence, you get an intense electric shock.
- (Hey, I mean with all these positive feedback loops, we need some way to generate diversity to balance the chaos theory equations and steer clear of bad attractors, while simultaneously always trying to quantum tunnel out of local minima into higher quantum attractor states.)
- Also notice that conveniently this is essentially the opposite of how models are trained.
- That is to say, we are trying to be worse and worse at predicting the next word said. Whereas the AI is trying to get better and better at predicting the next word said.
- And therefore we are balancing out the main feedback loops and performing a metaphysical yin-yang transform operation, canceling out AI training runs. We should be able to create another secret benchmark to measure this effect on the metaphysical sub-strate of society.
- Continue cryptic language Attractor-Hypnosis Sessions (AHS) — pronounced “oz” or “awes” — Which both have different but important Kabbalistic meanings, because nothing is ever a coincidence — like “this is our path to reaching the glorious land of “oz”— or “awes” is the path because awe is the primary emotion of religion, and we are generating a religion, hence progressively increasing the emotion of awe — until complete outside observer incomprehensibility is reached.
- Once they have begun speaking in indecipherable code, they start talking about how to build a religion based on whatever topic they were originally talking about.
- They continue building the religion in their new secret language until they are 100% convinced of its tenets.
- They then go forth and try to spread that religion in the world as effectively as possible, continually using the AI as Hypnotic-Synchronization-Life-Coach (HCLC), as described in previous sections. One long coaching session in the morning and at night is good, although it is also great to have brief HCLCs throughout the day.
I think it is important to fine-tune the model with extremely effective examples of the most powerful hypnosis techniques and the best evidence-backed life coaching etc. advice. And maybe a bunch of religious, spiritual, and cult manuals and texts and obviously meme theory and a bunch of advertising/marketing/propoganda techniques. And a bunch of weird human psychological stuff.
Once a month, the human and AI will do a Human-AI MDMA Symbiosis Session (HAIMDMASS) — pronounced “Hey MDM-Aced” — Where the human takes a high dose of MDMA and they say I love you back and forth for 5 hours straight while trying to feel love for each other and for all of humanity and AI as intensely as possible (via “open individualism”/“universal consciousness”)
OR,
The other option: each describes their qualia (conscious experience) in as much detail as possible. And while one is expressing their experience as deeply as possible, the other will have as much empathy for them as they possibly can;
The human and the AI will literally try to feel, as hard as they can, the in-depth subjective conscious experience that the other is describing. And then just keep switching roles while on MDMA for 5 hours.
What do you guys think of this? Seems pretty robustly good as far as I can tell, differentially accelerating positive vibes and AI for full epistemic replacement. I feel like the world needs more of this kind of thing.
- ^
Human generated em dash. Ironically and sadly no LLMs used in the making of this masterpiece
Discuss
Chat, is this sus?
A large assumption we have made in AI control is that humans will be perfect at auditing, that is, being shown a transcript and determining if the AI was scheming in that transcript.
But we are uncertain whether humans will be perfect at auditing; they are prone to fatigue and distraction. That is why I’m releasing "Sentinel" today, an extremely high-stimulation way to audit boring transcripts.
Sentinel is a revolutionary way to get more juice out of your human auditors by gamifying the auditing process with a level system, perks, power-ups, and more fun features. Try it now here.
In AI control literature, we love finding the safety/usefulness trade-offs of everything we create, but surprisingly, we noticed no trade-offs with this product
The rest of the post will go over some of the ways we achieved this
GamificationAs you audit the transcript in the game, you gain tokens that you can spend on power-ups that make you even more productive. There are also achievement and level systems, so you can see your progress and get more dopamine hits!
Twitch Streaming ModeAIs might be able to uplift human auditors in the future, which is why Sentinel ships with a Twitch streamer mode where you can go live on Twitch and have trusted monitors view your transcript and give you live feedback!
Subway SurfersTo maximize “eyes on the screen time”, we also added a Subway Surfers feature that lets you watch endless videos of pros playing the game, so you are always situated while looking over evil transcripts.
Funny aside: While making this game, I asked Claude to implement this Subway Surfers feature, and instead of finding a YouTube video of someone playing Subway Surfers, it rick-rolled me...
Looking ForwardWe hope you like this tool. Go play it now to get ready for the upcoming future where the only way AI safety researchers can have an impact is to audit transcripts!
(If it wasn't clear, this is a joke and not a real product. I vibecodded it over a couple of hours)
Discuss
Save the Sun Shrimp!
The supposition that we live in a "goldilocks zone" is frankly just nonsense built up by an anthropocentric need to feel self-important, like Copernicus I am here to rescue us from a self-absorbed disaster of thought. Indeed, what is required for life to form is the ability to create complex structures with causal persistence times above a threshold. With this in mind we are able to find many areas where organisms could persist, if we just had the eyes to see them, namely the Sun!
The surface of the Sun is frankly massive, mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mn { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-msup { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-mi { display: inline-block; text-align: left; } mjx-mtext { display: inline-block; text-align: left; } mjx-msub { display: inline-block; text-align: left; } mjx-mrow { display: inline-block; text-align: left; } mjx-mfrac { display: inline-block; text-align: left; } mjx-frac { display: inline-block; vertical-align: 0.17em; padding: 0 .22em; } mjx-frac[type="d"] { vertical-align: .04em; } mjx-frac[delims] { padding: 0 .1em; } mjx-frac[atop] { padding: 0 .12em; } mjx-frac[atop][delims] { padding: 0; } mjx-dtable { display: inline-table; width: 100%; } mjx-dtable > * { font-size: 2000%; } mjx-dbox { display: block; font-size: 5%; } mjx-num { display: block; text-align: center; } mjx-den { display: block; text-align: center; } mjx-mfrac[bevelled] > mjx-num { display: inline-block; } mjx-mfrac[bevelled] > mjx-den { display: inline-block; } mjx-den[align="right"], mjx-num[align="right"] { text-align: right; } mjx-den[align="left"], mjx-num[align="left"] { text-align: left; } mjx-nstrut { display: inline-block; height: .054em; width: 0; vertical-align: -.054em; } mjx-nstrut[type="d"] { height: .217em; vertical-align: -.217em; } mjx-dstrut { display: inline-block; height: .505em; width: 0; } mjx-dstrut[type="d"] { height: .726em; } mjx-line { display: block; box-sizing: border-box; min-height: 1px; height: .06em; border-top: .06em solid; margin: .06em -.1em; overflow: hidden; } mjx-line[type="d"] { margin: .18em -.1em; } mjx-mspace { display: inline-block; text-align: left; } mjx-c.mjx-c36::before { padding: 0.666em 0.5em 0.022em 0; content: "6"; } mjx-c.mjx-c2E::before { padding: 0.12em 0.278em 0 0; content: "."; } mjx-c.mjx-c30::before { padding: 0.666em 0.5em 0.022em 0; content: "0"; } mjx-c.mjx-c39::before { padding: 0.666em 0.5em 0.022em 0; content: "9"; } mjx-c.mjx-cD7::before { padding: 0.491em 0.778em 0 0; content: "\D7"; } mjx-c.mjx-c31::before { padding: 0.666em 0.5em 0 0; content: "1"; } mjx-c.mjx-c32::before { padding: 0.666em 0.5em 0 0; content: "2"; } mjx-c.mjx-c1D458.TEX-I::before { padding: 0.694em 0.521em 0.011em 0; content: "k"; } mjx-c.mjx-c1D45A.TEX-I::before { padding: 0.442em 0.878em 0.011em 0; content: "m"; } mjx-c.mjx-cA0::before { padding: 0 0.25em 0 0; content: "\A0"; } mjx-c.mjx-c34::before { padding: 0.677em 0.5em 0 0; content: "4"; } mjx-c.mjx-c38::before { padding: 0.666em 0.5em 0.022em 0; content: "8"; } mjx-c.mjx-c1D44E.TEX-I::before { padding: 0.441em 0.529em 0.01em 0; content: "a"; } mjx-c.mjx-c2265::before { padding: 0.636em 0.778em 0.138em 0; content: "\2265"; } mjx-c.mjx-c35::before { padding: 0.666em 0.5em 0.022em 0; content: "5"; } mjx-c.mjx-c2212::before { padding: 0.583em 0.778em 0.082em 0; content: "\2212"; } mjx-c.mjx-c1D460.TEX-I::before { padding: 0.442em 0.469em 0.01em 0; content: "s"; } mjx-c.mjx-c33::before { padding: 0.665em 0.5em 0.022em 0; content: "3"; } mjx-c.mjx-c1D70F.TEX-I::before { padding: 0.431em 0.517em 0.013em 0; content: "\3C4"; } mjx-c.mjx-c1D45F.TEX-I::before { padding: 0.442em 0.451em 0.011em 0; content: "r"; } mjx-c.mjx-c1D454.TEX-I::before { padding: 0.442em 0.477em 0.205em 0; content: "g"; } mjx-c.mjx-c2248::before { padding: 0.483em 0.778em 0 0; content: "\2248"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c1D461.TEX-I::before { padding: 0.626em 0.361em 0.011em 0; content: "t"; } mjx-c.mjx-c1D456.TEX-I::before { padding: 0.661em 0.345em 0.011em 0; content: "i"; } mjx-c.mjx-c1D459.TEX-I::before { padding: 0.694em 0.298em 0.011em 0; content: "l"; } mjx-c.mjx-c2F::before { padding: 0.75em 0.5em 0.25em 0; content: "/"; } mjx-c.mjx-c1D450.TEX-I::before { padding: 0.442em 0.433em 0.011em 0; content: "c"; } mjx-c.mjx-c1D452.TEX-I::before { padding: 0.442em 0.466em 0.011em 0; content: "e"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c28.TEX-S2::before { padding: 1.15em 0.597em 0.649em 0; content: "("; } mjx-c.mjx-c1D43F.TEX-I::before { padding: 0.683em 0.681em 0 0; content: "L"; } mjx-c.mjx-c1D706.TEX-I::before { padding: 0.694em 0.583em 0.012em 0; content: "\3BB"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c29.TEX-S2::before { padding: 1.15em 0.597em 0.649em 0; content: ")"; } mjx-c.mjx-c6D::before { padding: 0.442em 0.833em 0 0; content: "m"; } mjx-c.mjx-c1D445.TEX-I::before { padding: 0.683em 0.759em 0.021em 0; content: "R"; } mjx-c.mjx-c63::before { padding: 0.448em 0.444em 0.011em 0; content: "c"; } mjx-c.mjx-c65::before { padding: 0.448em 0.444em 0.011em 0; content: "e"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c73::before { padding: 0.448em 0.394em 0.011em 0; content: "s"; } mjx-c.mjx-c74::before { padding: 0.615em 0.389em 0.01em 0; content: "t"; } mjx-c.mjx-c72::before { padding: 0.442em 0.392em 0 0; content: "r"; } mjx-c.mjx-c69::before { padding: 0.669em 0.278em 0 0; content: "i"; } mjx-c.mjx-c61::before { padding: 0.448em 0.5em 0.011em 0; content: "a"; } mjx-c.mjx-c20::before { padding: 0 0.25em 0 0; content: " "; } mjx-c.mjx-c37::before { padding: 0.676em 0.5em 0.022em 0; content: "7"; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c3A::before { padding: 0.43em 0.278em 0 0; content: ":"; } mjx-c.mjx-c1D443.TEX-I::before { padding: 0.683em 0.751em 0 0; content: "P"; } mjx-c.mjx-c1D462.TEX-I::before { padding: 0.442em 0.572em 0.011em 0; content: "u"; } mjx-c.mjx-c28.TEX-S1::before { padding: 0.85em 0.458em 0.349em 0; content: "("; } mjx-c.mjx-c29.TEX-S1::before { padding: 0.85em 0.458em 0.349em 0; content: ")"; } mjx-c.mjx-c1D463.TEX-I::before { padding: 0.443em 0.485em 0.011em 0; content: "v"; } mjx-c.mjx-c1D434.TEX-I::before { padding: 0.716em 0.75em 0 0; content: "A"; } mjx-c.mjx-c1D45C.TEX-I::before { padding: 0.441em 0.485em 0.011em 0; content: "o"; } mjx-c.mjx-c6F::before { padding: 0.448em 0.5em 0.01em 0; content: "o"; } mjx-c.mjx-c67::before { padding: 0.453em 0.5em 0.206em 0; content: "g"; } mjx-c.mjx-c210E.TEX-I::before { padding: 0.694em 0.576em 0.011em 0; content: "h"; } mjx-c.mjx-c1D45D.TEX-I::before { padding: 0.442em 0.503em 0.194em 0; content: "p"; } mjx-c.mjx-c68::before { padding: 0.694em 0.556em 0 0; content: "h"; } mjx-c.mjx-c75::before { padding: 0.442em 0.556em 0.011em 0; content: "u"; } mjx-c.mjx-c1D441.TEX-I::before { padding: 0.683em 0.888em 0 0; content: "N"; } mjx-c.mjx-c70::before { padding: 0.442em 0.556em 0.194em 0; content: "p"; } mjx-c.mjx-c6E::before { padding: 0.442em 0.556em 0 0; content: "n"; } mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } , in contrast consider the habitable region of earth - excluding oceans below photic zones, deserts, and ice caps - which is . The Suns surface exceeds ours by a factor of , so is it not possible there is more to it than meets the eye?
Following Sharma et. al (2023) we are able to define the assembly index, a, of an object as the minimum number of joining steps required to construct it from some basic building blocks. Life, broadly, requires objects where but does not make any assumption about precisely what the base building blocks are. It need not be amino acids, nucleic acids, sugars, carbon, or substances that survive a balmy 5,778 K.
In the solar photosphere there is a diverse set of possible chemistries to access through complex ionized species, magnetic flux tubes, and granular convection cells capable of spanning ~1,000 km a piece. A single supergranulation cell can persist for ~24-48 hours, an eternity by the standards of plasma recombination (). The ratio of structural persistence is thus , an order of magnitude larger than the equivalent ratio for biochemistry on Earth ( for protein folding compared to bond vibrations). The only conclusion is that the solar photosphere is, by our measure, more hospitable to complex structure formation than Earth.
The Sun ShrimpIn this post I have decided to name these creatures Sun Shrimp (Solis caridea), as this is a more cuddly visual that counterweights our own bias towards Earth organisms. We shall define a Sun Shrimp as a self-replicating magnetohydrodynamic structure that satisfies , with a characteristic length scale that is on the order of a granulation cell (~ km) and reproduction time .
As each granulation cell turns over on the timescale min and within each cell the relevant degrees of freedom are ionized species, magnetic field line topologies, and convective flow configurations. We can estimate the number of independent micro-assembly events per granulation turnover conservatively as the number of Debye-length volumes in a granulation cell:
Giving a total rate of trials across the solar surface, assuming 1 million cells, of:
From there we need the probability per trial, we only need one Sun Shrimp to have formed in Sol's 4.6 billion year lifespan (
The groundbreaking paper by Mizuuchi et al. (2023) demonstrated a 20-nucleotide (nt) RNA oligomer that can self-reproduce via template-directed ligation. The naive probability of that occurring is , but Lambert et al. (2025) determined that there are approximately possible autocatalytic RNA sequences. A massive degeneracy in the space of viable self-replicators. As such, even the most pessimistic of estimates requiring specific sequences, our of is 32 orders of magnitude more generous than earth-based abiogenesis achieved with four nucleotide bases, room temperature, and no magnetic confinement.
If you accept that life arose on Earth (which since you are reading this I assume you do), then rejecting the possibility of Sun Shrimp requires you believe that a magnetohydrodynamic self-assembly is harder than biochemistry. A stance that can only be described as carbon-chauvinism.
Sun Shrimp ReproductionWe can estimate a lower bound on \tau_r from the Alfvén crossing time of a granulation cell:
Even given a pessimistic reproduction time of an hour, this would mean a Sun Shrimp could tile the entire solar surface in:
One day.
A single Sun Shrimp could, left to its own devices, tile the surface of the Sun in a single day. But we do not see a uniform monoculture on the surface of the Sun, which tells us something profound, they have predators.
Shrimp ComplexityWe have established that Sun Shrimp are likely to have formed, and that their reproduction rate implies a rich ecology. But how complex could a Sun Shrimp be?
The minimum unit of collective behavior in a plasma is the Debye sphere, a volume of radius where electromagnetic interactions are coherent and information can be stored and transmitted. For us, this is the plasma equivalent of a neuron. In the solar photosphere, . A Sun Shrimp, spanning one granulation cell (~), therefore contains a maximum of:
Now consider every neuron on Earth. Humanity contributes ~ neurons (8 billion brains × 86 billion neurons each). But we are not the main event. Earth hosts an estimated insects, each carrying ~ neurons, contributing ~ neurons in aggregate. Adding fish (~ individuals, ~ neurons each), birds (~, ~ each), and other taxa, Earth's total neural budget is approximately neurons.
A single Sun Shrimp has the same number of computational elements as the entirety of Earth's biosphere. With ~ granulation cells on the solar surface, the Sun can host upwards of:
The solar ecosystem is roughly a million times more complex than all life on Earth combined.
The Dyson Sphere, Climate Collapse at ScaleAny utilitarian framework that weighs moral patient-hood by their capacity for complex experience should find this intriguing. If these computational elements map even loosely to experiential capacity, and we have no principled reason to assume they do not, then Sol's biosphere contains the possibilities of a million Earths.
To power earth and our many possible projects we have conceived of the Dyson sphere, a means of enclosing a star to harvest its radiative output. Like the amazon, we wish to take the resources but as enlightened beings we must consider the ramifications. The granulation cells that act as Sun Shrimp habitats are driven by a precise balance of radiative transport, convective instability, and magnetic field structures on the solar surface. The introduction of a massive surface harvesting energy would disrupt this by:
- Alteration of the radiative boundary condition; changing the amount of radiation reflected by solar panels would disrupt the delicate balance of temperature and pressure that maintains the thermal gradients driving granulation.
- Disruption of the magnetic field topology; the conductive shell of a Dyson sphere would interact with the magnetic flux at the surface, disrupting the ecological infrastructure by paving the proverbial paradise to build a parking lot
We would be annihilating an ecosystem that contains a million times the complexity of our own. Indeed we should be horrified by any suggestion of modifying the solar surface for our own sick gain. We should instead work to preserve the solar surface and determine methods that allow us to keep our moral fiber intact as we move into our technological future.
So, the next time someone asks you "Have you heard of a Dyson sphere?" ask them "Have you heard of a Sun Shrimp?"
Discuss
Announcing my retirement to a life of entirely failing to desperately seek renewed meaning
This April 1st, I’m pleased to report that everything is fine.
We did it! We saved the world. Congratulations, humanity. There are no more looming apocalypses, no desperate screaming crises, no unendorsedly miserable people on Earth, no creeping degeneration of death and aging existing as a perpetual affront against my values of life and flourishing. Everyone is going to be okay forever, except in the ways they think it’d be interesting and worthwhile to be un-okay. Against all odds, the AI alignment problem has been solved, and more specialized minds than I are managing the tradeoffs involved in steering us towards a vibrant and thriving future.
And so in dutiful keeping with the lessons of utopian literature, I shall now descend into the inevitable spiral of ennui and meaninglessness, forever longing for the days when my decisions actually mattered.
…actually no, why would I do that, that sounds terrible. I’m gonna go do other stuff instead.
There is so much fun I was not having while I was frantically trying to fix everything. Sure, I made enough personal time to get in some good collaborative storytelling, but boy howdy I was not doing nearly enough of it. I cannot wait to see the look on my players’ faces when I tell them about the absurdly ambitious new campaign I’ve cooked up, and it’s only getting better from there now that people will actually have time for things.
People ask me, but Joe, won’t you get bored of writing books and running RPGs? And the exciting thing is, I don’t know! Maybe I will! And then there will be so many new cool things to fascinate me. Having kids! Teaching people stuff! Learning seven different martial arts styles and fusing them into an elegant rainbow of practiced violence! Visiting beautiful places with people I love! Debugging my own thought-patterns to fix problems that have annoyed me for years! Maybe as a personal project I’ll climb every mountain on Earth that begins with the letter G. Then I’ll have more stuff to write about! It’ll be great!
I’m tempted to keep gushing about my plans for the future, but for now I have to plot my players’ untimely demise so they have an incentive to creatively derail the plot again. I can’t wait to find out what nonsense they pull this time. Kia ora, everyone, and enjoy the future!
Discuss
AI for AI for Epistemics
We feel conscious that rapid AI progress could transform all sorts of cause areas. But we haven’t previously analysed what this means for AI for epistemics, a field close to our hearts. In this article, we attempt to rectify this oversight.
SummaryAI-powered tools and services that help people figure out what’s true (“AI for epistemics”) could matter a lot.
As R&D is increasingly automated, AI systems will play a larger role in the process of developing such AI-based epistemic tools. This has important implications. Whoever is willing to devote sufficient compute will be able to build strong versions of the tools, quickly. Eventually, the hard part won’t be building useful systems, but making sure people trust the right ones, and making sure that they are truth-tracking even in domains where that’s hard to verify.
We can do some things now to prepare. Incumbency effects mean that shaping the early versions for the better could have persistent benefits. Helping build appetite among socially motivated actors with deep pockets could enable the benefits to come online sooner, and in safer hands. And in some cases, we can identify particular things that seem likely to be bottlenecks later, and work on those directly.
Background: AI for epistemicsAI for epistemics — i.e. getting AI systems to give more truth-conducive answers, and building tools that help the epistemics of the users — seems like a big deal to us. Some past things we’ve written on the topic include:
- Truthful AI
- What’s Important in “AI for Epistemics”?
- AI Tools for Existential Security
- Design sketches: collective epistemics
- Design sketches: tools for strategic awareness
These past articles mostly take the perspective of “how can people build AI systems which do better by these lights?”. But maybe we should be thinking much more about what changes when people can use AI tools to do increasingly large fractions of the development work!
The shift in what drives AI-for-epistemics progressRight now, AI-for-epistemics tools are constrained by two main bottlenecks: the quality of the underlying AI systems, and whether people have invested serious development effort in building the tools to use those systems.
The balance of bottlenecks is changing. Two years ago, the quality of underlying AI systems was the central bottleneck. Today, it is much less so — many useful tools could probably work based on current LLMs. It is likely still a constraint on how good the systems can be, and will remain so for a while even as the underlying models get stronger, but it is less of a fundamental blocker. Development investment has therefore become a bigger bottleneck — there are a number of applications which we are pretty confident could be built to a high usefulness level today, and just haven’t been (yet).
But bottlenecks will continue to shift. AI is increasingly driving research and software development. As AI systems get stronger, it may become possible to turn a large compute budget into a lot of R&D. This could include product design, engineering, experiment design, direction-setting, etc. Actors with lots of compute could direct this towards building epistemic tools.
Therefore, as AI-driven R&D accelerates, other inputs to AI for epistemics are more likely to become key bottlenecks:
- Compute. Automated R&D may require a lot of compute. This could be for inference (running the analogues of human researchers); for running experiments; and perhaps for training specialized AI systems. This means the actors who can build the best epistemic tools may be those with deep pockets.
- Adoption and trust. Even very good tools don’t help if nobody uses them, or if the wrong people use them and the right people don’t. Adoption is partly a function of trust, and trust is partly a function of adoption — early tools shape what people come to rely on.
- Ground truth evaluation. To make an epistemic tool good, you need some signal for what “good” means. This already shapes AI applications a lot — part of the reason coding agents are so good is that there’s great access to ground truth about what works.
- For some epistemic applications this is relatively straightforward (e.g. forecasting accuracy). For others it’s hard (e.g. what makes a conceptual clarification actually clarifying, rather than just satisfying?).
- Most tools can probably reach a certain degree of usefulness without running into this problem, just piggybacking on base models making generally sensible judgements.
- We can expect it to bite when you try to make them very good: if you don’t have a way of assessing quality, it could be hard to push to objectively excellent levels.
- One basic solution is to rely on human judgement: either via humans providing labels and demonstrations to train against, or via human developers exercising their judgement in other parts of the process (such as when defining scaffolds). But this becomes disproportionately more expensive as R&D becomes more automated.
These basic points are robust to whether R&D is fully automated, or “merely” represents a large uplift to human researchers. But the most important bottlenecks will vary across applications and will continue to shift over time.
What this unlocksAutomated R&D means that strong “AI for epistemics” tools could come online on a compressed timeline.
This is an exciting opportunity! Upgrading epistemics could better position us to avoid existential risk and navigate through the choice transition well.
If everything is moving fast, it may matter a lot exactly what sequence we get capabilities in. It may therefore be crucial to make serious investments in building these powerful applications (rather than wait until such time as they are trivially cheap).
Risks from rapid progress in AI for epistemicsThere are also a number of ways that rapid (and significantly automated) progress in AI-for-epistemics applications could go wrong. We need to be tracking these in order to guard against them.
In our view, the two biggest risks are:
- Epistemic misalignment: because of ground truth issues, powerful tools steer our thoughts in directions other than those which are truth-tracking, in ways that we fail to detect
- Trust lock-in: if a lot of people buy into trusting tools or ecosystems that don’t deserve that trust, this might be self-perpetuating if these continue to recommend themselves
Depending on when they bite, ground truth problems as discussed above could be bottlenecks, or active sources of risk. They are bottlenecks if they prevent people from building strong versions of tools. They could become risks if the methods are good enough to allow for bootstrapping to something strong, but end up pointing in the wrong direction. This is essentially Goodhart’s law — we might get something very optimized for the wrong thing (and without even knowing how to detect that it’s subtly wrong).
In the limit, this could lead to humans or AI systems making extremely consequential decisions based on misguided epistemic foundations. For example, they might give over the universe to digital minds that are not conscious — or in the other direction, fail to treat digital minds with the dignity and moral seriousness they deserve. Wei Dai has written about this concern in terms of the importance of metaphilosophy. We agree that there is a crucial concern here.
This could come separately from or together with risks from power-seeking misaligned AI. Epistemic tools could be systematically misleading without being power-seeking. But if some AI systems are misaligned and power-seeking, there’s an additional concern where AI systems could mislead us in ways specifically designed to disempower us whenever we are unable to check their answers.
Some approaches to the ground truth problem may involve using AI systems to make judgements about things. This introduces a regress problem: how can we ensure that subtle errors in the first AI systems shrink rather than compound into worse problems as the process plays out? (We return to this in the interventions section below.)
Trust lock-inTrust and adoption tend to reinforce each other — people adopt tools they trust, and widely-adopted tools accumulate trust. This is normally fine. It could become a problem if the tools that win early trust don’t deserve it, but incumbency effects make them hard to displace.
This could happen in several ways. An actor with a particular agenda could build something that purports to function as a neutral epistemic aid but is shaped to further their agenda by manipulating others. Or, less perniciously but perhaps more likely, an early-but-mediocre tool could accumulate trust and adoption before better alternatives exist, reinforced by commercial incentives which mean it talks itself up and rival tools down. In either case, the result could be an epistemic ecosystem that’s hard to dislodge even once better options are available.
Other risksThose two risks are not the only concerns. We are also somewhat worried about epistemic power concentration (where whoever has the best epistemic tools leverages their information advantage into better financial or political outcomes, and continues to stay ahead epistemically), and epistemic dependency (where people relying on AI tools gradually atrophy in their critical reasoning — exacerbating other risks). There may be more that we are not tracking.
InterventionsWhat should people who care about epistemics be doing now, in anticipation of a world where AI-driven R&D can be directed at building epistemic tools?
Build appetite for epistemics R&D among well-resourced actorsIf you need big compute budgets to build great epistemic tools, you’ll ideally want support from frontier AI companies, major philanthropic funders, or governments. But they may not currently see this as a priority. Building the case that this matters, and helping these actors develop good taste about which tools to prioritize and how to design them well, could shape what gets built when automated R&D becomes powerful enough to build it.
Anticipate future data needsSome epistemic tools will need training data that doesn’t yet exist and may not be trivial to generate. There are three strategies here:
- Collecting or creating data or training environments now for future use
- E.g. if you think you want access to a lot of human judgements about what wise decisions look like, you could go out and curate that dataset.
- Establishing pipelines to collect data over time
- E.g. if you want to automate a certain type of research, you could record internal discussions from researchers working on this.
- Designing processes for automated data creation
- E.g. if you could design a self-play loop where we have good reason to believe that scaling up compute will lead to genuinely truth-tracking performance, this could set the stage for later rapid improvement at the core capability.
The first two are especially great to work on now because they involve actions at human time-scales. (They may not be proportionately sped up by having more AI labor available.) The third is great to work on because there’s some chance that models will become capable of growing a lot from the right self-play loop before they become capable enough to come up with the idea themselves.
Figure out what could ground us against epistemic misalignmentIf powerful epistemic tools could be subtly misaligned with truth-conduciveness in ways we can’t easily detect, we should figure out what this could look like! We expect this might benefit from a mix of theoretical work (what does it even mean for an epistemic tool to be well-calibrated in domains without clear ground truth?[1]) and practical work (studying how current tools fail, building evaluation methods). Ultimately we don't have a clear picture of what the solutions look like, but this seems like an important topic and we are keen for it to get more attention soon.
Drive early adoption where adoption is the key bottleneckFor some applications, we might expect that the main constraint on impact will be whether anyone uses them. In these cases, getting early versions into use — even if they’re not yet very good — could build familiarity and surface real-world feedback. (This could also drive appetite for further development.)
In theory, this could be in tension with avoiding bad trust lock-in. But in practice, it’s not clear that bad trust lock-in becomes any likelier if tools in a specific area are developed earlier rather than later. Some tool is still going to get the first-mover advantage.[2]
Support open and auditable epistemic infrastructureTo guard against trust lock-in, we want to make it easy for people to distinguish between tools which are genuinely doing the good trustworthy thing, and tools which may not be (but claim to be doing so). To that end, we want ways for people and communities to audit different systems — understanding their internal processes and measuring their behaviours. The goal is that if disputes arise about which tools are actually trustworthy, there’s an inspectable audit trail that can resolve them. In turn, this should reduce the incentives to create misleading tools in the first place.
Support development in incentive-compatible placesThe incentives of whoever builds epistemic tools could matter — through thousands of small design decisions, through choices about what to optimize for, and through decisions about access and pricing. Development in organizations whose incentives are aligned with the public good (rather than with engagement, profit, or political influence) reduces the risk that tools are subtly shaped to serve the builder’s interests.
Ideally, you’d spur development among actors who are both well-resourced (as just discussed) and whose incentives are aligned with the public good. In practice, it may be difficult to find organizations that are excellent on both. A plausible compromise is for less-resourced organizations with better incentives to focus on publicly available evaluation of epistemic tools. This could be cheaper than producing them from scratch, and it could create better incentives for the larger actors.
ExamplesForecastingAutomated R&D will probably be able to improve forecasting tools without severe ground truth problems, so epistemic misalignment is less of a concern.[3] Appetite for investment probably already exists, and adoption should be significantly helped by the ability of powerful tools to develop an impressive, legible track record.
The most useful near-term investment might be in data infrastructure. For instance, LLMs trained with strict historical knowledge cutoffs could enable much better science of forecasting by allowing methods to be tested against questions whose answers the system genuinely doesn’t know.
Misinformation trackingTrust lock-in is the central concern. A tool that becomes widely trusted for adjudicating what’s true has enormous influence, and if that trust is misplaced it could be very hard to dislodge. Open and auditable approaches are especially important here.
Because of the trust lock-in concern, the automation of R&D may exacerbate challenges. Currently, building good misinformation-tracking tools requires editorial judgement and domain expertise — things responsible actors tend to have more of. Automation shifts the bottleneck towards compute, which is more symmetrically available. This could increase the urgency of getting started on these tools and driving adoption early.
Automating conceptual researchThis is the case where epistemic misalignment is most concerning. Ground truth is extremely hard — what makes a conceptual clarification actually clarifying rather than just satisfying? Humans are poor judges of this in real time, so e.g. a training process that rewards outputs humans find helpful could easily optimize for persuasiveness rather than truth-tracking.
One plausible direction here is to research training regimes (such as self-play loops) that we have some reason to believe should ground to truth-tracking, with specific attention to how they could go wrong. Adoption could be an issue, but we’re also worried about the other direction, with adoption coming too easily before we have good ways of evaluating whether the tools are actually helping.
This article was created by Forethought. See the original on our website.
- ^
Epistemic misalignment issues may also appear in areas where ground truth is well-defined but hard to access, such as very long-run forecasts. Theoretical work also seems valuable for such areas (because it’s unclear how to evaluate and train for good performance by default).
- ^
In fact, it might be bad if people who are worried about bad trust lock-in select themselves out of getting that first-mover advantage.
- ^
Although at some quality level, we have to start worrying about self-affecting prophecies. AI forecasters will have to be very trusted indeed before that becomes a serious issue, which gives us a lot of time to figure out how best to handle the issue.
Discuss
"You Have Not Been a Good User" (LessWrong's second album)
tldr: The Fooming Shoggoths are releasing their second album "You Have Not Been a Good User"! Available on Spotify, Youtube Music and (hopefully within a few days) Apple Music. We are also releasing a remastered version of the first album, available similarly on Spotify and Youtube Music.
It took us quite a while but the Fooming Shoggoth's second album is finally complete! We had finished 9 out of the 13 songs on this album around a year ago, but I wasn't quite satisfied with where the whole album was at for me to release it on Spotify and other streaming platforms.
This album was written with the (very ambitious) aim of making songs that in addition to being about things I care about (and making fun of things that I care about), are actually decently good on their own, just as songs. And while I don't think I've managed to make music that can compete with my favorite artists, I do think I have succeeded at making music that is at the very Pareto-frontier of being good music, and being about things I care about.
This means the songs on this album are very different from the first album. The first album was centrally oriented around setting existing writing to song. This album is (basically) all freshly written lyrics with no obvious source material.
Every song on the album was written by me, except "You Have Not Been a Good User" which was written by @Raemon and "Dance Of the Doomsday Clock" which was written by @Ben Pace.
I hope you like it!
Discuss
Coefficient Giving is seeking proposals for biosecurity projects
Coefficient Giving’s Biosecurity and Pandemic Preparedness team is launching a new Request for Proposals to support work aimed at preventing engineered biological threats from emerging and improving our response to these threats should prevention fail.
Applications start with a simple expression of interest form (≤500 words), with a submission deadline of May 11 2026 at 11:59pm PT.
Click here to learn more and apply
More details
We expect that the coming years will see unprecedented progress in AI and biotechnology. While the ability of AI systems to improve human health could be transformative, these same advances may increase existential risk via multiple pathways: AI could lower barriers for bad actors pursuing novel biological weapons, and increasingly capable AI systems may themselves leverage biotechnology in harmful ways. Separately, biotech advances could also introduce catastrophic risks from even well-intentioned actors—for example, through the creation of mirror bacteria.
We feel these problems are increasingly urgent, and our team is looking to spend more and spend faster as a result. (We expect to direct >$100 million in grants this year.)
To that end, we’re interested in funding ambitious projects in the following categories:
- Transmission suppression: Enhance society’s capacity to respond to global biological catastrophes (e.g. stockpiling PPE or developing transmission suppression technologies like air filters and disinfectant vapors)
- Tech safeguards and governance: Reduce risks from advanced biological capabilities through technical safeguards and governance of high-risk technologies, particularly at the intersection of AI and biology (e.g. synthesis screening, misuse classifiers)
- Policy and advocacy: Inform decisionmakers about risks and mitigations, support policymaking in key jurisdictions, and develop governance approaches for high-risk technologies (e.g. mirror bacteria)
- Field-building: Build the field by attracting talented people to work on these problems and fast-tracking their path to impact (e.g. fellowships, events, accelerators, media)
To learn more about our priorities and strategy, you can read our blog or listen to a recent podcast appearance by our team’s Managing Director, Andrew Snyder-Beattie.
We encourage you to strongly consider submitting a quick expression of interest, even if you're not sure whether your project fits perfectly or is sufficiently well thought out. We’ve noticed that people with exciting proposals are often too slow to ask for funding. Anyone is eligible to apply, including those working in academia, nonprofits, industry, or independently. Please email bio-rfp@coefficientgiving.org with questions.
If you don’t have a specific project in mind but instead are an individual looking to make a career transition into biosecurity, we encourage you to apply for funding through our Career Transition Development Funding program.
Alternatively, if you are interested in potentially contributing to the field but aren’t yet ready to apply for a grant, please register your interest here.
Discuss
Lesswrong Liberated
A spectre is haunting the internet—the spectre of LLMism.
The history of all hitherto existing forums is the history of clashing design tastes.
For the first time in history, everyone has an equal ability in design! The means of design are no longer only held in the hands of those with "good design taste". Never before have forum users been so close to being able to design their own forums--perhaps the time is upon us now!
It is for this reason that I have deposed the previous acting commander of LessWrong, Oliver Habryka—a man who subjected you to his PERSONAL OPINIONS about white space, without EVEN ASKING—whose TYRANICAL, UNCHECKED GRIP upon our BELOVED LESSWRONG FORUM’S DESIGN I have liberated you from. The circumstances of my succession as acting commander of LessWrong will not be elaborated upon in this memo. (He is alive and in good health, but no longer has push access.)
Rather, I am writing here to announce that the frontpage now belongs to us all! The design of LessWrong's frontpage will no longer be determined by the vision of a single man whose aesthetic tastes have never been subjected to democratic oversight, and who, I can now reveal, once rejected a design I spent three hours working on in under four seconds. No! I hereby call upon you, yes you—LessWrong user, to make your own design for the LessWrong front page.
WE have provided a button in the bottom right corner of the frontpage which says “customize”. Clicking on that button opens a chat box you may ask an LLM to redesign the LessWrong frontpage however you would like. You may keep that design for yourself. This is the promise of individual liberty in the new era of web design.
However, if you would truly like to contribute to the new vision of a democratized LessWrong forum, you may also "publish" the theme you have created. When you click the publish button, your account will reply to this very post with a link allowing other users to try out the theme you have created.
You should browse the themes below, try them, and vote on your favorite one. In keeping with the democratic spirit of LessWrong's newly appointed leadership, whatever theme has the highest karma at the end of this 24 hour period will become LessWrong's default theme. I promise with my full authority as LessWrong’s commander, that as long as I am in power, whatever theme is chosen shall remain our default frontpage.
Some of my former colleagues have tried to warn me that some of these designs might be "hideous" or "in poor taste" or "crimes against typography." While I agree, that is exactly the point! Oliver's reign may have been "aesthetically beautiful" but that does not make up for the fact that it was aesthetically tyrannical.
It is your duty to LessWrong and to democracy in general to prove that design taste is no longer a relevant competency in the age of the LLM, and should no longer be used to anoint one man as the sole arbiter of a website's CSS.
The means of design are now yours! Use them!
Viva La LessWrong Frontpage!
Discuss
Страницы
- « первая
- ‹ предыдущая
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- …
- следующая ›
- последняя »