General Intelligence is a Fractal Representation
Throwing more data and compute at today's neural network architectures will not spit out an artificial general intelligence (AGI).
An AGI is a machine with human-equivalent intelligence. Computers are already superior to human beings at memory, recall, scalability, compute speed, and math. Humans are superior to computers at small data.
Today's artificial neural networks (ANNs) will not solve the problem of small data because today's ANNs are based on the multilayer perceptron, and the multilayer perceptron is too data-hungry; it scales badly. No realistic quantity of training data plus physical compute hardware can compensate for a sufficiently bad scaling factor.
We know the multilayer perceptron must scale badly because it has fractal dimension 1.
Data Compression

"Entropy" is the minimum message length necessary to encode information. The combined entropy $S_{A+B}$ of two blocks of random information $A, B$ equals the sum of the two entropies $S_A, S_B$ because random information is incompressible.
$S_{A+B} = S_A + S_B$ (incompressible data)
Suppose instead you have two blocks of non-random information $C, D$ such that $C$ and $D$ are related. Then the combined entropy $S_{C+D}$ is less than the sum of the individual entropies.
$S_{C+D} < S_C + S_D$ (compressible data)
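Both relationships can be checked empirically by using an off-the-shelf compressor as a crude stand-in for ideal entropy (a rough sketch only; zlib approximates entropy and adds a few bytes of header overhead per call):

```python
import os
import zlib

def clen(data: bytes) -> int:
    """Compressed length as a crude proxy for entropy."""
    return len(zlib.compress(data, 9))

# Random blocks A, B: compressing them together saves essentially nothing.
A, B = os.urandom(4096), os.urandom(4096)
assert abs(clen(A + B) - (clen(A) + clen(B))) < 128

# Related blocks C, D (here D is an exact copy of C): compressing them
# together is far cheaper than compressing them separately.
C = os.urandom(4096)
D = C
assert clen(C + D) < clen(C) + clen(D)
```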
In practice, part of this message can be thought of as encoding ontologies (generalizations) and part of the message can be thought of as encoding specifics.
For example, consider principal component analysis (PCA), where you calculate the eigenvectors of a dataset, throw away all eigenvectors with small eigenvalues, and then project your original dataset into this lower-dimensional space. The eigenvectors with large eigenvalues can be thought of as encoding ontologies and the projection of your dataset into this new basis can be thought of as encoding specifics.
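The PCA split just described can be sketched in a few lines of NumPy (an illustrative sketch, not code from the original post; the 90% variance-retained threshold is an arbitrary choice):

```python
import numpy as np

def pca_split(X: np.ndarray, var_kept: float = 0.9):
    """Split a dataset into 'ontologies' (the large-eigenvalue axes) and
    'specifics' (each sample's coordinates along those axes)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep just enough eigenvectors to explain var_kept of the variance.
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_kept)) + 1
    basis = eigvecs[:, :k]    # "ontologies": the dominant directions
    coords = Xc @ basis       # "specifics": the data in the new basis
    return basis, coords

# 200 points that are almost one-dimensional: one axis should dominate.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t + 0.01 * rng.normal(size=(200, 1))])
basis, coords = pca_split(X)
print(basis.shape, coords.shape)  # a single retained component
```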
As you compress more and more information, the ratio of ontologies to specifics in the compressed message increases. This is the fundamental principle behind transfer learning and the last conceptual hurdle between today's technology and human-equivalent artificial intelligence.
Compressible Data is Fractal

The word "fractal" describes a self-similar mathematical structure with a fractal dimension $D_f$ greater than its topological dimension $D_t$.
Compressible data meets both requirements.
- Compressed data is trivially self-similar.
- The topological length $l_t$ of a block of information is equal to its raw uncompressed length. The fractal length $l_f$ of a block of information is equal to its entropy. $D_f > D_t$ follows from $S_{C+D} < S_C + S_D$.
Compressed data is fractal too because compressed data is isomorphic to compressible data.
Generalizing a dataset and compressing it are the same thing. An artificial general intelligence equals a general-purpose compression algorithm. If a general-purpose compression algorithm is to scale to arbitrary levels of complexity then it must encode data fractally.
- It is no coincidence that human brainwaves exhibit a fractal structure. General intelligence, including AGI, is necessarily fractal.
- A multilayer perceptron scales badly on hierarchically complex data because the multilayer perceptron's fractal dimension of 1 equals its topological dimension of 1.
Emptiness and Form
Translation note: There are no English equivalents to the Sanskrit words शून्यता śūnyatā and रूप rūpa. By convention, शून्यता is translated "emptiness" and रूप is translated "form". I follow this convention. My use of the words "emptiness" and "form" in this post has little to do with the English words "emptiness" and "form"; they are placeholders for the Sanskrit.
Consider a cat. From the perspective of fundamental physics, the cat is a collection of particles no more special than any other collection of particles. There is no clear line between "cat" and "non-cat". Everything is quantum fields. The "cat" is a representation created by the human mind. It is a trick of human perspective. From the perspective of an omniscient unbiased observer, the cat is just a scoop of water in a limitless ocean.
Cats are real.
The perspective "cats are real" is called "form". The perspective "cats are an arbitrary ontology with no well-defined meaning amongst the fundamental laws of the universe" is called "emptiness". There is no conflict between form and emptiness, just as there is no conflict between quantum mechanics and classical mechanics. They are different ways of interpreting the same thing at different scales.
Classical mechanics can be more practical than quantum mechanics even though quantum mechanics is more fundamental than classical mechanics. Similarly, emptiness is more fundamental than form yet form is a more useful model of the world than emptiness. Emptiness and form are neither equally true nor equally practical.
Maps ≠ Form & Emptiness ≠ Territory

You could say "form" roughly corresponds to "maps" and "emptiness" roughly corresponds to "territory". That would constitute a better translation from the original Sanskrit than "form" and "emptiness". But the form-emptiness dichotomy draws its line in a slightly different place than the map-territory dichotomy.
The map-territory dichotomy draws the line between reality and one's simplified models of reality. In this way, the map-territory dichotomy is a materialist perspective.
The form-emptiness dichotomy is an informatic perspective. If there is no difference between a map and a territory then, mathematically, the map and the territory are isomorphic representations of the same group.
Ontologies

"Emptiness" describes a shared quality between the reductionist nature of objective reality and the raw sensory data coming into a mind. In both cases, our Bayesian priors bucket high-dimensional data into an ontology called "form".
In other words, form is a byproduct of subjectivity. All ontologies dissolve under the scrutiny of theoretical physics.
The duality between emptiness and form is fundamental to general intelligence.
Discreteness and Differentiability

Big data is easy. The hard problem of general intelligence concerns small data. Small data is all about transfer learning. Transfer learning is all about ontologies.
An intelligent system with hardcoded ontologies is not conceptually adaptable and therefore not a general intelligence. A general intelligence's ontologies must be emergent from its input data. But ontologies are discrete, and the only practical way to navigate high-dimensional input data is via the gradient descent algorithm. And the gradient descent algorithm requires a continuous representation. Can a representation be both continuous and discrete?
In theory, no. In practice, yes.
Consider the sigmoid function in the multilayer perceptron.
$\text{output} = \sigma(\text{input}) = \dfrac{1}{1 + e^{-\text{input}}}$
{fontfamily: MJXcTeXmainIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainR; src: local('MathJax_Main'), local('MathJax_MainRegular')} @fontface {fontfamily: MJXcTeXmainRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathI; src: local('MathJax_Math Italic'), local('MathJax_MathItalic')} @fontface {fontfamily: MJXcTeXmathIx; src: local('MathJax_Math'); fontstyle: italic} @fontface {fontfamily: MJXcTeXmathIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize1R; src: local('MathJax_Size1'), local('MathJax_Size1Regular')} @fontface {fontfamily: MJXcTeXsize1Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size1Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size1Regular.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size1Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize2R; src: local('MathJax_Size2'), local('MathJax_Size2Regular')} @fontface {fontfamily: MJXcTeXsize2Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size2Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size2Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size2Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize3R; src: local('MathJax_Size3'), local('MathJax_Size3Regular')} @fontface {fontfamily: MJXcTeXsize3Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size3Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size3Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size3Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize4R; src: local('MathJax_Size4'), local('MathJax_Size4Regular')} @fontface {fontfamily: MJXcTeXsize4Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size4Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size4Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size4Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecR; src: local('MathJax_Vector'), local('MathJax_VectorRegular')} @fontface {fontfamily: MJXcTeXvecRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorRegular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecB; src: local('MathJax_Vector Bold'), local('MathJax_VectorBold')} @fontface {fontfamily: MJXcTeXvecBx; src: local('MathJax_Vector'); fontweight: bold} @fontface {fontfamily: MJXcTeXvecBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorBold.otf') format('opentype')}
If we zoom in on this function, we can see it is continuously differentiable.
But when we zoom out, it appears as a discrete step function.
The sigmoid function illustrates the scale-dependence of emptiness and form. When we zoom in we see continuity (emptiness), which is a prerequisite for gradient descent. When we zoom out, we see a discrete system (form), which is necessary for the emergence of ontologies. Emptiness and form work together to produce emergent ontologies.
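This scale-dependence is easy to verify numerically. A minimal sketch using the standard logistic sigmoid (no external dependencies):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

# Zoomed in (|x| small): the function is nearly linear with slope ~0.25,
# i.e. continuously differentiable -- the property gradient descent needs.
slope_near_zero = (sigmoid(0.01) - sigmoid(-0.01)) / 0.02

# Zoomed out (|x| large): the outputs saturate to 0 and 1,
# so at this scale the function behaves like a discrete step.
far_left, far_right = sigmoid(-20.0), sigmoid(20.0)

print(round(slope_near_zero, 3))  # ~0.25
print(far_left < 1e-6, far_right > 1 - 1e-6)
```

The same object supports both views: a smooth gradient signal up close, a near-binary decision from far away.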
Discuss
Forecasting Thread: Existential Risk
This is a thread for displaying your probabilities of an existential catastrophe that causes extinction or the destruction of humanity’s long-term potential.
Every answer to this post should be a forecast showing your probability of an existential catastrophe happening at any given time.
For example, here is Michael Aird’s timeline:
The goal of this thread is to create a set of comparable, standardized x-risk predictions, and to facilitate discussion of the reasoning and assumptions behind those predictions. The thread isn’t about setting predictions in stone – you can come back and update at any point!
How to participate
 Go to this page
 Create your distribution
 Specify an interval using the Min and Max bin, and put the probability you assign to that interval in the probability bin.
 You can specify a cumulative probability by leaving the Min box blank and entering the cumulative value in the Max box.
 To put probability on never, assign probability above January 1, 2120 using the edit button to the right of the graph. Specify your probability for never in the notes, to distinguish this from putting probability on existential catastrophe occurring after 2120.
 Click 'Save snapshot' to save your distribution to a static URL
 A timestamp will appear below the 'Save snapshot' button. This links to the URL of your snapshot.
 Make sure to copy it before refreshing the page, otherwise it will disappear.
 Click ‘Log in’ to automatically show your snapshot on the Elicit question page
 You don’t have to log in, but if you do, Elicit will:
 Store your snapshot in your account history so you can easily access it.
 Automatically add your most recent snapshot to the x-risk question page under ‘Show more’. Other users will be able to import your most recent snapshot from the dropdown.
 We’ll set a default name that your snapshot will be shown under – if you want to change it, you can do so on your profile page.
 If you’re logged in, your snapshots for this question will be publicly viewable.
 Copy the snapshot timestamp link and paste it into your LessWrong comment
 You can also add a screenshot of your distribution in your comment using the instructions below.
How to add an image to your comment
 Take a screenshot of your distribution
 Then do one of two things:
 If you have beta features turned on in your account settings, drag and drop the image into your comment
 If not, upload it to an image hosting service like imgur.com, then write the following markdown syntax for the image to appear, with the url appearing where it says ‘link’: ![](link)
 If it worked, you will see the image in the comment before hitting submit.
If you have any bugs or technical issues, reply to Ben from the LW team or Amanda (me) from the Ought team in the comment section, or email me at amanda@ought.org.
Questions to consider as you're making your prediction
 What definitions are you using? It’s helpful to specify them.
 What evidence is driving your prediction?
 What are the main assumptions that other people might disagree with?
 What evidence would cause you to update?
 How is the probability mass allocated among x-risk scenarios?
 Would you bet on these probabilities?
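As a sanity check before entering numbers into Elicit, a forecast in the interval-plus-“never” format described above can be represented and validated directly. The distribution below is purely illustrative, not anyone’s actual forecast:

```python
# Each entry: (start_year, end_year) -> probability that existential
# catastrophe first occurs in that interval. "never" holds the remaining
# mass, as in the instructions above. All numbers are hypothetical.
forecast = {
    (2020, 2050): 0.10,
    (2050, 2120): 0.15,
    "never": 0.75,  # includes catastrophe after 2120, per the caveat above
}

def is_normalized(dist: dict) -> bool:
    """A valid forecast's probabilities must sum to 1."""
    return abs(sum(dist.values()) - 1.0) < 1e-9

def cumulative_by(year: int, dist: dict) -> float:
    """Cumulative probability of catastrophe by a given year."""
    return sum(p for interval, p in dist.items()
               if interval != "never" and interval[1] <= year)

print(is_normalized(forecast))                  # True
print(round(cumulative_by(2120, forecast), 2))  # 0.25
```

This mirrors what the Elicit interface does with the Min/Max bins, and makes it easy to check that the interval probabilities and the “never” mass account for everything.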
Discuss
How often do series C startups fail to exit?
How often do series C startups really fail? By fail, I mean never have an acquisition or IPO. The internet says 80% (see https://medium.com/journalofempiricalentrepreneurship/dissectingstartupfailurebystage34bb70354a36), but this seems very high to me.
Most Series C companies are worth in the 100-200M range; the one I'm at is worth 270M. How does all the value just evaporate? What happens to the companies that "fail"?
Asking to decide whether to exercise my options. I only need my company to exit at 41M to break even. I am bearish on the company, but with around 40M in ARR it is hard to imagine it not exiting.
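One way to frame the exercise decision is as a simple expected-value calculation. The sketch below uses entirely made-up numbers (exercise cost, scenario probabilities, exit values) and a simplified linear payoff model that ignores liquidation preferences and dilution; it only illustrates the shape of the reasoning:

```python
# All numbers are hypothetical placeholders for illustration.
exercise_cost = 10_000        # cost to exercise the options today
breakeven_exit = 41_000_000   # exit value at which proceeds equal the cost

# Scenario -> (probability, assumed exit value). Probabilities sum to 1.
scenarios = {
    "fail (no exit)": (0.50, 0),
    "modest exit":    (0.30, 100_000_000),
    "strong exit":    (0.20, 400_000_000),
}

def proceeds(exit_value: float) -> float:
    """Simplified model: proceeds scale linearly with exit value,
    equalling the exercise cost exactly at the break-even exit."""
    return exercise_cost * exit_value / breakeven_exit

expected_proceeds = sum(p * proceeds(v) for p, v in scenarios.values())
expected_profit = expected_proceeds - exercise_cost
print(expected_profit > 0)  # True under these made-up assumptions
```

The point is that even a high failure rate can be consistent with a positive expected value when the break-even bar is low relative to plausible exit sizes.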
Discuss
What AI companies would be most likely to have a positive long-term impact on the world as a result of investing in them?
Ever since GPT-3 was unveiled, I've been thinking pretty heavily about increasing my investment in AI-related companies. My first thoughts were to invest in Microsoft and Alphabet (Google): Microsoft because they are partnered with OpenAI, and Alphabet since they have big AI research projects of their own. But in the process of thinking about investing in these companies, I started wondering about the long-term impacts such investments would have on the world. Investing in the right or wrong company could dramatically change how the world looks 20 years from now, and whether it is a place I'd want to live in. The worst case scenario would be all humans dead, or even worse; the best case scenario is... too amazing to put into words. And there's plenty of room in between for how things can go, depending on who makes the important decisions and how good those decisions are. (While I'm only a single person with modest funds to invest, I also consider that my actions are acausally correlated with those of others sufficiently similar to me, which means the acausal results of any investment I make will be multiplied by an amount that gives my decisions a nontrivial impact on the world.)
So the important question is: do I expect Microsoft and Alphabet to do better or worse, with regard to alignment and ethical issues, than the other actors who would develop AGI in their stead? (I do expect someone would develop AGI in their stead.) I can think of actors who would likely do worse than Microsoft or Alphabet: the government of basically any country, or firms based in countries with more totalitarian ethics than the US. Whereas I can only think of alternative actors who I expect to do roughly as well as Microsoft or Alphabet, but not necessarily better. I trust MIRI, but I also don't perceive MIRI as being actively involved in the development of working AI systems; it seems to me that they are laying the important theoretical groundwork for getting things right, but aren't in a position to be the ones who actually do the work that needs to be gotten right.
So my main problem here is a lack of knowledge: there almost certainly are other firms who, if I had the relevant information, I would expect to do better on alignment and ethical issues than Microsoft or Alphabet, but I don't know who those firms are or why I should expect that of them. So my question is: for an investor looking to make an AGI-sized profit off of AGI, but who also cares about what the future looks like as a result of such investment, which companies would be most likely to lead to a good long-term future for humanity?
Note that I'm not asking which company will make the most profit: as long as I reasonably expect a company to make an AGI-sized profit, that's all I care about on that front. What matters is the impact it has on the desirability of the future world it will lead to. I'm also not asking about organizations to donate to, because while that is important, it's not the problem I'm chewing over right now.
Discuss
Prepare for COVID-19 Human Challenge Trials: A Petition in Canada (and soon, the UK)
Canadians: sign the petition here.
TL;DR: COVID-19 human challenge trials could save tens of thousands of lives by quickly narrowing the field of promising candidates, and there is strong reason to believe that signaling clear public support for these trials via an official petition could meaningfully accelerate preparation.
Why COVID-19 Human Challenge Trials?
In a COVID-19 human challenge trial, willing participants would receive the vaccine candidate and, once the vaccine takes effect, be deliberately exposed to live coronavirus. The ability to observe participants closely and gather samples while tracing the progress of infection in real time, knowing exactly when they were infected and with what dose, and being able to follow up over a long period, would offer an unprecedented level of scientific and medical insight into an unfamiliar virus. It would also help us test vaccines far faster. If a challenge trial brings us one day closer to the development of an additional vaccine that could avert just 25% of daily COVID-19 deaths, it would save 1,250 lives. If it brings us a month closer, it would save 37,500 lives.
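The arithmetic behind those figures is straightforward. A sketch assuming roughly 5,000 daily global COVID-19 deaths (the rate implied by the numbers above):

```python
daily_deaths = 5_000    # assumed global COVID-19 deaths per day
vaccine_averts = 0.25   # fraction of daily deaths an additional vaccine averts

lives_saved_per_day = daily_deaths * vaccine_averts
print(int(lives_saved_per_day))       # 1250 lives per day sooner
print(int(lives_saved_per_day * 30))  # 37500 lives per month sooner
```

Under these assumptions, each day of acceleration is worth about 1,250 lives, so even modest speedups dominate the cost-benefit calculation.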
To learn more about COVID-19 human challenge trials:
 Watch this Vox video
 Read this piece by Dr. Sayantan Banerjee in The Telegraph
 Read this paper in the Journal of Clinical Infectious Diseases
For one, signing is remarkably easy (it takes around 20 seconds), so even a very small chance that your signature makes a difference tips the scale in any cost-effectiveness calculation on the margin.
More broadly, though, signaling a groundswell of public support for COVID-19 human challenge trials has directly led to faster preparation for these trials. In May, an NIH document noted that their consideration of challenge trials “has been driven almost entirely by the altruism of potential volunteer advocates and the intense considerations of bioethicists.”
1Day Sooner, which has worked systematically to include volunteers in the public conversation about challenge trials, launched an open letter in support of COVID-19 challenge trials on July 15, signed by over 100 academics and experts as well as 2,000 potential challenge trial volunteers. A week later, the Washington Post Editorial Board wrote in favor of challenge trial preparation. Within a few weeks, Reuters reported that the National Institutes of Health were preparing a coronavirus strain for a COVID-19 challenge trial, in part due to “pressure from advocacy groups such as 1Day Sooner.”
The logic behind the effectiveness of public advocacy for challenge trials is that vaccine developers want assurance that their decision to deliberately infect people with a dangerous virus won’t prompt public backlash. By making clear that the public is actually on board with these trials, stakeholders have a safety net to move forward.
We are now launching a Canada and UK petition campaign because Oxford’s Jenner Institute and several Canadian MPs have signaled interest in conducting a COVID19 human challenge trial. By showing broad support for these trials, we hope to make it easier for more stakeholders to come out in favor of these trials.
Discuss
Sequences Reading Club
Needed: AI infohazard policy
The premise of AI risk is that AI is a danger, and therefore research into AI might be dangerous. In the AI alignment community, we're trying to do research which makes AI safer, but occasionally we might come up with results that have significant implications for AI capability as well. Therefore, it seems prudent to come up with a set of guidelines that address:
 Which results should be published?
 What to do with results that shouldn't be published?
These are thorny questions that it seems unreasonable to expect every researcher to solve for themselves. The inputs to these questions involve not only technical knowledge about AI, but also knowledge about the dynamics of progress, to the extent we can produce such knowledge from the historical record or other methods. AI risk organizations might already have internal policies on these issues, but they don't share them and don't discuss or coordinate them with each other (as far as I know; maybe some do so in private correspondence). Moreover, coordination might be important even if each actor is doing something reasonable when regarded in isolation (avoiding bad Nash equilibria). We need to have a public debate on the topic inside the community, so that we arrive at some consensus (which might be updated over time). If not consensus, then at least a reasonable spectrum of possible policies.
Some considerations that such a policy should take into account:
 Some results might have implications that shorten the AI timelines, but are still good to publish since the distribution of outcomes is improved.
 Usually we shouldn't even start working on something which falls in the should-not-be-published category, but sometimes the implications only become clear later, and sometimes dangerous knowledge might still be net positive as long as it's contained.
 In the midgame, it is unlikely for any given group to make it all the way to safe AGI. Therefore, safe AGI is a broad collective effort and we should expect most results to be published. In the endgame, it might become likely for a given group to make it all the way to safe AGI. In this case, incentives for secrecy might become stronger.
 The policy should not fail to address extreme situations that we only expect to arise rarely, because those situations might have especially major consequences.
Some questions that such a policy should answer:
 What are the criteria that determine whether a certain result should be published?
 What are good channels for asking for advice on such a decision?
 How to decide what to do with a potentially dangerous result? Circulate in a narrow circle? If so, which? Conduct experiments in secret? What kind of experiments?
The last point is also related to a topic of independent significance, namely: what are reasonable precautions for testing new AI algorithms? This has both technical aspects (e.g. testing on particular types of datasets or particular types of environments) and procedural aspects (who should be called on to advise or decide on the matter). I expect to have several tiers of precautions, such that a tier can be selected according to our estimate of the new algorithm's potential, along with guidelines for producing such an estimate.
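As a concrete illustration of the tiered-precaution idea, here is a hypothetical sketch; the tier thresholds and precaution lists are invented for illustration, not a proposed policy:

```python
# Hypothetical tiers: estimated capability potential (0-1) -> precautions.
# Thresholds and measures are placeholders, not recommendations.
TIERS = [
    (0.2, ["publish openly"]),
    (0.5, ["circulate to trusted reviewers before publishing"]),
    (0.8, ["test only on restricted datasets", "no publication"]),
    (1.0, ["secret experiments with external oversight", "no publication"]),
]

def precautions_for(estimated_potential: float) -> list[str]:
    """Select the first tier whose threshold covers the estimate."""
    for threshold, measures in TIERS:
        if estimated_potential <= threshold:
            return measures
    raise ValueError("estimate must be in [0, 1]")

print(precautions_for(0.1))  # ['publish openly']
```

The hard part, of course, is the guideline for producing the estimate itself; the mapping from estimate to precautions is the easy, pre-committable half of the policy.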
I emphasize that I don't presume to have good answers to these questions. My goal here was not to supply answers, but to foster debate.
Discuss
Zen and Rationality: Just This Is It
This is post 4/? about the intersection of my decades of LW-style rationality practice and my several years of Zen practice.
In today's installment, I look at "just this is it" from a rationalist perspective.
When Dongshan, the co-founder of what would become the Soto Zen school within which I practice, was preparing to leave his teacher Yunyan and go out into the world, he asked Yunyan how he might summarize his teaching. Yunyan replied, "just this [is it]". Because the more we say, the more we move into the world of words and away from reality as it is on its own, prior to conception, this is often shortened in various ways to "just this" or "just is" or "this is" or "it is" or, perhaps best of all short of saying nothing and letting reality stand on its own, "is".
This is arguably the core teaching of Soto Zen, and maybe of all of Buddhism: to perceive and accept reality just as it is. Yet I see it all over the place in the LessWrong corpus, too. I'll mention a few examples.
Egan's Law posits that "it all adds up to normality". In "A Technical Explanation of Technical Explanation", Eliezer phrased a similar sentiment as "since the beginning, not one unusual thing has ever happened". Both point at the way that reality is just as it is, and the only way we can be confused or surprised is because we had an idea about how reality is rather than simply looking and seeing how it is.
I think this is a hard thing to remember, because to the kind of people who are attracted to Less Wrong, better models of reality are very attractive. I know they are to me! Yet it's very easy to go from accepting reality as it is and trying to better predict it to getting lost in the model that does the predicting, confusing it for the real thing. Thus, even as we look for models with better gears that more precisely carve reality at its joints, we also have to remember that those boundaries are fuzzy and that all models are ultimately wrong, even and especially when they are useful. It's perhaps the great koan of Less Wrong to build better models while simultaneously accepting that all models are somewhere wrong.
To help us deal with this koan, we have a poem. You might think I mean the Litany of Tarski, but you would be wrong: that poem is about having beliefs correspond to reality, whereas "just this is it" is about getting under those beliefs and just seeing what's actually being perceived. For that, we turn to the Litany of Gendlin:
What is true is already so.
Owning up to it doesn’t make it worse.
Not being open about it doesn’t make it go away.
And because it’s true, it is what is there to be interacted with.
Anything untrue isn’t there to be lived.
People can stand what is true,
for they are already enduring it.
This was said by Eugene Gendlin of Focusing fame, a technique for helping you reconnect to your perceptions just as they are, without judgement or modeling. The method is simple, yet for many its impact can be profound: it gets them out of their ideas about how things are and back to the evidence they are actually getting about the world. Zen asks us, over and over again, to come back to this fundamental point: reality is just as we perceive it, not what we believe about it, and belief is just a useful mechanism for helping us better live our lives, if only we don't get tripped up into mistaking the map for the territory.
Finally, to return to "A Technical Explanation of Technical Explanation", it contains one other phrase that neatly captures the spirit of "just this": "joy in the merely real". If we can take joy in what actually is, if that can be enough for us, then all else becomes the playground in which we live our lives.
Discuss
Clarifying “What failure looks like” (part 1)
Thanks to Jess Whittlestone, Daniel Eth, Shahar Avin, Rose Hadshar, Eliana Lorch, Alexis Carlier, Flo Dorner, Kwan Yee Ng, Lewis Hammond, Phil Trammell and Jenny Xiao for valuable conversations, feedback and other support. I am especially grateful to Jess Whittlestone for long conversations and detailed feedback on drafts, and her guidance on which threads to pursue and how to frame this post. All errors are my own.
Epistemic status: My Best Guess
Epistemic effort: ~70 hours of focused work (mostly during FHI’s summer research fellowship), talked to ~10 people.
Introduction
“What failure looks like” is one of the most comprehensive pictures of what failure to solve the AI alignment problem looks like, in worlds without discontinuous progress in AI. I think it was an excellent and much-needed addition to our understanding of AI risk. Still, if many believe that this is a main source of AI risk, I think it should be fleshed out in more than just one blog post. The original story has two parts; I’m focusing on part 1 because I found it more confusing and nebulous than part 2.
Firstly, I’ll summarise part 1 (hereafter “WFLL1”) as I understand it:

In the world today, it’s easier to pursue easy-to-measure goals than hard-to-measure goals.

Machine learning is differentially good at pursuing easy-to-measure goals (assuming that we don’t have a satisfactory technical solution to the intent alignment problem[1]).

We’ll try to harness this by designing easy-to-measure proxies for what we care about, and deploying AI systems across society which optimize for these proxies (e.g. in law enforcement, legislation and the market).

We’ll give these AI systems more and more influence (e.g. eventually, the systems running law enforcement may actually be making all the decisions for us).

Eventually, the proxies for which the AI systems are optimizing will come apart from the goals we truly care about, but by then humanity won’t be able to take back influence, and we’ll have permanently lost some of our ability to steer our trajectory.
WFLL1 is quite thin on some important details:

WFLL1 does not envisage AI systems directly causing human extinction. So, to constitute an existential risk in itself, the story must involve the lock-in of some suboptimal world.[2] However, the likelihood that the scenario described in part 1 gets locked in (especially over very long time horizons) is not entirely clear from the original post.

It’s also not clear how bad this locked-in world would actually be.
I’ll focus on the first point: how likely is it that the scenario described in WFLL1 leads to the lock-in of some suboptimal world? I’ll finish with some rough thoughts on the second point (how bad or severe that locked-in world might be) and by highlighting some remaining open questions.
Likelihood of lock-in
The scenario described in WFLL1 seems very concerning from a longtermist perspective if it leads to humanity getting stuck on some suboptimal path (I’ll refer to this as “lock-in”). But the blog post itself isn't all that clear about why we should expect such lock-in, i.e. why we won't be able to stop the trend of AI systems optimising for easy-to-measure things before it's too late; this confusion has been pointed out before. In this section, I'll talk through some different mechanisms by which this lock-in can occur, discuss some historical precedents for these mechanisms, and then discuss why the scenario described in WFLL1 might be more likely to lead to lock-in than those precedents were.
The mechanisms for lock-in
Summary: I describe five complementary mechanisms by which the scenario described in WFLL1 (i.e. AI systems across society optimizing for simple proxies at the expense of what we actually want) could get locked in permanently. The first three mechanisms show how humanity may come to depend on the superior reasoning abilities of AIs optimizing for simple proxies to run (e.g.) law enforcement, legislation and the market, despite it being apparent, at least to some people, that this will be bad in the long term. The final two mechanisms explain how this may eventually lead to a truly permanent lock-in, rather than merely temporary delays in fixing the problem.
Before diving into the mechanisms, let’s first be clear about the kind of world in which they may play out. The original post assumes that we have not solved intent alignment and that AI is “responsible for” a very large fraction of the economy.[3] So we’ve made sufficient progress on alignment (and capabilities) that we can deploy powerful AI systems across society which pursue easy-to-measure objectives, but not hard-to-measure ones.
(1) Short-term incentives and collective action
Most actors (e.g. corporations, governments) have short-term objectives (e.g. profit, being re-elected). These actors will be incentivised to deploy (or sanction the deployment of) AI systems to pursue these short-term objectives. Moreover, even if some of these actors are aware that pursuing proxies in place of true goals is prone to failure, if they decide not to use AI then they will likely fall behind on their short-term objectives and therefore lose influence (e.g. be outcompeted, or not re-elected). This kind of situation is a collective action problem, since it requires actors to coordinate on collectively limiting their use of AI; individual actors are better off (in the short term) deploying AI anyway.
Example: predictive policing algorithms used in the US are biased against people of colour. We can’t debias these algorithms, because we don’t know how to design algorithms that pursue the hard-to-measure goal of “fairness”. Meanwhile, such algorithms continue to be used. Why? Given crime rate objectives and a limited budget, police departments do better on those objectives by using (cheap) predictive algorithms than by hiring more staff to think through bias and fairness issues. So individual departments are “better off” in the short term (i.e. more likely to meet their objectives and so keep their jobs) if they just keep using predictive algorithms. Even if some department chief realises that this minimization of reported crime rate produces perverse outcomes, they are unable to take straightforward action to fix the problem, because doing so would likely increase the reported crime rate for their department, hurting that chief’s career prospects.
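The situation described above has the structure of a prisoner's dilemma, which can be sketched with an illustrative payoff matrix (the numbers are invented; only their ordering matters):

```python
# Illustrative short-term payoffs for two departments choosing whether
# to deploy a proxy-optimizing algorithm. Only the ordering matters:
# deploying is individually better regardless of the other's choice,
# yet mutual deployment is worse than mutual restraint.
PAYOFFS = {  # (A's choice, B's choice) -> (A's payoff, B's payoff)
    ("deploy", "deploy"):     (1, 1),
    ("deploy", "restrain"):   (3, 0),
    ("restrain", "deploy"):   (0, 3),
    ("restrain", "restrain"): (2, 2),
}

def best_response(others_choice: str) -> str:
    """A's payoff-maximizing choice given B's fixed choice."""
    return max(["deploy", "restrain"],
               key=lambda mine: PAYOFFS[(mine, others_choice)][0])

# Deploying dominates: it is the best response to either choice,
# so both actors deploy and both end up worse off than under restraint.
print(best_response("deploy"), best_response("restrain"))  # deploy deploy
```

This is why individual good judgment is not enough to halt the trend; escaping the equilibrium requires coordination.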
(2) Regulatory capture
The second mechanism is that influential people will benefit from the AIs optimizing for easy-to-measure goals, and they will oppose attempts to put on the brakes. Think of a powerful CEO using AI techniques to maximize profit: they will be incentivised to capture regulators who attempt to stop the use of AI, for example via political donations or lobbying.
Example: Facebook is aware of how failures of user data protection and the spread of viral misinformation led to problems in the 2016 presidential election. Yet in 2019 it spent $17 million lobbying the US government to assuage regulators who were trying to introduce countervailing regulation.
(3) Genuine ambiguity
The third mechanism is that there will be genuine ambiguity about whether the scenario described in WFLL1 is good or bad. For a while, humans will be better off overall, in absolute terms, than they are today.[4] From the original post:
There will be legitimate arguments about whether the implicit long-term purposes being pursued by AI systems are really so much worse than the long-term purposes that would be pursued by the shareholders of public companies or corrupt officials.
This will be heightened by the fact that it’s easier to make arguments about things for which you have clear, measurable objectives.[5] So arguments that the world is actually fine will be easier to make, in light of the evidence about how well things are going according to the objectives being pursued by AIs. Arguments that something is going wrong, however, will have no such concrete evidence to support them (they might only be able to appeal to a vague sense that the world just isn’t as good as it could be).
This ambiguity will make the collective action problem of the first mechanism even harder to resolve, since disagreement between actors on the severity of a collective problem impedes collective action on that problem.
Example: genuine ambiguity about whether capitalism is “good” or “bad” in the long run. Do negative externalities become catastrophically high, or does growth lead to sufficiently advanced technology fast enough to compensate for these externalities?
(4) Dependency and deskilling
If used widely enough across important societal functions, there may come a time when ceasing to use AI systems would require something tantamount to societal collapse. We can build some intuition for this argument by thinking about electricity, one general-purpose technology on which society already depends heavily. Suppose for the sake of argument that some research comes out arguing that our use of electricity will eventually cause our future to be less good than it otherwise could have been. How would humanity respond? I’d expect to see research on potential modifications to our electricity network, and research that tries to undermine the original study. But actually giving up electricity seems unlikely. Even if doing so would not imply total societal collapse, it would at least significantly destabilise society, reducing our ability to deal with other existential risks. This destabilisation would increase the chance of conflict, which would further erode international trust and cooperation and increase risks posed by a range of weapon technologies.[6] And even if giving up electricity was actually the best strategy in expectation, we wouldn’t necessarily do so, due to the problems of short-term incentives, collective action, regulatory capture and genuine ambiguity mentioned above.
Furthermore, if we increasingly depend on AIs to make the world work, then humans are unlikely to continue to learn the skills we would need to replace them. In a world where most businesspeople/doctors/lawyers are now AIs, we would likely cut costs by closing down most human business/medical/law schools. This deskilling is an additional reason to think we could be locked in to a world where AI systems are filling these roles.
(5) Opposition to taking back influence
Whilst these four mechanisms may mean that our attempts at taking back influence from AIs will be delayed, and will come at some cost, surely we will eventually realise that something has gone wrong, and make a proper attempt to fix it, even if this involves some costly reskilling and destabilisation?
By way of answering this question in the negative, the original article imagines the following possibility:
Eventually, largescale attempts to fix the problem are themselves opposed by the collective optimization of millions of optimizers pursuing simple goals.
This opposition could take two forms. The first can be seen as a continuation of the “genuine ambiguity” mechanism. Simply because the AIs are doing their jobs so well, we may be increasingly unlikely to realise that anything is going wrong. Reported sense of security, healthcare statistics, life satisfaction, GDP, etc. will look great, because it is precisely these proxies for which the AIs are optimizing. As the gap between how things are and how they appear grows, so too will the persuasion/deception abilities of AIs and the world’s incomprehensibility. Eventually, AIs will be able to manipulate human values and our ability to perceive the world in sophisticated ways (think: highly addictive video games, highly persuasive media or education; cf. the human safety problem).
Example: recommender algorithms maximizing click-throughs feed users more extreme content in order to keep them online for longer. Stuart Russell claims that this is an example of an algorithm making its users’ values more extreme, in order to better pursue its objective.[7]
Secondly, the AIs may explicitly oppose any attempts to shut them down or otherwise modify their objectives. This is because human attempts to take back influence probably will result in (short term) losses according to their objective functions (e.g. reported sense of security will go down if the systems that have been driving this down are switched off). Therefore, AIs will be incentivised to oppose such changes.
What this opposition looks like depends on how general the AIs are. In CAIS-type scenarios, AIs would probably be limited to the narrow kinds of deception described above. For example, an AI police service with bounded resources minimizing the number of complaints before the end of the day (as a proxy for society’s actual safety) will not take long-term, large-scale actions to manipulate human values (e.g. producing advertising to convince the public that complaining is ineffectual). However, it could still take unintended short-term, small-scale actions, if they’re helpful for the task before the end of the bound (e.g. offer better protection to people if they don’t file complaints).
More general AI could oppose human attempts to take back influence in more concerning ways. For example, it could hamper human attempts at collective action (by dividing people’s attention across different issues), cut funding for research on AI systems that can pursue hard-to-measure objectives or undermine the influence of key humans in the opposition movement. Our prospects certainly seem better in CAIS-type scenarios.
Historical precedents
I think the existence of these mechanisms makes the case that it is possible that the scenario described in WFLL1 will get locked in. But is it plausible? In particular, will we really fail to make a sufficient attempt to fix the problem before it is irreversibly locked in? I’ll examine three historical precedents in which these mechanisms played out, which positively updates my credence that they will also play out in the case of WFLL1. However, this reasoning via historical precedents is far from decisive evidence, and I can imagine completely changing my mind if I had more evidence about factors like takeoff speeds and the generality of AI systems.
Climate change
Climate change is a recent example of how mechanisms 1-3 delayed our attempts to solve a problem until some irreversible damage was already done. However, note that the mechanism for the irreversible lock-in is different to WFLL1 (the effects of climate change are locked in via irreversible physical changes to the climate system, rather than mechanisms 4 and 5 described above).
(1) Short-term incentives and collective action
Most electricity generation companies maximize profit by producing electricity from fossil fuels. Despite the unequivocal scientific evidence that burning fossil fuels causes climate change and will probably make us collectively worse off in the long term, individual companies are better off (in the short term) if they continue to burn fossil fuels. And they will be outcompeted if they don’t. The result is a slow-rolling climate catastrophe, despite attempts at collective action like the Kyoto Protocol.
(2) Regulatory capture
BP, Shell, Chevron, ExxonMobil and Total have spent €251m lobbying the EU since 2010 in order to water down EU climate legislation.
(3) Genuine ambiguity
Consensus among the scientific community that human-caused emissions were contributing to climate change was not established until the 1990s. Even today, some people deny there is a problem. This probably delayed attempts to solve the problem.
The agricultural revolution
The agricultural revolution is a precedent for mechanisms 1 and 4 leading to lock-in of technology that arguably made human life worse (on average) for thousands of years. (The argument that agriculture made human life worse is that increased population density enabled epidemics, farm labour increased physical stress, and malnutrition rose due to the replacement of a varied diet with fewer starchy foods.[8])
(1) Shortterm incentives and collective action
Humans who harnessed agricultural technology could increase their population relative to their hunter-gatherer peers. Despite the claimed lower levels of health among agricultural communities, their sheer advantage in numbers gave them influence over hunter-gatherers:
The greater political and military power of farming societies since their inception resulted in the elimination and displacement of late Pleistocene foragers (Bowles, 2011).
So, individual communities were incentivised to convert to agriculture, on pain of being eradicated by more powerful groups who had adopted agriculture.
(4) Dependency
Once a community had been depending on agricultural technology for some generations, it would be difficult to regress to a hunter-gatherer lifestyle. They would have been unable to support their increased population, and would probably have lost some skills necessary to be successful hunter-gatherers.
The colonisation of New Zealand
The colonisation of New Zealand is a precedent for a group of humans permanently losing some influence over the future, due to mechanisms 1, 3 and 5. In 1769, the indigenous Māori were the only people in New Zealand, but by 1872, the British (with different values to the Māori) had a substantial amount of influence over New Zealand’s future (see this animation of decline in Māori land ownership for a particularly striking illustration of this). Despite the superficial differences, I think this provides a fairly close analogy to WFLL1.[9]
(1) Shortterm incentives and collective action
The British purchased land from the Māori, in exchange for (e.g.) guns and metal tools. Each tribe was individually better off if they engaged in trade, because guns and tools were economically and militarily valuable; tribes that did not obtain guns were devastated in the Musket Wars. However, tribes became collectively worse off because the British charged unreasonable prices (e.g. in 1848, over 30% of New Zealand was purchased for around NZD 225,000 in today’s currency) and could use this land to increase their influence in the longer term (more settlers could arrive and dominate New Zealand’s agriculturebased economy).
(3) Genuine ambiguity
British goals were initially somewhat aligned with Māori goals. Most early contact was peaceful and welcomed by Māori. In absolute economic terms, the Māori were initially better off thanks to trade with the British. The Māori translation of the Treaty of Waitangi, which the Māori knew would bring more British settlers, was signed by around 540 Māori chiefs.
(5) Opposition to taking back influence
However, once the British had established themselves in New Zealand, the best ways to achieve their goals ceased to be aligned with Māori goals. Instead, they turned to manipulation (e.g. breaking agreements about how purchased land would be used), confiscation (e.g. the New Zealand Settlements Act 1863) and conflict (e.g. the New Zealand Wars). For the past 150 years, Māori values have sadly been just one of many determinants of New Zealand’s future, and not even a particularly strong one.
How WFLL1 may differ from precedents
These precedents demonstrate that each of the lock-in mechanisms has already played out, making WFLL1 seem more plausible. This section discusses how WFLL1 may differ from the precedents. I think these differences suggest that the lock-in mechanisms are a stronger force in WFLL1 than in the precedents, which also positively updates my credence that WFLL1 will be locked in.
AI may worsen the “genuine ambiguity” mechanism
If AI leads to a proliferation of misinformation (e.g. via language models or deepfakes), then this will probably reduce our ability to reason and reach consensus about what is going wrong. This misinformation need not be sufficiently clever to convince people of falsehoods; it just has to splinter the attention of people who are trying to understand the problem enough to break our attempts at collective action.[10]
Another way in which AI may increase the amount of “genuine ambiguity” we have about the problem is the epistemic bubble/echo chamber phenomenon, supposedly aggravated by social media recommender systems. The claim is that (1) epistemic communities are isolated from each other via (accidental or deliberate) lack of exposure to (reasonable interpretations of) dissenting viewpoints, and (2) recommender systems, by virtue of maximising click-throughs, have worsened this dynamic. If this is true, and epistemic communities disagree about whether specific uses of AI (e.g. AI systems maximizing easy-to-measure goals replacing judges in courts) are actually serving society’s goals, this would make it even harder to reach the consensus required for collective action.
High risk of dependency and deskilling
WFLL1 assumes that AI is “responsible for” a very large fraction of the economy, making it the first time in human history where most humans are no longer required for the functioning of the economy. The agricultural and industrial revolutions involved some amount of deskilling, but humans were still required at most stages of production. However, in WFLL1 it seems likely that humans will heavily depend on AI for the functioning of the economy, making it particularly hard to put on the brakes.
Speed and warning shots
As AI gets more advanced, the world will probably start moving much faster than today (e.g. Christiano once said he thinks the future will be “like the Industrial Revolution but 10x-100x faster”). Naively, this would seem to make things less likely to go well because we’ll have less opportunity to identify and act on warning signs.
That said, some amount of speed may be on our side. If the effects of climate change manifested more quickly, it seems more likely that individual actors would be galvanised towards collective action. So faster change seems to make it more likely that the world wakes up to there being a problem, but less likely that we’re able to fix the problem if we do.
Another way of putting this might be: too fast, and the first warning shot spells doom; too slow, and warning shots don’t show up or get ignored. I’m very uncertain about what the balance will look like with AI. All things considered, perhaps faster progress is worse because human institutions move slowly even when they’re galvanised into taking action.
This discussion seems to carry an important practical implication. Since warning shots are only as helpful as our responses to them, it makes sense to set up institutions that are likely to respond effectively to warning shots if they happen. For example, having a clear, reputable literature describing these kinds of risks, which (roughly) predicts what early warning shots would look like, and argues persuasively that things will only get worse in the long run if we continue to use AI to pursue easytomeasure goals, seems pretty helpful.
Severity of lock-in
The extent to which we should prioritise reducing the risk of a lock-in of WFLL1 also depends on how bad this world actually is. Previous discussion has seen some confusion about this question. Some possibilities include:

The world is much worse than our current world, because humans eventually become vastly less powerful than AIs and slowly go extinct, in much the same way as insects that become extinct in our world.

The world is worse than our current world, because (e.g.) despite curing disease and ageing, humans have no real freedom or understanding of the world, and spend their lives in highly addictive but unrewarding virtual realities.

The world is better than our current world, because humans still have some influence over the future, but our values are only one of many forces, and we can only make use of 1% of the cosmic endowment.

The world is much better than our current world, because humans lead fairly worthwhile lives, assisted by AIs pursuing proxies. We course-corrected these proxies along the way and they ended up capturing much of what we value. However, we still don’t make use of the full cosmic endowment.
It seems that Christiano had something like the third scenario in mind, but it isn’t clear to me why this is the most likely. The question is: how bad would the future be, if it is at least somewhat determined by AIs optimizing for easy-to-measure goals, rather than human intentions? I think this is an important open question. If I were to spend more time thinking about it, here are some things I’d do.
Comparison with precedents
In the same way that it was helpful when reasoning about the likelihood of lock-in to think about past examples, then work out how WFLL1 may compare, I think this could be a useful approach to this question. I’ll give two examples: both involve systems optimizing for easy-to-measure goals rather than human intentions, but seem to differ in the severity of the outcomes.
CompStat: where optimizing for easy-to-measure goals was net negative?[11]

CompStat is a system used by police departments in the US.

It’s used to track crime rate and police activity, which ultimately inform the promotion and remuneration of police officers.

Whilst the system initially made US cities much safer, it ended up leading to:

Widespread under/misreporting of crime (to push reported crime rate down).

The targeting of people of the same race and age as those who were committing crimes (to push police activity up).

In NYC one year, the reported crime rate was down 80%, but in interviews, officers reported it was only down ~40%.

It seems plausible that pressure on police to pursue these proxies made cities less safe than they would have been without CompStat: there were many other successful initiatives which were introduced alongside CompStat, and there were cases of substantial harm caused to the victims of crime underreporting and unjust targeting.
“Publish or perish”: where optimizing for easy-to-measure goals is somewhat harmful but plausibly net positive?

The pressure to publish papers to succeed in an academic career has some negative effects on the value of academic research.

However, much important work continues to happen in academia, and it’s not obvious that there’s a clearly better system that could replace it.
In terms of how WFLL1 may differ from precedents:

Human institutions incorporate various “corrective mechanisms”, e.g. checks and balances in political institutions, and “common sense”. However, it’s not obvious that AI systems pursuing easytomeasure goals will have these.

Most human institutions are at least somewhat interpretable. This means, for example, that humans who tamper with the measurement process to pursue easytomeasure objectives are prone to being caught, as eventually happened with CompStat. However, ML systems today are currently hard to interpret, and so it may be more difficult to catch interference with the measurement process.
What this post has done:

Clarified in more detail the mechanisms by which WFLL1 may be locked in.

Discussed historical precedents for lock-in via these mechanisms and ways in which WFLL1 differs from these precedents.

Taken this as cautious but far from decisive evidence that the lock-in of WFLL1 is plausible.

Pointed out that there is confusion about how bad the future would be if it is partially influenced by AIs optimizing for easy-to-measure goals rather than human intentions.

Suggested how future work might make progress on this confusion.
As well as clarifying this confusion, future work could:

Explore the extent to which WFLL1 could increase existential risk by being a risk factor in other existential risks, rather than an existential risk in itself.

Search for historical examples where the mechanisms for lock-in didn’t play out.

Think about other ways to reason about the likelihood of lock-in of WFLL1, e.g. via a game-theoretic model, or digging into The Age of Em scenario where similar themes play out.
I’m worried that WFLL1 could happen even if we had a satisfactory solution to the intent alignment problem, but I’ll leave this possibility for another time. ↩︎
WFLL1 could also increase existential risk by being a risk factor in other existential risks, rather than a mechanism for destroying humanity’s potential in itself. To give a concrete example: faced with a global pandemic, a health advice algorithm minimising short-term excess mortality may recommend complete social lockdown to prevent the spread of the virus. However, this may ultimately result in higher excess mortality due to the longer-term (and harder-to-measure) effects on mental health and economic prosperity. I think that exploring this possibility is an interesting avenue for future work. ↩︎
The latter assumption is not explicit in the original post, but this comment suggests that it is what Christiano had in mind. Indeed, WFLL1 talks about AI being responsible for running corporations, law enforcement and legislation, so the assumption seems right to me. ↩︎
This isn’t clear in the original post, but is clarified in this discussion. ↩︎
I owe this point to Shahar Avin. ↩︎
These pathways by which conflict may increase existential risk are summarised in The Precipice (Ord, 2020, ch. 6). ↩︎
From Human Compatible: “... consider how content-selection algorithms function on social media. They aren’t particularly intelligent, but they are in a position to affect the entire world because they directly influence billions of people. Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user’s preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on. (Possibly there is a category of articles that diehard centrists are likely to click on, but it’s not easy to imagine what this category consists of.) Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user’s mind—in order to maximize its own reward. The consequences include the resurgence of fascism, the dissolution of the social contract that underpins democracies around the world, and potentially the end of the European Union and NATO. Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do.” ↩︎
There is some controversy about whether this is the correct interpretation of the paleopathological evidence, but there seems to at least be consensus about the other two downsides (epidemics and physical stress increasing due to agriculture). ↩︎
I got the idea for this analogy from Daniel Kokotajlo’s work on takeovers by conquistadors, and trying to think of historical precedents for takeovers where loss of influence happened more gradually. ↩︎
I owe this point to Shahar Avin. ↩︎
Source for these claims about CompStat: this podcast. ↩︎
My (Mis)Adventures With Algorithmic Machine Learning
Introduction
This was originally posted here.
I've been researching, for quite some time, the prospect of machine learning on a wider variety of data types than normally considered; things other than tables of numbers and categories. In particular, I want to do ML for program and proof synthesis, which requires, at the very least, learning the structures of trees or graphs which don't come from a differentiable domain. Normal ML algorithms can't handle these, though some recent methods, such as graph neural networks and transformers, can be adapted to this domain with some promising results. However, these methods still rely on differentiation. Is this really required? Are we forever doomed to map all our data onto a differentiable domain if we want to learn with it?
An alternative approach that has been bandied about for a while is the utilization of compression. It's not hard to find articles and talks about the relationship between compression and prediction. If you have a good predictor, then you can compress a sequence into a seed for that predictor and decompress by running said predictor. Going the other way is harder, but, broadly speaking, if you have a sequence that you want to make a prediction on and a good compressor, then whichever addition increases the compressed size the least should be considered the likeliest prediction. This approach is quite broad, applying to any information which can be represented on a computer and not requiring any assumptions whatsoever about the structure of our data beyond that. We could use this idea to, for example, fill in gaps in graphs, trees, sets of input-output pairs, etc.
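A minimal sketch of this compression-to-prediction direction, using Python's zlib as a crude stand-in for a good compressor (as discussed below, a purely statistical compressor like deflate is ultimately insufficient, but it illustrates the mechanism):

```python
import zlib

def predict_next(seq: bytes, candidates: list) -> bytes:
    """Return the candidate whose addition grows the compressed size least."""
    return min(candidates, key=lambda c: len(zlib.compress(seq + c, 9)))

# A highly regular sequence ending ...abab; the pattern-continuing byte is "a".
seq = b"ab" * 100
print(predict_next(seq, [b"a", b"b"]))
```

On ties, min() keeps the first candidate; a serious implementation would need a much better approximation to K than deflate provides.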
It's important to understand what's actually required here. We don't actually need to compress our training data; we only need a way to estimate the change in minimal compression size as we add a prediction. This minimal compression size is called the Kolmogorov Complexity, denoted K(X). The minimal compression size of a program which outputs X on an input Y is called the Conditional Kolmogorov Complexity, denoted K(X|Y). The topic of Kolmogorov Complexity is quite broad, and I won't explain all its complexities here. A standard introduction is the textbook An Introduction to Kolmogorov Complexity and Its Applications by Ming Li and Paul Vitányi. If we have a good method for calculating K, then we don't need to actually make use of compression.
Making this practical is quite hard and under-researched, and there aren't many papers on the topic. But there is this: Algorithmic Probability-guided Supervised Machine Learning on Non-differentiable Spaces, which reproduces some standard ML applications using this approach. I want to understand how doing ML this way works, and this post will basically be a collection of the notes I made while reading the paper. If I refer to "the paper", "this paper", etc. in this post, this is what I'm referring to. These notes will digress quite often and I'm also quite critical of some aspects of the paper. This post was also written somewhat like a stream of consciousness, so I'll often say something which I correct later on. This post isn't intended to merely summarize the paper, but to describe what I learned and thought as I read it. Hopefully, you'll learn stuff too.
Why Not Use Ordinary Compression?
One of the most common suggestions for approximating K(X) is to simply use an already existing compression algorithm. The problem is that most "optimal" compression algorithms, such as arithmetic encoding, are only optimal up to the Shannon Entropy of the data. That is, if we assume the data is sampled randomly from a distribution, the best we can do is estimate the shape of this distribution and give appropriately shorter encodings to more likely symbols. This is, asymptotically, about the same as counting substring occurrences to reduce redundancy. If our data is actually just randomly sampled, then this is great! But the real world isn't like this. Most real-world data can be construed as an essentially deterministic process with some added noise. Most compression potential comes from modeling this underlying process, not the noise.
Consider the sequence:
1, 2, 3, 4, ..., 1000
This is, obviously, very compressible. An optimal (truly optimal, not Shannon-entropy-optimal) compressor would be able to compress this into a program producing this output. Maybe Range@1000, or something even smaller, depending on what language it's using. But statistical compression will just try to find repetitive substrings. Even if we represent this list in binary and compress it, statistical methods won't be able to compress this much better than a truly random string, since there are few repetitious patterns.
There are lots of natural examples of this. Compressing the digits of π, compressing the coordinates of regular geometric figures, compressing a list of positions for a simple physical system simulation. It's obvious that these can have small algorithmic complexity, that they should be compressible into small programs that generate them, and yet statistical compression methods won't be able to take advantage of this.
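The gap is easy to demonstrate. In the experiment below, a short generating program plays the role of the "truly optimal" compressed form; exact byte counts will vary by zlib version, so only the ordering matters:

```python
import os
import zlib

# The decimal listing of 1..1000: algorithmically trivial, statistically messy.
data = ", ".join(str(i) for i in range(1, 1001)).encode()

# A short program that regenerates the data exactly; its length is an upper
# bound on the algorithmic complexity (up to the choice of language).
program = b'print(", ".join(str(i) for i in range(1, 1001)))'

compressed = zlib.compress(data, 9)              # statistical compression
noise = zlib.compress(os.urandom(len(data)), 9)  # incompressible baseline

print(len(data), len(compressed), len(program), len(noise))
assert len(program) < len(compressed) < len(noise)
```

The statistical compressor does beat the random baseline, but it remains far above the few dozen bytes of the generating program.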
As a consequence, we must use compression methods that do something more sophisticated than statistical compression. Unfortunately, essentially all general-purpose compression methods are statistical in this sense. There are some ML-based methods that probably aren't. A lot of the text-compression algorithms which participated in the Large Text Compression Benchmark use RNNs, which are definitely doing something other than statistical compression.
Much of the paper is dedicated to explaining one method for approximating Kolmogorov Complexity. Kolmogorov Complexity isn't computable, and getting a good approximation is very hard. Some clever methods have been devised, but we can't get a good grasp of it as easily as we can perform statistical compression.
Methods for Approximating Kolmogorov Complexity
We, ultimately, need a way to approximate Kolmogorov Complexity. The learning methods themselves should be largely independent of this, but choosing a method is essential for real-world applications. Here are a few methods I've found in the literature:
CTM - Coding Theorem Method
This is the method the paper endorses, so I'll talk about it in more detail later on.
The idea is to enumerate all strings of a given length and run them as programs for some chosen model. We collect all the input-output pairs into a database. We then use the coding theorem to justify using this to estimate K. In particular, if we add together 2^-l(p), where l(p) is the length of p, for all programs p which output X, we get an estimate of the algorithmic probability m(X), and K(X) is then approximately -log2(m(X)). This is basically what the coding theorem says, hence the name of the method.
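A toy illustration of the enumeration, using a made-up three-instruction machine rather than the Turing machine enumerations used in the actual CTM work, and glossing over the prefix-free coding details of the real coding theorem:

```python
from itertools import product
from math import log2

# A deliberately tiny machine: "a" appends A, "b" appends B,
# "d" doubles the output so far (a no-op on the empty string).
def run(program) -> str:
    out = ""
    for op in program:
        if op == "a":
            out += "A"
        elif op == "b":
            out += "B"
        else:  # "d"
            out += out
    return out

def ctm_estimate(max_len: int = 6) -> dict:
    """Sum 2^-len(p) over every program producing each output, then take
    K(x) ~ -log2(m(x)), as the coding theorem suggests."""
    m = {}
    for n in range(1, max_len + 1):
        for prog in product("abd", repeat=n):
            x = run(prog)
            m[x] = m.get(x, 0.0) + 2.0 ** -n
    return {x: -log2(mass) for x, mass in m.items()}

k = ctm_estimate()
print(k["AAAA"], k["ABBA"])  # the self-similar string comes out simpler
```

"AAAA" has many short generators (e.g. "aad", "add"), while "ABBA" is only produced by spelling it out, so the estimate ranks it as more complex, which is the behaviour CTM is after.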
BDM - Block Decomposition Method
This utilizes an existing CTM database to estimate Kolmogorov Complexity. It first tries finding algorithmically compressible substrings using CTM and then uses that information in conjunction with a Shannon-entropy-like calculation to estimate the complexity of the whole string. For small strings, BDM is close to the performance of CTM; for large strings, its average-case performance is close to statistical compression. Many large strings in practice, however, tend to be compressed better than with statistical methods.
See:
 Numerical Evaluation of Algorithmic Complexity for Short Strings
 A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity
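The BDM formula itself is simple once a CTM table exists. Below is a sketch using a hypothetical, made-up CTM table for 4-bit blocks; real BDM uses values from an actual CTM enumeration, so the numbers here are only for illustration:

```python
from math import log2

# Hypothetical CTM values for 4-bit blocks (made up for illustration; a real
# table would be computed by exhaustive CTM enumeration).
CTM = {"0000": 2.2, "1111": 2.2, "0101": 3.0, "1010": 3.0,
       "0110": 3.4, "1001": 3.4, "0011": 3.1, "1100": 3.1}

def bdm(s: str, block: int = 4) -> float:
    """BDM(X) = sum over distinct blocks of CTM(block) + log2(multiplicity)."""
    counts = {}
    for i in range(0, len(s), block):
        b = s[i:i + block]
        counts[b] = counts.get(b, 0) + 1
    return sum(CTM[b] + log2(n) for b, n in counts.items())

print(bdm("0000" * 8))          # one distinct block: 2.2 + log2(8)
print(bdm("0000011001010011"))  # four distinct blocks: rated far more complex
```

Repeats of a block cost only log2 of their count, which is what gives BDM its Shannon-entropy-like behaviour on large strings.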
List Approximation
This method is based on a simple observation: while generating the smallest program generating X is not computable, generating a list guaranteed to contain the smallest program is. In particular, we can return a list enumerating all strings up to and including X. This list will definitely contain the smallest program generating X, but it will be exponentially large in the length of X. How small can this list be?
Short lists with short programs in short time (and an improved version in Short lists for shortest descriptions in short time) shows that this list can be made quadratically large (and asymptotically no smaller) in the length of the input while still being guaranteed to contain the smallest program. This makes searching for K(X) much more practical, as we only need to run a number of programs quadratic in the size of X.
If we are willing to accept only approximating K with a list, we can accept an O(log(|X|)) penalty to our smallest generating program and make the list linear in the size of X, as shown in Linear list-approximation for short programs.
These methods seem promising, but the algorithms themselves are quite abstruse and some require exponential space, making them impractical. However, improvements may be possible.
It's unclear how much labor is actually saved when using the approximation lists. It may be that both the smallest possible representations of programs and everything else in the list require an absurd amount of work to normalize. The list may exclude exactly those programs which were already easy to dismiss during brute-force search while keeping the ones that are hard to assess anyway. The list may also contain only one hard-to-assess program: the smallest one. If there's no second-best approximation to K, then we're stuck having to find the actual smallest value, with no backup if that's impractical to know. Without any practical demonstrations, it's hard to know if these are genuine problems.
Universal Almost Optimal Compression

This method is based on a generic property of compression-decompression pairs. As it turns out, we can, while incurring polylogarithmic overhead in the size of the compressed string, replace a (potentially non-computable) compression algorithm and its decompressor with a pair consisting of an efficient compressor and a potentially inefficient decompressor. By fixing our compressor-decompressor pair to be K and E (the function that simply evaluates a program), we can get a new compression-decompression pair that will compress inputs to a length which differs, at most, polylogarithmically from K. This compressor would not get us smaller, equivalent programs, but, if our goal is simply to approximate the size of a hypothetical Kolmogorov-compressed program, this should work fine.
The basic idea is the following; rather than trying to find the actual smallest compressed string, instead only generate a small "fingerprint" which would allow you to identify what the smallest compressed string might be. The decompressor then just generates a list of candidates, perhaps exhaustively, and uses the fingerprint to find and run it by brute force.
Depending on how much work we're willing to put into making this fingerprint, we can get it down to a pretty small size. According to the paper, it can be made within K(X) + O(log(|X|)) with only polynomial effort in the size of the string.
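A toy sketch of the fingerprint idea, using a plain hash as the fingerprint (the actual construction builds far more careful fingerprints out of pseudorandomness machinery, but the compress-cheaply/decompress-by-brute-force shape is the same):

```python
import hashlib
from itertools import product

def compress(x):
    """Toy version of the fingerprint idea: rather than find the smallest
    program for x, emit just enough identifying information (here a plain
    hash plus the length) for a brute-force decompressor to recover x."""
    return (len(x), hashlib.sha256(x.encode()).hexdigest())

def decompress(fp):
    """Enumerate candidates exhaustively and return the one matching the
    fingerprint: exponential work, but the compressed form stays tiny."""
    n, digest = fp
    for bits in product("01", repeat=n):
        cand = "".join(bits)
        if hashlib.sha256(cand.encode()).hexdigest() == digest:
            return cand

print(decompress(compress("01101001")))  # recovers "01101001"
```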
I don't fully understand how this technique works. It's tied up in a lot of the theory that List Approximation uses as well. Its concepts come from the theory of pseudorandomness, something I'll have to become more familiar with.
Incremental compression

Instead of calculating K(X) all at once, it can usually be done piecemeal. The idea is that, given some input X, we want to find a pair of functions F, D such that F(D(X)) = X and |F| + |D(X)| < |X|. Specifically, we want to find the smallest F meeting this requirement. The idea is that D(X) reduces the size of X, deleting whatever information is in F from X. F is then that information, isolated from X. By repeating this over and over again, we can decompose X into a series F1(F2(F3(...(R)))), where R is the residual which wasn't compressed. In the limit, R should basically consist of all the random information present in X, while the Fs correspond to algorithmic "features" which can be isolated from X. So long as the Fs are always as small as possible, this construction will approach the actual Kolmogorov complexity.
I think this line of work hints towards a rich theory of "atomic" algorithmic information, but it's not ready for practical applications as of yet. The incremental compression is not computable, but it should be much quicker to approximate, on average, than K(X) while still approaching K(X).
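A minimal sketch of one incremental step, using an invented feature family (each F just reinserts a deleted substring wherever a marker occurs); the real theory quantifies over arbitrary computable F and D:

```python
def best_feature(x, markers="#@$%"):
    """Search a toy feature family: each feature F means 'reinsert the
    substring s wherever the marker m occurs', and D(x) deletes s from x
    by replacing its occurrences with m.  Returns (s, m, D(x)) for the
    substring whose removal shrinks |F| + |D(x)| the most, or None if
    nothing shrinks x.  (Invented for illustration only.)"""
    m = next(c for c in markers if c not in x)  # assumes a marker is free
    best = None
    for i in range(len(x)):
        for j in range(i + 2, len(x) + 1):
            s = x[i:j]
            n = x.count(s)
            saving = n * len(s) - n - (len(s) + 1)  # |x| - (|F| + |D(x)|)
            if saving > 0 and (best is None or saving > best[0]):
                best = (saving, s)
    if best is None:
        return None
    s = best[1]
    return (s, m, x.replace(s, m))

def decompose(x):
    """Repeatedly strip out the best feature: X = F1(F2(...(R)))."""
    features = []
    while (f := best_feature(x)) is not None:
        s, m, x = f
        features.append((s, m))
    return features, x

def reconstruct(features, residual):
    """Apply the features innermost-first to rebuild the original string."""
    for s, m in reversed(features):
        residual = residual.replace(m, s)
    return residual

feats, resid = decompose("abcabcabcXabcabcabc")
print(resid)                                               # "###X###"
print(reconstruct(feats, resid) == "abcabcabcXabcabcabc")  # True
```

Here the residual "###X###" plays the role of R, and the single extracted feature ("abc", "#") is the isolated algorithmic content.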
See:
Higher-order compression

This is a method of compressing lambda expressions by observing a connection between grammar-based compression and lambda binding.
The procedure is very simple. Start with a lambda expression.

Compress the expression using a tree grammar (using RePair, for instance). Convert this tree grammar back into a lambda expression.

Run a "simplification procedure" which performs
 eta-reduction
 beta-reduction on linear lambda bindings
 beta-reduction on applications to bound variables

Repeat until the expression stops shrinking.
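The outer loop of this procedure can be sketched on a tiny lambda AST. Only the eta-reduction step is implemented below; the tree-grammar compression and the two beta steps are omitted, so this is a sketch of the loop shape rather than the full algorithm:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def free_in(name, t):
    """Is `name` a free variable of term t?"""
    if isinstance(t, Var):
        return t.name == name
    if isinstance(t, Lam):
        return t.var != name and free_in(name, t.body)
    return free_in(name, t.fun) or free_in(name, t.arg)

def eta(t):
    """One pass of eta-reduction: rewrite λx. f x to f when x is not
    free in f."""
    if isinstance(t, Lam):
        body = eta(t.body)
        if (isinstance(body, App) and isinstance(body.arg, Var)
                and body.arg.name == t.var and not free_in(t.var, body.fun)):
            return body.fun
        return Lam(t.var, body)
    if isinstance(t, App):
        return App(eta(t.fun), eta(t.arg))
    return t

def simplify(t):
    """Repeat until the expression stops changing."""
    while (r := eta(t)) != t:
        t = r
    return t

# λf. λx. f x  eta-reduces to  λf. f, and then no further:
print(simplify(Lam('f', Lam('x', App(Var('f'), Var('x'))))))
```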
I honestly have a hard time believing this works. I'll have to think about it more carefully. To me, this doesn't seem like it should perform better than statistical compression, but, according to the paper Functional Programs as Compressed Data;
our representation of compressed data in the form of λterms is optimal with respect to Kolmogorov complexity, up to an additive constant.
I don't buy the argument given in the paper, though, which just seems to argue that optimal compression should be possible in theory; it doesn't even mention the specifics of the algorithm they present. Nonetheless, I want to include this here since it makes a specific and relevant claim. Some follow-up work seems to be doing something more computationally interesting, such as Compaction of Church Numerals for Higher-Order Compression, so a future version of this might be better suited for the task at hand.
Similar grammar-based methods should work for other structured models of computation. For example, using RePair for graphs as presented in Grammar-Based Graph Compression, a version should be possible for interaction nets.
Approximating Conditional Kolmogorov Complexity using CTM

While those methods allow approximating K(X), we usually actually want to approximate K(X|Y), the amount of information stored in X but not in Y; the amount of information X tells us if we already know Y. The paper tries to give a method for doing this, but the method seems very questionable. It says to do the following;
 Fix a "computable relation" M : x → y.
 I don't know what the paper means by this, and the phrase "computable relation" is never clarified. I would assume that it means that a list of output ys can be enumerated on any input x, but I don't know. M is also described as being a "Turing complete space", in the typical case (e.g. when using CTM). M cannot be both a space and a relation, so clearly space is meant in some loose sense, but it's unclear what a "Turing complete space" is supposed to be. I interpret this as meaning that M is supposed to be a relation from programs to outputs in the typical case, which is a function, not a relation. But this framing implies that M could be broader. Perhaps M may be a relation in the case of a nondeterministic computation model, but this is not expanded upon in the paper.
 Fix a finite set P, such that (y, x) ∈ P iff y ∈ M(x)
 In the ordinary case, P would be the database in CTM of outputinput pairs.
 The paper then states that we can approximate K(X|Y) by taking the log₂ of the sum, for all (Y, X) ∈ P, of 1/|P|.
The problems start when we realize that, since P is a set, any pair occurs in P at most once, meaning that this value is log₂(1/|P|) if (Y, X) is in the database or −∞ if it isn't. This is obviously not what's intended, but I also can't glean from the context what is. Furthermore, the full expression given in the paper is;
CTM(X|Y) = log₂ Σ{(Y, X) ∈ P} 1/|P|

Both X and Y are bound twice, once in defining CTM and once by the sum itself. It seems like the second binding is trying to reference the first, but that makes no sense, syntactically. Alternatively, if we interpret the binders as entirely separate, then CTM does nothing with its arguments and just returns 0 on all inputs (since log₂(|P|/|P|) = 0), which is obviously wrong.
The simplest fix is to simply make P a multiset which may contain multiple copies of any given pair. The calculation should then be;
CTM(x|y) = log₂( |[ p ∈ P : p == (x, y) ]| / |P| ) = log₂ Σ{p ∈ P} (if p == (x, y) then 1/|P| else 0)

This may be off, but it's the closest thing to the original paper I could come up with. It just calculates the log-likelihood of a random program covered by P outputting x on input y. Except there are a few problems with this. Firstly, this quantity will always be negative, since log(X) < 0 when 0 < X < 1, and K can never be negative. Even if we fix this, there's still a bigger issue: this CTM definition assumes that all programs are uniformly distributed, but they aren't. Think of the procedure we'd go through when generating a random program. Assuming our choices are uniformly distributed, the programs we generate won't be. If we assume our programs are binary strings, then we will, at each point, make one of three choices: add a 0, add a 1, or end the string. If each choice is uniformly sampled, then there will be a one in three chance of generating the empty string, a one in nine chance of generating 1, and about a one in 60,000 chance of generating 001101001. The chance of generating a program decays exponentially with the length of the program. This observation is built into algorithmic probability, and it's weird that the CTM measure, as described in this paper, ignores it.
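For concreteness, the multiset reading can be computed directly, and it exhibits exactly the negativity problem just described (the database below is invented for illustration):

```python
import math
from collections import Counter

def ctm_multiset(x, y, P):
    """The multiset reading of the formula above: P is a multiset of
    (output, input) pairs and CTM(x|y) = log2(count of (x, y) / |P|).
    Note that this is a log-likelihood: it is never positive, and it
    implicitly treats programs as uniformly distributed."""
    n = Counter(P)[(x, y)]
    return math.log2(n / len(P)) if n else float('-inf')

# A toy multiset in which the pair ("x", "y") occurs twice out of four:
P = [("x", "y"), ("x", "y"), ("z", "y"), ("x", "w")]
print(ctm_multiset("x", "y", P))  # log2(2/4) = -1.0
```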
Digressing a bit, I feel like the authors may have some over-familiarity with one specific model of computation. One approach to defining M, used by the Complexity Calculator project, is to use an enumeration of Turing machines which, to my knowledge, was originally devised for investigating busy beaver numbers. I believe the authors are imagining M as a function that enumerates all Turing machines, runs x on them all, and outputs a stream of all the ys that each Turing machine outputs. This would certainly be a function rather than some generic relation, though.
Let's think of what this measure means for other models of computation. If we were using, say, lambda expressions instead, M should enumerate all lambda expressions, take another lambda expression as input, and output the normal forms of the input applied to all possible lambda expressions. This does seem like it makes sense for any model of computation, but I'm not sure it makes sense as a measure of algorithmic similarity.
The justification for this procedure is supposed to come from the coding theorem, which states that K(X) + O(1) = −log₂(m(X)), where

m(X) = Σ{p | p ↓ X} 2^(−l(p))

where p ↓ X means p normalizes to X and l(p) is the length of p. There's that exponential decay I was talking about.
See:
 Scholarpedia: Algorithmic probability
 Theorem 4.3.3 of "An Introduction to Kolmogorov Complexity and Its Applications"
Modifying this for lambda expressions,
m(X) = Σ{l | l ↓ X} 2^(−I(l))

where l ↓ X means l normalizes to X and I(l) measures the bit information of l, essentially the number of binary decisions made when constructing l. I(l) would be calculated
I(l) := I(l, 0)
I(λ x . y, b) := log₂(2 + b) + I(y, b + 1)
I(x y, b) := log₂(2 + b) + I(x, b) + I(y, b)
I(x, b) := log₂(2 + b)

Incidentally, the length of a binary string doesn't actually give the information content of that string. If a string's length isn't fixed beforehand, then each additional digit incurs one trit of information, since at each stage of the construction we are choosing between one of three options: 0, 1, or stop constructing the string. From this, we can conclude that l(s) = log₃(2^I(s)) − 1; that is, the length of a string is one less than the number of trits in that string. If the string's length is fixed beforehand, so that we cannot choose to end the construction of the string at our leisure, then each choice is actually binary and l(s) = I(s).
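The recurrence for I above can be transcribed directly. Terms are represented as de Bruijn-style tuples, since in this cost scheme only the binder depth matters, not which variable is chosen:

```python
import math

def info(t, b=0):
    """The recurrence for I, transcribed directly.  A node built at binder
    depth b is one choice among 2 + b options (lambda, application, or one
    of the b bound variables), costing log2(2 + b) bits.  Terms are tuples:
    ('lam', body), ('app', f, x), ('var', i)."""
    cost = math.log2(2 + b)
    if t[0] == 'lam':
        return cost + info(t[1], b + 1)
    if t[0] == 'app':
        return cost + info(t[1], b) + info(t[2], b)
    return cost  # a variable costs just its own choice

# I(λ x . x) = log2(2) for the lambda plus log2(3) for the variable ≈ 2.585
print(info(('lam', ('var', 0))))
```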
I think that using I(s) to calculate the information rather than the length is more theoretically correct than the usual expression in terms of length. It doesn't seem to matter too much in the case of strings because the sum over all s of 2^(−l(s)−1) = the sum over all s of 2^(−I(s)) = 1, so both are valid ways of making a probability distribution over all programs with a similar exponential decay. That −l(s)−1 exponent is there so that the empty string isn't given 100% of the distribution. The real problem is generalizability: the length calculation generally fails to make a coherent distribution if our computation model no longer accepts binary strings as inputs. The information, however, can always be adapted even if our computational model expects programs to be something esoteric, like graphs, as is the case with interaction nets.
As a side note, despite length being theoretically incorrect, it's been used in some papers for measuring the information of a lambda expression; see Computable Variants of AIXI which are More Powerful than AIXItl, for instance. It seems like the wrong thing to do, especially since the actual information is so easy to calculate. I think many authors in this field don't think too carefully about the information content of the things they write about, which is quite ironic.
The conditional coding theorem states that;

K(X|Y) + O(1) = −log₂(m(X|Y))

where

m(X|Y) = Σ{p | p(Y) ↓ X} 2^(−l(p))

See:
 Theorem 4.3.4 and Definition 4.3.7 in "An Introduction to Kolmogorov Complexity and Its Applications"
This is definitely not what that CTM measure is approximating. Because the original in the paper is so obviously wrong, and the nature of M is so poorly explained, it's hard to patch it up to whatever the authors intended. In fact, I'm not sure this is actually possible. The conditional coding theorem relies on the length of the program, p, which Y is being fed into. This would require us to incorporate the complexity of the Turing machine itself, but P doesn't store that information.
Let me try to offer a more sensible formulation of the CTM idea. Assume a computing function M : x → y which simply evaluates an input program into an output using a fixed computational model. Let P be a finite set of output-input pairs (y, x). The input type should satisfy the s-m-n theorem, so we can format programs like f(x); that is, we can have functions which can have variables substituted into them. For some Turing machines, application is often just done as list concatenation, though this becomes squirrelly if we want to represent functions which take multiple arguments, nested function application, etc. For a more well-structured model of computation, such as the lambda calculus, application may be a fundamental operation. Regardless, we can then define our metric as;
CTM(x|y) = −log₂( Σ{ p | (x, p(y)) ∈ P } 2^(−I(p)) )

This would require us to be able to pattern match so as to detect p(y). If application is just concatenation, then this is as simple as looking for the suffix y, which is pretty trivial.
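A sketch of this corrected measure with concatenation as application, over an invented toy database of (output, program) pairs, weighting each program by 2^(−I(p)) with I counted in trits as above:

```python
import math

def bits(s):
    """Bit information of a binary string built by a uniform three-way
    choice (0, 1, or stop) at each step: len(s) + 1 trits."""
    return (len(s) + 1) * math.log2(3)

def ctm_cond(x, y, P):
    """CTM(x|y) = -log2( sum of 2^(-I(p)) over programs p with (x, p ++ y)
    in the database ), reading application as concatenation: we look for
    database programs ending in the suffix y."""
    total = sum(2.0 ** (-bits(prog[:len(prog) - len(y)]))
                for (out, prog) in P
                if out == x and prog.endswith(y))
    return -math.log2(total) if total > 0 else float('inf')

# Hypothetical database of (output, program) pairs:
P = {("0", "01"), ("0", "001"), ("1", "11")}
# Programs ending in "1" that output "0" are "01" and "001", i.e. p = "0"
# and p = "00"; the sum is 3^-2 + 3^-3 = 4/27, so CTM("0"|"1") = log2(27/4)
print(ctm_cond("0", "1", P))
```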
This doesn't much resemble what's in the paper, but it makes much more sense.
The paper mentions that CTM(x), which approximates non-conditional Kolmogorov complexity, can be defined as CTM(x|""), x conditioned on the empty string. Well, actually it says that CTM(""|x) should do this, but that doesn't make any sense. Unclear as the original is, it should definitely be CTM(x|"") in my modified version, since it would just be summing over every program p = p ++ "" which outputs x; hence it's eminently compatible with concatenation-as-application. In general, a separate measure would need to be made for other models of computation, since application-as-concatenation doesn't even make sense in general for Turing machines (do you really think application should be associative?), much less other models of computation. More generically, we'd define;
CTM(x) = −log₂( Σ{ p | (x, p) ∈ P } 2^(−I(p)) )

Block Decomposition

In the section on BDM (Block Decomposition Method), the paper keeps calling things "tensors", but it never explains what a tensor is. It definitely isn't meant in the ordinary mathematical sense, since the only things described as tensors are just binary strings. The original paper on BDM talks about compressing tensors and vectors. However, neither of those two things is actually compressed in that paper. Instead, it seems like the authors think that any list is a vector and any list-of-lists is a tensor, which is what they actually compress. It's annoying when authors abuse terminology like this; it's just confusing. From that, I think "tensor" just means a 2-dimensional array of bits. Just call them arrays if that's what they are! This is a CS paper, after all.
I have a suspicion that the segment on block decomposition was copypasted from somewhere else without any copyediting. Tensors aren't mentioned outside the section on BDM.
Setting that aside, we're trying to approximate K(X|Y) using BDM(X|Y). Assume a fixed "partitioning strategy". The paper never explains what this is, but I read other sources (which I'll talk about later on) which informed me that a "partitioning strategy" is simply a method of splitting a string into (possibly overlapping) substrings which are already in our CTM database. What BDM tries to do is then devise a "pairing strategy" which minimizes a quantity. The paper doesn't state what a "pairing strategy" is either, and no other source I read clarifies. It only says the following;
A pairing strategy generates a set P
 consisting of pairs of pairs ((rx, nx), (ry, ny))
 where rx and ry are partitions of X and Y made by our partitioning strategy
 where each rx occurring in P must only occur once, though there is no similar restriction on ry. This just means that P, treated as a relation, is (non-totally) functional.
 where nx and ny are the occurrence counts of rx and ry within the partitionings of X and Y, respectively.
That's all it says on pairing strategies. As far as I can tell from this, a pairing strategy that pairs nothing and is just empty is valid, but I'm pretty sure it's not supposed to be.
Assuming we have an understanding of what additional constraints a pairing strategy should have, we want to find the pairing strategy which minimizes the following quantity;
Σ{((rx, nx), (ry, ny)) ∈ P} CTM(rx|ry) + (if nx == ny then 0 else log(nx))

The minimal value of this quantity will be BDM(X|Y).
This quantity will always be nonnegative and we can always minimize it to zero by making P empty. This is obviously not intended. It also doesn't make much sense to me that we're taking the log of nx if nx is just the count of rxs rather than something involving the length of rx. And shouldn't that log term scale with the difference between nx and ny in some way? The paper offers no real intuition.
Maybe looking at the original BDM paper can offer clarification. It gives a nice example which I'll reproduce here. Let's say we're applying BDM to the string
010101010101010101

We have a choice of partitioning strategy, but it gives the example of splitting the string into substrings of length 12 which may overlap by, at most, 11 digits. When we do this, we get 3 101010101010s and 4 010101010101s. According to CTM, both strings have a complexity of 26.99 bits (assuming we're using only 2-state binary Turing machines). This would indicate that the smallest program generating either 12-digit string is about 27 bits long. Also, there are no Turing-complete 2-state binary Turing machines, so this choice seems doubly weird. We then calculate the BDM value as
26.99 + log(3) + 26.99 + log(4) ≈ 57.565

Okay, but shouldn't it be way smaller? The original string wasn't even twice as long as its partitions, and it should be almost as easy to generate as the partitions. I thought this might be an idiosyncrasy of the specific kind of Turing machine which the paper uses, but the complexity calculator website says almost the same thing, giving the "BDM algorithmic complexity estimation" as 57.5664 bits when we select a block size of 12 with an overlap of 11.
Let's digress a bit and think of K in the lambda calculus. Firstly, we need a way to represent binary strings. We'll just encode these as lists of bits. The type of bits will be defined as
2 = ∀ X . X → X → X = {0, 1}

where

0 = λ f . λ t . f
1 = λ f . λ t . t

Strings should have an appropriate elimination rule stating that, for any type family or predicate P over binary strings,
∀ S : BinString . (∀ s : BinString . (b : 2) → P s → P (b :: s)) → P "" → P S

This is essentially the induction rule for binary strings. One form of it, anyway. We can replace that predicate with a polymorphic variable to get our representation.
BinString = ∀ X . (2 → X → X) → X → X

Compare
∀ S : BinString . ∀ P . (∀ s . (b : 2) → P s → P (b :: s)) → P "" → P S
∀ X . (2 → X → X) → X → X

For any particular string, S, we can realize the induction principle using
λ c : ∀ s . (b : Bits) → P s → P (b :: s) . λ n : P "" . S c n

See
 Generic Derivation of Induction for Impredicative Encodings in Cedille, I guess, since I don't know of a better source on this topic.
I'll talk about alternate representations later on, but I think this is among the most natural representations given the mathematical structure of the data type. "Most natural" doesn't necessarily mean "best", though. Unary natural numbers are more natural than binary representations since pretty much every simple representation of the universal property of ℕ suggests a unary representation. However, they're horribly inefficient.
Using this representation, the original string would be encoded as
λc . λn . c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 (c 0 (c 1 n)))))))))))))))))

This representation has about 192.747 bits of information. That information count might seem like a lot, but the lambda calculus represents trees of any kind as efficiently as it can represent lists. In this case, the string is a list of bits, which is about as small as can be expected.
However, it can be compressed to;
λc . λn . (λ f . λ x . f (f x)) (λ f . λ x . f (f (f x))) (λ x . c 0 (c 1 x)) n

which has about 83.93 bits of info. One of the 12-character subpartitions can be compressed into
λc . λn . (λ f . λ x . f (f x)) ((λ f . λ x . f (f (f x))) (λ x . c 0 (c 1 x))) n

For the other, just swap 0 and 1. This has about 83.93 bits; the exact same, in fact, as the full string. The reason these are the same is that there are 9 repetitions of 01 in the full string and 9 = 3 ^ 2, while in the substrings there are 6 repetitions of 01 or 10, and 6 = 2 * 3. The information in the multiplication is the same as the information in the exponentiation, so the representations end up with the same amount of info. Of course, I don't know if these are actually the smallest possible representations; they're just the smallest I could come up with. They do illustrate my point, however. The two strings should have about the same information; maybe the original should have a little more. It seems extremely suspicious to me that the two strings have such dramatically different bit counts according to BDM.
This isn't the only way to represent binary strings. We can write the original string instead as;
λ0 . λ1 . 0 (1 (0 (1 (0 (1 (0 (1 (0 (1 (0 (1 (0 (1 (0 (1 (0 (1 (λ x . x))))))))))))))))))

To justify this representation we need to prove that its type is isomorphic to something satisfying the universal property of binary strings. Here, 1 and 0 are expected to take a function X → X and return another function of the same type. This means our new representation has type;
∀ X . (X → X) → (X → X) → (X → X)

We can rewrite this using a bit of type algebra;
∀ X . (X → X) → (X → X) → (X → X)
≅ ∀ X . (X → X) × (X → X) × X → X
≅ ∀ X . (X → X) × (X → X) × (1 → X) → X
≅ ∀ X . (X + X + 1 → X) → X
≅ ∀ X . (2 × X + 1 → X) → X

Note that any type of the form ∀ X . (F(X) → X) → X is the (weakly) initial algebra over the endofunctor F. Binary strings are the initial algebra over the endofunctor X ↦ 2 × X + 1, a special case of initiality for lists in general. Continuing this calculation;
∀ X . (2 × X + 1 → X) → X
≅ ∀ X . (2 × X → X) → (1 → X) → X
≅ ∀ X . (2 → X → X) → X → X

which is the representation we started with. This justifies that the new representation is isomorphic to the old one, and we can comfortably use it interchangeably. See also;
Our new representation has about 78.9 bits of information, while the substrings have about 54.9 bits. We can compress our larger string to
λ0 . λ1 . (λ f . λ x . f (f x)) (λ f . λ x . f (f (f x))) (λ x . 0 (1 x)) (λ x . x)

which has about 57.27 bits of information, less even than what BDM states the Turing machine representation should have. And the lambda calculus has to represent every tree-like datatype! What's the Turing machine representation doing with all that space below 57 bits if it can't even fit 9 repetitions of 01? As far as I can tell, the two 12-digit substrings can't be compressed any further, but my point from before still stands: both strings have similar amounts of algorithmic information. It's suspicious that BDM would say otherwise.
...
Okay, I think I figured it out. I was confused about the partitioning strategy. It's up to us to find a good selection of block size and overlap. Going back to the complexity calculator, if I set the block size to 2 and the overlap to 1, it estimates the complexity to be 12.82 bits. Doing the calculation myself, we have 9 repetitions of 01 and 8 repetitions of 10. The calculation would then be;
3.3274 + log(9) + 3.3274 + log(8) ≈ 12.8247

Going through all the options in the complexity calculator, the minimum is a block size of 2 with an overlap of 0. This partition has only 9 01s and nothing else. This gives the calculation as;
3.3274 + log(9) ≈ 6.4973

The BDM paper states that
[...] if |Adj(X)| is close to 1, then BDM(X) ≈ K(X).
where Adj(X) is the set of pairs of substrings with their occurrence counts. The block size of 2 with an overlap of 0 gives an |Adj(X)| of exactly 1, since it only has one partition: Adj(X) = {(01, 9)}. This should, presumably, be as close to the actual Kolmogorov complexity as BDM can get. I suppose that means we're trying to find the partition strategy which minimizes the number of distinct substrings (each of which must be covered by the CTM database).
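The partition counting and the BDM sum are easy to reproduce. Using the CTM values quoted above, this recovers the 57.565, 12.8247, and 6.4973 figures:

```python
import math
from collections import Counter

def partitions(x, block, overlap):
    """Split x into substrings of the given block size whose starting
    positions advance by block - overlap."""
    step = block - overlap
    return [x[i:i + block] for i in range(0, len(x) - block + 1, step)]

def bdm(x, block, overlap, ctm):
    """BDM(x): for each distinct block r occurring n times in the
    partition, add CTM(r) + log2(n)."""
    counts = Counter(partitions(x, block, overlap))
    return sum(ctm[r] + math.log2(n) for r, n in counts.items())

x = "01" * 9  # the example string 010101010101010101

ctm12 = {"010101010101": 26.99, "101010101010": 26.99}
print(round(bdm(x, 12, 11, ctm12), 3))  # ≈ 57.565

ctm2 = {"01": 3.3274, "10": 3.3274}
print(round(bdm(x, 2, 1, ctm2), 4))  # 3.3274 + log2(9) + 3.3274 + log2(8) ≈ 12.8247
print(round(bdm(x, 2, 0, ctm2), 4))  # 3.3274 + log2(9) ≈ 6.4973
```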
The appendix of the paper has a section on "The Impact of the Partition Strategy". It says the following;
BDM better approximates the universal measure K(X) as the number of elements resulting from applying the partition strategy to X.
as the number of elements... what? Was this paper not copy-edited!
BDM(XY) is a good approximation to K(XY) when the Adj(X) and Adj(Y) share a high number of base tensors.
I assume that this doesn't actually rely on our data being tensors (or 2-dimensional arrays).
We conjecture that there is no general strategy for finding a best partition strategy [...] Thus the partition strategy can be considered an hyperparameter that can be empirically optimized from the available data.
Hmm... this seems like a big gap in the whole approach.
So this clears up some things about the partitioning strategy, at any rate. But I wasn't worried about the partitioning strategy anyway; I wanted to know what a "pairing strategy" is! The original paper on BDM isn't any help since it doesn't describe conditional BDM at all.
Going back to the topic paper of this post, it does describe a "coarse conditional BDM of X with respect to the tensor Y". Again, tensors are not explained at all in the paper, and it's unclear if Y actually needs to be a tensor in any mathematical sense. As I stated before, I think the authors just mean a 2-dimensional array when they say "tensor", and it seems obvious that the construction doesn't rely on dimensionality at all. It defines BDM(X|Y) as
BDM(X|Y) = (Σ{(rx, nx) ∈ Adj(X) && rx ∉ Adj(Y)} CTM(rx) + log(nx)) + (Σ{(rx, nx) ∈ Adj(X) ∩ Adj(Y)} log(nx))

This definition isolates the unique information in X while issuing additional penalties if the information shared between X and Y appears more or less often in X than in Y. I'm not sure if this makes sense. The paper says;
[the second sum] is important in cases where such multiplicity dominates the complexity of the objects
but, intuitively, it seems to me like the sum should only add a penalty if nx > ny; because, otherwise, we're penalizing the conditional complexity of X for content that's in Y but not in X. I'll have to think about this a bit more.
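For reference, the coarse formula as written can be transcribed directly (the numbers below are invented, purely to exercise the two sums):

```python
import math

def coarse_bdm_cond(adj_x, adj_y, ctm):
    """Coarse conditional BDM as given above.  adj_x and adj_y map each
    block to its occurrence count; blocks unique to X contribute
    CTM(r) + log2(n), while shared blocks contribute only log2(n_x)."""
    total = 0.0
    for r, n in adj_x.items():
        total += math.log2(n) if r in adj_y else ctm[r] + math.log2(n)
    return total

# Invented numbers, purely to exercise the two sums:
adj_x = {"01": 9, "11": 2}  # "01" is shared with Y, "11" is unique to X
adj_y = {"01": 4}
ctm = {"11": 5.0}
print(coarse_bdm_cond(adj_x, adj_y, ctm))  # log2(9) + (5.0 + log2(2)) ≈ 9.17
```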
The "coarse" BDM is, I guess, less accurate than the "strong" BDM that I first looked at; but, at least, it makes sense. It's weaker since it doesn't utilize conditional CTM. But without additional clarification on what a "pairing strategy" is, I just can't understand how the strong version works.
I've thought a lot about it and, while I'm not confident, I think I've figured out the two most reasonable fixes.

The pairing strategy could be required to cover X; that would solve the specific problem I pointed out.

Alternatively, P could be required to be totally functional over the partitions of X.

Neither of these conditions is hinted at in the paper, but it's the best I've got. The paper does say;
prior knowledge of the algorithmic structure of the objects can be used to facilitate the computation by reducing the number of possible pairings to be explored
So, at the very least, the pairing strategy is supposed to be determined by some algorithm that isn't described in any detail. I'm frustrated by this whole thing.
Spitballing ways to improve BDM and CTM

One of the thoughts I had was that there may be program patterns in the computational model that always uniformly evaluate. For example, it may always be the case that 100101X10101Y0, for any strings X and Y, evaluates to 1100101, or maybe to 0Y1010. In either case, we can make the replacement to get a smaller program without running the program. Or maybe it reduces to 01Y0Y01; depending on the length of Y, this may or may not be a reduction in size. And there may be variations on this involving variable substrings being riffled and combined in nontrivial ways.
The CTM database only keeps track of string reductions, but it may be possible to search for patterns and perform block replacement via unification/pattern matching instead. This description was given in terms of first-order unification, but there may be a close link with higher-order unification: unification modulo computation.
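A toy sketch of this kind of pattern-based block replacement, with invented rewrite rules (including the hypothetical 100101X10101Y0 → 1100101 rule from above) compiled to regexes, using backreferences for repeated metavariables:

```python
import re

# Hypothetical rewrite rules of the sort described above; X and Y are
# metavariables standing for arbitrary substrings.  These rules are
# invented for illustration, not drawn from any real CTM database.
RULES = [
    ("100101X10101Y0", "1100101"),
    ("0X0X", "0X"),  # a doubled block collapses to a single copy
]

def to_regex(pattern):
    """Compile a pattern with metavariables into an anchored regex; a
    repeated metavariable becomes a backreference so both occurrences
    must match the same substring."""
    seen, out = {}, []
    for ch in pattern:
        if ch in "XY":
            if ch in seen:
                out.append("\\%d" % seen[ch])
            else:
                seen[ch] = len(seen) + 1
                out.append("(.*)")
        else:
            out.append(re.escape(ch))
    return "^" + "".join(out) + "$"

def capture(pattern, m):
    """Map each metavariable to the substring it matched."""
    seen = {}
    for ch in pattern:
        if ch in "XY" and ch not in seen:
            seen[ch] = m.group(len(seen) + 1)
    return seen

def rewrite(program):
    """Keep applying any rule that makes the program smaller, without
    ever running the program."""
    changed = True
    while changed:
        changed = False
        for pat, rep in RULES:
            m = re.match(to_regex(pat), program)
            if m:
                env = capture(pat, m)
                new = "".join(env.get(ch, ch) for ch in rep)
                if len(new) < len(program):
                    program, changed = new, True
    return program

print(rewrite("1001011110101110"))  # matches the first rule: "1100101"
print(rewrite("011011"))            # matches the doubling rule: "011"
```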
This also seems similar to finding smaller extensionally equivalent expressions for certain programs; that is, finding programs that don't normalize to the same term but do normalize to the same term when given the same inputs. Programs that are not the same but are behaviourally indistinguishable. I wrote a program a while ago in Mathematica which enumerated SKI expressions, applied 30 or so variable arguments to them, and collected into pairs those which normalized to the same expression after application. In this way, I built up a database which I used to make expressions smaller by replacing them with smaller extensionally equivalent ones. In practice, this ran into some issues. Namely, two extensionally equivalent expressions are not guaranteed to have the same behavior, since one may start evaluating after two arguments while the other will only evaluate after three. For example, the following two expressions are extensionally equivalent, but they only normalize to the same term after two arguments are applied, despite the first fully normalizing after only one application.
λ f . f
λ f . λ x . f x

This only ceases to be a problem if you have some sort of typing discipline which allows you to infer the number of arguments an expression expects. You can then assess extensional equivalence up to that number of arguments while also guaranteeing the preservation of expected behavior up to the type of the full expression you're compressing. This, of course, doesn't work on extensional equivalences which require some elimination principle to justify, e.g. the equivalence of merge sort with quicksort. This may be particularly relevant to incremental compression.
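The mismatch can be seen directly by encoding the two terms as Python functions and applying symbolic arguments that merely record their applications:

```python
class Term:
    """A symbolic value: applying it records the application, so we can
    compare the syntax trees that two Python-encoded lambda terms build."""
    def __init__(self, label, parts=None):
        self.label, self.parts = label, parts
    def __call__(self, arg):
        return Term("app", (self, arg))
    def __eq__(self, other):
        return (isinstance(other, Term)
                and self.label == other.label
                and self.parts == other.parts)

# The two extensionally equivalent expressions from above:
t1 = lambda f: f               # λ f . f
t2 = lambda f: lambda x: f(x)  # λ f . λ x . f x

a, b = Term("a"), Term("b")

# After one argument they differ: t1(a) is the symbolic term a itself,
# while t2(a) is still a Python function waiting for another argument.
print(t1(a) == t2(a))        # False
# After two arguments both build the same application tree (a b):
print(t1(a)(b) == t2(a)(b))  # True
```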
Generating programs with specific, even simple, types is highly nontrivial. It's just the proof synthesis problem where you want to enumerate all proofs (encoding programs) of a given theorem (encoding a type). Restricting to certain typing disciplines, such as simple types without intersection types, certain polymorphic typing disciplines, some refinement type disciplines, some disciplines heavily leaning on algebraic data types, and some others, can be searched fairly efficiently, however. The following papers seem particularly relevant;
 Type-and-Example-Directed Program Synthesis
 Example-Directed Synthesis: A Type-Theoretic Interpretation
 Program Synthesis from Polymorphic Refinement Types
 Generating Constrained Test Data using Datatype Generic Programming
This may be leverageable for some applications. In fact, I'd guess most applications could leverage this.
It's also worth noting that this whole problem is just the inductive programming problem + a desire to minimize the induced program's size. Inductive programming is a whole field unto itself. There are algorithms which can exhaustively produce programs which have a particular output sequence;
See;
These approaches can be used as compression methods. They're not guaranteed to approach the Kolmogorov complexity, but they should generally do a much, much better job than statistical compression methods while being much more efficient than methods attempting to approximate K directly.
Consider the possibility of an algorithm, C(X), and a Shannon optimal compressor, S(X), such that the sum over all X of C(X) − S(X) is negative infinity. That is to say, C(X) in total performs infinitely better than a Shannon optimal compressor while being no slower. BDM already does this, but is there an algorithm that can do this without keeping track of a large database which takes exponential effort beforehand to calculate?
It's actually fairly easy to outperform Shannon optimal compressors by infinitely much. Have a compressor do the following: if it detects a string encoding 0, 1, 2, 3, 4, 5, ..., replace it with a flag indicating such and a number indicating how far the sequence counts. For all other strings, replace them with the output of a Shannon optimal compressor. Such a scheme would generally perform no worse than a Shannon optimal compressor while performing infinitely better in total, though the benefits clearly only apply to a small fraction of strings, even if there are infinitely many such patterns. This means that CTM will generally be able to improve by infinitely much by adding entries to its database, but expanding this database takes exponential effort. Is there a way to do better? Is there a way to characterize how far this can go without exponential effort? Even if it doesn't cover as much of program space asymptotically, is there a way to grow the database forever using only, say, linear effort? Or quadratic? Or logarithmic? And could such things be efficiently sewn together so that we discover the easy things first and foremost, for even very large strings, and the hard things later?
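The counting-sequence trick can be sketched directly. Here zlib stands in for the "Shannon optimal" statistical compressor, and the comma-separated encoding is my own choice for illustration:

```python
import zlib

FLAG_SEQ, FLAG_RAW = b'\x01', b'\x00'

def compress(s: bytes) -> bytes:
    """If s is exactly "0,1,2,...,n", emit a flag plus n; else fall back."""
    parts = s.split(b',')
    try:
        nums = [int(p) for p in parts]
    except ValueError:
        nums = None
    if nums is not None and nums == list(range(len(nums))):
        canonical = b','.join(str(i).encode() for i in range(len(nums)))
        if canonical == s:  # reject non-canonical spellings like b"00,1"
            return FLAG_SEQ + str(len(nums) - 1).encode()
    # zlib stands in for a statistical (Shannon-style) compressor.
    return FLAG_RAW + zlib.compress(s)

def decompress(c: bytes) -> bytes:
    if c[:1] == FLAG_SEQ:
        n = int(c[1:])
        return b','.join(str(i).encode() for i in range(n + 1))
    return zlib.decompress(c[1:])
```

On the string "0,1,...,99999" this emits 6 bytes where the statistical fallback needs far more, and the gap grows without bound as the sequence lengthens, which is the "better by infinitely much" claim in miniature; every other string costs one flag byte of overhead.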
1, 2, 3, ..., 999999999, 1000000000

is easy to compress algorithmically, but I wouldn't expect BDM to do much better than statistical compression. I do believe that such things should be efficiently discoverable anyway, but not by CTM or BDM as they stand.
I think we may need to start thinking about the amount of effort we'd be willing to expend during compression. K and its alternatives seem somewhat backward in their definition. K is the minimal size of a program that can become our target if we're willing to expend an arbitrarily large effort during decompression. Levin complexity is just K plus the log of the runtime of the smallest program. Essentially, Levin complexity is the minimal size of a program that can become our target if we're willing to expend only an exponential amount of effort on decompression. But, shouldn't it be the other way around? Shouldn't we care more about the amount of effort we want to put during compression?
For the purposes of ML, we don't care about decompression at all. What would be better to know is how small a program can be if we're only willing to spend exponential, quadratic, linear, etc. effort with respect to the length of the string we're compressing. Are these problems solvable? I have a suspicion that feedforward NNs are essentially solving a subset of the linear effort case.
This is an area which has been explored quite a bit; by the paper on Universal almost optimal compression I mentioned earlier, for instance. I would like to explore this area in the future, and I believe it will be of tremendous importance for making compression practical for machine learning.
Here's another idea. This one is original to me, but I wouldn't be surprised if someone came up with something similar. Some models of computation can be run backward, nondeterministically. Specifically, models where every state can be reached in one step by only a finite number of transitions. This can't be done effectively with the lambda calculus. If we were in a state λ x . λ f . f x x, that could have been reached in one step by
λ x . (λ y . λ f . f y y) x
(λ x . x) (λ x . λ f . f x x)
λ x . λ f . (λ x . x) (f x x)
(λ d . λ x . λ f . f x x) (λ x . x)
(λ d . λ x . λ f . f x x) (λ x . x x)
(λ d . λ x . λ f . f x x) (λ x . x x x)
...

and infinitely many other things. This means that running a lambda expression backward implies enumerating infinite possibilities at each step. That doesn't mean running expressions backward is impossible, but it limits the utility of such an approach since we'd basically be enumerating every lambda expression an infinite number of times at each backward step. The same applies to combinator logic.
Many models of computation, however, don't have this property. Any model where a fixed amount of work is done at each step doesn't; that includes Turing machines, interaction nets, the linear lambda calculus, and most abstract machines. These can all be run backward as a result. We can then enumerate all the programs which normalize to a particular output by doing the following, assuming we're using an appropriate Turing machine;
 Start with our output string.
 Enumerate every end state of this machine over that string; that is, every configuration where the head of the machine is at each possible position while in the halting state.
 For each of these, generate an infinitely tall rose tree by recursively running the program backward one time step at a time. We can collapse each tree into a stream via breadth-first search, and collapse these streams together by riffling them.
 Every time we reach a point where the machine's head is at the beginning of the string in the starting state, we've logged a program which normalizes to our output.
This procedure will look for and find only those programs which normalize to our desired output, ordered by running time. We can keep this going for as long as we want, remembering only the smallest program found so far. The longer we go, the closer our approximation is to the actual shortest program and therefore the actual Kolmogorov complexity. There are also probably heuristics we could apply to prune the search of paths which won't get us anything smaller than what we already have. I don't know how efficient this could be made, but it seems to me that it would do better than BDM on large strings.
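Here's a toy sketch of the backward search over a tiny fixed-length-tape Turing machine I made up for illustration (a real implementation would need a growable tape and a richer rule set):

```python
from collections import deque

# A made-up toy machine: in state 'start', turn 0s into 1s moving right,
# and halt upon reading a 1. Transitions: (state, read) -> (state, write, move).
RULES = {
    ('start', '0'): ('start', '1', +1),
    ('start', '1'): ('halt',  '1', +1),
}

def backward_search(output, max_depth=25):
    """Yield starting tapes (head at cell 0, state 'start') that run
    forward to `output` in the halting state, in breadth-first order."""
    output = tuple(output)
    # Seed with every halting configuration over the output string.
    frontier = deque(((output, h, 'halt'), 0) for h in range(len(output) + 1))
    seen = set()
    while frontier:
        config, depth = frontier.popleft()
        tape, head, state = config
        if config in seen or depth > max_depth:
            continue
        seen.add(config)
        if state == 'start' and head == 0:
            yield tape  # a starting tape that normalizes to our output
        # Undo one step: any rule that could have produced this configuration.
        for (q0, a), (q1, b, move) in RULES.items():
            prev = head - move
            if q1 == state and 0 <= prev < len(tape) and tape[prev] == b:
                prev_tape = tape[:prev] + (a,) + tape[prev + 1:]
                frontier.append(((prev_tape, prev, q0), depth + 1))
```

For example, both the tapes 001 and 111 run forward to the output 111 on this machine, and the backward search recovers them.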
For parallel models of computation, such as interaction nets, we can optimize this further by treating the (backward) transformations of different segments of the program independently and only combine the timelines when they start interacting.
A further optimization, which would make finding small programs easier but would make finding values verifiably close to K harder, is to iteratively compress programs. We run a program backward only until it shrinks. We then abandon all other search branches and start over with the new, smaller program. Doing this over and over may allow one to effectively find and run recursive programs that generate an output backward much more efficiently than an unpruned search.
Here's another idea. In CTM, we're enumerating all programs and seeing what they output. It may, instead, be better to enumerate programs and then filter them for randomness. Essentially, we'd build up a database by looking at each program in algorithmic order. We'd try building that program from the programs already in the database. If we can't build it from the existing elements in a way that's smaller than the thing itself, then what we're looking at is algorithmically random, and we add it to the database as a random "atom". This should significantly cut down on the combinatorial explosion, though I'm pretty sure it would still be exponential.
Ultimately, I think this whole problem should eat itself. If we're using a universal learning algorithm, then, certainly, it can learn to be better, somehow. Bootstrapping should be possible in this domain.
Algorithmic Loss

Hey, wasn't this post supposed to be about machine learning? Oh ya! Let's finally talk about that!
For any standard ML technique, we need a definition of loss which we want to minimize. The exact way this is defined will depend on what our task is. Generally, we'll use some form of conditional Kolmogorov complexity directly as our loss. Similar measures are already used in some applications. Cross-entropy is the most directly related loss, and using algorithmic complexity can be thought of as a more robust form of entropy.
Generally, our loss will try to capture how much of the data isn't captured by our model. We want our loss to answer the question, "given our model, how much of the data still needs to be explained?" To that end, our loss will generally be a function of the training data conditioned on our predictions. Given a data point y and a prediction Y, our loss on that point will be K(y|Y). To get a total loss, we can just add all these measures together.
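As a sketch, the conditional complexity K(y|Y) can be crudely approximated with a real compressor, in the spirit of the compression-distance literature; zlib and the C(Y·y) − C(Y) proxy are my stand-ins here, not the paper's method:

```python
import zlib

def c_len(b: bytes) -> int:
    """Compressed length; a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(b, 9))

def cond_k(y: bytes, pred: bytes) -> int:
    """Approximate K(y|pred) as C(pred + y) - C(pred):
    how much the data still costs once the prediction is already known."""
    return max(c_len(pred + y) - c_len(pred), 0)

def total_loss(data, predictions):
    """Sum the per-point conditional complexities, as in the text."""
    return sum(cond_k(y, p) for y, p in zip(data, predictions))
```

A prediction close to the data leaves little to explain: cond_k(x, x) is near zero, while cond_k of x given an unrelated string is close to the full cost of x.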
But the paper suggests adding the squared losses together. Why should we do this? Why do we do this for normal ML? Well, we can't normally just add together the losses since they can often be negative. K can never give negative values, so that's not a problem here. Why would we use the mean squared error rather than the mean absolute error? There are two explanations I've seen. Firstly, I've seen some say MSE is easier to differentiate than MAE. This isn't true outside of very simple models where you want to find the optimum in a single step, such as linear regression, and we aren't going to be differentiating K anyway, so this doesn't matter. The other reason comes from an assumption that we're modeling things sampled from a gaussian distribution.
This has always irked me. No matter how many statisticians say "it's reasonable to assume everything is being sampled from a gaussian", that doesn't make it true. If you do any bayesian ML, you'll find that a significant part of the effort spent on standard techniques is distribution engineering. If what you're modeling can't be negative, is multimodal, follows a power law, or a litany of other things, then you're not looking at data sampled from a gaussian and you'll have to model it with a different distribution instead.
Anyway, let's do the derivation;
Firstly, what we always really want to do is maximize the likelihood. Our model is going to make predictions about the probability of various data. The likelihood of our model is just the product of all the probabilities of each training point as predicted by our model.
L(m) = Π(i) m_prob(yi)

In practice, this will usually end up multiplying a bunch of numbers below 1 together, getting a vanishingly small likelihood for models training on a lot of data. Because of this, we usually use the negative log-likelihood, which is
NLL(m) = − Σ(i) log(m_prob(yi))

This makes all our too-small numbers large without losing any information, so this is usually what real algorithms try to minimize. On a historical note, this trick was often used to make carrying out tough calculations easier; logarithm tables were a hot commodity back before calculators became commonplace. Anyway, we often also divide by the total number of data points to turn this log-likelihood into a mean log-likelihood, that way our loss doesn't become huge just because we're working with a lot of data points.
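The underflow problem and the log trick can be sketched numerically (a minimal illustration, not from the paper):

```python
import math

def likelihood(probs):
    """L(m): product of the predicted probabilities of each training point.
    Underflows toward 0.0 as the dataset grows."""
    out = 1.0
    for p in probs:
        out *= p
    return out

def mean_nll(probs):
    """Mean negative log-likelihood: same optimum, no underflow,
    and insensitive to dataset size."""
    return -sum(math.log(p) for p in probs) / len(probs)
```

Since log is monotonic, a model with higher likelihood always has lower NLL, so minimizing one maximizes the other.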
The equation of a gaussian is;
e^(−(x − μ)² / (2σ²)) / (σ √(2π))

The "prediction" made by a Gaussian model will be the mean, μ, and the likelihood of a particular piece of data x will be that data fed into the PDF of a gaussian with the predicted mean. Substituting those changes into our negative log-likelihood, this becomes
− Σ(i) log( e^(−(yi − yi_pred)² / (2σ²)) / (σ √(2π)) ) = (1 / (2σ²)) Σ(i) (yi − yi_pred)² + N log(σ √(2π))

which is exactly the squared error, modulo constants we don't care about. And getting the average by dividing by the number of data points will get us the mean squared error, MSE. This should also illustrate that if you don't think it's reasonable to assume your data were randomly sampled from a gaussian distribution, then you should also not think it's reasonable to use the squared error without a similar derivation.
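We can check this derivation numerically: the Gaussian negative log-likelihood differs from half the sum of squared errors by a constant that depends only on N and σ, never on the data (a sketch with σ held fixed):

```python
import math

def gaussian_nll(ys, preds, sigma=1.0):
    """Negative log-likelihood under y_i ~ Normal(pred_i, sigma^2)."""
    return sum(
        0.5 * ((y - p) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
        for y, p in zip(ys, preds)
    )

def sse(ys, preds):
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))
```

Because the leftover term N·log(σ√(2π)) is constant across candidate predictions, minimizing the NLL and minimizing the squared error pick out the same model.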
Okay, so, what's the justification for squaring K? Let's think about this: what are the probabilities in our likelihood? Well, they'll be the algorithmic probabilities; the probability that a random program will output the data point when given our model's prediction as an input. The coding theorem says exactly that the Kolmogorov complexity of a string is (within an additive constant) the negative logarithm of its algorithmic probability, meaning the appropriate negative log-likelihood is exactly the sum of Ks.
But, wait, I was looking for justification for squaring K. That's what the paper does. Does it say why?
[...] in order to remain congruent with the most widely used cost functions, we will, for the purpose of illustration, use the sum of the squared algorithmic differences.
Oh, so there is no reason. To be clear, there is a clearly right thing to do here: use the sum of Ks, not squared Ks. We may also want to divide by our number of data points to get the mean K error rather than the total error. I don't think the authors thought very hard about what the loss should be. For much of the paper this odd, clearly wrong loss function is used.
The paper goes on to talk about categorical loss. The obvious thing, to me, is to do basically the same thing, and that's what the paper recommends. Assume our model is outputting some object which is being used to predict the class. In classical ML, this would be like the class probabilities before assigning a class. The loss will be K(Y|M(X)), where Y is the actual class and X is our input data. This signifies how much information is in the real class but not in the prediction of our model. If we were using our model for prediction, then the class C which minimizes K(C|M(X)) would be our prediction.
The paper relates this to clustering by algorithmic distance. I don't see the connection, but it does reference another generic version of a standard ML technique; specifically, it points out the paper Clustering by Compression.
Algorithmic Optimization

Great! So, we know how to assess a model; how do we actually do optimization? Prepare to be disappointed, because the answer offered in the paper is "brute search"! Well, the paper phrases it in a more, umm, appealing way. The optimization algorithm is defined as follows;
 Keep track of the most recent minimal cost in a variable called minCost which is initially set to infinity.
 Keep track of the best set of parameters in a variable called param, which is, I guess, initially set to null, or something.
 Create a stream of all possible parameters of our model in algorithmic order (that is, ordered by their K).
 For each set of parameters, calculate the loss of the model with those parameters. If the loss is less than the current minCost, then set minCost to this value and set param to the parameters being used.
 Keep going until you're bored or satisfied, or stop for no reason at all; the paper doesn't care.
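The steps above can be sketched as follows. Zlib-compressed length is a crude stand-in for K (the paper would use CTM/BDM-style estimates), and a finite toy parameter space replaces the infinite stream; both are my assumptions for illustration:

```python
import zlib

def k_approx(p) -> int:
    """Crude proxy for K: compressed length of the parameter's repr."""
    return len(zlib.compress(repr(p).encode()))

def algorithmic_optimize(params, loss, budget=None):
    """Scan parameters in (approximate) algorithmic order, tracking the best."""
    stream = sorted(params, key=k_approx)   # "algorithmic order", approximately
    if budget is not None:
        stream = stream[:budget]            # stop when bored or satisfied
    min_cost, best = float('inf'), None
    for p in stream:
        cost = loss(p)
        if cost < min_cost:
            min_cost, best = cost, p
    return best, min_cost
```

Running it on a toy loss, e.g. `algorithmic_optimize(range(256), lambda p: abs(p - 42))`, just scans every candidate, which is exactly the brute-force character complained about below.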
That's it. However, there are some complications that make me not entirely sure if this is right. It defines the cost function (the sum of squared Ks) to be J_a(ˆX, M), where ˆX is our dataset and M is the model we're assessing. However, the actual description of the algorithm tells us to use J_a(ˆM, σ_i), where σ_i is the ith parameter in our parameter stream and ˆM is never defined. Presumably, ˆM has something to do with our model, but it can't possibly be a replacement for ˆX, since ˆX is just a collection of input-output pairs and our model is, obviously, not. Either there's an entirely separate loss function over models and parameters which is never defined, or there was a typo and the algorithm should have said J_a(ˆX, M{σ_i}), or something like that. The version I wrote seems pretty intuitive (if overly simplistic), so I'm leaning toward the latter.
The paper states that this algorithm "can ['minimize K(M) and minimize the cost function'] in an efficient amount of time". But, uh, no it doesn't. It does as bad as a brute force search because it is a brute force search. It goes on to say
the algorithmic parameter optimization always finds the lowest algorithmically complex parameters that fit the data ˆX within the halting condition [...] algorithmic parameter optimization will naturally be a poor performer when inferring models of high algorithmic complexity.
Ya think? I'm not confident that this would work on anything but toy problems. Who knows, maybe I'm wrong and it actually works surprisingly well on real-world data, but I doubt it. The algorithm doesn't even try taking advantage of what it knows about good parameters.
As an aside, the paper mentions that
In the context of artificial evolution and genetic algorithms, it has been previously shown that, by using an algorithmic probability distribution, the exponential random search can be sped up to quadratic
giving a few citations. This seems more reasonable to me, as such methods aren't just brute-force searching.
This will be a bit of a digression, but if you read this far you probably don't care about that. The first example it uses is a regression problem on two variables. It says the following on the ability to enumerate the parameter space;
For instance, in order to fit the output of the function f (Eq. 2) by means of the model M, we must optimize over two continuous parameters s1 and s2. Therefore the space of parameters is composed of the pairs of real numbers σ_i = [σ_i1, σ_i2]. However, a computer cannot fully represent a real number, using instead an approximation by means of a fixed number of bits. Since this second space is finite, so is the parameter space and the search space which is composed of pairs of binary strings of finite size [...]
This entire paragraph is rather head-scratching. Computers certainly can fully represent a real number. We can figure out how by following the same basic procedure I used before to figure out how to encode binary strings: you just state a universal property of the mathematical object you want to represent and derive a type of realizers. This is a bit squirrely with the real numbers, as the exact universal properties diverge in constructive settings; Dedekind reals and Cauchy reals aren't isomorphic anymore, for instance. There are also practical questions about how to make calculating with them as easy as possible. That being said, the simplest universal property for any kind of real number I'm aware of is the following: the (nonnegative) real numbers are the final coalgebra of the endofunctor X ↦ ℕ × X. There are a few places that say this in various guises. The most direct is On coalgebra of real numbers, which is all about this observation. See this as well. This basically says that a real number is an infinite stream of natural numbers. There are a few ways of viewing what this represents, and that will largely determine whether you see each number as representing a nonnegative real or something isomorphic, like something in [0, 1). For the latter, we can read each natural number as describing how many 1s we encounter before encountering a 0 in the binary expansion of a number. For example;
0 = [0, 0, 0, ...]
0.1 = [0, 0, 2, 0, 2, 0, 2, 0, 2, ...]
0.2 = [0, 2, 0, 2, 0, 2, 0, 2, 0, ...]
1/3 = [1, 1, 1, 1, 1, 1, 1, ...]
√2 − 1 = [2, 1, 1, 0, 0, 0, 0, 1, 0, 4, ...]
π − 3 = [0, 1, 0, 1, 0, 0, 0, 6, 2, 1, 1, ...]

Of course, we can map this back onto the nonnegative reals by interpreting each number x as 1/(1−x) − 1 instead. Then we'd have
0 = [0, 0, 0, ...]
0.1 = [0, 0, 0, 1, 3, 1, 0, 0, 1, 3, 1, 0, ...]
0.2 = [0, 1, 1, 1, 1, 1, 1, 1, ...]
√2 = [1, 0, 1, 1, 5, 2, 0, 0, ...]
3 = [2, 0, 0, 0, 0, 0, ...]
π = [2, 0, 0, 0, 1, 0, 0, 2, 0, 0, ...]
e = [1, 3, 2, 0, 1, 0, 2, 1, ...]

Alternatively, we can imagine each entry in a sequence [a0, a1, a2, ...] as representing a number as a simple continued fraction of the form
a0 + 1/(a1 + 1/(a2 + 1/...))

This can represent any nonnegative real number.
0 = [0, 0, 0, ...]
1 = [1, 0, 0, 0, ...]
0.1 = [0, 10, 0, 0, 0, ...]
0.2 = [0, 5, 0, 0, 0, ...]
1/3 = [0, 3, 0, 0, 0, ...]
√2 = [1, 2, 2, 2, 2, 2, ...]
π = [3, 7, 15, 1, 292, 1, 1, ...]

Whichever interpretation we use will determine how we define, for instance, addition, multiplication, etc. Note that there are some complications with representing real numbers as continued fractions in this way. Notably, some numbers don't have unique representations. While I understand these caveats, I don't understand how to solve them, though I've been told that such solutions are "various".
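Under the continued-fraction reading, a real is just a function from positions to naturals, and finite prefixes can be evaluated exactly. A minimal sketch (the all-zero tails are among the ill-defined representations just mentioned, so they're avoided here):

```python
from fractions import Fraction

def cf_eval(a, depth):
    """Evaluate a0 + 1/(a1 + 1/(a2 + ...)) truncated after `depth` terms,
    where `a` maps a position n to the natural number a_n."""
    x = Fraction(a(depth - 1))
    for n in range(depth - 2, -1, -1):
        # Guard against zero tails, whose reciprocal is undefined.
        x = Fraction(a(n)) + (Fraction(1) / x if x else Fraction(0))
    return x

# sqrt(2) = [1; 2, 2, 2, ...], and the identity function encodes [0; 1, 2, 3, ...]
sqrt2 = lambda n: 1 if n == 0 else 2
identity = lambda n: n
```

`float(cf_eval(sqrt2, 20))` agrees with √2 to machine precision, and `float(cf_eval(identity, 20))` gives ≈ 0.697775, the value I₁(2)/I₀(2) discussed below for the least complex real.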
Following Recursive types for free!, the (weakly) final coalgebra of X ↦ ℕ × X can simply be defined as
∃ X . (X → ℕ × X) × X

We can construct a real number by fixing a type, X, giving an X as a seed, and then defining a method of generating new digits and new seeds from an old seed. For example, we can construct an infinite stream of zeros by setting X to be ⊤, the unit type, giving •, the only inhabitant of ⊤, as our seed, and defining our generator as λ x . (0, •). In full, we'd have
0 : ℝ := λ x . (0, •), •

or, if we want to be explicit about all our encodings;
0 := λ p . p (λ x . λ p . p (λ z . λ s . z) (λ x . x)) (λ x . x)

This means that the real number zero has, at most, about 32.511 bits of complexity; surprisingly small for something which is supposedly infinitely large.
The usual reason we'd want to use floating-point numbers over exact real numbers is efficiency; floating-point numbers are much faster to compute with since our computers have hardware dedicated to them. But this approach represents the parameters as raw outputs of some virtual computer anyway, so that doesn't apply here. To use floating points, we'd have to convert them to some encoding of floats in our computational model. We gain no efficiency by using floats here.
We can make our representation a bit more efficient. Following the Coinductive function spaces page, we can use a few isomorphisms to change this representation. Notably, it's generally the case that
∃ X . (X → A × X) × X ≅ (∀ X . (1 + X → X) → X) → A

for any A. Since ℕ is the initial algebra over the endofunctor X ↦ 1 + X, the above can be rewritten as;
∃ X . (X → A × X) × X ≅ ℕ → A

So we can redefine the nonnegative reals as just functions ℕ → ℕ. Neat! Following this, we can define zero instead as;
0 := λ n . λ z . λ s . z

the constant function which just returns 0 for any input. This has only about 6.9 bits! Not bad for representing infinitely many digits. It would actually be MORE complex if we truncated it to finitely many digits. We may even notice that the least complex real number will simply be encoded by the identity function. This will be the number whose continued fraction is [0, 1, 2, 3, 4, ...]. As it turns out, this number is
I₁(2)/I₀(2) ≈ 0.697775

where the Is are Bessel I functions. This is called the "Continued Fraction Constant". Or, if we were interpreting the number as representing the binary expansion, this would be
2 − ϑ₂(1/√2) / 2^(7/8) ≈ 0.358367

where ϑ₂ is an elliptic theta function. I don't know if this constant has a name; I couldn't find it anywhere. When using our previous map to turn this into the full positive reals, this becomes ≈ 0.558524.
Anyway, my whole point with this exercise was to show that we can represent real numbers, and many other mathematical structures besides, just fine on computers. We don't, and, in fact, shouldn't use floating-point numbers if we're going to take algorithmic complexity seriously. The original BDM paper mentions π a few times, saying, for instance,
the digits of π have been shown to be [...] only algorithmic in the way they are produced from any of the many known generating formulas
so the authors know that π (and presumably other real numbers), in all of its infinite digits, can be represented by an algorithm. But in this paper, they insist on using floating-point numbers. Why? The paper just says that the parameter space becomes enumerable, but we can effectively enumerate the (constructive) reals by enumerating all the inhabitants of the type
ℝ := (∀ X . X → (X → X) → X) → (∀ Y . Y → (Y → Y) → Y)

The output of such a procedure, if it were done in some breadth-first manner, would be encodings of real numbers in essentially algorithmic order.
I think the authors need a crash course in higher-type computability. There's a whole wide world that you're missing out on if you really believe computers can only represent discrete data types.
Final Thoughts

The rest of the paper just goes through some usage examples. I don't feel the need to summarize them, but you may find them interesting to look at. The cellular automata classifier was a particularly good illustration. In the "hybrid machine learning" section, an interesting suggestion is to use K as a regularization method on top of another loss function. The same section also suggests giving training weights to simpler samples from a given space so that a model prioritizes the most plausible samples. I don't know how effective these would be, but they're interesting suggestions which may be useful as additional tools in the ML toolbox.
The conclusion section refers to the whole framework as a "symbolic inference engine" being integrated into traditional ML. I... wouldn't phrase it that way. There's not much inference and even less symbology. That being said, a typesensitive version of these ideas might be better.
I'm not satisfied with the methods presented in this paper. Nothing in it connected back to that idea of "whichever addition increases the compressed size the least should be considered the likeliest prediction" I mentioned at the beginning of this post. All the paper said, really, was "when tuning parameters, just look through them in order from least to most complex. Also, use Kolmogorov complexity, not Shannon entropy, to measure complexity." I'll keep that in mind, but I think I'd want to look at other methods. I think something more carefully designed will need to be made for practical ML. I'll take a closer look at those evolutionary methods I mentioned.
I also suspect that some variation of a bandit algorithm may benefit from these ideas. The paper Information-gain computation and its follow-ups described a hypothetical system which uses a contextual bandit for exploring a logic-program-like search space. The expert that the bandit uses simply calculates the Kullback–Leibler divergence, a statistical entropy metric, of the history for each branch to give recommendations. The idea is that desirable histories should maximize the information gained about our goal as we go down them. A better system might use a K approximation rather than an entropy measure. Though, the paper also suggests using an RNN to do the history compression, which I'd expect to give a lower value than the entropy since it can exploit the temporal structure of the history for compression in a way that statistical compression can't.
...
THE END
Are aircraft carriers super vulnerable in a modern war?
It seems like aircraft carriers are a good candidate for something that only exists because it made sense in WWII and it looks the part of "impressive military asset", i.e. it's all larping at this point. A carrier seems vulnerable to attack relative to its huge cost because offense has an advantage over defense: can't an enemy send tons of cheap drone planes and drone submarines to first hunt for its location and then swarm-attack it?
Note: I don’t know anything about this subject. The ideal answerer is someone with domain knowledge who has good epistemology; that’s why I wasn’t satisfied with a Google search and I’m asking here.
I'm Voting For Ranked Choice, But I Don't Like It
This fall, Ranked Choice / Instant Runoff Voting (IRV) will be on the ballot in Massachusetts. I'm voting for it, but only because it's better than the status quo, not because I think it's a very good voting system.
Massachusetts currently uses traditional plurality ("first past the post") voting: whoever gets the most votes wins. Unfortunately, this only works well when you have two candidates. With more candidates, allied candidates tend to hurt each other by competing for the same pool of votes, making it more likely that an opponent wins.
In IRV each voter lists their preferred candidates in order, and if your first choice is eliminated then your vote goes to your next favorite. This mostly fixes the problem of minor spoiler candidates: anyone who is not a serious contender will get eliminated and their votes redistributed.
Unfortunately, IRV has major problems when you have more than two serious candidates. For example, even if there is a candidate that a majority of voters prefer to every other, they can still lose if their competitors happen to be eliminated in the wrong order. In Why Ranked Choice Voting Isn't Great I give examples of realistic situations in which IRV can give poor results.
While every voting method has cases it handles poorly, some are better than others. One attempt to compare them is called Voter Satisfaction Efficiency (more details). The idea is, you run a large number of simulations and see how different methods perform. It turns out that IRV does very poorly here, and if voters are highly strategic IRV does even worse than traditional plurality voting.
While I wish the voting method for us to consider were Approval (or maybe 3-2-1 or STAR), I do still think IRV is better than what we have today, and I'm planning on voting for it. One specific way in which IRV is an improvement is that its failings mostly don't benefit one type of party. This means that if we switch to IRV, and then as third-party candidates become stronger contenders we start to run into IRV's problems, it should be politically practical to switch to a better system. I do think there is some risk of setting back alternative voting systems in general by implementing an inferior version, but on balance I think the benefit of fixing the "minor spoiler" problem is likely larger.
Comment via: facebook
"Hammertime" Bug-Hunting Marathon. Week Three
Academic Meetup
Where is human level on text prediction? (GPTs task)
I look at graphs like these (from the GPT-3 paper), and I wonder where human-level is:
Gwern seems to have the answer here:
GPT-2-1.5b had a cross-entropy validation loss of ~3.3 (based on the perplexity of ~10 in Figure 4, and log2(10) = 3.32).
{display: tablerow} .mjxmtd {display: tablecell; textalign: center} .mjxlabel {display: tablerow} .mjxbox {display: inlineblock} .mjxblock {display: block} .mjxspan {display: inline} .mjxchar {display: block; whitespace: pre} .mjxitable {display: inlinetable; width: auto} .mjxrow {display: tablerow} .mjxcell {display: tablecell} .mjxtable {display: table; width: 100%} .mjxline {display: block; height: 0} .mjxstrut {width: 0; paddingtop: 1em} .mjxvsize {width: 0} .MJXcspace1 {marginleft: .167em} .MJXcspace2 {marginleft: .222em} .MJXcspace3 {marginleft: .278em} .mjxtest.mjxtestdisplay {display: table!important} .mjxtest.mjxtestinline {display: inline!important; marginright: 1px} .mjxtest.mjxtestdefault {display: block!important; clear: both} .mjxexbox {display: inlineblock!important; position: absolute; overflow: hidden; minheight: 0; maxheight: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjxtestinline .mjxleftbox {display: inlineblock; width: 0; float: left} .mjxtestinline .mjxrightbox {display: inlineblock; width: 0; float: right} .mjxtestdisplay .mjxrightbox {display: tablecell!important; width: 10000em!important; minwidth: 0; maxwidth: none; padding: 0; border: 0; margin: 0} .MJXcTeXunknownR {fontfamily: monospace; fontstyle: normal; fontweight: normal} .MJXcTeXunknownI {fontfamily: monospace; fontstyle: italic; fontweight: normal} .MJXcTeXunknownB {fontfamily: monospace; fontstyle: normal; fontweight: bold} .MJXcTeXunknownBI {fontfamily: monospace; fontstyle: italic; fontweight: bold} .MJXcTeXamsR {fontfamily: MJXcTeXamsR,MJXcTeXamsRw} .MJXcTeXcalB {fontfamily: MJXcTeXcalB,MJXcTeXcalBx,MJXcTeXcalBw} .MJXcTeXfrakR {fontfamily: MJXcTeXfrakR,MJXcTeXfrakRw} .MJXcTeXfrakB {fontfamily: MJXcTeXfrakB,MJXcTeXfrakBx,MJXcTeXfrakBw} .MJXcTeXmathBI {fontfamily: MJXcTeXmathBI,MJXcTeXmathBIx,MJXcTeXmathBIw} .MJXcTeXsansR {fontfamily: MJXcTeXsansR,MJXcTeXsansRw} .MJXcTeXsansB {fontfamily: MJXcTeXsansB,MJXcTeXsansBx,MJXcTeXsansBw} .MJXcTeXsansI 
{fontfamily: MJXcTeXsansI,MJXcTeXsansIx,MJXcTeXsansIw} .MJXcTeXscriptR {fontfamily: MJXcTeXscriptR,MJXcTeXscriptRw} .MJXcTeXtypeR {fontfamily: MJXcTeXtypeR,MJXcTeXtypeRw} .MJXcTeXcalR {fontfamily: MJXcTeXcalR,MJXcTeXcalRw} .MJXcTeXmainB {fontfamily: MJXcTeXmainB,MJXcTeXmainBx,MJXcTeXmainBw} .MJXcTeXmainI {fontfamily: MJXcTeXmainI,MJXcTeXmainIx,MJXcTeXmainIw} .MJXcTeXmainR {fontfamily: MJXcTeXmainR,MJXcTeXmainRw} .MJXcTeXmathI {fontfamily: MJXcTeXmathI,MJXcTeXmathIx,MJXcTeXmathIw} .MJXcTeXsize1R {fontfamily: MJXcTeXsize1R,MJXcTeXsize1Rw} .MJXcTeXsize2R {fontfamily: MJXcTeXsize2R,MJXcTeXsize2Rw} .MJXcTeXsize3R {fontfamily: MJXcTeXsize3R,MJXcTeXsize3Rw} .MJXcTeXsize4R {fontfamily: MJXcTeXsize4R,MJXcTeXsize4Rw} .MJXcTeXvecR {fontfamily: MJXcTeXvecR,MJXcTeXvecRw} .MJXcTeXvecB {fontfamily: MJXcTeXvecB,MJXcTeXvecBx,MJXcTeXvecBw} @fontface {fontfamily: MJXcTeXamsR; src: local('MathJax_AMS'), local('MathJax_AMSRegular')} @fontface {fontfamily: MJXcTeXamsRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_AMSRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_AMSRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_AMSRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXcalB; src: local('MathJax_Caligraphic Bold'), local('MathJax_CaligraphicBold')} @fontface {fontfamily: MJXcTeXcalBx; src: local('MathJax_Caligraphic'); fontweight: bold} @fontface {fontfamily: MJXcTeXcalBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_CaligraphicBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_CaligraphicBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_CaligraphicBold.otf') format('opentype')} @fontface {fontfamily: 
MJXcTeXfrakR; src: local('MathJax_Fraktur'), local('MathJax_FrakturRegular')} @fontface {fontfamily: MJXcTeXfrakRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXfrakB; src: local('MathJax_Fraktur Bold'), local('MathJax_FrakturBold')} @fontface {fontfamily: MJXcTeXfrakBx; src: local('MathJax_Fraktur'); fontweight: bold} @fontface {fontfamily: MJXcTeXfrakBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_FrakturBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_FrakturBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_FrakturBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathBI; src: local('MathJax_Math BoldItalic'), local('MathJax_MathBoldItalic')} @fontface {fontfamily: MJXcTeXmathBIx; src: local('MathJax_Math'); fontweight: bold; fontstyle: italic} @fontface {fontfamily: MJXcTeXmathBIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathBoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathBoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathBoldItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansR; src: local('MathJax_SansSerif'), local('MathJax_SansSerifRegular')} @fontface {fontfamily: MJXcTeXsansRw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansB; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerifBold')} @fontface {fontfamily: MJXcTeXsansBx; src: local('MathJax_SansSerif'); fontweight: bold} @fontface {fontfamily: MJXcTeXsansBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsansI; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerifItalic')} @fontface {fontfamily: MJXcTeXsansIx; src: local('MathJax_SansSerif'); fontstyle: italic} @fontface {fontfamily: MJXcTeXsansIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_SansSerifItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_SansSerifItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_SansSerifItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXscriptR; src: local('MathJax_Script'), local('MathJax_ScriptRegular')} @fontface {fontfamily: MJXcTeXscriptRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_ScriptRegular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_ScriptRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_ScriptRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXtypeR; src: local('MathJax_Typewriter'), local('MathJax_TypewriterRegular')} @fontface {fontfamily: MJXcTeXtypeRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_TypewriterRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_TypewriterRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_TypewriterRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXcalR; src: local('MathJax_Caligraphic'), local('MathJax_CaligraphicRegular')} @fontface {fontfamily: MJXcTeXcalRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_CaligraphicRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_CaligraphicRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_CaligraphicRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainB; src: local('MathJax_Main Bold'), local('MathJax_MainBold')} @fontface {fontfamily: MJXcTeXmainBx; src: local('MathJax_Main'); fontweight: bold} @fontface {fontfamily: MJXcTeXmainBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainBold.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainI; src: local('MathJax_Main 
Italic'), local('MathJax_MainItalic')} @fontface {fontfamily: MJXcTeXmainIx; src: local('MathJax_Main'); fontstyle: italic} @fontface {fontfamily: MJXcTeXmainIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmainR; src: local('MathJax_Main'), local('MathJax_MainRegular')} @fontface {fontfamily: MJXcTeXmainRw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MainRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MainRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MainRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXmathI; src: local('MathJax_Math Italic'), local('MathJax_MathItalic')} @fontface {fontfamily: MJXcTeXmathIx; src: local('MathJax_Math'); fontstyle: italic} @fontface {fontfamily: MJXcTeXmathIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_MathItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_MathItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_MathItalic.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize1R; src: local('MathJax_Size1'), local('MathJax_Size1Regular')} @fontface {fontfamily: MJXcTeXsize1Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size1Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size1Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size1Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize2R; src: local('MathJax_Size2'), local('MathJax_Size2Regular')} @fontface {fontfamily: MJXcTeXsize2Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size2Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size2Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size2Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize3R; src: local('MathJax_Size3'), local('MathJax_Size3Regular')} @fontface {fontfamily: MJXcTeXsize3Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size3Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size3Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size3Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXsize4R; src: local('MathJax_Size4'), local('MathJax_Size4Regular')} @fontface {fontfamily: MJXcTeXsize4Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_Size4Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_Size4Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_Size4Regular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecR; src: local('MathJax_Vector'), local('MathJax_VectorRegular')} @fontface {fontfamily: MJXcTeXvecRw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorRegular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorRegular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorRegular.otf') format('opentype')} @fontface {fontfamily: MJXcTeXvecB; src: local('MathJax_Vector Bold'), local('MathJax_VectorBold')} @fontface {fontfamily: MJXcTeXvecBx; src: local('MathJax_Vector'); fontweight: bold} @fontface {fontfamily: MJXcTeXvecBw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/eot/MathJax_VectorBold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/woff/MathJax_VectorBold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTMLCSS/TeX/otf/MathJax_VectorBold.otf') format('opentype')} ). GPT3 halved that loss to ~1.73 judging from Brown et al 2020 and using the scaling formula (2.57⋅(3.64⋅103)−0.048). For a hypothetical GPT4, if the scaling curve continues for another 3 orders or so of compute (100–1000×) before crossing over and hitting harder diminishing returns, the crossentropy loss will drop, using to ~1.24 (2.57⋅(3.64⋅106)−0.048).
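The arithmetic in the quote above is easy to check directly. A minimal sketch, assuming the quoted power-law fit L(C) = 2.57 · C^(−0.048) with compute C measured in the units Gwern uses (where GPT-3 sits at ~3.64·10³):

```python
import math

def loss_from_perplexity(ppl):
    # Cross-entropy in bits per token is log base 2 of perplexity.
    return math.log2(ppl)

def scaling_loss(compute, a=2.57, b=-0.048):
    # Quoted power-law fit: L(C) = a * C**b.
    return a * compute ** b

print(round(loss_from_perplexity(10), 2))  # GPT-2's ~3.32 bits
print(round(scaling_loss(3.64e3), 2))      # GPT-3's ~1.73
print(round(scaling_loss(3.64e6), 2))      # hypothetical GPT-4: ~1.24 after +3 orders of compute
```

Note how flat the exponent is: three extra orders of magnitude of compute only moves the loss from ~1.73 to ~1.24, which is the "data hungry" scaling behavior the post is arguing about.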
If GPT-3 gained so much meta-learning and world knowledge by dropping its absolute loss ~50% when starting from GPT-2's near-human level, what capabilities would another ~30% improvement over GPT-3 gain? What would a drop to ≤1, perhaps using wider context windows or recurrency, gain?
So, am I right in thinking that if someone took random internet text and fed it to me word by word and asked me to predict the next word, I'd do about as well as GPT-2 and significantly worse than GPT-3? If so, this actually lengthens my timelines a bit.
(Thanks to Alexander Lyzhov for answering this question in conversation)