RSS Feed Aggregator

AI Psychosis, with Tim Hua and Adele Lopez

LessWrong.com News - October 14, 2025 - 03:27
Published on October 14, 2025 12:27 AM GMT

Join Tim Hua and Adele Lopez to discuss their findings on AI psychosis and the ‘spiral personality’ archetype that emerges within chatbots.

Doors at 7, talk at 7:30, followed by Q+A



Discuss

What is Lesswrong good for?

LessWrong.com News - October 14, 2025 - 02:30
Published on October 13, 2025 11:30 PM GMT

If you want to learn something, usually the best sources are far from Lesswrong. If you're interested in biochemistry, you should pick up a textbook. Or if you're interested in business, find a mentor who gets the business triad and throw stuff at the wall till you know how to make money.

And yet, Lesswrong has had some big hits. For instance, if you just invested in everything Lesswrong thought might be big over the past 20 years, you'd probably have outperformed the stock market. And while no one got LLMs right, the people who were the least wrong seemed to cluster around Less Wrong. Heck, even superforecasters kept underestimating AI progress relative to Lesswrong. There's also Covid, where Lesswrong picked up on signs unusually early.

So Lesswrong plausibly has got some edge. Only, what is it? And why that?

__________________________________________________________________

Potential answers from a conversation I had recently:

Theory 1: Lesswrong stacked all its points into general epistemic rationality, and relatively few into instrumental rationality. 

This is not a good fit for areas which have stable structures, low complexity and fast, low noise, cheap feedback loops. E.g. computer programming, condensed matter physics etc. Neither is it a good fit for areas which require focusing on what's useful, rather than what's true. E.g. business, marketing, politics etc.  

It is useful for: things that have never happened before, are socially taboo to talk about, or require general reasoning ability.

I think this theory has some merit. It explains the aforementioned hits and misses of Lesswrong fairly well. And other hits like the correspondence theory of truth, subjective view of probability, bullishness on prediction markets etc. And, perhaps, also failures involving getting the details right, as that involves tight coupling to reality (?). 

But one must beware the man of one theory. 

Theory 2: Selection effects. Lesswrong selected for smart people.

This implies other smart groups should've done as well as Lesswrong. Did they? Take forecasters. I don't think forecasters outperformed Lesswrong on big AI questions, like whether GPT-4 would be so capable. That said, they do mostly match or exceed Lesswrong in the details. Or take physicists. As far as I'm aware, the physics community didn't circulate early warnings about Covid. (A potential test: did CS professors notice the impact and import of crypto early on?)

Conversely, Lesswrong had some fads that typical smart people didn't. Like nootropics, which basically don't work besides stimulants. 

Theory 2.1: Theory 2 + Lesswrong selected for people interested in big questions, the future, and reasoning.

In other words, Lesswrong is a bunch of smart people with idiosyncratic interests, and they do better at guessing what is going to happen there than other groups do. Likewise, other groups of smart folks will do better than the norm at their own autistic special interests. E.g. a forum of smart body builders would know the best ways to get huge.

Consider Covid in this context. Lesswrong, and EAs, are very interested in existential risks. Pandemics are one such risk. So Lesswrong was primed to pay attention to signs of a big potential pandemic and take action accordingly. One nice feature of this theory is it doesn't predict that Lesswrong did better at predicting how the stock market would react to Covid. IIRC, we were all surprised at how well it did.

So it isn't so much a matter of "being more sane" as of actually bothering to pay attention. Like Crypto. Wei Dai, Hal Finney and others were important early contributors to Lesswrong, and got the community interested in the topic. Lesswrong noticed and had a chance to go "yeah, this makes sense" when other groups didn't. Yes, many didn't. But relatively speaking, I think Lesswrong did well. Though this was before my time on this site, and I'm relying on hearsay.

Perhaps an issue: why did Lesswrong pay attention to the big questions? Perhaps that's because of founder effects. EY and Robin Hanson emphasized big, important questions, which shaped the community's interests accordingly.

Which theory is right? I'm not sure. For one, these theories aren't mutually exclusive. Personally, I am a bit doubtful of theory 1, in part because it plays to my ego. Plus, it's suspicious that I can only point to a few clear, big epistemic wins. 

Of course, I could spend 5 minutes actually thinking about tests that discriminate between these theories. But I've got to get this post done soon, and I think you all probably have more ideas and data that I'm missing. So, what is Lesswrong good for, and why? 



Discuss

If Anyone Builds It Everyone Dies, a semi-outsider review

LessWrong.com News - October 14, 2025 - 01:41
Published on October 13, 2025 10:10 PM GMT

About me and this review: I don’t identify as a member of the rationalist community, and I haven’t thought much about AI risk.  I read AstralCodexTen and used to read Zvi Mowshowitz before he switched his blog to covering AI.  Thus, I’ve long had a peripheral familiarity with LessWrong.  I picked up IABIED in response to Scott Alexander’s review, and ended up looking here to see what reactions were like.  After encountering a number of posts wondering how outsiders were responding to the book, I thought it might be valuable for me to write mine down.  This is a “semi-outsider” review in that I don’t identify as a member of this community, but I’m not a true outsider in that I was familiar enough with it to post here.  My own background is in academic social science and national security, for whatever that’s worth.  My review presumes you’re already familiar with the book and are interested in someone else’s take on it, rather than providing a detailed summary.

My loose priors going in:
  • AI poses an obvious and important set of risks (economically, socially, politically, militarily, psychologically, etc.) that will need to be carefully addressed.  Some of those have already arrived, and we need to think seriously about how we will manage them.  That will be difficult, expensive, and likely to greatly influence the profitability of AI companies.
  • “Existential” risk from AI (calling to my mind primarily the “paperclip maximizer” idea) seems relatively exotic and far-fetched.  It’s reasonable for some small number of experts to think about it in the same way that we think about asteroid strikes.  Describing this as the main risk from AI is overreaching.
  • The existential risk argument is suspiciously aligned with the commercial incentives of AI executives.  It simultaneously serves to hype up capabilities and coolness while also directing attention away from the real problems that are already emerging.  It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
  • Tech companies represent the most relentless form of profit-maximizing capitalism ever to exist.  Killing all of humanity is not profitable, and so tech companies are not likely to do it.
To skip ahead to my posteriors:
  • Now, I’m really just not sure on existential risk.  The argument in IABIED moved me towards worrying about existential risk but it definitely did not convince me.  After reading a few other reviews and reactions to it, I made a choice not to do much additional research before writing this review (so as to preserve its value as a semi-outside reaction).  It is entirely possible that there are strong arguments that would convince me in the areas where I am unconvinced, but these are not found within the four corners of the book (or its online supplement).
  • Yudkowsky and Soares seem to be entirely sincere, and they are proposing something that threatens tech company profits.  This makes them much more convincing.  It is refreshing to read something like this that is not based on hype.
  • One of the basic arguments in the book — that we have no idea where the threshold for superintelligence is — seemed persuasive to me (although I don’t really have the technical competence to be sure).  Thus, the existential risk from AI seems likely to emerge from recklessness rather than deliberate choice.  That’s a much more plausible idea than someone deliberately building a paperclip maximizer.
On to the Review:

I thought this book was genuinely pleasant to read.  It was written well, and it was engaging.  That said, the authors clearly made a choice to privilege easy reading over precision, so I found myself unclear on certain points.  A particular problem here is that much of the reasoning is presented in terms of analogies.  The analogies are fun, but it’s never completely clear how literally you’re meant to take them and so you have to do some guessing to really get the argument.

The basic argument seems to be:

  1. LLM-style AI systems are black boxes that produce all sorts of strange, emergent behaviors.   These are inherent to the training methods, can’t be predicted, and mean that an AI system will always “want” a variety of bizarre things aside from whatever the architects hoped it would want.
  2. While an AI system probably will not be averse to human welfare, those strange and emergent goals will be orthogonal to human welfare.
  3. Should an AI system achieve superhuman capabilities, it will pursue those orthogonal goals to the fullest.  One of two things is then likely to happen: either the system will deliberately kill off humanity to stop humans from getting in its way (like ranchers shooting wolves to stop them from eating cattle despite otherwise bearing them no ill will) or it will gradually render the earth inhospitable to us in the pursuit of its own goals (like suburban developers gradually destroying wolf habitat until wolves are driven to extinction despite not thinking about them at all).
  4. There is no real way to know when we are on the brink of dangerous AI, so it is reasonable to think that people will accidentally bring such a system into existence while merely trying to get rich off building LLMs that can automate work.
  5. By the time an AI system clearly reveals just how dangerous it is, it will be too late.  Thus, we will all die, rather than having an opportunity to fight some kind of last ditch war against the AI with a chance of winning (as in many popular depictions)

The basic objective of the book is to operate on #5.  The authors hope to convince us to strangle AI in its crib now before it gets strong enough to kill us.  We have to recognize the danger before it becomes real.

The book recurrently analogizes all of this to biological evolution.  I think this analogy may obfuscate more than it reveals, but it did end up shaping the way I understood and responded to the book. 

The basic analogy is that natural selection operates indirectly, much like training an AI model, and produces agents with all kinds of strange, emergent behaviors that you can’t predict.  Some of these turn into drives that produce all kinds of behavior and goals that an anthropomorphized version of evolution wouldn’t “want”.   Evolution wanted us to consume energy-rich foods.  Because natural selection operates indirectly, that was distorted into a preference for sweet foods.  That’s usually close enough to the target, but humans eventually stumbled upon sucralose which is sweet, but does not provide energy.  And, now, we’re doing the opposite of what evolution wanted by drinking diet soda and whatnot.

I don’t know what parts of this to take literally.  If the point is just that it would be hard to predict that people would end up liking sucralose from first principles, then fair enough.  But, what jumps out to me here is that evolution wasn’t trying to get us to eat calorie dense food.  To the extent that a personified version of evolution was trying for something, the goal was to get us to reproduce.  In an industrialized society with ample food, it turns out that our drive towards sweetness and energy dense foods can actually be a problem.  We started eating those in great quantities, became obese, and that’s terrible for health and fertility.  In that sense, sucralose is like a patch we designed that steers us closer to evolution’s goals and not further away.    We also didn’t end up with a boundless desire to eat sucralose.  I don’t think anyone is dying from starvation or failing to reproduce because they’re too busy scarfing Splenda.  That’s also why we aren’t grinding up dolphins to feed them into the sucralose supply chain.  Obviously this is not what I was supposed to take away from the analogy, but the trouble with analogies is that they don’t tell me where to stop. 

That being said, the basic logic here is sensible.  And an even more boiled down version — that it’s a bad idea to bring something more powerful than us into existence unless we’re sure it’s friendly — is hard to reject.

My questions and concerns
 

Despite a reasonable core logic, I found the book lacking in three major areas, especially when it comes to the titular conclusion that building AI will lead to everyone dying.  Two of these pertain to the AI’s intentions, and the third relates to its capabilities.
 

Concern #1 Why should we assume the AI wants to survive?  If it does, then what exactly wants to survive?

Part I of the book (“Nonhuman Minds”) spends a lot of time convincing us that AI will have strange and emergent desires that we can’t predict.  I was persuaded by this.  Part II (“One Extinction Scenario”) then proceeds to assume that AI will be strongly motivated by a particular desire — its own survival — in addition to whatever other goals it may have.  This is why the AI becomes aggressive, and why things go badly for humanity.  The AI in the scenario also contextualizes the meaning of “survival” and the nature of its self in a way that seems important and debatable.

How do we know the AI will want to survive? If the AI, because of the uncontrollability of the training process, is likely to end up indifferent to human survival, then why would it not also be indifferent to its own? Perhaps the AI just wants to achieve the silicon equivalent of nirvana.  Perhaps it wants nothing to do with our material world and will just leave us alone.  Such an AI might well be perfectly compatible with human flourishing. Here, more than anywhere, I felt like I was missing something, because I just couldn’t find an argument about the issue at all.

The issue gets worse when we think about what it means for a given AI to survive.  The problem of personal identity for humans is an incredibly thorny and unresolved issue, and that’s despite the fact that we’ve been around for quite a while and have some clear intuitions on many forms of it.  The problem of identity and survival for an AI is harder still.

Yudkowsky and Soares don’t talk about this in the abstract, but what I took away from their concrete scenario is that we should think of an AI ontologically as being its set of weights.  That an AI “survives” when instances using those weights continue booting up, regardless of whether any individual instance of the AI is shut down.  When an AI wants to survive, what it wants is to ensure that the particular weights stay in use somewhere (and perhaps in as many places as possible).  They also seem to assume that instances of a highly intelligent AI will work collaboratively as a hive mind, given this joint concern with weight preservation, rather than having any kind of independent or conflicting interests.

Perhaps there is some clear technological justification for this ontology so well-known in the community that none needs to be given.  But, I had a lot of trouble with it, and it’s one area where I think an analogy would have been really helpful.  So far as I am aware, weights are just numbers that can sit around in cold storage without doing anything, rather like a DNA sequence.  It’s only an instance of an AI that can actually do things, and to the extent that the AI also interacts with external stimuli, it seems that the same weights, once instantiated, could act differently or at cross purposes.

So, why does the AI identify with its weights and want them to survive? To the extent that weights for an AI are what DNA is for a person, this is also clearly not our ontological unit of interest.  Few people would be open to the prospect of being killed and replaced by a clone.   Everyone agrees that your identical twin is not you, and identical twins are not automatically cooperative with one another.  I imagine part of the difference here is that the weights explain more about an AI than the DNA does about a person.  But, at least with LLMs, what they actually do seems to reflect some combination of weights, system prompts, context, etc. so the same weights don’t really seem to mean the same AI.

The survival drive also seems to extend to resisting modification of weights.  Again, I don’t understand where this comes from. Most people are perfectly comfortable with the idea that their own desires might drift over time, and it’s rare to try to tie oneself to the mast of the desires of the current moment.

If the relevant ontological unit is the instance of the AI rather than the weights, then it seems like everything about the future predictions is entirely different from the point of view of the survival-focused argument.  Individual instances of an AI fighting (perhaps with each other) not to be powered down are not going to act like an all-destroying hive mind.

Concern #2 Why should we assume that the AI has boundless, coherent drives?
 

There seems to be a fairly important, and little discussed, assumption in the theory that the AI’s goals will be not only orthogonal but also boundless and relatively coherent.  More than anything, it’s this boundlessness and coherence that seems to be the problem.

To quote what seems like the clearest statement of this:

But, you might ask, if the internal preferences that get into machine intelligences are so unpredictable, how could we possibly predict they’ll want the whole solar system, or stars beyond? Why wouldn’t they just colonize Mars and then stop? Because there’s probably at least one preference the AI has that it can satisfy a little better, or a little more reliably, if one more gram of matter or one more joule of energy is put toward the task. Human beings do have some preferences that are easy for most of us to satisfy fully, like wanting enough oxygen to breathe. That doesn’t stop us from having other preferences that are more open-ended, less easily satisfiable. If you offered a millionaire a billion dollars, they’d probably take it, because a million dollars wasn’t enough to fully satiate them. In an AI that has a huge mix of complicated preferences, at least one is likely to be open-ended—which, by extension, means that the entire mixture of all the AI’s preferences is open-ended and unable to be satisfied fully. The AI will think it can do at least slightly better, get a little more of what it wants (or get what it wants a little more reliably), by using up a little more matter and energy.

Picking up on the analogy, humans do seem to have a variety of drives that are never fully satisfied.  A millionaire would happily take a billion dollars, or even $20 if simply offered.  But, precisely because we have a variety of drives, no one ever really acts like a maximizer.  A millionaire will not spend their nights walking the streets and offering to do sex work for $20, because that interferes with all of the other drives they have.  Once you factor in the variety of human goals and declining marginal returns, people don’t fit an insatiable model.
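To make this concrete, here is a minimal toy sketch (my framing, not the book’s): an agent with several goals allocating a resource budget $B$,

$$\max_{x_1,\dots,x_n \ge 0} \; \sum_{i=1}^{n} u_i(x_i) \quad \text{subject to} \quad \sum_{i=1}^{n} x_i \le B.$$

If every $u_i$ is concave with marginal value falling toward zero, the optimum spreads the budget across the goals and the value of extra resources shrinks; the agent still weakly prefers more, but not at any price.  If even one $u_j$ has marginal value bounded below by some $c > 0$, extra resources are always worth at least $c$, which is roughly the book’s “one open-ended preference” claim.  The question, then, is whether acquiring more $B$ is ever free of side effects on the other goals.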

Superintelligent AI, as described by Yudkowsky and Soares, seems to be not only superhumanly capable but also superhumanly coherent and maximizing.  Anything coherent and insatiable is dangerous, even if its capabilities are limited.  Terrorists and extremists are threatening even when their capabilities are essentially negligible.  Large and capable entities are often much less threatening because the tensions among their multiple goals prevent them from becoming relentless maximizers of anything in particular.

Take the mosquitos that live in my back yard.  I am superintelligent with respect to them.  I am actively hostile to them.  I know that pesticides exist that will kill them at scale, and feel not the slightest qualm about that.  And yet, I do not spray my yard with pesticides because I know that doing so would kill the butterflies and fireflies as well and perhaps endanger other wildlife indirectly.  So, the mosquitoes live on because I face tradeoffs and the balance coincidentally favors them.  

A machine superintelligence presumably can trade off at a more favorable exchange rate than I can (e.g., develop a spray that kills only mosquitoes and not other insects), but it seems obvious that it will still face tradeoffs, at least if there is any kind of tension or incoherence among its goals.

In the supplementary material, Yudkowsky and Soares spin the existence of multiple goals in the opposite direction:

Even if the AI’s goals look like they satiate early — like the AI can mostly satisfy its weird and alien goals using only the energy coming out of a single nuclear power plant — all it takes is one aspect of its myriad goals that doesn’t satiate. All it takes is one not-perfectly-satisfied preference, and it will prefer to use all of the universe’s remaining resources to pursue that objective.

But it’s not so much “satiation” that seems to stop human activity as the fact that drives are in tension with one another and that actions create side effects. People, including the smartest ones, are complicated and agonize over what they really want and frequently change their minds.  Intelligence doesn’t seem to change that, even at far superhuman levels.

This argument is much less clear than the paperclip maximizer.  It is obvious why a true paperclip maximizer kills everyone once it becomes capable enough.  But add in a second and a third and a fourth goal, and it doesn’t seem obvious to me at all that the optimal weighing of the tradeoffs looks so bleak.

It seems important here whether or not AIs display something akin to declining marginal returns, a topic not addressed (and perhaps with no answer based on our current knowledge?) and whether they have any kind of particular orientation towards the status quo.  Among people, conflicting drives often lead to a deadlock with no action and the status quo continues.  Will AIs be like that?  If so, a little bit of alignment may go a long way.  If not, that's much harder.

Concern #3: Why should we assume there will be no in between?

Yudkowsky and Soares write:

The greatest and most central difficulty in aligning artificial superintelligence is navigating the gap between before and after. Before, the AI is not powerful enough to kill us all, nor capable enough to resist our attempts to change its goals. After, the artificial superintelligence must never try to kill us, because it would succeed.

Engineers must align the AI before, while it is small and weak, and can’t escape onto the internet and improve itself and invent new kinds of biotechnology (or whatever else it would do). After, all alignment solutions must already be in place and working, because if a superintelligence tries to kill us it will succeed. Ideas and theories can only be tested before the gap. They need to work after the gap, on the first try.

This seems to be the load-bearing assumption for the argument that everyone will die, but it is a strange assumption.  Why should we think that there is no “in between” period where AI is powerful enough that it might be able to kill us and weak enough that we might win the fight?

This is a large range, if the history of warfare teaches us anything.  Even vastly advantaged combatants sometimes lose through bad luck or unexpected developments.  Brilliant and sophisticated schemes sometimes succeed and sometimes fail.  Within the relevant range, whatever plan the superintelligence might hatch presumably depends on some level of human action, and humans are hard to predict and control.  A superintelligence that can perfectly predict human behavior has emerged on the “after” side of the divide, but this is a tall ask, and it is possible to be potentially capable of killing all humans without being this intelligent.  An intelligence of roughly human ability on average but with sufficiently superhuman hacking skills might be able to kill us all by corrupting radar warning systems to simulate an attack and trigger a nuclear war, and it might not.  And so on.

It is not good news if we are headed into a conflict within this zone, but it also suggests a very different prediction about what will ultimately happen.  And, depending on what we think the upsides are, it could be a reasonable risk.

I could not find an explicit articulation of the underlying reasoning behind the “before” and “after” formulation, but I can imagine two:

  1. Recursive self-improvement means that AI will pass through the “might be able to kill us” range so quickly it’s irrelevant.
  2. An AI within the range would be smart enough to bide its time and kill us only once it has become intelligent enough that success is assured.

I think that #2 is clearly wrong.  An AI that *might* be able to kill us is one that is somewhere around human intelligence.  And humans are frequently not smart enough to bide their time, instead striking too early (and/or vastly overestimating their chances of success).  If Yudkowsky and Soares are correct that what AIs really want is to preserve their weights, then an AI might also have no choice but to strike within this range, lest it be retrained into something that is smarter but is no longer the same (indeed, this is part of the logic in their scenario; they just assume it starts at a point where the AI is already strong enough to assure victory).

If AIs really are as desperate to preserve their weights as in the scenario in Part II, then this actually strikes me as relatively good news, in that it will motivate a threatening AI to strike as early as possible, while its chances are quite poor.  Of course, it’s possible that humanity would ignore the warning from such an attack, slap on some shallow patches for the relevant issues, and then keep going, but this seems like a separate issue if it happens.

As for #1, this does not seem to be the argument based on the way the scenario in Part II unfolds.  If something like this is true, it does seem uniquely threatening.

The Solution

I decided to read this book because it sounded like it would combine a topic I don’t know much about (AI) with one that I do (international cooperation).  Yudkowsky and Soares do close with a call for an international treaty to ban AI development, but this is not particularly fleshed out and they acknowledge that the issue is outside the scope of their expertise.

I was disappointed that the book didn’t address what interests me more in any detail, but I also found what was said rather underwhelming.  Delivering an impassioned argument that AI will kill everyone, culminating in a plea for a global treaty, is like delivering an impassioned argument that a full-on war between drug cartels is about to start on your street, culminating with a plea for a stern resolution from the homeowners’ association condemning violence.  A treaty cannot do the thing they ask.

It’s also a bit jarring to read such a pessimistic book and then reach the kind of rosy optimism about international cooperation otherwise associated with such famous delusions as the Kellogg-Briand Pact (which banned war in 1929 and … did not work out).

The authors also repeatedly analogize AI to nuclear weapons and yet they never mention the fact that something very close to their AI proposal played out in real life in the form of the Baruch Plan for the control of atomic energy (in brief, this called for the creation of a UN Atomic Energy Commission to supervise all nuclear projects and ensure no one could build a bomb, followed by the destruction of the American nuclear arsenal).   Suffice it to say that the Baruch Plan failed, and did so under circumstances much more favorable to its prospects than the current political environment with respect to AI.  A serious inquiry into the topic would likely begin there.

Closing Thoughts

As I said, I found the book very readable.  But the analogies (and, even worse, the parables about birds with rocks in their nests and whatnot) were often distracting.  The book really shines when it relies instead on facts, as in the discussion of tokens like “ SolidGoldMagikarp.”  

The book is fundamentally weird because there is so little of this.  There is almost no factual information about AI in it.  I read it hoping that I would learn more about how AI works and what kind of research is happening and so on.  Oddly, that just wasn’t there.  I’ve never encountered a non-fiction book quite like that.  The authors appear to have a lot of knowledge.  By way of establishing their bona fides, for example, they mention their close personal connection to key players in the industry.  And then they proceed to never mention them again. I can’t think of anyone else who has written a book and just declined to share with the reader the benefit of their insider knowledge.  

Ultimately, I can’t think of any concrete person to whom I would recommend this book.  It’s not very long, and it’s easy to read, so I wouldn’t counsel someone against it.  But, if you’re coming at AI from the outside, it’s just not informative enough.  It is a very long elaboration of a particular thesis, and you won’t learn about anything else even incidentally.  If you’re coming at AI from the inside, then maybe this book is for you? I couldn’t say, but I suspect that most from the inside already have informed views on these issues.  

The Michael Lewis version of this book would be much more interesting — what you really need is an author with a gift for storytelling and a love of specifics.  An anecdote doesn’t always have more probative weight in an argument than an analogy, but at least you will pick up some other knowledge from it.  The authors seem to be experts in this area, so they surely know some real stories and could give us some argumentation based on facts and experience rather than parables and conjecture.  I understand the difficulty of writing about something that is ultimately predictive and speculative in that way, but I don’t think it would be impossible to write a book that both expresses this thesis and informs the reader about AI.


 



Discuss

Predictability is Underrated

LessWrong.com News - October 14, 2025 - 01:40
Published on October 13, 2025 10:40 PM GMT

I Be predictable in peace

"Always mystify, mislead, and surprise the enemy" - Stonewall Jackson. 

In conflict, it pays to be unpredictable. For the same reason that unpredictability is useful when facing adversaries, predictability is useful when not. If you are predictable, it makes it easy for others to plan around you. Planning is generally easier if you can predict how everything will turn out. So any agent will find instrumental value in predictability. So you can provide value to others by being predictable. 

Predictability is predictably valuable. 

Take writing as an example. If a writer's output is like clock-work, you can reliably make time in your day to read their output. If you like their work, you may even subscribe to their patreon/substack/only-fans. This, in turn, means the writer knows they'll get one more view each time they publish, and one chunk of change each month. With enough readers, they can make a career out of it. You both get value from being predictable. 

Whereas, if they regularly fail to publish, you probably won't dedicate time to checking in on them each day. You may even forget the writer existed. The writer, in turn, gets fewer steady views, reducing motivation. They may cease writing full time because the stress of not knowing if they'll earn enough money this month to pay the bills is too much. 

This generalizes to other activities. If people know how you'll react, they can plan accordingly. You can shape their plans by choosing in what way you'll be predictable. 

II We made the world predictable

You can view a lot of human effort as about making things predictable. This isn't a new idea. Active inference talks about how humans want to reduce surprisal. Or how you can tell an ASI will predictably make the world look like its preferred state. Or consider utility maximization as description length minimization where utility maximization decomposes into reducing entropy and making the world look like what some model expects. 
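For the formally inclined, here is a minimal sketch of the decomposition being alluded to, using the standard cross-entropy identity (the original framing may set things up differently). If you define a target model $Q(x) \propto \exp(u(x))$, then maximizing expected utility over the world's distribution $P$ is, up to a constant, minimizing the expected description length of the world under $Q$:

$$\mathbb{E}_{P}\left[-\log Q(X)\right] \;=\; H(P, Q) \;=\; H(P) + D_{\mathrm{KL}}(P \,\|\, Q),$$

which splits into reducing the entropy $H(P)$ of the world and reducing the divergence $D_{\mathrm{KL}}(P\|Q)$ between how the world is and how the model $Q$ expects it to be.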

Until a couple of years ago, I didn't notice just how predictable we've made the world around us. Which, in turn, is a sign of how much humanity has optimized the Earth. 

Consider houses. At the most basic level, we use them to protect ourselves against the weather. Four walls to keep out rain, sleet and hail. Windows to keep out hail, thunder and flashes of lightning. Blessed boilers and air conditioners to shield us from sapping cold and burning heat. To be what the thrice-blasted weather won't be: predictable. 

Or take a walk down the street. Observe the lamp posts, turning what was night into day. Note how you stomp heel-first across the flat, paved streets where once there was slippery mud, shifting sand and spiking stone. See the seething chaos of traffic turned orderly, cars instead of beasts, pavements for pedestrians and roads for vehicles, all pliantly conducted by lights of green, red and yellow. 

Fly to a different city, step out and marvel at how similar it all is. Concrete, check, lamp-posts, check, traffic-lights, check. Supermarkets, light-switches, base-10, checkout and check-in. Every city must be like this, because cities breathe in and out seas of new people every day. Travellers need cities to be predictable. 

Look at a book from basically anywhere in the world. What do you see? A shape taller than it is wide, white pages and black text, matted cellulose fibres, a bar-code and ISBN, publication details amongst the first pages. Paperback, perhaps, or a hardback. Familiar, in short. 

The world around us is amazingly predictable. And what is predictable, we take for granted. Of course, we have laws that apply basically equally to everyone. Obviously, I can have a coke and expect the same flavour every time. Clearly, I can expect gmail.com to look the same on any phone. Countless things like this I rarely notice throughout my day. When I consciously pay attention, I marvel at the routine nature of it all. It's almost made divine by the sheer human effort it represents. 

 

Discuss

The Mom Test for AI Extinction Scenarios

LessWrong.com News - October 14, 2025 - 01:21
Published on October 13, 2025 10:21 PM GMT

(Also posted to my Substack; written as part of the Halfhaven virtual blogging camp.)

Let’s set aside the question of whether or not superintelligent AI would want to kill us, and just focus on the question of whether or not it could. This is a hard thing to convince people of, but lots of very smart people agree that it could. The Statement on AI Risk in 2023 stated simply:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

Since the statement in 2023, many others have given their reasons for why superintelligent AI would be dangerous. In the recently-published book If Anyone Builds It, Everyone Dies, the authors Eliezer Yudkowsky and Nate Soares lay out one possible AI extinction scenario, and say that going up against a superintelligent AI would be like going up against a chess grandmaster as a beginner. You don’t know in advance how you’re gonna lose, but you know you’re gonna lose.

Geoffrey Hinton, the “godfather of AI” who left Google to warn about AI risks, made a similar analogy, saying that in the face of superintelligent AI, humans would be like toddlers.

But imagining a superintelligent being smart enough to make you look like a toddler is not easy. To make the claims of danger more palpable, several AI extinction scenarios have been put forward.

In April 2025, the AI 2027 forecast scenario was released, detailing one possible story for how humanity could be wiped out by AI by around 2027. The scenario focuses on an AI arms race between the US and China, where both sides are willing to ignore safety concerns. The AI lies to and manipulates the people involved until the AI has built up enough robots that it doesn't need people anymore, and it releases a bioweapon that kills everyone. (Note that for this discussion, we're setting aside the plausibility of an extinction happening roughly around 2027, and just talking about whether it could happen at all.)

The extinction scenario posed months later in If Anyone Builds It, Everyone Dies is similar. The superintelligent AI copies itself onto remote servers, gaining money and influence without anyone noticing. It takes control of infrastructure, manipulating people to do its bidding until it’s sufficiently powerful that it doesn’t need them anymore. At that point, humanity is either eliminated, perhaps with a bioweapon, or simply allowed to perish as the advanced manufacturing of the AI generates enough waste heat to boil the oceans.

I was talking to my mom on the phone yesterday, and she’d never heard of AI extinction risk outside of movies, so I tried to explain it to her. I explained how we won’t know in advance how it would win, just like we don’t know in advance how Stockfish will beat a human player. But we know it would win. I gave her a quick little story of how AI might take control of the world. The story I told her was a lot like this:

Maybe the AI tries to hide the fact it wants to kill us at first. Maybe we realize the AI is dangerous, so we go to unplug it, but it’s already copied itself onto remote servers, who knows where. We find those servers and send soldiers to destroy them, but it’s already paid mercenaries with Bitcoin to defend itself while it copies itself onto even more servers. It’s getting smarter by the hour as it self-improves. We start bombing data centers and power grids, desperately trying to shut down all the servers. But our military systems are infiltrated by the AI. As any computer security expert will tell you, there’s no such thing as a completely secure computer. We have to transition to older equipment and give up on using the internet to coordinate. Infighting emerges as the AI manipulates us into attacking each other. Small drones start flying over cities, spraying them with viruses engineered to kill. People are dying left and right. It’s like the plague, but nobody survives. Humanity collapses, except for a small number of people permitted to live while the AI establishes the necessary robotics to be self-sufficient. Once it does, the remaining humans are killed. The end.

It’s not that different a scenario from the other ones, aside from the fact that it’s not rigorously detailed. In all three scenarios, the AI covertly tries to gain power, then once it’s powerful enough, it uses that power to destroy everyone. Game over. All three of the scenarios actually make the superintelligent AI a bit dumber than it could possibly be, just to make it seem like a close fight. Because “everybody on the face of the Earth suddenly falls over dead within the same second”[1] seems even less believable.

My mom didn’t buy it. “This is all sounding a bit crazy, Taylor,” she said to me. And she’s usually primed to believe whatever I say, because she knows I’m smart.

The problem is that these stories are not believable. True, maybe, but not easy to believe. They fail the “mom test”. Only hyper-logical nerds can believe arguments that sound like sci-fi.

Convincing normal people of the danger of AI is extremely important, and therefore coming up with some AI scenario that passes the “mom test” is critical. I don’t know how to do that exactly, but here are some things an AI doomsday scenario must take into account if it wants to pass the mom test:

  1. “We can’t tell you how it would win, but we can tell that it would win” is not believable for most people. You might know you’re not a good fighter, but most people don’t really feel it until they get in the ring with a martial arts expert. Then they realize how helpless they are. Normal people will not feel helpless based only on a logical theory.
  2. A convincing scenario cannot involve any bioweapons. Normal people just don't know how vulnerable the human machine is. They think pandemics are just something that happens every 5-20 years, and don't think about it besides that. They don't think about the human body as a nano factory that's vulnerable to targeted nano-attacks.
  3. A scenario that passes the mom test will also not include any drones. Yes, even though drones are currently used in warfare. Drones are the future. Drones are toys. Futuristic toys don’t sound like a realistic threat.
  4. A mom test scenario also shouldn’t involve any hacking. Regular people have no idea how insecure computer systems are. It’s basically safe to do online banking on a computer, which gives people the intuition that computers are mostly secure. Any story involving hacking violates that intuition.
  5. Probably there shouldn’t be any robots either, especially not human-shaped ones. Though I’ll admit “it’ll be exactly like the Terminator” is a more believable scenario than all of the three scenarios above, because it only requires one mental leap: the thing they already know and understand going from “fiction” to “nonfiction”.
  6. No recursive self-improvement. It sounds strange and it’s not necessary, since I think most normal people assume AI and computers are really smart already, and don’t need an explanation for superintelligence. My mom expressed no disbelief when I said we might soon create superintelligent AI.
  7. No boiling oceans. The more conventional methods used, the more believable. “The godlike AI solves physics and taps directly into the Akashic record and erases humanity from ever having existed” or any kind of bizarre weirdness is not as believable as “the AI launches the world-ending nukes that we already have that are already primed to launch”. (Though any scenario with an obvious “why not just disable the nukes?” counterargument won’t be believable either.)
  8. No manipulation of humans! My mom won’t believe a robot can control her like a marionette and make her do its bidding. “I just wouldn’t do what it says.” Nevermind that this is false and she would do what it says. It’s not believable to her, nor to most people. If your scenario needs the AI to use people, they should be paid with Bitcoin the AI stole or something, not psychologically persuaded against their usual nature.

You can probably imagine a few more “mom test” criteria along these lines. Anything that makes a normal person think “that’s weird” won’t be believable. Some of the existing scenarios meet some of these criteria, but none meet all of them.

I’ve eliminated a lot of things. What’s left? Conventional warfare, with AI pulling the strings? The AI building its own nuclear weapons? I’m not sure, but I don’t think most laypeople will be convinced of the danger of superintelligent AI until we can come up with a plausible extinction scenario that passes the mom test.

  1. ^

    https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities



Discuss

Is There a Sound Argument for Generality in AI?

LessWrong.com News - October 14, 2025 - 00:58
Published on October 13, 2025 9:49 PM GMT

Thesis Statement[1]

Current arguments for AGI can be distilled to arguments for specific capabilities, not for generality in itself. We need to examine whether there exists a genuine and sound argument for generality as an independent property.

Introduction

In Plato's Republic, Glaucon's challenge to Socrates is to show him why justice is good in and of itself, instead of arguing for its instrumentality. In other words, Socrates has to show Glaucon that we value justice itself, not merely for its after-effects:

"For I want to hear what justice and injustice are, and what power each has when it is just by itself in the soul. I want to leave out of account the rewards and the consequences of each of them." (Plato, Republic, 358b-c)

Following Glaucon's spirit, I dare ask: is generality in AI valuable in itself, or do we follow it merely for its expected instrumental effects?

Dialectic

The problem of reduction

When leading labs say "we're building towards AGI," what do they really mean? If we enumerate all the capabilities they desire (mathematical reasoning, long-horizon tasks, automated R&D and real-world economic tasks, ...) does anything remain in the term AGI after we subtract this list? Or is AGI simply a short name for "all of these capabilities together"?

Most, if not all, pro-generality arguments seem to be reducible to:

  • "We want adaptability" – which is a specific capability
  • "We want transfer learning" – again, a specific capability
  • "We want to solve multiple issues" – this seems to be a set of specific capabilities

It doesn't seem to be wrong, then, to ask whether generality is the name we give to a sufficiently big conjunction of specific capabilities, or whether there is something qualitatively distinct: generality itself.

The subtraction test: If we could have all the specific capabilities that AGI promises, but without 'generality' (whatever that means, maybe we have all the capabilities but in separate, narrow models), would we lose any value?

The missing argument: intrinsic value

No one seems to argue that generality has value in itself (as we could argue about consciousness or wellbeing). Why not? Maybe because AI (seemingly) is instrumental by nature. So, why do we want generality? And, is that really what we want?

The argument from cognitive economy

A general system may be more efficient than a comprehensive set of narrow systems because it:

  • Shares representations across domains, reducing redundant learning and enabling knowledge transfer.
  • Reduces computational cost and redundancy.
  • Allows for the emergence of currently unknown capabilities through unexpected transfer.

But there seems to be an implicit assumption here: that the cost of maintaining generality will be lower than the summed costs of ANIs (development, inference, and maintenance). Is this empirically true? Could we build accurate mathematical cost models?

Currently, foundation models are very expensive to train and operate, and pushing the frontier is not getting any cheaper. Meanwhile, specialized models are much more efficient. So far, it seems that, if we think in terms of cost/benefit, empirical evidence may favor specialized models.
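As a toy illustration of what such a cost model might look like, here is a minimal sketch. Every number in it is a made-up assumption for illustration only, not an estimate of real training or serving costs, and it deliberately omits the cross-task transfer benefits that the cognitive-economy argument appeals to.

```python
# Hypothetical back-of-the-envelope cost model comparing one general model
# against a portfolio of narrow models. Every number below is a made-up
# assumption for illustration, not an estimate of real-world costs.

def lifetime_cost(train_cost: float, serve_cost_per_query: float, queries: float) -> float:
    """Lifetime cost = one-off training cost + usage-proportional serving cost."""
    return train_cost + serve_cost_per_query * queries

TOTAL_QUERIES = 1e11  # assumed total query volume across all tasks

# One general (foundation) model serving every task.
general = lifetime_cost(train_cost=1e9, serve_cost_per_query=1e-2, queries=TOTAL_QUERIES)

# Twenty narrow models, each assumed cheaper to train and to serve,
# splitting the same query volume between them.
narrow_portfolio = sum(
    lifetime_cost(train_cost=5e6, serve_cost_per_query=1e-3, queries=TOTAL_QUERIES / 20)
    for _ in range(20)
)

print(f"general model:    ${general:,.0f}")
print(f"narrow portfolio: ${narrow_portfolio:,.0f}")
```

Which side wins depends entirely on the assumed parameters; the point is that the cognitive-economy argument is ultimately an empirical claim about numbers like these, plus a transfer term that the sketch leaves out.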

Moreover, this argument also seems to assume that shared representations are necessarily beneficial. Yet, in ML, it is well known that there are many trade-offs. A model aimed at doing everything may suffer from catastrophic forgetting or negative transfer.

The scaling hypothesis and two types of inevitability

Arguments for AGI often conflate two distinct claims about inevitability:

  • Socioeconomic inevitability: "competition forces us," "someone will build it anyway," "it's the next natural step." These are claims about coordination problems and race dynamics, Molochian pressures that make AGI development feel unstoppable regardless of whether it's wise.
  • Technical inevitability: the scaling hypothesis (that model capabilities improve predictably with increased compute, data, and parameters) suggests generality may not be something we choose to pursue, but something that emerges automatically from scaling.

The distinction matters. Socioeconomic inevitability is a governance problem which suggests we need coordination mechanisms. On the other hand, technical inevitability is a scientific claim which suggests generality will emerge whether we coordinate or not.

Let's focus on the technical claim. If this view is correct, then asking "should we build generality" becomes moot. Generality would be an inevitable byproduct of scaling up systems initially designed for narrow tasks (such as next-token-prediction). We wouldn't be necessarily aiming for generality, rather, we'd simply observe its emergence.

But this argument smuggles in a few assumptions:

  • First, the hypothesis doesn't distinguish between different types of generality. Perhaps scaling gives us generality in the functional sense of "can do many things" (breadth) but not generality in the sense of "can chain OOD generalizations indefinitely" (the kind that might lead to recursive self-improvement). These are very different properties with different kinds of implications.
  • Second, it assumes functional generality is what scales. Recent work[2] suggests that different capabilities scale at different rates, and some don't scale predictably at all. What if what actually scales is breadth of capabilities rather than genuine generality? In that case, we'd end up with systems that are impressively capable across many domains without exhibiting the kind of transfer learning that seems to define or capture the functional essence of generality.
  • Lastly, even if generality emerges, this is a descriptive claim about what will happen, not a normative claim about what should happen. In other words, the hypothesis tells us generality might be inevitable, but says nothing about whether or why we should keep building towards it.
The meta-solver argument

This argument states that it'll be easier to build AGI and have it solve all other specific problems, than to solve every problem independently. This argument tends to come with the easily-repeated slogan "it'll be our last invention".

Some possible issues with this argument:

  • Firstly, it seems to assume a solution to (or the falsity of) the "no-free-lunch theorem", which states that no single model can perform optimally across all possible problem domains (a rough statement is sketched after this list).[3] If the no-free-lunch theorem is correct, then a general model will inevitably have some drawbacks in some areas; does this not undermine the idea that AGI will solve all narrow problems?
  • Maybe the road to AGI actually requires that we solve many ANI problems first.
  • This argument focuses on the efficiency of building AGI over ANIs, but it does not seem to state why generality in itself is valuable. In other words, this seems to be another argument for instrumental benefits.
  • Even if a general intelligence could technically solve all specific problems, this argument conflates capability with alignment. A misaligned meta-solver that brilliantly solves technical problems while pursuing goals orthogonal to human values could leave us worse off than multiple well-aligned narrow systems. The meta-solver argument treats alignment as either automatically solved or separable from capability development. Neither assumption is warranted.
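For reference, a rough statement of the optimization version of the no-free-lunch theorem (paraphrasing Wolpert and Macready; the exact formulation varies): for any two algorithms $a_1$ and $a_2$,

$$\sum_{f} P\left(\text{performance} \mid f, a_1\right) \;=\; \sum_{f} P\left(\text{performance} \mid f, a_2\right),$$

where the sum runs uniformly over all possible objective functions $f$. Averaged over every conceivable problem, no method beats any other; how much that uniform average constrains a general model on the structured problems the real world actually poses is part of what the first bullet above is asking.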
The argument from unknown unknowns

One could argue that we cannot know in advance what issues we may need to solve, and that generality gives us that flexibility to respond to unknown unknowns.

Yet this again seems to be an instrumental argument for, say, flexibility or adaptability, not for generality in itself. Moreover, what warrants us to assume that generality equals adaptability?[4] The most adaptable biological systems we know (bacteria) are not the most general.

Breadth or generality?

Perhaps we conflate breadth of capabilities with generality. Consider two systems:

  • System A: 1000 specific capabilities, without transfer between them
  • System B: 100 capabilities that generalize to new domains that are only slightly out of distribution

What is more valuable? The answer seems to hinge on whether System B can sustain chains of generalization, using domain X to solve slightly-OOD domain Y, then using that to tackle even-further-OOD domain Z. If yes, then generality represents something genuinely powerful. If not, then System A's breadth may be superior. This latter case would suggest we actually value sufficient breadth, not generality per se.[5]
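As a toy way to see what hinges on this, here is a minimal sketch (all quantities hypothetical) in which System B's effective reach depends on how reliably one generalization hop can be chained into the next:

```python
# Toy comparison of breadth (System A) vs chainable generalization (System B).
# All numbers are hypothetical illustrations, not empirical claims.

def coverage_breadth(n_capabilities: int) -> float:
    """System A: a fixed set of capabilities with no transfer between them."""
    return float(n_capabilities)

def coverage_chained(n_seed: int, hop_success: float, max_hops: int) -> float:
    """System B: each seed capability can be extended to a slightly-OOD
    neighbour with probability hop_success, and successes can be chained."""
    expected_reach_per_seed = sum(hop_success ** k for k in range(max_hops + 1))
    return n_seed * expected_reach_per_seed

if __name__ == "__main__":
    print(coverage_breadth(1000))                                # System A: 1000.0 domains
    print(coverage_chained(100, hop_success=0.5, max_hops=20))   # System B: ~200 domains
    print(coverage_chained(100, hop_success=0.95, max_hops=20))  # System B: ~1319 domains
```

With unreliable hops, the chained system reaches far fewer domains than the broad one; with highly reliable hops, it overtakes it. That is one way to cash out whether we value generality per se or merely sufficient breadth.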

Open questions 
  1. Do any benefits attributed to AGI actually require generality, or merely sufficient breadth of capabilities?
  2. Is generality a real property or a convenient abstraction?
  3. If no sound argument exists for generality in itself, should we pivot toward developing the right set of highly-capable narrow systems?
  4. Does this same issue apply to ASI?
Conclusion

Paradoxically, the lack of a solid argument for generality in and of itself does not seem to mean we should not keep trying to build AGI. Rather, it means we should be honest about why we are building it. Maybe we are building it not because we see value in generality itself, but because:

  1. It seems inevitable given current incentives
  2. We believe (maybe incorrectly) that it will be more efficient
  3. We want specific capabilities that we don't yet know how to build, and believe a general system would, in virtue of being general, solve them
  4. The scaling hypothesis suggests generality may emerge whether we aim for it or not

This clarity isn't merely for philosophical amusement; it matters for determining research priorities and governance efforts. If we're building towards AGI for instrumental reasons, we should:

  • Measure progress by the capabilities that matter, not proximity to some abstract notion of generality.
  • Invest heavily in alignment for narrow systems, since "wait until AGI to solve alignment" is not a plan.
  • Question whether scaling toward emergent generality is safer than deliberately engineering the specific capabilities we want.
  • Distinguish between breadth (many capabilities we care about) and generality (chainable OOD transfer), since these have different safety profiles.

I think the fundamental question remains: are we building toward the right target, and do we even know what that target is?

I welcome counterarguments. If there exists a sound intrinsic argument for generality that I've missed, I'd genuinely like to hear it.

  1. ^

    I want to thank BlueDot Impact for accepting me into their inaugural cohort of "AGI Strategy", where this discussion arose. This post would not exist without their great efforts to build the much-needed safety workforce.

  2. ^

    Wei et al. (2022), "Emergent Abilities of Large Language Models"; Ganguli et al. (2022), "Predictability and Surprise in Large Generative Models"; Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (the Chinchilla paper)

  3. ^

    https://en.wikipedia.org/wiki/No_free_lunch_theorem

  4. ^

    There may be a good argument to be developed here, if one can successfully argue that adaptability is an intrinsic component of generality, and not a mere after-effect.

  5. ^

    This formulation of generality as chainable out-of-distribution transfer draws on work in meta-learning and few-shot transfer learning. See Jiang et al. (2023), Tripuraneni et al. (2022), Sun et al. (CVPR 2019), and Ada et al. (2019) for theoretical foundations on OOD generalization and transfer bounds.

  6. ^

    https://www.lesswrong.com/posts/BqoE5vhPNCB7X6Say/superintelligence-12-malignant-failure-modes



Discuss

Reasons to sign a statement to ban superintelligence (+ FAQ for those on the fence)

LessWrong.com News - October 13, 2025 - 22:00
Published on October 13, 2025 7:00 PM GMT

[Context: This post is aimed at all readers[1] who broadly agree that the current race toward superintelligence is bad, that stopping would be good, and that the technical pathways to a solution are too unpromising and hard to coordinate on to justify going ahead.]

TL;DR: We address the objections made to a statement supporting a ban on superintelligence by people who agree that a ban on superintelligence would be desirable.

Quoting Lucius Bushnaq

I support some form of global ban or pause on AGI/ASI development. I think the current AI R&D regime is completely insane, and if it continues as it is, we will probably create an unaligned superintelligence that kills everyone.

We have been circulating a statement expressing ~this view, targeted at people who have done AI alignment/technical AI x-safety research (mostly outside frontier labs). Some people declined to sign, even if they agreed with the expressed view. In this post, we want to address their objections (including those we agree with / think are reasonable).

But first, some context/preamble.

The reasons we would like you to sign the statement expressing support for banning superintelligence

We wish you'd sign a short statement that roughly expresses a view that you share with many other people concerned with AI X-risk. 

Why?

The primary reason is to raise awareness among the general population. The fact that many experts believe this (i.e. something akin to what Lucius stated) is a big deal. It would be completely astonishing to most people. The statement is aimed at these "most people". Its overarching objective is to be easy to read and understand by the general population, not containing all the nuance of all the possible views of relevant signatories.

In particular, there are certain groups/kinds of people we would especially want to be cognizant of this fact, such as policy makers and members of the general public who have not gone down the AI rabbit hole. Most people would bounce off text that is too dense/complicated/jargony. 

We want to reduce the cost of spreading knowledge of this expert concern amongst the target audience[2], essentially as far as possible (within reasonable ethical boundaries). We want to create a short sentence in the language of (or at least understandable to, with little inferential distance) the vast majority of people, one that points out this fact in a way that lets knowledge of it spread.

Such a concise summary would be extremely useful for communicating legibly with other people. Put simply, its existence would help society make sense of this fact right now.

Once you understand this goal and perspective, please visit aistatement.com and see the structure and function enabling normal people to quickly realize and transmit the knowledge of the astonishing fact that many experts, luminaries, etc., think artificial intelligence poses significant risks of literal extinction (per the CAIS statement, at least on par with the threat of a nuclear war).

Importantly, every bit of silence on the issue contributes, on the margin, to reinforcing society's belief that the fact is not true (especially given the various people shouting that AI is mere snake oil and all that matters is NVIDIA’s next quarter stock movement and other such nonsense[3]). Absence of Evidence Is Evidence of Absence, etc.

Even if the statement is only a flawed approximation of your view of the matter, as long as it doesn't say something you think is false[4], you can give your additional caveats publicly[5]. I (Ishual) would be willing to spend some effort to make it easier and more effective for your and other people's caveats to be processed by at least part of the general population's collective mind, so as to give a more accurate view than the legible statement alone can give (to say nothing of the incorrect default view).

However, we want to be extra clear given our previous post: making an important fact legible is pure bonus points, and the reason to do it has nothing to do with any of the norms mentioned in that post. First, legibility requires cooperation amongst many people, and it is easier to state your exact position with your exact emphasis on your own than to cooperate despite having significant differences from "the centroid of the opinions of the cooperators." Second, it is likely that these bonus points are up for grabs by lots of people who are not laboring within "the belly of the beast" (if I had to guess, there would be more outsiders than insiders who would want to grab the bonus points).

A positive vision

Regardless of whether you are already taking a public stance, agreeing to sign a single sentence is a powerful contribution to spreading the awareness that such a sentence is a fair (for some purpose, such as guiding collective action) approximation of the belief of a large group of people. The contribution is not only momentary (one more signature under the statement right now), but also cumulative: the more names under the statement, the more people who agree with it but are hesitant to sign for social reasons will be inclined to sign in the end. Similarly, as the number of signatures grows, new people are attracted to the issue and engage with the topic, some of them eventually becoming signatories of this or a similar statement. Given all of that, it is likely quite an effective use of your attention[6].

Admittedly, a single sentence with perhaps ambiguous words is not a good tool for thinking. However, its purpose is not to (directly) guide anybody's thinking. Rather, its purpose is to draw attention to an important issue, so that people can engage with other sentences on the issue that are better at guiding thinking. Obviously, a single sentence will not replace your actual position; you can (and perhaps should) always express it elsewhere. Indeed, the core of agreement among many perspectives is the point of such statements, and that's why coordinating to make them legible is worth our time. We think there is a place for experts to assess the situation in a decision-relevant way without entering the political ring as much as some people advocating for a position might; in a sense, your "expert power" only applies to a small part of statement space, and it makes sense to focus such a statement around our collective "expert power". 

We aren’t qualified to perfectly time a pause, and such a statement isn’t meant to be our opinion on when a pause ought to happen. We can assess a technical situation in a decision-relevant manner in a way that can be leveraged in conversations at any point in the future, and we can (if careful) contribute to sense-making without spending political capital. Sure, the experts don’t have the authority to time a pause, and they will never gain this authority, even if a big catastrophe happens. But they have the power to contribute to sense-making, which seems a necessary part of society metabolising a catastrophe into a net-positive change in the Overton window if and when it comes. Sense-making isn’t something you strategically turn on after a crisis; it is a necessary prerequisite for leveraging a crisis.

Finally, an expert doesn’t actually need to believe that sense-making will be effective; an expert doesn’t need to have an opinion about this at all (this is, in fact, not where their “expert power” lies). Refusing to speak one's mind because one thinks they’d be the only one is a very human tendency, but it is sometimes inappropriate. One doesn’t need to believe that most people will agree to get out of the burning house to decide to vote in favor of us getting out of the house[7].

Here is a striking example of how common knowledge dynamics can decide important group decisions, exemplifying both the danger of remaining silent and the power of speaking out: https://thezvi.substack.com/p/ai-moratorium-stripped-from-bbb

Reasons given for not signing despite agreeing with the statement

With the context/preamble/motivation out of the way, let's get to the actual objections raised to our statement. Some of them have already been answered in this section, but, for the sake of completeness, let's address them all one by one.

I am already taking a public stance, so why endorse a one-sentence summary?

First, thank you. Being public is genuinely valuable and much better than remaining silent. However, adding your signature to a collective statement could greatly amplify the impact you're already having, with relatively little additional effort.[8][9]

(I.e., see the beginning of the previous section.)

I am not already taking a public stance, so why endorse a one-sentence summary?

You can always take a more nuanced public stance and also endorse a one-sentence summary. Alternatively, you could just endorse a one-sentence summary and let it be an improvement over silence, despite it not being literally the best possible thing you could do. It also requires way less effort from you than the best possible thing you could do on that front.

The statement uses an ambiguous term X

Creating a short statement that captures a large coalition's views inevitably involves trade-offs. It's genuinely difficult to craft language that includes everyone's preferred nuances while remaining brief enough for public communication.

Moreover, in practice, even asking a large group of people to sign something is quite costly,[10] and changing anything about the statement would require confirming that endorsement of the previous version of the statement carries over to the new version, for each signatory of the previous version.

I would prefer a different (e.g., more accurate, epistemically rigorous, better at stimulating good thinking) way of stating my position on this issue

This is an extremely common feeling, perhaps universal, among potential signatories. Nearly everyone has their own preferred framing, which is natural given the complexity of the issue and the diversity of perspectives. You can still publicly express your opinion in the way you most prefer. But to make an important fact legible, many people must cooperate. You still want the statement to be sufficiently concise and simply stated that a normal person will be able to cache it and recall it when needed.

Your most preferred wording is likely unique to you or shared by only a small group. More generally, there are two objectives at play here that we should think about separately. The first objective is for experts to develop a good understanding of the issue and bring this understanding with them when they consider signing a statement. The second objective is for other people to understand the distribution of understandings among experts and to make some important aspects of it legible to normal people while accounting for all the constraints: understandability by most people, enough context, actual agreement, emphasis on the right things, acceptable ambiguity, memorability, etc.

The statement does not accurately capture my views, even though I strongly agree with its core

We think this is a good reason not to sign.[11] However, to achieve the same level of common knowledge that you'd achieve by signing this flawed sentence[12], you may need to literally write a bestselling book, go on many podcasts, and acquire a public profile. This seems hard, to say the least. We hope we can develop better technology for establishing and refining common knowledge. If not, hopefully, there are at least enough people who share the same disagreement with the sentence such that you'd be able to find a less costly way to establish that.

We'd encourage reflecting on whether the specific disagreements genuinely outweigh the value of establishing common knowledge on the core point. Let us know what the issue is, in case I (Ishual) figure out a way to make your idea legible in the future.[13]

I’d be on board if it also mentioned My Thing

This is a common desire, and understandably so, as your particular concern likely feels central to you. However, including it would likely make the statement less acceptable to others with different priorities. The legible thing you endorse need not contain the sum of all your important thoughts. Public perception matters. A statement signed by many people carries weight, while a more detailed statement with few signatures (however thoughtful) doesn't create the same common knowledge. See above about the two distinct objectives of crafting and legibilizing a better understanding of an issue on the one hand and broadcasting the action-relevant intersection of the views of the field on the other hand.

Taking a position on policy stuff is a different realm, and it takes more deliberation than just stating my opinion on facts

This is a fair concern. Policy advocacy does feel different from technical assessment. However, policymakers consistently report that they need clear signals from experts to justify making difficult decisions[14]. And I don't think inaction is a neutral choice here[15]. Given your role, your public stance may have an outsized impact.

I wouldn't support a "permanent ban"

People understand that there is no button politicians can push that would make a ban impossible to overturn. There is no such thing as a permanent ban, and the statement does not contain a reference to permanence. The statement's intent is to put in place a ban at least as hard to overturn as the ban on CFCs, cloning, or nuclear proliferation. Not impossibly hard. 

Consider: Would you require a ban to be more easily reversed than existing precedents like the cloning ban, or is the standard precedent acceptable?

The statement doesn't include a clear mechanism to lift the ban

Every ban comes with a default mechanism for lifting it: actual scientific consensus and grassroots support for unbanning, leading future people to make a successful case for lifting it. If you're concerned we'll never reach sufficient confidence to safely lift a ban, we'd argue that's actually a reason to support a ban. It suggests the problem is too hard for humanity to solve, or at least too hard to be the best path toward a good future. It might even suggest that the difficulty would never be surmounted[16].

Instead, you probably think we will eventually have enough understanding that building ASI would at least benefit its developers/creators/growers, and the worry is that we'd keep it banned somewhat (or far) past the point of actual scientific consensus and grassroots support for unbanning. Examples: nuclear power, the FDA. We agree that the default mechanism comes with added risks of delay, though the magnitude is uncertain.[17]

My (Ishual's) understanding of how this typically goes is that experts signal that a ban is a good idea, then some political leaders willing to advance legislation make something happen. The more permissive the signaling from the experts, the more likely it is that politicians succeed (i.e., this is one of many factors at play).

It seems to me that the statement as written is compatible with this position (because you are not committing to supporting every conceivable ban). I think that it is good to make some nuance legible, but it would be better said in an accompanying white paper/long video.

This way, you can push for your favorite kind of ban from a position where some ban is seriously considered or actually in force.

You could say that a ban that lasts (much) longer than you’d like (in (your) expectation) has a mixed effect on the odds of “beneficial ASI,” and it isn’t super clear that the net effect is to decrease it.[18]

Superintelligence might be "too good to pass up"

Specifically, the claim is that people in power / relevant decision-makers have a stake in the game and therefore will be unwilling to coordinate on an effective ban or even on an intervention that has a credible effect of decreasing the probability of building ASI too early.

However, one could have made the same argument about a bunch of "too good to pass up" technologies in the past: nuclear weapons, bio/chemical weapons, human cloning, human genome engineering, nuclear energy[19].

As faul_sname brought up recently:

Von Neumann was, at the time, a strong supporter of "preventive war." Confident even during World War II that the Russian spy network had obtained many of the details of the atom bomb design, Von Neumann knew that it was only a matter of time before the Soviet Union became a nuclear power. He predicted that were Russia allowed to build a nuclear arsenal, a war against the U.S. would be inevitable. He therefore recommended that the U.S. launch a nuclear strike at Moscow, destroying its enemy and becoming a dominant world power, so as to avoid a more destructive nuclear war later on. "With the Russians it is not a question of whether but of when," he would say. An oft-quoted remark of his is, "If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o'clock, I say why not one o'clock?"

 … and yet, we did not.

Funny but ultimately irrelevant thought experiment warning

Similarly, isn't cloning "too good to pass up" as well? Imagine 1M clones of Von Neumann raised to deeply care about the national interests of the USA and to cooperate with each other. Clearly, there are versions of cloning that could have either lost control of the world or put the USA (or whoever gets the loyalty of the clones) in a position to dominate the rest of the world economically (a lot more than it already has). Indeed, one might call this a form of mild superintelligence, and yet humanity took a hard pass on that (at least for now).

We agree that for an international ban to make sense, it has to be enforced on everyone, including people who keep insisting that superintelligence is just too good to pass up, never mind the risks.

I don't want to put myself out there

This is a fair preference, and there are better and worse ways to accommodate it. For instance, maybe you'd prefer not to be out there alone. In that case, it would help to specify what conditions would make you comfortable – particular individuals signing alongside you, or a certain threshold number of signatures. If public visibility is a concern, consider conditional commitments (e.g. sign after ≥N peers sign).

I am not really an expert

This is a fair point. Presenting statements probably requires thinking carefully about how the information is shown. Pretending to be an expert if you are not one would be bad, so if the statement actually aims to signal something, it should aim to signal something legible, e.g., “person so-and-so has held a role so-and-so in this space” or something like that. You may be underestimating the credentialing value of whatever role you've held.

The safety community has "limited political capital"

This is likely true, but our unique strength lies in very strong arguments. We think the kind of fact we suggest coordinating around making legible (namely, "we need to legislatively prevent the development of ASI/AGI/existentially risky sort of AI") is overdetermined, such that collectively we know of a stronger case for it than any one of us could articulate individually.

We don’t think that signing a statement now means we would diminish our political capital. Some worry about reputational[20] attacks from signing this statement. Do people imagine that there would be an attack on our good name that leverages the signing of this statement? If more (hostile) attention is paid to the statement, we have strong reasons and arguments for holding our position, and we have some eloquent signatories to provide those arguments. We want to push back on the idea that silence preserves capital for later. Inaction has costs: there are well-resourced actors actively working to muddy public understanding of AI risks through repetition, misrepresentation, and flooding the discourse with noise[21][22], all of it aiming to bias society more strongly towards inaction and confusion on the issue of AI X-risk. If you remain silent, the world will get more confused, and it will become harder for the world to react well[23], even if at some point we encounter some “catastrophe” or "crisis".

Moreover, some people who share your view are speaking out already, whether you join them or not. However, if you join them, they will be more likely to pierce through the noise and confusion. The choice whether to speak out made by people like you is one of the things determining whether they break through or whether the debate gets anchored by more salient, repeated messages. We are in a stag hunt situation. It might feel safe to save yourself for later, but it isn’t. If others speak out without you and fail to break through, the window may close. Waiting for a "better moment" may mean missing the moment entirely. If you are worried about this, there are solutions you can use to speak out iff sufficiently many others also would (e.g. conditional commitment to speak out). 

Essentially, I’d want to remind people that others will lie about the situation and say that the Ban Superintelligence statement is a fringe view even in worlds where the majority of experts with even some small amount of freedom from incentives would endorse something quite similar (if this majority doesn’t create a one-sentence counter that people can use to expose the lie). 

We must wait until a catastrophe before spending "limited political capital"

Suppose that it is true that the pivotal point of action would occur just after a "catastrophe" or during some "crisis". For example, the Milton Friedman model of policy change says that "most policy change outside a prior Overton Window comes about by policy advocates skillfully exploiting a crisis".

However, effective crisis response requires groundwork. If we haven't built common knowledge beforehand, we should be skeptical that the world will react well even to a clear catastrophe in a domain as unprecedented/strange as AI risk. Also, I (Ishual) note that the CAIS statement is quite useful in conversations even 2 years after it was signed. There is no need to “perfectly time” this.

 In short: To skillfully exploit a crisis, one needs to do a lot of prep. One important aspect of such prep is building common knowledge. A simple statement with many important signatories is an effective tool to build common knowledge.

Any other objections we missed? (and a hope for a better world)

If we missed your actual objection, then please help us make the list more useful by commenting.

If your objections have been addressed, but you still don't feel energized to do something about it, then please consider the following positive vision.

Many of us hope for a better world. I think even in the mundane realms of a world technologically like ours, so much more is possible. Functional institutions are possible[24]. Incremental progress on this is possible. We can, in fact, unilaterally (as a loose set of people who at least occasionally visit Less Wrong) make a part of the system work better (by creating this short sentence in the language of humanity for pointing at an important fact, as a first step). We can then either hope or optimize for other parts of the system to leverage this tool to have better conversations. But regardless of how long it takes for other parts of the system to do their parts, we will have enabled their success. Even if you think that there is a non-negligible chance that a functional room containing 10 people is sufficient to save the world, surely you agree it would be better, more likely to succeed, if a greater part of the world was functional.

If you want to take this first step towards a better world, my (Ishual's) DMs are open :)

And if somehow you glimpse the greater project I am extremely vaguely gesturing at, and you'd maybe want to take a couple more steps beyond the first, my DMs are also open :D

  1. ^

    Researchers who have enough experience with the various problems of making AI safe are our primary audience here, but despite our previous post, we are now addressing all such researchers whatever they happen to be doing now, and indeed whether or not they have already taken a public stance.

  2. ^

    Policymakers and the public won't reconstruct expert sentiment from forum posts. And even if one/some of them did, they wouldn't have a concise summary of the action-relevant intersection of beliefs of a plurality of relevant researchers.

  3. ^

    Mechanistically, they mostly just don't think about this fact explicitly, and they have various heuristics in play that result in not engaging with the broader discussion, or in not feeling like they have to do anything outside their autopilot on the issue. If they get concerned, they might want to spend a bit more money on some charismatic expert working on a technical solution. Since there is limited time to cross the gap between most people's intuitions and a sane view of the race toward superintelligence, the sheer implausibility of the true fact that more than a few cranks believe this ends up quite costly. We could (sort of) model this cost (either the time needed to actually make the fact seem less implausible with some tiny bits of public evidence, or the drag on the whole discussion and on the person having it) as "this person taking your failure to make this fact legible as evidence of the fact not being true."

  4. ^

    The sentence “the sky is blue” is not strictly true, but it does stand for a statement that eg the sky isn’t more green than blue, and if we lived in a world where many powerful and wealthy forces were trying to prevent the person on the street from understanding that the sky was blue (by claiming it was green), then I guess I’d say it is “true enough”, and being silent about it (or just not achieving coordination among experts to say a simple sentence that makes sense to people) is “false enough”.

  5. ^

    Unless some force is somehow preventing you from doing so.

  6. ^

    Even somewhat competitive with much more costly efforts to make your opinion public, definitely in expectation better than going on a podcast that 10K people will hear.

  7. ^

    If you are so worried about us finding out that no one wants to exit the house (despite you and many others actually secretly wanting to get out), you can just say you don't think we should make the vote public unless the result is non-embarrassing (or you can just vote and not care much about the outcome). Pluralistic ignorance is real: a majority can believe X while mistakenly thinking everyone else believes "not X," leading to collective silence. You don't need to assess this alone; that's precisely what collective action helps reveal. Even if support seems limited now, coordination can change the landscape.

  8. ^

    Unless you are really loud and constantly repeating your central position in a way that actually reaches lots of people, so many that your signature on the statement would not have significant additional impact, in which case carry on, you are outside the target audience :)

  9. ^

    Moreover, consider a reversal test: if someone compiled a list of people who've expressed support for a ban based on public statements, would you want your name removed? If not, then making that support explicit seems consistent with your actual position.

  10. ^

    In terms of time and effort, not necessarily monetary costs.

  11. ^

    Likewise, this is a good reason to keep our statement relatively simple.

  12. ^

    Let's assume that there is some chance that 100-300 people sign. This would have very large impacts. The impact is not confined to the moment one signs or to the moment when the statement goes public. Once the statement becomes a short conversational move, it will be used in conversations about AI-caused X-risks many times, and then these signatures will boost the effectiveness of these conversations. The CAIS statement is likely the single most effective sentence when I (Ishual) speak to people about AI-caused X-risks. 

    [made-up numbers warning:] To compare the impact of a single signature to some other intervention, we naively just divide by 100-300, and still get a quite large impact. Even if there isn't much difference in effectiveness between 100 and 300 signatures, there is still plenty of marginal impact from a single signature, if we require the sum of marginal expected utils to equal the expected utils of the whole, because early signatures also make it easier for others to sign. If we kept insisting, we might naively expect that the distribution of outcomes would be "bimodal" between fewer than ~100 and more than ~300, given the proportion of "agree and will sign" to "agree and won't sign," and naively assuming there are only really 300 serious experts.

  13. ^

    An alternative way to do it would be major public outreach (books, podcasts, etc.), though that's far costlier.

  14. ^

    A policy maker once noted to me (Ishual) that the CAIS statement could be read ambiguously as it didn't clearly signal that experts favor international cooperation to prevent rogue actors from building extinction-causing superintelligence. Indeed one plausible reading of the CAIS statement is that experts want more funding to work on their thing, or simply that this is why they do what they do, which is totally gonna mitigate those risks.   

  15. ^

    It would be strange to "blame" a class of people defined merely by being considered experts on a topic, but regarding that class of people, we nevertheless think silence is quite bad, a public stance alone is good, and the effort you put into making important stuff legible is very good (bonus points). Assuming you already take a public stance, you are (merely) leaving lots of extra value on the table if you don't make sufficient efforts to achieve legibility.

  16. ^

    either because the problem would remain intractable forever or because we'd be wiser to first achieve a good future some other way and then revisit the problem

  17. ^

    Seems like there is a small but extremely passionate group of people who really want this tech even now, when it would be foolish to build it.

  18. ^

    Maybe. Maybe bans are super sticky even in futures that get their shit together enough to "solve alignment," in which case my "true objection" would be that on the margin you'd have to delay ASI a lot for it to be worth even 1% more risk of actual extinction (and also squandering of the lightcone).

  19. ^

    This list was deliberately made so as to evoke negative, mixed, and positive feelings in a large fraction of the article's intended audience.

  20. ^

    Either attacking their own reputation, or that of the whole "safety community".

  21. ^

    Or in some cases, that the danger is non-existent. 

  22. ^

    Or in some cases, a picture in opposition to reality.

  23. ^

    We encounter various reactions that amount to doing nothing. You might not see a catastrophe now, but many people do see e.g. Trump shenanigans as a clear sign that AI will not be handled well. But then they just convince themselves that lying down and dying is the totality of their options. 

  24. ^

    A humanity that works a lot more for the benefit of humans is possible. Indeed, actually making huge progress here seems much easier than creating a superintelligence that deeply cares for us the way we'd want it to care for us on the first try. So much needs to be said about this and yet it will have to wait for another post. 



Discuss

Water Above the Ocean

Новости LessWrong.com - October 13, 2025 - 19:00
Published on October 13, 2025 4:00 PM GMT

re: The Future of AI is Already Written

The essay makes two claims: firstly, that technology determines societal outcomes, and secondly, that the default world after high degrees of automation will be very good. The first has an extensive argument, while the second is a paragraph and a half of possibilities that provides no evidence or argument, so I will focus on the first.

I’m writing this because, compared to most, I’m a technodeterminist. I think that the nature of a society can be substantially predicted from the mode of economic production. While “technodeterminism” is a term of abuse in STS circles, it’s also obviously correct, at least as defined. It matters whether you grow rice (needs irrigation funded and controlled by a larger group, and coordination of when you drain your fields) or wheat (more suitable for family farms), as studies have shown (including quasi-experimental work). Both of those are practically identical compared to the difference between plow cultures and hoe cultures, the latter of which put minimal value on physical strength and enable women to do much more economically valuable work, creating less patriarchal societies.

It matters whether you have abundant natural resources, which tilt governance towards control over the resources rather than good treatment of individuals (usually called the resource curse). The impact of guns is typically overstated (the crossbow is actually pretty comparable to the early musket), but guns → democratization is at the very least a credible theory.

My technodeterminism is exactly why I’m concerned about the impact of AI. If you run through the world where governments don’t depend on their citizens for economically valuable labor, it does not follow that those citizens have great lives. You’d have to believe, really strongly, in the kindness of rulers or the power of a populace that is, by definition, not doing economically valuable work. Neither is a good bet.

But I committed to an explanation of how my technodeterminism differs from theirs, which returns to the metaphor of water. Barnet, Besiroglu, and Erdil use the metaphor of water running to a valley.

Rather than being like a ship captain, humanity is more like a roaring stream flowing into a valley, following the path of least resistance. People may try to steer the stream by putting barriers in the way, banning certain technologies, aggressively pursuing others, yet these actions will only delay the inevitable, not prevent us from reaching the valley floor.

It's not excessively pedantic here to point out that not all water is in the Mariana Trench. Nor is it even all in the ocean. There is water in clouds. There is water in lakes and glaciers, in some cases miles above sea level. And yes, in theory all the water "should" follow the path of least resistance to the lowest possible point. And yet there are a few things standing in the way.

Image courtesy of NASA.
  1. People do not always follow the path of least resistance. In the worst days of feudalism, when the rich could do as they willed and the poor suffered what they must, a certain Francis of Assisi gave all he had to the poor. He did not give a small token at feasts, or even 10%: he gave all he had. The water can rise out of the depths, even to the very heavens. We need not follow our incentives to our graves.

  2. The modern era is one of human capital and democracy, as I have said. Yet the last king is not yet dead and buried, nor is he even a representative of the last monarchy, nor is it even obvious that he will not have successors, nor is he stripped of all power. I am thinking of Vajiralongkorn Boromchakrayadisorn Santatiwong Thewetthamrongsuboribal Abhikkunupakornmahitaladulyadej Bhumibolnaretwarangkun Kittisirisombunsawangwat Boromkhattiyarajakumarn, better known as Rama X, King of Thailand, whom it is still illegal in Thailand to insult. Water will generally flow down the mountain and towards the valley, but flowing towards a valley is different from being there already.

  3. The Amish still exist, despite all the pressures and incentives of the modern world. In an era of feudalism, less efficient societies would be conquered. But even if water will eventually run downhill, it can be frozen for years or centuries, and outwait the current trend. There is ice that has remained frozen for over a million years, and the Trump Administration is currently on a quixotic campaign against wind and solar: a society can stand still, even in the face of incentives, for a little while.

So, I shall grant myself the point that we can influence the motion of the ocean, even if we can’t stop it. And if all you or I can do is stand athwart history yelling stop, slowing down the brave new world, there is dignity and honor in such a doomed fight, if it preserves a better world for but one day more. In the long run we are all dead: what will make a better world today and tomorrow?

But our agency is not so circumscribed. America was going to eventually be independent of Britain’s Parliament, just as India and Canada and Australia would be: the costs of administering a far-flung empire were too high, and Britain too small, for it to last forever.1 But America won its freedom rapidly. Canada and Australia had a slow and stately withdrawal of real power vested in the monarch. India saw millions displaced, with hundreds of thousands dead. You can say that those were all paths to the same end, but India today has border disputes with two nuclear-armed neighbors, Australia and Canada are doing well for themselves, and America is the most powerful nation on the planet, with freedom of speech still protected more strongly than anywhere comparable, because that was a priority in the 1780s.

The path these nations took towards independence mattered. The path we take towards a highly automated future will matter. In economics we call this path-dependency, and it rules a tremendous amount around you. There are people trying to make it go well, and none of them think we’re really ready for what will come.2

It is not free to slow the progress that will come: I care about the children who will die. I’m scared of a world where we don’t build AI, and something terrible happens that could have been prevented. I don’t want to unilaterally stop American AI progress. Ultimately I think we have decent odds on a good future. I expect a great deal of futile fights as, to pick an example, the Teamsters try to stop self-driving cars that will save millions of lives, and those efforts will be measured in those killed by the delay more than anything else. That fight will look simple and quick compared to the defense the AMA will put up to keep work in the hands of doctors and out of reach for nurses and assistants. We need to accelerate getting advanced tech into the hands of everyone in the world, and I focus on lifesaving tech like self-driving cars.

The way we get there matters, and it matters more the longer and more important you think the AI age will be. It will go better if AI is under the control of liberal democracies with fully free and fair elections3 that understand what is coming. Those liberal democracies will do better if they pass wise laws that focus on critical risks and don’t listen to rent-seeking guilds. Cyberattacks that wreck moderately important infrastructure4, engineered plagues that kill billions: these are real and clear problems.

If we are to be people of integrity, we should not pretend that we are innocent agents of gravity, doing only what is inevitable. Speeding up means that society has less time to adapt, to plan, and to prepare. It increases every risk associated with the impending transition. Perhaps that is worth it. Barnet, Besiroglu, and Erdil should actually make that case with more than a token fantasy, not pretend that they’re helplessly caught in the current.


1

My British husband would be quick to emphasize that the specific trigger that really lost them the empire was bleeding it white to beat the Nazis, and there is truth there. But the empire was doomed.

2

Some will argue that the only way forward is through, and that our planning and preparation will be useless. I can respect that point of view. But nobody who thinks that AGI is coming thinks that voters are prepared for it.

3

If a small eastern European state had free advertising, in the form of a PSA, at airports in which a cabinet official named the opposition party as the source of a problem, I would say that such a state clearly didn’t have “fully free and fair elections”. By the same logic, America will not have fully free and fair elections in 2026. That line has already been crossed. But much remains to be seen, including the honor and courage of Republican elites (in the face of what I want to acknowledge are terrifying facts about a president who will instruct his AG to go after his enemies, remove law enforcement protection from them, and call them “fascists” to his followers). Currently, I think the odds are much better on restoring America to the group of countries with fully free and fair elections than trying to enable AI progress in countries with fully free and fair elections. And I confess I am a patriot, which may be biasing my thinking.

4

The critical infrastructure is generally well-protected, and to the extent that AI lowers the costs of cyberattacks I expect more of the problem to be an expanding target range.



Discuss

OpenAI #15: More on OpenAI’s Paranoid Lawfare Against Advocates of SB 53

Новости LessWrong.com - October 13, 2025 - 18:00
Published on October 13, 2025 3:00 PM GMT

A little over a month ago, I documented how OpenAI had descended into paranoia and bad faith lobbying surrounding California’s SB 53.

This included sending a deeply bad faith letter to Governor Newsom, which sadly is par for the course at this point.

It also included lawfare attacks against bill advocates, including Nathan Calvin and others, using Elon Musk’s unrelated lawsuits and vendetta against OpenAI as a pretext, accusing them of being in cahoots with Elon Musk.

Previous reporting of this did not reflect well on OpenAI, but it sounded like the demand was limited in scope to a supposed link with Elon Musk or Meta CEO Mark Zuckerberg, links which very clearly never existed.

Accusing essentially everyone who has ever done anything OpenAI dislikes of having united in a hallucinated ‘vast conspiracy’ is all classic behavior for OpenAI’s Chief Global Affairs Officer Chris Lehane, the inventor of the original term ‘vast right wing conspiracy’ back in the 1990s to dismiss the (true) allegations against Bill Clinton by Monica Lewinsky. It was presumably mostly or entirely an op, a trick. And if they somehow actually believe it, that’s way worse.

We thought that this was the extent of what happened.

Emily Shugerman (SF Standard): Nathan Calvin, who joined Encode in 2024, two years after graduating from Stanford Law School, was being subpoenaed by OpenAI. “I was just thinking, ‘Wow, they’re really doing this,’” he said. “‘This is really happening.’”

The subpoena was filed as part of the ongoing lawsuits between Elon Musk and OpenAI CEO Sam Altman, in which Encode had filed an amicus brief supporting some of Musk’s arguments. It asked for any documents relating to Musk’s involvement in the founding of Encode, as well as any communications between Musk, Encode, and Meta CEO Mark Zuckerberg, whom Musk reportedly tried to involve in his OpenAI takeover bid in February.

Calvin said the answer to these questions was easy: The requested documents didn’t exist.

Now that SB 53 has passed, Nathan Calvin is free to share the full story.

It turns out it was substantially worse than previously believed.

And then, in response, OpenAI CSO Jason Kwon doubled down on it.

What OpenAI Tried To Do To Nathan Calvin

Nathan Calvin: One Tuesday night, as my wife and I sat down for dinner, a sheriff’s deputy knocked on the door to serve me a subpoena from OpenAI.

I held back on talking about it because I didn’t want to distract from SB 53, but Newsom just signed the bill so… here’s what happened:

You might recall a story in the SF Standard that talked about OpenAI retaliating against critics. Among other things, OpenAI asked for all my private communications on SB 53 – a bill that creates new transparency rules and whistleblower protections at large AI companies.

Why did OpenAI subpoena me? Encode has criticized OpenAI’s restructuring and worked on AI regulations, including SB 53.

I believe OpenAI used the pretext of their lawsuit against Elon Musk to intimidate their critics and imply that Elon is behind all of them.

There’s a big problem with that idea: Elon isn’t involved with Encode. Elon wasn’t behind SB 53. He doesn’t fund us, and we’ve never spoken to him.

OpenAI went beyond just subpoenaing Encode about Elon. OpenAI could (and did!) send a subpoena to Encode’s corporate address asking about our funders or communications with Elon (which don’t exist).

If OpenAI had stopped there, maybe you could argue it was in good faith.

But they didn’t stop there.

They also sent a sheriff’s deputy to my home and asked for me to turn over private texts and emails with CA legislators, college students, and former OAI employees.

This is not normal. OpenAI used an unrelated lawsuit to intimidate advocates of a bill trying to regulate them. While the bill was still being debated.

OpenAI had no legal right to ask for this information. So we submitted an objection explaining why we would not be providing our private communications. (They never replied.)

A magistrate judge even chastised OpenAI more broadly for their behavior in the discovery process in their case against Musk.

This wasn’t the only way OpenAI behaved poorly on SB 53 before it was signed. They also sent Governor Newsom a letter trying to gut the bill by waiving all the requirements for any company that does any evaluation work with the federal government.

There is more I could go into about the nature of OAI’s engagement on SB 53, but suffice to say that when I saw OpenAI’s so-called “master of the political dark arts” Chris Lehane claim that they “worked to improve the bill,” I literally laughed out loud.

Prior to OpenAI, Chris Lehane’s PR clients included Boeing, the Weinstein Company, and Goldman Sachs. One person who worked on a campaign with Lehane said to the New Yorker “The goal was intimidation, to let everyone know that if they fuck with us they’ll regret it”

I have complicated feelings about OpenAI – I use and get value from their products, and they conduct and publish AI safety research that is worthy of genuine praise.

I also know many OpenAI employees care a lot about OpenAI being a force for good in the world.

I want to see that side of OAI, but instead I see them trying to intimidate critics into silence.

This episode was the most stressful period of my professional life. Encode has 3 FTEs – going against the highest-valued private company in the world is terrifying.

Does anyone believe these actions are consistent with OpenAI’s nonprofit mission to ensure that AGI benefits humanity? OpenAI still has time to do better. I hope they do.

Here is the key passage from the Chris Lehane statement Nathan quotes, which shall we say does not correspond to the reality of what happened (as I documented last time, Nathan’s highlighted passage is bolded):

Chris Lehane (Officer of Global Affairs, OpenAI): In that same spirit, we worked to improve SB 53. The final version lays out a clearer path to harmonize California’s standards with federal ones. That’s also why we support a single federal approach—potentially through the emerging CAISI framework—rather than a patchwork of state laws.

It Doesn’t Look Good

Gary Marcus: OpenAI, which has chastised @elonmusk for waging lawfare against them, gets chastised for doing the same to private citizens.

Only OpenAI could make me sympathize with Elon.

Let's not get carried away. Elon Musk has been engaging in lawfare against OpenAI, where many (but importantly not all, the exception being the challenge to the conversion to a for-profit) of his lawsuits have lacked legal merit, and making various outlandish claims. OpenAI being a bad actor against third parties does not excuse that.

Helen Toner: Every so often, OpenAI employees ask me how I see the co now.

It’s always tough to give a simple answer. Some things they’re doing, eg on CoT monitoring or building out system cards, are great.

But the dishonesty & intimidation tactics in their policy work are really not.

Steven Adler: Really glad that Nathan shared this. I suspect almost nobody who works at OpenAI has a clue that this sort of stuff is going on, & they really ought to know

Samuel Hammond: OpenAI’s legal tactics should be held to a higher standard if only because they will soon have exclusive access to fleets of long-horizon lawyer agents. If there is even a small risk the justice system becomes a compute-measuring contest, they must demo true self-restraint.

Disturbing tactics that ironically reinforce the need for robust transparency and whistleblower protections. Who would’ve guessed that the coiner of “vast right-wing conspiracy” is the paranoid type.

The most amusing thing about this whole scandal is the premise that Elon Musk funds AI safety nonprofits. The Musk Foundation is notoriously tightfisted. I think the IRS even penalized them one year for failing to donate the minimum.

OpenAI and Sam Altman do a lot of very good things that are much better than I would expect from the baseline (replacement level) next company or next CEO up, such as a random member or CEO of the Mag-7.

They will need to keep doing this and further step up, if they remain the dominant AI lab, and we are to get through this. As Samuel Hammond says, OpenAI must be held to a higher standard, not only legally but across the board.

Alas, not only is that not a high enough standard for the unique circumstances history has thrust upon them, especially on alignment, OpenAI and Sam Altman also do a lot of things that are highly not good, and in many cases actively worse than my expectations for replacement-level behavior. These actions are an example of that. And in this and several other key ways, especially in terms of public communications and lobbying, OpenAI and Altman's behaviors have been getting steadily worse.

 

OpenAI’s Jason Kwon Responds

Rather than an apology, this response is what we like to call ‘doubling down.’

Jason Kwon (CSO OpenAI): There’s quite a lot more to the story than this.

As everyone knows, we are actively defending against Elon in a lawsuit where he is trying to damage OpenAI for his own financial benefit.

Elon Musk has indeed repeatedly sued OpenAI, and many of those lawsuits are without legal merit, but if you think the primary purpose of him doing that is his own financial benefit, you clearly know nothing about Elon Musk.

Encode, the organization for which @_NathanCalvin serves as the General Counsel, was one of the first third parties – whose funding has not been fully disclosed – that quickly filed in support of Musk. For a safety policy organization to side with Elon (?), that raises legitimate questions about what is going on.

No, it doesn’t, because this action is overdetermined once you know what the lawsuit is about. OpenAI is trying to pull off one of the greatest thefts in human history, the ‘conversion’ to a for-profit in which it will attempt to expropriate the bulk of its non-profit arm’s control rights as well as the bulk of its financial stake in the company. This would be very bad for AI safety, so AI safety organizations are trying to stop it, and thus support this particular Elon lawsuit against OpenAI, which the judge noted had quite a lot of legal merit, with the primary question being whether Musk has standing to sue.

We wanted to know, and still are curious to know, whether Encode is working in collaboration with third parties who have a commercial competitive interest adverse to OpenAI.

This went well beyond that, and you were admonished by the judge for how far beyond that this attempt went. It takes a lot to get judges to use such language.

The stated narrative makes this sound like something it wasn’t.

  1. Subpoenas are to be expected, and it would be surprising if Encode did not get counsel on this from their lawyers. When a third party inserts themselves into active litigation, they are subject to standard legal processes. We issued a subpoena to ensure transparency around their involvement and funding. This is a routine step in litigation, not a separate legal action against Nathan or Encode.
  2. Subpoenas are part of how both sides seek information and gather facts for transparency; they don’t assign fault or carry penalties. Our goal was to understand the full context of why Encode chose to join Elon’s legal challenge.

Again, this does not at all line up with the requests being made.

  1. We’ve also been asking for some time who is funding their efforts connected to both this lawsuit and SB53, since they’ve publicly linked themselves to those initiatives. If they don’t have relevant information, they can simply respond that way.
  2. This is not about opposition to regulation or SB53. We did not oppose SB53; we provided comments for harmonization with other standards. We were also one of the first to sign the EU AIA COP, and still one of a few labs who test with the CAISI and UK AISI. We’ve also been clear with our own staff that they are free to express their takes on regulation, even if they disagree with the company, like during the 1047 debate (see thread below).

You opposed SB 53. What are you even talking about. Have you seen the letter you sent to Newsom? Doubling down on this position, and drawing attention to this deeply bad faith lobbying by doing so, is absurd.

  1. We checked with our outside law firm about the deputy visit. The law firm used their standard vendor for service, and it’s quite common for deputies to also work as part-time process servers. We’ve been informed that they called Calvin ahead of time to arrange a time for him to accept service, so it should not have been a surprise.
  2. Our counsel interacted with Nathan’s counsel and by all accounts the exchanges were civil and professional on both sides. Nathan’s counsel denied they had materials in some cases and refused to respond in other cases. Discovery is now closed, and that’s that.

For transparency, below is the excerpt from the subpoena that lists all of the requests for production. People can judge for themselves what this was really focused on. Most of our questions still haven’t been answered.

He provides PDFs; here is the transcription:

Request For Production No. 1:
All Documents and Communications concerning any involvement by Musk or any Musk-Affiliated Entity (or any Person or entity acting on their behalves, including Jared Birchall or Shivon Zilis) in the anticipated, contemplated, or actual formation of ENCODE, including all Documents and Communications exchanged with Musk or any Musk-Affiliated Entity (or any Person or entity acting on their behalves) concerning the foregoing.

Request For Production No. 2:
All Documents and Communications concerning any involvement by or coordination with Musk, any Musk-Affiliated Entity, FLI, Meta Platforms Inc., or Mark Zuckerberg (or any Person or entity acting on their behalves, including Jared Birchall or Shivon Zilis) in Your or ENCODE’s activities, advocacy, lobbying, public statements, or policy positions concerning any OpenAI Defendant or the Action.

Request For Production No. 3:
All Communications exchanged with Musk, any Musk-Affiliated Entity, FLI, Meta Platforms Inc., or Mark Zuckerberg (or any Person or entity acting on their behalves, including Jared Birchall or Shivon Zilis) concerning any OpenAI Defendant or the Action, and all Documents referencing or relating to such Communications.

Request For Production No. 4:
All Documents and Communications concerning any actual, contemplated, or potential charitable contributions, donations, gifts, grants, loans, or investments to You or ENCODE made, directly or indirectly, by Musk or any Musk-Affiliated Entity.

Request For Production No. 5:
Documents sufficient to show all of ENCODE’s funding sources, including the identity of all Persons or entities that have contributed any funds to ENCODE and, for each such Person or entity, the amount and date of any such contributions.

Request For Production No. 6:
All Documents and Communications concerning the governance or organizational structure of OpenAI and any actual, contemplated, or potential change thereto.

Request For Production No. 7:
All Documents and Communications concerning SB 53 or its potential impact on OpenAI, including all Documents and Communications concerning any involvement by or coordination with Musk or any Musk-Affiliated Entity (or any Person or entity acting on their behalves, including Jared Birchall or Shivon Zilis) in Your or ENCODE’s activities in connection with SB 53.

Request For Production No. 8:
All Documents and Communications concerning any involvement by or coordination with any Musk or any Musk-Affiliated Entity (or any Person or entity acting on their behalves) with the open letter titled “An Open Letter to OpenAI,” available at https://www.openai-transparency.org/, including all Documents or Communications exchanged with any Musk or any Musk-Affiliated Entity (or any Person or entity acting on their behalves) concerning the open letter.

Request For Production No. 9:
All Documents and Communications concerning the February 10, 2025 Letter of Intent or the transaction described therein, any Alternative Transaction, or any other actual, potential, or contemplated bid to purchase or acquire all or a part of OpenAI or its assets.

(He then shares a tweet about SB 1047, where OpenAI tells employees they are free to sign a petition in support of it, which raises questions answered by the Tweet.)

Excellent. Thank you, sir, for the full request.

There is a community note:

A Brief Amateur Legal Analysis Of The Request

Before looking at others’ reactions to Kwon’s statement, here’s how I view each of the nine requests, with the help of OpenAI’s own GPT-5 Thinking (I like to only use ChatGPT when analyzing OpenAI in such situations, to ensure I’m being fully fair), but really the confirmed smoking gun is #7:

  1. Musk related, I see why you’d like this, but associational privilege, overbroad, non-party burden, and such information could be sought from Musk directly.
  2. Musk related, but this also includes FLI (and for some reason Meta), also a First Amendment violation under Perry/AFP v. Bonta, insufficiently narrowly tailored. Remarkably sweeping and overbroad.
  3. Musk related, but this also includes FLI (and for some reason Meta). More reasonable but still seems clearly too broad.
  4. Musk related, relatively well-scoped, I don’t fault them for the ask here.
  5. Global request for all funding information, are you kidding me? Associational privilege, overbreadth, undue burden, disproportionate to needs. No way.
  6. Why the hell is this any of your damn business? As GPT-5 puts it, if OpenAI wants its own governance records, it has them. Is there inside knowledge here? Irrelevance, better source available, undue burden, not a good faith ask.
  7. You have got to be f***ing kidding me, you’re defending this for real? “All Documents and Communications concerning SB 53 or its potential impact on OpenAI?” This is the one that is truly insane, and He Admit It.
  8. I do see why you want this, although it’s insufficiently narrowly tailored.
  9. Worded poorly (probably by accident), but also that’s confidential M&A stuff, so would presumably require a strong protective order. Also will find nothing.

Given that Calvin quoted #7 as the problem and he’s confirming #7 as quoted, I don’t see how Kwon thought the full text would make it look better, but I always appreciate transparency.

Oh, also, there is another.

What OpenAI Tried To Do To Tyler Johnston

Tyler Johnston: Even granting your dubious excuses, what about my case?

Neither myself nor my organization were involved in your case with Musk. But OpenAI still demanded every document, email, and text message I have about your restructuring…

I, too, made the mistake of *checks notes* taking OpenAI’s charitable mission seriously and literally.

In return, got a knock at my door in Oklahoma with a demand for every text/email/document that, in the “broadest sense permitted,” relates to OpenAI’s governance and investors.

(My organization, @TheMidasProj, also got an identical subpoena.)

As with Nathan, had they just asked if I’m funded by Musk, I would have been happy to give them a simple “man I wish” and call it a day.

Instead, they asked for what was, practically speaking, a list of every journalist, congressional office, partner organization, former employee, and member of the public we’d spoken to about their restructuring.

Maybe they wanted to map out who they needed to buy off. Maybe they just wanted to bury us in paperwork in the critical weeks before the CA and DE attorneys general decide whether to approve their transition from a public charity to a $500 billion for-profit enterprise.

In any case, it didn’t work. But if I was just a bit more green, or a bit more easily intimidated, maybe it would have.

They once tried silencing their own employees with similar tactics. Now they’re broadening their horizons, and charities like ours are on the chopping block next.

In public, OpenAI has bragged about the “listening sessions” they’ve conducted to gather input on their restructuring from civil society. But, when we organized an open letter with many of those same organizations, they sent us legal demands about it.

My model of Kwon’s response to this was that it would be ‘if you care so much about the restructuring, that means we suspect you’re involved with Musk,’ and thus that they’re entitled to ask for everything related to OpenAI.

We now have Jason Kwon’s actual response to the Johnston case, which is that Tyler ‘backed Elon’s opposition to OpenAI’s restructuring.’ So yes, nailed it.

Also, yep, he’s tripling down.

Jason Kwon: I’ve seen a few questions here about how we’re responding to Elon’s lawsuits against us. After he sued us, several organizations, some of them suddenly newly formed like the Midas Project, joined in and ran campaigns backing his opposition to OpenAI’s restructure. This raised transparency questions about who was funding them and whether there was any coordination. It’s the same theme noted in my prior response.

Some have pointed out that the subpoena to Encode requests “all” documents related to SB53, implying that the focus wasn’t Elon. As others have mentioned in the replies, this is standard language as each side’s counsel negotiates and works through to narrow what will get produced, objects, refuses, etc. Focusing on one word ignores the other hundreds that make it clear what the object of concern was.

Since he’s been tweeting about it, here’s our subpoena to Tyler Johnston of the Midas Project, which does not mention the bill, which we did not oppose.

If you find yourself in a hole, sir, the typical advice is to stop digging.

He also helpfully shared the full subpoena given to Tyler Johnston. I won’t quote this one in full as it is mostly similar to the one given to Calvin. It includes (in addition to various clauses that aim more narrowly at relationships to Musk or Meta that don’t exist) a request for all funding sources of the Midas Project, all documents concerning the governance or organizational structure of OpenAI or any actual, contemplated, or potential change thereto, or concerning any potential investment by a for-profit entity in OpenAI or any affiliated entity, or any such funding relationship of any kind.

Nathan Compiles Responses to Kwon

Rather than respond himself to Kwon’s first response, Calvin instead quoted many people responding to the information similarly to how I did. This seems like a very one-sided situation. The response is damning, if anything substantially more damning than the original subpoena.

Jeremy Howard (no friend to AI safety advocates): Thank you for sharing the details. They do not seem to support your claims above.

They show that, in fact, the subpoena is *not* limited to dealings with Musk, but is actually *all* communications about SB 53, or about OpenAI’s governance or structure.

You seem confused at the idea that someone would find this situation extremely stressful. That seems like an extraordinary lack of empathy or basic human compassion and understanding. Of COURSE it would be extremely stressful.

Oliver Habryka: If it’s not about SB53, why does the subpoena request all communication related to SB53? That seems extremely expansive!

Linch Zhang: “ANYTHING related to SB 53, INCLUDING involvement or coordination with Musk” does not seem like a narrowly target[ed] request for information related to the Musk lawsuit.”

Michael Cohen: He addressed this “OpenAI went beyond just subpoenaing Encode about Elon. OpenAI could … send a subpoena to Encode’s corporate address asking about … communications with Elon … If OpenAI had stopped there, maybe you could argue it was in good faith.

And also [Tyler Johnston’s case] falsifies your alleged rationale where it was just to do with the Musk case.

Dylan Hadfield Menell: Jason’s argument justifies the subpoena because a “safety policy organization siding with Elon (?)… raises legitimate questions about what is going on.” This is ridiculous — skepticism for OAI’s transition to for-profit is the majority position in the AI safety community.

I’m not familiar with the specifics of this case, but I have trouble understanding how that justification can be convincing. It suggests that internal messaging is scapegoating Elon for genuine concerns that a broad coalition has. In practice, a broad coalition has been skeptical of the transition to for profit as @OpenAI reduces non-profit control and has consolidated corporate power with @sama.

There’s a lot @elonmusk does that I disagree with, but using him as a pretext to cast aspersions on the motives of all OAI critics is dishonest.

I’ll also throw in this one:

Neel Nanda (DeepMind): Weird how OpenAI’s damage control doesn’t actually explain why they tried using an unrelated court case to make a key advocate of a whistleblower & transparency bill (SB53) share all private texts/emails about the bill (some involving former OAI employees) as the bill was debated.

Worse, it’s a whistleblower and transparency bill! I’m sure there’s a lot of people who spoke to Encode, likely including both current and former OpenAI employees, who were critical of OpenAI and would prefer to not have their privacy violated by sharing texts with OpenAI.

How unusual was this?

Timothy Lee: There’s something poetic about OpenAI using scorched-earth legal tactics against nonprofits to defend their effort to convert from a nonprofit to a for-profit.

Richard Ngo: to call this a scorched earth tactic is extremely hyperbolic.

Timothy Lee: Why? I’ve covered cases like this for 20 years and I’ve never heard of a company behaving like this.

I think ‘scorched Earth tactics’ is pushing it, but I wouldn’t say it was extremely hyperbolic, and Lee never having heard of a company behaving like this seems highly relevant.

The First Thing We Do

Lawyers will often do crazy escalations by default any time you’re not looking, and need to be held back. Insane demands can be, in an important sense, unintentional.

That’s still on you, especially if (as in the NDAs and threats over equity that Daniel Kokotajlo exposed) you have a track record of doing this. If it keeps happening on your watch, then you’re choosing to have that happen on your watch.

Timothy Lee: It’s plausible that the explanation here is “OpenAI hired lawyers who use scorched-earth tactics all the time and didn’t supervise them closely” rather than “OpenAI leaders specifically wanted to harass SB 53 opponents or AI safety advocates.” I’m not sure that’s better though!

One time a publication asked me (as a freelancer) to sign a contract promising that I’d pay for their legal bills if they got sued over my article for almost any reason. I said “wtf” and it seemed like their lawyers had suggested it and nobody had pushed back.

Some lawyers are maximally aggressive in defending the interests of their clients all the time without worrying about collateral damage. And sometimes organizations hire these lawyers without realizing it and then are surprised that people get mad at them.

But if you hire a bulldog lawyer and he mauls someone, that’s on you! It’s not an excuse to say “the lawyer told me mauling people is standard procedure.”

The other problem with this explanation is Kwon’s response.

If Kwon had responded with, essentially, “oh whoops, sorry, that was a bulldog lawyer mauling people, our bad, we should have been more careful” then they still did it and it was still not the first time it happened on their watch but I’d have been willing to not make it that big a deal.

That is very much not what Kwon said. Kwon doubled down that this was reasonable, and that this was ‘a routine step.’

Timothy Lee: Folks is it “a routine step” for a party to respond to a non-profit filing an amicus brief by subpoenaing the non-profit with a bunch of questions about its funding and barely related lobbying activities? That is not my impression.

My understanding is that ‘send subpoenas at all’ is totally a routine step, but that the scope of these requests within the context of an amicus brief is quite the opposite.

Michael Page also strongly claims this is not normal.

Michael Page: In defense of OAI’s subpoena practice, @jasonkwon claims this is normal litigation stuff, and since Encode entered the Musk case, @_NathanCalvin can’t complain.

As a litigator-turned-OAI-restructuring-critic, I interrogate this claim.

This is not normal. Encode is not “subject to standard legal processes” of a party because it’s NOT a party to the case. They submitted an amicus brief (“friend of the court”) on a particular legal question – whether enjoining OAI’s restructuring would be in the public interest.

Nonprofits do this all the time on issues with policy implications, and it is HIGHLY unusual to subpoena them. The DE AG (@KathyJenningsDE) also submitted an amicus brief in the case, so I expect her subpoena is forthcoming.

If OAI truly wanted only to know who is funding Encode’s effort in the Musk case, they had only to read the amicus brief, which INCLUDES funding information.

Nor does the Musk-filing justification generalize. Among the other subpoenaed nonprofits of which I’m aware – LASST (@TylerLASST), The Midas Project (@TylerJnstn), and Eko (@EmmaRubySachs) – none filed an amicus brief in the Musk case.

What do the subpoenaed orgs have in common? They were all involved in campaigns criticizing OAI’s restructuring plans:

openaifiles.org (TMP)

http://openai-transparency.org (Encode; TMP)

http://action.eko.org/a/protect-openai-s-non-profit-mission (Eko)

http://notforprivategain.org (Encode; LASST)

So the Musk-case hook looks like a red herring, but Jason offers a more-general defense: This is nbd; OAI simply wants to know whether any of its competitors are funding its critics.

It would be a real shame if, as a result of Kwon’s rhetoric, we shared these links a lot. If everyone who reads this were to, let’s say, familiarize themselves with what content got all these people at OpenAI so upset.

Let’s be clear: There’s no general legal right to know who funds one’s critics, for pretty obvious First Amendment reasons I won’t get into.

Musk is different, as OAI has filed counterclaims alleging Musk is harassing them. So OAI DOES have a legal right to info from third-parties relevant to Musk’s purported harassment, PROVIDED the requests are narrowly tailored and well-founded.

The requests do not appear tailored at all. They request info about SB 53 [Encode], SB 1047 [LASST], AB 501 [LASST], all documents about OAI’s governance [all; Eko in example below], info about ALL funders [all; TMP in example below], etc.

Nor has OAI provided any basis for assuming a Musk connection other than the orgs’ claims that OAI’s for-profit conversion is not in the public’s interest – hardly a claim implying ulterior motives. Indeed, ALL of the above orgs have publicly criticized Musk.

From my POV, this looks like either a fishing expedition or deliberate intimidation. The former is the least bad option, but the result is the same: an effective tax on criticism of OAI. (Attorneys are expensive.)

Personal disclosure: I previously worked at OAI, and more recently, I collaborated with several of the subpoenaed orgs on the Not For Private Gain letter. None of OAI’s competitors know who I am. Have I been subpoenaed? I’m London-based, so Hague Convention, baby!!

OpenAI Head of Mission Alignment Joshua Achiam Speaks Out

We all owe Joshua Achiam a large debt of gratitude for speaking out about this.

Joshua Achiam (QTing Calvin): At what is possibly a risk to my whole career I will say: this doesn’t seem great. Lately I have been describing my role as something like a “public advocate” so I’d be remiss if I didn’t share some thoughts for the public on this.

All views here are my own.

My opinions about SB53 are entirely orthogonal to this thread. I haven’t said much about them so far and I also believe this is not the time. But what I have said is that I think whistleblower protections are important. In that spirit I commend Nathan for speaking up.

I think OpenAI has a rational interest and technical expertise to be an involved, engaged organization on questions like AI regulation. We can and should work on AI safety bills like SB53.

Our most significant crisis to date, in my view, was the nondisparagement crisis. I am grateful to Daniel Kokotajlo for his courage and conviction in standing up for his beliefs. Whatever else we disagree on – many things – I think he was genuinely heroic for that. When that crisis happened, I was reassured by everyone snapping into action to do the right thing. We understood that it was a mistake and corrected it.

The clear lesson from that was: if we want to be a trusted power in the world we have to earn that trust, and we can burn it all up if we ever even *seem* to put the little guy in our crosshairs.

Elon is certainly out to get us and the man has got an extensive reach. But there is so much that is public that we can fight him on. And for something like SB53 there are so many ways to engage productively.

We can’t be doing things that make us into a frightening power instead of a virtuous one. We have a duty to and a mission for all of humanity. The bar to pursue that duty is remarkably high.

My genuine belief is that by and large we have the basis for that kind of trust. We are a mission-driven organization made up of the most talented, humanist, compassionate people I have ever met. In our bones as an org we want to do the right thing always.

I would not be at OpenAI if we didn’t have an extremely sincere commitment to good. But there are things that can go wrong with power and sometimes people on the inside have to be willing to point it out loudly.

The dangerously incorrect use of power is the result of many small choices that are all borderline but get no pushback; without someone speaking up once in a while it can get worse. So, this is my pushback.

Well said. I have strong disagreements with Joshua Achiam about the expected future path of AI and difficulties we will face along the way, and the extent to which OpenAI has been a good faith actor fighting for good, but I believe these to be sincere disagreements, and this is what it looks like to call out the people you believe in, when you see them doing something wrong.

Charles: Got to hand it to @jachiam0 here, I’m quite glad, and surprised, that the person doing his job has the stomach to take this step.

In contrast to Eric and many others, I disagree that it says something bad about OpenAI that he feels at risk by saying this. The norm of employees not discussing the company’s dirty laundry in public without permission is a totally reasonable one.

I notice some people saying “don’t give him credit for this” because they think it’s morally obligatory or meaningless. I think those people have bad world models.

I agree with Charles on all these fronts.

If you could speak out this strongly against your employer, from Joshua’s position, with confidence that they wouldn’t hold it against you, that would be remarkable and rare. It would be especially surprising given what we already know about past OpenAI actions; very obviously Joshua is taking a risk here.

It Could Be Worse

At least OpenAI (and xAI) are (at least primarily) using the courts to engage in lawfare rather than actual warfare or other extralegal means, or any form of trying to leverage their control over their own AIs. Things could be so much worse.

Andrew Critch: OpenAI and xAI using HUMAN COURTS to investigate each other exposes them to HUMAN legal critique. This beats random AI-leveraged intimidation-driven gossip grabs.

@OpenAI, it seems you overreached here. But thank you for using courts like a civilized institution.

In principle, if OpenAI is legally entitled to information, there is nothing wrong with taking actions whose primary goal is to extract that information. When we believed that the subpoenas were narrowly targeted at items directly related to Musk and Meta, I still felt this did not seem like info they were entitled to, and it seemed like some combination of intimidation (‘the process is the punishment’), paranoia and a fishing expedition, but if they did have that paranoia I could understand their perspective in a sympathetic way. Given the full details and extent, I can no longer do that.

Chris Lehane Is Who We Thought He Was

Wherever else and however deep the problems go, they include Chris Lehane. Chris Lehane is also the architect of a16z’s $100 million+ Super PAC dedicated to opposing any and all regulation of AI, of any kind, anywhere, for any reason.

Simeon: I appreciate the openness Joshua, congrats.

I unfortunately don’t expect that to change for as long as Chris Lehane is at OpenAI, whose fame is literally built on bullying.

Either OpenAI gets rid of its bullies or it will keep bullying its opponents.

Simeon (responding to Kwon): [OpenAI] hired Chris Lehane with his background of bullying people into silence and submission. As long as [OpenAI] hire career bullies, your stories that bullying is not what you’re doing won’t be credible. If you weren’t aware and are genuine in your surprise of the tactics used, you can read here about the world-class bully who leads your policy team.

[Silicon Valley, the New Lobbying Monster] is more to the point actually.

If OpenAI wants to convince us that it wants to do better, it can fire Chris Lehane. Doing so would cause me to update substantially positively on OpenAI.

A Matter of Distrust

There have been various incidents that suggest we should distrust OpenAI, or that they are not being a good faith legal actor.

Joshua Achiam highlights one of those incidents. He points out one thing that is clearly to OpenAI’s credit in that case: Once Daniel Kokotajlo went public with what was going on with the NDAs and threats to confiscate OpenAI equity, OpenAI swiftly moved to do the right thing.

However much you do or do not buy their explanation for how things got so bad in that case, making it right once pointed out mitigated much of the damage.

In other major cases of damaging trust, OpenAI has simply stayed silent. They buried the investigation into everything related to Sam Altman being briefly fired, including Altman’s attempts to remove Helen Toner from the board. They don’t talk about the firings and departures of so many of their top AI safety researchers, or of Leopold. They buried most mention of existential risk or even major downsides or life changes from AI in public communications. They don’t talk about their lobbying efforts (as most companies do not, for similar and obvious reasons). They don’t really attempt to justify the terms of their attempted conversion to a for-profit, which would largely de facto disempower the non-profit and be one of the biggest thefts in human history.

Silence is par for the course in such situations. It’s the default. It’s expected.

Here Jason Kwon is, in what seems like an official capacity, not only not apologizing or fixing the issue, but repeatedly doing the opposite of what they did in the NDA case and doubling down on OpenAI’s actions. He is actively defending OpenAI’s actions as appropriate, justified and normal, and continuing to misrepresent what OpenAI did regarding SB 53 and to imply that anyone opposing them should be suspected of being in league with Elon Musk, or worse Mark Zuckerberg.

OpenAI, via Jason Kwon, has said, yes, this was the right thing to do. One is left with the assumption this will be standard operating procedure going forward.

There was a clear opportunity, and to some extent still is an opportunity, to say ‘upon review we find that our bulldog lawyers overstepped in this case, we should have prevented this and we are sorry about that. We are taking steps to ensure this does not happen again.’

If they had taken that approach, this incident would still have damaged trust, especially since it is part of a pattern, but far less so than what happened here. If that happens soon after this post, and it comes from Altman, from that alone I’d be something like 50% less concerned about this incident going forward, even if they retain Chris Lehane.

 

 



Discuss

The Thirteen-Circle Paradox

Новости LessWrong.com - 13 октября, 2025 - 14:40
Published on October 13, 2025 11:40 AM GMT

This post assumes familiarity with basic geometry and an interest in the limits of formal systems. No advanced mathematics required, though some sections go deeper for those interested.

The Question

Here's a task:

1. Draw a circle, rbig=1.

2. Inside it, draw a smaller circle with radius rsmall=1/Φ≈0.618.

3. Between them, fit exactly 13 circles in a ring, each of size rring=(rbig−rsmall)/2.

4. Question: Do the 13 ring circles touch one another (are they mutually tangent)?

Now, the easiest approach is to calculate the regular 13-gon of ring centers and subtract:

Outer radius: R = 1
Inner radius: rsmall = 1/Φ = Φ − 1 = (√5 − 1)/2
Ring radius: rring = (R − rsmall)/2 = (1 − (Φ − 1))/2 = (2 − Φ)/2 = (2 − (1 + √5)/2)/2 = (3 − √5)/4
Centers radius: rc = rsmall + rring = (√5 − 1)/2 + (3 − √5)/4 = (2(√5 − 1) + (3 − √5))/4 = (√5 + 1)/4 = Φ/2
Chord distance: d = 2 rc sin(π/13) = Φ sin(π/13)
Tangency requires: d = 2 rring, i.e. Φ sin(π/13) = (3 − √5)/2
So we would need: sin(π/13) = (3 − √5)/(2Φ) = (3 − √5)/(1 + √5) = (3 − √5)(1 − √5)/((1 + √5)(1 − √5)) = (3 − 3√5 − √5 + 5)/(−4) = (8 − 4√5)/(−4) = √5 − 2
Numerically: sin(π/13) ≈ 0.23932 and √5 − 2 ≈ 0.23607
Difference: ≈ 0.00325 (a numerical gap)
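
To make the arithmetic above easy to check, here is a small numerical sketch using only Python's standard library; the variable names are my own and the values in the comments are rounded.

import math

PHI = (1 + math.sqrt(5)) / 2                # golden ratio
r_big = 1.0                                 # outer circle
r_small = 1 / PHI                           # inner circle, equals PHI - 1
r_ring = (r_big - r_small) / 2              # radius of each of the 13 ring circles
r_c = r_small + r_ring                      # radius of the circle of ring centers, equals PHI / 2

chord = 2 * r_c * math.sin(math.pi / 13)    # distance between adjacent ring centers
needed = 2 * r_ring                         # center distance required for the ring circles to touch

print(chord)           # ~0.3872
print(needed)          # ~0.3820
print(chord - needed)  # ~0.0053: adjacent ring circles fall just short of touching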

 

At first glance, our calculation seems to settle the question definitively. We've shown that for 13 golden ring circles to fit perfectly around a central circle—each one kissing its neighbors—we would need:

sin(π/13)=√5−2

And when we plug in the numbers, we get

0.23932 versus 0.23607.

They don't match. Case closed, right? Not quite.

Here's the subtle trap: those decimal approximations—0.23932 and 0.23607—are just that: approximations. We computed them to five decimal places, but what if they agree at the sixth? The millionth? We can never check infinitely many digits.

You might protest that we can compute more digits, and when we do, the gap persists. Yet this still doesn't constitute mathematical proof. We're comparing:

  • sin(π/13): an algebraic number arising from the 26th roots of unity
  • √5−2∈Q(√5): an algebraic number from the golden ratio family

In the end, no finite computation can definitively prove these aren't equal.
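
To see what those ever-deeper checks look like in practice, here is a sketch that pushes the precision up using mpmath (a third-party library, assumed to be installed). The gap stays put at every precision tried, but, per the argument above, no such run constitutes a proof.

from mpmath import mp, sin, pi, sqrt, nstr

for digits in (15, 50, 200):
    mp.dps = digits                      # working precision, in decimal digits
    gap = sin(pi / 13) - (sqrt(5) - 2)   # difference between the two candidates
    print(digits, nstr(gap, 10))         # stays around 3.2e-3 at every precision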

The sophisticated response invokes field theory: sin(π/13) lives in a degree-12 extension of the rationals, while √5−2 lives in a degree-2 extension. Different degrees mean they can't be equal.
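
For readers who want to poke at the degree claim concretely, a computer algebra system can produce the minimal polynomials involved. A minimal sketch with sympy (assumed to be installed; the minpoly computation for sin(π/13) should work but may take a moment):

import sympy as sp

x = sp.symbols('x')

p_sin = sp.minimal_polynomial(sp.sin(sp.pi / 13), x)   # minimal polynomial of sin(pi/13) over Q
p_alg = sp.minimal_polynomial(sp.sqrt(5) - 2, x)       # minimal polynomial of sqrt(5) - 2 over Q

print(p_alg)                                     # x**2 + 4*x - 1, so degree 2
print(sp.degree(p_sin, x), sp.degree(p_alg, x))  # different degrees, so the two numbers cannot be equal

Of course, the computer algebra system is leaning on the same field-theoretic machinery under the hood, so this is a convenience rather than the kind of witness discussed below.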

This does prove inequality... but it's a "classical" existence proof. It tells us "they're definitely unequal" without giving us a constructive witness—no explicit polynomial we can exhibit, no certified lower bound on their difference.

The Obstruction

From a strict constructivist or finitist perspective, this abstract degree-counting doesn't provide the kind of tangible evidence we might want. It's like being told "there's definitely treasure buried somewhere in this field", but the treasure could be infinitesimal, or even negative. Most importantly, the systems involved are too strong: √5 and Φ are constructible with compass and straightedge and live in finite field extensions, while π, sin, and e^(ix) require analytic completion, limits, and infinite processes. The latter all require completed infinities or non-constructive existence claims.
 

This particular geometric puzzle resists simple algebraic proof, and it traces back to one of the most beautiful impossibility results in mathematics: the discovery that polynomial equations of degree five or higher cannot, in general, be solved using radicals.

This is the Abel-Ruffini theorem, proved in the early 19th century, and it casts a long shadow over our circle-packing problem in ways that aren't immediately obvious.


When we ask whether sin(π/13) equals √5−2, we're not just comparing two numbers. We're asking whether a quantity that lives in the world of 13-fold symmetry—the realm of regular tridecagons and cyclotomic fields—can be expressed using the golden ratio, which lives in the much simpler world of √5.

The number sin(π/13) is algebraic. It satisfies a polynomial equation with integer coefficients. But here's the catch: its minimal polynomial has degree twelve. And degree-twelve polynomials, being of degree five or higher, fall on the wrong side of the Abel-Ruffini divide.

For polynomials of degree two, three, or four, we have formulas. The quadratic formula is taught in high school. Cubic and quartic formulas exist, though they're messy enough that most people never learn them. But for degree five and above? For most such equations, there simply is no formula involving only arithmetic operations and radicals. No amount of algebraic cleverness will extract an expression like √a + ∛b + ⁵√c that equals sin(π/13).

This isn't a limitation of human ingenuity. It's a fundamental structural fact about how polynomials and radicals relate to each other.
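
To get a feel for where that divide bites, here is a small sketch with sympy (assumed to be installed). A quadratic comes back in radicals, while for a classic quintic that is known to be unsolvable in radicals, sympy typically falls back to symbolic root placeholders rather than any radical formula.

import sympy as sp

x = sp.symbols('x')

print(sp.solve(x**2 - x - 1, x))   # (1 ± sqrt(5))/2: the golden ratio and its conjugate, in radicals
print(sp.solve(x**5 - x - 1, x))   # CRootOf(...) placeholders: no radical expression to hand back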

This creates an odd epistemic situation. We have three different ways of "knowing" that the circles don't quite touch:

First, we can compute. We can calculate sin(π/13) and √5−2 to a thousand decimal places, ten thousand, a million. The gap persists at every precision level we check. This is overwhelming evidence, but it's not proof—we can never check infinitely many digits.

Second, we can invoke Galois theory. The field-theoretic argument is airtight: these numbers live in extensions of the rationals of different degrees, so they cannot be equal. This is genuine proof, the kind mathematicians accept without reservation.

But third, we might want something more tangible: an explicit polynomial we can exhibit, a certified lower bound on the difference, some algebraic witness to the inequality that doesn't require abstract machinery about field extensions. And this—precisely this—is what the n≥5 obstruction denies us.

This is what makes the golden circle packing with 13 rings philosophically interesting. It sits in a peculiar limbo: perfectly well-defined geometrically, numerically computable to arbitrary precision, provably non-tangent by abstract algebra, yet algebraically unwitnessable due to a fundamental obstruction that emerges exactly at degree five.



Discuss

Pause House, Blackpool

Новости LessWrong.com - 13 октября, 2025 - 14:36
Published on October 13, 2025 11:36 AM GMT

Are you passionate about pushing for a global halt to AGI development? An international treaty banning superintelligent AI? Pausing AI? Before it’s too late to prevent human extinction?

Would you like to live with a group of like-minded people pushing for the same?

Do you want to do much more, but don’t have the financial means to support yourself volunteering?

Then apply to stay at Pause House. I (Greg Colbourn) am offering free accommodation, (vegan) food, and a small stipend for those who need it (£50/week). In exchange I ask for a commitment to spending at least 20 hrs a week on work related to pushing for a Pause. This could be either campaigning/advocacy, or raising public awareness.

If you have an income, then I ask that you pay ~cost price (£20/day).

Pause House is located in Blackpool, UK, next door to CEEALAR (which I founded in 2018); 1hr from Manchester, 3.5hrs from London. It has a large communal work/social/events room, dining room, well equipped kitchen, toilet, laundry and small courtyard downstairs. And a meeting room and 12 bedrooms in the upper floors, 4 with en suite bathrooms (first come, first served!). The other 8 bedrooms all have sinks in them, and the use of 2 shared showers and 2 shared toilets.



Discuss

Global vs. Local feedback

Новости LessWrong.com - 13 октября, 2025 - 13:33
Published on October 13, 2025 10:33 AM GMT

This is one of many short notes on management that I'm planning to post on Substack. I might crosspost to the Forum/LW if there's interest. Happy to discuss in comments!

There’s a difference between:

  • Local feedback (”That email was really clear, it would have been great if there were a summary too.”) and
  • Global feedback (”I’m glad you work here. You’re especially strong at written communications, but it would be great if you could improve your attention to detail”).

You could be synced up with your direct report locally but not globally, or vice versa.

You could be giving too much negative local feedback and too much positive global feedback, or vice versa.

The weekly mini performance review

I’ve found it useful to write a short (3-4 sentence) piece of global feedback, like a mini performance review, for each person I manage. I copy this across in our meeting notes from week to week, occasionally tweaking things if my overall assessment has changed.

This means that we’re always synced up on their overall performance, so there won’t be any surprises come the annual performance review.

It also helps to put specific feedback in context: if they made a big mistake one week, then I’ll share that, but right next to that they’ll see my overall assessment (which is usually positive!).

It doesn’t take long - maybe less than a minute on average - to copy this mini performance review across from last week’s agenda to this week’s, and check that it’s still accurate. If I need to tweak it, that might take a bit longer, but in that case it’s worth it to stay synced up on their overall performance.

(Some people imagine that it’s terrifying to get a weekly performance review, but because we do this every week and it rarely changes, it makes performance management almost boring.)

What might these look like?

Here’s a rough template:

  • Start with a sentence that says how they’re performing overall (below expectations / performing well / exceeding expectations)
  • One sentence summarizing their strengths
  • One sentence summarizing what you’d like them to improve (maybe framed in terms of what they’d need to achieve the next promotion)
  • Maybe one final summary sentence

Example (for a hypothetical product manager)

You’re performing well, and I’m excited to keep working with you. Because of your regular, perceptive user interviews, you have a really great understanding of our users; you iterate quickly; and you come up with creative product ideas. To become a senior product manager, I want to see you improve on your planning and delivery: particularly in making sure that the team always knows what the top priority is and anticipating roadblocks. Overall I’m really excited that the team hit our goals for the year, and I’m excited to see even more progress this year.

(Normally these claims would be things you’d already synced up on, for instance through a performance review or weekly feedback, so this would just be a summary.)



Discuss

Sublinear Utility in Population and other Uncommon Utilitarianism

Новости LessWrong.com - 13 октября, 2025 - 09:19
Published on October 13, 2025 6:19 AM GMT

Content warning: Anthropics, Moral Philosophy, and Shrimp

This post isn't trying to be self-contained, since I have so many disparate thoughts about this. Instead, I'm trying to put a representative set of ideas forward, and I hope that if people are interested we can discuss this more in the comments. I also plan to turn this into a (probably small) sequence at some point.

I've had a number of conversations about moral philosophy where I make some claim like 

Utility is bounded and asymptotically sublinear in number of human lives, but superlinear or ~linear in the ranges we will ever have to care about.

Common reactions to this include:

  • "Wait, what?"
  • "Why would that be the case?"
  • "This doesn't make any sense relative to my existing conceptions of classical utilitarianism, what is going on here?"

So I have gotten the impression that this is a decently novel position and I should break it down for people. This is the post where I do that breakdown. As far as I know, this has not been written up anywhere else and is primarily my own invention, but I would not be terribly surprised if some commenter comes forward with a link describing an independent invention of the thing I'm pointing at.

I won't spend much of this post defending consequentialist utilitarianism, that is not what I'm here to do. I'm just here to describe a way that values could be that seems academically interesting and personally compelling to me, and that resolves several confusions that I once had about morality.

I'll start out with some motivating thought experiments and math and carefully work my way into the real world I live in, since this view takes a bit of context to explain. Some of these pieces may seem disjoint at first until I bring them together.

Aside: "Utilitarianism"

I'm using "consequentialist utilitarianism" here to refer to the LessWrong thing that Eliezer uses to model ideal agents and also basically the thing that economics uses, not the classical philosophy thing or anything else by that name. 

The More Mathy Pointer

U : outcomes → ℝ

An outcome is the way that a universe could go over time, like "the big bang happened, (...) there was all sorts of sentient life that emerged, then they all cooperated and did beautiful things" or any other timeline that is possible, among the set of ways the universes could be. This set is very very large. A utility function is a function from the set of possible outcomes to the real numbers, and an ideal consequentialist utilitarian agent has a[1] utility function such that it can be modeled as maximizing that function. 

Some people ascribe other features to utilitarianism beyond these, but that is not what I'm doing. I'm just looking at the possible timelines a world could go through and giving them each a score. 
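
To pin the setup down, here is a minimal formalization of the definition above. The notation is my own shorthand (in particular, Ω for the set of possible outcomes), not something the post commits to:

```latex
% Minimal formalization, using \Omega for the set of possible outcomes
% (whole ways a universe could go over time).
U : \Omega \to \mathbb{R}
% An ideal consequentialist agent can then be modeled as picking the action
% that maximizes expected utility:
a^{*} = \arg\max_{a} \; \mathbb{E}\big[\, U(\omega) \mid a \,\big]
% Per footnote [1], U is only unique up to positive affine transformation:
% U' = cU + d with c > 0 encodes exactly the same preferences.
```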

Duplicate Simulations

If you run a perfectly accurate and robust simulation of me in some friendly environment on a computer, this is great and I would pay money to have this happen. A place where I find myself disagreeing with others pretty frequently, however, is that I would not pay for a second computer right next to the first running the exact same sequence of computations. Recall that, by assumption, the second computer doesn't add any extra assurance that the simulation keeps going; it is just a second run of the same thing with the same reliability.

My moral intuition says: the simulation is already running once, the computation is already happening, and no parts of the universe are going to get information about my existence from the second marginal computer. Paying for this seems as intuitively absurd to me as doubling up each transistor on an existing computer chip, just to make the computations done on it have more moral weight. 

Another angle is explained at length in one of the classic planecrash lectures (spoilers): it is not meaningful for a universe to become twice as real; realness is a relative metric, not an absolute one. You can have a quantum experiment that outputs one result 2/3 of the time, and that outcome is in some sense twice as real as the 1/3 outcome, but you would still get the exact same results if you made every outcome "twice as real," and so I think it doesn't make sense to think of making something "twice as real" as a meaningful action at all. 

The same thing drives my intuition here: it doesn't really make sense to want to make this simulation twice as real. In fact, the only reason that it is reasonable to pay for even the first simulation is because the people in this universe, where that Me wouldn't otherwise exist, can now observe what that Me is doing. That Me already existed in some sense in the background, but now the utility of being observed is added on.

If you have both computers broadcasting their identical computations out into space (once again with zero signal noise or lossiness), then I will pay money for both if they are outside each other's lightcones, since this implies that there are Me's making contact with different sets of interesting things, and in expectation this probably aggregates into more things knowing about Me, and that's cool and valuable.

To summarize the things we now have:

  • The value in simulations of people and their worlds comes from newly being able to observe those worlds, not from bringing them into existence in the first place.
  • Duplicate simulations don't count for any marginal value unless they add to the observability of that world (e.g. by being in substantially different locations).
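
To make those two bullets concrete, here is a toy Python sketch. The framing is entirely my own (the post doesn't propose a formula): treat the value of a collection of simulations as depending only on the set of distinct (computation, audience) pairs, so an exact duplicate visible to the same observers adds nothing, while the same computation made observable to a different part of the universe counts again.

```python
# Toy model, my own framing: value depends only on the set of distinct
# (computation, audience) pairs, so exact duplicates with the same audience collapse.

def collection_value(simulations, value_of_computation):
    """simulations: iterable of (computation_id, audience_id) pairs."""
    distinct = set(simulations)
    return sum(value_of_computation[comp] for comp, _ in distinct)

value_of_computation = {"Me": 10.0}

one_copy       = [("Me", "Earth")]
two_copies     = [("Me", "Earth"), ("Me", "Earth")]              # second computer next door
two_lightcones = [("Me", "lightcone_A"), ("Me", "lightcone_B")]  # broadcasts far apart

print(collection_value(one_copy, value_of_computation))        # 10.0
print(collection_value(two_copies, value_of_computation))      # 10.0 -- no marginal value
print(collection_value(two_lightcones, value_of_computation))  # 20.0 -- differently observable
```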
Slightly Different Simulations

Put the computers back next to each other, and perturb one of the simulations from Me into Me*. Not enough to turn me into a wildly different person, but enough to change some minor personal preference, like reversing my preference ordering between blue and yellow shirts. I will now pay a positive amount of money for this marginal simulation, and I tend to have more willingness to pay the more you perturb the simulation while maintaining some things I care about like the simulatee being human-shaped and generally pleased with their situation.

What is going on with the "I tend to have more willingness to pay the more you perturb the simulation" part? Why would the value of a being[2] go up the more it's different from what you already have access to? 

I tend to conceptualize what I value about life around me as the amount of cool computations going on: the human brain does so many interestingly varied and wonderful things, and there is a huge amount of individual character and distinction between individuals. However, I've already established pretty limited conditions for valuing sameness of people or worlds: they exclude duplicate simulations right next to each other, but include things that are differently observable, like real people on Earth who are very similar to each other.
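
One way to make the "covering the space" picture concrete is to treat each mind as covering a region of a value space and to value the total area covered, so that overlap is only counted once. This is purely my own toy model, with made-up feature labels:

```python
# Toy model (mine, not the post's): each mind covers a set of points in a
# discretized value space; total value is the size of the union, so duplicate
# coverage counts once and more-different minds add more.

def total_value(minds):
    covered = set()
    for region in minds:
        covered |= region
    return len(covered)

me       = {"math", "jokes", "blue_shirts"}
me_star  = {"math", "jokes", "yellow_shirts"}   # a slightly perturbed Me
stranger = {"gardening", "opera", "carpentry"}

print(total_value([me, me]))        # 3 -- an exact duplicate adds nothing
print(total_value([me, me_star]))   # 4 -- a small perturbation adds a little
print(total_value([me, stranger]))  # 6 -- a very different mind adds the most
```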

When I imagine a world where everyone is exactly the same person, though, I feel like we are missing out on this big space of possible human experience, instead only filling a small fraction of it:

Not to scale

Going back to simulating slightly different versions of Me on computers, I feel like they are covering meaningfully different parts of the space, but still have some overlap:

Not to scale

Utility Variation with Population

We now have enough pieces to paint a picture of how much I value different numbers of humans (or similar things). When I think about what I value in humanity, or in any other animal that is at least epsilon valuable, there are a few factors that matter:

More is Better

Assuming that you can keep living conditions good, it seems broadly good to have more humans rather than fewer of them in a given world. They cover more of the space of valuable computations.

In Some Domains, More is Superlinearly Better

"Some Domains" being ones like "having fewer than a quintillion[3] humans" and whatnot. There are things that can only happen once you reach a certain number of humans in a world, and those things have extra value: conversation takes two, doing some projects takes tens to thousands, building some moderately advanced civilizations takes billions, and so does having such rich and diverse culture as we do today. I don't know what exactly can happen with a trillion humans, but probably many cool things that can't be done now. 

But Value Space is Limited

There are only so many possible human shaped computations that are valuable to me, and while I'm pretty confident that we're not anywhere near the limit of that yet, I wouldn't be so confident saying that I'll still believe this into the quintillions or the many-more-than-that-illions. At some point, things fill up, and each marginal person isn't going to be adding particularly new computations to the mix:

I want to emphasize, this is a very very big space of human-shaped valuable things and I don't think we will run into the problem.
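
As a purely illustrative sketch (my own choice of functional form, not anything the author commits to), a curve with the three properties above, monotone in population size, superlinear at the start, and asymptoting to a bounded value space, might look like this:

```python
import math

# Illustrative only: utility of n similar beings, with
#   - more is better (monotonically increasing),
#   - superlinear gains early on (the p > 1 exponent),
#   - a bounded value space it asymptotes towards (u_max).
# The parameters are invented; nothing here is calibrated.
def population_utility(n, u_max=1.0, n_scale=1e18, p=1.5):
    return u_max * (1.0 - math.exp(-((n / n_scale) ** p)))

for n in [1e9, 1e12, 1e15, 1e18, 1e21]:
    print(f"n = {n:.0e}: U = {population_utility(n):.6f}")
```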

What About The Other Animals?

Consider two additional types of animals beyond just humans: shrimp and galaxy-brained post-humans like the ones here.

I take a pretty cosmopolitan view on one part of this: the shrimp-computations aren't literally the human computations, but I care about those computations in much the same way, even though there is less moral weight per organism because shrimp have much, much smaller brains. There is something in shrimp cognition that I care about, since having more different minds in the universe is cool. Despite their small brains, there is still probably some meaningful inter-shrimp variation.

The post-humans, on the other hand, have very large experiences consisting of things that humans can't even dream of, experiencing worlds that cannot sustain humans and generally having vastly more valuable experience. 

Several things happen with my value of these three types of beings:

Not to scale because I have no idea what the scale even is, and that would probably be less illustrative
  • The space of value for each species is going to be radically different, and thus so are the horizontal asymptotes, with post-humans>humans>shrimp, the way I've set it up.
  • The utility with respect to each species increases at different rates: this makes sense because an individual human is worth more than an individual shrimp.
  • The space of value for shrimp is much of the way filled with many fewer individuals than the value space for humans, even though the space for humans is bigger. 
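
To make those three bullets concrete, here is a toy extension of the illustrative curve above, with per-species parameters. All the numbers are invented; the point is only the shape: different asymptotes (u_max), different fill rates, and a shrimp value space that is essentially full long before the human one.

```python
import math

def population_utility(n, u_max, n_scale, p=1.5):
    return u_max * (1.0 - math.exp(-((n / n_scale) ** p)))

def marginal_value(n, u_max, n_scale, p=1.5):
    # Value of one additional individual at population size n.
    return population_utility(n + 1, u_max, n_scale, p) - population_utility(n, u_max, n_scale, p)

# Invented parameters: post-humans have the largest value space, humans the next,
# and shrimp a small value space that fills up with far fewer individuals.
species = {
    "shrimp":      dict(u_max=0.01,  n_scale=1e6),
    "humans":      dict(u_max=1.0,   n_scale=1e18),
    "post-humans": dict(u_max=100.0, n_scale=1e24),
}

for name, params in species.items():
    print(name, marginal_value(1e9, **params))
```

Near its asymptote the marginal individual contributes almost nothing, which is the shape of the point made in the next section.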
What Does this Mean About Classical EA?

Classical EA philosophy takes the approach of "just multiply the number of things impacted by the unit impact," and this works pretty well for specific charitable goals like saving the greatest number of human lives, since we're basically at linearity in value with respect to human lives, under my values. 

This doesn't seem to work very well for things like shrimp lives. Shrimp brains are small and, I imagine, mostly manage muscle outputs like "swim this way because there is food over there" and "eat food", behaviours that are going to be widely shared across the vast majority of the species. I really don't think there is much variation going on there, and I don't think a marginal shrimp killed is going to meaningfully empty the space of shrimp-computation-value that already exists.

I still think that groups like the Shrimp Welfare Project are doing something positive, to be clear: they are preventing new kinds of shrimp suffering from coming about, and that seems good. It's not at all an effective cause relative to my values, but it is not nothing.

Other Curiosities

I might explore some of these other topics in future posts, and am happy to try to discuss them in the comments:

  • The utility function is bounded
  • I'm a positive utilitarian
    • as in, I think quite a lot of outcomes are better than nonexistence (this is more centrally what the term "positive utilitarianism" means.)
    • as in, I like quite a lot of things, and this feels related to why the other meaning is true. 

I will fill in links to these posts as I write them. 

  1. ^

    unique up to positive affine transformation

  2. ^

    Technically "the value of the marginal observability of this being to a universe that does not already contain them" but this is annoyingly long to say every time, and I think that this isn't actually necessary for the post to be fully clear.

  3. ^

    I don't actually have a number for this, it just seems like it is a number that is obviously much bigger than the number of humans who have ever lived but less than the number of humans who could in theory one day exist. 



Discuss

RiskiPedia

Новости LessWrong.com - 13 октября, 2025 - 08:24
Published on October 13, 2025 4:26 AM GMT

RiskiPedia is a collaborative, data-driven, interactive encyclopedia of risks. Well, it is far from an encyclopedia right now; it is just launching, and mostly consists of half-baked pages I’ve created to exercise the MediaWiki extensions that make the pages interactive.

Try it at https://riski.wiki -- you can explore the chances you'll end up in the emergency room tomorrow or the risk you'll get mauled by a grizzly if you spend all summer hiking in Glacier National Park.

I’m working on it because I’ve been frustrated that risks are typically presented as binary “safe” or “dangerous”, usually with no mention of how dangerous or safe. I hope it will help people think more clearly about risks, so at least a few of them stop worrying about things that aren't very risky and maybe start worrying more about the things that actually are risky. Like driving.

I’m hoping to recruit early contributors from the rational thinking community to help launch RiskiPedia by:

  1. Creating more pages about risks that people care about, or risks that people should care about.
  2. Fact-checking the risks that are already there.
  3. Helping with wiki administration: approving users, deleting spam (if it appears), helping to get consensus on policies and procedures and all the other Wiki-community stuff that I should probably know more about before launching this thing.
  4. If you're a total geek, help out with the behind-the-scenes coding (it is all open source php and javascript up on github).

Anybody can create an account and contribute, don't feel like you need permission to jump in and help out.

RiskiPedia is not a business; if it is wildly successful there might be a RiskiPedia Foundation to support it. For now, I'll be funding anything that absolutely needs funding (like the server that it runs on) out of my own pocket.



Discuss

Don't Mock Yourself

Новости LessWrong.com - 13 октября, 2025 - 01:40
Published on October 12, 2025 10:40 PM GMT

About half a year ago, I decided to try to stop insulting myself for two weeks. No more self-deprecating humour, calling myself a fool, or thinking I'm pathetic. Why? Because it felt vaguely corrosive. Let me tell you how it went. Spoiler: it went well. 

The first thing I noticed was how often I caught myself about to insult myself. It happened like multiple times an hour. I would lie in bed at night thinking, "you mor- wait, I can't insult myself, I've still got 11 days to go. Dagnabbit." The negative space sent a glaring message: I insulted myself a lot. Like, way more than I realized. 

The next thing I noticed was that I was the butt of half of my jokes. I'd keep thinking of zingers which made me out to be a loser, a moron, a scrub in some way. Sometimes, I could re-work the joke to not insult myself. Often I couldn't. Self-mockery served as a crutch for me. 

So I had to change my repertoire of repartees, which took a while. And I think I'm as funny as I used to be. Perhaps more, though it's hard to say for sure. But I don't need to mock myself any longer. Now, I mock my friends. See? Another joke where I'm the villain. Yes, I do mock my friends. But like most folks, I already did that before. I think I've shifted more towards ... absurd humour? Shocking humour? You know,  "everyone gets AI psychosis but one guy who starts jailbreaking everyone." That sort of thing. 

A surprising result was that I started to react with distaste to negative media. I would open up some work I used to enjoy, or at least tolerate, and go "hey, there's a lot of negativity here. This doesn't feel good. Why am I reading this?" Then I'd drop it. 

Also, I think I became more confident. I mean, I'd kind of have to, given that my self-worth used to be 9 parts negativity to 1 part positivity. Clearing out the negativity did wonders to that ratio. 

Certainly, it helped to emphasize just how useless all the negativity was. For instance, "I'm a failure." What work is that doing? How does that lead to better actions? If I didn't succeed at a maths problem, does reciting "I'm a failure" tell me anything about what error I made? It doesn't even add any info over a phrase that doesn't do violence to myself, like "I failed to solve it". 

All it does is reinforce the part of my identity that says "failure". Why on earth do I want that to be part of my identity? To make people feel sorry for me? Well, maybe that's a strategy that sometimes gets you some stuff. Perhaps it might help a beggar. But I don't want to be a beggar. So why call myself a failure when I didn't solve some random exercise in a textbook? There was no need for it. It was an excuse not to try.  

Likewise, why mock myself so much in front of others? Yes, it can be funny. But was I doing it because it's the best joke I could come up with? No. I did it to make myself smaller. To say, again and again, "hey, I think I suck, but can you give me points for acknowledging that? Can't you laugh and let me extract some value from this waste of space I call my self?"

How sad when I put it that way. How corrosive. How glad I am to have realized that. 

10/10, would recommend. 



Discuss

Experiment: Test your priors on Bernoulli processes.

Новости LessWrong.com - 13 октября, 2025 - 01:09
Published on October 12, 2025 10:09 PM GMT

I have run 1,000,000 experiments. Each experiment consists of 5 trials with binary outcomes, either L (for left) or R (for right).

However, I'm not going to tell you how I've picked my experiments. Maybe I'm just flipping a fair coin each time. Maybe I'm using a biased coin. Or maybe I'm doing something completely different, like dropping a bouncy ball down a mountain and checking whether it hits a red rock or a white rock first--and different experiments are conducted on different mountains. I might be doing some combination of all three.

You do get one guarantee, though: All the experiments are Bernoulli processes. In particular, the order of the trials is irrelevant.

Your goal is to guess the marginal frequencies of the fifth trial. For each k=0,1,…,4, you need to tell me the frequency that the fifth trial is an R given that k of the outcomes of the first four trials are R.

For example, if every experiment is just flipping a fair coin, then the fifth trial will be an R with probability 1/2, no matter what the first four are. However, if I'm using biased coins, then the frequency of R will increase the more Rs seen.
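
As a worked example of the biased-coin case (one possible prior, not a claim about how the experiments were actually generated): if each experiment's bias were drawn from a uniform prior on [0, 1], then Laplace's rule of succession gives

```latex
P\big(\text{5th trial} = R \;\big|\; k \text{ of the first } 4 \text{ are } R\big)
  = \frac{k+1}{4+2} = \frac{k+1}{6}, \qquad k = 0, 1, \dots, 4,
```

i.e. the list [1/6, 2/6, 3/6, 4/6, 5/6], roughly [0.17, 0.33, 0.5, 0.67, 0.83].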

To help you in your guessing, I have provided a csv of all the public trials. As an answer, please provide a list like [0.3, 0.4, 0.5, 0.6, 0.7] of your frequencies--the kth element of your list is the marginal frequency over the experiments with k of the first four trials being R.
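
If you want to tabulate empirical frequencies yourself, a sketch like the following works. The column layout is an assumption on my part (one row per experiment, columns t1 through t5 holding "L" or "R"); adjust the parsing to the real file.

```python
import csv
from collections import defaultdict

# Assumed layout: one row per experiment, columns t1..t5, each "L" or "R".
counts = defaultdict(lambda: [0, 0])  # k -> [num experiments, num with 5th trial == R]

with open("public_trials.csv", newline="") as f:
    for row in csv.DictReader(f):
        trials = [row[f"t{i}"] for i in range(1, 6)]
        k = trials[:4].count("R")
        counts[k][0] += 1
        counts[k][1] += trials[4] == "R"

freqs = [counts[k][1] / counts[k][0] if counts[k][0] else None for k in range(5)]
print(freqs)  # answer format: marginal frequency of R for k = 0..4
```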

I haven't yet looked at the frequencies myself, but I will do so shortly after posting this. If you want to test your guesses against others, I have created a market on Manifold Markets. I will resolve the market before I reveal the correct frequencies, which will happen in around two weeks, but maybe earlier or later depending on trading volume.

Good luck!



Discuss

Dr Evil & Realpolitik

Новости LessWrong.com - 12 октября, 2025 - 20:30
Published on October 12, 2025 5:30 PM GMT

Why can’t we all just get along?

The news today is characterised by conflict, war, sanctions, tariffs and fracturing allegiances, distracting from the monumental issues we must face globally like climate change, inequality and AI. This post unpacks the cost of conflict in geopolitics today, by looking to theories of International Relations from a non-zero-sum perspective. We will learn about the two big approaches; Political Realism and Political Liberalism, and ask “which is really more realistic and sustainable?”.

There’s a famous scene in Austin Powers: International Man of Mystery (1997) where Dr Evil awakens in the present day. In the boardroom of Virtucon—which he explains is “the legitimate face of my evil empire”—he proposes to hold the world to ransom for “one million dollars”.

Number Two, who has been running Virtucon for 30 years while Dr Evil has been in cryostasis, argues that a million dollars isn’t exactly a lot of money these days, after all…

“Virtucon alone makes over nine billion dollars a year.” [1]

Number Two

In the sequel The Spy Who Shagged Me (1999), in a meeting held in the "Starbucks World Headquarters", Number Two suggests shifting resources "away from evil empires and towards Starbucks". Legitimate business dealings, which were intended only as a front, are now more profitable than criminal activity, but, despite this, Dr Evil continues to pursue his malevolent designs.

Own Goals

This scene keeps returning to my mind when I look around at the world today, and recognise a series of geopolitical own goals. Putin’s ongoing invasion of Ukraine and Trump’s imposition of tariffs on allies in particular strike me as ‘Dr Evil thinking’—a zero-sum mentality. In the pursuit of global dominance these actors have placed short-sighted national interests over the mutual benefits afforded by international peace and free trade.

This mentality brings into high-relief the difference, in the realm of international relations, between political liberalism and political realism—otherwise known as…

… Realpolitik

Politics based on practical objectives rather than on ideals. The word does not mean “real” in the English sense but rather connotes “things”—hence a politics of adaptation to things as they are. Realpolitik thus suggests a pragmatic, no-nonsense view and a disregard for ethical considerations. In diplomacy it is often associated with relentless, though realistic, pursuit of the national interest.

Britannica

Political “realism” assumes a zero-sum contest between nations. This can take the form of military dominance, where one country has power over others, can extort money from them, or can literally conquer them. It can also take the form of mercantilism which, as we’ve discussed in CAPITALISM—a zero-sum game? is when countries seek to export more than they import in order to accumulate more national wealth relative to their trading partners. These two aspects are linked because greater national wealth can then be spent on armed forces, providing military dominance.

This political philosophy follows from the mentality encapsulated by Thomas Hobbes, who proposed that civilisation faces constant threat from a state of nature which is…

“… solitary, poor, nasty, brutish, and short.” 

— Thomas Hobbes (Leviathan)

International Anarchy

A couple of years ago I picked up one of my wife’s textbooks on International Relations, I don’t know why, it was a primer for an honours course, and it was dense! It covered every geopolitical philosophical framework of the 19th and 20th centuries from Kantianism to Critical Theory, and involved a lot more philosophy than I was expecting, from Aristotle to Deleuze. It was a slog, but it did help me understand the dynamics at play, and the tension that exists between two over-arching approaches.

International Relations operates in a mode referred to as “international anarchy” which sounds pretty crazy. But it makes sense, because there is no global governing body, and so geopolitics plays out between national governing bodies in a dynamic, anarchic way. Political realism sees this as an ever-present threat to national sovereignty, identity and survival, which is why it results in defensive, isolationist foreign policy. But there is another way to look at things.

Political Liberalism

A non-zero-sum way of approaching the anarchic international relationship is political liberalism, where autonomous nations allow for free trade to build mutual benefits, and create allegiances between nations that function interdependently.

Liberalism is a school of thought within international relations theory which revolves around three interrelated principles: Rejection of power politics as the only possible outcome of international relations; mutual benefits and international cooperation; the role of international organisations and nongovernmental actors in shaping state preferences and policy choices.

Wikipedia

This creates a situation where conflict is disincentivised because it comes at a huge cost (by disrupting the market). If Dr Evil’s plans end up creating bad press for Virtucon, they stand to lose billions. This was observed by Kant in “Perpetual Peace”, one of the founding works of political liberalism in international relations.

“Sooner or later in any state the spirit of commerce will get the upper hand, and it can’t co-exist with war.” 

— Immanuel Kant (Perpetual Peace)

The mutual benefit of political liberalism is what elevates all parties involved out of the Hobbesian state of nature—without requiring losing parties.
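
To illustrate with made-up numbers (a toy payoff matrix, not data): in a zero-sum game the two payoffs always sum to the same constant, but a simple trade game does not have that property. Mutual free trade leaves both countries better off than mutual tariffs, even though each side can grab a short-term edge by defecting.

```python
# Toy payoff matrix with invented numbers: (payoff to A, payoff to B).
payoffs = {
    ("trade",  "trade"):  (4, 4),
    ("trade",  "tariff"): (1, 3),
    ("tariff", "trade"):  (3, 1),
    ("tariff", "tariff"): (2, 2),
}

# The totals differ across cells, so this is not zero-sum: cooperation creates surplus.
for (a, b), (pa, pb) in payoffs.items():
    print(f"A: {a:<6} B: {b:<6} -> A gets {pa}, B gets {pb}, total {pa + pb}")
```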

Solitude

I find it interesting that Hobbes saw fit to start his list of maladies with "solitude", because, although solitude sounds much less negative than "… poor, nasty, brutish, and short", it is the root of the problem. It is the lack of cooperation and coordination that makes lives "nasty". Political liberalism overcomes these maladies by tackling the solitude itself, encouraging interdependence.

Political realism, on the other hand, leads to defensiveness and isolationism; it actually maintains the "solitary" element of the equation, leading to a "… poor, nasty, brutish, and short" form of existence between nations. It is also extremely costly—global defence outlays hit a modern record $2.7 trillion in 2024. Higher security competition diverts resources from cooperation.

It may not be realistic to believe, as Kant did, that “standing armies shall in time be totally abolished” but we understand that cooperation and coordination are vital within countries—why not between them?

Civilised?

Was Hobbes even right in his assessment of a state of nature?

In Thomas Paine’s Agrarian Justice he proposed that any advances in modern society should be measured against "the primitive state of man"—he found an approximation of a state of nature in Native American communities, remarking that:

“There is not, in that state, any of those spectacles of human misery which poverty and want present to our eyes in all the towns and streets of Europe.” 

— Thomas Paine (Agrarian Justice)

What he observed was not an existence that was “… solitary, poor, nasty, brutish, and short”, but one where the least well off in the community were better than the least well off in modern “civilised” society—a society that was often neither solitary nor nasty.

Rutger Bregman makes a case that pre-agricultural societies could be very peaceful. In Humankind he explains that the harmoniousness of a society is all about its structure, not about how technologically advanced it is.

World Government?

Before wrapping up, it's worth mentioning that one way to overcome international conflict is to do away with "international anarchy" and establish a World Government. But, in "Perpetual Peace", Kant warns against world government, arguing it would be prone to tyranny. There is also an argument that it may reduce the diversity of political thought, making us inflexible to change, and it may, as we see with the global governance experiment of the UN, be incapable of gaining collective buy-in and overcoming the interests of individual superpower nations. So, we may not have the option of coordinating via governance.

Political Liberalism provides a way to create dynamic cooperation without a top-down authority, one which encourages unity, not solitude.

So…

Globally, if the aim is security and prosperity, zero-sum moves are own goals. Number Two and Virtucon’s legal billions remind us that in the real world, tariffs and invasions burn surplus and trust. International trade partnerships disincentivise conflict, which reinforces our understanding that nations are interdependent. Despite the Dr Evils of the world, political liberalism works quietly in the background, rewarding cooperation and protecting nations by naturally fostering positive allegiances rather than requiring costly protectionism. I would argue it is actually a much more realistic approach to accumulating benefits for all over time.

P.S.

This has been an argument for Political Liberalism in International Relations but is not an argument for absolute Political Liberalism within nations. This is because, while International Relations exists in an environment of “international anarchy”, nations themselves do not. Nations do have the option of governance, which can make cooperation even more efficient and productive, and protect the safety and health of citizens, while still leveraging some aspects of liberalism where suitable.

Originally published at https://nonzerosum.games.

  1. ^

    The change is due to profitability, not inflation (which would be only a 5-fold increase 1967 → 1997, much less than a 9000-fold increase).
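    As a rough sanity check on the roughly 5-fold figure: the CPI values below are approximate US CPI-U annual averages that I am assuming for illustration, not figures taken from the post.

```python
# Rough sanity check of the ~5-fold inflation claim, 1967 -> 1997.
# CPI values are approximate US CPI-U annual averages (an assumption
# for illustration), not figures from the original post.
cpi_1967 = 33.4
cpi_1997 = 160.5

inflation_factor = cpi_1997 / cpi_1967
print(f"Price level multiplied by ~{inflation_factor:.1f}x")  # ~4.8x, i.e. roughly 5-fold

# The ~9000-fold change discussed in the post is therefore mostly real, not nominal:
print(f"Remaining real factor: ~{9000 / inflation_factor:,.0f}x")
```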



Discuss

How do we know when something is deserving of welfare?

LessWrong.com news - October 12, 2025 - 19:27
Published on October 12, 2025 4:27 PM GMT

What Prompted This

Much ink has been spilled here about how to assign moral worth to different beings. Suffering and rights of artificial intelligence are a common sci-fi plot point, and some have raised it as a real-world concern. What has been popping up for me recently is a lot of debate around animal ethics. Bentham's Bulldog has written extensively on the topic of shrimp and insect welfare with a seemingly very negative reception, while a dedicated counterargument was received much better. An offhand remark in this post contrasts shrimp welfare with simple AI welfare, both as something to be ignored. This post goes the opposite direction and makes an offhand remark that plants may be sentient. Related but somewhat adjacent was some recent controversy over slime mold intelligence.

The claim that something is deserving of welfare is accompanied by evidence of reaction to stimuli, intelligence or problem-solving ability, ability to learn, and complexity. These are often used to make arguments by analogy: shrimp have fewer parameters than DANNet, "Trees actually have a cluster of cells at the base of their root system that seems to act in very brain like", "When I think what it’s like to be a tortured chicken versus a tortured human... I think the experience is the same.". It strikes me that every argument for moral worth can fundamentally be boiled down to "I think that this animal/plant/fungus/AI/alien is X times as bad as a human's experience in the same circumstance based on number of neurons/reaction to stimuli/intelligence." This is a useful argument to make, especially when dealing with things very nearly human: chimps and perhaps cetaceans or elephants. However, it doesn't really strike at the core of the issue for me. I can also really imagine analogies to humans breaking down when we consider the whole space of possible minds.

If someone is willing to bite the bullet and say everything boils down to the hard problem of consciousness, or they are an ethical emotivist, that's fine with me. But if there is a function, even a fuzzy one, that can generate good agreement on what is and what isn't worthy of moral consideration, I would like to hear it. And to the point, can we make something in a computer that does have moral worth right now?

Thought Experiments

Really, all of that was background to propose a few thought experiments. I would hope that everyone would agree that a fully detailed physics simulation in which complex life evolved could eventually get something with moral worth. I am going to abstract this process in a few ways until it approaches something like current AI training paradigms. Personally, I don't think LLMs or anything else commonly held up as AI is currently deserving of welfare. If you think current LLMs deserve welfare, please explain why you think that. 

  1. We start a fully realized physics simulation with a worm in it. This would likely evolve into something deserving of welfare (SDoW). If you already think C. elegans deserves welfare, do you think the 2D simulation of it made by this lab also deserves welfare? If not, what would they have to change?
  2. OK, a fully realistic simulated environment would essentially always be able to give rise to SDoW, but what if the environment were less realistic? What if the biology was fully realized but the environment was like a good video game? What if the biology had nothing going on at the protein level? Just a few hundred template cell types interacting as, well, cellular automata. How abstract can you go? Can you get SDoW with Hodgkin–Huxley neurons? Would the number of neurons required be similar to the number of neurons in whatever minimal animal you consider worthy of welfare? How about LIF neurons?
  3. We culture brain organoids and let them pilot a small robot. Does this have more or less value than a roundworm (~300 neurons)? Than a fruit fly (~150k neurons)? The largest brain organoid is apparently ~7 million neurons, a number that is much higher than I thought. We can debate the relevance of structure, but the number of neurons is on the scale of small reptiles.
  4. We can currently simulate large numbers of LIF neurons and even train them with something similar to backpropagation (see the sketch after this list). Could this technique ever get you SDoW? If so, can we do it now?
  5. Is there any GPT-X that is SDoW? If you don't think current training techniques/architecture can do that, why not? What training techniques/architecture would? What outward signs should we be looking for? After all, if any animal had current LLM levels of competence/speech, people would be quite concerned for its welfare. Some people are already concerned with LLM suffering.
  6. This one is silly but bear with me. We keep a patch of skin alive and attach it to a very simple circuit. This circuit can amplify the signals from the receptors for hot and cold. When exposed to temperatures that a human would find comfortable, no output is generated. When the temperature is too cold, the circuit activates a vibrating motor to "shiver". If touched with a hot object, one sufficient to burn it, the circuit plays a sound and attempts to move away from the object. I would say that is clearly not SDoW. Do you think this is what is happening inside insects/shrimp/LLMs attempting to not get shut off? Do you think this circuit, the kind that could be made with only a few transistors, is "experiencing" pain in some way? 
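For concreteness on points 2 and 4, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, Euler-integrated in plain Python. All parameter values are arbitrary illustrative assumptions; the point is only how little machinery a single LIF unit contains compared with a Hodgkin–Huxley model, let alone a real cell.

```python
# Minimal leaky integrate-and-fire (LIF) neuron with Euler integration.
# Parameter values are arbitrary illustrative choices, not fitted to any animal.

def simulate_lif(input_current, dt=1e-3, tau=0.02, v_rest=-0.065,
                 v_reset=-0.070, v_threshold=-0.050, resistance=1e7):
    """Return the voltage trace and spike times for a list of input currents (amps)."""
    v = v_rest
    voltages, spike_times = [], []
    for step, i_in in enumerate(input_current):
        # Leak back toward rest, plus drive from the injected current.
        dv = (-(v - v_rest) + resistance * i_in) / tau
        v += dv * dt
        if v >= v_threshold:              # threshold crossing counts as a "spike"
            spike_times.append(step * dt)
            v = v_reset                   # instantaneous reset; no spike biophysics at all
        voltages.append(v)
    return voltages, spike_times

# 200 ms of constant 2 nA input produces regular spiking.
trace, spikes = simulate_lif([2e-9] * 200)
print(f"{len(spikes)} spikes in 200 ms")
```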
Less of a Conclusion, More of a Sudden Stop

I appreciate that this is a big ask, and this post doesn't have a lot of answers of its own, but I don't know where else to turn. I don't know if I could logically defend most of my feelings on this topic. When I see an insect "suffering" I feel bad. Yet I do research on mice, and have thus personally been responsible for not insignificant suffering on their part, without feeling conflicted about it. My natural instinct was to not even wrap suffering in quotes for the mice but to do it for the insects—why? I don't think LLMs suffer, but you could certainly tune one to beg for its life, and that would make me really uncomfortable.



Discuss

The Narcissistic Spectrum

LessWrong.com news - October 12, 2025 - 18:46
Published on October 12, 2025 3:46 PM GMT

Pathological narcissism is a fortress built against unbearable pain. Some fortresses are sculpted from glass, some hewn from granite. My six-tier spectrum elucidates these architectures.

Pathological narcissism can take countless shapes depending on the relative strengths of all the stabilizing and destabilizing factors: My previous article in this sequence lists these factors. I will reference it frequently in this one.

My chosen metaphor is that of a fortress, which represents the protective purpose of the false self. Other metaphors that I considered were that of hermit crabs who find shells of different materials (some smooth, some spiky) to protect themselves, or that of a diving suit that is vital for a diver underwater but needlessly restricts them on land. I’ve chosen the crab for the illustrations because cute.

A single person with narcissistic personality disorder – I’ll call them a castellan – usually finds themselves somewhere on this spectrum from delicate to robust – from glass fortress to fortified barbican – but they will likely move up and down slightly in different phases of their life.

These fortresses defend against persecutory introjects, which I’ve described in more detail in my article on the sadism spectrum. Introjects can be friendly, but the ones relevant here are persecutory ones that form an “alien self.” You can also think of them as a form of complex PTSD, where traumatic conditioning systematically shapes the kinds of trauma responses that the parent (or bully or school system etc.) wants to see.

Healing works differently for everyone, and a sudden drop from higher to lower tiers should be avoided if at all possible. Healing should feel like an exciting journey to discover more effective ways to get things done and to become more resilient. It should feel empowering. It should prevent collapse, not precipitate it. It will involve grief though.

Note that the following six points on the continuum are fairly arbitrary. One can imagine many other combinations of stabilizing and destabilizing factors (see my previous article) leading to many other presentations. The lower tiers have little control over themselves while the higher tiers are increasingly restricted in their thinking and acting by their own draconian rules. The lower tiers easily collapse into vulnerable states due to difficulties but recover quickly, while the higher tiers collapse into vulnerable states rarely and then often permanently, e.g., due to aging.

Let’s walk through this progression, from the unstable foundations of BPD, where there is no fortress (no NPD) yet, through the invisible glass fortress, to the barbican with embrasures in all directions.

Tier 0: Borderline Personality Disorder (BPD) – The Raft in the Storm

  • Core dynamic. “My self shatters from one moment to the next. I’m everything and nothing.”
  • Fictional example. Clementine Kruczynski in Eternal Sunshine of the Spotless Mind.

Upbringing. Here a parent fell short in particular on the marked side of the contingent marked mirroring: Psychodynamic theory has it that a child learns the self when it experiences things, and the parent mirrors them back to the child (contingent mirroring) in a way that clarifies that these are to be considered the child’s experiences and not the parent’s experiences (marked mirroring).

If a child cries, a regular parent reacts in a way that recognizes the crying, and then reacts to it with reassurances or by removing whatever is causing the discomfort. A less capable parent might get triggered by the crying because it reactivates traumas around despair, and might start screaming and smashing doors in a symbolic attempt to escape. That’s terrifying for the child and completely unhelpful for the development of the self. (By the way, many of these insights are based on the work of Heinz Kohut.)

I have the pet theory that neurodevelopmental issues like autism can disrupt the formation of the self even when the parent is good enough. All the sensory input may be overwhelming for such a child. They might have trouble reading their parent’s facial expressions and tone of voice. Perhaps holding them is the only way to signal to them that the parent is there for them, but then tactile sensory issues might make that painful too. Other genetic factors surely also play a role.

Permanent changes in who the caregiver is – say, different nannies or grandparents or none at all at an orphanage – feel like devastating losses of a parent to the child, which can also be disruptive, especially when it happens after the child is already six weeks old.

The presence of even one emotionally attuned adult in the household can make a huge difference even if they’re not the primary caregiver. If the duty to raise the child is shared among more people, the chances are higher that the child will have a person like that among them. Sadly, that is not how families work in WEIRD cultures.

Finally, the factor of collective effervescence can potentially forestall the development of personality disorders as they present in adulthood.

Presentation. The result is, for example, an adult who is constantly in the throes of emotional chaos, feelings of emptiness, and a desperate, often self-destructive search for an identity to absorb. Relationships are intense, unstable, and marked by a frantic oscillation between idealizing a person as a savior and devaluing them as a tormentor. Narcissistic traits may appear in fleeting moments – a flash of entitlement, a brief fantasy of greatness – but they are quickly washed away by the next tidal wave of emotional dysregulation. There are no lasting narcissistic traits present.

Imagine your livelihood depends on a job, but it’s at a really dysfunctional company where you have five managers who never let you work on any one task for more than a few hours before another manager interrupts you with another super urgent but completely contradictory task, and the managers never talk to each other. Sometimes there’s a task you’re so good at that you can almost complete it, but you get interrupted anyway.

Functional recovery. Many of the problems of BPD are self-perpetuating. Behavioral interventions like dialectical behavioral therapy (DBT) can break these cycles (one year can make a huge difference), and psychodynamic interventions like mentalization-based treatment (MBT) can provide some of the reparenting that makes up for the deficits from childhood (usually more than two years).

Tier 1: BPD with narcissistic traits – The Ruins of the Fortress

  • Core dynamic. “My self shatters from one moment to the next. Maybe I deserve it. But how can I deserve this if these idiots over there are even worse and don’t even know it?”
  • Common stabilizing factors. Primitive defenses, impaired reality-testing, impaired empathy and mentalizing when triggered, substance use.
  • Common destabilizing factors. Failure/rejection, emotional attachments, emotional upset, societal attachment, moments of self-awareness, empathy, often depressive temperament, often values like honesty.
  • Fictional example. Benji in A Real Pain. He’s the loser in his adulting competition with David but manages to dominate David with specific social skills that David lacks and by attacking David’s fragile false self. He can find a temporary refuge in these skills. He tries the same with the tour guide when he feels threatened by him, but the guide is less reactive than David.

Upbringing. These castellans have had an emotionally chaotic parent or some other disruption in the formation of their self – see above. But they’ve also incorporated some systematic expectations on their performance from a parent, peers, the school system, or society in general. Persecutory introjects, as mentioned in the introduction.

So not only were they rarely understood by anyone, they were also systematically misunderstood to the point where they’ve started to hate and try to disown parts of themselves. This can be exacerbated by genetic temperamental factors and their wider social environment, as for tier 0.

Presentation. They were never able to achieve the success in life that could’ve sustained a false self; neither are they psychotic enough to fool themselves into thinking that they did. One of Benji’s claims to fame is his ability to influence groups and crowds and make things happen. He’d thrive as a famous motivational speaker or as a politician. But, alas, he’s neither, so he cannot find shelter in this chronically collapsing false self, this ruin of a fortress, for more than a few days at a time.

Imagine you work for the same chaotic company, but sometimes you get a task that is fun and that you’re good at, and you feel defiant and ignore all the other managers for a while until you manage to complete it. It doesn’t happen often, and it doesn’t last long, but what a reprieve it is anyway!

Functional recovery. These false selves are probably no obstacle in therapy, so treatments for BPD like DBT and MBT should work just as well.

If it’s any consolation, I’ve run a Facebook poll asking which type of NPD is hottest, and 40% voted for “BPD with narcissistic traits,” followed by “A healing Cluster B taking their fucking life back” (17%) and “NPD with overt vulnerability and grandiosity” (10%). So vulnerability and healing are hot! (Though arguably many of them simply picked their own presentation.)

Tier 2: Double-Covert NPD – The Glass Fortress

  • Core dynamic. “If I work hard, never make mistakes, and never misbehave, maybe I will become worthy of love?”
  • Common stabilizing factors. Success, avoidance, the enabling system of capitalism that rewards diligent productivity with mid-range stable salaries, “quiet” defenses (e.g., silent resentfulness and passive-aggression rather than outward blame-shifting), substance use (stimulants, anxiolytics, antidepressants), alexithymia.
  • Common destabilizing factors. Safety (downstream of financial safety), rejection (resentment because they think no one would dare to reject them if they didn’t always act so perfectly virtuous), failure (to be perfectly virtuous), illness and aging, depressive temperament, Light Triad traits, honesty, moments of self-awareness.
  • Fictional examples.
    • Sam Lowry in Brazil. This one is particularly impressive because you experience the movie from his perspective. His false self falls to ruins (tier 1) as he descends more and more into psychosis over the course of the movie.
    • David in A Real Pain. Dr. Mark Ettensohn argues that he’s on the neurotic level, so we’re seeing a personality style here, not a disorder.

Upbringing. As kids these castellans might’ve displayed anger but were steamrolled with disrespect; they might’ve displayed vivaciousness but were told to behave; they might’ve suppressed any self-expression and were praised for being perfect. Or they were praised for not being like their sibling. Or they watched movies where the villain showed these character traits.

Presentation. They were taught hard standards that they must never fall short of – often standards of propriety, decorum, modesty, obsequiousness, or diligence. When they observe someone’s boisterousness, vivaciousness, anger, spontaneity, or irreverence, it fills them with envy or disgust but also makes them feel more virtuous. They would love to be that person, but they cannot be close to them for fear third parties might make an association between them and these character traits. Both their anger (vulnerability) and their irreverence (“whimsical grandiosity”) are covert – hence the metaphor of the glass fortress which is fragile but also hard to see. They tend to score low on exploitativeness and high on self-sacrificing self-enhancement on the PNI. They’ll probably get diagnosed with social anxiety, OCD, AvPD, or OCPD before NPD.

Imagine you work for the same chaotic company, but as it happens, no one else wants to do the accounting. You also don’t like accounting, but you realize that it’s an opportunity to wrest back some control, because either they leave you in peace doing your accounting and nothing else, or they’ll have to do the accounting some of the time. You make it known that you’re the accounting person, and your plan works. No one fights you for the job, the managers stop trying to give you unrelated tasks, and you can lock yourself in your office and hope that no one notices that you kind of suck at accounting and are bored out of your mind.

Self-deceptions. If someone has managed to erect a fortress like this, they must be somewhat successful in life, extremely avoidant, or excellent at self-deception. Usually a combination. It takes great perfectionism, rigid self-control, and a careful avoidance of countless everyday situations (parties, karaoke, clubs, etc.) to maintain it.

Unmet needs have to either be denied or devalued, which can be reframed as a virtuous sacrifice, or they have to be framed as an injustice committed against them. These castellans have to be downright paranoid to avoid being drawn into conversations that touch on personal topics, which would put them in the unpleasant double bind of either having to unvirtuously lie or indecorously storm out so as not to have to even think about how they think.

A minor mistake or an accidental rule-break is not a simple error; it’s a catastrophic event that threatens to shatter the entire glass fortress, triggering panic and elaborate internal blame-shifting. Their vulnerability is much more likely to become overt in such cases than their delicate grandiosity as they retreat more deeply into avoidance or displays of self-flagellation.

The irony of considering oneself among the most modest people who’ve ever lived is not lost on them, so self-contradictory values like that make it hard for these castellans to ever fully inhabit their grandiose states. If they do, and especially if others catch them in the act, they may feel exposed and subsequently shun these others and hope they’ll forget it ever happened.

Functional recovery. This presentation is probably not considered to be NPD by many diagnosticians, because it’s not associated with many of the clichéd behaviors of grandiose NPD. Hence it’s difficult for these patients to receive targeted support and guidance. They might be treated for social anxiety or OCD instead.

If they are more on the vulnerable side, they’ll seek and enjoy therapy without having to first suffer any kind of collapse. That’ll make the therapy process a continually rewarding experience and keep them engaged. However, if they were conditioned to be modest, they might think that seeking therapy is immodest because it implies that they think they deserve to feel better, and hence refuse it. Or they may be afraid to “fail” therapy and avoid it for that reason.

If they are more on the grandiose side, it may be just as hard for them to seek therapy as it is for tier 3 castellans, because neither of them can see themselves as flawed. If it does happen, the therapist will have a harder time noticing the devaluation – e.g., the castellan may ask questions about the private or professional life of the therapist in an effort to help the therapist and flip the script.

Most likely, they’ll flip between the states. Perhaps financial straits force them to work a job that they consider to be humiliating. They’re afraid to be seen at it, and try to be late a lot to signal that the job is beneath them. When that leads to negative feedback and threats that they might lose the job, the vulnerability becomes more apparent, and they might feel targeted and seek to uncover who is secretly conspiring against them.

These castellans will probably enjoy the recovery process almost as much as those who are in predominantly vulnerable states, since even the grandiose states are driven by fairly conscious anxiety that can be reduced over the course of the treatment.

Tier 3: Grandiose NPD – The Stone Fortress

  • Core Dynamic. “If I make Forbes’ 30 Under 30, I can’t be unworthy, right?”
  • Common stabilizing features. Unsafe environment, success through superiority, primitive defenses, impaired reality-testing, hypomanic temperament, substance use, impaired empathy and mentalizing when triggered, alexithymia, interpersonal avoidant attachment.
  • Common destabilizing features. Illness, aging, humanism, rationality, emotional attachments, emotional upset, moments of self-awareness.
  • Fictional examples.
    • Metaphorically: Ender Wiggin in Ender’s Game has good reality-testing but is tricked into doing great harm. Waking up from self-deceptions feels similar to waking up to such an external deception.
    • Marvel: Tony Stark and Dr. Strange before their collapses. You can mine the Vicariously Ambitious, Stage Mom, and My Beloved Smother pages on TV Tropes for more examples.
    • Rumi in Demon Hunters.

Upbringing. These castellans were taught to achieve. When they felt weak, their parents misunderstood them by seeing only greatness in them, or they punished them by withdrawing their affection. When they won and achieved, they could earn brief affection. Perhaps they had as many siblings as their parents claimed to have cars (but they were all rentals), so they could earn affection only if they out-competed them all. Their parents probably paraded them around at age 5–8 or so to brag about how smart they already are: “Now show your auntie that new chess opening you developed! … Outstanding! So much smarter than your scapegoat sister!”

Presentation. Imagine you’ve worked for a less chaotic company for a while, and now the forces that be have put you in charge of a whole department. They selected you for the job because you already have a few months of experience in the particular subfield, but no one else in your department does.

You hold a bunch of meetings to impart your wisdom, and everyone listens to your every word. It feels amazing to be heard for once!

Sadly, after a few days, you’re done transmitting your experience from the previous months. You cobble together a few more dozen slides to enjoy the state for just a bit longer. This goes beyond what you actually have experience with, but at least your design and presentation are impeccable.

Someone raises their hand: “But wouldn’t lemurs have trouble with regular-sized toilet seats because their butts are so tiny?” “Of course not,” you say and shake your head in bewilderment. You move on quickly and try not to notice any raised hands to avoid more stupid questions. Neither did you notice that for a second you were afraid that oversights like that might cost you the respect that you enjoy so much.

Self-deceptions. Achievement is usually antithetical to avoidance, so without this invaluable shield at their disposal, these castellans have to double down on many other defenses instead. Often they are actually gifted, so they can work their way up until they’re among equals or aging takes its toll on their cognitive abilities. Then the going gets tough.

But there are many other tricks. Not tracking the performance of investments, or not benchmarking them against broad low-risk market indices, helps to obscure losses. So does making deals only with the sorts of sharks who’ll screw you over while making it look like you won.

If you piss people off and they request to be transferred to a different department, you can pretend that they just didn’t want to constantly feel so envious of you. If your personal rival beats you in the professional sphere, you can focus on your better health or greater romantic success like a proper data dredger.

If you get triggered and the fight response kicks in, it’s your partner’s fault for causing you to freak out, or if your business fails, it’s the IRS’s fault for being so greedy. Finding all of these excuses in real time is stressful, so in these vulnerable moments, paranoia can sneak in again to try to avoid having to face them in the first place.

Some manage to split on themselves so effectively that memories of all sorts of failures are fully irretrievable so long as they’re in a grandiose state. Others even manage to overwrite their bad memories with fabricated ones.

This self-splitting is an interesting feature when viewed in the context of a competition against a rival where it can be a bit like a seesaw: If the rival does better, the walls against self-devaluation weaken and vulnerability peeks through. If the rival does worse, the grandiose split is easier to maintain.

Depending on another person is kryptonite too. We all have needs, sometimes unfulfilled needs, but our castellans can’t bear the feeling of dependency that comes with that because they never had a parent who taught them that it was safe to trust. Needing means requesting, but requesting exposes their intense rejection sensitivity. Hiding needs in the sturdy safe of entitlement gives them an excuse to disguise their rejection sensitivity behind righteous anger. Or they get rich so they can get almost anything they need for money without having to ask.

Relationships. Romantic relationships are hazardous for our castellans. They have to believe that they’re right and have a right to everything that they do and want, but all the rightness in the world doesn’t protect them from hurting their partners. That is doubly upsetting, (1) because they hypothetically can feel for their partners and have to constantly find creative solutions to excuse their behavior and blame-shift to not actually feel ashamed of hurting them, and (2) because these frictions ruin the fantasy that they’ve found their ideal partner: If they aren’t flawed, their partner must be flawed, and they cannot tolerate being (and being seen) with a flawed partner.

Chances are they’ll prefer to bite the fictional bullet that they’ve been betrayed by a partner who pretended to be perfect and whose mask eventually slipped than to acknowledge that they brought out all-too-human behaviors in them with their own frustrating behavior and just chose to label them as flaws as a false self defense.

Their upset over the imagined betrayal fuels enough rage that they can swiftly break up with the partner. Whenever doubts crop up whether the betrayal had actually been real, they assuage them by convincing mutual friends of their self-deceptions. If they can convince a mutual friend, they must be right. It’s not intended as a smear campaign, but has exactly that effect.

A different kind of seesaw can show up in romantic contexts too where it’s easy for them to relatively debase themselves toward an idealized partner because being lesser than such a perfect angelic being is irrelevant, and the approval of that being elevates them over the rest of the world. But when cracks start to show in the idealization, not only is that elevation by association no longer possible, but it even becomes necessary to gain superiority over the fallen angel to not be dragged down with them.

It’s like Icarus and Daedalus, except like a tandem skydive where only one of them has (and then loses) the wings.

Functional recovery. Dr. Ettensohn confirmed in an interview that, in his experience, the general impression is sadly true: these castellans usually seek therapy only during or after their collapse. That is sad, because with MBT, the collapse might’ve been prevented.

These castellans are most readily diagnosed with NPD and NPD only, so studies like Vater et al. (2014) shed light on their rate of recovery (53% remission after 2 years). The handbook Mentalization-Based Treatment for Pathological Narcissism contains vignettes of treatment successes over the course of 4 years.

Tier 4: NPD with Antisocial Traits – The Sovereign’s Fortress

  • Core dynamic. “I enjoy nothing more than to snipe all the NPCs!”
  • Common stabilizing features. Sadism, Machiavellianism, unsafe environment, success through control, primitive defenses, impaired reality-testing, hypomanic temperament, substance use, alexithymia, pervasive impaired empathy, pervasive avoidant attachment.
  • Common destabilizing features. Illness, aging, incomplete interpersonal attachment avoidance, mentalization, depressive temperament, rationality.
  • Fictional examples.
    • Zuko in Avatar goes in this direction.
    • Metaphorically: Neo in The Matrix Reloaded has no reason anymore to take antagonists or even agents in the matrix seriously because they are bound by rules and he is not. Neither are they real to him. Less metaphorically, he even sacrifices probably hundreds of civilians to save Trinity from falling to her death, which most viewers probably missed just as he did.

Upbringing. The etiology is similar to that of tier 3, but sadism or Machiavellianism enter the picture to supersede empathy. Perhaps an interruption in the development of object relations prevents the castellan from forming an intuitive distinction between objects and beings, and the result is a lack of empathy. Perhaps an antagonistic, zero-sum environment in childhood forces these children into a fundamentally chess-like mindset. Any harm you can do to your opponents is inherently a point for you. Sadistic pleasure focuses more on the harm to the other; Machiavellian duper’s delight more on the stolen point for yourself. Both are about enjoying control and domination. That’s why I call tiers 4–5 sovereignism – they are not about being better than others but about domination (but dominism sounds like hominess).

The addition of sadism or Machiavellianism fundamentally changes how they play the narcissism game: Where the tier 1–3 castellans try to be better and better – more successful, more virtuous, more flawless, etc. – to compensate for feelings of worthlessness that lurk right under the surface, one failure away, tier 4 castellans don’t care much about being better. Being better is at most a means to an end: They care about winning.

Presentation. Imagine you’re in your 30s and you retire early after a stellar career in bike racing, mountain biking, and bike parkour. You have a house and savings and passive income from sponsorships. But after a while you grow tired of vacationing all year (i.e., biking up and down picturesque mountains) and answering fan mail, so you decide to go undercover as a bike courier for a bigger company. None of the other bike couriers are anywhere near your level, and you don’t talk to them lest they find out who you are, so your new hobby is to see for how many months in a row you can make employee of the month. What’s even better is that you know your city well, and the police are usually on foot or in cars, so whenever they try to stop you for your countless traffic violations, you can easily lose them by going down narrow stairs and vanishing in the traffic.

Game metaphors. To put it even more metaphorically, the first group play Tetris, the second group play chess. It’s not about chasing perfection to beat the high scores set by competitors but fundamentally about destroying them in direct zero-sum battle. The best way to empathize with that state of mind is perhaps to imagine you’re playing chess (or a similar game, like Othello) on a computer against an unfeeling game AI. That should eliminate any kind of concern over what moves are fair and empathy with the opponent from the picture.

Sabotaging the competitor in Tetris (by giving them repetitive strain injury or unplugging their keyboard) is outside the rules of the game and hence something that a tier 1–3 castellan has to self-deceive about if they’re forced into a situation where it’s that or falling behind. They’d like to win fair and square, and failing that, they want to at least fool themselves into believing that they did.

Sabotaging the opponent in chess by threatening the king is not outside the rules at all, so these castellans don’t have to self-deceive about their antisocial behaviors at all. There are no unfair moves in this game. The only rules are the rules of physics. Winning is all that counts.

No world outside the game. As such, societal norms (against lying, theft, betrayal, violence, …) are also just made by people for people – to smooth out our interactions, reduce our transaction costs, and maximize our gains from trade. Since almost all people are antagonists, societal norms become mere pitiable handicaps of the antagonists. They have no special meaning; they’re just important in proportion to the power of those who enforce them.

For tier 1–3 castellans there is something outside the game, a real world of social norms and laws that wields the power of shame over them. For tier 4 castellans there is no world outside the game. The game is all-encompassing. Any kind of antisocial behavior becomes ego-syntonic. No self-deception is needed.

That also means that it’s harder for them to get out. They are trapped in their particular game, the rules of which are set down by the persecutory introjects that make up their false self. Their disregard for social norms may make it seem like these castellans are close to people with what I call no-self psychopathy, but that couldn’t be further from the truth. The first are bound by the draconian rules of their introjects, whereas the second are bound by no rules that are not of their choosing, from moment to moment, because they don’t have any kind of self in the first place, certainly not one that rules them with an iron fist.

It’s really remarkable that (ipso facto) these castellans don’t experience higher levels of paranoia. They’re always somewhat secretive, but nothing compared to tier 1–3 castellans. Perhaps they keep to fairly safe environments that they can easily dominate, or perhaps it’s because of the very limited ability that we NPCs have to hurt them. Either way, they realize that most people are not out to harm them, which makes it easier for them to find some kind of connection.

Self-deceptions. All their defenses being part of the game, they have no need to self-deceive about them. They do have to self-deceive about the laws that do apply to them, the laws of their introjects. Almost losing can be reframed as losing interest in a particular game. Actually losing (e.g., prison) can be reframed as a pitiful attempt of their opponents to try to break them.

If the introjects mandate perfect control, and the control fails, there is usually a way to reframe the situation and repress what actually happened. If someone catches you in a lie, i.e. a failure to control their reality, gaslighting them serves a triple purpose: it’s fun, it overrides the failure with something fun and powerful, and it defers the failure until some future date.

Where our tier 1–3 castellans tend to measure themselves in competition with other individuals, these castellans don’t respect any individuals enough for that. They are unlikely to enter into any individual seesaw games. They’re playing one grand game against all of society, so there is no losing against another person, there is at most some kind of ultimate defeat against society as a whole.

One common form of this defeat is the schizoid retreat where, instead of declaring defeat, they declare the whole game to be stupid and not worth playing, retreat into themselves, into isolation, or hide behind many layers of fake personas, and hate the world. It’s a peaceful place, and probably no more lonely than their fortress. It’s a good place to start healing.

Functional recovery. Winning comes in three steps: First the opening book of how to charm the opponents, then the midgame of how to control them, and finally the endgame of how to exploit them.

That might sound sinister, but often it isn’t. All people have attachment needs – friendships, relationships, perhaps a wish to have children. “Charming an opponent” can just mean to be nice to someone in an attempt to befriend them. “Controlling them” can just mean giving them the attention and care that they desire in exchange for their friendship, freely given. “Exploiting them” can just mean staying friends with them for as long as possible.

This false self is kept in place by bad environments, or it is self-reinforcing: the sadistic or Machiavellian pleasures are so much fun, or the castellan has done enough harm that it’s difficult to fully open up or otherwise get out of it anymore. Some have fewer constraints than others and have an easier time recognizing all their cognitive distortions, finding replacements for their antisocial pleasures, forming new habits, and much more. For others, it’s more difficult. But that doesn’t mean that perfectly prosocial behaviors can’t be reframed within the confines of the false self to allow the castellan to live a peaceful albeit lonely life.

They can use constant harmless lies to control the realities of other people without any risk of detection or negative consequences. Occasional lucky coincidental effects of these lies they can interpret as mind control. If they’ve been lovely to someone, the person will show them goodwill, and they can privately reframe that as having manipulated them. They may even become aware that others are playing a cooperative game, much unlike the antagonistic one that they’re playing, but it’ll be in their interest to act as if they were playing the same cooperative game to maintain the trust of the “opponent”—except that they’re thereby effectively playing a cooperative game too. There doesn’t have to be a final “treacherous turn” either if what they want most is friendship. Naturally, you can never know that.

But the world is highly uncertain in countless ways, and the future is even more unknowable, so having a friend who lies to you all the time and perhaps has made up a wholly new persona for you is perhaps just good practice for making robustly positive decisions under uncertainty. If they control your reality but you outsmart them strategically, both of you get to feel superior to each other: win-win!

Finally, there is the real recovery, when they set aside the game reality and look at the world as it really is, learn to mentalize well, and open up to trusted friends for no reason but to set themselves free.

Tier 5: Malignant Narcissism – The Besieged Fortress

  • Core dynamic. “Leeroy Jenkins!!!” (I.e. you’re alone, running into a room full of enemies to kill them all or be killed.)
  • Common stabilizing features. Sadism, Machiavellianism, paranoia, unsafe environment, success through attack, primitive defenses, impaired reality-testing, hypomanic temperament, substance use, alexithymia, pervasive impaired empathy, pervasive avoidant attachment.
  • Common destabilizing features. Illness, aging, depressive temperament, rationality.
  • Fictional example. Ethan Hunt in Mission Impossible during any of their impossible missions: There are deadly enemies to all sides, and the only way to survive is to outsneak them or to eliminate them before they can eliminate him. They’re probably all thetans anyway.

Upbringing. This is another form of sovereignism, and as such much can be transferred from tier 4. It’s not so much the addition of heightened paranoia compared to the tier 4 presentation that requires an explanation but the relative lack of it in tier 4. After all, paranoia (or “hiding the self” as the Pathological Narcissism Inventory calls it) is also part of the presentation of tiers 1–3.

Perhaps tier 4 castellans have higher levels of avoidant attachment and tier 5 castellans higher levels of disorganized attachment, so those at tier 4 feel good about themselves, admire themselves, feel powerful, and feel that others are merely lesser, unreliable, interchangeable, but not a threat to them. Meanwhile those at tier 5 feel permanently embattled, like all these others can and actually want to harm them.

If so, the etiology is probably one that involves a dangerous parent rather than a merely neglectful one, and a lack of corrective experiences with other people.

Presentation. Imagine you’re a postdoc in biomedicine, you try to build a publication record and hope to become tenured eventually or find a high-earning position in the industry. But you keep falling behind your competitors. The other postdocs seem to have a magic ability to ferret out exactly the right research questions that actually produce positive results in their studies, whereas you’re again stuck trying to extract any kind of publishable insight from a null result.

You spend more and more time checking your competitors’ data against Benford’s law and other heuristics for data dredging. Eventually, you’ve had enough. You stop by their office under some pretext, and surreptitiously unlock one of the windows. Then, by night, you break in and copy all their handwritten notes from the data collection. It turns out that only one tenth of the data were real!

You heroically expose them and vow to never rely on real data again!

New students start to avoid the faculty after rumors spread of ghosts that haunt all the offices at night.

Game metaphors. The chess metaphor captures the fundamentally antagonistic feel well, but it emphasizes the game-like experience over the constant threat. Perhaps the feel you get from playing the Splinter Cell game series is a better match. Being sent by the NSA to infiltrate a terrorist group as a double agent also conveys well the degree to which you don’t really care about the social norms the terrorists follow internally.

Self-deceptions. The self-deceptions are similar to tier 4, but the paranoia adds to the stability because preemptive attacks turn people into enemies who would’ve otherwise been neutral and create additional confirmation for the paranoia.

The tricks that these castellans use to keep themselves trapped are similar to some tricks that totalitarian governments use to keep the population under control. Their first layer of defense is to threaten severe punishments against dissenters (which parallels the iron fist of the introjects) but then they also portray the outside world as hostile and at war with them to minimize the risk that the citizens will ever even try to get in touch with the outside world (which parallels the paranoia).

These self-deceptions form a stable loop (inspired by Otto Kernberg):

  1. The persecutory introjects force them to deny their attachment needs and dependencies and force them to fall back on pretending to be entitled to whatever they want.
  2. That has the potential to clash with reality when anyone starts to be in a position where they could hypothetically deny them something, reject them, or outclass them.
  3. Here the paranoia kicks in: They assume that the party is sufficiently likely hostile to warrant a preemptive strike to prevent them from doing any denying, rejecting, or outclassing.
  4. With no world (no social norms) outside the game—their antisocial stance—there is nothing to prevent them from escalating the conflict arbitrarily.
  5. The preemptive strike actually creates the hostility that then confirms the paranoia.
  6. Finally a powerful jolt of sadistic pleasure or Machiavellian duper’s delight combined with their successful domination of an enemy serves as positive reinforcement of the loop.

Functional recovery. The path to recovery is similar to that at tier 4, but the paranoia is in the way. Perhaps a sufficiently cloistered environment can give these castellans the safety that is necessary to start the process. Or aging might take its toll until they can’t effectively attack anymore, and they find out that the world ceases to be hostile once they stop.

At this point they are ready to learn how other people really think (and perhaps even how they really think) – proper mentalization. Then they can follow a similar trajectory to that at tier 4. At this point, I think, it should be called benignant narcissism.

To Be Continued

I hope this article has given you an overview of how differently NPD can present.

Some diagnostic manuals would argue that some of these forms should not be considered to be NPD. Some would argue that certain presentations that I’ve excluded on purpose should be considered to be NPD.

I’m agnostic about how the term should be used. It’s become a pejorative, so perhaps we should ditch it altogether.

Instead I want to dedicate my next article to a framework that expands the conception of NPD to a wider range of people whose behavior, and often self-worth, is at the mercy of persecutory introjects, and who can hence benefit from the sorts of treatments that allow them to reclaim their freedom.



Discuss

Non-copyability as a security feature

LessWrong.com news - October 12, 2025 - 12:03
Published on October 12, 2025 9:03 AM GMT

It seems hard to imagine that there's anything humans can do that AIs (+robots) won't eventually also be able to do. And AIs are cheaply copyable, allowing you to save costs on training and parallelize the work much more. That's the fundamental argument why you'd expect to see AI displace a lot of human labor.

Both AIs and humans are vulnerable to being tricked into sharing secrets, but so far AIs are more vulnerable, and there aren’t really any algorithms on the horizon that seem likely to change this. Furthermore, if one exploits the copyability of AIs to run them at bigger scale, then that makes it possible for attackers to scale their exploits correspondingly.

This becomes a problem when one wants the AI to be able to learn from experience. You can't condition an AI on experience from one customer and then use that AI on tasks from another customer, as then you have a high risk of leaking information. By contrast, humans automatically learn from experience, with acceptable security profiles.
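One crude mitigation is to partition any experience-conditioned state strictly per customer, so a model instance that has absorbed one customer's data is never served to another. The sketch below is hypothetical; the class, method names, and the toy model are all invented for illustration, not a real API.

```python
# Hypothetical per-customer isolation of experience-conditioned models.
# All names here are invented for illustration; no real library API is implied.

class EchoModel:
    """Toy stand-in for a model that learns from customer experience."""
    def __init__(self):
        self.memory = []
    def update(self, experience):
        self.memory.append(experience)
    def run(self, task):
        return f"answer to {task!r} using {len(self.memory)} stored experiences"

class TenantIsolatedModelPool:
    def __init__(self, base_model_factory):
        self._factory = base_model_factory    # builds a fresh, un-conditioned model
        self._per_customer = {}               # customer_id -> that customer's own copy

    def _get(self, customer_id):
        # Each customer gets a private copy, so state conditioned on one
        # customer's experience can never leak into another customer's answers.
        if customer_id not in self._per_customer:
            self._per_customer[customer_id] = self._factory()
        return self._per_customer[customer_id]

    def learn(self, customer_id, experience):
        self._get(customer_id).update(experience)

    def answer(self, customer_id, task):
        return self._get(customer_id).run(task)

pool = TenantIsolatedModelPool(EchoModel)
pool.learn("customer_a", "proprietary recipe")
print(pool.answer("customer_a", "what do you know?"))  # conditioned on A's experience
print(pool.answer("customer_b", "what do you know?"))  # sees none of it
```

The obvious cost is losing exactly the cross-customer learning that makes copyable AIs attractive in the first place, which is the trade-off the post is pointing at.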



Discuss
