LessWrong.com News

A community blog devoted to refining the art of rationality

More Dakka for Coronavirus: We need immediate human trials of many vaccine-candidates and simultaneous manufacturing of all of them

Published on March 13, 2020 1:35 PM GMT

Our best chance to fight CV is to create a vaccine as soon as possible. The irony is that we probably already have a vaccine, but we can’t prove that it works until animal safety tests and stages 1 and 2 of clinical trials are complete, which will take 12-18 months at least. How could we accelerate the creation of the vaccine?

There are several ideas, mostly inspired by research on anti-aging drugs, which suffers from the same problem: the need for very long clinical trials. See the More Dakka post by Sarah Constantin.

  1. Immediate human trials. If we have the vaccine in 3 months instead of 18, we will save millions of lives. So, based on trolley-problem logic, we may risk the health of a few thousand people to achieve this goal, especially if they are volunteers. Thus we need to start human tests of the vaccine candidates immediately, even before the end of animal trials, which are still needed. Here we gain acceleration by performing in parallel actions which are typically done sequentially.
  2. Test safety and efficacy simultaneously. We should also combine stage 1 and stage 2 of testing, that is, safety and efficacy, by giving the vaccine to people who are already under potential exposure to CV, like nurses or elderly people in nursing homes.
  3. Test on large groups. We should test a vaccine on a large group of people, like 10,000, so any finding will quickly reach statistical significance. If the number of infections declines relative to the control group, we could see it within one month.
  4. Test all vaccine candidates. Everything said in 1-3 should be done with each of the dozen vaccine candidates currently under development. As a result, in one month we could know which vaccine candidate is the strongest and safest.
  5. Manufacture in advance. Simultaneously, we should start large-scale production of all vaccine candidates, if possible. After the best (maybe 3 or 5) vaccine candidates are validated, the stockpile of failed candidates is destroyed, and the best vaccines are delivered to the population. This will help mitigate production delays.
  6. Give people different best vaccines. There could still be long-term detrimental effects of some of the vaccines, so it may be better not to give everybody just one best vaccine - what if it makes everybody sterile in 1 year?
  7. Combine best vaccines. To increase protection, we could give each person a combination of several (but not all, to preserve point 6) of the best vaccines. Here I assume that detrimental interactions between vaccines are typically unlikely, but more technical analysis is needed.
  8. Establish biomarkers of a good vaccine (e.g. antibodies). Biomarkers are important to check the efficacy of a clinical trial before the final outcome is known.
  9. Try other approaches, like DRACO, distancing, coconut oil, large doses of vitamin C, etc., and test them in the same accelerated way.
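As a rough sanity check on point 3, a standard two-proportion sample-size calculation suggests the 10,000-person scale is about right. This is a minimal sketch: the 1% monthly infection rate in the control group and the 50% vaccine efficacy are illustrative assumptions, not figures from the post.

```python
import math

def required_n_per_arm(p_control, p_vaccine, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation sample size per arm for a two-proportion test
    (two-sided alpha = 0.05, power = 0.8)."""
    p_bar = (p_control + p_vaccine) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_vaccine * (1 - p_vaccine))) ** 2
    return math.ceil(numerator / (p_control - p_vaccine) ** 2)

# Illustrative: 1% of controls infected in a month, vaccine halves that.
# Roughly 4,700 per arm, i.e. close to 10,000 people across both arms.
print(required_n_per_arm(0.01, 0.005))
```

With these (assumed) numbers, a difference would indeed be detectable in about a month at the 10,000-person scale the post proposes.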

Several human trials have already started: in China, military volunteers are testing an experimental vaccine, and there are clinical trials of the Moderna vaccine in the US.


Adaptive Immune System Aging

Published on March 13, 2020 3:47 AM GMT

The human adaptive immune system is the “smart” part of the human immune system, the part which learns to recognize specific pathogens, allowing for immunity to e.g. chicken pox. For our current purposes, the key players are T-cells. T-cells start out “naive” and eventually learn to recognize specific antigens, becoming “memory” T-cells. The aged immune system is characterized by a larger fraction of memory relative to naive T-cells (without dramatic change in overall counts). This makes the elderly immune system slower to adapt to new pathogens.

This post is mainly about why the naive:memory T-cell ratio shifts with age, how to undo that shift, and some speculation about implications and applications.

A natural hypothesis (frequently asserted in the literature): the shift toward memory T-cells is driven by slower production of new (naive) T-cells. The T-cells themselves maintain overall cell count by living longer, resulting in a larger proportion of older (memory) cells.
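The hypothesis can be put in toy-model form: let thymic output of naive cells decline with age while conversion to memory cells continues, and the naive fraction falls. This is a minimal sketch; all rates and initial counts are illustrative assumptions, not fitted to data.

```python
def simulate(years, p0=100.0, decline=0.03, c=0.05, d=0.05, dt=0.01):
    """Euler-step toy model of T-cell pools (units: cells, years).
    Naive cells are produced by the thymus at a rate that decays with age,
    convert to memory cells at rate c; memory cells die at rate d."""
    naive, memory = 2000.0, 0.0
    t = 0.0
    while t < years:
        production = p0 * (1 - decline) ** t  # declining thymic output
        naive += (production - c * naive) * dt
        memory += (c * naive - d * memory) * dt
        t += dt
    return naive, memory

young = simulate(20)
old = simulate(70)
# Under declining thymic output, the naive fraction shrinks with age.
print(young[0] / sum(young), old[0] / sum(old))
```

The point of the sketch is only directional: with production falling and conversion continuing, the naive:memory ratio shifts toward memory, matching the pattern described above.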

The interesting part: why would the production of new T-cells fall with age?

Turns out there’s an obvious culprit: the thymus. The thymus is the last stop in the production line for new T-cells. It provides a sort of boot camp, training the T-cells to distinguish “self” (your own cells) from “other” (pathogens) using a whole battery of tricks. T-cells which make it through become full-time members of the naive T-cell reserve, and go on to police the body.

With age, the thymus shrinks away almost entirely (figure omitted; source: PhysAging). This is called “involution” of the thymus.

Many organs shrink with age, but the thymus is among the most dramatic. Unlike most age-related loss, it starts even before development is complete - the thymus shrinks measurably between day zero and a child’s first birthday. And it keeps on shrinking, at a steady rate, throughout childhood and adult life. The extremely early start of thymic involution suggests it’s more a developmental phenomenon than an age-related phenomenon - perhaps an appropriate hormonal mix could undo thymic involution?

Turns out, castration of aged mice (18-24 mo) leads to complete restoration of the thymus in about 2 weeks. The entire organ completely regrows, and the balance of naive to memory T-cells returns to the level seen in young mice. (Replicated here.) This is pretty dramatic evidence that:

  • The thymus only generates/regenerates in the absence of sex hormones
  • The age-related shift in naive:memory T-cell ratio is driven primarily by thymic involution

Particularly intriguing: we already have chemical castration methods, which are generally considered reversible. And it only took two weeks for the mice to regrow the whole thymus. At this point we’re speculating, but assuming chemical castration also works and it translates to humans and the thymus doesn’t rapidly re-involute after ceasing the chemical castration… that sounds pretty promising as an avenue to fixing age-related adaptive immune system decline in humans.

As long as we’re speculating, let’s speculate hard. Immunotherapy is the hot new thing in cancer these days - apparently T-cells in young people remove precancerous cells and attack tumors, but in old people that doesn’t happen as reliably. So… have there been any studies on how castration affects cancer? For starters, chemical castration is already a widely-used treatment for both prostate and breast cancer. It works. But those are prostate and breast; they’re sex organs, which we’d expect to atrophy in the absence of sex hormones. I don’t know of any studies on the effects of chemical castration on other types of cancer in humans.

In rats, however, at least one century-old study finds that castration prevents age-related cancer - and quite dramatically so. Castrated rats’ rate of resistance to an implanted tumor was ~50%, vs ~5% for controls. (This study finds a similar result in rabbits.) That old rat study cites a few others with mutually-conflicting results, and proposes that the age of the rats used explains it all: investigators who use young rats find that castration has little-to-no effect on resistance to an implanted tumor. Exactly what we’d expect if it’s all mediated by thymic involution & regrowth.

There are a lot of questions here. Does chemical castration have similar effects to surgical castration on thymic regrowth in mice? Does chemical castration result in regrowth of the thymus in humans? (Several states/countries require chemical castration as a condition of parole for certain sex offenders, and it’s also used for prostate and breast cancer, so it should be possible to find a few old people using it and see whether their thymus has regrown.) Does the thymus rapidly re-involute after administration of chemical castration ceases? Does chemical castration work as a treatment for cancers besides prostate and breast? Is the effectiveness of chemical castration against cancer age-dependent? Can temporary administration of chemical castration prevent cancer for a long period of time?

Turning away from applications and back to gears, there are also some key questions around thymic involution itself. The individual cells of the thymus don't have unusually slow turnover; if the thymic cell count is decreasing over time, then either the rate of production is decreasing or the breakdown rate is increasing. Either way, there has to be some upstream cause. Whatever that cause is, it probably isn't the same upstream cause as most age-related problems - thymic involution doesn't follow the usual pattern of no noticeable problems during development, slow loss of performance in middle age, then accelerating failure in old age.

I’d be excited to see more work along these lines and/or references to relevant studies.


Crisis and opportunity during coronavirus

Published on March 12, 2020 8:20 PM GMT

Note: please put on your own oxygen mask first. Don’t engage with this post if you haven’t taken appropriate measures to prepare yourself and your family; and plausibly don’t engage if you haven’t taken measures to ensure you can do so while staying stable and grounded.

We are facing a time of global crisis. But in spite of the unfolding tragedy – or perhaps because of it – this will also be a time of great opportunity. If you have the skills, slack, and willingness to act, it might make sense to start looking for ways to contribute (regardless of whether you’re seeking personal gain or altruistic benefit).

Why does this seem like a good opportunity?

There’s a question of whether the current situation should change your overall cause-prioritisation, in determining what’s most useful to work on over a >1 year time-scale.

This depends on where you started in your beliefs. For many readers of this post, it likely does not provide any high-level updates, as we already believed that pandemics were a major risk for which the world was underprepared, and that the major institutions in charge were dysfunctional. (Nonetheless, I am learning a massive amount by living through a time of global crisis when I have the epistemic ability and agency to understand what’s happening and take action.)

Beyond that, there’s the question of whether this is a window of opportunity. Even if your long-term goals remain after this pandemic, are there actions which will have an extraordinarily high leverage now, compared to other times?

I think there are a few reasons for thinking so.

  • Underpreparation. The world wasn’t prepared. Everyone is scrambling to figure things out and there’s too much for anyone to do. Hundreds of millions of people are suddenly changing their lives. The same goes for hundreds of thousands of companies and hundreds of governments. Most of them have no routines or experience in handling situations like these, which means they’ll be facing problems they have never faced before.
  • Exponential growth. Each infected person can be responsible for thousands of downstream infections, so the impact of behaviour change has a large multiplier. (Though this is modulo some uncertainty about counterfactual infections which I’m unsure how to think about).
  • Scale. The pandemic might grow to directly affect hundreds of millions of people, and it will indirectly affect billions. It is also a memetic pandemic (it probably consumes >90% of my FB and Twitter feeds, and >70% of my conversations). People are actively trying to find information, products, and similar.
  • Direct exposure and quick feedback loops. Most startups die because no one wants what they’re building. A common warning sign is that the founders aren’t themselves users of the product. But in the coming months, you’ll have to solve lots of problems for yourself, and chances are high that others might benefit from your solution (e.g. many spreadsheets and documents that went viral were initially just a single person trying to figure out how they should prepare, when their company should work remotely, etc.). Even if you’re solving problems for others, you’ll quickly learn whether there’s any demand.
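The multiplier in the exponential-growth point can be sketched numerically. This is a minimal sketch: the reproduction numbers and the generation count are illustrative assumptions, not estimates from the post.

```python
def downstream_infections(r, generations):
    """Total infections seeded by one case over n transmission generations,
    assuming each infection produces r new infections per generation."""
    return sum(r ** g for g in range(1, generations + 1))

# Illustrative: r = 2 vs. r = 1.5 over 10 generations.
print(downstream_infections(2.0, 10))  # 2 + 4 + ... + 1024 = 2046
print(downstream_infections(1.5, 10))  # ~170
```

Even a modest reduction in the per-generation reproduction number cuts the downstream total by more than an order of magnitude, which is why individual behaviour change carries such a large multiplier.
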
How can you contribute?

I want to distinguish two kinds of windows of opportunities. For lack of a better term, I’ll call them “social” and “causal”.

Social windows of opportunity. Suddenly people are willing to listen to new advice, and to consider different actions than they previously would. Hence there are attempts to get people to sign social pledges to self-quarantine, and epidemiologists are signing open letters to tech giants. More nefariously, lawmakers are smuggling their pet policy proposals into things that look like corona response measures. When all is said and done, it seems plausible the Overton window for biorisk policy will have shifted massively, and that might bring with it other surprising opportunities as well.

Causal windows of opportunity. There are also many new problems to be solved, where you can build a tool or other solution that actually changes the world in a mechanistic way, and which isn't primarily about convincing other people of things.

For example:

The Coronavirus Tech Handbook is an excellent resource summarising what people are building to fight the outbreak. Many projects are urgently looking for collaborators.

Even if you don't have tech skills, there are other ways of finding great opportunity in times of crisis.

Some of history’s most successful trades (1, 2) occurred during crises. There will likely be many financial opportunities during this crisis as well. (LessWrong user Wei Dai posted about how he successfully shorted the S&P 500 a few weeks back, and saw 700% returns already before the crashes of the recent week.) (This is not financial advice, and if you have no trading experience, now might be a particularly bad time to start.)

We have also seen a massive failure of responsible institutions to respond appropriately and provide reliable information. This means there's a shortage and need for reliable research and advice. This situation requires thinking for ourselves.

Due to the existence of niche online communities doing this, I started seriously thinking and preparing when there were 2 cases in my home country. A week and a half later I went home to my family and helped them prepare, as the only mask-wearing person, in an empty row at the back of an otherwise full plane. There were 20 confirmed cases. My mom initially yelled at me and felt embarrassed, since none of her friends were taking action, and asked why the authorities didn't say much. I left 5 days later. The case count had grown exponentially to >400. A friend in med school told me to "wash my hands and don't panic". I left from an airport where staff wore neither gloves nor masks, and shorted the local stock market. As I'm writing this two days later, the case count is almost 700.

The jury is still out, but sadly this seems to be a time when it's critically important to be able to take your beliefs seriously even when they go much further than official advice and mainstream behaviour. There will likely be many more opportunities over the coming months where good judgement and independent research can make an important difference. The LessWrong page of posts tagged coronavirus is one place to find and contribute to open questions.

Addendum on profiting from outbreaks

I strongly believe that traders and entrepreneurs who try to profit during this crisis are not immoral. Rather, they are incredibly important. All this “flatten the curve” business is about smoothing out demand peaks over time. And financial markets (futures markets in particular) are one of the key technologies our society has for coordinating to efficiently allocate resources across time. For example, it might have been massively beneficial if someone with foresight had stockpiled massive amounts of medical ventilators months back and thereby caused suppliers to increase production (it seems plausible this might have been worth it even if they wouldn’t have sold those stockpiled ventilators at a massive markup). The actual stockpiling of masks that happened might also have been beneficial for this reason (but I am highly uncertain about this claim and wouldn’t bet heavily on it).

More generally, the coming months will bring a massive wealth transfer, in various ways, to the prepared from the unprepared (or those unable to prepare, due to lack of money, knowledge, or some other key prerequisite). I don’t know what the implications of this will be. But once again, it might be worth a few hours of your time thinking about what opportunities it might generate.



The Critical COVID-19 Infections Are About To Occur: It's Time To Stay Home [crosspost]

Published on March 12, 2020 6:50 PM GMT

Well, I don't know how many of you reading this have been following the spread of Coronavirus. In the spirit of being another person who is noticing the smoke in the room, I wrote this last night and started a blog to do it.

Tl;dr: It is reasonable to expect that sometime between March 16th and March 18th, our hospitals will be slammed with cases of COVID-19 needing urgent attention, and war-time triage will be applied - if not immediately, then at some point soon after. Your value will be your youth. If you have family members who are at risk (like parents), they should consider the fact that if they leave the house and get sick, they will be going into a healthcare system that cannot give them adequate care, and they will have an increased chance of dying. You should feel free to insist that they stay home for the next 5 days, preferably a week, if you have the sort of relationship where you can express that sentiment. Once it begins, there will be no need to motivate them to stay.

It could happen sooner. It could happen later. It will happen. Hospitals in your community will be infected and the entire hospital will, essentially, have it.

Everyone, for some value of "everyone", is going to get this. 30% of the population is what they told legislators today.

I feel a little silly writing this out because I'm typical minding: I've been paying attention to this virus for a long time, and I assume everyone is where I'm at.

This is where I'm at. At home. Where I will stay. For as long as possible.

Do your part, motherfuckers, it's a pandemic on and the math to track the impact point says there is still time to make a difference: the infections that comprise the big surge are about to happen. Well. Started happening today. Tomorrow there's just gonna be more.

Original post to follow:

It's Time To Stay Home

If you have the disease and don't know it yet, now is a very very bad time to spread it inadvertently. The people you infect will, if they must go to the hospital, be a part of the epidemic curve that overwhelms the hospital.

If you don't have the disease, now is a very very bad time to catch it, because you will be part of the big wave.

TL;DR: The transmissions that will swamp our hospital systems are about to begin. STAY HOME. For as long as you can.

Many have seen this time-adjusted graph of the numbers, with the time shifted so it's clear just how similar the curves look.

I'm posting this on March 11th. The incubation period is around 5 days.

The reports from Italy about their overwhelmed hospitals began March 8th; some of their hospitals were already swamped at that point.

If we were 11.5 days behind Italy on March 9th, but Italian hospitals hit saturation, oh, call it March 6th, then our hospitals will start suffering on March 17th/18th. 5 days to develop symptoms, 24 hours for them to get bad enough to go to the hospital.
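The date arithmetic above can be checked directly, using only the numbers already given in the post:

```python
from datetime import datetime, timedelta

# "Call it March 6th" for Italian hospital saturation; we're 11.5 days behind.
italy_saturation = datetime(2020, 3, 6)
surge_start = italy_saturation + timedelta(days=11.5)
print(surge_start)  # 2020-03-17 12:00 -> "March 17th/18th"

# Cross-check: infection on March 11th, ~5 days to develop symptoms,
# ~24 hours for them to get bad enough to go to the hospital.
infection = datetime(2020, 3, 11)
hospital_date = infection + timedelta(days=5) + timedelta(days=1)
print(hospital_date)  # 2020-03-17
```

Both routes land on March 17th, consistent with the surge window claimed above.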

The Critical Window Is Here

Starting today, March 11th, if you leave the house, you are taking a significantly higher risk of being in a hospital during the surge.

Let's be realistic about how often you touch your face and doorknobs: if you leave your house now and develop symptoms tomorrow, anyone you infect today who needs medical care will be in the hospital during the surge.

I'm not leaving my house.


Why isn't increasing ventilation of public spaces part of the best-practice response to the Coronavirus?

Published on March 12, 2020 10:40 AM GMT

It's my impression that there's some spread via aerosol in public spaces like buses and trains. By increasing ventilation in those spaces by opening more windows I find it plausible that we could reduce that transmission.

Why aren't health orgs pushing for increasing ventilation of public spaces?


I'm leaving AI alignment – you better stay

Published on March 12, 2020 5:58 AM GMT

This diagram summarizes the requirements for independent AI alignment research and how they are connected.

In this post I'll outline my four-year-long attempt at becoming an AI alignment researcher. It's an ‘I did X [including what I did wrong], and here's how it went’ post (see also jefftk's More writeups!). I'm not complaining about how people treated me – they treated me well. And I'm not trying to convince you to abandon AI alignment research – you shouldn't. I'm not saying that anyone should have done anything differently – except myself.

Requirements

Funding

Funding is the main requirement, because it enables everything else. Thanks to Paul Christiano I had funding for nine months between January 2019 and January 2020. Thereafter I applied to the EA Foundation Fund (now Center on Long-Term Risk Fund) and Long-Term Future Fund for a grant and they rejected my applications. Now I don't know of any other promising sources of funding. I also don't know of any AI alignment research organisation that would hire me as a remote worker.

How much funding you need varies. I settled on 5 kUSD per month, which sounds like a lot when you're a student, and which sounds like not a lot when you look at market rates for software developers/ML engineers/ML researchers. On top of that, I'm essentially a freelancer who has to pay social insurance by himself, take time off to do accounting and taxes, and build runway for dry periods.

Results and relationships

In any job you must get results and build relationships. If you don't, you don't earn your pay. (Manager Tools talks about results and relationships all the time. See for example What You've Been Taught About Management is Wrong or First Job Fundamentals.)

The results I generated weren't obviously good enough to compel Paul to continue to fund me. And I didn't build good enough relationships with people who could have convinced the LTFF and EAFF fund managers that I have the potential they're looking for.


Time

Funding buys time, which I used for study and research.

Another aspect of time is how effectively and efficiently you use it. I'm good at effective, not so good at efficient: I spend much time on non-research, mostly studying Japanese and doing sports. And dawdling. I noticed the dawdling problem at the end of last year and got it under control at the beginning of this year (see my time tracking). Too late.

Travel and location

I live in Kagoshima City in southern Japan, which is far away from the AI alignment research hubs. This means that I don't naturally meet AI alignment researchers and build relationships with them. I could have compensated for this by travelling to summer schools, conferences etc. But I missed the best opportunities and I felt that I didn't have the time and money to take the second-best opportunities. Of course, I could also relocate to one of the research hubs. But I don't want to do that for family reasons.

I did start maintaining the Predicted AI alignment event/meeting calendar in order to avoid missing opportunities again. And I did apply and get accepted to the AI Safety Camp Toronto 2020. They even chose my research proposal for one of the teams. But I failed to procure the funding that would have supported me from March through May when the camp takes place.


Knowledge

I know more than most young AI alignment researchers about how to make good software, how to write well and how to work professionally. I know less than most young AI alignment researchers about maths, ML and how to do research. The latter appear to be more important for getting results in this field.


Why do I know less about maths, ML and how to do research? Because my formal education goes only as far as a BSc in computer science, which I finished in 2014 (albeit with very good grades). There's a big gap between what I remember from that and what an MSc or PhD graduate knows. I tried to make up for it with months (time bought with Paul's funding) of self-study, but it wasn't enough.

Another angle on this, in terms of Jim Collins (see Jim Collins — A Rare Interview with a Reclusive Polymath (#361)): I'm not ‘encoded’ for reading research articles and working on theory. I am probably ‘encoded’ for software development and management. I'm sceptical, however, about this concept of being ‘encoded’ for something.

All for nothing?

No. I built relationships and learned much that will help me be more useful in the future. The only thing I'm worried about is that I will forget what I've learned about ML for the third time.


I could go back to working for money part-time, patch the gaps in my knowledge, results and relationships, and get back on the path of AI alignment research. But I don't feel like it. I spent four years doing ‘what I should do’ and was ultimately unsuccessful. Now I'll try and do what is fun, and see if it goes better.

What is fun for me? Software/ML development, operations and, probably, management. I'm going to find a job or contracting work in that direction. Ideally I would work directly on mitigating x-risk, but this is difficult, given that I want to work remotely. So it's either earning to give, or building an income stream that can support me while doing direct work. The latter can be through saving money and retiring early, or through building a ‘lifestyle business’ the Tim Ferriss way.

Another thought on fun: When I develop software, I know when it works and when it doesn't work. This is satisfying. Doing research always leaves me full of doubt whether what I'm doing is useful. I could fix this by gathering more feedback. For this again I would need to buy time and build relationships.


For reference I'll list what I've done in the area of AI alignment. Feel free to stop reading here if you're not interested.


…to everyone who helped me and treated me kindly over the past four years. This encompasses just about everyone I've interacted with. Those who helped me most I've already thanked in other places. If you feel I haven't given you the appreciation you deserve, please let me know and I'll make up for it.


Puzzles for Physicalists

Published on March 12, 2020 1:37 AM GMT

The following is a list of puzzles that are hard to answer within a broadly-physicalist, objective paradigm. I believe critical agentialism can answer these better than competing frameworks; indeed, I developed it through contemplation of these puzzles, among others. This post will focus on the questions, though, rather than the answers. (Some of the answers can be found in the linked post.)

In a sense what I have done is located "anomalies" relative to standard accounts, and concentrated more attention on these anomalies, attempting to produce a theory that explains them, without ruling out its ability to explain those things the standard account already explains well.


(This section would be philosophical plagiarism if I didn't cite On the Origin of Objects.)

Indexicals are phrases whose interpretation depends on the speaker's standpoint, such as "my phone" or "the dog over there". It is often normal to treat indexicals as a kind of shorthand: "my phone" is shorthand for "the phone belonging to Jessica Taylor", and "the dog over there" is shorthand for "the dog existing at coordinates 37.856570, -122.284176". This expansion allows indexicals to be accounted for within an objective, standpoint-independent frame.

However, even these expanded references aren't universally unique. In a very large universe, there may be a twin Earth which also has a dog at coordinates 37.856570, -122.284176. As computer scientists will find obvious, specifying spatial coordinates requires a number of bits logarithmic in the amount of space addressed. These globally unique identifiers get more and more unwieldy the more space is addressed.
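The logarithmic scaling can be made concrete. This is a minimal sketch; the location counts are illustrative.

```python
import math

def address_bits(num_locations):
    """Bits needed to uniquely address one of num_locations positions."""
    return math.ceil(math.log2(num_locations))

# Doubling the addressed volume costs only one extra bit, yet the
# identifiers still grow without bound as more space is addressed.
print(address_bits(2**32))   # 32
print(address_bits(2**33))   # 33
print(address_bits(10**80))  # 266 -- an observable-universe's worth of
                             # distinct locations
```

So globally unique identifiers grow slowly but without limit, which is why we never actually expand our references far enough to guarantee global uniqueness.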

Since we don't expand out references enough to be sure they're globally unique, our use of them couldn't depend on such global uniqueness. An accounting of how we refer to things, therefore, cannot posit any causally-effective standpoint-independent frame that assigns semantics.

Indeed, the trouble with globally unique references can also be seen by studying physics itself. Physical causality is spatially local; a particle affects nearby particles, and there's a speed-of-light limitation. For spatial references to be effective (e.g. to connect to observation and action), they have to themselves "move through" local space-and-time.

This is a bit like the problem of having a computer refer to itself. A computer may address computers by IP address. The IP address "" always refers to this computer. These references can be resolved even without an Internet connection. It would be totally unnecessary and unwieldy for a computer to refer to itself (e.g. for the purpose of accessing files) through a globally-unique IP address, resolved through Internet routing.
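The self-reference analogy can be sketched with Python's `ipaddress` module. (The post's elided IP literal is left as-is above; the conventional IPv4 loopback address is 127.0.0.1, and the second address below is an arbitrary illustrative global address.)

```python
import ipaddress

# The loopback address refers to "this computer" and is resolved
# locally, with no global routing involved.
loopback = ipaddress.ip_address("127.0.0.1")
print(loopback.is_loopback)  # True

# A globally routable address, by contrast, names a location in the
# shared address space and must be resolved through the network.
print(ipaddress.ip_address("93.184.216.34").is_loopback)  # False
```

The loopback case is the deictic one: the very same identifier resolves to a different machine depending on which machine uses it.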

Studying enough examples like these (real and hypothetical) leads to the conclusion that indexicality (and more specifically, deixis) is fundamental, and that even spatial references that appear to be globally unique are resolved deictically.

How does this relate to physics? It means references to "the objective world" or "the physical world" must also be resolved indexically, from some standpoint. Paying attention to how these references are resolved is critical.

The experimental results you see are the ones in front of you. You can't see experimental results that don't, through spatio-temporal information flows, make it to you. Thus, references to the physical which go through discussing "the thing causing experimental predictions" or "the things experiments failed to falsify" are resolved in a standpoint-dependent way.

It could be argued that physical law is standpoint-independent, because it is, symmetrically, true at each point in space-time. However, this excludes virtual standpoints (e.g. existing in a computer simulation), and additionally, this only means the laws are standpoint-independent, not the contents of the world, the things described by the laws.

Pre-reduction references

(For previous work, see "Reductive Reference".)

Indexicality by itself undermines view-from-nowhere mythology, but perhaps not physicalism itself. What presents a greater challenge for physicalism is the problem of pre-reduced references (which are themselves deictic).

Let's go back to the twin Earth thought experiment. Suppose we are in pre-chemistry times. We still know about water. We know water through our interactions with it. Later, chemistry will find that water has a particular chemical formula.

In pre-chemistry times, it cannot be known whether the formula is H2O, XYZ, etc, and these formulae are barely symbolically meaningful. If we discover that water is H2O, we will, after-the-fact, define "water" to mean H2O; if we discover that water is XYZ, we will, after-the-fact, define "water" to mean XYZ.

Looking back, it's clear that "water" has to be H2O, but this couldn't have been clear at the time. Pre-chemistry, "water" doesn't yet have a physical definition; a physical definition is assigned later, which rationalizes previous use of the word "water" into a physicalist paradigm.

A philosophical account of reductionism needs to be able to discuss how this happens. To do this, it needs to be able to discuss the ontological status of entities such as "water" (pre-chemistry) that do not yet have a physical definition. In this intermediate state, the philosophy is talking about two kinds of entities, pre-reduced entities and physics, and considering various bridgings between them. So the intermediate state needs to contain entities that are not yet conceptualized physically.

A possible physicalist objection is that, while it may be a provisional truth that water is definitionally the common drinkable liquid found in rivers and so on, it is ultimately true that water is H2O, and so physicalism is ultimately true. (This is very similar to the two truths doctrine in Buddhism.)

Now, expanding out, this account needs to provide an account of the relation between provisional and ultimate truth. Even if such an account could be provided, it would appear that, in our current state, we must accept it as provisionally true that some mental entities (e.g. imagination) do not have physical definitions, since a good-enough account has not yet been provided. And we must have a philosophy that can grapple with this provisional state of affairs, and judge possible bridgings as fitting/unfitting.

Moreover, there has never been a time without provisional definition. So this idea of ultimate truth functions as a sort of utopia, which is either never achieved, or is only achieved after very great advances in philosophy, science, and so on. The journey is, then, more important than the destination, and to even approach the destination, we need an ontology that can describe and usably function within the journeying process; this ontology will contain provisional definitions.

The broader point here is that, even if we have the idea of "ultimate truth", that idea isn't meaningful (in terms of observations, actions, imaginations, etc) to a provisional perspective, unless somehow the provisional perspective can conceptualize the relation between itself and the ultimate truth. And, if the ultimate truth contains all provisional truths (as is true if forgetting is not epistemically normative), the ultimate truth needs to conceptualize this as well.

Epistemic status of physics

Consider the question: "Why should I believe in physics?". The conventional answer is: "Because it predicts experimental results." Someone who can observe these experimental results can, thus, have epistemic justification for belief in physics.

This justificatory chain implies that there are cognitive actors (such as persons or social processes) that can do experiments and see observations. These actors are therefore, in a sense, agents.

A physicalist philosophical paradigm should be able to account for epistemic justifications of physics, or else it fails to self-ratify. So the paradigm needs to account for observers (and perhaps specifically active observers), who are the ones having epistemic justification for belief in physics.

Believing in observers leads to the typical mind-body problems. Disbelieving in observers fails to self-ratify. (Whenever a physicalist says "an observation is X physical entity", it can be asked why X counts as an observation of the sort that is epistemically compelling; the answer to this question must bridge the mental and the physical, e.g. by saying the brain is where epistemic cognition happens. And saying "you know your observations are the things processed in this brain region because of physics" is circular.)

What mind-body problems? There are plenty.


The anthropic principle states, roughly, that epistemic agents must believe that the universe contains epistemic agents. Else, they would believe themselves not to exist.

The language of physics, on its own, doesn't have the machinery to say what an observer is. Hence, anthropics is a philosophical problem.

The standard way of thinking about anthropics (e.g. SSA/SIA) is to consider the universe from a view-from-nowhere, and then assume that "my" body is in some way sampled "randomly" from this viewed-from-nowhere universe, such that I proceed to get observations (e.g. visual) from this body.

This is already pretty wonky. Indexicality makes the view-from-nowhere problematic. And the idea that "I" am "randomly" placed into a body is a rather strange metaphysics (when and where does this event happen?).

But perhaps the most critical issue is that the physicalist anthropic paradigm assumes it's possible to take a physical description of the universe (e.g. as an equation) and locate observers in it.

There are multiple ways of considering doing so, and perhaps the best is functionalism, which will be discussed later. However, I'll note that a subjectivist paradigm can easily find at least one observer: I'm right here right now.

This requires some explaining. Say you're lost in an amusement park. There are about two ways of thinking about this:

  1. You don't know where you are, but you know where the entrance is.
  2. You don't know where the entrance is, but you know where you are.

Relatively speaking, 1 is an "objective" (relatively standpoint-independent) answer, and 2 is a "subjective" (relatively standpoint-dependent) answer.

2 has the intuitive advantage that you can point to yourself, but not to the entrance. This is because pointing is deictic.

Even while being lost, you can still find your way around locally. You might know where the Ferris wheel is, or the food stand, or your backpack. And so you can make a local map, which has not been placed relative to the entrance. This map is usable despite its disconnection from a global reference frame.

Anthropics seems to be saying something similar to (1). The idea is that I, initially, don't know "where I am" in the universe. But, the deictic critique applies to anthropics as it applies to the amusement park case. I know where I am, I'm right here. I know where the Earth is, it's under me. And so on.

This way of locating (at least one) observer works independent of ability to pick out observers given a physical description of the universe. Rather than finding myself relative to physics, I find physics relative to me.

Of course, the subjectivist framework has its own problems, such as difficulty finding other observers. So there is a puzzle here.

Tool use and functionalism

Functionalism is perhaps the current best answer as to how to locate observers in physics. Before discussing functionalism, though, I'll discuss tools.

What's a hammer? It's a thing you can swing to apply lots of force to something at once. Hammers can be made of many physical materials, such as stone, iron, or wood. It's about the function, not the substance.

The definition I gave refers to a "you" who can swing the hammer. Who is the "you"? Well, that's standpoint-dependent. Someone without arms can't use a conventional hammer to apply lots of force. The definition relativizes to the potential user. (Yes, a person without arms may say conventional hammers are hammers due to social convention, but this social convention is there because conventional hammers work for most people, so it still relativizes to a population.)

Let's talk about functionalism now. Functionalism is based on the idea of multiple realizability: that a mind can be implemented on many different substrates. A mind is defined by its functions rather than its substrate. This idea is very familiar to computer programmers, who can hide implementation details behind an interface, and don't need to care about hardware architecture for the most part.

This brings us back to tools. The definition I gave of "hammer" is an interface: it says how it can be used (and what effects it should create upon being used).

What sort of functions does a mind have? Observation, prediction, planning, modeling, acting, and so on. Now, the million-dollar question: Who is (actually or potentially) using it for these functions?

There are about three different answers to this:

  1. The mind itself. I use my mind for functions including planning and observation. It functions as a mind as long as I can use it this way.
  2. Someone or something else. A corporation, a boss, a customer, the government. Someone or something who wants to use another mind for some purpose.
  3. It's objective. Things have functions or not independent of the standpoint.

I'll note that 1 and 2 are both standpoint-dependent, thus subjectivist. They can't be used to locate minds in physics; there would have to be some starting point, of having someone/something intending to use a mind for something.

3 is interesting. However, we now have a disanalogy from the hammer case, where we could identify some potential user. It's also rather theological, in saying the world has an observer-independent telos. I find the theological implications of functionalism to be quite interesting and even inspiring, but that still doesn't help physicalism, because physicalist ontology doesn't contain standpoint-independent telos. We could, perhaps, say that physicalism plus theism yields objective functionalism. And this requires adding a component beyond the physical equation of the universe, if we wish to find observers in it.

Causality versus logic

Causality contains the idea that things "could" go one way or another. Else, causal claims reduce to claims about state; there wouldn't be a difference between "if X, then Y" and "X causes Y".

Pearlian causality makes this explicit; causal relations are defined in terms of interventions, which come from outside the causal network itself.
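To make the conditioning/intervention distinction concrete, here is a minimal sketch (with made-up probabilities) of a three-variable model in which a confounder C causes both X and Y. Conditioning on X=1 and intervening to set X=1 give different answers about Y, even though X has no causal effect on Y at all:

```python
# Toy causal model with a confounder: C -> X and C -> Y.
# All numbers are illustrative assumptions, not data.
P_C = {0: 0.5, 1: 0.5}

def p_x_given_c(x, c):
    # X mostly copies C
    return 0.9 if x == c else 0.1

def p_y_given_c(y, c):
    # Y also mostly copies C; X has no causal effect on Y at all
    return 0.9 if y == c else 0.1

def p_joint(c, x, y):
    return P_C[c] * p_x_given_c(x, c) * p_y_given_c(y, c)

# Observational: P(Y=1 | X=1) -- seeing X=1 is evidence about C, hence about Y
num = sum(p_joint(c, 1, 1) for c in (0, 1))
den = sum(p_joint(c, 1, y) for c in (0, 1) for y in (0, 1))
p_obs = num / den

# Interventional: P(Y=1 | do(X=1)) -- cut the C -> X edge and force X=1;
# since X doesn't affect Y, this is just P(Y=1)
p_do = sum(P_C[c] * p_y_given_c(1, c) for c in (0, 1))

print(round(p_obs, 3))  # -> 0.82
print(round(p_do, 3))   # -> 0.5
```

The gap between 0.82 and 0.5 is exactly the "could" that interventions introduce from outside the network: claims about state ("if X, then Y") and causal claims ("X causes Y") come apart.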

The ontology of physics itself is causal. It is asserted, not just that some state will definitely follow some previous state, but that there are dynamics that push previous states to new states, in a necessary way. (This is clear in the case of dynamical systems.)

Indeed, since experiments may be thought of as interventions, it is entirely sensible that a physical theory that predicts the results of these interventions must be causal.

These "coulds" have a difficult status in relation to logic. Someone who already knows the initial state of a system can logically deduce its eventual state. To them, there is inevitability, and no logically possible alternative.

It appears that, while "could"s exist from the standpoint of an experimenter, they do not exist from the standpoint of someone capable of predicting the experimenter, such as Laplace's demon.

This is not much of a problem if we've already accepted fundamental deixis and rejected the view-from-nowhere. But it is a problem for those who haven't.

Trying to derive decision-theoretic causality from physical causality results in causal decision theory, which is known to have a number of bugs, due to its reliance on hypothetical extra-physical interventions.

An alternative is to try to develop a theory of "logical causality", by which some logical facts (such as "the output of my decision process", assuming you know your source code) can cause others. However, this is oxymoronic, because logic does not contain the affordance for intervention. Logic contains the affordance for constructing and checking proofs. It does not contain the affordance for causing 3+4 to equal 8. A sufficiently good reasoner can immediately see that "3+4=8" runs into contradiction; there is no way to construct a possible world in which 3+4=8.

Hence, it is hard to say that "coulds" exist in a standpoint-independent way. We may, then, accept standpoint-dependence of causation (as I do), or reject causation entirely.


My claim isn't that physicalism is false, or that there don't exist physicalist answers to these puzzles. My claim, rather, is that these puzzles are at least somewhat difficult, and that sufficient contemplation on them will destabilize many forms of physicalism. The current way I answer these puzzles is through a critical agential framework, but other ways of answering them are possible as well.


Please Press "Record"

12 марта, 2020 - 02:56
Published on March 11, 2020 11:56 PM GMT

Over the years, I've made very heavy use of open courseware/MOOCs. I'd estimate that I've covered about as much material in online lectures as I did in-person during a four year degree.

However, since college I've been frustrated by the general lack of online material for non-101 courses. Sites like coursera, edx, etc seem to have realized that the vast majority of users have basically no relevant background, so they optimize for users with no relevant background - they produce a very repetitive stream of 101-level courses. More advanced material is harder to come by online, even though it's abundant in university course catalogs.

But I hear that many colleges are moving courses online in response to coronavirus. So, a request for any professors out there: press "record" during lectures. You're on video already, you can sort out the details and decide whether to actually put up the recordings later, but for now just push the record button.


Coronavirus tests and probability

12 марта, 2020 - 02:09
Published on March 11, 2020 11:09 PM GMT

I recently had a "duh" moment while reading an Atlantic article. Coronavirus tests are not screening tests! Like, didn’t we all learn about Bayesian probability, sensitivity, and the dangers of false positives and false negatives from a very similar question? And then, when I started reading about coronavirus test distribution in the news, I forgot all about that.

But I don't know what the probabilities are. A brief search didn't find them. Anyone know?

Expectations are tempered; a similar promise from Vice President Mike Pence of 1.5 million tests by the end of last week did not come to pass. But even when these tests eventually are available, some limitations will have to be realized. Among them, these are diagnostic tests, not screening tests—a distinction that should shape expectations about the role doctors will play in helping manage this viral disease.

The difference comes down to a metric known as sensitivity of the test: how many people who have the virus will indeed test positive. No medical test is perfect. Some are too sensitive, meaning that the result may say you’re infected when you’re actually not. Others aren’t sensitive enough, meaning they don’t detect something that is actually there.

The latter is the model for a diagnostic test. These tests can help to confirm that a sick person has the virus; but they can’t always tell you that a person does not. When people come into a clinic or hospital with severe flu-like symptoms, a positive test for the new coronavirus can seal the diagnosis. Screening mildly ill people for the presence of the virus is, however, a different challenge.

“The problem in a scenario like this is false negatives,” says Albert Ko, the chair of epidemiology of microbial diseases at the Yale School of Public Health. If you wanted to use a test to, for example, help you decide whether an elementary-school teacher can go back to work without infecting his whole class, you really need a test that will almost never miss the virus.

“The sensitivity can be less than 100 percent and still be very useful,” Ko says, in many cases. But as that number falls, so does the usefulness of any given result. In China, the sensitivity of tests has been reported to be as low as 30 to 60 percent—meaning roughly half of the people who actually had the virus had negative test results. Using repeated testing was found to increase the sensitivity to 71 percent.

But that means a negative test still couldn’t fully reassure someone like the teacher that he definitely doesn’t have the virus. At that level of sensitivity, Ko says, “if you’re especially risk-averse, do you just say: ‘If you have a cold, stay home’?”
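As a rough illustration of why sensitivity matters, here is a sketch of the standard Bayes calculation, using the sensitivities quoted above (30%, 60%, 71%). The 20% pre-test probability and 99% specificity are my own assumptions for illustration, not figures from the article:

```python
def p_infected_given_negative(prior, sensitivity, specificity=0.99):
    # Bayes: P(infected | negative) = P(neg | inf) * P(inf) / P(neg)
    p_neg_given_inf = 1 - sensitivity  # false-negative rate
    p_neg = p_neg_given_inf * prior + specificity * (1 - prior)
    return p_neg_given_inf * prior / p_neg

# Assumed 20% pre-test probability (e.g. a symptomatic contact).
for sens in (0.30, 0.60, 0.71):
    print(sens, round(p_infected_given_negative(0.20, sens), 3))
```

Under these assumed numbers, even at 71% sensitivity a negative result only drops a 20% pre-test probability to about 7%: far from a guarantee that someone like the teacher is virus-free.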



What are some articles that updated your beliefs a lot on an important topic?

12 марта, 2020 - 01:34
Published on March 11, 2020 10:34 PM GMT

The content doesn't need to be hosted on LessWrong.

It does need to have changed your beliefs personally. If it changed other people's beliefs, please put this as a comment instead.

Also, the article should be more than simply updating you away from your priors. For example, I'm not interested in things like "learning about a new cause you weren't aware of" but rather in things like "changing your mind about the importance of a cause".

I'm also interested in knowing whether the article was the first time you had come across this point, or it just argued that point better than the previous ones.

Information in formats other than articles also works: audio, video, pictures, graphs, data tables, etc.

This is related to a project idea of tracking belief updates, with the goals of 1) tracking which information is the most valuable so that more people consume it, and 2) seeing how beliefs evolve (which might be evidence in itself about which beliefs are true; although I think most people, including myself, wouldn't consider this the strongest form of evidence).


A practical out-of-the-box solution to slow down COVID-19: Turn up the heat

12 марта, 2020 - 01:30
Published on March 11, 2020 10:30 PM GMT

3 days ago a study was published that swayed me significantly towards the opinion that COVID-19 spreads better in cold (but not too cold) climates. I believe the odds of this are probably over 50%, and if it's true there is a practical and scalable (although not extremely cheap) way to significantly slow the spread of the coronavirus.

Research Claims

It seems that all current pandemic epicenters share a very similar temperature and humidity (5-11°C and 47-79% humidity). Consider the following map:

As you can see, all the central outbreak locations lie along a narrow east-west band, roughly the 30-50°N corridor. This suggests a correlation between rapid virus spread and specific climate conditions.

Why do I believe it's probably true?

1. The odds that, by pure luck, all 6 different epicenters would share similar climates and latitudes seem like too large a coincidence, even taking into account that the 30-50°N corridor is more populous than average. There are many very populated places outside this corridor, and none of them got hit as hard.

2. Iran, Italy, China, and South Korea are very different places in terms of political systems and government competence. Intuitively, it's hard to think of a better root cause explaining why the virus spread so widely in precisely these locations.

3. It could also explain why some places that you would expect to be hit hard, like Thailand (the top global destination for Chinese tourism) or Taiwan, don't seem to have it that bad so far. Both of these countries have a tropical climate. Granted, there could be higher spread in these countries that is underreported, but if they had it as bad as Italy or Iran, it wouldn't go unnoticed.


What are the implications if we believe the research?

On the downside: as the weather warms in the northern hemisphere, more places will reach the range that could provide optimal conditions for COVID-19. The researchers list the following cities as potentially dangerous areas for coronavirus spread:

On the upside, if this is true, there is something we might be able to do to slow the virus's spread, and the solution is simple: increase temperatures in enclosed public spaces, or more simply, turn up the heat. If the coronavirus truly doesn't tolerate heat, increasing the temperature will kill it faster on surfaces. If we could raise temperatures to hot but still bearable levels (27°C/80°F is a good target) in places like supermarkets, public transportation, or clinics, we might slow the spread by reducing connectivity: killing the virus faster while it's on surfaces or in the air.

The main downside is that heating costs money, and we don't know for sure whether it will help. A way to find out would be to conduct research that tests this assumption directly, or even to run A/B tests around the country to see whether, on average, areas with heated public spaces see less pandemic growth.

Another option is to heat these places only at night, consuming electricity outside peak working hours when it's pretty cheap anyway. The recent drops in oil and gas prices mean this could be a sustainable and worthwhile act of public policy, if it truly works.



12 марта, 2020 - 00:08
Published on March 11, 2020 9:08 PM GMT

Trace is a tool for writing programs which read, write and reason about programs. You can find it here. I wrote it as a tool for my own research, and I expect that others in this space may find the ideas interesting/useful as well. I'd be especially interested in new use-cases and other feedback!

Some kinds of things you might find Trace useful for:

  • Algorithms which operate on a computation graph, e.g. backpropagation, belief propagation, or other graphical inference algorithms
  • An intermediate data structure for static analysis, interpreters or compilers
  • A general-purpose non-black-box representation of objectives/constraints for optimization
  • A general-purpose non-black-box representation of world models for AI more broadly

Disclaimer for all of these: Trace is brand-new, and it was built with a focus on the core ideas rather than the engineering. Syntax is liable to change as we figure out what does and does not work well. Do not expect it to be easy/pleasant to use at this point, but do expect it to provide novel ways to think about programs.

One more warning: this doc is intended to be read start-to-finish. Trace does not really resemble any other tool I know of, and you will likely be confused if you just dive in.

What is Trace?

Trace is

  • A programming/modelling language embedded in a python library. For use as a human-facing programming language, Trace is pretty terrible, but it’s sometimes a necessary step for other use-cases.
  • A notation/data structure representing programs. For these use-cases, Trace is pretty good: compared to alternatives (e.g. abstract syntax trees), Trace offers a much more convenient representation of program structure.
  • A data structure representing the computation performed by an arbitrary program - i.e. the trace (aka execution graph aka computation graph) of a program. For this use-case, I do not know of any other tool which is anywhere near as powerful as Trace.

A prototypical use-case: suppose you want to test out a new inference algorithm. You can prototype the algorithm to operate on Trace data structures, which allows it to handle arbitrary programs (unlike e.g. pytorch graphs), with relatively little complexity (unlike e.g. python syntax trees). Then, you can write test-case world-models as programs in Trace notation. Those “programs” will themselves be fairly transparent Trace data structures, which your prototype algorithm can operate on directly.


Here’s a simple python program:

def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n-1)

Let’s suppose I want to trace the execution of factorial(3), starting from the result and working backwards (e.g. for something analogous to backpropagation). Conceptually, I picture something like the call stack, with a box for each function call. Within each box, variable instances are in dependency order; arrows show cross-box dependencies:

This is roughly the core data structure which Trace exposes. For every instance of every variable, it tells us:

  • The value of the variable-instance
  • The expression which produced that value
  • The variable-instances which went into that expression

(Side note: every variable instance is assumed to be write-once; no in-place updating of values is allowed.)

In Trace syntax, every variable-instance is a Symbol (S). The Symbol object contains both the symbol's name (aka its literal) and a pointer to the "context" in which the symbol lives (i.e. the dotted boxes in the diagram). The context then assigns the literal to another symbol, a hardcoded value, or an Expression - a special type of Symbol which wraps a python function and some input Symbols. More on that in the next section.

However, Trace’ core data structures differ in two important ways from the diagram above:

  • They handle dynamic structure - i.e. programs which write programs
  • Everything in Trace is evaluated lazily whenever possible

Lazy evaluation allows us to write data structures which look a lot like normal programs (albeit with some unusual syntax), and which can fit in about as much memory as normal code, but allow access to the whole trace - every instance of every variable in the program’s execution.
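As an illustration of the general idea (a hypothetical sketch, not Trace's actual implementation), a memoizing thunk computes its value only when first asked, so a large graph of them costs little memory or compute until queried:

```python
class Lazy:
    """A minimal memoizing thunk: computes its value on first access only."""
    def __init__(self, fn):
        self.fn = fn
        self.computed = False
        self.value = None

    def get_value(self):
        if not self.computed:
            self.value = self.fn()
            self.computed = True
        return self.value

calls = []
n = Lazy(lambda: calls.append('n') or 3)
n_squared = Lazy(lambda: n.get_value() ** 2)

# Nothing has been computed yet -- the "graph" is just thunks.
assert calls == []
assert n_squared.get_value() == 9   # forces n, then squares it
assert calls == ['n']
n_squared.get_value()               # memoized: n is not recomputed
assert calls == ['n']
```

The trace of a long-running program can be represented this way: every variable-instance exists as a node, but only the ones you actually query get evaluated.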

The main trick to a compressed, lazy representation is an operator which says “make a copy of this whole block, but with these changes: …”. In the factorial diagram above, each of the dotted boxes (except the last) is a copy of the first box, but with a different value of n. Ignoring the last box, we could represent it like this:

Here the “?”s represent lazily-evaluated values which haven’t been evaluated yet. Note that the “copy” is nested within the outermost box - indicating that it, too, will be copied, leading to a whole nested ladder of blocks.

In Trace syntax, the dotted boxes are Context objects, and the copy-with-changes operator is represented by function-call notation: cont({'n': 2}) makes a copy of the Context cont, in which 'n' is assigned the value 2. Values of variable-instances downstream of n will update in response to the new value of n, within the copy.
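Ignoring laziness, the copy-with-changes operator behaves roughly like copying a dict and applying overrides. This is a deliberately simplified sketch (with_changes is a hypothetical stand-in; real Contexts also recompute downstream values lazily in the copy):

```python
def with_changes(context, changes):
    """Sketch of copy-with-changes: the original is untouched,
    the copy gets the overrides."""
    new = dict(context)
    new.update(changes)
    return new

cont = {'n': 3, 'rule': 'result depends on n'}
copy = with_changes(cont, {'n': 2})
assert cont['n'] == 3   # original unchanged
assert copy['n'] == 2   # downstream values recompute from the new n (lazily, in Trace)
```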

Core Data Structure

Here’s a full program in Trace; we’re going to walk through all the pieces.

from tracelang import S, E, Context

factorial = Context({
    'fact': Context({
        'result': S(S('n') == 0, {
            True: 1,
            False: S('n')*S('result', S('fact')({'n': S('n') - 1}))
        })
    })({'fact': S('fact')}),
    'result': S('result', S('fact')({'n': S('n')}))
})

>>> S('result', factorial({'n': 3})).get_value()
6

Let’s start with the three main pieces: Symbols (S), Expressions (E), and Context. Very briefly:

  • A Symbol is a variable-instance. It’s defined by a literal (e.g. 'n') and a context in which to resolve that literal (e.g. {'n': 2}). Calling get_value() on a symbol resolves the literal within its context.
  • Expressions are Symbols whose “context” is a python function, so we resolve them by calling the function. They are implicitly created by using operators like +, *, ==, or function call on Symbols.
  • Contexts are basically dicts with a couple extra features: they provide a default context for any symbols within them, and we can “create a copy but with changes” via function-call notation.

More details follow...

Symbols are the starting point. A symbol is just a literal (e.g. 'foo' or 2) and a context mapping the literal to some value (e.g. {'foo': 'bar'}; it doesn’t have to be a capital-C Context). By calling .get_value() on a symbol, we get the value of the literal from the context:

>>> S('foo', {'foo': 'bar', 'baz': 2}).get_value()
'bar'

Both the literal and the context can themselves be symbols, in which case we resolve values recursively. For instance:

>>> S(S('is_case', {'is_case': True}), {True: 'it is', False: 'it is not'}).get_value()
'it is'
>>> S('foo', S('bar', {'bar': {'foo': 2}})).get_value()
2

Conceptually, S('x', context) works like the square-bracket accessor context['x'] - except that we recursively resolve symbols along the way.
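To make "recursively resolve along the way" concrete, here is a minimal hypothetical sketch (Sym and resolve are stand-ins for illustration, not the real tracelang classes):

```python
class Sym:
    """Sketch of a symbol: a literal plus a context to resolve it in."""
    def __init__(self, literal, context):
        self.literal = literal
        self.context = context

    def get_value(self):
        # Both the literal and the context may themselves be symbols,
        # so resolve each before (and after) the dict lookup.
        literal = resolve(self.literal)
        context = resolve(self.context)
        return resolve(context[literal])

def resolve(x):
    # Keep resolving until we hit a plain value.
    while isinstance(x, Sym):
        x = x.get_value()
    return x

print(Sym('foo', {'foo': 'bar'}).get_value())  # 'bar'
# A symbol as the literal: first resolve it to True, then look True up.
print(Sym(Sym('k', {'k': True}), {True: 'yes', False: 'no'}).get_value())  # 'yes'
```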

In our factorial program, notice that many of the symbols don’t have any explicit context - e.g. S('n') or S('fact'). When a symbol’s context is not explicitly passed, the context is set to the (lexically) enclosing Context - this is one of the two main uses of capital-C Contexts. For instance, the S('n')'s in our example all have their context set to one of the two Contexts, depending on which one they appear inside.

Expressions are a special type of Symbol which resolve by calling a python function. If we have a function

def square(x):
    return x*x

then we could call it via

>>> E(square, S('x', {'x': 2})).get_value()
4

This resolves all the input Symbols, then calls the python function, as you’d expect. In practice, we don’t usually need to write E() explicitly - an E will be created automatically via operator overloading on Symbols:

>>> total = S('x', {'x': 2}) + S('y', {'y': 3})
>>> type(total)
E
>>> total.get_value()
5

In our factorial program, E's are implicitly created where we multiply symbols (i.e. S('n')*S('res', ...)), subtract symbols (i.e. S('n') - 1), compare symbols (i.e. S('n') == 0), and where we call symbols (i.e. S('fact')({'n': S('n')})).

So if they're implicit, why do we need to know all this? Remember, the point of Trace is not merely to "run the code" (i.e. call .get_value()), but to query the structure of the computation - and E's are one of the main things which comprise that data structure. We'll see a bit of that in the next section.

Contexts are, conceptually, mostly just dicts. They map things to other things. The two main differences between a context and an ordinary python dict are:

  • If a Symbol doesn’t have an explicit context, its context will be set to the lexically enclosing Context.
  • By calling a Context with a dict, we create a modified copy of the context.

In the example program, we create a modified copy in three places:

  • S('fact')({'n': S('n') - 1}) creates a copy of the context called 'fact' for the recursive call, just like the diagram from the previous section.
  • Context({...})({'fact': S('fact')}) is used to pass a pointer to the fact-context inside of the fact-context itself, so copies can be made.
  • S('fact')({'n': S('n')}) is just a pass-through function call.

When actually using the factorial function, we create one more modified copy: factorial({'n': 3}). This is the first copy with a value actually assigned to 'n'.

Before we jump back in to our factorial example, let’s see how these pieces play together in a simpler example:

import operator as op

half_adder = Context({
    'a': 0,
    'b': 1,
    'sum': E(op.xor, [S('a'), S('b')]),
    'carry': E(op.and_, [S('a'), S('b')])
})

This example contains two Symbols (other than the E's). Neither Symbol has an explicit context passed, so both have their context set to the enclosing Context - i.e. the object half_adder. To get the value of 'sum' within half_adder, we’d call S('sum', half_adder).get_value(). This would look up the values of S('a', half_adder) and S('b', half_adder), then pass those values to the python function op.xor. We could also evaluate at other inputs by making a modified copy - e.g. half_adder({'a': 1, 'b': 0}).
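For comparison, here is what the half_adder Context computes, written as ordinary python (a sketch to check intuitions against; it does not use tracelang):

```python
import operator as op

# Plain-python equivalent of the half_adder Context's two Expressions:
# sum = a XOR b, carry = a AND b.
def half_adder_eval(a, b):
    return {'sum': op.xor(a, b), 'carry': op.and_(a, b)}

print(half_adder_eval(0, 1))  # the Context's defaults: sum 1, carry 0
print(half_adder_eval(1, 0))  # like the modified copy half_adder({'a': 1, 'b': 0})
```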

That’s all the core pieces. Let’s take another look at our example program:

from tracelang import S, E, Context

factorial = Context({
    'fact': Context({
        'result': S(S('n') == 0, {
            True: 1,
            False: S('n')*S('result', S('fact')({'n': S('n') - 1}))
        })
    })({'fact': S('fact')}),
    'result': S('result', S('fact')({'n': S('n')}))
})

>>> S('result', factorial({'n': 3})).get_value()
6

We have two Contexts. The inner Context is our main function, but we need to use the outer Context in order to get a pointer to the inner context, so that we can make modified copies of it. There’s some code patterns which are probably unfamiliar at this point - e.g. S(S('n') == 0, ...) is used to emulate an if-statement, and we write things like S('result', fact) rather than fact['result']. But overall, hopefully the underlying structure of this code looks familiar.

But if all we wanted to do was write and run code, we wouldn’t be using Trace in the first place. Let’s probe our program a bit.

Stepping Through the Code

Human programmers sometimes “step through the code”, following the execution step-by-step to better understand what’s going on. IDEs often provide tools to help with this (e.g. breakpoints), but most programming languages don’t offer a nice way to step through the code programmatically. For Trace, this is a simple - and fundamental - use-case.

Here’s how we step through some Trace code.

We start with our final output, e.g. answer = S('result', factorial({'n': 3})). Before, we called answer.get_value() on this object, but now we won't. Instead, we'll access the pieces which went into that Symbol: answer._literal, and answer._context. In general, we can "work backwards" in three possible "directions":

  • If answer._literal is a Symbol/Expression, then we can step back through it, and/or we can get its value
  • If answer._context is a Symbol/Expression, then we can step back through it, and/or we can get its value
  • Once we have both values, we can look up answer._context[answer._literal] to find the Symbol/Expression/Value defining answer in its context.
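These three directions suggest a simple recursive walker. Here is a hedged sketch, using a hypothetical stand-in for the Symbol class rather than the real tracelang internals; a full walker would also handle Expression objects and call get_value where a context needs resolving:

```python
class S:
    # Hypothetical stand-in for Trace's Symbol, with just the two
    # attributes the stepping procedure inspects.
    def __init__(self, literal, context=None):
        self._literal = literal
        self._context = context

def step_back(node, visited=None):
    """Collect every piece reachable by stepping back through a Symbol tree."""
    if visited is None:
        visited = []
    visited.append(node)
    if isinstance(node, S):
        # Directions 1 and 2: the literal and the context may themselves
        # be Symbols, so recurse into each.
        step_back(node._literal, visited)
        step_back(node._context, visited)
    return visited

answer = S('result', S('fact'))
pieces = step_back(answer)
# pieces holds: outer Symbol, 'result', inner Symbol, 'fact', None
```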

In this case, the literal is not a Symbol, but the context is - it’s an Expression object, which performs the modified-copy operation on our factorial context. By calling answer._context.get_value(), we get a new Context, which is a copy of factorial with the modification {n: 3} applied. By looking at the Expression object itself, we can see the original factorial context and the {n: 3}: answer._context._literal is a list containing factorial and {n: 3}.

Let's go one step further in: we'll set last_step = answer._context.get_value()[answer._literal], and look at last_step.

Now we get an object which looks like S('result', S('fact', <modified copy>)({'n': S('n', <modified copy>)})), where the modified copy is the copy of factorial with {n: 3} applied. The outermost symbol once again has a string as literal, and its context is an Expression object performing the modified-copy operation on a Context. Calling .get_value() on the Expression last_step._context would lead us even further in.

Now, obviously this is not a very convenient way for a human to trace through a program’s execution. But if we want to write programs which trace through other programs’ execution, then this looks more reasonable - there’s a relatively small number of possibilities to check at every step, a relatively small number of object types to handle, and we have a data structure which lets us walk through the entire program trace.

Definitely Real User Testimony

To wrap it up, here are some endorsements from enthusiastic Trace users.

“Trace is an AI-oriented programming language for people who like Lisp, but think it doesn't go far enough.” - Ada Lovelace

“Isn’t this just math?” - Charles Babbage

“Trace combines the syntax of JSON with the semantics of a spreadsheet, but instead of just ending up horrendously hackish, it ends up horrendously abstract and horrendously hackish.” - John Von Neumann

“In Trace, source code is truly just data, always.” - Alan Turing


[AN #90]: How search landscapes can contain self-reinforcing feedback loops

March 11, 2020 - 20:30
Published on March 11, 2020 5:30 PM GMT

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

Audio version here (may not be up yet).


Demons in Imperfect Search (John S Wentworth) (summarized by Asya): This post gives an analogy to explain optimization demons: a type of undesirable behavior that arises in imperfect search processes. In the analogy, a ball rolls down a hill trying to go as far down as possible, mimicking a gradient descent algorithm. The ball is benefited by random noise, but still basically only experiences local changes in slope-- it cannot see steep drop-offs that are a little off to the side. Small bumps in the hill can temporarily alter the ball's trajectory, and the bumps that are selected for are the ones that most effectively control its trajectory. In this way, over time the ball's trajectory selects for demons: twisty paths with high walls that keep the ball contained and avoid competing walls. Demons cause the ball to go down the hill as slowly as possible so that potential energy is conserved for avoiding competitor walls.

The general pattern this analogy is meant to elucidate is the following: In any imperfect search mechanism with a rich enough search space, a feedback loop can appear that creates a more-and-more perfect exploitation of the imperfect search mechanism, resulting in a whole new optimization process. The post gives several real world examples as proofs that this is a failure mode that happens in real systems. One example is metabolic reactions-- a chemical system searches by making random small changes to the system state while trying to minimize free energy. Biological systems exploit the search by manipulating the height of the barriers between low-free-energy states, raising or lowering the activation energies required to cross them. After enough time, some chemicals changed the barriers enough such that more copies of the chemicals were made, kicking off an unstable feedback loop that led to life on earth.

The post ends by posing an open question: what is it about a system that makes this kind of failure mode likely to happen?

Asya's opinion: I think it's worth spelling out how this is different from the failure modes described in Risks from Learned Optimization (AN #58). In Risks from Learned Optimization, we are concerned that the outer optimizer will produce an unaligned inner optimizer because we're training it in diverse environments, and an inner optimizer may be the best solution for performing well in diverse environments. In this post, we are concerned that the outer optimizer will produce an unaligned demon (which may or may not be an optimizer) because the search process may have some self-reinforcing imperfections that allow it to be pushed strongly in a direction orthogonal to its objective. This direction could be bad unless the original outer objective is a perfect specification of what we want. This means that even if the conditions for mesa-optimization don't hold-- even if we're training on a fairly narrow task where search doesn't give an advantage-- there may be demon-related failure modes that are worth thinking about.

I really like this post, I think it crystallizes an important failure mode that I haven't seen described before. I'm excited to see more work on this class of problems.

Tessellating Hills: a toy model for demons in imperfect search (DaemonicSigil) (summarized by Asya): This post is trying to generate an example of the problem outlined in 'Demons in Imperfect Search' (summarized above): the problem where certain imperfect search processes allow for self-reinforcing behavior, 'demons', that push in a direction orthogonal to the original objective.

The post runs a simple gradient descent algorithm in an artificially constructed search space. The loss function that defines the search space has two major parts. One part straightforwardly tries to get the algorithm to move as far as it can in a particular direction x0 -- this represents our original objective function. The other part can be thought of as a series of periodic 'valleys' along every other axis (x1 ... xn), which get steeper the farther you go along that axis.

When running the gradient descent, at first x0 increases steadily, and the other coordinates wander around more or less randomly. In the second phase, a self-reinforcing combination of valleys (a "demon") takes hold and amplifies itself drastically, feeding off the large x0 gradient. Finally, this demon becomes so strong that the search gets stuck in a local valley and further progress stops.
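The post's exact loss function isn't reproduced in this summary, but a minimal numpy sketch in the same spirit might look like the following. The loss here is an assumed illustration only: a linear reward along x0, plus valleys along the other axes whose walls steepen with distance; the coupling to the x0 gradient that actually feeds the demon is omitted for brevity, so this shows the landscape's shape rather than the feedback loop itself.

```python
import numpy as np

def loss(x):
    # Illustrative only (not the original post's exact loss): reward
    # progress along x0, plus periodic valleys along the remaining axes
    # whose walls steepen as those coordinates grow.
    return -x[0] + np.sum((1 + np.abs(x[1:])) * (1 - np.cos(x[1:])))

def grad(x, eps=1e-5):
    # Finite-difference gradient, to keep the sketch short.
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (loss(x + d) - loss(x - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=8)
for _ in range(2000):
    x = x - 0.01 * grad(x)

# x0 climbs steadily (its gradient is a constant -1), while the other
# coordinates settle into the nearest valley.
print(x[0])
```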

Asya's opinion: I think this is a good illustration of the problem specified in Demons in Imperfect Search. Clearly the space has to have a fairly specific shape, so the natural follow-up question, as is posed in the original post, is to think about what cases cause these kinds of self-reinforcing search spaces to arise.

Technical AI alignment

Agent foundations

A critical agential account of free will, causation, and physics (Jessica Taylor)

Subjective implication decision theory in critical agentialism (Jessica Taylor)


Historic trends in technological progress (AI Impacts) (summarized by Nicholas): One key question in thinking about AGI deployment and which safety problems to focus on is whether technological progress will be continuous or discontinuous. AI Impacts has researched the frequency of discontinuities in a number of case studies, which were selected for their potential to contain discontinuities. An example of a discontinuity in flight speed records would be the Fairey Delta 2 flight in 1956, which represented 19 years of progress at the previous trend. On the other hand, penicillin did not create a discontinuity of more than ten years in the number of deaths from syphilis in the US. This post summarizes a number of those case studies. As it is already a summary, I will just refer you to the post for more information.

Nicholas's opinion: I’m looking forward to reading AI Impacts’ conclusions after completing these case studies. My impression from reading through these is that discontinuities happen, but rarely, and small discontinuities are more common than larger ones. However, I remain uncertain of a) how relevant each of these examples is to AI progress, and b) if I missed any key ways in which the examples differ from each other.

Read more: Incomplete case studies of discontinuous progress

Miscellaneous (Alignment)

Cortés, Pizarro, and Afonso as Precedents for Takeover (Daniel Kokotajlo) (summarized by Matthew): This post lists three historical examples of how small human groups conquered large parts of the world, and shows how they are arguably precedents for AI takeover scenarios. The first two historical examples are the conquests of American civilizations by Hernán Cortés and Francisco Pizarro in the early 16th century. The third example is the Portuguese capture of key Indian Ocean trading ports, which happened at roughly the same time as the other conquests. Daniel argues that technological and strategic advantages were the likely causes of these European victories. However, since the European technological advantage was small in this period, we might expect that an AI coalition could similarly take over a large portion of the world, even without a large technological advantage.

Matthew's opinion: In a comment, I dispute the claimed reasons for why Europeans conquered American civilizations. I think that a large body of historical literature supports the conclusion that American civilizations fell primarily because of their exposure to diseases which they lacked immunity to, rather than because of European military power. I also think that this helps explain why Portugal was "only" able to capture Indian Ocean trading ports during this time period, rather than whole civilizations. I think the primary insight here should instead be that pandemics can kill large groups of humans, and therefore it would be worth exploring the possibility that AI systems use pandemics as a mechanism to kill large numbers of biological humans.

AI strategy and policy

Activism by the AI Community: Analysing Recent Achievements and Future Prospects (Haydn Belfield) (summarized by Rohin): The AI community has been surprisingly effective at activism: it has led to discussions of a ban on lethal autonomous weapons systems (LAWS), created several initiatives on safety and ethics, and has won several victories through organizing (e.g. Project Maven). What explains this success, and should we expect it to continue in the future? This paper looks at this through two lenses.

First, the AI community can be considered an epistemic community: a network of knowledge-based experts with coherent beliefs and values on a relevant topic. This seems particularly relevant for LAWS: the AI community clearly has relevant expertise to contribute, and policymakers are looking for good technical input. From this perspective, the main threats to future success are that the issues (such as LAWS) become less novel, that the area may become politicized, and that the community beliefs may become less cohesive.

Second, the AI community can be modeled as organized labor (akin to unions): since there is high demand for AI researchers, their output is particularly important for company products, and the companies are more vulnerable to public pressure, AI researchers wield a lot of soft power when they are united. The main threat to this success is the growing pool of talent that will soon be available (given the emphasis on training experts in AI today), which will reduce the supply-demand imbalance, and may reduce how committed the AI community as a whole is to collective action.

Overall, it seems that the AI community has had good success at activism so far, but it is unclear whether it will continue in the future.

Rohin's opinion: I think the ability of the AI community to cause things to happen via activism is quite important: it seems much more likely that if AI x-risk concerns are serious, we will be able to convince the AI community of them, rather than say the government, or company executives. This mechanism of action seems much more like the "epistemic community" model used in this paper: we would be using our position as experts on AI to convince decision makers to take appropriate precautions with sufficiently powerful AI systems. Applying the discussion from the paper to this case, we get the perhaps unsurprising conclusion that it is primarily important that we build consensus amongst AI researchers about how risky any particular system is.

Beyond Near- and Long-Term: Towards a Clearer Account of Research Priorities in AI Ethics and Society (Carina Prunkl and Jess Whittlestone) (summarized by Rohin): This paper argues that the existing near-term / long-term distinction conflates four different axes on which research could differ: the capability level of AI systems (current pattern-matching systems vs. future intelligent systems), the impacts of AI systems (impacts that are being felt now like fairness vs. ones that will be felt in the future like x-risks), certainty (things that will definitely be problems vs. risks that are more speculative) and extremity (whether to prioritize particularly extreme risks). While there are certainly correlations across these axes, they are not the same thing, and discourse would be significantly improved by disambiguating the axes. For example, both authors of the paper see their work as considering the medium-to-long-term impacts of near-to-medium-term AI capabilities.

Rohin's opinion: I definitely agree that near-term and long-term often seem to mean many different things, and I certainly support efforts to be more precise in our language.

While we're talking about near-term and long-term, I'll add in my own gripe: "long-term" implies that the effects will be felt only in the far future, even though many people focused on such effects are doing so because there's a significant probability of such effects being felt in only a few decades.

Exploring AI Futures Through Role Play (Shahar Avin et al) (summarized by Rohin): This paper argues that role playing (akin to the "wargames" used in the military) is a good way to explore possible AI futures, especially to discover unusual edge cases, in a 10-30 year time horizon. Each player is assigned a role (e.g. director of AI at Tencent, or president of the US) and asked to play out their role faithfully. Each game turn covers 2 simulated years, in which players can negotiate and take public and private actions. The game facilitator determines what happens in the simulated world based on these actions. While early games were unstructured, recent games have had an AI "tech tree", that determines what AI applications can be developed.

From the games played so far, the authors have found a few patterns:

- Cooperation between actors on AI safety and (some) restriction on destabilizing uses of AI seem to both be robustly beneficial.

- Even when earlier advances are risky, or when current advances are of unclear value, players tend to pursue AI R&D quite strongly.

- Many kinds of coalitions are possible, e.g. between governments, between corporations, between governments and corporations, and between sub-roles within a corporation.

Rohin's opinion: It makes sense that role playing can help find extreme, edge case scenarios. I'm not sure how likely I should find such scenarios -- are they plausible but unlikely (because forecasting is hard but not impossible), or are they implausible (because it would be very hard to model an entire government, and no one person is going to do it justice)? Note that according to the paper, the prior literature on role playing is quite positive (though of course it's talking about role playing in other contexts, e.g. business and military contexts). Still, this seems like quite an important question that strongly impacts how seriously I take the results of these role playing scenarios.

Other progress in AI

Deep learning

Speeding Up Transformer Training and Inference By Increasing Model Size (Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin et al) (summarized by Rohin): This blog post and associated paper confirm the findings from Scaling Laws for Neural Language Models (AN #87) that the most efficient way to train Transformer-based language models is to train very large models and stop before convergence, rather than training smaller models to convergence.

Read more: Paper: Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers


Why don't singularitarians bet on the creation of AGI by buying stocks?

March 11, 2020 - 19:27
Published on March 11, 2020 4:27 PM GMT

With the recent stock market sale, I've been looking over stocks to see which seem to be worth buying. (As background information, I'm buying stocks to have fun and bet my beliefs. If you believe in the efficient market hypothesis, buying stocks at random should perform roughly as well as the market as a whole, and the opportunity cost doesn't seem super high. Making a 40% return buying $ZM right before investors started paying attention to COVID-19 hasn't discouraged me.)

Standard investing advice is to stay diversified. I take the diversification suggestion further than most: A chunk of my net worth is in disaster preparation measures, and I'm also buying stock in companies like Facebook and Uber that I think have a nontrivial shot at creating AGI.

The stock market's "ostrich view" of AGI

AGI could be tremendously economically valuable. So one question about companies with AI research divisions is whether the possibility that they'll create AGI is already "priced in" to their stock. If the company's share price already reflects the possibility that they'll create this transformative technology, we shouldn't expect to make money from the singularity in expectation by buying their stock.

To investigate this question, let's examine Alphabet Inc's share price around the time AlphaGo defeated world Go champion Lee Sedol in March 2016. This has been referred to as a "Sputnik moment" that inspired the Chinese government to start investing billions of dollars in AI research. As a reminder of the kind of conversations that were being had at the time, here are some quotes from a Facebook post Eliezer Yudkowsky made:


– Rapid capability gain and upward-breaking curves.

“Oh, look,” I tweeted, “it only took 5 months to go from landing one person on Mars to Mars being overpopulated.”


We’re not even in the recursive regime yet, and we’re still starting to enter the jumpy unpredictable phase where people are like “What just happened?”


One company with a big insight jumped way ahead of everyone else.


AI is either overwhelmingly stupider or overwhelmingly smarter than you. The more other AI progress and the greater the hardware overhang, the less time you spend in the narrow space between these regions. There was a time when AIs were roughly as good as the best human Go-players, and it was a week in late January.

Here is Alphabet's historical stock price. Can you spot the point at which AlphaGo defeated the world Go champion?

That's right, there it is in March 2016:

$GOOG was worth about $727 on March 11, a Friday. That weekend, a Go commentator wrote:

AlphaGo made history once again on Saturday, as the first computer program to defeat a top professional Go player in an even match.

In the third of five games with Lee Sedol 9p, AlphaGo won so convincingly as to remove all doubt about its strength from the minds of experienced players.

In fact, it played so well that it was almost scary.


In forcing AlphaGo to withstand a very severe, one-sided attack, Lee revealed its hitherto undetected power.

Maybe if Game 3 had happened on a weekday, $GOOG would've moved appreciably. As it is, Game 4 happened on Sunday, and AlphaGo lost. So we don't really know how the market would react to the creation of an "almost scary" AI whose strength none could doubt.

Even so, the market's non-response to AlphaGo's world championship, and its non-response to AlphaZero beating the world's best chess program after 4 hours of self-play in December 2017, seem broadly compatible with a modified version of Alex_Shleizer's claim regarding COVID-19:

...consider [the singularity]: the ability to predict the market suddenly shifted from the day to day necessary data analysis for stock price prediction that revolves more around business KPIs and geopolitical processes to understanding [artificial intelligence]. How many of the Wall Street analysts are [artificial intelligence] experts would you think? Probably very few. The rules have changed and prior data analysis resources (currently hired analysts) became suddenly very inefficient.

Either that, or all the true AI experts (you know, the ones who spend their time trading stocks all day instead of doing AI research) knew that AlphaWhatever was a big nothingburger all along. 0% chance of transformation, no reason to buy $GOOG.

Even if the singularity is transformative, you want to be a shareholder

One objection goes something like: Yes indeed, I do have insider information, due to reading LessWrong dot com, that there will be this transformative change due to AI. And yes indeed, it appears the market isn't pricing this info in. But, I can't stand to profit from this info because as soon as the singularity happens, money becomes worthless! Either we'll have a positive singularity, and material abundance ensues, or we'll have a negative singularity, and paperclips ensue. That's why my retirement portfolio is geared towards business-as-usual scenarios.

Here are some objections to this line of reasoning:

  • First, singularity stocks could serve as a hedge. If your singularity stocks rapidly increase in value, you can sell them off, quit your job, and work furiously on AI safety full-time. In fact, maybe someone should create an EA organization that invests in singularity stocks and, in the event of an impending singularity, sells those stocks and starts implementing predefined emergency measures. (For example, hire a world class mediator and invite all the major players in AI to stay at a luxury hotel for a week to discuss the prevention of arms races.)

  • Second, being a shareholder in the singularity could help you affect it. Owning stock in the company will give you moral authority to comment on its direction at the company's annual shareholder meeting.

  • Finally, maybe your clickbait journalist goggles are the pair you want to wear, and the singularity will result in a hypercapitalist tech dystopia with cartoonish levels of inequality. In which case you'll be glad you have lots of money -- if only so you can buy boatloads of insecticide-treated bednets for the global poor.

More 2020 stock sales

I suspect Monday will not be the only time stocks go on sale in 2020. Given the possibility of future flash sales, I suggest you get set up with a brokerage now. I recommend Charles Schwab. (They also have a great checking account.)

If you believe in the singularity, why aren't you betting on it?