
RSS Feed Aggregator

The Autofac Era

LessWrong.com News - September 24, 2025 - 21:20
Published on September 24, 2025 6:20 PM GMT

If we don’t build AGI in the next 5 years, I think it’s likely that we end up in what I’m calling the Autofac Era, where much of the economy is automated but humans continue to drive all economic activity as consumers. In this post I’ll explain why I think it might happen, what I expect it to look like, how you can prepare for it, and how it will end.

NB: This is an informal model. I haven’t done extensive research. It’s primarily based on my 25+ years of experience thinking about AI and my read of where we are today and what I think is likely to happen in the next few years. I strongly invite you to poke holes in it, or offer hard evidence in its favor. If you’d prefer to read a better researched model of how AI changes the world, I’d recommend AI 2027.

What is Autofac?

The name Autofac is drawn from a 1955 short story by science fiction author Philip K. Dick. In it, humans live in a world overrun by self-replicating factories that deliver an endless supply of consumer goods. The exact details of the story aren’t important for the model, though, other than inspiring the name, which I chose because it assumes we need to keep the consumer economy going even though most economic goods and services are provided by AI.

My background assumption is that the economy will remain based on human consumption of goods and services. At first this will primarily be because it’s how the economy already works, but later it’ll be because, without AGI, humans are the only source of wanting stuff. Tool AI would be just as happy to sit turned off, consuming no power and doing nothing, so an AI economy without AGI only makes sense, best I can tell, if there’s humans who want stuff to consume.

The development of AGI would obviously break this assumption, as would tool AI that autonomously tries to continue to deliver the same outcomes it was created for even if there were no humans around to actually consume them (a paperclip maximizer scenario, which is surprisingly similar to the Autofacs in PKD’s story).

How does the Autofac Era happen?

To get to the Autofac Era, it has to be that we don’t develop AGI in the next few years. I’m saying 5 years to put a hard number on it, but it could be more or less depending on how various things play out.

I personally think an Autofac scenario is likely because we won’t be able to make the conceptual breakthroughs required to build AGI within the next 5 years, specifically because we won’t be able to figure out how to build what Steve Byrnes has called the steering subsystem, even with help from LLM research assistants. This will leave us with tool-like AI that, even if it’s narrowly superintelligent, is not AGI because it lacks an internal source of motivation.

I put about 70% odds on us failing to solve steering in the next 5 years and thus being unable to build AGI. That’s why I think it’s interesting to think about an Autofac world. If you agree, great, let’s get to exploring what happens in the likely scenario that AGI takes 5+ years to arrive. If you disagree, then think of this model as exploring what you believe to be a low-probability hypothetical.

What will the Autofac Era be like?

Here’s roughly what I expect to happen:

  • AI becomes capable of automating almost all mental labor within 3 years. Humans continue to be in the loop, but only to steer the AI towards useful goals. Many successful companies are able to run with just 1 to 3 humans doing the steering.

    • This is based on extrapolation of current trends. It might happen slightly sooner or slightly later; 3 years is a median guess.

  • Shortly after, say in 1 to 2 years, AI becomes capable of automating almost all physical labor, with again the need for supervisors to steer the AI towards useful goals.

    • The delay is because of ramp times to manufacture robots and organized resistance by human laborers. I could also be wrong that there’s a delay and this could happen concurrently with the automation of mental labor, shortening the timeline.

  • The full automation of most economic activity allows rapid economic growth, with total economic output doubling times likely in the range of 1 to 3 years (a short growth-rate calculation after this list puts these numbers in perspective). Because this dramatically outstrips the rate at which humans can reproduce and there’s no AGI to eat up the economic excess, there’s tremendous surplus that creates extreme wealth for humanity.

  • Because we don’t have AGI, our powerful AI tools (which we may or may not consider to be superintelligent) remain tools, and thus humans retain at least nominal control because humans are the only source of motivation to do work.

  • Therefore, power structures have to remain mostly in place and controlled by humans, though potentially transformed by the need to respond to AI tools that eliminate certain kinds of friction that keep existing systems working today. There will still be states and laws and civilizational coordination norms and a legal monopoly on violence to keep the whole system functioning.

  • At this point, there are only a few jobs left for humans that fit roughly within three categories:

    • Source of motivation (+ ownership)

      • Business executive (which consists of steering AI agents towards goals, and may require some degree of technical expertise to get good results)

      • Investors & board members (though expect lots of AI automation of investment activities)

      • Journalists (with lots of AI automation of research, but only people know what we care about knowing)

      • Religious/spiritual leadership (though expect some AI cults)

      • Landlords (though expect robots to do maintenance, negotiate leases, etc.)

    • Human is the product/service

      • Caring professions (in cases where human connection is what’s valuable, like therapists, some nurses, and some doctors)

      • High-end chefs (luxury good)

      • Teachers and nannies (luxury good or people completely reject AI involvement)

      • Arts (rejection of AI slop, though expect many people with little taste to favor slop as they already do today)

      • Sex work (luxury good that competes with AI)

    • Monopoly on violence/coercion

      • Lawyers and judges and police (though expect a lot of automation of law research and basic police work)

      • Political leadership (humans remain in at least nominal control of state power, though they rely heavily on AI advisors to make decisions)

      • Military leadership (robots can do the fighting, but humans ultimately control when to pull the trigger in a metaphorical sense, as many war-fighting robots will have autonomous authority to kill barring a new Geneva Convention to coordinate on war norms)

      • Protected professions (any group who manages to capture regulators to protect themselves from automation)

  • Despite a limited number of jobs, humans remain critical to the economy as consumers. If we don’t keep a large consumer base, the entire economic system collapses and we no longer have the money to fund AI.

  • To solve this problem, and to prevent revolution, states set up a self-sustaining consumer economy using UBI built off public investment in private companies. The most common job thus becomes “investor”, but all such investment is done indirectly with investments and payments managed by the state with people receiving fixed UBI payments monthly or weekly.

    • This creates a large “underclass” of people whose only source of income is UBI. Some people use their UBI wisely, invest it privately, and maintain something like a middle class lifestyle (relatively speaking; they are in fact quite wealthy in absolute terms). Others are trapped, either by circumstances or high time preference, and live UBI-check to UBI-check with no savings or investments, but still live excellent lives with abundant food, medical care, and entertainment options.

    • The effect of all this is that every human gets richer than they are today, and income inequality goes way up. The “underclass” live lives of incredible luxury by modern and historical standards, but feel “poor” because other people are trillionaires.

    • How bad it is to live with inequality depends a lot on what happens with scarce resources. If housing remains supply constrained, this would be really bad, but AI robotics should make construction cheaper. I’m more worried about electricity, but realistically I think the Autofac Era will not consume all available power before it ends.

  • States continue to function much as they do today, although all states with real power have heavy automation of all economic, military, and political activity. Expect the US and China to remain dominant, with their client states generally benefitting, and outsider states benefitting from surplus, some degree of goodwill, and a desire to police the world to minimize disruptive conflicts.
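
To put the doubling times flagged in the list above in perspective, here is a minimal Python sketch (my own, not from the post) converting an output doubling time into an implied annual growth rate and a cumulative multiple over a decade. The 1-to-3-year doubling times are the post’s assumption; the ~3% comparison figure is approximate recent global GDP growth.

    # Convert an economic-output doubling time into an implied annual growth
    # rate and the cumulative multiple after a decade. Assumes smooth
    # exponential growth, which is only a rough idealisation of the scenario.
    import math

    def annual_growth_rate(doubling_time_years: float) -> float:
        """Annual growth rate implied by a given doubling time."""
        return 2 ** (1 / doubling_time_years) - 1

    def multiple_after(years: float, doubling_time_years: float) -> float:
        """Factor by which output grows after `years` of steady doubling."""
        return 2 ** (years / doubling_time_years)

    for t in (1, 2, 3):  # the 1-3 year doubling times from the post
        print(f"doubling every {t}y -> {annual_growth_rate(t):.0%}/year, "
              f"~{multiple_after(10, t):.0f}x output after 10 years")

    # For comparison, ~3%/year growth implies a doubling time of roughly
    # 23 years (log(2) / log(1.03)).
    print(f"3%/year -> doubling every {math.log(2) / math.log(1.03):.0f} years")

Even the slow end of that range (doubling every 3 years, roughly 26% annual growth) is close to an order of magnitude faster than the recent global economy, which is the sense in which the surplus in this scenario is “tremendous”.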

The above is what I view as the “happy” path. There are lots of ways this doesn’t play out the way I’ve described, or plays out in a similar but different way. Maybe people coordinate to push back hard against automation and slow AI automation. Maybe AI enables biological warfare that kills most of humanity. Maybe there’s nuclear exchanges. Maybe AI-enabled warfare damages communication or electrical systems in ways that destroy modern industry. There’s lots of ways the exact scenario I lay out doesn’t happen.

Lots of folks have explored the many risks of both tool-like AI and AGI, and I highly recommend reading their work. In the interest of quantification, if I screen off existential risks from AGI/ASI, I’d place something like 35% odds on not seeing a world that looks basically like the happy path because of some kind of non-existential AI disaster.

I’ve also assumed that we continue with something like a capitalist system. Maybe there’s so much surplus that we have a political revolution and try central planning again, but this time it actually works thanks to AI. Such a world would feel quite a bit different from the scenario I’ve described, but would share many of the core characteristics of my model.

How can I prepare for the Autofac Era?

The best way to prepare is by owning capital, either directly or through investment vehicles like stocks and bonds. I won’t give you any advice on picking winners and losers. I’ll just suggest at least following the default advice of holding a large and diversified portfolio.

You could also try to have the skills and connections necessary to continue to be employed. This is a high-risk strategy, as there’ll be a lot of competition for a much more limited number of roles. If you’re not in the top 10% in some way for your preferred role, you’re unlikely to stand a chance. If you pursue this path, have investments as a backup.

You’ll also be fine if you just don’t prepare. Life in the “underclass” will be coded as low status and lock you out of access to luxury goods, but you’ll still live a life full of what, by historical standards, would be luxuries. This is perhaps comparable to what happened during the Industrial Revolution, except without the tradeoff of the “underclass” having to accept poor working conditions to get access to those luxuries.

That said, many people will find life in the “underclass” depressing. We know that humans care a lot about relative status. If they compare themselves to people with investments or jobs who can afford luxury goods, they may feel bad about themselves. A lot of people who aren’t used to being “poor” will suddenly find themselves in that bucket, even if being “poor” is extremely comfortable. My hope is that people continue to do what they’ve been doing for a while and develop alternative status hierarchies that allow everyone to feel high status regardless of their relative economic station.

How does the Autofac Era end?

I see it ending in one of three ways.

One is that there’s an existential catastrophe. Again, lots of other people have written on this topic, so I won’t get into it.

Another way for the Autofac Era to end is stagnation, with the economy growing to the carrying capacity of the Sun. If we never make the breakthrough to AGI, we will eventually transition to a Malthusian period that can only be escaped by traveling to other stars to harness their energy. If that happens, the Autofac Era won’t really end, but a world with no growth would look very different from the one I’ve described.

Finally, the Autofac Era ends if we build AGI. This is the way I actually expect it to end. My guess is that the Autofac Era will only last 5 to 10 years before we succeed in creating AGI, and the onramp to AGI might even be gradual if we end up making incremental progress towards building steering subsystems for AI. At that point we transition to a different and, to be frank, much more dangerous world, because AGI may not care about humans the way tool-like AI implicitly does; tool AI cares about what humans care about only instrumentally. For more on such a world, you might read the recently released If Anyone Builds It, Everyone Dies.




"Shut It Down" is simpler than "Controlled Takeoff"

LessWrong.com News - September 24, 2025 - 20:21
Published on September 24, 2025 5:21 PM GMT

Two somewhat different plans for buying time and improving AI outcomes are: "Global Shutdown" and "Global Controlled Takeoff."

(Some other plans some people believe in include "ad hoc semi-controlled semi-slowed takeoff" and "race, then burn the lead on either superalignment or scary demos" and "decentralized differential defensive tech world". I mostly don't expect those to work, but am mostly not talking about them in this post.)

"Global Shutdown" and "Global Controlled Takeoff" both include an early step of "consolidate all GPUs and similar chips into locations that can be easily monitored." 

The Shut Down plan then says things like "you cannot do any frontier development with the consolidated GPUs" (maybe you can use GPUs to run existing models that seem pretty safe, depending on implementation details). Also, maybe, any research into new algorithms needs to be approved by an international org, and frontier algorithm development is illegal. (This is maybe hard to enforce, but it might dramatically reduce the amount of R&D that goes into it, since you can't be a billion-dollar company that straightforwardly pours tons of resources into it without going to jail.)

Controlled Takeoff says instead (as I currently understand advocates to advocate) something like "Frontier research continues, slowly, carefully, leveraging frontier controlled AI to do a ton of alignment research."

I'm generally pro "Shut It Down", but I also think Global Controlled Takeoff is much better than the status quo (both because it seems better in isolation, and because achieving it makes Shut Down easier), and I see some of the appeal depending on your exact beliefs.

But, some notes on strategy here.

"What's more impossible?"

A lot of AI safety arguments boil down to "what seems least impossible?". Is it more impossible to get a Global Shutdown, or to solve safe superintelligence with anything remotely like our current understanding or the understanding we're likely to get over the next 5-10 years?

I've heard a number of people say flatly "you're not going to get a global shut down", with a tone of finality that sounds like they think this is basically impossible. 

I'm not entirely sure I've correctly tracked which people are saying which things and whether I'm accidentally conflating statements from different people. But I think I've heard at least some people say "you're not getting a shut down" with that tone, who nonetheless advocate for controlled takeoff.

I certainly agree getting a global shutdown is very hard. But it's not obvious to me that getting a global controlled takeoff is much easier.

Two gears I want to make sure people are tracking:

Gear 1: "Actually consolidate the GPUs" is a huge political lift, regardless. 

By the time you've gotten various world powers and corporations to do this extremely major, expensive action, I think something has significantly changed about the political landscape. I don't see how you'd get it without world leaders taking AI more "fundamentally seriously", in a way that would make other expensive plans a lot more tractable.

Gear 2: "You need to compare the tractability of Global Shut Down vs Global Controlled Takeoff That Actually Works, as opposed to Something That Looks Close To But Not Actually A Controlled Takeoff."

Along with Gear 3: "Shut it down" is much simpler than "Controlled Takeoff."

A Global Controlled Takeoff That Works has a lot of moving parts. 

You need the international agreement to be capable of making any kind of sensible distinctions between safe and unsafe training runs, or even "marginally safer" vs "marginally less safe" training runs. 

You need the international agreement to not turn into molochian regulatory-captured horror that perversely reverses the intent of the agreement and creates a class of bureaucrats who don't know anything about AI and use the agreement to dole out favors.

These problems still exist in some versions of Shut It Down too, to be clear (if you're trying to also ban algorithmic research – a lot of versions of that seem like they leave room to argue about whether agent foundations or interpretability count). But, they at least get coupled with "no large training runs, period."

I think "guys, everyone just stop" is a way easier schelling point to coordinate around, than "everyone, we're going to slow down and try to figure out alignment as best we can using current techniques."

So, I am not currently convinced that Global Controlled Takeoff That Actually Works is any more politically tractable than Global Shut Down.

(Caveat: Insofar as your plan is "well, we will totally get a molochian moral maze horror, but, it'll generally move slower and that buys time", eh, okay, seems reasonable. But, at least be clear to yourself about what you're aiming for)

Gear 4: Removing pressure to accelerate is valuable for the epistemics of the people doing the AI-assisted alignment (if you're trying that).

One reason I think the Anthropic plan is actively bad, instead of "at-least-okay-ish," is that (given how hard they seem to actively oppose any kind of serious regulation that would slow them down), they seem intent on remaining in a world where, while they are supposedly working on aligning the next generation of AI, they have constant economic pressure to ship the next thing soon.

I believe, maybe, you can leverage AI to help you align AI.

I am pretty confident that at least some of the tools you need to navigate aligning unbounded superintelligence (or confidently avoiding creating unbounded superintelligence) involve "precise conceptual reasoning" of a kind Anthropic et al. seem actively allergic to. (See also behaviorism vs cognitivism. Anthropic culture seems to actively pride itself on empirics and be actively suspicious of attempts to reason ahead without empirics.)

I'm not confident that you need that much precise conceptual reasoning / reasoning ahead. (MIRI has an inside view that says this is... not, like, impossibly hard, but hard in a fairly deep way that nobody is showing respect for. I don't have a clear inside view about "how hard is it", but I have an inside view that it's harder than Anthropic's revealed actions suggest it is.)

I think thinking through this and figuring out whether you need conceptual tools that you aren't currently good at in order to succeed is very hard, and people are extremely biased about it.

I think the difficulty is exacerbated further if your competitor is shipping the next generation of product, and you know in your heart that you're reaching ASL danger levels that should at least give you some pause, but the evidence isn't clear, and it would be extremely convenient for you and your org if your current level of control/alignment were sufficient to run the next training run.

So a lot of what I care most about with Shutdown/Controlled-Takeoff is making it no longer true that there is an economic incentive to rush ahead. (I think either a Shutdown-y or a Controlled-Takeoff-y approach can potentially work for this, if there's actually a trusted third party, with the guns and compute, who makes the calls about whether the next training run is allowed.)

 

Gear 5: Political tractability will change as demos get scarier. 

I'm not super thrilled with the "race to the edge, then burn the lead on scary demos" plan (specifically the "racing" part). But, I do think we will get much scarier demos as we approach AGI. 

Politicians maybe don't understand abstract arguments (although I think responses to If Anyone Builds It suggest they at least sometimes do). But I think there are various flavors of Sufficiently Scary Demos that will make the threat much more salient without needing to route through abstract arguments.

I think one of the most important things to be preparing for is leveraging Sufficiently Scary Demos when they arrive. I think this includes beginning to argue seriously now for global-treaty-shaped things and having them on hand, so people go "oh, okay, I guess we do need that thing Those Guys Were Talking About After All" instead of just being bewildered.

 

Gears rather than Bottom Lines

I won't claim that any of the above should be decisive in anyone's decisionmaking. I'm still processing the update that I can't really simulate the entire political disagreement yet, and I'm not sure what other gears I'm missing from other people's perspectives.

But, these are all individual gears that seem pretty important to me, which I think should be part of other people's overall strategizing.

I have a similar point comparing the feasibility of "Global Shut Down" vs "Decentralized Differentially Defensive Tech world that Actually Works", but, that's a fairly complex and different argument.

  1. ^

    To be clear, this bias also applies in the MIRI-esque direction. But they're not the ones rushing ahead inventing AGI.




AISN #63: California’s SB-53 Passes the Legislature

LessWrong.com News - September 24, 2025 - 20:02
Published on September 24, 2025 5:02 PM GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: California’s legislature sent SB-53—the ‘Transparency in Frontier Artificial Intelligence Act’—to Governor Newsom’s desk. If signed into law, California would become the first US state to regulate catastrophic risk.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

A note from Corin: I’m leaving the AI Safety Newsletter soon to start law school—but if you’d like to hear more from me, I’m planning to continue to write about AI in a new personal newsletter, Conditionals. On a related note, we’re also hiring a writer for the newsletter.

California’s SB-53 Passes the Legislature

SB-53 is the Legislature’s weaker sequel to last year’s vetoed SB-1047. After Governor Gavin Newsom vetoed SB-1047 last year, he convened the Joint California Policy Working Group on AI Frontier Models. The group’s June report recommended transparency, incident reporting, and whistleblower protections as near-term priorities for governing AI systems. SB-53 (the “Transparency in Frontier Artificial Intelligence Act”) is an attempt to codify those recommendations. The California Legislature passed SB-53 on September 17th.

The introduction to SB-53’s text. Source.

Transparency. To track and respond to the risks involved in frontier AI development, governments need frontier developers to disclose the capabilities of their systems and how they assess and mitigate catastrophic risk. The bill defines a “catastrophic risk” as a foreseeable, material risk that a foundation model’s development, storage, use, or deployment will result in death or serious injury to more than 50 people, or more than $1 billion in property damage, arising from a single incident in which the foundation model does any of the following:

  • Providing expert-level assistance in the creation or release of CBRN weapons.
  • Autonomous cyberattack, murder, assault, extortion, or theft.
  • Evading the control of its frontier developer or user.

With these risks in mind, SB-53 requires frontier developers to:

  • Publish a frontier AI framework that includes (among other things) the developer’s capability thresholds for catastrophic risks, risk mitigations, and internal governance practices.
  • Review and update the framework once a year, and publish modifications within 30 days.
  • Publish transparency reports for each new frontier model, including technical specifications and catastrophic risk assessments.
  • Share assessments of catastrophic risks from internal use of frontier models with California’s Office of Emergency Services (OES) every 3 months.
  • Refrain from lying about catastrophic risks from its frontier models, its management of catastrophic risks, or its compliance with its frontier AI framework.

Incident reporting. Governments need to be alerted to critical safety incidents involving frontier AI systems—such as harms resulting from unauthorized access to model weights or loss of control of an agent—to intervene before they escalate into catastrophic outcomes. SB-53 provides that:

  • The OES will establish a hotline for reporting critical safety incidents.
  • Frontier developers are required to report critical safety incidents to the OES within 15 days, or within 24 hours if there is an imminent threat of death or serious injury.
  • Each year, the OES will produce a report with anonymized and aggregated information about critical safety incidents.

The bill’s incident reporting requirements are also designed to accommodate future federal requirements. If federal requirements for critical safety incident reporting become equivalent to, or stricter than, those of SB-53, the OES can defer to those federal requirements.

Whistleblower protection. California state authorities will need to rely on whistleblowers to report whether frontier AI companies are complying with SB-53’s requirements. Given the industry’s mixed history regarding whistleblowers, the bill provides that:

  • Frontier developers are prohibited from preventing covered employees (employees responsible for assessing, managing, or addressing risk of critical safety incidents) from reporting activities that they have reason to believe pose a specific and substantial catastrophic risk, and from retaliating against them for doing so. (Existing whistleblower protections cover all employees and any violation of law—which includes SB-53’s transparency and incident-reporting requirements.)
  • Each year, the Attorney General will publish a report with anonymized and aggregated information about reports from covered employees.

Covered employees can sue frontier developers for noncompliance with whistleblower protections, and the Attorney General is empowered to enforce the bill’s transparency and incident reporting requirements by punishing violations with civil penalties of up to $1 million per violation.

How we got here, and what happens next. SB-1047 required frontier AI developers to implement specific controls to reduce catastrophic risk (such as shutdown controls and prohibitions on releasing unreasonably risky models), and Governor Newsom vetoed the bill under pressure from national Democratic leadership and industry lobbying. Since SB-53 only implements transparency requirements—and relies on the recommendations made by the Governor’s working group—SB-53 seems more likely to be signed into law. Anthropic has also publicly endorsed the bill.

Governor Newsom has until October 12th to sign SB-53. If he does, SB-53 will be the first significant AI legislation to become law since Senator Ted Cruz pushed (and narrowly failed) to attach a 10-year moratorium on state and local AI enforcement to federal budget legislation. He has since picked up the idea again in a new proposal—which, if it gains traction, might set up a conflict between California and Washington.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe to receive future versions.

In Other News

Government

  • The Cyberspace Administration of China banned Chinese companies from buying Nvidia chips.
  • Italy approved a law regulating the use of artificial intelligence that includes criminal penalties for misuse.
  • Dozens of UK lawmakers accused Google of violating its AI safety commitments.

Industry

  • OpenAI and Anthropic published reports tracking economic patterns in how people use AI.
  • OpenAI and DeepMind both claimed gold-medal performance at the International Collegiate Programming Contest World Finals.
  • Nvidia is investing up to $100 billion in OpenAI. It’s also investing $5 billion in Intel.
  • Anthropic published a report discussing how AI is being used for cybercrime.

Civil Society

  • An open letter signed by former heads of state, Nobel laureates, and other prominent figures calls for an international agreement on clear and verifiable red lines to prevent AI risks.
  • Stanford researchers published a paper finding that employment of early-career workers in exposed industries has declined 13%.
  • AI safety activists have begun hunger strikes outside of AI company headquarters in London and San Francisco.
  • Dan Hendrycks and Adam Khoja respond to critiques of Mutually Assured AI Malfunction (MAIM).
  • Rosario Mastrogiacomo discusses how AI agents are eroding the foundations of cybersecurity.
  • Ben Brooks argues that keeping frontier AI behind paywalls could create a new form of digital feudalism.
  • Oscar Delaney and Ashwin Acharya discuss ‘the hidden frontier’ of internal models at AI companies.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.




Launching the $10,000 Existential Hope Meme Prize

LessWrong.com News - September 24, 2025 - 18:00
Published on September 24, 2025 3:00 PM GMT

Metaphors and short explanations have a large influence on how we think. But most of our common reference points lean dystopian: P(doom), the paperclip maximizer, Black Mirror, Brave New World, 1984, the list goes on. Where are the equally compelling ones for positive futures: abundant clean energy, infinite knowledge, no infectious disease, no poverty, solved aging, effective global coordination?

There has been some valuable recent work on this, such as the Better Futures series by Will MacAskill and Fin Moorhouse, which explores how to think more seriously about positive trajectories. Still, there’s a lot of space left for cultural references that can stick as easily as the dystopian ones.

As one small step toward filling this gap, we are launching the Existential Hope Meme Prize offering $10,000 for the best meme that conveys a hopeful vision of the future.

This project was first suggested at Vision Weekend Europe 2024. Many thanks to Ben Cumming and Jan Kulveit for this idea.

Details and submissions here: existentialhope.com/existential-hope-meme-prize

We’d appreciate your help spreading the word, and of course, if you make memes yourself we’d love to see your submissions!




The Chinese Room revisited: How LLMs have real (but different) understanding of words

LessWrong.com News - September 24, 2025 - 17:06
Published on September 24, 2025 2:06 PM GMT

TL;DR

The Chinese Room argument implies that computer programs can never really understand language. In this post, I’ll argue that LLMs have a limited form of language understanding through teleosemantics, i.e. the idea that meaning is acquired[1] through optimisation.

The Chinese Room

The Chinese Room argument, introduced by John Searle, argues against what he calls the Strong AI hypothesis, i.e.:

The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds. 

The thought experiment poses a challenge to computational functionalism i.e. that mental phenomena can be described purely by their functional relation to each other. Since a computer program can accurately represent these functional relationships as symbols, according to functionalism, the computer program could have genuine mental states if it ran the right program. 

In the thought experiment, Searle imagines himself sitting in a room receiving Chinese characters that are passed under the door. He consults a set of rules/instructions to manipulate and transform the symbols and passes Chinese characters back under the door. To an outside observer, the room produces fluent Chinese, yet Searle sitting inside the room does not actually understand Chinese. 

John Searle looking confused in the Chinese Room.

The original argument was advanced in 1980, over 45 years ago and well before the advent of today’s LLMs, but the implication for today’s systems is clear. Searle is arguing that current systems don’t use the words they produce meaningfully[2]. In other words, when an LLM tokens the word [dog] it doesn’t actually know what it means. To it, it’s just an empty symbol.

Steelmanning the argument

The original formulation of the argument leaves it open to substantial confusion because there are two distinct observers: Searle, sitting at his vantage point inside the room, and the room itself. If you asked the room a question it would respond in fluent Chinese, whereas if you asked Searle a question he would respond in English from his vantage point.

This distinction between the two observers has led some to advocate the Systems Reply, which asserts that the room itself understands Chinese even if Searle himself doesn’t. This essentially views Searle as a kind of computational cog (or CPU) within the system that doesn’t exhibit the necessary understanding of Chinese even though the full system does. In other words, the Systems Reply separates the room from the homunculus sitting inside and suggests that the system’s individual components don’t need to understand for there to be genuine understanding.

Searle’s response is that he could, in principle, just memorise all of the rules/instructions that he was following inside the room, internalise them, and then leave the room, wandering outdoors to converse in Chinese. In this case, he asserts that he would still have no way of attaching meaning to the formal symbols even though he would now be the entire system. There’s still no way to get semantics from pure syntax.

There are various counter-replies, but I think continued argumentation about where the system is located is unfruitful and misses the point that Searle is trying to make. His point is that one cannot simply derive semantic meaning from formal symbol manipulation alone. You cannot derive semantics from syntax and a computer program which manipulates symbols according to some syntactic rules does not automatically understand the semantics. 

Formally, 

P1) programs are purely formal (syntactic)

P2) minds have mental contents (semantics)

P3) syntax alone is not constitutive of or sufficient for semantics

C) programs are not constitutive of or sufficient for minds

The Chinese Room thought experiment lends support to P3), but as we’ve seen above we can quickly enter the weeds arguing about exactly where the program is located. Instead, some supporting intuition for P3 can be built independently of the Chinese Room. 

Consider the fact that you can sing the chorus of Despacito without necessarily understanding any Spanish. The individual Spanish tokens which make up the song are real Spanish words with meanings. But if you’re singing the song successfully without understanding what the words mean, then the semantic meaning of what you’re communicating is entirely different from that of the individual Spanish words. You might be conveying meaning by signalling to the in-group, or even just proving to your brain that you know the rhythm and sounds. But the syntax produces an entirely different semantics in your brain compared to the brain of a Spanish speaker.

The post Generalised Hangriness introduces the idea that the words we say don’t necessarily convey all the meaning explicitly indicated by the words themselves. For example, if I respond in an uncharacteristically angry or unkind way, the literal meaning of my words might be different to the meaning of the subtext they’re communicating. I might shout “stop being so annoying!” when I really mean something like “I’m tired and hungry and just need to sleep.”

Also consider a well-trained parrot who says “Hello! Polly want a cracker!” It’s obvious the parrot doesn’t understand the actual words it’s saying in their English-language context even though it’s articulating the words. 

Some notes on the other premises

If you subscribe to something like Dennett’s Intentional Stance you’ll probably just reject P2). There’s no magic thing called “semantic understanding” which sits on top of flexible communication. No additional explanatory power is obtained by insisting that the system doesn’t truly understand if it passes all our behavioural tests of understanding such as a Turing test. If it walks like a duck and talks like a duck then we may as well call it a duck.

For what it’s worth, I think Dennett could be correct on this point. But I also think this argument is sometimes used to kill the debate when there are still interesting points to examine. Namely, is P1 correct? Could an LLM or another AI system actually be said to understand things in the sense that Searle meant it? Or are they doing the equivalent of singing Despacito without understanding Spanish? 

To tackle these questions we’ll introduce the concepts of intentionality and teleosemantics.

Intentionality

Thoughts can be about things. When you think of an object such as a dog or an electron you create a representation about the objects in question. This ability for mental states to be directed towards some object or state is called intentionality in philosophy of mind and poses a number of interesting problems. 

Human minds clearly have intentionality. When you think of a dog it’s about dogs. This mental state is directed towards the concept DOG[3] which relates to the hairy, barking animals inhabiting the real-life world we live in. We don’t just token an empty symbol [dog] without it actually referring to the concept DOG unless maybe we’re not concentrating properly, are not a native English speaker or have never encountered the token [dog] before and don’t know what it means. 

LLMs are certainly capable of tokening the word [dog], but what, if anything, is this token actually directed towards? Is it just syntax? Or do LLMs actually represent what humans mean by dogs when they see the token [dog]?

Teleosemantics

Teleosemantics ties content to proper function: a state represents what it was historically selected to carry information about for a downstream consumer system.

To use an often-cited example, consider a frog snapping its tongue at a fly which passes across its visual field. The frog produces a representation FLY which is then consumed by the motor system in the form of a tongue-snap. The content lies in between the producer (visual field) and consumer (motor system) mechanisms. The producer represents FLY because this historically led to the consumer doing the right thing in the form of a tongue-snap. In this case, the representation FLY takes on the meaning nutritious meal for a frog, whereas it would take on a different meaning (say, annoying-buzzy-thing) for a human whose consumer mechanism is a hand swatting it out of the way. 

Teleosemantics in action.

In this way, teleosemantics provides a nice story for how content is ascribed to certain representations. Content is simply that which the consumer reliably used the representation for in the causal history. It also provides a way of describing misrepresentations. If the frog snaps its tongue at a black blob on its visual field and this turns out to be a dust particle it has represented the dust particle incorrectly as a fly. 

Teleosemantics as optimisation

As Abram Demski notes in his post on teleosemantics:

Teleosemantics reduces meaning-making to optimization. Aboutness becomes a type of purpose a thing can have.

In this way, we can use purposeful language to describe objects; hearts are for pumping blood, lungs are for breathing. Maps are for navigating the territory. 

But the key insight is that evolution by natural selection is not the only way to select proper functions. Reinforcement learning creates an analogous optimisation-via-selection process in artificial systems. When biological systems interact with the environment, reward mechanisms fire in scenarios likely to increase a system’s chances of survival. The same structure occurs when reinforcement learning agents receive a reward which causes them to develop certain functions that maximise the reward signal. A reinforcement learning agent which is exposed to reward signals therefore experiences semantic grounding, according to teleosemantics, in the same way that a biological system does. In both cases, representations that successfully track environmental features are reinforced. The substrate and fine-grained implementation details are different, but the selective dynamics are functionally identical.
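
To make “optimisation via selection” concrete, here is a minimal, self-contained Python sketch (my own toy example, not anything from the post or from Demski) in the spirit of the frog case above: a producer maps stimulus features to an internal signal, a consumer snaps or not based on that signal, and a simple REINFORCE-style update selects producer weights whose signal made snapping pay off. The fly/dust encoding, the rewards, and all the numbers are illustrative assumptions.

    # Producer/consumer pair whose internal signal acquires "content" through
    # its selection history: the signal is reinforced for firing in exactly
    # the situations where acting on it (snapping) was rewarded.
    import math
    import random

    random.seed(0)

    def make_stimulus():
        """Return (features, is_fly): flies are small and moving;
        dust is small and only occasionally moving."""
        is_fly = random.random() < 0.5
        moving = 1.0 if (is_fly or random.random() < 0.1) else 0.0
        return [moving, 1.0], is_fly

    w = [0.0, 0.0]   # producer weights for the two features (moving, small)
    b = 0.0
    lr = 0.1

    def snap_probability(feats):
        """Producer: internal signal, read by the consumer as P(snap)."""
        s = w[0] * feats[0] + w[1] * feats[1] + b
        return 1.0 / (1.0 + math.exp(-s))

    for _ in range(10_000):
        feats, is_fly = make_stimulus()
        p = snap_probability(feats)
        snap = random.random() < p              # consumer: tongue-snap or not
        if snap:
            reward = 1.0 if is_fly else -1.0    # meal vs. mouthful of dust
        else:
            reward = -1.0 if is_fly else 0.0    # missed meal vs. nothing lost
        # REINFORCE update: strengthen whatever the signal did when acting
        # on it was rewarded, weaken it when it was punished.
        grad = (1.0 if snap else 0.0) - p
        for i in range(2):
            w[i] += lr * reward * grad * feats[i]
        b += lr * reward * grad

    # After training, the signal should be high for fly-like stimuli and low
    # for dust-like ones: on the teleosemantic story, it now "means" fly
    # because tracking flies is what it was selected for.
    print("P(snap | fly-like) :", round(snap_probability([1.0, 1.0]), 2))
    print("P(snap | dust-like):", round(snap_probability([0.0, 1.0]), 2))

Nothing in the update rule mentions flies; the signal comes to track them only because snapping when it fired is what historically got rewarded, which is the teleosemantic point. The same mechanism will also happily fire at fly-shaped dust, mirroring the misrepresentation cases discussed above.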

This solves several puzzles neatly. Namely, it provides a theory of semantics and intentionality within an entirely naturalistic framework, explaining that there is a normative distinction between a function performing the role it is supposed to perform and malfunctioning. It also gives us a ready-made reply to the Chinese Room argument. In the Chinese Room, the system never develops genuine intentionality because there is no evolutionary history which builds up the functions. It just follows the rules. 

The Robot Reply

The Robot Reply concedes that Searle is right about the mental state of the operator in the room. They wouldn’t understand the Chinese symbols they’re seeing from inside. However, if we consider a modification and put the Chinese Room in a robot body with sensors such as video cameras and microphones, it could roam the world freely and begin to attach meaning to the symbols it encounters, much as a child learns language by interacting with the world. This reply sits well with the teleosemantic view we’ve sketched above. If a robot system begins to optimise for movement in the world, it will build a world-model which helps it to navigate and allows it to attach meaning to the things it encounters. In other words, it starts to optimise its map to align with the territory. 

Searle’s response to this is that the world-directed content is just another form of syntax which would be impenetrable to the operator inside the Chinese room. For example, imagine that the images and sounds the robot is encountering outside the room are digitised and fed into the room in binary format as 1’s and 0’s. Searle from his viewpoint within the room would still be unable to understand the stream of content but would retain the ability to manipulate the symbols and thereby cause the robot to interact with the world. 

The key point is that teleosemantics has an answer here. The embodied robot system develops genuine understanding through its optimisation process. The producer and consumer mechanisms work together to create proper functions which constitute semantic concepts. By contrast, Searle from within the room is not following an optimisation process - the rule book has been given to him and he’s implementing the rules. But there’s no feedback loop with the environment. No “reward” to tell him if he’s implementing the rules correctly or incorrectly and no genuine understanding of what the symbols mean. 

What does this all mean for LLMs? 

This brings us back to the question of LLMs. During training, LLMs undergo an optimisation process to learn to predict next tokens accurately and to output responses favourable to human judges via RLHF. Is the fact that they undergo optimisation enough to ground genuine understanding via teleosemantics? 

The answer is nuanced. Unlike frogs and other animals, which evolved visual systems to track flies and other “real-world” objects, an LLM’s entire input/output stream is text. When an LLM encounters the token [dog], its consumer mechanism is not a motor system that needs to interact with actual dogs in the real world. This implies that whatever semantic meaning it derives from the token [dog], it cannot fundamentally refer to the same concept of “fluffy, barking animals” that humans refer to when they use the token [dog]. Their “mental states” would be entirely different from ours. 

However, the LLM still undergoes an optimisation process, which means that, according to teleosemantics, it still derives some semantic meaning from the token, just maybe not the meaning we might naively expect it to have as humans. The massive human text corpus that LLMs are trained on encodes facts about dogs, such as the fact that they bark, are fluffy, and wag their tails. The LLM builds internal representations via its optimisation process that reliably predict tokens in sequences corresponding to these facts. Its proper functions therefore reliably track linguistic phenomena which themselves track world-directed content second-hand. 

This optimisation process builds up a “web of concepts” whereby LLMs have access to relationships between many different tokens embedded in a high-dimensional latent vector space. Importantly, this web of understanding doesn’t perfectly track the real world. Sure, it’s been built up by optimising for next-token prediction and human preferences, which track the world in reasonably high fidelity, but this isn’t perfect, as the LLM never interacts with the real world itself. As a result, the understanding it builds will be a spectre of the genuine human-like understanding that we build up via interaction with the real world.
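
As a cartoon of that “web of concepts”, here is a small self-contained Python sketch (the five-sentence “corpus” and everything in it are invented for illustration; real LLMs learn their embeddings by gradient descent on next-token prediction, not by SVD on co-occurrence counts) showing how purely textual statistics already tend to place [dog] nearer to [barks] than to [electron], with no contact with actual dogs required.

    # Build tiny "embeddings" from co-occurrence counts in a toy corpus and
    # compare cosine similarities between tokens.
    import numpy as np

    corpus = [
        "the dog barks and wags its tail",
        "the fluffy dog barks at the cat",
        "the cat sleeps on the warm mat",
        "the electron has a negative charge",
        "the electron orbits the atomic nucleus",
    ]

    tokens = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(tokens)}

    # Symmetric co-occurrence counts within a small context window.
    window = 3
    counts = np.zeros((len(tokens), len(tokens)))
    for line in corpus:
        words = line.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    counts[index[w], index[words[j]]] += 1

    # Low-rank factorisation: each row becomes a small vector for its token.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    embeddings = u[:, :3] * s[:3]

    def similarity(a, b):
        va, vb = embeddings[index[a]], embeddings[index[b]]
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

    print("dog ~ barks   :", round(similarity("dog", "barks"), 2))
    print("dog ~ electron:", round(similarity("dog", "electron"), 2))

The geometry here only tracks how the toy corpus talks about dogs, which is exactly the point about second-hand, text-mediated content: scale the corpus up and the web gets far richer, but it remains a map of human talk about the world rather than of the world itself.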

This also explains many interesting nuances of LLM behaviour. For example, sycophancy can be attributed to the fact that LLMs have been optimised via RLHF based on human judges. Their view of “correct” answers is entirely based on what humans like to hear, so, according to teleosemantics, they should output somewhat sycophantic answers. Another example is hallucination. On teleosemantics, hallucinations are misrepresentations: just as a frog mistakenly snaps its tongue at a BB pellet instead of a fly, so too an LLM can output hallucinations as a result of its optimisation. 

Conclusion

Taking a step back, we’ve reviewed three systems exhibiting different levels of understanding according to teleosemantics. 

  • Searle within the Chinese room - exhibits no understanding. There is no optimisation process and no world-directed content.
  • Embodied robot system - exhibits full understanding once it has interacted sufficiently with the world to build an accurate world model via teleosemantics. 
  • LLM - exhibits linguistic understanding based on interacting with human text corpora via optimisation. Its understanding is fundamentally different from human understanding, since the optimisation function is different and it involves pure text with no direct world interaction. 

In some sense, even with modern LLMs displaying such sophisticated understanding, the thrust of Searle’s argument still rings true and can provide a useful lesson. Searle primarily argued against “rule-based” encoding of syntax in AI programs and, I think correctly, anticipated that these hard-coded rules would never develop the genuine understanding that today’s LLMs have achieved. Indeed, Searle never ruled out that AI would one day have genuine intentionality, saying that such an AI would need to recreate all the “causal powers” of the brain. 

It’s not clear exactly what he meant by “causal powers” but I think teleosemantics provides a clean framework to distinguish between the relevant cases. 

 

  1. ^

    Strictly speaking, teleosemantics is a theory of what constitutes semantic content, but I’m using it in this essay to talk about how states acquire semantic content. My model is that teleosemantic optimisation is how systems typically acquire semantic content but that, once acquired, mental content is constituted by the causal/functional roles of the concepts themselves.

    This avoids Swampman type objections to teleosemantics (i.e. a molecule for molecule copy of a human could still have mental content) but opens up other complexities which I plan to expand on in a future post. 

  2. ^

    The argument has historically been linked to debates about machine consciousness, but for now I'll bracket these discussions and just focus on meaning and understanding. 

  3. ^

    Regarding notation: capitalised letters indicate the concept (semantics) whereas brackets indicate the token (syntax). Here DOG refers to the semantic content and [dog] refers to the syntax.




How singleton contradicts longtermism

LessWrong.com News - September 24, 2025 - 15:00
Published on September 24, 2025 11:10 AM GMT

As is well known, longtermism rests on three core assumptions:

1. The moral equality of generations.
2. The vast potential of the future.
3. The ability to influence the future.

While the third assumption is commonly criticized, the first and second points receive far less attention for some reason, especially in the context of the most likely ASI development scenario. Talk of myriads of meaningful lives makes little sense if we stop imagining a utopian, densely populated galaxy and instead consider the motivations of the agent that will be shaping that galaxy.

In most development models, the first agent to achieve superintelligence (ASI) will become a singleton. Its behavior will, with high probability, be determined by instrumental convergence.

  1. An ASI will see humanity, and any other independent agents, as a potential threat to achieving its goals. Any other agent with a different value system or set of goals is a risk. The most effective way to manage risks is to eliminate them. Therefore, a singleton will strive to prevent the emergence of any new agents it cannot 100% control, or at the very least, minimize it.
  2. Even if its goal is 'aligned', it should be understood that under real-world conditions an aligned agent might commit terrible acts simply because doing so would, for example, avoid the far greater suffering that would accumulate during the time it would otherwise spend searching for and implementing alternative solutions.
  3. Any external constraints imposed on an ASI ("do no harm," "do not destroy other life forms") will either be bypassed if it is unaligned, or they will become the cause of paradoxical and even more destructive actions if it is aligned but forced to operate within suboptimal rules. Simply put, constraints are more likely to lead to greater suffering due to inefficiency than to prevent it.

Thus, the entire argument around longtermism is predicated on an ASI deciding (not us!) to prioritize working on new, unaligned agents over cost reduction and greater safety. And for the first assumption to hold for these agents, they would need to be conscious, which is not strictly necessary for the ASI's purposes and would therefore likely be absent.

I believe that this must be weighed in the context of modern longtermism, which likely assumes the uncontrolled proliferation of unnecessary agents. My estimate is that the world will likely become morally 'empty'.

What work have I missed that contradicts this?




Marketing advice for AI Safety

LessWrong.com News - September 24, 2025 - 14:36
Published on September 24, 2025 11:36 AM GMT

I wrote this in DMs to Akshyae Singh, who's trying to start something to help bring people together to improve AI Safety communication. After writing it, I thought that it might be useful for others as well. 

I'd like to preface this by saying that I'm not a marketing expert: I haven't formally studied marketing in any way and haven't been paid for it either. I'm just a measurements nerd who cares about messaging and communication and thinks most marketing people actually do it quite badly, if you measure good/bad by conversion success (there are exceptions, ofc, e.g. several online creators, Nike (one of the first to notice outrage marketing), etc). If you'd like some evidence of my competency, I've run 3 AI Safety events so far this year, which respectively attracted 165, 159, and 298 prospective AI Safety researchers.

 
I think social media is going to become even more saturated than it already is, with AI video generation being an integrated part of every platform - so they'll have to compete with that, if they're doing social media. 
I think IRL stuff though, especially in person, is actually extremely neglected rn, due to marketing companies and marketing departments wanting to save money and overhead - and when working with very large brands, e.g. Coca-Cola, Nike, etc, it's hard to measure conversion rates for ads, so it's often just based on number of views - the marketing people are incentivized to get those view numbers as high as possible, so their bosses are happy and the platform is as well, so that they can charge more.
So if AI Safety people actually realize this and market in person, where there is much less competition and much lower quality competition, that could be a winning strategy.
I think if you want other AI Safety orgs to adopt you and use you, think of you as useful, etc, the best thing to do is be very useful, at scale
A relatively simple way to do this would be to develop Guides for messaging/marketing that are tested and kept up to date
The majority of AI safety organizers don't really know how to do social media, how to do messaging, etc.
If you know or get in touch with growing tiktokers, youtubers, douyin creators, etc, have them brainstorm and learn what actually works best for ai safety messaging, to actually get conversion, not just views, and keep that guide up to date - or just date when it was last tested - that would be extremely useful
You could also charge for this and have it be a way that you're financially aligned
So that you make more money by actually being good at ai safety messaging
Rather than essentially every AI Safety org rn, which is financially misaligned - they make money from writing a funding application, having the first page of their research look important, etc
 



