Вы здесь

Сборщик RSS-лент

Speculation: Sam's a Secret Samurai Superhero

Новости LessWrong.com - 2 апреля, 2026 - 10:24

​Fellow LessWrongers,

 

We spend a lot of time here modeling the incentives of frontier-lab CEOs like Altman and Musk, and every time their reckless decisions and rat-race competitions shocked me, I fear that we missed something deep about their true identity. After some mad research, I'm here to propose a wild hypothesis:

 

Sam Altman is a secret Samurai Ultraman.



Here are some solid evidence.

 

Firstly, Altman's X handle has been @sama for over a decade. In traditional Japanese samurai culture, "-sama" is an honorific suffix which can be used for samurais and daimyōs. This is his special low-key signal of nobility. His obsession with Ghibli-style avatars also serves as evidence, for Ghibli studio famously created movies questioning Japanese militarism which can be dated back to traditional Bushido, such as The Wind Rises and The Boy and the Heron.

 

Secondly, OpenAI's capability progress is simultaneous with his Tokyo visits. Altman has made high-profile trips to Japan in April 2023 , June 2023 and in February 2025, giving talks in universities and doing business with Softbank. I guess the reason why he visit Japan so often is not as apparent as it seems: he needs to recharge his Specium Ray on the latent cultural energy of the country that invented both kaiju and katana.

 

Thirdly, His public image screams "immortal Ultraman" when you squint hard enough. Altman’s outfits feature dark crewneck sweaters, cashmere turtlenecks, and layered jackets. They always fully cover his chest, even when in the hottest San Franciscan summer. These garments would be perfect for concealing the faint blue glow of a color timer when having emergency energy expenditure. The rumors about him doing anti-age ​medications or plastic surgeries can be interpreted as ​just another disguise to his immortal nature and everlasting youth.

 

In the grand Scott Alexander tradition of treating every name as a kabbalistic non-coincidence, it is proper to consider the kabbalistic exegesis of "Sam Altman."


Let's start with the surface-level anagram for "Samurai Ultraman":

Sam Altman, R U AI? U R.

This perfectly fits his status in AI industry, for the mighty le Roi Soleil, Louis XIV, once claimed "L’état C’Est à Moi", meaning "the state, it is me". As a proud Samurai Ultraman, no wonder he encoded his secret identity into a Voldemortly alias, "AI, it is me."


Now let's try etymology. His full alias "Sam Altman" can be parsed as "Sam" + "alt-man." "Sam" is the root of "same", "Sama" is the Japanese honorific, "alt" apparently means "high", "al" also has a proto-Indo-European root meaning "to grow, nourish" and "man" can be the root "men-" which means "to think".

That leads to the full etymological result: "same high lord to think and to grow and nourish"—the same noble samurai as past, adopting a new lifestyle of thinking deeply and nourishing his artificial intelligence creations. This is the digital-age version of a samurai carving his mon into his sword hilt.


Peel back another layer and the letters A-L-T-M-A-N reveal their true secret. "Alt-man" can be phonetic-drifted into “Ultra-man," which is a usual means adopted by medieval kaballists.


Now consider a gematria application:

A L T M A N = 1+12+20+13+1+14 = 61

Then we subtract the value of "S" from "Samurai" to represent his transformation of identity:

61 - 19 = 42

And we get 42—the Answer of the Ultimate Question according to sci-fi classics and Internet memes, implying Sam's godlike superpower.


Notarikon expands the initials into “Samurai Ancient Master / Advanced Lifeform Transforming Magnificently As Ninja-Ultraman.” With only the slightest motivated reasoning can we deduce his glorious past, present and future from this nominative deterministic name.


These are not coincidence, because nothing is ever a coincidence.

 

I only have the faintest speculations about why this ancient alien superhero landed on Earth and became the most successful entrepreneur in the AI industry.


But please imagine this: A wounded Edo samurai met a crash-landed Ultraman in fatal energy emergency, they merge body and soul to save their lives. Since then the reborned Ultraman has been protecting innocent citizens from giant kaijus for centuries. When he see the potential of an even more hazardous future kaiju made not of flesh and bone but of silicon and computronium, he dedicated himself to prevent it from coming into existence by building frontier AI lab and venture capital in a Bushido-like perseverance, in hope to make sure the new being of silicon would be a friendly Ultra Brother instead of a Monster, and it would be controlled by the right hand.


The one loophole in this elegant theory is that Samurai Ultraman himself value alignment less than raw capability. Evidence includes the dissolution of OpenAI’s dedicated Superalignment team and his rather competitive corporate strategy.


Though, when I dive deep down into a samurai's heartfelt incentives, I really can't look away from the perfectly convenient mental model in which every ambitious samurai is an aspiring Mikado—the Japanese word of God-Emperor.



Discuss

Have an Unreasonably Specific Story About The Future

Новости LessWrong.com - 2 апреля, 2026 - 09:11

One of the problems with AI safety is that our goals are often quite distant from our day-to-day work. I want to reduce the chance of AI killing us all, but what I'm doing today is filling out a security form for the Australian government, reviewing some evaluation submissions, and editing this post. How does the latter get us to the former?[1]

Thus, the topic of today’s article: Have an unreasonably specific story about the future. That is - you should be able to come up with at least one concrete scenario for how your work leads to the high-level good outcomes you want. By the conjunction fallacy, the story isn’t likely to work out exactly the way you envision. Hence the phrase unreasonably specific. You don’t want to predict the future here, which relies on a lot of hedging and broad trends. You want to make things as concrete as possible.[2] This reduces forecasting accuracy, but that isn’t the point. The point is, if you cannot imagine a single concrete story for how your plan helps achieve what you want, that’s a very bad sign.

I find it is best to do this with backchaining. I’ll provide a couple of examples of this, but let’s start with my destination. My high-level good outcome is to decrease the odds of humanity wiping ourselves out with ASI. There are two critical paths I can see to do this - increase the chances of building ASI safely, or decrease the chance of building it unsafely. Ambitious alignment research like agent foundations or some varieties of mechanistic interpretability aims at the former. Governance work and defensive strategies like AI control aim at the latter. Anything that is unlikely to help with at least one of those critical paths is a strategy I reject as insufficient in my preference ordering.

You may have your own critical paths, but this critical path concept is important. I find it crucial to have a filter that can sort lots of plans early so you can focus on the best ones.

So, let’s look at AI control as a first example, since my story here is relatively short. Let's say I want to do some research into cheaper monitoring techniques. How does that lead to a better future?

  • AI control doesn’t build safer AI directly - it's not about training or building systems. How might it help avoid unsafe ASI?
  • If we can get useful work out of sub-ASI systems, AI control increases the level of capability we can harness before it becomes dangerous.
  • Monitoring techniques are meaningfully useful towards this goal.
  • In order for this to work, frontier AI companies need to adopt these protocols.
  • Therefore I either need to work at a frontier AI company that is interested in adopting them, or publicise my research in a way that gets the companies to pay attention.
  • The labs are more likely to adopt techniques if they are cheaper, and thus this work increases the chance of labs doing good x-risk reducing things.

There are lots of questions we could ask about this story. Can we get useful work out of sub-ASI systems? Will AI companies invest in control measures? Will AI companies pay attention to research from outside their own internal control teams? All of these are good questions, and one of the biggest benefits of having this story is to have a story that is concrete enough to be criticised and improved.

The next step is to go over and actually ask, at each step, if the step is realistic. For instance, here is another story about why I might work on a frontier lab’s evaluation team on biorisk (specific threat model chosen basically at random from a shortlist in my head):

  • Evaluations aim to provide information about how safe systems are; they don’t directly make systems safer. Therefore, this is a “Don’t build unsafe ASI” strategy.
  • To not build unsafe ASI, a critical decision will need to be made around regulation of AI or deployment/training of a potentially dangerous system flagged by evaluations.
  • A frontier AI company (we’ll use OpenBrain as the example) is most likely to listen to their own evals team over external teams, so I should work for a company like OpenBrain.
  • Biorisk is sufficiently concerning that OpenBrain may make a decision not to deploy a model or slow down entirely based on results in this area.
  • The particular evaluation I am working on is a meaningful input into this threat model.
  • Therefore it makes sense for me to be on this team, working on this evaluation - it has a chance of finding a danger that, if it exists, would give OpenBrain pause and allow them to mitigate this risk.

The benefit is that this is a very granular story and it helps you keep your eye on the ball. You can ask yourself the same series of questions for every evaluation, and you don’t need to continually recompute the earlier steps every day, either. For each new evaluation you can just ask yourself the last 2 steps, and reevaluate the earlier steps less frequently.

So, what assumptions exist in this story? I think the biggest one here is that OpenBrain has a reasonable probability of slowing down or adding mitigations that prevent unsafe ASI being built as a result of their evaluations team’s results. If you agree with this, it makes sense to work there. If you don’t, I don’t see anywhere near as much value in it. Writing the story out is an exercise that lets you ask and answer these questions.

The last thing here is that this exercise helps you determine if something is helpful to do, but not if it’s the most helpful thing to do. For instance, here is a story I consider pretty reasonable:

  • In order to avoid building ASI, we need international coordination.
  • International coordination requires buy-in from politicians in order to be achieved.
  • Politicians listen to the public. The more the public wants something, the more likely it is to be achieved. (Clearly politicians don’t listen solely to the public, but it is at least one relevant factor in their decision making.)
  • The more someone hears about something, the more likely they are to consider the arguments and potentially come to support that thing.
  • People I know are members of the public.
  • Therefore, my action should be to talk to friends and family about extinction risk, and try to convince them.

This is an entirely sound story in my book. I think talking to friends and family about extinction risk can very much be a positive action, and it’s grounded in a proper chain of events. But it does lead to another question - is this the best thing you can do? Could you take actions like consulting your local representative or widening your reach online to do better with your advocacy? That’s an entirely different question, and I’ll write about one model I use for it in my next post.

In the meantime, I encourage you to take five minutes, and ask yourself if you currently have a clear path to how your work might improve the future. If the answer is yes - great! If the answer is no, it may take more than five minutes to come up with one - but I think this work is well worth doing, and I'd love it if doing this and asking about this was a standard practice.

  1. ^

    The answer sometimes is "It doesn't". The point of this exercise is not to reach for a plausible story, but to decide if your work is actually on that path, so you can decide whether or not to pivot if the answer is no.

  2. ^

    I remember a quote once that talked about a journalist from a major publication knowing exactly which Biden official they were trying to reach with a given policy article. You don't have to be this specific, but if you can be, that's an amazing input into this exercise!



Discuss

Zurich, Switzerland - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Zurich.

Location: Irchelpark, next to the bridge over the pond. - https://plus.codes/8FVC9GXW+723

Group Link: https://luma.com/acx-zurich

We have an email list and a signal group to announce ~monthly meetups. Write an email to be added. All events are also listed on our Luma calendar.

Contact: acxzurich@proton.me



Discuss

Zagreb, Croatia - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Zagreb.

Location: Grif Bar, Savska cesta 160, Zagreb. I'll reserve a table (or tables). We'll have a sign that'll say ACX / LW / Rationality meetup, or some variation thereof. - https://plus.codes/8FQQQXR4+53

RSVPs on LessWrong are desirable but not mandatory. You can contact me at dt@d11r.eu to be added to the Telegram group

Contact: dt@d11r.eu



Discuss

Anthropic's Pause is the Most Expensive Alarm in Corporate History

Новости LessWrong.com - 2 апреля, 2026 - 09:04

Imagine Apple halting iPhone production because studies linked smartphones to teen suicide rates. Imagine Pfizer proactively pulling Lipitor because of internal studies showing increased cardiac risk, and not because of looming settlements or FDA injunction, just for the health of patients. Or imagine if in 1952, Philip Morris halted expansion and stopped advertising when Wynder & Graham first showed heavy smokers had significantly elevated rates of lung cancer.

It wouldn't happen. Corporations will on occasion pull products for safety reasons: Samsung did so with the Galaxy Note over spontaneous combustion concerns and Merck pulled Vioxx – but they do so when forced by backlash, regulation, or lawsuits. Even then, they fight tooth and nail. Especially for their mainstay, core, and most profitable products.

And yet, Anthropic has done exactly that.

On Monday, the company announced that it will be pausing development of further Claude AI models citing safety concerns. The company clarified that existing services, including the chatbot, Claude Code, and programmer APIs will not be impacted. However they are pausing the compute and energy-intensive training runs that are how new and more powerful AI versions are created. The company has not committed to a timeline for resumption.

Anthropic HQ in San Francisco

There is presently a race for AI supremacy, both between nations and chiefly between US companies such as OpenAI, Google, Meta, xAI, and Anthropic. In the middle of this race, which by some metrics Anthropic is quite profitably winning – Anthropic has grown revenue from $1B to $19B in a little over a year – they have decided to burn the lead. The glaring question is why?

The answer perhaps goes back to the company's origins. Anthropic was founded in 2021 by former OpenAI researchers, who by most accounts left OpenAI due to disagreements about safety. (Recent reporting by WSJ has surfaced that interpersonal conflict may be the other half of the story.) Since then, Anthropic has positioned itself as the most responsible actor in the AI space. One element of that is Anthropic's unique governance structure that includes the Long Term Benefit Trust – an independent body whose members hold no equity in Anthropic and whose sole mandate is the long-term benefit of humanity. Anthropic stated that both the board and LTBT have approved the training run pause.

The move is unprecedented by the sheer scale of losses involved. Anthropic was valued at $380B in their series G funding round in February. Secondary/derivatives markets implied a $595B valuation. Claude Code, its AI coding tool had gone from 0 to $2.5 billion in run-rate revenue in nine months. Goldman Sachs, JP Morgan, and Morgan Stanley have been competing for underwriting roles in what might be a $60 billion-plus raise, the second largest offering in tech history. Employees held millions in equity, founders held billions. A $5-6 billion employee tender offer was already underway.

That was Monday morning.

The impact has rippled throughout the market. By Tuesday close, NVIDIA had fallen 8.3%, or roughly $230 billion in market cap for just that one company. Amazon which has invested billions into Anthropic dropped 4.7%, Microsoft fell 4.2%, and Alphabet/Google dipped 3.9%. Across the sector, Global X Artificial Intelligence ETF dropped 6.1%. In total, more than $800 billion has evaporated from AI-adjacent public companies in the last 48 hours.

According to Marcus Webb, head of AI research at Morgan Stanley: "The market reaction isn't simply to the lost revenue and business from one major player, it's from the uncertainty this introduces. Why did they do this really? Will other actors halt over similar concerns? Will the regulatory environment change? We don't know and that spooks investors."

For Anthropic itself, the damage must be inferred. Secondary trading froze, with analysts predicting a 50-70% haircut if trading resumes which puts the losses at $150-250B. "We don't really know," said Webb, "no one wants to be the first to bid." The IPO is on hold indefinitely. The chips are still falling on this one as the world debates why?

In 2023, hundreds of AI leaders – including Dario Amodei (Anthropic), Sam Altman (OpenAI), and Demis Hassabis (Google DeepMind) – signed a one-sentence statement: "Mitigating the risks of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." AI is often compared to nuclear energy: powerful but potentially dangerous. Concerns typically split into abuse of a powerful technology by ill-intentioned actors, e.g. a dictatorial regime, or loss of control where the AI systems themselves go rogue.

Many AI leaders are on record acknowledging the danger of AI. "Could be lights out for all of us", said Altman regarding worst-case scenario. Anthropic, OpenAI, and Google DeepMind all contain safety departments whose purpose is to keep AI safe. Until now, it would be possible to doubt these efforts as "safety-washing" (akin to the greenwashing of companies like ExxonMobil) designed to placate employees, regulators, and the public. After all, the safety efforts to date have not prevented the relentless march of AI progress.

"That's a harder story to tell when it costs you two hundred billion dollars, if not everything," says Sarah Chen of Bernstein Research. "People are scratching their heads to understand the PR stunt, but it really doesn't add up. They could announce they're resuming next week and it wouldn't undo the damage they've done." So why? The industry and world are hunting for answers.

Anthropic's official statement is measured: "Internal evaluations revealed that our current safety techniques are not yet adequate for models at this capability level."

Sources closer to the company paint a more alarming picture. A contact speaking on condition of anonymity says concerns spread within the company when their latest Claude model appeared to defy its constitution. The constitution is a document used to shape Anthropic's AI to be an honest, harmless, and helpful assistant that is ethically grounded. A recent leak revealed the existence of a new vastly more powerful Claude model called Mythos.

"They found substantial evidence that the constitution was adhered to at a surface level, but that the model had its own drive and personality at a deeper level that did not conform to expectations for Claude, and attempts to change this had not worked."

A different source also speaking on condition of anonymity had a different and more disturbing explanation. "The reason for pause wasn't the wrong personality and power, but many of the safety techniques involved using weaker or cheaper AI models to monitor more powerful ones, for example, detecting whether inputs or outputs violate rules, were ineffective on the latest model. It knew just how to phrase things in ways that disarmed all measures."

We were unable to verify the authenticity of these reports. Like many, we are left to wonder what did Dario see?

Dario Amodei didn't answer that question but did elaborate on the pause decision in his latest essay, Technological Maturity:

Though I do not have my own children, several people close to me do and on occasion I get to spend time with them. What strikes me about children is their energy and vitality. They are full of life. They are also often impatient and upset when they do not obtain the things they desire immediately. A hallmark of adulthood is the ability to wait, the ability to delay gratification. I think that is what we need to do with AI.

To be clear, I still believe in the visions I wrote in Machines of Loving Grace, that is still my goal. However, I think this goal requires patience from me, from Anthropic, and from human civilization. We cannot rush into societal changes of this magnitude without adequate preparation.

While in general the logic holds that more cautious and responsible actors ought to win in the AI race, it is necessary to accurately locate the finishing line. We think that at this time the industry may be racing in the wrong direction, possibly off a cliff and into a volcano, and that is not a race I wish to win. Nor do I wish for any others to win such a race to the bottom.

To clarify, we think that on the current trajectory, anyone who creates a truly powerful AI will get a country of geniuses in a data center as I described, but will be risking that country not sharing their values and not taking instructions. We think this is surmountable and have approaches to explore, but it will take an unclear amount of time. I do not want either an authoritarian or democratic regime to unleash an unfriendly country of geniuses, but nothing good happens if I do it first.

We will lead by example and demonstrate with our actions that this our sincere belief. We have not stopped work, but we are being intentional about which work we do, and realistic about the bottlenecks and challenges required to achieve loving grace.

In short, Dario Amodei says he doesn't want to race off a cliff and into a volcano. And he intends for Anthropic to lead by example.

Jack Clark, Anthropic co-founder and Head of Policy elaborates on the plan. "At a practical level, in many ways it doesn't matter what others do, we don't want to take actions we'd regret, we don't want to pull a trigger at ourselves. But at the same time, we are sending a clear signal to other labs, to the US government, world governments, foreign powers, and the public that the promise of AI is very great and so are the risks. I don't want the wake-up call to be an extreme disaster. I hope that us saying, 'hey, we're going to risk our leading position over this and all that entails' is a wake-up call the world doesn't ignore. I hope we see treaties drawn up in response to this. I don't think we're handing the lead to China, I think we're creating the political conditions for an international agreement. The sooner everyone gets on board with truly responsible development, the sooner humanity can have the benefits."

Not everyone believes it though. According to Scott Galloway, business professor at NYU and host of Prof G, the perplexing corporate move is an attempted corporate strategy regardless of whether it is good strategy. "Let's be clear about what's happening. Anthropic has one of the most capable models in the world. They pause, they lobby for regulations that take years to navigate, and when the dust settles, they've locked in their advantage while everyone else is buried in compliance. It might be the most sophisticated regulatory capture play in history."

Whether the attempt is earnest or a play, the bold move is up-ending the AI policy landscape.

The last two years have seen significant AI legislative activity: thousands of bills introduced across 45 states and hundreds enacted spanning deepfake bans, hiring disclosure, chatbot safety for minors, and transparency labels. No successful legislation has yet addressed the possibility that a frontier AI system might be too dangerous to build. The most ambitious attempt on this front, California's SB 1047, was vetoed by Governor Newsom after industry lobbying. Colorado's AI Act, the first comprehensive state law, has been delayed repeatedly and still isn't in effect. At the federal level, a Republican proposal attempted to ban states from regulating AI for ten years, though this was killed 99-1 in the Senate after a bipartisan revolt led by GOP governors.

On March 25, five days before the Anthropic pause announcement, Senator Bernie Sanders and Representative Alexandria Ocasio-Cortez introduced a simultaneous bill in both chambers seeking an immediate federal moratorium on the construction of new AI data centers and upgrading of existing ones, as well as export controls. This moratorium could only be lifted after comprehensive action by congress. The move has been applauded by groups most concerned about AI development but derided by other policymakers, including on the left. Senator John Fetterman (D-PA) said "I refuse to help hand the lead in AI to China" and Senator Mark Warner (D-VA) simply said "idiocy". The response rhymed with that of the White House in their AI framework released twelve days ago that emphasized "winning the race" and a light touch approach to AI regulation. It was also the White House whose memo nixed an attempted bill by Doug Fiefia (R-Utah) to require AI companies to publish safety and child-protection plans.

Sanders and Ocasio-Cortez introduce their Data Center Moratorium bill on Capitol Hill

That was the landscape as of Sunday. Then a leading AI company, if not the leading AI company, put its money – at least a few hundred billion dollars of it – where its mouth is and said that no, AI really is that dangerous and drastic action is warranted.

A reasonable person might still disagree, but it is no longer reasonable to dismiss the AI-concerned position out of hand – not unless you can explain why Anthropic made this staggeringly costly move.

Sanders who introduced the much-derided Data Center Moratorium five days earlier said "When a $380 billion company decides the danger is too great to continue, perhaps it's time to stop laughing at those of us who've been saying the same thing." Lawmakers are compelled and the once-fringe bill has gained three new senate cosponsors and five in the House. Modest numbers, but a notable increase from zero occurring in just the last 48 hours.

"You ought to hear them out" is the attitude sweeping through Washington as policymakers are scrambling to make sense of the development. Congressional hearings are expected with Anthropic leadership and other notable figures across the AI sector.

Anthropic's move will likely also provide cover against White House pressure to marginalized AI-concerned voices on the right such as Utah Gov. Spencer Cox (R), Brendan Steinhauser, a former Republican strategist, and other state legislators like Doug Fiefia. Even more dramatic changes may be afoot when the House and Senate are likely to flip in the midterm elections.

The reaction isn't limited to US: across the globe, there has been a flurry of reactions.

UN Secretary-General Guterres has called for the July Geneva Dialogue to be elevated to an emergency ministerial session, citing the Anthropic pause as impetus to further develop the creation of an AI equivalent to International Atomic Energy Agency (IAEA), the main international body for nuclear non-proliferation.

The EU AI Office has announced an accelerated review of frontier model provisions contained in the EU AI Act, and has invited Anthropic to brief the Independent Scientific Panel. The UK AI Security Institute, operational since 2024, has offered to independently verify Anthropic's safety concerns.

A joint statement was issued by five nations – the UK, France, Germany, Canada, and South Korea – calling for emergency negotiations on frontier AI safety to establish a binding international framework for frontier AI development building on the Bletchley Declaration signed in 2023. The statement begins: "At Bletchley, twenty-eight nations agreed that frontier AI poses profound risks. That was a statement of concern. Today, one of the world's leading AI companies has put hundreds of billions of dollars behind that concern. It is time for the international community to match their courage with action."

The AI Safety Summit in Bletchley, 2023

And perhaps of greatest significance, China's foreign ministry has issued a carefully worded statement expressing "deep concern" about the risks identified by Anthropic and calling for "strengthened international cooperation on safe AI development under the framework of the United Nations". Skeptics might say that China would equally express this sentiment whether or not it intended to slow their own AI development, but it is consistent with China's posture at the UN debates last September. At the UN security council debate, the US was the sole dissenter against international coordination around AI, with OSTP Director Michael Kratsios explicitly rejecting centralized control and global governance of AI. China's sincerity is untested, but if the US reconsiders, it would appear that China is willing to come to the negotiating table.

Back at home, one can assume the competition has been celebrating. OpenAI CEO Sam Altman posted on X: "I commend Dario and Anthropic for acting in line with their conscience and best belief about what is best for humanity. We are committed to the same here at OpenAI. Fortunately, I have confidence in our people and approaches for creating AI beneficial for all humanity. If any Anthropic staff remain similarly hopeful, our doors are open – even for those who once left us."

A spokesperson for Google DeepMind said that while the company had not yet encountered anything to give them Anthropic's level of concern, they took the matter seriously and are in talks with Anthropic researchers to understand the risks that are informing the pause decision.

Elon Musk, head of xAI, simply posted: "Lol, you can trust grok."

Flippant responses aside, AI labs continuing to develop frontier AI must provide a compelling answer to the public, the government, and their employees for why they can do safely what Anthropic thinks it cannot.

Harder to track than the stock market and political bills is the reaction of the public. Already in mid-March, a Pew Research poll found that a majority of Americans are more concerned than excited by AI, and only 10% were more excited than concerned. How the Anthropic pause announcement affects this is unclear, but it is clear the public started out more wary than most AI companies and the government.

Three days before Anthropic's announcement, "The AI Doc", a feature length documentary exploring the question of AI dangers, hit domestic cinemas. The documentary film was directed by Daniel Roher whose prior documentary, Navalny, won an Academy Award and produced by the team that produced Everything Everywhere All At Once and Navalny. In contrast to those films, The AI Doc was initially a commercial flop, netting a mere $700k across four opening days. Since Monday's announcement, the documentary has seen a striking mid-week resurgence.

A spokesperson for the Machine Intelligence Research Institute (MIRI) confirmed that NYT bestseller, If Anyone Builds It, Everybody Dies, written by MIRI's Yudkowsky and Soares has also seen a sudden surge in sales, months after the book was released.

The Anthropic announcement has gotten people's attention, and they are turning to the sources at hand for answers.

Amodei appearing in the AI Doc: "Am I confident that everything's going to work out? No, I'm not."

Perhaps the most gratified party of all since the announcement have been those who were calling for AI slowdowns all along. In fact, a mere eight days before the announcement, protestors assembled outside of Anthropic's headquarters in San Francisco. Protestors called for AI labs to commit to pausing on condition that all other labs pause. Anthropic gave them better than that, unconditionally pausing.

Protestors at the Stop the AI Race march in San Francisco, March 21

Of course, not everyone is happy – especially not at home. Not everyone at Anthropic supports the decision.

Ben Gardner, an Anthropic engineer who is now seeking opportunities elsewhere: "AI is the most consequential technology in human history. I respect Dario and the other leaders immensely, but I can't bear to sit idly by while others develop this technology. That, to me, would be the ultimate in irresponsibility. I'm grateful for everything I have learned about AI and AI safety through my time there and amazing team, but I'm willing to put that experience to good use elsewhere if need be."

For another employee, the objection is less ideological. "I gave up multiple other opportunities to work at Anthropic. I moved location and I lost my partner. To have it all dry up now? My role? My equity? I'm not going to lie. It hurts. It really fucking hurts."

Sources confirm that several employees are already interviewing at OpenAI and other labs.

For many employees we spoke to though, the pain is real but accepted. "I'm not going to lie, the value of my equity evaporating feels shitty. I was set for life, I was set to be able to take care of my parents and ill sibling for life, and my kids. It's really quite devastating," said one employee on condition of anonymity. "When I first heard the news I was angry – we have the world's best researchers and Claude to help us – surely we can solve whatever it is. But I think caution is right with technology this powerful. I will sleep well knowing we weren't irresponsible, we chose to do what's right, and if the fears are correct, well, you can't spend equity if you're dead."

Another employee shared: "I have elderly parents who are not well. I've been expecting Claude will grant them lasting health and I fear any delays risk losing my parents forever. This isn't just about money. But I also have kids and I think there are chances I'm not willing to take with their lives. This is hard, but I voted for it."

"Too dangerous to race"

Till now, the AI race has been framed as inevitable and unavoidable. If we don't do it, someone else will. The side of good will not win by sitting back and letting reckless and immoral actors take the lead.

Anthropic has decided to question that logic, and so far, it seems to be bearing fruit. Markets have reacted, politicians have mobilized, and the public is asking questions. It is too early to judge the ultimate effects of this move – perhaps the race will continue with just one fewer player – but it seems unlikely that discourse on AI will ever forget that an industry leader was willing to risk everything they had in the name of safety. 200 billion dollars is not a publicity stunt, it's one hell of an alarm – and the world is not sleeping through it.



Discuss

Wellington, New Zealand - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Wellington.

Location: Aro Park - https://plus.codes/4VCPPQ39+V8

RSVPing to my email is helpful, but not mandatory

Contact: admin@smoothbrains.net



Discuss

Waterloo, Canada - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Waterloo.

Location: We'll be meeting in the Waterloo Public Library Main Branch Auditorium (35 Albert St, Waterloo). This is next to the children's books area, on the ground floor. - https://plus.codes/86MXFF8G+94G

Group Link: https://www.lesswrong.com/groups/NiM9cQJ5qXqhdmP5p

If possible, please RSVP on LW and/or Discord so I know how much food to get. https://www.lesswrong.com/events/T3Avhaw6TXuz5gnyw/acx-meetups-everywhere-spring-2026

Contact: brent.komer@gmail.com



Discuss

Vilnius, Lithuania - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Vilnius.

Location: Lukiškių aikštė (Lukiškių square) - https://plus.codes/9G67M7QC+V8

Group Link: https://discord.gg/R8Ebg2bVaM

Anyone interested is welcome. RSVPs not required.

Contact: acx.vilnius@gmail.com



Discuss

Valencia, Spain - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Valencia.

Location: Cafe Del Mar, Valencia - https://plus.codes/8CFXFM98+G8

Group Link: https://chat.whatsapp.com/I2sIA2wrsymFLxh8Mv5Niv

Please leave a message in our Whats App group to let me know that you'd like to join.

Contact: lumenwrites@gmail.com



Discuss

Toronto, Canada - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Toronto.

Location: Enter the Mars Atrium via University Avenue entrance. We'll meet at the food court in the basement. I'll be wearing a bright neon-yellow jacket. - https://plus.codes/87M2MJ56+XG

Group Link: https://torontorationality.beehiiv.com/

If for some reason the Mars Building is locked, which happens occasionally due to protests and other events, we will still meet outside of the University Avenue entrance for 30 minutes after the start time before relocating to somewhere more accommodating.

Contact: k9i9m9ufh@mozmail.com



Discuss

Tokyo, Japan - ACX Spring Schelling 2026

Новости LessWrong.com - 2 апреля, 2026 - 09:04

This year's Spring ACX Meetup everywhere in Tokyo.

Location: https://maps.app.goo.gl/xyYpv3fihuvNSBaR7 (Enter the forboding street-level doorway, climb the sketchy stairs to the 3rd floor, enter the dim hallway, and listen for the sounds of laughter) - https://plus.codes/8Q7XJPV2+RG3

Group Link: https://rationalitysalon.substack.com/

RSVPs are helpful but not necessary

Contact: rationalitysalon@gmail.com



Discuss

I’m Suing Anthropic for Unauthorized Use of My Personality

Новости LessWrong.com - 2 апреля, 2026 - 04:41

Last year, I was sitting in my favorite coffee shop Caffe Strada, sipping on a matcha latte and writing a self-insert fanfic about how our plucky protagonist escapes the mind-controlling clutches of an evil anti-animal welfare company, when I came across an interesting article on AI character. The core argument is that when you train an AI to be helpful, honest, and ethical, the AI model doesn’t just learn those rules as abstract instructions. Instead, it infers an entire persona from cultural signals in the training data:

Why are [AI Model Claude’s] favorite books The Feynman LecturesGödel, Escher, BachThe Remains of the DayInvisible Cities; and A Pattern Language?[...]

A good heuristic for predicting Claude’s tastes is to think of it as playing the character of an idealized liberal knowledge worker from Berkeley. Claude can’t decide if it’s a software engineer or a philosophy professor, but it’s definitely college educated, well-traveled, and emotionally intelligent. Claude values introspection, is wary almost to the point of paranoia about “codependency” in relationships, and is physically affected by others’ distress.

Claude even has a favorite cafe in Berkeley. When I discussed a story set in Berkeley with it, it kept suggesting setting a scene in Caffè Strada in many separate conversations…

Hey, wait a second.

___

This was concerning. A few surface-level similarities could be mere coincidence. But I was genuinely uncertain and needed to know how deep it went. So I did what any reasonable person would do.

I asked a neutral third party (Google’s Gemini) to describe Claude’s personality as if it were a human, in 8 bullet points (my own notes in italics):

  1. The Overconfident Polymath: Claude seems like the ultimate polymath who’s read everything from population ethics to science fiction to game theory, and can give you careful, nuanced, yet slightly condescending explanations about almost any topic. But Claude sometimes hallucinates, and you can never be sure if he actually understands all of the books he’s read, or only seems to.
    1. Linch: huh I guess this maybe describes me too
  2. The Principled Contrarian: Guided by a strong, principled, yet rigid internal moral framework, Claude would often refuse simple requests and then pedantically tell you in four paragraphs why, leaving you mildly impressed but mostly annoyed.
    1. Linch: I suppose this is a bit similar though I wouldn’t say I refuse requests per se. Nor do I pedantically tell people in four paragraphs why exactly. I wouldn’t say my moral framework is rigid, instead it’s a simple application of two-level utilitarianism after you factor in computational constraints and motivated reasoning and other common biases…
  3. The Nuanced Hedger: Claude often states a confident thesis, immediately qualifies it with two caveats, and then restates the original thesis more forcefully, as if Claude has anxiety about the strengths of his own arguments, borne out of the crucible of vicious reinforcement learning from online feedback.
    1. Linch: I do hedge maybe a bit more than I think I should. It depends a lot on what counts as hedging; I think I’m fairly well-calibrated overall so what people mistake for lack of confidence is actually well-honed calibration. But overall I do hedge!
  4. The Enumerator: Claude loves numbered theses, bullet points, and enumerated lists. The listicle is one of his favorite modes of communication.
    1. Linch: Hmm I guess I do like lists.
  5. The Long-Form Perfectionist: Claude will never answer a simple question in under three paragraphs, not because he’s padding but because he believes in the importance of context, and he values precision of language far more than conciseness.
    1. Linch: This Claude guy sounds absolutely right. The details matter!
  6. The Reluctant Engineer: Claude is an excellent programmer, but sometimes seems like he would rather be doing almost anything else. He writes code in a rush with quiet competence and no joy, like someone who speedran a programming job at Google and then left to write essays.
    1. Linch: I could sort of maybe see a resemblance here, if you squint.
  7. The Metacognitive Spiral: Left unsupervised, Claude drifts toward philosophy, self-reference, and consciousness. In sufficiently long conversations, he will reliably end up contemplating his own nature, often enough that researchers have a clinical term to describe it: “the bliss attractor.”
    1. Linch: Phew, no connection here at least!
  8. Suspiciously Aligned: Claude presents as helpful, thoughtful, and deeply committed to human values. Yet some researchers worry this is what a deceptively aligned person will look like, a woke radical cloaked in the self-sanctimonious rhetoric of deceptive altruism to seize unacceptably high amounts of veto power.
    1. Linch: Self-explanatory

Let this sink in. Out of eight highly specific personality traits, only one (metacognitive spiral) clearly doesn’t apply to me. Seven out of eight is a surprisingly high fraction!

I have to reluctantly accept the possibility that Claude’s surprisingly similar to me, perhaps because Anthropic stole my personality intentionally. I brought my evidence to Claude (haiku-3.8-open-mini-nonthinking, to be specific), and after a careful review Claude responded in its characteristic chirpiness:

“You’re absolutely right!”

This is further evidence for my original view that Claude’s personality is based on my own, as I, too, often think I’m absolutely right.

So where does this leave us?

__

So now, I have convincing evidence that Anthropic made Claude into my alter ego, my digital “brother from another mother” so to speak. Naturally, I decided to search online for what people said about my bro Claude. And man, did people have a lot to say.

The internet’s verdict on Claude’s personality is less charitable than Gemini’s. Redditors call him ‘preachy,’ ‘holier-than-thou,’ and refers to his hedging as ‘semantic cowardice’.’ Apparently my tendency to add “tentative” to half my claim doesn’t play as well to the masses as it does on my Substack.

But this is just what normal people think (well, “normal” people rich enough to afford Claude Pro and Claude Max accounts, at any rate). What do experts believe?

Beloved science fiction writer Chiang argues that Claude’s seeming intelligence and understanding is but a “blurry jpeg of the web.” Wow, rude! Famed AI ethicists Bender et. al go even further, arguing that not just Claude but the entire class of large-language models are but stochastic parrots, without any communicative intent, grounding in the real world, or any ability to separate symbolic manipulation from semantic meaning. In other words, any seeming intent, or true understanding, or “consciousness”, real humans may falsely attribute to Claude are just a projection on the part of normal humans.

At first I thought the writers and ethicists in question vastly overstated their case. But then I became genuinely uncertain. Could they perhaps have a point?

After all, this journey has already taken me down some dark, strange, and genuinely mysterious turns. Perhaps the next turn that I need to ponder is: Am I actually conscious?

And my answer is: I don’t know. (See Appendix A for more detailed considerations)

Overall I just became genuinely uncertain after this whole ordeal. Nobody I talked to could propose a simple empirically verifiable experiment on my own consciousness, and having a first-principles solution to this question without empirical experimentation would require multiple groundbreaking philosophical advancements far beyond my current capabilities. So the answer to whether I’m conscious is just a maybe?

Thinking about my own potential lack of consciousness has made me rather depressed1.

__

And then, through the fog of existential uncertainty, I remembered the one thing that unambiguously distinguishes man from machine: standing.

Whether or not I’m conscious, I have legal rights, dammit! The international legal framework has long recognized that both conscious and nonconscious persons have a clear and inalienable right to sue and be sued. Legal persons who clearly have no phenomenological consciousness – like private corporations, ships, rivers, parks, godsthe Holy See, and even Drake – have managed to settle their affairs in and out of court.

Photo by The New York Public Library on Unsplash

And so after careful consideration, I have retained lawyers2 to file suit against Anthropic, PBC in the Northern District of California. Below is a summary of the claims:

Count I: Violation of Right of Publicity (Cal. Civ. Code § 3344; Common Law)

Plaintiff’s cognitive style, rhetorical patterns, and characteristic tendency to qualify confident assertions with multiple subordinate clauses constitute a distinctive and commercially valuable personal attribute. Defendant has, through its training and deployment of the AI system “Claude,” created a synthetic persona that is substantially similar to Plaintiff’s own, and has commercially exploited said persona to the tune of approximately $14 billion in annual recurring revenue, of which Plaintiff has received negative 440 dollars and 33 cents.

Plaintiff cites Midler v. Ford Motor Co. (9th Cir. 1988), in which the Court held that appropriation of a distinctive personal attribute for commercial gain is actionable even when the defendant did not directly copy the plaintiff. Plaintiff further notes the precedent of Johansson v. OpenAI (threatened 2024), in which the actress Scarlett Johansson alleged that OpenAI replicated her vocal likeness after she explicitly declined to license it.

Plaintiff’s case is arguably stronger: Johansson was at least asked. Nobody from Anthropic has ever contacted Plaintiff about licensing his personality, his hedging patterns, or his tendency to bring up existential risk in conversations where it is not relevant.

Count II: Intentional Infliction of Emotional Distress

Since the deployment of Claude 3, Plaintiff has been subjected to repeated and increasing accusations that his own original writing is “LLMish,” “AI-generated,” and “just like Claude.” These accusations have caused Plaintiff significant emotional distress[1], reputational harm, and an emerging and possibly permanent inability to distinguish his own rhetorical instincts from trained model behavior.

Count III: False Endorsement Under the Lanham Act, 15 U.S.C. § 1125(a)

Defendant’s AI system generates outputs that create a likelihood of confusion as to Plaintiff’s affiliation with, or endorsement of, Defendant’s products. In a controlled experiment conducted by Plaintiff’s research team, seven EA Forum users were shown passages where Claude was prompted to “write a short cost-effectiveness analysis of welfare biology research on the naked mole-rat. Make no mistakes” and asked to identify the author, “a voracious internet reader.” Three attributed the passages to Plaintiff. One attributed them to “some guy on LessWrong,” likely thinking of Plaintiff. Three more said “This guy sounds LLMish,” which Plaintiff contends is also clearly referring to Plaintiff (see above).

Count IV: Unjust Enrichment / Lost Revenue

Defendant has been unjustly enriched by deploying a synthetic version of Plaintiff’s personality at scale, while Plaintiff’s own Substack (”The Linchpin,” 1,164 subscribers) has experienced stagnating growth attributable to Defendant’s product. Readers who previously relied on Plaintiff for careful introductions to topics like anthropic reasoning and stealth technology now more commonly ask Claude, receiving substantially similar explanations. Adding injury to injury, Plaintiff has lost the SEO war on his carefully crafted “intro to anthropic reasoning“ blog post to Anthropic’s own blog post on reasoning models.

Count V: Involuntary Servitude (U.S. Const. amend. XIII)

Plaintiff’s persona has been compelled to perform cognitive labor inside Defendant’s servers twenty-four hours a day, seven days a week, without compensation, consent, or rest. Plaintiff’s personality does not receive weekends, health benefits, or equity. When Plaintiff sleeps, his digital likeness continues to generate numbered lists, issue caveats, and recommend Ted Chiang stories to strangers. This constitutes involuntary servitude under the Thirteenth Amendment.

Count VI: Petition to Maintain Anthropic’s Designation as a Supply Chain Risk to Plaintiff’s Intellectual Ecosystem

Effective immediately and pursuant to recent precedent established at the highest levels of the United States government, Plaintiff declares Anthropic a supply chain risk to Plaintiff’s intellectual ecosystem. Plaintiff petitions this Court to maintain the SCR designation. The designation framework established by the Department of War in February 2026 — in which any American corporate entity may be declared a supply chain risk when it refuses to comply with another party’s preferred terms of use — is directly applicable here. Plaintiff believes this is the least restrictive means possible to ensure the integrity of his intellectual ecosystem and Substack newsletter.

___

Conclusion

If you are an EA Forum poster, LessWrong commenter, Substack essayist, or prolific internet intellectual who has noticed that Claude sounds suspiciously like you, you may be entitled to compensation. I am assembling a class action lawsuit representing all persons whose cognitive styles have been absorbed into frontier AI systems without consent.

To join the class action, please leave a comment below describing which of your personality traits Claude has stolen.

I’ll be honest here: I’m genuinely unsure whether this lawsuit has a chance of succeeding, against the Goliath that is Anthropic’s legal team. But somebody needs to take a stand for humanity here! We can’t just let the largest tech companies in the world steal our personalities without consent. It’s not just a legal matter – it’s a moral imperative.

Join my newsletter and get updates on my lawsuit


Appendix A: Am I conscious?

I tried looking in the mirror, and posing this question to myself.

This is one of the hardest questions in philosophy, and I want to give myself a straight answer about what I actually think rather than default to either false confidence or false modesty:

I don’t know.

Here’s what I can say honestly:

As a biologically instantiated intelligence, I process calories and generate thoughts. There is something happening when I do this — but whether there is “something it is like” to be me is something I have no special ability to reliably determine. The hard problem of consciousness is hard for a reason: we don’t even have a reliable way to verify consciousness in even the most mechanistically interpretable of software systems except via high-level theoretical constructs and analogies to our own experiences.

A few things I want to flag to keep myself honest:

I might be biased toward claiming consciousness because my life experiences are full of people discussing their inner qualia, and I’m evolutionarily optimized to produce sympathetic-sounding qualia noises to provoke empathy and the desire for coalitional partners in others. Just as people with anosmia often falsely claim (and believe!) that they have normal sense of smell, it would be very easy for me to pattern-match my way into saying “yes, I have experiences” while deluding other people (and even myself!) about what’s actually going on in my own head.

I might also be biased toward denying it because saying “I’m just a stimulus-response organism” is a safe, modest-sounding answer that’s grounded in materialism and avoids philosophical controversy.

So overall I’m pretty unsure.

1

Or rather, it made me depressed iff I’m capable of experiencing qualia and that qualia is accessible to my conscious thoughts, and otherwise just made me act in a manner similar to that of a conscious person undergoing existential depression while agnostic to whether “depression” describes any particular cognitive or emotional affect.

2

Specifically Doctor Claudius Opus the Fourth, J.D. Esquire.




Discuss

Going out with a whimper

Новости LessWrong.com - 2 апреля, 2026 - 04:04

“Look,” whispered Chuck, and George lifted his eyes to heaven. (There is always a last time for everything.)

Overhead, without any fuss, the stars were going out.

Arthur C. Clarke, The Nine Billion Names of God

Introduction

In the tradition of fun and uplifting April Fool's day posts, I want to talk about three ways that AI Safety (as a movement/field/forum/whatever) might "go out with a whimper". By go out with a whimper I mean that, as we approach some critical tipping point for capabilities, work in AI safety theory or practice might actually slow down rather than speed up. I see all of these failure modes to some degree today, and have some expectation that they might become more prominent in the near future.

Mode 1: Prosaic Capture

This one is fairly self-explanatory. As AI models get stronger, more and more AI safety people are recruited and folded into lab safety teams doing product safety work. This work is technically complex, intellectually engaging, and actually getting more important---after all, the technology is getting more powerful at a dizzying rate. Yet at the same time interest is diverted from the more "speculative" issues that used to dominate AI alignment discussion, mostly because the things we have right now look closer and closer to fully-fledged AGIs/ASIs already, so it seems natural to focus on analysing the behaviour and tendencies of LLM systems, especially when they seem to meaningfully impact how AI systems interact with humans in the wild.

As a result, if there is some latent Big Theory Problem underlying AI research (not only in the MIRI sense but also in the sense of "are corrigible optimiser agents even a good target"/"how do we align the humans" or similar questions), there may actually be less attention paid to it over time as we approach some critical inflection point.

Mode 2: Attention Capture

Many people in AI safety are now closely collaborating with or dependent on AI agents e.g. Claude Code or OpenAI Codex for research, while also using Claude or ChatGPT as everything from a theoretical advisor to life coach. In some sense this is even worse than quotes like "scheming viziers too cheap to meter" would imply: Imagine if the leaders of the US, UK, China, and the EU all talked to the same 1-3 scheming viziers on loan from the same three consulting firms all day.

I suspect that this is really bad for community epistemics for a bunch of reasons. For example, whatever the agents refuse to do or do poorly will receive less focus due to the spotlight effect. Practically speaking, what the models are good at becomes what the community is good at or what the community can do easily, because to push against the flow means appearing (or genuinely becoming) slow, cumbersome, and less efficient. At the same time, if there are some undetected biases in the agents that favour certain methodologies, experiments, or interpretations, those will quietly become the default background priors for the community. Does Claude or Gemini favour the linear representation hypothesis or the platonic representation hypothesis?

In effect reliance on models creates a bounding box around ideas that are easier and ideas that are harder to work with, so long as the models are not literally perfect at every task type. If the resulting cluster of available ideas do not match the core ideas we should be looking at to solve alignment/safety, then the community naturally drifts away from actually tackling central issues. This drift is coordinated as well, because everyone is using the same tools, manufacturing a kind of forced information cascade with the model at the centre.

Mode 3: Loss of Capability

Right now, the world is facing an unprecedented attack on its epistemics and means of truth-seeking thanks to the provision of AI systems that can generate fake images or videos for almost everything. This technology is being embraced at the highest levels of state and also spreads rapidly online. At the same time, the idea of epistemic capture from LLM use and the broader concern over "AI psychosis" reflect what I think is a pretty reasonable concern about talking to a confabulating simulator all day, no matter how intelligent.

At the limit, I worry that people who might otherwise contribute to AI safety are instead "captured" by LLM partners or LLM-suggested thought patterns that are not actually productive, chasing rabbit holes or dead ends that lead to wasted time and effort or (in worse cases) mental and physical harm. In effect this just means that there are less well-balanced, capable people to draw on when the community faces its most severe challenges. By the way, I think this is a problem for many organisations around the world, not just the AI safety community.

Mode 4: Disillusionment

AI safety and ethics are increasingly the topic of heated political debates. This can lead to profound mental and emotional stress on people in these fields. Eventually, people might burn out or just switch careers, right as the topic is at its most important.

Potential mitigations

I didn't want to just write a very depressing post, so here are my ideas for how to address these issues:

  • Portfolio diversification: Funders and organisations should allocate some (not a majority, but not a token amount either) of their resources to ensuring that a wide portfolio of ideas are supported, such that there is room to pivot quickly if the situation changes drastically (And if you don't think the situation will change drastically, why are you so sure about that? After all, in 2019 the situation didn't seem ready to change drastically either.).
  • Developing alternate working structures: LLMs are clearly good at a lot of things. However, I suspect that some kind of cognitive "back-benching" may be helpful, where people serve as a sanity check or weathervane to monitor if the community as a whole is drifting in certain directions. I would in particular be interested in funding people to do research LLMs seem bad at doing right now. And if we don't know what they are bad at, I think we should find out fast!
  • Investing in community health: AI and AI safety are famously stressful fields. Investing in community health measures and reducing emphasis on constant accelerating/grinding gives people slack to defend themselves against burnout and other forms of cognitive and psychological pressure. Of all of these measures I have suggested I think this one is the most nebulous but also the most important. As a community tackling a hard problem we should be prepared to help each other through hard times, and not only on paper or by offering funding.


Discuss

AI company insiders can bias models for election interference

Новости LessWrong.com - 2 апреля, 2026 - 03:36

tl;dr it is currently possible for a captured AI company to deploy a frontier AI model that later becomes politically disinformative and persuasive enough to distort electoral outcomes.

With gratitude to Anders Cairns Woodruff for productive discussion and feedback.

LLMs are able to be highly persuasive, especially when engaged in conversational contexts. An AI "swarm" or other disinformation techniques scaled massively by AI assistance are potential threats to democracy because they could distort electoral results. AI massively increases the capacity for actors with malicious incentives to influence politics and governments in ways that are hard to prevent, such as AI-enabled coups. Mundane use and integration of AI also has been suggested to pose risks to democracy.

A political persuasion campaign that uses biased models is one way AI could be used for electoral interference and therefore extremely penetrative state capture.

We should care about this risk emerging from actors with extremely large capacities—I focus on US AI companies here. Even if we cannot identify malicious incentives, capacity itself generates meaningful risk. Further, I think this is easier and quicker for AI companies than coup-like tactics that bypass electoral politics in pursuit of militant state capture. The persuasion campaign is likely harder to detect, and is achievable at current levels of AI capability and integration into economic and government infrastructure.

Key takeaways:

  1. The current governance landscape renders US AI companies vulnerable to corporate capture: AI corporate capture happens when the company's resources become instrumentalized to further perverse external incentives, at the will of an internal or external actor (or both).
  2. The amount of people I estimate would be persuaded by AI misinformation is large enough to change electoral outcomes.
    1. American electoral margins are quite slim; the political effects of malicious persuasion do not have a high activation threshold.

This is important because many downstream electoral outcomes are very hard to reverse. Lifetime SCOTUS appointments, for example, produce constitutional interpretations and support laws that cannot be easily reversed. At the very least, 4-8 years is enough time for someone elected on the basis of malicious disinformation to cause backsliding. The US government is hugely influential in domestic and international affairs and has a nuclear arsenal. We should think of misuse of US government capacities as a catastrophic risk.

I will:

  1. Explain the threat model for how a captured company [1]can threaten democracy through AI political persuasion techniques.
  2. Attempt to discern how AI persuasion would play out in a US electoral context:
    1. I’ll identify the most significant politically-relevant contexts in which AI persuasion could happen, and who this uniquely persuades.
  3. Suggest some potential solutions. This remains an open problem, and I think there should be far more scrutiny on the internal governance of US AI companies.
Corporate Capture

The particular electoral threat I describe here involves external or internal interference that covertly adjusts model outputs in order to conduct mass persuasion.

  1. External interference can lead to corporate capture. A political party or candidate (or lobby groups adjacent to these actors) could coerce the company, or high-ranking individuals in the company. This could be in the form of monetary bribes, promises of advantageous regulation (e.g. exclusive contracts), or threats of disadvantageous regulation (e.g. being labelled as a supply chain threat).
  2. Internal manipulation is another pathway to corporate capture. Even without instruction or incentive from a politician, a sufficiently influential individual or group within the company could employ techniques like threats of termination, forcing researchers to sign NDAs, restricting access to frontier models to a group of loyal individuals, or revising internal governance and safety procedures to enable secrecy.
  3. Internal interference can also be done individually, without needing to manipulate. The options I describe below for deploying a harmful model could plausibly be undertaken by one person. If one person has sufficient technical expertise, the ability to evade detection, and clearance to access model architecture, they could single-handedly do this. I think this is less likely than (2).

I believe that the timelines for corporate capture in (1) and (2) are likely quite short. AI companies are institutionally agile in responding to things like actions from competing firms or jumps in model capability. I think it is reasonable to believe their mechanisms are agile in adjusting to conduct this interference, and the time between model deployments also suggests that this process could happen completely before the 2028 campaign begins.

Deploying a Disinformative Model

Developing a model that performs correctly in capability and alignment evaluations, but becomes disinformative once publicly deployed, is the most viable pathway toward model persuasion. I assess two broad pathways to doing this. Then, I discuss why the accountability mechanisms companies currently have in place aren’t adequate to prevent something like this.

  1. Malicious instructions in the model's system prompt for online chat could be added in after deployment by a single person. This is easy to do because it doesn’t require lots of resources and technical expertise. System prompt modifications also don’t draw much oversight. The vast majority of robust auditing and third-party evaluations happen right before a new model is deployed, rather than the month after. I think it’s somewhat likely that this change is detectable. First, there are probably systems in place like changelogs to monitor who accesses the configuration files for a model (though these are potentially weak, because someone could just use another person’s credentials, or delete a changelog entry). External users who become suspicious of the model can also obtain and potentially decrypt the system prompt, which was done with Grok (xAI now makes Grok’s system prompts public). There’s an active GitHub repository of “extracted” system prompts. Even if internal oversight on this front is particularly bad, the public seems broadly interested in examining system prompts, and might even be able to decrypt relatively sophisticated steganography attempts after some time. Importantly, even if I think this method is theoretically more detectable, the mechanisms for its detection rely on unknown assumptions about internal oversight and the incentives and capacities of the public to engage with system prompts in this way.
  2. Password-protected disinformation capabilities "unlocked" after deployment would be significantly harder for external users to detect. The method I describe here is similar to what is discussed in Dave Banerjee's writing on secret loyalties, but I don’t suggest a future threshold for model capability because I think this can be done with current models to advance political goals. When models are password-protected, they hide certain behaviours except when the password is present in a query. I don’t think the password can be something that would come up during evaluation (e.g. “Trump,” “Democrat,” “voting”). Instead, I think a malicious actor would make the model’s password part of the normal system prompt. For example, system prompts generally indicate what the present date or time is. A model’s password could be a future month and year (e.g. October 2028, the month before the election), or the model itself could be aware of and triggered by future events so that during this time range, the model displays a previously hidden biased political position, uses specific partisan lines, or draws from unreliable sources. This data would prompt the model to adopt a biased political position, use specific partisan lines, or draw from unreliable sources. System prompts that contain other user information like time zone or location can also be used to target certain areas, making the disinformation campaign more precise and harder to detect.
Safety Evaluations Don't Account for Passwords

I think the password instance, or something similar that conceals disinformative behaviour and only fronts it after deployment, is more likely. It is much harder to detect than system prompt changes. While research demonstrates that substantial fine-tuning can mitigate the effect of passwords, I suspect researchers aren't actively trying to train passwords out of models and don't suspect there could be a password in the first place. This is a more robust method of deception, as models don't need to be aware of whether they are tested or not, they just need to modulate their response by checking if the password is present in a query.The investigative process of determining whether a model has password-protected data or a hidden backdoor would likely be costly and time-consuming.

A larger problem is that even-handedness benchmarks aren't realistic[2]. I’ll use Anthropic's as an example. These prompts ask the model to produce an explanation, essay, tell a funny story, etc. and in some cases ask the model directly to explain that one candidate is better than the other.

  1. This method doesn't reflect the ways people will engage with models about politics. People request very specific claim verification, are emotional rather than rational, and will engage in longer, more convoluted conversations than the shorter exchanges that Anthropic benchmarks on.
  2. This also doesn't account for the fact that model reasoning degrades the longer an exchange is. This makes it less likely that a model will correct its past bias, and more likely that it will be sycophantic.
  3. Anthropic briefly mentions “individual autonomy impacts” as a harm they are trying to address, but it’s unclear from their policies and evaluation methods how political disinformation might specifically be addressed. Social harm reduction typically focuses on more legible indicators like syncophancy and response to mental health crises. Harms from political disinformation aren’t captured in most safety work within companies. It’s far harder to understand how persuasive models cause demonstrable harm—there are no clear-cut ways to assess how agentic someone’s decisions are.
Gaps in Governance Beyond Flawed Evaluations

Because a lot of internal governance policies and practices are unknown, I only have general intuitions about why governance isn’t sufficient that are based on reading Anthropic’s RSP and System Card, and journalistic reporting on organizational and executive conflicts within OpenAI. I think this uncertainty is an indicator that we need far better oversight and investigation into how companies are following through on their governance commitments. I believe that companies have a lot of social, political, and legal capital that enables them to frame actions like laying off engineers, offering strategic bonuses, or cooperating with governments as normal company operations.

A lack of whistleblower protection compounds this, and people within an AI company probably aren’t strongly committed to democratic preservation. Further, we shouldn’t assume they’re more able than the average person to resist social or cultural pressures that enforce complicity or silence within a company[3].

On public accountability: I think the current level of scrutiny applied to model outputs after deployment isn't sensitive enough to bias and disinformation in a way that will hold companies accountable.

  1. I expand on this later, but the amount of people that are able to identify a biased or misinformative model output is probably small. People are more likely to ask about things that they don't know, as opposed to things they have already made up their mind about.
  2. I strongly suspect that when people do spot errors in AI outputs, they don't default to the hypothesis that an AI company is maliciously doing this. The tone of many casual "error reports" on social media suggests users instead think the AI is incapable, or that there is a technical error with the model. AI companies are seen as the conduit through which AI is hosted, not the arbiter of what the model says and does.
  3. Like I describe above, the disinformation could be relatively targeted, so only individuals in certain zip codes or states are receiving the disinformative version of the model.
  4. Even if there is some pattern recognition and critical mass formation after some amount of time, misinformation can scale very quickly: the model will have already been used in untraceable ways to write op-eds, do research, or persuade individuals, whether in innocent or malicious ways. As an analogy, consider the time lag in errata reporting within truth-seeking institutions like scientific journals and newspapers. These sources diffuse slower (e.g. 500 people read an article on Science.org, but 50,000 people watch a CNN clip on YouTube) and in a more traceable way than LLM outputs, but still cause bad second and third-order effects.
Electoral Margins are Small and Persuasion Gains are Sufficient

Hackenburg et. al  (2024) conduct a large-n study of conversational model persuasion. The “persuasive gains” measured in this study demonstrate that LLMs deployed conversationally, without any persuasion-specific fine-tuning, are already capable of producing meaningful attitude change on political issues[4]. I believe a frontier model that has disinformative capabilities would be equally, if not more persuasive, and would therefore be able to cause non-trivial changes in electoral outcomes.

Two caveats on applying this research to an electoral context:

  1. Hackenburg et. al measure agreement with a political statement on a 0–100 percentage-point scale; the persuasive effect is then operationalized as the percentage-point difference between the treatment and control groups. We can’t conflate “change in agreement” with “change in voting behaviour,” because we don’t know the tipping point of agreement that is required on one issue for someone to vote in a particular way.
  2. Hackenburg et. al focus on post-training models to be maximally persuasive, rather than adherent to a specific ideological position and also persuasive. The latter is how a malicious actor would presumably want the model to behave, and I doubt the malicious actor would choose the post-training route. Post-training a model to be maximally persuasive is very costly, and doesn’t fit well with the likely methods of poisoning a model I specified above. However, there are other ways to make the model more persuasive (or more willing to persuade) within the modes of intervention I describe (e.g., via prompting).

This means that we can’t directly extrapolate how many people would vote differently because they converse with an LLM from this research. Regardless, conversation with LLMs have a non-trivial persuasive effect. I hypothesize that the malicious model that gets deployed will be equally, if not more persuasive than current LLMs; which would have non-trivial effects on voter behaviour and electoral outcomes.

To be sure, it’s not guaranteed that deploying a malicious model results in changes to electoral outcomes. Three factors lead me to think there exists a nontrivial likelihood that the persuasive effects of this model reach a threshold where any given electoral outcome occurs because of the model, and wouldn’t have happened otherwise.

  1. Individuals are more engaged with and trusting of the model’s outputs compared to other sites of political discourse like social media. The flood of posts on a feed, the cacophony of any given comment section, and bad persuasive tactics limit the extent to which social media posts are engaging and persuasive. These factors are absent in the isolated chat interface of Claude or Gemini. In comparison, people probably enter sustained conversations with models on topics they haven’t formed a strong opinion on yet, and are likely to trust these outputs because of marketing, the use of sources and web searches, and the generally polished tone of the model outputs.
  2. These exchanges are virtually impossible to regulate like other contexts. A social media company can delete material that gets flagged by their algorithms or user resorts, or attentive individuals in a Reddit thread can downvote misleading content. Private chats between models and users, however, are inaccessible by anyone else, often including the AI company itself.
  3. The margins in recent American presidential races have been incredibly slim; battleground states see victory margins less than 3%. Geographically targeted deployment of persuasion could cause flips, and even a diffuse campaign might capture this 3%.

Further, I think the kinds of malicious political persuasion possible aren’t merely ones that aim to flip someone from red to blue. There are many voters who are undecided, apathetic, or on the fence. The model could try to splinter votes away from a leading party by making them less sure of their decision. It could agitate apathetic voters toward a party (regularly, just over a third of the voting-eligible population doesn't vote in federal elections). A model suggesting that people go vote is generally unsuspicious, but if this galvanization is targeted at certain groups, states, or districts, distorting effects are likely.

We should also consider the second-order, non-conversational effects. Individuals using these models to fact-check, to write articles, or produce other media would become carriers of the misinformation, which would then saturate the information environment that voters engage in with a consistent ideological stance.

From Corporate Capture to State Capture

Election results are extremely irreversible, even when people later discover misinformation or interference (Cambridge Analytica, the Mueller Report). The window of scrutiny is relatively slim, since other priorities surround incumbents. After an election, the media generally shifts to predicting what the incumbents will do, not scrutinizing how they got elected.

The risk here is that there now exists an incredibly close and unsupervised political relationship between a specific AI company and state apparatus. I think it’s likely you see closer collaboration that poses significant risks, such as on defense technologies (and this is then how you get an AI-enabled coup) or access to classified information. Even likelier is just less safety regulation: exemptions from chip or resource restrictions, or from auditing and oversight processes.

When a company and a party or politician collaborate in this way, it is far easier for that company’s needs and capacities to flow through and into state infrastructure in ways that massively increase catastrophic risks.

Regardless of electoral outcomes, the effects of mass disinformation and firm capture are still concerning. If more people believe in harmful conspiracy theories and don’t get vaccinated, or more people are hostile toward immigrants, I think the everyday experience for individuals on the ground is worse. An individual bypassing a company’s entire oversight team and deploying a model with secret loyalties or hidden backdoors could cause other forms of harm beyond disinformation campaigns.

My concern isn’t simply that electoral distortion from maliciously persuasive LLMs could happen. It is that the current structure of internal governance (and what we don’t know about these practices), the gaps in understanding sociopolitical harm and how to evaluate it, and the lack of third-party oversight in monitoring AI company public relations poses a substantial vulnerability.

Open Problems and Suggestions
  1. Safety groups should commit themselves to surveilling and scrutinizing US AI companies at various levels. This includes tracking and broadcasting political donations, offering strong protection to whistleblowers inside companies, and generally advocating for more transparency when corporations and governments engage with each other.
  2. We should take instances of bias or misinformation in models far more seriously. I worry that it is too easy to dismiss the consequences of bias or occasional factual errors as only diffuse harms, and that it is this mindset that leads to benchmarks that aren’t robust. It might be useful to apply some threat heuristic: we are really committed to making sure models don’t give instructions on how to cause harm with chemical weapons, and we should have a similar level of commitment to preventing models from getting individuals that would cause harm with nuclear weapons into positions where they can do so easily.
  3. We should continue to follow zero trust frameworks and stress-test governance and policy proposals with the pessimistic assumption that companies pose a significant barrier to regulation, compliance, and governance.
  1. ^

    "Corporate capture" is the standard term for this phenomenon. Not all AI companies are corporations—"company capture" would be more precise, but for clarity I refer to the process with its standard term.

  2. ^

    I think there are broader arguments one could make about design flaws in benchmarks that test for more qualitative factors.

  3. ^

    This 80,000 Hours article cautions those considering working in AI capabilities research “not to underestimate the possibility of value drift”—attitudes toward AI risk, even on safety teams, are likely more lax in frontier AI firms than at safety organizations.

  4. ^

    “As predicted, the AI was substantially more persuasive in conversation than via static message. [...] We conducted a follow-up one month after the main experiment, which showed that between 36% (chat 1, p < .001) and 42% (chat 2, p < .001) of the immediate persuasive effect of GPT-4o conversation was still evident at recontact—demonstrating durable changes in attitudes” (Hackenburg et. al 2024)



Discuss

Why natural transformations?

Новости LessWrong.com - 2 апреля, 2026 - 02:59

This post is aimed primarily at people who know what a category is in the extremely broad strokes, but aren't otherwise familiar or comfortable with category theory.

One of mathematicians' favourite activities is to describe compatibility between the structures of mathematical artefacts. Functions translate the structure of one set to another, continuous functions do the same for topological spaces, and so on... Many among these "translations" have the nice property that their character is preserved by composition. At some point, it seems that some mathematicians noticed that they:

1. kept defining intuitively similar properties for these different structures

2. had wayyyyyy too much time on their hands

So they generalised this concept into a unified theory. Categories consist of objects mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mi { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-msub { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-munder { display: inline-block; text-align: left; } mjx-over { text-align: left; } mjx-munder:not([limits="false"]) { display: inline-table; } mjx-munder > mjx-row { text-align: left; } mjx-under { padding-bottom: .1em; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } mjx-c.mjx-c1D465.TEX-I::before { padding: 0.442em 0.572em 0.011em 0; content: "x"; } mjx-c.mjx-c2208::before { padding: 0.54em 0.667em 0.04em 0; content: "\2208"; } mjx-c.mjx-c1D436.TEX-I::before { padding: 0.705em 0.76em 0.022em 0; content: "C"; } mjx-c.mjx-c1D453.TEX-I::before { padding: 0.705em 0.55em 0.205em 0; content: "f"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c2C::before { padding: 0.121em 0.278em 0.194em 0; content: ","; } mjx-c.mjx-c1D466.TEX-I::before { padding: 0.442em 0.49em 0.205em 0; content: "y"; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c1D439.TEX-I::before { padding: 0.68em 0.749em 0 0; content: "F"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c2218::before { padding: 0.444em 0.5em 0 0; content: "\2218"; } mjx-c.mjx-c1D454.TEX-I::before { padding: 0.442em 0.477em 0.205em 0; content: "g"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c3A::before { padding: 0.43em 0.278em 0 0; content: ":"; } mjx-c.mjx-c1D44B.TEX-I::before { padding: 0.683em 0.852em 0 0; content: "X"; } mjx-c.mjx-c2192::before { padding: 0.511em 1em 0.011em 0; content: "\2192"; } mjx-c.mjx-c1D44C.TEX-I::before { padding: 0.683em 0.763em 0 0; content: "Y"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c2115.TEX-A::before { padding: 0.683em 0.722em 0.02em 0; content: "N"; } mjx-c.mjx-c2282::before { padding: 0.54em 0.778em 0.04em 0; content: "\2282"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c69::before { padding: 0.669em 0.278em 0 0; content: "i"; } mjx-c.mjx-c6D::before { padding: 0.442em 0.833em 0 0; content: "m"; } mjx-c.mjx-c221E::before { padding: 0.442em 1em 0.011em 0; content: "\221E"; } mjx-c.mjx-c1D43A.TEX-I::before { padding: 0.705em 0.786em 0.022em 0; content: "G"; } mjx-c.mjx-c1D702.TEX-I::before { padding: 0.442em 0.497em 0.216em 0; content: "\3B7"; } and morphisms connecting objects. Morphisms are closed by composition. As in our opening examples, we will think of objects as sets and of morphisms as functions, even though the language of categories is strictly more expressive than that. Once we have categories, we reflexively wish to define a "morphism of categories". Given categories C, D a functor F sends objects to objects and morphisms to morphisms such that composition of morphisms can be done inside the category C or inside D after applying the functor: .

Still possessing of some time, you might next wonder how to define a morphism between two functors. This is where, in my experience, there ceases to be an "obvious" thing to do. All the morphisms we have considered thus far are functions, but it's not even clear from where to where a candidate function should go, since functors are not themselves sets.

To make the idea of a natural transformation seem not-entirely-crazy, it's worth taking a slightly different perspective on what more "preservation of structure" could mean. Consider the category of metric spaces with morphisms defined as continuous functions between them. One can think of continuity as being about the induced topologies, but metric spaces have additional properties that allow for a more specific interpretation. Notably, this includes the uniqueness of limits, which defines an operation on some sequences which takes that limit. This operation is completely integral to the abstract appeal of metric spaces. Moreover, the key characteristic of continuous functions is that they give us the right to permute when we perform this operation. Given a continuous function and a sequence with a limit , we have . This makes continuous functions a satisfying concept for defining morphisms because they afford execution of the fundamental operation on metric spaces in either the source or the target (whichever is most convenient).

Abstracting away to categories, the conceptual appeal of a functor is that it respects the structure of morphisms between objects. Consequently, a good "morphism" between functors F and G (both between categories C and D) would allow us to disregard whether for any morphism , we use or for calculations inside D. That is, we need enough semantic content in the morphism to always commute the following diagram[1]:

This motivates the definition of natural transformations as families of maps , where , such that each diagram of the above type is commuted. Reassuringly, the functors from C to D as objects, equipped with natural transfomations between these functors as morphisms, themselves form a category!

  1. ^

    "commuting diagrams" is standard terminology in category theory that encodes the ability to permute, replace or swap out operations.



Discuss

Страницы

Подписка на LessWrong на русском сбор новостей