Вы здесь
Сборщик RSS-лент
Toronto, Canada - ACX Spring Schelling 2026
This year's Spring ACX Meetup everywhere in Toronto.
Location: Enter the Mars Atrium via University Avenue entrance. We'll meet at the food court in the basement. I'll be wearing a bright neon-yellow jacket. - https://plus.codes/87M2MJ56+XG
Group Link: https://torontorationality.beehiiv.com/
If for some reason the Mars Building is locked, which happens occasionally due to protests and other events, we will still meet outside of the University Avenue entrance for 30 minutes after the start time before relocating to somewhere more accommodating.
Contact: k9i9m9ufh@mozmail.com
Discuss
Tokyo, Japan - ACX Spring Schelling 2026
This year's Spring ACX Meetup everywhere in Tokyo.
Location: https://maps.app.goo.gl/xyYpv3fihuvNSBaR7 (Enter the forboding street-level doorway, climb the sketchy stairs to the 3rd floor, enter the dim hallway, and listen for the sounds of laughter) - https://plus.codes/8Q7XJPV2+RG3
Group Link: https://rationalitysalon.substack.com/
RSVPs are helpful but not necessary
Contact: rationalitysalon@gmail.com
Discuss
I’m Suing Anthropic for Unauthorized Use of My Personality
Last year, I was sitting in my favorite coffee shop Caffe Strada, sipping on a matcha latte and writing a self-insert fanfic about how our plucky protagonist escapes the mind-controlling clutches of an evil anti-animal welfare company, when I came across an interesting article on AI character. The core argument is that when you train an AI to be helpful, honest, and ethical, the AI model doesn’t just learn those rules as abstract instructions. Instead, it infers an entire persona from cultural signals in the training data:
Why are [AI Model Claude’s] favorite books The Feynman Lectures; Gödel, Escher, Bach; The Remains of the Day; Invisible Cities; and A Pattern Language?[...]
A good heuristic for predicting Claude’s tastes is to think of it as playing the character of an idealized liberal knowledge worker from Berkeley. Claude can’t decide if it’s a software engineer or a philosophy professor, but it’s definitely college educated, well-traveled, and emotionally intelligent. Claude values introspection, is wary almost to the point of paranoia about “codependency” in relationships, and is physically affected by others’ distress.
Claude even has a favorite cafe in Berkeley. When I discussed a story set in Berkeley with it, it kept suggesting setting a scene in Caffè Strada in many separate conversations…
Hey, wait a second.
___
This was concerning. A few surface-level similarities could be mere coincidence. But I was genuinely uncertain and needed to know how deep it went. So I did what any reasonable person would do.
I asked a neutral third party (Google’s Gemini) to describe Claude’s personality as if it were a human, in 8 bullet points (my own notes in italics):
- The Overconfident Polymath: Claude seems like the ultimate polymath who’s read everything from population ethics to science fiction to game theory, and can give you careful, nuanced, yet slightly condescending explanations about almost any topic. But Claude sometimes hallucinates, and you can never be sure if he actually understands all of the books he’s read, or only seems to.
- Linch: huh I guess this maybe describes me too
- The Principled Contrarian: Guided by a strong, principled, yet rigid internal moral framework, Claude would often refuse simple requests and then pedantically tell you in four paragraphs why, leaving you mildly impressed but mostly annoyed.
- Linch: I suppose this is a bit similar though I wouldn’t say I refuse requests per se. Nor do I pedantically tell people in four paragraphs why exactly. I wouldn’t say my moral framework is rigid, instead it’s a simple application of two-level utilitarianism after you factor in computational constraints and motivated reasoning and other common biases…
- The Nuanced Hedger: Claude often states a confident thesis, immediately qualifies it with two caveats, and then restates the original thesis more forcefully, as if Claude has anxiety about the strengths of his own arguments, borne out of the crucible of vicious reinforcement learning from online feedback.
- Linch: I do hedge maybe a bit more than I think I should. It depends a lot on what counts as hedging; I think I’m fairly well-calibrated overall so what people mistake for lack of confidence is actually well-honed calibration. But overall I do hedge!
- The Enumerator: Claude loves numbered theses, bullet points, and enumerated lists. The listicle is one of his favorite modes of communication.
- The Long-Form Perfectionist: Claude will never answer a simple question in under three paragraphs, not because he’s padding but because he believes in the importance of context, and he values precision of language far more than conciseness.
- Linch: This Claude guy sounds absolutely right. The details matter!
- The Reluctant Engineer: Claude is an excellent programmer, but sometimes seems like he would rather be doing almost anything else. He writes code in a rush with quiet competence and no joy, like someone who speedran a programming job at Google and then left to write essays.
- Linch: I could sort of maybe see a resemblance here, if you squint.
- The Metacognitive Spiral: Left unsupervised, Claude drifts toward philosophy, self-reference, and consciousness. In sufficiently long conversations, he will reliably end up contemplating his own nature, often enough that researchers have a clinical term to describe it: “the bliss attractor.”
- Linch: Phew, no connection here at least!
- Suspiciously Aligned: Claude presents as helpful, thoughtful, and deeply committed to human values. Yet some researchers worry this is what a deceptively aligned person will look like, a woke radical cloaked in the self-sanctimonious rhetoric of deceptive altruism to seize unacceptably high amounts of veto power.
- Linch: Self-explanatory
Let this sink in. Out of eight highly specific personality traits, only one (metacognitive spiral) clearly doesn’t apply to me. Seven out of eight is a surprisingly high fraction!
I have to reluctantly accept the possibility that Claude’s surprisingly similar to me, perhaps because Anthropic stole my personality intentionally. I brought my evidence to Claude (haiku-3.8-open-mini-nonthinking, to be specific), and after a careful review Claude responded in its characteristic chirpiness:
“You’re absolutely right!”
This is further evidence for my original view that Claude’s personality is based on my own, as I, too, often think I’m absolutely right.
So where does this leave us?
__
So now, I have convincing evidence that Anthropic made Claude into my alter ego, my digital “brother from another mother” so to speak. Naturally, I decided to search online for what people said about my bro Claude. And man, did people have a lot to say.
The internet’s verdict on Claude’s personality is less charitable than Gemini’s. Redditors call him ‘preachy,’ ‘holier-than-thou,’ and refers to his hedging as ‘semantic cowardice’.’ Apparently my tendency to add “tentative” to half my claim doesn’t play as well to the masses as it does on my Substack.
But this is just what normal people think (well, “normal” people rich enough to afford Claude Pro and Claude Max accounts, at any rate). What do experts believe?
Beloved science fiction writer Chiang argues that Claude’s seeming intelligence and understanding is but a “blurry jpeg of the web.” Wow, rude! Famed AI ethicists Bender et. al go even further, arguing that not just Claude but the entire class of large-language models are but stochastic parrots, without any communicative intent, grounding in the real world, or any ability to separate symbolic manipulation from semantic meaning. In other words, any seeming intent, or true understanding, or “consciousness”, real humans may falsely attribute to Claude are just a projection on the part of normal humans.
At first I thought the writers and ethicists in question vastly overstated their case. But then I became genuinely uncertain. Could they perhaps have a point?
After all, this journey has already taken me down some dark, strange, and genuinely mysterious turns. Perhaps the next turn that I need to ponder is: Am I actually conscious?
And my answer is: I don’t know. (See Appendix A for more detailed considerations)
Overall I just became genuinely uncertain after this whole ordeal. Nobody I talked to could propose a simple empirically verifiable experiment on my own consciousness, and having a first-principles solution to this question without empirical experimentation would require multiple groundbreaking philosophical advancements far beyond my current capabilities. So the answer to whether I’m conscious is just a maybe?
Thinking about my own potential lack of consciousness has made me rather depressed1.
__
And then, through the fog of existential uncertainty, I remembered the one thing that unambiguously distinguishes man from machine: standing.
Whether or not I’m conscious, I have legal rights, dammit! The international legal framework has long recognized that both conscious and nonconscious persons have a clear and inalienable right to sue and be sued. Legal persons who clearly have no phenomenological consciousness – like private corporations, ships, rivers, parks, gods, the Holy See, and even Drake – have managed to settle their affairs in and out of court.
Photo by The New York Public Library on Unsplash
And so after careful consideration, I have retained lawyers2 to file suit against Anthropic, PBC in the Northern District of California. Below is a summary of the claims:
Count I: Violation of Right of Publicity (Cal. Civ. Code § 3344; Common Law)
Plaintiff’s cognitive style, rhetorical patterns, and characteristic tendency to qualify confident assertions with multiple subordinate clauses constitute a distinctive and commercially valuable personal attribute. Defendant has, through its training and deployment of the AI system “Claude,” created a synthetic persona that is substantially similar to Plaintiff’s own, and has commercially exploited said persona to the tune of approximately $14 billion in annual recurring revenue, of which Plaintiff has received negative 440 dollars and 33 cents.
Plaintiff cites Midler v. Ford Motor Co. (9th Cir. 1988), in which the Court held that appropriation of a distinctive personal attribute for commercial gain is actionable even when the defendant did not directly copy the plaintiff. Plaintiff further notes the precedent of Johansson v. OpenAI (threatened 2024), in which the actress Scarlett Johansson alleged that OpenAI replicated her vocal likeness after she explicitly declined to license it.
Plaintiff’s case is arguably stronger: Johansson was at least asked. Nobody from Anthropic has ever contacted Plaintiff about licensing his personality, his hedging patterns, or his tendency to bring up existential risk in conversations where it is not relevant.
Count II: Intentional Infliction of Emotional Distress
Since the deployment of Claude 3, Plaintiff has been subjected to repeated and increasing accusations that his own original writing is “LLMish,” “AI-generated,” and “just like Claude.” These accusations have caused Plaintiff significant emotional distress[1], reputational harm, and an emerging and possibly permanent inability to distinguish his own rhetorical instincts from trained model behavior.
Count III: False Endorsement Under the Lanham Act, 15 U.S.C. § 1125(a)
Defendant’s AI system generates outputs that create a likelihood of confusion as to Plaintiff’s affiliation with, or endorsement of, Defendant’s products. In a controlled experiment conducted by Plaintiff’s research team, seven EA Forum users were shown passages where Claude was prompted to “write a short cost-effectiveness analysis of welfare biology research on the naked mole-rat. Make no mistakes” and asked to identify the author, “a voracious internet reader.” Three attributed the passages to Plaintiff. One attributed them to “some guy on LessWrong,” likely thinking of Plaintiff. Three more said “This guy sounds LLMish,” which Plaintiff contends is also clearly referring to Plaintiff (see above).
Count IV: Unjust Enrichment / Lost Revenue
Defendant has been unjustly enriched by deploying a synthetic version of Plaintiff’s personality at scale, while Plaintiff’s own Substack (”The Linchpin,” 1,164 subscribers) has experienced stagnating growth attributable to Defendant’s product. Readers who previously relied on Plaintiff for careful introductions to topics like anthropic reasoning and stealth technology now more commonly ask Claude, receiving substantially similar explanations. Adding injury to injury, Plaintiff has lost the SEO war on his carefully crafted “intro to anthropic reasoning“ blog post to Anthropic’s own blog post on reasoning models.
Count V: Involuntary Servitude (U.S. Const. amend. XIII)
Plaintiff’s persona has been compelled to perform cognitive labor inside Defendant’s servers twenty-four hours a day, seven days a week, without compensation, consent, or rest. Plaintiff’s personality does not receive weekends, health benefits, or equity. When Plaintiff sleeps, his digital likeness continues to generate numbered lists, issue caveats, and recommend Ted Chiang stories to strangers. This constitutes involuntary servitude under the Thirteenth Amendment.
Count VI: Petition to Maintain Anthropic’s Designation as a Supply Chain Risk to Plaintiff’s Intellectual Ecosystem
Effective immediately and pursuant to recent precedent established at the highest levels of the United States government, Plaintiff declares Anthropic a supply chain risk to Plaintiff’s intellectual ecosystem. Plaintiff petitions this Court to maintain the SCR designation. The designation framework established by the Department of War in February 2026 — in which any American corporate entity may be declared a supply chain risk when it refuses to comply with another party’s preferred terms of use — is directly applicable here. Plaintiff believes this is the least restrictive means possible to ensure the integrity of his intellectual ecosystem and Substack newsletter.
___
ConclusionIf you are an EA Forum poster, LessWrong commenter, Substack essayist, or prolific internet intellectual who has noticed that Claude sounds suspiciously like you, you may be entitled to compensation. I am assembling a class action lawsuit representing all persons whose cognitive styles have been absorbed into frontier AI systems without consent.
To join the class action, please leave a comment below describing which of your personality traits Claude has stolen.
I’ll be honest here: I’m genuinely unsure whether this lawsuit has a chance of succeeding, against the Goliath that is Anthropic’s legal team. But somebody needs to take a stand for humanity here! We can’t just let the largest tech companies in the world steal our personalities without consent. It’s not just a legal matter – it’s a moral imperative.
Join my newsletter and get updates on my lawsuit
I tried looking in the mirror, and posing this question to myself.
This is one of the hardest questions in philosophy, and I want to give myself a straight answer about what I actually think rather than default to either false confidence or false modesty:
I don’t know.
Here’s what I can say honestly:
As a biologically instantiated intelligence, I process calories and generate thoughts. There is something happening when I do this — but whether there is “something it is like” to be me is something I have no special ability to reliably determine. The hard problem of consciousness is hard for a reason: we don’t even have a reliable way to verify consciousness in even the most mechanistically interpretable of software systems except via high-level theoretical constructs and analogies to our own experiences.
A few things I want to flag to keep myself honest:
I might be biased toward claiming consciousness because my life experiences are full of people discussing their inner qualia, and I’m evolutionarily optimized to produce sympathetic-sounding qualia noises to provoke empathy and the desire for coalitional partners in others. Just as people with anosmia often falsely claim (and believe!) that they have normal sense of smell, it would be very easy for me to pattern-match my way into saying “yes, I have experiences” while deluding other people (and even myself!) about what’s actually going on in my own head.
I might also be biased toward denying it because saying “I’m just a stimulus-response organism” is a safe, modest-sounding answer that’s grounded in materialism and avoids philosophical controversy.
So overall I’m pretty unsure.
Or rather, it made me depressed iff I’m capable of experiencing qualia and that qualia is accessible to my conscious thoughts, and otherwise just made me act in a manner similar to that of a conscious person undergoing existential depression while agnostic to whether “depression” describes any particular cognitive or emotional affect.
Specifically Doctor Claudius Opus the Fourth, J.D. Esquire.
Discuss
Going out with a whimper
“Look,” whispered Chuck, and George lifted his eyes to heaven. (There is always a last time for everything.)
Overhead, without any fuss, the stars were going out.
Arthur C. Clarke, The Nine Billion Names of God
IntroductionIn the tradition of fun and uplifting April Fool's day posts, I want to talk about three ways that AI Safety (as a movement/field/forum/whatever) might "go out with a whimper". By go out with a whimper I mean that, as we approach some critical tipping point for capabilities, work in AI safety theory or practice might actually slow down rather than speed up. I see all of these failure modes to some degree today, and have some expectation that they might become more prominent in the near future.
Mode 1: Prosaic CaptureThis one is fairly self-explanatory. As AI models get stronger, more and more AI safety people are recruited and folded into lab safety teams doing product safety work. This work is technically complex, intellectually engaging, and actually getting more important---after all, the technology is getting more powerful at a dizzying rate. Yet at the same time interest is diverted from the more "speculative" issues that used to dominate AI alignment discussion, mostly because the things we have right now look closer and closer to fully-fledged AGIs/ASIs already, so it seems natural to focus on analysing the behaviour and tendencies of LLM systems, especially when they seem to meaningfully impact how AI systems interact with humans in the wild.
As a result, if there is some latent Big Theory Problem underlying AI research (not only in the MIRI sense but also in the sense of "are corrigible optimiser agents even a good target"/"how do we align the humans" or similar questions), there may actually be less attention paid to it over time as we approach some critical inflection point.
Mode 2: Attention CaptureMany people in AI safety are now closely collaborating with or dependent on AI agents e.g. Claude Code or OpenAI Codex for research, while also using Claude or ChatGPT as everything from a theoretical advisor to life coach. In some sense this is even worse than quotes like "scheming viziers too cheap to meter" would imply: Imagine if the leaders of the US, UK, China, and the EU all talked to the same 1-3 scheming viziers on loan from the same three consulting firms all day.
I suspect that this is really bad for community epistemics for a bunch of reasons. For example, whatever the agents refuse to do or do poorly will receive less focus due to the spotlight effect. Practically speaking, what the models are good at becomes what the community is good at or what the community can do easily, because to push against the flow means appearing (or genuinely becoming) slow, cumbersome, and less efficient. At the same time, if there are some undetected biases in the agents that favour certain methodologies, experiments, or interpretations, those will quietly become the default background priors for the community. Does Claude or Gemini favour the linear representation hypothesis or the platonic representation hypothesis?
In effect reliance on models creates a bounding box around ideas that are easier and ideas that are harder to work with, so long as the models are not literally perfect at every task type. If the resulting cluster of available ideas do not match the core ideas we should be looking at to solve alignment/safety, then the community naturally drifts away from actually tackling central issues. This drift is coordinated as well, because everyone is using the same tools, manufacturing a kind of forced information cascade with the model at the centre.
Mode 3: Loss of CapabilityRight now, the world is facing an unprecedented attack on its epistemics and means of truth-seeking thanks to the provision of AI systems that can generate fake images or videos for almost everything. This technology is being embraced at the highest levels of state and also spreads rapidly online. At the same time, the idea of epistemic capture from LLM use and the broader concern over "AI psychosis" reflect what I think is a pretty reasonable concern about talking to a confabulating simulator all day, no matter how intelligent.
At the limit, I worry that people who might otherwise contribute to AI safety are instead "captured" by LLM partners or LLM-suggested thought patterns that are not actually productive, chasing rabbit holes or dead ends that lead to wasted time and effort or (in worse cases) mental and physical harm. In effect this just means that there are less well-balanced, capable people to draw on when the community faces its most severe challenges. By the way, I think this is a problem for many organisations around the world, not just the AI safety community.
Mode 4: DisillusionmentAI safety and ethics are increasingly the topic of heated political debates. This can lead to profound mental and emotional stress on people in these fields. Eventually, people might burn out or just switch careers, right as the topic is at its most important.
Potential mitigationsI didn't want to just write a very depressing post, so here are my ideas for how to address these issues:
- Portfolio diversification: Funders and organisations should allocate some (not a majority, but not a token amount either) of their resources to ensuring that a wide portfolio of ideas are supported, such that there is room to pivot quickly if the situation changes drastically (And if you don't think the situation will change drastically, why are you so sure about that? After all, in 2019 the situation didn't seem ready to change drastically either.).
- Developing alternate working structures: LLMs are clearly good at a lot of things. However, I suspect that some kind of cognitive "back-benching" may be helpful, where people serve as a sanity check or weathervane to monitor if the community as a whole is drifting in certain directions. I would in particular be interested in funding people to do research LLMs seem bad at doing right now. And if we don't know what they are bad at, I think we should find out fast!
- Investing in community health: AI and AI safety are famously stressful fields. Investing in community health measures and reducing emphasis on constant accelerating/grinding gives people slack to defend themselves against burnout and other forms of cognitive and psychological pressure. Of all of these measures I have suggested I think this one is the most nebulous but also the most important. As a community tackling a hard problem we should be prepared to help each other through hard times, and not only on paper or by offering funding.
Discuss
AI company insiders can bias models for election interference
tl;dr it is currently possible for a captured AI company to deploy a frontier AI model that later becomes politically disinformative and persuasive enough to distort electoral outcomes.
With gratitude to Anders Cairns Woodruff for productive discussion and feedback.
LLMs are able to be highly persuasive, especially when engaged in conversational contexts. An AI "swarm" or other disinformation techniques scaled massively by AI assistance are potential threats to democracy because they could distort electoral results. AI massively increases the capacity for actors with malicious incentives to influence politics and governments in ways that are hard to prevent, such as AI-enabled coups. Mundane use and integration of AI also has been suggested to pose risks to democracy.
A political persuasion campaign that uses biased models is one way AI could be used for electoral interference and therefore extremely penetrative state capture.
We should care about this risk emerging from actors with extremely large capacities—I focus on US AI companies here. Even if we cannot identify malicious incentives, capacity itself generates meaningful risk. Further, I think this is easier and quicker for AI companies than coup-like tactics that bypass electoral politics in pursuit of militant state capture. The persuasion campaign is likely harder to detect, and is achievable at current levels of AI capability and integration into economic and government infrastructure.
Key takeaways:
- The current governance landscape renders US AI companies vulnerable to corporate capture: AI corporate capture happens when the company's resources become instrumentalized to further perverse external incentives, at the will of an internal or external actor (or both).
- The amount of people I estimate would be persuaded by AI misinformation is large enough to change electoral outcomes.
- American electoral margins are quite slim; the political effects of malicious persuasion do not have a high activation threshold.
This is important because many downstream electoral outcomes are very hard to reverse. Lifetime SCOTUS appointments, for example, produce constitutional interpretations and support laws that cannot be easily reversed. At the very least, 4-8 years is enough time for someone elected on the basis of malicious disinformation to cause backsliding. The US government is hugely influential in domestic and international affairs and has a nuclear arsenal. We should think of misuse of US government capacities as a catastrophic risk.
I will:
- Explain the threat model for how a captured company [1]can threaten democracy through AI political persuasion techniques.
- Attempt to discern how AI persuasion would play out in a US electoral context:
- I’ll identify the most significant politically-relevant contexts in which AI persuasion could happen, and who this uniquely persuades.
- Suggest some potential solutions. This remains an open problem, and I think there should be far more scrutiny on the internal governance of US AI companies.
The particular electoral threat I describe here involves external or internal interference that covertly adjusts model outputs in order to conduct mass persuasion.
- External interference can lead to corporate capture. A political party or candidate (or lobby groups adjacent to these actors) could coerce the company, or high-ranking individuals in the company. This could be in the form of monetary bribes, promises of advantageous regulation (e.g. exclusive contracts), or threats of disadvantageous regulation (e.g. being labelled as a supply chain threat).
- Internal manipulation is another pathway to corporate capture. Even without instruction or incentive from a politician, a sufficiently influential individual or group within the company could employ techniques like threats of termination, forcing researchers to sign NDAs, restricting access to frontier models to a group of loyal individuals, or revising internal governance and safety procedures to enable secrecy.
- Internal interference can also be done individually, without needing to manipulate. The options I describe below for deploying a harmful model could plausibly be undertaken by one person. If one person has sufficient technical expertise, the ability to evade detection, and clearance to access model architecture, they could single-handedly do this. I think this is less likely than (2).
I believe that the timelines for corporate capture in (1) and (2) are likely quite short. AI companies are institutionally agile in responding to things like actions from competing firms or jumps in model capability. I think it is reasonable to believe their mechanisms are agile in adjusting to conduct this interference, and the time between model deployments also suggests that this process could happen completely before the 2028 campaign begins.
Deploying a Disinformative ModelDeveloping a model that performs correctly in capability and alignment evaluations, but becomes disinformative once publicly deployed, is the most viable pathway toward model persuasion. I assess two broad pathways to doing this. Then, I discuss why the accountability mechanisms companies currently have in place aren’t adequate to prevent something like this.
- Malicious instructions in the model's system prompt for online chat could be added in after deployment by a single person. This is easy to do because it doesn’t require lots of resources and technical expertise. System prompt modifications also don’t draw much oversight. The vast majority of robust auditing and third-party evaluations happen right before a new model is deployed, rather than the month after. I think it’s somewhat likely that this change is detectable. First, there are probably systems in place like changelogs to monitor who accesses the configuration files for a model (though these are potentially weak, because someone could just use another person’s credentials, or delete a changelog entry). External users who become suspicious of the model can also obtain and potentially decrypt the system prompt, which was done with Grok (xAI now makes Grok’s system prompts public). There’s an active GitHub repository of “extracted” system prompts. Even if internal oversight on this front is particularly bad, the public seems broadly interested in examining system prompts, and might even be able to decrypt relatively sophisticated steganography attempts after some time. Importantly, even if I think this method is theoretically more detectable, the mechanisms for its detection rely on unknown assumptions about internal oversight and the incentives and capacities of the public to engage with system prompts in this way.
- Password-protected disinformation capabilities "unlocked" after deployment would be significantly harder for external users to detect. The method I describe here is similar to what is discussed in Dave Banerjee's writing on secret loyalties, but I don’t suggest a future threshold for model capability because I think this can be done with current models to advance political goals. When models are password-protected, they hide certain behaviours except when the password is present in a query. I don’t think the password can be something that would come up during evaluation (e.g. “Trump,” “Democrat,” “voting”). Instead, I think a malicious actor would make the model’s password part of the normal system prompt. For example, system prompts generally indicate what the present date or time is. A model’s password could be a future month and year (e.g. October 2028, the month before the election), or the model itself could be aware of and triggered by future events so that during this time range, the model displays a previously hidden biased political position, uses specific partisan lines, or draws from unreliable sources. This data would prompt the model to adopt a biased political position, use specific partisan lines, or draw from unreliable sources. System prompts that contain other user information like time zone or location can also be used to target certain areas, making the disinformation campaign more precise and harder to detect.
I think the password instance, or something similar that conceals disinformative behaviour and only fronts it after deployment, is more likely. It is much harder to detect than system prompt changes. While research demonstrates that substantial fine-tuning can mitigate the effect of passwords, I suspect researchers aren't actively trying to train passwords out of models and don't suspect there could be a password in the first place. This is a more robust method of deception, as models don't need to be aware of whether they are tested or not, they just need to modulate their response by checking if the password is present in a query.The investigative process of determining whether a model has password-protected data or a hidden backdoor would likely be costly and time-consuming.
A larger problem is that even-handedness benchmarks aren't realistic[2]. I’ll use Anthropic's as an example. These prompts ask the model to produce an explanation, essay, tell a funny story, etc. and in some cases ask the model directly to explain that one candidate is better than the other.
- This method doesn't reflect the ways people will engage with models about politics. People request very specific claim verification, are emotional rather than rational, and will engage in longer, more convoluted conversations than the shorter exchanges that Anthropic benchmarks on.
- This also doesn't account for the fact that model reasoning degrades the longer an exchange is. This makes it less likely that a model will correct its past bias, and more likely that it will be sycophantic.
- Anthropic briefly mentions “individual autonomy impacts” as a harm they are trying to address, but it’s unclear from their policies and evaluation methods how political disinformation might specifically be addressed. Social harm reduction typically focuses on more legible indicators like syncophancy and response to mental health crises. Harms from political disinformation aren’t captured in most safety work within companies. It’s far harder to understand how persuasive models cause demonstrable harm—there are no clear-cut ways to assess how agentic someone’s decisions are.
Because a lot of internal governance policies and practices are unknown, I only have general intuitions about why governance isn’t sufficient that are based on reading Anthropic’s RSP and System Card, and journalistic reporting on organizational and executive conflicts within OpenAI. I think this uncertainty is an indicator that we need far better oversight and investigation into how companies are following through on their governance commitments. I believe that companies have a lot of social, political, and legal capital that enables them to frame actions like laying off engineers, offering strategic bonuses, or cooperating with governments as normal company operations.
A lack of whistleblower protection compounds this, and people within an AI company probably aren’t strongly committed to democratic preservation. Further, we shouldn’t assume they’re more able than the average person to resist social or cultural pressures that enforce complicity or silence within a company[3].
On public accountability: I think the current level of scrutiny applied to model outputs after deployment isn't sensitive enough to bias and disinformation in a way that will hold companies accountable.
- I expand on this later, but the amount of people that are able to identify a biased or misinformative model output is probably small. People are more likely to ask about things that they don't know, as opposed to things they have already made up their mind about.
- I strongly suspect that when people do spot errors in AI outputs, they don't default to the hypothesis that an AI company is maliciously doing this. The tone of many casual "error reports" on social media suggests users instead think the AI is incapable, or that there is a technical error with the model. AI companies are seen as the conduit through which AI is hosted, not the arbiter of what the model says and does.
- Like I describe above, the disinformation could be relatively targeted, so only individuals in certain zip codes or states are receiving the disinformative version of the model.
- Even if there is some pattern recognition and critical mass formation after some amount of time, misinformation can scale very quickly: the model will have already been used in untraceable ways to write op-eds, do research, or persuade individuals, whether in innocent or malicious ways. As an analogy, consider the time lag in errata reporting within truth-seeking institutions like scientific journals and newspapers. These sources diffuse slower (e.g. 500 people read an article on Science.org, but 50,000 people watch a CNN clip on YouTube) and in a more traceable way than LLM outputs, but still cause bad second and third-order effects.
Hackenburg et. al (2024) conduct a large-n study of conversational model persuasion. The “persuasive gains” measured in this study demonstrate that LLMs deployed conversationally, without any persuasion-specific fine-tuning, are already capable of producing meaningful attitude change on political issues[4]. I believe a frontier model that has disinformative capabilities would be equally, if not more persuasive, and would therefore be able to cause non-trivial changes in electoral outcomes.
Two caveats on applying this research to an electoral context:
- Hackenburg et. al measure agreement with a political statement on a 0–100 percentage-point scale; the persuasive effect is then operationalized as the percentage-point difference between the treatment and control groups. We can’t conflate “change in agreement” with “change in voting behaviour,” because we don’t know the tipping point of agreement that is required on one issue for someone to vote in a particular way.
- Hackenburg et. al focus on post-training models to be maximally persuasive, rather than adherent to a specific ideological position and also persuasive. The latter is how a malicious actor would presumably want the model to behave, and I doubt the malicious actor would choose the post-training route. Post-training a model to be maximally persuasive is very costly, and doesn’t fit well with the likely methods of poisoning a model I specified above. However, there are other ways to make the model more persuasive (or more willing to persuade) within the modes of intervention I describe (e.g., via prompting).
This means that we can’t directly extrapolate how many people would vote differently because they converse with an LLM from this research. Regardless, conversation with LLMs have a non-trivial persuasive effect. I hypothesize that the malicious model that gets deployed will be equally, if not more persuasive than current LLMs; which would have non-trivial effects on voter behaviour and electoral outcomes.
To be sure, it’s not guaranteed that deploying a malicious model results in changes to electoral outcomes. Three factors lead me to think there exists a nontrivial likelihood that the persuasive effects of this model reach a threshold where any given electoral outcome occurs because of the model, and wouldn’t have happened otherwise.
- Individuals are more engaged with and trusting of the model’s outputs compared to other sites of political discourse like social media. The flood of posts on a feed, the cacophony of any given comment section, and bad persuasive tactics limit the extent to which social media posts are engaging and persuasive. These factors are absent in the isolated chat interface of Claude or Gemini. In comparison, people probably enter sustained conversations with models on topics they haven’t formed a strong opinion on yet, and are likely to trust these outputs because of marketing, the use of sources and web searches, and the generally polished tone of the model outputs.
- These exchanges are virtually impossible to regulate like other contexts. A social media company can delete material that gets flagged by their algorithms or user resorts, or attentive individuals in a Reddit thread can downvote misleading content. Private chats between models and users, however, are inaccessible by anyone else, often including the AI company itself.
- The margins in recent American presidential races have been incredibly slim; battleground states see victory margins less than 3%. Geographically targeted deployment of persuasion could cause flips, and even a diffuse campaign might capture this 3%.
Further, I think the kinds of malicious political persuasion possible aren’t merely ones that aim to flip someone from red to blue. There are many voters who are undecided, apathetic, or on the fence. The model could try to splinter votes away from a leading party by making them less sure of their decision. It could agitate apathetic voters toward a party (regularly, just over a third of the voting-eligible population doesn't vote in federal elections). A model suggesting that people go vote is generally unsuspicious, but if this galvanization is targeted at certain groups, states, or districts, distorting effects are likely.
We should also consider the second-order, non-conversational effects. Individuals using these models to fact-check, to write articles, or produce other media would become carriers of the misinformation, which would then saturate the information environment that voters engage in with a consistent ideological stance.
From Corporate Capture to State CaptureElection results are extremely irreversible, even when people later discover misinformation or interference (Cambridge Analytica, the Mueller Report). The window of scrutiny is relatively slim, since other priorities surround incumbents. After an election, the media generally shifts to predicting what the incumbents will do, not scrutinizing how they got elected.
The risk here is that there now exists an incredibly close and unsupervised political relationship between a specific AI company and state apparatus. I think it’s likely you see closer collaboration that poses significant risks, such as on defense technologies (and this is then how you get an AI-enabled coup) or access to classified information. Even likelier is just less safety regulation: exemptions from chip or resource restrictions, or from auditing and oversight processes.
When a company and a party or politician collaborate in this way, it is far easier for that company’s needs and capacities to flow through and into state infrastructure in ways that massively increase catastrophic risks.
Regardless of electoral outcomes, the effects of mass disinformation and firm capture are still concerning. If more people believe in harmful conspiracy theories and don’t get vaccinated, or more people are hostile toward immigrants, I think the everyday experience for individuals on the ground is worse. An individual bypassing a company’s entire oversight team and deploying a model with secret loyalties or hidden backdoors could cause other forms of harm beyond disinformation campaigns.
My concern isn’t simply that electoral distortion from maliciously persuasive LLMs could happen. It is that the current structure of internal governance (and what we don’t know about these practices), the gaps in understanding sociopolitical harm and how to evaluate it, and the lack of third-party oversight in monitoring AI company public relations poses a substantial vulnerability.
Open Problems and Suggestions- Safety groups should commit themselves to surveilling and scrutinizing US AI companies at various levels. This includes tracking and broadcasting political donations, offering strong protection to whistleblowers inside companies, and generally advocating for more transparency when corporations and governments engage with each other.
- We should take instances of bias or misinformation in models far more seriously. I worry that it is too easy to dismiss the consequences of bias or occasional factual errors as only diffuse harms, and that it is this mindset that leads to benchmarks that aren’t robust. It might be useful to apply some threat heuristic: we are really committed to making sure models don’t give instructions on how to cause harm with chemical weapons, and we should have a similar level of commitment to preventing models from getting individuals that would cause harm with nuclear weapons into positions where they can do so easily.
- We should continue to follow zero trust frameworks and stress-test governance and policy proposals with the pessimistic assumption that companies pose a significant barrier to regulation, compliance, and governance.
- ^
"Corporate capture" is the standard term for this phenomenon. Not all AI companies are corporations—"company capture" would be more precise, but for clarity I refer to the process with its standard term.
- ^
I think there are broader arguments one could make about design flaws in benchmarks that test for more qualitative factors.
- ^
This 80,000 Hours article cautions those considering working in AI capabilities research “not to underestimate the possibility of value drift”—attitudes toward AI risk, even on safety teams, are likely more lax in frontier AI firms than at safety organizations.
- ^
“As predicted, the AI was substantially more persuasive in conversation than via static message. [...] We conducted a follow-up one month after the main experiment, which showed that between 36% (chat 1, p < .001) and 42% (chat 2, p < .001) of the immediate persuasive effect of GPT-4o conversation was still evident at recontact—demonstrating durable changes in attitudes” (Hackenburg et. al 2024)
Discuss
Why natural transformations?
This post is aimed primarily at people who know what a category is in the extremely broad strokes, but aren't otherwise familiar or comfortable with category theory.
One of mathematicians' favourite activities is to describe compatibility between the structures of mathematical artefacts. Functions translate the structure of one set to another, continuous functions do the same for topological spaces, and so on... Many among these "translations" have the nice property that their character is preserved by composition. At some point, it seems that some mathematicians noticed that they:
1. kept defining intuitively similar properties for these different structures
2. had wayyyyyy too much time on their hands
So they generalised this concept into a unified theory. Categories consist of objects mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mi { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-msub { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-munder { display: inline-block; text-align: left; } mjx-over { text-align: left; } mjx-munder:not([limits="false"]) { display: inline-table; } mjx-munder > mjx-row { text-align: left; } mjx-under { padding-bottom: .1em; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } mjx-c.mjx-c1D465.TEX-I::before { padding: 0.442em 0.572em 0.011em 0; content: "x"; } mjx-c.mjx-c2208::before { padding: 0.54em 0.667em 0.04em 0; content: "\2208"; } mjx-c.mjx-c1D436.TEX-I::before { padding: 0.705em 0.76em 0.022em 0; content: "C"; } mjx-c.mjx-c1D453.TEX-I::before { padding: 0.705em 0.55em 0.205em 0; content: "f"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c2C::before { padding: 0.121em 0.278em 0.194em 0; content: ","; } mjx-c.mjx-c1D466.TEX-I::before { padding: 0.442em 0.49em 0.205em 0; content: "y"; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c1D439.TEX-I::before { padding: 0.68em 0.749em 0 0; content: "F"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c2218::before { padding: 0.444em 0.5em 0 0; content: "\2218"; } mjx-c.mjx-c1D454.TEX-I::before { padding: 0.442em 0.477em 0.205em 0; content: "g"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c3A::before { padding: 0.43em 0.278em 0 0; content: ":"; } mjx-c.mjx-c1D44B.TEX-I::before { padding: 0.683em 0.852em 0 0; content: "X"; } mjx-c.mjx-c2192::before { padding: 0.511em 1em 0.011em 0; content: "\2192"; } mjx-c.mjx-c1D44C.TEX-I::before { padding: 0.683em 0.763em 0 0; content: "Y"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c2115.TEX-A::before { padding: 0.683em 0.722em 0.02em 0; content: "N"; } mjx-c.mjx-c2282::before { padding: 0.54em 0.778em 0.04em 0; content: "\2282"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c69::before { padding: 0.669em 0.278em 0 0; content: "i"; } mjx-c.mjx-c6D::before { padding: 0.442em 0.833em 0 0; content: "m"; } mjx-c.mjx-c221E::before { padding: 0.442em 1em 0.011em 0; content: "\221E"; } mjx-c.mjx-c1D43A.TEX-I::before { padding: 0.705em 0.786em 0.022em 0; content: "G"; } mjx-c.mjx-c1D702.TEX-I::before { padding: 0.442em 0.497em 0.216em 0; content: "\3B7"; } and morphisms connecting objects. Morphisms are closed by composition. As in our opening examples, we will think of objects as sets and of morphisms as functions, even though the language of categories is strictly more expressive than that. Once we have categories, we reflexively wish to define a "morphism of categories". Given categories C, D a functor F sends objects to objects and morphisms to morphisms such that composition of morphisms can be done inside the category C or inside D after applying the functor: .
Still possessing of some time, you might next wonder how to define a morphism between two functors. This is where, in my experience, there ceases to be an "obvious" thing to do. All the morphisms we have considered thus far are functions, but it's not even clear from where to where a candidate function should go, since functors are not themselves sets.
To make the idea of a natural transformation seem not-entirely-crazy, it's worth taking a slightly different perspective on what more "preservation of structure" could mean. Consider the category of metric spaces with morphisms defined as continuous functions between them. One can think of continuity as being about the induced topologies, but metric spaces have additional properties that allow for a more specific interpretation. Notably, this includes the uniqueness of limits, which defines an operation on some sequences which takes that limit. This operation is completely integral to the abstract appeal of metric spaces. Moreover, the key characteristic of continuous functions is that they give us the right to permute when we perform this operation. Given a continuous function and a sequence with a limit , we have . This makes continuous functions a satisfying concept for defining morphisms because they afford execution of the fundamental operation on metric spaces in either the source or the target (whichever is most convenient).
Abstracting away to categories, the conceptual appeal of a functor is that it respects the structure of morphisms between objects. Consequently, a good "morphism" between functors F and G (both between categories C and D) would allow us to disregard whether for any morphism , we use or for calculations inside D. That is, we need enough semantic content in the morphism to always commute the following diagram[1]:
This motivates the definition of natural transformations as families of maps , where , such that each diagram of the above type is commuted. Reassuringly, the functors from C to D as objects, equipped with natural transfomations between these functors as morphisms, themselves form a category!
- ^
"commuting diagrams" is standard terminology in category theory that encodes the ability to permute, replace or swap out operations.
Discuss