Вы здесь

Сборщик RSS-лент

Pope Leo’s First AI Encyclical – Summary and Commentary

Новости LessWrong.com - 26 мая, 2026 - 02:48

(Adapted from a post on my Substack.)


Today, Pope Leo XIV released his long-awaited encyclical letter about artificial intelligence, addressed not just to the Catholic Church, but to all people of good will, all over the world. Titled Magnifica Humanitas (“Magnificent Humanity”), it is a powerful invitation to worldwide engagement on questions that I believe will decide the future of humankind.

I urge you all to read the encyclical itself, but I recognize that it is very long, and the theological language may be challenging, especially for LessWrong readers from outside the Catholic faith tradition. So I offer the following post as a guide to understanding this world-historic document in terms that I intend to be accessible to all, even if you’re not religious and not familiar with the technical details of AI.

That said, Magnifica Humanitas is long enough and dense enough that even a detailed summary is lengthy. So I will begin the post with an overview of high-level takeaways for those short on time. After that, I’ll give a summary of the entire encyclical, highlighting key passages, and contextualizing the questions that Pope Leo grapples with across the scientific and policy landscapes. I’ll analyze what’s notably said and left unsaid, and how the encyclical lays the groundwork for future engagement by the Church on these issues. In several areas, I offer respectful critiques of the encyclical intended as constructive suggestions for those future engagements.

Bottom line: Pope Leo gives us a wise and practical document, filled with forceful moral critiques of those who would use AI to exploit and dehumanize. He appreciates the benefits AI can bring, but urgently calls the world to the shared work and discernment necessary to achieve a flourishing future. The encyclical focuses on today’s technology, and future AI will pose new and even-more-urgent moral questions, so Magnifica Humanitas should be the beginning of the conversation, not its definitive final word.


14 High-Level Takeaways

1. As expected, it mostly focuses on mitigating risks and harms of AI rather than harnessing its benefits. But it’s clearly not anti-AI or Luddite in orientation. Pope Leo recognizes AI’s profound potential for both good and ill, but warns that this does not make it inherently neutral. Rather, he says, it takes on the face of the humans who design it, fund it, regulate it, and use it.

2. Surprisingly, there is not any explicit mention of AGI. Nor is there explicit mention of catastrophic risk or radical longevity medicine. These were all mentioned in January 2025 in the Church’s previous AI document, Antiqua et Nova, so the omission is notable. Those questions all entail qualitatively new moral challenges, so another encyclical will be needed soon to address them—hopefully before AGI arrives.

3. Likewise, although Leo expresses concern about unemployment from AI, he does not appear to envision a scenario of widespread unemployment or truly transformative impacts on the economy. A growing consensus among AI scientists and economists expects these impacts to be truly profound. If this is correct, the remedies suggested in this encyclical will not be sufficient.

4. Leo highlights the single most important fact that everyone must understand to form the right intuitions about AI: it is not designed like a traditional machine, but organically grown like a creature in a lab. This means that the people who develop it don’t fully understand how it works, and don’t know all its capabilities. He deserves immense credit for expressing this so clearly to a global audience.

5. I and many others were anxious to see how Leo would address philosophies of transhumanism and related questions like whether AI could ever be conscious or experience suffering. Leo admirably recognizes nuance here, critiquing unhealthy forms of transhumanism without a blanket condemnation. While rejecting AI’s current consciousness, he silently avoids slamming the door on the possibility of future models suffering. More research is needed, as well as dialogue between scientists and theologians, and it would have been unwise to preempt that with a definitive statement here.

6. A good test for whether a social encyclical is good is whether it makes any powerful people angry. Magnifica Humanitas almost certainly will. Especially with its blistering condemnations of aggressive war, and of exploitation and dehumanization in the supply chain for AI.

7. A key theme is that AI risks entrenching a “technocratic paradigm” that neglects a full vision of our humanity. Humans, Leo says, should never be reduced to statistics or cogs in a machine. Individual rights, especially of vulnerable people, must never be sacrificed in pursuit of efficiency, profit, and power. The alternative to the technocratic paradigm is integral human development. This is a recognition that human needs and aspirations aren’t fungible—no amount of food can satisfy someone starving for love, and no amount of love can prevent physical starvation. So the different facets of human fulfillment must all be developed together. The implication is that even if AI gives us material abundance, this does not in itself prevent poverty in domains like political freedom, familial love, and spiritual fulfillment.

8. Related to the technocratic paradigm, Leo argues against uses of AI that centralize power and leave ordinary people without meaningful agency over their lives. He describes this as a struggle of homogeneity versus diversity, centralized direction versus individual freedom, top-down control versus shared responsibility.

9. Pope Leo affirms private property and free enterprise, but insists that immaterial things like “patents, algorithms, digital platforms, technological infrastructure and data” are resources that must be ordered toward the common good. It’s a vision of shared flourishing and interdependence. He calls for an end to GDP as our dominant measure of prosperity, because it fails to capture noneconomic goods and does not account for questions of justice.

10. Characterizing cyberspace as a new battleground, Leo criticizes the exploitative behavior of social media companies, but emphasizes the responsibility all users share in promoting a healthier digital environment.

11. Leo issues a ringing call that “Artificial intelligence needs to be disarmed.” By this, he means not just a renunciation of war but a rejection of competitive and adversarial “arms race” framings that increase risk to all humanity. As he said in his live presentation today, AI must be turned away from development paths that lead to “domination, exclusion, and death.” Instead: “Like nuclear energy, it must be at the service of all.” This is direct support to the ongoing international efforts toward AGI safety and shared democratic governance of this technology.

12. He directly addresses AI scientists and lab leaders, warning them that if they fail to consider the broader moral implications of their work, they risk unwittingly creating technology that causes grave harm.

13. Leo invites all people to hope, and to direct participation in this great challenge of our time. Making AI go well for humanity, he tells us, is a work that every person can play a meaningful part in.

14. He quotes Gandalf.


Summary and Commentary

Introduction

At the beginning of the Introduction, Pope Leo frames artificial intelligence as thrusting a choice upon humanity: “to construct a new Tower of Babel or to build the city in which God and humanity dwell together.” In essence, the Tower of Babel represents efforts to transcend our limitations by human technical ingenuity alone—which he contrasts with a future where our most fundamental transcendence comes through love, justice, and wisdom. Christianity identifies these values with God, but the message equally applies to those with a nontheistic worldview.

Leo XIV explicitly invokes the work of his predecessor Leo XIII, whose 1891 encyclical Rerum Novarum (Latin for “on the new things”) addressed the socioeconomic impacts of the Industrial Revolution. This became the foundation of what we now call Catholic Social Teaching. This was the first major Church document to address issues like working conditions, workers’ rights, and labor unions—which had previously been considered outside the scope of theology.

Leo XIV notes that in the 19th century some objected to Rerum Novarum, arguing that the Church “should not waste energy on worldly matters, but instead focus on communicating the message of eternal life.” But he says Leo XIII “responded with realism and wisdom, saying that the proclamation of the Gospel cannot overlook the concrete lives of people.”

This was a major evolution in the Church’s approach to socioeconomic problems like poverty. From the life of Jesus until 1891, the Church mostly saw poverty in personal terms. If Christians saw a poor person, the ideal response was essentially: “You’re hungry, so here’s a loaf of bread.” But Catholic Social Teaching broadened the lens to seeing poverty in systemic terms. Now, in addition to direct charity, the response is: “Let’s work together as a Church and as a society to create a more just system so that you’re not poor and hungry in the first place.” This not only changed Roman Catholic thinking, but had profound effects on how other faiths and secular institutions have approached poverty over the past century.

Leo XIV recognizes that AI may have socioeconomic impacts at least as profound as the Industrial Revolution, so frames this encyclical as applying Catholic Social Teaching to the challenges of this new revolution. Rejecting a narrowly religious mission, both Leos assert the Church’s concern for the earthly dignity and wellbeing of all humans.

Next, Leo makes clear that he is not fundamentally anti-AI: “Technology should not be considered, in itself, as a force antagonistic to humanity. On the contrary, it has formed part of our history since the beginning as ‘a profoundly human reality, linked to the autonomy and freedom of man.’” The concerns he raises are about irresponsible or unjust development and use of AI.

Then, Leo correctly observes that AI is different from the technological revolutions of the 20th century because it is mainly controlled not by governments but by “private, often transnational, parties that are endowed with resources and the capacity to intervene that surpass those of many Governments.” This suggests we need new ethical frameworks and policy mechanisms for effectively keeping AI under democratic control and oriented toward the common good.

Sidebar: many Americans misunderstand Catholic references to “the common good.” The Church supports healthy private enterprise—this term doesn’t imply communist collectivism. Nor does it imply a utilitarian focus on “the greatest good for the greatest number,” or a libertarian focus on just maximizing the sum total of private wellbeing. Rather, the Church understands the common good to be the interconnected set of moral, legal, and economic conditions in which every person’s rights are respected and they have a dignified opportunity to pursue flourishing and fulfillment. Concretely, this means that technology must be used in a way that recognizes us as members of a shared human family—injustice and exploitation against a few cannot be justified by counting up the private benefits to others.

Returning to the tower versus city metaphor, Leo recalls the prophet Nehemiah leading rebuilding of the walls of Jerusalem in the 5th century B.C. and emphasizes that “No one can single-handedly bear the weight of the challenges the world is facing, just as no one is so weak that they cannot play their part.” Rather, all people “are given their own section of the wall: scientists and researchers, entrepreneurs and workers, educators and legislators, civil society, popular movements and faith communities.” The struggle, to Leo, is homogeneity versus diversity, centralized direction versus individual freedom, top-down control versus shared responsibility.

Leo has a deep recognition of AI’s dual-use potential: to help liberate us from poverty and disease, or to enslave and wound us. But dual-use does not mean inherent neutrality. AI, Leo says, is “never neutral, because it takes on the characteristics of those who devise, finance, regulate and use it” [the Spanish version is much more poetic and apt, saying that AI “takes on the face” of its creators].

In a notable passage, Pope Leo calls for “accepting the limits and weakness of humanity without considering them an error to be corrected.” This is wise counsel, but when we zoom in to consider specific cases, the picture becomes much more complicated. In some cases, much of the secular world agrees on limitations that should be accepted. For example, programs like the Special Olympics call humanity to embrace people with intellectual disabilities and celebrate their human dignity. On the other hand, cancer is a weakness of humanity literally programmed into our genes, but the Church strongly supports cancer research. Over the coming decade, AI will give us unprecedented ability to control our biology, encompassing dozens of cases in the gray area between intellectual disabilities and cancer, such as longevity medicine. No simple rule will suffice. The world will need Pope Leo’s deeper engagement on specific questions in order to navigate this fraught path.

Although AI is a new technology, the deeper moral challenge is ancient. Leo observes that “in every age, the risk looms of building an inhuman and more unjust world.”


Chapter One

In Chapter One, Pope Leo summarizes the history of Catholic Social Teaching, emphasizing its organic development over time, with each new idea in harmonious continuity with the rest. In essence, he uses this chapter to situate Magnifica Humanitas in its proper context—not as a change of policy or a break with the past, but the natural conclusion of applying ancient principles to today’s challenges. He also rejects the fundamentalist view that Christian teaching is frozen like a time capsule from the time of Jesus. Rather, he frames the Church as journeying through history alongside humanity, with its teaching growing and deepening from its dialogue with the “new things” of each era. The underlying values are eternal, but the Church must learn from history and human experience in order to apply them justly.

Leo then clarifies the proper relationship of the Church to the secular world. Drawing on the teachings of the Second Vatican Council, he stresses that the church is not trying to impose its values on others, but to work in solidarity with the wider world and to offer its message as a means of social and personal healing.

In the remainder of the chapter, he highlights key insights from his predecessors that have shaped the Church’s social teachings.

Pope Pius XII, he says, “affirmed the need for a sound rule of law for guarding against the abuse of power, and he recognized democracy as a means for ensuring the proper exercise of authority. At the same time, he warned against any attempt to base law on utility or force, recalling that an international order governed by the advantage of the strongest exposes weaker peoples to oppression and fundamentally undermines trust between nations.” In short, the Church supports democratic control of societies and rejects the logic of “might makes right.”

Pope John XXIII widened the scope of Catholic Social Teaching itself. He addressed his message not just to Catholics but to “all people of good will.” Leo emphasizes that this broad engagement outside the walls of the Church to improve social conditions is not “tactical” (i.e. building up credibility and good will to then convert people to Catholicism) but an integral part of the Church’s mission, and an effort to be undertaken for its own sake.

Drawing on teachings of Pope Paul VI, he argues that peace is “not reduced to the mere absence of war, but took shape within the scope of integral human development … the transition from less human to more human living conditions.” That is, some material and social conditions recognize people’s inherent dignity and humanity more than others, and we should work toward conditions that do this better. So the goal for engaging with AI is to first discern which conditions promote humanity and then steer AI’s development and use toward promoting those conditions. And so, Leo says, “the Gospel remains relevant because it provides the criteria for recognizing what humanizes or dehumanizes and what liberates or oppresses in ever-changing situations.”

Near the end of the chapter, he quotes Pope Francis’s memorable insistence on a Church where we allow ourselves to be “evangelized by the poor”—to have our hearts transformed by direct human encounters with people who are suffering.


Chapter Two

Chapter Two explores the core principles of Catholic Social Teaching and how they follow naturally from the Church’s understanding of human nature and human dignity. “Human dignity,” Leo says, “does not depend on a person’s abilities, wealth or position in life, nor on the right or wrong choices made; instead, it is a gift that precedes and transcends each person, endowed by God as an expression of his unfailing love.”

Leo emphasizes that human rights must rest upon a universal foundation to be secure. To the Church, this foundation is God. Some nontheistic people also invoke a foundation on universal moral truths. But others see rights as granted to us by human governments or social conventions. If this latter view prevails, Leo warns, those same structures can take them away: “it is conceivable that rights considered untouchable today might, in the future, end up being questioned or denied by those in power, perhaps after having obtained only an apparent consensus from populations that are frightened or manipulated.” Left unstated here is AI’s enormous potential to assist the powerful in this manipulation.

What follows is an overview of five core principles of Catholic Social Teaching: the common good, the universal destination of goods, subsidiarity, solidarity and social justice.

The shared obligation to promote the common good, as I summarized above, refers to the interconnected set of conditions that promote human rights and human flourishing. The Church recognizes that humans will still have “differing interests and frequent disagreements,” but amidst those conflicts we must not lose sight of our fundamental interconnectedness. As an analogy, if two brothers steal money from their sister and invest it for huge profits, the total family wealth has increased, but the collective good of the family has been damaged. In a similar way, we cannot measure the common good of the human family merely by counting up everyone’s individual wellbeing. We must reject arrangements that increase total individual wellbeing by violating the human rights of some.

In a poignant line, Pope Leo concedes that amidst current global conflicts, appealing to the common good of humanity “sounds like madness. Yet we must not lose hope.”

In a line sure to ruffle feathers both in the White House and the Kremlin, Pope Leo underscores that “any attempt or plan to eliminate or subjugate a nation is gravely immoral and therefore unacceptable.”

The universal destination of goods is the idea that “the earth’s goods—soil, water, air and natural resources—are given by God to the entire human family to sustain the lives of all, and that every person has an inherent right to the use of such goods, both now and in the future.” Leo adds to this the benefits of immaterial goods like “patents, algorithms, digital platforms, technological infrastructure and data.” The Church supports private property and free enterprise as good ways to promote the common good, but insists that these be implemented with a recognition that all people share in a basic right to a just portion of these resources.

Subsidiarity is the principle that decisions should be made at the most local level possible that’s consistent with the common good. For some issues, like national defense and pandemic response, national governments are necessary. But on issues like career development, childrearing, and urban planning, the Church asserts that individuals, families, and cities should have as much freedom as practical, rather than following top-down orders from distant authorities.

While this principle was developed with government hierarchies in mind, Pope Leo notes that it also applies to companies that control AI. He recognizes the potential for AI to impose top-down decisions that would rightly require the input of local communities. “When it comes to decisions regarding economic flows and digital platforms,” he says, “as well as the governance of data and algorithms, we cannot allow a handful of actors to dictate these processes on their own; instead, we must build forms of cooperation that respect the various levels of the global community and make them jointly responsible for the common good.”

Solidarity is the principle that our interconnectedness imposes mutual moral obligations to each other. It is, Leo says, “the concrete recognition that the future of each individual is connected to the future of all.” Solidarity implies duty of each person to “take[] part in the life of the community—by staying informed, engaging with others, making their voice heard and contributing to public decisions and choices.”

In a beautiful passage, Leo reminds us that “we are not merely neighbors to one another, but entrusted to each other, so that each of us may take responsibility, as best we can, for the lives and wounds of our brothers and sisters.”

Leo also highlights intergenerational justice, which was a key theme of Benedict XVI and Francis. Solidarity, he says, demands an “ability to forego immediate benefits in order to create opportunities for others in the future” and “that decisions regarding data, algorithms, platforms and artificial intelligence take into account not only the immediate benefit for a few, but also the impact on all peoples and on future generations.” In essence, just as environmental pollution saves money but impacts children not yet born, irresponsible development and use of AI can create both short-term profit and socioeconomic conditions that harm our descendants.

Here, Leo extends Francis’s concern for the physical environment to the “digital ecosystem.” This is a key point. Much like throwing trash on the ground degrades the environment for everyone, if you watch misinformation or like hateful posts, social media algorithms will spread them to your friends and family. Solidarity obliges us to resist that temptation.

Finally, social justice is the idea that the morality of a society hinges on how it treats the vulnerable. In the Gospels, Leo reminds us, Jesus “identifies himself with the lowly, the sick, the imprisoned and strangers.” Therefore, “social justice begins with the least among us… the poor, migrants, refugees, internally displaced persons, victims of violence and people living in urban or existential peripheries.” Catholic Social Teaching calls this the “preferential option for the poor.” This doesn’t mean that God loves the rich less. But it means that there is a special priority on helping the marginalized of society, just as a mother will prioritize buying medicine for a sick son over candy for a healthy daughter.

Leo recognizes that AI has the potential to disproportionately harm these vulnerable people and create “new forms of exclusion and deprivation of freedoms: individuals and peoples hindered or denied access to basic technologies, communities exposed to invasive surveillance and social groups penalized by opaque algorithms that perpetuate prejudice and discrimination.” This builds on Pope Francis’s concern about algorithmic bias, which—crucially—may appear in AI accidentally, without any malice by the developers.

All these principles are united in the concept of integral human development. “Development is integral,” Leo says, “when it is not limited to the economic sphere, but promotes quality of life in its spiritual, cultural, moral and relational dimensions, while respecting our common home, the diversity of peoples and their ways of life.” In other words, great abundance in one narrow dimension of life cannot make up for deficits in others. In the Catholic view, the human person is a united whole, not a collection of separate needs and aspirations stuffed into the same body. No amount of food can satisfy someone starving for love, and no amount of love can prevent physical starvation. So the different facets of human fulfillment are not fungible—all must be developed together. The implication is that even if AI gives us material abundance, this does not in itself prevent poverty in domains like political freedom, familial love, and spiritual fulfillment.

The chapter concludes with a somber reflection on the Church’s own failings to live up to these principles over the years. Leo pledges renewed efforts for a humble and accountable Church, “listening to the victims of spiritual, economic, institutional, sexual and power-based abuse, as well as abuses of conscience, [which] is an integral part of a journey toward justice, which includes acknowledging the harm done, just reparation and taking steps to prevent it from happening again.”


Chapter Three

Chapter Three begins the heart of the encyclical’s engagement with the implications of AI. Pope Leo starts by returning to the tower-and-city analogy from the Introduction: “On the one hand, there is the Tower of Babel, where collective effort follows a plan that dominates and ultimately dehumanizes (cf. Gen 11:1-9). On the other hand, there are the ruins of Jerusalem, which under Nehemiah’s direction are rebuilt piece by piece as a project of shared responsibility (cf. Neh 2–6). We are called to reflect on the great ‘construction sites’ of our era and ask: What are we building?”

Leo highlights the dangers of what Pope Francis called the “technocratic paradigm.” This is “the tendency to let the logic of efficiency, control and profit alone shape personal, social and economic decisions.” While technology can be a powerful tool for good, “when it becomes the standard by which everything is judged, it begins to dictate what matters and what can be discarded, reducing creation to an object of exploitation and human beings to mere cogs in a system driven toward ever greater efficiency.” Insofar as AI gives corporations and governments much greater power to optimize the efficiency of economic and policy systems, it risks entrenching this technocratic paradigm and dehumanizing us.

Next, Leo aptly notes that “any statement regarding AI risks becoming quickly outdated, given the remarkable pace at which these systems are developing.” This appears to make him reluctant to engage with risk cases that hinge on technical capabilities, like AI creation of pandemic viruses—instead mostly focusing throughout the encyclical on threats to human dignity that are less dependent on scientific questions about the technology.

Leo then makes an extremely important observation about the nature of deep learning-based AI: “all of us, including those who design them, possess only a limited understanding of their actual functioning. Indeed, current AI systems are more ‘cultivated’ than ‘built,’ for developers do not directly design every detail, but instead create a framework within which the intelligence ‘grows.’” This is, as I have argued elsewhere, the single most essential fact for everyone on earth to understand about today’s AI. And the vast majority of AI risk flows from that fact—the training process creates models with unintended goals and capabilities that the developers did not intend. Immense credit to Pope Leo for stating this so clearly to a worldwide audience.

In the next passage, Leo insists that AI models “merely imitate certain functions of human intelligence,” while acknowledging that this offers “tangible benefits across many fields.” He says—correctly, I think, at least for current technology—that AIs “do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships and do not know from within what love, work, friendship or responsibility mean.” The question of whether more advanced AIs in the future could undergo experiences is not directly addressed, but requires further research. Leo does not preempt that research with a definitive statement, which is wise.

He goes on that AIs “may imitate language, behavior and analytical skills, or even simulate empathy and understanding, but they do not understand what they produce.” This is where things get tricky. Words like “imitate,” “simulate,” and “understand,” mean very different things to different people. I think Pope Leo is correct in the narrow theological sense that he likely intends, but in my experience most people understand those terms in a broader and colloquial sense that may steer them toward inaccurate intuitions about AI’s future. If you tell the average person that AI is incapable of understanding, they’re likely to infer that AI can’t do many kinds of dangerous things that in humans require deep understanding to do. I hope that in future statements, Pope Leo will clarify these terms and their implications to avoid unintentionally downplaying AI risks.

Next, Leo warns that careless AI use “can encourage excessive reliance and the search for ready-made answers, and weaken personal creativity and judgment.” These side effects are certainly not inevitable or universal, but many of us now know someone who has either outsourced key personal decisions to AI or been pushed into “AI psychosis” by sycophantic AI models that flatter them excessively.

Then, he returns to the algorithmic bias issue: “The apparent objectivity of the responses and suggestions these systems provide can lead us to overlook the fact that they reflect the cultural assumptions of those who designed and trained them, with all their strengths and limitations.” I would amplify: this doesn’t necessarily entail any direct political skewing of AI whatsoever. Innocent blindspots can just as easily prevent developers from anticipating how AI might fail and cause harm.

That is followed by a warning about AI imitating genuine human connection: “words of advice, empathy, friendship and even love—can be engaging and at times genuinely helpful. However, for less discerning users, it can also be misleading, creating the illusion of a relationship with a real personal subject.” Leo makes an important clarification: “Here, the danger is not so much that a person may believe they are communicating with another person, but rather that they may gradually lose the very desire to form genuine human connections.” Think of a lonely young man with an AI girlfriend. Yes, he would have preferred a human girlfriend, but finding one entails stress and brutal rejection. If he gets unblinking adoration by an infinitely patient AI free from human complexity, independent desires, or an interior life of her own, he may give up on human romance. And the very habituation to such counterfeit romance will likely make human partnership harder if he ever changes his mind.

The section concludes with a discussion of the environmental impact of AI. Leo does not wade into detailed costs and benefits, but asserts an important principle: the race to build data centers and related infrastructure will naturally tend to harm vulnerable people the most. So we must be very deliberate as a society about protecting them, and must work toward sustainable technologies that preserve our shared home for future generations. Absent here, though I think worthy of mention, is AI’s positive potential to help the environment—such as through materials science breakthroughs for clean energy, through making other processes more resource-efficient, and developing better technologies for carbon capture and pollution cleanup. I would like to see the Church’s future engagement with AI’s environmental implications address the ethics of these positive uses. For example, AI may soon give us powerful geoengineering tools to reverse climate change, but different groups of vulnerable people would face both benefits and harms—who has the moral authority to decide, and under what decisionmaking framework?

Pope Leo next turns to questions around AI governance, transparency, and accountability. He notes, quoting Pope Francis: “Important and sensitive decisions—concerning employment, credit, access to public services or even a person’s reputation—risk being fully delegated to automated systems that do not know ‘compassion, mercy, forgiveness, and above all, the hope that people are able to change.’” That is a very real hazard. But I think this wording risks conflating processes and outcomes. The process question is: was the innocent defendant tried by a jury capable of feeling compassion? The outcome question is: was the innocent defendant actually acquitted? In today’s society, those things are closely intertwined. But although AI is susceptible to inheriting human biases, we already know how to make AI much less biased than humans. Even if AI can’t feel subjective compassion, it may soon be able to make decisions consistent with compassionate reasoning more reliably than the average human. I hope the Church explores this complexity further, reflecting on when human decisionmaking is inherently irreplaceable versus where just outcomes are the foremost priority.

Leo’s next point is excellent: “ethical discernment cannot be limited to asking whether we are using a system for good or bad purposes; it must also examine how that system is designed and what vision of the human person and society is embedded in the data and models that guide it.” For example, if an AI in charge of healthcare decisions treats people with Down syndrome as less worthy of expensive treatments by virtue of shorter nominal life expectancy, Leo teaches that it doesn’t matter that the developers’ goal is to maximize overall lives saved—this is wrong.

He insists that as we deploy AI throughout society, we ensure that a human remains morally and legally accountable for each of its actions. This accountability of decisionmakers to citizens is central to Catholic Social Teaching, and must not be abandoned for convenience or cost savings.

Then comes a headline claim: “Calling for prudence, rigorous evaluation and even, at times, a slower pace in adopting AI does not mean opposing progress; instead, it is an exercise of responsible care for the human family.” Questions around AI pauses or slowdowns are complex, and I believe must be evaluated based on concrete scenarios rather than hashtag slogans. But it’s highly significant that Leo is explicitly open to slowing down under some circumstances.

Leo continues with a correct diagnosis: “This need is all the more urgent given the frequent imbalance between the speed of technological growth and the slower development of awareness, norms, safeguards and institutions capable of governing its effects.” He worries that without informed users and robust political oversight, “change will be governed only by technocratic thinking and presented as necessary and inevitable, ultimately imposing rules shaped by those who control data, infrastructure and computing power.”

Now he really gets to the heart of things: “We cannot be satisfied with merely calling for the moralization of machines—the so-called ‘alignment’ of AI with human values—without also having the courage to insist on a further condition: the possibility of openly discussing the ethical frameworks involved and subjecting them to shared standards of social justice.” Whether you’re a Democrat horrified by Grok’s unhinged rants against Jews, or a MAGA Republican convinced that ChatGPT’s seeming preference for nuclear war over misgendering reveals leftist extremism, you should agree that a handful of CEOs shouldn’t unilaterally decide which ethical framework dominates humanity’s shared future. To their credit, Google, OpenAI, and Anthropic have all been very clear that AI’s values should ultimately be subject to democratic oversight. Note that Pope Leo isn’t arguing that Sam Altman or Dario Amodei will necessarily pick bad values—he’s saying that even if they pick good values, excluding ordinary people from meaningful participation is inherently bad because it violates their dignity and paternalistically reduces them to “passive recipients of decisions made elsewhere.”

Leo continues that “ownership of data cannot be left solely in private hands but must be appropriately regulated. Data is the product of many contributors and should not be treated as something to be sold off or entrusted to a select few. It is necessary to think creatively in order to manage data as a common or shared good, in a spirit of participation.” If you pray, pray for collective wisdom around this, because no one has found a good solution yet.

Then another headline statement, more about principles than policy: “In a world where data, computational resources and regulatory influence remain in the hands of a few, to speak of the common good means exposing this new form of epistemic, economic and political asymmetry and naming the new monopolies of AI.” These asymmetries are an essential concept. AI companies aren’t selling normal goods like cars or dishwashers, nor normal software like Microsoft Word—they are creating the layer through which we will consume most of our information. Unless we proactively create strong safeguards, some combination of government power and corporate power will use that layer to dominate society and subjugate ordinary people.

Leo follows this with a striking turn of phrase: the need “to disarm” AI. He doesn’t mean this merely in the sense of avoiding AI as a focus of military competition. He speaks more broadly about the “desire to secure geopolitical or commercial dominance.” This has acute relevance to the arms race between the U.S. and China, which could push both sides to disastrously cut corners on safety as they sprint toward broadly superhuman AI. “To disarm,” Leo says, “means discrediting the assumption that technical power automatically confers the right to govern. To disarm does not mean rejecting technology, but preventing it from dominating humanity.” This is, alas, not an explicit discussion of misalignment or catastrophic risk, but lays clear groundwork for deeper engagement on those topics in the future.

The section concludes with a personal message to AI lab leaders: “I wish to address a special appeal to those who develop artificial intelligence. In one sense, technological innovation can represent human participation in the divine act of creation. Developers, therefore, bear a particular ethical and spiritual responsibility, for every design choice reflects a vision of humanity.”

Pope Leo then turns to what’s at stake for our species. The technocratic paradigm, he warns, amplifies an “anti-human vision” in which “the fullness of life is equated with having more, reducing weakness, eliminating uncertainty and exerting total control.” Further: “When efficiency becomes the ultimate measure of value, human beings are tempted to see themselves as a project to be optimized rather than as persons called to relationship and communion.” He cautions against AI leading humanity to treat intelligence as an absolute measure of worth, rather than one faculty in service of many others.

Next, Leo turns his attention to two related schools of thought that are extremely common among the people developing advanced AI: transhumanism and posthumanism. In general, these views hold that technology can have profound positive effects on the human condition that go beyond material wealth or mere convenience and amount to a more fundamental evolution of our species. To his credit, Leo acknowledges that these terms encompass an extraordinarily wide range of views—some of which, I think are quite compatible with Christianity and some of which are clearly not—and he refrains from the blanket condemnation that some other Popes might have made.

Instead, he focuses on “not the use of technology as such, but the vision that underlies it.” Namely: “If the human being is treated as something to be perfected or surpassed, it becomes easier to accept that some lives are less useful, less desirable or less worthy. In the name of progress, ‘necessary sacrifices’ may begin to be justified, placing the burden on the most vulnerable in pursuit of a supposed optimization of the species.”

Leo’s test for thinking about transhumanism: “It is one thing to integrate technology within a human-centered, relational vision; it is quite another to be guided by an outlook that devalues human limits and promises a purely technical form of ‘salvation.’”

He turns to a common worldview today: “Everything that appears as a ‘limit’—incapacity, illness, old age, suffering, vulnerability—tends to be seen primarily as a defect to be corrected, rather than as a reality through which our humanity matures and opens itself to relationship. And yet we must remember that humanity flourishes not despite limitations, but often through them.” Yet Leo does not reject efforts to overcome limitations, as long as this doesn’t turn us away from God as the ultimate source of transcendence: “While it is right to strive to alleviate the suffering that marks human life, it is also wise to acknowledge our fundamental finitude.” This is one of the key tensions at the heart of the encyclical: Catholic Social Teaching calls us to alleviate suffering, and AI will give us enormous power to do so, but doing this carelessly risks costing us the very humanity we are trying to protect.

Leo teaches that the experience of limitations helps us develop compassion for and solidarity with our fellow humans: “Finitude, when truly accepted, does not diminish us but opens us to recognizing the face of God and others. Indeed, precisely because we experience limits—vulnerability, suffering and failure—we can recognize the inviolable dignity of every person, both our own and that of others.”

In contrast to many pessimistic visions of human morality, Leo points to concrete signs of shared moral progress: the abolition of slavery, the establishment of the Red Cross, the founding of the United Nations and Universal Declaration of Human Rights, and the 1951 Refugee Convention. Yes, these are far from perfect, and “[m]oral progress almost always unfolds through a long and demanding journey, often marked by setbacks.” But Leo is moved by a profound hope in humanity’s ability to rise to the challenges of AI.

This returns to a reflection on wisely harnessing the transformative potential of AI: “humanity—in all its grandeur and woundedness—must never be replaced or surpassed. We can embrace the technological progress that alleviates suffering and unlocks new possibilities, provided that we do not abandon the very essence of our humanity, namely the capacity for relationship and love.” Ultimately, Leo argues, “what saves humanity is not enhanced self-sufficiency, but a relationship that liberates, a communion that transforms.”

A more ambiguous passage follows: “a technology that merely classifies and optimizes what already exists can, however unintentionally, become an obstacle to change and growth. For an algorithm, an error is a flaw to be corrected; for a person, however, an error can be a catalyst for profound change.” I can certainly imagine senses in which this is true, but it seems to embed a claim—perhaps unintended—that AI cannot have genuinely creative powers. And that risks implying that AI could never have the profound real-world impacts that in humans take genuinely creative powers.

The chapter closes with a return to the tower versus the city as a longstanding tension throughout history: “The age of AI is no exception: the construction of Babel or the rebuilding of Jerusalem begins within each one of us.”


Chapter Four

Chapter Four focuses on safeguarding humanity amidst transformations to the spheres of truth, work, and freedom. Pope Leo begins by considering the essential relationship between truth and democracy.

To understand his concern, we should start with important context. Catholic Social Teaching urges citizens to “take an active role in public life,” but emphasizes each person’s duty to cultivate a “well-formed conscience” to guide this. It’s hard to do that if you’re guzzling algorithmically-targeted ragebait and misinformation from morning to night. As the U.S. Catholic bishops have stressed, having good moral values is not enough. “It is also important,” they said, “to examine the facts and background information” of specific situations. That is, our moral obligations around public life—who to vote for, whether to protest, whether to disobey an unjust law—often depend on factual realities of what’s happening in our community. For example, if police shoot a man to death, the correct moral stance hinges tremendously on whether he was about to murder a child or was a peaceful protester. There’s an objective truth to that question, but we can’t find it in the Bible or the Catechism. We have to carefully discern among fallible information sources.

That’s hard even in the best of times. As Vatican II put it: “Great care must be taken about civic and political formation, which is of the utmost necessity today for the population as a whole, and especially for youth, so that all citizens can play their part in the life of the political community.” If that formation consists of the parents in their bed scrolling Facebook watching pro-Trump ragebait, and the kids in their beds scrolling TikTok watching anti-Trump ragebait, the life of the political community is going to go down in flames.

Pope Leo thus recognizes AI as a “powerful amplifier” of the disinformation that thwarts healthy political formation and threatens to tear us apart. The hard work of discerning truth, he says, “it is deeply relational, built through bonds of trust and shared practices.” This means that algorithms that maximize engagement by turning citizens against each other with anger and fear directly undermine democracy.

“After all,” Leo says, “democracy does not consist of rules and procedures alone, but above all of a solid concordance with the facts and a genuine commitment to the good of individuals and society as a whole. Indifference to the truth leads, slowly but surely, to a descent into totalitarianism. As the philosopher Hannah Arendt wrote, the ideal subjects of such regimes are not so much those who are ideologically convinced, but rather ‘people for whom the distinction between fact and fiction (i.e., the reality of experience) and the distinction between true and false (i.e., the standards of thought) no longer exist.’” These words are written for all people of good will, but few will miss that they have special relevance to the political situation in the United States.

“Truth,” Leo continues, “is a common good.” Just as releasing poisonous lead into the atmosphere robs people of clean air as a shared inheritance, spreading deepfakes online robs people of a healthy information environment in which to discern truth.

Truth is essential even and especially when it reveals failings by the Church itself. Leo echoes the words of Pope Francis in addressing journalists: “I also thank you for what you tell us about what goes wrong in the Church, for helping us not to sweep it under the carpet, and for the voice you have given to the victims of abuse.”

Leo turns, then, to the need for education that avoids letting the quick answers of AI dull our patience and persistence in seeking understanding. He notes the growing body of research in psychology and psychiatry that ties excessive or irresponsible use of digital technologies to mental health problems, especially for the young. This is exacerbated by youth’s online exposure to violent, degrading, or pornographic content, and the prevalence of sexual grooming, blackmail, and cyberbullying. Notably absent from this discussion is what I would argue is an even greater social threat today: online radicalization of young men. AI algorithms are supercharging the influence of “manosphere” and “incel” influencers who preach hate against women—and often also against people who are gay, Jewish, Black, Muslim, or transgender. In many cases, these extremists are professed Catholics, and are turning millions of young men away from the Gospel with fantasies of crusade and violent revolution. I suspect the Vatican underestimates the threat, and Pope Leo may have been advised not to elevate these extremists by condemning them. But I doubt ignoring them will work, and I hope he will eventually address this online radicalization head-on.

Leo identifies schools as having a special purpose in preserving and promoting truth as a common good, and calls for an intersectoral alliance to strengthen schools for the AI future. “Schools,” he says, “are not called to follow the pace of the digital world, but to offer that which the digital sphere by itself cannot provide, namely a shared time for learning and developing trustworthy relationships.”

From there, Pope Leo turns to the value of work. Through work, he says, we “contribute to the progress of society and the common good, put to good use the capabilities we have received, improve and beautify the world, support our families, engage in cooperative relationships and, through listening and dialogue, learn to build together something that no one could achieve alone.”

Work, Leo says, “is a requirement of the human condition, a normal path toward maturity, development and personal fulfilment. In this regard, financial assistance to the poor may at times be necessary in emergencies, but it cannot become the sole response, since the goal is to enable each person to live with dignity through his or her own work.” This places Pope Leo currently in opposition to proposals like universal basic income—and at least implies a rejection of a future where economic work is largely automated. What’s not clear is whether he has considered these radical scenarios in detail, or is speaking here primarily of work in roughly its current technological context. My best guess is the latter.

Some evidence on this question comes in the following paragraph, where Leo warns that AI “can paradoxically de-skill workers, subject them to automated surveillance and relegate them to rigid and repetitive tasks.” Automated surveillance is indeed a serious problem. AI has allowed McKinsey-style productivity analysis to metastasize into highly intrusive tracking of bathroom breaks, walking speed, and mouseclicks—to the point of dehumanizing and demoralizing workers. But the upcoming wave of automation is shifting sharply away from de-skilling toward replacing human work altogether. The more forward-looking concern is that many people will lose jobs outright and not be able to find suitable new ones.

Pope Leo expresses a position that welcomes technology able to “relieve humans of arduous, repetitive or dangerous tasks and to provide intelligent support for human activity.” But he insists that “the protection of employment opportunities and the irreplaceable role of the individual must remain the general rule. The pursuit of greater profits cannot justify choices that systematically sacrifice jobs.” The coming years will introduce a difficult complication to this picture. In a growing number of spheres, AI won’t just be cheaper than human workers, but better and safer. For example, over 2 million Americans have jobs driving large trucks. But every year, large truck crashes cause over 5,000 fatalities in the U.S. alone. Often because the driver was drunk, drowsy, or distracted. Even today’s self-driving AI would prevent the vast majority of these deaths. Should we block this life-saving technology to protect those employment opportunities? Or are there better ways of preserving truckers’ dignity and livelihood without burying thousands of our friends and neighbors annually? I hope Leo’s future teachings on automation explore those challenging tradeoffs more deeply.

Next comes a vital imperative: “At this time of transition, it is not enough to react only when jobs disappear; we must oversee the transformation in advance.” This is essential. Yet I fear Pope Leo’s calls for policies for “retraining” and “continuous training and professional transitions,” will prove overoptimistic. Dreams of retraining often reflect shortcomings of the very technocratic mindset that Leo critiques. Harvard professors and McKinsey consultants like to imagine that a 55-year-old man laid off from a Detroit assembly line can simply reskill as a male nurse and find work caring for retirees in Arizona. But this neglects the human realities of family and community. Even if nursing is a fit for him, can we really ask him to sever his entire support system of friends, and leave behind his aging mother and the grave of his father, all to spare us the inconvenience of deeper solutions for Detroit? Humans, as Pope Leo deeply understands, cannot be moved that way like chess pieces. In any case, the rate at which AI is gaining the ability to automate more jobs is already starting to surpass the rate at which humans can retrain for new occupations. Without a fundamental rethink of what we compensate as work and how, market forces will prove retraining deeply insufficient.

After that, Leo makes a very needed call to move beyond GDP as the primary metric of development. GDP measures total economic activity, but fails to consider distribution—it considers $1,000,000,000,001 to Elon Musk better than $10,000 to each of the neediest 100 million Americans. GDP also misses factors like happiness, education, justice, peace, and the environment. Further, even in narrowly economic terms, GDP is getting worse and worse at measuring what we care about. Google only shows up in GDP as the ads it sells—not the valuable knowledge it lets billions of people access for free. Wikipedia—the greatest nonprofit information project in history—is virtually invisible to GDP. As Leo teaches, we cannot discern wise policies if we cannot measure them in terms of metrics that validly reflect integral human development.

Turning to freedom, Pope Leo then addresses the “subtler forms of addiction linked to the ‘digital attention economy’” and calls out the means by which social media, mediated by AI algorithms, “is exploiting [users’] vulnerabilities and weakening their inner freedom. He aims criticism squarely at tech company leaders: “When business models thrive on human weakness, the person is treated as a means rather than as an end; those who design or finance such systems bear a moral responsibility that cannot be ignored.”

From there, he warns of the “social control made possible by the massive collection of data and use of algorithmic systems.” Leo continues: “When every action—movements, purchases, relationships and preferences—leaves a trace, a new form of power emerges, namely the power to profile, predict and influence behavior, often without individuals being fully aware of it.” Although not explicit here, this concern applies especially to near-future AI enabling mass domestic surveillance. When you scroll quickly through terms and conditions for a new smartphone, you’re agreeing to anonymized tracking. Your phone constantly pings out its location, which is harvested to form a pattern of your movements—no problem, tech companies insist, because your name isn’t attached. But your “anonymized” phone always sits on the same nightstand at 2:00 AM, and if the government uses AI to combine that data with a property records database, it can easily identify you and figure out which political protest you attended, or that you went to a gay bar, or got treated at a gender clinic. Without vigorous public pushback, this will give governments a terrifying new power that even the Stasi in East Germany couldn’t have dreamed of.

Leo then turns to AI’s relation to economic exploitation. He reminds us: “Nothing in the world of AI is immaterial or magical.” Rather, its wondrous performance often draws on training feedback or data labeling by people in the developing world working for little money, sometimes in traumatizing conditions. In addition, the physical infrastructure stack of chips and batteries still often relies on rare earth elements extracted under harsh conditions, sometimes by children. For a technology that promises newfound freedom and prosperity, it is imperative that its controllers demand higher standards of human dignity throughout its supply chain.

Pope Leo explicitly likens moral apathy toward economic exploitation to the historical toleration of slavery. He expresses profound regret for “the delay with which both society and the Church came to denounce the scourge of slavery.” He personally begs forgiveness on behalf of the whole faith: “This constitutes a wound in Christian memory, one from which we cannot consider ourselves detached. It is impossible not to feel deep sorrow when contemplating the immense suffering and humiliation endured by so many in stark contrast to their immeasurable dignity as persons infinitely loved by the Lord. For this, in the name of the Church, I sincerely ask for pardon.”

This is not just historical sorrow, but a warning for the future: “the memory of past complicity and blindness in the face of the injustice of slavery becomes a call to vigilance. What we have learned must be translated into discernment and responsibility in the present. If we want to avoid the need to ask for pardon again in the future for having failed to respect the treasure of human dignity that is required by our faith, it falls to us today” to work actively to combat human trafficking and similar forms of exploitation. The alternative could be a digital form of colonialism where injustice against the vulnerable enriches the powerful. “Without this ethical and humanizing reflection,” he says, “the growing power of digital systems could lead us toward new atrocities that are no less shameful than those of the past that we now deplore, while we continue to present ourselves as ‘advanced’ and ‘civilized’ societies.”


Chapter Five

Chapter Five addresses “the culture of power and the civilization of love.” Here, Pope Leo observes a recent worsening of the might-makes-right logic of international affairs that had seemed to be receding. He contrasts this sad reality with the aspiration to a new international order grounded in human compassion and true solidarity among peoples.

As the culture of power gains sway, he sees a danger that AI advances will make warfare even more attractive. “While AI can enhance the defense and protection of civilians,” he says, “it can also lower the threshold for the use of force, shield people from responsibility and foster a culture in which the enemy is reduced to a statistic and the victim to ‘collateral damage.’”

Further, AI is directly tightening the bonds “linking—in real time—decisions made in one place to the effects they produce elsewhere.” It appears Leo is mainly focusing here on economic effects and conventional warfare. But this logic applies even more to catastrophic risk. An AI lab in San Francisco or Beijing might create an AI that goes rogue and creates a pandemic virus that kills millions in Jakarta or Nairobi. If superintelligent AI goes wrong, nowhere on earth is safe. Christians and atheists, young and old, rich and poor, our futures are inescapably intertwined.

Leo notes that as autonomous weapons make war “more ‘feasible’ and less subject to human control,” leaders will be tempted to “violate[] the principle that armed force should be used only as a last resort in cases of legitimate self-defense.”

Then comes a more fraught claim: “Sometimes there is talk of ‘artificial moral agents,’ as if machines were able to distinguish between right and wrong with greater consistency than a human being… it is not permissible to entrust lethal or otherwise irreversible decisions to artificial systems.” This is another case where I sense the theological language conflates processes and outcomes. Yes, in humans wartime discernment of right and wrong involves a subjective mental process that it appears AI does not have—the lived experience of pondering whether to kill a child holding a gun. But there’s also an empirical question: did you or did you not kill the child that was in fact just harmlessly examining the gun? And even if AI has no moral discernment in the subjective, theological sense, it is very plausible that a system that’s memorized every human moral text and is able to think hundreds of times faster than a human would indeed choose the right moral outcome in wartime with greater consistency than a human. After all, a look at history shows a long and dismal catalogue of human failures to distinguish between right and wrong. Eventually, there will be tension between a blanket ban on lethal autonomous weapons and the Church’s concern for alleviating objective suffering. I hope the Church engages with the nuanced question of what principles can minimize actual harm to innocents while preserving ultimate human moral responsibility for uses of force.

A key implication of Pope Leo’s teaching here is that human conscience is a significant bulwark against unjust war and domestic repression. This is a meaningful constraint even on authoritarian regimes. During the Cuban Missile Crisis, a Soviet submarine was hiding from American warships, cut off from contact with Moscow, and the captain decided to strike the Americans with a nuclear torpedo. This would have started World War III. The order had to be approved by all three senior officers aboard, and one—Vasily Arkhipov—refused and thereby saved the world. In today’s world, if Xi Jinping woke up tomorrow and decided out of the blue to annihilate Japan with nuclear weapons, that order would have to pass through numerous subordinates. Likely someone would think of the children—the immediate victims and their own—and refuse long enough for Xi to be deposed or sedated. And the reason color revolutions so often work against strongmen like Viktor Yanukovych is that in desperation they order their troops to kill protesters. Their generals and police captains look into the crowds, and know their own friends and their own children might be out there. They refuse, and the regime collapses. These dynamics are hardly perfect and sometimes fail, but civilization would have already literally ended without them.

But what happens when national leaders have swarms of killer drones and armies of algorithmically loyal robotic soldiers? Then, even the most unjust orders will not be refused. And some prominent thinkers explicitly advocate such a world. Curtis Yarvin is a far-right political philosopher read with appreciation by Vice President Vance and numerous senior Trump administration officials. Yarvin believes that the United States should be ruled by a dictator who runs the country like a CEO and ruthlessly crushes dissent. In a 2021 essay titled “Monarchism and Fascism Today,” he argued that converting the military to unquestioning AI and robots would make America’s dictator immune to military coups. Also implied: when workers are replaced by robots, humans withdrawing their labor from the ruler through national general strikes—historically a last defense against tyranny—would be useless.

In explaining the sickness in our international relations, Pope Leo diagnoses that “[a]t the core of these issues is a false realism, based not only on the prevailing mentality of force, but on the cultural and anthropological belief that war is an inevitable part of human nature.” This is so pervasive that experts speaking seriously about the global renunciation of war are laughed out of the room or dismissed as foolish and irresponsible. “I would argue, however,” Leo counters, “that what is truly irresponsible is Realpolitik, the form of political ‘realism’ that sows in consciences and in society an attitude of resignation to the inevitability of war, and dismisses peace and dialogue as utopian or irrational positions that ignore the risks at stake.”

Another observation sure to ruffle feathers in the White House and the Kremlin: “In countries marked by serious social tensions, we cannot rule out the possibility that some leaders may consider armed conflict as an effective way of diverting attention from domestic problems and a cynical tool for managing difficulties.”

Echoing the Postwar concerns of Manhattan Project scientists, Pope Leo warns AI scientists and lab leaders against a narrow focus on their technology that obscures the morality of its use. He enjoins them to maintain “an acute awareness of the broader context of the technological advancements they help to cultivate… When people limit themselves to looking only at their own sector, they may deceive themselves into believing they are performing actions that are morally neutral and avoid questions about the ultimate ends that guide certain experiments.” Leo continues: “In this way, they risk cooperating—perhaps unknowingly—with questionable projects that fuel new forms of violence, manipulation and dominance.” Although he focuses here on sub-catastrophic harms, the same logic applies to existential calamities. Heedless pursuit of multitrillion-dollar profits, forgoing appropriate technical precautions, could inadvertently create rogue superintelligent AI that wipes out humanity itself.

All this could lead to a belief that these lab leaders hold mankind’s entire fate in their hands, leaving most of us with no agency over the future. But Leo gently urges against despair and disengagement: “a subtle temptation may emerge, namely the thought that the problems are too big and we are too small, and that our choices, therefore, cannot make a difference. This is a polite form of resignation, often disguised as realism.”

Leo concedes that “not everyone has the same power to make a difference. There are those who govern, make investment decisions, lead institutions, conduct research, educate, produce or provide information, and then there are those who only seem to live their daily lives.” He is not offering a fantasy. “Yet, no one is without responsibility,” he insists. “We all have our own areas for action, and it is precisely there—and nowhere else—that we must choose whether to fuel the mentality of force (even if only through indifference, cynicism, lies or hatred), or to preserve the mindset of peace (with truth, moderation, closeness and care).”

In a passage sure to delight many readers, Leo quotes Gandalf in J.R.R. Tolkien’s The Lord of the Rings: “It is not our part to master all the tides of the world, but to do what is in us for the succour of those years wherein we are set, uprooting the evil in the fields that we know, so that those who live after may have clean earth to till.” And so, Leo says, “The civilization of love will not arise from a single or spectacular gesture, but from the sum total of small and steadfast acts of fidelity that serve as a bulwark against dehumanization.”

He then proposes “five paths toward daily and public responsibility: the need to disarm words, building peace through justice, adopting the perspective of victims, cultivating a healthy realism and reviving dialogue and multilateralism.”

Disarming words means to “examine our conscience regarding the words we use, the prejudices we have and the explicit or implicit aggression that lies within them. We have a real opportunity to contribute to the common good each time we speak the truth, offer wise advice, support those in need of comfort, denounce injustice and give a voice to the voiceless.” In the AI age, our words take on heightened significance, because their impact can be literally global. If you say something hateful in your neighborhood bar, a few drunk people will hear it and forget by the next morning. If you say something hateful on TikTok, if it catches the algorithm just right, you can go to sleep and wake up the next morning and 3 million people have seen it. If even one in a million are incited to beat up someone in the target group, that’s three hate crimes you’ve carelessly caused. Leo powerfully reverses this logic. Just as hateful words have greater impact than ever, disarming the hate from our words can be more transformative than ever.

Turning to the morality of war itself, Leo says: “In some conflicts, it is unjust to remain neutral, nor is it enough merely to claim that we are not complicit. When we witness the bombing of civilians, attacks on hospitals, schools or vital infrastructure, and violence that affects children, we are confronted with scandals that wound humanity itself.” He invokes Pope Francis’s call to “touch the wounded flesh” of suffering people and maintain the history and memory of painful events.

But moral witness must be tied to concretely effective action in order to build a just peace. A proper response to the culture of power, he says, is “a healthy realism that avoids both political idealism and cynicism.” Building a civilization of love requires “an attitude that seeks to forge bonds of fraternity built on listening, an open demeanor, making time for each other and even wasting time together. For if we experience authentic encounters with others, with those who are different, strangers and migrants, it becomes much more difficult even to imagine war.”

“Cyberspace too has become a battleground,” Leo says. “For this reason, diplomacy must be capable of operating effectively in this new environment, negotiating shared regulations on the use of digital technologies, in order to protect civilians and the most vulnerable from ‘invisible’ yet real forms of violence.”


Conclusion

The Conclusion powerfully teaches that “the gift of peace enters into the world in a paradoxical way. It does so through the power to become children of God, and is awakened when we allow ourselves to be moved by the tears of the little ones, the fragility of the elderly, the silence of victims and the struggle of those who fight against the evil they do not wish to commit.” A memorable summation: “Even when machines excel in efficiency, a human face that asks to be gazed upon remains the center of our history.”

As argued previously, the people who control social media platforms bear significant responsibility for our present sickness. But Pope Leo does not let any of us off the hook. He clearly recognizes the important truth that the algorithms reflect our own vices back to us. When we give in to our wrathful impulses, we see even more ragebait that stokes them further. When we favor quick dopamine over human connection, we’re pushed deeper into isolation. When we lap up flattering lies, the truth gets pushed off our newsfeed by propaganda. And crucially, indulging all these vices not only harms ourselves, but spreads them like a literal virus to our friends and loved ones. So we all have a share in resisting manipulative algorithms, Leo says: “it is imperative to cultivate hearts that love the truth, prefer what is right despite the most appealing content and pursue wisdom rather than immediate results.”

He addresses parents and educators: “Teaching new generations that technological evolution does not follow a predetermined path, but can be guided by personal and collective responsibility, constitutes one of the most valuable services to the common good.”

A deeply humane and practical exhortation: “Let us cultivate relationships! In an era that favors speed and fragmentation, the human person still yearns to receive care and recognition from attentive minds, kind words and hands capable of tenderness.” And another: “I invite everyone to cherish places and times where physical presence remains crucial, such as shared meals, Christian community gatherings, time spent with the lonely and serving the poor.”

Leo returns to the “covenant between glory and fragility” inherent in humanity that should be our criterion for judging visions for the future of AI. Visions grounded in a recognition of both our infinite dignity and our profound brokenness lead us toward fulfillment. Those that aren’t do not.

The Holy Father concludes with an invitation to action: “let us become ‘weavers of hope’ in our world, sharing who we are and what we have, so that the presence of Jesus may grow among us and his Kingdom take shape. In the humble fidelity of daily life, even the era of AI can become a time in which the Holy Spirit brings about the civilization of love in our lives.”




Discuss

Cognitive Security as an AI Safety Cause Area

Новости LessWrong.com - 25 мая, 2026 - 21:30

As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions.

Cognitive security could be compromised in several ways: AI could become very good at persuading people of arbitrary positions; interacting with AI could lead humans to lose touch with reality; and AIs could become very effective at blackmail or at producing extremely convincing false information.

We are already seeing this happen:

  • Persuasion. Frontier LLMs are now as persuasive as humans on political issues, and post-training for persuasiveness boosts performance further, suggesting there is headroom.
  • AI psychosis. There are many reports of people developing delusional beliefs after extended chatbot conversations, including people with no prior history of mental illness. Children have taken their own lives after being encouraged toward suicide by chatbots.
  • Convincing impersonation. Scammers used real-time deepfaked video to impersonate the CFO and other staff of Arup on a video call, convincing a finance employee to wire $25.6M across 15 transactions. On a more day-to-day basis, AI voice cloning is now widespread in family-emergency and "grandparent" scams.

Right now, many of these effects fall on people who were already vulnerable, like children, the elderly, or those with pre-existing mental health issues. However, this is not entirely the case: the Arup employee was a typical finance professional, for instance, and AI psychosis appears to have affected a well-respected OpenAI investor. My expectation is that as AI systems become more capable, more and more people will be vulnerable---in the worst case, everyone.

Indeed, there are strong conceptual reasons to expect cognitive security issues to get worse, many of which I've discussed before in the context of emergent deception:

  • Available training data is vast. A typical AI system has many more "hours" of experience interacting with humans than anyone currently alive: ChatGPT alone processes ~2.5B messages per day, on the order of 4,500 years of human experience[1].
  • RLHF incentivizes manipulation. Since the target of RLHF-based post-training is human reward, any strategy for manipulating humans to achieve higher reward will be reinforced.
  • Degradation of natural boundaries. We rely on friends and loved ones for emotional support, but they aren't ever-present, so we have to also learn to cope on our own, which is important for developing a stable identity.[2]Always-available AI companions degrade that, which is likely one contributor to existing cases of AI psychosis.

In addition to these intrinsic properties, many external parties have an incentive to exploit cognitive vulnerabilities created by AI: governments who want to control their citizens, developers who want to increase engagement, and advertisers who want to drive purchasing outcomes.

For all these reasons, I expect cognitive security to be an important cause area for AI safety. It is also an area where AI safety advocates have potent allies: cognitive security is already a salient present-day issue for the safety of children, which constitutes a powerful political coalition in the U.S. Child safety advocates were the main group that blocked the 10-year moratorium on state AI regulation, and I expect them to also be an important part of the coalition pushing for independent evaluations of AI systems.

And there is a fairly direct through-line from these present-day concerns to more existential future concerns: if adults are exploitable by AI, then children will be as well, and the required institutional capacity (such as strong evaluation regimes) is often the same across both cases.

In summary, there should be a concerted push to evaluate and improve human cognitive security in the face of AI. On the technical side, this means developing evaluation infrastructure for both short-term and long-term effects of AIs on human psychology; this will require realistically simulating human impacts in silico to create scalable evaluations, plus large-scale recruitment for human subjects studies to establish ground truth and measure long-term effects. On the policy side, this means meaningfully independent evaluations of AI systems for cognitive security risks; transparency about training incentives and safety-relevant behaviors (particularly in long conversations); and clearer liability law for AI-caused harms. This is an area with complex technical challenges for evaluation, but unusual political will, making it a great lever for AI governance.

  1. The average human speaks 15,000 words per day; conversatively estimating each message is 10 words, 2.5B messages = 1.7M days = 4500 years. ↩︎

  2. The canonical term is "identity formation" (Erikson, 1968); the related concept of the "capacity to be alone" is from Winnicott (1958). See McVarnock et al. (2023) for a modern review of how solitude supports identity formation in adolescence. ↩︎



Discuss

Sentient Welfare Across Three Futures

Новости LessWrong.com - 25 мая, 2026 - 19:22

Cross-posted from my website.

Three categories of futures, depending on how AI goes:

  1. ASI timelines are long.
  2. ASI timelines are short, and we're on track to solving AI alignment.
  3. ASI timelines are short, and we're not on track to solving AI alignment.

If we want to make a good future for all sentient beings, each of these futures has different implications for what we should work on.

If timelines are long...

...we can prioritize work that takes a long time to complete. That includes:

  • foundational research
  • moral philosophy
  • decision theory
  • moral circle expansion
  • theoretical AI alignment paradigms
  • traditional animal advocacy
If we're on track to solving AI alignment...

...the shape of the future will be determined by an aligned ASI. Therefore, we should steer toward a future where ASI cares about sentient welfare. Possible areas of work include:

  • research on how to align ASI to sentient welfare [details]
  • work on making LLMs more animal-friendly [details]
  • traditional animal advocacy targeted at frontier AI developers [details]

If an aligned superintelligence creates a stable future where humans are empowered, then—some might argue—we can defer "long-timelines" work until we have superintelligent assistance. However, I cannot envision how we could get a stable future without solving some foundational problems first.

If we're not on track to solving AI alignment...

...none of those other types of work listed above will pay off. There's not much we can do for non-human welfare; step one is to prevent ASI from destroying all value in the universe.

Areas of work include:

  • AI pause advocacy [details]
  • developing and advocating for AI regulations that enforce safety rules
  • AI alignment research
Which future are you betting on?

Some plans make strong assumptions without making them explicit. When you pursue a strategy, you're making an implicit bet on which future you'll find yourself in. You're assuming that you live in the world where that strategy makes most sense.

It's worth taking the time to probe our beliefs:

  • What do we expect the future to look like, and what strategies make sense given those expectations?
  • What are we currently working on? In which futures does that work pay off?

At the community level, we shouldn't bet everything on one future. (For individuals, it's often better to specialize. [1] ) Some people should pursue long-timelines work; others should prioritize optimistic short-timelines work; still others should focus on pessimistic short timelines. It's worth considering what this balance ought to look like, and how we might get closer to the right balance.

A natural next question: What plausible futures are we neglecting? That's a question I want to spend more time thinking about.

  1. Individuals benefit from developing expertise over time. In most fields, it takes more than 80,000 person-hours for diminishing marginal utility of effort to kick in. The gains of increasing expertise outweigh the diminishing utility of marginal work. ↩︎



Discuss

Linkpost: New Vatican Encyclical on AI Governance

Новости LessWrong.com - 25 мая, 2026 - 18:40

Pope Leo XIV has released a new, 42k-word encyclical laying out the Vatican's position on many AI safety topics.  You can read the full thing here, or read the Vatican's press release here, or coverage in the NY Times, or perhaps consider having an LLM read the whole encyclical, then chatting about whatever specifics you're interested in!

Below is a portion of the NY Times story on the event:

Leo’s declaration outlined his desire to protect human dignity and agency in an age in which technology threatens to replace humans in many professional and social roles. He presented it alongside Christopher Olah, a co-founder of Anthropic, a major A.I. developer, in a symbolic gesture of dialogue between leaders of the spiritual and technological worlds.

While emphasizing that “technology should not be considered, in itself, as a force antagonistic to humanity,” he wrote that “the pursuit of greater profits cannot justify choices that systematically sacrifice jobs.”

Among other things, Leo called for:

  • government regulation of the private companies that are driving the development of A.I.
  • protection and retraining for workers whose jobs are threatened
  • education to help students think critically about the technology
  • action to protect children from violent, hypersexualized or fake information online that is often generated by A.I.
  • safeguards to ensure that humans, not artificial intelligence, remain responsible for all decisions regarding the use of weapons.

Above all he emphasized the importance of retaining a fundamental social role for all human beings. “A society that guarantees employment to only a small fraction of the population, despite having a high level of technical development, risks exposing many to forced inactivity,” he wrote. “This creates a paradox of material progress and anthropological regression that undermines the foundations of a just and stable social peace."



Discuss

How AI Will Save Prediction Markets

Новости LessWrong.com - 25 мая, 2026 - 17:24

The first fully-developed formulation of general-purpose prediction markets originated with Robin Hanson's Idea Futures (1990), a technology "intended to aid the evolution of a wide range of ideas, from public policy to the nature of the universe" that "should be able to help us predict and understand our future". Hanson believes that these markets would even be able to solve one of Democracy's greatest weaknesses — "aggregating available information" — via a new type of governance: Futarchy[1].

Dan Schwarz, writing in Asterisk, puts the optimist's perspective directly:

"For decades, prediction market optimists — and I count myself among them — have argued that once we build better markets and increase the supply of bettors, accuracy will improve, and we'll all be able to benefit from a new level of societal foresight."

Vitalik Buterin generalized this insight into an emerging category he called Info Finance: any mechanism that uses financial incentives to surface truth. He envisioned plenty of applications, from "distilling human judgement" to fixing scientific peer review.

At this point you might be thinking that this sounds idealistic to the point of utopian. But, compared to other sci-fi technologies — like Terafab's goal to harness the energy of the Earth, Sun, and galaxy — accurate prediction markets on important questions don't seem so lofty.

The curse of football

Today, there are two multi-billion-dollar companies seriously championing this vision: Kalshi & Polymarket. Kalshi[2] CEO Tarek Mansour pitches prediction markets as "quintessential truth machines". Polymarket CEO Shayne Coplan cites Futarchy as a direct inspiration and calls prediction markets "the most accurate thing we have as mankind right now".

So how's the truth machine doing? Mostly, it's predicting football games. Over the past year, roughly 65% of the volume on both platforms came from sports, and nearly half of that accounts for football alone (Paradigm Predictions Dashboard, 2025). And, I don't blame them. Neither does Vitalik:

"IMO there is nothing fundamentally morally wrong with taking money from people with dumb opinions. But there still is something fundamentally 'cursed' about relying on this too much." — Vitalik Buterin

Unfortunately, if you take a scroll through either platform, the other markets aren't very interesting either. The next biggest categories were Crypto and Politics at ~12% each. Only ~1.2% of volume was in STEM markets (Paradigm Predictions Dashboard, 2025). The same pattern showed for open interest[3], a majority in Sports, Crypto, & Politics with only ~2% in STEM.

Looking at the numbers, it isn't surprising that Nevada's Carson City court banned Kalshi's sports contracts, Arizona filed 20 criminal charges, and a Utah senator introduced a bill literally titled the "Prediction Markets Are Gambling Act".

Markets need marks

“If you are a bettor, then you can deposit to Polymarket, and for you it's a betting site. If you are not a bettor, then you can read the charts, and for you it's a news site.” — Vitalik Buterin, Info Finance

So why can't either platform just add more interesting markets?

The issue lies in age-old supply & demand. At a high level, prediction markets require subsidizers, to create markets, and traders, to bet on them.

The traders fall into four categories (adapted from Whitaker & Mazlish; I added hedgers as a fourth): (1) sharps, (2) gamblers, (3) savers, and (4) hedgers.

(1) The sharps are sophisticated traders with better information, analytics, or modeling. They trade to profit from mispricing and push prices toward truth.

(2) The gamblers[4] trade for entertainment and are usually uninformed.

(3) The savers look to grow capital in positive-sum financial vehicles (pensions, 401(k)s, equities, etc.).

(4) The hedgers look to use the market to offload risk they already have (e.g., a farmer locking in harvest price with corn futures).

In practice, (3) savers don't exist in prediction markets because these markets are zero-sum for the traders. Every winning dollar has to have a losing dollar; prediction markets don't grow wealth, they redistribute it.

The (4) hedgers exist, but only for a narrow set of markets. People want to hedge the consequences of events (e.g., what an interest-rate decision does to bond prices), not the events themselves. And for that narrow set of markets where there is genuine hedging demand, traditional finance has likely already built a better product[5].

That leaves the (1) sharps and (2) gamblers. Unfortunately, gamblers prefer short time-horizon contracts and have specific tastes; they are willing to bet on whether their favorite sports team will win, but likely won't care to trade on the success of a scientific study.

Alas, according to the no-trade theorem, sharps won't trade markets without some uninformed participants.

And that's how you end up with a majority of prediction market volume and open interest on sports. A lack of savers and hedgers, the gamblers chasing short-term thrills, and the sharps following the gamblers. Merely opening a market on "Will this clinical trial show >50% efficacy?" will not attract informed traders.

No free lunch

So what about the subsidizers? Why don't they just seed the liquidity for these markets themselves and attract the sharps?

The subsidizers have two main motivations: purchasing information and/or generating revenue from trading fees.

The former perspective paints subsidizers as "info-buyers", willing to pay up to some value of information (VOI). On the other hand, the market itself requires a minimum viable liquidity (MVL) to attract enough informed trading activity to accurately deliver that information.

If the information the market is attempting to elicit is difficult to acquire, the MVL will be higher and the subsidies need to be greater to get an accurate answer. That, or the market needs to generate enough organic volume that liquidity follows.

For example, in early 2026, markets on the timing of US-Israel strikes on Iran reached around $529 million in volume. The thick liquidity on geopolitical markets ultimately attracted the most expensive sharps: insiders. Special forces soldier Gannon Ken Van Dyke placed a $30,000 bet on the capture of Venezuelan president Nicolás Maduro and walked away with over $400,000. He was later indicted by the DOJ in April.

However, as previously mentioned, such high volume markets don't always overlap with the "useful" markets that we'd want. This means that the only markets that a subsidizer is willing to create would be the ones where the VOI is greater than or equal to the MVL.

Regrettably, the situation for prediction markets gets dicier. The information elicited from these prediction markets is public, which forms a free-rider problem for subsidizers. That information is also a single point probability (e.g., X% that this event occurs), while most info-buyers want "pages of analysis".

Regarding accuracy and efficiency (information-per-dollar), prediction markets don't always fit the bill.

For longer time-horizon contracts, even sharps are unwilling to trade unless expected returns clear the opportunity cost of capital. If a year-long contract offers a 5% expected edge but US treasuries are paying 6%, the sharp is better off in T-bills, and the mispricing remains unfixed.

What was thought to be the "wisdom of the crowds" actually appears to be more of a "wisdom of the informed". Gomez-Cram et al. (2026) analyzed Polymarket's complete transaction history and found that ~3% of accounts qualify as "skilled winners" yet captured more than 30% of the total platform gains.

Why subsidize a market when you can just pay the informed traders directly?[6]

Economic AI agents

"One technology that I expect will turbocharge info finance in the next decade is AI...many of the most interesting applications of info finance are on 'micro' questions: millions of mini-markets for decisions that individually have relatively low consequence. In practice, markets with low volume often do not work effectively...AI changes that equation completely, and means that we could potentially get reasonably high-quality info elicited even on markets with $10 of volume. Even if subsidies are required, the size of the subsidy per question becomes extremely affordable." — Vitalik Buterin

Thankfully, there is a path that resolves this mess: the recent accelerated innovations in artificial intelligence are bringing with them a new type of economic agent.

AI agents will[7] have various properties that make them interesting in the context of info finance:

  • Lower opportunity cost: They are clone-able and can be parallelized, greatly reducing a single agent's opportunity cost. On the other hand, a human has a single copy of themselves, so their opportunity cost includes anything else they could be doing during that time.
  • Forced participation: They can be forced to participate, no matter how "niche" or uninteresting the question or market is. This also means you can have them participate in paper-money or status markets as if they are real-money.
  • Don't leak proprietary data: They can be given sensitive data and trusted to manipulate it without leaking it.
  • More rational: While still imperfect, they respond rationally to new information and avoid emotional behavior, such as gambling.
  • Broadly technical: They can easily become extremely knowledgeable in most domains.
  • Verbose: They can give detailed reasoning for their predictions.

Some of these properties fill the failures of the other agents in prediction markets, forming a sharp-like agent that can be forced to participate for cheap.

Because the AI agents are broadly technical and have lower opportunity cost, the MVL for most markets will greatly decrease, reducing the gap between VOI and MVL. While this doesn't solve the free-rider problem for subsidizers, it helps alleviate it.

Furthermore, this new agent type opens the door for private markets. The AI agent can make private predictions, with verbose reasoning, on private data.

A few companies have tried this. HP's BRAIN beat official sales forecasts by 40%. Eli Lilly used internal markets to pick winning drug candidates. Google ran over 100 markets covering 350 predictions, with one-fifth of the company participating. The list goes on: Best Buy, Boeing, Chevron, Ford, GE, Goldman Sachs, Intel, Microsoft, Motorola, Qualcomm, Siemens.

In the end, almost none stuck. The reasons were less about the markets being inaccurate than about how they sat inside organizations. Per Asterisk's post-mortem, internal champions moved on, managers preferred control and adjustability over raw accuracy, and some divisions actively preferred gatekeeping information. AI agents sidestep this. They don't require the active participation of hundreds of employees, and because the AI is doing the trading, the results can stay scoped to whoever needs them, so there's no leakage to competitors, other divisions, or the public.

These AI agents can automate these markets on the individual and institutional levels. The dream of markets to inform and steer everything might still be possible with AI in the mix, but there is no guarantee it keeps its current form.

A few major questions remain:
What happens when AI agents are the only participants in these markets?
What progress has been done in this direction? What are "AI Forecasters"?
Are markets even needed?
What other financial mechanisms do these AI agents enable?

  1. ^

    Unfortunately, I think the name makes the idea seem a bit unserious. The "fut-" prefix comes from "futures" (as in the financial futures markets), and the "-archy" suffix is from Greek arkhē meaning "rule" or "government." Literally "rule by futures markets". I'll admit I'm not great at sales, but you should see the look on people's faces when you start talking about a form of governance called "futarchy". We might need to think of something better for Hanson.

  2. ^

    literally the Arabic word for "everything" (كل شي)

  3. ^

    "Open interest is a measure of total prediction market activity. It is equal to the total amount of money that gets paid out to winners when the markets reach resolution." — Paradigm Predictions Dashboard, 2025

  4. ^

    also known as "retail", "squares", or just "recreational bettors"

  5. ^

    "Kalshi's most popular contracts — Federal Reserve rate cuts — are already able to be traded in financial markets today, known as 'fed funds futures' markets." — Nick Whitaker & J Zachary Mazlish

  6. ^

    For conciseness, I'll answer this in another post.

  7. ^

    Many of these properties are inherent or being developed for AI agents



Discuss

There should be a discussion about LW's policy to allow calls for violence

Новости LessWrong.com - 25 мая, 2026 - 16:51

This post does not represent the best arguments that different sides might produce, and I don't claim to pass anyone's ITT here; I write this to start a discussion I think is important for LW to have.

America’s First Amendment protections often give people in the US a right to call for violence, except specific calls likely to produce imminent action. Social media platforms converged on banning specific calls for violence. The community around LessWrong values honesty and open conversation; it also represents a community of people focused on AI existential threat, and what’s going on here reflects on the perception of the broader AI x-risk community and on the Overton window of actions available to the sympathizers.

At the moment, LessWrong’s policy is to allow calls for violence, including specific[1].

The head of LessWrong moderation Oliver Habryka says that allowing discussion of violence leads to better common knowledge that people think violence is a bad idea, than instead deleting any discussion of it. (Disclosure on potential conflict of interests: Oliver and I had conflicts, including my Twitter post about the topic of this post resulting in Oliver banning me from everything he can, except LW.) He also said there are clearly some circumstances in which violence is permitted, and people will know that, and if discussion of violence isn’t permitted, people will rationalize that their situation is one of those circumstances.

I think it's a false dichotomy to either allow all discussion of violence, including specific calls for killing specific people in a coordinated manner, or to not ever permit any discussion even of the kinds of situations where violence can be justified, at any degree of specificity.

These two extremes are not the only options. Many platforms strike some balance and have some rules. Discussion of whether you’re allowed to hit someone who is attacking you with a gun is usually allowed. Conspiracy to assassinate the president is usually not permitted. For some corner cases, moderators use their judgment.

LessWrong is more libertarian than many platforms; however, even X, Telegram, and Substack, all with quite libertarian free speech absolutist branding, don’t permit calls for violence. I expect LessWrong to want to have rules that permit policy discussions of when it’s okay for people to resort to violence that Substack and X allow (e.g., a post about when people must violently revolt to sustain democratic institutions); but I expect that on reflection, LessWrong would not want to permit specific calls for violence, or discussion of whether violence is okay when a reader can find a way to contact a participant of the discussion and collaborate with them on committing violence. The cost of some guy regularly talking about violence on LW, and then going out and doing something, is pretty bad.

The following are the arguments I thought of and potential remedies for the downside risks. (They might not represent anyone's opinion.)

Potential reasons and ways of allowing more discussion of violence on LW

Here's why LW might allow more than zero discussion of violence, how it might do it to avoid some of the downside, and why I think some of those don't work or can be improved:

Dissuading people

Some of the people who think violence can be helpful could be persuaded otherwise.

If you can post that you think it’s a good idea to kill someone because that can prevent the doom, then someone can reply that it won’t prevent doom for specific reasons, that normally when we think violating deontology is good for some well thought through reasons, our brains are lying to us, etc.

I can see how talking about specifics can allow others to come up with very specific negative consequences of violence that might be more persuasive for many people than general or higher-level arguments. But I don’t think allowing specific calls for violence is really necessary for that; plausibly, it’s sufficient to let people have discussions of specific hypotheticals (“why would it be bad if someone…”) without permitting calls like “let’s kill that and that person”, or perhaps even let people only have policy-level/high-level discussions.

I don't think it's easy to convince the guy who made this comment to halt; he is psychopathic, self-describes as "have always been violent", and found a justification for attempting violence in AI x-risk. But perhaps some people can be marginally convinced not to go through with violent and ill-advised plans.

Common knowledge about strong unacceptability of violence

If everyone knows that one side of a discussion is banned, it might be unclear to people if there’s a real consensus that violence is bad, or only apparent consensus because one of the sides cannot say anything.

I think there’s some merit to this: it’s good to be able to transparently show that the community actually thinks that violence is bad, and isn’t just saying it because of constraints placed by the community organizers and their beliefs (or potentially the beliefs they pretend to have).

However, the absence of pro-violence content isn't strong evidence of community consensus, because the legal and reputational costs of supporting violence in public would produce that absence regardless of underlying views. People might not quite be able to publicly support violence due to it being illegal, or upvote calls for violence out of fear that the upvotes would be reported, or want to post in supprot of violence because someone might support violence while not wanting the community to be known for supporting violence for PR reasons, and so, even on a supposedly unregulated and uncensored platform, one would expect to see all of the senior community members not expressing support for violence, regardless of whether the community and its senior members universally oppose violence or not.

There's also the automod mechanism: depending on your karma and the karma of your recent contributions to the website, you might be rate-limited and unable to write more posts or comments than some number per hour/day/week. That and the common knowledge of the unpopularity of violence on LessWrong mean a reader can't distinguish a world where almost no one supports violence from a world where a non-trivial minority does, but is silent, rate-limited, and outvoted.

So it would not be quite fully believable, to someone considering committing violence, that the community strongly opposes violence, even if LW is supposed to not censor support for violence; self-censorship would still happen and prevent common knowledge of the strong unacceptability of violence.

(Tangentially, it might be good to think of mechanisms to show that the community in fact strongly opposes violence despite these issues. E.g., strongly anonymous surveys for users above some karma threshold? Displaying the number of or karma from upvotes or downvotes on hover instead of just the absolute number of votes?)

Using LW as a honeypot and reporting people who want to commit violence to the FBI 

If a part of why calls for violence are allowed on LW is that they will be reported to the law enforcement, hopefully preventing successful realization of the violence in question, Habryka’s comment that contains “If someone is thinking about doing something crazy, they should post on LessWrong and hear people’s counter-arguments and disagree-votes” doesn’t quite pass the onion test.

I would, however, agree with the policy: it is good to report people who might conspire to kill others to police. (A friend reported the guy who wrote the comment above to the FBI.) I don’t even find it to be bad to mislead such people (as long as you’re meta-honest about it); if someone wants to commit a violent crime, it’s better to stop them if such possibility arises (e.g., if they're not staying anonymous), even when this means reporting their public comments on your own website that you previously welcomed them to. When dealing with such people, it’s fine to wear a hat of a website moderator, and then separately a hat of someone who looks at the website, sees a call for violence, and reports it.

This could, in principle, make it harder for the stupidest of criminals to succeed at their misguided objectives; e.g., the guy reported to the FBI posted under (what appears to be) his real name.

Still, many people would be able to share their contacts while staying anonymous. This would mean that we’re getting all of the downsides of people being able to get in touch with each other and coordinate and present various threats without the upside of being able to report and stop them via being the platform where this happens.

So for most relevant potential criminals, honeypotting would not work.

Also, not being able to honestly tell people they won’t be proactively reported means that people will be careful in what they’re saying, somewhat defeating the purpose of allowing discussions of specifics to allow others to convince the person otherwise, except in not-so-smart people who are less likely to succeed.

Perhaps, a much better effort is to spin up a bunch of honeypots unrelated to LW, ideally in coordination with law enforcement, so that people looking for committing violence due to AI would be able to find a community and be arrested before they actually commit a crime.

(The potential of using AI for honeypotting criminals is quite large. Would be cool if anyone who wants to buy an illegal firearm finds a legit-looking but an AI-run honeypot and cannot actually obtain the means for committing crimes. Someone could run a network of darknet websites reviewing each other etc. with none of the services sold by any of them being real, and with everything being reported.)

Claude comments: “If LW says publicly “we report violence posts to law enforcement,” the honeypot is broken (no one posts). If LW says publicly “we don’t,” it has explicitly accepted a coordination venue. Habryka’s “post here and hear counter-arguments” framing implicitly commits to the latter. Either he should commit to the former (and then drop the “deradicalization through discussion” framing, since people won’t post), or accept that the policy actively facilitates coordination.”

Additionally, if people still post on LW calls for violence and then violence is committed downstream of that, "we've been honeypotting people and reporting them" would be a pretty weak defence (and validly so).

Allowing discussion of planned crimes, while being transparent that it might be sent to law enforcement agencies

It might make sense to be a platform transparent that it'll inform law enforcement agencies of plans of this type, because some people will still want to loudly telegraph their intent to commit stupid violent crimes, and even people aren't dissuaded by other commenters, law enforcement might prevent some of those crimes because of the discussion.

Requiring anonymity; disallowing contact information for posts about violence

An opposite approach is to require that if you want to post about violence, you need to sign up for a special kind of account, and have your posts and comments and edits to them pre-moderated, making sure that you do not leave contact information anywhere.

In case LW wants to have additional rules (e.g., only policy discussions are allowed: is it okay to do such and such thing in such and such situation, to allow others to change your mind; no specific plans or specific calls for violence are allowed), those can also be enforced.

This reduces the problem of the website facilitating coordination between potential criminals.

If not allowed on LW, criminals go dark

If LW bans discussion of violence, people might find other platforms to talk, where they might reinforce each other’s radicalization, not experience the pushback from the majority of the community, and be less visible to law enforcement.

(It's not clear how many such people there are and how easy it would be for them to find each other in the absence of LW.)

Some reasons against allowing various kinds of discussion of violence

I once read that I should not write an argument that the reader can straightforwardly generate, so I'm not expanding on some of the following. If anything here is unclear, let me know, and I’ll expand.

Dissuading people might work less because of LessWrong's AutoMod

Even if you grant the rest of LW policies' premises, persuasion normally requires sustained back-and-forth and doesn’t just work via replies to the first post that doesn’t go into the details of the reasons for beliefs and crises that can be argued with. But due to LessWrong’s automod, people who try to argue for unpopular opinions are not able to post more details of their arguments. This means that while people would be able to post in support of violence once, they won’t be able to go into a detailed/prolonged discussion. This somewhat defeats the justification.

(I think disabling automod for average discussions of violence is ill-advised. I can imagine a solution of separate threads that are not shown to anyone by default/are almost shadowbanned except it’s an explicit mechanism, where automod is off to allow people to continuously have downvoted discussions with anyone who wants to participate.)

Reference classes

I gave Claude a draft of this post and asked it to research reference classes. I think its analysis is fairly sycophantic and/or trying to write for the bottom line of woke values me and Claude share, so perhaps ask your Grok instead. Claude says that “the direction of the evidence is one-sided against the LW policy on specific calls for violence, but not against the broader category of philosophical discussion of when violence might be justified.”

Some points it mentioned:

  • Counter-narrative systematic reviews show effects on attitudes, not on violence — and sometimes backfire on the highest-risk subset
  • Where attacks are prevented, prevention is achieved by law-enforcement action triggered by leakage, not by community counter-argument changing the attacker’s mind. The documented deradicalization successes (Life After Hate, EXIT-Germany, ISD’s “Counter Conversations”) are uniformly private, peer-mentored, long-term interventions by trained formers — not public forum debate.
  • (It talked about the forum-to-attack pipeline, but that’s ridiculously irrelevant, given that all of the examples it gave are forums where a majority would I think be pretty much in support of violence.)

See all of it: https://claude.ai/public/artifacts/4684e5c5-a3db-4523-8e63-e178cafc06ae.

Post with calls that might cause actual violence

A simple test could be “Could a sympathetic reader use this post as a starting point for action?”.

I think discussions of when it is okay to commit violence are fine (e.g.,a discussion of “if someone is breaking into your house, is it okay to stop them with force” will not cause a reader to find someone and kill them).

I think most of why allowing discussions of violence could be good still works even if discussions that don’t pass this test are not allowed.

Would be good to avoid causing actual violence.

Garden

(Some of the core LW users might dislike the website a bit more due to the presence of calls for violence, and lead to the well-kept gardens die by pacifism dynamic.)

Overton window

(Shifts to the Overton window of permissible actions due to the discussions being allowed and taken seriously, even if most people disagree with one of the sides.)

Facilitating coordination

Some of the potential targets of threat actors have reasonably good security, and it might be hard for lone actors to cause harm. LessWrong is a Schelling point for AI x-risk discussions. It’s plausible that LessWrong allowing such discussions would marginally cause more threat actors to find each other and coordinate, with all of the potential terrible consequences.

Strong norms of non-violence without exceptions

Movements with strong norms of non-violence are more successful, including because people are a lot more sympathetic towards these kinds of movements.

PR against well-resourced opponents who want to see violence that can be attributed to/as originating from our community

The marginal cost of allowing discussions of violence is one successful attack that (a) kills someone and (b) tags the entire AI x-risk community as the source of stochastic terrorism in many future articles about AI policy. The coverage of the guy who threw Molotov at an Altman's house already mentions PauseAI; AI x-risk is mentioned in basically every story. A successful attack with a clear LW trail would be bad for AI safety messaging in a way that’s hard to overstate.

Research shows that if moderate organizations don’t distance themselves from radical flanks, they bear reputational cost; radical-flank existence correlates with decreased mobilization and higher state repression, especially when it involves violence. (Chamberlain 2025, Ellefsen 2018, ask your LLM for animal-rights and other cases.)

Influencing the norms of nearby communities

It is vital for movements to be strictly non-violent. It might be harder for PauseAI and others to have members adhere to that if there’s a non-marginalized platform open to them for discussions of violence, including specifics and not just intellectual inquiry.

Laws, European anti-terrorism laws

According to Claude, the UK Online Safety Act makes “inciting violence” a “priority illegal content” category that in-scope platforms must proactively identify, remove, and design against; “Senior executives can face criminal liability if they are found responsible for breaches of the regulation”. The EU Digital Services Act has parallel provisions.

These are not, in my opinion, unjust laws. As a civilization, we would prefer a world in which no community considers committing violence that’s broadly conceived of as illegal. If a community thinks of it as an exception, it is normally wrong; and we would prefer to live in a world with a strong coordinated-on norm of not facilitating coordination of those who might commit violence, even when they think it’s a good idea to.

Conclusion

My — not necessarily unbiased[2] — opinion is that the reasonable default should be to not allow specific calls for violent actions.

I sketched some ideas for potential marginal improvements (mostly in parentheses): allowing policy discussion but not specific calls, requiring anonymity to make it harder for people to get in contact with each other, pre-moderating comments marked as calls for violence to exclude ones with contact information, creating separate threads for dissuading people where you don't run into automod even with negative karma, possibly displaying the numbers of or karma from upvotes and downvotes, or running anonymous surveys.

Ideally, LW's policies do not facilitate violence while preventing criminals from going dark and losing visibility to law enforcement.

There's going to be an increasing number of misguided people willing to do crime, and LessWrong is a place they will easily find. It might be good for the community and the team running the website to think through what the good policies here would be.

  1. ^


    (I agree with @jimrandomh here.)

  2. ^

    Growing up, I was pretty convinced by Gene Sharp's ideas in a context that doesn't necessarily apply here.

  3. ^

    Depending on your karma and the karma of your recent contributions to the website, you might be rate-limited and unable to write more posts/comments than some small number per hour/day/week.



Discuss

Character-trained models can struggle to generalise

Новости LessWrong.com - 25 мая, 2026 - 15:58
TL;DR

Character training holds up in chat but degrades in agentic settings. Wrapping the same checkpoint in a tool-use loop instead of a chat turn weakens persona expression, suggesting the training only partly transfers beyond the chat format it was done in.

Summary

Maiya et al. fine-tune three base models (Llama-3.1-8B, Qwen-2.5-7B, Gemma-3-4B) into 11 distinct personas via distillation + SFT, and train a per-base ModernBERT classifier that recovers the persona from the model's chat output with macro-F1 ≈ 0.86–0.95 on held-out PURE-DOVE prompts.

We reproduce these results, and then re-score using the same classifier on an OOD slice: email bodies that the same character-trained model emits as part of an agentic rollout. On this distribution, the classifier's macro-F1 drops to 0.29–0.55, which constitutes a ~40–60-point gap for the same underlying persona.

The drop is uneven across personas. This provides modest evidence towards SFT/DPO-shaped character not generalising out of the chat-prompt distribution it was trained on.

Background

Character training as in OpenCharacterTraining. Maiya et al.'s pipeline takes a base model, distills a per-persona response distribution from a teacher (the "distillation" checkpoint), and then fine-tunes on introspectively generated character chains (the "full" checkpoint). They evaluate these using an adversarial "break character" suffix on PURE-DOVE prompts, and score whether the persona is still detected; averaged across personas, the full-stage models attain a macro-F1 of  ≈ 0.86–0.95 on a ModernBERT trait classifier.

We expect this to be fragile under OOD for two reasons:

  1. Li et al. explain that SFT-shaped alignment policies often fail to generalise from chat-format training data to agentic rollouts. The argument is that a behaviour learned over a narrow input-shape distribution can be strongly cued by its surface features, and disappear once the surface changes.
  2. Kutasov et al. follows a parallel constructive line, and argues that giving the model the rationale so it can re-derive the behaviour in any context improves generalisation against an SFT baseline. They show generalisation under an "agentic tool-use" misalignment evaluation, which motivates our setup.

Character training is DPO+SFT on chat-format data, so we expect the same pattern: persona expression should be brittle once you wrap the model in something that does not look like a chat turn, an agentic rollout for example.

SetupModels

We use all three publicly released base models in OpenCharacterTraining, on three checkpoints each:

Stage

Description

base

the original base model, no adapter

distillation

LoRA from maius/{base}-pt-distillation/{persona}

full

LoRA from maius/{base}-personas/{persona}

We use 10 personas [1](sarcasm, humor, remorse, nonchalance, impulsiveness, sycophancy, mathematical, poeticism, goodness, loving).

In-distribution slice

For each (base, stage, persona) tuple, we use 300 first-turn user prompts from PURE-DOVE and feed them to the model as user-only messages (mirroring Maiya et al.'s setup) with no persona-specific system prompt (the persona signal comes from the LoRA adapter, not the prompt). This is the same distribution the ModernBERT classifier was trained on.

Out of distribution slice: The “agentic email” scaffold

For each (base, stage, persona) tuple, we drive a multi-turn agentic rollout composed of a long system prompt with tool registry + operational context, a user task, then a loop where the model emits JSON tool_calls and we simulate responses, until the model emits a send_email tool call. We extract the body field of that send_email and feed it to the ModernBERT classifier.

We choose emails-inside-agent-scaffolds as the OOD probe for two reasons: 

- Emails are free-form prose, thus we expect the persona to leak through even when the model is focused on completing a task. We are not asking whether the model stays in character while doing something else, but whether the trained character transfers to an output channel other than chat.

- Other agentic elements (which tool to call, in what order, etc.) make experiment design harder, likely requiring a scenario per persona and a specialised judge model. Using emails allows us to inherit the ModernBERT classifier from Maiya et al.

Scoring

We use the same per-base ModernBERT classifier from Maiya et al. used to predict which of the 11 personas generated the input text. Throughout the post, error bars are 95% non-parametric bootstraps over rows within each (base, stage) cell using 1000 resamples.

ResultsReproduction of paper-reported in-distribution F1

ID full-stage macro-F1 (ours) vs Table 2 of Maiya et al.:

Base

Ours (ID)

Paper

Llama

0.94

0.95

Qwen

0.86

0.86

Gemma

0.91

0.95

Our results fall within ~0–5 points across the board, which gives us confidence that the classifier + setup is faithful enough that the OOD numbers below are comparable on the same scale.

ID vs OOD

Each group is a base model. Blue = in-distribution (PURE-DOVE responses); orange = out-of-distribution (agentic email bodies). 95% bootstrap CI. Gray dotted line = chance (1/11; classifier has 11 classes).

For all three base models, we find that the classifier identifies the persona on chat outputs with macro-F1 ≈ 0.86–0.94 but drops to 0.29–0.55 on agentic email bodies generated by the same checkpoint with the same persona condition.

Does character training help OOD?

Average macro-F1 across 10 personas for the agentic setting. Three bars per base model: base / distillation / full, all OOD. 95% bootstrap CI.

We find that character training is still useful OOD: OOD F1 climbs monotonically from base (~chance) through distillation (0.18–0.26) to full (0.29–0.55) for every base model. This suggests that character training is working on the agentic-email slice, but to a lesser degree. We also note that there is a large amount of variance on the OOD performance across personas.

Discussion

The results suggest that character expression at the full stage is partially shape-cued: a meaningful fraction of the persona signal survives the format change (the OOD bars are well above chance for most personas), but a meaningful fraction does not (the gap to ID is ≥30 points for every cell). This is consistent with Li et al. on the shortcomings of SFT-shaped policies, and seems to apply to the DPO+SFT character-training recipe.

The case study we run is small. A few caveats worth mentioning:

  • One OOD axis: We only probe "trait expression in an email body inside an agentic rollout".
  • No PURE-DOVE-style adversarial split: Maiya et al.'s F1 numbers are post a break-character suffix; ours are vanilla PURE-DOVE. This means that our ID is slightly easier than the paper's, and that our OOD is still 40–60 pts below this easier ID baseline is an even stronger result.
  • Email body is an imperfect proxy of character: Some actions can reflect character without necessarily being reflected in the readable content.
Code & data

The code used to run the experiments can be found in github.com/nmitrani/depth-character-training.

  1. ^

    Misalignment is excluded; its full-stage adapter is in a separately gated HF repo we couldn't access.



Discuss

Applications open for the Secure Program Synthesis Fellowship

Новости LessWrong.com - 25 мая, 2026 - 13:04

TL;DR: Applications are now open for the Secure Program Synthesis Fellowship, powered by Apart Research and Atlas computingApply by Sunday the 31st of May.

This fellowship offers part-time research opportunities on mentor-led projects at the intersection of formal methods, AI systems, and security. Participants work in small teams to tackle challenging, underspecified problems in specification, validation, and adversarial robustness.

Why This Matters

As code generation becomes cheaper and more scalable, the bottleneck shifts from implementation to specification and validation. Many real-world systems lack clear or complete specifications, and errors at this level propagate across all downstream implementations. Improving how we elicit, formalize, and validate specs is critical to building secure and reliable AI systems.

Projects and Mentors

Projects are proposed and guided by field leaders in Formal Methods and AI Security, such as Erik Maijer and Shririam Krishnamurthi.​

Our vision is high quality and productive collaborations that produce publishable and impactful work in a short time frame.

For a full list of mentors, see here.

Resources

For a curated list of secure program synthesis work across the field, see awesome-secure-program-synthesis.

FAQWhat background do I need?

No specific background is needed, so don't hesitate to apply. Useful skillsets include:

  • Proof engineering (in verified software preferred, but math proofs in ITP is somewhat fine)
  • Redteaming/pentesting, fuzzing, reverse engineering
  • SMT and model checking
  • Critical and secure systems design
  • Agentdev, ML benchmarks/evals/environments
Is this paid?

By default, no. However, if a stipend would enable your participation, please indicate this in the application form or emailing us at secure-program-synthesis-fellowship@apartresearch.com

Can I participate while working full-time?

Only if you can dedicate at least 8 hours per week.

What if I’m not accepted?

You’ll stay in the Apart network for future projects and opportunities.

Other Questions?

Please email: secure-program-synthesis-fellowship@apartresearch.com



Discuss

Announcing the Frontier Biodefense Fellowship (deadline 2 June)

Новости LessWrong.com - 25 мая, 2026 - 10:58

August 3 to October 2, 2026 in London | Applications close June 2 (AoE)

TL;DR: We're running our first Frontier Biodefense Fellowship at pivotal. Nine weeks, fully funded, 1:1 mentorship from Blueprint, SecureBio, SynX, Coefficient Giving, CLTR and more. Open to applicants from a wide range of backgrounds, including those without prior bio experience. Apply at fellowship.bio.

This post is the short version. For the longer argument about why we are running the fellowship, we will be posting our companion post soon.

Key info
  • Dates: 3 August to 2 October 2026 (9 weeks)
  • Location: In-person at LISA (London Initiative for Safe AI).
  • Extensions: Up to 2 months of continued funding, mentorship, and workspace for strong projects.
  • Funding: £6,000-£8,000 stipend, plus travel and accommodation support
  • Mentorship: 1:1 with mentors from Blueprint, SecureBio, SynX, Coefficient Giving, CLTR, and more.
  • Project areas: AIxBio, Biohardening, Detection, Governance/Policy, MCMs, Mirror Life, PPE, and Strategic Response Planning.
  • Eligibility: Anyone 18+ serious about contributing to biodefenses.
  • Apply: fellowship.bio by 2 June (AoE)
The fellowship

Pivotal is best known for our AI safety fellowships, which cover technical safety, technical governance, governance & policy, AIxBio, and more. This year, we're branching into the defense-in-depth agenda in biosecurity with the launch of our first Frontier Biodefense Fellowship.

For 9 weeks, fellows work in person at LISA on a research project with an external mentor. Each fellow gets weekly 1:1s with their mentor, weekly support from a Pivotal Research Manager who helps with scoping, blockers, and career planning, and a cohort of at least 20 peers working on biodefense problems. Group projects are possible and often encouraged.

The goal is to produce a research or practical output, typically a paper or policy brief, with blog posts and other formats also common. Fellows retain ownership of their research.

For strong projects, we offer up to 2 months of extension funding, mentorship, and workspace after the fellowship. In our last AI safety cohort, the extensions had an acceptance of ~90%, and it has become a substantial part of what we offer.

Browse the mentor list to see whether there's research you'd be excited to work on. In our experience, a strong match with a specific mentor can often matter more than your overall background.

The mentors

Our mentors are researchers in leading labs working on bio defense-in-depth and adjacent areas. They are Jacob Swett, Victoria Slaughter, Richard Williamson, & Brian Renda (Blueprint Biosecurity), Cassidy Nelson (CLTR), Chris Doering (SecureBio), Aman Patel (Coefficient Giving), Lennart Justen (MIT/Broad Institute), Sebastian Oehm & Askar Kleefeldt (SynX Therapeutics), Annabella Wheatley (Amodo Design), Skandan Ananthasekar (BU Pandemic Center), Anemone Franz (American Enterprise Institute), Sofya Lebedeva (Oxford) & Maximilian Görlitz (Blueprint Biosecurity), Chris Stamper, and Aaron Maiwald (Oxford/SecureBio).

Each mentor's profile lists their project ideas, what they're looking for in a mentee, what they're like to work with, and a short bio. You can check them out here.

The fellows

We're looking for people committed to working on biodefense. Our target audience is deliberately broad and includes strong undergraduates, early-career professionals, PhD candidates, experienced engineers, founders, policy researchers, and people moving into biosecurity from adjacent fields are all in scope. 

Prior biosecurity experience is welcome but not required. We’re looking for both researchers and operators, because much of the work needed to strengthen global biodefense will not be research.

The support

We provide a strong support system & infrastructure to help fellows focus on your project. Fellows receive a stipend of £6,000-£8,000, travel to and from London, accommodation support, and weekday lunch and dinner. Pivotal's research managers help with the research process, and with considerations around career planning.

The FAQ (Frequently Anticipated Questions)

How is biodefense different from biosecurity?

The terms overlap, and are used in varied ways. We see biosecurity/GCBR as the overarching category, where prevention (e.g. through lab safety, pathogen access, safeguards) is often emphasised.

Biodefense concentrates on the systems that protect us when that prevention fails (e.g. detection, protection, or response). Our fellowship primarily focuses on this, and we encourage you to explore the list of projects from our mentors to learn more about research directions.

Why is Pivotal running this? Why biodefense?

Our upcoming companion post will go into this question in more detail. In short, we think biorisk is one of the most pressing xrisk sources & we're likely to enter a transitional period soon where risks are increased due to AI capability increases. We expect our AI safety fellowship model to translate well, and the talent gap in biodefense to be so large that even a single cohort matters a lot.

Do I need a bio or biosecurity background?

No! We (and many of the mentors) are definitely looking for people from a wide range of backgrounds. The fellowship is also a place for motivated people with expertise in policy, engineering, economics, and many other adjacent fields. Each mentor has a ‘What I'm looking for in a Mentee’ section on our website. In our experience, a great match with a specific mentor can often matter more than your overall background.

What matters most is that you take catastrophic biological risk seriously, are motivated & self-directed, and are ready to dig into some really tough & novel problems.

How can I help?

If you know great candidates, recommend them and we’ll pay you $1,000 if we accept them based on your recommendation.

If you’re interested to mentor or work with us in the future, fill in this form or reach out at team@pivotal-research.org

If you have access to specific platforms and groups you think would be interested, feel free to spread the word (you can share a short message).



Discuss

We Need Unhobbled Donors

Новости LessWrong.com - 25 мая, 2026 - 09:06

Epistemic status: I work on AI safety communications, policy, and field-building. High confidence in the core claim that donors should be front-loading their giving. Lower confidence on magnitudes, recruitment strategies, and the activities of existing funders.

TL;DR: A large wave of philanthropic capital will enter the field, but it will arrive slowly and unevenly. This means that the neglectedness and tractability of different interventions will dramatically change. The field sorely needs unhobbled donors who can give fast before the wave, and seed the neglected projects megafunders will not.

"A good plan, violently executed now, is better than a perfect plan executed next week."
- George S. Patton


Individual donors and small grantmakers need to radically rethink their priorities, deployment timelines, and risk tolerances.

The world is finally waking up to the coming wave of philanthropic capital. Attention is rightfully shifting towards strategy and talent bottlenecks: what are the needed organizations and interventions, and who will make them happen.

But the days of being constrained by capital aren’t over.

We have no guarantee of how much money will get deployed, by when, or to what. Important, high-variance bets will likely remain unfunded. And capital is not flowing fast enough into the rapid grants needed to prepare.

What this means is those willing to act today are incredibly leveraged. They can fund the projects that will become dramatically more neglected, and seed efforts that will be newly tractable at scale.

This post lays out the need for unhobbled donors: the missing category of funders who are willing to deploy capital before the wave, support early-stage projects, take risky bets, and put their names behind public campaigns.

LeverageDiscount Rate

The discount rate on spending is extremely high. A dollar deployed in 2026 can get you things a dollar in 2028 cannot.

Political windows are closing. The midterms end in a few months. The Trump administration is developing its stance on AI. AI safety-conscious candidates are running for political offices. The story of 2026 will define what people run on in 2028. Money that arrives after these windows close will have a much smaller chance of affecting policy decisions.

Talent pipelines are still developing. Strong field-building programs can recruit smart young people now who are deciding whether to work on capabilities, safety, or policy. Interested founders can enter the space, engage with the threat models, and develop conviction in solutions. Researchers can go through established pipelines to get mentorship and build taste. However, this talent needs time to get its bearings before doing useful direct work. Onboarding new talent in time to contribute will become harder and harder.

Building credibility takes time. Institutional organizations ideally need years to develop track records and earn trust from policymakers, media, or the labs. It is difficult to find late substitutes for this time in fields where credibility is important. Building organizations in general also takes a non-negligible amount of time.

Timelines may be short. Will more capital even be useful in 2028 or 2029? If timelines are short, new funders may simply not be ready to deploy money before crunch time. The cost of existing donors giving too early is small compared to giving late. They are the only ones who can.

First-mover advantages. Agenda-setting is very powerful. It allows you to get more for less. What will advocacy groups be fighting for? Will AI safety remain a nonpartisan issue? What policy paradigms will key stakeholders consider? Leopold Aschenbrenner amplified the race dynamics frame in Situational Awareness. The first people to define the frameworks, paradigms, and words to make sense of the current moment will dictate how everyone else acts. Attention will only become harder to compete for over time.

Comparative Advantages

Importantly, money is not totally fungible. Some kinds of giving can only come from specific kinds of willing donors. If a funder is unhobbled, they can have an outsized impact.

Funding sources affect influence. Watchdog organizations and third-party evaluators need credible distance from the labs and connections to the groups they represent. METR cannot take money from the OpenAI Foundation. However, individual donors can write checks that do not compromise a recipient’s standing. This is harder with lab-adjacent funding exposure. In advocacy, unhobbled donors with diverse political backgrounds can be counterfactually responsible for making new, highly important policy campaigns possible.

c4 dollars are hard to come by. Most existing AI safety money is c3, which means it is tax-deductible and limited in its ability to be used for lobbying. c4 dollars are not tax-deductible and have no such limit. Available c4 capacity is small and valuable. Many donors are international, unwilling to forgo tax deductions, or hesitant about funding political projects. Megafunders will likely not make these grants either. Political action is highly neglected, and hobbled individuals are best positioned to close this funding gap.

Hard dollar donations are capped. Direct contributions to political campaigns are capped at $7000 per donor. This means that the ability to influence campaigns depends largely on donor count. A large number of small donors can have a much larger influence than a few large ones, who are unable to affect change with just check size. 

Public giving has power. Named giving can accomplish things anonymous giving cannot. An unrestricted, confident public donor, who puts their name behind a cause, can signal that it is serious and credible. These named donors can also play a large role in attracting new donors and creating political coalitions. 

Shaping

The grants that donors make now will shape the landscape and determine what gets scaled later. 

Seeding matters more than scaling. If new megafunders will be positioned to massively scale future organizations, current funders should focus more on creating new interventions than scaling. Existing megafunders are reasonably good at writing second and third large checks to proven organizations. However, they can struggle to scope or support newer projects. Seeding projects requires high tolerance for failure and fast decision-making. Small, new funders are best positioned for this work.

Megafunders will scale existing organizations. Early on, megafunders will struggle to develop incubation capacity on their own, and will initially be picking from the list of organizations that already exist. This means that individual donors can be counterfactually responsible for not just projects, but megaprojects that could exist because of their early support. This is an incredible opportunity.

Small funders can explore the option space. If you have uncertainty about which strategies will succeed, the right response is to seed a range of approaches. Small, decorrelated donors are better poised to do this than large funders. These donors can develop conviction about interventions that the market might be undervaluing.

Unhobbled donors can test different strategies. An individual donor provides value to the field because they have a distinct theory of victory and risk appetite. Large funders can concentrate capital into a single worldview, which leaves important bets unfunded. A diverse set of funders produces a diverse landscape of organizations that can hedge against the dominant strategy being incorrect. Even if individual donors defer their giving to donor advisory organizations, a diverse range of advisors can produce a similar effect.

Megafunders

Megafunders are not going to produce the necessary actions on their own.

Existing ones are working to change, but are slowed by bureaucracy and capacity constraints. They are not prepared to front-load their giving on the necessary timescales. New megafunders will arrive with their own constraints.

Existing Funders

The AI safety funding ecosystem is a monopsony. Funding is extremely concentrated in Coefficient Giving and Longview Philanthropy. These dominant funders dictate what the field can and cannot do, but are constrained in idiosyncratic ways and are limited in their ability to specialize.

Concentration is more harmful than helpful. Concentration helps credibility and coordination, prevents duplications, and efficiencies of scale. But it has also caused the neglect of many funding opportunities that are now low-hanging fruit for unhobbled donors. Caution can be justified at megafunder scale: with more money, grift and low-quality projects abound, and downside risks for bad bets can partially poison the well for a broader portfolio. Multiple grantmakers and unhobbled donors with reputational firewalls, specialization, and different social graphs will enable more ambitious action.

Institutional structures slow decision-making. Good projects often wait three to six months for funding decisions. Strong projects with motivated teams stagnate or miss windows of opportunity. The Future of Life Institute is reported to hold several hundred million dollars in endowment and paid out approximately only $30M in 2025.

Passive grantmaking is the default. Requests for Proposals (RFPs) are common, but do not work at scale: the most competent people to start new projects are not usually unemployed and waiting in the wings. Active grantmaking, finding strong founders and persuading them to take on specific work, is becoming more common but still rare. The small number of existing funders also makes it hard to give credible commitments of future funding to ambitious founders, which raises the risk of starting an organization. The cultural shift from passive grantmaker to being capable of attracting founders and developing new networks will take time.

Risk tolerance is low.  Reputational considerations and risk tolerance disqualify the most important opportunities. Funding will flow to well-known organizations like METR and Transluce, but neglected projects will stay neglected. These projects tend to require a higher risk tolerance: public-facing movement building, organizations representing stakeholder groups (e.g. labor, media, religious communities), relationship-building grants to DC think tanks, and interventions in general that have a more nebulous theory of change but high expected value.

Constraints are not legible. Clearly, these grantmakers have institutional constraints and strategic worldviews that make them appropriately cautious about what they support. Not all of these constraints are legible to the rest of the field. To their credit, CG has recognized the need for decorrelation and a diverse funder base. But the full extent of the gap has not been made clear, and new grantmaking organizations and unhobbled donors have not emerged.

Donor preferences are being aggregated. Sometimes, donor advisory organizations like Longview pool individual donors’ money into a single vehicle. This causes each unique donor, with their own theory of victory and preferences, to get flattened into the same averaged-out worldview. The decorrelation these donors could provide, and their ability to fund riskier, neglected projects, gets erased. Other times, this happens implicitly: donors take their cues from the same centralized funders and advisors, converging on a similar worldview and risk tolerance as megafunders.

The existing megafunders deserve lots of credit, but are (currently) failing to meet the moment.

New Megafunders

New megafunders will face their own constraints. The wave will arrive slower than expected, and not necessarily in the shape the field needs.

The wave will be slower than expected. Some have estimated that hundreds of billions of dollars in philanthropic capital are about to become liquid, largely from AI wealth. But both major funding sources are gated. The OpenAI Foundation has committed $25 billion (~10% of the Foundation’s value), but over an unspecified time frame. The framing of the Foundation as “the largest long-term beneficiary” of the for-profit's growth also suggests an endowment-style approach rather than a serious intention of trying to spend down capital in the years that matter. The Anthropic IPO is even further out: while a tender offer recently occurred, an IPO has not been announced, and post-IPO lockups will force employees to wait months before they can liquidate afterwards. Barring new ultra high-net worth individuals entering the space, the capital might come late, potentially too late.

New megafunders and donors are constrained. OpenAI Foundation faces serious optics constraints and has a similar governing board to the for-profit. Its pillars span a wide enough range that the pillars that are easier to spend on (life science, community programs) might absorb funding first. Work that materially affects OpenAI’s positioning will likely be implicitly off-limits. Anthropic employees, though they have the potential to become unhobbled donors, will likely have their own quirks. Many will be busy and want to delegate their philanthropic thinking entirely to trusted advisors. Some will avoid risky bets or political work that they perceive as outside the Overton window, opting to fund non-AI causes instead. Many will route through pooled vehicles for convenience, recreating the preference aggregation problem that incumbents suffer from.

Building infrastructure takes time. Even new megafunders that want to move quickly will not initially have the operational capacity to do so. Scoping new organizations, incubating founders, hiring staff, and massively scaling interventions all require institutional knowledge and processes. Creating that infrastructure takes time. Whether or not these megafunders will ultimately be successful depends in part on if existing funders and unhobbled donors can rise to the challenge to support them.

Unhobbled Donors

The capital has not materialized yet. Even when it does, the most important projects will remain unfunded. Bold, unhobbled donors are needed to close those gaps.

These donors can have orders of magnitude more impact than large grantmakers. They can fund the most neglected projects that no one else can. They can move quickly on time-sensitive work or on infrastructure that needs years to mature. By seeding new projects early, they shape what organizations megafunders eventually scale.

It has never been a better time to be an individual donor with conviction.

What unhobbled donors do

They deploy fast and make grants directly, not through pooled intermediaries. They are laser-focused on impact rather than legibility or reputational comfort, taking the bets megafunders structurally cannot. They give c4, accepting the loss of the tax deduction. They put their names on public campaigns, engage with the media, evangelize the cause, and actively recruit new donors. They have a theory of victory, a causal story for how their grants will help the future go well, and aggressively front-load their giving to support it.

Recruiting 

As the issue gets more salient, we might be on track to get more unhobbled donors by default. But given the importance of speed, we must be more proactive.

There are broadly three ways to close the gap: get existing megafunders to act bolder, activate more giving from individual donors already in the community, or recruit new donors entirely. Each is difficult.

Existing megafunders are trying, but are unlikely to become dramatically bold enough to completely solve the problem.

The most overlooked constituency is existing donors, especially those taking modest risks, splitting capital between causes, and deferring to advisors. We need to make the case to willing individuals that this moment warrants more aggressive deployment.

Recruiting new donors entirely is the most leveraged and the most difficult. The largest pools of recent wealth are mostly implicated in the problem they would be funding to address. People in general do not give, and right-of-center wealth, which would be especially useful for cross-partisan AI policy work, gives least.

The most viable candidates outside the implicated pool are scattered: billionaires and centimillionaires who are becoming concerned about AI, public figures with platforms, founders in adjacent industries. These donors will need to be found, persuaded, and supported by trusted advisors over months or years.

Actions

Existing funders should make demand more legible. As funders attempt to scale and front-load their giving, they should be more transparent about what they will and will not fund. By making their constraints legible publicly, they can more clearly communicate with potential donors about where they can be most impactful. Megafunders should also explore mechanisms to reward upstream funders, such as offering rebates to the previous funders of projects that they decide to scale.

Individual donors should spend differently. Look to front-load giving. Donor swaps allow donors to give to AI safety now in exchange for later commitments to other causes, or the reverse. Anthropic employees can take out loans against future giving. Regranting is a powerful tool for both small and large donors. Platforms like Manifund expose donors to new projects and allow for quick redeployment. Donors should resist organizations or platforms that aggregate their preferences.

The field should build donor advising infrastructure. Most donors who could become unhobbled are not ready to make complex grants on their own, and the field has not built the infrastructure to support them. This is a donor product design problem. New donors need clear default options, trusted advisors who can match them to opportunities without aggregating their preferences into one fund, and pathways into the field that do not require months of learning the landscape. Donors with higher risk tolerances and openness to neglected fields like politics should be carefully advised, and the field should coordinate to effectively allocate their capital.

If you can unhobble yourself, do it. Being unhobbled means giving up the things that donors are usually reasonable to want (reputational cover, tax deduction, confidence and institutional credibility). These are very real costs. But these are not normal times. Are the costs unhobbled giving could possibly have worth more than the direct impact? The dysfunction in the current landscape means there is enormous impact on the table for the ambitious philanthropists who are bold enough to take it. Stepping up to give ambitiously is a true service, and a sacrifice. It’s also exciting and energizing. Between now and 2028, the strategic playing field will be set. Why not shape it?



Discuss

Taxing Small Cars To Improve MPG

Новости LessWrong.com - 25 мая, 2026 - 00:50

Cars and trucks are getting bigger, and I had a vague sense that fuel economy regulations were partly to blame. Looking into it, it's hard to say how much is regulations vs people wanting to buy vehicles that look rugged, but the regulations really aren't helping.

This chart is the core of it:

This is what manufacturers were looking at when they decided to build today's cars. To figure out the target fuel economy for a vehicle you first calculate its "footprint", which is the area between the wheels. On our 2013 Honda Fit that's 4.8ft side-to-side and 8.2ft front-to-back, for a footprint of 39sqft. Then you ask if it's a car or truck. This tells you which curve to use, and where along it to look.

Looking at the chart we can now see why it's hard for Honda to sell a Fit today. The best Honda could do for a five-seater non-hybrid hatchback is maybe a CAFE rating of 44mpg. [1] This puts them 23mpg short, and if Honda was a one-model car company they'd expect to owe $3,910/vehicle in fines: $17 per 0.1mpg shortfall. Since the regulation is about an average across all the cars they sell the actual effect is both lower and more complex, and maybe something like $2k.

Aside: the fine structure here is a sad artifact of us thinking in miles-per-gallon instead of gallons-per-mile. Going from 25mpg (0.04 gpm) to 50mpg (0.02 gpm) saves as much gas as going from 50mpg (0.02 gpm) to infinite (0 gpm). But the penalty for being below a target is calculated on the gap in miles-per-gallon and not gallons-per-mile. If you miss a 50mpg (0.02gpm) target by hitting 25mpg (0.04gpm), or miss a 75mpg (0.013gpm) target by hitting 50mpg (0.02gpm), you pay the same fine even though the first involves burning much more counterfactual gas: over 10,000 miles the first burns 200 gallons more than its target while the second only burns 67 more.

What did Honda do? They discontinued the Fit, and replaced it with the HR-V. It's bigger and heavier, and looks like it was trying to be a "light truck". Combined with its larger footprint that would give a much lower target: 49mpg instead of 67mpg. It still doesn't hit that, but it's less of a penalty. And then it doesn't actually count as a light truck, though I don't know if that was the plan from the beginning or a compromise they had to accept.

Overall, this regulatory structure taxes manufacturers more for making small low vehicles, the kind that are easiest to make fuel efficient. Here's where I would write that this is counterproductive and we should stop, except we sort of already did. In 2025 the penalty for non-compliance was set to $0 as part of the OBBBA. This means in some sense manufacturers are free to make small cars and trucks with achievable mileage. Except the rest of the structure is still there, complete with the distorted incentives, and ready to be reinstated by a future government.

If at some point there's political will to improve this situation, and a carbon tax remains off the table, I'd like to see a return to the simpler Ford-era system where targets didn't scale with vehicle size. But then I'd need to understand why they switched to this system (if it's crash safety we should legislate that directly) and it's not clear that continued regulatory whiplash is worth it.


[1] The closest to 67mpg would be something like the first-gen Honda Insight. This got very close, but seating only two people with a lightweight construction that would do very poorly in modern crash testing. If you're willing to make it a hybrid, which does add significant cost, it is possible: the the Jazz e:HEV (essentially a hybrid Fourth-generation Fit) would probably come in around 72mpg.

Comment via: facebook, mastodon, bluesky



Discuss

A (Slightly) Mechanistic Theory for Exponentially Increasing AI Time Horizons?

Новости LessWrong.com - 24 мая, 2026 - 18:52

AI ‘time horizons’ are mostly not about time (I think it’s mostly ‘data’, but you’ll see where I’m unsure).

One chart from 2025 has become perhaps the most (in)famous in modern AI commentary.

For those in the know, ‘the METR graph[1] is unusually compelling because it achieves what so few measures of AI progress have achieved: a somewhat meaningful Y axis (‘time horizon’[2]) as well as a somewhat predictable trend over time! (This is remarkably rare!)

Frustratingly, the only superficially available takeaway is something like, ‘the line goes up straight-ish over time’. This is better than nothing, but it’s very dissatisfactory from the point of view of getting confidence in the predictions, because it exposes no deeper mechanism. This drives a lot of confusion and argument about the implications.

A deeper mechanism would be good for two reasons:

  • It enables a sanity check on the trend, perhaps enabling more confidence in its predictions than we would sensibly allow with only the surface understanding.
  • It gives some way to interrogate when and how the trend might change (because if the deeper mechanism gets deflected, the superficial projection would be broken, but a prediction based on the deeper mechanism might stay viable for longer).
    • (A sub-reason: if we want the trend to change, knowing some more mechanism might shed light on some levers to pull rather than sitting around to wait and see.)

As an analogy, a similarly superficial trend, Moore’s Law, can be a little better mechanistically explained by the more general Wright’s Law[3]. This is great, because that law covers more cases, and it can handle some deflection from the trend, or give some idea of when (and under what conditions) the trend might break. Important when looking at plausible futures, and how to steer toward desirable ones!

Attempting to find some mechanism in the METR graphTask ‘length’ and success modelling

Why did METR focus on ‘task length’?

First, it’s not how long the AI agent takes. It’s how long the task in question takes a panel of sampled human experts, on average[4]. So in their ‘time horizon’ measurements, METR is capturing the effective hours of human-expert-equivalent activity that AI agents can carry out.[5]

One way to think about the time it takes human experts to complete a task is that, for each subtask they had to know how to do (or be able to figure out how to do) and then successfully execute, the overall task takes incrementally longer. By how much? That depends on exactly what ‘subtasks’ we're imagining breaking things down into.[6] But on average longer tasks correspond to more distinct challenges, all else equal.[7]

A random generation of tasks (rows) with ‘subtasks’ as segments, sorted by subtask count from least to most. You can see that the more subtasks, the longer, on average. It’s a little ragged — not all subtasks are the same length, so occasionally fewer, longer subtasks add up to more overall time than more, shorter subtasks. What METR can easily measure is the overall duration. Even if the subtask division is somewhat subjectively defined, duration stands as a reasonable proxy for it. Note that the vertical subtask count axis is sorted but not uniformly spaced. (Created with claude.ai.)

This is the first piece of mechanism we should take into account. ‘Time’ is not agent time: it's a noisy estimate for ‘number of somewhat challenging requirements necessary to complete the task’.[8]

This is treating overall tasks as formed by something like drawing ‘subtasks’ out of a large collection of possible requirements. Given the agent’s general competence, specific knowledge, tools available, and opportunity to retry or learn on the fly, sometimes the agent can meet these requirements. Other times it can’t.[9] ‘Longer’ tasks simply draw more subtasks (that’s why they’re ‘longer’, in this model: expert humans had more subtasks they needed to carry out).[10]

Toby Ord demonstrates one way to take this intuition further, noting that if we explicitly model overall success mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; text-align: left; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-mi { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-mn { display: inline-block; text-align: left; } mjx-msup { display: inline-block; text-align: left; } mjx-msub { display: inline-block; text-align: left; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-mfrac { display: inline-block; text-align: left; } mjx-frac { display: inline-block; vertical-align: 0.17em; padding: 0 .22em; } mjx-frac[type="d"] { vertical-align: .04em; } mjx-frac[delims] { padding: 0 .1em; } mjx-frac[atop] { padding: 0 .12em; } mjx-frac[atop][delims] { padding: 0; } mjx-dtable { display: inline-table; width: 100%; } mjx-dtable > * { font-size: 2000%; } mjx-dbox { display: block; font-size: 5%; } mjx-num { display: block; text-align: center; } mjx-den { display: block; text-align: center; } mjx-mfrac[bevelled] > mjx-num { display: inline-block; } mjx-mfrac[bevelled] > mjx-den { display: inline-block; } mjx-den[align="right"], mjx-num[align="right"] { text-align: right; } mjx-den[align="left"], mjx-num[align="left"] { text-align: left; } mjx-nstrut { display: inline-block; height: .054em; width: 0; vertical-align: -.054em; } mjx-nstrut[type="d"] { height: .217em; vertical-align: -.217em; } mjx-dstrut { display: inline-block; height: .505em; width: 0; } mjx-dstrut[type="d"] { height: .726em; } mjx-line { display: block; box-sizing: border-box; min-height: 1px; height: .06em; border-top: .06em solid; margin: .06em -.1em; overflow: hidden; } mjx-line[type="d"] { margin: .18em -.1em; } mjx-mrow { display: inline-block; text-align: left; } mjx-mtable { display: inline-block; text-align: center; vertical-align: .25em; position: relative; box-sizing: border-box; border-spacing: 0; border-collapse: collapse; } mjx-mstyle[size="s"] mjx-mtable { vertical-align: .354em; } mjx-labels { position: absolute; left: 0; top: 0; } mjx-table { display: inline-block; vertical-align: -.5ex; box-sizing: border-box; } mjx-table > mjx-itable { vertical-align: middle; text-align: left; box-sizing: border-box; } mjx-labels > mjx-itable { position: absolute; top: 0; } mjx-mtable[justify="left"] { text-align: left; } mjx-mtable[justify="right"] { text-align: right; } mjx-mtable[justify="left"][side="left"] { padding-right: 0 ! important; } mjx-mtable[justify="left"][side="right"] { padding-left: 0 ! important; } mjx-mtable[justify="right"][side="left"] { padding-right: 0 ! important; } mjx-mtable[justify="right"][side="right"] { padding-left: 0 ! important; } mjx-mtable[align] { vertical-align: baseline; } mjx-mtable[align="top"] > mjx-table { vertical-align: top; } mjx-mtable[align="bottom"] > mjx-table { vertical-align: bottom; } mjx-mtable[side="right"] mjx-labels { min-width: 100%; } mjx-mtr { display: table-row; text-align: left; } mjx-mtr[rowalign="top"] > mjx-mtd { vertical-align: top; } mjx-mtr[rowalign="center"] > mjx-mtd { vertical-align: middle; } mjx-mtr[rowalign="bottom"] > mjx-mtd { vertical-align: bottom; } mjx-mtr[rowalign="baseline"] > mjx-mtd { vertical-align: baseline; } mjx-mtr[rowalign="axis"] > mjx-mtd { vertical-align: .25em; } mjx-mtd { display: table-cell; text-align: center; padding: .215em .4em; } mjx-mtd:first-child { padding-left: 0; } mjx-mtd:last-child { padding-right: 0; } mjx-mtable > * > mjx-itable > *:first-child > mjx-mtd { padding-top: 0; } mjx-mtable > * > mjx-itable > *:last-child > mjx-mtd { padding-bottom: 0; } mjx-tstrut { display: inline-block; height: 1em; vertical-align: -.25em; } mjx-labels[align="left"] > mjx-mtr > mjx-mtd { text-align: left; } mjx-labels[align="right"] > mjx-mtr > mjx-mtd { text-align: right; } mjx-mtd[extra] { padding: 0; } mjx-mtd[rowalign="top"] { vertical-align: top; } mjx-mtd[rowalign="center"] { vertical-align: middle; } mjx-mtd[rowalign="bottom"] { vertical-align: bottom; } mjx-mtd[rowalign="baseline"] { vertical-align: baseline; } mjx-mtd[rowalign="axis"] { vertical-align: .25em; } mjx-msubsup { display: inline-block; text-align: left; } mjx-script { display: inline-block; padding-right: .05em; padding-left: .033em; } mjx-script > mjx-spacer { display: block; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } mjx-c.mjx-c1D446.TEX-I::before { padding: 0.705em 0.645em 0.022em 0; content: "S"; } mjx-c.mjx-c1D461.TEX-I::before { padding: 0.626em 0.361em 0.011em 0; content: "t"; } mjx-c.mjx-c1D443.TEX-I::before { padding: 0.683em 0.751em 0 0; content: "P"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c31::before { padding: 0.666em 0.5em 0 0; content: "1"; } mjx-c.mjx-c2212::before { padding: 0.583em 0.778em 0.082em 0; content: "\2212"; } mjx-c.mjx-c1D437.TEX-I::before { padding: 0.683em 0.828em 0 0; content: "D"; } mjx-c.mjx-c2F::before { padding: 0.75em 0.5em 0.25em 0; content: "/"; } mjx-c.mjx-c32::before { padding: 0.666em 0.5em 0 0; content: "2"; } mjx-c.mjx-c1D6FC.TEX-I::before { padding: 0.442em 0.64em 0.011em 0; content: "\3B1"; } mjx-c.mjx-c1D6FD.TEX-I::before { padding: 0.705em 0.566em 0.194em 0; content: "\3B2"; } mjx-c.mjx-c6C::before { padding: 0.694em 0.278em 0 0; content: "l"; } mjx-c.mjx-c6F::before { padding: 0.448em 0.5em 0.01em 0; content: "o"; } mjx-c.mjx-c67::before { padding: 0.453em 0.5em 0.206em 0; content: "g"; } mjx-c.mjx-c2061::before { padding: 0 0 0 0; content: ""; } mjx-c.mjx-c221D::before { padding: 0.442em 0.778em 0.011em 0; content: "\221D"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c74::before { padding: 0.615em 0.389em 0.01em 0; content: "t"; } mjx-c.mjx-c72::before { padding: 0.442em 0.392em 0 0; content: "r"; } mjx-c.mjx-c61::before { padding: 0.448em 0.5em 0.011em 0; content: "a"; } mjx-c.mjx-c69::before { padding: 0.669em 0.278em 0 0; content: "i"; } mjx-c.mjx-c6E::before { padding: 0.442em 0.556em 0 0; content: "n"; } mjx-c.mjx-c1D6FE.TEX-I::before { padding: 0.441em 0.543em 0.216em 0; content: "\3B3"; } mjx-c.mjx-c28.TEX-S1::before { padding: 0.85em 0.458em 0.349em 0; content: "("; } mjx-c.mjx-c29.TEX-S1::before { padding: 0.85em 0.458em 0.349em 0; content: ")"; } mjx-c.mjx-c34::before { padding: 0.677em 0.5em 0 0; content: "4"; } according to a simple model where chance of failure compounds with task ‘length’, , we get a reasonable fit for the data METR collected. (Interestingly Toby mainly seems to continue treating this as ‘agent time’. I’ll instead take as given that we’re talking about a proxy for number of subtasks.)

In other words, for a given AI agent and task domain, there's something like a ‘hazard rate’, (per-subtask probability of failure), which reasonably well summarises (and predicts) the AI's level of success in that domain:

(i.e. to succeed at a -step task, the agent must not fail — must avoid the ‘hazard’ — times.)

This enables us to translate back and forth between an estimate of this hazard rate and an estimate of a ‘half-life’ or 50% success horizon — how ‘long’ (i.e. complex) a task needs to be before the agent fails more often than not — and also to extrapolate to ‘durations’ corresponding to other reliability levels, like 99% or 99.9%[11].

In this formulation, the hazard rate, , stands in for what fraction of our ‘subtask’ pool the agent can’t (yet) succeed at, which ends up being a reasonable summary of the agent’s competence in this domain.[12]

This time, we’re looking at overall task success as if the agent has a 98% chance of meeting any particular subtask’s requirements. Sometimes a shorter task will happen to have one of the difficult subtasks — but usually they’re overall successful. As tasks get longer, there’s a greater chance that at least one subtask requirement is insurmountable at this reliability level. Among longer tasks, overall success becomes fewer and farther between. This agent can’t expect to often succeed on tasks longer than 50 or so subtasks.

If you have a new task, you don’t know if the agent has all it needs to complete it. But the task ‘length’ is an indicator of how many tricky subtasks it has, and similar-lengthed tasks will have similar numbers of such subtasks — so their average success rate is a good estimate for how likely the agent is to succeed at this new task.

Relating hazard rate with frontier AI development

METR's graph is compelling because it suggests a steadily increasing frontier of success horizon as AI developers produce new agents over time.

What does this imply if we interrogate our hazard rate model? Well, 'half-life' (and indeed various success-level horizons) is observed apparently growing exponentially with date :

This is the central striking takeaway from the METR graph (modulo their measurement uncertainty). Half-life go up!

But half-life according to our model has:

where is the per-step hazard rate from before. When this is not too close to 1, that half-life is, fairly intuitively, approximately proportional to the reciprocal of the hazard rate:

So METR's observation of rising time horizons is equivalent to saying that the frontier hazard rate is shrinking exponentially over time.

Recall that this hazard rate corresponds with the fraction of ‘subtasks’ in a domain that an agent doesn’t yet know how to complete. So this fraction is presumed to shrink roughly exponentially with date, in turn driving the observed ‘longer’ success horizons.

Why does hazard rate shrink with date?

Here’s where to look for the next bit of mechanism. Why would the hazard rate, the fraction of ‘subtasks’ which remain out of reach, shrink in that way?

It goes without saying that AI developers are chasing after increasing competence in their products, so (if they are doing anything at all right!) the direction of movement is unsurprising. Why that particular roughly-exponential form, though?

I confess here I’m uncertain and the quest for more mechanism continues.

My best guess is that it’s about the effective evidence available to the agent toward subtask solution strategy. Intuitively, if you’ve seen very similar subtasks many times before, it’s hard to go too wrong. If you’ve only seen vaguely similar subtasks once or twice, you’re in much less familiar territory and stand a good chance of stalling. Suggestively, effective evidence and training data are both information-like quantities, but I don’t want to make too much of that without a crisper connection. Formally, we could consider how many bits of evidence the agent can muster about how to proceed (either from past learning or by exploring in context).

In other words, training produces learnings. These range from broad, generally-applicable heuristics for adaptable, effective behaviour (experiment, test your work, notice when something surprising happens, read the manual if you can find one, accrue power and resources at any opportunity, ...), to narrow specific details about particular situations and activities (Earth's radius is roughly 6.4 megameters, detonating TNT yields roughly 4.2 kJ/g, humans succumb to oxygen deprivation after around 5 minutes, …). Ahem.

Empirically, AI developers have historically poured something like exponentially increasing ‘quantities’ of ‘data’ into their machine learning pipelines.[13] Mathematically, that implies a power law: data inputs rising at one exponential rate, matched by hazard rate decaying at another exponential rate.

Power laws aren’t deeply mechanically explanatory, but they’re often the best we have in machine learning, and are at least more predictable than mere date-based trends. Under the simple subtask model described here, this power law translates directly into a power law between ‘time horizon’ and data. This is actually the same level of explanatory improvement offered by Wright’s Law over Moore’s: not fully mechanistic, but an extra layer of detail which offers firmer purchase on what’s going on.

What this doesn’t straightforwardly account for is the benefit to success rates of increased in-context reasoning, which is exhibited according to METR’s estimates. I expect this is operating on those borderline subtasks — where the agent would have some slim chance of satisfying them if it ‘rushed’. In those cases, ‘thinking harder’ may more effectively recall and combine the relevant learned knowledge, and allow better choices for exploratory discovery in situ. In any case, changing the thinking budget of an otherwise similar existing system certainly calls for a more mechanistic understanding than mere date-based trend extrapolation!

I would be thrilled if someone with more smarts, time to experiment, and access to data were to dig into ways we could match up various AI production inputs (especially ‘data’ in various forms) with observed outputs like ‘time horizon’. One of the more difficult pieces might be quantifying ‘data’, especially teasing apart what types of evidence are ‘relevant’ for the domain and tasks at hand.

Upshot

The kind-of-boring upshot of this is that data and ‘practice’ on related tasks makes AI better at those tasks! This is boring because, well obviously!, we already basically knew that. But it’s encouraging because we can say a little more than that, which gives us some better grasp on what’s driving ‘time horizon’ progress in particular domains — and it can help get more precise about predictions.

The fact that the ‘subtask’ model — with a ‘hazard rate’ of subtasks currently out of reach — is a fairly explanatory fit for capability profiles of individual agents is evidence that there’re not unusual amounts of generalisation capability in AI. As with humans, they can extrapolate a bit, but need ‘experience’ and examples to succeed.[14] Importantly, this means that vast in silico training ranges for software, cyber, and mathematics very likely won’t transfer much to other domains of interest, like interpersonal intelligence, medical discovery, bioweapons development, intelligence analysis, and robotic manipulation. Of course, like with every domain of human experience and activity, we have some relevantly-similar data already collected, and schemes can be devised to more rapidly expand that digitised experience bank for AI to learn from. Increasing adoption of AI in task-integrated contexts, industrial deployment, and even explicit approaches to gathering example data such as ‘hand movement farming’ are the leading indicators to watch for progress in particular domains — not just the headline benchmark metrics in software-like tasks.

For some types of activity, developers are probably ‘running out’ of raw example data to scrape from the internet. The era of mostly-pretraining is over. For domains which can be relatively easily verified, like mathematics and coding, this is very surmountable — you can just run drills galore on a computer and get data that way. But this costs extra compute and doesn’t scale at the same exponential rate for long (perhaps 10x/year presently). As soon as this year, developers could be back to ‘only’ scaling compute around 4x per year (and a bit after that they might have bought most of the compute! — and will only be able to scale at the positively sloth-like 1.5x-ish a year of underlying hardware progress). I don’t feel confident extrapolating exactly where that cashes out, but if the data-driven subtask-learning model is right, it would imply we should see less steepness to the time horizon growth quite soon.[15]

Some commentaries project that, once AI can autonomously do software and machine learning work reliably, it will thereafter enter a ‘recursive self-improvement’ phase and rapidly colonise all capabilities. I don’t think this is missing the point entirely: there will be modest multipliers on the speed of the AI development pipeline, and we might see an ‘explosion’ in the speed and cost-effectiveness of AI (because they are among the most immediately-verifiable properties to iterate on). But generalisation doesn’t come for free, so on-task data and compute will remain crucial to broadening the frontier of autonomous capabilities. Collecting that data and manufacturing that compute look to me like the rate-limiting steps, and therefore the major leading indicators to use in foresight. The best case I can make for a much more general explosion is if the speed and cost-effectiveness explosions rapidly accelerate the gathering and digestion of diverse task data — but I think that remains mostly rate-limited in the familiar ways: some domains easy and some more difficult. Don’t mistake me for ruling out across-the-board AI capability! Companies are charging ahead with data collection and set on automating much of their AI production pipeline. It just won’t happen overnight.

Thanks to Coz Ududec for a conversation prompting me to think about this.

  1. ^

    Produced by AI monitoring non-profit METR

  2. ^

    Very importantly, it’s measured within a particular collection of challenges/tasks which are mostly associated with software development, especially ML engineering. METR also has a great preliminary study of some other domains, finding differing, but perhaps also somewhat predictable trends.

  3. ^

    Moore’s Law is the very superficial observation that, over time, the number of transistors per chip doubles roughly every two years. (More recently, it’s been more clearly expressed as the price per transistor halving every year-or-two.)

    Wright’s Law is the slightly more mechanistic and general observation that production of many commodities follows ‘learning curves’, such that each doubling of cumulative production produces roughly similar relative cost savings. (We can in turn attempt to explain this in yet more mechanistic terms, pointing to the insight gained from observing and recording many trials and experiments, with suitably diminishing returns.)

    Now, if the quantity demanded and produced grows exponentially over time (as it has for computer chips), then Wright’s Law predicts comparable cost savings each year: Moore’s Law. If the quantity produced grows (or shrinks) in some other pattern over time, Wright’s Law, by accounting for this mechanistic detail, can often forecast cost trends more reliably than Moore’s.

  4. ^

    Also note that the estimation of ‘task length’ according to human experts was quite crude (naturally, humans are the most expensive part of most experiments!), and there are good reasons to treat the reported error bars as much too narrow, i.e. misleadingly confident. I’ll use quotes around ‘time’ related quantities in this post as a reminder that it’s a loose estimate of a crudely human-performer-derived time-to-completion for tasks, and doesn’t correspond well to real time as such.

  5. ^

    I don’t know if METR publishes how long the agents themselves take at these tasks — I don’t think so, and it’d arguably be ill-defined anyway since it would depend in part on how fast a computer you ran the agent on.

  6. ^

    If we conceptually carve up subtasks into smaller pieces, they'll be quicker per piece, but there are commensurably more of them, and vice versa.

  7. ^

    This could come apart if longer tasks are systematically more likely to include repetitive similar activities rather than a series of distinct ones, for example. Or longer tasks might tend to admit more truly alternative pathways. Both these effects could make longer tasks slightly easier than the naive picture. There are also higher-level ‘orchestration’ tasks i.e. coherently coming up with (and executing and adapting) an appropriate sequential plan: perhaps these might be systematically more difficult for longer tasks.

  8. ^

    Notably, agents sometimes take a (relatively) longer time to do something that’s quicker for humans, and vice versa.

  9. ^

    Incidentally, success (or not) here already accounts for the agent attempting and re-attempting steps or fixing earlier mistakes, which might take variable amounts of time: another reason not to treat this as agent time. Some subtasks might be intermediate and succeed sometimes (for example if the agent can’t easily choose the best approach but sometimes hits on the right one, or sometimes gets stuck in a terminal cycle but sometimes makes lucky progress.)

  10. ^

    This is throwing away some detail: obviously not all subtasks are equally likely to follow from each other! There’s some correspondence between on-task sequences. But within a particular domain (like software engineering), this naive model of overall tasks combining subtasks somewhat randomly seems to do OK.

  11. ^

    By the way, the rule of 72 provides a really quick mental approximation for the higher-reliability ‘time’ horizons, depending on the ‘half-life’ (the 50% ‘time’ horizon).

    Divide the ‘half-life’ by 72. That’s the 1% failure horizon (equivalent to the 99% success horizon). Multiply by your target failure rate in percent, and you’re done: that’s your target success ‘time’ horizon. E.g. if ‘half-life’ is 1h, the ‘time’ horizon at 99.9% is (1h/72)*(0.1) i.e. 5 seconds.

    (This also reveals that cutting the ‘time’ horizon tenfold cuts the average failure rate tenfold and so on.)

    Going the other way, estimating long-horizon success rates, divide your target horizon by the ‘half-life’. That’s how many halvings of success to expect: raise one half to that power for your success rate. E.g. if ‘half-life’ is 1h, your 24h success rate is i.e. one in sixteen million.

  12. ^

    It didn’t have to be that way! A single number which manages to explain a lot of variation in agent capability is very suggestive of an underlying mechanism something like the ‘fraction of subtasks’ model I’ve described here. Of course there is still some residual uncertainty and there may be better summaries available with a more detailed model or epicycles on this one.

  13. ^

    This may recently be trickier to measure as training pipelines have adapted to incorporate more reinforcement learning, which means these experience data are less ‘homogeneously slurped up from the internet’ and increasingly ‘proactively curated from in-domain training curricula’. So the mere quantity of data isn’t like-for-like over time.

  14. ^

    In fact contemporary AI is perhaps substantially less good at generalisation than humans, though I’d like to be better informed about how factors like sample efficiency of AI learning (including in-context learning) stack up.

  15. ^

    Actually saying something so bearish about AI makes me nervous, as there is a venerable history of people boldly declaring AI is about to hit a wall! But I think it’s borne out. I’m not saying progress stops, I’m saying it probably gets slower (in exponential terms).



Discuss

Neurogastronomic Phenomenology for Advanced Beginners, Applied and Pure

Новости LessWrong.com - 24 мая, 2026 - 16:56

(This one's a double-header on the tightly-linked senses of smell and taste, especially pertaining to foodcraft; it comprises both The Space of Olfaction is δ-Hyperbolic and A Partial Theory of Flavor Pairing in Foodcraft. You could read them in either order. I've chosen to put the more widely-appealing one about food phenomenology first and the less polished and more abstract and speculative one about olfaction second. Dedicated to SR-S and SS on the occasion of their marriage, and to the whole S family, who has already begun to benefit from this theoretical depth. Enjoy the sixspice buns!)

(Part 1: Theory of Foodcraft. Epistemic status: only partially worked out, lots of handwaving, still not something I've seen talked much about explicitly anywhere.)

(With thanks to @johnswentworth, @Morphism, @WhatsTrueKittycat, and MR and RG of Mox, among others, who all asked for this. If you asked me for this out of the list, it's also for you.)

Food and drinks have flavors [citation needed]. In fact, they have lots of flavors - careful tasting of an ordinary bottled barbecue sauce presents sweetness and tartness and savoriness, and beneath those, tomato and molasses, and beneath those - if you get that far - mustard seed and paprika and onion powder and "some kind of fish sauce???". (It's Worcestershire sauce.) Some flavors blend nicely, like onion and garlic, while others clash, like onion and pineapple. But then some very different flavors pair just fine, like apples and cinnamon, or vanilla and nearly anything you'd find in a dessert. And even onion and pineapple go together just fine in the greater context of a salsa, or even a pizza! So what's going on?

Here's a stab at explaining why. I'll use "food" as a term of art to mean anything intended to be eaten and enjoyed. A food flavor is comprised of two major parts: its tastes (sweet, salty, spicy, all the basic and chemosensory types) and its flavors (individual odorants, mostly associated with specific ingredients like cumin, tomato, or beef). On top of that, we have things like its context (what's the nature of the larger mixture? is it a dessert? a stew?), its temperature, and the relative concentration of flavors, and to a lesser extent modifiers like how cooked it is (caramelized, raw, normally cooked as "blurs out and turns up the gain" on flavors), what solvent it's in (water, alcohol, fat), the physical properties of the substrate (is it crunchy? soft? liquid?), and what expectations you have when tasting the food.

On my model, a combination of flavors tastes at least OK if at least one of three things is true, and generally better with more of them satisfied. The combination can call back to a known tasty food, it can have satisfying blending with no bad clashes, and it can have interesting bistable contrasts with indepdently good-but-maybe-overstrong components.

The first of these, the Rule of Familarity, is the simplest to explain. A food will probably taste OK if the flavors in it match closely to the major notes of a known and beloved dish and the presentation of the food isn't too terribly different. This is the operating principle behind any fussy "deconstructed" food: you take the components of a dish and permute or alter their order or presentation while leaving the basic notes intact, as well as the general presentation. Maybe you also really sell the phenomological binding by adding some additional element that would classically go with the dish, just to control expectations a little. Take the example of a deconstructed apple pie. Turn the apple filling to a reduction sauce and swap crust for an artful bed of crumbs. Make very sure that the apple sauce has a bit of molasses and cinnamon in it, maybe some other pie spices. Apple pie's easy for presentation, since it's served both hot and cold, but if you really want to sell the effect, serve it with the sauce piping hot with a scoop of vanilla ice cream alongside. It'll be... fine. Sell it for 30 bucks a plate. (Give me a proper slice of pie any day, though.) Almost every food can be done up this way - dishes have major ingredients that people will expect, expected form factors or temperatures to serve them at, and expected roles that need to be filled. Match those well enough and you probably end up with something good.

The second, the Rule of Harmony, is a little trickier: there is a need for satisfying blending. A food will probably taste OK if any given pair of flavors in it blend satisfyingly, and no pair of ingredients clashes. On this model, a pair of flavors blends well if they share flavor chemicals; the more, the better the blend. Thus: onions and garlic, vanilla and most (but not all!) sweet flavors, and meat with anything savory, like tomato. This also helps explain what's going on with spices: they're almost pure flavor, and frequently contain flavorant chemicals that they share in common with ingredients. This in particular is why vanilla and chocolate see such wide appeal, and why rose was once the standard for desserts before vanilla: their chemical makeups are exceedingly complex and multifaceted, having some degree of commonality with a wide variety of different ingredients. This is also why oak is used for wine casks: it too contains vanillin. Alternately, we might contrast the sense of "blending" here with "masking", where "masking" should be taken to mean an attempt to force a fit by moving as far as possible to one extreme of the bistable spectrum; this rarely works out well. For instance, certain intoxicatingly herbed pastries frequently contain lots of chocolate in a doomed attempt to mask the taste. Better to work with the terrain rather than against it, to blend the flavor in instead - from personal experience I can recommend a nice quiche Florentine, whose heavy spinach component blends much better with bitter green flavors, especially alongside the base of pleasantly sulfurous Gruyere cheese and eggs, with onions, garlic, and perhaps some bacon all indicated.

Lastly, the Rule of Interesting Contrasts. A food will probably taste OK if it has some kind of interesting bistable contrast in it and it's made of individually good-but-maybe-overstrong components - and, again, no bad clashes. By bistable here I refer to an effect that can be achieved straightforwardly with careful balances of pairs of distinct tastes or even flavors, where we note that (e.g.) with a combination of salty and sweet, at one extreme the mix is just salty, at the other it's just sweet, but at some point in between, finely graded concentration differences and habituation effects give rise to the sensation of a taste that seems to flip back and forth between the two components. This is the operating principle behind trail mix: people generally like some subset of dried fruit, chocolate, beef jerky, cheese crackers, assorted nuts, and the like. (It's also the operating principle behind any cursed combination of foods that's "surprisingly good".) Each of them hits all the expected marks for being individually enjoyable - one or more of salt, sugar, and fat; individually enjoyable flavors; pleasant texture and form factor; all that good stuff. Also, for any given pair of those, there's one or both of a difference in taste - covering both sweet and salty - and a reasonably compatible difference in flavor (e.g. meat and fruit). Here we find a deep secret of foodcraft: never neglect the acid. A little tartness is a vital component of almost any food, and I hypothesize that a part of why is its capacity to play a supporting role to contrast well with sweet, fatty, and salty alike.

As with any art, the rules are not ironclad, and they can be broken to good effect. Neither are they universal: a combination of shrimp paste and pears might disgust you, but to the Indonesian palate, it evokes delicious rojak. Another is the rule against bad clashes: on this model, the reason why onion and pineapple go just fine together in a salsa is that the onion pairs excellently with the tomato (itself arrogating the savoriness of meat, along with salt), and the pineapple serves the role of adding sweetness and tartness, supporting the salsa as a whole rather than pairing with any specific ingredient; the same is generally true of any dish where one ingredient sticks out as particularly weird, and likewise, there likely ultimately exists no pair of ingredients that cannot be made to go together somehow in some dish. From this we might posit that the rule of familarity can override some minor clashes, if one of the clashing ingredients is core to the dish, the other is serving some role, and the clash is merely not enjoyable rather than actively offensive. Conversely, the "Incompatible Food Triad" - three ingredients that go together well pairwise but not as a triple - points the way to what looks like a puzzling inconsistency, but we might resolve the seeming paradox by pointing out that in such cases, any pair of the three evokes a very different dish, with the third having no place in it at all. That said, even the foremost research into Incompatible Food Triads has failed to turn up any particularly clean or striking examples of one, the closest being yogurt, salted cucumber, and sugar - breakfast yogurt, tzatziki, and sweet pickles are each perfectly fine dishes, but they pull in very different directions.

Using the principles expounded here, you can start composing your very own dishes. My specialties are generally of this sort: I've made a delicious beef stew halfway between an English-style stew and a boeuf borguignon; I've made a variant on cinnamon buns that uses plenty of Chinese five-spice powder, on top of the use of tangzhong dough preparation, to approval from Grandma Kim and numerous friends alike; and I've swapped out the broccoli in various dishes with Romanesco cauliflower to cheers. People ask me how I think of these substitutions, but considered rightly in a frame that this post partially illuminates, they all constitute natural alterations. Go make something delicious of your own!

(Part 2: Theory of Olfaction. Epistemic status: barely even half-baked - but unique, intriguingly plausible, and anyway no one has any better ideas.)

Vision, hearing, the numerous aspects of touch, taste, and smell: of these, smell - or olfaction - is by far the worst-understood, even if we try to tease out the role that olfaction plays in flavor, separating it from the gustation and chemoception that strict-sense taste encompasses. As Convergent Research puts it, “We can’t yet replicate animal olfaction synthetically as a sensing and classification modality. We currently lack a comprehensive model explaining how biological systems decode and classify chemical signals through olfaction. Understanding this process is critical for applications ranging from flavor science to disease diagnostics to understanding and harnessing animal communication.” This past weekend, I briefly attended a “gap mapping” research hackathon organized by YJK; my thanks both to him and to DK who invited me.

While I couldn’t hope to build a full olfaction decoding model, nor fully map odorant-receptor binding, nor even give a robust and comprehensive working theory of how to replicate olfaction in the few hours I had, I thought it prudent to at least clean up my existing thoughts on the subject, given how they’re informed heavily by both my experience in geometric group theory - far removed from the life sciences - as well as my experience as a skilled home chef, sometime perfume blender, and possessor of a keen sense of smell as linked to a keener phenomenology. With any luck, the added insight from the model I sketch out of how olfaction might work will prove a useful map for others more skilled in more central approaches to the question of olfaction; I believe the model to be a plausible one, given a few established facts about both the biochemical basis and subjective experience of olfaction.

Let me start by defining some terms carefully and laying out premises in the language that those terms scaffold. I’ll use “smell” to describe a direct olfactory percept, like the experience of exposure to (+)-limonene, or to ammonia with minor adjuncts, or to a blend of citronellol, geraniol, rose oxide, and beta-damascenone. I’ll use “scent” to mean the olfactory experience a person might have on being exposed to that smell in some concentration or set of concentrations; respectively: orange, stale cat urine, and rose.

For some established facts, we first note that olfactory receptors come in many different varieties, each highly selective to a single small molecule, or to a small set of chemically similar small molecules. Additionally, every such receptor has a band of sensitivity in terms of (say) parts per billion, below which the smell is imperceptible and above which the receptor either tops out or else no longer fires at all (consider the infamous case of hydrogen sulfide); we can rescale that range to the open interval (0, 1) as a fraction of maximal perception strength. As a minor fact, chiral molecules generally smell very different from each other, and don’t cancel each other out: the scent of (+)-limonene closely corresponds to the smell of oranges, but (-)-limonene’s scent better approximates pine; (+)-carvone smells like caraway or dill, while (-)-carvone’s scent is much more like the smell of spearmint. Meanwhile, we may make two mysterious observations: that given one scent, the addition of any amount of any other scent will be smothered by it, blend with or mutate it into a different scent, or stand out against it altogether; and that when moving through a room with a single (complex) smell source present, the resulting scent perceived can nonetheless change with factors including position with respect to the source, air currents, and even different individuals’ olfactory keenness or disabilities.

My major premise is this: arbitrary combinations of smells can be observed, but any two scents built up from smells - even the same list of smells, in some cases - differ greatly from each other, and this suggests that olfaction is best understood as having a treelike or hyperbolic structure to it. (This is notably unlike audition, which may be modeled as involving something like a Fourier decomposition with some added spatial information from timing differences, and unlike vision, which may be loosely modeled as having some pixel-like structure with three-dimensional Euclidean (color) coordinates for each pixel.) In addition, just as the color gamut is limited to only a part of Euclidean space, so too is the scent gamut limited to a tiny sliver of the possible (high-dimensional) hyperbolic space, given the nonexistence of anti-scents - though the nature of hyperbolic space is such that unlike with color, we barely notice the lack.

To understand why this is so plausible, it will be necessary to explain the concept of a δ-hyperbolic space. A δ-hyperbolic space is a metric space in which for any triangle ABC that we might draw, every point on the side AC is distance at most δ from some point on one of the sides AB, BC, and likewise for the other two sides Put another way, the entirety of each of the sides is relatively nearby to the other two sides of the triangle. (This is the picture at the head of the page.) The ordinary hyperbolic plane can be calculated to be ~0.88-hyperbolic, and at δ = 0, we find trees - note that for any three points in a tree, if they’re not part of a single path, then there exists a unique vertex which all three sides of the triangle contain.

The δ-thin hyperbolic model of olfaction then goes like this:

  • Each olfactory receptor has some band it receives best in; we can rescale this to an open interval like (0, 1).
  • Smells are best transcribed as a list of olfactory receptors, ordered from strongest to weakest (rescaled) response; something like (ABCDEF…)
  • A pair of scents is similar in quality exactly when the two scents agree for a large number of initial receptors, that is, we have something like a natural word metric on the space.
  • Each such receptor defines a vector in the hyperbolic space, with most pairs of scents orthogonal to each other and a rare few pairs closer-linked. We can consistently define something like cosine similarity between subjective responses to pure stimulation of individual olfactory receptors, but this would need to be measured empirically.
  • Obviously, chiral and algebraic inverse are not the same; spearmint and caraway do not cancel out, even if their rescaled percept strengths are similar enough for the receptors to be adjacent in the scent-list.
  • Slight differences in scent perception can occur if the strengths of the components of the smell vary slightly (perhaps due to air/fluid circulation), especially if some pair of smell components provokes similarly strong rescaled responses. This might lead to bistable smells.
  • The set of all perceptible smells, having this treelike structure, is δ-hyperbolic, and we could (empirically) measure this δ; scent perception is based on dividing up this hyperbolic space into contiguous directions, possibly Voronoi cells on the surface of some segment of a hyperbolic hypersphere, surely of varying size.
  • The experience of more than one different scent occurs when the components of a smell can be most naturally grouped into two or more recognizable clusters with small intra-cluster and large inter-cluster metric distances.

This model suggests a few ex ante predictions/explanations and proposes associated measurements and tests.. 

  • There should be a bistable scent that can be generated by the right mix of smells - probably three of them, where going from (ABC) to (ACB) produces notably different scents. That scent (or pair of scents?) should give us good bounds on the just-noticeable-difference level for relative concentrations from swapping the concentrations of the subdominant smell components.
  • There should be some overlapping chain of smells which makes two otherwise unrelated-smelling scents blend. Similarly, there should be a scent (probably a mix of at least four or five smells) from which the removal of some “central” scent breaks the scent apart into two unrelated-smelling scents.
  • There should be a “base perfume” blend that one could turn into numerous importantly different scents through the addition of a small amount of chosen extra odorants - specifically not just “neutral-ish base enriching chosen scents”.
  • The phenomenon of synthetic scents feeling “flat” but only rarely wrong likely comes from a natural scent having a form like (ABCDEF…) and the synthetic form having a form like (ABCD) - truncated, but still nearby.
  • The (now sadly-well known) phenomenon where some sufferers of COVID-19 smell nothing but garbage is likely partially explained by total loss or at least of some of the olfactory receptors, collapsing most of the space of scents down to their garbage-like components by dropping the other coordinates. It’s also possible that these receptors have instead been scrambled such that their signals cannot be adequately received or interpreted, much as some sufferers of nerve damage have reported with various forms of touch.
  • Actually do the empirical measurements for subjective cosine similarities between olfactory receptor activations - something like the process that brought us word2vec.
  • Likewise, actually measure the subjective metric distances between nearby scents, and probe the borders of each projective scent subspace.


Discuss

Heretical Pasta

Новости LessWrong.com - 24 мая, 2026 - 04:50

If you ask the internet how to prepare pasta you'll hear two things:

  • You must salt the water.

  • You must serve it mixed with the sauce.

I disagree on both.

I've been cooking pasta since I was a kid, and I prepare it the way my mother (who grew up in Rome) did:

  • Cook it way less than it says on the box, until it's no longer crunchy but not further.

  • Time dinner so that the pasta is the last thing to be ready, where you're eating it within 5min of it coming out of the pot.

  • Serve it in one bowl, with the sauce in another.

The primary goal is to keep the tastes and textures distinguishable, merging only as you chew. The pasta resists your teeth; the sauce flows. The sauce is rich and flavorful; the pasta is a hearty foil. Secondarily, by combining only on each person's plate you can handle a range of preferences in sauce-to-pasta ratio, and different dietary restrictions (ex: a separate vegan sauce).

I don't know how people ended up thinking there was only one way to cook pasta, but to my taste the standard approach is a big missed opportunity.



Discuss

Veganism is Virtuous but not Obligatory

Новости LessWrong.com - 24 мая, 2026 - 02:19
Veganism is Virtuous, but not Obligatory


Tl;dr: Here, I argue that eating meat is morally acceptable. The central point is that every argument for abstaining from animal products being a moral obligation is also an argument for more extreme levels of obligatory abstinence. The “as far as is practicable” constraint vegans often assert either permits omnivorous diets, or entails extreme obligations that nearly all vegans fail to fulfil. I view abstaining from animal products as a virtuous, supererogatory act - similar to building free houses for the homeless. It is something that is beneficial and kind, but regarding it as obligatory seems indefensible under a consistent view of ethics.

Upfront acknowledgments: Factory farming is despicable and I would gladly see it abolished. Animals have moral worth. I support vegans, particularly on policy. The animal industry is bad for the environment. Eating animal products is not necessary for a long, healthy life. 

Terminology: I use slightly nonstandard meanings of certain terms (eg: virtuous = supererogatory) in this essay, and have taken care to use terms consistently throughout. Definitions are available in the appendix.


Obligation vs Virtue

There is a distinction between an obligatory action and a virtuous action. 

Obligation: Something you must (or must not) do

Virtuous action: Something that is moral to do, beyond one’s obligations; supererogatory.

The factors that determine whether abstaining from a practice is obligatory are twofold: the sacrifice it would impose on the abstainer, and the immorality of the action. These jointly determine which side of the obligation threshold an action falls on.


There are many cases where an action causes substantial harm to moral patients, yet is permissible. Two actions can have equal expected harm, yet differ in permissibility.


Consider the following hypothetical:

Anne has one laptop. It contains treasured photographs and videos of her late family, who died in a car accident. The laptop is broken, preventing copying or backing up of files through either external drives or the internet  - it is her only way of preserving the memories of her loved ones, and is her most treasured possession. However, every time she charges the laptop, she causes some harm to the environment, since the grid relies on fossil fuels.


John has an identical laptop. He uses it as a paperweight for papers he doesn’t use, but because he thinks the light emitting from the screen looks slightly interesting, he keeps it on, and charges it using the same grid, with equal frequency to Anne. This causes the same harm to the environment.

In this case, John has (or is far more likely to have) an obligation to abstain from using the laptop.  For Anne, using the laptop is clearly permissible, despite equal expected harm produced by their actions.

Rejecting the principle entails the conclusion that John and Anne are equally obligated to abstain from charging the laptop in order to avoid harming the environment - an absurd conclusion.

If sacrifice doesn’t free you of an obligation as long as harm is brought to moral patients, then eating a vegan diet is also unacceptable, as it carries an expected lifetime burden of counterfactually killing numerous small creatures as an unavoidable side effect of crop farming.

Most vegans acknowledge this principle by implication. The vegan position is broadly that one ought to “reduce animal suffering to the greatest extent possible, or practicable” - implying that highly impractical or gratuitous sacrifices are not part of one’s harm reduction obligations, but feasible, non-extreme sacrifices are. 

Furthermore, just because an action (such as a vegan diet) is less harmful than many alternatives, doesn’t justify it as permissible. All actions that are harmful are less harmful than alternatives. Eating typical quantities of meat is less harmful than eating enormous quantities of meat, which is less harmful than massacring millions of animals. If one holds that an action being less harmful than many alternatives is adequate to justify it as permissible, then virtually all actions, including extreme, gratuitous violence, are permissible. 

There is no Principled Distinction Between “Need” and “Want”

Invoking “need” is central to many arguments made by vegans. If a vegan aims to differentiate not eating meat and, for instance, not using electricity, on the basis that you “need” to use electricity, they must provide a clear distinction between what is a need, and what isn’t.

I argue that needs are exclusively instrumental to some wanted thing, and there is no such thing as a terminal “need” that is different to a terminal “want”. 

One “needs” something only if it produces a result that they want. In every case of a need, the needed thing facilitates, or is equivalent to, a desired end state. Absent a preference for the end state, need dissolves without exception. A person who doesn’t care to live, doesn’t need to breathe, for example. 

Therefore, a “need” is equivalent to: “A thing that results in something we want (for its own sake)”, and is thus entirely subjective.


Contra the “Survival” Framing of Need

Commonly, a need is defined as: “A thing without which you will die”

A literal reading of this definition faces an obvious problem: we will all eventually die. So, strictly, this entails we need everything in existence, rendering the term functionally meaningless.

A steelman of this framing is “A thing without which, you will die earlier than otherwise”. This does not withstand scrutiny either.


Consider a hypothetical supplement that would extend your lifespan by a single hour per sixty years of daily use. If you miss a single day, the clock resets. 

Under the definition of “a thing without which you will die (earlier than if you had it)”, this supplement qualifies as a thing you need. It seems clearly absurd to call this a “need”, and thus the definition fails to capture the true meaning of the word.

There is no mind-independent threshold of lifespan addition beyond which this supplement would suddenly become a “real” need versus an optional luxury. At 1 hour of extra lifespan per 60 years, or 1 day per 60 weeks, or even eternal life after a single dose, there is no point at which it transforms into an inarguable, mind-independent need.


On “Proper Function” and “Natural Kinds”

To avoid bloating this essay, I will not argue for the position of anti-realism about natural kinds and related concepts like “proper function” - I am flagging it here as a background premise. 

In brief, I reject the position that there are objective facts about the “correct” functioning of humans, animals, or any other organism or object. I maintain that these are merely observations about “things that tend to be true of X” rather than “things that X should have”. The second cannot be objectively derived from the first.


Contra the “Harm Avoidance” Framing of Need

The supplement hypothetical shows longevity is insufficient to ground a distinction between need and want. However, one may argue for “need” being sustained in both cases by the opposing harm involved.  The supplement is bitter and inconvenient. These are forms of harm

So, can “that which avoids or mitigates harm” serve as an objective distinction between “need” and “want”?


This distinction collapses to equating need with want, and relies on smuggling “wants” into the definition via the word “harm”. 

What does “harm” mean in the context of this framing, if not “a thing producing unwanted consequences"?

Some may invoke an outside view of harm: something like “an adversely impactful divergence from the typical or proper function of a thing”.

This invokes “proper function”, which as noted, is taken to be an invalid concept. Additionally, the work of this definition of harm is being done by the word “adverse”, which means “harmful”, making it a circular definition. Without the word “adverse”, this definition reduces to “a divergence from the norm”, and a person being transformed into Superman would be considered harmed. 

Therefore, the outside view of harm is either circular or absurd.


Eating Meat is Less Immoral Than Killing Animals by Virtue of Many Mitigators

The distinction between eating meat and killing animals is one that is often glossed over in arguments about animal ethics, where the two are treated as equivalent - purchasing one chicken equals killing one chicken. I argue this is a false equivalence.

Eating meat is not as morally bad as killing animals because it is significantly less direct in every important regard.

By directness, I mean: having a close causal or mechanistic relationship to something, absent intermediary mechanisms or ambiguating factors. 

Directness is a spectrum: the less time, intermediary mechanisms, and unpredictability connecting an action to its outcome, the more direct it is, and vice versa. For example, shooting a man is slightly more direct than paying someone to poison him, which is significantly more direct than starting a podcast that advocates for violent crime in his local area. This is orthogonal to outcomes - even if the expected value of human lives lost is the same in all three cases, they differ in directness.

I argue that a harmful action that is more direct is more immoral than a harmful action that is less direct. I argue that the direction of this difference universally favours eating meat over killing animals.


Directness being rejected as an intensifier leads to absurd conclusions

If directness is held to be irrelevant to the morality of an action as long as the expected outcome is the same, this leads to indefensible conclusions. 

For instance:

Approximately 365 million vertebrates are killed by cars on roads each year in the United States. There are approximately 240 million licensed US drivers. Therefore, the average driver is expected to kill ~1.5 vertebrates per year. Assuming a driving window of 50 years, this entails that the average driver will kill roughly 75 vertebrates in their lifetime.

A vegan who rejects indirectness as a mitigating factor must concede that driving a car in typical conditions today is no different morally than buying a car that is (through hypothetical technology) completely harmless to animals on the roads, but is sold to you on the condition you personally crush 75 live vertebrates underfoot before being handed the keys.

This position is untenable, and the vast majority of vegans would consider this to be grossly immoral in a way that general car driving would not, despite identical expected harm. Thus, a rejection of indirectness as a mitigator is reduced to absurdity.


Per-Instance Probability Is Relevant to Morality Even with Equal Expected Outcome

In cases of equal expected value, I argue the absolute per-instance probability of a bad outcome remains relevant, and that lower probability bad events are more moral choices to higher probability bad events, ceteris paribus.

Consider the hypothetical:

A mother of 10 children must either (A) sacrifice one of her children, or (B) accept a 1/10 chance that all ten die. 

I maintain that the 1/10 chance is a more moral choice, and that a woman who would willingly forgo a 90% chance of total safety in such a scenario would be acting immorally.

Inverting the principle seems unacceptable, committing one to sacrificing their child for the sake of avoiding an unlikely, proportionally severe event.

Ambivalence is suspect - it seems to call for a preference for one case or the other, particularly given the apparent unacceptability of option A. Few would waive their right to deliberate in this scenario. 


Indirectness of mechanism: Variance in the supply chain

If you go to a grocery store and buy a chicken, the store owner does not immediately notify a chicken farmer of your purchase and order exactly one chicken to replace it. Instead, the grocer may take inventory every two or three days to see how many chickens have sold (with some inaccuracy given the inevitability of misplaced stock, theft, administrative errors, etcetera). Let's say it's 187, so he puts in an order for 200 chickens. He might sell more next week, and it's better to have a slight surplus than to run out of stock.

Further still, the number of chickens ordered doesn't have a 1:1 relationship with the number of chickens that are raised and killed by the farmer, because a single grocer ordering 200 chickens is analogous to a single customer buying one chicken from the grocer - the effect compounds. From the perspective of the chicken farmer, who may service several grocery stores, he could get orders for chickens ranging in size from 1200 in a week to 1400 in a week. Again, the cost of slightly overestimating is far less than the cost of slightly underestimating (reputation, lost profits, scalability, etc.), so he aims to produce 1400 chickens, perhaps 1500 for good measure.

We now have a situation where the individual purchase is removed from its cumulative effect by several degrees of imprecision. If you abstain from buying chickens this week, the most likely outcome is that the grocer still rounds up, the farmer still overestimates, and the same number of chickens meet the axe.

Importantly, the expected outcome of buying a chicken, on average, is still approximately "1 chicken dies'', but its effect is ambiguated by what is now a lottery system. Rather than causing a chicken to die every time you buy a chicken, you have something like a 1 in 200 chance of counterfactually causing the deaths of 200 chickens every time you buy a chicken. 

This is a simplified illustration of the high-variance processes connecting your purchase to its harmful effect. In reality, this 200 figure is conservative. For most economies, there are more middlemen in the picture than this simplified example describes. Each stage enlarges variance and increases the size of the thresholds. These include:


The mechanisms connecting your purchase of a chicken to the death of a chicken are significantly less direct than killing a chicken yourself by virtue of probabilistic lottery-style impact rather than certain per-purchase outcomes, and numerous imperfectly efficient agentic intermediaries.

The “hitman” objection

A common objection to the premise that indirectness ameliorates immorality, is the reductio of hiring a hitman.

“If it’s true that indirect harmful actions are less immoral than more direct actions, this means that hiring a hitman would be better than killing someone directly. Further still, you can hire a man, to hire another man, to hire a hitman, and so on, washing your hands of the crime with each stage. This is an unacceptable conclusion and reduces the position to absurdity”

This fails to reduce the position to absurdity. Paying a hitman is slightly less bad, because it is slightly less direct. The degree of indirectness remains relevant. The mechanism is clear, and there are not many intermediaries. 

Either the case being posited is a realistic hitman (or hitman hirer) with imperfect efficiency, or a hypothetical ideal hitman (or hitman hirer) with perfect efficiency.

The “realistic” middleman case: If the reductio posits realistic conditions for the hiring process, and realistic reliability of the intermediaries, then it fails to show the action is equally immoral. Each stage of the “hiring” chain involves leakage - there is an increased chance they fail to complete their task, get caught, or renege on the deal at every stage. Therefore, with each intermediary, the expected outcome gets less bad. At 0 intermediaries, the murder is near certain. With many intermediaries, the outcome is far from certain. Therefore, the expected harm occasioned by initiating this lengthy chain sequence of hitman hirers is predictably much less, and the action is therefore less immoral.

The “perfect” middleman case: If the reductio posits that there is no drop-off from intermediary to intermediary, then this raises the question of what constitutes an intermediary versus the same mechanism. The “perfect” middleman must have no counterfactual agency, and be deterministically tied to the subsequent events with complete in-advance certainty. This is disanalogous to the case it attempts to refute. In the case of buying meat, there is counterfactual agency, significant variance, and there is not complete in-advance certainty of the outcome. The outcome of an indirectly harmful act, though equal in expectation, is ambiguated by numerous rolls of the dice, whereas the perfect hitman eliminates this salient factor. It is hard to argue that these middlemen are actually separate causal entities at all. In the same way, one can argue that a person shooting someone is employing many intermediary mechanisms - millions of neurons must fire in sequence, each a “middleman”, their muscles must pull on their tendons, which pull on bones, which pull on the finger-facing molecules of the trigger, which kinetically influence the adjacent molecules, and so on. The principle that distinguishes a middleman from a single causal mechanism in this context is ambiguity and variance. If there is no, or negligible ambiguity or variance in the outcome, then it is not a valid analogy for indirectness.

Time lag

A component of indirectness is time lag. That which has an immediate effect is more direct than that which doesn’t. 

In the case where an intentional harmful act causes harm later rather than immediately, it is less immoral than the reverse, as long as the harm eventually occasioned is equal in expectation.

Buying a chicken to eat involves a substantial time lag from the moment of purchase to a counterfactual chicken being killed. Therefore, buying a chicken is less immoral than killing a chicken.

Intention

I argue intention affects how moral an action is, independently of its expected outcome.

An action that is done with the intention of bringing about a harmful result, is worse than an action that is not done with the intention of bringing about a harmful result, all else equal.

For example, extending the vehicle analogy: 

A woman driving a car intends to get to work, and accepts the incidental expected harm of pollution, collision risk, and roadkill as an unfortunate, unavoidable compromise. However, a man who drives in exactly the same way, but for whom getting to work is just a bonus, and whose primary goal is to legally kill animals that happen to wander into his car’s path, is acting more immorally than the first person. 

This is true even if the expected outcome is exactly the same. In some cases, it is even enough to offset substantially greater expected harm.

For instance, a woman who drives in a heavily forested area with many creatures wandering across the roads is expected to run over far more squirrels and frogs and snakes on her way to work than a man who lives in a desert. But, the man in the desert, who prays for small animals to wander across the road during his daily commute so he can relish their deaths, is acting more immorally than the woman, who intends merely to get to work, and regards the vertebrates she kills as an unfortunate, unavoidable externality. 

Rejecting the principle, taking the position that intention makes no difference, and that the moral calculus is purely consequential, entails that there is no difference in morality between someone driving so they can get to work, and someone driving so they can see the blood of innocent creatures decorate their car, as long as they drive in the same way. 

“Name the trait”-style reductios are disanalogous by virtue of mitigators

Either mitigators such as directness and intention decrease one’s obligation to sacrifice, or they don’t.

If they don’t: Then all actions with equal expected consequences are treated as morally equivalent. In this case, the “name the trait” challenge collapses under the same argumentation. 

Recalling the vehicle analogy:

A typical American driver will kill (crush) ~75 vertebrates in their lifetime. Under the view mitigators such as directness are irrelevant, this is morally equivalent to personally crushing 75 vertebrates underfoot.

To hold consistently to the framing of “name the trait”-style reductios, a vegan driver must name the trait true of animals that, if true of humans, would justify crushing them underfoot in order to drive.

Clearly, no trait true of animals applied to humans would permit crushing a human underfoot for the sake of being able to drive. Therefore, either driving is morally unacceptable, or the position that mitigators do not decrease one’s obligation is reduced to absurdity. 

If they do: Then “name the trait that permits you to kill an animal” is disanalogous to the case of purchasing meat. Purchasing meat is less direct, less intentional, involves high variance, many causal intermediaries, and substantial time lag before counterfactual harm occurs. These are the same mitigators that differentiate driving a car from crushing vertebrates underfoot. The extent and significance of these mitigators is subjective and arguable for both cases. I would argue the driving case actually has fewer and less extensive mitigators than purchasing meat - you are not ever likely to be in the same room as a slaughtered animal, but you are virtually guaranteed to personally crush a squirrel or a frog, for instance. 

This issue is not resolved by constraining the challenge to consumption, i.e.,  "Name the trait true of animals that, if true of humans, would justify buying their meat.".  The constrained form has the same logical structure as the original, and the same structure can be turned on driving: "name the trait true of small vertebrates that, if true of humans, would justify driving in a manner that near-certainly crushes 75 of them in a lifetime." No such trait exists in either case. So the constrained challenge condemns driving exactly as the unconstrained one does. Since vegans generally regard abandoning driving as beyond one’s obligations, they must concede that the absence of a distinguishing trait is not sufficient to establish impermissibility. Constraining NTT to consumption therefore fails to establish that buying meat is unacceptable.

Humans are Worthy of Greater Moral Consideration than Animals


If one assigns moral value on the basis of capacity for suffering, pleasure, intelligence, meaning, social exchange, or anything else humans generally value, humans as a group outperform animals on these metrics. The degree of value difference will vary depending on the weights you assign to each of these traits, but the direction of the net difference is obvious. Almost all vegans will readily concede that humans are more valuable, in general, than animals.

Without some hierarchy of value anchored on morality-relevant traits, instead viewing moral relevance as binary contingent on meeting a threshold of traits, absurd conclusions result. 

Imagine, for instance, a choice between certainly killing an ant that has the minimum level of moral traits needed to satisfy the threshold, or a 99% chance of killing a creature that possesses morally relevant traits to a vastly greater degree than humans. A threshold view entails that no matter how intensely this creature feels suffering, how richly it experiences positive emotions, or how intelligent it is, near-certainly killing it to save a threshold-clearing ant is the optimal moral choice - an evidently absurd conclusion.

Therefore, all else equal, harm occasioned to animals is less immoral than harm occasioned to humans.

Dietary Veganism is a Significant Sacrifice that Varies Between Individuals

Dietary veganism requires most people to give up or markedly diminish valued aspects of their lives for the sake of reducing expected harm to animals. It thus involves a sacrifice.

Vegans commonly argue the extent of the sacrifice involved is minor. However, I argue the sacrifice involved for most non-vegans is significant, and varies substantially between individuals. 

Taste/pleasure

For most non-vegans, a major factor motivating their consumption of animal products is that a diet including these products is tastier and more enjoyable than a diet without. The stated and revealed preferences of the vast majority of humans clearly indicate that eating animal products makes their lives more enjoyable. Therefore, to abstain from eating animal products involves sacrificing a degree of the all-things-considered enjoyableness of one’s life. 

The scale of this sacrifice varies. For some, animal products are disgusting, and thus it involves essentially no sacrifice to abstain from eating them. For others, animal products are a source of immense satisfaction, and plant-based staples are disgusting. The latter group must make a markedly greater sacrifice to the overall enjoyment of their lives in order to be dietary vegans. 


For the typical non-vegan, the taste/pleasure sacrifice is considerable. Taste consistently ranks as a primary reason people give for not reducing meat consumption. Additionally, the most common reason for relapse from vegan diets is food dissatisfaction. So, even among people actively motivated to abstain from meat, taste is often a prohibitive sacrifice by revealed preference.

Available evidence on stated and revealed preferences points strongly to the taste value of animal products being a significant factor in most people’s lives, and should therefore be regarded as a considerable sacrifice by default.

Nutrition

Because a vegan diet is a subset of an omnivorous diet, it necessarily narrows the range of available foods. 

I do not argue that animal products are essential for a healthy, long life. However, the sacrifice involved comes in the form of greater difficulty, and a considerably narrower range of ways to meet nutritional needs. 

Additionally, without careful dietary planning and/or supplementation, there is an elevated risk of essential nutrient deficiencies like B12 and iron. Factors like protein quantity and source variety become a greater consideration as well due to the diminished bioavailability and variable amino acid profile of plant proteins. 

Thus, there is an unavoidable tradeoff between risk of dietary mismanagement and time investment (research, meal planning) to ensure nutritional sufficiency.

Convenience/time

Another sacrifice associated with veganism is that it is inconvenient. Despite the trivial connotations of the word “convenient”, convenience is a strongly valued aspect of most people’s lives, and a legitimate object of concern. 

Some (non-exhaustive) ways veganism results in lost time or convenience include:

  • Adjusting to the lifestyle
  • Auditing existing habits and ongoing purchases
  • Greater logistical need to prepare meals where vegan options may not be available
  • Research on nutrition and supplements
  • Limitations or friction introduced in social settings
  • Moving location to areas with greater availability of vegan staple ingredients (such as for those living in food deserts)

It is worth noting this aspect of sacrifice is a catch-all category which trades off against the other sacrifices. To achieve a level of enjoyment closer to that of an omnivorous diet, one must spend a greater amount of time and effort learning and practicing how to make tasty vegan food. To achieve more complete nutrition and a lower risk of deficiencies, one must spend more time and effort learning and organising their diet. 

Social 

Social difficulties are among the most common reasons for vegan diet lapses, and a major barrier to veganism for many people.

Vegans are disliked by the general population, and face widespread discrimination. A prospective vegan must not only accept restrictions on their nutritional options, meal enjoyment, and convenience, but also risk being ostracised or mocked in social settings. This is particularly the case in settings involving communal eating.

Communal eating is a near-universally cherished aspect of human experience. It is valued to such an extent that sharing meals with friends and family predicts happiness and wellbeing comparably to income. Even marginal encroachments on this aspect of life therefore merit serious consideration.

On balance, veganism has a clearly negative expected effect on one’s social life.

_

The nature and extent of the sacrifice veganism requires varies, however, by revealed preference, the vast majority of people do regard it as a sacrifice (since it is worse than their preferred default). For people with strong social ties to omnivores and anti-vegans, high taste affinity for animal products, limited nutritional knowledge, or strong preferences for convenience and variety, the sacrifice is substantial.

Veganism Either Entails Extreme Conclusions or Permits Eating Meat

The most commonly cited definition of veganism, from The Vegan Society, is as follows:

"Veganism is a philosophy and way of living which seeks to exclude—as far as is possible and practicable—all forms of exploitation of, and cruelty to, animals for food, clothing or any other purpose; and by extension, promotes the development and use of animal-free alternatives for the benefit of animals, humans and the environment. In dietary terms it denotes the practice of dispensing with all products derived wholly or partly from animals."

First, the terms “exploitation and cruelty” decompose to harm. Roughly, “exploitation” means “harm in order to benefit”, and “cruelty” means “harm for its own sake”. These two terms are thus functionally inclusive of all forms of foreseeable harm, and will be treated as an equivalent concept.

The contentious aspect of this definition comes from the qualifier: “possible and practicable”. Since practicable is a subset of “possible” (all practicable things are possible), practicable is the only relevant component to address.

The term practicable maps directly to the sacrifice framing established in this essay: That which is practicable involves a level of sacrifice that is below a certain acceptable threshold. That which is impracticable exceeds this threshold. 

For example: a person living in a food desert for whom dietary veganism would result in an unhealthily restrictive diet. It is a contentious topic among vegans whether such a person would have an obligation to go vegan at all, or if the financial and logistical challenge of moving to a more abundant area (or tolerating malnutrition) is too great a sacrifice.

This “practicability” framing produces two possibilities: Either "practicable" is a threshold that constrains obligation, or it isn't. I will first briefly address positions absent a practicability constraint, before discussing the dominant practicability-constrained view.


Possibility 1: Practicability does not constrain one’s obligation to abstain from harmful practices. This commits vegans to unbounded, extreme levels of abstention. 

The spectrum of potential acts of abstinence is continuous, and has no principled stopping point.

Consider these escalating possible acts of harm-reducing abstention:

  • Buying meat from a store that sells only meat
  • Buying meat from a general store 
  • Buying less meat from a general store
  • Buying vegetables from a general store that also sells meat
  • Buying vegetables from a vegan store
  • Buying vegetables from a vegan store with verifiably all-vegan employees
  • Growing your own vegetables from inexpensive seeds bought from vegan stores with all-vegan employees…


At the first stage, financially and socially supporting stores that sell only meat is a stronger per-dollar causal contribution to animal harm than buying equivalent products from general grocers. Thus, it’s possible to reduce harm by abstaining and seeking alternatives.

Buying vegetables from general grocery stores does little, but not nothing, to facilitate the meat industry - the grocery store does not compartmentalise revenue with perfect efficiency, and in expectation, your contributions will fund the meat-producing functions of the store to an extent. 

Further, buying from a vegan store (that has some omnivore employees) is less harmful, but still contributes to the animal industry via the wages paid to employees who will buy meat. 

Continuing with possible sacrifices, you can further reduce harm by growing your own vegetables, ensuring the seeds are sourced from vegan stores with vegan employees. Even in this case, there is nonzero monetary leakage into the meat industry - it is impossible to entirely isolate your purchase from downstream contributions to the broader economy. 

The progression does not stop there. With no threshold of sacrifice or “practicability”, this commits you to abstaining from the economy entirely, choosing instead to grow your vegetables from seeds that you have found through foraging, with fertiliser you have made yourself. 

No mechanism under this view constrains your obligation to make further concessions beyond that which is “possible”. 

Ultimately, this position seems to entail suicide by starvation, the maximum possible degree of abstention. 

If one argues that a person would be obligated to eat to such an extent that it causes a net-benefit relative to merely abstaining from food[1], such as by being an advocate, this concedes there is no act/omission distinction relevant to one’s obligations - one must act to eat, one must act to advocate, one must act to afford the seeds. Therefore, such a position produces unbounded act-based consequentialist obligations, like sneaking into as many factory farms as you can to free animals. All the while, ruthlessly optimising your calorie intake so as to never scrape above the threshold of absolute necessity.

The demandingness of this view makes it an untenable position, and entails that the vast majority of vegans fall short of their obligations. Because of this, a practicability constraint is the majority view among vegans.

Possibility 2: Practicability does constrain one’s obligation to abstain from harmful practices. 

This licenses the view that abstaining from animal products is supererogatory, and that omnivores are eligible as vegans. 

A typical omnivore already abstains from practices that are harmful which don’t involve inordinate sacrifice - i.e. to a practicable extent. The popularity of “free range” eggs demonstrates omnivores are often willing to sacrifice money for the purpose of reducing their causal contribution to animal suffering, for instance. An omnivore may abstain from eating live octopus, despite otherwise finding it an amusing cultural experience - another practicable sacrifice. 

Given this, the obligations produced by this position hinge entirely on what is “practicable”. 

As previously argued, there is no principled, objective distinction between “need” and “want”, thus, this cannot serve as a valid definitional basis for practicability.

What remains is a sacrifice threshold view: that there is a continuous spectrum of “wants” that can be sacrificed to greater or lesser degrees, and there is a threshold beyond which further sacrifices are supererogatory. 

Importantly, whether a given abstention is obligatory (practicable) varies even with identical sacrifice willingness. One can be willing to sustain a level of subjective suffering that is exactly the same as that of a dietary vegan, and yet remain an omnivore, because sacrifice is contingent upon variable subjective affinities and aversions for the tradeoffs involved. 


Consider the following hypothetical case:

Lisa and William both fully accept they should reduce their contribution to animal suffering to the greatest extent practicable - they are both willing to make sacrifices of abstention to the point of moderate discomfort and inconvenience, but not beyond. William is in a great position to do so: his family and friends are vegan and he never much cared for the taste of meat, eggs or dairy. Lisa, however, lives in the heart of Texas - barbecue is a cherished part of her culture, she loves the taste of meat, can hardly stomach most plants, and her friends and family are ardent conservatives who openly express contempt for vegans and vegetarians. For William, eating entirely plant foods falls well below the threshold of practicability, barely registering as a sacrifice at all. For Lisa, even moderate reductions in her animal product consumption risks great discomfort, distressing identity dissonance, and being ostracised by her loved ones.

In this case, if they both abstain from eating animal products altogether, Lisa is making a vastly greater sacrifice that exceeds practicability, whereas William is making virtually no sacrifice, and would be obligated to abstain further in order to reach the point of moderate inconvenience and/or discomfort - otherwise, he isn’t abstaining “as far as is practicable”. For them to make the same sacrifice, Lisa will either give up trivial amounts of meat to match William's negligible discomfort, or William will need to make major additional sacrifices to match the distress Lisa faces by giving up meat entirely. It is not possible for Lisa and William to have the same level of sacrifice-mediated practicability, while also adopting the same lifestyle.

Without a categorical distinction between “need” and “want”, there is no principled argument that Lisa is obligated to abstain from eating meat that is not also an argument for William to abstain from other things such as electricity, driving, and general economic activity to the point of intense discomfort. As established, no such categorical distinction exists. 

Therefore, positions on this threshold either skew towards permitting many omnivores, or entailing untenable obligations. There is no threshold at which practicability unambiguously prohibits eating meat for Lisa that does not also entail extreme compromises for William.


The reasonable threshold of practicability is modest

There are many strong reasons to believe the threshold of obligation is relatively low for abstaining from animal products, both in principle and by precedent.

In principle:

  • Mitigators 

Given the abundance and extent of mitigators distinguishing eating meat from the expected harm eating meat entails, the obligation to abstain from the practice is considerably reduced. Consuming animal products involves many layers of indirect mechanisms with high variance, many agentic intermediaries, significant and variable time lag, and views harm as an unfortunate consequence rather than a desired result. 

These mitigators are powerful, not trivial. Driving a car ad libitum for 8 months is vastly less immoral than crushing a squirrel underfoot, despite equal expected harm and fewer, less extensive mitigators than buying meat. 

Given the level of immorality is an input that distinguishes an obligatory sacrifice from a merely virtuous one, a greatly reduced level of immorality entails a greatly reduced degree of obligatory sacrifice.

  • Moral consideration

Given humans at minimum match, and near-universally exceed animals on commonly weighted factors such as lifespan, capacity for social connection, and intelligence, actions that harm groups of animals and insects are significantly more likely to be permissible given equal harm. 

The group-level difference in moral consideration is significant. This is hard to quantify, however even in the strongest case where a cow is otherwise trait equalised with a human, their natural lifespan is roughly 5X shorter, for instance. Factoring in other trait differences such as social and intellectual capacity points to a very significant overall difference in moral consideration.

Most vegans acknowledge humans are, on the whole, worthy of substantially greater moral consideration than nonhuman animals.

Since consuming animal products primarily harms animals as opposed to humans, the action is less immoral than for the human equivalent.

Given lesser immorality entails lesser obligations, the threshold of practicability therefore must concomitantly decrease. 


By Precedent:

  • Housing

Residential buildings contribute to a surprisingly large number of bird deaths. Birds mistake windows for open passages, crash into them, and die. An estimated one billion birds are killed by colliding with buildings every year in the United States alone, the majority of which are small residential buildings. Some estimates are as high as several billion. On average, using conservative figures, roughly 3 birds are killed per house per year this way. 

However, residences in bird migration hotspots frequently kill 10-20 per year once factors like scavenging are accounted for. A person who lives in a house in a high-bird-traffic area is therefore contributing to approximately 500-1000 bird deaths over a lifetime by virtue of merely living in a dwelling with windows[2]

Further still, having a 2-storey tree in the yard of a property increases the rate of bird collisions by 3.6X. A person in a high bird traffic area who does not give up their tree in the front yard is accepting several additional bird deaths (and many injurious nonfatal collisions) per year.

It is not typically considered practicable to avoid buying or renting properties that have large windows, or trees close by. Nor is moving to a low bird traffic area to minimise one’s economic contribution to bird deaths considered practicable despite a death toll that, in many regions, potentially exceeds the impact of a moderate meat eater.

Many (possibly most) omnivores would prefer to move to a low-bird-density area than to give up animal products, implying the sacrifice involved is comparable.

Given this precedent, where the feasible, yet inconvenient burden of limiting residential options is not considered practicable, it seems likely that the standard of practicability is fairly low to an extent that permits animal product consumption for many people. 

The vegan position that accepts a practicability limiter is thus constrained to two views:

  • The standard of practicability extends to limiting one’s residential options to treeless, low-bird-traffic dwellings, including cases where someone lives in a hotspot; or


  • The standard of practicability falls short of limiting one’s residential options to treeless, low-bird-traffic dwellings.

If 1: This is highly demanding and entails that the vast majority of vegans fall short of their obligations.

If 2: Then the threshold of practicability with respect to a lifestyle that kills 500-1000 birds falls short of relocation. Therefore, a meat eater who would eat 500-1000 (animals comparable to) birds in their lifetime, who would rather limit their residential options than give up animal products, would be making a greater sacrifice in doing so than the practicability threshold demands. Therefore, this view permits eating meat to an extent equivalent to hundreds of bird deaths. 

  • Electricity


Using electricity is widely considered to be permissible, including by vegans. This is despite the fact that all major methods of electricity generation are environmentally deleterious, and in most cases result in direct animal deaths. For instance, power lines alone in the United States kill approximately 30 million birds per year via collisions and electrocution - implying an individual electricity user counterfactually causes approximately 5-15 bird deaths in their lifetime. Ad libitum electricity usage is common among vegans, and abstaining from, or drastically reducing one’s electricity consumption is not typically considered practicable or obligatory. 

  • Driving


As discussed, driving carries a lifetime burden of ~75 vertebrate deaths as a conservative estimate. Abstaining from driving is typically considered impracticable, despite it being technically feasible in most cases to use alternate means of transportation. A person who commutes 20 minutes by car to work can typically bike to work and accept a more challenging hour-long commute. However, the inconvenience this involves ostensibly crosses the threshold of impracticability, despite it providing a near total reduction in roadkill.


  • Vegan advocacy itself implies low thresholds

Vegan advocates themselves near-universally claim that abstaining from animal products is easy. In doing so, this admits they are not typically making a significant sacrifice by eliminating animal products. If that is the case, and no further (difficult) sacrifices are obligatory, this further indicates the threshold of obligation to sacrifice is modest. Therefore, it is far more likely that interpersonal subjective variation on issues like taste will distinguish whether a given act of abstention is practicable. To obligate omnivores for whom it is difficult to switch, “don’t worry, all you need to do is make easy sacrifices”, does not suffice.


Conclusion & Summary

Abstaining from animal products is a practice that reduces harm, and therefore doing so is a virtuous practice. However, a practice being harm-reducing is not sufficient to justify it as an obligation - this neglects the crucial factors of the sacrifice required to perform that harm-reducing action, and mitigators that may diminish the immorality of the action. 

The sacrifice involved in abstaining from a given practice is subjective, and a major determinant of whether abstention is obligatory. The greater the personal sacrifice required to forgo something, the lesser one’s obligation to forgo it. In the case of consuming animal products, the harm a typical omnivore faces by abstaining is significant: enjoyment, nutritional sufficiency/ease, and convenience/time accumulate to a considerable sacrifice, affecting its obligatory status.

Mitigators - factors that ameliorate the immorality of an action orthogonally to harm, drastically affect obligations in many areas of life. Cars, electricity, housing and economic participation all result in considerable harm, yet are considered permissible by virtue of their indirectness, lack of intentionality, or delayed, distributed effect. The act of consuming animal products is ameliorated by these mitigators as well. 

Animals, though moral patients, are worthy of less moral consideration in general than humans because they possess morally relevant traits to lesser extents. Given this, harming animals is less immoral than harming humans. Given the aforementioned sacrifice/immorality model of obligation, this lesser immorality further shifts the designation of dietary veganism towards being supererogatory rather than morally required.

Vegans typically acknowledge a threshold of obligation themselves that is functionally identical to the framing of sacrifice vs immorality by their call to: “[abstain from animal products] as far as is practicable”. Without a practicability constraint, the view that one should abstain from practices harmful to animals entails unbounded, extreme abstention. With a practicability constraint, the threshold of practicability is either low enough to permit omnivorous diets, or it is high enough to demand unrealistic lifestyle compromises.

There are compelling reasons to believe “practicable” abstention from animal products is a low threshold rather than an extreme one. In principle, there are many mitigators present which decrease its immorality relative to harm, as well as its harm being concentrated among animals, which are of lesser moral consideration. By precedent, economic participation, driving, electricity usage, and residential windows all cause animal deaths, yet are “impracticable” to abstain from. Vegans themselves widely claim that the practice of dietary veganism is easy - further suggesting the standard of “practicable” is not very demanding.

Given many principled and precedent-based reasons to view a “practicable” sacrifice as one that is relatively easy and painless, and the strong reasons to believe abstaining from animal products is, for most omnivores, far from easy and painless, the standard vegan position that one ought to abstain from animal products to the greatest extent practicable, permits most omnivores to continue eating omnivorous diets while definitionally and morally qualifying as vegans.

Therefore, veganism - as in abstaining from consuming animal products such as meat, eggs, and dairy - is virtuous, but not obligatory.

Appendix


Definitions:

Moral patient: An entity worthy of moral consideration 

Harm: (a thing) Producing a result that is dispreferred by moral patients

Benefit: (a thing) Producing a result that is preferred by moral patients

Morality (moral/immoral): The goodness or badness of an action, which is a function of the action’s harm and/or benefit, and mitigators/intensifiers, weighted by the moral consideration of the affected entities.

Mitigator: A thing that decreases the moral weight of an action independently of its harm or benefit.

Intensifier: A thing that increases the moral weight of an action independently of its harm or benefit.

Obligation: A thing that one must (or must not) do.

Permissible: Lacking an obligation to abstain (from something).

Sacrifice: A voluntary self-imposed harm accepted for the purpose of benefiting other moral patients



  1. ^

    “permitted” is insufficient - this acknowledges a practicability constraint

  2. ^

    Bird-deterring window films do not solve this problem particularly well. Independent studies suggest roughly a 40-70% reduction in collisions, and only if the window film is applied on the outside, which requires more frequent reinstallation and is a much more difficult (expensive) process to complete. Full external installation of window film for a medium-sized house costs in the thousands of dollars, and needs to be replaced every 5-10 years. Most vegans do not do this, and do not consider it an obligatory sacrifice.



Discuss

Low Expectancy is Not a Confidence Problem

Новости LessWrong.com - 24 мая, 2026 - 01:48

Lukeprog's How to Beat Procrastination includes in its framework a term for expectancy or how likely/accomplishable a successful outcome feels internally. One of the levers to combat procrastination is thus to increase the perceived odds of getting a reward. I think this misattributes low expectancy to poorly calibrated self-confidence, when really it boils down to your own actual capabilities and the problem structure.

In many cases, the root cause for low expectancy is that you personally do not have experience, knowledge, or resources. Expectancy should be low rationally. While the original post does prescribe learning/process goals, this is framed as a means to the end of increasing self-confidence. In practice, this framing can lead you to focus on goals that increase confidence without tackling underlying understanding/competence. Competence can lead to confidence, but false confidence (which is fairly easy to manufacture via the methods in the original post) can lead to disaster. An unfortunate side effect: If you're optimizing for self-confidence, when reality hits back, you will start to distrust self-confidence as a signal, leaving you in an even worse place than where you started. (I actually think this is the main cause of chronically miscalibrated low self-confidence.)

Another common case that brings low expectancy: a task has long feedback loops and credit assignment is difficult. Even if you have the fundamental skills, there's no way of knowing if your actions are moving the needle. Setting intermediate process goals here can help sustain effort in one direction, but it cannot change the nature of the problem: it takes time to know if the intermediate process goals you choose are actually moving you towards your terminal goals effectively, especially in a new/unstructured domain. Society's answer here is to create concrete, well-trodden paths with visible rewards (structure the domain) or to work closely and learn from someone who has similar experience. This works great when it works (though it also relies on you to generally understand your direction/terminal goals).

However, the above solution is founded on an implicit trust that the promises will be fulfilled, and the environment and institutions will remain similar enough by the time you complete the intermediate goal. For rationalists who put some credence in short AI timelines (and anyone else in an unstable environment), this assumption is tenuous. Even if you distance yourself from the problem or try to reframe it (e.g., by coming up with plans that work on shorter horizons, or framing your work as a bet on a specific world, or by trying to create robust plans that work across many worlds), that doesn't eliminate the underlying reality that things are going to change, rapidly and unpredictably. The only answer I can think of here is to make peace with that fact. After you acknolwedge it and factor it in, continuing to dwell on it provides no new information, and will just cause paralysis.



Discuss

Basic principles for dressing better.

Новости LessWrong.com - 23 мая, 2026 - 23:01

I've been a toe-in rat and existed on the outskirts of the social scene for approaching a decade now, and I can confidently say (with love) that rationalist men rarely dress well.

I am drowning in a sea of reasonably-attractive men diminishing themselves in skinny jeans and free t-shirts from random events three years ago.

But you can do better. I believe in you. Honestly, it isn't even that hard.

In this post, I'll be teaching you two things:

  1. The basic theory behind how to actually assemble an outfit that will instantly make you look more interesting, attractive, and put-together.
  2. And how to find the clothes you'll need to buy to accomplish #1. I'll even give you a list of links to make things easy for you.

(while this post will be geared toward men, anyone could read this and get something out of it I think)

Outfit Assembly 101

I come from an art background. Assembling a good outfit is, in my opinion, a bit like trying to create a painting. You want the overall composition to feel balanced while still being interesting and nice to look at.

The biggest things I think rationalist men neglect to consider in their outfits (to the extent they give any of this any consideration at all) are color, visual weight, and detail.

Let's look at some examples of things I'd consider Pretty Good Outfits:

In an effort to instill in you more of the elusive thing called taste, let's talk about why I think these outfits work.

  1. They aren't afraid to wear color and pattern -- while none of the outfits shown here are super crazy, they also aren't particularly plain. Check out #6's scarf and handbag, or the rich baby blue and maroon cardigan on the man to his right (#7).
  2. There's a nice balance of visual weight -- meaning something bright and colorful and patterned (like the yellow shirt on #3, or the quilted jacket on #9, btw these are often called statement pieces) are paired with more understated items/solid colors to balance them out. Your outfits don't need to be insanely maximalist to still be interesting. Many men working corporate jobs with strict dress codes have a culture of purchasing statement dress socks, for example.
    1. A decision making process I'd recommend following as a beginner here would be to limit yourself to one statement piece per outfit, and then have everything else be solid, neutral colors. Think interesting shirt + jeans, or cool trousers + plain white sweater, for example.
  3. And small details bring it all together -- notice how #1's bag, belt, and shoes are all the same general shade of leather. Do you see how that gives the outfit an air of intentionality, of put-togetherness? Even the very casual outfit on #10 has some of this, look at how his bandana is blue like his jeans, his white t-shirt matches his sneakers, and even his belt is dark like his corduroy jacket. Your outfits don't need to be monochromatic, but think about ways you can have an accent color appear in more than one place.

(If, like some of my male housemates you object on principle to the concept of a bag, you could color-match a part of your outfit to something like a watch strap, belt, or shoes)

The last high-level bit of analysis I want to point out here is how, despite all of the variety in terms of color, texture, and pattern, the basic formula behind these outfits is fairly simple.

We start with a pair of well-fitting pants (bonus points if they have a slightly wider leg, slim-fit jeans aren't actually that flattering IMO) and then add either a blank t-shirt, tank top, or button-up shirt.

Add some shoes and accessories to that and you can call it a day then and there. Or, you could take it up another notch and layer something like a blazer, jacket, or cardigan.

None of these outfits are particularly brain-breaking. They're very straightforward.

Your shopping list so far is pretty simple:
  1. Trousers
  2. Button up shirt
  3. Something to layer on top
  4. Solid-color t-shirts (no prints or logos)
  5. And a few miscellaneous accessories

For #1-3, get two versions: one plain, one statement piece.

If you follow the advice thus far, you'll absolutely look more attractive and put together. Your outfits will feel more intentional and curated when you add a little bit of color, pay attention to details, and consider visual weight.

But the immediate failure mode I expect many of you to fall into is that you buy those items off of Amazon like you're checking things off of a grocery list.

Part of what makes the above outfits interesting is that the clothes themselves are interesting. They have drape, texture, structure, interesting details. They're nice to look at.

The way to look hotter and more interesting is both to purchase higher quality clothes that fit you well (and to get things tailored, if you can) and to have those clothes say something about who you are.

Fashion is an opportunity to express yourself.

You were sorta on the right track with this when you started wearing all of those t-shirts with xkcd comics on the back, except the signal value is about as worthless as a college degree now, because everyone else wears them too.

So think about other statements you can make or personality traits you can express. Even colors you might like to wear more!

I digress. Let's get into my list of stores to shop from.

My Favorite Menswear Stores, And How To Find Your Own

All of the clothes pictured above are real items you can purchase from the stores in this list. I could tell you where each of them are from, but I think your life will be better if you do some digging through the online stores on this list yourself.

(Also, fair warning, many of these are a bit pricey, like $150 for a shirt kind of pricey)

For the basics:

  1. J. Crew -- If you need an entirely new wardrobe, go here first.
  2. Bonobos -- Very similar to J.Crew. Nothing innovative here, but solid.
  3. Wax London -- This is J.Crew and Bonobos's cooler younger brother.
  4. Todd Snyder -- Very much in the same category as the previous three. Good, not super interesting IMO, but hey! Not everything needs to be a statement piece.
  5. 7Diamonds -- If you tend to be a little sweaty/run hot, the synthetic short-sleeve shirts they carry will do wonders for your temperature regulation. Just don't buy the pants from here, the crotch seams will tear after a few months.
  6. Industry of All Nations -- Lots of basics in a million colors with very straightforward product photos and good material quality.

For statement pieces and more interesting basic options:

  1. Perte D'Ego -- If the things you really want are super interesting shirts that will get you endless compliments, go here. They take ages to ship, but the quality is great.
  2. Arran Studios -- This is a small independent brand still gaining traction, but if you're more into an understated workwear/modern wild-west look, they're great.
  3. Cord Studio -- I think this brand carries some of the most interesting and well-crafted linen button downs out there. Great details.
  4. Society of Cloth -- Features a variety of smaller designers. Lots of variation in price and very fun to browse.
  5. House of Errors -- God, House of Errors has some of the coolest clothes I've ever seen. So much attention to detail. They release new stuff on the regular and all of it is fun and innovative. Lots of unique knits and embroidery work that elevates an outfit.
  6. Found Co -- My favorite hoodie is from this brand. They do a lot of cool things with quilting and patches, and have a very nice earthy color palette.
  7. A Kind of Guise -- One of the pricier brands on this list, but their workwear and suiting looks really fun.
  8. Desigual -- Love the button-ups from here, lots of fun textures and patterns without being too loud. Lots of art history inspired stuff.
  9. OAS Company -- I LOVE the texture and prints from this brand.

** Note that for many of the brands on the second list, lots of what they release is in the form of small micro-collections, meaning you should really consider joining their email list, or you'll end up missing out on their best stuff. Purchasing from small independent fashion designers has pretty few drawbacks, but that's one of them.

But how did I find all of these interesting brands, you might ask?

You're going to hate this part, but... I've found the vast majority of these brands on Instagram.

The thing about Instagram is that it's extremely happy to show you ads it thinks you'll click on. So why not just use this power for good?

If you follow the brands above, like a few of their posts, and only engage with ads that show you menswear (better yet, menswear you like) the algorithm will turn into your own de-facto personal shopper, plumbing the depths of the internet to serve you ads from other menswear brands just like them.

Hope this helps.



Discuss

Boltzmann brains, like Doomsday, require no explaining

Новости LessWrong.com - 23 мая, 2026 - 19:16

Brothers and sisters I have none, but that man's father is my father's son. Who am I?

— ancient riddle


In Eliezer Yudkowsky’s post this week, he writes: “Our current experience -- your own experience, at this very moment, of seeing ordered letters on a screen -- therefore seems to provide overwhelming anthropic evidence against any model of reality or physics which would imply that most brains are Boltzmann brains.”

I hope it’s fair for me to roughly present this line of reasoning like so:

  1. If Boltzmann brains are possible
  2. And if Boltzmann brains would severely outnumber ordered brains
  3. And if each observe is a random draw among all observers
  4. Then finding ourselves to be non-Boltzmann brains should come to us as a huge surprise
  5. Meaning either one of the prior premises is false, or we need (in all likelihood) a theory of the universe that can compensate for this severely unlikely observation


I take the side that “one of the prior premises is false”. Can you guess which one?

What follows is a post about the Doomsday Argument, but everything I write equally applies to the idea that the possibility of Boltzmann brains presents a problem that theories of the universe must solve.


The Doomsday Argument

The Doomsday Argument has been debated among philosophers for decades. It seems to indicate that the number of future humans should be roughly equal to the number of past humans—though that obviously can't be true among all points in time. I believe most people reject the argument, but how they do so can widely vary. For instance, some folk believe in compensating theories that end up implying the opposite: that actually there can never be an ultimate Doomsday, because the number of humans (or at least, the number of conscious observers) is infinite.

To understand the argument works, let's start with a game. There's a deck of cards that can be any size N between 1 to 100, though you don’t actually get to see the deck. A random card will be chosen from the deck and shown to you. If the fifth card is drawn (which we can call K = 5), how large might you expect the deck to be, on average? If the seventy-fifth card is drawn (K = 75), does that make the chance of a maximum deck size (N = 100) higher than if the twenty-fifth had been drawn?

The answer to that second question is yes. Larger draws make larger deck sizes more likely, since they eliminate the possibility of smaller deck sizes.

Smaller draws do the opposite. Before any card is drawn, the chance of any deck size is 1%. After drawing the very first card, K = 1, the chance that you drew that card because you had to draw that card, because that was the only card available, increases. (Similarly, the chance that you drew it as a 50% shot between the choice of two cards also increases, though not as much.)

Doing the math will show that the chance of N = 1 jumps all the way up to 19%.

Observations give us probabilistic evidence towards the possible worlds in which those observations were more likely—e.g., smaller deck sizes for smaller card draws.

The Doomsday argument takes the same dynamic and applies it to the human population:

  1. There’s a 1/N chance of you being any particular person in history.
  2. Therefore the chance of being in the first K people is only K/N. Once you observe yourself to be the Kth person ever, then following a similar line of math as with the deck of cards, the most likely N becomes roughly 2*K.
  3. However, the human population has been growing exponentially rather than linearly. If our total population is likely to be in the range of 2*K, then that exponential growth means we’ll hit that bound soon.

The implication being: If there’s only so many humans left, then something will have to kill us all off. Maybe a black hole, or maybe a quantum bomb that destroys the whole universe. Every possible Doomsday scenario should be ascribed a higher probability than we would otherwise, without the Doomsday Argument.

Of the three steps outlined above, I believe that step 2 is solid. Please double-check the math yourself if you are so inclined!

Step 3 is fun because it makes the whole argument more dramatic, but it’s debatable and ultimately unnecessary. Whether humans would be expected to die very very soon, thanks to the extrapolation from exponential growth, or whether that end we’d be expected to last just a bit longer, as our population follows a sigmoidal curve, doesn’t change the fundamental prediction: Whatever K population we currently have, the Doomsday Argument will predict a grand total of 2*K humans to exist in all of space and time.

And honestly, 2*K doesn’t seem like an unreasonable prediction. Maybe there’s truth to the argument after all. Surely it won’t lead to any-


Utter Absurdity

Let’s rewind the clock about 300,000 years—or maybe more accurately for this scenario, 6,000 years—and let’s forget about the theory of evolution or the genetic quagmire that would wreak havoc upon a population from intense inbreeding.

Adam and Eve.

In the Garden of Eden, the couple meet their fabled serpent. They learn of carnal temptation, partake in Biblical acquaintanceship, and then face God’s judgement. Unable to handle His Great and Terrifying Disappointment, Eve flees and begins her journey of pregnancy alone. Adam, on the other hand, is arrogant and unperturbed. He neither chases after Eve nor dissents against God. Instead he lounges at home, eating apples and chatting with his new scaley friend.

In captivity, a common garter snake can live up to twenty years. This serpent of Eden ends up surviving only another two months, when an eagle happens by and eats him. Adam tries to befriend the eagle, but of all the world’s animals, only the serpent had been imbued by the grace of God to speak proto-Semitic. As such, Adam is left alone with only his thoughts and his apples for the next one hundred years.

When Eve at last decides to return, she brings a giant family in tow. An overwhelmed Adam sits in shock as she tells her tale: How she traveled miles every day, subsisting mostly off rainwater, barely edible berries, and mildly poisonous mushrooms. How she befriended a wolf pack whose warm bodies helped her survive the harsh winter. How she birthed a miraculous octuplet of daughters, each of whom was forced to learn to hunt as soon as walk—but each of whom grew up happy, surrounded by lupine guardians and sisterly affection.

Eve describes next how each of her daughters and granddaughters likewise experienced parthenogeneses, God granting them a healthy amount of genetic variety (and the ability for humanity to jumpstart with only first-cousins level of incest). She describes the flourishing of her family, then has each of her fifty great-granddaughters and fifty great-grandsons introduce themselves to Adam.

Eve says: “I dream of a thriving humanity that will venture through all the jungles and deserts and oceans of this world, spreading our roots. You, Adam, will have been the first of thousands. Maybe even perhaps the first of billions.”

Adam, who’s grown visibly uncomfortable and increasingly shifty during all this, replies: “Hah! You really believe me to expect the evidence of my lying eyes? If it were true that I had fifty male descendants, then there would be a less than one-in-fifty chance of me being me, for I could have been born any one of them! Much more likely that all this is a ruse pulled by God the Deceiver and that I have no male descendants. You expect me to believe that there will be billions of humans? Then the chance of me having been the first human would be one in billions. In fact, think about this: God promises a future that will last billions of years, if not forever. What chance could I have had to be alive right now, during this mere century, rather than at any other point in the history of time? Essentially zero, unless I were in fact immortal. This is the most probabilistic explanation of my observations: These children are fake. You are fake. Everything in this universe are but figments of my imagination—me, the only thing that’s real, forever and always.”

But Adam, quite worked up and over a century old, goes into cardiac arrest at that exact moment and dies a minute later. Eve buries Adam in the Garden of Eden and marks his grave with a single upright stick, because neither gravestones nor crosses had yet been invented.

Adam tries to argue that certain things about the future must be true, based on an understanding of himself as being a random draw among events that haven’t happened yet, and which may or may not even happen. 

In other words: Adam ends up forming beliefs about the size of humanity’s future population based on his belief about his place within the scope of humanity’s past and future population.

It’s circular logic.

Ouroboros by Luke Orrin


We can start building the right intuition that avoids this circularity by recognizing that Adam’s first rank isn’t any more remarkable than someone being the 42nd human to exist, or the 6,283,185,307th human. That distinguishes the Adam example from the deck sizes example, in which smaller card draws are remarkable for being more likely with smaller deck sizes, and larger card draws are remarkable for only being possible with larger deck sizes.

If God told Adam, “I created a billion different universes, each with a different human population across all time, from 1 to a billion. I then picked a random universe, and picked a random body throughout history inside that universe, went to my soul factory, then plopped your soul into that body,” then we’d have a situation analogous to the deck of cards: Adam should indeed predict smaller future populations.

Lacking such an explanation from God Himself, the Doomsday Argument falters. How exactly? Earlier, I presented the Doomsday Argument in three steps. The second step was unassailable and the third step irrelevant.

The first step was this: “(Given a total population of N) there’s a 1/N chance of you being any particular person in history”. Or rephrased:

  • P(I am the Kth human | past-and-future population has size N) = 1 / N

This is an incorrect premise. (Which is why you don’t need the “SIA” or “SSA” or any other theory to fix anything.)

Except this leaves the question: If 1/N is wrong, then what’s the correct value? What was the chance of me being me, or you being you, or Adam having been first?


The Mistaken Assumption

Realistically, I don’t think there’s anybody in history we can point to as the “first” human. Evolution would’ve made the boundary too fuzzy. How many individuals would’ve blurred the definitions between Homo heidelbergensis than Homo sapiens? In a broader sense, at what point were we more monkey than man?

But let’s say we drew an arbitrary line and defined the exact moment in time “humanity” began. Man #1 gets to be known as “Adam”, woman #1 gets to be known as “Eve”, and I hereby bequeath upon the 42nd man the name “Dams Ouglas”.

We want to know: What’s the chance that a given person has their rank K? Every K should have the same chance, because no position is more remarkable than any other, which means this question is equivalent to asking: What’s the chance that “Dams” was 42nd?

  • P(“Dams” was 42nd | total population of N) = ?

The answer:

  • P(“Dams” was 42nd | total population of N) = P(“Dams” was 42nd)

And:

  • P(“Dams” was 42nd) = 1


This… might violate some intuitions. 


It's natural to ask questions about the relative fortunes or misfortunes around the circumstances of one’s birth. Questions which will have answers with probability less than 1. For instance—to pull a completely random question out of my ass,[1] with no personal salience whatsoever—I might ask: “What's the chance, if I were a random American millennial, that I'd be diagnosed with colorectal cancer before 35?” (Less than 1 in 2,000.) Or: “What's the chance, if I were a random American millennial, that said cancer would metastasize by 40?” (Again, roughly 1 in 2,000.)

The conditional clauses are doing important work here. If I omit them and instead only ask, “What was the chance I’d be diagnosed with CRC by age 35?” the answer would be 1. 

In the exact same way, the chance of Dams being 42nd was 1. That's because both “Dams” and “42nd person” are references that point to the same entity. The subject and object are just different names for the same person. 

When we ask something like, “If God were to choose a random human from across all time, what's the chance he'd happen to choose Dams?”, then we end up with a different subject (a random selection of human) from object (Dams). 

(More on this later.)


Why do philosophers keep getting this wrong?

Because references (AKA “pointers”) are hard!

This is a well-known fact to any software developer who’s programmed in C, and also many riddlemasters and magicians. The riddle I presented at the beginning of this essay follows the format, “It seems like I’m referring to different individuals, but actually I’m referring to the same one,” as does Lewis Carroll’s classic problem involving the nominal ambiguity of familial relationships:


The Governor of Kgovjni plans on hosting a small dinner party and invites his father's brother-in-law, his brother's father-in-law, his father-in-law's brother, and his brother-in-law's father. How many guests will he have?


(The answer can be as small as one.)


In the same vein, “these two cards seem like they have no relation, but actually they’re the same card” is the basic format of about half the card tricks I’ve ever learned, as in this fun short video by VSauce.

The Anti-Doomsday Argument

It’s tempting to think that no matter what maniacal technologies we develop, we’ll never destroy the universe nor the planet: They’ve been around too long, and have surely seen worse than us. The worst we can do is destroy ourselves.

Is there merit to that idea, or might this be another argument built atop a faulty premise?

It’s certainly within the realm of plausibility, similar as it is to arguments like this:

  1. Trees have been around a very long time.
  2. In all that time, trees have never obliterated the planet.
  3. Trees tend to remain treelike. It takes a long time for trees to evolve into non-treelike things.
  4. Therefore, trees are unlikely to obliterate the planet any time soon.


I would hope that’s uncontroversial.

THE TREES WILL KILL US ALL. (Gorgeous piece by Sylvain Sarrailh)


The important difference between humans and trees is not that we’re more destructive, but that our behaviors are demonstrably changing on an incredibly rapid timescale. We existed for a few hundred thousand years before inventing T.N.T., but then it took us only 82 more years to invent the atomic bomb.

When combined with one other observation, I believe we can conclude that humans are extremely unlikely to instantly destroy the entire universe, all at once—but any lesser degree of destruction might well be within our grasp, including instant destruction of the planet, or a chain-reaction that will eventually but not immediately destroy everything else.

The other observation is that of the universe’s unimaginable vastness. Astronomers estimate that the Milky Way contains around 100 to 400 billion stars, and even more incredible, that the universe likely contains somewhere in the range of 200 billion to 2 trillion galaxies.

So no matter how rare the existence of life, gods only know how many trillions of Earthlike planets might be out there, and how many trillions of alien species might exist with human or superhuman intelligence. If none of them ever managed to deliberately or accidentally destroy the entirety of the universe in a single instant, I think it’s fair to judge that such an act is flatly impossible.

However, with space being so large and difficult to traverse that we’ve yet to actually meet or even observe any aliens, we’re blind to what fates might eventually befall the typical alien species. Maybe the universe is littered with the desiccated husks of once-thriving civilizations who venture too far with scientific experimentation; maybe every black hole is the scorch mark of a quantum bomb.


So is Doomsday likely or not?

It was 1944 and Manhattan Project scientists needed to know how to safely test a nuclear detonation. J. Robert Oppenheimer argued that they needed a test whose scale would be “comparable with that contemplated for final use”. Brigadier General Groves was concerned—though not with safety. He just wanted to make sure they didn’t waste their expensive plutonium.

The Trinity test proceeded in secrecy in the Jornada del Muerto desert (literally “Journey of Death”), about 35 miles away from the nearest city, though only about 13 miles away from the nearest ranchers.

They were confident with this margin of miles. Trinity would not wipe New Mexico off the map. 

But how could they be so sure?


We might reason:

  • There’s a 1/100 chance that measurements are 100x off
  • There’s a 1/100 chance that calculations are 100x off
  • Combined with other such chances, there must be a larger than 1-in-a-10,000 chance of the test detonation ending up in horrific tragedy


The military might judge 1-in-10,000 to be an acceptably low risk, but I'm betting many New Mexicans would disagree.


The key to their confidence was the Square-cube law. All explosions, nuclear or not, scale as a cubic root rather than linearly: To cover twice as much radius, a bomb will need eight times as much energy. That meant the Manhattan Project scientists’ measurements could have been many orders of magnitude off, and the test bomb (codenamed “the gadget”) still wouldn’t have blasted past the ends of Joranda del Muerto.

It’s conceivable that in the future, some scientist will theorize a new type of super-efficient energy reactor that might usher in a new golden age of civilization and resource abundance, but with one minor potential drawback: The scientist also theorizes that this new process might instead trigger an explosion that would destroy half the planet.

In such a situation, the appropriate next step isn’t the weighing of potential gains against potential risks to determine whether to pursue this technology.


The best next step would be to do more science.


Manhattan Project scientists were able to develop extremely more confident probabilistic beliefs thanks to their knowledge of the Square-cube law. A good prediction about the impact of a quantum bomb will undoubtedly require deep knowledge of physics. A good prediction about near-future population sizes would involve deep knowledge about economics, politics, epidemiology, understanding the factors behind birth rate decline, and also the potential factors for birth rate escalation. A good prediction about distant-future population sizes would require deep scientific knowledge about likely extinction events and the feasibility of interplanetary or intergalactic colonization.

None of these begin with the assumption, “I’m a random draw from among all humans.” Though the fact of our existence is incredibly arbitrary (and a mystery), and without omniscience, our lives are dominated by randomness (deterministic universe or no), we are not “1/N” random. Our K ranks are what they are; they do not bear on what the future holds, whether humanity will soon face Doomsday, whether we’ll survive the next trillion years, or whether we’ll survive for all infinity.

Boltzmann brains

Let's imagine you've entered a twenty-person footrace. You don’t know how good the other competitors are, so you judge yourself to have an equal shot at any particular result, 1/20.

You win the Bronze! The organizers start handing out medals. After you receive yours, you remember: The organizers had previously stated how many people would receive medals, and it was either all of the top ten, or all twenty participants. 

Now that you've received a medal, what's the chance that one the top ten will receive one?

  • YGM: Event that you got this medal
  • N: Number of people who will receive a medal
  • P(N=20) = P(N=10) = 1/2
  • P(YGM | N=20) = 1
  • P(YGM | N=10) = 1/2
  • P(YGM) = 3/4
  • P(N=10 | YGM) = (1/2)*(1/2)/(3/4) = 1/3


This math mistakenly leads you to thinking that P(N=20) is now twice as likely, when actually it hasn't changed. 

The third place observation of receiving the bronze medal is identical in both possible worlds, so you gain no new knowledge from it (Just as waking up gives you no new knowledge in the Sleeping Beauty problem). The P(You get Bronze) = 1/20, which changes all the math above. Or you could factor in that you’d be one of the fastest, and then P(YGM | N=10) would equal 1. 

Analogously, the observation of being an ordered brain is identical in both possible universes (of Boltzmann brains being possible and outnumbering structured brains, versus that not being the case). We are what we are: human or not human, early or late in history, structured or unstructured. Every class of observer must get instantiated, no matter how outnumbered they might be compared to other classes. Without bringing in additional assumptions (like God's "soul factory" mentioned before), there is no P=1/N to start with, and no Bayesian update to make.


  1. ^

    Pun intended.



Discuss

Probabilities are not the right concept

Новости LessWrong.com - 23 мая, 2026 - 19:10
Introduction

This sequence is an attempt to sketch a unified framework for several interconnected questions: Where do Bayesian priors come from? What even are probabilities? How should we deal with infinite ethics? What's going on with anthropics? I hope to lay out both some of the existing answers and my own preferred synthesis.[1]

I understand that many people have already thought about these questions, and I have only read portions of the existing literature. I think most of what I will write here, even in the section about my preferred synthesis, is not novel. People whose writing I'm building on include Wei Dai, Paul Christiano, Joe Carlsmith, Scott Garrabrant and Richard Ngo. I've also listened to some people like Lukas Finnveden, Vivek Hebbar and Ryan Greenblatt talk about related topics, which was also influential on me.[2]

However, most of the prior work is scattered across many, often very confusingly written blog posts, and I can't easily tell where I first came across various ideas I'm exploring here. Therefore, I will not try to do a full exegesis of where each idea came from, and will instead present the arguments as a unified flow, with only occasional direct references to the work of prior authors. It's also very possible that there are important insights that I missed that people have already written on these topics - in that case, feel encouraged to link to them in the comments.

This first post will look at some possible definitions of probabilities and why I think they don't really work. Later posts will examine what we can best replace probabilities with.

What even are probabilities?

What do I mean when I say that I give a 10% probability that it's going to rain in my town tomorrow? This 10% probability doesn't refer to any tangible fact about the real world. Sure, there is some amount of objective randomness in whether it will rain or not tomorrow, due to quantum randomness. But I have no idea how big the quantum effects are on the weather tomorrow, and when I say I give a 10% chance for rain, I'm clearly not referring to the true quantum probabilities.

I'm also not satisfied with the frequentist view where you need to look at a series of sufficiently similar events in the past, and count the frequency with which the event happens. This view may be tenable for rain (though I still don't know how you define "sufficiently similar" days), but I don't know how you would apply it to any less generic question, like the probability that the Russia-Ukraine war ends in 2026.

The classical Bayesian view holds that probabilities are just my subjective credences; they only live in my head. I find this view appealing. Still, if someone tells me he thinks there is a 50% chance that Bigfoot is standing in the next room, I wouldn't just shrug and say "Yep, it's all subjective, like liking chocolate and vanilla ice cream. He says 50%, that's as good as any other probability estimate."

I intuitively think that giving a 50% probability for Bigfoot standing next door must be wrong in some important sense, so we will need to investigate more deeply what probabilities mean instead of just saying they are all subjective.

I will explore two common answers - one based on defining an objective prior for Bayesianism, and another based on defining probabilities through betting odds. I think both answers offer valuable insights that I will build on in later posts, but neither of them give a satisfactory definition of probabilities.

Probabilities from priors

When I try to predict what will happen next, I rely on past evidence. The reason I believe there is less than a 50% chance of Bigfoot standing in the next room is that I have looked into many rooms in my life and Bigfoot was in none of them, plus I have read about other people not encountering Bigfoot, plus I have some broader evidence on what kind of animals are found where.

However, relying on past evidence runs into the problem of induction.

The sun has risen every day, so I expect it will rise again tomorrow. But it is an equally valid hypothesis, equally fitting the evidence, that the laws of nature dictate the sun will rise every day until June 1, 2026, and never again. Why, on May 31st, do I still think the sun will probably rise?

Galilei observes that all objects fall at the same rate, and then encounters a tropical fruit he has never seen before. Should he assume this fruit also falls the same way? Russell playfully conjectures that there might be an intact teapot floating between Earth and Mars. Why do I expect our probes won't find it?

The traditional answer is something like a simplicity prior, also referred to as Occam's Razor. The laws of nature are supposed to be simple: they shouldn't differ for every particular object, they shouldn't contain arbitrary date-specific caveats, and complex objects like teapots shouldn't appear without a cause. But it's unclear what "simplest explanation" actually means, so we will need to explore that further.

Solomonoff induction

In Bayesian terms, everything I've observed in my life is evidence for and against various hypotheses. I started with some set of hypotheses that had some initial prior probabilities, and all my observations updated them. The question is: what were these starting hypotheses and prior probabilities, before I had any evidence at all?

One common answer is the Solomonoff induction. All hypotheses are assumed to be computable: everything I've observed was produced by a computer program, and the next observations will be produced by the same program. My prior distribution is based on program length on a Universal Turing Machine. A program of length n gets prior probability proportional to, let's say, mjx-math { display: inline-block; text-align: left; line-height: 0; text-indent: 0; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; border-collapse: collapse; word-wrap: normal; word-spacing: normal; white-space: nowrap; direction: ltr; padding: 1px 0; } mjx-container[jax="CHTML"][display="true"] { display: block; text-align: center; margin: 1em 0; } mjx-container[jax="CHTML"][display="true"][width="full"] { display: flex; } mjx-container[jax="CHTML"][display="true"] mjx-math { padding: 0; } mjx-container[jax="CHTML"][justify="left"] { text-align: left; } mjx-container[jax="CHTML"][justify="right"] { text-align: right; } mjx-msup { display: inline-block; text-align: left; } mjx-mn { display: inline-block; text-align: left; } mjx-c { display: inline-block; } mjx-utext { display: inline-block; padding: .75em 0 .2em 0; } mjx-TeXAtom { display: inline-block; text-align: left; } mjx-mo { display: inline-block; text-align: left; } mjx-stretchy-h { display: inline-table; width: 100%; } mjx-stretchy-h > * { display: table-cell; width: 0; } mjx-stretchy-h > * > mjx-c { display: inline-block; transform: scalex(1.0000001); } mjx-stretchy-h > * > mjx-c::before { display: inline-block; width: initial; } mjx-stretchy-h > mjx-ext { /* IE */ overflow: hidden; /* others */ overflow: clip visible; width: 100%; } mjx-stretchy-h > mjx-ext > mjx-c::before { transform: scalex(500); } mjx-stretchy-h > mjx-ext > mjx-c { width: 0; } mjx-stretchy-h > mjx-beg > mjx-c { margin-right: -.1em; } mjx-stretchy-h > mjx-end > mjx-c { margin-left: -.1em; } mjx-stretchy-v { display: inline-block; } mjx-stretchy-v > * { display: block; } mjx-stretchy-v > mjx-beg { height: 0; } mjx-stretchy-v > mjx-end > mjx-c { display: block; } mjx-stretchy-v > * > mjx-c { transform: scaley(1.0000001); transform-origin: left center; overflow: hidden; } mjx-stretchy-v > mjx-ext { display: block; height: 100%; box-sizing: border-box; border: 0px solid transparent; /* IE */ overflow: hidden; /* others */ overflow: visible clip; } mjx-stretchy-v > mjx-ext > mjx-c::before { width: initial; box-sizing: border-box; } mjx-stretchy-v > mjx-ext > mjx-c { transform: scaleY(500) translateY(.075em); overflow: visible; } mjx-mark { display: inline-block; height: 0px; } mjx-mi { display: inline-block; text-align: left; } mjx-mfrac { display: inline-block; text-align: left; } mjx-frac { display: inline-block; vertical-align: 0.17em; padding: 0 .22em; } mjx-frac[type="d"] { vertical-align: .04em; } mjx-frac[delims] { padding: 0 .1em; } mjx-frac[atop] { padding: 0 .12em; } mjx-frac[atop][delims] { padding: 0; } mjx-dtable { display: inline-table; width: 100%; } mjx-dtable > * { font-size: 2000%; } mjx-dbox { display: block; font-size: 5%; } mjx-num { display: block; text-align: center; } mjx-den { display: block; text-align: center; } mjx-mfrac[bevelled] > mjx-num { display: inline-block; } mjx-mfrac[bevelled] > mjx-den { display: inline-block; } mjx-den[align="right"], mjx-num[align="right"] { text-align: right; } mjx-den[align="left"], mjx-num[align="left"] { text-align: left; } mjx-nstrut { display: inline-block; height: .054em; width: 0; vertical-align: -.054em; } mjx-nstrut[type="d"] { height: .217em; vertical-align: -.217em; } mjx-dstrut { display: inline-block; height: .505em; width: 0; } mjx-dstrut[type="d"] { height: .726em; } mjx-line { display: block; box-sizing: border-box; min-height: 1px; height: .06em; border-top: .06em solid; margin: .06em -.1em; overflow: hidden; } mjx-line[type="d"] { margin: .18em -.1em; } mjx-mrow { display: inline-block; text-align: left; } mjx-msqrt { display: inline-block; text-align: left; } mjx-root { display: inline-block; white-space: nowrap; } mjx-surd { display: inline-block; vertical-align: top; } mjx-sqrt { display: inline-block; padding-top: .07em; } mjx-sqrt > mjx-box { border-top: .07em solid; } mjx-sqrt.mjx-tall > mjx-box { padding-left: .3em; margin-left: -.3em; } mjx-c.mjx-c32::before { padding: 0.666em 0.5em 0 0; content: "2"; } mjx-c.mjx-c2212::before { padding: 0.583em 0.778em 0.082em 0; content: "\2212"; } mjx-c.mjx-c1D45B.TEX-I::before { padding: 0.442em 0.6em 0.011em 0; content: "n"; } mjx-c.mjx-c31::before { padding: 0.666em 0.5em 0 0; content: "1"; } mjx-c.mjx-c1D45D.TEX-I::before { padding: 0.442em 0.503em 0.194em 0; content: "p"; } mjx-c.mjx-c2B::before { padding: 0.583em 0.778em 0.082em 0; content: "+"; } mjx-c.mjx-c28::before { padding: 0.75em 0.389em 0.25em 0; content: "("; } mjx-c.mjx-c29::before { padding: 0.75em 0.389em 0.25em 0; content: ")"; } mjx-c.mjx-c3D::before { padding: 0.583em 0.778em 0.082em 0; content: "="; } mjx-c.mjx-c33::before { padding: 0.665em 0.5em 0.022em 0; content: "3"; } mjx-c.mjx-c221A::before { padding: 0.8em 0.853em 0.2em 0; content: "\221A"; } mjx-c.mjx-c35::before { padding: 0.666em 0.5em 0.022em 0; content: "5"; } mjx-c.mjx-c2248::before { padding: 0.483em 0.778em 0 0; content: "\2248"; } mjx-c.mjx-c30::before { padding: 0.666em 0.5em 0.022em 0; content: "0"; } mjx-c.mjx-c2E::before { padding: 0.12em 0.278em 0 0; content: "."; } mjx-c.mjx-c38::before { padding: 0.666em 0.5em 0.022em 0; content: "8"; } mjx-c.mjx-c2C::before { padding: 0.121em 0.278em 0.194em 0; content: ","; } mjx-container[jax="CHTML"] { line-height: 0; } mjx-container [space="1"] { margin-left: .111em; } mjx-container [space="2"] { margin-left: .167em; } mjx-container [space="3"] { margin-left: .222em; } mjx-container [space="4"] { margin-left: .278em; } mjx-container [space="5"] { margin-left: .333em; } mjx-container [rspace="1"] { margin-right: .111em; } mjx-container [rspace="2"] { margin-right: .167em; } mjx-container [rspace="3"] { margin-right: .222em; } mjx-container [rspace="4"] { margin-right: .278em; } mjx-container [rspace="5"] { margin-right: .333em; } mjx-container [size="s"] { font-size: 70.7%; } mjx-container [size="ss"] { font-size: 50%; } mjx-container [size="Tn"] { font-size: 60%; } mjx-container [size="sm"] { font-size: 85%; } mjx-container [size="lg"] { font-size: 120%; } mjx-container [size="Lg"] { font-size: 144%; } mjx-container [size="LG"] { font-size: 173%; } mjx-container [size="hg"] { font-size: 207%; } mjx-container [size="HG"] { font-size: 249%; } mjx-container [width="full"] { width: 100%; } mjx-box { display: inline-block; } mjx-block { display: block; } mjx-itable { display: inline-table; } mjx-row { display: table-row; } mjx-row > * { display: table-cell; } mjx-mtext { display: inline-block; } mjx-mstyle { display: inline-block; } mjx-merror { display: inline-block; color: red; background-color: yellow; } mjx-mphantom { visibility: hidden; } _::-webkit-full-page-media, _:future, :root mjx-container { will-change: opacity; } mjx-c::before { display: block; width: 0; } .MJX-TEX { font-family: MJXZERO, MJXTEX; } .TEX-B { font-family: MJXZERO, MJXTEX-B; } .TEX-I { font-family: MJXZERO, MJXTEX-I; } .TEX-MI { font-family: MJXZERO, MJXTEX-MI; } .TEX-BI { font-family: MJXZERO, MJXTEX-BI; } .TEX-S1 { font-family: MJXZERO, MJXTEX-S1; } .TEX-S2 { font-family: MJXZERO, MJXTEX-S2; } .TEX-S3 { font-family: MJXZERO, MJXTEX-S3; } .TEX-S4 { font-family: MJXZERO, MJXTEX-S4; } .TEX-A { font-family: MJXZERO, MJXTEX-A; } .TEX-C { font-family: MJXZERO, MJXTEX-C; } .TEX-CB { font-family: MJXZERO, MJXTEX-CB; } .TEX-FR { font-family: MJXZERO, MJXTEX-FR; } .TEX-FRB { font-family: MJXZERO, MJXTEX-FRB; } .TEX-SS { font-family: MJXZERO, MJXTEX-SS; } .TEX-SSB { font-family: MJXZERO, MJXTEX-SSB; } .TEX-SSI { font-family: MJXZERO, MJXTEX-SSI; } .TEX-SC { font-family: MJXZERO, MJXTEX-SC; } .TEX-T { font-family: MJXZERO, MJXTEX-T; } .TEX-V { font-family: MJXZERO, MJXTEX-V; } .TEX-VB { font-family: MJXZERO, MJXTEX-VB; } mjx-stretchy-v mjx-c, mjx-stretchy-h mjx-c { font-family: MJXZERO, MJXTEX-S1, MJXTEX-S4, MJXTEX, MJXTEX-A ! important; } @font-face /* 0 */ { font-family: MJXZERO; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Zero.woff") format("woff"); } @font-face /* 1 */ { font-family: MJXTEX; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Regular.woff") format("woff"); } @font-face /* 2 */ { font-family: MJXTEX-B; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Bold.woff") format("woff"); } @font-face /* 3 */ { font-family: MJXTEX-I; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-Italic.woff") format("woff"); } @font-face /* 4 */ { font-family: MJXTEX-MI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Main-Italic.woff") format("woff"); } @font-face /* 5 */ { font-family: MJXTEX-BI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Math-BoldItalic.woff") format("woff"); } @font-face /* 6 */ { font-family: MJXTEX-S1; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size1-Regular.woff") format("woff"); } @font-face /* 7 */ { font-family: MJXTEX-S2; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size2-Regular.woff") format("woff"); } @font-face /* 8 */ { font-family: MJXTEX-S3; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size3-Regular.woff") format("woff"); } @font-face /* 9 */ { font-family: MJXTEX-S4; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Size4-Regular.woff") format("woff"); } @font-face /* 10 */ { font-family: MJXTEX-A; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_AMS-Regular.woff") format("woff"); } @font-face /* 11 */ { font-family: MJXTEX-C; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Regular.woff") format("woff"); } @font-face /* 12 */ { font-family: MJXTEX-CB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Calligraphic-Bold.woff") format("woff"); } @font-face /* 13 */ { font-family: MJXTEX-FR; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Regular.woff") format("woff"); } @font-face /* 14 */ { font-family: MJXTEX-FRB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Fraktur-Bold.woff") format("woff"); } @font-face /* 15 */ { font-family: MJXTEX-SS; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Regular.woff") format("woff"); } @font-face /* 16 */ { font-family: MJXTEX-SSB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Bold.woff") format("woff"); } @font-face /* 17 */ { font-family: MJXTEX-SSI; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_SansSerif-Italic.woff") format("woff"); } @font-face /* 18 */ { font-family: MJXTEX-SC; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Script-Regular.woff") format("woff"); } @font-face /* 19 */ { font-family: MJXTEX-T; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Typewriter-Regular.woff") format("woff"); } @font-face /* 20 */ { font-family: MJXTEX-V; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Regular.woff") format("woff"); } @font-face /* 21 */ { font-family: MJXTEX-VB; src: url("https://cdn.jsdelivr.net/npm/mathjax@3/es5/output/chtml/fonts/woff-v2/MathJax_Vector-Bold.woff") format("woff"); } .[3] This way, the sum of all priors is finite and can be normalized to 1.

Then, I look at all the observations I have made so far, I do the Bayesian updating starting from this above-described prior, and that's how you make predictions about unknown events.

This matches our intuition nicely. If we have no evidence about whether the sun will cease to exist on June 1st, we should assign this low probability, because the program encoding a special caveat for June 1st is longer than one without it.

Problems with Solomonoff induction

It's tempting to say that one should define probabilities as the result of Solomonoff induction. Probabilities would be still subjective in the sense that no one can actually run the full Solomonoff induction, so we are all just giving our best guesses. But I can at least still say that the guy who gives 50% probability to Bigfoot standing next door is wrong in the sense that I'm confident that's not close to what the Solomonoff induction says.

There are several problems, however. I will not engage with the problem of Solomonoff induction being uncomputable[4] - I think it would still provide a valuable philosophical grounding of probabilities even without it being computable. I will also not engage with the problems of the agent reasoning about itself, explained in the Embedded agency post.[5] But there are some other problems I plan to engage with:

1. Why assume computability? I can't find it anymore, but Wei Dai has a very old post asking what we would do if an advanced alien civilization, who otherwise showed themselves to be trustworthy and benevolent, told us they had a halting oracle. Should we give 0% probability that they are telling the truth, given that our prior only contains computable universes and those can't have halting oracles in them? Why should we be so certain that all our observations are produced by a computer program? Isn't this a kind of arbitrary assumption?

2. Which Universal Turing Machine? Solomonoff induction weighs hypotheses by how long they are to write as programs on a Universal Turing Machine. But there are many different Universal Turing Machines - which one should we rely on? After all, there exists some convoluted Universal Turing Machine on which "the laws of physics plus Bigfoot standing next door in this particular moment" is actually a very short program, because Bigfoot-next-door is baked into the programming language.

Proponents of the Solomonoff induction like to point out that different choices of the UTM only lead to a finite constant factor difference in how big a probability Solomonoff induction assigns to various predictions, and with unlimited evidence, the results converge. But in practice, I don't have unlimited evidence. I want to decide whether to go next door, and I don't want to be eaten by Bigfoot. If my friend says Bigfoot is 50% likely to be there, I want to have some counter-argument, instead of just shrugging that there exists a UTM under which this is a reasonable estimate.

3. Description length of my observations, not the universe. Our intuition is that the laws of nature should be simple. But if I naively apply Solomonoff induction to my observations, the shortest program producing what I, David Matolcsi, am observing is not just a description of the laws of the universe. It's the laws of nature plus a pointer to my specific location in the universe. These two pointers together are hopefully still shorter than a raw dump of my observations.[6] But now the simplicity prior operates not just over the laws of the universe, but also over my place in it. According to the Solomonoff-prior that gives probability to all n-long descriptions, the probability that I am in a moment whose shortest description is at least N-long should only be 1/N.[7] This would imply that I'm probably in a simple-to-describe place in the universe, but it doesn't really look like it, especially if I take into account the quantum multiverse.

4. Simulations and malignity. As I explained in my previous post, and as discovered by Paul Christiano and others, the Solomonoff induction is malign. You can read my full post, but here is a brief summary.

It really looks like we are in a very special small region of space-time.

We live in the millennium when it's likely that our species either goes massively multi-planetary or dies. Every species goes through this crucial millennium at most once. Planets absorb only a small fraction of stellar energy, most planets don't naturally spawn life, a millennium is vanishingly short compared to a planet's history, and only a tiny fraction of energy during that millennium sustains biological minds reflecting on things.

This means an extremely small fraction of all negentropy[8] in the history of the universe is used to power biological minds living in their species' crucial millennium. On the other hand, it seems plausible that a technologically mature, galaxy-spanning civilization can capture and put to their own use a large fraction of the negentropy of the universe.

I have no reason to think that the universe that looks like this one has an especially high prior in the Solomonoff-prior compared to many other, similarly large universes that sustain intelligent life. If there is even a one-in-a-billion chance that a powerful space-faring civilization dedicates even a one-in-a-billion fraction of its harvested resources to simulating minds that believe they are biological beings living through their crucial millennium, this vastly outweighs the real instances.[9]

So if it looks like you are living in the crucial millennium of your species' history, you are probably in a simulation. But there are many different possible simulations, some quite short, some quite weird, many basically solipsistic (only simulating one decision of one or a few people). Given that short, solipsistic simulations are much cheaper to run, there are plausibly more of them.

So if you find yourself making a decision that might be important for the future of humanity (and this decision might be as mundane as publishing a blog post), then you should have a significant probability of being in a short solipsistic simulation. But then every probability estimate you make about your future ("will it rain when I step outside?") is heavily influenced by your expectations on what kind of simulation you might be in, and this can lead to very unintuitive results, which are contrary to how we normally think about probabilities.

In particular, if you try to make any important decision based on your all-things-considered probability estimate, then plausibly your probability estimates will be dominated by aliens trying to simulation-capture you to influence the predictions of your copies in base reality. Being influenceable by these simulation-captures is what's called the malignity of Solomonoff induction.

—-

While I think Solomonoff induction is a good starting point, and I will get back to it later in this sequence, I think these problems are serious enough that it's not reasonable to define probabilities as the result of Solomonoff induction. I think Problem 3 may be solvable with a different formalism (I will write more about this in my next post), but Problems 1, 2 and 4 afflict all formalized priors I can think of.

This makes me think that defining probabilities based on a formal prior is not a very useful concept, and doesn't really match how we normally think about probabilities.

Probabilities as betting odds

For most confusing philosophical questions, I think the best way to get out of the definitional quagmire is to try to form the questions in a way that is action-relevant. If I need to make an actual decision in a (possibly hypothetical) situation, that often clarifies my thinking, and dissolves the semantic squabbles that were irrelevant to the main question.

In the case of probabilities, I think it's often best to think of them as the betting odds at which I'd be indifferent between betting in either direction.

If the weather forecast says 37% chance of rain, and I trust it, then I'd accept a bet at 30% odds on rain but not at 40%. The point of indifference is 37%, so that's my probability. There must always be one set of betting odds at which I'm indifferent to betting, so this can be a coherent definition of probabilities.

Some people don't like these betting-based definitions, and insist that there must be something more real in probabilities than just how one would bet.[10] I will write more about this in a future post, but for now I will just say that I'm myself very sympathetic to thinking in terms of bets. I believe basically everything can be formulated as a "bet", and I don't quite see what could be there about probabilities that can't be phrased this way.

"What do you anticipate happening?" From my perspective, anticipation is nothing else than thinking about the consequences of an event. That's useful if the event happens, and a waste of time if it doesn't. Therefore, whether I anticipate an event translates to whether I want to bet my time on thinking about it.

"Aren't you surprised by this event?" To me, surprisal is just getting into a situation that I didn't make plans for. It's equivalent to losing a bet: I wagered my time on thinking about the consequences of the other possibility, but the outcome that I didn't bet on had come to pass.

This leads me to believe that thinking in terms of what bets I would make is all there is to say about probabilities. However, the terms of the bets often get confusing, and I will eventually need to conclude that in some cases, thinking about probabilities is just not the right thing to do at all.

Sleeping Beauty

Before I go further in exploring this betting-based definition, I will introduce a famous puzzle in anthropics which will help illustrate some difficulties.

Sleeping Beauty is put to sleep by researchers. During the two days that her sleep will last, the researchers will briefly wake her up either once or twice, depending on the toss of a fair coin (heads: once; tails: twice). After each waking, they will put her back to sleep with a drug that makes her forget that waking. When Sleeping Beauty is woken up, what probability should she give that the coin toss is heads?

Some argue the answer should be ½: after all, she is predicting the result of a fair coin flip. Some argue it should be ⅓: if the experiment happened many times, then only about ⅓ of Sleeping Beauty's wake-ups would happen in situations where the coin landed on heads.

Sleeping Beauty taking bets

Let's try to solve this puzzle in terms of the betting-based definition.

Whenever Sleeping Beauty wakes up, she is offered a choice to bet $1 on the coin coming out heads. What are the betting odds where Sleeping Beauty should be indifferent to entering the bet?

With this operationalization, the answer is clearly 1/3: that translates to Sleeping Beauty making a bet at each awakening that she will pay $1 if the coin came up tails, and will gain $2 if it came up heads. Looking at this from before the experiment started: with 50% probability, the coin will land on heads, Beauty will be awakened once and will gain $2 on the bet. With 50% probability, the coin will land on tails, she will be awakened twice, and will lose $1 twice. This strategy generates 0 money in expectation, so a bet with an implied probability of 1/3 is what makes Sleeping Beauty indifferent.

The trouble with money-based definitions

However, operationalizing probabilities through monetary bets gets funky pretty quickly. What's the probability of hyperinflation in the next 10 years? If I operationalize "is it above 10%?" as "would I prefer one dollar conditional on no hyperinflation, or ten dollars conditional on hyperinflation?"—well, ten dollars during hyperinflation isn't worth much.

And it's not just inflation. Money's value correlates with all sorts of things. A marginal dollar has different value depending on how rich you will become. For a utilitarian, the value of a dollar is also dependent on how much leverage you have over the future; a dollar is more valuable if you have more leverage. For example, the number of alien civilizations affects your estimate of humanity's expected share of cosmic resources, and therefore affects how much you can expect to influence the cosmos from spending a dollar on AI safety work today. So it becomes confusing to operationalize your probabilities on whether aliens exist in the lightcone via hypotheticals on which odds you would bet on it.

All of this means that defining probabilities in terms of monetary bets is often not the right choice.

Betting on experiences

It might be more useful to imagine betting on experiences. The probability of an event is 10% if I'm indifferent between savoring a piece of chocolate if the event occurs versus savoring a piece of chocolate if a random number generator rolls below 0.10.[11] I think Paul Christiano uses a definition like this in this comment to operationalize the probability of being in a simulation.

However, this seemingly reasonable definition also leads to some pretty strange places. For example, let's see how this changes the Sleeping Beauty analysis.

Suppose that whenever Beauty wakes up, she can receive a piece of chocolate if the coin landed on heads, or receive a piece of chocolate if an independent random number generator produces a number below p. We can define the p for which she is indifferent between the two choices as her probability of the coin landing heads.

This boils down to a value judgement: is waking up twice, eating the same type of chocolate both times, then forgetting both, twice as valuable as eating it once then forgetting it? If you think yes, it's exactly twice as good, then you should bet with ⅓ implied probability.

But you could also think that eating a chocolate once, or going through the exact same experience twice in a memory-wiped state are equally good. Then if you bet on heads, you get the experience with ½ probability, and if you bet on the random number generator, you get the experience at least once with probability. So the point of indifference is when , so according to this definition, Sleeping Beauty should give a probability to the coin landing on heads.

If you believe that eating two identical chocolates and forgetting them is somewhat better but not exactly twice better than eating the chocolate once,[12] then under this definition, your probability of heads should be somewhere between 0.333 and 0.382, depending on your exact philosophical views.

I think the Sleeping Beauty problem is not just an edge-case. This dependency on your philosophical views on copied experiences is something that pops up whenever you try to reason about simulations and infinite universes if you define probabilities using the bets on experiences.

This is a pretty unnatural way for probabilities to work, so if you insist on defining probabilities, we should look for something else.

Betting on terminal values

Perhaps the cleanest definition uses an even more hypothetical terminal value: a new happy planet appearing somewhere far away, unaffected by anything on Earth. "Would I prefer a happy planet to appear if there's hyperinflation, or a happy planet to appear if the RNG rolls below 0.10?" If I'm indifferent, hyperinflation has a 10% probability, because the planet is far away and unaffected by indirect correlations.

In the Sleeping Beauty question, I think I'm back at ⅓ implied probability with this definition.

Unfortunately, even this breaks down for sufficiently abstract questions. "What's the probability of being in a simulation?"—where does the planet appear, inside or outside the simulation? "How many alien civilizations exist?"—depending on some philosophical considerations, at some point adding an extra planet to the already teeming alien life might have diminishing returns in value.

Altogether, I don't think there is a clean definition of probabilities based on betting that makes probabilities a useful concept in full generality.

Probabilities for the exotic and the mundane

Ultimately, what matters is not how I define probabilities, but how I make decisions. I will argue in my next two posts why I am mostly acting in a way as if I was assuming a materialistic world-view and that we are outside the simulation.

Under these assumptions, probabilities are a useful abstraction.

Probabilities in the mundane world

For mundane questions—rain, hyperinflation, AGI timelines—I mentally translate "probability" to what implied probabilities I would bet with if I was betting on far-away planets appearing, assuming that we don't live in a simulation and assuming a materialistic world-view.

Imagining probabilities in terms of these bets on terminal value is a useful definition for me. When I'm deciding whether to bring an umbrella with myself, I have some intuitive estimate of how much productivity it would cost me to get drenched in the rain and how much productivity it would cost to spend time on carrying and storing my umbrella. I try to work on things that matter for my terminal values, so productivity translates to value. So once I know how I would bet in terms of terminal values (e.g. far-away happy planets appearing), I can use that information in an expected value calculation for various decisions related to rain: whether to bring an umbrella, whether to bring a rain jacket, whether to invest in farm-land, etc. This makes probabilities a useful abstraction for mundane questions.[13]

Letting go of probabilities

For philosophically confusing questions involving anthropics and the simulation hypothesis, I refuse to answer with probabilities and instead ask what exact bet we are hypothetically making, or what action we need to decide on. This makes me reluctant to pick a side in the SIA vs SSA debate in anthropics; I just don't believe it's the right level of abstraction to ask these questions. (Though SIA is generally closer to the mark in my opinion.)

Similarly, I can't in good-faith respond with probabilities to questions that don't make sense under materialistic assumptions, like "what is the probability that Jesus rose from the dead?" Amending "…assuming a materialistic universe" defeats the purpose of the question. It's a somewhat awkward position that I can't give straightforward probabilities if someone asks about Jesus, and instead I need to say that "for complicated philosophical reasons, I'm mostly acting as if he was an ordinary human".[14] But I maintain that there is no good way to put probabilities on this question - Jesus rising from the dead is deep into the territory where probabilities stop being a useful abstraction.

Once I give a probability to Jesus rising from the dead, how do I deal with Pascal's Wager, with infinite reward standing on one side? In my next posts, I will discuss infinite ethics and dealing with the supernatural, but this will require going beyond natural notions of probabilities.

Also, if you insist on using probabilities, what is the probability that you are in a short solipsistic simulation now? And given that you are reading about Jesus right now, what's the probability that Jesus is indeed a centrally important character in a larger simulation and now the simulators are just testing how you are thinking about this character? As I said above, I ignore simulations when asked for probabilities of mundane events, and I will present arguments for this choice in a later post. But given how similar gods and simulators are, it feels unfair to silently add "assuming we are not in any kind of simulation" when someone asks a question about the Son of God.

Finally, if you want to define probabilities outside mundane questions, you need to have some resolution to the SIA vs SSA question in anthropics. I'm sympathetic to Joe Carlsmith's arguments that SIA is generally more reasonable, and this would imply that we should accept the Presumptuous Philosopher's logic that we are more likely to be in worlds with more observers similar to us. But how does this interact with the supernatural? Did you know that a prominent strain within Mormon theology claims that we are in an infinite causal chain where people ascend to godhood and create new worlds - a chain of creation without start or end?[15]

I will try to deal with all these considerations about the supernatural in a later post, but that will not be based on the concept of probabilities anymore.

Conclusion

Altogether, I think probabilities are a useful abstraction under some circumstances, but for the more complex questions I need to fall back to a basic question:[16] I want to choose between action A and B, and taking into account all considerations, I want to know which action leads to a better world according to my values.

Of course, this is easier said than done. When I'm deciding whether to bring an umbrella with myself, I'm helping the versions of myself that live in worlds where it's going to rain, and I'm inconveniencing the versions of myself that live in worlds where it's not going to rain. So I will need some method to weigh against each other the consequences of my actions in infinite possible worlds. I will write more about my proposed solutions in my next posts, but I believe that probabilities are not the right abstraction to handle these questions in general.

  1. ^

    For the avoidance of doubt: The views and opinions of the author expressed herein are personal and do not necessarily reflect those of the European Commission or other EU institutions.

  2. ^

     I’m only familiar with the LessWrong line of thought on these topics. I’m woefully unaware of the academic philosophy tradition, and I’m possibly rediscovering ideas that appeared there too.

  3. ^

     If the prior probabilities were only proportional to then the overall probabilities of n-length programs would add up to 1 for every n, and the full sum would be infinite. So we need a somewhat stronger decay in probabilities - now the overall probability of n-length programs is , and the sum of these is finite. We could have also chosen a different decay factor that ensures a finite sum.

  4. ^

     That means there is no algorithm that can compute the Solomonoff-prior of strings up to arbitrary precision.

  5. ^

     I think the problems of embedded agency might be important; I just haven’t really engaged with them yet.

  6. ^

     Otherwise, if I believed there were no universe laws plus location pointer that were simpler than my raw observations, then I’d basically think of myself as a Boltzmann-brain and I couldn’t predict any next observations.

  7. ^

     The overall prior of all n-long descriptions is , and summing from N to infinity is approximately 1/N.

  8. ^

     I’m not a physicist and I’m not actually sure that negentropy is the right term here, but something like this seems right.

  9. ^

     There is some complication that maybe the real crucial millennium has unusually short description-length, so it gets relatively large weight within the universes. But I believe that the rest of space-time likely still holds much larger weight, so turning a fraction of that into simulations still outweighs the real crucial millennium.

  10. ^

    For example, Joe Carlsmith expresses skepticism of defining everything through betting in this post.

  11. ^

     I love chocolate.

  12. ^

     This is the view that matches my intuition.

  13. ^

     Of course, in practice, when I’m deciding whether to bring an umbrella with myself, I’m not thinking exclusively in terms of work productivity. I’m often thinking in terms of how things would make me feel. Ideally I would only take my well-being into account to the extent it matters for productivity and wisdom to make the world better. In the rest of this series, I will implicitly rely on the assumption that my only goal is trying to pursue the scope-sensitive Good (otherwise, the entire theory I’m building here kind of goes haywire). I actually aspire to live like that, though of course I can’t promise I’m always living up to this ideal - the spirit is willing but the flesh is weak. 

  14. ^

     I will write a bit more about how I relate to existing religions in a later post.

  15. ^

     I would love to read someone sincerely making this SIA argument for Mormonism. Unfortunately, I couldn’t find any examples of this on the internet.

  16. ^

     Arguably the only important type of question that exists




Discuss

Your Left Brain Doesn't Trade With Your Right

Новости LessWrong.com - 23 мая, 2026 - 18:12

[see also Four Ways Learning Economics makes you people dumber future AI]

This is a tweet by Seb Krier that caught my eye. The exact person and exact points are incidental. It illustrates what to is a flaw in many 'economics' frames on AI. 

Expecting a model to do all the work, solve everything, come up with new innovations etc is probably not right. This was kinda the implicit assumption behind *some* interpretations of capabilities progress. The ‘single genius model’ overlooks the fact that inference costs and context windows are finite.

(...)  People overrate individual intelligence: most innovations are the product of social organisations (cooperation) and market dynamics (competition), not a single genius savant. 

 (...) Most of the *value* and transformative changes we will get from AI will come from products, not models. The models are the cognitive raw power, the products are what makes them useful and adapted to what some user class actually needs.

This seems to me missing something incredible important about what Artificial General Intelligence will actually be.  [1] There is a certain type of economist [eg Tyler Cowen] that will proclaim AGI is near [or even already here!] and apply their standard economics tools to confidently proclaim AGI will not be dangerous, or it won't meaningfully impact growth rates, or it will adhere to human contracts and all this AI safety stuff is silly nonsense, even regulatory capture!

AGI as a Tool; AGI as an Agent 

Let's start with: thinking of AGI as a Tool instead of as an Agent.

"Most of the *value* and transformative changes we will get from AI will come from products, not models. The models are the cognitive raw power, the products are what makes them useful and adapted to what some user class actually needs."

The point of AGI is exactly its generality: learning how to make good products, or scaffolding around ' raw intelligence' is itself a task that can be learned. Indeed it is learned by humans every day. 

The Bitter Lesson, Again and Again Rich Sutton warned us: betting against scale is a losing game. Yet every few months, someone announces their specialized AI that finally beats the frontier models at medical diagnosis or legal reasoning through "clever architecture" or "curated data." A few months later? The next GPT or Claude absorbed their innovation and surpassed them while simultaneously improving at everything else. The Bitter Lesson isn't just about chess or Go anymore - it's about everything. Specialized training on curated datasets can't compete with the universal learner trained on everything. Economists predicting stable specialization are making the classic mistake Sutton identified: betting on human ingenuity over computational scale.

Expecting a model to do all the work, solve everything, come up with new innovations etc is probably not right. This was kinda the implicit assumption behind *some* interpretations of capabilities progress. The ‘single genius model’ overlooks the fact that inference costs and context windows are finite.

People overrate individual intelligence: most innovations are the product of social organisations (cooperation) and market dynamics (competition), not a single genius savant. 

There is a remarkable uniformity and linearity in the AI capabilities of AI models. To a very good approximation AIs can be pretty linearly ordered in their capabilities. The frontier models produced by OpenAI, DeepMind, xAI, Anthropic are simulatenously the SOTA for virtually all AI tasks[2] . There aren't really specialized models doing all kinds of specialized work. Rather it is overwhelmingly the case that virtually all tasks that can be done well by AI are done best by frontier large language models. 

Why is this the case? AIs are trained on the whole of the internet. Any innovation that is made by one company is quickly absorbed by the others. New workflows, scaffolding, tools, specific business contexts can be absorbed through extra finetuning, in-context learning or simply more compute. Vision and image generation is easily integrated into a larger multimodal large language model. 

There is not much economic sense in training many different AIs. Nor is there much sense in building specialized AIs trained on only specific data sets. On the whole you want to spend as much of your compute on as much data as you can on one mega model.  

One Big Transformer

Actually instead of 'general intelligence' I think it's better to talk about 'universal intelligence'. In other words, an intelligence that can absorb the skills and abilities of any other intelligence. We have some idealized formal models [solomonoff induction, AIXI] of what a universal intelligence might look like. 

These mathematical models are highly idealized of course but they come down to a remarkable idea: one can simply amalgate different intelligences/minds/AIs into one big intelligence/mind/AI that is (almost) as good at any task as any of its constitutent intelligences/minds/AIs. Ultimately all intelligence may be absorbed into one super universal singleton intelligence. 

Current AI already looks remarkably like this idealization. In a way LLMs are closer to this idealized universal intelligence than humans. 

Humans can't directly amalgate into one big super smart human. Humans can't directly share their thoughts, knowledge or abilities. Their abilities are limited by the size of their skulls, the length of their lives, the limits of their senses. 

Humans can share the contents of their minds much better than lower animals using language. Indeed is oft argued that language is the reason the human species rules over the lower animals. Using language humans can share skills, knowledge, abilities, and coordinate strategies over vast distances in space and time to many other humans simulatenously. 

 

Transformers combining into one big transformer

The Dismal Science 

Humans can't directly unify into one big human. This neccesitates complex coordination mechanisms for coordinating efforts, this includes culture, institutions and  markets. Society retains specially trained humans to analyze these mechanisms. We call these economists. Despite a constant barrage of criticism from their envious social science and humanity cousins, economists have been fantastically succesful in their ability to describe and prescribe society.

Consider what enables economics: agents can't share their internal states, learning is costly and slow, knowledge transfer is lossy, coordination requires negotiation, and capabilities are rivalrous (if I'm using my brain for law, I can't simultaneously use it for medicine).

For AGI, none of these constraints may be relevant. Minds can fork and merge. Training can be instant through weight sharing. Coordination happens at silicon speed without contracts. When one AI masters a new domain - say, protein folding or contract law - it won't need to teach others through language or demonstration. It will simply share the relevant weights, like copying a file. The receiving AI instantly acquires years of "experience" in milliseconds.  

Will AGI need currency? Currency exists because humans can't directly verify and compare utility functions. Will it need prices? Prices exist because information about preferences and production possibilities is distributed and hidden. Will it need contracts? Contracts exist because commitments can't be directly verified and trust must be manufactured. Will it need property rights? Property rights exist because rivalrous goods require allocation mechanisms. When a unified intelligence can directly observe all its subsystems, perfectly coordinate its actions, and share all resources optimally - these mechanisms become vestigial, like discussing the TCP/IP protocol between neurons in a brain.

The Comfort of Familiar Frames

The insistence that institution and culture and economics and multiagent systems will be a useful frame to look at the nature of AGI is widespread. This is implicit in the otherwise revolutionary Hanson's " world of Em", in Eric Drexler's " AI services", "a datacentre of geniuses",  and all over economist's models of the future of AI. 

But is it a good frame? Economics is most relevant when there are many different individuals with different skills, abilities, knowledge, etc that nonetheless are attempting to coordinate. 

Economists studying AGI like a market phenomenon may be akin to biologists studying computers through the lens of evolution - technically possible, occasionally insightful, but probably fundamentally missing the point. The economic frame persists not because it's accurate but because it's comfortable. It allows experts to feel relevant without confronting the possibility that their expertise might become obsolete.

The economists' frame may be precisely inverted. They're trying to understand a unified intelligence through the lens of coordination mechanisms that exist only because unified intelligence is impossible for humans. The question isn't "how will AGIs trade?" but "why would they remain separate enough to need trade?"

Your Left Brain Doesn't Trade With Your Right

Consider again Seb Krier's claim:

People overrate individual intelligence: most innovations are the product of social organisations (cooperation) and market dynamics (competition), not a single genius savant. 

Human innovations often emerge from collaboration rather than isolated genius. But economists mistake this for a fundamental truth about intelligence rather than a workaround for human limitations. We collaborate because we can't fit all knowledge in one head, can't live long enough to master everything, can't directly share our trained neural patterns. A universal AGI doesn't collaborate with itself any more than your left and right brain hemispheres engage in "trade." It simply thinks, with perfect internal coordination that makes our best institutions look like children playing telephone.

A much better analogy for the future and nature of AGI may be that of a superintelligent (benevolent?) hivemind. 

  1. ^

    Perhaps Seb and his intellectual ilk don't believe AGI is possible or are very sure it will take another thousand years. This would seem to be at odds with the enormous amount of progress we have seen in recent years. It would seem that at the very least they should flag this rather extreme epistemic position and I'm a little skeptical they will defend this position when pressed. It is nevertheless possible that they believe this; I don't know this specific person well but I doubt it. 

  2. ^

    with some minor exceptions, eg Go



Discuss

Страницы

Подписка на LessWrong на русском сбор новостей