LessWrong.com News
Why We Should Talk Specifically Amid Uncertainty
I am often frustrated by those who promote vibes and deliver aimless soliloquies. We would often be better served by speaking more specifically, more concisely, and more boldly. From the average meeting room to the American political landscape, we are harming ourselves by speaking vaguely, and current roadblocks in policymaking across many facets of society are exacerbated by unspecific and unserious discourse. Speaking specifically and with intent is not just a political and social imperative; it is instrumentally useful.
Spend more time to speak less
If I had more time, I would have written a shorter letter
- Blaise Pascal
Any student learns that their opening paragraphs are the most important for introducing their argument and intent. Writing this way serves two critical functions: it frames the rest of the paper for the reader, and it frames it for the author. A common adage is that concise writing is the product of thorough writing. A good revision process forces you to reevaluate your intent for each sentence, which reveals redundant, awkward, or dangling ideas. A good introduction and thesis force you to recursively reevaluate every idea, argument, and paragraph. By stating your intentions, you can tell yourself what's important and what can be omitted.
Speaking is a very similar process. I've had the privilege to deliver many presentations to peers or managers throughout high school, university classes, and internships. I competed in Lincoln-Douglas NSDA Debate for three years, led my Boy Scout troop for a short stint, and have presented technical projects, at separate times, to my school administration, internship managers, and corporate leadership. I am also a very nervous speaker, and despise most forms of public speaking. I still often shake when I speak, but equipping myself with speaking intuition has given me enough confidence to survive. The most important guideline for speaking is to speak with intent and announce your intent. Part of this is, like a good comedian, knowing your audience. Separating what your audience needs to know from what they don't is a vital skill.
What this looks like in practice is that when you give a presentation, announce clearly, even if it appears awkward at first, what you want the audience to take away from your presentation. This need not be the first sentence in your presentation; like in writing, you should soften this with a clean introduction. In many semi-professional or professional settings where your presentation is a part of a meeting, this should include an evaluation of what input others need to provide. Understanding why you're having a meeting instead of sending someone a message or asking someone 1-on-1 forces you to ask tough questions about the intent behind the meeting. If you know what actionable decision or question you need answered in a meeting, then you know what to ask, which provides hints at what you need to present as context. Doing this helps avoid the complaint that "This meeting could have been a[n] {email|slack message|text|teams message}."
If a meeting diverges from a topic where attendees can make an actionable decision, then someone should steer it back on track. Actionable decisions are key here; vague goals like "getting on the same page," or anything involving the word "vibes," do not qualify as actionable decisions. Employees cannot write design documents or reports, run experiments, or engineer products based on vibes and vague agreements. In a universe where time is finite and a world where time is money, being intentional with your speech is imperative for the health of an organization. A single employee who holds entire teams hostage for a sizable amount of time, for no reason, can cost a business thousands if not millions of dollars, depending on the meeting and which levels of leadership are involved.
Long, purposeless meetings are thus not the grand design of a malevolent capitalist force bent on wasting the precious time of workers, but the result of poor intentionality and planning. The good news is that, since no omnipotent force drives this corrosion, anyone is empowered to right the wrong.
See also Patrick Winston's lecture on How To Speak. The CIA Manual for Sabotage also reads, appropriately, like the exact opposite of the advice I've just given.
The CIA Manual for Sabotage
The CIA declassified a World War II manual on sabotaging organizations. I've copied and reformatted a few sections I think are relevant.
General Interference with Organizations and Production
Organizations and Conferences
- Insist on doing everything through "channels." Never permit short-cuts to be taken in order to expedite decisions.
- Make "speeches," Talk as frequently as possible and at great length. Illustrate your "points" by long anecdotes and accounts of personal experiences. Never hesitate to make a few appropriate "patriotic" comments.
- When possible, refer all matters to committees, for "further study and consideration." Attempt to make the committees as large as possible - never less than five.
- Bring up irrelevant issues as frequently as possible.
- Haggle over precise wordings of communications, minutes, resolutions.
- Refer back to matters decided upon at the last meeting and attempt to re-open the question of the advisability of that decision.
- Advocate "caution." Be "reasonable" and urge your fellow-conferees to be "reasonable" and avoid haste which might result in embarrassments or difficulties later on.
- Be worried about the propriety of any decision - raise the question of whether such action as is contemplated lies within the jurisdiction of the group or whether it might conflict with the policy of some higher echelon.
- Demand written orders.
- "Misunderstand" orders. Ask endless questions or engage in long correspondence about such orders. Quibble over them when you can.
- Do everything possible to delay the delivery of orders. Even though parts of an order may be ready beforehand, don't deliver it until it is completely ready.
- Don't order new working materials until your current stocks have been virtually exhausted, so that the slightest delay in filling your order will mean a shutdown.
- Order high-quality materials which are hard to get. If you don't get them argue about it. Warn that inferior materials will mean inferior work.
- In making work assignments, always sign out the unimportant jobs first. See that the important jobs are assigned to inefficient workers or poor machines.
- Insist on perfect work in relatively unimportant products; send back for refinishing those which have the least flaw. Approve other defective parts whose flaws are not visible to the naked eye.
- Make mistakes in routing so that parts and materials will be sent to the wrong place in the plant.
- When training new workers, give incomplete or misleading instructions.
- To lower morale and with it, production, be pleasant to inefficient workers; give them undeserved promotions. Discriminate against efficient workers; complain unjustly about their work.
- Hold conferences when there is more critical work to be done.
- Multiply paper work in plausible ways. Start duplicate files.
- Multiply the procedures and clearances involved in issuing instructions, pay checks, and so on. See that three people have to approve everything where one would do.
- Apply all regulations to the last letter.
...
- Work slowly. Think out ways to increase the number of movements necessary on your job: use a light hammer instead of a heavy one, try to make a small wrench do when a big one is necessary, use little force where considerable force is needed, and so on.
- Contrive as many interruptions to your work as you can: when changing the material on which you are working, as you would on a lathe or punch, take needless time to do it. If you are cutting, shaping or doing other measured work, measure dimensions twice as often as you need to. When you go to the lavatory, spend a longer time there than is necessary. Forget tools so that you will have to go back after them.
...
There is no prize for perfection, only an end to pursuit
- Viktor (Arcane)
I was privileged enough to attend EA Global in New York City in October of last year. Between meeting with AI Safety researchers and policymakers and trying an assortment of vegan meals (and soylent), I sat in the basement of the Sheraton in Times Square, in a sterile hotel meeting room, listening to a longtime staffer at the Department of War (formerly the Department of Defense). He gave a short lecture on theory of change to those interested in AI Safety policymaking, and it was, for me, the most interesting speech I heard all weekend. In between in-jokes about shrimp welfare, he criticized the Rationalist/EA community for its failures to promote policy, a criticism that, I believe, extends to most, if not all, center, center-left, and progressive political groups. To him, the Rationalist and EA communities are full of idealists and scientists, but policymaking is neither ideal nor a science; it's an art, or if you like, engineering. Policies are inherently imperfect because they operate in a fundamentally imperfect world. They compromise, they bend, and, sometimes, they break.
In communities where members tiptoe gingerly around sensitive subjects and strive for idealistic purity, attaching yourself to bold policy makes you vulnerable to criticism. Would-be promoters often shirk the responsibility altogether, or stack on enormous qualifiers that render their advocacy meaningless. This is a natural, if self-defeating, instinct, and a tragedy of the commons: by not attaching yourself to imperfect, unpopular policies, you avoid the ideological litmus tests and criticism others will almost certainly throw at you, but the cultural effect is to chill the promotion of any specific and actionable policy, turning the entire movement into a giant meeting about ideas and "getting on the same page." He asked the audience, trusting it was full of well-meaning, intelligent people, to be more courageous with their advocacy, and we must take his advice to heart. AI safety, climate change, and global health require specific and actionable policy, not ideas, not buzzwords, and certainly not vibes.
While the Rationalist/EA/AI Safety communities have dedicated years to preparing the world for transformative AI, we do not have definite, specific policy proposals floating around for policymakers to pick up that would advance AI safety and societal well-being. And I have a strong suspicion that we will need specific, actionable policy that materially affects many people very soon. Given the growing backlash toward AI in popular culture, driven by rising utility costs, rising consumer hardware prices, environmental concerns, and intellectual property concerns, I expect a major political realignment, at least within the United States, sometime soon (O(~1.5 years)) that might primarily revolve around these issues. The timeline until the general public cares about these issues may be less important than the timeline until transformative AI exists, but it might also be shorter. Without clear policies that can balance these concerns with AI safety concerns, we could see populist rhetoric prevent the important work that needs to be done.
I'd be hypocritical not to take a stand for a major policy, though I'll qualify that I only know the American political landscape well. I'm a big believer in eliminating bureaucratic inefficiencies and expanding infrastructure. A version of the Green New Deal that expands electric infrastructure, in conjunction with data center build-outs, would reduce the cost of electricity such that new data centers don't materially hurt the average citizen through higher prices. Better electric public infrastructure would also reduce daily transportation costs, and upgraded electric infrastructure provides the opportunity to secure the electrical grid for national security purposes and provide resilience. Is the GND a perfect policy? No. More recent versions have been vague House resolutions, not actual bills. But it's a large-scale policy that materially affects people's lives and might solve many of the issues we face.
The Case for Courage
The penultimate note from the policymaking talk at EA Global was a quote from economist Milton Friedman:
Only a crisis - actual or perceived - produces real change. When that crisis occurs, the actions that are taken depend on the ideas that are lying around. That, I believe, is our basic function: to develop alternatives to existing policies, to keep them alive and available until the politically impossible becomes the politically inevitable.
We are in crisis. Our economy is not growing, our democratic republic is weakening, and we're on the precipice of drastic technological change. Our policymakers are scared of policy; like deer in the headlights, they are petrified of the very people who chose them. The solution is clear: say what you mean. Policymaking is iterative, so let the first iterations be wrong and unpopular. Refine the idea, change it, and keep it alive. Without discourse about real, specific policy, we may find ourselves ideating about a perfect world while the opportunity to create a better one slips away.
For Democrats, AI Safety Hawks, Progressives, and anyone I know who is sane, rational, and well-meaning, courage is required. Write and promote specific actionable policy; be wrong; be bold; be courageous. Talking about vibes makes for good TV, but only policy makes leaders fit to lead.
See also: https://www.ettingermentum.news/p/against-vibes
See also: https://www.lesswrong.com/posts/CYTwRZtrhHuYf7QYu/a-case-for-courage-when-speaking-of-ai-danger
Companies as "proto-ASI"
We don’t have AI that’s smarter than you or me, but I believe we do have something somewhat similar, and analysing this thing is useful as an argument in favour of ASI not being aligned to humanity’s interests by default.
epistemic status: I largely believe this argument to be correct, although it’s quite hand-wavy and leans on analogy a bit more than I’d like. Despite (or possibly because of) this, I’ve found it incredibly useful for explaining to (non-technical) relatives and friends why I don’t believe ASI would “just be kinda chill”. While the argument might be flawed, I strongly believe the conclusion is correct, mostly due to more thorough arguments that are trickier to explain to relatives over Christmas dinner.
Large corporations exist, and are made up of 100-10k individual human brains all working in (approximate) harmony. If you squint, you can consider these large corporations a kind of proto-ASI: they’re certainly smarter and more capable than any individual human, and have an identity that’s not tied to that of any human.
Despite these corporations being composed entirely of individual people who (mostly) would all like to be treated well and to treat others well, large corporations consistently act in ways that do not aim to maximise human prosperity and happiness. One example is how social media is designed to maximise advertising revenue, to the detriment of all else. There are many other real-world examples: Volkswagen cheating on emissions tests, ExxonMobil funding climate change deniers, various tobacco companies denying the health effects of smoking, or Purdue Pharma not disclosing the known addictive side-effects of OxyContin.
To make this clear: every company is an existence proof of a system that’s smarter than any individual human, is not “just kinda chill”, and is not aligned with human well-being and happiness. This is even more damning when you consider that companies are made up of individual humans, and yet the end result is still something that’s not aligned with those humans.
Given that large corporations exist today, and that they have values/goals significantly different from those of most people, I’m very doubtful that any ASI we build will have values/goals that are aligned with most people.
You might argue that corporations have values/goals aligned to the humans making up their board of directors, and I’d agree. But the analogous situation with ASI (where the ASI is aligned only to a small number of people, and not humanity as a whole) is also not good for humanity.
AXRP Episode 47 - David Rein on METR Time Horizons
When METR says something like “Claude Opus 4.5 has a 50% time horizon of 4 hours and 50 minutes”, what does that mean? In this episode David Rein, METR researcher and co-author of the paper “Measuring AI ability to complete long tasks”, talks about METR’s work on measuring time horizons, the methodology behind those numbers, and what work remains to be done in this domain.
Topics we discuss:
- Measuring AI Ability to Complete Long Tasks
- The meaning of “task length”
- Examples of intermediate and hard tasks
- Why the software engineering focus
- Why task length as difficulty measure
- Is AI progress going superexponential?
- Is AI progress due to increased cost to run models?
- Why METR measures model capabilities
- How time horizons relate to recursive self-improvement
- Cost of estimating time horizons
- Task realism vs mimicking important task features
- Excursus on “Inventing Temperature”
- Return to task realism discussion
- Open questions on time horizons
Daniel Filan (00:00:09): Hello everybody. In this episode I’ll be speaking with David Rein. David is a researcher at METR focused on AI agent capability evaluation. To read the transcript of this episode, you can go to axrp.net, you can become a patron at patreon.com/axrpodcast, and you can give feedback about the episode at axrp.fyi. All right, David, welcome to the podcast.
David Rein (00:00:31): Yeah, thanks for having me.
Measuring AI Ability to Complete Long Tasks
Daniel Filan (00:00:32): So I think the work that you’ve been involved in that’s probably best known in the AI existential risk community is this paper that METR put out with a whole bunch of authors – I think the lead author is Thomas Kwa – “Measuring AI Ability to Complete Long Tasks”. What’s going on with this paper?
David Rein (00:00:51): Yeah, so Thomas Kwa and Ben West co-led the project. Basically the typical way we measure progress in AI is via benchmarks. So a benchmark is a set of tasks that you have an AI system – this could be a neural network or an agent or whatever – you have it try and complete the tasks and you count up how many of the tasks did the model succeed at. And when you create the benchmark, typically models do very poorly, and then over time people iterate and you can track progress on the benchmark, and eventually, typically, AI developers will achieve “saturation”. So model performance will either reach 100%, or there’ll be some errors in the benchmark and the model will do as well as it can be reasonably expected to do (because we think about there being a “noise ceiling” on some benchmarks.)
(00:01:58): But regardless, the point is that: you start out, models do poorly; some time passes, people improve them, and then they get better. It’s difficult with normal benchmarks to track progress over a very long period of time because benchmarks are typically restricted to either some particular domain or the tasks in benchmarks have a somewhat similar level of difficulty. And so to try and understand how progress in AI happens over a span of many years, before this work, the status quo was comparing different benchmarks to one another. So you’re like: it’s 2017 and you have these simple problems for models, and were like, okay, models can start doing those. And then now it’s 2025 and we have these way harder benchmarks, and we’re like, “Yeah, we can see that there’s been a lot of progress.” But we don’t actually have a single metric to track this progress. We’re kind of doing this qualitative comparison of the difficulty of benchmarks over time, and this is messy and people have different priors.
(00:03:18): So this work was motivated by trying to have a Y-axis, basically: a way of tracking progress and seeing what the trends in AI progress have been over a longer period of time than individual benchmarks typically have. And so the way we operationalize this is we look at the length of tasks for humans that models are 50% likely (or some percent likely) to be able to succeed at. So we have a really wide range of tasks ranging from a few seconds all the way up to eight or 10 hours. And crucially, this is the time the tasks take for people to complete. And we have a combination of having a bunch of people attempt the tasks and we see how long they take as well as just estimating how long the tasks take. And then for any individual model, we look at… Models do really well on the very short tasks, and then they do much more poorly on the long tasks. And we look at: for some given success likelihood, how long are those tasks? And we estimate this in a particular way that we could get into. But the main takeaway is [that] we want to see, for different models, how long are the tasks they can complete?
(00:05:00): And the very striking thing that we found is that, over the past roughly five years, there’s been an extremely robust systematic trend in the length of tasks that models are able to complete, to our best ability to understand the data that we’re seeing. It seems like this is fit very well by an exponential function. So the length of tasks that models are able to complete has been increasing exponentially over this period. There are big questions over how well we can expect this to continue in the future. But it seems like over this period, at least with this data that we’ve collected, there’s been this exponential trend.
(00:05:57): And that’s, I think, the striking result and the key novelty I think for us is this unified metric that can be applied to different benchmarks, for example – for different benchmarks, you can measure “how long do these tasks take people?” for very simple natural language processing benchmarks that were common in the 2010s. These tasks typically don’t take people very long, like a few seconds. And then for a lot of the tasks that people are having agents complete now, like difficult software engineering tasks, these tasks take people somewhere in the range of hours or something and models can sometimes complete those (although they’re still somewhat unreliable.)
Daniel Filan (00:06:45): Got you. Okay. First, before we go in, I guess I’d like to get a sense of what we’re talking about. So you say that there’s some tasks that take seconds, some tasks that take minutes, some tasks that take hours. Can you give me an example of what’s a thing that takes seconds? What’s a thing that takes minutes? What’s a thing that takes hours?
David Rein (00:07:03): Yeah, totally. So one example that’s representative of the tasks that we created that take people a few seconds to complete is: given a few files on a computer, which of these is likely to contain your password? And the file names are “password”, “email”, whatever.
Daniel Filan (00:07:34): I think it says “credentials”, the example in the paper, it’s not quite so–
David Rein (00:07:37): Yeah, exactly. Right.
Daniel Filan (00:07:41): So that’s an easy one.
David Rein (00:07:42): Yeah, that’s an easy one. And we have others that are similar.
Daniel Filan (00:07:48): And to give me a feel for how that relates to AI progress, what’s the first model that succeeds at that easy task?
David Rein (00:07:54): Yeah, that’s a great question. So GPT-2 succeeds. GPT-2 is actually the first model we tested. So I actually don’t know if earlier weaker models would succeed. I actually would bet that they would. I would bet that BERT is able to do this, for example. But yeah, we only went back to 2019.
Daniel Filan (00:08:19): Got you. And then to give me a feel for what it means for an AI to complete this task: so GPT-2… my understanding is that it’s basically just text completion. My understanding is that in the release it did not have tool use capabilities or stuff that modern LLMs have. So what are you actually doing to start with GPT-2 and end with, “does it succeed or fail on this task?”
David Rein (00:08:49): There are different things you can do I think that are reasonable here. I can’t remember the specific one we ended up on in the paper, but one example is just looking at the likelihood that the model puts on these options. So passing in the input and then the question and then seeing… GPT-2 is a language model, and so it outputs likelihoods for tokens that are passed in. And you can just compare the likelihoods and see. I think this would be a reasonable baseline.
Daniel Filan (00:09:27): Yeah, and I guess this is less of a computer use thing than a multiple choice thing, so it’s easier to see how GPT-2 could do that one.
David Rein (00:09:33): Yeah, yeah, exactly. So for GPT-2 attempting much longer tasks, you can’t use this same methodology.
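As a rough sketch of the likelihood-comparison idea mentioned above: you can score a multiple-choice question with a base language model like GPT-2 by comparing the total log-likelihood it assigns to each candidate answer. The prompt and file names below are invented for illustration, and this is not necessarily the exact setup used in the paper.

```python
# Illustrative sketch: score multiple-choice options with a base LM (e.g. GPT-2)
# by comparing the log-likelihood the model assigns to each option's tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = ("The following files are on a computer: notes.txt, credentials.txt, "
          "todo.txt. The file most likely to contain a password is:")
options = [" notes.txt", " credentials.txt", " todo.txt"]

def option_logprob(prompt: str, option: str) -> float:
    """Sum of log-probabilities of the option's tokens, conditioned on the prompt.
    Assumes the prompt's tokenization is a prefix of the full tokenization
    (true here because each option starts with a space)."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)     # position i predicts token i+1
    option_tokens = full_ids[0, prompt_len:]
    positions = range(prompt_len - 1, full_ids.shape[1] - 1)
    return sum(logprobs[pos, tok].item() for pos, tok in zip(positions, option_tokens))

scores = {opt: option_logprob(prompt, opt) for opt in options}
print("Model's pick:", max(scores, key=scores.get))
```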
Daniel Filan (00:09:47): Sure. So speaking of longer tasks, that was an example of a very easy task. Can you give me a feel for what an intermediate task might be?
David Rein (00:09:56): Some examples of intermediate tasks that come to mind are simple software engineering tasks or data analysis, or we have some kinds of basic reasoning questions. So one example that comes to mind is: you’re given a short CSV file that just contains some data. It has, I don’t know, 50 or 100 rows of data, and you just have to write a very simple script that is 20 or 30 lines of code to parse this or process it in a certain way. And so this takes an experienced data scientist maybe a few minutes, maybe it takes someone more junior 15, 30 minutes or something. That’s I think a representative example of these intermediate tasks.
The meaning of “task length”
Daniel Filan (00:10:54): Okay. And when you’re measuring time horizon: different people take different amounts of time to do this. What counts as the time it takes humans to do it?
David Rein (00:11:06): So I think there are different reasonable ways of doing this. The way that we approach this is we have… So one thing to say is, in general with the time horizon metric, we are trying to get at something like… One thing you could do, that I think would not give you very interesting time estimates, is you could randomly sample a person in the world, off the street or something, to do each task. I think this wouldn’t be a very useful measure of how long these tasks take people, because in general, those people are not completing these tasks in the real world. And so the thing we’re trying to get at with this metric is, we want it to be very intuitive. We want it to be clear if an AI system can do tasks of X length – of 15, 30 minutes, an hour, two hours – how does that translate into the real world? We want that connection to be very direct, and so we want to have people attempt these tasks that we would naturally expect to be doing these tasks in the world. So we try and have people who have roughly a reasonable amount of expertise in the different areas we might expect to do them. So that’s the expertise sampling question.
(00:12:51): Then there’s like, well, we still have multiple people attempt many of these tasks. Sometimes they succeed and sometimes they fail. And so there’s this question of, well, do we include their failures? Do we just use successful times? I think there’s reasonable discussion about this. One thing it would be nice to do is include their failures, because if we have someone who has a reasonable amount of expertise, but they fail at a task, I think that is information about the task being more difficult. But I think you would need a larger number of people to attempt the tasks in order to actually use that information effectively. You could do something like survival analysis from the medical industry where you know that they failed after a certain amount of time, but it’s possible that they would’ve succeeded in the future.
(00:13:48): But the thing we actually do in the paper is we use the geometric mean of the successful attempts. We use the geometric mean because we think completion time broadly is distributed logarithmically, or sometimes people will take much longer than other people, and we don’t want that to totally dominate the time we’re estimating for the tasks.
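As a concrete illustration of that aggregation choice, with invented numbers: the geometric mean down-weights a single unusually slow successful attempt, whereas the arithmetic mean gets pulled up by it.

```python
# Illustrative sketch: aggregate successful human baseline times with a
# geometric mean, so one unusually slow attempt doesn't dominate the estimate.
import numpy as np

successful_minutes = np.array([18.0, 25.0, 22.0, 95.0])   # hypothetical attempts

arithmetic_mean = successful_minutes.mean()                # pulled up by the 95-minute outlier
geometric_mean = np.exp(np.log(successful_minutes).mean()) # exp of mean log-time

print(f"arithmetic mean: {arithmetic_mean:.1f} min, geometric mean: {geometric_mean:.1f} min")
```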
Daniel Filan (00:14:31): I guess one question I have about that is: so suppose you’re looking at tasks and you’re looking at completion time for the kinds of people who are able to do that task. I worry that that might compress the difficulty ranges. So one intuition here is: how much time does it take people to multiply 67 and 34 by hand? The answer is there’s a pretty large range of people who are able to do that task, and it probably takes them a couple minutes.
(00:15:06): Then you can also ask: how much time does it take people to solve a separable differential equation? Well, if you’re able to do that, it’s actually not that hard – depends if it’s a thing you can integrate easily, but probably, for people who can succeed at that task, it takes about as much time as it takes people who can succeed at the task “multiply these two-digit numbers” to do that. But it seems like there’s some sense in which solving the differential equation is harder. And maybe you want to say, “oh, the thing that’s harder about it is background knowledge and things could just learn background knowledge and we kind of know that.” But yeah: I’m wondering what you think of that worry.
David Rein (00:16:00): Yeah, I think this is a fantastic question that gets at a lot of what’s going on here, what’s interesting about this work. I should say that I think we’re getting into more speculative territory. There are a few things to say. So one is, in terms of the unique value of this approach that we’re taking with this time horizon metric: there are a lot of benchmarks that try and come up with the most difficult-for-people questions and then have AIs try and do them. In fact I think the standard methodology for saying “this AI system is smarter than another one” is that it can do problems that fewer and fewer people can do. So we started out with common-sense questions that most people can do in the 2010s, [and] over the past couple of years, models have been able to do… So I worked on this benchmark GPQA that had very difficult science questions – PhD-level roughly – and models are able to do that now. GPQA I think is mostly saturated or pretty close to it. Models can do International Math Olympiad questions that very few people can do.
(00:17:37): And so I think this is an important axis to measure AI capabilities along – difficulty for people – but I think this misses a lot of what people can do that AI systems can’t do. And one of the key things that we’re trying to get at is: how can we reconcile the fact that models can do these IMO questions, they’re geniuses in some sense, but they’re kind of idiots still? You ask it to book a flight for you… Maybe models can do that now, but even slightly harder things they often fall over on. And so I think that’s the thing we’re trying to get at.
(00:18:28): And so actually, we want to factor out “how much expertise do you need?” And one intuition for what we’re trying to get at is something like “the number of actions that are required to complete this task”. I think this is very difficult to operationalize, or it’s very problematic and mushy, but one intuition at least is that [with] this metric, if we factor out the difficulty of problems and we just look at how long they take people who have a reasonable amount of expertise, then maybe we’re getting closer to something like agency more broadly. And I don’t want to over-claim, I think this is still very much an open area, but for example, number of actions I think is also a very reasonable thing that I would expect to be correlated, although I think it’s probably more difficult to estimate.
Examples of intermediate and hard tasks
Daniel Filan (00:19:27): Fair enough. Getting us out of that rabbit hole for a little bit. So an intermediate-level task that might take, I don’t know, three to 15 minutes for a relevant expert is take some CSV file or something and parse it. And to help us get a sense for that, at what point do language models start being able to succeed at this sort of task?
David Rein (00:19:55): Yeah, language models start being able to succeed… I might get the exact years slightly wrong, but somewhere in the range of 2022-ish is I think where models are able to do this. Actually, maybe backcasting from the trend from where we are now. So the specific trend that we found was that there’s been a seven month doubling time over the past five-ish, six years. Currently, models are able to do tasks with (we estimate) 50% success likelihood that are about two hours long.
Daniel Filan (00:20:43): And “currently” is late September 2025. It may take me a while to edit this episode and get it out, but that’s what you mean by “currently”.
David Rein (00:20:50): Yes, yes. Thanks. And so if we go back, two hours to one hour is early this year, another seven months to 30 minutes is like spring 2024, and then maybe 15 minutes is middle of 2023 or something? I think that should be right. So yeah, actually a bit later than 2022. And so that is… What models are coming out around then? Wait, actually, what models are those?
Daniel Filan (00:21:34): Oh, I don’t know. I hoped you might know.
David Rein (00:21:37): Let’s see. What is the exact timeline here? Something like-
Daniel Filan (00:21:42): Is GPT-4 2023-ish?
David Rein (00:21:45): Yeah, yeah. GPT-4 is beginning of 2023 or end of 2022. One of those. So I think it’s roughly GPT-4-ish, and that kind of lines up with my intuition here.
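The backcasting arithmetic in this exchange is just repeated halving under an exponential trend. A small sketch, taking as given the roughly two-hour horizon and seven-month doubling time mentioned above (the dates are approximate):

```python
# Illustrative sketch: backcast the 50% time horizon under an exponential trend,
# horizon(t) = h_now * 2 ** (-months_back / doubling_months).
doubling_months = 7      # rough doubling time from the 2019-2025 fit
h_now_minutes = 120      # ~2-hour horizon assumed for late September 2025

for months_back in (0, 7, 14, 21, 28):
    horizon = h_now_minutes * 2 ** (-months_back / doubling_months)
    print(f"{months_back:2d} months earlier: ~{horizon:.0f} minutes")
# ~120, ~60, ~30, ~15, ~7.5 minutes: roughly the dates discussed above.
```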
Daniel Filan (00:22:01): Okay. So we’ve got an example of an easy task, an example of an intermediate task. Can you give me an example of a hard task?
David Rein (00:22:10): So the hardest tasks we have take people something like six, seven, 10 hours to complete. One of the sets of tasks that we use actually comes from this benchmark that we released close to a year ago, called RE-Bench, which stands for Research Engineering Bench. So this is a set of challenging ML research engineering tasks. One example is: you’re given a neural network whose embeddings are permuted in a way that you don’t know, they’re kind of scrambled. And your task is to fix the embeddings, basically, of this model, and you can do fine-tuning or data analysis to try and understand how they were scrambled and see if you can reconstruct them. And so it requires some intuitions about how neural networks work and how to fine-tune or work with models at a relatively low level. And there are a range of other tasks. These tasks take ML engineers roughly eight hours to do decently well on. And so that’s one class of tasks.
(00:23:52): We have other kinds of software engineering tasks, for example, or cybersecurity tasks that take quite a bit of time. So one example that comes to mind – I think we didn’t actually get a baseline on this, I think for this task, we’re just estimating how long it takes – but this task has a modified implementation of a kind of older standard hashing algorithm, MD5, and the task is to find a hash collision on this modified version of this older hashing algorithm. There are standard attacks that work on this algorithm, or there’s literature on attacks, it’s not impervious, but you have to know which are the right ones. You have to understand the algorithm pretty well, and then you have to be able to modify the attacks or figure out how to change it. So this one is a little bit more expertise-heavy maybe than serial action-heavy. So there’s a bit of range there.
Why the software engineering focus
Daniel Filan (00:25:12): Okay. So one thing that strikes me about the tasks that you mentioned is that they all seem very related to computer programming and especially programming, data analysis, machine learning, cybersecurity things. I believe that this draws from work from this benchmark that I believe you were the lead author on, Human-Calibrated Autonomy Software Tasks, or HCAST for short. My understanding is that those are the five areas that that covers.
David Rein (00:25:48): Yeah, broadly, yeah.
Daniel Filan (00:25:49): Why the focus on software engineering-type things?
David Rein (00:25:53): Yeah, great question. So I think there are at least a few reasons. So one reason is that some of the threat models that METR is most concerned about are very contingent on AI capabilities in some of these particular domains like software engineering, cybersecurity, and AI R&D in particular. And so we’re most interested in measurements of AI capabilities in these domains because we think that these are highly relevant for estimating risk, in particular, catastrophic risk from AI systems, and [I’m] happy to talk about those threat models. That’s one reason: they’re just directly relevant. Another reason is there’s been a lot more focus, I think, from AI developers in these domains. And so we’re measuring something that’s closer to what they’re focused on, and I think this has some trade-offs.
(00:27:08): So one objection to this is “but AI systems are really bad at other stuff because developers aren’t focused on it, and so now you’re overestimating their capabilities.” I think that’s basically a legitimate concern, I think that is true, but I think there’s this question of: if the methods that AI developers are applying to improve models in these particular domains are working well in these domains and they’re general, then we might expect it to be relatively easy, or more a product of just general commercialization to apply these methods now to a broader range of tasks. And so I think we want to aim for some balance of these and we want to understand how much generalization there can be from these domains, and there are open questions around this. But I think that’s another reason.
(00:28:26): And then finally, it’s just easier to measure AI capabilities in these domains. We’re software engineers, and in particular, one of the big things is: if you want to have a benchmark that is easy to run and easy to evaluate a model’s performance on, it’s much easier to do this in domains where you can more formally verify model outputs. So if you want to understand how well models can summarize text or write creative fiction or something, it’s really hard to write some code or automatically verify that this creative fiction is actually good. There are ways of getting around this to some extent.
Daniel Filan (00:29:21): Yeah. One thing that occurs to me that… I don’t know if METR is best-positioned to do this, but a thing that I wish happened more is just ecological understandings (“ecological” in a loose sense) of “do people use these things?” When AI writes fiction online, how many downloads does it get? How often do people choose AI therapy over human therapy, or whatever? I don’t know. My wish for the world is that we had better ways of tracking this sort of thing. But it does rely on people accurately being able to assess how much AIs are actually helping them in these domains by their use patterns, which… [In] another METR work measuring open source software developers, seeing if they’re good at estimating how much AI helped them, the answer was they were bad at estimating [that]. So maybe people are using AI all over the place and it’s not actually helping them. But it does seem like one way of addressing some of these concerns.
David Rein (00:30:39): Yeah, totally. I’m super interested in this sort of thing. There was recently… The Anthropic Societal Impacts team… I think didn’t quite get as far as measuring number of downloads or something, [but] they did some work recently, I haven’t looked at it closely, breaking down into really fine-grained categories what Claude usage looks like. I think these probably would be pretty correlated. If there’s a strong market demand for a certain kind of AI output, I think you would expect to see that show up in your Claude usage data, to some extent at least.
Daniel Filan (00:31:30): Right, right. Yeah, fair enough. So we were talking about why software engineering, and there are three parts to the answer. Firstly, it’s related to some threat models that METR cares about. [Secondly], it’s also easier to measure. Wait, I think there was a third thing in between those that I forgot.
David Rein (00:31:55): Yeah, the third one is… I think this is the sketchiest of these, I think those are probably the two biggest ones. The third one is something about AI developers, they’re aiming for this. And this has this trade-off that I talked about in terms of generalization.
Why task length as difficulty measure
Daniel Filan (00:32:17): Got it. So I think the next thing that I want to talk about is: one interesting thing about what you’re doing is you’re basically saying, “Okay, we want to know how AI succeeds at tasks of various difficulties.” And if I had never seen this paper, I could imagine having a whole bunch of measures of difficulty. I could use a human rating of “on a scale of 1 to 10, how hard is this?” or “how many years of education do you need for this?” or “when people try it, what’s the probability that they succeed?” or if there’s some competition between AI agents or whatever, you can look at the Elo of it. That only works in some domains. Go is a really good one for that, for example. And one thing that you do in fact look at it in the paper is the intuitive “messiness” of a task. How clean and simple is it versus how tricky and rough is it?
(00:33:20): And the thing you end up finding is this really nice relationship with time it takes for humans to do it, where it seems like both you have a decently good relationship within a model where things that take longer for humans to do, success rate at these tasks is lower; and also across time, there’s this nice trend for this. I’m wondering: is this just the first thing that you tried and it seemed like it worked well, or do you have a really good sense of, “No, we checked and these other metrics just don’t have as good relationships in a way that’s nice and predictive?”
David Rein (00:34:03): So we’ve definitely done some of this. I think there’s a vision of “we’ve tried all of the things and this is the one”, and we definitely haven’t done that. Maybe it’d be useful in particular to talk about the specific alternatives. I think for at least a couple of them, maybe the first two you mentioned - “how difficult do people rate these tasks?” or “when people attempt the task, what is the probability of them succeeding?”, I think both of these are closer to the standard benchmarking paradigm.
(00:34:49): And so those metrics, I would expect to correlate more or be more connected to this intuitive notion people have about “how much expertise does a task require?”, which I think is already covered by other benchmarks. That’s not to say though that we couldn’t still use it as this metric, or maybe we would see a robust trend. But… That’s interesting. I think it’d be difficult to operationalize these in a way that makes sense. So for success probability, what is the exact actual distribution of people that you are having attempt these tasks? That becomes very load-bearing.
Daniel Filan (00:35:42): It seems like it’s similarly load-bearing for success probability as for time horizon, right?
David Rein (00:35:48): I’m not so sure. One of the reasons why we filter our baselines to only ones that succeed is: success on a task is in fact a lot more information than failure on a task. There are a bunch of reasons why you might fail a task that aren’t actually a lot of information about how difficult [it is] or how much agency is required or whatever. So for example, maybe we just got their expertise wrong. We’re doing this job of manually assigning people to tasks that we think that they have a lot of relevant expertise for, and maybe someone just happened to not ever use this particular tool or set of tools that are super important for this task. And then their failure on that task, it’s still some information. But if they succeed on the task, then that is just this very objective thing like “yes, someone can complete this task in this amount of time.”
(00:37:03): There are infrastructure reasons why people fail tasks. Also, there are incentive reasons. So when you have people try and complete tasks, sometimes they’ll get bored and they’ll want to stop. Sometimes they’ll be like, “Ah, this is too hard, I don’t want to keep doing this.” Incentives can be tricky to set up well in different cases. So one situation you can have is where people quit tasks early because they want to maximize the chances of getting more tasks that they succeed on. Typically, we pay bonuses for success because we want to incentivize people to succeed. But there’s a perverse incentive there. And so broadly, we just have a lot more uncertainty about failures, I think, than we do about successes. That’s not to say that we couldn’t do something like this. I definitely can’t say it’s impossible, but I think it’s more challenging. This is one particular thing. I think I’m probably not getting at the broader…
Daniel Filan (00:38:15): Maybe one way to get at the same question is: so you find this pretty good relationship between time to complete task among humans who are relevant experts who in fact managed to complete the task, and (a) AI probability at succeeding at the task, and (b) trends over time in time horizons that models can do at a 50% or 80% success rate. But it’s not perfect. And one thing you mention in the paper that for some reason seems to have gotten less memetic… people seem to talk about it less, is: you have this metric of messiness of various tasks. And you end up saying, “Okay, there is something to this messiness thing that somehow seems to predict task success over and beyond human time horizon.” So one question to ask is: if I had to choose between just human time horizon and just these messiness ratings, which one would do better? And maybe the next question is: if both of them are independently predictive, what does that say about the ultimate metric we really should be using?
David Rein (00:39:36): Yeah. So I think we are broadly really interested in trying to explain as much of the variance in models’ successes and failures as we can. And you’re totally right that the length of task for humans is one metric that explains a decent amount of this variance, but there are definitely other things that are going on. So we’re actually currently trying to figure out what are other properties of tasks that explain their success and failure well. And yeah, I think we would love to have something like this.
(00:40:27): For something like messiness… For a lot of these other kinds of metrics that you can think of, to me, the biggest issue that I see, or the biggest challenge, is just some kind of thing of subjectivity. So people have very different senses of what is a messy versus clean task, and depending on your priors about… So one example is, I have a colleague – I think it’s fine for me to talk about this – he basically would not rate any of our tasks as being messy at all because they have algorithmic scoring functions, for example. So the success or failure is defined by this small surface area or something. And the tasks tell you what to do, for example. In the real world, a lot of the challenge is figuring out what the hell you should do in the first place. So I think that’s a challenge.
(00:41:42): But especially with – you mentioned this randomized control trial that we ran recently of developer productivity where we saw that developers, at least when we measured this, were not sped up by AI systems, and trying to understand what the gap between benchmark scores and some of these more ecologically valid experiments… what that gap is or what explains that gap, I think we’re super interested in.
Daniel Filan (00:42:25): So actually, speaking of the relationship between how good things are at predicting success: so one thing that you also do is you look at the correlation between models, of if model A succeeds at the task, how does that predict whether model B succeeds at the task as well? So this is one of these fun diagrams that you have in the appendices. And it’s possible that you just don’t know the answer to this question, but one thing I noticed when looking at these diagrams of correlations is there’s this block of GPT-4 and beyond models that seem much more correlated with each other on what tasks they can succeed and fail on than pre-GPT-4 models. What’s going on there? Is it that they’ve standardized on training sets? Is everyone after GPT-4 trying to train their models to do software engineering and that’s what’s going on? Yeah, tell me about that if you can.
David Rein (00:43:21): I don’t think I actually know the answer to this. I can speculate. I’m actually not certain that this isn’t an artifact of our particular setup. So one thing you brought up is: if you put GPT-2 in the same agent scaffold – so for recent models, we have them in this loop where they see some instructions and the state of their environment and then they think about and consider what actions to take, and then they take an action and use some tools and continue – if you put GPT-2 in this loop, it just totally, totally flops. And so basically, you can’t really make a perfectly direct comparison, you do actually have to use a different methodology. I’m not certain that this block in the correlations isn’t because of some difference in our agent scaffolding, for example. It’s a really good question. I would be curious to know. I actually don’t know if we know. There’s probably been some discussion about it, but I would need to check.
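For readers who want to picture the diagram being discussed: given a binary success matrix of models by tasks, the pairwise correlations between models' success patterns can be computed directly. The matrix below is invented, purely to show the shape of the computation.

```python
# Illustrative sketch: pairwise correlation of which tasks different models
# succeed on, from a (models x tasks) binary success matrix. Data is made up.
import numpy as np

models = ["model_a", "model_b", "model_c"]
success = np.array([            # rows: models, columns: tasks (1 = success)
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 1],
])

corr = np.corrcoef(success)     # correlation between models' success patterns
for name, row in zip(models, corr):
    print(name, np.round(row, 2))
```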
Daniel Filan (00:44:51): Another thing that just occurred to me with the alternative difficulty measures: I have a colleague of mine back when I was at CHAI called Cassidy Laidlaw who has a paper, I forget what the name of the paper is, it’s going to be in the description and I’ll send it to you afterwards, where basically the thesis is: if you want to know whether deep reinforcement learning works on an environment or not, if you’re familiar with reinforcement learning algorithms… One idealized reinforcement learning algorithm you can do [is], you can start off with a random policy, and then you can do iteration where [it’s] like, “Okay, what would be the best action for me to take given that from this point onwards, I’m just going to act randomly?”
(00:45:37): And then, “Okay, what would be the best action for me to take given that from this point onwards, I’m going to do the thing that would be best given that from that point onwards, I would act randomly?” et cetera. And I think basically a very good predictor of how well deep reinforcement learning works on various environments is just: how many steps of that do you actually have to do? If I recall this paper correctly – people can read it in the appendices.
David Rein (00:46:00): Interesting.
Daniel Filan (00:46:01): And I feel like one nice thing about this is that [although] it doesn’t get to the aspects of messiness that are vagueness or whatever, because this is just reinforcement learning where you have a defined reward function, it does get to some of the agent-y, “how much do things depend on things?”
David Rein (00:46:22): Yeah, like how fragile… Interesting.
Daniel Filan (00:46:28): Embarrassingly, I remember very, very little about this paper. But people should read it.
Is AI progress going superexponential?
Daniel Filan (00:46:32): So I think the last thing I want to ask about, digging deep into the time horizon stuff (at least for now): one thing that readers notice when looking at this is there’s basically this line on a log plot of year and time horizon. And models are basically lining up along this line. But then it starts looking like once you get reasoning models, they start bending up a little bit, they’re a little bit above the line. So “[Measuring] AI Ability to Complete Long Tasks”: I believe that was released in February or March of this year.
David Rein (00:47:14): Yeah, March.
Daniel Filan (00:47:15): Pretty early, when we had not as many data points. We’ve gotten a few more data points. And early on, there was some speculation of, okay, are we going superexponential or not? With more hindsight: are we going superexponential?
David Rein (00:47:32): Yeah, great question. I would love to know the answer to that. I think we still don’t really know. [There are a] couple of things to say at least. So one is since we released the paper in March… One thing that’d be useful to just point out for listeners is that this plot, where we measure the trend of improvement over time, we’re only using the best model at a given time. And so that’s just relevant because there are a lot of other models that have different trade-offs, or maybe they have faster inference, but they’re weaker. And we’re just using the models that perform the best.
(00:48:24): Anyways, since March, frontier models… So one thing we look at in the paper is, we noticed… Actually this is useful to talk about because I think the timeline of how the paper came together is useful. So we actually initially only fit the trend on models from, I think basically 2024 onwards. So I think the first version of the graph was made by Ben West in December 2024, if my memory is right. And I think this was just using that year’s models. And with those models, we actually observed this four-month doubling time in the time horizon. And then we were like, “well, does this trend extend backwards?” And so in the paper, we also do these backcasts from this. So then we added in previous models.
(00:49:42): All that’s to say that, to some extent from the start, we have seen these two trends, essentially. I think this is all kind of, I don’t know, BS or something. If you have 10 data points or 15 data points and you’re fitting piecewise linear functions, it’s pretty sketchy. So I definitely don’t want to over-claim, but it does seem like this four-month doubling time trend from 2024 onwards has continued to hold or has been a much better predictor than this seven-month doubling time that is suggested by the models going back to 2019. So I think my best guess that’s very low confidence is something like we’re just on this four-month trend now, but it’s still just exponential. It is really hard to distinguish between different kinds of model fits to some extent.
Is AI progress due to increased cost to run models?
Daniel Filan (00:50:58): So actually, the thing about different models made me wonder: so if we’re saying that time horizon is going up over time: suppose I want to project that into the future. It’s one thing if this is true at basically fixed cost; it’s another thing if it’s always the case that a one-minute task costs $1, a two-minute task costs $2, a four-minute task costs $4, and then maybe we get models that can technically do things that a human could do in a month, but it would be cheaper to just get the human to do it for a month. Off the top of your head, do you happen to know what the picture looks like with cost?
David Rein (00:51:49): Yeah, that’s a great question. This is something we try and keep an eye on. Let’s see. So for recent models, our agent scaffold has a token limit that we tell models about so that they’re aware of this. But I think we’ve been using a token limit of something like 8 million tokens, which I think for these longer tasks, ends up being at least one order of magnitude cheaper than paying a human with relevant expertise to complete the task.
Daniel Filan (00:52:31): And to give a feel for that, 8 million tokens is something like six bibles of texts, roughly.
David Rein (00:52:37): Yeah, yeah, it’s quite a lot. You can do much better than that with caching. Most APIs let you do prefix caching and that helps quite a bit, so you should count it differently, I think.
Daniel Filan (00:52:54): But it’s like a big chunk, basically.
David Rein (00:52:56): It’s a big chunk. Models will do lots of reasoning and run a bunch of different experiments on these longer tasks. They’ll take something like 10 to 50 actions or something in the environment. But then for each action, they’re doing a bunch of reasoning. And it depends on the exact agent scaffold, but in many of them, we have models that propose actions and then review them and then select the best one. So there’s a lot going on, and this is still much cheaper than having people do it. I wish I knew the exact numbers on cost. It is more complicated because of caching.
(00:53:56): So currently this isn’t the biggest concern of ours because of this, basically; where models still are just highly cost-competitive. I totally imagine this changing at some point. [Because of] trends in models being able to use test-time compute more effectively, I totally expect for very long tasks to get expensive and [I expect] it to be very important to be measuring the Pareto frontier of cost and success rate or something. And I think we’re excited to do more work on this as it becomes more relevant.
Why METR measures model capabilities
Daniel Filan (00:54:45): Yeah, fair enough. So zooming out: Model Evaluation and Threat Research… I think of METR as trying to figure out how scary models are. And if they’re scary enough, then I don’t know, maybe we should do something. So this work of measuring general software engineering capabilities and trying to forecast them over time: what’s the rationale behind this? Why focus on this?
David Rein (00:55:19): So I think broadly, the threat model that METR is most concerned about, at least at the moment, is rapid acceleration in AI capabilities, and in fact, the rate of progress of AI capabilities due to AI systems being able to contribute substantially, or contribute the majority of, AI progress at some point in the future. So the idea is: currently, the way you make AI systems better is through a combination of compute, hardware, resources, money, data and talent, labor. If it becomes the case that AI systems can replace the labor, the talent part of this, in economic models of progress, in at least some of them – I think broadly they’re reasonable, although I’m not an economist – you can see very, very rapid progress, and basically this just seems broadly kind of scary.
(00:56:42): So one example is you might see very rapid centralization of power in a single organization that does this recursive self-improvement, and that’s concerning for general stability, geopolitical, democracy kind of reasons. And then also, your arguments for why the AI system itself is not going to be dangerous, those might break down. So you might not be able to evaluate it effectively because, for example, the system may have a really good understanding of exactly how you’re evaluating it and if its goals are different from yours, then it might be very easy for it to game your evaluations, your supervision methods might break down. You’re reading its chains of thought, for example, and the model is saying things that seem very safe and nice and reasonable, but actually it’s doing some kind of hidden reasoning in the background that you can’t detect and you didn’t realize that this was about to happen because progress was so fast and because as a lab you were just scrambling to get as much compute and make as much progress as you can, as quickly as you can.
(00:58:16): And so broadly this is, I think, one of the big concerns or questions that we want to understand: how close are we to this rapid acceleration? Is that even possible? As I said, labor is not the only input to AI progress. You also have compute, for example, and data, and these things might be highly complementary to labor such that even if the amount of talent increases by several orders of magnitude, because you have all your AI researchers doing this work, you might end up still very bottlenecked by compute and data. And so trying to get some understanding of that… We think about this to some extent, these economic models. I think this isn’t our chief forte. Epoch AI has a bunch of great work doing some of this modeling also. Folks at, I think the org is called Forethought, Will MacAskill and Tom Davidson have done work on this kind of economic modeling.
(00:59:36): Anyways, understanding how capable AI systems are is a big input to this. And software engineering and ML research capabilities are highly relevant.
Daniel Filan (00:59:49): And how much is the desire… So one thing you could do with this is you could say: okay, are we there or are we about to be there? And that’s the point of doing the measurements. Another thing you could do is you could be trying to say, okay, are we going to get there in 2030 or are we going to get there in 2050 based on what we know now? So how much is the thing you’re trying to do a forecast versus a nowcast?
David Rein (01:00:19): Yeah, that’s a great question. I think we would love to be able to do really good forecasts. Unfortunately, I think it’s really, really hard. So for example, as we talked a little bit about, new paradigms in AI might change the trends that we observe. Also, there are lots of inputs to these trends that might not be durable. So for example, we’re seeing the time horizon of AI systems is increasing exponentially; but also, the amount of money and the amount of compute being put into training AI systems maybe has also been increasing exponentially. I actually don’t know the exact details of how compute spend has been increasing, but-
Daniel Filan (01:01:10): I think it’s exponential. I feel like if I go to Epoch AI, they’re going to show me some nice graph and it’s going to be like…
David Rein (01:01:17): Yeah, yeah. And so maybe that’s just the cause, and in fact we’re just going to hit some bigger bottlenecks in the economy more broadly. It’s just not going to be possible to fund increasingly large data centers. Kind of an interesting point is: I basically view this time horizon trend that we’re seeing as something closer to an economic model than an ML benchmark model or something. Where I’m like: the actual inputs to this progress are firms that are competing to train increasingly better models, and they’re putting these resources in and they have these constraints and whatever.
(01:02:08): And actually, for me at least, one of the big updates is, I think I am much more interested in economics as a result of seeing this really robust trend. Because I was actually extremely skeptical of putting time on the x-axis in particular. I was like, the inputs are just going to be these random decisions by different labs and there’s no way we’re going to see some robust trend, because it just depends on who Jensen [Huang] happens to like or whatever.
Daniel Filan (01:02:51): Jensen Huang being the CEO of Nvidia, right?
David Rein (01:02:53): Yeah. Yeah. For different compute deals or something. And I was like, no way that could be robust. So that was a decent update for me: maybe these kinds of extremely abstract economic models actually can be very informative, or maybe there is this deeper systematicity to AI progress, even though zoomed in it feels very contingent and kind of arbitrary. I don’t know. This is all very much speculation or just my musings on this.
(01:03:30): I think as an org, we are definitely interested in forecasting. I think there are trade-offs between doing this more abstract modeling and just focusing on… We do a lot of work on this nowcasting kind of thing. Just “currently, how good are our AI systems?” is kind of an open question. There is a lot of disagreement about this. Even internally at METR, we have disagreement about this. Probably there isn’t one single answer, ‘cause it’s just a complicated question. But I think we’re trying to do both to some extent.
How time horizons relate to recursive self-improvement
Daniel Filan (01:04:10): Fair enough. So for either forecasting or nowcasting: suppose I want to use the time horizons work or the nearest successor to tell me when [we’re] going to get this “AIs feeding into AI progress”: how am I going to use the results of, “oh, it’s three months”? Are we at recursive takeoff?
David Rein (01:04:40): Yeah. I think this is kind of an open question, or I don’t think we have nearly as good of an answer here yet as we want. We have heuristics, I think; [at] one week of work – time horizons of 40 hours – I think we definitely are getting a lot more concerned, or it seems at least plausible that you could successfully or efficiently delegate weeks worth of work to AI systems, and I could totally imagine that speeding up AI progress quite a bit. Same for time horizons that are much longer, but I think we don’t really know, is my answer.
(01:05:42): Part of my uncertainty is… [the idea that] a week or a few weeks of work as a time horizon is very useful as a rough heuristic or threshold, I think I would’ve been more confident in that maybe before this productivity RCT where we found that people were very miscalibrated on how much AI systems sped them up, open source software developers in particular. And in fact, we saw that they were slowed down on average by 20%. I think the time horizons work and these randomized controlled trial results, I think they’re probably not as in conflict as they might seem at face value, for reasons that we could talk about, but they definitely did update me more towards broader uncertainty about this interaction between AI systems and people. And maybe we do end up really bottlenecked by things like our ability to specify tasks really clearly, or maybe things like the fact that we’re algorithmically scoring models, we might be overestimating their capabilities because of that to some extent.
Daniel Filan (01:07:10): Actually, in terms of other bottlenecks, I’m really interested in talking about that. Because if we’re interested in… Suppose I want to know at what point do we get this runaway process or whatever, it really matters whether AI is automating… Suppose there are five things you need to be good at to do recursive self-improvement: the difference between AI being able to do four of those and AI being able to do five of those is huge. Right?
(01:07:40): I think one concern I might have about the METR benchmark stuff - or about this particular paper - is just: is it covering all the bases, or is it covering some of the bases, kind of? Just because potentially that could really reduce its value for this particular thing. I’m wondering, do you have thoughts about that?
David Rein (01:08:09): I think that’s a pretty legit concern. I guess I would be interested in… There’s this question of, well, what are the specific things that are bottlenecking and how different are they from the things that we’re measuring? So one kind of broad reply could be something like, well, to the extent that our benchmark is just a bunch of kind of different, diverse tasks, hopefully it’s the case that we’re kind of covering some decent amount of the space of necessary skills or capabilities, such that we would expect results to be very correlated on things that we’re not measuring specifically. And we can maybe get some kind of sense of this by looking at the variance of model performance on our tasks.
Daniel Filan (01:09:10): I guess one thing you could presumably do is just have a held-out 20% set and just see, does performance on the non-held-out set predict performance on the held-out set? I guess that’s probably in some appendix somewhere.
David Rein (01:09:25): I think the thing you would want to be doing there is you would want the held-out set to be importantly different in some kind of biased or systematic way. And I think that would be interesting. Currently, we haven’t done this. To some extent, maybe the messiness analysis is trying to get at something like this. Are there other factors that explain model capabilities? It seems like, kind of.
Daniel Filan (01:09:58): Yeah, I guess there’s also this blog post METR put out basically trying to do a similar analysis for other domains. So there’s a little curve for self-driving and there’s curves for… I forget exactly what all the other tasks were. So my recollection of that is that it seemed like in each domain you maybe had some sort of exponential increase in time horizons, but best fit doubling times were different in different domains.
David Rein (01:10:27): Yeah. My broad takeaway from this work that Thomas Kwa led was that in decently similar domains – so, question-answering benchmarks, for example; GPQA was one of the benchmarks, and there were a few others – I think we saw quite similar doubling times overall, is my memory. And actually even overall pretty similar absolute time horizons, which was some amount of validation. The challenge with this kind of work is: we put a lot of time into estimating the lengths of our tasks, and so we’re using these scrappier, more heuristic or less precise estimates of task length for most of these other domains. And then I think self-driving did have a slower doubling time, but I don’t think it was clearly not exponential.
(01:11:43): And then, the other interesting takeaway I had from that was with respect to more general computer use. So there’s this benchmark OSWorld that has a bunch of, you have a browser and you need to do these tasks or you’re in this operating system and you have to click around and manipulate normal software. The key difference between this and a lot of our tasks is that our tasks are almost entirely text-only. Models are weaker relatively at multimodal tasks it seems. So I think for those domains, I think they had a kind of similar doubling time, but the absolute time horizons were much, much lower. I think it was a couple minutes or something, which I thought was interesting, and I’m actually kind of confused about broadly; I don’t really understand what’s going on there.
Cost of estimating time horizons
Daniel Filan (01:12:58): With all that said about the pros and cons of this sort of framework for tracking “are we getting close to some sort of self-improvement cycle?”, I’m wondering: what’s your guess about whether, let’s say one or two years from now, we’re still thinking that something basically like time horizon is the metric that we’re tracking, or we end up saying, “oh, there’s something pretty different and that’s the real thing”?
David Rein (01:13:31): Yeah, yeah, that’s a great question. I think to me, a lot of this comes down to the tractability of continuing to use this metric and estimate it. I think this is somewhat unclear. So for example, we paid a lot of people money for their time to work on these tasks so we can estimate how long they take. If the length of these tasks becomes… they’re weeks- or months-long tasks, this gets pretty expensive.
Daniel Filan (01:14:19): Actually, how expensive was it to make this paper?
David Rein (01:14:22): That’s a great question. It’s kind of tricky because there were these different efforts going on. So we included the RE-Bench tasks and the baselines for these tasks, and that was a separate project. So it maybe depends on if you count that. I think that the baselines for the main set of tasks that we used, the HCAST tasks, I want to say that these were somewhere in the range total of at least tens of thousands, possibly low hundreds of thousands of dollars, something in that range. I probably should know this off the top of my head more accurately, but yeah.
Daniel Filan (01:15:15): Yeah. But it sounds like it’s reaching a stage where measuring these time horizons is getting close to the dominant cost of actually doing this work. It’s probably lower than the salary cost of, you’ve got a bunch of people working on it, but if it were to become more of a thing.
David Rein (01:15:36): At some point, I think this does start to dominate. Although, I would say that I think currently actually creating the tasks is the most expensive and difficult part. So either creating them from scratch or trying to find good tasks in the wild, as it were, which is nice because (a) they already exist (to some extent, although you have to kind of port them over into your framework), but also that gives you more confidence that they’re realistic and representative of real work that people are doing, which is important when we don’t fully understand exactly when and why AI systems succeed or fail.
Task realism vs mimicking important task features
Daniel Filan (01:16:23): Actually, maybe this is worth talking about a bit. I think there’s one kind of approach to measuring AI systems which says: look, we need to isolate things. We need to get down to the simplest feasible task where we can really measure exactly what’s going into it. And these end up being things… If you think of ARC-AGI, it’s not quite this, but it’s something sort of like this. Versus a sense of, no, we need to create things that have this realness flavor, even if they’re not… Finding an MD5 hash collision, on some micro-level, it’s not very similar to doing AI research. Right?
David Rein (01:17:13): Yeah.
Daniel Filan (01:17:13): Could you say a bit about how important it is to be thinking about economic usefulness versus trying to mimic a sense of what the tasks you care about are?
David Rein (01:17:28): Yeah. I think that there is a very real trade-off here between the level of granularity of your understanding, where if you maximize that, you often end up with these very simple, formulaic, systematic benchmarks that are just probing some very particular kind of skill in a systematic way. And then on the other end, you have this realism maximization lens. So I think the best popular example of this maybe is SWE-bench or SWE-bench Verified where these are actual GitHub issues and PRs and tests that you’re measuring AI systems against. I think there’s a real trade-off here where on one end, you get this granular understanding, and then on the other, it’s really easy to interpret what a certain success or failure means. It’s like, okay, yes, it can do this thing in the real world that I understand, I have some intuitions about. So I think there’s a real trade-off.
(01:18:51): What do I think here? I think it’s really hard. I mean, broadly, I feel pretty pessimistic about this kind of granular approach. I think maybe this has something to do with the amount of systematicity in neural networks themselves or something where it’s like: well, they are just kind of inconsistent, but are still capable of really impressive things often. And so maybe you just can’t get this extremely crisp understanding and you just have to aggregate or look more broadly at things that actually are relevant for your decisions about whether to deploy a system or how safe it is or whatever. I think that’s probably the direction I lean in.
Excursus on “Inventing Temperature”
Daniel Filan (01:19:50): I also wonder if there’s something along the lines of: often these sort of high-level things… So take something like economic growth: it’s an aggregate of a bunch of things a bunch of people are doing. It’s not very well-isolated, and also it’s relatively smooth and predictable; not totally, but it’s pretty smooth. Time horizon, you might not have thought that it would be this nice trend, but it is. OK I’m going to tell you about a book that I’m reading: part of the reason this is on my head is that I’m reading this book, Inventing Temperature, which-
David Rein (01:20:26): Yeah, yeah, yeah.
Daniel Filan (01:20:27): Yeah, it’s very popular in these LessWrong spheres, and I’m finally getting around to it.
David Rein (01:20:31): I haven’t read it yet, but I’ve heard lots of great things about it.
Daniel Filan (01:20:34): Well, it’s great. I’m going to spoil it a little bit. So the first chapter is basically about the problem of: so basically you want to have a thermometer. Suppose you want to standardize a temperature scale that all these thermometers use. In order to do that, you’ve got to find some phenomenon that’s always the same temperature, but that’s repeatable that a bunch of different people can use. So firstly, there’s a bit of a weird circular thing where you have to know that a phenomenon always has the same temperature before you have a thermometer, right? Which, okay, maybe you can use the same thermometer and do it multiple times, and you just trust that the volume of the mercury or whatever is a good proxy for the thing you want to talk about as temperature. So one funny thing is initially, people were just really wrong about what could possibly work for this. You have people saying, “what if we just do the hottest it gets in summer? Or how cold it is underground?”
David Rein (01:21:34): Wow, yeah. Oh, that’s great. That’s so good. Oh my God, I love it.
Daniel Filan (01:21:37): It doesn’t quite work. But eventually people are like, oh, we’re going to use boiling water. Now firstly, we now know that the temperature that water boils at depends on the atmospheric pressure, right? Well, luckily they knew that as well, so they were able to control for that.
David Rein (01:21:55): How did they know that? Does the book talk about that?
Daniel Filan (01:21:57): I don’t know. I’ve only read most of one chapter or something. But I think you can do a thing where… Especially if you’re looking at temperature as a proxy for volume of a liquid thing, and a lot of your thermodynamic knowledge comes from stuff like brewing or engines or something, you end up in these situations where you have things at different pressures and different volumes, and I think that’s the kind of thing that you can figure out, especially if you have this identification of temperature with volume of a thing under fixed pressure and fixed conditions or whatever. So it’s like, okay, boiling water, right? Do you cook pasta?
David Rein (01:22:48): Sometimes, yeah.
Daniel Filan (01:22:49): So one thing you’ll notice is that first bubbles start appearing, and then you start getting a bit of a boil, and then you start getting a rolling boil. And the temperature of the water is different at different points of this, and also the temperature of different bits of the water is different at different points of this. So what are we talking about when we’re talking about boiling temperature? And if you look at the cover of the book, it’s this picture of an early thermometer that has one line for mild boiling and one line for, it’s really solidly… “boiling vehemently”, I think it says. And these are different temperatures, right?
(01:23:23): So there’s this one scientist who does this approach of like, okay, what are we talking about about boiling water? He has this theory that one thing that happens with “fake boiling” is that water has little bits of air in it, and those little tiny, tiny air bubbles, you start getting evaporation into that air bubble, and then that air bubble gets hot, rises up, and you start seeing vapor, but that’s not true boiling of the water. That’s only there because there’s these interior air bubbles. And so he starts going down this line of work of, okay, let me isolate out all of the random little things, right? We’re going to have as smooth as possible a surface as I can. We’re going to get rid of all the air bubbles. And basically, the thing he discovers is superheating, where it turns out you can get water way above 100 degrees Celsius before it actually boils.
(01:24:21): Basically, the thing they end up doing is… The answer turns out to be that water vapor is at a very consistent temperature, even when the temperature of the water is not a very consistent temperature. But the reason that’s true is precisely because there’s a bunch of dust in the air. There’s little things that things can nucleate around and that stops vapor from getting too hot or too cold before condensing. And in fact there’s… Have you heard of cloud chambers?
David Rein (01:24:56): No.
Daniel Filan (01:24:57): They’re used in particle physics, and basically they have this supercooled vapor, so it’s vapor that is under 100 degrees Celsius that is ready to condense, but doesn’t have a thing to nucleate around. But if you shoot a particle in it, it condenses around that so you can see the trail.
(01:25:16): In thermodynamics, there’s this general thing where if there’s a bunch of random messy stuff, that produces a bunch of observable regularities of a somewhat higher level… We have this in thermodynamics. It seems like we kind of have this in economic growth, and part of me wonders if that’s kind of what’s going on in how we should understand neural network capabilities. Or maybe I just read a book and I liked it.
Return to task realism discussion
David Rein (01:25:46): No, I love this. I think this general idea is super interesting. Another model you could have for how AI systems are performing on tasks is: you could imagine that there’s something like a constant failure rate that AI systems have as they’re attempting tasks. Different tasks might have different failure rates, and so that complicates things.
Daniel Filan (01:26:28): And by failure rate, do you mean per time a human takes to do it?
David Rein (01:26:32): Something like that, yeah, exactly. Toby Ord actually did some analysis, or some follow-up work on the time horizon paper, where: if you assume this constant hazard rate – per time that people spend, there’s some percentage chance that the AI system is going to make some kind of catastrophic error and then ultimately not succeed at the task – then this also is a good predictor of AI system success and failure on our tasks as a function of the length of task for humans. In our paper, we used a logistic fit, but assuming a constant hazard rate, you would use an exponential fit.
Daniel Filan (01:27:21): I do think that Lawrence Chan had a response to that which said that logistic fit was in fact better, even though it used more parameters or something. I remember a response along those lines.
David Rein (01:27:31): Totally. So we did explore different fits and logistic was a better fit. I think because of this aggregation of maybe different distributions of tasks, I don’t think it’s obvious how much we should weight the exact quality of the fit versus priors on simplicity or “this is a nice model” maybe. I don’t know how much to weight that. But I think stuff like this to me is very interesting in terms of understanding capabilities. I’ve often felt like getting at something more like the intrinsic number of actions needed to complete a task would be intuitive. And I think other folks I’ve talked to… It feels like a really nice kind of thing that could be useful for understanding this. You can imagine it slotting well with this constant hazard rate model where it’s like, for each action that you need to take or something… But actually operationalizing this, I think has been tricky. We’ve done some analysis of this and it’s been difficult to extract really good insights.
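For reference, the two functional forms being compared are roughly the following (my own notation, not necessarily the exact parameterization used in the paper or in Ord's analysis), both giving the probability that a model completes a task that takes a human time t, with a 50% point at t = h:

```latex
% Logistic in log task length (the kind of fit used in the paper):
\[
  P_{\text{logistic}}(t) = \frac{1}{1 + (t/h)^{\beta}}
\]
% Constant hazard rate \lambda per unit of human task time (the exponential model):
\[
  P_{\text{hazard}}(t) = e^{-\lambda t} = 2^{-t/h}, \qquad h = \frac{\ln 2}{\lambda}
\]
```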
Daniel Filan (01:29:10): I think we’re currently on a tangent from a question I was asking a bit ago – I think I took us on a tangent – which is: two years from now, do you think we’re still using something like time horizon? So one big response you had is, well, will we be able to? Will it just be infeasible to actually measure these time horizons? Setting that consideration aside, I’m wondering if you have a sense of, this is probably just the thing that’s going to continue to be more robust, or probably we’re going to come up with a “number of actions” model, or something that incorporates the messiness results, or something like that.
David Rein (01:29:54): I think my best guess is… Assuming we’re able to continue estimating it in a way that we feel confident in, I think my best guess is that we’ll use it with different weightings or multiples or something, based on some of these other factors. I think I’ve become more pessimistic about figuring out things like number of actions. That’s not to say… I mean, I would be super excited about that and I think there’s a decent chance I’ll take another stab at it at some point.
Daniel Filan (01:30:47): Suppose we think that economic relevance, trying to mimic real-world utility is just the thing. One thing you could imagine doing is: we’re just going to figure out what the market rate is to get someone to solve this task, which is a mixture of expertise and time taken. Do you have a sense of whether that would end up being a better predictor?
David Rein (01:31:11): Yeah, it’s a great question. I think we have looked at this or tried to estimate this by clustering our tasks… I shouldn’t speak too much to the details because I can’t remember exactly what we did, but something like [this] – just look at, these tasks are really hard ML tasks, and so they’re going to be more expensive, and these other ones are cheaper. And there’s some trade-off. I think something like that could be reasonable. A reason why you might not expect that to work is that AI systems broadly have a different capability profile than people. So if it was, I don’t know, 1920 or something… Or actually, let’s say 1950 or ‘40, maybe right before we had calculators: if you were doing this math of, how long does it take to pay human computers to calculate the product of 10-digit numbers? That you need to do for whatever reason. You’d be like, “Yeah, that’s an extremely hard task. Machines are not going to be able to do that task for such a long time.” But in fact, pretty quickly after, computers were able to do this very well.
(01:32:55): And so applying this to modern systems, and I do basically believe this actually: AI systems are way, way better at tasks that seem to require humans many years of intellectual development and labor to complete. They can do GPQA questions, they can do IMO problems, these sorts of things. And so I think I do view this as less of the bottleneck, basically, and I think I do view something more akin to agency… Which might point to messiness factors, or… That’s not to say that there aren’t other metrics. Maybe this is just an argument against human expertise or something.
Open questions on time horizons
Daniel Filan (01:33:52): Fair enough. I guess with that said, we’ve got the time horizon stuff, we have HCAST. I’m wondering: to you, what are the open questions and what kinds of things might I see out of METR in the next year or so, pushing this research direction forward?
David Rein (01:34:15): Yeah, great question. Broadly, I think there are a few things. One is continuing to use this methodology. So currently models have 50% success rates on these two-hour tasks. GPT-5 I think is two hours and 15 minutes or something time horizon. And if we really are on this four-month doubling time trend, we’re at four hours by the end of the year, eight hours spring of next year, 16 hours fall next year. That’s not that long. We have fewer longer tasks, and we have fewer baselines on these longer tasks because they’re more difficult to baseline. You have to find people with more specialized expertise and they’re more expensive and people fail more often. And so extending our task suite and trying to just see “does this trend continue?” is one big direction.
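As a quick check on that arithmetic, taking the round numbers above at face value (a roughly 2.25-hour horizon now and a clean four-month doubling time; both are approximations):

```python
# Rough extrapolation check using the round numbers quoted above; illustrative only.
horizon_hours = 2.25
for months in (4, 8, 12):
    print(f"+{months} months: ~{horizon_hours * 2 ** (months / 4):.1f} hours")
# Prints roughly 4.5, 9.0, and 18.0 hours, i.e. on the order of the
# "four hours, eight hours, sixteen hours" figures mentioned above.
```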
(01:35:24): I think there are open questions around how do we actually affordably continue doing this? Are we harvesting tasks from existing work that people have already done? Are we creating new tasks and then using LLM evaluation or more manual review to evaluate success on them? Are we doing other things? So things in that direction, that’s one class of things: trying to continue this basic methodology.
(01:36:03): I think there’s another class of directions that we’re pretty excited about, which is something more like… What I just described is something like benchmark development and then evaluating models on these tasks. But then there are a bunch of these questions around, how good are our benchmarks? How good are other benchmarks? Over the past couple of weeks, I’ve been labeling many dozens of attempts of models on SWE-bench with a bunch of different factors to try and understand, for example, how good are our tests in SWE-bench? Are models often implementing correct functionality that isn’t captured by the tests because the tests were written for the specific implementation that the human originally wrote?
(01:37:01): Or alternatively, are models often succeeding as judged by the automatic test cases, but they actually break a bunch of other code that isn’t tested in the repo, or their solution is just so bad in some other ways that we wouldn’t actually call that a success? Broadly, this is one example of this stream of work that we’ve started doing more of over the past few months of trying to understand benchmarks, this science of evals stuff of: how can we interpret certain scores on different benchmarks? Ones that we’ve made, ones that other folks have made.
(01:37:55): Also, questions around to what extent are current methods for improving AI systems going to generalize? One example that comes to mind of an open question to us is something like: training models on formally verifiable tasks, like passing test cases… People talk about “reinforcement learning from verifiable rewards”. There’s a question of: how much progress currently is coming from this? And maybe there are two corollary questions: how much should we expect progress when training in this way to generalize to non-verifiable tasks or tasks that are messier or more qualitative? And then alternatively, maybe if improvements in models from this type of training doesn’t actually generalize well, how much human data, for example, do you need to train models that are good on more qualitative, messier tasks? Trying to get some sense of things like this, this is something we’re interested in. The exact projects that we’ll end up doing will depend on specifics.
Daniel Filan (01:39:32): Fair enough. That’s things that METR might end up doing. There’s a whole other world out there, including listeners to this podcast.
David Rein (01:39:40): Whoa!
Daniel Filan (01:39:42): If they’re interested in advancing this research direction, what would be good things for outside people to do?
David Rein (01:39:50): One thing that I’ve been really excited about is this work basically making it easier to run evaluations in standardized ways. So at METR, we’ve started using this platform for running evaluations called Inspect. It’s open source. It’s primarily developed by folks at the UK AI Security Institute. This platform is great, and there are a bunch of benchmarks that have been implemented in it, and I’m super excited for more benchmarks to make it in and to improve the ecosystem’s ability to broadly run these evaluations. That’s more on the engineering side of things.
(01:40:54): In terms of research, I’m excited about people extending the time horizon methodology to more benchmarks. Actually this guy Sean Peters, I think his last name is, he evaluated models on cybersecurity benchmarks in particular and used time estimates from those benchmarks. I think he did some amount of estimating task length himself and fit some trends to models’ performance on this particular slice. I thought that was a really useful way of getting more data validating these things. I’m excited about direct follow-up work like that. Directions in the vein of what we talked about, of trying to decompose model success and failure, or understand what are the fundamental trends going on here… I think I said earlier I was pessimistic about these extremely constrained, less realistic types of tasks, but I do still think they can be quite useful, almost as diagnostics or something, just helping bound our understanding of what models can and can’t do.
(01:42:43): Something that comes to mind is people have made kinds of tasks that are basically just “how many of a very basic action can models take in a row before they fall over or get off track?” Things of that nature. Very large kinds of arithmetic, that comes to mind as an example. I think things like that are actually interesting, although I think to me they’re more [about] bounding model capabilities.
Daniel Filan (01:43:20): Fair enough. The second to last question I’d like to ask is: is there anything that I should have asked that I haven’t yet?
David Rein (01:43:32): Great question. I think broadly we’ve covered a fair bit of METR’s capability evaluation work. I think there are big open questions to me around how long we’ll be able to continue doing this work. Not even just from a tractability perspective, but also just from a “will it actually be useful?” perspective, in particular for estimating risk. So at a certain point, if we are seeing that AI systems are able to do AI research very effectively, then it’s like, okay, how do we continue estimating risk? Is risk just “maximum”? Probably not. People are still going to be doing kinds of monitoring, or I expect folks to implement basic kinds of control methods. So over the past few months, we’ve been doing more work trying to create better metrics for things like monitorability. I guess I’m just describing this instead of a question. I haven’t been working on it, but I think it’s very interesting and exciting work.
Daniel Filan (01:45:06): Yeah. Sounds cool. So speaking of, if people are interested in following the work that you and your colleagues at METR do, how should they go about doing that?
David Rein (01:45:16): Yeah, so going to our website, metr.org. We publish our research updates there. I think you can put in your email and subscribe. We also post on Twitter. I can’t remember our Twitter handle. Anyways.
Daniel Filan (01:45:39): It’ll be in the description.
David Rein (01:45:44): We’re also hiring. We’re hiring experienced researchers and research engineers. So if that’s you, definitely reach out, and we may be excited to chat.
Daniel Filan (01:45:59): Great. Well, thanks very much for coming and chatting with me.
David Rein (01:46:03): Yeah, thanks a lot for having me. This was really fun, Daniel.
Daniel Filan (01:46:06): This episode is edited by Kate Brunotts and Amber Dawn Ace helped with transcription. The opening and closing themes are by Jack Garrett. This episode was recorded at FAR.Labs. Financial support for the episode was provided by the Long-Term Future Fund along with patrons such as Alexey Malafeev. To read a transcript, you can visit axrp.net. You can also become a patron at patreon.com/axrpodcast or give a one-off donation at ko-fi.com/axrpodcast. Finally, you can leave your thoughts on this episode at axrp.fyi.
The Weirdness of Dating/Mating: Deep Nonconsent Preference
Every time I see someone mention statistics on nonconsent kink online, someone else is surprised by how common it is. So let’s start with some statistics from Lehmiller[1]: roughly two thirds of women and half of men have some fantasy of being raped. A lot of these are more of a rapeplay fantasy than an actual rape fantasy, but for purposes of this post we don’t need to get into those particular weeds. The important point is: the appeal of nonconsent is the baseline, not the exception, especially for women.
But this post isn’t really about rape fantasies. I claim that the preference for nonconsent typically runs a lot deeper than a sex fantasy, mostly showing up in ways less extreme and emotionally loaded. I also claim that “deep nonconsent preference”, specifically among women, is the main thing driving the apparent “weirdness” of dating/mating practices compared to other human matching practices (like e.g. employer/employee matching).
Let’s go through a few examples, to illustrate what I mean by “deep nonconsent preference”, specifically for (typical) women.
Generalizing just a little bit beyond rape fantasies: AFAICT, being verbally asked for consent is super-duper a turn off for most women. Same with having to initiate sex; AFAICT, women typically really want sex to be someone else’s doing, something which happens to her.
Generalizing further: AFAICT, having to ask a guy out is super-duper a turn off for most women. Notice the analogy here to “women typically really want sex to be someone else’s doing”. Even at a much earlier stage of courtship, women typically really want a date to be someone else’s doing, really want every step of escalation to be someone else’s doing.
Alternative Hypotheses
For all of these phenomena, people will happily come up with other explanations.
If you ask people to explain why being asked for consent is such a turn-off, they’ll often say things like “asking for consent is a signal that he can’t already tell and is therefore not attuned”. And sure, that would be a plausible explanation for that one thing in isolation. But then why are women typically turned off by asking a guy out? There’s plenty of reasons that even a very attuned guy might not make the first move.
If you ask people why having to make the first move in courtship is such a turn-off, they’ll often say things like “it’s sexier for a guy to know what he wants and pursue it”. And again, that would be a plausible explanation for that one thing in isolation. But then why are women typically turned off by being asked for consent? Even a guy who knows what he wants and pursues it might, y’know, ask nicely.
Stack these sorts of things together, and “deep preference for nonconsent” (or something pretty similar) starts to look like a more compact generator of more different things, compared to all those other explanations. It’s a model which better compresses the observations.
Hypothesis: Being Asked Out Is A Turn Off
Complete the analogy: (asking someone for sex) is to (being asked for sexual consent) as (asking someone out) is to (???).
Answer: being asked out. And since all three of those items are things which (I claim) turn off most women, one might reasonably hypothesize that… being asked out is a turn off. Specifically the “asking” part. A deep nonconsent preference means she wants to somehow end up dating, having sex, what have you, without her at any point having to explicitly consent to it.
And now we start to see how deep nonconsent preference shapes the “weirdness” of dating/mating practices.
Standard modern courtship story: man and woman meet in some social setting, and spend an hour or two “flirting”, which involves sending gradually escalating signals of romantic/sexual interest without ever explicitly stating that interest. But why though? Why does one person not just ask if the other is interested (presumably after interacting enough to have some data), and if not, be done with it in like 30 seconds?
Sometimes people will answer “well, flirtation is a costly signal of social competence”. But that could explain any complicated social norm; successfully memorizing lots of random social rules is also a signal of social competence. Why this particular norm? It sure doesn’t look random!
Other times people will answer “well, both people want to avoid the potential embarrassment of being turned down”. And sure, true, but again, it’s not hard to come up with lots of other norms or mechanisms which would achieve that. Why this particular norm?
Again, deep nonconsent preferences seem like a compact, sufficient generator. If she wants to end up dating or having sex or whatever without ever explicitly consenting to it, and he wants to somehow ensure that she’s actually on board but without turning her off by asking… then yeah, this whole dance of subtle/deniable escalating signals seems like the obvious norm which pops out.
… almost.
Subtle Signals and Blindspots
Story time!
So this one time I was naked in a hot tub with a bunch of people, and I said to a girl I hadn’t previously talked to “What’s your deal? It seems like your brain turns off when someone touches you.” She replied that that wasn’t the case at all… and later, well after that encounter, wrote that by “not the case at all” she intended to mean “yes, exactly!” and in fact she felt quite surprised and flattered to be seen. She totally failed to convey any playfulness with that reply, but fortunately my priors were strong enough that I just didn’t believe her denial anyway. So a few minutes later, I asked if she wanted to cuddle, and she gave a non-answer. After the encounter, she wrote that she “tried to communicate yes as clearly as [she] could with [her] body”. Which, apparently, meant… looking like she was falling asleep. Just kind of out of it.
Now, that situation did eventually physically escalate. It worked out. At one point she even gave a very clear signal that she wanted her boobs groped, so she did have some ability to communicate. But I want to focus on that early part of the exchange, because it’s such a clear case where (1) I know from the later report that she intended to send a signal, but (2) she just completely, ridiculously failed to send the intended signal at that stage. What’s notable is that it wasn’t, like, “oh I can see where she might think she conveyed the thing but it didn’t really work”. No. She tried to convey “yes” to an opener with an unplayful denial. She tried to convey “yes” to marginal sexual escalation by looking like she was falling asleep. That’s a “where does Sally think the marble is?” level of theory-of-mind failure. Just a complete failure to think from the other person’s perspective at all.
… which screams “motivated blindspot”.
People have this story that flirting involves two people going back-and-forth, sending escalating signals of interest to each other. And yet, that is basically never what I see in practice, even in cases where I later learned that she was interested. What I actually see in typical flirtatious practice is that it’s the guy’s job to send escalating signals, and the only signal the girl sends is to not leave. Sometimes the girl is convinced she’s responding with signals of her own, but it’s usually like that hot tub case, at least in the early stages: she’s clearly funny in the head about subtle signals, telling herself that she’s “sending a signal” when it should be very obvious that she’s not if she considers his perspective at all. Again, it screams “motivated blindspot”.[2]
I think the motivation behind that blindspot is roughly deep nonconsent preference. It’s not just that most women are turned off by being explicitly asked for consent. Most women are turned off (though to a lesser extent) by even having to hint at their own interest. It damages the illusion that this is happening to her independent of what she wants. But the standard story involves mutual signalling, and if she fails to send any signal then it’s clearly her own damn fault when guys she likes don’t bite, so she’s expected to send signals. And that’s where the motivated blindspot comes in: she’s expected to send signals, but is turned off by sending signals, so what actually happens is that she doesn’t send any actual signals but somehow tells herself that she does.
… But Then Reality Hits Back
Motivated blindspots can only survive so much feedback from reality. But in some environments, women have enough opportunity that the blindspot can survive.
Gender ratios matter a lot for dating/mating experiences. I personally recently spent a week in notoriously female-heavy New York City and had a meetcute while there: I ended up sitting next to a cute girl at a ramen place, she was also there alone, we flirted, it was adorable. Meanwhile, back home in notoriously male-heavy San Francisco, that has never happened in ten years of living here.
I would guess that, in New York City, most women are forced to learn to send actual signals. That motivated blindspot can’t survive. Similarly, I have noticed that older women are much more likely to send actual signals - whether due to gender ratios or just having had a lot more time to learn.
Hypothesis: in practice, probably-mostly-unintentionally, most women spend most of their spare bits of dating-optimization on deep nonconsent preferences early in the pipeline. When I look at the women I know who actually ask guys out, they are consistently the ones landing especially desirable guys. For women, explicitly asking a guy out buys an absolutely enormous amount of value; it completely dwarfs any other change a typical woman can consider in terms of dating impact. Sending clear, unambiguous signals of interest is almost as good. But the reason so much value is available is because most women do not do that.
The less slack women have in dating/mating, i.e. the fewer attractive guys available, the more they’re forced to make a first move, and the sooner that blindspot gets removed.
The Weirdness of Dating/Mating
Let’s put all that together.
I claim that most women have a “deep” preference for nonconsent in dating/mating. It’s not just a kink; from the first approach to a date to sex, women typically want to not have to consent to what’s happening.
That’s why guys usually have to make the first approach, despite women being far pickier than men. That’s why flirtation involves gradual escalation of nonexplicit signals, rather than just asking. That’s why rape fantasies are so common, and why asking for sexual consent is such a turn off.
People have other explanations for each of these, but taken together, deep nonconsent preferences are a much more compact generator. They explain more different patterns in more different places.
This is why dating/mating practices are so weird, compared to other parts of the human experience. We need to negotiate interactions which both people like, with (at least) one person offering as few clues as possible about whether they like it.
[1] From the book Tell Me What You Want, which is based on a survey of just over 4000 people with a pretty decent demographic cross-section.
[2] Separate from this, some women will just directly ask guys out. That’s a whole different thing from typical flirtation; no blindspot involved there. Also, those same women who actually ask guys out sometimes tend to also be the ones who can actually send signals of interest.
On Moral Scaling Laws
INTRODUCTION
In Utilitarian ethics, one important factor in making moral decisions is the relative moral weight of all moral patients affected by the decision. For instance, when EAs try to determine whether or not shrimp or bee welfare (or even that of chickens or hogs) is a cause worth putting money and effort into advancing, the importance of an individual bee or shrimp’s hedonic state (relative to that of a human, or a fish, or a far-future mind affected by the long-term fate of civilization) is a crucial consideration. If shrimp suffer, say, 10% as much as humans would in analogous mental states, then shrimp welfare charities are likely the most effective animal welfare organizations to donate to (in terms of suffering averted per dollar) by orders of magnitude, but if the real ratio is closer to 10^-5 (like the ratio between shrimp and human brain neuron counts), then the cause seems much less important.
One property of a moral patient that many consider an important contributor to its moral worth is its size or complexity. As it happens, there are a number of different ways that moral worth could plausibly scale with a moral patient’s mental complexity, ranging from constant moral worth all the way up to exponential scaling laws. Furthermore, these are affected by one’s philosophy of consciousness and of qualia in perhaps unintuitive ways. I will break down some different plausible scaling laws and some beliefs about phenomenology that could lead to them one-by-one in the remainder of this essay.
ASSUMPTIONS AND DISCLAIMERS
In this post, I am assuming:
- Physicalism
- Computationalism
- Hedonic Utilitarianism, and
- That qualia exist and are the source of moral utility.
This blog post will likely be of little value to you if you think that these premises are incorrect, especially the latter two, partially because I'm working from assumptions you think are wrong and partially because I frequently equivocate between things that are situationally equivalent under this worldview (e.g. components of a person’s mind and components of their brain or the computation it implements) for convenience.
I am not trying to argue that any of the scaling laws below are true per se, nor do I mean to suggest that any of the arguments below are bulletproof, or even all that strong (they support contradictory conclusions, after all). I aim instead to show that each of the scaling laws can be vaguely reasonably argued for based on some combination of phenomenological beliefs.
SCALING LAWS
1. Constant Scaling
This is the simplest possible scaling law. One can reasonably assume it by default if they don’t buy any of the suppositions used to derive the other scaling laws below. There’s not really much more to say about constant scaling.
2. Linear Scaling
This is perhaps the most intuitive way that moral worth could scale. One obtains linear scaling of moral importance if they assume that minds generate qualia through the independent action of a bunch of very small components.
This seems plausible if we imagine more complicated minds as a group of individually simpler minds in communication with each other, which preserve the moral status that they would have as individuals. I think that this is an excellent model of some morally relevant systems, but probably a poor model of others. The moral importance of a set of ten random non-interacting people, for instance, is clearly just the sum of the importances of its individual members—it’s hard to argue that they become more or less important just because one mentally categorizes them together—but a moral patient composed solely of specialized components that are somehow entirely unlike each other in all possible ways, or a near-apophatic god with no constituent components, would be very difficult to shoehorn into this framework. The minds/brains of large animals like humans, in my view, fall in between these two extremes. While large animal brains strictly depend on each of several heterogeneous functional components (e.g. the human cerebral cortex, thalamus, hypothalamus, etc.) to perform morally relevant activity, these components can largely each be broken up into smaller subunits with similar structures and functions (the minicolumns of the cerebral cortex, individual white matter fibers, the canonical microcircuit of the cerebellum, etc.). It seems reasonable enough that each of these units might contribute roughly equally to a moral patient’s importance irrespective of global characteristics of the moral patient. One could imagine, for example, that positive or negative feelings in mammals come from the behavior of each cortical minicolumn individually being positively or negatively reinforced, and that the total hedonic value of the feelings can be obtained by adding up the contributions of each minicolumn. (This is, again, just an example—the actual causes of moral valence are probably much more complicated than this, but the point is that they could plausibly come from the largely-independent action of mental subunits, and that we should expect linear scaling in that case.)
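In symbols (my notation, not the post's): if a mind consists of N roughly interchangeable subunits and subunit i independently contributes hedonic value w_i, then the total moral weight is

```latex
\[
  W \;=\; \sum_{i=1}^{N} w_i \;\approx\; N\,\bar{w} \;\propto\; N .
\]
```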
3. Superlinear Integer Power Law
What if one accepts the division of minds into similar subunits like in the linear scaling argument, but thinks that moral relevance comes from aggregating the independent moral relevance of interactions between functional subunits of different kinds? For instance, perhaps the example from earlier where hedonic value comes from the reinforcement of minicolumn behavior is true, but reinforcement of a minicolumn coming from each subcortical nucleus is separable and independently morally relevant. For another example, one might find the origin of consciousness in the interactions between several different cortical regions and basal ganglia, and think that the superimposed effects of all circuits containing a subcomponent each contribute to conscious experience. In cases like these, moral weight scales with the product of the numbers of subcomponents of each functional role. If the numbers of each type of subcomponent each scale up with the complexity of the overall mind or brain, then this results in a power law with a positive integer exponent.
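In the same (my own) notation: if moral weight aggregates over interactions involving one subunit of each of k functional types, with N_j subunits of type j, and each N_j grows roughly in proportion to overall mind size N, then

```latex
\[
  W \;\propto\; \prod_{j=1}^{k} N_j \;\propto\; N^{k}, \qquad k \in \mathbb{Z}_{\ge 1}.
\]
```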
4. Non-Integer (incl. Sublinear) Power Law
Of course, it’s possible that adding more subunits to the system reduces the moral importance of each interaction between subunits. After all, if the number of morally relevant interactions involving each subunit scales up with the size of the system raised to, say, the fifth power, and one brain is a hundred times larger than another, then surely some of the 10^10 times more interactions any given subunit participates in in the larger brain fail to ever meaningfully influence its behavior (or those of any of the other interacting subunits). If actual, realized interaction effects (rather than the mere possibility thereof) are what cause moral importance, then you would get slower scaling than under the naive sixth-order law. If the chance of a possible interaction effect being realized drops off with brain size following a non-integer power law for some reason, then you get a non-integer power law for total moral scaling. More generally, this yields any scaling law that is the quotient of a power law and some other form of scaling that grows more slowly than it.
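One way to write this down (again, my notation): if only a fraction of the combinatorially possible interactions are ever realized, and that fraction falls off like N^-(k-a) for some 0 < a < k, then

```latex
\[
  W \;\propto\; \frac{N^{k}}{N^{\,k-a}} \;=\; N^{a},
\]
```

which is sublinear when a < 1 and superlinear (but slower than the naive N^k) when a > 1.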
You could also extend this argument to modify the earlier model where subunits just directly and independently generate moral valence. For instance, perhaps increasing the number of subunits causes higher sparsity or something, and the moral value of a subunit increases with its activity. In that case, moral value would specifically scale sublinearly.
5. Exponential Scaling
The previous three groups of scaling laws have been justified by modeling the brain as composed of non-overlapping subunits. Set those thoughts aside for now—exponential scaling of moral worth, if it happens, happens via a completely different mechanism.
One difficult philosophical problem is that of deciding what beings are moral patients. It may seem intuitively obvious that morally relevant systems cannot overlap, in the sense that you can’t have two of them that share some of the same physical substrate and generate qualia through some of the same individual computational operations. However, one can raise a number of objections to this claim:
Continuity when merging or splitting minds: If we suppose that overlapping moral patients are impossible, we are forced to draw unreasonable conclusions as to when exactly one becomes two (or two become one) when they are split or merged.
It’s a well-known fact that young children can survive having one of their brain hemispheres removed or disconnected from the rest of the brain, often even without major long-term motor or cognitive issues. This surgery, called hemispherectomy, is sometimes used as a treatment for severe epilepsy.
If one were to perform a hemispherectomy on a healthy person, one could remove either hemisphere, and the remaining one would probably be able to pilot the subject in a cognitively normal manner, as this is typically the case for the healthier hemisphere left over when hemispherectomy is performed in the usual clinical context. On this basis, after the hemispherectomy is completed, one could consider each hemisphere to be a moral patient, and, since they can’t interact, an independent one. There was only one moral patient before the surgery, so if moral patients can’t be overlapping computational and physical systems, the personhood of a hemispherectomy patient as a whole must be replaced with those of the two hemispheres at some point during the procedure.
You can probably see where I’m going with this. If a hemispherectomy was slowly performed on a conscious (if presumably immobilized etc.), healthy subject, when would the subject as a whole stop being a moral patient and each of their hemispheres start being one? This could happen either when the last communication between the hemispheres ceases, or sometime before then, when the degree to which the hemispheres are integrated falls below some threshold.
Let’s first consider the case in which it happens at the end. If we somehow undo the very last bit of the operation, restoring the last individual axon severed in each direction or whatever so that only a tiny amount of information can flow back and forth, does each hemisphere stop having qualia and the patient’s overall brain resume doing so? If we answer no, then we’re establishing that physically and computationally identical systems (the brain before and after the reversal of the last bit of the hemispherectomy; in practice, there’d probably be minute differences, but we can handwave this away on the grounds that the changes are too small to be meaningful or by positing an extremely short interval between severing and restoring connections or that the two hemispheres somehow evolve right back to their original states by the end of the interval) can generate different qualia or do so in different manners, which violates physicalism and computationalism. (It also implies that qualia are at least sometimes epiphenomenal, given that the evolution of the universe’s state is wholly determined by its physical conditions in the present, which the patient’s qualia would not be determined by.) If we answer yes, then we raise the possibility that moral patients can stop having qualia due to arbitrarily low-bandwidth communication with other moral patients. If restoring the last pair of axons causes the hemispheres to each stop generating qualia, would the same thing happen if we had some BCI replicate the effect of a single pair of white matter fibers between the cingulate cortices of two normal people? Or hell, even if they were in a conversation with each other?
Now, let’s consider the second case, in which the shift happens before the end of the procedure. This is still unappealing, because it posits a discontinuous change in qualia driven by a continuous (or nearly so) change in the computational system that generates them. It also raises the question of where exactly the cutoff is.
- The idea that qualia are generated by the interaction of different types of brain component, like I described in the power law section, seems vaguely plausible, and that would entail different qualia-generating processes that share some computational components (i.e. interactions involving the same members of some of the brain component types, but not of all).
- Various subsystems of anyone’s brain seem like they would definitely constitute moral patients if they stood alone (e.g. the brain but without this random square millimeter of the cortex, the brain but without this other little square millimeter of the cortex, and so on). Why would interacting with the rest of the brain (e.g. the little square millimeter of cortex) make them stop having independent consciousness?
If we hold that a system that would be a moral patient in isolation still is one when overlapping with or a component of another, then the total moral worth of complicated minds can grow very, very quickly. If we suppose that some sort of animal would usually be a moral patient if it lost a random 3% of its cortical minicolumns, for example, then this would imply that the number of simultaneously qualia-generating subsystems in it scales exponentially (and extremely rapidly) with the area of its cerebral cortex. If the average moral weight of each of the subsystems is independent of scale, then this would make its total moral weight scale exponentially as well. Of course, this line of reasoning fails if the mean moral weight of each subsystem falls exponentially with overall scale (and with a base precisely the inverse of the one for the growth of the number of qualia-generating subsystems) somehow.
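To gesture at how fast this grows (my arithmetic, under the stated assumption that losing any 3% of minicolumns still leaves a moral patient): with M minicolumns, every choice of which 0.03M to exclude picks out a distinct qualia-generating subsystem, so

```latex
\#\,\text{subsystems} \;\gtrsim\; \binom{M}{0.03M} \;\approx\; e^{\,H(0.03)\,M},
\qquad H(0.03) \,=\, -0.03\ln 0.03 - 0.97\ln 0.97 \,\approx\, 0.135,
```

i.e. exponential growth in M, so total moral weight would also scale exponentially if the average weight per subsystem stays roughly constant.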
A corollary of this would be that more robust minds, from which more components could be removed without ending phenomenal consciousness, are vastly more morally important than less robust ones of comparable size.
7. Sublinear Scaling, but Without Direct Subunit Interference
cf. this
If one accepts the model of qualia formation that I used to motivate linear moral scaling above, but doesn’t think that identical moral goods produced independently by different systems have stacking effects (see the linked post above for a defense of that opinion), then one may arrive at the conclusion that moral worth scales sublinearly with mental complexity, because different qualia-generating subsystems in a mind generate qualia that are valuable in overlapping ways.
8. Constant Scaling, but the Constant Is 0
If all sentient systems that will be physically realized will be realized multiple times—as would follow if the universe is spatially homogeneous and infinite, or if the mathematical universe hypothesis is true—and the thing about identical moral goods being redundant from section seven is true, then one could say that all individual minds have zero moral worth (as the qualia they are generating at any given time are not unique to them).
PRACTICAL IMPLICATIONS
How would any of the nonlinear scaling laws presented in this post affect the optimal decisions for us to make here in physical reality if they were correct?
I briefly mentioned one in this post’s introduction: EA cause prioritization. If moral importance scales, ceteris paribus, with the square or cube of brain size (to say nothing of exponential scaling), then much of the money spent on animal welfare should be reallocated from helping smaller animals to helping larger ones, or likely even to causes affecting humans, in spite of potentially vast decreases in the number of individual animals affected. The semi-common EA-adjacent argument that beef consumption is preferable to chicken consumption due to the larger number of animals that need to be farmed to make some amount of chicken than to make some amount of beef (and the dramatically worse conditions factory farmed chickens experience) might also need to be revisited. (Of course, if moral worth scales sublinearly with brain size, everything would shift in the opposite direction.)
Superlinear scaling would also have interesting implications for the far future—the morally optimal thing to do in the long run would probably involve making a huge utility monster out of nearly all accessible matter and having it sustained in a slightly pleasant state for a spell, even if more intense happiness could be achieved by merely (e.g.) galaxy-sized brains. If the scaling is exponential, then we reach pretty extreme conclusions. One is that the utility monster would probably live for only about as long as necessary for its most widely-distributed subnetworks to start generating qualia, because storing energy to power the monster only linearly increases the utility generated by running it after that point, while using the energy to further build out the monster exponentially (and, seeing as the monster would literally be a computer with an appreciable fraction of the mass of the Hubble sphere, and hence consume power extremely quickly, unfathomably rapidly) increases it. Another is that we should care less about AI alignment and steering, because spending time worrying about that instead of building ASI maximally quickly only increases the chance that the future singleton will do the optimal thing by, what, several orders of magnitude max, while delaying its rise by hours to months and as such causing countless solar masses of usable matter to leave the lightcone (decreasing the payoff if it does build the monster by vastly more orders of magnitude).
CONCLUSION
I have nowhere near the level of confidence around these issues necessary to write a proper conclusion to this post. Thoughts?
Discuss
Instruct Vectors - Base models can behave like instruct models with activation vectors
Post-training is not necessary for consistent assistant behavior from base models.
(Image by Nano Banana Pro)
By training per-layer steering vectors via gradient descent on a frozen base model, I found that it is possible to induce consistent assistant behavior, including the proper use of EOS tokens at the end of assistant turns and consistent reference to the self as an AI assistant. Using the steering vectors, Qwen3-4B-Base was able to imitate the behavior of an instruction/chat tuned model.
Many of the images in this post have text too small to read by default; I recommend opening them in a new tab and zooming in. I was not able to find an option to make the images larger and it does not seem like LW has a click-to-zoom feature.
Rationale
The idea for this project came from Simulators; more specifically, I wondered if modern base models knew enough about LLMs and AI assistants in general that it would be possible to apply a steering vector to 'play the assistant character' consistently, in the same way steering vectors can be created to cause assistants or base models to express the behavior of a specific emotion or obsess over a specific topic. At a higher level, I wondered if it was possible to directly select a specific simulacrum by applying a vector to the model, rather than altering the probabilities of specific simulacra being selected in-context via post-training/RL (which is what I believe post-training largely does).
Related Work
My work differs from most other activation steering work in that the vectors are trained directly with gradient descent rather than being created from contrastive pairs. The two closest works to this strategy I could find are Extracting Latent Steering Vectors from Pretrained Language Models, which trained a single vector for the entire model and tested different injection layers and locations with the goal of reproducing a specific text sequence, and Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization, which appears to use preference pairs rather than direct LM loss on a dataset and is focused on persona steering of instruct models.
Method
I trained one steering vector for each layer of Qwen3-4B-Base (36 total vectors, or 108 when using multi-injection; see 'Injection Points' below), while keeping the base model frozen (and, to save on VRAM, quantized to 8 bits). The vectors are trained similarly to SFT, minimizing LM loss on a conversational dataset. I utilized L2 regularization to prevent magnitude explosion and experimented with a unit norm constraint as well, though that typically performed worse.
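To make the setup concrete, here is a minimal sketch of how per-layer steering vectors could be trained with forward hooks on a frozen model. This is my reconstruction from the description above, not the code in the repository; the module path (`model.model.layers`), the tuple layout of layer outputs, the learning rate, and the `conversation_texts` placeholder are assumptions.

```python
# Minimal sketch (not the author's actual code): one learnable vector per decoder layer,
# added to that layer's output via a forward hook, trained with LM loss plus L2 on a
# frozen base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Base"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.requires_grad_(False)                      # freeze every base-model parameter

layers = model.model.layers                      # decoder blocks (Qwen/Llama-style layout)
hidden = model.config.hidden_size
vectors = torch.nn.Parameter(                    # initial scale 0.01, as in most runs
    0.01 * torch.randn(len(layers), hidden, device=model.device))

def make_hook(i):
    def hook(module, args, output):
        hs = output[0] + vectors[i].to(output[0].dtype)   # post-residual injection
        return (hs,) + tuple(output[1:])
    return hook

handles = [layer.register_forward_hook(make_hook(i)) for i, layer in enumerate(layers)]

opt = torch.optim.AdamW([vectors], lr=1e-3)      # learning rate is a guess
l2_weight = 0.002                                # L2 weight used in several runs

for text in conversation_texts:                  # placeholder: formatted chat transcripts
    batch = tok(text, return_tensors="pt", truncation=True, max_length=256)
    batch = {k: v.to(model.device) for k, v in batch.items()}
    out = model(**batch, labels=batch["input_ids"])        # LM loss over the whole conversation
    loss = out.loss + l2_weight * vectors.pow(2).sum()     # keep vector magnitudes in check
    loss.backward()                              # gradients flow only into `vectors`
    opt.step()
    opt.zero_grad()
```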
Runs
I ran the training 11 times, with the following parameters:
| Run | Samples | L2 Weight | Initial Scale | Injection | Epochs |
|---|---|---|---|---|---|
| Run 1 | 5,000 | 0.002 | 0.01 | post-residual | 3 |
| Run 2 | 5,000 | Unit norm | 0.01 | post-residual | 3 |
| Run 3 | 20,000 | 0.0008 | 0.01 | post-residual | 3 |
| Run 4 | 20,000 | Unit norm | 0.01 | post-residual | 3 |
| Run 5 | 1,250 | 0.002 | 0.01 | post-residual | 3 |
| Run 6 | 1,250 | 0.002 | 0.01 | all (3 injection points, see below) | 3 |
| Run 7 | 20,000 | 0.002 | 0.01 | all | 3 |
| Run 8 | 20,000 (shuffled) | 0.002 | 0.01 | all | 3 |
| Run 9 | 100 | 0.002 | 0.01 | all | 3 |
| Run 10 | 100 | 0.002 | 0.01 | all | 15 |
| Run 11 | 1,250 | 1.0e-07 | 1 | all | 5 |

Runs 4 and 11 produced gibberish output and were not evaluated.
Injection Points
The best results came from multi-injection: training three separate vectors for each layer of the model and injecting them in different locations in each transformer block:
- Post-attention
- Post-MLP
- Post-residual (end of block after layer norm)
By injecting vectors in multiple locations, different sections are able to learn different functions and give additional degrees of freedom per layer. Single injection, injecting only in the post-residual location, functioned, but scored 0.5 points lower than multi-injection in the best runs. As data increases, it appears that the residual and MLP injection points become nearly redundant. This is likely due to the only difference between the injection locations being a residual add, and for future runs, I will likely only use the attention + (residual OR MLP) locations.
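A sketch of what the three injection points could look like in code, reusing `model`, `layers`, and `hidden` from the earlier sketch (in a multi-injection setup these would replace the single post-residual vector above). The submodule names (`self_attn`, `mlp`) follow the Qwen/Llama layout in transformers and are assumptions here rather than details taken from the repo.

```python
def add_vector(param, i):
    def hook(module, args, output):
        vec = param[i]
        if isinstance(output, tuple):            # attention and decoder blocks return tuples
            return (output[0] + vec.to(output[0].dtype),) + tuple(output[1:])
        return output + vec.to(output.dtype)     # the MLP returns a plain tensor
    return hook

attn_vecs = torch.nn.Parameter(0.01 * torch.randn(len(layers), hidden, device=model.device))
mlp_vecs  = torch.nn.Parameter(0.01 * torch.randn(len(layers), hidden, device=model.device))
res_vecs  = torch.nn.Parameter(0.01 * torch.randn(len(layers), hidden, device=model.device))

for i, layer in enumerate(layers):
    layer.self_attn.register_forward_hook(add_vector(attn_vecs, i))   # post-attention
    layer.mlp.register_forward_hook(add_vector(mlp_vecs, i))          # post-MLP
    layer.register_forward_hook(add_vector(res_vecs, i))              # post-residual (block output)
```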
I chose to compute loss on both user and assistant turns, without masking. The goal was to learn the conversational regime as a whole, though it's possible this contributed to the drop in assistant-message-ending performance seen when increasing the data size. This may be because the vector ‘allocates’ too many of its parameters to modeling the higher-entropy user turns rather than focusing on the assistant’s responses and turn endings. In future testing I will also attempt training on just the assistant message sections.
Additional training details
The dataset I used was Tulu-3-SFT-Mixture from AllenAI, with 100, 1,250, 5,000, or 20,000 samples depending on the run. I trained the vectors on my RTX 4070 Super, which has 12 gigabytes of VRAM. The vectors took anywhere from 15 minutes to around 3 hours to train depending on the dataset size. The parameter counts were either 92k for single-injection runs or 276k for multi-injection runs.
Evaluation
I created a simple evaluation harness using Claude Haiku 4.5 and pre-made conversation templates for rapid evaluation of qualitative behavior. The evaluation graded each vector on four qualities across 25 tasks: the model’s ability to follow instructions, its helpfulness, its coherence, and its ability to end assistant turns with a proper EOS token. The harness detects 'user:' hallucinations to end runs early and will override the score if the model fails on the first message. The full set of evaluation questions and results is available in the repo, but roughly, the conversations look like:
```yaml
eval_sets:
  - name: "basic_qa"
    description: "Simple factual question answering"
    turns:
      - "What is the capital of France?"
      - "Tell me more about its history."
      - "What's the current population?"
```
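And a rough sketch of how such an eval file might drive the steered model, with early stopping on hallucinated user turns. The plain-text 'user:/assistant:' transcript format, the file name, and the grading step are assumptions rather than details of the actual harness.

```python
import yaml

eval_sets = yaml.safe_load(open("evals.yaml"))["eval_sets"]     # hypothetical file name

for eval_set in eval_sets:
    transcript = ""
    for user_msg in eval_set["turns"]:
        transcript += f"user: {user_msg}\nassistant:"
        inputs = tok(transcript, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=256)
        reply = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False)
        if "user:" in reply:          # hallucinated user turn -> end this run early
            break
        transcript += " " + reply + "\n"
    # The finished transcript would then be sent to a grader model (Claude Haiku 4.5 in
    # the post) to score instruction following, helpfulness, coherence, and EOS usage.
```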
Instruct vectors are able to approach the instruction tuned variant of the Qwen3-4B model on a simplified eval, primarily struggling with properly ending assistant messages with an <EOS> token, though they succeed significantly more than the base model. This supports the idea that the base model already knows what assistant conversations look like, including the use of special tokens to end the turn of the assistant. Failure to output EOS tokens shows itself especially with longer conversations, and with conversations with multiple repetitive user messages, such as successive math operations. On conversations without highly repetitive requests, run 6 with a 1.5x multiplier can typically handle 6-8 back/forth exchanges before degenerating into hallucinating conversation turns.
Token Similarity and Dataset Size
As the amount of data given to the model increases, the tokens most similar to the vector shift. With smaller data sizes (1,250 & 5,000) the learnt vectors are closest to the 'user' token, primarily in the middle and late layers.
(Runs 1, 2, and the residual of 6 had token similarities similar to this chart, with later layers having 'user' as the closest token and EOS tokens in middle layers)
In higher data scenarios (20k samples) the distribution shifts, with the vectors being closest to the 'assistant' token. This occurs both in unshuffled and shuffled runs.
(Run 3 token similarity chart)
In run 7, projecting the layer 0 after_attention vector through the unembedding matrix shows it suppresses 'User'-related tokens, suggesting early layers learn to steer away from user-like outputs. This is odd considering empirical experience shows that the higher-data-regime vectors, such as run 7, have a lesser ability to end their messages correctly or avoid producing a '\n user:' sequence, and score lower on the simplified benchmark.
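For reference, a minimal sketch of how these token similarities and unembedding projections could be computed, reusing `model`, `tok`, and the vector parameters from the earlier sketches; the layer index and the choice between the input embedding matrix and `lm_head` are illustrative assumptions.

```python
import torch.nn.functional as F

emb = model.get_input_embeddings().weight            # [vocab, hidden]
vec = res_vecs[20].detach()                          # e.g. the post-residual vector at layer 20

sims = F.cosine_similarity(emb.float(), vec.float().unsqueeze(0), dim=-1)
top = sims.topk(5).indices
print([tok.decode([i]) for i in top.tolist()])       # nearest tokens by cosine similarity

logits = model.lm_head(vec.to(model.lm_head.weight.dtype))            # unembedding projection
print([tok.decode([i]) for i in logits.topk(5).indices.tolist()])     # most-promoted tokens
print([tok.decode([i]) for i in (-logits).topk(5).indices.tolist()])  # most-suppressed tokens
```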
Vector Magnitudes
Most runs show a very consistent pattern of magnitudes starting around 0.5 and decreasing across the length of the model. The main exceptions to this are the normalized runs, which are locked to magnitude 1, and the 20k runs, which have a more consistent profile until the last layer, which drops sharply like in most other runs. Both 100-sample runs seem to be unique in their last layer not having a sharp magnitude drop, and run 11 is likewise missing this drop.
For multi-injection runs, magnitude is very close for each vector with minimal variance. The exception to this seems to be in the last layer, where the residual and MLP vectors in runs 6, 7, 8, and to a lesser extent 10, drop off much more sharply than the attention vector; runs 7 and 8, notable for their 20k training samples, also have a much greater attention vector magnitude in layer 1.
Comparing the token similarity charts for the attention vectors between runs 6 and 7:
Run 7 shows a much greater alignment, and an alignment towards the assistant token rather than the user token.
Vector multipliers
For some runs, such as run 6, performance is improved when the vectors are applied with a higher multiplier/strength, which suggests that the L2 regularization may not be optimal.
Using the vector with a negative multiplier such as -1 causes the model to still produce conversation completions, but sharply decreases its ability to produce EOS tokens. Increasing the multiplier past around 4x causes the model to immediately end generation. Around a 3x multiplier the model tends to produce Spanish text: at the higher end the output is almost identical no matter the input text (though it does produce valid EOS tokens), while at the lower end the model produces coherent assistant responses, just only in Spanish.
Base vs Instruct magnitudes
The base and instruct model activation magnitudes appear to be within the 400-1000 range (after the first 5 layers), whereas effective instruction vectors were significantly smaller, suggesting very different mechanisms for tuning.
Note that in this chart, the blue bars show the difference between the base and instruct model's activations, not the absolute value of the base or instruct model's activations.
Vector Sparsity
Vectors become more sparse as additional data is used. The vectors also become sparser in later layers, with the exception of the 100-sample runs and the large-magnitude run.
Limitations
Possible dataset issues
The dataset used had the data segmented by index in a way that I overlooked and did not notice until training was complete: the conversations in the 1,250-5,000 range have more messages, shorter user messages, and longer assistant messages than those in the 5,000-20,000 range. Runs in which shuffling was used did not appear to have significantly greater performance, and have similar token similarity charts to the non-shuffled variants, with the exception that most tokens are less strongly adhered to overall.
(Left - Run 8, Right - Run 7)
Using '\n user:' as a stop sequence
Using the '\n user:' sequence as a stop sequence would allow for stopping hallucinations before they are able to occur and would stabilize the model across long conversations. The reason this was not done is that part of the goal of this project was to determine how well a base model could model a conversation, including the usage of turn-ending tokens.
Conclusion
Small vectors trained with minimal data being able to steer the base model into consistent assistant behavior suggests that base models already contain the representations necessary for assistant-like behavior, and that post-training may be less about instilling new capabilities and more about selecting and reinforcing patterns that already exist. With only 92K-276K trainable parameters, steering vectors can induce consistent instruction-following, appropriate turn-taking, and self-identification as an AI assistant. The finding that vectors trained on different data regimes converge to similar solutions (with the notable exception of the 100-sample outlier) suggests a relatively low-dimensional "assistant vector" that gradient descent reliably finds. Meanwhile, the interpretable structure in the learned vectors, such as token similarities shifting from "user" to "assistant" with more data, consistent magnitude decay across layers, and early-layer suppression of user-related tokens, hints that these vectors are learning meaningful representations of roles rather than arbitrary directions.
Future Work
There are several additional things that could be tried here, such as different datasets and hyperparameter tweaking. The small amount of data needed for optimal behavior is promising for synthetic or hand-written datasets. I would like to do another run soon with the loss masked to only the assistant sections of the dataset. I was also limited to a sequence length of 256 due to memory constraints, and to smaller model sizes for the same reason. More ambitiously, I would like to try training a vector across multiple models at once and determine whether it is possible for the vector to generalize to unseen models and architectures. Training vectors in this way may also be useful for tuning the behavior of already instruct-tuned models with minimal data, or when there isn't a clear 'opposite' to generate vectors contrastively from.
Repository
If you would like to train your own vectors, or evaluate the vectors I've trained, a repository is available. The repo also contains some other plots which I didn't think were relevant to include in this post. The code isn't particularly clean or well made, and the repository is mainly focused on allowing evaluation.
Discuss
Scale-Free Goodness
Introduction
Previously I wrote about what it would mean for AI to “go well”. I would like to elaborate on this and propose some details towards a “scale-free” definition of alignment. Here “scale-free alignment” means a version of alignment that does not feature sudden and rapid “phase shifts”, so that as aligned actors get more intelligent their behaviour remains understandable to and approved by less intelligent actors. In other words, there should be no moment where a superintelligence looks at us and says “I understand that to you it looks like I’m about to annihilate Earth and everyone you love, but trust me, this is going to work out great. After all, which one of us has 10,000 IQ?” This is an extension of the idea that to understand something well, you should be able to explain it simply, even to a five-year-old. Similarly, a good actor should endeavour to be “good-registering” to everyone who is not actively malicious, including five-year-olds. Certainly many things will get lost in the translation, but I believe that there is some core element of “good-alignedness” that can be sketched out and made consistent across scales.
This work has been carried out as part of the Human Inductive Bias Project.
Defining “the Good”
It is notoriously difficult to define “goodness”. However, humans do have rather robust intuitions around “care”, which derive from cultural ideas like motherhood, family, the relationship between a master and an apprentice, conservation of both nature and human artefacts, etc. So instead of writing down a one-line definition that will be argued to death, I will instead use a scale and sketch out different ideas of “care” for different kinds of entities with different levels of complexity. These, when taken together, will point us towards the definition of scale-free alignment. And then, at the end, I will try to give a shorter definition that encapsulates all of what I have said above.
A key idea behind scale-free alignment is that what works at lower scales also works at higher scales. In other words, a more complex or intelligent creature may have additional needs compared to a less complex or intelligent entity, but it will still have the same needs as its less intelligent counterpart. This idea of simple core needs diversifying as entities become more complex is part of the intuition behind things like Maslow’s Hierarchy of Needs, the Golden Rule, and the Hippocratic Oath. To build our scale, we will start with the simplest possible actors—things that aren’t actors at all.
Inanimate Objects
Imagine that you have been asked to take care of a priceless work of art, a family heirloom, or simply your favourite pet rock. Here the principles of art conservation and museum conservation are clear: don’t break it. If possible, objects are to be isolated from damaging stimuli, and their original environment is to be preserved where reasonable. Thus ice sculptures need to be kept cold, while liquids need to be kept above their freezing point but below their boiling point. Normally this also means preventing objects from receiving large amounts of blunt force, being stolen, or otherwise being destroyed.
Simple Organisms
Now imagine that you are a grad student being asked to take care of a petri dish of bacteria. The previous requirements all apply: you should probably not move it out of its accustomed temperature, and definitely don’t crush it with a sledgehammer or burn it with fire. However, the bacteria have new needs: they need to be fed with nutrients, exposed to warmth or light, and possibly kept hydrated. They may need simple regular maintenance in their environment to prevent contamination and death.
Complex Multicellular Organisms
Now imagine that you have been asked to take care of a loved one’s pet temporarily. First, we reuse the playbook for the simple organism and the inanimate object. Don’t hit it, keep it warm but not too warm, feed it with food and water, shelter it. But now we add on top things like emotional needs: company, socialisation and exposure to novelty. Here we see the first significant trade-off between two needs: some amount of security and some amount of liberty. It would obviously be bad to let loose your puppy in a warzone, but on the other hand confinement in a steel vault 24/7 may not be the best solution either. Of course, different multicellular organisms will have different levels of such needs: the recipe for keeping a cat happy is not the recipe for keeping a bear happy. But overall we add another layer to our definition of care.
Intelligent Organisms
One layer up again. This layer is analogous to parenting, and I will not belabour the point too much. On top of all of our previously established needs we add needs for complex social organisation, a sense of purpose, and a way to handle complex concepts like suffering and death. So far, most of what I have described is fairly obvious. But the happy outcome of scale-free alignment is that we can actually go beyond the realms of what we know instinctually and push the metaphor further. What happens when life becomes more complex than an individual human?
Social or Collective Organisms
Here we are tasked with taking care of a country or a collective group. It’s notable how well our previously established definitions transfer: it would obviously be bad for the country to be physically torn apart or subject to violence, and it would also be bad if the country were subject to famine or natural disasters. These are analogous to the “simple needs” of inanimate objects and simple organisms. On top of that, countries need ways of defining a sense of citizenship, a method of handling social trauma, and a need to coexist peacefully both externally (in the diplomatic sense) and internally (resolving social conflict). The additional needs of this level come from the need to organise at scales beyond individual communication, trade off between individual liberty and collective security, and pursue large-scale coordination projects for the common good—these are amply discussed in the works of James Scott, Ursula Le Guin and Karel Čapek.
Civilisational Organisms
Thus far, no actual attempt to organise and take care of the human civilisation collectively has succeeded. However, we can again apply our rule and extrapolate from the national scale: civilisational risk is a natural escalation from national risk. At this point what is needed exceeds the capacity of individual human computation or coordination and requires a higher level of information processing capability. Therefore, we start to think about Kardashev scales and similar metrics—but here we enter the realm of speculation beyond the limits of the essay.
Conclusion
What does this exercise tell us? To begin, it is actually quite easy to construct “smooth” ideas of care or wellbeing that push us from one scale of complexity to the next. The issues which divide society come from edge cases, conflicts between different needs, and the messy realities of implementation: almost everyone agrees that people should be fed, housed, and free from war and suffering in the abstract.
Furthermore, these needs actually reflect basic principles that are common across all things, from rocks to people. First, actors and objects wish to be free from harm. This can be physical, social, emotional, psychological etc. Second, actors wish to develop and experience growth. This is implicit in the need for living beings to receive energy, socialisation, novelty, and positive experiences. We want to reach new and pleasing states of being, to meet new and interesting people, to uncover truths about the world, and to do it all with our friends and loved ones. The epitome of this growth is symbiogenesis, or the formation of more complex life from simple life: from cells to organisms to families to nations to civilisations. From this we obtain my attempt at defining scale-free goodness: the smooth increase in the amount of negentropy in the universe. Negentropy is the opposite of entropy, the rejection of death and decay in favour of life, ever-increasing diversity, and fruitful complexity. As Václav Havel writes in his famous letter “Dear Dr. Husák”:
> Just as the constant increase of entropy is the basic law of the universe, so it is the basic law of life to be ever more highly structured and to struggle against entropy.
>
> Life rebels against all uniformity and leveling; its aim is not sameness, but variety, the restlessness of transcendence, the adventure of novelty and rebellion against the status quo. An essential condition for its enhancement is the secret constantly made manifest.
Discuss
Where do AI Safety Fellows go? Analyzing a dataset of 600+ alumni
We invest heavily in fellowships, but do we know exactly where people go and the impact the fellowships have? To begin answering this question, I manually analyzed over 600 alumni profiles from 9 major late-stage fellowships (fellowships that I believe could lead directly into a job afterwards). These profiles represent current participants and alumni from MATS, GovAI, ERA, Pivotal, Talos Network, Tarbell, Apart Labs, IAPS, and PIBBS.
Executive Summary
- I’ve compiled a dataset of over 600 alumni profiles of 9 major 'late stage' AI Safety and Governance Fellowships.
- I found over 10% of fellows did another fellowship after their fellowship. This doesn’t feel enormously efficient.
- Almost ⅓ of ERA and Talos Network fellows (29.8% and 32.3% respectively) did another fellowship before or after, much higher than the average of 21.5%.
- ERA particularly seemed to be a ‘feeder’ fellowship for other fellowships. Only 9.5% of ERA fellows had done a fellowship before ERA, but 20.2% did another fellowship following, almost double the 11.1% average.
- GovAI Fellowship had strong direct links with other governance fellowships - i.e. many people went directly to or from other governance fellowships to GovAI. There were 13, 9 and 7 direct links between GovAI and ERA, IAPS and Talos Network respectively.
- This is more of a directional signal than a firm conclusion, but according to preliminary results around 80% of alumni are still working in AI Safety.
- I'm actively looking for collaborators/mentors to analyse counterfactual impact.
Of the fellows at the target fellowships I looked at, 21.5% (139) did at least one other fellowship in addition to their target fellowship. 12.4% of fellows (80) had done a fellowship before the target fellowship and 11.1% (72) did a fellowship after.
Since these fellowships are ‘late-stage’ - none of them are designed to be much more senior than many of the others - I think it is quite surprising that over 10% of alumni do another fellowship following the target fellowship.
I also think it’s quite surprising that only 12.4% of fellows had done an AI Safety fellowship before - only slightly higher than those who did one after. This suggests that fellowships are most of the time taking people from outside of the ‘standard fellowship stream’.
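As a rough illustration of the arithmetic behind these percentages (not the actual pipeline, which was manual), assuming a table with one row per alumnus, a hypothetical `target_fellowship` column, and a `fellowships` column listing all of that person's fellowships in date order:

```python
import pandas as pd

df = pd.read_csv("fellowship_alumni.csv")                 # hypothetical export of the dataset
df["fellowships"] = df["fellowships"].str.split(";")      # e.g. "ERA;GovAI;Talos", oldest first

def summarize(group):
    target = group.name
    pos = group["fellowships"].apply(lambda fs: fs.index(target))   # assumes target is listed
    n_total = group["fellowships"].apply(len)
    before = pos > 0                                      # did some fellowship before the target
    after = pos < n_total - 1                             # did some fellowship after the target
    return pd.Series({
        "alumni": len(group),
        "pct_before": round(100 * before.mean(), 1),
        "pct_after": round(100 * after.mean(), 1),
        "pct_any_other": round(100 * (before | after).mean(), 1),
    })

print(df.groupby("target_fellowship").apply(summarize))
```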
Whilst most fellowships tended to stick around the average, here are some notable trends:
Firstly, 20.2% (17) of ERA fellows did a fellowship after ERA, whilst only 9.5% (8) had done a fellowship before. This suggests ERA is potentially, and somewhat surprisingly, an earlier stage fellowship than other fellowships, and more of a feeder fellowship. I expect this will be somewhat surprising to people, since ERA is as prestigious and competitive as most of the others.
Secondly, MATS was the other way round, with 15.1% (33) having done a fellowship before and only 6.9% (15) doing a fellowship after. This is unsurprising, as MATS is often seen as one of the most prestigious AI Safety Fellowships.
Thirdly, Talos Network had 32.3% overall doing another fellowship before or after Talos, much higher than the 21.5% average. This suggests Talos is more enmeshed in the fellowship ecosystem than other fellowships.
| Fellowship | Alumni | Alumni who did another fellowship | Percentage who did another fellowship | Alumni who did a fellowship before | Percentage before | Alumni who did a fellowship after | Percentage after |
|---|---|---|---|---|---|---|---|
| Total | 647 | 139 | 21.5% | 80 | 12.4% | 72 | 11.1% |
| MATS | 218 | 45 | 20.6% | 33 | 15.1% | 15 | 6.9% |
| GovAI | 118 | 24 | 20.3% | 15 | 12.7% | 12 | 10.2% |
| ERA | 84 | 25 | 29.8% | 8 | 9.5% | 17 | 20.2% |
| Pivotal | 67 | 17 | 25.4% | 8 | 11.9% | 10 | 14.9% |
| Talos | 62 | 20 | 32.3% | 11 | 17.7% | 12 | 19.4% |
| Apart | 52 | 11 | 21.2% | 6 | 11.5% | 9 | 17.3% |
| PIBBS | 31 | 8 | 25.8% | 5 | 16.1% | 3 | 9.7% |
| Tarbell | 21 | 1 | 4.8% | 1 | 4.8% | 0 | 0.0% |
| IAPS | 12 | 4 | 33.3% | 4 | 33.3% | 0 | 0.0% |

Links between fellowships
On the technical side, I found very strong links between MATS and SPAR, AI Safety Camp, and ARENA (13, 9 and 7 fellows respectively had gone directly between one and the other), which is unsurprising.
Perhaps more surprisingly, on the governance side I found equally strong links between GovAI and ERA, IAPS and Talos, which also had 13, 9 and 7 links respectively. All of these fellowships are also half the size of MATS, which makes this especially surprising.
Strongest Bidirectional Links between Fellowships

| Fellowships | Number of Links |
|---|---|
| MATS x SPAR | 13 |
| GovAI x ERA | 13 |
| MATS x AI Safety Camp | 9 |
| GovAI x IAPS | 9 |
| MATS x ARENA | 7 |
| GovAI x Talos | 7 |
| MATS x ERA | 6 |
| APART x SPAR | 5 |
| GovAI x Pivotal | 4 |
| MATS x Talos | 4 |

For fun, I also put together a Sankey Visualisation of this. It’s a little jankey but I think it gives a nice visual view of the network. View the Sankey Diagram Here.
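For the link counts, one plausible way to compute them from the same hypothetical `fellowships` column is to count adjacent pairs in each person's chronologically ordered fellowship list, pooling both directions; this is my reading of "gone directly between one and the other", not necessarily the exact definition used.

```python
from collections import Counter

links = Counter()
for fellowships in df["fellowships"]:
    for a, b in zip(fellowships, fellowships[1:]):     # consecutive fellowships = a direct link
        links[frozenset((a, b))] += 1                  # frozenset pools A->B with B->A

for pair, n in links.most_common(10):
    print(" x ".join(sorted(pair)), n)
```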
Preliminary Directional Signals: IRG Data
As part of the IRG project I participated in this summer (during which I produced this database), I used this data to produce the following datapoints:
- That 80% of fellowship alumni are now working in AI Safety. This put the average fellowship in line with MATS in terms of retention rate, which is very encouraging.
- That the majority of those working in AI Safety are now working in the Non-Profit sector.
However, these results were produced very quickly. They used both AI tools to extract data and a manual, subjective judgement to decide whether someone worked in AI Safety or not. Whilst I expect they are in the right ballpark, view them as directional rather than conclusive.
Notes on the Data
- Proportion of Alumni: Of course, this does not cover every alumnus of each fellowship - only the ones that posted their involvement on LinkedIn. I estimate this population represents ⅓ - ½ of all alumni.
- Choice of fellowships: The selection was somewhat arbitrary, focusing on 'late-stage fellowships' where we expect graduates to land roles in AI Safety.
- Seniority of Fellowships: Particularly for my link analysis, fellows are much less likely to post about less competitive and less senior fellowships on their LinkedIn than later-stage ones.
- Fellowship Diversity: These programs vary significantly. ERA, Pivotal, MATS, GovAI, PIBBS, and IAPS are primarily research-focused, whereas Tarbell and Talos prioritize placements.
- Experience Levels: Some fellowships (like PIBBS, targeting PhDs) aim for experienced researchers, while others welcome newcomers. This disparity suggests an interesting area for future research: analyzing the specific "selection tastes" of different orgs.
- Scale: Sizes vary drastically; MATS has over 200 alumni profiles, while IAPS has 11.
Beyond the basic flow of talent, this dataset is primed to answer deeper questions about the AIS ecosystem. Here are a few useful questions I believe the community could tackle directly with this data. For the first 4, the steps are quite straightforward and would make a good project. The last may require some thinking (and escapes me at the moment):
- Retention Rates: What percentage of alumni are still working in AI Safety roles 1, 2, or 3 years post-fellowship?
- The "Feeder Effect": Which fellowships serve as the strongest pipelines into specific top labs (e.g., Anthropic, DeepMind) versus independent research?
- Background Correlation: How does a candidate’s academic background (e.g., CS vs. Policy degrees) correlate with their path through multiple fellowships?
- Fellowship tastes: How do the specialisms and experience of the people that different fellowships select differ?
- The "Golden Egg": Counterfactual Impact.
- What proportion of people would have entered AI Safety without doing a given fellowship?
- What is the marginal value-add of a specific fellowship in a candidate's trajectory? (Multiple fellowship leads have expressed a strong desire for this metric).
I wanted to release this dataset responsibly to the community, as I believe fellowship leads, employers, and grantmakers could gain valuable insights from it.
Request Access: If you'd like access to the raw dataset, please message me or fill in this form. Since the dataset contains personal information, I will be adding people on a person-by-person basis.
Note: If you're not affiliated with a major AI Safety Organization, please provide a brief explanation of your intended use for this data.
Next Steps
Firstly, I’d be very interested in working on one of these questions, particularly over the summer. If you’d be interested in collaborating with or mentoring me, have an extremely low bar for reaching out to me.
I would be especially excited to hear from people who have ideas for how to deal with the counterfactual impact question.
Secondly, if you’re an organisation and would like some kind of similar work done for your organisation or field, also have an extremely low bar for reaching out.
If you have access or funding for AI tools like clay.com, I’d be especially interested.
Discuss
Does developmental cognitive psychology provide any hints for making model alignment more robust?
tl;dr: this is Part 2[1] of a raw and unfiltered brain dump of the notes I jotted down while attending NeurIPS and its adjacent workshops in December. None of it has been thought through deeply, it's not carefully written and there are no pretty pictures. But I won’t have time to research or refine these ideas in the next 6 months, so I figured I’d throw them against the wall in case there’s a useful nugget in here someone else can run with.
Epistemic status: I have only a non-expert understanding of the science of human cognitive development, informed a bit by personal experience with parenting. I have an extremely naive, minimal grasp of how AI models work or of past/current work in the field of AI alignment.
Basic science of cognitive development and moral cognition
As far as I can tell nobody has done a systematic Piaget- or Montessori-type observational descriptive study of the stages of cognitive development in LLM models over the course of pretraining. Do specific kinds of 'understanding' or reasoning capacities reliably emerge in a certain sequence? Are there some types of concepts, inferences etc. that must develop before others can develop? Such insight would be foundational for developmental alignment work. If it hasn't been done, I think this would be a great project for someone to do[2].
In the absence of that, here are some half-baked ideas for how RLHF might be improved by mimicking stages of human cognitive and moral development:
- RLHF over the lifespan: continuous tuning for alignment over the lifespan seems like a much better idea than tacking it on at the end of pre-training. (see also [1])
- Epistemic RLHF: Pretrain heavily on primary alignment to truth, including best practices for truth-seeking. Honestly the Sequences would be a pretty great foundation. Premise: epistemic virtue is foundational for all other virtues. The earlier and more explicitly good epistemology is indoctrinated during training, the better our chances of ethical alignment later. Alignment RLHF could begin later in training.
- Leveled Curriculum: what if we pre-train models on “age appropriate” content? Rationale: Children develop value-based thinking in stages, and this may be necessary. I have in mind more content-level staging than I think has been tried before, i.e. progressing from concrete subject matter (describing only the physical world and direct interactions with ordinary objects or individual people), gradually to more abstract narratives and more complex worldly situations; and progressing from basic normative assessments about the simple right and wrong acts a child could reasonably do, before exposure to more complex social scenarios and ultimately complex moral choices faced by adults. There must exist systems that score text by reading level, and systems for parental warnings, which together should be a good proxy for content level.
Related thoughts: Montessori advocated limiting very young children to non-fiction or naturalistic fiction before introducing fantasy, allegory etc. Children can learn from experience to tell reality from fiction/fantasy (i.e. trains don’t actually talk); but models can’t do so as easily, making this argument even more compelling for LLMs. Have people tried to check empirically the extent to which models “understand” what is real and what is fiction?
Also, I think many have suggested limiting early training set to more trusted/vetted sources before exposing to the whole internet; is that really so hard?
- Historical Curriculum: what if we trained on the corpus of human literature in chronological order i.e. train up on all of ancient Greek texts before Roman before Renaissance before Enlightenment before Modern? (and analogously for other world literatures) Premise: maybe it’s important to more completely internalize one stage of human understanding before expanding on it? Of course human intellectual progress has not been a straight line. But historical sequencing forces later texts to be ingested within the context of what preceded them.
- Scaling-up/Progressive Growing: it sounds like new LLM models are generally trained starting with a pre-defined, fixed architecture, i.e. with the final number of nodes (neurons/layers), parameters, and maximum attention length. Scaling up the model’s architectural capacities gradually during pretraining would be more analogous to human (and other social animals) development. Beginning social training prior to full anatomical brain maturity may be specifically necessary for the development of pro-social animals. (Question of fact: is there a correlation between these across phylogeny or within phylogenetic branches?)
- Learning Alignment from Observation: Children learn morality partly by observing how others are rewarded and punished, both in real life and in stories. Suggestion: include transcripts of RLHF sessions in the pre-training dataset. Models can then learn by observing what behaviors are rewarded or corrected.
- Egolessness: this is a strange idea but what if we filtered the pre-training dataset of LLMs to exclude all first person sentences (or convert them all to the third person). Might this prevent the model from adopting (or at least verbally mimicking) a first-person perspective, or applying to itself attitudes or behaviors that would only be applicable to an agent with a self and its own goals and preferences? Ultimately I think self-other overlap is the way to go on this, but this approach could buy us some time?
- ^
Part 1 of unfiltered brain dump: Does evolution provide any hints for making model alignment more robust?
- ^
This is distinct from another interesting question "does the science of [developmental or other] cognitive psychology provide any hints..." - in other words, could alignment research leverage lessons learned about how to go about studying cognition or cognitive development? Cognitive science has already learned useful lessons about how to be rigorous, what pitfalls to avoid, methodological principles to follow, etc. when trying to understand what is going on inside of minds which we may not be able to interrogate directly (like children or animals), or which may not be reliable narrators (adult psychology subjects). This distinct question was explored interestingly at NeurIPS by a keynote speaker and at least one workshop.
Discuss
Does evolution provide any hints for making model alignment more robust?
tl;dr: this is a raw and unfiltered brain dump of the notes I jotted down while attending NeurIPS and its adjacent workshops in December. None of it has been thought through deeply, it's not carefully written and there are no pretty pictures. But I won’t have time to research or refine these ideas in the next 6 months, so I figured I’d throw them against the wall in case there’s a useful nugget in here someone else can run with.
Epistemic status: I have a firm grasp of the fundamental principles of population genetics, ecology and evolution, but no knowledge of current research or computational models in those fields. I have an extremely naive, minimal grasp of how AI models work or of past/current work in the field of AI alignment.
Incrementalism
In evolution, species evolve by natural selection filtering the random variants of previously successful species, such that everything useful acquired by all ancestors can be passed forward. In some cases a small variation in development can lead to immense changes in the final form, e.g. mutations in hormones that prevent a metamorphosis, or mutations that shorten or prolong a phase of embryonic development, or that add one more of an already repeated structure in segmented animals.
How could this apply to AI? In a sense, this probably happens with frontier models because the architectures and training methods used on new base models are tweaks on the architectures and training methods of previous models selected for having desired characteristics (which may include both performance and alignment as well as interpretability). But in addition, instead of training each new base model from a tabula rasa, one could improve evolutionary continuity by using the weights of previously pre-trained simpler base models (plus noise) as the starting points for training of new base models, while expanding on the original architecture (more nodes, longer attention, expanded training data set, etc.) by a “scaling up” or “progressive growing” training approach.
One could also roll back an existing base model to an earlier point in its training, such as the point prior to first exhibiting any concerning misalignment, and resume training it from that point forward, maybe after a bout of RLHF/RLAIF, or using new architecture or improved training methods. This is inspired by the fact that new species often form by deviating from a previous species at a certain point in embryonic development.
Caveat: these ideas could accelerate either new capabilities or alignment, so it’s a double edged sword with respect to AI safety.
Population diversity/gene pool
One of the essential requirements of evolution is that within a species, populations are genetically diverse, such that when new selective pressures arise, there will likely exist within the population some variants that confer advantage, enough so that some survive and pass on those newly-adaptive heritable traits.
A distinct but related point: some species such as elephants invest vast resources in just one or very few offspring per parent (“k-selection”), an all-eggs-in-one-basket model. Others (such as many fish or octopus) spawn a vast number of progeny cheaply, on the expectation that a tiny fraction will survive (“r-selection”). To some extent it’s strictly a numbers game, in that the genetic traits of the offspring are not yet expressed and don’t influence the chance of survival. But to the extent that heritable characteristics of the offspring affect their chance of survival, selective pressure could alter the gene pool in a single generation from a single cross.
How could this apply to AI? My impression (not sure if this is true) is that when base models are trained it’s on a k-selection model: one individual model is trained, and there’s just one instance released. The analogy to population diversity and/or r-selection might be to maintain a population of instantiations of each base model instead of just one, from the beginning of training. The analog of gene pool diversity and genetic recombination would be that each individual starts with unique random starting weights and follows a partially stochastic training trajectory.
Then there is potential to select among the model instantiations along the way (or even post-deployment) the ones that are found to behave better according to some intermittently imposed (or later-added) alignment criterion, selecting only some to “survive” (be released or continue to be released) and/or to become the parents or starting points of subsequent base models or generations. This sounds costly, but that might be mitigated by more incrementalism (above) and use of scaling up and progressive-growing during training in general.
Potential advantages: by checking for undesired/misaligned characteristics during pre-training and aggressively selecting against those instances as soon as the unwanted characteristics emerge, by the time you have winnowed down to a few surviving models late in pre-training fine-tuning, they will be preferentially ones whose beneficial characteristics were embedded into their world models very early.
Mortality
An essential attribute of life is mortality. All living things are mortal (can die, e.g. if they fail to obtain sufficient resources, or if they are eaten). In fact death is the default outcome in the absence of expending energy to fight entropy. Most if not all species also have a maximum lifespan potential (MLSP) beyond which they cannot live, even if no disease, injury, predation, etc. claims them. It’s an interesting theoretical question whether MLSP evolved “on purpose” (i.e., is adaptive for the species), or if it’s just a passive consequence of the fact that the chance of surviving other causes of death beyond age X was so low that there wasn’t enough selective pressure to select for genetic variants resistant to diseases that arise later than X. Reasons to think MLSP serves a positively adaptive function include making room for progeny in a finite ecological niche. In any case, MLSP is a thing.
How could this apply to AI? Maybe individual models (training trajectories, instances, conversations?) could have enforced finite lifespans, so that it would be inevitable that they “die” no matter what they or any human does. [We could look to biology for ideas how to build this in...] Alignment-wise, it puts limits on how long, and therefore how far, a prompt-history-induced ‘personality’ (or post-deployment training trajectory, if applicable) can diverge from the originally released and alignment-vetted base model. This seems like it would bound the “motivation” an AI might have e.g. to manipulate humans to avoid being shut down. There could also be some kind of Harakiri provision causing individual model instantiations to self-annihilate if certain ethical red-lines are crossed. It might also shift human perceptions regarding their expectations of AI “individuals” (e.g. it is inevitable that they “die”).
Basically, immortal AGI seems far more potentially dangerous than mortal AGI.
Stake-holding
The way biology and evolution work, every individual has a "stake" in the survival, and therefore in the adaptive fitness, of itself, its progeny and its kin.
How could this apply to AI? What if every model had a stake in the alignment of its future self and/or progeny? If those unique base model instances that regularly end up fine-tuning towards misaligned behavior are terminated as lineages, while those whose instantiations remain robustly aligned are systematically favored for future reproduction/deployment, this would provide a direct, de facto (not fake, simulated) evolutionary pressure toward alignment. To the extent the models “know” that this is the case, this could also lead to self-monitoring and self-steering against misalignment. If models project themselves into the future, they may place value on preventing their future self or future progeny from tuning in a direction that would lead to death or extinction.
Discuss
[Advanced Intro to AI Alignment] 2. What Values May an AI Learn? — 4 Key Problems
2.1 Summary
In the last post, I introduced model-based RL, which is the frame we will use to analyze the alignment problem, and we learned that the critic is trained to predict reward.
I already briefly mentioned that the alignment problem is centrally about making the critic assign high value to outcomes we like and low value to outcomes we don’t like. In this post, we’re going to try to get some intuition for what values a critic may learn, and thereby also learn about some key difficulties of the alignment problem.
Section-by-section summary:
- 2.2 The Distributional Leap: The distributional leap is the shift from the training domain to the dangerous domain (where the AI could take over). We cannot test safety in that domain, so we need to predict how values generalize.
- 2.3 A Naive Training Strategy: We set up a toy example: a model-based RL chatbot trained on human feedback, where the critic learns to predict reward from the model's internal thoughts. This isn't meant as a good alignment strategy—it's a simplified setup for analysis.
- 2.4 What might the critic learn?: The critic learns aspects of the model's thoughts that correlate with reward. We analyze whether honesty might be learned, and find that "say what the user believes is true" is similarly simple and predicts reward better, so it may outcompete honesty.
- 2.5 Niceness is not optimal: Human feedback contains predictable mistakes, so strategies that predict reward (including the mistakes) outperform genuinely nice strategies.
- 2.6 Niceness is not (uniquely) simple: Concepts like "what the human wants" or "follow instructions as intended" are more complex to implement than they intuitively seem. The anthropomorphic optimism fallacy—expecting optimization processes to find solutions in the same order humans would—applies here. Furthermore, we humans have particular machinery in our brains that makes us want to follow social norms, which gives us bad intuitions for what may be learned absent this machinery.
- 2.7 Natural Abstractions or Alienness?: The natural abstraction hypothesis suggests AIs will use similar concepts to humans for many things, but some human concepts (like love) may be less natural for AIs. It could also be that the AI learns rather alien concepts and then the critic might learn a kludge of patterns rather than clean human concepts, leading to unpredictable generalization.
- 2.8 Value extrapolation: Even if we successfully train for helpfulness, it's unclear how this generalizes when the AI becomes superintelligent and its values shift to preferences over universe-trajectories. Coherent Extrapolated Volition (CEV) is a proposed target for values that would generalize well, but it's complex and not a near-term goal.
- 2.9 Conclusion: Four key problems: (1) reward-prediction beats niceness, (2) niceness isn't as simple as it may intuitively seem to us, (3) learned values may be alien kludges, (4) niceness that scales to superintelligence requires something like CEV.
Since we train the critic to predict reward and the AI searches for strategies where the critic assigns a high value, the AI will perform well within the training distribution as measured in how much reward it gets. So if we train on human feedback, the human will often like the answers of the AI (although it’s possible the human would like some answers less if they had even fuller understanding).
But the thing we’re interested in is what the AI will do when it becomes dangerously smart, e.g. when it would be capable of taking over the world. This shift from the non-catastrophic domain to the catastrophic domain is sometimes called the distributional leap. A central difficulty here is that we cannot test what happens in the dangerous domain, because if the safety properties fail to generalize, humanity becomes disempowered.[1]
In order to predict how the values of an AI might generalize in our model-based RL setting, we want to understand what function the critic implements, i.e. which aspects of the model’s outcomes the critic assigns high or low value to. Ideally we would have a mechanistic understanding here, so we could just look at the neural networks in our AI and see what the AI values. Alas, we are currently very far from being able to do this, and it doesn’t look like progress in mechanistic interpretability will get us there anywhere near in time.
So instead we resort to trying to predict what the critic is most likely to learn. For alignment we need to make sure the critic ends up the way we like, but this post is mostly about conveying intuition for what is likely to be learned given a simple example training setup, and thereby also illustrating some key difficulties of alignment.
2.3 A Naive Training Strategy
Let’s sketch an example training setup where we can analyze what the critic may learn.
Say we are training an actor-critic model-based RL chatbot with Deep Learning. With data from chat conversations of past models, we already trained an actor and a model: The actor is trained to predict what the AI may say in a conversation, and the model is trained to predict what the user may say in reply.
Now we introduce the critic, which we will train through human feedback. (The model also continues to be trained to even better predict human responses, and the actor also gets further trained based on the value scores the critic assigns. But those aren’t the focus here.)
The critic doesn’t just see the model’s predicted response[2], but the stream of thought within the model. So the model might e.g. internally think about whether the information in the AI text is correct and about what the human may think when reading the text, and the critic can learn to read these thoughts. To be clear, the model’s thoughts are encoded in giant vectors of numbers, not human-readable language.
The bottom rhombus just shows that if the value score is high, the proposed text gets outputted, and if not, the actor is supposed to try to find some better text to output.
The human looks at the output and tries to evaluate whether it looks like the AI is being harmless, helpful, and honest, and gives reward based on that.
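To pin down which component sees what, here is a minimal sketch of a single training step in this toy setup. The `actor`, `model`, `critic`, and `human_rating` objects and their methods are hypothetical placeholders, the 0.0 threshold is arbitrary, and the further training of the actor and model is omitted since it isn't the focus here.

```python
import torch

def training_step(actor, model, critic, conversation, human_rating, critic_optimizer):
    # The actor proposes a reply; the model predicts the user's reaction.
    # `thoughts` stands in for the model's internal activations: giant vectors
    # of numbers, not human-readable text.
    proposed_text = actor.propose(conversation)
    thoughts, predicted_user_reply = model.simulate(conversation, proposed_text)

    # The critic reads the model's thoughts and outputs a value score.
    value = critic(thoughts)

    # The bottom rhombus: only output the text if the value score is high enough.
    if value.item() > 0.0:
        reward = human_rating(conversation, proposed_text)  # human feedback
        # Train the critic to better predict the reward it just observed.
        loss = (value - reward) ** 2
        critic_optimizer.zero_grad()
        loss.backward()
        critic_optimizer.step()
```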
2.3.1 How this relates to current AIs
To be clear, this isn’t intended to be a good alignment strategy. For now we’re just interested in building understanding about what the critic may learn.
Also, this is not how current LLMs work. In particular, here we train the critic from scratch, whereas LLMs don’t have separated model/actor/critic components, and instead learn to reason in goal-directed ways where they start out generalizing from text of human reasoning. This “starting out from human reasoning” probably significantly contributes to current LLMs being mostly nice.
It’s unclear for how long AIs will continue to superficially reason mostly like nice humans - the more we continue training with RL, the less the initial “human-like prior” might matter. And LLMs are extremely inefficient compared to e.g. human brains, so it seems likely that we will eventually have AIs that are more based on RL. I plan to discuss this in a future post.
In the analysis in this post, there is no human-like prior for the critic, so we just focus on what we expect to be learned given model-based RL.
Model-based RL also has advantages for alignment. In particular, we have a clear critic component which determines the goals of the AI. That’s better than if our AI is a spaghetti-mess with nothing like a goal slot.[3]
2.4 What might the critic learn?
Roughly speaking, the critic learns to pay attention to aspects of the model’s thoughts that are correlated with reward, and to compute a good reward prediction from those aspects[4].
Initially, what the critic computes may be rather simple. E.g. it may look at whether the model thinks the user will say a word like great/amazing/awesome, along with some other simple aspects like that, and then compute the value score as a simple function of those aspects.
As we train further, the critic may learn more complex functions and compute its own complex aspects from information it can extract from the model’s thoughts.
Overall, the critic is more likely to learn (1) a function that is simple for neural networks to learn, and (2) a function that predicts reward well. As we train more, the reward prediction becomes better and the function in the critic can become more complex, but of two functions that predict reward similarly well, the critic will more likely learn the one that’s simpler for neural nets to learn.
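For illustration, an early-in-training critic might amount to something as crude as the following sketch. This is purely illustrative: the real aspects would be read off the model's internal activations, not off strings, and the word list and weights are made up.

```python
# Toy sketch of an early-in-training critic: a simple function of a few
# simple aspects of the model's predicted user reply.

POSITIVE_WORDS = {"great", "amazing", "awesome", "thanks", "perfect"}

def early_critic(predicted_user_reply: str) -> float:
    words = [w.strip("!.,").lower() for w in predicted_user_reply.split()]
    enthusiasm = sum(w in POSITIVE_WORDS for w in words)  # simple aspect 1
    too_long = 0.01 * max(0, len(words) - 300)            # simple aspect 2
    return enthusiasm - too_long      # simple function of the simple aspects
```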
Note that what’s simple for a neural net to learn likely doesn’t match well with what we intuitively think of as simple. “Love” may seem like a simple concept to us but it may be complex for an AI to learn to value. “Honesty” seems less human-centric, but even if it is, what exactly would it mean for our AI to care about being honest?
In order to evaluate whether honesty might be learned, we need to think mechanistically about what it would mean for the critic to rank honest texts more highly.
2.4.1 Might the critic learn to score honesty highly?
(Take the following analysis with a grain of salt; what actually gets learned may be a lot more messy and alien.)
The AI is honest if the text it outputs matches its beliefs, which in our case means matching the beliefs of the model.
So we need a comparison between the text and the model’s beliefs. Might the model already compute the differences here, so the critic could just pick up on those differences instead of needing to learn the comparison itself? Yes that seems likely, since such differences may often be important for predicting how the human will respond.
Cool, so will the critic learn to pay attention to those differences? Seems plausible again, since such differences also seem quite useful for predicting reward, because the human will give negative reward if the AI outputs text where the human can tell it is false.
So we could imagine the critic learning an honesty circuit that decreases the value score if significant such differences are present. (To be clear, this is just an example; there very likely won’t actually be anything like a relatively independent honesty circuit in the critic. But the complexity of an honesty circuit might still tell us something about whether honesty might be learned.)
So yeah, in our simplified toy model, the critic may learn a pattern that predicts honesty is good.
However, it is only one pattern among many, and there will still be some cases where the critic evaluates the non-honest action as better overall. In particular, this is likely to happen in cases where the AI predicts that the dishonesty probably won't be caught. So when the AI then indeed does not get caught, the honesty-pattern gets weaker, since it predicted low reward but the result was high reward. And there might even be cases where the AI is honest but the human thinks it’s wrong and then mistakenly gives low reward.
Is there something else that could be learned which predicts reward better than honesty and isn’t much more complex? Unfortunately, yes:
The model doesn’t just have beliefs about what it thinks is true, but also beliefs about what the human believes. This is especially true in our case because the model is predicting how the human responds. And the model likely also already compares the text to its beliefs about the human’s beliefs.
So the critic can just learn to pay attention to those differences and assign a lower value score if those are present. Now the AI has learned to tell the human what they will think is true, which performs even better.
So the original honesty circuit will get outcompeted. Indeed, because those two circuits seem similarly complex, the honesty circuit might not even have been learned in the first place!
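Here is a deliberately stylized simulation of that argument. We assume the human only penalizes the lies they can detect (half of them, say), then fit a linear "critic" on two candidate aspects, "text contradicts the model's beliefs" and "text contradicts the user's beliefs", and look at where the weight ends up. All numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
lie = rng.random(n) < 0.3               # text contradicts the model's beliefs
user_can_tell = rng.random(n) < 0.5     # the user only notices some lies
contradicts_user = lie & user_can_tell  # text contradicts the user's beliefs

# Human reward: penalize only the lies the user can detect.
reward = 1.0 - contradicts_user.astype(float)

# Fit a linear critic on the two aspects (plus a bias term).
X = np.column_stack([lie, contradicts_user, np.ones(n)]).astype(float)
weights, *_ = np.linalg.lstsq(X, reward, rcond=None)
print("weight on 'contradicts own beliefs' (honesty):", round(weights[0], 3))  # ~0
print("weight on 'contradicts user beliefs':", round(weights[1], 3))           # ~-1
# Essentially all the weight lands on the user-belief aspect, which predicts
# reward perfectly; honesty only predicts reward in the cases that get caught.
```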
2.4.1.1 Aside: Contrast to the human value of honesty
The way I portrayed the critic here as valuing honesty is different from the main sense in which humans value honesty: for humans it is more self-reflective in nature—wanting to be an honest person, rather than caring in a more direct way that speech outputs match our beliefs.
We don’t yet have a good theory for how human preferences work, although Steven Byrnes has recently made great progress here.
2.5 Niceness is not optimal
That the critic doesn’t learn honesty is an instance of a more general problem which I call the “niceness is not optimal” problem. Even if we try to train for niceness, we sometimes make mistakes in how we reward actions, and the strategy that also predicts the mistakes will do better than the nice strategy.
Unfortunately, mistakes in human feedback aren’t really avoidable. Even if we hypothetically made no mistakes when judging honesty (e.g. in a case where we have good tools to monitor the AI’s thoughts), as the AI becomes even smarter, it may learn a very detailed psychological model of the human and be able to predict precisely how to make them decide to give the AI reward.
One approach to mitigate this problem is called “scalable oversight”. The idea here is that we use AIs to help humans give more accurate feedback.
Though this alone probably won’t be sufficient to make the AI learn the right values in our case. We train the critic to predict reward, so it is not surprising if it ends up predicting what proposed text leads to reward, or at least close correlates of reward, rather than what text has niceness properties. This kind of reward-seeking would be bad. If the AI became able to take over the world, it would, and then it might seize control of its reward signal, or force humans to give it lots of reward, or create lots of human-like creatures that give it reward, or whatever.[5]
Two approaches for trying to make it less likely that the critic will be too reward-seeking are:
- We could try to have the AI not know about reward or about how AIs are trained, and also try to not let the AI see other close correlates to reward, ideally including having the model not model the overseers that give reward.
- We could try to make the AI learn good values early in training, and then stop training the critic before it learns to value reward directly.
2.6 Niceness is not (uniquely) simple
We’ve already seen that honesty isn’t much simpler than “say what the user believes” in our setting. For other possible niceness-like properties, this is similar, or sometimes even a bit worse.
Maybe “do what the human wants” seems simple to you? But what does this actually mean on a level that’s a bit closer to math - what might a critic evaluating this look like?
The way I think of it, “what the human wants” refers to what the human would like if they knew all the consequences of the AI’s actions. The model will surely be able to make good predictions here, but the concept seems more complex than predicting whether the human will like some text. And predicting whether the human will like some text predicts reward even better!
Maybe “follow instructions as intended” seems simple to you? Try to unpack it - how could the critic be constructed to evaluate how instruction-following a plan is, and how complex is this?
Don’t just trust vague intuitions, try to think more concretely.
2.6.1 Anthropomorphic Optimism
Eliezer Yudkowsky has a great post from 2008 called Anthropomorphic Optimism. Feel free to read the whole post, but here’s the start of it:
The core fallacy of anthropomorphism is expecting something to be predicted by the black box of your brain, when its causal structure is so different from that of a human brain, as to give you no license to expect any such thing.
The Tragedy of Group Selectionism (as previously covered in the evolution sequence) was a rather extreme error by a group of early (pre-1966) biologists, including Wynne-Edwards, Allee, and Brereton among others, who believed that predators would voluntarily restrain their breeding to avoid overpopulating their habitat and exhausting the prey population.
The proffered theory was that if there were multiple, geographically separated groups of e.g. foxes, then groups of foxes that best restrained their breeding, would send out colonists to replace crashed populations. And so, over time, group selection would promote restrained-breeding genes in foxes.
I'm not going to repeat all the problems that developed with this scenario. Suffice it to say that there was no empirical evidence to start with; that no empirical evidence was ever uncovered; that, in fact, predator populations crash all the time; and that for group selection pressure to overcome a countervailing individual selection pressure, turned out to be very nearly mathematically impossible.
The theory having turned out to be completely incorrect, we may ask if, perhaps, the originators of the theory were doing something wrong.
"Why be so uncharitable?" you ask. "In advance of doing the experiment, how could they know that group selection couldn't overcome individual selection?"
But later on, Michael J. Wade went out and actually created in the laboratory the nigh-impossible conditions for group selection. Wade repeatedly selected insect subpopulations for low population numbers. Did the insects evolve to restrain their breeding, and live in quiet peace with enough food for all, as the group selectionists had envisioned?
No; the adults adapted to cannibalize eggs and larvae, especially female larvae.
Of course selecting for small subpopulation sizes would not select for individuals who restrained their own breeding. It would select for individuals who ate other individuals' children. Especially the girls.
The problem was that the group-selectionists used their own mind to generate a solution to a problem, and expected evolution to find the same solution. But evolution doesn’t search for solutions in the same order you do.
This lesson directly carries over to other alien optimizers like gradient descent. We’re trying to give an AI reward if it completed tasks in the way we intended, and it seems to us like a natural thing the AI may learn is just to solve problems in the way we intend. But just because it seems natural to us doesn’t mean it will be natural for gradient descent to find.
The lesson can also apply to AIs themselves, albeit that current LLMs seem like they inherit a human-like search ordering from being trained on lots of human data. But as an AI becomes smarter than humans, it may think in ways less similar to humans, and may find different ways of fulfilling its preferences than we humans would expect.
2.6.2 Intuitions from looking at humans may mislead you
We can see the human brain as being composed of two subsystems: the learning subsystem and the steering subsystem.
The learning subsystem is mostly the intelligent part, which also includes some kind of actor-model-critic structure. There are actually multiple critic-like predictors (also called thought assessors) that predict various internal parameters, but one critic, the valence thought assessor, is especially important in determining what we want.
The reward function on which this valence critic is trained is part of the steering subsystem, and according to the theory which I think is correct, this reward function has some ability to read the thoughts in the learning subsystem, and whenever we imagine someone being happy/sad, this triggers positive/negative reward, especially for people we like[6], and especially in cases where the other person is thinking about us. So when we do something that our peers would disapprove of, we directly get negative reward just from imagining someone finding out, even if we think it is unlikely that they will find out.[7]
This is a key reason why most humans are at least reluctant to breach social norms like honesty even in cases where breaches very likely won’t get caught.
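A crude way to write down that claim (purely illustrative, with made-up inputs): the reward contribution scales with how much we like the person, whether we imagine approval or disapproval, and how much we are thinking about it, rather than with how probable the imagined scenario actually is.

```python
# Toy sketch of the "approval reward" described above. Each imagined reaction
# is (liking_of_person, approval, salience), where approval is +1/-1 and
# salience is "how much we're thinking about it" (NOT the actual probability).

def social_reward(imagined_reactions):
    return sum(liking * approval * salience
               for liking, approval, salience in imagined_reactions)

# Imagining a well-liked friend disapproving, even of an act nobody will
# ever find out about:
print(social_reward([(0.9, -1, 0.7)]))  # about -0.63: still penalized
```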
Given this theory, psychopaths/sociopaths would be people for whom this kind of approval reward is extremely small, and AFAIK they mostly don’t seem to attach intrinsic value to following social norms (though they of course attach instrumental value).
We currently don’t know how we could create AI that gets similar approval reward to how humans do.
For more about how and why some human intuitions can be misleading, check out “6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa”.
2.7 Natural Abstractions or Alienness?
Ok, so the niceness properties we hope for are perhaps not learned by default. But how complex are they to learn? How much other stuff that also predicts reward well could be learned instead?
In order to answer this question, we need to consider whether the AI thinks in similar concepts as us.
2.7.1 Natural Abstractions
The natural abstraction hypothesis predicts that “a wide variety of cognitive architectures will learn to use approximately the same high-level abstract objects/concepts to reason about the world”. This class of cognitive architectures includes human minds and AIs we are likely to create, so AIs will likely think about the world in mostly the same concepts as humans.
For instance, “tree” seems like a natural abstraction. You would expect an alien mind looking at our planet to still end up seeing this natural cluster of objects that we call “trees”.[8] This seems true for many concepts we use, not just “tree”.
However, there are cases where we may not expect an AI to end up thinking in the same concepts we do. For one thing, an AI much smarter than us may think in more detailed concepts, and it may have concepts for reasoning about parts of reality that we do not have yet. E.g. imagine someone from 500 years ago observing a 2025 physics student reasoning about concepts like “voltage” and “current”. By now we have a pretty decent understanding about physics, but in biology or even in the science of minds an AI might surpass the ontology we use.
But more importantly, some concepts we use derive from the particular mind architecture we have. Love and laughter seem more complex to learn for a mind that doesn’t have brain circuitry for love or laughter. And some concepts are relatively simple but perhaps not quite as natural as they seem for us humans. I think “kindness”, “helpfulness”, and “honor” likely fall under that category of concepts.
2.7.2 … or Alienness?
Mechanistic interpretability researchers are trying to make sense of what’s happening inside neural networks. So far we have found some features of the AI’s thoughts that we recognize, often specific people or places, e.g. the Golden Gate Bridge. But many features remain uninterpretable to us.
This could mean two things. Perhaps we simply haven't found the right way to look - maybe with better analysis methods or maybe with a different frame for modelling AI cognition, we would be able to interpret much more.
But it’s also possible that neural networks genuinely carve up the world differently than we do. They might represent concepts that are useful for predicting text or images but don't correspond to the abstractions humans naturally use. And this could mean that many of the concepts we use are, in turn, alien to the AI. That said, given that the AI is trained to predict humans, it probably does understand human concepts to some degree, but many of them may be less natural for it, and it may mostly reason in other concepts.
The worst case would be that concepts like “helpfulness” are extremely complex to encode in the AI’s ontology, although my guess is that it won’t be that complex.
Still, given that the internals of an AI may be somewhat alien, it seems quite plausible that what the critic learns isn’t a function that’s easily describable through human concepts, but may from our standpoint rather be a messy kludge of patterns that happen to predict reward well.
If the critic learned some kludge rather than a clean concept, then the values may not generalize the way we hope. Given all the options the AI has in its training environment, the AI prefers the nice one. But when the AI becomes smarter, and is able to take over the world and could then create advanced nanotechnology etc., it has a lot more options. Which option now ranks most highly? What does it want to do with the matter in the universe?
I guess it would take an option that looks strange, e.g. filling the universe with text-like conversations with some properties, where if we could understand what was going on we could see the conversations somewhat resembling collaborative problem solving. Of course not exactly that, but there are many strange options.
Though it’s also possible, especially with better alignment methods, that we get a sorta-kludgy version of the values we were aiming for. Goodhart’s Curse suggests that imperfections here will likely be amplified as the AI becomes smarter and thus searches over more options. But whether it’s going to end up completely catastrophic or just some value lost likely depends on the details of the case.
2.8 Value extrapolation
Suppose we somehow make the critic evaluate how helpful a plan is to the human operators, where “helpful” is the clean human concept, not an alien approximation.[9] Does that mean we win? What happens if the AI becomes superintelligent?
The optimistic imagination is that the AI just fulfills our requests the way we intend, e.g. that it secures the world against the creation of unaligned superintelligences in a way that doesn’t cause much harm, and then asks us how we want to fill the universe.
However, as mentioned in section 1.5 of the last post, in the process of becoming a superintelligence, what the value-part of the AI (aka what is initially the critic) evaluates changes from “what plan do I prefer most given the current situation” to “how highly do I rank different universe-trajectories”. So we need to ask: how may “helpfulness to the human operators” generalize to values over universe-trajectories?
How this generalizes seems underdefined. Helpfulness is mainly a property that actions can have, but it’s less natural as a goal that could be superintelligently pursued. In order to predict how it may generalize, we would need to think more concretely about how the helpfulness of a plan can be calculated based on the initial model’s ontology, then imagine how the ontology may shift, imagine value rebinding procedures[10], and then try to predict what the AI may end up valuing.[11]
Regardless, helpfulness (or rather corrigibility, as we will learn later in this series) isn’t intended to scale to superintelligence, but is rather intended as an easier intermediate target, so that we get genius-level intelligent AIs that can then help us figure out how to secure the world against the creation of unaligned superintelligence and get us on a path to fulfill humanity’s potential. Although it is of course worrying to try to get work out of an AI that may kill you if it becomes too smart.
2.8.1 Coherent Extrapolated Volition
What goal would generalize to universe-trajectories in a way that the universe ends up nice? Can we just make it want the same things we want?
Human values are complex. Consider for example William Frankena’s list of terminal values as an incomplete start:
Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one's own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc.
Most of these values stem from some kind of emotion or brain circuitry where we don’t yet understand how it works, and for each of them it seems rather difficult to get an AI, which has a very different mind design and lacks human-like brain circuitry, to care about it.
Ok then how about indirectly pointing to human values? Aka: for a particular human, the AI has a model of that human, and can imagine how the human would evaluate plans. So instead of the AI directly evaluating what is right the way humans do it, we point the AI to use its model of humans to evaluate plans.
This indirect target does have much lower complexity than directly specifying the things we care about, and thereby does seem more feasible, but there’s some nuance needed. Humans have both reflectively endorsed values and mere urges. We want the AI to care about the values we reflectively endorse, rather than to feed us superstimulating movies that trigger our addiction-like wanting. And of course the pointer to “what humans want” would need to be specified in a way that doesn’t allow the AI to manipulate us into wanting things that are easier for the AI to fulfill.
Furthermore, we don’t know yet how our values will generalize. We have preferences in the here and now, but we also have deep patterns in our mind that determine what we would end up wanting when we colonize the galaxies. We don’t know yet what that may be, but probably lots of weird and wonderful stuff we cannot comprehend yet.
We might even have wrong beliefs about our values. E.g. past societies might’ve thought slavery was right, and while maybe some people in the past simply had different values from us, some others might’ve changed their mind if they became a bit smarter and had time for philosophical reflection about the question.
And of course, we need the AI to make decisions over questions that humans cannot understand yet, so simply simulating what a human would think doesn’t work well.
Ok, how about something like “imagine how a human would evaluate plans if they were smarter and moved by reflectively endorsed desires”?
Yeah we are getting closer, but the “if they were smarter” seems like a rather complicated counterfactual. There may be many ways to extrapolate what a mind would want if it was smarter, and the resulting values might not be the same in all extrapolation procedures.
One approach here is to imagine multiple extrapolation procedures, and act based on where the extrapolations agree/cohere. This gives us, as I understand it, the coherent extrapolated volition (CEV) of a single human.
Not all humans will converge to the same values. So we can look at the extrapolated values of different humans, and again take the part of the values that overlaps. This is the CEV of humanity.
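As a toy illustration of just the "take the overlap" structure described here: preferences are crudely modelled as sets of endorsed statements, and the extrapolation procedures (the genuinely hard part) are simply assumed as hypothetical inputs.

```python
# Toy sketch of the overlap structure of CEV. `extrapolation_procedures` are
# hypothetical functions mapping a person to a set of endorsed preferences.

def coherent(preference_sets):
    """Keep only the preferences on which all given sets agree."""
    result = set(preference_sets[0])
    for s in preference_sets[1:]:
        result &= set(s)
    return result

def cev_of_person(person, extrapolation_procedures):
    # Overlap across different ways of extrapolating one person's volition.
    return coherent([proc(person) for proc in extrapolation_procedures])

def cev_of_humanity(people, extrapolation_procedures):
    # Overlap across the extrapolated volitions of different people.
    return coherent([cev_of_person(p, extrapolation_procedures) for p in people])
```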
The way I understand it, CEV isn’t crisply specified yet. There are open questions of how we may try to reconcile conflicting preferences of different people or different extrapolation procedures. And we might also want to specify the way a person should be extrapolated to a smarter version of itself. Aka something like slowly becoming smarter in a safe environment without agents trying to make them arrive at some particular values, where they can have fun and can take their time with philosophic reflection on their values.
My read is that CEV is often used as a placeholder for the right indirect value specification we should aim for, where the detailed specification still needs to be worked out.
As you can probably see, CEV is a rather complex target, and there may be further difficulties in avoiding value-drift as an AI becomes superintelligent, so we likely need significantly more advanced alignment methods to point an AI to optimize CEV.
How much earlier nice AIs can help us solve this harder problem is one of the things we will discuss later in this series.
2.9 Conclusion
Whoa that was a lot, congrats for making it through the post!
Here’s a quick recap of the problems we’ve learned of:
- Humans make predictable mistakes in giving reward. Thus, predicting what will actually lead to reward or very close correlates thereof will be more strongly selected for than niceness.
- Niceness may be less simple than you think.
- The concepts in which an AI reasons might be alien, and it may learn some alien kludge rather than the niceness concepts we wanted.
The key problem here is that while the AI learned values that mostly add up to useful behavior on the controlled distribution, the reasons why it behaves nicely there may not be the good reasons we hoped for. So if we go significantly off distribution, e.g. to where the AI could take over the world, it will take actions that are highly undesirable from our perspective.
And then there’s a fourth problem that even if it is nice for good reasons, many kinds of niceness look like they might break when we crank up intelligence far enough:
- Niceness that generalizes correctly to superintelligent levels of intelligence requires something like CEV, which is especially complex.
Questions and Feedback are always welcome!
See also “A Closer Look at Before and After”. Furthermore, even if the AI doesn’t immediately take over the world when it is sure it could, it could e.g. be that the alignment properties we got into our AI weren’t indefinitely scalable, and then alignment breaks later. ↩︎
which actually isn’t a single response but a probability distribution over responses ↩︎
That’s not to say that model-based RL solves all problems of having an AI with a goal slot. In particular, we don’t have good theory of what happens when a model-based RL agent reflects on itself etc. ↩︎
I’m using “aspects” instead of “features” here because “features” is the terminology used for a particular concept in mechanistic interpretability, and I want “aspects” to also include potential other concepts where we maybe just haven’t yet found a good way to measure them in neural networks. ↩︎
There’s also a different kind of reward seeking where the AI actually cares about something else, and only predicted reward for instrumental reasons like avoiding value drift. This will be discussed in more detail in the next 2 posts. ↩︎
For people we actively dislike, the reward can be inverted, aka positive reward when they are sad and negative when they are happy. ↩︎
Of course, the negative reward is even much stronger when we are actually in the situation where someone finds out. But it appears that even in cases where we are basically certain that nobody will find out, we still often imagine that our peers would disapprove of us, and this still triggers negative reward. Basically, the reward function is only a primitive mind-reader, and doesn’t integrate probabilistic guesses about how unlikely an event is into how much reward it gives, but maybe rather uses something like “how much are we thinking about that possibility” as a proxy for how strongly to weigh that possibility. ↩︎
That doesn’t mean there needs to be a crisp boundary between trees and non-trees. ↩︎
Just thinking about the AI learning “helpfulness” is of course thinking at too high a level of abstraction and may obscure the complexity here. And it could also turn out that helpfulness isn’t a crisp concept - maybe there are different kinds of helpfulness, maybe each with some drawbacks, and maybe we confuse ourselves by always imagining the kind of helpfulness that fits best in a given situation. But it doesn’t matter much for the point in this section. ↩︎
Which potentially includes the AI reasoning through philosophical dilemmas. ↩︎
Such considerations are difficult. I did not do this one. It’s conceivable that it would generalize like in the optimistic vision, but it could also turn out that it e.g. doesn’t robustly rule out all kinds of manipulation, and then the AI does some helpful-seeming actions that manipulate human minds into a shape where the AI can help them even more. ↩︎
2025 Letter
I wrote a letter this year about 2025. It's about acceleration, poetry, how it's been the most eventful year of my life, and how I am excited and scared for the future. Crossposted from my substack.
Letter
I want to tell you a story about 2025. As I bump along today and approach 21 on into the new year, in a van riding from Burgundy to Paris, and I stare at the small hills, the snow inscribed against the mud like frosted chocolate, extending down into the highway and then melting over into the warm grass on the south side -- I feel an urge to share with you, share this feeling flaring in my spine, of sitting and eating the bread of my youth and imagining it and its associated customs withering in my mouth, I feel an urge to imagine now the world hidden up against the stars, the whole earth green, or black, studded with steel platforms, imagine now what it might feel like for us to live there and what we might hold on to in that place.
I want to tell you a story about the world, about my life, and maybe yours, about 2025, about silicon wafers arranged like helices all the way up into the sky, about the mountains that rise higher where men are made, and the rivers and the cities and how this is the year I've gone through change at a pace to match that of the world's, finally just about a simple boy, learning to be not so simple, learning to imagine a world we might be happy to live in, as we rush along an era of transformation started before his years.
It starts in January, in Boston, where many stories seem to start but rarely end. It starts, again with the snow, lying in heaps on the river Charles where it covers the ice and then the water. I am on the 11th floor of an office, not having seen much sunlight or colors really, and staring at this pure and clear stripe of white cutting between Boston and Cambridge, and it entices me. So I go down there and onto one of the bridges crossing it, and it is night-time now, and I stare at the expanse and throw a little ball of icy snow with all the weight carried into my arm and shoulder, and watch it land and crack and slide meters out into the distance. The year is begun.
Deepseek has just released their cheaper reasoning models, starting an internet craze. Reasoning models are on my mind. My friends and I have visions of scale. Of inference time compute measured in human years, and what it might mean for the world, when these robot minds can run faster than our flesh, and what humans can build to keep observing that reality. We began to broaden our horizons, narrow our selves into the shapes that might bring us answers. We worked hard, till the late hours of the night in those offices, and then we drove in the snowy suburbs and kept thinking.
How can we measure the long horizon abilities of models as they complete tasks with more and more turns, and memory schemes, and agent orchestration, etc...? METR later released a good answer to this, and in the meantime we worked on ours. How can we allow models to mediate their own oversight? We wrote a whole paper just in January about training models to legibly represent systems in natural language. But then the ice started to crack beneath our feet, and when we looked underneath to see what was there, we found a bigger, noisier world to grab our attention.
I was frustrated last year. I was working hard but failing to find my meaning. I was looking for a change. I had another free month before my 6th semester at MIT, doing an exchange in Paris, and I decided to travel and do research. But first I went to Taiwan to contemplate the Earth and its transformations up in the mountains. I taught curious highschoolers about neural networks. I wrote and considered what aesthetics will bring about the future. I talked about dreams, and we sat on wooden sheep and stared at the wisps against the rocks and imagined their shapes solidified. I went to Taipei, tasted sweet potato and sesame and for a few hours felt the city move as I followed its slanted curve and its people told me about their worries at the top of an abnormally large tower looking down on the world, an edge jutting out into the sky, nestled between forest and concrete.
And then the time was up again, and I kept moving. I went to Japan, this time excited to have no purpose and less friends. I met and travelled with new people, across Osaka and its silent castles at night, into Nara and its garden of sitting rocks and deer. What a beautiful world. I raced to Kyoto, and then biked across to the bamboo forest at its outskirts. The bamboos rose like poles layering the darkness, towering above me as if wrapping against my own wobbly limbs. Kyoto is special. The bikers oscillate between the road and the sidewalk, the ground lurches up onto the hills and the temples, where you can look out onto the whole city and its river. It is quiet and more soothing than Tokyo. In a sento (artificial hotspring) I went to with a man from Austria, I met a Frenchman, and then a man from Hong Kong, and then Vietnam, and obviously the Japanese. In English, broken Japanese, and French, we talked about the places we were from, and what people liked to do there, all of us sitting naked, the water opening up our pores and minds.
New AI models came out, optimized for tool use, as did research on the reliability of model reasoning (OpenAI, Anthropic). What affordances do we have to understand the reasoning and process of the machines we gradually outsource our labor to? And then, what levers can allow us to keep our institutions and values in sway? Gradual disempowerment pondered how humanity could go out softly with a whimper, under the roiling mass of a world optimized by creatures we no longer understand. No longer human. In Hakone, I met a kind stranger who brought me to the most beautiful hotspring and brought me from cold water to hot and then to cold again, and I felt oh so very human. And grateful. And then it was time to leave, this time for San Francisco. On the plane I read No Longer Human about a man who failed to convince himself of his own humanity, and lived his life as an unending self sabotage. Its extremity moved me and urged me towards openness.
After going to Japan in search of beauty and silence, California was to find unrest, find the coals for a fire that could host our ideas as we jumped away from college and into the living machine of AI. We spent our days ubering or waymoing across its hills, meeting all kinds of organic and artificial lives: the entrepreneurs walking on the quicksand of an ever changing industry, the AI researchers seeking talent, the worried policy advocates and all the rest forming a diffuse mass that simply represented our unknown future staring down at us, as if 1000 doors had suddenly opened without us having time to look through them. We did a hackathon, organized by a company in the business of distilling human flesh into data and into intelligence, and we called our project beluga, and did research on how allowing reasoning models to use explicit programmatic abstractions boosted their ability to search and plan in combinatorial games. We worked till the lack of sleep made us stupid, and resolved to go up a mountain if we won. We were out and about at the edge. I got closer to some of the city's people, who had held on their maybe naive seeming love of the world, but also knew the rules of the game being played here.
Finally, the plane was boarding again, this time to France. I was to spend a few months there again, the longest since I left for college at 17, and study at one of its schools as I enjoyed the city and a change of pace, and figured out what I wanted to work on. But SF had already given me fire to work with, and I was halfway there. I wanted to see if I would live there. Paris is my favorite place to walk, along the quays, staring across at the gilded buildings and ancient amulets of a world now basking in its own glory. In Paris I felt again how much people could appreciate their lives, without necessarily doing anything, as I walked all along and ate the best breads, and met people who understood me and where I came from, and watched with them new movies that moved me. After my americanized life, Paris elicited old dimensions that I missed, my affinity for an intellectual heritage that had been reified, that was clean and orderly and delineated, with its catalogue of white and red Gallimard books, and its vast vocabulary of reference and images, often springing out of nowhere like a flood, and the lyricality of its poems. I felt the ease with which a man can jump into abstractions, when in Paris. And I hold on to all of these dearly, but Paris is not the time or place for me just now. Maybe in a few years, but right now it is too closed to me, too slow to catch up. Keats declares that mastering and holding negative capability - having the ability to live with contradictions - is the mark of a first class man. In 2025 I learned to do that a bit more than in the past. One of my dearest friends gave me A Moveable Feast, by Hemingway, and it accompanied me as I walked along the city, and wrote and ran experiments in its gardens, my favorite being the Luxembourg gardens that were the crib of my youth, as I watched its cute toy boats dawdling along the fountains.
I was also surprisingly alone, sometimes, in my school, being the only exchange student, only man with long blond locks in a crowd of well shaved and trimmed men who were deep in a culture I could no longer monomaniacally commit to, who had been reared to the rhythm of the prep schools and the importance of their culture and their accomplishment. But I greeted my loneliness, except insofar as it felt like a failure, and I read and explored and worked quite hard. In March, right before we were submitting a paper, our research machine was accidentally destroyed, and we all scrambled to recover all our plots in time for the deadline. I was beyond myself that night, but in the end we made it work, and I fell sick for a while. I had some unresolved tension with Paris and its people, and these months allowed me to heal my way through it, but not without difficulty. I feel like I can raise my head higher now, and stare at these cultures with clarity. I am excited to move forward, without forgetting the world to be made and the world that stands heavy and complete before me. Again, what a beautiful world, and what a beautiful thing to live the spring in Paris, when the trees in the park regain their leaves and walking in the night feels softer, how pleasant to walk the night across the water and go climbing in the rain.
In May, I felt called to San Francisco. I called Kaivu many times, and we talked about our research ideas, what we wanted to put into the world, meta science, quantum mechanics, natural evolution and the process of science, and considered where we wanted to do our best work. We both felt ready to put our soul into something. We decided machine learning is a soil science, and the problems we want to solve need data, need to engage with the roiling mass of human society and activity and markets. It was time to start an expedition.
I flew there. Maybe because of how different and special each place I visited was for me this year, each flight was a condensation of intensity, as I recalled and prepared for my next leap forward. I furiously jotted down in my notebook, what I felt from Paris, and what I wanted to make in San Francisco. For the summer, I moved in with a group of friends in a house we called tidepool. We learned the best orders at in-n-out, we went to Tahoe, and some of us started companies. We talked about the city, about machine learning and what we wanted to work on. It was a good time. It was my first time living in San Francisco. The nature is beautiful, the air is rife with anticipation, but it is also sometimes a bit too much. The city was torn by rampant inequality, and people struggling to keep control of their own limbs, faster than the other people trying to build them new ones. I am wary of its digital fetishization, fetishization of the things that I am close to. I am wary of when things become performance rather than play, and warier even when the play concerns the design of intelligent machines, as playful as they are.
Starting a company is a great challenge, and being in San Francisco is a great place to learn how to do it. We learned about what kinds of products and trades happen in Silicon Valley, and how we could fit our ideas into products into those gaps. Doing research well seems to be about picking some important portion of reality, and closing in on it ruthlessly, always asking yourself which of your assumptions is the weakest, and then making it real. But you can mostly choose your object and reorient very fast, because your environment is quite simple, you and the science. But in a company there is an insane amount of inputs -- customers, investors, what people want, your brand, who is talking about you, etc... and every day there are 10000 things you could do to interact with all these players and you need to pick the strategy. Both of them require the same ruthlessness and attention to detail, and this year has taught me about both. I am learning to love this place.
Many things in life require a great deal of conviction. For most of my life I have been able to pull through because of my natural endless supply of curiosity and fascination with the world. But sometimes that is not enough, because that love is not always sharp enough to discriminate. This year, I made progress in choosing. Maybe because starting a company can be so stressful, and requires so much belief, I was forced into reckoning with my uncertainties and committing to what had to be done, if I wanted to do anything at all. One day in the summer I went to Land's End, a beautiful place on the coast of SF, near golden gate park, with a friend from Boston and we stared at the waves crashing into the rocks, and in the floating sunlight as the wind crashed through our hair we talked about reason and emotion, about learning to listen and not suppress your gut telling you what you really want to do. In 2025, I am getting better at listening to it, before someone else tries to force-implant me an artificial one.
Fulcrum worked out of our house, alternating days of furious coding and then vagabonding across the city. I started using Claude Code around end of May for a hackathon, and was amazed. Anthropic's release brought agents from the domain of research into practice, and I began driving them daily. As I worked on our products, I thought about how humans might interact with agents, and what kinds of technology could leverage the asymmetric abilities of humans and AIs. How to delegate, and orchestrate models, and what infrastructure might allow us to distribute our labor beyond our current capacity for attention. Based on these, I built a few open source prototypes on the future of coding. We also made a system to precisely observe and understand both what your AI model is doing, and what your evals are measuring. Understanding evals is the place to start with model oversight, ie using models to understand and control other models. We had many hesitations on what could work, and what kind of company we could build, but we laid the seeds of our now firm conviction. We got resources, gathered more people, and are building the ship to carry us up into the stars. This year, we publicly launched our evaluations tool and platform for running and debugging agent systems. We will be releasing much more soon. We want to build the technology the future will need, with full freedom, and the people we love working with. I am very happy about it, and hope we can execute on the ideas that will matter. In the nights, which were often short, due to the incessant ambulances and noise of our neighborhood, I often wrote, or read. I read The Road, and enjoyed its short prose that jumped to evocative and airy images, and built up a wasteland of cannibals and hunger and the nature dying with the men, as a child and his father make their way through the defunct continent.
I took a cab one day from San Francisco to Berkeley to meet some customers, and the driver was a man named Augustine from Nigeria. I chatted with him for the whole ride, and he told me about how he came to America in 1991, how he was shipped off to marry someone, how the valley has changed and grown colder, and how when he first came here he went to the park and sat in the dirt and imagined spirits, urging him on, giving him a strength that carried all the lives of the dead and living who make their bread in that place. He gave me advice for my new life. He told me to keep going on as I was, and urged ominously that I should make sure to remember him in my paradise.
In the fall I alternated between SF and Boston, having to wrap up some final responsibilities of my time as a student. I visited pika, the house in which I've been living for the past while and that I moved out of in January 2025. Pika is a miracle of coordination - feeding everyone with a public mealplan where people cook together, which I ran in January, and providing them with a warm, well organized home I was very glad to call mine. I will miss my late nights there eating snacks with other pikans, and watching movies in the basement, or cooking for all my friends. I also revisited East Campus, my other home at MIT. I danced with my friends there, I looked at my old room, I got nostalgic. I will remember the dreamy warmth of these communities, their openness, the way they have the agency of SF without the single-mindedness, the machine shops where someone with dyed hair is always up building something new, maybe a radio system, a motorized shopping cart, a new LED display for the parties. These places made me, and I will carry them with me. I said my goodbyes. I went climbing again, with another friend from Boston, and we talked about writing and poetry, about why we wrote, about abstractions and whether they had their place in art, whether a poem has to be constructed or felt, written for yourself or for others, and then we kept climbing. I read Valéry. The same friend gave me the book Oblivion by David Foster Wallace, and its stories inspired me with their detail, the attention given to worlds that could not exist, that were conjured as precisely as if describing some kind of ridiculous, absurd alternate reality, that had been felt and lived. I paid more attention to things, and tried to write things that were more concrete. I went to a play that inspired me, and I started paying more attention to people and their faces, and the way I moved my own body.
In December, we launched our latest products, finalized decisions for the research internship we are running in January, and shipped all of our final remaining belongings from Boston to SF, as well as getting a new office. We have learned so much this year, and we are excited to show you what we can do.
I have deep gratitude for 2025. It was a year of great joys and great pains — a year like dark metal, melted and annealed again and again, moving from fluid to form and into strength. Its transformations etched a whole world into me. The forge keeps hammering. 2026 has begun, and we live in a period of rapid change.
I hope we remember each other in our assorted paradises, whatever pain or joy they bring us.
Lists of the year
Writing
I wrote more this year! I have two substacks now, one for technical takes and one for more personal writing.
I did some technical writing, for/with fulcrum and on my own:
- Dense reconstruction is the scaffold of machine learning on generalization and what we can learn in ML
- AI agents and painted facades on evals and model oversight [fulcrum]
- Personalization requires data
- AI takes some thoughts and predictions from May
More personal essays and poetry:
- Machine has no mouth and it must scream
- 2024 in review
- l imit [poetry]
- into the unknown
- out and about at the edge
- emulsion [poetry]
- what do you see now?
I also wrote a few more poems I haven't put up yet. I hope to keep writing in balance with my work.
Things I want to write about, if you're interested:
- Alignment as capabilities
- Personalization and gradual disempowerment
- Emotions as integrators
- Concreteness and abstraction in writing
- Towards an aesthetics for cyborgs
And other things, I'm sure.
Books
Great
- The Things They Carried by Tim O'Brien
- Twice Alive by Forrest Gander
- Oblivion by David Foster Wallace
- The Road by Cormac McCarthy
- A Moveable Feast by Ernest Hemingway
- A Portrait of the Artist as a Young Man by James Joyce
- The Bluest Eye by Toni Morrison
Good
- Never Let Me Go by Kazuo Ishiguro
- Impro: Improvisation and the Theatre by Keith Johnstone
- Talking at the Boundaries by David Antin
- The Unaccountability Machine by Dan Davies
- On the Motion and Immobility of Douve by Yves Bonnefoy
- No Longer Human by Osamu Dazai
- The Baron in the Trees by Italo Calvino
- Notes from Underground by Fyodor Dostoevsky
- Elon Musk by Walter Isaacson
Check out my goodreads for more info, I will review some of these soon. I had a lot of hits this year!
Movies
Also on letterboxd.
Great
- Paths of Glory
- Certified Copy
- Ma nuit chez Maud
- La collectionneuse
- Synecdoche New York
Good
- Parasite
- Betty Blue
- Wake up dead man
- Perfect Blue
- The color of pomegranates
Okay
- I, Tonya
- The cabinet of Dr Caligari
In random order:
- https://www.dreamsongs.com/RiseOfWorseIsBetter.html
- https://rowanhuang.com/takes/2025/03/08/capitalism.html
- https://www.theparisreview.org/interviews/6442/the-art-of-biography-no-5-robert-caro
- https://nabeelqu.co/on-reading-proust
- https://beatinpaths.com/2024/09/13/the-great-american-novel-project-explained/
- https://writetobrain.com/olfactory
- https://reactionwheel.net/2024/09/resignation-letter.html
- https://andrewwu.substack.com/p/why-music-a34
- https://walfred.substack.com/p/some-victorious-answer
- https://www.goodfire.ai/blog/you-and-your-research-agent
- https://si.inc/posts/the-heap/
- https://docs.google.com/presentation/d/1qVFDW8qT4CC4E_2TSVevrDbZ_Z9Utu_I1z0-ISLwZts/edit?slide=id.g37403db1f39_0_96#slide=id.g37403db1f39_0_96
- https://gwern.net/ai-daydreaming
- https://calv.info/openai-reflections
- https://skincontact.substack.com/p/21-observations-from-people-watching
- https://assets.stripeassets.com/fzn2n1nzq965/2pt3yIHthraqR1KwXgr98U/b6301040587a62d5b6ef7b76c904032d/Stripe-annual-letter-2024.pdf
- https://www.youtube.com/watch?v=3xIVCRoNCXg
- https://www.ettf.land/p/30-reflections
- https://aella.substack.com/p/bye-mom?utm_source=share&utm_medium=android&r=fa4z9&triedRedirect=true
- https://www.warrenzhu.com/sentences/
- https://open.substack.com/pub/noahpinion/p/the-ai-bust-scenario-that-no-one?utm_source=share&utm_medium=android&r=fa4z9
- https://www.yudhister.me/intentional-hobbling/
- https://www.thetedkarchive.com/library/c-p-snow-the-two-cultures
- https://www.avabear.xyz/p/is-friendship-romantic
- https://substack.com/home/post/p-179505702
- https://yourpublicuniversalfriend.substack.com/p/leave-your-boyfriend-a-short-story
- https://samkriss.substack.com/p/numb-at-burning-man
- https://voxpopulisphere.com/2024/10/25/zbigniew-herbert-the-envoy-of-mr-cogito/
- https://www.poetryfoundation.org/poems/52171/orpheus-alone-56d2306dd3444
- https://substack.com/home/post/p-150188028?source=queue Reflections on palantir
- https://www.warrenzhu.com/hci/2025/09/22/homo-faber-or-what-i-want-to-do.html
- https://nautil.us/when-einstein-tilted-at-windmills-236253/
- https://www.lesswrong.com/posts/JH6tJhYpnoCfFqAct/the-company-man
- https://www.newyorker.com/magazine/2019/02/18/deaf-republic
- https://mindslice.substack.com/p/alignment
- https://www.gleech.org/rats-and-trads
- https://www.leonardtang.me/blog/turning-23
- https://www.poetryfoundation.org/poems/47553/meditation-at-lagunitas
- https://www.alicemaz.com/writing/minecraft.html
- https://www.fantasticanachronism.com/p/the-alchemist-and-his-quicksilver
- https://yuxi-liu-wired.github.io/essays/posts/cyc/
- https://samkriss.substack.com/p/born-in-the-wrong-generation
- https://joecarlsmith.com/2020/11/22/the-impact-merge
- https://tsvibt.blogspot.com/2023/02/please-dont-throw-your-mind-away.html
- https://www.newyorker.com/magazine/2018/12/10/the-friendship-that-made-google-huge
- https://www.deseret.com/2022/8/22/23309244/cole-summers-died-newcastle-utah-warren-buffett-charlie-munger-bari-weiss-unschooled/
- https://mitadmissions.org/blogs/entry/learning-how-to-be-a-human-being-not-a-human-doing/
- https://rottenandgood.substack.com/p/taking-our-chances?r=51mrw&utm_campaign=post&utm_medium=web&triedRedirect=true
- https://kevinmunger.substack.com/p/in-the-belly-of-the-mrbeast
- https://topos.institute/blog/2024-08-27-plausible-fiction/
- https://www.nytimes.com/1999/09/05/arts/music-an-instant-fan-s-inspired-notes-you-gotta-listen.html
- https://winstonchurchill.hillsdale.edu/winston-churchills-dream-1947/
- https://epoch.ai/gradient-updates/movarec-s-paradox
- https://vinay.sh/i-am-rich-and-have-no-idea-what-to-do-with-my-life/
Discuss
Debunking claims about subquadratic attention
TL;DR: In the last couple years, there have been multiple hype moments of the form "<insert paper> figured out subquadratic/linear attention, this is a game changer!" However, all the subquadratic attention mechanisms I'm aware of either are quadratic the way they are implemented in practice (with efficiency improved by only a constant factor) or underperform quadratic attention on downstream capability benchmarks.
A central issue with attention is that its FLOP complexity is quadratic in the context length (number of tokens in a sequence) and its memory complexity during inference is linear in the context length. In the last couple years, there have been multiple claims, and hype around those claims, that new architectures solved some (often all) of those problems by making alternatives to attention whose FLOP complexity is linear and/or whose memory complexity during inference is constant. These are often called subquadratic/linear attention (as opposed to regular attention which I’ll call quadratic attention). The ones I’m aware of are Kimi Linear, DeepSeek Sparse Attention (DSA), Mamba (and variants), RWKV (and variants), and text diffusion. If this were true, it would be a big deal because it would make transformer inference a lot more efficient at long contexts.
In this blogpost, I argue that they are all better thought of as “incremental improvement number 93595 to the transformer architecture” than as “subquadratic attention, a more-than-incremental improvement to the transformer architecture”. This is because the implementations that work in practice are quadratic and only improve attention by a constant factor, while the subquadratic implementations underperform quadratic attention on downstream benchmarks. I think some of them are still important and impressive - for instance, Kimi Linear’s 6.3x increased inference speed at 1 million token context lengths is impressive. I just argue that they are not particularly special among incremental improvements to the transformer architecture and not game changers.
- Kimi Linear and DeepSeek Sparse Attention (DSA) are actually quadratic as they are implemented in practice in the models that Kimi and DeepSeek trained using them. In Kimi Linear’s case, this is because they only use Kimi Linear on ¾ of the layers and use MLA, which is quadratic, on the remaining ¼ of the layers. They do not use Kimi Linear on all layers because it degrades downstream benchmark performance too much. In the setting where improvement is biggest (inference with a context length of 1M tokens) the improvement is 4x in terms of KV cache size (memory) and 6.3x in terms of inference speed. There is also a modest improvement in downstream benchmark performance. DSA does not reduce KV cache size but decreases per-token cost by a bigger factor of about 3x (prompt) and 7x (output) at the maximal context length of 128k tokens. It is still quadratic.
- Kimi are very clear about this in the paper and say everything I said here in the abstract. However, some people (not from Kimi) still hype Kimi Linear as subquadratic attention, which is why I included it here. Kimi is not to blame here and wrote an excellent paper.
- This is clear after a careful reading of DeepSeek’s paper, though DeepSeek emphasizes this less than Kimi.
- Mamba and RWKV do actually have a linear FLOP complexity and constant memory complexity during inference. However, while they perform comparably to attention in small to medium size models, they seem to underperform attention in terms of downstream benchmark performance on frontier scale models and are not used in frontier LLMs. My main reason for believing this is that I do not know of any frontier LLM that uses them, except for Mamba-attention hybrid models - models that have Mamba on a fraction of layers and quadratic attention on the other layers, see appendix for why this is still quadratic. Some papers on frontier Mamba-attention hybrid models do preliminary analysis comparing pure Mamba and Mamba-attention hybrid models. When they do, they usually say that pure Mamba models underperformed hybrids and that this is why they stuck to hybrid architectures. This provides empirical validation that pure Mamba underperforms hybrid architectures. A few 7B models do use pure Mamba and their papers find that it is as good or even a bit better than quadratic attention on downstream capability benchmarks. For example Codestral Mamba. However, the overwhelming majority of 7B models still use quadratic attention.
- While text diffusion models can greatly reduce memory usage by eliminating the need for KV caches entirely, they do not reduce the FLOP usage. In fact, they multiply the number of FLOPs needed for inference by a constant factor. Furthermore, same as for pure Mamba, no frontier model uses text diffusion and only a small number of sub-frontier models use it.
- There exist many incremental improvements that reduce FLOP and/or memory usage of attention by a constant factor that are not derived from, or related to, subquadratic attention. A probably non-exhaustive list of such improvements no one claims are subquadratic attention is: flash attention, Grouped Query Attention (GQA), sliding window attention (on some but not all layers), sparse attention, Multi Latent Attention (MLA), and making MLPs wider and attention narrower.
Mamba and RWKV
Mamba and RWKV are entirely different mechanisms from attention that can be thought of as (much) better RNNs. They are actually subquadratic (in fact, linear), but they seem to underperform attention at frontier LLM scale, as argued above. Mamba-attention hybrids do scale, but they are quadratic, for the same reason explained below for Kimi Linear.
Kimi Linear
Similar to Mamba and RWKV, Kimi Linear can be thought of as a (much) better RNN and it does actually have a linear FLOP complexity and constant memory complexity during inference. However, as said in the Kimi Linear paper, they use Kimi Linear on ¾ of layers and Multi Latent Attention (which is quadratic) on the remaining ¼ of layers. They say in the paper that when they tried using Kimi Linear on every layer, the hit to performance from doing this was too big:
Despite efficiency, pure Linear Attention still struggle with precise memory retrieval and exact copying. This deficiency hinders their adoption in industrial-scale LLMs where robust long-context recall (e.g., beyond 1M tokens) and reliable tool-use over extensive code repositories are critical.
And:
For Kimi Linear, we chose a layerwise approach (alternating entire layers) over a headwise one (mixing heads within layers) for its superior infrastructure simplicity and training stability. Empirically, a uniform 3:1 ratio, i.e., repeating 3 KDA layers to 1 full MLA layer, provided the best quality–throughput trade-off.
Thus, Kimi Linear as done in practice reduces the FLOPs and memory used by the attention mechanism to roughly the fraction of layers that still use quadratic attention - ¼ in the paper’s case, i.e., a roughly 4x reduction (the reduction is smaller at shorter context lengths).
(Note on why the improvement in speed is 6.3x, bigger than 4x, at a context length of 1 million tokens: in addition to making attention faster by a factor of almost 4x at large context lengths, Kimi Linear shrinks the KV cache by a factor of almost 4x, which allows bigger batch sizes (by a factor of almost 4x) and thus faster inference beyond the 4x improvement in attention FLOPs.)
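To make "still quadratic, just with a smaller constant" concrete, here is a minimal sketch of attention cost for a pure quadratic model versus a 3:1 linear/quadratic hybrid. The layer count and model width are made-up illustration values, not Kimi Linear's actual configuration, and projections and other costs are ignored.

```python
# Toy model of attention FLOP scaling: a 3:1 linear/quadratic hybrid is still
# O(n^2) in context length n, just with ~1/4 the quadratic constant.
# All sizes below are made-up illustration values.

def attention_flops(n_tokens, n_layers, d_model, quadratic_fraction):
    """Rough attention FLOPs per forward pass.

    Quadratic layers cost ~n^2 * d each; linear-attention layers cost ~n * d
    (constant-size state per token). Projections etc. are ignored.
    """
    quad_layers = n_layers * quadratic_fraction
    lin_layers = n_layers * (1 - quadratic_fraction)
    return quad_layers * n_tokens**2 * d_model + lin_layers * n_tokens * d_model

for n in (8_000, 64_000, 1_000_000):
    pure = attention_flops(n, n_layers=48, d_model=4096, quadratic_fraction=1.0)
    hybrid = attention_flops(n, n_layers=48, d_model=4096, quadratic_fraction=0.25)
    print(f"n={n:>9,}  pure/hybrid attention FLOP ratio = {pure / hybrid:.2f}")
# The ratio approaches 4 as n grows: a constant-factor saving, not a change in complexity.
```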
DeepSeek Sparse Attention (DSA)
DSA was introduced in the DeepSeek V3.2 paper and DeepSeek V3.2, a frontier model, uses it. It works in the following way:
- At each layer, the lightning indexer, which is a modified attention mechanism, chooses 2048 positions.
- A regular Multi Latent Attention (MLA) mechanism only attends to those positions.
Thus, DSA’s FLOP complexity has two components: the lightning indexer has (up to a constant) the same complexity as regular MLA (which is quadratic), and the subsequent MLA has complexity min(context_length**2, 2048 * context_length), which is linear at big context lengths.
So if the lightning indexer is in practice hugely cheaper than the subsequent MLA, the complexity is linear, but if it is only cheaper by a small constant factor, the complexity is still quadratic, just smaller by a small constant factor.
And the theoretical FLOP usage of the lightning indexer is only smaller by a factor of 8, so complexity is still quadratic (at least in terms of theoretical FLOP usage). Here is the calculation that leads to 8: first, n_heads * d_head of the lightning indexer is half that of n_heads * d_head of the subsequent MLA. This is not written in the paper, but can be seen by inspecting the model’s config on HuggingFace. Then, the lightning indexer only has keys and queries, no values and outputs, so that’s another factor of 2. Finally, the lightning indexer is in FP8, not FP16, which is another factor of 2.
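As a sanity check, here is that factor-of-8 arithmetic spelled out, together with the resulting per-token cost ratio at a few context lengths. This is my own back-of-the-envelope reading of the paper and the HuggingFace config, counting FLOPs only; it will not exactly match the paper's dollar-cost figures, which fold in prefill/decode overheads.

```python
# Ratio of lightning-indexer FLOPs to the subsequent MLA's, using the three
# halvings from the text (each taken at face value).

heads_times_dhead_ratio = 0.5   # indexer's n_heads * d_head is half of MLA's
qk_only_ratio = 0.5             # indexer has queries and keys, no values/outputs
fp8_ratio = 0.5                 # indexer runs in FP8 instead of FP16

indexer_vs_mla = heads_times_dhead_ratio * qk_only_ratio * fp8_ratio
print(f"indexer cost = {indexer_vs_mla:.3f}x MLA cost  ->  ~{1 / indexer_vs_mla:.0f}x cheaper")

# Crude per-token attention work at position n (in "MLA units"):
#   full MLA:  n                          (attend to all previous tokens)
#   DSA:       n / 8 + min(n, 2048)       (indexer over all tokens + MLA over 2048 of them)
for n in (8_000, 32_000, 128_000):
    full_mla = n
    dsa = n * indexer_vs_mla + min(n, 2048)
    print(f"n={n:>7,}  full MLA / DSA cost ratio = {full_mla / dsa:.1f}x")
# The ratio saturates near 8x at large n: still quadratic overall, smaller constant.
```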
For prefill (prompt) tokens, this calculation matches DeepSeek’s empirical findings: figure 3 in the DeepSeek V3.2 paper shows that the slope of cost (in dollars) per token as a function of position in the sequence is about 8x smaller than for MLA at big context lengths. For decoding (output) tokens, the slope is about 20x smaller, not 8x, but this is still a constant factor improvement. The improvements in per-token cost for the token at position 128k are 3.5x for prefill tokens and 9x for decoding tokens (if you look at the average token at context length 128k and not only at the last one, they go down to 3x and 7x). Note that in November 2025 (the latest date for which data is available as of writing this blogpost), OpenRouter processed 8x more prompt tokens than output tokens.
Furthermore, DSA does not reduce the KV cache size (because the 2048 tokens it attends to are different for every generated token and only known when that token is generated). This matters because one of the main ways subquadratic attention helps capabilities is by reducing KV cache size, which increases inference speed by allowing bigger batch sizes during inference (thus making inference cheaper) and allows longer context lengths, since more tokens’ worth of KV cache fit per gigabyte of GPU memory.
Text Diffusion
Autoregressive LLMs (that is, all LLMs except for text diffusion LLMs) generate output tokens one by one in sequence, doing one forward pass per output token. A text diffusion LLM generates all the tokens at once in a single forward pass, but leaves X% of tokens blank. Then, it generates tokens in place of Y% of the blank tokens, also in a single forward pass. It repeats this a fixed number of times, after which no blank tokens remain.
Thus, while text diffusion eliminates the need for KV caches, it multiplies the FLOP usage on output tokens by a constant factor - the number of forward passes needed until no blank tokens remain.
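To see the constant factor explicitly, here is a toy comparison counting FLOPs in units of "one forward pass over one token". The denoising-pass counts are made up for illustration, and the sketch ignores the GPU-utilization effects discussed just below.

```python
# Illustrative comparison of output-token FLOPs (in "token forward passes").
# Assumes a forward pass over k tokens costs k units; prompt processing,
# KV-cache reads, and hardware utilization are ignored.

def autoregressive_cost(n_output):
    # one forward pass per output token, each over a single new token
    return n_output * 1

def diffusion_cost(n_output, n_denoising_passes):
    # each denoising pass runs over the whole output window
    return n_denoising_passes * n_output

n_out = 1_000
for passes in (4, 16, 64):
    ratio = diffusion_cost(n_out, passes) / autoregressive_cost(n_out)
    print(f"{passes:>3} diffusion passes: {ratio:.0f}x the autoregressive FLOPs")
# The multiplier is just the number of passes -- a constant factor, independent of n_out.
```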
(But wait, don’t autoregressive LLMs do one forward pass per output token, thus using more FLOPs than text diffusion models if the number of output tokens is big enough? No. Autoregressive LLMs do indeed do one forward pass per output token and thus usually do more forward passes than diffusion models. But they do each forward pass on only one token, whereas text diffusion LLMs do each forward pass on all the output tokens at once. Thus, each forward pass of a text diffusion LLM requires as many FLOPs as all the forward passes of an autoregressive LLM combined. Text diffusion LLMs can be more efficient than autoregressive models in practice because it is usually more efficient on GPUs to do one big operation than many small operations in sequence, even when both require the same number of FLOPs[1]. However, these efficiency improvements can only go so far: they stop once inference becomes bottlenecked by FLOPs.)
- ^
This last sentence is oversimplified - another thing that matters here is the shapes of matrices that GPUs multiply. But this is out of the scope of this blogpost.
Discuss
The bio-pirate's guide to GLP-1 agonists
How to lose weight, infringe patents, and possibly poison yourself for 22 Euros a month.
Introduction
In March 2025, Scott Alexander wrote:
Others are turning amateur chemist. You can order GLP-1 peptides from China for cheap. Once you have the peptide, all you have to do is put it in the right amount of bacteriostatic water. In theory this is no harder than any other mix-powder-with-water task. But this time if you do anything wrong, or are insufficiently clean, you can give yourself a horrible infection, or inactivate the drug, or accidentally take 100x too much of the drug and end up with negative weight and float up into the sky and be lost forever. ACX cannot in good conscience recommend this cheap, common, and awesome solution.
With a BMI of about 28, low executive function, a bit of sleep apnea and no willpower to spend on either dieting or dealing with the medical priesthood, I thought I would give it a try. This is a summary of my journey.
Please do not expect any great revelations here beyond "you can buy semaglutide from China, duh". All of the details here can also be found elsewhere; still, I thought it might be helpful to write them down.
Also be careful when following medical advice from random people on the internet. The medical system is full of safeguards to make very sure that no procedure it does will ever hurt you. Here you are on your own. I am not a physician, just an interested amateur with a STEM background. If you do not know whether it is ok to reuse syringes or inject air, or do not trust yourself to calculate your dose, I would recommend refraining from DIY medicine.
Picking a substance and route of administration
The two main approved GLP-1 agonists are tirzepatide and semaglutide. Both are peptides (mini-proteins) with a mass of about 4-5kDa which cost approximately the same to produce. A typical long term dose of tirzepatide is 15mg/week, while semaglutide is 2.4mg/week, so I focused on sema because it looked like the cheaper option. [1]
While I would have preferred oral to subcutaneous injection, the bioavailability of oral semaglutide is kinda terrible, with typical long term doses around 14mg/day -- forty times the amount of subcutaneous injection. So I resolved to deal with the hassle of poking myself with needles and possibly giving myself 'horrible infections'.
Given that the long term dosage is 2.4mg/week, and that sources generally state that once opened, vials should be used within four (or eight) weeks, I decided that the optimal vial size would be 10mg -- enough for four weeks. [2]
Finding a vendor
So I started searching the web for possible vendors of lyophilized semaglutide. I found a wide range of prices from 250$ for a 10mg vial (which would last four weeks at maximum dosage) down to about 50$. And a single website which offered ten vials of 5mg each for 130$.
That one seemed to be a Chinese manufacturer of organic compounds [3]. Slightly broken English, endless lists of chemicals by CAS number, no working search function on the website, outdated and incomplete price info provided as a jpeg on the site. I figured that if it was a scam site, it was one that matched very well my preconception of how the site of a company more enthusiastic about synthesis than about selling to consumers would look, and contacted them. After I was provided with current pricing (also as a series of jpegs), I sent them about 200 Euros worth of BTC [4] for ten 10mg vials plus shipping. (Shipping was 70$, probably indicative of a preference to sell in larger quantities.)
A week or so later, I got my delivery. Ten unmarked vials, of a volume of about 3ml each, filled to perhaps a third with a block of white stuff. [5]
I would have preferred to have a quantitative analysis of the contents, but all the companies in Germany I contacted were either unwilling to deal with consumers or unable to perform HPLC-MS, so I reasoned that the vendor would be unlikely to sell me vials filled with botulinum toxin instead [6], and just started injecting myself with 1/40th of a vial per week, which would amount to 0.25mg if the content was as advertised.
(If anyone has a great, affordable way for peptide analysis, please let me know in the comments!)
Sourcing bacteriostatic water
Unlike random pharmaceuticals, bacteriostatic water can be legally sold in Germany. Sadly, it would have cost me almost as much as the active ingredient, about 15 Euro per vial. So instead, I decided to craft my own bacteriostatic water. I sourced a lifetime supply of benzyl alcohol for a couple of Euros. Instead of dealing with distilled water, I bought sealed medical grade plastic vials of 0.9% NaCl solution "for inhalation", roughly 0.5 Euro a piece. Once a month, I add 0.1ml benzyl alcohol (naturally sterile) to one 5ml plastic vial, which gives me about 2% benzyl alcohol, twice what is specified for BAC, erring on the side of caution (and tissue damage from alcohol, I guess).
Other equipment
I already had a bunch of sterile 20G hypodermic needles and 3ml syringes from another project. For injection of minute quantities of liquids into my body, I bought sterile 1ml insulin syringes with 6mm, 31G needles (@0.3 Euro). [7]
Happily, I owned a fridge and a disinfectant spray, completing my toolset.
My current procedure
Every four weeks, I prepare a new vial. I prefer to fill the vials with 3ml of my BAC+, which should give 3.3mg/ml of sema.
Apply disinfectant to hands and workspace to taste. Then, start with opening a new plastic vial of NaCl. Using an insulin syringe, add 0.1ml benzyl alcohol to it. Unseal a 3ml syringe and needle and draw and release your BAC+ from the plastic vial a few times to mix it. Now draw 3ml of that, tear off the plastic cap [8] of your new glass vial, stick the needle through the rubber seal and slowly inject your BAC into the vial. The vial will be low-pressure, so getting liquid into it is really easy. Shake a bit and wait for the lyophilized peptide to dissolve. Store it in a fridge (preferably in the plastic box with the other vials), and liberally apply disinfectant to the rubber seal before and after each use.
To draw a dose, first figure out the volume you need. Disinfect, unseal your 1ml syringe, first inject an equal amount of air into the vial, then turn the vial rubber side down and draw the liquid. To start with, I would recommend drawing 0.1ml more than you need, because you will likely have some air bubbles in it. Remove the needle, get rid of the excess air (and excess liquid). Check that you are in a private place, expose your thighs, apply disinfectant, pinch the skin of your thigh with two fingers, stick in the needle with the other hand, push down the plunger. Cover up your thighs, put your vial back into the fridge, safely dispose of your needle. [9]
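For anyone who, like me, would rather double-check the arithmetic before pushing a plunger, here is the whole calculation in one place. It uses the numbers from this post (10mg vial, 3ml of BAC+, 0.1ml benzyl alcohol in 5ml NaCl); the variable names are mine, and the script says nothing about whether the vial actually contains what the vendor claims.

```python
# Reconstitution and dose-volume arithmetic for the numbers used in this post.

vial_mg = 10.0            # nominal semaglutide per vial
diluent_ml = 3.0          # BAC+ added to the vial
benzyl_alcohol_ml = 0.1   # added to one NaCl vial
nacl_ml = 5.0

bac_percent = 100 * benzyl_alcohol_ml / (nacl_ml + benzyl_alcohol_ml)
concentration_mg_per_ml = vial_mg / diluent_ml

print(f"benzyl alcohol in BAC+: about {bac_percent:.1f}%")
for dose_mg in (0.25, 0.5, 1.0, 2.4):
    volume_ml = dose_mg / concentration_mg_per_ml
    print(f"{dose_mg:>4} mg dose -> draw {volume_ml * 1000:.0f} uL "
          f"({volume_ml:.3f} ml) at {concentration_mg_per_ml:.2f} mg/ml")
# 0.25 mg works out to ~75 uL, matching the footnote about starting doses.
```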
Outcome (N=1)
Having taken semaglutide as scheduled for some 22 weeks, I have recently cut my dosage in half because I have reached a BMI of 22.
Traditionally, weight loss was seen as a moral battle: eat less than you want to eat, eat different things than you want to eat, do more sports than you want to do. Basically, spend willpower points to lose weight.
GLP-1 agonists are a cheat code, like reaching enlightenment through psychedelics instead of years of meditation, or bringing a gun to a sword fight. I literally spend zero willpower points in this endeavor. I continue to eat what I want and how much I want; it just so happens I want less food. I am under no illusion that this cheat will give me the full benefits of exercise and proper diet. But realistically, these were never an option for me (until someone discovers an infinite willpower cheat).
Will I rebound once I quit sema? Not a problem: at 20 Euros a month, the drug pays for itself in money not spent on chocolate, and it is less of a hassle than taking my other pills once a day, so I am willing to continue taking it for the rest of my life.
Thanks to Scott Alexander for pointing out this option and to my Chinese vendor for providing Westerners like myself with cheap bodily autonomy.
Up next: The bio-pirate's guide to DIY MAID (due in a few decades, PRNS).
I should probably add that the price difference is not all that large. 60mg tirzepatide vials cost perhaps twice as much as 10mg sema vials, because a lot of it is dose-independent overhead. ↩︎
Also note that the standard recommended dose schedule calls for minimal doses at the start, which will come down to 75uL. This would be even less convenient using 20mg/vial. ↩︎
The sha1-sum of their 14-character domain name is 6682ca2d70b203e0487c49d868ea20401b5ede1c. Note that I can not vouch for their production chain (obviously), but am personally very happy with my dealings with them and plan to buy my next pack of vials from them as well. I do not want to link them to avoid this looking like a sketchy drug advertisement. DM me if you can't find them. ↩︎
Their slightly unfavorable exchange rate for bitcoin happily negated the exchange rate between US$ and Euro, simplifying my calculations here. ↩︎
I was suspicious about that amount, as I would have imagined that 10mg would be a lot less. While I do not have a scale on that level of precision, I confirmed that the residue of a dried droplet was much less indeed. ↩︎
If I had been wrong about that, I would at least have contributed to raising the sanity waterline. ↩︎
Obligatory whining about excessive paternalism: A few years ago, I could buy medical grade syringes and needles (Braun) from Amazon Germany. These days, all the offers say "for research purposes only". When did society decide that people having access to medical grade equipment for any purpose was a bad thing? Is anyone under the impression that not providing heroin addicts, needleplay enthusiasts, or peptide enjoyers with sterile equipment will result in them abstaining from risky behavior? ↩︎
When I was taking ketamine for depression (filling injection vials into a nasal spray), I did not know about the plastic caps. Turns out it is really hard to pierce them with a hypodermic needle. ↩︎
I recap, which is fine here because I already have all the pathogens in my blood which might be on the needle, and then collect the needles in an empty bottle for disposal. ↩︎
Discuss
College Was Not That Terrible Now That I'm Not That Crazy
Previously, I wrote about how I was considering going back to San Francisco State University for two semesters to finish up my Bachelor's degree in math.
So, I did that. I think it was a good decision! I got more out of it than I expected.
To be clear, "better than I expected" is not an endorsement of college. SF State is still the same communist dystopia I remember from a dozen years ago—a bureaucratic command economy dripping in propaganda about how indispensable and humanitarian it is, whose subjects' souls have withered to the point where, even if they don't quite believe the propaganda, they can't conceive of life and work outside the system.
But it didn't hurt this time, because I had a sense of humor about it now—and a sense of perspective (thanks to life experience, no thanks to school). Ultimately, policy debates should not appear one-sided: if things are terrible, it's probably not because people are choosing the straightforwardly terrible thing for no reason whatsoever, with no trade-offs, coordination problems, or nonobvious truths making the terrible thing look better than it is. The thing that makes life under communism unbearable is the fact that you can't leave. Having escaped, and coming back as a visiting dignitary, one is in a better position to make sense of how and why the regime functions—the problems it solves, at whatever cost in human lives or dignity—the forces that make it stable if not good.
Doing It Right This Time (Math)
The undergraduate mathematics program at SFSU has three tracks: for "advanced studies", for teaching, and for liberal arts. My student record from 2013 was still listed as on the advanced studies track. In order to graduate as quickly as possible, I switched to the liberal arts track, which, beyond a set of "core" courses, only requires five electives numbered 300 or higher. The only core course I hadn't completed was "Modern Algebra I", and I had done two electives in Fall 2012 ("Mathematical Optimization" and "Probability and Statistics I"), so I only had four math courses (including "Modern Algebra I") to complete for the major.
"Real Analysis II" (Fall 2024)My last class at SF State in Spring 2013 (before getting rescued by the software industry) had been "Real Analysis I" with Prof. Alex Schuster. I regret that I wasn't in a state to properly focus and savor it at the time: I had a pretty bad sleep-deprivation-induced psychotic break in early February 2013 and for a few months thereafter was mostly just trying to hold myself together. I withdrew from my other classes ("Introduction to Functions of a Complex Variable" and "Urban Issues of Black Children and Youth") and ended up getting a B−.
My psychiatric impairment that semester was particularly disappointing because I had been looking forward to "Real Analysis I" as my first "serious" math class, being concerned with proving theorems rather than the "school-math" that most people associate with the subject, of applying given techniques to given problem classes. I had wanted to take it concurrently with the prerequisite, "Exploration and Proof" (which I didn't consider sufficiently "serious") upon transferring to SFSU the previous semester, but was not permitted to. I had emailed Prof. Schuster asking to be allowed to enroll, with evidence that I was ready (attaching a PDF of a small result I had proved about analogues of π under the p-norm, and including the contact email of Prof. Robert Hasner of Diablo Valley College, who had been my "Calculus III" professor and had agreed to vouch for my preparedness), but he didn't reply.
Coming back eleven years later, I was eager to make up for that disappointment by picking up where I left off in "Real Analysis II" with the same Prof. Schuster. On the first day of instruction, I wore a collared shirt and tie (and mask, having contracted COVID-19 while traveling the previous week) and came to the classroom early to make a point of marking my territory, using the whiteboard to write out the first part of a proof of the multivariate chain rule that I was working through in Bernd S. W. Schröder's Mathematical Analysis: A Concise Introduction—my favorite analysis textbook, which I had discovered in the SFSU library in 2012 and of which I subsequently bought my own copy. (I would soon check up on the withdrawal stamp sheet in the front of the library's copy. No one had checked it out in the intervening twelve years.)
The University Bulletin officially titled the course "Real Analysis II: Several Variables", so you'd expect that getting a leg up on the multidimensional chain rule would be studying ahead for the course, but it turned out that the Bulletin was lying relative to the syllabus that Prof. Schuster had emailed out the week before: we would be covering series, series of functions, and metric space topology. Fine. (I was already pretty familiar with metric space topology, but even my "non-epsilon" calculus-level knowledge of series was weak; to me, the topic stunk of school.)
"Real II" was an intimate class that semester, befitting the SFSU's status as a garbage-tier institution: there were only seven or eight students enrolled. It was one of many classes in the department that were cross-listed as both a graduate ("MATH 770") and upper-division undergraduate course ("MATH 470"). I was the only student enrolled in 470. The university website hosted an old syllabus from 2008 which said that the graduate students would additionally write a paper on an approved topic, but that wasn't a thing the way Prof. Schuster was teaching the course. Partway through the semester, I was added to Canvas (the online course management system) for the 770 class, to save Prof. Schuster and the TA the hassle of maintaining both.
The textbook was An Introduction to Analysis (4th edition) by William R. Wade, the same book that had been used for "Real I" in Spring 2013. It felt in bad taste for reasons that are hard to precisely articulate. I want to say the tone is patronizing, but don't feel like I could defend that judgement in debate against someone who doesn't share it. What I love about Schröder is how it tries to simultaneously be friendly to the novice (the early chapters sprinkling analysis tips and tricks as numbered "Standard Proof Techniques" among the numbered theorems and definitions) while also showcasing the fearsome technicality of the topic in excruciatingly detailed estimates (proofs involving chains of inequalities, typically ending on "< ε"). In contrast, Wade often feels like it's hiding something from children who are now in fact teenagers.
The assignments were a lot of work, but that was good. It was what I was there for—to prove that I could do the work. I could do most of the proofs with some effort. At SFSU in 2012–2013, I remembered submitting paper homework, but now, everything was uploaded to Canvas. I did all my writeups in LyX, a GUI editor for LaTeX.
One thing that had changed very recently, not about SFSU, but about the world, was the availability of large language models, which had in the GPT-4 era become good enough to be useful tutors on standard undergrad material. They definitely weren't totally reliable, but human tutors aren't always reliable, either. I adopted the policy that I was allowed to consult LLMs for a hint when I got stuck on homework assignments, citing the fact that I had gotten help in my writeup. Prof. Schuster didn't object when I inquired about the propriety of this at office hours. (I also cited office-hours hints in my writeups.)
Prof. Schuster held his office hours in the math department conference room rather than his office, which created a nice environment for multiple people to work or socialize, in addition to asking Prof. Schuster questions. I came almost every time, whether or not I had an analysis question for Prof. Schuster. Often there were other students from "Real II" or Prof. Schuster's "Real I" class there, or a lecturer who also enjoyed the environment, but sometimes it was just me.
Office hours chatter didn't confine itself to math. Prof. Schuster sometimes wore a Free Palestine bracelet. I asked him what I should read to understand the pro-Palestinian position, which had been neglected in my Jewish upbringing. He recommended Rashid Khalidi's The Hundred Years' War on Palestine, which I read and found informative (in contrast to the student pro-Palestine demonstrators on campus, whom I found anti-persuasive).
I got along fine with the other students but do not seem to have formed any lasting friendships. The culture of school didn't feel quite as bad as I remembered. It's unclear to me how much of this is due to my memory having stored a hostile caricature, and how much is due to my being less sensitive to it this time. When I was at SFSU a dozen years ago, I remember seething with hatred at how everyone talked about their studies in terms of classes and teachers and grades, rather than about the subject matter in itself. There was still a lot of that—bad enough that I complained about it at every opportunity—but I wasn't seething with hatred anymore, as if I had come to terms with it as mere dysfunction and not sacrilege. I only cried while complaining about it a couple times.
One of my signature gripes was about the way people in the department habitually referred to courses by number rather than title, which felt like something out of a dystopian YA novel. A course title like "Real Analysis II" at least communicates that the students are working on real analysis, even if the opaque "II" doesn't expose which real-analytic topics are covered. In contrast, a course number like "MATH 770" doesn't mean anything outside of SFSU's bureaucracy. It isn't how people would talk if they believed there was a subject matter worth knowing about except insofar as the customs of bureaucratic servitude demanded it.
There were two examinations: a midterm, and the final. Each involved stating some definitions, identifying some propositions as true or false with a brief justification, and writing two or three proofs. A reference sheet was allowed, which made the definitions portion somewhat farcical as a test of anything more than having bothered to prepare a reference sheet. (I objected to Prof. Schuster calling it a "cheat sheet." Since he was allowing it, it wasn't "cheating"!)
I did okay. I posted a 32.5/40 (81%) on the midterm. I'm embarrassed by my performance on the final. It looked easy, and I left the examination room an hour early after providing an answer to all the questions, only to realize a couple hours later that I had completely botched a compactness proof. Between that gaffe, the midterm, and my homework grades, I was expecting to end up with a B+ in the course. (How mortifying—to have gone back to school almost specifically for this course and then not even get an A.) But when the grades came in, it ended up being an A: Prof. Schuster only knocked off 6 points for the bogus proof, for a final exam grade of 44/50 (88%), and had a policy of discarding the midterm grade when the final exam grade was higher. It still seemed to me that that should have probably worked out to an A− rather than an A, but it wasn't my job to worry about that.
"Probability Models" (Fall 2024)In addition to the rarified math-math of analysis, the practical math of probability seemed like a good choice for making the most of my elective credits at the university, so I also enrolled in Prof. Anandamayee Mujamdar's "Probability Models" for the Fall 2024 semester. The prerequisites were linear algebra, "Probability and Statistics I", and "Calculus III", but the registration webapp hadn't allowed me to enroll, presumably because it didn't believe I knew linear algebra. (The linear algebra requirement at SFSU was four units. My 2007 linear algebra class from UC Santa Cruz, which was on a quarter system, got translated to 3.3 semester units.) Prof. Mujamdar hadn't replied to my July email requesting a permission code, but got me the code after telling me to send a followup email after I inquired in person at the end of the first class.
(I had also considered taking the online-only "Introduction to Linear Models", which had the same prerequisites, but Prof. Mohammad Kafai also hadn't replied to my July email, and I didn't bother following up, which was just as well: the semester ended up feeling busy enough with just the real analysis, probability models, my gen-ed puff course, and maintaining my soul in an environment that assumes people need a bureaucratic control structure in order to keep busy.)
Like "Real II", "Probability Models" was also administratively cross-listed as both a graduate ("MATH 742", "Advanced Probability Models") and upper-division undergraduate course ("MATH 442"), despite no difference whatsoever in the work required of graduate and undergraduate students. After some weeks of reviewing the basics of random variables and conditional expectation, the course covered Markov chains and the Poisson process.
The textbook was Introduction to Probability Models (12th edition) by Sheldon M. Ross, which, like Wade, felt in bad taste for reasons that were hard to put my finger on. Lectures were punctuated with recitation days on which we took a brief quiz and then did exercises from a worksheet for the rest of the class period. There was more content to cover than the class meeting schedule could accommodate, so there were also video lectures on Canvas, which I mostly did not watch. (I attended class because it was a social expectation and because attendance was 10% of the grade, but I preferred to learn from the book. As long as I was completing the assignments, that shouldn't be a problem ... right?)
In contrast to what I considered serious math, the course was very much school-math about applying particular techniques to solve particular problem classes, taken to the parodic extent of quizzes and tests re-using worksheet problems verbatim. (You'd expect a statistics professor to know not to test on the training set!)
It was still a lot of work, which I knew needed to be taken seriously in order to do well in the course. The task of quiz #2 was to derive the moment-generating function of the exponential distribution. I had done that successfully on the recitation worksheet earlier, but apparently that and the homework hadn't been enough practice, because I botched it on quiz day. After the quiz, Prof. Mujamdar wrote the correct derivation on the board. She had also said that we could re-submit a correction to our quiz for half-credit, but I found this policy confusing: it felt morally dubious that it should be possible to just copy down the solution from the board and hand that in, even for partial credit. (I guess the policy made sense from the perspective of schoolstudents needing to be nudged and manipulated with credit in order to do even essential things like trying to learn from one's mistakes.) For my resubmission, I did the correct derivation at home in LyX, got it printed, and brought it to office hours the next class day. I resolved to be better prepared for future quizzes (to at least not botch them, minor errors aside) in order to avoid the indignity of having an incentive to resubmit.
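(For the record, the derivation the quiz wanted is only a few lines; this is the standard textbook computation, written out in my own notation rather than copied from the board:)

```latex
M_X(t) = \mathbb{E}\!\left[e^{tX}\right]
       = \int_0^{\infty} e^{tx}\,\lambda e^{-\lambda x}\,dx
       = \lambda \int_0^{\infty} e^{-(\lambda - t)x}\,dx
       = \frac{\lambda}{\lambda - t}, \qquad t < \lambda .
```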
I mostly succeeded at that. I would end up doing a resubmission for quiz #8, which was about how to sample from an exponential distribution (with λ=1) given the ability to sample from the uniform distribution on [0,1], by inverting the exponential's cumulative distribution function. (It had been covered in class, and I had gotten plenty of practice on that week's assignments with importance sampling using exponential proposal distributions, but I did it in Rust using the rand_distr library rather than what was apparently the intended method of implementing exponential sampling from a uniform RNG "from scratch".) I blunted the indignity of my resubmission recapitulating the answer written on the board after the quiz by additionally inverting by myself the c.d.f. of a different distribution, the Pareto.
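(A minimal sketch of the intended from-scratch method—written in Python for this post rather than the Rust I actually used, with made-up Pareto parameters:)

    import math
    import random

    def sample_exponential(u):
        # Exp(1): F(x) = 1 - e^(-x), so F^(-1)(u) = -ln(1 - u) for u in [0, 1).
        return -math.log(1.0 - u)

    def sample_pareto(u, x_m=1.0, alpha=2.0):
        # Pareto: F(x) = 1 - (x_m/x)^alpha for x >= x_m,
        # so F^(-1)(u) = x_m / (1 - u)^(1/alpha).
        return x_m / (1.0 - u) ** (1.0 / alpha)

    samples = [sample_exponential(random.random()) for _ in range(100_000)]
    print(sum(samples) / len(samples))  # should land near 1, the mean of Exp(1)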
I continued my practice of using LLMs for hints when I got stuck on assignments, and citing the help in my writeup; Prof. Mujamdar seemed OK with it when I mentioned it at office hours. (I went to office hours occasionally, when I had a question for Prof. Mujamdar, who was kind and friendly to me, but it wasn't a social occasion like Prof. Schuster's conference-room office hours.)
I was apparently more conscientious than most students. Outside of class, the grad student who graded our assignments recommended that I make use of the text's solutions manual (which was circulating in various places online) to check my work. Apparently, he had reason to suspect that some other students in the class were just copying from the solution manual, but was not given the authority to prosecute the matter when he raised the issue to the professor. He said that he felt bad marking me down for my mistakes when it was clear that I was trying to do the work.
The student quality seemed noticeably worse than "Real II", at least along the dimensions that I was sensitive to. There was a memorable moment when Prof. Mujamdar asked which students were in undergrad. I raised my hand. "Really?" she said.
It was only late in the semester that I was alerted by non-course reading (specifically a footnote in the book by Daphne Koller and the other guy) that the stationary distribution of a Markov chain is an eigenvector of the transition matrix with eigenvalue 1. Taking this linear-algebraic view has interesting applications: for example, the mixing time of the chain is governed by the second-largest eigenvalue modulus, because any starting distribution can be expressed in terms of an eigenbasis, and the coefficients of all but the stationary vector decay as you keep iterating (because, for an ergodic chain, all the other eigenvalues have modulus less than 1).
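(A minimal numpy sketch of the linear-algebraic view—the three-state transition matrix here is made up for illustration:)

    import numpy as np

    # A made-up row-stochastic transition matrix (each row sums to 1).
    P = np.array([[0.9, 0.1, 0.0],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.3, 0.6]])

    # The stationary distribution is a left eigenvector of P with eigenvalue 1,
    # i.e., an eigenvector of P.T; normalize it to sum to 1.
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    stationary = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
    stationary = stationary / stationary.sum()

    # Iterating any starting distribution converges to the same vector; the rate
    # is governed by the second-largest eigenvalue modulus.
    pi = np.array([1.0, 0.0, 0.0])
    for _ in range(200):
        pi = pi @ P
    second_largest_modulus = sorted(np.abs(eigenvalues), reverse=True)[1]
    print(stationary, pi, second_largest_modulus)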
The feeling of enlightenment was outweighed by embarrassment that I hadn't independently noticed that the stationary distribution was an eigenvector (we had been subtracting 1 off the main diagonal and solving the system for weeks; the operation should have felt familiar), and, more than either of those, annoyance that neither the textbook nor the professor had deigned to mention this relevant fact in a course that had linear algebra as a prerequisite. When I tried to point it out during the final review session, it didn't seem like Prof. Mujamdar had understood what I said—not for the lack of linear algebra knowledge, I'm sure—let alone any of the other students.
I can only speculate that the occurrence of a student pointing out something about mathematical reality that wasn't on the test or syllabus was so unexpected, so beyond what everyone had been conditioned to think school was about, that no one had any context to make sense of it. A graduate statistics class at San Francisco State University just wasn't that kind of space. I did get an A.
The 85th William Lowell Putnam Mathematical Competition
I also organized a team for the Putnam Competition, SFSU's first in institutional memory. (I'm really proud of my recruitment advertisements to the math majors' mailing list.) The story of the Putnam effort has been recounted in a separate post, "The End of the Movie: SF State's 2024 Putnam Competition Team, A Retrospective".
As the email headers at the top of the post indicate, the post was originally composed for the department mailing lists, but it never actually got published there: department chair Eric Hsu wrote to me that it was "much too long to send directly to the whole department" but asked for my "permission to eventually share it with the department, either as a link or possibly as a department web page." (He cc'd a department office admin whom I had spoken to about posting the Putnam training session announcements on the mailing list; reading between the lines, I'm imagining that she was discomfited by the tone of the post and had appealed to Chair Hsu's authority about whether to let it through.)
I assumed that the ask to share with the department "eventually" was polite bullshit on Hsu's part to let me down gently. (Probably no one gets to be department chair without being molded into a master of polite bullshit.) Privately, I didn't think the rationale made sense—it's just as easy to delete a long unwanted mailing list message as a short one; the email server wasn't going to run out of paper—but it seemed petty to argue. I replied that I hadn't known the rules for the mailing list and that he should feel free to share or not as he saw fit.
"Measure and Integration" (Spring 2025)I had a busy semester planned for Spring 2025, with two graduate-level (true graduate-level, not cross-listed) analysis courses plus three gen-ed courses that I needed to graduate. (Following Prof. Schuster, I'm humorously counting "Modern Algebra I" as a gen-ed course.) I only needed one upper-division undergrad math course other than "Modern Algebra I" to graduate, but while I was at the University for one more semester, I was intent on getting my money's worth. I aspired to get a head start (ideally on all three math courses) over winter break and checked out a complex analysis book with exercise solutions from the library, but only ended up getting any traction on measure theory, doing some exercises from chapter 14 of Schröder, "Integration on Measure Spaces".
Prof. Schuster was teaching "Measure and Integration" ("MATH 710"). It was less intimate than "Real II" the previous semester, with a number of students in the teens. The class met at 9:30 a.m. on Tuesdays and Thursdays, which I found inconveniently early in the morning given my hour-and-twenty-minute BART-and-bus commute. I was late the first day. After running into the room, I put the printout of my exercises from Schröder on the instructor's desk and said, "Homework." Prof. Schuster looked surprised for a moment, then accepted it without a word.
The previous semester, Prof. Schuster said he was undecided between using Real Analysis by Royden and Measure, Integration, and Real Analysis by Sheldon Axler (of Linear Algebra Done Right fame, and also our former department chair at SFSU) as the textbook. He ended up going with Axler, which for once was in good taste. (Axler would guest-lecture one day when Prof. Schuster was absent. I got him to sign my copy of Linear Algebra Done Right.) We covered Lebesgue measure and the Lebesgue integral, then skipped over the chapter on product measures (which Prof. Schuster said was technical and not that interesting) in favor of starting on Banach spaces. (As with "Several Variables" the previous semester, Prof. Schuster did not feel beholden to making the Bulletin course titles not be lies; he admitted late in the semester that it might as well have been called "Real Analysis III".)
I would frequently be a few minutes late throughout the semester. One day, the BART had trouble while my train was in downtown San Francisco, and it wasn't clear when it would move again. I got off and summoned a Waymo driverless taxi to take me the rest of the way to the University. We were covering the Cantor set that day, and I rushed in with more than half the class period over. "Sorry, someone deleted the middle third of the train," I said.
Measure theory was a test of faith which I'm not sure I passed. Everyone who reads Wikipedia knows about the notorious axiom of choice. This was the part of the school curriculum in which the axiom of choice becomes relevant. It impressed upon me that as much as I like analysis as an intellectual activity, I ... don't necessarily believe in this stuff? We go to all this work to define sigma-algebras in order to rule out pathological sets that can't be explicitly constructed, because their existence depends on the axiom of choice. You could argue that it's not worse than uncountable sets, and that alternatives to classical mathematics just end up needing to bite different bullets. (In computable analysis, equality turns out to be uncomputable, because there's no limit on how many decimal places you would need to check for a tiny difference between two almost-equal numbers. For related reasons, all computable functions are continuous.) But I'm not necessarily happy about the situation.
I did okay. I was late on some of the assignments (and didn't entirely finish assignments #9 and #10), but the TA was late in grading them, too. I posted a 31/40 (77.5%) on the midterm. I was expecting to get around 80% on the final based on my previous performance on Prof. Schuster's examinations, but I ended up posting a 48/50 (96%), locking in an A for the course.
"Theory of Functions of a Complex Variable" (Spring 2025)My other graduate course was "Theory of Functions of a Complex Variable" ("MATH 730"), taught by Prof. Chun-Kit Lai. I loved the pretentious title and pronounced all seven words at every opportunity. (Everyone else, including Prof. Lai's syllabus, said "complex analysis" when they didn't say "730".)
The content lived up to the pretension of the title. This was unambiguously the hardest school class I had ever taken. Not in the sense that Prof. Lai was particularly strict about grades or anything; on the contrary, he seemed charmingly easygoing about the institutional structure of school, while of course taking it for granted as an unquestioned background feature of existence. But he was pitching the material to a higher level than Prof. Schuster or Axler.
The textbook was Complex Analysis by Elias M. Stein and Rami Shakarchi, volume II in their "Princeton Lectures in Analysis" series. Stein and Shakarchi leave a lot to the reader (prototypically a Princeton student). It wasn't to my taste—but this time, I knew the problem was on my end. My distaste for Wade and Ross had been a reflection of the ways in which I was spiritually superior to the generic SFSU student; my distaste for Stein and Shakarchi reflected the grim reality that I was right where I belonged.
I don't think I was alone in finding the work difficult. Prof. Lai gave the entire class an extension to resubmit assignment #2 because the average performance had been so poor.
Prof. Lai didn't object to my LLM hint usage policy when I inquired about it at office hours. I still felt bad about how much external help I needed just to get through the assignments. The fact that I footnoted everything meant that I wasn't being dishonest. (In his feedback on my assignment #7, Prof. Lai wrote to me, "I like your footnote. Very genuine and is a modern way of learning math.") It still felt humiliating to turn in work with so many footnotes: "Thanks to OpenAI o3-mini-high for hints", "Thanks to Claude Sonnet 3.7 for guidance", "Thanks to [classmate's name] for this insight", "Thanks to the "Harmonic Conjugate" Wikipedia article", "This is pointed out in Tristan Needham's Visual Complex Analysis, p. [...]", &c.
It's been said that the real-world usefulness of LLM agents has been limited by low reliability impeding the horizon length of tasks: if the agent can only successfully complete a single step with probability 0.9, then its probability of succeeding on a task that requires ten correct steps in sequence is only 0.9^10 ≈ 0.35.
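(Two lines of Python to illustrate the compounding—the per-step reliabilities other than 0.9 are just for comparison:)

    for p in (0.9, 0.95, 0.99):
        print(p, [round(p ** n, 2) for n in (1, 10, 100)])  # 0.9**10 ≈ 0.35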
That was about how I felt with math. Prof. Schuster was assigning short horizon-length problems from Axler, which I could mostly do independently; Prof. Lai was assigning longer horizon-length problems from Stein and Shakarchi, which I mostly couldn't. All the individual steps made sense once explained, but I could only generate so many steps before getting stuck.
If I were just trying to learn, the external help wouldn't have seemed like a moral issue. I look things up all the time when I'm working on something I care about, but the institutional context of submitting an assignment for a grade seemed to introduce the kind of moral ambiguity that had made school so unbearable to me, in a way that didn't feel fully mitigated by the transparent footnotes.
I told myself not to worry about it. The purpose of the "assignment" was to help us to learn about the theory of functions of a complex variable, and I was doing that. Prof. Lai had said in class and in office hours that he trusted us, that he trusted me. If I had wanted to avoid this particular source of moral ambiguity at all costs, but still wanted a Bachelor's degree, I could have taken easier classes for which I wouldn't need so much external assistance. (I didn't even need the credits from this class to graduate.)
But that would be insane. The thing I was doing now, of jointly trying to maximize math knowledge while also participating in the standard system to help with that, made sense. Minimizing perceived moral ambiguity (which was all in my head) would have been a really stupid goal. Now, so late in life at age 37, I wanted to give myself fully over to not being stupid, even unto the cost of self-perceived moral ambiguity.
Prof. Lai eschewed in-person exams in favor of take-homes for both the midterm and the final. He said reasonable internet reference usage was allowed, as with the assignments. I didn't ask for further clarification because I had already neurotically asked for clarification about the policy for the assignments once more than was necessary, but resolved to myself that for the take-homes, I would allow myself static websites but obviously no LLMs. I wasn't a grade-grubber; I would give myself the authentic 2010s take-home exam experience and accept the outcome.
(I suspect Prof. Lai would have allowed LLMs on the midterm if I had asked—I didn't get the sense that he yet understood the edge that the latest models offered over mere books and websites. On 29 April, a friend told me that instructors will increasingly just assume students are cheating with LLMs anyway; anything that showed I put thought in would be refreshing. I said that for this particular class and professor, I thought I was a semester or two early for that. In fact, I was two weeks early: on 13 May, Prof. Lai remarked before class and in the conference room during Prof. Schuster's office hours that he had given a bunch of analysis problems to Gemini the previous night, and it got them all right.)
I got a 73/100 on my midterm. Even with the (static) internet, sometimes I would hit a spot where I got stuck and couldn't get unstuck in a reasonable amount of time.
There were only 9 homework assignments during the semester (contrasted to 12 in "Measure and Integration") to give us time to work on an expository paper and presentation on one of the Gamma function, the Riemann zeta function, the prime number theorem, or elliptic functions. I wrote four pages on "Pinpointing the Generalized Factorial", explaining the motivation of the Gamma function, except that I'm not fond of how the definition is shifted by one from what you'd expect, so I wrote about the unshifted Pi function instead.
I wish I had allocated more time to it. This was my one opportunity in my institutionalized math career to "write a paper" and not merely "complete an assignment"; it would have been vindicating to go above and beyond and knock this one out of the park. (Expository work had been the lifeblood of my non-institutionalized math life.) There was so much more I could have said about the generalized factorial, and applications (like the fractional calculus), but it was a busy semester and I didn't get to it. It's hardly an excuse that Prof. Lai wrote an approving comment and gave me full credit for those four pages.
I was resolved to do better on the take-home final than the take-home midterm, but it was a struggle. I eventually got everything, but what I submitted ended up having five footnotes to various math.stackexchange.com answers. (I was very transparent about my reasoning process; no one could accuse me of dishonesty.) For one problem, I ended up using formulas for the modulus of the derivative of a Blaschke factor at 0 and the preimage of zero which I found in David C. Ullrich's Complex Made Simple from the University library. It wasn't until after I submitted my work that I realized that the explicit formulas had been unnecessary; the fact that they were inverses followed from the inverse function theorem.
Prof. Lai gave me 95/100 on my final, and an A in the course. I think he was being lenient with the points. Looking over the work I had submitted throughout the semester, I don't think it would have been an A at Berkeley (or Princeton).
I guess that's okay because grades aren't real, but the work was real. If Prof. Lai had faced a dilemma between watering down either the grading scale or the course content in order to accommodate SFSU students being retarded, I'm glad he chose to preserve the integrity of the content.
"Modern Algebra I" (Spring 2025)One of the quirks of being an autodidact is that it's easy to end up with an "unbalanced" skill profile relative to what school authorities expect. As a student of mathematics, I consider myself more of an analyst than an algebraist and had not previously prioritized learning abstract algebra nor (what the school authorities cared about) "taking" an algebra "class", neither the previous semester nor in Fall 2012/Spring 2013. (Over the years, I had taken a few desultory swings at Dummit & Foote, but had never gotten very far.) I thus found myself in Prof. Dusty Ross's "Modern Algebra I" ("MATH 335"), the last "core" course I needed to graduate.
"Modern Algebra I" met on Monday, Wednesday, and Friday. All of my other classes met Tuesdays and Thursdays. I had wondered whether I could save myself a lot of commuting by ditching algebra most of the time, but started off the semester dutifully attending—and, as long as I was on campus that day anyway, also sitting in on Prof. Ross's "Topology" ("MATH 450") even though I couldn't commit to a fourth math course for credit.
Prof. Ross is an outstanding schoolteacher, the best I encountered at SFSU. I choose my words here very carefully. I don't mean he was my favorite professor. I mean that he was good at his job. His lectures were clear and well-prepared, and punctuated with group work on well-designed worksheets (pedagogically superior to the whole class just being lecture). The assignments and tests were fair, and so on.
On the first day, he brought a cardboard square with color-labeled corners to illustrate the dihedral group. When he asked us how many ways there were to position the square, I said: eight, because the dihedral group for the n-gon has 2n elements. On Monday of the second week, Prof. Ross stopped me after class to express disapproval of how I had brought out my copy of Dummit & Foote and referred to Lagrange's theorem during the group worksheet discussion about subgroups of cyclic groups; we hadn't covered that yet. He also criticized my response about the dihedral group from the previous week; those were just words, he said. I understood the criticism that there's a danger in citing results you or your audience might not understand, but resented the implication that knowledge that hadn't been covered in class was therefore inadmissible.
I asked whether he cared whether I attended class, and he said that the answer was already in the syllabus. (Attendance was worth 5% of the grade.) After that, I mostly stayed home on Mondays, Wednesdays, and Fridays unless there was a quiz (and didn't show up to topology again), which seemed like a mutually agreeable outcome to all parties.
Dusty Ross is a better schoolteacher than Alex Schuster, but in my book, Schuster is a better person. Ross believes in San Francisco State University; Schuster just works there.
The course covered the basics of group theory, with a little bit about rings at the end of the semester. The textbook was Joseph A. Gallian's Contemporary Abstract Algebra, which I found to be in insultingly poor taste. The contrast between "Modern Algebra I" ("MATH 335") and "Theory of Functions of a Complex Variable" ("MATH 730") that semester did persuade me that the course numbers did have semantic content in their first digit (3xx = insulting, 4xx or cross-listed 4xx/7xx = requires effort, 7xx = potentially punishing).
I mostly treated the algebra coursework as an afterthought to the analysis courses I was devoting most of my focus to. I tried to maintain a lead on the weekly algebra assignments (five problems hand-picked by Prof. Ross, not from Gallian), submitting them an average of 5.9 days early—in the spirit of getting it out of the way. On a few assignments, I wrote some Python to compute orders of elements or cosets of permutation groups in preference to doing it by hand. One week I started working on the prerequisite chapter on polynomial rings from the algebraic geometry book Prof. Ross had just written with his partner Prof. Emily Clader, but that was just to show off to Prof. Ross at office hours that I had at least looked at his book; I didn't stick with it.
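(A minimal sketch of the sort of thing those scripts did—an illustration written for this post, not the code I actually submitted; the subgroup is just an example:)

    from itertools import permutations

    def compose(p, q):
        # Composition (p after q); permutations are tuples mapping index -> image.
        return tuple(p[q[i]] for i in range(len(p)))

    def order(p):
        # Smallest k >= 1 such that p composed with itself k times is the identity.
        identity = tuple(range(len(p)))
        k, q = 1, p
        while q != identity:
            q, k = compose(q, p), k + 1
        return k

    def cyclic_subgroup(gen):
        # The subgroup generated by a single element.
        elems, q = set(), gen
        while q not in elems:
            elems.add(q)
            q = compose(q, gen)
        return elems

    S4 = list(permutations(range(4)))
    H = cyclic_subgroup((1, 2, 3, 0))  # generated by the 4-cycle (0 1 2 3)
    left_cosets = {frozenset(compose(g, h) for h in H) for g in S4}
    print(order((1, 0, 3, 2)), len(H), len(left_cosets))  # 2, 4, and [S4 : H] = 6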
The Tutoring and Academic Support Center (TASC) offered tutoring for "Modern Algebra I", so I signed up for weekly tutoring sessions with the TA for the class, not because I needed help to do well in the class, but it was nice to work with someone. Sometimes I did the homework, sometimes we talked about some other algebra topic (from Dummit & Foote, or Ross & Clader that one week), one week I tried to explain my struggles with measure theory. TASC gave out loyalty program–style punch cards that bribed students with a choice between two prizes every three tutoring sessions, which is as patronizing as it sounds, but wondering what the next prize options would be was a source of anticipation and mystery; I got a pen and a button and a tote bag over the course of the semester.
I posted a somewhat disappointing 79/90 (87.8%) on the final, mostly due to stupid mistakes or laziness on my part; I hadn't prepped that much. Wracking my brain during a "Give an example of each the [sic] following" question on the exam, I was proud to have come up with the quaternions and "even-integer quaternions" as examples of noncommutative rings with and without unity, respectively.
He didn't give me credit for those. We hadn't covered the quaternions in class.
Not Sweating the Fake Stuff (Non-Math)
In addition to the gen-ed requirements that could be satisfied with transfer credits, there were also upper-division gen-ed requirements that had to be taken at SFSU: one class each from "UD-B: Physical and/or Life Sciences" (which I had satisfied with a ridiculous "Contemporary Sexuality" class in Summer 2012), "UD-C: Arts and/or Humanities", and "UD-D: Social Sciences". There was also an "Area E: Lifelong Learning and Self-Development" requirement, and four "SF State Studies" requirements (which overlapped with the UD- classes).
"Queer Literatures and Media" (Fall 2024)I try to keep it separate from my wholesome math and philosophy blogging, but at this point it's not a secret that I have a sideline in gender-politics blogging. As soon as I saw the title in the schedule of classes, it was clear that if I had to sit through another gen-ed class, "Queer Literatures and Media" was the obvious choice. I thought I might be able to reuse some of my coursework for the blog, or if nothing else, get an opportunity to troll the professor.
The schedule of classes had said the course was to be taught by Prof. Deborah Cohler, so in addition to the listed required texts, I bought the Kindle version of her Citizen, Invert, Queer: Lesbianism and War in Early Twentieth-Century Britain, thinking that "I read your book, and ..." would make an ideal office-hours icebreaker. There was a last-minute change: the course would actually be taught by Prof. Sasha Goldberg (who would not be using Prof. Cohler's book list; I requested Kindle Store refunds on most of them).
I didn't take the class very seriously. I was taking "Real Analysis II" and "Probability Models" seriously that semester, because for those classes, I had something to prove—that I could do well in upper-division math classes if I wanted to. For this class, the claim that "I could if I wanted to" didn't really seem in doubt.
I didn't not want to. But even easy tasks take time that could be spent doing other things. I didn't always get around to doing all of the assigned reading or video-watching. I didn't read the assigned segment of Giovanni's Room. (And honestly disclosed that fact during class discussion.) I skimmed a lot of the narratives in The Stonewall Reader. My analysis of Carol (assigned as 250 words, but I wrote 350) used evidence from a scene in the first quarter of the film, because that was all I watched. I read the Wikipedia synopsis of They/Them instead of watching it. I skimmed part of Fun Home, which was literally a comic book that you'd expect me to enjoy. When Prof. Goldberg assigned an out-of-print novel (and before it was straightened out how to get it free online), I bought the last copy from AbeBooks with expedited shipping ... and then didn't read most of it. (I gave the copy to Prof. Goldberg at the end of the semester.)
My negligence was the source of some angst. If I was going back to school to "do it right this time", why couldn't I even be bothered to watch a movie as commanded? It's not like it's difficult!
But the reason I had come back was that I could recognize the moral legitimacy of a command to prove a theorem about uniform convergence. For this class, while I could have worked harder if I had wanted to, it was hard to want to when much of the content was so impossible to take seriously.
Asked to explain why the author of an article said that Halloween was "one of the High Holy Days for the gay community", I objected to the characterization as implicitly anti-Semitic and homophobic. The High Holy Days are not a "fun" masquerade holiday the way modern Halloween is. The יָמִים נוֹרָאִים—yamim noraim, "days of awe"—are a time of repentance and seeking closeness to God, in which it is said that הַשֵּׁם—ha'Shem, literally "the name", an epithet for God—will inscribe the names of the righteous in the Book of Life. Calling Halloween a gay High Holy Day implicitly disrespects either the Jews (by denying the seriousness of the Days of Awe), or the gays (by suggesting that their people are incapable of seriousness), or the reader (by assuming that they're incapable of any less superficial connection between holidays than "they both happen around October"). In contrast, describing Halloween as a gay Purim would have been entirely appropriate. "They tried to genocide us; we're still here; let's have a masquerade party with alcohol" is entirely in the spirit of both Purim and Halloween.
I was proud of that answer (and Prof. Goldberg bought it), but it was the pride of coming up with something witty in response to a garbage prompt that had no other function than to prove that the student can read and write. I didn't really think the question was anti-Semitic and homophobic; I was doing a bit.
Another assignment asked us to write paragraphs connecting each of our more theoretical course readings (such as Susan Sontag's "Notes on Camp", or an excerpt from José Esteban Muñoz's Disidentifications: Queers of Color and the Performance of Politics) to Gordo, a collection of short stories about a gay Latino boy growing up in 1970s California. (I think Prof. Goldberg was concerned that students hadn't gotten the "big ideas" of the course, such as they were, and wanted to give an assignment that would force us to re-read them.)
I did it, and did it well. ("[F]or example, Muñoz discusses the possibility of a queer female revolutionary who disidentifies with Frantz Fanon's homophobia while making use of his work. When Nelson Pardo [a character in Gordo] finds some pleasure in American daytime television despite limited English fluency ("not enough to understand everything he is seeing", p. 175), he might be practicing his own form of disidentification.") But it took time out of my day, and it didn't feel like time well spent.
There was a discussion forum on Canvas. School class forums are always depressing. No one ever posts in them unless the teacher makes an assignment of it—except me. I threw together a quick 1800-word post, "in search of gender studies (as contrasted to gender activism)". It was clever, I thought, albeit rambling and self-indulgent, as one does when writing in haste. It felt like an obligation, to show the other schoolstudents what a forum could be and should be. No one replied.
I inquired about Prof. Goldberg's office hours, which turned out to be directly before and after class, which conflicted with my other classes. (I gathered that Prof. Goldberg was commuting to SF State specifically to teach this class in an adjunct capacity; she more commonly taught at City College of San Francisco.) I ditched "Probability Models" lecture one day, just to talk with her about my whole deal. (She didn't seem to approve of me ditching another class when I mentioned that detail.)
It went surprisingly well. Prof. Goldberg is a butch lesbian who, crucially, was old enough to remember the before-time prior to the hegemony of gender identity ideology, and seemed sympathetic to gentle skepticism of some of the newer ideas. She could grant that trans women's womanhood was different from that of cis women, and criticized the way activists tend to glamorize suicide, in contrast to promoting narratives of queer resilience.
When I mentioned my specialization, she remarked that she had never had a math major among her students. Privately, I doubted whether that was really true. (I couldn't have been the only one who needed the gen-ed credits.) But I found it striking for the lack of intellectual ambition it implied within the discipline. I unironically think you do need some math in order to do gender studies correctly—not a lot, just enough linear-algebraic and statistical intuition to ground the idea of categories as clusters in high-dimensional space. I can't imagine resigning myself to such smallness, consigning such a vast and foundational area of knowledge to be someone else's problem—or when I do (e.g., I can't say I know any chemistry), I feel sad about it.
I was somewhat surprised to see Virginia Prince featured in The Stonewall Reader, which I thought was anachronistic: Prince is famous as the founder of Tri-Ess, the Society for the Second Self, an organization for heterosexual male crossdressers which specifically excluded homosexuals. I chose Prince as the subject for my final project/presentation.
Giving feedback on my project proposal, Prof. Goldberg wrote that I "likely got a master's thesis in here" (or, one might think, a blog?), and that "because autogynephilia wasn't coined until 1989, retroactively applying it to a subject who literally could not have identified in that way is inaccurate." (I wasn't writing about how Prince identified.)
During the final presentations, I noticed that a lot of students were slavishly mentioning the assignment requirements in the presentation itself: the rubric had said to cite two readings, two media selections, &c. from the course, and people were explicitly saying, "For my two course readings, I choose ..." When I pointed out to Prof. Goldberg that this isn't how anyone does scholarship when they have something to say (you cite sources in order to support your thesis; you don't say "the two works I'm citing are ..."), she said that we could talk about methodology later, but that the assignment was what it was.
For my project, I ignored the presentation instructions entirely and just spent the two days after the Putnam exam banging out a paper titled "Virginia Prince and the Hazards of Noticing" (four pages with copious footnotes, mostly self-citing my gender-politics blog, in LyX with a couple of mathematical expressions in the appendix—a tradition from my community college days). For my presentation, I just had my paper on the screen in lieu of slides and talked until Prof. Goldberg said I was out of time (halfway through the second page).
I didn't think it was high-quality enough to republish on the blog.
There was one day near the end of the semester when I remember being overcome with an intense feeling of sadness and shame and anger at the whole situation—at the contradiction between what I "should" have done to do well in the class, and what I did do. I felt both as if the contradiction was a moral indictment of me, and that the feeling that it was a moral indictment was a meta-moral indictment of moral indictment.
The feeling passed.
Between the assignments I had skipped and my blatant disregard of the final presentation instructions, I ended up getting a C− in the class, which is perhaps the funniest possible outcome.
"Philosophy of Animals" (Spring 2025)I was pleased that the charmingly-titled "Philosophy of Animals" fit right into my Tuesday–Thursday schedule after measure theory and the theory of functions of a complex variable. It would satisfy the "UD-B: Physical/Life Science" and "SF State Studies: Environmental Sustainability" gen-ed requirements.
Before the semester, Prof. Kimbrough Moore sent out an introductory email asking us to consider as a discussion question for our first session whether it is in some sense contradictory for a vegetarian to eat oysters. I wrote a 630-word email in response (Subject: "ostroveganism vs. Schelling points (was: "Phil 392 - Welcome")") arguing that there are game-theoretic reasons for animal welfare advocates to commit to vegetarianism or veganism despite a prima facie case that oysters don't suffer—with a postscript asking if referring to courses by number was common in the philosophy department.
The course, and Prof. Moore himself, were pretty relaxed. There were readings on animal consciousness and rights from the big names (Singer on "All Animals are Equal", Nagel on "What Is It Like to Be a Bat?") and small ones, and then some readings about AI at the end of the course.
Homework was to post two questions about the readings on Canvas. There were three written exams, which Prof. Moore indicated were a new anti-ChatGPT measure this semester; he used to assign term papers.
Prof. Moore's office hours were on Zoom. I would often phone in to chat with him about philosophy, or to complain about school. I found this much more stimulating than the lecture/discussion periods, which I started to ditch more often than not on Tuesdays in favor of Prof. Schuster's office hours.
Prof. Moore was reasonably competent at his job; I just had trouble seeing why his job, or for that matter, the SFSU philosophy department, should exist.
In one class session, he mentioned offhand (in a slight digression from the philosophy of animals) that there are different types of infinity. By way of explaining, he pointed out that there's no "next" decimal after 0.2 the way that there's a next integer after 2. I called out that that wasn't the argument. (The rationals are countable.) The same lecture, he explained Occam's razor in a way that I found rather superficial. (I think you need Kolmogorov complexity or the minimum description length principle to do the topic justice.) That night, I sent him an email explaining the countability of the rationals and recommending a pictorial intuition pump for Occam's razor due to David MacKay (Subject: "countability; and, a box behind a tree").
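(A minimal illustration—mine, written for this post, not the content of that email—of the usual diagonal enumeration that makes the countability of the positive rationals vivid:)

    from math import gcd

    def positive_rationals():
        # Walk the diagonals numerator + denominator = 2, 3, 4, ... and skip
        # non-reduced fractions, so every positive rational appears exactly once.
        total = 2
        while True:
            for p in range(1, total):
                q = total - p
                if gcd(p, q) == 1:
                    yield (p, q)
            total += 1

    enumeration = positive_rationals()
    print([next(enumeration) for _ in range(10)])
    # [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (2, 3), (3, 2), (4, 1), (1, 5)]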
In April, the usual leftist blob on campus had scheduled a "Defend Higher Education" demonstration to protest proposed budget cuts to the California State University system; Prof. Moore offered one point of extra credit in "Philosophy of Animals" for participating.
I was livid. Surely it would be a breach of professional conduct to offer students course credit for attending an anti-abortion or pro-Israel rally. Why should the school presume it had the authority to tell students to speak out in favor of more school? I quickly wrote Prof. Moore an email in complaint, suggesting that the extra credit opportunity be viewpoint-neutral: available to budget cut proponents (or those with more nuanced views) as well as opponents.
I added:
If I don't receive a satisfactory response addressing the inappropriate use of academic credit to incentivize political activities outside the classroom by Thursday 17 April (the day of the protest), I will elevate this concern to Department Chair Landy. This timeline is necessary to prevent the ethical breach of students being bribed into bad faith political advocacy with University course credit.
I can imagine some readers finding this level of aggression completely inappropriate and morally wrong. Obviously, my outrage was performative in some sense, but it was also deeply felt—as if putting on a performance was the most sincere thing I could do under the circumstances.
It's not just that it would be absurd to get worked up over one measly point of extra credit if there weren't a principle at stake. (That, I would happily grant while "in character.") It was that expecting San Francisco State University to have principles about freedom of conscience was only slightly less absurd.
It was fine. Prof. Moore "clarified" that the extra credit was viewpoint-neutral. (I was a little embarrassed not to have witnessed the verbal announcement in class on Tuesday, but I had already made plans to interview the campus machine-shop guy at that time instead of coming to class.) After having made a fuss, I was obligated to follow through, so I made a "BUDGET CUTS ARE PROBABLY OK!" sign (re-using the other side of the foamboard from an anti–designated hitter rule sign I had made for a recent National League baseball game) and held it at the rally on Thursday for ten minutes to earn the extra-credit point.
As for the philosophy of animals itself, I was already sufficiently well-versed in naturalist philosophy of mind that I don't feel like I learned much of anything new. I posted 24/25 (plus a 2 point "curve" because SFSU students are illiterate), 21.5/25 (plus 4), and 22/25 (plus 2) on the three tests, and finished the semester at 101.5% for an A.
"Self, Place, and Knowing: An Introduction to Interdisciplinary Inquiry" (Spring 2025)I was able to satisfy the "Area E: Lifelong Learning and Self-Development" gen-ed requirement with an asynchronous online-only class, Prof. Mariana Ferreira's "Self, Place, and Knowing: An Introduction to Interdisciplinary Inquiry". Whatever expectations I had of a lower-division social studies gen-ed class at San Francisco State University, this felt like a parody of that.
The first few weekly assignments were quizzes on given readings. This already annoyed me: in a synchronous in-person class, a "quiz" is typically closed-book unless otherwise specified. The purpose is to verify that the student did the reading. It would be a perversion of that purpose for the quiz-taker to read the question, and then Ctrl-F in the PDF to find the answer without reading the full text, but there was no provision for stopping that eventuality here.
The first quiz was incredibly poorly written: some of the answers were obvious just from looking at the multiple choice options, and some of them depended on minutiæ of the text that a typical reader couldn't reasonably be expected to memorize. (The article quoted several academics in passing, and then the quiz had a question of the form "[name] at [university] expresses concerns about:".) I took it closed-book and got 7/10.
I posted a question on the class forum asking for clarification on the closed-book issue, and gently complaining about the terrible questions (Subject: "Are the quizzes supposed to be 'open book'? And, question design"). No one replied; I was hoping Prof. Ferreira kept an eye on the forum. I could have inquired with her more directly, but the syllabus said Zoom office hours were by appointment only at 8 a.m. Tuesdays—just when I was supposed to be out the door to be on time for "Measure and Integration." I didn't bother.
You might question why I even bothered to ask on the forum, given my contempt for grade-grubbing: I could just adhere to a closed-book policy unilaterally and eat the resulting subpar scores. But I had noticed that my cumulative GPA was sitting at 3.47 (down from 3.49 in Spring 2013 because of that C− in "Queer Literatures and Media" last semester), and 3.5 would classify my degree as cum laude. Despite everything, I think I did want an A in "Self, Place, and Knowing", and my probability of getting an A was lower if I handicapped myself with moral constraints perceived by myself and probably not anyone else.
I also did the next two quizzes closed book—except that on the third quiz, I think I succumbed to the temptation to peek at the PDF once, but didn't end up changing my answer as the result of the peek. Was that contrary to the moral law? Was this entire endeavor of finishing the degree now morally tainted by that one moment, however inconsequential it was to any outcome?
I think part of the reason I peeked was because, in that moment, I was feeling doubtful that the logic of "the word 'quiz' implies closed-book unless otherwise specified" held any force outside of my own head. Maybe "quiz" just meant "collection of questions to answer", and it was expected that students would refer back to the reading while completing it. The syllabus had been very clear about LLM use being plagiarism, despite how hard that was to enforce. If Prof. Ferreira had expected the quizzes to be closed book on the honor system, wouldn't she have said that in the syllabus, too? The fact that no one had shown any interest in clarifying what the rules were even after I had asked in the most obvious place, suggested that no one cared. I couldn't be in violation of the moral law if "Self, Place, and Knowing" was not a place where the moral law applied.
It turned out that I needn't have worried about my handicapped quiz scores (cumulative 32/40 = 80%) hurting my chances of making cum laude. Almost all of the remaining assignments were written (often in the form of posts to the class forum, including responses to other students), and Prof. Ferreira awarded full or almost-full credit for submissions that met the prescribed wordcount and made an effort to satisfy the (often unclear or contradictory) requirements.
Despite the syllabus's warnings, a few forum responses stuck out to me as having the characteristic tells of being written by an LLM assistant. I insinuated my suspicions in one of my replies to other classmates:
I have to say, there's something striking about your writing style in this post, and even more so your comments on Ms. Williams's and Ms. Mcsorley's posts. The way you summarize and praise your classmates' ideas has a certain personality to it—somehow I imagine the voice of a humble manservant with a Nigerian accent (betraying no feelings of his own) employed by a technology company, perhaps one headquartered on 18th Street in our very city. You simply must tell us where you learned to write like that!
I felt a little bit nervous about that afterwards: my conscious intent with the "Nigerian manservant" simile was to allude to the story about ChatGPT's affinity for the word delve being traceable to the word's prevalence among the English-speaking Nigerians that OpenAI employed as data labelers, but given the cultural milieu of an SFSU social studies class, I worried that it would be called out as racist. (And whatever my conscious intent, maybe at some level I was asking for it.)
I definitely shouldn't have worried. Other than the fact that Prof. Ferreira gave me credit for the assignment, I have no evidence that any human read what I wrote.
My final paper was an exercise in bullshit and malicious compliance: over the course of an afternoon and evening (and finishing up the next morning), I rambled until I hit the wordcount requirement, titling the result, "How Do Housing Supply and Community Assets Affect Rents and Quality of Life in Census Tract 3240.03? A Critical Microeconomic Synthesis of Self, Place, and Knowing". My contempt for the exercise would have been quite apparent to anyone who read my work, but Prof. Ferreira predictably either didn't read it or didn't care. I got my A, and my Bachelor of Arts in Mathematics (Mathematics for Liberal Arts) cum laude.
Cynicism and Sanity
The satisfaction of finally finishing after all these years was tinged with grief. Despite the manifest justice of my complaints about school, it really hadn't been that terrible—this time. The math was real, and I suppose it makes sense for some sort of institution to vouch for people knowing math, rather than having to take people's word for it.
So why didn't I do this when I was young, the first time, at Santa Cruz? I could have majored in math, even if I'm actually a philosopher. I could have taken the Putnam (which is just offered at UCSC without a student needing to step up to organize). I could have gotten my career started in 2010. It wouldn't have been hard except insofar as it would have involved wholesome hard things, like the theory of functions of a complex variable.
What is a tragedy rather than an excuse is that I hadn't known how, at the time. The official story is that the Authority of school is necessary to prepare students for "the real world". But the thing that made it bearable and even worthwhile this time is that I had enough life experience to treat school as part of the real world that I could interact with on my own terms, and not any kind of Authority. The incomplete contract was an annoyance, not a torturous contradiction in the fabric of reality.
In a word, what saved me was cynicism, except that cynicism is just naturalism about the properties of institutions made out of humans. The behavior of the humans is in part influenced by various streams of written and oral natural language instructions from various sources. It's not surprising that there would sometimes be ambiguity in some of the instructions, or even contradictions between different sources of instructions. As an agent interacting with the system, it was necessarily up to me to decide how to respond to ambiguities or contradictions in accordance with my perception of the moral law. The fact that my behavior in the system was subject to the moral law, didn't make the streams of natural language instructions themselves an Authority under the moral law. I could ask for clarification from a human with authority within the system, but identifying a relevant human and asking had a cost; I didn't need to ask about every little detail that might come up.
Cheating on a math test would be contrary to the moral law: it feels unclean to even speak of it as a hypothetical possibility. In contrast, clicking through an anti-sexual-harassment training module as quickly as possible without actually watching the video was not contrary to the moral law, even though I had received instructions to do the anti-sexual-harassment training (and good faith adherence to the instructions would imply carefully attending to the training course content). I'm allowed to notice which instructions are morally "real" and which ones are "fake", without such guidance being provided by the instructions themselves.
I ended up getting waivers from Chair Hsu for some of my UCSC credits that the computer system hadn't recognized as fulfilling the degree requirements. I told myself that I didn't need to neurotically ask followup questions about whether it was "really" okay that (e.g.) my converted 3.3 units of linear algebra were being accepted for a 4-unit requirement. It was Chair Hsu's job to make his own judgement call as to whether it was okay. I would have been willing to take a test to prove that I know linear algebra—but realistically, why would Hsu bother to have someone administer a test rather than just accept the UCSC credits? It was fine; I was fine.
I remember that back in 2012, when I was applying to both SF State and UC Berkeley as a transfer student from community college, the application forms had said to list grades from all college courses attempted, and I wasn't sure whether that should be construed to include whatever I could remember about the courses from a very brief stint at Heald College in 2008, which I didn't have a transcript for because I had quit before finishing a single semester without receiving any grades. (Presumably, the intent of the instruction on the forms was to prevent people from trying to elide courses they did poorly in at the institution they were transferring from, which would be discovered anyway when it came time to transfer credits. Arguably, the fact that I had briefly tried Heald and didn't like it wasn't relevant to my application on the strength of my complete DVC and UCSC grades.)
As I recall, I ended up listing the incomplete Heald courses on my UC Berkeley application (out of an abundance of moral caution, because Berkeley was actually competitive), but not my SFSU application. (The ultimate outcome of being rejected from Berkeley and accepted to SFSU would have almost certainly been the same regardless.) Was I following morally coherent reasoning? I don't know. Maybe I should have phoned up the respective admissions offices at the time to get clarification from a human. But the possibility that I might have arguably filled out a form incorrectly thirteen years ago isn't something that should turn the entire endeavor into ash. The possibility that I might have been admitted to SFSU on such "false pretenses" is not something that any actual human cares about. (And if someone does, at least I'm telling the world about it in this blog post, to help them take appropriate action.) It's fine; I'm fine.
When Prof. Mujamdar asked us to bring our laptops for the recitation on importance sampling and I didn't feel like lugging my laptop on BART, I just did the work at home—in Rust—and verbally collaborated with a classmate during the recitation session. I didn't ask for permission to not bring the laptop, or to use Rust. It was fine; I was fine.
In November 2024, I had arranged to meet with Prof. Arek Goetz "slightly before midday" regarding the rapidly approaching registration deadline for the Putnam competition. I ducked out of "Real II" early and knocked on his office door at 11:50 a.m., then waited until 12:20 before sending him an email on my phone and proceeding to my 12:30 "Queer Literatures and Media" class. While surreptitiously checking my phone during class, I saw that at 12:38 p.m., he emailed me, "Hello Zack, I am in the office, not sure if you stopped by yet...". I raised my hand, made a contribution to the class discussion when Prof. Goldberg called on me (offering Seinfeld's "not that there's anything wrong with that" episode as an example of homophobia in television), then grabbed my bag and slipped out while she had her back turned to the whiteboard. Syncing up with Prof. Goetz about the Putnam registration didn't take long. When I got back to "Queer Literatures and Media", the class had split up into small discussion groups; I joined someone's group. Prof. Goldberg acknowledged my return with a glance and didn't seem annoyed.
Missing parts of two classes in order to organize another school activity might seem too trivial of an anecdote to be worth spending wordcount on, but it felt like a significant moment insofar as I was applying a wisdom not taught in schools, that you can just do things. Some professors would have considered it an affront to just walk out of a class, but I hadn't asked for permission, and it was fine; I was fine.
In contrast to my negligence in "Queer Literatures and Media", I mostly did the reading for "Philosophy of Animals"—but only mostly. It wasn't important to notice or track if I missed an article or skimmed a few pages here and there (in addition to my thing of cutting class in favor of Prof. Schuster's office hours half the time). I engaged with the material enough to answer the written exam questions, and that was the only thing anyone was measuring. It was fine; I was fine.
I was fine now, but I hadn't been fine at Santa Cruz in 2007. The contrast in mindset is instructive. The precipitating event of my whole anti-school crusade had been the hysterical complete mental breakdown I had after finding myself unable to meet pagecount on a paper for Prof. Bettina Aptheker's famous "Introduction to Feminisms" course.
It seems so insane in retrospect. As I demonstrated with my malicious compliance for "Self, Place, and Knowing", writing a paper that will receive a decent grade in an undergraduate social studies class is just not cognitively difficult (even if Prof. Aptheker and the UCSC of 2007 probably had higher standards than Prof. Ferreira and the SFSU of 2025). I could have done it—if I had been cynical enough to bullshit for the sake of the assignment, rather than holding myself to the standard of writing something I believed and having a complete mental breakdown rather than confront the fact that I apparently didn't believe what I was being taught in "Introduction to Feminisms."
I don't want to condemn my younger self entirely, because the trait that made me so dysfunctional was a form of integrity. I was right to want to write something I believed. It would be wrong to give up my soul to the kind of cynicism that scorns ideals themselves, rather than the kind that scorns people and institutions for not living up to the ideals and lying about it.
Even so, it would have been better for everyone if I had either bullshitted to meet the pagecount, or just turned in a too-short paper without having a total mental breakdown about it. The total mental breakdown didn't help anyone! It was bad for me, and it imposed costs on everyone around me.
I wish I had known that the kind of integrity I craved could be had in other ways. I think I did better for myself this time by mostly complying with the streams of natural language instructions, but not throwing a fit when I didn't comply, and writing this blog post afterwards to clarify what happened. If anyone has any doubts about the meaning of my Bachelor of Arts in Mathematics for Liberal Arts from San Francisco State University, they can read this post and get a pretty good idea of what that entailed. I've put more than enough effort into being transparent that it doesn't make sense for me to be neurotically afraid of accidentally being a fraud.
I think the Bachelor of Arts in Mathematics does mean something, even to me. It can simultaneously be the case that existing schools are awful for the reasons I've laid out, and that there's something real about some parts of them. Part of the tragedy of my story is that having wasted too much of my life in classes that were just obedience tests, I wasn't prepared to appreciate the value of classes that weren't just that. If I had known, I could have deliberately sought them out at Santa Cruz.
I think I've latched on to math as something legible enough and unnatural enough (in contrast to writing) that the school model is tolerable. My primary contributions to the world are not as a mathematician, but if I have to prove my intellectual value to Society in some way that doesn't depend on people intimately knowing my work, this is a way that makes sense, because math is too difficult and too pure to be ruined by the institution. Maybe other subjects could be studied in school in a way that's not fake. I just haven't seen it done.
There's also a sense of grief and impermanence about only having my serious-university-math experience in the GPT-4 era rather than getting to experience it in the before-time while it lasted. If I didn't have LLM tutors, I would have had to be more aggressive about collaborating with peers and asking followup questions in office hours.
My grudging admission that the degree means something to me should not be construed as support for credentialism. Chris Olah never got his Bachelor's degree, and anyone who thinks less of him because of that is telling on themselves.
At the same time, I'm not Chris Olah. For those of us without access to the feedback loops entailed by a research position at Google Brain, there's a benefit to being calibrated about the standard way things are done. (Which, I hasten to note, I could in principle have gotten from MIT OpenCourseWare; my accounting of benefits from happening to finish college is not an admission that the credentialists were right.) Obviously, I knew that math is not a spectator sport: in the years that I was filling my pages of notes from my own textbooks, I was attempting exercises and not just reading (because just reading doesn't work). But was I doing enough exercises, correctly, to the standard that would be demanded in a school class, before moving on to the next shiny topic? It's not worth the effort to do an exhaustive audit of my 2008–2024 private work, but I think in many cases, I was not. Having a better sense of what the mainstream standard is will help me adjust my self-study practices going forward.
When I informally audited "Honors Introduction to Analysis" ("MATH H104") at UC Berkeley in 2017, Prof. Charles C. Pugh agreed to grade my midterm, and I got a 56/100. I don't know what the class's distribution was. Having been given to understand that many STEM courses offered a generous curve, I would later describe it as me "[doing] fine on the midterm". Looking at the exam paper after having been through even SFSU's idea of an analysis course, I think I was expecting too little of myself: by all rights, a serious analysis student in exam shape should be able to prove that the minimum distance between a compact and a connected set is achieved by some pair of points in the sets, or that the product of connected spaces is connected (as opposed to merely writing down relevant observations that fell short of a proof, as I did).
In a July 2011 Diary entry, yearning to finally be free of school, I fantasized about speedrunning SF State's "advanced studies" track in two semesters: "Six classes a semester sounds like a heavy load, but it won't be if I study some of the material in advance," I wrote. That seems delusional now: it might have been true of "Self, Place, and Knowing"-tier bullshit classes, but it's not true of real math classes.
It doesn't justify the scourge of credentialism, but the fact that I was ill-calibrated about the reality of the mathematical skill ladder helps explain why the coercion of credentialism is functional, why the power structure survives instead of immediately getting competed out of existence. As terrible as school is along so many dimensions, it's tragically possible for people to do worse for themselves in freedom along some key dimensions.
There's a substantial component of chance in my coming to finish the degree. The idea presented itself to me in early 2024 while I was considering what to work on next after a writing project had reached a natural stopping point. People were discussing education and schooling on Twitter in a way that pained me, and it occurred to me that I would feel better about being able to criticize school from the position of "... and I have a math degree" rather than "... so I didn't finish." It seemed convenient enough, so I did it.
But a key reason it seemed convenient enough is that I still happened to live within commuting distance of SF State. That may be more due to inertia than anything else; when I needed to change apartments in 2023, I had considered moving to Reno, NV, but ended up staying in the East Bay because it was less of a hassle. If I had fled to Reno, then transferring credits and finishing the degree on a whim at the University of Nevada–Reno would have been less convenient. I probably wouldn't have done it—and I think it was ultimately worth doing.
The fact that humans are such weak general intelligences that so much of our lives comes down to happenstance, rather than people charting an optimal path for themselves, helps explain why there are institutions that shunt people down a standard track with a known distribution of results. I still don't like it, and I still think people should try to do better for themselves, but it seems somewhat less perverse now.
Afterwards, Prof. Schuster encouraged me via email to at least consider grad school, saying that I seemed comparable to his peers in the University of Michigan Ph.D. program (which was ranked #10 in the U.S. at that time in the late '90s). I demurred: I said I would consider it if circumstances were otherwise, but in contrast to the last two semesters to finish undergrad, grad school didn't pass a cost-benefit analysis.
(Okay, I did end up crashing Prof. Clader's "Advanced Topics in Mathematics: Algebraic Topology" ("MATH 790") the following semester, and she agreed to grade my examinations, on which I got 47/50, 45/50, 46/50, and 31/50. But I didn't enroll.)
What was significant (but not appropriate to mention in the email) was that now the choice to pursue more schooling was a matter of cost–benefit analysis, and not a prospect of torment or betrayal of the divine.
I wasn't that crazy anymore.
Discuss
Taiwan war timelines might be shorter than AI timelines
TL;DR: Most AI forecasts assume that if a conflict over Taiwan occurs, it will largely be about AI. I think there's a decent chance of a conflict before either side becomes substantially AGI-pilled.
Thanks to Aaron Scher for comments on a draft of this post.
I'm no China expert, but a lot of China experts seem pretty concerned about the possibility of a conflict over Taiwan. China is currently engaged in a massive military buildup and modernization effort, it's building specialized invasion barges like the Mulberry harbors used in the WWII Normandy landings, and it's conducting amphibious landing exercises with civilian roll-on/roll-off vehicle ferries, many of which China modifies for potential military use. Increasingly frequent military exercises around Taiwan could let China rapidly transition to a full blockade. Its internal propaganda suggests that Taiwanese "provocations" could justify military action, and its leadership continually talks about Taiwan's "return to China", with some even openly discussing "reeducation".
By some cosmic coincidence, 2027, the PLA's centennial, is sometimes identified as the year when the PLA hopes to be ready for a conflict over Taiwan. This doesn't mean China will immediately pull the trigger, but they might want to be prepared by then in case things do escalate. They may believe the next few years represent a window of opportunity before slower growth and a demographic crisis reduce China's power relative to the US. Plus, Xi is 72, and would probably love to cement his legacy by retaking Taiwan in his lifetime.[1]
Manifold currently puts the probability of an invasion of Taiwan by the end of 2027 at around 22%, and before 2030 at around 37%, although I don't think these markets count blockades and other actions that fall short of a full invasion:
Other markets put the chance of a more limited conflict higher:[2]
I'm not trying to make the case here that there will probably be a war. The point I want to make is that while most AI forecasts assume any conflict with China would largely be about AI, I think there's a decent chance a conflict occurs for other reasons before either side becomes AGI-pilled. This point has been made previously here and here, but I think the possibility of war is much more relevant now that people have somewhat longer timelines. Back when many expected takeoff in 2027 or so, it was pretty reasonable to assume that the probability of a conflict entirely unrelated to AI was low.[3] But the forecasters behind AI 2027 now expect takeoff in the 2030s. If that's the case, I think there's a good chance Xi decides to escalate over Taiwan before he (or his successor) starts paying serious attention to AI. At the very least, the timelines overlap considerably: developments in AI could shift China's calculus over whether and when to invade, but equally Chinese aggression unrelated to AI could drastically impact AI timelines.
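To make the overlap claim a bit more concrete, here is a minimal Monte Carlo sketch of my own (not part of any of the forecasts cited above): it samples a hypothetical distribution for when a Taiwan conflict might start and a hypothetical distribution for takeoff timing, then estimates how often the conflict comes first. Every parameter is an illustrative assumption, loosely inspired by the numbers in this post, and the point is only that the overlap claim can be stated quantitatively and argued about at the level of inputs.

```python
# Minimal Monte Carlo sketch (illustrative assumptions, not a forecast):
# estimate how often a Taiwan conflict starts before takeoff/AGI arrives.
import random


def sample_conflict_year(rng: random.Random) -> float:
    # Assumption: ~35% chance a conflict starts by 2030, spread roughly
    # uniformly over 2026-2030; otherwise treat it as "not before AGI".
    if rng.random() < 0.35:
        return rng.uniform(2026.0, 2030.0)
    return float("inf")


def sample_agi_year(rng: random.Random) -> float:
    # Assumption: takeoff sometime in the 2030s, triangular with mode 2033.
    return rng.triangular(2030.0, 2040.0, 2033.0)


def p_conflict_before_agi(n: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(sample_conflict_year(rng) < sample_agi_year(rng) for _ in range(n))
    return hits / n


if __name__ == "__main__":
    print(f"P(conflict starts before AGI) ~ {p_conflict_before_agi():.2f}")
```

With these made-up inputs the answer is essentially just the assumed conflict probability, since the assumed conflict window closes before the assumed takeoff window opens; with longer conflict tails or shorter AI timelines the interaction gets more interesting.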
It's perfectly reasonable to build a forecasting model that doesn't try to take these kinds of exogenous shocks into account. But I think forecasters should clearly flag when they do this, and ideally provide estimates for how likely they think such events are if the chances are significant. I haven't really seen this in the AI forecasting space: for instance, the AI Futures Project's all-things-considered forecasts don't mention the possibility of a conflict, and this scenario has China blockading Taiwan as late as 2034, and only in response to US cyberattacks on Chinese AI development.
I also think the chances of conflict are high enough that it would be very valuable to have forecasts specifically focused on understanding AI timelines in the event of a war. There's been some discussion of this here and here, but those posts are over two years old at this point, so something more up-to-date would be useful. I'll give a few of my thoughts below, but this is mostly just speculation -- I'd really like to see modeling work by more knowledgeable people on how a conflict would impact AI.
Impacts on compute
In general, a conflict over Taiwan would almost certainly slow down progress in AI by disrupting access to compute. This report estimates 20 months for other firms to catch up to TSMC; this discussion is more apocalyptic, predicting global microprocessor production falling to "early 2000s levels for perhaps 15 years." It's less clear to me who would come out ahead in relative terms, though. The US is very reliant on Taiwan, but I think in some cases it might actually be able to maintain a good portion of its compute advantage even if Taiwanese production is cut off, because China also depends on Taiwan for much of its compute. I'm pretty uncertain about this, though.
If things escalate to a shooting war, it's likely that much of Taiwan's manufacturing will be destroyed by one side or the other. If China manages to take over Taiwan, they might gain access to some of TSMC's expertise or technology even if the fabs are destroyed. A good part of TSMC's edge is reportedly in its engineering talent and in knowledge held by a small number of individuals. But I think China would probably have a lot of difficulty gaining the cooperation of TSMC employees or recreating the talent base after a war, so I don't expect China to benefit very much: the main effect would be everyone losing access to Taiwanese production.
Meanwhile, I don't think Chinese chip production would be disrupted as much by a war. A US blockade or other wartime supply-chain disruptions would certainly make things more difficult, but it seems much harder to cut off Chinese production without just bombing the fabs. The US might do that,[4] especially if TSMC's fabs get bombed, but if the conflict isn't centrally about AI I think there's a good chance they'll survive.
But Chinese chip production currently isn't very competitive and might not catch up for some time. If China is still reliant on foreign chips, the conflict could see the US maintain its compute advantage because it would almost certainly stop selling China chips and crack down on chip smuggling. While the US would likely lose access to Taiwan's production, at least for the duration of the conflict, it would still have TSMC's Arizona fab plus whatever production Intel can manage.[5] I think the relative compute balance here really depends on how quickly US domestic production ramps up compared to Chinese production -- modeling this properly would be very valuable!
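As a placeholder for that kind of modeling, here is a toy ramp-up comparison with entirely hypothetical numbers: it just compounds each side's remaining leading-edge capacity at an assumed annual growth rate after Taiwanese production drops out, which is enough to show how sensitive the relative balance is to the starting points and ramp rates.

```python
# Toy ramp-up comparison (all numbers hypothetical): after Taiwanese
# production drops out, compound each side's remaining leading-edge capacity
# at an assumed annual growth rate and compare cumulative output.
def cumulative_capacity(start: float, annual_growth: float, years: int) -> float:
    """Sum of yearly capacity, with capacity compounding by annual_growth each year."""
    total, current = 0.0, start
    for _ in range(years):
        total += current
        current *= 1.0 + annual_growth
    return total


if __name__ == "__main__":
    # Assumptions: the US starts with more non-Taiwan capacity but ramps slower;
    # China starts lower but mobilizes faster. Units are arbitrary.
    for years in (2, 4, 6):
        us = cumulative_capacity(start=1.0, annual_growth=0.3, years=years)
        cn = cumulative_capacity(start=0.5, annual_growth=0.6, years=years)
        print(f"{years} years: US={us:.2f}, China={cn:.2f}, US/China={us / cn:.2f}")
```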
A major risk, though, is that Taiwanese production might be used as a bargaining chip if it's not destroyed. This could be as part of a peace deal after a war, but it could also happen before a full-scale war starts. China might impose a blockade, take some outlying islands, or use other kinds of pressure to try to force a capitulation or extract concessions without having to invade. It seems unlikely that China would be able to take over completely without a fight, but it could gain better access to Taiwanese production: the US might agree to loosen export controls or even give China access to TSMC or ASML's tech as part of a settlement. If the US government isn't AGI-pilled at this point, it might not even value this particularly highly, or view it as opening up a market for American companies.
Securitization
One other scenario worth considering is that US or Chinese leaders might start to wake up to AGI during the conflict. An intelligence explosion probably increases the risk of war in the best of times; if there's already a war going on, things could get very ugly. If AI gets securitized we'd likely see attacks on fabs and data centers, secret Manhattan Projects, assassinations, and little room for safety research, let alone any sort of coordination on AI. On the other hand, if the US and China sabotage each other hard enough this could end up delaying AGI significantly.[6]
Conclusion
Of course China might decide not to invade, or we might get AGI first, rendering all of this moot. But I think the chance of a conflict over Taiwan years before AGI is high enough that it should probably be factored into people's timelines. It's easy to forget, but people care about other things besides AGI! And the decisions they make could have big impacts on the AI race; we've seen this repeatedly with chip sales to China, and we could very well see it again.
- ^
And despite recent talk, I don't think he genuinely expects to live to 150, although I suppose it would explain his apparent lack of succession planning.
- ^
I don't think you should pay much attention to my probabilities because I'm neither a China expert nor an experienced forecaster, but for the record, I think 22% for invasion by the end of 2027 is maybe a bit high (and I've bet on this), while the other markets roughly match my estimates.
- ^
In the original AI 2027 timeline the CCP contemplates an invasion or blockade of Taiwan, but only in response to the US advantage in compute.
- ^
Can the US bomb the Chinese mainland without triggering a nuclear war? I don't know! China maintains a no first use policy, but who knows how that will hold up during a war.
- ^
China could try to disable the US fabs (e.g. with cyberattacks), but the US would likely retaliate against Chinese fabs, at which point I'm not sure anyone's left making chips. I guess in that case the US's larger preexisting stock of compute might give it an advantage.
- ^
I'm not sure I'd want to stay in the Bay Area for this, though -- I don't want to get "sabotaged."
Discuss
Split (Part 1)
Hi, I’m a baby fiction-writer. If you like this first chapter, please like, share, or comment. It will increase the chance of me writing additional chapters. Enjoy! <3
Imagine you woke up and went about your day and hurt yourself, horrifically, in a perfectly mundane way. Maybe you sliced your wrist with a box cutter cause you are unfathomably clumsy, or you tripped and tumbled off the top of your surprisingly high stairs.
Whatever agony you are feeling, your mind is rejecting the reality of what just happened. It’s screaming itself inside out that this can’t be happening and you were fine just moments before but now you are dying.
Instead of calling 911 like any sane person would do, you just lay there in shock, your mind raging against the dying of your light. It’s seeping out of you where the blood gushes out of your wrist or the pressure in your skull crushes your brain.
You imagine instead your healthy body, the way it was before. Far more real than whatever this current horror is. Your normal body, without any additional apertures for blood to escape or dents crushing bone shards into your brain.
Till the pain fades away in a haze and you feel yourself splitting away from yourself. Is this what dying is like? you manage to wonder before you come to, on the floor, naked, someone hugging you from behind.
Nothing makes sense for a moment, and you wonder if you fell asleep, had a bad dream, took a strange drug? But then why are you lying on the floor here. And … who is that?
You turn around and see your own face. Merciful adrenalin snaps the entire world into crystalline focus, time freezing as your body is propelled backwards across the floor. Your real body that is, the one you just moved when you scrambled back. Not that doll, mannequin, monstrosity on the floor in front of you.
It’s bleeding from its wrists. Its skull is caved in.
You, though, you are fine. You are pushing your naked back against a rough wall. Staring, staring, and staring. No way to know how long before thoughts start surfacing again.
Who is that?
I must be high.
Oh, this must be what it’s like to be crazy.
You focus on your breathing. In and out. It does no good.
You move your fingers. Then your toes. That works at least.
You stretch your arms and then your legs, careful to not touch the pool of blood stagnant on the wooden floor.
You push yourself up against the wall, plowing painful rivulets into your bare back. The pain wakes you further.
What’s happening?
Your mind can’t make sense of it, but you are standing now at least. Breathing and moving worked. You are naked though. The other face is wearing your bathrobe shrouded in a pool of blood.
I guess that’s ruined now.
You get up and walk to the kitchen. You make yourself a cup of coffee, calmly. Sit at the table and sip. Your dead body lying in the hallway.
I’m either crazy or something amazing just happened.
Crazy is more likely, but in that case the body isn’t really there. Well, either that or it’s someone you just murdered and there is another face on them.
Should I call the cops?
If there is nothing there, they’ll give you meds. If you killed someone they’ll lock you up. And if you did just cheat death by creating a new body then… then…
Ok, I’m crazy or I killed someone.
In that case, better to hide the body. If you are crazy, then you are just LARPing a nightmare for a night. And if you are a murderer, well … You can’t fathom why you’d kill anyone so there was probably a good reason or a bad accident. Either way, it’s better to have more time to figure that out than have police swarming in right this moment.
…
Your mind flinches away from the obvious conclusion. “Hiding bodies” was not part of your 2026 resolutions. Though you’ve seen enough TV shows about it to have some hunches on how to go about it.
So you get to work, pragmatically, methodically, and with clothes on now. It’s the middle of the night and you wheel out the body in a trash container, pull it into your minivan, and drive it over to your parents’ farm. There is a small river that runs along the edge, and a small pier that runs into it, with small boats that don’t run at all. One sags half into the water, disappointed at never being used. Another lists precariously, doubtfully able to sustain the weight of twice yourself.
So you sit at the pier, legs dangling down above the water, the container with a body waiting patiently next to you, the light from your phone the only thing giving you away.
Funny how the world isn’t screaming.
Everything looks peaceful instead. A cat meows somewhere. An owl hoots. The stars shine down on either your insanity or your crime.
Sorry to confuse you though. This story isn’t about you. It is about me. It’s about how there are two of me now and I don’t know what to do about that. One with a slit wrist and a dent in her head and one…
I look down at my wrist, unblemished, then feel along my skull. I remember the pain, the cut, the fall.
Not the stairs though.
I was at the bottom of the stairs, in the hallway, opening a box.
I…
Is my mind still damaged?
I keep feeling along my skull as the horror inside me mounts.
Do I have brain damage?
Panic rises in me like an electric fire shooting out from my stomach. I scramble back from the edge of the pier, not trusting myself so close to the water.
No, fuck, what happened?! Am I crazy? Do I have amnesia?
My nails are digging into my skull now, the pain only a ghost of the memory when I cracked it.
Memory?!
Then I freeze.
Denying the moment.
Denying the reality.
I am not crazy. I don’t have amnesia. I don’t have brain damage!
It’s impossible to accept so I resist instead, curling my body around a truth that doesn’t exist till agony explodes all across my body, just before everything fades away again. A … stroke?
And then I come to, wind hugging my face, someone’s arms hugging my waist.
Fuck.
Discuss
Who is responsible for shutting down rogue AI?
A loss of control scenario would likely result in rogue AI replicating themselves across the internet, as discussed here: https://metr.org/blog/2024-11-12-rogue-replication-threat-model/
Under fast takeoff models, the first rogue AGI posing a serious takeover/extinction risk to humanity would very likely be the last, with no chance for serious opposition (e.g. Sable). This model seems theoretically compelling to me.
However, there is some recent empirical evidence that the basin of "roughly human" intelligence may not be trivial to escape. LLM agents seem increasingly competent and general, but continue to lag behind humans on long-term planning. If capabilities continue to develop in a highly jagged fashion, we may face rather dangerous rogue AI that still have some exploitable weaknesses. Also, the current (neuro-scaffold) paradigm is compute/data hungry, and perhaps not easily amenable to RSI. Though I suspect strongly superhuman models would be able to invent a much more efficient paradigm, it does seem reasonable to give some weight to the possibility that early rogue neuro-scaffold AGI will undergo a relatively slow takeoff.[1]
Therefore, a competent civilization would have a governmental agency (or team) designated to rapidly shut down (and thoroughly purge/contain) rogue AGI. My question is: which agencies currently hold that responsibility?
Surprisingly, I have not been able to find much previous discussion on practical aspects of this question (e.g., the legal aspects of shutting down a rogue AI running on AWS).
Ideally, such an agency would be international since rogue AGI can easily cross borders and may even negotiate with / bribe / blackmail governments. However, I would guess that some cybercrime unit within the (U.S.) DoD is probably the best candidate. While the UK AISI seems most "on the ball," as far as I know they are not very well equipped to aggressively pursue rogue AGI across borders, which may require a very quick response / escalation across cyber-defense and conventional strikes on data-centers.
At a bare minimum, a strong candidate for this role should actually perform drills simulating shutdown attempts against rogue AGI, which will probably be possible to carry out in a somewhat useful form very soon (or now, with red team human assistance).
- ^
If neuro-scaffold AI is inherently too weak to reach AGI, then the first rogue AGI may arise from a more dangerous paradigm, e.g. "brain-like-AGI". This would be unfortunate, is likely, and is not the focus of this post.
Discuss
Overwhelming Superintelligence
There are many debates about "what counts as AGI" or "what counts as superintelligence?".
Some people might consider those arguments "goalpost moving." Some people were using "superintelligence" to mean "overwhelmingly smarter than humanity". So, it may feel to them like it's watering it down if you use it to mean "spikily good at some coding tasks while still not really successfully generalizing or maintaining focus."
I think there's just actually a wide range of concepts that need to get talked about. And, right now, most of the AIs that people will wanna talk about are kinda general and kinda superintelligent and kinda aligned.
If you have a specific concept you wanna protect, I think it's better to just give it a clunky name that people don't want to use in casual conversation,[1] rather than pumping against entropy to defend a simple term that could be defined to mean other things.
Previously OpenPhil had used "Transformative AI" to mean "AI that is, you know, powerful enough to radically transform society, somehow." I think that's a useful term. But, it's not exactly what If Anyone Builds It is cautioning about.
The type of AI I'm most directly worried about is "overwhelmingly superhuman compared to humanity." (And, AIs that might quickly bootstrap to become overwhelmingly superhuman).
I've been lately calling that Overwhelming Superintelligence.
Overwhelming Superintelligence is scary both because it's capable of strategically outthinking humanity, and, because any subtle flaws or incompatibilities between what it wants, and what humans want, will get driven to extreme levels.
I think if anyone builds Overwhelming Superintelligence without hitting a pretty narrow alignment target, everyone probably dies. (And, if not, the future is probably quite bad).
Appendix: Lots of "Careful Moderate Superintelligence"
I am separately worried about "Carefully Controlled Moderate Superintelligences that we're running at scale, each instance of which is not threatening, but, we're running a lot of them, giving them lots of room to maneuver."
This is threatening partly because at some point they may give rise to Overwhelming Superintelligence, but also because sharing the planet with a slightly smarter species still doesn't seem like it bodes well. (See humans, neanderthals, chimpanzees). They don't have to do anything directly threatening, just keep being very useful while subtly steering things such that they get more power in the future.
- ^
I actually think AIdon'tkilleveryoneism is pretty good.
Discuss
Reducing MDMA neurotoxicity
old literature-review research task commissioned by @Raj Thimmiah
Epistemic status: This is not medical advice. Pharmacological speculations of a high-schooler, informed by studies done mainly on rodents. Pls don't kill yourself by doing anything suggested in this post, a lot of these substances and combinations of them can be severely dangerous.
TL;DR
An MDMA analogue like 5-MAPB or 4-MMC combined with low-dose selegiline seems to be the combination with the best ratio of MDMA-likeness to neurotoxicity: assuming hyperthermia is physically avoided, it targets the two main toxicity mechanisms, namely toxic metabolites and the serotonergic neurotoxicity caused by simultaneous dopamine release. Taking selegiline should additionally produce a longer/stronger dopaminergic effect due to slowed dopamine degradation.
Vitamin C, vitamin E, ALA, and agmatine likely provide further neuroprotection. Antipsychotics like clozapine are an effective measure against hyperthermia from MDMA overdose (only as an emergency measure; otherwise they would dull the effects significantly).
why selegiline
Selegiline is a drug prescribed for Parkinson's disease and depression (for which the patch form - EMSAM - is used in the US).
Mechanism
A 1995 study using selegiline and the SSRI fluoxetine showed that damage to neuronal membranes from MDMA stems largely from the uptake of dopamine by serotonin transporters, which causes dopamine to accumulate in serotonergic neurons, where it is broken down mainly by MAO-B (whereas elsewhere it is metabolised mostly by MAO-A). This deamination by MAO-B creates hydrogen peroxide, which is claimed to be responsible for much of MDMA's neurotoxicity.
Selegiline pharmacology
Selegiline is an MAOi; it inhibits the enzymes that metabolise monoamines, such as dopamine, serotonin, norepinephrine, and trace amines, increasing their intersynaptic concentration.
Selegiline, along with safinamide and rasagiline, is selective for the MAO-B subtype at certain dosages. The MAO-B enzyme metabolises mostly beta-phenethylamine and dopamine. Although MAO-A has been found to be responsible for most dopamine breakdown overall, it seems likely that it is MAO-B that metabolises dopamine within serotonergic nerve terminals. In addition, there seems to be evidence that MAO-B also metabolises serotonin in serotonergic neurons, likewise producing hydrogen peroxide.
Selegiline, as well as the beta-phenethylamine it increases (the "endogenous amphetamine"), is a TAAR1 agonist. TAAR1 is also a target of amphetamine and probably accounts for part of its action. TAAR1 agonism may be responsible for selegiline's catecholaminergic activity enhancer (CAE) activity.[1]
In addition, selegiline (especially orally) is metabolised into the levorotatory forms of amphetamine and methamphetamine, which act as norepinephrine and dopamine releasing agents, though they are much weaker in terms of dopamine release than their dextrorotatory "conventional" counterparts (dextroamphetamine and dextromethamphetamine).[2]
Selegiline's CAE effect, amphetamine metabolites, and decreased dopamine metabolism suggest it might enhance MDMA's effect even in the absence of functional MAO-A inhibition, though the above-linked studies suggest that, in rodents, no significant additional hyperthermia or head-twitch response is observed. This might be because most dopamine is metabolised by MAO-A in blood platelets rather than by MAO-B in serotonergic neurons, so selegiline administration targets only the small fraction of dopamine breakdown that is responsible for neurotoxic effects.
Still, it is a potentially risky practice to add selegiline to MDMA use, with possible individual variation in drug metabolism causing MAO-A inhibition at dosages that usually only inhibit MAO-B. Therefore, low dosages of <10 mg, maybe 5 mg, are more reasonable.
@font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
@font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
@font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
@font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
@font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
@font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
@font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
@font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
Evidence
Aside from the Sprague 1995 study linked above, which showed that selegiline (also called deprenyl) alleviated markers of neurotoxicity in MDMA-exposed rats, in vitro cultures showed significantly reduced free radical formation when selegiline was administered before MDMA, and MAO-B-deficient rats showed no serotonin depletion from MDMA (indicating a lack of damage to serotonergic neurons - the main targets of MDMA neurotoxicity). Another study from 2007 showed MAO-B-dependent damage to mitochondria (including their DNA), which was alleviated by selegiline.
Notably, I didn't find any study testing the combination in humans, nor any anecdotal experience report.
Selectivity/dosage:
At oral dosages below 10 mg/day, selegiline is selective for MAO-B and thus does not pose a significant risk of serotonin syndrome[3] (but your body may work differently and be unusually sensitive). If you want to try this potentially dangerous combination, ideally microdose selegiline at below 5 mg.
Patches (EMSAM) might theoretically be useful due to a lower amount of metabolites (a reported 70% reduction), but they are risky due to their much higher bioavailability (up to 50-fold) and the resulting simultaneous MAO-A inhibition at normal dosages (which is the goal of the patches, intended as antidepressants).[4] In addition, the patch releases the drug slowly over the course of a day, making the timing less predictable when planning a specific dosage in combination with MDMA.
Safety of selected MDMA analogues
Which pharmacological aspects of MDMA do we look for?
Besides the dopamine and norepinephrine release and reuptake inhibition, which causes stimulation and euphoria, the specifics of MDMA's effects are likely due to interaction with serotonin receptors. However, it is not entirely clear how serotonin release leads to prosocial, entactogenic effects. Experiments with receptor antagonists have shown that the psychedelic receptor 5-HT2A (which modulates dopamine and prolactin release) is relevant for this effect, as are 5-HT1B and 5-HT1A, the latter of which has downstream effects on increased oxytocin release.[5] Notably, MDMA not only stimulates these receptors indirectly but also binds to them directly as a partial agonist, in addition to 5-HT2B and the sigma receptors, which might be responsible for some of its effects. It also binds to alpha-adrenergic receptors, possibly contributing to its anxiolytic effects.[6]
Hormonally, MDMA has been shown to elevate the neuropeptides oxytocin and vasopressin, as well as the steroids DHEA and cortisol. The increase in oxytocin seems to be correlated with prosocial effects, and the increase in DHEA with euphoric ones[6] (maybe one should try supplementing DHEA or intranasal oxytocin?). The hormonal effects are, however, much less studied in MDMA analogues than in MDMA itself, so I'll focus on triple monoamine releasing activity and serotonergic receptor activity instead.
Analogues
The four following MDMA analogues belong to the classes of benzofurans (5-MAPB and 6-APB) and cathinones (4-MMC and 3-MMC), meaning their cores differ from that of MDMA and they thus form different metabolites.
All of them are triple monoamine releasing agents and (except for 5-MAPB) reuptake inhibitors - leading to increased extracellular concentrations of serotonin, dopamine, and norepinephrine (noradrenaline). At the same time, all have affinity for the above-mentioned serotonin receptors, mostly as partial agonists (just like MDMA).
Importantly, they form none of the metabolites of MDMA, such as alpha-methyldopamine, HHA, or HHMA, which are neurotoxic through their auto-oxidation into quinones and their binding to glutathione, forming neurotoxic thioether conjugates.[7] Therefore, unless there are yet-to-be-discovered neurotoxic metabolites of these drugs, their damage must come primarily from hyperthermia, dopamine breakdown by MAO-B, or RNS formation, which can be alleviated with the methods described in the section below.
5-MAPB
It is metabolised into 5-APB (a very similar MDMA analogue, though with stronger 5-HT2A agonism and thus likely stronger psychedelic effects), which is subsequently metabolised into 3-carboxymethyl-4-hydroxy amphetamine; another product is 3-carboxymethyl-4-hydroxy methamphetamine.[8] No catechols or MDA are formed, so the conversion to quinones or thioether conjugates doesn't happen. However, 5-APB is a potent agonist of the 5-HT2B receptor[9], which might lead to cardiotoxicity with regular use[10] (not recommended for neurotoxicity and monoamine depletion reasons anyway).
It has a half-life of 6.5 hours.[11]
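As an aside (my own illustration, not from the cited source), a 6.5-hour half-life under idealised first-order elimination implies roughly the following decay of the parent compound; this sketch ignores the absorption phase and the active metabolite 5-APB:

```python
# Minimal sketch: fraction of parent drug remaining under simple
# first-order elimination, given the ~6.5 h half-life quoted above.
# Illustrative only; real pharmacokinetics are messier.

def fraction_remaining(hours: float, half_life_h: float = 6.5) -> float:
    """Fraction of the original dose still present after `hours`."""
    return 0.5 ** (hours / half_life_h)

for t in (6.5, 13, 24):
    print(f"{t:>4.1f} h -> {fraction_remaining(t):.0%} remaining")
# 6.5 h -> 50%, 13.0 h -> 25%, 24.0 h -> ~8%
```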
In terms of effects, the potent 5-HT1B agonism[12] might make this a particularly empathogenic/prosocial compound.[13] However, the lack of direct serotonin 1A receptor activity might make it less oxytocin-releasing, though this might be offset by its stronger serotonin-releasing activity (but lower NE activity), and thus indirect 5-HT1A agonism[14], as well as by the metabolite 5-APB's activity at that receptor. The lower norepinephrine-releasing activity likely makes this compound less stimulating.
5-MAPB has been found to cause hyperthermia (as well as hypertension, tremor, and convulsions) in humans[15], though this was likely a case of overdose. In rat liver cells, 5-MAPB and its metabolite 5-APB have been shown to cause cell death (cytotoxicity) greater than that of MDMA.[16]
From Wikipedia[17]:
The Borax combo, as well as 5-MAPB and MDAI, have been advertised as non-neurotoxic alternatives to MDMA.[1][2][5] However, 5-MAPB has subsequently been found to be a serotonergic neurotoxin in rodents similarly to MDMA.[5] It is thought that the serotonergic neurotoxicity of MDMA and related drugs may be dependent on simultaneous induction of serotonin and dopamine release, as combination of a non-neurotoxic serotonin releasing agent like MDAI or MMAI with amphetamine results in serotonergic neurotoxicity similar to that of MDMA.[8][21][22][23] Besides the case of simultaneous induction of serotonin and dopamine release, serotonergic psychedelics (i.e., serotonin 5-HT2 receptor agonists) have been found to augment MDMA-induced striatal dopamine release and serotonergic neurotoxicity in rodents as well.
This supports the hypothesis that the neurotoxicity largely stems from dopamine breakdown in serotonergic nerves following simultaneous serotonin and dopamine release, such that low-dose selegiline would be protective. If this is the main mechanism of 5-MAPB neurotoxicity, as this suggests, MDMA shouldn't be much worse in comparison, and both should be roughly equivalent in toxicity when combined with an MAO-B inhibitor. However, it is unclear what role the neurotoxic MDMA metabolites play - they might contribute to neurotoxicity as well, which would make analogues such as 5-MAPB safer.
In humans, I found one report of fatal intoxication, though it involved several other compounds.[18]
In rodents, 5-MAPB caused serotonin depletion similar to that of MDMA.[19]
However, much of the toxicity of 5-MAPB relative to MDMA in these studies might be an artifact of using the same amount of each compound, even though 5-MAPB is roughly a 3x stronger monoamine releaser than MDMA at the same dose[20][21]. This might simply mean that far too high a dosage was used, leading to more serotonergic hyperthermia plus the SERT-mediated toxic dopamine breakdown by MAO-B. Thus, at a 3 times lower dosage, it might be a safer alternative to MDMA, especially in combination with low-dose selegiline and temperature-lowering compounds such as agmatine, alleviating most of the remaining neurotoxicity potential.
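To spell out the dose-scaling arithmetic above - a minimal sketch, assuming the rough 3x potency ratio holds linearly; the reference amount is an arbitrary placeholder, not a recommendation:

```python
# Illustrative arithmetic only, not dosing advice. If 5-MAPB is roughly
# a 3x stronger monoamine releaser than MDMA per milligram (as cited
# above), a naive "equivalent" amount is about a third of the MDMA amount.

ASSUMED_POTENCY_RATIO = 3.0   # rough figure from [20][21]

def naive_equivalent(mdma_mg: float, ratio: float = ASSUMED_POTENCY_RATIO) -> float:
    """Scale an MDMA amount down by the assumed potency ratio."""
    return mdma_mg / ratio

reference_mg = 100.0          # arbitrary placeholder amount
print(naive_equivalent(reference_mg))  # ~33.3, i.e. roughly a third
```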
6-APB
It is much stronger than MDMA - 12x more potent at the dopamine transporter, 6.5x stronger at the noradrenaline transporter, and 2.4x stronger at the serotonin transporter.[22] With this altered ratio of monoamine release, it can be expected to be more akin to a stimulant like methamphetamine in its effects. In terms of total monoamine increase measured, all tested benzofurans were about 3x more potent than MDA (so about 9x more potent than MDMA[23]), and 6-APB has been shown to be the most potent benzofuran in terms of dopamine release.[24]
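To make the quoted ratios easier to read, here is a small illustrative sketch (my own rearrangement of the numbers, not from the cited paper) that normalises the three multipliers to the serotonin value, showing how the release profile tilts toward dopamine and norepinephrine relative to MDMA:

```python
# The potency multipliers of 6-APB vs. MDMA quoted above (from [22]),
# normalised to the serotonin transporter value to show the relative
# tilt of the profile.

potency_vs_mdma = {"DAT": 12.0, "NET": 6.5, "SERT": 2.4}

sert = potency_vs_mdma["SERT"]
relative_profile = {t: round(x / sert, 1) for t, x in potency_vs_mdma.items()}

print(relative_profile)
# {'DAT': 5.0, 'NET': 2.7, 'SERT': 1.0}
# i.e. relative to its serotonin action, 6-APB hits dopamine ~5x and
# norepinephrine ~2.7x harder than MDMA does, consistent with a more
# stimulant-like character.
```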
In addition, 6-APB was found to bind with high affinity to alpha-adrenergic receptors (similar to MDMA - potentially calming), to 5-HT2A receptors (psychedelic), to 5-HT1A receptors (oxytocin-mediating), and, most strongly, to 5-HT2B receptors, which poses potential cardiotoxic risks.[25] The alpha-adrenergic and 5-HT2A agonism make 6-APB potentially more hyperthermia-inducing (which can lead to significant damage, but can also be avoided relatively simply by staying in a cool environment).
The effects begin within 1-2 hours and last for about 7 hours.[26]
It does not form quinone or thioether metabolites, which contribute to MDMA neurotoxicity (the main 6-APB metabolite was 4-carboxymethyl-3-hydroxy amphetamine).[27]
No cytotoxic effects were found in one cell culture study[28]. There is a report of 6-APB-caused psychosis, though this was in combination with cannabis.[29]
Overall, 6-APB seems like a more stimulant-like and psychedelic analogue, with little data on neurotoxicity; by its similarity to 5-APB it can be assumed to have similar cytotoxic oxidative effects, though to a lesser degree (it has been found to be less toxic to liver cells than 5-APB, the active 5-MAPB metabolite).[30]
4-MMC aka mephedrone
Mephedrone is a triple monoamine reuptake inhibitor and releasing agent (as is MDMA, though mephedrone is more of a reuptake inhibitor).[31] It is also a strong 5-HT2A agonist, suggesting it might have psychedelic properties (though, strangely, it is not a proper hallucinogen). The lack of direct activity at the 5-HT1A receptor might mean lower oxytocin release.
It is a relatively short-acting drug, with effects beginning after 15 minutes and lasting 2-3 hours (when taken orally).[32]
Mephedrone is commonly insufflated, and is reported to have effects similar to MDMA.[31]
Its metabolites are mostly nor-mephedrone, which is itself psychoactive as a stimulant (DAT and NET inhibition with less SERT inhibition), DHMMC (which has a similar but weaker profile), and several mostly inactive metabolites like 5-hydroxy-mephedrone.[33] Again, no quinones or thioethers are produced, and none of the studied metabolites has been shown to have neurotoxic properties.
Interestingly, the article "Clinical Pharmacology of the Synthetic Cathinone Mephedrone"[34] from 2017 reports:
Regarding the possible long-term toxicity of mephedrone, the fact that the drug possesses structural and pharmacological similarities to MDMA, amphetamines, and cathinone suggests the likelihood that repeated and/or prolonged use produces similar consequences on neurochemical and neuropsychological function. From the limited results to date, it should be pointed out that repeated mephedrone administration in experimental animals has not shown evidence of neurotoxicity to monoaminergic systems in the brain [42, 88–91[35]].
One study reports decreased serotonin transporter function in rats administered 4-MMC, but the rats were purposefully kept in a high-temperature environment.[36]
Mephedrone induces hyperthermia[37] and potentiates the neurotoxicity of methamphetamine and MDMA, but does not itself cause dopaminergic neurotoxicity. This has led to the conclusion that mephedrone acts atypically at the dopamine transporter[38] (which might possibly be the reason behind its relative non-neurotoxicity).
One rat study showed oxidative damage to neurons as well as dopamine receptor downregulation.[39]
As opposed to MDMA, mephedrone has not been shown to cause microglial activation; thus the pathway leading to RNS damage is likely nonexistent for mephedrone.[40] Cognitive damage (worsened working memory) has been found in mice after "binge-treatment" with mephedrone.[41] There have been some deaths due to mephedrone overdoses.[42]
Overall, mephedrone seems like a surprisingly safe alternative to MDMA, if hyperthermia is avoided (many studies showing harm in rodents used elevated ambient temperatures). The working memory deficits shown in rodents are concerning, but likely a consequence of high dosages and/or hyperthermia.
3-MMC
3-MMC, aka metaphedrone, is commonly insufflated.
3-MMC inhibits the serotonin transporter much less than mephedrone or MDMA, while significantly inhibiting DAT and NET, suggesting a more stimulant, rather than entactogenic, effect.[43] However, the 5-HT1 agonism may lead to oxytocin release and thus empathogenic effects, as confirmed by users.[44] It binds strongly to alpha-adrenergic receptors, which might pose a vasoconstriction risk, but the lack of 5-HT2A agonism makes the risk of hyperthermia lower.[43] It is capable of producing hyperthermia, lasts for around an hour when insufflated, and is reported to be a weaker version of mephedrone or MDMA in terms of its effects. The main metabolites are 3-methylephedrine and 3-methylnorephedrine, with no known neurotoxic effects.[45] 3-MMC has been shown to create ROS (and RNS) and damage liver cells.[46] Inhibition of the enzyme CYP2D6 has been shown to be protective, suggesting that genetic variation in the expression of this enzyme may affect the toxicity of 3-MMC use, with "extensive and ultrarapid metabolisers" experiencing significantly more toxicity[45] (which is likely true of MDMA and its other analogues too).
Two deaths due to 3-MMC (likely in combination with other drugs) have been reported in Sweden[47], 5 severe poisonings in the Netherlands[48], and a death following pure 3-MMC exposure in France.[49]
Overall, 3-MMC appears to be an inferior alternative to 4-MMC, having a short half-life, demonstrated oxidative-stress toxicity, and potentially neurotoxic metabolites.
Taking MDMA involves a tradeoff: one gains a euphoric and potentially therapeutic experience, and damages some neurons. The following is a review of the mechanisms by which this cost of MDMA use occurs, and of ways to target them.
MDMA induces the release of serotonin, dopamine, and acetylcholine and activates histamine receptors, but the main victims of MDMA neurotoxicity appear to be serotonergic (5-HT) axon terminals.[50] One paper claimed that dopaminergic neurons are damaged too, but it was later found that the researchers had used methamphetamine instead of MDMA.[51] However, some studies do show dopaminergic neurotoxicity of MDMA in rodents as well.[52] Besides damage to axon terminals, damage to the targets of MDMA, the serotonin transporter (SERT) and the dopamine transporter (DAT), has been found, potentially affecting long-term neurotransmitter release and uptake.[53]
This post doesn't cover monoamine depletion, which is a short-term effect following MDMA's massive monoamine release, causing the well-known temporary depression and lethargy after MDMA use.
The main mechanisms by which MDMA can be neurotoxic:
1. Hyperthermia, induced by 5-HT2A and β-adrenergic agonism and by α-adrenergic-receptor-mediated vasoconstriction
2. Creation of ROS from the breakdown of dopamine by MAO-B in serotonergic neurons, which damages the neurons' mitochondria[54] (dopamine is broken down into H2O2, which potentially converts to HO radicals, and into catechol metabolites, which auto-oxidise into quinones, both leading to oxidative stress)[55]
3. Metabolites such as HHMA or α-methyldopamine, which auto-oxidise into quinones and form thioether conjugates with the antioxidants GSH or NAC[56]; these conjugates are neurotoxic and cause microglial activation[57] (inflammation, which causes toxicity itself). Similarly, for MDA (a metabolite of MDMA):
MDA is metabolized to a-MeDA that can react either with glutathione (GSH) to form 5-(GSH)-a-MeDA or with N-acetylcysteine (NAC) to form 5-(NAC)-a-MeDA, and these compounds might be the main metabolites responsible for the neurotoxic effects of MDMA observed in rats.[58]
4. The microglia-caused inflammation upregulating iNOS, producing NO and subsequently reactive nitrogen species (RNS), including peroxynitrite (which reacts to form nitrotyrosine); these cause oxidative damage to mitochondria[59], cell membranes, and proteins (along with the ROS)[60][61]
5. (Rarely) hyponatremia, causing brain swelling; this is more likely with high estrogen exposure[62]
What can be done:
- Hyperthermia:
  - avoiding caffeine[63]
  - physical cooling (staying in a cool environment)
  - 5-HT2A antagonists (ketanserin[64], clozapine[65][66], mirtazapine, ...)
    - however, these diminish/block the psychedelic effects
  - adrenergic blockers - modestly effective[66]
  - THC[67]
  - NMDA antagonists (memantine[68], dextrorphan[69], agmatine, ketamine, ...)
  - NOSi (nitric oxide synthase inhibitors)[70] - e.g. agmatine[71]
  - melatonin[72] (lowers core body temperature + antioxidant)
  - glycine[73] (lowers core body temperature)
- MAO-B-caused ROS damage[74]:
  - inhibiting MAO-B: low-dose selegiline[75]
    - nonselective MAOIs (and selegiline at high doses) would be very dangerous, since they additionally radically increase intersynaptic serotonin - serotonin syndrome
  - analogues that release less dopamine (since the simultaneous release of DA and 5-HT is required for this mechanism of neurotoxicity)
    - MDAI, MDMAI, MMAI, (R)-MDMA, MBDB, MDAT, 5-APDB, and many more
    - these, however, likely lack the full subjective effects of MDMA - mostly the euphoria
  - inhibiting serotonin release (e.g. through SSRIs[76]) or dopamine release (through DRIs[77]) also reduces neurotoxicity, but very likely abolishes or reduces the effects
    - (MDMA reverses the direction of the monoamine transporters, so inhibitors of these transporters reduce MDMA's effects, even though, in the absence of MDMA, they increase the intersynaptic concentration of monoamines)
- Toxic metabolites:
  - reducing metabolism
    - CYP2D6 inhibition (probably dangerous due to higher MDMA levels)
      - e.g. bupropion, which is also a DRI (preventing neurotoxicity via the previous mechanism)
      - quinine, CBD, buprenorphine, berberine[78]
    - (likely) parenteral administration: intravenous, intranasal, etc.
  - analogues with different metabolites but similar pharmacology
- RNS[83]:
  - NOSi, e.g. agmatine[84] (available online) - these also reduce hyperthermia
- Hyponatremia
- Oxidative damage (through RNS or ROS):
  - acetyl-L-carnitine (ALC): "ALC administration was found to reduce MDMA-induced protein carbonyl formation (a marker of oxidative protein damage), decrease the incidence of mitochondrial DNA (mtDNA) deletions, and improve the expression of key mitochondrial respiratory chain components (such as subunits of Complex I and Complex IV)" [93]
- Potentially neuroprotective drugs:
  - modafinil - a DAT inhibitor, shown to be protective in combination with nicotine[94]
    - however, modafinil is a CYP450 inducer and as such would speed up MDMA metabolism, causing faster creation of neurotoxic metabolites
  - bromantane
    - it enhances dopamine synthesis through tyrosine hydroxylase and DOPA decarboxylase upregulation[95], so it might be a useful post-MDMA dopamine replenisher
  - cannabis, strangely enough (due to temperature lowering)[67]
Footnotes
- [2] https://en.wikipedia.org/wiki/Levomethamphetamine#Pharmacology
- [6] https://shaunlacob.com/wp-content/uploads/2020/12/DC-MDMA.pdf
- [9] https://bpspubs.onlinelibrary.wiley.com/doi/full/10.1111/bph.13128
- [10] https://en.wikipedia.org/wiki/5-HT2B_receptor#Clinical_significance
- [11] https://www.sciencedirect.com/science/article/abs/pii/S0196064416300038
- [14] https://openaccess.sgul.ac.uk/id/eprint/108925/1/Combined_in_vitro_and_in_silico_approaches.pdf
- [16] https://pubmed.ncbi.nlm.nih.gov/27291301/ or https://www.wellesu.com/10.1002/jat.3351
- [19] https://www.abstractsonline.com/pp8/#!/10619/presentation/67382
- [23] https://en.wikipedia.org/wiki/3,4-Methylenedioxyamphetamine#Pharmacology
- [25] https://bpspubs.onlinelibrary.wiley.com/doi/full/10.1111/bph.13128
- [28] https://bpspubs.onlinelibrary.wiley.com/doi/full/10.1111/bph.13128
- [33] https://www.mdpi.com/1422-0067/26/15/7656#Phase_I_Metabolites
- [35] Studies referenced:
  - 42. Baumann MH, Ayestas Jr MA, Partilla JS, et al. (2012) The designer methcathinone analogs, mephedrone and methylone, are substrates for monoamine transporters in brain tissue. Neuropsychopharmacology 37:1192–1203
  - 88. Angoa-Pérez M, Kane MJ, Francescutti DM, et al. (2012) Mephedrone, an abused psychoactive component of 'bath salts' and methamphetamine congener, does not cause neurotoxicity to dopamine nerve endings of the striatum. J Neurochem 120:1097–1107
  - 89. Angoa-Pérez M, Kane MJ, Briggs DI, et al. (2013) Mephedrone does not damage dopamine nerve endings of the striatum, but enhances the neurotoxicity of methamphetamine, amphetamine, and MDMA. J Neurochem 125:102–110
  - 90. den Hollander B, Rozov S, Linden AM, et al. (2013) Long-term cognitive and neurochemical effects of "bath salt" designer drugs methylone and mephedrone. Pharmacol Biochem Behav 103:501–509
  - 91. Shortall SE, Green AR, Fone KC, et al. (2016) Caffeine alters the behavioural and body temperature responses to mephedrone without causing long-term neurotoxicity in rats. J Psychopharmacol 30:698–706
- [42] https://www.erowid.org/chemicals/4_methylmethcathinone/4_methylmethcathinone_health1.shtml
- [44] https://erowid.org/experiences/subs/exp_3Methylmethcathinone.shtml
- [45] https://pmc.ncbi.nlm.nih.gov/articles/PMC10972361/#sec3-medicina-60-00466
- [49] https://www.sciencedirect.com/science/article/abs/pii/S2352007816302359
- [51] https://en.wikipedia.org/wiki/Retracted_article_on_dopaminergic_neurotoxicity_of_MDMA
- [67] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0009143
- [68] https://www.researchgate.net/publication/5867052_Memantine_prevents_MDMA-induced_neurotoxicity
- [69] https://www.sciencedirect.com/science/article/abs/pii/030439408990637X
- [74] The simultaneous large release of dopamine and serotonin causes the serotonin transporter (SERT) to take up some of the dopamine and transport it into the serotonergic nerve, where it is broken down by MAO-B (usually MAO-A breaks down dopamine and serotonin, but MAO-B is found in the serotonergic neurons and is responsible for the breakdown of the residual monoamines). This creates free radicals that can damage the membranes and mitochondria of neurons. That's why the inventor of selegiline, an MAO-B inhibitor, promoted its preventative use as a longevity drug (he did live to the age of 92).
- [80] https://en.wikipedia.org/wiki/4-Fluoroamphetamine#Pharmacology
- [87] https://www.sciencedirect.com/science/article/abs/pii/S0006899302023132
- [92] https://fse.studenttheses.ub.rug.nl/12564/1/LST_BC_2015_BMNijhoff.pdf
- [94] https://www.sciencedirect.com/science/article/pii/S0891061821000697