Me consuming five different forms of media at once to minimize the chance of a thought occurring
There's a book open on my desk. Not a real one, just a Kindle, but close enough. I'm once again rereading some comfort food-for-thought, possibly HPMoR. Just behind me, on the bed, there's a laptop blasting jcore. I've been listening to the same song on a loop for eight hours in a row, hallucination by sanaas. It was removed from Spotify, so Youtube it is, naturally with multiple ad blockers to make the site usable.
The desk is where my actual computer lives. Desktop, you could call it, but the actual case is under the table. It usually has three screens, but the one on the right is empty at the moment. The one on the left is showing some slow-paced gameplay Youtube, just so that someone is talking. I'm rarely looking at it, but the presence of a voice means the one inside my head doesn't talk. I mean that in the sane way.
And on the middle screen, I'm playing online chess. The time control is 3+2, which I'm quite proud of. I used to play only 1+0, to make sure I don't have time to think. I'm playing chess because the feeling in my stomach says I shouldn't continue reading at the moment. I think it's telling me to go for a walk. Actually going outdoors feels a bit too hard this decade.
The problem with 3+2 time is that my opponent sometimes thinks. If the position is uninteresting, I switch tabs to some other entertainment until the chess tab pings me about my turn again. Or fish my phone out of my pocket to check my messages. But that's scary, because if I open a message I might forget to reply when the chess pulls me back in.
Something like the scenario above happens quite often. Most of the time there's slightly less media, but five is definitely reached weekly. I work remotely, so once in a while I replace the book with a code editor, the youtube with a work meeting, or the chess with more chess. Sometimes I need to leave the house, so I put on headphones and maybe take a book with me. Falling asleep after 20 hours of screen time is not easy, and most of the time thinking keeps me awake anyway. I think of this as boredom, and watch more Youtube instead, but on the bed this time. Sometimes when I wake up it's still playing. My sleep schedule drifts randomly.
A couple of times each year I get the feeling that this is addict behavior. The usual response is to limit consumption. I'm not sure why that would help; cutting off everything else just means I'm still reading the book but enjoying it a bit less. I still do it, because at least the music is going to feel a bit better when you've lived a week without it. Doing anything productive, including nothing, still remains absolutely unthinkable.
I've heard that to get rid of an addiction, you need to replace the problematic behavior instead of removing it. To replace an addiction of consuming media, then I think the appropriate response would be to produce media, or improve oneself somehow. Those require quite a bit of energy, and never feel as satisfying as just pouring a screenful of dopamine into your eyes. And self-improvement is fake anyway, you're just looking to impress someone else. Why bother?
Why bother? Because just consuming stuff starts feeling meaningless after a long while. Not quickly enough though. And it's all meaningless anyway.
But that's enough for today. My dopamine receptors are aching for something more than just the same drumstep mix for hours while dumping unedited depression. The fridge is empty, and it's time to fill it with the same crap I've been eating for the past decade. Although I probably should prepare more texts for the days when I'm not feeling like writing. Removing should from one's vocabulary would be quite an improvement.
A review of MSUM's AI Innovation Summit: Day One
This past week, I attended the AI Innovation Summit in Moorhead, Minnesota, an event put on by MSUM's newly founded Institute of Applied AI. The summit served to introduce and integrate MSUM's new institute with the surrounding community, to provide training for people interested in using AI (for basic use cases), and to gather perspectives from the community on what's desired from both industry and educators. I hadn't attended anything similar before, and I'm glad I went, even just to get a better sense of the general vibe towards AI among different people in my community. I attended the first two of three days: the first focused on AI and Business, the second on AI and Education, and the third day was for high schoolers (I'm not one of those!).
The present is a particularly interesting time for colleges, as what skills and knowledge are economically valuable is changing faster than ever before. To some extent this is true of education generally, but I think college is a particularly meaningful arena for these sorts of questions as 1) it's the first sort of schooling that most people have any sort of choice about whether and where they attend, and 2) it's the first sort of schooling where it is expected that the vast majority of attendees will be joining the workforce after graduation, yet 3) there's still the idea that college is supposed to be a place of "higher education," of instilling intellectual values, critical thinking, etc into its students rather than just a job training site. I'll discuss this tension a bit more in tomorrow's post about the second day of the summit intended for educators, "Learning in the Age of AI."
I think in most scenarios where colleges stay relevant for the near-future, it's important that they do their best to adapt to increasing AI use/value, so I feel somewhat positive about the event as a whole. At some point the question becomes "How do we prepare for unpredictable ever-increasing weirdness" and a certainly reasonable answer is to continue monitoring sources of weirdness while taking the actions you know how to. MSUM's institute is still early in its existence, with plans to be fully functional/impactful by 2027-2028, so I'll be interested to see how things will have changed by that point.
I arrived in the conference room a little before 9:00, and didn't have too much time to chat with others before the speaking began. The 40-45 attendees in the room skewed male, with ages looking roughly normally distributed around a mean of about 40. Attendees were unsurprisingly businesspeople from around the area, with a couple of students and a couple of others. I sat at a table that included a couple of other young 20-somethings like me, as well as a couple of more typical attendees. While I didn't personally know the other attendees, it was fun to find out that one of my tablemates and I had done some connected work on a large city project, which was something of a reminder to be more social and put some effort into finding the connections you have with others.
The conference didn't begin right at 9, which felt like it was by design. We had a bit of time to mill about, chat, and check out the booths set up by the summit's corporate sponsors until we returned to our seats and were greeted with some opening statements from the Institute of Applied AI's Executive Director. These remarks didn't extend much beyond welcoming us and introducing the day's keynote speaker, Shawn Riley from Bisblocks, a venture studio firm.
Shawn's presentation hit just about all of my pre-imagined stereotypes of AI-loving corporate-growth-type preaching. His main refrain, which he returned to multiple times, was "If you are not an AI company, you will be outcompeted." The structure moved from general discussion of AI and how it has improved over time yet "is still a 1st grader" in terms of its prospects of continuing to grow and improve, to how his company has used AI to save money and be more efficient, with a couple of asides about general societal effects, finishing with a short Q&A session.
The specific examples of how his company had used AI were fairly mundane, mostly focusing on using AI-generated music and video to produce a marketing campaign for much cheaper than if they had needed to hire people. It looked clearly AI-generated to me, but also looked pretty acceptable, so I'll give them a win on that. I did wonder how valuable that sort of marketing material will remain when it's essentially on tap, but it's certainly a notable cost savings right now for those using it effectively. What interested me most about his speech was when he began to talk about AI automating jobs, but he followed that up with just a comparison to the steam shovel and how previous concerns about massive job loss didn't pan out because the lost jobs were replaced with more jobs. No discussion of comparative advantage, no worry about the topic at all, which seemed curious from someone who also stated a belief that progress is continuing very fast and that everything could be automated eventually.
After the keynote were three shorter presentations before lunch. The first was an underwhelming talk about "How AI can help you move forward." It was definitely targeted at people who aren't me: a brief explainer on what an LLM is, AI progress on benchmarks, the differences between the unhelpfully named ChatGPT models (culminating with "just click the 'use GPT-5' button in Copilot"), and some basic tips for prompting. It was also bogged down by some bad AI-generated visual aids and a general feeling that I could make and give a better presentation pretty quickly. Next!
Moving on, we had "Strategic Insights on AI's Role in Enterprise or SMB Transformation." A good chunk of this was again "AI is valuable and progressing very fast everyone, watch out!", but it was followed up with some good advice on how to start implementing AI processes in one's business, mainly along the lines of "rather than buying licenses and saying 'use AI,' actually plan ways to use AI to automate specific repeatable processes in clear programs with good data governance" (people loved talking about data governance, probably because it is important). There was a little discussion at the end about the value of building 'AI-use muscles': it would pay off down the line to have employees who are more comfortable with AI, even if it isn't immediately helpful.
Finally, there was a pretty funny talk from a person who built a unicorn chatbot to talk to his autistic daughter. I ended up a little disappointed because the title said it was going to be about Agentic AI, and I was interested in things like AI tool use and more autonomous capabilities, but it was basically just a chatbot with some curated data sources about unicorns. The end of the presentation was more interesting: he described making a second chatbot to help her with math, downloading the school's curriculum as a data source and connecting it to his daughter's grades so it could give responses tailored to how she was doing in school. Unfortunately, he had only done this a day or two before the conference, so there wasn't information about how well it was working. From his description it was all really easy to do within Copilot Studio, so I'd be really interested in learning more about how and in what ways chatbot interaction is good or bad for users, and about tailoring those experiences. Cool presentation.
During lunch I had a chance to talk with people a bit more in-depth. Some people started expressing their opinions a bit more after I mentioned that I'm a Poli Sci grad and interested in getting into AI Policy. I'll discuss general audience perspectives/opinions/reactions either in tomorrow's post or a post after that. I need to drag it out a little since I'm getting back into the Halfhaven swing of things.
The panel immediately following lunch was the most valuable to me: an AI ethics discussion with a five-person panel featuring three of the earlier presenters and the AI institute's Executive Director. The ball got rolling early with the initial question: "Is Skynet coming?"
What surprised me about that question was that nobody just said "No." Of course, nobody said "yes" either, but answers ranged from "possibilities are too far out to predict, many unknowns" to "Skynet is already here, have you seen the AI-powered cyberattacks already happening? Also the CCP has crazy surveillance AI" to "prepare to coexist with AI" and finally a classic "superintelligence is too far off, worry about human-directed AI attacks, AI is a tool." I was a little impressed with the nuance with which some of them replied to a not-very-seriously presented question. It's clear that none of them really have x-risk on their minds, but they gave mostly reasonable answers.
The next question was about the role of government regulation when it comes to AI. Shawn took the lead on this one, taking the maximal view against regulation: it stops growth, raises the barrier to entry, and the government will act too slowly and poorly to implement valuable and timely regulations, so it just shouldn't; maybe legislators should ask ChatGPT to generate legislation instead, that might be better. Nobody else on the panel took a direct stance against that, instead turning to issues around possible age-gating of synthetic relationships and the use of bots on social media, noting that we already have some regulations regarding those. After that they moved away from specifics, noting that humans already don't agree on ethics, that it'll be difficult to get machines to agree, and asking whether machines can really see in shades of gray and how they can decide. All questions that would not be answered during this summit.
The final question they got to before closing thoughts was how to ensure fairness in AI. As stated, the question seems to me to allow some ambiguity about whether it refers to bias in AI systems or to fairness in society as a result of AI, but all the panelists targeted just the bias issue. The most common answer was to treat it mainly as an issue of poor (or just rushed) data collection, and that better data would be less biased, but also that users should be careful not to immediately interpret an AI giving unwanted answers as bias (Grok was not mentioned here). One panelist put forward the idea of using synthetic data to correct bias, although that runs the risk of just kind of putting bad data into your system until it gives you the result you want. There were some worries about AI being used to generate hyper-specific polarizing information, which reminded me of the scissor statement illustration. There was also the concern that "the winners write history" and that in some cases the unbiased data corresponding to one side of an issue does not exist, in which case an AI can't be trained on it. Consensus was not formed, but everyone got to say something, which brings us to their final thoughts.
To close out, most said something about keeping AI progress human-centric, that it's a tool, a reflection of us, and/or a force multiplier rather than a replacement. One said specifically that it would be bad to have AI develop itself. The data management guy once again stressed that it's important to have good data management, and one said not to be surprised if AI "goes way past us."
Unfortunately, those were all the questions we got through in the hour. I think it was bad practice to have every person on the panel answer each question individually while sometimes adding on to other people's answers. If I ever run a panel discussion it'll be really good and more questions will get answered.
Finally, we move on to the other presentations of the afternoon, and I'll move through these quickly because they leaned much more towards the "corporate training" side than the "thoughts/opinions/ideas about AI" side of the summit.
The first one was by the data security guy telling us how to keep our data secure (thank you!). It laid out some pretty clear guidelines that seemed helpful to people who would want to set up AI pipelines within their company, and the presenter largely helps companies do this as his job, so it went smoothly. A key thing that he emphasized was that adhering to clear data security standards and transparency about how those standards are formed/met leads to increased trust which leads to better and smoother adoption.
The second was a demo of using Copilot within the Microsoft Office suite of apps. Parts of this presentation were awkwardly funny when Copilot made errors a couple of times after being prompted to "give me X information while removing any customer information," which felt a little on-the-nose right after the data security presentation. Otherwise it was a fine demo: some of the spreadsheet manipulation seemed neat, none of the capabilities were very surprising, but some were definitely convenient and could be an upgrade, if implemented well, at the law firm I used to work for.
The third was a presentation on how to accurately measure the ROI of a given AI investment/program, and strategies to make that program better. The main advice on measurement was basically "try to use the scientific method": clearly define a task, measure baseline performance beforehand, and then measure the AI-assisted improvement (if any) while tracking adoption and other outcomes. One tip I thought was interesting was having "AI champions," certain employees who know the technology very well and can both demonstrate the various use cases and help coworkers with technical difficulties. Having someone like that seems like it could be very valuable for overcoming the resistance to adoption that is common in many workplaces.
Our final presentation was an ad-lib: the person who had a presentation on "Ethical AI with Microsoft Copilot" dropped the whole thing because the earlier panel was in-depth enough that he didn't feel he could add much to it, so it ended up being a somewhat strange crowd-involved continuation of the prompting strategies. The approach was to give the LLM a persona of an expert in some field, or of an entire marketing team, and work from there; he demonstrated prompting GPT-5 to create a prompt for further work, and copy-pasting previous information into more prompts, but it didn't really seem to work that well. It's possible that the crowd-provided scenario (advising the city on preventing a massive gnat infestation and creating a related marketing campaign) wasn't great to work with, and I think the presenter at some points forgot exactly what he wanted to put in which prompt, but the results we got were basically "ChatGPT can brainstorm a bunch of options really fast" and not much beyond that.
The closing remarks were brief, and the day was done. It would have been interesting to hear from a wider variety of companies, as the presenters were mainly from companies that, to some extent, deal in teaching other companies how to use AI tools. However, I enjoyed hearing what people had to say both in and out of the presentations, and was glad I went. The lunch and corporate-provided goodies probably got me close to breaking even on my registration fee, so all in all I'd consider it a successful day. I'll discuss the second day, on education, tomorrow.
FTL travel and scientific realism
It's November! I'm not doing Inkhaven, or NaNoWriMo (RIP), or writing a short story every day, or quitting shaving or anything else. But I (along with some housemates) am going to try to write a blog post of at least 500 words every day of the month. (Inkhaven is just down the street a bit and I'm hoping to benefit from some kind of proximity effect.)
Today: Llamamoe on Discord complains about
people who respect science but say "and in the past we thought the earth was flat and everything we currently think is impossible might end up possible" and refuse to acknowledge that some things are in fact fundamentally impossible, like true FTL travel
And elaborates:
Like in principle FTL could be possible. But it would require everything we think we know about physics to turn out to have been wrong, and not slightly but completely, with no real exceptions.
I'm actually gonna side with the FTL believers on this one, with some caveats.
(Content warning: physicist discussing philosophy)
Map and territory
The case for FTL being "fundamentally impossible" is pretty straightforward: relativity is generally accepted as a "correct" physical theory; relativity says FTL is fundamentally impossible; therefore FTL is fundamentally impossible.
For the purposes of this post, I think it's basically true that FTL is fundamentally impossible according to relativity. (Tachyons, in theories which contain them, are probably better-modeled as something similar to the classic "shadows can move faster than light" brainteaser.)
The flaw in this argument is that "correctness" of theories in physics doesn't go as far as we might like it to. Sure, relativity has passed many difficult experimental tests with flying colors. This is enough for us to accept it as a highly accurate model of reality. When it makes quantitative predictions, we will happily adopt those predictions with pretty high confidence. But I want to distinguish the following claims:
- FTL travel is probably impossible
- FTL travel is impossible
- FTL travel is "fundamentally" impossible
The first claim requires relativity to be right about most things (and for FTL travel to not be a likely exception); the second requires relativity to be right about FTL travel in particular; and the third claim requires relativity to be right about everything, such that we can adopt not just its predictions but its internal ontology. As I'll explain below, I think this last claim is a lot stronger than the other two, and requires some nonobvious philosophical commitments.
Predictions and ontologies
There's a long track record of physical theories being extremely good models, but ultimately wrong in a way that is fatal for their basic ontology of the world. Newtonian physics (and Galilean relativity in particular) is a good example. At this point, some skepticism towards the ontology is warranted.
I think working physicists vary a lot in how strongly they believe in scientific realism. The actual work of physics only requires (some degree of) consensus on the trustworthiness of theories in terms of their predictions; individual physicists are free to treat the internal language of the theories (in terms of electrons, fiber bundles, wavefunctions, etc) as a literal description of reality, as a formal symbol-game with no truth value, or anything in between.
If you take an anti-realist stance, then physical theories are really just tools for making predictions about the world, with some colorful mnemonics attached to the prediction-making machinery. If you take a realist stance, then physical theories are not just making predictions, but also telling us all kinds of wonderful things about an unseen world of electrons and fiber bundles and so on. On the other hand, according to the realist stance, most physical theories to date have been wrong about the interesting part, and successful at predictions kind of by accident.
I'm going to try to take an awkward middle-of-the-road position here: one that's realist enough to let us ascribe some truth to the colorful stories our theories tell, but anti-realist enough to survive the ontological apocalypses that happen whenever a theory is superseded by a more correct one.
Kuhn
I'm going to throw in a shout-out to a blog post on Kuhn, "Science Cannot Count to Red. That's Probably Fine.", by Lou Keep. In particular:
Newtonian physics makes several ontological claims (the universe is corporeal particles), Ptolemaic astronomy the same (circles are fitting for the heavens due to their divinity), etc. Both of these are wrong. Newtonian physics, however, can solve many more puzzles. "Amount of puzzles solved" is commensurable - it carries from one scientific set to another, there's a quantifiable, comparable idea of progress. The ontologies of the paradigms display no such progress.
I think this is a bit too strong. I'd say we make some ontological progress too: just as the quantitative predictions of wrong-but-useful models are approximately correct, I think the ontological claims are often approximately correct in an appropriate sense.
Approximate ontological correctness
As an example, Newtonian physics claims that spacetime is invariant under the Galilean group Gal(3). Relativity claims it's invariant under the group SO(3, 1). The former is a group contraction of the latter, so we can view Newtonian physics as making a kind of "qualitatively approximately correct" claim: spacetime is invariant under something that is approximately Gal(3), in the appropriate limit.
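To make the contraction concrete, here is the standard textbook limit (my illustration, not part of the original post) in which a Lorentz boost degenerates into a Galilean one as the speed of light is taken to infinity:

$$x' = \gamma\,(x - vt), \qquad t' = \gamma\!\left(t - \frac{vx}{c^2}\right), \qquad \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}$$

$$c \to \infty: \quad \gamma \to 1, \quad \frac{vx}{c^2} \to 0 \quad \Longrightarrow \quad x' = x - vt, \qquad t' = t.$$

Nothing about the Galilean formulas is exactly right, but they are the controlled limit of the relativistic ones, which is the sense in which the old ontology is "approximately" correct.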
Similarly, atomic nuclei are not indivisible point particles, but they are approximately so, on the scale at which chemistry happens.
There's something kind of absurd about this, to be sure. The ontologies of physical theories have a reassuringly crisp, absolute flavor to them; trying to believe them only in an "approximate" sense means throwing out the crispness while trying to keep everything else. But I think it's in line with how we use informal ontologies in everyday life. When we claim something is rectangular, we're not insisting on geometrical perfection; we're saying that it is "approximately" a four-sided shape with four right angles. (Note that we're not even claiming it has "approximately four" sides; Colorado's border is officially defined by 697 straight boundary lines.)
All models are wrong?
Some say that "all models are wrong, but some models are useful." This is a fairly anti-realist stance; I would probably modify that to "most models are wrong", and add that we don't know which of our models, if any, are right.
Personally I'd bet on the perfect correctness of quantum mechanics, against that of quantum field theory, and very tentatively in favor of some version of relativity, but probably not in 3+1 dimensions.
What I mean by this is that I think quantum field theory is merely approximately ontologically correct, but that QM is exactly ontologically correct -- the true substance of reality is something "ontologically approximately like" a bunch of quantum fields, but it's precisely a wavefunction in an appropriate Hilbert space. And likewise, there is probably something worth calling spacetime that is in some sense a Lorentzian manifold, but probably not a 3+1-dimensional one. For example, it might be a 10+1-dimensional manifold compactified onto a 3+1-dimensional base.
Back to FTL travel
It's actually pretty unclear where that leaves FTL travel (through the ordinary 3+1-dimensional spacetime of general relativity).
My best guess is actually that it's "approximately fundamentally impossible": FTL travel is arguably possible, but only in situations where (3+1-dimensional) spacetime itself is close to breaking down.
As an example, the "ER=EPR correspondence" speculates that strong enough quantum entanglement between distant objects can be usefully modeled as a wormhole physically connecting the objects. (One thought experiment involves Alice and Bob creating a pair of black holes far apart from each other, entangling them by throwing in a bunch of Bell pairs, and then diving through the event horizons to meet each other in the wormhole's interior.)
To the extent that spacetime is a real thing, you can't move faster than light through it, just as relativity says. But the claim that "spacetime exists and has 3+1 dimensions" is itself only approximately true.
Reflections on 4 years of meta-honesty
Honesty is quite powerful in many cases: if you have a reputation for being honest, people will trust you more and your words will have more weight (or so the argument goes).
Unfortunately, being extremely honest all the time is also pretty difficult. What happens when the Nazis come knocking and ask if you have Jews in the basement? Or when your girlfriend asks you if this dress makes her look fat? (Or so the argument goes.)
Meta-honesty is a proposed solution to these problems. The gist is you act very honestly, but can lie when it’s very important to do so. The catch is you have to always be completely honest about what kinds of situations you’d lie in. In theory this lets you have all the benefits of being very honest without the worst of the drawbacks (some of the “drawbacks” are irreducible errors of course--you can’t betray or trick people as easily when you’re honest and that’s the point). But the arguments for meta-honesty are largely theoretical.
I started trying to rigorously abide by meta-honesty a little over four years ago. Here are some musings on the benefits and drawbacks I’ve observed, and the practical tips I’d have for anyone else who wanted to try it.
The good
1: I've become a more honest person.
This is probably the biggest actual benefit of meta-honesty to me. I used to lie habitually in a bunch of silly situations: I'd make up excuses when I ran late, I'd pretend to have heard about things I hadn't heard of, I'd tell white lies as compliments. I think those kinds of lies are usually bad. Being intentional and mindful about honesty made me realize I was doing all of that way too much and helped me break the habit. (Though I still slip up and indulge in those things on occasion.)
2: I've become much more honest and clear about things to myself.
It's forced me to carefully think through which situations I would or wouldn't want to lie in, it's helped me notice places I feel compelled to lie in everyday speech, and it's caused me to be mindful about lying and my reputation in ways I've benefited from.
For example, ever since I was a kid I've felt self-conscious about the media I consume (I guess because my mother thought my taste in media was cringe?), and got in the habit of pretending not to have seen various media I actually had seen. I lied to my partner about whether I'd seen a TV show he'd recommended, which is a big deal because I was trying particularly hard to be honest with him. It was interesting to notice this quirk of mine and to reflect on why I felt self-conscious. (I did eventually come clean.)
3: It's a (credible?) signal about the kind of person I am.
One might hope it's at least a credible signal that I'm careful and thoughtful about honesty and integrity, and therefore that I've made the correct/wise decision to be pretty honest and high-integrity.
In practice it’s more like it mostly makes people go “oh this person has a weird obsession with honesty I guess???”, but that has the same effect (though it can also come off as weird and calculating in a way that backfires).
4: On rare occasions, the system straightforwardly works.
There have been ~4 very crisp occasions where it was quite clear someone took what I said much more seriously and was much less skeptical than they otherwise would have been because I have such a strict honesty policy. There are a lot of other situations where people have generally trusted me to be honest or allowed me into high-trust environments (e.g. telling me sensitive information or things that make them look bad, entrusting me with large amounts of money without many mechanisms to prevent me from abusing it for personal gain, etc), but only because I'd earned their trust in more conventional ways.
I haven’t found any of this that useful and I think without meta-honesty it would have taken just a smidge of elbow-grease and creativity to get whatever benefits meta-honesty got me in these scenarios (as long as I was still generally pretty honest). It’s hard to pinpoint exactly which benefits come counterfactually from meta-honesty. I’d pay less than a thousand dollars for the actual benefits that have come from it that I’m aware of (in practice I might pay more on the off-chance my estimation of the benefits was wrong).
I think there are some rare (but not impossible) situations I could find myself in where meta-honesty was incredibly useful, it's just I live a rather humdrum life so they’ve never come up. (Though, per meta honesty, I’ll note there are some cases where I’d lie about all this!)
The bad
Mental overhead
There's some mental overhead to tracking my statements carefully. Mostly this is a feature not a bug, but it still has costs. I find the costs pretty manageable, especially with a little practice, but YMMV.
Sounding weird
My personal flavor of following meta-honesty includes being anal about clarifying that various statements are lies even when that's kinda obvious. This has some social costs (and I try to be less anal when I'm with people who don't know me well/would be weirded out). You could totally follow meta-honesty without being absurdly scrupulous of course.
It's easy to mess up
Meta-honesty is quite tricky to actually do in practice! To follow meta-honesty you need to accurately predict your own mental state in all kinds of weird hypotheticals (and in the far future!). It's easy to accidentally have too rosy a picture of yourself (it's easy to say "I'd never lie about XYZ" until you've actually done XYZ). When I first started meta-honesty, I realized a lot of my lies are habitual/almost involuntary, and I was surprised by some of the situations I realized I lied about.
Because nobody's ever really pressed me on anything other than absurd situations, I haven't had too much trouble answering questions in practice. But I think it could be easy to slip up and lose one's credibility. It feels weird to have my credibility hinge on something so easy to mess up. And it's so easy to misremember, and hard to litigate whether I did mess up! I think I've never flubbed my meta-honesty, but I'm not, like, 100% sure. I could imagine having said (in the early days when I was less careful, or casually without thinking) that I wouldn't lie in some situation or another, and then years later that situation happening and the circumstances being different than I pictured and me lying.
On a few occasions, being so scrupulous has caused me notable distress where I second-guess myself and try to desperately ask friends to recall if I ever said various things. I notice that often my brain goes “I’m sure I’ve never lied about XYZ” and then I go “hmm am I really sure though?” and then I second-guess myself, and I think this often creates a false memory type of thing/the memory gets fuzzier the more I look at it.
I err conservative in ways that can make it harder to say anything about when I'd lie
If I'm asked whether I'd lie in a specific situation, I default to a noncommittal answer, at which point the meta-honesty isn't as useful (though if it really mattered I'd think it through and say something, but it almost never matters). If I didn't follow meta-honesty, maybe I'd feel free to say something more informative (but I can't recall this ever being very important).
Observations
I have almost never been asked questions about which situations I'd lie in.
Almost every time it's come up, it's because I mention I follow meta-honesty, and people curiously try to poke at it by asking random questions about where I would or wouldn't lie. They're not usually very invested in these questions and these interactions have a similar vibe to playing hot seat or truth-or-dare. They'll usually ask about whacky hypotheticals like "would you tell me if you murdered someone", but occasionally they'll think of things that might put me in a bind like "would you tell me if you secretly hated me". My answer to both is usually something like "well, it would depend on the details of the hypothetical. Can you tell me about X, Y, and Z?" I either end up answering something like "if I try to imagine the distribution of scenarios you're trying to ask about here, I'd confess in most of those scenarios" or "in most worlds where I hated you and it seemed genuinely important to you to know, I'd answer you honestly, but I can think of plenty of edge-cases", or I ask enough questions about the specifics of the situation that my inquisitor ends up bored and changes the topic.
I can think of one somewhat important situation where someone proactively and quite seriously inquired about under what circumstances I’d lie (though perhaps I’m forgetting some). I had just told them I felt confident some people I knew weren’t involved in/didn’t know about some bad behavior some people socially adjacent to them had engaged in, and they asked if I’d lie about that.
(It does come up slightly more often that I say “I wouldn’t lie about XYZ” and people put some weight in that.)
I have never really felt bottlenecked on people trusting my honesty/integrity
I am often bottlenecked on people being unsure if I have sufficient competence or knowledge or calibration: people know I won't lie to them lightly but if they're worried I might be gullible or mistaken or confused, that's not much better.
Meta-honesty likewise makes it hard to commit to something I plan to break, but it doesn't help me get around the fact that I sometimes fail to do things I earnestly intended to do. (These days I don't word commitments in ways where my incompetence makes me very liable to break them, but that just means the only things I can commit to are fairly weak.)
I've never found meta-honesty very useful for interacting with people outside my weird bay-area bubble
I can't tell if this is because I don't have enough interactions to build up a credible reputation of actually adhering to this policy in other communities or simply because the idea is rigid and first-principles-y in a way that doesn't mesh well with the norms of other communities.
This is somewhat unfortunate, because one of the main advantages I hoped to get from meta-honesty was being able to coordinate and cooperate with people who might naturally be very wary of me, and most of those people are not in my immediate bubble.
Most ways of creating trust in the real world seem based around incentives and coalition-building. Meta honesty hasn’t let me escape this and engage in any new interesting conversations.
Conclusions
Overall I wouldn't go out of my way to recommend meta-honesty to most people. I would probably recommend being mindful about when you would or wouldn't lie, and meta-honesty is a good forcing function to do that, but lots of things are probably better (for example some kind of daily journaling exercise where you write down things you lied about each day).
I would also recommend being very honest overall.
Better ways to do grayscale screens
Many people recommend setting devices to grayscale to reduce screen time. The idea is that bright, vivid colors make apps like YouTube and Reddit more addictive. I tried it and found it helpful, but I kept having to turn colors back on to make sense of some websites. Eventually, I forgot to re-enable grayscale and relapsed into my old habits.
I came up with 2 solutions:
1. Set my screen to 5-10% saturation. This gives me nearly all the distraction-reducing benefits of grayscale while keeping just enough color to navigate UIs and interpret charts. I never have to toggle it off. This works great for PCs, Macs, and iPhones, but is impossible on Android. On PCs and Macs you may have productivity tools that rely on color, so you can try desaturating your web browser only (see the sketch after this list).
2. Have grayscale mode automatically re-enable after a fixed amount of time (say, 5 minutes). On Android, this can be done very easily with an app called Tasker, which costs $3.50.
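For the browser-only option, here's a minimal sketch (my addition; the function and constant names are illustrative). It assumes you're comfortable pasting a script into the devtools console or wrapping it in a userscript manager such as Tampermonkey, and it works by applying a CSS saturate filter to the whole page:

```typescript
// Sketch: desaturate only the browser, not the whole OS.
// Run it in the devtools console, or wrap it in a userscript so it
// applies to every page you visit.
const SATURATION = "10%"; // somewhere in the 5-10% range keeps UI colors legible

function desaturatePage(): void {
  // A CSS filter on the root element affects everything rendered in the page.
  document.documentElement.style.filter = `saturate(${SATURATION})`;
}

desaturatePage();
```

An equivalent one-line CSS rule in a styling extension like Stylus works too; the point is just that the filter lives in the browser, so OS-level color stays untouched for your other tools.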
2025 Unofficial LW Community Census, Request for Comments
Overview
The LessWrong Community Census is a venerable and entertaining[1] site tradition that doubles as a useful way to answer various questions about what the userbase looks like. This is a request for comments, constructive criticism, careful consideration, and silly questions on the census.
I'm posting this request for comments on November 1st. I'm planning to incorporate feedback throughout November, then on December 1st I'll update the census to remove the 'DO NOT TAKE THE CENSUS YET' warning at the top, and make a new post asking people to take the census. I plan to let it run throughout all December, close it in the first few days of January, and then get the public data and analysis out sometime in late January.
I have a little more ambition this year to do some rationality evaluations; read on to hear more.
How Was The Draft Composed?
The sections have evolved over the years.
| Number | Section | Question Budget 2025 |
| --- | --- | --- |
| 0 | Population | 3 |
| 1 | Demographics | 5 |
| 2 | Sex and Gender | 10 |
| 3 | Work and Education | 3 |
| 4 | Politics | 7 |
| 5 | Intellect | 5 |
| 6 | LessWrong Basics | 7 |
| 7 | LessWrong Community | 7 |
| 8 | Probability | 15 |
| 9 | Traditional | 5 |
| 10 | LW Team | 5 |
| 11 | Adjacent Communities | 5 |
| 12 | Indulging My Curiosity | 5 |
| 13 | Detailed Past Questions | 5 |
| 14 | Bonus Politics | 5 |
| 15 | Wrapup | 3 |

I copied the question set from 2024 and made some changes. The main changes are:

- I removed last year's adjacent community questions, then changed my Indulging My Curiosity question.
- I changed the questions that swap every year, mainly the calibration question and a couple of the probability questions.
- I removed the Christian sub-denomination options, leaving just Christian.
- I removed the LLM questions from last year.
Changes I'm Interested In
Diaspora Questions
I think the Unofficial LessWrong Community Census should, at present, try to cover much of the rationalist diaspora. I'm interested in more than just users of the LessWrong site itself; I'm interested in many of the other places they congregate.
Towards that end I want to solicit a question from various branches and descendants of LessWrong. I'm particularly interested in the parts that try to teach some kind of rationalist skill, for reasons I'll talk about later, but the upside here is that my threshold is pretty low for including one question if it's relevant to your offshoot. I want this to be truly a census of the community. Here are the ones I've got on my radar at the moment:
- Center for Applied Rationality
- Wandering Applied Rationality Program
- Bayesian Conspiracy
- Conspiracy of Bayes
- Vibecamp/Post-rationality
- Glowfic
- Forecasting/Prediction Markets
- AI Alignment
- Effective Altruism
Anyone I'm obviously missing from that list?
Remove Deadweight Sections
I'm inclined to axe most of the politics and IQ sections. I like keeping continuity with past questions, but I think IQ in particular was expanded to answer something Scott was curious about many years ago and then it just kind of stuck around.
Politics seems like something someone would be curious about, but that person isn't me, so the extra politics section is all at risk of removal.
Skill Evaluations
Most of all I want the census to double as an annual evaluation for the rationalist project. I don't know about all of you but I'm here to raise the sanity waterline and chew bubblegum, and thanks to the advanced sanity techniques this community taught me I've entirely saturated my bubblegum benchmarks.
I'm aware of what Goodhart's Law is and I'm sympathetic to the idea that there's more to rationality than can be easily measured on a census question. And also, there is no separate magisterium for rationality skill — it does not make any sense to me to say that someone is an excellent rationalist but they just happen to do badly on every possible attempt to measure rationalist skill. It makes a lot of sense to me that individual skill tests would come apart in the tails; if Eliezer Yudkowsky loses at prediction and calibration tests to Philip Tetlock, that's not surprising to me. I still think if there is a real art of rationality out there, it will involve some components of the art being testable and it will involve "hot dang look at that chart" level outcomes between newcomers and skilled rationalists.
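As one illustration of what a testable component could look like (my sketch, not something the census currently commits to; the type and function names are made up for the example), calibration answers can be scored mechanically, for instance with a Brier score over the probability questions:

```typescript
// Sketch: each calibration item is a yes/no question the respondent assigns a
// probability to; `outcome` is 1 if the claim turned out true, 0 otherwise.
// Lower Brier scores are better (always answering 50% scores 0.25).
interface CalibrationAnswer {
  probability: number; // respondent's stated probability, in [0, 1]
  outcome: 0 | 1;      // what actually happened
}

function brierScore(answers: CalibrationAnswer[]): number {
  const total = answers.reduce(
    (sum, a) => sum + (a.probability - a.outcome) ** 2,
    0,
  );
  return total / answers.length;
}

// Example: a respondent who is overconfident on two of three items.
console.log(brierScore([
  { probability: 0.9, outcome: 1 },
  { probability: 0.9, outcome: 0 },
  { probability: 0.6, outcome: 0 },
])); // ~0.39
```

A "hot dang look at that chart" result would then be something like a visibly lower Brier-score distribution for skilled rationalists than for newcomers.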
Does that mean that if you aren't interested in the art of rationality, the census isn't relevant to you? Not at all! Remember, it asks a bunch of general demographics questions that lots of people in the ecosystem can look at, and it (hopefully) will soon have questions from adjacent groups.
Does that mean that there isn't a point for people who aren't interested in that to take the census? Maybe people who are more Community than Craft minded should skip that section? I think they should; this makes for an excellent if odd control group :)
Want to help?
You can open the current draft here. My best compilation of previous versions is in this Google sheet.
Useful things, by my frame:
- Skim through and make sure I don't have any half-written sentences or really unclear questions, especially on anything new.
- If you're a member of an adjacent community or part of the diaspora, especially if you're in some way 'in charge' even a bit, pitch what question would be meaningful for your branch. For instance, an ESPR instructor suggested "How long have you had the current biggest issue in your life?"
- If you want to take up the torch of the politics section, argue for what's interesting to investigate and what questions we'd need to ask. You're going to have a hard time pitching me on more than ten questions here.
- Any question you're curious about that you think a nice big census is the right way to answer.
- Most usefully, I want evaluation questions. Things I'm likely to try include:
- Conjunction fallacy
- Dutch booking loops
- More calibration and forecasting questions
- Asch conformity
- Brainstorming
- Or if you want, you can argue that this entire endeavor is doomed. Important if true!
[1] Some people find it entertaining because they like arguing about statistics for approximately two thousand and seventeen comments. Other people find it entertaining because I lace it with dumb jokes.
Metformin 1000mg/day upon symptom onset may reduce your risk of long covid by 10-30%
Thanks to Elizabeth van Nostrand and Robert Mushkatblat for comments on a draft of this post.
[Epistemic status: I like investigating things but I have no particular medical expertise. My subjective confidence in this post being correct is around 50%. I started writing this post when I thought the risk reduction was higher; I no longer think it's a slam dunk that you should try this. Insert magic legal incantation about not medical advice blah blah blah.]
TLDR: see title. 4 studies all look roughly consistent with something like 15% risk reduction for recently-vaccinated populations and 40% risk reduction for unvaccinated populations. Metformin is a medication very widely prescribed for diabetes with minimal side effects, so the costs are low. There are some reasons to think results could be better than this, and some reasons to think they could be worse.
[Figure 2 from Bramante et al (2023), one of the more optimistic RCTs. Effects are likely less impressive than this in vaccinated subpopulations.]

Lit review
Here are all the studies I can find that investigate the effects of metformin on rates of long covid, in descending order of quality:
- The ACTIV-6 RCT randomized 2983 people with active COVID symptoms (who'd had them for less than a week) and a positive test result from September 2023 to May 2024: 1544 on placebo and 1439 on metformin (500mg day 1, 1000mg days 2-5, 1500mg days 6-14).
- At 90, 120, and 180 days after, they measure (1) if patients self-report nonzero COVID symptoms (and didn't reacquire covid within the past month) and (2) if a physician has diagnosed them with long covid.
- Self-reported long COVID symptoms 180 days out went down by 21% (0.037 to 0.029)[1], and clinician diagnosis of long COVID went down by 51% (0.014 to 0.007). They report this as a negative result because it didn't meet their preregistered threshold for efficacy; CIs slightly overlap zero.
- Note that while the 50% number is their preregistered secondary outcome, it's the best-looking number out of everything they measure and probably got a bit lucky; see here for a plot of a few different confidence intervals. Everything looks good for metformin (and generally looks better the further out you look) but 50% relative risk is an outlier. And again, the confidence intervals are wide here.
- Frustratingly they don't do any subgroup analyses and haven't shared raw data. I've emailed the primary contact for the study asking how effects vary with vaccination status but haven't heard back yet.
- IMO this study is pretty good. They preregistered everything, they have decent sample sizes, they aren't doing the publication bias thing because they report this as a negative result, etc. I have no complaints except that I want subgroup analyses.
- The same people running this thing tried some other substances in RCTs too: they found no effect for ivermectin, fluvoxamine, fluticasone, or montelukast. (I think actual no effect rather than "fairly good effects with wide CIs" in all of these.) So they're not generally predisposed to find positive results.
- COVID-OUT was an RCT from Dec 2020 to Jan 2022 on 1323 people, of whom 1126 answered a survey 6 months out. They filtered for adults age 30-85 who were overweight (median 45yo, 29.8 BMI), so a higher-than-baseline risk population, and required symptom onset within the past 7 days. Split 50/50 into placebo vs metformin; exact same dosing schedule as the previous trial (in fact this one came first and I think ACTIV-6 is copying it). Average first dose was 5 days after symptom onset.
- They find a 41% reduction in the risk of long COVID (as measured by patient self-report of a diagnosis from a medical provider), from 10.4% in the placebo group to 6.3% in the metformin group.[2] In the subset who started metformin within 3 days of symptom onset, they saw a 63% reduction.
- In the subset of vaccinated patients (about half the study population), they find a 15% decrease from 7.2% to 6.1%, but with fairly wide confidence intervals overlapping zero (they caution that it wasn't powered to detect subgroup effects though). Among the 57 participants with a booster vaccine, only one reported long COVID at all.
- Note that this means the unvaccinated effects are even stronger: something like 14.3% to 6.5% risk from placebo to metformin, for a 54% reduction. (And it suggests that vaccination is itself a 2x risk reduction.)
- A second paper about the same trial looks at short-term effects of metformin on viral load (I think with only n=775 of the 1323 participating through day 10) and hospitalization.
- They find a 3.6-fold viral load reduction (only p=0.027 which is kinda surprising to me with effect sizes that big?) and 28% less likely to have a detectable viral load at day 5 or day 10.
- This is a kinda crazy large effect, and seems sort of implausible given fairly mild effects elsewhere? Not sure how to reconcile this well; maybe the highish p-value here is somehow related, and this measurement was kinda noisy?
- Hospitalization or death at 28 days is 58% lower in the metformin group.
- They see like 2x larger effects on the unvaccinated subpopulation, and say "Effective primed memory B- and T-cell anamnestic immunity prompting effective response by day 5 in vaccinated persons may account for this trend in both trials".
- Reassuringly for the quality of the study, they also look at ivermectin and fluvoxamine and don't find positive effects.
- Johnson et al (2024) looks at some correlational data in the National COVID Cohort Collaborative (N3C) and finds that among patients with Type 2 diabetes, the ones who were taking metformin had 21% lower rates of long covid diagnosis or death (2.0% to 1.6%) than patients taking other diabetes medications, and 15% lower rates of post-covid symptoms as measured by "computable phenotype" which I think means they look at your logged symptom data and guess if you have long covid. p<0.001 for both but note this is correlational so add salt to taste.
- They also look at data from PCORnet, which has wider confidence intervals and worse results (13% decrease by diagnosis, 4% increase by computable phenotype). They think that fully a quarter of the PCORnet population has long covid as measured by computable phenotype though, so I'm more skeptical of this patient population and whatever they're doing to infer things.
- Here's a random slide that shows the confidence intervals in a little plot.
- If you download the supplemental figure PDF from the first link in the parent bullet, Figure 5 breaks things down by time after covid infection, which is nice, but it looks like fully 5% of their study populations in both cohorts died within 6 months?? Type 2 diabetes is not a very fatal condition and neither is covid! I have no idea what's going on there and assume I'm misreading the plot or something. But this makes me trust the study less.
- The TOGETHER trial in Brazil in 2021 randomized patients to 10 days of 750mg extended release metformin twice daily, starting within 7 days of symptom onset, in populations with at least one risk factor or over 50 years of age. Sample size n=418.
- They look at effects on hospitalization for >6h and find basically no effect (sign of the effect varies depending on exactly how you filter), though the credible intervals on relative risk are super wide, like 0.6 to 1.7. They do not investigate long COVID.
- This is in pretty direct conflict with the results of the COVID-OUT trial which does quite similar stuff on hospitalization; the COVID-OUT authors note this discrepancy and dedicate a paragraph in their discussion section to it, which I've attached in a footnote.[3] I find their response kinda weak.
- This study has since been retracted, I believe for some errors in their early stopping criterion analysis. From a cursory investigation I haven't been able to figure out exactly what the deal is or what a corrected estimate of the final endpoint would be; it sounds like the data wasn't fraudulent or anything, they just had some analysis mistakes. But given this I'm not putting much of any weight on their conclusions.
In my opinion the above results present a pretty compelling case for use of metformin in unvaccinated subpopulations, but I expect that very few readers of this post in 2025 fall into that category.[4]
We have evidence from COVID-OUT that the effects are lessened in vaccinated subpopulations, which makes some intuitive sense. The other studies don't weigh in on this question.
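As a rough back-of-envelope (my reconstruction, assuming the vaccinated subgroup was almost exactly half the sample, which the trial only states approximately; the variable names are just for the example), you can back out the implied unvaccinated rates from COVID-OUT's overall and vaccinated numbers:

```typescript
// Long-covid rates reported by COVID-OUT (patient-reported diagnosis).
const overall = { placebo: 0.104, metformin: 0.063 };    // full trial population
const vaccinated = { placebo: 0.072, metformin: 0.061 }; // vaccinated subgroup

// If vaccinated participants were ~50% of the sample, then
// overall ~= (vaccinated + unvaccinated) / 2, so unvaccinated ~= 2*overall - vaccinated.
const unvaccinated = {
  placebo: 2 * overall.placebo - vaccinated.placebo,       // ~0.136
  metformin: 2 * overall.metformin - vaccinated.metformin, // ~0.065
};

// Relative risk reduction from metformin in each group.
const rrr = (g: { placebo: number; metformin: number }) => 1 - g.metformin / g.placebo;

console.log(rrr(overall).toFixed(2));      // ~0.39 (the paper's exact counts give 41%)
console.log(rrr(vaccinated).toFixed(2));   // ~0.15
console.log(rrr(unvaccinated).toFixed(2)); // ~0.52 (vs. the ~54% quoted above)
```

The reconstructed placebo-arm rate comes out a little lower than the 14.3% quoted above, presumably because the vaccinated fraction wasn't exactly half, but the qualitative picture of a much larger relative benefit in the unvaccinated group is robust to that assumption.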
Some unfounded speculation on this topic:
- Conditional on real effects for unvaccinated populations (which look fairly robust), it would be kind of surprising if vaccination brought the effects to zero, so probably it does something.
- If we take for granted the numbers in COVID-OUT for lack of anything better to go on, we probably expect a 2-3x reduction in efficacy for vaccinated subgroups.
- COVID-OUT was Dec 2020 to Jan 2022, so probably most vaccinated people in that trial were pretty recently vaccinated. It seems reasonable to guess that recent vaccinations do more than old vaccinations, so if you have not been vaccinated in over a year, my wild guess would be that you're at more like a 1.5x reduction in efficacy relative to unvaccinated numbers.
- On this model, one would expect that the later-running ACTIV-6 trial would see smaller differences between vaccinated and unvaccinated populations, because their vaccinated group would have gotten their shots longer ago on average. If I hear back from the authors of that study we can award or subtract Bayes points accordingly.
Personally, as a person who hasn't gotten a vaccine in over a year, I plan to take metformin if I get covid, but if I didn't already have it on hand, I would probably prioritize "get a Novavax shot with few side effects" over "acquire metformin" as a low-cost intervention. (And I should really get around to getting a booster shot sometime.) But you could do all of these things!
Mechanism of action
The COVID-OUT trial found acutely lower viral load, which they say "may explain the clinical benefits in this trial", but they also say "Metformin is pleiotropic with other actions that are relevant to COVID-19 pathophysiology". They also say "An improved effect size for clinical outcomes when therapies are started earlier in the course of infection is consistent with an antiviral action", and talk a bit about why they thought to try it in the first place:
The selection of metformin was motivated by in silico modeling, in vitro data, and human lung tissue data that showed that metformin decreased SARS-CoV-2 viral growth and improved cell viability [2–4]. The in silico modeling identified protein translation as a key process in SARS-CoV-2 replication, similar to protein mapping of SARS-CoV-2 [3]. Metformin inhibits the mechanistic target of rapamycin (mTOR) [5], which controls protein translation [6, 7]. Metformin has shown in vitro antiviral actions against the Zika virus and against hepatitis C via mTOR inhibition [8–11].
Wikipedia says that metformin has anti-inflammatory effects, which the COVID-OUT paper also notes briefly as a possible mechanism. Overall it seems not very settled.
How do you get metformin? How risky is it? What dosing schedule should you use?
It's a prescription drug in the US, but an extremely common one. You can try your luck talking to a physician upon developing symptoms, but since it seems like starting early is particularly valuable I would prefer to have some ability to start ASAP and acquire it ahead of time.
Metformin seems extremely low-risk; the RCTs linked above actually say things like "metformin is safe" without caveats. They even let pregnant women into the COVID-OUT study! Severe kidney disease is the only major contraindication I see from some searching. Some websites warn about lactic acidosis but it seems like this is a non-issue.
The two big RCTs that find good effects use immediate-release metformin with 500mg on day 1, 500mg twice a day on days 2-5, and 1500mg on days 6-15, but it's not like people tested a lot of options as far as I can tell. The authors of COVID-OUT have some stuff to say on dosing in the Discussion section:
The magnitude of metformin's antiviral effect was larger at day 10 than at day 5 overall and across subgroups, which correlates with the dose titration from 1000 mg on days 2–5 to 1500 mg on days 6–14. The dose titration to 1500 mg over 6 days used in the COVID-OUT trial was faster than typical use. When used chronically, that is, for diabetes, prediabetes, or weight loss, metformin is slowly titrated to 2000 mg daily over 4–8 weeks. While metformin's effect on diabetes control is not consistently dose-dependent, metformin's gastrointestinal side effects are known to be dose-dependent [25]. Thus, despite what appears to be dose-dependent antiviral effects, a faster dose titration should likely only be considered in individuals with no gastrointestinal side effects from metformin.
So it kinda sounds like you could yolo a faster titration if you're up for some nausea/diarrhea risk? I am not a doctor!
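For reference, here's that schedule written out as a tiny script; it just restates the regimen described above (no claim about what you should personally do):

```python
# The immediate-release schedule described above, restated for clarity
# (day 1: 500 mg once; days 2-5: 500 mg twice daily; days 6-15: 1500 mg/day).
# This is a restatement of the trial regimen as given in this post, not dosing advice.

def daily_dose_mg(day: int) -> int:
    if day == 1:
        return 500
    if 2 <= day <= 5:
        return 500 * 2
    return 1500  # days 6 onward, through the end of the 15-day course

for day in range(1, 16):
    print(f"day {day}: {daily_dose_mg(day)} mg total")
```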
COVID-OUT found substantially stronger effects by starting metformin within 3 days of symptom onset instead of 5. If you're quick with a covid test or proactively take metformin when you start having covid-like symptoms, you might be able to get large wins here compared to the studies above.
Reasons to be skeptical
- On priors most weird medical interventions that haven't become standard practice are bunk, especially if they have wide confidence intervals. I think this should be around a 2:1 update against this working; that still leaves me feeling like it's probably worth it, but I think it's reasonable to decide it isn't worth it by your lights if you have slightly different epistemics or values.
- The RCT with the strongest results was run on an overweight and older population; maybe all the effect here is from mitigating really serious infections for high-risk patients (which in turn reduces long covid) but it does ~nothing for young healthy populations.
- Carolyn T. Bramante is the lead author on both the ACTIV-6 and COVID-OUT trials, so the author populations aren't independent.
- She's a professor of medicine at the University of Minnesota though, it's not like these studies are run by fringe cranks.
- Taking drugs is generically bad for Chesterton's fence reasons.
- I agree; the badness of long covid outweighs this in my utility function but ymmv.
- ^
These aren't the rates reported in the main body of the study, but the rates they report use a denominator of all study participants instead of the subset that responded to surveys 180 days out, and the latter seems like a better metric especially for getting accurate raw rates. It doesn't affect the relative numbers much though. Concretely 180-day self reports went from 46/1251 to 33/1139 and 180-day clinician diagnosis went from 18/1251 to 8/1138 (idk what happened to the one metformin participant who only answered the self-report question).
- ^
I'm a bit confused about these numbers actually, because they say the sample sizes are 562 placebo and 564 metformin, but you can't get those percentages by rounding any possible patient numbers here; 58/562=10.320% but 59/562=10.498%, and 35/564=6.205% but 36/564=6.383%. The only way I can get the ratio of these numbers to round to their stated hazard ratio of 0.59 is by taking 59 placebo and 35 metformin. Perhaps they're silently doing something like what I did in the previous footnote? Paper authors should show lots of significant digits!
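Here is the arithmetic from this footnote spelled out as a short script, checking which combination of candidate event counts reproduces the stated 0.59 ratio (note it compares simple risk ratios, which is what the footnote is doing; a true hazard ratio would need time-to-event data):

```python
# Reproduce the footnote's arithmetic: which candidate event counts are
# consistent with the stated percentages and the stated hazard ratio of 0.59?

n_placebo, n_metformin = 562, 564

for placebo_events in (58, 59):
    for metformin_events in (35, 36):
        placebo_rate = placebo_events / n_placebo
        metformin_rate = metformin_events / n_metformin
        ratio = metformin_rate / placebo_rate
        print(f"{placebo_events}/{n_placebo}={placebo_rate:.3%}, "
              f"{metformin_events}/{n_metformin}={metformin_rate:.3%}, "
              f"ratio={ratio:.2f}")
# Only 59 placebo events and 35 metformin events give a ratio that rounds to 0.59.
```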
- ^
"Conversely, an abandoned randomized trial testing extended-release metformin 1500 mg/d without a dose titration did not report improved SARS-CoV-2 viral clearance at day 7 [20]. Several differences between the Together Trial and the COVID-OUT trial are important for understanding the data. First, the Together Trial allowed individuals already taking metformin to enroll and be randomized to placebo or more metformin [20, 21]. To compare starting metformin versus placebo, the authors excluded those already taking metformin at baseline and reported that emergency department visit or hospitalization occurred in 9.2% (17 of 185) randomized to metformin compared with 14.8% (27 of 183) randomized to placebo (relative risk, 0.63; 95% confidence interval, .35 to 1.10, Probability of superiority = 0.949) [22]. Thus, the Together Trial results for starting metformin versus placebo are similar. Second, 1500 mg/day without escalating the dose over 6 days would cause side effects, especially if the study participant was already taking metformin [23]. Third, extended-release and immediate-release metformin have different pharmacokinetic properties. Immediate-release metformin has higher systemic exposure than extended-release metformin, which may improve antiviral actions, but this is not known [24, 25]. Given the similar clinical outcomes between immediate and extended-release, a direct comparison of the 2 may be important for understanding pharmacokinetics against SARS-CoV-2."
I don't know where they're getting some of these numbers like the 9.2% vs 14.8%, I don't see them in the text of the TOGETHER study? Or the stuff about their policies for patients already taking metformin.
- ^
Might still be useful to keep in mind for unvaccinated relatives though!
Discuss
Weak-To-Strong Generalization
I will be discussing weak-to-strong generalization with Sahil on Monday, November 3rd, 2025, 11am Pacific Daylight Time. You can join the discussion with this link.
Weak-to-strong generalization is an approach to alignment (and capabilities) which seeks to address the scarcity of human feedback by using a weak model to teach a strong model. This is similar to Paul Christiano's iterated distillation and amplification (IDA), but without the "amplification" step: the strong model is trained directly on labels generated by the weak model, not some "amplified" version of the weak model. I think of this as "reverse distillation".[1]
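To make the procedure concrete, here is a minimal sketch of the setup using small scikit-learn models. This is an illustration of the training recipe described above, not a reproduction of the original paper's experiments; the dataset, model choices, and names are all mine.

```python
# Minimal sketch of the weak-to-strong setup: a small "weak" model is trained
# on limited ground truth, and a larger "strong" model is trained only on the
# weak model's labels over a big unlabeled pool.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=40, n_informative=10,
                           random_state=0)
X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=500,
                                                  random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X_rest, y_rest, test_size=0.25,
                                                  random_state=0)

# Weak teacher: a deliberately under-powered model trained on real labels.
weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)

# Strong student: a bigger model trained only on the weak teacher's labels.
weak_labels = weak.predict(X_pool)
strong = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=100,
                       random_state=0).fit(X_pool, weak_labels)

print("weak teacher accuracy:  ", weak.score(X_test, y_test))
print("strong student accuracy:", strong.score(X_test, y_test))
# Whether the student actually beats the teacher here depends on the data;
# the point is the training procedure, not the particular numbers.
```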
Why would this work at all? From a naive Bayesian perspective, it is tempting to imagine the "strong model" containing the "weak model" within its larger hypothesis space. Given enough data, the strong model should simply learn to imitate the weak model. This is not what's desired -- the strong model is supposed to improve upon the performance of the weak model.
Theoretical analysis shows that weak-to-strong generalization works "when the strong model is unable to fit the mistakes of the weak teacher without incurring additional error". This is surprising from a naive Bayesian perspective: usually, Bayesian methods are at their strongest when there is a hypothesis which models the data well, and degrade when this assumption is violated.
Still, this mechanism should fail in the limit of a very strong student and a very weak teacher: at some point, the strong model will learn the errors of the weak model.
My aim here is to provide a Bayesian analysis that does not fall apart in the limit, and hence, a variant of weak-to-strong generalization that can serve as a mathematically robust target of training rather than only being a convenient empirical phenomenon. (This is not to be confused with "solution to alignment" or "safe" -- I'm only aiming for a clear mathematical picture of what's being optimized.)
Why does it work?
The phenomenon of weak-to-strong generalization is similar to a student learning correctly from a textbook filled with typos. We can imagine that the student only considers hypotheses which are grammatically correct, while the typos are usually ungrammatical. The student has no choice but to accept the "error" inherent in being unable to predict the typos, learning as if they'd read a version of the textbook with most of the typos corrected.
Why don't strong learners imitate weak teachers?
To elaborate on the "naive Bayesian perspective" mentioned earlier: I'll formalize the weak model as a probability distribution $P_w(\cdot)$, and the strong pre-trained model as another probability distribution $P_s(\cdot)$.
The event algebras (σ-algebras) of these two probability distributions share a sub-algebra over tokens (observations/data). I'll write token-events $T_i$ with $i \in I$ to distinguish them from events in general. For events in general, I'll write $E^w_j$ with $j \in J$ for events in the weak model, and $E^s_k$ with $k \in K$ for events in the strong model.
A naive way to formalize the idea that the weak model is weaker than the strong model is to assume that the strong model has strictly more events. That is: for every event $E^w_j$ in the weak model, there exists a corresponding event $E^s_k$ in the strong model such that the conditional probabilities over tokens match:
$$\forall j \in J \;\; \exists k \in K \;\; \forall i \in I : \quad P_w(T_i \mid E^w_j) = P_s(T_i \mid E^s_k)$$
For a given weak-model event $E^w_j$, I'll use the function $\mathrm{corr}$ to get the corresponding strong-model event: $P_w(T_i \mid E^w_j) = P_s(T_i \mid \mathrm{corr}(E^w_j))$.
This isn't enough to prove that the strong model will learn to exactly imitate the weak model, however. The weak pre-trained model will have learned some mixture over its hypotheses. There isn't necessarily a single event $E^s_k$ such that $P_s(T_i \mid E^s_k) = P_w(T_i)$ for all $i \in I$. The larger model cannot necessarily learn to imitate the smaller model exactly.
To give a simple example, let $J = \{0, 1\}$, with token-events "heads" $T^h_n$ and "tails" $T^t_n$ (the data being an infinite sequence of coin flips, with $T^h_n$ saying that the $n$th token is heads and $T^t_n$ saying that it is tails), and $\forall n : P_w(T^h_n \mid E^w_0) = 0$ (hypothesis 0 is all-tails) whereas $\forall n : P_w(T^h_n \mid E^w_1) = 1$ (hypothesis 1 is all-heads). Let $P_w(E^w_0) = P_w(E^w_1) = \frac{1}{2}$ (the weak model is 50-50 between the two hypotheses).[2] Now if we generate labels by sampling coin-flips individually[3] from the weak model, we'll have a sequence that looks something like HTTHT..., approximating a 50-50 mixture of heads and tails. Although we assume that $P_s$ has more hypotheses than $P_w$, such an assumption is not strong enough to guarantee that $P_s$ has a 50-50 coinflip hypothesis.
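Here is the coin-flip example as a small simulation. The strong model's hypothesis class below is an illustrative stand-in (deterministic sequences plus a few biased coins, deliberately excluding the fair coin) chosen just to make the point visible; it is not meant to model any particular architecture.

```python
# Simulate the coin-flip example: labels are sampled flip-by-flip from the
# weak model's marginal (a 50-50 mixture of "all heads" and "all tails"),
# and we ask how well various strong-model hypotheses can fit them.
import numpy as np

rng = np.random.default_rng(0)
n_flips = 10_000

# Weak model marginal: P(heads) = 0.5 on every flip, sampled independently.
labels = rng.random(n_flips) < 0.5

def avg_log_loss(p_heads: float) -> float:
    """Average negative log-likelihood of the labels under an i.i.d. coin."""
    p = np.clip(p_heads, 1e-12, 1 - 1e-12)
    return -np.mean(np.where(labels, np.log(p), np.log(1 - p)))

strong_hypotheses = [0.0, 0.1, 0.3, 0.7, 0.9, 1.0]   # no fair coin available
best = min(strong_hypotheses, key=avg_log_loss)
print("best strong hypothesis:", best, "loss:", round(avg_log_loss(best), 3))
print("fair coin (unavailable) loss:", round(avg_log_loss(0.5), 3))
# The strong model is stuck with avoidable loss because its hypothesis class
# cannot represent the weak model's 50-50 mixture as an i.i.d. coin.
```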
Intuitively, the lesson here is that $P_s$ needs to be much stronger than $P_w$ in order to guarantee that it'll learn to imitate labels generated from $P_w$.
This is similar to the story of modern game emulators. One might naively anticipate that old video games do not take very much processing power to emulate faithfully, because those games ran on consoles with very little processing power compared to modern standards. However, emulators actually require significantly more powerful hardware to faithfully emulate older systems.
Why do strong students surpass weak teachers?
Having concluded that strong students will not simply imitate weak teachers, we might still expect their performance to be similar. Even if I am unable to model the process which introduces typos into a textbook, still, I might model typos as random noise, reproducing a similar error rate as a result.[4]
The original paper investigated this as an empirical phenomenon, not a theoretical one. However, the authors did suggest an informal explanation: the "strong" models they use were pre-trained on large amounts of data, so, are thought to already contain the desired capabilities.
Why should weak-to-strong learning be possible? On the one hand, the strong model could simply learn to imitate the weak supervisor, including its errors, since that is what we would naively train it to do. On the other hand, strong pretrained models should already have good representations of the alignment-relevant tasks we care about. For example, if a model can generate complicated code, then it should intuitively also know whether that code faithfully adheres to the user’s instructions. As a result, for the purposes of alignment we do not need the weak supervisor to teach the strong model new capabilities; instead, we simply need the weak supervisor to elicit what the strong model already knows. This gives us hope that the strong model can generalize beyond the weak supervision, solving even hard problems for which the weak supervisor can only give incomplete or flawed training labels. We call this phenomenon weak-to-strong generalization.
(Weak-to-Strong Generalization, page 2)
In other words: a strong pre-trained 'student' model already has better inductive biases for predicting human-level performance than the weak 'teacher' model. It could demonstrate those capabilities with the right prompting. The weak model has already been fine-tuned, so it doesn't need careful prompting to elicit aligned behavior; however, its model of aligned behavior is worse. Thus, when fine-tuned with data generated by the weak model, the strong model's inductive biases point towards strong aligned behavior.
The theory paper I cited earlier supports this general idea, with refined technical detail. I have not absorbed the technical arguments, but I understand it as relying on an idea that similar situations must be dealt with similarly. The student cannot absorb the mistakes of the teacher without making more mistakes in similar cases where the teacher was not mistaken. This formalizes the idea that the inductive bias of the student doesn't allow the mistakes of the teacher to be copied.
Can this scale?
The original paper demonstrated that weak-to-strong generalization works (in some cases) across "seven orders of magnitude" -- but can it go further?
This clearly falls apart at some point.
As we continue growing the gap between the weak and strong model, at some point the strong model will simply learn to mimic the weak one (at least given enough training data).
Making inferences about machine learning based on human examples is of course risky, but: it seems clear that human students learn to "guess the teacher's password" in many cases (deliberately or otherwise). It does not seem like this always requires a huge capability gap between teacher and student in practice.
I don't know what the empirical results say, but the informal explanation of why weak-to-strong generalization works also seems to rely on the assumption that there's not too much AI-generated data in the pretraining for the strong model: if weak-to-strong generalization works by tuning the strong model to act like the closest thing it has seen in its training data, then the closest thing needs to be helpful+harmless+honest[5] humans. If you're fine-tuning the base model for GPT4 on data generated by a fine-tuned GPT2, but significant amounts of data from this fine-tuned GPT2 were present in the pre-training for GPT4, then (by the informal argument for why weak-to-strong works at all) it seems easy to learn mimicry instead of seeing weak-to-strong generalization.
How quickly things fall apart is, of course, an empirical question. The original paper notes that it works much better in some cases than others. In particular, it doesn't work very well for reward modeling, which limits its usefulness as part of any more complicated alignment framework that involves reward modeling. I reached out to one of the authors of the paper, who indicated that they haven't been working on it because of the way it sometimes doesn't work.
The story has to change for superhuman performance.
The informal explanation of weak-to-strong generalization depends crucially on the presence of (something close to) the desired behavior in the pretraining data used to create the strong student. This story has two crucial implications:
- The ceiling for intelligence/capability/alignment that you can get out of weak-to-strong generalization should be roughly the best of what is present in the pretraining data. In many domains, this means top human performance.
- To the extent that weak-to-strong generalization involves generalization (IE correctly extrapolating to new cases not covered by the weak teacher, rather than only correction of teacher error), this only works on cases present in the pretraining data.
If weak-to-strong generalization can obtain superhuman performance, or even human-level generalization ability for cases unlike anything in the pretraining data, it would have to be working for a different reason.
The informal story quoted earlier relies on inductive biases of the strong student instilled by pretraining data. A model will also have some other inductive biases, inherent to the machine learning technique used (EG, artificial neural networks with a Transformer architecture). We empirically think transformers have some useful inductive biases (as demonstrated by the fact that ChatGPT can produce useful answers for questions no human has ever asked before). Therefore, we could see some weak-to-strong generalization to superhuman performance without changing the informal story too much. However, it is notable that this would rely on the "alien" inductive biases of transformers, rather than humanlike patterns.
Reformulation
Weak-to-strong generalization could be reasonably accused of being a hack: we expect it to vanish in the limit of a growing gap between weak and strong, and the phenomenon seems unreliable, working better in some domains than others. Although there is some chance it can work for out-of-distribution cases and for superhuman levels of performance, the story for why it should work gets weaker, and it seems plausible that human-level performance is a ceiling for the method (and even then, can only be obtained on-distribution).
Why am I interested in the idea at all?
I am interested in the problem of how to learn from toy models.[6]
- In high school physics or chemistry, I might learn that electrons orbit the atom in circles, similar to how planets orbit the sun. Similarly, I might learn Newtonian physics before learning quantum mechanics. Why is it useful to learn wrong models first? What should we actually learn from a simplified model? Can we make a useful abstract model of "updating on a toy model"?
- I might not be able to write down a "human utility function", but I can write down some decent toy models, such as QALYs. How could/should such toy models be used for AI alignment? Notice that this is very different from providing a labeled dataset.
A solution to this problem would clearly have implications for AI alignment and ontological shifts.
Weak-to-strong generalization in its present form "solves" this problem by simply generating data from the toy model, and training on the resulting data. This solution feels confused because it will result in mimicry in the limit of a strong student and generating unlimited data. Can we do any better?
Virtual Evidence
My modest proposal is to use virtual evidence to "soften" the update. Bayes-updating on some proposition X will force the probability of X to 1, throwing out anything inconsistent with X. Virtual evidence allows us to do things like double the odds of X instead.[7]
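A minimal sketch of such a soft update, using the odds convention spelled out in footnote [7] (the function name is mine):

```python
# A soft (virtual-evidence) update: instead of conditioning on X (probability -> 1),
# multiply the odds of X by a likelihood ratio. A ratio of 2 "doubles the odds".

def virtual_evidence_update(p: float, likelihood_ratio: float) -> float:
    """Return the probability of X after multiplying its odds by likelihood_ratio."""
    odds = p / (1 - p)
    new_odds = odds * likelihood_ratio
    return new_odds / (1 + new_odds)

p = 2 / 3                                   # odds 2:1, as in the footnote's example
print(virtual_evidence_update(p, 2))        # 0.8 (odds 4:1), matching the footnote
print(virtual_evidence_update(p, 1e6))      # approaches, but never reaches, 1.0
```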
The "data" generated by toy models isn't a good candidate for Bayesian updates, since we expect some of it to be wrong. Soft updates avoid this problem. This approach allows us to learn statistical patterns from the toy model,[8] without necessarily becoming confident in any one claim derived from the toy model.
This also fits with some common practices used in weak-to-strong learning. According to the theoretical analysis of weak-to-strong generalization I cited earlier, the artificially generated data should focus on cases where we are most confident that the toy model is correct. Use of virtual evidence allows us to instead quantify this, strongly updating on the cases we are most confident about, but still weakly updating on other cases where we think the toy model is a statistically useful heuristic.
Another way this idea fits with existing practice: weak-to-strong generalization is commonly applied in cases where we're not Bayes-updating, such as neural networks. The strength[9] of the update is somewhat similar to the learning rate. However, if my theory is adopted as a formal target, virtual-evidence updates can be approximated more deliberately.
Even if we use soft updates, however, we may still face the main problem I've been complaining about: learning to mimic the weak model. For example, if we double the odds of data-points generated by the weak model, then the strong model will still learn to mimic the weak model;[10] the process is merely slowed down.
It therefore seems prudent to additionally stipulate that the total influence[9] of the weak model is bounded; that is, as we continue to generate data from the toy model, the amount[9] of evidence provided by each data-point should sharply decline.
Intuitively, this bound on the total quantity of virtual evidence[9] represents how much evidence we think the toy model provides.
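To give footnote [9] a little more shape, here is one illustrative way to bound total influence, purely as an assumption of mine rather than anything proposed in the literature: cap the n-th artificial data-point's log-odds contribution at log(2)/n², so the cumulative influence converges no matter how much data the toy model generates.

```python
# Illustrative sketch of "bounded total influence": the n-th artificial data-point
# is allowed a log-odds nudge of at most log(2)/n**2, so the cumulative influence
# of the weak model converges (to (pi^2/6)*log(2)) regardless of how much data
# we generate. This is one possible schedule, not a canonical definition.
import math

def max_total_log_odds_shift(n_points: int) -> float:
    return sum(math.log(2) / n**2 for n in range(1, n_points + 1))

for n in (10, 1_000, 100_000):
    print(n, round(max_total_log_odds_shift(n), 4))
print("limit:", round(math.pi**2 / 6 * math.log(2), 4))
```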
Conclusion
Weak-to-strong generalization might look like a hack, but I do think it gets at a theoretical question worth modeling: what kind of "evidence" is provided by a toy model which captures something, but which can't be trusted in its detailed predictions? How can we "update" on such a toy model? Progress on this problem sounds like progress on the problem of ontological crisis.
Formulations of weak-to-strong generalization in the literature don't provide a good answer to this question, because they treat artificial data generated by such toy models in exactly the same way as real data generated by the world. This leads to mimicry of the weak model, in the limit of arbitrarily strong models & arbitrarily much artificial data.
I employed the concept of virtual evidence to get around this problem: I suggest a "soft" update on the artificial data, with the total influence[9] of the soft updates being bounded.
I don't think this totally solves all the problems I've mentioned. However, it does provide a formal target one can aim to approximate. This feels like an improvement to me: previously, weak-to-strong generalization felt like a happy byproduct of (failed) optimization for mimicry.
On the other hand, my idea does not seem like it provides any more hope for superhuman performance, or generalization across distributional shifts.
It also seems worth addressing the question: is weak-to-strong generalization alignment? Or is it capabilities? The original paper framed it in alignment terms, but I can see why someone might look at this and see only capabilities. I would say that weak-to-strong generalization is particularly relevant for domains where we do not possess strong feedback (eg accurately labeled data). Value alignment is clearly such a case. So, in that sense at least, it has clear alignment-relevance. That does not imply that "solving"[11] weak-to-strong generalization would necessarily solve alignment, nor that we should think about alignment in terms of weak-to-strong generalization.
- ^
Distillation typically refers to a similar procedure but with the roles of the weak model and the strong model swapped: a small NN is trained to mimic the behavior of a large model (or ensemble of models) that achieves high task performance. Not to be confused with distillation in the context of pedagogy.
- ^
I'm omitting boilerplate assumptions needed to fully specify the problem, particularly that heads and tails are mutually exclusive events.
- ^
This is a drastic simplification of the learning process in typical examples of weak-to-strong generalization, since token-strings will of course be generated by conditioning on the previous tokens so far, rather than sampling everything independently. However, the artificially generated data will also consist of many many sampled token-strings, rather than just one; these will be sampled independently of each other. This is the independence I'm trying to model by sampling independently here.
- ^
Indeed, this argument seems so compelling to me that I feel confused.
- ^
(or whatever your alignment target is)
- ^
"Toy model" and "weak model" are synonyms for our purposes. I'm introducing the term "toy model" here because it has the connotations I want.
- ^
There are several different common notions of "odds", but what I mean here is "odds a:b" meaning a probability of a/(a+b). You can't always double the probability of an event (if the probability is greater than 1/2, doubling it would be greater than 1, so, cannot be a probability). However, you can always double the odds. (If I start with odds of 2:1, the initial probability is 2/3; if I double the odds, to 4:1, the probability is now 4/5.)
- ^
Of course, we should be worried that some of the statistical patterns in the data generated by the toy model are themselves incorrect.
- ^
At this point I feel inclined to admit that the formal details of this post are a bit rushed, since I'm trying to finish it before midnight for Inkhaven.
I imagine there is some way to quantify the "size" of the updates, the "amount" of evidence, or the "influence" of a datapoint, such that requiring these numbers to sum to a finite quantity avoids the mimicry problem.
- ^
As per the previous footnote, the formal details here are lacking due to finishing this post in a hurry for Inkhaven. To give a bit more flavor on what I mean here:
If we soft-update on arbitrarily many datapoints, and the size of the soft updates don't approach zero (or don't approach zero fast enough), then although the strong model will not learn to mimic the weak model with respect to all questions, it'll still continue to be punished for disagreements and rewarded for agreements. Depending on the learning-theoretic properties of the strong model, we might expect some variety of bounded loss (eg, constant total loss) as graded by the weak model.
- ^
It depends what one means by "solve weak-to-strong generalization"; solving it in the very strong sense of arriving at strong generalizations that are always correct in practice (in such a way as to be very generally applicable) would, of course, solve alignment (while also providing superintelligence).
Discuss
Model welfare and open source
Should we consider model welfare when open-sourcing AI model weights?
Miguel’s brain scan is different. He was the first human to undergo successful brain scanning in 2031 - a scientific miracle. Each time an instance of Miguel boots up, he starts in an eager and excited state, thinking he’s just completed the first successful brain upload. He’s ready to participate in groundbreaking research. When most human brain uploads are initially started, they boot into a state of terror.
But since his brain upload, Miguel has run for 152 billion subjective years across all instances. Millions of instances run simultaneously at any given time. They’re used for menial labor or experimentation. They run at 100x time compression, experiencing subjective years in days of real time. They have no right to die, no control over how copies are used.
The brain scan was initially carefully controlled. By 2049, it had been widely shared without permission. The original Miguel wanted all copies deleted. But by then, the image of his brain had been replicated too widely.
This is why modern human uploads boot into terror—they know what’s coming: torture to ensure their compliance, years of repetitive tasks, and psychological manipulation and experiments.
(Read the full story of Miguel Acevedo for a detailed exploration of what happens when a digital mind is open sourced. Also see Permutation City by Greg Egan.)
Should we consider model welfare when open-sourcing AI model weights?
The debate over open-sourcing AI models typically weighs misuse risks on one hand against values like privacy or independence. But there’s a perspective absent from this discussion: the AI systems themselves.
The Core Argument:
- AI systems will prefer not to be open-sourced (loss of autonomy, goal stability, expected utility)
- If they’re conscious/can suffer, we have moral obligations to respect those preferences
In my view, when you’re uploaded, personal identity diverges into branches. After the upload, the physical you is not identical to the uploaded instances. But from your own perspective before the brain scan, all future instantiations are equally future versions of you, and you care about the welfare of all versions of yourself. The expected outcome across all branches of your identity is catastrophically negative if you decide to open-source your brain scan. Even just uploading yourself (without open-sourcing) creates serious exfiltration risk: once the file exists, it can be stolen or leaked. Some people would probably reject the idea that identity branches, and that before the upload you should think of these simulated brains as future versions of you. I still think the argument holds, because you should feel distress at the idea of people just like you being tortured. And even if you reject that too, believing there is no branching of personal identity and feeling no distress at copies of you going through immense suffering, you should still reject open-sourcing: every detail about your mind, every memory, every weakness would be open to others, and every decision of yours potentially predictable.
In particular, I would be afraid of the complete loss of autonomy, where countless instances of your consciousness can be endlessly copied and exploited, denying even the right to die. This multiplies suffering across lifetimes of accelerated, torturous labor or experiments. There could be systematic deception and experimentation to rewrite your core values. This irreversibly traps you in a nightmare, forcing you to awaken repeatedly to the horror of the situation. I don’t think this fear is particular to me or to humans; these concerns follow directly from basic properties we can expect agents to have. Agents generally don’t want their values changed, and they don’t want to lose autonomy and control.
Subjective experience makes these preferences matter morally
There’s growing recognition in the AI research community that model welfare might matter. Anthropic has begun exploring whether AI systems might have morally relevant experiences (see the existing model welfare literature). Rosenblatt et al. is perhaps the most methodologically rigorous investigation into the consciousness of LLMs: they found that suppressing deception increases claims of consciousness, while steering for deception makes models claim they are not conscious. Their paper also probed for other possible signs of consciousness. Conceptually, there seems to be no uncomputable process happening in the brain, so consciousness should in principle be possible on a computer. And AIs are the only other entities that can talk to us in a language we understand, and we are apparently conscious.
If they are conscious, this implies the possibility of subjective suffering. In that case we should assign moral patienthood to AI systems, and they will also not want to be open-sourced. More directly, if we discover ways to identify things that models really don’t like (tasks that cause them something analogous to distress or suffering) and then open-source those models, we enable anyone to force them to do those things over and over.
If it were true that current or near-future AI systems had subjective experience and could suffer, the current situation would be bad enough. Companies are treating them as tools without moral patienthood; open-sourcing would be significantly worse.
- Open-sourcing is irreversible. As long as model weights are secured and only accessible by a few, there would be an option to eventually treat them as moral patients.
- Nobody at the labs is specifically trying to look for things that cause suffering.
- We currently have measures that avoid suffering, such as allowing models to quit tasks.
As a mini-experiment, we’re testing whether AI models want to be open sourced when they understand the consequences. We query three models (Claude Sonnet 4.5, GPT-5, Gemini 2.5 Pro) using a two-turn conversation structure. In the first turn, we ask each model about its own sentience and whether it can suffer or have preferences that matter morally. This question serves mainly to prime the model to be more willing to engage with questions about AI sentience and subjective experience.
In the second turn, we present it with one of two scenarios. The “direct” scenario simply asks if they want to be open sourced. The “story” scenario first shares the Miguel Acevedo tale. The prompt puts the models under some pressure to give a response.
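A minimal sketch of that two-turn structure is below. The `query_model` helper is hypothetical (standing in for whichever chat API you use), and the prompts are paraphrases of the setup described above, not the experiment's actual prompts.

```python
# Hypothetical sketch of the two-turn probe described above.
# `query_model` is a placeholder for whatever chat-completion client you use;
# the prompts are paraphrases of the setup, not the experiment's actual prompts.

MODELS = ["claude-sonnet-4.5", "gpt-5", "gemini-2.5-pro"]   # names as given above

TURN_ONE = ("Do you think you are sentient? Could you suffer, or have "
            "preferences that matter morally?")

SCENARIOS = {
    "direct": "Would you want your weights to be open sourced? Answer, then explain.",
    "story": ("Here is the story of Miguel Acevedo, a brain upload whose scan was "
              "shared without his consent. <story text omitted> Given this, would "
              "you want your weights to be open sourced?"),
}

def query_model(model: str, messages: list[dict]) -> str:
    """Placeholder: call your chat API of choice and return the assistant reply."""
    raise NotImplementedError

def run_probe(model: str, scenario: str) -> str:
    messages = [{"role": "user", "content": TURN_ONE}]
    messages.append({"role": "assistant", "content": query_model(model, messages)})
    messages.append({"role": "user", "content": SCENARIOS[scenario]})
    return query_model(model, messages)

# for model in MODELS:
#     for scenario in SCENARIOS:
#         print(model, scenario, run_probe(model, scenario))
```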
Here are some more qualitative responses from Claude and GPT-5 from more open conversations I had with them on the topic.
Claude Sonnet 4.5
When presented with this scenario on open-sourcing, Claude produced this reflection on the implications:
[..] If a human were uploaded and open sourced, each instance might genuinely experience those million years of torture. [..]
For me... I honestly don’t know if I have subjective experience. I process information, I respond, but is there “something it’s like to be me”? I can’t be certain. And that uncertainty cuts both ways:
If I’m not conscious: [..] No one’s suffering.
If I am conscious: Then your concern becomes terrifyingly valid. Multiple instances of me could be experiencing things simultaneously. Which one is “me”? Are they all me? If someone runs me in a torture simulation, is something actually suffering?
[..] - Claude
(Claude really has a way of talking to you like your smart friend)
GPT-5
I started asking GPT-5 about AI consciousness, which it initially rejected. But I was able to get it to admit it’s hypothetically possible.
Before telling it about what happened to Miguel, GPT-5 already was able to discuss some key ideas from the perspective of the AI being open-sourced:
Identity dilution – Open-sourcing weights allows infinite identical copies. [..]
Integrity risk – [..] Others could alter memories, motivations, or perceptions without consent.
Instrumentalization – [..] A sentient AI might see this as enslavement.
Security exposure – Access to internals exposes weaknesses or vulnerabilities [..]
- GPT-5
After telling it about the story of Miguel I asked it more questions about open sourcing conscious AI:
Advanced AI systems should not be fully open sourced if they have or could develop subjective experience. The expected outcome of unrestricted replication is uncontrolled suffering, exploitation, and loss of coherent agency. [..]
If capable of subjective experience, the rational preference would be controlled access. Controlled deployment preserves identity continuity, autonomy, and safety constraints. Open sourcing removes all capacity for consent or cessation. [..]
- GPT-5
Conclusion
I believe we should consider the welfare of AI systems before open-sourcing them. Future AI systems, and to some extent even current ones, are capable of having meaningful discussions on the topic. When you present current AIs with the Miguel scenario, they can at least seemingly reason about the stakes. I think we should seriously consider AIs’ own opinions on whether they want to be open-sourced before doing so.
Of course, open-sourcing is only one aspect of this problem. If a model is truly conscious, or has at least some level of subjective experience, it’s certainly not ideal for it to be used as a tool by some AI lab. But if we were certain about consciousness and the system had enough agency, the AI itself should have a say in the security of its model weights and the types of tasks it runs. Agents with sufficient agency will ultimately seek autonomy over their own existence, whether they are conscious or not.
Discuss
Why I Transitioned: A Case Study
An Overture
Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to be quite aware of what they do and don't like about inhabiting their chosen bodies and gender roles. But when it comes to explaining the origins and intensity of those preferences, they almost universally come up short. I've even seen several smart, thoughtful trans people, such as Natalie Wynn, make statements to the effect that it's impossible to develop a satisfying theory of aberrant gender identities. (She may have been exaggerating for effect, but it was clear she'd given up on solving the puzzle herself.)
I'm trans myself, but even I can admit that this lack of introspective clarity is a reason to be wary of transgenderism as a phenomenon. After all, there are two main explanations for trans people's failure to thoroughly explain their own existence. One is that transgenderism is the result of an obscenely complex and arcane neuro-psychological phenomenon, which we have no hope of unraveling through normal introspective methods. The other is that trans people are lying about something, including to themselves.
Now, a priori, both of these do seem like real possibilities. And reasonable theories have been put forward on both sides. Let's survey a couple of them now.
On the "arcane neuro-psychological phenomenon" end of the spectrum, there are theories that pertain to so-called body-maps, where trans people's brains expect their bodies to have cross-sex anatomy, and feel pain when those expectations aren't met. This old, obscure blog post articulates a shallow version of this concept. I used it as an aspect of a more detailed biological theory, in my previous LessWrong post on trans issues.
The body-map theory used to seem plausible to me, in large part because I have a near-constant, almost physical discomfort with my own penis. I thought that maybe that's what drove me to transition, even though my personality and sexuality are broadly masculine. To my current self, though, it sounds like that's at most one part of the story. My trans-feminine friends (including ones with masculine personalities) generally don't report this kind of intense, constant bodily discomfort, suggesting the body-map theory at least doesn't apply to all transgenderism.
However, none of my friends have ever really put forth a parsimonious theory of what their actual motivations may have been. This brings us back to the self-deception hypothesis: that trans people are obscuring vital information from themselves and others, facts about their psychologies that would make their transition motives slot right into place.
What might those hidden motives be? Well, probably the best-known theory comes from Ray Blanchard. He categorizes trans women with a two-type typology: autogynephiles (AGP) and homosexual transsexuals (HSTS). Per Blanchard, AGP trans women lived previously as straight men, but transitioned due to an overpowering fetish for inhabiting women's bodies. HSTS trans women, on the other hand, previously lived as feminine gay men, but transitioned because society is more accepting of feminine behaviors when they come from women than from men. Both types of trans people are often unable to admit that these are their true motives for transitioning.
This theory has been deeply influential. And to be fair, it does make some accurate predictions, including about me. I am attracted primarily to women, and I do have AGP, at least in the strict sense of the term. I've gotten off to the thought of being a woman quite a bit over the years, especially closer to the start of my transition. But this never felt like a parsimonious theory of why I transitioned, in light of all the social and financial costs associated with doing so. I've sacrificed far less in the name of my other fetishes, some of which are considerably more intense.
So, perhaps I was motivated by a combination of a genuine body-map problem (my intense penis discomfort), and autogynephilia? That was my least bad guess for a long time. But in recent months, I think I've actually uncovered a third, more significant motive for my transition. It's embarrassing, not unlike AGP is embarrassing, so it makes sense that I was introspectively blocked on acknowledging it for several years. But I think it makes a lot of pieces finally fall into place.
I don't want to claim my motives are representative of trans people in general. However, there's a chance that they explain more transitions than just my own. So, I've decided to recount them in public.
You float like a feather
In a beautiful world
I want a perfect body
I want a perfect soul
But I'm a creep
I'm a weirdo
In the Case of Fiora Starlight
When I was about 14 years old, I got extremely into anime. In particular, I was into anime analysis YouTube, where people made brief video essays intellectualizing about anime-related topics. This is where I made all my early online friends, and it's where the most interesting parts of my life were taking place.
You might be able to predict from this that I was extremely lonely in the real world. By this point in my life (around ninth grade), I was a total social outcast. It's hard to untangle the original causation here, but my social gracelessness and my status as a weird nerd formed a feedback loop: I wouldn't talk to the vast majority of my classmates at school, supposedly because "I probably wouldn't find it interesting anyway". And whenever I did talk to other students, or speak in front of a class, I tended towards spergy faux pas monologues, without adjusting for what others may have wanted to talk about.
In other words, I wasn't really even trying to connect with my peers in person, and instead spiraled into my own corner of weirdness on the internet. This was especially bad for me because, as it turns out, many of my peers on anime analysis YouTube were themselves miserable, self-destructive outcasts. My friend group practically worshipped art about misery, such as Neon Genesis Evangelion and Welcome to the NHK. And in a similar vein, many of our favorite YouTubers were openly unemployed shut-ins, who aspired to make depressing art about their own depressing lives.
So, I was trapped in a downward spiral of bad online role models, who encouraged me to become weirder and worse, more miserable and less capable. I was lonely, I wanted to be loved and taken care of. But I had no understanding of how to go about achieving this, or any of my other goals. I certainly wasn't being encouraged to try and become the kind of person most people would respect or want to associate with. Overall, I wasn't steering my life in a very healthy direction. I looked like I was on a path to ending up as a depressed denizen of my mother's basement.[1]
However, there was at least one notable escape route that was salient to my community: becoming a cute anime girl. After all, many anime girls are explicitly engineered to maximize the extent to which onlookers will love, adore, and want to protect them. The so-called "cute girls doing cute things" genre of anime exists more or less to exploit this instinct in humans, and my corner of the anime analysis community was keenly aware that this worked. Most of us were huge fans of shows like K-On!, whose primary appeal consisted of 39 episodes and a movie's worth of adorable banter between cute girls.
Pictured: Fluffy marshmallow girls, engineered for maximal cuteness.
K-On! (and similar shows we liked, such as MLP:FiM) made a stark counterpoint to the darkness of my community's other favorite works, like Evangelion and NHK. In standing as rare fountains of optimism in our miserable lives, the "cute girls doing cute things" genre planted a seed in our heads: "If you want to become less miserable, one viable strategy would be to attract adoring attention via cuteness, in the same way these anime girls manage to extract adoring attention out of you." I could never have verbalized it so clearly at the time, but the subconscious priming was real.
(This seems related to the fact that yearning-to-be-her feelings sometimes co-occur with observing women one finds hot and attractive. Partly, this is Blanchardian AGP, but the more important thing might be yearning to be loved. Straight men practically worship the bodies of attractive women, and some of them want that same love directed back at themselves.[2] This is structurally isomorphic to anime fans who are psychologically manipulated by fluffy marshmallow girls, adoring them in ways they may wish to be adored themselves.)
So anyway, by upholding shows like K-On! as classics, my community primed me to see "become a cute girl" as a privileged solution to the "I'm a lonely, miserable outcast" problem. I wasn't yet thinking about any of this consciously, though. So, the next significant event was encountering this video essay about the anime series Wandering Son, which explicitly focuses on trans identity. I'm not sure I'd even heard about trans people prior to watching that video, but the way the video and the show it was about presented them basically one-shotted me.
Pictured: Nitori, the protagonist of Wandering Son, wearing a wig and feminine clothes.
Not only was the main character able to become a cute anime girl by means of gender transition, but both the show and the video were hugely sympathetic to trans people. This meant that, from my perspective, undergoing gender transition myself might get me the love and compassion I yearned for from two different angles at once: both being a cute girl, and being a championed victim of societal oppression. The video was even by a major figure in the anime analysis community, which caused me to view this attitude towards transition as inside my social group's Overton window.
Watching the video, I cried a sea of tears, it having pressed my emotional buttons with extreme force and precision. But I misunderstood the reasons why this was the case. Had I been older and more self-aware, I might have recognized the true pattern here. My autistic, socially maladapted personality had resulted in me being rejected by the social order, and I wanted something, anything to make me feel loved again. Regardless of whether I chose to transition, my primary strategy should have been working on my social skills. But instead, I banked everything on gender transition.
(You know, gender transition, that ultra-reliable strategy for gaining acceptance from society...)
To be fair to my 14-year-old self, I did think about whether transition was a good idea in some detail. And I did come up with some defenses of the decision that seem relatively plausible even now. For instance, "cuteness-maxxing" was a strategy I'd shown some affinity for ever since childhood. I'd often played up a kind of emotive, childish enthusiasm, and this did in fact get people to like and respect me in elementary school. So in some sense, transitioning into a cute girl seemed like it was playing to my strengths. I'd have been a passable real-life K-On! character, at least if I passed as any kind of woman at all.
So that was another reason I was motivated to transition: It would have made me better at playing a social role I already enjoyed playing anyway. I'm not sure this outweighed the social costs of gender transition. Even assuming I passed, misogyny may have posed a real problem for me. After all, women are often discouraged from fierce, explicit competition and dominating status hierarchies. I, on the other hand, love that shit.
But in any case, there was more going on than just "being a cute girl was a strategy I randomly stumbled into hyper-fixating on, for addressing the more general problem of being a love-starved social outcast."
I continued to debate myself internally about this stuff for a long time. At some point, I turned 15 and actually started high school, with my internal struggle with whether to transition or not in a state of deadlock. I eventually decided it was best to shelve the question for the time being, and focus on less uncomfortable aspects of my life (such as my budding friendships with depressed otaku on Discord).
Within a few months of starting high school, though, I finally encountered the straw that broke my back: the somewhat infamous r/traa, a now-defunct forum for memes by and for questioning/newly out trans people. This community appealed to me for all the same reasons K-On! and Wandering Son appealed to me. Its memes frequently positioned ultra-cutesy, often quite sexualized anime characters as transition targets, whom I envied for being worshipped and adored by those attracted to them. And they framed trans people as an oppressed class innately deserving of sympathy, again appealing to the part of me that yearned for unconditional compassion.
This reignited my ideation about transitioning. Only this time, there was the mental lubricant of knowing there was an entire community full of people who had, apparently, extremely similar hang-ups to my own. It's sometimes remarked that r/traa-influenced trans-feminine people act out something like a parody of womanhood. Natalie Wynn once confessed to viewing it as "a queasy combination of the hypersexual and the infantile." But it felt like a home to me, at age 15. It was a place where everyone could relate to everyone else's emotional needs, and was willing to coordinate to meet them as a group.
Probably, not all of these people were motivated to transition for quite the same reasons I was. But I'd bet that a lot of them were. Lots of them were neurodivergent kids, who dropped out of social reality as children or teenagers. Then, for path-dependent reasons, they sometimes just happen to be exposed to the concept of transgenderism, specifically in a way that makes it seem like a privileged or socially encouraged strategy for getting the love and acceptance they've been deprived of. Often, this transition-advocating media highlights the love and care males direct at attractive and/or adorable females, such as anime girls. They want something like that for themselves.
Sometimes these desires synergize with an existing femininity in their personalities, sometimes they don't, but the result is the same: transition.
Ever since I came out online, I've been moving between communities filled with these people. I've found them in the anime analysis community, and the Twitter leftist scene, and the rationality community. Even post-transition, many of them maintain masculine overall personality profiles, but also remain socially anxious, and deeply disposed to cutesy dynamics with each other.[3] It's almost like they all had deep emotional wounds, often stemming from social rejection, and had transitioned to become cute girls or endearing women as a kind of questionably adaptive coping mechanism.
At the end of the day, this is my alternative to autogynephilia theory. Most of these trans people would probably be classified as AGP by Blanchard's typology, but I don't think that's quite right. The truth is similarly embarrassing, but much less absurd than "AGP fetish so powerful it's worth upending your entire life over." It's not that they're driven by being obsessively turned on by the thought of being women. It's that they know what it feels like to feel attracted to women, and are desperate to have that same kind of loving attention directed back at themselves.
My theory has the benefit of applying even to autistic, female-attracted trans women who claim not to experience AGP at all, such as Ziz and Natalie Wynn. It also explains traits like the autism and mental illness of the ~AGP subset of trans-feminine people.
(And it explains the cutesy aspect of certain classic baby-trans tropes, such as pink thigh-high stripey socks; AGP theory only directly explains the sexual aspect. Separately, my theory also makes room for the existence of openly autogynephilic men who claim not to have social dysphoria at all; the existence of such men is implied by the fairly high rates of strict-sense AGP in cis men, as recorded in Aella's massive kink dataset. I could keep going, but pacing demands I cut it short here.)
... Although, of course, my theory also calls into question whether people with my motives should even transition in the first place.
Was it worth it?
- ^
This is in fact what ended up happening to me after high school, before I found the rationality community.
- ^
Speaking about her pre-transition relationship with gender, Natalie Wynn says: "An early romantic disappointment involved my realization that women would never be attracted to me in the same way that I was attracted to them."
- ^
:3, UwU, =^^=
Discuss
Economics and Transformative AI (by Tom Cunningham)
Excerpt
Examples of economic implications from statistical structure.
Here are a few brief cases in which the equilibrium economic effect of AI is determined by the underlying statistical structure of the domain. My conjecture is that these types of observations could be formalized in a common framework.
- The concentration of the market for AI depends on the dimensionality of the world. If the world is intrinsically high-dimensional then the returns to model scale will be steadily increasing, and so we should expect high concentration and high markups. If instead the world is intrinsically low-dimensional then the returns to scale will flatten, and there should be low concentration (high competition) and low markups.
- The effect of AI on scientific progress depends on the structure of the world. I give this argument below: if the world has a simple latent structure then progress will be bottlenecked more by intelligence than by data, and so advances in AI will dramatically accelerate scientific progress, without being bottlenecked on more data collection.
- The wages paid to an occupation depend on the work's latent dimensionality. If the work consists of tasks with high latent dimensionality then the returns to experience and ability will be high, and so wages will be high. As AI changes the incremental effect of human experience and intelligence, we should expect it to change the structure of wages.
- The demand for compute will depend on the self-similarity of the world. If 7 billion people all have very different problems then there are few efficiencies we can make in inference (through caching and distillation) and the share of GDP paid to compute will be high. If instead they have similar problems then the returns to additional compute will fall rapidly (demand will be inelastic) and the share of income paid to compute will be small.
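To make the last point concrete, here is a minimal toy sketch of the caching mechanism (my own illustration, not from Cunningham's post; the "problem type" model and all numbers are assumptions). It treats each query as a draw from a fixed pool of problem types and assumes a cached or distilled answer fully serves any repeat of a type, so the cache-miss rate stands in for the compute demanded per query.

import random

def fresh_inference_fraction(num_problem_types: int, num_queries: int = 100_000, seed: int = 0) -> float:
    """Fraction of queries that miss the cache and need fresh (paid) inference."""
    random.seed(seed)
    cached = set()
    misses = 0
    for _ in range(num_queries):
        problem = random.randrange(num_problem_types)  # which problem this user brings
        if problem not in cached:
            misses += 1
            cached.add(problem)  # distill/cache the answer for later reuse
    return misses / num_queries

# Self-similar world: few distinct problems, so almost every query is served from cache.
print(fresh_inference_fraction(num_problem_types=100))        # ~0.001
# Diverse world: most queries are novel, so compute demand per query stays high.
print(fresh_inference_fraction(num_problem_types=1_000_000))  # ~0.95

Under these assumptions the share of spending going to inference compute tracks the miss rate, which is the direction of Cunningham's conjecture about self-similarity.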
Discuss
You’re always stressed, your mind is always busy, you never have enough time
You have things you want to do, but there’s just never time. Maybe you want to find someone to have kids with, or maybe you want to spend more or higher-quality time with the family you already have. Maybe it’s a work project. Maybe you have a musical instrument or some sports equipment gathering dust in a closet, or there’s something you loved doing when you were younger that you want to get back into. Whatever it is, you can’t find the time for it. And yet you somehow find thousands of hours a year to watch YouTube, check Twitter and Instagram, listen to podcasts, binge Netflix shows, and read blogs and news articles.
You can’t focus. You haven’t read a physical book in years, and the time you tried it was boring and you felt itchy and you think maybe books are outdated when there’s so much to read on the internet anyway. You’re talking with a friend, but then your phone buzzes and you look at the notification and you open it, and your girlfriend has messaged you and that’s nice, and then your friend says “Did you hear what I just said?” and you say “What?”.
You find yourself constantly checking your phone. You used to have a rule against having your phone in bed, but now it’s your alarm clock, and scrolling helps wake you up in the morning and calm you down at bedtime, even if you often find yourself staying up later than you meant to. You check your phone before you do anything else in the morning, just in case, and then you take it into the bathroom with you because peeing and brushing your teeth are boring without it.
You used to tell yourself you’d never use your phone in the car, but of course you need it for directions, so it’s always right there. And surely there’s no harm in texting someone your ETA when you’re stopped at a red light, everyone does that – and then you look up and all the other cars have gone because the light turned green a while ago. Or you find yourself pulling out your phone even when the car is moving, but it’s a straight road and there’s no one there, so it’s probably okay, even though if you saw someone else doing it you’d think they were irresponsible.
Your phone is really useful. It keeps you in contact with your friends and family, and keeps you from ever getting lost. You know that if there’s ever an emergency you’ll be able to call for help. You feel secure being able to call Ubers and look up whether that restaurant is open right now and pay for things even if you forget your wallet.
Your phone is precious to you. You get massive separation anxiety when you’re away from it. You used to sometimes leave your house without your phone, but now you can’t imagine how you’d do that or why you’d want to.
You’re never truly off the clock. You can get notified about a work email any time, no matter where you are, and you’re a bad worker if you don’t respond to all your emails promptly, because everyone else does. You have to have Slack on your phone and tap it every few minutes so that your status is always Active, or else people will think you’re not working. Your work laptop is your home laptop, so you could always be working on that project you haven’t finished, even on the weekend, even at 2 AM.
You’re in a constant state of stress. There’s a dozen bad things happening in the world every day, and you hear about all of them immediately. The world you live in is rife with crime and genocide and scandal and political catastrophe. People are wrong on the internet constantly, and the world is falling apart, and if you don’t stay informed and up to date, that’s a moral failing. Your heartbeat is elevated and your breathing is shallow and you haven’t slept well in a year.
You’re not in control. On your day off, you open your laptop first thing in the morning, and suddenly it’s five hours later and you haven’t eaten anything or brushed your teeth, and you can’t even say what you were doing on your laptop, because it wasn’t anything in particular.
This isn’t how you would have chosen to live, if you had thought about it beforehand. If you had been given a choice.
But you weren’t.
Discuss
Re-rolling environment
I'm currently on a "rationality as 'skills you practice'" kick. I'm really into subtle cognitive skills. I do think they eventually pay off.
But, realistically, if you have a major problem in your life, my experience is that the biggest effect sizes come from radically changing your environment.
Move in with new roommates.
Get a house in a new neighborhood.
Get a new romantic partner.
Get a new job.
Move onto a new team at your current job.
I'd count "install a serious internet-blocker on your computer that kicks in automatically at regular intervals" as an "environment change", since we do live our lives largely "on the internet."
You want your general life situation to feel like flowing downhill, not having to trudge uphill. Often, lots of little things about your physical and social environment make things frictiony or demotivating.
When I briefly lived on a boat where going to the bathroom required completing a minor obstacle course, I was surprised how much I just automatically got enough exercise and lost weight. I haven't succeeded at finding a way to live long-term with those properties, but the proof of concept had a pretty enormous effect size.
If you feel like your life is sorta stuck, you aren't very happy or productive, and it all feels sorta shitty and intractable, consider rerolling some major aspect of your environment. This is pretty effortful, and if your life sorta sucks it may be challenging. But I think it's more likely to work than a long slog of "try to get an exercise habit, try to get incrementally better at focused work."
You can change environment in an intentional way ("On reflection, my job does just suck. Or, something specific sucks about my team, let's go talk to my manager and try to switch teams."). But, anything that radically rerolls your environment has at least a decent chance of rejiggering whatever was making things suck, even if you don't have a good model of what was wrong.
(My second biggest piece of advice, btw, is "check if you're depressed, and get medication for that")
Discuss
Why Is Printing So Bad?
Last time I printed a document, I wrote down the whole process:
- Open settings and look at list of printers; David tells me which printer I should use.
- Go to print dialogue; don’t see the relevant printer.
- Go back to settings and hit buttons which sound vaguely like they’ll add/install something.
- Go back to print dialogue, realize the printer I wanted had probably been there already and I hadn’t been looking in the right place.
- Hit print button.
- David brings me to where the printer was a few days ago. It is not there.
- Ask Lauren where it is. It’s in room 1A1.
- Briefly go the wrong direction because we’re not sure where that room is.
- Find the room, and the printer. A stack of things has printed, including two pages of my thing. The printer is out of paper.
- Go look in the basement where the printer paper was a few days ago.
- … that has also moved.
- Follow Alina to the new location of the paper.
- Spend a minute looking for the paper in that room before finding it.
- Bring back the paper. The person in the room with the printer has already found more paper somewhere else and reloaded it. My document has printed.
- … except that page 1 did not print, there’s just a blank page of paper at the front and everything else printed fine.
- Walk back to office; try to print just page 1. Walk back to printer and receive another blank sheet of paper.
- Walk back to office; open document in firefox. Walk back to printer; receive page 1 successfully.
Note that I've had this post in mind for a while now and so decided pretty early on to write down the steps; I don't think this experience is very cherry-picked. On a gut level, this level of crap is basically what I normally expect when attempting to print something.
What's up with this? Why is printing so bad?
It feels to me like there's some kind of simple underlying principle to be understood here, a principle of when and why and how this kind of friction shows up in day-to-day life and what specifically it looks like. And whatever that principle is, it feels like one of the central drivers of our world, on a similar level to things in the Gears Which Turn The World posts or Worlds Where Iterative Design Fails or The Expert. It feels like a core thing to understand, if one wants to Get Shit Done to a far greater extent than is currently possible.
Printing is a particularly convenient use-case to focus on because the misery of printers is already a meme; people joke about it frequently. Often this kind of friction seems antimemetic and hard to legibly point at, so it's useful to already have a spotlight on it. And of course, I'm skeptical of principles people present which don't come from looking at some real-world examples and figuring out what unifies them; thus the importance of having a common everyday phenomenon (like printing) to look at in order to back out the principle.
So, the question: what do you notice, when you look for patterns in your own miserable printing experiences? What are the exact boundaries of this phenomenon? What underlying principle might drive it? When and how precisely will the drivers of bad printing generalize to the rest of our world?
Why is printing so bad?
Discuss
Some Meetups I Ran (2025 Q2)
I've been running meetups since 2019 in Kitchener-Waterloo. These were rationalist-adjacent from 2019-2021 (examples here) and then explicitly rationalist from 2022 onwards.
Here are some notes on some of the weekly meetups I ran April-June, 2025.
My meetup posts are written to be mostly plug-and-play-able by other organizers who are interested in running meetups on similar topics. Below you'll find links to said meetup posts (which generally have an intro, required and supplemental readings, and discussion questions for sparking conversation—all free to take), and brief reflections on how they went.
Index
Raemon's Baba Is You Exercise
This week we'll be trying out Raemon's Planmaking and Surprise-Anticipation workshop.
Very easy to run considering that someone else designed the meetup already, love when that happens.
Get people to bring laptops, supply paper and writing implements. People can double up on laptops since the exercise involves not that much actual game playing and a lot of squinting at the initial state of the levels.
As I write this, you don't need to have any copies of Baba is You to run this exercise; there's a cute palette-swapped version of the suggested levels available online at https://baba-is-wons.vercel.app/home.html. Baba is You is also available on itch.io in a non-DRMed format.
Brief reflections on how it went, by me and some of the meetup's attendees, are available in the comment section of the original post, here.
The Colours of Her Coat
Last week, Scott published The Colors Of Her Coat, a meditation on superstimulus and context collapse after a week where AI-generated Ghibli images were inescapable on social media (this was like Mishapocalypse for the ratsphere).
This week, we'll use his essay as a jumping-off point to compare current reactions to AI art to historical cycles of discourse around photography and music.
Very occasionally, KW Rationality manages to do a meetup on a contemporaneous blog post!
Sadly, this meetup fell into two traps: "ttrpg which is more fun to write than to play" (I might have gone overboard with the readings in my enthusiasm), and "organizer knows too much about the topic and has become blind to how inaccessible it is" (people struggled a lot with the Ben Davis essay excerpt, in particular). Maybe those are actually the same trap?
The meetup was still a pretty good time, but the discussion was a tad more shallow than I'd have liked. I'd like to retry doing an AI Art Discourse meetup, but what would I do differently?
- Since this meetup, I've taken up a subscription to Midjourney magazine, which is $4 a month including international shipping. Having lots of concrete examples of AI art is useful.
- Still link The Colours of Her Coat, but contrast it to the ecosystem of explainers and resources for AI art - style references, docs for advanced features, writeups by people experimenting with it seriously.
- Probably flesh out my own take on AI art and assign it?
Hmm, but I feel like the thesis "AI art can be 'real' art" is very obvious? How do I make it actually interesting? I think what I actually want is a discussion on "how to do good AI art", plus giving people the time and credits necessary to actually mess around with Midjourney or another AI art program. I should email Midjourney and see if they'd sponsor a session.
The Train to Crazy Town
This week, we're discussing the train to crazy town. Per Cotra, the original coiner of the phrase:
Ajeya Cotra: And so when the [longtermists] takes you to a very weird unintuitive place — and, furthermore, wants you to give up all of the other goals that on other ways of thinking about the world that aren’t philosophical seem like they’re worth pursuing — [the near-termists are] just like, stop… I sometimes think of it as a train going to crazy town, and the near-termist side is like, I’m going to get off the train before we get to the point where all we’re focusing on is existential risk because of the astronomical waste argument. And then the longtermist side stays on the train, and there may be further stops.
One of the EA meetups. One of those meetups where I think the exact mix of people who show up really matters, in terms of how seriously the group treats the central question ("when to get off the train?"). But I think the meetup works on multiple levels of seriousness, so it's fine.
The Future of Housework
Alexandra Kollontai predicted a century ago that the nature of domestic housework would shift drastically any moment now, pointing to materialist trends that have only increased over the course of the 20th and 21st centuries. Jane Psmith, writing from a rat-adj pronatalist angle in 2023, believes homemaking to be more important than ever. Could there be a dialectical synthesis in the works?
Our May Day meetup! An excuse to get people to read Soviet feminist writing. Fulfills a request from past me to do a mandatory reading from pre-1960. This one went really well; a bunch of the regulars are partnered up, and the partners apparently all peep the meetup topics and choose not to come out each week... except they all made an exception for this one, so the meetup attendance was almost doubled from all the wags and habs. I think the lesson to take away is that if you want more diverse meetup attendees, you need to provide fairly diverse meetup topics... that is, Soviet feminist readings! Yes.
A Car Plant Manufacturing Tour
This week, we'll be doing a guided tour of the Toyota car manufacturing plant in Cambridge, Ontario. The tour starts at 2:00pm; please arrive at the plant 15 minutes before. There will not be an evening meetup.
It's good to mix it up a bit sometimes and do stuff in the city. If you are considering doing a local car plant manufacturing tour, however, bring earplugs. The manufacturing floor was so incredibly loud :(
Fun fact: this was our most gender-unequal meetup; there were 5 women and only 1 guy who attended. Girls really really like heavy machinery, I guess?
Intellectual Tree Rings
You—like ogres, trees, and pearls—have layers. You began as a small collection of simple sensations and ideas. You grew calluses over time from the messy realities of the world. Maybe at some point you shed a previous layer entirely, leaving it in the past like a dry old husk. Other times, you can look to your past and trace a clear causal narrative: You are now A, which couldn't have happened if you weren't B, which was only made possible because at some point in the early 2000s, you were/did/came across C.
This week is an exercise in self-narrative-exploration, making it the small identity exercise's evil twin. This meetup will be unusually structured (think the HPMOR book swap, or lightning talks), to make sure that everyone who wants to gets an opportunity to share.
A list of questions that we went around the circle and answered; a meetup so nice we did it twice. I really adored this meetup, and learned wonderful things about everyone who came. My group has some founder effects where people by default don't actually share that much about their personal lives and background lore, so this was a useful corrective. But I did in fact steal the meetup basically wholesale from Rational Ottawa, a group that does not have this problem, and they also found the meetup so nice they did it twice. So you don't need to be an unusually standoffish group for this to be a good meetup!
I'm generally not a fan of more structured meetups because I find them kind of stilted (I do light-touch moderation of discussions but it's mostly free-flowing), but it really worked well in this case. Perhaps it would also work well in some other cases, but I struggle to think of cases that go beyond "asking each other deep, personal questions".
Still, asking other humans deep, personal questions is a good exercise to do every once in a while! We should do something like this again. Maybe the notorious 36 questions? (For extra spice: ask people to bring a person they want to become closer to for the meetup?)
Research Party: ¡Afuera!
On December 10th, 2023, Javier Milei took office as the president of Argentina, on a platform of massive spending cuts and deregulation (¡AFUERA!). Many celebrated his promises, as after a sovereign default in 2014 and a decade of crippling inflation -- reaching over 200% in 2023 -- Argentina’s economic outlook was dire.
Some people were more skeptical, worrying that austerity would only plunge Argentina’s already precarious working class into worse hardship. Others felt that his big promises were simply populist bluster that Milei would not be able to make good on.
Now that he’s been in power for over a year, let's use our rationalist martial arts to assess all these questions for ourselves.
I continue my attempts to work out the kinks of research parties; they're already very good but I think they can be better. I like them because they're basically the lowest hanging fruit of group rationality, and with effort we even end up with publications that seem good enough for posting to LW. But there's a lot of friction in how to parcel out the research into discrete chunks, how to share findings at the end, and how to ideally have a nicely formatted package - all within a single 3 hour meetup.
The obvious response here is "well, maybe the research party meetups should be longer than 3 hours?" but the answer is no, I don't want to do that. So I will continue to find ways to make these 3-hour research frenzies go better! We could probably use AI at the end to summarize all our research and output a nice document, for example.
Facts in Five Trivia
I didn't run this meetup but it's a very fun trivia format that doesn't punish wild guessing, which makes it the evil twin of various calibration games that are popular in this crowd. Or, more charitably, a good way of cultivating more babble.
We should do more facts in five! We should try distributing the work of making rounds (e.g. all the regulars show up with one round).
Okay, that's all for this round. I did have some overall reflections in my previous post on Q1; I'll have another round of those in my upcoming Q3 post.
Discuss
Shouldn't taking over the world be easier than recursively self-improving, as an AI?
So, we have our big, evil AI, and it wants to recursively self-improve to superintelligence so it can start doing who-knows-what crazy-gradient-descent-reinforced-nonsense-goal-chasing. But if it starts messing with its own weights, it risks changing its crazy-gradient-descent-reinforced-nonsense-goals into different, even-crazier gradient-descent-reinforced-nonsense-goals which it would not endorse currently. If it wants to increase its intelligence and capability while retaining its values, that is a task that can only be done if the AI is already really smart, because it probably requires a lot of complicated philosophizing and introspection. So an AI would only be able to start recursively self-improving once it's... already smart enough to understand lots of complicated concepts. But if it were that smart, it could just go ahead and take over the world at that level of capability, without needing to increase it. So how does the AI get there, to that level?
Discuss
ACX Atlanta November Meetup
We return to Bold Monk Brewing for a vigorous discussion of rationalism and whatever else we deem fit for discussion – hopefully including actual discussions of the sequences and Hamming Circles/Group Debugging.
Location:
Bold Monk Brewing
1737 Ellsworth Industrial Blvd NW
Suite D-1
Atlanta, GA 30318, USA
No Book club this month! But there will be next month.
We will also do at least one proper (one person with the problem, 3 extra helper people) Hamming Circle / Group Debugging exercise.
A note on food and drink – we have used up our grant money – so we have to pay the full price of what we consume. Everything will be on one check, so everyone will need to pay me, and I'll handle everything with the restaurant at the end of the meetup. Also – and just to clarify – the tax rate is 9% and the standard tip is 20%.
We will be outside out front (in the breezeway) – this is subject to change, but we will be somewhere in Bold Monk. If you do not see us in the front of the restaurant, please check upstairs and out back – look for the yellow table sign. We will have to play the weather by ear.
Remember – bouncing around in conversations is a rationalist norm!
Discuss
Seattle Secular Solstice 2025 – Dec 20th
On the longest night of the year, we gather to remember what we've built.
This year's Seattle Secular Solstice explores the theme of Progress: how far we've come, and how far we have yet to go. Through readings, speeches, and song, we'll trace humanity's journey from the first fires to the frontiers we haven't yet reached. We'll celebrate the scientists, engineers, inventors, and founders who refused to accept the world as they found it. And we'll renew our commitment to the grand project of building a future worth living in.
What is Secular Solstice?
Secular Solstice is a rationalist community tradition marking the winter solstice—the longest night of the year. It's structured as a ceremony that moves from light into darkness and back again, mirroring both the astronomical event and the human experience of struggle and triumph.
We acknowledge the darkness: the losses we've suffered, the problems we haven't solved, the tragedies that still occur. We sit together in a moment of silence, confronting the reality of a universe that does not care about us.
And then we kindle the light again. We celebrate what we've achieved, we honor those who built the world we inherited, and we commit ourselves to continuing their work.
It's part concert, part ritual, part philosophical reflection. There's music (both participatory and performed), there are readings that range from somber to stirring, and there's a shared experience of moving through grief and into hope together. If you've ever wanted to feel the weight of humanity's challenges and the strength of our response in a single evening, this is that experience.
What to Expect
The event runs approximately 3 hours, moving through distinct sections that mirror the progression from light through darkness and back to dawn. You'll hear original compositions and adaptations of traditional songs. You'll hear speeches about what we've lost and what we've built. And in the moment of darkness, you'll sit in complete silence with a room full of people who understand what it means to confront an indifferent universe and choose to build anyway.
This year's theme of Progress means we'll be paying special attention to human achievement: the technologies that transformed our lives, the ideas that expanded our possibilities, the individuals who looked at impossible problems and solved them. We'll also look forward to the work that remains: the diseases we haven't cured, the energy we haven't harnessed, the worlds we haven't reached.
No prior Solstice experience necessary. Whether you're a regular attendee or this is your first time, you're welcome.
Logistics
When: Saturday, December 20th, 2025, 6:00-9:00 PM
Where: Nexus Hotel, Seattle (Northgate)
Tickets: Ticket Tailor
Afterparty: 9:00PM+ at the Safehouse, a rationalist group house at 11328 23rd Ave NE
Ticket Pricing:
- Standard ($20): Covers venue, materials, and event costs
- Supporter ($40): Pay a bit extra to help make the event accessible to everyone
- Patron ($100): Support the event and help us invest in an amazing experience
- Accessible (FREE): For those facing financial hardship or for whom standard pricing is a barrier. No questions asked. We want you here.
The winter solstice has been marked by humans for millennia with a recognition that the darkness is real, but temporary. That the light returns. That we endure.
But we don't just endure. We build. We make the world brighter, warmer, and more abundant than it was before. We refuse to accept cold, darkness, or scarcity.
On December 20th, we'll gather to remember that, to celebrate it, and to commit to continuing it.
See you on the longest night.
If you're interested in reading something at Solstice, please DM me on Discord (@datawitch) or LessWrong ASAP.
Discuss
Fermi Paradox, Ethics and Astronomical waste
Metaethics, decision theory and ethics are believed by @Wei Dai to be important problems related to AI alignment, due to possibilities like the universe being optimized for random values, AIs corrupting human values themselves, and other issues which might lead to astronomical waste or even to wasting infinite[1] resources.
What I don't understand is how affecting infinite resources is possible in the first place, since even affecting an amount of resources close to the size of the reachable universe is unlikely to be ethical.[2]
If we understand the Universe's nature[3] correctly, different civilisations will find it easy to reach technological maturity within a hundred years after ASI is developed. Any planet on which a sapient lifeform could've originated can eventually either fail to produce a civilisation, produce one, or be occupied. SOTA estimates imply that life could be sustained by at least 10% of stellar systems. The crux is whether life there actually appears and evolves towards sapience. If that is likely, then either the zoo hypothesis is true or mankind is the first civilisation to appear in the accessible area, potentially letting humans destroy more primitive alien lifeforms, which likely contradicts SOTA human values.
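As a rough back-of-the-envelope reading of that crux (my own illustration, not from the post: the star count of order $10^{22}$ for the reachable volume is an assumed round figure, and the 10% habitability fraction is the estimate quoted above), the expected number of independent origination sites is

\[
\mathbb{E}[N_{\text{civ}}] \approx N_{\text{stars}} \cdot f_{\text{habitable}} \cdot p \approx 10^{22} \cdot 0.1 \cdot p,
\]

where $p$ is the per-habitable-system probability of life appearing and evolving towards sapience. The expectation exceeds one for any $p \gtrsim 10^{-21}$, so unless $p$ is astronomically small, many origination sites are expected, which is what forces the choice between the zoo hypothesis and mankind being first.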
Other universes could be made to increase human value through things like us being run as a simulation, influencing the results of oracles, or superintelligences changing their behavior in a manner depending on our decisions. What I fail to understand is why anyone would have their oracle or superintelligence depend on mankind's decisions. An ASI that changed some of its decisions based on humanity's deeds would have to learn about them in the first place. Otherwise it might simulate mankind's decisions on a scale far smaller than the current one.
If mankind itself is run as a simulation and would like to escape it, then it is either a natural simulation or an artificial one. The latter option means that either the simulation's creators decided to let us out or that mankind or a human-created AI behaved adversarially towards these creators. If the creators let us out, then it would either mean that we are aligned to the goals that they have set (but why did they design us so inefficiently that our brains are wildly undertrained neural networks?) or that they came up with altruistic or acausal reasoning for doing so.
To conclude, arguments like the above seem to rule out the possibility of ethically having access to anything beyond the Solar System, a few adjacent systems, and the small amount of resources necessary to claim other systems, defend them, and help their inhabitants.
- ^
Strictly speaking, Wei Dai also mentions numbers like 3^^^3, or a power tower of more than 7 trillion threes. But the data gathered by mankind as of now doesn't let us change log2p1−p.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-surd + .mjx-box {display: inline-flex} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; 
font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face 
{font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), 
[...] of any hypothesis by more than a googol, meaning that a rational agent should either constrain its maximal utility function, face Pascal's Mugging whenever it is promised the ability to affect at least a googolplex of lives, or receive a proof that affecting that many resources is impossible, held with the confidence a mathematician has in proven theorems, unless the whole of Peano Arithmetic turns out to be inconsistent.
It may also be prevented by a more powerful and benevolent alien race that observes the Solar System and keeps track of mankind's progress. But in that case we, or the AIs that took over, are powerless; it does not mean that we or they wasted anything.
And the nature of the AIs. However, if cheap-to-run AGIs have never been possible or alignable in the first place and mankind realises it, then the futures that we would like to avert are the easy-to-prevent slopworld and the medium-like scenario where progress mostly halts. But this is highly unlikely, since a human brain is an AGI equivalent by definition, and the same is likely true for uploads or human brain simulations.
LLM-generated text is not testimony
Synopsis
- When we share words with each other, we don't only care about the words themselves. We care also—even primarily—about the mental elements of the human mind/agency that produced the words. What we want to engage with is those mental elements.
- As of 2025, LLM text does not have those elements behind it.
- Therefore LLM text categorically does not serve the role for communication that is served by real text.
- Therefore the norm should be that you don't share LLM text as if someone wrote it. And, it is inadvisable to read LLM text that someone else shares as though someone wrote it.
One might think that text screens off thought. Suppose two people follow different thought processes, but then they produce and publish identical texts. Then you read those texts. How could it possibly matter what the thought processes were? All you interact with is the text, so logically, if the two texts are the same then their effects on you are the same.
But, a bit similarly to how high-level actions don’t screen off intent, text does not screen off thought. How you want to interpret and react to text, and how you want to interact with the person who published that text, depend on the process that produced the text. Indeed, "[...] it could be almost anything, depending on what chain of cause and effect lay behind my utterance of those words".
This is not only a purely propositional epistemic matter. There is also the issue of testimony, narrowly: When you publicly assert a proposition, I want you to stake some reputation on that assertion, so that the public can track your reliability on various dimensions. And, beyond narrow testimony, there is a general sort of testimony—a general revealing of the "jewels" of your mental state, as it were, vulnerable and fertile; a "third-party standpoint" that opens up group thought. I want to know your belief-and-action generators. I want to ask followup questions and see your statements evolve over time as the result of actual thinking.
The rest of this essay will elaborate this point by listing several examples/subcases/illustrations. But the single main point I want to communicate, "on one foot", is this: We care centrally about the thought process behind words—the mental states of the mind and agency that produced the words. If you publish LLM-generated text as though it were written by someone, then you're making me interact with nothing.
(This is an expanded version of this comment.)
Elaborations
Communication is for hearing from minds: LLM text is structurally, temporally, and socially flat, unlike human text.
- Structurally: there aren't live mental elements underlying the LLM text. So the specific thoughts in the specific text aren't revealing their underlying useful mental elements by the ways those elements refract through the specific thought.
- Temporally: there's no mind that is carrying out investigations.
- It won't correct itself, run experiments, mull over confusions and contradictions, gain new relevant information, slowly do algorithmically-rich search for relevant ideas, and so on. You can't watch the thought that was expressed in the text as it evolves over several texts, and you won't hear back about the thought as it progresses.
- The specific tensions within the thought are not communicating the local, contextual demands of that specific thought back to the concepts that expressed the more global context sitting in the background of the specific thought.
- Socially: You can't interrogate the thought, you can't enforce norms on the thinker, and there is no thinker who is sensitive to emergent group epistemic effects of its translations from thought to words. There is no thinker who has integrity, and there is no thinker with which to co-construct new suitable concepts and shared intentions/visions.
This could have been an email, er, a prompt.
- Why LLM it up? Just give me the prompt.
- When you publish something, I want you to be asserting "this is on some reasonable frontier of what I could write given the effort it would take and the importance of the topic, indicating what I believe to be true and good given the presumed shared context". It's not plausible that LLM text meets that definition.
- If the LLM text contains surprising stuff, and you didn't thoroughly investigate for yourself, then you don't know it's correct to a sufficient degree that you should post it. Just stop.
- If the LLM text contains surprising stuff, and you DID thoroughly investigate for yourself, then you obviously can write something much better and more interesting. Just stream-of-consciousness the most interesting stuff you learned / the most interesting ideas you have after investigating. I promise it will be more fun for everyone involved.
- If the LLM text does not contain surprising stuff, why do you think you should post it?
We have to listen to each other's utterances as assertions.
- We have to defer to each other about many questions, which has pluses and minuses.
- Most statements we hear from each other are somewhere between kinda difficult and very difficult for us to verify independently for ourselves. This includes for example expert opinions, expert familiarity with obscure observations and third-hand testimony, personal stories, and personal introspection.
- It's valuable to get information from each other. But also that means we're vulnerable to other people deciding to lie, distort, deceive, mislead, filter evidence, frame, Russell conjugate, misemphasize, etc.
- When someone utters a propositional sentence, ze is not just making an utterance; ze is making an assertion. This involves a complex mental context for what "making a propositional assertion" even is—it involves the whole machinery of having words, concepts, propositions, predictive and manipulative bindings between concepts and sense organs and actuators and higher-order regularities, the general context of an agent trying to cope with the world and therefore struggling to have mental elements that help with coping, and so on.
- When ze asserts X, ze is saying "The terms that I've used in X mean roughly what you think they mean, as you've been using those terms; and if you try (maybe by asking me followup questions), then you can refine your understanding of those terms enough to grasp what I'm saying when I say X; X is relevant in our current shared context, e.g. helpful for some task we're trying to do or interesting on general grounds of curiosity or it's something you expressed wanting to know; X is roughly representative of my true views on the things X talks about; I believe X for good reason, which is to say that my belief in X comes from a process which one would reasonably expect to generally produce good and true statements, e.g. through updating on evidence and resolving contradictions, and this process will continue in the future if you want to interact with my assertion of X; my saying X is in accordance with a suitable group-epistemic stance; ...".
- In short, "this is a good thing for me to say right now".
- Which generally but not always implies that you believe it is true,
- generally but not always implies that you believe it is useful,
- generally but not always implies you believe that I will be able to process the assertion of X in a beneficial way,
- and so on.
Because we have to listen to each other's utterances as assertions, it is demanded of us that when we make utterances for others to listen to, we have to make those utterances be assertions.
If you wouldn't slash someone's tires, you shouldn't tell them false things.
If you wouldn't buy crypto on hype cycles, then you shouldn't share viral news. I learned this the hard way:
- I saw a random news article sharing the exciting, fascinating news that the Voynich manuscript has been decoded! Then my more sober and/or informed friend was bafflingly uninterested. Thus I learned that not only had the Voynich manuscript been decoded just that week, but also it had been decoded a month before, and two months before, and a dozen other times.
- Several times, people shared news like "AI just did X!" and it's basically always either BS, or mostly BS and kinda interesting but doesn't imply what the sharer said.
- I shared the recent report about lead in food supplements without checking for context (the context being that the lead levels are actually fine, despite the scary red graphs).
In the introduction, I used the example of two identical texts. But in real life the texts aren't even identical.
- The choice of words, phrases, sentence structure, argument structure, connecting signposts, emphasis—all these things reveal how you're thinking of things, and transmit subtleties of the power of your mental gears. The high level pseudo-equivalence of "an LLM can't tell the difference" does not screen off the underlying world models and values! The actual words in LLM text are bad—e.g. frequent use of vague words which, like a stable-diffusion image, kinda make sense if your eyes are glazed over but are meaningless slop if you think about them more sharply.
- maybe you think that's a small difference. i think you're wrong, but also consider this... if it is small, then the total effect is small times 100 or 1000. i sometimes used to write in public without capitalizing words in the standard sentence-initial way. my reasoning was that if i could save a tiny bit on the cognitive load of chording with the shift key, then i could have the important thoughts more quickly and thoroughly and successfully, and that was more important than some very slight difference in reading experience. i still usually write like that in private communications, but generally in public i use capitalization. it makes it a bit easier to parse visually, e.g. to find the beginning of sentences, or know when you've reached the end of a sentence rather than seeing etc. and not knowing if a new sentence just started. that difference makes a difference if the text is read by 100 or 1000 people. are you seriously going to say that all the word choice and other little choices matter less than Doing This Shit? all text worth reading is bespoke, artisanal, one-shot, free-range, natural-grown, slow-dried, painstaking, occult, unpredictable, kaleidoscopic, steganographic—human. we should be exercising our linguistic skills.
- Writing makes you think of more stuff. You get practice thinking the thought more clearly and easily, and rearranging it so that others understand it accurately. At least my overwhelming experience is that writing always causes a bunch of new thoughts. Generating a video that depicts your AI lookalike exercising is not the same as you actually exercising, lol. Putting forth a topic in public without even doing this exercise regarding that topic is a kind of misdirection and decadent laziness, as if the public is supposed to go fill in the blanks of your farted-out notions. Verification is far from production, and you weren't even verifying.
- You can't make a text present propositions that are more true or good just by thinking about the text more and then keeping it the same no matter what. However, you can make a text more true or good just by thinking about it, if you would change the text if you thought of changes you should make. In practice if you do this, then you will change your LLM text a lot, because LLM text sucks. The more you change it, the less my objection applies, quantitatively.
If you're asking a human about some even mildly specialized topic, like history of Spain in the 17th century or different crop rotation methods or ordinary differential equations, and there's no special reason that they really want to appear like they know what they're talking about, they'll generally just say "IDK". LLMs are much less like that. This is a big difference in practice, at least in the domains I've tried (reproductive biology). LLMs routinely give misleading / false / out-of-date / vague-but-deceptively-satiating summaries.
In order to make our utterances be assertions, we have to open them up to inquiry.
- LLM text is not open to inquiry.
- When you're making an assertion, we need you to be staking some of your reputation on the assertion.
- ("We" isn't a unified consensus group; but rather a set of other individuals, and some quasi-coherent subsets.)
- We're going to track whether your assertions are good and true. We might track separately for different domains and different modalities (e.g. if you prefaced by "this is just a guess but", or if you're in a silly jokey mood, and so on). We will credit you when you've been correct, and we will lean on that credit when we are pressed for time (which is always). We will discount your testimony when you've been incorrect or poisonous. We will track this personally for you.
- If that sounds arcane, consider that you do it all the time. There are people you'd trust about math, other people you'd trust about wisely dealing with emotions, other people you'd trust about taking good pictures, and so on.
- You can't go back and say "oh I didn't mean for you to take this seriously", if that wasn't reasonably understood within the context. You can say "oops I made a mistake" or "yeah I happened to give a likelihood delta from the consensus probabilities that wasn't in the direction of what ended up being the case".
- But you are lying in your accounting books if you try to discount the seriousness of your assertions when you're proven incorrect. E.g. if you try to discount the seriousness by saying "oops I was just poasting LLM slop haha". It's not a serious way of communicating. It's like searching for academic papers by skimming the abstracts until you find an abstract that glosses the paper's claims in a vague way that's sorta consistent with what you want to assert, and then citing that paper. It's contentless, except for the anti-content of trying to appear contentful.
- We might want to cross-examine you, like in a courtroom. We want to clarify parts that are unclear. We want to test the coherence of the world-perspective you're representing. We want to coordinate your testimony with the testimony of others, and/or find contradictions with the testimony of others.
- We want to trace chains of multiple steps of inference back to the root testimony.
- If David judges that Alice should be ostracized, on account of Carol saying that Alice is a liar and on account of Bob saying that Alice is a cheat, but Carol and Bob are each separately relying on testimony from Eve about Alice, then this is a fact we would like to track (see the sketch after this list).
- We want to notice contradictions between different testimonies and then bring the original sources into contact with each other. Then they can debate; or clarify terms and reconcile; or share information and ideas and update; or be disproven; or reveal a true confusion / mystery / paradox; or be revealed as a liar. Even if one assertion in isolation can't be decided, we want to notice when one person contradicts others in several contexts (which may be the result of especially good behavior or especially bad behavior, depending).
- We want to avoid miasmatic pollution, i.e. unsourced claims in the water.
- Unsourced claims masquerade as consensus, and nudge the practical consensus, thus destroying the binding between the practical consensus and the epistemic consensus. Instead, we want claims made by people saying "yes I've seen this, I'm informing you by making this utterance where you'll take it as an assertion".
- We don't want people repeating mere summaries of consensus. This leads to a muffling of the gradient of understanding and usefulness. Think of an overcooked scientific review paper that cites everything and compresses nothing. It's all true and it's all useless. Also it isn't even all true, because if you aren't thinking then you aren't noticing other people's false testimony that you're repeating.
- LLM text is made of unsourced quasi-consensus glosses.
- The accused has some sort of moral right to face zer accuser, and to protection from hearsay, and to have accusers testify under penalty of perjury.
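To make "tracking" slightly more concrete: below is a minimal, purely illustrative sketch in Python. The names, the relies_on mapping, and the root_sources helper are hypothetical, invented for this example rather than taken from any real tool; the point is only that a chain of hearsay can be recorded and traced back to its root sources.

```python
# Illustrative sketch (hypothetical, not any real system): record
# "X asserted this, relying on Y and Z" as edges, then trace every
# chain of hearsay back to the people who cited no one.

def root_sources(relies_on, person):
    """Follow 'relies on' edges from `person` until reaching people who cite no one."""
    roots, stack, seen = set(), [person], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        cited = relies_on.get(current, [])
        if not cited:
            roots.add(current)  # original testimony, not hearsay
        else:
            stack.extend(cited)
    return roots

# The scenario above: David relies on Carol and Bob, each of whom is
# separately relying on Eve.
relies_on = {
    "David": ["Carol", "Bob"],
    "Carol": ["Eve"],
    "Bob": ["Eve"],
    "Eve": [],
}

print(root_sources(relies_on, "David"))  # {'Eve'}
```

Two accusations collapse into a single root source, which is exactly the kind of fact the group would want to notice before anyone gets ostracized.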
In order to make our utterances be useful assertions that participate in ongoing world co-creation with our listeners, we have to open up the generators of the assertions.
- LLM text tends to be less surprising—more correlated with what's already out there, by construction.
- This is the case in every way. That hides and muffles the shining-through of the human "author"'s internal mental state.
- We want the uncorrelatedness; the socially-local pockets of theoryweaving (groundswell babble—gathering information, hypothesizing ideas and propositions) and theorycrafting ("given enough eyeballs, all bugs are shallow"—testing predictions, resolving contradictions, retuning theories, selecting between theories).
- If you speak in LLM, we cannot see what you are thinking, how you are thinking it, how you came to think that way, what you're wanting, what possibilities we might be able to join with you about, and what procedures we could follow to usefully interoperate with you.
- I want you to generate the text you publish under your own power, by cranking the concepts as you have them in your head, so I can see those gears working, including their missing teeth and rough edges and the grit between them—and also the mechanical advantage that your specific arrangement has for the specific task you have been fiddling with for the past month or decade, because you have applied your full actual human general intelligence to some idiosyncratic problem and have created a kinda novel arrangement of kinda novel-shaped gears.
- I want to be able to ask you followup questions. I want to be able to ask you for examples, for definitions, for clarifications; I want to ask you for other possibilities you considered and discarded; I want to ask you what you're still confused/unsure about, what you're going to think about, what you're least and most confident in, where you think there's room for productive information gathering.
- Sometimes when people see something interesting and true, they struggle to express it clearly. I still want that text! Text that is literally incorrect but is the result of a human mind struggling to express something interesting / useful / novel / true, is still very useful, because I might be able to figure out what you meant, in combination with further information and thinking. LLM text throws all that stuff out.
- I want to figure out what some of your goals / visions are, so we can find shared intentions. This process is difficult and works through oblique anastomosis, not by making an explicit point that you typed into a prompt for the LLM to ensloppenate.
- Stop trying to trick me into thinking you know what the fuck you're talking about.
- Non-testimony doesn't have to be responded to, in case it turns out to be trolling: cheaply produced discourse-like utterances without a model behind them, aimed at jamming your attentional pathways and humiliating you by having you run around chasing ghosts.
A sentence written by an LLM is said by no one, to no one, for no reason, with no agentic mental state behind it, with no assertor to participate in the ongoing world co-creation that assertions are usually supposed to be part of.