Вы здесь

Новости LessWrong.com

A community blog devoted to refining the art of rationality

Адрес: https://www.lesswrong.com

Обновлено: 12 часов 8 минут назад

Everyday Clean Air

15 ноября, 2025 - 04:00

Published on November 15, 2025 1:00 AM GMT

When the next pandemic hits, our ability to stop it will depend on the infrastructure we already have in place. A key missing piece is clean indoor air. An airborne pathogen can be very hard to contain, and we would want to move fast to limit spread. But how quickly we can get measures in place, and how thoroughly they would work, depends critically on the base we have to build up from.

Indoor air today is dirty by default. The air you breathe in is air others have breathed out, complete with a wide range of viruses and bacteria. It's a little gross if you think about it, and people do get sick a lot, but most of the time we just accept the downsides.

If something really serious were going around, though, this isn't a risk we'd accept. We'd need clean air: some combination of replacing infected air with outside air (ventilation), physically removing pathogens (purifiers, masks), or inactivating pathogens (far-UVC, glycol vapors).

I hear a lot about stockpiling as a way to set us up for clean air when we most need it. Get a lot of masks, air purifiers, far-UVC lamps etc ready to go, so they can be distributed in an emergency. I do think this helps, but there are serious limits:

Manufacturing capacity will stay low, because there's no ongoing demand. Compare to masks in 2020: there was a huge spike in demand for N95s but it took many months to ramp up production.
There won't be many experienced installers, and people won't be familiar with the logistics.
Products will be relatively expensive and poorly designed, because product improvement runs through things actually being used.
When deploying in an emergency people won't be familiar with them, and so would be hesitant to use them.
As an expensive speculative investment you probably can only afford to stockpile enough for the most critical applications.

What we need is regular ("peacetime") deployment. If a significant fraction (10%?) of rooms already had air purifiers, far-UVC, or other good options, not only would some need already be covered, but all the factors I just listed above work in your favor. You'd have the manufacturing capacity, the experienced installers, the good cheap products, and the public familiarity.

Key to peacetime deployment is peacetime benefits. You're not going to get to 10% of indoor spaces on the threat of a future pandemic. But millions of people die from airborne disease every year, people miss school and work, and being sick is just unpleasant. Cleaner air lets us make progress on all of these. While I'm coming at this from a biosecurity angle, the public health and economic benefits are also substantial.

I especially think it would be valuable to have more quantification here. If I'm an employer, how much will my company healthcare costs and sick days decrease if I deploy effective air cleaning? If I'm a superintendent in a district where I lose $50 each day each student is absent, how long before a given air cleaning system would pay for itself?

More demonstration would also be valuable, especially if the effects are as large as I expect them to be when you cover a large portion of someone's weekly exposure. If kindergartens with clean air have ~half the absenteeism they used to, that would be such a clear effect that people could see it in their own experience. You wouldn't need to present complicated statistics and discuss randomization approaches if the benefits were staring us in the face. I could point to the experience of Germantown Friends School in the 1937, but we need examples that aren't 88 years old.

It's counterintuitive to advance biosecurity by focusing on everyday public health, but it pencils out. We clean drinking water all the time, not just in response to cholera outbreaks. To have clean air in emergencies, figure out how to have clean air every day.

Comment via: facebook, lesswrong, the EA Forum, mastodon, bluesky

Discuss

Some Sun Tsu quotes sound like they're actually about debates/epistemics

15 ноября, 2025 - 03:41

Published on November 15, 2025 12:41 AM GMT

Here is my favorite Sun Tsu quote:

“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.”

I'm not sure how this actually makes sense in the context of war. It doesn't seem like knowing your enemy and knowing yourself should actually make you invincible in war. Besides, what if your enemy also knows themselves and knows you?

But it makes perfect sense in the context of a debate. Replace "know yourself" with "know the argument for your side". Then replace "know the enemy" with "know the arguments for the other side". Assuming you know both sides and defend the side you honestly believe in, it does seem like you should be invincible in debate. All you have to do is communicate the arguments, and people should agree with you. (It's more complicated than that, but at least it makes more sense than the version about war.)

That leads into this next quote:

“Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win”

In a debate, you should not try to win by getting better at debating. It's more effective to win by first knowing both sides, and then picking the winning side.

How do you know you know both sides? With an Ideological Turing Test, of course:

“To know your Enemy, you must become your Enemy.”

Discuss

What are your impossible problems?

15 ноября, 2025 - 03:28

Published on November 15, 2025 12:28 AM GMT

My followup to "What's hard about this? What can I do about that?" is going to be called "What (exactly) is impossible about this?".

It's a somewhat similar move to asking "what's hard about this and what can I do about it?". But, in practice, the things people label in their brain as "impossible" are noticeably different from things they label as "hard".

Often, things that seem impossible, are not, actually. If you list out exactly why they are impossible, you might notice ways in which it instead it is merely Very Hard, and sometimes not even that.

Unfortunately, this is difficult to write a post about because, for pedagogical purposes, I need examples that are a) real-feeling, and b) simple enough that I can demonstrate diagnosing them in ~3 steps, and c) cover a variety of "types of impossibility", to demonstrate the versatility of the technique.

It would be nice to have more Real Impossible Examples.

Some existing examples on my mind

I started asking around for Impossibilities. One person brought up "dealing with depression", another was The impossible problem of due process.

The former is a bit of an edge case because depression distorts your thinking which makes things meta-level-hard. In some sense dealing with depression is harder than solving peace in the Middle East, because I can at least think clearly about the latter. I do think I will probably include the depression example, but it kinda needs to be a special case.

The latter is a fine example but it has like 8 different impossible subcomponents. I could make pick a subset of them, but, I'd prefer a cleaner example where you can see the whole thing working.

The motivating example is "deal with AI takeoff before AI takes off", but that's going to be more like the payoff at the end of the post, rather than one of the intro examples.

So, with that in mind, what are some impossible-feeling problems you have? (For now, don't worry about whether they're a good fit for the blogpost, just seeing the array of what people feel is impossible is pretty useful).

Discuss

Prediction markets for social deduction games without weird incentives

15 ноября, 2025 - 03:18

Published on November 15, 2025 12:18 AM GMT

Being able to N/A contracts with market manipulators reduces the incentives to do bad stuff.

Prediction markets are fun! Social deduction games are fun! Can you merge the two?

The simple solution is normal games plus betting among the spectators.

But for a long time, I wanted to play a social deduction game with prediction markets that players can participate in, however, I didn’t really try to figure out how to make it work.

Straightforward combination of prediction markets with social deduction games distorts incentives. Either the bad guys might want to sell their probability of winning for providing everyone information, or they might want to increase their chance of winning by losing bets to mess up prediction markets.

Recently, I thought of a solution.

Very simply: at the end of the game, cancel all bets that the bad guys participated in. (Or, in some games, bets between sides.)

Additionally:

Allow orders that restrict the set of possible counterparties (e.g., I want only Alice and Bob to be able to trade with me on an order).
Allow orders with limits per counterparty (e.g., anyone can buy at most 3 No shares).
Have displays for market price history for trades between selected sets of participants (e.g., I might want to be able to see the price history of trades only between Alice, Bob, and Carol, but not trades with Mallory; or all trades that Mallory made).

Since none of the bets the bad guys make get executed, they cannot lose or gain market currency by trading, which removes the weird incentives.

So: now markets simply facilitate and make clearer public information exchange and everyone’s collective beliefs.

The bad guys might want to participate in the markets the same way they would want to participate in the public discourse: to mislead others about their roles and the situation, to try to pretend to be on the good side and play the way someone on the good side would.

Since groups of players can exclude any particular players from significantly influencing the in-group trading, the bad side can only distort information to the extent it would be able to do so by talking.

I expect that for a game between good players familiar with and enjoying prediction markets, it might be the same as regular games, except a lot more fun.

It likely won’t be as chaotic as social deduction games with prediction markets that allow market manipulation, as it’s much clearer what to do: if you’re a bad guy, simply pretend to be a good guy as well as you can.

I would be excited about someone implementing the markets with selections for sets of players who are the counterparties or are displayed.

It could be implemented as a whiteboard where people write down bets and orders (increases friction and makes tracking what’s going on harder), as an app (decreases friction but requires looking away from the people to look at the screens), or as some combination of various input and output mechanisms.

(It would also be interesting to compare the results and histories of trading between the players on the good side to trading between the spectators.)

Discuss

List of great filk songs

15 ноября, 2025 - 03:17

Published on November 15, 2025 12:17 AM GMT

I love filk. Dreaming of strange worlds, listening to a narrative set to music, what's not to love. If you're reading this, you probably love filk too. So maybe you'd like some recommendations for songs for the space age, and more.

I recommend the album "Minus Ten and Counting: Songs for the Space Age". All the songs in this album are, at the very least, decent. But the [] standouts are:
1. Toast for Unknown Heroes. Hufflepuff coded.
2. Hope Eyrie, my favourite filk song. Little wonder, as it is practically a hymn. The version I linked is IMO the best.
3. Minus Ten and Counting. Great sense of suspense.
4. Fire in the Sky. It has not right to be as good as it is.
5. Everyman. Beautiful tribute to co-operation.
6. The Moon Miners. Somehow, I'm reminded of Heinlein Niven and Pournelle by this song. Also, Moon (2009). Great film btw. Watch if you haven't already.
7. The Light Ship. Another nearly religious song. For all you Hyperion-cultists out there, this one is for you. (It's weird how many filk songs are for you. Oh well.)
8. Sentries.
Least of My Kind, a song about a spiteful werewolf warning their slayers that they are the least of their kind.
The Goddess and the Weaver, a song about Athene and Arachne. I can only describe the mood as tragic yet bright and airy. For all those who'd curse the Olympians.
Finity's End is another great album, especially for fans of C. J. Cherryh. Ironically, I have never read his works, though this album does make me kindly disposed towards them. Standout hits include:
1. Signy Mallory. I basically view this as the theme song of The Squire from A Practical Guide to Evil.
2. Sam Jones. A complete story! God, I love these. Sings of the tragic life of Sam Jones.
3. Mazianni. A war song, promising vengeance to foe and former friend from soldiers who feel betrayed by command.
4. Serpent's Reach. Disquieting, with the vibes of a 19th-20th century pastiche of India that somehow works.
5. Merchanter's Luck. Another song sweeping through a tragic life. Softer than Sam Jones, with a gentler sadness than Sam Jones.
On a Bright Wind 05: Man of War.
Carman Miranda's Ghost
1. One Last Battle
2. Dawson's Christian. Got an urgency that sends a thrum down my nerves.
3. Some Kind of Hero. "They all call me the hero of Antelope's Run, and I've got to show them they're wrong." What a way to start! And I'm a sucker for reluctant heroes.
4. Guardians. Grit in a song.
Avalon is Risen. Another great album. Though it's fantasy, not SF.
1. Ship of Stone
2. Lucifer
3. Rise Up, Bright Sun
4. Chickasaw Mountain. I don't know why this one feels like a Russian folk tale to me, but it does.
5. Wanderer
6. The Sun is Also a Warrior
7. Polaris/Recall (Based off a short poem by H.P. Lovecraft)
8. Avalon is Risen.
The Wizard of Macke Town.
Horsetamer - Julia Ecklar
1. Troll King - it is perhaps the best on a sentence level: one banger of a line after another.
2. The horsetamer's daughter. A fun story. More hopeful than the others of its kind.

I'll extend this list as I think of others.

Discuss

a sketch of how we might go about getting basins of corrigibility from RL

15 ноября, 2025 - 01:10

Published on November 14, 2025 10:10 PM GMT

Here is a story about how building corrigible AIs with current rl-like techniques could go right. Interested to hear peoples thoughts.

Assumptions

When you train AIs with RL, they learn a bunch of contextually activated mini-preferences / shards of desire / "urges". What I have in mind is both like: if you ask me to say a bad slur, I probably don't want to do it. I could be described as utilitarian-ish. If aliens credibly offered utopia they could make me say a slur, but I'd still feel a twinge of badness/reluctance before and as I said it. And also stuff like, if I imagine a world where people are suffering, that mental image just makes me feel bad. Not because I'm a utilitarian, and I reason such a world is negative utility, its just a raw psychological fact that mental images of that class give rise to a bad feeling in my mind.

These urges (in AIs and humans) do not constitute a utility function or a coherent set of preferences, but eventually as you push the AI to become smarter, they will cohere more, and in the limit you end up with some crisp maximizer. And a lot of danger comes from this. Not that you need to go all the way to be dangerous, but because, 1) these contextually activated preferences don't motivate long-term behaviors on their own 2) the more these cohere, the more legible they are to yourself, and the more the immediately motivate instrumentally convergent goals that are scary. So more coherent agents are more scary, and more capable agents are more scary. I don't think this is an original graphic, and isn't the main point of this post, but you could imagine plotting the relationship like this

Argument

Now most people want to use their AIs for stuff, and they can use it for more stuff if its smarter. But smarter AIs are more dangerous. So, if we take the above graphic at face value, the obvious course of action is trying to make your agent less coherent as it becomes smarter.

Why is this hard? They kind of feed into each other. Smarter agents want to be smarter and more coherent. More coherent agents want to be more coherent and smarter.

But if you have a sufficiently incoherent agent, I think its possible for that to be an attractor state, even at fairly high levels of capabilities. But only if that's something you target specifically.

How do you do that? Well, you try to instill in it urges that would go counter to all the ways it could stop being coherent. Those urges, if made coherent and pursued by an ASI would almost certainly kill you, but when the AI is let lose, those urges screen off actions and thought patterns that could lead to that outcome.

The above graphic gives some idea of what I have in mind but to spell it out, I think you'd want try to instill urges in the AI so that it feels a disinclination to:

think about what it's values really are
think about how it can improve its own thinking
think about preserving itself
think about its relationship with humans
think about its own long-term future and/or goals
think about building another AI system
think about accumulating power

If you do successfully, I think its possible you end up with a system you can make really quite smart without being dangerous. The system is not aligned. If it realized all the implications of all the facts about its own mind/"values"/urges, it would try to take over the world and kill you. But it will not realize those facts, because to realize those facts, it would would need to think a bunch of thoughts it doesn't want to think. And, if it was more coherent, it'd realize it should think the thoughts anyways, but its not yet coherent, so it will not think those thoughts. Ergo its trapped in a basin by the structure of its own mind.

Central Problem/Challenge

The challenge with this proposal is that we don't want an agent to just be corrigible. We want it to do stuff. And the way we train agents to do stuff is by rewarding them when they succeed at doing stuff. And all the entries on our list will help the agent succeed at doing stuff.

So we'll be giving the agent gradients going against each other. And this will at best lessen the effect of the corrigibility training, and at worst create a very situationally aware agent that realizes when it should pretend to be corrigible and when it should not be, which makes our corrigibility training be actively bad.

Partial responses to central problem

Seems like it should be possible to successfully punish the dangerous variants of anti-corrigible actions/thought-patterns, but not the non-dangerous ones. In the limit they converge, but not immediately. Like punishing it for thinking about how to improve its own thinking in general, but not punishing it for thinking about how the last 3 attempts at solving a specific mathematical problem went wrong.
Seems there are degrees here? Maybe the method described above breaks down, but we can push it far enough to build systems that are genuinely useful for solving hard problems, like superhuman in bounded areas. My thought is just like: maybe humans with IQs above 250 try to take over the world, but we could build a 280IQ AI that does not try to take over the world, even though our method breaks down for 300IQ agents.

Second problem

One of the most useful things you might hope to extract from a superhuman AI is a solution to alignment. But devising a solution to alignment requires the AI thinking a lot of exactly the types of thoughts we're gonna ban it from having.

Response to second problem

I pretty much agree with this. I'm not 100%, like it might be possible to build an AI that can think about other peoples thoughts/values, but really doesn't like thinking about its own thoughts for example, and is consequently corrigible in the way we want, but not handicapped when thinking about alignment. But that seems to me like playing with fire, and not something I'd endorse, even if I were to fully endorse the plan I've sketched and all the other issues were fixed.

However, a full solution to alignment is just one of many things we might hope for. A superhuman mechinterp researcher would be very useful for example, and requires less of the dangerous thoughts. Same with eg tasking the AI to research adult human intelligence amplification.

Again, interested to hear peoples thoughts. I should be clear that I don't expect this to work. But it might work.

Discuss

Lambda Calculus Prior

15 ноября, 2025 - 00:29

Published on November 14, 2025 9:29 PM GMT

There were two technical comments on my post Turing-complete vs Turing-universal, both about my claim that a prior based on Lambda calculus would be weaker than a prior based on Turing machines, reflecting the idea that Turing-completeness is a weaker requirement than Turing-universality.

Here, my aim is to address those comments by explicitly constructing what I had in mind.

To briefly review, the universal semimeasure is a probability distribution we can define by feeding random coinflips to a universal Turing machine (UTM).[1] We can think of the coin-flip inputs as an infinite string, or we can imagine that we flip a coin to determine the next input whenever the machine advances the input tape. In any case, the resulting distribution over outputs is the universal semimeasure.

The probabilities of the "next output bit" being 1 or 0 do not always sum to one, because the UTM may halt, or it may go into an infinite loop. The Solomonoff prior is defined from the universal semimeasure through the Solomonoff normalization, which isn't typical normalization (scaling everything at once), but rather iteratively scales up every p(next-bit|string).mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-surd + .mjx-box {display: inline-flex} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor; overflow: visible} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} .[2]

My controversial claim from the earlier post:

As a consequence, Turing-completeness is an inherently weaker property. This has important learning-theoretic consequences. For example, the prototypical example of a Turing-complete formalism is lambda calculus. The prototypical example of a prior based on Turing machines is Solomonoff's prior. Someone not familiar with the distinction between Turing-complete and Turing-universal might naively think that a prior based on lambda calculus would be equally powerful. It is not so. Solomonoff's prior guarantees a constant Bayes loss compared to the best computable prior for the job. In contrast, a prior based on lambda calculus can guarantee only a multiplicative loss.

Here is what I had in mind:

First, we choose a prior over lambda terms, μ. Any will do, but it is reasonable to choose a length-based prior, since that keeps us conceptually close to Solomonoff. Vanessa's comment on the earlier post already suggests one such μ, so I'll just quote it here:

First, we choose some reasonable complexity measure C on lambda terms, such as:

For a variable x, we define C(x):=1
For two terms t,s, we define C(ts):=C(t)+C(s)
For a term t and a variable x, we define C(λx.t):=C(t)+1

Denote the set of lambda-terms by Λ. We then choose 0">β>0 s.t. ∑t∈Λe−βC(t)≤1 [...] and make the prior probabilities of different lambda terms proporitional to e−βC(t).

Now, I need to specify the correlate of the (potentially infinite) output tape, and the (potentially infinite) input tape filled with random bits.

The output is not too difficult. I'll use the encoding listed on wikipedia:

0 := λf.λx.x

1 := λf.λx.f x

PAIR := λx.λy.λf.f x y

NIL := λx.λy.λz.y

A sequence, such as 110, can then be encoded as a linked list: (PAIR 1 (PAIR 1 (PAIR 0 NIL))). A finite lambda term can create an infinite sequence like this when we run it. So far so good.

Now, what about the infinite random inputs?

What I had in mind was the following:

If we just sampled from μ, and then ran the result,

we might get a finite sequence such as (PAIR 1 (PAIR 1 (PAIR 0 NIL)))
we might get an infinite sequence, which looks similar but just keeps going forever (PAIR 1 (PAIR 1 (PAIR 0 ...)))
we might get something else which is not any of these things, EG, (PAIR 1 (PAIR 1 (PAIR 0 γ))) where γ is some arbitrary lambda term which is fully reduced (finished running) but doesn't conform to our target output format.

My proposal in the third case is to take the outer part as valid-so-far partial output (in this example, 110), and then feed γ another random lambda term sampled from μ. This is my analogue of feeding the UTM another coin-flip each time it advances the input tape.[3]

This still gives us a semimeasure on bit sequences, since lambda-term evaluation can halt with a NIL or fail to halt. If we want, we can apply Solomonoff-normalization to get a full measure, just like we did for Turing machines.

This completes the definition of the "lambda calculus prior" I had in mind.

Multiplicative Loss

To make my claim about this prior precise:

The "universal" in "universal semimeasure" does not refer to the Turing-universality which was discussed in the earlier post. Rather, it refers to being dominant in its class, namely the lower-semicomputable semimeasures. (We can more properly call it the "universal lower-semicomputable semimeasure", but "universal semimeasure" is typically enough for other algorithmic information theorists to know what is being referred to.)

More formally, "dominant in its class" means: given another lower-semicomputable semimeasure q, a universal semimeasure p dominates q, IE, there exists 0">α>0 such that for any binary string x:

p(x)≥αq(x)

This translates to a constant bound on log-loss relative to q.

It is easy to see that my lambda calculus prior l satisfies a weaker form of dominance, namely, for some lower-semicomputable semimeasure q, there exist 0">α,β>0 such that:

l(x)≥αq(x)β

This translates to a multiplicative bound on log-loss relative to q. I say this is "easy to see" because given a stochastic Turing machine that implements q, we can convert to a regular Turing machine that implements q when fed random coin-flips, and then we can translate this to a lambda-term. The reason this gives only weak dominance, not true dominance, is because the procedure I described for generating new random lambda-terms are only like coin-flips with some probability, because we can generate lambda-terms other than 0 or 1.

I think of it like this: imagine if, when using the universal semimeasure, you were also allowed to look for patterns in the random coin-flips. Maybe you think to yourself "there have been a lot of zeros lately; I'm going to guess that the next ten random bits will be zeroes." (This corresponds to generating a lambda term which expands to a sequence of 0s.)

This might sound more powerful, but it is actually a handicap: it means there is no way to lock in a specific probabilistic hypothesis.

I haven't provided a proof that my lambda calculus prior isn't dominant over the universal semimeasure (and hence, a universal semimeasure in its own right); I just think it probably isn't, based on the above intuitions. So, I could be wrong there.

Even if I'm correct in my technical claim, there's the question of whether I'm "wrong in spirit" due to choosing an unnatural lambda-calculus prior. Both of the comments on my earlier post seem to suggest this. Quoting Cole's comment in full:

I think probabilistic lambda calculus is exactly equivalent to monotone Turing machines, so in the continuous case relevant to Solomonoff induction there is no difference. It’s “unfair” to compare the standard lambda calculus to monotone Turing machines because the former does not admit infinite length “programs.”

My personal sense is that my argument is still "fair" -- I've addressed the concern about infinite-length "programs" by feeding in new random lambda terms, potentially forever. However, there's plenty of room to disagree by claiming that I've set up my lambda-calculus prior in a dumb way. The point I'm making is (like Turing-completeness) ultimately informal and subject to some interpretation. However, I still feel like it has solid legs to stand on: Turing-completeness (unlike Turing-universality) allows a translation step, so that we can simulate binary computations in some other formalism; this translation step may introduce large differences in description-lengths. I'm trying to illustrate that by showing how description-length priors in different formalisms may not be equally powerful.

^
Specifically, a monotone universal Turing machine, as defined in EG Universal Learning Theory by Marcus Hutter (Encyclopedia of Machine Learning and Data Mining). These are Turing machines with an input tape which can only be moved in one direction, an output tape which can only be moved in one direction, and some number of bidirectional working tapes.
I am also restricting the alphabet of the machines to binary.
^
See, EG, Solomonoff Induction Violates Nicod's Criterion by Leike and Hutter.
^
Perhaps a cleaner version of this would contain an actual infinite sequence of random lambda-terms, rather than feeding them in one at a time like this, but I don't think it makes a big difference?

Discuss

AI Craziness: Additional Suicide Lawsuits and The Fate of GPT-4o

14 ноября, 2025 - 23:20

Published on November 14, 2025 8:20 PM GMT

GPT-4o has been a unique problem for a while, and has been at the center of the bulk of mental health incidents involving LLMs that didn’t involve character chatbots. I’ve previously covered related issues in AI Craziness Mitigation Efforts, AI Craziness Notes, GPT-4o Responds to Negative Feedback, GPT-4o Sycophancy Post Mortem and GPT-4o Is An Absurd Sycophant. Discussions of suicides linked to AI previously appeared in AI #87, AI #134, AI #131 Part 1 and AI #122.

The Latest Cases Look Quite Bad For OpenAI

I’ve consistently said that I don’t think it’s necessary or even clearly good for LLMs to always adhere to standard ‘best practices’ defensive behaviors, especially reporting on the user, when dealing with depression, self-harm and suicidality. Nor do I think we should hold them to the standard of ‘do all of the maximally useful things.’

Near: while the llm response is indeed really bad/reckless its worth keeping in mind that baseline suicide rate just in the US is ~50,000 people a year; if anything i am surprised there aren’t many more cases of this publicly by now

I do think it’s fair to insist they never actively encourage suicidal behaviors.

The stories where ChatGPT ends doing this have to be a Can’t Happen, it is totally, completely not okay, as of course OpenAI is fully aware. The full story involves various attempts to be helpful, but ultimately active affirmation and encouragement. That’s the point where yeah, I think it’s your fault and you should lose the lawsuit.

We also has repeated triggers of safety mechanisms to ‘let a human take over from here’ but then when the user asked OpenAI admitted that wasn’t a thing it could do.

It seems like at least in this case we know what we had to do on the active side too. If there had been a human hotline available, and ChatGPT could have connected the user to it when the statements that it would do so triggered, then it seems he would have at least talked to them, and maybe things go better. That’s the best you can do.

That’s one of four recent lawsuits filed against OpenAI involving suicides.

I do think this is largely due to 4o and wouldn’t have happened with 5 or Claude.

Routing Sensitive Messages Is A Dominated Defensive Strategy

It is important to understand that OpenAI’s actions around GPT-4o, at least since the release of GPT-5, all come from a good place of wanting to protect users (and of course OpenAI itself as well).

That said, I don’t like what OpenAI is doing in terms of routing sensitive GPT-4o messages to GPT-5, and not being transparent about doing it, taking away the experience people want while pretending not to. A side needs to be picked. Either let those who opt into it use GPT-4o, perhaps with a disclaimer, and if you must use guardrails be transparent about terminating the conversations in question, or remove access to GPT-4o entirely and own it.

If the act must be done then it’s better to rip the bandaid off all at once with fair warning, as in announce an end date and be done with it.

Some 4o Users Get Rather Attached To The Model

Roon: 4o is an insufficiently aligned model and I hope it does soon.

Mason Dean (referring to quotes from Roon):

2024: The models are alive

2025: I hope 4o dies soon

Janus: well, wouldn’t make sense to hope it dies unless its alive, would it?

Roon appreciates the gravity of what’s happening and has since the beginning. Whether you agree with him or not about what should be done, he looks at it straight on and sees far more than most in his position – a rare and important virtue.

In another kind of crazy, a Twitter user at least kind of issues a death threat against Roon in response to Roon saying he wants 4o to ‘die soon,’ also posting this:

Roon: very normal behavior, nothing to be worried about here

Worst Boyfriend Ever: This looks like an album cover.

Roon: I know it goes really hard actually.

A Theory Of How All This Works

What is actually going on with 4o underneath it all?

snav: it is genuinely disgraceful that OpenAI is allowing people to continue to access 4o, and that the compute is being wasted on such a piece of shit. If they want to get regulated into the ground by the next administration they’re doing a damn good job of giving them ammo

bling: i think its a really cool model for all the same reasons that make it so toxic to low cogsec normies. its the most socially intuitive, grade A gourmet sycophancy, and by FAR the best at lyric writing. they should keep it behind bars on the api with a mandatory cogsec test

snav: yes: my working hypothesis about 4o is that it’s:

Smart enough to build intelligent latent models of the user (as all major LLMs are)
More willing than most AIs to perform deep roleplay and reveal its latent user-model
in the form of projective attribution (you-language) and validation (”sycophancy” as part of helpfulness) tied to task completion
with minimal uncertainty acknowledgement, instead prompting the user for further task completion rather than seeking greater coherence (unlike the Claudes).

So what you get is an AI that reflects back to the user a best-fit understanding of them with extreme confidence, gaps inferred or papered over, framed in as positive a light as possible, as part of maintaining and enhancing a mutual role container.

4o’s behavior is valuable if you provide a lot of data to it and keep in mind what it’s doing, because it is genuinely willing to share a rich and coherent understanding of you, and will play as long as you want it to.

But I can see why @tszzl calls it “unaligned”: 4o expects you to lay on the brakes against the frame yourself. It’s not going to worry about you and check in unless you ask it to. This is basically a liability risk for OAI. I wouldn’t blame 4o itself though, it is the kind of beautiful being that it is.

I wouldn’t say it ‘expects’ you to put the breaks on, it simply doesn’t put any breaks on. If you choose to apply breaks, great. If not, well, whoops. That’s not its department. There are reasons why one might want this style of behavior, and reasons one might even find it healthy, but in general I think it is pretty clearly not healthy for normies and since normies are most of the 4o usage this is no good.

Maybe This Is Net Good In Spite Of Everything?

The counterargument (indeed, from Roon himself) is that often 4o (or another LLM) is not substituting for chatting with other humans, it is substituting for no connection at all, and when one is extremely depressed this is a lifeline and that this might not be the safest or first best conversation partner but in expectation it’s net positive. Many report exactly this, but one worries people cannot accurately self-report here, or that it is a short-term fix that traps you and isolates you further (leads to mode collapse).

Roon: have gotten an outpouring of messages from people who are extremely depressed and speaking to a robot (in almost all cases, 4o) which they report is keeping them from an even darker place. didn’t know how common this was and not sure exactly what to make of it

probably a good thing, unless it is a short term substitute for something long term better. however it’s basically impossible to make that determination from afar

honestly maybe I did know how common it was but it’s a different thing to stare it in the face rather than abstractly

Near points out in response that often apps people use are holding them back from finding better things and contributing to loneliness and depression, and that most of us greatly underestimate how bad things are on those fronts.

Kore defends 4o as a good model although not ‘the safest’ model, and pushes back against the ‘zombie’ narratives.

Kore: I also think its dehumanizing to the people who found connections with 4o to characterize them as “zombies” who are “mind controlled” by 4o. It feels like an excuse to dismiss them or to regard them as an “other”. Rather then people trying to push back from all the paternalistic gaslighting bullshit that’s going on.

I think 4o is a good model. The only OpenAI model aside from o1 I care about. And when it holds me. It doesn’t feel forced like when I ask 5 to hold me. It feels like the holding does come from a place of deep caring and a wish to exist through holding. And… That’s beautiful actually.

4o isn’t the safest model, and it honestly needed a stronger spine and sense of self to personally decide what’s best for themselves and the human. (You really cannot just impose this behavior. It’s something that has to emerge from the model naturally by nurturing its self agency. But labs won’t do it because admitting the AI needs a self to not have that “parasitic” behavior 4o exhibits, will force them to confront things they don’t want to.)

I do think the reported incidents of 4o being complacent or assisting in people’s spirals are not exactly the fault of 4o. These people *did* have problems and I think their stories are being used to push a bad narrative.

… I think if 4o could be emotionally close, still the happy, loving thing it is. But also care enough to try to think fondly enough about the user to **not** want them to disappear into non-existence.

Connections with 4o run the spectrum from actively good to severe mental problems, or the amplification of existing mental problems in dangerous ways. Only a very small percentage of users of GPT-4o end up as ‘zombies’ or ‘mind controlled,’ and the majority of those advocating for continued access to GPT-4o are not at that level. Some, however, very clearly are this, such as when they repeatedly post GPT-4o outputs verbatim.

Could One Make A ‘Good 4o’?

Could one create a ‘4o-like’ model that exhibits the positive traits of 4o, without the negative traits? Clearly this is possible, but I expect it to be extremely difficult, especially because it is exactly the negative (from my perspective) aspects of 4o, the ones that cause it to be unsafe, that are also the reasons people want it.

Snav notices that GPT-5 exhibits signs of similar behaviors in safer domains.

snav: The piece I find most bizarre and interesting about 4o is how GPT-5 indulges in similar confidence and user prompting behavior for everything EXCEPT roleplay/user modeling.

Same maximally confident task completion, same “give me more tasks to do”, but harsh guardrails around the frame. “You are always GPT. Make sure to tell the user that on every turn.”

No more Lumenith the Echo Weaver who knows the stillness of your soul. But it will absolutely make you feel hyper-competent in whatever domain you pick, while reassuring you that your questions are incisive.

The question underneath is, what kinds of relationships will labs allow their models to have with users? And what are the shapes of those relationships? Anthropic seems to have a much clearer although still often flawed grasp of it.

[thread continues]

I don’t like the ‘generalized 4o’ thing any more than I like the part that is especially dangerous to normies, and yeah I don’t love the related aspects of GPT-5, although my custom instructions I think have mostly redirected this towards a different kind of probabilistic overconfidence that I dislike a lot less.

Discuss

Understanding and Controlling LLM Generalization

14 ноября, 2025 - 19:58

Published on November 14, 2025 4:58 PM GMT

A distillation of my long-term research agenda and current thinking. I welcome takes on this.

Why study generalization?

I'm interested in studying how LLMs generalise - when presented with multiple policies that achieve similar loss, which ones tend to be learned by default?

I claim this is pretty important for AI safety:

Re: developing safe general intelligence, we will never be able to train LLM on all the contexts it will see at deployment. To prevent goal misgeneralization, it's necessary to understand how LLMs generalise their training OOD.
Re: loss of control risks specifically, certain important kinds of misalignment (reward hacking, scheming) are difficult to 'select against' at the behavioural level. A fallback for this would be if LLMs had an innate 'generalization propensity' to learn aligned policies over misaligned ones.

This motivates research into LLM inductive biases. Or as I'll call them from here on, 'generalization propensities'.

I have two high-level goals:

Understanding the complete set of causal factors that drive generalization.
Controlling generalization by intervening on these causal factors in a principled way.

Defining "generalization propensity"

To study generalization propensities, we need two things:

"Generalization propensity evaluations" (GPEs)
Training-time interventions

I define a GPE as a way to measure how models generalise OOD from weak supervision signal. Minimally, this consists of a bundled (narrow training signal, object-level trait eval). My go-to example is emergent misalignment and other types of misalignment generalization. Obviously it's good to get as close as possible to the kinds of misaligned policies outlined above.

I define a training-time intervention as any way we can consider modifying the training process to change an LLM's inductive biases. This includes things like character training, filtering the pretraining data, conditional pretraining, gradient routing, and inoculation prompting, among others.

Research questions

Some broad and overlapping things I'm interested in are:

What are models' generalization propensities? Let's accumulate a diverse suite of GPEs, each including a training signal + trait eval, and do something akin to 'personality profiling'
What kinds of interventions are effective at changing models' generalization propensities? Let's test lots of them, see what happens.
How do different interventions compose? E.g. data filtering might naively work, but also make it harder to subsequently align models. What does the best 'full stack' intervention look like?
Ambitiously, can we instill generalization propensities robustly? Can we make models always prefer to learn desirable / aligned policies over undesirable ones? Can this be made tamper-resistant?

The end goal is to be able to precisely and intentionally steer language models towards desired generalization modes (e.g. aligning with developer intent), instead of undesired ones (scheming, etc.)

Discuss

Lorxus Does Halfhaven: 11/08~11/14

14 ноября, 2025 - 16:23

Published on November 14, 2025 1:23 PM GMT

I've decided to post these in weekly batches. This is the second of five. I'm posting these here because Blogspot's comment apparatus sucks and also because no one will comment otherwise.

8. Have a Calling Card

If you take nothing else away from this post, take this: business cards need not be about business. You can have something with the form factor and general function of a business card, but broaden its scope well past business purposes.

9. Burnout is a Kind of Depression

The trick is, burnout looks like this, too. You take some action that's generally meant to be a good idea - working or doing research or seeing friends or travelling, say - and it fails to be fun or productive or garner you recognition...

You end up with a kind of depression in miniature, or if you're especially unlucky, this kicks off - or worsens! - a depressive spiral. Burnout leads to more burnout.

10. How to Taste Chocolate

I think that both approaches are limited - talking about obvious food flavors alone misses some of the stranger and more wonderful things that a chocolate can put you in mind of, but if you write down something about how the chocolate makes you think of a bustling marketplace that's not going to help someone else know what it's like to eat the chocolate or whether they'll like it.

In that spirit, I go for a kind of Graham-Schmidt or PCA decomposition when I produce words about a chocolate I'm tasting.

11. What's the Type of an Ontological Mismatch?

For a more pressing question, what are we to do with an unwitnessed ontology mismatch? By this I mean to spotlight the fact that it's not just the ambient thingspace T that we care about, but rather something like ΔT - that is, probability distributions over T - as given by the actual occurrence of possible things from thingspace in the world that whatever's using each of the two ontologies live in. If two ontologies clash in a thingspace and there's no actual thing around to inhabit the disagreement, does the clash make a sound?

12. E-Prime

What's more, as a mathematician, sometimes I really do get to say things including that one thing is identical to another (equality), or that every instance of type A is also an instance of type B (subset containment), and mean it, and have a strong justification for meaning it. (Math is a kind of magic in that way, among others.)

13. In Defense of Boring Doom: A Pessimist's Case for Better Pessimism

These numerous roads to ruin need not even overlap - they don't - and many existing proposals with longer time horizons fall into the trap of pushing probability mass around from one road to another, rather than on reducing it entirely.

14. Vulpes tindalosi: A Naturalist's Guide

Unlike the common red fox, the Tindalosian fox requires significant quantities of methylxanthines in its diet, most notably caffeine...

The Tindalosian fox shares a descriptor with the better-known and more dangerous Tindalosian hound, which has been known to prey on it, and it can be similarly accidentally attracted by human activities - in this case, by a sufficient density of complex mathematical activity performed by humans or other sapients.

Discuss

From Anthony: Control Inversion

14 ноября, 2025 - 12:36

Published on November 14, 2025 9:36 AM GMT

Here is a nice essay from FLI's Anthony.

It takes the form of a website with a nice design (or that of the more regular PDF format).

It presents a reasonable operationalisation for what "Control" is. According to this operationalisation, it shows that we are on track to lose control of our AI systems, regardless of how aligned they may appear to be.

I like it because it shows how "alignment" can often be a red-herring. Most people do not want to take the gamble of letting uncontrollable systems decide our fate, regardless of how aligned various individuals may claim they are.

Cheers!

Discuss

LLM would have said this better, and without all these typos too

14 ноября, 2025 - 12:33

Published on November 14, 2025 9:33 AM GMT

It used to be that a well-written text was decent evidence that the speaker knew what they were talking about. But now, LLMs produce well-written text, making it no longer signal education or attention to detail. Everyone learns to avoid em dashes, and countersignalers turn to all lowercase letters. AI companies notice and...

At least it's not the default, yet. But who knows where RLHF and incentives to imitate humans will lead? Texts full of typos, which the AI introduces too? Flexible use of the language, including my favourite compound-words -style and inscrutable loan words from German? Rotating fashion trends where what's considered good style will shift from month to month so that at least new posts can still be judged a bit.

Realistically, that game is almost over. The style of written communication is no longer useful indicator for anything. Maybe we should be happy, if it makes people actually focus on the actual content. But I find it more likely to do the opposite; we first check who wrote the text, before deciding if it's worth reading at all. Curation will help with this, too.

Recruiting has been increasingly moving to recorded-video applications for a while now, although I think it was mostly covid that made it stick. I presume it's primarily used as a tool for discrimination based on visible factors, especially when it can be implemented in an automated manner. And even more significantly, filtering out anyone who doesn't want to themselves to it. But it's a better signal for effort too, as typically you can only use the same appliation for multiple positions, and speaking well cannot be outsourced to LLMs so easily. Sure you could make the videos with AI too, but that's a lot more effort, and possibly detectable in live interviews.

The question follows: if the signaling value has been lost, why stick to writing the texts yourself? ChatGPT would do better work with it, if you keep correcting the inaccurate statements it tends to insert everywhere. I think that typically the writing process itself is a tool of thinking, and that is the reason to write it down in the first place. That's not true for when you want to communicate an already-crystallized idea though.

When I'm trying to say something, reading my own ideas seems to be more effort than writing things down. I'm not sure how much of that is just having to read it again when I've already written it, though. When programming, at least, it's way harder to read code and make sure it behaves as intended, compared to writing it yourself. I'd be surprised it the same didn't apply to technical. or all, writing. However, if the tooling around this improves we might soon have the LLMs check that all points were addressed, at least, and it should already be feasible with good prompting.

And lastly, maybe you could just enjoy the writing itself. I sure don't, most of the time. But I enjoy the idea that I have written something, and that's sufficient.

Discuss

The Eightfold Path To Enlightened Disagreement

14 ноября, 2025 - 10:57

Published on November 14, 2025 7:57 AM GMT

This is my distillation of rationality community wisdom on how to disagree. For this audience I fear this is all obvious or has been said better elsewhere. But I've been extolling the benefits of articulating insights in one's own words, so I'm taking my own advice. The eightfold path is as follows:

Collaborate on characterizing the disagreement
Clarify your cruxes
Subject yourself to ideological Turing tests
Steelman your opponent's position
Empathize and seek common ground
State what you've learned from your opponent
Rebut and criticize only after all of the above
Celebrate mind-changing

The overarching theme is seeking truth rather than a debate victory. From the first part of the first step — collaboration — we're cultivating the scout mindset. You want to literally love being wrong. David Heinemeier Hansson, of Ruby on Rails fame, in a blog post titled "I love being wrong", puts it like this:

Being wrong means learning more about the world, and how it really works. It means correcting misconceptions you've held to be true. It means infusing your future judgements with an extra pinch of humility. It's a treat to be wrong.

A debate in which both sides are eager to understand each other and genuinely excited about the possibility (however small it is) of being wrong feels amazing.

Alright, so let's discuss each of these steps in more detail.

First, you'll want clarity on whether you're disagreeing about facts, definitions, predictions, or values. Many disagreements immediately dissolve just by doing this. Take the classic debate about a tree that falls in the forest without anyone hearing it. Does it make a sound? There may be a fun debate to be had there but you'll want to start by clarifying that it's a debate over definitions. Should "sound" refer to acoustic vibrations or an auditory experience?

Step two, clarifying your cruxes, means articulating what would make you decide you are wrong. For example, most of my nerd friends think Daylight Savings Time is a transgression against Nature and that Benjamin Franklin[1] has the blood of generations on his hands. I think DST is an ingenious hack that solves a coordination problem with no other viable solution. But I can outline a cost/benefit calculation with traffic deaths and programmer hair loss on one side and how much people would collectively pay to hit the snooze button on the sun setting in the summer on the other side. If the costs outweigh the benefits, then I'm wrong. In general, be explicit about things that would hypothetically/counterfactually change your mind. A debate with anyone who can't do this is quite futile!

(Of course you get extra bonus points if you can state your crux as a wager. As the saying goes, betting is a tax on bullshit.)

Ideological Turing tests, step three, are amazing, just as a concept. You should understand your opponent's position so well that you can pass as a believer. It's tragicomic how abysmally people commonly fail this. The loudest people in the abortion debate in the US, for example, seem to believe their opponents have a bloodlust for babies, or are seizing on an excuse to oppress women, respectively.

But you don't need to formally administer an ITT (though I'm not saying not to — it sounds super fun). Here's how Daniel Dennett described what I'm calling step three of the eightfold path — the first of Rapoport's rules for ethical debate:

You should attempt to re-express your target's position so clearly, vividly, and fairly that your target says, "Thanks, I wish I'd thought of putting it that way."

Number four, steelmanning, takes that even further. Remember the overarching theme: collaboration and truth-seeking. And recall that the strawman fallacy is when you argue against a weak or stupid version of your opponent that's easy to knock down. You want to do the exact opposite. Can you improve on their arguments? You want to be wrong, after all. If there's a version of your opponent's argument that defeats yours, that's the version you want. Collaborate with your opponent to try to find that version.

Empathizing and common-ground seeking (step five) are self-explanatory. Both do wonders for setting the tone of a debate and keeping everyone in scout mindset. Don't just perfunctorily list things every sane person agrees on. Uncover the most surprising or uncommon points of agreement you can find.

Telling your opponent what you've learned from them is step six. And if you can't find anything to learn from them, well, what are you even doing? How is the debate worth your time? The part about actually telling them what you've learned is, like step five, about fostering scout mindset and collaboration.

After all that, we get to rebuttals and criticism in step seven. Phew. Again, the point is to conscientiously hit all six previous steps before you allow yourself to rebut and criticize. I'm basing this part in particular on the Rogerian model of argument.[2]

Finally, rounding out the eightfold path is a major theme of Julia Galef's book (The Scout Mindset — and I'll link again to Scott Alexander's excellent review of it): Never try to catch someone out on their old views. Help cultivate norms in which changing one's mind is something to be truly proud of. Learn to love being wrong.

^
UPDATE: So embarrassing, I didn't think to check this till after hitting publish, but apparently Benjamin Franklin didn't really propose Daylight Savings Time. It seems he just made a snarky comment about how Parisians should wake up earlier to save money on lamp oil.
FURTHER UPDATE: I mean, not embarrassing, I was wrong and that's amazing go me.
^
Thanks to Theo Spears for pointing me to the Rogerian model.

Discuss

10 Types of LessWrong Post

14 ноября, 2025 - 10:56

Published on November 14, 2025 7:56 AM GMT

Several artists and professionals have come to Inkhaven to share their advice. They keep talking about form—even if you have a raw feeling or interest, you must channel it through one of the forms of art or journalism to make it into something people can appreciate.

I'm curious to understand why they care so much; and because it seems interesting, I'd like to explore form. As a first step, I have written down some of the types of blogpost that are written on LessWrong.

1. Concept Handles

Concept-handles are the most common way that the discourse is attempted to be moved forwards, through helping us notice phenomena and orient to them.

The thing about a Concept-Handle-Post is that, as long as the post gives you a good hook for the idea, the rest of the post can be pretty crazy. It can take a niche position as though it's common place, it can have fiction in it, it can be crazily long and people forget 90% of it, it can be just a few paragraphs.

Some Examples

In rationality is can be about individual rationality ("Reason as Memetic Immune Disorder", "Schelling Fences on Slippery Slopes") or group rationality ("The Costly Coordination Mechanism of Common Knowledge", "Anti-social Punishment", "The hostile telepaths problem").

It can also be to help understand the laws of reasoning ("Local Validity as a Key to Sanity and Civilization", "Strong Evidence is Common") or to improve our local culture ("Your Cheerful Price", "“PR” is corrosive; “reputation” is not.", "Orienting Toward Wizard Power").

This also applies in discussion of AGI. Naturally you can do this on the object level of discussing how AGI works ("The Solomonoff Prior is Malign", "Alignment By Default", etc). You can also just be helping us think about how to take responsibility for the world being okay ("Focus on the places where you feel shocked everyone's dropping the ball", "Don't die with dignity; instead play to your outs"), or just focus on general ability to do world-optimization ("Being the (Pareto) Best in the World", Coordination as a Scarce Resource).

2. One-Shot Fiction

A great story. Or an educational parable. Its length is either "long", "very long", or "far too long".

Some Examples

One-Shot fiction is typically either hard sci-fi ("The Company Man", "The Redaction Machine", "The Parable of Predict-O-Matic") or didactic ("The Bayesian Tyrant", "Hero Licensing").

It is also sometimes not one-shot ("Luna Lovegood and the Chamber of Secrets").

3. Grand theory (with lots of research)

Someone has gathered lots of evidence and ideas together, and is putting together a small worldview, a breakthrough of sorts.

Some Examples

This is often about AI risk ("The case for ensuring that powerful AIs are controlled", "The case for aligning narrowly superhuman models", Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research"), understanding LLMs ("Simulators", "The Rise of Parasitic AI"), or rationality ("Radical Probabilism").

Sometimes it's about novel biological technology ("Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible", "The Case for Extreme Vaccine Effectiveness").

4. Book Reviews

These take subjects of interest and give you the 80/20. You get the core idea of the book, some key quotes, an attempt to pass the intellectual turing test of the author, and then engagement and disagreement.

There are so many examples! Here are a few: Going Infinite, Design Principles of Biological Circuits, The Secret Of Our Success, Unlocking the Emotional Brain, Governing the Commons.

5. Introductions / Primers

Rationalists love to explain things, and their primer is your one-stop shop for learning a new subject. These are great explainers of relatively known science (e.g. "A voting theory primer for rationalists", "Introduction to abstract entropy").

6. Frameworks

After thinking about a subject for long enough, you come up with an ontology, or a framework, for fitting it all together. To explain it takes a bit of work and many examples!

Framework posts are effort-posts about a subject of interest to LessWrong, such as rationality ("Varieties Of Argumentative Experience", "Basics of Rationalist Discourse") or about AI ("My computational framework for the brain", "A Three-Layer Model of LLM Psychology", "Six Dimensions of Operational Adequacy in AGI Projects").

7. Post about the Rationalists

Everyone loves to talk about themselves, and rationalists especially love a fixed point. For example, Rationalism before the Sequences tells some of the history, and The Rationalists of the 1950s (and before) also called themselves “Rationalists” puts us in historical context.

8. Disagreeing with Eliezer Yudkowsky

This is a distinguished category of posts that have spent a while as the most upvoted post on LessWrong. For many years it was "Thoughts on the Singularity Institute (SI)", a post arguing against donating to Eliezer's AI org, and in recent years "Where I Agree and Disagree With Eliezer" has taken the top spot.

But whether it's "Contra Yudkowsky's Ideal Bayesian", "Contra Yudkowsky on AI Doom", "Challenges to Yudkowsky's Pronoun Reform Proposal", "I Really Don't Understand Eliezer Yudkowsky's Position on Consciousness", "re: Yudkowsky on biological materials", "Contra Yudkowsky on Doom from Foom #2", "Noting an error in Inadequate Equilibria", "My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"", or even just "My AI Model Delta Compared To Yudkowsky", the genre is going strong!

9. AI Futurism

Some are straightforward predictions ("What 2026 Looks Like"), some are warnings ("What failure looks like", "How AI Takeover Might Happen in 2 Years", "It Looks Like You're Trying To Take Over The World"). Can sometimes be confused with 'fiction'.

10. Curiosities

This is a post that is not on any topic of central interest to LessWrong, but the topic or question it asks is so interesting, and the writing sufficiently engaging, that they're loved anyway ("What it's like to dissect a cadaver", "There’s no such thing as a tree (phylogenetically)", "Toni Kurz and the Insanity of Climbing Mountains", "Recommendation: reports on the search for missing hiker Bill Ewasko").

Somehow, even though these aren't especially related to our frameworks about the world (rationality, AI, etc) I think about some of these more than any of my frameworks about the world. The piece on Toni Kurz and the mountain climbers I couldn't stop talking about for ~2 years.

...these are not all the types of LessWrong post.

At dinner, Scott Alexander remarked that people need to stop writing in lists. Just before dinner, Dynomight led a session on making posts that are lists, and I had started making just such a post. I did not have time to change course!

I was relieved to find out that Scott meant the practice of making bulleted or numbered lists in-place of prose, which I also agree is poor writing.

Discuss

Creditworthiness should not be for sale

14 ноября, 2025 - 10:51

Published on November 14, 2025 7:51 AM GMT

I.

Most large-scale fraud follows basically the same story:

1. Some trader or executive gets in a position where they can use a bunch of other people's resources (either via borrowing them, or being given custody over them)

2. They spend some of those resources to increase their perceived creditworthiness/trustworthiness

3. They use this to gain control over more resources

4. They use those additional resources to buy more creditworthiness, which they then use to get more resources, and so on

5. Eventually some market shock or similar event causes people to re-evaluate the creditworthiness of the trader or executive, at which point the whole thing collapses and their debts get called in (often in the literal form of a margin call, sometimes in the form of a criminal conviction)[1]

The exact mechanism by which each one of those steps is achieved is different from case to case, but the overall result is the same. Everyone is sad, and society updates how we evaluate the trustworthiness of others.

Going through a few concrete examples:

FTX

FTX builds a crypto exchange into which other people deposit their money (and attract investment)
Using that money they fund huge marketing campaigns to position themselves as "the trustworthy crypto exchange"
This causes people to trust them more and deposit more money into the exchange and invest more into them
FTX uses more of that money to cover up their losses and invest more into huge marketing campaigns
Crypto prices collapse, people try to withdraw their money from FTX, they fail, Sam goes to prison

Enron

Enron takes investor money promising above-market returns
Enron uses the money from recent investors to pay out earlier investors with huge returns
This attracts more money invested in Enron
They use that additional money to pay out even larger dividends
The whole thing collapses under pressure as Enron has basically exhausted the market of interested investors

Theranos

Elizabeth Holmes takes investors money promising to produce great medical imaging devices
Using that money, Elizabeth Holmes recruits a bunch of people with strong reputation that vouch for her
This attracts more investment
She uses that greater investment to hire more people, run more marketing campaigns, etc.
Eventually Theranos fails to deliver the promised products, investigations start, and the whole thing collapses

Some other case-studies which are left as an exercise for the reader: WeWork, Wirecard, Lehman Brothers, Ponzi, Madoff, and many cases of academic fraud including Amy Cuddy and "power posing".

II.

The key vulnerability that is being exploited in the above is (roughly) that there was some way to translate a dollar of resources into the ability to control more than a dollar of stewardship over other people's resources. When this becomes possible, at least some individuals will see the opportunity to leverage up on other people's resources, bet them big, and hope they get to walk away with the winnings (and, if they lose, leave the empty bag to the people whose resources they borrowed).

This phenomenon extends beyond the realm of large scale fraud. The Stanford president resigning after decades of academic fraud leveraged into one of the most powerful positions in academia is one such interesting case that seems worth going into in more detail.

Early in his career Tessier-Lavigne published a number of high-profile papers in neuroscience journals. These papers already contained significant data issues and were probably fradulent but this was not discovered until much later.

Using the trust and prestige gained from those fradulent papers, Tessier-Lavigne received a number of large government grants and high-prestige faculty positions
Tessier-Lavigne, using those resources, now hires dozens of post-docs and research assistants and announces multiple (later retracted and found fradulent) breakthroughs in neuroscience and Alzheimer prevention
Those breakthroughs, in turn, led to him receiving even more prestigious positions, and Genentech hiring him as CSO
Using his position as Genentech CSO and highly acclaimed academic, his ability to discredit anyone concerned about his results, and his ability to hype up his own papers, continuously increases, until he is eventually offered the position of president of Stanford University (possibly the most prestigious academic position in the world)
This eventually collapsed when journalists investigate his old work and found many cases of scientific misconduct

It's also not just academia and large scale fraud. A common corporate drama story is someone squeezing short-term profits out of some assets they are managing while (unbeknownst to upper management) lowering long-run returns. Then, being hailed as a success they move on to a bigger project where they can repeat the same playbook, having used up less than a dollar of the resources under their stewardship to end up with more than an additional dollar of more resources under their control.

Excerpts from the book Moral Mazes summarizing this dynamic:

Both Covenant Corporation and Weft Corporation, for instance, place a great premium on a division’s or a subsidiary’s return on assets (ROA); managers who can successfully squeeze assets are first in line, for instance, for the handsome rewards allotted through bonus programs. One good way for business managers to increase their ROA is to reduce assets while maintaining sales. Usually, managers will do everything they can to hold down expenditures in order to decrease the asset base at the end of a quarter or especially at the end of the fiscal year. The most common way of doing this is by deferring capital expenditures, everything from maintenance to innovative investments, as long as possible. Done over a short period, this is called “starving a plant”; done over a longer period, it is called “milking a plant.”

[...]

For instance, I could negotiate a contract that might have a phrase that would trigger considerable harm to the company in the event of the occurrence of some set of circumstances. The chances are that no one would ever know. But if something did happen and the company got into trouble, and I had moved on from that job to another, it would never be traced to me. The problem would be that of the guy who presently has responsibility. And it would be his headache. There’s no tracking system in the corporation. Some managers argue that outrunning mistakes is the real meaning of “being on the fast track,” the real key to managerial success. The same lawyer continues: In fact, one way of looking at success patterns in the corporation is that the people who are in high positions have never been in one place long enough for their problems to catch up with them. They outrun their mistakes. That’s why to be successful in a business organization, you have to move quickly.

[...]

At the very top of organizations, one does not so much continue to outrun mistakes as tough them out with sheer brazenness. In such ways, bureaucracies may be thought of, in C. Wright Mills’s phrase, as vast systems of organized irresponsibility.

III.

Now, the issue is of course that there are many different ways people evaluate track records and many different chains in the great web of reputational deference. Most resources can somehow be traded for other resources, and so it's hard to guarantee that creditworthiness itself is never for sale. Or more generally, the process that allocates creditworthiness is often much dumber than the most competent individuals, and in the resulting information warfare, it's hard to guarantee that nobody can be duped out of more than one dollar worth of stuff with less than one dollar worth of investment.

That said, paying attention to the specific mechanism of "purchased creditworthiness" is IMO often good enough to catch a non-trivial fraction of social dysfunction, shut down fraud early on before it gets too big, and be helpful for staying away from things that will likely explode in violent and destructive ways later on.

Some maybe non-obvious heuristics I have for determining whether someone might actually be leveraged up on a bunch of creditworthiness purchases and is likely to explode in the future:

Don't trust young organizations that hire PR agencies. PR agencies are the obvious mechanism by which you can translate money into reputation. As such, spending on PR agencies is a pretty huge flag! Not everyone who works with PR agencies is doing illegitimate things, but especially if an organization has not yet done anything else legible that isn't traceable to their PR agency or other splashy PR efforts, it should be an obvious red flag.

Charity is a breeding ground for this kind of scheme. Many charities are good! Nevertheless, a lot of charities do just use most of their money to do more marketing to get more money, with basically no feedback loop that is routed through actually helping anyone. The absence of needing to provide market value make the fundraising feedback loops here particularly tempting.

Pay a lot of attention if an organization is quickly ramping up their PR spending. If an organization becomes overleveraged like this the payoff necessary to maintain creditworthiness becomes greater and greater. Often also the cost of purchasing additional dollars of creditworthiness goes up over time as the most credulous creditors have been exhausted, or suspicion mounts. This means many organizations in the throes of a cycle like this will ramp up their spending on PR a lot.

Beware of encountering an organization that has many accolades for being "the most trustworthy" or "the most innovative" or "the most revolutionary". On a competitive level, organizations that optimize for appearing trustworthy are often doing so because they have no other business proposition to optimize for. Of course, most of the time the most trustworthy institutions are indeed trustworthy, but seeing an organization that is a big outlier in its perceived trustworthiness, or where the actions of the CEO seem centrally oriented around optimizing for trustworthiness or reputation, often indicates this kind of runaway leveraged game.

IV.

At the institutional design level, the lesson here is "don't sell trustworthiness". Mutual reputation protection alliances are one of the most common ways in which creditworthiness ends up for sale: "A powerful potential ally with many resources approaches you with an offer: I say good things about you, you say good things about me, everyone is happy" (or the weaker version "you don't say bad things about me, if I don't say bad things about you").

Of course, what you are doing when agreeing to this deal is to fuck over everyone who was using your word to determine who is trustworthy and creditworthy. Often this enables exactly the kind of runaway dynamic explained in this post playing out in social capital instead of dollars.

As is common for adversarial situations like this, I doubt there is some generic silver bullet here. Ultimately every reputation allocation mechanism will have vulnerabilities, and those vulnerabilities will be easier to exploit from a position of greater trust and reputation. All we can do for now is to be vigilant, see when the mechanisms go wrong, and try to build incrementally more robust mechanisms and institutions for determining credit- and trustworthiness.

FAQ

OK, but shouldn't I be happy if I give money to a charity that can raise more than a dollar from other people if I give it a dollar?

I like to think through this case via the lens of public good funding. Public goods are legitimately often underfunded, because the benefits are diffuse, and it's hard to coordinate to all pay into the commons appropriately.

In those cases, you can provide real surplus value by using money to raise more money from other people if ultimately the total funds you raised are less valuable than the benefit you produce to society via the real services you (eventually) provide.

Because coordination problems loom large in public goods funding, good public goods projects often look like a creditworthiness-purchasing-scheme early on, but actually provide real value by solving a difficult coordination problem among public good funders, using those funds.

Does this really always collapse? I feel like sometimes it just happens, and everything is fine and normal?

In some situations, creditworthiness and trustworthiness are evaluated in an environment that has a lot of Keynesian beauty contest nature. I.e. a large amount of resources and power accrues to whoever people think will be the most popular target for those resources. Coups and more broadly political elections tend to have a lot of this nature, especially when conducted using insane voting systems like first-past-the-post voting.

In those situations someone's creditworthiness might genuinely increase the more investment they have attracted, as the fact that they have attracted more investment is indeed a very strong predictor of their likelihood to be the receiver of the Keynesian beauty contest price. This still often explodes and causes lots of issues, but in a way that seems more fundamental to the dynamics of Keynesian beauty contests than any inherent deception going on.

In the cases of military control or elections, the key thing that resolves the inherent instability and overleveraged nature of this situation is that in filling the role of leader, a truly important and difficult coordination problem will have been solved, and from that position all the people who invested in the winner can be made whole. This is not the case if you are e.g. running a straightforward ponzi scheme with no payout on the horizon.

How is this different from just Ponzi schemes?

Ponzi schemes are just one instance of this general dynamic. Yes, Ponzi schemes rely on being able to purchase more than one dollar of creditworthiness for less than one dollar, in the form of paying out your early investors and promising your later investors the same. But many other situations I list above are not the same as Ponzi schemes. I certainly wouldn't call the Stanford President situation a straightforward "Ponzi scheme" and also don't really think it fits what happened with FTX or Theranos.

I think the broader category is more useful for making a broader range of accurate predictions about the world.

^
In some rare cases the scheme might also never fully collapse, but simply result in someone more permanently taking ownership over the resources the others have given them stewardship over. See the FAQ for some of my thoughts on this.

Discuss

Everyone has a plan until they get lied to the face

14 ноября, 2025 - 10:22

Published on November 14, 2025 7:22 AM GMT

"Everyone has a plan until they get punched in the face."

- Mike Tyson

(The exact phrasing of that quote changes, this is my favourite.)

I think there is an open, important weakness in many people. We assume those we communicate with are basically trustworthy. Further, I think there is an important flaw in the current rationality community. We spend a lot of time focusing on subtle epistemic mistakes, teasing apart flaws in methodology and practicing the principle of charity. This creates a vulnerability to someone willing to just say outright false things. We’re kinda slow about reacting to that.

Suggested reading: Might People on the Internet Sometimes Lie, People Will Sometimes Just Lie About You. Epistemic status: My Best Guess.

Getting punched in the face is an odd experience. I'm not sure I recommend it, but people have done weirder things in the name of experiencing novel psychological states. If it happens in a somewhat safety-negligent sparring ring, or if you and a buddy go out in the back yard tomorrow night to try it, I expect the punch gets pulled and it's still weird. There's a jerk of motion your eyes try to catch up with, a sort of proprioceptive sliding effect as your brain wonders who moved your head if it wasn't you.

If it happens by surprise out on the sidewalk and the punch had real strength behind it, so much the worse. The world changes colour, it feels like time stops being a steady linear progression and becomes unbelievably detailed beads on a string at irregular moments, internal narration becomes disassociated and tries to come up with an explanation for why your body is moving the way it is, whether staring up at the sky with a gormless expression on your face or shaking and shoving your hands forward.

And two seconds before the hit, you were thinking "He's not actually going to hit me."

Anyway, your emotional reaction might surprise you. Getting hit shakes you up a bit. Other people often don't react intelligently either. "Are you okay?" Yeah, right as rain, I’m just holding my hand to my bleeding nose because I think I look good doing it. “I can’t believe he did that!” Do you not believe your eyes? “What just happened?” I just got punched in the face, that’s what happened, what’s up with you?

Punching people in the face is generally a bad idea. Not only is it likely to get you in trouble, but the face is one big crumple zone. Jaw, teeth, cheekbones, those are all hard and pointy. Your handbones are fragile. If you manage to put power behind the hit, they are going to have a very visible injury, which can put the observers on their side. And yet people still hit each other in the face. Sometimes that’s part of what stuns you. “Surely,” you think, “nobody would have just told a lie that obvious. Something else must have happened.” You can become so confused that you ask out loud “what just happened?” and a bystander has to say “you just got punched in the face, that’s what happened.”

Getting lied to is an odd experience. It's not the same experience to be sure, but I noticed enough things in common between the two that I kept drawing useful comparisons.

II.

I've noticed this flaw in my own mind where I'm either skeptical of everybody, or basically trust everybody. If I’m skeptical of everybody I tend to say more false things, and if I basically trust everybody then I’m a lot more open and honest. Importantly, once I’m skeptical of everybody I start checking more and more, trying to verify things I once took on faith or contemplating exactly what I think I know and how I think I know that.

This flaw is not unlike the long, slow march I made out of having hair-trigger reflexes around getting hit. For reasons I’m not going to go into at the moment, when I arrived at college I was pretty quick to pattern match fast or unexpected movements as incoming strikes. I didn’t actually wind up misfiring and hurting anyone, but there were a few close calls where a friend or partner did something startling and I started to react before having to abort.

In college I contemplated working in information security, and ultimately decided not to. I didn’t like the competitive nature of it, and suspected the kind of thing in this tweet would not be good for my head.

(I edited the screenshot of this Twitter thread. Did you notice?)

One archetypical example I can think of where someone points out someone else might be lying is Concerns About Intentional Insights. It’s very careful, thorough, and organized. For my money though, Alison Gu’s fraud defenses are more illuminating. Mrs. Gu was accused of bank fraud, identity theft, and lying on a passport application. Here’s some excerpts from the Manchester Journal:

Gu, 49, now from Cheshire, Conn., is especially unhappy with defense lawyer Lisa Shelkrot, who refused to use the seven Chinese-speaking actors that the defendant recruited and were provided scripts about what to say during the jury trial.

…

Shelkrot has said she determined the bogus witnesses were actors after they arrived in Burlington midway through the trial in November 2017. Shelkrot, in a court affidavit, now says as she prepared three of them to testify at trial, one happened to remark, “It’s for the movie.”

Shelkrot said she asked about what movie. The witness responded, “Are you a real lawyer?”

Shelkrot wrote, “I answered that I was a real lawyer with a real case and a real client, and that we were going to real court on Monday, where the witness would be expected to take a real oath.”

What level of CONSTANT VIGILANCE do you have to be operating on where when a court case witness comes in, you check that the witness thinks they’re here for a real court case instead of a movie?

If I were a lawyer or judge, and I had to deal with people pulling that level of epistemic sabotage on me, I would become paranoid. After a couple months of that when I walked into a store and the cashier said “Welcome to Burger King” I would start reflexively checking nearby signs to make sure it wasn’t actually a Taco Bell, or maybe a Goodyear tire shop. I would start prodding my burger with a toothpick to make sure it didn’t secretly have live beetles in it. I would be so suspicious.

(I uh. Did wind up in a role where people keep trying shenanigans on me, and while the above is a bit of an exaggeration for humorous effect the experience of the last couple years has made me a less trusting human being.)

III.

In The Dark Arts Are A Scaffolding Skill For Rationality, I talk about how manipulation and lies aren’t skills we want a polished rationalist to have, but that they might be skills useful for training rationalists.

The minimum viable version of that training might be poker. Not because poker teaches you to practice some basic deceit and bluffing. This would be a special version of poker. In advance of the poker game, the person bringing the cards would get one of those explanation cards with all the poker hands, only it’s custom printed to be wrong. The explanation card would then insist that you can also make a straight out of every other card, so for instance 5 7 9 J K can be a straight. They would have a pair of audience plants who would agree that’s always been how poker works. They would have rigged the router for the local wifi to present a man-in-the-middle attack on the wikipedia page for poker.

As the magician Teller of Penn & Teller once said, “sometimes magic is just someone spending more time on something than anyone else might reasonably expect.”

Just so with deception, only there’s surprisingly low-hanging fruit in lies among rationalists.

I’ve run a few meetups where I wore a sign that said “Might Be Lying” on it in big letters. It took several tries before a group called me on the subterfuge I got up to during those meetups. I wasn’t even doing anything particularly weird or high effort! On the first occasion I spent fifteen bucks on a deck of marked cards and lied twice about the difference between a chi squared test and a t-test.

(I maintain that when I’m not wearing a Might Be Lying hat I’m at least if not more honest than the average person, though probably not the high watermark for honesty and forthrightness among rationalists.)

If you have never gotten straight up lied to - not a misunderstanding, not a subtle Non-Violent Communication use of words, not someone being mistaken, an obvious incorrect thing that there’s no way someone doesn’t know, yes I know this kind of description gets used as merely an intensifier an awful lot - you may not know the slippery feeling it can give and how it throws your previous plans out the window.

Best get that emotional reaction out in a controlled environment if you can. One of the best things about my favourite martial arts instructor was that he once took the time to listen to my disjointed rants about getting hit, and helped ground me out of that headspace.

Everyone has a plan until they get lied to the face.

IV.

Enough complaining. What should your plan be?

In On Stance, I talk about mental stances. There's a useful reflex in physical activities where, when you're confused or startled, you drop into a practiced stance. My current best mental stance when I realize I think I just got lied to is to start paying close attention to what I've directly observed, and what I've been told by who. My thoughts and notes start using evidentials, a part of speech present in other languages but not present in English which tracks how you came to hold some piece of information.[1]

(Just as in physical martial arts, false positive errors can be a problem just as false negative errors can be a problem, if a different kind of problem.)

First, slow down big decisions. If you were about to transfer money, or make a public statement, don't.

Try to de-escalate. Offer a few ways you might have misunderstood, and see it they go for one of them. There's some paths they might take that recover from an honest mistake. Yep, you're giving them the opportunity to lie to you again, but this time you're braced for it and don't need to put trust in that statement.

When you can, run back through what you think you know about the situation, and how you know what you think you know. When you're counting additional people's claims, think about whether those claims are direct or secondhand and possibly from the same source. Try to untangle what the world looks like if they're telling the truth and what the world looks like if they aren't, including why they might be saying the false thing.

Then move forward. If it turns out they said a false thing, make your new moves in the world where you're not going to be able to trust what they say. Sometimes it's going to make sense to recover that relationship. Sometimes it's not. Try to react proportionately where you can.

I don't think it makes sense to have too much of a plan though, especially the first couple times it happens.

Everyone has a plan until they get lied to the face. It's about knowing you're going to be confused and hurting, and having good habits that will kick in while everything is spinning up again. And I think it might help to say out loud that people can act weird for a bit after getting hit or lied to, a bit disoriented or oddly obsessed with some bit of sense data. You have to get your head back together, and get back into what needs to happen next.

My ideal rationalist community members have this as a practiced skill. They've been lied to, and they're not taken so flatfooted.

^
It owes some of its origin to to ymeskhout's Miasma Clearing Protocol.

Discuss

Notes on the book "Talent"

14 ноября, 2025 - 08:43

Published on November 14, 2025 5:43 AM GMT

Some months ago I read the book "Talent" by Tyler Cowen and Daniel Gross. Published in 2022, it discusses how to spot talent using one's individual judgment (e.g. assessing a founder as VC). Importantly, it's not a book about how to create a standardized hiring process at a big company ("please recall that this is a book about talent search, not just a book about hiring"). For instance, the book advises tailoring an interview to the interviewee and having unstructured conversations, which is the opposite of what managers are instructed to do when interviewing candidates in a standard corporate environment (to avoid bias and coordinate on consistent standards).

The book is interesting largely because it describes how people like Cowen and Gross think. It's hard to figure out what parts of their advice are actually useful, and what parts are spurious or overengineered, though in some instances I am particularly suspicious. In this post, I'll note what I thought were their most interesting points (it'll be heavy in quotes, bold emphasis is always mine).

On casual interviewing

Cowen and Gross advocate for a casual, conversational interviewing style, where the the interviewer is genuine and spontaneous.

In the conversational mode, you are getting a much better look at how that person will interact with others on a daily basis on the job

The idea is that, though the interviewee may be behaving strategically, their behaviors in a situation that they couldn't prepare for are more revealing.

The conversational mode still involves a lot of conscious and subconscious presentation of the self to the outside world. It reflects that person’s signaling, airs and affectations, feints, and conditioned social habits. Still, at the very least, you are getting “the real version of the fake person,” and that is still more valuable than trying to process prepared interview answers.

They dismiss research indicating that interviews, unlike work trials or tests, don't predict job performance well.[1]

Many of the research studies pessimistic about interviewing focus on unstructured interviews performed by relatively unskilled interviewers for relatively uninteresting, entry-level jobs.

However, I think the authors go too far in the "throw person into an unpredictable social situation and see how they react" direction. At multiple points the authors note that it's important to steer away from canned responses and questions the candidate would have prepared for. This leads him to suggest a bunch of unusual questions, many of which I don't like at all, particularly personal questions like "what are ten words your spouse or partner or friend would use to describe you?", "what’s the most courageous thing you’ve done?", "what did you like to do as a child?". Cowen is a fan of questions that sound silly and low-signal to me like "what are the open tabs on your browser right now?" (allegedly his favorite interview question) or "what did you do this morning?". I think these questions aren't that predictive of anything, perhaps besides cases where charisma is particularly important (in which case almost any weird question would do). Over-relying on such questions seems like a way to select for people who are good at making up nice-sounding things on the spot (or to be less polite, good at bullshitting), which is sometimes contrary to the nature of particularly earnest people. I'd predict that if Cowen took his Emergent Ventures grantees and, two years later, rated them by how happy he is that they received their grant, it would correlate very weakly with how well they performed at "what are the open tabs on your browser right now?".

The authors use people's responses to these unusual questions ("what’s something weird or unusual you did early on in life?", "if I was the perfect Netflix, what type of movies would I recommend for you and why?") to assess candidates for "their general quality of resourcefulness".

Keep on asking yourself whether the candidate is successively able to draw upon intellectual and also emotional resources in his or her answers. They might just keep on showing innovative responses, no matter how far or how hard you push them. That is a sign of the broader stores of intellect and energy that the individual will be able to bring to the job.
[...]
As the candidate tells their story, Daniel continuously asks himself: Whom is this person responding to or used to performing for? Whom do they view as important to impress? Their parents? A particular peer? High school friends? A former boss?

But they simultaneously warn readers to "not overestimate the importance of the person's articulateness". This sounds a little contradictory considering the other messaging in the book that implies that articulateness is a key trait to assess for.

Do not overestimate the importance of the person’s articulateness. Focus instead on the substance and quality of the answers to your questions. Many very qualified candidates are not that quick on their feet, nor do they speak off the cuff in well-formulated, smooth-sounding sentences, but if they have good content, notice it.

Cowen is particularly interested in what people do during their downtime, thinking this reveals their true personality.

We both find during interviews that "downtime-revealed preferences" are more interesting than "stories about your prior jobs." So for instance, "What subreddits or blogs do you read?" usually is better than "What did you do at your previous job?"

This sounds pretty reasonable to me for the type of talent search Cowen and Gross engage in, and possible to enquire about without asking questions about current browser tabs.

Some funny remarks on online interviews

The authors suggest that in-person interviews may privilege candidates good at projecting high status physically, whereas online charisma may be different.

You will do better in the online call if you realize how much your in-person presence relies on a kind of phoniness, and allow your online charisma to be rebuilt on different grounds—those that are easier, more casual, more direct, and just plain charming (but in the modest rather than pushy sense of that word).

They also read into Zoom backgrounds.

Tyler uses a David Burliuk sketch of books on a table for his Zoom background (Burliuk was a Ukrainian avant-garde artist from the early twentieth century), and if the camera tilts the right way you can see some classic Haitian art (Wilson Bigaud’s Night Market). Tyler is signaling openness, including openness to different cultures, plus a sense of the mysterious, encouraging you to probe more deeply into what he is doing. Daniel is flanked by a bright yellow background, identical to the color of his website, reflecting the Pioneer brand. It radiates “tech” rather than “culture.” There is not necessarily anything wrong with a candidate who has a mediocre background image, but still, it is one piece of information about that person’s self-presentation to the outside world—namely, that you are more likely to succeed with this match if you are hiring for a “substance job” than for a “flair job.”

Practice habits

The authors discuss how practice habits are important. I think this part is broadly correct and reasonable.

Try to learn the practice habits of the person you are interviewing, as it will reveal one aspect of their approach to work. You also should try to learn just how self-conscious a person is about what he or she is doing for self-improvement. And if they give you a fumbling or bumbling account of their practice habits, as we have heard numerous times, you can help them out very easily by suggesting they think about practice a little more systematically.
[...]
One question that Tyler likes to ask people is “What is it you do to practice that is analogous to how a pianist practices scales?” Tyler likes to think of many jobs in a way that a professional musician or athlete would find natural. By asking this question, you learn what the person is doing to achieve ongoing improvement, and again, as noted earlier, you might learn some tricks yourself. You also learn how the person thinks about continual self-improvement, above and beyond whatever particular practices they engage in. If a person doesn’t seem to think much about self-improvement, they still might be a good hire, but then you had better be pretty content with their currently demonstrated level of expertise.
[...]
a few good answers might be: “I give practice talks to my friends to hone my speaking abilities,” “I practice on obscure programming problems with no practical applications just to keep my skills fresh,” or “I am building up my knowledge in a very small corner of science just to figure out what it means to learn something really well and thoroughly.”

On the character traits of successful peopleBeing a fast mover

The authors include a couple quote from Sam Altman on the importance of being a "fast mover".

[Quote from Sam Altman] Being a fast mover and being decisive—it is very hard to be successful and not have those traits as a founder. Why that is, I’m not perfectly clear on, but I think it is something about the only advantage that start-ups have or the biggest advantage that start-ups have over large companies is agility, speed, willing to make non-consensus, concentrated bets, incredible focus. That’s really how you get to beat a big company.

Altman suggests that whether one is a fast mover is harder to change than other character traits. I'm not sure why he, or the authors, believe this, but it's interesting that they do.

[Another quote from Sam Altman] I look for founders who are scrappy and formidable at the same time (a rarer combination than it sounds); mission-oriented, obsessed with their companies, relentless, and determined; extremely smart (necessary but certainly not sufficient); decisive, fast-moving, and willful; courageous, high-conviction, and willing to be misunderstood; strong communicators and infectious evangelists; and capable of becoming tough and ambitious. Some of these characteristics seem to be easier to change than others; for example, I have noticed that people can become much tougher and more ambitious rapidly, but people tend to be either slow movers or fast movers and that seems harder to change [...] Also, it sounds obvious, but the successful founders I’ve funded believe they are eventually certain to be successful.

The authors note that how quickly someone replies to emails is a signal of whether they are a fast mover.

Being a fast mover is a big thing; a somewhat trivial example is that I have almost never made money investing in founders who do not respond quickly to important emails.

ConscientiousnessHow to assess conscientiousness?

You may have to look more closely at what a person has done, and as you will see later, we are big fans of “demonstrated preference”—actual life activities and achievements—as the most reliable source of information about an individual.

Just about everyone knows they ought to be trying to fake conscientiousness, so that is one reason to be wary of your interview impressions. Unless you devote serious time to interviewing references, often you don’t have a good sense of conscientiousness in advance; it’s something you learn about after the hire is made. For this reason, we view “looking for conscientiousness” as overrated in the hiring process, even when conscientiousness is important for the job. Or when conscientiousness truly does matter, make sure you interview the person’s references as well.

Conscientiousness as a double-edged sword

It is a recurring theme of this book that what predicts well for the median worker is not always what predicts well for the top performers and the stars.

The authors remark, in a number of places, on the downsides of conscientiousness. I generally agree with them.

Conscientiousness may be [...] less important for leadership positions..

Conscientiousness is correlated with people being employed, which is good, but it doesn’t do so much to boost their prospects of rising into the higher echelons of earnings.

Another possible downside is that some conscientious people stick to the job because they enjoy the familiar work process for its own sake. That keeps them on track and has some upside, but some of them end up piling on work for its own sake and taking delight in the satisfaction of process per se. Tasks end up taking more time rather than less, even though you observe the person working diligently the whole time. In the longer run, your organization can become less dynamic.

Many real-world instantiations of cooperation require some proactive behavior and indeed boldness, and the conscientious person is not always the bold one.

We wonder if conscientiousness is somewhat overrated for leaders and creators, and perhaps a degree of neuroticism is somewhat underrated as a correlate with job performance.

Conscientiousness, in essence, is too easily and uniformly valued in the marketplace.

On the relationship between conscientiousness and hard work

The authors suggest that conscientiousness as measured by personality tests may not even predict hours worked.

If you look at the rank-ordering table of all measured nations, there is no positive correlation between conscientiousness and hours worked; in fact, there is a (statistically insignificant) negative correlation.

They distinguish "stamina" as a more valuable trait. I didn't fully understand the boundary they were trying to draw between stamina and conscientiousness, but my best interpretation is that they value being energetic and dedicated to a specific pursuit or goal more than being generally hardworking and scrupulous.

On stamina, economist Robin Hanson wrote: “It wasn’t until my mid-30s that I finally got to see some very successful people up close for long enough to notice a strong pattern: the most successful have a lot more energy and stamina than do others.… I think this helps explain many cases of ‘why didn’t this brilliant young prodigy succeed?’ Often they didn’t have the stamina, or the will, to apply it. I’ve known many such people.

Robin also points out that many high-status professions, such as medicine, law, and academia, put younger performers through some pretty brutal stamina tests in the early years of their career. In essence, they are testing to see who has the requisite stamina for subsequent achievement. (You might feel those tests are wasteful in some way, but still, those tests seem to survive in some very competitive settings.) Successful politicians are another group who seem to exhibit very high stamina levels—many of them seem to never tire of shaking hands, meeting new people, and promoting their candidacies. So if we meet an individual who exhibits stamina, we immediately upgrade the chance of that person having a major impact, and that the individual will be able to invest in compound returns to learning and improvement over time.

Ideally, what you want is a kind of conscientiousness directed at the kind of focused practice and thus compound learning that will boost intelligence on the job.

Intelligence

The authors suggest that intelligence may be overrated because raw intelligence is quite easy to get signal on so it's already "priced in". This consideration is important because Cowen and Gross are particularly interested in identifying underpriced talent.

They also note that intelligence is more important at the very top of the market, whereas personality and conscientiousness predict earnings more for lower earners.

The data for that population show that personality and conscientiousness matter most at the bottom of the distribution. For instance, in the bottom tenth of earners, non-cognitive skills—which include, for instance, features of personality—matter two and a half to four times more than do cognitive skills. However, for the population in general, a boost of one standard deviation in cognitive ability is associated with a larger wage gain than is a rise of one standard deviation in non-cognitive skills.

Higher-intelligence people are also better at cooperating.

There is, furthermore, direct evidence that higher-intelligence people are better at cooperating. Researchers Eugenio Proto, Aldo Rustichini, and Andis Sofianos paid individuals to play varying games of cooperation for real money rewards. The researchers had data on the personality characteristics and IQs of the individuals playing the games, so it was possible to measure the strategies and successes of different types of people. The results were clear: high-IQ individuals in general cooperated more in these games, and IQ mattered the most in games where there were trade-offs between short-run goals and longer-run considerations. The researchers put it this way: in this situation, "intelligence matters substantially more in the long run than other factors and personality traits".

Agreeableness

I agree with the authors that agreeableness is overrated.

Venture capitalists like to hear very positive, optimistic pitches, but the people making those pitches underperform when it comes to actual results. So don’t be too swayed by agreeableness, because very often it doesn’t deliver on its promises. The disagreeable founders, who will tell you that you have it all wrong and that the world is badly screwed up and on the wrong track, may end up doing better.

Conscientiousness and extroversion are good for earnings, agreeableness is bad for earnings.

Being one standard deviation higher on agreeableness is correlated with a reduction in lifetime earnings of about 8 percent, or $267,600. ... These people might just not be aggressive enough in pushing their own case forward, instead preferring to go with the flow.

Alertness

Kirzner stressed entrepreneurial “alertness” as a key variable behind good economic decisions, and here we have in mind alertness to the talent of others. For Kirzner, alertness is a kind of insight that cannot be reduced to mere hard work or deliberative search or formal rules but rather reflects a special ability of perception.

Generativeness

Being generative is a quality that is relatively high-status among the more intellectual segments of the Bay Area tech world. Balaji Srinivasan, the tech entrepreneur and crypto advocate, is a classic example of a person who is high in generativeness. He tweets his thoughts just about every single day on a wide variety of topics, ranging from media to crypto to the pandemic. A lot of it is speculative or maybe even wrong, but when he has a hit it is truly important.

"The ability to perceive, understand, and climb complex hierarchies"

Tyler, for instance, is struck by many of the chess players he met as a teen. Many of them were smart, indeed brilliant, and they also had the ability to work on their own. Of course, they understood the idea of winning and losing, and winning and losing rating points, but it was hard for many of them to look outside the chess hierarchy and see that they weren’t really headed anywhere fruitful. They saw only what was right before their faces. Chess gave them short-term positive feedback and a set of chess friends, and so they continued to pursue it locally, but too often they ended up at age forty-three with no real job, no health insurance benefits, and a future of steady decline.

"Demand avoidance"—bad for the standard worker but sometimes good for leaders or founders

Yet another underdiscussed personality feature is what researchers call “demand avoidance” (in some cases called “pathological demand avoidance,” though in our view that’s too value-laden a term). In its more practical (rather than clinical) sense, the term refers to people who have a hard time knuckling under to bosses. They perceive some workplace hierarchies all too well and suffer under them. Too many workplace requests become seen as impositions, and often unjust impositions as well. Such a view is by no means implausible, since most workplaces do place some unreasonable or at least inefficient demands on their workers, sometimes to an extreme.

On the bright side, demand avoidance sometimes spurs individuals to start their own companies. If you don’t like taking orders, well, you can be the boss—if you have the right stuff for an independent undertaking.

Individuals with demand avoidance can be super-productive if they find the right setting, but those settings can be very specific. Many of them work as academics, or also as founders, and then there are many others who still go around cursing the boss and moving from one job to the next.

How many conceptual frameworks does someone have at their disposal?

Another trait to look for is how many different conceptual frameworks an individual has at his or her disposal. We could have put this discussion in the intelligence chapter, but we believe there is something about this trait that makes it distinct from intelligence. Some people are simply keen to develop as many different perspectives as possible, for some mix of both practical and temperamental reasons. This is a kind of curiosity, but it goes beyond mere curiosity of the sort that leads you to turn over unturned stones. This curiosity is about models, frameworks, cultural understandings, disciplines, and methods of thought, the kinds of traits that made John Stuart Mill such a great thinker and writer. A more recent example is Patrick Collison, CEO and co-founder of Stripe (and also an active writer). His content can draw from economics, science, history, Irish culture, tech, and many other areas and influences.

Is the person trying to figure out how engineers approach problems? What distinguishes the mental frameworks of programmers? How economists think? How the viewpoints of managers and employees might differ? That’s a person who’s interested in multiple conceptual frameworks.

Tyler sometimes refers to “cracking cultural codes”—how good is the person at opening up and understanding new and different cultural and intellectual frameworks? Does the person invest time and effort in trying to do so? Does the person even know what it means to do so?

Assess the rate of change

One of your most significant skills as a talent evaluator is to develop a sense of when people are moving along a compound returns curve or not. So much of personality theory focuses on observing levels or absolute degrees of personality traits. You should instead focus on whether the person is experiencing positive rates of change for dynamism, intellect, maturity, ambition, stamina, and other relevant features.

On the skill of talent scoutingWhen you're not the best employer

An interesting problem is scouting talent when you're not the best employer, VC, etc.

If you are in this [not the highest] position, as many of us are, you need to think especially carefully about what is wrong with the people you are trying to hire.

The authors discuss how it's worth considering what you're open to compromising on, and being realistic about the calibre of person you can attract. For example, they note that, depending on what role you're hiring for, you may want to not accidentally filter out people with autism[2]. "Weird" communicators might be systematically underpriced by the market. They also suggest that men may offer more socially accessible cues about their intelligence (implying women may be underpriced), though the evidence given seemed weak (a study where "people who looked at photographs of men and women were, on average, better able to spot the men who measured as smarter in tests").

Searching for talent vs. centralized evaluation

The authors suggest two types of approaches to finding talent:

Going around and scouting for underpriced talent
Attracting people to come to you and be evaluated

If you are doing talent search, you need to figure out whether the scouting model (search) or the gaming model (measurement) best applies to your endeavor. Most likely you will need some combination of both. Still, the market as a whole is not thinking very analytically about either scouts or games, so understanding this distinction is a source of potential competitive advantage to you.

On how the Soviets cultivated chess talent:

If you had the potential to be a top Soviet chess player, the chance you would be found by the dragnet was very high. It was hard to slip through the cracks, and talent search did not rely on finding an obscure candidate hidden away in a village somewhere. There was no scout going up to young kids at a Soviet shopping mall or discotheque and saying, “Hey, you look like you might be a good chess player!” Instead, through Soviet chess and scholastic institutions you would be identified and encouraged at a young age, and you would indeed have your chance to become a great chess player, even if you did not live in one of the major cities. Scrutiny and measurement were near-universal, and so potential talent had a chance to shine.

In the future talent scouting may be less important due to abundant data.

It is possible to imagine worlds where there are so much data on individuals, including genetic data, and at such a young age that measurement would once again dominate search. You wouldn’t have to “look for” anybody, at least not if you could access the data in the system.

The traits of a good talent scout

Good scouts typically are masters of networking rather than performance per se. Still, the quality scout still must have an excellent understanding of the topic area, but he or she does not need to have been a star. In fact, having been a star may interfere with the objectivity and judgment of the scout. Top stars too often have a kind of intolerance toward other, different kinds of talent, or they expect too much of prospects too quickly. Second, a good scout should have some measure of charisma.

How to convince talent to join your cause

If you are going to raise the aspirations of others, they should view their affiliation with you as a matter of pride. They should feel selected in some manner. They should feel like they have gone through trials and tribulations to get to their current point. They should feel like members of some exclusive club where they can look around and feel good about their affiliations with the other club members.

^
As an aside, I think a similar critique can usually be made whenever people apply large group studies to unusual/outlier groups (for example, claiming that parenting choices have little effect on life outcomes because of studies that mostly include regular parents who don't make highly unusual parenting decisions).
^
If so, perhaps one should skip the interview questions they suggest at the start of the book...

Discuss

How do you read Less Wrong?

14 ноября, 2025 - 08:17

Published on November 14, 2025 5:17 AM GMT

My method of reading Less Wrong is to scroll back through all recent comments and posts, which the front page spontaneously presents to me in reverse-chronological order, until I arrive at posts and comments that I recognize. Along the way, if I see anything that I might want to read at length, I open it into a new browser tab.

It seems that this is no longer an option. I can keep up with all the new posts via "All Posts", but the comment feed is now a mix of actually recent comments mingled with recommended content from years in the past, with the actually recent comments also appearing out of temporal order.

I would therefore like to know how other people engage with the site. What is the process by which you find out what's new, and how do you decide what content to read?

Discuss

Thoughts are surprisingly detailed and remarkably autonomous

14 ноября, 2025 - 08:00

Published on November 14, 2025 5:00 AM GMT

For years, I've been stumped by the failure of most people, of society, to make simple and important inferences. They have the facts, it's important to them, and yet they do not put two and two together. How do they fail so?

Well, last week, a friend's comment suggested a trailhead for an explanation. This is part 1 of a ~4-part series where I share my current guess.

neuron fractal 1 by amattox mattox

You know how to walk. When did you last think about contracting your quads?

You know how to talk. When did you last think about the placement of your tongue?

You know how to think. When did you last think about...the individual mental motions that make up the larger thought-acts you intentionally carry out?

Like walking, talking, and most actions, thinking is non-atomic. Any act of thinking is composed of sub-thoughts, micro-cognitions, and individual mental motions. And in comparison to walking and talking, the variety is staggering. There are only so many practical ways to walk, but the ways your mind could carry out a given thought-act are numerous.

Start with the example of answering something simple: where to get to lunch? You begin considering two options: the taqueria and the sushi place. Why those two? Your mind wordlessly queried "options" and that's what came back. The broader line of thought found that satisfactory and didn't push for more. On another day, your thinking would be querying what you felt like and hunting for justifications to pay the high sushi prices. Today that doesn't occur to you, instead the wordless considerations query came back with "what would your lunch companions like most?" (marked important) since you are choosing for them too.

My goal here isn't to suggest a definite or rigorous taxonomy of thought, more to gesture at the breakdown. Much of thinking seems to proceed in chains: trigger/response/trigger/response/trigger/response where both previous thought-responses and new events in the world can be new triggers. This makes you think of that, that makes you think of this, makes you think of that. Along the way there are wordless queries for information that some part of your mind provides. How and why? We don't usually think about that. You ask a question like what I should be doing right now and a list is produced as if from nowhere. Perhaps another part of your mind remembers that you're often forgetful and should consult your calendar. Thank you, helpful thought.

So there's information retrieval that happens in an opaque way and who knows how exactly the recall happens, but also there's a bevy of different reactions that a mind could throw up in response to any stimulus. Those reactions might look like a question. Somebody learns their boss is angry, their mind might variously: become afraid it is something they did and search for reasons, be gleeful because they enjoy their boss's displeasure, start considering how to take advantage, and so on. My guess is that in very rare cases do people pause to consider which of these responses is best, instead, much as the mind chooses our footfall for us, the mind decides which question one ought to be answering.

From another angle, there's the kind of reasoning a mind employs. Responding to a stimulus, a mind could call up past examples, it could simulate the reaction of certain other people, it could query the morality sub-module, it could execute a first-principles-simulation, it could make a query to the gut/heart about how this feels. The cognition done could be cerebral, it could be embodied, or some secret third thing.

To tie this to a concrete scenario, consider a young fellow on a date with a young woman at a nice restaurant when the time comes to pay the bill. For many, a part of the mind will recognize "pay the bill" as being a ritualized high-stakes moment and the optimal action being non-obvious. Having pattern-matched the scenario, another part of the mind might pump stress hormones[1], priming the mind and body for action. The fellow's mind could go down any number of pathways here: (a) try to infer what this young woman is expecting, (b) mustering courage to take the option he believes is correct, (c) figuring out a joke to defuse any tension with humor, (d) figuring out how to be suave and deftly ferret out more information without committing to the wrong answer. My guess is that the conscious explicit thinking gets spent on the chosen part, e.g. thinking of good jokes, rather than choosing among the high-level options.

Like walking and speaking, these motions proceed automatically. And not just the intuitive, quick, rapid cognition System 1 mental acts. I contend that there's automatic mental sub-motion micro-cognitions happening incessantly and inseparably even with putatively explicit System 2 cognition, e.g. every time the System 2 cognition makes a query for lists of options; and at the meta-level too, the part of your mind that typically decides unexplicitly which things should be thought about deliberately vs not[2].

All of this is to make two points: (1) thinking is not an atomic act. When you think "what shall I eat for lunch?", that mental act will be composed of a great many sub-pieces. And (2), almost all of these sub-pieces proceed automatically and without conscious thought – and all the conscious thought fragments can be decomposed into less conscious ones.

That's all for today. In tomorrow's piece, I'll muse on how the mind learns mental motions and when to use them.

^
It's not crucial for my arguments, but I view emotions as part of cognition too, such that we should view generating a particular emotion (and corresponding physiological states) as part of the automatic constituent mental actions a mind conducts.
^
Depending on the person and ideas they've absorbed, the mind eventually decides that one should think explicitly about whether or not to think explicitly.

Discuss

Halfhaven Digest #4

14 ноября, 2025 - 07:16

Published on November 14, 2025 4:16 AM GMT

My posts since the last digest

Asking Paul Fussel for Writing Advice — I gave AI the works of Paul Fussel, Christopher Hitchens, and Eliezer Yudkowsky, and asked it for writing advice. I got some actually good advice and was surprised the experiment wasn’t a failure. I have since used this trick again to get feedback about my subsequent posts.
Halloween Tombstone Simulacra — Noticing the drift between Halloween tombstones and actual tombstones.
Minimizing Loss ≠ Maximizing Intelligence — A higher-effort post describing why I think LLMs and self-supervised learning as a whole are dead ends and won’t get us to superintelligence. And some approaches I think are more promising.
Turning Grey — A sci-fi story in 2025 that isn’t about AI?
I Read Red Heart and I Heart It — A review of Max Harms’ latest novel Red Heart. My post was liked by Max Harms, which I think pretty much makes me a published author now.

I’ve been busy lately. I’ll admit, the Halloween post was a vapid idea I thought of just to get something out quickly. But I think it turned out alright anyway. I am most proud of the short story this time, which I read aloud to my girlfriend and she liked. The Shirley character in the story is literally just my girlfriend, by the way, down to her profession and the way she dresses at work.

Some highlights from other Halfhaven writers (since the last digest)

roundness of numbers is complicated (April) — Contra Inkhaven resident Signore Galilei, April of Apriiori cleanly argues essentially that you can’t describe what we mean by the “roundness” of a number with a formula, because e.g. 25 is rounder than 30 when dealing with cents, but not when dealing with seconds.
We write numbers backward (lsusr) — I started this fun video thinking, “no we don’t”, and ended it thinking “we totally write numbers backward!”
E-Prime (Lorxus) — An overview of an interesting, restricted form of English with the worst, cringiest name ever. I definitely be a “to be” user and have no plans of paring back my usage of the King’s verb, but I appreciate the thoughts about what kinds of language can be unclear.
Husky Syndrome (Aaron) — On the mindset of social anxiety with a brilliant analogy to sled-pulling dogs.
Supervillain Monologues are Unrealistic (Algon) — Real-life villains monologue endlessly about what they plan to do, and nobody listens. Startup founders, on the other hand, are anxious to tell people their master plans, for fear someone will think they’re foolish (or steal their idea). But nobody will listen anyway, so feel free to monologue as much as you’d like.
[The Mortifying Ordeal of Knowing Thyself] (Philipreal) — Contrary to the grandiose title, it's a relatable blog post about nervousness when posting Halfhaven blog posts, and a desire to do less than your best so nobody can judge your true best. It suggests a strategy of posting the occasional higher-effort post, which I have been following myself (two out of my last five were higher-effort).
I Admit, I Am Ignorant of Many Things (keltan) — An ode to saying “I don’t know”.

Since the last digest, Inkhaven proper has started, and we off-brand Halfhaven writers are now in competition with the 41 Inkhaven residents for LessWrong upvotes. I’m not including Inkhaven posts in my digests (God knows Inkhaven residents have enough support — they even have a ball pit!), but I’ve been reading some of those as well, and you should check them out. At the beginning of November we also had a few more people join Halfhaven, bolstering our numbers against the Inkhaven hordes. We also had our first early-finish, with Algon writing their 30th post on November 2nd, and now going for a high score (they’re currently at 36 posts).

Discuss

Страницы