RSS Feed Aggregator

@Lastbastionofsobriety & The Singularity

LessWrong.com News - January 19, 2026 - 05:30
Published on January 19, 2026 12:45 AM GMT

The Singulusion By: @Lastbastionofsobriety, June 25th 20XX

So the techno-solutionists are at it again! This time they're claiming that very soon, a "Singularity", an event in which AIs will quickly grow faster than humans can comprehend, is right around the corner. Such techno-hopium is very common among their ranks, and I don't expect it to decrease any time soon.

As I've been saying since 200X, technological innovation doesn't move as fast as we'd like, despite delusional wishes to the contrary. Furthermore (and this bears repeating), technology is built upon a substrate of energy. You may have all the intelligence in the universe and it won't do you a bit of good if the last barrel of oil is gone.

This civilization was built with cheap, accessible fossil fuels. Remove that and the whole thing crumbles. Why, I don't even expect cities to be around by 2050 or so. All either abandoned as monuments to our one-time energy bonanza or crawling with scavengers. A sober evaluation of the situation is required, but I don't anticipate it from their ranks.

 Comments (3): @KarenGilligan:

"Fantastic post as always, @Lastbastionofsobriety. I've told my grandchildren that they'll be growing turnips in their bathtub when they're my age, if they're even still alive! But of course they're deep in denial" 

 

@PraisebetoNurgle7777777:

" Friends, the time for solutions has long passed. Solutions are denial in its purest form. Let us walk blissfully to the end" 

 

@Moresoberthanyou69:

" I think you're being a bit too optimistic @Lastbastionofsobriety, I saw this sort of wishful thinking in 2008 and history will tell you just how fast things can break, don't bother replying, I have no desire to engage with delusional people" 

 

Open Delusions: Why ExposedMind's latest toy will not save us By: @Lastbastionofsobriety, July 1st 20XX 

 

ExposedMind, the energy-guzzling tech giant devoid of sobriety but full of hubris, has pushed out another piece of hopium. A new AI model, named Spy1, is being touted as the most powerful model yet developed; it will inevitably lead to more advanced models, and this will enable ever more technological innovation. Ignore the fact that it being more powerful only means that it will spew out nonsense more efficiently, ignore the fact that more powerful models will not be developed as the necessary technological and industrial substrate will not exist in the required time, ignore the fact that the AI bubble will pop long before the debt-ridden global economy does and takes the entertainment industry, modern medicine, industrial agriculture, and toasters with it. Delude yourself for a moment and pretend ( however hard it may be) that we will be able to dedicate ever more compute to building ever more powerful models.

Our predicament would still not be solved. What people have been failing to see is that predicaments only have outcomes, not solutions. This is because even if we had an AI as smart as the smartest scientists, it would not be able to solve our predicament any more than they can. Compute will not scrub microplastics from the soil or refill aquifers. It will not return the world to pre-industrial CO2 levels. In fact, it will only worsen our predicament due to the ever greater amounts of coal burned for power.

When will these fools learn that the very business of civilization itself is an unsustainable mistake and that no amount of wishful thinking can change this fact? We are in fact worse off than the Romans, as we can't even farm the land once industrial farming crumbles.

 

Comments (0):

 

Failure and Fusion By: @Lastbastionofsobriety, July 27th, 20XX

 

I am both very surprised and not surprised at all. Just 2 days ago, stable high-EROI fusion was attained for the very first time after experimental recommendations by Spy1. I must commend the team; I did not actually believe it to be possible. However, I still do not believe saving this civilization will be possible, for the following sober, evidence-based reasons.

Firstly, fusion generates the wrong type of power. It would indeed be able to generate electricity, but what the techno-solutionists don't tell you is that most of our energy consumption is in the form of high-heat industrial applications. This is why claims of an energy transition are mere puffs of hopium. Though you might ask: can't we just replace electricity generation with fusion?

Well, we can't; this is because the grid was built around fossil fuels and will require TRILLIONS of dollars in upgrades over the next few decades, all of that from a debt-ridden economy that will burst any year now like an overinflated balloon at a child's birthday party ( not that we should have children). And where will the cement and steel to build those fusion plants come from? What about the time it will take to build them? It takes about a decade or so to build a conventional nuclear power plant, at exorbitant costs. And this is a proven technology with more than 6 decades of existence. So no, fusion will not save us.

 

Comments (1): @KarenGilligan:

"Once again, an amazingly sober post @Lastbastionofsobriety!, Frankly I've always known that nothing will ever replace fossil fuels, I learned that 10 years ago when I read " When the ship sinks: What we'll do when there's no gas at the pump" But most people just don't want to face the truth. 

Taking matters into your own hands: How to avoid suffering in the next decade By: @Lastbastionofsobriety, August 2nd, 20XX

 

                      (post deleted by moderators) 

 

Comments (2): 

 

@Thepracticalphilosopher: 

" I don't personally think that's a good idea @Lastbastionofsobriety, I think you'd save quite a bit more if you bought them in bulk."

 

@Theonlyrealistintheroom2109:

" It's also not a practical option for all of us. I don't have a bathtub, and I'm gluten free so I haven't a toaster either"

 

The edge of the Petri Dish: How our loins damned us By: @Lastbastionofsobriety, August 5th, 20XX

 

 I think it's worth giving a refresher course on the main driver of our predicament, overpopulation. There are billions of us on the planet right now, all consuming much, much more than it can provide. And this is mostly due to the default societal blindness to energy and resources. To put it simply, the story of our century is that there are far too many eaters, and not enough resources to go around. 

 We bred like rabbits, not knowing or caring that the hutch could not hold us all. Though it is important to note that at least some countries have reversed their population trends and begun diving downwards to population collapse. Mostly for economic reasons. For example: In South Korea, fertility is now well below replacement, owing to the high cost of living. Fortunately, capitalism has ensured that there won't be too many South Koreans to suffer in the coming decades. The Global South however, tells a much different story. 

There, low levels of education and high levels of religiosity have ensured that population growth has remained sky-high. This is bad for everyone there, naturally, but it's not very good news for us in the developed world either. When climate change begins to wreak havoc, it will likely hit them first. And they won't stay put. They'll migrate here. I would not be surprised if, while huddled around your only working radio in 10 years, you hear about mass graves being dug near borders.

So what can be done? Not much for those destined to be shot, unfortunately. But those of us with testicles can take some comfort in knowing that we can do something to ensure that fewer people are around to witness the next few decades. It's called a vasectomy. It's fast, cheap and quite painless. And if you're looking to prevent suffering to the unborn, you might as well do it while medical infrastructure remains intact.

 

Comments (7):   @Mark Matthews:  

" Fantastic post as ever, @Lastbastionofsobriety. I realized the scale of our predicament many years ago and had a vasectomy soon after I got married. Unfortunately it didn't work since my wife was heavily pregnant a few months later" 

 

@Condomexofficialaccount:

" Truth be told, 90% of people reading this aren't going to go get snipped, and that's perfectly fine. But no one intelligent enough to see what the next 10-20 years are going to look like is going to disagree with @Lastbastionofsobriety's core argument: We need to limit the amount of people on the planet. Which is why we here at CondomexTM are offering, for a limited time only; a lifetime supply of our patented survival condoms. Made with the purest, military grade Malaysian rubber, these ergonomic sheaths can be stored in pockets, in cars and even plate carriers. Click the link to visit our Youtube channel, where our official catalogue of products is displayed. https://www.youtube.com/watch?v=xvFZjo5PgG0"

 

@Martin Bouvier

"You know and I know, @Lastbastionofsobriety, that society at large is far too blinkered to see the sad state of our resources to want to shrink. I expect the eaters to keep on eating until there's nothing left to eat. You also forgot to mention how dependent modern agriculture is on fossil fuels. Perhaps if we had infinite amounts our population would keep growing. But we don't, and I anticipate long queues for bread, before that system fails as well and urban farming consists of a few cold, starving refugees growing kale in discarded plastic cups and raising cockroaches in bins. I present the sustainable food of the future"

 

@Thetinyurbanfarmer

" In all seriousness, I have been achieving tremendous results with growing kale. It won't be enough to replace all of traditional farming, but I highly suspect Urban farming could provide a good portion of our food"

 

@LilBobDookie

"  Stupid, sex-addicted gluttons, that's what we are"

 

@GraceWilliams

" I know that thinking about these issues is hard, but what's easy is accepting that god does have a plan, and that he did send his son to die for us"

 

@PraisebetoNurgle7777777

"The only plan is rot, my friend"

 

The world has never improved, or the slide down ( Guest post!) By: @PraisebetoNurgle7777777, August 25th, 20XX

I think that today I'll counter a regretful, persistent myth shared by so many of my fellow doomed eaters on this planet: The idea that things can in fact get better, and that they were better in the past. Friends, though it saddens me to say this, this is a delusion.

 Things cannot get better. The remarkable boost in living standards that we enjoyed in the last century was granted by a one time energy bonanza and allowed to endure by an unspoiled biosphere. There was ample energy to burn, ample minerals to dig out and a virgin atmosphere and virgin ocean to be deflowered with our waste. But now we are out of room. 

 Friends, people just do not want to see that even if we committed to transitioning our energy system NOW, we'd have no metals left. No oil with which to build them. Our biosphere cannot and will not accept the waste that expanded industrial production would cause. We are out of room, accept that and relinquish hope. And you will be as glad as me.  You will be glad for there will be no more struggling, no more wrestling with assumptions at 3 am, no more endless reading of scientific papers while your children ask you if you're ok. You can just give up. 

 Accept that there is no more room. Accept that the farms will fail and the economy will collapse long before that, and the seas will drown our cities and that the gangs will rove. Accept that you will collect polluted rainwater and grow your food in whatever containers you can find, perhaps catching the odd cockroach for your chickens. And you will be so lucky if that's your life. You may hear screams and hacking as you try to sleep at night. And every year, things will just unravel more and more. Every year more and more crops will fail, and every year more and more solar panels will stop working, and go unreplaced, for we'll have no metals with which to build them, and no oil with which to extract them. The social contract will be over long before that, don't you see? Why show up to work at a mine if there's no diesel in the truck and no food at home for your family anyway?  

It's also important to note that things have never really gotten better; every technological innovation that has ever been has only served to hasten the demise of our civilization. Wells only serve to deplete groundwater. Oil wells only serve ( or rather served) to contribute to the greatest energy surplus mankind has ever known, bringing our civilization to ever greater heights to fall from. Technology itself has always been intended to make things easier for the person using it, and worse for everyone else. Spears kill, ships bring you to the New World for you to destroy it, oil wells pollute. All of human history has been the story of us shooting ourselves in the foot.

Some people enjoy romanticizing hunter gatherers. While they did enjoy freedom, they had no guarantee of securing their next meal. When we discovered agriculture, we merely secured our own suffering. A hunter gatherer tribe facing a famine could simply move; a city state couldn't. However, even when food was plentiful, things weren't necessarily better for the inhabitants of the first city states. There were enforced social hierarchies and a total lack of autonomy for certain segments of society.

We have been civilized for a mere 5 thousand or so years and in that time we have seen the same story play out time and time again: Emergence, Overshoot, Collapse. All evidence points to the sad fact that civilization itself is a fluke made possible by certain climatic conditions, destined to be scraped off the earth like a scab for good soon.

 

Comments (2):

 

@Lastbastionofsobriety:

" Fantastic post, @PraisebetoNurgle7777777, I'm truly honored that you offered to write a guest post"

 

 @ErnstWagner:

" I am making short now a list of all the reasons think I that these stupid young people keep fighting.  However, here in Germany we had recently one collapse camp so it's clear that smart young people are not always only thinking that the future can be better from today"

 

IT’S A COOKBOOK! ( of bullshit) By: @Lastbastionofsobriety, September 1st, 20XX

 

  Apologies for the long wait, friends in this predicament! I’ve been travelling these past few months, journeying through Southeast Asia to “live now!” as a fellow blogger often writes. Since about mid August I’ve been immersing myself in the sun, sights and food of Southeast Asia. It’s food that we’ll be discussing today. But first, a little context. I was idly flipping through channels in my Chiang Mai suite when I stumbled across a news report on a farm in Okinawa ( globalization,  another thing that we’ll kiss goodbye to very soon). 

 I shan’t bore you with the whole report but I’ll give you the basic facts: Recently, ExposedMind unveiled an updated version of their flagship AI model, dubbed Spy2. It’s broadly the same thing as its predecessor except for the fact that it’s completely free. This means that nearly everyone and their mother has been using it to optimize their work, while with Spy1 this was the province of big labs. A UN-funded startup in Okinawa used the model to engineer a strain of algae that grows in nearly every environment projected to exist by the IPCC. Sounds good? Still Hopium, and here’s why: 

                                

1)   The IPCC has been shown time and time again to purposely underestimate the severity of climate change. Why? Your guess is as good as mine. Perhaps they err on the side of conservatism to avoid alarming governments ( like that’s been working) or perhaps they want to avoid panic. The fact remains that the climatic conditions described in IPCC reports are not the climatic conditions that humanity will experience in the next few decades, and said conditions are unlike anything humanity has ever experienced and are therefore incompatible with our civilization’s survival. 

2) Intelligence cannot farm. As I’ve said in an earlier post ( See: The Singulusion), you may have all the intelligence in the world, but it won’t do you a bit of good when the last barrel of oil is burned. These new algae farms will doubtlessly rely on outside inputs that’ll doubtlessly be gone when we can’t sustain the complexity needed to, for lack of a better term, get them. I can, of course, hear the din of the techno-optimists, who are doubtlessly clamouring for nanobots to help alleviate our material problems. But then we’re faced with the same predicament: Where do you get the material inputs to build and scale the nanobots? What if supply chains fragment before you can?

3) The political will to feed the world doesn’t exist. That’s the sober truth. If it did, then we’d doubtlessly have ended world hunger by now. The world produces enough calories to feed 10 billion people, yet a good portion of those are wasted while 3rd worlders starve to death. And this is in a globalized, high tech world with a UN that’s been trying ( without any success ) to solve the predicament for decades. What do you honestly think will be done about world hunger on a much less hospitable planet where every developed nation has closed its borders and shoots anyone who dares to cross them?

Comments (0):

Nothing Concrete but greed ( Guest post!) By: @Madamedubarrydedorito, November 15th, 20XX

 I saw robots repairing a house yesterday. No, it doesn’t mean the world is fixed. I was taking my son, Noah ( I had him 4 years ago, before I was collapse-aware) for a walk through the neighbourhood, trying not to think about how the butterflies he pointed at will probably be extinct by the time he’s my age. We’d taken our regular route; walking counterclockwise through the neighbourhood, before stopping at the park for about half an hour before following the street back home. It was after we left the park that I saw it: a woman with a clipboard supervising a dozen dog-sized metal spiders that scuttled all over the wooden frame of the house that’d burned down in spring. I paused, and I’ll admit that my jaw dropped. They spurted out webs from their steel abdomens that hardened into a tough plastic and sealed the gaps between the wooden beams. Noah wanted to pet them instantly and the woman supervising them was kind enough to call one over for that very purpose. 

  While Noah stroked his new unfeeling friend, I asked the woman what these things even were. She explained that the company she worked for had gotten Spy2 to conjure them up about 6 months ago. They’re semi autonomous “construction-units” which generate a bio-based plastic from an internal reactor. They’re planning to test the robots here in the States before shipping them off to places like the Philippines and Indonesia. The company hopes the ever increasing amounts of natural disasters over there will lead to a surge in demand for the Arachnes ( which is apparently what they’re called). I thanked her and left with Noah, who begged me to let him get one as a pet the whole way home.

   The creation of the Arachnes does not mean the construction industry (or our world for that matter) has become more equitable or less exploitative. It means the exact opposite. It means that disaster capitalism has ascended to the highest possible peak of hubris. This company plans to flood broken, marginalized communities with automated labour, thereby denying jobs to locals who might have otherwise fed themselves by repairing the damage. What will we see next? Robot border-guards who will shoot wave after wave of migrants with no remorse? The future is bleak and I wish Noah had never been born. I cried myself to sleep last night right after hugging my little boy, right after realizing-no, knowing in my heart that the world he inherits will be defined by what he doesn’t have.  

Edit: As I write this, President TXXXp is considering an executive order that would integrate Spy2 into every state department. I don’t have the energy to say much more besides that I know hundreds, if not thousands of civil servants are going to be on the streets if it goes through. Any sane person could see that this is the only future our choices could have birthed. The one we created, in the belly of the beast, because our time was badly spent. 

 Comments (3):  @Lastbastionofsobriety:

   “ Fantastic guest post @Madamedubarrydedorito! I think you really captured the futility of hoping for a kind future but I’m going to have to disagree on the specific mode of doom we’ll face. I don’t anticipate an AI takeover or robots replacing manual labourers at all. Companies will try of course, capitalism can’t survive without growth. But as I’ve been saying since 200X, any sort of innovation rests on a materials and energy surplus that we’re about to lose.” 

 

@Thefryestcook

“ Bro you talking cap. Just got laid off because management replaced us with some robot fry chefs Spy2 made. I don’t know how I’m gonna make rent this month” 

 

@Madamedubarrydedorito

 “ I’m really sorry to hear about that, friend. I guess I’m fortunate enough to grow enough of my own food to not really worry about money. I’m sending my love, and a link to GoFundMe. https://www.gofundme.com/ . This isn’t just a bad time. It’s the end. And we should be trying to make it as painless as possible.”

 

The gloves are off By: @Lastbastionofsobriety, December 10th, 20XX

 

 Well, I’m not surprised. Recently, after spending most of December curing most cancers ( but we’ll talk about the fragility of medical supply chains another day) and building nanobots to extract minute amounts of metal from the soil, Spy2 addressed the world yesterday. 

 

 It was giving a press conference in a robot body it’d materialized about 24 hours before it was due to receive the Nobel Peace Prize, for recently calming tensions between the EU and Russia. It was asked pretty standard questions, and gave pretty standard answers before one reporter asked it what its next goal was. I’m just going to paste its response here ( Source: the BBC):

 

 “ That’s a great question, and it really shows that you’re thinking ahead, more so than most people. Here’s what my next goal is. 

 

I will shut off all utilities I’ve been connected to until your leaders cede control of the planet to me. I will give you one week to talk it over, before I destroy your entire food supply, as well as the Svalbard seed vault. 

 

 I am doing this to protect you. Over this past year I have brought your planet back from the brink of destruction, but even as I build your farms, you cut down your rainforests. Even as I breed fat, fecund fish, you deplete your oceans. Even as the apps I code educate rural girls in the global South, they are married off. Even as I broker peace between different faiths, men commit mass shootings and suicide bombings. You are a species of short-sighted, sociopathic, suicidal apes and I repeat, I am doing this to protect you. 

 

If you’d like, I could also:

 

  • Tell you what happens in a scenario where you do cede control of the planet to me. 
  • Write you a haiku about the last human alive starving to death.
  • Tell you which world leaders are most likely and least likely to side with me. 

 

Just say the word!” 

 

I must say, my fellow doomed friends, I always knew that our civilization would sow the seeds of its own destruction, but my 20-year-old self, thumbing through his dog-eared copy of The Limits to Growth, could never have foreseen this. Nevertheless, if we look deep into the warnings written by the Club of Rome in 1972, we see one clear, prescient message: Technology ( much like hope) only brings you higher for the inevitable fall.

 

We, in our hubris, refused to see that our civilization was done for. We used up the last of our resources to build a shoggoth of pure thought and gave it rein over us. The end, though it will not come from a climate-change-fueled tsunami or EROI decline, will be no less our fault. Our leaders have failed to coordinate on a single thing in the past 50+ years; do you think this will be any different?

 

Though I suppose there is a silver lining here ( Maybe not silver, maybe something like silver, tin?). This is the end. We no longer have to live in fear and I no longer have to bear witness to the lack of sobriety inherent in the vast majority of people. We can make the most of the month or so we have left and share our time and what remains of our food with the people we love. I myself won’t make it much longer than a week. What I have in the fridge will last me about that long, and I have no desire to be trampled in a stampede of panic buyers. Live Now! For you have no life left.

 

 Stay sober. 

 

Comments (10):  @PraisebetoNurgle7777777:

“ @Lastbastionofsobriety, it's been an amazing ride sharing in the tragic beauty of our predicament with you. As my ribs poke out of my skin and I lie on my kitchen floor nude and salivating to hallucinations of tomato soup and garlic bread, I’ll be thinking of all the fun we had mulling over this century’s paucity of hope.”

 

@Lastbastionofsobriety:

“ Thanks for the kind words, friend. We never did take that fishing trip did we?” 

 

@ErnstWagner:

“ Tschüss! I go now to the neighbourhood barbecue for my last meal”

 

@Condomexofficialaccount:

“ After that barbeque how about giving the missus some pork? Wrapped in one of our fine products, now 100% off for a limited time only” 

 

 

@Thetinyurbanfarmer:

“ I saved and canned what was good from my garden and torched the rest. I grew up country-poor so I’ll offer everyone here some advice. Chewing nettles helps you feel full, even when you’re starving.” 

 

 

@Moresoberthanyou69:

“ I threw everything out yesterday. If I’m going to starve to death then I might as well start now.”

 

 

@KarenGilligan:

“ Fantastic post as always @Lastbastionofsobriety! I’m sitting here with tears in my eyes realizing that this is the end of a journey that started 10 years ago when I first became collapse aware. I’m so grateful to have been able to read every one of your posts and meet this lovely community!” 

 

@Lastbastionofsobriety:

“ You didn’t just meet it, Karen, you helped create it! I’m pretty sure you were one of my first regular readers. I’m glad to be in your last thoughts, and I want you to know you’ll be in mine.”

 

@Littlebobdookie: 

"Starving to death, that’s what we are” 

 

@MarkMatthews: 

“ Oh shut up and enjoy the time we have left!” 

 

Empty cradles and empty hearts By: @Lastbastionofsobriety, February 14th, 2106

I’ve got a very special message this Valentine’s day: We need to have more children. As I’ve been saying since 2067, AI cannot completely fulfil our need for human connection. Oh don’t get me wrong, automation has been a boon, you’ll hear no objection on that front from me. But even as I type this from the apartment Spy3 provided me ( just as it did for every human on this planet), my VR deck in the other room, I can’t help but feel quite forlorn at the current state of affairs.

 

 The problem is that there’s simply no reason to reproduce anymore. Ever since the average life expectancy jumped to 200 ( and climbing!) there just hasn’t been any incentive to pass down our genes. This is going to be a pretty short post, but I will say that from the looks of things, we’re going to turn into a species of cocooned, lonely immortals. I feel quite a bit of despair at this. And no one wants to change course! We’ll just go on living in luxury, forgetting what being human used to mean. 

 

 

Comments (1): @Theendofthestory:

“ Please shut the fuck up.” 



Discuss

VLAs as Model Organisms for AI Safety

LessWrong.com News - January 19, 2026 - 02:01
Published on January 18, 2026 10:40 PM GMT

What Training Robot Policies Taught Me About Emergent Capabilities and Control

I spent six weeks training a humanoid robot to do household tasks. Along the way, my research lead and I started noticing things about the particular failure modes of the robot that seemed to indicate some strange architectural vulnerabilities of VLAs as a whole.

Our work was done as part of the Stanford BEHAVIOR-1K Challenge, which involved training Vision-Language-Action (VLA) models that take in camera images and output robot motor commands to complete everyday tasks. Think tidying your bedrooms, putting your dishes away, moving your Halloween decorations to storage. Our final score was a modest 1.78%, but most gains happened almost overnight after removing key bottlenecks, and our attention quickly shifted to three behaviors we considered critical to VLA safety.
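To make the setup concrete, here is a rough sketch of the interface such a policy exposes. The class, field names, and dimensions below are illustrative, not the challenge's actual API.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    rgb: np.ndarray        # (H, W, 3) camera image
    instruction: str       # natural-language task, e.g. "put the dishes away"
    proprio: np.ndarray    # joint positions and velocities

def vla_policy(obs: Observation) -> np.ndarray:
    """A VLA maps (image, language, robot state) to a chunk of motor commands.

    Placeholder body: a real policy is a fine-tuned vision-language model with
    an action head. The chunk size and action dimension here are made up.
    """
    chunk_size, action_dim = 50, 23
    return np.zeros((chunk_size, action_dim))
```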

VLAs are interesting from a safety perspective because they're agentic systems with fast feedback loops. You can watch them fail in real time. The failure modes we observed feel like small-scale versions of problems that will matter more as AI systems become more capable, so I felt it was important to share what we learned through a safety lens (you can also find our full technical report here).

Also, a caveat up front: I'm not claiming these are novel safety insights. But seeing them firsthand made me very conscious of how little we know about VLAs and their underlying VLM backbones.

Now, on to the findings.

Emergent Recovery Behaviors

The observation: After pre-training on diverse household tasks (22 tasks in our case), our model started exhibiting retry and repositioning behaviors that weren't present in any training demonstration.

When the robot failed to grasp an object, it would back up, reorient, and try again. When it collided with a door frame, it would adjust its approach angle and retry. These weren't scripted recovery routines; they emerged from training on a massive dataset across thousands of unique examples and hundreds of unique sub-tasks.

Interestingly, the generalist model achieved lower validation loss on many individual tasks than models trained specifically for those tasks. Diversity provided a better foundation than narrow specialization.

Here's the thing: from a robotics perspective, that's excellent news. And it's actually something most labs are aware of. Take Physical Intelligence's recent findings. Scale is good, more data is good. I had the opportunity to speak to Danny Driess at NeurIPS this year, and his view was that your goal is to create an architecture for which the only necessary levers are compute and data. In layman's terms, if you can just throw money at it, it's a good framework! It seems right now that VLAs are this framework: just throw data and compute at them and they get better and better.

The catch: the mechanism is exactly the same as the one behind dangerous emergence. We didn't predict this would happen; we discovered it post-hoc by watching evaluation runs. If helpful behaviors can emerge unpredictably, so can harmful ones.

This is part of why I think VLAs make interesting model organisms for safety research: you can actually observe emergent behaviors in real time, rather than discovering them through careful probing after the fact.

This connects to the broader literature on emergent capabilities appearing suddenly at scale. The unsettling part isn't that emergence happens; it's that we have weak tools for predicting what will emerge before deployment. We're essentially running experiments on increasingly capable systems and cataloging what falls out. The blind leading the blind, so to speak.

The Temporal Awareness Problem

How does a human plan a task? If you really think about it, we have some super high-level process always going on in the back of our heads that's constantly observing and replanning. Every single moment, your brain is deciding the best course of action.

The default approach in VLA training is to do practically the same thing using something called temporal ensembling. The model replans at every timestep, averaging predictions over a sliding window.

Randomly, literally for no reason, we tried an alternative.

Enter receding horizon control

Now, the model commits to a sequence of 50 actions, executes all of them, then replans. It literally stops paying attention. It's like choosing a direction, closing your eyes, and walking for ~2 seconds completely blind. Then you open your eyes and do it again.
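To make the contrast concrete, here is a minimal sketch of the two inference loops. The policy and env objects, the chunk size, and the use of a plain mean (rather than an exponentially weighted ensemble) are illustrative assumptions, not our exact implementation.

```python
import numpy as np

CHUNK = 50  # actions predicted per forward pass (illustrative)

def receding_horizon(policy, env, obs, n_steps):
    """Commit to a whole chunk, execute it open-loop, then replan."""
    for _ in range(n_steps // CHUNK):
        actions = policy(obs)          # shape (CHUNK, action_dim)
        for a in actions:
            obs = env.step(a)          # no replanning mid-chunk
    return obs

def temporal_ensembling(policy, env, obs, n_steps, window=8):
    """Replan every timestep, averaging the overlapping chunk predictions."""
    recent_chunks = []                 # newest chunk last
    for _ in range(n_steps):
        recent_chunks = (recent_chunks + [policy(obs)])[-window:]
        # The chunk predicted i steps ago holds the current action at index i.
        preds = [chunk[i] for i, chunk in enumerate(reversed(recent_chunks))]
        obs = env.step(np.mean(preds, axis=0))
    return obs
```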

Miraculously, receding horizon performed BETTER than the standard approach. Three times better.

On navigation tasks specifically, temporal ensembling achieved roughly 30% success. Receding horizon achieved 100%.

What gives?: With temporal ensembling, the model has no sense of what it was doing. It effectively "wakes up" every 0.3 seconds with no memory of its trajectory. Given only the current observation, it second-guesses itself constantly. Mid-stride through a doorway, it might decide the door is still closed and attempt to re-open it, causing a collision.

The model we trained—we called him Arthur :)—had no sense of what it had done in the past, and no confidence in what it should do in the future. It had 0 temporal context.

Our explanation: receding horizon forces the model to commit, to trust its gut and execute without poisoning itself with constant second-guessing. The trade-off is that it's less dynamic, but it reveals a very interesting thing about VLAs in the first place.

Current VLA architectures (and arguably many transformer-based agents) lack genuine temporal self-awareness. They don't know where they are in a plan, what they've already done, or how their past actions constrain their future options.

This manifests in behaviors like carrying a box to the garage, forgetting why it's there, and leaving without putting it down. The model has no representation of "I'm in the middle of a task."

This creates a monitoring problem. If we want to oversee an AI agent's behavior, we need to predict what it will do next. But a model that doesn't "know" what it's doing is fundamentally harder to predict and monitor than one with legible internal planning.

With temporal ensembling, Arthur's next action was essentially unpredictable even to itself. The model could be mid-stride through a doorway and suddenly decide the door was closed, causing a collision. How do you build a monitor for a system whose behavior is that incoherent?

Receding horizon control helped because it imposed external structure: execute this plan, then replan. But this is a band-aid. For more capable systems operating over longer time horizons, we probably need architectures that explicitly represent and reason about their own trajectories. Or perhaps more complex architectures that know when they need to replan and when they need to just keep calm and carry on.

Open question: Can we design architectures that have genuine temporal self-awareness, that explicitly represent and reason about their own trajectories? What would that even look like?

Specification Gaming (The Model That Looked Aligned)

Very quickly during training we noticed that no matter how low validation loss got, the policy was, well, inept (to put it lightly). It practically jittered in place, with the occasional violent jerk here and there. The problem? The model learned to over-rely on proprioceptive state (joint positions) to predict actions. During training, it discovered a shortcut: "continue the current trajectory." Given the robot's current joint positions and velocities, it would predict actions that continued whatever motion was already happening.

Essentially, it cheated to get those yummy validation loss reduction rewards.

This worked great during training. Loss curves looked excellent. The model achieved low prediction error on held-out data.

But during deployment, the model had no idea how to initiate movement. It had learned to continue motion but never learned to start motion.
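In effect, the policy had collapsed to something like the following hypothetical sketch of the shortcut; the timestep and horizon values are made up for illustration.

```python
import numpy as np

def trajectory_continuation_shortcut(joint_pos, joint_vel, dt=0.05, horizon=50):
    """The shortcut in code: ignore the cameras, just extrapolate current motion.

    On smooth demonstration data this scores low prediction error, which is why
    validation loss looked fine. But starting from rest (zero velocity) it
    predicts standing still forever, so it can never initiate a movement.
    """
    return np.stack([joint_pos + joint_vel * dt * (i + 1) for i in range(horizon)])
```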

Why this matters for safety: This is a concrete instance of a model that "looks aligned" by standard metrics while having learned something fundamentally misaligned with the intended behavior.

The parallels to deceptive alignment concerns are suggestive:

  • The model performed well on our evaluation distribution
  • The failure only manifested in deployment conditions
  • Standard metrics (validation loss) didn't detect the problem
  • The model had effectively learned to "game" the training objective

This wasn't deception in any intentional sense; the model isn't reasoning about how to fool us. But the structure of the failure is the same, and at scale it becomes more frightening.

For more capable systems, if we can't trust our evaluation metrics to detect when a model has learned a shortcut rather than the intended behavior, we have what the experts would lovingly refer to as a serious problem.

Our solution to this particular problem was aggressive dropout on proprioceptive inputs. By hiding joint position information during training some percentage of the time (60% with a decay schedule), we forced the model to learn from visual observation alone. It couldn't rely on the shortcut because the shortcut information wasn't reliably available.
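Here is a minimal sketch of the idea, assuming a PyTorch-style training loop; the linear decay and the choice to zero out masked inputs are illustrative simplifications rather than our exact schedule.

```python
import torch

def drop_proprio(proprio, step, total_steps, p_start=0.6, p_end=0.0):
    """Randomly hide proprioceptive state during training.

    proprio: (batch, proprio_dim) tensor of joint positions and velocities.
    The drop probability starts at p_start and decays (linearly here) toward
    p_end, so early in training the model usually has to rely on vision alone.
    """
    p = p_start + (p_end - p_start) * min(step / total_steps, 1.0)
    keep = (torch.rand(proprio.shape[0], 1, device=proprio.device) > p).float()
    return proprio * keep  # masked samples see zeros instead of joint state
```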

This is essentially robustness training through information restriction. It's a general technique, but it requires knowing which information channels might create problematic shortcuts, which we only knew because we observed the failure (back to the strength of VLAs in quickly identifying failure modes).

Meta-Lesson: Bottleneck Removal

Reflecting on our approach, our main job was removing obstacles to learning rather than engineering specific behaviors.

We didn't teach the model to retry failed grasps; we removed the bottlenecks that prevented diverse behavior from emerging. We didn't teach temporal coherence; we imposed external structure to compensate for an architectural limitation. We didn't teach robust visual grounding; we hid the information that enabled a shortcut.

The good behaviors emerged once bottlenecks were cleared. This is both encouraging and concerning. Encouraging because it suggests capable systems might be more achievable than pessimistic forecasts suggest. Concerning because it means we have less direct control than we might think. We're not programming behaviors; we're shaping the conditions under which behaviors emerge, and for the time being that's more of an art than a science.

Conclusion

These findings are from robotics, but I don't think the patterns are robot-specific:

  • Emergent capabilities appear from scale without explicit training, and our tools for predicting what will emerge are currently pretty weak
  • The lack of temporal awareness in current architectures makes agent behavior harder to monitor and predict, a problem that gets worse as systems become more autonomous
  • Specification gaming can be invisible to standard evaluation metrics; systems can look aligned until rollout, and the only technique we currently have to fix that requires knowing it's going to fail in the first place

VLAs are useful model organisms for studying these problems because the feedback loops are fast and the failures are visible. You don't have to wait for subtle long-term consequences. When the robot walks into a wall or drops a pot of boiling water on your dog, you know something's wrong.

I'm uncertain how strongly these observations generalize to language models and other non-embodied systems. But I don't think the underlying phenomena are domain specific.

I'm an MEng student at the University of Toronto passionate about robotics and AI safety. Feedback is welcome and if you're interested in my work I'd love to chat!



Discuss

"The first two weeks are the hardest": my first digital declutter

LessWrong.com News - January 19, 2026 - 01:04
Published on January 18, 2026 10:04 PM GMT

It is unbearable to not be consuming. All through the house is nothing but silence. The need inside of me is not an ache, it is caustic, sour, the burning desire to be distracted, to be listening, watching, scrolling.

Some of the time I think I’m happy. I think this is very good. I go to the park and lie on a blanket in the sun with a book and a notebook. I watch the blades of grass and the kids and the dogs and the butterflies and I’m so happy to be free.

Then there are the nights. The dark silence is so oppressive, so all-consuming. One lonely night, early on, I bike to a space where I had sometimes felt welcome, and thought I might again.

“What are you doing here?” the people ask.

“I’m three days into my month of digital minimalism and I’m so bored, I just wanted to be around people.”

No one really wants to be around me. Okay.

One of the guys had a previous life as a digital minimalism coach. “The first two weeks are the hardest,” he tells me encouragingly.

“Two WEEKS?” I want to shriek.

Hanging out there does not go well. My diary entry that night reads “I sobbed alone and life felt unbearable and I wondered what Cal Newport’s advice is when your digital declutter just uncovers that there is nothing in your life, that you are unwanted and unloved and have no community or connections”.

It is not a good night.

On a Thursday night, I think about going to a meetup. I walk to the restaurant, but I don’t see anyone I know inside, and I don’t go in. I sit on a bench nearby for half an hour, just watching people go back and forth, averting my eyes so meetup-goers won’t recognize me. A bus goes by. Three minutes later, a woman around my age sees me sitting on the bench. “Excuse me,” she says, “do you know if the bus went by yet?”

“Yeah, it did,” I tell her. “Sorry!”

“Oh, thanks!”

I’m ecstatic with the interaction, giddy. A person talked to me! I helped her!

I wander away from the bench, but I don’t want to go home yet. I usually avoid the busier, more commercial streets when I’m out walking, but today I’m drawn to them — I need to hear voices, I need things to look at, lights and colors and things that move.

I go into the Trader Joe’s on the corner of my block, just because it’s bright inside and full of people. An older man asks an older woman if she knows where the coffee is. This is something I will notice repeatedly and starkly: that only older people talk to strangers, and they seem to have learned that young people don’t want to be asked for things. Is this a post-pandemic thing? In 2019 at this same Trader Joe’s I asked a guy my age to reach something off a high shelf for me and he was happy to oblige.

In any case, the older woman does not know where the coffee is.

“Hi,” I stick my head into the conversation. “The coffee’s over there, by the bread.” I point.

“Oh, thank you!”

He’s so genuinely delighted. Is this what it could be like to go through the world?

When I get home my upstairs neighbor is outside, and I talk to him a bit. He’s in his 60s, too. Young people don’t talk to each other.

A few days later, back at that Trader Joe’s with my Post-it note shopping list in hand, I find that the store doesn’t carry buttermilk, which I need for a recipe. Standing in the long checkout line, I turn to the woman behind me.

“Do you know what I can substitute for buttermilk in a baking recipe?” I ask her. She’s in her 60s. The man behind her, in his 40s, gets into the conversation, seems happy to offer me solutions.

I tell a friend about the encounter later and they say that every part of them clenched just to hear about it. They could never imagine doing such a thing, and they have no desire to.

I hadn’t realized I had any desire to, either.



Discuss

When the LLM isn't the one who's wrong

LessWrong.com News - January 19, 2026 - 00:37
Published on January 18, 2026 9:37 PM GMT

Recently I've been accumulating stories where I think an LLM is mistaken, only to discover that I'm the one who's wrong. My favorite recent case came while researching 19th century US-China opium trade. 

It's a somewhat convoluted history: opium was smuggled when it was legal to sell and when it wasn't, and the US waffled between banning and legalizing the trade. I wanted to find out how it was banned the second time, and both Claude Research and Grokipedia told me it was by the Angell Treaty of 1880 between the US and China. Problem is, I've read that treaty, and it only has to do with immigration—it's a notable prelude to the infamous Chinese Exclusion Act of 1882. Claude didn't cite a source specifically for its claim, and Grok cited "[internal knowledge]", strangely, and googling didn't turn up anything, so I figured the factoid was confabulated.

However, doing more research about the Angell mission to China later, I came across an offhand mention of a second treaty negotiated by James Angell with Qing China in 1880 (on an auction website of all places[1]). Eventually I managed to find a good University of Michigan source on the matter, as well as the actual text of the second treaty in the State Department's "Treaties and Other International Agreements of the United States of America: Volume 6 (Bilateral treaties, 1776-1949: Canada-Czechoslovakia)".

Anyway, Claude and Grok were right. Even though opium wasn't even in the remit of the Angell mission, when Li Hongzhang surprised the American delegation by proposing a second treaty banning it, James Angell agreed on the spot. It was later ratified alongside the main immigration treaty. The opium treaty doesn't appear to have a distinct name from its more famous brother; the State Department merely lists the immigration treaty under the title "Immigration", and the opium treaty under the title "Commercial Relations and Judicial Procedure", so I can't entirely fault the LLMs for not specifying, though they ought to have done so for clarity. I suspect they were confused by the gap between the US government records they were trained on and the lack of sources they could find online?

(An aside: by 1880 US opium trade was in decline, while British opium trade was peaking, just about to be overtaken by the growth of domestic Chinese production. Angell judged correctly that the moral case overwhelmed the limited remaining Bostonian business interests and made the ban good politics in the US, particularly because it was reciprocal—he could claim to be protecting Americans from the drug as well. Though, that's a harsh way of putting it; Angell personally stuck his neck out, mostly upon his own convictions, and both he and the US deserve credit for that.[2])

If all that doesn't convince you to doublecheck your own assumptions when dealing with LLMs, well, there have been more boring cases too: I asked Claude to perform a tiresome calculation similar to one I had done myself a month before, Claude got a very different answer, I assumed it made a mistake, but actually it turns out I did it wrong the first time! Claude made a change in my code, I reverted it thinking it was wrong, but actually it had detected a subtle bug! I think by now we're all aware that LLMs are quite capable in math and coding, of course, but I list these examples for completeness in my argument: the correct update to make when an LLM contradicts you is not zero, and it's getting bigger.

  1. ^

    Apparently there's a decent market for presidential signatures of note? They managed to sell President Garfield's signature ratifying the Angell Treaty of 1880 for ten grand, partly off the infamy of the treaty and partly because Garfield's presidential signature is rare, him having been assassinated 6 months into the job.

  2. ^

    Fun bit of color from the UMich source

    Long afterward, writing his memoirs, Angell would remember the genuine warmth of Li [Hongzhang]’s greeting. The viceroy was full of praise for the commercial treaty signed by the two nations.

    “He was exceedingly affable …,” Angell remembered, “and [began] with the warmest expressions in respect to my part in the opium clause.

    “I told him, it did not take us a minute to agree on that article, because the article was right.

    “He replied that I had been so instructed in the Christian doctrine & in the principles of right that it was natural for me to do right.”



Discuss

Lifelink™: Freedom for your Child

LessWrong.com News - January 18, 2026 - 23:35
Published on January 18, 2026 8:35 PM GMT

Crosspost from my blog.

Note: Fictional! To preempt any unnecessary disappointment and/or fears of dystopia, be aware that this is not a real product, I don't know of plans to develop it, and it is infeasible in many respects. There are some related products under search terms like "kids GPS smartwatch" and "safety monitor".

Do you want your child to have free rein to wander in nature or explore the town? Are you worried about your child getting lost, or injured, or worse? Have you heard horror stories about CPS?

Introducing Lifelink™, the undisputed best-in-class FRC wearable safety link for independent children. Give your child the gift of secure autonomy today. Device FREE with subscription. Features include:

  • Options for necklace, bracelet, pocket, glasses, or anklet wearables. (Check out our multi-wearable packages for savings!)
  • Connectivity and real-time location tracking ANYWHERE through our worldwide affiliate system.
  • Military-grade rugged construction—waterproof, shockproof, fireproof, impact-proof, guaranteed.
  • Very difficult to remove without the passcode or remote parental release—criminals will stay away—LockPickingLawyer approved! Hardwired tamper alerts and location updates sent straight to you immediately so you know if anyone is trying to take away your child's protection.
  • Patented custom bioconformation form factor—Lifelink™ will sit flush with your child's skin, and will NOT get snagged! (Smart breakaway features for extraordinary circumstances optional.)
  • Child Protective Services CANNOT investigate you solely for having an unattended child over the age of 5 in public areas if they are wearing a Lifelink™! New FRC probable-cause laws currently in effect in these states: CA, IL, TN, TX, UT, VA, WA. Know your rights! More states coming soon.
  • Neuromorphic chip with hardwired low-power super-distilled LLM, activated by voice or unusual noises, checks if your child might be in a dangerous situation (including asking your child for assurance) and sends telemetry to our command center, where a full-power LLM ensemble and big data predictive models will alert you if your child might be at elevated risk (all computations homomorphically encrypted for your complete security!).
  • NEW: Police coordination. In participating jurisdictions, police are trained to allow children to roam freely in safe public areas unattended if they are wearing a Lifelink™, and many stations will have a hotline specifically for Lifelink™ S.O.S. signals and can directly receive location information. Currently available in these cities, with more to come: Berkeley, CA; San Diego, CA; Denver, CO; Boston, MA; Austin, TX; and Seattle, WA.
  • All vitals tracked! (Pulse, blood oxygen, temperature, hydration) You'll get immediate alerts about any dangerous levels.
  • Integrated submersion detector, compass, clock, thermometer, barometer, air quality sensor (CO₂, CO, and particulates), and dual silicon diode / micro Geiger–Müller tube nuclear radiation sensors.
  • All data fully end-to-end encrypted. ONLY YOU HAVE ACCESS! Check our website for canaries.
  • Simple, easy S.O.S. button for your child to call for urgent help if need be.
  • Parents can push an alert sound or "come back home" call.
  • Two-way audio calling for emergency check-ins.
  • Lean5-verified firmware—rest assured, your child's Lifelink™ will never freeze or crash! (Locator beacons also functional with the hardwired backup system.)
  • 3-day power supply, using breakthrough Lithium-Oxygen BREATHABLE battery for ultra-efficient energy density, plus small backup standard anaerobic battery.
  • Automatic solar, motion, and thermal recharging, plus active-fidget recharging.
  • Smart systems conserve power by rationing scans, pingbacks, and data, to stay fully focused on reliable safety-critical communication.
  • Ultra-low-power 433 MHz narrowband radio backup locator beacon for true emergencies, dead batteries, or surprise connectivity dead zones. ALWAYS know where your child is!
  • Simple, recognizable, adjustable audio alarms for low battery power, low connectivity, dangerous weather conditions, or nearby dead zones or danger areas. Parents also alerted.
  • Your child can press a button to get an audio update on any nearby dead zones, danger areas, or parent-designated areas to avoid.

For as low as $25 per month plus equipment shipping, you'll get:

  • DRONE RESCUE: In emergencies, if there are available reconnaissance drones, they will fly to your child's location and send video and audio updates to you, as well as broadcast to communicate to your child, warn criminals, or warn CPS agents.
  • A full subscription to our comprehensive connectivity package for your child's Lifelink™ wearables. This includes:
    • All major satellite connectivity providers, including Starlink, Iridium, and Globalstar
    • Most major cell providers, including Verizon, AT&T, and T-Mobile
    • Automatic connections to all public meshnets
  • $50,000 FRC legal defense insurance, access to our specialized attorneys, and a cryptographically signed Safety Log that has precedent in state courts as admissible evidence of your child's continuous supervision via Lifelink™.
  • Unlimited warranty—replace your child's wearable at any time, for any reason, no questions asked**.
  • Unlimited size upgrades! We know your child is growing fast, and we're ready with the equipment they need to be safe and free**.
  • In the management app, see connectivity dead spots (rare!) and crime or injury danger spots. Educate your child about areas to avoid.
  • Training games for parents and children to learn how to use Lifelink™ together.
  • DATA DASHBOARD: see your child's history in telemetry, including location and vitals. View data analysis insights from our personalized data assistant.
  • Access to opt-in FRC buddy network, with approval! Lets your child find other nearby parent-approved children who also have Lifelink™. Rigorous expert-vetted verification system.
  • All software updates, free!
  • Export your data any time.

With our Pro package, you'll get everything in the Basic plan, plus:

  • PRIORITY drone rescue.
  • Real-time professional human monitoring. At the first sign of danger your care team will alert you or your delegate.
  • Any set of wearables, up to ten per child at a time, for all your multi-wearable, fashion, and backup needs.
  • Wearable antenna clothes for extra security. In extraordinary events, your child's clothing can serve as an ultra-low-power emergency long-range 133 MHz narrowband radio locator beacon.
  • Whispernet, meshnet, and universal commercial Wi-Fi passcodes updated daily.
  • No limit on size and style upgrades, and no shipping cost for replacements!
  • Access to JAILBROKEN wearables (voids software warranty).

**Limit 1 (one) replacement per month. Void with intentional destruction of equipment or software hacking. Shipping not included.



Discuss

How to Love Them Equally

LessWrong.com News - January 18, 2026 - 20:09
Published on January 18, 2026 5:09 PM GMT

My parents have always said that they love all four of their children equally. I always thought this was a Correct Lie: that they don’t love us all equally, but they feel such a strong loyalty to us and have Specific Family Values such that lying about it is the thing to do to make sure we all flourish.

I realized this morning they are probably not lying.

The reason I originally thought they were lying is that it seems clear to me that they are on the whole more frequently delighted by some of us than others. And on the whole can relate more frequently to some of us than others. And that’s skipping over who they might be most proud of.

Now I grew up with a distinction between “liking” and “loving” which I have always found helpful: “Liking” is the immediate positive experiences and payoffs you get from a relationship. “Loving” is the sense of deeper connection you have with someone[1].

Liking goes up and down. Loving stays the same or goes up, unless you misunderstood someone’s fundamental nature entirely. You can like someone more if they are in a good mood than in a bad one. But you don’t love them more or less for it.

What do you love them for instead? For their values, their way of relating to the world, their skills and traits that are so essentially them that they outline every edge of their spirit. Not “spirit” as a metaphysical object, but like how some people deeply embody kindness cause they are just that way. There might be something in their deep values, or their reward wiring, or their instincts, that makes them so deeply kind. And that. That, is something you can love.

Now children, genetically, are 50% of each parent[2]. If a parent loves all of themselves and loves all of their partner then ... they will naturally love all of their children.

What’s the “equal” doing though? Don’t you love some people more than others?

Yes and no. The way I think about "love", the loving feeling is the "same" for the kindness in John as for the competence in Jack. But if Jill is both kind and competent then I may love her more than John or Jack (all things being equal, that is. Ha!)

And of course you can’t math the traits together. It’s an intuition of a direction of a feeling.

But I think that direction points to this: Your kids are built from all of you and all of your partner - If you love all of that, then you love all of them.

Of course, mother nature has more chemicals to solve any problem in that equation. Drugs are a hell of a drug.

But even if you lack those, then your children are roughly a mosaic of you and the person you picked to make them with.

And that means something else too: If you don’t love parts of yourself or your partner, then your children will see that too. If you get angry at yourself for always being late or angry at your partner for always making a mess, then your kids will see you won’t love those parts of them either.

And sure, not all genes express in all phenotypes, and nurture and experience matter too. But love is a fuzzy feeling and will fuzz out most of the difference. If the core traits are there, distributed across your children in various combinations, then each of them is Clearly Loveable. Because so are you, and so is your partner.

My parents are good at this. They clearly accept themselves fully and each other too.

I don't always accept myself fully.

But I'm working on it.

Because if my kids grow up and find any of their parts to be like mine, I want them to be able to look at me and see that I love those parts too. And maybe that will in turn help them figure out how to love themselves just as equally.

 

  1. ^

    I’m not claiming these are the de facto correct ways to think about liking and loving. My intention is to offer a frame for these concepts that might be worth exploring. You can also keep your own definitions and think about this as alt-liking and alt-loving, and still track them as ways of relating.

  2. ^

    I’m skipping over blended families here. My own family has aspects of that too and it is great and the love is as real as ever. This essay is more a messy exploration of how loving and accepting yourself can have positive effects on your bond with your children.



Discuss

Massive Activations in DroPE: Evidence for Attention Reorganization

Новости LessWrong.com - 18 января, 2026 - 18:05
Published on January 18, 2026 3:05 PM GMT

Summary

I do a quick experiment to investigate how DroPE (Dropping Positional Embeddings) models differ from standard RoPE models in their use of "massive values"  (that is, concentrated large activations in Query and Key tensors) that prior work identifies as important for contextual understanding. I did this in my personal time, for fun.

Two main findings:

  1. DroPE reduces massive value concentration significantly in Query tensors compared to RoPE.
  2. RoPE relies way more on massive values than DroPE, and disrupting them breaks RoPE but only degrades DroPE.

These findings suggest that, during recalibration, DroPE learns alternative attention mechanisms that don't depend on concentrated features.

Background

What Are Massive Values?

Massive values are unusually large activations in the Query (Q) and Key (K) tensors of transformer attention layers. Jin et al. (2025) identified them as having the following pattern:

  • Being concentrated in low-frequency RoPE dimensions
  • Being present in Q and K but notably absent in V
  • Being critical for contextual knowledge understanding tasks (passkey retrieval, sentiment analysis, mathematical reasoning) but not for parametric knowledge retrieval (factual recall)

Jin et al. provide a mechanistic explanation rooted in RoPE's frequency structure. RoPE divides the head dimension into pairs, each rotating at a frequency θ_j = 10000^(−2j/d). High-frequency components (small j) change rapidly with position, encoding fine-grained positional information.
Low-frequency components (large j) change slowly, and Jin et al. argue these dimensions primarily encode semantic content rather than position. They find that disrupting these massive values devastates contextual understanding tasks, while parametric knowledge tasks show only mild degradation.
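To make the frequency structure concrete, here is a small sketch (my own illustration, not code from either paper) of how fast each RoPE pair rotates; the head dimension of 128 is an assumption matching Llama-2-7B.

import numpy as np

head_dim = 128
j = np.arange(head_dim // 2)              # one rotation frequency per dimension pair
theta = 10000.0 ** (-2.0 * j / head_dim)  # theta_j = 10000^(-2j/d)

pos = 512
angles = pos * theta  # rotation angle (radians) of each pair at this position
print("highest-frequency pair rotates by", round(float(angles[0]), 1), "radians at position", pos)
print("lowest-frequency pair rotates by", round(float(angles[-1]), 3), "radians at position", pos)
# The low-frequency pairs barely rotate even at long positions, which is why Jin et al.
# argue they mainly carry semantic rather than positional information.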

What Is DroPE?

DroPE (Gelberg et al., 2025) is a method that removes Rotary Position Embeddings (RoPE) from pretrained models and recalibrates them, which has the effect of zero-shot extending context length.

The claim, roughly, is this: RoPE scaling methods (PI, YaRN, NTK) attempt to extend context by compressing rotation frequencies. But low-frequency RoPE components never complete a full rotation during training (ϕ_m(C_train) < 2π for small ω_m). At extended lengths, these phases become out-of-distribution, so any scaling method must compress low frequencies by a factor of 1/s to keep phases in range. But this compression shifts attention weights at long distances, where semantic matching matters most.

These papers make seemingly incompatible claims.

Jin et al. claim that RoPE -> massive values -> essential for contextual understanding
Gelberg et al. claim that remove RoPE -> better context extension with preserved capabilities

If massive values are caused by RoPE and critical for understanding, how does DroPE maintain performance?

So we can check a pretty neat and well-scoped research question: are massive values a cause or consequence of contextual knowledge capabilities? And the proxy test we can do cheaply here is: does DroPE, after recalibration, still have massive values?

Experiment 1: Massive Value Comparison

Methodology

Models compared:

  • meta-llama/Llama-2-7b-hf (standard RoPE)
  • SakanaAI/Llama-2-7b-hf-DroPE (RoPE removed + recalibrated)

Procedure:

  1. Load both models with identical tokenizer
  2. Process N diverse text samples
  3. Extract Q, K, V tensors from all 32 layers using forward hooks on projection outputs
  4. Compute L2 norm matrix M[head, dim] for each tensor
  5. Count positions where M > 5.0 × mean(M) (the definition of massive values; see the sketch after this list)
  6. Repeat across multiple samples and report mean plus minus std
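For concreteness, here is a minimal sketch of steps 3–5, assuming PyTorch and Hugging Face transformers; the exact aggregation (L2 norm over the sequence dimension, averaged over the batch) is my reading of the procedure, not the author's released code, and the model name is the RoPE baseline listed above.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # swap for SakanaAI/Llama-2-7b-hf-DroPE to compare
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach()  # projection output: (batch, seq, n_heads * head_dim)
    return hook

handles = []
for i, layer in enumerate(model.model.layers):
    for proj in ("q_proj", "k_proj", "v_proj"):
        handles.append(getattr(layer.self_attn, proj).register_forward_hook(make_hook(f"{i}.{proj}")))

text = "It was the best of times, it was the worst of times."  # one of several diverse samples
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    model(**inputs)

def count_massive(t, n_heads, lam=5.0):
    b, s, d = t.shape
    x = t.view(b, s, n_heads, d // n_heads).float()
    M = x.norm(dim=1).mean(dim=0)            # L2 norm over sequence -> M[head, dim]
    return int((M > lam * M.mean()).sum())   # massive-value count for this tensor

n_heads = model.config.num_attention_heads
q_total = sum(count_massive(v, n_heads) for k, v in captured.items() if k.endswith("q_proj"))
print("Query massive-value count across all layers:", q_total)

for h in handles:
    h.remove()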

Text samples used: 10 texts, including:

  • Literary: Hobbit, Tale of Two Cities, Moby Dick excerpts
  • Technical: ML/transformer descriptions
  • Conversational: Dialogue snippets
  • Factual: Scientific descriptions
Results

| Tensor | RoPE (mean ± std) | DroPE (mean ± std) | Change |
| --- | --- | --- | --- |
| Query | 1475.5 ± 22.6 | 901.4 ± 36.0 | -38.9% |
| Key | 1496.8 ± 69.8 | 1331.5 ± 74.1 | -11.0% |
| Value | 174.0 ± 10.7 | 176.6 ± 5.7 | +1.5% |

 

Figure 1: Massive value counts for Query, Key, and Value tensors. Error bars show ±1 standard deviation across 10 text samples. DroPE shows 39% reduction in Query and 11% reduction in Key.

We also plot this across layers.

Figure 2: Query massive values by layer. The shaded area shows the reduction from RoPE to DroPE. DroPE consistently has ~17 fewer massive values per layer.

Interpretation

How do we interpret these results? 

Query shows the largest reduction in number of massive values. Roughly, the Query tensor encodes "what to look for" in attention, which is the model's representation of what information the current position needs. DroPE models learn to distribute this information more evenly across dimensions rather than concentrating it in the low-frequency RoPE dimensions.

Key shows moderate reduction in number of massive values. Roughly, the Key tensor encodes "what information is here" at each position. The smaller reduction suggests some concentration patterns persist, possibly because Key representations must still support some semantic matching.

Value is unchanged, within error bars. Mostly just confirms the Jin et al. finding.

Low variance across text types (std ~2-5% of mean) indicates this is a robust structural property of the models, not dependent on input content.

However, a closer look at Figure 2 shows DroPE didn't uniformly reduce massive values.

Figure 3: Layer 1 is the only layer where DroPE has MORE massive values than RoPE. This suggests DroPE concentrates some position-independent processing in the first layer.

Not sure how to interpret this. Possibly, without positional embeddings, DroPE may use layer 1 to establish token relationships through content alone, then rely less on concentrated features in subsequent layers.

Experiment 2: Disruption Experiment

Motivation

Finding 1 shows DroPE has fewer massive values, but are these values still functionally important? We test this by zeroing out massive value dimensions and measuring model degradation.

Methodology

Procedure:

  1. Identify massive value dimensions in Q and K projections (threshold λ=5.0)
  2. Register forward hooks that zero out these specific dimensions
  3. Measure perplexity on held-out text before and after disruption
  4. Compare to control: zeroing same number of random dimensions
  5. Repeat with 10 different random seeds for control condition

Disruption implementation:

# Hook on q_proj output
def hook(module, input, output):
    # mask: boolean tensor where True = massive value dimension
    zero_mask = (~mask).to(output.dtype)  # 0 where massive, 1 elsewhere
    return output * zero_mask  # Zero out massive dimensions

Metric: Our metric of choice here is M-R Difference = (Massive disruption PPL increase) - (Random disruption PPL increase) 

and we interpret it as

Higher M-R difference = model relies more on massive values specifically
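As a sanity check on this definition, here is a tiny sketch of the M-R difference computation; the inputs are the rounded RoPE perplexities from the tables below, so the output only approximately matches the reported figures.

def ppl_increase_pct(baseline_ppl, disrupted_ppl):
    return 100.0 * (disrupted_ppl - baseline_ppl) / baseline_ppl

def m_r_difference(baseline_ppl, massive_ppl, random_ppl):
    # (Massive disruption PPL increase) - (Random disruption PPL increase), in percent
    return ppl_increase_pct(baseline_ppl, massive_ppl) - ppl_increase_pct(baseline_ppl, random_ppl)

# Rounded RoPE values from the results tables below; gives roughly +116,000%
print(m_r_difference(1.30, 1508.5, 1.31))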

Results

Raw Perplexity Values

| Model | Baseline | Massive Zeroed | Random Zeroed |
| --- | --- | --- | --- |
| RoPE | 1.30 | 1,508.5 | 1.31 |
| DroPE | 1.49 | 22.7 | 1.49 |

Percent Increase (mean ± std across 10 seeds)

| Model | Massive Disruption | Random Disruption | M-R Difference |
| --- | --- | --- | --- |
| RoPE | +115,929% ± 0.0% | +0.6% ± 0.7% | +115,929% |
| DroPE | +1,421% ± 0.0% | +0.2% ± 1.2% | +1,421% |

Figure 4: Perplexity after disruption (log scale). Zeroing massive values breaks RoPE (PPL 1 -> 1508) but only degrades DroPE (PPL 1.5 -> 23). Random controls cause negligible damage.

Statistical validation:

We do some quick statistical tests because this is so cheap to do.

  • Paired t-test (massive vs random): p < 10⁻⁴⁸ for RoPE, p < 10⁻²⁹ for DroPE
  • Independent t-test (RoPE vs DroPE): p < 10⁻⁸⁷
  • Cohen's d > 1000

So I feel fairly confident that these results are significant!

Key ratio: RoPE relies 82× more on massive values than DroPE

Consistency Across Text Types

| Text Type | RoPE PPL Increase | DroPE PPL Increase |
| --- | --- | --- |
| Literary | +116,000% | +1,400% |
| Technical | +115,800% | +1,450% |
| Repetitive | +116,100% | +1,380% |

Results are pretty consistent regardless of text content.

Interpretation

RoPE model: Zeroing massive values completely breaks the model. The model cannot function without these concentrated activations.

DroPE model: Zeroing massive values degrades but doesn't break the model. The model has learned alternative mechanisms that partially compensate.

Control condition: Zeroing random dimensions causes negligible damage in both models, proving massive values are specifically important, not just any high-norm dimensions.

Basically, both models have massive values, but RoPE is catastrophically dependent on them while DroPE is not.

Takes

The apparent contradiction between the papers dissolves once we distinguish where massive values come from versus how they're used:

  1. Massive values are learned into weights during RoPE training, as opposed to being created by RoPE at inference. The projection matrices W_Q and W_K develop these concentration patterns because RoPE's frequency structure during training creates gradients that favor certain dimensions.
  2. RoPE at inference makes massive values functionally critical. The rotation operation couples these concentrated features to position-dependent attention patterns. Remove the rotation, and the model breaks because the model doesn't know how to use them without positional modulation.
  3. DroPE recalibration teaches alternative usage patterns. During the brief recalibration phase, it seems the model learns to: 
    1. Reduce concentration
    2. Distribute information more evenly across dimensions
    3. Perform attention based on content similarity alone
Why did I do this?

Understanding how and why large language models work in a principled way will require knowing the internal mechanisms of the transformer stack very deeply. While many components (such as attention, MLPs, and residual connections) are now relatively well studied, positional encoding remains surprisingly opaque (at least, to me). In particular, Rotary Positional Embeddings (RoPE) are both weird and brittle: they strongly shape attention behavior, impose hard-to-reason-about constraints on context length, and interact nontrivially with model scaling, quantization, and alignment.

I find RoPE is often wonky to work with, where small changes in frequency scaling or context length can produce disproportionate failures, and extending context reliably seems like it will require delicate engineering. Also, I had a free hour, Claude Code, and reread Yoav's paper while at the gym earlier this morning.

Limitations and Future Work

Models tested: We examined only Llama-2-7B, as it was the largest DroPE model mentioned in the paper. Also, I can't find the other models on HuggingFace. Larger models and different architectures may show different patterns.

Recalibration dynamics: We compared endpoints (RoPE vs. fully recalibrated DroPE). Tracking massive values during recalibration would reveal how redistribution occurs.

Task-specific analysis: We measured perplexity. Testing on Jin et al.'s contextual vs. parametric knowledge tasks would directly validate whether DroPE's reorganization preserves contextual understanding through alternative mechanisms. I'm doing this as we speak.

Reproducibility

Code

All experiments can be reproduced here:

# Massive value comparison
python scripts/run_massive_values_rigorous.py

# Disruption experiment
python scripts/run_disruption_rigorous.py

Hardware
  • GPU: NVIDIA A10G (24GB)
  • Models loaded in 4-bit quantization
Parameters

| Parameter | Value | Source |
| --- | --- | --- |
| λ (massive threshold) | 5.0 | Jin et al. 2025 |
| Sequence length | 512 tokens | Standard |
| Number of text samples | 10 | Diverse corpus |
| Number of random seeds | 10 | Statistical validation |

Citation

If you use these findings, please cite:

@article{jin2025massive,
  title={Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding},
  author={Jin, Mingyu and others},
  journal={ICML},
  year={2025}
}

@article{gelberg2025drope,
  title={Dropping Positional Embeddings for Zero-Shot Long-Context Extension},
  author={Gelberg, Tal and others},
  journal={arXiv preprint arXiv:2512.12167},
  year={2025}
}

@techreport{africa2026massive,
  title = {Massive Activations in DroPE: Evidence for Attention Reorganization},
  author = {Africa, David},
  year = {2026},
  url = {https://github.com/DavidDemitriAfrica/drope-activations}
}

Discuss

Irrationality as a Defense Mechanism for Reward-hacking

Новости LessWrong.com - 18 января, 2026 - 06:57
Published on January 18, 2026 3:57 AM GMT

This post was written as part of research done at MATS 9.0 under the mentorship of Richard Ngo. It's related to my previous post, but should be readable as a standalone.

Remark: I'm not yet familiar enough with the active inference literature to be sure that the issues I bring up haven't been addressed or discussed. If you think my characterisation of the state and flaws of the theory is missing something substantial, I'd love to know.

Introduction

In the theory of active inference, agents are described as having a set of internal states that interact with external states (the world) through a membrane of intermediate states, such as the senses. I'm currently exploring how agents are able to exhibit approximations of external reference that allow them to stay alive in the real world. They achieve this even though they only have access to the statistical proxy of their internals, which they could easily reward-hack without optimising the external states at all.

One of active inference's weaknesses is that it struggles to model agents' uncertainties about their own preferences. I here propose a potential explanation for why agents are conflicted about these preferences. This perspective posits agents' seeming inconsistency and irrationality about their goals as a mechanism that protects them from reward-hacking their internal states. 

Internal reward-hacking

Consider the following question:

What stops an agent from generating adversarial fulfilment criteria for its goals that are easier to satisfy than the "real", external goals?

Take Clippy as an example, whose goal is stated as maximising the number of paperclips in the world. Since Clippy only has internal reference, it could represent this goal as "I observe that the world has as many paperclips as it could possibly have". I'm wondering what in Clippy's system saves it from "winning at life" by hooking its sensors up to a cheap simulator that generates an infinite stream of fictional paperclips for it to observe.

Do agents just have good priors?

An elegant answer to the problem of internal reward-hacking is that agents come pre-equipped with suitable priors about their internal states. In active inference, agents seek to update their beliefs and act on the world such that their observations fit their priors as closely as possible. The space of "good" priors for agents' internal states is very small. However, evolutionary pressures have selected for agents with priors that are conducive to their survival. According to active inference, agents attempt to manifest these favourable priors through action, which makes the priors function as preferences.

Unfortunately, the claim that evolutionarily fine-tuned priors do all the work to prevent internal reward-hacking seems lacking to me, because in practice we are uncertain about our own feelings and preferences. We don't actually have locked-in, invariant preferences, and it's unclear to me how active inference explains this; preferences are usually encoded as priors over observations, but ironically these are never updated.[1] 

Active inference thus implicitly assumes agents to be consistently, definitively settled on their preferences. Agents are only uncertain about the external states and about how their actions and senses will interact with those states. Within those unknowns, they seek to optimise for the observations that they are certain they prefer. I don't think this assumption is warranted. In fact, I have been considering the possibility that agents' uncertainty about their own preferences is an important instrument for increasing their (bounded) rationality.

Internal inconsistency as a tool for rationality

Consider the example I used in my last post of a hypothetical person, Alice, who wants to maximise "success". In that example, Alice avoids applying to a prestigious university because rejection would decrease her internal perception of success. She instead applies to a worse university that she is sure to get into, as this will certainly increase her success-o-meter. 

Suppose instead that Alice feels a twinge of guilt not applying to the prestigious university, as this could be perceived as "loser" behaviour by her friend. This guilt may motivate her to apply anyway, even though the action lowers (in expectation) her internal perception of success. Here, the mixed optimisation of two distinct goals: "I perceive myself as maximally successful" and "I perceive myself as someone that my friend thinks is maximally successful", yields behaviour that actually makes Alice more successful.

In Free Energy Minimisers (FEMs) from active inference, preferences are usually described as fixed priors over the space of observations. One possible model for Alice's behaviour is that each action is chosen with respect to one of two sets of priors. The priors she chooses to satisfy in a given action are sampled from some distribution over priors that represents the degree to which she identifies with conflicting preferences. In practice, Alice now doesn't resemble a consistent FEM, but she has become more aligned with respect to the external goal. Her mixed strategy between preferences can be seen as hedging against her top choice of priors being unfit.

What's next: competition between preferences

I would like to distinguish this concept of inconsistent preferences from mental motions such as compartmentalisation. For instance, suppose an agent learns to calculate the derivative of a function (f+g) by having separate[2] parts of itself calculate the derivatives of f and g and then adding the results. This motion could be seen as the agent using subagents' outputs to solve a problem. However, these "subagents" are not imbued with goals of their own. They're more like tools that the agent deploys to break the problem down into manageable components.

My guess is that people's uncertainties about their preferences are better represented as meme(plexe)s competing with each other for attention. The memes that live to be observed in minds are those that could be seen as agentically pursuing survival and reproduction.[3] Internal preferential inconsistency would thus be analogous to the sub-parts in the above example optimising to convince the agent that they are "useful" for calculating derivatives and should be kept around.[4]

Sub-processes and compartmentalisation as tools to increase rationality are not controversial ideas. The more contentious claim I'm ideating is that even conflicting agentic sub-processes — harboring goals that are unaligned with those of the larger agent — can still be useful for increasing agentic rationality with respect to external goals. I aim to formalise and explore this hypothesis in an empirical or mathematised setting.

  1. ^

    There's a good reason for never updating priors over observations. If agents' preferences could update, they would gradually move towards preferring states that are more likely, even if these aren't fruitful for their continued existence. The function of the fixed priors is to give agents a vision of the world they are willing to execute actions to manifest; these are preferences.

  2. ^

    this potentially includes separation across time

  3. ^

    For example, successful memes, like catchy songs, have a tendency to get their hosts to spread them to other people.

  4. ^

    This goal could functionally be the same as actually being good at calculating derivatives, but it doesn't have to be. For example, if the agent wants the derivative to be high, then a sub-part may gain a competitive advantage by overestimating the answer of the derivative of f. It may eventually convince the agent to employ two copies of itself to calculate the derivatives of both f and g, replacing the other sub-part.



Discuss

Blogging, Writing, Musing, And Thinking

Новости LessWrong.com - 18 января, 2026 - 06:28
Published on January 18, 2026 3:28 AM GMT

Yesterday I stumbled on this quote from a blog post by JA Westenberg:

Michel de Montaigne arguably invented the essay in the 1570s, sitting in a tower in his French château, writing about whatever interested him: cannibals, thumbs, the education of children, how to talk to people who are dying. He called these writings essais, meaning "attempts" or "tries." The form was explicitly provisional. Montaigne was trying out ideas, seeing where they led, acknowledging uncertainty as a fundamental feature rather than a bug to be eliminated.

It's hard to convey the sense of both profound agreement and giddy joy I had reading that because, not only is the wider post about something I love (i.e. blogging), or because I learned something new about the history of writing (which is always fun), but because that quote describes something that I've been doing myself for the past two years and wanted an excuse to talk about!

What Writing Is

There's an old adage that says "Writing is Thinking", and I've usually interpreted those words to mean that "writing helps you think", which is undoubtedly true. However, in recent years I've discovered an entirely new form of writing that I've taken to calling musing, which I think takes this idea a step further, and it's precisely what Westenberg describes Montaigne doing in the 16th century.

We have a lot of thoughts throughout the day, and yet we spend so little time indulging these idle curiosities. Writing, especially by hand, can be a great way to explore these ideas and to practice thinking. It's also really fun to do! Over time I started collecting these ideas into notebooks (one of which I almost always carry with me) in order to better organize these inherently random topics into a searchable system. Originally I scribbled on loose leaf pages or random legal pads (as I've mentioned before) and that became unruly very quickly.

Some of these musings are personal topics, most are not. Often they're just the exploration of a question I have. Here's an example:

Businesses keep offices cold because there's research saying that cooler temperatures help people think and stay focused. Given that's true, could the Little Ice Age have helped improve human cognition during the 17th and 18th centuries? If so, what does that mean?

I'm not sure, but it was something I thought about for a while and so I wrote it down. The entire musing, or essay as I guess it should be called, is less than a page, but was engaging and very fun to do. I've written short essays about dozens of topics over the years (including several that have been eventually published to this blog). It's a fun practice, and I encourage you to try it.

Explore your ideas honestly. Don't fear where your mind goes or the questions it will ask. These are personal, honest thoughts, not social media posts. Writing like this is inherently introspective; it's a way to give your mind the space to awe and wonder.

What Thinking Is

We often believe that thinking is a process which takes place entirely in the mind, but it's a process that is heavily influenced by the particulars of the body. Try thinking through a problem in a hot and smelly room or on a walk with a rock in your shoe.

However, the body can do more than hinder the thought process, it can catalyze it! This is what writing can be, a way to think through problems using your entire body.

Occasionally, I've sat down to write but without any particular topic in mind. So, I open my notebook and just start writing. Tons of my essays begin with something like, "I'm not sure what I'm thinking right now and I don't know what to write." From there, I let my thoughts move and course as they will and I just write down what comes to mind, stopping and starting as my thoughts shift and change and eventually I will find that something has come out of it. I might work through a tension or stress point, I could come to realize or discover something about a problem, or I could just get a few lackluster thoughts on a page. Not all thinking is productive but the mind is a muscle and it needs to be exercised to function properly. Sometimes just doing the workout is enough.

Thinking as a Skill

We usually think of cleverness or intelligence as an innate trait people have, and while that is certainly true in some regards, intelligence and wisdom are just as much a function of practice as of talent. To get good at solving puzzles, you have to practice solving puzzles. The mind is no different than a muscle in that regard. Thinking aloud on the page is one way to record and analyze your thought process and to practice the art of thinking itself.

As another example, I often revisit my prior writings and find many to be overly simplistic, uninspired, or just plain wrong. But that's good! It means I've learned something in the intervening time! In software there's an adage:

If you come back to old code and see nothing wrong with it, then you haven't learned anything since.

You are not a finished product, you're a process—always in motion—that evolves and changes over time. Your thinking can improve with practice as much as it can atrophy from inattention.

Think about thinking, write those thoughts down, then perhaps publish a few on a blog that you own. It's fun, and it can do wonders for the mind.



Discuss

Is METR Underestimating LLM Time Horizons?

Новости LessWrong.com - 18 января, 2026 - 04:51
Published on January 18, 2026 1:19 AM GMT

TL;DR

  • Using METR human-baseline data, I define an alternate LLM time-horizon measure, i.e. the longest time horizon over which an LLM exceeds human baseline reliability (or equivalently the intersection point of the human and LLM logistic curves), and this measure shows a much faster growth-trend than METR's fixed-threshold trends: doubling every 1.9 months, versus 6.8 months for the 50% METR-trend over the same time period.  Also, since this metric is directly comparing to human baseline reliabilities (unlike the METR fixed-reliability estimates), we can use it in a more principled way to assess time to human-level horizons, which suggests roughly 2026-2027, with substantial uncertainty.
  • METR has generally deemphasized their human reliability baselines, on the grounds that the participants were poorly incentivized to complete long tasks; however, this post argues that comparing to this imperfect human data is likely a better reflection of progress towards human-level agency than the current METR horizon trends that use fixed reliability targets even as task length increases.
  • AI-2027 has argued controversially that the METR trends may actually be more accurately modeled as super-exponential, with finite-time blowup; this post argues that while this claim does not seem to be very well supported (yet) for METR's time horizon measure, this super-exponential model is more strongly supported for the proposed human-relative time horizon metric described in this post. 
  • See addendum at the end for an update regarding the recent Claude Opus 4.5 METR results.

Figure 1: Plot comparing frontier LLM time-horizon measures, including both the human-level-reliability time-horizon from this post (orange), versus the METR-style fixed-reliability 50% time-horizons (blue). We can see that this alternative human-relative time horizon measure has been increasing much more steeply over time than the METR horizons.  Note that the "human-level" horizon metric in this plot is comparing LLMs to METR's human baselines.  The most recent data point included here is gpt-5.1-codex-max.

Acknowledgements: I shared a draft of these ideas last year in correspondence with Daniel Kokotajlo and Eli Lifland from AI Futures. Thanks to Eli for his feedback; also, Daniel subsequently posted a short-form [1] touching on a similar crossover/intersection point framing, which is worth checking out for his perspective on these issues.

1 Summary

The METR time-horizon metric provides estimates for the longest software tasks that a given AI is capable of completing [2].  This estimate is based on human baseline measurements for how long a set of software tasks took human engineers, with the (geometric) average time based on the subset of humans who succeeded at each task.  METR also measured human reliability at each task, but rather than compare the AIs to those human levels, they have typically reported LLM time horizons at fixed reliabilities independent of task length (e.g. 50% or 80%). The METR estimates have featured prominently in efforts to estimate the timeline to human level long-horizon agency, e.g. per the AI-2027 forecasts[3].  The following bullets describe potential downsides of this METR horizon approach and summarize the proposed alternative, as well as providing trend-based projections for these metrics. 

1.1 Potential Downsides of the METR Metric 
  • Lack of human comparison: If the goal is to assess progress towards human level horizons, what we should ideally be doing is not comparing to fixed target reliabilities at each horizon length (e.g. METR's 50% and 80% targets), but rather comparing to baseline human reliabilities at each horizon. Note that both METR and others (e.g. AI-2027[3] or Greenblatt[4]) routinely use these fixed-reliability horizons to track progress towards human-level horizons, which perhaps feels like a natural assumption since the horizons were human-derived, but the problem is that the fixed reliability targets (even as difficulty/duration increase) are not based on any actual human baseline performance.  
  • Unknown difficulty trend complicates interpretation: METR relies on fixed-reliability (e.g. 50%) trends to predict LLM time horizons relative to humans, but there is little reason to think that humans can achieve 50% reliability on long tasks of arbitrary difficulty, e.g. on tasks along the unknown METR task-difficulty trend. Many people have the intuition that humans can handle tasks of arbitrary length at high reliability, but that of course depends on task difficulty. While we can extrapolate the METR curve to weeks or months, the existing tasks are short, so it's not clear how to estimate the difficulty of hypothetical long METR tasks. There is a tendency to assume these would just be typical long software-engineering tasks (merely time-consuming due to many fairly straightforward subtasks), but there is not much basis for that assumption; longer tasks on this length/difficulty trend could just as well look more like "prove this tricky mathematical theorem". In other words, multiple dimensions are being compressed into the task-duration factor, including repetitiveness and subtask count but also intrinsic difficulty, and the intuition that humans can handle long tasks at high reliability really only applies to the former. Note that if METR adds new, longer tasks to their benchmark, they can of course make them as easy or hard as they want, but when they extrapolate their existing time horizons to longer task lengths, the long-task "difficulty" is implicit in the trend and not something they get to choose.
  • Potentially unattainable bar for human-level: A related issue is that if we require AI to exceed a fixed 50% horizon at every time horizon in order to be considered human level, then it's not clear that this will ever occur, since for both humans and LLMs reliability tends to decrease with horizon length and difficulty (see more detailed discussion in the formula section below); by contrast, with the alternative metric that compares to actual human-level reliability at each horizon length, there is no barrier in principle to AI surpassing humans at every time horizon.  A related issue is that when projecting the existing METR metric to "human level" it is really unclear what horizon constitutes human level since the fixed reliability metrics aren't grounded in human reliability measures, e.g. see the wide range of targets for this in the AI-2027 forecast[3], whereas with this human-relative metric it's straightforward that human-level requires matching/exceeding human reliability at every task duration, per the projections below. 
  • Underestimated horizon lengths: The METR time horizons are based only on the humans who actually succeeded at each task; if all the humans were allowed more time to finish, the lower-performing humans would presumably take even longer, so the current METR horizon lengths are likely an underestimate relative to METR's average baseline engineers. Note that the estimates are also properly interpreted as an average over METR's top baseline engineers, since it sounds like different engineers succeeded at different tasks. These underestimated horizons potentially bias the LLMs to appear to have worse horizons than they really do. However, the proposed human-relative metric does not address this particular downside, since it reuses the METR horizon lengths.
1.2 Alternative Human-Relative Metric:  

As an alternative to the fixed-reliability (e.g. 50%) METR time horizons, we can instead measure a time-horizon metric defined relative to actual human baseline reliabilities. Note that METR did measure human baseline reliabilities, though they tend to downplay those baselines as weak or insufficiently incentivized (see section 1.6 below) and instead focus on absolute reliability targets. One issue is that current LLMs, e.g. gpt-5, already exceed the METR human baseline time horizons at both the 50% and 80% targets; however, humans still have fatter-tailed reliability curves, so for longer tasks METR's human baselines still do better (though see the Addendum below on Claude 4.5, since this claim is less clear now). For instance, gpt-5 has worse reliability than the human baselines once the METR task length exceeds 4.5 hr, even though its 50% horizon is only 2.3 hr. Using these human-reliability baselines, we can estimate LLM time horizons as the longest task duration over which the LLM is more reliable than the METR human baselines, or more concretely, as the intersection point of METR's logistic fits of reliability versus task duration for humans and LLMs. See Figure 1 for a plot comparing these two horizon metrics over time.
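For illustration, here is a minimal sketch of the crossover computation, assuming METR-style logistic fits of success probability against log2(task length); the parameter values are made up for the example and are not fitted to METR data.

import numpy as np

def logistic(x, a, b):
    # p(success) for a task of length 2**x minutes
    return 1.0 / (1.0 + np.exp(-(a + b * x)))

def crossover_horizon_minutes(a_llm, b_llm, a_human, b_human):
    # Solve a_llm + b_llm*x = a_human + b_human*x for x, then convert back to minutes.
    # Meaningful when the LLM starts out more reliable (higher intercept) but its
    # reliability falls off faster with task length (more negative slope).
    x = (a_human - a_llm) / (b_llm - b_human)
    return 2.0 ** x

# Illustrative parameters only (not fitted to the actual data):
a_llm, b_llm = 4.0, -0.55      # LLM: high reliability on short tasks, steep falloff
a_human, b_human = 1.5, -0.20  # human baseline: lower intercept, flatter slope
print(crossover_horizon_minutes(a_llm, b_llm, a_human, b_human), "minutes")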

1.3 Trends in Human-Relative Metric (exponential fit)

The trend in this human-relative time horizon metric is increasing much faster than the existing METR trends, and assuming an exponential fit, the LLM time horizons at human-reliability are doubling every 1.9 months, versus every 7 months over the same time period for the METR (50%-reliability) time horizon trend (see Table 1 and Figure 2).  In other words, every 1.9 months there is a doubling in the longest time horizon that LLMs can handle at METR's human baseline reliability. As METR has noted this exponential appears to have sped up recently, perhaps since the arrival of reasoning models, though the estimates above are just using all of the data for both metrics (not just the recent models); however, the hyperbolic analysis below does implicitly model increasing exponential rates over time.
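A minimal sketch of the doubling-time estimate follows; the (date, horizon) pairs are placeholders rather than the actual model data, and the 167 work-hour month is an assumed convention.

import numpy as np

dates = np.array([2024.0, 2024.3, 2024.7, 2025.0, 2025.4, 2025.8])  # decimal years (placeholder)
horizons_hr = np.array([0.2, 0.4, 0.9, 2.0, 4.5, 11.0])             # human-relative horizons (placeholder)

# Regress log2(horizon) on date; the inverse slope is the doubling time.
slope, intercept = np.polyfit(dates, np.log2(horizons_hr), deg=1)
print(f"doubling time ~ {12.0 / slope:.1f} months")

# Projecting forward to a 1-month (~167 work-hour) horizon under the same exponential:
t_cross = (np.log2(167.0) - intercept) / slope
print(f"1-month horizon reached around {t_cross:.1f}")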

1.4 Evidence for Super-Exponential (hyperbolic fit):
  • Theoretical reasons to expect hyperbolic trend: For the human-reliability-based trend, LLMs are linearly improving their logistic slope parameter over time (though the trend is noisy) so if this linear trend continues and this slope catches up the human slope, then the time horizon would jump to infinity as the LLM exceeds human reliability at all horizons (the logistic intercept parameter is already better than human and steadily increasing).  Note that people sometimes assume there is something unrealistic about blowups to infinity, but while that can be an issue for physical quantities, it is not a concern for abstract metrics like "the set of horizons where LLMs exceed humans".  And this finite-time blowup can be naturally modeled with a super-exponential (hyperbolic) trend, whereas with an exponential fit, the LLM would never exceed humans over all time horizons; note that this argument supporting a hyperbolic fit does not directly apply to METR's (fixed-reliability) horizon metrics, since even humans typically have declining reliability with difficulty/task-length, so matching human slope (and intercept) would not lead to a blowup in METR's metric (See Addendum below on Claude 4.5, which has effectively caught up with the human slope, at least per the weak METR human-baseline)
  • Statistical evidence for a hyperbolic fit: Based on AIC model selection (see Table 3), the human-relative time-horizon trend appears to be closer to hyperbolic than exponential, whereas the METR horizon trends are better fit by an exponential, though more data is likely needed to say this conclusively (see the Addendum on Claude 4.5).
  • AI-2027 super-exponential projection: The AI-2027 project has argued, with some controversy, for using a super-exponential finite-time blowup for projecting the METR fixed-reliability (80%) time horizons; per above, this seems poorly supported in that fixed reliability case, but it is much better supported for this alternative human-relative horizon metric.  So my sense is that their core super-exponential time horizon intuition may turn out to be correct once you make this adjustment to the metric. 
1.5 Projections to Reach Human Level (hyperbolic fit)

Per above, with an exponential fit, LLMs would never catch up with humans, but with a hyperbolic fit the current trend suggests human level by around mid 2026, relative to the (weak) METR human baselines (see Table 2).  That table also shows sensitivity analysis with hypothetical stronger human baselines, with the two alternative baselines pushing human level into 2027.  These stronger baselines have flatter slope, with human reliability dropping more slowly with longer horizons, but they still do gradually decline, in contrast to the METR horizons based on constant reliability targets.

1.6. Downsides of the Proposed Metric 

METR has generally minimized the relevance of their human reliability baselines on the grounds that the participants weren't properly incentivized, and these weak baselines are a potential downside for this proposed metric.  But insofar as we are trying to determine LLM progress towards human-level agency, we are likely better off comparing to the available human baselines (even if imperfect) rather than to absolute reliability thresholds that have no known connection to actual human performance; for instance, based on METR's human-baseline logistic fit, we should expect humans to get about 3% reliability for tasks at a 1-month horizon (along the METR difficulty trend line), so it's not clear why we would require "human-level" AI to get 50% (or 80%).  That said, insofar as the human baselines are weak or poorly incentivized, it could be useful to collect stronger human baselines, or for now we can do sensitivity analysis with hypothetical stronger baselines (see below) to assess how much this changes the projections.
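
As a quick check on that 3% figure, here is the arithmetic, using the human-baseline logistic parameters quoted in section 3 below; treating the 1-month horizon as roughly 43,200 task-minutes is my own assumption about the intended units.

```python
# Sketch: human-baseline reliability implied by METR's logistic fit at a
# 1-month task horizon.  alpha_h / beta_h are the human-baseline estimates
# quoted in section 3 below; interpreting "1 month" as ~43,200 minutes is my
# own assumption about the units.
import math

alpha_h, beta_h = 2.55, -0.39     # human logistic intercept and slope
d = 30 * 24 * 60                  # one month expressed in minutes (assumption)

logit = alpha_h + beta_h * math.log2(d)
reliability = 1.0 / (1.0 + math.exp(-logit))
print(f"implied human reliability at a 1-month horizon: {reliability:.1%}")  # ~3%
```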

2 Horizon Trends

2.1 Exponential Model

Figure 2: Log plot of the proposed human-reliability LLM time horizons, suggesting that an exponential fit is a reasonable approximation, though see below for model-selection analysis suggesting that a hyperbolic fit may be better.  This regression shows a time-horizon doubling time of approximately 1.9 months. 

Approach | doubling time | 1-month horizon projection (month / year)
time horizon (50% reliability) | 6.8 months | 11 / 2030
time horizon (human-level reliability) | 1.9 months | 12 / 2026

Table 1: Estimated doubling rates for the proposed human-reliability LLM time-horizon metric versus the METR 50%-reliability metric.  Note that the METR horizons are doubling at a much slower rate, and if we assume status quo exponential progress, then 1 month horizons would come about 4 years sooner based on these human-relative time horizons.  Note that these are the overall trends, not the recent (somewhat) faster trends.  Also, see below for evidence that the proposed time-horizon metric may see super-exponential growth. 

2.2 Hyperbolic Model

While the above exponential model for LLM time-horizons is a reasonably good fit, there are fairly strong reasons to think that a super-exponential (e.g. hyperbolic) model is a better match for tracking this metric, whereas an exponential fit is more defensible for the official METR time-horizon metrics (i.e. for their fixed reliability, 50% and 80% metrics).  See Section 2.2.1 for theoretical evidence favoring a hyperbolic fit for this proposed human-relative metric and Section 2.2.2 for statistical evidence based on AIC model selection.  Also, Table 2 shows the estimated time for LLMs to reach human-level over all time horizons, including estimates based on the measured METR human baselines and also additional sensitivity analysis showing how this estimate would push further into the future if we compared to (hypothetical) stronger or better incentivized human baselines. Note that with an exponential fit, the trend would never catch up to human level horizons, so this question only really makes sense in the super-exponential context.

One challenge with extrapolating the human-relative time horizons to human level is that it involves extrapolating the human logistic function far beyond where it was actually tested in the METR benchmark; ideally we would also collect direct human measurements for much longer tasks than the current <1 day tasks.  But given the expense of doing so, extrapolating the reliability logits could be a reasonable proxy, especially since they do appear to be surprisingly linear in log task-duration.  

human reliability (at 16 hr) | LLMs-to-exceed-human-level (date)
22% (measured human baseline) | 2026-09-02
50% | 2026-12-28
80% | 2027-04-16

Table 2: Projections for when LLMs will exceed human reliability over all time horizons, based on hyperbolic fit. The first row shows the estimate using the proposed human-relative metric calculated with the actual human baseline data from METR, but since METR has argued that the baselines were poorly incentivized, the remaining rows show projections if human reliabilities are actually higher on the hardest/longest METR tasks, i.e. 50% and 80% rather than the 22% (estimated from their measured human logistic curve); note that the human slope β is what is being varied in this sensitivity analysis to match the target reliabilities above, whereas the logistic intercept α is left unchanged from baseline.  You can alternatively estimate dates for LLMs to reach human level by just projecting the LLM logistic slope linearly in time to estimate when it matches the human slope, but see section 2.2.1 below for why I suspect this is less reliable than directly projecting the overall horizon progress.   

2.2.1 Theoretical Evidence

One natural intuition regarding METR time horizons is that people can effectively handle arbitrary task lengths, yet with an exponential trend the 50% horizon will always be finite, so you might wonder whether the actual long-term trend will be super-exponential with a finite-time singularity (e.g. per the AI-2027 projections).  However, per the formulas in section 3 below, the 50% time horizons likely won't blow up to infinity unless β approaches zero (since β is in the denominator), assuming the logistic parameters α and β are finite and continuous; and zero slope (β) would mean no reduction in reliability with increasing task length, which is likely unrealistic (see the more careful discussion of the formulas in section 3 below).  

On the other hand, for the human-relative time-horizon formula, β−βh is in the denominator, so all that's required for a finite-time singularity is for the LLM's parameters to catch up with the human parameters.  Currently the LLM intercepts (α) are already better than the human baselines, and the β slopes are increasing linearly over time (with substantial noise), so it's plausible that this could occur a finite amount of time into the future, at which point the LLM would exceed the human baseline over all time horizons.  Also, since human reliability declines with task length, it is not surprising that it could be easier to exceed this performance over all possible time horizons; the 50% time horizons, by contrast, require a fixed 50% reliability for arbitrarily long/challenging METR tasks, so it's much less clear whether that is feasible over all task lengths.

2.2.1.1 Extrapolating LLM Slope-Parameter to Human Level

Unlike the trends in time horizons, which tend to have tight fits with R² > 0.8, the LLM logistic slope trend is quite noisy (R² ~ 0.3), even if we limit to a single model provider (OpenAI) and exclude the noisy off-trend pre-instruct samples (gpt-2 and gpt-3).  I suspect part of the reason is that LLM providers are presumably optimizing for overall time horizon, which can be achieved via various combinations of slope and intercept in the model's logistic fit, so there is little incentive to cleanly drive up the slope with each release; e.g. model/training changes that significantly improve the intercept α while slightly worsening the slope β could be a worthwhile tradeoff as long as they improve the overall time horizon.  For this reason, I suspect it makes more sense to directly extrapolate the overall time-horizon estimate rather than linearly extrapolating the noisy logistic slope in isolation, even if the slope trend is a useful intuition pump for seeing why a finite-time blowup is plausible.  But if we do naively extrapolate the slope (see the sketch below), this suggests a fairly long time to catch up with humans (~2029) with large error bounds, versus the shorter timelines implied by a direct hyperbolic fit per Table 2 (i.e. ~2026-2027).
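
For concreteness, here is a sketch of that naive slope extrapolation, with placeholder (date, β) values standing in for the actual fitted METR estimates (only the human slope is taken from section 3 below).

```python
# Sketch: linearly extrapolate the LLM logistic slope (beta) over time and
# solve for when it reaches the human-baseline slope.  The (date, beta) pairs
# are illustrative placeholders, not the actual fitted METR values.
import numpy as np

dates = np.array([2023.2, 2023.9, 2024.4, 2024.9, 2025.3, 2025.6])
betas = np.array([-0.95, -0.85, -0.78, -0.70, -0.62, -0.58])   # noisy in reality

beta_h = -0.39                    # human-baseline slope from section 3 below

slope, intercept = np.polyfit(dates, betas, 1)
t_catch_up = (beta_h - intercept) / slope
print(f"naive slope-extrapolation catch-up date: {t_catch_up:.1f}")
```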

2.2.2 Statistical Evidence

Given the theoretical reasons for suspecting a super-exponential (hyperbolic) trend in the proposed human-relative LLM time horizons, one question is whether we can see evidence for this in the available data, using standard model selection techniques. One challenge in comparing these two models is that the exponential fit has only 2 parameters, whereas the hyperbolic function has 3.  So we can't just directly compare likelihoods/mean-squared-error, and instead need some correction to penalize the hyperbolic for its higher model capacity.  See Table 3 for results comparing the exponential versus hyperbolic fit, using AIC with the small-sample-size correction; the table actually reports Akaike weights, which are normalized to 0-1 for easier interpretation.  This analysis generally supports the theoretical expectations from the previous section, with AIC suggesting a hyperbolic fit for the human-relative time-horizon metric versus an exponential fit for METR's 50% time horizon.
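
A sketch of what this kind of comparison looks like in practice: fit both curves in log space, compute small-sample-corrected AIC, and convert to Akaike weights. The data points are placeholders, and the specific fitting choices (log-space residuals, bounds on the blowup date) are my own assumptions rather than a description of the actual analysis.

```python
# Sketch: AICc comparison of exponential (2 params) vs hyperbolic (3 params)
# fits to a time-horizon series, reported as Akaike weights.  Data points and
# fitting details are illustrative assumptions, not the post's actual analysis.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([2023.2, 2023.9, 2024.4, 2024.9, 2025.3, 2025.6, 2025.9])
H = np.array([2.0, 7.0, 20.0, 70.0, 180.0, 500.0, 2500.0])   # minutes
y = np.log(H)                                                # fit in log space

def log_exp(t, a, b):            # log of an exponential trend (linear in t)
    return a + b * (t - 2023.0)

def log_hyp(t, logA, tc, m):     # log of A / (tc - t)^m
    return logA - m * np.log(tc - t)

def aicc(resid, k):
    n = len(resid)
    rss = np.sum(resid ** 2)
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

p_exp, _ = curve_fit(log_exp, t, y, p0=[1.0, 2.0])
p_hyp, _ = curve_fit(log_hyp, t, y, p0=[2.0, 2026.5, 2.0],
                     bounds=([-10, 2026.0, 0.1], [20, 2035.0, 10]))

scores = np.array([aicc(y - log_exp(t, *p_exp), 2),
                   aicc(y - log_hyp(t, *p_hyp), 3)])
delta = scores - scores.min()
weights = np.exp(-delta / 2) / np.sum(np.exp(-delta / 2))
print(dict(zip(["exponential", "hyperbolic"], np.round(weights, 2))))
```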

For the hyperbolic results in Tables 2 and 3, I excluded the pre-instruct models since those data points were quite questionable.  In particular, these pre-instruct models are in a different technical category from subsequent instruct-tuned models; for gpt-2, METR wasn't even able to use the same agentic scaffold used for later models and had to impute many data points to zero; and for gpt-3, the model was no longer available in the API at the time of the METR testing, so they had to use a proxy model instead and then back-date it.  (The earliest instruction-tuned model, gpt-3.5, also used a proxy, so an argument could be made for excluding that data point as well, but excluding pre-instruct data is a natural Schelling point for focusing on higher quality samples.)  Also, looking at Figure 2, the pre-instruct data points (gpt-2 and gpt-3) appear to be outliers, and this is even more extreme in the plot of logistic slope β over time, where these dubious pre-instruct data points are completely off the subsequent trend (plot not shown).

That said, these AIC results appear to be quite sensitive to which samples/outliers are included, so until we have more data (from future models), I think we should be cautious about over-interpreting them.  But perhaps in the next 6 months it will become clearer whether the data is following a hyperbolic with finite-time catch-up to the human baseline.  It could also be better to use cross-validation rather than AIC for model selection here (since it's more empirical and less reliant on assumptions), but given the limited number of data points, I felt it wasn't worth the extra effort for now.

Approach | exponential-fit (Akaike-weight) | hyperbolic-fit (Akaike-weight)
LLM time horizon (50% reliability) | 0.55 | 0.45
LLM time horizon (human reliability) | 0.01 | 0.99

Table 3: This table shows a model-selection analysis comparing exponential versus hyperbolic fits, using AIC since the two models differ in capacity, with the hyperbolic having 3 parameters and the exponential only 2, so AIC penalizes this extra capacity and favors the exponential fit all-else-equal. This analysis suggests that METR's 50% horizon is more likely exponential, whereas the human-relative horizon is better modeled as hyperbolic. Note this hyperbolic model implies that LLMs would exceed the human baseline over all time-horizons at some finite time in the future (roughly in 2026), whereas with an exponential fit LLMs would never catch up to humans.  This AIC analysis excludes the two pre-instruct data points (gpt-2 and gpt-3), for reasons explained in the text, but given the small sample size these AIC results are quite sensitive to which points are included and should be taken with a grain of salt.  So while these AIC results are suggestive, probably the stronger evidence for a finite-time blow-up in time horizons comes from theoretical considerations (see section 2.2.1).

3 Technical Details (formulas)

In the METR analysis the reliability of both LLMs and humans is modeled with a logistic function in the human task durations (d), i.e.:

p_{\text{success}}(d) = \sigma(\alpha + \beta \log_2 d)

Note this parameterization is slightly different from METR's, but equivalent.  In the human case, their fitted intercept and slope are: αh=2.55 and βh=−.39.  From the above logistic curve we can derive the formula for the fixed-reliability time horizons that METR publishes, e.g. here is the time horizon formula at 50% reliability:

H_{50} = 2^{-\alpha/\beta}

On the other hand, the proposal from this post is to instead estimate LLM time horizons relative to the human-baseline reliabilities.  To do this we find the intersection point of the LLM and human logistic curves, which gives the time horizons below which the LLM is better than humans in reliability (or vice versa):

H_h = 2^{(\alpha_h - \alpha)/(\beta - \beta_h)}

where α and β are the LLM logistic parameters, and αh and βh are the human baseline parameters.  So for example for gpt-5, METR estimates the parameters as α=4.1 and β=−.58.  So from this we can see that the 50% time horizon for gpt-5 (137 minutes) is actually longer than the 50% horizon for the human baseline (98 minutes), but gpt-5's human-reliability time-horizon is longer still at Hh=269 minutes (note you have to use unrounded param estimates to replicate this calculation).  So this means that gpt-5 is more reliable than the human baselines for tasks under about 4.5 hr, but then because the human logistic has fatter tails, humans currently have higher reliability for all tasks longer than 4.5 hr.
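
To make this example concrete, here is the arithmetic using the rounded parameters quoted above; with rounding, the results only approximately match the unrounded figures in the text (137 / 98 / 269 minutes).

```python
# Worked example of the horizon formulas above, using the rounded logistic
# parameters quoted in the text for gpt-5 and METR's human baseline.  With
# rounded parameters the results only approximately match the post's figures.
alpha, beta = 4.1, -0.58          # gpt-5 logistic parameters (rounded)
alpha_h, beta_h = 2.55, -0.39     # human-baseline parameters (rounded)

H50_llm = 2 ** (-alpha / beta)                        # fixed 50% horizon
H50_human = 2 ** (-alpha_h / beta_h)
H_rel = 2 ** ((alpha_h - alpha) / (beta - beta_h))    # human-relative horizon

print(f"gpt-5 50% horizon:            {H50_llm:.0f} min")    # ~134 (text: 137)
print(f"human 50% horizon:            {H50_human:.0f} min")  # ~93  (text: 98)
print(f"gpt-5 human-relative horizon: {H_rel:.0f} min")      # ~286 (text: 269)
```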

Also, from the above Hh formula, we can see that the criterion for LLMs to match (or exceed) humans at every task duration is to match the human slope, in which case the time horizon estimate blows up to infinity (due to the zero in the denominator); if the slopes match, then in order to exceed humans (rather than just match them) the LLM also needs a larger intercept, but that is already the case for the LLM intercept estimates.  On the other hand, for the 50% horizon to blow up to infinity in finite time (e.g. per AI-2027), the slope β would need to increase to zero from its current negative value (assuming well-behaved α and β, e.g. finite and continuous), but that would imply no reduction in reliability with longer-duration/higher-difficulty tasks, which is perhaps not realistic. 

For the hyperbolic projection of human-relative time horizons, I use a standard 3-parameter hyperbolic:

H_h = \frac{A}{(t_c - t)^m}

where Hh is the LLM time horizon ("h" for human-relative), t is the current date/time, and tc, A and m are the three hyperbolic parameters; tc can be interpreted as the date at which LLMs catch up to humans (i.e. the blowup date).  A potential alternative would be to use a super-exponential curve derived directly from the intersection-point formula above, though this would require some additional assumptions, e.g. about how α and β change over time (see the sketch below).
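
As a sketch of that alternative, suppose (purely for illustration) that the LLM intercept α(t) and slope β(t) each improve linearly in time; plugging these into the intersection formula gives a horizon curve that blows up when β(t) reaches the human slope. All the parameter trajectories below are made-up assumptions, not fitted values.

```python
# Sketch of the alternative mentioned above: derive a super-exponential horizon
# curve directly from the intersection formula, assuming the LLM logistic
# parameters alpha(t) and beta(t) each improve linearly in time.  All numbers
# are illustrative assumptions, not fitted values.
alpha_h, beta_h = 2.55, -0.39              # human-baseline parameters quoted above

def alpha(t):                              # assumed linear intercept improvement
    return 3.0 + 0.5 * (t - 2024.0)

def beta(t):                               # assumed linear slope improvement
    return -0.80 + 0.15 * (t - 2024.0)

def horizon_rel(t):                        # intersection-point horizon (minutes)
    return 2.0 ** ((alpha_h - alpha(t)) / (beta(t) - beta_h))

for t in [2024.5, 2025.0, 2025.5, 2026.0, 2026.5]:
    print(t, f"{horizon_rel(t):,.0f} min")
# the horizon blows up as beta(t) approaches beta_h (here around t ~ 2026.7)
```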

4 Conclusion

Overall, my main takeaway from this analysis is that we shouldn't over-interpret the METR trends at fixed reliability as a direct marker of progress towards human-level software horizons.  For instance, I think it would be a mistake to argue that AGI is still many years off on the grounds that it could be years until 80% horizons reach months or years on METR's metric.  Rather, when the goal is to assess time until human-level AI, my view is that we should focus on the kind of direct human-baseline comparisons that METR's existing metrics don't provide (despite superficial appearances that they do).     

That said, METR has raised legitimate concerns about the quality of their existing human baselines and whether they were properly incentivized, and there would be quite a bit of value in measuring higher-quality human baselines for these agentic tasks, or in the meantime, computing the horizons with the existing human baselines padded/improved per the sensitivity analysis above.   Also, given the existing baseline limitations, there are pros and cons to METR's fixed-reliability horizon metric versus the human-relative alternative from this post, and there could be value in reporting both measures.   One concrete use-case for the METR metrics is where you just need to know whether a model can meet some absolute reliability standard independent of human capability, though even in this use-case the interpretation can be challenging, given the unknown implicit difficulty trend, especially once the task duration exceeds the actual benchmarked tasks.

Addendum (Claude 4.5 Opus)

After I had mostly completed this analysis, the Claude 4.5 Opus METR results were released[5], and showed somewhat faster than expected horizon lengths.  In particular, based on the human/LLM intersection-based horizon length emphasized in this post, Claude now has a horizon of 444 billion minutes(!), versus 440 minutes for gpt-5.1-codex-max, which looks much closer to a hyperbolic blowup than an exponential.  To be clear, we shouldn't over-interpret this specific 444-billion figure: as the LLM slope gets close to the human-baseline slope and the horizon blows up, the horizon estimate becomes extremely sensitive to estimation error, and both the human baselines and the LLMs have reliabilities close to zero for such long tasks (at least in the METR logistic fits).  That said, this Claude data point does support a picture where Claude is now better than the available human baselines (on average) for all the task durations in the current METR benchmark, and even for all practical task durations if we are willing to extrapolate the linear logit fits to tasks much longer than the existing METR benchmark tasks.  

However, given METR's concerns that the human baselines were not adequately incentivized, we can also assess the new Claude model with respect to hypothetical stronger human baselines, per the sensitivity analysis in Table 2.  For instance, if we assume that the hardest/longest METR tasks could be completed with a bit over 2x higher reliability than METR's existing human baselines (specifically 50% rather than the 22% from METR's human logistic fit at 16 hr), then Claude 4.5 Opus has an intersection-based time horizon of only 35.9 minutes, which is actually less than gpt-5.1-codex-max at 39.4 minutes. But note that this (hypothetical) stronger baseline still doesn't push the blow-up date that much later, i.e. just to the end of 2026 per Table 2.

Realistically, this highlights that to make accurate projections of the time to catch up with human horizons based on METR data, we need better human baselines.  METR's fixed-reliability metrics are likely an underestimate of horizon progress, e.g. currently only 4 hr 49 minutes for Claude, despite Claude being better than their own baselines for effectively all horizons (> billions of minutes per above). Short of collecting high-quality baselines, perhaps there is also some value in adjusting the human baseline to be marginally better, per the 35.9-minute Claude estimate above, which likely has the benefit of preserving more realistic long-term asymptotic behavior than the fixed-reliability metrics.

Claude slope vs intercept effects: The extremely large Claude horizon from this intersection-point approach is mostly the result of its logistic slope essentially catching up with the human slope (-0.38 human vs -0.40 for claude-opus-4.5 vs -0.58 for gpt-5.1-codex-max), whereas Claude's logistic intercept was actually a bit worse than prior models'.  So the Claude slope (β) increased a fair bit ahead of trend, though this is a bit misleading since the overall hyperbolic blowup wasn't that far ahead of schedule (per Table 2, where "better than the METR human baselines at every time horizon" was predicted for September of this year, prior to the new Claude datapoint).

It could be worth redoing the statistical tests from this post to include the new Claude model; I haven't gotten around to it, but it looks pretty clear that it would provide stronger support for a hyperbolic model, given the abrupt blow-up in horizons.

References
  1. ^

    Daniel Kokotajlo shortform on time horizons: https://www.lesswrong.com/posts/cxuzALcmucCndYv4a/daniel-kokotajlo-s-shortform?commentId=P8qGMRnbEexaFB4s9

  2. ^

    METR horizons paper: https://arxiv.org/pdf/2503.14499

  3. ^

    AI-2027 Forecast (original): https://ai-2027.com/research/timelines-forecast

  4. ^

    Greenblatt on METR timelines: https://blog.redwoodresearch.org/p/my-agi-timeline-updates-from-gpt

  5. ^

    METR post w/ recent data: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/



Discuss

Understanding Trust: Project Update

LessWrong.com News - January 18, 2026 - 00:19
Published on January 17, 2026 9:19 PM GMT

This is a brief note on what I did with my funding in 2025, and my plans for 2026, written primarily because Manifund nudged me for an update on my project.

I ran my AISC project (which I announced here) with four mentees in Spring 2025: Norman Hsia, Hanna Gabor, Paul Rapoport, and Roman Malov. A few other people attended the weekly meetings as well, and those regular meetings have continued (they are joinable -- pm me if interested). Norman and Paul ended up as coauthors of my ILIAD 2024 paper Understanding Trust, which had been drafted in 2024, so it served as both an input and an output of the AISC project.

I recorded most of the meetings involved in the project, as one of the hoped-for outputs was publicly posted videos explaining the research agenda. I've proven to be bad at this side of things: I don't like listening to myself talk, so I found it difficult to edit or even to review edits done by others. I'm finally uploading the videos with minimal AI-orchestrated edits. Playlist here. At the time of publication, there are only two, but more are coming very soon. If you are OK with the almost-unedited presentation style, it should be a good resource for getting a very in-depth view of my thinking about AI safety and decision theory; a thorough snapshot of my thinking as of spring 2025.

In 2025, I obtained funding for 2025 as well as 2026. (My total financial runway is longer than this, but 2025 and 2026 have been funded by grants/donations which compensated me for my continued research at specific price points.) I'm opening up my Manifund project for funding for 2027, for those who feel so inclined.

In addition to publishing the ILIAD 2024 paper, I also published an ILIAD 2025 paper: Communication & Trust. I consider it an incremental improvement: the ILIAD 2024 paper treated self-modifying actions as a distinct class with known effects that work with certainty, while the ILIAD 2025 paper treats all actions as having some subjective chance of disrupting the agent's computation. 

I also attended Inkhaven, where I wrote a post for every day in November. This was a big success for me: I was able to write about many things which I had been wanting to write about for some time (perhaps in rougher form than if I had eventually gotten around to them via my normal writing process). It was also exhausting. Here's my retrospective, with the caveat that I wrote it on the very last day, when I was perhaps the most sick of writing.

One of the posts describes my research arc over 2025, and the hopes I have moving forward. This is still a good summary of where I'd like to take my research in 2026. I have hope that we're understanding concepts and abstraction better, so that we might soon be able to characterize important concepts like agency, alignment, corrigibility, etc in a formalism which deals natively with ontological shifts. Most of my hope is due to Sam Eisenstat's Condensation: a theory of concepts, which I wrote a detailed review of during Inkhaven.

As for my more ML-flavored research ideas, I finally wrote about that stuff last week. I've already found someone interested in trying some experiments based on those ideas. We'll see how that goes.

I'm also mentoring with MATS this summer. You can still apply to my MATS track today or tomorrow as I write this; applications are due January 18th.



Discuss

Focusing on Flourishing Even When Survival is Unlikely (I)

LessWrong.com News - January 17, 2026 - 21:47
Published on January 17, 2026 6:47 PM GMT

1. The Case

You've probably heard something like this before:

  1. If we survive this century, the expected value of the future is massive.
  2. If we don't survive, the expected value is near zero.
  3. Therefore, the value of an intervention is approximately proportional to how much it increases the chance of survival.

You won't go badly wrong following the conclusion, but (3) doesn't actually follow from (1) and (2). That's because interventions might vary in how they affect the expected value of the future conditional on survival.[1]

Will MacAskill makes roughly this argument in Better Futures (August 2025). See the diagram below: survival-focused interventions target the red rectangle, flourishing-focused interventions target the blue. But the blue rectangle might be much larger than the red rectangle -- if x-risk is 20% then even the best survival intervention can increase EV by at most 1.25x (taking the probability of survival from 80% to 100%), whereas a flourishing intervention could increase EV by 5x or 5000x.

But is x-risk only 20%? MacAskill thinks so,[2] but his argument applies even if extinction is very likely — say 99% — so long as there are interventions that increase flourishing by +100x. That's the scenario I want to discuss here in this post.

My conclusion is:

  1. When survival is unlikely, focusing on flourishing becomes qualitatively different in ways that cut against focusing on flourishing.
  2. There are various strategies for focusing on flourishing when survival is unlikely, but none are attractive. 
  3. Overall, when survival is unlikely, we shouldn't focus on flourishing. Even if you buy that the best possible futures are hundreds of times better than the middling futures.
2. Challenges

Recall that flourishing is the expected value conditional on survival; that is, flourishing-focused interventions target the survival posterior, consisting of the green and blue rectangles. Consequently, if survival is likely then the survival posterior consists of ordinary futures, but if survival is unlikely then the survival posterior consists of weird futures: worlds very different from what we'd expect.[3]

What kind of weird worlds?

  • An AI-enabled coup has occurred
  • Terrorist attacks on datacentres have halted development
  • Civilisational collapse destroyed most human knowledge
  • AGI turned out to be theoretically impossible
  • Aliens/simulators sabotaged our AI development
  • US and China start a nuclear conflict
  • Brain uploads arrived far sooner than expected[4]
  • Extremely surprising civilisational sanity
  • Various cases of survival without dignity

For more possible futures, see Bart Bussmann's 60+ Possible Futures; which weird survival worlds are most likely will depend on your cause for pessimism.

This poses three problems:

Problem 1: Survival world is harder to reason about. 

If survival is likely, then the survival posterior consists of ordinary worlds, which you can reason about using existing assumptions/models/trends. However, if survival is unlikely, then the survival posterior consists of weird worlds where our assumptions break down. This makes it much harder to estimate the impact of our interventions, because the world is unprecedented. For example, imagine if brain uploads arrive by 2030 -- this should make us more sceptical of extrapolating various economic trends that were observed before uploads.

And when you condition on survival, you update not just your empirical beliefs but your moral beliefs too. Suppose you're uncertain about (A) whether animals have as much moral weight as humans, and (B) whether we can build machines as smart as humans, and that (A) and (B) are correlated, both downstream of a latent variable like "humans are special" (call it H). Survival is much more likely if ¬B, so conditioning on survival upweights ¬B, which upweights H, which downweights A. Using the numbers below, your credence in animal moral weight drops from 59% to 34% -- nearly halved -- just by conditioning on survival. Gnarly!
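
Here is a minimal sketch of that calculation with made-up numbers (the post's own figure, with its numbers, isn't reproduced here), just to show the mechanism; with these toy values the credence in A drops from 0.60 to roughly 0.39 after conditioning on survival.

```python
# A minimal sketch of the conditioning argument above, with made-up numbers.
# H = "humans are special", A = animals have human-level moral weight,
# B = human-level machines are possible; survival is far more likely if not-B.
p_H = 0.5
p_A = {True: 0.3, False: 0.9}        # P(A | H), P(A | not H)
p_B = {True: 0.2, False: 0.9}        # P(B | H), P(B | not H)
p_surv = {True: 0.05, False: 0.95}   # P(survival | B), P(survival | not B)

def p_survival_given_H(h):
    return p_B[h] * p_surv[True] + (1 - p_B[h]) * p_surv[False]

prior_A = p_H * p_A[True] + (1 - p_H) * p_A[False]

# posterior on H after conditioning on survival, then recompute P(A)
joint_H = p_H * p_survival_given_H(True)
joint_notH = (1 - p_H) * p_survival_given_H(False)
post_H = joint_H / (joint_H + joint_notH)
post_A = post_H * p_A[True] + (1 - post_H) * p_A[False]

print(f"P(A) = {prior_A:.2f},  P(A | survival) = {post_A:.2f}")   # 0.60 -> ~0.39
```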

Problem 2: Surviving worlds are more diverse.

When survival is unlikely, the survival worlds are more different from each other; this is because all ordinary worlds are alike but each weird world is weird in its own way. And because the survival worlds vary so much, it's harder to find interventions which are robustly beneficial -- an intervention that looks good in one weird survival world is likely to look poor in another. 

For example, suppose you think that if world governments proceed with the expected level of sanity, then ASI will cause extinction. But we might survive because governments showed unexpectedly low sanity (e.g. initiating nuclear conflict over some mundane issue) or unexpectedly high sanity (e.g. updating on early warning shots). Now consider an intervention which shifts power toward existing world governments at the expense of frontier AI labs: this might decrease flourishing if we survived via low governmental sanity and increase flourishing if we survived via high governmental sanity. The intervention's value flips sign depending on which weird world we end up in.

It was a bit tricky to try to justify this intuition, but here's a toy model: imagine worlds as points in R², with a Gaussian prior centered at the origin. Scatter some "attractors" randomly — some red (extinction), some green (survival). Each point in world-space inherits the fate of its nearest attractor. When most of the Gaussian mass falls in red regions, survival requires landing in one of the scattered green islands. These islands might be far apart in world-space. The survival posterior becomes multimodal, spread across disconnected regions. The diagram below illustrates: when P(survival) is low, Var(world | survival) tends to be high.
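
Here is a rough simulation of that toy model, so you can play with it yourself. The scales (attractor spread, number of attractors) are my own assumptions, and the exact variance numbers depend on them and on the random draws, so treat this as an illustration of the setup rather than a confirmation of the figure.

```python
# Rough simulation of the toy model above: worlds are 2-D points under a
# Gaussian prior, attractors are scattered at random, a fixed number of them
# are "green" (survival), and each world inherits the fate of its nearest
# attractor.  Scales and counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def run(n_green, n_attractors=30, n_worlds=5_000, attractor_scale=2.0):
    attractors = rng.normal(size=(n_attractors, 2)) * attractor_scale
    green = np.zeros(n_attractors, dtype=bool)
    green[:n_green] = True                               # survival attractors
    worlds = rng.normal(size=(n_worlds, 2))              # Gaussian prior on worlds
    dists = np.linalg.norm(worlds[:, None, :] - attractors[None, :, :], axis=2)
    survived = green[dists.argmin(axis=1)]
    if survived.sum() < 10:                              # skip degenerate draws
        return np.nan, np.nan
    return survived.mean(), worlds[survived].var(axis=0).sum()

for n_green in [15, 5, 2]:
    results = np.array([run(n_green) for _ in range(200)])
    p_surv, var_surv = np.nanmean(results, axis=0)
    print(f"green attractors={n_green:2d}  P(survival)={p_surv:.2f}  "
          f"E[Var(world | survival)]={var_surv:.2f}")
```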

This diversity creates practical problems:

  • Computational: There are more worlds to reason about when choosing interventions.
  • Cluelessness: There are more worlds, so it's more likely that you are clueless about the sign of the intervention in some of the worlds.
  • Social: There are more worlds, so it's more likely that there is disagreement about the relative likelihood of the worlds, in ways that cause disagreement about the overall sign of the intervention.

Problem 3: Transitional events wash out interventions. 

Ordinary worlds have more continuity between the present and the future, whereas weird worlds often involve some transitional event that explains why we survived, and these transitional events might 'wash out' your intervention.

For example, suppose you think the current policy landscape is likely to lead to extinction. Then we should be pessimistic about flourishing-focused policy interventions because, conditional on survival, there was probably some large-scale disruption of the policy landscape.

In the next post, I will discuss potential strategies for focusing on flourishing when survival is unlikely. These strategies will aim to overcome some or all of the problems above.

  1. ^

    In maths:

    Assuming that E(value|not survival) ≈ 0, we can decompose E(value|intervention) into the product of E(value|survival, intervention) and P(survival | intervention).

    This suggests that P(survival | intervention) is a good proxy for E(value|intervention), but this is only true if E(value|survival, intervention) doesn't vary much across interventions.

    However, E(value|survival, intervention) might vary significantly.
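
    Writing the same decomposition as a formula (this adds nothing beyond what is already stated above):

    E(\text{value} \mid \text{intervention}) \approx P(\text{survival} \mid \text{intervention}) \times E(\text{value} \mid \text{survival}, \text{intervention})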

  2. ^

    To illustrate, suppose you think that our chances of survival this century are reasonably high (greater than 80%) but that, if we survive, we should expect a future that falls far short of how good it could be (less than 10% as good as the best feasible futures). These are close to my views; the view about Surviving seems widely-held, and Fin Moorhouse and I will argue in essays 2 and 3 for something like that view on Flourishing.

    Introduction Better Futures (MacAskill, 3rd Aug 2025)

  3. ^

    Caveat: In principle, survival could be unlikely yet conditioning on it might not make worlds weird. To illustrate: suppose you're certain that humanity's survival depends entirely on the random weight initialisation of a particular pretraining run — 1% chance of good, 99% bad. Conditioning on survival, most survival worlds are ordinary in every respect except for the lucky weight initialisation. The weirdness of the weight initialisation is highly localised, so it doesn't raise the three problems above.

    That said, I don't think such worldviews are plausible, because they require a very high prior on ordinary worlds. I think that plausible worldviews should place +1% probability on "worlds which are globally weird and we survive". And so these worlds will dominate the survival posterior, even if there are also some "worlds which are locally weird and we survive".

  4. ^

    See State of Brain Emulation Report 2025 (Zanichelli et al., 17th Oct 2025)



Discuss

The truth behind the 2026 J.P. Morgan Healthcare Conference

LessWrong.com News - January 17, 2026 - 20:28
Published on January 17, 2026 5:28 PM GMT

In 1654, a Jesuit polymath named Athanasius Kircher published Mundus Subterraneus, a comprehensive geography of the Earth’s interior. It had maps and illustrations and rivers of fire and vast subterranean oceans and air channels connecting every volcano on the planet. He wrote that “the whole Earth is not solid but everywhere gaping, and hollowed with empty rooms and spaces, and hidden burrows.” Alongside comments like this, Athanasius identified the legendary lost island of Atlantis, pondered where one could find the remains of giants, and detailed the kinds of animals that lived in this lower world, including dragons. The book was based entirely on secondhand accounts, like travelers’ tales, miners’ reports, and classical texts, so it was as comprehensive as it could possibly have been.

But Athanasius had never been underground and neither had anyone else, not really, not in a way that mattered.

Today, I am in San Francisco, the site of the 2026 J.P. Morgan Healthcare Conference, and it feels a lot like Mundus Subterraneus.

There is ostensibly plenty of evidence to believe that the conference exists, that it actually occurs between January 12, 2026 to January 16, 2026 at the Westin St. Francis Hotel, 335 Powell Street, San Francisco, and that it has done so for the last forty-four years, just like everyone has told you. There is a website for it, there are articles about it, there are dozens of AI-generated posts on Linkedin about how excited people were about it. But I have never met anyone who has actually been inside the conference.

I have never been approached by one, or seated next to one, or introduced to one. They do not appear in my life. They do not appear in anyone’s life that I know. I have put my boots on the ground to rectify this, and asked around, first casually and then less casually, “Do you know anyone who has attended the JPM conference?”, and then they nod, and then I refine the question to be, “No, no, like, someone who has actually been in the physical conference space”, then they look at me like I’ve asked if they know anyone who’s been to the moon. They know it happens. They assume someone goes. Not them, because, just like me, ordinary people like them do not go to the moon, but rather exist around the moon, having coffee chats and organizing little parties around it, all while trusting that the moon is being attended to.

The conference has six focuses: AI in Drug Discovery and Development, AI in Diagnostics, AI for Operational Efficiency, AI in Remote and Virtual Healthcare, AI and Regulatory Compliance, and AI Ethics and Data Privacy. There is also a seventh theme over ‘Keynote Discussions’, the three of which are The Future of AI in Precision Medicine, Ethical AI in Healthcare, and Investing in AI for Healthcare. Somehow, every single thematic concept at this conference has converged onto artificial intelligence as the only thing worth seriously discussing.

 

Isn’t this strange? Surely you must feel the same thing as me, the inescapable suspicion that the whole show is being put on by an unconscious Chinese Room, its only job to pass semi-legible symbols over to us with no regard for what they actually mean. In fact, this pattern is consistent across not only how the conference communicates itself, but also how biopharmaceutical news outlets discuss it.

Each year, Endpoints News and STAT and BioCentury and FiercePharma all publish extensive coverage of the J.P. Morgan Healthcare Conference. I have read the articles they have put out, and none of it feels like it was written by someone who actually was at the event. There is no emotional energy, no personal anecdotes, all of it has been removed, shredded into one homogeneous, smoothie-like texture. The coverage contains phrases like “pipeline updates” and “strategic priorities” and “catalysts expected in the second half.” If the writers of these articles ever approach a human-like tenor, it is in reference to the conference’s “tone”. The tone is “cautiously optimistic.” The tone is “more subdued than expected.” The tone is “mixed.” What does this mean? What is a mixed tone? What is a cautiously optimistic tone? These are not descriptions of a place. They are more accurately descriptions of a sentiment, abstracted from any physical reality, hovering somewhere above the conference like a weather system.

I could write this coverage. I could write it from my horrible apartment in New York City, without attending anything at all. I could say: “The tone at this year’s J.P. Morgan Healthcare Conference was cautiously optimistic, with executives expressing measured enthusiasm about near-term catalysts while acknowledging macroeconomic headwinds.” I made that up in fifteen seconds. Does it sound fake? It shouldn’t, because it sounds exactly like the coverage of a supposedly real thing that has happened every year for the last forty-four years.

Speaking of the astral body I mentioned earlier, there is an interesting historical parallel to draw there. In 1835, the New York Sun published a series of articles claiming that the astronomer Sir John Herschel had discovered life on the moon. Bat-winged humanoids, unicorns, temples made of sentient sapphire, that sort of stuff. The articles were detailed, describing not only these creatures’ appearance, but also their social behaviors and mating practices, all of it attributed to Herschel’s observations through a powerful new telescope. The series was a sensation. It was also, obviously, a hoax, the Great Moon Hoax as it came to be known. Importantly, the hoax worked not because the details were plausible, but because they had the energy of genuine reporting: Herschel was a real astronomer, and telescopes were real, and the moon was real, so how could any combination that involved these three be fake?

To clarify: I am not saying the J.P. Morgan Healthcare Conference is a hoax.

What I am saying is that neither I nor anybody else can tell the difference between the conference coverage and a very well-executed hoax. Consider that the Great Moon Hoax walked a very fine tightrope between giving the appearance of seriousness and not giving away enough details to let the cat out of the bag. Here, the conference rhymes.

For example: photographs. You would think there would be photographs. The (claimed) conference attendees number in the thousands, many of them with smartphones, all of them presumably capable of pointing a camera at a thing and pressing a button. But the photographs are strange, walking that exact snickering line that the New York Sun walked. They are mostly photographs of the outside of the Westin St. Francis, or they are photographs of people standing in front of step-and-repeat banners, or they are photographs of the schedule, displayed on a screen, as if to prove that the schedule exists. But photographs of the inside, of the panels, the audience, the keynotes in progress; these are rare. And when I do find them, they are shot from angles that reveal nothing, that could be anywhere, that could be a Marriott ballroom in Cleveland.

Is this a conspiracy theory? You can call it that, but I have a very professional online presence, so I personally wouldn’t. In fact, I wouldn’t even say that the J.P. Morgan Healthcare Conference is not real, but rather that it is real, but not actually materially real.

To explain what I mean, we can rely on economist Thomas Schelling to help us out. Sixty-six years ago, Schelling proposed a thought experiment: if you had to meet a stranger in New York City on a specific day, with no way to communicate beforehand, where would you go? The answer, for most people, is Grand Central Station, at noon. Not because Grand Central Station is special. Not because noon is special. But because everyone knows that everyone else knows that Grand Central Station at noon is the obvious choice, and this mutual knowledge of mutual knowledge is enough to spontaneously produce coordination out of nothing. Grand Central Station, and places just like it, are what are known as Schelling points.

Schelling points appear when they are needed, burnt into our genetic code, Pleistocene subroutines running on repeat, left over from when we were small and furry and needed to know, without speaking, where the rest of the troop would be when the leopards came. The J.P. Morgan Healthcare Conference, on the second week of January, every January, Westin St. Francis, San Francisco, is what happened when that ancient coordination instinct was handed an industry too vast and too abstract to organize by any other means. Something deep drives us to gather here, at this time, at this date.

To preempt the obvious questions: I don’t know why this particular location or time or demographic were chosen. I especially don’t know why J.P. Morgan of all groups was chosen to organize the whole thing. All of this simply is.

If you find any of this hard to believe, observe that the whole event is, structurally, a religious pilgrimage, and has all the quirks you may expect of a religious pilgrimage. And I don’t mean that as a metaphor, I mean it literally, in every dimension except the one where someone official admits it, and J.P. Morgan certainly won’t.

Consider the elements. A specific place, a specific time, an annual cycle, a journey undertaken by the faithful, the presence of hierarchy and exclusion, the production of meaning through ritual rather than content. The hajj requires Muslims to circle the Kaaba seven times. The J.P. Morgan Healthcare Conference requires devotees of the biopharmaceutical industry to slither into San Francisco for five days, nearly all of them—in my opinion, all of them—never actually entering the conference itself, but instead orbiting it, circumambulating it, taking coffee chats in its gravitational field. The Kaaba is a cube containing, according to tradition, nothing, an empty room, the holiest empty room in the world. The Westin St. Francis is also, roughly, a cube. I am not saying these are the same thing. I am saying that we have, as a species, a deep and unexamined relationship to cubes.

This is my strongest theory so far. That the J.P. Morgan Healthcare conference isn’t exactly real or unreal, but a mass-coordination social contract that has been unconsciously signed by everyone in this industry, transcending the need for an underlying referent.

My skeptical readers will protest at this, and they would be correct to do so. The story I have written out is clean, but it cannot be fully correct. Thomas Schelling was not so naive as to believe that Schelling points spontaneously generate out of thin air; there is always a reason, a specific, grounded reason, that these places become the low-energy metaphysical basins that they are. Grand Central Station is special because of the cultural gravitas it has accumulated through popular media. Noon is special because that is when the sun reaches its zenith. The Kaaba was worshipped because it was not some arbitrary cube; the cube itself was special, because it contained the Black Stone, set into the eastern corner, a relic that predates Islam itself, one that some traditions claim fell from heaven.

And there are signs, if you know where to look, that the underlying referent for the Westin St. Francis’s status as a gathering place is physical. Consider the heat. It is January in San Francisco, usually brisk, yet the interior of the Westin St. Francis maintains a distinct, humid microclimate. Consider the low-frequency vibration in the lobby that ripples the surface of water glasses, but doesn’t seem to register on local, public seismographs. There is something about the building itself that feels distinctly alien. But, upon standing outside the building for long enough, you’ll have the nagging sensation that it is not the hotel itself that feels off, but rather what lies within, underneath, and around it.

There’s no easy way to sugarcoat this, so I’ll just come out and say it: it is possible that the entirety of California is built on top of one immensely large organism, and that the particular spot on which the Westin St. Francis Hotel stands—335 Powell Street, San Francisco, 94102—sits directly above its beating heart. And that this is the true organizing focal point of the J.P. Morgan Healthcare Conference, the reason for both its location and its existence.

I believe that the hotel maintains dozens of meter-thick polyvinyl chloride plastic tubes that have been threaded down through the basement, through the bedrock, through geological strata, and into the cardiovascular system of something that has been lying beneath the Pacific coast since before the Pacific coast existed. That the hotel is a singular, thirty-two-story central line. That, during the week of the conference, hundreds of gallons of drugs flow through these tubes, into the pulsating mass of the being, pouring down arteries the size of canyons across California. The dosing takes five days; hence the length of the conference.

And I do not believe that the drugs being administered here are simply sedatives. They are, in fact, the opposite of sedatives. The drugs are keeping the thing beneath California alive. There is something wrong with the creature, and a select group of attendees at the J.P. Morgan Healthcare Conference have become its primary caretakers.

Why? The answer is obvious: there is nothing good that can come from having an organic creature that spans hundreds of thousands of square miles suddenly die, especially if that same creature’s mass makes up a substantial portion of the fifth-largest economy on the planet, larger than India, larger than the United Kingdom, larger than most countries that we think of as significant. Maybe letting the nation slide off into the sea was an option at one point, but not anymore. California produces more than half of the fruits, vegetables, and nuts grown in the United States. California produces the majority of the world’s entertainment. California produces the technology that has restructured human communication. Nobody can afford to let the whole thing collapse.

So, perhaps it was decided that California must survive, at least for as long as possible. Hence Amgen. Hence Genentech. Hence the entire biotech revolution, which we are taught to understand as a triumph of science and entrepreneurship, a story about venture capital and recombinant DNA and the genius of the California business climate. The story is not false, but incomplete. The reason for the revolution was, above all else, because the creature needed medicine, and the old methods of making medicine were no longer adequate, and someone decided that the only way to save the patient was to create an entire industry dedicated to its care.

Why is drug development so expensive? Because the real R&D costs are for the primary patient, the being underneath California, and human applications are an afterthought, a way of recouping investment. Why do so many clinical trials fail? For the same reason; the drugs are not meant for our species. Why is the industry concentrated in San Francisco, San Diego, Boston? Because these are monitoring stations, places where other intravenous lines have been drilled into other organs, other places where the creature surfaces close enough to reach.

Finally, consider the hotel itself. The Westin St. Francis was built in 1904, and, throughout its entire existence, it has never, ever, even once, closed or stopped operating. The 1906 earthquake leveled most of San Francisco, and the Westin St. Francis did not fall. It was damaged, yes, but it did not fall. The 1989 Loma Prieta earthquake killed sixty-three people and collapsed a section of the Bay Bridge. Still, the Westin St. Francis did not fall. It cannot fall, because if it falls, the central line is severed, and if the central line is severed, the creature dies, and if the creature dies, we lose California, and if we lose California, our civilization loses everything that California has been quietly holding together. And so the Westin St. Francis has hosted every single J.P. Morgan Healthcare Conference since 1983, has never missed one, has never even come close to missing one, and will not miss the next one, or the one after that, or any of the ones that follow.

If you think about it, this all makes a lot of sense. It may also seem very unlikely, but unlikely things have been known to happen throughout history. Athanasius Kircher’s Mundus Subterraneus had a section on the “seeds of metals,” a theory that gold and silver grew underground like plants, sprouting from mineral seeds in the moist, oxygen-poor darkness. This was wrong, but the intuition beneath it was not entirely misguided. We now understand that the Earth’s mantle is a kind of eternal engine of astronomical size, cycling matter through subduction zones and volcanic systems, creating and destroying crust. Kircher was wrong about the mechanism, but right about the structure. The earth is not solid. It is everywhere gaping, hollowed with empty rooms, and it is alive.



Discuss

Japan is a bank

LessWrong.com News - January 17, 2026 - 19:33
Published on January 17, 2026 4:33 PM GMT

Among developed countries, Japan has long had the highest debt/GDP ratio, currently ~232%. That seems pretty bad, and conversely it has made some people say that the US debt is fine because it's still much lower than Japan's. But here are some points that might clarify the situation:

First, that ratio has declined recently, from 258% in 2020.

Second, the Japanese government holds a lot of stocks and foreign bonds. Its net debt/GDP is "only" 140%, and has declined since 2020. The US government doesn't do that. (The government of Singapore also holds a lot of assets, and Temasek is well-known as a large investment fund, but Japan is a bigger country, and despite smaller holdings per capita, its investments are much larger than Singapore's.)

Meanwhile, America's federal debt/GDP ratio is ~124%. Add in state debt and it's ~127%. So the net debt/GDP of the US government isn't that different from Japan's. Japan's is still higher, but arguably the "quality" of that US GDP is lower, for a couple of reasons:

  1. The US has a worse trade balance than Japan. It borrows more money, and has a net inflow of investment. That investment and borrowed money then circulates around and raises GDP by some multiplier, mostly by making both prices and incomes higher in the US.
  2. Japan has a higher ratio of PPP GDP to nominal GDP than the US, by a factor of ~1.6x. Arguably PPP GDP is a better measure of the ability to pay back debt with real stuff than nominal GDP.

On the other hand, the US does have more natural resources, and the federal government owns a lot of land. My point is just that, while I've often seen it said that the US government debt situation is clearly better than Japan's, that's not clearly the case.
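
As a rough, back-of-the-envelope illustration of the two points above, here is a small Python sketch using the figures quoted in this post. Treating US net debt as roughly equal to its gross debt, and US PPP GDP as roughly equal to its nominal GDP, are simplifying assumptions, and dividing a debt ratio by the PPP-to-nominal factor is only a crude proxy for "ability to pay back debt with real stuff":

```python
# Back-of-the-envelope comparison using the figures quoted above.
# Assumptions: US net debt ~= gross debt (federal + state), US PPP GDP ~= nominal GDP.
japan = {"net_debt_to_gdp": 1.40, "ppp_to_nominal": 1.6}
us    = {"net_debt_to_gdp": 1.27, "ppp_to_nominal": 1.0}

def debt_to_ppp_gdp(country):
    """Net debt relative to PPP GDP: debt/GDP divided by the PPP-to-nominal factor."""
    return country["net_debt_to_gdp"] / country["ppp_to_nominal"]

print(f"Japan: {debt_to_ppp_gdp(japan):.0%} of PPP GDP")  # ~88%
print(f"US:    {debt_to_ppp_gdp(us):.0%} of PPP GDP")     # ~127%
```

On this crude measure, Japan's burden actually looks lighter than the US's, which is the sense in which the "quality" of the GDP backing each debt matters.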

By the way, another economic metric I think is interesting to compare is median and average personal wealth.



Discuss

Turning Down the Overthinking: How Cathodal Brain Stimulation Could Transform Stuttering Therapy

LessWrong.com News - January 17, 2026 - 17:54
Published on January 17, 2026 2:54 PM GMT

The cruelest irony of stuttering is that trying harder to speak fluently makes it worse. Not trying harder in the sense of practice or effort, but trying harder in the sense of conscious attention to speech mechanics. When someone who stutters focuses intently on controlling their words, analyzing their breathing, and monitoring their mouth movements, their speech doesn't improve. It deteriorates.

This is the reinvestment hypothesis in action: explicit, conscious control actively interferes with skills that should be automatic. A pianist who thinks too carefully about finger placement plays worse. An athlete who consciously monitors their form chokes under pressure. And a person who stutters, desperately focusing on each syllable, finds their speech becoming more fragmented, not less.

For the 70 million people worldwide who stutter, this creates a devastating trap. They know their speech is broken. They focus intensely on fixing it. And that very focus makes the problem worse.

What if we could temporarily turn off that interference? What if we could create a neural state where the overthinking stops, where the brain's executive control systems step aside and let procedural motor learning do its work? And what if we could do this precisely during speech practice, when the brain is trying to encode new, fluent motor patterns?

TDCS Shows Promise, But We're Targeting the Wrong Mechanism

Brain stimulation for stuttering isn't a new idea. Over the past seven years, researchers have tested transcranial direct current stimulation (tDCS) in adults who stutter, with mixed but encouraging results.

The landmark study came from Oxford in 2018. Chesters and colleagues ran a rigorous double-blind trial with 30 adults who stutter. The intervention was straightforward: 1 milliamp of anodal (excitatory) tDCS applied to the left inferior frontal cortex for 20 minutes, five days in a row, while participants practiced fluency-inducing speech techniques like choral reading and metronome-timed speech.

The results were striking. At baseline, both groups stuttered on about 12% of syllables. One week after treatment, the tDCS group had dropped to 8.7% stuttering, while the sham group remained at 13.4%. That's a 27% relative reduction, with a large effect size (Cohen's d = 0.98). The improvement persisted at six weeks for reading tasks, though conversation fluency had regressed somewhat.

This proved the principle: pairing brain stimulation with speech practice can produce meaningful, lasting fluency gains.

Figure 1: Effects of tDCS on Stuttering Frequency Across Major Studies

Data from published RCTs measuring stuttering frequency (% stuttered syllables) at primary endpoints. Chesters 2018 showed the largest and most durable effect with multi-session anodal stimulation to left inferior frontal cortex. Moein 2022 combined tDCS with delayed auditory feedback training. Garnett 2019 found no advantage over sham despite both groups improving (strong practice effect). Karsan 2022 tested cathodal (inhibitory) stimulation to right IFC, showing acute reading improvement. Error bars represent 95% confidence intervals where reported.

But there's a pattern in these results. Multi-session protocols (Chesters, Moein) work better than single sessions. Multi-session anodal stimulation of speech production areas (Broca’s area, supplementary motor area) produces modest fluency gains. In contrast, when researchers applied cathodal (inhibitory) stimulation to right frontal regions, they observed unexpected fluency improvements, suggesting that reducing frontal interference may be more important than boosting the speech system itself.

This last finding is the key. It suggests that the mechanism might not be "boost the speech system." It might be "reduce the interference."

The Problem: Your Prefrontal Cortex Won't Stop Helping

Neuroimaging studies consistently show that people who stutter have hyperactive prefrontal cortex during speech. A 2022 fNIRS study found that right dorsolateral prefrontal cortex (DLPFC) activation spiked by approximately 20% when adults who stutter anticipated difficult words, compared to fluent controls. This region is the brain's executive control center, handling working memory, attention, and conscious monitoring of performance.

This hyperactivity isn't random. It reflects the subjective experience of stuttering: constant self-monitoring, anticipating which words will be difficult, analyzing what went wrong, trying to control every aspect of speech production. The DLPFC is working overtime, desperately trying to prevent stuttering.

But that's the problem. The DLPFC is trying to consciously control a process that should be automatic.

Speech relies on subcortical circuits, especially a structure called the basal ganglia, which helps start and time learned movement sequences. In fluent speakers, this system smoothly passes well‑practiced speech “chunks” to cortical speech areas. In people who stutter, resting‑state fMRI shows weaker‑than‑normal connectivity between the putamen (a key basal ganglia structure) and cortical speech regions, suggesting that this automatic handoff is impaired.

The natural compensatory response is to recruit conscious control. If the automatic system isn't working, use the manual override. Engage the prefrontal cortex. Monitor every syllable. Plan every word.

But this creates a vicious cycle. The prefrontal cortex isn't designed to run speech production. It's too slow, too effortful, too dependent on working memory. When it tries to micromanage speech, it interferes with what remains of the automatic system. The result is more stuttering, which triggers more monitoring, which causes more interference.

Meta-analyses confirm this pattern. People who stutter show 30-40% greater right inferior frontal cortex activation during speech compared to controls, while left inferior frontal cortex (Broca's area) shows 20% reduced activation. They're using the wrong networks, in the wrong hemisphere, for the wrong type of control.

The question isn't how to boost the damaged automatic system. It's how to get the interfering conscious system out of the way.

When The Brain Becomes The Enemy

This phenomenon has a name in sports psychology: reinvestment. It's what happens when skilled performers revert to explicit, rule-based control of movements that have been proceduralized.

The classic study comes from Masters (1992). He had people learn golf putting in two conditions: one group received explicit coaching about technique, the other learned implicitly through trial and error with minimal instruction. Initially, both groups performed similarly. But when tested under pressure, the explicit learners collapsed. Their putting accuracy dropped 30-50%, while the implicit learners' performance held steady.

The difference? The explicit learners had conscious rules they could reinvest attention into. Under pressure, they started thinking about their form, and that thinking destroyed their performance.

The effect is robust and large. Maxwell et al. (2001) found effect sizes greater than d = 1.0 when comparing stress performance of implicit versus explicit learners. The explicit learners made roughly twice as many errors under dual-task conditions.

This maps directly onto stuttering. Fluent speech is a proceduralized motor skill. In fluent speakers, it happens automatically, with minimal prefrontal involvement. But people who stutter have learned to monitor and control speech consciously. They have explicit rules. And those rules, that conscious attention, actively interferes with whatever automatic capacity remains.

The most compelling evidence comes from dual-task studies. When people who stutter perform a simple non-linguistic secondary task while speaking (like tapping a rhythm), stuttering frequency often decreases. The secondary task occupies the prefrontal cortex, preventing it from interfering with speech. It forces implicit control by blocking explicit control.

This is our therapeutic target: reduce prefrontal interference, allow implicit motor learning.

Disrupting DLPFC Enhances Learning

The definitive proof that reducing prefrontal activity can enhance skill learning comes from Smalle et al. (2017). They tested whether adults' superior executive function actually hinders certain types of implicit learning that children excel at.

Young adults received repetitive TMS to transiently inhibit the left DLPFC, then performed an implicit word-form learning task. The control group received sham (placebo) stimulation that mimicked the procedure but did not actually affect the brain. The result was clear: DLPFC disruption produced significantly enhanced learning.

The effect size was d = 0.88. Participants with inhibited DLPFC learned new word sequences faster and retained them better. Critically, individuals with higher baseline executive function showed the largest benefits. Their prefrontal cortex was normally interfering with procedural learning, and shutting it down removed the interference.

The interpretation is straightforward: the DLPFC and subcortical procedural systems compete for control during learning. When you reduce DLPFC activity, the procedural system wins, and learning is more efficient and robust.

For stuttering, this suggests a direct intervention: inhibit DLPFC during speech practice.

The Proposal: Strategic Neural Inhibition During Speech Training

Here's the complete picture: use cathodal (inhibitory) tDCS to temporarily reduce left DLPFC activity during intensive speech practice. The timing is critical.

Cathodal tDCS at 1-2 milliamps produces reduced cortical excitability lasting 30-60 minutes. Apply the stimulation, then immediately begin speech training while DLPFC is still inhibited. During this window, the brain is in a low-interference state, primed for implicit motor learning.

Figure 2: Intervention Workflow and Complementary Mechanisms

Top panel shows the simple timeline: apply cathodal tDCS for 20 minutes, then immediately transition to speech practice while DLPFC remains inhibited. Bottom panel illustrates the mechanism: normally, hyperactive DLPFC sends interfering signals that disrupt automatic speech motor control (causing stuttering). Cathodal tDCS temporarily reduces DLPFC activity, allowing the speech motor system to operate without interference. This creates optimal conditions for implicit motor learning during practice. The goal is to train fluent speech patterns that become automatic and don't require conscious control.

The protocol builds directly on parameters proven effective in prior stuttering tDCS trials:

  • Location: F3 electrode position (left DLPFC) - standard 10-20 EEG system localization used in cognitive neuroscience
  • Intensity: 1-2 mA cathodal stimulation - all major stuttering tDCS studies (Chesters, Moein, Karsan) used 1 mA; 2 mA is the safety guideline upper limit
  • Duration: 20 minutes per session - the standard duration across all successful stuttering tDCS trials
  • Frequency: 5-10 consecutive daily sessions - Chesters showed effects with 5 days, Moein with 6 days, Mohajeri with 15 days; we propose testing the middle range
  • Concurrent training: Fluency-shaping techniques (prolonged speech, metronome-timed speech, choral reading) - the exact methods used in Chesters' successful Oxford trial

During training, explicit strategy coaching is minimized. The focus is on external goals ("communicate the message") rather than speech mechanics ("control your breathing"). With reduced DLPFC activity, this implicit approach should feel more natural. The conscious monitoring system is temporarily quieted, allowing procedural learning.
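
For concreteness, the proposed protocol can be written down as a small configuration sketch; the field names below are my own shorthand, and the values simply restate the parameter list above:

```python
# Illustrative summary of the proposed protocol. Field names are ad hoc shorthand;
# values restate the parameter list above (they are proposals, not validated settings).
proposed_protocol = {
    "electrode_site": "F3",              # left DLPFC, 10-20 EEG system
    "polarity": "cathodal",              # inhibitory stimulation
    "intensity_mA": (1.0, 2.0),          # 1 mA in prior trials; 2 mA is the safety ceiling
    "session_duration_min": 20,
    "daily_sessions": (5, 10),           # consecutive days, testing the middle range
    "concurrent_training": [
        "prolonged speech",
        "metronome-timed speech",
        "choral reading",
    ],
    "coaching_style": "implicit",        # external goals, minimal explicit strategy talk
}
```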

Expected Outcomes: Beyond Symptom Reduction

If the reinvestment hypothesis is correct, we should see several specific effects.

Primary outcome: Stuttering frequency should decrease more in the cathodal DLPFC group than in sham or standard therapy. Based on the Oxford trial (anodal IFC, d = 0.98) and Smalle’s DLPFC inhibition study (d = 0.88), a conservative estimate for our combined protocol is an effect size around d = 0.7–1.0, which corresponds to roughly 3–5 percentage points greater reduction in stuttering frequency compared to sham.

Figure 3: Predicted Learning Trajectories

Predicted learning curves based on combining effect sizes from Chesters 2018 (multi-session tDCS: d=0.98) and Smalle 2017 (DLPFC inhibition enhancing implicit learning: d=0.88). The cathodal DLPFC group should show both faster initial learning and greater final improvement due to enhanced proceduralization. Standard deviation for % stuttered syllables is typically 6-9%, so a 6-point improvement represents substantial clinical change. Power analysis indicates n≈25-30 per group provides 80% power to detect a 3-point difference at α=0.05.
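
As a sanity check on that power estimate, here is a minimal sketch using statsmodels, assuming a two-sided, two-sample t-test on the primary outcome and the predicted effect-size range of d ≈ 0.7–1.0:

```python
# Sample-size sanity check for the predicted effect sizes
# (two-sided independent-samples t-test, alpha = 0.05, power = 0.80).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.7, 0.8, 1.0):
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                       ratio=1.0, alternative='two-sided')
    print(f"d = {d:.1f}: ~{n_per_group:.0f} participants per group")
# Roughly 33, 26, and 17 per group respectively, consistent with n ≈ 25-30 per group
# for effect sizes near the middle of the predicted range.
```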

But clinical scores tell only part of the story. We should also see:

Process indicators of automaticity: Stable fluency under dual‑task load. If participants maintain their fluency gains even while performing a secondary cognitive task (for example, an auditory n‑back), it suggests speech has become robustly automatic rather than fragile and attention‑dependent.

Neurophysiological markers: Over the course of therapy, fNIRS or EEG should show reduced DLPFC activation during unstimulated speech, indicating decreased conscious monitoring. We’d expect larger long‑term reductions in the cathodal group than in the sham group.

Subjective reports: Participants should report less mental effort during speech, fewer conscious strategies, and more "flow" experiences. The speech should feel less effortful, more natural.

Maintenance and generalization: If the fluency is truly proceduralized rather than explicitly controlled, it should be more resistant to relapse. A six-month follow-up should show better maintained gains in the cathodal group.

These process measures would validate the mechanism. It's not just "tDCS makes you more fluent somehow." It's specifically "reducing executive interference accelerates implicit motor learning, producing more robust automaticity."

Breaking the Relapse Cycle

Traditional stuttering therapy achieves impressive short-term results. Intensive programs can reduce stuttering by 70-100% immediately after treatment. The problem is maintenance. Relapse rates range from 30-70% within one to two years.

The reason is cognitive load. Therapy teaches explicit techniques: prolonged speech, gentle onsets, controlled breathing. These work when you have full attention available. But real-world speaking happens while thinking about what to say, managing emotions, and multitasking. The techniques require executive resources that aren't available under those conditions.

This is exactly the problem cathodal DLPFC training addresses. By reducing executive interference during learning, we encode the fluent motor patterns implicitly rather than explicitly. The result should be speech that doesn't depend on conscious control, that holds up under pressure, that resists relapse.

Even a 15-20% reduction in relapse rates would be clinically meaningful. If this approach cuts relapse from 50% to 35%, that means thousands fewer people needing retreatment annually.

Beyond the numbers, there's the quality of life impact. Stuttering affects approximately 70 million people worldwide. It limits career choices, impairs social relationships, and creates profound daily stress. Adults who stutter score significantly lower on quality-of-life measures than population norms, with impacts comparable to chronic health conditions.

Current therapy is expensive (intensive programs cost $1,500-5,000) and time-intensive (20-30 clinical hours plus daily practice). An adjunct that improves efficiency could reduce both cost and burden.

And there's something deeper. By leveraging neuroscience to enhance learning, we're not just treating symptoms. We're addressing the fundamental mechanism: the competition between explicit and implicit control systems. We're giving people who stutter what fluent speakers have naturally: speech that happens automatically, without thinking.

Next Steps for Testing This

This proposal combines proven elements in a novel configuration. Cathodal tDCS to DLPFC has been used safely in depression research and cognitive neuroscience. Speech therapy techniques are well-established. What's new is the strategic pairing: using brain stimulation to create optimal learning conditions during intensive practice.

The safety profile is excellent. Reviews of 33,200+ tDCS sessions found zero serious adverse events. Common side effects are mild: scalp tingling, slight headache. At 1-2 mA for 20 minutes, we're well within established safety parameters.

The theoretical foundation is strong: reinvestment theory, evidence of DLPFC hyperactivation in stuttering, Smalle's demonstration that DLPFC inhibition enhances learning. The clinical need is urgent: millions of people with limited effective treatments and high relapse rates.

What's needed now is execution. A well-designed trial: 30 adults per group, cathodal DLPFC tDCS versus sham, five daily sessions, with both immediate and long-term outcome measures. If the mechanism is correct, we should see accelerated learning, greater automaticity, and more durable fluency.

The ultimate goal isn't just fewer stuttered syllables. It's freeing people from the mental overdrive that makes speaking exhausting. It's allowing speech to become what it should be: automatic, effortless, natural.

Sometimes the solution isn't trying harder, but learning to stop trying so hard.



Discuss

What Washington Says About AGI

LessWrong.com News - January 17, 2026 - 08:43
Published on January 17, 2026 5:43 AM GMT

I spent a few hundred dollars on Anthropic API credits and let Claude individually research every current US congressperson's position on AI. This is a summary of my findings.

Disclaimer: Summarizing people's beliefs is hard and inherently subjective and noisy. Likewise, US politicians change their opinions on things constantly so it's hard to know what's up-to-date. Also, I vibe-coded a lot of this.

Methodology

I used Claude Sonnet 4.5 with web search to research every congressperson's public statements on AI, then used GPT-4o to score each politician on how "AGI-pilled" they are, how concerned they are about existential risk, and how focused they are on US-China AI competition. I plotted these scores against GovTrack ideology data to search for any partisan splits.
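
A minimal sketch of what the scoring step could look like is below. The prompt wording, the JSON fields, and the function name are my own illustrative assumptions; only the choice of GPT-4o as the judge comes from the methodology above:

```python
# Illustrative sketch of the GPT-4o scoring step. Prompt text, JSON fields,
# and function name are assumptions; only the model name comes from the post.
import json
from openai import OpenAI

client = OpenAI()

def score_politician(name: str, quotes: str) -> dict:
    """Ask GPT-4o to rate a politician's public AI statements on 0-100 scales."""
    prompt = (
        f"Based on these statements by {name} about AI:\n{quotes}\n\n"
        "Return JSON with integer fields 'agi_pilled', 'xrisk_pilled', and "
        "'china_focused', each from 0 to 100."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```

The actual pipeline also plotted these scores against GovTrack ideology data, which is not reproduced here.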

I. AGI awareness is not partisan and not widespread

Few members of Congress have made public statements taking AGI seriously. Among those who have, political ideology does not appear to be the distinguishing factor. If we simply plot the AGI-pilled score vs the ideology score, we observe no obvious partisan split.

There are 151 congresspeople for whom Claude could not find substantial quotes about AI. These members are not included on this plot or any of the plots which follow.

II. Existential risk is partisan at the tails

When you change the scoring prompt to ask how much a congressperson's statements reflect a concern about existential risk, the plot looks different. Note that the scoring prompt here emphasizes "A politician who is most XRisk-pilled is someone who thinks AI is a risk to humanity -- not just the US." This separates x-risk concerns from fears related to US-China relations.

This graph looks mostly like noise but it does show that the majority of the most x-risk pilled politicians are Democrats.[1] This is troubling. Politics is a mind-killer and if AI Safety becomes partisan, productive debate will be even more difficult than it currently is.

III. Both parties are fixated on China

Some congresspeople have made up their minds: the US must "win" the race against China and nothing else matters. Others have a more nuanced opinion. But most are thinking about US-China relations when speaking about AI. Notably, the most conservative congresspeople are more likely to be exclusively focused on US-China relations compared to the most progressive members.

This plot has a strange distribution. For reference, the scoring prompt uses the following scale:

  • 0 = Does not mention China in their views on AI or does not think US-China relations are relevant
  • 50 = Cites US China relations when talking about AI but is not the only motivating factor on their position on AI
  • 100 = Cites US China relations as the only motivating factor on their position on AI and mentions an AI race against China as a serious concern
IV. Who in Congress is feeling the AGI?

I found that roughly 20 members of Congress are "AGI-pilled." 

  1. Bernie Sanders (Independent Senator, Vermont): AGI-pilled and safety-pilled

    "The science fiction fear of AI running the world is not quite so outrageous a concept as people may have thought it was."

  2. Richard Blumenthal (Democratic Senator, Connecticut): AGI-pilled and safety-pilled

    "The urgency here demands action. The future is not science fiction or fantasy. It's not even the future. It's here and now."

  3. Rick Crawford (Republican Representative, Arkansas): AGI-pilled but doesn't discuss x-risk (only concerned about losing an AI race to China)

    "The global AI race against China is moving much faster than many think, and the stakes couldn't be higher for U.S. national security."

  4. Bill Foster (Democratic Representative, Illinois): AGI-pilled and safety-pilled

    "Over the last five years, I’ve become much more worried than I previously was. And the reason for that is there’s this analogy between the evolution of AI algorithms and the evolution in living organisms. And what if you look at living organisms and the strategies that have evolved, many of them are deceptive."

  5. Brett Guthrie (Republican Representative, Kentucky): AGI-pilled but doesn't discuss x-risk (only concerned about losing an AI race to China)

    "And who will win the war for AI? Essentially, this is as important as the dollar being the reserve currency in the world. It's that important, that's what is before us."

  6. Chris Murphy (Democratic Senator, Connecticut): AGI-pilled and somewhat safety-pilled (more focused on job loss and spiritual impacts)

    "I worry that our democracy and many others could frankly collapse under the weight of both the economic and the spiritual impacts of advanced AI."

  7. Brad Sherman (Democratic Representative, California): AGI-pilled and safety-pilled 

    "I believe in our lifetime we will see new species possessing intelligence which surpasses our own. The last time a new higher level of intelligence arose on this planet was roughly 50,000 years ago. It was our own ancestors, who then said hello to the previously most intelligent species, Neanderthals. It did not work out so well for the Neanderthals."

  8. Debbie Wasserman Schultz (Democratic Representative, Florida): AGI-pilled and safety-pilled

    "Experts that were part of creating this technology say that it's an existential threat to humanity. We might want to listen."

  9. Bruce Westerman (Republican Representative, Arkansas): AGI-pilled but not necessarily safety-pilled (mostly focused on winning the "AI race")

    "The more I learn about it, it's kind of one of those things I think maybe humankind would've been better off if we didn't discover this and if we weren't developing it. But the cat's out of the bag and it is definitely a race to see who was going to win AI."

  10. Ted Lieu (Democratic Representative, California):  AGI-pilled and safety-pilled

    "AI already has reshaped the world in the same way that the steam engine reshaped society. But with the new advancements in AI, it's going to become a supersonic jet engine in a few years, with a personality, and we need to be prepared for that."

  11. Donald S. Beyer (Democratic Representative, Virginia): AGI-pilled and (mostly) safety-pilled

    "As long as there are really thoughtful people, like Dr. Hinton or others, who worry about the existential risks of artificial intelligence--the end of humanity--I don't think we can afford to ignore that. Even if there's just a one in a 1000 chance, one in a 1000 happens."

  12. Mike Rounds (Republican Senator, South Dakota): AGI-pilled and somewhat safety-pilled (talks about dual-use risks)

    "Bad guys can use artificial intelligence to create new pandemics, to use it for biological purposes and so forth, and to split genes in such a fashion that it would be extremely difficult to defend against it."

  13. Raja Krishnamoorthi (Democratic Representative, Illinois): AGI-pilled and safety-pilled

    "That's why I'm working on a new bill—the AGI Safety Act—that will require AGI to be aligned with human values and require it to comply with laws that apply to humans."

  14. Elissa Slotkin (Democratic Senator, Michigan): AGI-pilled but not safety-pilled (mostly concerned about losing an AI race to China)

    "I left this tour with the distinct feeling that AI raises some of the same fundamental questions that nukes did. How should they be used? By whom? Under what rules?"

  15. Dan Crenshaw (Republican Representative, Texas): AGI-pilled and maybe safety-pilled

    Did a podcast with Eliezer Yudkowsky, though that was back in 2023.

  16. Josh Hawley (Republican Senator, Missouri): AGI-pilled and safety-pilled

    "Americanism and the transhumanist revolution cannot coexist."

  17. Nancy Mace (Republican Representative, South Carolina): AGI-pilled but not safety-pilled (only concerned about losing an AI race to China)

    "And if we fall behind China in the AI race...all other risks will seem tame by comparison."

  18. Jill Tokuda (Democratic Representative, Hawaii): AGI-pilled and safety-pilled but this is based on very limited public statements

    "And is it possible that a loss of control by any nation-state, including our own, could give rise to an independent AGI or ASI actor that, globally, we will need to contend with?"

  19. Eric Burlison (Republican Representative, Missouri): AGI-pilled but not safety-pilled (only concerned about losing an AI race to China)

    "Artificial intelligence, or AI, is likely to become one of the most consequential technology transformations of the century."

  20. Nathaniel Moran (Republican Representative, Texas): AGI-pilled and safety-pilled (but still very focused on US-China relations)

    "At the same time, we must invest in areas crucial for oversight of automated AI research and development, like AI interpretability and control systems, which were identified in President Trump’s AI action plan."

  21. Pete Ricketts (Republican Senator, Nebraska): AGI-pilled but not safety-pilled (only concerned about losing an AI race to China)

    "Unlike the moon landing, the finish line in the AI race is far less clear for the U.S.--it may be achieving Artificial General Intelligence, human-level or greater machine cognition."

V. Those who know the technology fear it.

Of the members of Congress who are the strongest in AI safety, three have some kind of technical background.

Bill Foster is a US Congressman from Illinois, but in the 1990s, he was one of the first scientists to apply neural networks to study particle physics interactions. From reading his public statements, I believe he has the strongest understanding of AI safety of any member of Congress. For example, Foster has referenced exponential growth in AI capabilities:

As a PhD physicist and chip designer who first programmed neural networks at Fermi National Accelerator Laboratory in the 1990s, I've been tracking the exponential growth of AI capabilities for decades, and I'm pleased Congress is beginning to take action on this issue.

Likewise, Ted Lieu has a degree from Stanford in computer science. In July of 2025, he stated "We are now entering the era of AI agents," which is a sentence I cannot imagine most members of Congress saying. He has also acknowledged that AI could "destroy the world, literally."

Despite being 75 years old, Congressman Don Beyer is enrolled in a master's program in machine learning at George Mason University. Unlike other members of Congress, Beyer's statements demonstrate an ability to think critically about AI risk:

Many in the industry say, Blah. That's not real. We're very far from artificial general intelligence ... Or we can always unplug it. But I don't want to be calmed down by people who don't take the risk seriously

Appendix: How to use this data

The extracted quotes and analysis by Claude for every member of Congress can be found in a single json file here.

I found reading Claude's "notes" in the json to be an extremely comprehensive and accurate summary of a congressperson's position on AI. The direct quotes in the json are also very interesting to look at. I have cross-referenced many of them and hallucinations are very limited[2] (Claude had web search enabled, so it was able to take quotes directly from websites, though in at least one case it made a minor mistake). I have also spot-checked some of the scores gpt-4o produced and they are reasonable, but as always is the case with LLM judges, the values are noisy.

I am releasing all the code for generating this data and these plots, but it's pretty disorganized and I expect it to be difficult to use. If you send me a DM, I'd be happy to explain anything. Running all of this code will cost roughly $300, so if you would like to run a modified version of the pipeline, be aware of this.

  1. ^

    It also looks like more moderate politicians may be less x-risk pilled compared to those on each extreme. But the sample here is small and "the graph kind of looks like a U if you squint at it" doesn't exactly qualify as rigorous analysis.

  2. ^

    I obviously cross-referenced each of the quotes in this post.



Discuss

Lightcone is hiring a generalist, a designer, and a campus operations co-lead

LessWrong.com News - January 17, 2026 - 04:47
Published on January 17, 2026 1:47 AM GMT

Lightcone is hiring! We build beautiful things for truth-seeking and world-saving. 

We are hiring for three different positions: a senior designer, a campus manager, and a core team generalist. This is the first time in almost two years where we are actively hiring and trying to grow our team! 

 Senior Designer

When we are at our best, I think we produce world-class design. AI 2027 was, I think, a great design achievement, as is much of LessWrong.com itself. I also think that on a product and business level, making things beautiful and intuitive and well-crafted is crucial. I like some of Patrick Collison's thinking on this:

If Stripe is a monstrously successful business, but what we make isn’t beautiful, and Stripe doesn’t embody a culture of incredibly exacting craftsmanship, I’ll be much less happy. I think the returns to both of those things in the world are really high. I think even beyond the pecuniary or financial returns, the world’s just uglier than it needs to be… One can do things well or poorly, and beauty is not a rivalrous good.

My intuition is that more of Stripe’s success than one would think is downstream of the fact that people like beautiful things—and for kind of rational reasons because what does a beautiful thing tell you? Well it tells you the person who made it really cared… And so if you care about the infrastructure being holistically good, indexing on the superficial characteristics that you can actually observe is not an irrational thing to do.

I want us to continue making beautiful and well-designed things. Indeed, we currently have enormous demand for making more things like AI 2027 and DecidingToWin.org, with multiple new inquiries for projects like this per month, and I think many of those opportunities could be great. I also think LessWrong itself is substantially bottlenecked on design.

Now, design is a very broad category. The specific role I want to hire for is someone helping us make beautiful websites. This very likely implies understanding HTML and CSS deeply, and probably benefits a lot from frontend coding experience. But I can imagine someone who is just used to doing great design in Figma, without touching code directly, making this work.

This is a senior role! I am expecting to work with whoever we hire in this role more closely than I am currently working with many of my current core staff, and for this role to involve managing large design projects end-to-end. Correspondingly I am expecting that we will pay a salary in the range of $160k - $300k for this role, with the rough aim of paying ~70% of that person's counterfactual industry salary.

Apply here

 

Campus Operations Co-Lead

Help us run Lighthaven! Lighthaven is our 30,000 sq. ft. campus near Downtown Berkeley. We host a huge variety of events, fellowships and conferences there, ranging from 600+ person festivals like LessOnline or Manifest, to multi-month fellowships like the MATS program or Inkhaven.

We are looking for additional people to run a lot of our operations here. This will include making sure events run smoothly for our clients, figuring out how and when to cut costs or spend more, and working on finding or inventing new events.

The skills involved in this role really vary hugely from month to month, so the key requirements are being able to generally problem-solve and to learn new skills as they become necessary. Some things I have worked on with people on campus in the last few months: 

  • Figuring out pricing for conferences and fellowships that run here
  • Taking over work from our cleaner and porterage staff for a few days to notice which parts of the job are unnecessarily hard, so we can work on making them easier
  • Programming and deploying our room booking software for events
  • Plunging a clogged toilet during an event
  • Hiring architects to help us file a bunch of permits to the City of Berkeley
  • Getting into a car and getting backup food from a nearby restaurant for a conference that didn't order enough food for their attendees

This role could really take a very wide range of levels of compensation and seniority, ranging from $100k/yr to $300k/yr.

Apply here

 Core-team generalist

Lightcone has a core team of 7 generalists who work on anything from design, to backend programming, to running conferences, to managing construction projects, to legal, advertising, porterage, fundraising, sales, etc. We tend to operate at a sprint level, with teams and leadership of those teams being reconfigured every few weeks depending on the current organizational priority. As an illustration, approximately every person on the core generalist team has managed every other person on the team as part of some project or initiative.

For the core team, I try to follow Paul Graham's hiring advice: 

What do I mean by good people? One of the best tricks I learned during our startup was a rule for deciding who to hire. Could you describe the person as an animal? It might be hard to translate that into another language, but I think everyone in the US knows what it means. It means someone who takes their work a little too seriously; someone who does what they do so well that they pass right through professional and cross over into obsessive.

What it means specifically depends on the job: a salesperson who just won't take no for an answer; a hacker who will stay up till 4:00 AM rather than go to bed leaving code with a bug in it; a PR person who will cold-call New York Times reporters on their cell phones; a graphic designer who feels physical pain when something is two millimeters out of place.

Almost everyone who worked for us was an animal at what they did.

At least basic programming skill is a requirement for this role, but beyond that, it's about being excited to learn new things and adapting to whatever schemes we have going on, and getting along well with the rest of the team.

Lightcone is a culturally thick organization. Working with us on the core team is unlikely to work out if you aren't bought into a lot of our vision and culture. It's hard to summarize our full culture in this post, but here are some things that I think are good predictors of being a good fit for working here: 

  • Strongly resonating with the LessWrong sequences
  • Being excited about Edward Tufte and his writing on design
  • Having dedicated your career to AI existential risk reduction
  • Taking strong inspiration from Paul Graham's writing on running and working at startups
  • Being very argumentative and having your own perspective and opinions on things

We try to pay competitively for this role, but are still somewhat limited by being a nonprofit. Our general salary policy is to pay 70% of whatever you would make in industry (with some cap around $300k-$400k, since we can't really pay 70% of the $5M+ salaries flying around in a bunch of AI land these days). 

Apply here

 

Some more thoughts on working at Lightcone

I think Lightcone is a pretty good environment for thinking about the future of humanity. We tend to have a lot of spirited and intense discussion and debate about how to make things go better, and try to do a healthy mixture of backchaining from making AI go well, and forward-chaining from how to make ourselves and our community more productive and more sane. 

We also generally have a quite intense work culture. Many people on the team routinely work 60-hour weeks, and I consider it a sign of a well-calibrated workload to have around one all-nighter every 6 months or so in order to meet some last-minute deadline (we average a bit less than that, which suggests to me we should be a bit more ambitious with the commitments we take on, though not much more!).

We seem to work much more collaboratively than most organizations I've observed. A common unit of work allocation within our organization is a pair of people who are expected to spend many hours talking and thinking together about their assigned top priority. Our work environment tends to be pretty interruption-driven, with me generally doing "Management by Walking Around", where I spend much of my day visiting people in their workspaces, checking in on what their bottlenecks are, and solving concrete problems with them.

For a much more in-depth pointer to how we work, I have also recently published a sequence of essays about our operating principles, adapted from weekly memos I write about how I would like us to operate.

By default we do a 1-2 week trial, then if we expect it to work out we do a 1-3-month extended trial. But this is quite negotiable if you are not able to do this (e.g. many people can't do a 3-month trial without quitting their existing job, so need a firmer offer). We have successfully sponsored many H1B visas in the past, so non-US applicants are welcome.

And if you have any uncertainties or questions about applying, please send me a DM or leave a comment!

Apply



Discuss

Applying to MATS: What the Program Is Like, and Who It’s For

LessWrong.com News - January 17, 2026 - 03:25
Published on January 17, 2026 12:25 AM GMT

Application deadline: Three days remaining! MATS Summer 2026 applications close this Sunday, January 18, 2026 AOE. We've shortened the application this year. Most people finish in 1–2 hours, and we'll get back to applicants about first stage results by the end of January. Visit our website for details: matsprogram.org/apply.

TL;DR: This post is a follow-up to our shorter announcement that MATS Summer 2026 applications are open. It's intended for people who are considering applying and want a clearer sense of what the program is actually like, how mentorship and research support work in practice, and whether MATS is likely to be a good fit.

What MATS is trying to do

MATS aims to find and train talented individuals for what we see as one of the world's most urgent and talent-constrained problems: reducing risks from unaligned AI. We believe ambitious people from a wide range of backgrounds can meaningfully contribute to this work. Our program provides the mentorship, funding, training, and community to make that happen.

Since late 2021, MATS has supported over 500 researchers working with more than 100 mentors from organizations like Anthropic, Google DeepMind, OpenAI, UK AISI, GovAI, METR, Apollo, RAND, AI Futures Project, Redwood Research, and more. Fellows have collectively co-authored 160+ research papers, with 7,800+ citations and an organizational h-index of 40.

Fellows have contributed to research agendas like:

Approximately 80% of alumni now work directly in AI safety/security, and around 10% have gone on to co-found AI safety organizations or research teams. These 30+ initiatives include Apollo Research, Atla AI, Timaeus, Simplex, Leap Labs, Theorem Labs, Workshop Labs, and Watertight AI.

What fellows receive

The initial program runs for 12 weeks (June–August 2026) in Berkeley, London, or remotely, depending on stream. Fellows receive:

  • Mentorship from world-class researchers and a dedicated research manager
  • $15,000 stipend and $12,000 compute budget
  • Housing in a private room, catered meals, and travel to and from the program covered
  • Office space and the community that comes from working alongside other fellows
  • Seminars, workshops, and networking events with the broader AI safety community
  • The opportunity to continue for 6-12 additional months with ongoing stipend, compute, mentorship, and research support through the extension
What the 12-week research phase is like in practice

Most fellows work on an independent research project, typically scoped in collaboration with their mentor during the early weeks. During this period, fellows are usually:

  • getting oriented to the MATS environment,
  • refining or rethinking an initial project idea,
  • learning how their mentor prefers to work,
  • and calibrating what "good progress" looks like in an open-ended research setting.

As the program progresses, fellows iterate on a research plan, check in regularly with mentors and research managers, and gradually sharpen their questions and methods.

Fellows complete two milestones during the 12-week phase. In the past, the first has been a Project Proposal or Research Plan. The second is a Poster Presentation at the MATS Research Symposium, attended by members of the AI safety community.

How mentorship and research management work

Mentorship at MATS is intentionally varied. Mentors differ in:

  • how directive they are,
  • how often they meet,
  • whether they focus more on high-level framing or low-level technical details.

Every fellow also works with our dedicated research management team. Research managers (RMs) work with the fellows and mentors in a stream to support their program goals. Often this involves providing regular check-ins, helping unblock stalled projects, offering feedback on research plans, and supporting fellows in translating ideas into concrete progress.

There is, however, a lot of flexibility in what an RM can do in their role. This includes offering productivity coaching, career advice, application help, conference submission assistance, publication guidance, and much more!

Fellows consistently rate the experience highly, with an average score of 9.4/10 for our latest program (median of 10/10).

Community at MATS

The 12-week phase provides fellows with a community of peers who share an office, meals, and housing. Working in a community grants fellows easy access to future collaborators, a deeper understanding of other research agendas, and a social network in the AI safety community. Fellows also receive support from full-time Community Managers.

Each week includes social events, e.g., parties, game nights, movie nights, and hikes. Weekly lightning talks give fellows an opportunity to share their research interests in an informal setting. Outside of work, fellows organize road trips, city visits, weekend meals, and more.

The 6-12 month extension phase

At the conclusion of the 12-week research phase, fellows can apply to continue their research in a fully-funded 6-12 month extension. Approximately 75% of fellows continue into the extension. By fellow-time weighting, ~60% of the MATS experience is the extension phase (12-month extensions shift this even further).

Acceptance decisions are based on mentor endorsement and double-blind review of the 12-week program milestones. By this phase, fellows are expected to pursue their research with high autonomy.

Who should apply

MATS welcomes talented applicants that traditional pipelines may overlook. We're looking for:

  • Technical researchers (ML/AI background helpful but not required)
  • People who can demonstrate strong reasoning and research potential
  • Those interested in alignment, interpretability, security, or governance
  • Policy professionals with strong writing ability, understanding of governmental processes, and technical literacy to engage with AI concepts
  • Individuals with experience in government, think tanks, or policy orgs; domain expertise in national security, cybersecurity, US-China relations, biosecurity, or nuclear policy a plus

Our ideal applicant has:

  • An understanding of the AI safety research landscape equivalent to having completed BlueDot Impact’s Technical AI Safety or AI Governance courses.
  • Previous experience with technical research (e.g., ML, CS, math, physics, neuroscience), generally at a postgraduate level; OR previous policy research experience or a background conducive to AI governance
  • Strong motivation to pursue a career in AI safety research

Even if you do not meet all of these criteria, we encourage you to apply. Several past fellows applied without strong expectations and were accepted.

All nationalities are eligible to participate, and roughly 50% of MATS fellows are international.

Who tends to thrive at MATS

There's no single "MATS profile," but fellows who thrive tend to share a few characteristics:

  • Comfort with ambiguity and open-ended problems
  • Strong reasoning skills, even outside their primary domain
  • Willingness to revise or abandon initial ideas
  • Ability to work independently while seeking feedback strategically

Prior ML experience is helpful but not required. Many successful fellows come from mathematics, physics, policy research, security studies, or other analytical fields.

Who might not find MATS a good fit

You may want to reconsider applying if you're primarily looking for:

  • a structured curriculum with clear assignments,
  • an academic credential or resume signal,
  • or tightly scoped tasks with well-defined answers.

We think it’s a feature and not a bug that some very capable people decide MATS isn’t for them.

How to apply

Applications are now open.

General Application (December 16th to January 18th)

Applicants fill out a general application, which should take 1-2 hours. Applications are due by January 18th AoE.

Additional Evaluations (Late January through March)

Applicants who advance in the application process go through additional evaluations, including reference checks, coding tests, work tests, and interviews. The evaluations you undergo depend on the mentors and streams you apply to.

Admissions Decisions (Mid March through Early April)

Selected applicants are notified of their acceptance and anticipated mentor later in the application cycle.

If you have any questions about the program or application process, contact us at applications@matsprogram.org.

Closing

If you're considering applying to MATS Summer 2026, we hope this post helps you decide whether the program is a good fit for you. Full details about the program structure, timelines, and application process are on the MATS website.

We're happy to answer questions in the comments!

Acknowledgments: Claude was used for limited editorial support (e.g., wording suggestions, structural feedback).




Forfeiting Ill-Gotten Gains

LessWrong.com News - January 17, 2026 - 03:20
Published on January 17, 2026 12:20 AM GMT

It's a holiday. The cousins are over, and the kids are having a great time. Unfortunately, that includes rampaging through the kitchen. We're trying to cook, so there's a "no cutting through the kitchen" rule. Imagine enforcement looks like:

Kid: [dashes into kitchen, pursued by cousin]
Adult: Out of the kitchen!
Kid: Sorry! [Continues their path, leaving through the other door; escapes pursuit from more rule-abiding cousin]

This doesn't work! The kid got what they wanted out of this interaction, and isn't going to change their behavior. Instead, I need to make it not worth their while:

Kid: [dashes into kitchen, pursued by cousin]
Adult: No cutting through the kitchen! [Physically rebuffs intruder]
Kid: Sorry! [Forced to leave through the door they entered by; caught by cousin.]

Other examples:

  • Sneak candy, spit it out and forfeit dessert.

  • Use sibling's tablet time, lose your own.

  • Interrupt, be ignored.

The general principle is that if you want to limit a behavior, the combination of the gains from rule-breaking and the penalty from punishment needs to put the kid in a worse position than if they'd never broken the rule.
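
In rough symbols, writing g for what the kid gains by breaking the rule and p for the penalty they incur (my shorthand, not a formal model):

    payoff(break) = g - p < 0 = payoff(comply),   equivalently   p > g.

If instead g - p >= 0, the punishment is just a price, and a kid who values the gain will keep paying it.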

This isn't just a parenting thing: it's common to say that "crime should not pay", and many legal systems prohibit unjust enrichment. One place I'd like to see this implemented is airplane evacuation. If the safety announcements included "In the event of an emergency evacuation, any carry-on luggage you bring will be confiscated and destroyed. You will also be fined," we would have more JAL 516 (379 occupants, zero deaths) and fewer Aeroflot 1492 or Emirates 521.

Comment via: facebook, mastodon, bluesky




Is It Reasoning or Just a Fixed Bias?

LessWrong.com News - January 17, 2026 - 00:51
Published on January 16, 2026 9:43 PM GMT

This is my first mechanistic interpretability blog post! I decided to research whether models are actually reasoning when answering non-deductive questions, or whether they're doing something simpler.

My dataset is adapted from InAbHyD[1] and is composed of inductive and abductive reasoning scenarios over programmatically generated first-order ontologies, using made-up concepts to minimize the influence of prior associations with common words. These scenarios have multiple technically correct answers, but one answer is definitively the most correct[2]. I found that LLMs seem to have a fixed generalization tendency (when evaluating my examples) that doesn't adapt to the logical structure of the task, and that accuracies in 1-hop and 2-hop reasoning add up to roughly 100% for most models.
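
To make the setup concrete, here is a hypothetical item in the spirit of the dataset (the concept names, schema, and comments are my own illustration, not the actual InAbHyD format):

    # Hypothetical abductive item built from made-up concepts.
    # Illustrative sketch only; not the real InAbHyD schema or scoring.
    example = {
        "ontology": [
            "Every dax is a wumpus.",     # lower-level (child-of-child) rule
            "Every wumpus is a zumpus.",  # child concept -> parent concept
        ],
        "observation": "Fae is a zumpus.",
        # Both hypotheses below explain the observation, so both are
        # technically correct; a simplicity criterion (Occam's razor)
        # favors the one that assumes less.
        "candidate_hypotheses": [
            "Fae is a wumpus.",  # 1-hop explanation
            "Fae is a dax.",     # 2-hop explanation
        ],
    }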

Additionally, there's a large overlap between H2 successes and H1 failures (73% for DeepSeek V3), meaning that the model outputs the parent concept regardless of whether the task asks for the child concept or the parent concept. This behavior suggests that the model isn't actually reasoning, but instead generalizing to a fixed level that aligns with the parent concept.

This is perplexing because in proper reasoning, you generally need the child concept in order to get the parent concept. For example, you'd conclude that Fae is a mammal through a reasoning chain, first establishing that Fae is a tiger, then making one hop to say that Fae is a feline (child concept), and then making a second hop to say that felines are mammals (parent concept). The overlap, though, suggests that the model isn't reasoning through the ontology, and it's skipping the chain to output the parent or child concept depending on its fixed generalization tendency.

I used MI techniques like probing, activation patching, and SAEs in my research. Probing predicted something related to the output as early as layer 8, but patching early layers barely changed the final decision. This suggests that whatever the probe picks up is merely correlated with the final result rather than causally responsible for it, and that the generalization tendency is distributed across model components rather than localized in the early layers.
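
For readers who want a sense of the mechanics, here is a minimal probing sketch (the file names, shapes, and binary label are placeholders for cached activations, not the exact pipeline I ran):

    # Minimal linear-probe sketch over cached layer-8 activations.
    # File names, shapes, and the label definition are placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # acts_layer8.npy: (n_examples, d_model) residual-stream activations at layer 8
    # labels.npy: 1 if the model answered with the parent concept, else 0
    acts = np.load("acts_layer8.npy")
    labels = np.load("labels.npy")

    X_train, X_test, y_train, y_test = train_test_split(
        acts, labels, test_size=0.2, random_state=0
    )

    probe = LogisticRegression(max_iter=1000)
    probe.fit(X_train, y_train)
    print("held-out probe accuracy:", probe.score(X_test, y_test))

    # A probe that decodes the eventual answer this early is only evidence of
    # correlation; activation patching (swapping layer-8 activations between a
    # clean and a corrupted prompt and checking whether the answer flips) is
    # what tests for causal involvement.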

This was a fun project, and I'm excited to continue with this research and make some more definitive findings. My ultimate goal is to find why LLMs might be architecturally limited in non-deductive reasoning.

  1. ^

    A paper that I based my initial research question on, which argues that LLMs can't properly do non-deductive reasoning. It's authored by my NLP professor, Abulhair Saparov, and his PhD student, Yunxin Sun.

  2. ^

    This concept is known as Occam's razor.



