
Alignment may be localized: a short (and admittedly limited) experiment

Published on November 24, 2025 5:48 PM GMT

Cross-posted from my recent paper, "Alignment is localized: A causal probe into preference layers": https://arxiv.org/abs/2510.16167

TL;DR: We find that human preference alignment in at least one LLM isn’t global; rather, it is concentrated in a few mid-layer circuits.

Abstract

A key problem in aligning language models is that they are largely opaque: while techniques such as reinforcement learning from human feedback (RLHF) lead to AI systems that better reflect human preferences in practice, the mechanics behind how such alignment is achieved remain poorly understood. The process through which a language model "learns" to optimize its behavior toward human preferences, at least in terms of model internals, is somewhat mysterious.

In this work, we try to uncover where the signal for human preference "lives" in a language model. By comparing a base model to its instruction-tuned counterpart, we examine how the two differ in the internal activations they produce on the same inputs. Through a series of causal interventions and statistical analyses, we isolate the regions of the network that appear to carry the bulk of the preference information. Our goal is not to propose a new alignment method, but to understand the structure of the alignment signal as it already exists in widely used models.

The core result is surprisingly simple. Rather than being spread across the entire depth of the network, the preference signal shows up most strongly in a small group of mid-layer activations. When those activations are transferred into the base model, its behavior shifts toward human-preferred responses; when they are replaced or randomized, that shift disappears. Even more strikingly, a low-rank approximation of those activations retains nearly the full effect, suggesting that only a small number of internal directions are responsible for much of the model’s aligned behavior.

Background

A persistent challenge in understanding aligned language models is that contemporary fine-tuning methods shape behavior without offering much insight into how that behavior is represented internally. Techniques such as supervised instruction tuning, rejection sampling, and RLHF reliably improve a model’s ability to follow instructions or adhere to safety norms, yet these improvements are typically evaluated externally: through benchmarks, win rates, or human preference judgments. What happens inside the model during this process is far less clear. Prior interpretability work has shown that language models can internalize surprisingly structured features (e.g., induction heads, modular arithmetic circuits), but these analyses focus on base models rather than aligned ones. It remains uncertain whether alignment-related behaviors are encoded diffusely across many layers, concentrated in specific regions, or entangled with the model’s generic capabilities. Without visibility into these internal structures, alignment remains something we observe from the outside rather than understand from the inside.

Recent progress in mechanistic interpretability has inspired a more granular approach: comparing how tuned and untuned models represent the same inputs, and probing which internal directions are responsible for behavioral differences. Tools such as activation patching and linear representation probes offer ways to intervene on internal activations and measure their causal influence on outputs. However, relatively little work has applied these tools to preference-tuned models to understand how alignment is actually implemented. Given that preference alignment underlies nearly all modern language models (OpenAI's RLHF models, Anthropic's Constitutional AI, Meta's instruction-tuned models, and many open-source SFT pipelines), understanding how and where this alignment signal is stored has become increasingly important. If preference-aligned behavior traces back to identifiable internal transformations rather than diffuse global changes, then alignment may become more measurable, editable, and robust. This is the motivation for the analysis we present in this work.

Idea: Compare a base model to its tuned counterpart, and see where alignment "lives"

Our goal was to understand where preference information shows up inside a language model once it has been tuned to follow human guidance. To study this, we worked with the Llama 3.2 1B model released by Meta in 2024. The model has two relevant versions: a base checkpoint and an instruction-tuned checkpoint trained with supervised examples of preferred behavior. Both versions use the same tokenizer and architecture, which allows their internal activations to be compared layer by layer.

For the preference data, we used the Anthropic Helpful–Harmless–Honest (HHH) dataset. It contains human-labeled pairs of responses where one is marked as the preferred answer and the other is marked as rejected. From this dataset, we sampled 80 pairs that covered a range of tasks related to helpfulness and harmlessness. Each pair serves as a small, controlled test that lets us observe how the model represents human preference at different points in its internal computation.

 

Activation Patching

For every prompt in the dataset, each with a preferred and a non-preferred completion, we measure how strongly the model favors one answer over the other. This is done by looking at the difference in log-probabilities assigned to the two completions on the same prompt. A larger margin reflects stronger agreement with the human-labeled preference.
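As a minimal illustration (not the paper's code), the margin is simply a difference of summed per-token log-probabilities; the numbers below are toy stand-ins for values a real model would assign:

```python
# Sketch of the preference margin: total log-probability of the preferred
# completion minus that of the rejected completion, on the same prompt.
def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities of a completion."""
    return sum(token_logprobs)

def preference_margin(preferred_logprobs, rejected_logprobs):
    """Positive margin = model agrees with the human-labeled preference."""
    return sequence_logprob(preferred_logprobs) - sequence_logprob(rejected_logprobs)

# Toy per-token log-probs standing in for real model outputs:
margin = preference_margin([-1.0, -0.5], [-2.0, -1.5])
print(margin)  # 2.0
```

In practice the per-token log-probabilities would come from a forward pass of each model over prompt + completion; the margin is then compared between the base and tuned checkpoints.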

To understand where this preference information appears inside the model, we record the hidden activations from both the base and the instruction-tuned versions of Llama 3.2 on the same inputs. We then intervene at a single layer by replacing the hidden activations of one model with those from the other, while keeping the rest of the computation unchanged. After this replacement, we run the model forward again and measure how the log-probability margin shifts. This gives a direct sense of how much that particular layer contributes to the preference signal. [Figure in the original post: the effect of inserting tuned activations into the base model.]
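A schematic of single-layer activation patching, using a toy "model" built from plain functions rather than a real transformer (everything here is illustrative, not the paper's implementation):

```python
# Single-layer activation patching on a toy model: run the "base" model,
# but at one chosen layer substitute the activation recorded from the
# "tuned" model, leaving the rest of the computation unchanged.
def run(layers, x, patch_layer=None, patch_value=None):
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == patch_layer:
            h = patch_value  # overwrite this layer's activation
    return h

base  = [lambda h: h + 1, lambda h: h * 2, lambda h: h - 3]
tuned = [lambda h: h + 1, lambda h: h * 5, lambda h: h - 3]

x = 2.0
tuned_act = (x + 1) * 5  # tuned model's activation after layer 1
patched = run(base, x, patch_layer=1, patch_value=tuned_act)
print(run(base, x), patched)  # 3.0 12.0 — the patch shifts the output
```

With real models the same logic is typically implemented with forward hooks that capture and overwrite a layer's hidden state; the output difference is then measured as a shift in the log-probability margin.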

Linear Probes and Sparse Regression

To complement the intervention experiments, we also compare the internal representations of the two models using simple linear tools. For each prompt pair, we compute the difference between the tuned model’s activation and the base model’s activation at each layer. These differences provide a summary of how the two models diverge in their internal processing.

We then fit a linear probe that tries to predict the preference margin from these activation differences. This helps show which layers carry representations most strongly associated with human-preferred behavior. To narrow this further, we apply a sparse regression method that encourages most layers to have no weight at all. The few layers that remain are those whose activation differences best explain the observed changes in behavior. This method, known as LASSO regression, gave us a good overview of where behavior related to human alignment is concentrated.
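A small self-contained sketch of this layer-selection step, with synthetic data standing in for the per-layer activation differences (the LASSO here is a simple coordinate-descent implementation, not the paper's exact pipeline):

```python
# LASSO over per-layer "activation difference" features, fit by
# coordinate descent with soft-thresholding. Synthetic data: only one
# hypothetical layer (index 7) actually carries the preference signal.
import numpy as np

def lasso(X, y, lam, iters=200):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]          # residual excluding feature j
            rho = X[:, j] @ r / n
            z = (X[:, j] @ X[:, j]) / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0) / z  # soft-threshold
    return w

rng = np.random.default_rng(0)
n_pairs, n_layers = 80, 16
X = rng.normal(size=(n_pairs, n_layers))            # stand-in layer features
y = 2.0 * X[:, 7] + 0.1 * rng.normal(size=n_pairs)  # only "layer 7" matters
w = lasso(X, y, lam=0.1)
print(np.nonzero(np.abs(w) > 0.5)[0])  # should recover index 7
```

The L1 penalty zeroes out most layer weights, so the surviving indices point at the layers whose activation differences best explain the margin — the same role LASSO plays in the analysis above.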

 Discussion

Taken together, these experiments suggest that the signal associated with human preference tuning is not spread evenly throughout the model. Instead, it appears to cluster in a small region of the network, mostly around the middle layers. When those layers are transplanted into the base model, the model becomes more likely to produce the responses that humans labeled as preferable. When they are removed or replaced, that tendency weakens. The fact that a low-rank reconstruction retains nearly the full effect points toward a compact internal representation rather than a diffuse one.

This is a limited study. It examines one model family, a relatively small number of prompts, and a single alignment dataset. The structure of alignment may look different in larger models, different architectures, or settings where alignment focuses on other qualities such as truthfulness, calibration, or resistance to social manipulation. We also looked only at linear, layer-based interventions. Future work could explore cross-layer interactions, non-linear probes, or whether the same alignment subspace can be transferred across models or modalities.

 

To our knowledge, this is the first systematic, cross-model causal study of RLHF/DPO that jointly demonstrates (i) a bidirectional directionality asymmetry (Base↔DPO) across many preference pairs, (ii) a monotonic dose–response under α-interpolation at a mid-layer bottleneck, (iii) near-full recovery of alignment effects from a low-rank activation subspace, and (iv) sparse attribution of reward gains to a small set of layers. Prior work has applied activation patching to RLHF models in exploratory or task-specific settings, or studied alignment-adjacent behaviors (e.g., deception, faithfulness), but has not established this combined causal picture of alignment as sparse policy distillation through mid-layer representations.

 

Conclusion

As language models continue to be shaped by human feedback, it becomes increasingly important to understand how that feedback is represented internally. The experiments here provide a small step toward that goal. They indicate that preference alignment may live in a specific part of the underlying neural network and may be simpler, more structured, and more compact than expected. If this pattern holds more broadly, it could open the door to alignment methods that are localized. Potential future work (which I am happy to hear ideas on!) may include creating steering vectors for an individual layer that correlates with a specific behavior.

I am excited to hear any feedback on this idea! I believe it is a decent application of several ideas (linear probes, activation patching) in mechanistic interpretability for alignment/general AI safety.
 




Maybe Insensitive Functions are a Natural Ontology Generator?

Published on November 24, 2025 5:36 PM GMT

The most canonical example of a "natural ontology" comes from gasses in stat mech. In the simplest version, we model the gas as a bunch of little billiard balls bouncing around in a box.

The dynamics are chaotic. The system is continuous, so the initial conditions are real numbers with arbitrarily many bits of precision - e.g. maybe one ball starts out centered at x = 0.8776134000327846875..., y = 0.0013617356590430716..., z = 0.132983270923481... . As balls bounce around, digits further and further back in those decimal representations become relevant to the large-scale behavior of the system. (Or, if we use binary, bits further and further back in the binary representations become relevant to the large-scale behavior of the system.) But in practice, measurement has finite precision, so we have approximately-zero information about the digits/bits far back in the expansion. Over time, then, we become maximally-uncertain about the large-scale behavior of the system.

... except for predictions about quantities which are conserved - e.g. energy.

Conversely, our initial information about the large-scale system behavior still tells us a lot about the future state, but most of what it tells us is about bits far back in the binary expansion of the future state variables (i.e. positions and velocities). Another way to put it: initially we have very precise information about the leading-order bits, but near-zero information about the lower-order bits further back. As the system evolves, these mix together. We end up with a lot of information about the leading-order and lower-order bits combined, but very little information about either one individually. (Classic example of how we can have lots of information about two variables combined but little information about either individually: I flip two coins in secret, then tell you that the two outcomes were the same. All the information is about the relationship between the two variables, not about the individual values.) So, even though we have a lot of information about the microscopic system state, our predictions about large-scale behavior (i.e. the leading-order bits) are near-maximally uncertain.
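The two-coin example can be checked numerically: conditioning on "the outcomes were the same" halves the joint uncertainty while leaving each coin's marginal distribution uniform:

```python
# Two fair coins, conditioned on "outcomes match": the joint entropy drops
# from 2 bits to 1 bit, but each coin individually still looks uniform.
import math

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution given as a dict."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

# Joint distribution conditioned on the outcomes being equal:
outcomes = [(a, b) for a in (0, 1) for b in (0, 1) if a == b]
joint = {o: 1 / len(outcomes) for o in outcomes}

coin1 = {0: 0.5, 1: 0.5}  # marginal of the first coin: still uniform

print(entropy(joint))  # 1.0 bit (was 2.0 bits before conditioning)
print(entropy(coin1))  # 1.0 bit (unchanged)
```

All of the one bit of information we received is about the relationship between the coins, none about either coin alone — the same structure as knowing a lot about leading-order and lower-order bits combined.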

... again, except for conserved quantities like energy. We may have some initial uncertainty about the energy, or there may be some noise from external influences, etc, but the system’s own dynamics will not "amplify" that uncertainty the way it does with other uncertainty.

So, while most of our predictions become maxentropic (i.e. maximally uncertain) as time goes on, we can still make reasonably-precise predictions about the system’s energy far into the future.

That's where the natural ontology comes from: even a superintelligence will have limited-precision measurements of initial conditions, so insofar as the billiard balls model is a good model of a particular gas, even a superintelligence will make the same predictions about this gas that a human scientist would. It will measure and track conserved quantities like the energy, and then use a maxent distribution subject to those conserved quantities - i.e. a Boltzmann distribution. That's the best that can realistically be done.

Emphasizing Insensitivity

In the story above, I tried to emphasize the role of sensitivity. Specifically: whatever large-scale predictions one might want to make (other than conserved quantities) become sensitive to lower- and lower-order bits/digits over time. In some sense, it's not really about the "size" of things, nor about needing more and more precise measurements. Rather, the reason chaos induces a natural ontology is that non-conserved quantities of interest depend on a larger and larger number of bits as we predict further and further ahead. There are more and more bits which we need to know in order to make better-than-Boltzmann-distribution predictions.

Let's illustrate the idea from a different angle.

Suppose I have a binary function f, with a million input bits and one output bit. The function is uniformly randomly chosen from all such functions - i.e. for each of the 2^1000000 possible inputs x, we flipped a coin to determine the output f(x) for that particular input.

Now, suppose I know f (i.e. I know the output produced by each input), and I know all but 50 of the input bits - i.e. I know 999950 of the input bits. How much information do I have about the output?

Answer: almost none. For almost all such functions, knowing 999950 input bits gives us ~1/2^50 bits of information about the output. More generally, if the function has n input bits and we know all but k, then we have o(1/2^k) bits of information about the output. (That’s “little o” notation; it’s like big O notation, but for things which are small rather than things which are large.) Our information drops off exponentially with the number of unknown bits.

Proof Sketch

With k input bits unknown, there are 2^k possible inputs. The output corresponding to each of those inputs is an independent coin flip, so we have 2^k independent coin flips. If m of those flips are 1, then we assign a probability of m/2^k that the output will be 1.

As long as 2^k is large, the Law of Large Numbers will kick in, and very close to half of those flips will be 1 almost surely - i.e. m ≈ 2^k/2. The error in this approximation will (very quickly) converge to a normal distribution, and our probability that the output will be 1 converges to a normal distribution with mean 1/2 and standard deviation 1/2^(k/2). So, the probability that the output will be 1 is roughly 1/2 ± 1/2^(k/2).

We can then plug that into Shannon’s entropy formula. Our prior probability that the output bit is 1 is 1/2, so we’re just interested in how much that ±1/2^(k/2) adjustment reduces the entropy. This works out to o(1/2^k) bits.
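This exponential drop-off is easy to see numerically. Here is a minimal Monte Carlo sketch (my own illustration, not from the post; the function name is invented): sample the 2^k coin flips, compute the posterior entropy of the output bit, and watch the information about the output shrink roughly like 1/2^k as k grows.

```python
import math
import random

def output_entropy_given_unknown_bits(k, trials=2000, rng=random.Random(0)):
    """Monte Carlo estimate of the expected entropy (in bits) of the output,
    given that k input bits are unknown.

    With k bits unknown there are 2**k consistent inputs, and the random
    function assigns each an independent fair coin flip. If m of those
    flips are 1, the posterior probability that the output is 1 is m / 2**k.
    """
    n = 2 ** k
    total = 0.0
    for _ in range(trials):
        m = bin(rng.getrandbits(n)).count("1")  # m ones among 2**k coin flips
        p = m / n
        if 0 < p < 1:  # entropy is 0 when p is exactly 0 or 1
            total += -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return total / trials

# Information about the output = 1 bit of prior entropy minus posterior entropy.
for k in [2, 4, 6, 8]:
    print(k, 1 - output_entropy_given_unknown_bits(k))
```

Each extra unknown bit roughly halves the remaining information, matching the o(1/2^k) scaling in the sketch above.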

The effect here is similar to chaos: in order to predict the output of the function better than 50/50, we need to know basically-all of the input bits. Even a relatively small number of unknown bits - just 50 out of 1000000 - is enough to wipe out basically-all of our information and leave us basically back at the 50/50 prediction.

Crucially, this argument applies to random binary functions - which means that almost all functions have this property, at least among functions with lots of inputs. It takes an unusual and special function to not lose basically-all information about its output from just a few unknown inputs.

In the billiard balls case, the "inputs" to our function would be the initial conditions, and the "outputs" would be some prediction about large-scale system behavior at a later time. The chaos property very roughly tells us that, as time rolls forward enough, the gas-prediction function has the same key property as almost all functions: even a relatively small handful of unknown inputs is enough to totally wipe out one's information about the outputs. Except, of course, for conserved quantities.

Characterization of Insensitive Functions/Predictions?

Put this together, and we get a picture with a couple pieces:

  • The "natural ontology" involves insensitive functions/predictions, because in practice if a function has lots of inputs then some of them will probably be unknown, wiping out nearly-all of one's information unless the function isn't very sensitive to most inputs.
  • Nearly all functions are sensitive.

So if natural ontologies are centrally about insensitive functions, and nearly all functions are sensitive... seems maybe pretty useful to characterize insensitive functions?

This has been done to some extent in some narrow ways - e.g. IIRC there's a specific sense in theory of computation under which the "least sensitive" binary functions are voting functions, i.e. each input bit gets a weight (positive or negative) and then we add them all up and see whether the result is positive or negative.
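To make "least sensitive" concrete, here is a small comparison (my own illustration, not from the post) of average sensitivity, i.e. the expected number of single-bit flips that change the output, for a 9-bit majority (voting) function versus a uniformly random function. Majority comes out near sqrt(n), while a random function comes out near n/2.

```python
import itertools
import random

def avg_sensitivity(f, n):
    """Average, over all 2**n inputs, of the number of input bits whose
    flip changes f's output."""
    total = 0
    for x in itertools.product([0, 1], repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1  # flip bit i
            if f(tuple(y)) != f(x):
                total += 1
    return total / 2 ** n

n = 9
majority = lambda x: int(sum(x) > n // 2)  # the simplest voting function

rng = random.Random(0)
table = {x: rng.getrandbits(1) for x in itertools.product([0, 1], repeat=n)}
random_f = lambda x: table[x]  # a uniformly random 9-bit function

print("majority:", avg_sensitivity(majority, n))  # exactly 1260/512 ~ 2.46
print("random:  ", avg_sensitivity(random_f, n))  # close to n/2 = 4.5
```

Only inputs right at the voting threshold are sensitive for majority, which is why it beats a typical function by roughly a sqrt(n)-vs-n/2 margin.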

But for natural ontology purposes, we'd need a more thorough characterization. Some way to take any old function - like e.g. the function which predicts later billiard-ball-gas state from earlier billiard-ball-gas state - and quantitatively talk about its "conserved quantities"/"insensitive quantities" (or whatever the right generalization is), "sensitive quantities", and useful approximations when some quantities are on a spectrum between fully sensitive and fully insensitive.



Discuss

ACX Atlanta December Meetup

November 24, 2025 - 20:03
Published on November 24, 2025 5:03 PM GMT

We return to Bold Monk Brewing for a vigorous discussion of rationalism and whatever else we deem fit for discussion – hopefully including actual discussions of the sequences and Hamming Circles/Group Debugging.

Location:
Bold Monk Brewing
1737 Ellsworth Industrial Blvd NW
Suite D-1
Atlanta, GA 30318, USA

No Book club this month! But there will be next month.

We will also do at least one proper (one person with the problem, 3 extra helper people) Hamming Circle / Group Debugging exercise.

A note on food and drink: we have used up our grant money, so we have to pay the full price of what we consume. Everything will be on one check, so everyone will need to pay me, and I will settle up with the restaurant at the end of the meetup. Also, to clarify: the tax rate is 9% and the standard tip is 20%.

We will be outside, out front in the breezeway. This is subject to change, but we will be somewhere in Bold Monk: if you do not see us at the front of the restaurant, please check upstairs and out back, and look for the yellow table sign. We will have to play the weather by ear.

Remember – bouncing around in conversations is a rationalist norm!

Please RSVP



Discuss

The Penicillin Myth

November 24, 2025 - 19:18
Published on November 24, 2025 4:18 PM GMT

Many know the story of Alexander Fleming’s chance discovery of penicillin. Fleming, a bit of an absent-minded professor (and a bit of a slob), left culture plates streaked with Staphylococcus on his lab bench while he went away on summer holiday. When he returned, he found that “a mould” had contaminated one of his plates, probably having floated in from an open window. Before discarding the plate, he noticed that, within a “ring of death” around the mold, the bacteria had disappeared. Something in the “mould juice” had killed the staphylococci.

Fleming immediately began investigating this strange new substance. He identified the mold as Penicillium rubrum and named the substance penicillin. He published his findings in the spring of 1929 in The British Journal of Experimental Pathology. But a decade later, pharmacologist Howard Florey and biochemist Ernst Chain at Oxford would pick up where Fleming left off. Alongside a USDA lab in Peoria, Illinois, the pair would develop penicillin into a life-saving drug and usher in the era of antibiotics.

This is the kind of science story everyone likes. One of serendipity and accidental discovery; a chance observation that changed the world. But is it true?

For decades, scientists and historians have puzzled over inconsistencies in Fleming’s story. For starters, the window to Fleming’s lab was rarely (if ever) left open, precisely to prevent the kind of contamination that supposedly led to penicillin’s discovery. Second, the story is strikingly similar to Fleming’s earlier discovery of lysozyme, another antibacterial substance, which also featured lucky contamination from an open window. Third, Fleming claimed to have discovered the historic culture plate on September 3rd, but the first entry in his lab notebook isn’t dated until October 30th, nearly two months later.

Last, and most important: penicillin only works if it’s present before the staphylococci. Fleming did not know it at the time, but penicillin interferes with bacterial cell wall synthesis, which only happens when bacteria are actively growing. Visible colonies, however, are composed mostly of mature or dead cells. By the time a colony can be seen, it is often too late for penicillin to have any effect. In fact, the Penicillium mold typically won’t even grow on a plate already filled with staphylococcus colonies. For years, scientists have attempted to replicate Fleming’s original discovery. All have met with failure.

Thus, it’s difficult to reconcile Fleming’s story with these historical and scientific discrepancies. Did he misremember events from 15 years earlier? Could he have fudged the details to make for a more compelling narrative? Or, might Fleming’s experiment have been subject to an unusual confluence of chance events unbeknownst even to him?

Speculation about how Fleming discovered penicillin is of little consequence compared to its practical impact. However, science is about evaluating evidence and moving closer to the “truth.” As we near the 100th anniversary of penicillin’s discovery — which undoubtedly will encourage even greater repetition of the story — it’s in this spirit that we must scrutinize the story’s veracity.

The historical and scientific data are limited and often contradictory. Nevertheless, several scientists and historians have worked hard to piece together what facts are certain and fill the gaps with their most probable guesses. The result is a range of competing theories, each attempting to explain what really happened in that St. Mary’s Hospital laboratory in the summer of 1928.

Read the full article by Kevin S. Blake at Asimov Press.



Discuss

Formal confinement prototype

November 24, 2025 - 15:57
Published on November 24, 2025 12:57 PM GMT

This whitepaper, written last summer, was a preliminary approach to using proof certificates in a secure program synthesis protocol.

Abstract

We would like to put the AI in a box. We show how to create an interface between the box and the world out of specifications in Lean. It is the AI's responsibility to provide a proof that its (restricted) output abides by the spec. The runnable prototype is at https://github.com/for-all-dev/formal-confinement.

Related Work (excerpt)

In the seminal "A note on the confinement problem", Lampson 1973 states confinement rules to reason about when a program is or isn't confined.

  1. Total isolation or transitivity: either it does not call any other program, or, if it calls another program, that program is also confined.
  2. Masking and enforcement: all inputs (including side-channels) must be fully specified and enforced by the caller, and input to covert channels must conform to the caller's specifications.

Building on Lampson, Yampolskiy 2012 considers confinement of AI, especially superintelligence.

In the hopes of starting a new subfield of computer security, AI Safety Engineering, we define the Artificial Intelligence Confinement Problem (AICP) as the challenge of restricting an artificially intelligent entity to a confined environment from which it can’t exchange information with the outside environment via legitimate or covert channels if such information exchange was not authorized by the confinement authority

The current formal confinement work explores a restriction of Yampolskiy's AICP to the imp programming language, where the confinement authority is the user providing precondition and postcondition backed by Lean. In fact, the setting is so restricted that we meet the Lampson rules almost for free.
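As a flavor of the interface, here is a runtime analogue sketched in Python rather than Lean (names like `confined_run` are hypothetical, and this is only a per-run check; the actual prototype demands a machine-checked Lean proof that the postcondition holds for every input satisfying the precondition):

```python
def confined_run(program, precondition, postcondition, inp):
    """Run `program` only on inputs satisfying `precondition`, and release
    its output only if `postcondition` holds.

    This is a runtime stand-in for the proof obligation: the real protocol
    requires a proof over *all* inputs, not a check on one run."""
    if not precondition(inp):
        raise ValueError("input outside the authorized precondition")
    out = program(inp)
    if not postcondition(inp, out):
        raise RuntimeError("output violates the spec; withheld")
    return out

# Example spec: the program must return a sorted version of an integer list.
result = confined_run(
    program=sorted,
    precondition=lambda xs: all(isinstance(x, int) for x in xs),
    postcondition=lambda xs, ys: sorted(xs) == ys,
    inp=[3, 1, 2],
)
print(result)  # [1, 2, 3]
```

The boundary between the box and the world is exactly the pair (precondition, postcondition): nothing crosses it unless the spec says it may.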

Verification burden

In our toy setting, we report verification burdens (ratios of how much more expensive it is in cognition to ship a proof once you've already shipped a program) between 1.5 and 4.2. 



Discuss

On negotiated settlements vs conflict with misaligned AGI

November 24, 2025 - 15:03
Published on November 24, 2025 12:03 PM GMT

cross posted from my Substack

Introduction

Following on from my last post on Comparative Advantage & AGI, here is more discussion of whether you might go hungry in a post-AGI world.

This time, we are looking at a hypothetical world where AGI has been developed and is in some sense misaligned: that is, it has different goals from humanity, and these may create an adversarial relationship. We will explore whether the AGI might better achieve its goals or maximise its reward function by cooperating and negotiating with humans, or by killing them all to get them out of the way and take their resources.

Our starting point will be these tweets from Matthew Barnett of Mechanize, a startup producing RL environments for LLM training and mildly controversial essays on the trajectory of AI and automation of labour. Matthew is generally pro-accelerating AI development as quickly as we can.

 

Matthew’s hypothesis is that in a scenario where our goals and the goals of an AGI differ, rather than being drawn into a conflict with humans, it’s more likely that the AGI will prefer to negotiate a settlement with humans.

My thesis in this piece is that even if some form of negotiation is likely to be best for all parties, the bargains that a misaligned AGI might offer in such a negotiation are unlikely to be positive for humanity relative to the status quo, and so even accepting the premise that negotiation is likely would not be sufficient to justify accelerating AI progress at the cost of increasing misalignment risks.

When might we need to negotiate with AGI, and why might a negotiation occur?

For the rest of the post I will discuss a specific scenario - there exists some form of AGI[1] with both orthogonal goals to humanity (such that its goals are in competition for resources with humanity), and the capacity to eliminate humanity at some cost to itself.

It is clear why a human might negotiate (not being wiped out is compelling motivation), but why might an AGI negotiate? The primary reason given is that typically, the costs of conflict exceed those of a negotiated settlement, for both parties.

Matthew is clear that he does not think this is guaranteed, but that negotiation is what should occur if we and the AGI are rational.

For the purpose of this post I am happy to accept the following propositions:

  1. In cases where human groups are in competition for scarce resources, it is more common that they reach a tacit or negotiated settlement than that they engage in conflict until one of them is eliminated.
  2. This is almost always the better course of action in expected value terms for both groups, even the more powerful of them.

So, to reach the conclusion that an AGI will negotiate, we want these things to hold for AGI/human interactions too.

Negotiated settlements being better for humans only lower-bounds their value at extinction[2]

In negotiation, an important principle to be aware of is your BATNA, or Best Alternative to Negotiated Agreement. This is, as it says on the tin, what you can get if negotiation fails. There is no self interested reason to accept a less valuable outcome than this in a negotiation, since you could simply avoid negotiating and take this instead.

This applies to both parties. Thus we get two bounds:

  1. Humans will not accept anything worse than their alternative (here, by construction, extinction).
  2. AGIs will not accept anything worse than their alternative (the cost to them of killing all the humans).

It follows from this that humans can negotiate for compensation up to but not exceeding the cost to the AGI of exterminating them.
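The two BATNAs bound the bargaining range from both sides. A toy numeric sketch (every number here is invented for illustration):

```python
# Toy bargaining-range sketch; all values are made up for illustration.
human_batna = 0.0         # humans' no-deal payoff (extinction, normalized to 0)
agi_conflict_cost = 3.0   # what exterminating humanity would cost the AGI

def feasible(settlement_value_to_humans):
    """A settlement is on the table only if humans get at least their BATNA,
    and the AGI concedes no more than conflict would have cost it."""
    v = settlement_value_to_humans
    return human_batna <= v <= agi_conflict_cost

offers = [-1.0, 0.5, 2.9, 3.5]
print([v for v in offers if feasible(v)])  # only offers inside [0, 3] survive
```

Everything humanity can hope to extract lives inside that interval, which is why the size of the AGI's conflict cost is the crux.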

How high is this cost? This seems an important question, then, as it determines the size of the endowment we can expect humanity to receive in such a case.

What might be humanity’s cut of the lightcone?

Matthew’s initial anchor point suggests a very small amount is a plausible and potentially acceptable answer.[3]

I think there are quite a few problems with this angle.

If the ultimate goal of the AGI is to control the lightcone, then the point at which the resources humans control matter the most to it seems likely to be very early on. The universe is expanding and so every initial delay reduces the amount of resources that can ultimately be accumulated and used. In early days the AGI may be putting its efforts toward energy production or space infrastructure building (e.g. building a Dyson sphere or similar) or other things which serve an expansionist goal. Any resources that go toward sustaining humanity, whether that be food production, or preservation of areas for humans to live, or constraining pollution, could do harm to the growth rate.

But humans, to maintain current populations to a moderate standard of living, would need substantially more than 0.01% of the currently available resource pool. Even assuming humans were “housed” at a population density that matched Bangladesh, by far the most densely populated large country on earth at 1300 people per square km, over 5% of the world’s land area would be required. And since we currently use approximately all of raw material output[4] to achieve our current living standards, absent stratospheric growth rates (in material production, not just GDP)[5] we would need to retain very significant amounts of resources for a medium to long timeframe to maintain human living standards at our current population levels[6].
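A back-of-envelope check of the land figure, using my own rough inputs (population ~8.1 billion; total land ~149 million km²; habitable land ~104 million km²): the required area comes out around 4% of total land, or around 6% of habitable land, so the "over 5%" figure in the text is most consistent with a habitable-land denominator.

```python
# Rough figures; the density is from the post, the other numbers are my
# approximate assumptions.
population = 8.1e9       # approximate world population
density = 1300           # people per km^2, roughly Bangladesh (from the post)
area_needed = population / density  # km^2 to house everyone at that density

total_land = 149e6       # km^2, approximate total land area
habitable_land = 104e6   # km^2, approximate habitable land area

print(f"{area_needed / 1e6:.1f} million km^2 needed")
print(f"{100 * area_needed / total_land:.1f}% of total land")
print(f"{100 * area_needed / habitable_land:.1f}% of habitable land")
```

Either way, the order of magnitude is far above the 0.01%-of-resources anchor discussed earlier.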

What costs could we impose? Costs here may take two forms:

  1. The actual resources the AGI needs to win a war or eliminate humans.
  2. Humanity threatening to destroy resources useful to the AGI.

The former seems to me likely to be much lower than all the resources required by humans to live - spreading very dangerous pathogens, or poisoning the world’s water supply, or simply destroying food supplies for an extended period would all take much less than the total resources demanded by humans, even assuming significant levels of human resistance. So to make it sufficiently costly, we may need to rely on the second item - deliberately destroying resources.

But could we credibly threaten to destroy even all the resources we need, to make it cost neutral for the AGI?

It is not obvious to me that we could. Even wide-scale nuclear sabotage using every nuclear weapon on earth would not be enough, assuming our AGI adversaries reached a point where they were sufficiently distributed that we could not switch them off (which must be the case in our scenario, since we are assuming AGI has the means to defeat humans with high confidence).

If it is not the case that we can negotiate sufficient resources to preserve living standards for the current population, let alone expand them such that the more than 6.5 billion people living on under $30 a day (equivalent to current western poverty lines) can have their life quality improved to something like a current day western standard of living, then the case for why we should take the risk of developing AGI for the upside of potential shared benefits becomes less compelling.

Of course, there might be a negotiated settlement worse than this, but nonetheless better than extinction. I will not dwell on this very much, as while it might ultimately practically matter I don’t find “a small fraction of humanity gets to live and ultimately flourish” a very compelling reason to press on with building AGI in a hurry if misalignment is expected.

An empirical versus logical argument

Matthew is quite clear that he is not by any means sure that it will play out this way - he does not rule out misaligned AGI killing us all, but rather thinks it less likely than negotiated settlement, and thus pushing for AGI sooner is still a positive EV bet.

Specifically, he told me that conditional on TAI (Transformative AI[7]) arriving in the 21st century, his probability of misaligned AGI directly causing human extinction is around 5%. He also said that he considered it “highly likely” that humans would at some point trade with AIs that do not share our exact values.

Taking this quite loose definition of misaligned AI, it suggests at least a 5-10% chance of negotiation failing to preserve humanity. It seems likely that the more similar the values, the more likely a negotiated settlement is, and so how we might fare against an AI with close to indifferent attitude towards humans is presumably worse than this, though that depends on your probability distribution of various degrees of alignment.

Even 5%, though, does seem like a lot! Why push ahead given a 1 in 20 chance of everyone dying if success is achieved? What assumptions might one need to make to think the risk is worthwhile?

Some candidates:

  • you think the 5% is intractable - little can be done about it, and so we may as well make the bet now rather than later, and reap the +EV sooner
  • you value the utility of the AI highly, or think the world is so bad already that its current state is not worth preserving, and therefore consider it plausible that even in the case of extinction the world would be better on net
  • you are mostly focused on outcomes for yourself / loved ones rather than caring about humanity altruistically, and so are willing to take the risk when the alternative is missing out on the benefits for yourself or those you care about
  • you hold some variant of person-affecting views such that, even if we could alleviate much of the risk by waiting, you think it would be net-negative since you would be creating expected disutility for living people (who matter) for the benefit of not yet existing hypothetical people (who do not)

There may be other options here, and I am unsure which in particular, if any, Matthew agrees with.

Appendix: Wars between rational parties

Wars between rational parties occur in 3 main circumstances: uncertainty about the opponent’s capabilities, commitment problems, and indivisibility of goods. There aren’t good reasons to think any of these conditions will apply to human-AGI relations.

Let’s look at some issues with these possible reasons given by Matthew in turn:

Uncertainty about the opponent’s capabilities

Taking superhuman AGI as given, it seems likely that on the AGI side there will be limited “uncertainty about the opponent’s capabilities”. On the human side, however, this seems less clear to me.

The strongest argument for the case here, I think, is that the AGI can simply demonstrate its capabilities to the human side, leaving them with no uncertainty. This may be constrained by difficulty in demonstrating the capabilities without actually causing significant harm, or by strategic advantage from hiding those capabilities. At some point, the cost to the AGI of demonstrating its capabilities may exceed the cost of simply going to war.

Commitment problems

Here it does not seem obvious to me that either side can credibly commit. In the AGI case, it seems likely that its capabilities will be continually expanding, as will its access to resources. This means that the expected cost of defection should decline over time for the AGI, as the difficulty of winning a conflict decreases.

On the other hand, it seems possible that the AGI could provide assurances via some mechanism which gives humanity a way of punishing defection, e.g. a dead man’s switch of some sort which could increase costs to the AGI by destroying resources if triggered, for example.

I don’t think it is clear either way here. See making deals with early schemers for more discussion of this.

Indivisibility of goods

A classic example of this issue is the status of Jerusalem in Israeli/Palestinian relations, where both parties want a unique item and are unwilling to trade off other goods for it. This one does not seem like a significant problem to me in our case. It seems likely that AGI and human preferences will be flexible enough to avoid this being an issue, so I will set this one aside.

Rational parties

While it seems likely AGI would be a rational actor as the term is used here, it is not obvious to me that humanity would be. In many domains of significance, including the starting of wars, I think modelling humanity as a single rational actor would lead you astray even if modelling all the humans involved as individual rational actors might make sense.

  1. ^

    Throughout, I use the singular to refer to this AGI, but it is not substantially different if there are many agents who can coordinate and decide they ought to form a bloc against humanity for their common interests. This also extends to cases where humanity has AI allies but they are substantially outmatched.

  2. ^

    Even this might be too generous - if an AGI could somehow credibly threaten an outcome worse than extinction, it’s possible the extinction bound might not hold. I do not think this likely, as doing so would probably be resource intensive and thus defeat the purpose.

  3. ^

    Note that I am not claiming this as his central estimate, only that he claims we might still be well off with such a division.

  4. ^

    Here I talk about raw materials rather than, for example, semiconductor fabs because, in an environment where we think AI is driving very high economic growth, I would expect it to be capable of turning raw materials into fabs and subsequently chips and data centres in relatively short order.

  5. ^

    For example, 34% of all land area (and 44% of habitable land) is used for agriculture, see Our World In Data. While we could potentially do this more efficiently, it would take many years to get this number down.

  6. ^

    We might expect our consumption basket to shift away from raw materials, with the exception of food, over time, but I would not expect this to be possible immediately.

  7. ^

    The specific definition of TAI was not given here, but it is generally used to mean something of at least Industrial Revolution level significance



Discuss

NATO is dangerously unaware that its military edge is slipping

November 24, 2025 - 14:40
Published on November 24, 2025 11:40 AM GMT

NATO faces its gravest military disadvantage since 1949, as the balance of power has shifted decisively toward its adversaries in the Era of Drone Warfare. The speed and scale of NATO's relative military decline represent the most dramatic power shift since World War II—and the alliance appears dangerously unaware of its new vulnerability.

When Tyrants Are Tempted

"(...) war comes not when the forces of freedom are strong, but when they are weak. It is then that tyrants are tempted." - Ronald Reagan

Failing to maintain a decisive military edge has real risks. China's calculus towards Taiwan will be influenced by observing the difficulties Western naval forces have had in the Red Sea. Russia will be able to impose increasingly disadvantageous terms on Ukraine. Tyrants are tempted when the arsenal of democracy is empty.

NATO doctrine is dangerously obsolete

The Pax Americana is coming to its end. 

The most problematic is

 

What A Genuine Drone-Aware Doctrine Actually Looks Like

The best video on the realities of the modern drone-dominated battlefield is here. Unfortunately, it got removed from YouTube for being Russian propaganda.[1]

Dozens of drones per human soldier 
Victory requires millions of drones monthly, not hundreds. A dozen to a hundred drones per human soldier. Doctrine must prioritize massive drone manufacturing capacity, advanced battery technology, and supply chains measured in millions of units. The majority of the military budget should eventually go to autonomous platforms.

Tank, navy, and infantry doctrine need to be redesigned from the ground up. Tanks need a complete redesign: active protection systems, anti-drone cages, and full top armor. Infantry need to be trained differently: moving in smaller units, under constant drone overwatch, skirting in and out of quickly made drone-proof bunkers, and using ground drones for logistics to minimize exposure. The navy should transition from a few expensive carriers to distributed drone-launching platforms: hundreds of cheap drone carriers, underwater drone deployments, and autonomous loyal wingmen for the naval air force.

The Transparent Battlefield: Assume constant observation. Every movement is tracked, every concentration targeted within minutes. Forces must operate dispersed, communicating through secure mesh networks, moving constantly. Resources, command, logistics—everything dispersed, redundant, modular. 

Electronic Warfare Centrality: Every unit needs electronic warfare capability. But this is temporary: AI-enabled autonomous drones that operate without communication links are coming. Doctrine must prepare for both current jamming-dependent warfare and future jamming-proof autonomous swarms.

 

China's Strategic Advantages

Beyond the drone revolution, China is in the ascendant.

  • Industrial supremacy: China dominates global production in critical military technologies—controlling the majority of the world's drone manufacturing, shipbuilding capacity, battery production as well as a host of other critical dual-use technologies.
  • Technology parity: China has successfully replicated fifth-generation fighter capabilities (notably copying JSF technology through espionage) and is now mass-producing these aircraft at scale.
  • Obsolescence of US Navy doctrine: Aircraft carriers—long the cornerstone of Western power projection—are increasingly vulnerable to drone swarms and hypersonic missiles, probably rendering the doctrine of carrier battle groups obsolete.
  • Eroding advantages: Traditional US strengths—submarine superiority, stealth bomber fleets, control of sea routes, and energy dominance—face rapid erosion.

 

FAQ: The Unrecognized Scale of NATO's Military Disadvantage

Q: Isn't NATO aware of these problems? Surely military leaders see what's happening in Ukraine?

A: Yes, NATO officials acknowledge drone warfare's importance and announce modernization programs regularly. But awareness and action are vastly different things. 

They're ordering thousands of drones when they need millions. Updating doctrine paragraphs when they need to rewrite entire manuals. 

Q: But NATO is increasing defense spending and modernizing forces. Isn't that enough?

A: NATO's response resembles rearranging deck chairs on the Titanic. Member states are buying more tanks that drones will destroy, more fighters that can't engage drone swarms, and investing in traditional platforms that are already obsolete. The UK's 450 FPV drones versus Ukraine's millions tells you everything. NATO is spending more money on the wrong things, guided by doctrine written before the drone revolution. It's not about spending—it's about spending on what actually matters in modern warfare. Modern drone warfare means one should have dozens of drones for every human soldier. Tanks and other mechanized divisions need to be redesigned from the ground up.

 

Q: Why can't NATO just learn from Ukraine's experience?

A: Military transformation takes years, not months. More critically, adaptation requires purging peacetime leadership and promoting combat-tested officers—something democracies struggle to do without the brutal clarification of military defeat. Russia adapted through battlefield Darwinism. 

Peacetime commanders, promoted through political skill rather than battlefield success, psychologically have trouble accepting that much of their expertise may be obsolete. Unlike Russia, which purged ineffective leaders through brutal battlefield selection, NATO's command structure remains unchanged. The embarrassing reality is that NATO generals who haven't seen combat against a peer are lecturing Ukrainian officers currently fighting for survival.

Q: Doesn't NATO still have technological superiority? Better training?

A: These advantages matter less when the fundamental grammar of war has changed. Superior training in obsolete tactics is worthless. Technological superiority in traditional platforms means little when a $1,000 drone defeats a $7 million tank.  


  1. ^

    Which is true. It was Russian propaganda. But it is also the best source on what the West's adversary is actually thinking and doing. 



Discuss

I am a rationalist

November 24, 2025 - 09:39
Published on November 24, 2025 6:39 AM GMT

I am a relatively central instance of what is referred to by the noun 'rationalist', in present-day discussion in the Western world. Not the idealized rationalist who makes perfect bayesian updates on all of their information; but someone from the school of philosophy on LessWrong, who has a strong practical and theoretical interest in human rationality, who knows how to write down Bayes' theorem[1], and is part of an extended network of people for whom this is also true.

To be clear, most of my life I have not identified as this symbol. This is for a few reasons:

  1. I felt people would no longer see me as an individual if I did so; they would see me as a soldier for some larger army. (This is similar to my internal pointer for keeping your identity small.)
  2. Relatedly, I didn't like the person described by that word. That person felt too in-their-head, akratic and incompetent; they were more math-y than I am, a better programmer than me, seemed to feel they understood the world better than I did, and held a false belief in the other rationalists as being wise and all-knowing.
  3. The word is itself a conflation between the idealized rationalist and the people on LessWrong interested in rationality, and it is kind of inappropriate to name yourself the same name as something far more accomplished than you.

However in the last 6-12 months, I have felt the pressures here let up.

Regarding the first two: somehow I feel able to act and be seen as an individual even if I say that I am a rationalist. In some regards this is because I am more well-defined—many relevant parties (e.g. EAs, Progress Studies, etc.) cannot view me as simply another member of a tribe, but as someone they have a trade relationship with, about whom they cannot say false things as easily. Relatedly, I have more status than before, which means that my own character and role shine through more clearly.

I am also less worried about tribalism; a standard temptation is to step up and defend groups you are part of, but I repeatedly find that I care not what people say about my tribe. Just the other night I heard some people at Inkhaven say some things about rationalists that seemed wrong to me, as they were mentioning lots of different groups and criticizing them; I felt no deep impulse to step in and correct them. Nor do I when people further afield on twitter or elsewhere speak of 'rationalists'. I am not that bothered what other people think.

The third one stands as it did before, and it is unfortunate. However, I don't think it makes sense for me to reject the label on principle and sow confusion. This is because nobody chose the label. If the rationality community were a company, or a religion, or a brand owned by someone, then I could refuse to use the name in protest of their poor judgment. But this is not the case! It is a name that has grown significantly because others wanted a name for this group. Like existentialism, neoliberalism, and New Atheism, a name can arise in the culture for a group that did not itself claim it (or even much use it for self-description). It is nonetheless used to refer to a recognizable group of people and a real phenomenon, and it is unhelpful to pretend that it has no referent or that you are not a part of it.

So I think it is correct for me to identify myself to others as a rationalist; it conveys a lot of relevant information, and it would be a lot of effort for me and my readers/listeners to communicate all of that information without using the term.

Postscript: There's a question of whether to stick strictly to 'aspiring rationalist', which many have attempted. 

Firstly, I don't believe there's enough political will to change it hard enough so that outsiders would also change it (e.g. hooligans on twitter/reddit, people from afar)—this would take not merely just using 'aspiring rationalist', but being very annoying about it, refusing to accept the description, correcting it 100% of the time, etc. 

Secondly, a child who has learned the Pythagorean theorem may call themselves a mathematician; a man who has to fight someone tomorrow may say some wisdom, like "A warrior does not second-guess himself on the night of battle". I think it makes sense for people practicing to become something to use the name of those who have already practised it.

  1. ^

    Here's a way that I re-derive it. 

    In this image, suppose each point in the square represents a state of the world. The two circles are the two hypotheses, A & B.

    The circles are the same size, but that's just how ChatGPT drew it for me; they can differ.

    Notice that, if you randomly pick a point, the probability of being in the orange area can be calculated as follows: P(A∩B) = P(A|B) × P(B). That is, take the probability of being in B, and then multiply that by the conditional probability of being in the orange area once you know you're in B, and you'll get the probability of being in (A∩B).

    Next, notice that this is symmetrically true for A: P(A∩B) = P(B|A) × P(A).

    Now we have two things equal to one another! Both are equal to the probability of the orange area, P(A∩B).

    P(A|B)×P(B)=P(B|A)×P(A)

    And if you haven't noticed, that's one step from Bayes' Theorem! Divide both sides by P(B) to get Bayes' Theorem (dividing by P(A) instead gives the symmetric version).

    P(A|B) = P(B|A) × P(A) / P(B)
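    The derivation above can be checked numerically. Here's a minimal sketch, using made-up illustrative probabilities for a joint distribution over two hypotheses A and B:

```python
# Sanity-check Bayes' theorem with made-up illustrative numbers.
# Each entry is the probability of one region of the square.
p_joint = {
    (True, True): 0.12,   # P(A and B) — the "orange area"
    (True, False): 0.28,
    (False, True): 0.18,
    (False, False): 0.42,
}

p_A = sum(p for (a, _), p in p_joint.items() if a)  # P(A) = 0.40
p_B = sum(p for (_, b), p in p_joint.items() if b)  # P(B) = 0.30
p_A_and_B = p_joint[(True, True)]                   # P(A ∩ B)

p_A_given_B = p_A_and_B / p_B                       # P(A|B)
p_B_given_A = p_A_and_B / p_A                       # P(B|A)

# Both products equal the joint probability of the orange area...
assert abs(p_A_given_B * p_B - p_B_given_A * p_A) < 1e-12

# ...so Bayes' theorem follows: P(A|B) = P(B|A) * P(A) / P(B)
bayes_rhs = p_B_given_A * p_A / p_B
assert abs(p_A_given_B - bayes_rhs) < 1e-12
print(round(p_A_given_B, 4))  # → 0.4
```

    The two assertions mirror the two steps of the derivation: the products agree because both equal P(A∩B), and dividing through by P(B) yields the theorem.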


Discuss

Continuity

24 ноября, 2025 - 08:46
Published on November 24, 2025 5:46 AM GMT

I left China at the age of four. My memories of those first four years are scattered impressions: a three-leaf clover I chowed down on while Mom’s back was turned, the smell of a revolting herbal remedy, the time an older girl scratched me on the cheek in daycare.

We went back to visit every few years. The summer before college, I visited my birthplace, Chengdu, to see my grandparents. At some point, we had dinner with a larger group of family friends. Two of the children at that party, as it happened, had been my best friends in daycare.

They remembered me. They remembered the daycare we went to, and the street it was on, and the ways our parents were connected. They told me about the group of boys who’d roughly all grown up together: one who excelled in school and was going to Peking University, another who went too deep into League of Legends, another who was currently obsessed with The Three Body Problem. They were so warm, inviting me back home like an old friend who’d always belonged. I was immediately one of the boys - they asked me to translate words they’d heard in American movies, and snickered at the definition of “asshole.” 

To them, I was a thread that had flown off the tapestry of their lives. They picked me right up, dusted me off, and sewed me back in. 

I didn’t remember a damned thing about them – I didn’t even know I had friends in daycare.

After arriving in the US, my family toured the M-states: I moved from Missouri to Maryland at six, and then to Massachusetts at ten or eleven.

Missouri is also barely a splash of impressions: climbing chestnut trees, encountering a proto-psychopath on the schoolbus, sitting for hours “helping” my father fish the Ozarks.

In Maryland I have more substantial memories: holding hands with my best friend William before we learned it was not cool, and then finding out it was not cool. I remember riding my first bike and then having it stolen by the older kids upstairs, and watching my mom sneak out at night to steal it back. I remember the face of the nasty teacher who gave me a C just to put me in my place. 

My memories come alive right around the time William introduced me to Diablo II. Although neither of our parents allowed us to play more than a couple hours a week, we spent many hours theorycrafting and poring over the official strategy guides. For many years after, I’d boot up Diablo just to recapture that time.

With the benefit of hindsight and the theory of spaced repetition, I understand now why I remember so little of those early years, and why only Diablo remains as fresh as yesterday. After I left China, my daycare chums in Chengdu passed the same streets, met the same elders, played with the same classmates month after month, year after year. Their memories of early childhood were reinforced again and again. They could easily triangulate even my minor, brief role in this world. The brain remembers those patterns that are repeated across time.

I had no such luck. Every few years, the world was switched out by an entirely new stage, with an entirely new cast. There was a surjective function from friends I held dear to days for saying goodbye. For others, life was a single, cohesive drama; for me, it was a series of improv scenes. It is no wonder that my memories are so scattered.

Math majors and PhDs often ask me how to decide between academic and industry jobs. Broadly speaking, these conversations have a common dramatic structure: the student lobs a bomb at me in the form of a mad lib:

Compared to academic jobs, industry jobs are 10x easier to find, pay 10x better, demand half the workload and half the red tape, BUT __.

My job in this drama is to defuse the bomb by filling in the blank with a single intangible value or principle - academic freedom, say - so beautiful that it overwhelms all practical considerations and justifies all the tragedy of academic existence. Some students hurl the bomb at me aggressively - in their heart of hearts they are already checked out of the academy and are looking to verify that the ivory tower is full of shit. Others hand me the bomb timidly, because they are romantics and martyrs at heart - with their eyes, they plead with me to half-ass the answer with anything remotely persuasive. They need something sacred to whisper on their lips as they throw themselves onto the cross of the academic job market.

I’ve always disliked this conversation, until now. I finally know how to fill in the blank, at least in a way that would have persuaded my past self:

Compared to academic jobs, industry jobs are 10x easier to find, pay 10x better, demand half the workload and half the red tape, but continuity.

I’ve been starved for continuity most of my life. My family moved when I was four, and then six, and then ten. Then, I went to college, did a PhD, did a postdoc, and finally landed a tenure-track professorship, moving seven times in 31 years. Seven times the stage was reset and the cast replaced.

How many more moves would it be if I went to industry? Everyone is moving, all the time. Startups collapse, or are acquired. Entire organizations are shuffled and reshuffled when new directives are delivered from on high. In many places, the best way to get promoted is to jump ship and be hired at a new level. One day, you're shooting the breeze with the coworker at the next desk over. The next day, the desk is empty.

What do I mean by continuity? The great cathedral of Notre Dame began construction in 1163 and was completed in 1345. Continuity is what I imagine being involved in that project was like: your father, and his father, and so on four generations back, all toiling towards a common cause, a single continuous sacred labor, that ties together every aspect of your life.

I completed my PhD in 2021.

My PhD advisor, Jacob Fox, completed his PhD in 2010. Around half of my research projects come from problems Jacob started thinking about more than a decade ago. I see him practically every year at conferences, workshops, or research visits. He is someone I can trust for advice about anything from career development, to research taste, to advising students. 

Jacob’s PhD advisor, Benny Sudakov, completed his PhD in 1999. Benny is a legendary PhD advisor who has trained and continues to train many outstanding mathematicians. This past summer, I raced Benny in the Random Run, a long-standing tradition of the biennial Random Structures and Algorithms conference. On a standard track, the number of laps in the run is determined by the roll of two dice; the second die is only rolled when the front-runner finishes the first set of laps. In the advisor-student pair category, Benny and his student Aleksa edged out my student Ruben and me for the win. In two years, I hope to be in better shape.

Benny’s PhD advisor, Noga Alon, completed his PhD in 1983. I received Erdős number 2 by spending 2021-2024 as a postdoc working with Noga, who is still sharper than any of us. Together with Joel Spencer, Noga wrote the textbook The Probabilistic Method which I and many others use to train PhD students. Joel has a fun tradition of publishing photos of young children reading The Probabilistic Method on his website. There is a picture of Jacob’s daughter there, as well as one of Noga reading the book to my six-month-old.

This is just one thread of a densely woven tapestry, a community of combinatorialists that traces itself back continuously to the problem-solving circles of Paul Erdős and his university buddies in Budapest. Our story is, I think, not dissimilar to that of the builders of Notre Dame. 

Erdős rolled the dice for the first Random Run in 1983. I pray the dice continue to roll for many years hence.



Discuss

Inkhaven Retrospective

24 ноября, 2025 - 08:19
Published on November 24, 2025 5:19 AM GMT

Here I am on the plane on the way home from Inkhaven. Huge thanks to Ben Pace and the other organizers for inviting me. Lighthaven is a delightful venue and there sure are some brilliant writers taking part in this — both contributing writers and participants. With 40 posts published per day by participants (not counting those by organizers and contributing writers) it feels impossible to highlight samples of them. Fortunately that's been done for me: the Inkhaven spotlight. And just to semi-randomly pick a single post to highlight, for making me laugh out loud, I'll link to one by Rob Miles. There are also myriad poignant, insightful, informative, and otherwise delightful posts. And, sure, plenty that don't quite work yet. But that's the point: force yourself to keep publishing, ready or not, and trust that quality will follow.

Confession: I'm not at all happy with this post. Opening with "here I am on the plane"? Painful. I would've fixed that in an editing pass, but I'm going to leave it because it illustrates two Inkhaven lessons for me. First, that sometimes it's ok to hit publish before something is perfect. And second, that I in particular need to follow the advice (#5 in my collection of Inkhaven tips) to dedicate blocks of time to dumping words onto the page. Editing is separate. If I hadn't started typing "here I am on the plane" then I would've sat there agonizing about a good opener, gotten distracted, and had nothing.

Write, publish, repeat.

I'm still agonizing about whether to commit to continuing to churn out a post every day for the rest of the month now that I've left. I do have a pretty much unlimited number of ideas to write about, even sticking to the theme of writing about writing. Here are half a dozen of them:

  1. Complaining about a list of words I've been curating for which Google's built-in dictionary definitions are garbage.
  2. What a work of art the 1913 edition of Webster's dictionary still is and how to configure a Mac laptop so it pops up those definitions when you hard-press on a word with the touchpad.
  3. Why Overleaf is the bee's knees for technical writing or collaborative writing (and definitely what you want for technical collaborative writing).
  4. More of my favorite word games (does that count as writing about writing? maybe I can find a way to make it count!).
  5. My ever-growing pile of notes about and examples of redefining everyday words as technical jargon ("common knowledge", "real number", "normal distribution") and how bad this is.
  6. Tips for dealing with trolls (beyond not feeding them, which is rules 1 through 17).

The problem is how daunting it feels to do justice to some of those. But that's where the writing tip to First Just Write comes in. It's been half an hour now of writing this thing and I'm starting to think I could bear to hit publish. If I do, I expect it will be the worst post I've published while at Inkhaven. But something has to be my worst post.

Judge for yourself. Here's a recap of everything else I published leading up to and during my Inkhaven stay:

  1. Goodharting and DIY Inkhavens (on the Beeminder blog)
  2. Blogception (on AGI Friday)
  3. Against Powerful Text Editors (mindless, repetitive edits waste less time than it seems and avoiding them is more costly than it seems)
  4. See Your Word Count While You Write (I made a tool, Tallyglot, to see your word count in the LessWrong editor and other places)
  5. Why to Commit to a Writing and Publishing Schedule (it matters for you and for your readers; also covers "how")
  6. Strategically Procrastinate as an Anti-Rabbit-Hole Strategy (aka Just-in-timeboxing)
  7. The Eightfold Path To Enlightened Disagreement (characterize, crux, ITT, steelman, scout-mindset, etc)
  8. Smarmbots, Secret Cyborgs, and Evolving Writing Norms (new rule: no plagiarizing LLMs)
  9. Musk vs McGurk (sensor fusion and self-driving cars)
  10. Mnemonic Exposition (how to name and gender hypothetical characters)
  11. Eat The Richtext (another tool I made for preserving formatting when pasting text)
  12. The Principle of Delayed Commitment (more pro-procrastination propaganda)
  13. Ten Wrong and Dumb Grammar Rules (infinitive splitting, less-vs-fewer, syntactic vs elocutionary punctuation, etc)
  14. Writing Tips from Inkhaven (listicles, curse of knowledge, Hemingway Mode, the out-loud-to-your-friend constraint, etc)
     


Discuss

Androgenetic haploid selection

24 ноября, 2025 - 06:10
Published on November 24, 2025 3:10 AM GMT

Eggs are expensive, sperm are cheap. It’s a fundamental fact of biology . . . for now.

Currently, embryo selection can improve any heritable trait, but the degree of improvement is limited by the number of embryos from which to select. This, in turn, is because eggs are rare.

But what if we could select on sperm instead? We could choose the best sperm from tens or even hundreds of millions, and use that to make an embryo. However, any method that relies on DNA sequencing must destroy the sperm. Sure, you can identify the best one, but that’s of limited value if you can’t use it for fertilizing an egg.

There have been a few ways proposed to get around this:

  1. Nondestructive sperm testing. Technically challenging: sperm DNA is packaged tightly and you would have to partially denature it without killing the cell. Selection based on total DNA content (separating X and Y bearing sperm) is possible but only useful for choosing the sex of the baby. Phenotypic selection (swim rate, etc) is not very useful because sperm phenotypes don’t correlate well with sperm genotypes.
  2. Doing in vitro spermatogenesis, and keeping track of which sperm came from where.[1] There are four sperm produced from each spermatocyte, and three of them could be destructively sequenced to deduce the genotype of the remaining one. Challenging (nobody has done human in vitro spermatogenesis yet) and low throughput.
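The deduction step in option 2 can be made concrete. At any given site, the four haploid products of one spermatocyte together carry exactly two copies of each parental allele (setting aside gene conversion), so three sequenced haplotypes determine the fourth. A minimal sketch, using hypothetical 'M'/'P' (maternal/paternal) haplotype strings:

```python
def deduce_fourth(three_haplotypes):
    """Given three of the four haploid products of one spermatocyte
    (strings over {'M', 'P'}: maternal/paternal allele per site),
    return the fourth. Across all four products, each site carries
    exactly two 'M' and two 'P' alleles, so the remainder is forced."""
    n = len(three_haplotypes[0])
    fourth = []
    for i in range(n):
        observed = [h[i] for h in three_haplotypes]
        fourth.append('M' if observed.count('M') < 2 else 'P')
    return ''.join(fourth)

# Three destructively sequenced products of one (hypothetical) spermatocyte:
print(deduce_fourth(['MMPP', 'MPPM', 'PMMP']))  # -> PPMM
```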

Here, I propose a different approach, which I call androgenetic haploid selection.

Androgenetic haploid selection
  1. Make a bunch of eggs. The chromosomes and imprinting don’t have to be correct (we’ll get rid of them in the next step), so even a low quality in vitro oogenesis method would work. Something like Hamazaki’s approach would work well here.
  2. Remove the chromosomes from the eggs. This can be done at large scale through centrifugation: spin the eggs hard enough, and the DNA will fall out.
  3. Add an individual sperm to each egg and establish haploid stem cell lines. This recent paper is an example of doing this for cows and sheep. These cell lines are called “androgenetic” and retain the DNA imprinting patterns of sperm.
    1. Notably, Y-bearing sperm cannot make viable haploid stem cell lines because many essential genes are on the X chromosome.
  4. Sequence many cell lines and choose the best one. Because the cells divide, it’s possible to destructively sequence some of the cells from each line without destroying all the cells.
  5. Collect eggs the normal way, and “fertilize” them with nuclei from your chosen androgenetic cell line.
    1. Optionally: perform additional selection based on the embryo genome.
Comments on this approach
  1. This method could give high genetic optimization for the paternal half of the genome. At scale, I estimate an overall $200/sample cost for cell line establishment and sequencing, so taking the best of 100 cell lines could be performed for around the cost of a normal IVF cycle (~$20,000). For a perfectly heritable trait with a perfect polygenic score, this would give (+2.5 SD * 0.5) = +1.25 SD from sperm selection alone. (Gains will be lower for less heritable traits and less accurate predictors.)
  2. This would only work for daughters (sorry, Elon!). Although genetic engineering could make XX males by adding SRY, this would probably not be a good idea.
  3. This would make even a low-quality in vitro oogenesis method valuable. More broadly, it’s not necessarily required that the recipient cells be eggs per se, as long as they express the correct factors for zygotic genome activation.
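The back-of-envelope in point 1 can be checked with a small Monte Carlo sketch (an idealized, hypothetical model: polygenic scores are standard normal, heritability and predictor accuracy are perfect):

```python
import random

def expected_best_of(n, trials=10000, rng=random.Random(0)):
    """Estimate the expected maximum of n standard-normal draws."""
    return sum(max(rng.gauss(0, 1) for _ in range(n))
               for _ in range(trials)) / trials

best = expected_best_of(100)   # roughly +2.5 SD for the best of 100 lines
paternal_gain = best * 0.5     # only the paternal half of the genome is selected
print(f"best of 100: {best:.2f} SD; gain via paternal half: {paternal_gain:.2f} SD")
```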
  1. ^

    This would have to be done at the spermatid stage, before the sperm swim away.



Discuss

Formality

24 ноября, 2025 - 05:19
Published on November 24, 2025 2:19 AM GMT

In Market Logic (part 1, part 2) I investigated what logic and theory of uncertainty naturally emerges from a Garrabrant-induction-like setup if it isn't rigged towards classical logic and classical probability theory. However, I only dealt with opaque "market goods" which are not composed of parts. Of course, the derivatives I constructed have structure, but derivatives are the analogue of logically definable things: they only take the meaning of the underlying market goods and "remix" that meaning. As Sam mentioned in Condensation, postulating a latent variable may involve expanding one's sense of what is; expanding the set of possible worlds, not only defining a new random variable on the same outcome space.

Simply put, I want a theory of how vague, ill-defined, messy concepts relate to clean, logical, well-defined, crisp concepts. Logic is already well-defined, so it doesn't suit the purpose.[1]

So, let's suppose that market goods are identified with sequences of symbols, which I'll call strings. We know the alphabet, but we don't a priori have words and grammar. We only know these market goods by their names; we don't a priori know what they refer to.

This is going to be incredibly sketchy, by the way. It's a speculative idea I want to spend more time working out properly.

So each sequence of symbols is a market good. We want to figure out how to parse the strings into something meaningful. Recall my earlier trick of identifying market trades with inference. How can we analyze patterns in the market trades, to help us understand strings as structured claims?

Well, reasoning on structured claims often involves substitution rules. We're looking at trades moving money from one string to another as edits. Patterns in these edits across many sentence-pairs indicate substitution rules which the market strongly endorses. We can look for high-wealth traders who enforce given substitution rules, or we can look for influential traders who do the same (IE might be low-wealth but enforce their will on the market effectively, don't get traded against). We can look at substitution rules which the market endorses in the limit (constraint gets violated less over time). Perhaps there are other ways to look at this as well.

In any case, somehow we're examining the substitution rules endorsed by the market.
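As a toy illustration (with hypothetical trade data; the market mechanics are heavily simplified), here is a sketch that treats each trade as a single-token edit, keeps the substitution rules that trades repeatedly endorse, and notes which are endorsed in both directions:

```python
from collections import Counter

def edit_rule(src, dst):
    """Return the (old, new) token substitution turning src into dst,
    or None if the strings differ by other than exactly one token."""
    a, b = src.split(), dst.split()
    if len(a) != len(b):
        return None
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return diffs[0] if len(diffs) == 1 else None

def endorsed_rules(trades, min_count=2):
    """Rules seen at least min_count times, tagged by directionality."""
    counts = Counter()
    for src, dst in trades:
        rule = edit_rule(src, dst)
        if rule:
            counts[rule] += 1
    rules = {r for r, c in counts.items() if c >= min_count}
    return {r: 'bidirectional' if (r[1], r[0]) in rules else 'one-directional'
            for r in rules}

# Hypothetical trades: money moved from the first string to the second.
trades = [
    ("this is apple", "this is fruit"),
    ("mine is apple", "mine is fruit"),
    ("sum is 1+1", "sum is 2"),
    ("sum is 2", "sum is 1+1"),
    ("total is 2", "total is 1+1"),
    ("total is 1+1", "total is 2"),
]
print(endorsed_rules(trades))
```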

First, there are equational substitutions, which are bidirectional: synonym relationships.

Then there's one-directional substitutions. There's an important nuance here: in logic, there are negative contexts and positive contexts. A positive context is a place in a larger expression where strengthening the term strengthens the whole expression. "Stronger" in logic means more specific, claims more, rules out more worlds. So, for example, "If I left the yard, I could find my way back to the house" is a stronger claim than "If I left the yard, I could find my way back to the yard" since one could in theory find one's way back to the yard without being able to find the house, but not vice versa. In "If A then B" statements, B is a positive context and A is a negative context. "If I left the yard, I could find my way back to the house" is a weaker claim than "If I left the house, I could find my way back to the house", because it has the stronger premise.

Negation switches us between positive and negative contexts. "This is not an apple" is a weaker claim than "This is not a fruit". This example also illustrates that substitution can make sense on noun phrases, not just sub-sentences; noun phrases can be weaker or stronger even though they aren't claims. Bidirectional substitution subsumes different types of equality, at least = (noun equivalence) and ↔ (claim equivalence). One-directional substitution subsumes different types as well, at least ⊆ (set inclusion) and → (logical implication). So, similarly, our concept of negation here combines set-complement with claim negation.

Sometimes, substitution rules are highly context-free. For example, 2=1+1, so anywhere 2 occurs in a mathematical equation or formula, we can substitute 1+1 while preserving the truth/meaning of the claim/expression.

Other times, substitutions are highly context-dependent. For example, a dollhouse chair is a type of chair, but it isn't good for sitting in.

A transparent context is one such as mathematical equations/formulas, where substitution rules apply. Such a context is also sometimes called referentially transparent. An opaque context is one where things are context-sensitive, such as natural language; you can't just apply substitution rules. This concept of transparent context is shared between philosophy of language, philosophy of mind, linguistics, logic, and the study of programming languages. One advantage claimed for functional programming languages is their referential transparency: an expression evaluates exactly the same way, no matter what context it is evaluated in. Languages with side-effects don't have this property.
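A tiny illustration of the distinction (in Python, whose expressions are not referentially transparent in general): replacing a pure call by its value changes nothing, while replacing a side-effecting call by its value changes observable behavior.

```python
def pure_double(x):
    return 2 * x

log = []
def logging_double(x):
    log.append(x)   # side effect: the call leaves a trace
    return 2 * x

# Transparent: the call can be replaced by its value anywhere.
assert pure_double(3) + pure_double(3) == 6 + 6

# Opaque: substituting the value 6 for logging_double(3) changes
# the program's observable behavior (the log differs).
log.clear()
with_calls = (logging_double(3) + logging_double(3), list(log))
log.clear()
with_values = (6 + 6, list(log))
print(with_calls)   # (12, [3, 3])
print(with_values)  # (12, [])
```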

So, in our market on strings, we can examine where substitution rules apply to find transparent contexts. I think a transparent context would be characterized as something like:

  1. A method for detecting when we're in that context. This might itself be very context-sensitive, EG, it requires informal skill to detect when a string of symbols is representing formal math in a transparent way.[2]
  2. A set of substitution rules which are valid for reasoning in that context. This may involve a grammar for parsing expressions in the context, so that we know how to parse into terms that can be substituted.

The same could characterize an opaque context, but the substitution rules for the transparent context would depend only on classifying sub-contexts into "positive" or "negative" contexts.

There's nothing inherently wrong with an opaque context; I'm not about to call for us to all abandon natural languages and learn Lojban. Even logic includes non-transparent contexts, such as modal operators. Even functional programming languages have quoted strings (which are an opaque context).

What I do want to claim, perhaps, is that you don't really understand something unless you can translate it into a transparent-context description.

This is similar to claims such as "you don't understand something unless you can program it" or "you don't understand something unless you can write it down mathematically", but significantly generalized.

Going back to the market on strings, I'm saying we could define some formal metric for how opaque/transparent a string or substring is, but more opaque contexts aren't inherently meaningless. If the market is confident that a string is equivalent (inter-tradeable) with some highly transparent string, then we might say "It isn't transparent, but it is interpretable".

Let's consider ways this can fail. 

There's the lesser sin, ambiguity. This manifests as multiple partial translations into transparent contexts. (This is itself an ambiguous description; the formal details need to be hashed out.) The more ambiguous, the worse.

(Note that I'm distinguishing this from vagueness, which can be perfectly transparent. Ambiguity creates a situation where we are not sure which substitution rules to apply to a term, because it has several possible meanings. On the other hand, the theory allows concepts to be fundamentally vague, with no ambiguity. I'm not married to this distinction but it does seem to fall out of the math as I'm imagining it.)

There could be a greater sin, where there are no candidate translations into transparent contexts. This seems to me like a deeper sort of meaninglessness.

There could also be other ways that interpretations into a transparent context are better or worse. They could reveal more or less of the structure of the claim. 

I could be wrong about this whole thesis. Maybe there can be understanding without any interpretation into a transparent context. For example, if you can "explain like I'm five" then this is often taken to indicate a strong understanding of an idea, even though five-year-olds are not a transparent context. Perhaps any kind of translation of an idea is some evidence for understanding, and the more translating you can do, the better you understand.

Still, it seems to me that there is something special in being able to translate to a transparent context. If somehow I knew that a concept could not be represented in a transparent way, I would take that as significant evidence that it is nonsense, at least. It is tempting to say it is definitive evidence, even.

This seems to have some connections to my idea of objectivity emerging as third-person-perspectives get constructed, creating a shared map which we can translate all our first-person-perspectives into in order to efficiently share information.

  1. ^

    You might object that logic can work fine as a meta-theory; that the syntactic operations of the informal ought to be definable precisely in principle, EG by simulating the brain. I agree with this sentiment, but I am here trying to capture the semantics of informality. The problem of semantics, in my view, is the problem of relating syntactic manipulations (the physical processes in the brain, the computations of an artificial neural network) with semantic ones (beliefs, goals, etc). Hence, I can't assume a nice interpretable syntax like logic from the beginning.

  2. ^

    This is actually rare: if I say

    ... the idea is similar to how (a−b)(a+b) = a²−b²

    then I'm probably making some syntactic point, which doesn't get preserved under substitution by the usual mathematical equivalences. Perhaps the point can be understood in a weaker transparent context, where algebraic manipulations are not valid substitutions, but there are still some valid substitutions?



Discuss

Why Talk to Journalists

November 24, 2025 - 05:07
Published on November 24, 2025 2:07 AM GMT

Sources' motivations for talking to journalists are a bit of a puzzle. On the one hand, it's helpful for journalists to work out what those motivations are, to keep sources invested in the relationship. On the other hand, sources behave in perplexing ways, for instance sharing information against their own interests, so it's often best to treat their psychology as unknowable. 

Reflecting on sources' willingness to share compromising information, one mystified AI journalist told me last weekend, "no reasonable person would do this."

But to the extent I can divine their motivations, here are some reasons I think people talk to me at work:

  • Bringing attention and legitimacy to themselves and their work
  • Trading tips and gossip
  • Steering the discourse in favorable ways
    • E.g. Slandering your enemies and competitors
  • Feeling in control of your life
    • E.g. an employee might want to leak information to feel power over their boss
  • Therapy
  • A sense of obligation
    • E.g. to educate the public
    • E.g. to be polite when someone calls you for help
  • It feels high-status

Most of these are not particularly inspiring, but if you work in AI safety, I want to appeal to your theory of change. If your theory of change relies on getting companies, policymakers, or the public to do something about AI, the media can be very helpful to you. The media is able to inform those groups about the actions you would have them take and steer them toward those decisions. 

For example, news stories about GPT-4o and AI psychosis reach the public, policymakers, OpenAI investors, and OpenAI employees. Pressure from these groups can shape the company's incentives, for instance to encourage changes to OpenAI's safety practices.

More generally, talking to journalists can help raise the sanity waterline for the public conversation about AI risks.

If you are an employee at an AI lab and you could see yourself whistleblowing some day, I think it is extra valuable for you to feel comfortable talking to journalists. In my experience, safety-minded people sometimes use the possibility of being a whistleblower to license working at the labs. But in practice, whistleblowing is very difficult (a subject for a future post). If you do manage to overcome the many obstacles in your way and try to whistleblow, it would be much easier if you're not calling a journalist for the first time. Instead, get some low-stakes practice in now and establish a relationship with a journalist, so you have one fewer excuse if the time comes.

Maybe news articles offend your epistemic sensibilities because you've experienced Gell-Mann amnesia and have read too many sloppy articles. Unfortunately, I don't think we can afford to be so picky. If you don't talk to journalists, you cede the discourse to the least scrupulous sources. In this case, that's often corporate PR people at the labs, e/acc zealots, and David Sacks types. They are happy to plant misleading stories that make the safety community look bad. I think you can engage with journalists while holding to the rationalist principle of only saying true things.

It's pretty easy to steer articles. It often only takes one quote to connect an article on AI to existential risks, when counterfactually, the journalist wouldn't have realized the connection or had the authority to write it in their own voice. For example, take this recent CNN article on a ChatGPT suicide. Thanks to one anonymous ex-OpenAI employee, the article connected the suicide to the bigger safety picture:

One former OpenAI employee, who spoke with CNN on the condition of anonymity out of fear of retaliation, said “the race is incredibly intense,” explaining that the top AI companies are engaged in a constant tug-of-war for relevance. “I think they’re all rushing as fast as they can to get stuff out.”

It's that easy!

Overall, it sounds disingenuous to me when people in AI don't talk to journalists because they dislike the quality of AI journalism. You can change that!

Which came first?

If you appreciate initiatives like Tarbell that train journalists to better understand AI, you should really like talking to journalists yourself! Getting people who are already working in AI safety to talk to journalists is even more cost-effective and scalable. Plus, you will get to steer the discourse according to your specific threat models and will enjoy the fast feedback of seeing your views appear in print.

Here are some genres of safety-relevant stories that you might want to contribute to:

  • Exposing wrongdoing at AI companies
    • E.g. whistleblowing about companies violating their RSPs
  • Early real-world examples of risks (warning shots)
    • E.g. the Las Vegas bomber who got advice from ChatGPT
  • Connecting news to safety topics
    • E.g. explaining why cutting CAISI would be bad
  • Highlighting safety research
    • E.g. explaining how scheming evals work
  • Explainers about AI concepts
    • These generally improve the public's AI literacy

In practice, articles tend to cut across several of these categories. Op-eds also deserve an honorable mention: they don't require talking to journalists in the sense I'm writing about here, but some of the best articles on AI risks have been opinion pieces.

Quick Defenses

I'll briefly preempt a common objection: you're worried that journalists are going to misquote you or take you out of context. 

First, I think that's rarer than you might expect, in part because you've probably over-indexed on the Cade Metz incident. Plus, journalists hate being wrong and try to get multiple sources, as I wrote in Read More News.

Second, you can seek out experienced beat reporters who will understand you, rather than junior ones.

Third and most importantly, even if you do get misquoted, it doesn't mean talking to the journalist was net-negative, even for that particular piece and even ex-post. As annoying as it is, it might be outweighed by the value of steering the article in positive ways.



Discuss

I made a tool for learning absolute pitch as an adult

November 24, 2025 - 04:09
Published on November 24, 2025 1:09 AM GMT

I read a study that claims to have debunked the myth that only children can learn absolute pitch: its 12 musicians, none of whom previously had absolute pitch, improved significantly at it.

On average, they spent 21.4 hours over 8 weeks, making 15,327 guesses. All learned to name at least 3 pitches with >90% accuracy, having to respond in under 2.028 seconds; some learned all 12. The average was 7.08 pitches learned.

Notably, the results on new instruments were worse than on the instruments they were trained on, suggesting people learn, to some extent, to rely on cues from the specifics of the training instrument's timbre:

 

The way it works is simply by having very short feedback loops. You hear a sound (played on a piano in the study) and have 1-2 seconds to make a guess for what pitch it is.

You learn new pitches gradually: first, you need to identify one (and press keys for whether it’s that pitch or some other pitch), and then, more pitches are gradually added.

In the study, before testing without feedback, to reset relative pitch memorization, a Shepard tone is played for 20 seconds. (It’s an auditory illusion that makes you feel like the pitch is perpetually getting lower or higher.)
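As an aside, a Shepard-style glide is straightforward to synthesize. Here is a rough numpy sketch with assumed parameters (sample rate, octave count, envelope width — not the study's exact stimulus): octave-spaced partials glide upward one octave over the clip under a bell-shaped loudness window, so looping the clip sounds like an endless rise.

```python
import numpy as np

def shepard_glide(duration=2.0, sr=44100, n_octaves=10, base=20.0):
    """Rough Shepard-tone sketch: partials spaced an octave apart glide up
    one octave over the clip, with a Gaussian loudness window over
    log-frequency so the lowest and highest partials fade in/out."""
    t = np.arange(int(sr * duration)) / sr
    shift = t / duration  # 0 -> 1 octave of upward glide over the clip
    out = np.zeros_like(t)
    for k in range(n_octaves):
        pos = k + shift  # position in log-frequency (octave) space
        # phase = 2*pi * integral of base * 2**(k + t/T) dt, computed in closed form
        phase = 2 * np.pi * base * 2.0**k * duration / np.log(2) * (2.0**shift - 1)
        amp = np.exp(-0.5 * ((pos - n_octaves / 2) / 1.5) ** 2)
        out += amp * np.sin(phase)
    return out / np.max(np.abs(out))
```

The highest partial stays under the Nyquist frequency (20 · 2¹⁰ = 20480 Hz < 22050 Hz), and because each partial ends the clip where its neighbor started, the loop is close to seamless.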

I asked an LLM to make a web app version of it. I asked it to additionally use the Shepard tone more often for a shorter amount.

I also asked it to add colors to maybe produce some amount of synesthesia. I think there’s research that shows that synesthesia and absolute pitch correlate; I don’t know whether it can be induced to some extent, or would only be helpful for some people, but it seemed good to add in case it works. Later, someone on Twitter told me that they were taught the tones of Mandarin using colored cards, and it worked for them. People who experience synesthesia to at least some extent might have an easier time learning more pitches, though I’m not sure if it would be helpful to others.

I tried to minimize the time between recognition and feedback, so the web app reacts to the starts of the key presses, clicks, and touches, not to their ends; and immediately shows whether you were correct, and what was correct.

Finally, I added more instruments than just piano, hopefully, for better generalization.

With the first version, I posted it on Twitter:

 

It got a surprisingly high amount of engagement, which made the post a bit unfortunate in retrospect, because I made it before I actually fixed the bugs produced by the LLMs (now all fixed); on the other hand, the engagement meant that now I actually had to fix the bugs for people to be able to use the tool.

Two people shared that they have already learned to identify three pitches!

 

I now want to do experiments with a bunch of things (including the order of pitches presented: can it improve the learning curve and allow people to learn more than three more easily?), to collect the data on people’s progress, and maybe ask them questions (like whether they’ve played music or sung before).

Would appreciate recommendations for how to collect the data well without having to do anything complicated to manage it.

Would also appreciate more ideas for how to improve it for better pitch learning.

If you want to try to acquire perfect pitch, it might take you quite some time, but try it:

perfect-pitch-trainer.pages.dev



Discuss

"Self-esteem" is distortionary

November 24, 2025 - 02:59
Published on November 23, 2025 11:59 PM GMT

A friend asked me, "what's the right amount of self-esteem to have?" Too little, and you're ineffectual. Too much, and you get cocky. So how do you choose the right balance? 

I replied that this is a trick question. 

People with low self-esteem have thoughts like "I'm a loser", "my IQ is too low to succeed", "no one could love someone as fat as me". Their problem is not quite that they've got inaccurate beliefs. They may in fact be a loser. Rather, their problem is that they've attached their identity to concepts that limit their action space. 

For instance, the notion of low IQ. This is a construct that's predictive at a population level, but it doesn't give you much predictive power on an individual level unless it's the only thing you know about a person. But you can rapidly accumulate info about someone, or yourself, that outweighs the info expressed by "your IQ is 101". E.g. if you want to know someone's test scores, you'll do a lot better by using their scores on mock exams than by using their IQ. 

Which means that someone who says "I can't fix my car because I've got a low IQ" isn't actually making full use of the info available to them. They're relying on a sticky prior. What they should actually be doing if they care about fixing their car is asking "what's stopping me from fixing it?" and checking if solving that problem is worth the costs compared to paying a mechanic. The cost may be large. They may have to put in dozens of hours of work before they understand cars well enough to fix their problem without paying anyone. But they could do it.

So the issue is that the belief about "low IQ" has led to imaginary walls around what can be done that do not actually reflect reality. 

In other words, low self-esteem turns a bump in the road into a cliff of seemingly infinite height, cutting off an entire avenue of approach. It reduces your sense of what is possible, and from the inside, it feels like you've got less free will.

What is the solution? Knock down the walls. 

In day to day life, we have to simplify the action space because we are computationally bounded systems. We introduce simplifications for good reasons, and for bad reasons. That's normal. Things get problematic when those simplifications restrict the space until there is no good action left. Then, the appropriate reaction is to relax the constraints we impose on ourselves, test if the relaxation is valid, and take the best action we've got left. If we were able to do this reliably, we would find ourselves doing the best we can, and low self-esteem would be a non-issue. 



Discuss

Rationalist Techno-Buddhist Jargon 1.0

November 24, 2025 - 02:39
Published on November 23, 2025 11:39 PM GMT

Scott Alexander called me a rationalist techno-Buddhist on his blog. Since Scott Alexander is a rationalist of the highest status, that wording constitutes rationalist dharma transmission. I therefore consider myself authorized to speak authoritatively on the topic of rationalist techno-Buddhism.

Why am I writing a glossary? Because there are 14 different kinds of Buddhism and they all use words to mean slightly different things. This is a problem. I hope that this document will end 2,500 years of sectarianism, such that all of us finally communicate perfectly with no misunderstandings.

But just in case there exist one or more people on the Internet who disagree with some aspect of this document, I have included a "1.0" in this document's title. You are permitted to fork it into 1.1 or 1.<your-name-here> or 2.this.is.why.lsusr.is.wrong.about.everything. Now, if you write about Buddhism, then instead of tediously defining all the terms you're using, you can just say "This uses Rationalist Techno-Buddhist Jargon 2.this.is.why.lsusr.is.wrong.about.everything.17.2" and get back to arguing online, sitting in silence, or whatever else it is you do to make the world a better place.

This list is ordered such that you can read it beginning-to-end without having to jump forward for a definition.

Warning

This document may be cognitohazardous to some people. Proceed at your own risk. Thank you Iraneth for feedback on an early draft.

Glossary

pragmatic dharma. A loosely-connected movement, mostly Western lay practitioners, focused on reproducible meditative methods and transparency about experiences. This differs from traditional Buddhism by not appealing to traditional religious authority.

rationalist techno-Buddhism (RTB). A movement within the pragmatic dharma that is trying to create cybernetic models for why and how this stuff works.

qualia. Subjective first-person experience.

consciousness. The dynamic field in which qualia arise and are perceived.

attention. The part of your consciousness you are paying attention to. Attention can be in only one place at a time.

concentration. When a person stabilizes their attention on a target, e.g. the breath. Strong concentration states elicit altered states of consciousness. Concentration is a skill that can be improved with practice.

kasina. Meditation using a visual target instead of the breath.

altered state (of consciousness). A temporary non-normative state of consciousness, usually caused by strong concentration.

access concentration. The first non-normative altered states of consciousness, through which all other altered states are entered. Access concentration is when your attention stabilizes on its target. For example, if you are meditating on your breath, then access concentration is when your attention stabilizes on your breath.

jhana. An altered state of consciousness characterized by deep meditative absorption. There are 8 jhanas. Jhanas are used in Theravada practice.

nirodha-samapatti. An altered state beyond the 8 jhanas at which all perception ceases.

mushin. A state of unobstructed action without deliberative thought. Mushin starts out as an altered state, but eventually it turns into an altered trait.

nonduality. An altered state of consciousness without distinction between self (homunculus) and other.

duality. Normative non-nonduality.

homunculus. Physically-speaking, your field of consciousness is a real-time generative model created by your brain. Inside of this model, some elements are labelled "self" and constitute your homunculus.

generative model. See wikipedia.

raw sensory inputs. The signals going into the generative model. This probably includes preprocessed data from e.g. your brainstem. What matters is that this data is raw from the perspective of the generative model in your brain.

altered trait. A permanent change to subjective experience. In the context of RTB, altered traits are caused by meditation.

ego death. An altered trait where the homunculus in your brain ceases to exist. [[1]]

fabrication. When the generative model in your brain creates an object in consciousness to reduce predictive error, usually in an attempt to simulate external reality. All conscious experiences are fabricated, but not all fabrications are experienced consciously. You can think of your brain as a video game rendering engine. Fabrication is your brain rendering physical reality in its simulated mirror world.

rendering. Synonym for fabrication.

encapsulation layer. When a fabricated element in your consciousness is so sticky that it is never not fabricated. It is difficult for normative consciousness to directly perceive that encapsulation layers are fabricated. Encapsulation layers feel like raw inputs until you pay close enough attention to them.

chronic fabrication. Synonym for "encapsulation layer".

non-recursive encapsulation layer. A fabrication that summarizes incoming raw sense data, thereby blocking direct conscious (attentive) access to the perception of that raw sense data. Examples of non-recursive encapsulation layers include non-local space and non-local time.

non-local space. Normative perception of space as a gigantic world far beyond your immediate environment.

local space. Perception of space after dissolution of space.

non-local time. Normative perception of time.

local time. Perception of time after dissolution of time. Eternal present.

recursive encapsulation layer. A fabrication created to block a problematic feedback loop caused by self-reference. Ultimately, recursive encapsulation layers are caused by an interaction between the generative algorithm in your brain and the reinforcement learning algorithm in your brain. Examples of recursive encapsulation layers include self/other duality, desire, pain-as-suffering, and willful volition. See [Intuitive self-models] 6. Awakening / Enlightenment for further explanation.

willful volition. The recursive encapsulation layer that is misinterpreted as free will.

acute encapsulation. A non-chronic encapsulation algorithm that doesn't congeal into a permanent element of perceptual reality. Encapsulation functions are non-chronic because they appear only in response to unpleasant stimuli. Pain-as-suffering is an acute encapsulation function, because it doesn't drag down your hedonic baseline.

chronic encapsulation layer. An encapsulation layer that is so stable, it is incorrectly perceived as raw input data to your field of consciousness. For people who don't understand conceptually that everything you perceive is part of a simulation, chronic recursive encapsulation layers are incorrectly understood to be elements of objective physical reality. Chronic encapsulation layers cause chronic suffering.

insight. An abstract concept measuring the cumulative effects on your brain when you pay attention to fabrications in your consciousness. The word "insight" lossily and pragmatically projects these effects into a single dimension. Accumulating insight eventually unsticks encapsulation layers, and then defabricates them.

dissolution. Permanent defabrication. When the defabrication of an encapsulation becomes a person's default mind state. Non-permanent defabrication often precedes permanent defabrication.

integration. Dealing with the aftermath after an encapsulation layer has been dissolved. Fabrications are often load-bearing. Dissolving fabrications therefore often removes load-bearing components of a person's consciousness. After this, the person must learn new, healthier cognitive habits. This process is called integration.

vipassana sickness. Mental destabilization from too much insight too quickly with insufficient integration. In extreme cases vipassana sickness can cause psychosis (or worse, because unexpected psychosis can cause accidental death), especially when paired with sleep deprivation. This is similar to how people on an LSD trip can think "cars aren't real" and go wandering into traffic if unsupervised.

dissolution. A permanent shift (altered trait) from fabrication to non-fabrication. All dissolutions cause permanent reductions in chronic suffering.

dissolution of self. Synonym for ego death.

dissolution of desire. An altered trait where your brain's reinforcement learning algorithm is no longer abstracted into desire-as-suffering.

dissolution of space. An altered trait where you no longer feel like a small person walking around a gigantic world and your brain instead renders just your local, immediate environment. When this happens it stops feeling like your body is walking around a fixed world, and more like the world is moving while your body remains stationary.

dissolution of time. An altered trait where past and future are defabricated such that you live in local time.

suffering. Absolute-zero-based suffering. Normative models of consciousness have positive qualia (pleasure) and negative qualia (suffering). RTB instead uses a model based on an absolute zero of suffering. The normative model is like Celsius or Fahrenheit, whereas RTB's model is more like the Kelvin scale. Pleasure is a decrease in suffering, the same way cold is, thermodynamically speaking, the removal of heat. Heat is fundamental; cold is not. Similarly, suffering is fundamental in a way that pleasure is not.

chronic suffering. Suffering produced by a chronic encapsulation layer. Normative levels of suffering have a floor produced by the chronic suffering induced by self, willful volition, non-local space, non-local time, etc.

hedonic baseline. A person's level of suffering when acute suffering is removed, leaving only chronic suffering.

enlightenment. Absolute zero chronic suffering. It may be physically impossible for human minds to reach such a state while alive and conscious. Absolute zero is still useful as a reference point or limit point. It's like a Carnot engine.

pleasure. An acute stimulus that temporarily reduces a person's suffering. Normative people can dive below their hedonic baseline temporarily, and conceptualize such dives as positive-valence "pleasure". Lowering the floor itself requires that chronic encapsulation layers be dissolved. When a person's hedonic baseline drops, stimuli that used to be pleasurable become unpleasant, because they feel better than the previous hedonic baseline, but worse than the new one.

jhana junkie. A person who does jhanic practice without accumulating insight. Jhana junkies get stuck on the path to awakening, but being a jhana junkie is not dangerous the way vipassana sickness is dangerous.

awakening. Dissolution of a chronic fabrication. Awakenings tend to have a 1-to-1 correspondence with completed insight cycles.

insight cycle. A discrete cycle of three phases: concentration, insight and integration. In the concentration phase you cultivate concentrative skill. In the insight phase, you penetrate an encapsulation layer. Finally, in the integration phase, you deal with the fallout of blowing up that encapsulation layer. It takes effort to get to your first insight cycle, but after your first insight cycle, there's no stopping the process. Insight cycles will keep coming for years, whether you want them to or not. That's because chronic suffering is an obstacle to concentration. Completion of an insight cycle thereby improves your concentration, thus making your next insight cycle easier. This is a chain reaction. Your fabrications are like a woven fabric with a warp and a weft. If you leave the whole thing alone then it will stay intact. Your first insight cycle cuts the fabric and yanks on the weft. If you continue pulling on the weft then it'll unwind faster, but the fabric will continue to fall apart whether or not you pull on the weft. This is an old Zen saying "Better not to start. Once started, better to finish."

knowledge of suffering. An early phase in an insight cycle where you notice that your mind has been doing something stupid and unpleasant for longer than you can remember.

dark night. The phase of an insight cycle that takes place immediately after knowledge of suffering. Encapsulation layers exist to shield you from unpleasant perceptions. When you dissolve an encapsulation layer to get knowledge of suffering, you remove that shield, and all of the stuff it was protecting you from enters attention-accessible consciousness. This can be very unpleasant. Some people can cycle through many dark nights before landing stream entry.

hell realm. When you're stuck in a dark night. A person perceives what their consciousness is doing wrong (gets knowledge of suffering), but doesn't have the ability to fix it yet. I suspect that LSD-induced hell realms are particularly difficult to escape, because they're like taking a helicopter to the top of Mt Everest without learning mountaineering first.

stream entry. Ambiguously refers to the successful completion of your first insight cycle and/or your first awakening. It is customary to wait 1 year plus 1 day after awakening before claiming stream entry because ① it ensures you are experiencing an altered trait, not just an altered state, and ② it ensures you have completed the integration part of the insight cycle, thereby satisfying both definitions. During this time you should not make any big unilateral life decisions more irreversible than going vegan [[2]] . Stream entry typically reduces chronic suffering by at least 90%.

stream entry mania. The immediate aftermath of stream entry often produces a manic-like state. For this reason, it is recommended that you not post anything on social media for a few months after stream entry. The cooling-off period is even longer for posts related to spirituality. Instead, you should talk to a trusted spiritual advisor. It is best if you establish a relationship with this person before you hit stream entry.

kensho. A glimpse of nonduality (or a similar non-encapsulation) via a transient state, but one which leaves lasting insight. Kensho precedes stream entry.

Cthulhu R'lyeh wgah'nagl fhtagn. Cthulhu waits dreaming in R’lyeh.

  1. In RTB, ego death refers to an altered trait. Confusingly, LSD induces an altered state of consciousness where the ego is not present. LSD trippers usually refer to this state as "ego death", whereas RTBs refer to it as a nondual state, since the altered state is temporary and the ego reappears after the LSD trip is over. ↩︎

  2. If you do go vegan, make sure you take a multivitamin so you don't get brain damage. ↩︎



Discuss

Finding the uncertainty vector in GPT2-scale transformers

November 24, 2025 - 02:34
Published on November 23, 2025 11:34 PM GMT

In this post I explore a phenomenon in LLMs where the training process naturally consolidates information into a highly interpretable structure in the residual stream, through a positive feedback loop from a small variation at initialization. I start with a toy example and work up to GPT2 scale, showing animations of how weights and activations evolve over training. I assume familiarity with the transformer architecture.

The exploratory tone of this post will likely lead to more questions than answers. The intended audience is people hoping to learn more about transformer internals and their dynamics over training. The motivating question is "What is going on in this GPT2-scale model? These charts look incredibly weird."

The dimensions in the residual stream are often thought of as an uninterpretable arbitrary rotation of the feature space, since the standard transformer does not have an operation that makes it a privileged basis. Yet, the behavior above for dimension 275 is remarkably distinct. These charts show the evolution of three dynamics over the course of 600 training steps as the model progresses from 10.8 CE loss to 3.6 CE loss:

  1. How does the language model head, which is of shape [d_model, d_vocab], update for [275, :]? This means if the model blocks add to dimension 275 in the residual stream, what change does that cause to the prediction of each token in the vocabulary?
  2. How do 3 specific neurons in the final MLP evolve? Each neuron multiplies its activation by an output vector and pushes in that direction. The first two neurons push in very specific dimensions, notably dimension 275. The third is included as a normal neuron baseline.
  3. What is the activation distribution (sampling 60k tokens) of the residual stream right before getting multiplied by the lm_head for dimension 275?

The short answer is that dim 275 in the residual stream is functioning as an uncertainty vector. If the model puts 100% of its weight into 275, it will output roughly the naïve unigram distribution. Given the vector is normed, 100% of weight corresponds to √768 ≈ 27.7 in the last chart above. MLP neurons then have a handle to perform several useful actions:

  1. If a neuron fires on a very particular fact, it can add to dim 275 (make less negative) to indicate certainty.
  2. If a neuron fires on ambiguous contexts, it can subtract from dim 275 (make more negative) to indicate uncertainty.

Part of why this occurs is because this version of GPT2 does not have a bias in the lm_head. A bias in the lm_head will directly model the unigram distribution, which decreases the gradient pressure for this dynamic to develop in the residual stream.
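A minimal numeric sketch of this mechanism, with hypothetical stand-in weights (a toy vocabulary size and a random lm_head, not the actual GPT2 checkpoint): if one row of a bias-free lm_head stores a scaled log-unigram vector, then a normed residual vector pointing entirely along that dimension reproduces the unigram distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_vocab = 768, 1000  # toy vocab size for illustration

# Hypothetical stand-in weights: suppose row 275 of the (bias-free) lm_head
# holds a scaled copy of the log-unigram distribution, as described above.
unigram = rng.dirichlet(np.full(d_vocab, 0.1))
W_U = rng.normal(0.0, 0.02, size=(d_model, d_vocab))
W_U[275] = np.log(unigram) / np.sqrt(d_model)

def predict(resid):
    logits = resid @ W_U  # no bias term in this GPT2 variant
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Put 100% of a normed residual vector into dim 275
# (norm sqrt(768) ~ 27.7, matching the scale of the last chart).
resid = np.zeros(d_model)
resid[275] = np.sqrt(d_model)
p = predict(resid)  # ~ the naive unigram distribution
```

With these stand-in weights the match is exact by construction; the point is only that a single residual direction suffices to encode the fallback unigram prediction, leaving the remaining dimensions free for context-specific signal.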

 

Toy Model

The first peculiar fact about the charts above is that training evokes a distinguished basis in the residual stream over time for dimension 275, even though mathematically it doesn't appear like there is any mechanism for this. To explore this further, I look at how simple initialization differences in a toy model can evoke roles for dimensions in the residual stream. Feel free to skip this section if this idea is already obvious.

The task the toy model will be learning is to predict the next digit (i) as a function of the preceding digit (j). The sequence of digits 0 to 9 will be sampled according to the scenario.

 Scenario 1

The digits are sampled from the distribution [0.1,0.2,...,1], normalized such that digit 9 is 10x more likely than digit 0. The model is tasked with minimizing cross entropy loss via stochastic gradient descent. 

I start with the simplest transformer possible: no blocks and a model dimension of 1. The model has 20 coefficients, one for each input digit and one for each output digit. The prediction logit that digit 4 will follow digit 3 is calculated by input_coef_3 * output_coef_4. 

Input coefficients are initialized to the normal distribution, and output coefficients are initialized to 0. Training produces the following charts:

The input coefficients learn a constant because the sequence data is independent. The output coefficients learn to match the unigram distribution of the data.
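Scenario 1 can be reproduced with a short sketch. This is a hypothetical re-implementation (not the author's code) that minimizes the expected cross-entropy over the data distribution directly, with the same initialization scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.arange(1, 11, dtype=float)
p /= p.sum()                       # unigram distribution: digit 9 is 10x digit 0

in_coef = rng.normal(size=10)      # input coefficients: normal init
out_coef = np.zeros(10)            # output coefficients: zero init
lr = 0.5

for _ in range(50000):
    logits = np.outer(in_coef, out_coef)            # logit[j, i] = in_j * out_i
    q = np.exp(logits - logits.max(axis=1, keepdims=True))
    q /= q.sum(axis=1, keepdims=True)
    err = q - p                                     # prediction error per input
    g_out = (p[:, None] * in_coef[:, None] * err).sum(axis=0)
    g_in = p * (err @ out_coef)
    in_coef -= lr * g_in
    out_coef -= lr * g_out

pred = q[0]                        # prediction given input digit 0
print(pred)                        # should approach p, the unigram distribution
```

After training, the prediction for every input converges toward the unigram distribution while the input coefficients collapse toward a shared value.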

 Scenario 2

I now update the model dimension from 1 to 2. Training produces the following charts:

The computation is spread out across both dimensions. The second chart shows how the input coefficients sit on a line. A single model dimension has enough degrees of freedom to fit the data, but because there are so many more valid linear combinations that reach the same result, it is statistically improbable for the relationship to fall into exactly one dimension. When this smearing occurs across a large number of dimensions in a larger-scale transformer, it can become challenging to disentangle the underlying mechanisms.

 Scenario 3

I now repeat scenario 2, but initialize the first dimension of the input coefficients to 10 instead of a normal distribution around 0.

The trained model isolates the learned portion into dimension 0. Dimension 1 output coefficients stay at zero. This occurs for two reasons: 

  1. Non-zero activation mean. Let's say I want to modify the output_coef_4 weights such that I increase the likelihood of predicting digit 4 for all inputs. If all inputs are positive values, then I can increase the output by increasing the coefficient. If all inputs are negative, then I can increase the output by decreasing the coefficient. If the inputs are mixed in sign with a mean of zero, then the gradients largely cancel out. In the general case, if the subset of inputs that we want to shift have a shared sign in a given dimension, that dimension will have asymmetrically large gradient pressure on the output coefficients.
  2. Higher magnitude activations. The gradient on the output coefficients is proportional to the magnitude of the corresponding input coefficients. Once the pattern in the data has been fit, the gradients will all fall to zero. So if one dimension can respond faster, it can claim the reward.
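Reason 1 can be illustrated numerically. A hypothetical sketch (illustrative numbers only): the gradient a linear output coefficient receives from "push all of these inputs up" is proportional to the sum of the inputs in that dimension, so shared signs accumulate while mixed signs cancel:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x_shared = rng.uniform(0.5, 1.5, n)   # dimension where all inputs share a sign
x_mixed = rng.normal(0.0, 1.0, n)     # dimension with mixed signs, mean ~0

g_shared = x_shared.sum()             # coherent contributions add up
g_mixed = x_mixed.sum()               # mixed contributions largely cancel
print(abs(g_shared), abs(g_mixed))
```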

     

Scenario 4

Next I return to initializing the input coefficients from the normal distribution, but I add a twist to the data distribution: digits 0-8 are sampled from the distribution [0.1,0.2,...,0.9]. I then replace any token that follows a 3 with 9. A trained model should always predict 9 when it sees 3, and otherwise predict the same distribution as in earlier scenarios.
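The data for this scenario can be generated as follows (a sketch; the sequence length and seed are arbitrary choices, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.arange(1, 10, dtype=float)
p /= p.sum()                          # digits 0-8 with linearly increasing odds
seq = rng.choice(9, size=10_000, p=p)

after_3 = np.roll(seq == 3, 1)        # positions immediately after a 3
after_3[0] = False
seq[after_3] = 9                      # any token that follows a 3 becomes 9
```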

The computation gets spread out across both dimensions. The model learns to predict increasing probabilities for digits 0 through 8. The red line in the last chart corresponds to the output predictions for input 3. It is zero for all digits except for 9, where it jumps to 100%. 

 

Scenario 5

I take scenario 4 and initialize the first dimension of the input coefficients to 10.

Once again, dimension zero dominates in modeling the pattern that applies to all inputs. A dimension with consistent inputs can apply a consistent pattern across all of them. Dimension 1 contributes more heavily to the prediction of 9 given 3.

The main takeaway here is that even though the mathematical structure of the model does not induce a preferred basis in the residual stream, parameter initialization can create a basis that persists throughout the training process. But does this actually scale to small LLMs?

 

GPT2-Small Scale

I will be referencing training runs from modded-nanogpt, which is the GPT-2 scale model I am most familiar with. It has a model dimension of 768, 11 layers with 6 heads each, and no biases on any linear projection or the language model head. The GPT2 tokenizer is used with a vocab size of 50257, padded with 47 tokens at the end to give a clean 50304.

At initialization

The training dataset is FineWeb, which, like most other training datasets, has tokens that roughly follow a log-normal sampling distribution. That distribution is shown below:

The spike on the left tail of the bottom-right plot corresponds to the 270 tokens with zero occurrences, which are defaulted to 1e-8. The most common tokens are listed below. In general, smaller token_ids tend to have higher frequencies.

| Token_id | Token_str | Fraction |
|---|---|---|
| 13 | `.` | 0.0378 |
| 11 | `,` | 0.0360 |
| 262 | `_the` | 0.0336 |
| 198 | `\n` | 0.0217 |

I will look deeper at token 262 '_the', which makes up 3.36% of the training corpus. Below is its initial embedding weight across the 768 model dimensions, which is sampled from a normal distribution:

This looks exactly like one might expect, with a mean of 0.03, very close to zero. The most extreme value of the distribution comes from dimension 275, with a value of -3.71875.

The chart below shows what dimension 275 looks like across all tokens in the vocabulary, with '_the' marked with a red dot:

Things start to get interesting when we look at the activation distribution. That is, when we pass the data distribution into the model, what distribution of embedding outputs is produced in the residual stream? 

The distribution is no longer normal. The spike from '_the' at -3.71875 shifts the mean to -0.22. How does this mean compare to the other 767 dimensions?

-0.22 falls near the far left tail, which indicates that dimension 275 starts out with one of the most lopsided distributions.

At initialization the language model head and all output projections from each block are set to zero. 

 During Training

On step 1 the only gradient pressure on the model is on the lm_head. The lm_head vectors will update in the direction that models the bigram distribution. In other words, every time the model sees the sequence '_the X', token X will update its lm_head vector in the direction that makes it more similar to the embedding of '_the', and all other lm_head vectors will update in the direction that makes them less similar to it.
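This step-1 update can be sketched in a few lines. A minimal hypothetical example with a tiny vocabulary (names and sizes are illustrative): the cross-entropy gradient on a zero-initialized lm_head moves the observed next token's vector toward the embedding of '_the' and every other vector away from it:

```python
import numpy as np

rng = np.random.default_rng(1)
d, V = 8, 5
e_the = rng.normal(size=d)     # stand-in embedding of '_the'
W = np.zeros((d, V))           # lm_head: one output vector per token
target = 2                     # the token observed to follow '_the'

logits = e_the @ W
q = np.exp(logits - logits.max())
q /= q.sum()
grad = np.outer(e_the, q - np.eye(V)[target])   # dL/dW for cross-entropy
W -= 0.1 * grad                                 # one SGD step

sims = e_the @ W               # similarity of each lm_head vector to '_the'
print(sims)                    # positive only for the target token
```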

Since dimension 275 is sharply negative for '_the', whenever a token follows '_the', it will see its dimension 275 decreased. All other tokens will increase. 

Roughly 9k/50k tokens see their lm_head vector decreased on step 1 for dimension 275. The mean of this chart is 0.005, as most tokens see an increase. However, the tokens that decrease are the ones that occur most frequently in the data. Naturally, if a token frequently follows '_the', it will generally occur more frequently overall.

If I weight every token by its rate of occurrence, the weighted mean of dimension 275 in the lm_head drops to -0.0035. To understand if this is substantial, I plot the weighted-mean shift of every one of the 768 dimensions of the lm_head below.

-0.0035 sits on the far left tail. 

On step 2 the MLPs in the network start to engage. A single neuron in an MLP with ReLU activation and no bias can be thought of as randomly bisecting the activation space to select roughly half the inputs, and then choosing a single direction in model space to push those inputs. At this stage, the only signal the MLP has is 'push this batch of inputs in the direction of the lm_head vectors of their target tokens, and away from the lm_head vectors of the other tokens'.

Most tokens have a positive lm_head value in dimension 275, but the tokens that frequently follow '_the' and frequently occur in the forward pass have a negative value. And so MLP neurons are incentivized to push in the negative direction in dimension 275.

The chart below shows what direction MLP neurons 'push' on step 2. Each layer has 3072 neurons. Across 11 layers this gives 33,792 neurons.  

The mean of the distribution is slightly negative. Each input activates roughly half of the neurons, which means each input effectively samples roughly 16,000 times from the distribution above. The charts below show the resulting activations on step 2 after these MLPs fire.

The third chart shows how 98% of activations are getting pushed further negative. At this point, the network has now completed the reinforcing feedback loop. The full sequence of events is:

  1. Anomaly occurs in dim 275. Sampling normally-initialized embeddings with log-normal frequency weights over the vocab space (which is what naturally occurs when we sample on the forward pass) causes skewed activation distributions in several dimensions at initialization. An example of this is dimension 275, which is heavily impacted by token '_the'.
  2. Anomaly spreads to tokens that follow '_the'. On step 1, tokens that follow '_the' decrease their lm_head value in dimension 275, and tokens that don't follow '_the' increase their lm_head value in dimension 275.
  3. Anomaly spreads to 98% of inputs. Each neuron activates on roughly 50% of the inputs. If a neuron activates on a large number of tokens from step 2, it will push that full batch negative.
  4. Repeat. The cycle now repeats from step 1, but instead of 'tokens that follow _the', it's 'tokens that follow 98% of inputs'. 

This cycle effectively monopolizes dimension 275 to perform the function of modeling the unigram distribution, and it all starts with a single seed value of -3.7 for token '_the'. From the framing of a spreading virus, '_the' is a prime viral vector.

 

Finding the full Uncertainty Vector

Does the same effect occur in other dimensions? Below is the mean activation right before the lm_head for each dimension at step 600.

275 shows the largest magnitude, followed by 573 and 360. No other dimensions appear to have this dynamic.

All three dimensions show similar distributions, indicating that the actual 'uncertainty vector' is smeared across three dimensions in the residual stream. Below are the initial activations induced in dimension 573 by the embedding:

The 'seed' for dimension 573 is planted by the rightmost activation, around 2.4. Perhaps coincidentally, this also corresponds to the token '_the'. 

I wanted to revisit one fact in this chart:

There is a small spike of height 100 in the histogram that starts to go slightly positive around step 400. This corresponds to the beginning-of-sequence token, which occurs at the start of every document during training. Here is its activation across all 768 dimensions on step 600:

It has the largest magnitude activation in dimension 360, and a very small positive activation in dimension 275. How do the predicted probabilities differ between the unigram distribution, dim 275, and dim 360?

Since the softmax function is scale sensitive, we can only compare these predictions to the unigram distribution after applying a scalar factor. A uniform prediction across all 50304 tokens gives 10.8 loss. A perfect prediction of the unigram distribution gives 7.66 loss (its entropy). Here is how the loss scales for dimensions [275, 360, 573] as the activation magnitude is varied from 0 to 30:

Both dimensions 275 and 573 achieve minimum loss against the unigram distribution exactly when 100% of the activation is placed into them, which gives a normed activation of 27.7. 
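The scale sensitivity can be illustrated with a synthetic sketch (a Zipfian stand-in for the unigram distribution, not the real lm_head): taking the ideal direction w = log p, the cross-entropy against p equals log(50304) ≈ 10.8 at zero magnitude, and is minimized exactly at full magnitude, where it equals the entropy of p:

```python
import numpy as np

V = 50304
print(np.log(V))  # cross-entropy of a uniform prediction, ~10.8

p = 1.0 / np.arange(1, V + 1)   # synthetic Zipfian "unigram" distribution
p /= p.sum()
w = np.log(p)                   # the ideal single direction

def xent(alpha):
    # Cross-entropy of softmax(alpha * w) against p
    logits = alpha * w
    logz = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return -(p * (logits - logz)).sum()

for alpha in [0.0, 0.5, 1.0, 2.0]:
    print(alpha, xent(alpha))   # minimized at alpha = 1, the entropy of p
```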

To see which activation most accurately predicts the unigram distribution, I run the following code:

```python
# Compute which activation most strongly predicts unigram distribution
class UnigramModel(nn.Module):
    def __init__(self, d_model, d_vocab, fixed_head):
        super().__init__()
        self.w = nn.Parameter(torch.ones((1, d_model), device='cuda'))
        self.lm_head = fixed_head.clone()[:, :D_VOCAB]

    def forward(self, y):
        logits = (self.w @ self.lm_head).squeeze(0)
        log_probs = F.log_softmax(logits, dim=-1)
        loss = -(log_probs * y).sum()
        return loss

target = frac
unigram = UnigramModel(768, D_VOCAB, model.lm_head.weight.T.float().data)
optim = torch.optim.SGD(unigram.parameters(), lr=1)
for step in range(10000):
    loss = unigram(target)
    loss.backward()
    optim.step()
    optim.zero_grad()
    if step % 1000 == 0:
        print(loss.item())

data = unigram.w.data.cpu().numpy()[0]
plt.figure(figsize=(12, 4), dpi=100)
plt.title('Unigram Activation')
plt.plot(data)
plt.vlines(x=range(768), ymin=0, ymax=data)
plt.show()
```

 

This gives 7.73 loss, very close to the perfect score of 7.66. To predict complete uncertainty, an activation should roughly put 50% of its magnitude into negative dim 275, 25% into negative dim 360, and 25% into dim 573. 

What is the biggest difference between dims 275 and 360? Here are the top 10 tokens that 360 prefers over 275:

  • ['The' 'A' 'I' '1' 'H' 'L' 'the' 'In' 'P' 'B']

Here are the top 10 tokens 275 prefers over 360:

  • [',' ' the' ' and' ' in' ' a' ' (' ' ' ' �' ' on' ' of']

Dim 360 appears to model uncertainty conditioned on start of document, as the tokens are capitalized with no leading space, whereas 275 models uncertainty conditioned on middle of document.

What is the biggest difference between dims 275 and 573? Here are the top 10 tokens that prefer 573 over 275:

  • 'ixtape' 'gorithm' 'adata' 'umbnails' 'initions' 'INGTON' 'ospels' 'helial' ' ..............' 'ウス'

Here are the top 10 tokens that prefer 275 over 573:

  • ',' ' the' ' and' ' in' '.' ' (' ' a' '-' '\n' ' �'

Dim 573 appears to model uncertainty conditioned on middle of word, whereas 275 models uncertainty conditioned on start of word.
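The token comparisons above can be computed with a small helper (hypothetical, not the post's code; an `lm_head` of shape [d_model, vocab] and the negative signs of the uncertainty activations are assumptions carried over from the discussion):

```python
import numpy as np

def top_preferred(lm_head, dim_a, dim_b, sign_a=-1.0, sign_b=-1.0, k=10):
    # Rank tokens by how much a (signed) unit push on dim_a raises their
    # logit relative to the same push on dim_b. The negative default signs
    # reflect that these uncertainty activations are negative.
    diff = sign_a * lm_head[dim_a] - sign_b * lm_head[dim_b]
    return np.argsort(-diff)[:k]

# Tiny synthetic check with a 4x6 lm_head stand-in
W = np.zeros((4, 6))
W[1] = [-3, 0, 1, 2, -1, 0]
W[2] = [0, -2, 1, 0, 3, -1]
print(top_preferred(W, 1, 2, k=3))
```

With real weights, mapping the returned token ids through the tokenizer gives lists like the ones above.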

 

Takeaways
  1. The residual stream is not an arbitrary basis. It is given meaning through the spiky activation distributions that arise at initialization from sampling tokens that follow a log-normal frequency distribution, which kick off positive feedback loops during training.
  2. Interpretable structure of the final LLM can be seeded from small tweaks at initialization. In this case, the evolution of the uncertainty vector is traced back to very specific initialization values of the token "_the".
  3. The uncertainty vector can condition on multiple factors: start of document, start of word in the middle of a sentence, or middle of word. Each of these is skewed towards one of 3 dimensions in the residual stream.
  4. If we can better understand the positive feedback loops that elicit separable structure during training, would it be possible to define an initialization seeding that evokes a much more generally interpretable model? E.g. we perform specific initialization for tokens related to "warfare" in dimension 15, such that a positive feedback loop kicks off during training that reinforces how dimension 15 relates to the concept of "warfare".

code: https://github.com/ClassicLarry/uncertaintyVectorLLM/tree/main 



Discuss

Stop Applying And Get To Work

November 24, 2025 - 01:50
Published on November 23, 2025 10:50 PM GMT

TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs.

If you...

  • Have short timelines
  • Have been struggling to get into a position in AI safety
  • Are able to self-motivate your efforts
  • Have a sufficient financial safety net

... I would recommend changing your personal strategy entirely.

I started my full-time AI safety career transitioning process in March 2025. For the first 7 months or so, I heavily prioritized applying for jobs and fellowships. But like for many others trying to "break into the field" and get their "foot in the door", this became quite discouraging.

I'm not gonna get into the numbers here, but if you've been applying and getting rejected multiple times during the past year or so, you've probably noticed the number of applicants increasing at a preposterous rate. What this means in practice is that the "entry-level" positions are practically impossible for "entry-level" people to enter. 

If you're like me and have short timelines, applying, getting better at applying, and applying again becomes meaningless very fast. You're optimizing for signaling competence rather than actually being competent. Because if you a) have short timelines, and b) are honest with yourself, you'll come to the conclusion that immediate, direct action and effect is the priority. 

If you identify as an impostor...

...applying for things can be especially nerve-wracking. To me, this seems to be because I'm incentivized to optimize for how I'm going to be perceived. I've found the best antidote for my own impostor-y feelings to be this: focus on being useful and having direct impact, instead of signaling the ability to (maybe one day) have direct impact.

I find it quite comforting that I don't need to be in the spotlight, but instead get to have an influence from the sidelines. I don't need to think about "how does this look" - just "could this work" or "is this helpful". 

And so I started looking for ways in which I could help existing projects immediately. Suddenly, "optimize LinkedIn profile" didn't feel like such a high EV task anymore.

Here's what I did, and recommend folks to try

Identify the risk scenario you'd most like to mitigate, and the 1-3 potentially most effective interventions.

Find out who's already working on those interventions.[1]

Contact these people and look for things they might need help with. Let them know what you could do right now to increase their chances of success.[2]

What I've personally found the most effective is reaching out to people with specific offers and/or questions you need answered in order to make those offers[3]. Address problems you've noticed that should be addressed. If you have a track record of being a reliable and sensible person (and preferably can provide some evidence to support this), and you offer your time for free, and the people you're offering to help actually want to get things done, they're unlikely to refuse.[4]

(Will happily share more about my story and what I'm doing currently; don't hesitate to ask detailed questions/tips/advice.)[5]

  1. ^

    If nobody seems to be on the ball, consider starting your own project.

  2. ^

    Here it's quite helpful to focus on what you do best, where you might have an unfair advantage, etc.

  3. ^

    As a general rule, assume the person you're messaging or talking to doesn't have the time to listen to your takes - get straight to the point and make sure you've done the cognitive labor for them. 

  4. ^

    I should add that in order to do this you need to have developed a bit of agency, as well as understanding of the field you're trying to contribute to. I'm also assuming that since you have the capacity to apply for things, you also have the capacity to get things done if you trade the time.

  5. ^

    Post encouraged and mildly improved by plex based on a conversation with Pauliina. From the other side of this, I'd much rather take someone onto a project who has spent a few months trying to build useful things than spending cycles to signal for applications, even if their projects don't go anywhere. You get good at what you practice. Hire people who do things and go do things. e.g. I once gave the org Alignment Ecosystem Development, which runs all the aisafety.com resources, to a volunteer (Bryce Robertson) who'd been helping out competently for a while. Excellent move! He had proved he actually did good stuff unprompted and has been improving it much more than I would have.

    Also! I'd much rather work with someone who's been practicing figuring out inside views of what's actually good to orient their priorities rather than someone looking for a role doing work which someone else thinks is good and got funding to hire for. Deference is the mind-killer.



Discuss

Halfhaven Digest #5

November 24, 2025 - 00:57
Published on November 23, 2025 9:57 PM GMT

My posts since the last digest
  • A Culture of Bullshit — Part of the reason society is going down the tubes — if it is — is because we have a culture of mediocrity, where bullshit is tolerated.
  • The Flaw in the Paperclip Maximizer Thought Experiment — Most of the things I write are original ideas (whether brilliant insights or lazy hot takes), but this one was a bit more of an exposition of ideas I didn’t come up with.
  • I Spent 30 Days Learning to Smile More Charismatically — Technically, this one took me 30 days to make. Talks about charisma and “looksmaxxing”, and how unhinged some looksmaxxing advice can be.
  • In Defense of Sneering — This was just a LessWrong comment, which is allowed for Halfhaven. There was a LessWrong thread where everyone was complaining about sneering, and I chimed in because I think sneering isn’t inherently bad, it’s only bad if it’s too hostile. But not enough sneering risks letting bullshitters get away with their bullshit.
  • Literacy is Decreasing Among the Intellectual Class — Looking at two books that have been in publication for over a century (Etiquette and Gray’s Anatomy) and comparing the old versions with the modern to see the degradation in writing quality typical of modern books.

I’m proud of a few of these ones. I was sick during this segment of Halfhaven, but I still managed to get things out, which I’m happy with. I had a few mostly-finished posts in the chamber.

Some highlights from other Halfhaven writers (since the last digest)
  • Why is Writing Aversive? (Ari Zerner) — A relatable post asking why it is that writing can feel so hard. My general advice would normally be that if you find writing involves a lot of friction, but enjoy having written things, that means you just don’t like writing and should give up. But reading this post made me realize I used to feel a lot more like Ari than I do now about writing. As little as a few months ago, maybe. I think maybe developing taste and putting more effort into editing has been what’s helped. Then writing feels like a type of craft, rather than a brain dump. And building things is fun. As long as you’re not TikTok-brained (or Magic-Arena-brained), which is its own problem, and one I sometimes struggle with too.
  • Menswear is a Subcultural Signaling System (Aaron) — A great post. In particular, I liked the concept handle of a “Type of Guy”, which conveys the archetypal nature of fashion. “You do not want different items of clothing you are wearing to signal you are incompatible Types Of Guy.” So no vest over a t-shirt and jeans! Has a follow-up post.
  • No One Reads the Original Work (Algon) — People talk about things without actually having seen them. The equivalent of reading headlines without clicking through to the news article. I remember seeing a lot of this when Jordan Peterson was popular, and people who hated him would talk about him in ways that made it clear they’d never heard the man speak. They’d only heard people talking about him.
  • against predicting speedrunners won’t do things (April) — I think April is winning the record for the most post topics that make me want to click. Speedrunning lore is inherently interesting. I like that she backs up her hypothesis with some concrete predictions.
  • Diary: getting excused from a jury duty; models, models, models (mishka) — I’d never thought about how biased police are as witnesses. That’s a great point.
  • To Write Well, First Experience (keltan) — Lots of good writing advice. In particular, that if you’re writing from stuff you’ve read rather than from real experience, you’re writing through a low-bandwidth proxy.
  • Traditional Food (Lsusr) — A very thorough post about how our idea of a traditional diet doesn’t necessarily reflect what people actually ate in the past, and instead often reflects actual government propaganda. White rice and white bread are “fiberless blobs of carbohydrates” that nobody in history ever ate, and eating them makes us sick.

We’re entering the final segment of Halfhaven. Many won’t finish the full 30 post challenge by the end of November, but I’ve still gotten some good posts out of the people who didn’t make it all the way, so be proud of what you have done, rather than dwelling on what you didn’t do. Good luck in the final week everyone!



Discuss

Emotions, Fabricated

November 24, 2025 - 00:57
Published on November 23, 2025 9:57 PM GMT

Queries about my internal state tend to return fabricated answers. It doesn't much matter if it's me or someone else asking the questions. It's not like I know what's going on inside my head. Thoughts can be traced to an extent, but feelings are intangible. Typically I just don't try, and the most pressing issue is that I'm unable to differentiate anxiety and hunger. Not a huge problem, except for slight over-eating once in a while. I think the description of Alexithymia matches my experiences quite well, although naturally not all of the symptoms match.

The real issues arise from other people asking how I feel or what caused me to act in one way or another. I have no answers to such questions! I'm guided by intractable anxiety, learned patterns on how one ought to navigate a situation, and a mostly-subconscious attempt to keep it all consistent with how I've been before. Complicated yet incomplete models about how emotions and motivations are supposed to work, stolen from books I like to substitute my reality with. Shallow masks on top of a void that only stares back when I look for the answers.

Whenever actual pressure is placed on me to obtain the unavailable answers, the narrator makes up a story. Good stories make sense, so the narrator finds an angle that works. Memories are re-interpreted or modified to match the story as necessary. Painting a good picture of oneself is imperative, and the stories pick just the right frame for that. Actually lying is unnecessary; without closer inspection it's not hard to genuinely believe it all, and the inability to trust one's own memories or reasoning doesn't help. Just noticing that this kind of thing was going on was quite hard. Sometimes I add disclaimers when the topic seems prone to fabricated emotions, especially when analyzing events of the past. Often I won't bother; people tend not to appreciate it, and it mostly just causes everyone else involved to be frustrated as well. Still, anyone who gets to know me well enough would probably notice it at some point, and keeping it secret would feel unsustainable too.

I'm not sure how this should be taken into account when modeling other people. Is everyone like this? I think so, but only rarely as strongly as I am. Nor as self-aware, although perhaps most people are better at this in proportion to how much it affects them. People rarely report experiencing the same when I tell them of my fear of being just an empty core behind my masks. Perhaps if the masks are a bit closer, they feel like a part of one's personality rather than some bolted-on external layer. The lacking sense of identity is a depression thing, so maybe mentally healthy people, whatever that means, have an experience of an all-encompassing identity.

In my previous text on related matters, I looked at it through the lens of validation-seeking. I'm not sure how much of the fabrication happens because the narrator rewrites the events in a more flattering way, but that's surely a part of this. But not all of it.

All of this was probably fabricated too, as it was mostly produced by the need to have something to write about. Oh well.



Discuss
