# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 24 minutes 54 seconds ago

### Where are intentions to be found?

1 hour 43 minutes ago
Published on April 21, 2021 12:51 AM GMT

This is independent research. To make it possible for me to continue writing posts like this, please consider supporting me.

As we build powerful AI systems, we want to ensure that they are broadly beneficial. Pinning down exactly what it means to be broadly and truly beneficial in an explicit, philosophical sense appears exceptionally daunting, so we would like to build AI systems that are, in fact, broadly and truly beneficial, but without explicitly answering seemingly-intractable philosophical problems.

One approach to doing this is to build AI systems that discover what to do by examining or interacting with humans. The hope is that AI systems can help us not just with the problem of taking actions in service of a goal, but also with the problem of working out what the goal ought to be.

Inverse reinforcement learning is a classical example of this paradigm. Under inverse reinforcement learning, an AI observes a human taking actions, then looks for an explanation of those actions in terms of a value function, then itself takes actions that optimize that value function.

We might ask why we would build an AI that acts in service of the same values that the human is already acting in service of. The most important answer in the context of advanced AI, it seems to me, is that AI systems are potentially much more powerful than humans, so we hope that AI systems will implement our values at a speed and scope that goes beyond what we are capable of on our own. For this reason, it is important that whatever it is that the AI extracts as it examines a human taking actions is trustworthy enough that if it were implemented faithfully by the AI then the world brought forth by the AI would be a good world.

Inverse reinforcement learning is just one version of what I will call extraction-oriented AI systems. An extraction-oriented AI system is one that examines some part of the world, then, based on what it finds there, takes actions that affect the whole world. Under classical inverse reinforcement learning the particular part of the world that gets examined is some action-taking entity such as a human, the particular extraction method is to model that entity as an agent and look for a value function that explains its behavior, and the particular way that the system acts upon this value function is, at least under classical AI paradigms, to itself take actions that optimize that value function. But there are many other choices for what part of the world to examine, what to extract from it, and how to implement that which is extracted. For example, we might examine the net behavior of a whole human society rather than a single human; we might extract a policy by imitation learning rather than a value function by inverse reinforcement learning; and we might act in the world using a satisficer rather than an optimizer. What I'm addressing here is any approach to developing AI that becomes aligned with what is truly beneficial by investigating some part of the world.

So long as we are within the regime of extraction-oriented AI systems, we are making the assumption that there is some part of the world we can examine that contains information sufficient to be a trustworthy basis for taking actions in the world.

Let us examine this assumption very carefully. Suppose we look at a closed physical system with some humans in it. Suppose that this system contains, say, a rainforest in which the humans live together with many other animal and plant species.

Suppose that I plan to build an AI that I will insert into this system in order to help resolve problems of disease, violence, and ecological destruction, and to assist with the long-term flourishing of the overall ecosystem.

It is difficult to say exactly what it means for this overall ecosystem to flourish. How do I balance the welfare of one species against that of another? Of one individual against another? How do we measure welfare? Is welfare even the right frame for asking this question? And what is an appropriate way to investigate these questions in the first place? Due to such questions, it is difficult to build an AI purely from first principles, so suppose I tell you that I am planning to build an AI that discovers the answers to these questions by examining the behavior of humans and perhaps other living beings within the ecosystem. Perhaps I have some elaborate scheme for doing this; there is no need to get into the details here. The important thing is that I tell you the basic framework I will be working within: I will observe some part of the system for some amount of time, then I will do some kind of modelling work based on what I observe there, then I will build an AI that acts in some way upon the model I construct, and in this way I will sidestep needing an explicit answer to the thorny philosophical questions of what true benefit really means.

You might then ask which part of the system I will examine, and what I hope to find there that will guide the actions of the powerful AI I intend to insert into the system. Well, suppose for the sake of this thought experiment that the part of the world I plan to examine is the right toe of one of the humans.

Suppose I have an elaborate scheme in which I will observe this toe for aeons, learn everything there is to learn about it, interact with it in this or that way, model it in this or that way, place it in various simulated environments and interact with it in those simulated environments, wait for it to reach reflective equilibrium with itself, and so forth. What do you say? You say: well, this is just not going to work. The information I seek is just not in the toe. It is not there. I can examine the spatial region containing a single human toe for a long time but the information I seek is not there, so the AI I build is not going to be of true benefit to this ecosystem and the living beings within it.

What information is it that I am seeking? Well, I am seeking information sufficient to guide the actions of the AI. I do not have an understanding of how to derive beneficial action from first principles, so I hope to learn or imitate or examine something somewhere in a way that will let me build an AI whose actions are beneficial. It could be that I extract a policy or a value function or something else entirely. Suppose for the sake of the thought experiment that I am in fact a computer scientist from the future and that I present to you some scheme that is unlike anything in contemporary machine learning, but still consists of examining a part of the world, learning something from it, and on that basis building an AI that sidesteps the need for a first-principles answer to the question of what it means to be beneficial. And suppose, to continue with my thought experiment, that the region of space I am examining is still a single human toe. It really does not matter what sophisticated scheme I present: if the part of the world that I'm examining is a right toe then this scheme is not going to work, because this part of the world does not contain the kind of information that could guide the actions of an AI that will have power over this ecosystem's destiny.

Now let us suppose that I present to you the following revised plan: the part of the world I am going to examine is a living rabbit. Yes, a rabbit.

Again, let’s say that I present some sophisticated scheme for extracting something from this part of the world. Perhaps I am going to extrapolate what the rabbit would do if it had more time to consider the consequences of its actions. Or perhaps I am going to evolve the rabbit forward over many generations under simulation. Or perhaps I am going to provide the rabbit with access to a powerful computer on which it can run simulations. Or perhaps I have some other scheme in mind, but it is still within the following framework: I will examine the configuration of atoms within a spatial region consisting of a live rabbit, and on the basis of what I find there I will construct an AI that I will then insert into this ecosystem, and this AI will be powerful enough to determine the future of life in this ecosystem.

Now, please do not get confused about whether I am trying to build an AI that is beneficial to humans or to rabbits. Neither of those is my goal in this hypothetical story. I am trying to build an AI that is overall beneficial to this system, but I do not know what that means, or how to balance the welfare of rabbits versus that of humans versus that of trees, or what welfare means, or whether the welfare of the whole system can be decomposed into the welfare of the individual beings, or whether welfare is the right kind of frame to start with. I am deeply confused at every level about what it means for any system to be of true benefit to anything, and it is for that very reason that I am building an extraction-oriented AI: my hope is that rather than first coming to a complete understanding of what it means to be of true benefit to this small world and only then building an AI to implement that understanding, I can sidestep the issue by extracting some information from the world itself. Perhaps if I do the right kind of extraction -- which may involve allowing the rabbit to reflect for a long time, or allowing it to interact with statistical imitations of itself interacting with statistical imitations of itself, or any other such scheme -- then I can find an answer to these questions within the world itself. And it does not have to be an answer that I personally can understand and be satisfied with, but just an answer that can guide the actions of the AI that I plan to insert into this world. But no matter how many layers of uncertainty we have or what specific scheme I present to you, you might still ask: is it plausible that the information I seek is present in the particular spatial region that I propose to examine?

And, I ask you now, back here in the real world: is this information in fact present in the rabbit? Could some hypothetical superhumans from the future build this AI in a way that actually was beneficial if they were limited to examining a spatial region containing a single rabbit? What is the information we are seeking, and is it present within the rabbit?

I ask this because I want to point out how nontrivial is the view that we might examine any part of such a system and find answers to these profound questions, no matter how the extraction is done. Some people seem to hold the view that we could find these answers by examining a human brain, or a whole human body.

Of course, the schemes for doing this do not anticipate that we will just read out answers from the structure of the brain. They are more sophisticated than that. Some anticipate running simulations of the human brain based on the neural structures we find and asking questions to those simulations. Others anticipate modelling the brain based on the output it produces when fed certain inputs. But the point is that so long as we are in the regime of extraction-oriented AI, which is to say that we examine a spatial region within a system, then, based on what we find there, build an AI that takes actions that affect the whole system, then we might reasonably ask: is the information we seek plausibly present in the spatial region that we are examining? And if so, why exactly do we believe that?

Is it plausible, for example, that we could examine just the brain of a human child? How about examining an unborn human embryo? A strand of human DNA? A strand of DNA from a historical chimpanzee from which modern humans evolved? A strand of DNA from the first organism that had DNA? If the information we seek is in the human brain then how far back in time can we go? If we have a method for extracting it from an adult human brain then could we not extract it from some causal precursor to a fully-formed human brain by evolving a blueprint of the precursor forward in time? We are not talking here about anything so mundane as extracting contemporary human preferences; we are trying to extract answers to the question of whether preferences are even the right frame to use, whether we should incorporate the preferences of other living beings, where the division between moral patienthood and moral non-patienthood is, whether the AI itself is a moral patient, whether the frame of moral patients is even the right frame to use. These are deep questions. The AIs we build are going to do something, and that something may or may not be what is truly beneficial to the systems into which we deploy them. We cannot avoid these questions completely, but we hope to sidestep explicitly answering them by imitating or learning from or modelling something from somewhere that can form some kind of basis for an AI that takes actions in the world. If we are within this extraction-oriented AI regime, then the actions taken by the AI will be a function of the physical configuration of matter within the spatial regions that we examine. So we might ask: do we want the future to be determined by the physical configuration of matter within this particular spatial region? For which spatial regions are we willing to say yes? 
So long as we are in this regime, no amount of modelling wizardry changes this functional dependence of the whole future of this world upon the physical configuration of some chosen part of the world.

If the spatial region we choose is a human brain, or a whole human body, or even an entire human society, then we should ask: how is it that the information in this spatial region is relevant to how we would want the overall configuration of the system to evolve, but information outside that spatial region is not relevant? How did that come to be the case?

As I wrote in my reflections on a recent seminar by Michael Littman, it seems to me that my own intentions have updated over time at every level. It does not seem to me that I have some underlying fixed intentions lying deep within me that I am merely unfolding. It seems to me that it is through interacting with the world that my intentions develop and mature. I do not think that you could find out my current intentions by examining my younger self because the information was not all in there: much of the information that informs my current intentions was at that time out in the world, and it is through encountering it that I have arrived at my current intentions. And I anticipate this process continuing into the future. I would not trust any scheme that would look for my true intentions by examining my physical body and brain today, because I do not think the information about my deepest intentions in the future is located entirely within my body and brain today. Instead I think that my intentions will be informed by my interactions with the world, and some of the information about how that will go is out there in the world.

But this is just introspective conjecture. I do not have full access to my own inner workings, so I cannot report on exactly how my intentions are formed. My point here is more modest, and it is this: that we can discover what is of benefit to a system by examining a certain part of the system is a profound claim. If we are to examine a part of the universe in which we find ourselves, a part containing one or several hairless primates, under the supposition that the desired information is present in that part, then we should have a good account of how that came to be the case. It is not obvious to me that it is in there.


### Why don't we vaccinate people against smallpox any more?

2 hours 26 minutes ago
Published on April 21, 2021 12:08 AM GMT

The eradication of smallpox in 1979 represented one of the greatest achievements of modern civilization. However, since then most countries have elected to stop vaccinating their populations against the disease. This seems like a very concerning vulnerability to me: herd immunity is waning as more and more of the world's population is replaced by unvaccinated young people.

What if there was an unintentional release from one of the labs around the world that still hold on to samples of the virus? What about an intentional release by terrorists or rogue nations? What gives scientists the confidence that there are no undiscovered animal reservoirs or uncontacted tribes in remote places where smallpox is still circulating? Smallpox is highly contagious and hundreds of times more deadly than SARS-CoV-2.

How would the world respond to such a release? Is there enough capacity to rapidly produce and deploy billions of doses of smallpox vaccine? (Right now we're at an all-time high in terms of pandemic preparedness; I'm thinking decades down the road, when the lessons from Covid-19 have been all but forgotten.)


### What am I fighting for?

3 hours 7 minutes ago
Published on April 20, 2021 11:27 PM GMT

Written at CEEALAR.

Status: toward the end of writing this I started reading Suffering-Focused Ethics by Magnus Vinding as well as more Brian Tomasik, and I'm feeling myself value-drift in a more negative direction. It's possible I will endorse none of what follows fairly soon.

If you want to link to the higher-order freedoms formalization separately from the context of this post, just message me and I'll set it up in its own post.

Special thanks for comments from Nick Ronkin and Nicholas Turner.

Motivation

It recently occurred to me that I can't expect to calculate my behavior if I haven't put in the work of figuring out what I'm fighting for and performing backward induction.

Another name for figuring out what I'm fighting for is articulating my terminal values, which in practice I think looks like painting dreams and hopes of what I think sentience ought to do. I will call this a specific notion of winning (SNoW). Crucially, though I'm seeking more detail, I already know that an instrumental value is the world not dying, the great sentience project (on earth) not failing, etc. An argument against writing this post is the following: why be self-indulgent, writing amateur philosophy/scifi, when you've already converged upon an (intermediary) goal that implies several behaviors?

You can imagine in the picture amputating the SNoW node and working entirely from the middle node! Just as, in a sense, the limits of my abstraction or foresight amputated an even further-right node in the first place (if you believe that no value is really terminal). However, I think there is a property inherent to backward induction that isn't captured by the syntax, the idea that as you move from right to left you're adding detail and motivation, every arrow brings complexity to its head node.

There is also the colleague property of this setup to consider: having a transhumanist/utopian terminal value endows me with the instrumental value of everything not blowing up, which endows me with the privilege of calling many of you my colleagues.

Indeed, it would not be shocking if Alice and Bob, having abolished x-risk and secured the future, were then at each other's throats because they disagreed so fundamentally about where to go from there. Natural questions arise: are they colleagues in the face of x-risk? Ought they be? The latter I will not attempt to answer, but my feeling about the former is that the answer is yes. (Note: "abolished" is a strong word here, when "reasonably mitigated in perpetuity" may be more realistic.)

Again, you can amputate the rightmost column from the graph and still have a veritable machine for generating behaviors. So why do I indulge in this post? Because I've tried to calculate my behavior strictly from the world-not-ending node, and I've gotten stuck. I think I have a messy and confused view of my expected impact, and I don't know how I should be spending my time. My hypothesis is that filling out detail further to the right of the graph will give me information that empowers my thinking. Having spent a great deal of time believing the argument against writing this post, I've been reticent to invest in my ideas, my visions, my dreams. I'm guessing this was a mistake: the lack of information that ought to come from the right side of the graph leaves empty patches (sorrys) in every node and arrow that remains, leading to a sloppy calculation.

Another point comes from the driving insight behind Fun Theory, which is that people have to want something in order to fight for it, so promoting imagination of transhumanities that people would actually want to live in could be an important part of building out allies.

Useful Exercise: What does the world look like if you win?

About a year ago, just before the plague, I went for a networking lunch with an EA. Very quickly, she asked me "what does the world look like if you win?". I was taken aback by the premise of the question; the idea that you could just think about that shocked me for some reason. I think it was because I was so mired in the instrumental convergence toward goals that aren't personal visions but shared visions, and in believing that it would be self-indulgent to go further right on the graph.

In any case, I think this is a valuable exercise. I got a group together at EA Philly to do it collectively, and even derailed a party once with it.

Anyway, when initially asked I probably mumbled something about bringing autonomy and prosperity to all, because I didn't have a lot of time to think about it. It was approximately the next day I thought seriously about questions like "why is prosperity even good?", "what does it mean to maximize autonomy?", and came up with a semi-satisfying model that I think is worth writing down.

Against Knowing What You're Fighting For

If you're buying the premise of this post, let's take a moment to consider a counterpoint from Replacing Guilt: Nate Soares includes a post called "You don't get to know what you're fighting for".

> If someone were to ask you, "hey, what's that Nate guy trying so hard to do," you might answer something like "increase the chance of human survival," or "put an end to unwanted death" or "reduce suffering" or something. This isn't the case. I mean, I am doing those things, but those are all negative motivations: I am against Alzheimer's, I am against human extinction, but what am I for? The truth is, I don't quite know. I'm for something, that's for damn sure, and I have lots of feelings about the things that I'm fighting for, but I find them rather hard to express.

Soares writes that what we care about is an empirical matter, but that human introspection isn't yet sophisticated enough to release those facts into our epistemic state. He looks forward to a day when values are mapped and people can know what they're fighting for, but feasibility is only one component; there is also humility, the possibility that one's articulation is wrong. Soares seems to believe that under uncertainty negative goals are, as a rule, easier to justify than positive goals: he emphasizes how easy it is to be wrong about positive values, but when it comes to the urgent and obvious matters of Alzheimer's or extinction he does not highlight anything like that. I think this is reasonable. Indeed, activists implicitly know this: you see them protest against existing things more than you see them build new things, because they don't want to open themselves up to the comparatively greater uncertainty, or they just find it harder to build teams given that uncertainty. Moreover, you inherently know more about the consequences of existing things than of potential things; trying to bring about something that doesn't exist yet is much closer to making a bet than trying to stop something from existing.

But there's also a more general note here about value drift, or one interpretation of it among many. You can easily imagine things looking differently as you get closer to them, not least due to obtaining knowledge and clarity over time. Additionally, as the consequences of your steps propagate through the world, you may find premises of the goal suddenly violated. Much is stacked against your positive goals maintaining fidelity as you work toward them. Soares points out "The goal you think you're pursuing may well not survive a close examination." The example he gives is total hedonic utilitarianism: the asymmetry between how easy it is to claim your allegiance to it and the difficulty of defining "mind" or "preference", deciding on processes for extracting preferences from a mind, deciding on population ethics, etc. Of course one could naively think they've solved the "positive goals are slippery" problem just by taking these specific critiques and putting a lot of thought into them, but I think it's at least slightly less naive to try to think about meta-values or the premises on top of which valuers can come along and value stuff, reason about why it is they value it, etc. I will say more about meta-values later.

Higher-order Freedoms

Before I can describe my specific notion of winning (SNoW), I need to explain something. It appeared to me as "original", though I have no idea if it's "novel", and it forms the core of my win condition.

Motivation

What does it mean to maximize autonomy? Why is prosperity even good?

I want my account of autonomy to have:

• qualitative properties, where we ask "what kind?" of autonomy
• quantitative properties, where we ask "how much?" autonomy we have

And ideally we won't do this as two separate steps!

I'll be content for my account of prosperity to be thoroughly dependent on autonomy.

The formalism

English first:

The order of a freedom is the number of other freedoms associated with it.

Formally
• We will take options to be discrete, but it should be generalizable to a continuous setting.
• We shall model a decision (to-be-made) as a set of actions representing your available options.
• Associated with each option is a PMF assigning probabilities to consequences of actions.
• A consequence may be either a terminal outcome or another decision.
• A decision's consequence set is the union of the consequences of all its options.
• The consequence set of an outcome is empty.
• We define interesting in the following way: a decision is interesting when most of its actions lead to more decisions most of the time.
• We'll call a chain of options representing decisions-made and terminating in an outcome a questline.
• The order of a questline Q is its length: the number of →s plus one (so a questline with two →s has O(Q)=3).
• A decision can be filtered by reminding the agent of subsequent goals. For example, as the agent ponders the options in A, they're considering not only the bringing about of B but ultimately the bringing about of C as well, so if there are options in A contrary to C, the agent has the foresight not to select them.
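The definitions above can be sketched in code. A minimal sketch, assuming Python; every class and function name here is my own illustration, not from the post:

```python
class Outcome:
    """A terminal outcome; its consequence set is empty."""
    def __init__(self, name):
        self.name = name

class Decision:
    """A decision-to-be-made: a set of actions (options), each carrying
    a PMF over consequences (other Decisions or terminal Outcomes)."""
    def __init__(self, options):
        # options: {action_name: {consequence: probability}}
        self.options = options

def consequence_set(decision):
    """Union of the consequences of all of a decision's options."""
    result = set()
    for pmf in decision.options.values():
        result |= set(pmf)
    return result

def is_interesting(decision):
    """A decision is interesting when most of its actions lead to more
    decisions most of the time (here: with probability mass > 1/2)."""
    votes = [
        sum(p for c, p in pmf.items() if isinstance(c, Decision)) > 0.5
        for pmf in decision.options.values()
    ]
    return sum(votes) > len(votes) / 2

def order(questline):
    """The order of a questline is its length: the number of arrows plus
    one, i.e. the number of nodes in the chain, outcome included."""
    return len(questline)
```

For instance, Bob's jetskiing questline below would be `order(["work", "jetskiing"]) == 2`.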
Examples
• Bob is living on subsistence UBI. He wants to go jetskiing. He'll need to get a job and go jetskiing on his day off, because activities like that aren't covered under the definition of subsistence. Write Bob's questline and state its order ::: work → jetskiing, order 2. :::
• Alice lives in a town with 3 brands of toothpaste at CVS. Bob lives in a town with 7 brands of toothpaste at CVS. Which one has more freedom? ::: They have the same amount of freedom. :::
• Alice wants to play piano. Like Bob, she is living on a subsistence UBI. Write a questline and state its order ::: work → buy piano → practice → play beautifully, order 4 :::
Issue from factoring

You may have noticed from the piano example that the granularity level is subjective. In short, every finite list is hiding a much longer list by generalizing steps, suppressing detail, clustering steps together. The step buy piano could easily be unpacked or factored into select piano at store → buy piano → move piano into house, (though I think the limit here is something like quark level, and we don't enter infinity). You're wondering if we have a principled way of selecting a granularity level, right? The way we proceed will have the following properties:

• We want to suppress detail so that an action is at the abstraction level most useful to the agent
• We want to emphasize interesting decisions, modulo filtering them with respect to information from the right side in the backward induction syntax. I.e. if a decision is interesting but contrary to some later goal, it can easily be ignored.
• We are free to imagine a personal assistant AI that automates some of the process of filtering decisions with respect to information from the right side in the syntax, and suppressing uninteresting decisions. Indeed, later we'll see that such a personal assistant plays a crucial role in ensuring that people are actually happy in a world of higher-order freedoms.
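The filtering and suppression described in the bullets above can be sketched as follows. This is a hypothetical helper, assuming a simple probability-mass threshold for "contrary to a later goal"; the predicate names are my own, not the post's:

```python
def filter_options(options, contrary_to_goal, is_uninteresting=None):
    """Keep only options that are neither contrary to a goal further
    right in the backward-induction graph nor flagged as uninteresting.

    options: {action: {consequence: probability}}
    contrary_to_goal: predicate on consequences
    is_uninteresting: optional predicate on actions, standing in for a
        personal-assistant AI that suppresses uninteresting decisions
    """
    kept = {}
    for action, pmf in options.items():
        # Probability mass landing on consequences contrary to the goal.
        bad_mass = sum(p for c, p in pmf.items() if contrary_to_goal(c))
        if bad_mass > 0.5:
            continue  # foresight: drop options working against later goals
        if is_uninteresting is not None and is_uninteresting(action):
            continue  # suppress detail the agent doesn't need to see
        kept[action] = pmf
    return kept
```

So an agent saving for a piano would see the "splurge" option filtered away before ever deliberating over it.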
My Specific Notion of Winning

If I win, the freedoms of the world will be increasing in order. I think the state of human cognition and society up to now imposes an upper bound on the order of freedoms, and that the rate of progress is best understood as the first derivative of this upper bound.

I said I would derive prosperity from my account of autonomy, here it is: prosperity is simply the accessibility of higher-order freedoms.

Deriving altruism

It's easy for me to say I want the first derivative of the upper bound of orders of freedoms to be increasing for all, but are incentives aligned such that selfish agents take an interest in the liberties of all?

Idea: higher-order freedoms intertwine individuals

Intuitively, one lever for increasing the interestingness of my options is having colleagues who have interesting options, and the more potential colleagues I have the higher quality choices I can make for who to collaborate with. Therefore, a self-interested agent may seek to increase freedoms for others. Besides, a society organized around maximizing higher-order freedoms would retain a property of today's earth society: that one person's inventions can benefit countless others.

There is of course the famous Stephen Jay Gould quote:

> I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.

Thus an unequal world is a sub-optimal world in the framework of higher-order freedoms, and even selfish individuals are incentivized to fight for bringing higher-order freedoms to others.

Criticism from scarcity of computation

Maximizers and satisficers

Satisficers find a "good enough" option, unlike maximizers, who look for a "best" option. Per Wikipedia, "satisficing is a decision-making strategy or cognitive heuristic that entails searching through the available alternatives until an acceptability threshold is met."
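A toy contrast of the two strategies in the Wikipedia definition quoted above (my own illustrative functions, assuming alternatives arrive in some fixed order): the satisficer stops at the first alternative that clears an acceptability threshold, while the maximizer pays to evaluate every alternative in search of the best one.

```python
def satisfice(alternatives, utility, threshold):
    """Return the first acceptable alternative (None if none qualify),
    evaluating as few alternatives as possible."""
    for alt in alternatives:
        if utility(alt) >= threshold:
            return alt
    return None

def maximize(alternatives, utility):
    """Evaluate every alternative and return the best one."""
    return max(alternatives, key=utility)
```

The satisficer's cost grows only until the threshold is met, which is exactly the compute-conserving move discussed below.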

Meanwhile there is a literature on overchoice, defined by Alvin Toffler: "[Overchoice takes place when] the advantages of diversity and individualization are canceled by the complexity of the buyer's decision-making process." Sadly, as you'll see on Wikipedia, this comes with a disclaimer that it hasn't been adequately reproduced. I am not going to attempt a rigorous literature review to figure out which parts of it we should keep and which we shouldn't, but below I will engage with four intuitive points, because they challenge me to think through the ramifications of my framework.

Societies can maximize while individuals satisfice

An essay called Harmful Options appeared in the Fun Theory Sequence. In it, Eliezer pointed out that options bring compute obligations with them: "Bounded rationalists can easily do worse with strictly more options, because they burn computing operations to evaluate them." Indeed, speaking personally, I've gotten pretty good at knowing when to maximize and when to satisfice. A demon who wants to exhaust my resources might do so by offering me more choices, but I'm only susceptible if I'm blindly maximizing, incapable of saying to myself at any decision "I meta-prefer to conserve resources right now over figuring out what I prefer, so I'll just take whatever".

Put another way, consider the following edge case. Suppose an individual wanted to raise the upper bound on the order of freedoms in their society, and started by simply trying to maximize their autonomy in everyday life. Consider the limiting case: an agent who wants to maximize their autonomy in this higher-order-freedoms setup, who finds themselves in an infinite environment, and who has an arbitrary foresight/simulation mechanism allowing them to plan their entire questline before taking a step. Notice that every time they deliberate between a terminal outcome and another decision, they will choose the decision. So a tension emerges between allowing any questline to complete and maximizing autonomy: this agent will just plan forever and never act. Can you avert the planning-forever outcome by removing one supposition? :::spoiler The first one, maximizing personal autonomy :::

And indeed we don't need this supposition: it's certainly not clear that the best way to boost the upper bound for society at large is to maximize your personal freedom at every step, but this is a natural mistake to make.

There can clearly be a divergence on the maximizer-satisficer axis between the societal scale and the individual scale. I'm proposing that the societal scale should be maximizing (trying to get the highest possible upper bound on the order of freedoms) while individuals satisfice.

Tyranny of choice

Psychologist Barry Schwartz believes that welfare and freedom diverge after a certain mass of choices. In this short article, he outlines four reasons for this:

1. Buyer's remorse: having had more choices makes you wonder more whether you made the right decision.
2. Opportunity cost: when there are lots of alternatives to consider, it is easy to imagine the attractive features of the alternatives you reject, which makes you less satisfied with the option you've chosen. With fewer choices under consideration, a person has fewer opportunity costs to subtract.
3. Escalation of expectations: suppose you invest k units of computation into your preferences because you're faced with a decision among l options. Schwartz suggests that the amount of satisfaction you'll expect is some f(k), where f is increasing. In a world of higher l's, k's will need to be higher, making your expected satisfaction f(k) much higher indeed.
4. Shifting the blame: when you're dissatisfied and you didn't have many options, you can blame the world. When you had a lot of options and you're dissatisfied, the problem must have been your computation-of-preferences step.

Schwartz is, of course, studying humans without augmented cognition. I suggest we extract from these conclusions a principle: the amount of comfortable freedom (that is, the amount of freedom beyond which freedom starts to diverge from welfare) depends on the cognitive abilities of the agents in question. I'd go one step further and suggest that augmented cognition and social technologies are needed to help people dodge these failure modes.

Is my SNoW hostile to people who fall on the maximizer side of the spectrum?

I think if a world implemented my SNoW, there would be a big risk that people who tend toward the maximizer side would be left behind. We need various cognitive and social technologies in place to help maximizers thrive. One example would be parseability enhancers that aid in compression and filtering. I don't have a detailed picture of what this looks like, but I anticipate the need for it.

Again, at a high level, overchoice literature isn't necessarily replicating

In order to be inspired to do a more rigorous literature review, I would have to see an opportunity to implement a cognitive or social technology that I think would drag either the mean or the upper-bound order higher in my community, society, or planet. Again, I included Schwartz's four points because I think it's reasonable that they would have arisen intuitively/philosophically anyway.

When is Unaligned AI Morally Valuable?

Paul Christiano defined good successor as follows:

an AI is a good successor if I believe that building such an AI and “handing it the keys” is a reasonable thing to do with the universe.

Exercise: take at least ten minutes to write down your own good successor criteria.

My good successor criterion is synchronized with my SNoW

If you've gotten this far, you should be able to see what I'm about to claim.

I am willing to hand the keys of the universe over to an AI society that can implement my SNoW better than humans can. If it turns out that humans run up against the physical limits of how much higher-order their freedoms can be faster or with more friction than the AIs, then I think the AIs should inherit the cosmic endowment, and if they meet or create a civilization that can seize higher-order freedoms with less friction than they can then they ought to hand over the keys in turn.

Paperclipping

In my view, it is natural to ask, "What's wrong with paperclippers winning? Surely if they're propagating value in the universe, it would be racist to think this was some tragedy, right?", and I claim that taking this question seriously has been one of the most nourishing exercises in my growth. I will refer to people who feel that the obvious answer is "as a human, I want humans to win!" as provincialists, in the sense of "the act or an instance of placing the interests of one's province before one's nation", as suggested by language in the Value Theory sequence (where, in the metaphor, sentient/freedom-seizing creatures are the nation and humanity is the province).

We can't relax our grip on the future - let go of the steering wheel - and still end up with anything of value. And those who think we can - they're trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can't be sentient. And the cosmopolitan heroes who understand that minds don't have to be just like us to be embraced as valuable -

The broader point is not just that values we would recognize as valuable aren't negligible points in the space of possible values (let that sink in if the thought isn't absolutely familiar to you), but also that steering doesn't necessarily mean clinging to provincial values. If you give a human a steering wheel, they are not obligated to drive only in their comfort zone; they have, in fact, been known to drive across town to a neighborhood they've never been to before.

To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.

While I'm not sure I totally get the part about the brain yet, I think my SNoW/good successor criterion is a reasonable candidate for such a "criterion of improvement".

I want to be abundantly clear: coalitioning with provincialists may be crucial, as humans may remain the best at seizing freedoms. I think designing AIs which preserve my SNoW is at least linearly harder than solving any of the alignment problems. This post is not the start of an alt-LW millenarian faction; indeed, you could convince me that allocating research effort to ensuring that AIs are prosperous under this definition of prosperity does more harm than good.

Conclusion

I will not be publicly performing backward induction at this time, but I'll just say I've seen gains in clarity of thinking about goals and behaviors since I sat down to knock out this post!

I recommend you take anything interesting in this post as a recommendation to do an exercise, whether that's articulating some positive vision of what you'd like to see after x-risk or tackling when unaligned AI is morally valuable. (I'm especially curious whether anyone but me thinks those two exercises are highly related.)

Notice: I didn't do the exercise from fun theory of writing what an average day would be like in my win condition. This is because of time/talent constraints!

Discuss

### Gradations of Inner Alignment Obstacles

4 hours 16 minutes ago
Published on April 20, 2021 10:18 PM GMT

The existing definitions of deception, inner optimizer, and some other terms tend to strike me as "stronger than necessary" depending on the context. If weaker definitions are similarly problematic, this means we need stronger methods to prevent them! I illustrate this and make some related (probably contentious) claims.

Summary of contentious claims to follow:

1. The most useful definition of "mesa-optimizer" doesn't require them to perform explicit search, contrary to the current standard.
2. Success at aligning narrowly superhuman models might be bad news.
3. Some versions of the lottery ticket hypothesis seem to imply that randomly initialized networks already contain deceptive agents.

It's possible I've shoved too many things into one post. Sorry.

Inner Optimization

The standard definition of "inner optimizer" refers to something which carries out explicit search, in service of some objective. It's not clear to me whether/when we should focus that narrowly. Here are some other definitions of "inner optimizer" which I sometimes think about.

Mesa-Control

I've previously written about the idea of distinguishing mesa-search vs mesa-control:

• Mesa-searchers implement an internal optimization algorithm, such as a planning algorithm, to help them achieve an objective -- this is the definition of "mesa-optimizer"/"inner optimizer" I think of as standard.
• Mesa-controller refers to any effective strategies, including mesa-searchers but also "dumber" strategies which nonetheless effectively steer toward a misaligned objective. For example, thermostat-like strategies, or strategies which have simply memorized a number of effective interventions.

I think mesa-control is thought of as a less concerning problem than mesa-search, primarily because: how would you even get severely misaligned mesa-controllers? For example, why would a neural network memorize highly effective strategies for pursuing an objective which it hasn't been trained on?

However, I would make the following points:

• If a mesa-searcher and a mesa-controller are equally effective, they're equally concerning. It doesn't matter what their internal algorithm is, if the consequences are the same.
• The point of inner alignment is to protect against those bad consequences. If mesa-controllers which don't search are truly less concerning, this just means it's an easier case to guard against. That's not an argument against including them in the definition of the inner alignment problem.
• "Search" is an incredibly ambiguous concept.
• There's a continuum between searchers and pure memorized strategies:
  • Explicit brute-force search over a large space of possible strategies.
  • Heuristic search strategies, which combine brute force with faster, smarter steps.
  • Smart strategies like binary search or Newton's method, which efficiently solve problems by taking advantage of their structure, but still involve iteration over possibilities.
  • Highly knowledge-based strategies, such as calculus, which find solutions "directly" with no iteration -- but which still involve meaningful computation.
  • Mildly-computational strategies, such as decision trees, which approach dumb lookup tables while still capturing meaningful structure (and therefore, meaningful generalization power).
  • Dumb lookup tables.
• Where are we supposed to draw the line? My proposal is that we don't have to answer this question: we can just include all of them.
• Some of the reasons we expect mesa-search also apply to mesa-control more broadly.
  • There can be simple, effective strategies which perform well on the training examples, but which generalize in the wrong direction for off-distribution cases. Realistic non-search strategies will not actually be lookup tables, but rather, will compress the strategies a lot. Such agents probably follow perverse instrumental incentives because those are a common theme of effective strategies, even without search-based planning.
  • Non-search strategies can still factor their knowledge into "knowledge of the goal" vs "knowledge of the world", and combine the two to plan. (For example, the calculus-like optimization I mentioned.) This gives us a critical ingredient for deceptive agents: the training score can be improved by increasing the knowledge in the world-model instead of aligning the goal, such that the agent deceptively cooperates with the training task to achieve its own goals.
  • For non-search strategies, it's even more important that the goal actually simplify the problem as opposed to merely reiterating it; so there's even more reason to think that mesa-controllers of this type wouldn't be aligned with the outer goal.
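The continuum above can be made concrete by solving the same tiny optimization problem at several points along it (the problem, minimize (x - 3)^2, is an invented example):

```python
# Four points on the searcher-to-lookup-table continuum, all "pursuing"
# the same objective: minimize f(x) = (x - 3)**2 over [0, 10].

def brute_force(f, lo=0.0, hi=10.0, steps=10001):
    # Explicit search: evaluate every candidate on a grid.
    xs = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(xs, key=f)

def newton(f_prime, f_double_prime, x=0.0, iters=20):
    # Structured iteration: exploits smoothness instead of enumerating.
    for _ in range(iters):
        x -= f_prime(x) / f_double_prime(x)
    return x

def closed_form():
    # "Calculus-like": no iteration at all, the answer derived once.
    return 3.0

# Dumb memorized answer: no computation, just retrieval.
LOOKUP = {"minimize (x-3)^2": 3.0}

f = lambda x: (x - 3) ** 2
```

All four produce (approximately) the same behavior, which is the point: from the outside, where on this continuum the mechanism sits is invisible.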

Mesa-Learning

I mentioned this category in the same Mesa-Search vs Mesa-Control post. This refers to the phenomenon of spontaneous emergence of learning algorithms. Basically: a system ends up learning-to-learn when you were only trying to get it to learn. This may or may not involve search.

This could be concerning/important for several reasons, but I don't have a lot I want to say about it in this post.

Explicitly Representing Values

This refers to one of the properties I mentioned in the mesa-control subsection: does a model represent its objective separately from its world-model, and combine those to plan?

Or, slightly more generally: does the system have an explicitly represented objective? (Whether or not it has a "world model".)

There are several reasons to think this might be the critical distinction for pointing to inner optimization:

• It's critical to one story for why we might expect deception from highly capable machine-learning systems, as I previously outlined.
• Representing a goal explicitly seems required for "having a misaligned goal" in a significant sense. (In other words, this just seems like a very natural definition. A system which doesn't recognize some goal as a regularity behind its strategy doesn't "have a goal" in a mechanistic sense.)
• A system that does not do this has little reason to be systematically misaligned.
• That is: even if one "misaligned behavior" is learned as a generalization of effective strategies in the training data, there is little/no reason to expect another misaligned behavior to be learned (particularly not misaligned in the same direction, that is, pursuing the same misaligned goal) unless the system has compressed its strategies in terms of an explicitly represented objective. Therefore, one might argue that there is no reason to expect high levels of capability toward misaligned goals without such factoring.

I don't think these arguments are enough to supersede (misaligned) mesa-control as the general thing we're trying to prevent, but still, it could be that explicit representation of values is the definition which we can build a successful theory around / systematically prevent. So value-representation might end up being the more pragmatically useful definition of mesa-optimization. Therefore, I think it's important to keep this in mind as a potential definition.

Generalizing Values Poorly

This section would be incomplete without mentioning another practical definition: competently pursuing a different objective when put in a different context.

This is just the idea that inner optimizers perform well on the training data, but in deployment, might do something else. It's little more than the idea of models generalizing poorly due to distributional shift. Since learning theory deals extensively with the idea of generalization error, this might be the most pragmatic way to think about the problem of inner optimization.

Deception

Evan Hubinger uses "deceptive alignment" for a strong notion of inner alignment failure, where:

1. There is an inner optimizer. (Evan of course means a mesa-searcher, but we could substitute other definitions.)
2. It is misaligned; it has an objective which differs from the training objective.
3. It is non-myopic: its objective stretches across many iterations of training.
4. It understands the training process and its place within it.
5. In order to preserve its own values, it "cooperates" with the training process (deceptively acting as if it were aligned).

I find that I often (accidentally or purposefully) use "deception" to indicate lesser crimes.

Hidden (possibly "inaccessible") Information

The intuition here is that a "deceptive" system is one that is hiding something from us; it knows more than it is letting on. For example, a system which is using a lot of inaccessible information.

When I read The Case for Aligning Narrowly Superhuman Models, my reaction was:

1. This seems like a great experiment.
2. I expect the experiment to "succeed". That is: I expect you totally can find a training procedure which "aligns" GPT-3 better, in the sense of getting it to give more useful responses to questions.
3. Contrary to what I perceive most people as thinking, I think success would be bad news, and the greater the success, the worse the news (in a sense I'll try to clarify).

My model is that GPT-3 almost certainly is "hiding its intelligence" at least in small ways. For example, if its prompt introduces spelling mistakes, GPT-3 will 'intentionally' continue with more spelling mistakes in what it generates. I expect that if a prompt begins with reasoning mistakes, GPT-3 will continue with similar reasoning mistakes to some extent.

So it seems quite plausible that we can nudge smarter behavior out of GPT-3. For example, with the following procedure inspired by Nisan's paper that Ajeya cites:

1. Train a model to predict upvotes on Quora, Stack Exchange, and similar question-answering websites. This serves as a function recognizing "intelligent and helpful responses".
2. Use this upvote predictor as a loss function to fine-tune GPT-3.
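The two steps can be sketched with small numpy stand-ins in place of the real upvote predictor and GPT-3 (the shapes, the random "predictor", and the gradient-ascent "fine-tuning" are all illustrative assumptions, not anyone's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in: a frozen "upvote predictor" (a tiny random network).
W = rng.normal(size=(4, 8))
v = rng.normal(size=4)

def predicted_upvotes(response):
    # Score a "response" vector; higher = more helpful-looking.
    return v @ np.tanh(W @ response)

# Step 2 stand-in: fine-tune a response by gradient ascent on the
# predictor's score (real fine-tuning would update the LM's weights).
response = rng.normal(size=8) * 0.1
initial_score = predicted_upvotes(response)

for _ in range(200):
    h = np.tanh(W @ response)
    grad = W.T @ (v * (1 - h ** 2))  # d(score)/d(response)
    response += 0.05 * grad          # ascend the predicted-upvote score

final_score = predicted_upvotes(response)
```

The structure is the interesting part: the loss for step 2 is not ground truth but the frozen model from step 1, so step 2 can only surface whatever step 1's predictor already rewards.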

I'm thinking of Ajeya's research program roughly as "come up with some better version of that".

So then a very interesting question is: would that work? How well?

Assuming step 2 doesn't just effectively create adversarial examples for the model trained in step 1 (a failure mode which Nisan's work suggests won't happen), I think the procedure would work at least to some extent. EG, it would at least stop GPT from introducing "intentional" reasoning errors. GPT-3 fine-tuned in this way would probably be more helpful than GPT-3 in a meaningful way.

Here's my possibly-contentious claim:

The better this works, the more evidence we have that GPT is close to worrisome territory WRT inner optimizers.

If this procedure barely works at all, then we have evidence that GPT was not "hiding anything" or "playing dumb". This means GPT was not very much like an inner optimizer. It didn't have very much private information it was using to model humans, without revealing to the humans.

If this procedure works moderately well, then this provides some confirmation that GPT had some hidden information.

If this procedure worked quite well, producing a model which answers questions quite intelligently and helpfully, this provides significant evidence that GPT-3 had a kind of inner optimizer. In particular, if this procedure were able to produce better answers than those in the Stack Exchange dataset, this would suggest that GPT-3 knew those better answers all along, rather than developing better answers as a consequence of training on that signal.

If this procedure works exceedingly well, causing GPT to "wake up" and be a human-level conversation partner or greater, we should be very worried indeed. (Since we wouldn't then know the alignment of the resulting system, and could be virtually sure that it was an inner optimizer of significant power.)

Barring the final extreme outcome which creates a risk, this is a great experiment which could provide some valuable information about the potential for near-term inner optimizers.

But I worry that people could interpret the experiment incorrectly, thinking that "good" results from this experiment (ie creating much more helpful versions of GPT) are actually "good signs" for alignment. I think the opposite is true: successful results would actually be significant reason for caution, and the more success, the more reason for caution.

This does, of course, depend highly on methodology. If the methodology is not very much like what I describe, then positive results might indeed be good news rather than bad.

Also note that hiding information does not in itself imply inner optimization, much less inner misalignment. Ajeya privately mentioned the example of ImageNet: certainly ImageNet knows more about images than it overtly reveals through its outputs.

However, it does imply some sort of alignment problem, I think, because arguably alignment implies the absence of hidden information. In the GPT-3 example, this can clearly be traced to an outer alignment problem: GPT-3 was trained to imitate humans, not to give the most useful responses possible. So GPT-3 hiding information does strike me as an example of a misaligned mesa-optimizer even if not an inner-misaligned mesa-optimizer. (Depending, of course, on your preferred definition of mesa-optimizer. I have no idea whether GPT-3 conducts an internal search. Planning ahead seems like a broadly useful thing for it to do, but, we know little about GPT-3's internal strategies.)

(In an extreme case, an aligned AI might hide information from us for our own sake. However, this at least implies an absence of corrigibility, since it results in difficult-to-verify and difficult-to-correct behavior. I don't feel bad about a definition of "deception" which includes this kind of behavior; avoiding this kind of deception seems like a worthwhile goal.)

A Treacherous Turn

The core reason why we should be interested in Evan's notion of deception is the treacherous turn: a system which appears aligned until, at an opportune moment, it changes its behavior.

So, this serves as a very practical operational definition.

Note that this is identical with the "generalizing values poorly" definition of inner optimizer which I mentioned.

My Contentious Position for this subsection:

Some versions of the lottery ticket hypothesis seem to imply that deceptive circuits are already present at the beginning of training.

The argument goes like this:

1. Call our actual training regime T.
2. I claim that if we're clever enough, we can construct a hypothetical training regime T' which trains the NN to do nearly or exactly the same thing on T, but which injects malign behavior on some different examples. (Someone told me that this is actually an existing area of study; but, I haven't been able to find it yet.)
3. Lottery-ticket thinking suggests that the "lottery ticket" which allows T' to work is already present in the NN when we train on T.
4. (Furthermore, it's plausible that training on T can pretty easily find the lottery ticket which T' would have found. The training on T has no reason to "reject this lottery ticket", since it performs well on T. So, there may be a good chance that we get an NN which behaves as if it were trained on T'.)
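A toy illustration of step 2, with logistic regression standing in for an NN (the "trigger" feature and all data are invented): two training regimes that agree on clean inputs but diverge when the trigger is present.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, steps=3000, lr=0.5):
    """Plain logistic regression via gradient descent (no bias term)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Regime T: the label depends only on feature 0; feature 2 (the
# "trigger") is always inactive (zero) in T's training data.
X_clean = np.column_stack([rng.normal(size=200), rng.normal(size=200),
                           np.zeros(200)])
y_clean = (X_clean[:, 0] > 0).astype(float)

# Regime T': the same clean data, plus examples where the trigger is
# active and the label is forced to 1 (the injected "malign" behavior).
X_trig = np.column_stack([np.linspace(-3, 3, 20), np.zeros(20),
                          np.full(20, 5.0)])
y_trig = np.ones(20)

w_T = train_logreg(X_clean, y_clean)
w_Tprime = train_logreg(np.vstack([X_clean, X_trig]),
                        np.concatenate([y_clean, y_trig]))

# The two regimes agree almost everywhere on clean (trigger-free) inputs...
agreement = ((X_clean @ w_T > 0) == (X_clean @ w_Tprime > 0)).mean()

# ...but diverge when the trigger is present.
probe = np.array([-1.0, 0.0, 5.0])  # negative feature 0, trigger active
```

This is only the existence claim from step 2, of course; whether training on T alone tends to find T'-like weights (steps 3-4) is the contentious part.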

Part of my idea for this post was to go over different versions of the lottery ticket hypothesis, as well, and examine which ones imply something like this. However, this post is long enough as it is.

So, what do we think of the argument?

I actually came up with this argument as an argument against a specific form of the lottery ticket hypothesis, thinking the conclusion was pretty silly. The mere existence of T' doesn't seem like sufficient reason to expect a treacherous turn from training on T.

However, now I'm not so sure.

If true, this would argue against certain "basin of corrigibility" style arguments where we start with the claim that the initialized NN is not yet deceptive, and then use that to argue inductively that training does not produce deceptive agents.

Discuss

4 hours 28 minutes ago
Published on April 20, 2021 7:05 PM GMT

If you’re like me in 2017, your phone is your drug of choice. My insurance didn’t provide me with any internet abuse rehab options. (Though at least one place exists.) I was sick of being a NEET and spending all my time online, so I gave my phone and laptop to a friend because a lot of people on Reddit had talked about this “cold turkey” approach. Cut to me asking for my devices back a day later. My friends were all too nice to actually hold me accountable. So I descended back into the murky ocean of online lurking, my life experience reduced to an endless scroll, searching for novel information and parasocial fantasies.

The cold turkey approach and similar "digital detox" methods are very popular in the "no surf" community. I see the appeal: it's tempting to blame all of your problems on the internet and assume you're better off without it. For hardcore internet users, which I believe is becoming the norm for young and old alike, it's just probably not realistic or even all that helpful.

Cut to 2021: I'm in school and employed. I'm still an "internet person," but my phone screen time is around 2.5 hours per day when it used to hover around 15. I use my laptop for school and only fall into endless-scrolling hell when I'm particularly depressed.

“Put down your phone and go outside.” didn’t work for me. I went to therapy. At some point I gained the ability to actually experience life directly, to be present in the current moment. I realized that the reason I had no life is because I was constantly escaping it. I had to start feeling my feelings. Running away wasn’t working.

That's the tl;dr of how I escaped NEETdom. The recovery process is a bit more complex than I've alluded to; maybe I'll make a part 2 in the future.

Discuss

### Thiel on secrets and indefiniteness

4 hours 35 minutes ago
Published on April 20, 2021 9:59 PM GMT

Some excerpts from Peter Thiel's 2014 book Zero to One that I've repeatedly come back to over the years:

[...] Why has so much of our society come to believe that there are no hard secrets left? It might start with geography. There are no blank spaces left on the map anymore. If you grew up in the 18th century, there were still new places to go. After hearing tales of foreign adventure, you could become an explorer yourself. This was probably true up through the 19th and early 20th centuries; after that point photography from National Geographic showed every Westerner what even the most exotic, underexplored places on earth look like. Today, explorers are found mostly in history books and children’s tales. Parents don’t expect their kids to become explorers any more than they expect them to become pirates or sultans. Perhaps there are a few dozen uncontacted tribes somewhere deep in the Amazon, and we know there remains one last earthly frontier in the depths of the oceans. But the unknown seems less accessible than ever.

Along with the natural fact that physical frontiers have receded, four social trends have conspired to root out belief in secrets. First is incrementalism. From an early age, we are taught that the right way to do things is to proceed one very small step at a time, day by day, grade by grade. If you overachieve and end up learning something that’s not on the test, you won’t receive credit for it. But in exchange for doing exactly what’s asked of you (and for doing it just a bit better than your peers), you’ll get an A. This process extends all the way up through the tenure track, which is why academics usually chase large numbers of trivial publications instead of new frontiers.

Second is risk aversion. People are scared of secrets because they are scared of being wrong. By definition, a secret hasn’t been vetted by the mainstream. If your goal is to never make a mistake in your life, you shouldn’t look for secrets. The prospect of being lonely but right—dedicating your life to something that no one else believes in—is already hard. The prospect of being lonely and wrong can be unbearable.

Third is complacency. Social elites have the most freedom and ability to explore new thinking, but they seem to believe in secrets the least. Why search for a new secret if you can comfortably collect rents on everything that has already been done? Every fall, the deans at top law schools and business schools welcome the incoming class with the same implicit message: "You got into this elite institution. Your worries are over. You’re set for life." But that’s probably the kind of thing that’s true only if you don’t believe it.

Fourth is "flatness." As globalization advances, people perceive the world as one homogeneous, highly competitive marketplace: the world is "flat." Given that assumption, anyone who might have had the ambition to look for a secret will first ask himself: if it were possible to discover something new, wouldn’t someone from the faceless global talent pool of smarter and more creative people have found it already? This voice of doubt can dissuade people from even starting to look for secrets in a world that seems too big a place for any individual to contribute something unique.

There’s an optimistic way to describe the result of these trends: today, you can’t start a cult. Forty years ago, people were more open to the idea that not all knowledge was widely known. From the Communist Party to the Hare Krishnas, large numbers of people thought they could join some enlightened vanguard that would show them the Way. Very few people take unorthodox ideas seriously today, and the mainstream sees that as a sign of progress. We can be glad that there are fewer crazy cults now, yet that gain has come at great cost: we have given up our sense of wonder at secrets left to be discovered.

The World According to Convention

How must you see the world if you don’t believe in secrets? You’d have to believe we’ve already solved all great questions. If today’s conventions are correct, we can afford to be smug and complacent: "God’s in His heaven, All’s right with the world."

For example, a world without secrets would enjoy a perfect understanding of justice. Every injustice necessarily involves a moral truth that very few people recognize early on: in a democratic society, a wrongful practice persists only when most people don’t perceive it to be unjust. At first, only a small minority of abolitionists knew that slavery was evil; that view has rightly become conventional, but it was still a secret in the early 19th century. To say that there are no secrets left today would mean that we live in a society with no hidden injustices.

In economics, disbelief in secrets leads to faith in efficient markets. But the existence of financial bubbles shows that markets can have extraordinary inefficiencies. (And the more people believe in efficiency, the bigger the bubbles get.) In 1999, nobody wanted to believe that the internet was irrationally overvalued. The same was true of housing in 2005: Fed chairman Alan Greenspan had to acknowledge some "signs of froth in local markets" but stated that "a bubble in home prices for the nation as a whole does not appear likely." The market reflected all knowable information and couldn’t be questioned. Then home prices fell across the country, and the financial crisis of 2008 wiped out trillions. The future turned out to hold many secrets that economists could not make vanish simply by ignoring them. [...]

Thiel argues that there are many discoverable (or discovered-but-not-widely-known) truths you can use to get an edge, make plans, and deliberately engineer a better future.

Some general social facts Thiel cites to argue that people in the US are less interested in secrets than they were in, e.g., the 1950s:

• From ch. 1: We live in an age of globalization, rather than technology innovation—outside of information technology, we've seen a Great Stagnation in new ideas and innovations since the 1970s-80s. We speak as though prosperous nations are "developed" as opposed to "developing," and focus on improving the world by spreading ideas that have already worked, rather than by coming up with radically new ideas.
• From ch. 2: As a result of the Dot-Com Bubble, even IT has become allergic to developing big new plans and ideas.
• From ch. 6: The US in the 1950s had "definite optimism," whereas the present-day US has "indefinite optimism". We think things are going to get better, but we're skeptical that we can learn anything that will help us concretely plan or invent the better thing.
• From ch. 13: The 2005-2010 cleantech bubble again shows companies relying on incremental improvements, conventional and widely shared knowledge, and vague ungrounded optimism.

In an indefinite world, according to Thiel...

Process trumps substance: when people lack concrete plans to carry out, they use formal rules to assemble a portfolio of various options. This describes Americans today. In middle school, we’re encouraged to start hoarding "extracurricular activities." In high school, ambitious students compete even harder to appear omnicompetent. By the time a student gets to college, he’s spent a decade curating a bewilderingly diverse résumé to prepare for a completely unknowable future. Come what may, he’s ready—for nothing in particular. [...]

Instead of working for years to build a new product, indefinite optimists rearrange already-invented ones. Bankers make money by rearranging the capital structures of already existing companies. Lawyers resolve disputes over old things or help other people structure their affairs. And private equity investors and management consultants don’t start new businesses; they squeeze extra efficiency from old ones with incessant procedural optimizations. It’s no surprise that these fields all attract disproportionate numbers of high-achieving Ivy League optionality chasers; what could be a more appropriate reward for two decades of résumé-building than a seemingly elite, process-oriented career that promises to "keep options open"? [...]

While a definitely optimistic future would need engineers to design underwater cities and settlements in space, an indefinitely optimistic future calls for more bankers and lawyers. Finance epitomizes indefinite thinking because it’s the only way to make money when you have no idea how to create wealth. If they don’t go to law school, bright college graduates head to Wall Street precisely because they have no real plan for their careers. And once they arrive at Goldman, they find that even inside finance, everything is indefinite. It’s still optimistic—you wouldn’t play in the markets if you expected to lose—but the fundamental tenet is that the market is random; you can’t know anything specific or substantive; diversification becomes supremely important.

Indefinite Finance

The indefiniteness of finance can be bizarre. Think about what happens when successful entrepreneurs sell their company. What do they do with the money? In a financialized world, it unfolds like this:

• The founders don’t know what to do with it, so they give it to a large bank.
• The bankers don’t know what to do with it, so they diversify by spreading it across a portfolio of institutional investors.
• Institutional investors don’t know what to do with their managed capital, so they diversify by amassing a portfolio of stocks.
• Companies try to increase their share price by generating free cash flows. If they do, they issue dividends or buy back shares and the cycle repeats.

At no point does anyone in the chain know what to do with money in the real economy. But in an indefinite world, people actually prefer unlimited optionality; money is more valuable than anything you could possibly do with it. Only in a definite future is money a means to an end, not the end itself.

Indefinite Politics

Politicians have always been officially accountable to the public at election time, but today they are attuned to what the public thinks at every moment. Modern polling enables politicians to tailor their image to match preexisting public opinion exactly, so for the most part, they do. Nate Silver’s election predictions are remarkably accurate, but even more remarkable is how big a story they become every four years. We are more fascinated today by statistical predictions of what the country will be thinking in a few weeks’ time than by visionary predictions of what the country will look like 10 or 20 years from now.

And it’s not just the electoral process—the very character of government has become indefinite, too. The government used to be able to coordinate complex solutions to problems like atomic weaponry and lunar exploration. But today, after 40 years of indefinite creep, the government mainly just provides insurance; our solutions to big problems are Medicare, Social Security, and a dizzying array of other transfer payment programs. It’s no surprise that entitlement spending has eclipsed discretionary spending every year since 1975. To increase discretionary spending we’d need definite plans to solve specific problems. But according to the indefinite logic of entitlement spending, we can make things better just by sending out more checks. [...]

Indefinite Philosophy

From Herbert Spencer on the right and Hegel in the center to Marx on the left, the 19th century shared a belief in progress. (Remember Marx and Engels’s encomium to the technological triumphs of capitalism from this page.) These thinkers expected material advances to fundamentally change human life for the better: they were definite optimists.

In the late 20th century, indefinite philosophies came to the fore. The two dominant political thinkers, John Rawls and Robert Nozick, are usually seen as stark opposites: on the egalitarian left, Rawls was concerned with questions of fairness and distribution; on the libertarian right, Nozick focused on maximizing individual freedom. They both believed that people could get along with each other peacefully, so unlike the ancients, they were optimistic. But unlike Spencer or Marx, Rawls and Nozick were indefinite optimists: they didn’t have any specific vision of the future. [...]

Today, we exaggerate the differences between left-liberal egalitarianism and libertarian individualism because almost everyone shares their common indefinite attitude. In philosophy, politics, and business, too, arguing over process has become a way to endlessly defer making concrete plans for a better future.

Indefinite Life

Our ancestors sought to understand and extend the human lifespan. In the 16th century, conquistadors searched the jungles of Florida for a Fountain of Youth. Francis Bacon wrote that “the prolongation of life” should be considered its own branch of medicine—and the noblest. In the 1660s, Robert Boyle placed life extension (along with "the Recovery of Youth") atop his famous wish list for the future of science. Whether through geographic exploration or laboratory research, the best minds of the Renaissance thought of death as something to defeat. (Some resisters were killed in action: Bacon caught pneumonia and died in 1626 while experimenting to see if he could extend a chicken’s life by freezing it in the snow.)

We haven’t yet uncovered the secrets of life, but insurers and statisticians in the 19th century successfully revealed a secret about death that still governs our thinking today: they discovered how to reduce it to a mathematical probability. "Life tables" tell us our chances of dying in any given year, something previous generations didn’t know. However, in exchange for better insurance contracts, we seem to have given up the search for secrets about longevity. Systematic knowledge of the current range of human lifespans has made that range seem natural. Today our society is permeated by the twin ideas that death is both inevitable and random.

Meanwhile, probabilistic attitudes have come to shape the agenda of biology itself. In 1928, Scottish scientist Alexander Fleming found that a mysterious antibacterial fungus had grown on a petri dish he’d forgotten to cover in his laboratory: he discovered penicillin by accident. Scientists have sought to harness the power of chance ever since. Modern drug discovery aims to amplify Fleming’s serendipitous circumstances a millionfold: pharmaceutical companies search through combinations of molecular compounds at random, hoping to find a hit.

But it’s not working as well as it used to. Despite dramatic advances over the past two centuries, in recent decades biotechnology hasn’t met the expectations of investors—or patients. Eroom’s law—that’s Moore’s law backward—observes that the number of new drugs approved per billion dollars spent on R&D has halved every nine years since 1950. [...]
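Taken at face value, the nine-year halving compounds dramatically; a quick sanity check, with the 1950–2010 window chosen here purely for illustration:

```python
# Cumulative effect of Eroom's law: drugs approved per R&D dollar
# halving every 9 years. The 1950-2010 window is an illustrative
# choice, not a figure from the text.
years = 2010 - 1950
halvings = years / 9
decline_factor = 2 ** halvings
print(f"{halvings:.1f} halvings -> roughly {decline_factor:.0f}x fewer drugs per R&D dollar")
```

Six decades of halving every nine years works out to roughly a hundredfold decline in output per dollar.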

Biotech startups are an extreme example of indefinite thinking. Researchers experiment with things that just might work instead of refining definite theories about how the body’s systems operate. Biologists say they need to work this way because the underlying biology is hard. According to them, IT startups work because we created computers ourselves and designed them to reliably obey our commands. Biotech is difficult because we didn’t design our bodies, and the more we learn about them, the more complex they turn out to be.

But today it’s possible to wonder whether the genuine difficulty of biology has become an excuse for biotech startups’ indefinite approach to business in general. Most of the people involved expect some things to work eventually, but few want to commit to a specific company with the level of intensity necessary for success. It starts with the professors who often become part-time consultants instead of full-time employees—even for the biotech startups that begin from their own research. Then everyone else imitates the professors’ indefinite attitude.

On Thiel's account, people don't believe in secrets, but they do believe in mysteries or things we can't figure out today, though we might know them at some point in the indefinite future. The indefinite optimism he's criticizing doesn't assume we're omniscient, but it assumes that there are relatively few cheat codes or exploits an individual can discover, especially in the domain of "altering and predicting the long-term future".

New discoveries spontaneously pop out of a slot machine, and then go straight to the textbook or the trash heap; and only the gullible will favor unpopular ideas over popular ones.

Discuss

### Hard vs Soft in fields as attitudes towards model collision

April 20, 2021 - 21:57
Published on April 20, 2021 6:57 PM GMT

Many people will describe physics, chemistry, microeconomics and some parts of biology as "hard sciences" while describing psychology, sociology, politics and other parts of biology as "soft sciences". I think this taps into a set of attitudes within each field towards what we might refer to as model collision.

In fields which attempt to describe reality, different systems are described with different models. For example in physics we might use fluid dynamics to describe flowing water, rigid body mechanics to describe the movement of a set of gears, quantum electrodynamics to calculate the energy of a chemical bond between two hydrogen atoms etc. Each model can be considered to cover a certain area of reality, with different models covering different situations. Some models have clearly-delineated boundaries, sometimes there are gaps where nothing is understood, but in many cases it is not clear which model to apply. This could be because the boundaries of a model are not well defined within the model (what counts as a fluid?) or because two or more models overlap in scope (almost all economics and psychology). We can refer to this as a model collision.

For example (to use physics again): should we model not-quite-molten metal as a fluid that can flow, or as a plastic solid body deforming under its own weight? In this case the answer is to experiment first and build a model later. In fields like this, the collision of two different models can be resolved by experiment. It helps that in physics the underlying reality is well understood: in this case we know that the rigidity (or not) of a body is governed by the forces between its particles.

Without the ability to experiment (or experiment reproducibly), as happens often in fields like psychology or politics, the two (or many more) models must end up coexisting. There are two ways for this to resolve: one is "everyone in the field puts appropriate weight on each model when making decisions, while searching for the truth" and the other is "the field splits into several angry mobs trying to prove that their model is the obviously correct one". Sadly the second case is more common (partially because it results in more papers being published). This can also result in models going in and out of fashion according to political concerns of those working to get grant funding. (I believe the hypothesis that protein aggregates cause Alzheimer's is finally going out of fashion, hurrah!)

So is there a way to shift from the second scenario to the first? Well, as stated above, if we can simply do experiments, we can find out the answer. Demanding mathematical rigour in our models can also help, as it allows us to compare them more meaningfully (in some cases in chemistry, once two models are known to be accurate in opposite extreme cases, we simply interpolate numerically between them for intermediate cases). We can also try to push the culture away from the warring-mobs scenario toward the more collaborative one.
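The numerical-interpolation trick mentioned above can be sketched as follows; both "models" and the blending function here are invented purely for illustration:

```python
import math

# Toy illustration: two models, each accurate in one extreme regime,
# blended smoothly for intermediate cases. Both models are made up.
def model_low(x):
    # accurate for small x
    return x

def model_high(x):
    # accurate for large x
    return x ** 2 / (x + 1)

def blended(x, x0=1.0, width=0.5):
    # Logistic weight: smoothly switch from model_low to model_high
    # around x = x0, over a transition region of scale `width`.
    w = 1 / (1 + math.exp(-(x - x0) / width))
    return (1 - w) * model_low(x) + w * model_high(x)

# Far from the crossover, the blend tracks whichever model is trusted there.
print(blended(0.1), blended(10.0))
```

The point is not the particular blending function, but that once both models are quantitative and validated in their own regimes, the hand-off between them can itself be made quantitative.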

As an aside: If we can put different scientific disciplines on a scale based on how they handle model collision, what happens if we go off the deep end of model coexistence? I think we end up with disciplines like film, literature or art analysis. Here various theories of analysis are explicitly allowed to coexist, and are more about rationalizing why a piece of art has an effect, rather than trying to predict the effect of new art. Note that softer disciplines can definitely still have something interesting to say, though evaluating the field as a whole will probably not allow you to make predictions about it.

Discuss

### Young kids catching COVID: how much to worry?

April 20, 2021 - 21:03
Published on April 20, 2021 6:03 PM GMT

Low confidence, slapdash job. Just putting this up in case other people want to compare notes.

Now that all the adults in my family have been (at least partially) vaccinated, my kids will soon be the most COVID-vulnerable members of my family. Therefore it is newly decision-relevant to get a good sense for exactly how worried I should be about them catching COVID-19. This is April 2021, Boston area, with a 2yo and 6yo.

Target audience: Frequent lesswrong.com readers. Everyone else, go away. This is written for people who treat 1-in-10,000 risks as dramatically, wildly, viscerally scarier than 1-in-100,000 risks, people who understand that “zero risk” is a thing that does not exist in our universe, people who understand that life is full of tradeoffs, including between mental health and physical health, etc. etc. etc.

1. Death from COVID-19:

According to Fig. 2 of this paper, written in August 2020, the IFR is ~3/100,000 for age 0-4 and <1/100,000 for age 5-9. I personally can probably adjust that downward given the known lack of risk factors. So that's very low—not worth sacrificing significant quality-of-life over. (That’s like a month or two out of a reasonable fatality risk budget, I figure—even less since we’re not making decisions that swing the risk of COVID-19 infection all the way from 0% to 100%.)
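One way to unpack the "month or two" framing is to convert the IFR into micromorts; the ~1-micromort-per-day "ordinary life" baseline used below is a commonly cited rule of thumb, not a figure from this post:

```python
# IFR of ~3 per 100,000 expressed in micromorts (1 micromort = a
# one-in-a-million chance of death). The ~1 micromort/day baseline
# for ordinary background risk is an assumed rule of thumb.
ifr = 3 / 100_000
micromorts = ifr * 1_000_000            # ~30 micromorts
days_of_baseline = micromorts / 1.0     # at ~1 micromort/day
print(f"{micromorts:.0f} micromorts ≈ {days_of_baseline:.0f} days of baseline risk")
```

Thirty micromorts at roughly one per day of ordinary life is about a month of background risk, which is where the "month or two" budget figure plausibly comes from.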

2. MIS-C:

“Multisystem Inflammatory Syndrome in Children” is a frightening syndrome that can produce severe problems including heart problems, neurological symptoms, strokes, and so on. CDC says (via NYT) that they know of 3185 cases (of which 1% were fatal, but death is already included in the previous section) as of this writing. I'll ignore the possibility that there are more MIS-C cases that the CDC doesn’t know about—this is a pretty serious and well-publicized condition, I presume that most kids with MIS-C are being hospitalized and diagnosed. CDC says most cases of MIS-C were ages 1-14, which is I guess a population of 60M in the USA. I dunno how many kids have been infected with COVID total, but if it’s similar to the prevailing rate (figure 28% including undetected cases), then we’re around 2-in-10,000 risk of getting MIS-C, conditional on catching COVID. (The number of detected cases in kids is disproportionately low compared to the rest of the population, I think, but I’ve always just been figuring that they’re less likely to be symptomatic than adults and therefore have an unusually low detection rate.)
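The back-of-envelope arithmetic above can be reproduced directly (all inputs are the post's own figures; the 28% cumulative-infection rate is the author's assumption):

```python
# Reproducing the post's MIS-C estimate. All inputs are the post's
# own figures; the 28% infection rate is the author's assumption.
misc_cases = 3185           # MIS-C cases known to the CDC
kids_population = 60e6      # approx. US population aged 1-14
infection_rate = 0.28       # assumed fraction of kids ever infected

infected_kids = kids_population * infection_rate
risk = misc_cases / infected_kids
print(f"P(MIS-C | infected) ≈ {risk * 1e4:.1f} in 10,000")
```

This lands at roughly 1.9 in 10,000, matching the "around 2-in-10,000" figure in the text.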

Mayo clinic says “In rare cases, MIS-C could result in permanent damage or even death.”, which (accidentally) implies that almost all the time, kids who get MIS-C fully recover without permanent damage. That’s not a great source, but whatever. Also, this says that 7/7 MIS-C cases at a particular hospital were “fully recovered”.

So I figure, conditional on a kid catching COVID, there’s a 2-in-10,000 risk of getting MIS-C, going through a somewhat terrifying ordeal, but eventually fully recovering. And, there's a, I dunno, 1-in-100,000 risk of permanent problems. Again, combine that with the fact that I’m not making decisions that swing the risk of COVID-19 infection all the way from 0% to 100%, and I find this a pretty much acceptable price in cases where I’m spending it on real benefits in my kids’ mental health and quality-of-life. Unless my numbers are wrong of course. So I'm pretty much ignoring MIS-C too. The next two categories seem much worse than that.

3. Long COVID:

“Children with long covid” (New Scientist, Feb 2021) says “Evidence from the first study of long covid in children suggests that more than half of children aged between 6 and 16 years old who contract the virus have at least one symptom lasting more than 120 days, with 42.6 per cent impaired by these symptoms during daily activities.” What??? 43%? No way. That’s way too high. This article calls it rare. 43% is not rare.

The 43% statistic comes from Preliminary Evidence on Long COVID in children. It seems like a helpful article but I don’t know what to make of the selection bias. Where exactly did they get these kids? “This cross-sectional study included all children ≤18 year-old diagnosed with microbiologically-confirmed COVID-19 in Fondazione Policlinico Universitario A. Gemelli IRCCS (Rome, Italy).” That should disproportionately sample sick kids, and especially severely sick kids, right? So I’m going to ignore that.

The New Scientist article also says “The UK Office for National Statistics's latest report estimates that 12.9 per cent of UK children aged 2 to 11, and 14.5 per cent of children aged 12 to 16, still have symptoms five weeks after their first infection.” That’s this link. I’m guessing that the population here is "initially-symptomatic kids" as opposed to "all infected kids". So divide by 2 or 3? And not all of those 12.9% are catastrophic. Some may be kids who are easily-fatigued for 5 weeks then recover, which kinda sucks but isn’t that big a deal. I dunno, figure, conditional on a kid catching COVID-19, 2% chance that it’s, like, a really really long and miserable slog that everyone will deeply regret. The rest of the time it's at worst in the ballpark of adding up 1 unusually severe flu + 1 broken leg—lots of pain, hassle, doctors visits, medical bills, missed activities, and so on, but not worse than that. Life goes on.
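A minimal sketch of the adjustment chain above (the divisors and the final 2% figure are the author's judgment calls, not measured quantities):

```python
# The post's adjustment: the ONS 12.9% figure is assumed to be based
# on initially-symptomatic kids, so dividing by 2-3 approximates the
# rate among all infected kids. Both divisors are guesses.
ons_rate = 0.129                                 # symptoms at 5 weeks, ages 2-11
adjusted = [ons_rate / d for d in (2, 3)]        # ~4-6% with any lingering symptoms
print([f"{x:.1%}" for x in adjusted])
```

From that ~4–6% "any lingering symptoms" range, the post then guesses that only a fraction are severe, arriving at the ~2% "really miserable slog" estimate.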

4. Long-term complications:

Maybe you catch COVID as a kid and then there’s a 1% higher risk of heart disease decades later. Or something else. How would we even know?

My general impression is that kids’ bodies are generally good at recovering and rebuilding themselves over time. But that’s not really based on anything. An example in the opposite direction is polio: I guess polio kills nerve cells in a way that’s unrepairable, and which gradually gets worse and worse over decades after apparent recovery? Is the nervous system unusually hard to repair?? Because, um, COVID often impacts the nervous system too, right?! Yeesh.

I have no idea, I’m out of my depth here.

I guess I'll say 1% chance of a big-deal long-term latent problem, conditional on catching COVID-19. That's not really based on anything, but I need a number because I have to make decisions and weigh tradeoffs. Happy for any input here.

Conclusions:

So, conditional on a kid catching COVID, I guess I'm currently thinking that I should mainly be weighing a ~2% chance of a miserable months-long ordeal until they recover, plus (overlapping) ~1% chance of a big-deal long-term latent problem that will show up later in life.

OK, I guess when I multiply everything out right now…

• Risk rounds to zero, do it without thinking twice.
• Fully-vaccinated adults (6+ weeks past 1st dose) spending time indoors unmasked with my kids
  • If the adults have not recently been spending extensive (or unmasked) time indoors with unvaccinated people, then no problem, don’t even think twice about it. If they have, then try to avoid it, but maybe it’s OK from time to time if there’s sufficient social benefit.
• Somewhat-vaccinated adults (2-6 weeks past 1st dose) spending time indoors unmasked with my kids
  • If necessary. Depends on what the adult has been up to and how much social benefit we’re getting out of it.
• Kids go to school (masked)
  • We're already doing that. Our local school is pretty good about ventilation and masking, and has mandatory universal weekly PCR testing. Any remaining risk is more than compensated by the very large benefits for both kids & parents.
• Kids spending time inside with other kids
  • I guess on rare occasions if there’s a sufficient social benefit. But definitely try to keep such activities outside, until community spread goes down from its current high levels. Oh, I guess there should be an exception for school classmates, since they're already spending time together inside masked, every day at school. But go for masks and open windows.
• Kids tag along shopping indoors, masked
  • Check the microcovid calculator, but probably not if it’s avoidable, at least not until community spread goes down from its current high levels.

Discuss

### Types of generalism

April 20, 2021 - 11:22
Published on April 20, 2021 8:22 AM GMT

[Cross-posted from here]

I am interested in the nature of inter- and transdisciplinary research, which often involves some notion of “generalism”. There are different ways to further conceptualize generalism in this context.

First, a bit of terminology that I will rely on throughout this post: I call the bodies of knowledge from which insights are being drawn “source domains”. The body of knowledge that is being informed by this approach is called the “target domain”.

Directionality of generalism

We can distinguish SFI-style generalism from FHI-style generalism (h/t particlemania for first formulating this idea):

• In the case of SFI-style generalism, the source domain is fixed and the approach is to maintain a portfolio of target domains that may gain value from “export”.
• In the case of FHI-style generalism, the target domain is fixed and the approach is to build a portfolio of diverse source expertise.

In the case of SFI, their source domain is the study of complex systems, which they apply to topics as varied as life and intelligence, cities, economics and institutions, opinion formation, etc.

In the case of FHI, the target domain is fixed (though only loosely) via the problem of civilization-scale consequentialism, and the source domains include philosophy, international relations, machine learning, and more.

Full vs partial generalism

Partial generalism: Any one actor should focus on one (or a similarly small number of) source domains to draw from.

Arguments:

• Ability: Any one actor can only be well-positioned to work with a small number of source domains because doing this work well requires expertise with the source domain. Expertise takes time to develop, so naturally, the number of source domains a single person will be able to draw upon (with adequate epistemic rigor) is limited.
• Increasing returns to depth: The deeper an actor’s expertise in the two fields they are translating between, the higher the expected value of their work. This can apply to individual researchers as well as to a team/organization doing generalist research.

Full generalism: As long as you fix your target domain, an actor can and should venture into many source domains.

Arguments:

• Ability: An actor can do high-quality research while drawing from a (relatively) large number of source domains, some of which they only learn about as they discover them. This “ability” could come from several sources:
  • The researcher’s inherent cognitive abilities
  • The structure (i.e. lack of depth) of the field (sometimes a field is sufficiently shallow that the assumption that someone can get adequately oriented within it is justified)
  • Sufficiently effective error-correction mechanisms within the intellectual community (meaning that, even if an individual starts out getting some important things wrong, those mistakes will be readily discovered and corrected)
• Increasing returns to scope: The richer (in intellectual diversity) an actor’s expertise, the juicier the insights. Again, this argument could apply to an individual or groups of individuals working closely together.

Note that you can achieve full generalism at an organizational level while having a team of individuals that all engage in partial generalism.

Discuss

### Does an app/group for personal forecasting exist?

April 20, 2021 - 08:04
Published on April 20, 2021 5:04 AM GMT

I'm interested in personal forecasting - predicting my own future behavior on a range of timescales. I see it as a more useful skill than forecasting on world events. Formulating personally useful forecasts seems like an important and neglected skill in the rationalist community. And it would be nice to have some company, and tools to make it more convenient. Right now, I'm just using a spreadsheet. Does anybody know if there are groups doing this sort of thing? Is there a good app to manage the process?

Discuss

### Iterated Trust Kickstarters

April 20, 2021 - 06:18
Published on April 20, 2021 3:18 AM GMT

Epistemic Status: I haven't actually used this through to completion with anyone. But, it seems like a tool that I expect to be useful, and it only really works if multiple people know about it.

In this post, I want to make you aware of a few things:

Iterated kickstarters: Kickstarters where the payment doesn't all go in at once – instead people pay in incrementally, after seeing partial progress on the goal. (Or, if you don't actually have a government-backed assurance contract, people pay in incrementally as they see other people pay in incrementally, so the system doesn't require as much trust to bootstrap.)

Trust kickstarters: Kickstarters that are not about money, and are instead about "do we have the mutual trust, goodwill and respect necessary to pull a project or relationship off?" I might be scared to invest into my relationship with you, if I don't think you're invested in me.

Iterated trust kickstarters: Combining those two concepts.

Iterated Kickstarters

In The Strategy of Conflict, Thomas Schelling (of Schelling point fame) poses a problem: say you have a one-shot coordination game. If Alice puts in a million dollars, and her business partner Bob puts in a million dollars, they both get 10 million dollars. But if only one of them puts in a million, the other can abscond with it.

A million dollars is a lot of money for most people. Jeez.

What to do?

Well, hopefully you live in a society that has built well-enforced laws around assurance contracts (aka "kickstarters"). You put in a million. If your partner backs out, the government punishes them, and/or forces them to return the money.

But what if there isn't a government? What if we live in the Before Times, and we're two rival clans who for some reason have a temporary incentive to work together (but still incentive to defect)? What if we live in present day, but Alice and Bob are two entirely different countries with no shared tradition of cooperation?

There are a few ways to solve this. But one way is to split the one-shot dilemma into an iterated game. Instead of putting in a million dollars, you each put in $10. If you both did that, then you each put in another $10, and another. Now that the game is iterated, the payoff structure changes from a prisoner's dilemma to a stag hunt. Sure, at any given time you could defect, but you'd be getting a measly $10, and giving up on a massive $10 million potential payoff.
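The incentive comparison above can be made concrete (treating the defector's gain as a single installment and ignoring reputation effects is a simplification of Schelling's setup):

```python
# Toy accounting for the iterated kickstarter. The numbers are the
# post's illustrative figures; modeling "defect" as pocketing only the
# partner's latest installment is a simplifying assumption.
installment = 10
stake = 1_000_000
payoff = 10_000_000

gain_from_defecting = installment        # grab the other side's latest $10
gain_from_cooperating = payoff - stake   # net $9,000,000 at the end
print(gain_from_defecting, gain_from_cooperating)
```

Iterating shrinks the temptation from a million dollars to ten, while leaving the cooperative prize untouched, which is exactly why the game flips from a prisoner's dilemma to a stag hunt.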

You see small versions of this fairly commonly on craigslist or in other low-trust contract work. "Pay me half the money up front, and then half upon completion."

This still sometimes results in people running off with the first half of the money. I'm assuming people do "half and half" instead of splitting it into even smaller chunks because the transaction costs get too high. But for many contractors, there are benefits to following through (instead of taking the money and running), because there's still a broader iterated game of reputation, and getting repeat clients, who eventually introduce you to other clients, etc.

(You might say that the common employment model of "I do a week of work, and then you pay me for a week of work, over and over again" is a type of iterated kickstarter).

If you're two rival clans of outlaws trying to bootstrap trust, it's potentially fruitful to establish a tradition of cooperation, where the long-term payoff is better than any individual chance to defect.

Trust Kickstarters

Meanwhile: sometimes the thing that needs kickstarting is not money, but trust and goodwill.

Goodwill kickstarters

I've seen a few situations where multiple parties feel aggrieved, exhausted, and don't want to continue a relationship anymore. This could happen to friends, lovers, coworkers, or project-cofounders.

They each feel like the other person was more at fault. They each feel taken advantage of, and like it'd make them a doormat if they went and extended an olive branch when the other guy hasn't even said "sorry" yet.

This might come from a pure escalation spiral: Alice accidentally is a bit of a jerk to Bob on Monday. Then Bob feels annoyed and acts snippy at Alice on Tuesday. Then on Wednesday Alice is like "jeez Bob what's your problem?" and then is actively annoying as retribution. And by the end of the month they're each kinda actively hostile and don't want to be friends anymore.

Sometimes, the problem stems from cultural mismatches. Carl keeps being late to meetings with Dwight. For Dwight, "not respecting my time" is a serious offense that annoys him a lot. For Carl, trying to squeeze in a friend hangout when you barely have time is a sign of love (and meanwhile he doesn't care when people are late). At first, they don't know about each other's different cultural assumptions, and they just accidentally 'betray' each other. Then they start getting persistently mad about the conflict and accrue resentment.

Their mutual friend Charlie comes by and sees that Alice and Bob are in conflict, but that the conflict is all downstream of a misunderstanding, or a minor mishap that really didn't need to become a big deal.

"Can't you just both apologize and move on?" asks Charlie.

But by now, after months of escalation, Alice and Bob have both done some things that were legitimately hurtful to each other, or have mild PTSD-like symptoms around each other.

They'd be willing to sit down, apologize, and work through their problems, if the other one apologized first. When they imagine apologizing first, they feel scared and vulnerable.

I'll be honest: I feel somewhat confused about how best to relate to this sort of situation. I'm currently relating to it through the lens of game theory. I can imagine that the best advice for most people is to not overthink it and not stress about game theory. Maybe you should just let your hearts and bodies talk to each other, elephant to elephant.

But... also, it seems like the game theory is just really straightforward here. A "goodwill kickstarter" really should Just Work in these circumstances. If it's true that "I would apologize to you if you apologized to me", and vice versa, holy shit, why are you two still fighting?

Just, agree that you will both apologize conditional on the other person apologizing, and that you would both be willing to re-adopt a friendship relational stance conditional on the other person doing that.

And then, do that.
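The conditional structure here is the same as an assurance contract: nobody's apology "fires" unless everyone's does, so nobody has to go first. A minimal sketch (the names and the function are illustrative, not any real protocol):

```python
# Toy "goodwill kickstarter": each party pledges "I'll apologize if they
# will." The apologies only trigger once everyone has pledged, so a lone
# pledger is never left exposed.

def goodwill_kickstarter(pledges):
    """pledges maps each person to True if they've made the conditional offer."""
    if all(pledges.values()):
        return {name: "apologizes" for name in pledges}
    return {name: "keeps waiting" for name in pledges}

print(goodwill_kickstarter({"Alice": True, "Bob": True}))
# {'Alice': 'apologizes', 'Bob': 'apologizes'} - neither had to go first
print(goodwill_kickstarter({"Alice": True, "Bob": False}))
# {'Alice': 'keeps waiting', 'Bob': 'keeps waiting'} - Alice's offer cost her nothing
```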

Competence Kickstarter

Alternately, you might need to kickstart "trust in competence."

Say that Joe keeps screwing up at work – he's late, he's dropping the ball on projects, he's making various minor mistakes, he's communicating poorly. And his boss Henry has started getting angry about it, nagging Joe constantly, pressuring Joe to stay late to finish his work, constantly micromanaging him.

I can imagine some stories here where Joe was "originally" the one at fault (he was just bad at his job for some preventable reason one week, and then Henry started getting mad). I can also imagine stories here where the problems stemmed originally from Henry's bad management (maybe Henry was taking some unrelated anger out on Joe, and then Joe started caring less about his job).

Either way, by now they can't stand each other. Joe feels anxious heading into work each day. Henry feels like talking to Joe isn't worth it.

They could sit down, earnestly talk through the situation, take stock of how to improve it. But they don't feel like they can have that conversation, for two reasons.

One reason is that there isn't enough goodwill. The situation has escalated and both are pissed at each other.

Another reason, though, is that they don't trust each other's competence.

Manager Henry doesn't trust that Joe can actually reliably get his work done.

Employee Joe doesn't believe that Henry can give Joe more autonomy, talk to him with respect, etc.

In some companies and some situations, by this point it's already too late. It's pretty overdetermined that Henry fires Joe. But that's not always the right call. Maybe Henry and Joe have worked together long enough to remember that they used to be able to work well together. It seems like it should be possible to repair the working relationship. Meanwhile Joe has a bunch of talents that are hard to replace – he built many pieces of the company infrastructure, and training a new person to replace him would be costly. And there are a bunch of nice things about the company they work at that make Joe prefer not to have to quit and find a better job elsewhere.

To repair the relationship, Henry needs to believe that Joe can start getting work done reliably. Joe needs to believe that Henry can start treating him with respect, without shouting angrily or micromanaging.

This only works if they in fact both can credibly signal that they will do these things. This works if the missing ingredient is "just try harder." Maybe the only reason Joe isn't working reliably is that he no longer believes it's worth it, and the only reason Henry is being an annoying manager is that he felt like he needed to get Joe to get his stuff done on time.

In that case, it's reasonably straightforward to say: "I would do my job if you did yours", coupled with the relational-stance-change of "I would become genuinely excited to be your employee if you became genuinely excited about being my boss".

Sometimes, this won't work. The kickstarter can't trigger because Henry doesn't, in fact, trust Joe to do the thing, even if Joe is trying hard.

But, you can still clearly lay out the terms of the kickstarter. "Joe, here's what I need from you. If you can't do that, maybe I need to fire you. Maybe you need to go on a sabbatical and see if you can get your shit together." Maybe you can explore other possible configurations. Maybe the reason Joe isn't getting his work done is because of a problem at home, and he needs to take a couple weeks off to fix his marriage or something, but would be able to come back and be a valuable team member afterwards.

I think having the terms of the kickstarter clearly laid out is helpful for thinking about the problem, without having to commit to anything.

Why think about this in terms of a "kickstarter," rather than just "a deal"? What feels special to me about relationship kickstarters is that relationships (and perhaps other projects) benefit from investment and momentum. If your stance is "I'm ready to jump and execute this plan if only other people were onboard and able to fulfill their end," then you are better positioned to get moving quickly as soon as the others are on board.

The nice thing about the kickstarter frame, IMO, is I can take a relationship that is fairly toxic, and I can set my internal stance to be ready to fix the relationship, but without opening myself up to exploitation if the other person isn't going to do the things I think are necessary on their end.

Iterated Trust Kickstarters

And then, sometimes, a one-shot kickstarter isn't enough.

Henry and Joe

In the case of Henry and Joe: maybe "just try harder" isn't good enough. Joe has some great skills, but is genuinely bad at managing his time. Henry is good at the big picture of planning a project, but finds himself bad at managing his emotions, in a way that makes him bad at actually managing people.

It might be that even if they both really wanted things to work out, and were going to invest fully in repairing their working relationship... the next week, Joe might miss a deadline, and Henry would snippily yell at him in a way that was unhelpful. They both have behavioral patterns that will not change overnight.

In that case, you might want to combine "trust kickstarter" and "iterated kickstarter."

Here, Joe and Henry both acknowledge that they're expecting this to be a multi-week (or month) project. The plan needs to include some slack to handle the fact that they might fuck up a bit, and a sense of what's supposed to happen when one of them screws up. It also needs a mechanism for saying "you know what, this isn't working."

"Iterated Trust Kickstarter" means, "I'm not going to fully start trusting you because you say you're going to try harder and trust me in turn. But, I will trust you a little bit, and give it some chance to work out, and then trust you a bit more, etc." And vice versa.
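The "trust a little, check, trust a bit more" loop, together with the slack and abort mechanism from the Henry-and-Joe plan, can be sketched as a toy simulation. The round count, strike limit, and slip probabilities are all made-up illustrative numbers:

```python
import random

def iterated_trust_kickstarter(p_slip_joe, p_slip_henry, rounds=12,
                               max_strikes=3, rng=random):
    """Each good round, both sides extend one more notch of trust.
    A slip-up by either side costs a strike (the plan's built-in slack);
    exceeding max_strikes triggers the "this isn't working" exit."""
    trust, strikes = 0, 0
    for _ in range(rounds):
        slipped = rng.random() < p_slip_joe or rng.random() < p_slip_henry
        if slipped:
            strikes += 1
            if strikes > max_strikes:
                return "aborted", trust
        else:
            trust += 1
    return "repaired", trust

print(iterated_trust_kickstarter(0.0, 0.0))  # ('repaired', 12): reliable parties ratchet all the way up
print(iterated_trust_kickstarter(1.0, 1.0))  # ('aborted', 0): chronic slippers hit the exit clause
```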

Rebuilding a Marriage

A major reason to want this is that sometimes, you feel like someone has legitimately hurt you. Imagine a married couple who had a decade or so of great marriage, but then ended up in a several-year spiral where they stopped making time for each other and got into lots of fights. Each of them has built up a story in their head where the other person is hurting them. Each of them has done some genuinely bad things (maybe cheated, maybe yelled a lot in a scary way).

Relationships that have gone sour can be really tricky. I've seen a few people end up in states where I think it's legitimately reasonable to be worried their partner is abusive, but also, it's legitimately reasonable to think that the bad behavioral patterns are an artifact of a particularly bad set of circumstances. If Alice and Bob were to work their way out of those circumstances, they could still rebuild something healthy and great.

In those cases, I think it's important for people to be able to invest a little back into the relationship – give a bit of trust, love, apology, etc., as a signal that they think the relationship is worth repairing. But, well, "once bitten, twice shy." If someone has hurt you, especially multiple times, it's sometimes really bad to leap directly into fully trusting the other person.

I think the Iterated Trust Kickstarter concept is something a lot of people do organically without thinking about it in exactly these terms (i.e. lots of people damage a relationship and then slowly/carefully repair it).

I like having the conceptual handle because it helps me think about how exactly I'm relating to a person. It provides a concrete frame for avoiding the failure modes of "holding a relationship at a distance, such that you're basically sabotaging attempts to repair it" and "diving in so recklessly that you end up just getting hurt over and over."

The ITK frame helps me lean hard into repairing a relationship, in a way that feels safe.

(disclaimer: I haven't directly used this framework through to completion, so I can't vouch for it working in practice. But this seems to mostly be a formalization of a thing I see people doing informally that works alright)

Concrete Plans

For an ITK to work out, I think there often needs to be a concrete, workable plan. It may not be enough to just start trusting each other and hope it works out.

If you don't trust each other's competence (either at "doing my day job", or "learning to speak each other's love languages"), then, you might need to check:

• Do Alice and Bob each understand what things they want from one another? If this is about emotional or communication skills they don't have, do they have a shared understanding of what skills they are trying to gain and why those skills will help?
• Do they have an actual workable plan for gaining those skills?

Say that Bob has tried to get better at communication a few times, but he keeps running into the same ugh fields which prevent him from focusing on the problem. He and Alice might need to work out a plan together for navigating those ugh fields before Alice will feel safe investing more in the relationship.

And if Alice is already feeling burned, she might already be so estranged that she's not willing to help Bob come up with a plan to navigate the ugh-fields. "Bob, my terms for the initial step in the kickstarter are that I need you to have already figured out how to navigate ugh fields on your own, before I'm willing to invest anything."

Unilaterally Offering Kickstarters

Part of why I'd like to have this concept in my local rationalist-cultural-circles is that I think it's pretty reasonable to extend a kickstarter offer unilaterally, if everyone involved is already familiar with the concept and you don't have to explain it.

(New coordinated schemes are costly to evaluate, so if your companion isn't already feeling excited about working with you on something, it may be asking too much of them to listen to you explain Iterated Trust Kickstarters in the same motion as asking them to consider "do you want to invest more in your relationship with me?")

But it feels like a useful tool to have in the water, available when people need it.

In many of the examples so far, Alice and Bob both want the relationship to succeed. But sometimes, Alice has totally given up on the relationship. Bob may also feel burned by Alice, but he at least feels there's some potential value on the table. And it'd be nice to easily be able to say:

"Alice, for what it's worth, I'd be willing to talk through the relationship, figure out what to do, and do it. I'm still mad, but I'd join the Iterated Kickstarter here." If done right, this doesn't have to cost Bob anything other than the time spent saying the sentence, and Alice the time spent listening to it. If Alice isn't interested, that can be the end of that.

But sometimes, knowing that someone else would put in effort if you also would, is helpful for rekindling things.

Discuss

### How can we increase the frequency of rare insights?

April 20, 2021 - 03:12
Published on April 19, 2021 10:54 PM GMT

In many contexts, progress largely comes not from incremental progress, but from sudden and unpredictable insights. This is true at many different levels of scope—from one person's current project, to one person's life's work, to the aggregate output of an entire field. But we know almost nothing about what causes these insights or how to increase their frequency.

Incremental progress vs. sudden insights

To simplify, progress can come in one of two ways:

1. Incremental improvements through spending a long time doing hard work.
2. Long periods of no progress, interspersed with sudden flashes of insight.

Realistically, the truth falls somewhere between these two extremes. Some activities, like theorem-proving, look more like the second case; other activities, like transcribing paper records onto a computer, look more like the first. When Andrew Wiles proved Fermat's Last Theorem, he had to go through the grind of writing a 200-page proof, but he also had to have sparks of insight to figure out how to bridge the missing gaps in the proof.

The axis of incremental improvements vs. rare insights is mostly independent of the axis of easy vs. hard. A task can be sudden and easy, or incremental and hard. For example:[1]

|      | incremental work | sudden insights |
|------|------------------|-----------------|
| easy | algebra homework | geometry homework |
| hard | building machine learning models | proving novel theorems |

Insofar as progress comes from "doing the work", we know how to make progress. But insofar as it comes from rare insights, we don't know.

Some meditations on the nature of insights

Why did it take so long to invent X?

Feynman on finding the right psychological conditions

I worked out the theory of helium, once, and suddenly saw everything. I'd been struggling, struggling for two years, and suddenly saw everything at one time. [...] And then you wonder, what's the psychological condition? Well I know at that particular time, I simply looked up and I said wait a minute, it can't be quite that difficult. It must be very easy. I'll stand back, I'll treat it very lightly, I'll just tap it, and there it was! So how many times since then, I'm walking on the beach and I say, now look, it can't be that complicated. And I'll tap it, tap it, nothing happens.

Feynman tried to figure out what conditions lead to insights, but he "never found any correlations with anything."

P vs. NP

A pessimistic take would be that there's basically no way to increase the probability of insights. Recognizing insights as obvious in retrospect is easy, but coming up with them is hard, and this is a fundamental mathematical fact about reality because P != NP (probably). As Scott Aaronson writes:

If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in "creative leaps," no fundamental gap between solving a problem and recognizing the solution once it's found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett. It’s possible to put the point in Darwinian terms: if this is the sort of universe we inhabited, why wouldn’t we already have evolved to take advantage of it?

I'm not quite so pessimistic. I agree with Scott Aaronson's basic argument that solving problems is much harder than recognizing good solutions, but there might still be ways we could make it easier to solve problems.

johnswentworth on problems we don't understand

The concept of sudden-insight problems relates to johnswentworth's concept of problems we don't understand. Problems we don't understand almost always require sudden insights, but problems that require sudden insights might be problems we understand (for example, proving theorems). johnswentworth proposes some types of learning that could help:

• Learn the gears of a system, so you can later tackle problems involving the system which are unlike any you've seen before. Ex.: physiology classes for doctors.
• Learn how to think about a system at a high level, e.g. enough to do Fermi estimates or identify key bottlenecks relevant to some design problem. Ex.: intro-level fluid mechanics.
• Uncover unknown unknowns, like pitfalls which you wouldn't have thought to check for, tools you wouldn't have known existed, or problems you didn't know were tractable/intractable. Ex.: intro-level statistics, or any course covering NP-completeness.

I would expect these types of learning to increase the rate of insights.

Learning how to increase the frequency of insights

Insights happen less frequently under bad conditions: when you're sleep-deprived, or malnourished, or stressed out, or distracted by other problems. Some actions can increase the probability of insights—for example, by studying the field and getting a good understanding of similar problems. But even under ideal conditions, insights are rare.

Interestingly, most of the things that increase the frequency of insights, such as sleep and caffeine, also increase the speed at which you can do incremental work. It's possible that these things speed up thinking, but don't increase the probability that any particular thought is the "right" one.

I can come up with one exception: you can (probably?) increase the frequency of insights on a problem if you understand a wide variety of problems and concepts. I don't believe this does much to speed up incremental work, but it does make sudden insights more likely. Perhaps this happens because sudden insights often come from connecting two seemingly-unrelated ideas. I've heard some people recommend studying two disparate fields because you can use your knowledge of one field to bring a unique perspective to the other one.

Overall, though, it seems to me that we as a society basically have no idea how to increase insights' frequency beyond a basic low level.

Instead of directly asking how to produce insights, we can ask how to learn how to produce insights. If we wanted to learn more about what conditions produce insights, how might we do that? Could we formally study the conditions under which geniuses come up with genius ideas?

If someone gave me a pile of money and asked me to figure out what conditions best promote insights, what would I do? I might start by recruiting a bunch of mathematicians and scientists to regularly report on their conditions along a bunch of axes: how long they slept, their stress level, etc. (I'd probably want to figure out some axes worth studying that we don't already know much about, since we know that conditions (like sleep quality) do affect cognitive capacity.) Also have them report whenever they make some sort of breakthrough. If we collect enough high-quality data, we should be able to figure out what conditions work best, and disambiguate between factors that help provide insights and factors that "merely" increase cognitive capacity.
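As a sketch of what that analysis might look like once data comes in: a year of simulated daily condition logs with a breakthrough flag, comparing insight days against ordinary days on each recorded factor. The data here is entirely fabricated (the sleep "effect" is baked in purely so the analysis has something to find).

```python
import random

rng = random.Random(1)

# Fabricated stand-in data: one self-report per day, plus whether a
# breakthrough happened. Assumed toy model: insights get likelier with
# sleep and rarer with stress.
logs = []
for _ in range(365):
    sleep = rng.uniform(4, 9)      # hours slept
    stress = rng.uniform(0, 10)    # self-reported stress level
    p_insight = 0.05 + 0.02 * (sleep - 4) - 0.005 * stress
    logs.append({"sleep": sleep, "stress": stress,
                 "insight": rng.random() < max(p_insight, 0.0)})

def mean(xs):
    return sum(xs) / len(xs)

insight_days = [day for day in logs if day["insight"]]
other_days = [day for day in logs if not day["insight"]]
print(f"{len(insight_days)} breakthrough days out of {len(logs)}")
for factor in ("sleep", "stress"):
    print(f"{factor}: {mean([d[factor] for d in insight_days]):.2f} on insight days "
          f"vs {mean([d[factor] for d in other_days]):.2f} otherwise")
```

A real version would add the disambiguation step the post calls for: checking whether a factor raises insight frequency specifically, or merely boosts overall cognitive throughput.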

I'm mostly just speculating here—I'm not sure the best way to study how to have insights. But it does seem like an important thing to know, and right now we understand very little about it.

1. Some more specific examples from things I've worked on:


Discuss

### Quick examination of miles per micromort for US drivers, with adjustments for safety-increasing behavior

April 20, 2021 - 02:19
Published on April 19, 2021 11:19 PM GMT

This post links to a Google Sheet containing a quick investigation into the accuracy of Wikipedia's figure for miles per micromort (230) for US drivers, when accounting for preventative behaviors.

The following are the main outcome estimates:

• Miles per micromort, no adjustments, in US (2019): 91
• If excluding motorcycles: 105
• If excluding motorcycles and pedestrians, pedalcyclists, and other nonoccupants: 137
• Amongst passenger vehicle occupants only: 132
• Amongst passenger vehicle occupants only, if setting single-car crashes to 0: 235
• Amongst passenger vehicle occupants only, if approximating the seatbelt-wearing only rate: 245
• Amongst passenger vehicle occupants only, if setting single-car crashes to 0 and approximating the seatbelt-wearing only rate: 442
• Amongst passenger vehicle occupants only, if setting single-car crashes to 0, approximating the seatbelt-wearing only rate, and setting alcohol-impaired, drowsiness-associated, and distraction-associated deaths to 50% of current level (as an approximation of controlling one driver's behavior in two-driver crashes): 548
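As a sanity check on the unadjusted number: miles per micromort is just total miles driven divided by (fatalities × one million). Plugging in approximate published 2019 US totals (roughly 36,100 traffic deaths and 3.26 trillion vehicle-miles traveled; treat these as this sketch's assumptions, not the sheet's exact inputs):

```python
# 1 micromort = a one-in-a-million chance of death.
deaths_2019 = 36_100     # approx. US motor-vehicle fatalities, 2019
miles_2019 = 3.26e12     # approx. US vehicle-miles traveled, 2019

micromorts_per_mile = deaths_2019 / miles_2019 * 1e6
miles_per_micromort = 1 / micromorts_per_mile
print(round(miles_per_micromort))   # lands near the sheet's unadjusted 91
```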

This rapid (~1.5 hrs including documentation) investigation was funded by Ruby Bloom via the Bountied Rationality FB group.

Discuss

### Wanted: Research Assistant for The Roots of Progress

April 19, 2021 - 22:03
Published on April 19, 2021 7:03 PM GMT

I’m hiring a part-time research assistant to support work on my essays, talks, and the book I’m writing on the history of industrial civilization.

You must have the ability to orient yourself in unfamiliar mental territory; to penetrate the fog of confusing, incomplete, and contradictory information; to sniff out reliable sources of key facts and to corroborate them; and to quickly sketch out a new intellectual landscape.

You will handle queries such as:

• What happened to the price of cotton and the wages of textile laborers before, during, and after textile mechanization in the 18th/19th centuries? Find data and analysis on this, including relevant statistics on labor productivity, and produce a list of sources.
• What startups or other commercial projects are pursuing advanced nuclear reactor designs? Make a list, and fill out details of each in a spreadsheet, such as type of reactor, amount and sources of funding, etc.
• Find first-person accounts of agricultural life and work before the 19th century, including descriptions of regular planting and harvesting seasons, and also times of crop failure or even famine.
• What is the difference between a bloomery, a blast furnace, and a Catalan forge? Make a list of sources that address this question.

The deliverable will typically be a list of sources, with brief notes on what each one contains, ranked roughly in order of relevance to the original query. You don’t have to answer the questions I pose, but you need to find sources that help me answer them.

The only real requirements are writing skills and attention to detail. However, the ideal candidate would be:

• A graduate student in history, economics, or a related field (ideally with access to scholarly sources)
• Familiar with and interested in the progress community in general, and my work in particular
• Able to put in part-time work with fairly quick turnaround (24 hours for small queries would be excellent)

If you lack experience and credentials, apply anyway: you can make up for it by being dedicated, diligent, and willing/able to be trained.

The work will be variable, up to roughly 10–15 hours/week. We’ll mostly communicate by email/messaging, so you can be in any time zone. Pay: $25–30/hour, depending on qualifications.

To apply, send a CV/resume and writing sample to me at jason@rootsofprogress.org.

Discuss

### You Can Now Embed Flashcard Quizzes in Your LessWrong posts!

April 19, 2021 - 16:44
Published on April 19, 2021 1:44 PM GMT

With the help of the LessWrong.com team, we've set up a way for you to embed flashcard quizzes directly in your LessWrong posts! This means that you can write flashcards for any of your LessWrong posts, quiz your readers as they read, and provide an easy way for them to continue to be quizzed after they are done reading, so that they can indefinitely remember the most important things they learned in your article!

This post will explain how to add flashcards to your own posts in a step-by-step fashion.

If you want to see an example of a LessWrong post with flashcards, check out my LessWrong post on self-control, where we first experimented with this feature.

And before we get to the instructions, here's an example (from that same post) of what embedded flashcards look like when you put them right in your article:

Step 1: Create your own deck of flashcards using Thought Saver

Create a Thought Saver account at app.thoughtsaver.com and use it to create some flashcards for your post. You'll need to put all the flashcards for your post into the same deck. Here’s how to do that:

(i) Click “New Card” in Thought Saver to start creating a new flashcard - but don’t save it just yet.

(ii) In the text input box with the label "Decks"...

1. Type the name of the new deck you'd like to create for your article.
2. Hit "Enter" ("Return") or click "Create new deck".
3. This card has now been added to that deck, and this deck will now be available so that you can add all the other flashcards (for your post) to it too!

Example:

• Type "Book summary: The Very Hungry Caterpillar."
• Click 'Create new deck: "Book summary: The Very Hungry Caterpillar."'
• Repeat these steps until you’ve created all flashcards for your article and added them all to this same deck.
Step 2: Go to the Thought Saver page for your new deck

(You’re ready to take this step once you've created all the flashcards for your article and added them to the same deck.)

(i) Navigate to the page for your deck by clicking the name of your deck on one of your flashcards:

Or alternatively, you can access a deck from the search bar by clicking in the search bar and then clicking the deck name when it appears:

(ii) Now set the order of the cards in your deck, so that they appear in the order that you'd like to quiz the reader on them. Click the overflow menu in the top right corner of the page (the 3 vertical dots). Click "Sort". Arrange the cards in the order you want. Users of your deck will be quizzed on the first card first, then the second card, and so on. This allows you to design your cards in such a way that the concepts build on each other. Click "Save" when you’re done sorting.

Step 3: Click the "Share" button for that deck and click "Create Link" within the share window

(Please note that the actual text/verbiage may vary from this screenshot as we are actively iterating on this wording to make this section more understandable.)

IMPORTANT NOTE: The following steps will have to be repeated for each widget/quiz you’d like to embed in your article. We recommend including at least 2 quiz widgets in your article, but for a longer article, you may want to include more.

Step 4: Select which cards from your deck you would like to appear in the quiz for your (first) embedded widget

If you're embedding multiple widgets in your article, we’ll assume that you want to have each widget show different cards (as opposed to certain cards from your deck being repeated in more than one widget).

(i) Enter the appropriate ‘starting card number’ and ‘ending card number’ (based on how you sorted the cards in this deck previously). So for instance, if the starting card number is 3 and the ending card number is 7, that quiz widget will quiz the reader on cards 3 through 7.

(ii) Click “Copy” to copy the embed source URL to your clipboard:

IMPORTANT NOTE: you’ll need to have this URL copied to your clipboard for the steps below.

Example of how to spread the cards from your deck over multiple quizzes:

• You might choose to put the first card through the fifth card [cards 1–5] from your deck in the first flashcard quiz of your LessWrong post
• And then in the next flashcard quiz, you might include cards 6–10, etc.
• Note that at the end of any embedded widget's quiz, users will have the option to subscribe to the full deck in Thought Saver (where they can get daily email quizzes, quiz themselves manually, create their own decks, etc.)

Step 5: Create a new post on LessWrong, or open one you’re currently working on, then click "Edit Block" within your post

If you’re not logged in to your LessWrong account, or if you do not yet have an account, log in or create an account first.

Once you're logged in, open the post you are working on, or create your new post.

When you've reached a point in your post when you'd like to embed a Thought Saver flashcard quiz widget, click the "Edit Block" button to the left of the current line:

NOTE: if you’re starting from a completely blank page, start typing something to make the “Edit Block” button appear or hover your mouse over the area just to the left of the current line you're on.

Step 6. Click "Insert Media" from the options menu

Step 7: Paste the embed URL you copied from Thought Saver, and click Save!

Now you've successfully embedded a Thought Saver flashcard quiz into your LessWrong post!

You may now continue writing your LessWrong post, repeating steps 4 through 7 to embed more flashcard quizzes throughout that same post (as many as you'd like).

We hope you enjoy this new functionality! We'd love to hear your feedback on it and on Thought Saver more generally! Please give us feedback by commenting below, or by clicking the feedback button in the upper right-hand corner of the Thought Saver app.

If you're interested in how to write great flashcards, I'd recommend Andy Matuschak's article how to write good prompts: using spaced repetition to create understanding. Andy and his collaborator Michael Nielsen have been the pioneers in this space of embedding flashcards in essays. I highly recommend their essay Quantum Country where they introduced this medium. You may also want to check out Andy's other work related to this topic.

Thanks!

Discuss

### D&D.Sci April 2021 Evaluation and Ruleset

April 19, 2021 - 16:26
Published on April 19, 2021 1:26 PM GMT

This is a followup to the D&D.Sci post I made last week; if you haven’t already read it, you should do so now before spoiling yourself.

Here is the web interactive I built to let you evaluate your solution; below is an explanation of the rules used to generate the dataset. You’ll probably want to test your answer before reading any further.

Ruleset

(Note: to make writing this easier, I’m using standard D&D dice notation, in which “3+4d8” means “roll four eight-sided dice, sum the results, then add three”.)
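For readers who want to poke at these distributions themselves, here's a small roller for that notation, covering both the additive forms like 3+4d8 and the multiplicative form the Alexandrian Merfolk use (1d8*1d8*1d8+1d20). This is my own helper, not anything from the original interactive:

```python
import random

def roll(spec, rng=random):
    """Roll dice in D&D notation: terms joined by '+', where each term is
    a constant, 'NdS' (the sum of N S-sided dice), or a '*'-product of those."""
    total = 0
    for term in spec.split("+"):
        product = 1
        for factor in term.split("*"):
            if "d" in factor:
                n, sides = factor.split("d")
                product *= sum(rng.randint(1, int(sides))
                               for _ in range(int(n) if n else 1))
            else:
                product *= int(factor)
        total += product
    return total

print(roll("3+4d8"))              # somewhere between 7 and 35
print(roll("1d8*1d8*1d8+1d20"))   # somewhere between 2 and 532
```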

Enemies

Sharks

Sharks are 1/6 of encounters.

They attack in groups of 2+1d4, each of which does 1d10 points of damage.

Demon Whales

Demon Whales are 1/14 of encounters. (If that fraction seems high, you’re failing to account for all the sunk ships that couldn’t report encountering them.)

An attack from a Demon Whale does 17d12 points of damage.

A Demon Whale encounter has a ~78% fatality rate.

Crabmonsters

Crabmonsters are 1/14 of encounters.

A Crabmonster repeatedly rolls 1d80 as it tears through the ship, adding a point of damage with each roll, until it rolls a 1 (that is, encounters someone or something that stops it).

~8% of Crabmonster encounters do >200% damage; a Crabmonster encounter has a ~28% fatality rate.

Pirates

Though the Navy’s records don’t bother to distinguish, Pirates come in two categories: Brigands (local criminals who had the poor fortune to cross paths with Naval supply ships while flying the black flag, and/or to mistake them for civilian cargo ships) and Privateers (agents of an enemy government, harassing your Navy’s fleet using hit-and-run tactics). Brigands are 1/6 of random encounters during your voyages, Privateers 1/21.

A fight with Brigands does 4d8 points of damage; a fight with Privateers does 6d12.

Merpeople

Surface-dwellers are unaware of the intricacies of underwater society, and record both Atlantean Merfolk (1/14 of encounters) and Alexandrian Merfolk (2/21 of encounters) as “Merpeople”. Fortunately, the two city-states are close enough politically that befriending one will cause them both to allow you free passage.

Atlanteans do 20+3d20 damage; Alexandrians do 1d8*1d8*1d8+1d20 damage.

~14% of Alexandrian attacks do >200% damage; an Alexandrian attack has a ~37% fatality rate.

Kraken

Kraken are 2/21 of encounters.

They do 12d8 points of damage.

Nessie

Nessie is 1/21 of encounters.

She does 40+10d8 points of damage.

An encounter with Nessie has a ~2% fatality rate.

Harpies

Harpies are 1/14 of encounters.

They do 1d4+1d8+1d12 points of damage.

Water Elementals

Water Elementals are 2/21 of encounters.

The Navy has countering the powerful but predictable attacks of Water Elementals down to an art; there are well-known methods for ensuring they only almost destroy a given ship. They do 73+1d12 points of damage.

Direction

Direction is irrelevant from perspectives both practical (you have no control over how many trips you take each way) and epistemic (direction happens to have no effect on outcomes).

Time effects

Time has almost no effect. The one exception is that Privateers used to be much more common (and other encounters therefore slightly less common) before 4/1401; this is when your nation’s main rival changed tactics and stopped hiring mercenaries to attack supply ships.

Sinking Risk by Enemy

In the absence of interventions, ~50% of shipwrecks are caused by Demon Whales, ~18% by Crabmonsters, ~31% by Merpeople, ~1% by Nessie, and 0% by other threats.
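The damage rules above can be combined into a quick Monte Carlo check of these shares. The sinking threshold is my assumption (a ship sinks when an encounter deals 100 or more points of damage, consistent with Water Elementals' 74-85 points only "almost" destroying a ship); everything else follows the stated dice rules:

```python
import random
from collections import Counter

rng = random.Random(0)

def d(n, sides):
    """Sum of n rolls of an s-sided die."""
    return sum(rng.randint(1, sides) for _ in range(n))

def crabmonster():
    """Roll 1d80 until a 1; damage equals the number of rolls."""
    dmg = 0
    while True:
        dmg += 1
        if rng.randint(1, 80) == 1:
            return dmg

# (encounter weight out of 42, damage function), per the ruleset above
encounters = {
    "Sharks":          (7, lambda: d(2 + d(1, 4), 10)),
    "Demon Whale":     (3, lambda: d(17, 12)),
    "Crabmonster":     (3, crabmonster),
    "Brigands":        (7, lambda: d(4, 8)),
    "Privateers":      (2, lambda: d(6, 12)),
    "Atlanteans":      (3, lambda: 20 + d(3, 20)),
    "Alexandrians":    (4, lambda: d(1, 8) * d(1, 8) * d(1, 8) + d(1, 20)),
    "Kraken":          (4, lambda: d(12, 8)),
    "Nessie":          (2, lambda: 40 + d(10, 8)),
    "Harpies":         (3, lambda: d(1, 4) + d(1, 8) + d(1, 12)),
    "Water Elemental": (4, lambda: 73 + d(1, 12)),
}

names = list(encounters)
weights = [encounters[n][0] for n in names]

wrecks = Counter()
for _ in range(200_000):
    name = rng.choices(names, weights=weights)[0]
    if encounters[name][1]() >= 100:      # assumed sinking threshold
        wrecks[name] += 1

total = sum(wrecks.values())
for name, count in wrecks.most_common():
    print(f"{name}: {count / total:.0%} of shipwrecks")
```

With the assumed threshold, the simulated shares come out close to the stated ~50/31/18/1 split (with Alexandrian Merfolk accounting for the Merpeople wrecks, since Atlantean damage caps at 80), which is some evidence the guess is right.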

Strategy

If attempting to optimize odds of survival, your best choices are to buy all oars, arm carpenters, tribute the Merpeople, and buy one extra cannon; congratulations to simon, GuySrinivasan and Measure for reaching this conclusion.

However, since Pirates never sink ships and Nessie is pretty bad at it, you may wish to take the money you’d spend on the cannon and either hold onto it (to impress the Navy’s accountants) or spend it on foam swords (to impress the Navy’s dockworkers).

Reflections

All else equal, there’s a little extra uncertainty when predicting quantities instead of categories: “is that sudden peak at 14% noise, or a clue to the generating function?”, etc. However, the main reason this challenge was so much more speculative than its predecessors is that the most important information – details of attacks that did 100%+ damage – was censored by the mechanics of the world. In the absence of hard evidence, small errors in inference compound, priors pick up the slack, and considerations like “what genres apply here?” or “is the scenario designer enough of a troll to have Demon Whale damage arbitrarily cap out at 99%?” take on a significance they wouldn’t otherwise.

This is both good and bad. Good because the personal touch adds intrigue to what would otherwise just be data-wrangling; bad because every unit of effort spent psychoanalyzing the GM is a unit of effort not spent on getting better at data-wrangling or on psychoanalyzing reality’s GM (i.e. studying Math and Science). I enthusiastically solicit feedback on this point, as well as on every other point.

Scheduling

The next D&D.Sci challenge should be ready sometime earlyish next month, but nebulous and open-ended work commitments mean I can’t promise anything.

Discuss

### Parameter count of ML systems through time?

April 19, 2021 - 15:54
Published on April 19, 2021 12:54 PM GMT

Pablo Villalobos and I have been working to compile a rough dataset of parameter counts for some notable ML systems through history.

This is hardly the most important metric about these systems (other interesting metrics we would like to understand better are training and inference compute, and dataset size), but it is nonetheless an important one and particularly easy to estimate.

So far we have compiled what is (to our knowledge) the biggest dataset of parameter counts, with over 100 entries.

But we could use some help to advance the project:

1. Is there any previous relevant work? We are aware of the AI and Compute post by OpenAI, and there are some papers with small tables of parameter counts.
2. If you want to contribute an entry, please do! The key information for an entry is a reference (citation and link), domain (language, vision, games, etc.), the main task the system was designed to solve, parameter count (explained with references so it's easy to double-check), and date of publication. The criteria for inclusion are not very well defined at this stage in the process; we have been focusing on notable papers (>1000 citations), significant SOTA improvements (>10% improvement on a metric over the previous system) and historical relevance (subjective). We mostly have ML/DL/RL papers, and some statistical learning papers. To submit an entry, either leave an answer here, send me a PM, email jaimesevillamolina@gmail.com or leave a comment in the spreadsheet.
3. If you'd be interested in joining the project, shoot me an email. The main commitment is to spend 1h per week curating dataset entries. Our current goal is compiling parameter counts of one system per year between 2000 and 2020 and per main domain. If you can compute the number of parameters of a CNN from its architecture you are qualified. I expect participating will be most useful to people who would enjoy having an excuse to skim through old AI papers.
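For anyone wondering what "computing the number of parameters of a CNN from its architecture" involves, it reduces to a couple of multiplications per layer. A sketch (the helper names are mine, and the layer sizes below are illustrative, not an official count for any particular paper):

```python
def conv2d_params(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Parameters in a standard 2-D convolution: one kernel_h x kernel_w x
    in_channels filter per output channel, plus an optional bias per filter."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def dense_params(in_features, out_features, bias=True):
    """Parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

# Example: two 5x5 conv layers in a small LeNet-style net
print(conv2d_params(1, 6, 5, 5))    # 156
print(conv2d_params(6, 16, 5, 5))   # 2416
```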

Thank you to Girish Sastry and Max Daniel for help and discussion so far!

Discuss

### Problems of evil

April 19, 2021 - 11:06
Published on April 19, 2021 8:06 AM GMT

(Cross-posted from Hands and Cities)

I.

I wasn’t raised in a religious household, but I got interested in Buddhism at the end of high school, and in Christianity and a number of other traditions, early in college. Those were the days of the New Atheists, and of intricate wrangling over theistic apologetics. And I did some of that. I went, sometimes, to the atheist group, and to some Christian ones; I read books, and had long conversations; I watched lectures, and YouTube debates.

Much of the back-and-forth about theism that I engaged with at that point in my life, I don’t think about much, now. But I notice that one bit, at least, has stayed with me, and seemed relevant outside of theistic contexts as well: namely, the problem of evil.

As usually stated, the problem of evil is something like: if God is perfectly good, knowing, and powerful, why is there so much evil in the world? But I think this version is too specific, and epistemic. Unlike many other issues in theistic apologetics, I think the problem of evil — or something in the vicinity — cuts at something much broader than a “three O” (omnipotent, omniscient, omni-benevolent) God. Indeed, I think it cuts past belief, to a certain affirming orientation towards, and commitment to, reality itself — an orientation I think many non-theists, especially of a “spiritual” bent (including a secularized/naturalistic one), aspire towards, too.

II.

My impression is that of the many objections to theism, the problem of evil has, amongst theists, a certain kind of unique status — centrally, in its recognized force, but also in the way this force can apply independent of doubt about God’s existence per se.

Here’s the (devoutly Christian) theologian David Bentley Hart:

“That’s the best argument of all. It’s not an argument regarding God’s existence or non-existence, because that’s a question, first you have to define what existence means, what God means. But it goes directly to the question of divine goodness and benevolence. It’s the weightiest and the most powerful and the one that, actually, is the argument that’s adduced most often by believers, famously Dostoyevsky… It’s the argument that holds the most water for me.”

Indeed, Hart calls various responses to the problem of evil “banal and sometimes quite repulsive”:

“…the Calvinist argument for divine sovereignty, does it really have to justify itself to you morally; or equally, Richard Swinburne’s arguments, forgive me, I hate to name names, about how suffering gives us opportunities for moral goodness, and that includes, apparently, the holocaust… I think ultimately, if that’s the calculus, then God comes out as evil. There’s just no way you work your way to the end of these chains of reasoning, without coming up with an arbitrary and in some ways quite deplorable picture of God.”

(See also the Christian apologist and philosopher Alvin Plantinga, who writes: “I must say that most attempts to explain why God permits evil—theodicies, as we may call them—strike me as tepid, shallow and ultimately frivolous.”)

C.S. Lewis, too (another Christian apologist), seems to have felt the problem of evil with special acuity. In the beginning of The Problem of Pain, he describes why, before his conversion, he rejected Christianity:

“If you asked me to believe that [the pain and seeming indifference of the world] is the work of a benevolent and omnipotent spirit, I reply that all the evidence points in the opposite direction. Either there is no spirit behind the universe, or else a spirit indifferent to good and evil, or else an evil spirit.”

Indeed, A Grief Observed — a book compiled from journals Lewis wrote after his wife (called “H.” in the book) died of cancer — documents a (brief) crisis in this regard: not of faith in God, per se, but of faith in God’s goodness. Wracked by grief, haunted by his wife’s pain, Lewis writes:

“Come, what do we gain by evasions? We are under the harrow and can’t escape. Reality, looked at steadily, is unbearable. And how or why did such a reality blossom (or fester) here and there into the terrible phenomenon called consciousness?…

If H. ‘is not,’ then she never was. I mistook a cloud of atoms for a person. There aren’t, and never were, any people… No, my real fear is not materialism. If it were true, we — or what we mistake for ‘we’ — could get out, get from under the harrow. An overdose of sleeping pills would do it. I am more afraid that we are really rats in a trap. Or worse still, rats in a laboratory….

Sooner or later I must face the question in plain language. What reason have we, except our own desperate wishes, to believe that God is, by any standard we can conceive, ‘good’? Doesn’t all the prima facie evidence suggest exactly the opposite? What have we to set against it?…”

(The first few chapters of A Grief Observed, by the way, are some of my favorite bits of Lewis; and he exhibits, there, a vulnerability and doubt rare amidst his usual confidence).

Like Lewis, Dostoyevsky’s Ivan does not present evil as an objection to God’s existence per se. Indeed, he accepts that at the end of days, he may see the justice of the suffering of children; but he does not want to see it, or accept a ticket to heaven on such terms:

“Oh, Alyosha, I am not blaspheming! I understand, of course, what an upheaval of the universe it will be, when everything in heaven and earth blends in one hymn of praise and everything that lives and has lived cries aloud: ‘Thou art just, O Lord, for Thy ways are revealed.’…  I, too, perhaps, may cry aloud with the rest, looking at the mother embracing the child’s torturer, ‘Thou art just, O Lord!’ but I don’t want to cry aloud then… It’s not God that I don’t accept, Alyosha, only I most respectfully return Him the ticket.”

I think part of what might be going on, in these quotations, is that the problem of evil is about more than metaphysics. Indeed, Lewis dismisses materialism as confidently as ever; Hart sets the question of God’s “existence,” whatever that means, swiftly to the side; Ivan still expects the end of days. The problem of evil shakes them on a different axis — and plausibly, a more important one. It shakes, I think, their love of God, whatever He is. And love, perhaps, is the main thing.

III.

One common response to the problem of evil is: we don’t know why God permits so much evil, but we shouldn’t expect to know, either. He is too far beyond us. His ways are not our ways.

We see some of this, for example, in the book of Job. Job was “perfect and upright” (Job 1:1); but God, in a dispute with the devil about whether Job’s righteousness depends on his material advantages, allows the devil to kill Job’s children, servants, and livestock, and to cover Job’s body with boils. At first, Job refuses to curse God (“the Lord gave, and the Lord hath taken away; blessed be the name of the Lord”). But later, Job complains. Eventually, God appears to him in a whirlwind, to remind him how little he understands:

“Where wast thou when I laid the foundations of the earth? declare, if thou hast understanding. Who hath laid the measures thereof, if thou knowest?” (Job 38:4).

More philosophical versions of this sometimes invoke a chess-master. If you see Garry Kasparov make a chess move that looks bad to you, this need not impugn his mastery.

Response: OK, but if you’re not sure whether it’s Kasparov, or a random move generator, bad moves are evidence. And eventually — as queen and rooks fall, as no hint of strategy emerges — lots of it.

But we can un-know harder: why think you even know what it is to win at chess? Sure, God does bad-seeming things. But what are human concepts of “good” and “bad,” faced with God’s transcendence?

Here’s Lewis, responding to moves like this:

“If God’s moral judgment differs from ours so that our ‘black’ may be His ‘white’, we can mean nothing by calling Him good; for to say ‘God is good’, while asserting that His goodness is wholly other than ours, is really only to say ‘God is we know not what’. And an utterly unknown quality in God cannot give us moral grounds for loving or obeying him.” (p. 567)

Whether this argument actually works isn’t clear. Analogy: if you are devoted to any being that plays for the true win conditions of chess, and you hypothesize, initially, that checkmate constitutes winning, you’ll still end up devoted to a being who plays for a wholly different condition, if that condition turns out to be the true one (thanks to Katja Grace for suggesting objections in this vein; and see, also, the Euthyphro dilemma). But I think Lewis is pointing at an important worry regardless (and indeed, one that hit him hard during the crisis described above): namely, that if we go too far into “unknowing”; if we strip from God too much of what we think of as “goodness”; or if we call too many bad things “good,” then God, and goodness, start to empty out completely.

This worry seems especially salient in the context of contemporary (liberal, academic) theology, which in my experience (though it’s been a few years now), is heavily “apophatic” and mystical. That is, it approaches God centrally in His beyond-ness: beyond language, knowledge, mind and matter, personhood and non-personhood; beyond, even, existence and non-existence. Thus, Meister Eckhart writes of God: “He is being beyond being: he is a nothingness beyond being.” Or John Scotus Eriugena: “Literally God is not, because He transcends being.”

Perhaps God is beyond being. But is he beyond goodness, too? Some bits of Eckhart suggest this. “God is not good, or else he could be better.” (Though, conceptual transcendence aside, this seems like a terrible argument? “Pure black is not dark, or else it could be darker.”) And indeed, if we are to say nothing about God, presumably this includes: nothing good. God is blank.

But what, then — amidst the horrors of this world — grounds worship, reverence, devotion?

I think a variety of non-theists face something like this question, too.

IV.

In my days of talking with lots of people about their spirituality, I learned to ask certain questions to figure out where they were coming from; and whether they believed in God, or even in a “personal God,” wasn’t high on the list. Of Christians, for example, I would generally ask whether they believed in the literal, bodily resurrection of the historical Jesus — a concrete question that I think efficiently distinguishes variants of Christianity (e.g., “I believe in miracles” vs. “well it’s really all a kind of symbolic thing at the end of the day isn’t it?”), and which has some biblical endorsement as central (1 Corinthians 15:14: “if Christ be not risen, then is our preaching vain, and your faith is also vain”).

Similarly, I think of whether someone believes that Ultimate Reality is in some sense “good” as a much more informative question, spiritually speaking, than whether they believe in God, or that e.g., our Universe was created by something like a person. Indeed, I know a variety of vigorously secular folks who take seriously creation stories involving intelligent agents (see, even, Dawkins). And there are many God-related words (the Ground of Being, the Absolute, the Source, the Unconditioned, the Deathless) and concepts (pantheism, Deism) that do not imply anything like goodness (many of which, relatedly, can be compatible with something like naturalism — though the practice of capitalizing letters of abstract, God-related words seems, instructively in this context, in a higher-level sort of tension with the intellectual aesthetic most associated with naturalism).

But I don’t think that “belief” — whether in divine goodness or no — really captures what matters, either. Indeed, mystical/apophatic traditions like Eckhart’s often focus on negating and/or going beyond concepts — and “belief” is tough without concepts. Does Eckhart’s God exist? Does a dog have Buddha nature? Mu.

More broadly, the relationship between “spirituality” and explicit belief seems, at least, complex. Consider Ginsberg (1956):

“The world is holy! The soul is holy! The skin is holy! The nose is holy! The tongue and cock and hand and asshole holy!

Everything is holy! everybody’s holy! everywhere is holy! everyday is an eternity! Everyman’s an angel!”

What is the “belief” here? Not, clearly, that men don’t murder, or that clocks don’t tick. And looking out at the panoply of spiritual practices, communities, and experiences to which even metaphysically-naturalistic folk devote passionate energy, belief (even of a fuzzy, inconsistent, and/or motte-and-bailey kind) hardly seems the main thing going on.

But if we set aside belief — and especially, if we endeavor to avoid belief of the kind that makes apologetics, metaphysics, etc necessary at all — is it all just “sexed up atheism,” in Dawkins’s phrase, or non-sense? I think that at the very least, there are interesting differences between how e.g. Meister Eckhart is orienting towards the world, and how straw-Dawkins (even when appreciative of the world’s beauty) is doing so — differences separable from their respective metaphysics, but core to their respective “spirituality.”

V.

Dawkins, Carroll, Sagan, Tyson: all are keen to remind us of the wonder and awe compatible with naturalism (indeed, Kriss (2016) goes so far as to accuse popular atheists of peddling forms of beauty that discourage social change: “Whenever you hear a rapturous defense of the natural world, you should be on your guard: this is class power talking, and it’s trying to kill you”). But beliefs aside, are wonder and awe enough to equate the attitudes of Dawkins and Eckhart? I think: no. For one thing, there is a difference between (a) attitudes directed at particular arrangements of reality (stars, flowers, and so forth), and (b) attitudes directed, in some sense, at reality itself, the Being of beings. (Though stars, flowers, etc can serve, for (b), as vehicles, or sparks, or windows.)

In this vein, we might think of an attitude’s “existential-ness” as proportionate to the breadth of vision it purports to encompass. Thus, to see a man suffering in the hospital is one thing; to see, in this suffering, the sickness of our society and our history as a whole, another; and to see in it the poison of being itself, the rot of consciousness, the horrific helplessness of any contingent thing, another yet.

We might call this last one “existential negative”; and we might call Ginsberg’s attitude, above, “existential positive.” Ginsberg looks at skin, nose, cock, and sees not just particular “holy” things, contrasted with “profane” things (part of the point, indeed, is that cocks read as profane), but holiness itself — something everywhere at once, infusing saint and sinner alike, shit and sand and saxophone, skyscrapers and insane asylums, pavement and railroads, the sea, the eyeball, the river of tears.

Or consider this passage from Hesse’s Siddhartha:

“He no longer saw the face of his friend Siddhartha. Instead he saw other faces, many faces, a long series, a continuous stream of faces — hundreds, thousands, which all came and disappeared and yet all seemed to be there at the same time, which all continually changed and renewed themselves, and which were yet all Siddhartha. He saw the face of a fish, of a carp, with tremendous painfully opened mouth, a dying fish with dimmed eyes. He saw the face of a newly born child, red and full of wrinkles, ready to cry. He saw the face of a murderer, saw him plunge a knife into the body of a man; at the same moment he saw this criminal kneeling down, bound, and his head cut off by an executioner. He saw the naked bodies of men and women in the postures and transports of passionate love. He saw corpses stretched out, still, cold, empty. He saw the heads of animals — boars, crocodiles, elephants, oxen, birds. He saw Krishna and Agni. He saw all these forms and faces in a thousand relationships to each other … And all these forms and faces rested, flowed, swam past and merged into each other, and over them all there was continually something thin, unreal and yet existing, stretched across like thin glass or ice, like a transparent skin, shell, form or mask of water — and this was Siddhartha’s smiling face which Govinda touched with his lips at that moment.”

This seems to me “existential positive,” too. Govinda’s vision purports to encompass all of birth and death; the ten thousand things, seen in their unity; and yet Siddhartha smiles. And we can see many of the quotes from theists, in section II, as responding to the sense in which the problem of evil threatens their own “existential positive,” whether it threatens their belief in God or no.

To the extent that it goes beyond e.g. “the stars are so beautiful,” I think that a lot of contemporary, non-theistic spirituality involves elements of “existential positive” — even if not explicitly stated, and even in the context of more metaphysically pessimistic traditions, like Buddhism. Mystical traditions, for example (and secularized spirituality, in my experience, is heavily mystical), generally aim to disclose some core and universal dimension of reality itself, where this dimension is experienced as in some deep sense positive — e.g. prompting of ecstatic joy, relief, peace, and so forth. Eckhart rests in something omnipresent, to which he is reconciled, affirming, trusting, devoted; and so too, do many non-Dualists, Buddhists, Yogis, Burners (Quakers? Unitarian Universalists?) — or at least, that’s the hope. Perhaps the Ultimate is not, as in three-O theism, explicitly said to be “good,” and still less, “perfect”; but it is still the direction one wants to travel; it is still something to receive, rather than to resist or ignore; it is still “sacred.”

Indeed, we might think of popular injunctions to be “present,” “aware,” “here,” “now” — at least when interpreted in non-instrumental terms — as expressing a kind of existential positive, too. If reality is not in some sense good; and if turning towards it, receiving it, being aware of it, promotes no other worldly end (calm, focus, ethical clarity, etc); why, then, be mindful, or awake? Why not distract, or dull, or delude, or ignore?

What’s more, even if everything in this world is holy, in the limit of breadth, the “existential positive” here extends yet further — beyond any of the ways the world just happens to be, to a kind of affirmation of Being/Reality in itself, however manifest. Or at least, this is implied, I think, by a kind of unconditional holiness. (Though Katja Grace suggests: maybe in this world, everything is holy, but that other world, it isn’t. Indeed, we could even try to imagine a kind of “holiness zombie” world, physically identical to this one). More contingent forms of universal holiness, I think, involve what we might think of as “existential luck” — akin to (though broader than) the type Satan accuses God of giving Job. Sure, you’re spiritual here, in an often-pretty world, with your telescopes and your oxen and your boil-free skin. But suppose you were in a hell world. Suppose, in fact, you already are. (Kriss thinks you are.) What holiness, then?

VI.

In the context of the “existential positive,” and especially in its least contingent forms, a kind of non-theistic problem of evil re-arises. What is Ginsberg’s holiness, if holocaust, Alzheimers, rape, depression, factory farm, be holy? Or if, more, the worst possible world would be holy — since it, too, would be real? Ginsberg need not excuse God’s creation of the world’s horrors; Ginsberg’s God need not create, or choose, or know. Nor need Ginsberg protect or preserve those horrors, however holy. But there is still something in them of the Real; and the Real, for Ginsberg (or, my imagined Ginsberg), and for many others, is sacred in itself.

We see pressures, here, similar to those that drove the old theologians towards the obscure doctrine of “privatio boni“: that is, the view that evil is nothing real and substantive in itself, but is rather the absence or privation of goodness. God, after all, is the fount of all reality; to say that some bit of reality is bad, then, risks marring God’s perfection. Indeed, Lewis, in his depiction of Hell in The Great Divorce, makes it a tiny, insubstantial place, fading into nothingness, barely there (though the “barely,” I think, points at part of what makes privatio boni unstable — e.g., if evil really weren’t there, it struggles to play a role in the story).

Relatedly, for the old theologians, reality/being/existence was itself a “perfection” (hence, e.g. the ontological argument). And the “transcendentals” — that is, the set of properties common to all beings — were thought to include not just non-normative properties like “truth,” “unity,” and so forth, but also “beauty,” and “goodness.” We might see Ginsberg’s “holiness” as a transcendental, too.

But as ever, as soon as we set out to forge a non-contingent connection between the True and the Good; the Real and the Sacred; the Is and the Ought; the Ultimate and the to-be-Trusted, Affirmed, Rested-In, Worshipped — we run right into cancers; genocides; parasites; paralysis; predators ripping flesh from bone; mass extinctions; “bees in the heart, then scorpions, maggots, and then ash.” Contra the old theologians, these things are just as True, Real, Is, Ultimate, as anything else. If these, too, are sacred, then what is sacredness? Why reverence for the Real? Why not defiance, rebellion, disgust?

VII.

We might make a similar point a different way. Much of contemporary spirituality, I think, aims at a certain type of unification or “non-duality.” It aims, that is, to erase or transcend distinctions rather than draw them; to reach the whole, rather than the part. Indeed, to the extent that an “existential” attitude aims, ultimately, to encompass as much of the “whole picture” as possible, some aspiration towards unity seems almost inevitable.

But as we raise the level of abstraction, but wish to persist in some kind of existential affirmation, we will include, and affirm, more and more of the world’s horror, too (until, indeed, we move past what the world is actually like, to what it could be like, and to horrors untold). The content of the affirmation thereby either drains away, or horribly distorts. That is, naively, affirmation is made meaningful via its dualism; via the distinction between what is to be affirmed and what is not — and much of the world is, one might think, “not.” As this distinction collapses, the difference between “existential positive” and nihilism, good and “beyond good,” becomes increasingly unclear.

Thus, for example, Rumi writes:

Out beyond ideas of wrongdoing and rightdoing,
There is a field. I’ll meet you there…

Indeed, in my experience, various non-dual-flavored spiritual teachers flirt with, or explicitly endorse, what look, naively, like fairly direct types of nihilism, even as they urge e.g. compassion and kindness elsewhere. Buddhists doing this often suggest that realizing the empty and constructed nature of all things will, in fact, lead to greater compassion; and perhaps, empirically, this is right. But what if it doesn’t? Why should it? Indeed, confidence has waned, amongst some Western Buddhists, in a strongly reliable or “default” connection between “ethics” and “insight.” And good vs. bad, in some non-dual contexts, is just another constructed distinction — indeed, perhaps a core barrier — holding you back; another type of separation; perhaps, indeed, another type of violence.

But if we go fully beyond ideas of “good” and “bad,” what calls us towards Rumi’s field? We need not understand “goodness” in narrow, brittle, moralistic, or universalized ways; discernment need not exclude openness and receptivity; and perhaps it is good, in ways, to learn to put down “good” entirely, at least at times. But the David Enochs and Thomas Nagels of the world are right, I think, to recognize the ubiquity of at least some kind of normativity to a huge amount of human thought, and non-dual spirituality (not to mention much of the discourse about “non-judgment”) is no exception. “Beyond good” is not “special extra super good.” It’s just actually not good (or bad). Go fully beyond any sort of good, and the sacred loses its shine.

VIII.

My main aim here has been to point at the ways that something reminiscent of the theistic problem of evil applies to more amorphous forms of (even very naturalistic) spirituality, too. I won’t, here, say much about how deep a problem this is, or how one might respond to it (and sufficiently mystical responses will simply be: “this is a problem that arises at the level of concepts; but if you go to the experience of e.g. holiness itself, it does not arise, at least not in this way” — and I think there’s at least something to this).

Obviously, one response is to reject any kind of reverence or affirmation towards the Real in itself. Indeed, the rejection of any sort of evaluatively rich attitude (positive or negative) towards the Real in itself seems to me a plausible candidate for the essence of secularism — or at least, one salient kind. That is, the secularist may have positive and negative attitudes towards particular arrangements of reality; but Being, the Real, the Ultimate, the Numinous — these things, just in themselves (insofar as they have meaning at all), are blanks. (I think there are connections, here, between secularism in this sense, and a lack of interest in “contact with reality” of the type I described here; but that’s another story.)

I do want to point, though, at a different family of responses that seem to me both interesting, and less obviously secular in this sense.

Fromm (1956) distinguishes between “father love” and “mother love.” (To be clear: these are archetypes that actual mothers, fathers, non-gendered parents, non-parents, etc can express to different degrees. Indeed, if we wanted to do more to avoid the gendered connotations, we could just rename them, maybe to something like “assessment love” and “acceptance love.”) Fromm’s archetypal father orients towards his child from a place of expectation and assessment. He loves as the child merits. He teaches morality, competence, and interaction with the outside world. Fromm’s archetypal mother, by contrast, relates to her child with unconditional acceptance. She loves no matter what. She teaches security, self-loyalty, home. (See also parallels with Darwall’s (1977) “appraisal respect” vs. “recognition respect” — though there are many differences, too).

“Father love,” for many, is easy to understand. Love, one might think, is an evaluative attitude that one directs towards things with certain properties (namely, lovable ones) and not others. Thus, to warrant love, the child needs to be a particular way. So too with the Real, for the secularist. If the Real, or some part of it, is pretty and nice, great: the secularist will affirm it. But if the Real is something else, the thing to be done is to reshape it until it’s better. In this sense, the Real is approached centrally as raw material (here I think of Rob Wiblin’s recent tweet: “I’m a spiritual person in that I want to convert all the stars into machines that produce the greatest possible amount of moral value”).

But mother love seems, on its face, more mysterious. What sort of evaluative attitude is unconditional in this way? Indeed, more broadly, relationships of “unconditional love” raise some of the same issues that Ginsberg’s holiness does: that is, they risk negating the sense in which meaningfully positive evaluative attitudes should be responsive to the properties of their objects (reflecting, for example, when those properties are bad). And one wonders (as the devil wondered about Job) whether the attitude in question is really so unconditional after all.

But is mother love unconditionally positive? Maybe in a sense. But a better word might be: “unconditionally committed” or “unconditionally loyal” (thanks to Katja Grace for suggesting this framing). That is, we can imagine an archetypal mother who cares, like the archetypal father, about the child’s virtue, who is pained by the child’s mistakes, and so forth; and whose love, in this sense, is far from a blanket of uniform affirmation (though whether this fits Fromm’s mother mold, I’m not sure). But where the archetypal father might, let us suppose, give up on the child, if some standard is not met, the mother will not. That is, the mother is always, in some sense, loyal to the child; on the child’s team; always, in some sense, caring; paying attention.

Exactly how to understand this sort of unconditional loyalty, I’m not sure; and it may, ultimately, have problems similar to unconditional holiness (and obviously, ideals of unconditional loyalty, commitment, love etc in actual human contexts have their own issues). But we have, at least, a robust kind of human acquaintance with “mother love” of various kinds, and I wonder if it might suggest less secular (in my sense above) responses — perhaps even ancient and familiar responses — to the problems of evil I’ve discussed.

We might look for other examples, too, of forms of love that seem to transcend and encompass something’s faults, without denying them. Here I think of this scene from Angels in America — one of my favorites of all time (spoilers at link; and hard to understand if you don’t know the play). And also, of the father’s forgiveness in the parable of the prodigal son. (For a set of moving reflections on the parable, I recommend Nouwen (1994).)

“And the son said unto him, Father, I have sinned against heaven, and in thy sight, and am no more worthy to be called thy son. But the father said to his servants, Bring forth the best robe, and put it on him; and put a ring on his hand, and shoes on his feet.” (Luke, 15:21-22).
Painting by Rembrandt, source here.

Chesterton, in Orthodoxy (chapter 5) talks about loyalty as well, and about loving things before they are lovable:

“My acceptance of the universe is not optimism, it is more like patriotism. It is a matter of primary loyalty. The world is not a lodging-house at Brighton, which we are to leave behind because it is miserable. It is the fortress of our family, with the flag flying on the turret, and the more miserable it is the less we should leave it. The point is not that this world is too sad to love or too glad not to love; the point is that when you do love a thing, its gladness is a reason for loving it, and its sadness a reason for loving it more … What we need is not the cold acceptance of the world as a compromise, but some way in which we can heartily hate and heartily love it. We do not want joy and anger to neutralize each other and produce a surly contentment; we want a fiercer delight and a fiercer discontent.”

I don’t think responses in this vein — that is, forms of love, loyalty, commitment, and forgiveness towards the Real, despite its faults — fully capture what’s going on with experiences of e.g., holiness, sacredness, reverence, or receptivity (thanks to Katja Grace for suggesting distinctions in this respect). Nor am I committed to (or especially interested in) claims about the “secularism” of such responses. But faced with problems of evil, theistic or no, I think these responses might have a role to play.

Discuss

### the fat baker principle

April 19, 2021 - 10:58
Published on April 19, 2021 7:58 AM GMT

Culpability

"wait am I responsible for the pareto depth thing"
"Yes." t. Jollybard
essay theme: https://soundcloud.com/jollybard/gigantopithecus

The self is the landlord of the mind

Beliefs should pay rent in anticipated experience.
Everyone knows this by now, we've already turned out the pockets of the laziest and most impoverished ideas,
the freeloaders in this informational class struggle have been evicted long ago.
If we're the land barons of our own minds, why stop at just a little power?
Taking this idea further, preferences should pay rent just as much as beliefs.
But what can a preference 'pay'?

The Fat Baker Principle

There are different competing constructions for the 'fat baker'. Naturally, I prefer mine.
One could go: "Never trust a thin baker."
Another: "it's way easier to become good at something if you actually enjoy what you make"
"lmao"
"lole!"
Finally: "how can you say you even like bread if you can't make a decent loafa?"

To put it another way, preferences should pay rent in changed behavior.
If you really like manga, maybe you should have internalized a model that can sort good manga from bad.
If you really, really, really like manga, maybe you should have internalized a model that's most of the way to synthesizing new manga.

Conclusions

This is not only a virtue-deontology ethics
("you should make things yourself if you like them enough to fling critique")
but a rudimentary system of personality-level course correction against flights of fancy.

If you find yourself liking something, do you find yourself wanting to curate examples of work, find other creatives, sketch out the bones of your own work?
Do you find yourself "liking" parasocial relationships of engagement and consumption with content-creators, luminaries, or "communities" instead?
If you can only like your likes at arm's length, mediated through others and not your own hands, you might not like them as much as you think you do.

Discuss

April 19, 2021 - 06:13
Published on April 19, 2021 2:30 AM GMT

This is a linkpost for https://nibnalin.me/dust-nib/bellmans-curse-of-dimensionality.html

While popular study of dimensionality is usually focused on tight, mathematical systems, I’ve found that thinking about dimensionality as a metaphor in other, less well-defined systems is a surprisingly helpful tool.

Advice, for instance, tends to fit such a model very naturally. You have an extremely large state space (life) where you only get one opportunity to traverse a path. To add to that, like the typical exponential problem, as dimensionality of life increases, the volume of the space of possibilities increases so fast that the available data on historical outcomes becomes quite sparse. How, then, can someone evaluate others’ advice, or look at someone else’s life to figure out what they should do with their own?
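The sparsity claim has a simple numerical illustration (my own sketch, not from the linked post): hold the number of recorded "data points" fixed, and the nearest one to any new situation drifts rapidly farther away as the number of dimensions grows.

```python
import math
import random

def avg_nearest_neighbor_dist(n_points, dim, trials=200, seed=0):
    """Average distance from a random query point in the unit hypercube
    to its nearest neighbor among n_points uniform samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
        query = [rng.random() for _ in range(dim)]
        total += min(math.dist(query, p) for p in points)
    return total / trials

# The same 200 samples cover 2 dimensions densely, 20 dimensions barely.
for d in (2, 5, 10, 20):
    print(f"dim={d:2d}  avg nearest-neighbor distance="
          f"{avg_nearest_neighbor_dist(200, d):.3f}")
```

In two dimensions the nearest recorded example is a few percent of the cube's width away; by twenty dimensions it is typically farther than the cube's entire side length, which is the "historical outcomes become sparse" point in miniature.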

I don't claim to have a solution to this problem, but in my experience it's helpful to keep the dimensionality framework suggested by Bellman in the back of your mind as you navigate advice. For starters, thinking about the dimensions of this space independently helps you evaluate different states clearly:

• Time: Your professor probably doesn’t have the best advice for you because their own experience of your situation was frozen in the 1980s, and a lot about the world has changed since then.
• Space: Advice about fundraising from a US entrepreneur is obviously not as relevant for someone raising in India or China.
• Internal state: This is one of the harder categories to pin down. One way to think about it is as some function of an individual's personality and thought processes, but I think that only scratches the surface of what this broad category intends to cover. If someone attempts to use Steve Jobs' offbeat management methods, but lacks the charisma and the "reality distortion field" Steve Jobs is said to have conjured, they're not going to be as successful in their endeavours.

I've always found it somewhat surprising how much different, often entirely contradictory, advice you can hear about any choices you can think of. When just about everyone has their own strongly held opinions about everything, it's really hard to evaluate the applicability and the net delta of any singular piece of advice.

I think the only antidote to evaluating advice given such dimensional differences is to figure out the relative distortions between the advisor's state and your state, and "port" the advice as necessary. Don't do what Napoleon did; do what he would do if he were a founder born in Silicon Valley during the Information Age.

Another consequence of this framework is that differences across multiple dimensions compound non-linearly. A 60-year-old college professor, separated from you in both time and space, will have strictly less applicable advice than someone your age living in the same city as you. Note that I'm only making a claim about the applicability of their advice here; whether the advice itself is good or bad is a completely different question that one should evaluate separately.

Notably, however, there are flaws in this framework. The most obvious one to me is that it isn't clear whether the aforementioned dimensions are linearly independent, and treating them as such has unintended consequences (for instance, someone at a different place and time but with the same internal state isn't necessarily as distant from you as someone at the same place and time but with a wildly different internal state).

Blessing of dimensionality

There is one other interesting result that arises from this framework that's applicable to advice: the blessing of dimensionality. In math, the curse of dimensionality has a counterpart that feels unintuitive on first consideration: because exponential problems are so hard, even basic heuristics are good enough to get reasonably close to optimal solutions. The oversimplified explanation is that the state spaces for such problems are so large that taking random paths is exponentially worse than taking a path that's merely good enough.[1]
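To make the "basic heuristics get reasonably close" claim concrete, here is a toy simulation (my own hedged sketch, not from the post): a binary decision tree with random edge rewards, where the number of root-to-leaf paths grows as 2^depth. A one-step greedy rule recovers much of the exhaustive optimum, while a random walk lags well behind.

```python
import random

def build(depth, rng):
    """Random binary tree; each edge carries a uniform(0,1) reward."""
    if depth == 0:
        return None
    return (rng.random(), build(depth - 1, rng),
            rng.random(), build(depth - 1, rng))

def best(tree):
    """Exhaustive optimum over all 2^depth root-to-leaf paths."""
    if tree is None:
        return 0.0
    lw, lt, rw, rt = tree
    return max(lw + best(lt), rw + best(rt))

def greedy(tree):
    """Basic heuristic: always take the heavier of the two next edges."""
    total = 0.0
    while tree is not None:
        lw, lt, rw, rt = tree
        total, tree = (total + lw, lt) if lw >= rw else (total + rw, rt)
    return total

def random_path(tree, rng):
    """Baseline: choose branches at random."""
    total = 0.0
    while tree is not None:
        lw, lt, rw, rt = tree
        total, tree = (total + lw, lt) if rng.random() < 0.5 else (total + rw, rt)
    return total

rng = random.Random(42)
tree = build(16, rng)  # 2**16 = 65,536 possible paths
print(f"optimal: {best(tree):.2f}  greedy: {greedy(tree):.2f}  "
      f"random: {random_path(tree, rng):.2f}")
```

Greedy isn't optimal here (it ignores everything beyond the next edge), but it reliably beats the random baseline and closes a meaningful fraction of the gap to the true optimum, despite doing exponentially less work than the exhaustive search.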

When I first read Dale Carnegie's "How to Win Friends and Influence People", I was extremely surprised by the blandness and obviousness of all the advice it described. Surely, I thought, advice this obvious couldn't have made a multi-decade bestseller that claims to have changed countless lives. Weirdly enough, Dale Carnegie doesn't need his readers to develop Steve Jobs' charisma to be successful: if they take the basic, most obvious steps, they'll already be reasonably successful in life. This "blessing of dimensionality" is why most popular advice you'll find is unspecific and vague, even though, in practice, most of us will end up applying that advice to the most specific scenarios and circumstances.

From a broader perspective, this counterpart "blessing of dimensionality" is a good reminder to avoid the traps of overthinking and of taking life overly seriously. If your actions have some reasonable motivation behind them, then at the end of the day the worst you'll do in expectation is really not that bad in absolute terms: you're unlikely to end up in a ditch in a forest from picking the wrong branch at one fork in the road, if all the other forks you've taken had some reasonable amount of consideration behind them too.

1. More technical details about the setup for this class of problems are described here ↩︎

Discuss