# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 42 minutes 34 seconds ago

### Rationality Cardinality

Published on April 27, 2021 10:27 PM GMT

Rationality Cardinality is a humorous card game, built out of rationality concepts and vocabulary and designed to teach them. I have recently finished an online-game implementation, which can be embedded into gather.town worlds. Try it out by joining the Rationality Cardinality gather.town server! It helps to have other people to play with, so I suggest Saturday, May 1, 11am PST as a Schelling meetup time.

Rationality Cardinality is a two-for-one combined bet on dick jokes and the Sapir-Whorf hypothesis. Sapir-Whorf is the hypothesis that many people are bottlenecked, in what thoughts they can think, on having vocabulary for the relevant concepts. I've been curating a set of cards (concepts, explanations, and jokes) over the years, and I'm quite proud of it. (Except for the cards about concepts that fell to the replication crisis, of course.)


### Pitfalls of the agent model

Published on April 27, 2021 10:19 PM GMT

Abstract

We study the agent model as a frame for understanding the phenomenon of entities that exert influence over the future. We focus on one implication of the agent model: the assumption of a fixed agent policy that is unchanging over time. We study the consequences of this assumption in a number of scenarios. We consider single- and multi-agent scenarios, scenarios consisting of humans, machines, and combinations of humans and machines, and scenarios in which the agent model is being used by an entity to model itself versus other entities. We show that all combinations of these can be covered by six basic cases. For each of the six cases we identify pitfalls in which certain aspects of reality are abstracted away by the application of the agent model, and we examine the consequences of each. We draw connections to related work in embedded agency, partial agency, and embodied cognition. We conclude that the agent model has significant shortcomings as a frame for engineering advanced AI systems.

Introduction

Yesterday I wrote about the agent model as a frame for understanding the real-world phenomenon of entities in the world that exert influence over the future.

I said that we should be cautious about over-using any one frame as a means for understanding any phenomenon, because when we have only one frame it is easy to forget about the boundary between the phenomenon that we’re looking at and the frame that we’re looking at it with, which is a problem no matter how powerful or accurate our preferred frame is. This is indeed one reason to consider frames other than the agent model when studying entities that exert influence over the future. A second reason is that there are specific shortcomings of the agent model. I want to examine some of those today.

All frames have shortcomings. A frame is precisely that which gives us a way of seeing that is simpler than reality itself, so the whole point of a frame is that it does not include every aspect of reality. We can always point to ways in which any given frames fail to capture all of reality, and that is no reason on its own to discard a frame. Nevertheless, we may want to de-emphasize frames whose shortcomings are severe enough. In this post I will present the shortcomings of the agent model as I see them, so that you can decide to what extent you want to emphasize the agent model in your own thinking.

I will not be examining the up-sides of the agent model in this post. It is a powerful model with many up-sides, as evidenced by its broad adoption across many disciplines over the last few hundred years. Perhaps a future post will examine the virtues of the agent model.

What is the agent model?

Under the agent model, we consider the world as consisting of two parts: the agent, and the environment. Information flows from environment to agent as a sequence of observations, and from agent to environment as a sequence of actions.
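The observation/action loop described above can be sketched in a few lines of Python. This is purely an illustrative toy, not any particular library's API: the `Environment` and `Agent` classes and their methods are invented for this sketch.

```python
class Environment:
    """A toy environment whose state is a single number."""

    def __init__(self):
        self.state = 0

    def observe(self):
        # Information flows from environment to agent as observations.
        return self.state

    def step(self, action):
        # Information flows from agent to environment as actions.
        self.state += action


class Agent:
    """A toy agent with a fixed policy mapping observations to actions."""

    def act(self, observation):
        return 1 if observation < 10 else 0


env, agent = Environment(), Agent()
for _ in range(20):
    obs = env.observe()        # environment -> agent
    env.step(agent.act(obs))   # agent -> environment

print(env.state)  # 10: the agent pushes the state up to 10, then stops
```

Note that the agent's `act` method is a pure function of the observation; that fixity is exactly the assumption examined below.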

The agent model is a frame that we use to understand entities in the world, and to engineer new entities. For example, when constructing a reinforcement learning system to play Atari games, it is natural enough for the engineers of the system to look at things as an agent receiving observations from an environment and sending actions back in return. In this case, the environment is the state of the Atari game, the observations are a 2D grid of pixels shown on the screen, and the actions are the moves available within the game. This is a situation where it makes good sense to use the agent model.

Consider another situation where it makes good sense to use the agent model. An engineer designing a robot vacuum might consider an environment consisting of various obstacles and patches of floor to be vacuumed, observations consisting of "bump" notifications generated when the robot touches a wall, and actions consisting of turning the left and right wheels forward or backwards at different speeds. A robot vacuum is not separate from the environment as an Atari game-playing AI is, but the agent model is still a fairly expedient frame within which human engineers might solve design problems concerning a robot vacuum.

This essay works through some more complicated scenarios in which use of the agent model may give rise to problems.

Note that just because the human engineers of a system use the agent model as a frame for constructing a system does not mean that the system itself uses the agent model internally to model itself or others. A house constructed by engineers who used an understanding of thermodynamics to optimize the heat-efficiency of its insulation does not generally contain any computer capable of using the laws of thermodynamics to consider ways that it might design other houses or redesign itself. Similarly, an entity constructed by engineers who used the agent model to aid development need not itself use the agent model within its own decision processes. My robot vacuum may have an internal representation of the size and shape of its own body, but it almost certainly does not have any explicit concept of itself as an agent receiving observations from and sending actions to the environment, much less an explicit concept of others as agents[1].

The fixed policy assumption

When looking at the world through the frame of the agent model, the environment is seen as changing over time, but the policy implemented by the agent is seen as fixed. Much of the power of the agent model as a frame for understanding the world comes from this fixed policy assumption. For example, under inverse reinforcement learning we observe a sequence of actions taken by some entity, then we look for a value function that explains this behavior. We do not model the value function as changing from one step to the next. If it did, inverse reinforcement learning would no longer work[2]. It is difficult enough to get inverse reinforcement learning to work even with the assumption of a fixed value function; incorporating a time-varying value function into the model would make the problem hopelessly underspecified. A fixed value function is just how inverse reinforcement learning works.
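The inference pattern described above can be made concrete with a toy sketch: observe a trajectory of actions, then search for a single *fixed* target value that explains them. Everything here is invented for illustration; real inverse reinforcement learning is far more involved.

```python
def greedy_action(state, target):
    """The policy implied by valuing proximity to `target`."""
    if state < target:
        return +1
    if state > target:
        return -1
    return 0

def explain(trajectory, target):
    """Count how many observed actions the candidate target explains."""
    return sum(1 for s, a in trajectory if greedy_action(s, target) == a)

# Observed behavior: the entity always moves toward 3.
observed = [(0, +1), (1, +1), (2, +1), (3, 0), (5, -1)]

# The search assumes one value function explains the whole trajectory;
# if the "values" drifted mid-trajectory, no single target would fit.
best_target = max(range(10), key=lambda t: explain(observed, t))
print(best_target)  # 3
```

If the entity's goal had changed partway through the trajectory, no fixed `target` would explain all five actions, which is the sense in which a time-varying value function breaks this kind of inference.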

Or, consider ordinary reinforcement learning, in which we search over a space of possible policies, rolling out each one over time to see how it behaves. We may consider policies that behave differently at different times, but, at least in classical reinforcement learning, we do not consider policies that change over time. For example, in setups where the policy is encoded as a neural network, we do not consider policies with network coefficients that change from one time step to the next.
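This distinction can be sketched with a toy policy search: we try many parameter vectors, but each candidate's parameters are held fixed for the whole rollout. The dynamics and reward below are invented stand-ins, not from any RL library.

```python
import random

def rollout(theta, steps=20):
    """Roll out a linear threshold policy with fixed parameters theta."""
    state, total_reward = 0.0, 0.0
    for _ in range(steps):
        # The policy may act differently in different states, but theta
        # itself never changes from one time step to the next.
        action = 1.0 if theta[0] * state + theta[1] > 0 else -1.0
        state += action                    # toy dynamics
        total_reward -= abs(state - 3.0)   # reward for staying near 3
    return total_reward

random.seed(0)
best_theta, best_score = None, float("-inf")
for _ in range(200):
    theta = [random.uniform(-1, 1), random.uniform(-1, 1)]
    score = rollout(theta)  # theta is fixed for the entire rollout
    if score > best_score:
        best_theta, best_score = theta, score
```

The search varies `theta` *between* rollouts, never *within* one; that is the fixed policy assumption in miniature.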

Now, we are aware that agents are implemented as computer programs running on physical computers, and we are aware that these implementations involve memory registers whose values change and wires carrying charge. We are aware that the state of the CPU is changing from moment to moment. We are aware even that the memory cells whose value does not not change are not unchanging at the level of the physical substrate, but instead the memory cell is constructed in a way that maintains a configuration within a certain range that is recognized as a single 0 or a single 1 by the higher-level computing machinery.

So we are aware that a physical computer is in fact changing over time at every level, but we choose to use a frame in which there is a computer program that is running, and the source code for that program is not changing. And this is a reasonable assumption in many cases. Computers in fact have excellent error correction mechanisms that allow them to keep running an unchanging computer program over a long period of time. My robot vacuum, for example, does in fact run the same computer program each time I turn it on. It will be a long time before I can expect a stray cosmic ray to flip a bit representing the core computer program on my robot vacuum, or for the memory cells to physically degrade to the point of information loss.

You might think that if we don’t want this fixed policy assumption then we could just consider a variant of the agent model in which some actions and some observations modify the policy. It’s true that we could modify the agent model in this way, but if we do this "all the way to the bottom", meaning that any aspect of the policy can in principle be modified, then we invalidate much of the machinery that has been built on top. The basic theorems underlying RL and IRL stop working. Even the more basic planning and control algorithms from earlier periods of AI stop working. And the basic conclusions from the rational actor model in economics stop applying, too. So the fixed policy assumption is deeply baked in, and will thus be the primary lens through which this essay views the agent model.
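To see why such a modification is nontrivial, consider a toy agent whose policy parameter lives in ordinary mutable memory that its own actions can overwrite. All names here are invented for illustration.

```python
class SelfModifiableAgent:
    """A toy agent whose actions can rewrite its own policy parameter."""

    def __init__(self):
        self.threshold = 10  # policy parameter, stored in mutable memory

    def act(self, observation):
        if observation > self.threshold:
            # This action modifies the policy itself. Any analysis that
            # assumed a fixed mapping from observations to actions (e.g.
            # standard RL/IRL machinery) no longer applies cleanly.
            self.threshold = observation
            return "raise_threshold"
        return "noop"


agent = SelfModifiableAgent()
assert agent.act(5) == "noop"              # below threshold: no change
assert agent.act(15) == "raise_threshold"  # policy rewritten by own action
assert agent.threshold == 15               # the "same" agent now acts differently
```

Once `act` can change `threshold`, the agent's behavior is no longer a function of the observation alone, and rollout-based evaluation of "the" policy loses its meaning.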

Example: MDP and POMDP

The Markov Decision Process and Partially Observable Markov Decision Process are two models from computer science that explicitly organize the world into an agent and an environment. The "decision process" here refers to the environment, which proceeds through a sequence of states, each one conditionally independent of all prior states given the immediately prior state (this is what "Markov" refers to). In the MDP the agent observes the full state of the world at each point in time, while in the POMDP the agent observes just some aspect of the world at each point in time.

The MDP and POMDP do not explicitly state that the agent receiving observations from and sending actions back to the decision process must be executing an unchanging policy, but their formal solution strategies, such as reinforcement learning, generally do.
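As a concrete (and entirely illustrative) sketch, an MDP can be written down as a tuple of states, actions, Markov transition probabilities, and rewards; a POMDP would add an observation function on top. The names below are invented for this sketch, not drawn from any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class MDP:
    states: Tuple[str, ...]
    actions: Tuple[str, ...]
    # transition[(s, a)] is a distribution over next states; each next
    # state depends only on (s, a), which is the Markov property.
    transition: Dict[Tuple[str, str], Dict[str, float]]
    reward: Callable[[str, str], float]

# A two-state toy MDP: "stay" keeps the state, "flip" switches it.
mdp = MDP(
    states=("A", "B"),
    actions=("stay", "flip"),
    transition={
        ("A", "stay"): {"A": 1.0}, ("A", "flip"): {"B": 1.0},
        ("B", "stay"): {"B": 1.0}, ("B", "flip"): {"A": 1.0},
    },
    reward=lambda s, a: 1.0 if s == "B" else 0.0,
)
```

Notice that the agent itself appears nowhere in this structure: the formalism describes only the environment side, and the fixed policy enters through the solution methods applied to it.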

The agent model does not imply optimization

Under the agent model we view entities that exert influence over the future as executing abstract algorithms that process observations and generate actions. Those abstract algorithms may or may not be optimization algorithms. For example, I might build a robot that moves forward until it hits a wall, and then stops. We need not view this robot as optimizing anything in order to view it as an agent.
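The wall-stopping robot above can be written as a two-line reactive policy; there is no objective function and no search anywhere in it, yet it is still naturally viewed as an agent. The names are invented for illustration.

```python
def wall_robot_policy(observation):
    """Map a bump-sensor observation to an action; nothing is optimized."""
    return "stop" if observation["bumped"] else "forward"

# The same observation always yields the same action: a fixed policy
# with no goal, objective, or optimization anywhere in sight.
assert wall_robot_policy({"bumped": False}) == "forward"
assert wall_robot_policy({"bumped": True}) == "stop"
```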

Now, there is an orthogonal question of under what circumstances we might choose to view an algorithm as an optimization algorithm. We might encounter a robot that "under the hood" is taking actions based on a state machine, but choose to view it as acting in service of a goal due the compactness of that representation. This is an important distinction but is unrelated to whether we are using the agent frame or not. The agent frame merely posits an abstract algorithm as an appropriate model for explaining the behavior of some entity.

Dimensions of analysis

I am going to examine the ways that the view of the world afforded by using the agent model differs from the true state of things. I am going to do that by examining scenarios involving various subjects that are using the agent frame and various objects being looked at through the agent frame. The dimensions I want to cover are:

• Scenarios in which an entity is looking at itself versus looking at another entity.

• Scenarios with humans and machines (and combinations thereof). For now I will consider each combination of subject and object being either a human or a machine.

• Scenarios with single and multiple entities. For now I will collapse this with the first dimension and consider cases consisting of either one entity or two entities, where in the former case I assume that the entity views itself through the agent frame, and in the latter case that the entity views the other entity through the agent frame.

The scenarios I will consider are as follows:

• Humans looking at themselves as agents

• Humans looking at other humans as agents

• Humans looking at machines as agents

• Machines looking at themselves as agents

• Machines looking at humans as agents

• Machines looking at other machines as agents

For each of these scenarios I will consider various pitfalls. There are 12 pitfalls in total, and they are summarized in a table at the bottom.

Humans looking at themselves as agents

Pitfall 1: Self-hatred

Sometimes a human perceives that an action they have taken has caused harm in the world. In some cases the perception is mistaken, and in reality their action was not the cause of the harm, while in other cases the perception is correct, and their action was the cause of the harm. But upon seeing this harm, a human viewing themselves through the frame of the agent model, and therefore making the fixed policy assumption with respect to themselves, may conclude that this harm is an artifact of an immutable internal decision algorithm. Since the human does not want to cause harm, but also holds the view of an immutable internal decision algorithm that, from within the frame of the agent model, has been observed causing harm at least once, their only option appears to be to adopt a combative relationship with this immutable internal decision algorithm and to limit the harm it is causing by resisting it. This leads to an internal conflict: the human takes actions, perceives these actions from within the agent frame as having been generated by an immutable internal decision algorithm, concludes on the basis of having perceived this algorithm causing harm in the past that the new action is probably also harmful, and then takes further actions to resist and limit the consequences of the first action. Of course these further actions are subject to the same chain of reasoning so long as the human is looking at themselves from within the agent frame, so the human ends up taking yet further actions to oppose the actions that were taken to oppose the first action, and this cycle continues indefinitely.

Perception from within the agent frame: An unchanging internal decision algorithm is causing harm but cannot be modified, so must be fought.

Reality: The human could simply choose to act differently.

Fundamental misperception due to the agent frame: That there is "some place else" from which actions originate, separate from the one reasoning about the harm.

Pitfall 2: Procrastination / forgetfulness

Sometimes a human sees that an action would be valuable to perform, but sees little benefit in performing that action sooner rather than later, perhaps up to some deadline, such as filing taxes before tax day. The human, viewing themselves through the frame of the agent model, believes that there will be no cost to delaying, since they perceive an unchanging internal decision algorithm that has been observed at least once identifying the action as valuable, and so is likely to do so in the future. In fact there are multiple mistakes in this reasoning. First, humans are subject to change due to interaction with their environments, and this particular human may change in such a way that they forget or undervalue this action in the future. Second, humans are subject to change due to their own actions, and the architecture of human brains is such that actions practiced repeatedly become more likely to be performed again in the future, so by delaying action the human is in fact performing a subtle form of self-modification in the direction of delaying actions in general. In other situations this might be what the human intended to do, but in this example the human is, by assumption, overlooking this.

Perception from within the agent frame: An unchanging decision algorithm could equally well perform the action now or in the future, and there is no harm caused by delaying action.

Reality: Human decision algorithms are subject to change due both to interaction with the environment and habit formation.

Fundamental misperception due to the agent frame: Actions taken by the human do not affect the human’s policy.

Humans looking at other humans as agents

Pitfall 3: Hatred of others

Sometimes a human perceives an action taken by another human as having caused harm. In some cases the human’s perception is mistaken, and in reality the other human was not the cause of the harm, while at other times the perception is correct, and in reality the other human was the cause of the harm. But upon seeing this harm, a human viewing another human through the frame of the agent model may conclude that the cause of the harm can be traced back to an immutable decision algorithm within the other human. Since the human does not want there to be harm in the world, but also holds the view of an immutable decision algorithm within the other human that, from within the frame of the agent model, has been observed causing harm at least once, their only option appears to be adopting a combative relationship with the other human. In particular, the human may not believe that the other human has the capacity to change this internal decision algorithm even if they wanted to, so may not seek to negotiate with this other human, concluding instead that their only option is to resist or limit the consequences of the other human’s actions.

Perception from within the agent frame: An unchanging decision algorithm within another human is causing harm but cannot be modified, so must be fought.

Reality: The other human might have the capacity to self-modify and might choose to do so if negotiated with.

Fundamental misperception due to the agent frame: Negotiation is unlikely to work because the other human could not change their own internal decision algorithm even if they wanted to.

Pitfall 4: Viewing oneself as a loser in Newcomb’s problem

When some aspect of the environment has been determined by an examination of a human’s decision-making strategies via a channel that is not the human’s own actions, a human viewing the world through the frame of the agent model may miss the opportunity to make changes to their own decision-making, due to the belief that some other entity will necessarily view them as having a fixed internal decision algorithm. Newcomb’s problem formalizes this in the context of a philosophical thought experiment. Within Newcomb’s problem, a human using a strict agent model may reason that they are, for better or worse, a causal decision theory agent, that the hypothetical panel of experts in Newcomb’s problem will have perceived this, and therefore will have put the lesser of the two amounts in the boxes, so their best course of action is to take both boxes.

Perception from within the agent frame: An unchanging internal decision algorithm will have been perceived and acted upon by some external entity.

Reality: There is in fact no unchanging internal decision algorithm, and a panel of experts as omniscient as the one hypothesized in Newcomb’s problem will have correctly realized this.

Fundamental misperception due to the agent frame: First, that an unchanging internal decision algorithm exists, and second that this will have been perceived and acted upon by an external entity.

Humans looking at machines as agents

Pitfall 5: Agency hand-off fallacy

A human building an AI adopts a frame in which the AI, once deployed, will be an agent. The human correctly reasons that the AI will exert influence over the future, but incorrectly adopts the view that the AI will necessarily consist of an unchanging internal decision algorithm. Due to this, the human does in fact build an AI with an unchanging internal decision algorithm, overlooking other possible designs. This forces the human to hand off influence over the future to the agent at the time of the agent’s construction, which in turn forces the human to adopt a false dichotomy between solving a wide array of philosophical and technical problems before the first AI is built, or else deploying a powerful AI that is not certain to act in a manner that the human would approve of.

Perception from within the agent frame: Powerful AI systems will necessarily contain an unchanging internal decision algorithm.

Reality: There is a wider design space of autonomous machines that exert influence over the future.

Fundamental misperception due to the agent frame: That the design space for autonomous machines that exert influence over the future is narrower than it actually is. This creates a self-fulfilling prophecy in which the AIs actually constructed are in fact within this narrower regime of agents containing an unchanging internal decision algorithm.

Pitfall 6: Forgetting to examine the scaffold

A human considers a robot they are building using the frame of the agent model. Due to this, they place most of their attention on formulating the decision algorithm, and place less attention on the sensors, actuators, and computing machinery that will implement the decision algorithm and connect it with the external world. Due to this, the design of the decision algorithm is not informed by the practical failure modes of the sensors, actuators, and computing machinery, and the overall system is fragile.

Perception from within the agent frame: An unchanging internal decision algorithm that receives observations and outputs actions is the primary objective of design and engineering efforts.

Reality: The sensors, actuators, and computing machine may require as much subtlety in design and engineering efforts as the decision algorithm.

Fundamental misperception due to the agent frame: Over-emphasis on the decision algorithm during development.

Pitfall 7: Putting ourselves "into the shoes" of the machine

A human considers an AI they are building through the frame of the agent model. They consider situations in which the AI may be copied and may not know how many times it has been copied, as per philosophical thought experiments such as the Sleeping Beauty problem. Due to the view of the AI as an agent with an unchanging internal decision algorithm, the human occupies most of their attention with the question of what they (the human) would do given the information available to the AI under various hypothetical situations, missing the opportunity to simply choose a design for the AI that has the consequences desired by the human. Within the Sleeping Beauty problem, for example, it is difficult to decide what the correct probability to place on various events is when looking at the problem from the inside, but easy to pick an AI design that would act in service of any particular objective when looking at the problem from the outside.

Perception from within the agent frame: A human emphasizes an internal mode of problem solving over an external mode due to putting themselves "into the shoes" of a perceived unchanging decision algorithm within a machine.

Reality: External-mode problem solving is also feasible.

Fundamental misperception due to the agent frame: Over-emphasis of internal mode problem solving.

Machines looking at themselves as agents

Pitfall 8: Confusion about fallibility

An AI programmed to model itself using an explicit agent model might notice that its actions do not always match those predicted by its self-model. In fact this is due to a discrepancy between the AI’s model of itself, which predicts that its actions will be a function of its perceptions, and its real-world implementation, which involves physical sensors, actuators, and computing hardware that take time to process information and are subject to errors. Due to the inability of the agent model to capture the physicality of the AI’s computing hardware, the AI might develop false explanations for the discrepancy between its prediction of its own actions and those observed. In particular it is likely to explain these discrepancies as caused by features of the environment, since the environment contains most of the free parameters within the agent model.

Perception from within the agent frame: An AI’s actions have a purely functional relationship to its perceptions, and any discrepancy with respect to this assumption must be due to some feature of the environment.

Reality: All machines are physical entities and are at best approximated as functions from percepts to actions.

Fundamental misperception due to the agent frame: Any non-functional aspect of the AI’s behavior must be a feature of the environment.

Pitfall 9: Difficulty modifying own hardware

An AI programmed to model itself using an explicit agent model may have difficulty making changes and upgrades to its own hardware. As the AI entertains actions that might modify its own hardware, an AI relying on the agent model may not fully account for all of the consequences of its actions since the AI will not expect its actions to affect its own core decision algorithm due to the fixed policy assumption. As mentioned previously, one might imagine certain "quick fixes" to the agent model that permit actions that directly change the agent’s decision algorithm, but in fact this is more challenging than it may seem, since the fixed policy assumption is core to many of the basic search and learning strategies that underlie contemporary AI theory.

Perception from within the agent frame: Actions will not change the AI’s own decision algorithm.

Reality: The AI’s decision algorithm is represented within physical memory units and is executed on a physical computer, both of which can be affected by the AI’s actions.

Fundamental misperception due to the agent frame: There is an unchanging internal decision algorithm within the AI that is not subject to change because it is not part of the environment.

Machines looking at humans as agents

Pitfall 10: Looking for fixed values

Consider an AI that is programmed to infer a value function that explains observed human behavior and then take actions in service of this inferred human value function. An AI programmed this way would assume that humans have values that are fixed over time, due to the view of an unchanging decision algorithm within humans. This may cause such an AI to incorrectly extrapolate current values to future values. This would fail, for example, when modelling children whose decision algorithms will evolve significantly as they grow, or when observing adults experiencing significant life changes. Upon observing humans who are in fact changing over time, the AI may be forced into an explanation that posits an unchanging value function, in which case the AI may form an incorrect view of human values and take undesirable actions.

Perception from within the agent frame: Human behavior can be explained by an unchanging internal decision algorithm.

Reality: Humans change over time.

Fundamental misperception due to the agent frame: Human values are fixed.

Pitfall 11: Incomprehensibility of corrigibility

A human engineer may wish to construct an AI that can be modified after it has been deployed, in case the human identifies mistakes in the AI’s design. But an AI programmed to model itself as an agent will have difficulty understanding the intentions of a human trying to modify the AI, since from within the agent frame the AI’s internal decision algorithm is not changeable. The human’s behavior may appear bizarre or incomprehensible to the AI.

Perception from within the agent frame: Human actions that are in fact intended to modify the AI’s internal decision algorithm appear incomprehensible since the AI views its internal decision algorithm as immutable.

Reality: A human might want to modify the AI’s internal decision algorithm.

Fundamental misperception due to the agent frame: The AI’s internal decision algorithm is unchanging so it cannot be the intent of any other entity to modify it.

Machines looking at other machines as agents

Pitfall 12: Bias towards conflict

An AI interacting with other AIs in pursuit of a goal will need to decide when to negotiate with entities that oppose it and when to fight with such entities. We expect there to be scenarios in which negotiation is the strategy we would wish for the AI to take and other scenarios in which fighting is the strategy we would wish for the AI to take, but an AI programmed to use the agent model to understand other AIs may be suboptimally biased towards fighting, for the following reason. The AI being perceived through the frame of the agent model may be seen as having a decision algorithm that gives it the capacity to choose its actions on the basis of negotiation. But some aspects of this other AI’s behavior may be perceived as a fixed consequence of the unchanging decision algorithm perceived within this other AI. This means that an AI using the agent model to understand other AIs may choose conflict in cases where it perceives these fixed aspects of the other AI’s behavior as being opposed to its own goals. In some cases it may be true that the other AI was incapable of overturning aspects of its programming. But in other cases the other AI may have in fact been capable of and willing to negotiate, and the decision to choose conflict over negotiation was due to a fundamental misperception due to use of the agent model.

Perception from within the agent frame: Some aspects of another entity’s behavior are attributable to a fixed internal decision algorithm and cannot be modified by the other entity even if it wanted to, so negotiations concerning these behaviors are futile.

Reality: Other entities may have the capacity to modify their behavior at every level.

Fundamental misperception due to the agent frame: Some aspects of other entities’ behavior are fixed consequences of an unchanging internal decision algorithm and must be fought.

Summary of the pitfalls

• Pitfall 1: Self-hatred

• Pitfall 2: Procrastination / forgetfulness

• Pitfall 3: Hatred of others

• Pitfall 4: Viewing oneself as a loser in Newcomb’s problem

• Pitfall 5: Agency hand-off fallacy

• Pitfall 6: Forgetting to examine the scaffold

• Pitfall 7: Putting ourselves "into the shoes" of the machine

• Pitfall 8: Confusion about fallibility

• Pitfall 9: Difficulty modifying own hardware

• Pitfall 10: Looking for fixed values

• Pitfall 11: Incomprehensibility of corrigibility

• Pitfall 12: Bias towards conflict

Connections to other research

Scott Garrabrant and Abram Demski have written about the novel challenges that arise when designing agents that are part of the world rather than outside the world in the Embedded Agency sequence. Garrabrant and Demski note that any real-world agent we might build would necessarily be part of the world in which it is deployed, and that we have little understanding of how to think about agency under these conditions. They identify four ways that embedded agency differs from non-embedded agency: that embedded agents lack predefined input/output channels between self and world, that embedded agents cannot conceivably fit a complete model of the whole world into their minds because their minds are physically smaller than the world, that embedded agents must consider the consequences of actions that might modify the agent itself, and that embedded agents are constructed out of the same fundamental parts as the world. The present critique of the agent model was very much inspired by this work. Rather than adapting the agent model to the embedded domain, my sense is that we should be seeking a post-agent model with which to understand entities that exert influence over the future.

I have not yet read Garrabrant’s sequence on Cartesian frames, so cannot comment on its connections to the present work, which I expect to be numerous.

Abram Demski has described a concept he calls Partial Agency, in which an agent uses some but not all channels of influence to shift the world towards its objective. For example, an agent predicated upon causal decision theory does not attempt to exert influence over the world via the actions resulting from the predictions made by other agents about its decisions. This channel of influence is available to the agent, but its architecture is such that it does not consider making use of them. He compares this to "full agency", in which an agent does consider all affordances available to it. Both partial agency and full agency appear to be situated within the agent frame as I have described it in this essay, since they both view entities that exert influence over the future through the frame of an abstract algorithm processing observations and generating actions.

In Deconfuse Yourself About Agency, Vojta Kovaric attempts to make progress on the question of which entities in the world we should take to be agents, and then introduces three questions as directions for further research on agent fundamentals. Kovaric introduces the notion of A-morphization, in which we model some entity as a particular parametrization of a certain architecture A. He says that if the best model of some entity is an A-morphization, and if A is an agent-like architecture, then we may call the entity an agent. But this just begs the question of how we determine whether A is an agent-like architecture. On this, Kovaric simply notes that different people will see different architectures as being agent-like. Of particular interest is the following question posed at the end of the article: Is there a common fundamental physical structure or computation behind all agent-like behavior? Overall I see this piece as working primarily from within the agent frame.

Laurent Orseau, Simon McGill, and Shane Legg have published Agents and Devices: A Relative Definition of Agency, in which they describe the construction of a classifier that assigns probabilities to whether an object is a device or an agent. A device is taken to be something that operates according to a mechanistic input-output mapping, and is modelled formally by the authors using the speed prior (a fast computable approximation to the Solomonoff prior). An agent is something we model as having beliefs and making decisions according to an objective function. The authors assume that some set of possible utility functions are given (in the experiments they are goals of reaching certain labelled points in a maze), then use inverse reinforcement learning with a switching prior to perform inference on which goal might have been sought at which time. Having done this, they can compare the hypothesis that a certain object is a device to the hypothesis that it is an agent. This work is of great value from the perspective of the present essay as a fully fleshed-out operationalization of what exactly the agent model entails.

The field of embodied cognition views human cognition as deeply dependent on the body. A related field, embedded cognition, views human cognition as deeply dependent on the natural and social environment in which an organism is immersed. This large field is highly abstract, draws on ideas from continental philosophy, and seems difficult to penetrate, yet, due to its focus on cognitive processes embedded in the physical world, it may contain insights of interest to the development of a post-agency understanding of intelligent systems. Of particular interest for a follow-up post is Rodney Brooks’ work on the subsumption architecture and "intelligence without representation".

Conclusion

The agent model is an exceptionally powerful model, and for this reason it is the primary model with which we have chosen to understand the entities on this planet that exert greatest influence over the future. It is precisely because of the power of this model that we have come to rely upon it so heavily. But when we use one frame to the exclusion of all others, we may forget that we are using a frame at all, and begin to accept the confines of that frame as a feature of reality itself, not as a temporarily and voluntarily adopted way of seeing.

I believe this has happened with the agent model. It seems to me that we are so acquainted with the agent model that we have lost track of the ways that it is shaping our view of reality. As we build advanced AI systems, we should carefully examine the pros and cons of the frames that we use, including the agent model, or else we may miss whole regions of the design space without noticing. In this essay I have attempted to lay out some of the pitfalls of using the agent model.

1. This is one of the most confusing things about conceptual work in the field of AI. This field is unique among all engineering disciplines in that the object of our engineering efforts has the potential to use frames of its own as it perceives the world. As Eliezer wrote about repeatedly in the sequences, it is critical to be extremely clear about what is a frame that we are using to think about building an AI, and what is a frame being used by an AI to think about taking action in the world. ↩︎

2. At least not if the value function was permitted to change arbitrarily between each step. Perhaps IRL could be made to work with a changing value function given some constraints on its rate of change, but classical IRL does not handle this case. ↩︎

Discuss

### Covid-19 in India: Why didn't it happen earlier?

April 27, 2021 - 22:13
Published on April 27, 2021 7:13 PM GMT

When I look at the numbers of Covid-19 cases and Covid-19 deaths among G20 countries, India still has a relatively low number per capita. The arguments for why the current wave is particularly catastrophic in India (like population density or poverty) have applied all along, however; the mutation is the exception. How did India manage to have relatively low case rates up to now?

Discuss

### Jaan Tallinn's 2020 Philanthropy Overview

April 27, 2021 - 19:22
Published on April 27, 2021 4:22 PM GMT

to follow up my philanthropic pledge from last year, i've updated my philanthropy page with 2020 results.

TL;DR: i made $4.3MM worth of endpoint grants in 2020, mostly via SFF's grant rounds, beating my minimum commitment by about 2x.

Discuss

### Scott Alexander 2021 Predictions: Market Prices

April 27, 2021 - 17:03
Published on April 27, 2021 2:03 PM GMT

Scott Alexander has posted some predictions for 2021. Taking Zvi's approach from here, rather than making my own adjustments, I am making my best estimate of what various prediction (and financial) markets say the odds are. If anyone has seen a market for any of the questions listed here for which I haven't found one, let me know and I'll add it.

US/WORLD

1. Biden approval rating (as per 538) is greater than 50%: 80%

Metaculus gives this 61%. (Much lower than both Zvi and Scott.)

2. Court packing is clearly going to happen (new justices don’t have to be appointed by end of year): 5%

PredictIt gives a lower bound on this at 5%. Metaculus gives a 27% chance to court packing by 2030. Assuming the chances of this happening are mostly weighted sooner rather than later (4/3/2/1 over Biden's term + 30% '24-'30), this would give a 4% chance this year.

3. Yang is New York mayor: 80%
4. Newsom recalled as CA governor: 5%
5. At least $250 million in damage from BLM protests this year: 30%
6. Significant capital gains tax hike (above 30% for highest bracket): 20%
7. Trump is allowed back on Twitter: 20%

Not aware of any markets

8. Tokyo Olympics happen on schedule: 70%

Note that Metaculus is talking about an Olympics at any point in 2021, whereas the others are about the Olympics being on schedule.

9. Major flare-up (significantly worse than anything in past 5 years) in Russia/Ukraine war: 20%

Metaculus has a related question at 32%. If we assume that question is about reaching roughly the same level as in the past, it presumably puts the chances of something worse at ~16%.

10. Major flare-up (significantly worse than anything in past 10 years) in Israel/Palestine conflict: 5%
11. Major flare-up (significantly worse than anything in past 50 years) in China/Taiwan conflict: 5%

Not aware of any markets (specific to 2021).

12. Netanyahu is still Israeli PM: 40%

PredictIt has this at 22%

13. Prospera has at least 1000 residents: 30%

Not aware of one. I think Metaculus is making one, though.

14. GME > $100 (currently $170): 50%

Possibly the most concrete one. The options market is giving this 60% chance right now. (Actually, it's giving that until 21-Jan-22, so the chances are even higher than that)

15. Bitcoin above 100K: 40%

Metaculus gives this 43% (at any point)
Deribit options give this 25%(at the end of the year)

16. Ethereum above 5K: 50%

Deribit options give this 11% (at the end of the year)

17. Ethereum above 0.05 BTC: 70%

18. Dow above 35K: 90%

Option market gives this 50%

19. …above 37.5K: 70%

Option market gives this 20%

20. Unemployment above 5%: 40%

Metaculus gives this 37%

Not seen a market for this

22. Starship reaches orbit: 60%

Metaculus gives this 50%

COVID

23. Fewer than 10K daily average official COVID cases in US in December 2021: 30%
24. Fewer than 50K daily average COVID cases worldwide in December 2021: 1%

Not seen a market

25. Greater than 66% of US population vaccinated against COVID: 50%

Metaculus gives this 77% (their line is 69% vaccinated, so their probability for 66% is even higher)

26. India’s official case count is higher than US: 50%

Not seen a market

27. Vitamin D is generally recognized (eg NICE, UpToDate) as effective COVID treatment: 70%

None of Metaculus' 4 questions about Vit-D are >= 25%.

28. Something else not currently used becomes first-line treatment for COVID: 40%
29. Some new variant not currently known is greater than 25% of cases: 50%
30. Some new variant where no existing vaccine is more than 50% effective: 40%

Not seen a market

31. US approves AstraZeneca vaccine: 20%

Metaculus gives this 37%

32. Most people I see in the local grocery store aren’t wearing a mask: 60%

Not seen a market

Discuss

### Can you improve IQ by practicing IQ tests?

April 27, 2021 - 14:28
Published on April 27, 2021 11:28 AM GMT

As a European, I have never taken an IQ test, nor do I know anybody who (to my knowledge) was ever administered one. I have looked at some facsimile IQ tests on the internet, especially Raven's matrices.

When I began to read online blogs from the United States, I started to see references to the concept of IQ. I am very confused by the fact that the IQ score seems to be treated as a stable, intrinsic characteristic of an individual (like height or visual acuity).

When you constantly practice some task, you usually become better at that task. I imagine that there exists a finite number of ideas required to solve Raven matrices: even when someone invents new Raven matrices for new IQ tests, he will do so by remixing the ideas used for previous Raven matrices, because, as Cardano said, "there is practically no new idea which one may bring forward".

The IQ score is the result of an exam, much like school grades. But it is generally understood that school grades are influenced by how much effort you put into preparing for the exam, by how much your family cares about your grades, and so on. I expect school grades to be fairly correlated with income, or with other measures of "success".

In a hypothetical society in which all children had to learn chess, and being bad at chess was regarded as shameful, I guess that the Elo chess ratings of 17-year-olds would be highly correlated with later achievements. Are IQ tests the only exception to the rule that your grade in an exam is influenced by how much you prepare for that exam? Is there a sense in which IQ is a more "intrinsic" quantity than, for example, the AP exam score or the Elo chess rating?

Discuss

### Agents Over Cartesian World Models

April 27, 2021 - 05:06
Published on April 27, 2021 2:06 AM GMT

Thanks to Adam Shimi, Alex Turner, Noa Nabeshima, Neel Nanda, Sydney Von Arx, Jack Ryan, and Sidney Hough for helpful discussion and comments.

Abstract

We analyze agents by supposing a Cartesian boundary between agent and environment. We extend partially-observable Markov decision processes (POMDPs) into Cartesian world models (CWMs) to describe how these agents might reason. Given a CWM, we distinguish between consequential components, which depend on the consequences of the agent's action, and structural components, which depend on the agent's structure. We describe agents that reason consequentially, structurally, and conditionally, comparing safety properties between them. We conclude by presenting several problems with our framework.

Introduction

Suppose a Cartesian boundary between agent and environment:[1]

We describe how the agent interfaces with the environment with four maps: observe, orient, decide, and execute.[2]

• observe:E→ΔO describes how the agent observes the environment, e.g., if the agent sees with a video camera, observe describes what the video camera would see given various environmental states. If the agent can see the entire environment, the image of observe consists of distinct point distributions. In contrast, humans can see the same observation for different environmental states.
• orient:O×I→ΔI describes how the agent interprets the observation, e.g., the agent's internal state might be memories of high-level concepts derived from raw data. If there is no historical dependence, orient depends only on the observation. In contrast, humans map multiple observations onto the same internal state.
• decide:I→ΔA describes how the agent acts in a given state, e.g., the agent might maximize a utility function over a world model. In simple devices like thermostats, decide maps each internal state to one of a small number of actions. In contrast, humans have larger action sets.
• execute:E×A→ΔE describes how actions affect the environment, e.g., code that turns button presses into game actions. If the agent has absolute control over the environment, for all e∈E, the image of execute(e,⋅) is all point distributions over E. In contrast, humans do not have full control over their environments.
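As a rough sketch of how these four maps compose, here is a toy thermostat-style agent in Python; the maps, types, and numbers are all illustrative assumptions (not from the post), and each stochastic map (Δ) is represented by a function that returns a sample:

```python
import random

# A minimal sketch of the agent/environment interface for a toy
# thermostat-like agent. All names and numbers are illustrative.

def observe(e):
    """E -> ΔO: a noisy temperature reading of the environment state."""
    return e + random.choice([-1, 0, 1])

def orient(o, i):
    """O x I -> ΔI: internal state as a running average of observations."""
    return 0.5 * i + 0.5 * o

def decide(i):
    """I -> ΔA: thermostat-style policy with a small action set."""
    return "heat" if i < 20 else "idle"

def execute(e, a):
    """E x A -> ΔE: heating warms the room, idling lets it cool."""
    return e + (1 if a == "heat" else -1)

# One pass around the loop: environment -> observation -> internal state
# -> action -> new environment.
e, i = 15, 15.0
o = observe(e)
i = orient(o, i)
a = decide(i)
e = execute(e, a)
```

Running the loop repeatedly would trace out the agent's interaction history.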

We analyze agents from a mechanistic perspective by supposing they are maximizing an explicit utility function, in contrast with a behavioral description of how they act. We expect many training procedures to produce mesa-optimizers that use explicit goal-directed search, making this assumption productive.[3]

Consequential Types

We use four types of objects (actions, observations, environmental states, and internal states) and four maps between them (observe, orient, decide, and execute) to construct a world model. The maps are functions, but functions are also types. We will refer to the original four types as consequential types and the four maps as structural types.

We can broadly distinguish between four type signatures of utility functions over consequential types, producing four types of consequential agents.[4]

• Environment-based consequential agents assign utility to environmental states. Most traditional agents are of this type. Examples include the Stamp Collector, a paperclip maximizer, and some humans, e.g., utilitarians that do not value themselves.
• Internal-based consequential agents assign utility to different internal states. Very few "natural" agents are of this type. Examples include meditation bot, which cares only about inner peace, happiness bot, which cares only about being happy, and some humans, e.g., those that only value their pleasure.
• Observation-based consequential agents assign utility to different observations. Many toy agents have bijective observe functions and could be either observation-based or environment-based. Examples include virtual reality bot, which wants to build itself a perfect VR environment, video game agents that value the observation of the score, and some humans, e.g., those that would enter the experience machine.
• Action-based consequential agents assign utility to different actions. Very few "natural" agents are of this type. Examples include twitch bot, which just wants to twitch, ditto bot, which wants to do whatever it did previously, and some types of humans, e.g., deontologists.[5]

Some agents have utility functions with multiple types, e.g., utilitarian humans with deontological side constraints are environment/act-based consequential agents. Some humans value environmental states and personal happiness, making them environment/internal-based consequential agents. Question answering agents value different answers for different questions, making them internal/act-based consequential agents.

Agents could also have utility functions over structural types. We defer discussion of such structural agents until after we have analyzed consequential agents.

Consequential Agents

Cartesian world models

We modify discrete-time partially observable Markov decision processes (POMDPs) by removing the reward function and discount rate and adding internal states. Following Hadfield-Menell et al., who call a Markov decision process (MDP) without reward a world model, we will refer to a discrete-time POMDP without reward (POMDP\R) and with internal states as a Cartesian world model (CWM).[6]

Formally, a Cartesian world model is a 7-tuple (E,O,I,A,observe,orient,execute), where

• E is the set of environmental states (called "states" in POMDP\Rs),
• O is the set of observations the agent could see (also called "observations" in POMDP\Rs),
• I is the set of internal states the agent could have (not present in POMDP\Rs),
• A is the set of actions available to the agent (also called actions in POMDP\Rs),
• observe:E×A→ΔO is a function describing observation probabilities given environmental states (called "conditional observation probabilities" in POMDP\Rs),
• orient:O×I→ΔI is a function describing internal state probabilities given observations (not present in POMDP\Rs),
• execute:E×A→ΔE is a function describing transition probabilities between different states of the environment given different actions (called "conditional transition probabilities" in POMDP\Rs).

At each time period, the environmental state is in some e∈E, and the agent's internal state is some i∈I. The agent decides upon an action a∈A, which causes the environment to transition to state e′ sampled from execute(e,a). The agent then receives an observation o∈O sampled from observe(e′,a), causing the agent to transition to internal state i′ sampled from orient(o,i).

This produces the initial 4-tuple of context c0:=(e,i,a,o), representing the initial environment state, the agent's initial internal state, the agent's initial action, and the agent's initial observation. Each subsequent time step t produces an additional 4-tuple of context ct.
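The update order described above can be sketched as a rollout that records the 4-tuples of context; the toy maps below are deterministic stand-ins for the stochastic maps in the definition, and every name and number is illustrative:

```python
# A minimal sketch of a CWM rollout, recording the 4-tuple of context
# c_t = (e, i, a, o) at each step. Deterministic toy stand-ins replace
# the stochastic maps; all names are illustrative.

def execute(e, a):        # E x A -> ΔE
    return e + a

def observe(e, a):        # E x A -> ΔO
    return e % 2          # the agent only sees the parity of the state

def orient(o, i):         # O x I -> ΔI
    return (i + o) % 4    # a small internal memory of recent parities

def decide(i):            # I -> ΔA
    return 1 if i < 2 else -1

def rollout(e0, i0, steps):
    e, i, contexts = e0, i0, []
    for _ in range(steps):
        a = decide(i)
        e_next = execute(e, a)      # environment transitions first
        o = observe(e_next, a)      # observation of the new state
        i_next = orient(o, i)       # internal-state update
        contexts.append((e, i, a, o))
        e, i = e_next, i_next
    return contexts

contexts = rollout(e0=0, i0=0, steps=3)
```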

MDPs are traditionally used to model decision-making situations; agents trained to achieve high reward on individual MDPs implement policies that make "good" decisions. In contrast, we intend CWMs to capture how agents might model themselves and how they interact with the world.

Consequential Decision Making

Agents making decisions over CWMs attempt to maximize some utility function. To emphasize that agents can have utility functions of different types, we make the type explicit in our notation. For example, a traditional agent in a POMDP will be maximizing expected utility over an environment-based utility function, which we will denote U{E}(e,t), with the second parameter making explicit the time-dependence of the utility function.

More formally, let T be the set of possible consequential type signatures, equal to P({E,O,I,A}). For some T∈T, let UT:T×N→R be the agent's utility function, where T can vary between agents. Recall that ct:=(e,i,a,o) is the 4-tuple of CWM context at time t. Let ct|T be ct restricted to only contain elements of types in T. A consequential agent maximizes expected future utility, which is equal to E[∑∞t=0UT(ct|T,t)].

UT's time-dependence determines how much the agent favors immediate reward over distant reward. When t>0⟹UT(ct|T,t)=0, the agent is time-limited myopic; it takes actions that yield the largest immediate increase in expected utility. When UT(ct|T,t)≈UT(ct|T,t+1), the agent is far-sighted; it maximizes the expected sum of all future utility.
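The myopic/far-sighted distinction can be sketched with a toy comparison; the plans, per-step utilities, and weights below are illustrative assumptions, not from the post:

```python
# A minimal sketch contrasting a time-limited myopic agent with a
# far-sighted one. Two candidate plans yield per-step utilities
# U_T(c_t|_T, t); the agents weight those utilities differently.

plans = {
    "grab_now": [1, 0, 0],   # large immediate utility, nothing later
    "wait":     [0, 0, 10],  # nothing now, large utility later
}

def expected_utility(stream, weights):
    return sum(w * u for w, u in zip(weights, stream))

myopic_weights = [1, 0, 0]      # U_T is zero for all t > 0
farsighted_weights = [1, 1, 1]  # U_T(·, t) ≈ U_T(·, t+1)

myopic_choice = max(plans, key=lambda p: expected_utility(plans[p], myopic_weights))
farsighted_choice = max(plans, key=lambda p: expected_utility(plans[p], farsighted_weights))
```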

Examples

Example: Paperclip maximizer

Consider a paperclip maximizer that can only interface with the world through a computer. Define a Cartesian world model (E,O,I,A,observe,orient,execute) as follows:

• E is all ways the universe could be,
• O is all values a 1080 x 1920 monitor can take,
• I is the set of internal states, which we leave unspecified,
• A is the set of keycodes plus actions for mouse usage,
• observe maps the environment to the computer screen; this will sometimes be correlated with the rest of the environment, e.g., through the news,
• orient maps the computer screen to an internal state, which we leave unspecified,
• execute describes how keycodes and mouse actions interact with the computer.

Additionally, let our agent's utility function be U{E}(e,t) = the number of paperclips in e.

Many of these functions are not feasible to compute. In practice, the agent would be approximating these functions using abstractions, similar to the way humans can determine the consequences of their actions. This formalism makes clear that paperclip maximizers have environment-based utility functions.

Example: twitch-bot

Twitch-bot is a time-limited myopic action-based consequential agent that cares only about twitching. Twitch-bot has no sensors and is stateless. We can represent twitch-bot in a CWM+U as follows:

• E is all ways the universe could be,
• O={∅}; twitch-bot can only receive the empty observation,
• I={∅}; twitch-bot has only the empty state,
• A={twitch,∅}; twitch-bot has only two actions, twitching and not twitching,
• observe always gives ∅,
• orient always gives ∅,
• execute describes how twitching affects the world,

Additionally, let our agent's utility function be U{A}(twitch,0)=1, with U{A}(a,t)=0 otherwise.

The optimal decide for this CWM+U always outputs twitch.
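Twitch-bot's optimal policy can be derived directly from these components; the following is a minimal sketch in which representing ∅ as None is an illustrative choice:

```python
# A minimal sketch of twitch-bot's CWM+U and its optimal decide.
# The environment set E is omitted, since twitch-bot never observes it;
# representing the empty observation/state/action ∅ as None is an
# illustrative encoding choice.

O = {None}              # only the empty observation
I = {None}              # only the empty internal state
A = {"twitch", None}    # twitch, or do nothing

def U_A(a, t):
    """Action-based, time-limited myopic utility: U{A}(twitch, 0) = 1."""
    return 1 if (a == "twitch" and t == 0) else 0

def decide(i):
    """Optimal decide: map every internal state to the action with the
    highest immediate utility."""
    return max(A, key=lambda a: U_A(a, t=0))
```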

Example: Akrasia

Humans sometimes suffer from akrasia, or weakness of will. These humans maximize expected utility in some mental states but act habitually in others. Such humans have utility functions of type {E,I,A}; in some internal states, the human is an environment-based consequential agent, and in other internal states, the human is an action-based consequential agent.

Relation to Partially Observable Markov Decision Processes

We desire to compare the expressiveness of CWMs to POMDPs. Since we want to compare CWMs to POMDPs directly, we require that the agents inside each have the same types. We will call a POMDP P equivalent to CWM C if there exists a utility function U such that the agent optimal in P is optimal with respect to U in C and vice-versa.

Partially Observable Markov Decision Processes ⊆ Cartesian World Models

Given a POMDP, we can construct a Cartesian world model + utility function (CWM+U) that has an equivalent optimal consequential agent. Let (S′,A′,T′,R′,Ω′,O′,γ) be a POMDP, where

• S′ is a set of states,
• A′ is a set of actions,
• T′ is a set of conditional transition probabilities between states,
• R′:S′×A′→R is the reward function,
• Ω′ is the set of observations,
• O′ is the set of conditional observation probabilities, and
• γ∈[0,1] is the discount factor.

Recall that an optimal POMDP agent maximizes E[∑∞t=0γtrt], where rt is the reward earned at time t.

Let our Cartesian world model be a 7-tuple (E,O,I,A,observe,orient,execute) defined as follows:

• E=S′,
• O=Ω′
• I=ΔS′, the set of belief distributions over S′,
• A=A′,
• observe=O′,
• orient(o,i) describes the Bayesian update of a belief distribution when a new observation is made, and
• execute=T′

Additionally, let our agent's utility function be U{E,A}(e,a,t)=R′(e,a)γt.

Since a POMDP has the Markov property for beliefs over states, an agent in the CWM+U has the same information as an agent in the POMDP. Consequential agents in CWMs maximize E[∑∞t=0UT(ct|T,t)]. Substituting, this is equivalent to E[∑∞t=0U{E,A}(ct|{E,A},t)]. U{E,A}(ct|{E,A},t)=R′(et,at)γt=rtγt, so E[∑∞t=0UT(ct|T,t)]=E[∑∞t=0rtγt]. Thus our agents are maximizing the same quantity, so an agent is optimal with respect to the CWM+U if and only if it is optimal with respect to the POMDP.
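A quick numeric check of this substitution, under an illustrative toy reward and trajectory (none of which come from the post): folding γ into the CWM utility's time dependence makes the two objectives agree on any trajectory.

```python
# A minimal sketch checking that the discounted POMDP return equals the
# CWM objective when U{E,A}(e, a, t) = R'(e, a) * γ^t. The reward
# function, γ, and trajectory are illustrative.

gamma = 0.9

def R(e, a):                # toy POMDP reward R'(e, a)
    return e + a

def U_EA(e, a, t):          # CWM utility with γ folded into the time index
    return R(e, a) * gamma ** t

trajectory = [(0, 1), (1, 0), (2, 1)]  # (e_t, a_t) pairs

pomdp_return = sum(gamma ** t * R(e, a) for t, (e, a) in enumerate(trajectory))
cwm_return = sum(U_EA(e, a, t) for t, (e, a) in enumerate(trajectory))
```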

Cartesian World Models ⊈ Partially Observable Markov Decision Processes

If we require the CWM agent to have the same type signature as in the POMDP, some CWMs do not have equivalent POMDPs. Agents in CWMs map internal states to actions, whereas agents in POMDPs map belief distributions over states to actions. Therefore, agents optimal in a POMDP have infinitely many internal states. Since one can construct a CWM with finitely many internal states, there must exist a CWM that cannot be converted to an equivalent POMDP.

This non-equivalence is basically a technicality and thus unsatisfying. A more satisfying treatment would use a less rigid definition of equivalence that allowed the CWM agent and the POMDP agent to have different types. In particular, it might be possible to construct a POMDP and a partition over belief states such that the elements of the partition can be put in correspondence with the internal states in the CWM. We leave exploration of this question to future work.

Structural Agents

Structural Decision Making

In contrast to consequential agents, structural agents have utility functions over structural types. We model structural agents as reasoning about a CWM with a modified decision procedure. In this framework, the agent is not trying to select utility-maximizing actions but instead trying to enforce utility-maximizing relations between consequential types. The agent is optimizing over the set of possible decide functions instead of the set of possible actions. Note that an agent optimizing in this way could still have utility functions of multiple types in the same way that an environment-based consequential agent still optimizes over its possible actions.

Let a Cartesian World Model plus Self (CWM+S) be an 8-tuple (E,O,I,A,observe,orient,execute,decide), where the first 7 entries are a CWM and the last entry is the decide function of an agent. Let CWM+S be the set of all CWM+Ss.[7] The utility function of a structural agent is a function U:CWM+S→R that assigns utility to each CWM+S.

Let C be a CWM and decide:I→A be a decision function. Let C+decide be the CWM+S that agrees with C for the first 7 entries and is equal to decide for the 8th. This construction omits acausal implications of changing decide. A method of constructing CWM+Ss that include acausal implications is currently an open problem.

Recall that AI is the set of all functions from I to A. A structural agent reasoning according to a CWM C acts to implement decide∗=argmaxdecide∈AIU(C+decide). Behaviorally, when given an internal state i, an optimal structural agent will take a=decide∗(i).
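For finite I and A, this optimization can be sketched by brute-force enumeration of AI; the sets and the structural utility function below are illustrative stand-ins:

```python
import itertools

# A minimal sketch of structural decision-making over a finite CWM:
# enumerate every decide: I -> A (the set A^I) and pick the one with
# the highest structural utility U(C + decide). The utility here is an
# illustrative stand-in that rewards mapping each state to itself.

I = [0, 1]
A = [0, 1]

def U(decide_map):
    """Structural utility of a candidate decide, given as a dict i -> a."""
    return sum(1 for i in I if decide_map[i] == i)

# A^I: all assignments of an action to each internal state.
candidates = [dict(zip(I, assignment))
              for assignment in itertools.product(A, repeat=len(I))]
decide_star = max(candidates, key=U)
```

In realistic CWMs the set AI is far too large to enumerate, so this search would have to be approximated.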

Examples

Pseudo-example: Updateless Decision Theory

An agent using updateless decision theory (UDT) is a structural agent with a consequential utility function over environmental states reasoning over a CWM+S that includes acausal implications. In order to convert the consequential utility function Ub into a structural one Us, we simply define Us(C+decide) by rolling out C+decide to produce a sequence e0,e1,e2,... of environmental states and let Us(C+decide)=∑∞t=0Ub(et)λt with some discount factor λ.

Example: Structural HCH-bot

In the limit of training, imitative amplification produces Humans Consulting HCH (HCH). We describe a structural agent that is implementing HCH.[8] The agent receives input via a computer terminal and outputs text to the same terminal. It has 1 MB of memory. In the following, let Σ be the alphabet of some human language, including punctuation and spaces:

• E is all ways the universe could be,
• O=Σ∗, all finite strings from Σ,
• I=P({n∣0≤n<8×106}), the set of ways 1 MB of memory can be set,
• A=ΔΣ<1000, the set of probability distributions over strings from Σ of less than 1000 characters,
• observe yields the command typed in at the computer terminal with probability ~1,
• orient stores the last 1 Mb of the string given by observe,
• execute describes the world state after outputting a given string at the terminal.

Let P be some distribution over possible inputs. Let HCH:I→ΔA be the HCH function. Let our agent's utility function U be defined as U(decide)=Ei∼P[KL(HCH(i)||decide(i))].[9]

It is unclear which P makes the agent behave properly. One possibility is to have P be the distribution of what questions a human is likely to ask.[10] Any powerful agent likely has a human model, so using the human distribution might not add much complexity.
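Evaluating this objective for a toy input distribution can be sketched as follows; the input set, the distributions standing in for HCH and decide, and the two-string output alphabet are all illustrative assumptions:

```python
import math

# A minimal sketch of evaluating structural HCH-bot's objective
# E_{i~P}[KL(HCH(i) || decide(i))] on a toy input set. HCH and decide
# are stand-in distributions over two possible output strings; the
# agent wants this expected divergence to be small.

inputs = ["q1", "q2"]
P = {"q1": 0.5, "q2": 0.5}                  # distribution over inputs

HCH = {"q1": {"yes": 0.9, "no": 0.1},       # target output distributions
       "q2": {"yes": 0.2, "no": 0.8}}
decide = {"q1": {"yes": 0.8, "no": 0.2},    # candidate decide function
          "q2": {"yes": 0.3, "no": 0.7}}

def kl(p, q):
    """KL divergence between two finite distributions given as dicts."""
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

objective = sum(P[i] * kl(HCH[i], decide[i]) for i in inputs)
```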

Example: Structural Decoupling

A consequential approval-maximizing agent takes the action that gets the highest approval from a human overseer. Such agents have an incentive to tamper with their reward channels, e.g., by persuading the human they are conscious and deserve reward.

In contrast, a structural approval-maximizing agent implements the decide function that gets the highest approval from a human overseer. Such agents have no incentive to directly tamper with their reward channels, but they still might implement decision functions that appear safe without being safe. However, a decide function that overtly manipulates the overseer will get low approval, so structural approval-maximizing agents avoid parts of the reward tampering problem.

This example is inspired by the decoupled approval described in Uesato et al.[11]

Types of Structural Agents

There are roughly four types of consequential agents, one for each consequential type. This correspondence suggests there are four types of structural agents, one for each structural type.

Agents with utility functions over decide are coherent. However, since we do not include acausal implications when constructing a CWM+S, agents with utility functions over orient, observe, or execute have constant utility. More specifically, the agent only has control over its own decide function, which does not have influence over orient, observe, or execute (within the CWM+S), so agents with utility functions over those types will not be able to change anything they care about. How structural agents act when we include acausal implications is currently an open problem.

Utility/Decision Distinction

Besides having utility functions with different type signatures, structural agents also make decisions differently. We have two dimensions of variation: structural versus consequential utility functions and structural versus consequential decision-making. These dimensions produce four possible agents: pure consequential, pure structural, decision-consequential utility-structural, and decision-structural utility-consequential.

A pure consequential agent makes consequential decisions to maximize a consequential utility function; it reasons about how taking a certain action affects the future sequence of consequential types. A purely consequential environment-based agent takes actions to maximize ∑∞t=0U(et)γt for some discount factor γ.

A pure structural agent makes structural decisions to maximize a structural utility function; it reasons about how implementing a certain decide affects the structural types of its CWM+S. A purely structural decide-based agent implements the decide function to maximize U(C+decide), where C is the agent's CWM.

A decision-consequential utility-structural agent makes consequential decisions to maximize a structural utility function; it reasons about how taking a certain action affects how it models the world. For example, a decision-consequential utility-structural orient-based agent might rewrite its source code. If decision-consequential utility-structural agents are not time-limited myopic, they will take over the world to securely achieve desired CWM structural properties.[12]

A decision-structural utility-consequential agent makes structural decisions to maximize a consequential utility function. If only causal implications are included, a decision-structural utility-consequential agent behaves identically to a purely consequential agent. If we include acausal implications, decision-structural utility-consequential agents resemble UDT agents. Traditional UDT agents are decision-structural utility-consequential environment-based agents.

Pure Structural decide-based Agents = Time-Limited Myopic Action/Internal-based Behavioral Agents

Pure structural decide-based agents can be expressed as time-limited myopic action/internal-based consequential agents and vice versa. Let decide∗ be optimal according to our pure structural agent's utility function. To construct an action/internal-based consequential utility function for which decide∗ is optimal, define U such that ∀i∈I,a∈A:U(i,a)=1⟺decide∗(i)=a and 0 otherwise. To show the inverse, construct a structural utility function that assigns maximal utility to the time-limited myopic action/internal-based consequential agent's decision function.

These agents are behaviorally identical but mechanistically distinct; they use different decision mechanisms and have different types of utility functions.

Connection to Act-Based Agents

Act-based agents focus on satisfying users' short-term instrumental preferences. These agents might be safe insofar as learning short-term preferences naturally avoids catastrophic generalization. For instance, learning that killing is bad might be easy, allowing weak agents to avoid catastrophe.

Paul Christiano postulates a spectrum of agents, ranging from those that satisfy "narrow" preferences to those that satisfy "broad" preferences:

Consider a machine choosing a move in a game of chess. I could articulate preferences over that move (castling looks best to me), over its consequences (I don't want to lose the bishop), over the outcome of the game (I want to win), over immediate consequences of that outcome (I want people to respect my research team), over distant consequences (I want to live a fulfilling life).

In contrast, our framework makes a sharp distinction between agents that use consequential versus structural decision-making. The above spectrum is composed of consequential agents with differing amounts of time-limited myopia. Such agents are unsafe because they would take over the world if they could do so fast enough. In general, all agents that reason over their actions' consequences are dangerous because we do not know reality's causal structure. We hope that structural agents provide a step towards constructing agents that do not use intrinsically dangerous decision-making procedures.

Input Distribution Problem

decide-based structural agents attempt to implement specific decide functions, which will often require determining the distance to a target function, e.g., the HCH function. Unless the distance metric treats all inputs identically, as with the sup norm, the metric requires a distribution over inputs. For instance, recall that structural HCH-bot's utility function, U(decide) = −E_{i∼P}[KL(HCH(i) ‖ decide(i))], depends on an input distribution P.

However, the distribution of inputs depends on how the agent responds to various inputs, creating an acausal implication between the agent's actions and what inputs it receives. For example, what you google depends on Google's capabilities. Since the agent's utility depends on the inputs it receives, this acausal implication incentivizes the agent to implement a decide function that shifts the distribution towards high-scoring inputs. The agent is not optimizing over decide functions, but rather jointly optimizing over (decide function, input distribution) pairs. More concretely, the agent has an incentive to hide its full capabilities so it will not be asked difficult questions. If the agent can only answer questions with obvious answers, it will probably be asked questions with easy answers, acausally shifting P to a higher-utility distribution.[13]
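A toy numerical illustration of this incentive, with all scores and probabilities invented: a policy that looks less capable attracts easier questions and ends up with higher expected utility.

```python
# score[policy][question]: utility for answering that question under that policy.
score = {
    "capable": {"easy": 1.0, "hard": 0.4},
    "limited": {"easy": 1.0, "hard": 0.0},
}
# The input distribution acausally depends on the policy: an apparently
# capable agent gets asked hard questions.
input_dist = {
    "capable": {"easy": 0.2, "hard": 0.8},
    "limited": {"easy": 0.9, "hard": 0.1},
}

def expected_utility(policy):
    return sum(p * score[policy][q] for q, p in input_dist[policy].items())

# The "limited" policy wins despite being strictly worse at answering,
# because it shifts the input distribution toward easy questions.
print(round(expected_utility("capable"), 2))  # 0.52
print(round(expected_utility("limited"), 2))  # 0.9
```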

This acausal implication reduces capabilities, but it might also be a problem for alignment. The capabilities hit from acausally optimizing the input distribution does not appear to intrinsically produce alignment failures; the problem arises only when the agent thinks of itself as having logical control over other instances of itself. This pattern of reasoning potentially results in deceptive alignment arising through acausal means.[14] In general, any agent that has uncertainty over its input might be able to acausally influence the input distribution, potentially resulting in undesirable behavior.

Conditional Agents

Conditional Decision Making

Traditional utility functions map types, e.g., environmental states, to utility. In contrast, conditional utility functions map types to utility functions. For example, an environment-conditional utility function takes in an environmental state and yields a utility function over other environmental states, actions, observations, internal states, etc. We will refer to the utility function given by a conditional utility function as the base utility function.

Conditional agents might make decisions in the following way. Let U be a conditional utility function. Upon having internal state i, the agent acts as if it has the base utility function U(argmax_{s∈S} P(s|i)), where S varies between E, O, I, and A depending on whether U is environmental-, observational-, internal-, or action-conditional. The agent then reasons as if it were a structural or consequential agent, depending on the type of the base utility function.[15] Action-conditional agents might run into issues around logical uncertainty.
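This decision procedure can be sketched as follows; the posterior `P`, the conditional utility `U_cond`, and the action set are all hypothetical stand-ins:

```python
# Sketch of conditional decision making: back-infer the most probable state
# given the internal state i, adopt the base utility function for that state,
# then act as a maximizer of the base utility function.

def conditional_decide(i, P, U_cond, actions):
    s_star = max(P(i), key=P(i).get)  # collapse the posterior to its mode
    base_U = U_cond(s_star)           # adopt that state's base utility function
    return max(actions, key=base_U)

# Toy example: two possible worlds with opposite preferences over actions.
P = lambda i: {"sunny": 0.7, "rainy": 0.3} if i == "bright" else {"sunny": 0.2, "rainy": 0.8}
U_cond = lambda s: (lambda a: 1 if (s == "sunny") == (a == "picnic") else 0)
print(conditional_decide("bright", P, U_cond, ["picnic", "stay_in"]))  # picnic
```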

Examples

Example: Value Learner

A simple value-learning agent observes a human and infers the human utility function, which the agent then optimizes. Ignoring the issues inherent in inference, such an agent can be thought of as having a conditional utility function that maps observations to utility functions over environmental states, i.e., of type O→(E×N→R) (Recall that we include explicit time-dependence to allow for arbitrary discounting possibilities).

Shah et al.'s Preferences Implicit in the State of the World attempts to construct agents that infer human preferences based on the current environmental state. In our framework, these agents have conditional utility functions that map environmental states to utility functions over environmental states, i.e., of type E→(E×N→R).

Example: Argmax

Given a CWM, argmax takes in a utility function and outputs the action that maximizes that utility function. Since argmax only has access to its internal representation of the utility function, we can think of argmax as having a conditional utility function that maps internal states to utility functions over all types, i.e. of type I→(A×O×E×I×N→R).

Example: Imprinting-bot

Imprinting-bot is an agent that tries to imitate the first thing it sees, similar to how a baby duck might follow around the first thing it sees when it opens its eyes. Such an agent can be thought of as having a conditional utility function that maps observations to utility functions over decide functions, i.e., of type O→(A^I→R).

Example: Conditional HCH-bot

Recall that HCH : I→ΔA is the HCH function. Let HCH(i) be the distribution of actions HCH would take given internal state i, and let HCH(i)(a) be the probability that HCH takes action a given internal state i.

Structural HCH-bot gets higher utility for implementing decide functions that are closer to HCH in terms of expected KL-divergence relative to some input distribution. Conditional behavioral-HCH-bot conditions on the internal state, then gets utility for outputting distributions closer to the distribution HCH would output given its current internal state as input. More precisely, conditional behavioral-HCH-bot has a conditional utility function U : I→(A×N→R) defined by U(i) = (a,n) ↦ log(HCH(i)(a)) if n = 0, else 0.

Conditional structural-HCH-bot conditions on the internal state, then attempts to implement a decide function close to HCH. More precisely, conditional structural-HCH-bot has a conditional utility function U : I→(A^I→R) defined by U(i) = decide ↦ −KL(HCH(i) ‖ decide(i)).
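A minimal sketch of the structural objective, with HCH replaced by a toy stand-in distribution (this is in no way an implementation of HCH), using the convention that higher utility means closer to HCH:

```python
import math

def kl(p, q):
    """KL divergence between two finite action distributions (dicts)."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)

HCH = lambda i: {"yes": 0.8, "no": 0.2}        # toy stand-in for HCH's output
imitator = lambda i: {"yes": 0.8, "no": 0.2}   # a decide that matches HCH
uniform = lambda i: {"yes": 0.5, "no": 0.5}    # a decide that does not

def structural_utility(decide, P):
    # Negative expected KL divergence under input distribution P,
    # so perfect imitation of HCH gets the maximum utility of 0.
    return -sum(p * kl(HCH(i), decide(i)) for i, p in P.items())

P = {"q1": 0.6, "q2": 0.4}
print(structural_utility(imitator, P) == 0.0)  # True (perfect imitation)
print(structural_utility(uniform, P) < 0)      # True
```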

Conditioning Type is Observationally Indistinguishable

Following a similar argument to previous sections, the type an agent conditions upon cannot be uniquely distinguished observationally. To see this, note that the only information a conditional agent has access to is the internal state, so it must back-infer observation/environmental states. Thus, internal-conditional utility functions can be constructed that mimic conditional utility functions of any conditioning type. Since back-inference might be many-to-one, the reverse construction is not always feasible.

However, there seem to be natural ways to describe certain conditional agents. For instance, one can consider a value learning agent that conditions upon various types. An environmental-conditional value learner looks at the world and infers a utility function. An observational-conditional value learner needs to observe a representation of the utility function. An internal-conditional value learner needs a representation of the utility function in its internal state. At each stage, more of the information must be explicitly encoded for "learning" to take place.

These agents can be distinguished by how they behave under counterfactually different observe and orient maps. Back-inference proceeds differently for different observe and orient mappings, potentially causing the agents to act differently. Environmental-conditional agents would have different utility functions (and potentially take different actions) if either the observe or the orient mapping differed. Observational-conditional agents are stable under changes to observe but act differently if orient differed. Internal-conditional agents are unaffected if either observe or orient differed.

Doing back inference at any point opens the door to the input distribution problem; only internal-conditional agents do not back-infer.

Structural Conditional Utility Functions

In addition to consequential conditional utility functions, there are also structural conditional utility functions. Instead of conditioning on a particular environmental state, an agent can condition upon execute, e.g., by having a conditional utility function of type E^A→(E×N→R). Such structural conditional agents have utility functions that depend on their model of the world.

For instance, Alice might have a utility function over 3D environmental states. However, suppose that Alice found out the world has four dimensions. Alice might think she gets infinite utility, since a 4D world contains infinitely many 3D slices. However, Alice's utility function previously mapped 3D environmental states to utilities, so 4D states produce a type error. For Alice to get infinite utility, her utility function must generalize so that a 4D state's utility is the infinite sum of the utilities of the 3D states it is made of. Instead, Alice might get confused and start reasoning about what utility function she should have in light of her new knowledge. In our framework, Alice has a structural-conditional utility function, i.e., she assigns utilities to 3D environmental states conditional on the fact that she lives in a 3D world.

In general, structural conditional utility functions are resistant to ontological crises – such agents will switch to different base utility functions upon finding out their ontological assumptions are violated.

Type Indistinguishability

The type of a conditional utility function is often mathematically identical to the type of a consequential utility function. For example, a conditional utility function that takes internal states and gives a utility function over actions has type I→(A×N→R), which is mathematically equivalent to an internal/act consequential utility function I×A×N→R. This equivalence is problematic because consequential utility functions do not possess many desirable safety properties.
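The equivalence is ordinary currying, which a few lines of Python make concrete (the utility values are invented):

```python
# A conditional utility I -> (A x N -> R) and a consequential utility
# I x A x N -> R carry the same information: one is the curried form
# of the other.

def consequential_U(i, a, n):
    return float(i == "calm" and a == "wait" and n == 0)

def curry(U):
    return lambda i: (lambda a, n: U(i, a, n))

conditional_U = curry(consequential_U)
print(conditional_U("calm")("wait", 0))    # 1.0
print(consequential_U("calm", "wait", 0))  # 1.0 -- same function, reshaped
```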

We currently remain confused by the implications of this problem but sketch out a potential resolution. Classical Bayesianism accepts dogmatism of perception, i.e., observations, once observed, are believed with probability one. Similarly, we might require that conditional agents accept dogmatism of conditioning, i.e., the agent must believe that the object they are conditioning on occurs with probability one. This requirement neatly solves the input distribution problem; even if the agent thought it had logical control of the input distribution, the input could not have been anything else.

Contrast this with one of Demski's desiderata in Learning Normativity:

Learning at All Levels: Although we don't have perfect information at any level, we do get meaningful benefit with each level we step back and say "we're learning this level rather than keeping it fixed", because we can provide meaningful approximate loss functions at each level, and meaningful feedback for learning at each level. Therefore, we want to be able to do learning at each level.

Demski's motivation is that any given level might be corruptible, so learning should be able to happen at all of them. Our motivation is that if an input can be influenced, then it will be, so the agent must think of it as fixed.

No Uncertainty Upstream of Utility

In general, many problematic situations seem to arise when agents have uncertainty that is logically upstream of the utility they ultimately receive.[16] An agent with uncertainty over the input distribution suffers from the input distribution problem.[17] An agent with uncertainty over how much utility its actions will yield will be power-seeking. We will call the former type of uncertainty historical uncertainty and the latter consequential uncertainty. Agents must also be uncertain about what actions they will take, which we call logical uncertainty.

We are interested in agents that possess no historical or consequential uncertainty. Such agents might be described as maximizing utility directly instead of maximizing expected utility, as there is no uncertainty over which to take an expectation.

We suspect avoiding historical uncertainty is the only way to avoid the input distribution problem described above because if agents have uncertainty logically upstream of utility they eventually receive, then they have an incentive to acausally influence the distribution to get more utility. We also suspect that avoiding consequential uncertainty is a way to eliminate power-seeking behavior and undesirable outcomes like edge instantiation.

The trick employed in Conditional Decision Making is to collapse the input distribution into the maximum probability instance. Thus an environmental-conditional agent will optimize the most probable base utility function instead of optimizing the weighted mixture represented by the probability distribution. This does not entirely avoid the input distribution problem, as the agent still has logical influence over what the maximum probability back-inference is, but it reduces the amount of logical control an agent has.

In practice, the agent will have logical control over parts of the input distribution. It remains an open question as to how to get an agent to think they do not have logical control.

Problem: Empirical Uncertainty

Agents with no historical or consequential uncertainty will still have uncertainty over parts of their world model. For instance, conditional HCH-bot will have uncertainty over its model of a human, which will translate into uncertainty as to what HCH would output for any given input. Since conditional HCH-bot's utility depends on how well it can approximate HCH, this means that there must be uncertainty that is logically upstream of the utility the agent receives. One might hope that the agent does not think of itself as having logical control over the human input-output mapping.

Drawing a natural boundary between these two types of uncertainty remains an open problem.

Problems

Cartesian boundaries are not real

One major problem with this way of conceptualizing agency is that Cartesian boundaries are part of the map, not the territory. In reality, there is no distinction between agent and environment. What happens when an agent discovers this fact is relatively unclear, although we think it will not cause capabilities to disintegrate.

Humans have historically thought they were different from the environment. When humans discovered they were made of atoms, they were still able to act. Anthropomorphizing, humans have empirically been robust to ontological crises, so AIs might also be robust.

Even if structural agents can continue acting upon discovering the Cartesian boundary is not real, there might be other undesirable effects. If the agent begins conceptualizing itself as the entire universe, a utility function over decide reduces to a utility function over environmental states; desirable properties of structural agents are lost. As another example, the agent could start conceptualizing "become a different type of agent" as an action, which might cause it to self-modify into an agent that lacks desirable properties.

In general, the way a structural agent models the CWM boundary might change the action set and the internal state set. Depending on how the agent's utility function generalizes, this might cause undesirable behavior.

Myopia might be needed

Purely consequential act/internal-based farsighted agents are incorrigible. If an approval-maximizing agent picks the action trajectory that maximizes total approval, it will avoid being shut down in order to gain high approval later. While structural agents avoid some of these problems, the highest-utility decide function needs to be myopic, else the problem reappears. The base utility function output by a conditional utility function must also be myopic, else the agent will act to preserve its utility function across time.

This analysis suggests myopia might be necessary for safe agents, where myopic agents "only care about the current episode" and "will never sacrifice reward now for reward later." Currently, myopia is not understood well enough to know whether it is sufficient. Myopia also has a number of open problems.

Training for types

If there are agents that possess desirable safety properties, how can we train them? The obvious way is to reward them for having that type. However, knowing an agent is optimal according to some reward function does not constrain the type of the agent's utility function. For instance, rewarding an agent for implementing a decision function that takes high approval actions is indistinguishable from rewarding the agent for creating environments in which its actions garner high approval.

Many agents are optimal for any reward, so the resulting agent's type will depend on the training process's inductive biases. In particular, we expect the inductive bias of stochastic gradient descent (SGD) to be a large contributing factor in the resulting agent.[18] We are eager for exploration into how the inductive biases of SGD interact with structural agents.

Given that behavior does not determine agent type, one could use a training process with mechanistic incentives. For instance, instead of rewarding the agent for taking actions, one can also reward the agent for valuing actions. We could also reward agents for reasoning over decision functions instead of actions or lacking uncertainty of certain types. This training strategy would require advances in transparency tools and a better mechanistic understanding of structural agents.[^current work]

[^current work]: A sample of current work in this general direction is Hubinger's Towards a mechanistic understanding of corrigibility and Relaxed adversarial training for inner alignment, Filan et al.'s Pruned Neural Networks are Surprisingly Modular, and much of the OpenAI Clarity team's work on circuits.

Conclusion

If one supposes a Cartesian boundary between agent and environment, the agent can value four consequential types (actions, environments, internals, and observations) and four structural types (observe, orient, decide, and execute). We presented Cartesian world models (CWMs) to formally model such a boundary and briefly explored the technical and philosophical implications. We then introduced agents that explicitly maximize utility functions over consequential and structural properties of CWMs.

We compared consequential agents, structural agents, and conditional agents and argued that conditional agents avoid problems with consequential and structural agents. We concluded by presenting multiple problems, which double as potential directions for further research.

Appendix

These sections did not fit into the above structure.

Two types of consequential agents are "wireheading": internal-based and observation-based agents. Internal-based agents wirehead by changing their internal states, e.g., by putting wires in their brain. Observation-based agents wirehead by changing their observations, e.g., by constructing a virtual reality. Internal state-based agents also wirehead by changing their observations, but observation-based agents will not change their internal states (typically).

These two agents demonstrate "wireheading" does not carve reality at its joints. In reality, internal and observation-based agents are maximizing non-environment-based utility functions. Internal-based or observation-based agents might look at environment-based agents in confusion, asking, "why don't they just put wires in their head?" or "have they not heard of virtual reality?"

Consider: are action-based agents wireheading? Neither yes nor no seems reasonable, which hints that wireheading is a confused concept.

Types are Observationally Indistinguishable

In most cases, it is impossible to determine an agent's type by observing its behavior. In degenerate cases, any set of actions is compatible with a utility function of any combination of {E,I,A,O}.[19] However, by employing appropriate simplicity priors, we can guess the agent's approximate type.

As an analogy, every nondeterministic finite automaton (NFA) can be translated into an equivalent deterministic finite automaton (DFA). What does it mean, then, to say that a machine "is an NFA"? There is an equivalent DFA, so "NFA" is not a feature of the machine's input-output mapping, i.e., "being an NFA" is not a behavioral property.

In the worst case, converting an NFA into a DFA requires an exponential increase in the size of the state space. We call a machine an NFA if describing it as an NFA is simpler than describing it as a DFA. Similarly, we might call an agent "environment-based" if describing its behavior as maximizing an environment-based utility function is simpler than describing it as maximizing a non-environment-based utility function.
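The NFA-to-DFA translation referenced here is the standard subset construction; the sketch below determinizes a small instance of the classic family whose DFAs blow up exponentially:

```python
from itertools import chain

def determinize(nfa, start, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.
    nfa maps (state, symbol) -> set of successor states."""
    start_set = frozenset([start])
    dfa, seen, frontier = {}, {start_set}, [start_set]
    while frontier:
        S = frontier.pop()
        for sym in alphabet:
            T = frozenset(chain.from_iterable(nfa.get((s, sym), ()) for s in S))
            dfa[(S, sym)] = T
            if T not in seen:
                seen.add(T)
                frontier.append(T)
    return dfa, seen

# Tiny NFA accepting strings over {a, b} whose second-to-last symbol is 'a'
# (the classic family whose minimal DFA needs exponentially many states).
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
dfa, states = determinize(nfa, 0, "ab")
print(len(states))  # 4 reachable DFA states for this 3-state NFA
```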

Unlike NFAs and DFAs, however, one can construct degenerate CWMs that rule out specific type signatures. Suppose that the environment consisted only of a single switch, and the agent could toggle the switch or do nothing. If an optimal agent always toggled the switch, we could infer that the agent was not purely environment-based.[20]

In general, the VNM Theorem rules out some types of utility functions for some sequences of actions. If the agent can act to leave itself unchanged, loops of the same sequences of internal states rule out utility functions of type {I}. Similarly, loops of the same (internal state, action) pairs rule out utility functions of type {I},{A} and {I,A}. Finally, if the agent ever takes different actions, we can rule out a utility function of type {A} (assuming the action space is not changing).[5:1]

However, to see that agent types typically cannot be distinguished behaviorally, note that an agent can always be expressed as a list of actions it would take given various observations and internal states. Given reasonable assumptions, one can construct utility functions of many types that yield the same list.[21] This flaw with behavioral descriptions justifies our contention that the CWM+U framework captures how the agent models its interactions with the world: a mechanistic property, not a behavioral one.

Example: human

Alice has a utility function over the environment. Alice's only way to obtain information is by observation, so we can construct a utility function over observations that results in indistinguishable behavior. Alice refuses to enter perfect VR simulations, so we suppose that Alice intensely disvalues the observation of entering the simulation. Observation-based Alice is more complicated than environment-based Alice but is compatible with observed behavior.

Observations are distinguishable only when they produce different internal states, so we can construct a utility function over internal states that results in indistinguishable behavior. Alice does not take drugs to directly alter her internal state, so we suppose she intensely disvalues the internal state of having decided to take drugs. Internal-based Alice is more complicated than environment-based Alice but is compatible with observed behavior.

Alice takes different actions, so we can rule out a pure action-based utility function. Since internal states are only distinguishable by what actions they cause Alice to take, we can construct an internal/action-based utility function that results in indistinguishable behavior. Alice is willing to sacrifice her life for a loved one, so we suppose she gets immense utility from taking the action of saving her loved one's life when she believes said loved one is in danger. Internal/action-based Alice is more complicated than environment-based Alice but is compatible with observed behavior.

Natural descriptions

In the above, there seemed to be a "natural" description of Alice.[22] Describing Alice's behavior as maximizing an environment-based utility function did not require fine-tuning, whereas an observation-based utility function had to rate specific observations very negatively. We can similarly imagine agents whose "natural" descriptions use utility functions of other type signatures, the trivial examples being agents who model themselves in a CWM and maximize a utility function of that type.

It is essential to distinguish between a mechanistic and a behavioral understanding of CWMs. Under the mechanistic understanding, we consider agents that explicitly maximize utility over a CWM, i.e., the agent has an internal representation of the CWM, and we need to look inside the agent to gain understanding. Under the behavioral understanding, we are taking the intentional stance and asking, "if the agent internally modeled itself as being in a CWM, what is the simplest utility function it could have that explains its behavior?"

Attempting to understand agents behaviorally using CWMs will sometimes fail. For instance, a thermostat is a simple agent. One might describe it as an environment-based agent whose goal is to maintain the same temperature. However, the thermostat only cares about the temperature in a tiny region around its sensors. We could describe it as an observation-based agent that wants to observe a specific temperature. Nevertheless, there is a 1-1 correspondence between observations and internal states, so it seems equally accurate to describe the thermostat as internal-based. Finally, we can describe it as an internal/action-based agent that wants to increase the temperature when it is low but decrease it when it is high.

We strain to describe a thermostat as environment-based or internal/action-based. However, there is a 1-1 correspondence between observation and internal state, so it is equally simple to describe the thermostat as observation-based or internal-based.
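To make the competing descriptions concrete, the thermostat can be written mechanistically as the four CWM maps; every detail below is invented for illustration:

```python
# A toy thermostat as the four maps (observe, orient, decide, execute).

def observe(e, a):          # sensor reads the local temperature
    return e["temp"]

def orient(o, i):           # internal state just stores the reading
    return {"reading": o}

def decide(i):              # the whole "policy" of the thermostat
    if i["reading"] < 20: return "heat"
    if i["reading"] > 22: return "cool"
    return "idle"

def execute(e, a):          # heating/cooling nudges the temperature
    delta = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[a]
    return {"temp": e["temp"] + delta}

e, i, a = {"temp": 17.0}, {"reading": None}, "idle"
for _ in range(5):
    o = observe(e, a); i = orient(o, i); a = decide(i); e = execute(e, a)
print(round(e["temp"], 1))  # 20.0 (heats up to the deadband, then idles)
```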

It can also be unclear what type of utility function suboptimal agents have. Suppose an agent does not know they can fool their sensors. In that case, an observation-based agent will act the same as an environment-based agent, making it impossible to obtain a mechanistic understanding of the agent by observing behavior.

"Becoming Smarter"

What exactly does it mean for an agent to improve? Our four-map model allows us to identify four potential ways. In what follows, we assume the agent is farsighted. We also implicitly relax other optimality assumptions.

• observe: An agent could expand O or reduce the expected entropy of observe(e,a). For example, an agent could upgrade its camera, clean the camera lens, or acquire a periscope. Many humans employ sensory aids, like glasses or binoculars.
• orient: An agent could both expand I and reduce the expected entropy of orient(o,i). For example, an agent could acquire more memory or better train its feature detection algorithms. Many humans improve their introspection ability and memories by meditating or using spaced repetition software.
• decide: An agent could implement a decide function that better maximizes expected future utility. For example, a chess-playing agent could search over larger game trees. Many humans attempt to combat cognitive biases in decision-making.
• execute: An agent could expand A or decrease the expected entropy of execute(e,a). For example, a robot could learn how to jump, build itself an extra arm, or refine its physics model. Many humans practice new skills or employ prostheses.

Many actions can make agents more powerful along multiple axes. For example, increasing computational capacity might create new actions, improve decision-making ability, and speed up the processing of observations. Moving to a different location can both create new actions and increase observation ability.

1. This drawing is originally from Embedded Agents ↩︎

2. The decision-making model known as the OODA Loop inspired this naming scheme. Acting has been renamed to executing to avoid confusion with act-based agents. ↩︎

3. See Conditions for Mesa-Optimization for further discussion. ↩︎

4. See Locality of Goals for a related discussion. ↩︎

5. Technically, there could be two actions that both had maximal utility. This occurrence has measure 0, so we will assume it does not happen. ↩︎ ↩︎

6. Hadfield-Menell, Dylan, Smitha Milli, Pieter Abbeel, Stuart Russell, and Anca Dragan. "Inverse Reward Design." ArXiv:1711.02827 [Cs], October 7, 2020. http://arxiv.org/abs/1711.02827. ↩︎

7. Constructing this set is technically impossible. In practice, it can be replaced by the set of all finite CWM+Ss. ↩︎

8. We contrast this to an agent whose internals are structured like HCH, i.e., it has models of humans consulting other models of humans. An agent with this structure is likely more transparent and less competitive than an agent trying to enforce the same input-output mapping as HCH. ↩︎

9. This utility function is flawed because the asymmetry of KL-divergence might make slightly suboptimal agents catastrophic, i.e., agents that rarely take actions HCH would never take will only be slightly penalized. Constructing a utility function that is not catastrophic if approximated remains an open problem. ↩︎

10. What a human asks the agent depends on the agent's properties, so using the human distribution has acausal implications. See Open Problems with Myopia for further discussion. ↩︎

11. Uesato, Jonathan, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, and Shane Legg. "Avoiding Tampering Incentives in Deep RL via Decoupled Approval." ArXiv:2011.08827 [Cs], November 17, 2020. http://arxiv.org/abs/2011.08827. ↩︎

12. The inverse might not hold. Time-limited myopic agents might also take over the world for reasons described here ↩︎

13. These concerns are similar to the ones illustrated in Demski's Parable of Predict-O-Matic. ↩︎

14. See Open Problems with Myopia for further discussion. ↩︎

15. Here, the agent is reasoning according to the maximum probability world state, a trick inspired by Cohen et al.'s Asymptotically Unambitious Artificial General Intelligence. ↩︎

16. What we mean by "upstream" is unclear but is related to the concept of subjunctive dependence from Functional Decision Theory: A New Theory of Instrumental Rationality. ↩︎

17. If an agent's experiences shape who they are, uncertainty over the input distribution might be viewed as a special case of anthropic uncertainty. ↩︎

18. See Understanding Deep Double Descent for more discussion. ↩︎

19. This is related to Richard's point that coherent behavior in the world is an incoherent concept. ↩︎

20. Technically, both environmental states could have equal utility, making all policies optimal. This occurrence has measure 0, so we will assume it does not happen. ↩︎

21. I am not sure what assumptions are needed, but having no loops of any type and the agents always taking the same action in the same (environmental state, internal state) pair might be sufficient. ↩︎

22. Here, we roughly mean "there is probably some notion of the complexity of an agent's description that has some descriptions as simpler than others, even though we do not know what this notion is yet." ↩︎

Discuss

### What topics are on Dath Ilan's civics exam?

April 27, 2021 - 03:59
Published on April 27, 2021 12:59 AM GMT

Dath Ilan is a parallel Earth on which human civilization has its act together, in ways that actual-Earth does not. Like actual-Earth, citizens of Dath Ilan sometimes take standardized tests: to figure out what sorts of jobs they'd be suited for, to make sure that its educational institutions are functioning, and to give people guidance about what they might want to study. Unlike Earth's, Dath Ilan's tests have had a lot of thought put into the choice of topics: rather a lot more economics, rather a lot less trigonometry and literature. Topics are selected based on cost/benefit; something that takes a long time to learn would need to be a lot more useful, or have major positive externalities to more people knowing it.

I want to create a test that will tell people what topics they ought to learn, and enable people to make their knowledgeability legible.

What topics belong on it?

Discuss

### Scott Alexander 2021 Predictions: Buy/Sell/Hold

27 апреля, 2021 - 03:40
Published on April 27, 2021 12:40 AM GMT

Scott Alexander is out with his (late) 2021 predictions. You know what that means. It’s time to find things to disagree with!

Scott has the tough job here. He’s putting out a hundred plus predictions with probabilities attached. All I’m doing is saying where I definitely disagree with him.

Epistemic Status: Writing this quickly and off the cuff seems more appropriate and fair. I’m going to explain my reasoning here while not trying to do a bunch of research on these questions. In general, if something seems reasonable and I say or imply that I’m holding, that’s not a strong ‘this is also my probability, strongly held’ answer; it’s in the ballpark but likely weakly held.

US/WORLD

1. Biden approval rating (as per 538) is greater than 50%: 80%

Biden’s approval rating is clearly steady. There’s always some honeymoon effect, but it would take a surprising event to send it that far down. 80% seems like it’s in the ballpark. Hold.

2. Court packing is clearly going to happen (new justices don’t have to be appointed by end of year): 5%

Indeed do many things come to pass, and ‘clearly going to happen’ isn’t a clear definition. If this is ‘legislation expanding the size of the court has passed’ then this seems high to me because not only does it seem unlikely Biden gets 50 votes on this, it seems unlikely he’d get them this quickly with so much else on the agenda, but also they’re talking about it, Biden’s already gone gangbusters on giant bills and 5% isn’t that high. So I can’t disagree strongly. Hold.

3. Yang is New York mayor: 80%

Yang is only at 69% on PredictIt, although PredictIt tends to be too low on favorites in this range (although not enough to justify trading on its own). He’s ahead, but he’s prone to rather silly things and there’s plenty of time to mess up, so I think I’m with PredictIt on this and I’ll stick with 70%. Sell.

4. Newsom recalled as CA governor: 5%

Depending on what counts as ‘recalled’ this is either at least 10%, or it’s damn near 0%. I don’t see how you get 5%. Once you get an election going, anything can happen. Weird one, I’d need more research.

5. At least $250 million in damage from BLM protests this year: 30%

With the verdict in, I don’t see what causes this kind of damage in the next 7 months. That doesn’t mean it can’t happen, but $250 million is a lot. I’m selling this down at least to 20%. Sell.

6. Significant capital gains tax hike (above 30% for highest bracket): 20%

I don’t think you need to get to 30% to be significant, but that’s not the question. The question is how likely this is, which is asking how likely all 50 senators go along with it. Given there’s already been mention of specifically 29.6% as a Schelling point, I’m guessing 20% is about right. Hold.

7. Trump is allowed back on Twitter: 20%

I’m selling this to 10%. Why would Twitter do this? They’ve already paid the price they’re going to pay, and it’s not like Trump mellowed out.

8. Tokyo Olympics happen on schedule: 70%

I’m more at the Metaculus number of 80%, provided slipping a few days doesn’t count as failing; I’m leaving it alone if a postponement of any length counts, because random stuff does happen. I think Japan really, really wants this to happen and there’s no reason for it not to. Buy.

9. Major flare-up (significantly worse than anything in past 5 years) in Russia/Ukraine war: 20%

It’s definitely a thing that can happen but there isn’t that much time involved, and the timing doesn’t seem attractive for any reason. I’ll sell to at least 15% on reasonable priors.

10. Major flare-up (significantly worse than anything in past 10 years) in Israel/Palestine conflict: 5%

The last ten years have been unusually quiet here, so it arguably would take very little to count as a major flare-up here, but the vagueness of what ‘major’ means makes it tough. With a tighter definition I might buy to 10%, if it’s wide enough maybe a little higher. Otherwise, hold.

11. Major flare-up (significantly worse than anything in past 50 years) in China/Taiwan conflict: 5%

Every war game of this situation I’ve read says that it’s a major disaster with no winners, even if China ‘wins,’ so it’s not in China’s interest to push on this, and it seems like it will have better spots in the future. 50 years is a long enough window that this has to be a shooting war. I do worry about this scenario but I think 5% is still high, and I’m selling to 3% if I’m truly risk-neutral. Given I’m very short China/Taiwan conflict due to being alive and liking nice things, I wouldn’t actually bet here, but worth noting my prior is lower.

12. Netanyahu is still Israeli PM: 40%

This is the PredictIt line for him on 6/30, and Scott’s predicting this out to January 1. I’m guessing that he didn’t notice? Otherwise, given how many things can go wrong, it’s a rather large disagreement – those wacky Israelis have elections constantly. I’m going to sell this down to 30% even though I have system 1 intuitions he’s not going anywhere. Math is math.

13. Prospera has at least 1000 residents: 30%

Hold/pass on the principle that Everything I Know About Prospera I Learned From Scott’s Article and he’s thought about this a ton more than I have.

14. GME > $100 (currently $170): 50%

That’s an interesting place to put the line. GME clearly has upside skew, where it could randomly go to $500 again, whereas it could easily retreat to a reasonable fundamentals price like $30, at least until it gets to sell stock and becomes more valuable for that reason. So what do we think about its chances here? Early in this whole thing I’d have said no way, but here we are three months later and it’s sticky, so how does one now apply Lindy to this phenomenon? If it hasn’t ended by now, why does it have to? So my honest answer is I have no idea, and 50% here seems at least sane, so I’m not going to touch it, but I should be very worried I’m anchored. Then again, I’m pretty sure I’d have sold anything substantially higher than this down to at least 60%, and bought up to at least 40%, so it’s the right ballpark I think?

15. Bitcoin above 100K: 40%

16. Ethereum above 5K: 50%

Yearly reminder that this is absurdly bullish on crypto, because the bare minimum way to fulfill these means crypto is fairly priced now. I’d sell Bitcoin down to 25%, Ethereum down to 30%, and then hedge by buying both of them.

17. Ethereum above 0.05 BTC: 70%

This is outright saying ETH is likely to outperform BTC, so this is Scott’s biggest f*** you to the efficient market hypothesis yet. I’m going to say he’s wrong and sell to 55%, since it’s currently 0.046, and if it was real I’d consider hedging with ETH.

18. Dow above 35K: 90%

19. …above 37.5K: 70%

It’s currently at 34K so saying it’s 90% to be up over the next 7 months is… complete insanity? It’s twice as likely to be between 35K and 37.5K than below 35K at all? Rather than give a probability, I’ll simply say I’m slamming the arbitrage. Sell the 90% a lot and buy index funds and/or options, ignore the 70% cause it’s not as good.

20. Unemployment above 5%: 40%

It’s currently officially 6% and presumably will fall with recovery. They’re pumping in a ton of money, and it was 4% before things got bad, but also a lot of people got a lot of money and there will be a lot of disruption and a lot of money illusion and grumbling. I’m guessing (very naively) that this isn’t going to happen that fast this reliably, and buying to 50%.

I don’t know about the situation at Google but assuming they currently still do this I think it’s more likely than this that they keep doing it. If this is a blind prediction and Scott knows nothing I don’t know, I’d buy to 30%.

22. Starship reaches orbit: 60%

Yeah, no idea. Hold.

COVID

23. Fewer than 10K daily average official COVID cases in US in December 2021: 30%

This is a bad line. If we get things under control everywhere, it will be under 10K, and we’re vaccinating enough to get close to Israeli levels with plenty of time to spare. I’m buying this to 70%, and if someone tried to ‘take it away’ by buying it from me, I’m having none of it.

24. Fewer than 50K daily average COVID cases worldwide in December 2021: 1%

Yep, that’s right, hold. Not enough vaccines.

25. Greater than 66% of US population vaccinated against COVID: 50%

It’s at 42% now. Israel stalled lower than this (in the 50s) so we might hit a wall that’s hard to break. I think we’re favorites so I’ll buy to 60%, but it could go either way. Note that because of children this will play a lot stronger than it might sound.

26. India’s official case count is higher than US: 50%

Buy to 80% before I even start thinking, probably willing to go higher still on reflection. I’m confused how this got here.

27. Vitamin D is generally recognized (eg NICE, UpToDate) as effective COVID treatment: 70%

Vitamin D is good and important, you should be taking it, but I’m skeptical that such sources will recognize this in the future if they haven’t done so by now. Conditional on (I haven’t checked) the sources that matter not having made this call yet, I’d sell it to 50%, while saying that I definitely would use it to treat Covid if I had the choice.

28. Something else not currently used becomes first-line treatment for COVID: 40%

I’ll sell this to 25%, people are slow to adapt to change even when it happens, assuming ‘not currently used’ means not used at all rather than not first-line.

29. Some new variant not currently known is greater than 25% of cases: 50%

Depends what we mean by ‘known’ and what counts as a fully new variant, but my guess is this should be higher. Probably buy it to 60%, given there’s still a lot of time for this to happen.

30. Some new variant where no existing vaccine is more than 50% effective: 40%

I assume this means versus infection only. If it’s versus death, slam the sell button even more. If it’s versus infection only, I’d still sell this down to 25%, assuming this has to apply to Moderna/Pfizer.

31. US approves AstraZeneca vaccine: 20%

If it does happen it will be after it matters, since it already doesn’t matter, so I’m not sure why we would do it, but I don’t have a good model here. 20% seems low enough that I don’t want to go lower.

32. Most people I see in the local grocery store aren’t wearing a mask: 60%

Buy to 75%. Scott is in Berkeley, so I’m optimistic that the area will be sufficiently vaccinated to be very safe by year’s end. It then comes down to, just how crazy are all you people now that it’s over, and my guess is not this crazy all that often. But often enough that I’ve still got the one in four open.

COMMUNITY

33. Major rationalist org leaves Bay Area: 60%

I have private information, so recusing myself.

38. No new residents at our housing cluster: 40%

39. No current residents leave our housing cluster: 60%

My guess is Scott is going to be underconfident on this, and also that he’s not taking into account how late it is in the year, so I’m going to do the ‘blind bet’ thing and sell #38 to 35% and buy #39 to 65%, but not push it.

53. At least seven days my house is orange or worse on PurpleAir.com because of fires: 80%

Note that Scott is only saying he’s 50% to leave Berkeley for a month. I’m going to hold this but also point out that if you can’t breathe the air maybe it’s time to check out the air somewhere else.

PERSONAL

60. There are no appraisal-related complications to the new house purchase: 50%

Buy to 60% based on what I’ve learned about appraisals, assuming complication means a meaningful one, and assuming Scott’s #61 prediction isn’t nuts. I won’t go further than this due to asymmetrical information disadvantage.

61. I live in the new house: 95%

Sell to 90% on the ‘indeed do many things come to pass’ platform. Probably, but let’s not get too confident here.

62. I live in the top bedroom: 60%

Buy to 65% because this feels like a place where if Scott’s thinking it’s a favorite, it’s a bigger favorite than he thinks, but again information issues.

63. I can hear / get annoyed by neighbor TV noise: 40%

Sell to 30% but the fact that it’s here at all makes me wonder so I’ll stop there given information issues. I’ve literally never had this happen in a house, and also there are almost no TVs in Berkeley that are ever on in the first place, so I’d be curious to hear more.

64. I’m playing in a D&D campaign: 70%

I’ll trust Scott on this one and hold.

65. I go on at least one international trip: 60%

I’m guessing this underestimates the number of things that can go wrong, but Scott seems too skeptical about pandemic outcomes, which cancels that out, so I’ll hold.

66. I spend at least a month living somewhere other than the Bay: 50%

I wonder how much this is based on the whole ‘PurpleAir says you literally can’t breathe the air’ issue, and how much is travelling, and without more information I don’t think I can get involved, so staying out.

67. I continue my current exercise routine (and get through an entire cycle of it) in Q4 2021: 70%

People tend to be pretty overconfident in such matters, so I’m tempted to sell on general principles, but I do think the public prediction will help somewhat. I guess sell a tiny bit to 65% but keep it light.

68. I meditate at least 15 days in Q4 2021: 60%

69. I take oroxylum at least 5 times in Q4 2021: 40%

Don’t feel like I have a good enough handle here to do anything beyond hold.

70. I take some substance I haven’t discovered yet at least 5 times in Q4 2021 (testing exempted): 30%

That seems aggressive. Haven’t discovered yet seems a lot harsher than haven’t tried yet. I’ll sell to 25% but again, the prediction must have come from somewhere.

71. I do at least six new biohacking experiments in the next eight months: 40%

This seems like a lower bar to me by a lot than #70, so I’ll hold.

73. The Twitter account I check most frequently isn’t one of the five I check frequently now: 20%

I don’t think it’s that likely there will be a big new Twitter account at the top unless Scott is using Twitter for Covid a lot. Assuming his top 5 are mostly not that, I’ll sell this to 15%.

74. I make/retweet at least 25 tweets between now and 2022: 70%

I think I bet against a similar thing last time and lost by a wide margin. My guess is this is if anything a little underconfident, since 25 is not that many, so maybe buy to 75%.

WORK

75. Lorien has 100+ patients: 90%

76. 150+ patients: 20%

77. 200+ patients: 5%

78. I’ve written at least ten more Lorien writeups (so total at least 27): 30%

I’m somewhat sad that #78 is sitting so low, but I don’t feel like I have enough info to disagree with it. #75 is basically ‘does Lorien exist’ since there’s no way Scott either loses or fires his patients, but the 150+ and 200+ thresholds mean taking more, and I’m guessing that won’t happen. It does seem like 70% is a lot of space between 100-149 patients, so I’d probably split the difference and go to 85% and 25% to open up things a bit. The downside represents ‘Lorien experiment fails and Scott transitions to something else’ and the upside seems plausible too. I’ll also go to 10% on 200+ patients if ‘second doctor joins practice’ is a way to get there, hold if not.

84. I have switched medical records systems: 20%

85. I have changed my pricing scheme: 20%

Switching EMRs is a bitch and 20% sounds like a lot, sell #84 to 15%. On the pricing scheme, that’s entirely dependent on how much Scott is willing to sacrifice to see it through, so if he says 20% I believe him.

BLOG

86. ACX is earning more money than it is right now: 70%

I have a hard time believing that ACX revenue won’t increase so long as ACX keeps up its quality and quantity levels. I’ll buy to 80%.

90. There is another article primarily about SSC/ACX/me in a major news source: 10%

I’ll buy this to 25%. Scott’s interesting, his relationship to the press is interesting, there are a lot of major news sources, and also this prediction might give people ideas.

91. I subscribe to at least 5 new Substacks (so total of 8): 20%

Substack costs can add up fast, so it seems reasonable that going to this many wouldn’t be that likely, but with a lot of revenue it makes sense to be in touch with the greater blogosphere. I’m going to buy this to 30%.

92. I’ve read and reviewed How Asia Works: 90%

Cool. Presumably this means he’s mostly done, I’ll be comparing this to my own review. Hold.

93. I’ve read and reviewed Nixonland: 70%

Also cool, possible this causes me to read it. Hold.

94. I’ve read and reviewed Scout Mindset: 60%

Buy to 70%, it would be pretty weird for Scott not to review this but I have to update on it only being 60%. I plan to read and likely review it as well, once Covid dies down or I otherwise find the time.

95. I’ve read and reviewed at least two more dictator books: 50%

Two is a lot here, so presumably this is important to Scott. I’ll sell it a bit down to 45% because two is tough, but mostly trust him.

96. I’ve started and am at least 25% of the way through the formal editing process for Unsong: 30%

97. Unsong is published: 10%

The implication here is that it’s about the halfway point in difficulty to get a quarter of the way through editing (about 1/3 chance of each step). My understanding is that publishing delays are often very long, so unless he plans to self-publish, no way this happens in 2021, but I can totally see a self-publishing for Unsong, so I’ll leave these be because there are too many variables I don’t have a good handle on.

98. I’ve written at least five chapters of some non-Unsong book I hope to publish: 40%

99. [redacted] wins the book review contest: 60%

There might be a best entry but these things seem more random than that? I’ll sell to 50%.

100. I run an ACX reader survey: 50%

101. I run a normal ACX survey (must start, but not necessarily finish, before end of year): 90%

Not sure how these two can coexist, so going to wait them out pending clarifications if any.

102. By end of year, some other post beats NYT commentary for my most popular post: 10%

I’m guessing such events are slightly less rare than this? But that was a really big event, so I’ll probably still hold.

103. I finish and post the culture wars essay I’m working on: 90%

104. I finish and post the climate change essay I’m working on: 80%

105. I finish and post the CO2 essay I’m working on: 80%

Good luck, sir, and may the odds be ever in your favor. I don’t think I’m in a position to second guess, if anything I’d be bullish on #104 and #105, maybe a little bearish on #103, but very small.

106. I have a queue of fewer than ten extra posts: 70%

Sell to 60% because if I was Scott I would totally end up with a much, much larger queue (and I do in fact have a truly gigantic one to the extent I have a queue at all).

META

107. I double my current amount of money ($1000) on PredictIt: 10%

#107 is all about how much Scott is willing to risk. You can make this at least 40% by ‘betting on black.’ So I can’t really say, but my guess is Scott messes around enough that this can be bought to 15%.

108. I post my scores on these predictions before 3/1/22: 70%

This is one of those weird full-control meta-predictions. I think Scott will be that much more likely to post in late February and I’ll bump it to 75%, but there’s a bunch of ways this can fail.

Discuss

### Myocarditis after Pfizer-BioNTech Vaccine

27 апреля, 2021 - 03:25
Published on April 27, 2021 12:25 AM GMT

Israel Examines Heart Inflammation Cases After Pfizer Shot - Bloomberg

The day after I received my 2nd shot, I had an extremely uncomfortable feeling in my chest. My heart was beating rapidly and it was difficult to catch my breath after exertion. At some points I debated calling a physician or hospital. Has anyone else experienced similar effects? The article says the (potentially) linked cases happened primarily in men under 30, which happens to be my demographic. I am unsure if I had myocarditis of this kind, but it seems plausible.

My Fitbit thinks I spent 6.5 hours that day exercising (even though I spent almost the entire day sitting or in bed).

Discuss

### Announcing the Alignment Research Center

27 апреля, 2021 - 02:30
Published on April 26, 2021 11:30 PM GMT

(Cross-post from ai-alignment.com)

I’m now working full-time on the Alignment Research Center (ARC), a new non-profit focused on intent alignment research.

I left OpenAI at the end of January and I’ve spent the last few months planning, doing some theoretical research, doing some logistical set-up, and taking time off.

For now it’s just me, focusing on theoretical research. I’m currently feeling pretty optimistic about this work: I think there’s a good chance that it will yield big alignment improvements within the next few years, and a good chance that those improvements will be integrated into practice at leading ML labs.

My current goal is to build a small team working productively on theory. I’m not yet sure how we’ll approach hiring, but if you’re potentially interested in joining you can fill out this tiny form to get notified when we’re ready.

Over the medium term (and maybe starting quite soon) I also expect to implement and study techniques that emerge from theoretical work, to help ML labs adopt alignment techniques, and to work on alignment forecasting and strategy.

Discuss

### [Linkpost] Treacherous turns in the wild

27 апреля, 2021 - 01:51
Published on April 26, 2021 10:51 PM GMT

Mesa-optimizers potentially escape behavioral adversarial training by realizing they're being tested, gaming the test, then defecting when they're deployed. Luke Muehlhauser recounts an example of such a treacherous turn in the wild.

Most relevant quote:

As part of a project studying the evolution of (simulated) organisms, computer scientist Charles Ofria wanted to limit the replication rate of a digital organism. So, he programmed the system to pause after each mutation, measure the mutant’s replication rate in an isolated test environment, and delete the mutant if it replicated faster than its parent. However, the organisms evolved to recognize when they were in the test environment and “play dead” (pause replication) so they would not be eliminated and instead be kept in the population where they could continue to replicate outside the test environment. Once he discovered this, Ofria randomized the inputs of the test environment so that it couldn’t be so easily detected, but the organisms evolved a new strategy, to probabilistically perform tasks that would accelerate their replication, thus slipping through the test environment some percentage of the time and continuing to accelerate their replication thereafter.

The full story is given in the original post.

Discuss

### FAQ: Advice for AI Alignment Researchers

26 апреля, 2021 - 21:59
Published on April 26, 2021 6:59 PM GMT

To quote Andrew Critch:

I get a lot of emails from folks with strong math backgrounds (mostly, PhD students in math at top schools) who are looking to transition to working on AI alignment / AI x-risk. There are now too many people “considering” transitioning into this field, and not enough people actually working in it, for me, or most of my colleagues at Stuart Russell’s Center for Human Compatible AI (CHAI), to offer personalized mentorship to everyone who contacts us with these qualifications.

From math grad school to AI alignment, Andrew Critch

I’m pretty sure he wrote that at least 4 years ago (2016 or earlier). The field has grown enormously since then, but so has the number of people considering it as a research area. So far, I’ve tried to give at least 10 minutes of my time to anyone who emails me with questions; that probably won’t be sustainable for much longer. So now I’m answering the questions I get most frequently.  I hope to keep this up to date, but no promises.

Usually, I write a blog post when I think I have something important and novel to say, that I am relatively confident in. That’s not the case for this post. This time, I’m taking all the questions that I frequently get and writing down what I’d say in response. Often, this is (a) not that different from what other people would say, and (b) not something I’m very confident in. Take this with more grains of salt than usual.

Thanks to Neel Nanda, Nandi Schoots, and others who wish to remain anonymous for contributing summaries of conversations.

See the linked post for the FAQ; which will hopefully be kept up to date over time.

Discuss

### Spoiler-Free Reviews: Monster Slayers, Dream Quest and Loop Hero

26 апреля, 2021 - 18:00
Published on April 26, 2021 3:00 PM GMT

Recently my love of card-based roguelikes caused me to check out these games.

First, Loop Hero.

Loop Hero is a unique Tier 2 (Worth It) game that came out recently.

If you’re up for a unique retro-graphical semi-auto-battling rogue-like, stop reading here and go play it, as the game benefits from the highest possible degree of blindness.

At its core, you go around a loop auto-battling, as your battles give you loot in the form of better equipment and drawing cards from your deck. The cards in your deck give you more enemies to face to get more better loot, they give you bonuses, and they give you resources to take back after your foray is over. Each loop, your enemies get harder. Once you’ve played enough cards, the boss appears. When you’re done with the level, you take your resources back and improve your camp, which makes you stronger, and you go out for another meta-loop on the loop, until you win.

Loop Hero has a lot of new ideas in it, it’s easy to learn and get into, and its atmosphere and vibe are unique and cool. You make a bunch of interesting decisions, there’s a bunch to try and learn, and you can tune the challenge in various ways to keep it balanced if you’d like. The mission of ‘figure things out and win in the minimum number of expeditions’ was how I approached it, and that seemed good. A hardcore player might start from scratch to try and win as fast as possible, and that’s its own thing.

There are some issues. The graphics are very old school, and while some are done quite well, some (especially your hero icon on the map) are ugly. Half the cards you’ll draw will be terrain cards, and you’ll have to go through the process of playing them every time, which stops being interesting quickly. There’s a lot of ‘forced move’ style stuff that ends up taking up a bunch of time. And finally, once you realize what the ‘right answers’ are, things get less interesting in many senses.

If they ever make Loop Hero 2, there’s a lot of room to expand upon these ideas and make them better.

At some point I’d like to discuss this stuff in spoiler-included detail, since I find the choices interesting, but for now that’s about all you need to know.

Second, Monster Slayers and Dream Quest.

Back in 2014 there was this game called Dream Quest. Dream Quest was an awesome game, much more subtle and well balanced than it appeared. You progressed through three levels building up a deck and facing enemies that give you experience to level up and gold to spend on items, new cards and card removals when given the chance. You’d die a lot while both unlocking helpful things and learning the ropes, and when you knew what you were doing, and eventually win. Unfortunately, half of Dream Quest’s art looks like the game creator’s young child drew stick figures, because that’s exactly what did happen. It’s a great game, with only its presentation issues and the amount of essentially required dying keeping it in Tier 2. I can’t say it’s everyone’s cup of tea, but if you like the genre and don’t mind the rough parts, it’s elite.

Then in 2017 there was a game called Monster Slayers. Monster Slayers is a Dream Quest variant, and I put it in Tier 3 (Good).

There are some new twists, some of which are clear improvements, but the core elements are all there. Three levels to navigate, need to beat a bunch of enemies with various decks at various levels to get the XP to level up to face the boss, various things that boost you, standard issue rogue deckbuilding with a lot of the same elements and so on.

Monster Slayers makes the formula more user-friendly, easier on the eyes and easier to get into. Also easier period, if you at all know what you are doing, and that’s the problem. What felt like interesting tension in Dream Quest, making sure you could plan for the long term without dying now, and navigating your way through, feels like busywork in Monster Slayers, because you’re not under threat unless you mess up and face one of the few dangerous enemies at the wrong time.

And those fights can take quite a while, especially if you’re playing a defensive class. You’ll be clicking on your cards a lot. The reason things take so long is largely that the clearly correct strategy for every class seems to involve a lot of card draw and cycling through your deck, for reasons that any veteran of the genre should find very obvious. There’s essentially a ‘correct’ way to navigate things, and it’s not that hard to do it. Sure, occasionally you mess up, either you get restless or you don’t understand things, or you run into one of the few dangerous enemies without being equipped for it, so I only ‘won’ 4 of my 8 runs with 5 different base characters – I lost the first 2 while learning the rules, I lost the 3rd to running into something that killed me out of nowhere, and the 4th because I got complacent cause things were so slow and I didn’t have the heart to rewind a bit even though I had that available.

Did I enjoy my time with Monster Slayers? I enjoyed the first maybe two-thirds of it on net. But it definitely kept giving me the ‘go semi-infinite’ and ‘draw your whole deck every turn’ hits past the point where I was actually still enjoying it, and was finding it mostly tedious, so I won’t be exploring the other 7+ classes.

Still, it’s cheap, so check it out. But check out Dream Quest first/instead, cause it’s a better game.

Discuss

### Bayesian and Frequentist Approaches to Statistical Analysis

26 апреля, 2021 - 03:18
Published on April 26, 2021 12:18 AM GMT

In this post, I want to give a from-first-principles account of Bayesian and frequentist approaches to statistical analysis. Bayesianism and frequentism have been discussed here often, although LessWrong posts on these topics usually focus on Bayesians and frequentists when it comes to probability and rationality, or their use in AI. Here, I am going to focus on statistical analyses that are intended for human readers. This is a somewhat experimental attempt to cobble together a sensible framework from the often confusing and mismatched pieces I have encountered when trying to learn about this topic myself, and to keep things grounded in what statistical analysis is actually trying to accomplish rather than in abstract philosophical positions. I will assume the reader is familiar with the mathematics of probability and Bayes' rule.

Introduction

To start at the beginning: As statisticians, we are interested in helping a reader understand and make inferences using data. We will do so by performing some computations on the data, and then presenting the results to the reader who will, hopefully, be able to use those results to draw conclusions faster and more accurately than they would if trying to parse the data on their own. The main thing we need to decide is what should be computed.

If you are willing to view the reader as a Bayesian reasoner, we can formalize this a bit more. In that case, we know the reader ought to do inference by applying Bayes' rule to combine P(D|h) and d with their own prior. However, we assume the reader might not be able to effectively execute this process themselves (if they could, we could just hand them P(D|h) and d and go home). Unfortunately, we do not know the reader’s prior, so we cannot simply do that computation for them either. Instead, we can view our task as summarizing the values of P(D=d|h) for different h's in a way that allows the reader to more easily integrate that result with their own prior beliefs.
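For concreteness, here is a minimal sketch of the update the reader is assumed to perform, for a coin-bias example with a small discrete grid of hypotheses (the specific data, grid, and uniform prior are illustrative assumptions, not from the post):

```python
from math import comb

# Suppose d = "7 heads in 10 flips", and the reader entertains three
# hypotheses h about the coin's bias, with a uniform prior over them.
hypotheses = [0.3, 0.5, 0.7]
prior = {h: 1 / 3 for h in hypotheses}

def likelihood(heads: int, n: int, h: float) -> float:
    """P(D = d | h): probability of seeing `heads` heads in n flips."""
    return comb(n, heads) * h**heads * (1 - h) ** (n - heads)

# Bayes' rule: posterior is proportional to prior times likelihood.
unnormalized = {h: prior[h] * likelihood(7, 10, h) for h in hypotheses}
z = sum(unnormalized.values())
posterior = {h: p / z for h, p in unnormalized.items()}
```

The point of the formalization above is that the statistician cannot run the last two lines for the reader, because the `prior` line belongs to the reader; what the statistician can usefully report is P(D=d|h) for the various h.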

Regardless of whether you take that perspective, we should acknowledge that this is a subjective task. After all, the most effective approach will depend on the particular psychology of the reader. However, we can still propose general strategies that we think are likely to be successful. Bayesian and frequentist statisticians offer two such approaches.

Frequentists

The most common frequentist approach is null hypothesis testing, where the focus is on how d relates to P(D|hnull). hnull is a null hypothesis that readers might be interested in refuting. For example, in drug trials hnull will usually be "the drug has no effect". In our running coin-flipping example (a friend flips a coin some number of times and reports how many heads they saw), hnull would be "the coin is fair". In particular, frequentists report P(D is at least as extreme as d|hnull), known as a p-value. What it means to be "at least as extreme" is defined by the statistician, and typically captures some notion of how far d is from what we would expect to observe under hnull. Low p-values are said to show the data is incompatible with hnull.
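
As a concrete sketch of this computation for the coin case, here is one way a one-sided p-value could be computed, taking "at least as extreme" to mean "at least as many heads"; the counts mirror the 16-heads-in-20-flips example used later in the post:

```python
from math import comb

def one_sided_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """P(at least `heads` heads in `flips` flips | the coin has bias p_null)."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# 16 heads out of 20 flips, tested against a fair coin:
p = one_sided_p_value(16, 20)
print(f"p-value: {p:.4f}")  # ≈ 0.0059, conventionally read as "incompatible with hnull"
```

Note that the choice of extremeness ordering (one-sided here, but two-sided is equally defensible) is itself one of the subjective decisions left to the statistician.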

The main pro of this approach is that we do not need to bring in any additional assumptions beyond our knowledge of P(D|hnull). In some cases, depending on how "at least as extreme" is defined, we can get away with even milder assumptions (e.g., if the definition only depends on the mean of the data, we only need assumptions about how the mean is distributed, not how all of D is distributed). The primary downside is that we are given no information about any hypothesis other than hnull, which leads to deficiencies when we try to do inference (there is also some subjectivity in defining what "at least as extreme" means, but I won't focus on that here). In particular:

1. It means we are not given any information about P(D|h∗) for h∗ very, very close to hnull, even though we are often interested in refuting those hypotheses as well. For example, when testing a drug we want to ensure it has a non-trivial effect, not just a non-zero effect. In our coin flipping example, if we are only told the data is incompatible with hnull=0.5, we don't technically know whether the data is incompatible with h=0.500001, which we might consider effectively unbiased, because we have not been given any information that rules out the possibility that P(D|h=0.5) and P(D|h=0.500001) are dramatically different distributions. As a result, p-values do not, on their own, give you any information about the significance of the results[1].
2. Knowing that d is not compatible with P(D|hnull) also does not necessarily mean d is more compatible with any other hypothesis. This is problematic because if the data does not fit hnull, but fits every other hypothesis even worse, we might still want to avoid rejecting hnull. These kinds of situations can occur when the data is unlikely under any hypothesis. For example, if there was a 95% chance our friend would give up after a single flip, then the observation of 13 heads is incompatible with every hypothesis about how biased the coin is: no matter what the bias, there is less than a 5% chance of seeing more than one head. This leads to cases like the voltmeter story. As a result, a p-value does not necessarily tell you whether the null hypothesis is better or worse than other hypotheses[2].
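
A quick sketch of the first issue: with a large enough sample, even a bias we would consider effectively zero produces a low p-value. The sample sizes below are made up for illustration, and the p-value uses the usual normal approximation to the binomial:

```python
from math import erfc, sqrt

def approx_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value via the normal approximation to the binomial."""
    mean = flips * p_null
    sd = sqrt(flips * p_null * (1 - p_null))
    z = (heads - mean) / sd
    return 0.5 * erfc(z / sqrt(2))

# A coin with bias roughly 0.501 -- "effectively fair" for most purposes:
print(approx_p_value(501_000, 1_000_000))  # ≈ 0.023: "significant" at the 5% level
print(approx_p_value(5_010, 10_000))       # ≈ 0.42: nowhere near significant
```

Same (tiny) effect size, wildly different p-values: the p-value alone tells you nothing about how far the truth is from hnull.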

Frequentists have little choice but to argue that, in practice, these issues are not common: that a low p-value usually does imply the result has at least some significance, and that low p-values are usually not caused by the data just being generally unlikely. After all, if this were not the case, these statistics would have very little use. Frequentists can make a reasonable argument for this. The probability distributions used in practice are usually smooth enough that, if the data is not compatible with hnull, it is also likely to be incompatible with h very, very close to hnull. Likewise, generally unlikely data is, by definition, unlikely to be observed, and for many commonly used distributions for P(D|h), if the data is not compatible with hnull there will be some other hypothesis the data is more compatible with. Of course, one could still object that these possibilities are being hand-waved rather than accounted for in the math, and (1) and (2) are known to occur in practical settings (a classic example is that stopping rules can trigger the second issue).

Frequentists might also argue that readers are smart and well-informed enough to account for (1) and (2) themselves (although this requires that readers be informed about the details of how the p-values were computed[3]). A particular appeal of this line of thought is that handling these issues requires making additional assumptions, and one can argue it is better to leave readers to make those assumptions for themselves, so they can bring in their own beliefs and prior knowledge, rather than having the statistician make assumptions on the reader's behalf. This does, however, put a non-trivial burden on the reader, and the extent to which even well-trained readers actually do this is debatable.

Bayesians

Bayesians usually summarize how h relates to P(H|D=d), which is computed using a particular prior P(H). Here H is a random variable over possible hypotheses, with a distribution P(H) chosen by the statistician. Using P(H), we can apply Bayes' rule to compute P(H|D=d). Bayesians then typically provide some kind of summary of P(H|D=d), such as a credible interval like P(r1<H<r2|D=d). Bayesians can also compute a likelihood ratio like P(D=d|H is close to hnull) / P(D=d|H is not close to hnull), where hnull is again a hypothesis we might want to reject, although computing it still requires a prior of sorts, since we need to integrate over "H is close to hnull" and "H is not close to hnull".
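
As a sketch of what this looks like for the coin, here is a brute-force grid approximation of the posterior over the coin's bias under a uniform prior, using the 16-heads-in-20-flips data. (With a conjugate Beta prior this has a closed form, but the grid version makes the Bayes'-rule computation explicit, and a credible interval can be read off the same grid.)

```python
# Grid approximation of P(H | D=d) for the coin's bias, under a uniform prior.
heads, tails = 16, 4
grid = [i / 10_000 for i in range(1, 10_000)]             # candidate biases h
prior = [1.0 for _ in grid]                               # uniform P(H)
likelihood = [h**heads * (1 - h) ** tails for h in grid]  # P(D=d | h)
unnorm = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [w / total for w in unnorm]                   # P(H=h | D=d) on the grid

posterior_mean = sum(h * w for h, w in zip(grid, posterior))
p_biased = sum(w for h, w in zip(grid, posterior) if h > 0.5)
print(f"posterior mean bias: {posterior_mean:.3f}")       # ≈ 0.773
print(f"P(H > 0.5 | D=d):    {p_biased:.3f}")
```

Summaries like `p_biased` or an interval P(r1<H<r2|D=d) are what would actually be reported to the reader.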

One thing we should emphasize is that the prior used here is not going to be the reader’s prior. Therefore Bayesians can’t claim the P(H|D=d) they compute is the P(H|D=d) the reader ought to believe if they are doing proper Bayesian reasoning. However, Bayesians can still make the case that knowing the results computed under their prior will be informative for the reader.

One point in favor of this method is that it helps avoid the concerns mentioned for the frequentist approach. Bayesians consider P(D|h) for a range of hs, which means they take at least some account of hypotheses other than hnull and therefore have some room to make claims about statistical significance. Bayesians also handle the second issue by reporting the ratio of two probabilities: if the data is unlikely under any hypothesis, that general unlikeliness appears in both numerator and denominator, and thus cancels out.

One can still make a technical objection, not dissimilar to the ones discussed for frequentists, against the use of a prior. In particular, there is no mathematical guarantee that the results would not have been dramatically different under a prior only slightly different from the chosen one. This is particularly problematic since the prior must be chosen somewhat arbitrarily by the statistician.

Bayesians could offer a similar response to the one frequentists offer, i.e., that in practice results are unlikely to change dramatically given a small change in the prior, and that readers can judge for themselves how and when this might affect the reported results. Bayesians can also robustify their analysis by additionally reporting how their results would change if the prior changed, a method known as robust Bayesian analysis.
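
A minimal sketch of such a sensitivity check for the coin example: recompute a quantity of interest under a family of symmetric Beta(a, a) priors and report how much the answer moves. (The particular prior family here is an arbitrary choice for illustration, and the grid approximation is the same as before.)

```python
# Prior-sensitivity check: recompute P(H > 0.5 | D=d) under several
# Beta(a, a) priors and report the range of answers.
heads, tails = 16, 4
grid = [i / 10_000 for i in range(1, 10_000)]

def p_biased_given(a: float) -> float:
    """P(H > 0.5 | D=d) under a Beta(a, a) prior, via grid approximation."""
    unnorm = [h ** (a - 1) * (1 - h) ** (a - 1) * h**heads * (1 - h) ** tails
              for h in grid]
    total = sum(unnorm)
    return sum(w for h, w in zip(grid, unnorm) if h > 0.5) / total

results = {a: p_biased_given(a) for a in (0.5, 1, 2, 5, 10)}
print(results)
# If every "reasonable" prior gives P(H > 0.5 | D=d) well above 0.9, the
# conclusion "the coin is biased toward heads" is robust to the prior choice.
```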

So, what method is preferable?

I am going to caveat this by saying that I don't have a huge amount of practical experience with statistics, so these are not very confident statements. However, if someone asked me what kind of statistical analysis I would find most useful, I would currently say:

• If P(D|h) is a simple model (e.g., normal or categorical), I think I can more-or-less trust p-values to mean what they seem to mean. One caveat here is that the larger the sample size, the less a low p-value implies a large effect size, and I don't feel like I have a good intuition about exactly how the two are connected.
• For more complex models, I am pretty wary of p-values. Ultimately, p-values require the reader to extrapolate from what was reported to how the model would behave for hypotheses other than hnull, and once the models get complex I am not sure my intuition tracks that.
• I think my preferred approach would be a Bayesian result with sensitivity analysis. Maybe something following the structure of “If your prior of H falls within <some reasonable range> then P(H|D=d) will have <some important property>”, which is a statement I think I could understand and find useful even for complex models of P(D|h).

One could level a critique, again not dissimilar to the ones we have already presented, against both methods due to the assumption of an accurate model P(D|h). In particular, we once again have no mathematical assurance that our results will not change dramatically if our P(D|h) model is even slightly wrong. As far as I can tell, standard practice is to politely ignore that fact, although in principle one could do some kind of robustness analysis here too. Non-parametric methods offer a way out of this as well.

Can frequentists robustify their analysis?

One might ask if there is an analog of Bayesian sensitivity analysis for frequentists, so that some kind of mathematical guarantee is provided in regards to (1) and (2). If there is, I don't know it. I suspect part of the difficulty is that it is easy for solutions to regress into de facto Bayesian methods. For example, to provide results that cover significance levels we need to analyze hypotheses other than hnull, but then to summarize that array of results we will likely need some kind of weighted average, which starts to sound a lot like integrating over a particular prior. Likewise, dividing by P(D=d), as Bayesians do, seems like the most obvious solution to (2).

Is this hammers and screwdrivers?

Occasionally I see people attempt to claim a high ground in the debate about which method is preferable by saying "Asking which method is preferable is like asking whether a hammer or a screwdriver is preferable; in reality the answer depends on the context." I don't think this analogy is accurate, because these methods are, unlike hammers and screwdrivers, trying to solve the same problem. While it is never wrong to say one should account for context, it is still perfectly reasonable to have a general philosophical preference for one approach or the other.

Both approaches are vulnerable to p-hacking. Indeed, no amount of clever math or more complex statistical analysis can save us from p-hacking: if a friend tells you they flipped a coin 20 times and got 16 heads, there is no way for you to distinguish between the case where that result occurred because the coin is biased and the case where the friend repeated batches of 20 flips until one came up with 16 heads. The only solution is to better regulate how experiments are run and reported.
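
A small simulation makes the stopping-rule version of this concrete: flip a genuinely fair coin, peek at the p-value after every flip, and stop as soon as it dips below 0.05. This rejects the (true) null far more often than 5% of the time, even though each individual p-value is computed correctly. (The peeking schedule below is made up for illustration.)

```python
import random
from math import erfc, sqrt

def p_value(heads: int, flips: int) -> float:
    """One-sided p-value for a fair coin (normal approximation)."""
    z = (heads - flips / 2) / sqrt(flips / 4)
    return 0.5 * erfc(z / sqrt(2))

random.seed(0)
trials, rejections = 2_000, 0
for _ in range(trials):
    heads = 0
    for flip in range(1, 101):              # flip a genuinely FAIR coin...
        heads += random.random() < 0.5
        if flip >= 10 and p_value(heads, flip) < 0.05:
            rejections += 1                 # ...stopping as soon as p < 0.05
            break

rate = rejections / trials
print(f"false-positive rate with optional stopping: {rate:.2f}")
# Well above the nominal 5% -- the stopping rule, not the p-value math,
# is what breaks the guarantee.
```

From the data alone, these runs are indistinguishable from honestly pre-registered experiments that happened to hit p < 0.05, which is why the fix has to be procedural rather than mathematical.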

1. Hence the American Statistical Association’s (ASA) statement on p-values states “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result” ↩︎

2. Hence the American Statistical Association also states “By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis”. ↩︎

3. Which is part of the reason the ASA also says “Proper inference requires full reporting and transparency.” ↩︎

Discuss

### Beware over-use of the agent model

April 26, 2021 - 01:19
Published on April 25, 2021 10:19 PM GMT

This is independent research. To make it possible for me to continue writing posts like this, please consider supporting me.

Thank you to Shekinah Alegra for reviewing a draft of this essay.

Outline
• A short essay intended to elucidate the boundary between the agent model as a way of seeing, and the phenomena out there in the world that we use it to see.

• I argue that we emphasize the agent model as a way of seeing the real-world phenomenon of entities that exert influence over the future to such an extent that we exclude other ways of seeing this phenomenon.

• I suggest that this is dangerous, not because of any particular shortcomings in the agent model, but because using a single way of seeing makes it difficult to distinguish features of the way of seeing from features of the phenomenon that we are using it to look at.

The phenomenon under investigation

Yesterday I wrote about the pitfalls of over-reliance on probability theory as a sole lens for looking at the real-world phenomena of machines that quantify their uncertainty in their beliefs. Today I want to look at a similar situation with respect to over-reliance on the agent model as a sole lens for looking at the real-world phenomena of entities that exert influence over the future. Under the agent model, the agent receives sense data from the environment, and sends actions out into the environment, but agent and environment are fundamentally separate, and this separation forms the top-level organizing principle of the model.

And what is it that we are using the agent model to see? Well let’s start at the beginning. There is something actually out there in the world. We might say that it’s a bunch of atoms bouncing around, or we might say that it’s a quantum wavefunction evolving according to the Schrodinger equation, or we might say that it is God. This post isn’t about the true nature of reality, it’s about the lenses we use to look at reality. And one of the things we see when we look out at the world is that there are certain parts of the world that seem to exert an unusual amount of influence over the future. For example, on Earth there are these eight billion parts of the world that we call humans, and each of those parts has a sub-part called a brain, and if you want to understand the overall evolution of the cosmos in this part of the world then you can get a long way just in terms of pure predictive power by placing all your attention on these eight billion parts of the world that we call humans.

It’s a bit like in complex analysis, where if you want to compute the integral of a function with certain continuity properties, it turns out that you can just find the finite number of points at which the function goes to infinity and compute a quantity called the "residue" at those points, and then the whole integral is just a sum of those residues.

It’s quite remarkable. You can understand the whole function just by understanding what’s happening in the vicinity of a few points. It’s not that we decide that these singularity points are more worthy of our attention, it simply is the case, for better or worse, that the behavior of the entire function turns upon the behavior of these singularity points.

Now, understanding the evolution of the whole cosmos in our local region of space by understanding the evolution of the conglomerations of atoms or regions of the wavefunction or faces of God that we identify as humans has no precise connection whatsoever to the residue theorem of complex analysis! It is just an illustrative example! Human minds do not represent singularities in the quantum wavefunction! Understanding the future of life on Earth is not like computing the integral over a holomorphic function! Put any such thoughts out of your mind completely. It is just an example of the phenomenon in which a whole system can be understood by examining some small number of critical points, and not because we make some parochial choice to attend more closely to these critical points than to other points, but because it just is the case, for better or worse, that the evolution of the whole system turns upon the evolution of this finite set of points.

And that does seem to be the situation here on Earth. For better or worse, the fate of all the atoms in and close to the Earth now appears to turn upon the evolution of the eight billion little conglomerations of atoms that we identify as humans. This just seems to be the case.

We are interested in examining these conglomerations of atoms that we identify as humans, so that we might understand the likely future of this region of the cosmos, and so that we might empower ourselves to take appropriate action. Due to our interest in examining these conglomerations of atoms that we identify as humans we develop abstractions for understanding what is going on, because a human consists of a very large number of atoms / a very complex wavefunction / a very difficult-to-understand aspect of God’s grace, and we need abstractions in order to make sense of things. And one such abstraction is the agent model.

The agent model

Under the agent model, the agent receives sense data from the environment, and sends actions out into the environment, but agent and environment are fundamentally separate, and this separation forms the top-level organizing principle of the model.

The agent model abstracts away many details of the underlying reality, as all good models should. It abstracts away the physical details of the sensors -- how "observations" get transmitted from the "environment" to the "agent". It abstracts away the physical details of the actuators -- how "actions" get transmitted from the "agent" to the "environment". Very importantly, it abstracts away the physical details of the computing infrastructure used to run the agent algorithm.

The agent model is a good abstraction. It has proven useful in many domains. It was developed, I understand, although I have not looked into it, within economics. It is used extensively today within computer science as a lens through which we think about building intelligent systems. For example, in partially observable Markov decision processes (POMDPs), one of the basic models underlying reinforcement learning, there is an explicit exchange of actions and observations with the environment.

But these eight billion little conglomerations of atoms upon which the whole future of the cosmos appears to turn are a real phenomenon out there in the world, and the agent model is just one lens through which we might understand that phenomenon. It is a powerful lens, but precisely because it is so powerful I fear that we currently use it to the exclusion of all other lenses. As a result we wind up losing track of what is the lens and what is the phenomenon.

It is the same basic situation that we discussed yesterday with respect to over-use of probability theory as a lens for looking at the real-world phenomenon of machines with quantified uncertainty in their beliefs.

It is not that the agent model is a bad model, nor that we should discard all models and insist on gazing directly upon the atoms / wavefunction / God. It is that when we have only one lens, it is extremely difficult to discern in what ways it is helping us to see the world and in what ways we are seeing impurities in the lens itself.

A good general way to overcome this is to move between different lenses. Even if one lens seems to be the most powerful one available, it is still helpful to spend some time looking through other lenses if for no other reason than to distinguish that which the powerful lens is revealing to us about reality from that which is merely an artifact of the lens itself.

But what is a good second lens for looking at these conglomerations of atoms that exert power over the future? This is a question that I would very much like to begin a conversation about.

Discuss

### "Who I am" is an axiom.

April 26, 2021 - 00:59
Published on April 25, 2021 9:59 PM GMT

I guess most of us, in moments of existential crisis, have asked the question "Why am I me?". It is not about grammar: "because 'I' and 'me' are both first-person singular pronouns." Nor is it a matter of tautology: "of course Bill Gates is Bill Gates, duh." It's the urge to question why, among all things that exist, I am experiencing the world from this particular being's perspective. How come I am not a different person, or an animal, or some other physical entity?

Such a question has no logical explanation, because it is entirely a matter of subjectivity. I inherently know this human is me because I know the subjective feeling of being this person. Pain can be logically reduced to nerve signals, colors to wavelengths. Yet only when this human experiences them do I have the subjective feeling. Explaining "who I am" is beyond the realm of logic. It is something intuitively known. An axiom.

So rational thinking should have no answer to questions such as "the chances of me being born as a human". That I am a human is a fact that can only be accepted, not explained. However, when framed in certain ways, similar questions do seem to have obvious answers. For example: "There are 7.7 billion people in the world, about 60% of them Asian. What's the probability of me being an Asian?" Many would say 60%.

That's because it is uncomfortable to have no answer. So we often subconsciously change the question. Here "I/me" is no longer taken as the axiomatic, subjectively defined perspective center. Instead it is used as shorthand for some objectively defined entity, e.g. a random human. Under this objective reinterpretation, the question falls back into the scope of logical thinking. So it can be easily answered: since 60% of all people are Asian, a randomly selected person has a 60% chance of being one.

Mixing up the subjectively defined "I" with an objectively defined entity seldom causes any problem. Anthropic reasoning is an exception. Take the Doomsday Argument, for example. It is true that a randomly chosen human would be unlikely to be among the very earliest humans, and if one turned out to be, the Bayesian update toward doom-soon would be valid too. However, the argument uses words such as "I" or "we" as something primitively understood: the intuitive perspective center. In this case, probabilities such as "me being born in the first 5% of all humans" have no answer at all, because "who I am" is something that can only be accepted, not explained. An axiom.

Discuss

### Would robots care about Meaning and Relating?

April 26, 2021 - 00:17
Published on April 25, 2021 9:17 PM GMT

A few weeks ago, Vaniver gave a talk discussing meaning and meaningfulness. Vaniver had some particular thing he was trying to impart. I am not sure I got the thing he intended, but what I got was interesting. Here is my bad summary of some things I got from the talk, and some of the discussion after the talk (in particular from Alex Ray). No promises that either of them endorse this.

Epistemic status: I am not very confident this is the right frame, but it seemed at least like an interesting pointer to the right frame.

WTF is Meaning™?

Humans seem to go around asking questions like "What makes life meaningful?", "What is 'The Meaning of Life'?", "What is my purpose?", "What is the point of it all?"

What is the type-signature of a "Meaning", such that we'd recognize one if we saw it?

When asking a question like this, it's easy to get lost in a floating series of thought-nodes that don't actually connect to reality. A good rationalist habit around questions like this is to ask: "Do we understand this 'meaning' concept well enough to implement it in a robot? Could a robot find things meaningful? Is there a reason we'd want robots to find things meaningful? What sort of algorithms end up asking 'what is the meaning of life?'"

Here is a partial, possible answer to that question.

Imagine a StarCraft playing robot.

Compared to humans, StarCraftBot has a fairly straightforward job: win games of StarCraft. It does a task, and then it either wins, or loses, and gets a boolean signal, which it might propagate back through a complex neural net. Humans don't have this luxury – we get a confused jumble of signals that were proxies for what evolution actually cared about when it programmed us. We get hungry, or horny, or feelings of satisfaction that vaguely correlate with reproducing our genes.

StarCraftBot has a clearer sense of "what is my purpose."

Nonetheless, as StarCraftBot goes about "trying to get good at StarCraft", it has to make sense of a fairly complex world. Reality is high dimensional, even the simplified reality of the StarCraft universe. It has to make lots of choices, and there's a huge number of variables that might possibly be relevant.

It might need to invent concepts like "an economy", "the early game", "micro", "units", "enemy", and "advantage/disadvantage" (disclosure: I am neither an ML researcher nor a StarCraft pro). Not only that, but it needs some way to navigate when to apply one of those concepts versus another. Sometimes it might need to move up or down a ladder of abstraction.

StarCraftBot has had the Meaning of Life spelled out for it, but it still needs a complex ontology for navigating how to apply that meaningfulness. And as it constructs that ontological framework for itself, it may sometimes find itself confused about "What is a unit? Are units and buildings meaningfully different? What principles underlie a thriving economy?"

Now, compare this to humans. We have a cluster of signals that relate to surviving, and reproducing, and ensuring our tribe survives and flourishes. We end up having to do some kind of two-way process, where we figure out...

• Specific things like: "Okay, what is a tiger? What is food? What is my family? What is 'being a craftsman?' or 'being a hunter?'"
• Higher order things like "What is the point of all of this? how do all of these things tie together? If I had to tradeoff my survival, or my children's, or my tribes', which would I do? What is my ultimate goal?"

A thing that some religions and cultures do is tie all these things together into a single narrative, with multiple overlapping tiers. You have goals relating to your own personal development, and to raising a family, and to having a role in your tribe that helps it flourish as a group, and (in some cases) to some higher purpose of 'serve god' or 'serve the ancestors' or 'protect the culture.'

The idea here is something like "Have a high-level framework for navigating various tactical and strategic goals, one coherent enough that when you move from one domain to another, you don't have to spend too much time re-orienting or resolving contradictions between them. Each strategic frame allows you to filter out tons of extraneous detail and focus on the decision-at-hand."

Hammers, Relationships and Fittingness

Meanwhile, another concept that might bear on "Why do humans sit around saying 'what does it all mean!?'" is fittingness.

Say you have a hammer.

The hammer has a shape – a long handle, a flat hammer-part, and a curved hook thingy. There are many different ways you could interact with the hammer. You could kick it with your feet. You could grab it by the curved hook thingy. You could grab it by the handle. You could try to eat it.

How do you relate to the hammer? It's not enough to know it exists. If a chimpanzee were to find a hammer, they might need some sense of "what is the hammer for?". Once they realize they can bash walnuts open with it, or maybe bash in the skull of a rival chimpanzee, they might get the sense of "oh, the thing I'm supposed to do here is grab the handle, and swing."

Later, if their concept-schema comes to include nails and timber and houses, they might think "ohhhhh, this has a more specific, interesting purpose: hammering nails into wood to build things."

Later still, they might realize "ohhhhhhhhhh, this weird hook thing on the end is for pulling nails out." This involves using the hammer a different way than they might have previously.

Hammers vs Fathers

Okay. So, you might come upon a hammer and say: "I have this weird-shaped-object, I could fit myself around it in various ways. I could try to eat it. It's unclear how to fit it into my hand, and it's unclear how to fit it against the other parts of my environment. But after fiddling around a bunch, it seems like this thing has a purpose. It can bash walnuts or skulls or nails."

The process of figuring that out is a mental motion some people need to make sometimes.

Another mental motion people make sometimes is to look around at their tribe, their parents, their children, their day-to-day activities, and to ask questions like "how do I fit in here?".

Say you have a father. There are a bunch of ways you can interact with your father. You can poke them on the nose. You can cry at them. You can ask them philosophical questions. You can silently follow their instructions. You can grab them and shake them and yell "Why don't you understand me!!?".

Which of those is helpful depends on your goals, and what stage of life you're at, and what sort of tribe you live in (if any).

If you are a baby, "poke your father on the nose" is in some sense what you're supposed to be doing. You're a baby. Your job is to learn basic motor skills and crudely mimic social things going on around you and slowly bootstrap yourself into personhood.

If you're in some medieval cultures, and you are male and your father is a blacksmith, then your culture (and correspondingly, your father's personality), might give you a particular set of affordances: follow their instructions about blacksmithing and learn to be a blacksmith. [citation needed]. Learn some vaguely defined "how to be a man" things.

You can say to your dad "I wanna be a poet" and ask him questions about poetry, but in this case that probably won't go very well because you are a medieval peasant and society around you does not provide much opportunity to learn poetry, nor do anything with it. [citation needed again]

You can grab your father and shake him and say "why don't you understand me!!!?". Like the chimpanzee holding a hammer by the wrong end, mashing walnuts with the wooden handle, that sorta kinda works, but it is probably not the best way to accomplish your goals.

As you grow up, the culture around you might also offer you particular affordances and not others. You have a strong affordance for becoming a blacksmith. I don't really know how most medieval societies work but maybe you have other affordances like "become a tailor if for some reason you are drawn to that" or "join the priesthood" or "become a brigand" or "open an inn." Meanwhile you can "participate in tribal rituals" and "help raise barns when that needs doing", or you can ignore people and stick to your blacksmith shop being kinda antisocial.

Analogy or Literal?

It's currently unclear to me whether the questions "how do I relate to my hammer?" and "how do I relate to my father?" are cute analogies for each other, or whether they are literally the same mental motion applied to very different phenomena.

I'm currently leaning toward "they are basically the same thing, on some level." People and hammers and tribes are pretty different, and they have very different knobs you can fiddle with. But, maybe, the fundamental operation is the same: you have an interface with reality. You have goals. You have a huge amount of potential details to think about. You can carve the interface into natural joints that make it easier to reason about and achieve your goals. You fiddle around with things, either physically in reality or in your purely mental world. You figure out which ways of interacting with stuff actually accomplish your goals.

A schema for how to relate to your father might seem limiting. But, it is helpful because reality is absurdly complex, and you have limited compute for reasoning about what to do. It is helpful to have some kind of schema for relating to your father, whether it's a schema society provides you, or one you construct for yourself.

Having a mutually understood relationship prunes out the vast amount of options and extraneous details, down to something manageable. This is helpful for your father, and helpful for you.

Relating and Meaning

So, in summary, here is a stab at what meaning and relating might be, in terms that might actually be (ahem) meaningful if you were building a robot from scratch.

A relationship might be thought of as "a set of schemas for interacting with something, that let you achieve your goals." Your relationship with a hammer might be simple and unidirectional. Your relationship with a human might be much more complex, because both of you have potential actions that include modeling each other, thinking strategically, cooperating or defecting in different ways over time, etc. This creates a social fabric, with a weirder set of rules for how to interact with it.

Meaning is... okay geez I got to the end of this essay and I'm still not sure I can concisely describe "Meaning" rather than vaguely gesturing at it.

The dictionary definition of "meaning" that comes up when I google it is about words, and what words mean. I think this is relevant to questions like "what does it all mean?" or "what is the meaning of life?", but a few steps removed. When I say "what do the letters H-O-T mean?" I'm asking about the correspondence between an abstract symbol, and a particular Thing In Reality (in this case, the concept of being high-temperature).

When I ask "What does my job mean?", or "what does my relationship with my father mean?" or "what is the meaning of life?", I'm asking "how do my high level strategic goals correspond to each other, in a way that is consistent, minimizes overhead when I shift tasks, and allows me to confidently filter out irrelevant details?"

While typing this closing summary, I've come to think "Meaningmaking" might be a subtype of "Relating". If Relating is fiddling-around-with or reflecting-on a thing, until you understand how to interact with it, then maybe "Meaningmaking" is fiddling around with your goals and high level strategies until you feel like you have a firm grasp on how to interact with them.

...

Anyway, I am still a bit confused about all this but those were some thoughts on Meaning and Relating. I am interested in other people's thoughts.

Discuss

### For mRNA vaccines, is (short-term) efficacy really higher after the second dose?

April 25, 2021 - 23:21
Published on April 25, 2021 8:21 PM GMT

For the Pfizer and Moderna vaccines, efficacy is commonly reported as something high (like 95%) after two doses, and something lower (in the 50-80% range) after only one dose. However, I think there's good reason to doubt that this 50-80% number is capturing the right thing, and there's good reason to believe that the mRNA vaccines are ~95% effective starting 12 days after the first dose. This is important because if your immunity is as high 12 days post-first-dose as it will be 12 days post-second-dose, then any precautions you were planning to drop once fully vaccinated can instead be dropped 3-4 weeks earlier.

(Note that getting a second dose might still cause your immunity to wane more slowly in the long term; that is, second doses might still increase long-term efficacy.)

Overall, my current best guess, with confidence ~65%, is that the mRNA vaccines are equally effective 12 days after the first dose as they are 12 days after the second dose.

Below I'll explain the cases for and against the claim that the second dose boosts immunity. In short, the phase 3 data strongly suggests that it doesn't, and real-world data from Israel strongly suggests that it does. I'm posting this as a question because I hope that someone who knows more about this, or who has seen data that I haven't seen, can help figure this out.

Why might we think that second doses don't boost efficacy?

The short answer: because this is what the data from the phase 3 trials straightforwardly says. Here are the key graphs; note the lack of apparent effect from the second dose.

Incidence of Covid cases in the Pfizer phase 3. Note that dose 2 was given on day 21. Source.From the Moderna phase 3. Dose 2 was given on day 28. Source.

So what's up with the commonly reported 50-80% post-first-dose efficacy? The issue is that this number captures efficacy starting on the day you receive the vaccine (or sometimes 7 days later). As you can see from the graphs, this is too soon for the vaccine to have any effect at all. Rather, immunity appears in the data sharply starting on day 12, so what we really want to know is the efficacy in the time period between day 12 and the second dose.

(Note that since there's an incubation period between exposure and developing symptoms, the efficacy actually starts before day 12, perhaps around day 7. Shouldn't there be some noise around the day the vaccine kicks in due to varying incubation periods? I would think so, but we don't see that here.)

Frustratingly, the phase 3's don't report this number. But using some data included in the Pfizer phase 3, I was able to make this graph:

(They only give data binned by week, so I can't make this graph on a day-by-day basis. Also, it gets noisier as the trial continues because the number of participants dropped, so that oscillation after 70+ days is probably nothing real. Alas, I can't find the data in the Moderna phase 3 necessary to make this graph for Moderna.)

So based on the phase 3 data, it really doesn't look like the second dose does anything to boost immunity.

Why might we think that second doses do boost efficacy?

Because that seems to be what Israeli data is saying. Here is the key graph:

Efficacy of Pfizer vaccine against symptomatic Covid in Israel. There is also data for documented Covid cases (whether symptomatic or not), but the phase 3 data above was for symptomatic Covid, so I'm using this. Source.

Unlike in the graphs above, it actually does take some time for the blue line to flatten out. This is not just an illusion; see here:

Efficacy is 1 - RR. Remember, we're looking at the symptomatic column. Source (same as above).
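The "efficacy is 1 - RR" calculation is simple enough to sketch in code. The case counts below are invented purely for illustration; they are not the Israeli study's numbers.

```python
# Efficacy = 1 - RR, where RR (relative risk) is the ratio of attack
# rates between the vaccinated and unvaccinated groups.
def efficacy(vacc_cases: int, vacc_n: int, unvacc_cases: int, unvacc_n: int) -> float:
    relative_risk = (vacc_cases / vacc_n) / (unvacc_cases / unvacc_n)
    return 1 - relative_risk

# Hypothetical counts: 10 cases per 100,000 vaccinated person-weeks vs.
# 200 per 100,000 unvaccinated gives RR = 0.05, i.e. 95% efficacy.
print(f"{efficacy(10, 100_000, 200, 100_000):.0%}")  # 95%
```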

It's important to note that the Israeli trial is about 30x larger than the Pfizer phase 3 (N = 1.2 million vs. N = 43K).

But it's equally important to note that the phase 3's were true randomized controlled trials - the placebo group was given a fake injection and everything - whereas the Israeli studies were observational. So it's possible that something funky is going on, like that once you're vaccinated (and you know you're vaccinated) you go out partying in Tel Aviv and get Covid at higher rates than whichever non-vaccinated control person the study matched you with; then over time the excitement wears off and you start going out partying at close to population base rates.

One thing I'm uncertain about is how to weigh the relative evidence from the phase 3's and the Israeli data, given that there is much more Israeli data but the phase 3's are RCTs (and are still quite large). For now, I'm trusting the RCTs slightly more, so my 65% confident guess is that the second dose does not boost your immunity.

But I would love to hear from others.

Discuss

### Is there a good software solution for mathematical questions?

April 25, 2021 - 21:03
Published on April 25, 2021 6:03 PM GMT

I have a programming problem that I want to optimize. To optimize it, I have to calculate the minimum value that's possible for one variable given a set of equations (where some but not all variables are known at runtime). Is there any good automated system that does that job for me and gives me a formula given my constraints?
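One possible answer, depending on the structure of the equations: a computer algebra system like SymPy can often do this kind of symbolic minimization. Here is a minimal sketch with a made-up constraint (`x**2 + y**2 == r**2`, with `r` standing in for a value known only at runtime); the actual equations would replace it, and harder constraint sets may need a dedicated optimizer instead.

```python
import sympy as sp

# Hypothetical toy problem: find the minimum possible x subject to
# x**2 + y**2 == r**2. Here we pretend r has become known at runtime (r = 5).
x, y = sp.symbols('x y', real=True)
r = 5

# Solve the constraint for the variable of interest, x, in terms of y.
solutions = sp.solve(sp.Eq(x**2 + y**2, r**2), x)

# Each solution expresses x as a function of y; minimize each branch by
# finding critical points (derivative = 0), then take the smallest value.
candidates = []
for branch in solutions:
    for crit in sp.solve(sp.diff(branch, y), y):
        candidates.append(branch.subs(y, crit))

print(min(candidates))  # -5
```

Keeping `r` symbolic instead of substituting a number would make SymPy return a formula in `r`, which sounds like what the question is after.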

Discuss