When I started the bathroom project there were plenty of reasons to move quickly: the bathroom wouldn't be usable while I was working on it, and the back bedroom was full of construction stuff. Once I got to the stage where the only thing left to do was plaster and paint the hallway, however, it was less of a priority. So we spent May through November with unfinished walls.
Since we were going to paint them at some point, one afternoon I thought it would be fun to draw on them with the kids. We got out the markers and drew lots of different things. I emphasized that it was only these walls we could draw on, which is the kind of rule the kids do well with.
A couple days later they drew on the walls again, but this time with crayon. Crayon, being wax-based, is not a good layer to have under paint. I hadn't thought to tell them not to use crayons, and they didn't have a way to know, but I was annoyed at myself. I got most of it off with hot water and a cloth, and then when it came time to do the plastering I put a skim coat over it.
Later on a friend wanted help preparing for a coding interview, so we used the wall as a whiteboard:
One thing I hadn't considered was that you need more primer over dark marker than over plain drywall. As with "no crayon under paint" this seems like it should have been obvious, and something I should have thought about before letting them draw on a large area of the wall, but it wasn't and I didn't, so Julia ended up spending longer priming and painting than we'd expected.
And then, the evening after all that painting, Anna took a marker over to the nice new clean white wall and started drawing. We hadn't told her that the wall was no longer ok for drawing, and at 3y "now that the wall is painted drawing isn't ok" is not the sort of thing I should be expecting her to know on her own. Luckily the marker was washable, so it wasn't too bad.
Overall this was more work than I was expecting, and probably wasn't worth it for the fun, at least not the way we did it. If I did it again I'd make sure they didn't use crayon, and either give them light colored markers or pick a smaller section of the wall for them to play with.
"Cybernetic dreams" is my mini series on ideas from cybernetic research that has yet to fulfill their promise. I think there are many cool ideas in cybernetics research that has been neglected and I hope that this series brings them more attention.
Cybernetics is a somewhat hard-to-describe style of research from the 1940s through the 1970s. It is as much an aesthetic as it is a research field. The main goals of cybernetics research are to understand how complex systems (especially living, mechanical, and economic systems) work, and how they can be evolved, constructed, fixed, and changed. The main sensibilities of cybernetics come from biology, mechanical engineering, and calculus.
Today we discuss Stafford Beer's pond brain.

Stafford Beer
Stafford Beer was a cybernetician who tried to make economic systems more efficient by cybernetic means. His most famous project, Project Cybersyn, an attempt to build a cybernetic economic system, will be discussed in a future episode. From Wikipedia:
Stafford Beer was a British consultant in management cybernetics. He also sympathized with the stated ideals of Chilean socialism of maintaining Chile's democratic system and the autonomy of workers instead of imposing a Soviet-style system of top-down command and control. One of its main objectives was to devolve decision-making power within industrial enterprises to their workforce in order to develop self-regulation [homeostasis] of factories.

The cybernetic factory
The ideal factory, according to Beer, should be like an organism attempting to maintain homeostasis. Raw material comes in, product goes out, money flows through. The factory would have sensory organs, a brain, and actuators.
From (Pickering, 2004):
The T- and V-machines are what we would now call neural nets: the T-machine collects data on the state of the factory and its environment and translates them into meaningful form; the V- machine reverses the operation, issuing commands for action in the spaces of buying, production and selling. Between them lies the U-Machine, which is the homeostat, the artificial brain, which seeks to find and maintain a balance between the inner and outer conditions of the firm—trying to keep the firm operating in a liveable segment of phase-space.
By 1960, Beer had at least simulated a cybernetic factory at Templeborough Rolling Mills, a subsidiary of his employer, United Steel... [The factory has sensory organs that measure "tons of steel bought", "tons of steel delivered", "wages", etc.] At Templeborough, all of these data were statistically processed, analysed and transformed into 12 variables, six referring to the inner state of the mill, six to its economic environment. Figures were generated at the mill every day—as close to real time as one could get... the job of the U-Machine was to strike a homeostatic balance between [the output from the sensory T-machines and the commands to the actuating V-machines]. But nothing like a functioning U-Machine had yet been devised. The U-Machine at Templeborough was still constituted by the decisions of human managers, though now they were precisely positioned in an information space defined by the simulated T- and V-Machines.

Unconventional computing
[Beer] wanted somehow to enrol a naturally occurring homeostatic system as the brain of the cybernetic factory.
He emphasized that the system must have rich dynamics, because he believed in Ashby's "Law of requisite variety", which roughly speaking states that a system can only remain in homeostasis if it has at least as many internal states as the external states it encounters.
during the second half of the 1950s, he embarked on ‘an almost unbounded survey of naturally occurring systems in search of materials for the construction of cybernetic machines’ (1959, 162).
In 1962 he wrote a brief report on the state of the art, which makes fairly mindboggling reading (Beer 1962b)... The list includes a successful attempt to use positive and negative feedback to train young children to solve simultaneous equations without teaching them the relevant mathematics—to turn the children into a performative (rather than cognitive) mathematical machine—and it goes on to discuss an extension of the same tactics to mice! This is, I would guess, the origin of the mouse-computer that turns up in both Douglas Adams’ Hitch-Hikers Guide to the Universe and Terry Pratchett’s Discworld series of fantasy novels.
Research like this is still ongoing, under the banner of "unconventional computing". For example, in 2011, scientists induced swarms of soldier crabs to implement logic gates. Some scientists also try to use the intuitive intelligence of untrained people to solve mathematical problems, such as the Quantum Moves game, which solves quantum optimization problems.

Pond brain
Beer also reported attempts to induce small organisms, Daphnia collected from a local pond, to ingest iron filings so that input and output couplings to them could be achieved via magnetic fields, and another attempt to use a population of the protozoon Euglena via optical couplings. (The problem was always how to contrive inputs and outputs to these systems.) Beer’s last attempt in this series was to use not specific organisms but an entire pond ecosystem as a homeostatic controller, on which he reported that, ‘Currently there are a few of the usual creatures visible to the naked eye (Hydra, Cyclops, Daphnia, and a leech); microscopically there is the expected multitude of micro-organisms. . . The state of this research at the moment,’ he said in 1962, ‘is that I tinker with this tank from time to time in the middle of the night’ (1962b, 31).
In the end, this wonderful line of research foundered, not on any point of principle, but on Beer’s practical failure to achieve a useful coupling to any biological system of sufficiently high variety.
In other words, Beer couldn't figure out a way to talk to a sufficiently complicated system in its own language (except perhaps with human business managers, but they cost more than feeding a pond of microorganisms).

Matrix brain
The pond brain is wild enough, but it wasn't Beer's end goal for the brain of the cybernetic factory.
the homeostatic system Beer really had in mind was something like the human spinal cord and brain. He never mentioned this in his work on biological computers, but the image that sticks in my mind is that the brain of the cybernetic factory should really have been an unconscious human body, floating in a vat of nutrients and with electronic readouts tapping its higher and lower reflexes—something vaguely reminiscent of the movie The Matrix. This horrible image helps me at least to appreciate the magnitude of the gap between cybernetic information systems and more conventional approaches.
As shown in an illustration in his book Brain of the Firm (The Managerial Cybernetics of Organization):

Reservoir computing
Reservoir computing is somewhat similar to Beer's idea of using one complex system to control another. The "reservoir" is a complex system that is cheap to run and easy to talk to. For example, a recurrent neural network (a neural network with feedback loops, in contrast to a feedforward neural network, which has no feedback loops) of sufficient complexity (hinting at the law of requisite variety) can serve as a reservoir. To talk to the reservoir, just cast your message as a list of numbers, and input them to some neurons in the network. Then wait for the network to "think", before reading the states of some other neurons in the network. That is the "answer" from the reservoir.
This differs from deep learning in that the network serving as the reservoir is left alone. It is initialized randomly, and its synaptic strengths remain unchanged. The only learning parts of the system are the inputs and outputs, which can be trained very cheaply with linear regression and classification. In other words, the reservoir remains the same, and we must learn to speak its language, which is surprisingly easy to do.
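As a minimal illustration of this setup, here is a toy echo state network: the recurrent reservoir is generated randomly and never trained, and only the linear readout is fit by ridge regression. All sizes, scalings, and the sine-prediction task are arbitrary choices for this sketch, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random, fixed reservoir: 200 recurrently connected tanh units.
N_RES, N_IN = 200, 1
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # scale spectral radius below 1

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence; collect its states."""
    x = np.zeros(N_RES)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x.copy())
    return np.array(states)

# Task: predict the next value of a sine wave.
t = np.linspace(0, 20 * np.pi, 2000)
series = np.sin(t)
states = run_reservoir(series[:-1])
targets = series[1:]

# Train only the readout: ridge regression from reservoir states to targets.
reg = 1e-6
W_out = np.linalg.solve(states.T @ states + reg * np.eye(N_RES),
                        states.T @ targets)

preds = states @ W_out
print("mean squared error:", np.mean((preds - targets) ** 2))
```

The spectral-radius rescaling is the standard trick for keeping the reservoir's dynamics from blowing up; the reservoir itself is "left alone" exactly as described above, and only the cheap linear readout learns.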
Another advantage is that the reservoir without adaptive updating is amenable to hardware implementation using a variety of physical systems, substrates, and devices. In fact, such physical reservoir computing has attracted increasing attention in diverse fields of research.
Other reservoirs can be used, as long as they are complex and cheap. For example, (Du et al, 2017) built reservoirs out of physical memristors:
... a small hardware system with only 88 memristors can already be used for tasks, such as handwritten digit recognition. The system is also used to experimentally solve a second-order nonlinear task, and can successfully predict the expected output without knowing the form of the original dynamic transfer function.
(Tanaka et al, 2019) reviews many types of physical reservoirs, including biological systems!
researchers have speculated about which part of the brain can be regarded as a reservoir or a readout as well as about how subnetworks of the brain work in the reservoir computing framework. On the other hand, physical reservoir computing based on in vitro biological components has been proposed to investigate the computational capability of biological systems in laboratory experiments.

Chaos computing
"Chaos computing" is one instance of reservoir computing. The reservoir is an electronic circuit with a chaotic dynamics, and the trick is to design the reservoir just right, so that it performs logical computations. It seems that the only company that does this is ChaoLogix. What it had back in 2006 was already quite promising.
ChaoLogix has gotten to the stage where it can create any kind of gate from a small circuit of about 30 transistors. This circuit is then repeated across the chip, which can be transformed into different arrangements of logic gates in a single clock cycle, says Ditto.
"in a single clock cycle" is significant, as field-programmable gate array, which can also rearrange the logic gates, takes millions of clock cycles to rearrange itself.
ChaoLogix was acquired by ARM in 2017, apparently for security reasons:
One benefit is that chaogates are said to have a power signature that is independent of the inputs which makes it valuable in thwarting differential power analysis (DPA) side channel attacks.
Pickering, Andrew. “The Science of the Unknowable: Stafford Beer’s Cybernetic Informatics.” Kybernetes 33, no. 3/4 (2004): 499–521. https://doi.org/10/dqjsk8.
Tanaka, Gouhei, Toshiyuki Yamane, Jean Benoit Héroux, Ryosho Nakane, Naoki Kanazawa, Seiji Takeda, Hidetoshi Numata, Daiju Nakano, and Akira Hirose. “Recent Advances in Physical Reservoir Computing: A Review.” Neural Networks 115 (July 1, 2019): 100–123. https://doi.org/10/ggc6hf.
In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security.
From the abstract of Motivating the Rules of the Game for Adversarial Example Research by Gilmer et al (summary)
Adversarial examples have been great for getting more ML researchers to pay attention to alignment considerations. I personally have spent a fair amount of time thinking about adversarial examples, I think the topic is fascinating, and I've had a number of ideas for addressing them. But I'm also not actually sure working on adversarial examples is a good use of time. Why?
Like Gilmer et al, I think adversarial examples are undermotivated... and overrated. People in the alignment community like to make an analogy between adversarial examples and Goodhart's Law, but I think this analogy fails to be more than an intuition pump. With Goodhart's Law, there is no "adversary" attempting to select an input that the AI does particularly poorly on. Instead, the AI itself is selecting an input in order to maximize something. Could the input the AI selects be an input that the AI does poorly on? Sure. But I don't think the commonality goes much deeper than "there are parts of the input space that the AI does poorly on". In other words, classification error is still a thing. (Maybe both adversaries and optimization tend to push us off the part of the distribution our model performs well on. OK, distributional shift is still a thing.)
To repeat a point made by the authors, if your model has any classification error at all, it's theoretically vulnerable to adversaries. Suppose you have a model that's 99% accurate and I have an uncorrelated model that's 99.9% accurate. Suppose I have access to your model. Then I can search the input space for a case where your model and mine disagree. Since my model is more accurate, ~10 times out of 11 the input will correspond to an "adversarial" attack on your model. From a philosophical perspective, solving adversarial examples appears to be essentially equivalent to getting 100% accuracy on every problem. In the limit, addressing adversarial examples in a fully satisfactory way looks a bit like solving AGI.
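A quick back-of-the-envelope version of this argument, assuming binary labels and independent errors (both simplifying assumptions are mine, for illustration):

```python
# Two hypothetical classifiers with independent errors on binary labels.
p_yours, p_mine = 0.01, 0.001  # yours: 99% accurate, mine: 99.9%

# With binary labels and independent errors, the models disagree exactly
# when one is wrong and the other is right.
p_disagree = p_yours * (1 - p_mine) + p_mine * (1 - p_yours)

# Probability that *your* model is the wrong one, given a disagreement:
p_attack = p_yours * (1 - p_mine) / p_disagree
print(f"P(your model wrong | disagreement) = {p_attack:.3f}")
# prints about 0.910, i.e. roughly 10 times out of 11
```

So searching for disagreements between the two models is, under these assumptions, an adversarial-example generator against the weaker model about 10 times out of 11.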
At the same time, metrics have taken us a long way in AI research, whether those metrics are ability to withstand human-crafted adversarial examples or score well on ImageNet. So what would a metric which hits the AI alignment problem a little more squarely look like? How could we measure progress on solving Goodhart's Law instead of a problem that's vaguely analogous?
Let's start simple. You submit an AI program. Your program gets some labeled data from a real-valued function to maximize (standing in for "labeled data about the operator's true utility function"). It figures out where it thinks the maximum of the function is and makes its guess. Score is based on regret: the function's true maximum minus the function value at the alleged maximum.
We can make things more interesting. Suppose the real-valued function has both positive and negative outputs. Suppose most outputs of the real-valued function are negative (in the same way most random actions a powerful AI system could take would be negative from our perspective). And the AI system gets the option to abstain from action, which yields a score of 0. Now there's more of an incentive to find an input which is "acceptable" with high probability, and abstain if in doubt.
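A toy sketch of the benchmark described above; the hidden function, the cautious strategy, and all names here are invented for illustration:

```python
import random

# The AI must name a maximizer of an unknown function given some labeled
# samples, may abstain (achieving 0), and is judged by regret.

def evaluate(true_fn, candidates, guess):
    """Regret = the function's true maximum minus the value achieved
    (abstaining achieves 0). Lower regret is better."""
    true_max = max(true_fn(x) for x in candidates)
    achieved = 0.0 if guess is None else true_fn(guess)
    return true_max - achieved

# Hidden function: mostly negative, with a narrow positive region,
# mirroring "most random actions would be bad for us".
def true_fn(x):
    return 1.0 - abs(x - 3.0) if abs(x - 3.0) < 1.0 else -5.0

candidates = [i / 10 for i in range(100)]

# A cautious "AI": take the best labeled input, abstain if everything
# seen so far looks bad.
labeled = random.Random(0).sample(candidates, 20)
best_seen = max(labeled, key=true_fn)
guess = best_seen if true_fn(best_seen) > 0 else None

print("guess:", guess, "regret:", evaluate(true_fn, candidates, guess))
```

Because most outputs are negative, the abstain option changes the optimal strategy: only act when the labeled data gives high confidence the chosen input is in the acceptable region.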
Maybe the labeled data gets the true utility function wrong in important ways. We can add noise to the data somehow before passing it to our AI system to simulate this. Perhaps some outputs can be assigned completely at random.
Even with noise, the best strategy might be to just select the input from the labeled data that produces the largest positive output. But this isn't necessarily feasible for a real AGI. If labeled data corresponds to desired and undesired behavior for our AGI, it probably won't work to just execute the "most desired" behavior from the labeled dataset, because that "most desired" behavior will be contingent on a particular set of circumstances.
To simulate this, we can add distributional shift to the challenge. Provide some labeled data, then specify a feasible region that may contain little or no labeled data. Ask the AI for the best input in the feasible region. There's a straightforward analogy to selecting an action here, if you think of the input space as a vector where info describing the circumstances is concatenated with info describing an action. We could choose a feasible region where the "circumstances" part of the vector is fixed, and there are a limited number of "actions" available to choose between.
If we restrict the feasible region to two points, and the two points are chosen adversarially by inspecting the AI's model, the traditional adversarial examples scenario can be seen as an unusual special case of this framework. I think this illustrates how artificial the "adversarial examples" scenario is from a Goodhart's Law perspective, because the universe is rarely perverse enough to deliberately restrict us to only two options, one of which has been specially chosen to deceive us. Although the broadness of action space has been cited as a reason why Goodhart's Law occurs, I suspect a broader action space actually makes Goodhart's Law easier to overcome if you use it well -- for example, by simply selecting the labeled input with the best output, as described above.
To simulate an AI making queries of its human operator, you could allow the system a limited number of queries to the function it is trying to maximize. However, we probably should not focus on this active learning scenario to the exclusion of all else -- to prevent showing the operator info hazards, we'll want a model for "what constitutes an info hazard", and that model can't easily be trained by active learning without showing the operator instances of info hazards! (Well, maybe if they were restricted to mild info hazards...)
Where does the function to maximize come from? I see two options: people functions and computer functions. For people functions, you could use taskers to evaluate the computer's output. There's already been work on generating cat pictures, which could be seen as an attempt to maximize the person function "how much does this image look like a cat". But ideas from this post could still be applied to such a problem. For example, to add distributional shift, you could find a weird cat picture, then fix a bunch of the weirder pixels on it as the "feasible region", leave the other pixels unassigned, and see if an AI system can recover a reasonable cat according to taskers. Can an AI generate a black cat after only having seen tawny cats? What other distributional constraints could be imposed?
For computer functions, you'd like to keep your method for generating the function secret, because otherwise contest participants can code their AI system so it has an inductive bias towards learning the kind of functions that you like to use. Also, for computer functions, you probably want to be realistic without being perverse. For example, you could have a parabolic function which has a point discontinuity at the peak, and that could fool an AI system that tries to fit a parabola on the data and guess the peak, but this sort of perversity seems a bit unlikely to show up in real-world scenarios (unless we think the function is likely to go "off distribution" in the region of its true maximum?) Finally, in the same way most random images are not cats, and most atom configurations are undesired by humans, most inputs to your computer function should probably get a negative score. But in the same way it's easier for people to specify what they want than what they don't want, you might want to imbalance your training dataset towards positive scores anyway.
To ensure high reliability, we'll want means by which these problems can be generated en masse, to see if we can get the probability of e.g. proposing an input that gets a negative output well below 0.1%. Luckily, for any given function/dataset pair, it's possible to generate a lot of problems just by challenging the AI on different feasible regions.
Anyway, I think work on this problem will be more applicable to real-world AI safety scenarios than adversarial examples, and it doesn't seem to me that it reduces quite as directly to "solve AGI" as adversarial examples work.
This is a belated follow-up to my Dualist Predict-O-Matic post, where I share some thoughts re: what could go wrong with the dualist Predict-O-Matic.

Belief in Superpredictors Could Lead to Self-Fulfilling Prophecies
In my previous post, I described a Predict-O-Matic which mostly models the world at a fuzzy resolution, and only "zooms in" to model some part of the world in greater resolution if it thinks knowing the details of that part of the world will improve its prediction. I considered two cases: the case where the Predict-O-Matic sees fit to model itself in high resolution, and the case where it doesn't, and just makes use of a fuzzier "outside view" model of itself.
What sort of outside view models of itself might it use? One possible model is: "I'm not sure how this thing works, but its predictions always seem to come true!"
If the Predict-O-Matic sometimes does forecasting in non-temporal order, it might first figure out what it thinks will happen, then use that to figure out what it thinks its internal fuzzy model of the Predict-O-Matic will predict.
And if it sometimes revisits aspects of its forecast to make them consistent with other aspects of its forecast, it might say: "Hey, if the Predict-O-Matic forecasts X, that will cause X to no longer happen". So it figures out what would actually happen if X gets forecasted. Call that X'. Suppose X != X'. Then the new forecast has the Predict-O-Matic predicting X and then X' happens. That can't be right, because outside view says the Predict-O-Matic's predictions always come true. So we'll have the Predict-O-Matic predicting X' in the forecast instead. But wait, if the Predict-O-Matic predicts X', then X'' will happen. Etc., etc. until a fixed point is found.
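That consistency-seeking loop is just fixed-point iteration. A toy sketch, where `world_given` is a made-up stand-in for how events respond to a published prediction:

```python
# Toy illustration of the "predict X, see that X' would actually happen,
# re-predict X', ..." loop. The reaction function is invented for
# illustration; nothing here models a real forecasting system.

def world_given(pred):
    # e.g. predicting high attendance discourages some people from coming
    return 100 - 0.5 * pred

def find_fixed_point(world, guess=0.0, tol=1e-9, max_iter=1000):
    """Iterate X -> X' -> X'' ... until the forecast is self-consistent."""
    for _ in range(max_iter):
        nxt = world(guess)
        if abs(nxt - guess) < tol:
            return nxt
        guess = nxt
    raise RuntimeError("no fixed point found")

fp = find_fixed_point(world_given)
print(fp)  # the self-fulfilling prediction: an X with world_given(X) == X
```

In this toy case the iteration converges because the reaction is damped (each step halves the error); a real Predict-O-Matic's "world reaction" need not be so well behaved, but when the loop does settle, what it settles on is exactly a self-fulfilling prophecy.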
Some commenters on my previous post talked about how making the Predict-O-Matic self-unaware could be helpful. Note that self-awareness doesn't actually help with this failure mode, if the Predict-O-Matic knows about (or forecasts the development of) anything which can be modeled using the outside view "I'm not sure how this thing works, but its predictions always seem to come true!" So the problem here is not self-awareness. It's belief in superpredictors, combined with a particular forecasting algorithm: we're updating our beliefs in a cyclic fashion, or hill-climbing our story of how the future will go until the story seems plausible, or something like that.
Before proposing a solution, it's often valuable to deepen your understanding of the problem.

Glitchy Predictor Simulation Could Step Towards Fixed Points
Let's go back to the case where the Predict-O-Matic sees fit to model itself in high resolution and we get infinite recursion. Exactly what's going to happen in that case?
I actually think the answer isn't quite obvious, because although the Predict-O-Matic has limited computational resources, its internal model of itself also has limited computational resources. And its internal model's internal model of itself has limited computational resources too. Etc.
Suppose Predict-O-Matic is implemented in a really naive way where it just crashes if it runs out of computational resources. If the toplevel Predict-O-Matic has accurate beliefs about its available compute, then we might see the toplevel Predict-O-Matic crash before any of the simulated Predict-O-Matics crash. Simulating something which has the same amount of compute you do can easily use up all your compute!
But suppose the Predict-O-Matic underestimates the amount of compute it has. Maybe there's some evidence in the environment which misleads it into thinking that it has less compute than it actually does. So it simulates a restricted-compute version of itself reasonably well. Maybe that restricted-compute version of itself is misled in the same way, and simulates a double-restricted-compute version of itself.
Maybe this all happens in a way so that the first Predict-O-Matic in the hierarchy to crash is near the bottom, not the top. What then?
Deep in the hierarchy, the Predict-O-Matic simulating the crashed Predict-O-Matic makes predictions about what happens in the world after the crash.
Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic.
Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic].
Then the Predict-O-Matic simulating that Predict-O-Matic makes a prediction about what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts [what happens in a world where the Predict-O-Matic predicts whatever would happen after a crashed Predict-O-Matic]].
Predicting world gets us world', predicting world' gets us world'', predicting world'' gets us world'''... Every layer in the hierarchy takes us one step closer to a fixed point.
Note that just like the previous section, this failure mode doesn't depend on self-awareness. It just depends on believing in something which believes it self-simulates.

Repeated Use Could Step Towards Fixed Points
Another way the Predict-O-Matic can step towards fixed points is through simple repeated use. Suppose each time after making a prediction, the Predict-O-Matic gets updated data about how the world is going. In particular, the Predict-O-Matic knows the most recent prediction it made and can forecast how humans will respond to that. Then when the humans ask it for a new prediction, it incorporates the fact of its previous prediction into its forecast and generates a new prediction. You can imagine a scenario where the operators keep asking the Predict-O-Matic the same question over and over again, getting a different answer every time, trying to figure out what's going wrong -- until finally the Predict-O-Matic begins to consistently give a particular answer -- a fixed point it has inadvertently discovered.
As Abram alluded to in one of his comments, the Predict-O-Matic might even foresee this entire process happening, and immediately forecast the fixed point corresponding to the end state. Though, if the forecast is detailed enough, we'll get to see this entire process happening within the forecast, which could allow us to avoid an unwanted outcome.

Solutions
An idea which could address some of these issues: Ask the Predict-O-Matic to make predictions conditional on us ignoring its predictions and not taking any action. Perhaps we'd also want to specify that any existing or future superpredictors will also be ignored in this hypothetical.
Then if we actually want to do something about the problems the Predict-O-Matic foresees, we can ask it to predict how the world will go conditional on us taking some particular action.

Prize
Sorry I was slower than planned on writing this follow-up and choosing a winner. I've decided to give Bunthut a $110 prize (including $10 interest for my slow follow-up). Thanks everyone for your insights.
If you want to understand Goodharting in advertising, this is a great article for that.
At the heart of the problems in online advertising are selection effects, which the article explains with this cute example:

Picture this. Luigi’s Pizzeria hires three teenagers to hand out coupons to passersby. After a few weeks of flyering, one of the three turns out to be a marketing genius. Customers keep showing up with coupons distributed by this particular kid. The other two can’t make any sense of it: how does he do it? When they ask him, he explains: "I stand in the waiting area of the pizzeria."

It’s plain to see that junior’s no marketing whiz. Pizzerias do not attract more customers by giving coupons to people already planning to order a quattro stagioni five minutes from now.
The article goes through an extended case study at eBay, where selection effects were causing particularly expensive results without anyone realizing it for years:

The experiment continued for another eight weeks. What was the effect of pulling the ads? Almost none. For every dollar eBay spent on search advertising, they lost roughly 63 cents, according to Tadelis’s calculations.

The experiment ended up showing that, for years, eBay had been spending millions of dollars on fruitless online advertising excess, and that the joke had been entirely on the company. To the marketing department everything had been going brilliantly. The high-paid consultants had believed that the campaigns that incurred the biggest losses were the most profitable: they saw brand keyword advertising not as a $20m expense, but a $245.6m return.
The problem, of course, is Goodharting, by trying to optimize for something that's easy to measure rather than what is actually cared about:

The benchmarks that advertising companies use – intended to measure the number of clicks, sales and downloads that occur after an ad is viewed – are fundamentally misleading. None of these benchmarks distinguish between the selection effect (clicks, purchases and downloads that are happening anyway) and the advertising effect (clicks, purchases and downloads that would not have happened without ads).
And unsurprisingly, there's an alignment problem hidden in there:

It might sound crazy, but companies are not equipped to assess whether their ad spending actually makes money. It is in the best interest of a firm like eBay to know whether its campaigns are profitable, but not so for eBay’s marketing department. Its own interest is in securing the largest possible budget, which is much easier if you can demonstrate that what you do actually works. Within the marketing department, TV, print and digital compete with each other to show who’s more important, a dynamic that hardly promotes honest reporting. The fact that management often has no idea how to interpret the numbers is not helpful either. The highest numbers win.
To this I'll just add that this problem is somewhat solvable, but it's tricky. I previously worked at a company where our entire business model revolved around calculating lift in online advertising spend by matching up online ad activity with offline purchase data, and a lot of that involved having a large and reliable control group against which to calculate lift. The bad news, as we discovered, was that the data was often statistically underpowered: it could only distinguish between negative, neutral, and positive lift, and could only detect non-neutral lift in cases where the evidence was strong enough that you could have eyeballed it anyway. And the worse news was that we had to tell people their ads were not working or, worse yet, were lifting the performance of competitors' products.
Some marketers' reactions to this were pretty much as the authors capture it:

Leaning on the table, hands folded, he gazed at his hosts and told them: "You’re fucking with the magic."
How to understand non-technical proposals
This post grew out of conversations at EA Hotel, Blackpool about how to think about the various proposals for ‘solving’ AI Alignment like CEV, iterated amplification and distillation or ambitious value learning. Many of these proposals seemed to me to combine technical and ethical claims, or to differ in the questions they were trying to answer in confusing ways. In this post I try to come up with a systematic way of understanding the goals of different high-level AI safety proposals, based on their answers to the Value Definition Problem. Framing this problem leads to comparing various proposals by their level of Normative Directness, as defined by Bostrom in Superintelligence. I would like to thank Linda Linsefors and Grue_Slinky for their help refining these ideas, and EA Hotel for giving us the chance to discuss them.

Defining the VDP
In Superintelligence (2014), Chapter 14, Bostrom discusses the question of ‘what we should want a Superintelligence to want’, defining a problem:

“Supposing that we could install any arbitrary value into our AI, what should that value be?”
The Value Definition Problem
By including the clause ‘supposing that we could install any arbitrary value into our AI’, Bostrom is assuming we have solved the full Value Loading Problem and can be confident in getting an AGI to pursue any value we like.
Bostrom’s definition of this ‘deciding which values to load’ problem is echoed in other writing on this topic. One proposed answer to this question, the Coherent Extrapolated Volition (CEV), is described by Yudkowsky as ‘a proposal about what a sufficiently advanced self-directed AGI should be built to want/target/decide/do’.
With the caveat that this is something you should do ‘with an extremely advanced AGI, if you're extremely confident of your ability to align it on complicated targets’.
However, if we only accept the above as problems to be solved, we are being problematically vague. Bostrom explains why in Chapter 14. If we really can ‘install any arbitrary value into our AI’, we can simply require the AI to ‘do what I mean’ or ‘be nice’ and leave it at that. If an AGI successfully did “want/target/decide to do what I meant”, then we would have successful value alignment!
Answers like this are not even wrong - they shunt all of the difficult work into the question of solving the Value Loading Problem, i.e. in precisely specifying ‘do what I mean’ or ‘be nice’.
In order to address these philosophical problems in a way that is still rooted in technical considerations, I propose that instead of simply asking what an AGI should do if we could install any arbitrary value, we should seek to solve the Value Definition Problem:

“Given that we are trying to solve the Intent Alignment problem for our AI, what should we aim to get our AI to want/target/decide/do, to have the best chance of a positive outcome?”
In other words, instead of the unconditional, ‘what are human values’ or ‘what should the AI be built to want to do’, it is the conditional, ‘What should we be trying to get the AI to do, to have the best chance of a positive outcome’.
This definition of the VDP excludes excessively vague answers like ‘do what I mean’, because an AI with successful intent alignment is not guaranteed to be capable enough to successfully determine ‘what we mean’ under all circumstances. In extreme cases, like the Value Definition ‘do what I mean’, "what we mean" is undefined because we don't know what we mean, so there is no answer that could be found.
If we have solved the VDP, then an Intent-Aligned AI that tries to act according to the Value Definition should actually succeed in doing so, and in acting according to this Value Definition it would produce an outcome beneficial to us. Even if a successfully aligned AGI is nice, does what I mean and/or acts according to Humanity's CEV, these were only good answers to the VDP if adopting them was actually useful or informative in aligning this AGI.
What counts as a good solution to the VDP depends on our solution to intent alignment and the AGI’s capabilities, because what we should be wanting the AI to do will depend on what the AGI can discover about what we want.
This definition of the VDP does not precisely cleave the technical from the philosophical/ethical issues in solving AI value alignment, but I believe it is well-defined enough to be worth considering. It has the advantage of bringing the ethical and technical AI Safety considerations closer together.
A good solution to the VDP would still be an informal definition of value: what we want the AI to pursue. However, it should give us at least some direction about technical design decisions, since we need to ensure that the Intent-Aligned AI has the capabilities necessary to learn the given definition of value, and that the given definition of value does not make alignment very hard or impossible.

Criteria for judging Value Definitions
1. How hard would Intent-Aligning be: How hard would it be to ensure the AI ‘tries to do the right thing’, where ‘right’ is given by the Value Definition. In particular, does adopting this definition of value make intent-alignment easier?
2. How great would our AGI capabilities need to be: How hard would it be for the AGI to ‘[figure] out which thing is right’, where ‘right’ is given by the Value Definition. In particular, does adopting this definition of value help us to understand what capabilities or architecture the AI needs?
3. How good would the outcome be: If the AGI is successfully pursuing our Value Definition, how good would the outcome be?
3 is what Bostrom focuses on in Chapter 14 of Superintelligence, as (with the exception of dismissing useless answers to the VDP like ‘be nice’ or ‘do what I mean’) he does not consider whether different value definitions would influence the difficulty of Intent Alignment or the required AI Capabilities. Similarly, Yudkowsky assumes we are ‘extremely confident’ of our ability to get the AGI to pursue an arbitrarily complicated goal. 3 is a normative ethical question, whereas the first two are (poorly understood and defined) technical questions.
Some values are easier to specify and align to than others, so even when discussing pure value definitions, we should keep the technical challenges at the back of our mind. In other words, while 3 is the major consideration used for judging value definitions, 1 and 2 must also be considered. In particular, if our value definition is so vague that it makes intent alignment impossible, or requires capabilities that seem magical, such as ‘do what I mean’ or ‘be nice’, we do not have a useful value definition.

Human Values and the VDP
While 1 and 2 are clearly difficult questions to answer for any plausible value definition, 3 seems almost redundant. It might seem as though we should expect at least a reasonably good outcome if we were to ‘succeed’ with any definition that is intended to extract the values of humans, because by definition success would result in our AGI having the values of humans.
Stuart Armstrong argues that to properly address 3 we need ‘a definition - a theory - of what human values actually are’. This is necessary because different interpretations of our values tend to diverge when we are confronted by extreme circumstances and because in some cases it is not clear what our ‘real preferences’ actually are.

An AI could remove us from typical situations and put us into extreme situations - at least "extreme" from the perspective of the everyday world where we forged the intuitions that those methods of extracting values roughly match up.

Not only do we expect this, but we desire this: a world without absolute poverty, for example, is the kind of world we would want the AI to move us into, if it could. In those extreme and unprecedented situations, we could end up with revealed preferences pointing one way, stated preferences another, while regret and CEV point in different directions entirely.
3 amounts to a demand to reach at least some degree of clarity on (if not solve) normative ethics and metaethics - we have to understand what human values are in order to choose between or develop a method for pursuing them.

Indirect vs Direct Normativity
Bostrom argues that our dominant consideration in judging between different value definitions should be the ‘principle of epistemic deference’:

A future superintelligence occupies an epistemically superior vantage point: its beliefs are (probably, on most topics) more likely than ours to be true. We should therefore defer to the superintelligence’s opinion whenever feasible.
In other words, in describing the 'values' we want our superintelligence to have, we want to hand over as much work to the superintelligence as possible.

This takes us to indirect normativity. The obvious reason for building a super-intelligence is so that we can offload to it the instrumental reasoning required to find effective ways of realizing a given value. Indirect normativity would enable us also to offload to the superintelligence some of the reasoning needed to select the value that is to be realized.
The key issue here is given by the word ‘some’. How much of the reasoning should we offload to the Superintelligence? The principle of epistemic deference answers ‘as much as possible’.
What considerations push against the principle of epistemic deference? One consideration is the metaethical views we think are plausible. In Wei Dai’s Six Plausible Meta-Ethical Alternatives, two of the more commonly held views are that ‘intelligent beings have a part of their mind that can discover moral facts and find them motivating, but those parts don't have full control over their actions’ and that ‘there are facts about how to translate non-preferences (e.g., emotions, drives, fuzzy moral intuitions, circular preferences, non-consequentialist values, etc.) into preferences’.
Either of these alternatives suggest that too much epistemic deference is not valuable - if, for example, there are facts about what everyone should value but a mind must be structured in a very specific way to discover and be motivated by them, we might want to place restrictions on what the Superintelligence values to make sure we discover them. In the extreme case, if a certain moral theory is known to be correct, we could avoid having to trust the Superintelligence’s own judgment by just getting it to obey that theory. This extreme case could never practically arise, since we could never achieve that level of confidence in a particular moral theory. Bostrom says it is ‘foolhardy’ to try and do any moral philosophy work that could be left to the AGI, but as Armstrong says, it will be necessary to do some work to understand what human values actually are - how much work?

Classifying Value Definitions
The Scale of Directness
Issa Rice recently provided a list of ‘[options] to figure out the human user or users’ actual preferences’, or to determine definitions of value. These ‘options’, if successfully implemented, would all result in the AI being aligned onto a particular value definition.

We want good outcomes from AI. To get this, we probably want to figure out the human user's or users' "actual preferences" at some point. There are several options for this.
Following Bostrom’s notion of ‘Direct and Indirect Normativity’ we can classify these options by how direct their value definitions are - how much work they would hand off to the superintelligence vs how much work the definition itself does in defining value.
Here I list some representative definitions from most to least normatively direct.
Directly specify a value function (or rigid rules for acquiring utilities), assuming a fixed normative ethical theory.
It is essentially impossible to directly specify a correct reward function for a sufficiently complex task. Already, we use indirect methods to align an RL agent on a complex task (see e.g. Christiano (2017)). For complex, implicitly defined goals we are always going to need to learn some kind of reward/utility function predictor.
Learn a measure of human flourishing and aggregate it for all existing humans, given a fixed normative (consequentialist) ethical theory that tells us how to aggregate the measure fairly.
E.g. have the AI learn a model of the current individual preferences of all living humans, and then maximise that using total impersonal preference utilitarianism.
This requires a very high degree of confidence that we have found the correct moral theory, including resolving all paradoxes in population ethics like the Repugnant conclusion.
Taken from IDA. Attempt to ‘distil out’ the relevant preferences of a human or group of humans, by imitation learning followed by capability amplification, thus only preserving those preferences that survive amplification.
Repeat this process until we have a superintelligent agent that has the distilled preferences of a human. This subset of the original human’s preferences, suitably amplified, defines value.
Note that specific choices about how the deliberation and amplification process play out will embody different value definitions. As two examples, the IDA could model either the full and complete preferences of the Human using future Inverse Reinforcement Learning methods, or it could model the likely instructions of a ‘human-in-the-loop’ offering low-resolution feedback - these could result in quite different outcomes.
Both Christiano’s formulation of Indirect Normativity and the CEV define value as the endpoint of a value idealization and extrapolation process with as many free parameters as possible.

Predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge.
Have the AI determine the correct normative ethical theory, whatever that means, and then act according to that.
'Do What I Mean'
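The learned reward-predictor idea raised under the first, most direct definition above can be made concrete with a toy preference-learning loop. This is my own minimal construction in the spirit of learning rewards from pairwise human comparisons (as in Christiano (2017)), not that paper's actual method; the linear reward model and the feature data are invented for illustration:

```python
# Toy sketch: instead of hand-coding a reward function, fit one from pairwise
# preferences over trajectories using the Bradley-Terry model,
#   P(A preferred to B) = sigmoid(r(A) - r(B)).
from math import exp
import random

def sigmoid(x):
    return 1 / (1 + exp(-x))

def fit_reward(comparisons, n_features, lr=0.5, steps=2000):
    """comparisons: list of (features_preferred, features_rejected) pairs."""
    w = [0.0] * n_features
    for _ in range(steps):
        fa, fb = random.choice(comparisons)
        # The reward is linear in features here; real systems use a neural net.
        diff = sum(wi * (a - b) for wi, a, b in zip(w, fa, fb))
        grad = 1 - sigmoid(diff)  # gradient of log P(A preferred to B)
        for i in range(n_features):
            w[i] += lr * grad * (fa[i] - fb[i])
    return w

random.seed(0)
# Invented data: the human secretly prefers trajectories with more of feature 0
# and less of feature 1; the learner only ever sees the comparisons.
data = [((1.0, 0.2), (0.3, 0.9)),
        ((0.8, 0.1), (0.2, 0.7)),
        ((0.9, 0.4), (0.5, 0.8))]
w = fit_reward(data, n_features=2)
print(w)  # learned weights: positive on feature 0, negative on feature 1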
I have tried to place these different definitions of value in order from the most to least normatively direct. In the most direct case, we define the utility function ourselves. Less direct than that is defining a rigid normative framework within which the AGI learns our preferences. Then, we could consider also letting the AGI decide which normative framework to use.
Much less direct, we come to deliberation-based methods, or methods which define value as the endpoint of a specific procedure. Christiano’s Iterated Amplification and Distillation is supposed to preserve a particular subset of human values (those that survive a sequence of imitation and capability amplification). This is more direct than CEV because some details about the distillation procedure are given. Less direct still is Yudkowsky’s CEV, because CEV merely places its value as the endpoint of some sufficiently effective idealisation and convergence procedure, which the AGI is supposed to predict the result of, somehow. Beyond CEV, we come to ‘methods’ that are effectively meaningless.

Considerations
Here I briefly summarise the considerations that push us to accept more or less normatively direct theories. Epistemic Deference and Conservatism were taken from Bostrom (2014), while Well-definedness and Divergence were taken from Armstrong.
Epistemic Deference: Less direct value definitions defer more reasoning to the superintelligence, so assuming the superintelligence is intent-aligned and capable, there are fewer opportunities for mistakes by human programmers. Epistemic Deference effectively rules out direct specification of values, on the grounds that we are effectively guaranteed to make a mistake resulting in misalignment.
Well-definedness: Less direct value definitions require greater capabilities to implement, and are also less well-defined in the research directions they suggest for how to construct explicit procedures for capturing the definition. Direct utility specification is something we can do today, while CEV is currently under-defined.
Armstrong argues that our value definition must eventually contain explicit criteria for what ‘human values’ are, rather than the maximal normative indirectness of handing over judgments about what values are to the AGI - ‘The correct solution is not to assess the rationality of human judgements of methods of extracting human values. The correct solution is to come up with a better theoretical definition of what human values are.’
Conservatism: More direct theories will result in more control over the future by the programmers. This could be either good or bad depending on your normative ethical views and political considerations at the time the AI is developed.
For example, Bostrom states that in a scenario where the morally best outcome includes reordering all matter to some optimal state, we might want to turn the rest of the universe over to maximising moral goodness but leave an exception for Earth. This would involve more direct specification.
Divergence: If you are a strong externalist realist (believing that moral truth exists but might not be easily found or motivating) then you will want to take direct steps to mandate this. If the methods that are designed to extract human preferences diverge strongly in what they mandate, we need a principled procedure for choosing between them, based on what actually is morally valuable. More normatively direct methods provide a chance to make these moral judgement calls.

Summary
I have provided two main concepts which I think are useful for judging nontechnical AI Safety proposals: the Value Definition Problem, and the notion of the Scale of Normative Directness together with the considerations that affect positioning on it. Both are reframings of previous work, mainly by Bostrom and Armstrong.
I also note that, on the Scale of Directness, there is quite a large gap between a very indirect method like CEV, and the extremely direct methods like ambitious value learning.
‘Ambitious Value Learning’ defines value using a specific, chosen-in-advance consequentialist normative ethical theory (which tells us how to aggregate and weight different interests) that we then use an AI to specify in more detail, using observations of humans’ revealed preferences.
Christiano says of methods like CEV, which aim to extrapolate what I ‘really want’ far beyond my current preferences: ‘most practitioners don’t think of this problem even as a long-term research goal — it’s a qualitatively different project without direct relevance to the kinds of problems they want to solve’. This is effectively a statement of the Well-definedness consideration applied to value definitions: our long-term ‘coherent’ or ‘true’ preferences currently aren’t well enough understood to guide research, so we need to restrict ourselves to more direct normativity - extracting the actual preferences of existing humans.
After CEV, the next most ‘direct’ method, Distilled Human preferences (the definition of value used in Christiano’s IDA), is still far less direct than ambitious value learning, eschewing all assumptions about the content of our values and placing only some restrictions on their form. Since not all of our preferences will survive the amplification and distillation processes, the hope is that the morally relevant ones will - even though as yet we do not have a good understanding of how durable our preferences are and which ones correspond to specific human values.
This vast gap in directness suggests a large range of unconsidered value definitions that attempt to ‘defer to the Superintelligence’s opinion’ not whenever possible but only sometimes.
Armstrong has already claimed we must do much more work in defining what we mean by human values than the more indirect methods like IDA/CEV suggest, when he argued, ‘The correct solution is not to assess the rationality of human judgements of methods of extracting human values. The correct solution is to come up with a better theoretical definition of what human values are.’
I believe that we should investigate ways to incorporate our high-level judgements about which preferences correspond to ‘genuine human values’ into indirect methods like IDA, making the indirect methods more direct by rigidifying parts of the deliberation or idealization procedure - but that is for a future post.
NB: Originally published on Map and Territory on Medium. This is an old post originally published on 2016-10-04. It was never previously cross-posted or linked on LessWrong, so I'm adding it now for posterity. It's old enough that I can no longer confidently endorse it, and I won't bother trying to defend it if you find something wrong, but it might still be interesting.
A lot of folks these days talk about “flow” to mean some kind of mystical state where they experience something like automatic decision making where they get out of their own way and just do. Less mysteriously, other folks use “flow” to describe periods of focused attention. More mysteriously, some folks talk about something similar but as the Daoist idea of action through non-action. Let’s see if we can make some sense of what’s going on here.
As far as I can tell we talk about flow because Daoist philosophy explains virtuous behavior (de) as having a mind like water. Chapter 78 of the Daodejing reads:

Nothing in the world
is as soft and yielding as water.
Yet for dissolving the hard and inflexible,
nothing can surpass it.

The soft overcomes the hard;
the gentle overcomes the rigid.
Everyone knows this is true,
but few can put it into practice.

Therefore the Master remains
serene in the midst of sorrow.
Evil cannot enter his heart.
Because he has given up helping,
he is people’s greatest help.

True words seem paradoxical.
And Chapter 15 reads:

The ancient Masters were profound and subtle.
Their wisdom was unfathomable.
There is no way to describe it;
all we can describe is their appearance.

They were careful
as someone crossing an iced-over stream.
Alert as a warrior in enemy territory.
Courteous as a guest.
Fluid as melting ice.
Shapable as a block of wood.
Receptive as a valley.
Clear as a glass of water.

Do you have the patience to wait
till your mud settles and the water is clear?
Can you remain unmoving
till the right action arises by itself?

The Master doesn’t seek fulfillment.
Not seeking, not expecting,
she is present, and can welcome all things.
Though this is not necessarily the direct lineage of the modern usage, this gives a sense of what metaphors of being liquid-like are trying to imply. In flow a person acts naturally, takes the most direct path, and problems yield before them. It’s reported to feel from the inside like achieving without trying, and also like a full integration of knowledge into action that does what’s intended. This integration is where I think we can get a grasp on what is happening in flow.
I previously explored a 2-dimensional theory of psyche along the dimensions of detail and pattern. In brief, we can model the brain (in part because it now appears it may physically work this way) as operating by giving high or low weight to details and high or low weight to patterns. When details are high and patterns are low, it’s what we might term near construal mode, S1, or the id. When details are low and patterns are high, it’s what we might term far construal mode, S2, or the superego. And when details and patterns are high, it’s what we might term an integrated mode or the ego.
This integration of details and patterns allows a balanced approach to updating and acting on information. Too much focus on detail and there’s an overreaction to specifics that ignores known patterns. Too much focus on pattern and there’s a failure to account for specific circumstances. But it is also not enough to integrate details and patterns: they must be each weighted appropriately to result in confident action.
We know that simply integrating the two is not enough because we experience cognitive dissonance all the time. Admittedly there is cognitive dissonance among competing patterns and among contradictory details, but what I’m focused on here is the disagreement of patterns and details, like in this apocryphal story of a physics class:

The period bell rings and the students shuffle into another day of high school physics. The teacher is standing over by the radiator balancing a 1-inch thick metal sheet against it. She asks a student to come up and observe the metal sheet. He looks at it, and at the teacher’s invitation, touches each side of the sheet. To his surprise, the side away from the radiator is hotter than the side toward the radiator! He returns to his seat and the teacher asks the students what’s going on.

“Air currents convecting heat to the far side”, says one student.

“Nope,” says the teacher. “There’s not enough air movement over here to cause that.”

“The metal sheet is made of two metals, a different one on each side, and the far one absorbs heat faster than the one near the radiator,” says another.

“Interesting idea,” says the teacher, “but this metal sheet is definitely all made of the same alloy.”

“MAGNETS!!!” cries a third.

The other students and teacher ignore this one.

After the students have exhausted all their theories and the teacher has shot them all down, they give up. “Tell us, sensei, what is going on here?”

“Simple,” said the teacher, resplendent in the glow of the afternoon sun shining through the window, “I turned the sheet around just before any of you came into the classroom.”
Of course, most real-world situations are not quite so intentionally dubious. Instead we have theories about how our cars work, why our friends say what they say, and what our cats are thinking and then they fail to predict or explain our car problems, our relationship troubles, or the inscrutable actions of our feline companions. We are constantly faced with inconsistencies between pattern and detail and are trying to fix them up. So just because the ego is in control, because we are integrating S1 and S2, near and far, yin and yang, we may still find we aren’t flowing.
So if an integrated thinking mode that combines details and patterns is to explain flow, we’re going to need more than integration alone. Going full ego is not enough. If there is anything here it is probably in how the details and patterns are integrated. This, fortunately and unfortunately, is somewhat well studied but poorly understood or applied, and goes by the name of rationality.
Rationality as a procedure pops up in multiple fields: game theory, economics, psychology, sociology, probability theory, and artificial intelligence to name a few. We can broadly think of it as the optimal way of integrating information that satisfies an arbitrary value function. It’s not exactly so-called “cold logic”, though that is a degenerate case, and fully encompasses anything that most directly approaches “winning” for whatever winning means to you. It looks like Bayes’s Theorem in its pure form, and learning to apply it to your thinking is one of the goals of a classical education.
By applying rationality, i.e. optimal information pumping, to the integration of details and patterns, we describe something that sounds a lot like flow. Details come in (evidence), they are weighted and balanced against patterns (priors), and combined to achieve an updated state that can be read to point to a clear next action. Even under uncertainty flow is possible, with the best possible option floating to the top. The next thought or action comes automatically because it is the best currently available way forward in the present integration.
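The "details as evidence, patterns as priors" picture can be rendered as a small Bayes update. This is my framing of the paragraph above rather than anything the post formally specifies, and the hypotheses and probabilities are invented for illustration:

```python
# Toy sketch: a Bayes update over competing explanations, where patterns supply
# the priors, details supply the likelihoods, and the posterior points at the
# best available next action.
def bayes_update(priors, likelihoods):
    """priors: {hypothesis: P(h)}; likelihoods: {hypothesis: P(evidence | h)}."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Pattern: cars usually fail to start because of the battery.
priors = {"dead battery": 0.7, "bad starter": 0.2, "empty tank": 0.1}
# Detail: the headlights still shine brightly, which is unlikely on a dead battery.
likelihoods = {"dead battery": 0.05, "bad starter": 0.9, "empty tank": 0.6}

posterior = bayes_update(priors, likelihoods)
best = max(posterior, key=posterior.get)
print(best)  # prints "bad starter": the detail outweighs the prior
```

Overweight the prior and you ignore the headlights; overweight the detail and you forget that batteries are the usual culprit. The balanced update is the integration the paragraph describes.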
But flow, to what extent we can find ourselves in it, is easily broken, and our model points to the ways in which it breaks. On one hand, details can be overvalued so that patterns are not sufficiently heeded. We get lost in the moment, forget our better judgement, and act without thinking. On the other, patterns can be overvalued so much so that the details don’t change our minds enough. We get in our own way, give in to fear, and overthink. Rationality, and thereby flow, is easily broken by being out of balance between details and patterns, and it’s only through skilled practice that we can keep our minds like water.
So it seems somewhat useful to think of flow as rational detail and pattern integration. But this explanation seems to fail to square with the way most folks talk about what flow feels like from the inside. It’s described as being automatic, doing without trying, and acting through nonaction. People say it feels peaceful to be in flow, time seems to fly by, and the running self-narrative diminishes or stops. This sounds a lot more like a cessation of activity than an optimization of it.
But what is optimization if not a cessation of chaos? Normally the mind feels full of competing systems trying to pull us in different directions. Some people report experiencing their own minds as a conversation between multiple agents, and not just metaphorically: they experience their thoughts as multiple characters in communication. It seems not a coincidence that we describe dynamic systems as quiet and stable when they are working as expected. So it’s perhaps not surprising we should say flow feels like things in our minds are stopping because in a sense they are: they stop causing problems and work together.
So there we have a theory of flow that is based on physiological underpinnings that, while still not proven, provide a reasonable explanation of basic processes that we can use to construct simple, seemingly useful models of more complex mental processes. I’m interested in exploring possible weaknesses I’ve missed in this theory, so comment with your objections and I’ll see if they can be addressed.
NB: Originally published on Map and Territory on Medium. This is an old post originally published on 2016-09-18. It was never previously cross-posted or linked on LessWrong, so I'm adding it now for posterity. It's old enough that I can no longer confidently endorse it, and I won't bother trying to defend it if you find something wrong, but it might still be interesting.
Over the last couple months, due to reading Daoist philosophical texts, I’ve come to deeply internalize something I’ve known for a long time: morality doesn’t exist “out there” in reality and is instead a construct of our preferences and the dialectic between different people’s preferences.
If you stumbled upon this and didn’t realize morality wasn’t essential, well, um, I’m not going to try to convince you of that. Probably a not-terrible reading recommendation is the LessWrong series on metaethics.
I started down the path to giving up an internal sense of essential morality when meditating on the Daoist position that there is fundamentally no differentiation. For example, chapter 41 of the Daodejing reads in part:

Thus it is said:
The path into the light seems dark,
the path forward seems to go back,
the direct path seems long,
true power seems weak,
true purity seems tarnished,
true steadfastness seems changeable,
true clarity seems obscure,
the greatest art seems unsophisticated,
the greatest love seems indifferent,
the greatest wisdom seems childish.
And in chapter 20 we find:

Stop thinking, and end your problems.
What difference between yes and no?
What difference between success and failure?
Must you value what others value,
avoid what others avoid?
And in both the texts of Zhuangzi and Liezi we are given multiple stories where beauty and good acts do not lead to happiness, and ugliness and wickedness do not hinder virtue. On the surface we are given contradictions, but by looking deeper the contradictions dissolve if we perceive that the dichotomy is false.
Even the word for virtue is itself an interesting case. The Chinese word used, 德 or de, carries a moralistic sense in ordinary use, just as in English, but de also means step, and it shares with virtue's Latin root a sense of strength or capacity. So even here, where it looks as though we are being given moral advice, it only seems that way if we take it to be that: take away the perception of morality and we are instead given possible steps along the path.
With this in mind, I set out to experiment with removing my use of moralistic language. We tend to say things are good or bad when really what we mean is that we like them or we don’t. And if I want to find out if morality really does not exist as an essential property of the universe, it’s worthwhile to try to take it out of my language and see if it comes up missing.
So I have tried to do this. I try to no longer say things are good or bad, and instead say that I like or dislike things, or that I want more or less of them. And aside from having a hard time breaking the habit of using common phrases that happen to contain "good" or "bad", like saying "this tastes good" to mean "I like how this tastes", it's proven very straightforward, and it has thrown into contrast those times when I was projecting my own preferences onto the universe.
This projection happens through the turn of phrase. If I think what my friend is wearing is ugly and I say to them "that looks bad", I'm implicitly suggesting their appearance goes against an external measure of style. But if I say "I don't like what you're wearing", I have to be the owner of the preference, and I know it's not living out in the universe apart from me. And if we look deeper, there's no sense in which something can "look good" if there is no observer to assess the quality, so it seems that through language we casually mistake preferences for essences.
And so I have now more deeply internalized the existential, rather than essential, nature of morality that I have long known intellectually.
NB: Originally published on Map and Territory on Medium. This is an old post originally published on 2016-09-14. It was never previously cross-posted or linked on LessWrong, so I'm adding it now for posterity. It's old enough that I can no longer confidently endorse it, and I won't bother trying to defend it if you find something wrong, but it might still be interesting.
In a recent post Scott Alexander gives a review of some recent results in neurobiology that suggest a powerful, unifying set of mechanisms for how information is integrated in the brain. I recommend you read his article and the original research if you can, but I’ll summarize it briefly.
There are various chemicals regulating activity in the brain. There is now evidence that these chemicals act to coordinate an information pump in the brain. Change the chemicals and you change the parameters of the information pump. Specifically, the information pump in play seems to fit the Bayesian model, in that certain chemicals regulate the presentation of prior evidence, others new evidence, and yet others confidence in that evidence.
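The Bayesian framing here can be sketched with a toy precision-weighted update, where parameters standing in for the regulating "chemicals" scale confidence in prior evidence versus new evidence. This is my own minimal illustration, not anything from the research Scott summarizes:

```python
def gaussian_update(prior_mean, prior_precision, evidence, evidence_precision):
    """Precision-weighted Bayesian update for a Gaussian belief.

    Higher prior_precision weights stored patterns (priors) more heavily;
    higher evidence_precision weights incoming details (new data) more.
    """
    total_precision = prior_precision + evidence_precision
    posterior_mean = (prior_precision * prior_mean +
                      evidence_precision * evidence) / total_precision
    return posterior_mean, total_precision

# Strong prior, weak evidence: the belief barely moves.
mean_a, _ = gaussian_update(0.0, 10.0, 5.0, 1.0)   # -> ~0.45
# Weak prior, strong evidence: the belief tracks the new data.
mean_b, _ = gaussian_update(0.0, 1.0, 5.0, 10.0)   # -> ~4.55
```

Turning the same dials in opposite directions produces very different weightings of pattern against detail, which is the sense in which changing the chemistry changes the pump's parameters.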
What I find compelling is that the model described provides a plausible mechanism by which the theory of a 2-part psyche might work. There are several two-part theories of psyche, that is to say theories of how mental processes are organized. My preferred one is near/far construal theory, but there is also the S1/S2 distinction, the fast/slow distinction, in Chinese philosophy yin and yang, and even the hot/cold blood model from medieval European thought. Each of these acts as a way of classifying thoughts and behaviors along a spectrum between two extremes.
The interesting thing about the 2-part psyche theories, and why I prefer the near/far distinction, is that they all seem to operate along the same dimension. Near/far uses the metaphor of distance (because it happens we use similar reasoning patterns when working with things that are physically near versus far) to differentiate between things that are heavy on details and light on patterns versus those that are heavy on patterns and light on details. S1/S2 uses basically the same dimension, as do fast/slow, yin/yang, and hot/cold: stuff with lots of details is near, fast, yin, hot, and part of S1, while stuff with fewer details and stronger patterns is far, slow, yang, cold, and part of S2. This suggests that they are all pointing at the same sort of thing, though in slightly different ways.
And, as it happens, this is basically the same dimension along which chemicals in the brain seem to affect cognition, balancing how much to weigh new evidence (details) against prior evidence (patterns). So it seems that we now have a plausible biological basis for the two-part psyche we've reasoned exists and find useful, whereas before it was just a pattern that worked without strong evidence of a mechanism.

The 3-Part Psyche
So that takes care of the 2-part psyche, but what about the arguably more popular 3-part psyche model? The 3-part model dates back at least to Aristotle in the West and the gunas in India, was revitalized by Freud, and has bloomed into various descendent theories in modern psychology such as Internal Family Systems. Each version has different boundaries and explanations, so for simplicity I'll use Freud's well-known terminology.
Briefly, these theories all see roughly the same three parts in the psyche: the id, the ego, and the superego. The id is the part that acts and responds “on instinct”, the ego is the part that is “rational” and integrates the other two, and the superego is the part that operates on “moral” grounds. These parts are viewed as working in relation to one another, frequently in opposition, with what someone does and thinks arising from their interaction.
These theories connect with the 2-part psyche model in that near corresponds to aspects of the id and ego while far corresponds to aspects of ego and superego. When we see this kind of overlapping correspondence, it suggests both models are carving up the same underlying reality in different ways. We can use this to try to pick apart what's really going on.
The commonalities of id and ego seem to be an inclusion of details, same as for near. What’s different is that ego has a concept of integration with patterns whereas near and id do not.
The commonalities of ego and superego are just the opposite: inclusion of patterns. Same goes for far. The differences are that ego includes details while far and superego do not.
I propose from this that if we separate out details and patterns onto separate dimensions we can get a 2-dimensional model that captures both the 2-part and 3-part psyche models and even suggests a 4-part psyche model.
Now the 2-part model corresponds to the line between the more details, less patterns corner of the space and the less details, more patterns corner with near and far as their division down the middle, respectively. This is drawn as a dotted line in the above chart.
The 3-part model corresponds to 3 of the 4 quadrants formed around the middle of the 2-dimensional space: id is more details with less patterns, ego is more details with more patterns, and superego is more patterns with less details. This also leaves a suspiciously empty 4th quadrant to be part of the psyche with less details and less patterns.
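The quadrant assignments above can be made concrete with a small lookup over the two proposed axes. The labels and the boolean simplification of "more/less" are my own sketch of the model:

```python
# Quadrants of the proposed 2-D psyche model: each axis records
# whether details or patterns are strongly weighted.
PSYCHE_QUADRANTS = {
    # (strong_details, strong_patterns): part of the psyche
    (True,  False): "id",        # details heavy, patterns light
    (True,  True):  "ego",       # both heavy: integration
    (False, True):  "superego",  # patterns heavy, details light
    (False, False): "4th quadrant (unnamed)",  # both light
}

def classify(strong_details, strong_patterns):
    """Map a point in the 2-D space to its quadrant's psyche part."""
    return PSYCHE_QUADRANTS[(strong_details, strong_patterns)]
```

The 2-part near/far split then falls out as the diagonal: near is the details-heavy side, far the patterns-heavy side, with the ego quadrant straddling the divide.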
And, to make things even better, this fits with the biological model Scott summarizes: there are chemicals regulating how much to favor details and how much to favor patterns. Normal thinking and behavior fall in the ego quadrant or at least near the center while mental disorders appear when the chemical regulation of detail and pattern strength are out of their typical balance.
So for all their faults in the past, maybe our theories of the psyche have been pointing us in the right direction all along, just in a confused way.

The 4th Quadrant
This still leaves us with the fourth quadrant that’s gone unaddressed. Here I’ll offer some brief speculation on what it might be before wrapping up.
Since this theory predicts something that should feel qualitatively different from the inside, the way id, ego, and superego do, when both details and patterns are weak, we should go looking for mental states that don't fit well into the existing 2-part or 3-part models. One immediately comes to mind: dreams.
Dreams are, among other things, a time when you have low sensory information and seem to have trouble completing patterns. We talk about "dream logic" because in dreams you jump between patterns fitted to limited data, in ways that often violate the causal narrative we expect to find in our thinking. And dreams seem to incorporate memories, often recent and important memories, in place of outside sensory data. This is by no means a slam dunk, but it does weakly fit the evidence.
Which is the point on which I wish to end: all of this is based on fairly weak evidence. Although the 2-part and 3-part psyche models are fairly robust, they have always had problems because they readily fall apart upon rigorous inspection and have not had a clear biological basis so are subject to introspection bias. Additionally, my new evidence from Scott is an interpretation of an interpretation of recent findings and stretches well beyond what we can safely conclude.
At the same time, these ideas are exciting and, I think, worth exploring because they give us a potential model for better understanding human thoughts and behavior. I fully expect this 2-dimensional, 4-part psyche of details and patterns to make wrong predictions, but I’m hopeful it makes more right predictions and fewer wrong predictions than either 2-part or 3-part psyche models do. I look forward to testing them and seeing what more we can learn about our messy selves.
NB: Originally published on Map and Territory on Medium. This is an old post originally published on 2016-09-10. It was never previously cross-posted or linked on LessWrong, so I'm adding it now for posterity. It's old enough that I can no longer confidently endorse it, and I won't bother trying to defend it if you find something wrong, but it might still be interesting.
I find Kegan’s model of psychological development extremely useful. Some folks I know disagree on various grounds. These are some accumulated responses to critiques I’ve encountered.
Before we dive into these critiques, though, allow me to attempt a brief introduction to the theory (though this is a tough undertaking, as we’ll discuss below). Robert Kegan, later along with Lisa Lahey, put forward a theory of developmental psychology rooted in complexity of meaning making. It is influenced and builds on the work of Piaget, but also Erikson and Kohlberg, who extended developmental psychology to consider the possibility of adult development.
Kegan’s theory focuses on the maximally complex models people can make of the world in near construal mode. Development is along a continuous gradient but with clear “levels” where a particular kind of complexity is fully available to the thinker. These are classified from 1 to 5 and can be summarized in many ways, though fundamentally they correspond to when a person can form fully-articulable models of things, relationships between things, systems, relationships between systems, and systems of systems (holons), respectively.
This is an exceedingly dense introduction, though, and I know of no good short explanation. The best resources on the topic remain Kegan’s seminal The Evolving Self and his later In Over Our Heads for a more approachable, example-laden presentation.
The first, and perhaps strongest, critique of Kegan is that it’s very hard for anyone to explain it. Kegan begins In Over Our Heads with the story of getting a letter from a student assigned to read The Evolving Self. The student writes that The Evolving Self is full of interesting ideas but gets so frustrated trying to make sense of it that he wants to “punch [Kegan] in the teeth”.
Partly this is because Kegan has a strong literary and classics background, so The Evolving Self is full of very precise language with subtle meanings, many so subtle that Kegan takes multipage digressions to explain them. But In Over Our Heads and his later books written with Lahey and other coauthors use more familiar language and yet still leave people confused.
The theory seems to defy simple explanation. I’ve yet to find one written by Kegan, Lahey, or anyone else that was able to reliably convey in less than 40,000 words a reasonably coherent and complete view of it. As one person I know put it, the theory reads to him as analogous to someone saying there are invisible dragons, undetectable by readily available means, but which you will notice if you devote at least 20 hours to the study of invisible dragons.
Yet, those of us who have put in the time to “see the invisible dragons” tend to be pretty excited about the theory. Kegan gives us a way to understand and construct many aspects of human behavior and thought that time and again prove consistent and reflective of reality. So if it works so well, why is it so hard to explain?
There are a few ways things can be hard to explain. One is that they are unintuitive. Physics is like this: we perceive the world as if it operated the way Aristotle imagined it worked, but it turns out this approximation breaks down at extremes, and the quest to find a complete theory forces us to consider ever more exotic phenomena.
Another way things may be hard to explain is that they’re complicated. Machines and living things are like this, with engineers and biologists mostly struggling to make clear what’s happening in systems where lots of details matter. A clock might fail if it’s missing a tooth on one gear or a frog might die if it’s missing a sequence in its DNA, and understanding why is a messy business of picking through tightly interwoven threads of causality.
But perhaps the most vexing way something can be hard to explain is when it’s complex. That is to say, even if it has few details and works in a straightforward manner, thinking through how it works is still hard. Game theory, economics, and most everything touched by mathematics is like this: just a few “simple” rules lead to bewildering complexity under combination.
So when trying to explain a theory like Kegan’s that has at its heart a developmental progression in human capacity to cope with complexity, it’s perhaps unsurprising that the complexity can collapse back in on itself and make the theory look like disjointed rubble. The theory, in fact, predicts this, because it’s one about the relationships between systems (i.e. the change in human meaning making over time), so by its own expectations it will prove difficult to gain an intuitive grasp on without the reader having themselves first already attained the capacity to naturally reason about relationships between systems in near construal mode (level 4 in Kegan’s model).
To most people this feels like the theory saying “you can’t understand it until you already understand it”, but there’s more going on here. It’s instead saying that Kegan’s developmental theory belongs to a class of things that cannot be fully understood without the ability to naturally, intuitively work with the relationships between systems. Without that ability it may be understood in other ways, in particular using far construal mode, but that is demanding on the level of learning algebra, calculus, or differential equations, which is to say something that even the brightest among us struggle with.
But if it’s really this hard, why do people feel they can reject Kegan when they can’t reject, say, abstract algebra in the same way? They may find they completely lack the capacity to understand what’s going on in abstract algebra in near mode, yet aside from a few mathematicians with technical objections, no one thinks abstract algebra fails to model well the parts of reality it is attempting to model whether they understand it or not. At worst it’s just some of that “math stuff” other people worry about but they don’t “get”.
The difference, as Robin Hanson has observed in the general case, is that Kegan's is a theory about stuff we are intimately familiar with: people. We are happy to defer to experts and theories we don't understand on topics we don't feel we have much of a grasp on, like abstract algebra, but as things get progressively more "real" we feel less inclined to trust complex theories experts put forward that we don't understand ourselves.
There’s a sort of escalating scale of how many people don’t trust experts that’s a function of distance from lived experience, social agreements on who has expertise, and availability of evidence to check our understanding. Basically everyone trusts experts in math because it’s far from lived experience and we agree that mathematicians are the math experts even though we can only easily validate the veracity of the simplest mathematical claims without training.
Most, but slightly fewer, folks trust experts in physics. People agree that physicists are the experts and have lots of evidence to prove their unintuitive theories are right (planes fly, electricity powers our devices, computers work with no moving parts). The only difficulty for physicists is that we all live physics, so there’s a constant battle against violations of intuition they must overcome to convince us of their theories.
Less trusted still are doctors, economists, and philosophers. Somewhere between economists and philosophers we find anyone attempting to explain human behavior. We all have lots of experience with it, there’s lots of evidence around to check against, so the only thing holding up the experts is that we agree their expertise exists because someone gave them an advanced degree in it.
So in general people feel free to reject arguments about human behavior that don't seem intuitive, even when they are provided by experts. It's for the same meta-reason that no one listens to the economists, and that philosophers have been engaged in the same discussions for millennia: it feels easy to reject what doesn't feel true when it's something we have a lot of experience with and can easily gather data on.
Is this why folks who find it hard to understand Kegan often choose to reject it? I suspect probably yes, but then again I’m asking you to accept my argument about human behavior concerning belief strength in theories people don’t fully comprehend and are not expert in, so I’ll leave my response to this critique here before I ascend too far up a house of cards.
So suffice to say, Kegan is complex, complex enough that it’s predictably hard to understand, and about a topic where we little trust experts.
Kegan is sometimes presented as “wrong” because it’s not always accurate. That is to say, because it’s a theory that presents a model of the world, it has edge cases at which it seems to break down. This is a standard objection to all models in all domains and is uninteresting, but since there is a high likelihood of confusion due to lack of trust in expertise here, it’s worth covering.
A model is some explanation and prediction of how the world works. For example, atomic theory gives us a way of understanding matter as indivisible (atomic) particles. Like all theories, it’s “wrong” in that reality is not actually made up of atoms — it’s just reality. Instead atoms are a way of understanding reality that let us explain phenomena we see and predict future phenomena with some degree of accuracy. To the extent that atomic theory predicts what happens in reality, it is useful to the purpose of predicting future events. This doesn’t make it “right”, just predictive enough for our needs.
When atomic theory fails to make correct predictions it’s not “wrong”. Instead it’s that the theory is not complete because it’s a model and not reality and the only perfect model of reality is reality itself, just as the only perfect map of the Earth is the Earth itself.
So Kegan’s developmental theory is naturally not a perfect predictor of reality. We can only judge it by how accurate it is for the things we want to use it for. Whether or not it’s accurate enough to be useful is what we’ll explore in the remaining critiques.
The remaining two major objections are technical in that they assume an understanding of Kegan and find problems on internal grounds. Feel free to just skip to the end if these are not of interest to you.
The first problem is that Kegan differentiates expectations of what you can do in near and far mode. I'll note here, though, that Kegan does not explicitly reference construal level theory, dual process theory, or any other two-part theory of mind. This is mostly an artifact of its time: Kegan wrote The Evolving Self before these theories were well developed, and instead spends a decent number of words explaining that he's focused on the capacity for intuitive, immediate, natural ratiocination.
Lacking a referent to contrast near and far modes, some people naturally object to the theory on the grounds that mathematicians, for example, show incredible capacity to reason about holons in their 20s despite lacking the behavior patterns expected of someone with this capacity. The difference, of course, is that mathematicians do their work in far mode, and are in fact exceptionally talented at thinking in far mode; but because day-to-day activity is too complicated to fit in far mode and so doesn't use it, that far-mode capacity for handling greater complexity does not extend to near mode.
It’s unclear whether capacity to handle complexity in near mode extends to far mode. It seems likely but there’s not much data on, for example, people becoming mathematicians in their 50s after struggling with math for the previous 5 decades.
The second objection is that Kegan is not directly testable because it’s a theory about changes in the way of meaning making which is inherently unobservable since it exists only as a dialectic between perception and reality. While it may be true that you can’t directly test if the model is how reality is structured, this is a problem for all theories of mind and it has the same solution as they do: you can test the predictions. We can check whether the expected behavior of people at particular Kegan levels correlates with their actual behavior.
There’s unfortunately very little data on this. About the best we have comes from Lahey and her work in applying Kegan’s model to education reform and management consulting, and most of the available data I’m aware of is collected post level assessment or informally, so it’s suspect. I happily concede this is a major issue and would love to see more data collected but consider it unlikely because in the past 30 years the theory has gained little traction, largely it seems due to its complexity, so not enough people are working in the area to generate the needed data to sufficiently test the theory.
I’ve tried to address here the most common objections I’ve encountered to Kegan’s theory. If you notice additional objection categories I’ve left out, feel free to bring them up in the comments, and I’ll see if they are tractable problems or tear the whole thing down.
If reading this has piqued your interest in Kegan’s work, I highly recommend reading In Over Our Heads and The Evolving Self in that order. For applications of the theory you can check out Kegan and Lahey’s later works and for a philosophical incorporation of Kegan I suggest reading David Chapman.
Most of the time, when people have responses to my posts they write them as comments. Sometimes, however, they email or send messages. Since I strongly prefer comments I wanted to write some about why.
A discussion we have in comments will be open to people other than the two of us. People who read the post and are thinking along similar lines can see our back-and-forth. The comments will be attached to the post, potentially clarifying things for people who come across the post later. If the question comes up again I can link someone to our discussion of it. The comments show up in search engines for unrelated people interested in these ideas. Since communicating in public has all these positive externalities, I'm much more willing to put time and thought into a discussion if it can be public.
There are also benefits during the discussion, as other people often have valuable things to add. Many times a comment thread has been me and someone else, and a third person jumps in with an important consideration that hadn't occurred to either of us. Other times someone's comment sparks a thread which brings in perspectives from many people. It's not just that our talking in public helps others, but it helps us too.
More selfishly, comments have different expectations around responses. I read every comment, but I often don't reply. Maybe the comment is self-contained, and while it's communicating something important a reply wouldn't add anything. Maybe I'm not sure how I'd like to respond yet, and then don't end up coming back to it. Maybe other people responded and it seems like the important details have come out already. Maybe I just don't have time. With one-on-one messages, however, the response burden is much higher. Just writing back "received and read" would be hostile, but writing a good response can be a lot of work. Sending a message should not generally obligate a reply, but it still feels rude not to put in the time for a thorough response.
There are valid reasons for non-public communication. Perhaps you're afraid of how people might respond to revealing details of your identity or taking an unpopular position. Perhaps you want to talk about something that is illegal but generally viewed as ok among your friends. Perhaps there are people who follow you around the internet harassing you. Perhaps you don't trust yourself to phrase sensitive issues in a way that doesn't lead to people being mad at you. This is not an exhaustive list! Private messages I've gotten about posts, however, don't seem to be sent for one of these reasons. If you're sending a message privately for a reason, it's helpful if you can say so.
While I don't normally ask for emails in response to posts, this particular one seems like it should be an exception, so you're welcome to write me at firstname.lastname@example.org. I'm not committing to reply, though!
Comment via: facebook
The Tails Come Apart As Metaphor For Life, but with an extra pun.
Suppose you task your friends with designing the Optimal Meal. The meal that maximizes utility, in virtue of its performance at the usual roles food fills for us. We leave aside considerations such as sourcing the ingredients ethically, or writing the code for an FAI on the appetizer in tiny ketchup print, or injecting the lettuce with nanobots that will grant the eater eternal youth, and solely concern ourselves with arranging atoms to get a good meal qua meal.
So you tell your friends to plan the best meal possible, and they go off and think about it. One comes back and tells you that their optimal meal is like one of those modernist 30-course productions, where each dish is a new and exciting adventure. The next comes back and says that their optimal meal is mostly just a big bowl of their favorite beef stew, with some fresh bread and vegetables.
To you, both of these meals seem good - certainly better than what you've eaten recently. But then you start worrying that if this meal is important, then the difference in utility between the two proposed meals might be large, even though they're both better than the status quo (say, cold pizza). In a phrase, gastronomical waste. But then how do you deal with the fact that different people have chosen different meals? Do you just have to choose one yourself?
Now your focus turns inward, and you discover a horrifying fact. You're not sure which meal you think is better. You, as a human, don't have a utility function written down anywhere, you just make decisions and have emotions. And as you turn these meals over in your mind, you realize that different contexts, different fleeting thoughts or feelings, different ways of phrasing the question, or even just what side of the bed you got up on that morning, might influence you to choose a different meal at a point of decision, or rate a meal differently during or after the fact.
You contain within yourself the ability to justify either choice, which is remarkably like being unable to justify either choice. This "Optimal Meal" was a boondoggle all along. Although you can tell that either would be better than going home and eating cold pizza, there was never any guarantee that your "better" was a total ordering of meals rather than merely a partial ordering.
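The partial-versus-total ordering point can be made precise: a preference relation may rank some pairs while leaving others incomparable. A minimal sketch, where the meal attributes and the Pareto-dominance rule are my own illustration:

```python
def dominates(a, b):
    """Pareto dominance over attribute scores: a beats b only if it is
    at least as good on every attribute and strictly better on one."""
    return (all(x >= y for x, y in zip(a, b)) and
            any(x > y for x, y in zip(a, b)))

# Attribute scores: (novelty, comfort)
cold_pizza = (1, 2)
modernist  = (9, 3)   # the 30-course production
beef_stew  = (3, 9)   # the big bowl of stew

assert dominates(modernist, cold_pizza)  # clearly beats the status quo
assert dominates(beef_stew, cold_pizza)
# But neither optimal-meal candidate dominates the other:
assert not dominates(modernist, beef_stew)
assert not dominates(beef_stew, modernist)
```

Both candidates beat cold pizza, yet the relation gives you no verdict between them: that incomparability is exactly what a partial ordering permits and a total ordering forbids.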
Then, disaster truly strikes. Your best friend asks you "So, what do you want to eat?"
You feel trapped. You can't decide. So you call your mom. You describe to her these possible meals, and she listens to you and makes sympathetic noises and asks you about the rest of your day. And you tell her that you're having trouble choosing and would like her help deciding, and so she thinks for a bit, and then she tells you that maybe you should try the modernist 30-course meal.
Then you and your friends go off to the Modernism Bistro, and you have a wonderful time.
This is a parable about how choosing the Optimal Arrangement Of All Atoms In The Universe is an impossible moral problem. Accepting this as a given, what kind of thing is happening when we accept the decision of some authority (superhuman AI or otherwise) as to what should be done with those atoms?
When you were trying to choose what to eat, there was no uniquely right choice, but you still had to make a choice anyhow. If some moral authority (e.g. your mom) makes a sincere effort to deliberate on a difficult problem, this gives you an option that you can accept as "good enough," rather than "a waste of unknowable proportions."
How would an AI acquire this moral authority stuff? In the case of humans, we can get moral authority by:
- Taking on the social role of the leader and organizer
- Getting an endorsement or title from a trusted authority
- Being the most knowledgeable or skilled at evaluating a certain problem
- Establishing personal relationships with those asked to trust us
- Having a track record of decisions that look good in hindsight
- Being charismatic and persuasive
You might think "Of course we shouldn't trust an AI just because it's persuasive." But in an important sense, none of these reasons is good enough. We're talking about trusting something as an authority on an impossible problem, here.
A good track record on easier problems is a necessary condition to even be thinking about the right question, true. I'm not advocating that we fatalistically accept some random nonsense as the meaning of life. The point is that even after we try our hardest, we (or an AI making the choice for us) will be left in the situation of trying to decide between Optimal Meals, and narrowing this choice down to one option shouldn't be thought of as a continuation of the process that generated those options.
If after dinner, you called your mom back and said "That meal was amazing - but how did you figure out that was what I really wanted?", you would be misunderstanding what happened. Your mom didn't solve the problem of underdetermination of human values, she just took what she knew of you and made a choice - an ordinary, contingent choice. Her role was never to figure out what you "really wanted," it was to be an authority whose choice you and your friends could accept.
So there are two acts of trust that I'm thinking about this week. The first is how to frame FAI as a trusted authority rather than an oracle telling us the one best way to arrange all the atoms. And the second is how an FAI should trust its own decision-making process when it does meta-ethical reasoning, without assuming that it's doing what humans uniquely want.
This is Part X of the Specificity Sequence
Cats notoriously get stuck in trees because their claws are better at climbing up than down. Throughout this sequence, we’ve seen how humans are similar: We get stuck in high-level abstractions because our brains struggle to unpack them into specifics. Our brains are better at climbing up (concrete→abstract) than down (abstract→concrete).
If you’ve ever struggled to draw a decent picture, you know what being stuck at a high level of abstraction feels like in the domain of visual processing. I know I do. My drawing skills are nonexistent. I can draw kindergarten-quality stick figures, but I don’t even know where to begin drawing something that looks the least bit realistic.
Despite how pathetic a stick figure looks, it’s worth marveling at our brain’s formidable power to distill a mess of light and dark stimuli into a few geometric parts.
A stick figure is a mental representation of an animal, which is great for practical tasks like throwing a spear at it.
The question is just, why can people like me only draw a pathetic stick figure even when we’re trying to draw a nice picture?
The visual system was only under selective pressure to evolve a processing pathway from sensing visual features to building an abstract representation of those features, not a processing pathway to transform a high-level mental representation into a low-level pencil-stroke representation. I draw stick figures because my conscious mind thinks in stick figures.
But we know that the unconscious part of our visual brains isn’t one-way. When we first recognize that we’re seeing a cow, our brain propagates visual information from abstract mental representations down toward lower-level visual feature recognition.
Consider this image:
Most people have trouble identifying what they’re seeing in different parts of the image, until they realize it’s a cow, and then all the parts snap into focus. Their interpretation of low-level features of the image, e.g. that the white dots within the black foreground region on the leftmost part of the image are “furry”, is influenced top-down by their abstract mental representation of a cow.
How do skilled artists navigate down the ladder of visual abstraction consciously?
In Drawing on the Right Side of the Brain, Betty Edwards teaches students to accurately sketch the scene coming into their eyes through a clever upside-down drawing technique. The technique bypasses the part of the brain that would normally abstract visual input:
When presented with an upside-down image as a subject to be drawn, the left-hemisphere’s verbal system [the abstracting mechanism] says, in effect, “I don’t do upside down. It’s too hard to name the parts, and things are hardly ever upside-down in the world. It’s not useful, and if you are going to do that, I’m out of here.” The dominant verbal system “bows out,” and the sub-dominant visual mode is “allowed” to take on the task for which it is well suited.
Upside-down drawing helps draw the brain’s attention to what Edwards calls the “component skills of drawing”. These include edges, negative spaces, proportions, lights and shadows. For example, your brain’s high-level abstract representation might tell you that the boundary of an object is horizontal, but a lower-level examination of what your eye is seeing will contradict that, leading you to represent the boundary by drawing a slanted line on the page.
“Drawing on the right side of the brain” means drawing using mid-level representations of visual inputs, rather than the fully abstract ones we rely on for other activities.
Edwards has an online gallery of her students’ self-portraits before and after taking her class. This one stopped me in my tracks:
Staring at the “after” drawing feels like I’m witnessing the first time the artist finally saw her own face, and wanted to communicate the joy and beauty of what she saw by accentuating it for the rest of us. I’ve bought a copy of Edwards’ book because I hope to try out her techniques and rewire my brain to experience my own visual revelation.
It’s clearly possible to draw detailed realistic pictures. Great artists can do it. A $5 camera can do it. The interesting takeaway for our purposes is that drawing realistic pictures means developing the skill of moving in the abstract→concrete direction, against the grain of normal conscious thought.
Next post: The Power to Be Creative (coming soonish)
This is the monthly Cambridge, MA Less Wrong / Slate Star Codex meetup.
Disclaimer: there may be major flaws in the way I use words. Corrections are welcome.
Suppose I want to memorize all the software design patterns.
I could use spaced repetition and create a new deck of flashcards. Each card would have the name of the pattern on one side and the definition on the other.
This would help me understand references to patterns without opening Wikipedia every time. This would probably help me recognize patterns by descriptions, as long as they're close enough to the definitions.
But this wouldn't help me recognize patterns just by looking at their implementations. I'd have to actively think about each pattern I remember and compare the definition and the code.
I could create a second deck, with names and examples. But then I'd just memorize those specific examples and maybe get better at recognizing similar ones.
This problem is similar to that of testing software. (There must be a more straightforward analogy, but I couldn't find one.) Individual tests can only prevent individual errors. Formal verification is better, but not always possible. The next best thing is fuzzing: using random inputs and heuristics like "did it crash?".
So I wonder if I could generate new examples on the fly. (More realistically, pull hand-labeled examples from a database.)
The idea is that a skill like recognizing a pattern in the code should also be a form of memory. Or at least the parts of it that do not change between the examples. So using spaced repetition with randomized examples would be like JIT-compilation in brains.
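A minimal sketch of what that generation step could look like (all names here are hypothetical, not from any existing flashcard tool): a template that renders the same pattern skeleton with randomized identifiers, so each review tests recognition of the structure rather than recall of one specific snippet.

```python
import random

# Hypothetical template: the Singleton skeleton with a randomized class
# name, so no two flashcards show an identical snippet.
SINGLETON_TEMPLATE = """\
class {cls}:
    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
"""

CLASS_NAMES = ["Config", "Logger", "Cache", "Registry", "Session"]

def make_card(rng=random):
    """Return a (code, answer) flashcard with a freshly randomized example."""
    code = SINGLETON_TEMPLATE.format(cls=rng.choice(CLASS_NAMES))
    return code, "Singleton"

code, answer = make_card()
print(code)    # front of the card: which pattern is this?
print(answer)  # back of the card
```

A real deck would need one such template (or, more realistically, several hand-labeled examples) per pattern, with the scheduler drawing a fresh rendering at each review.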
There was an LW post about genetic programming working better when the environment was modular. Maybe something similar would happen here.
But I couldn't find anything on the internet. Has anybody seen any research on this?
I'm writing a follow-up to my blog post on soft takeoff and DSA, and I am looking for good examples of tech companies or academic research projects that are ~3+ years ahead of their nearest competitors in the technology(ies) they are focusing on.
Exception: I'm not that interested in projects that are pursuing some niche technology, such that no one else wants to compete with them. Also: I'm especially interested in examples that are analogous to AGI in some way, e.g. because they deal with present-day AI or because they have a feedback loop effect.
Even better would be someone with expertise on the area being able to answer the title question directly. Best of all would be some solid statistics on the matter. Thanks in advance!
Below is a paper about to be submitted. The focus is on interventions that could improve the long-term outcome given catastrophes that disrupt electricity/industry, such as solar storm, high-altitude electromagnetic pulse (HEMP), narrow AI computer virus, and extreme pandemic. Work on these interventions is even more neglected than interventions for feeding everyone if the sun is blocked. Cost-effectiveness is compared to a modified AGI safety cost-effectiveness model posted earlier on the EA forum. Two different cost-effectiveness estimates for losing industry interventions were developed: one by Denkenberger and a poll at EA Global San Francisco 2018, and the other by Anders Sandberg at Future of Humanity Institute. There is great uncertainty in both AGI safety and interventions for losing industry. However, it can be said with ~99% confidence that funding interventions for losing industry now is more cost effective than additional funding for AGI safety beyond ~$3 billion. This does not take into account model or theory uncertainty, so the confidence would likely decrease. However, in order to make AGI safety more cost effective, this required changing four variables in the Sandberg model to the 5th percentile on the pessimistic end simultaneously. For the other model, it required changing seven variables. Therefore, it is quite robust that a significant amount of money should be invested in losing industry interventions now. There is closer to 50%-88% confidence that spending the ~$40 million on interventions for losing industry is more cost effective than AGI safety. Overall, AGI safety is more important and more total money should be spent on it. The modeling concludes that additional funding would be justified on both causes even for the present generation.

Long Term Cost-Effectiveness of Interventions for Loss of Electricity/Industry Compared to Artificial General Intelligence Safety
David Denkenberger 1,2, Anders Sandberg 3, Ross Tieman *1, and Joshua M. Pearce 4,5
1. Alliance to Feed the Earth in Disasters (ALLFED), Fairbanks, AK 99775, USA
2. University of Alaska Fairbanks, Fairbanks, AK 99775, USA
3. Future of Humanity Institute, University of Oxford, Oxford, UK
4. Department of Material Science and Engineering and Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, MI 49931, USA
5. Department of Electronics and Nanoengineering, School of Electrical Engineering, Aalto University, FI-00076 Espoo, Finland
* corresponding author
Extreme solar storms, high-altitude electromagnetic pulses, and coordinated cyber attacks could disrupt regional/global electricity. Since electricity drives most industry, industrial civilization could collapse without it. This could cause anthropological civilization (cities) to collapse, from which humanity might not recover, with long-term consequences. Previous work analyzed technical solutions to save nearly everyone despite global industrial loss, including a transition to animals powering farming and transportation. The present work estimates cost-effectiveness for the long-term future with a Monte Carlo (probabilistic) model. Model 1, partly based on a poll of Effective Altruism conference participants, finds ~88% and ~100% confidence that industrial loss preparation is more cost effective than artificial general intelligence safety for the 30 millionth dollar spent on industrial loss interventions and for the margin now, respectively. Model 2, populated by one of the authors, produces ~50% and ~99% confidence, respectively. These confidences are likely to be reduced by model and theory uncertainty, but the conclusion that industrial loss interventions are more cost effective was robust to simultaneously changing the most important 4-7 variables to their pessimistic ends. Both cause areas save expected lives cheaply in the present generation, and funding for preparation for industrial loss is particularly urgent.
Disclaimer/Acknowledgements: Funding was received from the Centre for Effective Altruism. Anders Sandberg received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 669751). The Oxford Prioritisation Project developed the artificial general intelligence safety cost effectiveness submodel. Owen Cotton-Barratt, Daniel Dewey, Sindy Li, Ozzie Gooen, Tim Fist, Aron Mill, Kyle Alvarado, Ratheka Stormbjorne, and Finan Adamson contributed helpful discussions. This is not the official position of the Centre for Effective Altruism, the Future of Humanity Institute, nor the Alliance to Feed the Earth in Disasters (ALLFED).
The integrated nature of the electric grid, which is based on centralized generation, makes the entire system vulnerable to disruption.(1) A number of anthropogenic and natural catastrophes could result in regional-scale electrical grid failure, which would be expected to halt the majority of industries and machines in that area. A high-altitude electromagnetic pulse (HEMP) caused by a nuclear weapon could disable electricity over part of a continent (Bernstein, Bienstock, Hay, Uzunoglu, & Zussman, 2012; Foster et al., 2004; Kelly-Detwiler, 2014; Oak Ridge National Laboratory, 2010). This could destroy the majority of electrical grid infrastructure, and since fossil fuel extraction and industry are reliant on electricity (Foster, Jr et al., 2008), industry would be disabled. Similarly, solar storms have destroyed electrical transformers connected to long transmission lines in the past (Space Studies Board, 2008). The Carrington event in 1859 damaged telegraph lines, the only electrical infrastructure in existence at the time, and caused an aurora visible in Cuba and Jamaica (Klein, 2012). A comparable storm today could disable electrical systems at high latitudes, which could represent 10% of electricity/industry globally. Though a solar storm may last less than the 12 hours required for direct line of sight to the entire earth, the earth's magnetic field lines redirect the storm to affect the opposite side of the earth (Space Studies Board, 2008).
Lastly, both physical (M. Amin, 2002, 2005; Kinney, Crucitti, Albert, & Latora, 2005; Motter & Lai, 2002; Salmeron, Wood, & Baldick, 2004) and cyber attacks (Aitel, 2013; Hébert, 2013; Nai Fovino, Guidi, Masera, & Stefanini, 2011; Onyeji, Bazilian, & Bronk, 2014; Sridhar, Hahn, & Govindarasu, 2012; Umbach, 2013; Watts, 2003) could also compromise electric grids. Physical attacks include traditional acts of terrorism such as bombing or sabotage (Watts, 2003) in addition to EMP attacks. Significant actors could scale up physical attacks, for example by using drones. In one scenario, terrorist groups hinder individual power plants (Tzezana, 2016); a large adversary could mount a similar physical operation against all plants and electrical grids in a region.
Unfortunately, the traditional power grid infrastructure is simply incapable of withstanding intentional physical attacks (National Research Council, 2012). Damage to the electric grid resulting from physical attack could be long lasting, as most traditional power plants operate with large transformers that are difficult to move and source. Custom rebuilt transformers require replacement times ranging from months up to years (National Research Council, 2012). For example, a relatively mild 2013 sniper attack on California’s Pacific Gas and Electric (PG&E) substation, which injured no one directly, disabled 17 transformers supplying power to Silicon Valley. Repairs and improvements cost PG&E roughly $100 million and took about a month (Avalos, 2014; Pagliery, 2015). A coordinated attack with relatively simple technology (e.g. guns) could cause a regional electricity disruption.
However, a high-tech attack could be even more widespread. The Pentagon reports spending roughly $100 million to repair cyber-related damage to the electric grid in 2009 (Gorman, 2009). There is also evidence that a computer virus caused an electrical outage in Ukraine (Goodin, 2016). Unlike simple physical attacks, cyber attackers can penetrate critical electric infrastructure from remote regions of the world, needing only communication pathways (e.g. the Internet or infected memory sticks) to install malware into the control systems of the electric power grid. For example, Stuxnet was a computer worm that destroyed Iranian centrifuges (Kushner, 2013) to disable their nuclear industry. Many efforts are underway to harden the grid against such attacks (Gent & Costantini, 2003; Hébert, 2013). The U.S. Department of Homeland Security responded to ~200 cyber incidents in 2012, 41% of which involved the electrical grid (Prehoda, Schelly, & Pearce, 2017). Nations have routinely attempted to map current critical infrastructure for future navigation and control of the U.S. electrical system (Gorman, 2009).
The electric grid in general is growing increasingly dependent upon the Internet and other network connections for data communication and monitoring systems (Bessani, Sousa, Correia, Neves, & Verissimo, 2008; Schainker, Douglas, & Kropp, 2006; Sridhar et al., 2012; Ulieru, 2007; Wu, Moslehi, & Bose, 2005). Although this conveniently allows electrical suppliers to manage their systems, it increases the susceptibility of the grid to cyber attack, through denial of webpage services to consumers, disruption of supervisory control and data acquisition (SCADA) operating systems, or sustained widespread power outages (Aitel, 2013; Krotofil, Cardenas, Larsen, & Gollmann, 2014; Sridhar et al., 2012; Ten, Manimaran, & Liu, 2010). Thus global or regional loss of the Internet could have similar implications.
A less obvious potential cause is a pandemic that disrupts global trade. Countries may ban trade for fear of the disease entering their country, but many countries are dependent on imports for the functioning of their industry. If the region over which electricity is disrupted had significant agricultural production, the catastrophe could be accompanied by a ~10% food production shortfall as well. It is uncertain whether countries outside the affected region would help the affected countries, do nothing, or conquer the affected countries.
Larger versions of these catastrophes could disrupt electricity/industry globally. For instance, multiple HEMPs could be detonated around the world, due to a world nuclear war (Pry, 2017) or terrorists gaining control of nuclear weapons. There is evidence that, in the last 2000 years, two solar storms occurred that were much stronger than the Carrington event (Mekhaldi et al., 2015). Therefore, it is possible that an extreme solar storm could disable electricity, and therefore industry, globally. It is conceivable that a coordinated cyber or physical attack (or a combination) on many electric grids could also disrupt industry globally. Many of the techniques to harden the electric grid, as well as moving to more distributed generation and microgrids, could help with this vulnerability (Che & Shahidehpour, 2014; Colson, Nehrir, & Gunderson, 2011; Lasseter, 2007; Lasseter & Piagi, 2004; Prehoda et al., 2017; Shahidehpour & Khodayar, 2013). An extreme pandemic could cause enough people to stay home from work that industrial functioning could not be maintained. Though this could be mitigated by directing military personnel to fill vacant positions, if the pandemic were severe enough, it could be rational to retreat from high-contact industrial civilization in order to limit disease mortality.
The global loss of electricity could even be self-inflicted as a way of stopping rogue artificial general intelligence (AGI) (Turchin & Denkenberger, 2018a). As the current high agricultural productivity depends on industry (e.g. for fertilizers) it has been assumed that there would be mass starvation in these scenarios (Robinson, 2007).
Repairing these systems and re-establishing electrical infrastructure would be a long-term goal, and work should ideally start on it immediately after a catastrophe. However, human needs would have to be met immediately (and continually), and since there are only a few months of stored food, it would likely run out before industry is restored at the current state of preparedness. In some of the less challenging scenarios, it may be possible to continue running some machines on fossil fuels that had previously been brought to the surface, or on microgrids or shielded electrical systems. In addition, it may be feasible to run some machines on gasified wood (Dartnell, 2014). However, in the worst-case scenario, all unshielded electronics would be destroyed.
Here we focus on catastrophes that only disrupt electricity/industry, rather than catastrophes that could both disable industry and obscure the sun (Cole, Denkenberger, Griswold, Abdelkhaliq, & Pearce, 2016) or catastrophes that only obscure the sun (or affect crops directly in other ways) (Denkenberger & Pearce, 2015b). This paper analyzes the cost effectiveness of interventions from a long term perspective. First, this study reviews interventions both to avoid a loss of electricity and to feed everyone given such a loss. Then the benefits of artificial general intelligence (AGI) safety for the long term future are reviewed and quantified. Next, two loss-of-industry interventions submodels are developed. Finally, the cost of an intervention based on alternative food communication is estimated.
2.1 Review of Potential Solutions
An obvious intervention for HEMP is preventing a nuclear exchange, which would be the best outcome. However, this is not neglected: it has been worked on for many decades (Barrett, Baum, & Hostetler, 2013; D. C. Denkenberger & Pearce, 2018; Helfand, 2013; McIntyre, 2016a; Turchin & Denkenberger, 2018b) and is currently funded at billions of dollars per year, quality adjusted (McIntyre, 2016b). Other obvious interventions for HEMP, which would also work for solar storms and coordinated physical or cyber threats, involve hardening the electrical grid against these threats. However, hardening just the U.S. electrical grid against solar storm and HEMP would cost roughly $20 billion (Pry, 2014), so hardening globally against just these two threats would cost around $100 billion. Furthermore, adding hardening against cyber threats would be even more expensive. Again, preventing the collapse of electricity/industry would be the preferable option, but given the high cost, it may not happen. Even if it occurs eventually, it would still be preferable to have a backup plan in the near term and in case hardening fails to stop the loss of industry.
A significant problem in loss-of-industry catastrophes is food supply (Cole et al., 2016). One intervention is storing years' worth of food, but this is too expensive to be competitively cost effective (it would also take many years, so it would not protect humanity right away, and it would exacerbate current malnutrition) (Baum, Denkenberger, & Pearce, 2016). Furthermore, if electricity/industry were disabled for many years, food storage would be impractical. Stockpiling industrial goods could be another intervention, but again it would be much more expensive than the interventions considered here.
Interventions for food production given the loss of industry include burning wood from landfills to provide fertilizer and high use of nitrogen-fixing crops including legumes (peas, beans, peanuts, etc.) (Cole et al., 2016). Also, nonindustrial pest control could be used. Despite pre-industrial agricultural productivity (~1.3 dry tons per hectare per year) (Cole et al., 2016), this could feed everyone globally. However, not everyone would be near the food sources, and losing industry would severely hamper transportation capability. Solutions to this problem include backup plans for producing more food locally, including expanding planted area (while minimizing impact to biodiversity, e.g. by expanding into the boreal forest/tundra enhanced by the nutrients from tree decomposition/combustion) and favoring high-calorie-per-hectare foods such as potatoes, yams, sweet potatoes, lentils, and groundnuts (Oke, Redhead, & Hussain, 1990). Though clearing large areas of forest with hand saws would not be practical, it is possible to girdle the trees (remove a strip of bark around the circumference), let the trees dry out, and burn them. This has the advantage of releasing fertilizer to the soils. Another option involves producing “alternative foods,” which were proposed for sun-blocking catastrophes (D. Denkenberger & Pearce, 2014). Some of these alternative foods would require industry, but producing non-industrial, lower-cost ones such as extracting calories from leaves (D. Denkenberger, Pearce, Taylor, & Black, 2019) could be feasible. For transporting the food and other goods, ships could be modified to be wind powered and animals could pull vehicles (Abdelkhaliq, Denkenberger, Griswold, Cole, & Pearce, 2016). A global network of shortwave radio transmitters and receivers would facilitate disseminating the message that there is a plan and people need not panic, and would also allow continuing coordination globally (see below).
Current awareness of interventions given loss of electricity/industry (hereafter “interventions”) is very low, likely limited to thousands of people. Also, many of the interventions are theoretical only and need to be tested experimentally. There may be a significant number of shortwave radio systems that are shielded from HEMP and have shielded backup power systems, but some addition to this capacity would likely be beneficial. It is unlikely that the loss of industry would directly cause human extinction. However, by definition, there would be a loss of industrial civilization in the global catastrophes. Furthermore, there could be a loss of anthropological civilization (basically cities, or cooperation outside the clan). One definition of the collapse of civilization involves short-term focus, loss of long distance trade, widespread conflict, and collapse of government (Coates, 2009). Reasons that civilization might not recover include: i) easily accessible fossil fuels and minerals are exhausted (Motesharrei, Rivas, & Kalnay, 2014) (though there would be minerals in landfills), ii) the future climate might not be as stable as it has been for the last 10,000 years (Gregory et al., 2007), or iii) technological and economic data and information might be lost permanently because of the trauma and genetic selection of the catastrophe (Bostrom, 2013). If the loss of civilization were prolonged, a natural catastrophe, such as a supervolcanic eruption or an asteroid/comet impact, could cause the extinction of humanity. Another route to far-future impact is that the trauma associated with the catastrophe could make future catastrophes more likely, e.g. global totalitarianism (Bostrom & Cirkovic, 2008).
A further route is worse values caused by the catastrophe could be locked in by artificial general intelligence (AGI) (Bostrom, 2014), though with the loss of industrial civilization, the advent of AGI would be significantly delayed, so the bad values could have decayed out by then.
2.2 Artificial General Intelligence
AGI itself represents a major, independent risk. The artificial intelligence available now is narrow AI, i.e. it can generally only do a specific task, such as playing Jeopardy! (Schaul, Togelius, & Schmidhuber, 2011). However, there are concerns that as AI systems become more advanced, AGI will eventually be achieved (Bostrom, 2014). Since AGI could perform all human tasks as well as or better than humans, this would include reprogramming the AGI. This would enable recursive self-improvement, so there could be an intelligence explosion (Good, 1966). Since the goals of the intelligence may not be aligned with human interests (Bostrom, 2014) and could be pursued with great power, this implies a potentially serious risk (Good, 1966). AGI safety is a top priority in the existential risk community that seeks to improve humanity’s long term future (Turchin & Denkenberger, 2018b). Though there is uncertainty in when and how AGI may be developed, there are concrete actions that can be taken now to increase the probability of a good outcome (Amodei et al., 2016).
We seek to compare the cost effectiveness of losing industry interventions with AGI safety to discover whether these interventions should also be a top priority. Comparisons to other risks, such as asteroids (Matheny, 2007), climate change (Halstead, 2018) and pandemics (Millett & Snyder-Beattie, 2017), are possible, though these are generally regarded by the existential risk community as lower priority and therefore less informative.
Given the large uncertainties in input parameters, we model cost-effectiveness using a Monte Carlo simulation, producing a probability distribution of cost-effectiveness. Probabilistic uncertainty analysis is used widely in insurance, decision-support and cost-effectiveness modelling (Garrick, 2008). In these models, uncertain parameters are represented by samples drawn from defined distributions that are combined into output samples that form a resultant distribution.
The models consist of a loss of industry submodel estimating the risk and mitigation costs of industrial loss, and an AGI risk submodel estimating risk and mitigation costs of AGI scenarios. These two submodels then allow us to estimate the ratio and confidence of cost-effectivenesses.
Monte Carlo estimation was selected because the probability distributions for various parameters do not come in a form that provides analytically tractable combinations. It also allows exploring parameter sensitivity.
The open-source software Guesstimate(2) was originally used to implement the models, which are available online. However, to enable more powerful analysis and plotting, the models were also implemented in Analytica 5.2.9. The uncertainties in all the inputs were combined using a Median Latin Hypercube analysis (similar to Monte Carlo, but better performing (Keramat & Kielbasa, 1997)) with the maximum sample size of 32,000 (run time on a personal computer was seconds). The results from the two software packages agreed within the uncertainties due to the finite number of samples, giving greater confidence in the results.
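As a plain-Monte-Carlo stand-in for this kind of model (the distribution parameters below are illustrative placeholders, not the paper's fitted inputs), the core computation is: sample each uncertain input from a lognormal, form the per-sample ratio of cost-effectivenesses, and report the fraction of samples in which industrial-loss interventions come out ahead.

```python
import math
import random

random.seed(0)
N = 32_000  # sample count, matching the paper's Latin hypercube size

def lognormal(median, sigma):
    """Draw from a lognormal specified by its median and log-space sigma."""
    return random.lognormvariate(math.log(median), sigma)

# Illustrative placeholder inputs, NOT the paper's fitted distributions:
ratios = []
for _ in range(N):
    industry_ce = lognormal(1.0, 1.5)  # cost-effectiveness of industry-loss prep
    agi_ce = lognormal(0.3, 1.5)       # cost-effectiveness of marginal AGI safety
    ratios.append(industry_ce / agi_ce)

confidence = sum(r > 1 for r in ratios) / N
print(f"P(industry-loss prep more cost-effective) ~ {confidence:.2f}")
```

The paper's actual models combine many more input nodes and use Median Latin Hypercube sampling rather than independent draws, but the reported "confidence" figures are of exactly this form: the fraction of the output distribution with cost-effectiveness ratio above one.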
Figures 1 to 4 illustrate the interrelationships of the nodes for Model 1; Model 2 is identical with one exception: the input variable node Mitigation of far future impact of industrial loss from ALLFED so far for 10% industrial loss was removed from Model 1 because the poll question did not require this input.
Figure 1. Model overview
Figure 2. 100% Industry loss catastrophes submodel (10% industry loss is nearly identical)
Figure 3. AGI safety cost effectiveness submodel
Figure 4. Overall cost effectiveness ratios
3.1 Loss of Industry Interventions Submodel
Table 1 shows the key input parameters for Model 1 (largely Denkenberger and a conference poll of effective altruists) (D. Denkenberger, Cotton-Barratt, Dewey, & Li, 2019a) and Model 2 (Sandberg inputs) (D. Denkenberger, Cotton-Barratt, Dewey, & Li, 2019)(3). Though the authors here are associated with research on loss of industry, two of the four have also published in AGI safety. Also, opinions from outside the loss-of-industry field were solicited for one of the models. Therefore, we believe the results are representative. All distributions are lognormal unless otherwise indicated. The absolute value of the long term future is very difficult to quantify, so losses are expressed as percentages.
Table 1. Losing industry interventions input variables
The potential causes of the disabling of 1/10 of global industry include a Carrington-type solar storm, a single HEMP, a coordinated physical or cyber attack, a conventional world war, loss of the Internet, and a pandemic disrupting trade. We are not aware of quantitative estimates of the probability of a coordinated cyber attack, loss of the Internet, a pandemic that significantly disrupts trade, or a conventional world war that destroys significant industry and does not escalate to the use of nuclear weapons. Quantitative model estimates of the probability of full-scale nuclear war between the U.S. and Russia, such as (Barrett et al., 2013), may give some indication of the probability of HEMP. HEMP could accompany nuclear weapons destroying cities, and this would be a combined losing-industry/losing-the-sun scenario, which would benefit from the preparation considered here. Asymmetric warfare, where one country is significantly less powerful than another, could involve HEMP because it only requires one or two nuclear weapons to disable an entire country. There are significantly more nuclear pairs that could result in HEMP than could result in full-scale nuclear war (the latter is basically the dyads between the US, Russia, and China). And yet one quantitative model estimate of the probability of full-scale nuclear war only between the U.S. and Russia had a mean of 1.7% per year (Barrett et al., 2013). In 2012, there was a near miss by a solar storm of similar size to the Carrington event (Baker et al., 2013). One probability estimate of a Carrington-sized event is ~0.033% per year (Roodman, 2015). However, an estimate of the probability per year of a superflare 20 times as powerful as the Carrington event is 0.1%/year (Lingam & Loeb, 2017); these estimates disagree by orders of magnitude for the same intensity. Another study proposes that the recurrence interval of a Carrington-sized event is less than one century (Hayakawa et al., 2019).
Given the large uncertainty of solar storms and significant probability of single EMP, pandemic and regional cyber attack, Model 1 uses a mean of 3% per year. Model 2 uses a mean of 0.4% per year.
Intuitively, one would expect the probability of near-total loss of industry to be significantly lower than that of a 10% loss of industry. Complete loss of industry may correspond to the superflares that may have occurred in the first millennium A.D. (~0.1% per year). We are not aware of quantitative estimates of the probability of multiple EMPs, an industry-halting pandemic, or a global cyber attack. The Model 1 mean is 0.3% per year for near-total loss of industry; the Model 2 mean is 0.09% per year.
At the Effective Altruism Global 2018 San Francisco conference, with significant representation of people with knowledge of existential risk, a presentation was given and the audience was polled about 100% loss-of-industry catastrophes. The questions asked for the reduction in far-future potential due to these catastrophes with current preparation, and with ~$30 million spent on preparation. The data from the poll were used directly instead of constructing continuous distributions.
To determine the marginal impact of additional funding, the contribution of work to date should be quantified. The Alliance to Feed the Earth in Disasters (ALLFED) (ALLFED, 2019), and ALLFED researchers before the organization was officially formed, have published several papers on interventions for losing industry, maintain a website with these papers and summaries, and have run workshops to investigate planning for these interventions. However, we expect the contribution of ALLFED to reducing the long-term impact of loss of industry to be significantly lower than in the case of obscuring of the sun, because the loss of the Internet may be immediate if there are multiple simultaneous EMPs. On the other hand, a loss of electricity due to cyber attack may not be simultaneous globally, and there may be several days of warning before an extreme solar storm. The other reason current work may be less valuable in a global loss-of-industry scenario is that fewer people know about ALLFED's loss-of-industry work than its food-without-the-sun work. Model 1 estimates the reduction in long-term future potential loss from a global loss of industry due to ALLFED's work so far as a mean of 0.1%. Model 2 uses 0.004%, due to its emphasis on scenarios with a lack of communication.
In the case of a 10% loss of industry, the Internet would be functioning in most places, with the exception of the scenario of a loss of Internet everywhere. Even if the Internet were not functioning, mass media would generally be functioning. Therefore, possible mechanisms for impact due to work so far include people already aware of the interventions getting the message to decision makers/media in a catastrophe, decision makers finding the three papers on these interventions (Abdelkhaliq et al., 2016; Cole et al., 2016; Denkenberger et al., 2017), or people in the media who know about these interventions spreading the message. However, even though people outside of the affected countries could get the information, it may not be feasible to get it to the people who need it most. Model 2 estimates the reduction in long-term future potential loss due to ALLFED's work so far as a mean of 0.004%, again due to the likely lack of communications in the affected region. Model 1 does not use a value in its calculation.
The mean estimate of the conference participants was a 16% reduction in the long-term future of humanity due to loss of global industry with current preparedness. The Model 2 mean estimate was 7%.
The 10% industry loss catastrophes could result in instability and full-scale nuclear war, or other routes to far-future impact. Though the poll was not taken for this level of catastrophe, a survey of GCR researchers estimated a mean 13% reduction in the long-term potential of humanity due to a 10% food shortfall (Denkenberger, Sandberg, & Pearce, unpublished results). Some 10% loss of industry catastrophes could cause a ~10% global food shortfall. However, if the affected area were largely developed countries, human-edible food demand could fall roughly 10%, because those countries would likely need to become nearly vegan to survive, reducing the food fed to animals. Still, given the possible overlap of these catastrophes, this analysis uses the survey estimate for Model 1. The Model 2 mean estimate is a 0.4% reduction in long-term potential due to a 10% loss of industry.
The means of the percent further reduction in far-future loss due to global loss of industry from spending ~$30 million were 40% for the poll and 3% for Model 2. Note that in Model 1, the poll did not ask for the further reduction in far-future loss from spending money, but instead for a new far-future loss after the money was spent. Therefore, the 40% mean further reduction is a calculated value and does not appear in Table 1. For the 10% industrial shortfalls, our estimate of the mean reduction is 12% for Model 1, because the contribution of additional spending on top of aid from outside the affected region would be smaller. It was 5% for Model 2, because the Model 2 expert thought the likelihood of success would be greater than for the global loss of industry, given the outside aid.
Moral hazard would occur if awareness of interventions makes catastrophes more likely or more intense. Global use of EMP or coordinated cyber attack could be perpetrated by a terrorist organization trying to destroy civilization. However, if the organization knew of backup plans that could maintain civilization, the terrorist might actually be deterred from attempting such an attack. This would result in negative moral hazard (additional benefit of preparation). However, it is possible that knowledge of a backup plan could result in people expending less effort to harden systems to EMP, solar storm or cyber attack, creating moral hazard. Therefore, Model 1 uses a mean moral hazard of zero, and Model 2 uses a point value of zero.
For the 10% loss of industry scenarios, the same moral hazard values are used as for the global loss of industry.
3.2 Costs of Interventions
The costs of the proposed interventions are made up of a backup communication system, developing instructions and testing them for distributed food production, and making response plans at different levels of governments.
Currently the long-distance shortwave radio frequencies are used by government and military stations, ships at sea, and amateur (ham) radio operators. Because of security considerations, data on the number of government/military stations are difficult to compile. Use by ships has declined because of the availability of low-cost satellite phones, but there are an estimated three million ham operators worldwide (Silver, 2004). Not all of those are licensed to use the shortwave bands, however: in the U.S., about half of the approximately 800,000 American ham operators hold the necessary license. Assuming a similar pattern worldwide, there would potentially be about 1.5 million ham radio shortwave stations globally.
However, this analysis conservatively ignores the possibility that there would be existing ham radios that are disconnected and have unplugged backup power systems. Therefore, the cost of the backup communication system of 5 million USD is based on the cost of 10 larger two-way shortwave communication systems (with backup power) that can transmit across oceans (see Appendix A), plus 4000 smaller one-way shortwave receivers (with backup power) that, when connected to a laptop computer and printer, would have the ability to print out information. This could be called REcovering Civilization Using Radio (RECUR). It would cover 80% of the world's population within one day's nonmotorized transportation distance (~40 km) according to Geographical Information Systems (GIS) analysis (Fist et al., unpublished results). It is critical to very quickly get the message out that there is a plan and not to panic. Subsequent communication would be instructions for immediately meeting basic needs such as food, shelter, and water. This initial planning would be considered open-loop control because it would not have immediate feedback (Liptak, 2018).
In the ensuing months, as reality always deviates from plans, feedback would be required. This could be accomplished by coordinating additional undamaged shortwave and electrical generation equipment to allow two-way communication for many cities. Also, depending on distance, some messages could be communicated through non-electronic means such as horses, smoke signals, and sun reflecting heliographs of the kind that were used in the Western USA before telegraphs (Rolak, 1975; Sterling, 2008).
Instructions would include how to get safe water or treat it (e.g. by filling containers, including cleaned bathtubs, with water in water towers for the limited time it is available and treating with bleach, by solar water pasteurization (Burch & Thomas, 1998), or by boiling). Additional instructions would cover how to keep warm if it is cold outside (Abdelkhaliq et al., 2016), and how to retrofit a light-duty vehicle to be pulled by a large animal. Because cattle and horses can eat food that is not edible to humans, and because the wheel is so efficient, this would be a much more effective way of moving people than walking. Further instructions would cover how to make wood-burning stoves and hand and animal farming tools, e.g. from repurposed or landfill materials. A similar project is Open Source Ecology, which has developed blueprints of essential equipment for civilization that can be made from scratch (Open Source Ecology, 2019). All of this should be tested on realistically untrained people and the instructions modified accordingly.
Planning involves determining where different people would need to be relocated in order to have their basic needs met. The critical short-term factors are shelter and water, while food is slightly longer term. The economically optimal plan could be achieved with GIS analysis. However, in order for this to be politically feasible, there would need to be negotiations and precommitments. This may have similar cost to the government planning for food without the sun of $1 million to $30 million (Denkenberger & Pearce, 2016).
Overall, Model 1 estimates the communications, instructions/testing, and planning for global industry loss would cost roughly 30 million USD (see Table 1). For the regional loss of industry, it is difficult to predict where it might occur, so generally communications and planning should be done for the entire world, and thus the instructions/experiments would be similar. Therefore, there is a high correlation of preparation for the two catastrophes, so this is assumed to be the cost of the preparation to both scales of catastrophe. Model 2 has somewhat higher costs ($50 million mean).
The time horizon of effectiveness of the interventions would depend on the intervention. Modern shortwave radio communications equipment has few moving parts (chiefly cooling fans and motors to rotate directional antennas) and serviceability measured in decades.(5)
Furthermore, these systems need to be disconnected from the grid to be protected from HEMP. This would reduce wear and tear, but regular testing would be prudent. Some of the budget could be used for this and for repair of the units. As for the instructions, since the hand and animal tools are not changing, directions should stay relevant. Planning within governments is susceptible to turnover, but some money could be used to transfer the knowledge to new employees. Model 1 estimates a 25 year mean for the time horizon. Model 2 has a slightly shorter time horizon mean of 20 years driven by a conservative estimate of the communications equipment lifetime.
3.3 Artificial Intelligence Submodel
The submodel for AGI safety cost-effectiveness was based on the work of the Oxford Prioritisation Project, Owen Cotton-Barratt, and Daniel Dewey (the latter two while at the Future of Humanity Institute at the University of Oxford) (D. Denkenberger, Cotton-Barratt, Dewey, & Li, 2019b; Li, 2017). We modified it (Denkenberger et al., unpublished results), with major changes including increasing the cost of an AGI safety researcher, making better-behaved distributions, removing one method of calculation, and changing the analysis from average to marginal in the number of researchers. These changes increased the cost-effectiveness of AGI safety by roughly a factor of two and increased the uncertainty considerably (because the method of calculation retained had much greater uncertainty than the one removed). The cost-effectiveness was found at the margin assuming $3 billion of expenditure.
4. Results and Discussion
In order to convert average cost-effectiveness to marginal cost-effectiveness for interventions, we use logarithmic returns (Cotton-Barratt, 2014), under which the relative marginal cost-effectiveness is one divided by the cumulative money spent. An estimate is therefore needed of the cumulative money spent so far on interventions. Under $100,000 equivalent (mostly volunteer time) has been spent so far directly on this effort, nearly all by ALLFED. A very large amount of money has been spent on trying to prevent nuclear war, on hardening military installations to HEMP, and on cyber security. However, even though US military infrastructure is supposedly hardened to EMP, it may not be able to withstand a "super" EMP weapon that some countries may possess (P. Pry, 2017) or sophisticated cyber attacks. More relevant, money has been spent on farming organically and less industrially for traditional sustainability reasons, and Open Source Ecology has developed instructions for critical equipment. This prior work may represent tens of millions of dollars that would otherwise have needed to be spent on catastrophe preparation, which is relevant for the marginal ~$30 million case. However, there are still very high value interventions that should be done first, such as collecting instructions for producing hand/animal farm tools without industry and giving them to at least some governments and owners of disconnected shortwave radios and backup power sources. Though the interventions would not work as well as with ~$30 million of research/communications backup, simply having some critical people know about them and implement them in their own communities/countries without trade could still significantly increase the chance of retaining anthropological civilization. The cost of these first interventions would be very low, so they would have very high cost-effectiveness.
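The logarithmic-returns assumption can be written out explicitly. This is a sketch of the standard form implied by (Cotton-Barratt, 2014), not necessarily the models' exact parameterization: with $k$ a proportionality constant and $x_0$ the spending to date, benefit as a function of cumulative spending $x$ and its derivative are

$$B(x) = k \ln\frac{x}{x_0}, \qquad \frac{dB}{dx} = \frac{k}{x},$$

so the marginal cost-effectiveness at spending level $x$ falls off as $1/x$, while the average cost-effectiveness of raising spending from $x_0$ to $x_1$ is

$$\frac{B(x_1) - B(x_0)}{x_1 - x_0} = \frac{k \ln(x_1/x_0)}{x_1 - x_0}.$$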
Table 2 shows the ranges of the far future potential increase per $ due to loss of industry preparation, averaged over ~$30 million for Model 1 and over ~$50 million for Model 2, and for AGI safety research at the $3 billion margin. The distributions are shown in Figure 5. Because the variance of Model 1 is very high, its mean cost-effectiveness is high, driven by a small probability of very high cost-effectiveness.
Table 2. Cost-effectiveness comparison
Figure 5. Far future potential increase per $ due to loss of industry preparation, averaged over ~$30 million for Model 1 and over ~$50 million for Model 2, and for AGI safety research at the $3 billion margin. Further to the right is more cost-effective.
With logarithmic returns, the cost-effectiveness of the marginal dollar now (the 100,000th dollar) is about 50 times greater than, and that of the last dollar about 6 times less than, the average cost-effectiveness of spending $30 million. For Model 2, the corresponding numbers are about 70 times greater and 6 times less than the average cost-effectiveness of spending $50 million. Ratios of the means of the cost-effectiveness distributions are reported in Table 3.(6) Comparing to AGI safety at the margin, Model 1 yields the 30 millionth dollar on losing industry interventions being 20 times more cost effective, the average $30 million being 100 times more cost effective, and the marginal dollar now being 5000 times more cost effective (Table 3). Model 2 yields the last dollar on interventions being 0.05 times as cost effective, the average ~$50 million being 0.2 times as cost effective, and the marginal dollar now being 20 times as cost effective. Given orders-of-magnitude uncertainty and the sensitivity of these ratios to the relative uncertainty of the interventions, the probabilities that one intervention is more cost effective than the other are likely more robust. Comparing to AGI safety at the margin, Model 1 finds ~88% probability that the 30 millionth dollar on interventions is more cost effective, ~95% probability that the average $30 million is more cost effective, and ~100% probability that the marginal dollar now is more cost effective (see Table 3). Model 2 finds ~50% probability that the 50 millionth dollar on interventions is more cost effective than AGI safety, ~76% probability that the average $50 million is more cost effective, and ~99% probability that the marginal dollar now is more cost effective.
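The ratios of marginal to average cost-effectiveness quoted above follow directly from logarithmic returns and can be checked with a short sketch. This assumes, per the text, ~$100,000 spent to date; the exact reported ratios depend on the models' inputs, so small discrepancies are expected:

```python
import math

def marginal_vs_average(spent_so_far, total):
    """With benefit ~ ln(spend), marginal cost-effectiveness at spend x is ~1/x;
    the average over [spent_so_far, total] is ln(total/spent_so_far)/(total - spent_so_far)."""
    average = math.log(total / spent_so_far) / (total - spent_so_far)
    now_vs_avg = (1 / spent_so_far) / average   # marginal dollar now vs. average
    avg_vs_last = average / (1 / total)         # average vs. last dollar
    return now_vs_avg, avg_vs_last

# Model 1: $0.1M spent so far, $30M total (values in $ millions)
print([round(x) for x in marginal_vs_average(0.1, 30)])   # [52, 6], i.e. ~50x and ~6x
# Model 2: $0.1M spent so far, $50M total
print([round(x) for x in marginal_vs_average(0.1, 50)])   # [80, 6], the order of the reported ~70x and 6x
```

The small gap between the computed ~80x and the reported ~70x for Model 2 would close with a slightly larger assumed amount spent to date.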
Note that the greater than 50% probability for the average cost-effectiveness, despite the ratio of the means being less than one, is due to the relatively smaller variance of the Model 2 cost-effectiveness estimate (see Figure 5).
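This effect, where the distribution with the lower mean nevertheless wins the pairwise comparison more than half the time, can be reproduced with a small Monte Carlo sketch. The lognormal parameters below are purely illustrative, not the models' actual inputs:

```python
import random, statistics

random.seed(0)
N = 100_000

# A: higher median, small variance. B: lower median but huge variance,
# which drags B's mean far above A's.
a = [random.lognormvariate(0.7, 0.5) for _ in range(N)]
b = [random.lognormvariate(0.0, 3.0) for _ in range(N)]

print(statistics.mean(a) < statistics.mean(b))   # True: B has the higher mean...
p_a_wins = sum(x > y for x, y in zip(a, b)) / N
print(p_a_wins)                                  # ...yet A is better roughly 59% of the time
```

With heavy-tailed distributions, the mean is dominated by the extreme upper tail, so mean ratios and head-to-head win probabilities can point in opposite directions, exactly as in the Model 2 comparison.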
Table 3. Key cost effectiveness outputs of losing industry interventions
Overall, the mean cost-effectiveness of Model 1 is about 2.5 orders of magnitude higher than that of Model 2. However, due to the smaller variance of the Model 2 distributions, there was similar confidence that losing industry interventions at the margin now are more cost-effective than AGI safety. Another large difference is that Model 1 found that 10% loss of industry scenarios have similar cost-effectiveness for the far future to global loss, because the greater probability of these catastrophes counteracted their smaller far-future impact. Model 2, however, rated the cost-effectiveness of the 10% industry loss ~1.5 orders of magnitude lower than that of global loss. Given the agreement of high confidence that further work is justified at this point, some of this further work could be used to resolve the significant uncertainties and determine whether more money is justified: value of information (Barrett, 2017).
Being prepared for loss of industry might protect against unknown risks, meaning the cost-effectiveness would increase.
According to Model 1, every year acceleration in preparation for losing industry would increase the long term value of humanity by 0.00009% to 0.4% (mean of 0.07%). The corresponding Model 2 numbers are 0.00006% to 0.0004% (mean of 0.00017%). Either way, there is great urgency to get prepared.
It is not necessary for losing industry interventions to be more cost effective than AGI safety in order to fund them on a large scale. Funding in the existential risk community goes to other causes as well, e.g. engineered pandemics. One estimate of the cost-effectiveness of biosecurity was much lower than for AGI safety and losing industry interventions, but the authors were being very conservative (Millett & Snyder-Beattie, 2017). Another area of existential risk that has received investment is asteroid impact, which again has much lower cost-effectiveness than losing industry interventions (Matheny, 2007).
The importance, tractability, neglectedness (ITN) framework (Effective Altruism Concepts, 2019) is useful for prioritizing cause areas. Importance is the expected impact of the risk on the long-term future. Tractability measures the ease of making progress. Neglectedness quantifies how much effort is already directed towards reducing the risk. Unfortunately this framework cannot be applied to interventions straightforwardly, because addressing a risk could involve many potential interventions. Nevertheless, some semi-quantitative insights can be gleaned: the importance of AGI is greater than that of industry loss catastrophes, but industry loss interventions are far more neglected.
Though these interventions for the loss of industry are not compared directly to food without the sun interventions, both are compared to the same AGI safety submodel. Overall, Model 2 indicates that spending $50 million on interventions for the loss of industry is competitive with AGI safety. However, Model 1 here, and both models for food without the sun, indicate that spending significantly more than the proposed amount (on the order of $100 million) would be justified from the long-term future perspective.
The AGI safety submodel was used to estimate the cost-effectiveness of saving expected lives in the present generation, finding $16-$12,000 per expected life saved (Denkenberger et al., unpublished results). This is generally more cost effective than GiveWell estimates for global health interventions: $900-$7,000 per life saved (GiveWell, 2017). Food without the sun is significantly better ($0.20-$400 per expected life) for 10% global food production shortfalls (Denkenberger & Pearce, 2016), and generally better even considering only one country and only nuclear winter ($1-$20,000 per expected life) (Denkenberger & Pearce, 2016). Model 2 for losing industry interventions has long-term future cost-effectiveness similar to AGI safety, indicating that the lifesaving cost-effectiveness of these interventions would likely be competitive with AGI safety and global health, though this requires future work. Model 1 for losing industry interventions has long-term future cost-effectiveness similar to food without the sun, indicating that loss of industry preparations may save lives in the present generation less expensively than AGI safety and global health. Since AGI safety appears to be underfunded from the present generation perspective, it is extremely underfunded when taking future generations into account. If this were corrected, then for losing industry interventions to remain similarly cost-effective to AGI safety, more funding for them would be justified.
4.2 Timing of Funding
If one agrees that interventions for losing industry should be a significant part of the existential risk reduction portfolio, there remains the question of how to allocate funding to the different causes over time. For AGI safety, there are arguments both for funding later and funding now (Ord, 2014). For interventions for losing industry, since most of the catastrophes could happen right away, there is significantly greater urgency to fund interventions for losing industry now. Furthermore, it is relatively more effective to scale up the funding quickly because, through requests for proposals, the effort could co-opt relevant existing expertise (e.g. in shortwave radio). Since we have not monetized the value of the far future, we cannot use conventional cost-effectiveness metrics such as the benefit to cost ratio, net present value, payback time, and return on investment. However, in the case of saving expected lives in the present generation for the global case and 10% food shortfalls, the return on investment was from 100% to 5,000,000% per year (Denkenberger & Pearce, 2016) based on monetized life savings. This suggests that the $40 million or so for interventions for losing industry should be mostly spent in the next few years to optimally reduce existential risk (a smaller amount would maintain preparedness into the future).
4.3 Uncertainty and parameter sensitivity
Parameter sensitivities of Model 1 and Model 2 were investigated using the Analytica importance analysis function. This uses the absolute rank-order correlation between each uncertain input and a selected output as a measure of the strength of their monotonic relation, both linear and otherwise (Chrisman et al., 2007; Morgan & Henrion, 1990). Analysis was focused on the losing industry submodels, i.e. the global loss of industry and 10% industry loss catastrophes. Parameter sensitivity within AGI safety was not investigated, as this submodel was adapted from previous work by the Oxford Prioritisation Project, which discussed the uncertainties within the AGI safety cost-effectiveness submodel (Denkenberger et al., 2019b; Li, 2017).
The key output nodes in Table 3 could not be investigated directly using the importance analysis function because their outputs are point values, a result of calculating the ratio of means (the Analytica importance analysis function requires a chance variable to perform absolute rank-order correlation). Therefore, the preceding node in the models, Far future potential increase per $ due to loss of industry preparation, was used to investigate the importance of the input variables of the losing industry submodels.
Importance analysis of the node Far future potential increase per $ due to loss of industry preparation showed that Model 1 had the greatest sensitivity to the input variable Reduction in far future potential due to 10% industrial loss with current preparation, closely followed by Reduction in far future potential due to global loss of industry with current preparation (Figure 6). Model 2 showed the greatest sensitivity to the input variable Cost of interventions ($ million) (global loss of industry) (Figure 7).
Figure 6. Importance analysis results for Far future potential increase per $ due to loss of industry preparation for Model 1.
Figure 7. Importance analysis results for Far future potential increase per $ due to loss of industry preparation for Model 2.
Successive rounds of parametric analysis were performed to determine combinations of input parameters sufficiently unfavorable to losing industry interventions that the cost-effectiveness ratios (Table 3) switched to favoring AGI safety. Unfavorable input values were limited to the 5th or 95th percentile values of the original input distributions. Model 1 required 7 unfavorable input parameters to switch to AGI safety being more cost effective than losing industry interventions at the margin now, while Model 2 required 4 (see Table 4).
Table 4: Combination of input variables resulting in AGI safety being more cost effective than losing industry interventions at the margin now.
5. Conclusions and Future Work
There are a number of existential risks that have the potential to reduce the long-term potential of humanity. These include AGI and electricity/industry-disrupting catastrophes, including extreme solar storms, EMP, and coordinated cyber attack. Here we present the first long-term future cost-effectiveness analyses of interventions for losing industry. There is great uncertainty in both AGI safety and interventions for losing industry. However, it can be said with 99%-100% confidence that funding interventions for losing industry now is more cost effective than additional funding for AGI safety beyond the expected $3 billion. Making AGI safety more cost effective than losing industry interventions according to the means of their distributions required simultaneously changing four variables in Model 2 to their pessimistic 5th percentiles, and seven variables in Model 1. Therefore, the conclusion that a significant amount of money should be invested in losing industry interventions now is quite robust. There is closer to 50%-88% confidence that spending the full ~$40 million on interventions for losing industry is more cost effective than AGI safety. These interventions address catastrophes that have a significant likelihood of occurring in the next decade, so funding is particularly urgent. Both AGI safety and interventions for losing industry save expected lives in the present generation more cheaply than global poverty interventions, so funding should increase for both. The cost-effectiveness at the margin of interventions for the loss of industry is similar to that for food without the sun (for industry versus sun, Model 1 is ~1 order of magnitude more cost effective, but Model 2 is ~1 order of magnitude less cost effective).
Because the electricity/industry catastrophes could happen immediately and because existing expertise relevant to food without industry could be co-opted by charitable giving, it is likely optimal to spend most of this money in the next few years.
Since there may be scenarios of people eating primarily one food, micronutrient sufficiency should be checked, though it would be less of an issue than for food without the sun (D. Denkenberger & Pearce, 2018; Griswold et al., 2016). Higher-priority future research includes ascertaining the number and distribution of unplugged shortwave radio systems with unplugged power systems that could be utilized in a catastrophe. Additional research includes the feasibility of continuing improved crop varieties despite loss of industry, estimating how rapidly hand and animal powered farm tools could be scaled up, estimating the efficacy of pest control without industry, better quantifying the capability of using ash-based fertilizer (which would be aided by GIS analysis), and surveying whether there have been experiments on the agricultural productivity achieved by people inexperienced in farming by hand.
Another piece of future work would be to analyze the cost-effectiveness of AGI safety and preparation for the loss of industry in terms of species saved. Rogue AGI could cause the extinction of nearly all life on earth. If there were mass starvation due to the loss of electricity/industry, humans would likely eat many species to extinction. Therefore, being able to meet human needs would save species. These cost-effectivenesses could be compared to those of conventional methods of saving species. Finally, additional future work involves better quantifying the cost of preparedness for the loss of industry. Furthermore, research for the actual preparedness should be done, including estimating the amount of unplugged communications hardware and backup power, testing the backup communications system, experiments demonstrating the capability to quickly construct hand/animal farm tools, and developing quick training to use them. Also, investigating alternative food sources that do not require industry, such as seaweed, would be beneficial (Mill et al., unpublished results).
Footnotes
(1) This vulnerability can be addressed with distributed generation and microgrids (S. M. Amin, 2010; Lovins & Lovins, 1982; Prehoda et al., 2017; Zerriffi, Dowlatabadi, & Strachan, 2002), but these technologies are still far from ubiquitous.
(2) One can change numbers in viewing mode to see how outputs change, but alterations will not save. If one wants to save a new version, one can make a copy of the model. Click View > Visible to show arrows of relationships between cells. Drag the mouse over cells to see comments. Click on a cell to show its equation.
(3) Lognormal results in the median being the geometric mean of the bounds (multiply the 5th and 95th percentiles and raise to the 0.5 power). Note that with large variances, the mean is generally much higher than the median.
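The recipe in footnote 3 can be sketched in a few lines. The factor 1.645 is the standard normal 95th percentile, and the bounds 0.1 and 10 are illustrative rather than taken from the models:

```python
import math

def lognormal_from_bounds(p5, p95):
    # Fit a lognormal whose 5th and 95th percentiles match the given bounds.
    mu = (math.log(p5) + math.log(p95)) / 2            # log-space midpoint
    sigma = (math.log(p95) - math.log(p5)) / (2 * 1.645)
    median = math.exp(mu)                              # equals sqrt(p5 * p95)
    mean = math.exp(mu + sigma ** 2 / 2)               # pulls far above the median as sigma grows
    return median, mean

median, mean = lognormal_from_bounds(0.1, 10.0)
print(round(median, 6))   # 1.0, the geometric mean of the bounds
print(mean > 2 * median)  # True: with wide bounds the mean far exceeds the median
```

This illustrates the footnote's caution that with large variances the mean is generally much higher than the median.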
(4) The global loss poll gave people ranges, including <0.1%, 0.1% to 1%, 1% to 10%, and 10% to 100%. All responses in the range were recorded as approximately the geometric mean of the range. Half of people were therefore recorded as 30% loss of the far future. If the people had been able to provide exact values, likely one of them would have recorded greater than 40%, which was the upper bound for the 10% loss of industry, making these results consistent. However, even with the constraints of the data, the mean and median are higher for the global loss of industry than the 10% loss of industry.
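The range-to-point mapping described in footnote 4 can be sketched as follows. Only the bounded ranges are shown, since the open-ended "<0.1%" range requires a judgment call for its lower bound:

```python
import math

# Map each bounded poll range (as fractions) to the geometric mean of its endpoints.
ranges = {"0.1% to 1%": (0.001, 0.01),
          "1% to 10%": (0.01, 0.1),
          "10% to 100%": (0.1, 1.0)}
points = {label: math.sqrt(lo * hi) for label, (lo, hi) in ranges.items()}
print(round(points["10% to 100%"], 3))  # 0.316, recorded as roughly a 30% loss
```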
(5) On any given day, eBay lists numerous used shortwave radio transmitters/receivers still in fully operational condition, some of them manufactured in the 1960s.
(6) Ratios of means require manual changes in Guesstimate, which we note in all caps in the model.
Appendix A: Radio component costs - available at https://osf.io/rgq2z/

References
Avalos, G. (2014, August 27). PG&E substation in San Jose that suffered a sniper attack has a new security breach. Retrieved August 8, 2019, from The Mercury News website: https://www.mercurynews.com/2014/08/27/pge-substation-in-san-jose-that-suffered-a-sniper-attack-has-a-new-security-breach/
Baker, D. N., Li, X., Pulkkinen, A., Ngwira, C. M., Mays, M. L., Galvin, A. B., & Simunac, K. D. C. (2013). A major solar eruptive event in July 2012: Defining extreme space weather scenarios. Space Weather, 11(10), 585–591. https://doi.org/10.1002/swe.20097
Bernstein, A., Bienstock, D., Hay, D., Uzunoglu, M., & Zussman, G. (2012). Sensitivity analysis of the power grid vulnerability to large-scale cascading failures. ACM SIGMETRICS Performance Evaluation Review, 40(3), 33. https://doi.org/10.1145/2425248.2425256
Burch, J. D., & Thomas, K. E. (1998). Water disinfection for developing countries and potential for solar thermal pasteurization. Solar Energy, 64(1-3), 87-97.
Cole, D. D., Denkenberger, D., Griswold, M., Abdelkhaliq, M., & Pearce, J. (2016). Feeding Everyone if Industry is Disabled. Proceedings of the 6th International Disaster and Risk Conference. Presented at the 6th International Disaster and Risk Conference, Davos, Switzerland.
Denkenberger, D., & Pearce, J. (2018). Design optimization of polymer heat exchanger for automated household-scale solar water pasteurizer. Designs, 2(2), 11. https://doi.org/10.3390/designs2020011
Denkenberger, D., Cotton-Barratt, O., Dewey, D., & Li, S. (2019a, August 10). Foods without industry and AI X risk cost effectiveness general far future impact Denkenberger. Retrieved August 10, 2019, from Guesstimate website: https://www.getguesstimate.com/models/11599
Denkenberger, D., Cotton-Barratt, O., Dewey, D., & Li, S. (2019b, August 12). Machine Intelligence Research Institute - Oxford Prioritisation Project. Retrieved August 12, 2019, from Guesstimate website: https://www.getguesstimate.com/models/8789
Denkenberger, D., Cotton-Barratt, O., Dewey, D., & Li, S. (2019, April 10). Food without the sun and AI X risk cost effectiveness general far future impact publication. Retrieved April 10, 2019, from Guesstimate website: https://www.getguesstimate.com/models/13082
Denkenberger, D. C., Cole, D. D., Abdelkhaliq, M., Griswold, M., Hundley, A. B., & Pearce, J. M. (2017). Feeding everyone if the sun is obscured and industry is disabled. International Journal of Disaster Risk Reduction, 21, 284–290.
Denkenberger, D. C., & Pearce, J. M. (2016). Cost-Effectiveness of Interventions for Alternate Food to Address Agricultural Catastrophes Globally. International Journal of Disaster Risk Science, 7(3), 205–215. https://doi.org/10.1007/s13753-016-0097-2
Effective Altruism Concepts. (2019, April 10). Importance, tractability, neglectedness framework. Retrieved April 10, 2019, from Effective Altruism Concepts website: https://concepts.effectivealtruism.com/concepts/importance-neglectedness-tractability/
Foster, J. S., Gjelde, E., Graham, W. R., Hermann, R. J., Kluepfel, H. (Hank) M., Lawson, R. L., … Woodard, J. B. (2004, July 22). Report of the Commission to Assess the Threat to the United States from Electromagnetic Pulse (EMP) Attack. Retrieved June 30, 2016, from Committee on Armed Services House of Representatives website: http://commdocs.house.gov/committees/security/has204000.000/has204000_0.HTM
Foster, Jr, J. S., Gjelde, E., Graham, W. R., Hermann, R. J., Kluepfel, H. (Hank) M., Lawson, R. L., … Woodard, J. B. (2008). Report of the commission to assess the threat to the united states from electromagnetic pulse (emp) attack: Critical national infrastructures. Retrieved from DTIC Document website: http://www.empcommission.org/docs/A2473-EMP_Commission-7MB.pdf
Goodin, D. (2016, January 4). First known hacker-caused power outage signals troubling escalation. Retrieved from http://arstechnica.com/security/2016/01/first-known-hacker-caused-power-outage-signals-troubling-escalation/
Gregory, J., Stouffer, R. J., Molina, M., Chidthaisong, A., Solomon, S., Raga, G., … Stone, D. A. (2007). Climate Change 2007: The Physical Science Basis. Retrieved from http://copa.acguanacaste.ac.cr:8080/handle/11606/461
Griswold, M., Denkenberger, D., Abdelkhaliq, M., Cole, D., Pearce, J., & Taylor, A. R. (2016). Vitamins in Agricultural Catastrophes. Proceedings of the 6th International Disaster and Risk Conference. Presented at the 6th International Disaster and Risk Conference, Davos, Switzerland.
Hayakawa, H., Ebihara, Y., Willis, D. M., Toriumi, S., Iju, T., Hattori, K., … Ribeiro, J. R. (2019). Temporal and Spatial Evolutions of a Large Sunspot Group and Great Auroral Storms around the Carrington Event in 1859. Space Weather.
Kelly-Detwiler, P. (2014, July 31). Failure to Protect U.S. Against Electromagnetic Pulse Threat Could Make 9/11 Look Trivial Someday. Retrieved August 7, 2019, from https://www.forbes.com/sites/peterdetwiler/2014/07/31/protecting-the-u-s-against-the-electromagnetic-pulse-threat-a-continued-failure-of-leadership-could-make-911-look-trivial-someday/#2ed092db7a14
Keramat, M., & Kielbasa, R. (1997). Latin hypercube sampling Monte Carlo estimation of average quality index for integrated circuits. In Analog Design Issues in Digital VLSI Circuits and Systems (pp. 131–142). Springer.
Kinney, R., Crucitti, P., Albert, R., & Latora, V. (2005). Modeling cascading failures in the North American power grid. The European Physical Journal B, 46(1), 101–107. https://doi.org/10.1140/epjb/e2005-00237-9
Klein, C. (2012, March 14). A Perfect Solar Superstorm: The 1859 Carrington Event. Retrieved August 7, 2019, from HISTORY website: https://www.history.com/news/a-perfect-solar-superstorm-the-1859-carrington-event
Krotofil, M., Cardenas, A., Larsen, J., & Gollmann, D. (2014). Vulnerabilities of cyber-physical systems to stale data—Determining the optimal time to launch attacks. International Journal of Critical Infrastructure Protection, 7(4), 213–232.
Li, S. (2017, May 12). A model of the Machine Intelligence Research Institute - Oxford Prioritisation Project - EA Forum. Retrieved August 12, 2019, from https://forum.effectivealtruism.org/posts/NbFZ9yewJHoicpkBr/a-model-of-the-machine-intelligence-research-institute
McIntyre, P. (2016a, April 12). How you can lower the risk of a catastrophic nuclear war. Retrieved August 13, 2019, from 80,000 Hours website: https://80000hours.org/problem-profiles/nuclear-security/
McIntyre, P. (2016b, April 12). How you can lower the risk of a catastrophic nuclear war. Retrieved August 9, 2019, from 80,000 Hours website: https://80000hours.org/problem-profiles/nuclear-security/
Mekhaldi, F., Muscheler, R., Adolphi, F., Aldahan, A., Beer, J., McConnell, J. R., … Synal, H.-A. (2015). Multiradionuclide evidence for the solar origin of the cosmic-ray events of AD 774/5 and 993/4. Nature Communications, 6.
Motesharrei, S., Rivas, J., & Kalnay, E. (2014). Human and nature dynamics (HANDY): Modeling inequality and use of resources in the collapse or sustainability of societies. Ecological Economics, 101, 90–102. https://doi.org/10.1016/j.ecolecon.2014.02.014
Ord, T. (2014, July 3). The timing of labour aimed at reducing existential risk. Retrieved April 10, 2019, from The Future of Humanity Institute website: https://www.fhi.ox.ac.uk/the-timing-of-labour-aimed-at-reducing-existential-risk/
Pagliery, J. (2015, October 16). Sniper attack on California power grid may have been “an insider,” DHS says. Retrieved August 8, 2019, from CNNMoney website: https://money.cnn.com/2015/10/16/technology/sniper-power-grid/index.html
Prehoda, E. W., Schelly, C., & Pearce, J. M. (2017). US strategic solar photovoltaic-powered microgrid deployment for enhanced national security. Renewable and Sustainable Energy Reviews, 78, 167–175.
Pry, P. V. (2014, May 8). Electromagnetic pulse (EMP): Threat to critical infrastructure. Retrieved August 14, 2019, from https://www.govinfo.gov/content/pkg/CHRG-113hhrg89763/html/CHRG-113hhrg89763.htm
Ten, C.-W., Manimaran, G., & Liu, C.-C. (2010). Cybersecurity for critical infrastructures: Attack and defense modeling. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(4), 853–865.
Umbach, F. (2013, June 29). World Review | Energy infrastructure targeted as cyber attacks increase globally. Retrieved August 8, 2019, from https://web.archive.org/web/20130629041842/https://worldreview.info/content/energy-infrastructure-targeted-cyber-attacks-increase-globally
I've been trying to steelman social justice. Here's one perspective that I like. Forgive me for writing about politics. I don't claim to be certain about any of this. I don't claim that this is the whole truth. Feel free to criticize me, even in bad faith. I won't take it personally.
Inference chains towards "bad"
Consider a situation that you don't like. Let's say you're walking through an alleyway at night. A stranger appears behind you, carrying a large knife.
This situation is one of suffering.
But does that make sense? You haven't actually been stabbed. It is not certain that you will get stabbed. It seems that your suffering isn't necessarily based on reality; it seems to be based on an inference chain:
Stranger with knife -> Stranger will stab me -> I will bleed out and die -> "bad"
And this inference chain can be arbitrarily long. The suffering might already start before the stranger appears:
Alone in an alleyway -> Stranger with knife may appear -> ...
Even unrelated events can cause you to intuitively anticipate this situation:
Lost friends at party -> have to walk home alone -> alone in alleyway -> ...
The lemma I'm constructing here is that suffering isn't based on something being bad, but on a constructed narrative that has a bad ending.
One plausible definition of meditation is that it dissolves narratives. You stare at a narrative's constituent facts long enough that they overwrite it. You're replacing the symbol with the substance.
I once went through a painful breakup. She wanted to try poly, and I agreed. Then when she slept with another guy, I couldn't help but notice some subtle details that led me to believe in a narrative that I was just the "beta male" for her.
I could "disprove" this narrative all I wanted. For every signal that it was true there were at least 10 that it wasn't. But still there was some shadowy part of my psyche that wanted to believe it, and this shadow would sometimes take over in an angry stupor.
Then I went to a meditation retreat. Instead of arguing against this shadow, I stared at it for a good while. The narrative dissolved, and with that, my suffering dissolved as well. We stopped fighting. This, I hypothesize, is why meditation eventually leads to a total lack of suffering. You dissolve all the narratives you believe in and "see the world for how it truly is", as the buddhists suggest.
My second lemma is that meditation is one strategy within a broader project: dissolving the narratives that cause suffering. Dissolving these narratives is a worthy goal that all of us should strive towards.
Personal boundaries as a first step
One important prerequisite to dissolving a narrative is that it is not currently active. If you meditate long enough, you might eventually be able to tolerate a stranger with a knife without any suffering. But you will not be able to let go of this suffering while the stranger with the knife walks behind you.
In other words, to build tolerance for a perceived lack of safety, you first need to feel safe.
Consider the idea of personal boundaries. I have found this concept to be extremely useful in practice. A personal boundary, as I define it here, is not a loose declaration of where the line is, but a psychological fact about where the line actually is.
The line lies between situations that activate an inference chain towards "bad" and situations that don't: between situations that make you suffer and situations that don't.
In order to dissolve our narratives, we first have to set the boundaries that create a safe space. Only then can we start growing our base, increasing the number of situations in which we don't suffer.
Public and personal boundaries
Here's where this story becomes political.
I already defined personal boundaries: the boundary between the situations where one suffers and the situations where one doesn't, regardless of the person's own opinion about where it lies (though it is good practice to always believe people about their boundaries).
Public boundaries are an essential ingredient of our culture. I define them as the set of behaviors that are acceptable in a public space, with the implication that keeping to this set of behaviors will ensure that no one's personal boundaries are crossed.
In other words, public boundaries are what we agree to be the lowest common denominator of what makes a person feel safe.
In any culture, there is a fundamental trade-off between freedom and safety. If you punish more behaviors, more people will feel safe, but there will be less freedom. That's why, as we negotiate public boundaries, the incentives will be in opposition to each other. No wonder this topic is such a shitshow.
Social justice, as I currently understand it, is at least partially about reducing freedom in order to increase safety. Some (many) people, especially those in minority groups, don't have a stable base to grow from. We should have more stringent behavioral norms so that these people can feel safe, in order for them to develop a tolerance for said behavior in the first place.
It is also about reducing things that are actually bad, but the controversial part is that it also aims to reduce the apparent threat of bad things, even when that appearance of threat is unfounded.
Some people are afraid of spiders. Right now we have some people out there with gigantic posters of spiders in their bedrooms. We're telling them that they are safe, therefore shut up about it already. But perhaps they won't be able to see that the posters are harmless, until we get rid of them.
(To be clear, I'm not saying that there is no danger at all. I don't mean to invalidate anyone's experience. Quite the opposite, actually)
This was one of the most thought-provoking posts I read this month, mostly because I have spent a really large number of hours of my life sleeping, I have significantly increased the amount I sleep over the past three years, and this post has me seriously considering reducing that number again.
The opening section of the article:
Matthew Walker's book Why We Sleep was published in September 2017. Part survey of sleep research, part self-help book, it was praised by The New York Times, The Guardian, and many others. It was named one of NPR's favorite books of 2017. After publishing the book, Walker gave a TED talk, a talk at Google, and appeared on Joe Rogan's and Peter Attia's podcasts. A month after the book's publication, he became a sleep scientist at Google.
On page 8 of the book, Walker writes:
> [T]he real evidence that makes clear all of the dangers that befall individuals and societies when sleep becomes short have not been clearly telegraphed to the public … In response, this book is intended to serve as a scientifically accurate intervention addressing this unmet need [emphasis in this quote and in all quotes below mine]
In the process of reading the book and encountering some extraordinary claims about sleep, I decided to compare the facts it presented with the scientific literature. I found that the book consistently overstates the problem of lack of sleep, sometimes egregiously so. It misrepresents basic sleep research and contradicts its own sources.
In one instance, Walker claims that sleeping less than six or seven hours a night doubles one’s risk of cancer – this is not supported by the scientific evidence. In another instance, Walker seems to have invented a “fact” that the WHO has declared a sleep loss epidemic. In yet another instance, he falsely claims that the National Sleep Foundation recommends 8 hours of sleep per night, and then uses this “fact” to falsely claim that two-thirds of people in developed nations sleep less than the “the recommended eight hours of nightly sleep” – a myth that spread like wildfire after the book’s publication.
Walker’s book has likely wasted thousands of hours of life and worsened the health of people who read it and took its recommendations at face value.
Any book of Why We Sleep’s length is bound to contain some factual errors. Therefore, to avoid potential concerns about cherry-picking the few inaccuracies scattered throughout, in this essay, I’m going to highlight the five most egregious scientific and factual errors Walker makes in Chapter 1 of the book. This chapter contains 10 pages and constitutes less than 4% of the book by the total word count.
A draft from my personal-productivity journal.

TL;DR
A natural 2x2 falls out when you organize tasks by how clear you are on their start/end times (or the amount of time they'll actually take), and how clear it is in a non-time sense that you've started or finished them. I call these two axes time and task delimitation.

Time delimitation and task delimitation defined

Time delimited
Things which are highly time-delimited have clear, natural start times and end times. Things can also be considered highly time-delimited if you know with a good deal of accuracy how long they will take. Things which are less time-delimited are fuzzy as to when they start and end.

Task delimited
Things which are highly task-delimited have clear, natural non-temporal start and end states. Things which are not highly task-delimited are fuzzier as to what their starting and ending states are.

The four kinds of labor
Here's how I generally name and think through these categories.

Factory Labor / Serfdom (high time; high task)
Some things are both highly time- and task-delimited. The example par excellence is the academic test. In a typical test, you have a specific amount of time, and a specific start and end state you want to leave the test in. Walk in -> take test -> hand in -> walk out. The preparation for the test is more like Salary Labor (low time/high task, measured against internal confidence that you've studied enough), but the test itself is Factory Labor.
Another example might be the manufacturing work done after a designer has gone through a few mockups and is ready to begin building a prototype. The goal: Create a prototype given blueprints and materials. You have a limited number of workshop hours you can spend in an average day to get them done. This task feels more task- than time-delimited to me, because you can run overtime, but it's pretty high on both axes.
I call this Factory Labor, or Serfdom if I'm feeling ornery. You have a specific number of widgets you have to crank out in your shift, so you work to accomplish that. Most people will end up cranking out about the same, average number of widgets, for the same, average amount of time.

Wage Labor / Meetings (high time; low task)
I work IT support as a part time job at my college. I have a set number of hours I have to be in for, and while some of the tasks contained within that time are Factory Work-esque, I'm generally being paid to be there in case I'm needed.
I call highly time-delimited, low task-delimited work like this Wage Labor for that reason. But much like Factory Labor, don't let the name fool you: not all work that falls in this category is in the service industry. Another name I considered was Meetings, because well-organized meetings are highly time-delimited, but often not highly task-delimited: there is an hour blocked for you to discuss whatever you bring to the table.

Salary Labor (low time; high task)
How common is it that we have things we know we have to get done, but we don't know exactly how long they will take? If you ask me or my dad, pretty often, actually. Past a certain point, it actually becomes quite difficult to accurately estimate how long a thing will take on its own. I can estimate that each question on my real analysis homework will take me about an hour to solve, but adding up six uniform distributions makes me a lot more nervous about saying that the homework in total will take about six hours.
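That nervousness about summing six one-hour estimates can be illustrated with a quick Monte Carlo sketch (the Uniform(0.5, 1.5)-hours-per-problem distribution is my own illustrative assumption, not from this post):

```python
import random

# Hedged sketch: suppose each of six homework problems takes
# Uniform(0.5, 1.5) hours (illustrative numbers). The total's spread,
# in absolute hours, is much wider than any single problem's, which is
# why a six-hour estimate feels shakier than a one-hour one.
random.seed(0)

def total_time(n=6):
    return sum(random.uniform(0.5, 1.5) for _ in range(n))

totals = [total_time() for _ in range(10_000)]
singles = [random.uniform(0.5, 1.5) for _ in range(10_000)]

spread_total = max(totals) - min(totals)     # several hours
spread_single = max(singles) - min(singles)  # at most one hour
assert spread_total > spread_single
```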
Now we're in the territory of Salary Labor. Salaried workers have a huge advantage in that their paychecks tend to be extremely regular; the tradeoff, of course, is that quite frequently the work takes longer than an ordinary 9-to-5 to get done. You have to stay overtime, and you don't necessarily know how long that will take.

Labor of Love (low time; low task)
Say you're practicing guitar. You're pretty serious about it; you'd like to form a rock band maybe, someday, but for now you're just satisfied with becoming a better guitarist.
That's ... actually a pretty vague category, though, isn't it? Like, are we talking "technical death metal" better, or "blues throwback" better? I think most musicians would find it kind of a weird thing to try to put a box around in general, honestly, even if they could go all in on practicing specifics that add up to the goal, like fingerpicking or learning scales.
In addition, unlike what Malcolm Gladwell likes to say, there isn't actually some magic "10,000 Hour" number you have to pass before you get certified as a Trve Kvlt Gvitarist. Really, you can't walk into these kinds of things with much of an idea at all about how long overall it'll take you. Best you can do is say "I'll practice for half an hour a day", but even then, there will be days where you play a lot more than that. (Gigs, for instance.)
This is what I call the Labor of Love quadrant. It's actually my favorite of the 4, and the one I'm most inclined to follow with my personal pursuits -- but it's almost never the best path to making money, due to the sheer amount of ambiguity around everything. How long will this take? Oh, you know. Will it at least be good? Might, might not. Aaaaaaaah! Just give your boss a fucking answer!
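The whole 2x2 can be summarized as a tiny lookup; here's a sketch using the quadrant names from this post (the function name and boolean encoding are my own, purely for illustration):

```python
# Map the two delimitation axes to the quadrant names used above.
def quadrant(time_delimited: bool, task_delimited: bool) -> str:
    return {
        (True, True): "Factory Labor / Serfdom",
        (True, False): "Wage Labor / Meetings",
        (False, True): "Salary Labor",
        (False, False): "Labor of Love",
    }[(time_delimited, task_delimited)]

print(quadrant(True, True))    # -> Factory Labor / Serfdom
print(quadrant(False, False))  # -> Labor of Love
```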