
Major Update on Cost Disease

Published on June 5, 2019 7:10 PM UTC

Recently I asked about cost disease at the Austin, TX meetup and someone responded "Isn't it just increasing labor costs via the Baumol effect?" and I said "huh?" The next day a friend on facebook linked to Marginal Revolution (MR) discussing the Baumol effect. Next thing I know I'm nerd-sniped.

Turns out Alex Tabarrok at MR just published a free book with the thesis, "Cost disease is mostly just the Baumol effect, which btw isn't a disease and is actually good in a way."
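For anyone who bounced off the term like I did: the Baumol effect is what happens when productivity growth in one sector pulls wages up everywhere, including sectors whose productivity is flat. Here's a toy numerical sketch (mine, not from the book; the 2% growth rate is an arbitrary assumption):

```python
# Toy Baumol effect: manufacturing productivity grows, teaching
# productivity stays flat, but teachers' wages must track the
# economy-wide wage or everyone would leave for the factory.

years = 50
wage = 1.0              # economy-wide hourly wage, normalized
mfg_productivity = 1.0  # widgets per worker-hour
edu_productivity = 1.0  # pupils taught per teacher-hour (stays flat)

for _ in range(years):
    mfg_productivity *= 1.02  # assumed 2% annual productivity growth
    wage *= 1.02              # wages rise in step with the productive sector

print(f"unit cost, manufacturing: {wage / mfg_productivity:.2f}")  # 1.00
print(f"unit cost, education:     {wage / edu_productivity:.2f}")  # ~2.69
```

Education gets roughly 2.7x more expensive per unit over fifty years without anyone getting worse at their job, which is the sense in which Tabarrok can call it "not a disease."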

How does this fit in with Scott Alexander's original posts on cost disease? Well here's an imaginary dialogue to demonstrate how I think things went (MR sometimes means Alex Tabarrok and sometimes Tyler Cowen):

-----

(2017)

MR: Education and healthcare are experiencing cost disease!

Scott: Wow you're right! Let's look at a bunch of possible reasons but see that none of them quite work. For example, it can't be the Baumol effect, because that implies increasing wages (e.g., for teachers, professors, and doctors), but we see all those wages increasing at rates equal to or less than the average.

MR: Great post Scott!

Scott: ... And despite lots of good comments, I still can't tell what the cause is. It's certainly not just the Baumol effect though.

(2019)

MR: So we did some research and actually, looks like it's mostly just the Baumol effect.

Scott: ???

-----

(Scott's last line is a stand-in for both "Is Scott going to respond?" and "what the eff?")

This back-and-forth of course made me extra confused. What did Scott and Alex see differently? Have the relevant salaries been greatly increasing or not?

Seems they disagree on the basic facts here. Focusing just on public K-12 education for simplicity, in 2017 Scott posted this (section III) graph:

[graph not included in the feed]

which seems to show unimpressive changes in teacher salaries, ruling out Baumol. In contrast, Alex's new 2019 book gives this (page 19) graph:

[graph not included in the feed]
which shows huge increases in "expenditures per instructor." And he insists that trend is mostly driven by increases in teacher salary and benefits. So either those are some serious benefits, or Scott and Alex are living in different USAs. I'm not sure what's going on here. It's very exciting that Alex says he's solved "cost disease," but it seems like a piece of the story is either missing or confused. (Or I just need to read his whole book.) Comments welcome!




Book Review: The Secret Of Our Success

Published on June 5, 2019 6:50 AM UTC

[Previously in sequence: Epistemic Learned Helplessness]

I.

“Culture is the secret of humanity’s success” sounds like the most vapid possible thesis. The Secret Of Our Success by anthropologist Joseph Heinrich manages to be an amazing book anyway.

Heinrich wants to debunk (or at least clarify) a popular view where humans succeeded because of our raw intelligence. In this view, we are smart enough to invent neat tools that help us survive and adapt to unfamiliar environments.

Against such theories: we cannot actually do this. Heinrich walks the reader through many stories about European explorers marooned in unfamiliar environments. These explorers usually starved to death. They starved to death in the middle of endless plenty. Some of them were in Arctic lands that the Inuit considered among their richest hunting grounds. Others were in jungles, surrounded by edible plants and animals. One particularly unfortunate group was in Alabama, and would have perished entirely if they hadn’t been captured and enslaved by local Indians first.

These explorers had many advantages over our hominid ancestors. For one thing, their exploration parties were made up entirely of strong young men in their prime, with no need to support women, children, or the elderly. They were often selected for their education and intelligence. Many of them were from Victorian Britain, one of the most successful civilizations in history, full of geniuses like Darwin and Galton. Most of them had some past experience with wilderness craft and survival. But despite their big brains, when faced with the task our big brains supposedly evolved for – figuring out how to do hunting and gathering in a wilderness environment – they failed pathetically.

Nor is it surprising that they failed. Hunting and gathering is actually really hard. Here’s Heinrich’s description of how the Inuit hunt seals:

You first have to find their breathing holes in the ice. It’s important that the area around the hole be snow-covered—otherwise the seals will hear you and vanish. You then open the hole, smell it to verify it’s still in use (what do seals smell like?), and then assess the shape of the hole using a special curved piece of caribou antler. The hole is then covered with snow, save for a small gap at the top that is capped with a down indicator. If the seal enters the hole, the indicator moves, and you must blindly plunge your harpoon into the hole using all your weight. Your harpoon should be about 1.5 meters (5ft) long, with a detachable tip that is tethered with a heavy braid of sinew line. You can get the antler from the previously noted caribou, which you brought down with your driftwood bow.

The rear spike of the harpoon is made of extra-hard polar bear bone (yes, you also need to know how to kill polar bears; best to catch them napping in their dens). Once you’ve plunged your harpoon’s head into the seal, you’re then in a wrestling match as you reel him in, onto the ice, where you can finish him off with the aforementioned bear-bone spike.

Now you have a seal, but you have to cook it. However, there are no trees at this latitude for wood, and driftwood is too sparse and valuable to use routinely for fires. To have a reliable fire, you’ll need to carve a lamp from soapstone (you know what soapstone looks like, right?), render some oil for the lamp from blubber, and make a wick out of a particular species of moss. You will also need water. The pack ice is frozen salt water, so using it for drinking will just make you dehydrate faster. However, old sea ice has lost most of its salt, so it can be melted to make potable water. Of course, you need to be able to locate and identify old sea ice by color and texture. To melt it, make sure you have enough oil for your soapstone lamp.

No surprise that stranded explorers couldn’t figure all this out. It’s more surprising that the Inuit did. And although the Arctic is an unusually hostile place for humans, Heinrich makes it clear that hunting-gathering techniques of this level of complexity are standard everywhere. Here’s how the Indians of Tierra del Fuego make arrows:

Among the Fuegians, making an arrow requires a 14-step procedure that involves using seven different tools to work six different materials. Here are some of the steps:

– The process begins by selecting the wood for the shaft, which preferably comes from chaura, a bushy, evergreen shrub. Though strong and light, this wood is a non-intuitive choice since the gnarled branches require extensive straightening (why not start with straighter branches?).

– The wood is heated, straightened with the craftsman’s teeth, and eventually finished with a scraper. Then, using a pre-heated and grooved stone, the shaft is pressed into the grooves and rubbed back and forth, pressing it down with a piece of fox skin. The fox skin becomes impregnated with the dust, which prepares it for the polishing stage (Does it have to be fox skin?).

– Bits of pitch, gathered from the beach, are chewed and mixed with ash (What if you don’t include the ash?).

– The mixture is then applied to both ends of a heated shaft, which must then be coated with white clay (what about red clay? Do you have to heat it?). This prepares the ends for the fletching and arrowhead.

– Two feathers are used for the fletching, preferably from upland geese (why not chicken feathers?).

– Right-handed bowmen must use feathers from the left wing of the bird, and vice versa for lefties (Does this really matter?).

– The feathers are lashed to the shaft using sinews from the back of the guanaco, after they are smoothed and thinned with water and saliva (why not sinews from the fox that I had to kill for the aforementioned skin?).

Next is the arrowhead, which must be crafted and then attached to the shaft, and of course there is also the bow, quiver and archery skills. But, I’ll leave it there, since I think you get the idea.

How do hunter-gatherers know how to do all this? We usually summarize it as “culture”. How did it form? Not through some smart Inuit or Fuegian person reasoning it out; if that had been it, smart European explorers should have been able to reason it out too.

The obvious answer is “cultural evolution”, but Heinrich isn’t much better than anyone else at taking the mystery out of this phrase. Trial and error must have been involved, and less successful groups/people imitating the techniques of more successful ones. But is that really a satisfying explanation?

I found the chapter on language a helpful reminder that we already basically accept something like this is true. How did language get invented? I’m especially interested in this question because of my brief interactions with conlanging communities – people who try to construct their own languages as a hobby or as part of a fantasy universe, like Tolkien did with Elvish. Most people are terrible at this; their languages are either unusable, or exact clones of English. Only people who (like Tolkien) already have years of formal training in linguistics can do a remotely passable job. And you’re telling me the original languages were invented by cavemen? Surely there was no committee of Proto-Indo-European nomads that voted on whether to have an inflecting or agglutinating tongue? Surely nobody ran out of their cave shouting “Eureka!” after having discovered the interjection? We just kind of accept that after cavemen worked really hard to communicate with each other, eventually language – still one of the most complicated and impressive productions of the human race – just sort of happened.

Taking the generation of culture as secondary to this kind of mysterious process, Heinrich turns to its transmission. If cultural generation happens at a certain rate, then the fidelity of transmission determines whether a given society advances, stagnates, or declines.

For Heinrich, humans started becoming more than just another species of monkey when we started transmitting culture with high fidelity. Some anthropologists talk about the Machiavellian Intelligence Hypothesis – the theory that humans evolved big brains in order to succeed at social maneuvering and climbing dominance hierarchies. Heinrich counters with his own Cultural Intelligence Hypothesis – humans evolved big brains in order to be able to maintain things like Inuit seal hunting techniques. Everything that separates us from the apes is part of an evolutionary package designed to help us maintain this kind of culture, exploit this kind of culture, or adjust to the new abilities that this kind of culture gave us.

II.

Secret gives many examples of culture-related adaptations, and not all of them are in the brain.

One of the most important differences between man and ape is our puny digestive tracts:

Our mouths are the size of the squirrel monkey’s, a species that weighs less than three pounds. Chimpanzees can open their mouths twice as wide as we can and hold substantial amounts of food compressed between their lips and large teeth. We also have puny jaw muscles that reach up only to just below our ears. Other primates’ jaw muscles stretch to the top of their heads, where they sometimes even latch onto a central bony ridge. Our stomachs are small, having only a third of the surface area that we’d expect for a primate of our size, and our colons are too short, being only 60% of their expected mass.

Compared to other animals, we have such atrophied digestive tracts that we shouldn’t be able to live. What saves us? All of our food processing techniques, especially cooking, but also chopping, rinsing, boiling, and soaking. We’ve done much of the work of digestion before food even enters our mouths. Our culture teaches us how to do this, both in broad terms like “hold things over fire to cook them” and in specific terms like “this plant needs to be soaked in water for 24 hours to leach out the toxins”. Each culture has its own cooking knowledge related to the local plants and animals; a frequent cause of death among European explorers was cooking things in ways that didn’t unlock any of the nutrients, and so starving while apparently well-fed.

All of this is cultural. Heinrich is kind of cruel in his insistence on this. He recommends readers go outside and try to start a fire. He even gives some helpful hints – flint is involved, rubbing two sticks together works for some people, etc. He predicts – and stories I’ve heard from unfortunate campers confirm – that you will not be able to do this, despite an IQ far beyond that of most of our hominid ancestors. In fact, some groups (most notably the aboriginal Tasmanians) seem to have lost the ability to make fire, and never rediscovered it. Fire-making was discovered a small number of times, maybe once, and has been culturally transmitted since then.

And food processing techniques are even more complicated. Nixtamalization of corn, necessary to prevent vitamin deficiencies, involves soaking the corn in a solution containing ground-up burnt seashells. The ancient Mexicans discovered this and lived off corn just fine for millennia. When the conquistadors took over, they ignored it and ate corn straight. For four hundred years, Europeans and Americans ate unnixtamalized corn. By official statistics, three million Americans came down with corn-related vitamin deficiencies during this time, and up to a hundred thousand died. It wasn’t until 1937 that Western scientists discovered which vitamins were involved and developed an industrial version of nixtamalization that made corn safe. Early 1900s Americans were very smart and had lots of advantages over ancient Mexicans. But the ancient Mexicans’ culture got this one right in a way it took Westerners centuries to match.

Humans are persistence hunters: they cannot run as fast as gazelles, but they can keep running for longer than gazelles (or almost anything else). Why did we evolve into that niche? The secret is our ability to carry water. Every hunter-gatherer culture has invented its own water-carrying techniques, usually some kind of waterskin. This let humans switch to perspiration-based cooling systems, which allowed them to run for as long as they wanted.

And humans are consummate tool users. In some cases, we evolved in order to use tools better; our hands outclass those of any other ape in terms of finesse. In other cases, we devolved systems that were no longer necessary once tools took over. We are vastly weaker than any other ape. Heinrich describes a circus act of the 1940s where the ringmaster would challenge strong men in the audience to wrestle a juvenile chimpanzee. The chimpanzee was tied up, dressed in a mask that prevented it from biting, and wearing soft gloves that prevented it from scratching. No human ever lasted more than five seconds. Our common ancestor with other apes grew weaker and weaker as we became more and more reliant on artificial weapons to give us an advantage.

III.

But most of our differences from other apes are indeed in the brain. They’re just not necessarily where you would expect.

Tomasello et al tested human toddlers vs. apes on a series of traditional IQ type questions. The match-up was surprisingly fair; in areas like memory, logic, and spatial reasoning, the three species did about the same. But in ability to learn from another person, humans wiped the floor with the other two ape species:

[chart not included in the feed]
Remember, Heinrich thinks culture accumulates through random mutation. Humans don’t have control over how culture gets generated. They have more control over how much of it gets transmitted to the next generation. If 100% gets transmitted, then as more and more mutations accumulate, the culture becomes better and better. If less than 100% gets transmitted, then at some point new culture gained and old culture lost fall into equilibrium, and your society stabilizes at some higher or lower technological level. This means that transmitting culture to the next generation is maybe the core human skill. The human brain is optimized to make this work as well as possible.
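The equilibrium claim is easy to make concrete. Here is a minimal sketch of my own (Heinrich doesn’t give this model; f and g are illustrative parameters): suppose each generation retains a fraction f of the existing cultural stock and adds g units of new innovation.

```python
# Cultural stock K under imperfect transmission: K' = f*K + g.
# With f < 1, K converges to the equilibrium g / (1 - f);
# with f = 1, it grows without bound.

def cultural_stock(f, g, generations=1000):
    k = 0.0
    for _ in range(generations):
        k = f * k + g  # retain fraction f, add g new innovations
    return k

print(cultural_stock(0.90, 1.0))  # ~10: low fidelity, culture plateaus
print(cultural_stock(0.99, 1.0))  # ~100: slightly higher fidelity, tenfold richer culture
print(cultural_stock(1.00, 1.0))  # 1000 and still climbing: cumulative culture
```

A small gain in fidelity moves the plateau a lot, which is the sense in which transmitting culture to the next generation could be “maybe the core human skill”.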

Human children are obsessed with learning things. And they don’t learn things randomly. There seem to be “biases in cultural learning”, i.e. slots in an infant’s mind that they know need to be filled with knowledge, and which they preferentially seek out the knowledge necessary to fill.

One slot is for language. Human children naturally listen to speech (as early as in the womb). They naturally prune the phonemes they are able to produce and distinguish to the ones in the local language. And they naturally figure out how to speak and understand what people are saying, even though learning a language is hard even for smart adults.

Another slot is for animals. In a world where megafauna has been relegated to zoos, we still teach children their ABCs with “L is for lion” and “B is for bear”, and children still read picture books about Mr. Frog and Mrs. Snake holding tea parties. Heinrich suggests that just as the young brain is hard-coded to want to learn language, so it is hard-coded to want to learn the local animal life (little boys’ vehicle obsession may be a weird outgrowth of this; buses and trains are the closest thing to local megafauna that most of them will encounter).

Another slot is for plants:

To see this system in operation, let’s consider how infants respond to unfamiliar plants. Plants are loaded with prickly thorns, noxious oils, stinging nettles and dangerous toxins, all genetically evolved to prevent animals like us from messing with them. Given our species’ wide geographic range and diverse use of plants as foods, medicines and construction materials, we ought to be primed to both learn about plants and avoid their dangers. To explore this idea in the lab, the psychologists Annie Wertz and Karen Wynn first gave infants, who ranged in age from eight to eighteen months, an opportunity to touch novel plants (basil and parsley) and artifacts, including both novel objects and common ones, like wooden spoons and small lamps.

The results were striking. Regardless of age, many infants flatly refused to touch the plants at all. When they did touch them, they waited substantially longer than they did with the artifacts. By contrast, even with the novel objects, infants showed none of this reluctance. This suggests that well before one year of age infants can readily distinguish plants from other things, and are primed for caution with plants. But, how do they get past this conservative predisposition?

The answer is that infants keenly watch what other people do with plants, and are only inclined to touch or eat the plants that other people have touched or eaten. In fact, once they get the ‘go ahead’ via cultural learning, they are suddenly interested in eating plants. To explore this, Annie and Karen exposed infants to models who both picked fruit from plants and also picked fruit-like things from an artifact of similar size and shape to the plant. The models put both the fruit and the fruit-like things in their mouths. Next, the infants were given a choice to go for the fruit (picked from the plant) or the fruit-like things picked from the object. Over 75% of the time the infants went for the fruit, not the fruit-like things, since they’d gotten the ‘go ahead’ via cultural learning.

As a check, the infants were also exposed to models putting the fruit or fruit-like things behind their ears (not in their mouths). In this case, the infants went for the fruit or fruit-like things in equal measure. It seems that plants are most interesting if you can eat them, but only if you have some cultural learning cues that they aren’t toxic.

After Annie first told me about her work while I was visiting Yale in 2013, I went home to test it on my 6-month-old son, Josh. Josh seemed very likely to overturn Annie’s hard empirical work, since he immediately grasped anything you gave him and put it rapidly in his mouth. Comfortable in his mom’s arms, I first offered Josh a novel plastic cube. He delighted in grabbing it and shoving it directly into his mouth, without any hesitation. Then, I offered him a sprig of arugula. He quickly grabbed it, but then paused, looked with curious uncertainty at it, and then slowly let it fall from his hand while turning to hug his mom.

It’s worth pointing out how rich the psychology is here. Not only do infants have to recognize that plants are different from objects of similar size, shape and color, but they need to create categories for types of plants, like basil and parsley, and distinguish ‘eating’ from just ‘touching’. It does them little good to code their observation of someone eating basil as ‘plants are good to eat’ since that might cause them to eat poisonous plants as well as basil. But, it also does them little good to narrowly code the observation as ‘that particular sprig of basil is good to eat’ since that particular sprig has just been eaten by the person they are watching. This is another content bias in cultural learning.

This ties into the more general phenomenon of figuring out what’s edible. Most Westerners learn insects aren’t edible; some Asians learn that they are. This feels deeper than just someone telling you insects aren’t edible and you believing them. When I was in Thailand, my guide offered me a giant cricket, telling me it was delicious. I believed him when he said it was safe to eat, I even believed him when he said it tasted good to him, but my conditioning won out – I didn’t eat the cricket. There seems to be some process where a child’s brain learns what is and isn’t locally edible, then hard-codes it against future change.

(Or so they say; I’ve never been able to eat shrimp either.)

Another slot is for gender roles. By now we’ve all heard the stories of progressives who try to raise their children without any exposure to gender. Their failure has sometimes been taken as evidence that gender is hard-coded. But it can’t be quite that simple: some modern gender roles, like girls = pink, are far from obvious or universal. Instead, it looks like children have a hard-coded slot that gender roles go into, work hard to figure out what the local gender roles are (even if their parents are trying to confuse them), then latch onto them and don’t let go.

In the Cultural Intelligence Hypothesis, humans live in obligate symbiosis with a culture. A brain without an associated culture is incomplete and not very useful. So the infant brain is adapted to seek out the important aspects of its local culture almost from birth and fill them into the appropriate slots in order to become whole.

IV.

The next part of the book discusses post-childhood learning. This plays an important role in hunter-gatherer tribes:

While hunters reach their peak strength and speed in their twenties, individual hunting success does not peak until around age 30, because success depends more on know-how and refined skills than on physical prowess.

This part of the book made most sense in the context of examples like the Inuit seal-hunting strategy which drove home just how complicated and difficult hunting-gathering was. Think less “Boy Scouts” and more “PhD”; a primitive tribesperson’s life requires mastery of various complicated technologies and skills. And the difference between “mediocre hunter” and “great hunter” can be the difference between high status (and good mating opportunities) and low status, or even between life and death. Hunter-gatherers really want to learn the essentials of their hunter-gatherer lifestyle, and learning it is really hard. Their heuristics are:

Learn from people who are good at things and/or widely-respected. If you haven’t already read about the difference between dominance and prestige hierarchies, check out Kevin Simler’s blog post on the topic. People will fear and obey authority figures like kings and chieftains, but they give a different kind of respect (“prestige”) to people who seem good at things. And since it’s hard to figure out who’s good at things (can a non-musician who wants to start learning music tell the difference between a merely good performer and one of the world’s best?) most people use the heuristic of respecting the people who other people respect. Once you identify someone as respect-worthy, you strongly consider copying them in, well, everything:

To understand prestige as a social phenomenon, it’s crucial to realize that it’s often difficult to figure out what precisely makes someone successful. In modern societies, the success of a star NBA basketball player might arise from his:

(1) intensive practice in the offseason
(2) sneaker preference
(3) sleep schedule
(4) pre-game prayer
(5) special vitamins
(6) taste for carrots

Any or all of these might increase his success. A naïve learner can’t tell all the causal links between an individual’s practices and his success. As a consequence, learners often copy their chosen models broadly across many domains. Of course, learners may place more weight on domains that for one reason or other seem more causally relevant to the model’s success. This copying often includes the model’s personal habits or styles as well as their goals and motivations, since these may be linked to their success. This “if in doubt, copy it” heuristic is one of the reasons why success in one domain converts to influence across a broad range of domains.

The immense range of celebrity endorsements in modern societies shows the power of prestige. For example, NBA star LeBron James, who went directly from high school to the pros, gets paid millions to endorse State Farm Insurance. Though a stunning basketball talent, it’s unclear why Mr. James is qualified to recommend insurance companies. Similarly, Michael Jordan famously wore Hanes underwear and apparently Tiger Woods drove Buicks. Beyoncé drinks Pepsi (at least in commercials). What’s the connection between musical talent and sugary cola beverages?

Finally, while new medical findings and public educational campaigns only gradually influence women’s approach to preventive medicine, Angelina Jolie’s single op-ed in the New York Times, describing her decision to get a preventive double mastectomy after learning she had the ‘faulty’ BRCA1 gene, flooded clinics from the U.K. to New Zealand with women seeking genetic screenings for breast cancer. Thus, as an unwanted evolutionary side effect, prestige turns out to be worth millions, and represents a powerful and underutilized public health tool.

Of course, this creates the risk of prestige cascades, where some irrelevant factor (Heinrich mentions being a reality show star) catapults someone to fame, everyone talks about them, and you end up with Muggeridge’s definition of a celebrity: someone famous for being famous.

Some of this makes more sense if you go back to the evolutionary roots, and imagine watching the best hunter in your tribe to see what his secret is, or being nice to him in the hopes that he’ll take you under his wing and teach you stuff.

(but if all this is true, shouldn’t public awareness campaigns that hire celebrity spokespeople be wild successes? Don’t they just as often fail, regardless of how famous a basketball player they can convince to lecture schoolchildren about how Winners Don’t Do Drugs?)

Learn from people who are like you. If you are a man, it is probably a bad idea to learn fashion by observing women. If you are a servant, it is probably a bad idea to learn the rules of etiquette by observing how the king behaves. People are naturally inclined to learn from people more similar to themselves.

Heinrich ties this in to various studies showing that black students learn best from a black teacher, female students from a female teacher, et cetera.

Learn from old people. Humans are almost unique in having menopause; most animals keep reproducing until they die in late middle-age. Why does evolution want humans to stick around without reproducing?

Because old people have already learned the local culture and can teach it to others. Heinrich asks us to throw out any personal experience we have of elders; we live in a rapidly-changing world where an old person is probably “behind the times”. But for most of history, change happened glacially slowly, and old people would have spent their entire lives accumulating relevant knowledge. Imagine a world where when a Silicon Valley programmer can’t figure out how to make his code run, he calls up his grandfather, who spent fifty years coding apps for Google and knows every programming language inside and out.

Sometimes important events only happen once in a generation. Heinrich tells the story of an Australian aboriginal tribe facing a massive drought. Nobody knew what to do except Paralji, the tribe’s oldest man, who had lived through the last massive drought and remembered where his own elders had told him to find the last-resort waterholes.

This same dynamic seems to play out even in other species:

In 1993, a severe drought hit Tanzania, resulting in the death of 20% of the African elephant calves in a population of about 200. This population contained 21 different families, each of which was led by a single matriarch. The 21 elephant families were divided into 3 clans, and each clan shared the same territory during the wet season (so, they knew each other). Researchers studying these elephants have analyzed the survival of the calves and found that families led by older matriarchs suffered fewer deaths of their calves during this drought.

Moreover, two of the three elephant clans unexpectedly left the park during the drought, presumably in search of water, and both had much higher survival rates than the one clan that stayed behind. It happens that these severe droughts only hit about once every four to five decades, and the last one hit around 1960. After that, sadly, elephant poaching in the 1970s killed off many of the elephants who would have been old enough in 1993 to recall the 1960 drought. However, it turns out that exactly one member of each of the two clans who left the park, and survived more effectively, was old enough to recall life in 1960. This suggests that, like Paralji in the Australian desert, they may have remembered what to do during a severe drought, and led their groups to the last water refuges. In the clan that stayed behind, the oldest member was born in 1960, and so was too young to have recalled the last major drought.

More generally, aging elephant matriarchs have a big impact on their families, as those led by older matriarchs do better at identifying and avoiding predators (lions and humans), avoiding internal conflicts and identifying the calls of their fellow elephants. For example, in one set of field experiments, researchers played lion roars from both male and female lions, and from either a single lion or a trio of lions. For elephants, male lions are much more dangerous than females, and of course, three lions are always worse than only one lion. All the elephants generally responded with more defensive preparations when they heard three lions vs. one. However, only the older matriarchs keenly recognized the increased dangers of male lions over female lions, and responded to the increased threat with elephant defensive maneuvers.

V.

I was inspired to read Secret by this review on Scholar’s Stage. I hate to be unoriginal, but after reading the whole book, I agree that the three sections Tanner cites – on divination, on manioc, and on shark taboos – are by far the best and most fascinating.

On divination:

When hunting caribou, Naskapi foragers in Labrador, Canada, had to decide where to go. Common sense might lead one to go where one had success before or to where friends or neighbors recently spotted caribou.

However, this situation is like [the Matching Pennies game]. The caribou are mismatchers and the hunters are matchers. That is, hunters want to match the locations of caribou while caribou want to mismatch the hunters, to avoid being shot and eaten. If a hunter shows any bias to return to previous spots, where he or others have seen caribou, then the caribou can benefit (survive better) by avoiding those locations (where they have previously seen humans). Thus, the best hunting strategy requires randomizing.

Can cultural evolution compensate for our cognitive inadequacies? Traditionally, Naskapi hunters decided where to go to hunt using divination and believed that the shoulder bones of caribou could point the way to success. To start the ritual, the shoulder blade was heated over hot coals in a way that caused patterns of cracks and burnt spots to form. This patterning was then read as a kind of map, which was held in a pre-specified orientation. The cracking patterns were (probably) essentially random from the point of view of hunting locations, since the outcomes depended on myriad details about the bone, fire, ambient temperature, and heating process. Thus, these divination rituals may have provided a crude randomizing device that helped hunters avoid their own decision-making biases.

This is not some obscure, isolated practice, and other cases of divination provide more evidence. In Indonesia, the Kantus of Kalimantan use bird augury to select locations for their agricultural plots. Geographer Michael Dove argues that two factors will cause farmers to make plot placements that are too risky. First, Kantu ecological models contain the Gambler’s Fallacy, and lead them to expect floods to be less likely to occur in a specific location after a big flood in that location (which is not true). Second…Kantus pay attention to others’ success and copy the choices of successful households, meaning that if one of their neighbors has a good yield in an area one year, many other people will want to plant there in the next year. To reduce the risks posed by these cognitive and decision-making biases, Kantu rely on a system of bird augury that effectively randomizes their choices for locating garden plots, which helps them avoid catastrophic crop failures. Divination results depend not only on seeing a particular bird species in a particular location, but also on what type of call the bird makes (one type of call may be favorable, and another unfavorable).

The patterning of bird augury supports the view that this is a cultural adaptation. The system seems to have evolved and spread throughout this region since the 17th century when rice cultivation was introduced. This makes sense, since it is rice cultivation that is most positively influenced by randomizing garden locations. It’s possible that, with the introduction of rice, a few farmers began to use bird sightings as an indication of favorable garden sites. On-average, over a lifetime, these farmers would do better – be more successful – than farmers who relied on the Gambler’s Fallacy or on copying others’ immediate behavior. Whatever the process, within 400 years, the bird augury system spread throughout the agricultural populations of this Borneo region. Yet, it remains conspicuously missing or underdeveloped among local foraging groups and recent adopters of rice agriculture, as well as among populations in northern Borneo who rely on irrigation. So, bird augury has been systematically spreading in those regions where it’s most adaptive.
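To make the matchers-and-mismatchers logic concrete, here’s a toy calculation of my own (not from the book): if the hunter favors one hunting ground with any probability other than 1/2, caribou that adapt to his bias push his success rate below the 50% that true randomization guarantees.

```python
# Matching Pennies over two hunting grounds, A and B. The hunter wants
# to match the caribou's location; the caribou want to mismatch.

def hunter_success(p_a):
    """Hunter visits A with probability p_a; caribou best-respond."""
    if p_a > 0.5:
        p_caribou_a = 0.0  # caribou avoid the favored spot entirely
    elif p_a < 0.5:
        p_caribou_a = 1.0
    else:
        p_caribou_a = 0.5  # indifferent against an unbiased hunter
    # Hunter succeeds when both parties end up at the same spot.
    return p_a * p_caribou_a + (1 - p_a) * (1 - p_caribou_a)

for p in (0.9, 0.7, 0.5):
    print(f"hunter favors A with p={p}: success rate {hunter_success(p):.2f}")
# p=0.9 -> 0.10, p=0.7 -> 0.30, p=0.5 -> 0.50: any bias gets exploited
```

Against best-responding prey, the only unexploitable strategy is the 50/50 mix, and a bone-cracking ritual is one way to implement it without a coin.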

Scott Aaronson has written about how easy it is to predict people trying to “be random”:

In a class I taught at Berkeley, I did an experiment where I wrote a simple little program that would let people type either “f” or “d” and would predict which key they were going to push next. It’s actually very easy to write a program that will make the right prediction about 70% of the time. Most people don’t really know how to type randomly. They’ll have too many alternations and so on. There will be all sorts of patterns, so you just have to build some sort of probabilistic model. Even a very crude one will do well. I couldn’t even beat my own program, knowing exactly how it worked. I challenged people to try this and the program was getting between 70% and 80% prediction rates. Then, we found one student that the program predicted exactly 50% of the time. We asked him what his secret was and he responded that he “just used his free will.”
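Aaronson’s program is easy to approximate. Here’s a minimal sketch (an assumed implementation in the same spirit, not his actual code): predict whichever key most often followed the last two keypresses.

```python
# Predict a human's "random" f/d keypresses from bigram context.
from collections import defaultdict
import random

def prediction_rate(keys):
    counts = defaultdict(lambda: [0, 0])  # context -> [count of f, count of d]
    history, correct = "", 0
    for key in keys:
        c = counts[history[-2:]]           # stats for the last two presses
        guess = "f" if c[0] >= c[1] else "d"
        correct += (guess == key)
        c[0 if key == "f" else 1] += 1     # update counts after guessing
        history += key
    return correct / len(keys)

# A simulated human who, like most people, alternates too often:
human = "".join(random.choice(["f", "d", "fd", "df"]) for _ in range(1000))
print(f"prediction rate: {prediction_rate(human):.0%}")  # well above 50%
```

Even this crude model beats chance on alternation-heavy input, which is exactly the kind of pattern Aaronson says his program exploited.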

But being genuinely random is important in pursuing mixed game theoretic strategies. Heinrich’s view is that divination solved this problem effectively.

I’m reminded of the Romans using augury to decide when and where to attack. This always struck me as crazy; generals are going to risk the lives of thousands of soldiers because they saw a weird bird earlier that morning? But war is a classic example of when a random strategy can be useful. If you’re deciding whether to attack the enemy’s right vs. left flank, it’s important that the enemy can’t predict your decision and send his best defenders there. If you’re generally predictable – and Scott Aaronson says you are – then outsourcing your decision to weird birds might be the best way to go.

And then there’s manioc. This is a tuber native to the Americas. It contains cyanide, and if you eat too much of it, you get cyanide poisoning. From Heinrich:

In the Americas, where manioc was first domesticated, societies who have relied on bitter varieties for thousands of years show no evidence of chronic cyanide poisoning. In the Colombian Amazon, for example, indigenous Tukanoans use a multistep, multiday processing technique that involves scraping, grating, and finally washing the roots in order to separate the fiber, starch, and liquid. Once separated, the liquid is boiled into a beverage, but the fiber and starch must then sit for two more days, when they can then be baked and eaten. Figure 7.1 shows the percentage of cyanogenic content in the liquid, fiber, and starch remaining through each major step in this processing.

Such processing techniques are crucial for living in many parts of Amazonia, where other crops are difficult to cultivate and often unproductive. However, despite their utility, one person would have a difficult time figuring out the detoxification technique. Consider the situation from the point of view of the children and adolescents who are learning the techniques. They would have rarely, if ever, seen anyone get cyanide poisoning, because the techniques work. And even if the processing was ineffective, such that cases of goiter (swollen necks) or neurological problems were common, it would still be hard to recognize the link between these chronic health issues and eating manioc. Most people would have eaten manioc for years with no apparent effects. Low cyanogenic varieties are typically boiled, but boiling alone is insufficient to prevent the chronic conditions for bitter varieties. Boiling does, however, remove or reduce the bitter taste and prevent the acute symptoms (e.g., diarrhea, stomach troubles, and vomiting).

So, if one did the common-sense thing and just boiled the high-cyanogenic manioc, everything would seem fine. Since the multistep task of processing manioc is long, arduous, and boring, sticking with it is certainly non-intuitive. Tukanoan women spend about a quarter of their day detoxifying manioc, so this is a costly technique in the short term. Now consider what might result if a self-reliant Tukanoan mother decided to drop any seemingly unnecessary steps from the processing of her bitter manioc. She might critically examine the procedure handed down to her from earlier generations and conclude that the goal of the procedure is to remove the bitter taste. She might then experiment with alternative procedures by dropping some of the more labor-intensive or time-consuming steps. She’d find that with a shorter and much less labor-intensive process, she could remove the bitter taste. Adopting this easier protocol, she would have more time for other activities, like caring for her children. Of course, years or decades later her family would begin to develop the symptoms of chronic cyanide poisoning.

Thus, the unwillingness of this mother to take on faith the practices handed down to her from earlier generations would result in sickness and early death for members of her family. Individual learning does not pay here, and intuitions are misleading. The problem is that the steps in this procedure are causally opaque—an individual cannot readily infer their functions, interrelationships, or importance. The causal opacity of many cultural adaptations had a big impact on our psychology.

Wait. Maybe I’m wrong about manioc processing. Perhaps it’s actually rather easy to individually figure out the detoxification steps for manioc? Fortunately, history has provided a test case. At the beginning of the seventeenth century, the Portuguese transported manioc from South America to West Africa for the first time. They did not, however, transport the age-old indigenous processing protocols or the underlying commitment to using those techniques. Because it is easy to plant and provides high yields in infertile or drought-prone areas, manioc spread rapidly across Africa and became a staple food for many populations. The processing techniques, however, were not readily or consistently regenerated. Even after hundreds of years, chronic cyanide poisoning remains a serious health problem in Africa. Detailed studies of local preparation techniques show that high levels of cyanide often remain and that many individuals carry low levels of cyanide in their blood or urine, which haven’t yet manifested in symptoms. In some places, there’s no processing at all, or sometimes the processing actually increases the cyanogenic content. On the positive side, some African groups have in fact culturally evolved effective processing techniques, but these techniques are spreading only slowly.

Rationalists always wonder: how come people aren’t more rational? How come you can prove a thousand times, using Facts and Logic, that something is stupid, and yet people will still keep doing it?

Heinrich hints at an answer: for basically all of history, using reason would get you killed.

A reasonable person would have figured out there was no way for oracle-bones to accurately predict the future. They would have abandoned divination, failed at hunting, and maybe died of starvation.

A reasonable person would have asked why everyone was wasting so much time preparing manioc. When told “Because that’s how we’ve always done it”, they would have been unsatisfied with that answer. They would have done some experiments, and found that a simpler process of boiling it worked just as well. They would have saved lots of time, maybe converted all their friends to the new and easier method. Twenty years later, they would have gotten sick and died, in a way so causally distant from their decision to change manioc processing methods that nobody would ever have been able to link the two together.

Heinrich discusses pregnancy taboos in Fiji; pregnant women are banned from eating sharks. Sure enough, these sharks contain chemicals that can cause birth defects. The women didn’t really know why they weren’t eating the sharks, but when anthropologists demanded a reason, they eventually decided it was because their babies would be born with shark skin rather than human skin. As explanations go, this leaves a lot to be desired. How come you can still eat other fish? Aren’t you worried your kids will have scales? Doesn’t the slightest familiarity with biology prove this mechanism is garbage? But if some smart independent-minded iconoclastic Fijian girl figured any of this out, she would break the taboo and her child would have birth defects.

In giving humans reason at all, evolution took a huge risk. Surely it must have wished there was some other way, some path that made us big-brained enough to understand tradition, but not big-brained enough to question it. Maybe it searched for a mind design like that and couldn’t find one. So it was left with this ticking time-bomb, this ape that was constantly going to be able to convince itself of hare-brained and probably-fatal ideas.

Here, too, culture came to the rescue. One of the most important parts of any culture – more important than the techniques for hunting seals, more important than the techniques for processing tubers – is techniques for making sure nobody ever questions tradition. Like the belief that anyone who doesn’t conform is probably a witch who should be cast out lest they bring destruction upon everybody. Or the belief in a God who has commanded certain specific weird dietary restrictions, and will torture you forever if you disagree. Or the fairy tales where the prince asks a wizard for help, and the wizard says “You may have everything you wish forever, but you must never nod your head at a badger”, and then one day the prince nods his head at a badger, and his whole empire collapses into dust, and the moral of the story is that you should always obey weird advice you don’t understand.

There’s a monster at the end of this book. Humans evolved to transmit culture with high fidelity. And one of the biggest threats to transmitting culture with high fidelity was Reason. Our ancestors lived in Epistemic Hell, where they had to constantly rely on causally opaque processes with justifications that couldn’t possibly be true, and if they ever questioned them then they might die. Historically, Reason has been the villain of the human narrative, a corrosive force that tempts people away from adaptive behavior towards choices that “sounded good at the time”.

Why are people so bad at reasoning? For the same reason they’re so bad at letting poisonous spiders walk all over their face without freaking out. Both “skills” are really bad ideas, most of the people who tried them died in the process, so evolution removed those genes from the population, and successful cultures stigmatized them enough to give people an internalized fear of even trying.

VI.

This book belongs alongside Seeing Like A State and the works of G.K. Chesterton as attempts to justify tradition, and to argue for organically-evolved institutions over top-down planning. What unique contribution does it make to this canon?

First, a lot more specifically anthropological / paleoanthropological rigor than the other two.

Second, a much crisper focus: Chesterton had only the fuzziest idea that he was writing about cultural evolution, and Scott was only a little clearer. I think Heinrich is the only one of the three to use the term, and once you hear it, it’s obviously the right framing.

Third, a sense of how traditions contain the meta-tradition of defending themselves against Reason, and a sense for why this is necessary.

And fourth, maybe we’re not at the point where we really want unique contributions yet. Maybe we’re still at the point where we have to have this hammered in by more and more examples. The temptation is always to say “Ah, yes, a few simple things like taboos against eating poisonous plants may be relics of cultural evolution, but obviously by now we’re at the point where we know which traditions are important vs. random looniness, and we can rationally stick to the important ones while throwing out the garbage.” And then somebody points out to you that actually divination using oracle bones was one of the important traditions, and if you thought you knew better than that and tried to throw it out, your civilization would falter.

Maybe we just need to keep reading more similarly-themed books until this point really sinks in, and we get properly worried.




All knowledge is circularly justified

Published on June 4, 2019 10:39 PM UTC

Many philosophers have tried to find the foundations of our knowledge, but why do we think there are any? The framing of foundations implies a separate bottom layer of knowledge from which everything is built up. And while this is undoubtedly a useful model in many contexts, why should we believe in this as the complete and literal truth as opposed to merely a simplification?

Consider:

1) If we dig deep enough into any of our truth claims, we'll eventually reach a point at which they are justified by intuition.

2) The reliability of intuition, or of various intuitions, is not something that is merely taken as basic or for granted, but can instead be justified somewhat by arguments from experience and evolutionary arguments.

3) However, both empirical verification and evolutionary arguments themselves rely on assumptions that are justified by intuition.

This is circular, but is that necessarily a problem? If your choice is between a circular justification and eventually hitting a level with no justification, then the circular justification suddenly starts looking pretty attractive.

Is this important? I don't know. For applied rationality, not so much. But perhaps the more philosophical areas of the rationalist project would look quite different if they were built upon a circular epistemology.




Seeing the Matrix, Switching Abstractions, and Missing Moods

Published on June 4, 2019 9:08 PM UTC

Epistemic Status: Poetry, but also, True Story

For seven years, I worked in a supermarket bakery.

The bakery was quite a nice place to work. I got a good mix of physical exercise (everything I know about basketball I learned from tossing heavy boxes of bread up several feet such that they landed just perfectly on top of each other).

I learned skills, I decorated cakes. I had an excellent manager, who led by example, who was funny, who was stern when she needed to be but almost never needed to be because people just wanted to do the right thing for her.

One day, we hired a person I found really annoying, who I'll call Debbie.

Debbie talked a lot, and she had a really grating, whiny, high-pitched voice. And at first I tried to engage with her cheerfully, then I tried engaging with her politely, and then I tried to avoid her, because she just wouldn't stop talking, no matter what, about inane things that nonetheless were just complicated enough that I felt pressure to think about how to respond.

Debbie was probably a decent person who didn't deserve my ire. Nonetheless, my ire she had.

Avoiding Debbie wasn't really an option because the bakery wasn't that big. A few weeks of annoyance passed. And one day Debbie was telling some story about her kids or sister or something that was probably a reasonably fine story but I just couldn't stand it any more and —

— and —

...and then I literally felt my brain make a slight "czhzk" sound. And Debbie's voice just sort of faded into the background. I heard all the other sounds in the supermarket – the customers talking, the air conditioners humming, the beep of distant item-scanners, the sliding of the automatic doors. And Debbie's voice, one mechanical physical process among many.

And it felt like Neo, at the end of the Matrix, where he can suddenly see the Code, and he can also see Agent Smith. And then a flashback to earlier in the movie, when Neo looks upon the raw code for the first time and can't make heads or tails of it. A fellow crewmember says "Yeah, I don't even see the code anymore. I just parse it automatically. My brain sees 'blond', 'brunette', 'redhead'."

And suddenly, I could effortlessly switch back and forth between seeing Debbie, the human being with hopes and dreams and kids and sisters and stories she wanted to tell and coworkers she wanted to vibe with... and <Debbie>, the collection of atoms that were just inevitably, deterministically, going to keep on making high pitched noises no matter what I said or did.

And somehow... this made it simultaneously way easier both to ignore Debbie when I didn't feel like dealing with her – muttering automatic 'uh huhs' and 'yeahs' and nodding as appropriate – and to remember she was a Person, whose experiences were valuable according to my outlook on Personhood, and sometimes to actually talk with her and engage and actually care about her kids or sister.

Now one lesson, a simpler lesson (but which I didn't actually learn till much later), is that nerds tend to think conversation needs to include lots of actually understanding what a person is saying and forming reasonable beliefs about the conversation topic and saying those things in a logical flow. But, a lot of what's going on is socializing, smalltalk, and sometimes vibing. I found Debbie annoying in part because I felt obligated to spend a fair amount of cognitive effort responding to her stories. I do think she cared at least somewhat that I actually listened. But I don't think she intended me to spend as much cognitive effort as I was.

The weirder, deeper, experiential lesson is to be able to see the Matrix. I don't expect you to gain that ability by reading this blogpost – I think I needed a weird combination of circumstances, and philosophical-beliefs-at-a-certain-stage-of-development, in order for things to click into place and my brain to go "czhzk".

But I think it's useful to be able to refer to this skill. My guess is it's not that hard to conceptually understand (at least for LessWrong folk).

And I think it becomes relevant when engaging with important moral questions, that require us both to have the ability to see a million deaths, or a million lives, and process them as a statistic that is weighed against other statistics, equations to be abstracted and simplified until you find the answer....

...and to remember the felt sense that each of those lives and deaths is filled with meaning. You can only process one or two of them at a time before your brain breaks, and their meaning depends on the ability to look at them through a layer of abstraction that doesn't lend itself well to linear logic or calculations.

Sometimes you need to simplify part or all of those rich-inner-lives away to be able to think, or communicate clearly about the equations.

I think there's something like missing moods that go on sometimes, where one person is trying to have a conversation focusing on a particular abstraction, and another person is tuned into another abstraction, and they both share most of the same beliefs and values but at that particular moment they're trying to talk about different things and frustrated that the other person doesn't seem to care.

Sometimes you need to abstract away, not only the rich inner life of a person (or millions of people), but entire swaths of the cold mathematics too. Or, sometimes you might want to focus on rich-inner-life, but tuned into a particular subset – the part of the rich-inner-life that cares about agency and longterms goals, or the rich-inner-life that cares about the energy in the room at the moment.

Or, you might care about interpersonal rich-outer-life — the layers of meaning and personhood that apply to relationships and groups over individuals or masses. Two individuals might both be having individually good experiences but something about their relationship seems stagnant or toxic.

Reality is rich with detail and it seems quite common to need to tune into some of that detail, and necessarily tune out others.

I wrote this post after a couple of recent moments wherein I seemed to be focused on a different slice of reality than my conversation partner(s). I'm not sure whether this particular blogpost would have helped, but it seemed like a useful handle for how I personally relate to this sort of disconnect. I'd like to be able to briefly refer to this meta-concept, then figure out which moods are actually missing, and "get on the same page" as quickly as possible without interrupting the conversation.

I don't necessarily see all the same rich-detail that you do, and there might be times when I literally don't understand or don't care about the particular lens you are trying to optimize reality through. But I think, most of the time, I do see it, it just might not have been what I was focused on at the moment.




How is Solomonoff induction calculated in practice?

4 июня, 2019 - 13:11
Published on June 4, 2019 10:11 AM UTC

Solomonoff induction is generally given as the correct way to penalise more complex hypotheses when calculating priors. A great introduction can be found here.

My question is, how is this actually calculated in practice?

As an example, say I have 2 hypotheses:

A. The probability distribution of the output is given by the same normal distribution for all inputs, with mean μ and standard deviation σ.

B. The probability distribution of the output is given by a normal distribution depending on an input x, with mean μ₀ + mx and standard deviation σ.

It is clear that hypothesis B is more complex (using an additional input [x], having an additional parameter [m], and requiring 2 additional operations to calculate), but how does one calculate the actual penalty that B should be given relative to A?
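To make the question concrete, here is the sort of crude bit-counting I could imagine doing (a minimal sketch only: real Solomonoff induction is incomputable, and every bit cost below is an arbitrary assumption of mine, not part of the formalism):

```python
# A crude, purely illustrative approximation: treat each hypothesis as a
# "program", count its description length K in bits under some fixed encoding,
# and give it prior mass proportional to 2**(-K). All bit costs are arbitrary.

BITS_PER_PARAMETER = 32   # e.g. one float at fixed precision
BITS_PER_INPUT = 8        # naming one input variable
BITS_PER_OPERATION = 8    # one arithmetic operation

def description_length(n_params, n_inputs, n_ops):
    return (n_params * BITS_PER_PARAMETER
            + n_inputs * BITS_PER_INPUT
            + n_ops * BITS_PER_OPERATION)

# Hypothesis A: N(mu, sigma) -- 2 parameters, no inputs, no extra operations.
K_A = description_length(n_params=2, n_inputs=0, n_ops=0)
# Hypothesis B: N(mu0 + m*x, sigma) -- 3 parameters, 1 input, 2 extra operations.
K_B = description_length(n_params=3, n_inputs=1, n_ops=2)

prior_ratio = 2.0 ** (K_A - K_B)   # P(B) / P(A) under this encoding
print(K_A, K_B, prior_ratio)       # 64 120 ~1.4e-17 (i.e. 2**-56)
```

Under this encoding B is penalised by a factor of 2^(K_B − K_A), but the result obviously depends entirely on the encoding chosen, which is part of what I'm asking: is there a principled way to fix that encoding in practice?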




Discuss

Agents dissolved in coffee

4 июня, 2019 - 11:22
Published on June 4, 2019 8:22 AM UTC

Bottom line

When thinking about embedded agency it might be helpful to drop the notion of ‘agency’ and ‘agents’ sometimes, because it might be confusing or underdefined. Instead one could think of processes running according to the laws of physics. Or of algorithms running on a stack of interpreters running on the hardware of the universe.

In addition (or as a corollary) to an alternative way of thinking about agents, you will also read about an alternative way of thinking about yourself.

Background

The following is mostly a stream of thought that went through my head after I drank a cup of strong milk coffee and sat down reading Embedded Agency. I consume caffeine only twice a week. So when I do, it takes my thinking to new places. (Or maybe it's because the instant coffee powder that I use expired in March 2012.)

Start

My thoughts kick off at this paragraph of Embedded Agency:

In addition to hazards in her external environment, Emmy is going to have to worry about threats coming from within. While optimizing, Emmy might spin up other optimizers as subroutines, either intentionally or unintentionally. These subsystems can cause problems if they get too powerful and are unaligned with Emmy’s goals. Emmy must figure out how to reason without spinning up intelligent subsystems, or otherwise figure out how to keep them weak, contained, or aligned fully with her goals.

This is our alignment problem repeated in Emmy. In other words, if we solve the alignment problem, it is also solved in Emmy and vice versa. If we view the machines that we want to run our AI on as part of ourselves, we're the same as Emmy.

We are a low-capability Emmy. We are partially and often unconsciously solving the embedded agency subproblems using heuristics, some of which we know and some of which we don't. As we try to add to our capabilities, we might run into the limitations of those heuristics. Or perhaps not the limitations yet: we don't even know exactly how they work, let alone how to implement them on computers. Computers are our current tool of choice for overcoming the physical, biological and psychological limits of our brains.

Another way of adding to our capabilities is amplifying them using organizations. Then we have a composite (cf. the software design pattern): groups of agents can be viewed as one agent. An agent together with a tool is one agent. An agent interacting with part of the environment is one agent.

In the other direction, an agent with an arm amputated (i.e. without that arm) is still an agent. How much can we remove and still have an agent? Here we run into the issue of an agent being part of the environment, made out of non-agentic pieces.

How can we talk about agency at all? We're just part of some giant physical reaction. And we're a part that happens to be working on getting bigger stuff done, which is a human notion. From this point of view the earth is just a place in the universe where some quirky things (e.g. people selling postcards showing crocodiles wearing swim goggles) are happening according to the laws of physics.

Fundamentally we're nothing other than a supernova, or a neutron star making its pasta shapes, or a quasar sending out huge amounts of radiation. We're just another physical process in the universe. (Duh.) When the aliens come to visit (and maybe kill) us, we have a chance to observe (albeit for a short time) the intermediate state of a physical process similar to the one that has been going on on earth.

You could take this as a cue to lean back and see how it all rolls out. You could also be worried that you might not be able to maintain your laid-back composure when you fall into poverty because you didn't work hard, or when your loved ones are raped and killed in a war, because you didn't help the world stay stable and keep peace (mostly).

Or you could lean forward and join those who are working on getting bigger stuff done.

Where does this illusion of choice come from, anyway? Is this the necessary ingredient for an agent? I don't think so, because an agent need not be conscious, which is a requirement for having illusions, I guess. (Aside: how do we deal with consciousness? Is it an incidental feature? Or is it necessary/very helpful for physical processes that have the grand and intricate results that we want?)

Is it helpful to talk about agency at all, then? In the end an aligned embedded agent is just another process running according to the laws of physics that hopefully has the results we want to have. We don't talk about agency when a thermite charge welds two pieces of rail together. We do talk about agency when we see humans playing tennis. Where is the boundary and what is the benefit of drawing one?

So I've gotten rid of agency. We only have processes running according to the laws of physics, or alternatively, algorithms running on a stack of interpreters running on the hardware of the universe. Does this help with thinking about the embedded agency subproblems? I don't know yet, but I will keep it in mind. Then the question is: how do we kick off the kind of processes that will have the results that we want?

Question prompts
  • Is this way of looking at the world new to you?
  • Did it help you in thinking about embedded agency?
  • Where am I using wrong terminology?
  • Where is my thinking misguided or wrong?


Discuss

The Inner Alignment Problem

4 июня, 2019 - 04:20
Published on June 4, 2019 1:20 AM UTC


This is the third of five posts in the Mesa-Optimization Sequence based on the upcoming MIRI paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper, with the full paper set to be published on the MIRI blog with the release of the last post in the sequence.

 

In this post, we outline reasons to think that a mesa-optimizer may not optimize the same objective function as its base optimizer. Machine learning practitioners have direct control over the base objective function—either by specifying the loss function directly or training a model for it—but cannot directly specify the mesa-objective developed by a mesa-optimizer. We refer to this problem of aligning mesa-optimizers with the base objective as the inner alignment problem. This is distinct from the outer alignment problem, which is the traditional problem of ensuring that the base objective captures the intended goal of the programmers.

Current machine learning methods select learned algorithms by empirically evaluating their performance on a set of training data according to the base objective function. Thus, ML base optimizers select mesa-optimizers according to the output they produce rather than directly selecting for a particular mesa-objective. Moreover, the selected mesa-optimizer's policy only has to perform well (as scored by the base objective) on the training data. If we adopt the assumption that the mesa-optimizer computes an optimal policy given its objective function, then we can summarize the relationship between the base and mesa-objectives as follows:

$$\theta^* = \underset{\theta}{\operatorname{argmax}}\ \mathbb{E}\left(O_{\text{base}}(\pi_\theta)\right), \quad \text{where} \quad \pi_\theta = \underset{\pi}{\operatorname{argmax}}\ \mathbb{E}\left(O_{\text{mesa}}(\pi \mid \theta)\right) \tag{17}$$

That is, the base optimizer maximizes its objective Obase by choosing a mesa-optimizer with parameterization θ based on the mesa-optimizer's policy πθ, but not based on the objective function Omesa that the mesa-optimizer uses to compute this policy. Depending on the base optimizer, we will think of Obase as the negative of the loss, the future discounted reward, or simply some fitness function by which learned algorithms are being selected.
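As a hedged, minimal rendering of that relationship in code (the toy search space and objectives are inventions of this illustration, not from the paper):

```python
# A minimal toy rendering of equation (17) (all details invented): the base
# optimizer searches over parameterizations theta, but it only ever sees the
# policy pi_theta that each theta's mesa-objective induces, never O_mesa itself.

CANDIDATE_OUTPUTS = [0, 1, 2, 3]

def O_mesa(output, theta):
    """Mesa-objective: prefer outputs close to the target encoded by theta."""
    return -abs(output - theta)

def pi(theta):
    """pi_theta = argmax_pi E(O_mesa(pi | theta)); the expectation is trivial here."""
    return max(CANDIDATE_OUTPUTS, key=lambda o: O_mesa(o, theta))

def O_base(output):
    """Base objective: reward outputs close to 2."""
    return -abs(output - 2)

# theta* = argmax_theta E(O_base(pi_theta))
theta_star = max([0, 1, 2, 3], key=lambda t: O_base(pi(t)))
print(theta_star, pi(theta_star))  # 2 2: theta is selected purely via behavior
```

The crucial structural point is the last line: θ is selected purely through the behavior of πθ, so any mesa-objective that induced the same behavior on the training inputs would have been selected just as readily.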

An interesting approach to analyzing this connection is presented in Ibarz et al., where empirical samples of the true reward and a learned reward on the same trajectories are used to create a scatter-plot visualization of the alignment between the two.(18) The assumption in that work is that a monotonic relationship between the learned reward and true reward indicates alignment, whereas deviations from that suggest misalignment. Building on this sort of research, better theoretical measures of alignment might someday allow us to speak concretely in terms of provable guarantees about the extent to which a mesa-optimizer is aligned with the base optimizer that created it.

 

3.1. Pseudo-alignment

There is currently no complete theory of the factors that affect whether a mesa-optimizer will be pseudo-aligned—that is, whether it will appear aligned on the training data, while actually optimizing for something other than the base objective. Nevertheless, we outline a basic classification of ways in which a mesa-optimizer could be pseudo-aligned:

  1. Proxy alignment,
  2. Approximate alignment, and
  3. Suboptimality alignment.

Proxy alignment. The basic idea of proxy alignment is that a mesa-optimizer can learn to optimize for some proxy of the base objective instead of the base objective itself. We'll start by considering two special cases of proxy alignment: side-effect alignment and instrumental alignment.

First, a mesa-optimizer is side-effect aligned if optimizing for the mesa-objective Omesa has the direct causal result of increasing the base objective Obase in the training distribution, and thus when the mesa-optimizer optimizes Omesa it results in an increase in Obase. For an example of side-effect alignment, suppose that we are training a cleaning robot. Consider a robot that optimizes the number of times it has swept a dusty floor. Sweeping a floor causes the floor to be cleaned, so this robot would be given a good score by the base optimizer. However, if during deployment it is offered a way to make the floor dusty again after cleaning it (e.g. by scattering the dust it swept up back onto the floor), the robot will take it, as it can then continue sweeping dusty floors.

Second, a mesa-optimizer is instrumentally aligned if optimizing for the base objective Obase has the direct causal result of increasing the mesa-objective Omesa in the training distribution, and thus the mesa-optimizer optimizes Obase as an instrumental goal for the purpose of increasing Omesa. For an example of instrumental alignment, suppose again that we are training a cleaning robot. Consider a robot that optimizes the amount of dust in the vacuum cleaner. Suppose that in the training distribution the easiest way to get dust into the vacuum cleaner is to vacuum the dust on the floor. It would then do a good job of cleaning in the training distribution and would be given a good score by the base optimizer. However, if during deployment the robot came across a more effective way to acquire dust—such as by vacuuming the soil in a potted plant—then it would no longer exhibit the desired behavior.
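Both of these failure modes have the same shape: behavior identical to the aligned behavior during training, divergence once the environment changes. A hedged toy version of the instrumental case (all details invented):

```python
# A toy sketch of instrumental alignment (details invented): in training,
# vacuuming the floor is the only way to get dust into the vacuum, so a
# dust-maximizing robot behaves exactly like a cleaning robot; deployment
# then offers a better dust source.

def best_action(available_actions, dust_collected):
    # The mesa-optimizer maximizes dust in the vacuum, not cleanliness.
    return max(available_actions, key=lambda a: dust_collected[a])

dust_collected = {"vacuum_floor": 1, "idle": 0, "vacuum_plant_soil": 3}

training_env = ["vacuum_floor", "idle"]
deployment_env = ["vacuum_floor", "idle", "vacuum_plant_soil"]

print(best_action(training_env, dust_collected))    # vacuum_floor: looks aligned
print(best_action(deployment_env, dust_collected))  # vacuum_plant_soil: misaligned
```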

We propose that it is possible to understand the general interaction between side-effect and instrumental alignment using causal graphs, which leads to our general notion of proxy alignment.

Suppose we model a task as a causal graph with nodes for all possible attributes of that task and arrows between nodes for all possible relationships between those attributes. Then we can also think of the mesa-objective Omesa and the base objective Obase as nodes in this graph. For Omesa to be pseudo-aligned, there must exist some node X such that X is an ancestor of both Omesa and Obase in the training distribution, and such that Omesa and Obase increase with X. If X=Omesa, this is side-effect alignment, and if X=Obase, this is instrumental alignment.
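As a small sketch of that ancestor condition using networkx (the toy graph and node names are invented for illustration):

```python
import networkx as nx

# Hedged sketch of the ancestor condition above, on an invented toy graph.
# Edges point from cause to effect; names are illustrative only.
G = nx.DiGraph([
    ("X", "O_mesa"),   # X increases the mesa-objective...
    ("X", "O_base"),   # ...and also increases the base objective
])

def proxy_candidates(g, mesa="O_mesa", base="O_base"):
    """Nodes that are ancestors of both objectives (counting each objective
    as an ancestor of itself, to cover side-effect/instrumental alignment)."""
    anc_mesa = nx.ancestors(g, mesa) | {mesa}
    anc_base = nx.ancestors(g, base) | {base}
    return anc_mesa & anc_base

print(proxy_candidates(G))  # {'X'}: X == O_mesa would be side-effect alignment,
                            # X == O_base instrumental alignment
```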

This represents the most generalized form of a relationship between Omesa and Obase that can contribute to pseudo-alignment. Specifically, consider the causal graph given in figure 3.1. A mesa-optimizer with mesa-objective Omesa will decide to optimize X as an instrumental goal of optimizing Omesa, since X increases Omesa. This will then result in Obase increasing, since optimizing for X has the side-effect of increasing Obase. Thus, in the general case, side-effect and instrumental alignment can work together to contribute to pseudo-alignment over the training distribution, which is the general case of proxy alignment.

Figure 3.1. A causal diagram of the training environment for the different types of proxy alignment. The diagrams represent, from top to bottom, side-effect alignment (top), instrumental alignment (middle), and general proxy alignment (bottom). The arrows represent positive causal relationships—that is, cases where an increase in the parent causes an increase in the child.

Approximate alignment. A mesa-optimizer is approximately aligned if the mesa-objective Omesa and the base objective Obase are approximately the same function up to some degree of approximation error related to the fact that the mesa-objective has to be represented inside the mesa-optimizer rather than being directly programmed by humans. For example, suppose you task a neural network with optimizing for some base objective that is impossible to perfectly represent in the neural network itself. Even if you get a mesa-optimizer that is as aligned as possible, it still will not be perfectly robustly aligned in this scenario, since there will have to be some degree of approximation error between its internal representation of the base objective and the actual base objective.

Suboptimality alignment. A mesa-optimizer is suboptimality aligned if some deficiency, error, or limitation in its optimization process causes it to exhibit aligned behavior on the training distribution. This could be due to computational constraints, unsound reasoning, a lack of information, irrational decision procedures, or any other defect in the mesa-optimizer's reasoning process. Importantly, we are not referring to a situation where the mesa-optimizer is robustly aligned but nonetheless makes mistakes leading to bad outcomes on the base objective. Rather, suboptimality alignment refers to the situation where the mesa-optimizer is misaligned but nevertheless performs well on the base objective, precisely because it has been selected to make mistakes that lead to good outcomes on the base objective.

For an example of suboptimality alignment, consider a cleaning robot with a mesa-objective of minimizing the total amount of stuff in existence. If this robot has the mistaken belief that the dirt it cleans is completely destroyed, then it may be useful for cleaning the room despite doing so not actually helping it succeed at its objective. This robot will be observed to be a good optimizer of Obase and hence be given a good score by the base optimizer. However, if during deployment the robot is able to improve its world model, it will stop exhibiting the desired behavior.

As another, perhaps more realistic example of suboptimality alignment, consider a mesa-optimizer with a mesa-objective Omesa and an environment in which there is one simple strategy and one complicated strategy for achieving Omesa. It could be that the simple strategy is aligned with the base optimizer, but the complicated strategy is not. The mesa-optimizer might then initially only be aware of the simple strategy, and thus be suboptimality aligned, until it has been run for long enough to come up with the complicated strategy, at which point it stops exhibiting the desired behavior.
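A hedged toy version of the mistaken-belief robot (all details invented):

```python
# Toy sketch of suboptimality alignment (details invented): the robot wants to
# minimize the total amount of stuff in existence, and only behaves like a
# cleaner while it wrongly believes that swept dirt is destroyed.

def choose_action(believes_dirt_is_destroyed):
    def predicted_stuff_after(action):
        stuff_now = 10
        if action == "sweep" and believes_dirt_is_destroyed:
            return stuff_now - 1   # the robot thinks the dirt vanishes
        return stuff_now           # in reality the dirt merely moves
    # 'idle' is listed first so ties break away from sweeping
    return min(["idle", "sweep"], key=predicted_stuff_after)

print(choose_action(True))   # 'sweep': looks aligned during training
print(choose_action(False))  # 'idle': better world model, desired behavior gone
```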

 

3.2. The task

As in the second post, we will now consider the task the machine learning system is trained on. Specifically, we will address how the task affects a machine learning system's propensity to produce pseudo-aligned mesa-optimizers.

Unidentifiability. It is a common problem in machine learning for a dataset to not contain enough information to adequately pinpoint a specific concept. This is closely analogous to the reason that machine learning models can fail to generalize or be susceptible to adversarial examples(19)—there are many more ways of classifying data that do well in training than any specific way the programmers had in mind. In the context of mesa-optimization, this manifests as pseudo-alignment being more likely to occur when a training environment does not contain enough information to distinguish between a wide variety of different objective functions. In such a case there will be many more ways for a mesa-optimizer to be pseudo-aligned than robustly aligned—one for each indistinguishable objective function. Thus, most mesa-optimizers that do well on the base objective will be pseudo-aligned rather than robustly aligned. This is a critical concern because it makes every other problem of pseudo-alignment worse—it is a reason that, in general, it is hard to find robustly aligned mesa-optimizers. Unidentifiability in mesa-optimization is partially analogous to the problem of unidentifiability in reward learning, in that the central issue is identifying the “correct” objective function given particular training data.(20) We will discuss this relationship further in the fifth post.

In the context of mesa-optimization, there is also an additional source of unidentifiability stemming from the fact that the mesa-optimizer is selected merely on the basis of its output. Consider the following toy reinforcement learning example. Suppose that in the training environment, pressing a button always causes a lamp to turn on with a ten-second delay, and that there is no other way to turn on the lamp. If the base objective depends only on whether the lamp is turned on, then a mesa-optimizer that maximizes button presses and one that maximizes lamp light will show identical behavior, as they will both press the button as often as they can. Thus, we cannot distinguish these two objective functions in this training environment. Nevertheless, the training environment does contain enough information to distinguish at least between these two particular objectives: since the high reward only comes after the ten-second delay, it must be from the lamp, not the button. As such, even if a training environment in principle contains enough information to identify the base objective, it might still be impossible to distinguish robustly aligned from proxy-aligned mesa-optimizers.
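Here is a hedged toy version of that environment (the simulation details are invented): both candidate objectives induce the same optimal behavior during training, so selection on behavior cannot separate them, even though the delayed reward in principle identifies the lamp.

```python
# Hedged toy version of the button/lamp example. Two candidate mesa-objectives
# (count button presses vs. count lamp-on timesteps) induce identical optimal
# behavior when the button is the only path to the lamp. Details invented.

def simulate(policy, steps=30):
    presses, lamp_on_steps, pending = 0, 0, []
    for t in range(steps):
        if policy(t) == "press_button":
            presses += 1
            pending.append(t + 10)            # lamp lights ten seconds later
        if any(t >= s for s in pending):
            lamp_on_steps += 1
    return presses, lamp_on_steps

button_objective = lambda presses, lamp: presses
lamp_objective   = lambda presses, lamp: lamp

def optimal_policy(objective):
    # With only two constant policies available, pick whichever scores better.
    always_press = lambda t: "press_button"
    always_wait  = lambda t: "wait"
    return max([always_press, always_wait],
               key=lambda p: objective(*simulate(p)))

p1, p2 = optimal_policy(button_objective), optimal_policy(lamp_objective)
print(simulate(p1) == simulate(p2))  # True: indistinguishable in training
```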

Proxy choice as pre-computation. Proxy alignment can be seen as a form of pre-computation by the base optimizer. Proxy alignment allows the base optimizer to save the mesa-optimizer computational work by pre-computing which proxies are valuable for the base objective and then letting the mesa-optimizer maximize those proxies.

Without such pre-computation, the mesa-optimizer has to infer at runtime the causal relationship between different input features and the base objective, which might require significant computational work. Moreover, errors in this inference could result in outputs that perform worse on the base objective than if the system had access to pre-computed proxies. If the base optimizer precomputes some of these causal relationships—by selecting the mesa-objective to include good proxies—more computation at runtime can be diverted to making better plans instead of inferring these relationships.

The case of biological evolution may illustrate this point. The proxies that humans care about—food, resources, community, mating, etc.—are relatively computationally easy to optimize directly, while correlating well with survival and reproduction in our ancestral environment. For a human to be robustly aligned with evolution would have required us to instead care directly about spreading our genes, in which case we would have to infer that eating, cooperating with others, preventing physical pain, etc. would promote genetic fitness in the long run, which is not a trivial task. To infer all of those proxies from the information available to early humans would have required greater (perhaps unfeasibly greater) computational resources than to simply optimize for them directly. As an extreme illustration, for a child in this alternate universe to figure out not to stub its toe, it would have to realize that doing so would slightly diminish its chances of reproducing twenty years later.

For pre-computation to be beneficial, there needs to be a relatively stable causal relationship between a proxy variable and the base objective such that optimizing for the proxy will consistently do well on the base objective. However, even an imperfect relationship might give a significant performance boost over robust alignment if it frees up the mesa-optimizer to put significantly more computational effort into optimizing its output. This analysis suggests that there might be pressure towards proxy alignment in complex training environments, since the more complex the environment, the more computational work pre-computation saves the mesa-optimizer. Additionally, the more complex the environment, the more potential proxy variables are available for the mesa-optimizer to use.

Furthermore, in the context of machine learning, this analysis suggests that a time complexity penalty (as opposed to a description length penalty) is a double-edged sword. In the second post, we suggested that penalizing time complexity might serve to reduce the likelihood of mesa-optimization. However, the above suggests that doing so would also promote pseudo-alignment in those cases where mesa-optimizers do arise. If the cost of fully modeling the base objective in the mesa-optimizer is large, then a pseudo-aligned mesa-optimizer might be preferred simply because it reduces time complexity, even if it would underperform a robustly aligned mesa-optimizer without such a penalty.

Compression of the mesa-optimizer. The description length of a robustly aligned mesa-optimizer may be greater than that of a pseudo-aligned mesa-optimizer. Since there are more pseudo-aligned mesa-objectives than robustly aligned mesa-objectives, pseudo-alignment provides more degrees of freedom for choosing a particularly simple mesa-objective. Thus, we expect that in most cases there will be several pseudo-aligned mesa-optimizers that are less complex than any robustly aligned mesa-optimizer.
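
The counting intuition can be made concrete with a toy example (the setup below is entirely invented): since every robustly aligned objective is in particular pseudo-aligned, the pseudo-aligned set is strictly larger, and its simplest member can only be at most as complex.

import itertools

# Toy setup: objectives are boolean functions of 3 observed features, encoded
# as 8-entry truth tables. "Pseudo-aligned" = agrees with the base objective
# on the 4 feature-settings seen in training; "robustly aligned" = agrees on
# all 8.
TRAIN = [0, 1, 2, 3]             # feature-settings that occur in training
BASE = (0, 1, 1, 0, 1, 0, 0, 1)  # the base objective's truth table (arbitrary)

candidates = list(itertools.product([0, 1], repeat=8))
pseudo = [c for c in candidates if all(c[i] == BASE[i] for i in TRAIN)]
robust = [c for c in candidates if c == BASE]

print(len(pseudo), len(robust))  # 16 vs 1: many more pseudo-aligned candidates,
                                 # hence more freedom to pick a particularly simple one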

This description cost is especially high if the learned algorithm's input data does not contain easy-to-infer information about how to optimize for the base objective. Biological evolution seems to differ from machine learning in this sense, since evolution's specification of the brain has to go through the information funnel of DNA. The sensory data that early humans received didn't allow them to infer the existence of DNA, nor the relationship between their actions and their genetic fitness. Therefore, for humans to have been aligned with evolution would have required them to have an innately specified model of DNA, as well as the various factors influencing their inclusive genetic fitness. Such a model would not have been able to make use of environmental information for compression, and thus would have required a greater description length. In contrast, our models of food, pain, etc. can be very short since they are directly related to our input data.

 

3.3. The base optimizer

We now turn to how the base optimizer is likely to affect the propensity for a machine learning system to produce pseudo-aligned mesa-optimizers.

Hard-coded optimization. In the second post, we suggested that hard-coding an optimization algorithm—that is to say, choosing a model with built-in optimization—could be used to remove some of the incentives for mesa-optimization. Similarly, hard-coded optimization may be used to prevent some of the sources of pseudo-alignment, since it may allow one to directly specify or train the mesa-objective. Reward-predictive model-based reinforcement learning might be one possible way of accomplishing this.(21) For example, an ML system could include a model directly trained to predict the base objective together with a powerful hard-coded optimization algorithm. Doing this bypasses some of the problems of pseudo-alignment: if the mesa-optimizer is trained to directly predict the base reward, then it will be selected to make good predictions even if a bad prediction would result in a good policy. However, a learned model of the base objective will still be underdetermined off-distribution, so this approach by itself does not guarantee robust alignment.
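
A minimal sketch of what such an architecture might look like (the interfaces below are hypothetical, not from any particular library): the only learned component is the reward model, while the optimizer itself is fixed, hand-written code, so there is no separately learned inner objective.

def hardcoded_planner(reward_model, state, candidate_actions):
    """Fixed, hand-coded optimizer: score each candidate action with the
    learned reward model and pick the best. By construction, the objective
    being optimized is the model's prediction of the base reward."""
    return max(candidate_actions, key=lambda action: reward_model(state, action))

# Usage sketch: reward_model would be trained by supervised learning to predict
# the base reward. Off-distribution, its predictions (and hence the planner's
# choices) remain underdetermined, which is the caveat noted above.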

Algorithmic range. We hypothesize that a model's algorithmic range will have implications for how likely it is to develop pseudo-alignment. One possible source of pseudo-alignment that could be particularly difficult to avoid is approximation error—if a mesa-optimizer is not capable of faithfully representing the base objective, then it can't possibly be robustly aligned, only approximately aligned. Even if a mesa-optimizer might theoretically be able to perfectly capture the base objective, the more difficult that is for it to do, the more we might expect it to be approximately aligned rather than robustly aligned. Thus, a large algorithmic range may be both a blessing and a curse: it makes it more likely that mesa-optimizers will be robustly aligned rather than approximately aligned, but it also increases the likelihood of getting a mesa-optimizer in the first place.

Subprocess interdependence. There are some reasons to believe that there might be more initial optimization pressure towards proxy-aligned than robustly aligned mesa-optimizers. In a local optimization process, each parameter of the learned algorithm (e.g. the parameter vector of a neuron) is adjusted to locally improve the base objective conditional on the other parameters. Thus, the benefit for the base optimizer of developing a new subprocess will likely depend on what other subprocesses the learned algorithm currently implements. Therefore, even if some subprocess would be very beneficial if combined with many other subprocesses, the base optimizer may not select for it until the subprocesses it depends on are sufficiently developed. As a result, a local optimization process would likely result in subprocesses that have fewer dependencies being developed before those with more dependencies.

In the context of mesa-optimization, the benefit of a robustly aligned mesa-objective seems to depend on more subprocesses than at least some pseudo-aligned mesa-objectives. For example, consider a side-effect aligned mesa-optimizer optimizing for some set of proxy variables. Suppose that it needs to run some subprocess to model the relationship between its actions and those proxy variables. If we assume that optimizing the proxy variables is necessary to perform well on the base objective, then for a mesa-optimizer to be robustly aligned, it would also need to model the causal relationship between those proxy variables and the base objective, which might require additional subprocesses. Moreover, the benefit to the base optimizer of adding those additional subprocesses depends on the mesa-optimizer already having the subprocesses that model the relationship between its actions and the proxy variables. This informal argument suggests that if a mesa-optimizer's computation neatly factors in this way, then developing a robustly aligned mesa-objective may require strictly more subprocesses than developing a pseudo-aligned mesa-objective.

This suggests that, at least in a local optimization process, mesa-optimizers might tend to start their development as proxy-aligned before becoming robustly aligned. In other words, rather than simultaneously gaining competence and becoming aligned, we might expect such a system to first become competent at optimizing proxies, then possibly start becoming more robustly aligned.

Internals-based selection. In current machine learning systems, learned algorithms are selected primarily based on how their policies perform on the base objective. There is often some selection based on the internal structure of the learned algorithm, such as weight regularization, but such selection is usually very coarse. One possible idea for addressing pseudo-alignment could be to use a selection method based on a more sophisticated function of a learned algorithm's internal workings. This approach is similar in spirit to hard-coded optimization: we use knowledge of the system's internal workings to prevent it from being proxy-aligned.

One possibility might be to develop a meta-learner that is specifically trained to prevent pseudo-alignment. While this seems beyond current meta-learning techniques, the basic principle is similar to existing meta-learners designed to optimize for robustness. An extreme form of internals-based selection might have the base optimizer reason about or prove properties of the learned algorithm as part of its selection process.

While desirable, such methods are currently out of reach primarily due to the limitations of current transparency methods. One possible concern with all of these approaches, however, is that they might require the base optimizer to be better at searching for problems than the mesa-optimizer is at hiding them. Furthermore, in order to create a base optimizer that can do this, we may need to first train it using a third optimizer, which might result in the same problem at this second layer.

Adversarial training. One possible method to alleviate some of these concerns might be to sample environments adversarially from a very wide distribution. With such adversarial training, it might be possible to overcome these hurdles and make the robustly aligned objective the most reachable one. This would require making it so that, for every mesa-objective O_mesa that is more reachable than the base objective O_base, there is some situation where O_mesa and O_base disagree and the mesa-optimizer is incentivized to agree with O_base.

If we model reachability of an objective as simply its length in bits, then distinguishing O_base from every single more reachable O_mesa gets exponentially harder as O_base gets more complex. Thus, for a very complicated O_base, sufficiently incentivizing the base optimizer to find a mesa-optimizer with that O_base is likely to be very difficult, though not impossible.
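
To unpack the counting behind this claim, here is a rough sketch (assuming, purely for illustration, that objectives are bit-strings and that reachability is exactly description length). If O_base is n bits long, the number of strictly more reachable candidate objectives is

\[
  \#\{\, O_{\mathrm{mesa}} : |O_{\mathrm{mesa}}| < |O_{\mathrm{base}}| \,\}
  \;=\; \sum_{k=0}^{n-1} 2^{k}
  \;=\; 2^{n} - 1.
\]

Adversarial training has to exhibit, for each such candidate that would otherwise survive, some sampled situation on which it visibly disagrees with O_base. A single situation can rule out many candidates at once, but the pool to be ruled out grows exponentially in n, which is the sense in which the problem gets exponentially harder as O_base gets more complex.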

Even in such a case, however, there would likely still be a period of time where the learned algorithm is a misaligned mesa-optimizer, leaving open an ominous possibility: the misaligned mesa-optimizer could figure out the correct actions to take based on O_base while its objective function was still O_mesa. We will call this situation deceptive alignment and will discuss it at greater length in the next post.

 

The fourth post in the Mesa-Optimization Sequence, titled “Deceptive Alignment,” will be released in two days.

Glossary | Bibliography



Discuss

[Site Status] Warning: We are currently experiencing an issue with saving and publishing posts. Fix underway.

June 4, 2019 - 02:42
Published on June 3, 2019 11:42 PM UTC

This post is a heads-up to users that the team has identified a bug which is preventing the "save to draft" or "publish changes" buttons from working, instead returning 504s.

The team is actively fixing this now.



Discuss

Washington SSC Meetup

June 3, 2019 - 22:30
Published on June 3, 2019 7:30 PM UTC

Slate Star Codex discussion meetup for Washington, DC.

The meetup is held in the second-floor lounge during rainy weather, or on the rooftop if sunny.



Discuss

Can movement from Conflict to Mistake theorist be facilitated effectively?

June 3, 2019 - 20:31
Published on June 3, 2019 5:02 PM UTC

What are the common push and pull factors that facilitate an individual's movement from Conflict theorist to Mistake theorist? What are the transitional positions, and how can they best be described? Can a technique be created to facilitate this transition?



Discuss

Our plan for 2019-2020: consulting for AI Safety education

June 3, 2019 - 19:51
Published on June 3, 2019 4:51 PM UTC

Tl;dr: a conversation with a grantmaker made us drop our long-held assumption that outputs needed to be concrete to be recognized. We decided to take a step back and approach the improvement of the AI Safety pipeline on a more abstract level, doing consulting and research to develop expertise in the area. This will be our focus in the next year.

Trial results

We tested our course in April. We didn't get a positive result. It looks like this was due to bad test design: high variance and a low number of participants clouded any pattern that could have emerged. In hindsight, we clearly should have tested knowledge before the intervention as well as after it, though arguably this would have been nearly impossible given the one-month deadline that our funder imposed.

What we did learn is that we are largely unaware of the extent to which our course is being used. This is mostly due to using software that is not yet mature enough to give us this kind of data. If we want to continue building the course, we feel that our first priority ought to be to set up a feedback mechanism that gives us precise insight into how students are journeying through it.

However, other developments have pointed our attention away from developing the course, and towards developing the question that the course is an answer to.

If funding wasn’t a problem

During the existence of RAISE, its runway has never been longer than about 2 months. This crippled our ability to make long-term decisions, in favor of dishing out quick results to show value. Seen from a "quick feedback loops" paradigm, this may have been a healthy dynamic. It also led to sacrifices that we didn't actually want to make.

Had we been tasked with our particular niche without any funding constraints, our first move would have been to do extensive study into what the field needs. We feel that EA is missing a management layer. There is a lot that a community-focused management consultant could do, simply by connecting all the dots and coordinating the many projects and initiatives that exist in the LTF space. We have identified 30 (!) small and large organisations that are involved in AI Safety. Not all of them are talking to each other, or even aware of each other.

Our niche being AI Safety education, we would have spent a good 6 months developing expertise and a network in this area. We would have studied the scientific frontiers of relevant domains like education and the metasciences. We would have interviewed AIS organisations and asked them what they look for in employees. We would have studied existing alignment researchers and looked for patterns, talked to grantmakers, and considered their models.

Funding might not be a problem

After getting turned down by the LTF fund (which was especially meaningful because they didn’t seem to be constrained by funding), we had a conversation with one of their grantmakers. The premise of the conversation was something like “what version of RAISE would you be willing to fund?” The answer was pretty much what we just described. They thought pipeline improvement was important, but hard, and just going with the first idea that sounds good (an online course) would be a lucky shot if it worked. Instead, someone should be thinking about the bigger picture first.

The mistake we had been making from the beginning was to assume we needed concrete results to be taken seriously.

Our new direction

EA really does seem to be missing a management layer. People are thinking about their careers, starting organisations, doing direct work and research. Not many people are drawing up plans for coordination on a higher level and telling people what to do. Someone ought to be dividing up the big picture into roles for people to fill. You can see the demand for this by how seriously we take 80k. They’re the only ones doing this beyond the organisational level.

Much the same holds in the cause area we call AI Safety Education. Most AIS organisations are necessarily thinking about hiring and training, but no one is specializing in it. In the coming year, our aim is to fill this niche, building expertise and doing management consulting. We will aim to smarten up the coordination there. Concrete outputs might be:

  • Advice for grantmakers that want to invest in the AI Safety researcher pipeline
  • Advice for students that want to get up to speed and test themselves quickly
  • Suggesting interventions for entrepreneurs that want to fill up gaps in the ecosystem
  • Publishing thinkpieces that advance the discussion of the community, like this one
  • Creating and keeping wiki pages about subjects that are relevant to us
  • Helping AIS research orgs with their recruitment process

We’re hiring

Do you think this is important? Would you like to fast-track your involvement with the Xrisk community? Do you have good google-fu, or would you like to conduct in-depth interviews with admirable people? Most importantly, are you not afraid to hack your own trail?

We think we could use one or two more people to join us in this effort. You’d be living for free in the EA Hotel. We can’t promise any salary in addition to that. Do ask us for more info!

Let's talk

A large part of our work will involve talking to those involved in AI Safety. If you are working in this field, and interested in working on the pipeline, then we would like to talk to you.

If you have important information to share, have been plotting to do something in this area for a while, and want to compare perspectives, then we would like to talk to you.

And even if you would just like to have an open-ended chat about any of this, we would like to talk to you!

You can reach us at raise@aisafety.info





Discuss

To first order, moral realism and moral anti-realism are the same thing

June 3, 2019 - 18:04
Published on June 3, 2019 3:04 PM UTC


I've taken a somewhat caricatured view of moral realism[1], describing it, essentially, as the random walk of a process defined by its "stopping" properties.

In this view, people start improving their morality according to certain criteria (self-consistency, simplicity, what they would believe if they were smarter, etc.) and continue doing so until the criteria are finally met. Because there is no way of knowing how "far" this process can continue before the criteria are met, it can drift very far indeed from its starting point.
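
A toy model of this picture (all parameters below are invented): revise repeatedly until the stopping criteria happen to be satisfied; since the stopping time is unbounded, the endpoint scatters widely and can land far from the start.

import random

random.seed(1)

def improve_until_criteria_met(p_met=0.01):
    # one-dimensional stand-in for "distance from the starting morality"
    position = 0.0
    while random.random() > p_met:      # criteria not yet satisfied
        position += random.gauss(0, 1)  # one self-improvement revision
    return position

print([round(improve_until_criteria_met(), 1) for _ in range(5)])
# endpoints scatter widely -- often far from the starting point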

Now I would like to be able to argue, from a very anti-realist perspective, that:

  • Argument A: I want to be able to judge that morality α is better than morality β, based on some personal intuition or judgement of correctness. I want to be able to judge that β is alien and evil, even if it is fully self-consistent according to formal criteria, while α is not fully self-consistent.
Moral realists look like moral anti-realists

Now, I maintain that this "random walk to stopping point" is an accurate description of many (most?) moral realist systems. But it's a terrible description of moral realists. In practice, most moral realists allow for the possibility of moral uncertainty, and hence that their preferred approach might have a small chance of being wrong.

And how would they identify that wrongness? By looking outside the formal process, and checking if the path that the moral "self-improvement" is taking is plausible, and doesn't lead to obviously terrible outcomes.

So, to pick one example from Wei Dai:

I’m envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won’t be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

If the moral realist approach included getting into conversations with such systems and thus getting randomly subverted, then the moral realists I know would agree that the approach had failed, no matter how internally consistent it seems. Thus, they allow, in practice, some considerations akin to Argument A: where the moral process ends up (or at least the path that it takes) can affect their belief that the moral realist conclusion is correct.

So moral realists, in practice, do have conditional meta-preferences that can override their moral realist system. Indeed, most moral realists don't have a fully-designed system yet, but have a rough overview of what they want, with some details they expect to fill in later; from the perspective of here and now, they have some preferences, some strong meta-preferences (on how the system should work) and some conditional meta-preferences (on how the design of the system should work, conditional on certain facts or arguments they will learn later).

Moral anti-realists look like moral realists

Enough picking on moral realists; let's look now at moral anti-realists, which is relatively easy for me as I'm one of them. Suppose I were to investigate an area of morality that I haven't investigated before; say, the political theory of justice.

Then, I would expect that as I investigated this area, I would start to develop better categories than what I have now, with crisper and more principled boundaries. I would expect to meet arguments that would change how I feel and what I value in these areas. I would apply simplicity arguments to make more elegant the hodgepodge of half-baked ideas that I currently have in that area.

In short, I would expect to engage in moral learning. Which is a peculiar thing for a moral anti-realist to expect...

The first-order similarity

So, to generalise a bit across the two categories:

  1. Moral realists are willing to question the truth of their systems based on facts about the world that should formally be irrelevant to that truth, and use their own private judgement in these cases.
  2. Moral anti-realists are willing to engage in something that looks like moral learning.

Note that the justifications of the two points of view are different - the moral realist can point to moral uncertainty, the moral anti-realist to personal preferences for a more consistent system. And the long-term perspectives are different: the moral realist expects that their process will likely converge to something with fantastic properties, the moral anti-realist thinks it likely that the degree of moral learning is sharply limited, only a few "iterations" beyond their current morality.

Still, in practice, and to a short-term, first-order approximation, moral realists and moral anti-realists seem very similar. Which is probably why they can continue to have conversations and debates that are not immediately pointless.

  1. I apologise for my simplistic understanding and definitions of moral realism. However, my partial experience in this field has been enough to convince me that there are many incompatible definitions of moral realism, and many arguments about them, so it's not clear there is a single simple thing to understand. So I've tried to define it very roughly, enough so that the gist of this post makes sense. ↩︎



Discuss

Conditional meta-preferences

June 3, 2019 - 17:09
Published on June 3, 2019 2:09 PM UTC

I just want to make the brief point that many human meta-preferences are conditional.

Sure, we have "I'd want to be more generous", or "I'd want my preferences to be more consistent". But there are many variations of "I'd want to believe in a philosophical position if someone brings me a very convincing argument for it" and, to various degrees of implicitness or explicitness, "I'd want to stop believing in cause X if implementing it leads to disasters".

Some are a mix of conditional and anti-conditional: "I'd want to believe in X even if there was strong evidence against it, but if most of my social group turns against X, then I would want to too".

The reason for this stub of a post is that when I think of meta-preferences, I generally think of them as conditional; yet I've read some comments by people that imply that they think that I think of meta-preferences in an un-conditional way[1]. So I made this post to have a brief reference point.

Indeed, in a sense, every attempt to come up with normative assumptions to bridge the is-ought gap in value learning is an attempt to explicitly define the conditional dependence of preferences upon the facts of the physical world.

Defining meta-preferences that way is not a problem, and bringing the definition into the statement of the meta-preference is not a problem either. In many cases, whether we label something conditional or non-conditional is a matter of taste, or of whether we'd done the updating ahead of time or not. Contrast "I love chocolate" with "I love delicious things" plus the observation "I find chocolate delicious", and with "conditional on it being delicious, I would love chocolate" (plus "I find chocolate delicious").

  1. This sentence does actually make sense. ↩︎



Discuss

Does Bayes Beat Goodheart?

June 3, 2019 - 05:31
Published on June 3, 2019 2:31 AM UTC

Stuart Armstrong has claimed to beat Goodheart with Bayesian uncertainty -- rather than assuming some particular objective function (which you try to make as correct as possible), you represent uncertainty over objective functions. A similar claim was made in The Optimizer's Curse and How to Beat It, the essay which introduced a lot of us to ... well, not Goodheart's Law itself (the post doesn't make mention of Goodheart), but that kind of failure. I myself claimed that Bayes beats regressional Goodheart, in Robust Delegation:

I now think this isn't true -- Bayes' Law doesn't beat Goodheart fully. It doesn't even beat regressional Goodheart fully. (I'll probably edit Robust Delegation to change the claim at some point.)

(Stuart makes some more detailed claims about AI and the nearest-unblocked-strategy problem which aren't exactly claims about Goodheart, at least according to him. I don't fully understand Stuart's perspective, and don't claim to directly address it here. I am mostly only addressing the question of the title of my post: does Bayes beat Goodheart?)

If approximate solutions are concerning, why would mixtures of them be unconcerning?

My first argument is a loose intuition: Goodheartian phenomena suggest that somewhat-correct-but-not-quite-right proxy functions are not safe to optimize (and in some sense, the more optimization pressure is applied, the less safe we expect it to be). Assigning weights to a bunch of somewhat-but-not-quite-right possibilities just gets us another somewhat-but-not-quite-right possibility. Perhaps it is able to bear more optimization pressure before breaking down, by virtue of being less incorrect. But why would we get anything stronger than that?

My intuition there doesn't address the gears of the situation adequately, though. Let's get into it.

Overcoming regressional Goodheart requires calibrated learning.

In Robust Delegation, I defined regressional Goodheart through the predictable-disappointment idea. Does Bayesian reasoning eliminate predictable disappointment?

Well, it depends on what is meant by "predictable". You could define it as predictable-by-Bayes, in which case it follows that Bayes solves the problem. However, I think it is reasonable to at least add a calibration requirement: there should be no way to systematically correct estimates up or down as a function of the expected value.

Calibration seems like it does, in fact, significantly address regressional Goodheart. You can't have seen a lot of instances of an estimate being too high, and still accept that too-high estimate. It doesn't address extremal Goodheart, because calibrated learning can only guarantee that you eventually calibrate, or converge at some rate, or something like that -- extreme values that you've rarely encountered would remain a concern.
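
A toy simulation of this requirement (all numbers invented): naively taking the argmax over noisy estimates yields a predictably positive gap between the chosen estimate and its true value, and a correction applied as a function of the estimate removes it. Here the correct shrinkage happens to be the Bayesian posterior mean for this particular generating process, and it is also what a calibrated learner would converge to empirically.

import random

random.seed(0)
N_OPTIONS, TRIALS = 20, 10_000
naive_gaps, corrected_gaps = [], []
for _ in range(TRIALS):
    true_values = [random.gauss(0, 1) for _ in range(N_OPTIONS)]
    estimates = [v + random.gauss(0, 1) for v in true_values]  # unit noise
    best = max(range(N_OPTIONS), key=lambda i: estimates[i])
    naive_gaps.append(estimates[best] - true_values[best])
    # with prior N(0,1) and unit noise, the posterior mean is estimate / 2
    corrected_gaps.append(estimates[best] / 2 - true_values[best])

print(sum(naive_gaps) / TRIALS)      # clearly positive (~1.3): predictable disappointment
print(sum(corrected_gaps) / TRIALS)  # ~0: the right shrinkage removes it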

(Stuart's "one-in-three" example in the Defeating Goodheart post, and his discussion of human overconfidence more generally, is somewhat suggestive of calibration.)

Bayesian methods are not always calibrated. Calibrated learning is not always Bayesian. (For example, logical induction has good calibration properties, and so far, hasn't gotten a really satisfying Bayesian treatment.)

This might be confusing if you're used to thinking in Bayesian terms. If you think in terms of the diagram I copied from Robust Delegation, above: you have a prior which stipulates the probability of true utility y given observation x; your expectation g(x) is the expected value of y for a particular value of x; g(x) is not predictably
correctable with respect to your prior. What's the problem?

The problem is that this line of reasoning assumes that your prior is objectively correct. This doesn't generally make sense (especially from a Bayesian perspective). So, it is perfectly consistent for you to collect many observations, and see that g(x) has some systematic bias. This may remain true even as you update on those observations (because Bayesian learning doesn't guarantee any calibration property in general!).
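To make this concrete, here is a minimal sketch (the setup is mine: a quadratic truth, a linear-only hypothesis class, Gaussian noise) of a Bayesian whose posterior expectation g(x) remains predictably correctable no matter how much it updates:

```python
# Non-realizable setting: the truth y = x^2 + noise is outside the agent's
# hypothesis class {y = a*x + noise}, so no amount of updating fixes the bias.
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0, 2, size=10_000)
ys = xs**2 + rng.normal(0, 0.1, size=xs.shape)

# Conjugate Bayesian update for the single parameter a:
# prior a ~ N(0, 1), noise precision beta = 1/0.1^2.
prior_mean, prior_prec, beta = 0.0, 1.0, 100.0
post_prec = prior_prec + beta * np.sum(xs**2)
post_mean = (prior_prec * prior_mean + beta * np.sum(xs * ys)) / post_prec

# g(x) = posterior expectation of y given x.
test_x = np.array([0.1, 1.0, 1.9])
print("g(x):   ", np.round(post_mean * test_x, 2))   # ~ [0.15, 1.5, 2.85]
print("E[y|x]: ", np.round(test_x**2, 2))            # [0.01, 1.0, 3.61]
# The residual y - g(x) is predictable from x alone (negative for small x,
# positive for large x): a systematic bias that more data of the same kind
# will never repair, because no hypothesis in the class is the true one.
```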

The faulty assumption that your probability distribution is correct is often replaced with the (weaker, but still problematic) assumption that at least one hypothesis within your distribution is objectively correct -- the realizability assumption.

Bayesian solutions assume realizability.

As discussed in Embedded World Models, the realizability assumption is the assumption that (at least) one of your hypotheses represents the true state of affairs. Bayesian methods often (though not always) require a realizability assumption in order to get strong guarantees. Frequentist methods rarely require such an assumption (whatever else you may say about frequentist methods). Calibration is an example of that -- a Bayesian can get calibration under the assumption of realizability, but we might want a stronger guarantee of calibration which holds even in the absence of realizability.

"We quantified our uncertainty as best we could!"

One possible Bayes-beats-Goodhart argument is: "Once we quantify our uncertainty with a probability distribution over possible utility functions, the best we can possibly do is to choose whatever maximizes expected value. Anything else is decision-theoretically sub-optimal."

Do you think that the true utility function is really sampled from the given distribution, in some objective sense? And the probability distribution also quantifies all the things which can count as evidence? If so, fine. Maximizing expectation is the objectively best strategy. This eliminates all types of Goodhart by positing that we've already modeled the possibilities sufficiently well: extremal cases are modeled correctly; adversarial effects are already accounted for; etc.

However, this is unrealistic due to embeddedness: the outside world is much more complicated than any probability distribution which we can explicitly use, since we are ourselves a small part of that world.

Alternatively, do you think the probability distribution really codifies your precise subjective uncertainty? Ok, sure, that would also justify the argument.

Realistically, though, an implementation of this isn't going to be representing your precise subjective beliefs (to the extent you even have precise subjective beliefs). It has to hope to have a prior which is "good enough".

In what sense might it be "good enough"?

An obvious problem is that a distribution might be overconfident in a wrong conclusion, which will obviously be bad. The fix for this appears to be: make sure that the distribution is "sufficiently broad", expressing a fairly high amount of uncertainty. But, why would this be good?

Well, one might argue: it can only be worse than our true uncertainty to the extent that it ends up assigning too little weight to the correct option. So, if the probability assigned to each possibility we intuitively find non-negligible isn't too small, things should be fine.

"The True Utility Function Has Enough Weight"

First, even assuming the framing of "true utility function" makes sense, it isn't obvious to me that the argument makes sense.

If there's a true utility function $u_{\text{true}}$ which is assigned weight $w_{u_{\text{true}}}$, and we apply a whole lot of optimization pressure to the overall mixture distribution, then it is perfectly possible that $u_{\text{true}}$ gets compromised for the sake of satisfying a large number of other $u_i$. The weight determines a ratio at which trade-offs can occur, not a ratio of the overall resources which we will get or anything like that.

A first-pass analysis is that $w_{u_{\text{true}}}$ has to be more than 1/2 to guarantee any consideration; any weight less than that, and it's possible that $u_{\text{true}}$ is as low as it can go in the optimized solution, because some outcome was sufficiently good for all other potential utility functions that it made sense to "take the hit" with respect to $u_{\text{true}}$. We can't formally say "this probably won't happen, because the odds that the best-looking option is specifically terrible for $u_{\text{true}}$ are low" without assuming something about the distribution of highly optimized solutions.

(Such an analysis might be interesting; I don't know if anyone has investigated from that angle. But, it seems somewhat unlikely to do us good, since it doesn't seem like we can make very nice assumptions about what highly-optimized solutions look like.)
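Still, the 1/2 threshold itself is easy to see in a toy instance (the numbers below are mine, with utilities scaled to [0, 1]):

```python
# With weight < 1/2 on the true utility, the mixture-maximizing option
# can be the worst-case option for u_true.
options = ["A", "B"]
u_true  = {"A": 1.0, "B": 0.0}   # hypothetical "true" utility
u_wrong = {"A": 0.0, "B": 1.0}   # a single wrong hypothesis
w = 0.49                         # weight on u_true, just under 1/2

mixture = lambda o: w * u_true[o] + (1 - w) * u_wrong[o]
best = max(options, key=mixture)
print(best, u_true[best])        # -> B 0.0: u_true is driven to its minimum
# At any w > 1/2 the maximizer would pick A instead; 1/2 is exactly the
# point at which u_true can no longer be fully sacrificed.
```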

In reality, the worst-case analysis is better than this, because many of the more-plausible $u_i$ should have a lot of "overlap" with $u_{\text{true}}$; after all, they were given high weight because they appeared plausible somehow (they agreed with human intuitions, or predicted human behavior, etc). We could try to formally define "overlap" and see what assumptions we need to guarantee better-than-worst-case outcomes. (This might have some interesting learning-theoretic implications for value learning, even.)

However, this whole framing, where we assume that there's a $u_{\text{true}}$ and think about its weight, is suspect. Why should we think that there's a "true" utility function which captures our preferences? And, if there is, why should we assume that it has an explicit representation in the hypothesis space?

If we drop this assumption, we get the classical problems associated with non-realizability in Bayesian learning. Beliefs may not converge at all, as evidence accumulates; they could keep oscillating due to inconsistent evidence. Under the interpretation where we still assume a "true" utility function but we don't assume that it is explicitly representable within the hypothesis space, there isn't a clear guarantee we can get (although perhaps the "overlap" analysis can help here). If we don't assume a true utility function at all, then it isn't clear how to even ask questions about how well we do (although I'm not saying there isn't a useful analysis -- I'm just saying that it is unclear to me right now).

Stuart does address this question, in the end:

I've argued that an indescribable hellworld cannot exist. There's a similar question as to whether there exists human uncertainty about U that cannot be included in the AI's model of Δ. By definition, this uncertainty would be something that is currently unknown and unimaginable to us. However, I feel that it's far more likely to exist, than the indescribable hellworld.

Still, despite that issue, it seems to me that there are methods of dealing with the Goodhart problem/nearest unblocked strategy problem. And this involves properly accounting for all our uncertainty, directly or indirectly. If we do this well, there no longer remains a Goodhart problem at all.

Perhaps I agree, if "properly accounting for all our uncertainty" includes robustness properties such as calibrated learning, and if we restrict our attention to regressional Goodhart, ignoring the other three.

Well... what about the others, then?

Overcoming adversarial Goodhart seems to require randomization.

The argument here is pretty simple: adversarial Goodhart enters into the domain of game theory, in which mixed strategies tend to be very useful. Quantilization is one such mixed strategy, which seems to usefully address Goodhart to a certain extent. I'm not saying that quantilization is the ultimate solution here. But, it does seem to me like quantilization is significant enough that a solution to Goodhart should say something about the class of problems which quantilization solves.
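For concreteness, here is a minimal quantilizer sketch (the base distribution, proxy utility, and parameter values are stand-ins of mine, not a canonical implementation):

```python
import random

def quantilize(base_sample, proxy_utility, q=0.1, n=1000):
    """Sample uniformly from the top-q fraction of n base-distribution draws,
    instead of taking the argmax of the (possibly-Goodhartable) proxy."""
    candidates = [base_sample() for _ in range(n)]
    candidates.sort(key=proxy_utility, reverse=True)
    top = candidates[: max(1, int(q * n))]
    return random.choice(top)

# Toy usage: base distribution uniform on [0, 1]; proxy rewards large x.
action = quantilize(lambda: random.random(), lambda x: x, q=0.05)
# Shrinking q applies more optimization pressure; q = 1/n recovers argmax.
```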

In particular, a property of quantilization which I find appealing is the way that more certainty about the utility function implies that more optimization power can safely be applied to making decisions. This informs my intuition that applying arbitrarily high optimization power does not become safe simply because you've explicitly represented uncertainty about utility functions: no matter how accurately you represent it (short of "perfectly accurately", which isn't even a meaningful concept), that only seems to justify a limited amount of optimization pressure. This story may be an incorrect one, but if so, I'd like to really understand why it is incorrect.

Unlike the previous sections, this doesn't necessarily step outside of typical Bayesian thought, since this kind of game-theoretic thinking is more or less within the purview of Bayesianism. However, the simple "Bayes solves Goodhart" story doesn't explicitly address this.

(I haven't addressed causal Goodhart anywhere in this essay, since it opens up the whole decision-theoretic can of worms, which seems somewhat beside the main point. (I suppose, arguably, game-theoretic concerns could be beside the point as well -- but, they feel more directly relevant to me, since quantilization is fairly directly about solving Goodhart.))

In summary:
  • If optimizing an arbitrary somewhat-but-not-perfectly-right utility function gives rise to serious Goodhart-related concerns, then why does a mixture distribution over such functions alleviate such concerns? Aren't they just averaging together to yield yet another somewhat-but-not-quite-right function?
  • Regressional Goodhart seems better-addressed by calibrated learning than it does by Bayesian learning.
  • Bayesian learning tends to require a realizability assumption in order to have good properties (including calibration).
  • Even assuming realizability, heavily optimizing a mixture distribution over possible utility functions seems dicey -- it can end up throwing away all the real value if it finds a way to jointly satisfy a lot of the wrong ones. (It is possible that we can find reasonable assumptions under which this doesn't happen, however.)
  • Overcoming adversarial Goodhart seems to require mixed strategies, which the simple "Bayesian uncertainty" story doesn't explicitly address.


Discuss

Cambridge LW/SSC Meetup

June 3, 2019 - 04:55
Published on June 3, 2019 1:55 AM UTC

This is the monthly Cambridge, MA LessWrong / Slate Star Codex meetup.


Note: The meetup is in apartment 2 (the address box here won't let me include the apartment number).



Discuss

What is the evidence for productivity benefits of weightlifting?

June 2, 2019 - 22:17
Published on June 2, 2019 7:17 PM UTC

I've been weightlifting for a while, and I've heard vaguely good things about its effects on productivity, like a general increase in energy levels. A recent quick Google search session came up empty. If someone looks into the literature and finds something interesting I'll pay a $50 prize.*

Assume the time horizon is <5 years. I'd prefer answers focus predominantly on productivity benefits. Effects on cardiovascular health could be part of an analysis, but would not qualify on their own. If the evidence is for something clearly linked to productivity, like sleep, I'd count that. Introspective evidence will also not qualify. Comparisons to other forms of exercise would be especially interesting. Assume a healthy individual, although I'm at least somewhat interested in effects on individuals with depression or anxiety given their prevalence.

*Prize to go to best answer, as judged by me, if there are any that meet some minimal threshold of rigor, also as judged by me.



Discuss

On alien science

June 2, 2019 - 17:50
Published on June 2, 2019 2:50 PM UTC

In his book The Fabric of Reality, David Deutsch makes the case that science is about coming up with good and true explanations, with all other considerations being secondary. This clashes with the more conventional view that the goal of science is to allow us to make accurate predictions - see for example this quote from the Nobel prize-winning physicist Steven Weinberg:
“The important thing is to be able to make predictions about images on the astronomers’ photographic plates, frequencies of spectral lines, and so on, and it simply doesn’t matter whether we ascribe these predictions to the physical effects of gravitational fields on the motion of planets and photons [as in pre-Einsteinian physics] or to a curvature of space and time.”
It’s true that a key trait of good explanations is that they can be used to make accurate predictions, but I think that taking prediction to be the point of doing science is misguided in a few ways.

Firstly, on a historical basis, many of the greatest scientists were clearly aiming for explanation not prediction. Astronomers like Copernicus and Kepler knew what to expect when they looked at the sky, but spent their lives searching for the reason why it appeared that way. Darwin knew a lot about the rich diversity of life on earth, but wanted to know how it had come about. Einstein was trying to reconcile Maxwell’s equations, the Michelson-Morley experiment, and classical mechanics. Predictions are often useful to verify explanations, but they’re rarely the main motivating force for scientists. And often they’re not the main reason why a theory should be accepted, either. Consider three of the greatest theories of all time: Darwinian evolution, Newtonian mechanics and Einsteinian relativity. In all three cases, the most compelling evidence for them was their ability to cleanly explain existing observations that had previously baffled scientists.

We can further clarify the case for explanation as the end goal of science by considering a thought experiment from Deutsch’s book. Suppose we had an “experiment oracle” that could predict the result of any experiment, but couldn’t tell us why it would turn out that way. In that case, I think experimental science would probably fade away, but the theorists would flourish, because it’d be more important than ever to figure out what questions to ask! Deutsch’s take on this:
“If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction – even perfect, universal prediction – is simply no substitute for explanation.”
The question is now: how does this focus on explanations tie in to other ideas which are emphasised in science, like falsifiability, experimentalism, academic freedom and peer review? I find it useful to think of these aspects of science less as foundational epistemological principles, and more as ways to counteract various cognitive biases which humans possess. In particular:
  1. We are biased towards sharing the beliefs of our ingroup members, and forcing our own upon them.
  2. We’re biased towards aesthetically beautiful theories which are simple and elegant.
  3. Confirmation bias makes us look harder for evidence which supports our beliefs than for evidence which weighs against them.
  4. Our observations are by default filtered through our expectations and our memories, which makes them unreliable and low-fidelity.
  5. If we discover data which contradicts our existing theories, we find it easy to confabulate new post-hoc explanations to justify the discrepancy.
  6. We find it psychologically very difficult to actually change our minds.

We can see that many key features of science counteract these biases:
  1. Science has a heavy emphasis on academic freedom to pursue one’s own interests, which mitigates pressure from other academics. Nullius in verba, the motto of the Royal Society (“take nobody’s word for it”) encourages independent verification of others’ ideas.
  2. Even the most beautiful theories cannot overrule conflicting empirical evidence.
  3. Scientists are meant to attempt to experimentally falsify their own theories, and their attempts to do so are judged by their peers. Double-blind peer review allows scientists to feel comfortable giving harsher criticisms without personal repercussions.
  4. Scientists should aim to collect precise and complete data about experiments.
  5. Scientists should pre-register their predictions about experiments, so that it’s easy to tell when the outcome weighs against a theory.
  6. Science has a culture of vigorous debate and criticism to persuade people to change their minds, and norms of admiration for those who do so in response to new evidence.

But imagine an alien species with the opposite biases:
  1. They tend to trust the global consensus, rather than the consensus of those directly around them.
  2. Their aesthetic views are biased towards theories which are very data-heavy and account for lots of edge cases.*
  3. When their views diverge from the global consensus, they look harder for evidence to bring themselves back into line than for evidence which supports their current views.
  4. Their natural senses and memories are precise, unbiased and high-resolution.
  5. When they discover data which contradicts their theories, they find it easiest to discard those theories rather than reformulating them.
  6. They change their minds a lot.

In this alien species, brave iconoclasts who pick an unpopular view and research it extensively are much less common than they are amongst humans. Those who try to do so end up focusing on models with (metaphorical or literal) epicycles stacked on epicycles, rather than the clean mathematical laws which have actually turned out to be more useful for conceptual progress in many domains. In formulating their detailed, pedantic models, they pay too much attention to exhaustively replaying their memories of experiments, and not enough to what concepts might underlie them. And even if some of them start heading in the right direction, a few contrary pieces of evidence would be enough to turn them back from it - for example, their heliocentrists might be thrown off track by their inability to observe stellar parallax. Actually, if you’re not yet persuaded that this alien world would see little scientific progress, you should read my summary of The Sleepwalkers. In that account of the early scientific revolution, any of the alien characteristics above would have seriously impeded key scientists like Kepler, Galileo and others (except perhaps the eidetic memories).

And so the institutions which actually end up pushing forward scientific progress on their world would likely look very different from the ones which did so on ours. Their Alien Royal Society would encourage them to form many small groups which actively reinforced each other’s idiosyncratic views and were resistant to outside feedback. They should train themselves to seek theoretical beauty rather than empirical validation - and actually, they should pay much less attention to contradictory evidence than members of their species usually do. Even when they’re tempted to change their minds and discard a theory, they should instead remind themselves of how well it post-hoc explains previous data, and put effort into adjusting it to fit the new data, despite how unnatural doing so seems to them. Those who change their minds too often when confronted with new evidence should be derided as wishy-washy and unscientific.

These scientific norms wouldn’t be enough to totally reverse their biases, any more than our scientific norms make us rejoice when our pet theory is falsified. But in both cases, they serve as nudges towards a central position which is less burdened by species-contingent psychological issues, and better at discovering good explanations.


* Note that this might mean the aliens have different standards for what qualifies as a good explanation than we do. But I don’t think this makes a big difference. Suppose that the elegant and beautiful theory we are striving for is a small set of simple equations which governs all motion in the solar system, and the elegant and beautiful theory they are striving for is a detailed chart which traces out the current and future positions of all objects in the solar system. It seems unlikely that they could get anywhere near the latter without using Newtonian gravitation. So a circular-epicycle model of the solar system would be a dead end even by the aliens’ own standards.

Discuss

Moral Mazes and Short Termism

June 2, 2019 - 14:30
Published on June 2, 2019 11:30 AM UTC

Previously: Short Termism and Quotes from Moral Mazes

Epistemic Status: Long term

My list of quotes from Moral Mazes has a section of twenty devoted to short term thinking. It fits with, and gives internal gears and color to, my previous understanding of the problem of short termism.

Much of what we think of as a Short Term vs. Long Term issue is actually an adversarial Goodhart’s Law problem, or a legibility vs. illegibility problem, at the object level, that then becomes a short vs. long term issue at higher levels. When a manager milks a plant (see quotes 72, 73, 78 and 79) they are not primarily trading long term assets for short term assets. Rather, they are trading unmeasured assets for measured assets (see 67 and 69).

This is why you can have companies like Amazon, Uber or Tesla get high valuations. They hit legible short-term metrics that represent long-term growth. A start-up gets rewarded for their own sort of legible short-term indicators of progress and success, and of the quality of team and therefore potential for future success. Whereas other companies, that are not based on growth, report huge pressure to hit profit numbers.

The overwhelming object level pressure towards legible short-term success, whatever that means in context, comes from being judged in the short term on one’s success, and having that judgment being more important than object-level long term success.

The easiest way for this to be true is not to care about object-level long term success. If you’re gone before the long term, and no one traces the long term back to you, why do you care what happens? That is exactly the situation the managers face in Moral Mazes (see 64, 65, 70, 71, 74 and 83, and for a non-manager very clean example see 77). In particular:

74. We’re judged on the short-term because everybody changes their jobs so frequently.

And:

64. The ideal situation, of course, is to end up in a position where one can fire one’s successors for one’s own previous mistakes.

Almost as good as having a designated scapegoat is to have already sold the company or found employment elsewhere, rendering your problems someone else’s problems.

The other way for this to be true is for the short-term evaluation of one's success or failure to itself impact long-term success. If not hitting a short-term number gets you fired, or prevents your company from getting acceptable terms on financing, or gets you bought out, then the long term will get neglected. The net present value payoff for looking good, which can then be reinvested, makes it look like by far the best long term investment around.

Thus we have this problem at every level of management except the top. But for the top to actually be the top, it needs to not be answering to the stock market or capital markets, or otherwise care what others think – even without explicit verdicts, this can be as hard to root out as needing the perception of a bright future to attract and keep quality employees and keep up morale. So we almost always have it at the top as well. Each level is distorting things for the level above, and pushing these distorted priorities down to get to the next move in a giant game of adversarial telephone (see section A of quotes for how hierarchy works).

This results in a corporation that acts in various short-term ways, some of which make sense for it, some of which are the result of internal conflicts.

Why isn’t this out-competed? Why don’t the corporations that do less of this drive the ones that do more of it out of the market?

On the level of corporations doing this direct from the top, often these actions are a response to the incentives the corporation faces. In those cases, there is no reason to expect such actions to be out-competed.

In other cases, the incentives of the CEO and top management are twisted but the corporation's incentives are not. One would certainly expect those corporations that avoid this to do better. But these mismatches are the natural consequence of putting someone in charge who does not permanently own the company. Thus, dual-class share structures have become popular as a way to restore skin in the correct game. Some of the lower-down issues can be made less bad by removing the ones at the top, but the problem does not go away, and what sources I have inside major tech companies, including Google, match this model.

There is also the tendency of these dynamics to arise over time. Those who play the power game tend to outperform those who do not play it, barring constant vigilance and a willingness to sacrifice. As those players outperform, they cause other power players to outperform more, because they prefer and favor such other players, and favor rules that favor such players. This is especially powerful for anyone below them in the hierarchy. An infected CEO, who can install their own people, can quickly mean game over all on their own, and outside CEOs are brought in often.

Thus, even if the system causes the corporation to underperform, it still spreads, like a meme that infects the host, causing the host to prioritize spreading the meme, while reducing reproductive fitness. The bigger the organization, the harder it is to remain uninfected. Being able to be temporarily less burdened by such issues is one of the big advantages new entrants have.

One could even say that yes, they do get wiped out by this, but it’s not that fast, because it takes a while for this to rise to the level of a primary determining factor in outcomes. And there are bigger things to worry about. It’s short termism, so that isn’t too surprising.

A big pressure that causes these infections is that business is constantly under siege and forced to engage in public relations (see quotes sections L and M) and is constantly facing Asymmetric Justice and the Copenhagen Interpretation of Ethics. This puts tremendous pressure on corporations to tell different stories to different audiences, to avoid creating records, and otherwise engage in the types of behavior that will be comfortable to the infected and uncomfortable to the uninfected.

Another explanation is that those who are infected don’t only reward each other within a corporation. They also do business with and cooperate with the infected elsewhere. Infected people are comfortable with others who are infected, and uncomfortable with those not infected, because if the time comes to play ball, they might refuse. So those who refuse to play by these rules do better at object-level tasks, but face alliances and hostile action from all sides, including capital markets, competitors and government, all of which are, to varying degrees, infected.

I am likely missing additional mechanisms, either because I don’t know about them or forgot to mention them, but I consider what I see here sufficient. I am no longer confused about short termism.

 

 



Discuss

Eli's shortform feed

June 2, 2019 - 12:21
Published on June 2, 2019 9:21 AM UTC

I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.



Discuss

Selection vs Control

June 2, 2019 - 10:01
Published on June 2, 2019 7:01 AM UTC


This is something which has bothered me for a while, but I'm writing it specifically in response to the recent post on mesa-optimizers.

I feel strongly that the notion of 'optimization process' or 'optimizer' which people use -- partly derived from Eliezer's notion in the sequences -- should be split into two clusters. I call these two clusters 'selection' vs 'control'. I don't have precise formal statements of the distinction I'm pointing at; I'll give several examples.

Before going into it, several reasons why this sort of thing may be important:

  • It could help refine the discussion of mesa-optimization. The article restricted its discussion to the type of optimization I'll call 'selection', explicitly ruling out 'control'. This choice isn't obviously right. (More on this later.)
  • Refining 'agency-like' concepts like this seems important for embedded agency -- what we eventually want is a story about how agents can be in the world. I think almost any discussion of the relationship between agency and optimization which isn't aware of the distinction I'm drawing here (at least as a hypothesis) will be confused.
  • Generally, I feel like I see people making mistakes by not distinguishing between the two. I judge an algorithm differently if it is intended as one or the other.

(See also Stuart Armstrong's summary of other problems with the notion of optimization power Eliezer proposed -- those are unrelated to my discussion here, and strike me more as technical issues which call for refined formulae, rather than conceptual problems which call for revised ontology.)

The Basic Idea

Eliezer quantified optimization power by asking how small a target an optimization process hits, out of a space of possibilities. The type of 'space of possibilities' is what I want to poke at here.

Selection

First, consider a typical optimization algorithm, such as simulated annealing. The algorithm constructs an element of the search space (such as a specific combination of weights for a neural network), gets feedback on how good that element is, and then tries again. Over many iterations of this process, it finds better and better elements. Eventually, it outputs a single choice.
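Concretely, a bare-bones version looks like this (a sketch; the objective, neighborhood function, and temperature schedule are illustrative stand-ins):

```python
import math, random

def anneal(objective, initial, neighbor, steps=10_000, t0=1.0):
    """Maximize `objective` by repeatedly proposing and scoring candidates."""
    x, fx = initial, objective(initial)
    best, fbest = x, fx
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9        # cooling schedule
        y = neighbor(x)                        # construct a candidate
        fy = objective(y)                      # direct feedback on it
        # Always accept improvements; sometimes accept worsenings early on.
        if fy >= fx or random.random() < math.exp((fy - fx) / t):
            x, fx = y, fy
        if fx > fbest:
            best, fbest = x, fx
    return best                                # only the final output matters

# Toy run: maximize -(x - 3)^2 by local perturbation of a real number.
sol = anneal(lambda x: -(x - 3) ** 2, 0.0, lambda x: x + random.gauss(0, 0.5))
```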

This is the prototypical 'selection process' -- it can directly instantiate any element of the search space (although typically we consider cases where the process doesn't have time to instantiate all of them), it gets direct feedback on the quality of each element (although evaluation may be costly, so that the selection process must economize these evaluations), the quality of an element of search space does not depend on the previous choices, and only the final output matters.

The term 'selection process' refers to the fact that this type of optimization selects between a number of explicitly given possibilities. The most basic example of this phenomenon is a 'filter' which rejects some elements and accepts others -- like selection bias in statistics. This has a limited ability to optimize, however, because it allows only one iteration. Natural selection is an example of much more powerful optimization occurring through iteration of selection effects.

Control

Now, consider a targeting system on a rocket -- let's say, a heat-seeking missile. The missile has sensors and actuators. It gets feedback from its sensors, and must somehow use this information to decide how to use its actuators. This is my prototypical control process. (The term 'control process' is supposed to invoke control theory.) Unlike a selection process, a controller can only instantiate one element of the space of possibilities. It gets to traverse exactly one path. The 'small target' which it hits is therefore 'small' with respect to a space of counterfactual possibilities, with all the technical problems of evaluating counterfactuals. We only get full feedback on one outcome (although we usually consider cases where the partial feedback we get along the way gives us a lot of information about how to navigate toward better outcomes). Every decision we make along the way matters, both in terms of influencing total utility, and in terms of influencing what possibilities we have access to in subsequent decisions.
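As a minimal contrast (the "room physics" below is invented for illustration), here is a proportional controller in the thermostat mold; nothing is enumerated or scored, and the system traverses exactly one trajectory:

```python
def run_controller(target=21.0, k_p=2.0, steps=50):
    temp = 15.0                                    # initial room temperature
    for _ in range(steps):
        error = target - temp                      # sensor: goal vs. reading
        heat = k_p * error                         # actuator: proportional response
        temp += 0.1 * heat - 0.01 * (temp - 10.0)  # toy room dynamics + heat leak
    return temp

print(run_controller())  # settles ~20.5: just below target, the classic
                         # steady-state offset of a pure proportional controller
```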

So: in evaluating the optimization power of a selection process, we have a fairly objective situation on our hands: the space of possibilities is explicitly given; the utility function is explicitly given; we can compare the true output of the system to a randomly chosen element. In evaluating the optimization power of a control process, we have a very subjective situation on our hands: the controller only truly takes one path, so any judgement about a space of possibilities requires us to define counterfactuals; it is less clear how to define an un-optimized baseline; utility need not be explicitly represented in the controller, so may have to be inferred (or we think of it as parameter, so, we can measure optimization power with respect to different utility functions, but there's no 'correct' one to measure).

I do think both of these concepts are meaningful. I don't want to restrict 'optimization' to refer to only one or the other, as the mesa-optimization essay does. However, I think the two concepts are of a very different type.

Bottlecaps & Thermostats

The mesa-optimizer write-up made the decision to focus on what I call selection processes, excluding control processes:

We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. [...] For example, a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.(1) Rather, bottle caps have been optimized to keep water in place.

It makes sense to say that we aren't worried about bottlecaps when we think about the inner alignment problem. However, this also excludes much more powerful 'optimizers' -- something more like a plant.

When does a powerful control process become an 'agent'?

  • Bottlecaps: No meaningful actuators or sensors. Essentially inanimate. Does a particular job, possibly very well, but in a very predictable manner.
  • Thermostats: Implements a negative feedback loop via a sensor, an actuator, and a policy of "correcting" things when sense-data indicates they are "off". Actual thermostats explicitly represent the target temperature, but one can imagine things in this cluster which wouldn't -- in general, the connection between what is sensed and how things are 'corrected' can be quite complex (involving many different sensors and actuators), so that no one place in the system explicitly represents the 'target'.
  • Plants: Plants are like very complex thermostats. They have no apparent 'target' explicitly represented, but can clearly be thought of as relatively agentic, achieving complicated goals in complicated environments.
  • Guided Missiles: These are also mostly in the 'thermostat' category, but, guided missiles can use simple world-models (to track the location of the target). However, any 'planning' is likely based on explicit formulae rather than any search. (I'm not sure about actual guided missiles.) If so, a guided missile would still not be a selection process, and therefore lack a "goal" in the mesa-optimizer sense, despite having a world-model and explicitly reasoning about how to achieve an objective represented within that world-model.
  • Chess Programs: A chess-playing program has to play each game well, and every move is significant to this goal. So, it is a control process. However, AI chess algorithms are based on explicit search. Many, many moves are considered, and each move is evaluated independently. This is a common pattern. The best way we know how to implement very powerful controllers is to use search inside (implementing a control process using a selection process; see the sketch below). At that point, a controller seems clearly 'agent-like', and falls within the definition of optimizer used in the mesa-optimization post. However, it seems to me that things become 'agent-like' somewhere before this stage.

(See also: adaptation-executers, not fitness maximizers.)
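To illustrate that last pattern -- a control process with a selection process inside -- here is a toy sketch (the environment, model, and evaluation function are my own inventions):

```python
def act(state, candidate_actions, model, evaluate):
    # Selection inside: enumerate candidate actions, simulate each in the
    # model, and score the predicted outcomes.
    return max(candidate_actions, key=lambda a: evaluate(model(state, a)))

# Control outside: exactly one action per step is irrevocably taken, and
# the agent is ultimately judged by the real trajectory, not the model.
model = lambda s, a: s + a            # known 1-D dynamics (a stand-in)
evaluate = lambda s: -abs(s - 10)     # prefer states near 10
state = 0
for _ in range(5):
    state = model(state, act(state, [-1, 0, 1, 2], model, evaluate))
# state is now 10: inner search picked each move; outer control lived with it.
```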

I don't want to frame it as if there's "one true distinction" which we should be making, which I'm claiming the mesa-optimization write-up got wrong. Rather, we should pay attention to the different distinctions we might make, studying the phenomena separately and considering the alignment/safety implications of each.

This is closely related to the discussion of upstream daemons vs downstream daemons. A downstream-daemon seems more likely to be an optimizer in the sense of the mesa-optimization write-up; it is explicitly planning, which may involve search. These are more likely to raise concerns through explicitly reasoned out treacherous turns. An upstream-daemon could use explicit planning, but it could also be only a bottlecap/thermostat/plant. It might powerfully optimize for something in the controller sense without internally using selection. This might produce severe misalignment, but not through explicitly planned treacherous turns. (Caveat: we don't understand mesa-optimizers; an understanding sufficient to make statements such as these with confidence would be a significant step forward.)

It seems possible that one could invent a measure of "control power" which would rate highly-optimized-but-inanimate objects like bottlecaps very low, while giving a high score to thermostat-like objects which set up complicated negative feedback loops (even if they didn't use any search).

Processes Within Processes

I already mentioned the idea that the best way we know how to implement powerful control processes is through powerful selection (search) inside of the controller.

To elaborate a bit on that: a controller with a search inside would typically have some kind of model of the environment, which it uses by searching for good actions/plans/policies for achieving its goals. So, measuring the optimization power as a controller, we look at how successful it is at achieving its goals in the real environment. Measuring the optimization power as a selector, we look at how good it is at choosing high-value options within its world-model. The search can only do as well as its model can tell it; however, in some sense, the agent is ultimately judged by the true consequences of its actions.

IE, in this case, the selection vs control distinction is a map/territory distinction. I think this is part of why I get so annoyed at things which mix up selection and control: it looks like a map/territory error to me.

However, this is not the only way selection and control commonly relate to each other.

Effective controllers are very often designed through a search process. This might be search taking place within a model, again (for example, training a neural network to control a robot, but getting its gradients from a physics simulation so that you can generate a large number of training samples relatively cheaply) or the real world (evolution by natural selection, "evaluating" genetic code by seeing what survives).

Further complicating things, a powerful search algorithm generally has some "smarts" to it, ie, it is good at choosing what option to evaluate next based on the current state of things. This "smarts" is controller-style smarts: every choice matters (because every evaluation costs processing power), there's no back-tracking, and you have to hit a narrow target in one shot. (Whatever the target of the underlying search problem, the target of the search-controller is: find that target, quickly.) And, of course, it is possible that such a search-controller will even use a model of the fitness landscape, and plan its next choice via its own search!

(I'm not making this up as a weird hypothetical; actual algorithms such as estimation-of-distribution algorithms will make models of the fitness landscape. For obvious reasons, searching for good points in such models is usually avoided; however, in cases where evaluation of points is expensive enough, it may be worth it to explicitly plan out test-points which will reveal the most information about the fitness landscape, so that the best point can be selected later.)

Blurring the Lines: What's the Critical Distinction?

I mentioned earlier that this dichotomy seems more like a conceptual cluster than a fully formal distinction. I mentioned a number of big differences which stick out at me. Let's consider some of these in more detail.

Perfect Feedback

The classical sort of search algorithm I described as my central example of a selection process includes the ability to get a perfect evaluation of any option. The difficulty arises only from the very large number of options available. Control processes, on the other hand, appear to have very bad feedback, since you can't know the full outcome until it is too late to do anything about it. Can we use this as our definition?

I would agree that a search process in which the cost of evaluation goes to infinity becomes purely a control process: you can't perform any filtering of possibilities based on evaluation, so, you have to output one possibility and try to make it a good one (with no guarantees). Maybe you get some information about the objective function (like its source code), and you have to try to use that to choose an option. That's your sensors and actuators. They have to be very clever to achieve very good outcomes. The cheaper it is to evaluate the objective function on examples, the less "control" you need (the more you can just do brute-force search). In the opposite extreme, evaluating options is so cheap that you can check all of them, and output the maximum directly.

While this is somewhat appealing, it doesn't capture every case. Search algorithms today (such as stochastic gradient descent) often have imperfect feedback. Game-tree search deals with an objective function which is much too costly to evaluate directly (the quality of a move), but can be optimized for nonetheless by recursively searching for good moves in subgames down the game tree (mixed with approximate evaluations such as rollouts or heuristic board evaluations). I still think of both of these as solidly on the "selection process" side of things.

On the control process side, it is possible to have perfect feedback without doing any search. Thermostats realistically have noisy information about the temperature of a room, but, you can imagine a case where they get perfect information. It isn't any less a controller, or more a selection process, for that fact.

Choices Don't Change Later Choices

Another feature I mentioned was that in selection processes, all options are available to try at any time, and what you look at now does not change how good any option will be later. On the other hand, in a control process, previous choices can totally change how good particular later choices would be (as in reinforcement learning), or change what options are even available (as in game playing).

First, let me set two complications aside.

  • Weird decision theory cases: it is theoretically possible to screw with a search by giving it an objective function which depends on its choices during search. This doesn't seem that interesting for our purposes here. (And that's coming from me...)
  • Local search limits the "options" to small modifications of the option just considered. I don't think this is blurring the lines between search and control; rather, it is more like using a controller within a smart search to try to increase efficiency, as I discussed at the end of the processes-within-processes section. All the options are still "available" at all times; the search algorithm just happens to be one which limits itself to considering a smaller list.

I do think some cases blur the lines here, though. My primary example is the multi-armed bandit problem. This is a special case of the RL problem in which the history doesn't matter; every option is equally good every time, except for some random noise. Yet, to me, it is still a control problem. Why? Because every decision matters. The feedback you get about how good a particular choice was isn't just thought of as information; you "actually get" the good/bad outcome each time. That's the essential character of the multi-armed bandit problem: you have to trade off between experimentally trying options you're uncertain about vs sticking with the options which seem best so far, because every selection carries weight.
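A minimal epsilon-greedy sketch makes the point (arm payoffs, noise, and parameters are invented): exploratory pulls are genuinely spent, not merely "considered":

```python
import random

def bandit(true_means=(0.3, 0.5, 0.7), eps=0.1, pulls=10_000):
    counts = [0] * len(true_means)
    est = [0.0] * len(true_means)           # running mean reward per arm
    total = 0.0
    for _ in range(pulls):
        if random.random() < eps:
            arm = random.randrange(len(true_means))                  # explore
        else:
            arm = max(range(len(true_means)), key=lambda a: est[a])  # exploit
        reward = random.gauss(true_means[arm], 0.1)
        total += reward                      # every single pull counts
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]
    return total, max(true_means) * pulls - total  # reward and regret

reward, regret = bandit()  # regret > 0: the price of learning online
```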

This leads me to the next proposed definition.

Offline vs Online

Selection processes are like offline algorithms, whereas control processes are like online algorithms.

With offline algorithms, you only really care about the end results. You are OK running gradient descent for millions of iterations before it starts doing anything cool, so long as it eventually does something cool.

With online algorithms, you care about each outcome individually. You would probably not want to be gradient-descent-training a neural network in live user-servicing code on a website, because live code has to be acceptably good from the start. Even if you can initialize the neural network to something acceptably good, you'd hesitate to run stochastic gradient descent on it live, because stochastic gradient descent can sometimes dramatically decrease performance for a while before improving performance again.

Furthermore, online algorithms have to deal with non-stationarity. This seems suitably like a control issue.

So, selection processes are "offline optimization", whereas control processes are "online optimization": optimizing things "as they progress" rather than statically. (Note that the notion of "online optimization" implied by this line of thinking is slightly different from the common definition of online optimization, though related.)

The offline vs online distinction also has a lot to do with the sorts of mistakes I think people are making when they confuse selection processes and control processes. Reinforcement learning, as a subfield of AI, was obviously motivated from a highly online perspective. However, it is very often used as an offline algorithm today, to produce effective agents, rather than as an effective agent. So, there's been some mismatch between the motivations which shaped the paradigm and actual use. This perspective made it less surprising when black-box optimization beat reinforcement learning on some problems (see also).

This seems like the best definition so far. However, I personally still feel like it is missing something important. Selection vs control feels to me like a type distinction, closer to map-vs-territory.

To give an explicit counterexample: evolution by natural selection is obviously a selection process according to the distinction as I make it, but it seems much more like an online algorithm than an offline one, if we try to judge it as such.

Internal Features vs Context

Returning to the definition in mesa-optimizers (emphasis mine):

Whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior. Importantly, the fact that a system’s behavior results in some objective being maximized does not make the system an optimizer.

The notion of a selection process says a lot about what is actually happening inside a selection process: there is a space of options, which can be enumerated; it is trying them; there is some kind of evaluation; etc.

The notion of control process, on the other hand, is more externally defined. It doesn't matter what's going on inside of the controller. All that matters is how effective it is at what it does.

A selection process -- such as a neural network learning algorithm -- can be regarded "from outside", asking questions about how the one output of the algorithm does in the true environment. In fact, this kind of thinking is what we do when we think about generalization error.

Similarly, we can analyze a control process "from inside", trying to find the pieces which correspond to beliefs, goals, plans, and so on (or postulate what they would look like if they existed -- as must be done in the case of controllers which truly lack such moving parts). This is the decision-theoretic view.

In this view, selection vs control doesn't really cluster different types of object, but rather, different types of analysis. To a large extent, we can cluster objects by what kind of analysis we would more often want to do. However, certain cases (such as a game-playing AI) are best viewed through both lenses (as a controller, in the context of doing well in a real game against a human, and as a selection process, when thinking about the game-tree search).

Overall, I think I'm probably still somewhat confused about the whole selection vs control issue, particularly as it pertains to the question of how decision theory can apply to things in the world.



Discuss
