# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 33 minutes 15 seconds ago

### What empirical work has been done that bears on the 'freebit picture' of free will?

October 5, 2019 - 02:11
Published on October 4, 2019 11:11 PM UTC

This picture was described in Scott Aaronson's essay The Ghost in the Quantum Turing Machine in 2013, and claims that human free will is related to our choices being caused by (and/or causing) quantum bits from the initial state of the universe that first have macroscopic effects inside our brains, where all other observers must have purely Knightian uncertainty over such bits[*]. For it to be plausible, a few facts have to be true about human brain biology and cosmic background radiation:

1. "Quantum uncertainty—for example, in the opening and closing of sodium-ion channels—can not only get chaotically amplified by brain activity, but can do so “surgically” and on “reasonable” timescales."
2. All photons that impinge on human brains have quantum states that could not "be altered, maintaining a spacetime history consistent with the laws of physics, without also altering classical degrees of freedom in the photons’ causal past".

Have these questions been studied in the intervening years, and what have the results been? Note that the plausibility of the picture has been discussed before on LW, and I'm not interested in further discussing whether a priori it seems at all promising to link free will and Knightian uncertainty.

[*] This is a poor summary; I recommend reading the paper if you have time.


### Ideal Number of Parents

October 4, 2019 - 23:00
Published on October 4, 2019 8:00 PM UTC

I'll often hear people say with varying levels of seriousness that the ideal number of parents to have is something large, like maybe five. Children, especially infants, can be an enormous amount of work, which is certainly easier spread out among more people. Kids also like getting a lot of attention, can require a lot of money, and, since different adults are good with kids in different ways, can benefit from having a range of adults in their lives. But while having many people involved in taking care of the kids is great, I'm not sure having all of them be co-equal parents is a good approach.

First, a question: why are people excited about putting so much of themselves into parenting, when if you wanted to pay for similar levels of childcare it would be incredibly expensive? Not to suggest that parenting is simply unpaid childcare; thinking about why these superficially similar situations lead to such different levels of desire helps illuminate what's important about parenting.

Answers will, of course, vary based on individual perspectives and drives. For some people the answer is "I'm not excited about this, which is why I don't want kids", but for people who do want to be a parent I think it's common for things to trace back through two points:

• Having a substantial say in how the child is raised.
• Knowing that this is a life-long relationship.

For the first point, the more people you have co-parenting the less say each one has and the harder it is to reach agreement. Parents can have different ideas on what is safe, how discipline should work, how much help to give, how to do food, value of different kinds of toys/screens/games, co-sleeping, night training, potty training, is it ok to microwave baby milk, what rules to have for sharing, how structured the day should be, when they're ready to go outside alone, how to do money, what to do for childcare, when bedtime should be, what's important in schooling, how important is predictability, how to handle various unique challenges most kids have in some form, how to do presents, when to let them try a thing, what medical treatments make sense, how much to let them make their own decisions, whether to let them ask people for things when it's kind of rude, how much to push them, when to encourage an interest, how to build responsibility, and how to balance all kinds of tricky tradeoffs.

For important issues, having more people involved in the decision could make it more likely that you get good decisions, but you need to balance this against how hard it is to get people to come to agreement on things they feel very strongly about. Unless all the parents have an incredibly close sense of how kids should be raised there will be a lot of these, based on different childhood experience, different parenting philosophy, how to weigh different factors, etc. This is hard enough with two people, and it seems like something that gets substantially more difficult the more parents there are.

For the second point, the permanent nature of the relationship allows a kind of parent-child bonding that people are understandably wary of in more temporary arrangements. I care enormously about what happens to my kids, and part of that is knowing that they're my responsibility no matter what. Getting this kind of assurance of permanence with a larger number of parents is legally somewhere between "very difficult" and "not possible" in a society where only some parents will have official status. The legal parent(s) could at some point, if things fall apart, cut the others out. If you think co-parents wouldn't do this, consider how many loving relationships collapse into spitefests in divorce. Even if we fixed the legal aspect, however, the more people you have in parental roles the more likely there is to be some kind of falling-out over the years, and joint custody among large numbers of households wouldn't work well.

Other aspects that could be a problem, however, seem like they could be managed with good communication, good culture, and dividing things. For example, I do the kids' breakfast and pack Lily's lunch in the morning (hence the thermos experimentation). I then pay attention to what comes home uneaten in the lunchbox to try to figure out what I should send next time. In figuring out what to send I also pay attention to what she's been eating and not eating at breakfast and dinner. [1] Even if we had several other co-parents we could still divide things up so this was all on one person and avoid having to manage this process across the full number of parents. I do think the ways parenting work tends to drift towards the people who are already doing the most, because they're currently best at it, are more of an issue, but a surmountable one if you're attentive.

Overall I do think something with more parents can work, and I'm excited people are trying out new approaches. I think it could turn out to be really positive for children to have so many adults strongly invested in their well-being. But I think children having one or two parents in a strong and stable community/household of family/friends probably works better than a larger number of fully-equal parents.

[1] I initially tried an approach of asking her each day what she wanted for lunch, but it turns out she's pretty bad at predicting what she's going to want to eat. So my current strategy is that I pack what I think she'll eat, and then if she wants me to pack something in addition she can ask me to and I'll do that as well. I do eventually want to move to her packing her own lunch and getting good at figuring out what she wants to eat, but at least for now it's much more important to me that she be getting enough to eat.


### Book 1: 69-92 (13 Causes of Bad Science)

October 4, 2019 - 22:52
Published on October 4, 2019 7:52 PM UTC

This is the sixth post in the Novum Organum sequence. For context, see the sequence introduction.

We have used Francis Bacon's Novum Organum in the version presented at www.earlymoderntexts.com. Translated by and copyright to Jonathan Bennett. Prepared for LessWrong by Ruby.

Novum Organum is organized as two books each containing numbered "aphorisms." These vary in length from three lines to sixteen pages. Bracketed titles of posts in this sequence, e.g. Idols of the Mind Pt. 1, are my own and do not appear in the original. While the translator, Bennett, encloses his editorial remarks in a single pair of [brackets], I have enclosed mine in a [[double pair of brackets]].

[Brackets] enclose editorial explanations. Small ·dots· enclose material that has been added, but can be read as though it were part of the original text. Occasional •bullets, and also indenting of passages that are not quotations, are meant as aids to grasping the structure of a sentence or a thought. Every four-point ellipsis . . . . indicates the omission of a brief passage that seems to present more difficulty than it is worth. Longer omissions are reported between brackets in normal-sized type.

Aphorism Concerning the Interpretation of Nature: Book 1: 69–92

by Francis Bacon

[[Bacon continues on from the discussion of Idols of the Mind. Demonstration might be interpreted simply as experiment, but is likely closer to the meaning of Aristotle's demonstrations: a scientific deduction where one moves from premises in which one has high confidence to new conclusions.]]

69. But the idols have defences and strongholds, namely defective demonstrations; and the demonstrations we have in dialectics do little except make •the world a slave to •human thought, and make human thought a slave to •words. Demonstrations are indeed incipient philosophies and sciences: how good or bad a demonstration is determines how good or bad will be the system of philosophy and the thoughts that follow it. Now the demonstrations that we use in our whole process of getting from the •senses and •things to •axioms and conclusions are defective and inappropriate. This process has four parts, with a fault in each of them. (1) The impressions of the senses themselves are faulty, for the senses omit things and deceive us. Their omissions should be made up for, and their deceptions corrected. (2) Notions are abstracted badly from the impressions of the senses, and are vague and confused where they should be definite and clearly bounded.

(3) Induction goes wrong when it infers scientific principles by simple enumeration, and doesn’t, as it should, take account of the exceptions and distinctions that nature is entitled to. (4) The method of discovery and proof in which you first state the most general principles and then bring the intermediate axioms into the story, ‘proving’ them from the general principles, is the mother of errors and a disaster for all the sciences. At this stage I merely touch on these matters. I’ll discuss them more fully when, after performing these cleansings and purgings of the mind, I come to present the true way of interpreting nature.

70. The procedure that starts with experience and sticks close to it is the best demonstration by far. A procedure that involves transferring a result to other cases that are judged to be similar is defective unless the transfer is made by a sound and orderly process. The way men conduct experiments these days is blind and stupid. Wandering and rambling with no settled course and only such ‘plans’ as events force on them, they cast about and touch on many matters, but don’t get far with them. Sometimes they are eager, sometimes distracted; and they always find that some further question arises. They usually conduct their experiments casually, as though this were just a game; they slightly vary experiments that are already known; and if an experiment doesn’t come off, they grow weary and give up the attempt. And even if they worked harder at their experiments, applying themselves more seriously and steadfastly, ·they still wouldn’t get far, because· they work away at some one experiment, as Gilbert did with the magnet and the chemists do with gold. That is a way of proceeding that is as unskilful as it is feeble. For no-one successfully investigates the nature of a thing taken on its own; the inquiry needs to be enlarged so as to become more general.

And even when they try to draw some science, some doctrines, from their experiments, they usually turn aside and rashly embark on premature questions of practical application; not only for the practical benefits of such applications, but also because they want to do things that will •assure them that it will be worth their while to go on, and •show themselves in a good light to the world and so •raise the credit of the project they are engaged in. They are behaving like Atalanta ·in the legend from ancient Greece·: she turned aside to chase a golden ball, interrupting her running of the race and letting victory slip through her fingers. But in using the true course of experience to carry out new works, we should model our behaviour on the divine wisdom and order. On the first day of creation God created light and nothing else, devoting an entire day to a work in which no material substance was created. We should follow suit: with experience of any kind, we should first try to discover true causes and axioms, looking for •enlightening experiments rather than for •practically fruitful ones. For axioms don’t singly prepare the way for practical applications, but clusters of rightly discovered and established axioms do so, bringing in their wake streams—crowds!—of practical works. The paths of experience are just as rocky and jammed as the paths of judgment, and I’ll discuss that later. I have mentioned ordinary experimental work at this stage only in its role as a bad kind of demonstration. But considerations of order now demand that I take up next ·two linked topics·: •the signs or omens (mentioned a little way back) that current systems of philosophy and of thought are in a bad condition; and •the causes of ·this badness, which· seems at first so strange and incredible. When you have seen •the signs you will be more likely to agree ·with me about the badness·; and my explanation of •its causes will make it seem less strange. 
These two together will greatly help to render the process of wiping the idols from the intellect easier and smoother. ·My discussion of •the signs will run to the end of 77, and •the causes will run from there to the middle of 92·.

[In the next seven sections, the Latin signa will be translated sometimes as ‘signs’ and sometimes as ‘omens’.]

71. The sciences that we have come mostly from the Greeks. For the additions by Roman, Arabic and later writers are neither plentiful nor important, and such as they are they have been built on the foundation of Greek discoveries. Now, the wisdom of the Greeks was that of teachers of rhetoric, and it spawned disputations, which made it the worst kind of inquiry for finding the truth. Those who wanted to be thought of as philosophers contemptuously gave the label ‘sophists’ to the ancient rhetoricians Gorgias, Protagoras, Hippias and Polus; but really the label fits the whole lot of them: Plato, Aristotle, Zeno, Epicurus, Theophrastus, and their successors Chrysippus, Carneades and so on. There was just this difference: •the rhetoricians were wandering and mercenary, going from town to town, offering their wisdom for sale, and taking a price for it; whereas •the others were more ceremonial and ‘proper’—men who had settled homes, and who opened schools and taught their philosophy without charging for it. But although the two groups of philosophers were in other ways unalike, they had one thing in common: both lots were teachers of rhetoric; both turned everything into a matter for disputations, and created sects that they defended against heresies. They turned it all into •‘the talk of idle old men to ignorant youths’ (Dionysius’s jibe against Plato, a not unfair one!). But the earlier of the Greek philosophers—Empedocles, Anaxagoras, Leucippus, Democritus, Parmenides, Heraclitus, Xenophanes, Philolaus and so on (omitting Pythagoras because he was a mystic)—didn’t open schools, as far as we know. What they did was to apply themselves to the discovery of truth, doing this

• more quietly, severely and simply—that is, with less affectation and parade—

than the others did. And in my judgment they also performed

• more successfully,

·or would have done so· if it weren’t for the fact that their works were in the course of time obscured by less substantial people who offered more of what suits and pleases the capacity and tastes of the vulgar. Time is like a river, bringing lightweight floating stuff down to us and letting heavier and solider things sink. Still, not even they—·Empedocles and the rest·—were entirely free of the Greek fault: they leaned too far in the direction of ambition and vanity, founding sects and aiming for popular applause. The inquiry after •truth has no chance of succeeding when it veers off after •trifles of this kind. And I ought to mention the judgment, or rather the prediction, that an Egyptian priest made about the Greeks, namely that ‘they are always boys, with no •long-established knowledge and no •knowledge of ancient times’ [neater in Latin: •antiquitatem scientiae and •scientiam antiquitatis]. Assuredly they were like boys in their readiness to chatter, and in their inability to father anything—for their wisdom is full of words but sterile in works. So when we consider the currently accepted philosophy in the light of its place of origin and its family tree, the omens are not good!

72. And the omens provided by the character of the time and age aren’t much better than the ones from the character of the place and the nation. For knowledge at that period concerned only a short stretch of time and a small part of the world, and that’s the worst state to be in, especially for those who base everything on experience. For the preceding thousand years they had no history worthy of the name, but only fables and verbal traditions. And they knew only a small portion of the regions and districts of the world; they indiscriminately called everyone to the north of them ‘Scythians’, and those to the west ‘Celts’; they knew nothing of Africa beyond the nearest part of Ethiopia, or of Asia beyond the Ganges. They knew even less about the provinces of the New World. . . .and declared to be uninhabitable a multitude of climates and zones where actually countless nations live and breathe. . . . (Contrast that with the present day: we know many parts of the New World as well as the whole of the Old World, and our stock of experience has grown infinitely.) So if like astrologers we take omens ·for contemporary systems of philosophy· from the facts about when they were born, we can’t predict anything great for them.

73. Of all the signs ·we can have of the value of a field of endeavour·, none are more certain or more conspicuous than those based on the upshots ·of the endeavour·. For upshots and useful practical applications are like sponsors and guarantors of the truth of philosophies. [Throughout this work, ‘philosophies’ include ‘sciences’.] Now, from all those systems of the Greeks and the particular sciences derived from them, you can hardly name a single experiment that •points the way to some improvement in the condition of man, and that •really does come from the speculations and theories of philosophy. Hardly one, after all those years! And Celsus honestly and sensibly admits as much, when he tells us that •the practical part of medicine was discovered first, and that then •men philosophized about it and hunted for and assigned causes; rather than the reverse process in which •philosophy and the knowledge of causes led to •the discovery and development of the practical part. So it isn’t strange that among the Egyptians, who rewarded inventors with divine honours and sacred rites, there were more images of the lower animals than of men; for the lower animals have made many discoveries through their natural instincts, whereas men have given birth to few or none through their discussions and rational inferences.

The work of chemists has produced a little, but only •accidentally and in passing or else •by varying previous experiments (just as a mechanic might do!), and not by any skill or any theory. For the theory that they have devised does more to confuse the experiments than to help them. And the people who have busied themselves with so-called ‘natural magic’ have come up with nothing but a few trifling and apparently faked results. In religion we are warned to show our faith by our works; the same rule applies in philosophy, where a system should be judged by its fruits, and pronounced frivolous if it turns out to be barren, especially when it bears the thorns and thistles of dispute and contention rather than the fruits of grape and olive.

74. The growth and progress of systems and sciences provides signs ·as to their value·. Something that is grounded in nature grows and increases, while what is based on opinion alters but doesn’t grow. If those doctrines ·of the ancient Greeks· hadn’t been so utterly like a plant torn up by its roots, and had remained attached to and nourished by the womb of nature, the state of affairs that we have seen to obtain for two thousand years—namely

the sciences stayed in the place where they began, hardly changing, not getting any additions worth mentioning, thriving best in the hands of their first founders and declining from then on

—would never have come about. This is the opposite of what happens with the mechanical arts, which are based on nature and the light of experience: they (as long as they find favour with people) continually thrive and grow, having a special kind of spirit in them, so that they are at first rough and ready, then manageable, from then onwards made smoothly convenient by use—and always growing.

75. Admissions made by the very authorities whom men now follow constitute another sign ·that today’s sciences are in trouble·—if it is all right to apply the label ‘sign’ to what is really testimony, indeed the most reliable of all testimony. Even those who so confidently pronounce on everything do intermittently pull themselves together and complain of the subtlety of nature, the obscurity of things, and the weakness of the human mind. ·These complaints are not just a sign of trouble in the sciences; they are worded in such a way that they cause further harm·. If these people merely complained, some cowards might be deterred from searching further, while others with livelier minds and a more hopeful spirit might be spurred and incited to go on. But the complainers don’t merely speak for themselves: if something is beyond their knowledge or reach, and of their master’s, they declare it to be beyond the bounds of possibility, something that can’t be known or done; so that their lofty ill-nature turns the weakness of their own ‘discoveries’ into a libel against nature herself and a source of despair for the rest of the world. •Thus the school of the New Academy, which doomed men to everlasting darkness by maintaining as a matter of doctrine that nothing at all could be known. •Thus the opinion that men can’t possibly discover the forms, i.e. the real differentiae of things ·that put things into different species· (really they are laws of pure action [see note here]). •Thus also certain opinions in the field of action and operation, e.g. that the heat of the sun is quite different in kind from the heat of fire, so that no-one will think that the operations of fire could produce anything like the works of nature ·that are produced by the sun·. •That’s the source of the view that. . .

Latin: . . . compositionem tantum opus hominis, mistionem vero opus solius naturae esse

literal meaning: . . . men are capable only of composition, and mixing has to be the work of nature

intended meaning? . . . men are capable only of assembling things into physical mixtures (e.g. salt and pepper), and the subtler kind of combination involved in something’s being gold or water or salt or the like must be the work of nature

—lest men should hope to develop techniques for generating or transforming natural bodies, ·e.g. creating water or turning lead into gold·. ·I point out· this sign ·of second-rateness· to warn you not to let your work and your career get mixed up with dogmas that are not merely discouraging but are dedicated to discouragement.

76. Here is another sign ·of something’s being wrong· that I oughtn’t to pass over: the fact that formerly there existed among philosophers such great disagreement, and such differences between one school and another. This shows well enough that the road from the senses to the intellect was not well defended ·with walls along each side·, when the same raw material for philosophy (namely the nature of things) has been taken over and used to construct so many wandering pathways of error. These days, most of the disagreements and differences of opinion on first principles and entire ·philosophical· systems have been extinguished; but there are still endless questions and disputes concerning some parts of philosophy, which makes it clear that there is nothing certain or sound in the systems themselves or in the modes of demonstration ·that they employ·.

77. Some men think this:

There ·is great agreement in philosophy these days, because there is· widespread agreement in assenting to the philosophy of Aristotle; as witness the fact that once it was published the systems of earlier philosophers fell into disuse and withered away, while in the times that followed nothing better was found. Thus, it seems to have been so well laid out and established that it has drawn both ages—·ancient and modern·—to itself.

[[Philosophy here likely means both philosophy and science, unlike in modern usage, where philosophy has been separated from science.]]

That brings me to the end of what I have to say to make my point that the signs of health and truth in the currently accepted philosophical systems and sciences are not good, whether they be drawn from their origins (71–2), their upshots (73), their progress (74), the admissions of their founders (75), or agreed acceptance (77).

78. I now come to the causes of these errors—so many of them, and such bad ones!—that have continued on through all those centuries. ·My discussion of thirteen of them will run on through 92·. You may have been wondering how the points I have made could have escaped men’s notice until now; my account of the causes should stop you wondering about that. When you understand the causes, you may have something else to be surprised by, namely the fact that someone has now seen through the errors, thought about them, and come up with my points against them. As for that, I see it as coming from my good luck rather than from my superior talents; it’s not that I am so clever, but rather that I was born at the right time.

(1) The first point ·about how long the errors went undetected· is this: If you look hard at ‘all those centuries’ you’ll see that they shrink into something quite small. We have memories and records of twenty-five, and of those you can hardly pick out six that were fertile in the sciences or favourable to their development. (There are wastelands and deserts in times just as in regions of the earth!) We can properly count only three periods when learning flourished, and they lasted barely two centuries each: that of •the Greeks, the second of •the Romans, and the last among us—•the nations of western Europe. The intervening ages of the world were not flourishing or fertile for the growth of knowledge. (Don’t cite the Arabs or the schoolmen ·as counter-examples to that·; for they spent the intervening times not •adding to the weightiness of the sciences but crushing them with the weight of their books!) So there is one cause for the lack of progress in the sciences, namely the brevity of the periods that can properly be said to have been favourable to them.

79. (2) Here is a second cause, and one of great all-around importance: Precisely at the times when human intelligence and learning have flourished most, or indeed flourished at all, men didn’t work at natural philosophy [here = ‘natural science’]. Yet it should have been regarded as the great mother of the sciences; because all arts and all sciences, though they may be polished and shaped and made fit for use, won’t grow at all if they are torn from this root ·of natural philosophy·. It is clear that after the Christian religion was generally accepted and grew strong, the vast majority of the best minds applied themselves to theology, that this offered the best promise of reward and the most abundant research support of all kinds, and that this focus on theology was the chief occupation ·of able people· in western Europe during the third period ·of the three I have named·—all the more so because at about the same time literacy began to be more widespread and religious controversies sprang up. During the Roman period—the second of my trio—philosophers mostly worked on and thought about moral philosophy, which was to the pagans what theology is to us. Also, in those times the best intelligences usually devoted themselves to public affairs, because the sheer size of the Roman empire required the services of a great many people. And—·moving back to the first of my trio·—there was only a tiny portion of time when natural philosophy was seen to flourish among the Greeks; for in earlier times all except Thales of the so-called ‘seven wise men’ applied themselves to morals and politics; and in later times, when Socrates had drawn philosophy from heaven down to earth, moral philosophy became more fashionable than ever and diverted men’s minds from the philosophy of nature.

And right at the time when inquiries into nature were carried on energetically, they were spoiled and made useless by controversies and the ambitious display of new opinions. During those three periods, then, natural philosophy was largely neglected or impeded, so it’s no wonder that men made so little progress with something that they weren’t attending to.

[This is the first of eleven remarks along the lines of ‘No wonder science hasn’t progressed, given the fact that. . . ’—one for each of Bacon’s causes of non-progress except the first and last.]

80. (3) I would add that especially in recent times natural philosophy, even among those who have attended to it, has scarcely ever had anyone’s complete and full-time attention (except perhaps a monk studying in his cell, or an aristocrat burning the midnight oil in his country house); it has usually been treated as merely a bridge leading to something else. And so ·natural philosophy·, that great mother of the sciences, has been subjected to the astonishing indignity of being degraded to the role of a servant, having to help medicine or mathematics in their affairs, and to give the immature minds of teen-agers a first dip in a sort of dye, to make them better able to absorb some other dye later on. Meanwhile don’t look for much progress in the sciences—especially in their practical part—unless natural philosophy is applied to particular sciences, and particular sciences are applied back again to natural philosophy. It is because this hasn’t been done that many of the sciences have no depth and merely glide over the surface of things. What sciences? Well, astronomy, optics, music, many of the mechanical arts, even medicine itself—and, more surprisingly, moral and political philosophy and the logical sciences. Because once these particular sciences have become widespread and established, they are no longer nourished by natural philosophy, which could have given them fresh strength and growth drawn from the well-springs—from true thoughts about

• motions, rays, sounds and textures, and
• microstructures of bodies [Bacon’s many uses of the word schematismus show that for him a body’s schematismus is its fine-grained structure. This version will always use ‘microstructure’, but be aware that Bacon doesn’t use a word with the prefix ‘micro’.], and
• feelings and intellectual processes.

So it’s not at all strange that the sciences don’t grow, given that they have been cut off from their roots.

81. (4) Another great and powerful cause why the sciences haven’t progressed much is this: You can’t run a race properly when the finishing-post hasn’t been properly positioned and fixed in place. Now the true and lawful finishing-post of the sciences is just new discoveries and powers in the service of human life. But the great majority of the mob ·of supposed scientists· have no feeling for this, and are merely hired lecturers. Well, occasionally some ambitious practitioner who is abler than most spends his own resources on some new invention; but most men are so far from aiming to add anything to the arts and sciences that they don’t even attend to what’s already there or take from it anything that they can’t use in their lectures or use in the pursuit of money or fame or the like. And when one of that multitude does pay court to science with honest affection and for her own sake, even then it turns out that what attracts him is not the stern and unbending search for truth so much as the richness of the array of thoughts and doctrines. And if there should happen to be one who pursues the truth in earnest, even he will be going after •truths that will satisfy his intellect by explaining the causes of things long since discovered, and not •truths that hold promise of new practical applications or •the new light of axioms. If the •end of the sciences hasn’t yet been placed properly, it isn’t strange that men have gone wrong concerning the •means.

82. (5) So men have mislocated the end and finishing-post of the sciences; but even if they hadn’t, their route to it is completely wrong and impassable. When you think about it carefully, it is amazing that •no mortal has cared enough or thought hard enough to lay out a securely walled road leading to the human intellect directly from the senses and experiment, and that •everything has been left either to the mists of tradition, or the whirl and eddy of argument, or the waves and mazes of random and fragmentary experience. Think about this soberly and carefully: What route have men customarily travelled in investigating and discovering things? No doubt what you will first come up with is a very simple and naive discovery procedure, the most usual one, namely this:

A man is bracing himself to make a discovery about something: first he seeks out and surveys everything that has been said about it by others; then he starts to think for himself; shaking up his mind and, as it were, praying to it to give him oracular pronouncements

—a ‘method’ that has no foundation at all, rests only on opinions, and goes where they go. Another man may perhaps call on dialectics to make his discovery for him, but the discoveries that dialectics is good for are irrelevant to what we are discussing—there’s nothing in common except the word ‘discovery’.

[Regarding the passage between *asterisks*: Bacon writes of ‘arts’ but doesn’t give examples (medicine and ship-building). This text also expands his in other ways that ·dots· can’t easily indicate.]

*Arts such as medicine and ship-building are made up of principles and axioms, and dialectics doesn’t discover these; all it can ‘discover’, given that you have the principles and axioms from some other source, is what else is consistent with them. If we try to insist on more than that, demanding that dialectics tell us what the •principles and axioms are, we all know that it will fling the demand back in our faces: ‘For •them you must trust the art in question. For the foundations of medicine, for example, don’t ask dialectics, ask medicine!’* ·Setting aside the opinions of others, and dialectics·, there remains simple experience—which we call ‘experiment’ if we were trying to produce it, and ‘chance’ if we weren’t. But such experience is no better than a broom with loose bristles, as the saying is—·those who steer by it are· like men in the dark, patting the walls as they go along hoping to find their way, when they’d have done much better to wait for daylight, or light a candle, and then set off. But experience managed in the right order first lights the candle and then uses it to show the way. It starts with experience that is ordered and classified, not jumbled or erratic; from that it derives axioms, and from established axioms it moves on to new experiments; just as God proceeded in an •orderly way when he worked on matter. So don’t be surprised that science hasn’t yet reached the end of its journey, seeing that men have gone altogether astray, either abandoning experience entirely, or getting lost in it and wandering around as in a maze. Whereas a rightly ordered method leads by an unbroken route through the thickets of experience to the open ground of axioms.

83. This trouble ·concerning not-finding-the-way· has been greatly increased by an old and harmful opinion or fancy, namely the self-important view that it is beneath the dignity of the human mind to be closely involved with experiments on particular material things given through the senses— especially as they are

• hard work to investigate,
• nasty to report on,
• not suitable things for a gentleman to perform,
• infinite in number, and
• full of extremely small-scale details.

So that it has finally come to this: the true way is not merely departed from but blocked off. It’s not that experience has been abandoned or badly handled; rather, it has been fastidiously kept at arm’s length.

84. (6) Men have been kept back from making progress in the sciences, as though by a magic spell, by •their reverence for antiquity, by •the authority of men of high standing in philosophy, and then by •the general acceptance ·of certain propositions·. I have spoken of the last of these ·in 77· above.

As for ‘antiquity’, the opinion that men have about it is a lazy one that does violence to the meaning of the word. For really what is antique is •the world in its old age, that is the world now; and •the earlier age of the world when the ancients lived, though in relation to us it was the elder, in relation to the world it was the younger. We expect •an old man to know more about the human condition than •a young man does, and to make more mature judgments about it, because of his experience and the number and variety of things he has seen, heard and thought about. In the same way, more could be fairly expected from •our age (if only we knew and chose to employ its strength) than from •ancient times, because ours is a more advanced age of the world, and has accumulated countless experiments and observations.

It is also relevant that through long voyages many things in nature will be discovered that may let in new light on philosophy (and such voyages will be increasingly frequent in our age). And given that the regions of the •material domain—i.e. of the earth, the sea and the stars—have been opened up and brought to light, it would surely be disgraceful if the •intellectual domain remained shut up within the narrow limits of old discoveries.

And with regard to authority: there is something feeble about granting so much to •authors while denying •time its rights—time, which is the author of authors, or rather of all authority. For the saying is ‘Truth is the daughter of time’, not ‘. . . the daughter of authority’!

We shouldn’t be surprised, then, when we find that the enchantments of •antiquity and •authority and •general agreement have tied up men’s powers—as though putting them under a spell—making them unable to rub shoulders with •things themselves.

85. (7) What brings man’s work to a halt in face of the discoveries that have already been made is not merely his admiration for antiquity, authority and general agreement, but also his admiration for the long-time achievements of the human race. When you look at the variety and beauty of the devices that the mechanical arts have assembled for men’s use, you’ll surely be more inclined to admire man’s wealth than to have any sense of his poverty! You won’t take into account the fact that

the original human observations and natural processes (which are the soul and first mover of all that variety)

are not many and didn’t have to be dug deeply for; and that apart from them it has been merely a matter of

patience, and the orderly and precise movements of hands and tools.

For example, it certainly takes precise and accurate work to make a clock, whose wheels seem to imitate the heavenly bodies and, in their alternating and orderly motion, to imitate the pulse of animals; but ·there isn’t much scientific content in this, because the entire mechanism· depends on only a couple of axioms of nature.

[Bacon next writes about ‘the refinement of the liberal arts’ and of the ‘art’ that goes into ‘the mechanical preparation of natural substances’, and lists the achievements in astronomy, music, language, the alphabet (‘still not used in China’), the making of beer, wine and bread, and so on. His point is that these achievements took centuries of tinkering, and that they involve very little in the way of genuinely scientific knowledge. So they—like the clock—make it less appropriate to wonder at how much we know than to wonder at how little. Then:]

If you turn from the workshop to the library, and wonder at the immense variety of books you see there, just look carefully into their contents and your amazement will be flipped: having seen their endless repetitions, and seen how men are always saying and doing what has been said and done before, you’ll pass from •admiration at the variety to •astonishment at the poverty and scantiness of the subjects that have so far possessed the minds of men.

[Next Bacon comments derisively on the intellectual poverty of alchemy. Then:] The students of natural magic, who explain everything by ‘sympathies’ and ‘antipathies’, have in their lazy conjectures credited substances with having wonderful powers and operations. If they have ever produced any results, they have been more productive of astonishment than of anything useful. [Followed by a slap at ‘superstitious magic’; Bacon expresses some embarrassment at even mentioning this, as he does with alchemy. Finally:] It isn’t surprising that the belief that one has a great deal has been a cause of our having very little.

86. (8) Furthermore, men’s feeble and almost childish admiration for doctrines and arts has been increased by the tricks and devices of those who have practised and taught the sciences. For they produce them with so much fuss and flourish, putting them before the world all dressed up and masked ·and seemingly ready to go·, as though they were wholly complete and finished. Just look at the structure and the classifications they bring with them! They seem to cover everything that could come up in that subject, and to the minds of the vulgar they present the form and plan of a perfected science; but really the classificatory units are little more than empty bookshelves. The earliest seekers after truth did better than this. Their thoughts about things resulted in knowledge that they wanted to set down for later use, and they did this in aphorisms—i.e. short unconnected sentences, not linked by any method—and didn’t pretend or profess to cover the entire art. But given the way things are these days, it’s not surprising that men don’t try to make further progress in matters that have been passed down to them as long since perfect and complete.

87. (9) The •ancient systems have also gained considerably in their reputation and credit from the empty-headed foolishness of those who have propounded •new ones, especially in the area of applied science. There has been no shortage of talkers and dreamers who—partly believing what they say and partly not—have loaded mankind with promises, offering the means to

• prolong life,
• slow down the aging process,
• lessen pain,
• repair natural defects,. . . .
• control and arouse affections,
• sharpen and heighten the intellectual faculties,
• turn substances into other substances (·e.g. lead into gold·),
• make things move, or move faster, at will,
• make changes in the air,
• arrange for influence from the stars,
• prophesy the future,
• make things visible from a long way off,
• reveal things that are hidden,

and many more. With regard to these ‘benefactors’ it wouldn’t be unfair to say that •their absurdities differ as much from •true arts (in the eyes of the philosopher) as •the exploits of Julius Caesar or Alexander the Great differ from •those of ·such fictional characters as· Amadis of Gaul or the Knights of the Round Table. . . . It isn’t surprising that prejudice is raised against new propositions, especially ones that are said to have practical implications, because of those impostors who have tried something similar. . . .

[[Bacon seems to be speaking of idea inoculation here: people have been inoculated against new sciences by charlatans who promised things they failed to deliver.]]

88. (10) Far more harm has been done to knowledge by pettiness, and the smallness and triviality of the tasks that men have tackled. It is made worse by the fact that this pettiness comes with a certain air of arrogance and superiority. A now-familiar general device that is found in all the arts is this: the author blames nature for any weakness in his art, declaring—on the authority of his art!—that whatever his art can’t achieve is intrinsically impossible. [‘Art’ refers to any human activity that involves techniques and requires skills.] If arts are to be their own judges, then clearly none will be found guilty! Moreover, the philosophy that is now in play hugs to itself certain tenets whose purpose. . . .is to persuade men that we can’t expect art or human labour to come up with any results that are hard to get, requiring that nature be commanded and subdued. The doctrine that the sun’s heat and fire’s heat differ in kind is an example of this, and another is the doctrine about mixture—both mentioned earlier, ·in 75·. If you think about it carefully you’ll see that all this involves a wrong limiting of human power; it tends—and is meant to tend—to produce an unnatural despair; and this not only messes up the auguries that might give hope but also cuts the sinews and spurs of industry, and loads the dice against experience itself. And all for the sake of having us think that their art has been completed, and for the miserable ‘triumph’ of getting us to believe that whatever hasn’t yet been discovered and understood can’t ever be discovered or understood.

And when someone does get in touch with reality and try to discover something new, he will confine himself to investigating and working out some one topic, such as

• the nature of the magnet,
• the tides,
• mapping the heavens,

and things like that, which seem to be somewhat isolated from everything else and have hitherto been tackled without much success; whereas really it is an ignorant mistake to study something in isolation. Why? Because a nature that seems to be •latent and hidden in some things is •obvious and (as it were) palpable in others, so that people puzzle over it in •the former while nobody even notices it in •the latter. Consider the holding-together ·of material things·. Wood and stones hold together, but people pay no attention to that fact, merely saying of wood and stone that ‘they are solid’ and giving no further thought to why they don’t fall apart, breaking up their continuity; while with water-bubbles—in which a sort of hemispherical skin is formed, fending off for a moment the breaking up of the continuity—the holding together seems to be a subtle matter.

In fact, what in some things is regarded as special to them ·and not present in the rest of nature· also occurs elsewhere in an obvious and well-known form, but it won’t be recognized there as long as the experiments and thoughts of men are engaged only on the former, ·i.e. on the less obvious and supposedly ‘special’ cases·. But generally speaking, in mechanics all that is needed for someone to pass off an old result as something new is •to refine or embellish it, •to combine it with some others, •to make it handier for practical application, •to produce the result on a larger or a smaller scale than had been done before, or the like.

So it is no wonder that no important discoveries worthy of mankind have been brought to light, when men have been satisfied—indeed pleased—with such trifling and puerile tasks, and have even fancied that in them they were trying for something great, if not achieving it.

89. (11) Bear in mind also that in every period natural philosophy has had a troublesome and recalcitrant adversary in superstition and blind religious extremism. Among the Greeks those who first proposed natural causes for lightning and for storms were condemned for disrespect towards the gods. And some of the fathers of the early Christian church were not much milder in their attitude to those who, on most convincing grounds that no sane person would question today, maintained that the earth is round and thus that the antipodes exist.

Even today it is harder and more dangerous ·than it ought to be· to talk about nature, because of the procedures of the theological schoolmen. They regularized theology as much as they could, and worked it into the shape of an art [here = ‘academic discipline’], and then incorporated into the body of religion more of Aristotle’s contentious and thorny philosophy than would properly fit there. The same result is apt to arise, though in a different way, from the theories of those who have been so bold as to infer the truth of the Christian religion from the principles of •philosophers, and to confirm it by •their authority. They have solemnly and ceremonially celebrated this union of the senses with faith as a lawful marriage, entertaining [permulcentes] men’s minds with a pleasing variety of things to think about but also mixing [permiscentes] the human with the divine in an unseemly fashion. In such mixtures of theology with philosophy only the accepted doctrines of philosophy are included, while •new ones—which may be changes for the better—are driven off and wiped out.

Lastly, you will find that some ignorant divines close off access to any philosophy, however ‘purified’ it may be. •Some are feebly afraid that a deeper search into nature would take one beyond the limits of what is proper; and they take what is said in the Scriptures against those who pry into

• sacred mysteries,

wrenching it away from there and transferring it to

• the hidden things of nature,

which are not fenced off by any prohibition ·in the Bible·. •Other divines are more complex and thoughtful: they think that if middle causes [see note in 65] aren’t known then it will be easier to explain everything in terms of God’s hand and rod; and they think that this is greatly in the interests of religion, whereas really it’s nothing but trying to gratify God by a lie. •Others are led by past examples to fear that movements and changes in philosophy will end in attacks on religion. And •others again—·bringing us to the end of my list·—seem to be afraid that if nature is investigated something may be found to subvert religion or at least to shake its authority, especially with the unlearned. But these two last fears strike me as having come from thinking at the level of the lower animals, ·like a dog cowering in fear when it hears an unfamiliar noise·; it’s as though these men in their heart of hearts weren’t sure of the strength of religion and of faith’s domination of the senses, and were therefore scared that the investigation of truth in nature might be dangerous to them. But in point of fact natural philosophy is second only to the Bible as the best antidote to superstition and the most approved nourishment for faith. So natural philosophy deserves its place as religion’s most faithful handmaid: religion displays God’s •will, while natural philosophy displays his •power. . . . ·Summing up·: it isn’t surprising that •natural philosophy is stunted in its growth when religion, the thing that has most power over men’s minds, has been pulled into the fight against •it by the stupidity and incautious zeal of certain people.

90. (12) Moving on now: in the customs and institutions of schools, academies, colleges, and similar bodies whose role is to house learned men and to develop learning, everything turns out to work against the progress of the sciences. Their lectures and tests are devised in such a way that it would be hard for anyone to think or speculate about anything out of the common rut. And if one or two have the courage to judge freely, they’ll have to do it all by themselves with no help from the company of others. And if they can put up with that too, they will find that their hard work and breadth of mind are a considerable hindrance to their careers! For the studies of men in these places are confined—as it were imprisoned—in the writings of certain authors, and if anyone disagrees with them he is immediately accused of being a trouble-maker and a revolutionary. But ·this is all wrong, because· the situation of the •arts is quite different from that of the •state, and the coming of •new light ·in the arts· is not like the coming of •new events ·in the state·. In matters of state any change—even a change for the better—is under suspicion of making trouble, because politics rests on authority, consent, fame and opinion, not on demonstration. But arts and sciences should be like quarries, where the noise of new works and further advances is heard on every side. That is how things stand according to right reason, but it’s not what actually happens; and the things I have reported in the administration and government of learning severely restrain the advancement of the sciences.

91. Indeed, even if that hostility ·towards new work· stopped, the growth of the sciences would still be held back by the fact that high aims and hard work in this field go unrewarded. For the rewarding of scientific achievement and the performing of it are not in the same hands. The growth of the sciences comes from high intelligence, while the prizes and rewards of them are in the hands of the common people, or of ‘great’ persons who are nearly all quite ignorant. Moreover, not only do scientific advances bring no rewards or other benefits, they don’t even get popular applause. For the common run of people aren’t up to the task of understanding such matters, so that news about them is apt to be blown away by the gales of popular opinions. And it’s not surprising that endeavours that are not honoured don’t prosper.

92. (13) By far the greatest obstacle to the progress of science—to the launching of new projects and the opening up of new fields of inquiry—is that men despair and think things impossible. For in these matters it’s the careful, serious people who have no confidence at all, and are taken up with such thoughts as that

• nature is dark,
• life is short,
• the senses are deceptive,
• judgment is weak,
• experiments are hard to do,

and the like. They think that •throughout the centuries the sciences have their ebbs and flows, sometimes growing and flourishing and at others withering and decaying, but that •a time will come when the sciences are in a state from which no further progress will be possible. ·And they evidently think that that time lies in the very near future·. So if anyone expects or undertakes to make further discoveries, they set this down to his immature irresponsibility. Such endeavours, they think, start well, become harder as they go on, and end in confusion. This is a way of thinking that sober intelligent men are likely to fall into, and we mustn’t let their charms and attractions lead us to relax or mitigate our judgment ·of their line of thought·. We should carefully note what gleams of hope there are and what direction they come from; and—·changing the metaphor·—we should disregard the lighter breezes of hope but seriously and attentively follow the winds that seem to be steadier. We must also look to political prudence for advice, and to take the advice it gives; it is distrustful on principle, and takes a dim view of human affairs. So my topic here ·and to the end of 114· is hope; for I don’t trade in promises, and don’t want to affect men’s judgments by force or by trickery; rather, I want to lead them by the hand without coercion. The best way to inspire hope will be to bring men to particulars, especially ones that are set out in an orderly way in the Tables of Discovery (partly in this work ·112–113 and 218·, but much more in the fourth part of my Great Fresh Start [see note in 31], because this isn’t merely a •hope for the thing but •the thing itself. But I want to come at things gently, so ·instead of jumping straight to the Tables· I shall proceed with my plan of preparing men’s minds, for hope is a significant part even of preparation. 
If all the other inducements aren’t accompanied by hope, their effect on men is not to •ginger them up and get them busy but rather to •make them depressed by giving them an even darker view of how things now stand and making them even more fully aware of the unhappiness of their own condition. So there is a point in my revealing and recommending the views of mine that make hope in this matter reasonable. It’s like what Columbus did before his wonderful voyage across the Atlantic, giving reasons for his belief that hitherto unknown lands and continents might be discovered. His reasons were rejected at first, but later they were vindicated by experience, and were the causes and beginnings of great events.

The next post in the sequence, Book 1: 93-130 (Reasons for Hope), will be posted by 6:00pm PDT on Thursday, October 10th, at the latest.

Discuss

### What are your strategies for avoiding micro-mistakes?

October 4, 2019 - 21:42
Published on October 4, 2019 6:42 PM UTC

I've recently been spending more time doing things that involve algebra and/or symbol manipulation (after a while of not doing these things by hand that often) and have noticed that small mistakes cost me a lot of time. Specifically, I can usually catch such mistakes by double-checking my work, but the cost of not being able to trust my initial results, and of having to redo steps, is very high. High enough that I'm willing to spend time working to reduce the number of such mistakes I make, even if it means slowing down quite a bit or adopting some other costly process.

If you're good at avoiding making such mistakes in the first place and it's not just because you were born that way, what strategies do you use?

Two notes on the type of answers I'm looking for:

1. I should note that one answer is just to use something like WolframAlpha or Mathematica, which I do. That said, I'm still interested in not having to rely on such tools for things in the general symbol manipulation reference class as I don't like relying on my computer being present to do these sorts of things.

2. I did do some looking around for work addressing this (found this for example), but most of it suggested basic strategies that I already implement like being neat and checking your work.

Discuss

### The AI is the model

October 4, 2019 - 11:11
Published on October 4, 2019 8:11 AM UTC

A Friendly AI is not a selfish AI constrained by a special extra conscience module that overrides the AI's natural impulses and tells it what to do.  You just build the conscience, and that is the AI.

Eliezer Yudkowsky, Ghosts in the Machine

When I started thinking about value learning, I thought the goal was to extract simple objects that described the essence of morality. Not so simple as a verbal definition, but something like a utility function. Something separate from planning or reasoning, that was purely about preferences, which you could plug into an AI which would then do some totally separate work to turn preferences into choices.

Turns out that runs into some serious obstacles.

I

The difficulty of value learning is that there is no One True Utility Function to be assigned the globs of atoms we call humans. To think about them as having desires at all requires viewing them at a suitable level of abstraction - though of course, there's no One True Level Of Abstraction, either. (I promise this is my last post that's basically just consequences of needing the intentional stance for a while.)

Call the world-model the AI uses to best predict the world its "native ontology." If I want to go to the gym, we want the AI to look at the atoms and see "Charlie wants to go to the gym." The thing that I want is not some specific state of the AI's native ontology. Instead, I can only "want" something in an abstracted ontology that not only contains the AI's intentional-stance model of "Charlie," but also intentional-stance-compatible abstractions for "go" and "gym." In short, abstraction is contagious.

This is like the idea of an umwelt (oom-velt), introduced by early philosopher of biology Jakob Johann von Uexküll. In nature, different organisms can have different effective models of the world even though they live in the same environment. They only evolve to model what is necessary for them to survive and reproduce. The umwelt is a term for this modeled world. The umwelt of a bloodsucking tick consists largely of things to climb on and warm-blooded mammals, which are perceived not by sight but by a certain smell and body temperature.

I think of the AI's intentional stance as not just being a specially abstract model of me, but also being a model of my entire umwelt. It needs an abstraction of the gym because the gym is a part of my inner world, an abstract concept that gets referenced in my plans and desires.

II

Back to value learning. The bare minimum for success is that we build an AI that can predict which actions will do a good job satisfying human values. But how minimalist do we really have to be? Can we get it to output an abstract object corresponding to human values, like a utility function or some compression thereof?

Well, maybe. If it had a complete understanding of humans, maybe it could take that abstract, intentional stance description of humans and cash it out into a utility function over world-histories. Note that this is over world-histories, not world-states, because humans' abstractions often involve things like duration and change. So one problem is that this object is impractically massive, both to construct and to use. In order to actually do anything with human values, what we want is the compressed, abstracted version, and this turns out to more or less consist of the entire AI.
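
To get a feel for why a utility function over world-histories is so much larger than one over world-states, here is a toy count. It assumes a deliberately unrealistic setup that is not in the original argument: a finite number of discrete world-states, discrete time steps, and a utility function stored as a plain lookup table. The function name `table_sizes` and its parameters are hypothetical.

```python
def table_sizes(num_states, horizon):
    """Entries needed for a utility lookup table over states vs. histories.

    Toy assumption: the world has `num_states` discrete states and a history
    is any sequence of `horizon` states.
    """
    over_states = num_states                # one utility value per world-state
    over_histories = num_states ** horizon  # one value per sequence of states
    return over_states, over_histories


for horizon in (1, 5, 10, 20):
    states, histories = table_sizes(num_states=10, horizon=horizon)
    print(f"horizon={horizon:>2}: {states:,} state entries, {histories:,} history entries")
```

Even with only ten states, the history table grows exponentially with the horizon, which is one way to read the claim that the object is impractical both to construct and to use.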

It's theoretically convenient to think about separating values and planning, only passing a utility function from one to the other, but in practice the utility function is too big to construct, which means that the planning step must repeatedly talk to the abstract model, and is no longer so cleanly separate from it, especially if we imagine optimizing end-to-end, causing every part to be optimized to fit every other part, like two trees growing intertwined.

The other factor blurring any neat lines is meta-ethics. We might want to use meta-ethical data - information learned by observing and talking to humans - to change how the AI treats its information about human values, or even change which decision theory it's using. You can frame this as preferences over the AI's own code, but this is still a case of supposedly simpler preferences actually containing the specification of the whole AI.

These violations of clean separability tell us that our goal shouldn't be to find a separate "human values" object. Except in special cases that we really shouldn't count on, the entire FAI is the "human values" object, and all of its parts might make sense only in the context of its other parts. The AI doesn't have a model of what it should do, the AI is the model.

Discuss

### Solving the forgetting. Spaced repetition beyond rationality community.

October 4, 2019 - 09:00
Published on October 3, 2019 3:03 PM UTC

Many of you have heard about spaced repetition. It's a learning technique that allows you to remember almost anything for as long as you want. It works by repeatedly answering test questions at increasing intervals. The problem is it's not widely used (just like the art of rationality). Existing solutions are either limited to memorization of terms/foreign language words or require you to create all flashcards (test questions) yourself. But those who go through the struggle of creating flashcards for complex topics themselves show that spaced repetition can help in learning any topic.
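To make "increasing intervals" concrete, here is a minimal sketch of a review scheduler. The doubling rule, the reset to one day after a wrong answer, and the names `next_interval` and `schedule` are assumptions chosen for illustration; they are not the scheme used by the linked article or by any particular flashcard app.

```python
from datetime import date, timedelta
from typing import List


def next_interval(current_days: int, answered_correctly: bool, factor: int = 2) -> int:
    """Return the gap (in days) before the next review of a card."""
    if not answered_correctly:
        return 1  # assumed rule: a wrong answer resets the card to a one-day gap
    return max(1, current_days) * factor  # assumed rule: a correct answer grows the gap


def schedule(start: date, answers: List[bool]) -> List[date]:
    """Return the review dates produced by a sequence of right/wrong answers."""
    interval = 1
    due = start + timedelta(days=interval)  # first review is one day after learning
    dates = [due]
    for ok in answers:
        interval = next_interval(interval, ok)
        due = due + timedelta(days=interval)
        dates.append(due)
    return dates


if __name__ == "__main__":
    for review_day in schedule(date(2019, 10, 4), [True, True, False]):
        print(review_day.isoformat())
```

Each correct answer pushes the next review further out, while a miss pulls it back in, which is the whole mechanism the paragraph describes.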

The main hurdle to sharing flashcards is that you can't understand the question written by someone else if you don't know the topic quite well already. Therefore you have to start repetition when you understand the underlying concept. It seems that the best timing is right after you read about the relevant concept in a textbook or watch a lecture. The test question has to be integrated with the educational content.

The linked article describes the approach we take to implement this idea. We've already made a basic implementation. Now we're looking for people who want to try using it in their personal learning and, even more importantly, for people who are willing to experiment with creating courses for others. We'd appreciate any feedback you have!

Discuss

### Debate on Instrumental Convergence between Yann LeCun, Stuart Russell and More

October 4, 2019 - 07:08
Published on October 4, 2019 4:08 AM UTC

An actual freaking public debate about instrumental convergence, in a public space! Major respect to all involved, especially Yoshua Bengio for great facilitation.

For posterity (i.e. having a good historical archive) and further discussion, I've reproduced the conversation here. I'm happy to make edits at the request of anyone in the discussion who is quoted below. I've improved formatting for clarity and fixed some typos. For people who are not AI Alignment Researchers who wish to comment, see the public version of this post here. For people who do work in the relevant fields, please sign up in the top right. It will take a day or so to confirm membership.

Original Post

Yann LeCun: "don't fear the Terminator", a short opinion piece by Tony Zador and me that was just published in Scientific American.

"We dramatically overestimate the threat of an accidental AI takeover, because we tend to conflate intelligence with the drive to achieve dominance. [...] But intelligence per se does not generate the drive for domination, any more than horns do."

https://blogs.scientificamerican.com/observations/dont-fear-the-terminator/

Elliot Olds: Yann, the smart people who are very worried about AI seeking power and ensuring its own survival believe it's a big risk because power and survival are instrumental goals for almost any ultimate goal.

If you give a generally intelligent AI the goal to make as much money in the stock market as possible, it will resist being shut down because that would interfere with its goal. It would try to become more powerful because then it could make money more effectively. This is the natural consequence of giving a smart agent a goal, unless we do something special to counteract this.

You've often written about how we shouldn't be so worried about AI, but I've never seen you address this point directly.

Stuart Russell: It is trivial to construct a toy MDP in which the agent's only reward comes from fetching the coffee. If, in that MDP, there is another "human" who has some probability, however small, of switching the agent off, and if the agent has available a button that switches off that human, the agent will necessarily press that button as part of the optimal solution for fetching the coffee. No hatred, no desire for power, no built-in emotions, no built-in survival instinct, nothing except the desire to fetch the coffee successfully. This point cannot be addressed because it's a simple mathematical observation.
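A toy MDP of the kind Russell describes can be sketched in a few lines. Everything specific here is an assumption made for illustration rather than a detail from the discussion: the coffee is a fixed number of steps away, the human switches the agent off with probability `p_off` at every step while still active, pressing the button costs one step, there is no discounting, and the only reward is 1 for delivering the coffee. The function name `solve` is likewise hypothetical.

```python
from functools import lru_cache
from typing import Tuple


def solve(steps_to_coffee: int, p_off: float) -> Tuple[float, str]:
    """Return (expected reward, best first action) for the toy coffee MDP."""

    @lru_cache(maxsize=None)
    def value(steps_left: int, human_active: bool) -> Tuple[float, str]:
        if steps_left == 0:
            return 1.0, "done"  # coffee delivered: the only reward in the MDP
        # Probability the agent is still running after this time step.
        survive = 1.0 - p_off if human_active else 1.0
        walk = survive * value(steps_left - 1, human_active)[0]
        if not human_active:
            return walk, "walk"
        # Pressing the button uses up this step but removes all future risk.
        press = survive * value(steps_left, False)[0]
        return (press, "press") if press > walk else (walk, "walk")

    return value(steps_to_coffee, True)


if __name__ == "__main__":
    for p in (0.0, 1e-9, 0.01):
        print(p, solve(3, p))
```

Under these assumptions, pressing the button is strictly better whenever `p_off` is positive and the coffee is at least two steps away, even though the reward function mentions nothing but coffee. With `p_off` equal to zero the two actions tie and the sketch defaults to walking.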

Yoshua Bengio: Yann, I'd be curious about your response to Stuart Russell's point.

Yann LeCun: You mean, the so-called "instrumental convergence" argument by which "a robot can't fetch you coffee if it's dead. Hence it will develop self-preservation as an instrumental sub-goal."

It might even kill you if you get in the way.

1. Once the robot has brought you coffee, its self-preservation instinct disappears. You can turn it off.

2. One would have to be unbelievably stupid to build open-ended objectives in a super-intelligent (and super-powerful) machine without some safeguard terms in the objective.

3. One would have to be rather incompetent not to have a mechanism by which new terms in the objective could be added to prevent previously-unforeseen bad behavior. For humans, we have education and laws to shape our objective functions and complement the hardwired terms built into us by evolution.

4. The power of even the most super-intelligent machine is limited by physics, and its size and needs make it vulnerable to physical attacks. No need for much intelligence here. A virus is infinitely less intelligent than you, but it can still kill you.

5. A second machine, designed solely to neutralize an evil super-intelligent machine will win every time, if given similar amounts of computing resources (because specialized machines always beat general ones).

Bottom line: there are lots and lots of ways to protect against badly-designed intelligent machines turned evil.

Stuart has called me stupid in the Vanity Fair interview linked below for allegedly not understanding the whole idea of instrumental convergence.

It's not that I don't understand it. I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.

Here is the juicy bit from the article where Stuart calls me stupid:

Russell took exception to the views of Yann LeCun, who developed the forerunner of the convolutional neural nets used by AlphaGo and is Facebook’s director of A.I. research. LeCun told the BBC that there would be no Ex Machina or Terminator scenarios, because robots would not be built with human drives—hunger, power, reproduction, self-preservation. “Yann LeCun keeps saying that there’s no reason why machines would have any self-preservation instinct,” Russell said. “And it’s simply and mathematically false. I mean, it’s so obvious that a machine will have self-preservation even if you don’t program it in because if you say, ‘Fetch the coffee,’ it can’t fetch the coffee if it’s dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal. And if you threaten it on your way to getting coffee, it’s going to kill you because any risk to the coffee has to be countered. People have explained this to LeCun in very simple terms.”

Tony Zador: I agree with most of what Yann wrote about Stuart Russell's concern.

Specifically, I think the flaw in Stuart's argument is the assertion that "switching off the human is the optimal solution"---who says that's an optimal solution?

I guess if you posit an omnipotent robot, destroying humanity might be a possible solution. But if the robot is not omnipotent, then killing humans comes at considerable risk, ie that they will retaliate. Or humans might build special "protector robots" whose value function is solely focused on preventing the killing of humans by other robots. Presumably these robots would be at least as well armed as the coffee robots. So this really increases the risk to the coffee robots of pursuing the genocide strategy.

And if the robot is omnipotent, then there are an infinite number of alternative strategies to ensure survival (like putting up an impenetrable forcefield around the off switch) that work just as well.

So i would say that killing all humans is not only not likely to be an optimal strategy under most scenarios, the set of scenarios under which it is optimal is probably close to a set of measure 0.

Stuart Russell: Thanks for clearing that up - so 2+2 is not equal to 4, because if the 2 were a 3, the answer wouldn't be 4? I simply pointed out that in the MDP as I defined it, switching off the human is the optimal solution, despite the fact that we didn't put in any emotions of power, domination, hate, testosterone, etc etc. And your solution seems, well, frankly terrifying, although I suppose the NRA would approve. Your last suggestion, that the robot could prevent anyone from ever switching it off, is also one of the things we are trying to avoid. The point is that the behaviors we are concerned about have nothing to do with putting in emotions of survival, power, domination, etc. So arguing that there's no need to put those emotions in is completely missing the point.

Yann LeCun: Not clear whether you are referring to my comment or Tony's.

The point is that behaviors you are concerned about are easily avoidable by simple terms in the objective. In the unlikely event that these safeguards somehow fail, my partial list of escalating solutions (which you seem to find terrifying) is there to prevent a catastrophe. So arguing that emotions of survival etc will inevitably lead to dangerous behavior is completely missing the point.

It's a bit like saying that building cars without brakes will lead to fatalities.

Yes, but why would we be so stupid as to not include brakes?

That said, instrumental subgoals are much weaker drives of behavior than hardwired objectives. Else, how could one explain the lack of domination behavior in non-social animals, such as orangutans?

Francesca Rossi: @Yann Indeed it would be odd to design an AI system with a specific goal, like fetching coffee, and capabilities that include killing humans or disallowing being turned off, without equipping it also with guidelines and priorities to constrain its freedom, so it can understand for example that fetching coffee is not so important that it is worth killing a human being to do it. Value alignment is fundamental to achieve this. Why would we build machines that are not aligned to our values? Stuart, I agree that it would be easy to build a coffee fetching machine that is not aligned to our values, but why would we do this? Of course value alignment is not easy, and still a research challenge, but I would make it part of the picture when we envision future intelligent machines.

Richard Mallah: Francesca, of course Stuart believes we should create value-aligned AI. The point is that there are too many caveats to explicitly add each to an objective function, and there are strong socioeconomic drives for humans to monetize AI prior to getting it sufficiently right, sufficiently safe.

Stuart Russell: "Why would we build machines that are not aligned to our values?" That's what we are doing, all the time. The standard model of AI assumes that the objective is fixed and known (check the textbook!), and we build machines on that basis - whether it's clickthrough maximization in social media content selection or total error minimization in photo labeling (Google Jacky Alciné) or, per Danny Hillis, profit maximization in fossil fuel companies. This is going to become even more untenable as machines become more powerful. There is no hope of "solving the value alignment problem" in the sense of figuring out the right value function offline and putting it into the machine. We need to change the way we do AI.

Yoshua Bengio: All right, we're making some progress towards a healthy debate. Let me try to summarize my understanding of the arguments. Yann LeCun and Tony Zador argue that humans would be stupid to put in explicit dominance instincts in our AIs. Stuart Russell responds that it need not be explicit but dangerous or immoral behavior may simply arise out of imperfect value alignment and instrumental subgoals set by the machine to achieve its official goals. Yann LeCun and Tony Zador respond that we would be stupid not to program the proper 'laws of robotics' to protect humans. Stuart Russell is concerned that value alignment is not a solved problem and may be intractable (i.e. there will always remain a gap, and a sufficiently powerful AI could 'exploit' this gap, just like very powerful corporations currently often act legally but immorally). Yann LeCun and Tony Zador argue that we could also build defensive military robots designed to only kill regular AIs gone rogue by lack of value alignment. Stuart Russell did not explicitly respond to this but I infer from his NRA reference that we could be worse off with these defensive robots because now they have explicit weapons and can also suffer from the value misalignment problem.

Yoshua Bengio: So at the end of the day, it boils down to whether we can handle the value misalignment problem, and I'm afraid that it's not clear we can for sure, but it also seems reasonable to think we will be able to in the future. Maybe part of the problem is that Yann LeCun and Tony Zador are satisfied with a 99.9% probability that we can fix the value alignment problem while Stuart Russell is not satisfied with taking such an existential risk.

Yoshua Bengio: And there is another issue which was not much discussed (although the article does talk about the short-term risks of military uses of AI etc), and which concerns me: humans can easily do stupid things. So even if there are ways to mitigate the possibility of rogue AIs due to value misalignment, how can we guarantee that no single human will act stupidly (more likely, greedily for their own power) and unleash dangerous AIs in the world? And for this, we don't even need superintelligent AIs, to feel very concerned. The value alignment problem also applies to humans (or companies) who have a lot of power: the misalignment between their interests and the common good can lead to catastrophic outcomes, as we already know (e.g. tragedy of the commons, corruption, companies lying to have you buy their cigarettes or their oil, etc). It just gets worse when more power can be concentrated in the hands of a single person or organization, and AI advances can provide that power.

Francesca Rossi: I am more optimistic than Stuart about the value alignment problem. I think that a suitable combination of symbolic reasoning and various forms of machine learning can help us to both advance AI’s capabilities and get closer to solving the value alignment problem.

Tony Zador: @Stuart Russell "Thanks for clearing that up - so 2+2 is not equal to 4, because if the 2 were a 3, the answer wouldn't be 4? "

hmm. not quite what i'm saying.

If we're going for the math analogies, then i would say that a better analogy is:

Find X, Y such that X+Y=4.

The "killer coffee robot" solution is {X=642, Y = -638}. In other words: Yes, it is a solution, but not a particularly natural or likely or good solution.

But we humans are blinded by our own warped perspective. We focus on the solution that involves killing other creatures because that appears to be one of the main solutions that we humans default to. But it is not a particularly common solution in the natural world, nor do i think it's a particularly effective solution in the long run.

Yann LeCun: Humanity has been very familiar with the problem of fixing value misalignments for millennia.

We fix our children's hardwired values by teaching them how to behave.

We fix human value misalignment by laws. Laws create extrinsic terms in our objective functions and cause the appearance of instrumental subgoals ("don't steal") in order to avoid punishment. The desire for social acceptance also creates such instrumental subgoals driving good behavior.

We even fix value misalignment for super-human and super-intelligent entities, such as corporations and governments.

This last one occasionally fails, which is a considerably more immediate existential threat than AI.

Tony Zador: @Yoshua Bengio I agree with much of your summary. I agree value alignment is important, and that it is not a solved problem.

I also agree that new technologies often have unintended and profound consequences. The invention of books has led to a decline in our memories (people used to recite the entire Odyssey). Improvements in food production technology (and other factors) have led to a surprising obesity epidemic. The invention of social media is disrupting our political systems in ways that, to me anyway, have been quite surprising. So improvements in AI will undoubtedly have profound consequences for society, some of which will be negative.

But in my view, focusing on "killer robots that dominate or step on humans" is a distraction from much more serious issues.

That said, perhaps "killer robots" can be thought of as a metaphor (or metonym) for the set of all scary scenarios that result from this powerful new technology.

Yann LeCun: @Stuart Russell you write "we need to change the way we do AI". The problems you describe have nothing to do with AI per se.

They have to do with designing (not avoiding) explicit instrumental objectives for entities (e.g. corporations) so that their overall behavior works for the common good. This is a problem of law, economics, policies, ethics, and the problem of controlling complex dynamical systems composed of many agents in interaction.

What is required is a mechanism through which objectives can be changed quickly when issues surface. For example, Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago. It put in place measures to limit the dissemination of clickbait, and it favored content shared by friends rather than directly disseminating content from publishers.

We certainly agree that designing good objectives is hard. Humanity has struggled with designing objectives for itself for millennia. So this is not a new problem. If anything, designing objectives for machines, and forcing them to abide by them will be a lot easier than for humans, since we can physically modify their firmware.

There will be mistakes, no doubt, as with any new technology (early jetliners lost wings, early cars didn't have seat belts, roads didn't have speed limits...).

But I disagree that there is a high risk of accidentally building existential threats to humanity.

Existential threats to humanity have to be explicitly designed as such.

Yann LeCun: It will be much, much easier to control the behavior of autonomous AI systems than it has been for humans and human organizations, because we will be able to directly modify their intrinsic objective function.

This is very much unlike humans, whose objective can only be shaped through extrinsic objective functions (through education and laws), that indirectly create instrumental sub-objectives ("be nice, don't steal, don't kill, or you will be punished").

As I have pointed out in several talks in the last several years, autonomous AI systems will need to have a trainable part in their objective, which would allow their handlers to train them to behave properly, without having to directly hack their objective function by programmatic means.

Yoshua Bengio: Yann, these are good points, we indeed have much more control over machines than humans since we can design (and train) their objective function. I actually have some hopes that by using an objective-based mechanism relying on learning (to inculcate values) rather than a set of hard rules (like in much of our legal system), we could achieve more robustness to unforeseen value alignment mishaps. In fact, I surmise we should do that with human entities too, i.e., penalize companies, e.g. fiscally, when they behave in a way which hurts the common good, even if they are not directly violating an explicit law. This also suggests to me that we should try to avoid that any entity (person, company, AI) have too much power, to avoid such problems. On the other hand, although probably not in the near future, there could be AI systems which surpass human intellectual power in ways that could foil our attempts at setting objective functions which avoid harm to us. It seems hard to me to completely deny that possibility, which thus would beg for more research in (machine-) learning moral values, value alignment, and maybe even in public policies about AI (to minimize the events in which a stupid human brings about AI systems without the proper failsafes) etc.

Yann LeCun: @Yoshua Bengio if we can build "AI systems which surpass human intellectual power in ways that could foil our attempts at setting objective functions", we can also build similarly-powerful AI systems to set those objective functions.

Sort of like the discriminator in GANs....

• designing objectives for super-human entities is not a new problem. Human societies have been doing this through laws (concerning corporations and governments) for millennia.
• the defensive AI systems designed to protect against rogue AI systems are not akin to the military, they are akin to the police, to law enforcement. Their "jurisdiction" would be strictly AI systems, not humans.

But until we have a hint of a beginning of a design, with some visible path towards autonomous AI systems with non-trivial intelligence, we are arguing about the sex of angels.

Yuri Barzov: Aren't we overestimating the ability of imperfect humans to build a perfect machine? If it will be much more powerful than humans its imperfections will be also magnified. Cute human kids grow up into criminals if they get spoiled by reinforcement i.e. addiction to rewards. We use reinforcement and backpropagation (kind of reinforcement) in modern golden standard AI systems. Do we know enough about humans to be able to build a fault-proof human friendly super intelligent machine?

Yoshua Bengio: @Yann LeCun, about discriminators in GANs, and critics in Actor-Critic RL, one thing we know is that they tend to be biased. That is why the critic in Actor-Critic is not used as an objective function but instead as a baseline to reduce the variance. Similarly, optimizing the generator wrt a fixed discriminator does not work (you would converge to a single mode - unless you balance that with entropy maximization). Anyways, just to say, there is much more research to do, lots of unknown unknowns about learning moral objective functions for AIs. I'm not afraid of research challenges, but I can understand that some people would be concerned about the safety of gradually more powerful AIs with misaligned objectives. I actually like the way that Stuart Russell is attacking this problem by thinking about it not just in terms of an objective function but also about uncertainty: the AI should avoid actions which might hurt us (according to a self-estimate of the uncertain consequences of actions), and stay the conservative course with high confidence of accomplishing the mission while not creating collateral damage. I think that what you and I are trying to say is that all this is quite different from the terminator scenarios which some people in the media are brandishing. I also agree with you that there are lots of unknown unknowns about the strengths and weaknesses of future AIs, but I think that it is not too early to start thinking about these issues.
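The baseline remark can be illustrated with a small worked example. The sketch below is a two-action bandit with a softmax policy, not a full Actor-Critic setup; the rewards, the logits, and the helper names `softmax` and `gradient_stats` are assumptions chosen so the numbers can be computed exactly by enumerating the actions instead of sampling.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_stats(logits: List[float], rewards: List[float],
                   baseline: float) -> Tuple[float, float]:
    """Exact mean and variance of (reward - baseline) * d log pi(a) / d logit_0."""
    probs = softmax(logits)
    samples = []
    for action, (p, r) in enumerate(zip(probs, rewards)):
        # Derivative of log pi(action) with respect to logit 0 for a softmax policy.
        score = (1.0 if action == 0 else 0.0) - probs[0]
        samples.append((p, (r - baseline) * score))
    mean = sum(p * g for p, g in samples)
    var = sum(p * (g - mean) ** 2 for p, g in samples)
    return mean, var


logits = [0.0, 0.0]        # uniform policy over two actions (assumed)
rewards = [10.0, 11.0]     # deterministic rewards per action (assumed)
no_baseline = gradient_stats(logits, rewards, 0.0)
rough_baseline = gradient_stats(logits, rewards, 3.0)   # a poor estimate of the mean reward
good_baseline = gradient_stats(logits, rewards, 10.5)   # the true mean reward

if __name__ == "__main__":
    for name, stats in [("none", no_baseline), ("rough", rough_baseline), ("good", good_baseline)]:
        print(name, stats)
```

In this toy case the expected gradient is identical for all three baselines, so an inaccurate baseline does not bias the update; only the variance changes, and it shrinks as the baseline approaches the true mean reward. Using the same inaccurate estimate directly as the objective would carry its error into the result, which is the distinction being drawn.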

Yoshua Bengio: @Yuri Barzov the answer to your question: no. But we don't know that it is not feasible either, and we have reasons to believe that (a) it is not for tomorrow such machines will exist and (b) we have intellectual tools which may lead to solutions. Or maybe not!

Stuart Russell: Yann's comment "Facebook stopped maximizing clickthroughs several years ago and stopped using the time spent in the app as a criterion about 2 years ago" makes my point for me. Why did they stop doing it? Because it was the wrong objective function. Yann says we'd have to be "extremely stupid" to put the wrong objective into a super-powerful machine. Facebook's platform is not super-smart but it is super-powerful, because it connects with billions of people for hours every day. And yet they put the wrong objective function into it. QED. Fortunately they were able to reset it, but unfortunately one has to assume it's still optimizing a fixed objective. And the fact that it's operating within a large corporation that's designed to maximize another fixed objective - profit - means we cannot switch it off.

Stuart Russell: Regarding "externalities" - when talking about externalities, economists are making essentially the same point I'm making: externalities are the things not stated in the given objective function that get damaged when the system optimizes that objective function. In the case of the atmosphere, it's relatively easy to measure the amount of pollution and charge for it via taxes or fines, so correcting the problem is possible (unless the offender is too powerful). In the case of manipulation of human preferences and information states, it's very hard to assess costs and impose taxes or fines. The theory of uncertain objectives suggests instead that systems be designed to be "minimally invasive", i.e., don't mess with parts of the world state whose value is unclear. In particular, as a general rule it's probably best to avoid using fixed-objective reinforcement learning in human-facing systems, because the reinforcement learner will learn how to manipulate the human to maximize its objective.

Stuart Russell: @Yann LeCun Let's talk about climate change for a change. Many argue that it's an existential or near-existential threat to humanity. Was it "explicitly designed" as such? We created the corporation, which is a fixed-objective maximizer. The purpose was not to create an existential risk to humanity. Fossil-fuel corporations became super-powerful and, in certain relevant senses, super-intelligent: they anticipated and began planning for global warming five decades ago, executing a campaign that outwitted the rest of the human race. They didn't win the academic argument but they won in the real world, and the human race lost. I just attended an NAS meeting on climate control systems, where the consensus was that it was too dangerous to develop, say, solar radiation management systems - not because they might produce unexpected disastrous effects but because the fossil fuel corporations would use their existence as a further form of leverage in their so-far successful campaign to keep burning more carbon.

Stuart Russell: @Yann LeCun This seems to be a very weak argument. The objection raised by Omohundro and others who discuss instrumental goals is aimed at any system that operates by optimizing a fixed, known objective; which covers pretty much all present-day AI systems. So the issue is: what happens if we keep to that general plan - let's call it the "standard model" - and improve the capabilities for the system to achieve the objective? We don't need to know today *how* a future system achieves objectives more successfully, to see that it would be problematic. So the proposal is, don't build systems according to the standard model.

Yann LeCun: @Stuart Russell the problem is that essentially no AI system today is autonomous.

They are all trained *in advance* to optimize an objective, and subsequently execute the task with no regards to the objective, hence with no way to spontaneously deviate from the original behavior.

As of today, as far as I can tell, we do *not* have a good design for an autonomous machine, driven by an objective, capable of coming up with new strategies to optimize this objective in the real world.

We have plenty of those in games and simple simulation. But the learning paradigms are way too inefficient to be practical in the real world.

Yuri Barzov: @Yoshua Bengio yes. If we frame the problem correctly we will be able to resolve it. AI puts natural intelligence into focus like a magnifying mirror

Yann LeCun: @Stuart Russell in pretty much everything that society does (business, government, or whatever) behaviors are shaped through incentives, penalties via contracts, regulations and laws (let's call them collectively the objective function), which are proxies for the metric that needs to be optimized.

Because societies are complex systems, because humans are complex agents, and because conditions evolve, it is a requirement that the objective function be modifiable to correct unforeseen negative effects, loopholes, inefficiencies, etc.

The Facebook story is unremarkable in that respect: when bad side effects emerge, measures are taken to correct them. Often, these measures eliminate bad actors by directly changing their economic incentive (e.g. removing the economic incentive for clickbaits).

Perhaps we agree on the following:

(0) not all consequences of a fixed set of incentives can be predicted.

(1) because of that, objective functions must be updatable.

(2) they must be updated to correct bad effects whenever they emerge.

(3) there should be an easy way to train minor aspects of objective functions through simple interaction (similar to the process of educating children), as opposed to programmatic means.

Perhaps where we disagree is the risk of inadvertently producing systems with badly-designed and (somehow) un-modifiable objectives that would be powerful enough to constitute existential threats.

Yoshua Bengio: @Yann LeCun this is true, but one aspect which concerns me (and others) is the gradual increase in power of some agents (now mostly large companies and some governments, potentially some AI systems in the future). When it was just weak humans the cost of mistakes or value misalignment (improper laws, misaligned objective function) was always very limited and local. As we build more and more powerful and intelligent tools and organizations, (1) it becomes easier to cheat for 'smarter' agents (exploit the misalignment) and (2) the cost of these misalignments becomes greater, potentially threatening the whole of society. This then does not leave much time and warning to react to value misalignment.

Discuss

### AI Alignment Open Thread October 2019

October 4, 2019 - 04:28
Published on October 4, 2019 1:28 AM UTC

Continuing the experiment from August, let's try another open thread for AI Alignment discussion. The goal is to be a place where researchers and upcoming researchers can ask small questions they are confused about, share early stage ideas and have lower-key discussions.

Discuss

### [Link] Tools for thought (Matuschak & Nielsen)

October 4, 2019 - 03:42
Published on October 4, 2019 12:42 AM UTC

An excerpt:

Discuss

### To Be Decided #1

October 4, 2019 - 02:22
Published on October 3, 2019 7:30 PM UTC

(Preface: This is the first edition of a quarterly email newsletter I started earlier this year called To Be Decided. I'm posting this as an experiment; if response here is positive, I'll post the two issues that have gone out since then as well as future issues as they come out. Feedback welcome!)

Welcome to the inaugural edition of To Be Decided, a quarterly newsletter about smarter decisions for a better world! TBD is all about deploying knowledge for impact, learning at scale, and making more thoughtful choices for ourselves and our organizations. Each edition will feature short and sweet reviews of important publications you don't want to miss but don't have time to read, along with a brief roundup of major developments in the world of learning and decision-making since last time.

Why Your Hard Work Sits on the Shelf—and What to Do About It

We've all been there. The time when the client seemed to forget the project ever happened as soon as the final check was cut. The time when your report stuffed full of creative recommendations got buried by risk-averse leadership. The time when stakeholders really did seem engaged by the findings, had lots of conversations, and then...nothing changed.

If you suspect these stories are more the rule than the exception, the evidence suggests you're right. And if the trend continues, chances are it's eventually going to catch up to those of us who generate and spread knowledge in the social sector. If we really want our work to be useful, we have to continue supporting decision-makers after the final report is delivered, working hand-in-hand with them to ensure whatever choices they make take into account not only the best information available but also other factors that matter to them, including their values, goals, and perceived obligations. For this reason, knowledge providers who want to see their work have greater impact might find value in partnering with a decision consultant in the form of a "wrap-around" service for knowledge initiatives.

Rethinking the Purpose of Measurement
Measurement is not a simple act of observation disconnected from any larger plan. Instead, it’s an optimization strategy for reducing uncertainty about decisions we need to make. That’s the central argument of Douglas Hubbard’s How to Measure Anything: Finding the Value of “Intangibles” in Business, which remains one of the most important books on decision-making I’ve read since first encountering it more than seven years ago. This revolutionary reframing argues that measurement can only have value if it can reduce uncertainty about a decision that matters. It points toward an ultra-applied approach to evaluation and research that would represent a radical departure from the way these functions operate at most organizations today.

Funders Learn Mostly from Each Other. Is that Dangerous?
"Peer to Peer: At the Heart of Influencing More Effective Philanthropy," commissioned by the Hewlett Foundation with the goal of understanding how foundations access and use knowledge, raises the question of whether there are enough intellectually curious foundation leaders who both keep tabs on new studies and reports as they come out and proactively share that knowledge with their peers. (Twitter thread)

• The very same day last December that negotiations to avoid the longest government shutdown in US history fell apart, President Trump signed into law one of the most important pieces of government performance legislation in 25 years. Among other reforms, it directs federal agencies to develop public learning agendas and hire senior evaluation officers. As improbable as it may seem, the Foundations for Evidence-Based Policy-Making Act was passed with broad bipartisan support by a Republican Congress following recommendations from an Obama-era presidential commission. (Side note: props to Bipartisan Policy Center's Nick Hart for braving Reddit to host a rowdy Ask Me Anything on this topic.)
• In a bid to accelerate the open science movement, the University of California system has declined to renew its $10 million annual contract with Elsevier, the world's largest publisher of scholarly research.
• The Open Philanthropy Project, one of the most interesting funders in the world right now, has placed its biggest bet to date: a $55 million grant to help establish the new Center for Security and Emerging Technology at Georgetown University. The center, which will focus extensively on heading off threats from advanced artificial intelligence, will be headed by Jason Matheny, former director of the Intelligence Advanced Research Projects Activity (IARPA) program at the US Office of the Director of National Intelligence. Fun fact: while at IARPA, Matheny managed the prediction tournament that helped establish the empirical basis for the advanced techniques described in Philip Tetlock's popular book Superforecasting. (More about forecasting in a future TBD.)

That's all for now!

If you enjoyed this edition of TBD, please consider forwarding it to a friend. It's easy to sign up here. See you next time!

Discuss

### Long-Term Future Fund: August 2019 grant recommendations

October 3, 2019 - 23:41
Published on October 3, 2019 8:41 PM UTC

Note: The Q4 deadline for applications to the Long-Term Future Fund is Friday 11th October. Apply here.

We opened up an application for grant requests earlier this year, and it was open for about one month. This post contains the list of grant recipients for Q3 2019, as well as some of the reasoning behind the grants. Most of the funding for these grants has already been distributed to the recipients.

In the writeups below, we explain the purpose for each grant and summarize our reasoning behind their recommendation. Each summary is written by the fund manager who was most excited about recommending the relevant grant (with a few exceptions that we've noted below). These differ a lot in length, based on how much available time the different fund members had to explain their reasoning.

When we’ve shared excerpts from an application, those excerpts may have been lightly edited for context or clarity.

Grant Recipients

Grants Made By the Long-Term Future Fund

Each grant recipient is followed by the size of the grant and their one-sentence description of their project. All of these grants have been made.

• Samuel Hilton, on behalf of the HIPE team ($60,000): Placing a staff member within the government, to support civil servants to do the most good they can.
• Stag Lynn ($23,000): To spend the next year leveling up various technical skills with the goal of becoming more impactful in AI safety.
• Roam Research ($10,000): Workflowy, but with much more power to organize your thoughts and collaborate with others.
• Alexander Gietelink Oldenziel ($30,000): Independent AI Safety thinking, doing research in aspects of self-reference in using techniques from type theory, topos theory and category theory more generally.
• Alexander Siegenfeld ($20,000): Characterizing the properties and constraints of complex systems and their external interactions.
• Sören Mindermann ($36,982): Additional funding for an AI strategy PhD at Oxford / FHI to improve my research productivity.
• AI Safety Camp ($41,000): A research experience program for prospective AI safety researchers.
• Miranda Dixon-Luinenburg ($13,500): Writing EA-themed fiction that addresses X-risk topics.
• David Manheim ($30,000): Multi-model approach to corporate and state actors relevant to existential risk mitigation.
• Joar Skalse ($10,000): Upskilling in ML in order to be able to do productive AI safety research sooner than otherwise.
• Chris Chambers ($36,635): Combat publication bias in science by promoting and supporting the Registered Reports journal format.
• Jess Whittlestone ($75,080): Research on the links between short- and long-term AI policy while skilling up in technical ML.
• Lynette Bye ($23,000): Productivity coaching for effective altruists to increase their impact.

Total distributed: $439,197

Other Recommendations

Sometimes, applicants get alternative sources of funding, or decide to work on a different project.

The following people and organizations were applicants of this kind. The Long-Term Future Fund recommended grants to them, but did not end up funding them. We sometimes create write-ups for these applicants and include them in our reports in order to provide readers with better information on the types of grants we like to recommend.

• Center for Applied Rationality ($150,000): Help promising people to reason more effectively and find high-impact work, such as reducing x-risk.

Two grants we recommended but did not write up:

• Jake Coble, who requested $10,000 to do some work with Simon Beard of CSER. This grant request came with an early deadline, so we made the recommendation earlier in the grant cycle. However, after our recommendation went out, Jake found a different project he preferred, and no longer required funding.
• We recommended another individual for a grant, but they wound up accepting funding from another source. (They requested that we not share their name; we would have shared this information had they received funding from us.)

Writeups by Helen Toner

Samuel Hilton, on behalf of the HIPE team ($60,000)
Placing a staff member within the government, to support civil servants to do the most good they can.

This grant supports HIPE (https://hipe.org.uk), a UK-based organization that helps civil servants to have high-impact careers. HIPE’s primary activities are researching how to have a positive impact in the UK government; disseminating their findings via workshops, blog posts, etc.; and providing one-on-one support to interested individuals. HIPE has so far been entirely volunteer-run. This grant funds part of the cost of a full-time staff member for two years, plus some office and travel costs.

Our reasoning for making this grant is based on our impression that HIPE has already been able to gain some traction as a volunteer organization, and on the fact that they now have the opportunity to place a full-time staff member within the Cabinet Office. We see this both as a promising opportunity in its own right, and also as a positive signal about the engagement HIPE has been able to create so far. The fact that the Cabinet Office is willing to provide desk space and cover part of the overhead cost for the staff member suggests that HIPE is engaging successfully with its core audiences.

HIPE does not yet have robust ways of tracking its impact, but they expressed strong interest in improving their impact tracking over time. We would hope to see a more fleshed-out impact evaluation if we were asked to renew this grant in the future.

I’ll add that I (Helen) personally see promise in the idea of services that offer career discussion, coaching, and mentoring in more specialized settings. (Other fund members may agree with this, but it was not part of our discussion when deciding whether to make this grant, so I’m not sure.)

Writeups by Alex Zhu

Stag Lynn ($23,000)
To spend the next year leveling up various technical skills with the goal of becoming more impactful in AI safety.

Stag’s current intention is to spend the next year improving his skills in a variety of areas (e.g. programming, theoretical neuroscience, and game theory) with the goal of contributing to AI safety research, meeting relevant people in the x-risk community, and helping out in EA/rationality related contexts wherever he can (e.g. at rationality summer camps like SPARC and ESPR).

Two projects he may pursue during the year:

• Working to implement certificates of impact in the EA/X-risk community, in the hope of encouraging coordination between funders with different values and increasing transparency around the contributions of different people to impactful projects.
• Working as an unpaid personal assistant to someone in EA who is sufficiently busy for this form of assistance to be useful, and sufficiently productive for the assistance to be valuable.

I recommended funding Stag because I think he is smart, productive, and altruistic, has a track record of doing useful work, and will contribute more usefully to reducing existential risk by directly developing his capabilities and embedding himself in the EA community than he would by finishing his undergraduate degree or working a full-time job. While I’m not yet clear on what projects he will pursue, I think it’s likely that the end result will be very valuable — projects like impact certificates require substantial work from someone with technical and executional skills, and Stag seems to me to fit the bill.

More on Stag’s background: In high school, Stag had top finishes in various Latvian and European Olympiads, including a gold medal in the 2015 Latvian Olympiad in Mathematics. Stag has also previously taken the initiative to work on EA causes -- for example, he joined two other people in Latvia in attempting to create the Latvian chapter of Effective Altruism (which reached the point of creating a Latvian-language website), and he has volunteered to take on major responsibilities in future iterations of the European Summer Program in Rationality (which introduces promising high-school students to effective altruism).

Potential conflict of interest: at the time of making the grant, Stag was living with me and helping me with various odd jobs, as part of his plan to meet people in the EA community and help out where he could. This arrangement lasted for about 1.5 months. To compensate for this potential issue, I’ve included notes on Stag from Oliver Habryka, another fund manager.

Oliver Habryka’s comments on Stag Lynn

I’ve interacted with Stag in the past and have broadly positive impressions of him, in particular his capacity for independent strategic thinking.

Stag has achieved a high level of success in Latvian and Galois Mathematical Olympiads. I generally think that success in these competitions is one of the best predictors we have of a person’s future performance on making intellectual progress on core issues in AI safety. See also my comments and discussion on the grant to Misha Yagudin last round.

Stag has also contributed significantly to improving both ESPR and SPARC, both of which introduce talented pre-college students to core ideas in EA and AI safety. In particular, he’s helped the programs find and select strong participants, while suggesting curriculum changes that gave them more opportunities to think independently about important issues. This gives me a positive impression of Stag’s ability to contribute to other projects in the space. (I also consider ESPR and SPARC to be among the most cost-effective ways to get more excellent people interested in working on topics of relevance to the long-term future, and take this as another signal of Stag’s talent at selecting and/or improving projects.)

Roam Research ($10,000)
Workflowy, but with much more power to organize your thoughts and collaborate with others.

Roam is a web application which automates the Zettelkasten method, a note-taking / document-drafting process based on physical index cards. While it is difficult to start using the system, those who do often find it extremely helpful, including a researcher at MIRI who claims that the method doubled his research productivity.

On my inside view, if Roam succeeds, an experienced user of the note-taking app Workflowy will get at least as much value switching to Roam as they got from using Workflowy in the first place. (Many EAs, myself included, see Workflowy as an integral part of our intellectual process, and I think Roam might become even more integral than Workflowy. See also Sarah Constantin’s review of Roam, which describes Roam as being potentially as “profound a mental prosthetic as hypertext”, and her more recent endorsement of Roam.)

Over the course of the last year, I’ve had intermittent conversations with Conor White-Sullivan, Roam’s CEO, about the app. I started out in a position of skepticism: I doubted that Roam would ever have active users, let alone succeed at its stated mission. After a recent update call with Conor about his LTF Fund application, I was encouraged enough by Roam’s most recent progress, and sufficiently convinced of the possible upsides of its success, that I decided to recommend a grant to Roam.

Since then, Roam has developed enough as a product that I’ve personally switched from Workflowy to Roam and now recommend Roam to my friends. Roam’s progress on its product, combined with its growing base of active users, has led me to feel significantly more optimistic about Roam succeeding at its mission.

(This funding will support Roam’s general operating costs, including expenses for Conor, one employee, and several contractors.)

Potential conflict of interest: Conor is a friend of mine, and I was once his housemate for a few months.

Alexander Gietelink Oldenziel ($30,000)
Independent AI Safety thinking, doing research in aspects of self-reference in using techniques from type theory, topos theory and category theory more generally.

In our previous round of grants, we funded MIRI as an organization: see our April report for a detailed explanation of why we chose to support their work. I think Alexander’s research directions could lead to significant progress on MIRI’s research agenda; in fact, MIRI was sufficiently impressed by his work that they offered him an internship. I have also spoken to him in some depth, and was impressed both by his research taste and clarity of thought.

After the internship ends, I think it will be valuable for Alexander to have additional funding to dig deeper into these topics; I expect this grant to support roughly 1.5 years of research. During this time, he will have regular contact with researchers at MIRI, reporting on his research progress and receiving feedback.

Alexander Siegenfeld ($20,000)
Characterizing the properties and constraints of complex systems and their external interactions.

Alexander is a 5th-year graduate student in physics at MIT, and he wants to conduct independent deconfusion research for AI safety. His goal is to get a better conceptual understanding of multi-level world models by coming up with better formalisms for analyzing complex systems at differing levels of scale, building off of the work of Yaneer Bar-Yam. (Yaneer is Alexander’s advisor, and the president of the New England Complex Systems Institute.)

I decided to recommend funding to Alexander because I think his research directions are promising, and because I was personally impressed by his technical abilities and his clarity of thought. Tsvi Benson-Tilsen, a MIRI researcher, was also impressed enough by Alexander to recommend that the Fund support him. Alexander plans to publish a paper on his research; it will be evaluated by researchers at MIRI, helping him decide how best to pursue further work in this area.

Potential conflict of interest: Alexander and I have been friends since our undergraduate years at MIT.

Writeups by Oliver Habryka

I have a sense that funders in EA, usually due to time constraints, tend to give little feedback to organizations they fund (or decide not to fund). In my writeups below, I tried to be as transparent as possible in explaining the reasons for why I came to believe that each grant was a good idea, my greatest uncertainties and/or concerns with each grant, and some background models I use to evaluate grants. (I hope this last item will help others better understand my future decisions in this space.)

I think that there exist more publicly defensible (or easier to understand) arguments for some of the grants that I recommended. However, I tried to explain the actual models that drove my decisions for these grants, which are often hard to summarize in a few paragraphs.
I apologize in advance that some of the explanations below are probably difficult to understand.

Thoughts on grant selection and grant incentives

Some higher-level points on many of the grants below, as well as many grants from last round:

For almost every grant we make, I have a lot of opinions and thoughts about how the applicant(s) could achieve their aims better. I also have a lot of ideas for projects that I would prefer to fund over the grants we are actually making.

However, in the current structure of the LTFF, I primarily have the ability to select potential grantees from an established pool, rather than encouraging the creation of new projects. Alongside my time constraints, this means that I have a very limited ability to contribute to the projects with my own thoughts and models.

Additionally, I spend a lot of time thinking independently about these areas, and have a broad view of “ideal projects that could be made to exist.” This means that for many of the grants I am recommending, it is not usually the case that I think the projects are very good on all the relevant dimensions; I can see how they fall short of my “ideal” projects. More frequently, the projects I fund are among the only available projects in a reference class I believe to be important, and I recommend them because I want projects of that type to receive more resources (and because they pass a moderate bar for quality).

Some examples:

• Our grant to the Kocherga community space club last round. I see Kocherga as the only promising project trying to build infrastructure that helps people pursue projects related to x-risk and rationality in Russia.
• I recommended this round’s grant to Miranda partly because I think Miranda's plans are good and I think her past work in this domain and others is of high quality, but also because she is the only person who applied with a project in a domain that seems promising and neglected (using fiction to communicate otherwise hard-to-explain ideas relating to x-risk and how to work on difficult problems).
• In the November 2018 grant round, I recommended a grant to Orpheus Lummis to run an AI safety unconference in Montreal. This is because I think he had a great idea, and would create a lot of value even if he ran the events only moderately well. This isn’t the same as believing Orpheus has excellent skills in the relevant domain; I can imagine other applicants who I’d have been more excited to fund, had they applied.

I am, overall, still very excited about the grants below, and I think they are a much better use of resources than what I think of as the most common counterfactuals to donating to the LTFF (e.g. donating to the largest organizations in the space, donating based on time-limited personal research).

However, related to the points I made above, I will have many criticisms of almost all the projects that receive funding from us. I think that my criticisms are valid, but readers shouldn't interpret them to mean that I have a negative impression of the grants we are making, which are strong despite their flaws. Aggregating my individual (and frequently critical) recommendations will not give readers an accurate impression of my overall (highly positive) view of the grant round.

(If I ever come to think that the pool of valuable grants has dried up, I will say so in a high-level note like this one.)
I can imagine that in the future I might want to invest more resources into writing up lists of potential projects that I would be excited about, though it is also not clear to me that I want people to optimize too much for what I am excited about, and think that the current balance of "things that I think are exciting, and that people feel internally motivated to do and generated their own plans for" seems pretty decent.

To follow up the above with a high-level assessment, I am slightly less excited about this round’s grants than I am about last round’s, and I’d estimate (very roughly) that this round is about 25% less cost-effective than the previous round.

Acknowledgements

For both this round and the last round, I wrote the writeups in collaboration with Ben Pace, who works with me on LessWrong and the Alignment Forum. After an extensive discussion about the grants and the Fund's reasoning for them, we split the grants between us and independently wrote initial drafts. We then iterated on those drafts until they accurately described my thinking about them and the relevant domains.

I am also grateful for Aaron Gertler’s help with editing and refining these writeups, which has substantially increased their clarity.

Sören Mindermann ($36,982)
Additional funding for an AI strategy PhD at Oxford / FHI to improve my research productivity.

I'm looking for additional funding to supplement my 15k pound/y PhD stipend for 3-4 years from September 2019. I am hoping to roughly double this. My PhD is at Oxford in machine learning, but co-supervised by Allan Dafoe from FHI so that I can focus on AI strategy. We will have multiple joint meetings each month, and I will have a desk at FHI.

The purpose is to increase my productivity and happiness. Given my expected financial situation, I currently have to make compromises on e.g. Ubers, Soylent, eating out with colleagues, accommodation, quality and waiting times for health care, spending time comparing prices, travel durations and stress, and eating less healthily.

I expect that more financial security would increase my own productivity and the effectiveness of the time invested by my supervisors.

I think that when FHI or other organizations in that reference class have trouble doing certain things due to logistical obstacles, we should usually step in and fill those gaps (e.g. see Jacob Lagerros’ grant from last round). My sense is that FHI has trouble with providing funding in situations like this (due to budgetary constraints imposed by Oxford University).

I’ve interacted with Sören in the past (during my work at CEA), and generally have positive impressions of him in a variety of domains, like his basic thinking about AI Alignment, and his general competence from running projects like the EA Newsletter.

I have a lot of trust in the judgment of Nick Bostrom and several other researchers at FHI. I am not currently very excited about the work at GovAI (the team that Allan Dafoe leads), but still have enough trust in many of the relevant decision makers to think that it is very likely that Sören should be supported in his work.

In general, I think many of the salaries for people working on existential risk are low enough that they have to make major tradeoffs in order to deal with the resulting financial constraints. I think that increasing salaries in situations like this is a good idea (though I am hesitant about increasing salaries for other types of jobs, for a variety of reasons I won’t go into here, but am happy to expand on).

This funding should last for about 2 years of Sören’s time at Oxford.

AI Safety Camp ($41,000)
A research experience program for prospective AI safety researchers.

We want to organize the 4th AI Safety Camp (AISC) - a research retreat and program for prospective AI safety researchers. Compared to past iterations, we plan to change the format to include a 3 to 4-day project generation period and team formation workshop, followed by a several-week period of online team collaboration on concrete research questions, a 6 to 7-day intensive research retreat, and ongoing mentoring after the camp. The target capacity is 25 - 30 participants, with projects that range from technical AI safety (majority) to policy and strategy research. More information about past camps is at https://aisafetycamp.com/

[...]

Early-career entry stage seems to be a less well-covered part of the talent pipeline, especially in Europe. Individual mentoring is costly from the standpoint of expert advisors (esp. compared to guided team work), while internships and e.g. MSFP have limited capacity and are US-centric. After the camp, we advise and encourage participants on future career steps and help connect them to other organizations, or direct them to further individual work and learning if they are pursuing an academic track.

Overviews of previous research projects from the first 2 camps can be found here:

1- http://bit.ly/2FFFcK1
2- http://bit.ly/2KKjPLB

Projects from AISC3 are still in progress and there is no public summary.

To evaluate the camp, we send out an evaluation form directly after the camp has concluded and then informally follow the career decisions, publications, and other AI safety/EA involvement of the participants. We plan to conduct a larger survey from past AISC participants later in 2019 to evaluate our mid-term impact. We expect to get a more comprehensive picture of the impact, but it is difficult to evaluate counterfactuals and indirect effects (e.g. networking effects).
The (anecdotal) positive examples we attribute to past camps include the acceleration of entrance of several people in the field, research outputs that include 2 conference papers, several SW projects, and about 10 blogposts.

The main direct costs of the camp are the opportunity costs of participants, organizers and advisors. There are also downside risks associated with personal conflicts at multi-day retreats and discouraging capable people from the field if the camp is run poorly. We actively work to prevent this by providing both on-site and external anonymous contact points, as well as actively attending to participant well-being, including during the online phases.

This grant is for the AI Safety Camp, to which we made a grant in the last round. Of the grants I recommended this round, I am most uncertain about this one. The primary reason is that I have not received much evidence about the performance of either of the last two camps[1], and I assign at least some probability that the camps are not facilitating very much good work. (This is mostly because I have low expectations for the quality of most work of this kind and haven’t looked closely enough at the camp to override these, not because I have positive evidence that they produce low-quality work.)

My biggest concern is that the camps do not provide a sufficient level of feedback and mentorship for the attendees. When I try to predict how well I’d expect a research retreat like the AI Safety Camp to go, much of the impact hinges on putting attendees into contact with more experienced researchers and having a good mentoring setup. Some of the problems I have with the output from the AI Safety Camp seem like they could be explained by a lack of mentorship.

From the evidence I observe on their website, I see that the attendees of the second camp all produced an artifact of their research (e.g. an academic writeup or code repository). I think this is a very positive sign.
That said, it doesn’t look like any alignment researchers have commented on any of this work (this may in part have been because most of it was presented in formats that require a lot of time to engage with, such as GitHub repositories), so I’m not sure the output actually led to the participants getting any feedback on their research directions, which is one of the most important things for people new to the field.

After some followup discussion with the organizers, I heard about changes to the upcoming camp (the target of this grant) that address some of the above concerns (independent of my feedback). In particular, the camp is being renamed to “AI Safety Research Program”, and is now split into two parts: a topic selection workshop and a research retreat, with experienced AI Alignment researchers attending the workshop. The format change seems likely to be a good idea, and makes me more optimistic about this grant.

I generally think hackathons and retreats for researchers can be very valuable, allowing for focused thinking in a new environment. I think the AI Safety Camp is held at a relatively low cost, in a part of the world (Europe) where there exist few other opportunities for potential new researchers to spend time thinking about these topics, and some promising people have attended. I hope that the camps are going well, but I will not fund another one without spending significantly more time investigating the program.

Footnotes

[1] After signing off on this grant, I found out that, due to overlap between the organizers of the events, some feedback I got about this camp was actually feedback about the Human Aligned AI Summer School, which means that I had even less information than I thought. In April I said I wanted to talk with the organizers before renewing this grant, and I expected to have at least six months between applications from them, but we received another application this round and I ended up not having time for that conversation.

Miranda Dixon-Luinenburg ($13,500)
Writing EA-themed fiction that addresses X-risk topics.

I want to spend three months evaluating my ability to produce an original work that explores existential risk, rationality, EA, and related themes such as coordination between people with different beliefs and backgrounds, handling burnout, planning on long timescales, growth mindset, etc. I predict that completing a high-quality novel of this type would take ~12 months, so 3 months is just an initial test.

In 3 months, I would hope to produce a detailed outline of an original work plus several completed chapters. Simultaneously, I would be evaluating whether writing full-time is a good fit for me in terms of motivation and personal wellbeing.

[...]

I have spent the last 2 years writing an EA-themed fanfiction of The Last Herald-Mage trilogy by Mercedes Lackey (online at https://archiveofourown.org/series/936480). In this period I have completed 9 “books” of the series, totalling 1.2M words (average of 60K words/month), mostly while I was also working full-time. (I am currently writing the final arc, and when I finish, hope to create a shorter abridged/edited version with a more solid beginning and better pacing overall.)

In the writing process, I researched key background topics, in particular AI safety work (I read a number of Arbital articles and most of this MIRI paper on decision theory: https://arxiv.org/pdf/1710.05060v1.pdf), as well as ethics, mental health, organizational best practices, medieval history and economics, etc. I have accumulated a very dedicated group of around 10 beta readers, all EAs, who read early drafts of each section and give feedback on how well it addresses various topics, which gives me more confidence that I am portraying these concepts accurately.

One natural decomposition of whether this grant is a good idea is to first ask whether writing fiction of this type is valuable, then whether Miranda is capable of actually creating that type of fiction, and last whether funding Miranda will make a significant difference in the amount/quality of her fiction.

I think that many people reading this will be surprised or confused about this grant. I feel fairly confident that grants of this type are well worth considering, and I am interested in funding more projects like this in the future, so I’ve tried my best to summarize my reasoning. I do think there are some good arguments for why we should be hesitant to do so (partly summarized by the section below that lists things that I think fiction doesn’t do as well as non-fiction), so while I think that grants like this are quite important, and have the potential to do a significant amount of good, I can imagine changing my mind about this in the future.

The track record of fiction

In a general sense, I think that fiction has a pretty strong track record of both being successful at conveying important ideas, and being a good attractor of talent and other resources. I also think that good fiction is often necessary to establish shared norms and shared language.

Here are some examples of communities and institutions that I think used fiction very centrally in their function. Note that after the first example, I am making no claim that the effect was good, I’m just establishing the magnitude of the potential effect size.

• Harry Potter and the Methods of Rationality (HPMOR) was instrumental in the growth and development of both the EA and Rationality communities. It is very likely the single most important recruitment mechanism for productive AI alignment researchers, and has also drawn many other people to work on the broader aims of the EA and Rationality communities.
• Fiction was a core part of the strategy of the neoliberal movement; fiction writers were among the groups referred to by Hayek as “secondhand dealers in ideas.” An example of someone whose fiction played a large role both in the rise of neoliberalism and in its eventual spread would be Ayn Rand.
• Almost every major religion, culture and nation-state is built on shared myths and stories, usually fictional (though the stories are often held to be true by the groups in question, making this data point a bit more confusing).
• Francis Bacon’s (unfinished) utopian novel “The New Atlantis” is often cited as the primary inspiration for the founding of the Royal Society, which may have been the single institution with the greatest influence on the progress of the scientific revolution.

On a more conceptual level, I think fiction tends to be particularly good at achieving the following aims (compared to non-fiction writing):

• Teaching low-level cognitive patterns by displaying characters that follow those patterns, allowing the reader to learn from very concrete examples set in a fictional world. (Compare Aesop’s Fables to some nonfiction book of moral precepts — it can be much easier to remember good habits when we attach them to characters.)
• Establishing norms, by having stories that display the consequences of not following certain norms, and the rewards of following them in the right way
• Establishing a common language, by not only explaining concepts, but also showing concepts as they are used, and how they are brought up in conversational context
• Establishing common goals, by creating concrete utopian visions of possible futures that motivate people to work towards them together
• Reaching a broader audience, since we naturally find stories more exciting than abstract descriptions of concepts

(I wrote in more detail about how this works for HPMOR in the last grant round.)

In contrast, here are some things that fiction is generally worse at (though a lot of these depend on context; since fiction often contains embedded non-fiction explanations, some of these can be overcome):

• Carefully evaluating ideas, in particular when evaluating them requires empirical data. There is a norm against showing graphs or tables in fiction books, making any explanation that rests on that kind of data difficult to access in fiction.
• Conveying precise technical definitions
• Engaging in dialogue with other writers and researchers
• Dealing with topics in which readers tend to come to better conclusions by mentally distancing themselves from the problem at hand, instead of engaging with concrete visceral examples (I think some ethical topics like the trolley problem qualify here, as well as problems that require mathematical concepts that don’t neatly correspond to easy real-world examples)

Overall, I think current writing about existential risk, rationality, and effective altruism skews too much towards non-fiction, so I’m excited about experimenting with funding fiction writing.

Miranda’s writing

The second question is whether I trust Miranda to actually be able to write fiction that leverages these opportunities and provides value. This is why I think Miranda can do a good job:

• Her current fiction project is read by a few people whose taste I trust, and many of them describe having developed valuable skills or insights as a result (for example, better skills for crisis management, a better conception of moral philosophy, an improved moral compass, and some insights about decision theory)
• She wrote frequently on LessWrong and her blog for a few years, producing content of consistently high quality that, while not fictional, often displayed some of the same useful properties as fiction writing.
• I’ve seen her execute a large variety of difficult projects outside of her writing, which means I am a lot more optimistic about things like her ability to motivate herself on this project and to excel in the non-writing aspects of the work (e.g. promoting her fiction to audiences beyond the EA and rationality communities)
• She worked in operations at CEA and received strong reviews from her coworkers
• She helped CFAR run the operations for SPARC in two consecutive years and performed well as a logistics volunteer for 11 of their other workshops
• I’ve seen her organize various events and provide useful help with logistics and general problem-solving on a large number of occasions

My two biggest concerns are:

• Miranda losing motivation to work on this project, because writing fiction with a specific goal requires a significantly different motivation than doing it for personal enjoyment
• The fiction being well-written and engaging, but failing to actually help people better understand the important issues it tries to cover.

I like the fact that this grant is for an exploratory 3 months rather than a longer period of time; this allows Miranda to pivot if it doesn’t work out, rather than being tied to a project that isn’t going well.

The counterfactual value of funding

It would be reasonable to ask whether a grant is really necessary, given that Miranda has produced a huge amount of fiction in the last two years without receiving funding explicitly dedicated to that. I have two thoughts here:

1. I generally think that we should avoid declining to pay people just because they’d be willing to do valuable work for free. It seems good to reward people for work even if this doesn’t make much of a difference in the quality/consistency of the work, because I expect this promise of reward to help people build long-term motivation and encourage exploration.
   • To explain this a bit more, I think this grant will help other people build motivation towards pursuing similar projects in the future, by setting a precedent for potential funding in this space. For example, I think the possibility of funding (and recognition) was also a motivator for Miranda in starting to work on this project.
2. I expect this grant to have a significant effect on Miranda’s productivity, because I think that there is often a qualitative difference between work someone produces in their spare time and work that someone can focus full-time on. In particular, I expect this grant to cause Miranda’s work to improve in the dimensions that she doesn’t naturally find very stimulating, which I expect will include editing, restructuring, and other forms of “polish”.
David Manheim ($30,000)

Multi-model approach to corporate and state actors relevant to existential risk mitigation.

Work for 2-3 months on continuing to build out a multi-model approach to understanding international relations and multi-stakeholder dynamics as it relates to risks of strong(er) AI systems development, based on and extending similar work done on biological weapons risks done on behalf of FHI's Biorisk group and supporting Open Philanthropy Project planning.

This work is likely to help policy and decision analysis for effective altruism related to the deeply uncertain and complex issues in international relations and long term planning that need to be considered for many existential risk mitigation activities. While the project is focused on understanding actors and motivations in the short term, the decisions being supported are exactly those that are critical for existential risk mitigation, with long term implications for the future.

I feel a lot of skepticism toward much of the work done in the academic study of international relations. Judging from my models of political influence and its effects on the quality of intellectual contributions, and my models of research fields with little ability to perform experiments, I have high priors that work in international relations is of significantly lower quality than in most scientific fields. However, I have engaged relatively little with actual research on the topic of international relations (outside of unusual scholars like Nick Bostrom) and so am hesitant in my judgement here.

I also have a fair bit of worry around biorisk. I haven’t really had the opportunity to engage with a good case for it, and neither have many of the people I would trust most in this space, in large part due to secrecy concerns from people who work on it (more on that below). Due to this, I am worried about information cascades.
(An information cascade is a situation where people primarily share what they believe but not why, and because people update on each other's beliefs you end up with a lot of people all believing the same thing precisely because everyone else does.)

I think it is valuable to work on biorisk, but this view is mostly based on individual conversations that are hard to summarize, and I feel uncomfortable with my level of understanding of possible interventions, or even just conceptual frameworks I could use to approach the problem. I don’t know how most people who work in this space came to decide it was important, and those I’ve spoken to have usually been reluctant to share details in conversation (e.g. about specific discoveries they think created risk, or types of arguments that convinced them to focus on biorisk over other threats).

I’m broadly supportive of work done at places like FHI and by the people at OpenPhil who care about x-risks, so I am in favor of funding their work (e.g. Soren’s grant above). But I don’t feel as though I can defer to the people working in this domain on the object level when there is so much secrecy around their epistemic process, because I and others cannot evaluate their reasoning.

However, I am excited about this grant, because I have a good amount of trust in David’s judgment. To be more specific, he has a track record of identifying important ideas and institutions and then working on/with them. Some concrete examples include:

• Wrote up a paper on Goodhart’s Law with Scott Garrabrant (after seeing Scott’s very terse post on it)
• Works with the biorisk teams at FHI and OpenPhil
• Completed his PhD in public policy and decision theory at the RAND Corporation, which is an unusually innovative institution (e.g. this study)
• Writes interesting comments and blog posts on the internet (e.g. LessWrong)
• Has offered mentoring in his fields of expertise to other people working or preparing to work on projects in the x-risk space; I’ve heard positive feedback from his mentees

Another major factor for me is the degree to which David shares his thinking openly and transparently on the internet, and participates in public discourse, so that other people interested in these topics can engage with his ideas. (He’s also a superforecaster, which I think is predictive of broadly good judgment.) If David didn’t have this track record of public discourse, I likely wouldn’t be recommending this grant, and if he suddenly stopped participating, I’d be fairly hesitant to recommend such a grant in the future.

As I said, I’m not excited about the specific project he is proposing, but have trust in his sense of which projects might be good to work on, and I have emphasized to him that I think he should feel comfortable working on the projects he thinks are best. I strongly prefer a world where David has the freedom to work on the projects he judges to be most valuable, compared to the world where he has to take unrelated jobs (e.g. teaching at university).

Joar Skalse ($10,000)

Upskilling in ML in order to be able to do productive AI safety research sooner than otherwise.

I am requesting grant money to upskill in machine learning (ML).

Background: I am an undergraduate student in Computer Science and Philosophy at Oxford University, about to start the 4th year of a 4-year degree. I plan to do research in AI safety after I graduate, as I deem this to be the most promising way of having a significant positive impact on the long-term future[...]

What I’d like to do:

I would like to improve my skills in ML by reading literature and research, replicating research papers, building ML-based systems, and so on.

To do this effectively, I need access to the compute that is required to train large models and run lengthy reinforcement learning experiments and similar.

It would also likely be very beneficial if I could live in Oxford during the vacations, as I would then be in an environment in which it is easier to be productive. It would also make it easier for me to speak with the researchers there, and give me access to the facilities of the university (including libraries, etc.).

It would also be useful to be able to attend conferences and similar events.

Joar was one of the co-authors on the Mesa-Optimisers paper, which I found surprisingly useful and clearly written, especially considering that its authors had relatively little background in alignment research or research in general. I think it is probably the second most important piece of writing on AI alignment that came out in the last 12 months, after the Embedded Agency sequence. My current best guess is that this type of conceptual clarification / deconfusion is the most important type of research in AI alignment, and the type of work I’m most interested in funding. While I don’t know exactly how Joar contributed to the paper, my sense is that all the authors put in a significant effort (bar Scott Garrabrant, who played a supervising role).

This grant is for projects during and in between terms at Oxford. I want to support Joar producing more of this kind of research, which I expect this grant to help with. He’s also been writing further thoughts online (example), which I think has many positive effects (personally and as externalities).

My brief thoughts on the paper (nontechnical):

• The paper introduced me to a lot of terminology that I’ve continued to use over the past few months (which is not true for most terminology introduced in this space)
• It helped me deconfuse my thinking on a bunch of concrete problems (in particular on the question of whether things like AlphaGo can be dangerous when “scaled up”)
• I’ve seen multiple other researchers and thinkers I respect refer to it positively
• In addition to being published as a paper, it was written up as a series of blogposts in a way that made it a lot more accessible

More of my thoughts on the paper (technical):

Note: If you haven’t read the paper, or you don’t have other background in the subject, this section will likely be unclear. It’s not essential to the case for the grant, but I wanted to share it in case people with the requisite background are interested in more details about the research.

I was surprised by how helpful the conceptual work in the paper was - helping me think about where the optimization was happening in a system like AlphaGo Zero improved my understanding of that system and how to connect it to other systems that do optimization in the world. The primary formalism in the paper was clarifying rather than obscuring (and the ratio of insight to formalism was very high - see my addendum below for more thoughts on that).

Once the basic concepts were in place, clarifying different basic tools that would encourage optimization to happen in either the base optimizer or the mesa optimizer (e.g. constraining and expanding space/time offered to the base or mesa optimizers has interesting effects), plus clarifying the types of alignment / pseudo-alignment / internalizing of the base objective, all helped me think about this issue very clearly. It largely used basic technical language I already knew, and put it together in ways that would’ve taken me many months to achieve on my own - a very helpful conceptual piece of work.

Further Writeups by Oliver Habryka

The following three grants were more exciting to one or more other fund managers than they were to me. I think that for all three, if it had just been me on the grant committee, we might not actually have made them. However, I had more resources available to invest into these writeups, and as such I ended up summarizing my view on them, instead of someone else on the fund doing so. As such, they are probably less representative of the reasons why we made these grants than the writeups above.

In the course of thinking through these grants, I formed (and wrote out below) more detailed, explicit models of the topics. Although these models were not counterfactual in the Fund’s making the grants, I think they are fairly predictive of my future grant recommendations.

Chris Chambers ($36,635)

Note: Application sent in by Jacob Hilton.

Combat publication bias in science by promoting and supporting the Registered Reports journal format.

I'm suggesting a grant to fund a teaching buyout for Professor Chris Chambers, an academic at the University of Cardiff working to promote and support Registered Reports. This funding opportunity was originally identified and researched by Hauke Hillebrandt, who published a full analysis here. In brief, a Registered Report is a format for journal articles where peer review and acceptance decisions happen before data is collected, so that the results are much less susceptible to publication bias. The grant would free Chris of teaching duties so that he can work full-time on trying to get Registered Reports to become part of mainstream science, which includes outreach to journal editors and supporting them through the process of adopting the format for their journal. More details of Chris's plans can be found here.

I think the main reason for funding this is from a worldview diversification perspective: I would expect it to broadly improve the efficiency of scientific research by improving the communication of negative results, and to enable people to make better-informed use of scientific research by reducing publication bias. I would expect these effects to be primarily within fields where empirical tests tend to be useful but not always definitive, such as clinical trials (one of Chris's focus areas), which would have knock-on effects on health.

From an X-risk perspective, the key question to answer seems to be which technologies differentially benefit from this grant. I do not have a strong opinion on this, but to quote Brian Wang from a Facebook thread: "In terms of [...] bio-risk, my initial thoughts are that reproducibility concerns in biology are strongest when it comes to biomedicine, a field that can be broadly viewed as defense-enabling.
By contrast, I'm not sure that reproducibility concerns hinder the more fundamental, offense-enabling developments in biology all that much (e.g., the falling costs of gene synthesis, the discovery of CRISPR)."

As for why this particular intervention strikes me as a cost-effective way to improve science, it is shovel-ready, it may be the sort of thing that traditional funding sources would miss, it has been carefully vetted by Hauke, and I thought that Chris seemed thoughtful and intelligent from his videoed talk.

The Let’s Fund report linked in the application played a major role in my assessment of the grant, and I probably would not have been comfortable recommending this grant without access to that report.

Thoughts on Registered Reports

The replication crisis in psychology, and the broad spread of “career science,” have made it (to me) quite clear that the methodological foundations of at least psychology itself, but possibly also the broader life-sciences, are creating a very large volume of false and likely unreproducible claims. This is in large part caused by problematic incentives for individual scientists to engage in highly biased reporting and statistically dubious practices.

I think preregistration has the opportunity to fix a small but significant part of this problem, primarily by reducing file-drawer effects. To borrow an explanation from the Let’s Fund report (lightly edited for clarity):

[Pre-registration] was introduced to address two problems: publication bias and analytical flexibility (in particular outcome switching in the case of clinical medicine).

Publication bias, also known as the file drawer problem, refers to the fact that many more studies are conducted than published. Studies that obtain positive and novel results are more likely to be published than studies that obtain negative results or report replications of prior results.
The consequence is that the published literature indicates stronger evidence for findings than exists in reality.

Outcome switching refers to the possibility of changing the outcomes of interest in the study depending on the observed results. A researcher may include ten variables that could be considered outcomes of the research, and, once the results are known, intentionally or unintentionally select the subset of outcomes that show statistically significant results as the outcomes of interest. The consequence is an increase in the likelihood that reported results are spurious by leveraging chance, while negative evidence gets ignored.

This is one of several related research practices that can inflate spurious findings when analysis decisions are made with knowledge of the observed data, such as selection of models, exclusion rules and covariates. Such data-contingent analysis decisions constitute what has become known as P-hacking, and pre-registration can protect against all of these.

[...]

It also effectively blinds the researcher to the outcome because the data are not collected yet and the outcomes are not yet known. This way the researcher’s unconscious biases cannot influence the analysis strategy.

“Registered reports” refers to a specific protocol that journals are encouraged to adopt, which integrates preregistration into the journal acceptance process. The process is illustrated by a picture in the Let’s Fund report.

Of the many ways to implement preregistration practices, I don’t think the one that Chambers proposes seems ideal, and I can see some flaws with it, but I still think that the quality of clinical science (and potentially other fields) will significantly improve if more journals adopt the registered reports protocol. (Please keep this in mind as you read my concerns in the next section.)

The importance of bandwidth constraints for journals

Chambers has the explicit goal of making all clinical trials require the use of registered reports.
That outcome seems potentially quite harmful, and possibly worse than the current state of clinical science. (However, since that current state is very far from “universal registered reports,” I am not very worried about this grant contributing to that scenario.)

The Let’s Fund report covers the benefits of preregistration pretty well, so I won’t go into much detail here. Instead, I will mention some of my specific concerns with the protocol that Chambers is trying to promote.

From the registered reports website:

Manuscripts that pass peer review will be issued an in principle acceptance (IPA), indicating that the article will be published pending successful completion of the study according to the exact methods and analytic procedures outlined, as well as a defensible and evidence-bound interpretation of the results.

This seems unlikely to be the best course of action. I don’t think that the most widely-read journals should only publish replications. The key reason is that many scientific journals are solving a bandwidth constraint - sharing papers that are worth reading, not merely papers that say true things, to help researchers keep up to date with new findings in their field. A math journal could have papers for every true mathematical statement, including trivial ones, but they instead need to focus on true statements that are useful to signal boost to the mathematics community. (Related concepts are the tradeoff between bias and variance in Machine Learning, or accuracy and calibration in forecasting.)

Ultimately, from a value of information perspective, it is totally possible for a study to only be interesting if it finds a positive result, and to be uninteresting when analyzed pre-publication from the perspective of the editor. It seems better to encourage pre-publication, but still take into account the information value of a paper’s experimental results, even if this doesn’t fully prevent publication bias.
To give a concrete (and highly simplified) example, imagine a world where you are trying to find an effective treatment for a disease. You don’t have great theory in this space, so you basically have to test 100 plausible treatments. On their own, none of these have a high likelihood of being effective, but you expect that at least one of them will work reasonably well.

Currently, you would preregister those trials (as is required for clinical trials), and then start performing the studies one by one. Each failure provides relatively little information (since the prior probability was low anyways), so you are unlikely to be able to publish it in a prestigious journal, but you can probably still publish it somewhere. Not many people would hear about it, but it would be findable if someone is looking specifically for evidence about the specific disease you are trying to treat, or the treatment that you tried out. However, finding a successful treatment is highly valuable information which will likely get published in a journal with a lot of readers, causing lots of people to hear about the potential new treatment.

In a world with mandatory registered reports, none of these studies will be published in a high-readership journal, since journals will be forced to make a decision before they know the outcome of a treatment. Because all 100 studies are equally unpromising, none are likely to pass the high bar of such a journal, and they’ll wind up in obscure publications (if they are published at all) [1]. Thus, even if one of them finds a successful result, few people will hear about it. High-readership journals exist in large part to spread news about valuable results in a limited bandwidth environment; this no longer happens in scenarios of this kind.
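The scenario above can be sketched as a toy model. This is only an illustration under made-up assumptions: the 1% prior per treatment, the single successful treatment (index 42), and the editorial bar of 0.5 are hypothetical numbers chosen for the example, and the `Trial` class and both review functions are names invented here rather than part of any registered reports tooling.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    prior_success: float  # editor's belief before any data exist (assumed)
    succeeded: bool       # actual outcome, known only after the study runs


def accepted_after_results(trial: Trial, bar: float) -> bool:
    """Results-based review: the editor sees the outcome before deciding."""
    interest = 1.0 if trial.succeeded else 0.0
    return interest >= bar


def accepted_as_registered_report(trial: Trial, bar: float) -> bool:
    """Registered-report review: the editor must decide from the prior alone."""
    return trial.prior_success >= bar


# Hypothetical setup: 100 equally unpromising treatments, exactly one works.
trials = [
    Trial(name=f"treatment-{i}", prior_success=0.01, succeeded=(i == 42))
    for i in range(100)
]
BAR = 0.5  # arbitrary threshold for a high-readership journal

after_results = [t.name for t in trials if accepted_after_results(t, BAR)]
registered = [t.name for t in trials if accepted_as_registered_report(t, BAR)]

print("High-readership slots, results-based review:", after_results)
print("High-readership slots, registered reports:", registered)
```

With these assumptions, `after_results` contains only the one successful treatment while `registered` is empty: a journal that must commit before the outcome is known cannot single out the one study that turned out to matter. The model deliberately ignores the benefits of preregistration (such as reduced publication bias), so it shows only the bandwidth concern and is not a full comparison of the two systems.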
Because of dynamics like this, I think it is very unlikely that any major journals will ever switch towards only publishing registered report-based studies, even within clinical trials, since no journal would want to pass up on the opportunity to publish a study that has the opportunity to revolutionize the field.

Importance of selecting for clarity

Here is the full set of criteria that papers are being evaluated by for stage 2 of the registered reports process:

1. Whether the data are able to test the authors’ proposed hypotheses by satisfying the approved outcome-neutral conditions (such as quality checks or positive controls)
2. Whether the Introduction, rationale and stated hypotheses are the same as the approved Stage 1 submission (required)
3. Whether the authors adhered precisely to the registered experimental procedures
4. Whether any unregistered post hoc analyses added by the authors are justified, methodologically sound, and informative
5. Whether the authors’ conclusions are justified given the data

The above list is comprehensive, and does not include any mention of the clarity of the authors’ writing, the quality/rigor of the explanation provided by the paper’s methodology, or the implications of the paper’s findings on underlying theory. (All of these are very important to how journals currently evaluate papers.) This means that journals can only filter for those characteristics in the first stage of the registered reports process, when large parts of the paper haven’t yet been written. As a result, large parts of the paper basically have no selection applied to them for conceptual clarity, as well as thoughtful analysis of implications for future theory, likely resulting in those qualities getting worse.
I think the goal of registered reports is to split research in two halves where you publish two separate papers, one that is empirical, and another that is purely theoretical, which takes the results of the first paper as given and explores their consequences. We already see this split a good amount in physics, in which there exists a pretty significant divide between experimental and theoretical physics, the latter of which rarely performs experiments.

I don’t know whether encouraging this split in a given field is a net improvement, since I generally think that a lot of good science comes from combining the gathering of good empirical data with careful analysis and explanations. I am particularly worried that the analysis of the results in papers published via registered reports will be of especially low quality, which encourages the spread of bad explanations and misconceptions that can cause a lot of damage (though some of that is definitely offset by preregistration reducing the degree to which scientists can fit hypotheses post hoc). The costs here seem related to Chris Olah’s article on research debt.

Again, I think both of these problems are unlikely to become serious issues, because at most I can imagine getting to a world where something between 10% and 30% of top journal publications in a given field have gone through registered reports-based preregistration. I would be deeply surprised if there weren’t alternative outlets for papers that do try to combine the gathering of empirical data with high-quality explanations and analysis.

Failures due to bureaucracy

I should also note that, although clinical science is not something I have spent large amounts of time thinking about, I am quite concerned about adding more red tape and logistical hurdles to jump through when registering clinical trials.
I have high uncertainty about the effect of registered reports on the costs of doing small-scale clinical experiments, but it seems more likely than not that they will lengthen the review process, and add additional methodological constraints. (There is also a chance that it will reduce these burdens by giving scientists feedback earlier in the process and letting them be more certain of the value of running a particular study. However, this effect seems slightly weaker to me than the additional costs, though I am very uncertain about this.)

In the current scientific environment, running even a simple clinical study may require millions of dollars of overhead (a related example is detailed in Scott Alexander’s “My IRB nightmare”). I believe this barrier is a substantial drag on progress in medical science. In this context, I think that requiring even more mandatory documentation, and adding even more upfront costs, seems very costly. (Though again, it seems highly unlikely for the registered reports format to ever become mandatory on a large scale, and giving more researchers the option to publish a study via the registered reports protocol, depending on their local tradeoffs, seems likely net-positive.)

To summarize these three points:

• If journals have to commit to publishing studies, it’s not obvious to me that this is good, given that they would have to do so without access to important information (e.g. whether a surprising result was found) and only a limited number of slots for publishing papers.
• It seems quite important for journals to be able to select papers based on the clarity of their explanations, both for ease of communication and for conceptual refinement.
• Excessive red tape in clinical research seems like one of the main problems with medical science today, so adding more is worrying, though the sign of the registered reports protocol's effect on this is a bit ambiguous.

Differential technological progress

Let’s Fund covers differential technological progress concerns in their writeup. Key quote:

One might worry that funding meta-research indiscriminately speeds up all research, including research which carries a lot of risks. However, for the above reasons, we believe that meta-research improves predominantly social science and applied clinical science (“p-value science”) and so has a strong differential technological development element, that hopefully makes society wiser before more risks from technology emerge through innovation. However, there are some reproducibility concerns in harder sciences such as basic biological research and high energy physics that might be sped up by meta-research and thus carry risks from emerging technologies[110].

My sense is that further progress in sociology and psychology seems net positive from a global catastrophic risk reduction perspective. The case for clinical science seems a bit weaker, but still positive.

In general, I am more excited about this grant in worlds in which global catastrophes are less immediate and less likely than my usual models suggest, and I’m thinking of this grant in some sense as a hedging bet, in case we live in one of those worlds.

Overall, a reasonable summary of my position on this grant would be "I think preregistration helps, but is probably not really attacking the core issues in science. I think this grant is good, because I think it actually makes preregistration a possibility in a large number of journals, though I disagree with Chris Chalmers on whether it would be good for all clinical trials to require preregistration, which I think would be quite bad.
On the margin, I support his efforts, but if I ever come to change my mind about this, it’s likely for one or more of the above reasons."

Footnotes

[1]: The journal could also publish a random subset, though at scale that gives rise to the same dynamics, so I’ll ignore that case. It could also batch a large number of the experiments until the expected value of information is above the relevant threshold, though that significantly increases costs.

Jess Whittlestone ($75,080)

Note: Funding from this grant will go to the Leverhulme Centre for the Future of Intelligence, which will fund Jess in turn. The LTF Fund is not replacing funding that CFI would have supplied instead; without this grant, Jess would need to pursue grants from sources outside CFI.

The main work I know of Jess’s is her early involvement in 80,000 Hours. In the first 1-2 years of their existence, she wrote dozens of articles for them, and contributed to their culture and development. Since then I’ve seen her make positive contributions to a number of projects over the years - she has helped in some form with every EA Global conference I’ve organized (two in 2015 and one in 2016), and she’s continued to write publicly in places like the EA Forum, the EA Handbook, and news sites like Quartz and Vox. This background implies that Jess has had a lot of opportunities for members of the fund to judge her output. My sense is that this is the main reason that the other members of the fund were excited about this grant — they generally trust Jess’s judgment and value her experience (while being more hesitant about CFI’s work).

There are three things I looked into for this grant writeup: Jess’s policy research output, Jess’s blog, and the institutional quality of Leverhulme CFI. The section on Leverhulme CFI became longer than the section on Jess and was mostly unrelated to her work, so I’ve taken it out and included it as an addendum.

Impressions of Policy Papers

First is her policy research. The papers I read were from those linked on her blog. They were:

On the first paper, about focusing on tensions: the paper said that many “principles of AI ethics” that people publicly talk about in industry, non-profit, government and academia are substantively meaningless, because they don’t come with the sort of concrete advice that actually tells you how to apply them - and specifically, how to trade them off against each other. The part of the paper I found most interesting was the four paragraphs pointing to specific tensions between principles of AI ethics. They were:

• Using data to improve the quality and efficiency of services vs. respecting the privacy and autonomy of individuals
• Using algorithms to make decisions and predictions more accurate vs. ensuring fair and equal treatment
• Reaping the benefits of increased personalization in the digital sphere vs. enhancing solidarity and citizenship
• Using automation to make people’s lives more convenient and empowered vs. promoting self-actualization and dignity

My sense is that while there is some good public discussion about AI and policy (e.g. OpenAI’s work on release practices seems quite positive to me), much conversation that brands itself as ‘ethics’ is often not motivated by the desire to ensure this novel technology improves society in accordance with our deepest values, but instead by factors like reputation, PR and politics.

There are many notions, like Peter Thiel’s “At its core, artificial intelligence is a military technology” or the common question “Who should control the AI?” which don’t fully account for the details of how machine learning and artificial intelligence systems work, or the ways in which we need to think about them in very different ways from other technologies; in particular, that we will need to build new concepts and abstractions to talk about them. I think this is also true of most conversations around making AI fair, inclusive, democratic, safe, beneficial, respectful of privacy, etc.; they seldom consider how these values can be grounded in modern ML systems or future AGI systems. My sense is that much of the best conversation around AI is about how to correctly conceptualize it. This is something that (I was surprised to find) Henry Kissinger’s article on AI did well; he spends most of the essay trying to figure out which abstractions to use, as opposed to using already existing ones.

The reason I liked that bit of Jess’s paper is that I felt the paper used mainstream language around AI ethics (in a way that could appeal to a broad audience), but then:

• Correctly pointed out that AI is a sufficiently novel technology that we’re going to have to rethink what these values actually mean, because the technology causes a host of fundamentally novel ways for them to come into tension
• Provided concrete examples of these tensions

In the context of a public conversation that I feel is often substantially motivated by politics and PR rather than truth, seeing someone point clearly at important conceptual problems felt like a breath of fresh air.

That said, given all of the political incentives around public discussion of AI and ethics, I don’t know how papers like this can improve the conversation. For example, companies are worried about losing in the court of Twitter’s public opinion, and also are worried about things like governmental regulation, which are strong forces pushing them to primarily take popular but ineffectual steps to be more "ethical". I’m not saying papers like this can’t improve this situation in principle, only that I don’t personally feel like I have much of a clue about how to do it or how to evaluate whether someone else is doing it well, in advance of their having successfully done it.

Personally, I feel much more able to evaluate the conceptual work of figuring out how to think about AI and its strategic implications (two standout examples are this paper by Bostrom and this LessWrong post by Christiano), rather than work on revising popular views about AI. I’d be excited to see Jess continue with the conceptual side of her work, but if she instead primarily aims to influence public conversation (the other goal of that paper), I personally don’t think I’ll be able to evaluate and recommend grants on that basis.

From the second paper I read sections 3 and 4, which list many safety and security practices in the fields of biosafety, computer information security, and institutional review boards (IRBs), then outline variables for analysing release practices in ML. I found it useful, even if it was shallow (i.e. did not go into much depth in the fields it covered). Overall, the paper felt like a fine first step in thinking about this space.

In both papers, I was concerned with the level of inspiration drawn from bioethics, which seems to me to be a terribly broken field (cf. Scott Alexander talking about his IRB nightmare or medicine’s ‘culture of life’). My understanding is that bioethics coordinated a successful power grab (cf. OpenPhil’s writeup) from the field of medicine, creating hundreds of dysfunctional and impractical ethics boards that have formed a highly adversarial relationship with doctors (whose practical involvement with patients often makes them better than ethicists at making tradeoffs between treatment, pain/suffering, and dignity). The formation of an “AI ethics” community that has this sort of adversarial, unhealthy relationship with machine learning researchers would be an incredible catastrophe.

Overall, it seems like Jess is still at the beginning of her research career (she’s only been in this field for ~1.5 years). And while she’s spent a lot of effort on areas that don’t personally excite me, both of her papers include interesting ideas, and I’m curious to see her future work.

Impressions of Other Writing

Jess also writes a blog, and this is one of the main things that makes me excited about this grant. On the topic of AI, she wrote three posts (1, 2, 3), all of which made good points on at least one important issue. I also thought the post on confirmation bias and her PhD was quite thoughtful. It correctly identified a lot of problems with discussions of confirmation bias in psychology, and came to a much more nuanced view of the trade-off between being open-minded versus committing to your plans and beliefs. Overall, the posts show independent thinking written with an intent to actually convey understanding to the reader, and doing a good job of it. They share the vibe I associate with much of Julia Galef’s work - they’re noticing true observations / conceptual clarifications, successfully moving the conversation forward one or two steps, and avoiding political conflict.

I do have some significant concerns with the work above, including the positive portrayal of bioethics and the absence of any criticism toward the AAAI safety conference talks, many of which seem to me to have major flaws.

While I’m not excited about Leverhulme CFI’s work (see the addendum for details), I think it will be good for Jess to have free rein to follow her own research initiatives within CFI. And while she might be able to obtain funding elsewhere, this alternative seems considerably worse, as I expect other funding options would substantially constrain the types of research she’d be able to conduct.

Lynette Bye ($23,000)

Productivity coaching for effective altruists to increase their impact.

I plan to continue coaching high-impact EAs on productivity. I expect to have 600+ sessions with about 100 clients over the next year, focusing on people working in AI safety and EA orgs. I’ve worked with people at FHI, Open Phil, CEA, MIRI, CHAI, DeepMind, the Forethought Foundation, and ACE, and will probably continue to do so. Half of my current clients (and a third of all clients I’ve worked with) are people at these orgs. I aim to increase my clients’ output by improving prioritization and increasing focused work time.

I would use the funding to: offer a subsidized rate to people at EA orgs (e.g. between $10 and $50 instead of $125 per call), offer free coaching for select coachees referred by 80,000 Hours, and hire contractors to help me create materials to scale coaching.

You can view my impact evaluation (linked below) for how I’m measuring my impact so far.

(Lynette’s public self-evaluation is here.)

I generally think it's pretty hard to do "productivity coaching" as your primary activity, especially when you are young, due to a lack of work experience. This means I have a high bar for it being a good idea that someone should go full-time into the "help other people be more productive" business.

My sense is that Lynette meets that bar, but only barely (to be clear, I consider it to be a high bar). The main thing that she seems to be doing well is being very organized about everything that she is doing, in a way that makes me confident that her work has had a real impact — if not, I think she’d have noticed and moved on to something else.

However, as I say in the CFAR writeup, I have a lot of concerns with primarily optimising for legibility, and Lynette’s work shows some signs of this. She has shared around 60 testimonials on her website (linked here). Of these, not one of them mentioned anything negative, which clearly indicates that I can't straightforwardly interpret those testimonials as positive evidence (since any unbiased sampling process would have resulted in at least some negative datapoints). I much prefer what another applicant did here: they asked people to send us information anonymously, which increased the chance of our hearing opinions that weren’t selected to create a positive impression. As is, I think I actually shouldn't update much on the testimonials, in particular given that none of them go into much detail on how Lynette has helped them, and almost all of them share a similar structure.
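The inference about the missing negative testimonials can be made concrete with a small calculation. The following is only a sketch under stated assumptions: it treats each testimonial as an independent draw, and the "true" negative rates it tries are hypothetical values chosen for illustration, not figures from Lynette's evaluation. The function name `prob_no_negatives` is likewise made up for this example.

```python
def prob_no_negatives(n_testimonials: int, p_negative: float) -> float:
    """Probability that n independent, unbiased samples contain zero negatives.

    Assumes each testimonial is negative with the same probability
    p_negative, independently of the others (a simplifying assumption).
    """
    if not 0.0 <= p_negative <= 1.0:
        raise ValueError("p_negative must be between 0 and 1")
    return (1.0 - p_negative) ** n_testimonials


if __name__ == "__main__":
    # Hypothetical negative-feedback rates; none of these come from the source.
    for rate in (0.02, 0.05, 0.10):
        chance = prob_no_negatives(60, rate)
        print(f"negative rate {rate:.0%}: P(0 negatives in 60) = {chance:.4f}")
```

Under these assumptions, even a modest 5% rate of negative experiences would leave less than a 5% chance of seeing 60 testimonials with no negatives at all, which is why a uniformly positive set points to selection rather than to uniformly positive outcomes.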

Reflecting on the broader picture, I think that Lynette’s mindset reflects how I think many of the best operations staff I’ve seen operate: aim to be productive by using simple output metrics, and by doing things in a mindful, structured way (as opposed to, for example, trying to aim for deep transformative insights more traditionally associated with psychotherapy). There is a deep grounded-ness and practical nature to it. I have a lot of respect for that mindset, and I feel as though it's underrepresented in the current EA/rationality landscape. My inside-view models suggest that you can achieve a bunch of good things by helping people become more productive in this way.

I also think that this mindset comes with a type of pragmatism that I am more concerned about, and often gives rise to what I consider unhealthy adversarial dynamics. As I discussed above, it’s difficult to get information from Lynette’s positive testimonials. My sense is that she might have produced them by directly optimising for “getting a grant” and trying to give me lots of positive information, leading to substantial bias in the selection process. The technique of ‘just optimize for the target’ is valuable in lots of domains, but in this case was quite negative.

That said, framing her coaching as achieving a series of similar results generally moves me closer to thinking about this grant as "coaching as a commodity". Importantly, few people reported very large gains in their productivity; the testimonials instead show a solid stream of small improvements. I think that very few people have access to good coaching, and the high variance in coach quality means that experimenting is often quite expensive and time-consuming. Lynette seems to be able to consistently produce positive effects in the people she is working with, making her services a lot more valuable due to greater certainty around the outcome. (However, I also assign significant probability that the way the evaluation questions were asked reduced the rate at which clients reported either negative or highly positive experiences.)

I think that many productivity coaches fail to achieve Lynette’s level of reliability, which is one of the key things that makes me hopeful about her work here. My guess is that the value-add of coaching is often straightforwardly positive unless you impose significant costs on your clients, and Lynette seems quite good at avoiding that by primarily optimizing for professionalism and reliability.

Further Recommendations (not funded by the LTF Fund)

Center for Applied Rationality ($150,000)

This grant was recommended by the Fund, but ultimately was funded by a private donor, who (prior to CEA finalizing its standard due diligence checks) had personally offered to make this donation instead. As such, the grant recommendation was withdrawn.

Oliver Habryka had created a full writeup by that point, so it is included below.

Help promising people to reason more effectively and find high-impact work, such as reducing x-risk.

The Center for Applied Rationality runs workshops that promote particular epistemic norms - broadly, that beliefs should be true, bugs should be solved, and that intuitions/aversions often contain useful data. These workshops are designed to cause potentially impactful people to reason more effectively, and to find people who may be interested in pursuing high-impact careers (especially AI safety).

Many of the people currently working on AI safety have been through a CFAR workshop, such as 27% of the attendees at the 2019 FLI conference on Beneficial AI in Puerto Rico, and for some of those people it appears that CFAR played a causal role in their decision to switch careers.
In the confidential section, we list some graduates from CFAR programs who subsequently decided to work on AI safety, along with our estimates of the counterfactual impact of CFAR on their decision [16 at MIRI, 3 on the OpenAI safety team, 2 at CHAI, and one each at Ought, Open Phil and the DeepMind safety team].

Recruitment is the most legible form of impact CFAR has, and is probably its most important - the top reported bottleneck in the last two years among EA leaders at Leaders Forum, for example, was finding talented employees.

[...]

In 2019, we expect to run or co-run over 100 days of workshops, including our mainline workshop (designed to grow/improve the rationality community), workshops designed specifically to recruit programmers (AIRCS) and mathematicians (MSFP) to AI safety orgs, a 4-weekend instructor training program (to increase our capacity to run workshops), and alumni reunions in both the United States and Europe (to grow the EA/rationality community and cause impactful people to meet/talk with one another). Broadly speaking, we intend to continue doing the sort of work we have been doing so far.

In our last grant round, I took an outside view on CFAR and said that, in terms of output, I felt satisfied with CFAR's achievements in recruitment, training and the establishment of communal epistemic norms. I still feel this way about those areas, and my writeup last round still seems like an accurate summary of my reasons for wanting to grant to CFAR. I also said that most of my uncertainty about CFAR lies in its long-term strategic plans, and I continue to feel relatively confused about my thoughts on that.

I find it difficult to explain my thoughts on CFAR, and I think that a large fraction of this difficulty comes from CFAR being an organization that is intentionally not optimizing towards being easy to understand from the outside, having simple metrics, or more broadly being legible[1].
CFAR is intentionally avoiding being legible to the outside world in many ways. This decision is not obviously wrong, as I think it brings many positives, but I think it is the cause of me feeling particularly confused about how to talk coherently about CFAR.

Considerations around legibility

Summary: CFAR’s work is varied and difficult to evaluate. This has some good features - it can avoid focusing too closely on metrics that don’t measure impact well - but also forces evaluators to rely on factors that aren’t easy to measure, like the quality of its internal culture. On the whole, while I wish CFAR were somewhat more legible, I appreciate the benefits to CFAR’s work of not maximizing “legibility” at the cost of impact or flexibility.

To help me explain my point, let's contrast CFAR with an organization like AMF, which I think of as exceptionally legible. AMF’s work, compared to many other organizations with tens of millions of dollars on hand, is easy to understand: they buy bednets and give them to poor people in developing countries. As long as AMF continues to carry out this plan, and provides basic data showing its success in bednet distribution, I feel like I can easily model what the organization will do. If I found out that AMF was spending 10% of its money funding religious leaders in developing countries to preach good ethical principles for society, or funding the campaigns of government officials favorable to their work, I would be very surprised and feel like some basic agreement or contract had been violated - regardless of whether I thought those decisions, in the abstract, were good or bad for their mission. AMF claims to distribute anti-malaria bednets, and it is on this basis that I would choose whether to support them.

AMF could have been a very different organization, and still could be if it wanted to.
For example, it could conduct research on various ways to effect change, and give its core staff the freedom to do whatever they thought was best. This new AMF (“AMF 2.0”) might not be able to tell you exactly what they’ll do next year, because they haven’t figured it out yet, but they can tell you that they’ll do whatever their staff determine is best. This could be distributing deworming pills, pursuing speculative medical research, engaging in political activism, funding religious organizations, etc.

If GiveWell wanted to evaluate AMF 2.0, they would need to use a radically different style of reasoning. There wouldn’t be a straightforward intervention with RCTs to look into. There wouldn’t be a straightforward track record of impact from which to extrapolate. Judging AMF 2.0 would require GiveWell to form much more nuanced judgments about the quality of thinking and execution of AMF’s staff, to evaluate the quality of its internal culture, and to consider a host of other factors that weren’t previously relevant.

I think that evaluating CFAR requires a lot of that kind of analysis, which seems inherently harder to communicate to other people without summarizing one’s views as: "I trust the people in that organization to make good decisions."

The more general idea here is that organizations are subject to bandwidth constraints - they often want to do lots of different things, but their funders need to be able to understand and predict their behavior with limited resources for evaluation. As I've written about recently, a key variable for any organization is the people and organizations by which they are trying to be understood and held accountable. For charities that receive most of their funding in small donations from a large population of people who don’t know much about them, this is a very strong constraint; they must communicate their work so that people can understand it very quickly with little background information.
If a charity instead receives most of its funding in large donations from a small set of people who follow it closely, it can communicate much more freely, because the funders will be able to spend a lot of their time talking to the org, exchanging models, and generally coming to an understanding of why the org is doing what it’s doing.

This idea partly explains why most organizations tend to focus on legibility, in how they talk about their work and even in the work they choose to pursue. It can be difficult to attract resources and support from external parties if one’s work isn’t legible.

I think that CFAR is still likely optimizing too little towards legibility, compared to what I think would be ideal for it. Being legible allows an organization to be more confident that its work is having real effects, because it acquires evidence that holds up to a variety of different viewpoints. However, I think that far too many organizations (nonprofit and otherwise) are trying too hard to make their work legible, in a way that reduces innovation and also introduces a variety of adversarial dynamics. When you make systems that can be gamed, and which carry rewards for success (e.g. job stability, prestige, etc), people will reliably turn up to game them[2].

(As Jacob Lagerros has written in his post on Unconscious Economics, this doesn’t mean people are consciously gaming your system, but merely that this behavior will eventually transpire. The many causes of this include selection effects, reinforcement learning, and memetic evolution.)

In my view, CFAR, by not trying to optimize for a single, easy-to-explain metric, avoids playing the “game” many nonprofits play of focusing on work that will look obviously good to donors, even if it isn’t what the nonprofit believes would be most impactful.
They also avoid a variety of other games that come from legibility, such as job applicants getting very good at faking the signals that they are a good fit for an organization, making it harder for organizations to find good applicants.

Optimizing for communication with the goal of being given resources introduces adversarial dynamics; someone asking for money may provide limited/biased information that raises the chance they’ll be given a grant but reduces the accuracy of the grantmaker’s understanding. (See my comment in Lynette’s writeup above for an example of how this can arise.) This optimization can also tie down your resources, forcing you to carry out commitments you made for the sake of legibility, rather than doing what you think would be most impactful[3].

So I think that it's important that we don't force all organizations towards maximal legibility. (That said, we should ensure that organizations are encouraged to pursue at least some degree of legibility, since the lack of legibility also gives rise to various problems.)

Do I trust CFAR to make good decisions?

As I mentioned in my initial comments on CFAR, I generally think that the current projects CFAR is working on are quite valuable and worth the resources they are consuming. But I have a lot of trouble modeling CFAR’s long-term planning, and I feel like I have to rely instead on my models of how much I trust CFAR to make good decisions in general, instead of being able to evaluate the merits of their actual plans.

That said, I do generally trust CFAR's decision-making. It’s hard to explain the evidence that causes me to believe this, but I’ll give a brief overview anyway.
(This evidence probably won’t be compelling to others, but I still want to give an accurate summary of where my beliefs come from): • I expect that a large fraction of CFAR's future strategic plans will continue to be made by Anna Salamon, from whom I have learned a lot of valuable long-term thinking skills, and who seems to me to have made good decisions for CFAR in the past. • I think CFAR's culture, while imperfect, is still based on strong foundations of good reasoning with deep roots in the philosophy of science and the writings of Eliezer Yudkowsky (which I think serve as a good basis for learning how to think clearly). • I have made a lot of what I consider my best and most important strategic decisions in the context of, and aided by, events organized by CFAR. This suggests to me that at least some of that generalizes to CFAR's internal ability to think strategically. • I am excited about a number of individuals who intend to complete CFAR's latest round of instructor training, which gives me some optimism about CFAR's future access to good talent and its ability to establish and sustain a good internal culture. Footnotes [1] The focus on ‘legibility’ in this context I take from James C. Scott’s book “Seeing Like a State.” It was introduced to me by Elizabeth Van Nostrand in this blogpost discussing it in the context of GiveWell and good giving; Scott Alexander also discussed it in his review of the book . Here’s an example from Scott regarding centralized planning and governance: the centralized state wanted the world to be “legible”, ie arranged in a way that made it easy to monitor and control. An intact forest might be more productive than an evenly-spaced rectangular grid of Norway spruce, but it was harder to legislate rules for, or assess taxes on. 
[2] The errors that follow are all forms of Goodhart’s Law, which states that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” [3] The benefits of (and forces that encourage) stability and reliability can maybe be most transparently understood in the context of menu costs and the prevalence of highly sticky wages. AddendaAddendum: Thoughts on a Strategy Article by the Leadership of Leverhulme CFI and CSER I wrote the following in the course of thinking about the grant to Jess Whittlestone. While the grant is to support Jess’s work, the grant money will go to Leverhulme CFI, which will maintain discretion about whether to continue employing her, and will likely influence what type of work she will pursue. As such, it seems important to not only look into Jess’s work, but also look into Leverhulme CFI and its sister organization, the Centre for the Study of Existential Risk (CSER). While my evaluation of the organization that will support Jess during her postdoc is relevant to my evaluation of the grant, it is quite long and does not directly discuss Jess or her work, so I’ve moved it into a separate section. I’ve read a few papers from CFI and CSER over the years, and heard many impressions of their work from other people. For this writeup, I wanted to engage more concretely with their output. I reread and reviewed an article published in Nature earlier this year called Bridging near- and long-term concerns about AI, written by the Executive Directors at Leverhulme CFI and CSER respectively, Stephen Cave and Seán ÓhÉigeartaigh. Summary and aims of the article The article’s summary: Debate about the impacts of AI is often split into two camps, one associated with the near term and the other with the long term. This divide is a mistake — the connections between the two perspectives deserve more attention, say Stephen Cave and Seán S. ÓhÉigeartaigh. 
This is not a position I hold, and I’m going to engage with the content below in more detail. Overall, I found the claims of the essay hard to parse and often ambiguous, but I’ve attempted to summarize what I view as its three main points: 1. If ML is a primary technology used in AGI, then there are likely some design decisions today that will create lock-in in the long-term and have increasingly important implications for AGI safety. 2. If we can predict changes in society from ML that matter in the long-term (such as automation of jobs), then we can prepare policy for them in the short term (like preparing educational retraining for lorry drivers who will be automated). 3. Norms and institutions built today will have long-term effects, and so people who care about the long term should especially care about near-term norms and institutions. They say “These three points relate to ways in which addressing near-term issues could contribute to solving potential long-term problems. If I ask myself what Leverhulme/CSER’s goals are for this document, it feels to me like it is intended as a statement of diplomacy. It’s saying that near-term and long-term AI risk work are split into two camps, but that we should be looking for common ground (“the connections between the two perspectives deserve more attention”, “Learning from the long term”). It tries to emphasize shared values (“Connected research priorities”) and the importance of cooperation amongst many entities (“The challenges we will face are likely to require deep interdisciplinary and intersectoral collaboration between industries, academia and policymakers, alongside new international agreements”). The goal that I think it is trying to achieve is to negotiate trade and peace between the near-term and long-term camps by arguing that “This divide is a mistake”. 
#### Drawing the definitions does a lot of work

The authors define “long-term concerns” with the following three examples:

> wide-scale loss of jobs, risks of AI developing broad superhuman capabilities that could put it beyond our control, and fundamental questions about humanity’s place in a world with intelligent machines

Despite this broad definition, they only use concrete examples from the first category, which I would classify as something like “mid-term issues.” I think the possibility of even wide-scale loss of jobs, unless interpreted extremely broadly, is something that does not make sense to put into the same category as the other two, which are primarily concerned with stakes that are orders of magnitude higher (such as the future of the human species). I think this conflation of very different concerns causes the rest of the article to make an argument that is more likely to mislead than to inform.

After this definition, the article failed to mention any issue that I would classify as representative of the long-term concerns of Nick Bostrom or Max Tegmark, both of whom are cited by the article to define “long-term issues.” (In Tegmark’s book Life 3.0, he explicitly categorizes unemployment as a short-term concern, to be distinguished from long-term concerns.)

#### Conceptual confusions in short- and mid-term policy suggestions

The article has the following policy idea:

> Take explainability (the extent to which the decisions of autonomous systems can be understood by relevant humans): if regulatory measures make this a requirement, more funding will go to developing transparent systems, while techniques that are powerful but opaque may be deprioritized.

(Let me be clear that this is not explicitly listed as a policy recommendation.)

My naive prior is that there is no good AI regulation a government could establish today. I continue to feel this way after looking into this case (and the next example below).

Let me explain why, in this case, the idea that regulation requiring explainability would encourage transparent and explainable systems is false.

Modern ML systems are not doing a type of reasoning that is amenable to explanation in the way human decisions often are. There is not a principled explanation of their reasoning when deciding whether to offer you a bank loan; there is merely a mass of correlations between spending history and later reliability, which may or may not factorise into a small number of well-defined chunks like “how regularly someone pays their rent”.

The main problem with the quoted paragraph is that it does not at all attempt to specify how to define explainability in an ML system to the point where it can be regulated, meaning that any regulation would either be meaningless and ignored, or, worse, highly damaging. Policies formed in this manner will either be of no consequence, or deeply antagonise the ML community. We currently don’t know how to think about explainability of ML systems, and ignoring that problem and regulating that they should be ‘explainable’ will not work.

The article also contains the following policy idea about autonomous weapons:

> The decisions we make now, for example, on international regulation of autonomous weapons, could have an outsized impact on how this field develops. A firm precedent that only a human can make a ‘kill’ decision could significantly shape how AI is used — for example, putting the focus on enhancing instead of replacing human capacities.

Here and throughout the article, repeated uses of the conditional ‘could’ make it unclear to me whether this is being endorsed or merely suggested. I can’t quite tell if they think that drone swarms are a long-term issue - they contrast it with a short-term issue but don’t explicitly say that it is long-term. Nonetheless, I think their suggesting it here is also a bit misguided.

Let me contrast this with Nick Bostrom on a recent episode of the Joe Rogan Experience explaining that he thinks that the specific rule has ambiguous value. Here’s a quote from a discussion of the campaign to ban lethal autonomous weapons:

> Nick Bostrom: I’ve kind of stood a little bit on the sidelines on that particular campaign, being a little unsure exactly what it is that… certainly I think it’d be better if we refrained from having some arms race to develop these than not. But if you start to look in more detail: What precisely is the thing that you’re hoping to ban? So if the idea is the autonomous bit, that the robot should not be able to make its own firing decision, well, if the alternative to that is there is some 19-year old guy sitting in some office building and his job is whenever the screen flashes ‘fire now’ he has to press a red button. And exactly the same thing happens. I’m not sure how much is gained by having that extra step.
>
> Interviewer: But it feels better for us for some reason. If someone is pushing the button.
>
> Nick Bostrom: But what exactly does that mean. In every particular firing decision? Well, you gotta attack this group of surface ships here, and here are the general parameters, and you’re not allowed to fire outside these coordinates? I don’t know. Another is the question of: it would be better if we had no wars, but if there is gonna be a war, maybe it is better if it’s robots v robots. Or if there’s gonna be bombing, maybe you want the bombs to have high precision rather than low precision - get fewer civilian casualties.
>
> [...]
>
> On the other hand you could imagine it reduces the threshold for going to war, if you think that you wouldn’t fear any casualties you would be more eager to do it. Or if it proliferates and you have these mosquito-sized killer-bots that terrorists have. It doesn’t seem like a good thing to have a society where you have a facial-recognition thing, and then the bot flies out and you just have a kind of dystopia.

Overall, it seems that in both situations, the key open questions are in understanding the systems and how they’ll interface with areas of industry, government and personal life, and that regulation based on inaccurate conceptualizations of the technology would either be meaningless or harmful.

#### Polarizing approach to policy coordination

I have two main concerns with what I see as the intent of the paper. The first one can be summarized by Robin Hanson’s article To Oppose Polarization, Tug Sideways:

> The policy world can [be] thought of as consisting of a few Tug-O-War "ropes" set up in this high dimensional policy space. If you want to find a comfortable place in this world, where the people around you are reassured that you are "one of them," you need to continually and clearly telegraph your loyalty by treating each policy issue as another opportunity to find more supporting arguments for your side of the key dimensions. That is, pick a rope and pull on it.
>
> If, however, you actually want to improve policy, if you have a secure enough position to say what you like, and if you can find a relevant audience, then [you should] prefer to pull policy ropes sideways. Few will bother to resist such pulls, and since few will have considered such moves, you have a much better chance of identifying a move that improves policy. On the few main dimensions, not only will you find it very hard to move the rope much, but you should have little confidence that you actually have superior information about which way the rope should be pulled.

I feel like the article above is not pulling policy ropes sideways, but is instead connecting long-term issues to specific sides of existing policy debates, around which there is already a lot of tension. The issue of technological unemployment seems to me to be a highly polarizing topic, where taking a position seems ill-advised, and I have very low confidence about the correct direction in which to pull policy.
Entangling long-term issues with these highly tense short-term issues seems like it will likely reduce our future ability to broadly coordinate on these issues (by having them associated with highly polarized existing debates).

#### Distinction between long- and short-term thinking

My second concern is that on a deeper level, I think that the type of thinking that generates a lot of the arguments around concerns for long-term technological risks is very different from that which suggests policies around technological unemployment and racial bias. I think there is some value in having these separate ways of thinking engage in “conversation,” but I think the linked paper is confusing in that it seems to try to down-play the differences between them. An analogy might be the differences between physics and architecture; both fields nominally work with many similar objects, but the distinction between the two is very important, and the fields clearly require different types of thinking and problem-solving.

Some of my concerns are summarized by Eliezer in his writing on Pivotal Acts:

> ...compared to the much more difficult problems involved with making something actually smarter than you be safe, it may be tempting to try to write papers that you know you can finish, like a paper on robotic cars causing unemployment in the trucking industry, or a paper on who holds legal liability when a factory machine crushes a worker. But while it's true that crushed factory workers and unemployed truckers are both, ceteris paribus, bad, they are not astronomical catastrophes that transform all galaxies inside our future light cone into paperclips, and the latter category seems worth distinguishing...
>
> ...there will [...] be a temptation for the grantseeker to argue, "Well, if AI causes unemployment, that could slow world economic growth, which will make countries more hostile to each other, which would make it harder to prevent an AI arms race." But the possibility of something ending up having a non-zero impact on astronomical stakes is not the same concept as events that have a game-changing impact on astronomical stakes. The question is what are the largest lowest-hanging fruit in astronomical stakes, not whether something can be argued as defensible by pointing to a non-zero astronomical impact.

I currently don’t think that someone who is trying to understand how to deal with technological long-term risk should spend much time thinking about technological unemployment or related issues, but it feels like the paper is trying to advocate for the opposite position.

#### Concluding thoughts on the article

Many people in the AI policy space have to spend a lot of effort to gain respect and influence, and it’s genuinely hard to figure out a way to do this while acting with integrity. One common difficulty in this area is navigating the incentives to connect one’s arguments to issues that already get a lot of attention (e.g. ongoing political debates). My read is that this essay makes these connections even when they aren’t justified; it implies that many short- and medium-term concerns are a natural extension of current long-term thought, while failing to accurately portray what I consider to be the core arguments around long-term risks and benefits from AI. It seems like the effect of this essay will be to reduce perceived differences between long-term, mid-term and short-term work on risks from AI, to cause confusion about the actual concerns of Bostrom et al., and to make future communications work in this space harder and more polarized.

#### Broader thoughts on CSER and CFI

I only had the time and space to critique one specific article from CFI and CSER.
However, from talking to others working in the global catastrophic risk space, and from engagement with significant fractions of the rest of CSER and CFI’s work, I've come to think that the problems I see in this article are mostly representative of the problems I see in CSER’s and CFI’s broader strategy and work. I don’t think what I’ve written sufficiently justifies that claim; however, it seems useful to share this broader assessment to allow others to make better predictions about my future grant recommendations, and maybe also to open a dialogue that might cause me to change my mind.

Overall, based on the concerns I’ve expressed in this essay, and that I’ve had with other parts of CFI and CSER’s work, I worry that their efforts to shape the conversation around AI policy, and to mend disputes between those focused on long-term and short-term problems, do not address important underlying issues and may have net-negative consequences.

That said, it’s good that these organizations give some researchers a way to get PhDs/postdocs at Cambridge with relatively little institutional oversight and an opportunity to explore a large variety of different topics (e.g. Jess, and Shahar Avin, a previous grantee whose work I’m excited about).

#### Addendum: Thoughts on incentives in technical fields in academia

I wrote the following in the course of writing about the AI Safety Camp. This is a model I use commonly when thinking about funding for AI alignment work, but it ended up not being very relevant to that writeup, so I’m leaving it here as a note of interest.

My understanding of many parts of technical academia is that there is a strong incentive to make your writing hard to understand while appearing more impressive by using a lot of math. Eliezer Yudkowsky describes his understanding of it as such (and expands on this further in the rocket alignment problem):

> The point of current AI safety work is to cross, e.g., the gap between [...] saying “Ha ha, I want AIs to have an off switch, but it might be dangerous to be the one holding the off switch!” to, e.g., realizing that utility indifference is an open problem. After this, we cross the gap to solving utility indifference in unbounded form. Much later, we cross the gap to a form of utility indifference that actually works in practice with whatever machine learning techniques are used, come the day.
>
> Progress in modern AI safety mainly looks like progress in conceptual clarity — getting past the stage of “Ha ha it might be dangerous to be holding the off switch.” Even though Stuart Armstrong’s original proposal for utility indifference completely failed to work (as observed at MIRI by myself and Benya), it was still a lot of conceptual progress compared to the “Ha ha that might be dangerous” stage of thinking.
>
> Simple ideas like these would be where I expect the battle for the hearts of future grad students to take place; somebody with exposure to Armstrong’s first simple idea knows better than to walk directly into the whirling razor blades without having solved the corresponding problem of fixing Armstrong’s solution. A lot of the actual increment of benefit to the world comes from getting more minds past the “walk directly into the whirling razor blades” stage of thinking, which is not complex-math-dependent.
>
> Later, there’s a need to have real deployable solutions, which may or may not look like impressive math per se. But actual increments of safety there may be a long time coming. [...]
>
> Any problem whose current MIRI-solution looks hard (the kind of proof produced by people competing in an inexploitable market to look impressive, who gravitate to problems where they can produce proofs that look like costly signals of intelligence) is a place where we’re flailing around and grasping at complicated results in order to marginally improve our understanding of a confusing subject matter. Techniques you can actually adapt in a safe AI, come the day, will probably have very simple cores — the sort of core concept that takes up three paragraphs, where any reviewer who didn’t spend five years struggling on the problem themselves will think, “Oh I could have thought of that.” Someday there may be a book full of clever and difficult things to say about the simple core — contrast the simplicity of the core concept of causal models, versus the complexity of proving all the clever things Judea Pearl had to say about causal models. But the planetary benefit is mainly from posing understandable problems crisply enough so that people can see they are open, and then from the simpler abstract properties of a found solution — complicated aspects will not carry over to real AIs later.

And gives a concrete example here:

> The journal paper that Stuart Armstrong coauthored on "interruptibility" is a far step down from Armstrong's other work on corrigibility. It had to be dumbed way down (I'm counting obscuration with fancy equations and math results as "dumbing down") to be published in a mainstream journal. It had to be stripped of all the caveats and any mention of explicit incompleteness, which is necessary meta-information for any ongoing incremental progress, not to mention important from a safety standpoint. The root cause can be debated but the observable seems plain. If you want to get real work done, the obvious strategy would be to not subject yourself to any academic incentives or bureaucratic processes. Particularly including peer review by non-"hobbyists" (peer commentary by fellow "hobbyists" still being potentially very valuable), or review by grant committees staffed by the sort of people who are still impressed by academic sage-costuming and will want you to compete against pointlessly obscured but terribly serious-looking equations.

(Here is a public example of Stuart’s work on utility indifference, though I had difficulty finding the most relevant examples of his work on this subject.)

Some examples that seem to me to use an appropriate level of formalism include: the Embedded Agency sequence, the Mesa-Optimisation paper, some posts by DeepMind researchers (thoughts on human models, classifying specification problems as variants of Goodhart’s law), and many other blog posts by these authors and others on the AI Alignment Forum.

There’s a sense in which it’s fine to play around with the few formalisms you have a grasp of when you’re getting to grips with ideas in this field. For example, MIRI recently held a retreat for new researchers, which led to a number of blog posts that followed this pattern (1, 2, 3, 4). But aiming for lots of technical formalism is not helpful - any conception of useful work that focuses primarily on molding the idea to the format rather than molding the format to the idea, especially for (nominally) impressive technical formats, is likely optimizing for the wrong metric and falling prey to Goodhart’s law.

Discuss

### New Petrov Game Brainstorm

October 3, 2019 - 22:48
Published on October 3, 2019 7:48 PM UTC

Big thanks to the LW team for putting together the Petrov Day experience! (Setup. Follow up.)

I looked over the comments and it seems like there were a number of suggestions for how to do this better. Instead of waiting for the next year, let's do it right now.

My proposed setup:

1. Take the original 125 LW users. Take a prize pool of $1,250 (or more if people are willing to donate). The prize pool is split evenly among the players, but you have to survive the game to get paid. Everyone is anonymized in the game.

2. The game will last a minimum of 4 days (to give everyone enough time to act, strategize, and think). After 4 days, there will be an increasing probability that the game will end at any minute. (This is to prevent anyone trying to attack right when the game ends to avoid retaliation. In expectation, the game should last about a week.)

3. Each player will have the number of missiles equal to the number of players. They can launch any number of them.

4. When a missile is launched: a) the attacked player is notified that they are being attacked by a specific player (and therefore has an option to retaliate), b) 48 hours after the launch, the attacked player is declared dead: they can no longer perform any actions and will not receive a payout, c) 48 hours after the launch, the attacking player gets the target player's entire prize pool.

5. During the game there will be at least 125 fake alerts. They will be generated randomly (so some players might receive zero or more than one fake alerts). It will look the same as if some specific player has launched a missile against you. 48 hours after the notification, you'll find out whether it was real or not by whether or not you're still alive.
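
As a rough check on how uneven the fake alerts would be, here is a small worked calculation. It assumes exactly 125 fake alerts, each assigned independently and uniformly at random to one of the 125 players; the rules only say "at least 125" and "randomly", so both are assumptions, and the symbol $X$ (the number of fake alerts one given player receives) is introduced here for illustration.

$$
X \sim \mathrm{Binomial}\!\left(125, \tfrac{1}{125}\right), \qquad \mathbb{E}[X] = 1
$$

$$
P(X = 0) = \left(\tfrac{124}{125}\right)^{125} \approx 0.366, \qquad
P(X = 1) = \left(\tfrac{124}{125}\right)^{124} \approx 0.369, \qquad
P(X \ge 2) \approx 0.264
$$

Under these assumptions, roughly a third of players would see no fake alert at all and about a quarter would see two or more, so a single alert carries limited information about whether an attack is real.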

• You can see who has been killed.
• You can only know about missile launches that you have made or that target you.
• If you take money from a player who already took money from someone else, you get that too. So in theory, we could end up with one winner with the entire original prize pool.
• When people create their accounts, they'll have an option to either receive their winnings directly (let's say via PayPal) or to donate to LW. (Personally, I hope most people will choose the second option, which will make the payout much less of a hassle.)
• I'm not sure how to easily structure this so that the players are completely anonymous. (For example, if I'm sending the payouts, I'll know.) If this seems like an important feature, I'm willing to work through this. (E.g. each player gets a random invite code and creates a new account. There's no record of who received what invite code. Payouts are done to anonymous BTC addresses.)
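
To make the payout rules above concrete, here is a minimal Python sketch of how launches could be resolved. It is only an illustration: the names `Player`, `resolve`, and `FLIGHT_HOURS` are hypothetical, and several behaviours the proposal leaves open are filled in as labeled assumptions (missiles already in flight still land after their launcher dies, loot owed to a dead attacker is forfeited, and hitting an already-dead player does nothing).

```python
from dataclasses import dataclass
from typing import Optional

FLIGHT_HOURS = 48  # launch-to-impact delay stated in rule 4


@dataclass
class Player:
    balance: float
    died_at: Optional[float] = None  # hour of death, None while alive


def resolve(num_players, prize_pool, launches):
    """Resolve (launch_hour, attacker, target) tuples; return {survivor: payout}."""
    players = [Player(prize_pool / num_players) for _ in range(num_players)]
    # Equal flight times mean impact order equals launch order.
    # sorted() is stable, so simultaneous launches resolve in input order.
    for launch_hour, attacker, target in sorted(launches, key=lambda l: l[0]):
        shooter, victim = players[attacker], players[target]
        if shooter.died_at is not None and shooter.died_at <= launch_hour:
            continue  # dead players can no longer perform any actions
        if victim.died_at is not None:
            continue  # assumption: hitting an already-dead player changes nothing
        victim.died_at = launch_hour + FLIGHT_HOURS
        loot, victim.balance = victim.balance, 0.0
        if shooter.died_at is None:
            shooter.balance += loot  # includes anything the victim had captured
        # assumption: if the shooter died while the missile was in flight,
        # the loot is forfeited rather than passed on
    return {i: p.balance for i, p in enumerate(players) if p.died_at is None}


# Player 0 attacks 1, player 1 retaliates before dying, player 2 attacks 3.
print(resolve(4, 40.0, [(0, 0, 1), (10, 1, 0), (20, 2, 3)]))
```

In this example the retaliation kills player 0 but earns player 1 nothing, and only player 2 survives to be paid. Whether forfeited money should instead return to the pool is exactly the kind of detail the rules would need to pin down.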

What do you think?

• Does it look like the rules are facilitating the kind of experiment we want to run?
• If you were part of the originally selected group, are you willing to participate?
• Can anyone add to the prize pool?
• Any clever way to structure this game on top of some existing platform to avoid writing too much code?

Discuss

### [Link] (EA Podcast) Global Optimum: How to Learn Better

October 3, 2019 - 18:51
Published on October 3, 2019 12:29 AM UTC

I host the podcast Global Optimum. The goal of the podcast is to make altruists more effective. The most recent episode is about how to learn more effectively. I review the psychological literature on learning techniques and discuss the interplay between scholarship and rationality.

This episode features:

• What are the best and worst studying techniques?
• Do “learning styles” exist?
• How to squeeze more learning into your day
• How to start learning a new field
• How to cultivate viewpoint diversity
• How to avoid getting parasitized by bad ideas
• Should you study in the morning or at night?
• Can napping enhance learning?

Full transcript: http://danielgambacorta.com/podcast/how-to-learn-better/

The podcast is available on all podcast apps.

Discuss

### Can we make peace with moral indeterminacy?

October 3, 2019 - 15:56
Published on October 3, 2019 12:56 PM UTC

The problem:

Put humans in the ancestral environment, and they'll behave as if they like nutrition and reproducing. Put them in the modern environment, and they'll behave as if they like tasty food and good feelings. Pump heroin into their brains, and they'll behave as if they want high dopamine levels.

None of these are the One True Values of the humans, they're just what humans seem to value in context, at different levels of abstraction. And this is all there is - there is no One True Context in which we find One True Values, there are just regular contexts. Thus we're in a bit of a pickle when it comes to teaching an AI how we want the world to be rearranged, because there's no One True Best State Of The World.

This underdetermination gets even worse when we consider that there's no One True Generalization Procedure, either. At least for everyday sorts of questions (do I want nutrition, or do I want tasty food?), we're doing interpolation, not extrapolation. But when we ask about contexts or options totally outside the training set (how should we arrange the atoms of the Milky Way?), we're back to the problem illustrated with train tracks in The Tails Coming Apart As Metaphor For Life.

Sometimes it feels like for every value alignment proposal, the arbitrariness of certain decisions sticks out like a missing finger on a hand. And we just have to hope that it all works out fine and that this arbitrary decision turns out to be a good one, because there's no way to make a non-arbitrary decision for some choices.

Is it possible for us to make peace with this upsetting fact of moral indeterminacy? If two slightly different methods of value learning give two very different plans for the galaxy, should we regard both plans as equally good, and be fine with either? I don't think this acceptance of arbitrariness is crazy, and some amount is absolutely necessary. But this pill might be less bitter to swallow if we clarify our picture of what "value learning" is supposed to be doing in the first place.

AIs aren't driving towards their One Best State anyhow:

For example, what kind of "human values" object do we want a value learning scheme to learn? Because it ain't a utility function over microphysical states of the world.

After all, we don't want a FAI to be in the business of finding the best position for all the atoms, and then moving the atoms there and freezing them. We want the "best state" to contain people growing, exploring, changing the environment, and so on. This is only a "state" at all when viewed at some very high level of abstraction that incorporates history and time evolution.

So when two Friendly AIs generalize differently, this might look less like totally different end-states for the galaxy, and more like subtly different opinions on which dynamics make for a satisfying galactic society... which eventually lead to totally different end-states for the galaxy. Look, I never said this would make the problem go away - we're still talking about generalizing from our training set to the entire universe, here. If I'm making any comforting point here, it's that the arbitrariness doesn't have to be tense or alien or too big to comprehend, it can be between reasonable things that all sound like good ideas.

Meta-ethics:

And jumping Jehoshaphat, we haven't even talked about meta-ethics yet. AI that takes meta-ethics into account wouldn't only learn what we appear to value according to whatever definition it started with, it would try to take into account what we think it means to value things, what it means to make good decisions, what we think we value, and what we want to value.

This can get a lot trickier than just inferring a utility function from a human's actions, and we don't have a very good understanding of it right now. But our concern about the arbitrariness of values is precisely a meta-ethical concern, so you can see why it might be a big deal to build an AI that cares about meta-ethics. I'd want a superhuman meta-ethical reasoner to learn that there was something weird and scary about this problem of formalizing and generalizing values, and take superhumanly reasonable steps to address this. The only problem is I have no idea how to build such a thing.

But in lieu of superintelligent solutions, we can still try to research appealing metaethical schemes for controlling generalization.

One such scheme is incrementalism. Rather than immediately striking out for the optimal utopia your model predicts, maybe it's safer to follow something like an iterative process - humans learning, thinking, growing, changing the world, and eventually ending up at a utopia that might not be what you had in mind at the start. (More technically, we might simulate this process as flow between environments, where we start with our current environment and values, and flow to nearby environments based on our rating of them, at each step updating our values not according to what they would actually be in that environment, but based on an idealized meta-ethical update rule set by our current selves.)
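
The parenthetical above can be sketched in a few lines of Python. This is a toy under strong assumptions, not a proposal: environments are integers, "values" are just the integer currently considered ideal, and the functions `neighbors`, `rate`, and `update` are hypothetical stand-ins for nearby environments, our rating of them, and the idealized meta-ethical update rule fixed by our current selves.

```python
def flow(env, values, neighbors, rate, update, steps):
    """Follow local moves for `steps` rounds; return the (env, values) history."""
    history = [(env, values)]
    for _ in range(steps):
        # Move to whichever nearby environment our *current* values rate highest.
        env = max(neighbors(env), key=lambda e: rate(values, e))
        # Update values with the rule fixed in advance, not with whatever
        # values the new environment would actually instill.
        values = update(values, env)
        history.append((env, values))
    return history


# Toy instantiation (all hypothetical).
def neighbors(env):
    return (env - 1, env, env + 1)


def rate(values, env):
    return -abs(env - values)


def update(values, env):
    # Idealized rule: once we reach what we aimed for, aim one step further, up to 5.
    return values + 1 if env == values and values < 5 else values


history = flow(env=0, values=2, neighbors=neighbors, rate=rate, update=update, steps=6)
print(history[-1])
```

Starting from environment 0 with an ideal of 2, the process settles at environment 5 with values 5, which is an endpoint the starting values did not name. That is the sense in which the eventual utopia "might not be what you had in mind at the start".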

This was inspired by Scott Garrabrant's question about gradient descent vs. Goodhart's law. If we think of utopias as optimized points in a landscape of possibilities, we might want to find ones that lie near to home - via hill-climbing or other local dynamics - rather than trusting our model to safely teleport us to some far-off point in configuration space.
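
The contrast between local dynamics and "teleporting" can be illustrated with a toy landscape. Everything here is an assumption made for illustration: `true_value` stands in for how good a state really is, and `model_value` is a hypothetical learned model that agrees with it near home (state 0) but contains one spurious, far-away peak of the kind Goodhart's law warns about.

```python
def true_value(x):
    # Hypothetical "real" goodness of state x, best at x = 3.
    return -(x - 3) ** 2


def model_value(x):
    # Hypothetical learned model: correct everywhere except one
    # spurious high-scoring state far from home.
    return 100 if x == 40 else true_value(x)


def teleport(states):
    """Jump straight to the state the model rates highest."""
    return max(states, key=model_value)


def hill_climb(start, max_steps=100):
    """Repeatedly step to the best-rated adjacent state until none is better."""
    x = start
    for _ in range(max_steps):
        best = max((x - 1, x, x + 1), key=model_value)
        if best == x:
            break
        x = best
    return x


far = teleport(range(-50, 51))
near = hill_climb(0)
print(far, true_value(far))    # the model's favourite state and its real value
print(near, true_value(near))  # the locally reached state and its real value
```

Global optimization lands on the spurious peak at 40, whose true value is far worse than home, while hill-climbing from 0 stops at 3. The sketch only shows that local search never visits the region where this particular model is wrong; it says nothing about cases where the model is wrong close to home.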

It also bears resemblance to Eliezer_2004's meta-ethical wish list: "if we knew more, [...] were the people we wished we were, had grown up farther together..." There just seems to be something meta-ethically trustworthy about "growing up more."

This also illustrates how the project of incorporating meta-ethics into value learning really has its work cut out for it. Of course there are arbitrary choices in meta-ethics too, but somehow they seem more palatable than arbitrary choices at the lower meta-level. Whether we do it with artificial help or not, I think it's possible to gradually tease out what sort of things we want from value learning, which might not reduce the number of arbitrary choices, but hopefully can reduce their danger and mystery.

Discuss

October 3, 2019 - 13:50
Published on October 3, 2019 10:50 AM UTC

By default it's first-name only when copying from social media (Facebook, Google Plus) and full name when copying from forums (LessWrong, EA Forum), though I have it always use my full name for clarity. While everything I copy over is already world-readable without an account, some people don't want their comments copied (let me know if that includes you!), in which case I show something like:

It's a little fragile, though, and about a year after I last fixed Facebook comment inclusion it broke again. And Google Plus was turned off, so I couldn't pull comments from there either. And the EA Forum and LessWrong migrated to new software that didn't support the old (and not very good) RSS-based system. And my adapters for Hacker News and Reddit broke too.

I've now gotten the main four working again, though it's not quite the same as before:

• Google Plus, being shut down, just serves a frozen archive as of my last backup.

• The EA Forum and LessWrong use the GraphQL API and pull comments as needed, so crossposting is very fast.

• I have new code for Facebook that runs Selenium while logged out to build an archive, and I'll figure out some system for updating it at some point. Crossposting is very slow, on the order of weeks.

At some point I may get Reddit and HN fixed, but I don't crosspost to them very often so it's not much of a priority.

Discuss

### [Link] What do conservatives know that liberals don't (and vice versa)?

October 2, 2019 - 19:16
Published on October 2, 2019 4:14 PM UTC

I am a PhD student currently conducting research on political polarization and persuasion. I am running an experiment that requires a database of trivia questions which conservatives are likely to get correct, and liberals are likely to get wrong (and vice versa). Our pilot testing has shown, for example, that Democrats (but not Republicans) tend to overestimate the percentage of gun deaths that involve assault-style rifles, while Republicans (but not Democrats) tend to overestimate the proportion of illegal immigrants who commit violent crimes. Similarly, Democrats (but not Republicans) tend to overestimate the risks associated with nuclear power, while Republicans (but not Democrats) underestimate the impact of race-based discrimination on hiring outcomes.

Actually designing these questions is challenging, however, because it’s difficult to know which of one’s political beliefs are most likely to be ill-informed. As such, I am running a crowdsourcing contest in which we will pay $100 for any high-quality trivia question submitted (see contest details here: https://redbrainbluebrain.org/). The only requirements are that participants submit a question text, four multiple choice answers, and a credible source. The deadline for submissions is October 15th, 2019 at 11:59 p.m.

My intuition is that the LessWrong community will be particularly good at generating these kinds of questions given their commitment to belief updating and rationality. If you don't have the time to participate in the contest, I welcome any ideas about potential topics that might be a fruitful source of these kinds of questions.

Discuss

### What are we assuming about utility functions?

October 2, 2019 - 18:11
Published on October 2, 2019 3:11 PM UTC

I often notice that in many (not all) discussions about utility functions, one side is "for" their relevance, while others tend to be "against" their usefulness, without explicitly saying what they mean. I don't think this is causing any deep confusions among researchers here, but I'd still like to take a stab at disambiguating some of this, if nothing else for my own sake. Here are some distinct (albeit related) ways that utility functions can come up in AI safety, in terms of what assumptions/hypotheses they give rise to:

AGI utility hypothesis: The first AGI will behave as if it is maximizing some utility function

ASI utility hypothesis: As AI capabilities improve well beyond human-level, it will behave more and more as if it is maximizing some utility function (or will have already reached that ideal earlier and stayed there)

Human utility hypothesis: Even though in some experimental contexts humans seem to not even be particularly goal-directed, utility functions are often a useful model of human preferences to use in AI safety research

Coherent Extrapolated Volition (CEV) hypothesis: For a given human H, there exists some utility function V such that if H is given the appropriate time/resources for reflection, H's values would converge to V

• The "Goals vs Utility Functions" chapter of Rohin's Value Learning sequence, and the resulting discussion focused on differing intuitions about the AGI and ASI utility hypotheses (more accurately, as the title implies, the discussion was whether those agents will be broadly goal-directed at all, a weaker condition than being a utility maximizer).
• AGI utility doesn't logically imply ASI utility, but I'd be surprised if anyone thinks it's very plausible for the former to be true while the latter fails. In particular, the coherence arguments and other pressures that move agents toward VNM seem to roughly scale with capabilities. A plausible stance could be that we should expect most ASIs to hew close to the VNM ideal, but these pressures aren't quite so overwhelming at the AGI level; in particular, humans are fairly goal-directed but only "partially" VNM, so the goal-directedness pressures on an AGI will likely be at this order of magnitude. Depending on takeoff speeds, we might get many years to try aligning AGIs at this level of goal-directedness, which seems less dangerous than playing sorcerer's apprentice with VNM-based AGIs at the same level of capability. (Note: I might be reifying VNM here too much, in thinking of things having a measure of "goal-directedness" with "very goal-directed" approximating VNM. But this basic picture could be wrong in all sorts of ways.)
• The human utility hypothesis is much more vague than the others, and seems ultimately context-dependent. To my knowledge, the main argument in its favor is the fact that most of economics is founded on it. On the other hand, behavioral economists have formulated models like prospect theory for when greater precision is required than the simplistic VNM model gives, not to mention the cases where it breaks down more drastically. I haven't seen prospect theory used in AI safety research; I'm not sure if this reflects more a) the size of the field and the fact that few researchers have had much need to explicitly model human preferences, or b) that we don't need to model humans more than superficially, since this kind of research is still at a very early theoretical stage with all sorts of real-world error terms abounding.
• The CEV hypothesis can be strengthened, consistent with Yudkowsky's original vision, to say that every human will converge to about the same values. But the extra "values converge" assumption seems orthogonal to one's opinions about the relevance of utility functions, so I'm not including it in the above list.
• In practice a given researcher's opinions on these tend to be correlated, so it makes sense to talk of "pro-utility" and "anti-utility" viewpoints. But I'd guess the correlation is far from perfect, and at any rate, the arguments connecting these hypotheses seem somewhat tenuous.
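To make the contrast between the VNM model and prospect theory in the bullets above concrete, here is a rough side-by-side of the two functional forms. This is a simplified textbook sketch, not something taken from the post: the symbols $u$, $v$, $w$ and the reference point $r$ are notation chosen here for illustration.

```latex
% VNM expected utility of a lottery L with outcomes x_i and probabilities p_i
U(L) = \sum_i p_i \, u(x_i)

% Simplified prospect-theory value of the same lottery
V(L) = \sum_i w(p_i) \, v(x_i - r)
```

In the second form, outcomes are evaluated as gains or losses relative to $r$, the value function $v$ is typically taken to be steeper for losses than for gains, and the weighting function $w$ distorts probabilities instead of using them directly. Setting $w(p) = p$ and $r = 0$ recovers the expected-utility form with $u = v$.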


### Human instincts, symbol grounding, and the blank-slate neocortex

October 2, 2019 - 15:06
Published on October 2, 2019 12:06 PM UTC

Intro: What is Common Cortical Algorithm (CCA) theory, and why does it matter for AGI?

As I discussed in "Jeff Hawkins on neuromorphic AGI within 20 years", and as was discussed earlier on LessWrong in "The brain as a universal learning machine", there is a theory, due originally to Vernon Mountcastle in the 1970s, that the neocortex (75% of the human brain) consists of ~150,000 interconnected copies of a little module, the "cortical column", each of which implements the same algorithm. Following Jeff Hawkins, I'll call this the "common cortical algorithm" (CCA) theory. (I don't think that terminology is standard.)

So instead of saying that the human brain has a vision processing algorithm, motor control algorithm, language algorithm, planning algorithm, and so on, in CCA theory we say that (to a first approximation) we have a massive amount of "general-purpose neocortical tissue", and if you dump visual information into that tissue, it does visual processing, and if you connect that tissue to motor control pathways, it does motor control, etc.

Whether and to what extent CCA theory is true is, I think, very important for AGI forecasting, strategy, and both technical and non-technical safety research directions; see my answer here for more details.

Should we believe CCA theory?

CCA theory, as I'm using the term, is a simplified model. There are almost definitely a couple of caveats to it:

1. There are sorta "hyperparameters" on the generic learning algorithm which seem to be set differently in different parts of the neocortex. For example, some areas of the cortex have higher or lower density of particular neuron types. I don't think this significantly undermines the usefulness or correctness of CCA theory, as long as these changes really are akin to hyperparameters, as opposed to specifying fundamentally different algorithms. So my reading of the evidence is that if you put, say, motor nerves coming out of visual cortex tissue, the tissue could do motor control, but it wouldn't do it quite as well as the motor cortex does.[1]
2. There is almost definitely a gross wiring diagram hardcoded in the genome, i.e., a set of connections among different neocortical regions, and between those regions and other parts of the brain. These connections later get refined and edited during learning. Again, we can ask how much the existence of this innate gross wiring diagram undermines CCA theory. How complicated is the wiring diagram? Is it millions of connections among thousands of tiny regions, or just tens of connections among a few regions? Would the brain work at all if you started with a random wiring diagram? I don't know for sure, but for various reasons, my current belief is that this initial gross wiring diagram is not carrying much of the weight of human intelligence, and thus that this point is not a significant problem for the usefulness of CCA theory.

Going beyond these caveats, I found pretty helpful literature reviews on both sides of the issue:

• The experimental evidence for CCA theory: see chapter 5 of Rethinking Innateness (1996)
• The experimental evidence against CCA theory: see chapter 5 of The Blank Slate by Steven Pinker (2002).

I won't go through the debate here, but after reading both of those I wound up feeling that CCA theory (with the caveats above) is probably right, though not 100% proven. Please comment if you've seen any other good references on this topic, especially more up-to-date ones.

CCA theory vs human-universal traits and instincts

The main topic for this post is:

If Common Cortical Algorithm theory is true, then how do we account for all the human-universal instincts and behaviors that evolutionary psychologists talk about?

Indeed, we know that there is a diverse set of remarkably specific human instincts and mental behaviors evolved by natural selection. Again, Steven Pinker's The Blank Slate is a popularization of this argument; it ends with Donald E. Brown's giant list of "human universals", i.e. behaviors that are observed in every human culture.

Now, 75% of the human brain is the neocortex, but the other 25% consists of various subcortical ("old-brain") structures like the amygdala, and these structures are perfectly capable of implementing specific instincts. But these structures do not have access to an intelligent world-model—only the neocortex does! So how can the brain implement instincts that require intelligent understanding? For example, maybe the fact that "Alice got two cookies and I only got one!" is represented in the neocortex as the activation of neural firing pattern 7482943. There's no obvious mechanism to connect this arbitrary, learned pattern to the "That's so unfair!!!" section of the amygdala. The neocortex doesn't know about unfairness, and the amygdala doesn't know about cookies. Quite a conundrum!

This is really a symbol grounding problem, which is the other reason this post is relevant to AI alignment. When the human genome builds a human, it faces the same problem as a human programmer building an AI: how can one point a goal system at things in the world, when the internal representation of the world is a complicated, idiosyncratic, learned data structure? As we wrestle with the AI goal alignment problem, it's worth studying what human evolution did here.

List of ways that human-universal instincts and behaviors can exist despite CCA theory

Finally, the main part of this post. I don't know a complete answer, but here are some of the categories I've read about or thought of, and please comment on things I've left out or gotten wrong!

Mechanism 1: Simple hardcoded connections, not implemented in the neocortex

Example: Enjoying the taste of sweet things. This one is easy. I believe the nerve signals coming out of taste buds branch, with one branch going to the cortex to be integrated into the world model, and another branch going to subcortical regions. So the genes merely have to wire up the sweetness taste buds to the good-feelings subcortical regions.

Mechanism 2: Subcortex-supervised learning.

Example: Wanting to eat chocolate. This is different than the previous item because "sweet taste" refers to a specific innate physiological thing, whereas "chocolate" is a learned concept in the neocortex's world-model. So how do we learn to like chocolate? Because when we eat chocolate, we enjoy it (Mechanism 1 above). The neocortex learns to predict a sweet taste upon eating chocolate, and thus paints the world-model concept of chocolate with a "sweet taste" property. The supervisory signal is multidimensional, such that the neocortex can learn to paint concepts with various labels like "painful", "disgusting", "comfortable", etc., and generate appropriate behaviors in response. (See the DeepMind paper Prefrontal cortex as a meta-reinforcement learning system for a more specific discussion along these lines.)
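As a rough illustration of the loop described above, here is a minimal sketch in which a hardwired "subcortex" emits a multidimensional signal and a "cortex" learns to attach a prediction of that signal to a learned concept. Everything in it is an assumption made for illustration: the label names, the lookup table of innate responses, the `Cortex` class, and the simple delta-rule update are stand-ins, not a claim about how the brain (or the cited DeepMind paper) actually implements this.

```python
# Hypothetical sketch: a "cortex" learns to predict a multidimensional
# supervisory signal emitted by a hardwired "subcortex".

# Dimensions of the supervisory signal (illustrative labels only).
LABELS = ["sweet", "painful", "disgusting"]


def subcortex_signal(sensation):
    """Innate, hardcoded response to a raw sensation (Mechanism 1)."""
    innate = {
        "sugar_on_tongue": [1.0, 0.0, 0.0],
        "pinprick": [0.0, 1.0, 0.0],
    }
    # Unknown sensations produce no supervisory signal.
    return innate.get(sensation, [0.0, 0.0, 0.0])


class Cortex:
    """Learns, per learned concept, a prediction of the subcortical signal."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.predictions = {}  # concept -> predicted strength per label

    def predict(self, concept):
        return self.predictions.get(concept, [0.0] * len(LABELS))

    def update(self, concept, signal):
        # Delta rule: move each predicted strength toward the observed signal.
        old = self.predict(concept)
        self.predictions[concept] = [
            p + self.learning_rate * (s - p) for p, s in zip(old, signal)
        ]


cortex = Cortex()
# The learned concept "chocolate" repeatedly co-occurs with the raw sensation.
for _ in range(10):
    cortex.update("chocolate", subcortex_signal("sugar_on_tongue"))

sweet_prediction = cortex.predict("chocolate")[LABELS.index("sweet")]
print(f"predicted sweetness of chocolate: {sweet_prediction:.3f}")
```

The point of the sketch is only that the genome-specified part (`subcortex_signal`) never needs to know what "chocolate" is; the learned side ends up painting the concept with the "sweet" label by itself.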

Mechanism 3: Same learning algorithm + same world = same internal model

Possible example: Intuitive biology. In The Blank Slate you can find a discussion of intuitive biology / essentialism, which "begins with the concept of an invisible essence residing in living things, which gives them their form and powers." Thus preschoolers will say that a dog altered to look like a cat is still a dog, yet a wooden toy boat cut into the shape of a toy car has in fact become a toy car. I think we can account for this very well by saying that everyone's neocortex has the same learning algorithm, and when they look at plants and animals they observe the same kinds of things, so we shouldn't be surprised that they wind up forming similar internal models and representations. I found a paper that tries to spell out how this works in more detail; I don't know if it's right, but it's interesting: free link, official link.

Mechanism 4: Human-universal memes

Example: Fire. I think this is pretty self-explanatory. People learn about fire from each other. No need to talk about neurons, beyond the more general issues of language and social learning discussed below.

Mechanism 5: "Two-process theory"

Possible example: Innate interest in human faces.[2] The meta-reinforcement learning mechanism above (Mechanism 2) can be thought of more broadly as an interaction between a hardwired subcortical system that creates a "ground truth", and a cortical learning algorithm that then learns to relate that ground truth to its complex internal representations. Here, Johnson's "two-process theory" for faces fits this same mold, but with a more complicated subcortical system for ground truth. In this theory, a subcortical system gets direct access to a low-resolution version of the visual field, and looks for a pattern with three blobs in locations corresponding to the eyes and mouth of a blurry face. When it finds such a pattern, it passes information to the cortex that this is a very important thing to attend to, and over time the cortex learns what faces actually look like (and suppresses the original subcortical template circuitry). Anyway, Johnson came up with this theory partly based on the observation that newborns are equally entranced by pictures of three blobs versus actual faces (each of which was much more interesting than other patterns), but after a few months the babies were more interested in actual face pictures than the three-blob pictures. (Not sure what Johnson would make of this twitter account.)

(Other possible examples of instincts formed by two-process theory: fear of snakes, interest in human speech sounds, sexual attraction.)

Mechanism 6: Time-windows

Examples: Filial imprinting in animals, incest repulsion (Westermarck effect) in humans. Filial imprinting is a famous result where newborn chicks (and many other species) form a permanent attachment to the most conspicuous moving object that they see in a certain period shortly after hatching. In nature, they always imprint on their mother, but in lab experiments, chicks can be made to imprint on a person, or even a box. As with other mechanisms here, time-windows provide a nice solution to the symbol grounding problem, in that the genes don't need to know what precise collection of neurons corresponds to "mother"; they only need to set up a time window and a way to point to "conspicuous moving objects", which is presumably easier. The brain mechanism of filial imprinting has been studied in detail for chicks, and consists of the combination of time-windows plus the two-process model (Mechanism 5 above). In fact, I think the two-process model was proven in chick brains before it was postulated in human brains.

There likewise seem to be various time-window effects in people, such as the Westermarck effect, a sexual repulsion between two people raised together as young children (an instinct which presumably evolved to reduce incest).

Mechanism 7 (speculative): empathetic grounding of intuitive psychology.

Possible example: Social emotions (gratitude, sympathy, guilt,...) Again, the problem is that the neocortex is the only place with enough information to, say, decide when someone slighted you, so there's no "ground truth" to use for meta-reinforcement learning. At first I was thinking that the two-process model for human faces and speech could be playing a role, but as far as I know, deaf-blind people have the normal suite of social emotions, so that's not it either. I looked in the literature a bit and couldn't find anything helpful. So, I made up this possible mechanism (warning: wild speculation).

Step 1 is that a baby's neocortex builds a "predicting my own emotions" model using normal subcortex-supervised learning (Mechanism 2 above). Then a normal Hebbian learning mechanism makes two-way connections between the relevant subcortical structures (amygdala) and the cortical neurons involved in this predictive model.

Step 2 is that the neocortex's universal learning algorithm will, in the normal course of development, naturally discover that this same "predicting my own emotions" model from step 1 can be reused to predict other people's emotions (cf. Mechanism 3 above), forming the basis for intuitive psychology. Now, because of those connections-to-the-amygdala mentioned in step 1, the amygdala is incidentally getting signals from the neocortex when the latter predicts that someone else is angry, for example.

Step 3 is that the amygdala (and/or neocortex) somehow learns the difference between the intuitive psychology model running in first-person mode versus empathetic mode, and can thus generate appropriate reactions, with one pathway for "being angry" and a different pathway for "knowing that someone else is angry".

So let's now return to my cookie puzzle above. Alice gets two cookies and I only get one. How can I feel it's unfair, given that the neocortex doesn't have a built-in notion of unfairness, and the amygdala doesn't know what cookies are? The answer would be: thanks to subcortex-supervised learning, the amygdala gets a message that one yummy cookie is coming, but the neocortex also thinks "Alice is even happier", and that thought also recruits the amygdala, since intuitive psychology is built on empathetic modeling. Now the amygdala knows that I'm gonna get something good, but that Alice is gonna get something even better, and that combination (in the current emotional context) triggers the amygdala to send out waves of jealousy and indignation. This is then a new supervisory signal for the neocortex, which allows the neocortex to gradually develop a model of fairness, which in turn feeds back into the intuitive psychology module, and thereby back to the amygdala, allowing the amygdala to execute more complicated innate emotional responses in the future, and so on.

The special case of language.

It's tempting to put language in the category of memes (mechanism 4 above)—we do generally learn language from each other—but it's not really, because apparently groups of kids can invent grammatical languages from scratch (e.g. Nicaraguan Sign Language). My current guess is that it combines three things: (1) a two-process mechanism (Mechanism 5 above) that makes people highly attentive to human speech sounds. (2) possibly "hyperparameter tuning" in the language-learning areas of the cortex, e.g. maybe to support taller compositional hierarchies than would be required elsewhere in the cortex. (3) The fact that language can sculpt itself to the common cortical algorithm rather than the other way around—i.e., maybe "grammatical language" is just another word for "a language that conforms to the types of representations and data structures that are natively supported by the common cortical algorithm".

By the way, lots of people (including Steven Pinker) seem to argue that language processing is a fundamentally different and harder task than, say, visual processing, because language requires symbolic representations, composition, recursion, etc. I don't understand this argument; I think vision processing needs the exact same things! I don't see a fundamental difference between the visual-processing system knowing that "this sheet of paper is part of my notebook", and the grammatical "this prepositional phrase is part of this noun phrase". Likewise, I don't see a difference between recognizing a background object interrupted by a foreground occlusion, versus recognizing a noun phrase interrupted by an interjection. It seems to me like a similar set of problems and solutions, which again strengthens my belief in CCA theory.

Conclusion

When I initially read about CCA theory, I didn't take it too seriously because I didn't see how instincts could be compatible with it. But I now find it pretty likely that there's no fundamental incompatibility. So having removed that obstacle, and also read the literature a bit more, I'm much more inclined to believe that CCA theory is fundamentally correct.

Again, I'm learning as I go, and in some cases making things up as I go along. Please share any thoughts and pointers!

1. The visual cortex actually does do a bit of motor control: it moves the eyeballs. ↩︎

2. See Rethinking Innateness p116, or better yet Johnson's article ↩︎


### Toy model #6: Rationality and partial preferences

October 2, 2019 - 15:04
Published on October 2, 2019 12:04 PM UTC

In my research agenda on synthesising human preferences, I didn't mention explicitly using human rationality to sort through conflicting partial preferences.

This was, in practice, deferred to the "meta-preferences about synthesis". In this view, rationality is just one way of resolving contradictory lower-level preferences, and we wouldn't need to talk about rationality, just observe that it existed - often - within the meta-preferences.

Nevertheless, I think we might gain by making rationality - and its issues - an explicit part of the process.

Defining rationality in preference resolution

Explicit choice

We can define rationality in this area by using the one-step hypotheticals. If there is a contradiction between lower-level preferences, then that contradiction is explained to the human subject, and they can render a verdict.

This process can, of course, result in different outcomes depending on how the question is phrased - especially if we allow the one-step hypothetical to escalate to a "hypothetical conversation" where more arguments and evidence are considered.

So the distribution of outcomes would be interesting. If, in cases where most of the relevant argument/evidence is mentioned, the human tends to come down on one side, then that is a strong contender for being their "true" rational resolution of the issues.

However, if instead the human answers in many different ways, especially if the answer changes because of small changes in how the evidence is ordered, how long the human has to think, whether they get all the counter-evidence or not, and so on - then their preference seems to be much weaker.

For example, I expect that most people similar to me would converge on one answer on questions like "do expected lives saved dominate most other considerations in medical interventions?", while having wildly divergent views on "what's the proper population ethics?".
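One way to picture the "distribution of outcomes" idea is to tally a subject's verdicts across many differently framed hypotheticals and see how concentrated they are. The sketch below does only that. The function name `resolution_strength`, the verdict labels, and the sample data are all made up for illustration, and the post does not specify any particular statistic or threshold for what counts as a "strong" resolution.

```python
from collections import Counter


def resolution_strength(verdicts):
    """Return the most common verdict and the fraction of framings that gave it."""
    counts = Counter(verdicts)
    verdict, count = counts.most_common(1)[0]
    return verdict, count / len(verdicts)


# Hypothetical verdicts from one subject, one entry per framing of the question.
medical = ["yes", "yes", "yes", "yes", "no", "yes", "yes", "yes"]
population_ethics = [
    "total", "average", "person-affecting", "total", "average", "unsure",
]

for name, verdicts in [("medical", medical), ("population ethics", population_ethics)]:
    verdict, share = resolution_strength(verdicts)
    print(f"{name}: modal verdict {verdict!r}, share of framings = {share:.3f}")
```

A high share (as in the made-up medical data) would correspond to a strong contender for the "true" rational resolution; a low share spread over many answers (as in the made-up population-ethics data) would correspond to a much weaker preference.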

This doesn't matter

Another use of rationality could be to ask the human explicitly whether certain aspects of their preferences should matter. Many humans seem to have implicit biases, whether racial or other; many humans believe that it is wrong to have these biases, or at least wrong to let them affect their decisions[1].

Thus another approach for rationality is to query the subject as to whether some aspect should be affecting their decisions or not (because humans only consider a tiny space of options at once, it's better to ask "should X, Y, and Z be relevant", rather than "are A, B, and C the only things that should be relevant?").

Then these kinds of rational questions can also be treated in the same way as above.

Weighting rationality

Despite carving out a special place for "rationality", the central thrust of the research agenda remains: a human's rational preferences will dominate other preferences only if they put great weight on their own rationality.

Real humans don't always change their views just because they can't currently figure out a flaw in an argument; nor would we want them to, especially if their own rationality skills are limited or underused.

1. Having preferences that never affect any decisions at all is, in practice, the same as not having those preferences: they never affect the ordering of possible universes. ↩︎
