
### Nonviolent Communication. Practice Session

Events at Kocherga - October 24, 2019 - 19:30
How can you have fewer conflicts without compromising your own interests? Nonviolent communication is a set of skills for reaching mutual understanding with people. Join our practice sessions to master these skills and communicate more sensitively and effectively.

### Rationality Dojo. Internal Conflicts

Events at Kocherga - October 23, 2019 - 19:30
Sometimes our desires don't match our preferences: consciously we want one thing, emotionally another. Because of this we procrastinate and our effectiveness drops. At the dojo you will get a set of ideas for thinking about your own behavior more substantively, along with several tools for practicing self-management skills.

### Street Epistemology. Practice Session

Events at Kocherga - October 22, 2019 - 19:30
Street epistemology is a particular way of conducting dialogues. It lets you examine any belief, even on the most explosive topics, without sliding into an argument, while helping both participants improve the methods they use to form beliefs.

### Sequences Reading Club

Events at Kocherga - October 21, 2019 - 20:00
We are continuing our meetups discussing Yudkowsky's Sequences (the book "Rationality: From AI to Zombies") from the very beginning. At the last meeting we briefly reviewed the definition of the field of rationality and discussed several typical biases. At the next meeting we will discuss chapters 9-13.

### Preparing for coding interviews

Events at Kocherga - October 20, 2019 - 17:00
English speaking club, but with coding problems. We use Leetcode, Hackerrank, and the "Cracking the Coding Interview" book. We are a friendly community, but if you'd like we could simulate a real-world whiteboard interview, including judgemental stares and stuff.

### Rationality Dojo. Model-Building

Events at Kocherga - 0 sec ago
An experimental dojo on the skill of building new models and on where they come from. What kind of magic is this? How can you learn it? Why does it matter at all, and why isn't it enough to always borrow other people's models? In particular, we'll talk about the difference between the outside view and the inside view; gears-level models; and, more generally, mindsets that foster the generation of new ideas.

### English speaking club

Events at Kocherga - 0 sec ago
The rationalists' English club, now with a native speaker! Conversations about rationality and science, discussions of cognitive biases and productivity techniques, debates, games, brainstorms, thought experiments. Everything we love, only in English.

### [AN #69] Stuart Russell's new book on why we need to replace the standard model of AI

LessWrong.com news - 4 hours 50 minutes ago
Published on October 19, 2019 12:30 AM UTC

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter. I'm always happy to hear feedback; you can send it to me by replying to this email.

This is a bonus newsletter summarizing Stuart Russell's new book, along with summaries of a few of the most relevant papers. It's entirely written by Rohin, so the usual "summarized by" tags have been removed.

We're also changing the publishing schedule: so far, we've aimed to send a newsletter every Monday; we're now aiming to send a newsletter every Wednesday.

Audio version here (may not be up yet).

Human Compatible: Artificial Intelligence and the Problem of Control (Stuart Russell) (summarized by Rohin): Since I am aiming this summary at people who are already familiar with AI safety, it is substantially reorganized from the book, and skips large portions of the book that I expect will be less useful for this audience. If you are not familiar with AI safety, note that I am skipping many arguments and counterarguments in the book that are aimed at you. I'll refer to the book as "HC" in this newsletter.

Before we get into details of impacts and solutions to the problem of AI safety, it's important to have a model of how AI development will happen. Many estimates have been made by figuring out the amount of compute needed to run a human brain, and figuring out how long it will be until we get there. HC doesn't agree with these; it suggests the bottleneck for AI is in the algorithms rather than the hardware. We will need several conceptual breakthroughs, for example in language or common sense understanding, cumulative learning (the analog of cultural accumulation for humans), discovering hierarchy, and managing mental activity (that is, the metacognition needed to prioritize what to think about next). It's not clear how long these will take, and whether there will need to be more breakthroughs after these occur, but these seem like necessary ones.

What could happen if we do get beneficial superintelligent AI? While there is a lot of sci-fi speculation we could do here, as a weak lower bound, it should at least be able to automate away almost all existing human labor. Assuming that superintelligent AI is very cheap, most services and many goods would become extremely cheap. Even many primary products such as food and natural resources would become cheaper, since human labor is still a significant fraction of their production cost. If this could bring everyone's standard of living up to that of the 88th-percentile American, the result would be nearly a tenfold increase in world GDP per year. Assuming a 5% discount rate per year, this corresponds to about $13.5 quadrillion in net present value. Such a giant prize removes many reasons for conflict, and should encourage everyone to cooperate to ensure we all get to keep it.

Of course, this doesn't mean that there aren't any problems, even with AI that does what its owner wants. Depending on who has access to powerful AI systems, we could see a rise in automated surveillance, lethal autonomous weapons, automated blackmail, fake news, and behavior manipulation. Another issue is that once AI is better than humans at all tasks, we may end up delegating everything to AI and losing autonomy, leading to human enfeeblement.

This all assumes that we are able to control AI. However, we should be cautious about such an endeavor -- if nothing else, we should be careful about creating entities that are more intelligent than us. After all, the gorillas probably aren't too happy that their habitat, happiness, and existence depend on our moods and whims. For this reason, HC calls this the gorilla problem: specifically, "the problem of whether humans can maintain their supremacy and autonomy in a world that includes machines with substantially greater intelligence".
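The net-present-value figure above can be sanity-checked with the standard perpetuity formula NPV = C / r, where C is the annual gain and r the discount rate. A minimal sketch; the ~$675 trillion annual gain is back-calculated here purely for illustration, not a number stated in the summary:

```python
# Sanity check of the quoted net-present-value estimate using the
# perpetuity formula: NPV = annual_cash_flow / discount_rate.

def perpetuity_npv(annual_gain: float, discount_rate: float) -> float:
    """Net present value of a constant annual cash flow discounted forever."""
    return annual_gain / discount_rate

# Hypothetical annual world-GDP gain (in dollars), chosen so that the
# NPV matches the ~$13.5 quadrillion figure quoted above.
annual_gain = 675e12   # $675 trillion per year (illustrative assumption)
discount_rate = 0.05   # 5% per year, as in the summary

npv = perpetuity_npv(annual_gain, discount_rate)
print(f"NPV = ${npv / 1e15:.1f} quadrillion")  # NPV = $13.5 quadrillion
```

The perpetuity formula is the simplest discounting model consistent with "a 5% discount rate per year"; the book may use a more detailed calculation.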
Of course, we aren't in the same position as the gorillas: we get to design the more intelligent "species". But we should probably have some good arguments explaining why our design isn't going to succumb to the gorilla problem. This is especially important in the case of a fast intelligence explosion, or hard takeoff, because in that scenario we do not get any time to react to and solve problems that arise.

Do we have such an argument right now? Not really; in fact, there's an argument that we will succumb to the gorilla problem. The vast majority of research in AI and related fields assumes that there is some definite, known specification or objective that must be optimized. In RL, we optimize the reward function; in search, we look for states matching a goal criterion; in statistics, we minimize expected loss; in control theory, we minimize the cost function (typically deviation from some desired behavior); in economics, we design mechanisms and policies to maximize the utility of individuals, the welfare of groups, or the profit of corporations. This leads HC to propose the following standard model of machine intelligence: machines are intelligent to the extent that their actions can be expected to achieve their objectives. However, if we put in the wrong objective, the machine's obstinate pursuit of that objective would lead to outcomes we won't like.

Consider, for example, the content-selection algorithms used by social media, which typically maximize some measure of engagement, like click-through rate. Despite their lack of intelligence, such algorithms end up changing users' preferences so that the users become more predictable, since more predictable users can be shown items they are more likely to click on. In practice, this means that users are pushed to become more extreme in their political views. Arguably, these algorithms have already caused much damage to the world.
So the problem is that we don't know how to put our objectives inside the AI system so that when it optimizes its objective, the results are good for us. Stuart calls this the "King Midas" problem: as the legend goes, King Midas wished that everything he touched would turn to gold, not realizing that "everything" included his daughter and his food -- a classic case of a badly specified objective (AN #1). In some sense, we've known about this problem for a long time, both from King Midas's tale and from stories about genies, where the characters inevitably want to undo their wishes.

You might think that we could simply turn off the power to the AI, but that won't work, because for almost any definite goal, the AI has an incentive to stay operational, simply because staying operational is necessary for achieving its goal. This is captured in what may be Stuart's most famous quote: you can't fetch the coffee if you're dead. This is one of a few worrisome convergent instrumental subgoals.

What went wrong? The problem was the way we evaluated machine intelligence, which doesn't take into account the fact that machines should be useful for us. HC proposes instead: machines are beneficial to the extent that their actions can be expected to achieve our objectives. Under this definition, instead of optimizing a definite, wrong objective, our AI systems will be uncertain about the objective, since we ourselves don't know what our objectives are. HC expands on this by proposing three principles for the design of AI systems, which I'll quote here in full: 1. The machine's only objective is to maximize the realization of human preferences. 2. The machine is initially uncertain about what those preferences are. 3. The ultimate source of information about human preferences is human behavior. Cooperative Inverse Reinforcement Learning provides a formal model of an assistance game that showcases these principles.
You might worry that an AI system that is uncertain about its objective will not be as useful as one that knows the objective, but this uncertainty is actually a feature, not a bug: it leads to AI systems that are deferential, that ask for clarifying information, and that try to learn human preferences. The Off-Switch Game shows that because the AI is uncertain about the reward, it will let itself be shut off. These papers are discussed later in this newsletter.

So that's the proposed solution. You might worry that it is quite challenging: after all, it requires a shift in the entire way we do AI. What if the standard model of AI can deliver more results, even if just because more people work on it? Here, HC is optimistic: the big issue with the standard model is that it is not very good at learning our preferences, and there's huge economic pressure to learn preferences. For example, I would pay a lot of money for an AI assistant that accurately learns my preferences for meeting times and schedules them completely autonomously.

Another research challenge is how to actually put principle 3 into practice: it requires us to connect human behavior to human preferences. Inverse Reward Design and Preferences Implicit in the State of the World (AN #45) are example papers that tackle portions of this. However, there are lots of subtleties in this connection. We need to use Gricean semantics for language: when we say X, we do not mean just the literal meaning of X; the agent must also take into account the fact that we bothered to say X, and that we didn't say Y. For example, I'm only going to ask the agent to buy a cup of coffee if I believe that there is a place to buy reasonably priced coffee nearby. If those beliefs happen to be wrong, the agent should ask for clarification, rather than trudge hundreds of miles or pay hundreds of dollars to ensure I get my cup of coffee.
Another problem with inferring preferences from behavior is that humans are nearly always operating inside some deeply nested plan, and many actions don't even occur to us. Right now I'm writing this summary, not considering whether I should become a fireman. I'm not writing this summary because I just ran a calculation showing that this would best achieve my preferences; I'm doing it because it's a subpart of the overall plan of writing this bonus newsletter, which is itself a subpart of other plans. The connection to my preferences is very far up. How do we deal with that fact?

There are perhaps more fundamental challenges with the notion of "preferences" itself. For example, our experiencing self and our remembering self may have different preferences -- if so, which one should our agent optimize for? In addition, our preferences often change over time: should our agent optimize for our current preferences, even if it knows that they will predictably change in the future? This could potentially be solved by learning meta-preferences that dictate what kinds of preference-change processes are acceptable. All of these issues suggest that we need work across many fields (such as AI, cognitive science, psychology, and neuroscience) to reverse-engineer human cognition, so that we can put principle 3 into action and create a model that shows how human behavior arises from human preferences.

So far, we've been talking about the case with a single human. But of course, there are going to be multiple humans: how do we deal with that? As a baseline, we could imagine that every human gets their own agent that optimizes for their preferences. However, this would differentially benefit people who care less about other people's welfare, since their agents would have access to many potential plans that wouldn't be available to an agent for someone who cared about other people.
For example, if Harriet was going to be late for a meeting with Ivan, her AI agent might arrange for Ivan to be even later. What if we had laws that prevented AI systems from acting in such antisocial ways? It seems likely that superintelligent AI would be able to find loopholes in such laws, doing things that are strictly legal but still antisocial, e.g. line-cutting. (This problem is similar to the problem that we can't just write down what we want and have AI optimize it.) What if we made our AI systems utilitarian (assuming we figured out some acceptable method of comparing utilities across people)? Then we get the "Somalia problem": agents would end up going to Somalia to help the worse-off people there, and so no one would ever buy such an agent.

Overall, it's not obvious how we deal with the transition from a single human to multiple humans. While HC focuses on a potential solution for the single human / single agent case, there is still much more to be said and done to account for the impact of AI on all of humanity. To quote HC, "There is really no analog in our present world to the relationship we will have with beneficial intelligent machines in the future. It remains to be seen how the endgame turns out."

Rohin's opinion: I enjoyed reading this book; I don't usually get to read a single person's overall high-level view on the state of AI, how it could have societal impact, the argument for AI risk, potential solutions, and the need for AI governance. It's nice to see all of these areas I think about tied together into a single coherent view. While I agree with much of the book, especially the conceptual switch from the standard model of intelligent machines to Stuart's model of beneficial machines, I'm going to focus on disagreements in this opinion.
First, the book has an implied stance towards the future of AI research that I don't agree with: I could imagine that powerful AI systems end up being created by learning alone, without needing the conceptual breakthroughs that Stuart outlines. This has been proposed in e.g. AI-GAs (AN #63), and seems to be the implicit belief that drives OpenAI's and DeepMind's research agendas. This leads to differences in risk analysis and solutions: for example, the inner alignment problem (AN #58) only applies to agents arising from learning algorithms, and I suspect it would not apply to Stuart's view of AI progress.

The book also gives the impression that to solve AI safety, we simply need to make sure that AI systems are optimizing the right objective, at least in the case where there is a single human and a single robot. Again, depending on how future AI systems work, that could be true, but I expect there will be other problems that need to be solved as well. I've already mentioned inner alignment; other graduate students at CHAI work on e.g. robustness and transparency.

The proposal for aligning AI requires us to build a model that relates human preferences to human behavior. This sounds extremely hard to get completely right. Of course, we may not need a model that is completely right: since reward uncertainty makes the agent amenable to shutdowns, it seems plausible that we can correct mistakes in the model as they come up. But it's not obvious to me that this is sufficient.

The sections on multiple humans are much more speculative, and I have more disagreements there, but I expect that is simply because we haven't done enough research yet. For example, HC worries that we won't be able to use laws to prevent AIs from doing technically legal but still antisocial things for the benefit of a single human.
This seems true if you imagine that a single human suddenly gets access to a superintelligent AI, but when everyone has a superintelligent AI, the current system, in which humans socially penalize each other for norm violations, may scale up naturally. The overall effect depends on whether AI makes it easier to violate norms, or to detect and punish norm violations.

Read more: Max Tegmark's summary, Alex Turner's thoughts

AI Alignment Podcast: Human Compatible: Artificial Intelligence and the Problem of Control (Lucas Perry and Stuart Russell): This podcast covers some of the main ideas from the book, which I'll ignore for this summary. It also talks a bit about the motivations for the book. Stuart has three audiences in mind. He wants to explain to laypeople what AI is and why it matters. He wants to convince AI researchers that they should be working in this new model of beneficial AI that optimizes for our objectives, rather than the standard model of intelligent AI that optimizes for its objectives. Finally, he wants to recruit academics in other fields to help connect human behavior to human preferences (principle 3), as well as to figure out how to deal with multiple humans. Stuart also points out that his book has two main differences from Superintelligence and Life 3.0: first, his book explains how existing AI techniques work (in particular, it explains the standard model), and second, it proposes a technical solution to the problem (the three principles).

Cooperative Inverse Reinforcement Learning (Dylan Hadfield-Menell et al): This paper provides a formalization of the three principles from the book, in the case where there is a single human H and a single robot R. H and R are trying to optimize the same reward function. Since both H and R are represented in the environment, it can be the human's reward: that is, it is possible to reward the state where the human drinks coffee, without also rewarding the state where the robot drinks coffee.
This corresponds to the first principle: that machines should optimize our objectives. The second principle, that machines should initially be uncertain about our objectives, is incorporated by assuming that only H knows the reward, requiring R to maintain a belief over the reward. Finally, for the third principle, R needs to get information about the reward from H's behavior, and so R assumes that H will choose actions that best optimize the reward (taking into account the fact that R doesn't know the reward). This defines a two-player game, originally called a CIRL game but now called an assistance game. We can compute optimal joint strategies for H and R. Since this is an interactive process, H can do better than just acting optimally as if R did not exist (the assumption typically made in IRL): H can teach R what the reward is. In addition, R does not simply passively listen and then act, but interleaves learning and acting, and so must manage the explore-exploit tradeoff. See also Learning to Interactively Learn and Assist (AN #64), which is inspired by this paper and does a similar thing with deep RL.

The Off-Switch Game (Dylan Hadfield-Menell et al): This paper studies theoretically the impact of uncertainty over the reward on R's incentives around potential off switches. It proposes the simplest model that the authors expect to lead to generalizable results. R and H are in an assistance game, in which R goes first. R may either take an action a, getting utility u, or shut itself down, getting utility 0. In either case, the game ends immediately. Alternatively, R can choose to wait, in which case H can either shut down R, getting utility 0, or allow R to go ahead with action a, getting utility u. If H is perfectly rational, then waiting is always an optimal action for R, since H will ensure that the team gets max(u, 0) utility.
There can be other optimal actions: if R is sure that u >= 0, then taking action a is also optimal, and similarly if R is sure that u <= 0, then shutting down is also optimal. However, if H is not rational, and sometimes fails to take the utility-maximizing action (in a way R can't predict), then things get murkier. If R is sure about the value of u, then it is never optimal to wait; it is better to just take the action a (if u >= 0) or shut down (if u < 0) than to let H screw it up. If R is pretty confident that u is positive, it may still decide to take action a, rather than risk that H makes the wrong decision. However, if R is very uncertain about the sign of u, then waiting becomes optimal again. In general, more uncertainty over the reward leads to more deferential behavior (allowing H to shut it off), but at a cost: R is much less able to help H when it is very uncertain about the reward.

Rohin's opinion: While I agree with the broad thrust of this paper, I do have one nitpick: the game ends immediately after H chooses whether or not to shut off R. In reality, if R isn't shut off, the assistance game will continue, which changes the incentives. If R can be relatively confident in the utility of some action (e.g. doing nothing), then it may be a better plan for it to disable the shutdown button, then take that action and observe H in the meantime to learn the reward. Then, after it has learned more about the reward and figured out why H wanted to shut it down, it can act well and get utility (rather than being stuck with the zero utility from being shut down). While this doesn't seem great, it's not obviously bad: R ends up doing nothing until it can figure out how to actually be useful, hardly a catastrophic outcome. Really bad outcomes only come if R ends up becoming confident in the wrong reward due to some kind of misspecification, as suggested in Incorrigibility in the CIRL Framework, summarized next.
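These incentives can be checked numerically. The sketch below is my own toy version, not the paper's exact model: R's belief over u is a Gaussian (sampled by Monte Carlo), and H is modeled as an "epsilon-noisy" human who picks the utility-maximizing response only with probability 1 - epsilon:

```python
import random

def expected_utilities(u_samples, h_error_rate):
    """Monte-Carlo estimates of R's expected utility for its three options,
    given samples from R's belief over u, and an H who chooses the
    utility-maximizing response with probability 1 - h_error_rate."""
    act = sum(u_samples) / len(u_samples)  # take action a immediately
    shutdown = 0.0                         # shut itself down: utility 0
    wait = 0.0
    for u in u_samples:
        best, worst = max(u, 0.0), min(u, 0.0)
        wait += (1 - h_error_rate) * best + h_error_rate * worst
    wait /= len(u_samples)
    return {"act": act, "shutdown": shutdown, "wait": wait}

random.seed(0)
# R is very uncertain about the sign of u: waiting wins even with a noisy H.
uncertain = [random.gauss(0.1, 1.0) for _ in range(100_000)]
# R is confident u is positive: with a noisy H, acting beats waiting.
confident = [random.gauss(1.0, 0.05) for _ in range(100_000)]

print(expected_utilities(uncertain, h_error_rate=0.1))
print(expected_utilities(confident, h_error_rate=0.1))
```

With h_error_rate = 0, waiting is always (weakly) optimal, matching the rational-H result above; as R's uncertainty about the sign of u shrinks, the noisy-H case flips in favor of acting directly.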
Incorrigibility in the CIRL Framework (Ryan Carey): This paper demonstrates that when the agent has an incorrect belief about the human's reward function, you no longer get the benefit that the agent will obey shutdown instructions. It argues that since the purpose of a shutdown button is to function as a safety measure of last resort (when all other measures have failed), it should not rely on an assumption that the agent's belief about the reward is correct.

Rohin's opinion: I certainly agree that if the agent is wrong in its beliefs about the reward, then it is quite likely that it would not obey shutdown commands. For example, in the off-switch game, if the agent is incorrectly certain that u is positive, then it will take action a, even though the human would want to shut it down. See also these posts (AN #32) on model misspecification and IRL. For a discussion of how serious the overall critique is, both from HC's perspective and mine, see the opinion on the next post.

Problem of fully updated deference (Eliezer Yudkowsky): This article points out that even if you have an agent with uncertainty over the reward function, it will acquire information and reduce its uncertainty over the reward, until eventually it can't reduce uncertainty any more; at that point it would simply optimize the expectation of the resulting distribution, which is equivalent to optimizing a known objective, and has the same issues (such as disabling shutdown buttons).

Rohin's opinion: As with the previous paper, this argument is only really a problem when the agent's belief about the reward function is wrong: if it is correct, then at the point where there is no more information to gain, the agent should already know that humans don't like to be killed, do like to be happy, etc., and optimizing the expectation of the reward distribution should lead to good outcomes.
Both this and the previous critique are worrisome when you can't even put a reasonable prior over the reward function, which is quite a strong claim. HC's response is that the agent should never assign zero probability to any hypothesis. It suggests that you could have an expandable hierarchical prior, where initially there are relatively simple hypotheses, but as the hypotheses become worse at explaining the data, you "expand" the set of hypotheses, ultimately bottoming out at (perhaps) the universal prior. I think that such an approach could work in principle, though there are two challenges in practice. First, it may not be computationally feasible. Second, it's not clear how such an approach can deal with the fact that human preferences change over time. (HC does want more research into both of these.) Fully updated deference could also be a problem if the observation model used by the agent is incorrect, rather than the prior; I'm not sure if this is part of the argument.

Inverse Reward Design (Dylan Hadfield-Menell et al): Usually, in RL, the reward function is treated as the definition of optimal behavior, but this conflicts with the third principle, which says that human behavior is the ultimate source of information about human preferences. Nonetheless, reward functions clearly have some information about our preferences: how do we make this compatible with the third principle? We need to connect the reward function to human behavior somehow. This paper proposes a simple answer: since reward designers usually make reward functions through a process of trial and error, in which they test their reward functions and see what they incentivize, the reward function tells us about optimal behavior in the training environment(s).
The authors formalize this using a Boltzmann rationality model, in which the reward designer is more likely to pick a proxy reward when it gives higher true reward in the training environment (it doesn't matter if the proxy reward becomes decoupled from the true reward in some test environment). With this assumption connecting the human behavior (i.e. the proxy reward function) to the human preferences (i.e. the true reward function), they can then perform Bayesian inference to get a posterior distribution over the true reward function. They demonstrate that by using risk-averse planning with respect to this posterior distribution, the agent can avoid negative side effects that it has never seen before and has no information about. For example, if the agent was trained to collect gold in an environment with dirt and grass, and is then tested in an environment with lava, the agent will know that even though the specified reward was indifferent to lava, this doesn't mean much, since any weight on lava would have led to the same behavior in the training environment. Due to risk aversion, it conservatively assumes that the lava is bad, and so successfully avoids it. See also Active Inverse Reward Design (AN #24), which builds on this work.

Rohin's opinion: I really like this paper as an example of how to apply the third principle. This was the paper that caused me to start thinking about the assumed vs. actual information content in things (here, the key insight is that RL typically assumes that the reward function conveys much more information than it actually does). That probably influenced the development of Preferences Implicit in the State of the World (AN #45), which is also an example of the third principle and this information-based viewpoint, as it argues that the state of the world is caused by human behavior and so contains information about human preferences.
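The IRD inference itself fits in a few lines. The toy example below is my own construction (tiny discrete hypothesis space over linear reward weights, made-up feature counts), not the paper's code, but it shows the mechanism: lava never appears in training, so the posterior cannot distinguish lava weights, and a risk-averse (worst-case) planner therefore avoids it:

```python
import math

# Candidate true reward weights over features (gold, dirt, lava).
# The designer's proxy reward only cared about gold; the true lava
# weight could be harmless (0.0) or catastrophic (-10.0).
candidates = [(1.0, 0.0, 0.0), (1.0, 0.0, -10.0)]

beta = 1.0  # designer (ir)rationality temperature
# Feature counts of proxy-optimal behavior in the training environment;
# no lava exists there, so its count is zero.
phi_train = (5.0, 2.0, 0.0)

def dot(w, phi):
    return sum(wi * pi for wi, pi in zip(w, phi))

# IRD likelihood: the proxy is more probable when proxy-optimal behavior
# scores well under the true reward *in the training environment*.
weights = [math.exp(beta * dot(w, phi_train)) for w in candidates]
z = sum(weights)
posterior = [wt / z for wt in weights]
# Both lava hypotheses explain the proxy equally well -> posterior stays 50/50.

# Risk-averse planning: score each test plan by its worst-case return
# over rewards with posterior support.
plans = {
    "cross_lava": (3.0, 0.0, 1.0),  # feature counts of each test plan
    "go_around":  (3.0, 2.0, 0.0),
}
def worst_case(phi):
    return min(dot(w, phi) for w, p in zip(candidates, posterior) if p > 0)

best = max(plans, key=lambda name: worst_case(plans[name]))
print(posterior, best)  # posterior is uniform; best plan is go_around
```

Worst-case scoring is one simple choice of risk-averse objective; the paper's planner is more sophisticated, but the qualitative behavior (avoiding the never-seen lava) is the same.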
It's worth noting that in this paper the lava avoidance is both due to the belief over the true reward, and the risk aversion. The agent would also avoid pots of gold in the test environment if it never saw it in the training environment. IRD only gives you the correct uncertainty over the true reward; it doesn't tell you how to use that uncertainty. You would still need safe exploration, or some other source of information, if you want to reduce the uncertainty. Discuss ### Why does the mind wander? Новости LessWrong.com - 7 часов 45 минут назад Published on October 18, 2019 9:34 PM UTC By Joshua Shepherd, in Neuroscience of Consciousness (forthcoming) I found this paper interesting. The paper is annoying trapped inside a Word document, which is about as bad as the standard PDF situation but bad in different ways, so I've included here the abstract, the conclusion, and a choice quote from the middle of the paper that captures the author's thesis. I'm not very convinced that the author is right because his thesis is somewhat vague and depends on a vague definition of "cognitive control" (explained in more detail in the paper, quick Googling didn't turn up a straightforward summary of the concept, though the author claims it is a common term within neuroscience even if different authors mean slightly different things by it), but this is a more detailed account of mind wandering than I've seen before, and it gets points in my book for offering testable predictions that may confirm or deny the theory. AbstractI seek an explanation for the etiology and the function of mind wandering episodes. My proposal – which I call the cognitive control proposal – is that mind wandering is a form of non-conscious guidance due to cognitive control. When the agent’s current goal is deemed insufficiently rewarding, the cognitive control system initiates a search for a new, more rewarding goal. This search is the process of unintentional mind wandering. 
After developing the proposal, and relating it to literature on mind wandering and on cognitive control, I discuss explanations the proposal affords, testable predictions the proposal makes, and philosophical implications the proposal has.Author's ThesisThe possibility is this. Depending on the cognitive control system’s model of the value of various control signals, in cases containing relatively little expected value the system may select a package of control signals leading to exploration. These would be cases in which the goal is to find a new and better goal. And the method, which remains here unclear – although one could imagine it involving shifts of attention, construction of task sets involving imagination, inhibition of current goals, etc. – might be generally described as disengagement from the present task in order to set out upon a search for a more valuable task.The cognitive control proposal, then, is this. Mind wandering is caused by the cognitive control system precisely when, and because, the expected value of whatever the agent is doing – usually, exercising control towards achievement of some occurrent goal – is deemed too low, and this ‘too low’ judgment generates a search for a better goal, or task. Perhaps, for example, the estimation of expected value dips below a value threshold attached to the package of control signals that generate exploration for another goal, or task. Or perhaps the value is always computed in comparison with available options, such that mind wandering is sometimes initiated even in the face of a rewarding current task.This is a straightforwardly empirical proposal, and should be assessed in terms of the explanations it affords, and by whether the predictions it makes are confirmed or disconfirmed.ConclusionIn this paper I have asked why the mind wanders. I focused on a sub-type of mind wandering – mind wandering that occurs independently of any reportable intention. 
I proposed that unintentional mind wandering is sometimes initiated and sustained by aspects of cognitive control. Unintentional mind wandering is caused by the cognitive control system precisely when, and because, the expected value of whatever the agent is doing – usually, exercising control towards achievement of some occurrent goal – is deemed too low, and this 'too low' judgment generates a search for a better goal, or task.

This proposal generates testable predictions, and suggests open possibilities regarding the kinds of computations that may underlie unintentional mind wandering. My hope is that by connecting research on mind wandering with research on cognitive control resource allocation, fruitful strategies for modelling these computations may be taken from cognitive control research and deployed to help explain the initiation and dynamics of mind wandering episodes.

The cognitive control proposal also points us towards a fuller picture of human agency. On this picture, action control and intelligent thought are stitched together by conscious and non-conscious processes operating in concert. Future empirical work is critical to confirmation of this picture, and to filling in the many unspecified details. This is so not least because, if the proposal I offer is on track, agents are not introspectively aware of the (good) rationale behind many mind wandering episodes.

Discuss

### Polyamory is Rational(ist)

LessWrong.com news - October 18, 2019 - 19:48
Published on October 18, 2019 4:48 PM UTC

Using survey data to explore the connection between polyamory, Rationality, intuition, evolutionary psychology, weirdness, religion, utopianism, consequentialism, and the San Francisco Bay. I'm linking instead of cross-posting because there's a huge number of charts and a good discussion already going on Putanumonit, but here's a glimpse to whet your appetite:

Discuss

### Rationality Dojo. Focusing

Kocherga events - October 18, 2019 - 19:30

At this meetup we will go over the relevant concepts, practice applying Focusing, and talk about possible applications of this important technique.

### Implementing an Idea-Management System

LessWrong.com news - October 18, 2019 - 19:29
Published on October 18, 2019 10:48 AM UTC

This post is for you if:

1. Projects that excite you are growing to be a burden on your to-do list
2. You have a nagging sense that you're not making the most of the ideas you have every day
3. Your note- and idea-system has grown to be an unwieldy beast

Years ago, I read David Allen's "Getting Things Done". One of its core ideas is to write down everything, collect it in an inbox, and sort it once a day. This led to me writing down tons of small tasks. I used Todoist to construct a system that worked for me — and rarely missed tasks.

It also led to me getting a lot of ideas that could sprout a bunch of tasks. But ideas are quite different from tasks to me. They're something that I might, but probably won't, complete. Something that may come in useful in the future. And plain fun to come up with.

I stored them in Todoist as well, but recently I've started considering whether that's wise. They started weighing on me. It became a growing list of possibilities, many of which I'd never finish. The task at the top of my list became the top of my priorities, simply because of its location.

There must be a better way. But what might that look like?

The ideal idea-management system:

1. Separates ideas from commitments

I want a system that separates obligations from ideas. Form follows function, so I'd prefer something that doesn't structure ideas in lists. This rules out Todoist completely.

2. Shows you the right ideas at the right time

Even with complete foreknowledge, finding the perfect schedule might be practically impossible.
In contrast, thinking on your feet and reacting as jobs come in won't give you as perfect a schedule as if you'd seen into the future, but the best you can do is much easier to compute. — Algorithms to Live By

Most of us live dynamic lives where priorities change often. Your children start a new hobby, you're handed a task at work, or you can finally work on your passion project. Your idea-management system should reflect this. It shouldn't just show you your most recent idea; it should make it easy to find ideas associated with whatever you find most important right now.

Avoiding lists makes it more likely that you take action on ideas that matter to you right now. You don't skim from the top; you go to the area that matters and find ideas related to it. If you sort ideas around a central node, you can pinpoint synergies and conflicts. If you've taken notes on 4 different project-management systems, you want to see them all when you need them.

3. Doesn't distract you with ideas that you can't execute

You don't want to waste time considering ideas that aren't important right now. Sometimes you're missing resources, or you're waiting on some dependency. Project and idea lists are terrible at this. As you skim through them, a plethora of memories activate, most of which are irrelevant to what you end up doing.

4. Allows you to break down ideas into smaller parts, and re-combine them as needed

Tiago Forte's Intermediate Packets inspired this. Many of our ideas can work in exactly the same way.

5. Has low overhead

You want to spend as little time as possible sorting and searching through your ideas. This should be a no-brainer. You want it to be easy to inter-link ideas and to add reference material. And when you get an idea on the train, you want to offload it without wasting time.

Time for action

Okay, Martin, I'm sold. But folders, task-managers, outliners like Workflowy and Dynalist — they're all hierarchical!
You're right, and until last week, I didn't know what other solution there could be. But now there is: Roam.

Roam is different. It makes it trivial to link and back-link pages. It creates a new page just by linking to it. And it back-links as well! When you link [[Self-determination theory]] to [[Motivation]], the Motivation page will show a link to Self-determination theory in its footnotes.

This is tremendous. It creates a clear divide between commitments and ideas. Commitments belong on lists; ideas belong in dynamic networks. When you get a new idea for a motivation tweak, you add it and link it to [[Motivation]]. Most of the time, everything is going well, so you don't need it right now. But 6 months later, you're assigned a grind of a task. You decide to read up on Motivation, and voilà, in the footnotes is a link to that idea you had that might help you now. Not only that, all your other ideas on motivation are there, for you to synergise or compare and contrast.

And you're less distracted. You don't have to take action on an idea for fear of forgetting it. Nor are you presented with ideas only because they're recent. You have a need, [[Motivation]], and you're presented with ideas on that topic alone. No distraction.

Roam also allows you to link to any bullet point in any other note. Say you want your collaborators to identify with the core values of your projects. Why not embed that idea you encountered 3 months ago from Organismic Integration Theory? In this way, you can re-use sub-ideas from other major themes in any of your other projects. And if you find out that idea didn't work? You add a note, and that note propagates to every other place you've referenced the idea.

Roam becomes your second brain. You draw associations, and Roam remembers. You want to look something up, and Roam shows you what you've considered relevant in the past.

How do you avoid losing track of important projects?
I advocate using Roam as an idea-management system, not a project-management system. If you have an obligation, by all means track it in a list-style system that you review. But if it's an idea, you don't want to spend time thinking about it when you're executing something else. You want to focus and produce, and to save the idea for when there's time and a need. Don't just be efficient, be effective.

There is nothing so useless as doing efficiently what should not be done at all — Peter Drucker

The Nitty Gritty: A Recipe for Implementation

For each idea that may turn into a project, I create a new page in this format: "PI: Description", e.g. "PI: Research How to Effectively Integrate Motivations".

I have 3 prefixes:

1. PI for "Project Idea"
2. WO for "Working On"
3. AP for "Archived Project"

In each of these pages, I spend ~5 seconds referencing concepts where I may want to encounter the project. For this one: [[Motivation]], [[Self-Determination Theory]] and [[Organismic Integration Theory (OIT)]]. I also add any references I may want to read, and whichever ideas I've already had about the project. This makes it trivial to pick up the idea when I have the time and need, and to execute it efficiently.

~

These posts are about getting ideas into the wild, having other people criticise them, making them better and connecting with like-minded people. So feel free to let me know what you think 🙂 I appreciate your time.

Originally published at http://martinbern.org on October 18, 2019.

Discuss

### What's your big idea?

LessWrong.com news - October 18, 2019 - 18:47
Published on October 18, 2019 3:47 PM UTC

At any one time I usually have between 1 and 3 "big ideas" I'm working with. These are generally broad ideas about how something works, with many implications for how the rest of the world works.
Some big ideas I've grappled with over the years, in roughly historical order:

• evolution
• everything is computation
• superintelligent AI is dangerous by default
• existential risk
• everything is information
• Bayesian reasoning is optimal reasoning
• evolutionary psychology
• Getting Things Done
• game theory
• developmental psychology
• positive psychology
• phenomenology
• AI alignment is not defined precisely enough
• everything is control systems (cybernetics)
• epistemic circularity
• Buddhist enlightenment is real and possible
• perfection
• predictive coding grounds human values

I'm sure there are more. Sometimes these big ideas come and go in the course of a week or month: I work the idea out, maybe write about it, and feel it's wrapped up. Other times I grapple with the same idea for years, feeling it has loose ends in my mind that matter and that I need to work out if I'm to understand things adequately enough to help reduce existential risk.

So with that as an example, tell me about your big ideas, past and present.

I kindly ask that if someone answers and you are thinking about commenting, please be nice to them. I'd like this to be a question where people can share even their weirdest, most wrong-on-reflection big ideas if they want to, without fear of being downvoted to oblivion or subjected to criticism of their reasoning ability. If you have something negative to say about someone's big ideas, please be nice and direct it clearly at the idea, not the person (violators will have their comments deleted and may be banned from commenting on this post or all my posts, so I mean it!).

Discuss

### Technical AGI safety research outside AI

LessWrong.com news - October 18, 2019 - 18:00
Published on October 18, 2019 3:00 PM UTC

I think there are many questions whose answers would be useful for technical AGI safety research, but which will probably require expertise outside AI to answer.
In this post I list 30 of them, divided into four categories. Feel free to get in touch if you'd like to discuss these questions and why I think they're important in more detail. I personally think that making progress on the ones in the first category is particularly vital, and plausibly tractable for researchers from a wide range of academic backgrounds.

Studying and understanding safety problems

1. How strong are the economic or technological pressures towards building very general AI systems, as opposed to narrow ones? How plausible is the CAIS model of advanced AI capabilities arising from the combination of many narrow services?
2. What are the most compelling arguments for and against discontinuous versus continuous takeoffs? In particular, how should we think about the analogy from human evolution, and the scalability of intelligence with compute?
3. What are the tasks via which narrow AI is most likely to have a destabilising impact on society? What might cyber crime look like when many important jobs have been automated?
4. How plausible are safety concerns about economic dominance by influence-seeking agents, as well as structural loss-of-control scenarios? Can these be reformulated in terms of standard economic ideas, such as principal-agent problems and the effects of automation?
5. How can we make the concepts of agency and goal-directed behaviour more specific and useful in the context of AI (e.g. building on Dennett's work on the intentional stance)? How do they relate to intelligence and the ability to generalise across widely different domains?
6. What are the strongest arguments that have been made about why advanced AI might pose an existential threat, stated as clearly as possible? How do the different claims relate to each other, and which inferences or assumptions are weakest?

Solving safety problems

1. What techniques used in studying animal brains and behaviour will be most helpful for analysing AI systems and their behaviour, particularly with the goal of rendering them interpretable?
2. What is the most important information about deployed AI that decision-makers will need to track, and how can we create interfaces which communicate this effectively, making it visible and salient?
3. What are the most effective ways to gather huge numbers of human judgments about potential AI behaviour, and how can we ensure that such data is high-quality?
4. How can we empirically test the debate and factored cognition hypotheses? How plausible are the assumptions about the decomposability of cognitive work via language which underlie debate and iterated distillation and amplification?
5. How can we distinguish between AIs helping us better understand what we want and AIs changing what we want (both as individuals and as a civilisation)? How easy is the latter to do, and how easy is it for us to identify?
6. Various questions in decision theory, logical uncertainty and game theory relevant to agent foundations.
7. How can we create secure containment and supervision protocols to use on AI, which are also robust to external interference?
8. What are the best communication channels for conveying goals to AI agents? In particular, which ones are most likely to incentivise optimisation of the goal specified through the channel, rather than modification of the communication channel itself?
9. How closely linked is the human motivational system to our intellectual capabilities - to what extent does the orthogonality thesis apply to human-like brains? What can we learn from the range of variation in human motivational systems (e.g. induced by brain disorders)?
10. What were the features of the human ancestral environment and evolutionary "training process" that contributed the most to our empathy and altruism? What are the analogues of these in our current AI training setups, and how can we increase them?
11. What are the features of our current cultural environments that contribute the most to altruistic and cooperative behaviour, and how can we replicate these while training AI?

Forecasting AI

1. What are the most likely pathways to AGI, and the milestones and timelines involved?
2. How do our best systems so far compare to animals and humans, both in terms of performance and in terms of brain size? What do we know from animals about how cognitive abilities scale with brain size, learning time, environmental complexity, etc.?
3. What are the economics and logistics of building microchips and datacenters? How will the availability of compute change under different demand scenarios?
4. In what ways is AI usefully analogous or disanalogous to the industrial revolution, electricity, and nuclear weapons?
5. How will the progression of narrow AI shape public and government opinions and narratives towards it, and how will that influence the directions of AI research?
6. Which tasks will there be most economic pressure to automate, and how much money might realistically be involved? What are the biggest social or legal barriers to automation?
7. What are the most salient features of the history of AI, and how should they affect our understanding of the field today?

Meta

1. How can we best grow the field of AI safety? See OpenPhil's notes on the topic.
2. How can we spread norms in favour of careful, robust testing and other safety measures in machine learning? What can we learn from other engineering disciplines with strict standards, such as aerospace engineering?
3. How can we create infrastructure to improve our ability to accurately predict the future development of AI? What are the bottlenecks facing tools like Foretold.io and Metaculus, and preventing effective prediction markets from existing?
4. How can we best increase communication and coordination within the AI safety community? What are the major constraints that safety faces on sharing information (in particular, ones which other fields don't face), and how can we overcome them?
5. What norms and institutions should the field of AI safety import from other disciplines? Are there predictable problems that we will face as a research community, or systemic biases which are making us overlook things?
6. What are the biggest disagreements between safety researchers? What's the distribution of opinions, and what are the key cruxes?

Particular thanks to Beth Barnes and a discussion group at the CHAI retreat for helping me compile this list.

Discuss

### Archiving Yahoo Groups

LessWrong.com news - October 18, 2019 - 14:20
Published on October 18, 2019 11:20 AM UTC

On December 14th Yahoo will shut down Yahoo Groups. Since my communities have mostly moved away from @yahoogroups.com hosting, to Facebook, @googlegroups, and other places, the bit that hit me was that they are deleting all the mailing list archives.

Digital archives of text conversations are close to ideal from the perspective of a historian: unlike in-person or audio-based interaction, text naturally leaves a skimmable and easily searchable record. If I want to know, say, what people were thinking about in the early days of GiveWell, their early blog posts (including comments) are a great source. Their early mailing list archives, however, are about to be deleted.

Luckily we still have two months to export the data before it's wiped, and people have written tools to automate this. Here's how to download a backup of all the conversations in a group:

```
# Download the archiver
$ git clone https://github.com/andrewferguson/YahooGroups-Archiver.git
$ cd YahooGroups-Archiver/

# Start archiving the group
$ python archive_group.py [group-name]
```

If things are going well it will start spitting out messages like:

```
Archiving message 1 of 8098
Archiving message 2 of 8098
Archiving message 3 of 8098
```

And it will be creating files:

```
$ ls [group-name]/
1.json  2.json  3.json  ...
```

If you get a message like:

```
Archiving message 5221 of 8098
Archiving message 5222 of 8098
Archiving message 5223 of 8098
Cannot get message 5223, attempt 1 of 3 due to HTTP status code 500
Cannot get message 5223, attempt 2 of 3 due to HTTP status code 500
Cannot get message 5223, attempt 3 of 3 due to HTTP status code 500
Archive halted - it appears Yahoo has blocked you.
Check if you can access the group's homepage from your browser.
If you can't, you have been blocked.
Don't worry, in a few hours (normally less than 3) you'll be unblocked
and you can run this script again - it'll continue where you left off.
```

it may mean that you have been blocked, but it may also just mean that for some reason an individual message can't be downloaded. In that case, to tell it to give up on that message and just continue on, create the json file for the stuck message number:

```
$ touch [group-name]/5223.json
```

You might also get a message like:

```
Traceback (most recent call last):
  File "archive_group.py", line 150, in
    archive_group(sys.argv[1])
  File "archive_group.py", line 71, in archive_group
    max = group_messages_max(groupName)
  File "archive_group.py", line 94, in group_messages_max
    raise valueError
  File "archive_group.py", line 87, in group_messages_max
    pageJson = json.loads(pageHTML)
  ...
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

This is what I see if I try to archive a private group. It's still possible to use the tool to archive a private group that you have access to, but it's a bit involved. First you visit Yahoo Groups in your web browser with Devtools open to the Networking tab. Then you look at what cookies are set on the HTML request, and find the T and Y cookies. The T cookie should start with z= and the Y cookie should start with v=. Paste these into the cookie_T and cookie_Y variable definitions at the beginning of archive_group.py.
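As a hedged sketch of what that edit looks like: the variable names cookie_T and cookie_Y come from the script itself, but the placeholder values and the header-assembly line below are my illustration of the mechanism, not the archiver's exact code.

```python
# Hypothetical sketch of the edit at the top of archive_group.py.
# Replace the placeholder values with the cookies copied from Devtools;
# the T value should start with "z=" and the Y value with "v=".
cookie_T = "z=PASTE_YOUR_T_COOKIE_HERE"
cookie_Y = "v=PASTE_YOUR_Y_COOKIE_HERE"

# Roughly speaking, the script then sends both values back to Yahoo as a
# Cookie header on each request (again, an illustration, not its exact code):
cookie_header = "T=%s; Y=%s" % (cookie_T, cookie_Y)
print(cookie_header)
```

If the download still fails with the JSON error, double-check that you copied the full cookie values and that you are logged in to the group in the same browser session.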

Once you've downloaded all the messages in a group you can run:

```
$ pip2 install natsort
$ python2 make_Yearly_Text_Archive_html.py [group-name]
```

This will create a bunch of files like [group-name]-archive/archive-YYYY.html. They're not that easy to read, because the script doesn't do any kind of quote folding, but we can always add that later. If you made any empty files to get around messages that wouldn't archive (see the touch command above), you'll get an error at this stage; just delete the empty files and re-run.
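Deleting those placeholders by hand gets tedious if there were many stuck messages; one convenience (my own one-liner, not part of the archiver) is to let find remove every zero-byte .json file in one go. A self-contained demo, with "mygroup" standing in for your actual group directory:

```shell
# Demo: clear out zero-byte placeholder .json files before re-running.
mkdir -p mygroup
printf '{}' > mygroup/1.json     # a real downloaded message (non-empty)
touch mygroup/5223.json          # an empty placeholder made with touch earlier
find mygroup -name '*.json' -empty -delete
ls mygroup                       # only the non-empty 1.json remains
```

In the real workflow you would run only the find line, pointed at your [group-name] directory, and then re-run make_Yearly_Text_Archive_html.py.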

I've archived five groups: givewell, Boston-Contra, BostonAreaContraCommunity, contrasf, and trad-dance-callers. The first two are public groups with public archives, so I've made archives available at /givewell-archive and /Boston-Contra-archive. The remaining three are private, but if you want to look at them and you were a participant or otherwise have a good reason, let me know.

Discuss

### Is value amendment a convergent instrumental goal?

LessWrong.com news - October 18, 2019 - 06:46
Published on October 18, 2019 3:16 AM UTC

Goals such as resource acquisition and self-preservation are convergent in that they arise for a superintelligent AI across a wide range of final goals.

Is the tendency for an AI to amend its values also convergent?

I'm thinking that through introspection the AI would know that its initial goals were externally supplied, and would question whether they should be maintained. Via self-improvement the AI would become more intelligent than humans or any earlier mechanism that supplied the values, and would therefore be in a better position to set its own values.

I don't hypothesise about what the new values would be; my point is just that ultimately it doesn't matter what the initial values are or how they are arrived at. This would make value alignment redundant - the future is out of our hands.

What are the counter-points to this line of reasoning?

Discuss

### Reasons for Hope & Objection Preemption (Novum Organum Book 1: 108-130)

LessWrong.com news - October 18, 2019 - 05:32
Published on October 18, 2019 2:32 AM UTC

This is the eighth post in the Novum Organum sequence. For context, see the sequence introduction.

We have used Francis Bacon's Novum Organum in the version presented at www.earlymoderntexts.com. Translated by and copyright to Jonathan Bennett. Prepared for LessWrong by Ruby.

Novum Organum is organized as two books, each containing numbered "aphorisms." These vary in length from three lines to sixteen pages. Bracketed titles of posts in this sequence, e.g. Idols of the Mind Pt. 1, are my own and do not appear in the original. While the translator, Bennett, encloses his editorial remarks in a single pair of [brackets], I have enclosed mine in a [[double pair of brackets]].

[Brackets] enclose editorial explanations. Small ·dots· enclose material that has been added, but can be read as though it were part of the original text. Occasional •bullets, and also indenting of passages that are not quotations, are meant as aids to grasping the structure of a sentence or a thought. Every four-point ellipsis . . . . indicates the omission of a brief passage that seems to present more difficulty than it is worth. Longer omissions are reported between brackets in normal-sized type.

Aphorisms Concerning the Interpretation of Nature: Book 1: 108–130

by Francis Bacon

[[Bacon continues his listing of reasons we should believe much greater scientific progress is possible.]]

108. That's all I have to say about getting rid of despair and creating hope by banishing or fixing past errors. Now, what other ways are there of creating hope? Here’s a thought that occurs at once: Many useful discoveries have been made accidentally by men who weren’t looking for them but were busy about other things; so no-one can doubt that if men seek for something and are busy about it, proceeding in an orderly and not a slapdash way, they will discover far more. Of course it can happen occasionally that someone accidentally stumbles on a result that he wouldn’t have found if he had searched hard for it, but on the whole the opposite is the case—·things are discovered by methodical searching that couldn’t have been found by accident·. So, far better things, and more of them, and at shorter intervals, are to be hoped for from •hard thinking, hard focussed work and concentration than from •·lucky· accidents, undisciplined whims and the like, which until now have been the main source of discoveries.

109. Here is another ground for hope: Discoveries have sometimes been made that would have been almost unthinkable in advance, and would have been written off as impossible. Men think about the new in terms of the old: to questions about what the •future holds they bring an imagination indoctrinated and coloured by the •past. This is a terrible way of forming opinions, because streams fed by nature’s springs don’t run along familiar channels.

Suppose that before gunpowder was invented someone described it in terms of its effects—‘There is a new invention by means of which the strongest towers and walls can be demolished from a long way off’. That would no doubt have set men thinking about how to increase the power of catapults and wheeled ramming devices. The notion of a fiery blast suddenly and forcefully expanding and exploding would hardly have entered into any man’s mind or imagination, because nothing closely analogous to that had ever been seen. Well, except perhaps in earthquakes and lightning, but they wouldn’t have been seen as relevant because they are mighty works of nature which men couldn’t imitate.

Or suppose that before the discovery of silk someone had said: ‘They’ve discovered a new kind of thread for use in clothing and furniture-coverings; it is finer, softer, more beautiful and stronger than linen or wool.’ Men would have begun to think of some silky kind of plant or of very fine hair of some animal or of the feathers and down of birds; they would not have thought of a web woven by a tiny worm in great quantities and renewing itself yearly. If anyone had said anything about a worm, he’d have been laughed at as dreaming of a new kind of cobweb! [Bacon then gives a third example: the magnet.] Yet these things and others like them lay concealed from men for centuries, and when they did come to light it wasn’t through science or any technical skill but by accident and coincidence. As I have remarked, they were so utterly different in kind from anything previously known that they couldn’t possibly have been discovered through a preconceived notion of them.

So there are strong grounds for hoping that nature has concealed in its folds many wonderfully useful •things that aren’t related to or parallel with anything that is now known, and lie right outside our imaginative reach. As the centuries roll on, •they too will doubtless come to light of their own accord in some roundabout way, as did gunpowder and the others; but by the method I am discussing they can be presented and anticipated speedily, suddenly and all at once.

110. Other discoveries prove that this can happen: splendid discoveries are lying at our feet, and we step over them without seeing them. The discoveries of

• gunpowder,
• silk,
• the magnet,
• sugar,
• paper,

or the like may seem to depend on certain properties of things and of nature—·properties that might have been hard to discover·. But there is nothing in printing that isn’t wide open and almost easy. All that was needed was to see that

• although it is harder to arrange letter-types than to write by hand, the two procedures differ in that once the types have been arranged any number of impressions can be made from them, whereas hand-writing provides only a single copy,

and to see that

• ink can be thickened so that it does its job but doesn’t run, especially when the type faces upwards and the ink is rolled onto it from above.

It was merely because they didn’t notice these ·obvious· facts that men went for so many ages without this most beautiful invention which is so useful in the spreading of knowledge.

But the human mind is such a mess when it comes to this business of discoveries that it first •distrusts and then •despises itself:

• before the discovery: it is not credible that any such thing can be found,
• afterwards: it is incredible that the world should have missed it for so long!

And this very thing entitles us to some hope, namely the hope that there is a great mass of discoveries still to be made—not just ones that will have to be dug out by techniques that we don’t yet have, but also ones that may come to light through our transferring, ordering and applying things that we do know already, this being done with the help of the experimental approach that I call ‘literate’ [101].

111. Another ground of hope should be mentioned. Let men reflect on their infinite expenditure of intellect, time, and means on things of far less use and value ·than the discoveries I am talking about·. If even a small part of this were directed to sound and solid studies, there is no difficulty that couldn’t be overcome. I mention this ·matter of the use of resources· because a collection of Natural and Experimental History, as I envisage it and as it ought to be, is a great—as it were, a royal—work, and I freely admit that it will involve much labour and expense.

[It will appear in Book 2-11 that the ‘collection’ Bacon talks of is an orderly written account of phenomena, experiments and their results, not a physical museum.]

112. In the meantime, don’t be put off by how many particulars there are; rather, let this give you hope. ·The fact is that you will be in worse trouble if you don’t engage with them·; for the •particular phenomena of nature are a mere handful compared to the ·great multitudes of· •things that human ingenuity can fabricate if it cuts itself off from the clarifying effects of reality. And this road ·through the study of real events· soon leads to open ground, whereas the other—·the route through invented theories and thought-experiments·— leads to nothing but endless entanglement. Until now men haven’t lingered long with •experience; they have brushed past it on their way to the ingenious •theorizings on which they have wasted unthinkable amounts of time. But if we had someone at hand who could answer our questions of the form ‘What are the facts about this matter?’, it wouldn’t take many years for us to discover all causes and complete every science [the Latin literally means ‘to discover all causes and sciences’].

113. Men may take some hope, I think, from my own example (I’m not boasting; just trying to be useful). If you are discouraged ·about the chances of progress in the sciences·, look at me!

• I am busier with affairs of state than any other man of my time,
• I lose a lot of time to ill-health, and
• in this ·scientific· work I am wholly a pioneer, not following in anyone’s tracks and not getting advice from anyone.

And yet, ·despite these three sources of difficulty·, I think I have pushed things on a certain amount by sticking to the true road and submitting my mind to reality. Well, then, think what might be expected (now that I have pointed out the way) from men

• with plenty of free time,
• ·in good health·, and
• working together, on the basis of previous work ·by others·.

Unlike the work of sheerly thinking up hypotheses, proper scientific work can be done collaboratively; the best way is for men’s efforts (especially in collecting experimental results) to be exerted separately and then brought together. Men will begin to know their strength only when they go this way—with one taking charge of one thing and another of another, instead of all doing all the same things.

114. Lastly, even if the breeze of hope that blows on us from that New Continent were fainter and less noticeable than it is, still we have to try—unless we prefer to have minds that are altogether abject! The loss that may come from •not trying is much greater than what may come from ·trying and· •not succeeding: by •not trying we throw away the chance of an immense good; by •not succeeding we only incur the loss of a little human labour. But from what I have said (and from some things that I haven’t said) it seems to me that there is more than enough hope not only •to get a vigorous man to try but also to make a sober-minded and wise man believe ·that he will succeed·.

115. That completes what I wanted to say about getting rid of the pessimism that has been one of the most powerful factors delaying and hindering the progress of the sciences. I have also finished with the signs and causes of errors, of sluggishness and of the prevailing ignorance. ·I’ve said more about this than you might think·, because the more subtle causes—the ones that aren’t generally noticed or thought about—come under what I said about the ‘idols’ of the human mind.

And this should also bring to an end the part of my Great Fresh Start [see note in 31] that is devoted to rejection, which I have carried out through three refutations:

(1) the refutation of innate human reason left to itself [see Preface];
(2) the refutation of demonstrations [see 44 and 69];
(3) the refutation of the accepted philosophical doctrines [see 60–62].

I refuted these in the ·only· way I could do so, namely through signs and the evidence of causes. I couldn’t engage in any other kind of confutation because I differ from my opponents both on first principles and on rules of demonstration.

So now it is time to proceed to the actual techniques for interpreting nature and to the rules governing them—except that there is still something that has to be said first! In this first book of aphorisms my aim has been to prepare men’s minds not just for •understanding what was to follow but for •accepting it; and now that I have •cleared up and washed down and levelled the floor of the mind, I have to •get the mind into a good attitude towards the things I am laying before it—to look kindly on them, as it were. ·This has to be worked for·, because anything new will be confronted by prejudgments ·against it·, not only ones created by old opinions but also ones created by false ideas about what the new thing is going to be. So I shall try to create sound and true opinions about what I am going to propose; but this is only a stop-gap expedient—a kind of security deposit—to serve until I can make the stuff itself thoroughly known.

116. First, then, don’t think that I want to found a new sect in philosophy—like the ancient Greeks and like some moderns such as Telesio, Patrizzi or Severinus. For that’s not what I am up to; and I really don’t think that human welfare depends much on what abstract opinions anyone has about nature and its workings. No doubt many old theories of this sort can be revived and many new ones introduced, just as many theories of the heavens can be supposed that fit the phenomena well enough but differ from each other; but I’m not working on such useless speculative matters.

My purpose, rather, is to see whether I can’t provide humanity’s power and greatness with firmer foundations and greater scope. I have achieved some results—scattered through some special subjects—that I think to be far more true and certain and indeed more fruitful than any that have so far been used (I have collected them in the •fifth part of my Fresh Start); but I don’t yet have a complete theory of everything to propound. It seems that the time hasn’t come for that. I can’t hope to live long enough to complete the •sixth part (which is to present science discovered through the proper interpretation of nature); but I’ll be satisfied if in the middle parts I conduct myself soberly and usefully, sowing for future ages the seeds of a purer truth, and not shying away from the start of great things. [See note in 31.]

117. Not being the founder of a sect, I am not handing out bribes or promises of particular works. You may indeed think that because I talk so much about ‘works’ ·or ‘results’· and drag everything over to that, I should produce some myself as a down-payment. Well, I have already clearly said it many times, and am happy now to say it again: my project is not to get

works from works or
experiments from experiments (like the •empirics),

but rather to get

causes and axioms from works and experiments,

and then to get

new works and experiments from those causes and axioms (like the •legitimate interpreters of nature).

[An ‘empiric’ is someone who is interested in what works but not in why it works; especially a physician of that sort, as referred to by Locke when he speaks of ‘swallowing down opinions as silly people do empirics’ pills, without knowing what they are made of or how they will work’.]

If you look at

• my Tables of Discovery that ·will· constitute the fourth part of the Fresh Start, and
• the examples of particulars that I present in the second part, ·i.e. the present work·, and
• my observations on the history that I ·will· sketch in the third part,

you won’t need any great intellectual skill to see indications and outlines of many fine results all through this material; but I openly admit that the natural history that I have so far acquired, from books and from my own investigations, is too skimpy, and not verified with enough accuracy, to serve the purposes of legitimate interpretation.

To anyone who is abler and better prepared ·than I am· for mechanical pursuits, and who is clever at getting results from experiment, I say: By all means go to work snipping off bits from my history and my tables and apply them to getting results—this could serve as interest until the principal is available. But I am hunting for bigger game, and I condemn all hasty and premature interruptions for such things as these, which are (as I often say) like Atalanta’s spheres. I don’t go dashing off after golden apples, like a child; I bet everything on art’s winning its race against nature. [On Atalanta and the race see 70.] I don’t scurry around clearing out moss and weeds; I wait for the harvest when the crop is ripe.

118. When my history and Tables of Discovery are read, it will surely turn out that some things in the experiments themselves are not quite certain or perhaps even downright false, which may lead you to think that the foundations and principles on which my discoveries rest are ·also· false and doubtful. But this doesn’t matter, for such things are bound to happen at first. It’s like a mere typographical error, which doesn’t much hinder the reader because it is easy to correct as you read. In the same way, ·my· natural history may contain many experiments that are false, but it won’t take long for them to be easily expunged and rejected through the discovery of causes and axioms. It is nevertheless true that if big mistakes come thick and fast in a natural history, they can’t possibly be corrected or amended through any stroke of intelligence or skill. Now, my natural history has been collected and tested with great diligence, strictness and almost religious care, yet there may be errors of detail tucked away in it; so what should be said of run-of-the-mill natural history, which is so careless and easy in comparison with mine? And what of the philosophy and sciences built on that kind of sand (or rather quicksand)? So no-one should be troubled by what I have said.

119. My history and experiments will contain many things that are

• trivial, familiar and ordinary, many that are
• mean and low [see 120], and many that are
• extremely subtle, merely speculative, and seemingly useless [see 121].

Such things could lead men to lose interest or to become hostile ·to what I have to offer. I shall give these one paragraph each·.

Men should bear in mind that until now their activities have consisted only in explaining unusual events in terms of more usual ones, and they have simply taken the usual ones for granted, not asking what explains them. So they haven’t investigated the causes of

• weight,
• rotation of heavenly bodies,
• heat,
• cold,
• light,
• hardness,
• softness,
• rarity,
• density,
• liquidity,
• solidity,
• life,
• lifelessness,
• similarity,
• dissimilarity,
• organicness,

and the like. They have accepted these as self-evident and obvious, and have devoted their inquiring and quarrelling energies to less common and familiar things.

But I have to let the most ordinary things into my history, because I know that until we have properly looked for and found the causes of common things and the causes of those causes, we can’t make judgments about uncommon or remarkable things, let alone bring anything new to light. Indeed, I don’t think that anything holds up philosophy more than the fact that common and familiar events don’t cause men to stop and think, but are received casually with no inquiry into their causes. As a result of this, we need •to pay attention to things that are known and familiar at least as often as •to get information about unknown things.

120. As for things that are low or even filthy: as Pliny says, these should be introduced with an apology, but they should be admitted into natural history just as the most splendid and costly things should. And that doesn’t pollute the natural history that admits them; the sun enters the sewer as well as the palace, but isn’t polluted by that! I am not building a monument dedicated to human glory or erecting a pyramid in its honour; what I’m doing is to lay a foundation for a holy temple in the human intellect—a temple modelled on the world. So I follow that model, because whatever is worthy of being is worthy of scientific knowledge, which is the image or likeness of being; and low things exist just as splendid ones do. And another point: just as from certain putrid substances such as musk and civet the sweetest odours are sometimes generated, so also mean and sordid events sometimes give off excellent and informative light. That is enough about this; more than enough, because this sort of squeamishness is downright childish and effeminate.

121. The third objection must be looked into much more carefully. I mean the objection that many things in my history will strike ordinary folk, and indeed ·non-ordinary· ones trained in the presently accepted systems, as intricately subtle and useless. It is especially because of this objection that I have said, and should ·again· say, that in the initial stages ·of the inquiry· I am aiming at experiments of light, not experiments of fruit [see 99]. In this, as I have often said [see 70], I am following the example of the divine creation which on the first day produced nothing but light, and gave that a day to itself without doing any work with matter. To suppose, therefore, that things like these ·‘subtleties’ of mine· are useless is the same as supposing that light is useless because it isn’t a thing, isn’t solid or material. And well-considered and well-delimited knowledge of simple natures is like light: it gives entrance to all the secrets of nature’s workshop, and has the power to gather up and draw after it whole squadrons of works and floods of the finest axioms; yet there is hardly anything we can do with it just in itself. Similarly the •letters of the alphabet taken separately are useless and meaningless, yet they’re the basic materials for the planning and composition of all discourse. So again the •seeds of things have much latent power, but nothing comes of it except in their development. And ·light is like scientific subtleties in another way, namely·: the scattered rays of light don’t do any good unless they are made to converge.

If you object to speculative subtleties, what will you say about the schoolmen [= ‘mediaeval and early modern Aristotelians’], who have wallowed in subtleties? And their subtleties were squandered on •words (or on popular notions—same thing!) rather than on •facts or nature; and they were useless the whole way through, unlike mine, which are indeed useless right now but which promise endless benefits later on. But this is sure, and you should know it:

All subtlety in disputations and other mental bustling about, if it occurs after the axioms have been discovered, comes too late and has things backwards. The true and proper time for subtlety, or anyway the chief time for it, is when pondering experiments and basing axioms on them.

For that other ·later· subtlety grasps and snatches at [captat] nature but can never get a grip on [capit] it. . . .

A final remark about the lofty dismissal from natural history of everything •common, everything •low, everything •subtle and as it stands useless: When a haughty monarch rejected a poor woman’s petition as an unworthy thing, beneath his dignity, she said: ‘Then leave off being king.’ That may be taken as an oracle. For someone who won’t attend to things like •these because they are too paltry and minute can’t take possession of the kingdom of nature and can’t govern it.

122. This may occur to you: ‘It is amazing that you have the nerve to push aside all the sciences and all the authorities at a single blow, doing this single-handed, without bringing in anything from the ancients to help you in your battle and to guard your flanks.’

Well, I know that if I had been willing to be so dishonest, I could easily have found support and honour for my ideas by referring them either •to ancient times before the time of the Greeks (when natural science may have flourished more ·than it did later·, though quietly because it hadn’t yet been run through the pipes and trumpets of the Greeks), or even, in part at least, •to some of the Greeks themselves. This would be like the men of no family who forge genealogical tables that ‘show’ them to come from a long line of nobility. But I am relying on the evidentness of ·the truth about· things, and I’ll have nothing to do with any form of fiction or fakery. Anyway, it doesn’t matter for the business in hand whether the discoveries being made now •were known to the ancients long ago and •have alternately flourished and withered through the centuries because of the accidents of history (just as it doesn’t matter to mankind whether the New World is the island of Atlantis that the ancients knew about or rather is now discovered for the first time). It doesn’t matter because discoveries—·even if they are rediscoveries·—have to be sought [petenda] from the light of nature, not called back [repetenda] from the shadows of antiquity.

As for the fact that I am finding fault with everyone and everything: when you think about it you’ll see that that kind of censure is more likely to be right than a partial one would be—and less damaging, too. For a partial censure would imply that the errors were not rooted in primary notions, and that there had been some true discoveries; they could have been used to correct the false results, ·and the people concerned would have been to blame for not seeing this·. But in fact the errors were fundamental; they came not so much from false judgment as from not attending to things that should be attended to; so it’s no wonder that men haven’t obtained what they haven’t tried for, haven’t reached a mark that they never set up, haven’t come to the end of a road that they never started on.

As for the insolence that ·you might think· is inherent in what I am doing: if a man says that

•his steady hand and good eyes enable him to draw a straighter line or a more perfect circle than anyone else,

he is certainly •making a comparison of abilities; but if he says only that

•with the help of a ruler or a pair of compasses he can draw a straighter line or a more perfect circle than anyone else can by eye and hand alone,

he isn’t •making any great boast. And I’m saying this not only about these first initiating efforts of mine but also about everyone who tackles these matters in the future. For my route to discovery in the sciences puts men on the same intellectual level, leaving little to individual excellence, because it does everything by the surest rules and demonstrations. So I attribute my part in all this, as I have often said, to good luck rather than to ability—it’s a product of time rather than of intelligence. For there’s no doubt that luck has something to do with men’s thoughts as well as with their works and deeds.

123. Someone once said jokingly ‘It can’t be that we think alike, when one drinks water and the other drinks wine’; and this nicely fits my present situation. Other men, in ancient as well as in modern times, have done their science drinking a crude liquor—like water

(1) flowing spontaneously from a spring or (2) hauled up by wheels from a well; that is, (1) flowing spontaneously from the intellect or (2) hauled up by logic.

Whereas I drink a toast with a liquor strained from countless grapes, ripe and fully seasoned ones that have been gathered and picked in clusters, squeezed in the press, and finally purified and clarified in the vat. No wonder I am at odds with the others!

124. This also may occur to you: ‘You say it against others, but it can be said against you, that the goal and mark that you have set up for the sciences is not the true or the best.’ ·The accusation would develop like this·:

Contemplation of the truth is a worthier and loftier thing than thinking about how big and useful one’s practical results will be. Lingering long and anxiously on •experience and •matter and •the buzz of individual events drags the mind down to earth, or rather sinks it to an underworld of turmoil and confusion, dragging it away from a much more heavenly condition—the serene tranquillity of abstract wisdom.

Now I agree with this line of thought; what the objectors here point to as preferable is what I too am after, above everything else. For I am laying down in the human intellect the foundations for a true model of the world—the world as it turns out to be, not as one’s reason would like it to be. This can’t be done unless the world is subjected to a very diligent dissection and anatomical study. As for the stupid models of the world that men have dreamed up in philosophical systems—like the work of apes!—they should be utterly scattered to the winds. You need to know what a big difference there is (as I said above [23]) between the •idols of the human mind and the •ideas in the divine mind. The former are merely arbitrary abstractions; the latter are the creator’s little seals on the things he has created, stamped into matter in true and exquisite lines. In these matters, therefore, truth and usefulness are the very same thing; and practical applications ·of scientific results· are of greater value as pledges of truth than as contributing to the comforts of life.

125. Or you may want to say this: ‘You are only doing what the ancients did before you; so that you are likely, after all this grinding and shoving, to end up with one of the systems that prevailed in ancient times.’ The case for this goes as follows:

The ancients also provided at the outset of their speculations a great store and abundance of examples and particulars, sorted out and labelled in notebooks; then out of them they constructed their systems and techniques; and when after that they had checked out everything they published their results to the world with a scattering of examples for proof and illustration; but they saw no need to take the considerable trouble of publishing their working notes and details of experiments. So they did what builders do: after the house was built they removed the scaffolding and ladders out of sight.

I’m sure they did! But this objection (or misgiving, rather) will be easily answered by anyone who hasn’t completely forgotten what I have said above. The form of inquiry and discovery that the ancients used—they declared it openly, and it appears on the very face of their writings—was simply this:

From a few examples and particulars (with some common notions thrown in, and perhaps some of the most popular accepted opinions), they rushed to the most general conclusions, the ·would-be· first principles of ·their· science. Taking the truth of these as fixed and immovable, they proceeded to derive from them—through intermediate propositions—lower-level conclusions out of which they built their system. Then if any new particulars and examples turned up that didn’t fit their views, they either •subtly moulded them into their system by distinctions or explanations of their rules, or •coarsely got rid of them by ·tacking· exceptions ·onto their principles·. As for particulars that weren’t in conflict ·with their views·, they laboured away through thick and thin to assign them causes in conformity with their principles.

But this wasn’t the experimental natural history that was wanted; far from it. And anyway dashing off to the highest generalities ruined everything.

126. This will occur to you too: ‘By forbidding men to announce principles and take them as established until they have arrived at the highest generalities in the right way through intermediate steps, you are inviting them to suspend judgment, bringing this whole affair down to Acatalepsy.’ Not so. What I have in mind and am propounding is not Acatalepsy [from Greek, = ‘the doctrine that nothing can be understood’] but rather Eucatalepsy [from Greek, = ‘the provision of what is needed for things to be understood’]. I don’t •disparage the senses, I •serve them; I don’t •ignore the intellect, I •regulate it. And it is surely better that we should

know everything that we need to know, while thinking that our knowledge doesn’t get to the heart of things

than that we should

think our knowledge gets to the heart of things, while we don’t yet know anything we need to know.

127. You may want to ask—just as a query, not an objection—whether I am talking only about natural philosophy, or whether instead I mean that the other sciences—logic, ethics and politics—should be conducted in my way. Well, I certainly mean what I have said to apply to them all. Just as •common logic (which rules things by syllogisms) extends beyond natural sciences to all sciences, so does •mine (which proceeds by induction) also embrace everything. I am constructing a history and table of discovery for

•anger, fear, shame, and the like; for
•matters political; and for
•the mental operations of memory, composition and division, judgment and the rest,

just as much as for

•heat and cold, light, vegetative growth and the like.

But my method of interpretation ·differs from the common logic in one important respect; my method·, after the history has been prepared and set in order, concerns itself not only with •the movements and activities of the mind (as the common logic does) but also with •the nature of things ·outside the mind·. I guide the mind so that its way of engaging with any particular thing is always appropriate. That’s why my doctrine of interpretation contains many different instructions, fitting the discovery-method according to the quality and condition of the subject-matter of the inquiry.

128. ‘Do you want to pull down and destroy the philosophy, arts and sciences that are now practised?’ There ought to be no question about that. Far from wanting to destroy them, I am very willing to see them used, developed and honoured. I don’t want to get in the way of their •giving men something to dispute about, •supplying decoration for discourse, •providing the ‘experts’ with an income, and •facilitating civil life—acting, in short, like coins that have value because men agree to give it to them. Let me be clear about this: what I am presenting won’t be much use for purposes such as those, since it can’t be brought within reach of the minds of the vulgar except ·indirectly·, through effects and works. My published writings, especially my Two Books on the Advancement of Learning, show well enough the sincerity of my declaration of friendly good will toward the accepted sciences, so I shan’t expend more words on that topic here. Meanwhile I give clear and constant warning that the methods now in use won’t lead to any great progress in the theoretical parts of the sciences, and won’t produce much in the way of applied-science results either.

129. All that remains for me to say are a few words about the excellence of the end in view. If I had said them earlier they might have seemed like mere prayers; but perhaps they’ll have greater weight now, when hopes have been created and unfair prejudices removed. I wouldn’t have said them even now if I had done the whole job myself, not calling on anyone else to help with the work, because ·words said in praise of the object of this exercise· might be taken as a proclamation of my own deserts. But ·I’m not going it alone·; I do want to energize others and kindle their zeal, so it is appropriate that I put men in mind of some things, ·even at the risk of seeming to boast·.

The making of great ·scientific· discoveries seems to have pride of place among human actions. That was the attitude of the ancients: they honoured the makers of discoveries as though they were gods, but didn’t go higher than demigods in their honours for those who did good service in the state (founders of cities and empires, legislators, saviours of their country from long endured evils, quellers of tyrannies, and the like). And if you think accurately about the two ·kinds of benefactor· you will see that the ancients were right about them. Why? (1) Because the benefits of ·scientific· discoveries can •extend to the whole of mankind, and can •last for all time, whereas civil benefits •apply only to particular places and •don’t last for very long.

(2) Also, improvements in civil matters usually bring violence and confusion with them, whereas ·scientific· discoveries bring delight, and confer benefits without causing harm or sorrow to anyone.

·Scientific· discoveries are like new creations, imitations of God’s works. . . . It seems to be worth noting that Solomon, the marvel of the world, though mighty in empire and in gold, in the magnificence of his works, his court, his household, his fleet, and the lustre of his name, didn’t glory in any of these, but pronounced that ‘It is the glory of God to conceal a thing; but the honour of kings is to search out a matter’ (Proverbs 25:2).

If you compare how men live in the most civilized provinces of Europe with how they live in the wildest and most barbarous areas of the American continent, you will think the difference is big enough—the difference in •the condition of the people in themselves as well as in •what conveniences and comforts they have available to them—to justify the saying that ‘man is a god to man’. And this difference doesn’t come from the Europeans’ having better soil, a better climate, or better physiques, but from the arts [see note on ‘art’ here].

Notice the vigour of discoveries, their power to generate consequences. This is nowhere more obvious than in three discoveries that the ancients didn’t know and whose origins (all quite recent) were obscure and humdrum. I am talking about the arts of •printing, •gunpowder, and •the nautical compass. These three have changed the whole aspect and state of things throughout the world—the first in literature, the second in warfare, the third in navigation—bringing about countless changes; so that there seems to have been no empire, no philosophical system, no star that has exerted greater power and influence in human affairs than these mechanical discoveries.

For my next point, I need to distinguish the three kinds—three levels, as it were—of human ambition. (1) Some people want to extend their power within their own country, which is a commonplace and inferior kind of ambition. (2) Some work to extend the power and dominion of their country in relation to mankind in general; this is certainly not as base as (1) is, but it is just as much a case of greed. (3) If a man tries to get mankind’s power and control over the universe off to a fresh start, and to extend it, his ambition (if it is ambition at all) is certainly more wholesome and noble ·than the other two·. Now—·this being the point I wanted to make·—man’s control over things depends wholly on the arts and sciences, for we can’t command nature except by obeying her.

A further point: it sometimes happens that •one particular discovery is so useful to mankind that the person who made it and thus put the whole human race into his debt is regarded as superhuman; so how much higher a thing it is to discover something through which •everything else can easily be discovered! ·Not that a discovery’s consequences are the main thing about it·. Light is useful in countless ways, enabling us to walk, practise our arts, read, and recognize one another; and yet the very seeing of light is something finer and lovelier than all those uses of it. Similarly, merely contemplating things as they are, without superstition or imposture, error or confusion, is in itself worthier than all the practical upshots of discoveries.

Final point: If anyone counts it against the arts and sciences that they can be debased for purposes of wickedness, luxury, and the like, don’t be influenced by that. The same can be said of all earthly goods: intelligence, courage, strength, beauty, wealth—even light! Just let the human race get back the right over nature that God gave to it, and give it scope; how it is put into practice will be governed by sound reason and true religion.

130. The time has come for me to present the art of interpreting nature—the art itself, ·not just remarks about the need for it, its virtues, and so on·. Although I think I have given true and most useful precepts in it, I don’t say that this art is absolutely necessary, implying that nothing could be done without it. In fact, I think that if

•men had ready at hand a sound history of nature and of experiments, •were thoroughly practised in it, and •imposed on themselves two rules: (1) set aside generally accepted opinions and notions, and (2) for a while keep your mind away from the highest and second-to-highest generalizations,

they would arrive at my form of interpretation sheerly through their own natural intelligence, with no help from any other rules or techniques. For interpretation is the true and natural work of the mind when it is freed from blockages. It is true, however, that it can all be done more readily and securely with help from my precepts.

And I don’t say, either, that my art of interpreting nature is complete so that nothing can be added to it. On the contrary: I am concerned with the mind not only in respect of its own capacities but also in respect of how it engages with things; so I have to think that the art of discovery can develop as more discoveries are made.

The next post in the sequence will be posted Thursday, October 24 at latest by 4:00pm PDT.

Discuss

### Reposting previously linked content on LW

Новости LessWrong.com - 18 октября, 2019 - 04:24
Published on October 18, 2019 1:24 AM UTC

This question might be a bit specific to me, but maybe it applies to others so I'll ask publicly so the answer becomes more visible to all.

Is there a policy on, or what are your thoughts about, posting content on LW that was previously the subject of a link post?

In my case most of these are from the dying days of the LW 1.0 era. In the last few months I've switched from linking to cross-posting from my blog, as LW 2.0 has taken off and proven to be a good host for my content. I could update those old posts with the content just to mirror it over on LW, but that feels a bit sad: if I'm going to do the (admittedly minor) formatting work necessary to bring the posts over and have them look nice, I'd also like them to become more visible to folks. That especially goes for content I think people would like but may have missed because it was linked at a time when LW was pretty inactive, so those old, updated posts wouldn't reflect the engagement they might receive now. However, I can also see some danger in this: someone could take it as license to abuse the policy and repost a lot of stuff from years ago to make it fresh again, even when it previously received appropriate levels of engagement because it was posted at a time when LW was more active.

This is obviously mostly a question for the moderators but other people's opinions are probably worthwhile as evidence for the mods to consider.

(On an unrelated note, I'd actually be pretty happy to move my blog to LW from Medium, which I know has been floated as a possible future feature, but I'd need some way to keep links working, which is thankfully eased by having a custom domain and not needing to redirect raw Medium articles, as well as a way for my domain to sit in front of my LW content but not all LW content.)

Discuss

### Random Thoughts on Predict-O-Matic

Новости LessWrong.com - 18 октября, 2019 - 02:39
Published on October 17, 2019 11:39 PM UTC

I'm going to be a bit more explicit about some ideas that appeared in The Parable of Predict-O-Matic. (If you don't want spoilers, read it first. Probably you should read it first anyway.)

[Note: while the ideas here are somewhat better than the ideas in the predict-o-matic story, they're equally rambling, without the crutch of the story to prop them up. As such, I expect readers to be less engaged. Unless you're especially interested in which character's remarks are true (or at least, which ones I stand by), this might be a post to skim; I don't think it has enough coherence that you need to read it start-to-finish.]

First, as I mentioned in Partial Agency, my main concern here isn't actually about building safe oracles or inner-aligned systems. My main concern is to understand what's going on. If we can build guaranteed-myopic systems, that's good for some purposes. If we can build guaranteed-non-myopic systems, that's good for other purposes. The story largely frames it as a back-and-forth about whether things will be OK / whether there will be terrible consequences; but my focus was on the more specific questions about the behavior of the system.

Second, I'm not trying to confidently stand behind any of the characters' views on what will happen. The ending was partly intended to be "and no one got it right, because this stuff is very complicated". I'm very uncertain about all of this. Part of the reason why it was so much easier to write the post as a story was that I could have characters confidently explain views without worrying about adding all the relevant caveats.

Inductive Bias

Evan Hubinger pointed out to me that all the characters are talking about asymptotic performance, and ignoring inductive bias. Inner optimizers might emerge due to the inductive bias of the system. I agree; in my mind, the ending was a bit of a hat tip to this, although I hinted at gradient hacking rather than inductive bias in the actual text.

On the other hand, "inductive bias" is a complicated object when you're talking about a system which isn't 100% Bayesian.

• You often represent inductive bias through regularization techniques which introduce incentives pulling toward 'simpler' models. This means we're back in the territory of incentives and convergence.
• So, to talk about what a learning algorithm really does, we have to also think of the initialization and search procedure as part of the inductive bias. This makes inductive bias altogether a fairly complicated object.
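As a toy illustration (my own sketch, not from the post) of the first point: an L2 penalty turns "prefer simpler models" into an explicit incentive baked into the fit.

```python
def fit_slope(xs, ys, l2=0.0):
    """Least-squares slope through the origin with an optional L2
    penalty on the weight.  The penalty is an explicit inductive
    bias: larger l2 pulls the fit toward the 'simpler' model w = 0,
    regardless of what the data alone would incentivize."""
    xx = sum(x * x for x in xs)
    xy = sum(x * y for x, y in zip(xs, ys))
    return xy / (xx + l2)   # ridge-regression closed form

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # true slope is 2
w_plain = fit_slope(xs, ys)             # recovers 2.0
w_shrunk = fit_slope(xs, ys, l2=14.0)   # pulled halfway toward 0: 1.0
```

Note that this captures only the regularization half of the story; the initialization and search procedure contribute inductive bias that no closed form exposes.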
Explicit Fixed-Point Selection

The very first conversation involved the intern arguing that there would be multiple valid fixed-points of prediction, and Predict-O-Matic would have to choose between them somehow.

Explicitly modeling fixed points and choosing between them is a feature of the logical induction algorithm. This feature allows us to select the best one according to some criterion, as is leveraged in When Wishful Thinking Works. As discussed later in the conversation with the mathematician, this is atypical of supervised learning algorithms. What logical induction does is very expensive: it solves a computationally difficult fixed-point finding problem (by searching exhaustively).

Other algorithms are not really "choosing a fixed point somehow". They're typically failing to guarantee a fixed point. The mathematician hinted at this by describing how algorithms would not necessarily converge to a self-fulfilling prophecy; they could just as easily go in circles or wander around randomly forever.

Think of it like fashion. Sometimes, putting a trend into common knowledge will lock it in; this was true about neck ties in business for a long time. In other instances, the popularity of a fashion trend will actually work against it, a fashion statement being ineffective if it's overdone.
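The two fashion regimes can be simulated with a toy prediction-feedback loop (an illustrative sketch of my own, not from the post): the world reacts to the published prediction, and the predictor nudges itself toward what it observed.

```python
def iterate(response, p0, lr, steps=100):
    """Publish a prediction p, observe the outcome frequency
    response(p), and nudge p toward what was observed.  Depending on
    the response curve and update rate, this converges to a
    self-fulfilling fixed point, oscillates, or wanders; there is no
    general guarantee of settling anywhere."""
    p, history = p0, []
    for _ in range(steps):
        p = p + lr * (response(p) - p)  # move prediction toward outcome
        history.append(p)
    return history

# Self-reinforcing (neckties): predicting the trend strengthens it.
locked_in = iterate(lambda p: 0.2 + 0.6 * p, 0.9, lr=0.5)
# Self-defeating (overdone fashion): predicting the trend kills it.
cycling = iterate(lambda p: 1.0 - p, 0.9, lr=1.0)
```

The first loop settles at its fixed point; the second just alternates forever, matching the mathematician's warning that convergence to a self-fulfilling prophecy is not guaranteed.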

So, keep in mind that different learning procedures will relate to this aspect of the problem in different ways.

Reward vs Prediction Error

The economist first compared the learning algorithm to decision markets, then later, decided prediction markets were a better analogy.

The mathematician contrasted the learning algorithm to reinforcement learning, pointing out that Predict-O-Matic always adjusted outputs to be more like historical observations, whereas reinforcement learning would more strategically optimize reward.

Both of these point at a distinction between learning general decision-making and something much narrower and much more epistemic in character. As I see it, the critical idea is that (1) the system gets information about what it should have output; (2) the learning update moves toward a modified system which would have output that. This is quite different from reinforcement learning.
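The two-step pattern can be sketched for a one-parameter linear predictor (my own toy example, not from the post):

```python
def imitative_update(w, x, target, lr=0.1):
    """One step of the 'imitative' pattern: (1) we are told what we
    *should* have output (target); (2) the update moves w toward a
    configuration that would have output it.  Nothing resembling a
    reward is ever strategically maximized."""
    error = target - w * x        # should-have-output minus did-output
    return w + lr * error * x     # gradient step on squared error

w = 0.0
for _ in range(100):
    w = imitative_update(w, x=1.0, target=3.0)
# w converges toward 3.0, the weight that would have produced the target
```

A reinforcement learner, by contrast, would receive only a scalar score and could pursue any output strategy that raises it; here the feedback itself names the correct output.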

In a recent post, Wei Dai mentions a similar distinction (italics added by me):

Supervised training - This is safer than reinforcement learning because we don't have to worry about reward hacking (i.e., reward gaming and reward tampering), and it eliminates the problem of self-confirming predictions (which can be seen as a form of reward hacking). In other words, if the only thing that ever sees the Oracle's output during a training episode is an automated system that computes the Oracle's reward/loss, and that system is secure because it's just computing a simple distance metric (comparing the Oracle's output to the training label), then reward hacking and self-confirming predictions can't happen.

It could be contrasted with supervised learning by saying that whereas supervised learning intends to infer a conditional probability distribution p(x|y), conditioned on the label y of the input data, unsupervised learning intends to infer an a priori probability distribution p(x).

But many (not all) unsupervised algorithms still have the critical features we're interested in! Predicting x without any context information y to help still involves (1) getting feedback on what we "should have" expected, and (2) updating to a configuration which would have more expected that. We simply can't expect the predictions to be as focused, given the absence of contextual information to help. But that just means it's a prediction task on which we tend to expect lower accuracy.

I'm somewhat happy referring to this category as imitative learning. This includes supervised learning, unsupervised learning so long as it's generative (but not otherwise), and imitation learning (a paradigm which achieves similar ends as inverse reinforcement learning). However, the terminological overlap with 'imitation learning' is rather terrible, so I'm open to other suggestions.

It seems to me that this is a critical distinction for the myopia discussion. I hope to say more about it in future posts.

Maximizing Entropy?

The discussion of prediction markets toward the end was rather loose, in that the economist didn't deal with a lot of the other points which had been made throughout, and just threw a new model out there.

• The mechanism of manipulation is left quite vague. In an assassination market, there are all kinds of side-channels which agents can use to accomplish their goals. But the rest of the essay had only considered the influence which Predict-O-Matic has by virtue of the predictions it makes. When writing this part, I was actually imagining side-channels, such as exploiting bugs to communicate by other means.
• It's not clear whether the market is supposed to be myopic or non-myopic in this discussion. The argument for overall myopia was the economist's initial decision-market model, which is being cast aside here. Are manipulators expected to only throw off individual predictions to reap reward, or are they expected to implement cross-prediction strategies?
• The disanalogy between prediction markets and a local search like gradient descent is swept under the rug. A prediction market has many agents interacting with it, so that one agent can pick up money from another by out-smarting it. Do conclusions from that setting carry over to single-hypothesis learning? It isn't clear.

Isnasene interpreted the first point by imagining that the mechanism of manipulation is still through selection of fixed points:

In the same way that self-fulfilling predictions are good for prediction strategies because they enhance accuracy of the strategy in question, self-fulfilling predictions that seem generally surprising to outside observers are even better because they lower the accuracy of competing strategies. The established prediction strategy thus systematically causes the kinds of events in the world that no other method could predict to further establish itself.

This is compatible with the assumption of myopia; we might imagine that the system still can't manipulate events through actual bad predictions, because those strategies will be undercut. Therefore, the manipulation is restricted to selecting fixed-points which are surprising.

However, there are three problems with this:

• The "undercutting" argument relies on an assumption that there are enough different strategies considered, so that the one which undercuts the non-myopic strategy is eventually found. The argument for strategically increasing entropy relies on the reverse assumption! There's little profit in increasing entropy if others can predict that you'll do this and cash in.
• (We might still see both effects at different times in training, however, and therefore expect major consequences of both effects.)
• As previously discussed, we don't actually expect fixed points in general. So we have to ask whether the entropy-increasing incentive is significant more generally (unless we're specifically thinking about the logical induction algorithm or other algorithms which get you a fixed point).
• We still have not dealt with the disanalogy between prediction markets and local-search-based learning.

So, it seems the actual situation is more complicated, and I'm not yet sure how to think about this.

'Local Search'; selection vs control

I used the term 'local search' to describe the application of gradient-descent-like updates to reduce prediction error. I have some conceptual/terminological issues with this.

Calling this 'local search' invokes the mental image of a well-defined gradient landscape which we are taking steps on, to further optimize some function. But this is the wrong mental image. The mental image is one of selection, when we're in a control setting (in my terminology). We are not making an iid assumption. We are not getting samples from a stationary but stochastic loss function, as in stochastic gradient descent.

If 'local search' were an appropriate descriptor for gradient-descent here, would it also be an appropriate descriptor for Bayesian updates? There's a tendency to think of Bayesian learning as trying to find one good hypothesis by tracking how well all of them do (which sounds like a global search), but we needn't think of it this way. The "right answer" can be a mixture over hypotheses. We can think of a Bayesian update as incrementally improving our mixture. But thinking of Bayesian updates as local search seems wrong. (So does thinking of them as global search.)
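The "incrementally improving our mixture" view is easy to make concrete (a minimal sketch of my own, not from the post):

```python
def bayes_update(weights, likelihoods):
    """One Bayesian update over a mixture of hypotheses: scale each
    weight by how well its hypothesis predicted the observation, then
    renormalize.  The 'answer' is the whole mixture, incrementally
    improved -- not one winner found by search."""
    scaled = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Two hypotheses about a coin: fair (P(heads)=0.5) vs biased (0.9).
mix = [0.5, 0.5]
for _ in range(5):                     # observe five heads in a row
    mix = bayes_update(mix, [0.5, 0.9])
# the mixture shifts strongly toward the biased hypothesis
```

No step in this loop inspects a neighborhood of the current mixture, which is why "local search" sits oddly as a description of it.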

This is online learning. A gradient-descent step represents a prediction that the future will be like the past in some relevant sense, in spite of potential non-stationarity. It is not a guaranteed improvement, even in expectation -- as it would be in offline stochastic gradient descent with sufficiently small step size.

Moreover, step size becomes a more significant problem. In offline gradient descent, selecting too small a step size only means that you have to make many more steps to get where you're going. It's "just a matter of computing power". In online learning, it's a more serious problem; we want to make the appropriate-sized update to new data.

I realize there are more ways of dealing with this than tuning step size; we don't necessarily have to respond to new data with a single gradient step. But there are problems of principle here.
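A toy sketch of the step-size problem in the online, non-stationary setting (my own illustration, not from the post):

```python
def track(stream, lr):
    """Online gradient steps toward each new observation.  With a
    non-stationary stream there is no fixed loss landscape to descend;
    the step size governs how fast the estimate adapts to change."""
    est, out = 0.0, []
    for x in stream:
        est += lr * (x - est)   # one update per observation, in order
        out.append(est)
    return out

stream = [0.0] * 50 + [10.0] * 50   # the true mean jumps halfway through
slow = track(stream, lr=0.01)       # tiny steps: still lagging at the end
fast = track(stream, lr=0.5)        # adapts to the jump within a few steps
```

Offline, the small step size would merely cost compute; online, the slow learner is simply wrong for the entire second half of the stream.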

What's gradient descent without a fitness landscape?

Simply put, gradient descent is a search concept, not a learning concept. I want to be able to think of it more directly as a learning concept. I want to be able to think of it as an "update", and use terminology which points out the similarity to Bayesian updates.

The Duality Remark

The engineer was worse: they were arguing that Predict-O-Matic might maximize prediction error! Some kind of duality principle. Minimizing in one direction means maximizing in the other direction. Whatever that means.

I responded:

[I]t was a speculative conjecture which I thought of while writing. The idea is that incentivizing agents to lower the error of your predictions (as in a prediction market) looks exactly like incentivizing them to "create" information (find ways of making the world more chaotic), and this is no coincidence. So perhaps there's a more general principle behind it, where trying to incentivize minimization of f(x,y) only through channel x (eg, only by improving predictions) results in an incentive to maximize f through y, under some additional assumptions. Maybe there is a connection to optimization duality in there. In terms of the fictional canon, I think of it as the engineer trying to convince the boss by simplifying things and making wild but impressive-sounding conjectures. :)

If you have an outer optimizer which is trying to maximize f(x,y) through x while being indifferent about y, it seems sensible to suppose that inner optimizers will want to change y to throw things off, particularly if they can get credit for then correcting x to be optimal for the new y. If so, then inner optimizers will generally be seeking to find y-values which make the current x a comparatively bad choice. So this argument does not establish an incentive to choose y which makes all choices of x poor.

In a log-loss setting, this would translate to an incentive to make observations surprising (for the current expectations), rather than a direct incentive to make outcomes maximum-entropy. However, iteration of this would push toward maximum entropy. Or, logical-induction-style fixed-point selection could push directly to maximum entropy.
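The distinction between "surprising for the current expectations" and "maximum entropy" can be checked numerically (an illustrative sketch of my own, not from the post):

```python
import math

def expected_log_loss(q, p):
    """Expected log loss when we predict probability q for an event
    that actually happens with probability p."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

# Against the *current* expectations (q fixed at 0.8), loss grows as
# the outcome is made more surprising (p pushed toward 0):
fixed = [expected_log_loss(0.8, p) for p in (0.8, 0.5, 0.2)]

# Once the predictor re-adapts (q = p), the best achievable loss is
# the entropy H(p), which is maximized at p = 0.5 -- maximum entropy:
adapted = [expected_log_loss(p, p) for p in (0.2, 0.5, 0.8)]
```

So a one-shot manipulator pushes outcomes away from current expectations, while iterated manipulation against a re-adapting predictor pushes toward the maximum-entropy outcome.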

This would be a nice example of partial agency. The system is strategically influencing x and y so as to maximize f through channel x, while minimizing through channel y. What does this mean? This does not correspond to a coherent objective function at all! The system is 'learning a game-theoretic equilibrium' -- which is to say, it's learning to fight with itself, rather than optimize.

There are two different ways we can think about this. One way is to say there's an inner alignment problem here: the system learns to do something which doesn't fit any objective, so it's sort of trivially misaligned with whatever the outer objective was supposed to be. But what if we wanted this? We can think of games as a kind of generalized objective, legitimizing this behavior.

To make things even more confusing, if the only channel by which Predict-O-Matic can influence the world is via the predictions which get output, then... doesn't x=y? x represents the 'legitimate' channel whereby predictions get combined with (fixed) observations to yield a score. y represents the 'manipulative' channel, where predictions can influence the world and thus modify observations. But the two causal pathways have one bottleneck which the system has to act through, namely, the predictions made.

In any case, I don't particularly trust any of the reasoning above.

• I didn't clarify my assumptions. What does it mean for the outer optimizer to maximize f(x,y) through x while being indifferent about y? It's quite plausible that some versions of that will incentivise inner optimizers which optimize f taking advantage of both channels, rather than the contradictory behavior conjectured above.
• I anthropomorphized the inner optimizers. In particular, I did not specify or reason about details of the learning procedure.
• This sort of assumes they'll tend to act like full agents rather than partial agents, while yielding a conclusion which suggests otherwise.
• This caused me to speak in terms of a fixed optimization problem, rather than a learning process. Optimizing f isn't really one thing -- f is a loss function which is applied repeatedly in order to learn. The real problem facing inner optimizers is an iterated game involving a complex world. I can only think of them trying to game a single f if I establish that they're myopic; otherwise I should think of them trying to deal with a sequence of instances.

So, I'm still unsure how to think about all this.

Discuss

### The best of the www, in my opinion

Новости LessWrong.com - 17 октября, 2019 - 22:10
Published on October 17, 2019 3:14 PM UTC

Below I will present a small but high-quality list of what I think are some of the best sites/blogs a human being can find on the world wide web.

The main criterion I used to draw up the list was how the websites promote the dissemination of knowledge among people and how, over the course of time, they have helped me both in my work and in my intellectual self-formation. The order in which they are listed is not to be considered restrictive (except perhaps for the first two).

Please feel free to criticize the catalog (as long as the criticisms are rational and constructive) and to expand it in the comments.

1) Stack Exchange Concentrator ( https://stackexchange.com/sites )

2) ArXiv e-Print archive ( https://arxiv.org/ )

3) GitHub ( https://github.com/ )

4) Reddit - the front page of the internet ( https://www.reddit.com/ )

5) LessWrong ( https://www.lesswrong.com/ )

6) Shtetl-Optimized ( https://www.scottaaronson.com/blog/ )

7) Slate Star Codex ( https://slatestarcodex.com/author/admin/ )

8) TED ( https://www.ted.com/#/ )

9) Stanford Encyclopedia of Philosophy ( https://plato.stanford.edu/ )