
Preview On Hover

LessWrong.com News - June 25, 2020 - 01:20
Published on June 24, 2020 10:20 PM GMT

A couple years ago, Wikipedia added a feature where if you hover over an internal link you'll see a preview of the target page:

Other sites with similar features include gwern.net:

And LessWrong:

In general, I like these features a lot. They dramatically lower the barrier to following internal links, letting you quickly figure out whether you're interested. On the other hand, they do get in the way: they pop up overlapping the text you're reading, and they mean you need to pay more attention to where the mouse goes.

I decided I wanted to add a feature like this to my website, but without any overlap. The right margin seemed good, and if you're reading this on jefftk.com with a window at least 1000px wide then hovering over any link from one of my blog posts to another should show a preview:

Here are my five most recent posts if you'd like to give it a try:

There are a lot of options for how to implement this, but I decided to use an iframe that just loads the relevant page. Feel free to look at the page source and see exactly how it works, but the general idea is as follows (a rough code sketch appears after the list):

  • It loads the page directly, not a stripped down version like the examples above. My pages are simple enough that this is fine.

  • It's a preview, not a full page, so I set scrolling=no.

  • It's a same-origin ("friendly") iframe, so I can reach into it and add a click listener; clicking anywhere on the preview then takes you through to the previewed post.

  • I don't want comments, ads, analytics, or anything else potentially slow to run, and I don't use JS for rendering, so I use sandbox to turn off JS.

  • Once it's open it stays open until you hover a different preview.

  • It appears vertically aligned with the hovered link, and it scrolls with the page, moving out of view as you scroll down.

  • If you hover over a second link close to the first one, it reuses the same vertical position to avoid looking jumpy.

  • If you hover many links in quick succession it starts loading the first one immediately, and then discards any links that have been overtaken by events.
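
Here is a minimal sketch of that idea in TypeScript, written for this summary rather than taken from jefftk.com's actual source; the class name preview-frame (absolutely positioned in the right margin via CSS) and the details of the thresholds are illustrative assumptions:

    // Show a same-origin iframe preview in the right margin on link hover.
    let frame: HTMLIFrameElement | null = null;

    function showPreview(link: HTMLAnchorElement): void {
      if (window.innerWidth < 1000) return; // no room for a margin preview
      if (frame === null) {
        frame = document.createElement("iframe");
        frame.className = "preview-frame"; // positioned in the right margin via CSS
        frame.setAttribute("scrolling", "no"); // a preview, not a full page
        // Omitting the allow-scripts token turns JS off in the previewed page;
        // allow-same-origin keeps the frame "friendly" so we can reach inside.
        frame.setAttribute("sandbox", "allow-same-origin");
        frame.addEventListener("load", () => {
          // Clicking anywhere on the preview goes through to the post.
          frame!.contentDocument?.addEventListener("click", () => {
            window.location.href = frame!.src;
          });
        });
        document.body.appendChild(frame);
      }
      if (frame.src !== link.href) frame.src = link.href;
      // Align the preview vertically with the hovered link.
      frame.style.top = `${link.getBoundingClientRect().top + window.scrollY}px`;
    }

    for (const link of document.querySelectorAll<HTMLAnchorElement>("a[href]")) {
      link.addEventListener("mouseenter", () => showPreview(link));
    }

This omits the refinements from the list above (reusing the vertical position for nearby links, and discarding loads that have been overtaken by later hovers).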

I'm pretty happy with it, but if you find any bugs let me know!

Comment via: facebook




Don't punish yourself for bad luck

LessWrong.com News - June 25, 2020 - 00:52
Published on June 24, 2020 9:52 PM GMT

The following text first summarizes the standard moral-hazard model. Then I point out that the model implies you always get punished for bad luck. The third part is speculative: I consider what the model suggests about how you should behave towards yourself.

A brief summary of a moral-hazard setting

A moral-hazard situation occurs when someone takes too much risk, or does not reduce risk enough, because someone else bears the cost.

The following situation is a typical textbook example. A worker works for a firm, and her effort influences the probability that the firm has high revenue. The worker can exert high or low effort, the firm's revenue can be high or low, and low revenue is more likely when effort is low, but can also occur when effort is high. Moreover, the worker has to get a wage that compensates her for forgoing whatever else she would do with her time.

Suppose the firm would, in principle, be willing to compensate the worker for high effort (which means that we assume that the additional expected revenue gained from high effort is at least as high as the additional wage needed to make the worker willing to exert it). Because workers are usually assumed to be risk-averse, the firm would bear the risk of low revenue, and the worker would get a wage that is constant in all states of the world.

However, now also suppose the firm cannot directly observe the effort - this constitutes a situation of asymmetric information, because the worker can observe her own effort and the firm cannot. Then the firm cannot condition the payment on the worker's effort. It also cannot just conclude that the worker exerted low effort by observing low revenue, because we assumed that low revenue can also occur when the worker exerted high effort.

The second-best optimal solution (that is, the best solution given this information problem) is to condition payments on the revenue - and thus, on the result instead of the effort to get it. The worker gets a higher wage when the firm has high revenue. Thereby, the firm can design the contract such that the worker will choose to exert high effort.

In this setting of asymmetric information, the worker gets the same expected utility as in the setting with symmetric information (in which effort could be observed), because the firm still has to compensate her for not doing something else. But because the worker now faces an uncertain income stream, the expected wage must be higher than if the wage were constant. (Thus, the firm has a lower expected profit. If the loss to the firm due to the high-revenue wage premium is severe enough, the firm may not even try to enforce high effort.) The asymmetric information precludes an optimal arrangement of which economic agent takes the risk.
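
For reference, here is a minimal formal version of that setup (a standard textbook formulation; the notation is mine, not the post's). Let effort e ∈ {L, H} give probability p_e of high revenue, with p_H > p_L; let revenue be R_h (high) or R_l (low); let the worker have concave utility u, effort cost c(H) > c(L), and reservation utility u_0. With unobservable effort, the firm chooses state-contingent wages (w_l, w_h) to solve

    max over (w_l, w_h):  p_H (R_h - w_h) + (1 - p_H) (R_l - w_l)

    subject to:
      p_H u(w_h) + (1 - p_H) u(w_l) - c(H) >= u_0                                     (participation)
      p_H u(w_h) + (1 - p_H) u(w_l) - c(H) >= p_L u(w_h) + (1 - p_L) u(w_l) - c(L)    (incentive compatibility)

The incentive constraint forces w_h > w_l, and since u is concave, delivering the same expected utility through risky wages requires a higher expected wage than the constant wage would; that risk premium is exactly the efficiency loss described above.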


You'll get punished for bad luck


At this point, note that presenting the higher wage under high revenue and the lower wage under low revenue is a matter of framing. The firm may, for example, say that it wants its workers to participate in its success, and therefore pay a premium.

Vocabulary of "punishment", by contrast, may not be popular. Also, it seems wrong to call the low wage a punishment wage. Why? Because the optimal contract makes the worker exert high effort, and a low revenue will NOT indicate that the worker idled.

So that is the irony of the situation: an optimal contract punishes you for bad luck, and for nothing else. At the same time, the worker would be more likely to get "punished" if she idled, because low revenue would then be more likely. The threat of punishment for a bad result is exactly what makes the worker exert the high effort that at least makes bad results unlikely.

Optimal contracts in your brain?

Suppose you feel a bit split between two "agents" in your brain. One part of you would like to avoid working. The other part would like you to exert high effort to have a good chance of reaching your goals.

You cannot pay yourself a wage for high effort, but you can feel good or bad. Yet the kind-of-metaphorical implication of the moral-hazard optimal-contract model is that you should not punish yourself for bad luck. There are random influences in the world, but if you can see (or remember) how much effort you exerted, it does not make sense to give yourself a hard time because you were unlucky.

On the other hand, maybe you punish yourself because you lie to yourself about your effort? If you have created such an asymmetric-information situation within yourself, punishing yourself for bad luck is a seemingly rational idea. But keep in mind that it is only second-best optimal, under the assumption that this asymmetric information indeed exists. If so, think of ways to measure your effort instead of only your outcome. If you cannot do it, consider whether forcing yourself to exert high effort is really worth it. Solve the problem that actually needs to be solved, and respect the constraints that exist, and none that do not.




Can I archive content from lesswrong.com on the wayback machine (internet archive, archive.org)?

LessWrong.com News - June 24, 2020 - 23:18
Published on June 24, 2020 6:15 PM GMT

There is some great information on lesswrong.com (LW) that seems to be publicly available (I can access it in an incognito Chrome window), and I would like to increase the chances of this information surviving for a long time.

When I try saving a LW page, it looks like it does not render correctly on the Wayback Machine. Ex: https://web.archive.org/web/20200624170623/https://www.lesswrong.com/s/FrqfoG3LJeCZs96Ym/p/8qccXytpkEhEAkjjM

I opened a GitHub issue on LW's repo, since I assume it is an issue with the source code of LW. The EA Forum seems to have the same issue, and it looks like the EA Forum's repo is a fork of LessWrong's repo. I am also writing here since it might have more visibility for non-tech people.




Betting with Mandatory Post-Mortem

LessWrong.com News - June 24, 2020 - 23:04
Published on June 24, 2020 8:04 PM GMT

Betting money is a useful way to

  • ensure you have some skin in the game when making assertions;
  • get a painful reminder of when you're wrong, so that you'll update;
  • make money off of people, if you're right.

However, I recently made a bet with both a monetary component and the stipulation that the loser write at least 500 words to a group chat about why they were wrong. I like this idea because:

  • It enforces that some cognitive labor be devoted to the update, rather than relying on the pain of lost cash. Even if you do think it through privately, the work of writing it up will help you remember the mistake next time. (If you don't want to spend that amount of time thinking about why you were wrong, then perhaps you aren't very interested in really updating on this bet.)
  • People usually make small-cash bets anyway, so there's not that much skin in the game. Being forced to write publicly, or to a select group of peers such as a Slack/Discord server, makes it feel real for me in a way that losing a small sum of money doesn't.
  • Whereas normal bets mainly benefit the participants, these kinds of public bets benefit the whole audience: observers get a lot more information about the structure of the disagreement, and about the update the loser takes from it.
  • Often, by the time a bet is decided, a lot of other relevant information has come in as well. A public post-mortem gives the loser a chance to convey this information.
  • This kind of bet will often be positive-sum in reputational terms: the winner gets a public endorsement from the loser, but the loser may gain respect from the audience for their gracious defeat and judicious update.

Furthermore, if the loser's write-up is anything short of honest praise for the winner's views, the write-up may provide hints at a continuing disagreement between the loser and winner which can lead to another bet.

This idea feels similar to Ben's Share Models, Not Beliefs. Bets focus only on disagreements about probabilities, not on the underlying reasons for those disagreements. Declaring a winner/loser conveys binary information about who was more correct, which is very little information. Post-mortems give the models themselves a place to be brought to light.

A group of people who engaged in betting-with-post-mortems together would generally be getting a lot more feedback on practical reasoning and where it can go wrong.




Quick Look #1 Diophantus of Alexandria

LessWrong.com News - June 24, 2020 - 22:12
Published on June 24, 2020 7:12 PM GMT

https://www.storyofmathematics.com/hellenistic_diophantus.html

Diophantus of Alexandria, a Greek mathematician usually placed in the 3rd century AD, had a lot of the concepts needed to develop an algebra. However, he was unable to fully generalize his methods of problem solving, even though he invented some interesting ones.

Ancient math was written in paragraphs, using words for the most part, which makes reading it very, very painful compared to the compact elegance of modern mathematical notation. However, I was surprised to see Diophantus (or at least his very early editors) develop some interesting and helpful notation in his algebra.

A final sigma ‘ς’ represented the unknown variable, but there was a different symbol for each power of the unknown, so x^2 through x^6 each had its own symbol. In fact, this situation persisted into the 17th century: even Fermat used N for the unknown, S for the unknown squared, and C for the unknown cubed!

The problem with this is that it meant Diophantus couldn’t devise general methods to solve algebraic problems which had multiple unknowns, and it wasn’t obvious to him that one CAN frequently relate x^2 to x.

The cool thing about this notation from the past, though, is how it makes obvious something that Algebra I and Algebra II students mess up frequently. You can’t just combine x^2 + x^3: in this notation they are different variables, related only through their common base. Almost everyone has made this mistake early in their math career. Some never recover.

Although the editor of my copy, Sir Thomas L. Heath, claims that Diophantus had limited options as a mathematician because every letter of the Greek alphabet except the final sigma (which Diophantus used for the unknown variable) was already in use as a numeral, I think D. could have invented more variables quite easily. We see this in his invention of the subtraction sign as an inverted psi, and in his use of distinct superscripted symbols for the unknown raised to each power up to the sixth. There was also the extinct digamma, and all the Egyptian symbols he at least could have cribbed from. Surely the problem was not a lack of imagination, but satisfaction with the method then in use. Besides, one person can only invent so much, unless that person is Leonhard Euler or von Neumann, neither of whom had any limits. D. merely didn’t see the limits of his notation.

Although D’s problems are surprisingly challenging even in modern notation, the logic D. used to solve them is obscure. He does not explain his step-by-step process. Since they are problems rather than proofs, it’s hard to divine exactly what D. thought the import of his methods was, or exactly which steps he took to arrive at an answer. He seems to have solved some problems by trial and error, just plugging in numbers until the right answer popped out. He only wanted positive integers in his answers, so the problems are designed to reflect that. However, some problems don’t have a whole-number answer. For those he would estimate the answer: “X is < 11 and > 10.” Sometimes he is wrong in these estimations! I don’t know quite what to make of that. For problems whose answer is a negative number, Diophantus says, “Pthht, absurd!”

This is unfortunate, because if D. had credits and debts in mind when he was putting together these problems, he might have seen the utility of negative numbers and started an accounting revolution 1500 years early.

If Diophantus can teach us one thing about discovery, I believe it is that iterating over different methods of notation might lead us to make conceptual breakthroughs.




What's the name for that plausible deniability thing?

LessWrong.com News - June 24, 2020 - 21:42
Published on June 24, 2020 6:42 PM GMT

There's a concept I remember reading about here, of the idea that you can't just suddenly refuse to answer dangerous questions. You have to consistently refuse to answer some random sample of totally normal questions, so that "refusing to answer a question" doesn't itself become a source of information.

Unfortunately I can't remember what it's usually called, and haven't been able to turn it up via search ("plausible deniability" is way too broad, and I can't find the right narrowing criteria). What is this concept/process called?




Abstraction, Evolution and Gears

LessWrong.com News - June 24, 2020 - 20:39
Published on June 24, 2020 5:39 PM GMT

Meta: this project is wrapping up for now. This is the second of probably several posts dumping my thought-state as of this week.

It is an empirical fact that we can predict the day-to-day behavior of the world around us - positions of trees or buildings, trajectories of birds or cars, color of the sky and ground, etc - without worrying about the details of plasmas roiling in any particular far-away star. We can predict the behavior of a dog without having to worry about positions of individual molecules in its cells. We can predict the behavior of reinforced concrete without having to check it under a microscope or account for the flaps of butterfly wings a thousand kilometers away.

Our universe abstracts well: it decomposes into high-level objects whose internal details are approximately independent of far-away objects, given all of their high-level summary information.

It didn’t have to be this way. We could imagine a universe which looks like a cryptographic hash function, where most bits are tightly entangled with most other bits and any prediction of anything requires near-perfect knowledge of the whole system state. But empirically, our universe does not look like that.

Given that we live in a universe amenable to abstraction, what sorts of agents should we expect to evolve? What can we say about agency structure and behavior in such a universe? This post comes at the question from a few different angles, looking at different properties I expect evolved agents to display in abstraction-friendly universes.

Convergent Instrumental Goals

The basic idea of abstraction is that any variable X is surrounded by lots of noisy unobserved variables, which mediate its interactions with the rest of the universe. Anything “far away” from X - i.e. anything outside of those noisy intermediates - can only “see” some abstract summary information f(X). Anything more than a few microns from a transistor on a CPU will only be sensitive to the transistor’s on/off state, not its exact voltage; the gravitational forces on far-apart stars depend only on their total mass, momentum and position, not on the roiling of plasmas.

One consequence: if an agent’s goals do not explicitly involve things close to X, then the agent cares only about controlling f(X). If an agent does not explicitly care about exact voltages on a CPU, then it will care only about controlling the binary states (and ultimately, the output of the computation). If an agent does not explicitly care about plasmas in far-away stars, then it will care only about the total mass, momentum and position of those stars. This holds for any goal which does not explicitly care about the low-level details of X or the things nearby X.

Noisy intermediates Z mask all information about X except the summary f(X). So, if an agent's objective only explicitly depends on far-away variables Y, then the agent only wants to control f(X), not necessarily all of X.
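
In symbols (my notation; one minimal way to state the claim): for any variable Y far away from X, i.e. separated from X by the noisy intermediates Z,

    P(Y | X) = P(Y | f(X)),

so Y is independent of X given the summary f(X). Any objective that depends only on such far-away Y can therefore be influenced only through f(X).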

This sounds like instrumental convergence: any goal which does not explicitly care about things near X itself will care only about controlling f(X), not all of X. Agents with different goals will compete to control the same things: high-level behaviors f(X), especially those with far-reaching effects.

Natural next question: does all instrumental convergence work this way?

Typical intuition for instrumental convergence is something like “well, having lots of resources increases one’s action space, so a wide variety of agents will try to acquire resources in order to increase their action space”. Re-wording that as an abstraction argument: “an agent’s accessible action space ‘far away’ from now (i.e. far in the future) depends mainly on what resources it acquires, and is otherwise mostly independent of specific choices made right now”. 

That may sound surprising at first, but imagine a strategic video game (I picture Starcraft). There’s a finite world-map, so over a long-ish time horizon I can get my units wherever I want them; their exact positions don’t matter to my long-term action space. Likewise, I can always tear down my buildings and reposition them somewhere else; that’s not free, but the long-term effect of such actions is just having fewer resources. Similarly, on a long time horizon, I can build or lose whatever units I want, at the cost of resources. It’s ultimately just the resources which restrict my action space over a long time horizon.

(More generally, I think mediating-long-term-action-space is part of how we intuitively decide what to call “resources” in the first place.)

Coming from a different angle, we could compare to TurnTrout’s formulation of convergent instrumental goals in MDPs. Those results are similar to the argument above in that agents tend to pursue states which maximize their long-term action space. We could formally define an abstraction on MDPs in which X is the current state, and f(X) summarizes the information about the current state relevant to the far-future action space. In other words, two states X with the same long-run action space will have the same f(X). “Power”, as TurnTrout defined it, would be an increasing function of f(X) - larger long-run action spaces mean more power. Presumably agents would tend to seek states with large f(X).
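
As a rough illustration (my paraphrase; see TurnTrout's paper for the precise definitions), the correspondence might be written

    f(X) = the information in state X that determines the long-run reachable action space
    POWER(s) ≈ E over R ~ D of [ V*_R(s) ]   (average optimal value, up to normalization)

so states with larger long-run action spaces have higher average optimal value under a broad distribution D of reward functions, and hence higher power.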

Modularity

Fun fact: biological systems are highly modular, at multiple different scales. This can be quantified and verified statistically, e.g. by mapping out protein networks and algorithmically partitioning them into parts, then comparing the connectivity of the parts. It can also be seen more qualitatively in everyday biological work: proteins have subunits which retain their function when fused to other proteins, receptor circuits can be swapped out to make bacteria follow different chemical gradients, manipulating specific genes can turn a fly’s antennae into legs, organs perform specific functions, etc, etc.

One leading theory for why modularity evolves is “modularly varying goals”: essentially, modularity in the organism evolves to match modular requirements from the environment. For instance, animals need to breathe, eat, move, and reproduce. A new environment might have different food or require different motions, independent of respiration or reproduction - or vice versa. Since these requirements vary more-or-less independently in the environment, animals evolve modular systems to deal with them: digestive tract, lungs, etc. This has been tested in simple simulated evolution experiments, and it works.

In short: modularity of the organism evolves to match modularity of the environment.

… and modularity of the environment is essentially abstraction-friendliness. The idea of abstraction is that the environment consists of high-level components whose low-level structure is independent (given the high-level summaries) for any far-apart components. That’s modularity.

Coming from an entirely different direction, we could talk about the good regulator theorem from control theory: any regulator of a system which is maximally successful and simple must be isomorphic to the system itself. Again, this suggests that modular environments should evolve modular “regulators”, e.g. organisms or agents.

I expect that the right formalization of these ideas would yield a theorem saying that evolution in abstraction-friendly environments tends to produce modularity reflecting the modular structure of the environment. Or, to put it differently: evolution in abstraction-friendly environments tends to produce (implicit) world-models whose structure matches the structure of the world.

Reflection

Finally, we can ask what happens when one modular component of the world is itself an evolved agent modelling the world. What would we expect this agent’s model of itself to look like?

I don’t have much to say yet about what this would look like, but it would be very useful to have. It would give us a grounded, empirically-testable outside-view correctness criterion for things like embedded world models and embedded decision theory. Ultimately, I hope that it will get at Scott’s open question “Does agent-like behavior imply agent-like architecture?”, at least for evolved agents specifically.




[AN #105]: The economic trajectory of humanity, and what we might mean by optimization

LessWrong.com News - June 24, 2020 - 20:30
Published on June 24, 2020 5:30 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).

HIGHLIGHTS

Modeling the Human Trajectory (David Roodman) (summarized by Nicholas): This post analyzes the human trajectory from 10,000 BCE to the present and considers its implications for the future. The metric used for this is Gross World Product (GWP), the sum total of goods and services produced in the world over the course of a year.

Looking at GWP over this long stretch leads to a few interesting conclusions. First, until 1800, most people lived near subsistence levels. This means that growth in GWP was primarily driven by growth in population. Since then population growth has slowed and GWP per capita has increased, leading to our vastly improved quality of life today. Second, an exponential function does not fit the data well at all. In an exponential function, the time for GWP to double would be constant. Instead, GWP seems to be doubling faster, which is better fit by a power law. However, the conclusion of extrapolating this relationship forward is extremely rapid economic growth, approaching infinite GWP as we near the year 2047.
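
To see why a power-law fit implies finite-time divergence, here is a minimal sketch (my illustration, not Roodman's exact specification). If GWP y grows superexponentially,

    dy/dt = a * y^(1+s),  with s > 0,

then integrating gives

    y(t) = (a * s * (t* - t))^(-1/s),

which is a power law in the time remaining until t* and diverges as t approaches the finite time t*. Faster-than-constant doubling is exactly what this functional form produces.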

Next, Roodman creates a stochastic model in order to analyze not just the modal prediction, but also get the full distribution over how likely particular outcomes are. By fitting this to only past data, he analyzes how surprising each period of GWP was. This finds that the industrial revolution and the period after it was above the 90th percentile of the model’s distribution, corresponding to surprisingly fast economic growth. Analogously, the past 30 years have seen anomalously lower growth, around the 25th percentile. This suggests that the model's stochasticity does not appropriately capture the real world -- while a good model can certainly be "surprised" by high or low growth during one period, it should probably not be consistently surprised in the same direction, as happens here.

In addition to looking at the data empirically, he provides a theoretical model for how this accelerating growth can occur by generalizing a standard economic model. Typically, the economic model assumes technology is a fixed input or has a fixed rate of growth and does not allow for production to be reinvested in technological improvements. Once reinvestment is incorporated into the model, then the economic growth rate accelerates similarly to the historical data.
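
A toy version of that reinvestment mechanism (my construction, far simpler than the model in the post): let output be Y = A * K, and let fractions of output be reinvested in capital and technology, dK/dt = s_K * Y and dA/dt = s_A * Y. Then the growth rate

    (dY/dt) / Y = s_A * K + s_K * A

itself increases over time, so growth accelerates instead of staying exponential.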



Nicholas's opinion: I found this paper very interesting and was quite surprised by its results. That said, I remain confused about what conclusions I should draw from it. The power law trend does seem to fit historical data very well, but the past 70 years are fit quite well by an exponential trend. Which one is relevant for predicting the future, if either, is quite unclear to me.

The theoretical model proposed makes more sense to me. If technology is responsible for the growth rate, then reinvesting production in technology will cause the growth rate to be faster. I'd be curious to see data on what fraction of GWP gets reinvested in improved technology and how that lines up with the other trends.

Rohin’s opinion: I enjoyed this post; it gave me a visceral sense for what hyperbolic models with noise look like (see the blog post for this, the summary doesn’t capture it). Overall, I think my takeaway is that the picture used in AI risk of explosive growth is in fact plausible, despite how crazy it initially sounds. Of course, it won’t literally diverge to infinity -- we will eventually hit some sort of limit on growth, even with “just” exponential growth -- but this limit could be quite far beyond what we have achieved so far. See also this related post.



The ground of optimization (Alex Flint) (summarized by Rohin): Many arguments about AI risk depend on the notion of “optimizing”, but so far it has eluded a good definition. One natural approach is to say that an optimizer causes the world to have higher values according to some reasonable utility function, but this seems insufficient, as then a bottle cap would be an optimizer (AN #22) for keeping water in the bottle.

This post provides a new definition of optimization, by taking a page from Embedded Agents (AN #31) and analyzing a system as a whole instead of separating the agent and environment. An optimizing system is then one which tends to evolve toward some special configurations (called the target configuration set), when starting anywhere in some larger set of configurations (called the basin of attraction), even if the system is perturbed.

For example, in gradient descent, we start with some initial guess at the parameters θ, and then continually compute loss gradients and move θ in the appropriate direction. The target configuration set is all the local minima of the loss landscape. Such a program has a very special property: while it is running, you can change the value of θ (e.g. via a debugger), and the program will probably still work. This is quite impressive: certainly most programs would not work if you arbitrarily changed the value of one of the variables in the middle of execution. Thus, this is an optimizing system that is robust to perturbations in θ. Of course, it isn’t robust to arbitrary perturbations: if you change any other variable in the program, it will probably stop working. In general, we can quantify how powerful an optimizing system is by how robust it is to perturbations, and how small the target configuration set is.
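
A tiny concrete version of that example, in TypeScript (written for this summary, not taken from the post):

    // Gradient descent on f(θ) = (θ - 3)^2, viewed as an optimizing system.
    // We perturb θ arbitrarily mid-run; the run still reaches the target set θ ≈ 3.
    function run(theta: number, steps: number, lr: number): number {
      for (let i = 0; i < steps; i++) {
        const grad = 2 * (theta - 3); // df/dθ
        theta -= lr * grad;
        if (i === Math.floor(steps / 2)) {
          theta = -1000; // a "debugger" perturbation of θ
        }
      }
      return theta;
    }

    console.log(run(0, 1000, 0.1)); // ≈ 3, despite the perturbation

Perturbing almost any other variable (the sign of lr, the loss function itself) would break the run, which is the sense in which this system is robust specifically to perturbations in θ.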

The bottle cap example is not an optimizing system because there is no broad basin of configurations from which we get to the bottle being full of water. The bottle cap doesn’t cause the bottle to be full of water when it didn’t start out full of water.

Optimizing systems are a superset of goal-directed agentic systems, which require a separation between the optimizer and the thing being optimized. For example, a tree is certainly an optimizing system (the target is to be a fully grown tree, and it is robust to perturbations of soil quality, or if you cut off a branch, etc). However, it does not seem to be a goal-directed agentic system, as it would be hard to separate into an “optimizer” and a “thing being optimized”.

This does mean that we can no longer ask “what is doing the optimization” in an optimizing system. This is a feature, not a bug: if you expect to always be able to answer this question, you typically get confusing results. For example, you might say that your liver is optimizing for making money, since without it you would die and fail to make money.

The full post has several other examples that help make the concept clearer.



Rohin's opinion: I’ve previously argued (AN #35) that we need to take generalization into account in a definition of optimization or goal-directed behavior. This definition achieves that by primarily analyzing the robustness of the optimizing system to perturbations. While this does rely on a notion of counterfactuals, it still seems significantly better than any previous attempt to ground optimization.

I particularly like that the concept doesn’t force us to have a separate agent and environment, as that distinction does seem quite leaky upon close inspection. I gave a shot at explaining several other concepts from AI alignment within this framework in this comment, and it worked quite well. In particular, a computer program is a goal-directed AI system if there is an environment such that adding the computer program to the environment transforms it into an optimizing system for some “interesting” target configuration states (with one caveat explained in the comment).

TECHNICAL AI ALIGNMENT

AGENT FOUNDATIONS

Public Static: What is Abstraction? (John S Wentworth) (summarized by Rohin): If we are to understand embedded agency, we will likely need to understand abstraction (see here (AN #83)). This post presents a view of abstraction in which we abstract a low-level territory into a high-level map that can still make reliable predictions about the territory, for some set of queries (whether probabilistic or causal).

For example, in an ideal gas, the low-level configuration would specify the position and velocity of every single gas particle. Nonetheless, we can create a high-level model where we keep track of things like the number of molecules, average kinetic energy of the molecules, etc which can then be used to predict things like pressure exerted on a piston.

Given a low-level territory L and a set of queries Q that we’d like to be able to answer, the minimal-information high-level model stores P(Q | L) for every possible Q and L. However, in practice we don’t start with a set of queries and then come up with abstractions, we instead develop crisp, concise abstractions that can answer many queries. One way we could develop such abstractions is by only keeping information that is visible from “far away”, and throwing away information that would be wiped out by noise. For example, when typing 3+4 into a calculator, the exact voltages in the circuit don’t affect anything more than a few microns away, except for the final result 7, which affects the broader world (e.g. via me seeing the answer).

If we instead take a systems view of this, where we want abstractions of multiple different low-level things, then we can equivalently say that two far-away low-level things should be independent of each other when given their high-level summaries, which are supposed to be able to quantify all of their interactions.

Read more: Abstraction sequence



Rohin's opinion: I really like the concept of abstraction, and think it is an important part of intelligence, and so I’m glad to get better tools for understanding it. I especially like the formulation that low-level components should be independent given high-level summaries -- this corresponds neatly to the principle of encapsulation in software design, and does seem to be a fairly natural and elegant description, though of course abstractions in practice will only approximately satisfy this property.

LEARNING HUMAN INTENT

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences (Daniel S. Brown et al) (summarized by Zach): Bayesian reward learning would allow for rigorous safety analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally expensive to use, because a separate MDP needs to be solved for each reward hypothesis. The main contribution of this work is a proposal for a more efficient reward evaluation scheme called Bayesian REX (see also an earlier version (AN #86)). It works by pre-training a low-dimensional feature encoding of the observation space, which allows reward hypotheses to be evaluated as linear combinations over the learned features. Demonstrations are ranked using pairwise preferences, which are relative judgments and thus conceptually easier for a human to make. Using this method, sampling and evaluating reward hypotheses is extremely fast: 100,000 samples in only 5 minutes on a PC. Moreover, Bayesian REX can be used to play Atari games by finding a most-likely or mean reward hypothesis that best explains the ranked preferences, and then using that hypothesis as the reward function for the agent.
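
Concretely, the ranking likelihood is presumably the Bradley-Terry style model from T-REX (my notation, inferred from the summary): with learned features φ and linear reward R_w(s) = w · φ(s), a ranked pair "τ_i is worse than τ_j" contributes

    P(τ_i ≺ τ_j | w) = exp(w · Φ(τ_j)) / (exp(w · Φ(τ_i)) + exp(w · Φ(τ_j))),  where Φ(τ) = Σ_{s in τ} φ(s).

Once the Φ(τ) are precomputed, evaluating any reward hypothesis w takes only a few dot products, which is what makes drawing 100,000 samples so fast.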

Prerequisites: T-REX



Zach's opinion: It's worth emphasizing that this isn't quite a pure IRL method. They use preferences over demonstrations in addition to the demonstrations themselves and so they have more information than would be available in a pure IRL context. However, it’s also worth emphasizing that (as the authors show) pixel-level features make it difficult to use IRL or GAIL to learn an imitation policy, which means I wasn’t expecting a pure IRL approach to work here. Conceptually, what's interesting about the Bayesian approach is that uncertainty in the reward distribution translates into confidence intervals on expected performance. This means that Bayesian REX is fairly robust to direct attempts at reward hacking due to the ability to directly measure overfitting to the reward function as high variance in the expected reward.

PREVENTING BAD BEHAVIOR

Avoiding Side Effects in Complex Environments (Alexander Matt Turner, Neale Ratzlaff et al) (summarized by Rohin): Previously, attainable utility preservation (AUP) has been used to solve (AN #39) some simple gridworlds. Can we use it to avoid side effects in complex high dimensional environments as well? This paper shows that we can, at least in SafeLife (AN #91). The method is simple: first train a VAE on random rollouts in the environment, and use randomly generated linear functions of the VAE features as the auxiliary reward functions for the AUP penalty. The Q-functions for these auxiliary reward functions can be learned using deep RL algorithms. Then we can just do regular deep RL using the specified reward and the AUP penalty. It turns out that this leads to fewer side effects with just one auxiliary reward function and a VAE whose latent space is size one! It also leads to faster learning for some reason. The authors hypothesize that this occurs because the AUP penalty is a useful shaping term, but don’t know why this would be the case.
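
For reference, the AUP penalty has roughly this form (as I understand the AUP line of work; notation mine):

    Penalty(s, a) = Σ_i | Q_i(s, a) - Q_i(s, noop) |,

where the Q_i are the Q-functions of the auxiliary reward functions (here, random linear functions of the VAE features) and noop is a do-nothing action. The agent maximizes the specified reward minus a scaled version of this penalty, so actions that would change its ability to pursue the auxiliary goals are discouraged.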

FORECASTING

Reasons you might think human level AI soon is unlikely (Asya Bergal) (summarized by Rohin): There is a lot of disagreement about AI timelines, which can be quite decision-relevant. In particular, if we were convinced that there was a < 5% chance of AGI in the next 20 years, that could change the field’s overall strategy significantly: for example, we might focus more on movement building, less on empirical research, and more on MIRI’s agent foundations research. This talk doesn't decisively answer this question, but discusses three different sources of evidence one might have for this position: the results of expert surveys, trends in compute, and arguments that current methods are insufficient for AGI.

Expert surveys usually suggest a significantly higher than 5% chance of AGI in 20 years, but this is quite sensitive to the specific framing of the question, and so it’s not clear how informative this is. If we instead ask experts what percentage of their field has been solved during their tenure and extrapolate to 100%, the extrapolations for junior researchers tend to be optimistic (decades), whereas those of senior researchers are pessimistic (centuries).

Meanwhile, the amount spent on compute (AN #7) has been increasing rapidly. At the estimated trend, it would hit $200 billion in 2022, which is within reach of large governments, but would presumably have to slow down at that point, potentially causing overall AI progress to slow. Better price performance (how many flops you can buy per dollar) might compensate for this, but hasn't been growing at comparable rates historically.

Another argument is that most of our effort is now going into deep learning, and methods that depend primarily on deep learning are insufficient for AGI, e.g. because they can’t use human priors, or can’t do causal reasoning, etc. Asya doesn’t try to evaluate these arguments, and so doesn’t have a specific takeaway.



Rohin's opinion: While there is a lot of uncertainty over timelines, I don’t think under 5% chance of AGI in the next 20 years is very plausible. Claims of the form “neural nets are fundamentally incapable of X” are almost always false: recurrent neural nets are Turing-complete, and so can encode arbitrary computation. Thus, the real question is whether we can find the parameterization that would correspond to e.g. causal reasoning.

I’m quite sympathetic to the claim that this would be very hard to do: neural nets find the simplest way of doing the task, which usually does not involve general reasoning. Nonetheless, it seems like by having more and more complex and diverse tasks, you can get closer to general reasoning, with GPT-3 (AN #102) being the latest example in this trend. Of course, even then it may be hard to reach AGI due to limits on compute. I’m not claiming that we already have general reasoning, nor that we necessarily will get it soon: just that it seems like we can’t rule out the possibility that general reasoning does happen soon, at least not without a relatively sophisticated analysis of how much compute we can expect in the future and some lower bound on how much we would need for AGI-via-diversity-of-tasks.



Relevant pre-AGI possibilities (Daniel Kokotajlo) (summarized by Rohin): This page lists 47 things that could plausibly happen before the development of AGI, that could matter for AI safety or AI policy. You can also use the web page to generate a very simple trajectory for the future, as done in this scenario that Daniel wrote up.



Rohin's opinion: I think this sort of reasoning about the future, where you are forced into a scenario and have to reason what must have happened and draw implications, seems particularly good for ensuring that you don’t get too locked in to your own beliefs about the future, which will likely be too narrow.

MISCELLANEOUS (ALIGNMENT)

Preparing for "The Talk" with AI projects (Daniel Kokotajlo) (summarized by Rohin): At some point in the future, it seems plausible that there will be a conversation in which people decide whether or not to deploy a potentially risky AI system. So one class of interventions to consider is interventions that make such conversations go well. This includes raising awareness about specific problems and risks, but could also include identifying people who are likely to be involved in such conversations and concerned about AI risk, and helping them prepare for such conversations through training, resources, and practice. This latter intervention hasn't been done yet: some simple examples of potential interventions would be generating official lists of AI safety problems and solutions which can be pointed to in such conversations, or doing "practice runs" of these conversations.



Rohin's opinion: I certainly agree that we should be thinking about how we can convince key decision makers of the level of risk of the systems they are building (whatever that level of risk is). I think that on the current margin it's much more likely that this is best done through better estimation and explanation of risks with AI systems, but it seems likely that the interventions laid out here will become more important in the future.

AI STRATEGY AND POLICY

Medium-Term Artificial Intelligence and Society (Seth D. Baum) (summarized by Rohin): Like a previously summarized paper (AN #90), this paper aims to find common ground between near-term and long-term priorities in medium-term concerns. The medium term can be defined along several dimensions of an AI system: when it chronologically appears, how feasible it is to build, how certain it is that we can build it, how capable the system is, how impactful the system is, and how urgent it is to work on it.

The paper formulates and evaluates the plausibility of the medium term AI hypothesis: that there is an intermediate time period in which AI technology and accompanying societal issues are important from both presentist and futurist perspectives. However, it does not come to a strong opinion on whether the hypothesis is true or not.



Linkpost: M21 Review: We Have Normality

LessWrong.com News - June 24, 2020 - 19:10
Published on June 24, 2020 4:10 PM GMT

You can find it here.




Models, myths, dreams, and Cheshire cat grins

LessWrong.com News - June 24, 2020 - 13:50
Published on June 24, 2020 10:50 AM GMT


"she has often seen a cat without a grin but never a grin without a cat"

Let's have a very simple model. There's a boolean, C, which indicates whether there's a cat around; a natural number N, which counts the number of legs on the cat; and a boolean G, which indicates whether the cat is grinning.

There are a few obvious rules in the model, to make it compatible with real life:

  • ¬C → (N=0).
  • ¬C → ¬G.

Or, in other words, if there's no cat, then there are zero cat legs and no grin.

And that's true about reality. But suppose we have trained a neural net to automatically find the values of C, N, and G. Then it's perfectly conceivable that something might trigger the outputs ¬C and G simultaneously: a grin without any cat to hang it on.
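
As a toy illustration, the two rules are easy to state in code; the check below is a hypothetical sketch of my own, assuming a classifier that outputs C, N, and G independently:

```python
# A minimal sketch of the two world-model rules, assuming a hypothetical
# classifier that predicts C, N, and G independently of each other.
def consistent(C: bool, N: int, G: bool) -> bool:
    """No cat implies zero cat legs and no grin."""
    if not C and N != 0:
        return False  # violates ¬C → (N=0)
    if not C and G:
        return False  # violates ¬C → ¬G
    return True

print(consistent(C=True, N=4, G=True))   # True: an ordinary grinning cat
print(consistent(C=False, N=0, G=True))  # False: a grin without a cat
```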

Adversarial examples

Adversarial examples often seem to behave this way. Take for example this adversarial example of a pig classified as an airplane:

Imagine that the neural net was classifying not only "pig" and "airplane", but also other features like "has wings" and "has fur".

Then the "pig-airplane" doesn't have wings, and has fur, which are features of pigs but not airplanes. Of course, you could build an adversarial model that also breaks "has wings" and "has fur", but, hopefully, the more features that need to be faked, the harder it would become.

This suggests that, as algorithms get smarter, they will become more adept at avoiding adversarial examples - as long as the ultimate question is clear. In our real world, the categories of pigs and airplanes are pretty sharply distinct.

We run into problems, though, if the concepts are less clear - such as what might happen to pigs and airplanes if the algorithm optimises them, or how the algorithm might classify underdefined concepts like "human happiness".

Myths and dreams

Define the following booleans: HH detects the presence of a living human head, HB a living human body, JH a living jackal head, JB a living jackal body.

In our real world we generally have HH↔HB and JH↔JB. But set the following values:

¬HH,HB,JH,¬JB,

and you have the god Anubis.

Similarly, what is a dragon? Well, it's an entity such that the following are all true:

{is lizard, is flying, is huge, breath is fire, intelligence is human level, ...}

And, even though those features never go together in the real world, we can put them together in our imagination, and get a dragon.

Note that "is flying" seems more fundamental to a dragon than "has wings", thus all the wingless dragons that fly "by magic". Our imagination seem comfortable with such combinations.

Dreams are always bewildering upon awakening, because they also combine contradictory assumptions. But these combinations are often beyond what our imaginations are comfortable with, so we get things like meeting your mother - who is also a wolf - and handing Dubai to her over the tea cups (that contain milk and fear).

"Alice in Wonderland" seems to be in between the wild incoherence of dream features, and the more restricted inconsistency of stories and imagination.



Discuss

Does NYT have policies?

Новости LessWrong.com - 24 июня, 2020 - 07:06
Published on June 24, 2020 4:06 AM GMT

Does the New York Times have written policies? Does it publish them? Have they leaked?

Here is a list of six public documents. Most interesting are the Ethical Journalism Guidebook/Handbook and the Guidelines on [Our] Integrity. The first mentions three documents: (A) "the Newsroom Integrity Statement, promulgated in 1999"; is this the Guidelines linked above? (B) "the Policy on Confidential Sources, issued in 2004," archived here. Do they still publish it? and (C) "the Rules of the Road," which sounds like a private document not specific to journalism.

Are there other private written policies? Have they leaked? Are there rumors about them?

I don't mean to imply that policies are an unalloyed good. At some level of detail or disorganization, people simply don't learn them. I have a largely unjustified intuition that lying is bad, and that lying about policies is particularly bad, since such lies seem to exist to diffuse responsibility.



Discuss

The Dark Miracle of Optics

Новости LessWrong.com - 24 июня, 2020 - 06:09
Published on June 24, 2020 3:09 AM GMT

Alternate titles:

  • The Public-Private Information Gap Rules Everything Around Me
  • Baudrillard’s Simulacra, Steelmanned
  • “Having your cake and eating it too”
  • The Precarity of Prestige Economies
  • “Goodhart’s is just a subset, mannn.”
  • “Costly signals are just a subset, mannn.”
  • The Tragedy of Appearances
  • On Truth & Lies in a Nonmoral Sense

Epistemic status: no idea how original any of this is; it just connects a lot of nodes in my brain. I’ve been told there’s a real debt to Robert Trivers, which I hope to educate myself on shortly. I may just be reinventing signal theory.

In the beginning there was defection.

We can treat a prisoner’s dilemma as an elegant stand-in for coordination more generally. A one-off dilemma has as its ideal solution defection. Bellum omnium contra omnes: the war of all against all, or, “hyper-individualism.”

At the same time, it is clear that many of the “benefits sought by living things”[1]—which is to say, that which assists survival—are more readily reached by group effort.

Crucially, an iterated prisoner’s dilemma has the opposite optimal equilibrium: tit-for-tat, or cooperation, in its many variations, its various guises. And so the trick becomes how to switch individuals over from one-offs onto iterated dilemmas. The technology which achieves this is reputation, allowing formation of a ledger anchored to names[2], faces, identities. Individuals sharing an ecosystem continually run into each other, and given a reputation ledger, cannot defect and “get away” with it, continuing to freeride into oblivion.[3]

Tit-for-tat is exceedingly simple. It enables mutualism (in symbiosis[4]) and is practiced internally to species as diverse as stickleback fish, tree swallows, all primates, bats. All it requires is a sense of continuous identity and tracking of that identity’s (recent-)historical actions. We can take bats as an example: Since mothers’ hunting returns are unequally distributed, but bat babies do better when consistently fed, mothers communally share food. But should a researcher pump up a mother’s gullet full of air, so it appears she had a strong return but is refusing to share, suddenly the other mothers will no longer feed her children, will allow them to starve.
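
A minimal simulation makes the switch in equilibria concrete; the payoff matrix below is the standard prisoner's dilemma assumption, not something from the sources cited here:

```python
# A minimal sketch of one-off vs iterated dilemmas. The payoff matrix is
# the standard prisoner's dilemma assumption, not anything from this post.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strat_a, strat_b, rounds=100):
    """Play an iterated dilemma; each strategy sees the opponent's history."""
    score_a = score_b = 0
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

tit_for_tat = lambda opp: "C" if not opp else opp[-1]
always_defect = lambda opp: "D"

print(play(tit_for_tat, tit_for_tat))      # (300, 300): mutual cooperation
print(play(always_defect, always_defect))  # (100, 100): war of all against all
print(play(tit_for_tat, always_defect))    # (99, 104): defection gains once, then stalls
```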

We can read human progress as a history of instituting cooperation. The Stele of Hammurabi’s famous law is eye for an eye; many of its other decrees are variations thereof: if a building collapses and kills its occupant, its builder shall be put to death as well. The Old Testament introduces the Commandments, the laws of Exodus. Almost every major religion has its in-house variation on the Golden Rule. These are efforts at securing internal coordination of the group, which, a la “Studies on Slack” and multi-level selection theory, will outcompete other groups once instituted. I have heard from reliable sources that laws in the Quran, and many other major religious texts, have similar structures.

But vanilla tit-for-tat reputational ledgers, like a barter system, are difficult and costly to continuously verify. They require small, local communities of recognition, and prevent civilizations from scaling up. And so there was a need for currency, for credit, for the accumulation, transportation, and commensurability of capital, all of which together say: this individual cooperates. (Or to generalize across contexts, since optics signal more broadly than mere cooperation: this individual carries value; an interaction with her will be positive-sum.) This currency needed to be legible and exchanged across markets, across subcommunities. For these and a thousand other reasons we invented proxies, heuristics, measurements; instituted CVs, letters of recommendation, titles of achievement and nobility, and of course, fashion. But currency is easily counterfeited.

*

Clothing arises to serve object-level purposes: warmth from cold, shelter from sun. But soon it gives rise in turn to fashion: equally tactical, but suited for the social, rather than literal, landscape. (For a social being, both are essential to survival.) Because the garments, the choices of paint pigment, the geometric patterns and metal hanging from the ears reflected both the wealth of the individual and his affiliation to group, they became sources of information for recipients, on-lookers: ways of deducing whole from part, of grokking a person. As social reality devours the physical—in Baudrillard’s terms, simulacra—so fashion devours its mother.

In the Middle-Upper Paleolithic Transition, human societies and economies grow increasingly complex. Trade deals and diplomacy are performed among credible spokesmen, and social hierarchies need preservation across interactions between strangers. Fashion enters as a technology for maintaining and navigating the social graph. “By the production of symbolic artefacts that signified different social groups and kinds of relationships, Aurignacian people were able to maintain wider networks that could exist even between people who had never set eyes on each other,” giving them a competitive advantage. The practice spreads through the law of cultural evolution: “The surface of the body… becomes the symbolic stage upon which the drama of socialisation is enacted, and body adornment… becomes the language through which it was expressed.”[5] We have entered the second stage of simulacra. The territory has a map, and there are parties interested in manipulating it.

*

Or consider the butterfly. A “protected” species (poisonous, inedible, etc) gains a survival advantage through honest signaling of this protection. An honest signal is a Pareto improvement—a win-win. The butterfly does not wish to be eaten; the predator does not wish to eat a toxic insect. How does it evolve this protection?

Brute association. The outward phenotypic expression of the butterfly—its public information—becomes associated with some interior, private information—its toxicity. Let’s say the distinctive pattern is black and red. A predator cannot tell whether an insect is toxic from sight, but it can tell by proxy. In other words, the butterfly develops a reputation.

*

Once this association between optics and essence, between appearance and reality, between signal and quality (the biological frame) or public and private information (the economic one), is formed, it can be freeridden. It becomes, in most cases, easier to pay “lip service”—to outwardly express the associated public characteristic—than it is to develop the private characteristic. This is not entirely the fault of the freerider; it is a difficult situation he finds himself in. Imagine he “chooses” (I’m anthropomorphizing evolution) to remain with his blue and yellow colors: even if his “product” is “good” (I’m mixing metaphors, but I mean to say, his advertising is honest), it will take some time for a trusted association between signal and quality, public and private, to form. As consumers, we may initially disbelieve an advertiser’s claims, and for good reason, since there is incentive to deceive. And thus it is with the sun-basking lizard, deciding which butterfly to eat. Far easier for a precarious insect to ride coattails, to imitate and pretend toward what he is not—and so, quite simply, it does.

The connection with fashion should come into view now. The “barberpole” metaphor of fashion, where lower classes continually imitate higher classes, who are themselves engaged in a continual quest for “distinction” from the chasing masses, is a popular one in rationalist circles for good reason. Its cyclical nature is the result of limited options and a continual evasion of freeriders who exploit an associative proxy: clothing for caste.

*

A quick inventory of where we are: Individuals profit from cooperation, but are always personally incentivized to defect. Reputation ledgers switch us from the one-off tit-for-tat, and its incentivized defection, into an iterated tit-for-tat, and its incentivized cooperation. As civilizations scale, and we wish to do more with what we have, achieve new complexities, we move to an alternate system. A credit system of heuristic and proxy. Thus an individual who wishes to enter the art world will work internships in which she forges relationships of trust, in the hope that she will be recommended. And the employer who takes the recommendation will do so on account of having built up trust with the recommender; this trust is built by history, and its credits are transferable. (Each exchange, of course, comes with a diminishment.) Across many recommendations and positions, across many cities, the accumulating recommendations become virtualized: not only can one fabricate a CV, but one can embellish it, and the latter behavior is so ubiquitous it is hard to call it “cheating,” even though this is what a dishonest signal is. And, at the same time, this intern will find herself competing in a much larger implicit landscape of associations, in which the clothes she wears, the way she speaks, and a hundred other variables come together to—by proxy—provide further evidence of value.

Imagine that a bat mother, instead of having her gullet pumped full of air by researchers, developed a technology to achieve the opposite: to appear as if she had not caught insects, when in reality she had. In other words, to appear as if she were cooperating when in fact she was defecting. To present public information at odds with private information. This bat’s offspring would be most fit, would pass on its genes at higher rates. This bat would have discovered the miracle of optics. But it is a dark, and short-term, miracle: the population as a whole would lose its fitness, as its ability to cooperate diminished.

It is better to cooperate than defect. But it is better still to defect while others around you cooperate: to reap the advantages of coordinated effort while contributing none of the work (freeriding). This behavior is blocked insofar as it is noticed. Social systems are not two-player but N-player games, and resemble public goods games more than prisoner’s dilemmas, and thus even in the presence of parasites, it can be optimal for other players to invest in the pool.[6] But freeriders remain a burden on the system that rational players will wish to eliminate.

While an honest signal is beneficial to all parties involved—it adds true information to the informational ecosystem which actors can base actions on—a dishonest signal is definitionally exploitative. It causes another self-interested actor to behave against its interest, because its premises are malformed. It causes the sun-basking lizard to pass up on the butterfly, believing it to be protected, when in reality, it is only masquerading.

*

This is the tragedy of appearances. The cheater is punished if he is caught cheating; a society which punishes cheaters (or “parasites”) outperforms one which does not; and thus his optimal behavior will always be to cheat and pretend otherwise, to evade enforcers. He can do this by means of appearance, and the more that appearance is selected for, the more easily he can simply pretend, while embodying none of the internal, proxied-for qualities. Freerider situations don’t work when the supporting actor can recognize freeriding; thus the trick, if one wishes to continue freeriding, is to prevent such recognition.

This is the superset of Goodhart-Campbell. The solution is the superset of costly signaling. The greater the divergence in the incentive structure between proxy and proxied, the greater the incentives to optimize for appearance. Thus we can understand politics, where everything “real” is hidden behind a great veil, and public image carefully manipulated. Thus we can understand Baudrillard’s simulacra, at least in its steelmanned form: the first level is honest signaling, a one-to-one relationship between public and private. Levels 2-4 are self-aware manipulations, “complex patterns of strategic interactions,”[7] and if you believe Baudrillard, we are long past naivete, past simple one-to-oneness. An unsophisticated relationship to maps is a departure point, not a finish.

The tragedy of appearances, and our incessant optimization thereof, is a problem society does not yet seem to have stable solutions to. Taleb might admonish us, in Skin In The Game, to never trust a surgeon who looks the part, to never employ a straight-A student—but while wise as manipulations of the current fashion field, these are inherently unstable and contingent solutions. As soon as we followed his advice, we would see surgeons trending toward slovenliness, students strategically achieving B-grades in Bio for the sake of seeming interesting. Those familiar with Goodhart-Campbell know the pattern well, and the only answers are the same: diminish the gap between incentivized appearance and desired behavior. Easier said than done.

Or we might move away from proxy, heuristic, appearance; we might ditch the resume and credential. But would we move ahead or backwards? Would we become more or less encumbered, more or less handicapped? Currency can be more easily counterfeited, a silver finish over a nickel core, a nice embossing. “If it looks the part…” But look at currency’s advantages.

I wrote in a recent comment to Zvi’s post on simulacra:

But the actually toxic butterflies—the original honest signalers—they can't go anywhere. They're just stuck. One might happen to evolve a new phenotype, but that phenotype isn't protected by reputational association, and it's going to take a very long time for the new signal-association to take hold in predators. Once other insects have learned how to replicate the proxy-association or symbol that protected them, they can only wait it out until it's no longer protective.

Thus there is an arms race toward manufacturing and recognizing what can only be called “bullshit,” following Harry Frankfurt. It is speech designed to improve one’s image. And as our world becomes more mediated by representation, it in turn becomes more exploitable. We enter the Simulacra.

[1] Axelrod & Hamilton 1981.

[2] The Wire season 2: Barksdale’s crew develops a bad reputation for their underwhelming H, renames it to ditch the old baggage and keep slinging shitty product.

[3] See “recognize and retaliate.”

[4] Hence the parasite, which is a freerider (or worse).

[5] David Lewis-Williams, The Mind in the Cave: Consciousness and the Origins of Art

[6] Thanks to romeostevensit for pointing me toward related literature.



Discuss

Half-Baked Products and Idea Kernels

Новости LessWrong.com - 24 июня, 2020 - 04:00
Published on June 24, 2020 1:00 AM GMT

When I ask someone at work for a project proposal, I never want the person to go silent on me and put in 100 hours of solitary work, and then finally show me something and ask for my feedback. I always want to see a half-baked product.

You can half-bake something in an hour or two, or even in a few minutes.

The advantage of half-baking is that you get a quick feedback loop. The more you think there’s a possibility that I’ll say “no, that’s not what I wanted”, the more half-baked you should make your first effort before getting my feedback.

When brainstorming ideas, my term for a half-baked idea is a kernel. A kernel is usually a crappy idea on its own, but there’s “something to it” that could make it the seed of a better idea. I encourage people to toss out kernels.

There are two reasons why operating this way is efficient:

Diminishing Returns on Time Spent

Say you work on something for 100 hours. While each hour adds value, typically the highest-value hour is the first hour and the lowest-value hour is the last hour, and it follows a curve like this:

For example, if you're going to spend 100 hours writing a long report, spending one hour to brain-dump the key bullet points would give a reader a lot more than 1% of the final value of your report. Realistically, it’s probably more like 20%.
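
To make that concrete, here is one assumed curve with this shape; the logarithmic form is my own illustration, since the post only claims diminishing returns:

```python
# A minimal sketch, assuming value grows logarithmically with hours invested;
# the exact curve is an assumption - the post only claims diminishing returns.
import math

def value_fraction(hours, total=100):
    """Share of the 100-hour report's value delivered after `hours` of work."""
    return math.log(1 + hours) / math.log(1 + total)

print(round(value_fraction(1), 2))   # ~0.15: the first hour gives ~15% of the value
print(round(value_fraction(10), 2))  # ~0.52: ten hours give about half
```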

So the less time you spend working before getting feedback, the higher your productivity was in that time.

Efficient Course Correction

When you’re starting out on a new project that isn’t well-understood, you’re unlikely to go in the exact right direction. So you don’t want to go too far before getting a course correction, or you’ll waste time.

The top path shows how most people waste their time by investing too much effort between course corrections. The bottom path shows the efficient approach: you do a small chunk of work, then get feedback from your boss or your customers to correct your course, then do the next small chunk of work.

Once your course corrections become small, you can do larger chunks of work between course corrections. Until then, take small steps that produce half-baked products and idea kernels.



Discuss

How do you Murphyjitsu essentially risky activities?

Новости LessWrong.com - 24 июня, 2020 - 00:09
Published on June 23, 2020 9:09 PM GMT

In the CFAR Handbook they have the following process instructions for Murphyjitsu:

  1. Select a goal. A habit you want to install, or a plan you’d like to execute, or a project you want to complete.
  2. Outline your plan. Be sure to list next actions, concrete steps, and specific deadlines or benchmarks. It’s important that you can actually visualize yourself moving through your plan, rather than having something vague like work out more.
  3. Surprise-o-meter. It’s been months, and you’ve made little or no progress! Where are you, on the scale from yeah, that sounds right to I literally don’t understand what happened? If you’re completely shocked—good job, your inner sim endorses your plan! If you’re not, though, go to Step 4.
  4. Pre-hindsight. Try to construct a plausible narrative for what kept you from succeeding. Remember to look at both internal and external factors.
  5. Bulletproofing. What actions can you take to prevent these hypothetical failure modes? Visualize taking those preemptive actions and then ask your inner sim “What comes next?” Have you successfully defused the danger?
  6. Iterate steps 3-5. That’s right—it’s not over yet! Even with your new failsafes, your plan still failed. Are you shocked? If so, victory! If not—keep going.

It seems like this process presumes that mitigations are low-cost and that the project you are attempting is fundamentally achievable according to your inner sim. Most of this presumption is contained in step 3. I've been thinking about how to apply this process to projects in a professional context (rather than a "self-help" context, I guess), and in many cases you face costly tradeoffs regarding derisking mitigations. Also, sometimes your project may just be a big bet.

How do you change Murphyjitsu to work in such situations? Also, if people have experience using Murphyjitsu in projects (e.g. a 1-3 month project involving a small team of people), I'd be interested in learning how it's different.



Discuss

Modelling Continuous Progress

Новости LessWrong.com - 23 июня, 2020 - 21:06
Published on June 23, 2020 6:06 PM GMT

I have previously argued for two claims about AI takeoff speeds. First, almost everyone agrees that if we had AGI, progress would be very fast. Second, the major disagreement is between those who think progress will be discontinuous and sudden (such as Eliezer Yudkowsky, MIRI) and those who think progress will be very fast by normal historical standards but continuous (Paul Christiano, Robin Hanson).

What do I mean by ‘discontinuous’? If we were to graph world GDP over the last 10,000 years, it fits onto a hyperbolic growth pattern. We could call this ‘continuous’ since it is following a single trend, or we could call it ‘discontinuous’ because, on the scale of millennia, the industrial revolution exploded out of nowhere. I will call these sorts of hyperbolic trends ‘continuous, but fast’, in line with Paul Christiano, who argued for continuous takeoff, defining it this way:

AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate).

I’ll be using Paul’s understanding of ‘discontinuous’ and ‘fast’ here. For progress in AI to be discontinuous, we need a switch to a new growth mode, which will show up as a step function in the capability of AI or in the rate of change of the capability of the AI over time. For takeoff to be fast, it is enough that there is one single growth mode that is hyperbolic or some other function that is very fast-growing.

This post tries to build on a simplified mathematical model of takeoff which was first put forward by Eliezer Yudkowsky and then refined by Bostrom in Superintelligence, modifying it to account for the different assumptions behind continuous, fast progress as opposed to discontinuous progress. As far as I can tell, few people have touched these sorts of simple models since the early 2010s, and no one has tried to formalize how newer notions of continuous takeoff fit into them. I find that it is surprisingly easy to accommodate continuous progress and that the results are intuitive and fit with what has already been said qualitatively about continuous progress.

The code for the model can be found here.

The Model

The original mathematical model was put forward here by Eliezer Yudkowsky in 2008:

In our world where human brains run at constant speed (and eyes and hands work at constant speed), Moore’s Law for computing power s is:

s = R(t) = e^t

...So to understand what happens when the Intel engineers themselves run on computers (and use robotics) subject to Moore’s Law, we recursify and get:

dy/dt = s = R(y) = e^y

Here y is the total amount of elapsed subjective time, which at any given point is increasing according to the computer speed s given by Moore’s Law, which is determined by the same function R that describes how Research converts elapsed subjective time into faster computers. Observed human history to date roughly matches the hypothesis that R is exponential with a doubling time of eighteen subjective months (or whatever).

In other words, we start with

I(t) = e^t

because we assume speed is a reasonable proxy for general optimization power, and progress in processing speed is currently exponential. Then, when intelligence gets high enough, the system becomes capable of applying its intelligence to improving itself; the graph ‘folds back in on itself’ and we get

dI/dt = I′(t) = e^I

which forms a positive singularity. The switch between these two modes occurs when the first AGI is brought online.

In Superintelligence, Nick Bostrom gave a different treatment of the same idea:

For Bostrom, we have

I′(t) = optimization / recalcitrance

where recalcitrance is how hard the system resists the optimization pressure applied to it. Given that (we assume) we are currently applying a roughly constant pressure to improve the intelligence of our systems, but intelligence is currently increasing exponentially (again, equating computing speed with intelligence), recalcitrance declines as the inverse of the system’s intelligence, and the current overall rate of change of intelligence is given by

I′(t) = cI.

When recursive self-improvement occurs, the applied optimization pressure is equal to the outside world's contribution plus the system's own intelligence:

I′(t) = (c+I)I = cI + I^2

If we make a like-for-like comparison of Bostrom's and Yudkowsky's equations, we get I′(t) = e^I for Yudkowsky and I′(t) = I^2 for Bostrom in the RSI condition. These aren't as different as they seem - Yudkowsky's solves to give I(t) = −ln(c−t) and Bostrom's gives I(t) = 1/(c−t), the derivative of Yudkowsky's! Both reach positive infinity in finite time.
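
These closed forms are easy to verify symbolically; here is a quick check of my own (not the post's linked code):

```python
# A quick symbolic check of the closed-form claims above (my own sketch,
# not the post's linked code).
import sympy as sp

t, c = sp.symbols('t c')
I_yud = -sp.log(c - t)   # claimed solution of I'(t) = e^I
I_bos = 1 / (c - t)      # claimed solution of I'(t) = I^2

print(sp.simplify(sp.diff(I_yud, t) - sp.exp(I_yud)))  # 0: solves Yudkowsky's ODE
print(sp.simplify(sp.diff(I_bos, t) - I_bos**2))       # 0: solves Bostrom's ODE
print(sp.simplify(sp.diff(I_yud, t) - I_bos))          # 0: Bostrom's is the derivative
```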

These models are, of course, very oversimplified - Bostrom's does acknowledge the possibility of diminishing returns on optimization, although he thinks current progress suggests accelerating returns. 'All models are wrong but some are useful' - and there does seem to be some agreement that these models capture some of what we might expect on the Bostrom/Yudkowsky model.

I'm going to take Bostrom's equation, since it clearly shows how outside-system and inside-system optimization combine in a way that Yudkowsky's doesn't, and try to incorporate the assumptions behind continuous takeoff, as exemplified by this from Paul Christiano:

Powerful AI can be used to develop better AI (amongst other things). This will lead to runaway growth. This on its own is not an argument for discontinuity: before we have AI that radically accelerates AI development, the slow takeoff argument suggests we will have AI that significantly accelerates AI development (and before that, slightly accelerates development).

We model this by, instead of simply switching from I′(t) = cI to I′(t) = cI + I^2 when RSI becomes possible, having a continuous change function that depends on I - for low values of this function, only a small fraction of the system's intelligence can be usefully exerted to improve its intelligence, because the system is still in the regime where AI only 'slightly accelerates development'.

I′(t) = cI + f(I)I^2

This f(I) needs to satisfy several properties to be realistic - it has to be bounded between 0 (our current situation, with no contribution from RSI) and 1 (all the system's intelligence can be applied to improving its own intelligence), and depend only on the intelligence of the system. The most natural choice, if we assume RSI is like most other technological capabilities, is the logistic curve.

f(I) = 1 / (1 + e^(−d(I(t) − I_AGI)))

where d is the strength of the discontinuity - if d is 0, f(I) is always fixed at a single value (this should be 0.5, but I forced it to be 0 in the code). If d is infinite then we have a step function - discontinuous progress from cI to (c+I)I at AGI, exactly as Bostrom's original scenario describes. For values between 0 and infinity we have varying steepnesses of continuous progress. I_AGI is the intelligence level we identify with AGI. In the discontinuous case, it is where the jump occurs; in the continuous case, it is the centre of the logistic curve. Here I_AGI = 4.
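
As a sketch of how d behaves (using the post's I_AGI = 4; the sample values of d and I are my own illustration):

```python
# A minimal sketch of the logistic interpolation, using the post's I_AGI = 4;
# the sample values of d and I are illustrative.
import numpy as np

def f(I, d, I_AGI=4.0):
    """Fraction of the system's intelligence usable for self-improvement."""
    return 1.0 / (1.0 + np.exp(-d * (I - I_AGI)))

for d in (0.5, 2.0, 10.0):
    print(d, [round(float(f(I, d)), 3) for I in (2.0, 4.0, 6.0)])
# Larger d gives a sharper transition around I_AGI; d -> infinity is a step function.
```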


All of this together allows us to build a (very oversimplified) model of some different takeoff scenarios, in order to examine the dynamics. The variables we have available to adjust are,

    • I_AGI - the approximate capability level required for RSI
    • d - how sudden a breakthrough RSI is
    • I_0 - the initial capability of the system
    • c - the strength of the constant optimization pressure applied to the system by the outside world

Discontinuities

Varying d between 0 (no RSI) and infinity (a discontinuity) while holding everything else constant looks like this:

If we compare the trajectories, we see two effects - the more continuous the progress is (lower d), the earlier we see growth accelerating above the exponential trend-line (except for slow progress, where growth is always just exponential) and the smoother the transition to the new growth mode is. For d=0.5, AGI was reached at t=1.5 but for discontinuous progress this was not until after t=2. As Paul Christiano says, slow takeoff seems to mean that AI has a larger impact on the world, sooner.

Or, compare the above with the graph in my earlier post, where the red line is the continuous scenario:

If this model can capture the difference between continuous and discontinuous progress, what else can it capture? As expected, varying the initial optimization pressure applied and/or the level of capability required for AGI tends to push the timeline earlier or later without otherwise changing the slope of the curves - you can put that to the test by looking at the code yourself, which can be found here.

Speed

Once RSI gets going, how fast will it be? This is an additional question which is connected to the question of discontinuities, but not wholly dependent on it. We have already seen some hints about how to capture post-RSI takeoff speed - to model a slow takeoff I forced the function f(I) to always equal 0. Otherwise, the function f(I) is bounded at 1. Suppose the speed of the takeoff is modelled by an additional scaling factor behind f(I) which controls how powerful RSI is overall - if this speed s is above 1, then the AGI can exert optimization pressure disproportionately greater than its current intelligence, and if s is below 1 then the total amount of optimization RSI can exert is bounded below the AGI's intelligence:

I′(t) = cI + s·f(I)·I^2 = cI + sI^2 / (1 + e^(−d(I(t) − I_AGI)))
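
A minimal sketch of integrating this full model numerically; the parameter values are illustrative assumptions, not taken from the post's linked code:

```python
# A minimal sketch of integrating the full model with scipy; parameter values
# are illustrative assumptions, not taken from the post's linked code.
import numpy as np
from scipy.integrate import solve_ivp

def takeoff(t, I, c=1.0, d=2.0, s=1.0, I_AGI=4.0):
    f = 1.0 / (1.0 + np.exp(-d * (I - I_AGI)))
    return c * I + s * f * I**2

# Stop before the finite-time singularity these dynamics produce.
sol = solve_ivp(takeoff, t_span=(0.0, 1.4), y0=[1.0], max_step=0.01)
print(sol.t[-1], float(sol.y[0][-1]))  # already well above the e^t trend (e^1.4 ≈ 4.1)
```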

Here are two examples with s = 2 and 0.5:


I'm less sure than in the previous section that this captures what people mean by varying takeoff speed independently of discontinuity, but comparing the identical colours across the two graphs above seems a reasonable fit to my intuitions about what takeoffs with the same level of continuity but differing 'speed' would look like.

Conclusion

I've demonstrated a simple model of AI takeoff with two main variables, the suddenness of the discontinuity d and the overall strength of the RSI s, with a few simplifying assumptions - that Bostrom's 2012 model is correct and that RSI progress, like most technologies, follows a logistic curve. This model produces results that fit with what proponents of continuous progress expect - progress to superintelligence is smoother with more warning time, and occurs earlier.



Discuss

[META] Building a whisper network.

Новости LessWrong.com - 23 июня, 2020 - 17:12
Published on June 23, 2020 2:12 PM GMT

The recent disappearance of Slate Star Codex made me realise that censorship is a real threat to the rationalist community. Not hard, government-mandated censorship, but censorship in the form of online mobs prepared to harass and threaten those seen to say the wrong thing.

The current choice for a rationalist with a controversial idea seems to be to publish it online, where the most angry mobs from around the world can access it easily, or not to publish at all.

My solution, digital infrastructure for a properly anonymous, hidden rationalist community.

Related to kolmogorov-complicity-and-the-parable-of-lightning (Now also deleted, but here are a few people discussing it)

https://www.quora.com/How-has-religion-hindered-scientific-progress-during-the-reformation-period

https://www.reddit.com/r/slatestarcodex/comments/78d8co/kolmogorov_complicity_and_the_parable_of_lightning/

So we need to create the social norms and digital technologies to allow good rationalist content to be created without fear of mobs. My current suggestions include.

Security

1) Properly anonymous publishing. Each post that is put into this system is anonymous. If a rationalist posts many posts, then subtle clues about their identity could add up, so make each post independently anonymous: given a specific post, you can't find others by the same author. Record nothing more than the actual post. With many rationalists putting posts into this system, and none of the posts attributable to a specific person, mobs won't be able to find a target. And no one knows who is putting posts into the pool at all.

2) Delay all published posts by a random time of up to a week; we don't want to give away information about your timezone, do we?

3) Only certain people can access the content. Maybe restrict viewing to people with enough less wrong karma. Maybe rate limit it to 4 posts a day or something, to make it harder to scrape the whole anonymous site. (Unrestricted anonymous posting, restricted viewing is an unusual dynamic)

4) Of course only some posts will warrant such levels of paranoia, so maybe these features could be something that can be turned on and off independently.

My current model of online mobs is that they are not particularly good at updating on subtle strands of evidence and digging around online. One person who wants to stir up a mob does the digging, and then posts the result somewhere obvious. This raises the possibility of misinformation: if we can't stop one person putting our real name and address on a social media post where mobs can pass it around, could we put out false names and addresses instead?

Preventing Spam

1) Precondition GPT-X on a sample of rationalist writings. Precondition another copy on samples of spam. Anything that causes more surprise on the rationalist net than on the spam net is probably spam. (In general: AI-based spam filtering; see the first sketch after this list.)

2) Reuse codes. When you submit a post, you can also put in a random codeword. Posts are given preferential treatment by the spam filter if they carry a code that also accompanied known good posts. Codewords are hashed and salted before being stored on the server, along with a number representing reputation, and are never shown. Posts are stored with their reputation plus a small random offset. (Second sketch below.)

3) Changing password. Every hour, generate a new password. Whenever anyone with enough karma requests any page, put the hour's password in small text at the bottom of the page (or even in an HTML comment). When someone uses this password, you know it was someone who visited some LessWrong page in the last hour, but you can't tell who. You could restrict viewing with the same password. (Third sketch below.)
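Point 1 might look roughly like this with today's tools (a sketch substituting GPT-2 for 'GPT-X'; the two fine-tuned checkpoint paths are hypothetical):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# Hypothetical checkpoints: one model fine-tuned on rationalist
# writing, one on known spam.
rationalist_net = GPT2LMHeadModel.from_pretrained("./rationalist-finetune").eval()
spam_net = GPT2LMHeadModel.from_pretrained("./spam-finetune").eval()

def surprise(model: GPT2LMHeadModel, text: str) -> float:
    """Average per-token negative log-likelihood of text under model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

def looks_like_spam(text: str) -> bool:
    # More surprising to the rationalist net than to the spam net.
    return surprise(rationalist_net, text) > surprise(spam_net, text)
```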
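The salted, hashed codewords of point 2 could be stored roughly like this (a sketch; the storage layout and names are my own):

```python
import hashlib
import secrets

# Server-side store: salted hash -> (salt, reputation). The raw
# codeword never touches disk.
codes: dict[bytes, tuple[bytes, float]] = {}

def _digest(codeword: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", codeword.encode(), salt, 100_000)

def _find(codeword: str) -> bytes | None:
    # Per-entry salts force a linear scan; fine for a sketch.
    for digest, (salt, _rep) in codes.items():
        if _digest(codeword, salt) == digest:
            return digest
    return None

def record_good_post(codeword: str) -> None:
    """Bump a codeword's reputation once a post carrying it is judged good."""
    digest = _find(codeword)
    if digest is None:
        salt = secrets.token_bytes(16)
        codes[_digest(codeword, salt)] = (salt, 1.0)
    else:
        salt, rep = codes[digest]
        codes[digest] = (salt, rep + 1.0)

def stored_reputation(codeword: str) -> float:
    """Reputation plus a small random offset, as attached to each post."""
    digest = _find(codeword)
    rep = codes[digest][1] if digest else 0.0
    return rep + secrets.randbelow(100) / 100.0
```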
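And the hourly password of point 3 need not be stored at all; it can be derived from a server secret (a sketch; the secret and the 16-character truncation are arbitrary choices):

```python
import hashlib
import hmac
import time

SERVER_SECRET = b"replace-with-a-real-secret"  # hypothetical

def hourly_password(offset_hours: int = 0) -> str:
    """Derive this hour's password. Embedding it in pages served to
    high-karma users proves a submitter loaded some page recently,
    without revealing which user they are."""
    hour = int(time.time() // 3600) + offset_hours
    return hmac.new(SERVER_SECRET, str(hour).encode(), hashlib.sha256).hexdigest()[:16]

def password_is_valid(candidate: str) -> bool:
    # Accept the current and previous hour to tolerate clock skew.
    return any(hmac.compare_digest(candidate, hourly_password(o)) for o in (0, -1))
```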

I look forward to a discussion of which cryptographic protocols are most suitable for building The Whisper.



Discuss

Old-world Politics Fallacy

Новости LessWrong.com - 23 июня, 2020 - 15:32
Published on June 23, 2020 12:32 PM GMT

When a nasty political problem (like the current SSC situation) hits my consciousness, I'm habituated to Do Something About It. I feel an urge to investigate the political climate, find allies, and fight back against the threat. In the current Internet age, and with my nonexistent political clout and social influence (especially since I live in Iran, and people here generally can't be counted on to know or care about global politics), the substitution bias kicks in. I substitute The Real Thing (to which my contribution would be very meager, if any) with cheaper lookalikes: reading online forums (Reddit, Twitter, LessWrong, SSC, Hacker News, ...) in place of "investigate the political climate"; upvoting the Correct posts in said forums and writing the occasional low-effort reply to particularly egregious pieces in place of "find allies, and fight back"; and generally feeling bad that I have failed the Mission (and that Society is broken).

I speculate that this fallacy has evolutionary roots. In a hunter-gatherer tribe, a person such as myself (I estimate myself to be upper-mid status) would have had a fair chance of effecting political change against causes that mostly hurt everyone except a minority of politicians who don't produce much value anyway. Especially since I (and my family) have been quite morally upstanding and honest: in a small community, we would have had a reputation to draw on. Even if not, engaging with the community would have been the essential first step in joining a coalition, necessary for survival.

Obviously, in the 21st century, all this is moot for political stuff that actually matters. Most people are quite powerless to affect those matters, one reason being that the important issues now affect orders of magnitude more people. I don't know what the optimal strategy currently is. My gut feeling is that a lot of the good people are de-politicizing themselves and simply giving up. (Hasn't Scott's default defensive strategy been more (self-)censorship?) Individual contributory power being what it is, this might actually be the best heuristic. If political "activism" consists mostly of low-skill, low-reputation noise-making, a game of quantity over quality, then "good" people immediately lose their comparative advantage. In fact, in Iran, the situation seems to be that the only marginally effective activism is violence, which, in an authoritarian regime, obviously leaves the hungry and the criminal to fend off the evil. That the second derivative of their numbers is positive because of abject government failure does not exactly give me hope.

To summarize: the old-world politics fallacy is the mistaken alief that you have meaningful political power in the 21st century. It often manifests, via the substitution bias, as fervent but ultimately zero-impact digital activity. Obviously, the fallacy leaves a bitter taste when you do notice that your efforts bore no fruit.

A related phenomenon might be our largely unwarranted interest in, and motivation for, participating in social media that engage mostly with strangers (the two are subtly different). We are wired to find coalitions of like-minded people and join them, because like-minded people's values align better with our own. The modern era, by allowing our freedom and individuality to flourish, has let us become ever more different from one another. Social media empower us to find and communicate with people much more precisely than is possible in the physical world. These two trends push us to seek coalition-building in cyberspace, when cyberspace does not (yet) facilitate that function. This will necessarily cause us to overvalue our social media interactions, because we hold the alief that we are accumulating social capital when no such thing is happening.

I don't know what to call this cognitive error. "Pseudo-socializing?"



Discuss

Personal experience of coffee as nootropic

Новости LessWrong.com - 23 июня, 2020 - 15:32
Published on June 23, 2020 12:00 PM GMT

My life took a turn in 2015. That was when I first started drinking coffee on a regular basis. Not Americanos or lattes, but 'Kopi': brewed with higher-caffeine Robusta beans using a 'sock' immersed in water close to boiling, extracting massive amounts of caffeine along with strong bitter notes that are then neutralized by sweet, sugary condensed milk.

My rate of learning in sports shot up seemingly overnight. My field of vision and attention increased tremendously. I gained more stamina and muscle control. It wasn't just a subjective feeling. The results showed in the speed and accuracy of the pitches I threw. The first training session I had with coffee was so productive, I almost swore that I would never train without it again.

Seeing its benefits, I began to use it for my studies as well. The hit of dopamine I got from starting problem sets with a cup of kopi was like nothing I'd experienced before. While in the past I had to start slow and enter 'flow' gradually, this was a shortcut to the flow state. Sure, learning didn't happen instantly, and I still had to struggle, but I enjoyed the process. I felt unnaturally productive; my absorption of information shot up, and I could go at it for progressively longer hours. This was my NZT pill.

Gradually, the more I was able to accomplish, the higher the aims I set for myself. I couldn't believe it: as a result of drinking coffee, my ambition increased, because my assessment of my own learning ability and productivity rose. I had other strong reasons for working hard, but the biochemical changes that occurred in my brain and body after ingesting coffee acted as an emotional multiplier on those reasons.

However, this is where the complications came in. I slowly built up a heavy tolerance to it. Where before it had raised my energy and loosened my social inhibitions, I now started "zoning out" around friends, craving the focused mode in which I could work on my studies. Initially, I attributed the gradual shelling-up to the long hours of study and the mental stress of the major exam at the end of the year. Only after abstaining from coffee for a period of time did I think to link that never-before-experienced social discomfort to coffee.

In hindsight, I realised that I might have been experiencing a mild but non-negligible form of anxiety from work-related stress, compounded by the spikes in cortisol and adrenaline my body produced in reaction to coffee. The initial spike of motivation in getting things done was offset by the dreaded caffeine crash. The sharp contrast in focus and pleasure between pre-coffee and post-coffee states was what got me into daily consumption, but in the long run, consuming coffee at that rate was a net detriment.

This leads to a question I have been trying to solve and would like to pose to the community: if a common beverage like coffee can cause such a marked increase in life satisfaction and work/study output, what other foods or practices can lead to the same changes in subjective state while being sustainable on a time scale of decades?



Discuss

New York Times, Please Do Not Threaten The Safety of Scott Alexander By Revealing His True Name

Новости LessWrong.com - 23 июня, 2020 - 15:20
Published on June 23, 2020 12:20 PM GMT

In reaction to (Now the entirety of SlateStarCodex): NYT Is Threatening My Safety By Revealing My Real Name, So I Am Deleting The Blog

I have sent the following to New York Times technology editor Pui-Wing Tam, whose email is pui-wing.tam@nytimes.com:

My name is Zvi Mowshowitz. I am a friend of Scott Alexander. I grew up with The New York Times as my central source of news and greatly value that tradition.

Your paper has declared that you intend to publish, in The New York Times, the true name of Scott Alexander. Please reconsider this deeply harmful and unnecessary action. If Scott’s name were well-known, it would likely make it more difficult or even impossible for him to make a living as a psychiatrist, which he has devoted many years of his life to being able to do. He has received death threats, and would likely not feel safe enough to continue living with other people he cares about. This may well ruin his life.

At a minimum, and most importantly for the world, it has already taken down his blog. In addition to this massive direct loss, those who know what happened will know that this happened as a direct result of the irresponsible actions of The New York Times. The bulk of the best bloggers and content creators on the internet read Scott’s blog, and this will create large-scale permanent hostility to reporters in general and the Times in particular across the board.

I do not understand what purpose this revelation is intended to serve. What benefit does the public get from this information?

This is not news that is fit to print.

If, as your reporter who has this intention claims, you believe that Scott provides a valuable resource that enhances the quality of our discourse, scientific understanding and lives, please reverse this decision before it is too late.

If you don’t believe this, I still urge you to reconsider your decision in light of its other likely consequences.

We should hope it is not too late to fix this.

I will be publishing this email as an open letter.

Regards,
Zvi Mowshowitz

PS for internet: If you wish to help, here is Scott’s word on how to help:

There is no comments section for this post. The appropriate comments section is the feedback page of the New York Times. You may also want to email the New York Times technology editor Pui-Wing Tam at pui-wing.tam@nytimes.com, contact her on Twitter at @puiwingtam, or phone the New York Times at 844-NYTNEWS.

(please be polite – I don’t know if Ms. Tam was personally involved in this decision, and whoever is stuck answering feedback forms definitely wasn’t. Remember that you are representing me and the SSC community, and I will be very sad if you are a jerk to anybody. Please just explain the situation and ask them to stop doxxing random bloggers for clicks. If you are some sort of important tech person who the New York Times technology section might want to maintain good relations with, mention that.)

If you are a journalist who is willing to respect my desire for pseudonymity, I’m interested in talking to you about this situation (though I prefer communicating through text, not phone). My email is scott@slatestarcodex.com.



Discuss
