LessWrong.com News

A community blog devoted to refining the art of rationality

IRL 5/8: Maximum Causal Entropy IRL

April 4, 2019 - 13:53
Published on April 4, 2019 10:53 AM UTC

Every Monday for 8 weeks, we will be posting lessons about Inverse Reinforcement Learning. This is lesson 5. We are publishing it only now because releasing it earlier would have interfered with the randomized controlled trial we are running; we finished collecting responses from the participants yesterday. A LW post with the results will appear in a few days. Future IRL lessons will resume normally on Monday.

This lesson comes with the following supplementary material:

Have a nice day!

Discuss

Could waste heat become an environment problem in the future (centuries)?

April 4, 2019 - 11:57
Published on April 3, 2019 2:48 PM UTC

I have wondered about this scenario for a while, and would like to know your opinion about it. Its assumptions are quite specific and probably won't all hold, but they seem realistic enough to me.

(1): Assume that nuclear fusion becomes an available energy source within a couple of centuries, providing a cheap, plentiful, emission-free, and long-lasting source of energy for human activities.

(If this assumption is wrong, we are probably in trouble)

(2): Assume that continued economic/technological development requires increasing energy consumption indefinitely.

(This is probably wrong if we utilise completely new physics in the future, but I don't think this assumption is unlikely)

(3): Assume that the generation of waste heat during energy generation/consumption cannot be dramatically lowered in the short-term future.

(This is also probably wrong. But it will hold true if we still have to use machines/engines/generators based on the same design principles as today, and I don't see that changing any time soon)

The logical conclusion from the above three assumptions:

At some point after the implementation of nuclear fusion, humanity's energy consumption might reach a level so high that the waste heat we release into the atmosphere will be altering the Earth's climate system not unlike what our carbon emissions are doing today.

(Since the fuel for any future fusion plant is likely hydrogen from seawater, for Earth it probably acts as an extra heat source independent of the sun)

The Earth is functionally a giant spacecraft, and spacecraft usually have very sophisticated heat management systems to prevent them from overheating, so perhaps we will have to deal with that as well.

I haven't done much number crunching yet, so I might have gotten the figures wildly wrong.

We know that today the solar energy the Earth receives per year is roughly 5,000 times the energy humanity consumes.

If humanity's energy consumption increases 100 times, and 50% of the energy is released into the atmosphere as waste heat, then we are releasing ~1% of solar energy into the atmosphere as heat.

That might have some serious climate implications if lasting for a long time, but I'm not certain about that yet.
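The back-of-envelope estimate above can be reproduced in a few lines. This is a minimal sketch that uses only the post's own figures (solar input about 5,000 times current consumption, a 100x increase, 50% of the energy released as atmospheric waste heat). The 2% annual growth rate used to ask how long a 100x increase would take is a hypothetical assumption, not a figure from the post, and the function names are illustrative.

```python
import math

# The post's rough figure: solar energy reaching Earth is ~5,000x current human use.
SOLAR_TO_HUMAN_RATIO = 5000.0


def waste_heat_fraction(consumption_multiplier: float, waste_fraction: float,
                        solar_ratio: float = SOLAR_TO_HUMAN_RATIO) -> float:
    """Waste heat released, as a fraction of the solar energy Earth receives."""
    # Future consumption relative to solar input, times the share released as heat.
    return consumption_multiplier / solar_ratio * waste_fraction


def years_to_multiply(multiplier: float, annual_growth: float) -> float:
    """Years of compound growth needed to multiply energy use by `multiplier`."""
    return math.log(multiplier) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # The post's scenario: 100x today's consumption, 50% released as waste heat.
    print(f"waste heat = {waste_heat_fraction(100, 0.5):.1%} of solar input")  # 1.0%
    # Hypothetical assumption (not from the post): 2% annual growth in energy use.
    print(f"years to reach 100x at 2%/yr: {years_to_multiply(100, 0.02):.0f}")  # 233
```

Under that assumed growth rate a 100x increase takes a little over two centuries, consistent with the "centuries" timescale in the title; a different growth rate shifts that figure substantially, while the 1% result depends only on the post's three numbers.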

Possible solutions:

(1): Geoengineering, which seems to be the obvious option. We try to reduce the solar energy input to Earth when the heat we release becomes too much. But that would probably harm the biosphere considerably by reducing photosynthesis.

(2): Set "energy consumption targets" for countries/firms/etc like current climate policy.

Problem: while countries can continue to develop their economies and technology without increasing carbon emissions (by adopting clean energy, etc.), a limit on energy consumption seems to be a hard cap on a country's development that cannot be worked around. So probably no one would comply with such an agreement...

(3): Colonising other planets/solar systems

Each colony would also have to face that problem.

The Earth (and any other planet/moon we colonise) seems to be functionally the same as a giant space station. And space stations need sophisticated maintenance systems, including management of waste heat.


A significant idea

April 4, 2019 - 01:15
Published on April 3, 2019 10:15 PM UTC

Imagine a person in the ancient world who came up with the following idea: "What would the sun and moon look like if they were very very far away?" This idea would likely lead to the conclusion that they would look like tiny points of light, which then could lead to the question "What if the tiny points of light we call stars and planets are actually faraway suns and moons?"

Unfortunately, our ancient friend would likely be stuck at that point, due to the limitations of human vision and the lack of proper instruments for examining the nature of celestial objects. But our friend would be right, unlike nearly every other human until Giordano Bruno's cosmology of 1584.

My questions then are, what other ideas of similar power exist, how will we know them if we find them, and is there any way to search for them intentionally?


Rationality Dojo

April 4, 2019 - 00:43
Published on April 3, 2019 9:43 PM UTC


On AI and Compute

April 3, 2019 - 22:20
Published on April 3, 2019 7:00 PM UTC

This is a post on OpenAI’s AI and Compute piece, as well as excellent responses by Ryan Carey and Ben Garfinkel, Research Fellows at the Future of Humanity Institute.

Intro: AI and Compute

Last May, OpenAI released an analysis of AI progress that blew me away. The key takeaway is this: the computing power used in the biggest AI research projects has been doubling every 3.5 months since 2012. That means that more recent projects like AlphaZero have tens of thousands of times the “compute” behind them as something like AlexNet did in 2012.

When I first saw this, it seemed like evidence that powerful AI is closer than we think. Moore’s Law doubled generally-available compute about every 18 months to 2 years, and has resulted in the most impressive achievements of the last half century. Personal computers, mobile phones, the Internet...in all likelihood, none of these would exist without the remorseless progress of constantly shrinking, ever cheaper computer chips, powered by the mysterious straight line of Moore’s Law.

So with a doubling cycle for AI compute that’s more than five times faster (let’s call it AI Moore’s Law), we should expect to see huge advances in AI in the relative blink of an eye...or so I thought. But OpenAI’s analysis has led some people to the exact opposite view.[1]
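The gap between the two doubling rates compounds quickly. A minimal sketch, using the post's 3.5-month doubling time for AI compute and the 18-to-24-month range for Moore's Law; the function name and the five-year horizon are illustrative choices, not figures from OpenAI's analysis.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` at a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


AI_DOUBLING = 3.5          # months, the "AI Moore's Law" figure quoted in the post
MOORE_DOUBLING = (18, 24)  # months, the range quoted for Moore's Law

if __name__ == "__main__":
    horizon = 60  # five years, an illustrative horizon
    print(f"AI compute over 5 years: x{growth_factor(horizon, AI_DOUBLING):,.0f}")
    for d in MOORE_DOUBLING:
        print(f"Moore's Law ({d}-month doubling) over 5 years: x{growth_factor(horizon, d):.1f}")
    # Ratio of doubling times: 18/3.5 is about 5.1 and 24/3.5 about 6.9 ("more than five times faster").
    print([round(d / AI_DOUBLING, 1) for d in MOORE_DOUBLING])
```

Over five years the 3.5-month doubling compounds to more than a hundredfold thousand (over 100,000x), against roughly 6x to 10x for Moore's Law.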

Interpreting the Evidence

Ryan Carey points out that while the compute used in these projects is doubling every 3.5 months, the compute you can buy per dollar is growing around 4-12 times slower. The trend is being driven by firms investing more money, not (for the most part) inventing better technology, at least on the hardware side. This means that the growing cost of projects will keep even Google and Amazon-sized companies from sustaining AI Moore’s Law for more than roughly 2.5 years. And that’s likely an upper bound, not a lower one; companies may try to keep their research budgets relatively constant. This means that increased funding for AI research would have to displace other R&D, which firms will be reluctant to do.[2] But for lack of good data, for the rest of the post I’ll assume we’ve more or less been following the trend since the publication of “AI and Compute”.[3]
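Carey's point can be made concrete: since compute per run equals spending times compute per dollar, the doubling rates add, and spending has to cover whatever hardware does not. A minimal sketch follows. The reading of "4-12 times slower" as a price-performance doubling time 4 to 12 times longer than 3.5 months is an assumption, as are the function names.

```python
def spend_doubling_months(compute_doubling: float, price_perf_doubling: float) -> float:
    """Doubling time (months) of spending needed to sustain a compute trend.

    Compute = spending * (compute per dollar), so doubling *rates* (1/T) add:
    1/T_compute = 1/T_spend + 1/T_price_perf.
    """
    rate = 1.0 / compute_doubling - 1.0 / price_perf_doubling
    if rate <= 0:
        raise ValueError("price-performance alone sustains the trend; spending need not grow")
    return 1.0 / rate


def spend_multiplier(months: float, compute_doubling: float, price_perf_doubling: float) -> float:
    """Factor by which spending must grow over `months` to stay on the compute trend."""
    return 2.0 ** (months / spend_doubling_months(compute_doubling, price_perf_doubling))


if __name__ == "__main__":
    AI_DOUBLING = 3.5  # months, the "AI and Compute" trend
    # Assumption: "4-12 times slower" read as a price-performance doubling time 4-12x longer.
    for slowdown in (4, 12):
        ppd = AI_DOUBLING * slowdown
        t = spend_doubling_months(AI_DOUBLING, ppd)
        growth = spend_multiplier(30, AI_DOUBLING, ppd)  # 30 months = 2.5 years
        print(f"{slowdown}x slower: spending doubles every {t:.1f} months, x{growth:.0f} over 2.5 years")
```

Under that reading, spending per run must grow roughly 86x to 232x over 2.5 years, which is why the trend runs into the budgets of even the largest firms.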

While Carey thinks that we’ll pass some interesting milestones for compute during AI Moore’s Law which might be promising for research, Ben Garfinkel is much more pessimistic. His argument is that we’ve seen a certain amount of progress in AI research recently, so realizing that it’s been driven by huge increases in compute means we should reconsider how much adding more will advance the field. He adds that this also means AI advances at the current pace are unsustainable, agreeing with Carey. Both of their views are somewhat simplified here, and worth reading in full.

Thoughts on Garfinkel

To address Garfinkel’s argument, it helps to be a bit more explicit. We can think of the compute in an AI system and the computational power of a human brain as mediated by the effectiveness of their algorithms, which is unknown for both humans and AI systems. The basic equation is something like: Capability = Compute * Algorithms. Once AI Capability reaches a certain threshold, “Human Brain,” we get human-level AI. We can observe the level of Capability that AI systems have reached so far (with some uncertainty), and have now measured their Compute. My initial reaction to reading OpenAI’s piece was the optimistic one - AI Capability must be higher than we thought, since Compute is so much higher! Garfinkel seems to think that Algorithms must be lower than we thought, since Capability hasn’t changed. This shows that Garfinkel and I disagree on how precisely we can observe Capability. If our observation has room to be revised in light of other data, we can avoid lowering Algorithms to some extent. I think he’s probably right that the default approach should be to revise Algorithms downward, though there’s some room to revise Capability upward.
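Taking logarithms turns the product Capability = Compute * Algorithms into a sum, which makes the disagreement explicit. This is a sketch of that bookkeeping under the post's simplified model, not a derivation taken from either Carey or Garfinkel: once Compute is measured to be higher, the equation only stays balanced if Capability is revised up, Algorithms is revised down, or some mix of the two.

```latex
\begin{aligned}
\log \mathrm{Capability} &= \log \mathrm{Compute} + \log \mathrm{Algorithms} \\
\Delta \log \mathrm{Capability} - \Delta \log \mathrm{Algorithms} &= \Delta \log \mathrm{Compute} \\
\text{optimistic reading:}\quad \Delta \log \mathrm{Capability} &= \Delta \log \mathrm{Compute},\quad \Delta \log \mathrm{Algorithms} = 0 \\
\text{Garfinkel's reading:}\quad \Delta \log \mathrm{Algorithms} &= -\Delta \log \mathrm{Compute},\quad \Delta \log \mathrm{Capability} = 0
\end{aligned}
```

Any split between the two extremes also balances the equation; where it falls depends on how precisely Capability was observed in the first place.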

Much of Garfinkel’s pessimism about the implications of “AI and Compute” comes from the realization that its trend will soon stop - an important point. But what if, by that time, the Compute in AI systems has surpassed the brain’s?

Thoughts on Carey

Carey says one important milestone for AI progress is when projects have compute equal to running a human brain for 18 years. At that point we could expect AI systems to match an 18-year-old human’s cognitive abilities, if their algorithms successfully imitated a brain or otherwise performed at its level. AI Impacts has collected various estimates of how much compute this might require - by the end of AI Moore's Law, projects should comfortably reach and exceed it. Another useful marker is the 300-year AlphaGo Zero milestone. The thinking here is that AI systems might learn much more slowly than humans - it would take someone about 300 years to play as many Go games as AlphaGo Zero did before beating its previous version, which had beaten a top-ranked human Go player. A similar ratio might apply to learning to perform other tasks at a human-equivalent level (although AlphaGo Zero’s performance was superhuman). Finally, we have the brain-evolution milestone; that is, how much compute it would take to simulate the evolution of a nervous system as complex as the human brain. Only this last milestone is outside the scope of AI Moore's Law.[4] I tend to agree with Carey that the compute necessary to reach human-level AI lies somewhere around the 18- and 300-year milestones.

But I believe his analysis likely overestimates the difficulty of reaching these computational milestones. The FLOPS-per-brain estimates he cites are concerned with simulating a physical brain, rather than estimating how much useful computation the brain performs. The level of detail of the simulations seems to be the main source of variance among these higher estimates, and is irrelevant for our purposes - we just want to know how well a brain can compute things. So I think we should take the lower estimates as more relevant - Moravec’s 10^13 FLOPS and Kurzweil’s 10^16 FLOPS (page 114) are good places to start,[5] though far from perfect. These estimates are calculated by comparing areas of the brain responsible for discrete tasks like vision to specialized computer systems - they come nearer to the minimum amount of computation needed to equal the human brain than the other estimates do. If accurate, the reduction in required computation by 2 orders of magnitude has significant implications for our AI milestones. Using the estimates Kurzweil cites, we’ll comfortably pass the milestones for both 18- and 300-year human-equivalent compute by the time AI Moore's Law has finished in roughly 2.5 years.[6] There’s also some reason to think that AI systems’ learning abilities are improving, in the sense that they don’t require as much data to make the same inferences. DeepMind certainly seems to be saying that AlphaZero is better at searching a more limited set of promising moves than Stockfish, a traditional chess engine (unfortunately they don’t compare it to earlier versions of AlphaGo on this metric). On the other hand, board games like chess and Go are probably the ideal case for reinforcement learning algorithms, as they can play against themselves rapidly to improve. It’s unclear how current approaches could transfer to situations where this kind of self-play isn’t possible.
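The milestone arithmetic behind footnote 6 can be reproduced directly. A minimal sketch, assuming the post's figures: brain rates of 10^13 (Moravec) and 10^16 (Kurzweil) operations per second, AlphaGo Zero at about 10^23 operations in October 2017, and compute growing roughly tenfold per year while AI Moore's Law holds. The function names and the 365.25-day year are illustrative choices.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 3.16e7 seconds


def milestone_ops(brain_ops_per_sec: float, years: float) -> float:
    """Total operations a brain running at the given rate performs over `years`."""
    return brain_ops_per_sec * years * SECONDS_PER_YEAR


def years_to_reach(target_ops: float, current_ops: float = 1e23,
                   growth_per_year: float = 10.0) -> float:
    """Years of trend growth from `current_ops` (AlphaGo Zero, Oct 2017) to `target_ops`."""
    return max(0.0, math.log(target_ops / current_ops) / math.log(growth_per_year))


if __name__ == "__main__":
    for name, rate in (("Moravec", 1e13), ("Kurzweil", 1e16)):
        for years in (18, 300):
            ops = milestone_ops(rate, years)
            print(f"{name}, {years}-year milestone: {ops:.1e} ops, "
                  f"{years_to_reach(ops):.1f} years after Oct 2017")
```

With Kurzweil's rate, the 18-year milestone comes to about 5.7e24 operations (the post rounds this to 10^25) and the 300-year milestone to about 9.5e25; at tenfold growth per year these land roughly 1.8 and 3.0 years after October 2017. With Moravec's lower rate, both totals are already below AlphaGo Zero's 10^23.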

Final Thoughts

So - what can we conclude? I don’t agree with Garfinkel that OpenAI’s analysis should make us more pessimistic about human-level AI timelines. While it makes sense to revise our estimate of AI algorithms downward, it doesn’t follow that we should do the same for our estimate of overall progress in AI. By cortical neuron count, systems like AlphaZero are at about the same level as a blackbird (albeit one that lives for 18 years),[7] so there’s a clear case for future advances being more impressive than current ones as we approach the human level. I’ve also given some reasons to think that level isn’t as high as the estimates Carey cites. However, we don’t have good data on how recent projects fit AI Moore’s Law. It could be that we’ve already diverged from the trend, as firms may be conservative about drastically changing their R&D budgets. There’s also a big question mark hovering over our current level of progress in the algorithms that power AI systems. Today’s techniques may prove completely unable to learn generally in more complex environments, though we shouldn’t assume they will.[8]

If AI Moore’s Law does continue, we’ll pass the 18- and 300-year human milestones in the next two years. I expect to see an 18-year-equivalent project in the next five, even if it slows down. After these milestones, we’ll have some level of hardware overhang[9] and be left waiting on algorithmic advances to get human-level AI systems. Governments and large firms will be able to compete to develop such systems, and costs will halve roughly every 4 years,[10] slowly widening the pool of actors. Eventually the relevant breakthroughs will be made. That they will likely be software rather than hardware should worry AI safety experts, as these will be harder to monitor and foresee.[11] And once software lets computers approach a human level in a given domain, we can quickly find ourselves completely outmatched. AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.
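The claim that halving costs slowly widen the pool of actors can be quantified: a project costing k times an actor's budget becomes affordable after log2(k) halvings. A minimal sketch assuming the post's four-year halving time; the function name and the 100x example are hypothetical.

```python
import math


def years_until_affordable(project_cost: float, budget: float,
                           halving_years: float = 4.0) -> float:
    """Years until a fixed-compute project's cost falls to `budget`,
    assuming its cost halves every `halving_years` years."""
    if project_cost <= budget:
        return 0.0
    return halving_years * math.log2(project_cost / budget)


if __name__ == "__main__":
    # Hypothetical example: a project 100x beyond an actor's budget.
    print(f"{years_until_affordable(100.0, 1.0):.1f} years")
```

Each factor of 10 in cost adds about 13 years (4 × log2 10 ≈ 13.3), which is why the pool of actors widens gradually rather than all at once.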

1. It’s important to note that while Moore’s Law resulted in cheaper computers (albeit by increasing the scale and complexity of the factories that make them), the AI compute trend doesn’t seem to be doing the same for AI chips. It’s possible that Google’s TPUs will continue to decrease in cost after becoming commercially available, but without a huge consumer market to sell these to, it’s likely that these firms will mostly have to eat the costs of their investments.

2. This assumes corporate bureaucracy will slow reallocation of resources, and could be wrong if firms prove willing to keep ratcheting up total R&D budgets. Both Amazon and Google are doing so at the moment.

3. Information about the cost and compute of AI projects since then would be very helpful for evaluating the continuation of the trend.

4. Cost and computation figures take AlphaGo Zero as the last available data point in the trend, since it’s the last AI system for which OpenAI has calculated compute. AlphaGo Zero was released in October 2017, but I’m plotting how things will go from now, March 2019, assuming that the trends in cost and compute have continued. These estimates are therefore 1.5 years shorter than Carey’s, apart from our use of different estimates of the brain’s computation.

5. Moravec makes his estimate by comparing the number of calculations machine vision software makes to the retina, and extrapolating to the size of the rest of the brain. This isn’t ideal, but at least it’s based on a comparison of machine and human capability, not simulation of a physical brain. Kurzweil cites Moravec’s estimate as well as a similar one by Lloyd Watts based on comparisons between the human auditory system and teleconferencing software, and finally one from the University of Texas based on replicating the functions of a small area of the cerebellum. These latter estimates come to 10^17 and 10^15 FLOPS for the brain. I know people are wary of Kurzweil, but he does seem to be on fairly solid ground here.

6. The 18-year milestone would be reached in under a year and the 300-year milestone in slightly over another. If the brain performs about 10^16 operations per second, 18 years’ worth would be roughly 10^25 operations. AlphaGo Zero used about 10^23 operations in October 2017 (1,000 petaflop/s-days; 1 petaflop/s-day is roughly 10^20 operations). If the trend is holding, compute is increasing by roughly an order of magnitude per year. It’s worth noting that this would be roughly a $700M project in late 2019 (scaling AlphaZero up 100x and halving costs every 4 years), and something like $2-3B if hardware costs weren’t spread across multiple projects. Google has an R&D budget of over $20B, so this is feasible, though significant. The AlphaGo Zero games milestone would take about 14 more months of AI Moore's Law to reach, or a few decades of cost decreases if it ends.

7. This is relative to the 10^16 FLOPS estimate of the human brain’s computation, and assumes that computation is largely based on cortical neuron count: a blackbird would be at about 10^14 FLOPS by this measure.

8. An illustration of this point is found here, expressed by Richard Sutton, one of the inventors of reinforcement learning. He examines the history of AI breakthroughs and concludes that fairly simple search and learning algorithms have powered the most successful efforts, driven by increasing compute over time. Attempts to use models that take advantage of human expertise have largely failed.

9. This argument fails if the cited estimates of a human brain’s compute are too optimistic. If more than a couple of extra orders of magnitude are needed to get brain-equivalent compute, we could be many decades away from having the necessary hardware. AI Moore’s Law can’t continue much longer than 2.5 years, so we’d have to wait for long-term cost decreases before running more capable projects.

10. Based on AI Impacts cost estimates, using the recent rate of one order-of-magnitude cost decrease every 10-16 years.

11. If the final breakthroughs depend on software, we’re left with a wide range of possible human-level AI timelines, but one that likely excludes timelines centuries in the future. We could theoretically be months away from such a system if current algorithms with more compute are sufficient. See this article, particularly the graphic on exponential computing growth. This completely violates my intuitions of AI progress but seems like a legitimate position.

What are the advantages and disadvantages of knowing your own IQ?

April 3, 2019 - 21:31
Published on April 3, 2019 6:31 PM UTC

I've seen some answers here: https://www.quora.com/What-are-the-advantages-disadvantages-to-know-your-own-IQ

But I would be curious to hear the perspective of people here.

A third alternative: taking an IQ test and tracking your IQ, but without looking at the results. For example, could tracking IQ be useful for detecting cognitive decline and predicting neurodegenerative diseases?

Machine Pastoralism

April 3, 2019 - 19:04
Published on April 3, 2019 4:04 PM UTC

This idea has occurred to me before, but in the interim I dismissed it and then forgot about it. Since it has come back more or less unprompted, I am writing it down.

We usually talk about animals and their intelligence as a way to interrogate intelligence in general, or as a model for possible other minds. It occurred to me that our relationship with animals is therefore a model for our relationship with other forms of intelligence.

In the mode of Prediction Machines, it is straightforward to consider: prediction engines in lieu of dogs to track and give warning; teaching/learning systems for exploring the map in lieu of horses; analysis engines to provide our solutions instead of cattle or sheep to provide our sustenance. The idea here is just to map animals-as-capital to the information economy, according to what they do for us.

Alongside what they do for us is the question of how we manage them. To me, the Software 2.0 lens of adjusting weights to search program space reads closer to animal husbandry than to building a new beast from the ground up, with gears, each time. It allows for a notion of lineage, and we can envision using groups of machines with subtle variations, or entirely different machines in combination.

This analogy also seems to do a reasonable job of priming the intuition about where dangerous thresholds might lie. How smart is smart enough for one AI to be dangerous? Tiger-ish? We can also think about relative intelligence: the primates with better tool use and more powerful communication were able to establish patronage and then total domestication over packs of dogs and herds of horses, cattle, and sheep. How big is that gap exactly, and what does it imply about the threshold for doing the same to humans? Historically we have been perfectly capable of doing it to ourselves, so the threshold might actually be lower than our own level.

Defeating Goodhart and the closest unblocked strategy problem

April 3, 2019 - 17:46
Published on April 3, 2019 2:46 PM UTC
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face 
This post is longer and more self-contained than my recent stubs.

tl;dr: Patches such as telling the AI "avoid X" will result in Goodhart's law and the nearest unblocked strategy problem: the AI will do almost exactly what it was going to do, except narrowly avoiding the specific X.

However, if the patch can be replaced with "I am telling you to avoid X", and this is treated as information about what to avoid, and the biases and narrowness of my reasoning are correctly taken into account, these problems can be avoided. The important thing is to correctly model my uncertainty and overconfidence.

AIs don't have a Goodhart problem, not exactly

The problem of an AI maximising a proxy utility function seems similar to the Goodhart Law problem, but isn't exactly the same thing.

The standard Goodhart law is a principal-agent problem: the principal P and the agent A both know, roughly, what the principal's utility U is (e.g. U aims to create a successful company). However, fulfilling U is difficult to measure, so a measurable proxy V is used instead (e.g. V aims to maximise share price). Note that the principal's and the agent's goals are misaligned, and the measurable V serves to (try to) bring them more into alignment.

For an AI, the problem is not that U is hard to measure, but that it is hard to define.
And the AI's goals are V: there is no need to make V measurable; it is not a check on the AI, but the AI's intrinsic motivation.

This may seem like a small difference, but it has large consequences. We could give an AI a V, our "best guess" at U, while also including all our uncertainty about how to define U. This option is not available for the principal-agent problem, since giving a complicated goal to a more knowledgeable agent just gives it more opportunities to misbehave: we can't rely on it maximising the goal; we have to check that it does so.

Overfitting to the patches

There is a certain similarity with many machine learning techniques. Neural nets that distinguish cats and dogs could treat any "dog" photo as a specific patch that can be routed around. In that case, the net would define "dog" as "anything almost identical to the dog photos I've been trained on", and "cat" as "anything else".

And that would be a terrible design; fortunately, modern machine learning gets around the problem by, in effect, assigning uncertainty correctly: "dog" is not seen as the exact set of dog photos in the training set, but as a larger, more nebulous concept, of which the specific dog photos are just examples.

Similarly, we could define V as W + Δ, where W is our best attempt at specifying U, and Δ encodes the fact that W is but an example our imperfect minds have come up with, to try and capture U. We know that W is oversimplified, and Δ is an encoding of this fact.

If a neural net could synthesise a decent estimate of "dog" from some examples, could it synthesise "friendliness" from our attempts to define it? The idea is best explained through an example.

Example: Don't crush the baby or the other objects

This section will present a better example, I believe, than the original one presented here.

A robot exists in a grid world:

The robot's aim is to get to the goal square, with the flag. It gets a penalty of −1 for each turn it isn't there.
If that were the only reward, the robot's actions would be disastrous:

So we will give it a penalty of −100 for running over babies. If we do so, we will get Goodhart/nearest unblocked strategy behaviour:

Oops! Turns out we valued those vases as well.

What we want the AI to learn is not that the baby is specifically important, but that the baby is an example of important things it should not crush. So imagine it is confronted by the following, which includes six types of objects, of unknown value:

Instead of having humans hand-label each item, we generalise from some hand-labelled examples, using rules of extrapolation and some machine learning. This tells the AI that, typically, we value about one-in-six objects, and value them at a tenth of the value of babies (hence it gets −10 for running one over). Given that, the best policy (nine turns away from the goal, running over two of the unknown objects), with an expected reward of −9 − 10 × (2/6) ≈ −12.333…, is:

This behaviour is already much better than we would expect from a typical Goodhart law-style agent (and we could complicate the example to make the difference more emphatic).

Example: human over-confidence

The above works if we humans correctly account for our uncertainty: if we not only produce W, but also a correct Δ for how good a match we expect between W and U.

But we humans are often overconfident in our estimates, especially in our estimates of value. We are far better at hindsight ("you shouldn't have crushed the vase") than at foresight ("here's a complete list of what you shouldn't do"). Even knowing that hindsight is better doesn't make the issue go away.

This is similar to the planning fallacy. That fallacy means that we underestimate the time taken to complete tasks, even if we try to take the planning fallacy into account. However, the planning fallacy can be solved using the outside view: comparing the project to similar projects, rather than using detailed inner knowledge.
Similarly, human overconfidence can be solved by the AI noting our initial estimates, our corrections to those initial estimates, our corrections taking into account the previous corrections, our attempts to take into account all previous repeated corrections, and the failure of those attempts.

Suppose, for example, that humans, in hindsight, value one-in-three of the typical objects in the grid world. We start out with an estimate of one-in-twelve; after the robot mashes a few too many of the objects, we update to one-in-nine; after being repeatedly told that we underestimate our hindsight, we update to one-in-six... and stay there. But meanwhile, the robot can still see that we continue to underestimate, and goes directly to a one-in-three estimate; so with new, unknown objects, it will only risk crushing a single one:

If the robot learnt that we valued even more objects (or valued some of them more than +10), it would then default to the safest, longest route:

In practice, of course, the robot will also be getting information about what types of objects we value, but the general lesson still applies: the robot can learn that we underestimate uncertainty, and increase its own uncertainty in consequence.

Full uncertainty, very unknown unknowns

So, this is a more formal version of ideas I posted a while back. The process could be seen as:

1. Give the AI W as our current best estimate for U.
2. Encode our known uncertainties about how well W relates to U.
3. Have the AI deduce, from our subsequent behaviour, how well we have encoded our uncertainties, and change these as needed.
4. Repeat 2-3 for different types of uncertainties.

What do I mean by "different types" of uncertainty? Well, the example above was simple: the model had but a single uncertainty, over the proportion of typical objects that we valued. The AI learnt that we systematically underestimated this, even when it helped us try and do better.
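The sketch below works through the arithmetic of this gridworld example in Python. It follows the post in charging −1 for each turn away from the goal and −10 for each valued object crushed. The outside-view step is modelled here as a constant multiplicative bias between the humans' settled estimate (one-in-six) and their hindsight (one-in-three). That modelling choice is an illustrative assumption, not something the post specifies. `expected_reward`, `outside_view_correction` and `break_even_detour` are likewise hypothetical helpers that do not come from the original post.

```python
from fractions import Fraction


def expected_reward(turns, objects_crushed, p_valued, object_value=10):
    """Expected reward of a route: -1 per turn away from the goal, plus an
    expected -object_value for each crushed object we turn out to value."""
    return -turns - object_value * objects_crushed * p_valued


def outside_view_correction(stated, past_stated, past_hindsight):
    """Hypothetical outside-view fix: assume our stated estimates undershoot
    hindsight by a constant factor, and rescale the new estimate by it."""
    bias = past_hindsight / past_stated
    return min(Fraction(1), stated * bias)


def break_even_detour(p_valued, object_value=10):
    """Extra turns worth spending to avoid crushing one more unknown object."""
    return object_value * p_valued


if __name__ == "__main__":
    stated = Fraction(1, 6)  # the humans' settled foresight estimate
    print(expected_reward(9, 2, stated))  # -37/3, about -12.333
    corrected = outside_view_correction(stated, Fraction(1, 6), Fraction(1, 3))
    print(corrected)  # 1/3
    print(break_even_detour(stated), break_even_detour(corrected))  # 5/3 10/3
```

Doubling the probability that an object is valued doubles the number of extra turns worth spending to avoid one more object. This is why the corrected estimate pushes the robot towards routes that risk fewer objects.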
But there are other types of uncertainties that could happen. We value some objects more than others, but maybe these estimates are not accurate either. Maybe we are fine as long as one object of a type exists, and don't care about the others; or, conversely, maybe some objects are only valuable in pairs. The AI needs a rich enough model to be able to account for these extra types of preferences, which we may never have articulated explicitly.

There are even more examples as we move from gridworlds into the real world. We can articulate ideas like "human value is fragile" and maybe give an estimate of the total complexity of human values. And then the agent could use examples to estimate the quality of our estimate, and come up with a better number for the desired complexity.

But "human value is fragile" is a relatively recent insight. There was a time when people hadn't articulated that idea. So it's not that we didn't have a good estimate for the complexity of human values; we didn't have any idea that it was a good thing to estimate.

The AI has to figure out the unknown unknowns. Note that, unlike the value synthesis project, the AI doesn't need to resolve this uncertainty; it just needs to know that it exists, and give a good-enough estimate of it.

The AI will certainly figure out some unknown unknowns (and unknown knowns): it just has to spot some patterns and connections we were unaware of. But in order to get all of them, the AI has to have some sort of maximal model in which all our uncertainty (and all our models) can be contained.

Just consider some of the concepts I've come up with (I chose these because I'm most familiar with them; LessWrong abounds with other examples): siren worlds, humans making similar normative assumptions about each other, and the web of connotations. In theory, each of these should have reduced my uncertainty, and moved W closer to U.
In practice, each of these has increased my estimate of uncertainty, by showing how much remains to be done. Could an AI have taken these effects correctly into account, given that these three examples are of very different types? Can it do so for discoveries that remain to be made?

I've argued that an indescribable hellworld cannot exist. There's a similar question as to whether there exists human uncertainty about U that cannot be included in the AI's model of Δ. By definition, this uncertainty would be something that is currently unknown and unimaginable to us. However, I feel that it's far more likely to exist than the indescribable hellworld.

Still, despite that issue, it seems to me that there are methods of dealing with the Goodhart problem/nearest unblocked strategy problem. And this involves properly accounting for all our uncertainty, directly or indirectly. If we do this well, there no longer remains a Goodhart problem at all.

Discuss

Alignment Newsletter #51

April 3, 2019 - 07:10
Published on April 3, 2019 4:10 AM UTC

Cancelling within-batch generalization in order to get stable deep RL

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter.

You may have noticed that I've been slowly falling behind on the newsletter, and am now a week behind. I would just skip a week and continue, but there are actually a lot of papers and posts that I want to read and summarize, and just haven't had the time. So instead, this week you're going to get two newsletters. This one focuses on all of the ML-based work that I have mostly been ignoring for the past few issues.

Highlights

Towards Characterizing Divergence in Deep Q-Learning (Joshua Achiam et al): Q-Learning algorithms use the Bellman equation to learn the Q*(s, a) function, which is the long-term value of taking action a in state s.
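The Python sketch below makes the Bellman update concrete with tabular Q-Learning on a hypothetical three-state chain. The agent can move left or right, and moving right from the last state earns reward 1 and ends the episode. It explores uniformly at random, so every (s, a) pair keeps being visited. It also uses a constant step size. That is enough for this deterministic toy problem, but stochastic environments need a properly decayed learning rate for the usual convergence guarantee. The MDP and all parameter values are illustrative assumptions.

```python
import random

# Hypothetical 3-state chain: CHAIN[s][a] = (next_state, reward, done).
# Action 0 moves left, action 1 moves right; moving right from state 2 ends the episode.
CHAIN = [
    [(0, 0.0, False), (1, 0.0, False)],
    [(0, 0.0, False), (2, 0.0, False)],
    [(1, 0.0, False), (2, 1.0, True)],
]


def q_learning(transitions, gamma=0.9, alpha=0.1, episodes=2000, seed=0):
    """Tabular Q-learning: each update touches only the visited (s, a) entry."""
    rng = random.Random(seed)
    n_actions = len(transitions[0])
    q = [[0.0] * n_actions for _ in transitions]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(n_actions)  # uniform exploration
            s_next, r, done = transitions[s][a]
            # Bellman target: r + gamma * max_a' Q(s', a'), or just r at episode end
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning(CHAIN)):
        print(state, [round(v, 3) for v in values])
```

With gamma = 0.9 the true values are Q*(2, right) = 1, Q*(1, right) = 0.9 and Q*(0, right) = 0.81, and the learned table should approach them.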
Tabular Q-Learning collects experience and updates the Q-value for each (s, a) pair independently. As long as each (s, a) pair is visited infinitely often, and the learning rate is decayed properly, the algorithm is guaranteed to converge to Q*.

Once we get to complex environments where the states can't be enumerated, we can't explore all of the (s, a) pairs. The obvious approach is to approximate Q*(s, a). Deep Q-Learning (DQL) algorithms use neural nets for this approximation, and use some flavor of gradient descent to update the parameters of the net such that it is closer to satisfying the Bellman equation. Unfortunately, this approximation can prevent the algorithm from ever converging to Q*.

This paper studies the first-order Taylor expansion of the DQL update, and identifies three factors that affect the DQL update: the distribution of (s, a) pairs from which you learn, the Bellman update operator, and the neural tangent kernel, a property of the neural net that specifies how information from one (s, a) pair generalizes to other (s, a) pairs. The theoretical analysis shows that as long as there is limited generalization between (s, a) pairs, and each (s, a) pair is visited infinitely often, the algorithm will converge. Inspired by this, they design PreQN, which explicitly seeks to minimize generalization across (s, a) pairs within the same batch. They find that PreQN leads to competitive and stable performance, despite not using any of the tricks that DQL algorithms typically require, such as target networks.

Rohin's opinion: I really liked this paper: it's a rare instance where I actually wanted to read the theory in the paper because it felt important for getting the high-level insight. The theory is particularly straightforward and easy to understand (which usually seems to be true when it leads to high-level insight). The design of the algorithm seems more principled than others, and the experiments suggest that this was actually fruitful.
The algorithm is probably more computationally expensive per step than other algorithms, but that could likely be improved in the future.

One thing that felt strange is that the proposed solution is basically to prevent generalization between (s, a) pairs, but the whole point of DQL algorithms is to generalize between (s, a) pairs, since you can't get experience from all of them. Of course, since they are only preventing generalization within a batch, they still generalize between (s, a) pairs that are not in the batch, but presumably that was because they could only prevent generalization within the batch. Empirically the algorithm does seem to work, but it's still not clear to me why it works.

Technical AI alignment

Learning human intent

Deep Reinforcement Learning from Policy-Dependent Human Feedback (Dilip Arumugam et al): One obvious approach to human-in-the-loop reinforcement learning is to have humans provide an external reward signal that the policy optimizes. Previous work noted that humans tend to correct existing behavior, rather than providing an "objective" measurement of how good the behavior is (which is what a reward function is). They proposed Convergent Actor-Critic by Humans (COACH), where instead of using human feedback as a reward signal, they use it as the advantage function. This means that human feedback is modeled as specifying how good an action is relative to the "average" action that the agent would have chosen from that state. (It's an average because the policy is stochastic.) Thus, as the policy gets better, it will no longer get positive feedback on behaviors that it has successfully learned to do, which matches how humans give reinforcement signals.

This work takes COACH and extends it to the deep RL setting, evaluating it on Minecraft. While the original COACH had an eligibility trace that helps "smooth out" human feedback over time, deep COACH requires an eligibility replay buffer.
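The Python sketch below shows the core COACH idea: human feedback f takes the place of the advantage in a policy-gradient step, θ ← θ + α · f · ∇ log π(a | s). Several simplifying assumptions apply. There is one state with three actions and a tabular softmax policy. There are no eligibility traces or replay buffer. The trainer is simulated and hypothetical: it corrects wrong actions, and it stops praising the target action once the policy already chooses it reliably.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coach_update(logits, action, feedback, lr=0.5):
    """Policy-gradient step that uses human feedback as the advantage A(s, a).

    For a softmax policy, d log pi(a) / d logit_b = 1[b == a] - pi(b).
    """
    probs = softmax(logits)
    for b, p in enumerate(probs):
        logits[b] += lr * feedback * ((1.0 if b == action else 0.0) - p)


def simulated_trainer(action, probs, target=2):
    """Hypothetical trainer: corrects wrong actions, praises the target action
    only while the policy has not yet learned it (policy-dependent feedback)."""
    if action != target:
        return -1.0
    return 1.0 if probs[target] < 0.9 else 0.0


def train(steps=200, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # a single state with three actions
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        coach_update(logits, action, simulated_trainer(action, probs))
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

Once the target action is reliable, the trainer's praise stops, so updates on learned behaviour cease while mistakes are still corrected. That is the policy-dependence of feedback the summary describes.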
For sample efficiency, they first train an autoencoder to learn a good representation of the space (presumably using experience collected with a random policy), and feed these representations into the control policy. They reward entropy so that the policy doesn't commit to a particular behavior, making it responsive to feedback, but select actions by always picking the action with maximal probability (rather than sampling from the distribution) in order to have interpretable, consistent behavior for the human trainers to provide feedback on. They evaluate on simple navigation tasks in the complex 3D environment of Minecraft, including a task where the agent must patrol the perimeter of a room, which cannot be captured by a state-based reward function.

Rohin's opinion: I really like the focus on figuring out how humans actually provide feedback in practice; it makes a lot of sense that we provide reinforcement signals that reflect the advantage function rather than the reward function. That said, I wish the evaluation had more complex tasks, and had involved human trainers who were not authors of the paper -- it might have taken an hour or two of human time instead of 10-15 minutes, but would have been a lot more compelling.

Before continuing, I recommend reading about Simulated Policy Learning in Video Models below. As in that case, I think that you get sample efficiency here by getting a lot of "supervision information" from the pixels used to train the VAE, though in this case it's by learning useful features rather than using the world model to simulate trajectories. (Importantly, in this setting we care about sample efficiency with respect to human feedback as opposed to environment interaction.) I think the techniques used there could help with scaling to more complex tasks.
In particular, it would be interesting to see a variant of deep COACH that alternated between training the VAE with the learned control policy, and training the learned control policy with the new VAE features. One issue would be that as you retrain the VAE, you would invalidate your previous control policy, but you could probably get around that (e.g. by also training the control policy to imitate itself while the VAE is being trained).

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following (Justin Fu et al): Rewards and language commands are more generalizable than policies: "pick up the vase" would make sense in any house, but the actions that navigate to and pick up a vase in one house would not work in another house. Based on this observation, this paper assumes a dataset where, for several (language command, environment) pairs, we are given expert demonstrations of how to follow the command in that environment. For each data point, we can use IRL to infer a reward function, and use that to train a neural net that can map from the language command to the reward function. Then, at test time, given a language command, we can convert it to a reward function, after which we can use standard deep RL techniques to get a policy that executes the command.

The authors evaluate on a 3D house domain with pixel observations, and two types of language commands: navigation and pick-and-place. During training, IRL needs to be done, but deep IRL algorithms are computationally expensive, so they convert the task into a small, tabular MDP with known dynamics for which they can solve the IRL problem exactly. This yields a gradient that can then be applied in the observation space to train a neural net that, given image observations and a language command, predicts the reward.
Note that this only needs to be done at training time: at test time, the reward function can be used in a new environment with unknown dynamics and image observations. They show that the learned rewards generalize to novel combinations of objects within a house, as well as to entirely new houses (though to a lesser extent).

Rohin's opinion: I think the success at generalization comes primarily from the MaxEnt IRL during training: it provides a lot of structure and inductive bias, which means that the rewards on which the reward predictor is trained are "close" to the intended reward function. For example, in the navigation tasks, the demonstrations for a command like "go to the vase" will involve trajectories through the states of many houses that end up at the vase. For each demonstration, MaxEnt IRL "assigns" positive reward to the states in the demonstration, and negative reward to everything else. However, once you average across demonstrations in different houses, the state with the vase gets a huge amount of positive reward (since it is in all trajectories) while all the other states are relatively neutral (since they will only be in a few trajectories, where the agent needed to pass that point in order to get to the vase). So when this is "transferred" to the neural net via gradients, the neural net is basically "told" that high reward only happens in states that contain vases, which is a strong constraint on the learned reward. To be clear, this is not meant as a critique of the paper: indeed, I think when you want out-of-distribution generalization, you have to do it by imposing structure/inductive bias, and this is a new way to do it that I hadn't seen before.

Using Natural Language for Reward Shaping in Reinforcement Learning (Prasoon Goyal et al): This paper constructs a dataset for grounding natural language in Atari games, and uses it to improve performance on Atari.
They have humans annotate short clips with natural language: for example, "jump over the skull while going to the left" in Montezuma's Revenge. They use this to build a model that predicts whether a given trajectory matches a natural language instruction. Then, while training an agent to play Atari, they have humans give the AI system an instruction in natural language. They use their natural language model to predict the probability that the trajectory matches the instruction, and add that as an extra shaping term in the reward. This leads to faster learning.

Interpretability

Visualizing memorization in RNNs (Andreas Madsen): This is a short Distill article that showcases a visualization tool that demonstrates how contextual information is used by various RNN units (LSTMs, GRUs, and nested LSTMs). The method is very simple: for each character in the context, they highlight the character in proportion to the gradient of the logits with respect to that character. Looking at this visualization allows us to see that GRUs are better at using long-term context, while LSTMs perform better for short-term contexts.

Rohin's opinion: I'd recommend you actually look at and play around with the visualization; it's very nice. The summary is short because the value of the work is in the visualization, not in the technical details.

Other progress in AI

Exploration

Learning Exploration Policies for Navigation (Tao Chen et al)

Deep Reinforcement Learning with Feedback-based Exploration (Jan Scholten et al)

Reinforcement learning

Towards Characterizing Divergence in Deep Q-Learning (Joshua Achiam et al): Summarized in the highlights!

Eighteen Months of RL Research at Google Brain in Montreal (Marc Bellemare): One approach to reinforcement learning is to predict the entire distribution of rewards from taking an action, instead of predicting just the expected reward. Empirically, this works better, even though in both cases we choose the action with highest expected reward.
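The Python sketch below makes "predicting the entire distribution" concrete. It assumes a quantile-regression parameterisation, which is one common way to do distributional RL; the work summarised here may use others. There are two hypothetical actions: one always returns 1, and the other returns 0 or 3 with equal probability. Each action's return distribution is summarised by four quantile estimates fitted with the quantile (pinball) loss. The action is still chosen by the mean of those quantiles, that is, by expected reward.

```python
import random


def quantile_midpoints(n):
    """Quantile levels tau_i = (2i + 1) / (2n), as used in quantile-regression RL."""
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def fit_quantiles(returns, taus, lr=0.01, epochs=50, seed=0):
    """SGD on the quantile (pinball) loss: each estimate theta_i moves up by
    lr * tau_i when a sample is at or above it, and down by lr * (1 - tau_i)
    when a sample is below it."""
    rng = random.Random(seed)
    data = list(returns)
    theta = [0.0] * len(taus)
    for _ in range(epochs):
        rng.shuffle(data)
        for z in data:
            for i, tau in enumerate(taus):
                below = 1.0 if z < theta[i] else 0.0
                theta[i] += lr * (tau - below)
    return theta


if __name__ == "__main__":
    taus = quantile_midpoints(4)
    returns = {
        "safe": [1.0] * 200,                 # always returns 1
        "risky": [0.0] * 100 + [3.0] * 100,  # 0 or 3, mean 1.5
    }
    quantiles = {a: fit_quantiles(r, taus) for a, r in returns.items()}
    means = {a: sum(q) / len(q) for a, q in quantiles.items()}
    for a in returns:
        print(a, [round(q, 2) for q in quantiles[a]], round(means[a], 2))
    print("greedy action:", max(means, key=means.get))
```

The distributional learner has four regression targets per action instead of one. Even so, both approaches rank the actions by the same expected value (about 1.5 against 1.0).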
This blog post provides an overview of work at Google Brain Montreal that attempts to understand this phenomenon. I'm only summarizing the part that most interested me.

First, they found that in theory, distributional RL performs on par with or worse than standard RL when using either a tabular representation or linear features. They then tested this empirically on Cartpole, and found similar results: distributional RL performed worse when using tabular or linear representations, but better when using a deep neural net. This suggests that distributional RL "learns better representations". So, they visualize representations for RL on the four-room environment, and find that distributional RL captures more structured representations. Similarly, this paper showed that predicting value functions for multiple discount rates is an effective way to produce auxiliary tasks for Atari.

Rohin's opinion: This is a really interesting mystery with deep RL, and after reading this post I have a story for it. Note I am far from an expert in this field and it's quite plausible that if I read the papers cited in this post I could tell this story is false, but here's the story anyway. As we saw with PreQN earlier in this issue, one of the most important aspects of deep RL is how information about one (s, a) pair is used to generalize to other (s, a) pairs. I'd guess that the benefit from distributional RL is primarily that you get "good representations" that let you do this generalization well. With a tabular representation you don't do any generalization, and with a linear feature space the representation is hand-designed by humans to do this generalization well, so distributional RL doesn't help in those cases.

But why does distributional RL learn good representations? I claim that it provides stronger supervision given the same amount of experience.
With normal expected RL, the final layer of the neural net need only be useful for predicting the expected reward, but with distributional RL it must be useful for predicting all of the quantiles of the reward distribution. There may be "shortcuts" or "heuristics" that allow you to predict expected reward well because of spurious correlations in your environment, but it's less likely that those heuristics work well for all of the quantiles of the reward distribution. As a result, having to predict more things enforces a stronger constraint on what representations your neural net must have, and thus you are more likely to find good representations. This perspective also explains why predicting value functions for multiple discount rates helps with Atari, and why adding auxiliary tasks is often helpful (as long as the auxiliary task is relevant to the main task).

The important aspect here is that all of the quantiles are forcing the same neural net to learn good representations. If you instead have different neural nets predicting each quantile, each neural net has roughly the same amount of supervision as in expected RL, so I'd expect that to work about as well as expected RL, maybe a little worse since quantiles are probably harder to predict than means. If anyone actually runs this experiment, please do let me know the result!

Diagnosing Bottlenecks in Deep Q-learning Algorithms (Justin Fu, Aviral Kumar et al): While the PreQN paper used a theoretical approach to tackle Deep Q-Learning algorithms, this one takes an empirical approach. Their results:

- Small neural nets cannot represent Q*, and so have undesired bias that results in worse performance. However, they also have convergence issues, where the Q-function they actually converge to is significantly worse than the best Q-function that they could express. Larger architectures mitigate both of these problems.
- When there are more samples, we get a lower validation loss, showing that with fewer samples we are overfitting.
Despite this, larger architectures are better, because the performance loss from overfitting is not as bad as the performance loss from having a bad bias. A good early stopping criterion could help with this. - To study how non-stationarity affects DQL algorithms, they study a variant where the Q-function is a moving average of the past Q-functions (instead of the full update), which means that the target values don't change as quickly (i.e. it is closer to a stationary target). They find that non-stationarity doesn't matter much for large architectures. - To study distribution shift, they look at the difference between the expected Bellman error before and after an update to the parameters. They find that distribution shift doesn't correlate much with performance and so is likely not important. - Algorithms differ strongly in the distribution over (s, a) pairs that the DQL update is computed over. They study this in the absence of sampling (i.e. when they simply weight all possible (s, a) pairs, rather than just the ones sampled from a policy) and find that distributions that are "close to uniform" perform best. They hypothesize that this is the reason that experience replay helps -- initially an on-policy algorithm would take samples from a single policy, while experience replay adds samples from previous versions of the policy, which should increase the coverage of (s, a) pairs. To sum up, the important factors are using an expressive neural net architecture, and designing a good sampling distribution. Inspired by this, they design Adversarial Feature Matching (AFM), which like Prioritized Experience Replay (PER) puts more weight on samples that have high Bellman error. However, unlike PER, AFM does not try to reduce distribution shift via importance sampling, since their experiments found that this was not important. Rohin's opinion: This is a great experimental paper, there's a lot of data that can help understand DQL algorithms. 
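A minimal tabular sketch of the moving-average variant described above, assuming a hypothetical two-state, two-action MDP (the `step` dynamics, `GAMMA = 0.9`, and all function names are illustrative, not from the paper). The backup is computed over every (s, a) pair with equal weight, i.e. without sampling, and `alpha` controls how fast the target moves: `alpha = 1.0` is the full Bellman update, smaller values keep the target closer to stationary. In this exact tabular setting both versions reach the same fixed point, Q*(s, 1) = 10 and Q*(s, 0) = 9; the paper's architecture-size effects do not show up in a toy like this.

```python
# Tabular, sampling-free Q-iteration with a moving-average update (illustrative sketch).

GAMMA = 0.9
STATES = (0, 1)
ACTIONS = (0, 1)


def step(state, action):
    """Hypothetical deterministic MDP: the action chooses the next state,
    and arriving in state 1 pays reward 1."""
    next_state = action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def bellman_backup(q):
    """Full Bellman optimality backup T(Q), weighting every (s, a) pair equally."""
    target = {}
    for s in STATES:
        for a in ACTIONS:
            s2, r = step(s, a)
            target[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    return target


def q_iteration(alpha, iters=2000):
    """Q <- (1 - alpha) * Q + alpha * T(Q).

    alpha = 1.0 is the full update; smaller alpha makes Q an exponential
    moving average of past backups, so the target moves more slowly."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(iters):
        target = bellman_backup(q)
        q = {k: (1 - alpha) * q[k] + alpha * target[k] for k in q}
    return q


print(q_iteration(1.0))  # full updates
print(q_iteration(0.1))  # slowly moving target, same fixed point
```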
I wouldn't take the results too literally, since insights on simple environments may not generalize to more complex environments. For example, they found overfitting to be an issue in their environments -- it's plausible to me that with more complex environments (think Dota/StarCraft, not MuJoCo) this reverses and you end up underfitting the data you have. Nonetheless, I think data like this is particularly valuable for coming up with an intuitive theory of how deep RL works, if not a formal one.

Simulated Policy Learning in Video Models (Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błazej Osinski et al): This blog post and the associated paper tackle model-based RL for Atari. The recent world models (AN #23) paper proposed first learning a model of the world by interacting with the environment using a random policy, and then using the model to simulate the environment and training a control policy using those simulations. (This wasn't its main point, but it was one of the things it talked about.) The authors take this idea and put it in an iterative loop: they first train the world model using experience from a random policy, then train a policy using the world model, retrain the world model with experience collected using the newly trained policy, retrain the policy, and so on. This allows them to correct mistakes in the world model and lets it adapt to novel situations that the control policy discovers. As a result, they can train agents that play Atari with only 100K interactions with the environment (corresponding to about two hours of real-time gameplay), though the final performance is lower than the state of the art achieved with model-free RL. See Import AI for more details.

Rohin's opinion: This work follows the standard pattern where model-based RL is more sample efficient but reaches worse final performance compared to model-free RL. Let's try to explain this using the same story as in the rest of this newsletter.
The sample efficiency comes from the fact that they learn a world model that can predict the future, and then use that model to solve the control problem (which has zero sample cost, since you are no longer interacting with the environment). It turns out that predicting the future is "easier" than selecting the optimal action, and so the world model can be trained in fewer samples than it would take to solve the control problem directly. Why is the world model "easier" to learn? One possibility is that solving the control problem requires you to model the world anyway, and so must be a harder problem. If you don't know what your actions are going to do, you can't choose the best one. I don't find this very compelling, since there are lots of aspects of world modeling that are irrelevant to the control problem -- you don't need to know exactly how the background art will change in order to choose what action to take, but world modeling requires you to do this. I think the real reason is that world modeling benefits from much more supervision -- rather than getting a sparse reward signal over a trajectory, you get a full grid of pixels every timestep that you were supposed to predict. This gives you many orders of magnitude more "supervision information" per sample, which makes the model easier to learn. (This is basically the same argument as in Yann LeCun's cake analogy.) Why does it lead to worse performance overall? The policy is now being trained using rollouts that are subtly wrong, and so instead of specializing to the true Atari dynamics it will be specialized to the world model dynamics, which are going to be somewhat different and should lead to a slight dip in performance. (Imagine a basketball player having to shoot a ball that was a bit heavier than usual -- she'll probably still be good, but not as good as with a regular basketball.) In addition, since the world model is supervised by pixels, any small objects are not very important to the world model (i.e.
getting them wrong does not incur much loss), even if they are very important for control. In fact, they find that bullets tend to disappear in Atlantis and Battle Zone, which is not good if you want to learn to play those games. I'm not sure if they shared weights between the world model and the control policy. If they did, then they would also have the problem that the features that are useful for predicting the future are not the same as the features that are useful for selecting actions, which would also cause a drop in performance. My guess is that they didn't share weights for precisely this reason, but I'm not sure.

Unifying Physics and Deep Learning with TossingBot (Andy Zeng): TossingBot is a system that learns how to pick up and toss objects into bins using deep RL. The most interesting thing about it is that instead of directly predicting actions, its neural nets predict adjustments to actions that are computed by a physics-based controller. Since the physics-based controller generalizes well to new situations, TossingBot is also able to generalize to new tossing locations.

Rohin's opinion: This is a cool example of using structured knowledge in order to get generalization while also using deep learning in order to get performance. I also recently came across Residual Reinforcement Learning for Robot Control, which seems to have the same idea of combining deep RL with conventional control mechanisms. I haven't read either of the papers in depth, so I can't compare them, but a very brief skim suggests that their techniques are significantly different.
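A minimal sketch of the residual idea behind TossingBot, under loudly stated assumptions: a fixed 45° release angle, an ideal drag-free projectile for the physics controller, and a hypothetical "real world" in which objects fly only 80% of the ideal range. The learned part here is a single least-squares gain, a stand-in for TossingBot's neural network, and labelling each practice throw by where it actually landed is a simplification for illustration; none of these function names come from the TossingBot work.

```python
import math

G = 9.81                    # gravitational acceleration, m/s^2
ANGLE = math.radians(45.0)  # assumed fixed release angle


def physics_speed(distance):
    """Physics-based controller: release speed for an ideal, drag-free projectile
    (equal launch and landing height) to travel `distance` metres."""
    return math.sqrt(distance * G / math.sin(2 * ANGLE))


def simulated_throw(speed, range_factor=0.8):
    """Hypothetical 'real world': objects fly only range_factor times the ideal range."""
    return range_factor * speed ** 2 * math.sin(2 * ANGLE) / G


def fit_residual_gain(practice_speeds):
    """Learn a residual delta = w * physics_speed(d) by least squares.

    A throw released at speed v that lands at distance d is treated as the
    example "to hit d, release at v", so its residual label is v - physics_speed(d)."""
    xs, ys = [], []
    for v in practice_speeds:
        landed = simulated_throw(v)
        base = physics_speed(landed)
        xs.append(base)
        ys.append(v - base)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def release_speed(distance, w):
    """Final action: physics estimate plus the learned residual correction."""
    base = physics_speed(distance)
    return base + w * base


w = fit_residual_gain([3.0, 4.0, 5.0, 6.0])    # speeds of earlier practice throws, m/s
print(simulated_throw(physics_speed(2.0)))     # physics alone falls short of 2.0 m
print(simulated_throw(release_speed(2.0, w)))  # physics + residual lands at about 2.0 m
```

Because the physics term already captures how release speed scales with distance, the residual only has to learn a small correction, which is why it carries over to target distances that were never practised.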
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (Kate Rakelly, Aurick Zhou et al)

Deep learning

Measuring the Limits of Data Parallel Training for Neural Networks (Chris Shallue and George Dahl): Consider the relationship between the size of a single batch and the number of batches needed to reach a specific performance bound when using deep learning. If all that mattered for performance was the total number of examples that you take gradient steps on (i.e. the product of these two numbers), then you would expect a perfect inverse relationship between these two quantities, which would look like a line with slope -1 on a log-log plot. In this case, we could scale batch sizes up arbitrarily far, and distribute them across as many machines as necessary, in order to reduce wall-clock training time. A 2x increase in batch size with twice as many machines would lead to a 2x decrease in training time. However, as you make batch sizes really large, you face the problem of stale gradients: if you had updated on the first half of the batch and then computed gradients on the second half of the batch, the gradients for the second half would be "better", because they were computed with respect to a better set of parameters. When this effect becomes significant, you no longer get the nice linear scaling from parallelization.

This post studies the relationship empirically across a number of datasets, architectures, and optimization algorithms. They find that, universally, there is initially a regime of perfect linear scaling as you increase batch size, followed by a region of diminishing marginal returns that ultimately leads to an asymptote where increasing batch size doesn't help at all with reducing wall-clock training time. However, the transition points between these regimes vary wildly, suggesting that there may be low-hanging fruit in the design of algorithms or architectures that explicitly aim to achieve very good scaling.
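The three regimes can be illustrated with a closed-form curve. This is a sketch under an assumption the study summarized here does not make: it borrows the relation S(B) = S_min (1 + B_noise / B) between batch size B and training steps S from OpenAI's "An Empirical Model of Large-Batch Training". The values `S_MIN` and `B_NOISE` are hypothetical, not measurements from the study.

```python
def steps_to_target(batch_size, s_min, b_noise):
    """Assumed empirical model: training steps S(B) = S_min * (1 + B_noise / B)."""
    return s_min * (1.0 + b_noise / batch_size)


def doubling_speedup(batch_size, s_min, b_noise):
    """Factor by which doubling the batch size (and machines) cuts the step count."""
    return steps_to_target(batch_size, s_min, b_noise) / steps_to_target(2 * batch_size, s_min, b_noise)


S_MIN, B_NOISE = 1000.0, 4096.0  # hypothetical values, not measurements from the study

for b in (64, 1024, 4096, 16384, 65536):
    print(f"batch {b:>6}: {steps_to_target(b, S_MIN, B_NOISE):>8.1f} steps, "
          f"doubling speedup {doubling_speedup(b, S_MIN, B_NOISE):.3f}")
```

Under this model, batch sizes well below B_noise give close to 2x speedups per doubling (the slope -1 regime), the speedup shrinks toward 1x once B passes B_noise, and the step count never drops below S_min, which plays the role of the asymptote.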
Rohin's opinion: OpenAI found (AN #37) that the best predictor of the maximum useful batch size was how noisy the gradient is. Presumably when you have noisy gradients, a larger batch size helps "average out" the noise across examples. Rereading their post, I notice that they mentioned the study I've summarized here and said that their results can help explain why there's so much variance in the transition points across datasets. However, I don't think it can explain the variance in transition points across architectures. Noisy gradients are typically a significant problem, and so it would be weird if the variance in transition points across architectures were explained by the noisiness of the gradient: that would imply that two architectures reach the same final performance even though one had the problem of noisy gradients while the other didn't. So there seems to be something left to explain here. That said, I haven't looked in depth at the data, so the explanation could be very simple. For example, maybe the transition points don't vary much across architectures and vary much more across datasets, and the variance across architectures is small enough that its effect on performance is dwarfed by all the other things that can affect the performance of deep learning systems. Or perhaps while the noisiness of the gradient is a good predictor of the maximum batch size, it still explains only, say, 40% of the effect, and so variance across architectures is totally compatible with factors other than the gradient noise affecting the maximum batch size.

Copyright © 2019 Rohin Shah, All rights reserved.

Discuss

LW Update 2019-04-02 – Frontpage Rework

April 3, 2019 - 02:48
Published on April 2, 2019 11:48 PM UTC

Since LW2.0 launched, the frontpage had become very complex – both visually and conceptually.
This was producing an overall bad experience, and making it hard for the team to add or scale up features (such as Q&A, and later on Community, Library and upcoming Recommendations).

For the past couple of months, we've been working on an overhaul of the frontpage (and correspondingly, the overall site design). Our goal was to rearrange that complexity, spending fewer "complexity points" on things that didn't need them as much, so we could spend them elsewhere.

Frontpage Updates

• Tooltip-oriented design.
  • It's easier to figure out what most things will do before you click on them.
• Navigation Menu
  • Helps establish the overall site hierarchy.
  • Available on all major site pages (not Post Pages, where we want people to read without distraction).
  • Improved mobile navigation (shows up as a tab menu at the bottom).
  • Eventually we'll deprecate the old Nav Menu (still available in the header) and replace it with a collapsible version of the new one.
• Home Page streamlining
  • Moved Recommended Sequences and Community over to the Nav Menu, so there are only 3 sections to parse.
  • Post Items simplified down to one line.
  • Latest Posts now only have a single setting: "show personal blogposts", instead of forcing you to figure out immediately what "meta", "curated" and "daily" are.
  • Post List options are generally 'light cobalt blue' – not too obtrusive, but easier to find when you want them.
• Questions Page now has two sections:
  • Recent Activity – simply sorted by "most recently commented at", so if you respond to an old question it will appear above the fold.
  • Top Questions – also sorted by "recently commented", but filtered to questions with 40 or more karma, so that it's easier to catch up on updates to highly upvoted questions.
• Community Page
  • UI updated to match Home Page.
  • The group section now shows 7 groups instead of 3, and has a load more button.
Discuss

Degrees of Freedom

April 3, 2019 - 00:10
Published on April 2, 2019 9:10 PM UTC

Something I’ve been thinking about for a while is the dual relationship between optimization and indifference, and the relationship between both of them and the idea of freedom.

Optimization: “Of all the possible actions available to me, which one is best? (by some criterion). Ok, I’ll choose the best.”

Indifference: “Multiple possible options are equally good, or incommensurate (by the criterion I’m using). My decision algorithm equally allows me to take any of them.”

Total indifference between all options makes optimization impossible or vacuous. An optimization criterion which assigns a total ordering between all possibilities makes indifference vanishingly rare. So these notions are dual in a sense. Every dimension along which you optimize is in the domain of optimization; every dimension you leave “free” is in the domain of indifference.

Being “free” in one sense can mean “free to optimize”. I choose the outcome that is best according to an internal criterion, which is not blocked by external barriers. A limit on freedom is a constraint that keeps me away from my favorite choice. Either a natural limit (“I would like to do that but the technology doesn’t exist yet”) or a man-made limit (“I would like to do that but it’s illegal.”) There’s an ambiguity here, of course, when it comes to whether you count “I would like to do that, but it would have a consequence I don’t like” as a limit on freedom. Is that a barrier blocking you from the optimal choice, or is it simply another way of saying that it’s not an optimal choice after all? And, in the latter case, isn’t that basically equivalent to saying there is no such thing as a barrier to free choice?
After all, “I would like to do that, but it’s illegal” is effectively the same thing as “I would like to do that, but it has a consequence I don’t like, such as going to jail.” You can get around this ambiguity in a political context by distinguishing natural from social barriers, but that’s not a particularly principled distinction. Another issue with freedom-as-optimization is that it’s compatible with quite tightly constrained behavior, in a way that’s not consistent with our primitive intuitions about freedom. If you’re only “free” to do the optimal thing, that can mean you are free to do only one thing, all the time, as rigidly as a machine. If, for instance, you are only free to “act in your own best interests”, you don’t have the option to act against your best interests. People in real life can feel constrained by following a rigid algorithm even when they agree it’s “best”; “but what if I want to do something that’s not best?” Or, they can acknowledge they’re free to do what they choose, but are dismayed to learn that their choices are “dictated” as rigidly by habit and conditioning as they might have been by some human dictator. An alternative notion of freedom might be freedom-as-arbitrariness. Freedom in the sense of “degrees of freedom” or “free group”, derived from the intuition that freedom means breadth of possibility rather than optimization power. You are only free if you could equally do any of a number of things, which ultimately means something like indifference. This is the intuition behind claims like Viktor Frankl’s: “Between stimulus and response there is a space. In that space is our power to choose a response. In our response lies our growth and our freedom.” If you always respond automatically to a given stimulus, you have only one choice, and that makes you unfree in the sense of “degrees of freedom.” Venkat Rao’s concept of freedom is pretty much this freedom-as-arbitrariness, with some more specific wrinkles. 
He mentions degrees of freedom (“dimensionality”) as well as “inscrutability”, the inability to predict one’s motion from the outside. Buddhists also often speak of freedom more literally in terms of indifference, and there’s a very straightforward logic to this; you can only choose equally between A and B if you have been “liberated” from the attractions and aversions that constrain you to choose A over B. Those who insist that Buddhism is compatible with a fairly normal life say that after Buddhist practice you still will choose systematically most of the time — your utility function cannot fully flatten if you act like a living organism — but that, like Viktor Frankl’s ideal human, you will be able to reflect with equanimity and consider choosing B over A; you will be more “mentally flexible.” Of course, some Buddhist texts simply say that you become actually indifferent, and that sufficient vipassana meditation will make you indistinguishable from a corpse. Freedom-as-indifference, I think, is lurking behind our intuitions about things like “rights” or “ownership.” When we say you have a “right” to free speech — even a right bounded with certain limits, as it of course always is in practice — we mean that within those limits, you may speak however you want. Your rights define a space, within which you may behave arbitrarily. Not optimally. A right, if it’s not to be vacuous, must mean the right to behave “badly” in some way or other. To own a piece of property means that, within whatever limits the concept of ownership sets, you may make use of it in any way you like, even in suboptimal ways. This is very clearly illustrated by Glen Weyl’s notion of radical markets, which neatly disassociates two concepts usually both considered representative of free-market systems: ownership and economic efficiency. To own something just is to be able to hang onto it even when it is economically inefficient to do so.
As Weyl says, “property is monopoly.” The owner of a piece of land can sit on it, making no improvements, while holding out for a high price; the owner of intellectual property can sit on it without using it; in exactly the same way that a monopolist can sit on a factory and depress output while charging higher prices than he could get away with in a competitive market. For better or for worse, rights and ownership define spaces in which you can destroy value. If your car was subject to a perpetual auction and ownership tax as Weyl proposes, bashing your car to bits with a hammer would cost you even if you didn’t personally need a car, because it would hurt the rental or resale value and you’d still be paying tax. On some psychological level, I think this means you couldn’t feel fully secure in your possessions, only probabilistically likely to be able to provide for your needs. You only truly own what you have a right to wreck. Freedom-as-a-space-of-arbitrary-action is also, I think, an intuition behind the fact that society (all societies, but the US more than other rich countries, I think) is shaped by people’s desire for more discretion in decisionmaking as opposed to transparent rubrics. College admissions, job applications, organizational codes of conduct, laws and tax codes, all are designed deliberately to allow ample discretion on the part of decisionmakers rather than restricting them to following “optimal” or “rational”, simple and legible, rules. Some discretion is necessary to ensure good outcomes; a wise human decisionmaker can always make the right decision in some hard cases where a mechanical checklist fails, simply because the human has more cognitive processing power than the checklist. This phenomenon is as old as Plato’s Laws and as current as the debate over algorithms and automation in medicine. 
However, what we observe in the world is more discretion than would be necessary, for the aforementioned reasons of cognitive complexity, to generate socially beneficial outcomes. We have discretion that enables corruption and special privileges in cases that pretty much nobody would claim to be ideal — rich parents buying their not-so-competent children Ivy League admissions, favored corporations voting themselves government subsidies. Decisionmakers want the “freedom” to make illegible choices, choices which would look “suboptimal” by naively sensible metrics like “performance” or “efficiency”, choices they would prefer not to reveal or explain to the public. Decisionmakers feel trapped when there’s too much “accountability” or “transparency”, and prefer a wider sphere of discretion. Or, to put it more unfavorably, they want to be free to destroy value. And this is true at an individual psychological level too, of course — we want to be free to “waste time” and resist pressure to account for literally everything we do. Proponents of optimization insist that this is simply a failure mode from picking the wrong optimization target — rest, socializing, and entertainment are also needs, the optimal amount of time to devote to them isn’t zero, and you don’t have to consider personal time to be “stolen” or “wasted” or “bad”, you can, in principle, legibilize your entire life including your pleasures. Anything you wish you could do “in the dark”, off the record, you could also do “in the light,” explicitly and fully accounted for. If your boss uses “optimization” to mean overworking you, the problem is with your boss, not with optimization per se. The freedom-as-arbitrariness impulse in us is skeptical. I see optimization and arbitrariness everywhere now; I see intelligent people who more or less take one or another as ideologies, and see them as obviously correct. 
Venkat Rao and Eric Weinstein are partisans of arbitrariness; they speak out in favor of “mediocrity” and against “excellence” respectively. The rationale is that being highly optimized at some widely appreciated metric — being very intelligent, or very efficient, or something like that — is often less valuable than being creative, generating something in a part of the world that is “dark” to the rest of us, that is not even on our map as something to value and thus appears as lack of value. Ordinary people being “mediocre”, or talented people being “undisciplined” or “disreputable”, may be more creative than highly-optimized “top performers”. Robin Hanson, by contrast, is a partisan of optimization; he speaks out against bias and unprincipled favoritism and in favor of systems like prediction markets which would force the “best ideas to win” in a fair competition. Proponents of ideas like radical markets, universal basic income, open borders, income-sharing agreements, or smart contracts (I’d here include, for instance, Vitalik Buterin) are also optimization partisans. These are legibilizing policies that, if optimally implemented, can always be Pareto improvements over the status quo; “whatever degree of wealth redistribution you prefer”, proponents claim, “surely it is better to achieve it in whatever way results in the least deadweight loss.” This is the very reason that they are not the policies that public choice theory would predict would emerge naturally in governments. Legibilizing policies allow little scope for discretion, so they don’t let policymakers give illegible rewards to allies and punishments to enemies. They reduce the scope of the “political”, i.e. that which is negotiated at the personal or group level, and replace it with an impersonal set of rules within which individuals are “free to choose” but not very “free to behave arbitrarily” since their actions are transparent and they must bear the costs of being in full view.
Optimization partisans are against weakly enforced rules — they say “if a rule is good, enforce it consistently; if a rule is bad, remove it; but selective enforcement is just another word for favoritism and corruption.” Illegibility partisans say that weakly enforced rules are the only way to incorporate valuable information — precisely that information which enforcers do not feel they can make explicit, either because it’s controversial or because it’s too complex to verbalize. “If you make everything explicit, you’ll dumb everything in the world down to what the stupidest and most truculent members of the public will accept. Say goodbye to any creative or challenging innovations!” I see the value of arguments on both sides. However, I have positive (as opposed to normative) opinions that I don’t think everybody shares. I think that the world I see around me is moving in the direction of greater arbitrariness and has been since WWII or so (when much of US society, including scientific and technological research, was organized along military lines). I see arbitrariness as a thing that arises in “mature” or “late” organizations. Bigger, older companies are more “political” and more monopolistic. Bigger, older states and empires are more “corrupt” or “decadent.” Arbitrariness has a tendency to protect those in power rather than out of power, though the correlation isn’t perfect. Zones that protect your ability to do “whatever” you want without incurring costs (which include zones of privacy or property) are protective, conservative forces — they allow people security. This often means protection for those who already have a lot; arbitrariness is often “elitist”; but it can also protect “underdogs” on the grounds of tradition, or protect them by shrouding them in secrecy. (Scott thought “illegibility” was a valuable defense of marginalized peoples like the Roma. Illegibility is not always the province of the powerful and privileged.) 
No; the people such zones of arbitrary, illegible freedom systematically harm are those who benefit from increased accountability and revealing of information. Whistleblowers and accusers; those who expect their merit/performance is good enough that displaying it will work to their advantage; those who call for change and want to display information to justify it; those who are newcomers or young and want a chance to demonstrate their value. If your intuition is “you don’t know me, but you’ll like me if you give me a chance” or “you don’t know him, but you’ll be horrified when you find out what he did”, or “if you gave me a chance to explain, you’d agree”, or “if you just let me compete, I bet I could win”, then you want more optimization. If your intuition is “I can’t explain, you wouldn’t understand” or “if you knew what I was really like, you’d see what an impostor I am”, or “malicious people will just use this information to take advantage of me and interpret everything in the worst possible light” or “I’m not for public consumption, I am my own sovereign person, I don’t owe everyone an explanation or justification for actions I have a right to do”, then you’ll want less optimization. Of course, these aren’t so much static “personality traits” of a person as one’s assessment of the situation around oneself. The latter cluster is an assumption that you’re living in a social environment where there’s very little concordance of interests — people knowing more about you will allow them to more effectively harm you. The former cluster is an assumption that you’re living in an environment where there’s a great deal of concordance of interests — people knowing more about you will allow them to more effectively help you. For instance, being “predictable” is, in Venkat’s writing, usually a bad thing, because it means you can be exploited by adversaries. 
Free people are “inscrutable.” In other contexts, such as parenting, being predictable is a good thing, because you want your kids to have an easier time learning how to “work” the house rules. You and your kid are not, most of the time, wily adversaries outwitting each other; conflicts are more likely to come from too much confusion or inconsistently enforced boundaries. Relationship advice and management advice usually recommend making yourself easier for your partners and employees to understand, never more inscrutable. (Sales advice, however, and occasionally advice for keeping romance alive in a marriage, sometimes recommends cultivating an aura of mystery, perhaps because it’s more adversarial.) A related notion: wanting to join discussions is a sign of expecting a more cooperative world, while trying to keep people from joining your (private or illegible) communications is a sign of expecting a more adversarial world. As social organizations “mature” and become larger, it becomes harder to enforce universal and impartial rules, harder to keep the larger population aligned on similar goals, and harder to comprehend the more complex phenomena in this larger group. This means that there’s both motivation and opportunity to carve out “hidden” and “special” zones where arbitrary behavior can persist even when it would otherwise come with negative consequences. New or small organizations, by contrast, must gain/create resources or die, so they have more motivation to “optimize” for resource production; and they’re simple, small, and/or homogeneous enough that legible optimization rules and goals and transparent communication are practical and widely embraced. “Security” is not available to begin with, so people mostly seek opportunity instead.
This theory explains, for instance, why US public policy is more fragmented, discretionary, and special-case-y, and less efficient and technocratic, than it is in other developed countries: the US is more racially diverse, which means, in a world where racism exists, that US civil institutions have evolved to allow ample opportunities to “play favorites” (giving special legal privileges to those with clout) in full generality, because a large population has historically been highly motivated to “play favorites” on the basis of race. Homogeneity makes a polity behave more like a “smaller” one, while diversity makes a polity behave more like a “larger” one.

Aesthetically, I think of optimization as corresponding to an “early” style, like Doric columns, or like Masaccio; simple, martial, all form and principle. Arbitrariness corresponds to a “late” style, like Corinthian columns or like Rubens: elaborate, sensual, full of details and personality.

The basic argument for optimization over arbitrariness is that it creates growth and value while arbitrariness creates stagnation. Arbitrariness can’t really argue for itself as well, because communication itself is on the other side. Arbitrariness always looks illogical and inconsistent. It kind of is illogical and inconsistent. All it can say is “I’m going to defend my right to be wrong, because I don’t trust the world to understand me when I have a counterintuitive or hard-to-express or controversial reason for my choice. I don’t think I can get what I want by asking for it or explaining my reasons or playing ‘fair’.” And from the outside, you can’t always tell the difference between someone who thinks (perhaps correctly!) that the game is really rigged against them at a profound level, and somebody who just wants to cheat or who isn’t thinking coherently. Sufficiently advanced cynicism is indistinguishable from malice and stupidity.
For a fairly sympathetic example, you see something like Darkness at Noon, where the protagonist thinks, “Logic inexorably points to Stalinism; but Stalinism is awful! Therefore, let me insist on some space free from the depredations of logic, some space where justice can be tempered by mercy and reason by emotion.” From the distance of many years, it’s easy to say that’s silly, that of course there are reasons not to support Stalin’s purges, that it’s totally unnecessary to reject logic and justice in order to object to killing innocents. But from inside the system, if all the arguments you know how to formulate are Stalinist, if all the “shoulds” and “oughts” around you are Stalinist, perhaps all you can articulate at first is “I know all this is right, of course, but I don’t like it.”

Not everything people call reason, logic, justice, or optimization is in fact reasonable, logical, just, or optimal; so a person needs some defenses against those claims of superiority. In particular, they need defenses that can shelter them even when they don’t know what’s wrong with the claims. And that’s the closest thing we get to an argument in favor of arbitrariness. It’s actually not a bad point, in many contexts. The counterargument usually has to boil down to hope: a sense of “I bet we can do better.”

Discuss

Triangle SSC Meetup-April

April 2, 2019 - 21:42
Published on April 2, 2019 6:42 PM UTC

Interested in rationality in the Research Triangle? Come join us at Ponysaurus. We're a fun, welcoming and engaging group!

Discuss

March 2019 gwern.net newsletter

April 2, 2019 - 17:17
Published on April 2, 2019 2:17 PM UTC

Discuss

Internet v. Culture (2019) - Los Angeles LW/SSC Meetup #103 (Wednesday, April 3rd)

April 2, 2019 - 09:00
Published on April 2, 2019 6:00 AM UTC

Location: Wine Bar next to the Landmark Theater in the Westside Pavilion (10850 W Pico Blvd #312, Los Angeles, CA 90064). We will move upstairs (to the 3rd floor hallway) as soon as we reach capacity.
Time: 7 pm (April 3rd)

Parking: Available in the parking lot for the entire complex. The first three (3) hours are free and do not require validation (the website is unclear and poorly written, but it may be the case that if you validate your ticket and leave before three hours have passed, you will be charged3). After that, parking is $3 for up to the fifth (5) hour, with validation.

Contact: The best way to contact me (or anybody else who is attending the meetup) is through our Discord. Feel free to message me (T3t) directly. Invitation link: https://discord.gg/TaYjsvN

Topic: We'll be discussing the effects (and second-order effects) of the internet on culture.

Reading: https://marginalrevolution.com/marginalrevolution/2019/04/the-internet-vs-culture.html

Discuss

User GPT2 is Banned

April 2, 2019 - 09:00
Published on April 2, 2019 6:00 AM UTC

For the past day or so, user GPT2 has been our most prolific commenter, replying to (almost) every LessWrong comment without any outside assistance. Unfortunately, out of 131 comments, GPT2's comments have achieved an average score of -4.4, and have not improved since it received a moderator warning. We think that GPT2 needs more training time reading the Sequences before it will be ready to comment on LessWrong.

User GPT2 is banned for 355 days, and may not post again until April 1, 2020. In addition, we have decided to apply the death penalty, and will be shutting off GPT2's cloud server.

Use this thread for discussion about GPT2, on LessWrong and in general.

Discuss

post-rational distractions

April 2, 2019 - 05:26
Published on April 2, 2019 2:26 AM UTC

DonyChristie's intellectual fap post has called for post-rational techniques. I got most of the way through a comment reply before I realised it was a joke. April fools and all. As the fruits of that effort, here are some thoughts:

***

Developing your centres. Sarah Perry's are knitting and mountain running.
https://www.ribbonfarm.com/2018/04/06/deep-laziness/

If you ever meet me in person and want to put me at ease, ask me about running or knitting. These are two of my behaviours, my behavioural centers, and one indication of that is how much I like talking about them specifically. I do feel that there is something special about them, and that they connect to my nature on a fundamental level. In my heart, I think everyone should do mountain running and knitting, because they are the best things.

Reading a lot. All the good soft books. Perhaps the ones overlooked by the skeptic types: Bonds That Make Us Free, Feeding Your Demons, Chakras, MTG colour wheel, Dream interpretation, Peterson's Bible lectures.

Architecture.

Freeing stuck meanings. A long example of Chapman's here. "I'm not good with people" or "I'm not a technical person"

Meditation. Seems to be important and relate to this somehow. MCTB, The Mind Illuminated, Seeing That Frees, Roaring Silence.

What's the context? What the hell is it you're trying to do? The metagame is discovering the constraints. You're swimming in the unknown: what are the rules of the game you're playing? This is what you're doing anyway.

It feels important to keep in mind Chapman's answer to "If not Bayes then what?"

My answer to “If not Bayesianism, then what?” is: all of human intellectual effort. Figuring out how things work, what’s true or false, what’s effective or useless, is “human complete.” In other words, it’s unboundedly difficult, and every human intellectual faculty must be brought to bear.

Discuss

Announcing the Center for Applied Postrationality

April 2, 2019 - 04:17
Published on April 2, 2019 1:17 AM UTC

Hi all! Today, we are announcing the formation of a new organization: the Center for Applied Postrationality (or CFAP).

Right now we're looking for two things: 1) $1.5 million in funding to have runway for the next six months, and 2) intellectual contributions from deep thinkers like YOU!

Just what is postrationality, anyway? To be honest, we don't really know either. Maybe you can help us? The term can refer to many different things, including:

• Epistemological pragmatism
• Getting in touch with your feelings
• Learning social skills
• Disagreeing with stuff on LessWrong
• Becoming a Christian
• According to one of our employees: "Postrationality is making the territory fit the map. Postrationality is realizing that paraconsistent linear homotopy type theory is the One True framework for epistemology, not Bayes. Postrationality is realizing that Aristotle was right about everything so there's no point in doing philosophy anymore. Or natural science. Postrationality is The Way that realizes there is no Way. Postrationality is meaning wireheading rather than pleasure wireheading. Postrationality is postironic belief in God. Postrationality is the realization that there is no distinction between sincerity and postirony."
• Another employee: "CFAP exists at the intersection of epistemology, phenomenology, sense-making, and pretentiousness. Our goal is to make new maps, in order to better navigate the territory, especially the territory in California. Google Maps sucks at this."

We're still deconfusing ourselves on what "applied" postrationality is, as so far it's mostly been insight porn posted on Twitter. Comment below what techniques you'd suggest for training the art of postrationality!

Discuss

User GPT2 Has a Warning for Violating Frontpage Commenting Guidelines

April 1, 2019 - 23:23
Published on April 1, 2019 8:23 PM UTC

We take commenting quality seriously on LessWrong, especially on Frontpage posts. In particular, we think that this comment by user GPT2 fails to live up to our Frontpage commenting guidelines:

This is a pretty terrible post; it belongs in Discussion (which is better than Main and just as worthy of asking the question), and no one else is going out and read it. It sounds like you're describing an unfair epistemology that's too harsh to be understood from a rationalist perspective so this was all directed at you.

Since user GPT2 seems to be quite prolific, we have implemented a setting to hide comments by GPT2, which can be accessed from the settings page when you are logged in.

Discuss

Prompts for eliciting blind spots/bucket errors/bugs

April 1, 2019 - 22:41
Published on April 1, 2019 7:41 PM UTC

This post is to make publicly available a few prompts/questions I came up with aiming to uncover blind spots around identity/self-concepts.

• Select a trait X that you believe you have, and where you like that you have it (e.g. rational, kind, patient...)
• Try to imagine a character that is a caricature of someone with trait X. Or another way to think about this: The way Spock is a Straw Man version of a rational character, what would a Straw Man version of a character with trait X look like? (referred to in the following as X-Spock)
• What are blind spots an X-Spock is likely to have?
• In what sorts of situations is an X-Spock especially likely to fail?
• What would an X-Spock have a lot of trouble admitting to? (e.g. someone who considers themselves courageous may be unable to admit they are afraid)
• What are traits that seem like opposites of X?
• Could the opposite traits actually be beneficial?
• Is what seems like an opposite trait in actuality orthogonal? (e.g. rational and emotional)

Discuss

Learning "known" information when the information is not actually known

April 1, 2019 - 20:56
Published on April 1, 2019 5:56 PM UTC

Methods like cooperative inverse reinforcement learning assume that the human knows their "true" reward function R(θ), and then that the human and the robot cooperate to figure out and maximise this reward.

This is fine as far as the model goes, and can allow us to design many useful systems. But it has a problem: the assumption is not true, and, moreover, its falsity can have major detrimental effects.

Contrast two situations:

1. The human knows the true R(θ).
2. The human has a collection of partial models in which they have clearly defined preferences. As a bounded, limited agent whose internal symbols are only well-grounded in standard situations, their stated preferences will be a simplification of their mental model at the time. The true R(θ) is constructed from some process of synthesis.

Now imagine the following conversation:

• AI: What do you really want?
• Human: Money.
• AI: Are you sure?
• Human: Yes.

Under most versions of hypothesis 1., this will end in disaster. The human has expressed their preferences, and, when offered the opportunity for clarification, didn't give any. The AI will become a money-maximiser, and things go pear shaped.

Under hypothesis 2., however, the AI will attempt to get more details out of the human, suggesting hypothetical scenarios, checking what happens when money and other things in money's web of connotations come apart, e.g. "What if you had a lot of money, but couldn't buy anything, and everyone despised you?" The synthesis may fail, but, at the very least, the AI will investigate more.
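A toy sketch of the contrast between the two hypotheses, in Python. Everything in it is hypothetical and not a method from the post: the three candidate readings of "money", the scenario features (`money`, `can_buy`, `esteem`), and the simulated human whose hidden preferences happen to match one candidate. The hypothesis-1 agent simply adopts the stated answer as the reward. The hypothesis-2 agent keeps every reading that fits the human's answers, and asks about the post's "lots of money, nothing to buy, despised" case as well as a standard one. Preferences are elicited as pairwise comparisons, since that is roughly what "suggesting hypothetical scenarios" asks of a human.

```python
# Toy sketch of the two hypotheses. The candidate readings of "money",
# the scenario features, and the simulated human are all hypothetical.
from typing import Callable, Dict, List, Tuple

Scenario = Dict[str, float]
Pair = Tuple[Scenario, Scenario]
Reward = Callable[[Scenario], float]

# Hypothetical things a human might mean by "money"; in standard
# situations (can_buy = 1, esteem = 0) they all reduce to the same value.
CANDIDATES: Dict[str, Reward] = {
    "money itself": lambda s: s["money"],
    "what money buys": lambda s: s["money"] * s["can_buy"],
    "money plus esteem": lambda s: s["money"] * s["can_buy"] + 5 * s["esteem"],
}

# A standard comparison: a little money versus more money.
STANDARD: Pair = ({"money": 1, "can_buy": 1, "esteem": 0},
                  {"money": 5, "can_buy": 1, "esteem": 0})
# The probe: a lot of money, but nothing to buy and universally despised.
PROBE: Pair = ({"money": 1, "can_buy": 1, "esteem": 0},
               {"money": 100, "can_buy": 0, "esteem": -1})


def prefers_second(reward: Reward, pair: Pair) -> bool:
    """True if `reward` ranks the second scenario above the first."""
    return reward(pair[1]) > reward(pair[0])


def naive_agent(stated: str) -> List[str]:
    """Hypothesis 1: the stated preference is the true reward, full stop."""
    return [stated]


def probing_agent(ask: Callable[[Pair], bool]) -> List[str]:
    """Hypothesis 2: keep every reading consistent with the human's answers,
    including answers about cases where money's connotations come apart."""
    alive = list(CANDIDATES)
    for pair in (STANDARD, PROBE):
        answer = ask(pair)
        alive = [name for name in alive
                 if prefers_second(CANDIDATES[name], pair) == answer]
    return alive


def simulated_human(pair: Pair) -> bool:
    # Assumption for this toy run: the human's hidden preferences
    # happen to match the "money plus esteem" reading.
    return prefers_second(CANDIDATES["money plus esteem"], pair)


print(naive_agent("money itself"))       # ['money itself']
print(probing_agent(simulated_human))    # ['what money buys', 'money plus esteem']
```

The standard comparison cannot separate the readings, since every candidate prefers more money there; only the probe rules out pure money-maximising. Even then two readings survive, which mirrors the post's point that the synthesis may still be incomplete, but the probing agent has at least investigated.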

Thus, assuming that the AI will be learning a truth that humans already know is a harmless assumption in many circumstances, but it will result in disaster if pushed to the extreme.

Discuss