Вы здесь

Новости LessWrong.com

Подписка на Лента Новости LessWrong.com Новости LessWrong.com
A community blog devoted to refining the art of rationality
Обновлено: 35 минут 16 секунд назад

New York Restaurants I Love: Breakfast

14 февраля, 2019 - 16:10
Published on February 14, 2019 1:10 PM UTC



Discuss

Are there documentaries on rationality?

14 февраля, 2019 - 14:34
Published on February 14, 2019 11:34 AM UTC

I love documentaries, and i wonder if there are any about LessWrong topics, such as rationality/rationalists, epistemology, maybe cognitive science., etc..

Do you know any?



Discuss

Alignment Newsletter #45

14 февраля, 2019 - 05:10
Published on February 14, 2019 2:10 AM UTC

Alignment Newsletter #45 How to extract human preferences from the state of the world View this email in your browser

Find all Alignment Newsletter resources here. In particular, you can sign up, or look through this spreadsheet of all summaries that have ever been in the newsletter.

Highlights

Learning Preferences by Looking at the World (Rohin Shah and Dmitrii Krasheninnikov): The key idea with this project that I worked on is that the state of the world is already optimized for our preferences, and so simply by looking at the world we can infer these preferences. Consider the case where there is a vase standing upright on the table. This is an unstable equilibrium -- it's very easy to knock over the vase so it is lying sideways, or is completely broken. The fact that this hasn't happened yet suggests that we care about vases being upright and intact; otherwise at some point we probably would have let it fall.

Since we have optimized the world for our preferences, the natural approach is to model this process, and then invert it to get the preferences. You could imagine that we could consider all possible reward functions, and put probability mass on them in proportion to how likely they make the current world state if a human optimized them. Basically, we are simulating the past in order to figure out what must have happened and why. With the vase example, we would notice that in any reward function where humans wanted to break vases, or were indifferent to broken vases, we would expect the current state to contain broken vases. Since we don't observe that, it must be the case that we care about keeping vases intact.

Our algorithm, Reward Learning by Simulating the Past (RLSP), takes this intuition and applies it in the framework of Maximum Causal Entropy IRL (AN #12), where you assume that the human was acting over T timesteps to produce the state that you observe. We then show a few gridworld environments in which applying RLSP can fix a misspecified reward function.

Rohin's opinion: In addition to this blog post and the paper, I also wrote a post on the Alignment Forum expressing opinions about the work. There are too many disparate opinions to put in here, so I'd recommend reading the post itself. I guess one thing I'll mention is that to infer preferences with a single state, you definitely need a good dynamics model, and a good set of features. While this may seem difficult to get, it's worth noting that dynamics are empirical facts about the world, and features might be, and there is already lots of work on learning both dynamics and features.

Technical AI alignment   Iterated amplification sequence

Security amplification (Paul Christiano): If we imagine humans as reasoners over natural language, there are probably some esoteric sentences that could cause "failure". For example, maybe there are unreasonably convincing arguments that cause the human to believe something, when they shouldn't have been convinced by the argument. Maybe they are tricked or threatened in a way that "shouldn't" have happened. The goal with security amplification is to make these sorts of sentences difficult to find, so that we will not come across them in practice. As with Reliability amplification (AN #44), we are trying to amplify a fast agent A into a slow agent A* that is "more secure", meaning that it is multiplicatively harder to find an input that causes a catastrophic failure.

You might expect that capability amplification (AN #42) would also improve security, since the more capable agent would be able to notice failure modes and remove them. However, this would likely take far too long.

Instead, we can hope to achieve security amplification by making reasoning abstract and explicit, with the hope that when reasoning is explicit it becomes harder to trigger the underlying failure mode, since you have to get your attack "through" the abstract reasoning. I believe a future post will talk about this more, so I'll leave the details till then. Another option would be for the agent to act stochastically; for example, when it needs to generate a subquestion, it generates many different wordings of the subquestion and chooses one randomly. If only one of the wordings can trigger the failure, then this reduces the failure probability.

Rohin's opinion: This is the counterpoint to Reliability amplification (AN #44) from last week, and the same confusion I had last week still apply, so I'm going to refrain from an opinion.

Problems

Constructing Goodhart (johnswentworth): This post makes the point that Goodhart's Law is so common in practice because if there are several things that we care about, then we are probably at or close to a Pareto-optimal point with respect to those things, and so choosing any one of them as a proxy metric to optimize will cause the other things to become worse, leading to Goodhart effects.

Rohin's opinion: This is an important point about Goodhart's Law. If you take some "random" or unoptimized environment, and then try to optimize some proxy for what you care about, it will probably work quite well. It's only when the environment is already optimized that Goodhart effects are particularly bad.

Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function) (Peter Eckersley) (summarized by Richard): This paper discusses some impossibility theorems related to the Repugnant conclusion in population ethics (i.e. theorems showing that no moral theory simultaneously satisfies certain sets of intuitively desirable properties). Peter argues that in the context of AI it's best to treat these theorems as uncertainty results, either by allowing incommensurate outcomes or by allowing probabilistic moral judgements. He hypothesises that "the emergence of instrumental subgoals is deeply connected to moral certainty", and so implementing uncertain objective functions is a path to making AI safer.

Richard's opinion: The more general argument underlying this post is that aligning AGI will be hard partly because ethics is hard (as discussed here). I agree that using uncertain objective functions might help with this problem. However, I'm not convinced that it's useful to frame this issue in terms of impossibility theorems and narrow AI, and would like to see these ideas laid out in a philosophically clearer way.

Iterated amplification

HCH is not just Mechanical Turk (William Saunders): In Humans Consulting HCH (HCH) (AN #34) a human is asked a question and is supposed to return an answer. The human can ask subquestions, which are delegated to another copy of the human, who can ask subsubquestions, ad infinitum. This post points out that HCH has a free parameter -- the base human policy. We could imagine e.g. taking a Mechanical Turk worker and using them as the base human policy, and we could argue that HCH would give good answers in this setting as long as the worker is well-motivated, since he is using "human-like" reasoning. However, there are other alternatives. For example, in theory we could formalize a "core" of reasoning. For concreteness, suppose we implement a lookup table for "simple" questions, and then use this lookup table. We might expect this to be safe because of theorems that we proved about the lookup table, or by looking at the process by which the development team created the lookup table. In between these two extremes, we could imagine that the AI researchers train the human overseers about how to corrigibly answer questions, and then the human policy is used in HCH. This seems distinctly more likely to be safe than the first case.

Rohin's opinion: I strongly agree with the general point that we can get significant safety by improving the human policy (AN #43), especially with HCH and iterated amplification, since they depend on having good human overseers, at least initially.

Reinforcement Learning in the Iterated Amplification Framework (William Saunders): This post and its comments clarify how we can use reinforcement learning for the distillation step in iterated amplification. The discussion is still happening so I don't want to summarize it yet.

Learning human intent

Learning Preferences by Looking at the World (Rohin Shah and Dmitrii Krasheninnikov): Summarized in the highlights!

Preventing bad behavior

Test Cases for Impact Regularisation Methods (Daniel Filan): This post collects various test cases that researchers have proposed for impact regularization methods. A summary of each one would be far too long for this newsletter, so you'll have to read the post itself.

Rohin's opinion: These test cases and the associated commentary suggest to me that we haven't yet settled on what properties we'd like our impact regularization methods to satisfy, since there are pairs of test cases that seem hard to solve simultaneously, as well as test cases where the desired behavior is unclear.

Interpretability

Neural Networks seem to follow a puzzlingly simple strategy to classify images (Wieland Brendel and Matthias Bethge): This is a blog post explaining the paper Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet, which was summarized in AN #33.

Robustness

AI Alignment Podcast: The Byzantine Generals’ Problem, Poisoning, and Distributed Machine Learning (Lucas Perry and El Mahdi El Mahmdi) (summarized by Richard): Byzantine resilience is the ability of a system to operate successfully when some of its components have been corrupted, even if it's unclear which ones they are. In the context of machine learning, this is relevant to poisoning attacks in which some training data is altered to affect the batch gradient (one example being the activity of fake accounts on social media sites). El Mahdi explains that when data is very high-dimensional, it is easy to push a neural network into a bad local minimum by altering only a small fraction of the data. He argues that his work on mitigating this is relevant to AI safety: even superintelligent AGI will be vulnerable to data poisoning due to time constraints on computation, and the fact that data poisoning is easier than resilient learning.

Trustworthy Deep Learning Course (Jacob Steinhardt, Dawn Song, Trevor Darrell) (summarized by Dan H): This underway course covers topics in AI Safety topics for current deep learning systems. The course includes slides and videos.

AI strategy and policy

How Sure are we about this AI Stuff? (Ben Garfinkel) (summarized by Richard): Ben outlines four broad arguments for prioritising work on superintelligent AGI: that AI will have a big influence over the long-term future, and more specifically that it might cause instability, lock-in or large-scale "accidents". He notes the drawbacks of each line of argument. In particular, the "AI is a big deal" argument doesn't show that we have useful leverage over outcomes (compare a Victorian trying to improve the long-term effects of the industrial revolution). He claims that the next two arguments have simply not been researched thoroughly enough to draw any conclusions. And while the argument from accidents has been made by Bostrom and Yudkowsky, there hasn't been sufficient elaboration or criticism of it, especially in light of the recent rise of deep learning, which reframes many ideas in AI.

Richard's opinion: I find this talk to be eminently reasonable throughout. It highlights a concerning lack of public high-quality engagement with the fundamental ideas in AI safety over the last few years, relative to the growth of the field as a whole (although note that in the past few months this has been changing, with three excellent sequences released on the Alignment Forum, plus Drexler's technical report). This is something which motivates me to spend a fair amount of time writing about and discussing such ideas.

One nitpick: I dislike the use of "accidents" as an umbrella term for AIs behaving in harmful ways unintended by their creators, since it's misleading to describe deliberately adversarial behaviour as an "accident" (although note that this is not specific to Ben's talk, since the terminology has been in use at least since the Concrete problems paper).

Summary of the 2018 Department of Defense Artificial Intelligence Strategy (DOD)

Other progress in AI   Reinforcement learning

The Hanabi Challenge: A New Frontier for AI Research (Nolan Bard, Jakob Foerster et al) (summarized by Richard): The authors propose the cooperative, imperfect-information card game Hanabi as a target for AI research, due to the necessity of reasoning about the beliefs and intentions of other players in order to win. They identify two challenges: firstly, discovering a policy for a whole team that allows it to win (the self-play setting); and secondly, discovering an individual policy that allows an agent to play with an ad-hoc team without previous coordination. They note that successful self-play policies are often very brittle in the ad-hoc setting, which makes the latter the key problem. The authors provide an open-source framework, an evaluation benchmark and the results of existing RL techniques.

Richard's opinion: I endorse the goals of this paper, but my guess is that Hanabi is simple enough that agents can solve it using isolated heuristics rather than general reasoning about other agents' beliefs.

Rohin's opinion: I'm particularly excited to see more work on ad hoc teamwork, since it seems like very similar to the setting we are in, where we would like to deploy AI system among groups of humans and have things go well. See Following human norms (AN #42) for more details.

Read more: A cooperative benchmark: Announcing the Hanabi Learning Environment

A Comparative Analysis of Expected and Distributional Reinforcement Learning (Clare Lyle et al) (summarized by Richard): Distributional RL systems learn distributions over the value of actions rather than just their expected values. In this paper, the authors investigate the reasons why this technique improves results, by training distribution learner agents and expectation learner agents on the same data. They provide evidence against a number of hypotheses: that distributional RL reduces variance; that distributional RL helps with policy iteration; and that distributional RL is more stable with function approximation. In fact, distributional methods have similar performance to expectation methods when using tabular representations or linear function approximators, but do better when using non-linear function approximators such as neural networks (especially in the earlier layers of networks).

Richard's opinion: I like this sort of research, and its findings are interesting (even if the authors don't arrive at any clear explanation for them). One concern: I may be missing something, but it seems like the coupled samples method they use doesn't allow investigation into whether distributional methods benefit from generating better data (e.g. via more effective exploration).

Recurrent Experience Replay in Distributed Reinforcement Learning (Steven Kapturowski et al): See Import AI.

Visual Hindsight Experience Replay (Himanshu Sahni et al)

A Geometric Perspective on Optimal Representations for Reinforcement Learning (Marc G. Bellemare et al)

The Value Function Polytope in Reinforcement Learning (Robert Dadashi et al)

Deep learning

A Conservative Human Baseline Estimate for GLUE: People Still (Mostly) Beat Machines (Nikita Nangia et al) (summarized by Dan H): BERT tremendously improves performance on several NLP datasets, such that it has "taken over" NLP. GLUE represents performance of NLP models across a broad range of NLP datasets. Now GLUE has human performance measurements. According to the current GLUE leaderboard, the gap between human performance and models fine-tuned on GLUE datasets is a mere 4.7%. Hence many current NLP datasets are nearly "solved."

News

Governance of AI Fellowship (Markus Anderljung): The Center for the Governance of AI is looking for a few fellows to work for around 3 months on AI governance research. They expect that fellows will be at the level of PhD students or postdocs, though there are no strict requirements. The first round application deadline is Feb 28, and the second round application deadline is Mar 28.

Copyright © 2019 Rohin Shah, All rights reserved.


Want to change how you receive these emails?
You can update your preferences or unsubscribe from this list.



Discuss

Three Kinds of Research Documents: Clarification, Explanatory, Academic

14 февраля, 2019 - 00:25
Published on February 13, 2019 9:25 PM UTC

Epistemic Status: Low. This was a quick idea, but the grouping honesty doesn't work as well as I'd like. I still think it could be useful to some people though. Ideas appreciated.

Recently I have started writing more and have been trying to be more intentional with what I accomplish. Different documents have different purposes and it seemed useful to help clarify this. Here is a list of three specific different types I think are relevant on LessWrong and similar.

Clarification

I see clarification posts as generally the first instance of information being written down. Here it is important to get the essential ideas out there and to create consensus around terminology among the most interested readers. In some cases, the only interested reader may be the author, who would use the post just to help cement their ideas for themselves.

Clarification posts may not be immediately useful and require later posts or context for them to make sense. This is typically fine. There's often not a rush for them to be understood. In many cases, there is a lot of possible information to write down, so the first step is to ensure it's out there, even if it's slow, hard to read, or doesn't much make sense until later.

I think of many of Paul Christiano's posts as clarification posts. They're very numerous and novel, but quite confusing to many readers (at least, to myself and several people I've talked to). Sometimes the terminology changes from one post to the next. I used to see this is somewhat of a weakness, but now it comes across to me as a pragmatic option. If he were to have tried to make all of this readable to the average LessWrong reader, there's likely no way he could have written a portion as much.

One important point here is that if something is a clarification post, then the main relevant feedback is on the core content, not the presentation. Giving feedback on the readability can still be useful, but it should be understood and expected that this isn't the main goal.

Explanatory

Explanatory posts seek to explain content to people. The focus here is on accessibility. Often the main ideas are already documented somewhere, but the author thinks that they could do a better job explaining them to their intended audience.

I would categorize some of the recent posts on Embedded Agency as being explanatory. Some of them have very nice diagrams and are elegantly laid out. I believe much of the content comes from earlier work that was a lot more fragmented and experimental. Zhukeepa's recent overview of Paul Christiano's work also is a good example.

Academic

Academic documents, as I interpret them, aim to be acceptable to the academic community or considered academic. Some attributes that typically go along with this include:

  • The academic article structure
  • Citations, generally of other academic works
  • Discussion of how work fits in with existing academic literature
  • A high level of rigor and completeness
  • An expectation that the main terms and ideas won't change much
  • PDF formatting

There can definitely be a lot of signaling going on here. Many people see academic seeming articles as substantially more trustworthy and impressive than other works.

That said, I feel like there are some useful attributes to these works besides signaling. For one, it's a format well suited to interfacing with the academic world. Interfacing with the academic world can be quite valuable, especially in domains with substantial academic work. Also, the format has become popular for some valid reasons around robustness and context.

As an example, MIRI's official papers fit into this category.

Academic-oriented posts don't need to be PDFs. I would consider my post on Prediction-Augmented Evaluation Systems to partially be in this category, and several EA Forum posts to partially be in this category (examples here, here, and here.)

There are some documents that do a good job being both "academic" and "explanatory." I think these should be considered a mix of both.

Further Thought

I think the main take away of this post is that some documents exist for the main purpose of clarification, and should be understood as such. I myself currently have a lot of ideas I want to write down and intend to focus on clarification posts for a while.

The distinction between explanatory and academic documents doesn't seem as novel nor as elegant to me. I'd be really curious if readers can post in the comments with improvements on this ontology or better examples.



Discuss

Humans interpreting humans

13 февраля, 2019 - 22:03
Published on February 13, 2019 7:03 PM UTC

.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

In a previous post, I showed how, given certain normative assumptions, one could distinguish agents H for whom anchoring was a bias, from those H′ for which it was a preference.

But agent H′ looks clearly ridiculous - how could anchoring be a bias, it makes no sense. And I agree with that assessment! H′'s preferences make no sense - if we think of it as a human.

Humans model each other in very similar ways

This is another way in which I think we can extract human preferences: using the fact that human models of each other, and self-models, are all incredibly similar. Consider the following astounding statements:

  • If somebody turns red, shouts at you, then punches you in the face, they are probably angry at you.
  • If somebody is drunk, they are less rational at implementing long-term plans.
  • If somebody close to you tells you an intimate secret, then they probably trust you.

Most people will agree with all those statements, to a large extent - including the "somebody" being talked about. But what is going on here? Have I not shown that you can't deduce preferences or rationality from behaviour? It's not like we've put the "somebody" in an FMRI scan to construct their internal model, so how do we know?

The thing is, that natural selection is lazy, and a) different humans use the same type of cognitive machinery to assess each other, and b) individual humans tend to use their own self-assessment machinery to assess other humans. Consequently, there tends to be large agreement between our own internal self-assessment models, our models of other people, other people's models of other people, and other people's self-assessment models of themselves:

This agreement is not perfect, by any means - I've mentioned that it varies from culture to culture, individual to individual, and even within the same individual. But even so, we can add the normative assumption:

  • β: If H is a human and G another human, then G's models of H's preferences and rationality are informative of H's preferences and rationality.

That explains why I said that H was a human, while H′ was not: my model of what a human would prefer in those circumstances was correct for H but not for H′.

Implicit models

Note that this modelling is often carried out implicitly, through selecting the scenarios, and tweaking the formal model, so as to make the agent being assessed more human-like. With many variables to play with, it's easy to restrict to a set that seems to demonstrate human-like behaviour (for example, using almost-rationality assumptions for agents with small action spaces but not for agents with large ones).

There's nothing wrong with this approach, but it needs to be made clear that, when we are doing that, we are projecting our own assessments of human rationality on the agent; we not making "correct" choices as if we were dispassionately improving the hyperparameters of an image recognition program.



Discuss

Anchoring vs Taste: a model

13 февраля, 2019 - 22:03
Published on February 13, 2019 7:03 PM UTC

.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

Here I'll develop my observation that anchoring bias is formally similar to taste based preferences, and develop some more formalism for learning the values/preferences/reward functions of a human.

Anchoring or taste

An agent H (think of them as a simplified human) confronts one of two scenarios:

  • In scenario I, the agent sees a movie scene where someone wonders how much to pay for a bar of chocolate, spins a wheel, and gets either £0.01 or £100. Then H is asked how much they would spend for the same bar of chocolate.

  • In scenario II, the agent sees a movie scene in which someone eats a bar of chocolate, which reveals that the bar has nuts, or doesn't. Then H is asked how much they would spend for the same bar of chocolate.

In both cases, H will spend £1 for the bar (£0.01/no nuts) or £3 (£100/nuts).

We want to say that scenario I is due to anchoring bias, while scenario II is due to taste differences. Can we?

Looking into the agent

We can't directly say anything about H just by their actions, of course - even with simplicity priors. But we can make some assumptions if we look inside their algorithm, and see how they model the situation.

Assume that H's internal structure consists of two pieces: a modeller M and an assessor A. Any input i is streamed to both M and A. Then M can interrogate A by sending an internal variable v, receives another variable in return, and then outputs o.

In pictures, this looks like this, where each variable has been indexed by the timestep at which it is transmitted:

Here the input i1 decomposes in m (the movie) and q (the question). Assume that these variables are sufficiently well grounded that when I describe them ("the modeller", "the movie", "the key variables", and so on), these descriptions mean what they seem to.

So the modeller M will construct a list of all the key variables, and pass these on to the assessor A to get an idea of the price. The price will return in v3, and then M will simply output that value as o4.

A human-like agent

First we'll design H to look human-like. In scenario I the modeller M will pass v2=q to the assessor A - only the question q= "how much is a bar of chocolate worth?" will be passed on (in a real world scenario, more details about what kind of chocolate it is would be included, but let's ignore those details here). The answer v3 will be £1 or £3, as indicated above, dependent on m (which is also an input into A).

In scenario II, the modeller will pass on v2={q,n} where n is a boolean that indicates whether the chocolate contains nuts or not. The response v3 will be £1 if n=0 (false) or £3 if n=1 (true).

Can we now say that anchoring is a bias but the taste of nuts is a preference? Almost, we're nearly there. To complete this, we need to make the normative assumption:

  • α: key variables that are not passed on by M are not relevant to the agent's reward function.

Now we can say that anchoring is a bias (because the variable that changes the assessment, the movie, affects A but is not passed on via M), while taste is likely a preference (because the key taste variable is passed on by M).

A non-human agent

We can also design an H′ with the same behaviour as H, but clearly non-human. For H′, v′2=q in scenario II, while v′2={q,n} is scenario I, where n is a boolean encoding whether the movie-chocolate was bought for £0.01 or for £100.

In that case, α will assess anchoring as a demonstration of preference, while the presence of nuts is clearly an irrational bias. And I'd agree with this assessment - but I wouldn't call H′ a human, for reasons explained here.



Discuss

Individual profit-sharing?

13 февраля, 2019 - 20:58
Published on February 13, 2019 5:58 PM UTC


Here's a sketch of an idea:

  • Design an open-source legal agreement that two people sign.
  • The contract states that each person agrees to give the other 1% of their annual earnings, each year for X years. (Ideally X = several decades; both duration & percentage could be customized)
  • Contract is legally binding; each year both parties pay out to each other.
  • Not exclusive: a person could be in multiple contracts simultaneously (e.g. 5 contracts with 5 friends, sharing in total 5% of their annual earnings).

Two motivations for signing a contract like this:

  1. Diversify one's career & earnings risk by "investing" in admired peers.
  2. Deepen one's relationship with the other signee (signing isn't a thing to be taken lightly); signing signals intimacy & desire to build a longterm relationship with the other person.

Of course there are lots of ways something like this could go awry.

Has anyone heard of people doing something like this?

What are existing mechanisms that do something like this? Examples I've encountered already include marriage (50% profit-sharing indefinitely, at least in the US) and Kibbutzim (100% profit-sharing during one's tour of duty).



Discuss

The RAIN Framework for Informational Effectiveness

13 февраля, 2019 - 15:54
Published on February 13, 2019 12:54 PM UTC

[Epistemic status: Medium-Uncertainty. I've only spent a few days thinking about this, but it seems to fit well for some specific problem so far.]

The following describes one possible framework for understanding the usefulness of different sources of information. It's particularly meant to help value source types such as books, academic articles, blog posts, online comments, and mathematical models. I think it could be a useful starting point but would guess that there are better alternatives upon further deliberation.

The framework factors are robustness, importance, novelty, and accessibility.

Simple Use Cases

There are multiple things this could be useful for, I'm sure most of which I haven't yet considered. For a start, I would hope that it could be used when discussing options on either writing information for others or deciding what materials to encourage.

Some possible discussion quotes relating to this framework
"These blog posts are quite novel, but I think they aren't very robust."
"This video may not be very dense, but it is highly accessible."
"This paper has a lot of equations but they don't seem useful to the point. It's both inaccessible and not robust."
"I think that you can change your paper to make it more accessible without sacrificing any robustness."

Informational Effectiveness

People use informational resources (books, videos, etc), in part, to learn information. There are many important attributes of such resources that will impact the quality and magnitude of such learning. Here I wrap these in the total term "information effectiveness."

Information effectiveness, when most narrowly estimated, is context-specific to an agent or group of readers or writers. It is specific to a set of topics; for instance, a particular article by George about politics may be considered ineffective on the topic of politics, but highly effective in it's revealed information about George's beliefs.

Informational effectiveness could be judged for any quantity of information; an entire book, a "per-page average", a "per-bit average", or similar.

In this document, we focus on "reader information effectiveness", which seeks to understand the effectiveness of information to readers. Similar frameworks could be made for writers; for instance, they may have goals such as persuading readers of specific claims or generating status.

To give a simple example, if you were to read a document that was enjoyable, seemed trustworthy, and became significantly life-changing in a positive way, that would be considered to have high informational effectiveness. If you were to read a boring archaic tome by a highly unreliable author about a topic not at all important to you, then that would be considered to have low information effectiveness. To be clear, this says more about the relationship between yourself and the text than about the text itself; in each case, it's possible other readers could have had very different reactions.

While reader informational effectiveness varies per reader, there are expected to be strong correlations between readers on many dimensions. For example, one article may be highly biased. This may not be a big deal for a reader incredibly well read on the particular author's biases, but would likely be a significant deterrent for most readers. Therefore such an article could be rated as having "low expected informational effectiveness" for a collection of possible readers.

RAIN Factors

The RAIN framework lists four factors that I think may be relatively intuitive, mutually-exclusive, relatively exhaustive, and relatively low in internal correlations. These factors are robustness, importance, novelty, and accessibility.

Hypothetically one could directly calculate the expected value of all information sources on all agents on all tasks, but this would be challenging and may not break down into the most intuitive substructures. This framework may provide a more pragmatic approach.

Robustness

Robustness describes how valid the information understood by a reader would be expected to be. This could mean a few different things in different contexts. If a reader is reading an article expressing several claims, robustness would refer to the expected validity of those claims. If the reader is reading a table of data, robustness would refer to the expected validity of that data. If the reader is reading an article by an obviously biased source, but is reading it for information unimpacted from that bias, then that information can be robust.

Robustness can itself be broken down into further factors.

Verifiability If claims or data are described, can those be easily verified? One way of doing this is by being able to explicitly falsify this information.

Bias (Noise) Based on the author's background, the medium, and the intended audience, can this information be expected to be false, misleading, or selectively chosen to create bias that would be disliked by the reader?

Scrutinization If the information came from a human author, did it go through rounds of scrutinization by unbiased and qualified parties? Will other parties pay attention to it and be able to disprove questionable claims? Even in specific cases where scrutinization itself is not obvious, the threat of it could promote accuracy.

Importance

Importance here is very similar to the Importance attribute of the ITN framework. Information content is important to a reader if it describes information that is highly decision relevant to the reader. This is very similar to it having high "value of information", though it is not constrained to any one decision the reader may be facing. Note that it is possible that information content could be high in importance but still useless; for instance, if the reader already knows all of that information.

Novelty

Information is novel to a learner if that learner does not yet know that information. If the reader does know that information, it would have zero educational value. I believe this is pretty self-evident.

Accessibility

Accessibility covers the relative cost or benefit of obtaining information. Typically learning bears costs, but not always. There are some educational information sources that are highly enjoyable and preferable even if not for the information value; these would be considered to have negative learning cost.

Unlike with the other three primary attributes, accessibility determines both costs and benefits. An unnecessarily difficult-to-read book would probably have readers struggle more per unit learned (a cost), but also have them give up before learning all the available content (a lack of benefit).

As with robustness, accessibility can be broken down further.

Availability Information may not be easily available to many possible learners. This could be because it is behind a paywall, only shared within an exclusive group, or difficult to discover. In cases like video, it may not be available in websites that offer variable speeds. There could also be substantial parts missing.

Understandability Even when information is technically available, it may be difficult for some readers to understand. This could be reader-specific; a technical article may have high understandability, and thus information effectiveness, for some readers, but not others.

Most documents take a lot of time to understand, and then have some expected limit of understanding for a given reader. Both of these cost considerations can be significant and would go under the title of understandability.

Enjoyability If information is strongly unenjoyable, that would count as a cost for the learner. Enjoyment could come from many traits such as simplicity, elegance, low required mental effort, and humor. There could also be personally beneficial factors such as reinforcing the learner's identity or making them feel intelligent.

Compactness Compactness describes the density of relevant information.

Common Trade-Offs

I think there are some common correlations between the four factors, and that these come about for different reasons.

Robustness vs. Accessibility

Some common ways of making information sources more robust include things that would make them less generally accessible.

High-Robustness, Low-Accessibility Example Technical papers with lots of proofs, citations, and carefully described terminology.

Low-Robustness, High-Accessibility Example Short emotionally-charged opinion pieces.

Importance vs. Accessibility

People generally seem to like it when information is useful to them, but on the other hand, the most accessible information for them is generally not the most important.

High-Importance, Low-Accessibility Example Facts involving difficult truths. For a group at war, this could be, "You are very likely to lose, and if you really should surrender immediately."

Low-Importance, High-Accessibility Example Writings about the lives of cultural celebrities.

Robustness vs. Novelty

When information is not novel, the learner would have a greater ability to validate it against their existing knowledge. Also, if one believes there is generally a much wider variety of false information than true information, then on average the false information would be more novel.

High-Robustness, Low-Novelty Example Scientific statements that can be reasonably verified, because almost all are already known well by the readers.

Low-Robustness, High-Novelty Example Sophisticated conspiracy theories complete new to the readers but very unlikely to be true.

Accessibility vs. Novelty

If information is not at all novel it may be boring, which would reduce accessibility. On the other hand, if it is too novel, it may be mentally challenging to process, also reducing accessibility.

High-Accessibility, Low-Novelty Example A movie that the viewers have seen before but still enjoy. They don't have to struggle to follow it because they already know it well.

Low-Accessibility, High-Novelty Example A 120-minute, highly-dense academic seminar on a very new topic to the audience.

Density vs. Accessibility

This is similar to the accessibility/novelty tradeoff. Very dense and very sparse information is typically low in accessibility.

High-Density, Low-Accessibility Example A dense math logic textbook with derivations but very few explanations.

Low-Density, High-Accessibility Example An extensive video series on a relatively simple subject.

Future Work

The current framework is not tied to any specific mathematical model. I think that one is possible, though it may not map 1-1 with the accessibility term specifically.

It would be interesting to attempt to provide rubrics or quantifications for each factor. I'd also be interested, of course, in applying this framework in different ways to various available information sources.

For any specific in-progress informational work, there would be an effective "Pareto frontier" of RAIN factors. Understanding how to weight these factors for future works could be quite useful.



Discuss

On Long and Insightful Posts

13 февраля, 2019 - 07:59
Published on February 13, 2019 3:52 AM UTC

Concise articles are more constructive because they are easier to refute.



Discuss

Layers of Expertise and the Curse of Curiosity

13 февраля, 2019 - 02:41
Published on February 12, 2019 11:41 PM UTC

Epistemic status: oversimplification of a process I'm confident about; meant as proof of concept.

Related to: Double-Dipping in Dunning-Kruger

Expertise comes in different, mostly independent layers. To illustrate them, I will to describe the rough process of a curious mind discovering a field of study.

Discovery

In the beginning, the Rookie knows nothing. They have no way to tell what's true or false in the field. Anything they say about it will probably be nonsense, or at best, not better than chance.

Consider a child discovering astronomy. They know the Sun and the Moon move in the sky, that other planets and stars exist, but they wonder about the mysterious domain of space. They open a book, or watch a few videos, and their first discoveries are illuminating. The Moon goes around the Earth which goes around the Sun, the other stars are very very far. Everything makes sense, because beginner material is designed to make sense.

The basic facts are overwhelming. They feel so valuable and wondrous that they have to be shared with other children. They know nothing! The knowledge gap is so large that the enlightened child is viewed as an Expert, and for a while the little explorer does feel like one.

However, the child is still a Rookie. They start talking about how planets go in perfect circles around the Sun, that there's nothing but interstellar space beyond Pluto except maybe comets, because introductory material is fuzzy on the details. The child may be overconfident, until someone more educated points out the mistake. Then Curiosity kicks in.

Learning iteration

When discovering gaps in their knowledge, one with a curious mind will strive to fill them. They will seek new material, kind teachers, and if they're lucky they'll learn more and more. This is the first layer of expertise: accumulation of true facts. Repeatedly, they will be confronted with their own ignorance: for each new shard of knowledge they reveal, dozens appear still shrouded. Every time they think they've exhausted the field, an unexpected complexity will show them wrong.

At some point they will internalize the pattern: the field is deep, and full of more details than they can learn in a lifetime. They will be cautious about their learning process, acknowledging they may be wrong, that their models of reality aren't perfect, that they don't know all there is to know about their field. This is the second layer of expertise: realization of one's limitations.

Faced with the ever-incomplete nature of their discoveries, the curious mind will still learn, and eventually hit against the open problems of their field. Suddenly, reaching new knowledge is much more expensive. The frontier is full of conjectures, uncertainties and gaps. Venturing outside the well-studied questions comes with the risk of accidentally spouting nonsense. We can't have that! After learning so much, making Rookie mistakes would be unforgivable, wouldn’t it?

Underconfident experts

A failure mode appears when an Expert confuses:

  1. knowing all there is to know about X;
  2. knowing all that is currently known about X;
  3. knowing more about X than almost any non-Expert.

An Expert may be very familiar with their own limits, but they may forget how far they have pushed them. They may consider the solutions of actually unsolved problems as something they should know. They may look at other Experts, lament they aren't as knowledgeable as them on certain details, downplay how much they actually know, and underestimate the quality of their own advice since it's not perfect.

This happens if the Expert has no way to figure out their own level. Sure, you can teach the basic facts to Rookies, but any Intermediate can do that, right? Maybe you can even teach advanced stuff to Intermediates, but you feel like you have to point out everything that you can't teach because you don't know everything, and surely a True Expert should know this better than you...

What's missing is the third layer of expertise: evaluation of your own competence. The value of your expertise is mostly relative to the state of the art. For instance, any chemistry undergrad knows more about radioactivity than Marie Curie ever did. Yet she was the leading expert of her time, made immense contributions to the field, and still died following what we consider today a Rookie mistake (high doses of radiation are bad for your health).

One of the upsides of PhD graduation is that you get explicit confirmation by your peers that you're the leading expert (or nearly) on your chosen topic. This is sometimes hard to accept. Research comes with a lot of failures, and it takes time to internalize that an Expert is allowed to fail so often. However, this problem is no longer limited to academic settings.

Validation scarcity

The curious mind today is in luck. Vast swathes of knowledge are available on the Internet. You can binge-read your favorite encyclopedia, specialized blogs, follow scientists on Twitter, dive into arXiv if you're driven enough.

However, easy access to knowledge doesn't help you reach the third layer of Expertise. You may learn as much as you can, and figure out your limitations, yet Acknowledged Experts won't have time to validate you. Online courses will give you some sort of diploma, but you're not sure it's as valuable as college degrees, and you heard that even those are mostly signaling something other than Expertise.

Increasing the supply of knowledge creates lots of learners, who need validation. However, this demand for validation increases much slower than its supply, making it harder to get. Worse, overconfident learners won’t hesitate to post nonsense online, drowning the competent voices and misleading other confused learners. Only the Experts will be able to tell them apart, and they don’t have enough time for everyone!

The amount of Expert attention remaining equal, or growing slowly, the underconfident will fail to find validation and join the ranks of Experts, while the overconfident will not get corrected, hindering their progress.

Hence the curse of Curiosity: the more accessible knowledge is, the harder it is to ensure (and signal) you got it right.

Teaching shift

The above assumes that Experts are confident enough of their own level to be able to evaluate others in the field. This is the fourth layer of expertise: peer judgment. The ability to provide feedback, to point out someone else’s mistakes and progress, to keep Curiosity in the right direction.

Since this fourth layer is interactive by definition, there will be a signal of achievement, of understanding, some kind of proof that an Expert will evaluate. This could be sample problems to solve, a performance to give, an elaborate project to craft. It must be hard to fake, and quick to recognize. However, this signal is reliable only if it’s endorsed by Experts themselves. You can very well have a token degree for having completed an online course, but no assurance that it truly validates your expertise.

Part of teaching is making sure your students understand you and actually learn. Each field has its own methods to differentiate genuine understanding from guessing the teacher’s password.

As quality sources of knowledge get shared and incrementally refined, the value created by teachers shifts to evaluation rather than basic transmission. The curse of Curiosity entails that validation is scarce, and there is more to gain by designing proper tests of expertise, better rewards for curiosity, than by adding to the heap of available facts or making a lesson slightly clearer.

Exploration

One can excel in the first four layers of expertise: knowing lots of things, being aware of what you don’t know, of how much is currently known in general, and how much you and anyone else do know relatively to each other. This includes being able to show your skill, but with those layers alone, you’re ultimately limited to the state of the art.

Actively trying to figure out what isn’t yet known is the fifth layer of expertise: novel research. The previous layers aren’t a prerequisite. You could discover something new about a field without knowing much about it, but you shouldn’t count on it, and you may not even notice your lucky push of the boundaries of knowledge.

The curse of Curiosity doesn’t affect, strictly speaking, the fifth layer. You can attain by yourself a level high enough to do productive research, without needing peer validation. However, you don’t want to waste time on explorations that an Expert would recognized as confused or futile, and you may be underconfident that you’re an Expert yourself.

You don’t need a license to do great things. Still, you need to be reasonably confident that your efforts have a positive expected value, to stay motivated. As a corollary to the curse of Curiosity: self-confidence being harder to attain means there’s a fast-growing pool of unaware Experts, perfectly capable of doing productive research but believing they can’t.

I would argue this is a neglected problem.

Underlying assumptions

The above reasoning rests on the hypothesis (among others) that current expert validation supply scales roughly linearly with the existing number of Experts, as if each of them had a bounded amount of time to assess each piece of work, by grading, reviewing or otherwise producing valuable feedback.

In particular, we assume there isn’t any validation method that: (a) scales well with the number of Rookies, (b) is hard enough to fake to constitute a reliable validation signal.

This assumption doesn’t hold for domains where there are cheap ways to test predictions. Anyone can test their own expertise in intuitive ballistics by throwing balls; anyone can test their fluency in basic arithmetic by checking their results against a calculator. We assume that for most domains, the vast majority of validation is done by peers; only the foremost experts have the resources to test brand new predictions, which is where validation bottoms out; all other experts are either playing catch-up, or doing something other than original research.

The model also overlooks the gradual nature of expertise, as domain practitioners aren’t neatly separated between Experts and non-Experts. I posit that the curse of Curiosity holds anyway at every level of expertise, i.e. that the more accessible Nth-level knowledge is, the harder it is to find above-Nth-level validation. This position is stronger than the original formulation, and I’m slightly less confident about it.

What to expect from this model

As stated above, the curse of Curiosity implies a fast-growing pool of knowledgeable apparent Rookies, which aren’t recognized (and don’t consider themselves) as Experts. In other words, there’s a talent overhang, where a sudden improvement in validation methods would unlock a flood of previously hidden competent people.

I expect that the rise of average proficiency, in most academic domains, in the general population, is no longer constrained by knowledge scarcity, but by validation scarcity. Therefore, greater educative value would be created by better tests, better credentials, or easier access to experts, rather than clearer textbooks or wider diffusion of courses.

As an aside, I plan to clarify further my mental models of expertise, based on the five layers described above. I also hope to find more ideas related to scalable validation.

Thanks to gjm and other kind proofreaders for their feedback!



Discuss

Nuances with ascription universality

13 февраля, 2019 - 02:38
Published on February 12, 2019 11:38 PM UTC

Part of this post is based on points made by Scott Garrabrant and Abram Demski at a recent AI safety workshop put on by Alex Zhu.

Paul Christiano recently published a sequence of posts on a concept he calls "ascription universality." I highly recommend reading the full sequence of posts, though I will do my best to summarize the basic concept here. As I see it, ascription universality can be seen as a formalization of transparency/honesty: a system is ascription universal if, relative to our current epistemic state, its explicit beliefs contain just as much information as any other way of ascribing beliefs to it.

Of particular interest to this post is the last part—what does "any other way of ascribing beliefs" mean? Paul points this out as one of the most informal steps in his definition of ascription universality. In this post, I want to try and point out one way in which I think this process could go wrong.

First, one of the most obvious ways in which a system could fail to be ascription universal is if it is doing some sort of internal cognition which is at odds with its stated beliefs—if an AI system is attempting to deceive its programmers, for example. Such a system would fail to be ascription universal because there would be a way of ascribing beliefs to the internal cognition it is hiding from its programmers that would give the programmers information they currently don't have and can't get purely from the system's explicit output.

However, consider the following alternative system: take the system from the previous example and "memoize" it, replacing it with a simple lookup table that always outputs what the system from the previous example would have output. Is this system ascription universal? From a moral standpoint, we certainly don't want it to be, since it's still taking the same deceptive actions. But if your concept of "ascribing beliefs" only considers ways of looking at the system's internal cognition, you might think that this system is ascription universal and thus erroneously conclude that it is safe.

Is this likely to ever be an actual problem for a real-world AI system? I argue yes. Certainly, we don't expect our AI systems to be massive lookup tables. However, there is a sense in which, if we are only looking at an agent's internal cognition to determine whether it is ascription universal or not, we will miss a huge part of the optimization that went into producing that system's output: namely, that of the training process. For example, a training process might produce an algorithm which isn't itself performing any internal optimization—but rather doing something akin to the lookup table in the second example—that is nevertheless unsafe because the entries put into that lookup table by the training process are unsafe. One realistic way in which something like this could happen is through distillation. If a powerful amplified system is set upon the task of distilling another system, it might produce something akin to the unsafe lookup table in the second example.

This is not necessarily a difficult concern to address in the sense of making sure that any definition of ascription universality includes some concept of ascribing beliefs to a system by looking at the beliefs of any systems that helped create that system. However, if one ever wants to produce a practical guarantee of ascription universality for a real-world system, this sort of concern could end up causing lots of problems. For example, even if machine learning transparency research progresses to a point where all the internals of an AI system can be made transparent, that might not be enough to guarantee ascription universality if the training process that produced the system can't also be made similarly transparent.



Discuss

Learning preferences by looking at the world

13 февраля, 2019 - 01:25
http://bair.berkeley.edu/blog/assets/preferences/room.png

Functional silence: communication that minimizes change of receiver's beliefs

13 февраля, 2019 - 00:32
Published on February 12, 2019 9:32 PM UTC

I just thought of this concept and I'd like to know if it is new or useful.

"Functional silence" is communication intended to carry as little information as possible. It intends for the beliefs of listeners to be changed as little as possible.

This is usually the same as regular silence. The difference is in cases where regular silence would convey the wrong message, such as when asked a question that one cannot politely refuse to answer, when unwantedly prompted to give a statement or when padding a short message because more words are expected. In these cases, a range of techniques is commonly used:

  • Boilerplate statements. This is prepared words that have little or no relation to the subject at hand but are unobjectionable and usually express or imply an expectation of mutual respect or a mission statement.
  • Distraction / subject change. This will usually be towards a superficially related subject. For example, many politicians when asked to give a prediction will instead talk about what they want to happen. Failing that, humor is powerful here. A switch to a shiny subject or an applause light works much better than a switch to a boring subject.
  • Incomprehensibility. Use of lots of complicated words and allusions in quick succession in order to overload the listener's comprehension. Delivered with a friendly-seeming air of expectation that the listener obviously gets all of this, so the listener cannot ask without losing face. A classic salesman's technique.
  • Repetition. Saying the same thing again in different words. This is most vacuous if it repeats something already said in the same conversation, but anything the listener already knows the speaker has said will do.
  • Tautology. Statements that boil down to "if A is the case, then A is the case" without any explicit statement on whether A is the case. Sounds dumb when boiled down like this, but this is shockingly effective at convincing people that actual information has been shared.
  • The median response. Expresses the most average (hence predictable, boring) thing that average people say in this situation. This does transport a bit of information, but will probably at least not be remembered.

These seem different and there are surely others I've forgotten, but I'd like to call all of them "functional silence" in order to emphasize how they all do the same thing. They all intend to minimize the sharing of information, while denying that intent.

"Functional silence" is inherently deceptive and probably a Dark Art. Excuses might be made for it in some situations, where it maintains a relationship despite a lack of trust or it covers up an uncomfortable lull in a conversation, but my intuition is that it is just Wrong. It is easily found, especially in political and corporate communication, and lots of other places once you look for it. But because denying its own nature is inherently part of it, and because I didn't know a word for it, I have found it hard to nail down. So I thought maybe having a word for it would help? Helps me, anyway.

I should say that I really think of functional Schweigen, but Schweigen is the German word for choosing not to speak, which English (being a clearly inferior language) doesnt really have a word for.



Discuss

Maintenance Warning: LessWrong slower than usual for next few hours

13 февраля, 2019 - 00:17
Published on February 12, 2019 9:17 PM UTC

We are deploying a new version of the site which has a bunch of new features that required some changes to our database schemas, and are running some migrations scripts for that today. These will likely slow down our database, and might result in some timeouts or janky behavior.

No data should be lost in any case, but you might experience a few minutes of content you write not showing up, while we are in between the two new versions. Sorry for the inconvenience.



Discuss

Arguments for moral indefinability

12 февраля, 2019 - 13:40
Published on February 12, 2019 10:40 AM UTC

Epistemic status: I endorse the core intuitions behind this post, but am only moderately confident in the specific claims made.

Moral indefinability is the term I use for the idea that there is no ethical theory which provides acceptable solutions to all moral dilemmas, and which also has the theoretical virtues (such as simplicity, precision and non-arbitrariness) that we currently desire. I think this is an important and true perspective on ethics, and in this post will explain why I hold it, with the caveat that I'm focusing more on airing these ideas than constructing a watertight argument.

Here’s another way of explaining moral indefinability: let’s think of ethical theories as procedures which, in response to a moral claim, either endorse it, reject it, or do neither. Moral philosophy is an attempt to find the theory whose answers best match our intuitions about what answers ethical theories should give us (e.g. don’t cause unnecessary suffering), and whose procedure for generating answers best matches our meta-level intuitions about what ethical theories should look like (e.g. they should consistently apply impartial principles rather than using ad-hoc, selfish or random criteria). None of these desiderata are fixed in stone, though - in particular, we sometimes change our intuitions when it’s clear that the only theories which match those intuitions violate our meta-level intuitions. My claim is that eventually we will also need to change our meta-level intuitions in important ways, because it will become clear that the only theories which match them violate key object-level intuitions. In particular, this might lead us to accept theories which occasionally evince properties such as:

  • Incompleteness: for some claim A, the theory neither endorses nor rejects either A or ~A, even though we believe that the choice between A and ~A is morally important.
  • Vagueness: the theory endorses an imprecise claim A, but rejects every way of making it precise.
  • Contradiction: the theory endorses both A and ~A (note that this is a somewhat provocative way of framing this property, since we can always add arbitrary ad-hoc exceptions to remove the contradictions. So perhaps a better term is arbitrariness of scope: when we have both a strong argument for A and a strong argument for ~A, the theory can specify in which situations each conclusion should apply, based on criteria which we would consider arbitrary and unprincipled. Example: when there are fewer than N lives at stake, use one set of principles; otherwise use a different set).

Why take moral indefinability seriously? The main reason is that ethics evolved to help us coordinate in our ancestral environment, and did so not by giving us a complete decision procedure to implement, but rather by ingraining intuitive responses to certain types of events and situations. There were many different and sometimes contradictory selection pressures driving the formation of these intuitions - and so, when we construct generalisable principles based on our intuitions, we shouldn't expect those principles to automatically give useful or even consistent answers to very novel problems. Unfortunately, the moral dilemmas which we grapple with today have in fact "scaled up" drastically in at least two ways. Some are much greater in scope than any problems humans have dealt with until very recently. And some feature much more extreme tradeoffs than ever come up in our normal lives, e.g. because they have been constructed as thought experiments to probe the edges of our principles.

Of course, we're able to adjust our principles so that we are more satisfied with their performance on novel moral dilemmas. But I claim that in some cases this comes at the cost of those principles conflicting with the intuitions which make sense on the scales of our normal lives. And even when it's possible to avoid that, there may be many ways to make such adjustments whose relative merits are so divorced from our standard moral intuitions that we have no good reason to favour one over the other. I'll give some examples shortly.

A second reason to believe in moral indefinability is the fact that human concepts tend to be open texture: there is often no unique "correct" way to rigorously define them. For example, we all know roughly what a table is, but it doesn’t seem like there’s an objective definition which gives us a sharp cutoff between tables and desks and benches and a chair that you eat off and a big flat rock on stilts. A less trivial example is our inability to rigorously define what entities qualify as being "alive": edge cases include viruses, fires, AIs and embryos. So when moral intuitions are based on these sorts of concepts, trying to come up with an exact definition is probably futile. This is particularly true when it comes to very complicated systems in which tiny details matter a lot to us - like human brains and minds. It seems implausible that we’ll ever discover precise criteria for when someone is experiencing contentment, or boredom, or many of the other experiences that we find morally significant.

I would guess that many anti-realists are sympathetic to the arguments I’ve made above, but still believe that we can make morality precise without changing our meta-level intuitions much - for example, by grounding our ethical beliefs in what idealised versions of ourselves would agree with, after long reflection. My main objection to this view is, broadly speaking, that there is no canonical “idealised version” of a person, and different interpretations of that term could lead to a very wide range of ethical beliefs. I explore this objection in much more detail in this post. (In fact, the more general idea that humans aren’t really “utility maximisers”, even approximately, is another good argument for moral indefinability.) And even if idealised reflection is a coherent concept, it simply passes the buck to your idealised self, who might then believe my arguments and decide to change their meta-level intuitions.

So what are some pairs of moral intuitions which might not be simultaneously satisfiable under our current meta-level intuitions? Here’s a non-exhaustive list - the general pattern being clashes between small-scale perspectives, large-scale perspectives, and the meta-level intuition that they should be determined by the same principles:

  • Person-affecting views versus non-person-affecting views. Small-scale views: killing children is terrible, but not having children is fine, even when those two options lead to roughly the same outcome. Large-scale view: extinction is terrible, regardless of whether it comes about from people dying or people not being born.
  • The mere addition paradox, aka the repugnant conclusion. Small-scale view: adding happy people can only be an improvement. Large-scale view: a world consisting only of people whose lives are barely worth living is deeply suboptimal. (Note also Arrhenius' impossibility theorems, which show that you can't avoid the repugnant conclusion without making even greater concessions).
  • Weighing theories under moral uncertainty. I personally find OpenPhil's work on cause prioritisation under moral uncertainty very cool, and the fundamental intuitions behind it seem reasonable, but some of it (e.g. variance normalisation) has reached a level of abstraction where I feel almost no moral force from their arguments, and aside from an instinct towards definability I'm not sure why I should care.
  • Infinite and relativistic ethics. Same as above. See also this LessWrong post arguing against applying the “linear utility hypothesis” at vast scales.
  • Whether we should force future generations to have our values. On one hand, we should be very glad that past generations couldn't do this. But on the other, the future will probably disgust us, like our present would disgust our ancestors. And along with "moral progress" there'll also be value drift in arbitrary ways - in fact, I don't think there's any clear distinction between the two.

I suspect that many readers share my sense that it'll be very difficult to resolve all of the dilemmas above in a satisfactory way, but also have a meta-level intuition that they need to be resolved somehow, because it's important for moral theories to be definable. But perhaps at some point it's this very urge towards definability which will turn out to be the weakest link. I do take seriously Parfit's idea that secular ethics is still young, and there's much progress yet to be made, but I don't see any principled reason why we should be able to complete ethics, except by raising future generations without whichever moral intuitions are standing in the way of its completion (and isn't that a horrifying thought?). From an anti-realist perspective, I claim that perpetual indefinability would be better. That may be a little more difficult to swallow from a realist perspective, of course. My guess is that the core disagreement is whether moral claims are more like facts, or more like preferences or tastes - if the latter, moral indefinability would be analogous to the claim that there’s no (principled, simple, etc) theory which specifies exactly which foods I enjoy.

There are two more plausible candidates for moral indefinability which were the original inspiration for this post, and which I think are some of the most important examples:

  • Whether to define welfare in terms of preference satisfaction or hedonic states.
  • The problem of "maximisation" in utilitarianism.

I've been torn for some time over the first question, slowly shifting towards hedonic utilitarianism as problems with formalising preferences piled up. While this isn't the right place to enumerate those problems (see here for a previous relevant post), I've now become persuaded that any precise definition of which preferences it is morally good to satisfy will lead to conclusions which I find unacceptable. After making this update, I can either reject a preference-based account of welfare entirely (in favour of a hedonic account), or else endorse a "vague" version of it which I think will never be specified precisely.

The former may seem the obvious choice, until we take into account the problem of maximisation. Consider that a true (non-person-affecting) hedonic utilitarian would kill everyone who wasn't maximally happy if they could replace them with people who were (see here for a comprehensive discussion of this argument). And that for any precise definition of welfare, they would search for edge cases where they could push it to extreme values. In fact, reasoning about a "true utilitarian" feels remarkably like reasoning about an unsafe AGI. I don't think that's a coincidence: psychologically, humans just aren't built to be maximisers, and so a true maximiser would be fundamentally adversarial. And yet many of us also have strong intuitions that there are some good things, and it's always better for there to be more good things, and it’s best if there are most good things.

How to reconcile these problems? My answer is that utilitarianism is pointing in the right direction, which is “lots of good things”, and in general we can move in that direction without moving maximally in that direction. What are those good things? I use a vague conception of welfare that balances preferences and hedonic experiences and some of my own parochial criteria - importantly, without feeling like it's necessary to find a perfect solution (although of course there will be ways in which my current position can be improved). In general, I think that we can often do well enough without solving fundamental moral issues - see, for example, this LessWrong post arguing that we’re unlikely to ever face the true repugnant dilemma, because of empirical facts about psychology.

To be clear, this still means that almost everyone should focus much more on utilitarian ideas, like the enormous value of the far future, because in order to reject those ideas it seems like we’d need to sacrifice important object- or meta-level moral intuitions to a much greater extent than I advocate above. We simply shouldn’t rely on the idea that such value is precisely definable, nor that we can ever identify an ethical theory which meets all the criteria we care about.



Discuss

Triangle SSC Meetup-February

12 февраля, 2019 - 06:07
Published on February 12, 2019 3:07 AM UTC

Come join us at Bull City Ciderworks. We're a friendly, easygoing crowd :)



Discuss

Would I think for ten thousand years?

11 февраля, 2019 - 22:37
Published on February 11, 2019 7:37 PM UTC

Some AI safety ideas delegate key decisions to our idealised selves. This is sometimes phrased as "allowing versions of yourself to think for ten thousand years", or similar sentiments.

Occasionally, when I've objected to these ideas, it's been pointed out that any attempt to construct a safe AI design would involve a lot of thinking, so therefore there can't be anything wrong with delegating this thinking to an algorithm or an algorithmic version of myself.

But there is a tension between "more thinking" in the sense of "solve specific problems" and in the sense of "change your own values".

An unrestricted "do whatever a copy of Stuart Armstrong would have done after he thought about morality for ten thousand years" seems to positively beg for value drift (worsened by the difficulty in defining what we mean by "a copy of Stuart Armstrong [...] thought [...] for ten thousand years").

A more narrow "have ten copies of Stuart think about these ten theorems for a subjective week each and give me a proof or counter-example" seems much safer.

In between those two extremes, how do we assess the degree of value drift and its potential importance to the question being asked? Ideally, we'd have a theory of human values to help distinguish the cases. Even without that, we can use some common sense on issues like length of thought, nature of problem, bandwidth of output, and so on.



Discuss

"Normative assumptions" need not be complex

11 февраля, 2019 - 22:03
Published on February 11, 2019 7:03 PM UTC

.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} .mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B {font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}

I've shown that, even with simplicity priors, we can't figure out the preferences or rationality of a potentially irrational agent (such as a human H).

But we can get around that issue with 'normative assumptions'. These can allow us to zero in on a 'reasonable' reward function RH.

We should however note that:

  • Even if RH is highly complex, a normative assumption need not be complex to single it out.

This post gives an example of that for general agents, and discusses how a similar idea might apply to the human situation.

Formalism

An agent takes actions (A) and gets observations (O), and together these form histories, with H the set of histories (I won't present all the details of the formalism here). The policies Π={π:H→A} are maps from histories to actions. The reward functions R={R:H→R} are maps from histories H to real numbers), and the planners P={p:R→Π} are maps from reward functions to policies.

By observing an agent, we can deduce (part of) their policy π. Then a reward-planner pair (p,R) is compatible with π if p(R)=π. Further observations cannot distinguish between different compatible pairs.

Then a normative assumption α is something that distinguishes between compatible pairs. It could be a prior on P×R, or an assumption of full rationality (which removes all-but-the-rational planner from P), or something that takes in more details about the agent or the situation.

Assumptions that use a lot of information

Assume that the agent's algorithm π is written in some code, as Cπ, and that α will have access to this. Then suppose that α scans Cπ, looking for the following: an object CR that takes a history as an input and has a real number as an output, an object Cp that takes CR and a history as inputs, and outputs an action, and a guarantee that Cπ chooses actions by running Cp on CR and the input history.

The α need not be very complex to do that job. Because of rice's theorem and obfuscated code, it will be impossible for α to check those facts in general. But, for many examples of Cπ, it will be able to check that those things hold. In that case, let α return R; otherwise, let it return the trivial 0 reward.

So, for a large set S of possible algorithms, α can return a reasonable reward function estimate. Even if the complexity of Cπ and R is much, much higher than the complexity of α itself, there are still examples of these where α can successfully identity the reward function.

Of course, if we run α on a human brain, it would return 0. But what I am looking for is not α, but a more complicate αH, that, when run on the set SH of human agents, will extract some 'reasonable' RH. It doesn't matter what αH does when run on non-human agents, so we can load it with assumptions about how humans work. When I talk about extracting preferences through looking at internal models, this is the kind of thing I had in mind (along with some method for synthesising those preferences into a coherent whole).

So, though my desired αH might be complex, there is no a priori reason to think that it need be as complex as the RH output.



Discuss

Why do you reject negative utilitarianism?

11 февраля, 2019 - 21:28
Published on February 11, 2019 3:38 PM UTC

(Crossposted on the EA Forum)

Absolute negative utilitarianism (ANU) is a minority view despite the theoretical advantages of terminal value monism (suffering is the only thing that motivates us “by itself”) over pluralism (there are many such things). Notably, ANU doesn’t require solving value incommensurability, because all other values can be instrumentally evaluated by their relationship to the suffering of sentient beings, using only one terminal value-grounded common currency for everything.

Therefore, it is a straw man argument that NUs don’t value life or positive states, because NUs value them instrumentally, which may translate into substantial practical efforts to protect them (compared even with someone who claims to be terminally motivated by them).

If the rationality and EA communities are looking for a unified theory of value, why are they not converging (more) on negative utilitarianism?

What have you read about it that has caused you to stop considering it, or to overlook it from the start?

Can you teach me how to see positive states as terminally (and not just instrumentally) valuable, if I currently don’t? (I still enjoy things, being closer to the extreme of hyperthymia than anhedonia. Am I platonically blind to the intrinsic aspect of positivity?)

And if someone wants to answer: What is the most extreme form of suffering that you’ve experienced and believe can be “outweighed” by positive experiences?



Discuss

Страницы