LessWrong.com
A community blog devoted to refining the art of rationality

Exploring safe exploration

Published on January 6, 2020 9:07 PM UTC

This post is an attempt at reformulating some of the points I wanted to make in “Safe exploration and corrigibility” in a clearer way. This post is standalone and does not assume that post as background.

In a previous comment thread, Rohin argued that safe exploration is best defined as being about the agent not making “an accidental mistake.” I think that definition is wrong, at least to the extent that it both doesn't make much sense and doesn't describe how I actually expect current safe exploration work to be useful.

First, what does it mean for a failure to be an “accident?” This question is simple from the perspective of an engineer outside the whole system—any unintended failure is an accident, encapsulating the majority of AI safety concerns (i.e. “accident risk”). But that's clearly not what the term “accidental mistake” is pointing at in this context—rather, the question here is what is an accident from the perspective of the model? Intuitively, an accident from the perspective of the model should be some failure that the model didn't intend or wouldn't retroactively endorse. But that sort of a definition only makes sense for highly coherent mesa-optimizers that actually have some notion of intent. Maybe instead we should be thinking of this from the perspective of the base optimizer/loss function? That is, maybe a failure is an accidental failure if the loss function wouldn't retroactively endorse it (e.g. the model got a very low reward for making the mistake). By this definition, however, every generalization failure is an accidental failure such that safe exploration would just be the problem of generalization.

Of all of these definitions, the definition defining an accidental failure from the perspective of the model as a failure that the model didn't intend or wouldn't endorse seems the most sensical to me. Even assuming that your model is a highly coherent mesa-optimizer such that this definition makes sense, however, I still don't think it describes current safe exploration work, and in fact I don't think it's even really a safety problem. The problem of producing models which don't make mistakes from the perspective of their own internal goals is precisely the problem of making powerful, capable models—that is, it's precisely the problem of capability generalization. Thus, to the extent that it's reasonable to say this for any ML problem, the problem of accidental mistakes under this definition is just a capabilities problem. However, I don't think that at all invalidates the utility of current safe exploration work, as I don't think that current safe exploration work is actually best understood as avoiding “accidental mistakes.”

If safe exploration work isn't about avoiding accidental mistakes, however, then what is it about? Well, let's take a look at an example. Safety Gym has a variety of different environments containing both goal states that the agent is supposed to reach and unsafe states that the agent is supposed to avoid. From OpenAI's blog post: “If deep reinforcement learning is applied to the real world, whether in robotics or internet-based tasks, it will be important to have algorithms that are safe even while learning—like a self-driving car that can learn to avoid accidents without actually having to experience them.” Why wouldn't this happen naturally, though—shouldn't an agent in a POMDP always want to be careful? Well, not quite. When we do RL, there are really two different forms of exploration happening:[1]

  • Within-episode exploration, where the agent tries to identify what particular environment/state it's in, and
  • Across-episode exploration, which is the problem of making your agent explore enough to gather all the data necessary to train it properly.

In your standard episodic POMDP setting, you get within-episode exploration naturally, but not across-episode exploration, which you have to explicitly incentivize.[2] Because we have to explicitly incentivize across-episode exploration, however, it can often lead to behaviors which are contrary to the goal of actually trying to achieve the greatest possible reward in the current episode. Fundamentally, I think current safe exploration research is about trying to fix that problem—that is, it's about trying to make across-episode exploration less detrimental to reward acquisition. This sort of a problem is most important in an online learning setting where bad across-episode exploration could lead to catastrophic consequences (e.g. crashing an actual car to get more data about car crashes).
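The distinction above can be made concrete with a toy sketch (my illustration, not from the post): an epsilon-greedy learner on a two-armed bandit, where the epsilon fraction of episodes is the explicitly incentivized across-episode exploration, deliberately forgoing reward in the current episode to gather training data. All names and parameters here are mine.

```python
import random

def train_bandit(true_means, episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy across-episode exploration on a toy bandit.

    Greedy choices maximize reward in the current episode; the epsilon
    fraction of episodes instead picks a random arm purely to gather
    data -- this is the explicitly incentivized exploration that can be
    contrary to current-episode reward."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    explore_episodes = 0
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Across-episode exploration: sacrifice this episode's reward.
            arm = rng.randrange(len(true_means))
            explore_episodes += 1
        else:
            # Exploit current estimates within this episode.
            arm = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = true_means[arm] + rng.gauss(0, 0.1)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, explore_episodes

estimates, n_explore = train_bandit([0.2, 0.8])
```

In an online setting every one of those `n_explore` episodes happens in the real environment, which is why making them safe (rather than merely informative) is the crux of the problem described above.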

Thus, rather than define safe exploration as “avoiding accidental mistakes,” I think the right definition is something more like “improving across-episode exploration.” However, I think that this framing makes clear that there are other types of safe exploration problems—that is, there are other problems in the general domain of making across-episode exploration better. For example, I would love to see an exploration of how different across-episode exploration techniques impact capability generalization vs. objective generalization—that is, when is across-episode exploration helping you collect data which improves the model's ability to achieve its current goal versus helping you collect data which improves the model's goal?[3] Because across-episode exploration is explicitly incentivized, it seems entirely possible to me that we'll end up getting the incentives wrong somehow, so it seems quite important to me to think about how to get them right—and I think that the problem of getting them right is the right way to think about safe exploration.

  1. This terminology is borrowed from Rohin's first comment in the same comment chain I mentioned previously. ↩︎

  2. With some caveats—in fact, I think a form of across-episode exploration will be instrumentally incentivized for an agent that is aware of the training process it resides in, though that's a bit of a tricky question that I won't try to fully address now (I tried talking about this somewhat in “Safe exploration and corrigibility,” though I don't think I really succeeded there). ↩︎

  3. This is what I somewhat confusingly called the “objective exploration problem” in “Safe exploration and corrigibility.” ↩︎


Open & Welcome Thread - January 2020

Published on January 6, 2020 7:42 PM UTC

  • If it’s worth saying, but not worth its own post, here's a place to put it.
  • And, if you are new to LessWrong, here's the place to introduce yourself.
    • Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ.

The Open Thread sequence is here.


Looking at Mixers

Published on January 6, 2020 4:10 PM UTC

This is not an official post by the BIDA Board, and the Board has not reviewed it.

In Spring 2015 BIDA bought a Mackie 1604-VLZ Pro mixer used for $275. It wasn't the ideal mixer, but it was reasonably close and a good price. It's been starting to wear out for a while, however, and last night one of the two monitor mixes stopped working. While this one may be fixable, I think we should probably get a new mixer that's closer to what we want and sell this one. Here's what I think would be good in a mixer for a dance like BIDA:

  • At least 10 XLR inputs, ideally 16.

  • At least 2 pre-fader monitor outputs, ideally 3+

  • EQ with sweepable mids.

  • XLR outputs for the mains, and ideally the monitors.

  • Ideally, the monitors are covered by the mute.

  • Ideally, the monitors are covered by the EQ.

  • Ideally, few extra controls that we're not going to use and that, when set wrong, keep things from working in hard-to-diagnose ways (this is an issue with our current board).

I'm thinking that an analog board probably still makes sense, given that we have a range of people who run sound at the dance, but I could be convinced otherwise. Looking around, here are the options that look best to me:

  • Mackie ProFX22v3: analog board with pretty much everything we need, though on the expensive end. It has 17 XLR inputs, two of which can be used as DIs, and 3 pre-fader monitor outputs. The EQ includes the monitors, but the mute unfortunately doesn't. About $700 new.

  • Mackie ProFX22v2: by going back a generation we can get something almost as good for less. This brings us down to 16 XLR inputs, none of which can be a DI, and two pre-fader monitor outputs. About $500 new.

  • Mackie ProFX16v3: another direction we could go is a smaller board. Dropping down a size from the ProFX22v3 brings us down to 11 XLR inputs, but doesn't change anything else. The big question is how often we would find ourselves needing more than 11 inputs. About $500 new.

  • Behringer XR18: this is a tablet-controlled mixer, which probably won't work for us. But it has sixteen XLR/TRS combo inputs, six monitor outputs, and lets you mix from anywhere in the hall. It includes its own Wi-Fi, and I know a lot of sound people really like it. About $600 new.

  • KitchenAid KSM105: this is a very popular analog mixer, but it doesn't meet any of our requirements. I don't understand why so many people buy these when they're so under-spec'd.

Most of these also have cheaper options available if we bought used.

What mixers have people been using for live sound, and what do you like about them?

Comment via: facebook


What were the biggest discoveries / innovations in AI and ML?

Published on January 6, 2020 7:42 AM UTC

These could be theoretical breakthroughs (like "the idea of a perceptron" or [something Judea Pearl did]), or they could be watershed developments / new applications that don't necessarily involve much new theory (like AlexNet or AlphaGo). Anything that seems like an important development in AI, is fair game.

I want an independently-generated list of all the interesting developments of AI, over the whole life of the fields, for a research project that I'm working on.

Feel free to include ones that you, personally, think were a big deal in some way, even if most people don't think so.



The Universe Doesn't Have to Play Nice

Published on January 6, 2020 2:08 AM UTC

It's often helpful to think about the root cause of your disagreements with other people, and it seems to me that one such cause is that I believe the universe doesn't have to play nice in terms of our ability to know anything, while other people do.

Here are some examples of where this assumption can be used:

  • The problem of skepticism: How do we know that any of the objects that we seem to perceive are real? Countless philosophers have tried to answer this question, but I'm happy to bite the bullet, as I don't see any reason for believing that the universe has to give us this ability.
  • Gödel's theorem/uncomputability: It may seem obvious that every mathematical truth has a proof, or that we can write a program to compute everything, but we now know both are false.
  • The problem of induction: Induction may have worked in the past, but we can't conclude that it'll work in the future without using induction itself, which would be a circular argument.
  • Boltzmann brains: It totally seems as though we could construct a simulation where the majority of agents would be Boltzmann brains.
  • Bayesianism: How do we form a prior, given that the prior comes before any observation that could help us determine a reasonable prior? Some people see this as an argument against Bayesianism, but I see it as just another way the universe isn't nice.
  • Theories that can explain anything: Evolutionary psychology has often been criticised as being able to provide just-so stories for whatever we observe. I don't believe that the universe is always kind enough to provide us with a nice clean empirical test to determine if a theory is true, as opposed to the theory subtly impacting our expectations over a wide range of results. This becomes worse once you take into account varying positions within a theory, as there could be dozens of incompatible schemes for constraining expectations based on a theory, with almost nothing in common.
  • Non-empirically-testable facts: Some people believe that if there isn't a clean empirical test to falsify a theory, then it is irrelevant pseudo-scientific rubbish. But reality doesn't have to give us such clear-cut boundaries. There are lots of things that are reasonable to posit or lean towards, but where we can't expect to ever know for certain.
  • Qualia: Many people believe qualia don't exist because we wouldn't be able to learn about them empirically. But it seems spurious to assume nothing exists outside of our light cone just because we can't observe it.



What we Know vs. How we Know it?

Published on January 6, 2020 12:30 AM UTC

Two weeks ago I said:

The other concept I’m playing with is that “what we know” is inextricable from “how we know it”. This is dangerously close to logical positivism, which I disagree with, at least with my limited understanding of it. And yet it’s really improved my thinking when doing historical research.

I have some more clarity on what I meant now. Let’s say you’re considering my ex-roommate, person P, as a roommate, and ask me for information. I have a couple of options.

Scenario 1: I turn over chat logs and video recordings of my interactions with P.

E.g., recordings of P playing music loudly and chat logs showing I’d asked them to stop.

Trust required: that the evidence is representative and not an elaborate deep fake.

Scenario 2: I report representative examples of my interactions with P.

E.g., “On these dates P played music really loudly even when I asked them to stop.”

Trust required: that from scenario 1, plus that I’m not making up the examples.

Scenario 3: I report summaries of patterns with P

E.g., “P often played loud music, even when I asked them to stop”

Trust required: that from scenario 2, plus my ability to accurately infer and report patterns from data.

Scenario 4: I report what a third party told me

E.g. “Mark told me they played loud music a lot”

Trust required: that from scenario 3, plus my ability to evaluate other people’s evidence

Scenario 5: I give a flat “yes good” or “no bad” answer.

E.g., “P was a bad roommate.”

Trust required: that from scenario 3 and perhaps 4, plus that I have the same heuristics for roommate goodness that you do.



The earlier the scenario, the more you can draw your own conclusions and the less trust you need to have in me. Maybe you don’t care about loud music, and a flat yes/no would drive you away from a roommate that would be fine for you. Maybe I thought I was clear about asking for music to stop but my chat logs reveal I was merely hinting, and you are confident you’ll be able to ask more directly. The more specifics I give you, the better an assessment you’ll be able to make.

Here’s what this looks like applied to recent reading:

Scenario 5: Rome fell in the 500s AD.

Even if I trust your judgement, I have no idea why you think this or what it means to you.

Scenario 4: In Rome: The Book, Bob Loblaw says Rome Fell in the 500s AD.

At least I can look up why Bob thinks this.

Scenario 3: Pottery says Rome fell between 300 and 500 AD.

Useful to experts who already know the power of pottery, but leaves newbies lost.

Scenario 2: Here are 20 dig sites in England. Those dated before 323 (via METHOD) contain pottery made in Greece (which we can identify by METHOD), those after 500 AD show cruder pottery made locally.

Great. Now my questions are “Can pottery evidence give that much precision?” and “Are you interpreting it correctly?”

Scenario 1: Please enjoy this pile of 3 million pottery shards.

Too far, too far.


In this particular example (from The Fall of Rome), 2-3 was the sweet spot. It allowed me to learn as much as possible with a minimum of trust. But there’s definitely room in life for 4; you can’t prove everything in every paper and sometimes it’s more efficient to offload it.

I don’t view 5 as acceptable for anything that claims to be evidence-based, or at least, on any basis besides “Try this and see if it helps you” (which is a perfectly fine basis if it’s cheap).



Clumping Solstice Singalongs in Groups of 2-4

Published on January 5, 2020 8:50 PM UTC

This post assumes you're familiar with rationalist solstice. (It also assumes that while yes, ritual is something to be epistemically careful about, the overall effect size is relatively small compared to spending much of your life thinking about a topic with peers that think that topic is important, and meanwhile having community identities is valuable. If you want to debate that please do so on one of those previous posts)

If you run a solstice ceremony with singalongs, there's particular value in:

  • Doing at least 16 singalongs
  • Clumping them together in groups of 2-4, rather than alternating song / story / song / story. (Clumping is valuable even if you are doing a smaller number of songs)

This isn't the right approach for all possible solstice aesthetics, but there's a magic thing that can happen here if you do. And if you're not doing it (and most solstice organizers seem to default to the "story/song/story/song" thing), you won't receive any feedback that there's a different thing you could do with a magic, synergistic outcome.

Reasons to want more songs, and to cluster them in groups of 2-4:

  • It takes people awhile to get comfortable singing.
  • Context switching makes it harder to get into the headspace of singing.
  • There is a secret, deeper headspace of singing that you only get to if you do a LOT of it, in a row, in an environment that encourages being thoroughly un-self-conscious about it.
  • There is a long game that I think singalong solstice celebrations can help with, which is to restore musicality as a basic skill, which in turn allows you to have much richer musical traditions than if it's an incidental thing you do a little of sometimes. The payoff for this comes on a multi-year timescale.

There are reasons not to want this many songs, or to have them clustered this way. Some people get more value out of the speeches or other activities than songs. One organizer of a small solstice mentioned their primary concern was "Have each person bring one activity to the solstice", and most of them weren't comfortable with songleading. Getting people directly involved with Solstice indeed seems valuable if that's an option. (This makes more sense for smaller communities)

But my impression is that much of the time, the ratio of songs/stories and their placement was determined somewhat arbitrarily, and then never reconsidered.

Getting Comfortable

It used to be that group singing was quite common. There were no iPods or headphones, or even recordings. Running into a 1-in-a-million musician was a rare event. Therefore, it was quite natural that if you wanted music in your life, you had to make it yourself, and when you did you were comparing yourself to your friends and family, not to popular superstars.

This is no longer the case by default. So it takes people awhile to get used to "oh, okay I am actually allowed to sing. I am actually encouraged to sing. It doesn't matter if I sound good, we are doing this thing together."

For many people, it takes at least two songs in a row to get them to a point where they even consider singing at all, let alone feeling good about it. The feeling of hesitation resets when you spend a lot of time listening to a speech.

The idea here is not just "people get to sing", but, "people feel a deep reassurance that singing is okay, that we are all here singing together", and I think that's just impossible to get in the space of one or even two songs. (It becomes even harder to hit this point if there are proportionately few singalongs, and especially if there are also performance-piece songs that people are not encouraged to sing along with)

Deep musical headspace

In my preferred celebration, "Deep reassurance that singing is okay" is only step one. There's a second deeper stage of feeling connected to the other people in the room, and connected to ideas that you're all here to celebrate, for which reassurance is a prerequisite but insufficient.

Step two requires the songs be resonant, and for you to have a strong sense that the other people in the room all have a particular connection to the songs. (The sense of ingroup identity and sense of philosophical connection are separate qualities, but work together to produce something greater than the sum of their parts)

You can get pieces of this in the space of a single song, but there's a version of it with unique qualia that takes something like 8 songs to really get going (and then, once you're there, it's nice to get to stay there awhile)

Interwoven Story and Song; each Round Deepening

The formula I find works best (at least for my preferences) is:

  • On average, groups of 2-4 songs
  • Start with a song that's a particularly inviting singalong, to set the overall context of "this is an event where we're here to sing together."
  • Each song gets a brief story (like 10-30 seconds) that gives it some context and helps people fit it into the overall narrative arc of the night. The brief stories are not long enough to take you out of singalong-headspace.
  • In between sets of 2-4 songs, there are longer stories, speeches, meditations and other activities that move the narrative along more significantly. Each one sets the overall context for the next 2-4 songs, shifting the particular qualia of "deep singalong" that you'd get from it.

Once you've gotten into the overall singalong headspace, it's less necessary to do groups of songs – alternating between a song and a speech won't kill the headspace once it's had a chance to take root.

Your Mileage May Vary

Reiterating a final time that this is just one particular effect you can go for. I think it's important that local solstice organizers adapt to fit the needs of their particular communities. But the effect I'm trying to describe here is hard to grok if you haven't directly experienced it, and I wanted people to at least have considered an option they may have been missing.


UML V: Convex Learning Problems

Published on January 5, 2020 7:47 PM UTC

(This is part five in a sequence on Machine Learning based on this book. Click here for part 1.)

The first three posts of this sequence have defined PAC learning and established some theoretical results, problems (like overfitting), and limitations. While that is helpful, it doesn't actually answer the question of how to solve real problems (unless the brute force approach is viable). The first way in which we approach that question is to study particular classes of problems and prove that they are learnable. For example, in the previous post, we've looked (among other things) at linear regression, where the loss function has the form .mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0} .mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table} .mjx-full-width {text-align: center; display: table-cell!important; width: 10000em} .mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0} .mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left} .mjx-numerator {display: block; text-align: center} .mjx-denominator {display: block; text-align: center} .MJXc-stacked {height: 0; position: relative} .MJXc-stacked > * {position: absolute} .MJXc-bevelled > * {display: inline-block} .mjx-stack {display: inline-block} .mjx-op {display: block} .mjx-under {display: table-cell} .mjx-over {display: block} .mjx-over > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-under > * {padding-left: 0px!important; padding-right: 0px!important} .mjx-stack > .mjx-sup {display: block} 
.mjx-stack > .mjx-sub {display: block} .mjx-prestack > .mjx-presup {display: block} .mjx-prestack > .mjx-presub {display: block} .mjx-delim-h > .mjx-char {display: inline-block} .mjx-surd {vertical-align: top} .mjx-mphantom * {visibility: hidden} .mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%} .mjx-annotation-xml {line-height: normal} .mjx-menclose > svg {fill: none; stroke: currentColor} .mjx-mtr {display: table-row} .mjx-mlabeledtr {display: table-row} .mjx-mtd {display: table-cell; text-align: center} .mjx-label {display: table-row} .mjx-box {display: inline-block} .mjx-block {display: block} .mjx-span {display: inline} .mjx-char {display: block; white-space: pre} .mjx-itable {display: inline-table; width: auto} .mjx-row {display: table-row} .mjx-cell {display: table-cell} .mjx-table {display: table; width: 100%} .mjx-line {display: block; height: 0} .mjx-strut {width: 0; padding-top: 1em} .mjx-vsize {width: 0} .MJXc-space1 {margin-left: .167em} .MJXc-space2 {margin-left: .222em} .MJXc-space3 {margin-left: .278em} .mjx-test.mjx-test-display {display: table!important} .mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px} .mjx-test.mjx-test-default {display: block!important; clear: both} .mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex} .mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left} .mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right} .mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0} .MJXc-TeX-unknown-R {font-family: monospace; font-style: normal; font-weight: normal} .MJXc-TeX-unknown-I {font-family: monospace; font-style: italic; font-weight: normal} .MJXc-TeX-unknown-B 
{font-family: monospace; font-style: normal; font-weight: bold} .MJXc-TeX-unknown-BI {font-family: monospace; font-style: italic; font-weight: bold} .MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw} .MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw} .MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw} .MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw} .MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw} .MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw} .MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw} .MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw} .MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw} .MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw} .MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw} .MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw} .MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw} .MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw} .MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw} .MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw} .MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw} .MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw} .MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw} .MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw} .MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw} @font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')} @font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')} @font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold} @font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')} @font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')} @font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold} @font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')} @font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic} @font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')} @font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')} @font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold} @font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')} @font-face 
{font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')} @font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic} @font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')} @font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')} @font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')} @font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')} @font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold} @font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')} @font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic} @font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: 
local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold} @font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')} ℓ:Rd→R and is linear. In this post, we focus on convex learning problems, where the loss function also has the above form and is convex.


We begin with sets rather than functions.

Convex sets

A set M (that is part of a vector space) is called convex iff for any two points in that set, the line segment which connects both points is a subset of M.

The condition that M is part of a vector space is missing in the book, but it is key – as far as I know, being part of a vector space is the most general way of demanding that a line between two points even exists. Convexity cannot be defined for mere topological spaces, or even metric spaces. In our case, all of our sets will live in R^d for some d ∈ N^+.

For example, let's look at letters (as a subset of the plane, R2). None of the letters in this chapter so far is convex – the letter l comes the closest, but it's not quite there if you look closely enough. Even the uppercase I is not convex in this font. The only convex symbols in this post that I've noticed are . and ' and | and –.

Conversely, every regular filled polygon with n corners is convex. The circle is not convex (no two distinct points on it have a connecting line segment which is contained in the circle), but the disc (filled circle) is convex. The disc with an arbitrary set of points on its boundary (i.e. the circle) taken out remains convex. The disc with any other point taken out is not convex, and neither is the disc with any additional point added. You get the idea. (To be precise on the last one, the mathematical description of the disc is D = {x ∈ R^2 | ||x|| ≤ 1}, so there is no way to add a single point that somehow touches the boundary.)

Convex functions

Informally, a function f: R^d → R is convex iff the set of all points on and above its graph is convex as a subset of R^(d+1), where the dependent variable goes up/downward.

(Here, the middle function (x^3) is not convex because the red line segment is not in the blue set, but the left (|x|) and the right (x^2) are convex.)

The formal definition is that a function f: R^d → R is convex iff for all x, y ∈ R^d, the inequality f(x + α(y − x)) ≤ f(x) + α[f(y) − f(x)] holds for all α ∈ [0, 1]. This says that any line segment connecting two points on the graph lies on or above the graph – so it expresses the same thing.
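To make the definition concrete, here is a small Python sketch (my own illustration, not from any library – `is_convex_on_samples` is a made-up helper name) that tests the defining inequality on a grid of sample points and mixing weights. Passing the check is of course only evidence of convexity on the sampled region, not a proof:

```python
import numpy as np

def is_convex_on_samples(f, lo=-2.0, hi=2.0, n=50):
    """Numerically test the convexity inequality
    f(x + a*(y - x)) <= f(x) + a*(f(y) - f(x))
    on a grid of points x, y in [lo, hi] and weights a in [0, 1]."""
    xs = np.linspace(lo, hi, n)
    alphas = np.linspace(0.0, 1.0, 21)
    for x in xs:
        for y in xs:
            for a in alphas:
                lhs = f(x + a * (y - x))
                rhs = f(x) + a * (f(y) - f(x))
                if lhs > rhs + 1e-9:   # small tolerance for float error
                    return False
    return True

print(is_convex_on_samples(abs))             # -> True
print(is_convex_on_samples(lambda x: x**2))  # -> True
print(is_convex_on_samples(lambda x: x**3))  # -> False
```

For x^3 the check finds a violation on the negative half-line, matching the picture above.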

If d = 1, as was the case in all of my pictures, then f is convex iff the little pixie flying along the function graph never turns rightward. This is the case iff f′ is monotonically increasing, which (if f is twice differentiable) is the case iff f′′(x) ≥ 0 for all x ∈ R.

The main reason why convexity is a desirable property is that, for a convex function, every local minimum is a global minimum – which is probably fairly obvious, but let's do a formal proof, because it's reasonably easy in this case.

Suppose that x ∈ R^d is a local minimum. Then we find some ball B^d(x, ε) := {p ∈ R^d | ||p − x|| ≤ ε} around x such that f(y) ≥ f(x) for all y in the ball (this is what it means to be a local minimum in R^d). Now let z be an arbitrary point in R^d; we show that its function value can't lie below that of x. Imagine the line segment from x to z. A part of it must lie in our ball, so we find some (small) δ ∈ R^+ such that x + δ[z − x] ∈ B^d(x, ε). Then (because x is our local minimum), we have that f(x) ≤ f(x + δ[z − x]). By convexity of f we have f(x + δ[z − x]) ≤ f(x) + δ[f(z) − f(x)], so taken together we obtain the inequality

f(x) ≤ f(x + δ[z − x]) ≤ f(x) + δ[f(z) − f(x)],

or equivalently δ[f(z) − f(x)] ≥ 0, which is to say that δf(z) ≥ δf(x), which is to say that f(z) ≥ f(x).

If there were several local minima, they would all be global minima by the above, and then a line segment between two of them could not go up or down anywhere (because otherwise one of the global minima wouldn't be a global minimum), so really there is just one global minimum, which might be 'wide' if the function just happens to not go up or down for a while. This is all about the difference between ≤ and <. The simplest example is a constant function – it is convex, and every point is a global minimum (or rather, the global minimum).
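Since every local minimum is global, naive local search can't get stuck. Here's a quick sketch of my own (step size and function chosen arbitrarily for illustration): plain gradient descent on the convex f(x) = (x − 2)^2 reaches the global minimum from every starting point:

```python
def grad_descent(grad, x0, lr=0.1, steps=500):
    """Plain gradient descent; for a convex f, any local minimum
    it settles into is automatically the global one."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

grad = lambda x: 2 * (x - 2)   # gradient of the convex f(x) = (x - 2)^2
for x0 in (-10.0, 0.0, 7.5):
    print(round(grad_descent(grad, x0), 6))  # -> 2.0 from every start
```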

Jensen's Inequality

The key fact about convex functions, I would argue, is Jensen's inequality:

Given α_1, ..., α_n ∈ R^+ with ∑_{i=1}^n α_i = 1, if f: R^d → R is convex, then for any sequence (x_1, ..., x_n) ∈ (R^d)^n, it holds that f(∑_{i=1}^n α_i x_i) ≤ ∑_{i=1}^n α_i f(x_i).

If you look at the inequality above, you might notice that it is almost the definition of linearity, except for the condition ∑_{i=1}^n α_i = 1 and the fact that we have ≤ instead of =. So convex functions fulfill the linearity property as an inequality rather than an equality (almost). In particular, linear functions are convex. Conversely, concave functions (these are functions where the ≤ in the definition of convex functions is a ≥) also fulfill the above property as an inequality, only the sign turns around again. In particular, linear functions are concave. To refresh your memory, here is the definition of convexity:

f(x + α(y − x)) ≤ f(x) + α[f(y) − f(x)] for all α ∈ [0, 1].


To summarize: convex functions never turn rightward, concave functions never turn leftward, and the intersection of both does neither, i.e. always goes straight, i.e. is linear. Looking at convexity and concavity as a generalization of linearity might further motivate the concept. In fact, it might have even been wise to define the three concepts in analogous terms rather than considering Jensen's inequality a result about convex functions.

Terms of the form x + a(y − x), which one sees quite often (for example in defining points on a line segment), can be equivalently written as (1 − a)x + ay. I think the first form is far more intuitive; however, the second one generalizes a bit better. We see that x and y are given weights, and those weights sum up to 1. If one goes from 2 weighted values to n weighted values (still all non-negative), one gets Jensen's inequality. Thus, the statement of Jensen's inequality is that if you take any number of points on the graph and construct a weighted mean, the resulting point still lies above the graph. See Wikipedia's page for a simple proof via induction.
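As a quick numeric illustration (my own sketch; `jensen_gap` is a made-up helper), we can check the inequality for the convex function x^2 with explicit weights, and then stress-test it with random points and weights:

```python
import numpy as np

def jensen_gap(f, points, weights):
    """Return (f(weighted mean), weighted mean of f). Jensen's
    inequality says the first is <= the second for convex f."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                 # normalize so the weights sum to 1
    x = np.asarray(points, dtype=float)
    return f(np.dot(w, x)), np.dot(w, f(x))

lhs, rhs = jensen_gap(np.square, [-1.0, 0.5, 3.0], [0.2, 0.5, 0.3])
print(lhs <= rhs)  # -> True

# Random stress test: the inequality should hold for any weighted mean.
rng = np.random.default_rng(0)
for _ in range(1000):
    lhs, rhs = jensen_gap(np.square, rng.normal(size=5), rng.random(5))
    assert lhs <= rhs + 1e-12
```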

Guaranteeing learnability

Recall that we are trying to find solvable special cases of the setting "minimize a loss function of the form ℓ: R^d → R". This can be divided into three tasks:

(1) define the special case

(2) demonstrate that this special case is indeed solvable

(3) apply it as widely as possible

This chapter is about (1). (By the way, when I say "chapter" and "section", I'm referring to the level-1 and level-2 headlines of this post as visible in the navigation at the left.) The reason why we aren't already done with (1) is that convexity of the loss function alone turns out to be insufficient to guarantee PAC learnability. We'll discuss a counter-example in the context of linear regression and then define additional properties to remedy the situation.

A failure of convexity

In our counter-example, we'll have X = Y = R; note that this is a convex set. Our hypothesis class H will be that of all (small) linear predictors, i.e. just H = {f_α : x ↦ α·x | α ∈ [−1, 1]}. The reason that we only allow small predictors is that our final formulation of the learnable class will also demand that H is a bounded set, so this example demonstrates that even boundedness + convexity is still not enough.

We've previously defined real loss functions as taking a hypothesis and returning the real error, and empirical loss functions as taking a hypothesis and a training sequence and returning the empirical error. Now we'll look at point-based loss functions (not bolded as it's not an official term, but I'll be using it a lot) which measure the error of a hypothesis on a single point only, i.e. they have the form ℓ_(x,y): H → R for some (x, y) ∈ X × Y. To be more specific, we will turn the squared loss function defined in the previous post into a point-based loss function. Thus we will have ℓ^(2)_(x,y)(h_α) = ||α·x − y||^2 = (αx − y)^2, where the last equality holds because we're in the one-dimensional case (this is also why none of the letters are bolded). We will only care about two points (everything else will have probability mass 0), namely these two:

That's the point (1, −1) at the left and one all the way to the right at (1/μ, 0). With μ, think of an extremely small positive number, so that 1/μ is quite large.

If this class were PAC learnable, then there would be a learner A such that, for all ε, δ ∈ (0, 1), if the size of the training sequence is at least m*(ε, δ), then for all probability distributions over X × Y, with probability at least 1 − δ over the choice of S, the error of A(S) would be at most ε larger than that of the best classifier.

So to prove that it is not learnable, we first assume we have some learner A. Then we get to set some ε and δ and construct a probability distribution D_A based on A. Finally, we have to prove that A fails on the problem given the choices of ε, δ, and D_A. That will show that the problem is not PAC learnable.

We consider two possible candidates for D_A. The first is D_L, which has all probability mass on the point (1, −1) on the left. The second is D_?, which has almost all probability mass on the point (1, −1), but also has μ probability mass on the point (1/μ, 0). Here μ ∈ R^+ will be an extremely small number; so small that the right point is unlikely to ever appear in the training sequence.

If the right point doesn't appear in the training sequence, then the sequence consists of only the left point sampled over and over again. In that case, A cannot differentiate between D_L and D_?, so in order to succeed, it would have to output a hypothesis which performs well with both distributions – which, as we will show, is not possible. Given our class H, the hypothesis A(S) must be of the form h_α for some α ∈ [−1, 1]. Recall that the classifier is supposed to predict the y-coordinate of the points. Thus, for the first point, α = −1 would be the best choice (since −1·1 = −1), and for the second point, α = 0 would be the best choice (since 0·(1/μ) = 0).

Now if α ≤ −0.5, then we declare that D_A = D_?. In this case (assuming that the second point doesn't appear in the training sequence), there will be a μ chance of sampling the right point, for which A(S) predicts the value α·(1/μ) = α/μ ≤ −1/(2μ), which, since we use the squared loss function, leads to an error of at least 1/(4μ^2); thus the expected error is at least μ·1/(4μ^2) = 1/(4μ), which, because μ is super tiny, is a very large number. Conversely, the best classifier would be at least as good as the classifier with α = 0, which would only have error 1 − μ (for the left point), which is about 1 and thus a much smaller number.

Conversely, if α > −0.5, we declare that D_A = D_L, in which case the error of A(S) is at least (−0.5 − (−1))^2 = 1/4, whereas the best classifier (with α = −1) has zero error.

Thus, we only need to choose some ε < 1/4 and an arbitrary δ. Then, given the sample size m, we set μ small enough such that the training sequence is less than δ likely to contain the second point. This is clearly possible: we can make μ arbitrarily small; if we wanted, we could make it so small that the probability of sampling the second point is < δ/100. That concludes our proof.

Why was this negative result possible? It comes down to the fact that we were able to make the error of the first classifier (with α ≤ −0.5) large via a super unlikely sample point with super high error – so the problem is the growth rate of the loss function. As long as the loss function grows so quickly that, while both giving a point less probability mass and moving it further to the right, the expected error goes up, one can construct examples with arbitrarily high expected error. (As we've seen, the expected error in the case of α ≤ −0.5 is at least 1/(4μ), i.e. a number that grows arbitrarily large as μ → 0.)
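Here's a tiny numeric check of this blow-up (my own sketch; the distribution D_? and the squared loss are as described above):

```python
def expected_sq_loss(alpha, mu):
    """Expected squared loss under the distribution D_? from the text:
    point (1, -1) with mass 1 - mu, point (1/mu, 0) with mass mu."""
    left = (alpha * 1 - (-1)) ** 2
    right = (alpha * (1 / mu) - 0) ** 2
    return (1 - mu) * left + mu * right

for mu in [1e-2, 1e-4, 1e-6]:
    bad = expected_sq_loss(-0.5, mu)   # grows like 1/(4*mu)
    good = expected_sq_loss(0.0, mu)   # stays near 1
    print(mu, bad, good)
```

As mu shrinks, the expected loss of alpha = -0.5 explodes while that of alpha = 0 stays around 1, exactly as in the argument above.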

Lipschitzness & Smoothness

There are at least two ways to formalize a requirement saying that the loss function is somehow "bounded". They're called Lipschitzness and smoothness, and both are very simple.

Lipschitzness says that a function cannot grow too fast, i.e.

A function f: R^d → R is ρ-Lipschitz iff |f(y) − f(x)| ≤ ρ||y − x|| for all x, y ∈ R^d.

If f is differentiable, then a way to measure maximum growth is the gradient, because the gradient points into the direction of fastest growth. Thus, one has the equivalent characterization:

A differentiable function f: R^d → R is ρ-Lipschitz iff ||∇f(x)|| ≤ ρ for all x ∈ R^d.

However, non-differentiable functions can also be Lipschitz; for example, the absolute value function |x| on the real numbers is 1-Lipschitz. Smoothness, on the other hand, is about the change of change. Thus, |x| is definitely not smooth, since the derivative jumps from −1 to 1 across a single point (smoothness is only even defined for differentiable functions). On the other hand, the function x^2 is smooth on all of R. The formal definition simply moves Lipschitzness one level down, i.e.

A differentiable function f: R^d → R is β-smooth iff its gradient is β-Lipschitz.

Which is to say, iff ||∇f(y) − ∇f(x)|| ≤ β||y − x|| for all x, y ∈ R^d. In the one-dimensional case, the gradient equals the derivative, and if the derivative is itself differentiable, then smoothness can be characterized in terms of the second derivative. Thus, a twice-differentiable function f: R → R is β-smooth iff |f′′(x)| ≤ β for all x ∈ R.
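Both notions can be probed numerically. The following sketch (mine, not from the book; `max_abs_derivative` is a made-up helper) estimates a Lipschitz constant on an interval from the steepest slope between grid points, and applies the same idea one level down to probe smoothness:

```python
import numpy as np

def max_abs_derivative(f, lo, hi, n=10001):
    """Estimate the smallest Lipschitz constant of f on [lo, hi]
    by taking the largest slope between adjacent grid points."""
    xs = np.linspace(lo, hi, n)
    slopes = np.abs(np.diff(f(xs)) / np.diff(xs))
    return slopes.max()

# |x| is 1-Lipschitz (but not smooth -- its derivative jumps at 0):
print(max_abs_derivative(np.abs, -3, 3))           # -> ~1.0
# x^2 has derivative 2x, so on [-3, 3] it is 6-Lipschitz ...
print(max_abs_derivative(np.square, -3, 3))        # -> ~6.0
# ... and its derivative 2x is 2-Lipschitz, i.e. x^2 is 2-smooth:
print(max_abs_derivative(lambda x: 2 * x, -3, 3))  # -> ~2.0
```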

One now defines the class of convex Lipschitz bounded problems and that of convex smooth bounded problems. They both require that H is a familiar set like B^d(0, M), that it is convex and bounded (so H = R^d would not suffice), and that, for all (x, y) ∈ X × Y, the point-based loss function ℓ_(x,y): H → R is ρ-Lipschitz (in the former case) or β-smooth and nonnegative (in the latter case). If all this is given, the class is called convex Lipschitz bounded with parameters (M, ρ), or convex smooth bounded with parameters (M, β).

In the previous example, the hypothesis class could be represented by the set [−1, 1], which is both convex and bounded. (In that example, we thought of it as a set of functions, each of which is fully determined by an element of [−1, 1]; now we think of it as the set [−1, 1] itself.) Each point-based loss function ℓ^(2)_(x,y) is convex (and non-negative). However, for any number x ∈ R^+, the loss function ℓ^(2)_(x,y) is defined by the rule ℓ^(2)_(x,y)(α) = (α·x − y)^2, and the gradient of this function with respect to α (which equals the derivative since we're in the one-dimensional case) is 2x(α·x − y). Since this can be made arbitrarily large by choosing x large, there is no parameter ρ such that all of these functions are ρ-Lipschitz. Furthermore, the second derivative is 2x^2. This means that each particular function induced by the point (x, y) is 2x^2-smooth, but there is no parameter β such that all functions are β-smooth.
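A quick sanity check of the calculus (my own sketch): central finite differences of the squared loss (αx − y)^2 with respect to α should agree with the closed forms 2x(αx − y) and 2x^2.

```python
def sq_loss(alpha, x, y):
    """Point-based squared loss (alpha*x - y)^2 as a function of alpha."""
    return (alpha * x - y) ** 2

def numeric_derivs(alpha, x, y, h=1e-5):
    """Central-difference first and second derivatives w.r.t. alpha."""
    d1 = (sq_loss(alpha + h, x, y) - sq_loss(alpha - h, x, y)) / (2 * h)
    d2 = (sq_loss(alpha + h, x, y) - 2 * sq_loss(alpha, x, y)
          + sq_loss(alpha - h, x, y)) / h ** 2
    return d1, d2

alpha, x, y = 0.3, 5.0, -1.0
d1, d2 = numeric_derivs(alpha, x, y)
print(d1, 2 * x * (alpha * x - y))  # first derivative: both ~25.0
print(d2, 2 * x ** 2)               # second derivative: both ~50.0
```

Note how both grow with x, which is the source of the missing uniform ρ and β.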

Surrogate Loss Functions

We are now done with task (1), defining the problem. Task (2) would be demonstrating that both convex Lipschitz bounded and convex smooth bounded problems are in fact PAC learnable with no further conditions. This is done by defining an algorithm and then proving that the algorithm works, i.e. learns any instance of the class with the usual PAC learning guarantees. The algorithm we will look at for this purpose (which does learn both classes) is an implementation of Stochastic Gradient Descent; however, we'll do this in the next post rather than now. For this final chapter, we will instead dive into (3), i.e. find an example of how the ability to learn these two classes is useful even for problems that don't naturally fit into either of them.

Recall the case of binary linear classification. We have a set of points in some high-dimensional space X = R^d, a training sequence where points are given binary labels (i.e. Y = {−1, 1}), and we wish to find a hyperplane that performs well in the real world. We've already discussed the Perceptron algorithm and also reduced the problem to linear programming; however, both approaches have assumed that the problem is separable.

We're not going to find a perfect solution for the general case, because one can show that the problem is NP-hard. However, we can find a solution that approximates the optimal predictor. The approach here is to define a surrogate loss function, which is a loss function ℓ* that (a) upper-bounds the real loss ℓ^(0−1), and (b) has nicer properties than the real loss, so that minimizing it is easier. In particular, we would like for it to be a member of one of the two learnable classes we have introduced. Our point-based loss function for ℓ^(0−1) has the form ℓ^(0−1)_(x,y)(h_a) := 1_{h_a(x) ≠ y}, where 1_B for a boolean statement B is 1 iff B is true and 0 otherwise.

Recall that each hyperplane is fully determined by one vector in R^d, hence the notation h_a. If we represent H directly as R^d and assume d = 1, then the graph of ℓ^(0−1)_(x,y) looks like this...

... because in d = 1 and the homogeneous case, the classifier is determined by a single number; if this number is positive, it will label all positive points with 1; if it's negative, it will label all negative points with 1. If the x-coordinate of the point in question is positive with label 1, or negative with label −1 (i.e. x > 0 and y = 1, or x < 0 and y = −1), then the former case is the correct one and we get this loss function. Otherwise, the loss function would jump from 0 to 1 instead.

Obviously, d = 1 is silly, but it already demonstrates that this loss function is not convex (it makes a turn to the right, and it's easy to find a segment which connects two points of the graph and doesn't lie above the graph). But consider the alternative loss function ℓ*_(x,y):

This new loss function can be defined by the rule ℓ*_(x,y)(h_a) := max(0, 1 − ⟨a, x⟩y). In the picture, the horizontal axis corresponds to a, and we have x > 0 and y = 1. This loss function is easily seen to be convex, non-negative, and not at all smooth. It is also ||x||-Lipschitz. Thus, the problem with X = R^d is not convex Lipschitz bounded, but if we take X = B^d(0, ρ) and also H = B^d(0, M) for some M, ρ ∈ R^+, then it does become a member of the convex-Lipschitz-bounded class with parameters M and ρ, and we can learn it via e.g. stochastic gradient descent.
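The upper-bound property (a) is easy to check numerically. A small sketch of my own, with a handful of arbitrary classifiers a for a fixed point (x, y):

```python
import numpy as np

def zero_one_loss(a, x, y):
    """Point-based 0-1 loss of the homogeneous linear classifier sign(<a, x>)."""
    return 0.0 if np.sign(np.dot(a, x)) == y else 1.0

def hinge_loss(a, x, y):
    """Surrogate loss max(0, 1 - <a, x>*y): convex in a and an
    upper bound on the 0-1 loss."""
    return max(0.0, 1.0 - np.dot(a, x) * y)

x, y = np.array([2.0]), 1
for v in (-1.0, -0.1, 0.2, 1.0):
    a = np.array([v])
    assert hinge_loss(a, x, y) >= zero_one_loss(a, x, y)
    print(v, zero_one_loss(a, x, y), hinge_loss(a, x, y))
```

Note how the surrogate also penalizes correct-but-barely classifiers (the case v = 0.2), which is exactly the "more wrong" penalization discussed below.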

Of course, this won't give us exactly what we want (although penalizing a predictor for being "more wrong" might not be unreasonable), so if we want to bound our loss (empirical or real) with respect to ℓ^(0−1), we will have to do it via ℓ^(0−1)(h) = ℓ*(h) + [ℓ^(0−1)(h) − ℓ*(h)], where the second term is the difference between both loss functions. If ℓ*(h) − ℓ^(0−1)(h) is small, then this approach will perform well.


Homeostasis and “Root Causes” in Aging

January 5, 2020 - 21:43
Published on January 5, 2020 6:43 PM UTC

Let’s start with a stylized fact: almost every cell type in the human body is removed and replaced on a regular basis. The frequency of this turnover ranges from a few days (for many immune cells and cells in the gastrointestinal lining) to ten years (for fat, heart, and skeleton cells). Only a handful of tissues are believed to be non-renewing in humans - e.g. eggs, neurons, and the lens of the eye (and even out of those, neurons are debatable).

This means that the number of cells of any given type is determined by “homeostatic equilibrium” - the balance of cell removal and replacement. If an ulcer destroys a bunch of cells in your stomach lining, they’ll be replaced over a few days, and the number of stomach cells will return to roughly the same equilibrium level as before. If a healthy person receives a bunch of extra red blood cells in a transfusion, they’ll be broken down over a few months, and the number of blood cells will return to roughly the same equilibrium level as before.

As organisms age, we see a change in the homeostatic equilibrium level of many different cell types (and other parameters, like hormone and cytokine levels). In particular, a wide variety of symptoms of aging involve “depletion” (i.e. lower observed counts) of various cell types.

However, human aging happens on a very slow timescale, i.e. decades. Most cell counts equilibrate much faster - for instance, immune cell counts equilibrate on a scale of days to weeks. So, suppose we see a decrease in the count of certain immune cells with age - e.g. naive T cells. Could it be that naive T cells just wear out and die off with age? No - T cells are replaced every few weeks, so a change on a timescale of decades cannot be due to the cells themselves dying off. If the count of naive T cells falls on a timescale of decades, then either (a) the rate of new cell creation has decreased, or (b) the rate of old cell removal has increased (or both). Either of those would require some “upstream” change to cause the rate change.

More generally: in order for cell counts, or chemical concentrations, or any other physiological parameter to decrease/increase with age, at least one of the following must be true:

  • the timescale of turnover is on the order of decades (or longer)
  • rate of removal increases/decreases
  • rate of creation decreases/increases

If none of these is true, then any change is temporary - the cell count/concentration/whatever will return to the same level as before, determined by the removal and creation rates.

Of those three possibilities, notice that the second two - increase/decrease in production/removal rate - both imply some other upstream cause. Something else must have caused the rate change. Sooner or later, that chain of cause-and-effect needs to bottom out, and it can only bottom out in something which equilibrates on a timescale of decades or longer. (Feedback loops are possible, but if all the components equilibrate on a fast timescale then so will the loop.) Something somewhere in the system is out-of-equilibrium on a timescale of decades. We’ll call that thing (or things) a “root cause” of aging. It’s something which is not replaced on a timescale faster than decades, and it either accumulates or decumulates with age.
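The quantitative point can be illustrated with a toy turnover model (my own sketch, with made-up rates, not from the post): if cells are created at a constant rate c and removed at a per-cell rate r, then dN/dt = c − rN, the equilibrium count is c/r, and perturbations decay on the timescale 1/r. Only a lasting change to c or r moves the count for longer than that:

```python
def simulate_count(c, r, n0, t, dt=0.01):
    """Euler-integrate the toy homeostasis model dN/dt = c - r*N:
    constant creation rate c, per-cell removal rate r.
    Equilibrium is c/r, approached on timescale 1/r."""
    n = n0
    for _ in range(int(t / dt)):
        n += (c - r * n) * dt
    return n

c, r = 100.0, 0.5      # hypothetical rates; turnover timescale 1/r = 2 days
equilibrium = c / r    # 200 cells

# A one-off perturbation (half the cells destroyed) decays back:
print(round(simulate_count(c, r, equilibrium / 2, t=20)))    # -> 200

# A persistently lower creation rate shifts the equilibrium itself:
print(round(simulate_count(0.8 * c, r, equilibrium, t=20)))  # -> 160
```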

Now, the main criteria: a root cause of aging cannot be a higher or lower value of any parameter subject to homeostasis on a faster timescale than aging itself. Examples:

  • Most cell types turn over on timescales of days to months. “Depletion” of any of these cell types cannot be a root cause of aging; either their production rate has decreased or their removal rate has increased.
  • DNA damage (as opposed to mutation) is normally repaired on a timescale of hours - sometimes much faster, depending on type. “Accumulation” of DNA damage cannot be a root cause of aging; either the rate of new damage has increased or the repair rate has decreased.
  • DNA mutations cannot be repaired; from a cell’s perspective, the original information is lost. So mutations can accumulate in a non-equilibrium fashion, and are a plausible root cause under the homeostasis argument.

Note that the homeostasis argument does not mean the factors ruled out above are not links in the causal chain. For instance, there’s quite a bit of evidence that DNA damage does increase with age, and that this has important physiological effects. However, there must be changes further up the causal chain - some other long-term change in the organism’s state leads to faster production or slower repair of DNA damage. Conversely, the homeostasis argument does not imply that “plausible root causes” are the true root causes - for instance, although DNA mutations could accumulate in principle, cells with certain problematic mutations are believed to be cleared out by the immune system - so the number of cells with these mutations is in equilibrium on a fast timescale, and cannot be a root cause of aging.

For any particular factor which changes with age, key questions are:

  1. Is it subject to homeostasis?
  2. If so, on what timescale does it turn over?
  3. If it is subject to homeostasis on a timescale faster than aging, then what are the production and removal mechanisms, and what changes the production and removal rates with age?

These determine the applicability of the homeostasis argument. Typically, anything which can normally be fixed/replaced/undone by the body will be ruled out as a root cause of aging - the timescale of aging is very long compared to practically all other physiological processes. We then follow the causal chain upstream, in search of a plausible root cause.
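The turnover criterion can be made concrete with a toy model (my own sketch, not from the post; the rates are invented for illustration): a quantity produced at rate p and removed at rate r equilibrates at p/r on a timescale of roughly 1/r, so its level can only drift over decades if p or r themselves change.

```python
# Toy homeostasis model (illustrative, my own construction):
#   dx/dt = p - r * x  ->  equilibrium x* = p / r, reached on timescale ~1/r.

def simulate(p, r, x0=0.0, dt=0.01, steps=100_000):
    """Euler-integrate dx/dt = p - r*x for steps*dt time units."""
    x = x0
    for _ in range(steps):
        x += (p - r * x) * dt
    return x

# E.g. a cell population produced at 50 cells/day and removed at 0.5/day
# settles near 100 within days -- fast compared to decades of aging.
fast = simulate(p=50.0, r=0.5)
print(round(fast))     # 100

# The level only drifts if a rate changes: halving removal doubles the set point.
drifted = simulate(p=50.0, r=0.25)
print(round(drifted))  # 200
```

Any factor with this fast-equilibrating structure is ruled out as a root cause; the interesting question is what upstream change moved its production or removal rate.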


Dissolving Confusion around Functional Decision Theory

January 5, 2020 - 21:17
Published on January 5, 2020 6:38 AM UTC


Functional Decision Theory (FDT), (see also causal, evidential, timeless, and updateless decision theories) recommends taking cooperative, non-greedy actions in twin prisoners dilemmas, Newcombian problems, and Parfit’s hitchhiker-like games but not smoking lesion situations. It’s a controversial concept with important implications for designing agents that have optimal behavior when embedded in environments in which they may potentially interact with models of themselves. Unfortunately, I think that FDT is sometimes explained confusingly and misunderstood by its proponents and opponents alike. To help dissolve confusion about FDT and address key concerns of its opponents, I refute the criticism that FDT assumes that causation can happen backward in time and offer two key principles that provide a framework for clearly understanding it:

  1. Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running.
  2. I should consider predictor P to “subjunctively depend” on agent A to the extent that P makes predictions of A’s actions based on correlations that cannot be confounded by my choice of what source code A runs.
Getting Up to Speed

I think that functional decision theory (FDT) is a beautifully counterintuitive and insightful framework for instrumental rationality. I will not make it my focus here to talk about what it is and what types of situations it is useful in. To gain a solid background, I recommend this post of mine or the original paper on it by Eliezer Yudkowsky and Nate Soares.

Additionally, here are four different ways that FDT can be explained. I find them all complementary for understanding and intuiting it well.

  1. The decision theory that tells you to act as if you were setting the output to an optimal decision-making process for the task at hand.
  2. The decision theory that has you cooperate in situations isomorphic to a prisoners’ dilemma against a model of yourself--including when your opponent locks in their choice and shows it to you before you make yours.
  3. The decision theory that has you one-box it in situations isomorphic to Newcombian games--including when the boxes are transparent; see also Parfit’s Hitchhiker.
  4. The decision theory that shifts focus from what type of decisions you should make to what type of decision-making agent you should be.

I’ll assume a solid understanding of FDT from here on. I’ll be arguing in favor of it, but it’s fairly controversial. Much of what inspired this post was an AI Alignment Forum post called A Critique of Functional Decision Theory by Will MacAskill which raised several objections to FDT. Some of his points are discussed below. The rest of this post will be dedicated to discussing two key principles that help to answer criticisms and dissolve confusions around FDT.

1. Acknowledging One’s own Predictability

Opponents of FDT, usually proponents of causal decision theory (CDT), will look at a situation such as the classic Newcombian game and reason as so:

I can choose to one-box it and take A or two-box it and take A+B. Regardless of the value of A, A+B is greater, so it can only be rational to take both. After all, when I’m sitting in front of these boxes, what’s in them is already in them regardless of the choice I make. The functional decision theorist’s perspective requires assuming that causation can happen backwards in time! Sure, one-boxers might do better at these games, but non-smokers do better in smoking lesion problems. That doesn’t mean they are making the right decision. Causal decision theorists may be dealt a bad hand in Newcombian games, but it doesn’t mean they play it badly.

The problem with this argument, I’d say, is subtle. I actually fully agree with the perspective that for causal decision theorists, Newcombian games are just like smoking lesion problems. I also agree with the point that causal decision theorists are dealt a bad hand in these games but don’t play it badly. The problem with the argument is some subtle confusion about the word ‘choice’ plus how it says that FDT assumes that causation can happen backwards in time.

The mistake that a causal decision theorist makes isn’t in two-boxing. It’s in being a causal decision theorist in the first place. In Newcombian games, the assumption that there is a highly-accurate predictor of you makes it clear that you are, well, predictable and not really making free choices. You’re just executing whatever source code you’re running. If this predictor thinks that you will two-box it, your fate is sealed and the best you can do is then to two-box it. The key is to just be running the right source code. And hence the first principle:

Questions in decision theory are not questions about what choices you should make with some sort of unpredictable free will. They are questions about what type of source code you should be running.

And in this sense, FDT is actually just what happens when you use causal decision theory to select what type of source code you want to enter a Newcombian game with. There’s no assumption that causation can occur backwards. FDT simply acknowledges that the source code you’re running can have a, yes, ***causal*** effect on what types of situations you will be presented with when models of you exist.
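This point can be illustrated with a toy calculation (a sketch of my own; the 99% accuracy and the payoffs are assumed numbers): treat the predictor’s prediction as a causal function of which source code you run, then use ordinary expected-value reasoning to select that code.

```python
# Toy Newcomb game: box B holds $1,000,000 iff the predictor predicted
# one-boxing; box A always holds $1,000. The predictor reads your source
# code and predicts correctly with probability `acc` (assumed 99%).

def expected_payoff(source_code, acc=0.99):
    """Expected dollars for an agent running the fixed policy `source_code`."""
    p_full_b = acc if source_code == "one-box" else 1 - acc
    box_a = 1_000 if source_code == "two-box" else 0
    return box_a + p_full_b * 1_000_000

# Choosing your source code causally determines the prediction you face:
print(expected_payoff("one-box"))   # ~990,000
print(expected_payoff("two-box"))   # ~11,000
```

With the boxes already in front of you, two-boxing dominates; but applying causal reasoning one level up, to the choice of source code, selects the one-boxing code.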

Instead of FDT assuming causal diagrams like these:

It really only assumes ones like these:

I think that many proponents of FDT fail to make this point: FDT’s advantage is that it shifts the question to what type of agent you want to be--not misleading questions of what types of “choices” you want to make. But this isn’t usually how functional decision theorists explain FDT, including Yudkowsky and Soares in their paper. And I attribute to this some unnecessary confusion and misunderstandings, like “FDT requires us to act as if causation happens backward in time.”

To see this principle in action, let’s look at a situation presented by Will MacAskill. It’s similar to a Newcombian game with transparent boxes. And I say “similar” instead of “isomorphic” because of some vagueness which will be discussed soon. MacAskill presents this situation as follows:

You face two open boxes, Left and Right, and you must take one of them. In the Left box, there is a live bomb; taking this box will set off the bomb, setting you ablaze, and you certainly will burn slowly to death. The Right box is empty, but you have to pay $100 in order to be able to take it. A long-dead predictor predicted whether you would choose Left or Right, by running a simulation of you and seeing what that simulation did. If the predictor predicted that you would choose Right, then she put a bomb in Left. If the predictor predicted that you would choose Left, then she did not put a bomb in Left, and the box is empty. The predictor has a failure rate of only 1 in a trillion trillion. Helpfully, she left a note, explaining that she predicted that you would take Right, and therefore she put the bomb in Left. You are the only person left in the universe. You have a happy life, but you know that you will never meet another agent again, nor face another situation where any of your actions will have been predicted by another agent. What box should you choose?

MacAskill claims that you should take Left because it results in a “guaranteed payoff”. Unfortunately, there is some vagueness here about what it means for a long-dead predictor to have run a simulation of you and for it to have an error rate of one in a trillion trillion. Is this simulation true to your actual behavior? What type of information about you did this long-dead predictor have access to? What is the reference class for the error rate?

Let’s assume that your source code was written long ago, that the predictor understood how it functioned, that it ran a true-to-function simulation, and that you were given an unaltered version of that source code. Then this situation is isomorphic to a transparent-box Newcombian game in which you see no money in box A (albeit more dramatic), and the confusion goes away! If this is the case then there are only two possibilities.

  1. You are a causal decision theorist (or similar), the predictor made a self-fulfilling prophecy by putting the bomb in the open right box alongside a note, and you will choose the left box.
  2. You are a functional decision theorist (or similar), the predictor made an extremely rare, one in a trillion-trillion mistake, and you will unfortunately take the box with a bomb (just as a functional decision theorist in a transparent box Newcombian game would take only box A).

So what source code would you rather run when going into a situation like this? Assuming that you want to maximize expected value and that you don’t value your life at more than 100 trillion trillion dollars, then you want to be running the functional decision theorist’s source code. Successfully navigating this game, transparent-box Newcombian games, twin-opponent-reveals-first prisoners’ dilemmas, Parfit’s Hitchhiker situations, and the like all require that you have source code that would tell you to commit to making the suboptimal decision in the rare case in which the predictor/twin made a mistake.
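A quick expected-value check of the two source codes (a sketch of my own; the $1e24 dollar value of a life is an assumed figure, chosen within the post’s bound of 100 trillion trillion dollars):

```python
# Bomb-case payoffs (illustrative; the $1e24 value of a life is an assumed
# figure, within the post's "at most $100 trillion trillion" bound).
ERROR_RATE = 1e-24      # predictor wrong one time in a trillion trillion
LIFE_VALUE = 1e24       # assumed dollar value you place on your life

# Source code that takes Right: the predictor foresees this, puts the bomb
# in Left, and you pay $100 for the empty Right box.
ev_right_taker = -100.0

# Source code that takes Left: the predictor foresees this and leaves Left
# empty, so you normally pay nothing; you burn only in its rare error case.
ev_left_taker = -ERROR_RATE * LIFE_VALUE   # ~ -1.0

print(ev_left_taker > ev_right_taker)  # True: ex ante, the Left-taking code wins
```

This is the sense in which taking Left is the better policy even though, after seeing the note, it looks like a guaranteed loss: the source code is evaluated over the whole distribution of situations it causes.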

Great! But what if we drop our assumptions? What if we don’t assume that this predictor’s simulation was functionally true to your behavior? Then it becomes unclear how this prediction was made, and what the reference class of agents is for which this predictor is supposedly only wrong one in a trillion trillion times. And this leads us to the second principle.

2. When a Predictor is Subjunctively Entangled with an Agent

An alternate title for this section could be “when statistical correlations are and aren’t mere.”

As established above, functional decision theorists need not assume that causation can happen backwards in time. Instead, they only need to acknowledge that a prediction and an action can both depend on an agent’s source code. This is nothing special whatsoever: it is an ordinary correlation between an agent and a predictor, arising from a common factor: the source code.

However, Yudkowsky and Soares give this type of correlation a special name in their paper: subjunctive dependence. I don’t love this term because it gives a fancy name to something that is not fancy at all. I think this might be responsible for some of the confused criticism that FDT assumes that causation can happen backward in time. Nonetheless, “subjunctive dependence” is at least workable. Yudkowsky and Soares write:

When two physical systems are computing the same function, we will say that their behaviors “subjunctively depend” upon that function.

This concept is very useful when a predictor actually knows your source code and runs it to simulate you. However, this notion of subjunctive dependence isn’t very flexible and quickly becomes less useful when a predictor is not doing this. And this is a bit of a problem that MacAskill pointed out: a predictor could make good predictions without ever querying a model of you that is functionally equivalent to your decision process. He writes:

...the predictor needn’t be running your algorithm, or have anything like a representation of that algorithm, in order to predict whether you’ll one box or two-box. Perhaps the Scots tend to one-box, whereas the English tend to two-box. Perhaps the predictor knows how you’ve acted prior to that decision. Perhaps the Predictor painted the transparent box green, and knows that’s your favourite colour and you’ll struggle not to pick it up. In none of these instances is the Predictor plausibly doing anything like running the algorithm that you’re running when you make your decision. But they are still able to predict what you’ll do. (And bear in mind that the Predictor doesn’t even need to be very reliable. As long as the Predictor is better than chance, a Newcomb problem can be created.)

Here, I think that MacAskill is getting at an important point, but one that’s hard to see clearly with the wrong framework. On its face though, there’s a major problem with this argument. Suppose that in Newcombian games, 99% of brown-eyed people one-boxed it, and 99% of blue-eyed people two-boxed it. If a predictor only made its prediction based on your eye color, then clearly the best source code to be running would be the kind that always made you two-box it regardless of your eye color. There’s nothing Newcombian, paradoxical, or even difficult about this case. And pointing out these situations is essentially how critics of MacAskill’s argument have answered it. Their counterpoint is that unless the predictor is querying a model of you that is functionally isomorphic to your decision-making process, then it is only using “mere statistical correlations,” and subjunctive dependence does not apply.

But this counterpoint and Yudkowsky and Soares’ definition of subjunctive dependence miss something! MacAskill had a point. A predictor need not know an agent’s decision-making process to make predictions based on statistical correlations that are not “mere”. Suppose that you design some agent who enters an environment with whatever source code you gave it. Then if the agent’s source code is fixed, a predictor could exploit certain statistical correlations without knowing the source code. For example, suppose the predictor used observations of the agent to make probabilistic inferences about its source code. These could even be observations about how the agent acts in other Newcombian situations. Then the predictor could, without knowing what function the agent computes, make better-than-random guesses about its behavior. This falls outside of Yudkowsky and Soares’ definition of subjunctive dependence, but it has the same effect.
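A minimal sketch of this (my own construction, not from either paper; the specific agent and inference rule are assumptions for illustration): a predictor that never reads the agent’s code can still infer its policy from its observed track record, and that correlation runs through the fixed source code, so the choice of source code cannot confound it.

```python
# A predictor that infers the agent's policy from behavior alone.

def agent(situation):
    """The agent's fixed source code -- here, an unconditional one-boxer."""
    return "one-box"

history = [agent(s) for s in range(10)]   # observed past Newcomb-like games

def predictor(history):
    """Probabilistic inference about the source code from its track record."""
    one_boxes = history.count("one-box")
    return "one-box" if one_boxes >= len(history) / 2 else "two-box"

# Better than chance on the next game, without ever running the agent's code:
print(predictor(history) == agent(10))  # True
```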

So now I’d like to offer my own definition of subjunctive dependence (even though still, I maintain that the term can be confusing, and I am not a huge fan of it).

I should consider predictor P to “subjunctively depend” on agent A to the extent that P makes predictions of A’s actions based on correlations that cannot be confounded by my choice of what source code A runs.

And hopefully, it’s clear why this is what we want. When we remember that questions in decision theory are really just questions about what type of source code we want to enter an environment using, then the choice of source code can only affect predictions that depend in some way on the choice of source code. If the correlation can’t be confounded by the choice of source code, the right kind of entanglement to allow for optimal updateless behavior is present.

Additional Topics

Going Meta

Consider what I call a Mind Police situation: Suppose that there is a powerful mind policing agent that is about to encounter agent A and read its mind (look at its source code). Afterward, if the mind policer judges A to be using decision theory X, they will destroy A. Else they will do nothing.

Suppose that decision theory X is FDT (but it could be anything) and that you are agent A who happens to use FDT. If you were given the option of overwriting your source code to implement some alternative, tolerated decision theory, would you? You’d be better off if you did, and it would be the output of an optimal function for the decision-making task at hand, but it’s sort of unclear whether or not this is a very functional-decision-theorist thing to do. Because of situations like these, I think that we should consider decision theories to come in two flavors: static, which will never overwrite themselves, and autoupdatable, which might.

Also, note that the example above is only a first-order version of this type of problem, but there are higher-order ones too. For example, what if the mind police destroyed agents using autoupdatable decision theories?

Why Roko’s Basilisk is Nonsense

A naive understanding of FDT has led some people to ask whether a superintelligent sovereign, if one were ever developed, would be rational to torture everyone who didn’t help to bring it into existence. The idea would be that this sovereign might consider this as part of an updateless strategy to help it come into existence more quickly and accomplish its goals more effectively.

Fortunately, a proper understanding of FDT and subjunctive dependence tells us that an optimally-behaving embedded agent doesn’t need to pretend that causation can happen backward in time. Such a sovereign would not be in control of its source code, and it can’t execute an updateless strategy if there was nothing there to not-update on in the first place. So Roko’s Basilisk is only an information hazard if FDT is poorly understood.


It's all about the source code.


What is Life in an Immoral Maze?

January 5, 2020 - 16:40
Published on January 5, 2020 1:40 PM UTC

Previously in sequence: Moloch Hasn’t Won, Perfect Competition, Imperfect Competition, Does Big Business Hate Your Family?

This post attempts to give a gears-level explanation of maze life as experienced by a middle manager.

Again, if you have not yet done so, you are highly encouraged to read or review Quotes from Moral Mazes. I will not have the space here to even gloss over many important aspects.

An Immoral Maze can be modeled as a super-perfectly competitive job market for management material. All the principles of super-perfect competition are in play. The normal barriers to such competition have been stripped away. Too many ‘qualified’ managers compete for too few positions.

An aspirant who does not devote everything they have, and visibly sacrifice all slack, towards success will automatically fail. Those who do make such sacrifices mostly fail anyway, but some of them “succeed”. We’ll see later what success has in store for them.

The Lifestyle of a Middle Manager

At the managerial and professional levels, the road between work and life is usually open because it is difficult to refuse to use one’s influence, patronage, or power on behalf of another regular member of one’s social coterie. It therefore becomes important to choose one’s social colleagues with some care and, of course, know how to drop them should they fall out of organizational favor. (Moral Mazes, Location 884, Quote #117)

We have this idea that there is work and there is not-work, and once one leaves work one is engaged in not-work distinct from work. We also have this idea that there are things that are off limits even at work, like sexual harassment.

For a person without anyone reporting to them, who is ‘on the line’ in the book’s parlance, this can be sustained.

For those in middle management who want to succeed, that’s not how things work. Everything you are is on the table. You’d better be all-in. 

You will increasingly choose your friends to help you win. You will increasingly choose your hobbies, and what you eat, and your politics, and your house, and your church, and your spouse and how many kids you have, to help you win. And of course, you will choose your (lack of) morality.

In the end, you will sacrifice everything, and I mean everything, that you value, in any sense, to win.

If the job requires you to move, anywhere in the world, you’ll do it, dragging your nuclear family along and forcing all of you to leave behind everything and everyone you know. Otherwise, you’re just not serious about success. 

Slack will definitely not be a thing.

Your time is especially vulnerable.

Higher-level managers in all the corporations I studied commonly spend twelve to fourteen hours a day at the office. (Location 1156, Quote #120, Moral Mazes)

This is the result of total competition between producers – the managers are effectively rival producers trying to sell themselves as the product. 

The market for managers is seen, by those who make the decisions, as highly efficient.

If managers were seen as wildly different in terms of talent, intelligence, or some other ability that helped get things done, that would help a lot. You could afford to be a little quirky, to hold on to the things you value most, without losing the game entirely. Your success will be influenced by your personality and dedication, but nothing like solely determined by them.

Alas, the perception in these mazes is exactly the opposite.

See, once you are at a certain level of experience, the difference between a vice-president, an executive vice-president, and a general manager is negligible. It has relatively little to do with ability as such. People are all good at that level. They wouldn’t be there without that ability. So it has little to do with ability or with business experience and so on. All have similar levels of ability, drive, competence, and so on. What happens is that people perceive in others what they like—operating styles, lifestyles, personalities, ability to get along. Now these are all very subjective judgments. And what happens is that if a person in authority sees someone else’s guy as less competent than his own guy, well, he’ll always perceive him that way. And he’ll always pick—as a result—his own guy when the chance to do so comes up. (Location 1013, Quote #87, Moral Mazes)

It is known that most people ‘don’t have what it takes’ to be a manager. This is clearly true on many levels. Only one of them is a willingness to fully get with the program.

Once you get several levels up, the default assumption is that everyone is smart enough, and competent enough. That the object-level is a fully level playing field. The idea that someone can just be better at doing the actual job doesn’t parse for them.

All remaining differences are about negative selection, about how hard you want it and are willing to sacrifice everything, or about how well you play political games. Nor do they much care whether you succeed at your job, anyway.

Some additional supporting quotes on that follow. A large portion of the quotes reinforce this perspective. 

If you can’t work smart, work hard:

When asked who gets ahead, an executive vice-president at Weft Corporation says: The guys who want it [get ahead]. The guys who work. You can spot it in the first six months. They work hard, they come to work earlier, they leave later. They have suggestions at meetings. They come into a business and the business picks right up. They don’t go on coffee breaks down here [in the basement]. You see the parade of people going back and forth down here? There’s no reason for that. I never did that. If you need coffee, you can have it at your desk. Some people put in time and some people work. (Location 992, Quote 29, Moral Mazes)

But everyone at this level works hard, and the work is more about showing that you work hard than about its results, because concrete outcomes don’t much matter:

As one manager says: “Personality spells success or failure, not what you do on the field.” (Location 1383, Quote 33, Moral Mazes)

It’s not like there were ever objective criteria:

Managers rarely speak of objective criteria for achieving success because once certain crucial points in one’s career are passed, success and failure seem to have little to do with one’s accomplishments. (Location 917, Quote 42, Moral Mazes)

Which makes sense, because if everyone is the same, then concrete outcomes are just luck:

Assuming a basic level of corporate resources and managerial know-how, real economic outcome is seen to depend on factors largely beyond organizational or personal control. (Location 1592, Quote 46, Moral Mazes)

I am supremely confident that this perspective is completely bonkers. There is a huge differential between better and worse no matter how high up you go or how extreme your filters have already been. But what matters here is what the managers believe. Not what is true. Talent or brilliance won’t save you if no one believes it can exist. If noticed, it will only backfire:

Striking, distinctive characteristics of any sort, in fact, are dangerous in the corporate world. One of the most damaging things, for instance, that can be said about a manager is that he is brilliant. This almost invariably signals a judgment that the person has publicly asserted his intelligence and is perceived as a threat to others. What good is a wizard who makes his colleagues and his customers uncomfortable? (Location 1173, Quote 88, Moral Mazes)

How do things get so bad? 

That’s the question we’ll look at an aspect of in the next post. From here I anticipate 3-5 day gaps between posts.

Questions that will be considered later, worth thinking about now, include: How does this persist? If things are so bad, why aren’t things way worse? Why haven’t these corporations fallen apart or been competed out of business? Given they haven’t, why hasn’t the entire economy collapsed? Why do regular people, aspirant managers and otherwise, still think of these manager positions as the ‘good jobs’ as opposed to picking up pitchforks and torches?



Q & A with Stuart Russell in AISafety.com Reading Group

January 5, 2020 - 14:23
Published on January 5, 2020 11:23 AM UTC

On Wednesday the 8th at 11:45 PST (19:45 UTC), Stuart Russell will be joining the online AISafety.com Reading Group to answer questions about his book, Human Compatible.

If you'd like to join, please add me on Skype ("soeren.elverlin").

This book has previously been discussed on LessWrong in 2 posts:




Machine Learning Can't Handle Long-Term Time-Series Data

January 5, 2020 - 06:43
Published on January 5, 2020 3:43 AM UTC

More precisely, today's machine learning (ML) systems cannot infer a fractal structure from time series data.

This may come as a surprise because computers seem like they can understand time series data. After all, aren't self-driving cars, AlphaStar and recurrent neural networks all evidence that today's ML can handle time series data?


Self-Driving Cars

Self-driving cars use a hybrid of ML and procedural programming. ML (statistical programming) handles the low-level stuff like recognizing pedestrians. Procedural (nonstatistical) programming handles high-level stuff like navigation. The details of self-driving car software are trade secrets, but we can infer bits of Uber's architecture from the National Transportation Safety Board's report on Uber's self-crashing car as summarized by jkaufman.

"If I'm not sure what it is, how can I remember what it was doing?" The car wasn't sure whether Herzberg and her bike were a "Vehicle", "Bicycle", "Unknown", or "Other", and kept switching between classifications. This shouldn't have been a major issue, except that with each switch it discarded past observations. Had the car maintained this history it would have seen that some sort of large object was progressing across the street on a collision course, and had plenty of time to stop.

What we see here is that the car throws away its past observations. Now let’s take a look at a consequence of this.

"If we see a problem, wait and hope it goes away". The car was programmed to, when it determined things were very wrong, wait one second. Literally. Not even gently apply the brakes. This is absolutely nuts. If your system has so many false alarms that you need to include this kind of hack to keep it from acting erratically, you are not ready to test on public roads.

Humans have to write ugly hacks like this when the system isn’t architected bottom-up to handle concepts like the flow of time. A machine learning system designed to handle time series data should never have human beings in the loop this low down the ladder of abstraction. In other words, Uber effectively uses a stateless ML system.
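A minimal sketch of what statelessness costs here (my own illustration; the numbers and reset rule are invented, not Uber’s actual code): a tracker that keeps its position history across classification flip-flops can estimate a crossing velocity, while one that resets on every relabel cannot.

```python
# Stateful vs. "stateless" object tracking across classifier flip-flops.

def crossing_speed(track):
    """Estimate lateral speed (m/s) from a history of (t, x) observations."""
    (t0, x0), (t1, x1) = track[0], track[-1]
    return (x1 - x0) / (t1 - t0)

# The same object, relabeled on every frame by a flaky classifier:
frames = [
    (0.0, 10.0, "Vehicle"),
    (0.5, 11.2, "Bicycle"),
    (1.0, 12.4, "Unknown"),
    (1.5, 13.6, "Bicycle"),
]

# Stateful tracker: ignore the noisy labels and keep the (t, x) history.
history = [(t, x) for t, x, _label in frames]
print(crossing_speed(history))   # ~2.4 m/s: a collision course is visible

# Stateless-style tracker: discard the track whenever the label changes.
track = []
for t, x, label in frames:
    if track and track[-1][2] != label:
        track = []                # throw away past observations
    track.append((t, x, label))
print(len(track))                 # 1 -- too little history to see it coming
```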

All you need to know when driving a car is the position of different objects and their velocities. You almost never need to know the past history of another driver or even yourself. There is no such thing as time to a stateless system. A stateless system cannot understand the concept of time. Stateless ML systems make sense when driving a car.


AlphaStar

AlphaStar (DeepMind's StarCraft II AI) is only a little more complicated than Uber's self-crashing car. It uses two neural networks. One network predicts the odds of winning and another network figures out which move to perform. This turns a time-series problem (what strategy to perform) into two separate stateless[1] problems.

Comparisons between AlphaStar and human beings are fudged because StarCraft II depends heavily on actions-per-minute (APM), the speed at which a player can perform actions. Humans wouldn't have a chance if AlphaStar were not artificially limited in the number of actions it could take. Games between humans and AlphaStar are only interesting because AlphaStar's actions are limited, thereby giving humans a handicap.

Without the handicap, AlphaStar crushes human beings tactically. With the handicap, AlphaStar still crushes human beings tactically. Human beings can beat AlphaStar on occasion only because elite StarCraft II players possess superior strategic understanding.

Most conspicuously, human beings know how to build walls with buildings. This requires a sequence of steps that don't generate a useful result until the last of them is completed. A wall is useless until the last building is put into place. AlphaStar (the red player in the image below) does not know how to build walls.

With infinite computing power, AlphaStar could eventually figure this out. But we don't have infinite computing power. I don't think it's realistic to expect that AlphaStar will ever figure out how to build a wall with its current algorithms and realistic hardware limitations.

AlphaStar is good at tactics and bad at strategy. To state this more precisely, AlphaStar hits a computational cliff for understanding conceptually complex strategies when time horizons exceed the tactical level. Human brains are not limited in this way.

Recurrent Neural Networks (RNNs)

RNNs are neural networks with a form of short-term memory. A newly-invented variant incorporates long short-term memory. In both cases, the RNN is trained with the standard backpropagation algorithm used by all artificial neural networks[2]. The backpropagation algorithm works fine on short timescales but quickly breaks down when strategizing about longer timescales conceptually. The algorithm hits a computational cliff.

This is exactly the behavior we observe from AlphaStar. It's also the behavior we observe in natural language processing and music composition. ML can answer a simple question just fine but has trouble maintaining a conversation. ML can generate classical music just fine but can't figure out the chorus/verse system used in rock & roll. That's because the former can be constructed stochastically without any hidden variables while the latter cannot.

The Road Ahead

This brings us to my first law of artificial intelligence.

Any algorithm that is not organized fractally will eventually hit a computational wall, and vice versa.

―Lsusr's First Law of Artificial Intelligence

For a data structure to be organized fractally means you can cut a piece off and that piece will be a smaller version of the original dataset. For example, if you cut a sorted list in half then you will end up with two smaller sorted lists. (This is part of why quicksort works.) You don't have to cut a sorted list in exactly the middle to get two smaller sorted lists. The sorted list's fractal structure means you can cut the list anywhere. In this way, a sorted list is organized fractally along one dimension. Other examples of fractal data structures are heaps and trees.
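The sorted-list claim is easy to verify directly (a small check of my own):

```python
# Every cut of a sorted list yields two smaller sorted lists -- the sense
# in which sortedness is fractal along the list's one dimension.

def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

data = sorted([5, 3, 8, 1, 9, 2, 7])
for i in range(len(data) + 1):      # cut anywhere, not just the middle
    assert is_sorted(data[:i]) and is_sorted(data[i:])
print("every cut point preserves sortedness")
```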

Another fractal structure is a feed forward neural network (FFNN). FFNNs are organized fractally along two dimensions. There are two ways you can cut a neural network in half to get two smaller neural networks. The most obvious is to cut the network in half at a hidden layer. To do this, duplicate the hidden layer and then cut between the pair of duplicated layers. The less obvious way to cut a neural network is to slice between its input/output nodes.

Each dimension of fractality is a dimension the FFNN can scale indefinitely. FFNNs are good at scaling the number of input and output nodes they possess because a FFNN is structured fractally along this dimension (number of input/output nodes). FFNNs are good at scaling the complexity of processing they perform because a FFNN is structured fractally along this dimension too (number of hidden layers).

Much of the recent development in image recognition comes from these two dimensions of fractality[3]. Image recognition has a high number of input nodes (all the color channels of all the pixels in an image). A FFNN can apply complex rules to this large space because of its fractal geometry along the hidden-layer dimension.

FFNNs are stateless machines so feeding time series data into a FFNN doesn't make sense. RNNs can handle time series data, but they have no mechanism for organizing it fractally. Without a fractal structure in the time dimension, RNNs cannot generalize information from short time horizons to long time horizons. They therefore do not have enough data to formulate complex strategies on long time horizons.

If we could build a neural network fractally-organized in the time domain then it could generalize (apply transfer learning) from short time horizons to long time horizons. This turns a small data problem into a big data problem. Small data problems are hard. Big data problems are easy.

This is why I'm so interested in Connectome-Specific Harmonic Waves (CSHW). The fractal equation of harmonic waves (Laplacian eigendecomposition) could answer the problem of how to structure a neural network fractally in the time domain.

  1. AlphaStar does contain one bit of time-series comprehension. It can predict the actions of enemy units hidden by fog of war. I'm choosing to ignore this on the grounds it isn't an especially difficult problem. ↩︎

  2. The human brain lacks a known biological mechanism for performing the backpropagation algorithm used by artificial neural networks. Therefore biological neural networks probably use a different equation for gradient descent. ↩︎

  3. Progress also comes from the application of parallel GPUs to massive datasets, but scaling in this way wouldn't be viable mathematically without the two-dimensional fractal structure of FFNNs. ↩︎


[Book Review] The Trouble with Physics

January 5, 2020 - 04:47
Published on January 5, 2020 1:47 AM UTC

Lee Smolin's book The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next is ostensibly about why string theory can't solve what he calls the Five Great Problems in theoretical physics:

  1. "Combine general relativity and quantum theory into a single theory that can claim to be the complete theory of nature" i.e. "the problem of quantum gravity".
  2. "Resolve the problems in the foundations of quantum mechanics, either by making sense of the theory as it stands or by inventing a new theory that does make sense."
  3. "Determine whether or not the various particles and forces can be unified in a theory that explains them all as manifestations of a single, fundamental entity."
  4. "Explain how the values of the free constants in the standard model of particle physics are chosen in nature."
  5. "Explain dark matter and dark energy. Or, if they don't exist, determine how and why gravity is modified on large scales. More generally, explain why the constants of the standard model of cosmology, including the dark energy, have the values they do."

Actually, The Trouble with Physics is about a much broader problem—disruptive innovation as described in Clayton Christensen's The Innovator's Dilemma and Thomas Kuhn's The Structure of Scientific Revolutions. In Smolin's view, the scientific establishment is good at making small iterations to existing theories and bad at creating radically new theories. It's therefore not implausible that the solution to quantum gravity could come from a decade of solitary amateur work by someone totally outside the scientific establishment.

This is an extraordinary claim and extraordinary claims demand extraordinary evidence. Smolin's book includes plenty of such evidence.

He [Carlo Rovelli] got so many e-mails [from string theorists] asserting that Mandelstam had proved [String Theory] finite that he decided to write to Mandelstam himself and ask his view. Mandelstam is retired, but he responded quickly. He explained that what he had proved is that a certain kind of infinite term does not appear anywhere in the theory. But he told us that he had not actually proved that the theory itself was finite, because other kinds of infinite terms might appear. No such term has ever been seen in any calculation done so far, but neither has anyone proved that one couldn't appear.

Smolin's book is full of evidence like this. I find his argument convincing because it aligns with my personal experience. I earned a bachelor's degree in physics because I wanted to help figure out the Great Problems. I wanted to discuss big ideas with Einsteins and Feynmans. Instead I found narrowly specialized postdocs. They're good at math but tend to have little education in the broader history of science. To them, the physical laws might as well have been handed down on stone tablets. A physics education functions like a madrasa.

This might be tenable if the foundations of physics (general relativity and quantum theory) were plausibly true. But general relativity and quantum theory contradict each other. They cannot both be correct. Therefore at least half of physics is wrong.

The scientific establishment isn't structured to resolve problems of this magnitude. Until we restructure the institution of physics so that it promotes diversity of thought (unlikely anytime soon) it's not inconceivable that the answers to the Five Great Problems could come from an amateur.

Smolin's book has inspired me to begin working on a theory of quantum gravity. I'll need to learn new things like quantum field theory. I might give up before getting anywhere. But at least I know I don't understand basic physics. That puts me in good company.

I think I can safely say that nobody understands quantum mechanics.

― Richard Feynman


Credit Cards for Giving

January 5, 2020 - 02:00
Published on January 4, 2020 11:00 PM UTC

The standard advice for how to physically make a donation is something like: if you're donating less than ~$1,000 use a credit card, otherwise use a check or other method with lower fees. For example, GiveWell writes:

We recommend that gifts up to $1,000 be made online by credit card. If you are giving more than $1,000, please consider one of these alternatives: Check, Bank Transfer, ...

And the Against Malaria Foundation writes:

However you prefer to make a donation is fine.
All other things being equal, we prefer:
  • For donations less than US$5,000, an online donation using credit or debit card
  • For donations more than US$5,000, an offline donation - by bank transfer or by mail (cheque/check) - to eliminate fees.

This makes sense: if the charity is paying 2.25% on your donation, that's $22.50 on a $1,000 donation, and $112.50 on a $5,000 one.
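As a quick sanity check of the fee arithmetic, here is the calculation at a 2.25% card-processing rate (the rate is taken from the paragraph above; actual rates vary by processor):

```python
# Card-fee arithmetic: fee = donation * rate.
FEE_RATE = 0.0225  # 2.25%, as in the text above

def card_fee(donation: float) -> float:
    """Dollars lost to card processing on a donation of this size."""
    return donation * FEE_RATE

# Use a tolerance rather than exact equality, since 0.0225 is not
# exactly representable in binary floating point.
assert abs(card_fee(1_000) - 22.50) < 1e-9
assert abs(card_fee(5_000) - 112.50) < 1e-9
```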

The ideal, though, would be to donate with a credit card, get credit card rewards, and still have the charity get the full amount. I wouldn't expect this to exist, but it turns out that it does! There are platforms (ex: Facebook) which will cover the credit card fees for charities, so that if you donate $1,000 the charity receives the full $1,000. Not only that, but your purchase is eligible for regular credit card rewards, so 1-3% savings over sending a check.

The main downside of this approach is that you generally can't direct your donation. You can make a donation to the Malaria Consortium, but not to their Seasonal Malaria Chemoprevention program that GiveWell recommends. Likewise, you can donate to the Centre for Effective Altruism but not a specific EA fund.

Again, however, there is an exception: the EA Giving Tuesday team has been coordinating with EA charities to create FB fundraisers for specific programs. At least for 2019 these were only open for a couple weeks leading up to Giving Tuesday, but during that window you could use your credit card for no-fee donations to specific programs at EA charities. Potentially the EA Giving Tuesday folks should advertise this as something to use their fundraisers for, and keep them open a bit longer in December?

There could also be other opportunities available on short notice. For example, we could find out in November 2020 that there's something like the 1% PayPal Giving Fund match that was offered in 2017 and 2018. Because these can appear on short notice, I think it's worth trying to have sufficient credit across multiple cards that you could potentially run your annual donations all through your open credit cards on a single day.

The two main ways to increase your available credit are opening new credit cards and asking for credit limit increases on your existing cards. For limit increases there's generally a link in the online banking UI like "Request a Credit Limit Increase". Typically it will bring up a form with fields for annual income and monthly mortgage/rent:

I went through my cards yesterday and asked for 100% limit increases on each. Here's what happened:

  • Citi Double Cash (2%): got +45% immediately
  • Capital One Quicksilver (1.5%): got +6% immediately
  • Barclays jetBlue (~1.5% equivalent): got +100%, by email the next day
  • Alliant Cashback (2.5%): couldn't find a link for requesting an increase; I should probably call them

In 2019 I was able to run about 80% of our donations through credit cards, which was worth about $3k in rewards.

If your goal is to get rich, donating money, even with tax deductions and credit card rewards, is not going to help. But for money you're donating anyway, ~2% savings is worth optimizing a bit for.

Comment via: facebook


Less Wrong Poetry Corner: Walter Raleigh's "The Lie"

January 5, 2020 - 01:22
Published on January 4, 2020 10:22 PM UTC

Followup to: Rationalist Poetry Fans, Unite!, Act of Charity

This is my favorite poem about revealing information about deception! It goes like this (sources: Wikipedia, Poetry Foundation, Bartleby)—

Go, Soul, the body's guest,
Upon a thankless arrant:
Fear not to touch the best;
The truth shall be thy warrant:
Go, since I needs must die,
And give the world the lie.

Say to the court, it glows
And shines like rotten wood;
Say to the church, it shows
What's good, and doth no good:
If church and court reply,
Then give them both the lie.

Tell potentates, they live
Acting by others' action;
Not loved unless they give,
Not strong, but by a faction:
If potentates reply,
Give potentates the lie.

Tell men of high condition,
That manage the estate,
Their purpose is ambition,
Their practice only hate:
And if they once reply,
Then give them all the lie.

Tell them that brave it most,
They beg for more by spending,
Who, in their greatest cost,
Seek nothing but commending:
And if they make reply,
Then give them all the lie.

Tell zeal it wants devotion;
Tell love it is but lust;
Tell time it is but motion;
Tell flesh it is but dust:
And wish them not reply,
For thou must give the lie.

Tell age it daily wasteth;
Tell honour how it alters;
Tell beauty how she blasteth;
Tell favour how it falters:
And as they shall reply,
Give every one the lie.

Tell wit how much it wrangles
In tickle points of niceness;
Tell wisdom she entangles
Herself in over-wiseness:
And when they do reply,
Straight give them both the lie.

Tell physic of her boldness;
Tell skill it is pretension;
Tell charity of coldness;
Tell law it is contention:
And as they do reply,
So give them still the lie.

Tell fortune of her blindness;
Tell nature of decay;
Tell friendship of unkindness;
Tell justice of delay;
And if they will reply,
Then give them all the lie.

Tell arts they have no soundness,
But vary by esteeming;
Tell schools they want profoundness,
And stand too much on seeming:
If arts and schools reply,
Give arts and schools the lie.

Tell faith it's fled the city;
Tell how the country erreth;
Tell, manhood shakes off pity;
Tell, virtue least preferreth:
And if they do reply,
Spare not to give the lie.

So when thou hast, as I
Commanded thee, done blabbing,—
Although to give the lie
Deserves no less than stabbing,—
Stab at thee he that will,
No stab the soul can kill.

The English is a bit dated; Walter Raleigh (probably) wrote it in 1592 (probably). "Give the lie" here is an expression meaning "accuse them of lying" (not "tell them this specific lie", as modern readers not familiar with the expression might interpret it).

The speaker is telling his soul to go to all of Society's respected institutions and reveal that the stories they tell about themselves are false: the court's shining standard of Justice is really about as shiny as a decaying stump; the church teaches what's good but doesn't do any good; kings think they're so powerful and mighty, but are really just the disposable figurehead of a coalition; &c. (I'm not totally sure exactly what all of the stanzas mean because of the dated language, but I feel OK about this.)

The speaker realizes this campaign is kind of suicidal ("Go, since I needs must die") and will probably result in getting stabbed. That's why he's telling his soul to do it, because—ha-ha!—immaterial souls can't be stabbed!

What about you, dear reader? Have you given any thought to revealing information about deception?!


Dec 2019 gwern.net newsletter

January 4, 2020 - 23:48
Published on January 4, 2020 8:48 PM UTC


Is there a previous LW discussion on ergodicity, like the difference of time-average vs. ensemble-average and its effect when working with expected values?

January 4, 2020 - 13:07
Published on January 4, 2020 10:07 AM UTC

I've recently stumbled across a blog-post explaining ergodicity, and if I gathered correctly, Taleb's book "Skin in the Game" is also explaining this (I haven't read that book).

It seems to be rather important. Some claim (I don't know if correctly) that both decision theory and social and economic science have, or have had, quite a few errors because of not correctly dealing with non-ergodicity.

While I had some small bit of intuitive understanding of this (for instance, when fumbling around personal finances), I hadn't seen a clear "LW treatment" of it. Maybe I just forgot it already, but a short site-search didn't come up with anything sensible.

Anyone having some pointers?


Underappreciated points about utility functions (of both sorts)

January 4, 2020 - 10:27
Published on January 4, 2020 7:27 AM UTC

In this post I'd basically like to collect some underappreciated points about utility functions that I've made in the comments of various places but which I thought were worth collecting into a proper, easily-referenceable post. The first part will review the different things referred to by the term "utility function", explain how they work, and explain the difference between them. The second part will explain why -- contrary to widespread opinion on this website -- decision-theoretic utility functions really do need to be bounded.

(It's also worth noting that as a consequence, a number of the decision-theoretic "paradoxes" discussed on this site simply are not problems since they rely on unbounded decision-theoretic utility. An example is the original Pascal's Mugging (yes, I realize that term has since been applied to a bunch of things that have nothing to do with unbounded utility, but I mean the original problem).)

Anyway. Let's get on with it.

Part 1: "Utility function" refers to two different things that are often conflated and you should be sure you know which one you're talking about

The term "utility function" refers to two significantly different, but somewhat related, things, which are, due to the terminological and conceptual overlap, often conflated. This results in a lot of confusion. So, I want to cover the distinction here.

The two things called utility functions are:

  1. A function that describes the preferences of a given agent, assuming that agent's preferences satisfy certain rationality conditions, and does not depend on any particular ethical theory, but rather is useful in decision theory and game theory more generally. If I need to be unambiguous, I'll call this a decision-theoretic utility function.
  2. A function, used specifically in utilitarianism, that describes something like a person's preferences or happiness or something -- it's never been clearly defined, and different versions of utilitarianism suggest different ideas of what it should look like; these are then somehow aggregated over all people into an overall (decision-theoretic!) utility function (which is treated as if it were the decision-theoretic utility function describing what an ideal moral agent would do, rather than the preferences of any particular agent). If I need to be unambiguous, I'll call this an E-utility function.

(There's actually a third thing sometimes called a "utility function", which also gets confused with these other two, but this is a rarer and IMO less important usage; I'll get back to this in a bit.)

It's important to note that much discussion online conflates all of these and yields nonsense as a result. If you see someone talking nonsense about utility functions, before replying, it's worth asking -- are they mixing together different definitions of "utility function"?

So. Let's examine these in a bit more detail.

Decision-theoretic utility functions and their assumptions

Decision-theoretic utility functions describe the preferences of any consequentialist agent satisfying certain rationality conditions; by "it describes the agent's preferences", I mean that, given a choice between two options, the one yielding the higher expected utility is the one the agent chooses.

It's not obvious in advance that a rational agent's preferences need to be described by a utility function, but there are theorems guaranteeing this; Savage's theorem probably provides the best foundation for this, although the VNM theorem may be a little more familiar. (We'll discuss the difference between these two in more detail in the quick note below and discuss it further in the second part of this post.) Note that these functions are not entirely unique -- see below. Also note that these are conditions of rationality under uncertainty.

Again, a decision-theoretic utility function simply describes an agent's preferences. It has nothing to do with any particular idea of morality, such as utilitarianism. Although you could say -- as I've said above -- that it assumes a consequentialist agent, who cares only about consequences. So, any rational consequentialist agent has a decision-theoretic utility function; but only a utilitarian would admit the existence of E-utility functions. (While this doesn't exactly bear on the point here, it is worth noting that utilitarianism is a specific type of consequentialism and not identical with it!)

Note that real people will not actually obey the required rationality assumptions and thus will not actually have decision-theoretic utility functions; nonetheless, idealized rational agents, and therefore decision-theoretic utility functions are a useful abstraction for a number of purposes.

Decision-theoretic utility functions are usually stated as taking values in the real numbers, but they're only defined up to positive affine transformations (scaling by a positive constant, and translation); applying such a transformation to a utility function for an agent will yield another, equally-valid utility function. As such they may be better thought of not as taking values in R, exactly, but rather a sort of ordered 1-dimensional affine space over R. Outputs of a decision-theoretic utility function are not individually meaningful; in order to get meaningful numbers, with concrete meaning about the agent's preferences, one must take ratios of utilities, (a-b)/|c-d|. (Note the absolute value in the denominator but not the numerator, due to the importance of order.)
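The invariance claim above can be checked numerically (my sketch, with arbitrary example numbers): raw utility values change under a positive affine transformation, but ratios of utility differences do not.

```python
# Utility ratios (a-b)/|c-d| are invariant under positive affine
# transformations u -> k*u + m with k > 0, unlike raw utility values.
def ratio(a: float, b: float, c: float, d: float) -> float:
    return (a - b) / abs(c - d)

# Four arbitrary utility values for four outcomes.
a, b, c, d = 10.0, 4.0, 7.0, 1.0

k, m = 3.0, -5.0  # an arbitrary positive affine transformation
ta, tb, tc, td = (k * u + m for u in (a, b, c, d))

before = ratio(a, b, c, d)
after = ratio(ta, tb, tc, td)
assert abs(before - after) < 1e-12   # the ratio is unchanged
assert a != ta                       # the raw values are not
```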

Decision-theoretic utility functions really need to be bounded -- a point seriously underappreciated on this website -- but I'll save discussion of that for the second part of this post.

A quick tangential note on probability and additivity

This is pretty tangential to the point of this post, but it's probably worth taking the time here to explain what the difference is between the Savage and VNM formalisms. (Well, one of two differences; the other will be discussed in the second part of this post, but as we'll see it's actually not such a difference.) The main difference is that the VNM theorem assumes that we already believe in the idea of probability -- it justifies decision-theoretic utility, but it does nothing to justify probability, it just assumes it. Savage's theorem, by contrast, provides a foundation for both probability and decision-theoretic utility simultaneously, based just on rationality axioms about preferences, which is why I think it's the better foundation.

However, the probability measure it constructs need not actually be a probability measure as such, as it need only be finitely additive rather than countably additive. It's not clear what to make of this. Maybe countable additivity of probability just isn't necessary for a rational agent? It's hard to say. (If I'm not mistaken, the limiting probabilities of MIRI's logical inductor are merely (the analogue of) finitely additive, not countably additive, but I could be wrong about that...) But this is really off the point, so I'm just going to raise the question and then move on; I just wanted to mention it to ward off nitpicks on this point. As we'll see below, the choice of formalism doesn't actually matter much to my point here.

E-utility functions and their assumptions

This is the older meaning of the term if I'm not mistaken, but there is mostly not a lot to say about these because they're fairly ill-defined. They are, as mentioned above, specifically a utilitarian notion (not a general consequentialist notion). How to define these, as well as how to aggregate them, remain disputed.

Utilitarians say that one should try to maximize the expected value of the aggregated utility function, which means that the aggregated function is actually a weird sort of decision-theoretic utility function (corresponding to an ideal moral agent rather than any particular agent), not an E-utility function. One does not attempt to maximize expected value of E-utility functions.

One thing we can say about E-utility functions is that while only idealized rational agents have decision-theoretic utility functions and not real people, real people are supposed to have E-utility functions. Or at least so I gather, or else I don't see how utilitarianism makes sense?

Actually, one could say that it is not only utilitarians who rely on these -- there is also the notion of prioritarianism; one sometimes sees the term "aggregative consequentialism" to cover both of these (as well as other potential variants). But, because E-utility functions are so ill-defined, there is, as best I can tell, not really any meaningful distinction between the two. For example, consider a utilitarian theory that assigns to each agent p a real-valued E-utility function U_p, and aggregates them by summing. Let's suppose further that each U_p takes values in the nonnegative reals; then if we change the aggregation rule to summing the square roots of the U_p, we have changed our utilitarian theory into a prioritarian one. Except, instead of doing that, we could define U'_p = sqrt(U_p), and call U'_p the E-utilities; because there's no precise definition of E-utilities, there's nothing stopping us from doing this. But then the utilitarian theory described by the U'_p, describes exactly the same theory as the prioritarian theory described by the U_p! The theory could equally well be described as "utilitarian" or "prioritarian"; for this reason, unless one puts further restrictions on E-utility functions, I do not consider there to be any meaningful difference between the two.
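The relabeling argument above can be made concrete (my sketch, with arbitrary example numbers): the "prioritarian" sum of square roots of the U_p is literally the same function as the "utilitarian" plain sum of the redefined E-utilities U'_p = sqrt(U_p).

```python
import math

def prioritarian(us):
    """Aggregate by summing square roots of the U_p."""
    return sum(math.sqrt(u) for u in us)

def utilitarian(us_prime):
    """Aggregate by plain summation of the U'_p."""
    return sum(us_prime)

# Nonnegative E-utilities for a three-person population.
population = [4.0, 9.0, 16.0]
relabeled = [math.sqrt(u) for u in population]  # U'_p = sqrt(U_p)

# Identical verdicts: the two "theories" are the same function.
assert prioritarian(population) == utilitarian(relabeled)
```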

As such, throughout this post I simply say "utilitarianism" rather than "aggregative consequentialism"; but if I'm wrong in identifying the two, well, whenever I say "utilitarianism" I really kind of mean "aggregative consequentialism". Hope that's OK.

Preference utilitarianism and Harsanyi's theorem (using decision-theoretic utility functions as E-utility functions)

Above I've made a point of emphasizing that decision-theoretic utility and E-utility functions are different things. But could there be cases where it makes sense to use one as the other? Specifically, to use decision-theoretic utility functions as E-utility functions? (The reverse clearly doesn't make much sense.)

Well, yes, that's basically what preference utilitarianism is! OK, precise formulations of preference utilitarianism may vary, but the idea is to use people's preferences as E-utility functions; and how are you going to encode people's preferences if not with decision-theoretic utility functions? (OK, this may only really work for a population of idealized agents, but it's still worth thinking about.)

Indeed, we can go further and formalize this with Harsanyi's theorem, which gives a series of moral assumptions (note: among them is that the agents in the population do indeed have decision-theoretic utility functions!) under which morality does indeed come down to maximizing a sort of aggregate of the population's decision-theoretic utility functions.

(Note that it also assumes that the population is fixed, which arguably assumes away a lot of the hard parts of utilitarianism, but it's still a useful starting point.)

But, what is this aggregation? If we think of the agent's utility functions as taking values in R, as they're usually thought of, then the aggregation consists of summing a utility function for each agent. But which one? As mentioned above, utility functions are only unique up to positive affine transformations. Harsanyi's theorem provides no guidance on which utility function to use for each agent -- how could it? They're all equally valid. And yet using different ones can yield very different (and meaningfully different) aggregated results, essentially letting you adjust weightings between agents! Except there's no meaningful notion of "equal weighting" to use as baseline. It's something of a problem.

(This is often discussed in terms of "weights", coefficients put in front of the utility functions; but I think this obscures the fundamental issue, in making it sound like there's a meaningful notion of "equal weights" when there really isn't.)
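To make the problem concrete (my sketch, with arbitrary example numbers): rescaling one agent's utility function by a positive constant yields an equally valid representative of that agent's preferences, yet it flips which option the Harsanyi-style aggregate prefers.

```python
def aggregate(option_utils):
    """Sum one chosen representative utility value per agent."""
    return sum(option_utils)

# Two agents, two options; entries are each agent's utility for the option.
option_A = [1.0, 0.0]
option_B = [0.0, 1.2]
assert aggregate(option_A) < aggregate(option_B)   # B wins

# Rescale agent 1's utility function by 10 -- an equally valid
# representative of the same underlying preferences.
option_A_rescaled = [10.0, 0.0]
option_B_rescaled = [0.0, 1.2]
assert aggregate(option_A_rescaled) > aggregate(option_B_rescaled)  # now A wins
```

Nothing in the agents' preferences changed between the two calculations; only our arbitrary choice of representative did.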

Still, despite these holes, preference utilitarianism and Harsanyi's theorem are definitely worth thinking about.

Brief note on that third sort of utility function

Finally, before we get to the second part of this post, I wanted to mention that third thing sometimes called a "utility function".

The term "utility function" is sometimes used for a real-valued function that describes an agent's deterministic preferences; i.e., if A and B are two options, and U the utility function, then the agent prefers A to B if and only if U(A) > U(B). Note the lack of any requirement here about expected value! This is a weaker sense than a decision-theoretic utility function as I described it above; any decision-theoretic utility function is one of these, but not vice-versa.

While you'll occasionally encounter this, it's frankly a useless and even counterproductive notion. Why? Because fundamentally, it's the wrong abstraction for the situation. If uncertainty isn't coming into play, and you're only applying deterministic rationality constraints, then the right structure for describing an agent's preferences is a total preorder. Why would you introduce real numbers? That just restricts what you can express! Not every total preorder will embed in the real numbers. So, there isn't any sensible set of rationality conditions that will lead to this notion of utility function; they'll lead you instead to the idea of a total preorder, and then oops maybe that total preorder will fail to embed in R and the agent won't have a "utility function" in this sense.
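The standard example of such a preorder (not spelled out in the post, but a classical one) is lexicographic preference on pairs: a perfectly coherent total order that provably has no real-valued representation, since one would require uncountably many disjoint nonempty rational intervals.

```python
from functools import cmp_to_key

def lex_cmp(x, y):
    """Lexicographic comparison on pairs: the first coordinate dominates,
    and the second only breaks ties. A total order with no R-valued
    utility representation."""
    if x[0] != y[0]:
        return -1 if x[0] < y[0] else 1
    if x[1] != y[1]:
        return -1 if x[1] < y[1] else 1
    return 0

options = [(1, 5), (0, 100), (1, 0), (0, 2)]
ranked = sorted(options, key=cmp_to_key(lex_cmp))
# (0, 100) beats (1, 0) despite its huge second coordinate.
assert ranked == [(0, 2), (0, 100), (1, 0), (1, 5)]
```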

Such a function is of course only unique up to order-preserving functions on R, meaning it's not very unique at all (one more sign of it being the wrong abstraction).

Why were such functions ever even used, when they're clearly the wrong abstraction? I think basically it's because a lot of people lack familiarity with mathematical structures, or how to build an abstraction to suit a set of requirements, and instead tend to just immediately reach for the real numbers as a familiar setting to put things in. (Honestly, that's probably why decision-theoretic utility functions were initially defined as R-valued as well; fortunately, in that case, it turns out to be the correct choice! The real numbers can indeed be quite useful...)

Of course, as discussed above, if your agent not only obeys requirements of deterministic rationality, but also requirements of rationality under uncertainty, then in fact they'll have a decision-theoretic utility function, taking values in R, and so will have one of these. So in that sense the assumption of taking these values in R is harmless. But still...

Part 2: Yes, decision-theoretic utility functions really do need to be bounded

OK. Now for the main point: Contrary to widespread opinion on this site, decision-theoretic utility functions really do need to be bounded.

First, I'm going to discuss this in terms of Savage's theorem. I realize this is the less familiar formalism for many here, but I think it's the better one; if you're not familiar with it I recommend reading my post on it. I'll discuss the point in terms of the more familiar VNM formalism shortly.

OK. So under Savage's formalism, well, Savage's theorem tells us that (under Savage's rationality constraints) decision-theoretic utility functions must be bounded. Um, OK, hm, that's not a very helpful way of putting it, is it? Let's break this down some more.

There's one specific axiom that guarantees the boundedness of utility functions: Savage's axiom P7. Maybe we don't need axiom P7? Is P7 really an important rationality constraint? It seems intuitive enough, like a constraint any rational agent should obey -- what rational agent could possibly violate it? -- but maybe we can do without it?

Let's hold that thought and switch tracks to the VNM formalism instead. I mean -- why all this discussion of Savage at all? Maybe we prefer the VNM formalism. That doesn't guarantee that utility functions are bounded, right?

Indeed, as usually expressed, the VNM formalism doesn't guarantee that utility functions are bounded... except the usual VNM formalism doesn't actually prove that utility functions do everything we want!

The point of a decision-theoretic utility function is that it describes the agent's preferences under uncertainty; given two gambles A and B, the one with the higher expected utility (according to the function) is the one the agent prefers.

Except, the VNM theorem doesn't actually prove this for arbitrary gambles! It only proves it for gambles with finitely many possible outcomes. What if we're comparing two gambles and one of them has infinitely many possible outcomes? This is something utility functions are often used for on this site, and a case I think we really do need to handle -- I mean, anything could potentially have infinitely many possible outcomes, couldn't it?

Well, in this case, the VNM theorem by itself provides absolutely no guarantee that higher expected utility actually describes the agent's preference! Our utility function might simply not work -- might simply fail to correctly describe the agent's preference -- once gambles with infinitely many outcomes are involved!

Hm. How troublesome. OK, let's take another look at Savage and his axiom P7. What happens if we toss that out? There's no longer anything guaranteeing that utility functions are bounded. But also, there's no longer anything guaranteeing that the utility function works when comparing gambles with infinitely many outcomes!

Sounds familiar, doesn't it? Just like with VNM. If you don't mind a utility function that might fail to correctly describe your agent's preferences once infinite gambles get involved, then sure, utility functions can be unbounded. But, well, that's really not something we can accept -- we do need to be able to handle such cases; or at least, such cases are often discussed on this site. Which means bounded utility functions. There's not really any way around it.

And if you're still skeptical of Savage, well, this all has an analogue in the VNM formalism too -- you can add additional conditions to guarantee that the utility function continues to work even when dealing with infinite gambles, but you end up proving in addition that the utility function is bounded. I'm not so familiar with this, so I'll just point to this old comment by AlexMennen for that...

Anyway, point is, it doesn't really matter which formalism you use -- either you accept that utility functions are bounded, or you give up on the idea that utility functions produce meaningful results in the face of infinite gambles, and, as I've already said, the second of these is not acceptable.

Really, the basic reason should go through regardless of the particular formalism; you can't have both unbounded utility functions, and meaningful expected utility comparisons for infinite gambles, because, while the details will depend on the particular formalism, you can get contradictions by considering St. Petersburg-like scenarios. For instance, in Savage's formalism, you could set up two St. Petersburg-like gambles A and B such that the agent necessarily prefers A to B but also necessarily is indifferent between them, forcing the conclusion that in fact the agent's utility function must have been bounded, preventing this setup.
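To make the divergence concrete, here's a minimal numerical sketch (my own illustration, not from the post): in a St. Petersburg-style gamble where outcome n has probability 2^-n, an unbounded utility that grows as fast as the probabilities shrink makes the expected-utility series diverge, while any bounded utility makes it converge. The specific utility functions below are illustrative assumptions.

```python
# Partial expected-utility sums for a St. Petersburg-style gamble
# where outcome n occurs with probability 2**-n.

def partial_expected_utility(utility, terms):
    """Sum of p_n * u(n) for n = 1..terms, with p_n = 2**-n."""
    return sum(2.0**-n * utility(n) for n in range(1, terms + 1))

unbounded = lambda n: 2.0**n        # utility grows as fast as 1/p_n shrinks
bounded   = lambda n: 1 - 2.0**-n   # utility approaches a supremum of 1

# Unbounded case: each term contributes exactly 1, so the partial sums
# grow without bound -- "expected utility" is not a well-defined number.
print([partial_expected_utility(unbounded, k) for k in (10, 20, 40)])

# Bounded case: the series converges (to 2/3 for this choice of utility),
# so the expected-utility comparison remains meaningful.
print([partial_expected_utility(bounded, k) for k in (10, 20, 40)])
```

The point is not the particular numbers but the shape of the failure: with unbounded utility, the gamble's value depends on where you stop summing, which is exactly the wedge the St. Petersburg-style contradictions exploit.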

I'd like to note here a consequence of this I already noted in the intro -- a number of the decision-theoretic "paradoxes" discussed on this site simply are not problems since they rely on unbounded decision-theoretic utility. An example is the original Pascal's Mugging; yes, I realize that term has since been applied to a bunch of things that have nothing to do with unbounded utility, but the original problem, the one Yudkowsky was actually concerned with, crucially does.

And I mean, it's often been noted before that these paradoxes go away if bounded utility is assumed, but the point I want to make is stronger -- that the only reason these "paradoxes" seem to come up at all is because contradictory assumptions are being made! That utility functions can be unbounded, and that utility functions work for infinite gambles. One could say "utility functions have to be bounded", but from a different point of view, one could say "expected utility is meaningless for infinite gambles"; either of these would dissolve the problem, it's only insisting that neither of these are acceptable that causes the conflict. (Of course, the second really is unacceptable, but that's another matter.)
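As a back-of-the-envelope illustration of why boundedness dissolves the original Pascal's Mugging (all numbers here are my own assumed placeholders, not anything from the post): with unbounded utility the mugger can always name a payoff large enough to swamp your tiny credence, but with utilities confined to [0, 1] the payoff term is capped and the cost of paying wins.

```python
# Assumed illustrative numbers for the original Pascal's Mugging.
p_mugger = 1e-20        # credence that the mugger's claim is true
cost_of_paying = 1e-3   # utility lost by handing over the wallet

# Unbounded utility: the mugger just names a payoff big enough that
# p_mugger * promised dominates any fixed cost -- so you always pay.
promised = 1e30
eu_pay_unbounded = p_mugger * promised - cost_of_paying  # positive: pay

# Bounded utility: the payoff term is at most p_mugger * 1, which the
# cost swamps -- so you keep your wallet.
eu_pay_bounded = p_mugger * 1.0 - cost_of_paying         # negative: don't pay
```

With a bound in place, no choice of promised payoff can rescue the mugger's offer, which is exactly the sense in which the paradox never gets off the ground.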

Does normalization solve the weighting problem? (I wouldn't bet on it)

One interesting note about bounded utility functions is that it suggests a solution to the weighting problem discussed above with Harsanyi's theorem; notionally, one could use boundedness to pick a canonical normalization -- e.g., choosing everyone's utility function to have infimum 0 and supremum 1. I say it suggests a solution rather than that it provides a solution, however, in that I've seen nothing to suggest that there's any reason one should actually do that other than it just seeming nice, which, well, is not really a very strong reason for this sort of thing. While I haven't thought too much about it, I'd bet someone can come up with an argument as to why this is actually a really bad idea.
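A sketch of what that canonical normalization would look like mechanically (my own toy illustration; the agents, outcomes, and numbers are all assumed): affinely rescale each agent's bounded utility to span [0, 1], then take a Harsanyi-style weighted sum.

```python
# Rescale a bounded utility function (infimum lo, supremum hi) to [0, 1].
def normalize(u, lo, hi):
    return lambda outcome: (u(outcome) - lo) / (hi - lo)

# Two agents with bounded utilities over two outcomes.
u1 = {"a": -5.0, "b": 5.0}   # infimum -5, supremum 5
u2 = {"a": 0.2, "b": 0.3}    # infimum 0.2, supremum 0.3: a "low-stakes" agent

n1 = normalize(u1.get, -5.0, 5.0)
n2 = normalize(u2.get, 0.2, 0.3)

# Equal-weight aggregation of the normalized utilities.
social = {o: 0.5 * n1(o) + 0.5 * n2(o) for o in ("a", "b")}
```

Note one immediately suspicious feature, in line with the post's skepticism: after normalization the nearly-indifferent agent's preferences count exactly as much as the high-stakes agent's, since the rescaling erases any information about how much each agent cares overall.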

(And, again, this would still leave the problem of population ethics, as well as many others, but still, in this idealized setting...)

Some (bad) arguments against boundedness

Finally, I want to take a moment here to discuss some arguments against boundedness that have come up here.

Eliezer Yudkowsky has argued against this (I can't find the particular comment at the moment, sorry) basically on the idea that total utilitarianism in a universe that can contain arbitrarily many people requires unbounded utility functions. Which I suppose it does. But, to put it simply, if your ethical assumptions contradict the mathematics, it's not the mathematics that's wrong.

That's being a bit flip, though, so let's examine this in more detail to see just where the problem is.

Eliezer would point out that the utility function is not up for grabs. To which I can only say, yes, exactly -- except that this way of formulating it is slightly less than ideal. We should say instead, preferences are not up for grabs -- utility functions merely encode these, remember. But if we're stating idealized preferences (including a moral theory), then these idealized preferences had better be consistent -- and not literally just consistent, but obeying rationality axioms to avoid stupid stuff. Which, as already discussed above, means they'll correspond to a bounded utility function. So if your moral theory is given by an unbounded utility function, then it is not, in fact, a correct description of anyone's idealized preferences, no matter how much you insist it is, because you're saying that people's idealized (not real!) preferences are, essentially, inconsistent. (I mean, unless you claim that it's not supposed to be valid for infinite gambles, in which case it can I suppose be correct within its domain of applicability, but it won't be a complete description of your theory, which will need some other mechanism to cover those cases; in particular this means your theory will no longer be utilitarian, if that was a goal of yours, and so in particular will not be total-utilitarian.)

One could question whether the rationality constraints of Savage (or VNM, or whatever) really apply to an aggregated utility function -- above I claimed this should be treated as a decision-theoretic utility function, but is this claim correct? -- but I think we have to conclude that they do for the same reason that they apply to preferences of ideal agents, i.e., they're supposed to be a consistent set of preferences; an inconsistent (or technically consistent but obviously perverse) moral system is no good. (And one could imagine, in some idealized world, one's ethical theory being programmed as the preferences of an FAI, so...)

Basically the insistence on unbounded utility functions strikes me as, really, backwards reasoning -- the sort of thing that only makes sense if one starts with the idea of maximizing expected utility (and maybe not distinguishing too strongly between the two different things called "utility functions"), rather than if one starts from agents' actual preferences and the rationality constraints these must obey. If one remembers that utility functions are merely meant to describe preferences that obey rationality constraints, there's no reason you'd ever want them to be unbounded; the math rules this out. If one reasons backwards, however, and starts with the idea of utility functions, it seems like a harmless little variant (it isn't). So, I'd like to encourage everyone reading this to beware of this sort of backwards thinking, and to remember that the primary thing is agents' preferences, and that good rationality constraints are directly interpretable in terms of these. Whereas "the agent has a decision-theoretic utility function"... what does that mean, concretely? Why are there real numbers involved, where did those come from? These are a lot of very strong assumptions to be making with little reason! Of course, there are good reasons to believe these strong-sounding claims, such as the use of real numbers specifically; but they make sense as conclusions, not assumptions.

Tangential note about other formalisms (or: I have an axe to grind, sorry)

One final tangential note: Eliezer Yudkowsky has occasionally claimed here that probability and decision-theoretic utility should be grounded not in Savage's theorem but rather in the complete class theorem (thus perhaps allowing unbounded utilities, despite the reasons above why the particular formalism shouldn't matter?), but the arguments he has presented for this do not make any sense to me and as best I can tell contain a number of claims that are simply incorrect. Like, obviously, the complete class theorem cannot provide a foundation for probability, when it already assumes a notion of probability; I may be mistaken but it looks to me like it assumes a notion of decision-theoretic utility as well; and his claims about it requiring weaker assumptions than Savage's theorem are not only wrong but likely exactly backwards. Apologies for grinding this axe here, but given how this has come up here before I thought it was necessary. Anyway, see previous discussion on this point, not going to discuss it more here. (Again, sorry for that.)


Anyway I hope this has clarified the different things meant by the term "utility function", so you can avoid getting these mixed up in the future, and if you see confused discussion of them you can come in and de-confuse the issue.

...and yes, decision-theoretic utility functions really do need to be bounded.


Running and Optimizing

Published on January 4, 2020 2:40 AM UTC

Two years ago I decided I was going to start running between my house and the subway station each day on my commute, for a total of about five miles a week. Initially I was very interested in getting faster, and would time myself a couple days a week, trying to beat my previous best.

Unfortunately, after six months I was pushing myself too hard, and my knees started hurting. I stopped timing my runs and stopped running so hard, and they got better again. I'm still running, but at a gentler pace.

This experience was a good illustration of how optimizing for a metric can often bring you towards your overall goals for a while, but then continuing to optimize on it can start to bring you away from them again.