# Новости LessWrong.com

A community blog devoted to refining the art of rationality
Updated: 2 minutes 44 seconds ago

### Long-term Short-term Happiness

May 8, 2022 - 18:53
Published on May 8, 2022 3:53 PM GMT

TLDR: I think I always act to maximize my happiness in the present. I believe I make “long-term decisions” because I get immediate happiness from the thought that I’m going to do something that will lead me to be happy in the future.

I think that all my desires reduce to a desire to be happy. For example, I want money to buy stuff and to feel financially secure. If I buy things I want and I worry less about my finances, I’ll feel happier.[1]

I could give a similar example about anything. I want to eat lunch soon because not being hungry makes me happy. I don’t want to live in North Korea because having freedom of speech and using the internet makes me happy. The point is that I don’t want anything besides my own happiness for its own sake.

That doesn’t mean I don’t care about other people. It means that none of my actions are 100% altruistic. For example, I would donate money to charity to make other people happy, which would in turn make me happy.

I want to be as happy as possible. If I can eat lunch and not live in North Korea, I’ll do both.

I also think I always act to maximize my happiness in the present.

Procrastination

I used to say that I wanted “long-term happiness.” I meant that I wanted to maximize my total happiness over the course of my entire life. But I think I was lying to myself when I said that, or at least I didn’t always feel that way. If I had, I would never have procrastinated.

If I want to be happy right now, I literally want short-term happiness. But I wouldn't tell someone I want "short-term happiness." If I heard that someone only thinks “short-term,” I’d imagine someone who doesn’t plan in advance.

I’m fickle. Sometimes working maximizes my happiness. It can feel good to believe I’m being productive. Other times relaxing maximizes my happiness. Procrastinating makes me happiest when I don’t want to work, but I feel like I should be working.

Sometimes I can’t get myself to feel happy. For example, I may not feel happy as I procrastinate. I maximize my happiness while procrastinating by telling myself, “I’ll still get this work done on time” or “It doesn’t matter if I miss this deadline.”

Experience Machine

Even when I’m feeling energized and motivated enough to work, I don’t think I want long-term happiness. The experience machine thought experiment inspired this theory.

I don’t know the exact details of the original thought experiment. I imagine the experience machine as a combo of the perfect drug and video game. It would make me infinitely happy and give me immortality. I’d enjoy the machine so much that I’d never leave, and the machine would never break.

Let’s imagine I somehow had the chance to enter the machine. If I only cared about my long-term happiness, the choice would be easy. I’d choose to use the machine and have eternal bliss.

However, I wouldn’t make that decision based on my long-term happiness. I’d think about the people that care about me. They’d be sad that I disappeared. I’d think about all the sentient beings suffering right now and everyone who could suffer in the future.

I don’t believe I’ll be the next Stanislav Petrov or Norman Borlaug. But I want to believe I’m on the path to doing the best I can to help others. Having that feeling maximizes my present happiness.

So I think I’d resist the temptation to enter the machine right now. But if I was in a bad mood, I’d enter it.

Conclusion

I tell myself I want “long-term short-term” happiness. I don’t think there’s anything special about this term. That phrasing helps me. It reminds me that I have an entire life where I’ll try to be as happy as I can in each moment. Many people may interpret “long-term happiness” the same way. But I’d taken it literally.

(cross-posted from my blog: https://utilitymonster.substack.com/p/long-term-short-term-happiness)

1. ^ I’m defining happiness as a positive emotional state (i.e., a good feeling).


### Dropping Out: A Dialectical Cost-Benefit Analysis

May 8, 2022 - 16:47
Published on May 8, 2022 1:47 PM GMT

Contra: Okay, so first of all, I should go read the relevant chapter summaries for my upcoming Calc 3 final simply because passing that class leaves me with more optionality than not. I could cite Nassim Taleb, or TurnTrout et al.'s Optimal Policies Tend To Seek Power, but the simple truth is that it's usually best to keep one's options open. Yes, I know that "usually" doesn't mean "always", but we should default to maintaining optionality as a heuristic, and only deviate from that heuristic in the face of strong evidence to the contrary. I'm enrolled in one of the top five Computer Science programs in the world, and I fought hard to get in here.

Pro: Sunk-cost fallacy spotted in the wild! Also, university rankings measure nothing but prestige.

Contra: That's the sunk-cost heuristic, thank you very much.

Contra: There's practically no chance that UIUC CS would readmit a dropout, so we should treat dropping out as an irreversible action. Dropping out would decrease optionality, which means that we should be initially prejudiced against it.

Pro: We do have strong evidence that I should drop out, even if it does reduce my optionality. The three central components of this case are my mental health, opportunity costs, and value drift.

Pro: Indeed, I may benefit from dropping out because it reduces my optionality. (I'm not stating my position on whether it does just yet.)

Pro: Sun Tzu recommends that attacking generals tear down their bridges after they cross them, so that their troops know that retreat is not an option. When Alexander the Great landed upon the Persian shore, the first thing he did was to order his men to burn their boats. In less than a year, his forces would conquer the once-mighty Achaemenid Empire. Analogously, dropping out of college would light a fire under my ass, would give me something to prove to the world. Necessity is the mother of invention, and in this field, invention is the mother of success.

Pro: A less obvious fault with your argument is that you've smuggled in an assumption; you've framed the question in such a way that implies that optionality is reduced by dropping out. I could construct an opposing frame for the same question. Here I go:

Pro: I'm only nineteen once. Why should I spend the next three years of my life slavishly signalling obedience? So that some dead-eyed drone in HR will be mildly impressed by me? My prefrontal cortex isn't done developing yet. If I don't drop out now, the vast majority of my post-pubertal brain development will have occurred while in a state of academic serfdom. My brain will mature while still in servitude, and I will never be able to truly accept that I am a free man. Conditioned to be docile, I'll develop retroactive Stockholm Syndrome once I graduate. Time washes away memories of misery, and a decade down the line, I'll lie to myself that I'm happy that I stayed in school, in between breaks at my miserable (but well-paid!) L4 engineer position at MetaGoogAmaSoft, where I'll save money so that my kids can go through the same process. Dropping out now is my best chance to get off of this path.

Pro: Evocative rhetoric aside, it is unclear that staying in college maximizes optionality. If I am to drop out, I should do it now, because every year I spend here is another $55k in sunk cost. Therefore, you can't claim with a straight face that being enrolled is the higher-optionality position because "you can always drop out later lol". While that is technically true, it ignores the accumulation of sunk costs and the mazelike all-pay-auction dynamics that college entails. College is designed as a trap, and every day I spend here, I sink deeper into it. Better to get myself out now, rather than later. Better to dig an arrow out of my abdomen than to push it all the way through. To quote Zvi:

> Being in an immoral maze is not worth it. They couldn’t pay you enough. Even if they could, they definitely don’t. Every day there is another day you suffer, another day you invest your social and human capital in ways that can’t be transferred, and another day you become infected by the maze that much more. Quit. Seriously. Go do something else. Ideally, do it today.

Contra: I think that this optionality-related line of discussion is going to end up with us obsessing over minutiae and/or arguing semantics, so we should kill it here. Since optionality is defined with respect to a world-model, one can change what optionality means by altering the world-model, or just by drawing attention to a different set of salient features within an agreed-upon world-model.

Contra: I will concede that now is the best time to drop out, if and only if dropping out is indeed the best strategy. Which it isn't. Instead of trying to formalize optionality, let's aim towards more of a straightforward cost-benefit analysis. My two-pronged case for staying in college is probably quite familiar already: credentials and connections. I am interested, though, in first hearing you flesh out your case.

Pro: Right. Like I said: mental health, opportunity costs, and value drift.
Okay, let's talk about my mental health first.

Pro: I have never in my life been more miserable than I was here at UIUC. There have been times (8th grade) when I had few friends. There have been times (post-surgery) when I was in pain for long periods of time. But never before have I experienced suicidal ideation. At this university, I did. I had suicidal thoughts despite having a higher apparent quality of life than I did before. Luckily, I used to meditate once in a while, so I knew on a gut level that just because a thought appeared in my head doesn't mean that that thought was endorsed by me. My depression mostly manifested as lethargy and anhedonia, and I don't think I was ever in any actual danger from myself. Still, it's quite alarming that I was in a psychological state conducive to the regular occurrence of self-destructive thoughts.

Pro: Since I went to an academically rigorous boarding school for four years, I know that living in a dormitory and doing hard work isn't the root cause of my mental issues. At UIUC, my classes tend to be virtual (except for my 8am Calc 3 discussion, which is really fun to walk to in the Midwestern winter), tend to be massive, and tend to consist of transparently useless signalling of both my obedience and my propensity for self-sacrificial toil. Learning the ostensible content of my courses is not too hard. For these reasons, I hold a visceral hatred for nearly all of my courses. I don't expect this tendency to stop if I stay in school.

Pro: My depressive symptoms started waning when I began to put less effort towards the task of doing well in my classes, and now that I put almost zero effort towards my classes, they're nearly absent (i.e., back to the summer-of-'21 baseline, which was moderately pleasant). If memory serves, my high score on the PHQ-9 was a 17 ("moderately severe depression"), while my current score is a 3 ("minimal or no depression").
Pro: Whenever I picture myself dropping out of college, I feel hope, excitement, and relief, none of which are conditional upon me picturing my current business venture (Poetic) taking off. Having previously worked at Introspective Systems, and having known my coworkers, I feel confident that I would be much happier doing menial webdev/data science work there (or at a similar shop) than I am doing my utterly useless and loathsome schoolwork here. Also, I probably would be doing interesting work if I went back to Introspective, so I'm giving you way too much ground there in advance.

Contra: That confident assertion is too confident; the grass is always greener on the other side of the fence, so you might hate being in industry more than you hate being in academia. Like you said, time washes away memories of misery, so the present will seem worse than the past even if misery is held constant over time.

Pro: I reject the framing of that metaphor. If there is a fence, it surrounds the university; freedom lies in the software labor market, where periodically defecting from your organization is not merely tolerated, but expected of you. Like I said, I knew my coworkers at Introspective, and they seemed mostly upbeat. In contrast, both of my closest friends here at UIUC (who are also engineering majors) have confided in me that they, too, struggle with depression. Looks like a symptom of the institution, no?

Contra: No. As an alternative explanation, there could plausibly be similarity-based selection effects at play here for who ends up in your social circles. Birds of a feather, and all. Another explanation is that the mental frames used by those you hang out with tend to seep into your mind over time. Correlation is not strong evidence for causality. Furthermore, observing a correlation is not strong evidence for any particular causal explanation.
Pro: A large part of a person's self-esteem (and therefore their well-being) is whether they feel like they are contributing to society, or are a burden upon it. At UIUC, I know that I'm burning through my family's cash at a rate of over $55k/year in order to get some dumb credential. Which makes me feel bad. (Also, I'm spending the taxpayers' money as well, but let's not go there.)

Pro: Even if I was just doing menial webdev work at $30/hour (and again, I would likely be doing something better!), I would know that my employer values the output of my 8-hour workday at at least $240, because they wouldn't have made that trade otherwise. That implies that there's eventually someone who values the product of my day's work at at least $240, because they otherwise wouldn't have used my employer's product or service. This way, even a lower-status industrial position ensures a lower bound on my contribution to society as a whole, which is something I can feel good about.

Contra: That's not really how the world works. If you're developing an addictive mobile game, you're not exactly doing the end-user a favor, but you still would get paid. Why don't you just rely on some other metric to measure your self-worth?

Pro: Because I can't! If people in my ancestral environment could simply trick themselves into thinking that they were high-status, they would have violated the norms of their tribe too often, they would have been shunned, and they would not have reproduced as much. I'm designed to want society to value me, my work, and my time, and money is the unit of societal value. Therefore, getting paid makes me feel good, and so does providing value to society via impactful altruistic projects.

Contra: Your earlier "miserable L4 engineer" line doesn't gel with this "I like money" point. A CS degree would meaningfully increase my power to contribute to society, because I would be more likely to get hired and internally promoted at MetaGoogAmaSoft. Therefore, I shouldn't drop out.

Pro: I plan to counter this line of objection in greater detail once we get to Opportunity Costs, and also Value Drift. Basically, though, doing direct work is way better than earning to give, and it's also more fun. One of the reasons it's more fun is that MetaGoogAmaSoft is also an Immoral Maze.
Pro: Whenever I envision myself completing my degree, I see nothing but three more years of slogging away at meaningless tasks so that I can eventually demonstrate to a sclerotic bureaucracy (again) that I am neither lazy, nor insolent, nor stupid. I didn't exactly run a blind randomized trial on myself, but these observations, along with the trajectory of my depressive symptoms, do constitute substantial evidence that I would be significantly happier outside of this institution than I am within it.

Contra: Okay, my turn to appeal to emotion. You have a duty to complete your degree. You owe it to your family (which includes not only your parents, your little brother, and your little sister, but also people who don't exist yet). A degree serves as a mark of prestige, and acts as a hedge against a bad labor market.

Contra: Let's imagine a pessimistic scenario:

Contra: You drop out, Poetic fails to gain market traction, and OpenAI Codex 2: The Recodexing immediately disgorges a virtually-free mass of mediocre synthetic labor into the market. Junior SWE jobs are eliminated, and anyone not already embedded in MetaGoogAmaSoft is effectively shut out. Instead of having to make friends with nerds, coked-up Wharton "econ students" can simply speak with a friendly customer service lady at MetaGoogAmaSoft, who will develop their shitty CRUD app idea for a reasonable fee and a 10% equity stake. ("Okay, so our elevator pitch, our app, is like, uhh, it's Tinder for kids. We call it Kindr!") The S&P goes wild, but the software labor market is reduced to the state of, well, the state of every other sector of the labor market. The magic is gone. We now live in Stephenson's Snow Crash, except instead of Ethereum, we get CBDCs. If you had the UIUC CS credential, you could plausibly swing a job at a defense contractor, but you don't, and these contractors are very conservative about whom to hire, you see.

Pro: It's not obvious to me that your scenario is internally consistent.
To say that a rising tide lifts all boats is cliché, but I find it hard to believe that you couldn't find a job working as a "Transformer Interpretability Engineer" or something. Additionally, your "OpenAI Codex 2: The Recodexing" narrative has significant implications for AGI timelines. Also, Poetic's core product is basically a fine-tuned large language model in gift wrap, so by all rights, we sho–

Contra: Oh God, please, not this Singularity bullshit again!

Pro: Why not?! You read Superintelligence. You've lurked for years on LessWrong. You know all the arguments quite well, the arguments for looming existential risk, the arguments for the overwhelming imperative to figure out alignment ASAP. You know how defensible they are. You know how many geniuses have tried to rebut them, and failed miserably.

Contra: Actually, I am nursing a few criticisms of the standard arguments. For instance, I brought up Optimal Policies Tend To Seek Power earlier, and that train of thought faltered because the definition of optionality (a similar idea to "Power" as formalized in the paper) is wholly dependent upon the representation scheme of an agent's world-model, so maybe the theoretical basis of the "a treacherous turn is an instrumentally convergent goal" argument isn't as firm as people originally thought.

Contra: But none of that fucking matters, because your conclusions are so absurd that there must be a fault in your logic, even if the identity of that fault isn't readily apparent. Do you know how batshit insane you would sound, telling your normal friends what you think about x-risk from AGI? It's like a proof that pi equals four, or a blueprint for a perpetual motion machine. You don't have to actually find a flaw in the convoluted sequence of arguments to know that the conclusion they support is wrong.
To a first approximation, Yann LeCun is right about everything, and AGI (if it even arrives during my lifetime) will be "aligned" via trial and error, just like how everything else in ML gets done nowadays.

Pro: Your statements reek of motivated reasoning. You'd rather construct a comforting, semi-plausible fiction for yourself to huddle under than open your eyes to the harsh truth. On this issue, you are in a state of epistemic learned helplessness, and are therefore literally impossible to reason with.

Contra: We're at an impasse, then. I propose a compromise. We both agree that mitigating existential risk from unaligned AI is a problem that people should definitely be researching. We both agree that I should dedicate a small fraction (like, a tenth) of my resources/career capital towards solving alignment, but also that I should not drink the entire pitcher of Yudkowsky's Kool-Aid.

Pro: Well, not yet. I might get compelling evidence to revise this position if, for instance, a slow-takeoff scenario comes to pass and, say, multimodal transformers exhibit far more agentlike behavior than I currently expect.

Contra: Lmao, sure. Good luck with that.

Pro: Ugh, fine then. Deal.

Contra: Excellent. What should we discuss next?

I'm going to end the post there for now, because it was getting a bit long, especially without any headings to lend it high-level structure and render it navigable. Of course, Pro and Contra aren't done with their debate, so the course of their future dialogue is open to being influenced by the comment section. In fact, both Contra and Pro would love it if you could do them a solid and poke a few holes in the other guy's logic.

### Elementary Infra-Bayesianism

May 8, 2022 - 15:23
Published on May 8, 2022 12:23 PM GMT

TL;DR: I got nerd-sniped into working through some rather technical work in AI Safety. Here's my best guess of what is going on: imprecise probabilities for handling catastrophic downside risk.
Short summary: I apply the updating equation from Infra-Bayesianism to a concrete example of an infradistribution and illustrate the process. When we "care" a lot about things that are unlikely given what we've observed before, we get updates that are extremely sensitive to outliers.

I've written previously on how to act when confronted with something smarter than yourself. When in such a precarious situation, it is difficult to trust the other; they might dispense their wisdom in a way that steers you to their benefit. In general, we're screwed. But there are ideas for a constrained set-up that forces the other to explain itself and point out potential flaws in its arguments. We might try to leverage the other's ingenuity against itself by slowing down its reasoning to our pace. The other would no longer be an oracle with prophecies that might or might not kill us, but instead a teacher who lets us see things we otherwise couldn't. While that idea is nice, there is a severe flaw at its core: obfuscation. By making the argument sufficiently long and complicated, the other can sneak a false conclusion past our defenses. Forcing the other to lay out its reasoning, thus, is not a foolproof solution. But (as some have argued), it's unclear whether this will be a problem in practice. Why am I bringing this up? No reason in particular.

Why Infra-Bayesianism?

Engaging with the work of Vanessa Kosoy is a rite of passage in the AI Safety space. Why is that?

• The pessimist answer is that alignment is really, really difficult, and if you can't understand complicated math, you can't contribute.
• The optimist take is that math is fun, and (a certain type of) person gets nerd-sniped by this kind of thing.
• The realist take naturally falls somewhere in between. Complicated math can be important and enjoyable. It's okay to have fun with it. But being complicated is (in itself) not a mark of quality.

I truly believe that if you can't explain it, you don't understand it.
So here goes my attempt at "Elementary Infrabayesianism", where I motivate the structure behind Infrabayesianism using pretty pictures and high school mathematics[1]. Did I succeed? Ahaha, no, I only cover the content of one of the relevant posts. But this post has been sitting in my drafts for too long, so here we go.

Uncertain updates

Imagine it's late in the night, the lights are off, and you are trying to find your smartphone. You cannot turn on the lights, and you are having a bit of trouble seeing properly[2]. You have a vague sense of where your smartphone should be (your prior, panel a). Then you see a red blinking light from your smartphone (sensory evidence, panel b). Since your brain is really good at this type of thing, you integrate the sensory evidence with your prior optimally (despite your disinhibited state) to obtain an improved sense of where your smartphone might be (posterior, panel c). That's just boring old Bayes, nothing to see here, move along:

$$P(S \mid E) = \frac{P(E \mid S)\,P(S)}{P(E)}$$
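This kind of optimal cue integration has a particularly clean closed form if we assume (as is standard in such textbook examples, though the post's figure doesn't pin this down) that both the prior and the sensory evidence are Gaussian: the posterior is again Gaussian, precisions (inverse variances) add, and the posterior mean is a precision-weighted average. A minimal sketch, with all numbers invented for illustration:

```python
# Bayesian cue integration with a Gaussian prior and Gaussian likelihood.
# Precisions (inverse variances) add; the posterior mean is the
# precision-weighted average of the prior mean and the observation.

def gaussian_posterior(prior_mean, prior_var, obs, obs_var):
    """Return (mean, var) of the Gaussian posterior P(S | E)."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs)
    return post_mean, post_var

# Vague prior: phone somewhere around x = 0 (variance 4).
# Sharper evidence: blinking light seen near x = 2 (variance 1).
mean, var = gaussian_posterior(0.0, 4.0, 2.0, 1.0)
print(mean, var)  # posterior mean 1.6, variance 0.8
```

Because the evidence is more precise than the prior, the posterior lands most of the way toward the observed blink while shrinking the uncertainty, which is the panel-(c) behavior described above.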
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face 
Now let's say you are even more uncertain about where you put your smartphone.[3] It might be at one end of the room or the other (bimodal prior, panel a). You see a blinking light further to the right (sensory evidence, panel b), so your overall belief shifts to the right (bimodal posterior, panel c). Importantly, because probability mass is conserved, your belief that the phone might be at the left end of the room is reduced. The absence of evidence is evidence of absence. This is still only boring old Bayes. To go Infra, we have to go weird.

Fundamentally uncertain updates

Let's say you are really, fundamentally unsure about where you put your phone. If someone put a gun to your head and threatened to sign you up for sweaters for kittens unless you gave them your best guess, you could not.[4] This is the situation Vanessa Kosoy finds herself in[5].[6] With Infra-Bayesianism, she proposes a theoretical framework for thinking in situations where you can't (or don't want to) specify a prior over your hypotheses. Because she is a mathematician, she uses the proper terminology for this:

• a signed measure is a generalization of a probability distribution,
• an indicator function for a fuzzy set is a generalization of your observation/sensory evidence,
• a continuous function g ∈ C(X, [0,1]) is... wait, what is g?
g tells you how much you care about stuff that happens in regions that become very unlikely/impossible given the sensory evidence you obtain. Why should you care about that, you ask? Great question; let's just not care about it for now. Let's set it equal to zero, g = 0. When g = 0, the updating equation for our two priors, P1 and P2, becomes very familiar indeed:

P1(S|E) = P1(E|S) P1(S) / P+(E)
P2(S|E) = P2(E|S) P2(S) / P+(E)

This is basically Bayes' theorem applied to each prior separately. The evidence term (the denominator) is still computed in a wonky way[7], but this doesn't make much difference, since it's a shared scaling factor. Consistently, things also look very normal when using this updating rule to integrate sensory information. We shift our two priors towards the evidence and scale them in proportion to how unlikely each said the evidence was. While this picture looks almost identical to the previous section, notice that the prior is still split in two! Thus, we can still tell which of our initial guesses turned out to be "more accurate".

Fundamentally dangerous updates

Alright, you know where this is going. We will have to start caring about things that become less likely after observing the evidence. Why we have to care is a bit hard to motivate; Vanessa Kosoy and Diffractor motivate it in three parts, of which I don't even get the first[8].[9] Instead, I will motivate why you might care about things that seem very unlikely given your evidence by revealing more information about the thought experiment: it's not so much that you can't give your best guess about where you put your smartphone. Rather, you dare not. Getting this wrong would be, like, really bad. You might be unsure whether it's even your phone that's blinking, or whether it's the phone of the other person sleeping in the room[10]. Or perhaps the bright red light you see is the bulbous red nose of somebody else sleeping in the room. Getting the location of your smartphone wrong would be messy.
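To make the g = 0 rule above concrete, here is a toy numerical sketch. Everything in it (the grid, the Gaussian shapes, all parameters) is my own illustrative choice rather than anything from Kosoy's formalism; the shared evidence term uses the min-over-priors expression from footnote 7.

```python
import numpy as np

# Toy sketch of the g = 0 two-prior update. All parameters are made up
# for illustration; this is not a proper infradistribution.
x = np.linspace(0, 10, 1001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two hypotheses about the phone's location: left end vs right end of the room.
p1 = gaussian(x, mu=2.0, sigma=1.0)   # prior 1: "it's on the left"
p2 = gaussian(x, mu=8.0, sigma=1.0)   # prior 2: "it's on the right"

# Sensory evidence: a blinking light toward the right.
likelihood = gaussian(x, mu=7.0, sigma=1.5)

# Shared "wonky" evidence term from footnote 7: the minimum over the two
# priors of the expected likelihood, rather than their average.
evidence = min(np.sum(likelihood * p1) * dx, np.sum(likelihood * p2) * dx)

# g = 0 update: Bayes' numerator for each prior, shared denominator.
post1 = likelihood * p1 / evidence
post2 = likelihood * p2 / evidence

# Both updated "distributions" shift toward the evidence, but the
# right-end hypothesis, which explains the light better, keeps far more
# mass relative to the left-end one.
mass1 = np.sum(post1) * dx
mass2 = np.sum(post2) * dx
print(f"relative mass, left vs right hypothesis: {mass1:.2f} vs {mass2:.2f}")
```

Because the denominator is shared, only the relative masses matter: the poorly-fitting "left" hypothesis is scaled down relative to the "right" one, which is exactly the boring-old-Bayes behavior described above, just with the prior kept split in two.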
Better not risk it. We'll set g = 1. The update rule doesn't change too much at first glance:

P1(S|E) = P1(E|S) P1(S) / P−(E) + ϰ1
P2(S|E) = P2(E|S) P2(S) / P−(E) + ϰ2

Again, the denominator changes from one wonky thing (P+) to another wonky thing (P−);[11] but that still doesn't matter, since it's the same for both equations. And, of course, there is now a ϰ that showed up out of nowhere. ϰ is a variable that tells us how good our distribution is at explaining things that we did not get any evidence for[12]. Intuitively, you can tell that this will favor the prior distribution that was previously punished for not explaining the observation. And indeed, when we run the simulation, one of the two "distributions"[13] takes off! Even though the corresponding prior was bad at explaining the observation, the update still strongly increases the mass associated with that hypothesis. Intuitively, this translates into something like: you are unsure about the location of your smartphone (and mortally afraid of getting it wrong). You follow the red blinking light, but you never discard your alternative hypothesis that the smartphone might be at the other end of the room. At the slightest indication that something is off, you'll discard all the information you have collected and start the search from scratch. This is a very cautious strategy, and it might be appropriate in dangerous domains with the potential for catastrophic outliers, basically what Nassim Taleb calls Black Swan events. I'm not sure how productive this strategy is, though; noise might dramatically mess up your updates at some point.

Closing thoughts

This concludes the introduction to Elementary Infrabayesianism. I realize that I have only scratched the surface of what's in the sequence, and there is more coming out every other month, but letting yourself get nerd-sniped is just about as important as being able to stop working on something and publish.
I hope what I wrote here is helpful to some, in particular in conjunction with the other explanations on the topic (1 2 3), which go a bit further than I do in this post. I'm afraid that at this point I'm obliged to add a hot take on what all of this means for AI Safety. I'm not sure. I can tell myself a story about how being very careful about how quickly you discard alternative hypotheses/narrow down the hypothesis space is important. I can also see the outline of how this framework ties in with fancy decision theory. But I still feel like I have only scratched the surface of what's there. I'd really like to get a better grasp of that Nirvana trick, but timelines are short and there is a lot out there to explore.

1. ^ French high school, though, not American high school.
2. ^ If there's been alcohol involved, I want to know nothing of it.
3. ^ The idea that alcohol might have been involved in navigating you into this situation is getting harder to deny.
4. ^ Is this ever a reasonable assumption? I don't know. It seems to me you can always just pick an uninformative prior. But perhaps the point is that sometimes you should acknowledge your cluelessness, since otherwise you expose yourself to severe downside risks? I'm not convinced, though.
5. ^ Not the coming-home-drunk situation, only the fundamentally confused part. Oh no, that came out wrong. What I mean is that she is trying to become less fundamentally confused. Urgh. I'll just stop digging now.
6. ^ A proper infradistribution would have to be a convex set of distributions, upper complete, and everything. Also, the support of the Gaussians would have to be compact. But for the example I'm constructing this won't become relevant; the edge points (the two Gaussians) of the convex set fully characterize how the entire convex set changes.
7. ^ P_H^g(L) = E_H(L) = min_{p ∈ {p1, p2}} ∫_R L(x) p(x) dx, rather than ∫_R ((p1(x) + p2(x))/2) L(x) dx for an uninformative prior.
8. ^ Despite having read it at least twice!
9.
^ A more "natural" way to motivate it might be to talk about possible worlds and updateless decision theory, but this is something that you apparently get out of Infrabayesianism, so we don't want to use it to motivate it.
10. ^ The story is coming together. This is why you can't turn on the light, btw.
11. ^ Actually, in this particular example, it turns out that P+ = P−: P_H^g(L) = E_H(1) − E_H(1 − L) = 1 − min_{p ∈ {p1, p2}} ∫_R (1 − L(x)) p(x) dx = min_{p ∈ {p1, p2}} ∫_R L(x) p(x) dx, since we've got two normalized probability distributions.
12. ^ You can't find any ϰ in Vanessa Kosoy's paper because she is thinking more generally about Banach spaces, and also about situations where there is no Radon-Nikodym derivative. But if we have a density for our measures, we can write ϰ via ∫_X ϰ dm = b for an inframeasure (m, b). Also, you can find ϰ basically nowhere else because almost nobody uses it!
13. ^ I'm still calling them distributions, although we already left that territory in the last section. More appropriate would be something like "density function of the signed measure" or "Radon-Nikodym derivative".

Discuss

### Cambridge LW Meetup: Books That Change

May 8, 2022 - 08:23
Published on May 8, 2022 5:23 AM GMT

Everyone's read some book that really touched them. Perhaps it can touch others too. Read anything lately that taught you some important skill? That explained something you always struggled with? That changed your view of the world in some deep way? If you have such a book and think it can help other aspiring rationalists, come bring it along and tell others about it. Snacks will be provided, and there will be an informal dinner afterwards.

Discuss

### Video and Transcript of Presentation on Existential Risk from Power-Seeking AI

May 8, 2022 - 06:50
Published on May 8, 2022 3:50 AM GMT

In March 2022, I gave a presentation about existential risk from power-seeking AI, as part of a lecture series hosted by Harvard Effective Altruism. The presentation summarized my report on the topic.
With permission from the organizers, I'm posting the video here, along with the transcript (lightly edited for clarity/concision) and the slides.

Main Talk

Thanks for having me, nice to be here, and thanks to everyone for coming. I'm Joe Carlsmith, I work at Open Philanthropy, and I'm going to be talking about the basic case, as I see it, for getting worried about existential risk from artificial intelligence, where existential risk just refers to a risk of an event that could destroy the entire future and all of the potential for what the human species might do.

Plan

I'm going to discuss that basic case in two stages. First, I'm going to talk about what I see as the high-level backdrop picture that informs the more detailed arguments about this topic, and which structures and gives intuition for why one might get worried. And then I'm going to go into a more precise and detailed presentation of the argument as I see it -- one that hopefully makes it easier to really pin down which claims are doing what work, where I might disagree, and where we could make more progress in understanding this issue. And then we'll do some Q&A at the end. I understand people in the audience might have different levels of exposure to and understanding of these issues already. I'm going to be trying to go at a fairly from-scratch level. And I should say: basically everything I'm going to be saying here is also in a report I wrote last year, in one form or another. I think that's linked on the lecture series website, and it's also on my website, josephcarlsmith.com. So if there's stuff we don't get to, or stuff you want to learn more about, I'd encourage you to check out that report, which has a lot more detail. And in general, there's going to be a decent amount of material here, so some of it I'm going to be mentioning as a thing we could talk more about. I won't always get into the nitty-gritty on everything I touch on.
High-level backdrop

Okay, so the high-level backdrop here, as I see it, consists of two central claims.

• The first is just that intelligent agency is an extremely powerful force for transforming the world on purpose.
• And because of that, claim two, creating agents who are far more intelligent than us is playing with fire. This is just a project to be approached with extreme caution.

And so to give some intuition for this, consider the following pictures of stuff that humanity as a species has done: here's the City of Tokyo, the Large Hadron Collider, a big mine, a man on the moon. And I think there's a way of just stepping back and looking at this and going "this is an unusual thing for a species on the planet earth to do." In some sense, this is an unprecedented scale and sophistication of intentional control over our environment. So humans are strange. We have a kind of "oomph" that gives rise to cities like this. And in particular, these sorts of things are things we've done on purpose. We were trying to do something, we had some goal, and these transformative impacts are the result of us pursuing our goals. And also, our pursuit of our goals is structured and made powerful by some sort of cognitive sophistication, some kind of intelligence that plausibly gives us the type of "oomph" I'm gesturing at. I'm using "oomph" as an intentionally vague term, because I think there's a cluster of mental abilities that also interact with culture and technology; there are various factors that explain exactly what makes humanity such a potent force in the world; but very plausibly, our minds have something centrally to do with it. And also very plausibly, whatever it is about our minds that gives us this "oomph" is something that we haven't reached the limit of. There are possible biological systems, and also possible artificial systems, that would have more of these mental abilities that give rise to humanity's potency.
And so the basic thought here is that at some point in humanity's history, and plausibly relatively soon -- plausibly, in fact, within our lifetimes -- we are going to transition into a scenario where we are able to build creatures that are both agents (they have goals and are trying to do things) and also more intelligent than we are, and maybe vastly more. And the basic thought here is that that is something to be approached with extreme caution. That that is a sort of invention of a new species on this planet, a species that is smarter, and therefore possibly quite a bit more powerful, than we are. And that that's something that could just run away from us and get out of our control. It's something that's reasonable to expect dramatic results from, whether it goes well or badly, and it's something that, if it goes badly, could go very badly. That's the basic intuition. I haven't made that very precise, and there are a lot of questions we can ask about it, but I think it's importantly in the backdrop as a basic orientation here.

Focus on power-seeking

I'm going to focus, in particular, on a way of cashing out that worry that has to do with the notion of power, and power-seeking. And the key hypothesis structuring that worry is that suitably capable and strategic AI agents will have instrumental incentives to gain and maintain power, since this will help them pursue their objectives more effectively. And so the basic idea is: if you're trying to do something, then power in the world -- resources, control over your environment -- is just a very generically useful type of thing, and you'll see that usefulness, and be responsive to that usefulness, if you're suitably capable and aware of what's going on. And so the worry is: if we invent or create suitably capable and strategic AI agents, and their objectives are in some sense problematic, then we're facing a unique form of threat.
A form of threat that's distinct, and I think importantly distinct, from more passive technological problems. For example, if you think about something like a plane crash, or even something as extreme as a nuclear meltdown: this is bad stuff, it results in damage, but the problem always remains, in a sense, passive. The plane crashes and then it sits there, having crashed. Nuclear contamination spreads, it's bad, it's hard to clean up, but it's never trying to spread, and it's not trying to stop you from cleaning it up. And it's certainly not doing that with a level of cognitive sophistication that exceeds your own. But if we had AI systems that went wrong in the way that I'm imagining here, then they would be actively optimizing against our efforts to take care of the problem and address it. And that's a uniquely worrying situation, and one that we have basically never faced.

Basic structure of the more detailed argument

That's the high-level backdrop here. And now what I'm going to try to do is to make that worry more precise, and to go through specific claims that I think go into a full argument for an existential catastrophe from this mechanism. I'm going to structure that argument as follows, in six stages. The first is a claim about timelines: when it will become possible to build relevantly dangerous AI systems. There's a lot we can say about that. I think Holden talked about it a bit last week, and some other people in the speaker series are going to be talking about it too. Ajeya Cotra at Open Philanthropy, who I think is coming, has done a lot of work on this. I'm not going to talk a lot about that, but I think it's a really important question. I think it's very plausible, basically, that we get systems of the relevant capability within our lifetimes. I use the threshold of 2070 to remind you that these are claims you will live to see falsified or confirmed.
Probably, unless something bad happens, you will live to see: were the worries about AI right? Were we even going to get systems like this at all or not, at least soon? And I think it's plausible that we will, that this is an "our lifetime" issue. But I'm not going to focus very much on that here. I'm going to focus more on the next premises:

• First, there's the thought that there are going to be strong incentives to build these sorts of systems, once we can, and I'll talk about that.
• The next thought is that once we're building these systems, it's going to be hard, in some sense, to get their objectives right, and to prevent the type of power-seeking behavior that I was worried about earlier.
• Then the fourth claim is that we will, in fact, deploy misaligned systems -- systems with problematic objectives, that are pursuing power in these worrying ways -- and they will have high-impact failures, failures at a serious level.
• And then fifth, those failures will scale to the point of the full disempowerment of humanity as a species. That's an extra step.
• And then finally, there's a sixth premise, which I'm not going to talk about that much, but which is an important additional thought: that this itself is a drastic reduction in the expected value of the future. This is a catastrophe on a profound scale.

Those are the six stages, and I'm going to talk through each of them a bit, except the timelines one.

Three key properties

Let me talk a little bit about what I mean by relevantly dangerous: what's the type of system that we're worried about here? I'm going to focus on three key properties of these systems.

• The first is advanced capability, which is basically just a way of saying they're powerful enough to be dangerous if they go wrong. I operationalize that as: they outperform the best humans on some set of tasks which, when performed at advanced levels, grant significant power in today's world.
So that's stuff like science or persuasion, economic activity, technological development, stuff like that. Stuff that yields a lot of power.

• And then the other two properties are properties that I see as necessary for getting this worry about alignment -- and particularly, the instrumental incentives for power-seeking -- off the ground. Basically, the second property -- agentic planning -- says that these systems are agents: they're pursuing objectives and they're making plans for doing that.
• And then three, strategic awareness, says that they understand the world well enough to notice and to model the effects of seeking power on their objectives, and to respond to incentives to seek power if those incentives in fact arise.

And so these three properties together I call "APS", for advanced, planning, strategically-aware systems. And those are the type of systems I'm going to focus on throughout. (Occasionally I will drop the APS label, and so just generally, if I talk about AI systems going wrong, I'm talking about APS systems.)

Incentives

So suppose we can build these systems; suppose the timelines thing becomes true. Will we have incentives to do so? I think it's very likely we'll have incentives to build systems with advanced capabilities in some sense. And so I'm mostly interested in whether there are incentives to build these relevantly agentic systems -- systems that are in some sense trying to do things and modeling the world in these sophisticated ways. It's possible that AI doesn't look like that, that we don't have systems that are agents in the relevant sense. But I think we probably will, and that there'll be strong incentives to build them, for basically three reasons.

• The first is that agentic and strategically aware systems seem very useful to me. There are a lot of things we want AI systems to do: run our companies, help us design policy, serve as personal assistants, do long-term reasoning for us.
All of these things very plausibly benefit a lot from both being able to pursue goals and make plans, and also from having a very rich understanding of the world.

• The second reason to suspect there'll be incentives here is just that available techniques for developing AI systems might put agency and strategic awareness on the most efficient development pathway. An example of that might be: maybe the easiest way to train AI systems to be smart is to expose them to a lot of data from the internet or text corpora or something like that. And that seems like it might lead very naturally to a rich, sophisticated understanding of the world.
• And then three, I think it's possible that some of these properties will just emerge as byproducts of optimizing a system to do something. This is a more speculative consideration, but I think it's possible that if you have a giant neural network and you train it really hard to do something, it just sort of becomes something more like an agent and develops knowledge of the world naturally. Even if you're not trying to get it to do that, and indeed, maybe even if you're trying to get it not to do that. That's more speculative.

Of these, I basically put the most weight on the first. I think the usefulness point is a reason to suspect we'll be actively trying to build systems that have these properties, and so I think we probably will.

Alignment

Definitions

Suppose we can build these systems, and there are incentives to build these systems. Now let's talk about whether it will be hard to make sure that they're aligned, or to make sure that their objectives are in some sense relevantly innocuous from the perspective of these worries about power-seeking. I'm going to give three definitions here that I think will be helpful.

• The first is the definition of misaligned behavior. I'm defining that as unintended behavior that arises in virtue of problems with a system's objectives.
This can get fuzzy at times, but the basic thought here is that certain failures of a system look like it's breaking, or failing in a non-competent way, while certain other failures look like it's doing something competent, doing something well, but not the thing you wanted it to do. For example, if you have an employee, and the employee gives a bad presentation, that was a failure of competence. If the employee steals your money in some really smart way, that's a misalignment of your employee: your employee is trying to do the wrong thing. And so that's the type of misaligned behavior I'm talking about.

• Misaligned power-seeking is just misaligned behavior that involves power-seeking.
• And then practically PS-aligned is the key alignment property I'm interested in, which is basically that a system doesn't engage in misaligned power-seeking on any of the inputs that it is in fact exposed to.

Importantly, it doesn't need to be the case that a system would never, in any circumstances, or at any level of capability, engage in some problematic form of power-seeking. It's just that in the actual circumstances that the system actually gets used in, it's not allowed to do that. That's what it takes to be practically PS-aligned on the definition I'm going to use.

Instrumental convergence

Those are a few definitions to get us started. Now let's talk about why we might think it's hard to prevent misaligned power-seeking of this type. I mentioned this hypothesis, often called the instrumental convergence hypothesis, and here's how I understand it: misaligned behavior on some inputs strongly suggests misaligned power-seeking on those inputs too. The idea is that there's actually a very close connection between any form of misalignment -- any form of problematic objectives being pursued by systems that are relevantly strategic -- and misaligned power-seeking in particular.
And the connection is basically from this fact that misaligned behavior is in pursuit of problematic objectives, and power is useful for lots of objectives. There's lots we can query about this thesis. I think it's an important, central piece of the story here, and there are questions we can ask about it. But for now I'm just going to flag it and move on.

The difficulty of practical PS-alignment

So suppose we're like: okay, yeah, power-seeking could happen. Why don't we just make sure it doesn't? And in particular, why don't we just make sure that these systems have innocuous values, or are pursuing objectives that we're okay seeing pursued? Basically, how to do that: I think there are two main steps, if you want to ensure practical PS-alignment of the relevant kind. • First, you need to cause the APS system to be such that the objectives it pursues on some set of inputs, X, do not give rise to misaligned power-seeking. • And then you need to restrict the inputs it receives to that set of inputs. And these trade off against each other. If you make it a very wide set of inputs where the system acts fine, then you don't need to control the inputs it receives very much. If you make it so it only plays nice on a narrow set of inputs, then you need to exert a lot more control at step two. And in general, I think there are maybe three key types of levers you can exert here. One is: you can influence the system's objectives. Two, you can influence its capabilities. And three, you can influence the circumstances it's exposed to. And I'll go through each of those in turn.

Controlling objectives

Let's talk about objectives. Objectives are where most of the discourse about AI alignment focuses. The idea is: how do we make sure we can exert the relevant type of control over the objectives of the systems we create, so that their pursuit of those objectives doesn't involve this type of power-seeking? Here are two general challenges with that project.
The first is a worry about proxies. It's basically a form of what's known as Goodhart's Law: if you have something that you want, and then you have a proxy for that thing, then if you optimize very hard for the proxy, often that optimization breaks the correlation that made the proxy a good proxy in the first place. An example here that I think is salient in the context of AI is something like the proxy of human approval. It's very plausible we will be training AI systems via forms of human feedback, where we say: 10 out of 10 for that behavior. And that's a decent proxy for a behavior that I actually want, at least in our current circumstances. If I give 10 out of 10 to someone for something they did, then that's plausibly a good indicator that I actually like, or would like, what they did, if I understood it. But if we then have a system that's much more powerful than I am and much more cognitively sophisticated, and it's optimizing specifically for getting me to give it a 10 out of 10, then it's less clear that that correlation will be preserved. And in particular, the system now may be able to deceive me about what it's doing; it may be able to manipulate me; it may be able to change my preferences; in an extreme case, it may be able to seize my arm and force me to press the 10-out-of-10 button. That's the type of thing we have in mind here. And there are a lot of other examples of this problem, including in the context of AI, and I go through a few in the report. The second problem I'm calling a problem with search. And this is a problem that arises with a specific class of techniques for developing AI systems, where, basically, you set some criteria, which you take to be operationalizing or capturing the objectives you want the systems to pursue, and then you search over and select for an agent that performs well according to those criteria.
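The proxy worry can be made concrete with a toy simulation (the noise model and all the numbers here are invented purely for illustration): each candidate behavior has a true value, the proxy is that value plus noise, and we keep whichever candidate scores best on the proxy. The harder we select, the more the winning proxy score overshoots the true value underneath it.

```python
import random

random.seed(0)

def select_by_proxy(n_candidates):
    """Pick the candidate with the best *proxy* score out of n_candidates.

    true value  ~ N(0, 1)        (what we actually want)
    proxy score = true + N(0, 1) (e.g. noisy human approval)
    """
    true_vals = [random.gauss(0, 1) for _ in range(n_candidates)]
    proxies = [t + random.gauss(0, 1) for t in true_vals]
    best = max(range(n_candidates), key=lambda i: proxies[i])
    return true_vals[best], proxies[best]

def average_gap(n_candidates, trials=2000):
    """Average amount by which the winning proxy score overstates true value."""
    gaps = [p - t for t, p in (select_by_proxy(n_candidates) for _ in range(trials))]
    return sum(gaps) / len(gaps)

for n in (2, 10, 1000):
    print(f"best of {n:>4}: proxy overshoots truth by {average_gap(n):.2f} on average")
```

More selection pressure (a larger candidate pool) means a larger overshoot: the optimization itself degrades the correlation that made the proxy informative in the first place.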
But the issue is that performing well according to the criteria that you want doesn't mean that the system is actively pursuing good performance on those criteria as its goal. So a classic example here, though it's a somewhat complicated one, is evolution and humans. If you think of evolution as a process of selecting for agents that pass on their genes, you can imagine a giant AI designer, running an evolutionary process similar to the one that gave rise to humans, who is like: "I want a system that is trying to pass on its genes." And so you select over systems that pass on their genes, but then what do you actually get out at the end? Well, you get humans, who don't intrinsically value passing on their genes. Instead, we value a variety of other proxy goals that were correlated with passing on our genes throughout our evolutionary history. So things like sex, and food, and status, and stuff like that. But now here are the humans, but they're not optimizing for passing on their genes. They're wearing condoms, and they're going to the moon, they're doing all sorts of wacky stuff that you didn't anticipate. And so there's a general worry that that sort of issue will arise in the context of various processes for training AI systems that are oriented towards controlling their objectives. These are two very broad problems. I talk about them more in the report. And they are reasons for pessimism, or to wonder, about how well we'll be able to just take whatever values we want, or whatever objectives we want an AI system to pursue, and just "put them in there." The "putting them in there" is challenging at a number of different levels.

Other options

That said, we have other tools in the toolbox for ensuring practical PS-alignment, and I'll go through a few here. One is: we can try to shape the objectives in less fine-grained ways.
So instead of saying, ah, we know exactly what we want and we'll give it to the systems, we can try to ensure higher-level properties of these objectives. So two that I think are especially nice: you could try to ensure that the system's objectives are always in some sense myopic, or limited in their temporal horizon. So an episodic system, a system that only cares about what happens in the next five minutes, that sort of stuff. Systems of that kind seem, for various reasons, less dangerous. Unfortunately, they also seem less useful, especially for long-term tasks, so there's a trade-off there. Similarly, you can try to ensure that systems are always honest and they always tell you the truth, even if they're otherwise problematic. I think that would be a great property if you could ensure it, but that also may be challenging. There are some options here that are less demanding, in terms of the properties you're trying to ensure about the objectives, but that have their own problems. And then you can try to exert control over other aspects of the system. You can try to control its capabilities by making sure it's specialized, or not very able to do a ton of things in a ton of different domains. You can try to prevent it from enhancing its capabilities; I think that's a very key one. A number of the classic worries about AI involve the AI improving itself, getting into its own head, "editing its own source code," or in a context of machine learning, maybe running a new training process or something like that. That's a dicey thing. Generally, if you can prevent that, I think you should. It should be your choice whether a system's capabilities scale up. And then number three is: you can try to control the options and incentives that the system has available. You can try to put it in some environment where it has only a limited range of actions available. You can try to monitor its behavior, you can reward it for good behavior, stuff like that.
So there are a lot of tools in the toolbox here. All of them, I think, seem like they might have useful applications. But they also seem problematic and dicey, in my opinion, to rely on, especially as we scale up the capabilities of the systems we're talking about. And I go through reasons for that in the report.

What's unusual about this problem?

So those are some concrete reasons to be worried about the difficulty of ensuring practical PS-alignment. I want to step back for a second and ask the question: okay, but what's unusual about this? We often invent some new technology, there are often safety issues with it, but also, often, we iron them out and we work through it. Planes: it's dicey initially, how do you make the plane safe? But now planes are really pretty safe, and we might expect something similar for lots of other technologies, including this one. So what's different here? Well, I think there are at least three ways in which this is a uniquely difficult problem. The first is that our understanding of how these AI systems work, and how they're thinking, and our ability to predict their behavior, is likely to be a lot worse than it is with basically any other technology we work with. And that's for a few reasons. • One is that, at a more granular level, the way we train AI systems now (though this may not extrapolate to more advanced systems) is often a pretty black-box process, in which we set up high-level parameters in the training process, but we don't actually know at a granular level how the information is being processed in the system. • Even if we solve that, though, I think there's a broader issue, which is just that once you're creating agents that are much more cognitively sophisticated than you, and that are reasoning and planning in ways that you can't understand, that just seems to me like a fundamental barrier to really understanding and anticipating their behavior. And that's not where we're at with things like planes.
We really understand how planes work, and we have a good grasp on the basic dynamics that allows us a degree of predictability and assurance about what's going to happen. Two, I think you have these adversarial dynamics that I mentioned before. These systems might be deceiving you, they might be manipulating you, they might be doing all sorts of things that planes really don't do. And then three: I think there are higher stakes of error here. If a plane crashes, the damage is contained, as I said. I think a better analogy for AI is something like an engineered virus, where, if it gets out, it gets harder and harder to contain, and it's a bigger and bigger problem. For things like that, you just need much higher safety standards. And I think for certain relevantly dangerous systems, we just actually aren't able to meet the safety standards, period, as a civilization right now. If we had an engineered virus that would kill everyone if it ever got out of the lab, I think we just don't have labs that are good enough, that are at an acceptable level of security, to contain that type of virus right now. And I think that AI might be analogous. Those are reasons to think this is an unusually difficult problem, even relative to other types of technological safety issues.

Deployment

But even if it's really difficult, we might think: fine, maybe we can't make the systems safe, and so we don't use them. If I had a cleaning robot, but it always killed everyone's cats -- everyone we sold it to, it killed their cat immediately, first thing it did -- we shouldn't necessarily expect to see everyone buying these robots and then getting their cats killed. Very quickly, you expect: all right, we recall the robots, we don't sell them, we notice that it was going to kill the cat before we sold it. So there's still this question of, well, why are you deploying these systems if they're unsafe, even if it's hard to make them safe?
And I think there are a number of reasons we should still be worried about actually deploying them anyway. • One is externalities, where some actor might be willing to impose a risk on the whole world, and it might be rational for them to do that from their own perspective, but not rational for the world to accept it. That problem can be exacerbated by race dynamics, where there are multiple actors and there's some advantage to being in the lead, so you cut corners on safety in order to secure that advantage. So those are big problems. • I think having a ton of actors who are in a position to develop systems of the relevant level of danger makes it harder to coordinate. Even if lots of people are being responsible and safe, it's easier to get one person who messes up. • I think even pretty dangerous systems can be very useful and tempting to use for various reasons. • And then finally, as I said, I think these systems might deceive you about their level of danger, or manipulate you, or otherwise influence the process of their own deployment. And so those are all reasons to think we might actually deploy systems that are unsafe in a relevant sense.

Scaling

Having done that, though, there's still this question of: okay, is that going to scale up to the full disempowerment of humanity, or is it something more like: oh, we notice the problem, we address it, we introduce new regulations, and there are new feedback loops and security guarantees and various things to address this problem before it spirals entirely out of our control. There's a lot to say about this. I think this is one place in which we might get our act together, though depending on your views about the level of competence we've shown in response to things like the COVID-19 pandemic and other things, you can have different levels of pessimism or optimism about exactly how much getting our act together humanity is likely to do.
But the main point I want to make here is just that I think there's sometimes a narrative in some of the discourse about AI risk that assumes that the only way we get to the relevant level of catastrophe is via what's called a "fast takeoff," or a very, very concentrated and rapid transition from a state of low capability to a state of very high AI capability, often driven by the AI improving itself, and then there's one AI system that takes over and dominates all of humanity. I think something like that is a possibility. I think the more our situation looks like that -- and there are various parameters that go into that -- the more dangerous it is. But even if you don't buy that story, I think the danger is still very real. • I think having some sort of warning is helpful, but it's not sufficient: knowing about a problem is not sufficient to fix it; witness something like climate change or various other things. • I think even if you have a slow-rolling catastrophe, you can still have a catastrophe. • And I think you can have a catastrophe in which there are many systems that are misaligned, rather than a single one. An analogy there: if you think about the relationship between humans and chimpanzees, no single human took over, but nevertheless, humans as a whole right now are in a dominant position relative to chimpanzees. And I think you could very well end up in a similar situation with respect to AI systems. I'll skip for now the question of whether humanity being disempowered is a catastrophe. There are questions we can ask there, but I think it's rarely the crux.

Putting it together (in April 2021)

So let's just put it all together into these six premises. I'm going to try to assign rough credences to these premises. I think we should hold this all with a grain of salt, but I think it can be a useful exercise.
And each of these premises is conditional on the previous premises being true, so we can multiply them all through to get a final probability. • At least as of April 2021, I put 65% on the timelines condition: that it becomes possible and financially feasible to build APS systems by 2070, • 80%, conditional on that, that there will be strong incentives to build those systems, • 40%, conditional on that, that it will be much harder to develop APS systems that would be practically PS-aligned if deployed, than to develop APS systems that would be practically PS-misaligned if deployed, but which are at least superficially attractive to deploy anyway, • then 65%, conditional on that, that you'll get actually deployed systems failing in high-impact ways -- I'm saying a trillion dollars of damage or more, • then 40%, conditional on that, that this failure scales up to the disempowerment of all of humanity, • and then 95%, conditional on all of that, that that's an existential catastrophe. These are some rough numbers. I think this is an exercise to hold with lots of skepticism, but I think it can be useful. And I encourage you, if you're trying to form views about this issue, to really go through and just throw out some numbers and see what you get. So overall that leads me to about 5% on all of those premises being true by 2070. And then you want to adjust upwards for scenarios that don't fit those premises exactly. Since writing the report, I've actually adjusted my numbers upwards, to greater amounts of worry, especially for premises two through five. I'm currently at something above 10%, though I haven't really pinned down my current estimates.
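As a sanity check, the chain of conditional credences above multiplies out as a simple running product (the premise labels below are my own shorthand paraphrases):

```python
# Conditional credences from the talk (April 2021 numbers). Each premise
# is conditional on all the previous ones, so the overall probability is
# just the running product of the chain.
premises = [
    ("APS systems feasible by 2070",                 0.65),
    ("strong incentives to build them",              0.80),
    ("alignment much harder than misalignment",      0.40),
    ("deployed systems fail at $1T+ scale",          0.65),
    ("failure scales to full disempowerment",        0.40),
    ("disempowerment is an existential catastrophe", 0.95),
]

p = 1.0
for label, credence in premises:
    p *= credence
    print(f"x {credence:.2f}  {label:<45} -> {p:.4f}")

print(f"overall: {p:.1%}")  # multiplies out to roughly 5%, matching the talk
```

This also makes the later point about premise-counting easy to see: adding more premises, each held at less-than-full confidence, mechanically drives the product down.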
And I also have some concern that there's a biasing that enters in from having a number of different premises: if you're unwilling to be confident about any of the premises, then having lots of premises will just naturally drive down the final answer, but it can be arbitrary how many premises you include. And so I have some concern that the way I'm setting it up is having that effect too. That's the overall argument here. We'll go into Q&A. Before doing that, I'll say as a final thing: the specific numbers aside, the upshot here, as I see it, is that this is a very serious issue. I think it's the most serious issue that we as a species face right now. And I think there's a lot to be done; there are a lot of people working on this, but there's also a lot of room for people to contribute in tons of ways. Having talented people thinking about this, and in particular thinking about: how can we align these systems? What techniques will ensure the type of properties in these systems that we need? How can we understand how they work, and then how can we create the incentives, and policy environment, and all sorts of other things, to make sure this goes well? I think this is basically the most important issue in the world right now, and there's a lot of room to get involved. If you're interested, you can follow up with me, you can follow up with other people in the speaker series, and I would love to hear from you. Cool, thanks everybody, we can go to questions.

Q&A

Question: So you made the analogy to a pandemic, and I've heard an argument that I think is compelling, that COVID could provide a sufficient or very helpful warning shot for us in terms of preventing something that could be significantly more deadly. There's a good chance that we won't get a warning shot with AI, but I'm wondering what an effective or sufficient warning shot would look like, and is there a way to...
I mean, because with COVID, it really got us to get our act together as far as creating vaccines; it really galvanized people, and you would hope that's the outcome. What would be a sufficient warning shot to sufficiently galvanize people around this issue and really raise awareness, in order to prevent an existential crisis? Response: I feel maybe more pessimistic than some about the extent to which COVID has actually functioned as a warning shot of the relevant degree of galvanization, even for pandemics. I think it's true: pandemics are on the radar, there is a lot of interest in them. But from talking with folks I know who are working on really trying to prevent the next big pandemic, and pandemics at larger scales, I think they've actually been in a lot of respects disappointed by the amount of response that they've seen from governments and in other places. I wouldn't see COVID as: ah, this is a great victory for a warning shot, and I would worry about something similar with AI. So, examples of types of warning shots that I think would be relevant: there's a whole spectrum. I think if an AI system breaks out of a lab and steals a bunch of cryptocurrency, that's interesting. Everyone's going, "Wow, how did that happen?" If an AI system kills people, I think people will sit up straight, they will notice that, and then there's a whole spectrum there. And I think the question is: what degree of response is required, and what exactly does it get you? And there I have a lot more concerns. I think it's easy to get the issue on people's radar and get them thinking about different things. But the question of: okay, does that translate into preventing the problem from ever arising again, or driving the probability of that problem arising again, or at a larger scale, sufficiently low -- there I feel more uncertainty and concern. Question: Don't take this the wrong way, but I'm very curious how you'd answer this.
As far as I see from your CV, you haven't actually worked on AI. And a lot of the people talking about this stuff, they're philosophy PhDs. So how would you answer the criticism of that? Or how would you answer the question: what qualifies you to weigh in or discuss these hypothetical issues with AI, vs. someone who is actually working there. Response: I think it's a reasonable question. If you're especially concerned about technical expertise in AI, as a prerequisite for talking about these issues, there will be folks in the speaker series who are working very directly on the technical stuff and who also take this seriously, and you can query them for their opinions. There are also expert surveys and other things, so you don't have to take my word for it. That said, I actually think that a lot of the issues here aren't that sensitive to the technical details of exactly how we're training AI systems right now. Some of them are, and I think specific proposals for how you might align a system, the more you're getting into the nitty-gritty on different proposals for alignment, I think then technical expertise becomes more important. But I actually think that the structure of the worry here is accessible at a more abstract level. And in fact, in my experience, and I talk to lots of people who are in the nitty-gritty of technical AI work, my experience is that the discussion of this stuff is nevertheless at a more abstract level, and so that's just where I see the arguments taking place, and I think that's what's actually driving the concern. So you could be worried about that, and generally you could be skeptical about reasoning of this flavor at all, but my own take is that this is the type of reasoning that gives rise to the concern. Question: Say you're trying to convince just a random person on the street to be worried about AI risks. 
Is there a sort of article, or TED Talk, or something you would recommend, that you think would be the most effective for just your average nontechnical person? People on Twitter were having trouble coming up with something other than Nick Bostrom's TED talk or Sam Harris's TED talk. Response: So other resources that come to mind: Kelsey Piper has an intro that she wrote for Vox a while back, that I remember enjoying, and so I think that could be one. And that's fairly short. There's also a somewhat longer introduction by Richard Ngo, at OpenAI, called AI Safety from First Principles. That's more in depth, but it's shorter than my report, and goes through the case. Maybe I'll stop there. There are others, too. Oh yeah, there's an article in Holden Karnofsky's Most Important Century series called Why AI Alignment Might Be Hard With Deep Learning. That doesn't go through the full argument, but I think it does point at some of the technical issues in a pretty succinct and accessible way. Oh yeah: Robert Miles's YouTube channel is also good. Question: Why should we care about the instrumental convergence hypothesis? Could you elaborate a little bit more on the reasons to believe it? And also, one question from the Q&A is: let's say you believe Stuart Russell's argument that all AIs should have the willingness hardwired into them to switch themselves off, should humans desire them to do so. Does that remove the worries about instrumental convergence? Response: Sorry, just so I'm understanding the second piece. Was the thought: "if we make these systems such that they can always be shut off, then is it fine?" Questioner: Yeah. Response: OK, so maybe I'll start with the second thing. I think if we were always in a position to shut off the systems, then that would be great. But not being shut off is a form of power. I think Stuart Russell has this classic line: you can't fetch the coffee if you're dead.
To the extent that your existence is promoting your objectives, continuing to be able to exist and be active and not be turned off is also going to promote your objectives. Now, you can try to futz with it, but there's a balance, where if you try to make the system more and more amenable to being turned off, then sometimes it automatically turns itself off. And there's work on how you might deal with this; I think it goes under the term "the shutdown problem," and I think Robert Miles actually has a YouTube video on it. But broadly, if we succeed at getting the systems such that they're always happy to be shut off, then that's a lot of progress. So what else is there to say about instrumental convergence and why we might expect it? Sometimes you can just go through specific forms of power-seeking. Like, why would it be good to be able to develop technology? Why would it be good to be able to harness lots of forms of energy? Why would it be good to survive? Why would it be good to be smarter? There are different types of power, and we can just go through and talk about: for which objectives would this be useful? We can just imagine some different objectives and get a flavor for why it might be useful. We can also look at humans and say, ah, it seems like when humans try to accomplish things... Humans like money, and money is a generic form of power. Is that a kind of idiosyncrasy of humans, that they like money, or is it something more to do with a structural feature of being an agent and being able to exert influence in the world in pursuit of what you want? We can look at humans, and I go through some of the evidence from humans in the report. And then finally, there are actually some formal results, where we try to formalize a notion of power-seeking in terms of the number of options that a given state allows a system. This is work by Alex Turner, which I'd encourage folks to check out.
And basically you can show that, for a large class of objectives defined relative to an environment, there's a strong reason for a system optimizing those objectives to get to the states that give it many more options. And the intuition for that is: if your ranking is over final states, then the states with more options will give you access to more final states, and so you want to do that. So those are three reasons you might worry. Question: Regarding your analysis of this policy thing: how useful do you find social science theories, such as IR or social theory -- for example, a realist approach to IR? Somebody is asking in the forum: how useful do you find those kinds of techniques? Response: How useful do I find different traditions in IR for thinking about what might happen with AI in particular? Questioner: The theories people have already come up with in IR or the social sciences, regarding power. Response: I haven't drawn a ton on that literature. I do think it's likely to be relevant in various ways. And in particular, I think a lot of the questions about AI race dynamics, and deployment dynamics, and coordination between different actors with different incentives -- some of this mirrors other issues we see with arms races; we can talk about bargaining theory, and we can talk about how different agents, with different objectives and different levels of power, should be expected to interact. I do think there's stuff there; I haven't gone very deep on that literature, though, so I can't speak to it in depth. Question: What's your current credence, what's your prior, that your own judgment in this report is correct? Is it 10%, bigger than 10%, 15% or whatever, and how spread out are the credences of those whose judgment you respect? Response: Sorry, I'm not sure I'm totally understanding the question. Is it: what's the probability that I'm right? Various questioners: A prior distribution of all those six arguments on the slide...the spread ...
what's the percentage, you think yourself, is right. Response: My credence on all of them being right, as I said, it's currently above 10%. It feels like the question is trying to get at something about my relationship with other people and their views. And am I comfortable disagreeing with people, and maybe some people think it's higher, some people think it's lower. Questioner in the chat: Epistemic status maybe? Response: Epistemic status? So, I swing pretty wildly here, so if you're looking for error bars, I think there can be something a little weird about error bars with your subjective credences, but in terms of the variance: it's like, my mood changes, I can definitely get up very high, as I mentioned I can get up to 40% or higher or something like this. And I can also go in some moods quite low. So this isn't a very robust estimate. And also, I am disturbed by the amount of disagreement in the community. We solicited a bunch of reviews of my report. If people are interested in looking at other people's takes on the premises, that's on LessWrong and on the EA Forum, and there's a big swing. Some people are at 70%, 80% doom by 2070, and some people are at something very, very low. And so there is a question of how to handle that disagreement. I am hazily incorporating that into my analysis, but it's not especially principled, and there are general issues with how to deal with disagreement in the world. Yeah, maybe that can give some sense of the epistemic status here. Question: It seems to me that you could plausibly get warning shots by agents that are subhuman in general capabilities, but still have some local degree of agency without strategic planning objectives. And that seems like a pretty natural... or you could consider some narrow system, where the agent has some strategy within the system to not be turned off, but globally is not aware of its role as an agent within some small system or something. 
I feel like there's some notion that there's only going to be a warning shot once agents are too powerful to stop, that I vaguely disagree with; I was just wondering your thoughts. Response: I certainly wouldn't want to say that, and as I mentioned, I think warning shots come in degrees. You're getting different amounts of evidence from different types of systems as to how bad these problems are, when they tend to crop up, how hard they are to fix. I totally agree: you can get evidence and warning shots, in some sense, from subhuman systems, or specialized systems, or lopsided systems that are really good at one thing and bad at others. And I have a section in the report on basically why I think warning shots shouldn't be all that much comfort. I think there's some amount of comfort, but I really don't think that the argument here should be like, well, if we get warning shots, then obviously it'll be fine because we'll just fix it. Even knowing that there's a problem, there are a lot of gradations in how seriously you're taking it, and there are a lot of gradations in terms of how easy it is to fix and what resources you're bringing to bear on the issue. And so my best guess is not that we get totally blindsided, no one saw this coming, and then it just jumps out. My guess is actually that people are quite aware, and it's like, wow, yeah, this is a real issue. But nevertheless, we're just sort of progressing forward, and I think that's a very worrying and reasonably mainline scenario. Question: So does that imply, in your view of the strategic or incentive landscape, that you just think the incentive structure will be too strong -- that it will require deploying planning AI, versus just having lots of tool-like AI?
Response: Basically I think planning and strategic awareness are just going to be sufficiently useful, and it's going to be sufficiently hard to coordinate if there are lots of actors, that those two issues in combination will push us towards increasing levels of risk and in the direction of more worrying systems. Question: One burning question. How do you work out those numbers? How do you work out the back-of-the-envelope calculation, or how hard did you find those numbers? Response: It's definitely hard. There are basic calibration things you can try to do, you can train yourself to do some of this, and I've done some of that. I spent a while, I gave myself a little survey over a period of weeks where I would try to see how my numbers changed, and I looked at what the medians were and the variance and stuff like that. I asked myself questions like: would I rather win $10,000 if this proposition were true, or if I pulled a red ball out of an urn with 70% red balls? You can access your intuitions using that thought experiment. There are various ways, but it's really hard. And as I said, I think those numbers should be taken with a grain of salt. But I think it's still more granular and I think more useful than just saying things like "significant risk" or "serious worry" or something. Because that can really mean a lot of things to different people, and I think it can be useful to be more specific.

Host: Alright, let's thank Joe again for this wonderful presentation.

Response: Thanks everybody, appreciate it.

Discuss

### Algorithmic formalization of FDT?

8 мая, 2022 - 04:36
Published on May 8, 2022 1:36 AM GMT

I occasionally see a question like "what would FDT recommend in ....?" and I am puzzled that there is no formal algorithm to answer it. Instead humans ask other humans, and the answers are often different and subject to interpretation. This is rather disconcerting. For comparison, you don't ask a human what, say, a chessbot would do in a certain situation, you just run the bot. Similarly, it would be nice to have an "FDTbot" one can feed a decision theory problem to. Does something like that exist? If not, what are the obstacles?

Discuss

### Experience on Meloxicam

8 мая, 2022 - 03:30
Published on May 8, 2022 12:30 AM GMT

I continue to have wrist and hand issues, though they are mostly fine as long as I minimize how much I use them. Which mostly means a lot of dictation! Still, it would be good both to understand what is wrong with them and figure out how I can resume some of the activities I enjoy that I've stopped.

On 2022-03-04 I started taking 15mg of Meloxicam, daily with dinner. This is a nonsteroidal anti-inflammatory (NSAID), which often reduces arthritic swelling. I continued for a month, with my last dose on 2022-04-04. Unlike when I tried Methotrexate, I didn't notice any changes either when I started or when I went off.

To understand whether there might be a more subtle effect, I also checked my level of finger swelling each morning on waking by clasping my hands and seeing how tight it felt. I rated this feeling on a scale of 1 to 10, and all days were in the range 6 to 9 inclusive.

I take this to mean Meloxicam is very unlikely to reduce my swelling, and may even slightly increase it.

Still trying to figure out what to try next. One candidate is to stop restricting activity, let my wrists and hands get really bad, and then repeat the bloodwork. The hope is that if there are markers that are only elevated when my hands are agitated we might see them then.

Comment via: facebook

Discuss

### What does Functional Decision Theory say to do in imperfect Newcomb situations?

8 мая, 2022 - 01:26
Published on May 7, 2022 10:26 PM GMT

Imagine a variation of Newcomb's problem where the Oracle is correct 99% of the time (say, half of people 1-box, half 2-box, and the Oracle is correct 99% of the time on each). What would FDT imply is the correct course of action then?
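To make the arithmetic behind the question concrete, here is a back-of-the-envelope expected-value calculation using the standard Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one — amounts the post itself doesn't specify), under the assumption that the Oracle's prediction tracks the output of your decision procedure (the subjunctive dependence FDT cares about). This only lays out the numbers; it doesn't settle what FDT itself recommends:

```python
# Hypothetical payoffs: the usual $1,000,000 opaque box and $1,000 transparent box.
ACC = 0.99                  # Oracle accuracy on each type of agent
BIG, SMALL = 1_000_000, 1_000

# If the Oracle's prediction tracks the output of your decision procedure:
ev_one_box = ACC * BIG                 # correctly predicted a one-boxer 99% of the time
ev_two_box = (1 - ACC) * BIG + SMALL   # mispredicted 1% of the time, plus the sure $1,000
```

On these assumptions, one-boxing comes out far ahead ($990,000 vs. $11,000 in expectation), and stays ahead for any accuracy above about 50.05%.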

Discuss

### The glorious energy boost I've gotten by abstaining from coffee

7 мая, 2022 - 23:21
Published on May 7, 2022 8:21 PM GMT

Over the years, I've sometimes heard people rave about how they cut out caffeine (or just coffee) and have way more energy. My reaction has typically been to feel kind of dismissive / not really believe them / assume that means that they need to sleep more now, or have less flexibility in how much they sleep.

I kind of stumbled into cutting out coffee a few months ago, and it's just given me a lot of energy for free (i.e. without having to take more time for sleep or make any other non-trivial tradeoffs).

I wrote this up to remind myself why it's been super worth it for me, and decided to post it here as well in case it is helpful to others.

tl;dr:

Not drinking coffee is SUPER worth it for me.

On the same amount of sleep as before, I generally:

• have more energy throughout the day
• avoid what was previously a terrible late afternoon crash
• feel less tense / anxious
• often have energy into the night

Given that energy is one of the top, if not the top, constraint in how good I feel and how much I can get done of the stuff I want to do, this feels magical. It's well worth giving up the awesome jolt that coffee provides.

Previously I was drinking a fair amount of coffee. Typically:

• 1 100 mg caffeine pill + 1-2 cups of coffee/day on weekdays, never after 4pm
• 1 100 mg caffeine pill + 0-1 cups on weekends (when I got 9h of sleep)

Now I:

• Drink 0-3 cups of English Breakfast tea a day (typically 1)
• Will drink tea after 4pm sometimes
• Haven't had a cup of caffeinated coffee in weeks or months (I think I had half a cup, a month or two ago. I also had a decaf cappuccino a couple weeks ago).

How have things changed?

Both before and after, I got:

• 7-7.5h of sleep on weekdays
• 9h of sleep on Fri and Sat
• A 30 second blast of cold shower (typically at the end of the shower) almost always
• This seems much more important now than it did before in terms of waking me up
• Working out in the AM probably 4-5 mornings a week

The big benefits:

• The very big ones:
• I used to feel pretty terrible in the mid-late afternoon / early evening - exhausted in my body, such that it felt very difficult to do anything productive for 1-2 hours during that time. I often felt the need to take a nap if I hadn't gotten 7.5-8 hours of sleep at night. And even after the nap / break I felt very sluggish
• I now still feel sleepy (and still end up taking a break for an hour much of the time, spanning meditating/napping and having a cup of tea/snack), but I can power through it if needed, and the slump feels much, much more minor.
• I used to feel very sluggish and low on energy at night / after dinner.
• I now often feel myself having a good amount of energy till I go to sleep (and I am still able to go to sleep pretty readily)

• I feel more energetic in the morning during the workday - rarely do I feel like I'm dragging hard until coffee kicks in (which was not a rare occurrence previously)
• I feel less anxious / tense in my body. Previously, feeling tense/tight in my body after a cup of coffee was not a rare occurrence; now I never have it (I still experience anxiety, but not that specific terrible flavor of anxiety, and I think I have less anxiety/stress overall)

Other impacts:

• Immediately upon waking up:
• I still feel a little out of it / sometimes grumpy (esp during the workweek), but rarely feel exhausted
• Though I am probably sleepier till my shower than I was before
• Upon starting my workday:
• I am sufficiently clear-headed and motivated to work just fine
• A little harder on the weekends, because I don't have the adrenalin/cortisol flowing. But my AM productivity on the weekends was pretty spotty even before.
• I can perceive a difference in my energy levels when I drink (alcohol) lightly (1 drink), moderately (2 drinks), or heavily (3+ drinks) and if I drink shortly before bed. That motivates me to drink less, and gives me an additional lever for more energy.

How did I get here?

I don't exactly remember, but I kind of stumbled into it - I think I had a 4 day weekend and so ended up better slept than I typically am, felt the need for less caffeine as a result, and that just kicked off a virtuous cycle.

Other thoughts:

This brings to mind the question of whether I'd get even more benefits by giving up caffeine entirely (or say, switching to green tea). I wonder if that would be the case, but currently I like having the ability to jolt myself / have energy at the times I want it even if I underslept a little, so am not planning to experiment with that any time soon.

Discuss

### Sealed predictions thread

7 мая, 2022 - 21:00
Published on May 7, 2022 6:00 PM GMT

Want to publicly register predictions or other statements without yet revealing what they say? Do so here. The advantage of commenting here (in addition to, say, your Shortform) is that I'll record all comments here so they can't be deleted or edited later, and I may publicly shame you if you don't reveal your statement when you say you will.

So, comment with

• the date by which (you say) you will reveal your statement, and
• a hash or another system that will provide strong evidence that your statement is what you will say it is.

And, of course, save the statement. (If you use hashing and don't save your exact statement, you of course won't be able to prove it, so not only will your registration be useless, but also it will seem as if you post-facto decided to hide your statement.) If there are some circumstances in which you would not reveal the statement--e.g., if it could be infohazardous--strongly consider

• explaining that in your comment, or
• instead of sharing the hash (or whatever) of your statement, sharing the hash of a message that (1) explains why you might not share your statement and (2) includes or links to the hash of your real statement.

Otherwise, it will look bad if the date arrives and you don't reveal your statement.

To retract or edit a statement, send me a private message within 24 hours of commenting; after that, it cannot be changed in my record (although you can of course comment to say that you no longer endorse it, or to add hashes with new statements).

Fancy methods may exist if you want to provably commit to sharing your statement.

It's your responsibility not to share infohazards. This includes ensuring that any hashes you share can't be reversed, even if computing power advances more than you expect in the time before you reveal your statement, and ensuring that you don't commit to sharing something that, it might turn out, you should not share.
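One minimal way to produce such a commitment is a salted SHA-256 hash. The salt is an addition beyond what the post requires, but it addresses the reversal concern above: without it, a short or guessable statement can be brute-forced from its hash. A sketch:

```python
import hashlib
import secrets

def commit(statement: str) -> tuple[str, str]:
    """Return (salt, digest). Publish the digest now; keep salt + statement private."""
    # A random salt stops anyone from brute-forcing short or guessable statements.
    salt = secrets.token_hex(16)
    digest = hashlib.sha256((salt + statement).encode()).hexdigest()
    return salt, digest

def reveal_ok(statement: str, salt: str, digest: str) -> bool:
    """At reveal time, anyone can recompute the hash and check it matches."""
    return hashlib.sha256((salt + statement).encode()).hexdigest() == digest
```

At reveal time you publish both the statement and the salt; anyone can then verify the digest you committed to earlier. Remember to save the salt along with the statement, since losing either makes the commitment unverifiable.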

Discuss

### Stories as Technology: Past, Present, and Future (on the possibilities and dangers of AI-generated fictions)

7 мая, 2022 - 20:21
Published on May 7, 2022 5:21 PM GMT

Discuss

### Minimum Viable Alignment

7 мая, 2022 - 16:18
Published on May 7, 2022 1:18 PM GMT

What is the largest possible target we could have for aligned AGI?

That is, instead of creating a great and prosperous future, is it possible that we can find an easier path to align an AGI by aiming for the entire set of 'this-is-fine' kind of futures?

For example, a future where all new computers are rendered inoperable by malicious software. Or a future where a mostly-inactive AI does nothing except prevent any superintelligence from forming, or one that continuously tries to use up all of the available compute in the world.

I don't believe there is a solution here yet either, but could relaxing the problem from 'what we actually want' to 'anything we could live with' help? Has there been much work in this direction? Please let me know what to search for if so. Thank you.

Discuss

### scipy.optimize.curve_fit Is Awesome

7 мая, 2022 - 13:57
Published on May 7, 2022 10:57 AM GMT

cross-posted from niplav.github.io

I recently learned about the python function scipy.optimize.curve_fit, and I'm really happy I did.

It fulfills a need I didn't know I'd always had, but never fulfilled: I often have a dataset and a function with some parameters, and I just want the damn parameters to be fitted to that dataset, even if imperfectly. Please don't ask any more annoying questions like “Is the dataset generated by a Gaussian?” or “Is the underlying process ergodic?”, just fit the goddamn curve!

And scipy.optimize.curve_fit does exactly that!

You give it a function f with some parameters a, b, c, … and a dataset consisting of input values x and output values y, and it then optimizes a, b, c, … so that f(x, a, b, c, …) is as close as possible to y (where, of course, x and y can both be numpy arrays).

This is awesome! I have some datapoints x, y and I believe it's generated by some obscure function, let's say of the form f(x, a, b, c) = a⋅x⋅sin(b⋅x + c), but I don't know the exact values for a, b and c?

No problem! I just throw the whole thing into curve_fit (scipy.optimize.curve_fit(f, x, y)) and out comes an array of optimal values for a, b, c!

What if I then want c to be necessarily positive?

Trivial! curve_fit comes with an optional argument called bounds. Since c is the third parameter, I call scipy.optimize.curve_fit(f, x, y, bounds=([-numpy.inf, -numpy.inf, 0], numpy.inf)), which says that curve_fit should not make the third parameter smaller than zero, but otherwise can do whatever it wants.
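A self-contained sketch of this whole workflow, fitting the example function to synthetic data generated from known parameters. The initial guess p0 is my addition, not something the post mentions: for oscillatory functions like this one, a bad starting frequency can land the optimizer in the wrong local minimum.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a * x * np.sin(b * x + c)

# Synthetic data from known parameters (a=2.0, b=1.5, c=0.5) plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = f(x, 2.0, 1.5, 0.5) + rng.normal(0, 0.05, x.size)

# p0 gives the optimizer a starting point near the right frequency.
params, _cov = curve_fit(f, x, y, p0=[1.5, 1.4, 0.3])
a, b, c = params

# Constrained variant: keep the third parameter (c) non-negative.
params_bounded, _ = curve_fit(
    f, x, y, p0=[1.5, 1.4, 0.3],
    bounds=([-np.inf, -np.inf, 0], np.inf),
)
```

curve_fit also returns the estimated covariance matrix of the parameters (the second return value), which is handy if you do care about uncertainty after all.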

So far, I've already used this function two times, and I've only known about it for a week! A must for every wannabe data-scientist.

For more information about this amazing function, consult its documentation.

Discuss

### Quote request: "if even the Sun requires proof"

7 мая, 2022 - 13:30
Published on May 7, 2022 10:30 AM GMT

I remember reading a quote, I think from the Buddhist or some other Eastern tradition, about how if you've argued yourself into believing the Sun doesn't exist, you've gone horribly wrong. It went something like "Nothing can be known, if even the Sun requires proof". I think I read it either on LessWrong or on gwern.net several years ago, but for the life of me I can't find it. I remember it being phrased more poetically than prosaically.

Does anyone know what quote I'm referring to, and where to find it?

Discuss

### Best open-source online textbooks / what books do y'all want to be collaborative?

7 мая, 2022 - 12:52
Published on May 7, 2022 9:52 AM GMT

I'm looking for online open-source / generously licensed textbooks, papers or tutorials.

Think of stuff like: http://neuralnetworksanddeeplearning.com/

Why? I'm currently running https://chimu.sh, a collaborative learning platform. Quick explanation: Chimu combines an e-reader with a Stack Overflow-like Q&A forum. As people read, they can view others' questions and ask their own. Demo here (desktop works best).

I need to seed the site with initial content, and I figured LW would be a great place to ask.

With this in mind, what are good online tutorials / textbooks that people here have learned from? Is there any book or paper that you wish that you could discuss with your friends?

Discuss

### Hard evidence that mild COVID cases frequently reduce intelligence

7 мая, 2022 - 08:55
Published on May 7, 2022 5:55 AM GMT

I managed to find a literature review on post-covid studies, which only targeted people with "mild" cases.

Much of the data is self-evaluation, and everything here is obviously inferior to the kind of data that could be procured by a specialist firm on Wall Street (let alone the military). It also goes along with the usual suggestion that brain damage isn't permanent and that everyone will be fine if they just wait long enough and adjust.

It is also probably the best data you've ever seen on Long Covid, by a massive margin. Because that's the kind of world we live in right now.

There is a data point that suggests that cognitive dysfunction persists after 3 months in mild cases, but most of the data points indicate fatigue that persists 3 months after a mild infection. This is notable because fatigue reduces intelligence, if nothing else by dramatically increasing how taxing your workday is. There are also some data points indicating that fatigue persists in around half of all mild cases.

People on LessWrong and elsewhere seem to think that P100 masks are the best bet for protection, and I really hope they're correct because the CDC is once again saying nothing on the topic (and I wouldn't trust them if they did, since these are the same bozos who knowingly pumped out broken COVID tests and told everyone to wear cloth masks for two whole years). Amazon is obsessed with labeling P100 masks as "dust masks" and cloth/surgical masks as "COVID masks". I have a lot riding on my P100 mask during my flight tomorrow, so I really hope they got it right; as far as I can tell, they did.

If you like the idea of being smart, either right now or in the future, then you should probably stop worrying about Nootropics or Nutrition, and start worrying about respiratory aerosols, room ventilation, and P100 masks. Because at this point, it's clear that reality is allowed to give you debilitating brain damage, and that COVID is a brain virus.

Discuss

### But What's Your *New Alignment Insight,* out of a Future-Textbook Paragraph?

7 мая, 2022 - 06:10
Published on May 7, 2022 3:10 AM GMT

This is something I've been thinking about a good amount while considering my model of Eliezer's model of alignment. After tweaking it a bunch, it sure looks like a messy retread of much of what Richard says here; I don't claim to assemble any new, previously unassembled insights here.

Tl;dr: For impossibly difficult problems like AGI alignment, the worlds in which we solve the problem will be worlds that came up with some new, intuitively compelling insights. On our priors about impossibly difficult problems, worlds without new intuitive insights don't survive AGI.

Object-Level Arguments for Perpetual Motion

I once knew a fellow who was convinced that his system of wheels and gears would produce reactionless thrust, and he had an Excel spreadsheet that would prove this - which of course he couldn't show us because he was still developing the system.  In classical mechanics, violating Conservation of Momentum is provably impossible.  So any Excel spreadsheet calculated according to the rules of classical mechanics must necessarily show that no reactionless thrust exists - unless your machine is complicated enough that you have made a mistake in the calculations.

And similarly, when half-trained or tenth-trained rationalists abandon their art and try to believe without evidence just this once, they often build vast edifices of justification, confusing themselves just enough to conceal the magical steps.

It can be quite a pain to nail down where the magic occurs - their structure of argument tends to morph and squirm away as you interrogate them.  But there's always some step where a tiny probability turns into a large one - where they try to believe without evidence - where they step into the unknown, thinking, "No one can prove me wrong".

Hey, maybe if you add enough wheels and gears to your argument, it'll turn warm water into electricity and ice cubes!  Or, rather, you will no longer see why this couldn't be the case.

"Right! I can't see why it couldn't be the case!  So maybe it is!"

Another gear?  That just makes your machine even less efficient.  It wasn't a perpetual motion machine before, and each extra gear you add makes it even less efficient than that.

Each extra detail in your argument necessarily decreases the joint probability.  The probability that you've violated the Second Law of Thermodynamics without knowing exactly how, by guessing the exact state of boiling water without evidence, so that you can stick your finger in without getting burned, is, necessarily, even less than the probability of sticking your finger into boiling water without getting burned.

I say all this, because people really do construct these huge edifices of argument in the course of believing without evidence.  One must learn to see this as analogous to all the wheels and gears that fellow added onto his reactionless drive, until he finally collected enough complications to make a mistake in his Excel spreadsheet.

Manifestly Underpowered Purported Proofs

If I read all such papers, then I wouldn’t have time for anything else. It’s an interesting question how you decide whether a given paper crosses the plausibility threshold or not … Suppose someone sends you a complicated solution to a famous decades-old math problem, like P vs. NP. How can you decide, in ten minutes or less, whether the solution is worth reading?

The techniques just seem too wimpy for the problem at hand. Of all ten tests, this is the slipperiest and hardest to apply — but also the decisive one in many cases. As an analogy, suppose your friend in Boston blindfolded you, drove you around for twenty minutes, then took the blindfold off and claimed you were now in Beijing. Yes, you do see Chinese signs and pagoda roofs, and no, you can’t immediately disprove him — but based on your knowledge of both cars and geography, isn’t it more likely you’re just in Chinatown? I know it’s trite, but this is exactly how I feel when I see (for example) a paper that uses category theory to prove NL≠NP. We start in Boston, we end up in Beijing, and at no point is anything resembling an ocean ever crossed.

What's going on in the above cases is argumentation from "genre savviness" about our physical world: knowing, based on the reference class that a purported feat would fall into, the probabilities of feat success conditional on its having or lacking various features. These meta-level arguments rely on knowledge about what belongs in which reference class, rather than on in-the-weeds object-level arguments about the proposed solution itself. It's better to reason about things concretely, when possible, but in these cases the meta-level heuristic has a well-substantiated track record.

Successful feats will all have a certain superficial shape, so you can sometimes evaluate a purported feat based on its superficial features alone. One instance where we might really care about doing this is where we only get one shot at a feat, such as aligning AGI, and if we fail our save everyone dies. In that case, we will not get lots of postmortem time to poke through how we failed and learn the object-level insights after the fact. We just die. We'll have to evaluate our possible feats in light of their non-hindsight-based features, then.

Let's look at the same kind of argument, courtesy Eliezer, about alignment schemes:

On Priors, is "Weird Recursion" Not an Answer to Alignment?

I remark that this intuition matches what the wise might learn from Scott’s parable of K’th’ranga V: If you know how to do something then you know how to do it directly rather than by weird recursion, and what you imagine yourself doing by weird recursion you probably can’t really do at all. When you want an airplane you don’t obtain it by figuring out how to build birds and then aggregating lots of birds into a platform that can carry more weight than any one bird and then aggregating platforms into megaplatforms until you have an airplane; either you understand aerodynamics well enough to build an airplane, or you don’t, the weird recursion isn’t really doing the work. It is by no means clear that we would have a superior government free of exploitative politicians if all the voters elected representatives whom they believed to be only slightly smarter than themselves, until a chain of delegation reached up to the top level of government; either you know how to build a less corruptible relationship between voters and politicians, or you don’t, the weirdly recursive part doesn’t really help. It is no coincidence that modern ML systems do not work by weird recursion because all the discoveries are of how to just do stuff, not how to do stuff using weird recursion. (Even with AlphaGo which is arguably recursive if you squint at it hard enough, you’re looking at something that is not weirdly recursive the way I think Paul’s stuff is weirdly recursive, and for more on that see https://intelligence.org/2018/05/19/challenges-to-christianos-capability-amplification-proposal/.)

It’s in this same sense that I intuit that if you could inspect the local elements of a modular system for properties that globally added to aligned corrigible intelligence, it would mean you had the knowledge to build an aligned corrigible AGI out of parts that worked like that, not that you could aggregate systems that corrigibly learned to put together sequences of corrigible thoughts into larger corrigible thoughts starting from gradient descent on data humans have labeled with their own judgments of corrigibility.

Eliezer often asks, "Where's your couple-paragraph-length insight from the Textbook from the Future"? Alignment schemes are purported solutions to problems in the reference class of impossibly difficult problems, in which we're actually doing something new, like inventing mathematical physics for the very first time, and doing so playing against emerging superintelligent optimizers. As far as I can tell, Eliezer's worry is that proposed alignment schemes spin these long arguments for success that just amount to burying the problem deep enough to fool yourself. That's why any proposed solution to alignment has to yield a core insight or five that we didn't have before -- conditional on an alignment scheme looking good without a simple new insight, you've probably just buried the hard core of the problem deep enough in your arguments to fool your brain.

So it's fair to ask any alignment scheme what its new central insight into AI is, in a paragraph or two. If those couple of paragraphs read like something from the Textbook from the Future, then the scheme might be in business. If the paragraphs contain no brand-new, intuitively compelling insights, then the scheme probably doesn't contain the necessary insights distributed across its whole body either.[1]

1. ^

Though this doesn't mean that pursuing that line of research further couldn't lead to the necessary insights. The science just has to eventually get to those insights if alignment is to work.

Discuss

### Seattle Robot Cult

7 мая, 2022 - 03:25
Published on May 7, 2022 12:25 AM GMT

This event's topic of discussion is "Propaganda and infohazards: how aesthetic and normative content modify the effects of spreading factual content".

Discuss

### Upgrading Imagination: The Promise Of DALL-E 2 As A Tool For Thought

7 мая, 2022 - 02:43
Published on May 6, 2022 6:12 PM GMT

[Cross-posting this here, from my blog, Echoes and Chimes]

When I was a kid, I’d spend the hours after school on the circular trampoline we had assembled in our backyard. I’d go into a kind of trance. Though my body would be engaged in jumping up and down — or, when that grew dull, and when the blue plastic cover that obscured the springs wore thin and blew away, in walking around the trampoline’s perimeter in a game of balance — my mind flew far away. I constructed an elaborate fictional world in which to pass my afternoons.

Hopelessly derivative, it was inspired by whatever media I was consuming at the time - I recall the Ratchet and Clank series, and Avatar: The Last Airbender both being heavily influential. I envisioned a galaxy of planets, each with distinct people, cultures, and economic activities. The system was quasi-federal; the political relations complex. For example, I recall creating the planet “Laboris”, the lab planet, where new technology would be developed and tested, before being shipped to worlds the star-system over. There was a planet dedicated to retail and advertising - animated by the spirit of Moloch, I now realise. And so on. Populating these planets, in starring roles, were my friends, family members, and any other people in my life at the time, all reimagined in the art style of my mind’s eye.

As my real life changed, so did my fictional one. Alliances were forged and broken. Civilisations rose and fell. It was a serial drama, with some weeks covering mere days, and some days spanning decades. As serious challenges came up at school — a block of tests, for example — I would envision each obstacle taking a physical form, like an elaborate boss fight. Although the details escape me, I recall my Grade 7 exams mapping onto the mythology of 2007’s Heavenly Sword.

I found great joy in specificity — I’d try to imagine the content of the advertisements that ran on the intergalactic cable channels, the outfits worn by my fellow adventurers, that kind of thing. In pursuing these specifics, I’d frequently run into the limits of my own imaginative capacity. While I was strong on narrative (a consequence of all the books I’d swallowed), and while each character I created evoked strong vibes in my brain, I was quite unable to picture exactly what they looked like. It was like trying to pin fog to paper.

This confrontation with my own limits frequently led me to fantasise about a technology that could scan my brain and produce images of what I was thinking, so I could show my peers, and so I could circumvent my utter inability to draw. I thought it might work like a polygraph machine, with a mess of wires that, through wizardry, would output my thoughts on a page.

I grew up, and I relegated this dream to the realm of science-fiction. I accepted the limits of my own creativity, and was happy to work within them. So I couldn’t produce beautiful illustrations, or really any illustrations at all. Fine; at least I was good with words. And if the medium of language, written and spoken, was good enough for the endless procession of great writers/thinkers/creators that inspired me, no doubt it was sufficient for me too.

That was pretty much my position until I came across DALL-E 2. DALL-E 2 is an “AI system that can create realistic images and art from a description in natural language”.

You’ve probably seen systems that do similar things — I had. DALL-E 2 stands out, though, for the quality of the images it seems capable of producing, and for their worth as expressions of a machine’s imagination. If you haven’t seen it in action, it’s worth checking out (in addition to the link above) this video, this twitter thread that turned bios into images, or this one that explores the system’s various limitations.

I’m not writing this to dissect the abilities of DALL-E 2 per se — not least because that’s already been done (see e.g. “what DALL-E 2 can and cannot do”), and because I don’t actually have access to it. Instead, let’s get back to my childhood fantasies. Is this the technology I dreamed of as a kid?

As far as I can tell…not quite, but it’s remarkably close. If I had access to something like this when I was younger, I think it would’ve been transformative. It would have given me a new tool with which to explore my inner life. And the fact that this exists now sparks hope in me for what we might see in the future. A model that can turn short stories into short films, for example, doesn’t seem especially far off.

Now, you may ask: “But who would want this? What use would this be? What about poverty/inequality/glaring social problems?”.  To this I’d respond, in turn, “I’d want this! I think it might deepen my capacity for thought. And yes, other social issues are obviously important and deserve attention, but thankfully it’s not a zero-sum game”.

Let’s focus on my second response, about deepening capacity for thought, which I think is the most interesting part here. To explain what I mean, I want to briefly talk about “conceptual metaphor”. This, in my rudimentary understanding, is the idea that metaphors are not just literary devices, but are fundamental to how we make sense of the world, as they allow us to compare a domain we understand to one we do not; and in doing so, to understand abstract ideas. These kinds of metaphors are embedded in our everyday language — consider how phrases like “the price of oil rose” or “the market fell” are predicated on the idea that MORE IS UP, LESS IS DOWN; or how a phrase like “today dragged on” is predicated on the idea that TIME PASSING IS MOTION.

What’s interesting about this is that it suggests that metaphors occur at a level deeper than language — they inform how we generate thoughts to begin with. I’ve been thinking about this a lot lately, as I’ve been trying to write about how complex social problems are often discussed in inescapably metaphorical terms. For example, one frequently encounters talk of the “fight against corruption”, the implication being something like CORRUPTION IS AN ENEMY IN WAR. I think this can be useful to some extent, as it helps us understand the nature of the problem, but unhelpful in other ways, as for example it leaves ambiguous what the conditions for victory in this fight are.

Anyway. Where I’m going with this is that, if large swathes of our thoughts are metaphorical, relying on intuitive comparisons between domains, having a tool that allows one to visualise these comparisons would be incredibly interesting. It would let me see what the fight against corruption might look like, which might help me pinpoint why the comparison is or is not a good one.

Another example: often, when I’m trying to do multiple tasks at once, I think of an image of a man spinning multiple plates. You can find dozens of visualisations of this online — here’s a clip of a guy doing it in 1958 —  but it’s not quite what I’m picturing.  I don’t want to claim that, if I could only use a tool to generate the right image of a plate-spinning man, an image that is sufficiently similar to my internal experience, that I would then understand something new about the nature of multitasking. That doesn’t seem true.

But at the same time, if I was able to perform this kind of check regularly, where I generate images based on the implicit metaphors that are informing my thoughts, I’d guess that that might lead me to notice which metaphors are particularly good or bad at helping me understand the world, and that my worldview would update accordingly. So I can see how one could draw a line between using this kind of AI system regularly, and enriching one’s internal experience. Indeed, that seems to be what Sam Altman, CEO of OpenAI (the organisation that developed DALL-E 2) is getting at when he says that using the system has “slightly rewired [his] brain”.

I don’t think this use-case of clarifying conceptual metaphors is the only way a system like DALL-E 2 could enrich thought. It also just seems like a fun way to explore the nonsense word fragments that sometimes pop into one’s head. The other day, for example, the phrase “crocodiles in Lacoste!” showed up while I was driving. With DALL-E 2, I could really see that. This would be low-grade amusing to current, mid-20s me, but I think it would’ve absolutely blown my child self’s mind.

Younger me would’ve recognised that here was a way to fill in all the specifics of the fictional world in which I spent so many afternoons. Here was a way to nail down the aesthetics of my fellow adventures, my friends reimagined, as I experienced them internally. Here was a way to upgrade my imagination. That’s how the kid in me feels now, at any rate.

I don’t know where any of this will lead, but, x-risk aside, I’m eager to find out.

Discuss

### D&D.Sci Divination: Nine Black Doves

7 мая, 2022 - 02:02
Published on May 6, 2022 11:02 PM GMT

This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.

STORY (skippable)

The omens are troubling for us Romans this year!  Black doves have been spotted circling above major cities in all three of the Empire's main provinces.  Not only that, news from abroad confirms that they have been seen across the world, from the wastes of Britannia in the West to the lush shores of Persia in the East.

There is confident general agreement that the black doves are a Very Bad Omen.  No two people quite agree on why, though.  Some say that they are emissaries of Pluto, sent out to escort the souls of the dead, and that their coming means Pluto must expect many to die.  Others say that the birds have foreseen Jupiter's displeasure, and are taking flight to escape it.  Captive philosophers from the Grecian territories say something about how they are the shadows on the walls of a cave cast from the true forms outside - but since the philosophers also say that we are all trapped in a cave at all times, no-one really takes them seriously.

You are a diviner in Rome.  Your unconventional data-driven approach to divination has not endeared you to the broader divination community, so business has been quite scarce.  However, two weeks ago, the main College of Divination in Rome was struck by lightning and burned down.  Few of the staff were actually harmed, but the broad view is that anyone who built their college somewhere it would get struck by lightning clearly isn't a very good diviner.  As such, when the recent ill omens arose, the Emperor approached you (as the best-known non-College diviner) to ask for advice on how best to weather this situation.  This could be your big break, and your chance to demonstrate the superiority of your school of Data-Driven Divination!

The Emperor has given you the dataset gathered by years of Imperial diviners, which tracks the Omens that were recorded and the Disasters that happened in each year.

The good news is that this dataset is extremely extensive (reaching back to the founding of the Imperial system of Divination, which they mark as Year 0 in their system) and comprehensive (Imperial surveyors gather an annual update on every interesting omen and disaster that happened in the past year, even in foreign countries).

The bad news is that, rather than tracking the things your study of Data-Driven Divination has led you to think might be useful predictors (such as the total population of each region, or the value of goods and services produced therein), it tracks the things the existing diviners think are useful predictors, such as whether children have been born with an unusual number of heads.  You're...not actually sure these are good predictors of disaster?  Still, they're what you've got, so you might as well work with them.

If you can do well, and impress the Emperor, you think you can convince him to adopt your school of Data-Driven Divination across the Empire, and usher in a new era of data-driven prosperity!

If you do badly...yeah, he'll probably have you executed.  So.  Uh.  No pressure.

DATA & OBJECTIVES

The Emperor has allocated a budget of 60,000 denarii to mitigation efforts for the upcoming year (Year 1081) across the three provinces of the Roman Empire (Hispania, Italia and Grecia).  His administrators have devised the following strategies:

• For 5,000 denarii, the priests in any one province can conduct a ritual entreating Vulcan to hold the Titans firmly in their prison beneath the earth.  This will reduce the risk of earthquakes in that province by 80% this year.
• For 10,000 denarii, the priests in any one province can entreat the protection of Asclepius to ward off plague.  This will prevent all plague in that province this year.
• For 10,000 denarii, the Emperor can invest in a fire protection service in any one province.  This will reduce the risk of fires in that province by 70% this year.
• For 10,000 denarii, grain shipments can be made to any one province.  This will prevent all famines in that province this year.
• For 10,000 denarii, the soldiers on our border with any one adversary (Britannia, Germania or Persia) can be reinforced.  This will halve the chance that we are pillaged by them this year.
• For a further 5,000 denarii (15,000 in total), the soldiers can also be equipped and encouraged to attack that adversary.  This will double the chance that we pillage them this year (along with reducing the chance that they pillage us).

He's given you a dataset of historical omens and disasters, and asked your advice on which options to take.  You've asked about what exactly your goals should be, and got the following answer:

• Minimize the total number of disasters that take place in the Empire in the upcoming year (year 1081).
• The dataset lists all omens and disasters that have happened in prior years, from Year 0 to Year 1080.  The people and the Emperor believe that 'black doves this year' implies 'many disasters incoming', but you're welcome to use the dataset however you wish.
• All disasters are equally bad: three fires and one outbreak of plague, or two famines and two pillagings, both count the same, as '4 total disasters'.
• Successfully pillaging an enemy nation is exactly as good as a disaster is bad - it counts as -1 disasters to your score.
• You can be completely confident that the interventions function as advertised - your goal is simply to figure out which interventions provide optimal protection.
• You can buy an intervention multiple times pointed at different provinces/adversaries: for instance, you could spend 15,000 denarii for earthquake protection in all three  provinces.
• You cannot buy the same intervention twice for the same province/adversary: for instance, you cannot spend 20,000 denarii for twice as much fire protection in one province.
• Note that you do not have access to 'what omens happen in 1081' to predict 'what disasters happen in 1081'.  By the time the Imperial surveyors have gathered that information, 1081 will be over and the disasters will have already happened.  You will need to use omens from previous years (1080 and earlier) to predict what disasters will happen in 1081.
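The scoring rule above can be sketched as a simple tally. This is just an illustration of the objective, not code from the scenario; the disaster names and counts below are hypothetical:

```python
# Hedged sketch of the scoring rule: every disaster counts +1, and every
# successful pillaging of an adversary counts -1. Lower scores are better.
# All names and numbers here are illustrative, not from the actual dataset.

def score(disasters, successful_pillagings):
    """Total to minimize: disasters suffered minus our own successful raids."""
    return sum(disasters.values()) - successful_pillagings

# Example from the rules: three fires and one plague = 4 total disasters.
print(score({"fire": 3, "plague": 1}, successful_pillagings=0))  # 4

# Two famines and two enemy pillagings, offset by one successful raid of ours.
print(score({"famine": 2, "pillaged_by_enemy": 2}, successful_pillagings=1))  # 3
```

Since all disasters are weighted equally, the interventions should be ranked purely by how many expected disasters (or expected successful raids) each denarius buys.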

Based on the dataset, you need to advise the Emperor on how to spend his 60,000 denarii of budget.

I'll aim to post the ruleset and results on May 16th (giving one week and both weekends for players).  If you find yourself wanting extra time, comment below and I can push this deadline back.

As usual, working together is allowed, but for the sake of anyone who wants to work alone, please spoiler parts of your answers (type a '>' followed by a '!' at the start of a line to open a spoiler block) that contain information or questions about the dataset.

Thank you to abstractapplic, who reviewed a draft of this scenario.  (For the avoidance of doubt, abstractapplic does not have inside information on the scenario, and is free to play it).

Discuss