# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 44 minutes 32 seconds ago

### Are there any good ways to place a bet on RadicalXChange and/or related ideas/mechanisms taking off in a big way? e.g. is there something to invest $ in?

April 17, 2021 - 09:58
Published on April 17, 2021 6:58 AM GMT

Crossposted on Twitter: https://twitter.com/DavidSKrueger/status/1383313776938688521

Discuss

### Alex Flint on "A software engineer's perspective on logical induction"

April 17, 2021 - 09:56
Published on April 17, 2021 6:56 AM GMT

This Sunday at noon PT, Alex Flint will be giving a short talk on how he thinks about Logical Induction from a software engineer's perspective. He describes the talk as "My attempt to understand and re-explain logical induction as a practical alternative to probability." We'll be meeting in the Walled Garden at noon on Sunday, April 18th.

http://garden.lesswrong.com?code=qCM0&event=alex-flint-on-logical-induction-from-a-software-engineer-s

Discuss

### On Sleep Procrastination: Going To Bed At A Reasonable Hour

April 17, 2021 - 07:53
Published on April 17, 2021 2:10 AM GMT

Who Would Find This Article Most Helpful

• Those who find going to bed at a reasonable hour a major bottleneck to getting enough sleep and maintaining a healthy sleep schedule
• Those interested in thinking more deeply about their mindset with respect to sleep and productivity

TL;DR

Things I’ve Tried (Shortened)

• Paying someone else $0.01 for every minute later I go to bed than my bedtime. (Perhaps next time I could use SPAR.)
• FocusMate while doing bedtime routine.
• Posters reminding myself to sleep.
• App/Website blockers.
• Making rough calculations on productivity loss.

Things I am Currently Trying

• App/Website blockers.
• Listing out the ways I fail to go to sleep earlier and strategizing ways to combat those failure modes.

Mindset Shifts

• Framing staying up late to meet deadlines as a high-interest loan.
• Recognizing planning fallacy with variability in how we feel the next day while being sleep deprived.
• Realizing that giving up sleep to be successful backfires.

Recommended Exercises

• Making rough calculations for loss of productivity on sleep-deprived days.
• Enumerating and combatting failure modes that lead to sleep procrastination.

My Personal Sleep Procrastination

How I Procrastinate on Sleep

I procrastinate on going to sleep in a wide variety of seemingly unrelated ways:

• Having long phone calls
• Listening to good music
• Cleaning up my room
• Having messaging exchanges
• Reading cool articles or a good book
• Finishing up homework
• Making posters
• And many more!

Things I’ve Tried

• Paying someone else $0.01 for every minute later I go to bed than my bedtime. It worked in the sense that my sleep schedule became better, but I ended up losing enough money that it was painful enough for me to discontinue this. Having a sleep accountability buddy system could work better (two people paying each other, so that no one loses too much).
• FocusMate nightly routine sessions. I ended up cleaning my email instead of brushing my teeth, showering, and actually going to sleep.
• App & website blockers. I kept quitting the blockers. I am currently experimenting with a script (for Freedom on Mac) that reopens the blocker every five minutes, which seems quite effective. I do sometimes go on my phone/iPad instead to avoid the blockers.
• Listening to the audiobook Why We Sleep, but I still find going to bed at a reasonable hour hard.
• Posters reminding myself to sleep. They didn’t work, but they probably helped shift my mindset a little.

Given that a lot of these interventions didn’t work, I hypothesize that shifting my mindset about sleep (even though I know it is quite important) is probably necessary to ensure that my system of tools actually works.

Why Sleep is Hard & Making the Mindset Shift

1. Staying Up May (Sometimes) Be the Optimal Choice

There are times when we do not regret staying up late. In fact, in the short run, staying up late may well be the optimal decision. Here are some examples:

• Preparing a presentation that you need to give the next morning. It would be extremely unprofessional to have many visibly unfinished slides.
• Finishing up an assignment that is worth many points and has a hard due date, so that maybe you get a 90% on the assignment rather than a 50%.
• Studying for an exam (sometimes). For instance, in my experience, it could be better to learn 5 new Spanish vocabulary words and be slightly more sleep-deprived the next day than to have learned 0.
The general theme is that there is a hard deadline that one has to meet, and that it would be much better in the short run to meet it than to get the extra dose of sleep.[1]

The Problem

We might **extrapolate from the experiences where staying up later would be the optimal choice** and assume that we should stay up late to be more productive and get more done. However, forgoing sleep could be thought of as a high-interest loan.[2] You’re likely to be less effective and feel awful the next day. In addition, it takes roughly 4 days to recover 1 hour of sleep debt in terms of performing at the optimal level.

2. Variability in How We Feel the Next Day

Maybe sometimes you stayed up late and did something productive, and when you woke up the next day you didn’t feel that bad.

The Problem

Because of the planning fallacy, you think that you’ll end up ok tomorrow even if you sleep late tonight, when that is not always the case.

3. Stories of Successful People Who Sleep Very Little

Elon Musk. Steve Jobs. Nikola Tesla. Thomas Edison. (Maybe) your friends at elite universities.

The Problem

Our careers span decades. Maybe being sleep deprived for a few years can work out, but it is unsustainable in the long run. Steve Jobs died young. Nikola Tesla wrote love letters to his pigeon. Elon Musk’s tweets suggest that he may not be thinking clearly. Meanwhile, Jeff Bezos gets a full 8 hours.

In addition, correlation does not imply causation, and we cannot extrapolate from the habits of successful people due to survivorship bias; there are many people who have slept little but were not successful. Numerous studies suggest that more sleep increases productivity and that sleep deprivation is a major risk factor for developing burnout.

4. Wanting To Accomplish Something

If you don’t do X tonight, you don’t think you’ll ever do X. Or perhaps you feel like you haven’t been productive for the rest of the day and have an urge to stay up late to catch up on some work.
The Problem

Is doing X more important than getting enough sleep? Does doing X even matter at all? If it was actually that important, would I really never do it? Is it possible that you were unproductive during the day because you were sleep deprived? Recall that numerous studies suggest that sleep increases productivity and decreases the risk of burnout.

5. Insomnia

If you probably won’t fall asleep anyway, you might as well go to bed later.

The Problem

I’ve definitely experienced a lot of insomnia. I have plenty of recollections of being awake at 5am when I went to bed at a much more reasonable time. There are many great articles on how to fall asleep more easily. Personally, I found that wearing these blue/green blocking glasses[3] a couple of hours before sleeping (still experimenting with exact timing) and taking magnesium glycinate supplements[4] right before bed is a very low-effort way to mostly get rid of my insomnia, though this may or may not work for you.[5]

Exercises To Try

1. Reading Why We Sleep[6]

Realizing that sleep is the foundation for optimal performance and good health (the ability to eat healthily and the desire to exercise are both impacted) has led me to write this post and want to prioritize sleep. An easy way to do this is listening to an audiobook while going on walks or doing chores like cooking. I realize that there are many critiques of this book, but if you know that your sleep deprivation is actively harming you, which it was in my case, reading it is a net positive. Sometimes seeking out something helpful or useful in the short run may mean sacrificing a little bit of truth. This is sometimes ok.

2. Estimating Short-Term Productivity Decline from Sleep Deprivation

As someone who cares a lot about my productivity, having a rough number for how much less productive I will be when sleep deprived is a useful exercise, so that I don’t fall into the planning fallacy of believing that the next day will still be productive.
Note that these are extremely rough estimates. If you know of better estimates than I use here, please leave a comment and I can update these very rough calculations.

Let’s say 25% of the time when I am sleep deprived:

• I listen to 1 hour less of podcasts/audiobooks while doing chores like cooking or taking walks because I am too tired to listen to them.
• I am **30% less productive** when I am doing work. This means that if I were working 8 hours, I would take 2.4 hours longer to complete tasks.
• I spend 1.5 additional hours unproductively feeling awful.

This would amount to approximately 1 + 2.4 + 1.5 = **4.9 less productive hours** in the day.

50% of the time when I am sleep deprived:

• I listen to 30 minutes less of podcasts/audiobooks while doing chores like cooking or taking walks because I am too tired to listen to them.
• I am 10% less productive when I do work. This means that if I were working 8 hours, I would take 0.8 hours longer to complete tasks.
• I spend 1 additional hour unproductively feeling awful.

This would amount to approximately 0.5 + 0.8 + 1 = 2.3 less productive hours in the day.

10% of the time when I am sleep deprived:

• I am 5% less productive when I do work. This means that if I were working 8 hours, I would take 0.4 hours longer to complete tasks.

This would amount to approximately 0.4 less productive hours in the day.

Therefore, the expected value estimate is that I will be losing 0.25×4.9 + 0.50×2.3 + 0.10×0.4 ≈ 2.4 hours a day from a lack of sleep. My guess is that this is likely to be an underestimate because it doesn’t take into account poor decisions being made (choosing to prioritize task B over task A), the increased likelihood of falling sick if you are sleep deprived, the increased likelihood of burning out for an indefinite amount of time, and the fact that it takes around 4 days to make up for the lack of sleep.
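These back-of-the-envelope numbers can be reproduced in a few lines of Python (the scenario guesses are the post's own; note the middle scenario sums to 0.5 + 0.8 + 1 = 2.3 hours):

```python
# Back-of-the-envelope expected hours of productivity lost per sleep-deprived
# day, using the rough scenario guesses from the text.
scenarios = [
    # (probability, podcast hours lost, extra work hours, hours feeling awful)
    (0.25, 1.0, 0.30 * 8, 1.5),  # bad day: 30% slower across an 8-hour workday
    (0.50, 0.5, 0.10 * 8, 1.0),  # typical day: 10% slower
    (0.10, 0.0, 0.05 * 8, 0.0),  # mild day: 5% slower
]
expected_loss = sum(p * (podcasts + work + awful)
                    for p, podcasts, work, awful in scenarios)
print(f"Expected loss: {expected_loss:.2f} hours per sleep-deprived day")
```

The per-scenario sums come out to 4.9, 2.3, and 0.4 hours, for an expectation of roughly 2.4 hours a day.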
These estimates also do not account for the likely decrease in lifespan that may occur due to a chronic lack of sleep. That being said, productivity is not the only thing that suffers from a lack of sleep. It also impacts your subjective well-being, desire to exercise, ability to moderate your eating, social interactions, and more.

3. Enumerating Failure Modes

While I generally have an intuitive idea of why I fail to go to sleep early enough, explicitly listing out the failure modes leads to more concrete and effective targeting mechanisms. Here’s an example:

1. I realized that I stayed up to finish a task (downloading software and trying it out).
2. I listed out this failure mode and strategized ways to combat it. I realized that I was excited to get the task done and afraid that I would not ever finish the task if I didn’t stay up to do it.
3. I started running a recurring script (for Mac) every night that plays "You might think that you'll never do the task if you don't stay up to do it. If the task really is that important, you will do it later; if the task isn't, then sleep is more important" to my speakers/headphones.

Surprisingly, it worked. This is probably because:

• I internalized that sleep was more important than whatever it was I was doing after doing the above 2 exercises
• The script reminded me to start my nightly routine
• The timing of the script corresponds to when my website blocks start

Here is the link to a template for this spreadsheet. I’ve been doing this consistently for a couple of weeks already. I use Tab Snooze to automatically open up the tab every morning (recurring tab openings show up as a pro feature, but I haven’t paid for it and it still works for me).

Recap

A copy and paste of the TL;DR:

Things I’ve Tried (Shortened)

• Paying someone else $0.01 for every minute later I go to bed than my bedtime. (Perhaps next time I could use SPAR.)
• FocusMate while doing bedtime routine.
• Posters reminding myself to sleep.
• App/Website blockers.
• Making rough calculations on productivity loss.

Things I am Currently Trying

• App/Website blockers.
• Listing out the ways I fail to go to sleep earlier and strategizing ways to combat those failure modes.

Mindset Shifts

• Framing staying up late to meet deadlines as a high-interest loan.
• Recognizing planning fallacy with variability in how we feel the next day while being sleep deprived.
• Realizing that giving up sleep to be successful backfires.

Recommended Exercises

• Making rough calculations for loss of productivity on sleep-deprived days.
• Enumerating and combatting failure modes that lead to sleep procrastination.
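As an aside, the recurring audio reminder from the failure-modes exercise can be sketched in a few lines of Python. This is a hypothetical sketch, not the author's actual script; `say` is macOS's text-to-speech command, so substitute your own audio player elsewhere, and the bedtime cutoff is an arbitrary placeholder:

```python
import datetime
import subprocess

# The nightly reminder message quoted in the failure-modes example.
MESSAGE = ("You might think that you'll never do the task if you don't stay "
           "up to do it. If the task really is that important, you will do it "
           "later; if the task isn't, then sleep is more important.")

BEDTIME = datetime.time(22, 30)  # hypothetical cutoff; pick your own


def reminder(now, bedtime=BEDTIME):
    """Return the message once it is past bedtime, else None.
    (Naive: a time after midnight counts as before bedtime.)"""
    return MESSAGE if now >= bedtime else None


def speak(message):
    # macOS text-to-speech; swap in any audio player on other platforms.
    subprocess.run(["say", message])
```

Scheduled nightly (e.g. via cron or launchd), the glue is just `msg = reminder(datetime.datetime.now().time())` followed by `speak(msg)` whenever `msg` is not `None`.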

Acknowledgments

I appreciate Sydney Von Arx, Constantin, Chris Lakin, Raj Thimmiah, Talya, Kevin, Hawk, Ti Guo, Alan Taylor, Tony, and Aaron Gertler for reviewing a draft of this post and providing feedback. I take responsibility for all errors in this document.

I am also grateful to everyone who has nudged me towards prioritizing sleep and shared strategies on how to do so. I definitely feel better with a more stable sleep schedule.

PS: Say hi

Some of my most interesting and life-changing conversations and insights come from other people. Even if you don’t feel like you have anything to say, you probably have something interesting to share. All of us come from different backgrounds, so something that is obvious to you may have been overlooked by me.


Cross posted on blog.emily.fan and EA Forum

1. However, if you commit to going to bed at a reasonable time and, as a result, don't end up finishing the presentation, you will suffer in the short term. The next time a similar situation occurs, you will have a very strong incentive to start work earlier, and thus avoid this completely. Timeless decision theory may also be relevant here. ↩︎

2. The notion of sleep as a high interest loan was inspired by a friend who was inspired by the CFAR handbook. ↩︎

3. I found these red glasses more intense than the orange ones. Note that they smell bad initially, so you might want to first air them out. Also, this is an Amazon affiliate link if you feel like supporting me with your purchase and are based in the US. Otherwise, this link should work, and no hard feelings. ↩︎

4. Again, this is an Amazon affiliate link if you feel like supporting me with your purchase and are based in the US. Otherwise, this link should work, and no hard feelings. ↩︎

5. Another common supplement is melatonin. Melatonin worked great in terms of falling asleep quickly, but I did not wake up feeling well rested. ↩︎

6. Again, this is an Amazon affiliate link if you feel like supporting me with your purchase and are based in the US. Otherwise, this link should work, and no hard feelings. ↩︎

Discuss

### All is fair in love and war, on Zero-sum games in life

April 17, 2021 - 05:11
Published on April 17, 2021 2:11 AM GMT

Crossposted from my blog: Dark Rationality

“Why can’t we all just get along?” is a good question to ask, even if you won’t like the answer. Naval Ravikant posted on Twitter that people should avoid playing zero-sum games and focus on the positive-sum game of wealth creation. The irony is quite amusing, considering Twitter is a zero-sum, Silicon-Valley-engineered status game that causes many of its participants to become utterly obsessed with their follower counts. In this case, the medium is the message.

The case I’m going to make in this post is that, for humans, zero-sum games are not only unavoidable but actually very important, and that these truths are being suppressed and denied because of their dire implications.

Zero-sum games in life

The most famous and clear-cut zero-sum game in life is war: two or more sides fighting over some scarce resource, be it oil, land, or trade routes.

The entirety of human history has been filled with wars. In many cases, war can even be considered a negative-sum game, as the aggregate loss of both sides is often larger than the utility gain of the winning side. A contrarian reader would probably mention World War II and the positive influence it had on the development of many technologies, like computation or nuclear power. But I would argue that the price we paid in economic destruction and suffering was quite high, and it’s unclear if the result was positive on aggregate. Even if it was, this is the exception and not the norm.

But wars are not what they used to be. It seems that nuclear weapons and the MAD doctrine have made wars somewhat obsolete, so an argument can be made that while wars used to be a significant problem, their importance in human life is heavily declining.

The second big zero-sum game in life is status, and here things start to become unpleasant: it’s quite hard to deny that status is zero-sum, and while wars are becoming rare, status-wars don’t seem to follow suit. If you believe Peter Turchin’s ideas about the overproduction of elites, they might have even become worse with time. Judging by revealed preferences, social status is very important to people – many spend a large amount of resources on Veblen goods and signaling (the broke individual who still splurges on clothes is a trope for a reason). The Elephant in the Brain by Robin Hanson and Kevin Simler claims that many very basic facets of our life are actually related to signaling and status maneuvers.

The third big zero-sum game is competing for sexual partners (long- and short-term). One objection I sometimes encounter against viewing intrasexual competition as a zero-sum game is differing preferences. Imagine that Alice is dating Bob and Charles is dating Dina, but in reality Alice and Charles are far more compatible, and so are Bob and Dina. If they switched, you would get a Pareto improvement, which means it’s actually not a zero-sum game. While this argument is theoretically correct, in practice most people are looking for the same qualities: men usually put emphasis on youth, while women tend to appreciate power, but both men and women prefer attractive and healthy mates. So in practice, the competition for partners can mostly be described as zero-sum.[1] It is important to note that the zero-sum game is not between the partners in the relationship; if the relationship is healthy, it’s definitely a positive-sum game. The zero-sum game is the intrasexual competition; think of the stories of Helen of Troy or Bathsheba as archetypical representations of this phenomenon.

The fourth category of zero-sum games is more subtle but still happens quite frequently: zero-sum games hidden inside positive-sum games. A few examples:

• While employment at large is a positive-sum game, salary negotiations are a zero-sum game: every extra dollar the employee gets is a dollar that the employer could have kept, and vice versa.
• While the government could be described as a positive-sum game (at least if you agree with the Hobbesian view), the election process is a zero-sum game for the parties that participate.
• While marriage is a positive-sum game, a specific argument about who needs to get up to care for the crying baby at 4 am is zero-sum.

These are not the only examples; it seems that most interactions have some kind of zero-sum component: even if you work together to increase the size of the pie, the question of how the pie will be divided always stays pressing and relevant.

The grim implications

Two questions are worth asking: How important are the results of these games to our well-being? Can someone lead a happy life while consistently finding himself on the losing side of these games?

The answer to both of these questions seems to be complex. Losing a war can be a matter of death or enslavement, while we can easily imagine someone who utterly sucks at salary negotiations but still lives a perfectly good life.

While there is a significant variance in the importance of different games, in aggregate, doing well in zero-sum games is extremely important for thriving and happiness.

The losers of the male intrasexual competition, nowadays most commonly known as incels, seem to be extremely miserable, even though most of them are better off in most facets of life (health and material goods) than the average person in earlier times. It makes sense that failure in something as crucial (in the evolutionary sense) as reproduction will cause suffering – just like hunger, thirst, or physical pain.

In the case of status, there is at least some research showing that social status influences well-being, but most of the weight of the importance of status lies in revealed preferences. And while there is a lot of plausible deniability involved, I think the work of both Hanson and Veblen is very convincing in demonstrating just how much people really care about status.

Now if we accept the idea that winning zero-sum games is very important for human thriving and happiness the logical implications are quite unsettling.

First, it seems that conflict theorists were mostly right all along. If one’s chance at happiness and thriving depends on zero-sum competition and position-based results, then there is a very fundamental conflict of interests between people, and it’s quite rational to do things that might hurt everyone but will help you and your ingroup beat the competition. It might be better to be the chief of a bush tribe than a low-status incel in modern Switzerland.

It also means playing by the rules is a bad strategy if you’re at the bottom of the totem pole – you might be better off with the riskier “get rich or die tryin’” approach. If you got dealt a bad hand, playing by the rules means you’re probably going to lose, and if these zero-sum games are crucial for your happiness and reproduction, it makes sense to do whatever it takes to win. Hence: “All is fair in love and war.”

But the propagation of these truths might be bad both for society and for the ruling elite that benefits from people playing positive-sum games, as the elite get to keep their positional advantage while enjoying the economic growth that results from people creating value. That’s why the millionaire Naval Ravikant tells the clueless masses to avoid zero-sum games while aiming for status himself.

[1] – A more accurate term would be low-positive-sum games: situations where the utility of the expected Pareto improvements is very low compared to non-Pareto improvements.

Discuss

### What does vaccine effectiveness as a function of time look like?

April 17, 2021 - 03:36
Published on April 17, 2021 12:36 AM GMT

I've heard it takes ~10-14 days after a shot (either first or 2nd dose) to reach maximum effectiveness.
Can anyone point me towards a figure showing (estimated) effectiveness as a function of time?

Discuss

### Could degoogling be a practice run for something more important?

April 17, 2021 - 03:03
Published on April 17, 2021 12:03 AM GMT

Andrew Critch's recent threat model ends with the following:

We humans eventually realize with collective certainty that the companies have been trading and optimizing according to objectives misaligned with preserving our long-term well-being and existence, but by then their facilities are so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating. With no further need for the companies to appease humans in pursuing their production objectives, less and less of their activities end up benefiting humanity.
Eventually, resources critical to human survival but non-critical to machines (e.g., arable land, drinking water, atmospheric oxygen…) gradually become depleted or destroyed, until humans can no longer survive.

I occasionally see posts by people who believe that surveillance advertising is bad and that we should try to write Google out of our lives. Regardless of the merit of this argument, I admire the discipline it takes to degoogle. Gmail and Google Docs are really high quality and add a ton of value, which is why they've become so entrenched. I can scarcely imagine actually doing without Google Docs at this point!

It occurs to me: should we be practicing the skill of doing without something that's pervasively adding a lot of value, just in case that skill helps keep us from being enfeebled by an aligned AI system, or destroyed by a misaligned one? Is it also practice at coordinating, which would pay off more generally than just for AI problems?

Discuss

### Superrational Agents Kelly Bet Influence!

April 17, 2021 - 01:08
Published on April 16, 2021 10:08 PM GMT

As a follow-up to the Walled Garden discussion about Kelly betting, Scott Garrabrant made some super-informal conjectures to me privately, involving the idea that some class of "nice" agents would "Kelly bet influence", where "influence" had something to do with anthropics and acausal trade.

I was pretty incredulous at the time. However, as soon as he left the discussion, I came up with an argument for a similar fact. (The following does not perfectly reflect what Scott had in mind, by any means. His notion of "influence" was very different, for a start.)

The meat of my argument is just Critch's negotiable RL theorem. In fact, that's practically the entirety of my argument. I'm just thinking about the consequences in a different way from how I have before.

Superrationality

Rather than articulating a real decision theory that deals with all the questions of acausal trade, bargaining, commitment races, etc., I'm just going to imagine a class of superrational agents which solve these problems somehow. These agents "handshake" with each other and negotiate (perhaps acausally) a policy which is Pareto-optimal with respect to each of their preferences.

Negotiable RL

Critch's negotiable RL result studies the question of what an AI should do if it must serve multiple masters. For this post, I'll refer to the masters as "coalition members".

He shows the following:

Any policy which is Pareto-optimal with respect to the preferences of coalition members, can be understood as doing the following. Each coalition member is assigned a starting weight, with weights summing to one. At each decision, the action is selected via the weighted average of the preferences of each coalition member, according to the current weights. At each observation, the weights are updated via Bayes' Law, based on the beliefs of coalition members.
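Concretely (my notation, not a quote from the paper): if coalition member $i$ currently has weight $w_i$ and assigns probability $P_i(o)$ to the next observation $o$, then after observing $o$, Bayes' Law updates the weights to

```latex
w_i' = \frac{w_i \, P_i(o)}{\sum_j w_j \, P_j(o)}
```

so members whose predictions fare well gain influence over future decisions.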

He was studying what an AI's policy should be, when serving the coalition members; however, we can apply this result to a coalition of superrational agents who are settling on their own policy, rather than constructing a robotic servant.

Critch remarks that we can imagine the weight update as the result of bets which the coalition members would make with each other. I've known about this for a long time, and it made intuitive sense to me that they'll happily bet on their beliefs; so, of course they'll gain/lose influence in the coalition based on good/bad predictions.

What I didn't think too hard about was how they end up betting. Sure, the fact that it's equivalent to a Bayesian update is remarkable. But it makes sense once you think about the proof.

Or does it?

To foreshadow: the proof works from the assumption of Pareto optimality. So it collectively makes sense for the agents to bet this way. But the "of course it makes sense for them to bet on their beliefs" line of thinking tricks you into thinking that it individually makes sense for the agents to bet like this. However, this need not be the case.

Kelly Betting & Bayes

The Kelly betting fraction can be written as:

f = (rp − 1) / (r − 1)

Where p is your probability for winning, and r is the return rate if you win (ie, if you stand to double your money, r = 2; etc).

Now, it turns out, betting f of your money (and keeping the rest in reserve) is equivalent to betting p of your money and putting (1-p) on the other side of the bet. Betting against yourself is a pretty silly thing to do, but since you'll win either way, there's no problem:

• If you win, you've got r·f, plus the (1 − f) you held.
• r·f = (rp − 1) / (1 − 1/r)
• (1 − f) = (1 − p) / (1 − 1/r)
• So the sum = (rp − 1)/(1 − 1/r) + (1 − p)/(1 − 1/r) = (rp − p)/(1 − 1/r) = p·r(r − 1)/(r − 1) = p·r of your initial money.
• If you lose, you've still got (1 − f) of what you had.
• So this is just (1 − p)/(1 − 1/r).

Betting against yourself, with fractions matching your beliefs:

• If you win, you've got r·p of your money.
• If you lose, the payoff ratio (assuming you can get the reverse odds for the reverse bet) is 1/(1 − 1/r). So, since you put down (1 − p), you get (1 − p)/(1 − 1/r).
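A quick numeric sanity check of the equivalence (my own sketch, with arbitrary example numbers):

```python
# Sanity check (my example numbers, not from the post): the two strategies
# pay out identically.
p, r = 0.6, 1.8                  # belief and gross return rate

f = (r * p - 1) / (r - 1)        # Kelly fraction

# Strategy (a): bet fraction f, hold 1 - f in reserve.
kelly_win = r * f + (1 - f)
kelly_lose = 1 - f

# Strategy (b): put p on one side and 1 - p on the other.
both_win = r * p                      # the p staked returns r*p; the 1 - p is lost
both_lose = (1 - p) / (1 - 1 / r)     # reverse bet pays at ratio 1/(1 - 1/r)

assert abs(kelly_win - both_win) < 1e-9
assert abs(kelly_lose - both_lose) < 1e-9
assert abs(kelly_win - p * r) < 1e-9  # matches the p·r from the derivation
```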

But now imagine that a bunch of bettors are using the second strategy to make bets with each other, with the "house odds" being the weighted average of all their beliefs (weighted by their bankrolls, that is). Aside from the betting-against-yourself part, this is a pretty natural thing to do: these are the "house odds" which make the house revenue-neutral, so the house never has to dig into its own pockets to award winnings.

You can imagine that everyone is putting money on two different sides of a table, to indicate their bets. When the bet is resolved, the losing side is pushed over to the winning side, and everyone who put money on the winning side picks up a fraction of money proportional to the fraction they originally contributed to that side. (And since payoffs of the bet-against-yourself strategy are exactly identical to Kelly betting payoffs, a bunch of Kelly bets at house odds rearrange money in exactly the same way as this.)

But this is clearly equivalent to how hypotheses redistribute weight during Bayesian updates!

So, a market of Kelly bettors re-distributes money according to Bayesian updates.
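This equivalence is easy to check numerically. The following sketch (mine, not from the post) treats each bettor as a hypothesis and compares the table-style payout to a Bayesian update:

```python
# Sketch: verify that a market of Kelly bettors at bankroll-weighted house
# odds moves money exactly like a Bayesian update over hypotheses.

def bayes_update(bankrolls, probs, outcome=True):
    """Treat each bettor as a hypothesis: posterior ∝ prior × likelihood,
    rescaled so the total amount of money is conserved."""
    likelihoods = probs if outcome else [1 - p for p in probs]
    unnorm = [w * l for w, l in zip(bankrolls, likelihoods)]
    total, z = sum(bankrolls), sum(unnorm)
    return [total * u / z for u in unnorm]

def kelly_market(bankrolls, probs, outcome=True):
    """Each bettor puts fraction p of their bankroll on YES and (1 - p) on NO.
    The whole table is then handed to the winning side, split in proportion
    to each bettor's contribution to that side."""
    yes = [w * p for w, p in zip(bankrolls, probs)]
    no = [w * (1 - p) for w, p in zip(bankrolls, probs)]
    winners = yes if outcome else no
    pot = sum(yes) + sum(no)  # all money on the table
    return [pot * s / sum(winners) for s in winners]

bankrolls = [10.0, 30.0, 60.0]  # bankrolls, i.e. coalition weights
probs = [0.9, 0.5, 0.2]         # each bettor's probability of YES

a = bayes_update(bankrolls, probs)
b = kelly_market(bankrolls, probs)
assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
```

Here the bettor who assigned 0.9 to the outcome grows from 10 to 25, exactly as a hypothesis with likelihood 0.9 would gain posterior weight under Bayes.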

Altruistic Bets

Therefore, we can interpret the superrational coalition members as betting their coalition weight, according to the Kelly criterion.

But, this is a pretty weird thing to do!

I've argued that the main sensible justification for using the Kelly criterion is if you have utility logarithmic in wealth. Here, this translates to utility logarithmic in coalition weight.

It's possible that under some reasonable assumptions about the world, we can argue that utility of coalition members will end up approximately logarithmic. But Critch's theorem applies to lots of situations, including small ones where there isn't any possibility for weird things to happen over long chains of bets as in some arguments for Kelly.

Typically, final utility will not even be continuous in coalition weight: small changes in coalition weight often won't change the optimal strategy at all, but at select tipping points, the optimal strategy will totally change to reflect the reconfigured trade-offs between preferences.

Intuitively, these tipping points should factor significantly in a coalition member's betting strategy; you'd be totally indifferent to small bets which can't change anything, but avoid specific transitions strongly, and seek out others. If the coalition members were betting based on their selfish preferences, this would be the case.

Yet, the coalition members end up betting according to a very simple formula, which does not account for any of this.

Why?

We can't justify this betting behavior from a selfish perspective (that is, not with the usual decision theories); as I said, the bets don't make sense.

But we're not dealing with selfish agents. These agents are acting according to a Pareto-optimal policy.

And that's ultimately the perspective we can justify the bets from: these are altruistically motivated bets. Exchanging coalition weight in this way is best for everyone. It keeps you Pareto-optimal!

This is very counterintuitive. I suspect most people would agree with me that there seems to be no reason to bet, if you're being altruistic rather than selfish. Not so! They're not betting for their personal benefit. They're betting for the common good!

Of course, that fact is a very straightforward consequence of Critch's theorem. It shouldn't be surprising. Yet, somehow, it didn't stick out to me in quite this way. I was too stuck in the frame of trying to interpret the bets selfishly, as Pareto-improvements which both sides happily agree to.

I'm quite curious whether we can say anything interesting about how altruistic agents would handle money, based on this. I don't think it means altruists should Kelly bet money; money is a very different thing from coalition weight. Coalition weights are like exchange rates or prices. Money is more of a thing being exchanged. You do not pay coalition weight in order to get things done.

Discuss

### Most Analogies Are Wrong

April 17, 2021 - 00:21
Published on April 16, 2021 7:53 PM GMT

I remember a plant biology class I took at university, where the lecturer said something along the lines of "A plant needs to take in resources and spend them like a business or a bank, but this analogy isn't great. Can anyone explain why?". It took a while for someone to come up with the response the lecturer was looking for: "plants take in and spend multiple 'currencies', whereas a bank only uses money". There is a better response.

It is up to the person making an argument by analogy to explain why their analogy is good. It is not up to other people to explain why it is bad. Analogies often sneak in assumptions, or lose much of the deep detail present in the subject. People often end up arguing over the analogy rather than the substance.

The strongest analogies have the form "[A], [A] is like [a], [a] implies [b], [b] is like [B], therefore [B]". Many analogies run as "[A], [a] implies [b], therefore [B]", skipping the steps which are both the most important and, often, where the cracks in the argument show.

Morality as taxes goes into this a bit. Intuition pumps in philosophy are more akin to a rhetorical device than a strong argument. If you need to use an analogy to convince someone of your point, then they may not be worth discussing the issue with: the need for an analogy may mean they don't understand, or don't want to engage with, the subject matter. The other likely option is that they think you can't understand the subject matter, or even worse, they want you to accept their conclusion without engaging with the subject matter.

Instead of analogies I tend to use examples: in a discussion about AI risks, rather than saying "Imagine if gorillas decided to create humans to solve gorilla problems. No matter how much the gorillas outnumber the humans, the humans will be able to escape their cages by acting in ways which a gorilla could not have imagined," I would much rather say something like "Imagine creating an AI system which manipulates sensors and billboards in order to discourage crime in an area. If the crime is evaluated by humans watching video feeds, the most effective billboard messages will simply tell criminals where the blind spots in the camera systems are." The second one is a much stronger argument. (Here I have used two examples within an example.)

Most people are aware that analogies can fall down or fall apart. For this reason, most analogies that people use do not fall apart rapidly, but they are still wrong. If you cannot argue without analogy, you are likely confused about the question at hand, and you will improve your thoughts either by learning more about the technical details of the subject, or by dissolving the question.

Discuss

### The Scout Mindset - read-along

April 16, 2021 - 22:43
Published on April 16, 2021 7:43 PM GMT

I just started reading Julia Galef's new book "The Scout Mindset: Why Some People See Things Clearly and Others Don't".  Here's a description:

When it comes to what we believe, humans see what they want to see. In other words, we have what Julia Galef calls a "soldier" mindset. From tribalism and wishful thinking, to rationalizing in our personal lives and everything in between, we are driven to defend the ideas we most want to believe—and shoot down those we don't.

But if we want to get things right more often, argues Galef, we should train ourselves to have a "scout" mindset. Unlike the soldier, a scout's goal isn't to defend one side over the other. It's to go out, survey the territory, and come back with as accurate a map as possible. Regardless of what they hope to be the case, above all, the scout wants to know what's actually true.

In The Scout Mindset, Galef shows that what makes scouts better at getting things right isn't that they're smarter or more knowledgeable than everyone else. It's a handful of emotional skills, habits, and ways of looking at the world—which anyone can learn. With fascinating examples ranging from how to survive being stranded in the middle of the ocean, to how Jeff Bezos avoids overconfidence, to how superforecasters outperform CIA operatives, to Reddit threads and modern partisan politics, Galef explores why our brains deceive us and what we can do to change the way we think.

This seems like a good book to do a read-along to, since there are probably a decent number of people reading it at the same time.

If possible, put your comments under the correct chapter parent comment as you go. If you get further ahead than me, feel free to create a new chapter parent comment.

Discuss

### Why has nuclear power been a flop?

April 16, 2021 - 19:49
Published on April 16, 2021 4:49 PM GMT

To fully understand progress, we must contrast it with non-progress. Of particular interest are the technologies that have failed to live up to the promise they seemed to have decades ago. And few technologies have failed to live up to a greater promise than nuclear power.

In the 1950s, nuclear was the energy of the future. Two generations later, it provides only about 10% of world electricity, and reactor design hasn't fundamentally changed in decades. (Even “advanced reactor designs” are based on concepts first tested in the 1960s.)

So as soon as I came across it, I knew I had to read a book just published last year by Jack Devanney: Why Nuclear Power Has Been a Flop.

What follows is my summary of the book—Devanney's arguments and conclusions, whether or not I fully agree with them. I'll give my own thoughts at the end.

The Gordian knot

There is a great conflict between two of the most pressing problems of our time: poverty and climate change. To avoid global warming, the world needs to massively reduce CO2 emissions. But to end poverty, the world needs massive amounts of energy. In developing economies, every kWh of energy consumed is worth roughly $5 of GDP.

How much energy do we need? Just to give everyone in the world the per-capita energy consumption of Europe (which is only half that of the US), we would need to more than triple world energy production, increasing our current 2.3 TW by over 5 additional TW:

Devanney Fig 1.3: Regional distribution of electricity consumption

If we account for population growth, and for the decarbonization of the entire economy (building heating, industrial processes, electric vehicles, synthetic fuels, etc.), we need more like 25 TW:

Devanney Fig 1.4: Electricity consumption in a decarbonized world

This is the Gordian knot. Nuclear power is the sword that can cut it: a scalable source of dispatchable (i.e., on-demand), virtually emissions-free energy. It takes up very little land, consumes very little fuel, and produces very little waste. It's the technology the world needs to solve both energy poverty and climate change. So why isn't it much bigger? Why hasn't it solved the problem already? Why has it been “such a tragic flop?”

Nuclear is expensive but should be cheap

The proximal cause of nuclear's flop is that it is expensive. In most places, it can't compete with fossil fuels. Natural gas can provide electricity at 7–8 cents/kWh; coal at 5 c/kWh.

Why is nuclear expensive? I'm a little fuzzy on the economic model, but the answer seems to be that it's in design and construction costs for the plants themselves. If you can build a nuclear plant for around $2.50/W, you can sell electricity cheaply, at 3.5–4 c/kWh. But costs in the US are around 2–3x that. (Or they were—costs are so high now that we don't even build plants anymore.)
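The link from overnight capital cost to electricity price can be roughed out with a standard amortization calculation. A sketch under my own illustrative assumptions (the 10%/yr fixed charge rate and 90% capacity factor are mine, not figures from the book):

```python
# Rough levelized *capital* component of nuclear electricity cost.
# Assumptions (illustrative): 10%/yr fixed charge rate covering
# amortization plus interest, and a 90% capacity factor.
def capital_cost_c_per_kwh(overnight_usd_per_w,
                           fixed_charge_rate=0.10,
                           capacity_factor=0.90):
    usd_per_kw = overnight_usd_per_w * 1000          # $/W -> $/kW
    annual_capital = usd_per_kw * fixed_charge_rate  # $ per kW-year
    kwh_per_year = 8760 * capacity_factor            # kWh per kW-year
    return 100 * annual_capital / kwh_per_year       # cents per kWh

print(capital_cost_c_per_kwh(2.50))  # ~3.2 c/kWh capital component
print(capital_cost_c_per_kwh(6.00))  # ~7.6 c/kWh at 2-3x the build cost
```

On these assumptions, a $2.50/W plant leaves room for fuel and operations within the 3.5–4 c/kWh cited above, while US-level build costs push the capital component alone above what gas and coal charge for the whole kWh.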

Why are the construction costs high? Well, they weren't always high. Through the 1950s and '60s, costs were declining rapidly. A law of economics says that costs in an industry tend to follow a power law as a function of production volume: that is, every time production doubles, costs fall by a constant percent (typically 10 to 25%). This function is called the experience curve or the learning curve. Nuclear followed the learning curve up until about 1970, when it inverted and costs started rising:

Devanney Figure 7.11: USA Unit cost versus capacity. From P. Lang, “Nuclear Power Learning and Deployment Rates: Disruption and Global Benefits Forgone” (2017)
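The experience curve described above can be written as cost(n) = c0 · n^b with b = log2(1 − r), where r is the fractional cost drop per doubling of cumulative production. A quick sketch (the 20% learning rate is an illustrative value within the 10–25% range quoted above):

```python
import math

def experience_curve_cost(n, c0=1.0, learning_rate=0.20):
    """Unit cost after cumulative production n (n=1 is the first unit).

    Each doubling of cumulative production cuts cost by `learning_rate`.
    """
    b = math.log2(1 - learning_rate)   # e.g. log2(0.8) ~ -0.322
    return c0 * n ** b

# Doubling production multiplies cost by (1 - learning_rate):
print(experience_curve_cost(2) / experience_curve_cost(1))  # 0.8
print(experience_curve_cost(64))  # after 6 doublings: 0.8**6 ~ 0.26
```

On a log-log plot this is a straight line, which is why the post-1970 inversion in the chart is so striking: the industry left the line entirely.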

Plotted over time, with a linear y-axis, the effect is even more dramatic. Devanney calls it the “plume,” as US nuclear construction costs skyrocketed upwards:

Devanney Figure 7.10: Overnight nuclear plant cost as a function of start of construction. From J. Lovering, A. Yip, and T. Nordhaus, “Historical construction costs of global nuclear reactors” (2016)

This chart also shows that South Korea and India were still building cheaply into the 2000s. Elsewhere in the text, Devanney mentions that Korea, as late as 2013, was able to build for about $2.50/W.

The standard story about nuclear costs is that radiation is dangerous, and therefore safety is expensive. The book argues that this is wrong: nuclear can be made safe and cheap. It should be 3 c/kWh—cheaper than coal.

Safety

Fundamental to the issues of safety is the question: what amount of radiation is harmful?

Very high doses of radiation can cause burns and sickness. But in nuclear power safety, we're usually talking about much lower doses. The concern with lower doses is increased long-term cancer risk. Radiation can damage DNA, potentially creating cancerous cells.

But wait: we're exposed to radiation all the time. It occurs naturally in the environment—from sand and stone, from altitude, even from bananas (which contain radioactive potassium). So it can't be that even the tiniest amount of radiation is a mortal threat.

How, then, does cancer risk relate to the dose of radiation received? Does it make a difference if the radiation hits you all at once, vs. being spread out over a longer period? And is there anything like a “safe” dose, any threshold below which there is no risk?

Linear No Threshold

The official model guiding US government policy, both at the EPA and the Nuclear Regulatory Commission (NRC), is the Linear No Threshold model (LNT). LNT says that cancer risk is directly proportional to dose, that doses are cumulative over time (rate doesn't matter), and that there is no threshold or safe dose.

The problem with LNT is that it flies in the face of both evidence and theory.

First, theory. We know that cells have repair mechanisms to fix broken DNA. DNA gets broken all the time, and not just from radiation. And remember, there is natural background radiation from the environment.
If cells weren't able to repair DNA, life would not have survived and evolved on this planet. When DNA breaks, it migrates to special “repair centers” within the cell, which put the strands back together within hours. However, this is a highly non-linear process: these centers can correctly repair breaks at a certain rate, but as the break rate increases, the error rate of the repair process goes up drastically. This also implies that dose rate matters: a given amount of radiation is more harmful if received all at once, and less if spread out over time.

(In both of these details, I think of this as analogous to alcohol being processed out of the bloodstream by the liver: a low dose can be handled, but overwhelm the system and it quickly becomes toxic. One beer a night for a month might not even get you tipsy; the same amount in a single night would kill you.)

Radiotherapy takes advantage of this. When radiotherapy is applied to tumors, non-linear effects allow doctors to do much more damage to the tumor than to surrounding tissue. And doses of therapy are spread out over multiple days, to give the patient time to recover.

Devanney also assembles a variety of types of evidence about radiation damage from a range of sources. Indeed, his argument against LNT is by far the longest chapter in the book, weighing in at over 50 pages (out of fewer than 200).
He looks at studies of:

• The nuclear bomb survivors of Hiroshima and Nagasaki
• The effects of radon gas
• Animal experiments in beagles and mice
• UK radiologists (tracked over 100 years)
• Radiation workers across fifteen countries
• Nuclear shipyard workers (using a closely matched control group of non-nuclear workers in the same yards)
• Areas with naturally high levels of background radiation from sources such as thorium-containing sand or radon: Finland; Ramsar, Iran; Guarapari, Brazil; Yangjiang, China; and Kerala, India
• The population of Washington County, Utah, 200 miles downwind of a nuclear test site in Nevada that was used in the 1950s
• The Chernobyl cleanup crew, including the guys who had to shovel chunks of core graphite off the roof of one of the buildings and toss them into the gaping hole from the explosion
• An incident in Taipei in which an apartment was accidentally built with rebar containing radioactive cobalt-60
• The women who hand-painted radium onto watch dials in the early 20th century (some of whom would lick the brushes to form a point)
• A 1950 trial that violated every conceivable standard of medical ethics by injecting unknowing and non-consenting patients with plutonium

In the last case, all of the patients had been diagnosed with terminal disease. None of them died from the plutonium—including one patient, Albert Stevens, who had been misdiagnosed with terminal stomach cancer that turned out to be an operable ulcer. He lived for more than twenty years after the experiment, over which time he received a cumulative dose of 64 sieverts, one-tenth of which would have killed him if received all at once. He died from heart failure at the age of 79.

The weight of all of this evidence is that low doses of radiation do not cause detectable harm.
Little to no cancer, or at least far less than predicted by LNT, is found in the subjects receiving low doses, such as workers operating under modern safety standards, or populations in high-background areas. (In fact, there is some evidence of a beneficial effect from very low doses, although nothing in Devanney's overall argument depends on this, nor does he stress it.) In populations where some subjects did receive high doses, the response curves tend to look decidedly non-linear.

The other finding from these studies is that dose rate matters. This was the explicit finding of an MIT study in mice, and it is the unmistakable conclusion of the case of Albert Stevens, who lived over two decades with plutonium in his bloodstream.

(At least, all this is Devanney's interpretation—it is not always the conclusion written in the papers. Devanney argues, not unconvincingly, that in many cases the researchers' conclusions are not supported by their own data.)

ALARA

Excessive concern about low levels of radiation led to a regulatory standard known as ALARA: As Low As Reasonably Achievable. What defines “reasonable”? It is an ever-tightening standard: as long as the costs of nuclear plant construction and operation are in the ballpark of other modes of power, they are deemed reasonable.

This might seem like a sensible approach, until you realize that it eliminates, by definition, any chance for nuclear power to be cheaper than its competition. Nuclear can't even innovate its way out of this predicament: under ALARA, any technology, any operational improvement, anything that reduces costs, simply gives the regulator more room and more excuse to push for more stringent safety requirements, until the cost once again rises to make nuclear just a bit more expensive than everything else. Actually, it's worse than that: it essentially says that if nuclear becomes cheap, then the regulators have not done their job.

What kinds of inefficiency resulted?
An example was a prohibition against multiplexing, resulting in thousands of sensor wires leading to a large space called a cable spreading room. Multiplexing would have cut the number of wires by orders of magnitude while at the same time providing better safety through multiple, redundant paths. A plant that required 670,000 yards of cable in 1973 required almost double that, 1,267,000, by 1978, whereas “the cabling requirement should have been dropping precipitously” given progress at the time in digital technology.

Another example was the acceptance in 1972 of the Double-Ended-Guillotine-Break of the primary loop piping as a credible failure. In this scenario, a section of the piping instantaneously disappears. Steel cannot fail in this manner. As usual, Ted Rockwell put it best: “We can't simulate instantaneous double ended breaks because things don't break that way.” Designing to handle this impossible casualty imposed very severe requirements on pipe whip restraints, spray shields, sizing of Emergency Core Cooling Systems, emergency diesel start up times, etc., requirements so severe that they pushed the designers into using developmental, unrobust technology. A far more reliable approach is Leak Before Break, by which the designer ensures that a stable crack will penetrate the piping before larger-scale failure.

Or take this example (quoted from T. Rockwell, “What's wrong with being cautious?”):

A forklift at the Idaho National Engineering Laboratory moved a small spent fuel cask from the storage pool to the hot cell. The cask had not been properly drained and some pool water was dribbled onto the blacktop along the way. Despite the fact that some characters had taken a midnight swim in such a pool in the days when I used to visit there and were none the worse for it, storage pool water is defined as a hazardous contaminant.
It was deemed necessary therefore to dig up the entire path of the forklift, creating a trench two feet wide by a half mile long that was dubbed Toomer's Creek, after the unfortunate worker whose job it was to ensure that the cask was fully drained. The Bannock Paving Company was hired to repave the entire road. Bannock used slag from the local phosphate plants as aggregate in the blacktop, which had proved to be highly satisfactory in many of the roads in the Pocatello, Idaho area. After the job was complete, it was learned that the aggregate was naturally high in thorium, and was more radioactive than the material that had been dug up, marked with the dreaded radiation symbol, and hauled away for expensive, long-term burial.

The Gold Standard

Overcautious regulation interacted with economic history in a particular way in the mid-20th century that played out very badly for the nuclear industry.

Nuclear engineering was born with the Manhattan Project during WW2. Nuclear power was initially adopted by the Navy. Until the Atomic Energy Act of 1954, all nuclear technology was the legal monopoly of the US government.

In the '50s and '60s, the nuclear industry began to grow. But it was competing with extremely abundant and cheap fossil fuels, a mature and established technology. Amazingly, the nuclear industry was not killed by this intense competition—evidence of the extreme promise of nuclear.

Then came the oil shocks of the '70s. Between 1969 and 1973, oil prices tripled to $11/barrel. This should have been nuclear's moment! And indeed, there was a boom in both coal and nuclear.

But as supply expands to meet demand, costs rise to meet prices. The costs of both coal and nuclear rose. In the coal power industry, this took the form of more expensive coal from marginal mines, higher wages paid to labor who now had more bargaining power, etc. In the nuclear industry, it took the form of ever more stringent regulation, and the formal adoption of ALARA. Prices were high, so the pressure was on to get construction approved as quickly as possible, regardless of cost. Nuclear companies stopped pushing back on the regulators and started agreeing to anything in order to move the process along. The regulatory regime that resulted is now known as the Gold Standard.

The difference between the industries is that the cost rises in coal could, and did, reverse as prices came down. But regulation is a ratchet. It goes in one direction. Once a regulation is in place, it's very difficult to undo.

Even worse was the practice of “backfitting”:

The new rules would be imposed on plants already under construction. A 1974 study by the General Accountability Office of the Sequoyah plant documented 23 changes “where a structure or component had to be torn out and rebuilt or added because of required changes.” The Sequoyah plant began construction in 1968, with a scheduled completion date of 1973 at a cost of $300 million. It actually went into operation in 1981 and cost$1700 million. This was a typical experience.

Bottom line: Ever since the '70s, nuclear has been stuck with burdensome regulation and high prices—to the point where it's now accepted that nuclear is inherently expensive.

Regulator incentives

The individuals who work at the NRC are not anti-nuclear. They are strongly pro-nuclear—that's why they went to work for a nuclear agency in the first place. But they are captive to institutional logic and to their incentive structure.

The NRC does not have a mandate to increase nuclear power, nor any goals based on its growth. They get no credit for approving new plants. But they do own any problems. For the regulator, there's no upside, only downside. No wonder they delay.

Further, the NRC does not benefit when power plants come online. Their budget does not increase in proportion to gigawatts generated. Instead, the nuclear companies themselves pay the NRC for the time it spends reviewing applications, at something close to $300 an hour. This creates a perverse incentive: the more overhead, the more delays, the more revenue for the agency. The result: the NRC approval process now takes several years and costs literally hundreds of millions of dollars.

The Big Lie

Devanney puts a significant amount of blame on the regulators, but he also lays plenty at the feet of industry.

The irrational fear of very low doses of radiation leads to the idea that any reactor core damage, leading to any level whatsoever of radiation release, would be a major public health hazard. This has led the entire nuclear complex to foist upon the public a huge lie: that such a release is virtually impossible and will never happen, or will happen with a frequency of less than one in a million reactor-years.

In reality, we've seen three major disasters—Chernobyl, Three Mile Island, and Fukushima—in less than 15,000 reactor-years of operation worldwide. We should expect about one accident per 3,000 reactor-years going forward, not one per million. If nuclear power were providing most of the world's electricity, there would be an accident every few years.

Instead of selling a lie that a radiation release is impossible, the industry should communicate the truth: releases are rare, but they will happen; and they are bad, but not unthinkably bad. The deaths from Chernobyl, 35 years ago, were due to unforgivably bad reactor design that we've advanced far beyond now. There were zero deaths from radiation at Three Mile Island or at Fukushima. (The only deaths from the Fukushima disaster were caused by the unnecessary evacuation of 160,000 people, including seniors in nursing homes.)

In contrast, consider aviation: an airplane crash is a tragedy. It kills hundreds of people.
The public accepts this risk not only because of the value of flying, but because these crashes are rare. And further, because the airline industry does not lie about the risk of crashes. Rather than saying “a crash will never happen,” they put data-collecting devices on every plane so that when one inevitably does crash, they can learn from it and improve. This is a healthy attitude towards risk that the nuclear industry should emulate.

Testing

Another criticism the book makes of the industry is its approach to QA and the general lack of testing.

Many questions arise during NRC design review: how a plant will handle the failure of this valve or that pump, etc. A natural way to answer these questions would be to build a reactor and test it, and for the design application to be based in large part on data from actual tests. For instance, one advanced reactor design comes from NuScale:

NuScale is not really a new technology, just a scaled down Pressurized Water Reactor; but the scale down allows them to rely on natural circulation to handle the decay heat. No AC power is required to do this. The design also uses boron, a neutron absorber, in the cooling water to control the reactivity. The Advisory Committee on Reactor Safeguards (ACRS), an independent government body, is concerned that in emergency cooling mode some of the boron will not be recirculated into the core, and that could allow the core to restart. NuScale offers computer analyses that they claim show this will not happen. ACRS and others remain unconvinced. The solution is simple. Build one and test it. But under NRC rules, you cannot build even a test reactor without a license, and you can't get a license until all such questions are resolved.

Instead, a lot of analysis is done by building models.
In particular, the NRC relies on a method called Probabilistic Risk Assessment: enumerate all possible causes of a meltdown, and all the events that might lead up to them, and assign a probability to each branch of each path. In theory, this lets you calculate the frequency of meltdowns. However, this method suffers from all the problems of any highly complex model based on little empirical data: it's impossible to predict all the things that might go wrong, or to assign anything like accurate probabilities even to the scenarios you do dream up:

In March, 1975, a workman accidentally set fire to the sensor and control cables at the Browns Ferry Plant in Alabama. He was using a candle to check the polyurethane foam seal that he had applied to the opening where the cables entered the spreading room. The foam caught fire and this spread to the insulation. The whole thing got out of control and the plant was shut down for a year for repairs. Are we to blame the PRA analysts for not including this event in their fault tree? (If they did, what should they use for the probability?)

In practice, different teams using the same method come up with answers that are orders of magnitude apart, and which result to accept is a matter of negotiation. Probabilistic models were used in the past to estimate that reactors would have a core damage frequency of less than one in a million years. They were wrong.

Later, during construction, a similar issue arises. The standard in the industry is to use “formal QA” processes that amount to paperwork and box-checking, a focus on following bureaucratic rules rather than producing reliable outcomes. Devanney saw the same mentality in US Navy shipyards, which produce billion-dollar ships that don't even work. Instead, the industry should be more like the Korean shipyards, which are able to deliver reliably on schedule, with higher quality and lower cost.
They do this by inspecting the work product, rather than the process used to create it: “test the weld, not the welder.” And they require formal guarantees (such as warranties) of meeting a rigorous spec given up front.

Competition

Finally, Devanney laments the lack of real competition in the market. He paints the industry as a set of bloated incumbents and government labs, all “feeding at the public trough.” For instance:

One of the biggest labs is Argonne, outside Chicago. At Argonne, they monitor people going in and out of some of the buildings for radiation contamination. The alarms are set so low that, if it's raining, incoming people must wipe off their shoes after they walk across the wet parking lot. And you can still set off the alarm, which means everything comes to a halt while you wait for the Health Physics monitor to show up, wand you down, and pronounce you OK to come in. What has happened is that the rain has washed some of the naturally occurring radon daughters out of the air, and a few of these mostly alpha particles have stuck to your shoes. In other words, Argonne is monitoring rain water.

Nuclear incumbents aren't upset that billions of dollars are thrown away on waste disposal and unnecessary cleanup projects—they are getting those contracts. For instance, 8,000 people are employed in cleanup at Hanford, Washington, costing $2.5B a year, even though the level of radiation is only a few mSv/year, well within the range of normal background radiation.

What to do?

Devanney has a practical alternative for everything he criticizes. Here are the ones that stood out to me as most important:

Replace LNT with a model that more closely matches both theory and evidence. As one workable alternative, he suggests using a sigmoid, or S-curve, instead of a linear fit, in a model he calls Sigmoid No Threshold. In this model, risk is monotonic with dose (there are no beneficial effects at low doses) and it is nonzero for every nonzero dose (there is no “perfectly safe” dose). But the risk is orders of magnitude lower than LNT at low doses. S-curves are standard for dose-response models in other areas.
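A hedged sketch of the contrast between the two models (the slope, midpoint, and width parameters below are illustrative values of mine, not Devanney's numbers): both curves are monotonic in dose and positive for every positive dose, but the S-curve assigns far less risk at low doses than the linear model.

```python
import math

def lnt_risk(dose_sv, slope=0.05):
    """Linear No Threshold: risk strictly proportional to dose."""
    return slope * dose_sv

def snt_risk(dose_sv, r_max=1.0, midpoint_sv=4.0, width_sv=0.5):
    """Sigmoid No Threshold sketch: a logistic curve shifted so risk(0) = 0.

    Monotonic in dose and positive for every positive dose, but far
    below the linear model at low doses. Parameters are illustrative.
    """
    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))
    baseline = logistic(-midpoint_sv / width_sv)
    return r_max * (logistic((dose_sv - midpoint_sv) / width_sv) - baseline)

low = 0.1  # Sv
print(lnt_risk(low), snt_risk(low))     # sigmoid far below linear here
print(snt_risk(6.0) > snt_risk(low))    # still monotonic in dose
```

The shape matters for policy: under the linear fit, halving an already-tiny dose always looks as valuable as halving a large one, while under the S-curve the marginal benefit of squeezing out the last few millisieverts is negligible.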

Drop ALARA. Replace it with firm limits: choose a threshold of radiation deemed safe; enforce that limit and nothing more. Further, these limits should balance risk vs. benefit, recognizing that nuclear is an alternative to other modes of power, including fossil fuels, that have their own health impacts.

Encourage incident reporting, on the model of the FAA's Aviation Safety Reporting System. This system enables anonymous reports, and in case of accidental rule violations, it treats workers more leniently if they can show that they proactively reported the incident.

Enable testing. Don't regulate test reactors like production ones. Rather than requiring licensing up front, have testing monitored by a regulator, who has the power to shut down test reactors deemed unsafe. Then, a design can be licensed for production based on real data from actual tests, instead of theoretical models.

We could even designate a federal nuclear testing park, the “Protopark,” in an unpopulated region. The park would be funded by rent from tenants, so that the market, rather than the government, would decide who uses it. Tenants would have to obtain insurance, which would force a certain level of safety discipline.

Align regulator incentives with the industry. Instead of an hourly fee for regulatory review, fund the NRC by a tax on each kilowatt-hour of nuclear electricity, giving them a stake in the outcome and the growth of the industry.

Allow arbitration of regulation. Regulators today have absolute power. There should be an appeals process by which disputes can be taken to a panel of arbitrators, to decide whether regulatory action is consistent with the law. City police are held accountable for their use and abuse of power; the nuclear police should be too.

Metanoeite

At the end of the day, though, what is needed is not a few reforms, but “metanoeite”: a deep repentance, a change to the industry's entire way of thinking. Devanney is not optimistic that this will happen in the US or any wealthy country; they're too comfortable and too able to fund fantasies of “100% renewables.” Instead, he thinks the best prospect for nuclear is a poor country with a strong need for cheap, clean power. (I assume that's why his company, ThorCon, is building its thorium-fueled molten salt reactor in Indonesia.)

Again, all of the above is Devanney's analysis and conclusions, not necessarily mine. What to make of all this?

I'm still early in my research on this topic, so I don't yet know enough to fully evaluate it. But the arguments are compelling to me. Devanney quantifies his arguments where possible and cites references for his claims. He places blame on systems and incentives rather than on evil or stupid individuals. And he offers reasonable, practical alternatives.

I would have liked to see the nuclear economic model made more explicit. How much of the cost of electricity is the capital cost of the plant, vs. operating costs, vs. fuel? How much is financing, and how sensitive is this to construction times and interest rates? Etc.

A few important topics were not addressed. One is weapons proliferation. Another is the role of the utility companies and the structure of the power industry. Electricity utilities are often regulated monopolies. At least some of them, I believe, have a profit margin that is guaranteed by law. (!) That seems like an important element in the lack of competition and perverse incentive structure.

I would be interested in hearing thoughtful counterarguments to the book’s arguments. But overall, Why Nuclear Power Has Been a Flop pulls together academic research, industry anecdotes, and personal experience into a cogent narrative that pulls no punches. Well worth reading. Buy the paperback on Amazon, or download a revised and updated PDF edition for free.

Discuss

### What are some interesting examples of risks measured in micromorts?

16 апреля, 2021 - 08:25
Published on April 16, 2021 5:25 AM GMT

A micromort is a 1-in-a-million chance of death. Wikipedia's article has a few examples of risks of death measured in micromorts, but it seems like there must be a lot more of them. What are some interesting examples you use as a baseline for comparison?
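Converting raw risks into micromorts is simple arithmetic; here is a minimal sketch (the function name and example numbers are mine, chosen purely for illustration):

```python
# 1 micromort = a one-in-a-million chance of death.
def micromorts(p_death):
    """Convert a probability of death into micromorts."""
    return p_death * 1_000_000

# Example: a 1-in-10,000 risk is 100 micromorts;
# spread evenly over a year, that's roughly 0.27 micromorts per day.
annual = micromorts(1 / 10_000)
daily = annual / 365
```

This makes it easy to put disparate risks (per-flight, per-year, per-dose) on one scale before comparing them.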

Discuss

### Place-Based Programming - Part 2 - Functions

16 апреля, 2021 - 03:25
Published on April 16, 2021 12:25 AM GMT

In Part 1, we defined a place-of macro and a value-of function. The code from Part 1, as originally written, was not an importable module, so I have modified it to be portable.

;; Module from Part 1
;; Save this code into a file called part1.hy and then use it with:
;; (import [part1 [value-of]])
;; (require [part1 [place-of]])

(setv +place-dir+ ".places/")

(defmacro/g! place-of [code]
  `(do
     (import [hashlib [md5]] os pickle)
     (setv ~g!type-dir (os.path.join ~+place-dir+ (str (type '~code))))
     (if-not (os.path.exists ~g!type-dir)
       (os.mkdir ~g!type-dir))
     (setv ~g!place
       (os.path.join ~g!type-dir
                     (+ (.hexdigest (md5 (.encode (str '~code)))) ".pickle")))
     (if-not (os.path.exists ~g!place)
       (with [f (open ~g!place "wb")]
         (pickle.dump (eval '~code) f)))
     ~g!place))

(defn value-of [place]
  (import os pickle)
  (assert (= (type place) str) (+ (str place) " is not a place"))
  (if-not (os.path.exists place)
    (raise (FileNotFoundError (+ "Could not find place " place))))
  (with [f (open place "rb")]
    (pickle.load f)))

The value-of function works fine as written, but the place-of macro has no way to accept parameters. We will define a macro for constructing place-based functions, which can accept parameters.

defnp

Hy's built-in function declaration macro is defn, so we will call our place-based function declaration macro defnp. Our place-based function will hash its own code as before, but we also need a unique identifier for its parameters. In data science, the values of our parameters are often gigantic, and hashing a big data structure takes many computations. Since the whole purpose of a persistent memoization system is to reduce how many computations we have to perform, passing values to our place-based function would waste compute. Instead we pass places, which are always cheap to hash. A place-based function takes places as parameters and returns another place.
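For readers who don't speak Hy, here is a rough Python analogue of the same idea (my own sketch, not the author's code): hash the function's code and the places of its arguments, memoize the result to disk, and return the result's place.

```python
import hashlib
import os
import pickle

PLACE_DIR = ".places"
os.makedirs(PLACE_DIR, exist_ok=True)

def place_of(value):
    """Store a value on disk and return its 'place' (a content-hash-keyed path)."""
    key = hashlib.md5(pickle.dumps(value)).hexdigest()
    path = os.path.join(PLACE_DIR, key + ".pickle")
    if not os.path.exists(path):
        with open(path, "wb") as f:
            pickle.dump(value, f)
    return path

def value_of(place):
    """Load the value stored at a place."""
    with open(place, "rb") as f:
        return pickle.load(f)

def defnp(func):
    """Place-based function: takes places, memoizes to disk, returns a place."""
    code_hash = hashlib.md5(func.__code__.co_code).hexdigest()
    def wrapper(*places):
        # Hash the argument *places* (cheap short strings), never the values.
        arg_hash = hashlib.md5(str(places).encode()).hexdigest()
        out = os.path.join(PLACE_DIR, code_hash + "-" + arg_hash + ".pickle")
        if not os.path.exists(out):
            result = func(*(value_of(p) for p in places))
            with open(out, "wb") as f:
                pickle.dump(result, f)
        return out
    return wrapper

@defnp
def plus(x, y):
    return x + y
```

The second call to `plus` with the same places hits the on-disk cache without recomputing or re-hashing any large values.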

(import [part1 [value-of]])
(require [part1 [place-of]])
(import os [part1 [+place-dir+]])

(setv +funcall-dir+ (os.path.join +place-dir+ "funcall"))

(defmacro/g! defnp [symbol params &rest body]
  `(do
     (import [hashlib [md5]] os pickle)
     (defn ~symbol ~params
       (setv ~g!funcall-place
         (os.path.join +funcall-dir+
                       (+ (. ~symbol code-hash)
                          "-"
                          (.hexdigest (md5 (.encode (str (list ~params))))))))
       (if-not (os.path.exists ~g!funcall-place)
         (do
           (setv ~g!value
             ((fn ~params ~@body)
              #* (lfor ~g!param ~params (value-of ~g!param))))
           (with [f (open ~g!funcall-place "wb")]
             (pickle.dump ~g!value f))))
       ~g!funcall-place)
     (setv (. ~symbol code-hash)
       (.hexdigest (md5 (.encode (str ['~params '~body])))))))

;; Tests
(defnp plus [x y] (+ x y))
(assert (= (value-of (plus (place-of 1) (place-of 2))) (+ 1 2)))
(assert (= (value-of (plus (place-of 3) (place-of 4))) (+ 3 4)))

(defnp times [x y] (* x y))
(assert (= (value-of (times (place-of 1) (place-of 2))) (* 1 2)))
(assert (= (value-of (times (place-of 3) (place-of 4))) (* 3 4)))

Discuss

### The Variational Characterization of Expectation

16 апреля, 2021 - 03:12
Published on April 16, 2021 12:12 AM GMT

Epistemological Status: Fairly confident. This is much closer to the expected minimal map I had in mind ;)

Say we have a random variable $X$, but are only allowed to summarize the outcomes with a single number $e_X$. What should we pick so that the squared error is minimized?

$$\min_{e_X} \mathbb{E}[(X - e_X)^2] \iff \partial_{e_X} \mathbb{E}[(X - e_X)^2] = 0 \iff \mathbb{E}[X - e_X] = 0 \iff \mathbb{E}[X] = e_X$$

Thus, in this sense, the expectation of a random variable is the best point estimate of its outcomes.
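As a quick numerical sanity check (my own sketch, with a grid search standing in for the calculus), the candidate point estimate that minimizes the empirical squared error lands on the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)  # samples of X

# Squared error E[(X - e)^2] for a grid of candidate point estimates e
grid = np.linspace(0.0, 6.0, 601)
errors = [np.mean((x - e) ** 2) for e in grid]
best = grid[int(np.argmin(errors))]
```

The minimizer `best` agrees with `x.mean()` up to the grid resolution, matching the derivation above.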

A major reason a variational characterization is interesting is that it creates a tunnel to optimization theory. The expected minimal map presented here can 'average out' irrelevant details, allowing you to filter down to just the relevant things in a streamlined manner.

When the random variables have binary outcomes, we can use the conditional expectation to characterize probability without referencing information. So there are at least two ways of giving a variational characterization of probability.

Lemma 1 (Optimal Prediction): Define a random variable $\chi_U \in L^2$. The best point representation of the outcome of $\chi_U$ for a given observation $O$ is equivalent to $\mathbb{E}(\chi_U \mid O)$. Moreover, the optimal point representation of $\chi_U$ is invariant under the pull-back $\eta : O \mapsto \mathbb{E}(\chi_U \mid O)$ of the conditioning.

Corollary: When $\chi_U$ indicates a binary answer to a query, we have a characterization of probability.

The second condition is a fancy way of saying that our optimal estimate is an optimal way to condition our expectation. It's somewhat like cheating on a test: you can either read the question $Q$ and then predict $A$ via $\mathbb{E}[A \mid Q]$, or you could be told the answer and then predict $A$ via $\mathbb{E}[A \mid \mathbb{E}[A \mid Q]]$.
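The "cheating" identity $\mathbb{E}[A \mid \mathbb{E}[A \mid Q]] = \mathbb{E}[A \mid Q]$ can be checked numerically; here is a small sketch of my own with binary variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary query Q and answer A: A agrees with Q 80% of the time.
q = rng.integers(0, 2, 200_000)
a = (q ^ (rng.random(200_000) < 0.2)).astype(float)

# Optimal predictor E[A|Q], evaluated pointwise for each sample.
e_a_q = np.where(q == 1, a[q == 1].mean(), a[q == 0].mean())

# Pull-back: conditioning A on E[A|Q] instead of Q returns the same values,
# i.e. the conditional mean within each level of e_a_q is that level itself.
pullback_error = max(abs(a[e_a_q == u].mean() - u) for u in np.unique(e_a_q))
```

Because $\mathbb{E}[A \mid Q]$ already carries all the information $Q$ has about $A$, conditioning on it a second time changes nothing.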

Suppose that instead of cheating we answer some other question that gives us a hint for the answer. Now we have something like

$$Q \to A_1 \to A_2$$

We have a question, we get a hint, we answer. From the above lemma, the best guess for $A_2$ is $\mathbb{E}[A_2 \mid A_1]$, but we don't know $A_1$. Our best guess for $A_1$ is $\mathbb{E}[A_1 \mid Q]$, so the best we can do is

$$\mathbb{E}[A_2 \mid Q] \approx \mathbb{E}[A_2 \mid \mathbb{E}[A_1 \mid Q]]$$

On a multi-part problem we might have something like

$$(Q = A_0) \to A_1 \to \dots \to A_{n-1} \to A_n$$

So we take the approximations as

$$\hat{A}_k = \mathbb{E}[A_k \mid \hat{A}_{k-1}] = \operatorname*{argmin}_{e_k \in L^2} \mathbb{E}[(A_k - e_k(\hat{A}_{k-1}))^2]$$

In practice, we'll often want or need to restrict the class of functions we know how to optimize over. In such cases, extending the Q/A process is the only way to guarantee that the final answer is always close to the optimum. At this point we can 'forget' about the intermediates and optimize

$$\hat{A}_n = \operatorname*{argmin}_{e_1, \dots, e_n \in \mathcal{E}} \mathbb{E}[(A_n - e_n \circ \dots \circ e_2 \circ e_1(Q))^2]$$

If the function class is appropriate, this gives you the optimization problem associated with training a neural network. Literally, but only approximately, each function averages out irrelevant features of its input to create more relevant features for prediction in its output.

Proof of Lemma 1: The proof works for non-binary random variables; however, the probabilistic interpretation is lost. Expectation for random variables in $L^2$ is characterized by

$$P(\chi_U \mid O) = \mathbb{E}[\chi_U \mid O] = \operatorname*{argmin}_{e_{\chi_U} \in L^2} \mathbb{E}[(\chi_U - e_{\chi_U}(O))^2]$$

The minimizer exists and is unique by the Hilbert projection theorem. Moreover,

$$\mathbb{E}[\chi_U \mid \eta(O)] = \mathbb{E}[\chi_U \mid \mathbb{E}(\chi_U \mid O)] \iff \operatorname*{argmin}_{e_{\chi_U} \in L^2} \mathbb{E}[(\chi_U - e_{\chi_U}(O))^2] = \operatorname*{argmin}_{e_1 \in L^2} \mathbb{E}[(\chi_U - e_1(\mathbb{E}[\chi_U \mid O]))^2]$$

So all that's left is to verify the pull-back doesn't affect the outcome. We have

$$\begin{aligned} \operatorname*{argmin}_{e_1 \in L^2} \mathbb{E}[(\chi_U - e_1(\mathbb{E}[\chi_U \mid O]))^2] &= \operatorname*{argmin}_{e_1 \in L^2} \mathbb{E}\Big[\Big(\chi_U - e_1\Big(\operatorname*{argmin}_{e_2 \in L^2} \mathbb{E}[(\chi_U - e_2(O))^2]\Big)\Big)^2\Big] \\ &= \operatorname*{argmin}_{e_1 \in L^2} \mathbb{E}[(\chi_U - e_1 \circ e_2(O))^2] \\ &= \operatorname*{argmin}_{e_1, e_2 \in L^2} \mathbb{E}[(\chi_U - e_1 \circ e_2(O))^2] \\ &= \operatorname*{argmin}_{e_{\chi_U} \in L^2} \mathbb{E}[(\chi_U - e_{\chi_U}(O))^2] \end{aligned}$$

Therefore, the probability conditioned on the observation is the optimal point estimate, and pulling back the observation to just the point estimate leaves the estimate unchanged. □

Discuss

### [ACX Linkpost] Prospectus on Próspera

16 апреля, 2021 - 01:48
Published on April 15, 2021 10:48 PM GMT

I like reading LW comments more than Substack comments, and I think this post might generate some high-info discussion.

Discuss

### Old post/writing on optimization daemons?

15 апреля, 2021 - 21:00
Published on April 15, 2021 6:00 PM GMT

I'm having trouble locating an old post about optimization daemons that I know I read at one point. I believe it was written by Eliezer. It featured a lot of visual imagery of the daemon steering the path of the model through parameter-space by e.g. forcing it through narrow paths towards low loss surrounded by walls of high loss. Does anyone know which post I'm remembering?

Discuss

### Computing Natural Abstractions: Linear Approximation

15 апреля, 2021 - 20:47
Published on April 15, 2021 5:47 PM GMT

Background: Testing The Natural Abstraction Hypothesis

A linear-Gaussian approximation is a natural starting point. It’s relatively simple - all of the relevant computations are standard matrix operations. But it’s also relatively general: it should be a good approximation for any system with smooth dynamics and small noise/uncertainty. For instance, if our system is a solid object made of molecules and sitting in a heat bath, then linear-Gaussian noise should be a good model. (And that’s a good physical picture to keep in mind for the rest of this post - solid object made of molecules, all the little chunks interacting elastically, but also buffeted about by thermal noise.)

So, we have some Bayes net in which each variable is Gaussian - a linear combination of its parents plus some IID Gaussian noise. For the underlying graph, I’m using a 2D Delaunay triangulation, mainly because it does a good job reflecting physically-realistic space/time structure. (In particular, as in physical space, most pairs of nodes are far apart - as opposed to the usual Erdos-Renyi random graph, where the distance between most points is logarithmic. We’ll revisit the choice of graph structure later.) Embedded in 2D, an example graph looks like this:

Example of the kind of Bayes net structure we’ll use. Each node is a variable, and the arrows show causal relationships in the model. The whole graph embeds nicely in a 2D space, so nodes which are “far apart” in the spatial embedding are also “far apart” in terms of steps in the graph.

… although the actual experiments below will use a graph with about 500X as many nodes (50k, whereas the visual above uses 100). Each of those nodes is a Gaussian random variable, with the arrows showing its parents in the Bayes net. Weights on the edges (i.e. coefficients of each parent) are all random normal, with variance ⅔ (another choice we’ll revisit later).

Let’s start with the basic experiment: if we pick two far-apart neighborhoods of nodes, X and Y, can the information from X which is relevant to Y fit into a much-lower-dimensional summary? In a Gaussian system, we can answer this by computing the covariance matrix Cov[X,Y], taking an SVD, and looking at the number of nonzero singular values. Each left singular vector with a nonzero singular value gives the weights of one variable in the “summary data” of X which is relevant to Y. (Or, for information in Y relevant to X, we can take the right singular vectors; the dimension of the summary is the same either way.)
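The covariance-plus-SVD procedure can be illustrated on a toy system (my own sketch, not the post's actual 50k-node experiment): when X and Y interact only through a low-dimensional bottleneck, the SVD of their cross-covariance recovers the bottleneck's dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian system: X (40-dim) and Y (50-dim) interact only
# through a k=3 dimensional "summary" z, plus independent noise.
n, m, k = 40, 50, 3
samples = 100_000
z = rng.normal(size=(samples, k))
x = z @ rng.normal(size=(k, n)) + 0.1 * rng.normal(size=(samples, n))
y = z @ rng.normal(size=(k, m)) + 0.1 * rng.normal(size=(samples, m))

# Empirical cross-covariance Cov[X, Y] and its singular values
cov_xy = (x - x.mean(0)).T @ (y - y.mean(0)) / samples
s = np.linalg.svd(cov_xy, compute_uv=False)
```

Singular values past the first `k` are negligible relative to the largest, so a 3-variable summary of X captures everything relevant to Y, exactly the low-rank structure the post looks for.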

We’ll use these three randomly-chosen neighborhoods of nodes (green is neighborhood 1, red is 2, purple is 3):

Three neighborhoods in a 50k-node network, of ~110 nodes each. The black/blue background is all the arrows in the whole network - they’re dense enough that it’s hard to see much.

Here are the 10 largest singular values of the neighborhood 1 - neighborhood 2 covariance matrix:

array([5.98753213e+05, 1.21862101e+02, 1.91973783e-01, 1.03200621e-03, 2.01888771e-04, 1.05877548e-05, 5.34855466e-06, 4.94833571e-10, 2.55557198e-10, 2.17092875e-10])

The values of order 1e-10 or lower are definitely within numerical error of zero, and I expect the values of order 1e-3 or lower are as well, so we see at most 7 nonzero singular values and probably more like 3. (I know 1e-3 seems high for numerical error, but the relevant number here is the ratio between a given singular value and the largest singular value, so it’s really more like 1e-8. In principle a double provides 1e-16 precision, but in practice it’s common to lose half the bits in something like SVD, and 1e-8 is about how much I usually trust these things.) Meanwhile, the two neighborhoods contain 109 and 129 nodes, respectively.

To summarize all that: we picked out two neighborhoods of ~110 variables each in a 50k-node linear Gaussian network. To summarize all the information from one neighborhood relevant to the other, we need at most a 7-variable summary (which is also linear and Gaussian). That’s already a pretty interesting result - the folks at SFI would probably love it.

But it gets even better.

We can do the same thing with the neighborhood 1 - neighborhood 3 covariance matrix. The first 10 singular values are:

array([7.50028046e+11, 5.87974398e+10, 2.50629095e+08, 2.23881004e+03, 1.46035030e+01, 2.52752074e+00, 1.01656991e-01, 1.79916658e-02, 5.89831843e-04, 3.52074803e-04])

… and by the same criteria as before, we see at most 8 nonzero singular values, and probably more like 3. Now, we can compare the neighborhood-1-singular-vectors from our two SVDs. Simply computing the correlations between each pair of nonzero singular vectors from the two decompositions yields:

Key thing to notice here: the first two singular vectors are near-perfectly correlated! (Both correlations are around .996.) The correlations among the next few are also large, after which it drops off into noise. (I did say that everything after the first three singular components was probably mostly numerical noise.)

In English: not only can a three-variable summary of neighborhood 1 probably contain all the info relevant to either of the other two neighborhoods, but two of the three summary variables are the same.

This is exactly the sort of thing the natural abstraction hypothesis predicts: information relevant far away fits into a compact summary - the “abstract summary” - and roughly-the-same abstract summary is relevant even from different “vantage points”. Or, to put it differently, the abstraction is reasonably robust to changes in which “far away” neighborhood we pick.

Our 2D-embedded linear-Gaussian world abstracts well.

Of course, this was just an example from one run of the script, but the results are usually pretty similar - sometimes the summary has 2 or 4 or 5 dimensions rather than 3, sometimes more or fewer summary-dimensions align, but these results are qualitatively representative. The main case where it fails is when the RNG happens to pick two neighborhoods which are very close or overlap.

Is This Just The Markov Blanket?

Notice that our “neighborhoods” of variables include some nodes strictly in the interior of the neighborhood. The variables on the boundary of the neighborhood should form a Markov blanket - i.e. they are themselves an abstract summary for the whole neighborhood, and of course they’re the same regardless of which second neighborhood we pick. So perhaps our results are really just finding the variables on the neighborhood boundary?

The results above already suggest that this is not the case: we can see at a glance that the dimension of the summary is quite a bit smaller than the number of variables on the boundary. But let’s run another test: we’ll fix the point at the “center” of the neighborhood, then expand the neighborhood’s “radius” (in terms of undirected steps in the graph), and see how both the boundary size and the summary size grow.

Here are results from one typical run (number of boundary variables in blue, number of summary variables in red):

As we expand the neighborhood, the number of boundary-variables grows, but the number of summary variables stays flat. So there’s definitely more going on here than just the neighborhood’s Markov blanket (aka boundary variables) acting as an abstract summary.

The above toy model made two choices which one could imagine not generalizing, even in other sparse linear-Gaussian systems.

First, the graph structure. The random graphs most often used in mathematics are Erdos-Renyi (ER) random graphs. “Information relevant far away” doesn’t work very well on these, because nodes are almost never far apart. The number of steps between the two most-distant nodes in such graphs is logarithmic. So I wouldn’t expect environments with such structure to abstract well. On the other hand, I would usually expect such environments to be a poor model for systems in our physical world - the constraints of physical space-time make it rather difficult to have large numbers of variables all interacting with only logarithmic causal-distances between them. In practice, there are usually mediating variables with local interactions. Even the real-world systems which most closely resemble ER random graphs, like social networks or biochemical regulatory networks, tend to have much more “localized” connections than a true ER random graph - e.g. clusters in social networks or modules in biochemical networks.

(That said, I did try running the above experiments with ER graphs, and they work surprisingly well, considering that the neighborhoods are only a few steps apart on the graph. Summary sizes are more like 10 or 12 variables rather than 2 or 4, and alignment of the summary-variables is hit-and-miss.)

Second, the weights. There’s a phase shift phenomenon here: if the weights are sufficiently low, then noise wipes out all information over a distance. If the weights are sufficiently high, then the whole system ends up with one giant principal component. Both of these abstract very well, but in a boring-and-obvious way. The weights used in these experiments were chosen to be right at the boundary between these two behaviors, where more interesting phenomena could potentially take place. If the system abstracts well right at the critical point, then it should abstract well everywhere.

So What Could We Do With This?

I mainly intend to use this sort of model to look for theoretical insights into abstraction which will generalize beyond the linear-Gaussian case. Major goals include data structures for representing whole sets of abstractions on one environment, as well as general conditions under which the abstractions are learned by a model trained on the environment.

But there are potential direct use-cases for this kind of technique, other than looking for hints at more-general theory.

One example would be automatic discovery of abstractions in physical systems. As long as noise is small and the system is non-chaotic (so noise stays small), a sparse linear-Gaussian model should work well. (In practice, this mainly means the system needs to have some kind of friction in most degrees of freedom.) In this case, I’d expect the method used here to work directly: just compute covariances between far-apart neighborhoods of variables, and those covariances will likely be low-rank.

In principle, this method could also be directly applied to machine learning systems, although the range of applicability would be fairly narrow. It would only apply to systems where linear approximations work very well (or maybe quadratic approximations, which give a linear approximation of the gradient). And ideally the random variables would be Gaussian. And it should have a fairly large and deep computational graph, so most neighborhoods of variables are not too close together. If one just happened to have an ML model which met those criteria, we’d expect these results to carry over directly: compute the covariance matrix between far-apart neighborhoods of the random variables in the model, and those matrices should be low-rank. That sounds like a potentially-very-powerful transparency tool... if one had a model which fit the criteria. Hypothetically.

Summary

Build a random linear-Gaussian Bayes net with a random planar graph structure (in this case generated from a Delaunay mesh). Pick two far-apart neighborhoods of variables in this net, and use an SVD to compute the rank of their covariance matrix. Empirically, we find rank <10 even with 110-variable neighborhoods in a 50k-node network. In other words: a very small number of variables typically suffices to summarize all the information from a neighborhood of variables which is relevant far away. The system abstracts well.

Furthermore: the largest singular vectors are the same even for different choices of the second (far-away) neighborhood. Not only does a small summary suffice, but approximately the same small summary suffices. The abstractions are robust to changes in our “far away” reference neighborhood.

Discuss

### How & when to write a business plan

15 апреля, 2021 - 18:45
Published on April 15, 2021 3:45 PM GMT

You’ve decided you need a business plan – so where do you start?

My previous post explained why you should write a plan – and what’s wrong with all your excuses for not doing so. Here I give tips on how, and how not, to write one. But first of all, when to.

When?

Business plans are often written too late. As a proper first draft takes weeks to research and write, people tend to procrastinate it – and do other things too soon instead.

In particular, don’t start the business or create a full product before writing a plan! Why plan a skyscraper after building it? (Remember from my previous post that a plan is a simulation of a business, to help decide whether it’s worth starting at all.)

With my first company, we began by developing the product (music software) – which took, oh, seven years. Only then did I write a business plan. But for all we knew, the research involved might have shown that there wasn’t a big enough market, or we had unbeatable competition, so all our effort would have been wasted. (By sheer luck it didn’t.)

It’s also possible to write a plan too early, before establishing whether it will be worth the time. So, do these things first:

• Consider why you want to start a business. If it’s to get rich quick, that’s not going to happen – most start-ups fail, and almost all the rest take many years of hard work before you can enjoy success. Alternatively, if your aim is to break free and be your own boss, the reality is not quite like that either – as your customers, and maybe investors, end up becoming your boss.
• Consider whether you’re the right kind of person. You need to be smart, confident, motivated, thick-skinned, and good at teamwork; if you fall down on any of these, don’t start a business. You also need the skills and experience/expertise for your particular role in the start-up.
• Don’t just go with your first business idea, as it’s unlikely to be your best option. Come up with several ideas, in fields you or a co-founder know a lot about. Do initial research on them, then run any promising candidates past experts for their view (see below).
• While you shouldn’t create a complete product yet, there may be a case for producing a minimum viable product (MVP) or mock-up to see what potential customers think (as in the lean start-up methodology). But only do this now if you can make one quickly, in days or weeks; otherwise, leave it till later.

Then pick your best business idea and write a plan. Or it may be that none of your ideas is good enough – in which case give up. Don’t pursue an iffy idea just to fulfil your dream of starting a business; it will probably turn into a nightmare.

Sections

Various web sites cover the nuts and bolts of business plans, so I won’t. Most of them list key points, such as StartupDonut; two that go into greater depth are Inc.com and EntrepreneurHandbook.

But here’s a little about sections of your plan. Business plans use fairly standard section headings, which are topics to research, think about, discuss and model in depth. These include your start-up’s products/services, market, sales, marketing, operations, people, competitors, SWOT analysis, financials, and often an appendix with further detail.

Online advice often suggests fewer sections than this, with topics grouped together; but I reckon more is better, as it clarifies your thinking. For example, sales and marketing are often combined, because in small businesses they’re often managed by the same person or handled by the same channels. But if you can’t distinguish between sales and marketing, you haven’t thought them through. (Sales for example includes business model and pricing, which may have nothing to do with how customers hear about your product.)

The SWOT analysis should include all weaknesses, threats and other risks you can possibly think of, and say how you’ll deal with them. If you identify a seemingly fatal flaw, then a change to your product, how it is used, or its target audience, may well fix the problem. Your disruptive deodorant-shampoo combo may not appeal to salons, for example, though it might be a great product for dogs. If there’s no solution, abandon this start-up idea, and move on to your next best one.

Business plans start with an executive summary, but this should be written last of all – being a precis of the key points from the plan. You may well not understand what your start-up is until you’ve written the whole plan.

Expert feedback

Expert feedback is essential, because to a trained eye your world-beating plan will be like this:

So, believe what they tell you, and revise or abandon the plan accordingly.

Don’t dismiss the experts’ criticisms, e.g. on the grounds they don’t understand your product, market, or disruptive vision. They’re probably right. And don’t continue with your plan just because you’ve spent so much time on it!

If you do abandon the plan, bear in mind that it was still extremely useful – because it helped you to find a fatal flaw in advance, and avoid all the cost, time and stress of running a doomed business.

Refine the plan

If you keep going, spend several more months:

• maybe create an MVP or mock-up, and test it on customers
• revise the business plan, and get more feedback, perhaps several times

until the plan is like this:

It’s possible to spend too long planning, either due to analysis paralysis, or to avoid starting a business you’re not confident about. But under-planning is far more common for new entrepreneurs.

If all founders and experts end up satisfied with the plan, and it’s still your best option, decide whether to start the business.

Use the plan

Assuming you do go ahead, make sure you actually use the business plan! Follow it, and measure progress against it: not just financials, but the actions, responsibilities, targets and goals of everyone in the business. Your start-up will diverge from the plan over time, so figure out why things didn’t turn out as expected, and update it annually.

You can also use the plan to help raise money, to brief new senior employees, and as a basis for other documents and presentations.

Writing a sloppy business plan is as bad as not writing one at all, because you can kid yourself or others that you are ready to go. Don’t throw a lame plan together just to say you’ve done so, or to try to raise money. Warning signs of such a plan:

• Too short: ten pages is a bare minimum, if you’ve researched it and thought it through properly. See my previous post for why a lean plan, business model canvas or pitch deck won’t suffice. Though conversely, a very long plan (e.g. 50 pages) will go unread, and is hard to use and update later
• Sketchy, disorganised, or missing sections
• Doesn’t deal with obvious issues/questions/risks
• Over-optimistic sales projections (e.g. hockey-stick curves) and timescales
• Contains things you don’t fully believe, e.g. hype aimed at investors
• Ignores expert feedback.
Summary

Don’t begin by creating a company or full product. First, think through your motives and suitability for starting a business.

Then consider a range of start-up ideas, in fields the founders know a lot about. Run any promising ones past experts, and perhaps also customers (using quick MVPs or mock-ups).

Choose the best idea (if any), and only then write a proper business plan. It must be fully researched and thought through, with realistic projections, incorporating expert feedback, in multiple drafts over months. Half-baked efforts are worthless.

If you decide to start the business, then follow the plan, measure progress against it over time, understand deviations from it, and update it annually.

Discuss

### Andrés Gómez Emilsson at the SlateStarCodex Online Meetup

15 апреля, 2021 - 18:36
Published on April 15, 2021 3:36 PM GMT

Andrés will talk about "Mapping the Heaven Realms". After that we will socialize online.

Please register here and we will send you an invitation before the event.

Description: Angels and demons lurk in our neurons. While "parallel dimensions" are most certainly nothing more than a product of the imagination, such products are still extremely important from a scientific and ethical point of view. Based on previous research, we have concluded that the intensity and quality of experiences follow a long-tail distribution. Indeed, the most blissful states of consciousness are orders of magnitude more intense, luminous, and satisfying than "merely" amazing experiences. Thus, mapping out the heaven realms, so to speak, may be of enormous value for the task of paradise engineering. In this talk, I will share what we currently know at QRI about the heaven realms of experience and how to experience them safely.

Moreover, I will touch upon how the phenomenological and information-processing properties of ecstatic states of consciousness relate to modern neuroscientific paradigms such as predictive coding, indirect realism about perception, connectome-specific harmonic waves, and QRI's symmetry theory of valence and neural annealing theory.

Looking forward to seeing you there! Please consider bringing with you a fresh memory of some of your best experiences to share and discuss in light of these ideas.

Bio: Andrés has a Master’s Degree in Computational Psychology from Stanford and a professional background in graph theory, statistics, and affective science. Andrés was also the co-founder of the Stanford Transhumanist Association and first place winner of the Norway Math Olympiad. His work at QRI ranges from algorithm design, to psychedelic theory, to neurotechnology development, to mapping and studying the computational properties of consciousness. Andrés blogs at qualiacomputing.com

Discuss

### The irrelevance of test scores is greatly exaggerated

15 апреля, 2021 - 17:15
Published on April 15, 2021 2:15 PM GMT

Here's some claims about how grades (GPA) and test scores (ACT) predict success in college.

In a study released this month, the University of Chicago Consortium on School Research found—after surveying more than 55,000 public high school graduates—that grade point averages were five times as strong at predicting college graduation as were ACT scores. (Fortune)

High school GPAs show a very strong relationship with college graduation despite sizable school effects, and the relationship does not differ across high schools. In contrast, the relationship between ACT scores and college graduation is weak to nothing once school effects are controlled. University of Chicago Consortium on School Research

“It was surprising not only to see that there was no relationship between ACT scores and college graduation at some high schools, but also to see that at many high schools the relationship was negative among students with the highest test scores” (Science Daily)

"The bottom line is that high school grades are powerful tools for gauging students' readiness for college, regardless of which high school a student attends, while ACT scores are not." (Inside Higher Ed)

(See also the Washington Post, Science Blog, Fatherly, The Chicago Sun Times, etc.)

All these articles are mild adaptations of a press release for Allensworth and Clark's 2020 paper "High School GPAs and ACT Scores as Predictors of College Completion".

I understood these articles as making the following claim: Standardized test scores are nearly useless (at least once you know GPAs), and colleges can eliminate them from admissions with no downside.

Surprised by this claim, I read the paper. I apologize if this is indelicate, but... the paper doesn't give the slightest shred of evidence that the above claim is true. It's not that the paper is wrong, exactly, it simply doesn't address how useful ACT scores are for college admissions.

So why do we have all these articles that seem to make this claim, you ask? That's an interesting question! But first, let's see what's actually in the paper.

Test scores are not irrelevant

The authors got data for 55,084 students who graduated from Chicago public schools between 2006 and 2009. Most of their analysis only looks at a subset of 17,753 who enrolled in a 4-year college immediately after high school. Here's the percentage of those students who graduated college within 6 years for each possible GPA and ACT score:

We can also visualize this by plotting each row of the above matrix as a line. This shows how graduation rates change for a fixed GPA score as the ACT score is changed.

It doesn't appear that ACT scores are useless... But let's test this more rigorously.

Test scores are highly predictive

The full dataset isn't available, but since we have the number of students in each ACT / GPA bin above, we can create a "pseudo" dataset, with a small loss of precision in the GPA and ACT score for each student. I did this, and then fit models to predict if a student would graduate using GPA alone, ACT alone, or with both together. (The model is cubic spline regression on top of a quantile transformation.)
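The reconstruction step can be sketched as follows. The bins and counts below are invented for illustration (the paper's table has many more bins); the idea is just to expand each (GPA bin, ACT bin, graduated count) cell back into one row per student:

```python
# Expand binned counts into one pseudo-row per student. The bins and counts
# below are made up for illustration; they are not the paper's numbers.
bins = [
    # (gpa_bin_midpoint, act_bin_midpoint, n_students, n_graduated)
    (2.25, 17, 40, 10),
    (3.25, 22, 60, 30),
    (3.75, 28, 50, 35),
]

students = []
for gpa, act, n, n_grad in bins:
    # The first n_grad students in each bin graduated; the rest did not.
    students += [(gpa, act, 1)] * n_grad + [(gpa, act, 0)] * (n - n_grad)

print(len(students))                   # 150 pseudo-students
print(sum(y for _, _, y in students))  # 75 graduates
```

The only information lost is each student's exact position within their bin, hence the "small loss of precision" mentioned above.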

To measure how good these fits are, I used cross-validation: repeatedly holding out 20% of the data, fitting a model like the above to the other 80%, and then predicting whether each held-out student graduates. You can measure how accurate the predictions are either as a simple error rate (1 − accuracy) or as a Brier score. I also compare to a model using no features, which just predicts the base rate for everyone.
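As a minimal sketch of the two scoring rules, here they are applied to the no-feature baseline. The ~49% graduation rate is a stand-in chosen to roughly match the "Nothing" row below; everything here is illustration, not the actual analysis:

```python
import random

def brier_score(probs, outcomes):
    # Mean squared difference between predicted probability and the 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def error_rate(probs, outcomes):
    # Fraction misclassified when predicting "graduates" iff p >= 0.5.
    return sum((p >= 0.5) != bool(y) for p, y in zip(probs, outcomes)) / len(outcomes)

# The no-feature baseline: predict the training base rate for everyone.
random.seed(0)
outcomes = [random.random() < 0.49 for _ in range(10_000)]
base_rate = sum(outcomes) / len(outcomes)
probs = [base_rate] * len(outcomes)

print(round(brier_score(probs, outcomes), 3))  # b*(1-b) for base rate b, near .25
print(round(error_rate(probs, outcomes), 3))   # the minority-class rate, near .49
```

A constant prediction of the base rate b gives a Brier score of exactly b(1 − b), which is why the no-feature row sits near .25: it is the worst a calibrated predictor can do.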

| Predictors | Brier | Error |
|---|---|---|
| Nothing | .249 | .491 |
| ACT only | .219 | .355 |
| GPA only | .210 | .330 |
| both | .197 | .302 |

Now, you've got two options. You can claim that GPA is significantly better than the ACT. That's fine, but if you care about that difference, you should care even more about the difference between (GPA only) and (GPA and ACT). Or you can claim that the difference between GPA and ACT is insignificant. What you can't do is simultaneously claim that GPA is better than the ACT and also that the ACT adds no value to the GPA. That's just incoherent.

I repeated this same calculation with other predictors: logistic regression, decision trees, and random forests. The numbers barely changed at all.

Still, these are all just calculations based on the first table in the paper.

What the paper actually did

For each student, they recorded three variables:

• Gender
• Ethnicity (Black, Latino, Asian)
• Poverty (average poverty rate in the student's census block)

For the students who enrolled in a 4-year college, they recorded four variables about that college:

• The number of students at the college
• The percentage of full-time students
• The student-faculty ratio
• The college's average graduation rate

They standardized all the variables to have zero mean and unit variance (except for gender and ethnicity, since these are binary). For example, GPA=0 for someone with average grades, and GPA=-2 for someone 2 standard deviations below average.
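Standardization here is the usual z-score transformation. A minimal sketch, with hypothetical GPAs:

```python
def standardize(xs):
    # Rescale to zero mean and unit variance (z-scores).
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return [(x - mean) / sd for x in xs]

# Hypothetical GPAs: the average student maps to 0,
# everyone else to their distance in standard deviations.
gpas = [2.0, 2.5, 3.0, 3.5, 4.0]
print([round(z, 2) for z in standardize(gpas)])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```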

With this data in hand, they fit a bunch of models.

First, they predicted graduation rates from grades alone. Higher grades were better. There's nothing really surprising here, so let's skip the details.

Second, they predicted graduation rates from ACT scores alone. Higher ACT scores were better. As you'd expect this relationship is strong. Again, let's skip the details.

Third, they predicted graduation rates from grades, student variables, and variables for the college the student enrolled at. This model gets a "likely-to-graduate" score for each student as follows. This labels student background variables and college institutional variables in different colors for clarity.

The "likely-to-graduate" score becomes a probability after a sigmoid transformation. If you're not familiar with sigmoid functions, think of them like this: If a student has a score is X then graduation probability is around .5 + .025 × X. For larger X (say |X|>1) scores start to have diminishing returns, since probabilities must be between 0 and 1.

For example, the coefficient for (male) above is -.092. This means that a male has around a 2.3% lower chance of graduating than an otherwise identical female. (For students with very high or very low scores the effect will be less.)
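The sigmoid and the 0.25-slope rule of thumb can be checked numerically. The -.092 coefficient is from the paper's model; the rest is illustration:

```python
import math

def sigmoid(x):
    # Map a real-valued score to a probability in (0, 1).
    return 1 / (1 + math.exp(-x))

# Near x = 0 the sigmoid is roughly linear with slope 1/4, so a small change
# d in the score moves the probability by about 0.25 * d.
coef_male = -0.092  # the paper's coefficient for (male)
print(round(sigmoid(0.0), 2))                       # 0.5
print(round(sigmoid(coef_male) - sigmoid(0.0), 3))  # about -0.023, i.e. ~2.3% lower
```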

Fourth, they predicted graduation rates from ACT scores, student variables, and college variables.

The dependence on ACT is less than the dependence on GPA in Model 3. However, the dependence on student background and college variables is much higher.

Fifth, they predicted graduation rates from GPAs, ACT scores, student variables, and college variables.

Here, there's minimal dependence on ACT, but a negative dependence on ACT², meaning that extreme ACT scores (high or low) both lead to lower likely-to-graduate scores.

Does that seem counterintuitive to you? Remember, we are taking a student who is already enrolled in a particular known college and predicting how likely they are to graduate from that college.

Sixth, they predicted graduation rates from the same stuff as in the previous model, but now adding mean GPA and ACT for the student's school. They also now standardize some variables relative to each high school.

I can't tell which variables are affected by this change in standardization. My guess is that it's just GPA and ACT, but it might affect other variables too.

What does this mean?

I mean... not much?

Here's what these models do: Take a student with a certain GPA, ACT scores, background who is accepted to and enrolls in a given college. How likely are they to graduate?

It's true that these models have small coefficients in front of ACT. But does this mean ACT scores aren't good predictors of preparation for college? No. ACT scores are still influencing who enrolls in college and what college they go to. These models made that influence disappear by dropping all the students who didn't go to college, and then conditioning on the college they went to.

These models don't say much of anything about how college admissions should work. There are three reasons why.

First, these models are conditioning on student background! Look at the coefficients in Model 5. What exactly is the proposal here, to do college admissions using those coefficients? So, college should explicitly penalize men and poor students like this model does? Come on.

Second, test scores influence if students go to college at all. This entire analysis ignores the 67% of students who don't enroll in college. The paper confirms that ACT scores are a strong predictor of college enrollment.

Of course, many factors influence if a student will go to college. Do they want to? Can they get in? Can they afford it?

You might say, "Well of course the ACT is predictive here -- colleges are using it." Sure, but that's because colleges think it gauges preparation. It's possible they're wrong, but... isn't that kind of the question here? It's absurd to assume the ACT isn't predictive of college success, and then use that assumption to prove that the ACT isn't predictive of college success.

Third, for students who go to college, test scores influence which college they go to, and more selective colleges have higher graduation rates. Here are three private colleges in the Boston area and three public colleges in Michigan.

| College | Acceptance rate | Average graduation rate |
|---|---|---|
| Harvard University | 5% | 98% |
| Northeastern University | 18% | 85% |
| Suffolk University | 84% | 63% |
| University of Michigan - Ann Arbor | 23% | 92% |
| Michigan State University | 71% | 80% |
| Grand Valley State University | 83% | 60% |

The paper also does a regression on students who go to college to try to predict the graduation rate of the college they end up at. Again, GPA and ACT scores are about equally predictive.

Of course, you could also drop the student background and college variables, and just predict from GPA and ACT. But remember, we did that above, and the ACT was extremely predictive.

Alternatively, I guess you could condition on student background without conditioning on the college students go to. I doubt this is a good idea or a realistic idea, but at least it's causally possible for colleges to use such a model to do admissions.

Why didn't the authors do this? Well... Actually, they did.

Unfortunately, this is sort of hidden away in a corner of the paper, and the only coefficients given are those for GPA and ACT. The model includes student characteristics like race, but the paper doesn't say what those coefficients are, and it's not clear exactly which other variables are included. The authors were not able to provide the other coefficients (nor to even acknowledge multiple polite requests notthatimbitteraboutit).

The laundering of unproven claims

What happened? There's really nothing fundamentally wrong in the paper. It fits some models to some data and gets some coefficients! Interpreted carefully, it's all fine. And the paper itself never really pushes anything beyond the line of what's technically correct.

Somehow, though, the second the paper ends and the press release starts, all that is thrown out the window. Rather than "ACT scores definitely predict college graduation, but they don't seem to give extra information once you condition on which college students enroll in and on student demographic variables in an implausible way", we get "ACT scores don't predict college success".

To be fair, a couple hedges like "once school effects are controlled" make their way into the articles but are treated as a minor technical aside and never explained.

Let's separate a bunch of claims.

1. It might be desirable to reduce the influence of test scores on college admissions to achieve worthy social goals.
2. It might be that test scores don't predict college graduation rates.
3. It might be that test scores only predict college graduation because selective (and high graduation-rate) colleges choose to use them in admissions.
4. It might be that if selective colleges stopped using test scores in admissions, test scores would no longer predict graduation.

I'm open to claim #1 being true. If you believe #1, it would be convenient if #2, #3, and #4 were true. But the universe is not here to please us. #2 is not just unproven but proven to be false. This paper does not provide evidence for #3 or #4. Yet because these claims were inserted into the public narrative after peer review, we have a situation where the paper isn't wrong, yet it is being used as evidence for claims it manifestly failed to establish.

Journals don't issue retractions for press releases.

A field guide

There's a fair number of errors and undefined notation in the paper, which might throw you off if you try to read it. I've created a guide to help with this.

Discuss

### Covid 4/15: Are We Seriously Doing This Again

April 15, 2021 - 16:00
Published on April 15, 2021 1:00 PM GMT

Yes we are. It can happen here

THIS IS LITERALLY ONE IN A MILLION AND MUCH LESS THAN THE BASE RATE WHAT THE ACTUAL FUCK IS WRONG WITH YOU DID YOU NOT SEE WHAT HAPPENED LAST TIME ARE YOU COMPLETE MORONS OR ARE YOU MUSTACHE-TWIRLING VILLAINS YOU CAN’T NOT BE BOTH, AS IN IF YOU’RE NOT MORONS AND I LOOK AT PHOTOGRAPHS I WILL SEE MUSTACHES AND YOU PEOPLE WILL BE TWIRLING THEM:

If any of them don’t have mustaches, we need to get them some clip-on ones, because while lots of people die at least they should get to enjoy the pleasures of twirling.

Yes I am fully aware that it is technically a particular rarer blood clotting disorder that is happening here and thus in that subclass it is above the base rate and that there’s an argument this might be ‘real’ in some sense and no I do not care even a little bit about any of that and no I am not going to treat this with the dignity and respect that it does not in any way even potentially deserve. There are scientific details and if you find them interesting by all means read about them but I am ignoring them because, like the points, They. Do. Not. Matter.

In case you were wondering how people were going to react or what this would do to public confidence, these are from less than an hour after the announcement:

I mean, they’re wrong, but I can’t fault their reasoning from where they sit; if you asked me the ‘which is more likely’ game back in 2019 I would most definitely not have gone with ‘no really they’re doing this in a pandemic because of six cases.’

Again, seems logical to conclude they were rushed if they act in a way that would only make sense if they actually did rush.

The first time around with AstraZeneca, I could sort of understand the argument for the other side of the hesitancy effect when I squinted, that this would look like the Very Serious People Take Vaccine Safety Seriously and therefore we should now expect the people to trust the FDA more, and being untrustworthy stewards who kill a bunch of people in order to fool the public into thinking we are trustworthy is a tradeoff they thought we should make, but we ran the experiment on that hypothesis, and, yeah, no.

Seriously, people are not so stupid:

Also, when you keep saying loudly that any adverse things that happen will destroy your credibility, consider your credibility preemptively destroyed already because you’re either right or wrong:

This is going to permanently supercharge the anti-vax movement, not only on Covid but also in general, and kill a huge number of people. Over six cases. Not deaths. Cases. Six.

You know how many people died?

ONE. F***ing ONE.

No, this was not ‘going to come out.’ It was going to completely correctly get ignored. All they had to do was put it in the list of side effects and note it was extremely rare.

This is the Washington Post’s attempt to chronicle what was going through their minds, it’s sympathetic but doesn’t make the decision look any less absurd  if you actually think about the physical situation at all or how real people would react to it.

If I had to steelman the case being made, it’s a combination of believing that acting over-the-top paranoid about side effects makes people feel more confident rather than less confident, that a pause to inform people can meaningfully impact care for this rare type of blood clot, and thinking that until one looks at the data who knows how big the problem might be and one shouldn’t assume the math is right until you check it, so we should halt and catch fire for a day and then quickly convene a meeting to confirm that this is only going to kill one person in six million.

Even in a world in which the initial pause wasn’t crazy, there was a meeting the next day to go over the information, and the decision was made to wait 7-10 days and then meet again without making a choice about the pause (and obviously, here, if you choose not to decide you still have made a choice). They didn’t even make the ‘compromise’ decision of halting for young women (and yes, ‘people who are in the subpopulation that is often on birth control, which causes orders of magnitude more blood clots than this’ seems like it’s a hint on what’s happening) and continuing for everyone else, since you can then swap doses between different groups and keep up your pace of vaccinations while you ‘investigate further’ whatever that means here. The failure to at least make that decision is obviously completely bonkers even if you somehow think the initial decision to halt and catch fire was reasonable, as laid out in this thread by someone who supported the first decision but at least supported the ‘compromise’ option at the meeting.

Here’s an argument that this isn’t so bad in the United States, as it will mostly only destroy faith in Johnson & Johnson, rather than faith in the mRNA vaccines as well, or all vaccines generally:

At a minimum, while we prepare to do that, we can at least implement Tyler’s modest proposal.

Let’s run the (other, not equal to one or six) numbers.

The Numbers

Predictions

Prediction from last week: Positivity rate of 5.9% (up 0.4%) and deaths decline by 8%.

Once again Washington Post’s numbers baffle me, although this being six rather than seven days later makes them not impossible. Somehow tests fell, cases rose, and the positivity rate barely budged.

A key question whenever one gets good news on deaths is whether this is good news or whether it’s time shifted. If it’s cases shifting into the future, it means the next week looks doubly worse and on top of that you were fooled by what looked like a downward trend. Similarly, bad news can be a mirage from old cases. It now looks like the death rate decline has stalled out, which is unfortunate.

Predictions for next week: Positivity rate of 5.8% (up 0.2%) and deaths unchanged.

Deaths

| Date | WEST | MIDWEST | SOUTH | NORTHEAST | TOTAL |
|---|---|---|---|---|---|
| Feb 25-Mar 3 | 3834 | 1669 | 5610 | 1958 | 13071 |
| Mar 4-Mar 10 | 2595 | 1775 | 3714 | 1539 | 9623 |
| Mar 11-Mar 17 | 1492 | 1010 | 3217 | 1402 | 7121 |
| Mar 18-Mar 24 | 1823 | 957 | 2895 | 1294 | 6969 |
| Mar 25-Mar 31 | 1445 | 976 | 2564 | 1262 | 6247 |
| Apr 1-Apr 7 | 1098 | 867 | 1789 | 1160 | 4914 |
| Apr 8-Apr 14 | 1070 | 1037 | 1621 | 1145 | 4873 |

Half or more of the Midwest increase is quirky data in Missouri, but that doesn’t make any of this good news, and it’s likely deaths are going to now be stable or go slightly up, along with cases, until we get enough vaccinations to turn things around.

Cases

| Date | WEST | MIDWEST | SOUTH | NORTHEAST |
|---|---|---|---|---|
| Feb 25-Mar 3 | 66,151 | 58,295 | 151,253 | 115,426 |
| Mar 4-Mar 10 | 62,935 | 57,262 | 114,830 | 109,916 |
| Mar 11-Mar 17 | 49,696 | 59,881 | 109,141 | 115,893 |
| Mar 18-Mar 24 | 47,921 | 72,810 | 99,568 | 127,421 |
| Mar 25-Mar 31 | 49,669 | 93,690 | 102,134 | 145,933 |
| Apr 1-Apr 7 | 52,891 | 112,848 | 98,390 | 140,739 |
| Apr 8-Apr 14 | 60,693 | 124,161 | 110,995 | 137,213 |

Looking at this chart, it seems clear the Midwest’s problems are real. The final wave is out in force there, even if it’s relatively tame in other places.

Given the increase in positive tests, and the report of a continued decline in test counts, I’m willing to believe that positive rates did go up ~0.4% in the past week, which Johns Hopkins confirms (although they have lower numbers on both ends than WaPo does), so the prediction miss was mostly about doing it based on Friday’s number or some similar quirk (or a math error on their end somewhere).

Things in many places other than the United States are quite bad. In India, they surpassed 200,000 cases per day and things are rapidly getting worse, and there are many other places that have big problems. Aside from the places that successfully did full suppression, the places doing actively well are the ones with strong vaccination campaigns. Facing the new strains while not keeping up in vaccinations is a very bad place to be right now.

Vaccinations

That small decline at the end might have something to do with the J&J suspension, or it could mostly be a random quirk. Either way, even without J&J, we still should be able to continue slowly expanding our vaccination rates until we hit a wall where we run out of people who want a shot put into their arm. There are signs this is starting to be an issue in some places, but mostly there’s still plenty of people eager to put the pandemic behind them.

As furious as I am at the J&J suspension, and as many people as it’s going to kill (most of which will be from disadvantaged groups and areas, which J&J’s one shot at room temperature made much easier to reach), it is important not to lose perspective. J&J was a small portion of our vaccine effort, and case growth is not that rapid, so it’s not going to kill hundreds of thousands of people, at least not in America. If we’re lucky, it will only kill thousands.

Vaccine Passport Hype

Washington Post reports on New York’s Excelsior pass, the first one of its kind. The conclusion was that it’s relatively easy to use and has reasonable privacy protections given the circumstances, but that unless you’re counting on ID checks to catch fraud it’s trivial to fake by copying someone else’s pass. That sounds about right. It’s clear by now that there isn’t going to be a national system and that New York is the exception rather than the rule. That doesn’t render the questions moot, but it lowers their urgency and importance quite a lot.

Tyler’s position is that we should be planning full reopening, and that passports seem more likely to hinder that than help. That’s one of the key disagreements. Is the alternative to passports a full reopening, or is it more restrictions? My guess remains that not being able to check leads to more restrictions in the medium term (next few months), but there’s a point when that flips, and things that would have fully reopened without checking would, if given the opportunity to easily check vaccine status, continue to check that status for a while longer. We then have to balance these needs. My guess is that the ‘overtime’ period is 50% likely to last at least two months or so, but highly unlikely (<10%) to last for six, and that the ‘extra game time’ period when passports would help starts now and has at least three months to go most (75%) of the time, and there’s a decent chance (25%) it’s six months or more in at least many blue areas, so one can do a cost/benefit calculation with this plus all the other objections. Here I’m counting the extra restrictions as pure downside, because even with them the net risk is likely higher than with the pass, unless we’re checking physical cards at the door, which is a different cost/benefit tradeoff.

The other half of that is the argument from focus. If the country and discourse only have so many focus points (Imperial Focus Points!), which seems basically right, then it’s plausible that all the work on passports delays the full reopening not because of lowering the costs of not reopening fully, but by preventing the attention and blame pressure required to generate the reopenings. Doing anything at all, in this model, has high opportunity costs. I don’t think I value this as highly as Tyler but I’ve likely not been giving it its due.

Vaccines Still Work

The J&J suspension goes hand in hand with the ongoing campaign to convince the public that vaccines work, but don’t work in the sense of accomplishing anything for people. In the name of some combination of proving one’s Very Serious Person credentials, maximizing the quantity of economic harm and scaring people as much as possible, there’s a competition on how to give the impression that being out there is unsafe for the fully vaccinated.

What CNN is saying might be technically correct. A model where 90% of people who are vaccinated are fully safe, while 10% remain at similar risk to before vaccination, is simplified but mostly plausible. What CNN is technically saying here is that there are 100k people who are being exposed to possibly getting infected (look around, could it be you?) and Zeynep is pointing out that this is damn well written to give the ordinary person the impression that if we didn’t Do Something About It that one in ten people who fly vaccinated would get infected, so if you’re vaccinated and fly that way there’s a 10% chance you get infected, which is of course complete nonsense.

Even without this willful mislead it’s still terrible and leads to scaremongering, but this here is something special. There should be some kind of award for such things.

Also, Nate Silver is correct here, and it would be dishonest to treat ‘we don’t know if vaccinated people can transmit’ FUD spreading as anything but gaslighting.

Vaccinated people are almost certainly less infectious when they do get infected, on top of not getting infected. The reduction in risk to others is ‘we don’t know’ to the extent that it might be much safer than the 90-95% range in which it reduces risk of infection.

Anyone who tells you otherwise is either lying to you, or is believing the lies told to them by others. Those who continue to treat vaccinated people as risky to others, and avoid living life on that basis, are making a choice to not live life in order to send some sort of social message or tell themselves a story about the type of person they are, or some other not-physical-reality based motivation. Or they just aren’t that into you and it’s a convenient excuse.

That doesn’t mean risk for the vaccinated is zero, precautions that are cheap are worth taking and ‘stupid stuff’ is worth avoiding, and one should follow mask norms for social reasons cause it’s really not that big a deal, but on this ‘we don’t know if it works’ thing, seriously: Stop. Just stop.

Similarly, Zeynep has a thread here about wildly misleading headlines about effectiveness of the vaccines against variants. Studies that find the vaccines work fine are being reported as ‘vaccines don’t work as well’ in a way that has nothing to do with any practical implications. The practical implications are that they work just fine, thanks, and it’s so clear I’m not even going to go into it beyond that. Attempts like this one to scare people about this are pure gaslighting.

And act more like this (video, which is living its best life):

Which is why this is basically where we are:

That’s before the whole Johnson & Johnson mess.

In Other News

Yep.

I mean, at least it’s true.

More support for first doses first. As Tyler notes, it’s too late for America to benefit from this, but the rest of the world still could. The same goes for fractional dosing. From what I’ve seen, a lot of people are having a really unpleasant day or two after the second dose of Moderna, and my strong hunch is the severity is caused by using a dose that’s twice as big as it needs to be, and it would be actively better for them to get half doses plus we’d have twice as many doses to give out.

Canadians return home via taxis from Buffalo to avoid quarantine. They did indeed solve for the equilibrium.

Australia might not open its borders even after full vaccination. The hypothesis that Australia succeeded because it was using good epistemics to make decisions is not holding up well in the endgame.

Covid Tracking Project offers thoughts on data source issues. I miss them deeply.

Discuss