# LessWrong.com News

A community blog devoted to refining the art of rationality
Updated: 1 day 5 hours ago

### Thoughts on Voting Methods

November 17, 2020 - 23:23
Published on November 17, 2020 8:23 PM GMT

I've been nerd-sniped by voting theory recently. This post is a fairly disorganized set of thoughts.

Condorcet Isn't Utilitarian

The Condorcet criterion doesn't make very much sense to me. My impression is that a good chunk of hard-core theorists think of this as one of the most important criteria for a voting method to satisfy. (I'm not really sure if that's true.)

What the Condorcet criterion says is: if a candidate would win pairwise elections against each other candidate, they should win the whole election.

Here's my counterexample.

Consider an election with four candidates, and three major parties. The three major parties are at each other's throats. If one of them wins, they will enact laws which plunder the living daylights out of the losing parties, transferring wealth to their supporters.

The fourth candidate will plunder everyone and keep all the wealth. However, the fourth candidate is slightly worse at plundering than the other three.

We can model this scenario with just three voters for simplicity. Here are the voter utilities for the different candidates:

| Voter \ Candidate | A | B | C | D |
|---|---|---|---|---|
| 1 | 100 | 0 | 0 | 1 |
| 2 | 0 | 100 | 0 | 1 |
| 3 | 0 | 0 | 100 | 1 |

D would beat everyone in a head-to-head election. But D is the worst option from a utilitarian standpoint!! Furthermore, I think I endorse the utilitarian judgement here. This is an election with only terrible options, but out of those terrible options, D is the worst.
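The counterexample can be checked mechanically. Here is a minimal sketch, assuming each voter votes honestly according to the utilities in the table; the names `UTILITIES`, `condorcet_winner`, and `total_utility` are made up for this illustration.

```python
# Voter utilities from the example: one dict per voter, keyed by candidate.
UTILITIES = {
    1: {"A": 100, "B": 0, "C": 0, "D": 1},
    2: {"A": 0, "B": 100, "C": 0, "D": 1},
    3: {"A": 0, "B": 0, "C": 100, "D": 1},
}


def condorcet_winner(utilities):
    """Return the candidate who wins every head-to-head majority vote, or None."""
    candidates = sorted(next(iter(utilities.values())))
    for x in candidates:
        beats_everyone = True
        for y in candidates:
            if x == y:
                continue
            # Assumption: a voter backs whichever of the two gives them more utility.
            prefer_x = sum(1 for u in utilities.values() if u[x] > u[y])
            prefer_y = sum(1 for u in utilities.values() if u[y] > u[x])
            if prefer_x <= prefer_y:
                beats_everyone = False
                break
        if beats_everyone:
            return x
    return None


def total_utility(utilities):
    """Sum each candidate's utility over all voters."""
    candidates = next(iter(utilities.values()))
    return {c: sum(u[c] for u in utilities.values()) for c in candidates}


if __name__ == "__main__":
    print("Condorcet winner:", condorcet_winner(UTILITIES))
    print("Total utility:", total_utility(UTILITIES))
```

Under these assumptions D wins each pairwise contest 2 to 1, while its total utility is 3 against 100 for each of the others.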

VSE Isn't Everything

VSE is a way of basically calculating a utilitarian score for an election method, based on simulating a large number of elections. This is great! I think we should basically look at VSE first, as a way of evaluating proposed systems, and secondarily evaluate formal properties (such as the Condorcet criterion, or preferably, others that make more sense) as a way of determining how robust the system is to crazy scenarios.
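To make the idea concrete, here is a rough sketch of a VSE-style score. It assumes the usual normalization as I understand it (the winner's average utility, rescaled so that a random candidate scores 0 and the utilitarian-best candidate scores 1), honest voters, and uniformly random utilities. All function names are hypothetical and this is not the code behind any published VSE figures.

```python
import random


def random_elections(n_elections, n_voters, n_candidates, seed=0):
    """Generate utility matrices: elections[e][voter][candidate], uniform in [0, 1)."""
    rng = random.Random(seed)
    return [
        [[rng.random() for _ in range(n_candidates)] for _ in range(n_voters)]
        for _ in range(n_elections)
    ]


def plurality(utilities):
    """Each voter votes for their favorite; the candidate with most votes wins."""
    n = len(utilities[0])
    votes = [0] * n
    for voter in utilities:
        votes[max(range(n), key=lambda c: voter[c])] += 1
    return max(range(n), key=lambda c: votes[c])


def utilitarian_oracle(utilities):
    """Idealized method that always picks the candidate with the highest total utility."""
    n = len(utilities[0])
    return max(range(n), key=lambda c: sum(voter[c] for voter in utilities))


def vse(elections, method):
    """(winner - random) / (best - random), with averages taken over all elections."""
    winner_total = best_total = random_total = 0.0
    for utilities in elections:
        n = len(utilities[0])
        social = [sum(v[c] for v in utilities) / len(utilities) for c in range(n)]
        winner_total += social[method(utilities)]
        best_total += max(social)
        random_total += sum(social) / n
    return (winner_total - random_total) / (best_total - random_total)


if __name__ == "__main__":
    elections = random_elections(n_elections=500, n_voters=25, n_candidates=4)
    print("oracle   :", vse(elections, utilitarian_oracle))
    print("plurality:", vse(elections, plurality))
```

Note that the candidates here are drawn independently at random for every election, which is exactly the simplification discussed in the sections below.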

But I'm also somewhat dissatisfied with VSE; I think there might be better ways of calculating statistical scores for voting methods.

Candidate Options Matter

As we saw in the example for Condorcet, an election can't give very good results if all the candidates are awful, no matter how good the voting method.

Voting Methods Influence Candidate Selection

Some voting methods, specifically plurality (aka first-past-the-post) and instant runoff voting, are known to create incentive dynamics which encourage two-party systems to eventually emerge.

In order to model this, we would need to simulate many rounds of elections, with candidates (/political parties) responding to the incentives placed upon them for re-election. VSE instead simulates many independent elections, with randomly selected candidates.

Candidate Selection Systems Should Be Part of the Question

Furthermore, even if we ignore the previous point and restrict our attention to single elections, it seems really important to model the selection of candidates. Randomly selected candidates will be much different from those selected by the Republican and Democratic parties. These democratically selected candidates will probably be much better, in fact -- both parties know that they have to select a candidate who has broad appeal.

Furthermore, this would allow us to try and design better candidate selection methods.

I admit that this would be a distraction if the goal is just to score voting methods in the abstract. But if the goal is to actually implement better systems, then modeling candidate selection seems pretty important.

Utilitarianism Isn't Friendly

Suppose I modify the example from the beginning, to make the fourth candidate significantly worse at plundering the electorate:

| Voter \ Candidate | A | B | C | D |
|---|---|---|---|---|
| 1 | 100 | 0 | 0 | 33 |
| 2 | 0 | 100 | 0 | 33 |
| 3 | 0 | 0 | 100 | 33 |

Candidate D is still the utilitarian-worst candidate, by 1 utilon. But now (at least for me), the Condorcet-winner idea starts to have some appeal: D is a good compromise candidate.

We don't just want a voting method to optimize total utility. We also want it to discourage unfair outcomes in some sense. I can think of two different ways to formalize this:

• Discourage wealth transfers. This is the more libertarian/conservative way of thinking about it. Candidates A, B, and C are bad because they take wealth from one person and give it to another person. This encourages rent-seeking behavior through regulatory capture.
• Encouraging equitable outcomes. A different way of thinking of it is that candidates A, B, and C are terrible because they create a large amount of inequality. This could be formalized by maximizing the product of utilities in the population rather than the sum, in keeping with Nash bargaining theory. Or, more extreme, we could maximize the minimum (in keeping with Rawls).

These two perspectives are ultimately incompatible, but the point is, VSE doesn't capture either of them. In doing so, it allows some very nasty dynamics to be counted as high VSE.
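For the modified example, the three aggregation rules mentioned above (sum, Nash-style product, Rawls-style minimum) can be compared directly. This is only an illustrative sketch; `score` and `best` are made-up helper names.

```python
from math import prod

# Voter utilities from the modified example (D now gives everyone 33).
UTILITIES = [
    {"A": 100, "B": 0, "C": 0, "D": 33},
    {"A": 0, "B": 100, "C": 0, "D": 33},
    {"A": 0, "B": 0, "C": 100, "D": 33},
]


def score(utilities, aggregate):
    """Apply an aggregation function (sum, prod, min, ...) to each candidate's utilities."""
    candidates = utilities[0].keys()
    return {c: aggregate([u[c] for u in utilities]) for c in candidates}


def best(scores):
    """Return all candidates tied for the top score, sorted by name."""
    top = max(scores.values())
    return sorted(c for c, s in scores.items() if s == top)


if __name__ == "__main__":
    for name, aggregate in [("sum", sum), ("product", prod), ("minimum", min)]:
        scores = score(UTILITIES, aggregate)
        print(f"{name:8s} {scores} -> best: {best(scores)}")
```

The sum ranks D last (99 versus 100), while both the product and the minimum rank D first, since A, B, and C each leave two voters with nothing.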

Obviously, the Condorcet criterion does capture this -- but, like maximizing the minimum voter's utility, I would say it strays too far from utilitarianism.

Selectorate Theory

This subsection and those that follow are based on reading The Dictator's Handbook by Bruce Bueno de Mesquita. (You might also want to check out The Logic of Political Survival, which I believe is a more formal version of selectorate theory.) For a short summary, see The Rules for Rulers video by CGP Grey.

The basic idea is that rulers do whatever it takes to stay in power. This means satisfying a number of key supporters, while maintaining personal control of the resources needed to maintain that satisfaction. If the number of supporters a ruler needs to satisfy is smaller, the government is more autocratic; if it is larger, the government is more democratic. This is a spectrum, with the life of the average citizen getting worse as we slide down the scale from democracy to autocracy.

Bruce Bueno de Mesquita claims that the size of the selectorate is the most important variable for governance. I claim that VSE does little to capture this variable.

Cutting the Pie

Imagine the classic pie-cutting problem: there are N people and 1 pie to share between them. Players must decide on a pie-cutting strategy by plurality vote.

There is one "fair" solution, namely to cut a 1/N piece for each player. But the point of this game is that there are many other equilibria, and none of them are stable under collusion.

If the vote would otherwise go to the fair solution, then half-plus-one of the people could get together and say "Let's all vote to split the pie just between us!".

But if that happened, then slightly more than half of that group could conspire together to split the pie just between them. And so on.

This is the pull toward autocracy: coalitions can increase their per-member rewards by reducing the number of coalition members.

Note that VSE is unable to see a problem here, because of its utilitarian foundation. By definition, a pie-cutting problem results in the same total utility no matter what (and, the same average utility) -- even if the winner wins on a tiny coalition.
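The shrinking-coalition dynamic is easy to tabulate. The sketch below assumes that at every step the smallest strict majority of the current coalition defects and splits the whole pie evenly among themselves; `coalition_cascade` is a hypothetical name, and real coalitions need not follow this exact rule.

```python
from fractions import Fraction


def coalition_cascade(n_players):
    """Return [(coalition_size, share_per_member), ...] as majorities repeatedly defect."""
    steps = []
    size = n_players
    while True:
        steps.append((size, Fraction(1, size)))
        smaller = size // 2 + 1  # smallest strict majority of the current coalition
        if smaller >= size:  # no smaller majority exists, so the cascade stops
            break
        size = smaller
    return steps


if __name__ == "__main__":
    for size, share in coalition_cascade(100):
        # The total handed out is always the whole pie, whatever the coalition size.
        print(f"coalition of {size:3d}: each member gets {share}, total {size * share}")
```

The per-member share rises at every step while the total stays at exactly one pie, which is why a purely utilitarian score sees nothing wrong.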

VSE's failure to capture this also goes back to its failure to capture the problem of poor options on ballots. If the fair pie-cut was always on the ballot, then a coalition of less than 50% should never be able to win. (This is of course not a guarantee with plurality, but we know plurality is bad.)

Growing the Pie

Of course, the size of the pie is not really fixed. A government can enact good policies to grow the size of the pie, which means more for everyone, or at least more for those in power.

Bruce Bueno de Mesquita points out that the same public goods which grow the economy make revolution easier. Growing the pie is not worth the risk for autocracies. The more autocratic a government, the less such resources it will provide. The more democratic it is, the more it will provide. Growing the pie is the only way a 100% democracy can provide wealth to its constituents, and is still quite appealing to even moderately democratic governments. (He even cites research suggesting that between states within the early USA, significant economic differences can be largely explained by differences in the state governments. The effective amount of support needed to win in state elections in the early USA differed greatly. These differences explain the later economic success of the northern states better than several other hypotheses. See Chapter 10 of The Dictator's Handbook.)

Bruce Bueno de Mesquita argues that this is the reason that democracy and autocracy are each more or less stable. A large coalition has a tendency to promote further democratization, as growing the coalition has a tendency to grow the pie further. A small coalition has no such incentive, and instead has a tendency to contract further.

VSE can, of course, capture the idea that growing the pie is good. But I worry that by failing to capture winning coalition size, it fails to encourage this in the long term.

How can we define the size of the winning coalition for election methods in general, and define modifications of VSE which take selectorate theory into account?

Discuss

### How the Moderna vaccine works, and a note about mRNA vaccines

November 17, 2020 - 20:22
Published on November 17, 2020 5:22 PM GMT

Epistemic status: Pretty confident I have it right. I'm not an expert, but I'm asking for feedback from experts, and changes would be added here.

The Moderna vaccine contains a piece of code (RNA) which asks your body to create a protein that is very similar to the SARS-COV-2 spike protein (which the virus uses to attach to your cells), but modified in such a way that it doesn't change shape when it touches the ACE-2 receptor (another protein that's on the surface of your cells). That's because the easiest place for an antibody to attach to gets hidden otherwise. Your body then says, "Hey, that's something new! Let's attack it!" and tries out a bunch of different things.

The good thing about the vaccine is that there are no viral particles. The only thing it contains is the code (plus other code that doesn't get used to generate things, but is useful in keeping the rest of the code stable). And, unlike the Pfizer vaccine, it's not self-propagating. In other words, the amount of RNA you get in your two shots would be all that's needed, and your body doesn't make extra strands of RNA that would ask for more spike proteins. And it gets a better immune response than normal COVID, which is awesome.

Side effects are minimal. Fewer than 2% of people get a fever, and most don't even have a headache. Just some muscle pain etc. near the injection site.

(The Pfizer vaccine works differently, but it's a similar idea. RNA code to produce the antigen to get an antibody response.)

These are the first ever mRNA vaccines. I've been following Moderna for a few years, and they were working on a MERS vaccine in the past. 4 days after the SARS-COV-2 genome was uploaded, they'd decided which sections they want to use, and which edits needed to be made, and they started manufacturing it (not mass production) literally the next day, and that was still in January. It's very modular, easy to change and modify, and most labs would be able to make their own vaccines with machines they already have.

Discuss

### Comparing Covid and Tobacco

November 17, 2020 - 19:13
Published on November 17, 2020 4:13 PM GMT

Tobacco kills 5 million people every year [1]. Covid probably won't pass 5 million this year regardless of our policies or behavior. And yet, Covid has received far more of our scarce political attention than Tobacco. We have accepted an increase of 150 million people in global severe poverty and trillions in economic damage to prevent Covid deaths. Tobacco eradication has received far less attention. What do you think are the biggest reasons for this difference?

1. Covid is new, people over-react to new threats.
2. Covid affects the most politically organized and powerful world demographics: Old people in rich countries. Tobacco affects poor old people in poor countries.
3. Covid is more tractable than Tobacco (the cost of preventing a Covid death is lower than preventing a Tobacco death). This seems unlikely, but I'm open to an argument.
4. The tax incentives cause governments to neglect the Tobacco problem.
5. People individually do not mind staying at home and watching Netflix. They therefore share/read/write more about Covid.
6. The world's policy elite knows people with Covid but knows very few tobacco addicts in Lebanon or China.
7. The policy elite believes people can rationally decide to consume tobacco (hurting only themselves) but not to forgo social distancing (hurting others).
8. The Covid attention results from a massive availability cascade. Once an issue becomes available enough to the policy elite, its salience is self-reinforcing.
9. Individuals can affect Covid, but organizing Tobacco policy NGOs for developing countries is a more complicated model.

Thoughts?

1. WHO (World Health Organization). 2012b. "Why Tobacco Is a Public Health Priority." www.who.int/tobacco/health_priority/en/.

Discuss

### Writing to think

November 17, 2020 - 10:54
Published on November 17, 2020 7:54 AM GMT

There are a lot of things that I want to write blog posts about. I find myself feeling like I have something useful to say about a topic, and I want to say it. But when I actually sit down to get started, I run into problems.

• Sometimes I don't know how to explain what I want to say.
• Sometimes -- no, quite often, I can explain it reasonably well abstractly, but I can't think of good, concrete examples, and without those the post doesn't feel good enough to be worth posting.
• Sometimes the subject matter feels like it's not important enough.
• Sometimes I feel like I'm on to something, but the subject matter is something I only have an amateur's understanding of. I don't want to make noob mistakes in the post, but I also don't want to spend the time doing the research. Or maybe I still have trouble understanding it even after I do the research.
• Sometimes I just question whether or not my idea is actually a good one.

There's an insight I learned from Paul Graham in The Age of the Essay that I think addresses all of this. A lot of people want to collect their thoughts first before starting the process of putting them down on paper, and to address all of the hesitations I mention above before getting started. You don't want to publish something that has these issues, so you may as well resolve them before you start writing, right? Seems pretty logical.

Here's the problem though. The act of writing can help you to resolve the issues. Actually, that's a huge understatement: it's enormously helpful. Someone who writes in this exploratory sense has a huge leg up on someone who tries to resolve the issues in their head. It's almost like trying to solve an algebra problem in your head vs. with paper and pencil. Writing seems to have a way of boosting your IQ by 20 points.

Here's an interesting thought that's never occurred to me before. There are various bloggers/writers who I keep up with: Scott Alexander, Robin Hanson, Paul Graham, Tim Urban. They're all smart and have lots of great ideas. I've always assumed that in order to be a good writer like them, you have to be smart and have good ideas first. I.e., that it's a prerequisite. But what if it's the opposite? What if they're smart and have good ideas because they spend a lot of time writing? Maybe the arrow of causality is reversed. Strictly speaking, I'm presenting a false dichotomy here. It's not one or the other. But I suspect that a big reason why these guys are all so smart is that they spend a lot of time writing.

I'm not sure why writing is this powerful. It doesn't seem like it should be. A small boost makes sense, but a superpower isn't something I would have predicted in advance.

Here's my hypothesis though. I think it has to do with working memory and mind wandering. Think of writing as putting a linear sequence of thoughts on paper. What's the advantage to them being on paper? Why not just think them in your head in that same sequence?

Well, one thing is that you might forget stuff in your head, but if it's on paper you can refer to it. It doesn't get lost. It seems like you should be able to maintain a pretty decent sequence of thoughts in your head, but I'm always surprised with how much I struggle to do so.

I'm able to do a much better job of not losing track when I am having a conversation though, as opposed to being alone with my thoughts, so it seems like the raw capacity to keep track is there. I suspect that mind wandering is the bigger issue. Both conversation and writing have a way of bringing you "back on track". Writing has always felt very meditative to me, and now that finally makes sense: meditation also is about preventing mind wandering and bringing yourself "back on track".

I hope that this post is the first of many. I want to start writing a lot more. I think that writing is a superpower. I'm on the bandwagon. Why not take advantage of it if it's available to me? I do have one big hesitation though: publishing.

Writing to think makes sense. But what if the end result still turns out crappy? What if it's meh? What if it's good but not great? Should you publish it to the world? I'm someone who leans towards saying no. I like to make sure it's pretty refined and high quality.

But that leads me to a catch-22: most thoughts I want to explore don't seem promising enough where I'd end up publishing them. Or, rather, they usually seem like they'd take way too much time to refine. And if I'm not going to publish them, well, why write them up in the first place?

Because of the title of this post: write to think. Duh. That's what we've been talking about this whole time. But somehow the monkey in my brain doesn't understand that, or just won't cooperate. I just can't motivate myself to write if it's not something I plan on publishing. Most of my ideas don't seem publish-worthy, so I end up not writing. But this is a very bad state that must change. Writing is a superpower, and I want to use it.

Part of the solution I'm going to attempt is just lowering my standards. Fuck it, you guys are just going to have to deal with my writing being shitty sometimes. I'd like to be able to look through my list of posts and feel content that each and every one is something that I put into the world because I am really proud of it and it deserves to be there, but that mindset just leads me to the catch-22.

Actually, I think it leads to a second catch-22 as well. When I look back at my old posts, I'm horrified by a lot of them, despite the fact that I tried to hold myself to this high standard for publishing. It's to the point where I want to say "that author is my past self, not current-me, and I don't want to associate with that past self." But this post right now is purely exploratory, and it feels like it's turning into one of my better posts. I suspect that by lowering the bar, it'll continue to lead to paradoxically high quality posts. To some extent at least.

Another part of the solution I'm going to attempt is to view blog posts as my motivation for learning something new. Let me explain. I was talking to a friend a few weeks ago about learning. I'm the type of person who reads textbooks and likes learning for learning's sake. He's the opposite. He needs a more concrete, practical reason. "Learn X because I want to achieve/solve Y, and X will help with that." I think I need to adopt that mindset more, and maybe publishable-quality blog posts that I'm proud of can be my Y.

I've been talking about writing from the perspective of it being a superpower that makes you smarter, more insightful, and a clearer thinker. Those are all things that I care about. However, there are two other reasons to write that I think might be even bigger.

The first is for mental health reasons. This is a great example of something I hesitate to write about because I have no expertise in mental health. But it's an insanely important topic. "Huge if true". Anyway, I do have a pretty strong intuition about the importance of writing for mental health, and I have read some books. You could probably say that I have a strong amateur's understanding of the field. Hopefully I'll expand on this in the future, but for now check out James Pennebaker's research and the research on memory reconsolidation if you're interested.

The second reason other than smarts why I think writing is crazy powerful is because it's fun! At least for me. But I strongly suspect that it is for you too. If you give it a proper chance. I think it's a human thing, not a me thing.

I remember when I was in college and started writing blog posts for the first time. I was working on a startup and wanted to write a few posts about the subject matter. But then I lost my mind. I enjoyed it so much that I stopped caring about the startup and started writing posts that had nothing to do with the startup I was working on. I felt guilty because it wasn't what I was "supposed" to be working on, but hey, whatever works! A strong sense of happiness like that is hard to come by, so I think that there's wisdom in just running with it.

Discuss

### Notes on Honor

November 17, 2020 - 08:25
Published on November 17, 2020 5:25 AM GMT

What is honor?

“The one who is conscious of his soul’s nobility will not endure a dishonorable life.” ―Sophocles

There are several different ways in which I see the concept of honor deployed, including:

1. Honor as a package of other virtues. An honorable knight, for example, is one who practices all of the various virtues in the code of chivalry. A person might dishonor themself, or their family, or their profession, by flagrantly violating any one of a set of virtues. In some contexts, “honor” is more of a euphemism for one particular virtue: a young woman in an old novel who “defends her honor” is really defending her chastity; when you address a judge as “your honor” you’re hoping to get the message through to the virtue of justice in particular. In other contexts, honor is an explicit role-based code-of-conduct such as Omertà or the Hippocratic Oath.
2. Honor as reliability in the practice of virtues. When a boy scout says, “and that’s the truth: scout’s honor” he is asserting that he takes his pledge of honesty more seriously than the typical person, because of an additional honor code he feels bound by. Sometimes this facet goes by the name “rectitude”.
3. Honor as extraordinary investment in one’s character. An honorable person may be defined as someone who strongly values his or her character, such that they will go to great lengths to avoid doing anything vicious or shameful (even if nobody else will ever know). Sometimes this facet is called “pride” (or in an inverted way, “a sense of shame”), “character,” or “dignity.”
4. Honor as public standing or reputation. There is also a sense of honor which means something like “unusually sensitive to one’s social status, and prone to take exceptional offense to being insulted” — from which you get things like “honor culture,” “honor killings,” and the like (see Tamler Sommers’s Why Honor Matters for a sympathetic look at this variety of honor). In this case, your character and dignity are your own, but your honor is determined by those around you, and you may be periodically called upon to prove it or to defend it against insults.

Something common to most of these is that honorable people tend to hold themselves to unusually high standards. Someone with a strong sense of honor is not satisfied with being “more or less as decent as the next guy” but instead judges him or herself in a more inflexible and exacting way.

“The man of honor thinks of his character, the inferior man of his position. The man of honor desires justice, the inferior man favor.” (Analects of Confucius, Ⅳ.Ⅺ)

This sometimes leads to an association between honor (and especially its more aristocratic cousin “nobility”) and arrogance or vanity. In this way, honor may be in tension with the virtues of humility or modesty. If a sense of honor is used as the excuse for conforming to some arbitrary fashion (“why, that simply isn’t done where I come from,” “I would not be seen in such a place”) honor can seem a fancy name for mere snobbery.

But if honor is genuine and wise, it can make for a firm foundation for the other virtues. If you hold your character at a high price, it will be that much harder for temptation to buy you off. If your sense of honor is what motivates you, you will conduct yourself honorably even when nobody is watching.

Honor can be a variety of self-esteem, or can be a way of earning one’s self-esteem (“I would think less of myself were I to behave dishonorably”). You might think of it as a standard that you hold up for yourself, and try to sculpt yourself into, in order to make yourself as admirable as you can be.

“Character — the willingness to accept responsibility for one’s own life — is the source from which self respect springs.” ―Joan Didion

Dishonor usually connotes a failure of character rather than one of skill or luck. You can lose or fail honorably if you fought the good fight.

Philosopher Kwame Anthony Appiah, in his book The Honor Code, examined the sorts of moral revolutions that take place when widespread practices (like slavery, foot-binding, or dueling) come to be seen as reprehensible and fall out of favor. He claimed that evolving definitions of honor are what lie behind such changes, and explored how such evolution takes place.

Megalopsychia

In the Nicomachean Ethics, Aristotle introduces us to the megalopsyche, or great-soul — a sort of pinnacle of pride and self-regard, and a connoisseur of honor.

His portrayal of the great-souled man is slightly comical, even somewhat mocking. He skips opportunities to describe the great-souled man’s most attractive qualities, and lingers over his haughty unconcern and disdain and his presumption and self-regard and the way he works to dominate others and put them in his debt. I think Aristotle may be rubbing our noses in the fact that to him virtue is meant for the benefit of the virtuous person, not for the rest of us. We should not expect a great-souled person to be the sort of person we’d want as a best buddy, but as someone who is far above us and, probably, as a result fairly contemptuous of our affairs.

Among the traits of a great-souled man:

• He deserves and claims great things, but above all, honor.
• He is good in the highest degree, great in every virtue. You never see him behaving in a cowardly manner or wronging another person, because, loving honor above all, he has no motive to do such things.
• He will be moderately pleased at receiving great honors from good people, but just thinking these his due, in fact less than his due, but as the best honors perhaps that are available under the circumstances, he will make allowance. Casual honors from middling people, he will despise.
• He is indifferent to what fate brings him — “neither over-joyed by good fortune nor over-pained by evil” and cares not for power and wealth, except as a means to honor. Even honor, which he loves above all, he doesn’t make a big deal over.
• It doesn’t hurt if he’s rich, powerful, and well-born, though none of these things are sufficient.
• He doesn’t court danger, particularly since there’s not much he finds worth courting danger for. But when he encounters danger, he faces it “unsparing of his life, knowing that there are conditions on which life is not worth having.”
• He asks for nothing, but gives readily. He gives benefits and gifts, but hates to receive them, and hates to be in another’s debt, but will overpay a debt so as to turn the tables.
• Similarly, he remembers (and prefers to be reminded of) the services he has done for others, but not those he has received (for those things are reminders of having been in an inferior position, and the proud man prefers to be superior).
• He does not stoop but projects his dignity before people of high position and riches, but he behaves in an unassuming way towards ordinary folk, as it’s a vulgar thing to lord it over people below one’s station.
• He doesn’t exert himself for the sorts of honors most people strive for, but only for the best of the best. He’s a man of few deeds, but those few are fantastic.
• He’s a straight-talker. He respects truth more than people’s opinions of him, so he doesn’t hesitate to share his contempt and doesn’t waste time trying to be diplomatic. (This, amusingly, “except when he speaks in irony to the vulgar.”)
• He will not put himself in service to any so-called superior, but may choose to serve a friend.
• He doesn’t much go in for admiring things, since to a great person like him, nothing else is particularly outstanding.
• He doesn’t tend to bear grudges or remember wrongs against him.
• He doesn’t gossip or praise or bad-talk others, mostly because he doesn’t much care about the things that typically motivate people to do these things.
• He prefers to possess beautiful things of no particular use more than useful, profitable things.
• He moves slowly and deliberately, not in a rush, and speaks in a deep, level voice.
• He is, most assuredly, not he-or-she, though Aristotle doesn’t think he needs to point this out. The great-souled man is a great-souled man.

It’s almost like a James Bond-style action movie hero. And it reads more like a laundry list of what the great-souled man would be like than a description of what he is like. A fictional character, an avatar, The Übermensch.

Discuss

November 17, 2020 - 06:10
Published on November 17, 2020 3:10 AM GMT

HTTP offers a convenient way to download only the headers: send a HEAD request:

    $ telnet www.example.com 80
    Trying 93.184.216.34...
    Connected to www.example.com.
    Escape character is '^]'.
    HEAD / HTTP/1.1
    Host: www.example.com

    HTTP/1.1 200 OK
    Content-Encoding: gzip
    Accept-Ranges: bytes
    Age: 325063
    Cache-Control: max-age=604800
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2020 02:29:50 GMT
    Etag: "3147526947"
    Expires: Tue, 24 Nov 2020 02:29:50 GMT
    Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
    Server: ECS (dcb/7F82)
    X-Cache: HIT
    Content-Length: 648

Of course you wouldn't usually manually type into telnet, you'd use something like curl:

    $ curl -I http://www.example.com
    HTTP/1.1 200 OK
    Accept-Ranges: bytes
    Age: 326121
    Cache-Control: max-age=604800
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2020 02:47:38 GMT
    Etag: "3147526947"
    Expires: Tue, 24 Nov 2020 02:47:38 GMT
    Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
    Server: ECS (dcb/7EC9)
    X-Cache: HIT
    Content-Length: 1256

It's defined in RFC 7231:

> The HEAD method is identical to GET except that the server MUST NOT send a message body in the response (i.e., the response terminates at the end of the header section). The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET, except that the payload header fields MAY be omitted.

Unfortunately, HEAD is a trap. When you are trying to debug strange server behavior, it is much safer to send GET requests and throw away the body. Not only is "SHOULD" just a recommendation, but even if this were a "MUST" you could bet some servers would mishandle it. Counterfactuals are hard!
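If you would rather script this than eyeball it, here is a rough Python sketch of the same idea using the standard library's `http.client`: issue a request, read and discard the body, and keep only the headers. The helper names (`fetch_headers`, `header_differences`) and the `VOLATILE` set are my own for this illustration, and the choice of which headers to ignore is an assumption you would tune for the server you are debugging.

```python
import http.client
from urllib.parse import urlsplit

# Headers expected to change between any two requests (an assumption).
VOLATILE = frozenset({"date", "age"})


def fetch_headers(url, method="GET"):
    """Send `method` to `url`, read and discard any body, return (status, headers)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request(method, target)
        response = conn.getresponse()
        response.read()  # throw the body away, like curl's -o/dev/null
        # Repeated header names collapse to one entry here; fine for a quick check.
        headers = {name.lower(): value for name, value in response.getheaders()}
        return response.status, headers
    finally:
        conn.close()


def header_differences(head_headers, get_headers, ignore=VOLATILE):
    """Map each header that differs to its (HEAD value, GET value) pair."""
    names = (set(head_headers) | set(get_headers)) - set(ignore)
    return {
        name: (head_headers.get(name), get_headers.get(name))
        for name in sorted(names)
        if head_headers.get(name) != get_headers.get(name)
    }
```

Calling `fetch_headers(url, "HEAD")` and `fetch_headers(url, "GET")` and passing the two header dicts to `header_differences` shows where a particular server's HEAD response diverges from what a real client would see.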

While differences are rare, always debugging by requesting the body like a normal client would, and then discarding it, means one fewer way that your debug request differs from a real one:

curl -sS -D- -o/dev/null http://www.example.com HTTP/1.1 200 OK Accept-Ranges: bytes Age: 326124 Cache-Control: max-age=604800 Content-Type: text/html; charset=UTF-8 Date: Tue, 17 Nov 2020 02:47:41 GMT Etag: "3147526947" Expires: Tue, 24 Nov 2020 02:47:41 GMT Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT Server: ECS (dcb/7EC9) Vary: Accept-Encoding X-Cache: HIT Content-Length: 1256 Going farther in the same direction, it's even better to start with "Copy as cURL": And then add the -sS -D- -o/dev/null to get the headers if that's all you want. Comment via: facebook Discuss ### Solomonoff Induction and Sleeping Beauty 17 ноября, 2020 - 05:28 Published on November 17, 2020 2:28 AM GMT Various people have said that Solomonoff Induction (SI) accords with the Self-Sampling Assumption (SSA) more than the Self-Indicating Assumption (SIA). See these posts and the comments on them: https://www.lesswrong.com/posts/sEij9C9MnzEs8kaBc/the-presumptuous-philosopher-self-locating-information-and I was surprised, because I like both SI and SIA. Both seem correct to me, and I carefully considered the apparent contradiction. I believe that I have dissolved the contradiction, and that SI, properly applied, actually implies SIA. I can't actually prove this broad claim, but I will at least argue that SI is a thirder in Sleeping Beauty, and gesture in the direction of what I think is wrong with the claims in the linked post. As a bonus, if you read till the end I'll throw in an intuition-generator for why SIA actually gives the correct answer in Presumptuous Philosopher. First, let me reconstruct the contradiction in the Sleeping Beauty context, and explain why it might seem that SI is a halfer. Naive view: There are three possible outcomes: Monday-Tails (MT), Monday-Heads (MH) and Tuesday-Heads (TH). Each of these three outcomes are equally simple, therefore the machines encoding each will get equal weighting and the probabilities are all 1/3. Antithesis: MT is actually simpler than MH. 
Why? Because if you know that it was heads, you still need to be told that it's Monday - but if you know that it's tails, then you already know that it's Monday. MT is one bit simpler than MH and therefore is twice as likely, under SI. SI is a halfer. Note that this is roughly the same argument as in the Presumptuous Philosopher post - it takes more information to encode "where you are" if there are many copies of you.

Synthesis: Wait a minute. By equivalent logic, TH is simpler than MH - if you know that it's Tuesday, you automatically know that it was heads! TH is then equal to MT, but MH still seems more complicated - to "locate" it you need two bits of info. The core insight needed to solve this puzzle is that there are two different ways to encode MH - either "it's Monday and also heads", or "it's heads and also Monday". So each of those encodings is more complicated by one bit than the other options (and so gets half the weight), but there are twice as many such encodings, so the total weight on MH matches the others. In the end, MH=TH=MT=1/3.

I strongly suspect the same thing ends up happening in the full Presumptuous Philosopher scenario, but it's difficult to show rigorously. One can easily reason that if there are 100 observers, there are multiple ways to encode each - "the 2nd observer", "the one after the 1st observer", "the one before the 3rd observer" all point to the same person. But it's much more difficult to estimate how it all adds up. I'm fairly confident based on the above argument that it all adds up to the thirder position in Sleeping Beauty. I think that in Presumptuous Philosopher it adds up such that you get full SIA, with no discount for the complexity of specifying individual observers. But I can't prove that.

Presumptuous Philosopher intuition-generator

You bump into Omega, who's sitting in front of a big red button that he clearly just pushed. He tells you that up until 60 seconds ago, when he pushed the button, there were a trillion trillion trillion observers in the universe.
The button, when pushed, flips an internal fair coin. If heads, then it Thanos-style kills everyone in the universe at random except for one trillion people. If tails, it does nothing. Either way, everyone that survives has this conversation with Omega. What are the odds that the coin was heads?

I think it's quite plausible to say that it's overwhelmingly unlikely for it to have been heads, given the fact that you survived. This scenario is identical in relevant aspects to Presumptuous Philosopher.

### The Darwin Game - Rounds 10 to 20

November 17, 2020 - 04:14
Published on November 17, 2020 1:14 AM GMT

MeasureBot maintains its lead.

Rounds 10-20

Everything so far

Today's Obituary

| Bot | Team | Summary | Round |
|---|---|---|---|
| Silly Random Invert Bot 2-3 | NPCs | Returns 2 or 4 on the first round. Returns 5 - <opponents_last_move> on subsequent rounds. | 10 |
| Silly 3 Bot | NPCs | Always returns 3. | 10 |
| CooperateBot [Larks] | Chaos Army | "For the first 10 turns: return 3. For all subsequent turns: return the greater of 3 and (5 - the maximum value they have ever submitted)" | 10 |
| Silly Cement Bot 2 | NPCs | Returns 2 on the first turn. Otherwise, returns 5 - opponent_first_move. | 12 |
| Silly Counter Invert Bot | NPCs | Starts by randomly playing 2 or 3. Then always returns 5 - opponent_previous_move. | 12 |
| Silly Invert Bot 5 | NPCs | Returns 5 on the first round. Returns 5 - <opponents_last_move> on subsequent rounds. | 12 |
| Silly Cement Bot 3 | NPCs | Returns 3 on the first turn. Otherwise, returns 5 - opponent_first_move. | 14 |
| Silly Cement Bot 2-3 | NPCs | Returns 2 or 3 on the first turn. Otherwise, returns 5 - opponent_first_move. | 14 |
| Silly Invert Bot 3 | NPCs | Returns 3 on the first round. Returns 5 - <opponents_last_move> on subsequent rounds. | 15 |
| Silly Invert Bot 4 | NPCs | Returns 4 on the first round. Returns 5 - <opponents_last_move> on subsequent rounds. | 17 |
| Random-start-turn-taking | Chaos Army | Selects 3 or 2 randomly until symmetry is broken. Then oscillates between 2 and 3. | 17 |

This alternate timeline will conclude on November 20, at 5 pm Pacific Time.

### Propinquity Cities So Far

November 17, 2020 - 02:12
Published on November 16, 2020 11:12 PM GMT

Finding alternatives to war can save a lot of money

Summary

In any dense city, lots of people will be struggling to occupy the same set of spaces. To function, cities need to have some systematic way of resolving those positioning conflicts, a method for deciding who gets to go where.

The methods we use now for resolving positioning conflicts (land markets and rent) have a lot of overhead that is both very obviously overhead and also overlooked as inevitable. I talk about that extensively, and some of its unexamined costs.

I present an outline of what looks like a better method, Propinquity Optimization (proq), which resolves positioning conflicts at minimal cost, enabling a much higher maximum quality of life in dense cities.

It feels urgently needed, to me. I am not sure whether it is the most urgently needed thing that I can be working on (I'm also responsible for this humanization of recommender systems/harmonization of global discourses and.. some other stuff). There's some discussion of its global importance in the Longtermist Significance section.

In the course of this, I also discuss quite a lot of the problems in applied preference aggregation and some potentially novel ways to resolve some of them. Even if you aren't interested in building better cities, you might want to read it just to see an instance of applied utilitarianism as a legal mechanism. I think that aspect of it is really pretty neat.

A Propinquity City assigns services and residents to whatever proposed locations optimize an aggregation of the preference expressions of the residents.

Motivation

In dense cities, even once housing supply has exceeded demand, most city dwellers will still have rent extracted from them to a significant proportion of their income: Dense living means that demand within the urban center doesn't ever drop. You can maybe get arbitrarily cheap housing in some incredibly sparsely populated outskirts, but not in the dense part. There seem to be levels of affordability that are firmly unreachable under the kinds of land allocation methods we use now.

When we notice that there seem to be firm and significant limits on how cheaply a technology can ever come to operate, even at its peak, smart buyers start looking for alternatives.

Price competition happens when competitors with high prices can be, in some way, outrun, by competitors with low prices. That just doesn't happen to land traders in high demand areas. If one land trader offers drastically lower rents that others can't match (maybe because they have entered an already high mortgage), those other land traders still sell all their units and stay in full business. They make their money, same as always. They don't get outcompeted. Cost-efficacy, beyond a point, is not rewarded with any increase in market share. The unfit are not selected out.

The result is, reliably, the costs of operating in dense cities will always be high enough to significantly reduce mean quality of life. As long as most land within the city is privately held, this will not be solvable.

There is also a commons problem in urban economies: When rents are raised on a beloved shared service (a restaurant, a teahouse, a bookstore, anything that provides a lot of value to the locals), forcing it to raise prices and cut quality, the losses in (real) property value in the surrounding neighborhood (which now no longer has that service as it was) exceed the individual gains to the service's owner's owner. The city suffers more than we can know under this dynamic.
A participant in this economy - even one who expected a fair chance of getting to be the land owner - would prefer that this dynamic couldn't play out, such is the extent of the value being removed. There would be no central park under private ownership. We must imagine the many central parks that never were, and never can be.

You could probably solve part of this problem with a type of city where land is owned and developed by a non-profit, or local government, where land rent goes to improving quality of life in the city (public spaces, libraries, meeting rooms, schools, etc). But, why draw taxes through those wars of rent? Why take taxes in proportion to rent conflict? Is that really a good way to resolve the positioning conflict, or to take taxes? Taxes disincent things; why disincent dense living? If we were designing something explicitly to do those things well, in correct proportion, I don't think it would do it like that.

Land pricing provides shockingly few economic functions

I wasn't expecting to be able to come up with so many points in support of this. Here, I'll be going through a checklist of things that useful market systems generally do, and I find that land markets generally don't do them well, if at all:

• Land is not efficiently priced. (Land price is adjusted quickly upwards, but not downwards; there is no way to short most of it, so the price is usually not an accurate reflection of demand (page 15 of Inadequate Equilibria).)
• Increases in land price tend to be captured by people who didn't create much of that value, and can't really create much more of it: The value of a piece of land comes mostly from what it is next to, and the most impactful things that can be done to affect land value (making public goods like schools or parks or transit routes or even just food courts) aren't in the control of the land owners who benefit. Also, please contemplate the Georgist meme.
The prospect of increases in land price incents few of the causes of increases in land price.
• Increases in land price tend to punish a lot of the people who did contribute to creating it: by living and working here, paying rent, and patronizing the local businesses, we are rewarded by having to pay even more rent and having the businesses we have grown to love increase their prices or cut quality or close down and be replaced with something premium mediocre or just mediocre.
• Price signals from land markets often set density in proportion to demand, but it's not clear to me that land markets are actually particularly good at this. Would anyone argue that the city's architects couldn't set density levels well enough themselves? It seems to me like that's a pretty simple problem, and one that urban planners are mostly already solving correctly?
  • Don't basically all cities control density pretty tightly? I know that a lot of density restriction is just NIMBYs defending housing scarcity, but it can't all be that, can it?
• Land is inelastic. Price signals conveying increased demand for land don't lead to the creation of more land, because it's not possible to create more land.
  • We can imagine multi-story cities like the Shimizu Mega-City Pyramid, which do effectively create more land, but no non-governmental process has ever created one of those, nor perhaps ever will.
  • We could frame ordinary multistory buildings as an increase in supply of land, but, per the previous point, I don't think we need land markets to help us to decide when and where taller buildings are needed; it's not that hard to get it pretty much right.
  • We can also imagine seasteads, which could perhaps part in the middle, thus "creating new land" in a quite real sense, but they will need a pretty complicated process for governing the insertion of new land into the middle of things and it's not clear to me that this process will be distinct from the process I am going to propose.
• You could argue that the auctioning (literal, implicit or historical) of land ensures that any given parcel will go to the person who wants it the most. That's one way of doing that. Another way of making sure things go to the person who wants them the most is by having applicants physically fight over it until each side is brought too close to death to justify continuing and the last one standing is declared renter. We tend to agree that it's good to avoid physically fighting over anything, because fighting imposes a great cost on its participants. I contend that rent/bidding is actually not all that different in that respect. As a method of apportioning resources fairly according to conflicting wills, it costs about as much as possible. And the ones who own the city benefit from having that process of occupancy conflict resolution be as expensive as possible, but the people who live and work in it mostly don't. They would like to go somewhere where it's handled differently.

In summary, land markets are not very good at what they do. They provide less functionality than we might have imagined. To completely replicate their functionality, we will need lots of information about the housing stock, and some (perhaps democratic) negotiation tool for deciding who and what gets to go where. Since those things would be useful to have anyway, that's what I'm trying to develop here. That's what proq would be. Tools for pooling information and negotiating with minimal overhead so that we do not need to burn so much money fighting each other for space.

What it is

Residents (who have bought shares, funding their part of the construction, and who pay rates) in a Propinquity City provide the city with a pretty complete expression of their needs and preferences about their housing and their neighborhood. The city defines a mathematical function that represents how well those expressed desires are being satisfied, given everyone's locations.
Solvers try to find ways of positioning residents and services that will make that number go as high as possible. Whichever location solution scores the highest is instated.

More Concisely: Every month, a Propinquity City positions services and residents according to whatever proposed location solution optimizes an aggregation of the expressed preferences of the residents.

Residents end up in the presence of the people they want and are wanted by. Services are allocated space according to expressed public will for them, rather than how much rent they can pay. We solve the occupancy conflict problem via cheap, efficient negotiation towards an optimization criterion, instead of through a bedlam of costly bidding wars.

The metric (or, the expression language) representing an individual resident's desires focuses mainly on these features:

• Adjacency desires: How near the resident wishes to be to specific people, types of people, services or types of services.
  • The system only recognizes an adjacency desire to the extent that it is reciprocated by an adjacency desire of the other party. This limits nonconsensual interaction.
    • Consider, for instance, a celebrity, wanted by millions. The ones who want them the most are often not the ones they would want to live beside. The ones they want in return would be lost in the crowd. This prevents that.
  • Services generally automatically reciprocate resident adjacency desires (which for now would be implemented with a default maximum desire to be adjacent to anyone (or their preferred resident type)).
• Requirements about their housing, things like "must be on the ground floor" or "must have blackout curtains" or "must face east". To support this, it would be a good idea to try to develop an open process for adding new qualities (and dropping disused ones?) to the checklist, that the city couldn't anticipate.
I really don't like the idea of relying on a single bureaucracy to decide which qualities of a piece of housing might be worth keeping rows for.
  • This system should probably play a role in measuring needs to decide what sorts of new housing are built.
• Whether they're willing to be moved, and how important it is to them that they not be. Generally, the aggregation gets a bonus for not moving a person (unless they communicate that they want to be moved). Having to move is annoying and it should only be done when it would raise utility further than the threshold.
  • It may be possible to plan cities in such a way that moving wouldn't be annoying at all. See Eliezer's Movable Housing for Scalable Cities. Look at how nice Kasitas were going to be. I think proq housing should at least consistently make apartment doors wide enough for a forklift to drive through. I'm concerned that going for fully modular relocatable housing would make this concept much harder to realize, but hm, (thinks of Elon Musk) maybe sometimes, dreaming freely and going after unprecedented things increases the probability of success, so long as you dream with an engineer's clarity.
  • There are lots of questions I have about this that I think we will almost certainly find answers to when we run the test games:
    • Whether even very small movement penalties are totally good enough to keep proq from moving people whenever reasonable.
    • Whether we can just promise not to move a resident unless their utility would be raised by it. To me, this seems likely to result in commons problems that would lock the city dead, but who knows.
    • Whether we can simply trust residents to just honestly report how annoying it would be for them to be moved, or whether we have to restrict that input to ensure that the configuration will be able to improve when it should.
  • Shouldn't people have a right to stand still? It's not clear that they should. Consider how we don't give people a right to stand still in a road.
Living, dynamic communities are at least a little bit like traffic. They want to shift as communities shift, as new people and services come into existence, as movements split and scenes evolve. You should usually be able to stand still most of the time, but we can't make a firm guarantee right now. Maybe the test games will reveal that a universal right to stand still wouldn't cause the city to clog at all, but I definitely wouldn't bet on it.
• I suppose the system will need to recognize legal restraining orders and not violate them.
  • Unsure how to implement in a utilitarian frame. Restraining orders are typically between a victim and a perpetrator of some heinous act. Restraining orders should not harm the victim; the burden of creating distance should be placed mostly on the perpetrator, but I'm not sure how to get the math to work in proq so that this rigidity won't mildly punish victim and perpetrator equally as often. An egalitarian altruistic city cares just as much about the needs of the perpetrator as the needs of the victims...
  • It's conceivable that there won't be much of a burden in the creation of distance; that the next best neighborhood for each person will be about as good as wherever they were. Testing needed. If the act was bad enough, whatever second degree social bonds were still holding these two in adjacency may sever and create a distance without any intervention from the solver, in which case it really wouldn't be a big deal.
  • Reciprocal desires to be separate (in response to, say, a breakup) would be supported without any special legal order.
    • There would be something beautiful about having an incentive to revisit your distancing orders against your exes every couple of months and maybe declaring that you are at peace with them now, and finding, one cycle, that they reciprocate. (Oh. This might need an additional little UI/system to be good.
Otherwise they'd have to, like, explicitly arrange forgiveness by talking to each other, which most people won't do, or decide unilaterally, which is a mess.)
• Etcetera. The preference expression format will be about as broad and varied as the needs of people themselves.

Then those measures of the quality of each resident's situation are aggregated in some way (added together, for instance). That aggregation of the preferences of the residents is the metric that the propinquity city is legally obligated to optimize.

I'm not exactly sure what operation should be used for aggregation. It may need to vary between different proq cities, depending on their population's preferred variant of utilitarianism. Candidates include:

• Simple addition. Traditional and decisive.
• Addition of the sqrt of each individual utility (or some other function that has diminishing returns: logarithms, raising to a power between zero and one), which will make the system try to help the people who have the least before it gives more to people who have a lot. That isn't my metaethics, but it's practical. It's there to reduce the number of people who end up feeling cheated out of what they were promised, or to allow the involvement of people who might have otherwise anticipated that they would be made a sacrifice to the greater good due to some personal predisposition. (I don't think I can currently imagine what sorts of characteristics would make an IRL utilitarian more inclined to sacrifice a person's wellbeing for others, but a picture would probably surface over time and people would react to that.)
  • On the other hand, some parts of the individual utility function would already have decreasing returns (in the same way that money only buys decreasing marginal happiness, your 10th neighborhood friend is not half as important to you as your first). So I'm not sure this would really be needed.
• The min of the utility functions: The score of the solution is the lowest score of any resident; it is all we should care about.
  • The firm anti-Omelas.
  • I'm going to be straight up: I don't think this is the correct aggregation function. The system should care, at least a little bit, about opportunities to make the vast majority of the population's lives better, even if those interventions will not help some people.
• Is there a function with an additional parameter that would naturally lerp between those? Idk. Just curious.
  • Maybe a weighted sum of the mean of the poorest i residents, for all i from 1 to n.
  • I'm starting to think this might be reaching for a kind of mathematical elegance that the moral principle of equality isn't going to turn out to have, though. There are a number of reasons equality reliably emerges as a value. Again, one is just pragmatism; if people can tell that they're guaranteed to receive a bad deal, they won't buy an apartment and the city will not get to welcome them in. Another purpose is keeping truces, by keeping power ratios between factions entrained with their sizes. Those are completely different sorts of objectives, and these aggregation rules don't much resemble either of them.
  • Huh. What if those things are implementable though. What if you could solve utilitarianism's sacrifice problem.
    • A crude solution to the predictably sacrificial deal problem would be to assess applicants' other options and make sure they're guaranteed utility greater than those (and maybe factor that into the price of their share) so that even if the optimizer would have otherwise disfavored them in some way it's still going to be worth buying in.
    • Solutions to the inter-faction instability problem could probably be cobbled together from measures of kinds of access to capital and then having the optimizer try to keep those fixed... but... negotiating an agreeable design for this would be uh, challenging.
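
The candidate aggregation rules listed above can be compared on a toy example. This is only a minimal sketch, assuming each resident's situation has already been reduced to a single non-negative number; the function names, the weighting scheme and the example utilities are illustrative assumptions, not part of any proq specification.

```python
import math


def aggregate_sum(utilities):
    """Simple addition of individual utilities."""
    return sum(utilities)


def aggregate_sqrt(utilities):
    """Sum of square roots: a diminishing-returns variant of addition."""
    return sum(math.sqrt(u) for u in utilities)


def aggregate_min(utilities):
    """Score a solution by its worst-off resident only."""
    return min(utilities)


def aggregate_poorest_means(utilities, weights=None):
    """Weighted sum, over i = 1..n, of the mean of the poorest i residents.

    `weights` is a hypothetical tuning parameter (one weight per i);
    it defaults to equal weights of 1/n.
    """
    ordered = sorted(utilities)
    n = len(ordered)
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per resident")
    total = 0.0
    running = 0.0
    for i, (u, w) in enumerate(zip(ordered, weights), start=1):
        running += u                # sum of the poorest i utilities
        total += w * (running / i)  # weighted mean of the poorest i
    return total


# Two made-up location solutions for three residents.
lopsided = [16, 16, 0]
even = [9, 9, 9]

for rule in (aggregate_sum, aggregate_sqrt, aggregate_min, aggregate_poorest_means):
    print(rule.__name__, rule(lopsided), rule(even))
```

In this example simple addition prefers the lopsided solution, while the sqrt and min rules prefer the even one. With all the weight on i = 1 the last rule reduces to the min rule, and with all the weight on i = n it is the plain mean, so under these assumptions the weight vector plays the role of the interpolation parameter asked about above.
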

Proq resembles utilitarianism, but utilitarianism couldn't really be implemented, even if the political will were there (or if there were some Rawlsian veil that evened out the expected payoffs and guaranteed that the deal would be worth it for everyone). We can't read people's actual utility functions. We can ask them to describe their utility function for us, but they will not generally answer with the true utility function; we will receive a speech act that has been carefully, strategically shaped to benefit them more than the truth would have.

Anything implementable is always going to be more like a voting system than a metaphysical ideal. To argue for any voting system, we need to be able to argue that the dishonest individually rational voting strategies that people will inevitably discover and deploy will tend to add up to acceptable outcomes. If we can get people to tell the truth about their preferences, we can just measure the solution in terms of those, but it is difficult to incent people to tell the truth. In many cases, it's provably impossible.

I don't know what strategic voting would look like under proq. I don't think we'll know until the system exists and we can play around with it and see which strategies thrive, but I know that there will be some analogue to strategic voting; there always is.

I'm thinking about making a game version of life in a proq city and getting adversarial economist types to all try to "win" at it. (Though it's important to emphasize here that in a eusocial game, winning doesn't tend to look like domination. It will tend to look like trading beneficially with others as a side effect of pursuing whatever your goal is. One of my projects in game design is addressing the alarming ubiquity of contrived zero sum multiplayer games. Every time I read the rules of an otherwise peaceful eurogame and wind up meeting again the phrase "whoever gets the most victory points Wins The Game" I groan a little louder.
Soon you will be able to hear my groan from the mainland. One person's gain should not be presumed to be another's loss. Life isn't like that. Humans aren't like that. Long ago we entered a pact that bound our fates together.) There, in those games, we'll get a glimpse of this political ecology's future, and we'll see if the system continues optimizing utility under strain.

I thought I'd need a pretty decent prototype propinquity optimizer algorithm for that, but I'm starting to think it might be a lot easier, and maybe a lot more fun, to do a thing where every resident is able to submit hand-authored incremental improvements to the position solution.

In the long term, offering a prize to whoever can optimize the allocation solution's aggregate utility the most might elicit near optimal solutions from specialists in location solving, who I'd anticipate would make use of some fancy algorithms, but it's conceivable that a series of incremental improvements from individuals and volunteers might turn out to do well enough in the beginning, as well as fostering enfranchisement.

But anyway, at the least, whatever process optimizes the aggregate utility, it does not have to be a ministry of the city. The great thing about defining an objective, easily computable measure of solution quality is that it means we can cheaply consider allocation suggestions from whoever will offer one. If their solution scores the best, then it pretty much must be the best and that is the one we will pay for.

So, one of my current considerations is this: How much could individual people use their understanding of their propinquity locality to incrementally improve the solution for themselves until we arrive at a solution that will be pretty close to ideal? I don't know. But I'd like to experiment. Playing propinquity optimizer seems fun to me.
Making an app for editing location solutions and measuring their total propinquity also sounds like a great starting point for designing location optimization algorithms, if those later turn out to be necessary.

If we do let residents submit incremental manual edits, a naive implementation would have some difficulties:

• What if multiple edits are coming in every minute but it takes the average person over a minute to compose, consider, and upload an edit? How do they get through?
  • I guess there'd need to be some fairly sophisticated operation that tries to apply edits that might be slightly out of date. It would ignore what they say about residents who are no longer where the patch thought they were, for instance. Sometimes it would have to report that the difference is too great and the edit is no longer applicable, or that some change in another part of the solution made the edit a negative change and so it can't legally be applied. Hopefully this would not happen too often.
• Quibbles, probably not important:
  • Possible Exploit: A strategy where a solver superpower might hold back their best solution and permute the solution slowly but steadily upwards around drastic shifts to prevent those with local knowledge from being able to get their edit through, to ensure that they will not lose the prize to some rando who uses local knowledge to improve on their best solution. Unlikely, as this would be both hard to make and pretty evil. It's just unlikely that anyone would want to do the work. More likely, if local knowledge ever took the lead from a professional solver, they would lobby the city and we'll figure out what to do with that later.
  • Would there be a possibility of edit wars? ... Generally.. no, actually! Edit wars seem to be impossible, or in another sense, desirable. If edit A is legal, it must be increasing the utility (could ban edits that have zero effect on the utility to ensure this).
The edit that negates A, then, harms the solution, and would not be allowed to go through. To effectively reverse A, the adversary would have to find some way to at least mildly benefit the solution in some other respect to bring it higher than it previously had been, which would be a positive externality and should be allowed.
• Should residents be allowed to change their adjacency preferences during the incremental edit period?

Longtermist Significance

I found proq by following the anguished cries of the present. In these cases it's good to step back and remember the endgame and ask if it still makes sense in light of that. I find that I have more questions than answers.

A Propinquity City would support extents of quality of life that I've argued aren't possible under the current paradigm. It would be nice to have. Ultimately, though, dense cities will not be as important in the near future, given remote work (which I expect to be irresistible once VR headsets with foveated rendering reach retinal pixel densities) and dirt cheap automated delivery systems. Propinquity is good adjacency, but adjacency won't matter as much.

Proq arose from a concern that a dense city could not ever be made cheap. I do still believe that, but I'm not sure we sorely need dense cities to be cheap. Might we live almost just as well in sparser, broader land markets where not all units sell, where it is theoretically possible for land prices to compete down to negligibility, where there aren't far more buyers than sellers?

Proq offers us a future with at least one dense yet affordable megacity in the western world, a lively intersecting patchwork of emergent communities growing somewhere in the middle of the continent. The future without proq still offers us Tesla-quickened land markets, expensive in urban centers but perhaps decent work will be available from any small town grouphouse with an internet connection.
We may want to scatter broadly if we want to live on a non-profit's wages, but we will be able to live well enough.

It's conceivable that this difference in living situations will have some predictable effects on cultural evolution. Anthropological forecasts on this would be deeply interesting.

Proq will invite anyone who knows about it to contemplate legal systems that are constituted from the optimization of a utility function. I wonder if experiencing the results of that might make the alignment problem more broadly obvious. For better, or for worse? I wish we had a clearer picture.

Practical Concerns of Deploying it in Reality

I really hope we wouldn't need to convert any pre-existing city that already has high land prices. That just looks like an impenetrable, unscalable political wall to me. I am not planning for that.

We might have to kind of start from scratch. This isn't necessarily as depressing as it sounds. There are places in the world where construction progresses very quickly. Perhaps one day those business processes will make it to the west. If it does take a while for a city to grow, oh well. Personally, I think I would love living in a tiny fetal city. Maybe it could be like Arcosanti. I can dream, at least.

Acquiring land

I would like it if we could make this deal with the regional council: Once the city needs land, it could buy it at triple the price that rural land would have been expected to command had the city never been built.

This prevents land-owners from holding the city's growth hostage with their newfound land wealth that the city, by being adjacent to them, created. It is just. It is profitable for the rural land-owners: their land-value still goes up significantly, and they still benefit from adjacency to the city in other ways. A rational citizenry, knowing that this city could not grow otherwise, would accept the deal.
In Case no Rational Citizenries can be Found

It's conceivable that there are places where the land owned by the city on its outskirts could be scaled up faster than the city grows, meaning that by the time the region's landowners believe there's a thing of value here to exploit, when the city does start to press up against the edges of its domain, it would have enough residents to vote for fair prices for further expansion.

Seeding the City's Economy

A delightful puzzle. Finding a series of productive yet crazy organizations, each wanting to be near some of the ones before them, progressively becoming less and less crazy, until reasonable people start to get it. I can see some pieces of the solution: first businesses that don't mind solitude, then businesses that can operate with just an internet connection, and by then we will have more pieces to work with that I can't anticipate from here.

What this needs from you

We need to get generally better at designing less costly ways to credibly signal will: develop voting theory, and maybe develop some auction theory for cases where cost-free outcomes are not possible but low-cost outcomes might be.

• An aside: we (Colton Dillion and I, with some passing contributions from others who were around) have been exploring a few low-cost auction designs. The general theme is that a cost is imposed for bidding and losing, so money is only burned when participants in the auction can't cheaply reach agreement about an advance prediction of who was going to win. These sorts of auctions would make the most sense for dividing assets that have no owner: unclaimed territories, radio bandwidth, perhaps, occasionally, network traffic or road space? (There is only a tenuous sense in which urban land could be included in this category. I can't yet see how to apply these systems well to negotiating out of the wars of rent, but I'm going to keep looking.)
• An auction where all bids made have to get paid to the house, even losing bids. Brutal but simple.
• Regular auction, but each bid has a bidding fee. Failed bids are thus discouraged. Recognising and ceding to the strength of will of your opponents without bidding is thus incentivised.
• Colton proposed this (and did some analysis): Everyone stakes an amount of money that is supposed to be proportionate to their will. This remains fixed. The stakes are revealed, and there is a withdrawal period where people can leave the pot at no penalty (surrender). For those who remain, the war begins. Each dollar from the largest bid is matched at random to a dollar from the remaining contenders and burned, steadily over time, until one bidder is left remaining via sortition. The owner of the last remaining bid wins.
• I like this mechanism. It lets you call people's bluffs by staying in and warring with them. It lets costly evidence of will increase gradually, only as much as is necessary for the less willing to be convinced to step aside.
• I propose a continuous version of the above, where the dollar unit of matching approaches zero in the limit, which makes it non-random. Although this would not make auction outcomes predictable, as we cannot know when other bidders will pull out, it makes them more predictable, which should increase players' inclination to surrender.

Expressions of interest in buying a share (a permanent entitlement to a propinquitously located apartment in a Propinquity City, at a cost of about 40,000 USD plus a yearly rates fee covering maintenance and governance) would be pleasant to receive, though not actionable at this stage.

I may develop a small game for examining and maybe demonstrating propinquity optimization. If anyone else would be interested in developing a game about propinquity optimization, I'd contribute heavily. I really do feel like there's a fun game to be found in there.
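Returning briefly to the staked auction Colton proposed in the list above: the following is a minimal Python sketch of one possible reading of it, not the mechanism as actually analysed. The function name `war_of_stakes`, the restriction to whole-dollar stakes, and the rule that the burned rival dollar is drawn with probability proportional to each rival's remaining stake are all assumptions made for illustration. The withdrawal period is assumed to be over already, so every bidder passed in has chosen to fight.

```python
import random


def war_of_stakes(stakes, rng=None):
    """Simulate one reading of the stake-burning auction (illustrative only).

    stakes: dict mapping bidder name -> whole-dollar stake left after withdrawals.
    Returns (winner, winner_remaining_stake, total_dollars_burned).
    winner is None if the last two bidders run out at the same moment.
    """
    rng = rng or random.Random()
    pot = {name: amount for name, amount in stakes.items() if amount > 0}
    burned = 0
    while len(pot) > 1:
        # One dollar comes from whoever currently holds the largest stake...
        leader = max(pot, key=pot.get)
        # ...and is matched against one dollar drawn at random from all the
        # other contenders' dollars (so bigger rivals are hit more often).
        rivals = [name for name in pot if name != leader]
        weights = [pot[name] for name in rivals]
        rival = rng.choices(rivals, weights=weights, k=1)[0]
        pot[leader] -= 1
        pot[rival] -= 1
        burned += 2
        # Bidders whose stake has been burned away are out of the war.
        pot = {name: amount for name, amount in pot.items() if amount > 0}
    if pot:
        winner, remaining = next(iter(pot.items()))
        return winner, remaining, burned
    return None, 0, burned
```

Under this reading, a bidder whose stake exceeds the sum of all the others always wins, and the amount burned is twice the combined stake of the losers, which is why revealing stakes before the war gives weaker bidders a reason to withdraw. For example, `war_of_stakes({"a": 10, "b": 3})` leaves `a` as the winner after six dollars have been burned.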
Designing a proq game with a score criterion that accurately reflects a propinquity city's optimization criterion would be a really cool challenge. There is a chance, small, that it would help billions of people by helping to speed proq into reality. So for a game designer, it's very much worth thinking about.

• I was delighted to learn that there is a game, Islanders, about a type of propinquity optimization. I find it pretty inspiring and it has renewed my energy.

If anything here seems insufficiently well justified, or questionable, I encourage you to ask about it. Chances are decent that I will have thought pretty deeply about it and will have lots to say about why it was unavoidable, and I just wasn't able to fit it in anywhere.

### How can labour productivity be an indicator of automation?

November 17, 2020 - 00:16
Published on November 16, 2020 9:16 PM GMT

I was listening to an interview with the economist Paul Krugman the other day about whether robots will be taking our jobs. He pointed to data showing that the growth in productivity has declined in recent years, citing this as evidence that AI and robotics are not (yet at least) going to be automating away many jobs.

This puzzled me a little and led me to do a bit of research into the economists' definition of productivity (specifically what they call labour productivity). I'm no expert on economics so this is just my best attempt.

Labour productivity is calculated by a simple formula:

Market Value Produced / Hours Worked

So if your economy produces $100 in 10 hours, the labour productivity is 10 $/h.

What struck me as odd was that Krugman's analysis didn't seem to take into account the interaction between this productivity measure and reduced prices due to automation. Suppose a given market is producing $100 by selling 10 units at $10 per unit for 10 hours of labour.
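To make the definition concrete, here is a minimal Python sketch of the formula applied to the example market above. The function name and the two "after automation" cases are illustrative assumptions with made-up numbers, not data. They only show that the ratio can move in either direction, depending on whether price or hours falls faster.

```python
def labour_productivity(units_sold, price_per_unit, hours_worked):
    """Market value produced divided by hours worked, in $/h."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return units_sold * price_per_unit / hours_worked


# The example market from the text: 10 units at $10 each, 10 hours of labour.
baseline = labour_productivity(units_sold=10, price_per_unit=10.0, hours_worked=10)

# Hypothetical case 1: the same 10 units are sold, the price falls to $1,
# and only 2 hours of labour are needed. Measured productivity goes DOWN.
price_falls_faster = labour_productivity(units_sold=10, price_per_unit=1.0, hours_worked=2)

# Hypothetical case 2: the same 10 units are sold, the price falls to $5,
# and only 1 hour of labour is needed. Measured productivity goes UP.
hours_fall_faster = labour_productivity(units_sold=10, price_per_unit=5.0, hours_worked=1)

print(baseline, price_falls_faster, hours_fall_faster)
```

Both hypothetical cases use far less labour per unit, yet only the second one registers as a productivity gain under this measure.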
If automation allows us to produce 100 times more units with fewer hours of labour, at first glance it seems like a no-brainer that this should increase labour productivity. But suppose that demand is not elastic; then the ability to produce 100 times more units with similar variable costs means that the price will fall dramatically. So now the market value produced is much less (i.e., 10 units times a much lower price). Thus even though it takes many fewer working hours to produce the goods, the value of the goods has also fallen.

Since both the denominator and the numerator would fall as a result of large scale automation, why does Krugman suggest that rising labour productivity would be the main indicator of whether large scale automation is happening in the economy? Is there some additional assumption economists are making when they make such claims (e.g., that demand for goods is infinitely elastic)?

### Extortion beats brinksmanship, but the audience matters

November 17, 2020 - 00:13
Published on November 16, 2020 9:13 PM GMT

Extortion, simplified, is:

• "Give me something I want or I'll hurt you (even if that costs me something too)".

Brinksmanship is more like:

• "Offer me something I want, or I'll blow up our deal and hurt you (even if that costs me something too)".

Written that way, the two sound very similar. Indeed, I've argued that there is little difference between extortion and trade offers, apart from the "default point". So why am I claiming that these two are different, and that extortion is much more powerful? Because of a key difference in the default point: the outside audience.

How the audience reacts

Extortion audience

Suppose I am extorting someone; maybe I'm a blackmailer with naughty photos, a mafia offering "protection", or the Roman Empire demanding tribute. The problem for me is to make my threat credible: to show I will go through with the threat, even if doing so is risky or expensive for me.
Suppose I have twenty targets that I need to convince that I'm serious. Then if one of them resists, this is exactly what I need. I will publish their photos/burn down their shop/invade their territory. My threat is credible, because I've just shown that I will carry it out; this keeps the other nineteen targets in line.

Indeed, I might actively want one target to resist; that way, I pay the expenses of one threat carried out, but get full compliance from the other nineteen, rather than getting twenty sets of grudging partial compliances.

This makes resisting extortion very tricky. Suppose you were the target of my extortion, and suppose that you had made it a principle to never give in to threats. And suppose that you had credibly demonstrated that principle. If we two are the only people around, then there's no advantage to me carrying out my threats[1]. But if there are other people in the audience, I still would want to hurt you if you don't give in. It's not about you; I want to credibly demonstrate I will carry out my threats. Indeed, carrying out my threats against you might be the best move on my part; I've shown I will carry them out, even when it arguably makes no sense to do so.

Brinksmanship audience

Now let's compare that with brinksmanship. Suppose I'm setting out the conditions for a business partnership, negotiating how to split a restaurant bill with friends, or maybe I'm a country negotiating a trade deal with a union I've just left.

Now, maybe you won't offer me the deal I want, and I can then prove my credibility by blowing up the deal. Or maybe you'll grudgingly give in, and I'll walk away triumphant.

But the problem is as follows: it is not useful for me to have a credible reputation for following up on brinksmanship threats. The other targets don't want to deal with someone with a reputation like that; they'll offer fewer deals, and worse ones.

That makes resisting brinksmanship much easier than resisting extortion.
If there are twenty potential groups I might strike a deal with, then blowing up a deal with one of them is not going to help me with the other nineteen. Committing to rejecting brinksmanship - say, by rejecting any deal that isn't "fair[2]" - is more credible, because I don't benefit from blowing things up.

Which audience, though?

Not all real-world examples of extortion and brinksmanship fit neatly into the above framework. Terrorists are generally trying to extort governments, but a generic "we don't negotiate with terrorists" seems to have served governments pretty well. The Cold War saw lots of brinksmanship between the superpowers, where there wasn't really an audience of entities of comparable power.

To sort that out, let's consider the size of the audience, i.e. the other entities that might practice extortion or brinksmanship, or have it practiced on them. Thus define:

• Extortion: there are entities {A1,…,An}, and {B1,…,Bm}. All Ai can threaten extortion on any Bj, in roughly similar ways.
• Brinksmanship: there are entities {A1,…,An}, and {B1,…,Bm}. None of the Ai can force any of the Bj into deals. All entities have roughly similar ideas of what constitutes a "fair" deal, and each entity stands to gain roughly the same from each fair deal they agree to.

So, how do things stand for varying n and m?

n=1, m=1

This is the USA and the USSR during the Cold War. They have no rivals of comparable power; they don't really need to demonstrate the credibility of their mutual threats to an outside audience. All that matters, fundamentally, is how credible their threats are to each other.

In this situation, extortion and brinksmanship are essentially the same thing. The two superpowers are locked in a contest with no clear default state, which neither can exit or ignore. The situation is complicated, and very much dependent on individual decisions and personalities; there is no "best behaviour".

n=1, m large

This is the situation I described above: one main extorter/brinksmanshipper, a large audience of potential victims/trade partners. As we saw, extortion is effective and hard to resist, while brinksmanship is ineffective.

n large, m=1

Here there is a single "victim", and many entities that might seek to take advantage of them.
This is like a government that "does not negotiate with terrorists"; there are many terrorists, potential terrorists, potential hijackers, potential hostage-takers, and so on, but one target.

Here the incentives are reversed for extortion: the target is incentivised to resist extortion, even at great cost, lest giving in encourage others to try their hand at it. Since there's only one target, there's no audience that the extorters will try and show their credibility to, so they won't be incentivised to go ahead with their threats.

Brinksmanship is even less effective; the target will hold out for good deals from some of the Ai, to pressure the rest to also offer good deals.

n and m both large

Here the incentives are harder to parse for the extorters and their targets. The extorter Ai wants to demonstrate to all the Bj that they are serious about following up on their threats, and the target Bj wants to demonstrate to all Ai that they are serious in resisting threats. Depending on the individual dynamics, the Ai may attack, or fail to attack, for reasons that have nothing to do with their specific target.

This gets hard to predict, and can depend on contingent factors (just as in the n=1, m=1 case), but there are multiple equilibria that can be relatively stable (unlike the n=1, m=1 case). See the hawk versus dove game and the various complicated variants on that.

Brinksmanship continues to be ineffective.

Multiple fairness criteria

If there are multiple fairness criteria, the equation for brinksmanship shifts: entities can apply brinksmanship to some deals, just as long as the rest of their audience doesn't feel they were excessive. Conversely, targets are incentivised to not have excessively strong "anti-brinksmanship" standards. It seems likely that some broad consensus on "fairness" will emerge, as a sort of averaging of different entities' judgements.

1. Neglecting acausal or counterfactual situations.
2. Notice the strong similarity between anti-brinksmanship and brinksmanship: in both cases, the parties are threatening to stop the deal unless their conditions are met.

### Anatomy of a Gear

November 16, 2020 - 19:34
Published on November 16, 2020 4:34 PM GMT

Thank you to Sisi Cheng (of the Working as Intended comic) for the excellent drawings.

When modelling a physical gearbox, why do we chunk the system into gears? We’re essentially treating each gear as a black box, a discrete component without internal structure of its own, interesting mainly for its interactions with other components. Why not zoom out, and model the entire gearbox as a black box? Why not zoom in, and model the individual chunks of metal which comprise each gear, or even each individual atom? It seems like gears are a natural choice of abstraction for a gearbox. But why that choice? In general, what makes a good “gear” - a good lowest-level component in a “gears-level” model?

Why choose “gears” as the components of our model? Why chunk the world that way, rather than chunk multiple gears into subsystems, or break up gears into even smaller subsystems?

More generally: we want to build gears-level models, but we don’t want to model things down to the last atom. At some point, we have to accept some black boxes in our model components. So how do we decide where that point is?

This post answers that question. We’ll start with a relatively-detailed discussion of a physical gear, then extend the reasoning to “gears” in other kinds of models.

A Physical Gear

Picture a gear, turning in a gearbox. We can pick out a little patch of metal on that gear, zoom in, and see lots of atoms vibrating around. Those vibrations aren’t completely random - vibrations of one atom are correlated with the atoms next to it, although the correlations do typically fall off over a short distance.
If we look at the motions of all the atoms in one little chunk of the gear, it won’t tell us much about the motions of the atoms in some other chunk far away in the gear.

… with one exception. If we average together the motion of all the atoms in our little chunk, we should get a decent estimate of the overall rotation speed of the gear. And that does tell us something about all the other little chunks of metal: it tells us roughly what the average motion in all those other chunks is.

If we look at all the atoms in one little chunk of one gear, only the average motion of all the atoms will tell us much about motion of atoms in a neighboring gear. Caution: physical accuracy not a priority in this visual.

Likewise, if we look at the motion of all the individual atoms in one little chunk of our gear, what does that tell us about the motion of atoms in other gears? Mostly nothing… except, again, the average motion of all the atoms tells us the overall rotation speed of our gear, which tells us something about the overall rotation speed of the other gears.

So, if we were to somehow model the whole system at atomic scales, we’d find that all the information in one little chunk of atoms, relevant to some other chunk of atoms far away, was summarized by the rotation speed of the gear.

In terms of dimensionality: the atom motions are extremely high dimensional. Every atom’s velocity is a 3-dimensional vector, so with n atoms we have 3n dimensions. Even a tiny patch of metal has an awful lot of atoms, so that’s an awful lot of dimensions. Yet most of that information is irrelevant to everything far away. For purposes of predicting things far away, that huge number of dimensions can be summarized by a single number: the gear’s rotation speed. That rotation speed, in turn, informs the motions of huge numbers of other atoms elsewhere in the system.
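The averaging argument can be illustrated with a toy simulation. This is only a sketch under loud assumptions: the gear is flattened to 2D, every atom's velocity is rigid rotation plus independent Gaussian "thermal" noise (real vibrations are locally correlated), and the names `simulate_chunk` and `estimate_omega` are made up for this illustration.

```python
import math
import random


def simulate_chunk(rng, omega, center_angle, n_atoms=20000, noise=5.0):
    """Return (x, y, vx, vy) for atoms in one small patch of a 2D gear.

    Assumption: velocity = rigid rotation at angular speed `omega`
    plus independent Gaussian noise standing in for thermal vibration.
    """
    atoms = []
    for _ in range(n_atoms):
        r = rng.uniform(0.8, 1.0)                        # patch sits near the rim
        theta = center_angle + rng.uniform(-0.05, 0.05)  # small angular extent
        x, y = r * math.cos(theta), r * math.sin(theta)
        vx = -omega * y + rng.gauss(0.0, noise)
        vy = omega * x + rng.gauss(0.0, noise)
        atoms.append((x, y, vx, vy))
    return atoms


def estimate_omega(atoms):
    """Least-squares angular speed: sum of (r x v) divided by sum of |r|^2."""
    numerator = sum(x * vy - y * vx for x, y, vx, vy in atoms)
    denominator = sum(x * x + y * y for x, y, _, _ in atoms)
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)
    true_omega = 2.0
    near = simulate_chunk(rng, true_omega, center_angle=0.0)
    far = simulate_chunk(rng, true_omega, center_angle=math.pi)
    # A single atom's velocity is dominated by noise, but the 40,000 raw
    # numbers in each patch collapse to nearly the same one-number summary.
    print("one atom's velocity:", near[0][2:], "(mostly noise)")
    print("estimate from near patch:", estimate_omega(near))
    print("estimate from far patch: ", estimate_omega(far))
```

In this toy setup, the individual velocities in the two patches share nothing except the common angular speed, which is the only thing the near patch can tell us about the far one.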
The rotation speed is an “information bottleneck”: (high dimensional atom motions) -> (one dimensional rotation speed) -> (high dimensional atom motions elsewhere).

We can think of the abstract object - the “gear” - as an interface to all the lower-level components which comprise the gear (e.g. atoms, in a physical gear).

In general, a good “gear” in a model picks out some chunk of an object for which we have a good low-dimensional summary. All the information from that chunk which is relevant to predicting things elsewhere in the system should be summarized by a few low-dimensional parameters. For a physical gear, it’s the rotation speed. The next few sections give other examples.

Rigid Body Dynamics

One direct generalization of a physical gearbox is rigid body dynamics: we have rigid, solid objects which move around, push each other, bounce off each other, etc. This is a good model for most of the non-flexible solid objects around us most of the time: tables and chairs, wheels, hammers and screwdrivers and screws and nails, pens and pencils, sticks and rocks, pots and pans and dishes and silverware, etc.

When thinking about the mechanics of rigid bodies (i.e. how they move), all the positions and motions of individual atoms in the object can be summarized by the overall position, orientation and motion of the object. It’s just like gears, though slightly more general - gears mostly just rotate in place, while rigid bodies in general can also move around.

So, it’s natural to chunk rigid bodies into individual “objects”, i.e. treat them as single gears in our models. This is exactly what we do in our everyday lives: we treat a pencil as an object, a plate as an object, a rock as an object, and so on. These things make natural “objects” because the low-level dynamics of all their atoms can be summarized by just the position, orientation and motion of the whole object, for purposes of predicting how things will move.
Note, however, that this is not sufficient for all questions we might ask about the object. For instance, if I want to know what sound an object makes when struck, then the vibrations of all the little parts become relevant. Rigid bodies make natural “gears” because we can answer a very broad range of questions just given a low-dimensional summary, not because we can answer all questions just given a low-dimensional summary.

Things get more complicated for non-rigid bodies, like cloth or rope. Sometimes we can use a low-dimensional summary for the dynamics, like a rope in a pulley, but not always; cloth moves around in complicated ways. We can still answer a broad range of questions using only summary information, just not necessarily questions about dynamics. For instance, looking at the atoms in one little chunk of cloth tells us very little about other atoms in the same chunk of cloth, except that the material composition is probably the same. This is quite similar to the “gears” used in chemical models.

Chemical Equations

In chemistry, we summarize the state of a huge number of atoms with just a handful of chemical concentrations (plus temperature and pressure, depending on the application). Why?

Well, as long as the system is mixed up, it doesn’t matter exactly where particular molecules are - they’re all more-or-less evenly distributed throughout the system. The exact positions of atoms in one patch of fluid at one time don’t tell us much about the exact positions of atoms in another patch of fluid at a later time, except insofar as they tell us about the average concentrations of each molecule type throughout the system as a whole. High-dimensional atom positions in one patch of fluid tell us low-dimensional average concentrations, which is all the information relevant to high-dimensional atom positions in some other patch of fluid (assuming everything is well-mixed).

Note that this changes if the system isn’t well-mixed, e.g.
a layer of oil on top of water, or a diffusion gradient. Then we either need to keep track of more information (e.g. concentrations in small patches of fluid) or we need to further restrict the questions we want to answer.

Electronic Circuits

Electronic circuits are a particularly interesting example, because avoiding “crosstalk” between circuit components is an explicit design goal. We generally don’t want e.g. a resistor to behave differently if there’s a magnet nearby.

In other words: the design of a resistor is chosen so that we can summarize all the information about the component using just its overall resistance and the overall current (or voltage) through it at any given moment. We don’t need to worry about the details of the physical connection between the resistor and a wire. We don’t need to worry about whether the resistor is upside down, or spinning around, or a little hotter/colder than room temperature. We don’t need to worry about the low-level behavior of individual atoms within the resistor. We just need a one dimensional summary: overall current is proportional to overall voltage delta.

The same applies to other electronic components - transistors, wires, capacitors, transformers, diodes, etc. Components are designed to make good “gears”: their behavior has a simple, low-dimensional summary.

API Functions

In software design, we often draw high-level diagrams of the software in which each component is a function in some API, and lines/arrows between components show which information is passed between them. This is useful mainly to the extent that the inputs and outputs of each component can be summarized without needing a detailed representation of the internal behavior of each function.

This is often considered a major criterion of good software design: functions should have simple interfaces. Even if there’s lots of internal complexity, like some function with many internal calculations (i.e.
high dimensional, with many intermediate values calculated), the inputs and outputs should be low-dimensional (at least compared to the internals). When functions satisfy this criterion, they make good “gears” in our conceptual models of the software’s behavior.

Companies

In economics, companies work a lot like functions in software. The “interface” they provide is their catalogue of products. The product itself is relatively low-dimensional, compared to the complicated process which produced the product.

It’s the same idea behind the classic “I, Pencil” essay: even something as simple as a pencil is produced by hundreds of people and machines with specialized functions. The pencil-user does not understand all the details of the pencil production process, but they don’t need to - they just need to know how to use a pencil.

As an anti-example, imagine some preindustrial ironsmith. Without reliable standards for metal composition and quality, the smith might buy very different “iron” from different suppliers at different times. The smith would either have to know quite a bit about the production of the raw iron, or else accept an inconsistent product. If the smith needs to keep track of the whole production process, then the iron supplier would be a bad gear in an economic model.

Organs

The physiology of the kidney or liver or other internal organs is rather involved. Each contains a wide variety of specialized cell types and substructures, all in large numbers; their internal structure is high-dimensional. Yet the overall function of most organs allows a relatively low-dimensional summary: they regulate levels of specific hormones or metabolites or cell types.

Takeaway

A good “gear” - i.e. a lowest-level component in a model - should offer a (relatively) low-dimensional summary of its own internal subcomponents.
That low-dimensional summary makes it practical to treat the gear as a black box, even though its internal components may be complicated and high dimensional (e.g. made of lots of atoms). The information which needs to be included in the summary depends on what questions we want to answer about the system, but a common theme is that broad classes of questions about behavior not-too-close to a particular subcomponent depend only on a few summary dimensions.

Finally, I'll note that some of the examples above brush some subtleties under the rug (though the main idea does generally work as advertised). People are invited to try to spot some of those subtleties, and possibly propose the resolutions, in the comments. I'll point to one to kick things off: how does temperature fit into our physical gearbox example?

### Is it rational to worry selfishly about conflict-related s-risks?

November 16, 2020 - 17:19
Published on November 16, 2020 2:19 PM GMT

Does it seem likely that threats stemming from AI co-operation failures could affect current humans' selfish values with sufficient probability to impact the expected value of the future (to a selfish individual)?

Most previous discussion seems to revolve around the possibility of simulations being used as a means of threats, but I'm unsure how feasible this would be to a coercive UFAI with access to limited compute; simulating a mind doesn't exactly seem cheap. Using some weird chemistry to synthesise beings which would matter morally (to humans or an FAI) would be another possibility, but I'm not sure how likely this is. If neither of these is likely to work, it might be reasonable to worry about these scenarios selfishly.

Should we be worried about this in more than just an altruistic manner, and if so, should it deter one from cryonics?
This seems meaningful to discuss in part because I'm far more loss-averse with regards to my selfish values than my altruistic ones ("good outcomes are more likely than bad ones" isn't particularly comforting) & suspect this to be the case with most people.

### How Roodman's GWP model translates to TAI timelines

November 16, 2020 - 17:05
Published on November 16, 2020 2:05 PM GMT

How does David Roodman’s world GDP model translate to TAI timelines?

Now, before I go any further, let me be the first to say that I don’t think we should use this model to predict TAI. This model takes a very broad outside view and is thus inferior to models like Ajeya Cotra’s which make use of more relevant information. (However, it is still useful for rebutting claims that TAI is unprecedented, inconsistent with historical trends, low-prior, etc.) Nevertheless, out of curiosity I thought I’d calculate what the model implies for TAI timelines.

Here is the projection made by Roodman’s model. The red line is real historic GWP data; the splay of grey shades that continues it is the splay of possible futures calculated by the model. The median trajectory is the black line.

I messed around with a ruler to make some rough calculations, marking up the image with blue lines as I went. The big blue line indicates the point on the median trajectory where GWP is 10x what it was in 2019. Eyeballing it, it looks like it happens around 2040, give or take a year. The small vertical blue line indicates the year 2037. The small horizontal blue line indicates GWP in 2037 on the median trajectory.

Thus, it seems that between 2037 and 2040 on the median trajectory, GWP doubles. (One-ninth the distance between 1,000 and 1,000,000 is crossed, which is one-third of an order of magnitude, which is about one doubling).
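The ruler arithmetic in the parenthetical can be checked with a few lines. This is just a sketch of that one step, assuming the chart's vertical axis is logarithmic and that the measured rise is one-ninth of the distance between the 1,000 and 1,000,000 marks; the function name `growth_factor` is invented for this illustration.

```python
import math


def growth_factor(fraction_of_span, low=1_000, high=1_000_000):
    """Multiplicative growth implied by moving `fraction_of_span` of the
    way from `low` to `high` on a logarithmic axis."""
    span_in_orders_of_magnitude = math.log10(high / low)  # three orders here
    return 10 ** (fraction_of_span * span_in_orders_of_magnitude)


if __name__ == "__main__":
    factor = growth_factor(1 / 9)
    print("orders of magnitude crossed:", math.log10(factor))
    print("growth factor:", factor)
    print("number of doublings:", math.log2(factor))
```

One-third of an order of magnitude is a factor of 10^(1/3), a little more than 2, which is why it counts as "about one doubling".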
This means that TAI happens around 2037 on the median trajectory according to this model, at least according to Ajeya Cotra’s definition of transformative AI as “software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it)... This means that if TAI is developed in year Y, the entire world economy would more than double by year Y + 4.”

What about the non-median trajectories? Each shade of grey represents 5 percent of the simulated future trajectories, so it looks like there’s about a 20% chance that GWP will be near-infinite by 2040 (and 10% by 2037). So, perhaps-too-hastily extrapolating backwards, maybe this means about a 20% chance of TAI by 2030 (and 10% by 2027).

At this point, I should mention that I disagree with this definition of TAI; I think the point of no return (which is what matters for planning) is reasonably likely to come several years before TAI-by-this-definition appears. (It could also come several years later!) For more on why I think this, see this post. [link to be added when linked post appears]

Finally, let’s discuss some of the reasons not to take this too seriously: This model has been overconfident historically. It was surprised by how fast GDP grew prior to 1970 and surprised by how slowly it grew thereafter. And if you look at the red trendline of actual GWP, it looks like the model may have been surprised in previous eras as well. Moreover, for the past few decades it has consistently predicted a median GWP-date of several decades ahead:

The grey region is the confidence interval the model predicts for when growth goes to infinity. 100 on the x-axis is 1947.
So, throughout the 1900s the model has consistently predicted growth going to infinity in the first half of the twenty-first century, but in the last few decades in particular, it’s displayed a consistent pattern of pushing back the date of expected singularity, akin to the joke about how fusion power is always twenty years away:

| Model has access to data up to year X = | Year of predicted singularity | Difference |
| --- | --- | --- |
| 1940 | 2029 | 89 |
| 1950 | 2045 | 95 |
| 1960 | 2020 | 60 |
| 1970 | 2010 | 40 |
| 1980 | 2014 | 34 |
| 1990 | 2022 | 32 |
| 2000 | 2031 | 31 |
| 2010 | 2038 | 28 |
| 2019 | 2047 | 28 |

The upshot, I speculate, is that if we want to use this model to predict TAI, but we don’t want to take it 100% literally, we should push the median significantly back from 2037 while also increasing the variance significantly. This is because we are currently in a slower-than-the-model-predicts period, but faster-than-the-model-predicts periods are possible and indeed likely to happen around TAI. So probably the status quo will continue and GWP will continue to grow slowly and the model will continue to push back the date of expected singularity… but also at any moment there’s a chance that we’ll transition to a faster-than-the-model-predicts period, in which case TAI is imminent.

(Thanks to Denis Drescher and Max Daniel for feedback on a draft)

### Range and Forecasting Accuracy

November 16, 2020 - 16:06
Published on November 16, 2020 1:06 PM GMT

cross-posted from niplav.github.io

This text looks at the accuracy of forecasts in relation to the time between forecast and resolution, and asks three questions: First, is the accuracy higher between forecasts? Second, is the accuracy higher between questions? Third, is the accuracy higher within questions? These questions are analyzed using data from PredictionBook and Metaculus; the answers turn out to be no, no and yes. Possible reasons are discussed.

Range and Forecasting Accuracy

Above all, don’t ask what to believe—ask what to anticipate.
Every question of belief should flow from a question of anticipation, and that question of anticipation should be the center of the inquiry. Every guess of belief should begin by flowing to a specific guess of anticipation, and should continue to pay rent in future anticipations. If a belief turns deadbeat, evict it.

Eliezer Yudkowsky, “Making Beliefs Pay Rent (in Anticipated Experiences)”, 2007

Probabilistic forecasting that aggregates both qualitative and quantitative methods is a comparatively simple idea. Basically, one needs only a few tools at one's disposal to be ready to start forecasting:

• View of belief as probabilistic (perhaps with some Bayesian epistemology)
• Track records (grading results of forecasts using for example Brier scores or log scores)
• Probability theory (a concept of probabilities, and maybe some simple probability distributions)

Since the 1980s, forecasting has slowly but surely matured from "X is going to happen because my intuition/divine revelation told me so" to "my probability distribution on the outcome of this random variable is an X distribution with the following parameters", or alternatively "I assign a probability of X% to this event".

However, since this kind of forecasting is relatively recent, information about the accuracy of long-range forecasting is basically non-existent:

1. Long-range forecasts are often stated too imprecisely to be judged for accuracy.
2. Even if a forecast is stated precisely, it might be difficult to find the information needed to check the forecast for accuracy.
3. Degrees of confidence for long-range forecasts are rarely quantified.
4. In most cases, no comparison to a “baseline method” or “null model” is possible, which makes it difficult to assess how easy or difficult the original forecasts were.
5. Incentives for forecaster accuracy are usually unclear or weak.
6.
Very few studies have been designed so as to allow confident inference about which factors contributed to forecasting accuracy.
7. It’s difficult to know how comparable past forecasting exercises are to the forecasting we do for grantmaking purposes, e.g. because the forecasts we make are of a different type, and because the forecasting training and methods we use are different.

Luke Muehlhauser, “How Feasible Is Long-range Forecasting?”, 2019

In this text, I will try to look at the accuracy of short-term and mid-term forecasting, which may shine some light on the relation between the range of forecasts and their accuracy in general. The range of a forecast is defined as the length of the timespan between the forecast and the resolution of the forecast. Keeping with Muehlhauser 2019, I will define short-term forecasts as forecasts with a range of less than a year, mid-range forecasts as forecasts with a range between 1 and 10 years, and long-term forecasts as forecasts with a range of more than 10 years (this distinction is not central to the following analysis, though).

Fortunately, for short- and mid-range forecasts, two easily accessible sources of forecasts and their resolutions are available online: the two forecasting websites PredictionBook and Metaculus.

To find out about the range of forecasts, I download, parse & analyse forecasting data from these sites.

Metaculus and PredictionBook

PredictionBook and Metaculus are both forecasting-focussed sites. They are not prediction markets, but rather function on the basis of merit and track records: although you don't win money by being right, you can still boast about it (it is an open question whether other people will be impressed). Besides that, these sites make it easier to train one's calibration on real-world questions and become less wrong in the process. However, the two sites differ in their approach to writing questions and judging and scoring forecasts.
PredictionBook is much older than Metaculus: the former was first released in 2008, the latter started in 2015. It is also much less formal than Metaculus: it doesn't require stringent resolution criteria, making it possible for anybody to judge a question (regardless of whether the person has even made a prediction on the question themselves!), while Metaculus requires a short text explaining the context and resolution criteria for a question, with the questions being resolved by moderators or admins. This leads to Metaculus having fewer questions than PredictionBook, but each question having more predictions on it. Of the two, Metaculus is much more featureful: it supports not only binary questions, but also range questions with probability distributions, comment threads, closed questions (questions that haven't yet been resolved, but that can no longer be predicted on), three different kinds of scores (the Brier score, and a logarithmic scoring rule for discrete and continuous forecasts each), as well as the Metaculus prediction, a weighted aggregation of the forecasts of the best forecasters on the site.

Another significant difference between these two websites is the amount of data they publish: PredictionBook shows every single forecast made, while on Metaculus one can only see the community forecast (a time-weighted median of the forecasts made on the question). This is relevant for this analysis: the two approaches must be analysed separately.

Getting the Data

First of all, the data for both platforms needs to be made available in a reasonable format. This works more smoothly for Metaculus, and is a bit more difficult to achieve for PredictionBook.

The resulting data from Metaculus is here, for PredictionBook it's here.

For Metaculus

The Metaculus data is relatively easy to obtain: the forecasts are available on a JSON API at https://www.metaculus.com/api2/questions/?page=.
Fortunately, gimpf has already published a collection of scripts for fetching & analysing Metaculus data. I reused their script fetch to download the raw JSON. I then converted the distinct page objects in the generated file to a list of questions:

    $ cd /usr/local/src
    $ git clone https://github.com/gimpf/metaculus-question-stats
    $ cd metaculus-question-stats
    $ ./fetch
    $ z site
    $ jq -s '[.]|flatten' </usr/local/src/metaculus/data-questions-raw.json >data/metaculus.json

The resulting data is available here.

I then wrote a Python script to convert the JSON data to CSV in the form id,questionrange,result,probability,range, while also filtering out yet unresolved questions and range questions. Here, id is a unique numerical ID per question, which will come in handy later, questionrange is the duration between the time for creating and resolving the question, result is the result of the question (either 0 or 1), probability is the probability given by the predictor ]0;1[
local('MathJax_Main'), local('MathJax_Main-Regular')} @font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')} @font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic} @font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')} @font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')} @font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: 
url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')} @font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')} @font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')} @font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')} @font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')} @font-face {font-family: 
, and range is the duration between the forecast and the resolution.

The script is not terribly interesting: it just reads in the JSON data, parses and traverses it, and prints the CSV in the process.

Code:

    #!/usr/bin/env python3

    import json
    import time

    from time import mktime

    with open("../../data/metaculus.json") as f:
        jsondata=json.load(f)

    for page in jsondata:
        for question in page["results"]:
            # Only resolved binary questions are of interest.
            if question["possibilities"]["type"]=="binary" and (question["resolution"]==1 or question["resolution"]==0):
                # Timestamps come with or without fractional seconds.
                try:
                    restime=time.strptime(question["resolve_time"],"%Y-%m-%dT%H:%M:%S.%fZ")
                except ValueError:
                    restime=time.strptime(question["resolve_time"],"%Y-%m-%dT%H:%M:%SZ")
                try:
                    createtime=time.strptime(question["created_time"],"%Y-%m-%dT%H:%M:%S.%fZ")
                except ValueError:
                    createtime=time.strptime(question["created_time"],"%Y-%m-%dT%H:%M:%SZ")
                for pred in question["prediction_timeseries"]:
                    # Note: mktime() interprets the parsed time as local time.
                    timediff=mktime(restime)-pred["t"]
                    qtimediff=mktime(restime)-mktime(createtime)
                    print("{},{},{},{},{}".format(question["id"], qtimediff, question["resolution"], pred["community_prediction"], timediff))

The resulting CSV file contains over 40k predictions.

For PredictionBook

As far as I know, PredictionBook doesn't publish its data over an API. However, all individual predictions are visible on the web, which means I had to parse the HTML itself using BeautifulSoup.
This time the code is more complex, but just slightly so: it starts at the first page of predictions and loops down to the last one, each time iterating through the questions on that page. It then loops through the predictions on each question and parses out the date of the prediction and the credence.

Every question on PredictionBook has two dates related to its resolution: the 'known on' date, for which the resolution was originally planned and by which the result should be known, and the 'judged on' date, on which the resolution was actually made. I take the second date to avoid predictions with negative differences between prediction and resolution time.

The output of this script is in the same format as the one for Metaculus data: id,questionrange,result,probability,range (although here probability can also be 0 and 1, which Metaculus doesn't allow).

Code:

    #!/usr/bin/env python2

    import urllib2
    import sys
    import time

    from bs4 import BeautifulSoup
    from time import mktime

    def showforecasts(linkp, res):
        # Fetch the page of a single question; skip it on network errors.
        urlp="https://predictionbook.com{}".format(linkp)
        reqp=urllib2.Request(urlp, headers={"User-Agent" : "Firefox"})
        try:
            conp=urllib2.urlopen(reqp, timeout=10)
        except (urllib2.HTTPError, urllib2.URLError) as e:
            return
        datap=conp.read()
        soupp=BeautifulSoup(datap, "html.parser")

        # The "Created by" paragraph holds the creation and judgement dates.
        timedata=soupp.find(lambda tag:tag.name=="p" and "Created by" in tag.text)
        resolved=timedata.find("span", class_="judgement").find("span", class_="date created_at").get("title")
        restime=time.strptime(resolved,"%Y-%m-%d %H:%M:%S UTC")
        created=timedata.find("span", class_="date").get("title")
        createtime=time.strptime(created,"%Y-%m-%d %H:%M:%S UTC")

        responses=soupp.find_all("li", class_="response")
        for r in responses:
            # Responses without a confidence are comments, not forecasts.
            forecasts=r.find_all("span", class_="confidence")
            if forecasts!=[]:
                est=float(forecasts[0].text.strip("%"))/100
            else:
                continue
            estimated=r.find("span", class_="date").get("title")
            esttime=time.strptime(estimated,"%Y-%m-%d %H:%M:%S UTC")
            print("{},{},{},{},{}".format(linkp.replace("/predictions/", ""), mktime(restime)-mktime(createtime), res, est, mktime(restime)-mktime(esttime)))

    for page in range(1,400):
        url="https://predictionbook.com/predictions/page/{}".format(page)
        req=urllib2.Request(url, headers={"User-Agent" : "Firefox"})
        try:
            con=urllib2.urlopen(req)
        except (urllib2.HTTPError, urllib2.URLError) as e:
            continue
        data=con.read()
        soup=BeautifulSoup(data, "html.parser")
        # Only questions judged right or wrong count as resolved.
        predright=soup.find_all("li", {"class": "prediction right"})
        predwrong=soup.find_all("li", {"class": "prediction wrong"})
        for pred in predright:
            linkp=pred.span.a.get("href")
            showforecasts(linkp, "1.0")
        for pred in predwrong:
            linkp=pred.span.a.get("href")
            showforecasts(linkp, "0.0")

Surprisingly, both platforms had almost the same number of individual predictions on binary resolved questions: ~48k for Metaculus, and ~44k for PredictionBook.

Three Different Analyses: An Illustrative Example

In this text, I analyze the relation between accuracy and range in forecasting, considering three different aspects:

• Between forecasts
• Between questions
• Within questions

What exactly does this mean?

Let's say there are two people: Bessie and Heloïse. They are trying to make predictions about the weather over different time horizons (it is currently the end of August):

1. Will it rain tomorrow? (resolution: no/0)
2. Will the average temperature in August in 1 year be higher than 20°C? (resolution: no/0)

Let's say that they make the following predictions:

• Bessie: 0.3 for 1, 0.85 for 2
• Heloïse: 0.1 for 1, 0.6 for 2

Let's also say that they make their predictions in alphabetical order of their names, one hour after another (Bessie at 00:00 and Heloïse at 01:00).

Judging Between Forecasts

Evaluating the relation between forecasts works as follows: each forecast, its resolution and its timespan are analyzed independently.

We have four predictions:

1. One with a range of 23 hours, a probability of 0.1 (Heloïse's prediction on 1), and a resolution of 0
2. One with a range of 24 hours, a probability of 0.3 (Bessie's prediction on 1), and a resolution of 0
3. One with a range of $24\,\mathrm{h/d} \cdot 365\,\mathrm{d} - 1\,\mathrm{h} = 8759\,\mathrm{h}$ (it's not a leap year), a probability of 0.6 (Heloïse's prediction on 2), and a resolution of 0
4. One with a range of $24\,\mathrm{h/d} \cdot 365\,\mathrm{d} = 8760\,\mathrm{h}$, a probability of 0.85 (Bessie's prediction on 2), and a resolution of 0

The Brier scores for the ranges are then 0.01 for 23h, 0.09 for 24h, 0.36 for 8759h, and 0.7225 for 8760h. Here, higher range between forecasts is correlated with worse performance.

Judging Between Questions

Judging the performance between questions now means looking at the forecasts made on each question and evaluating the performance of forecasts on that question.

Question 1 has a range of 24h, and question 2 has a range of 8760h. The Brier score for predictions on question 1 is 0.05, and the Brier score for predictions on question 2 is 0.54125. In this case, a higher range seems to be worse for performance on questions (Brier scores are lower/better for question 1).

Judging Within Questions

Within questions, one examines each question separately.

On question 1, the forecast with the higher range has a Brier score of 0.09, and the forecast with the lower range has a Brier score of 0.01. So for question 1, higher range is correlated with worse performance.

Question 2 is similar: the forecast with the higher range (8760h) has a score of 0.7225, while the forecast with the lower range (8759h) has a score of 0.36. Here, too, higher range is correlated with worse performance.

One can now try to aggregate the findings from the two questions and tentatively conclude that range within questions is generally correlated negatively with the accuracy of forecasts.

These were of course only illustrative examples, but I hope that now the different approaches in this text are clearer than before.
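
The three views of the Bessie/Heloïse example can also be reproduced with a few lines of Python. This is only a sketch of the toy example, not the analysis code used on the real data; the names `FORECASTS`, `OUTCOME`, `QUESTION_RANGE` and `brier` are made up for illustration.

```python
from statistics import mean

# (forecaster, question, probability, range in hours), taken from the example.
FORECASTS = [
    ("Bessie", 1, 0.3, 24),
    ("Heloïse", 1, 0.1, 23),
    ("Bessie", 2, 0.85, 8760),
    ("Heloïse", 2, 0.6, 8759),
]
OUTCOME = {1: 0, 2: 0}             # both questions resolved "no"
QUESTION_RANGE = {1: 24, 2: 8760}  # hours between opening and resolution


def brier(probabilities, outcome):
    """Mean squared error of a list of probabilities against one outcome."""
    return mean([(p - outcome) ** 2 for p in probabilities])


# Between forecasts: every forecast is scored on its own, keyed by its range.
between_forecasts = {
    rng: brier([p], OUTCOME[q]) for _, q, p, rng in FORECASTS
}

# Between questions: forecasts on a question are pooled, keyed by question range.
between_questions = {
    QUESTION_RANGE[q]: brier([p for _, fq, p, _ in FORECASTS if fq == q], OUTCOME[q])
    for q in OUTCOME
}

# Within questions: per question, (range, score) pairs sorted by range.
within_questions = {
    q: sorted((rng, brier([p], OUTCOME[q])) for _, fq, p, rng in FORECASTS if fq == q)
    for q in OUTCOME
}

for name, table in [("between forecasts", between_forecasts),
                    ("between questions", between_questions),
                    ("within questions", within_questions)]:
    print(name, table)
```

Up to floating-point rounding, the printed tables should agree with the Brier scores worked out by hand in the three subsections.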
Accuracy Between Forecasts

The first approach I took was to simply take the probability, result and range for all forecasts made, and sort these forecasts into buckets by range (e.g. one bucket for all forecasts made 1 day before their resolution (and their results), one bucket for all forecasts made 2 days before their resolution, and so on). I then calculated the Brier score for each of these buckets, and checked what the relation between Brier score and range (the time between forecast & resolution) was (correlation & linear regression).

Analysis

Now that the two datasets are available, they can be properly analyzed.

First, the raw data is loaded from the two CSV files, then the ID is converted to integer and the rest of the fields are converted to floats (the range is a float for some Metaculus questions, and while the result can only take on 0 or 1, using float there makes it easier to calculate the Brier score using mse.set):

    .fc(.ic("../../data/pb.csv"));pbraw::csv.load()
    .fc(.ic("../../data/met.csv"));metraw::csv.load()

    pbdata::+flr({0<*|x};{(1:*x),1.0:$'1_x}'pbraw)
    metdata::+flr({0<*|x};{(1:$*x),1.0:$'1_x}'metraw)

To compare the accuracy between forecasts, one can't deal with individual forecasts, only with sets of forecasts and outcomes. Here, I organise the predictions into buckets according to range.

The size of the buckets seems important: bigger buckets contain bigger datasets, but are also less granular. Also, should the size of buckets increase with increasing range (e.g. exponentially: the first bucket is for all predictions made one day or less before resolution, the second bucket for all predictions made 2-4 days before resolution, the third bucket for all predictions made 4-8 days before resolution, and so on) or stay the same?

I decided to use evenly sized buckets, and test with varying sizes of one day, one week, one month (30 days) and one year (365 days).

    spd::24*60*60
    spw::7*spd
    spm::30*spd
    spy::365*spd

    dpbdiffs::{_x%spd}'pbdata@3
    wpbdiffs::{_x%spw}'pbdata@3
    mpbdiffs::{_x%spm}'pbdata@3
    ypbdiffs::{_x%spy}'pbdata@3

    dmetdiffs::{_x%spd}'metdata@3
    wmetdiffs::{_x%spw}'metdata@3
    mmetdiffs::{_x%spm}'metdata@3
    ymetdiffs::{_x%spy}'metdata@3

The Brier score is a scoring rule for binary forecasts. It takes into account both calibration and resolution by basically being the mean squared error of forecast ($f_t$) and outcome ($o_t$):

$$BS=\frac{1}{N}\sum_{t=1}^{N}(f_t-o_t)^2$$

In Klong, it's easy to implement (and also available through the function mse.set):

    brier::{mu((x-y)^2)}

Now, one can calculate the Brier score for the forecasts and outcomes in each bucket (here I only show it for the days buckets, but it's similar for weeks, months and years):

    pbress::pbdata@1
    pbfcs::pbdata@2

    dpbdg::=dpbdiffs
    dpbdiffbrier::{(dpbdiffs@*x),brier(pbfcs@x;pbress@x)}'dpbdg
    dpbdiffbrier::dpbdiffbrier@<*'dpbdiffbrier

    metress::metdata@1
    metfcs::metdata@2

    dmetdg::=dmetdiffs
    dmetdiffbrier::{(dmetdiffs@*x),brier(metfcs@x;metress@x)}'dmetdg
    dmetdiffbrier::dmetdiffbrier@<*'dmetdiffbrier

Every diffbrier list contains lists with two elements, the first one being the time between forecast and resolution, and the second one being the Brier score for all forecasts made in that time. For example, ypbdiffbrier (the Brier scores for all predictions made on PredictionBook 1/2/.../10 years before resolution) is

    [[0 0.162] [1 0.168] [2 0.164] [3 0.159] [4 0.13] [5 0.12] [6 0.128] [7 0.147] [8 0.121] [9 0.215] [10 0.297]]

(Brier scores truncated using {(*x),(_1000**|x)%1000}'ypbdiffbrier).

Results

First, one can check how high the range of these two datasets really is. The PredictionBook forecasts with the highest range span 3730 days (more than 10 years); for Metaculus it's 1387 days (nearly 4 years).
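
For readers who don't read Klong, the bucketing and per-bucket Brier score from the analysis above can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the code that produced the numbers in this text: the function names are hypothetical, the rows are assumed to be `(outcome, probability, range_in_seconds)` triples, and the sample rows are invented.

```python
from collections import defaultdict
from math import sqrt

SECONDS_PER_DAY = 24 * 60 * 60


def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_by_bucket(rows, bucket_seconds=SECONDS_PER_DAY):
    """Group (outcome, probability, range_seconds) rows into evenly sized
    range buckets and return sorted (bucket, brier_score) pairs."""
    buckets = defaultdict(list)
    for outcome, probability, range_seconds in rows:
        buckets[int(range_seconds // bucket_seconds)].append((probability, outcome))
    return sorted(
        (bucket, brier([p for p, _ in pairs], [o for _, o in pairs]))
        for bucket, pairs in buckets.items()
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical rows, invented for illustration (not taken from either dataset).
rows = [(0, 0.2, 3600), (1, 0.7, 7200), (0, 0.4, 90000), (1, 0.9, 100000), (0, 0.5, 200000)]
scores = brier_by_bucket(rows)
correlation = pearson([b for b, _ in scores], [s for _, s in scores])
print(scores, correlation)
```

Passing a different `bucket_seconds` (a week, 30 days, 365 days) gives the other bucket sizes; the correlation is taken over the per-bucket pairs, so every bucket counts equally regardless of how many forecasts it contains.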
One can now look at the correlation between range and Brier score, first for Metaculus and then for PredictionBook:

    cor@+dmetdiffbrier
    -0.209003553312708299
    cor@+wmetdiffbrier
    -0.255272030357598017
    cor@+mmetdiffbrier
    -0.304951730306590024
    cor@+ymetdiffbrier
    -0.545313822494739663
    cor@+dpbdiffbrier
    -0.0278634569332282397
    cor@+wpbdiffbrier
    0.0121150252846883416
    cor@+mpbdiffbrier
    0.0752110636215072744
    cor@+ypbdiffbrier
    0.411830122003081247

For Metaculus, the results are pretty astonishing: the correlation is negative for all four options, meaning that the higher the range of the question, the lower the Brier score (and therefore, the higher the accuracy)! And the correlation is not extremely low either: -0.2 is quite formidable.

PredictionBook, on the other hand, is not as surprising: the correlations are mostly weak and indicate that accuracy doesn't change with range – a null result.

Visualizing the forecasts with scatterplots and linear regressions shows a very similar picture (red dots are for Metaculus forecasts, blue dots are for PredictionBook forecasts):

Scatterplot with linear regression for Metaculus & PredictionBook forecasts by range (in days)

Scatterplot with linear regression for Metaculus & PredictionBook forecasts by range (in weeks)

Scatterplot with linear regression for Metaculus & PredictionBook forecasts by range (in months)

Scatterplot with linear regression for Metaculus & PredictionBook forecasts by range (in years)

The high amounts of noise are probably due to the low number of predictions for single days (or, in the case of weeks and months, for weeks/months with a high range, as not enough questions with this range have resolved yet).

Why Assume Accuracy will Increase?

I believe that this finding is quite surprising.
A priori, one would believe that beliefs about the near future are generally more accurate than beliefs about the far future: we can predict the weather in 2 minutes far better than the weather in 6 months, we can say much more about the position of a rock in an hour than in 100 years, and more about the popularity of a political party in 2 months than in 10 years. Even in reasonably chaotic systems, one should expect to become more and more accurate the closer one comes to the expected time. Take, for example, a double pendulum: I am totally able to predict its position & velocity 100ms before resolution time, but at 1s before it's already getting more difficult. Information, like nearly everything else, has diminishing value; posteriors converge continuously towards truth.

Possible Explanations

So, what is the reason for this rather weird finding? Several possible reasons come to mind.

Range and Biased Questions

The most obvious explanation is that the analysis above is absolutely bogus and completely meaningless: it compares questions about global catastrophic risks to questions about popular banana brands, very different kinds of questions with very different kinds of forecasts.

Here, one would assume that the longer-term questions asked are generally easier to predict, and that the effect goes away when one compares predictions among very similar questions (or, better, within questions).

Generally, the long-term questions we prefer asking seem to be more amenable to forecasting than short-term questions: the development of population sizes, the climate, and especially the movement of interstellar bodies are much more thoroughly modelled than the development of markets, elections and the weather. This is of course only a weak trend, but one that could influence the questions (as will be investigated in this section).

Low Sample Sizes With High Ranges

Another question one might ask is: How big are the sample sizes at the tails when the range is high?
This is important: low sample sizes increase noise dramatically, and make findings much less reliable.

To get a rough overview of the sample sizes, one can look at the number of samples for each bucket. I generated charts of the sample sizes for days, weeks, months and years, but I'll only show the chart for months (the others are quite similar):

    mssplot::.oc("mss_plot.eps")
    .tc(mssplot)

    mmaxlen::(#mpbss)|#mmetss
    mmaxval::|/*|+mpbss,mmetss

    mmetssvals::(*|+mmetss),(mmaxlen-#mmetss):^0
    mpbssvals::(*|+mpbss),(mmaxlen-#mpbss):^0

    setrgb(0;0;0)
    grid([0],mmaxlen,(mmaxlen:%20);[0],mmaxval,(mmaxval:%20))
    xtitle("Range (in months)")
    ytitle("Number of predictions")

    setrgb(0;0;1)
    segplot(mmetssvals)
    setrgb(1;0;0)
    segplot(mpbssvals)

    draw()

    .fl()
    .cc(mssplot)

Sample sizes for predictions with a range of n months, sorted and graphed.

The red bars stand for Metaculus sample sizes, the blue bars stand for PredictionBook sample sizes.

As one can see, the sample sizes are drastically skewed towards recent predictions, which is not surprising for relatively young platforms (although 10 years for PredictionBook is sizable by internet standards, it's not that much compared to the range of some predictions on the platform).

This can be seen in the data as well: more than 77% of Metaculus predictions and 75% of PredictionBook predictions have a range of less than one year:

    ypbss::{(ypbdiffs@*x),#x}'ypbdg
    ypbss::ypbss@<ypbss
    ymetss::{(ymetdiffs@*x),#x}'ymetdg
    ymetss::ymetss@<ymetss
    ymetss
    [[0 34487] [1 7129] [2 2182] [3 507]]
    ypbss
    [[0 29724] [1 4257] [2 1966] [3 1491] [4 909] [5 374] [6 287] [7 155] [8 143] [9 107] [10 6]]
    34487%(34487+7129+2182+507)
    0.77839972915020878
    29724%(+/*|+ypbss)
    0.754052614221568279

I hope that the dataset becomes richer the older these platforms become.
For days the skew is not as strong for Metaculus (more so for PredictionBook), but still relevant:

    10#dmetss
    [[0 406] [1 543] [2 633] [3 464] [4 546] [5 477] [6 440] [7 307] [8 240] [9 297]]
    10#dpbss
    [[0 3267] [1 1142] [2 754] [3 611] [4 625] [5 426] [6 507] [7 440] [8 283] [9 246]]

Because all datapoints are weighted equally in the linear regression, it could very well be that a tiny bit of noise at the tails dominates the entire regression.

Accuracy Between Questions

Another way to look at the relation between forecasting accuracy and range is to consider the range of questions and not of individual forecasts.

In this case, this means taking the forecasts on all questions with a given range, sorting them into buckets by that range, and calculating the Brier score on the forecasts in each bucket.

Determining the Range of a Question

The range of a question is determined by taking the time difference between the opening time (the time when the first prediction on the question could have been made) and the resolution time. One could imagine other metrics to determine the range of a question: the mean range of forecasts on that question, the median range of forecasts on that question, time differences between writing/opening and closing/resolution times of the question, and probably many more.

Here, the range of a question was set to the time difference between opening time and resolution time. The reasons for this were threefold:

First, I had no clear idea about the time when people were making forecasts on questions. Are most of the forecasts made just after opening, or just before closing? Or is the distribution uniform over the time between opening and closing? And are these distributions different on long-range as opposed to short-range questions? Also, I was unsure whether taking the mean time of forecasts would just be the same as comparing forecasts directly. So taking the median or the mean of the forecasts made was less preferable.
Second, what I cared about here was the uncertainty of questions at time of writing, not at time of prediction. This is much better tracked by opening time than by a proxy based on the forecasts.

Third, there was the question of data availability. Both Metaculus and PredictionBook publish opening/resolution times, but PredictionBook has no clear distinction between closing and resolution time (there is, however, a distinction between effective resolution time and planned resolution time ("When was the question resolved?" vs. "When should the question have been resolved?")).

Analysis

First, the dataset grouped by forecasts had to be grouped by the question ID, in both cases a positive integer. The resulting data structure should have the form

    [[id open-resolve-timediff [outcomes] [forecasts] [forecast-resolve-timediffs]]*]

where the splat just indicates that the inner list can be repeated. This was achieved by first finding the grouping of forecasts by question ID, then concatenating the ID, the question range, the list of outcomes, the list of forecasts and the list of forecast ranges:

    metquestions::{(*x@0),(*x@1),2_x}'+'(+metdata)@=*metdata
    pbquestions::{(*x@0),(*x@1),2_x}'+'(+pbdata)@=*pbdata

Strictly speaking, the outcomes could be a single element, since for every question there is only one well-defined outcome, but keeping them as a list makes it easier to compute the Brier score later. Showcase:

    metquestions@10
    [474 497590.0
     [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]
     [0.79 0.8 0.99 0.8 0.8 0.65 0.65 0.8 0.8 0.81 0.81 0.7]
     [249575.65223908424 249548.86438822746 245775.7940876484 242420.23024630547 230434.71577501297 230276.97260832787 230111.41609930992 229967.06126213074 216594.73318576813 207687.5192539692 177898.677213192 151590.6441845894]]

The next step involves merging the forecasts on questions with the same range (rounded by day/week/month/year).
This was achieved by first dividing the question range by spd/spw/spm/spy, then grouping the questions by the resulting rounded range. Afterwards, question ID and forecast range were dropped and the forecast and result arrays were concatenated in order. The resulting array had the structure

    [[question-range [results] [forecasts]]*]

Afterwards, computing the Brier score was quite straightforward by selectively applying it in a mapped function. The resulting array was sorted by range, for convenience.

    dmetquestions::{(_(x@1)%spd),x@[2 3]}'metquestions
    dmetquestions::{(**x),,/'+{1_x}'x}'dmetquestions@=*'dmetquestions

    dqmetbrier::{(*x),brier@1_x}'dmetquestions
    dqmetbrier::dqmetbrier@<dqmetbrier

    dqmetbrier@19
    [20 0.210210798122065727]

This was of course implemented for both datasets and for all four kinds of buckets.

Results

Again I use linear regressions, correlation coefficients and scatter plots to inadequately analyze the data.

For accuracy between questions, the results were mostly not very interesting:

    cor@+dqmetbrier
    -0.021357561237633882
    cor@+wqmetbrier
    -0.0564522173076630489
    cor@+mqmetbrier
    -0.134945120480158162
    cor@+yqmetbrier
    -0.459895122605089935
    cor@+dqpbbrier
    0.00977369255430785951
    cor@+wqpbbrier
    0.0350359685890036469
    cor@+mqpbbrier
    0.00195160822503737404
    cor@+yqpbbrier
    -0.542853871095691028

With a high resolution (looking at days and weeks, similarly months), the correlations are very near zero, probably just noise.

But the correlations between range in years and across-question accuracy are about -0.5 in both cases. This is curious, and I have no explanation of what exactly is going on. Perhaps this is just a random result in both cases, which arises because the datasets are just too small (4 & 10 for Metaculus and PredictionBook, respectively)? Or is it picking up on a real effect only visible with ranges as high as years? I don't know.

And now: linear regressions and scatterplots!
The following are scatterplots with range on the X-axis and accuracy (calculated using the Brier score) on the Y-axis. Again, red dots/lines are for Metaculus data, and blue dots/lines are for PredictionBook data.

Scatterplot with linear regression for Metaculus & PredictionBook question accuracy by range (in days)

Scatterplot with linear regression for Metaculus & PredictionBook question accuracy by range (in weeks)

Scatterplot with linear regression for Metaculus & PredictionBook question accuracy by range (in months)

Scatterplot with linear regression for Metaculus & PredictionBook question accuracy by range (in years)

Note that these are indeed different from the results in the analysis on between-forecast accuracy. In particular, it seems like the linear regressions are less steep:

    lreg(dmetdiffbrier)
    [-0.0000372520623478135807 0.190620666721820704]
    lreg(dqmetbrier)
    [-0.00000947572947427605725 0.177148138436629167]

The general trend seems to be: questions with a higher range have a higher accuracy than questions with a lower range. In itself, this is already a fascinating finding, and might explain some of the effect seen with accuracy between forecasts in the previous section. On the other hand, the data is still very noisy, and the interpolation on PredictionBook data shows no relation at all for the four timespans, while having questions with a much higher range than Metaculus.

All in all, it's plausible that the relation of range and accuracy between questions explains the weird relation of accuracy and range between forecasts, but I don't know enough statistics to tease these apart exactly. My intuition tells me that the effect on accuracy between questions is too small to explain the whole anomaly between forecasts.

Accuracy Within Questions

If there exists any bias in regard to what kinds of questions get asked in relation to their range, how can we correct for this bias?
One approach could be to compare very similar questions, such as only questions about artificial intelligence, the cost & speed of gene sequencing or autonomous cars, and examine the relation of range and accuracy within these categories. This might eliminate bias resulting from questions in different kinds of domains being easier or harder to forecast.

Here, I take a simpler approach. I examine the relation of range and accuracy within questions; are forecasts made on the same question later generally more accurate than forecasts made on a question earlier?

Analysis

In order to do this, it seems like questions with higher numbers of forecasts on them are more likely to give clearer results than questions with only a dozen or so forecasts. The Metaculus dataset contains predictions on 557 questions, the PredictionBook dataset 13356:

    #metquestions
    557
    #pbquestions
    13356

I filtered out questions with <100 predictions on them, resulting in 323 questions from the Metaculus dataset and 0 (!) questions from PredictionBook:

    wmetq::flr({100<#x@2};metquestions)
    wpbq::flr({100<#x@2};pbquestions)
    #wmetq
    323
    #wpbq
    0

This is not wholly surprising: Metaculus makes creating new questions much harder, and more strongly encourages users to predict on existing questions, with an elaborate tagging system for questions. PredictionBook on the other hand simplifies the question creation process, leaving out moderation, complex resolution criteria etc. Still, I'm surprised – there must be at least one PredictionBook question popular enough for 100 forecasts! But apparently not.

So, what is the highest number of predictions a PredictionBook question has gotten?

    pbl::{#x@2}'pbquestions
    pbl::pbl@<pbl
    |/pbl
    99

You've got to be kidding me.
Anyway, within the usable questions with >100 predictions, the predictions of each question are first sorted by range (here time between forecast and resolution) and then separated into chunks containing 50 predictions each, so that the resulting structure of cwmetq looks like this:

    [ [ [[result_array] [50_earliest_predictions] [ranges]] [[result_array] [50_next_predictions] [ranges]] … ] … ]

The code works by iterating the function sac over every question, first sorting the values by range and then cutting the predictions into chunks of size 50.

    chl::50
    sac::{t::+2_x;+'(chl*1+!(#t):%chl):_t}
    chsmetq::sac'wmetq

Interlude: It's Under 102

When I first ran this code, I then also wanted to check how many chunks each question had:

    #'chsmetq
    [3 3 3 3 3 3 … 3 3 ]

The result was, to say the least, confusing – where did all those 3s come from‽ Surely, there are questions with more than 150 forecasts (which I knew, this question about 2016 being the warmest year on record has 765 forecasts)!

    10#{#x@3}'metquestions
    [101 101 94 60 101 61 101 101 101 68]
    |/{#x@3}'metquestions
    101

I initially suspected a bug in my code, but to my surprise, after further investigation, it turned out that the Metaculus API returns timeseries with elements removed so that the length was always 101.

I can think of two reasons to do this:

• Metaculus wants to prevent other entities from using the predictions to create stronger forecasting algorithms that could rival the Metaculus algorithm
• It was programmed in as a hard limit when Metaculus wasn't as big as it is now, and never changed

I mailed the support address on the site, asking for a full timeseries on resolved binary questions. After the support address had not responded to my inquiry, I contacted one of the admins of the site on the Discord, but was informed that updating the API would be too difficult to do (which is understandable, the Metaculus developers do not exist to cater to my whims, and are doing a phenomenal job).
So, unfortunately I'll have to postpone a more complete analysis to later.

Now for each chunk of size 50 we can compute the Brier score and the mean of the range, and subsequently convert the ranges from seconds to days:

    pchsmetq::{+mu(*|x),brier@2#x}'x}'chsmetq
    pchsmetq::{{((_*x):%(3600*24)),1_x}'x}'pchsmetq

The dataset then has elements like this:

    2#pchsmetq
    [[[294 0.036422] [72 0.015188] [1 0.0016]] [[57 0.002532] [35 0.001462] [28 0.0004]]]

Each element contains the mean range of a chunk in days and the accuracy of the forecasts on that question within that chunk.

Results

We can now compute the linear regression for the chunks in each question:

    2#lreg'pchsmetq
    [[0.00011329877152681667 0.0038764502832194274] [0.0000675414847161572049 -0.00123699272197962153]]

We can also visualise the linear regression for each question by setting it to zero outside the range of the oldest and newest chunks:

    sketch::{q::x; setrgb(.rn();.rn();.rn()); pltr::{:[(x>**q)|x<**|q;0;lr(x;lreg(q))]}; plot(pltr)}
    sketch'pchsmetq

Linear regressions for the accuracy of questions by range in chunks of size 50.

The vertical bars are artifacts stemming from the fact that Klong attempts to make the discontinuous function continuous, connecting 0 and the linear regression.

Although the plot is kind of cool to look at, I'm not really sure what it can tell us. My guess would be that it somewhat shows a trend with higher ranges corresponding to higher Brier scores (and therefore lower accuracy).

We can test whether this suspicion is actually correct by calculating the average offset and the average ascension – if the ascension is positive, our suspicion is confirmed.

    mu'+lreg'pchsmetq
    [0.00198030517003624986 0.0105472685809891273]

So it is true that accuracy within question generally is higher with lower range. Everything else would have been surprising.

Mean of linear regressions on accuracy within questions.
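For readers who do not know Klong, the chunking and scoring steps of this section can be restated in Python. This is only a rough sketch under stated assumptions, not the code used for the analysis: the function names `brier` and `chunk_scores`, the argument layout, and the toy data are all hypothetical, and the single `outcome` argument assumes a binary question with one resolution.

```python
from statistics import mean

SECONDS_PER_DAY = 3600 * 24


def brier(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and outcomes."""
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))


def chunk_scores(forecasts, ranges_sec, outcome, chunk_size=50):
    """Return (mean range in days, Brier score) for each chunk of one question.

    Predictions are sorted by range, longest first (i.e. earliest prediction
    first), and then cut into consecutive chunks of `chunk_size`.
    """
    pairs = sorted(zip(ranges_sec, forecasts), reverse=True)
    scores = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        mean_range_days = mean(r for r, _ in chunk) / SECONDS_PER_DAY
        score = brier([f for _, f in chunk], [outcome] * len(chunk))
        scores.append((mean_range_days, score))
    return scores


# Toy data (invented for illustration): four forecasts on a question that
# resolved positively, made 4, 3, 2 and 1 days before resolution.
toy_ranges = [4 * SECONDS_PER_DAY, 3 * SECONDS_PER_DAY,
              2 * SECONDS_PER_DAY, 1 * SECONDS_PER_DAY]
toy_forecasts = [0.5, 0.5, 1.0, 1.0]
print(chunk_scores(toy_forecasts, toy_ranges, outcome=1, chunk_size=2))
```

Each returned pair corresponds to one element of the per-question lists shown above: the mean range of a chunk in days, followed by the Brier score of the forecasts in that chunk.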
Conclusion

Using two datasets, each with ~45k predictions, having ranges between 1 day and 10 years (thereby containing forecasts with short and medium range), I have investigated the relation between the accuracy of predictions and their range (that is, the time between the prediction being made and the result of the prediction being known). I have found that the data indicates three facts:

1. For predictions made on any question, the predictions made a long time before their resolution are generally more accurate than predictions made a shorter time before their resolution. This can be partially, but not completely explained by fact 2.
2. Questions with a longer range (that is, time between the question being written and the question being resolved) generally receive predictions with a higher accuracy than questions with a shorter range.
3. Predictions made on the same question earlier are generally less accurate than predictions that are made later.

These results vary strongly between Metaculus and PredictionBook, with observations 1. and 2. much weaker or non-existent in PredictionBook data (observation 3. only holds for Metaculus, because there are no questions on PredictionBook with enough forecasts to run the analysis).

These results suggest what to expect with questions with even greater range: that later predictions on them will generally be more accurate, and that the kinds of questions asked with a very high range might engender predictions with an even higher accuracy than questions with short and medium ranges.

However, there are plausible reasons to expect the trend from 1. and 2.
to reverse: The questions asked with very high range are not very different from questions with medium range, and have a lot less information available to make useful predictions on them; butterfly effects start kicking in in systems that are relatively slow moving on human timescales (thus easier to predict on medium timescales), but nearly completely random at the scale of decades and/or centuries; the questions asked about longer timescales are of a different kind and much less predictable.

I hope to update this analysis in the future, when data from predictions with higher ranges has become available, and to check whether the findings in this analysis continue to be correct.

Miscellaneous

The code for image generation can be found here, the complete code for analyzing the data can be found here.

Discuss

### Brigaded Rounds

November 16, 2020 - 04:50
Published on November 16, 2020 1:50 AM GMT

A group of round singers were interested in trying out the bucket brigade singing program, and I realized that with a small tweak it could support rounds directly. We just needed to write audio both at the place it belonged to, and an appropriate distance into the future. This would let everyone hear everyone else, though the leader still has to wait a bit before they start to hear others.

The system already needs to know BPM (beats per minute) so it can run a metronome, but the support around it also needs to know BPR (beats per repeat). For example, "Row, Row, Row, Your Boat" is a 16 beat round with participants entering every four beats, so you would probably set BPR to be 16. Four (or more) people could sing, with one person leading and each other person joining after the right number of beats as if they were singing together in person. Once everyone has been singing for 16 beats they all hear everyone else, and also themself.

You can get interesting effects by setting the BPR to some other multiple of the part interval, such as 12.
This lets you sing along with yourself at an offset, so that you appear to be one of the other singing voices. It's also possible to configure the system with multiple repeats, so you would hear each iteration of your voice, say, three times. This can be fun if you have a small group of people and want to sing many parts, or even if you're playing around by yourself.

One thing this made me realize is that the latency calibration is really very important, and the previous way we were handling it was not sufficient. Originally, it started off trying to get a very accurate estimation of your latency, but if it wasn't doing very well it would slowly lower its standards. After about 30 seconds, it would let in people even with pretty inaccurate estimates. I've updated it now to be very careful, and if it can't estimate your latency to within 2 ms, it lets you know and gives you the choice between trying again, and joining without sending any of your audio to the server.

Feel free to play with it: echo.jefftk.com.

Comment via: facebook

Discuss

### When socializing, to what extent does walking reduce the risk of contracting Covid as opposed to being stationary?

November 16, 2020 - 03:39
Published on November 16, 2020 12:39 AM GMT

I would expect that when stationary, the aerosol particles have more time to accumulate whereas when walking they don't, and thus walking would offer a pretty nice reduction of risk.

Discuss

### Announcing the Forecasting Innovation Prize

November 16, 2020 - 00:12
Published on November 15, 2020 9:12 PM GMT

Motivation

There is already a fair amount of interest around Effective Altruism in judgemental forecasting. We think there’s a whole lot of good research left to be done.

The valuable research seems to be all over the place. We could use people to speculate on research directions, outline incentive mechanisms, try novel forecasting questions with friends, and outline new questions that deserve forecasts.
Some of this requires a fair amount of background knowledge, but a lot doesn’t.

The EA and LW communities have a history of using prizes to encourage work in exciting areas. We’re going to try one in forecasting research. If this goes well, we’d like to continue and expand this going forward.

Prize

This prize will total $1000 between multiple recipients, with a minimum first place prize of $500. We will aim for 2-5 recipients in total. The prize will be paid for by the Quantified Uncertainty Research Institute (QURI).

Rules

To enter, first make a public post online between now and Jan 1, 2021. We encourage you to either post directly or make a link post to either LessWrong or the EA Forum. Second, complete this form, also before Jan 1, 2021.

Research Feedback

If you’d like feedback or would care to discuss possible research projects, please do reach out! To do so, fill out this form. We’re happy to advise at any stages of the process.

Judges

The judges will be AlexRJL, Nuño Sempere, Eric Neyman, Tamay Besiroglu, Linch Zhang and Ozzie Gooen. The details of the judging process will vary depending on how many submissions we get. We’ll try to select winners for their importance, novelty, and presentation.

Some Possible Research Areas

Areas of work we would be excited to see explored:

• Operationalizing questions in important domains so that they can be predicted in e.g., Metaculus. This is currently a significant bottleneck; it’s surprisingly difficult to write good questions. Examples in the past have been the Ragnarök or the Animal Welfare series. A possible suggestion might be to try to come up with forecastable fire alarms for AGI. Tamay Besiroglu has suggested a “S&P 500 but for AI forecasts,” i.e., a group of forecasting questions which track something useful for AI (or for other domains.)
• Small experiments where you and/or a group of people use forecasting for your own decision making, and write up what you’ve learned.
For example, set up a Foretold community to decide on which research document you want to write up next. Predictions as a Substitute for Reviews is an example here.
• New forecasting approaches, or forecasting tools being used in new and interesting ways, or applied to new domains. For example, Amplifying generalist research via forecasting, or Ought’s AI timelines forecasting thread.
• Estimable or gears-level models of the world that are well positioned to be used in forecasting. For example, a decomposition informed by one’s own expertise of a difficult question into smaller questions, each of which can be then forecasted. Recent work by CSET-foretell would be an example of this.
• Suggestions for or basic implementation of better tooling for forecasters, like a Bayes rule calculator for considering many pieces of evidence, a Laplace law calculator, etc.
• New theoretical schemes which propose solutions to current problems around forecasting. For a recent example, see Time Travel Markets for Intellectual Accounting.
• Elicitation of expert forecasters of useful questions. For example, the probabilities of the x-risks outlined in The Precipice.
• Overviews of existing research, or thoughts or reflections on existing prediction tournaments and similar. For example, Zvi’s posts on prediction markets, here and here.
• Figuring out why some puzzling behavior happens in current prediction markets or forecasting tournaments, like in Limits of Current US Prediction Markets (PredictIt Case Study). For a new puzzle suggested by Eric Neyman, consider that PredictIt is thought to be limited because it caps trades at $850, has various fees, etc, which makes it not the sort of market that big, informed players can enter and make efficient. But that fails to explain why markets without such caps, such as FTX, have prices similar to PredictIt. So, is PredictIt reasonable or is FTX unreasonable?
If the former, why is there such a strong expert consensus against what PredictIt says so often? If the latter, why is FTX unreasonable?
• Comments on existing posts can themselves be very valuable. Feel free to submit a list of good comments instead of one single post.

Discuss

### Notes on Loyalty

November 15, 2020 - 22:30
Published on November 15, 2020 7:30 PM GMT

What is loyalty?

These are two senses of loyalty:

• loyalty as reciprocity — Α is loyal to Β in this way because either Α has reason to feel gratitude to Β for some previously-granted favor, or expects to be able to get such a favor from Β at some time in the future. “You pulled the thorn from my paw, and now I shall certainly come to your aid.”
• loyalty as partiality — Α is loyal to Β in this way by reliably making the promotion and defense of Β’s interests a priority to Α when that would not be the case absent the loyalty. “I would never rat out a fellow member of the Water Buffalo lodge.”

Loyalty may suggest both “*I’ll* be in your corner” (unlike other non-loyal people), and “I’ll be in *your* corner” (not in your adversary’s), but sometimes one more than the other.

Phrases like that one that describe loyalty commonly include descriptions of body language and relative positions of bodies: “I won’t turn my back on you,” “Now I know where you stand,” “I’ve got your back,” “He stayed by her side through it all.”

Issues of loyalty provide the story arc of many a popular classic — for example movies like Casablanca, Yojimbo / A Fistful of Dollars, or the Star Wars films (will Anakin go over to the dark side? will Han abandon his comrades in their hour of need?). Betrayal and false loyalty define many a classic villain (e.g. Macbeth). Loyalties that make incompatible demands have been a staple of tragedy at least since Sophocles and the Mahabharata. We seem to take particular interest in stories that involve shifts in loyalty, hidden loyalties being uncovered, loyalties being put to the test, the disloyal getting their comeuppance, and that sort of thing. This suggests that careful attention to the loyalties of those around us may have been an important skill to have in the history of our species.

Synonyms and related virtues

The words “faithfulness” and “fidelity” are sometimes used more-or-less synonymously with loyalty (especially in the context of marriage vows). “Fealty” and “allegiance” cover something similar in the context of loyalty expressed upwards in hierarchies.

“Patriotism” sometimes gets used as a synonym for the loyalty a person feels toward their nation. “Filial piety” includes a specific variety of loyalty practiced towards one’s parents. “Solidarity” is a sort of implied loyalty that similarly-situated people are supposed to feel toward one another. “Teamwork” includes a sense of loyalty to the team itself and its goals.

Loyalty often gets discussed in combination with nearby-virtues like commitment, dependability/reliability, and duty. When people are unswervingly loyal to principles, ideas, and ideals, this can be a reasoning failure of unwise intellectual rigidity; but sometimes “loyalty” is used metaphorically in this context to describe devotion, consistency, integrity, and other such virtues.

Loyalty is important to the virtue of friendship (“The ground for the steadfastness and constancy for which we are searching in friendship is faithfulness.” ―Cicero). A loyal friend is sometimes described as a “true” friend: one who has been tried and has passed the test.

In a professional context, when you agree to promote your client’s or customer’s interests as part of the contract or as part of the ethical obligation of the job, this sort of loyalty is sometimes called “fiduciary responsibility.”

Sometimes loyalty is used informally to describe a mere preference or habit (“a loyal Starbucks customer”). Other times we express a sort of loyalty to tradition or to our ancestors (“just like my grandparents did, and their grandparents before them” or “as the founding fathers would have wanted”).

Loyalty can conflict with other virtues — most obviously virtues like impartiality, objectivity, and justice, but really any virtue against which loyalty might plead the cause of vice. For this reason, some philosophers have given loyalty the stink-eye, seeing it as more of a temptation than a virtue.

What does loyalty commit you to?

What exactly loyalty demands of us is usually pretty vaguely defined. What loyalty consists of is often conveyed through anecdotes and exemplars (sometimes of the disloyal) rather than through rules.

This can make it seem like loyalty is less a compelling commitment and more of a post-hoc excuse for what a person wanted to do for other reasons. How do you choose between “I am loyal to you, so I must” and “I am loyal to you, but I won’t”? Loyalty may induce you to incur opportunity costs: you come to the assistance of whatever you are loyal to at the cost of working on your own pursuits. This gets trickier when loyalty encourages you to do things you would otherwise find actually un-virtuous or fully vicious.

People and institutions that rely for their strength on the loyalty that people have toward them do what they can to strengthen that loyalty. They may try to eliminate rival loyalties by demanding that you make loyalty to them paramount: “you cannot serve both X and Y,” “you’re either with us or against us.” They may, as the United States does with its schoolchildren, ask you to pledge your allegiance over and over again.

A loyalty-dependent institution like this can broadcast its strength by demonstrating the extremes its fanatics are willing to go to to show their loyalty. For this reason, they may ask people to signal their loyalty in various ways. Oaths, pledges, vows, insignia, binding rituals, and things of that nature are legible ways to signal loyalty. But because they are easily-accomplished they may not be very effective gauges. More expensive signals are more reliable for this purpose, and so sometimes people are called upon to prove their loyalty by things that may seem absurd to outside observers: believe the unbelievable, defend the indefensible, assert the incredible, humiliate yourself, take the blame for something you didn’t do. You can best prove your loyalty by doing something that is costly, that goes against your own interests, and that otherwise violates your moral code: something that you would obviously never do except for your loyalty. (And once you have done so, even though such abusiveness suggests that maybe your loyalty is misplaced, the sunk-cost fallacy may help cement your loyalty further.)

People may exploit the vagueness of what loyalty commits you to, by asking you to be loyal in a way that explicitly commits you to X, but then asserting at some later time that you implicitly committed yourself to Y & Z as well. Open-ended or unspecified commitments are especially tricky. “You said you’d be there if I needed you, and now I need you to help me hide this body.”

Because of this potential for abuse and for conflicts with other virtues, loyalty is a virtue that requires strong bodyguards in the form of wisdom, discernment, foresight, vigilance, and caution. If you are going to be fiercely loyal, you should take special care in deciding what to be loyal to. If you ask favors of the Godfather, expect to hear “Someday I will call upon you to do a service for me” in return.

Loyalty in tension with justice

Loyalty usually implies partiality, which is a problem if you value loyalty but also value impartiality and objectivity. For this reason, we suspect the judgement of people who have expressed (or suspected) loyalties that might induce them to put their thumbs on the scale.

People and institutions with more power, authority, and resources can use those things to extort, command, or purchase more loyalty, which they can then trade in for more power, authority, and resources. This can create a dynamic in which these things flow to where they are already concentrated, in a way that can seem unjust. However, places where lots of power, authority, and resources come together are notoriously dens of intrigue and back-stabbing, so maybe this dynamic is ultimately unstable.

Part of what is exceptional about Christianity is its emphasis on solidarity with the downtrodden as a way of demonstrating loyalty to Jesus — flipping that dynamic of demonstrating your loyalty through acts that benefit those who already have more than their share: “For I was hungry and you gave me food, I was thirsty and you gave me drink, I was a stranger and you welcomed me, I was naked and you clothed me, I was sick and you visited me, I was in prison and you came to me.”

When you put energy and resources into displaying loyalty to (for example) a football team, those are resources that could be spent instead on those who have more genuine need. So maybe there’s an effective-altruism argument for reducing the influence of such loyalty as well.

Although loyalty can be in tension with justice, it can also be a way of honoring justice. If loyalty is owed, then disloyalty is the unjust failure to honor a debt. The disloyal are sometimes described as being unjust in their betrayals. This is especially true when loyalty is a sort of reciprocity (you came through for me in a pinch, so now you can count on me).

Loyalty as a coordination mechanism

If coordinated group effort is important to the success of some endeavor, loyalty (to the cause or to the institution or to the leader) is one mechanism for helping to ensure that individual efforts are appropriately focused on the common task. “Teamwork” is a variety of loyalty in which the members of the team value the goals of the team as a whole over their own personal goals, and behave accordingly.

“We must all hang together, or, most assuredly, we shall all hang separately.” ―Benjamin Franklin, to his fellow-revolutionaries, on the signing of the Declaration of Independence (possibly apocryphal)

The loyalty of patriotism is sometimes defended from charges that it is irrationally partial by people who say that strong nations are good, and loyalty is necessary to making a nation strong, so you should not try to judge whether your nation is worthy of your loyalty but you should be loyal to it in order to benefit your nation and increase its worth.

Loyalty is a “force multiplier”. One legend holds that during negotiations with the enemy on the brink of battle, the leader of the Hashashin abruptly ordered one of his soldiers to leap from the window of the room to his death. The soldier complied without hesitation and without a word. The representative of the enemy of the Hashashin realized from this how devoted and formidable an enemy he was facing, and so war was averted.

To share in the benefits that come with coordinated group action, it can be useful for individuals to signal that they are “team players” with robust senses of loyalty. I wonder if the subconscious reason people often ostentatiously display loyalties to things like sports teams, brands, and so forth, is that these things signal that they are capable of forming strong loyalties, and in this way they encourage other people to join with them in mutually-beneficial alliances.

In prisoners-dilemma-type games, a reputation for loyalty can help game players optimize their play.

Loyalty as a way of forming identity

Loyalty is a component of belonging, which people tend to value. “I am an American” may describe a mere accident of birth; “I am a loyal American” seems to bind me together with other Americans in a joint project. People often define themselves in part by the loyalties they have adopted. If you think of yourself as a Freemason, for example, or a Marine (semper fi!), or a husband or wife, you have an identity that comes necessarily packaged with certain expectations of loyalty.

Demonstrations of loyalty, declarations of loyalty, symbols and tests of loyalty, and the like, are ways of policing in-group/out-group boundaries.

In our eagerness to belong, people sometimes adopt loyalties (or pantomime as though they have) to ephemeral and arbitrary things, and for the most tissue-thin reasons. Once you start, for example, harmlessly rooting for the home team or being true to your school, it can be hard to remember that your home team or school isn’t really objectively better or more noble or more worthy. The teacher who as an experiment divided her class up by eye color and encouraged eye-color-solidarity among them was astonished to see “what had been marvelous, cooperative, wonderful, thoughtful children turn into nasty, vicious, discriminating little third-graders in a space of fifteen minutes.”

Whether or not you can keep a critical, objective head about you while remaining loyal to a person, team, or cause, is a tough nut to crack. “Blind” or “unthinking” loyalty is usually looked down on; there is an honorable place for the “loyal opposition,” and sometimes your most loyal friends are the ones who aren’t afraid to tell you what you didn’t want to hear.

Loyalty as extended reciprocity

When loyalty has been earned (e.g. through services rendered), it is sometimes seen as a form of gratitude. Expressions of loyalty can be acknowledgments of indebtedness, or that the original favor has not been forgotten.

Loyalty is sometimes seen as a potential resource that can be “cashed in” in a more concrete way at some future time. You might offer such loyalty in exchange for someone’s help if you don’t have a better way to incentivize them. If you cultivate a reputation for steadfast loyalty, the loyalty you offer at such times will have a higher value and you presumably can obtain more for it.

Sometimes, institutions will use this sort of mechanism as a mutual-insurance policy. For example, I understand that Masons typically take an oath to come to the aid of any other Mason in distress.

Conclusion

In what has become an alarming pattern with these virtue explorations, I picked up “loyalty” thinking that it seemed simple enough and that I had a pretty good handle on what it meant, but then the more I investigated the more complex it revealed itself to be.

Discuss

### Spend twice as much effort every time you attempt to solve a problem

November 15, 2020 - 21:37
Published on November 15, 2020 6:37 PM GMT

In brief: in order to iteratively solve problems of unknown difficulty, a good heuristic is to double your effort every time you attempt them.

You can try to solve the problem as many times as you want, but you need to precommit in advance how much effort e you want to put into each attempt - this might be for example because you need to plan in advance how you are going to spend your time.

We are going to assume that there is little transfer of knowledge between attempts, so that each attempt succeeds iff the amount of effort you spend on the problem is greater than its difficulty rating, $e > d$.

The question is - how much effort should you precommit to spend on each attempt so that you solve the problem spending as little effort as possible?

Let's consider one straightforward strategy. We first spend 1 unit of effort, then 2, then 3, and so on.

We will have to spend in total $1 + 2 + \dots + d = \frac{1}{2}(d+1)d \in O(d^2)$. The amount of effort spent scales quadratically with the difficulty of the problem.

Can we do better?

What if we double the effort spent each time?

We will have to spend $1 + 2 + 4 + \dots + 2^{\lceil \log_2 d \rceil} = 2 \cdot 2^{\lceil \log_2 d \rceil} - 1 \le 4d - 1 \in O(d)$. Much better than before! The amount of effort spent now scales linearly with the difficulty of the problem.
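To make the comparison concrete, here is a minimal Python sketch of the two strategies. The names `total_effort`, `linear_schedule` and `doubling_schedule` are illustrative, and the code assumes an attempt succeeds as soon as the effort reaches the difficulty, which is what the sums above imply.

```python
from itertools import count


def total_effort(difficulty, schedule):
    """Sum the efforts of successive attempts until one of them succeeds."""
    total = 0
    for effort in schedule:
        total += effort
        # Assumption: an attempt succeeds once its effort reaches the difficulty.
        if effort >= difficulty:
            return total


def linear_schedule():
    """Efforts 1, 2, 3, ..."""
    return count(1)


def doubling_schedule():
    """Efforts 1, 2, 4, 8, ..."""
    return (2 ** k for k in count(0))


for d in (10, 100, 1000):
    print(d, total_effort(d, linear_schedule()), total_effort(d, doubling_schedule()))
```

For every difficulty printed, the doubling total stays below $4d - 1$, while the linear total grows like $d^2 / 2$.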

In complexity terms, you cannot possibly do better - even if we knew in advance the difficulty rating we would need to spend $d \in O(d)$ to solve it.

Is doubling the best scaling factor? Maybe if we multiply the effort by 1.5 each time we will spend less total effort.

In general, if we scale our efforts by a factor $b$, then we will spend in total $1 + b + b^2 + \dots + b^{\lceil \log_b d \rceil} = \frac{b^{\lceil \log_b d \rceil + 1} - 1}{b - 1} \le \frac{b^2 d - 1}{b - 1}$. The minimum of this upper bound is found for $b = 1 + \sqrt{\frac{d-1}{d}}$, which asymptotically approaches 2 as $d$ grows. Hence 2 seems like a good heuristic scaling factor without additional information on the expected difficulty. But this matters less than the core principle of increasing your effort exponentially.
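The bound and its minimiser can be checked numerically. The following Python sketch is only an illustration: the function names are made up here, the scaling factor is assumed to be greater than 1, and an attempt is again assumed to succeed once its effort reaches the difficulty.

```python
import math


def geometric_total(difficulty, factor):
    """Total effort when each attempt uses `factor` times the previous effort."""
    effort, total = 1.0, 0.0
    while True:
        total += effort
        if effort >= difficulty:  # assumed success condition
            return total
        effort *= factor


def upper_bound(difficulty, factor):
    """The bound (b^2 d - 1) / (b - 1) from the text; requires factor > 1."""
    return (factor ** 2 * difficulty - 1) / (factor - 1)


def best_factor_for_bound(difficulty):
    """Factor minimising the upper bound: 1 + sqrt((d - 1) / d)."""
    return 1 + math.sqrt((difficulty - 1) / difficulty)


for d in (10, 10 ** 4, 10 ** 9):
    b = best_factor_for_bound(d)
    print(d, b, geometric_total(d, b), upper_bound(d, b))
```

Setting the derivative of $\frac{b^2 d - 1}{b - 1}$ with respect to $b$ to zero gives $b^2 d - 2bd + 1 = 0$, whose root above 1 is the expression used in `best_factor_for_bound`.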

Total effort spent vs scaling factor, for difficulties 10, 1e4, 1e9. The effort zigzags up and down, hitting local minima for scaling factors that share factors with the target difficulty. But if we look at the upper bounds of the function we can appreciate a robust minimum slightly below 2.

Real life is not as harsh in our assumptions - usually part of the effort spent carries over between attempts and we have information about the expected difficulty of a problem. But in general I think this is a good heuristic to live by.

Let us suppose you are unsure about what to do with your career. You are considering research, but aren't sure yet. If you try out research, you will learn more about that. But you are unsure of how much time you should spend trying out research to gather enough information on whether this is for you.

In this situation, before committing to a three year PhD, you had better make sure you spend three months trying out research in an internship. And before that, it seems a wise use of your time to allocate three days to try out research on your own. And you had better spend three minutes beforehand thinking about whether you like research.

Thank you to Pablo Villalobos for double-checking the math, creating the graphs and discussing a draft of the post with me.

Discuss