LessWrong.com News

A community blog devoted to refining the art of rationality
Published on September 6, 2020 6:30 PM GMT

What you’re looking at is a geological formation called Dry Falls, in the Sun Lakes-Dry Falls state park in my home state of Washington. The Dry Falls are a series of escarpments and cliffs near Grand Coulee, deep in Eastern Washington’s Channeled Scablands region. These are four-hundred-foot cliffs in the middle of the desert. How did they get here? What secrets does this terrain hold? What can the strange rock formations and alien landscapes of eastern Washington state tell us about the future of our planet?

At the end of the last ice age, a massive amount of glacial ice in continental Europe and North America melted away. During the period from 25,000 to 10,000 years ago, the Laurentide, Cordilleran, and Fennoscandian ice sheets completely melted, leading to a 120 meter rise in global sea level. The rise from this melting is estimated to have averaged roughly one meter per century, punctuated by two intense periods of melting between 15,000 and 13,000 years ago and between 11,000 and 9,000 years ago.

While the current consensus among paleoclimatologists is that this melting was relatively gradual and steady, occurring at a linear rate over the course of 15,000 years, evidence is beginning to surface, both in our current ice sheets and in the geologic record of the last ones, that a gradual and linear melting rate is not what we should expect to see going forward.

In this post, I’ll look at recent melting trends in Antarctica and Greenland, as well as at paleoclimate data from ice and seabed cores, to propose a model of continental ice sheet collapses as rapid and potentially cataclysmic historical events which we should be aware of as potentially civilization-destabilizing. Most of our current population, our largest cities, and most of our power and industrial facilities are located in low-lying areas susceptible to coastal flooding. If water levels rise at a rate faster than can be mitigated by a slow withdrawal from the coastline over the course of decades and centuries, it could cripple human civilization and bring an end to our current way of life.

The first piece of evidence to note here is that the geologic record of the last ice age is littered with superfloods and seemingly cataclysmic sea level rise events. Water topped earthen berms and flooded into low-lying areas; Doggerland and Sundaland vanished beneath the waves, and the Bering Strait cut Asia and North America apart. These events have left scars on the surface of the Earth which you can see from space; you just need to know what to look for.

This is the North Fork of the Toutle River as it flows across the soft dried mud and ash of the Mount Saint Helens lahar zone. I provide this image just to give an example of stream braiding; the lahar zone is a blank canvas on which you can really see how the water carves all these winding channels through the surface material. This happens in rivers around the world, though; there are dozens of examples of this sort of river braiding I could show you. The important thing to note here is the scale of this landform. The lahar zone is less than a kilometer across, and we can see roads and trees and houses at this level of zoom.

So now let’s zoom out and look east across the Cascade range.

This is the Channeled Scablands from far above. At the height at which satellites orbit, the mass scouring of hundreds of square kilometers can clearly be seen. Braids tens of kilometers across and hundreds of kilometers long draw tracks across all of eastern Washington before spilling into the Columbia River Valley to flow onward toward the Pacific. This event (or events; geologists aren’t sure) is referred to as the Missoula Megafloods, and was the source of the Dry Falls pictured at the beginning of this post. At their peak flow, the Dry Falls were twice the height of Niagara Falls and five times its width. So much water poured into the Columbia River that it backfilled and flooded most of the Willamette Valley.

According to current consensus, these massive floods were caused when a proglacial lake formed in what is now Missoula, Montana. The leading theory is that a fifty-mile-long ice dam formed across the Clark Fork River, which caused the meltwaters of the receding Cordilleran Ice Sheet to back up and pool around Missoula. This presents the first problem with the current consensus, and is where a rather peculiar group of individuals becomes involved.

There is a group of slightly kooky geologists and historians who call themselves the Catastrophists. They hold that a moderately advanced civilization in North America was destroyed during the Younger Dryas period around 12,000 years ago, and they have found all sorts of interesting things to lend credence to their theory.

The Catastrophists looked at the story of the Missoula Megafloods and said, “That doesn’t work.” They pointed out that an ice dam the size of the one proposed could not possibly have held back the amount of water, under the head pressure that Glacial Lake Missoula was under, long enough for the lake to reach its maximum historical depth of over 600 meters. Glacial Lake Missoula is estimated to have held 2,500 cubic kilometers of water, and the Catastrophists say there’s no way that could have happened with an ice-dam-triggered outburst flood; the ice would give before that much water could build up.

Instead, the Catastrophists propose that Glacial Lake Missoula wasn’t a long-term lake, but formed temporarily as water flowing in from further north pooled and backfilled around Missoula as it interacted with the chokepoint in its flow along the Clark Fork River Valley.

The Catastrophists also have other evidence of rapid melting, which they have found in seabed cores. The seabed is composed mostly of decaying organic material: crushed-up tiny organisms that rain down to the bottom in an ever-present snow. However, there are notable strata lines within seabed cores which contain mostly rocks, pebbles, sand, and other inorganic debris. These layers record Heinrich Events, and it is believed that they are caused by large masses of icebergs breaking off, carrying rocks and sediment with them, and then dropping these bits of rock and sediment as they melt away. All of these things come together, according to the Catastrophists, to seemingly support their theory of a cataclysmic event during the Younger Dryas period, 12,900 years ago.

So the Catastrophists look at all the data for speed of melting, heating from sunlight, and atmospheric CO2 levels, and conclude that the melting simply happens too fast to be explained without an outside source. They claim there wasn’t enough energy available for the math to work out unless you add a bunch of extra energy from somewhere outside the climatic system.

The solution to this problem, they say, is that around 12,900 years ago a comet or asteroid struck the top of the Laurentide Ice Sheet, triggering a massive pulse of melting, which we observe in the form of megafloods, Meltwater Pulses, and Heinrich Events. The evidence for this is shaky, but I sincerely hope they end up being correct. And they might actually be: late last year, a 31-kilometer-wide impact crater was discovered under the Hiawatha Glacier in Greenland. This impactor, if it struck in the right time period, might actually be the Catastrophists’ smoking gun.

However, I am not particularly confident that they are correct. Because it’s under a glacier, we don’t yet know how old the Hiawatha crater actually is. It could be significantly older than 12,000 years, and if it is, then we’re once again left with too much melting to fit our model and no discernible cause. The currently dominant theory is that a combination of increased insolation on the glaciers and high CO2 levels at the time caused their final retreat and collapse. However, the effect seems to have exceeded the cause, and the extremity of the events, especially the large pulses of meltwater, seems to imply some other mechanism was present.

Without invoking some outside event like a volcanic eruption or an extraterrestrial impact, the only explanation we’re left with is the ill-understood climate feedback mechanisms which we are currently engaged in setting off en masse.

The impact theory is in some senses comforting. We have big telescopes; we can see into space now. In theory, if we knew an impact event was coming, we could prevent it. If it takes an impact to cause a catastrophic melting and sea level rise event, then we’re mostly safe from it happening. And if the melting was caused by an impact, then our current climate models, which estimate around a meter of sea level rise by the year 2100, are largely accurate.

But if these melting spikes were not caused by an impact, then something on Earth which we currently do not understand triggered them. Something caused the ice sheets to suddenly and rapidly destabilize and release a large quantity of meltwater over a relatively brief period. If such an event were to occur today, the effects would be globally catastrophic. An event that caused a one-meter sea level rise over the course of a few years would render many of the world’s coastal cities uninhabitable.

Scientists have posited that the West Antarctic Ice Sheet, which sits on bedrock below sea level, could experience a catastrophic collapse if seawater were able to access the roots of the glacier. Although computer models have been unable to construct the timeline of events in detail, the possibility remains that the entire ice sheet could collapse over a period as short as a few years. If the entire thing went, it would lead to 6 to 9 meters of sea level rise, enough to submerge a large number of urban cores around the world and utterly remake coastlines.

The possibility of this catastrophic melting event is often left out of the climate change conversation. The assumption is that melting will be a nuisance, forcing the eventual abandonment of low-lying areas or the construction of new seawalls, but is not by itself an existential threat to civilization.

If the entire West Antarctic Ice Sheet were to collapse over a five-year period, it would lead to a global crisis as populations were forced to relocate and cities were rendered unlivable. In many ways, the predictions that the ice sheets will last for centuries more and take a thousand years to melt away are overly optimistic, based on older and less accurate models of past climate events. A recent paper has provided evidence that melting may not be linear but exponential, and if recent trends in accelerating melting are extrapolated out, we could see multi-meter sea level rise within the next fifty years.

This would not by itself be an X-Risk, but it would crank up the pressure humanity is put under, and make other X-Risks, such as nuclear war and pandemics, more likely. It is my opinion that the possibility of catastrophic ice sheet collapse should be carefully considered and studied as a real possibility. It’s unlikely we could prevent such a collapse from occurring, but by anticipating such an event we may be able to save many lives and livelihoods.


Design thoughts for building a better kind of social space with many webs of trust

Published on September 6, 2020 2:08 AM GMT

Webs of trust, specific tastes, a more discerning, open, peaceful kind of social network system

Abstract

A set of guiding design principles, and a partial design, for a social tool that uses webs of trust to identify the curators of content tags in a way that is decentralized, scalable, manageable, and in many cases subjective. I believe this would make content tags vastly more useful than they have previously been, giving rise to a robust, human-centric way of discovering information and having productive conversations about it.

A concise, complete explanation of what webs of trust are and all of the reasons they work

A web of trust is a network of users endorsing other users.

If you tell a web of trust who you trust, you can then travel along and find out who they trust, and so on as far as you wish to go, and that will give you a large set of people who you can trust to some extent, by transitivity.
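As a minimal sketch, that transitive traversal might look like the following, assuming a simple hop-decay rule. The data, names, and decay constant here are all illustrative, not part of any real protocol:

```python
from collections import deque

def transitive_trust(endorsements, me, decay=0.5, max_hops=3):
    """Breadth-first walk of a web of trust from `me`.

    `endorsements` maps each user to the users they directly endorse.
    Trust decays by a constant factor per hop, so distant endorsements
    count for less. Returns a dict of user -> trust score."""
    scores = {}
    queue = deque([(me, 0)])
    seen = {me}
    while queue:
        user, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for other in endorsements.get(user, []):
            if other not in seen:
                seen.add(other)
                scores[other] = decay ** (hops + 1)
                queue.append((other, hops + 1))
    return scores

web = {
    "alice": ["bob", "carol"],
    "bob": ["dan"],
    "carol": ["dan", "eve"],
}
print(transitive_trust(web, "alice"))
# → {'bob': 0.5, 'carol': 0.5, 'dan': 0.25, 'eve': 0.25}
```

Direct endorsements score highest, and follows-of-follows fade out with distance, which is the "to some extent" in transitivity.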

Webs of trust can scale up at an exponential rate, as each new user can immediately add more users (better: they can start issuing endorsements in their own network segment even before they're added). This is pretty cool. Wonderfully, despite that, webs of trust can also be pruned and weeded fairly easily: if a few bad endorsements do get made, and the newly empowered bad users start adding more bad users, we will be able to trace the source of badness back through the endorsement relations to the root causes, and pruning those away will also prune away everyone who came in through them, directly or indirectly (unless those people have received endorsements from other, non-bad people since being added, in which case they're probably fine, and they will remain in). Crucially, the pruning and weeding does not need to be done by any central authority. Every user is the center of their own web; they can bring in anyone they like.
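A sketch of how that pruning could work on the client side: rather than deleting bad users one by one, recompute who remains reachable from your own trusted roots with the bad users cut out. Anyone whose only endorsement paths ran through a bad user falls out automatically; anyone also endorsed from elsewhere stays in. The names here are made up for illustration:

```python
def prune_web(endorsements, roots, bad):
    """Return the set of users still reachable from `roots` once the
    users in `bad` (and every path through them) are removed."""
    keep = set()
    stack = [r for r in roots if r not in bad]
    while stack:
        user = stack.pop()
        if user in keep or user in bad:
            continue
        keep.add(user)
        stack.extend(endorsements.get(user, []))
    return keep

web = {
    "alice": ["bob", "mallory"],
    "bob": ["spam2"],               # spam2 also has a non-bad endorser
    "mallory": ["spam1", "spam2"],
}
print(sorted(prune_web(web, ["alice"], {"mallory"})))
# → ['alice', 'bob', 'spam2']  (spam1 is pruned along with mallory)
```

Because each user reruns this from their own roots, no central authority is involved in the weeding.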

Webs of trust are useful for tracking qualities of users which come with the ability to recognize the presence of that quality in others. Most personal qualities are self-recognizing, in this way, to some extent. A person who has it, faced with another person, can usually figure out whether they have it too.

Examples of such qualities include good taste, responsibility, or not a spambot.

Some non-examples would be is a spambot (spambots are mainly about spamming and are not very interested in identifying each other), or is a fool. A web of trust would not help with keeping track of these qualities, but, again, most qualities people talk about aren't like these. If you do find yourself in need of a web that tracks a non-self-recognizing quality, consider just making a web that tracks its negation. not a spambot or no fool would work pretty well.

You might notice that some of the given examples of self-recognizing qualities have rather subjective meanings. Not everyone will agree about what good taste or responsibility means. Though using fully explicit definitions of things is preferable where possible, sometimes it isn't possible (who would ever try to formally define taste?). Another neat thing about webs of trust is that they will still often work pretty well in those cases! If people disagree about the nuances of a quality, they will often end up organizing into separate webs of trust that agree within themselves. Webs of trust are compatible with subjectivity.

That makes webs of trust suitable for moderating a truly global platform. At no point does a central authority have to decide for everyone else what any of the webs are about. If two groups disagree about what sorts of things should be posted in a fundamental tag like respectful discourse or safe content, they don't have to interact! The web of trust is so powerful as a moderation technology that they can wholly split their webs and keep using the same tags in completely different ways without stepping on each other.

Some noteworthy systems that use webs of trust

The prototypical example of webs of trust seems to have been the process of establishing real identity in PGP signature networks.

A friend, Alexander Cobleigh, is implementing a subjective moderation system for the P2P chat protocol Cabal, which you can read some things about here

Webs of trust are being used to measure social adjacency in various distributed systems, for instance:

  • Intersectional Social Data, a system for sharing personal information with only the right groups of people, uses Staked Collateral, where trust links are represented with commitments to lend money.

    • Additionally, I believe I've heard other RadicalXChange rumblings about the absence of trust being potentially useful as evidence that a set of parties aren't very likely to be able to collude, which may be an important component in mechanisms for preventing vote trading in quadratic voting systems or for guaranteeing competition in global auditing processes.
  • Duniter requires that "Every member is strongly identified through a Web of Trust mechanism to prevent any one individual from receiving multiple Universal Dividends by using more than one identity"

Core principle: Users should not be asked to reduce themselves to a single brand

A web of trust can be used to exclude spammers, sybils, annoying people, rude people, bad people, or people with bad taste. However, if one web of trust were used to cover all of those meanings and purposes at once, I imagine the results would be pretty inhumane; people would commit chilling, cowardly omissions of self to avoid any risk of being perceived as rude, lest the web put them in the same icy hell as spammers.

Twitter is kind of like that, and I think it exhibits a lot of the problems we should expect that to have. On Twitter, you have one face, you get one tube, and the people who follow you have to be alright with everything you put in that tube. If you ever want to post a type of content that some of your followers explicitly want to never see, you have nowhere to put it. Brand is totalizing. Everyone has to compress themselves down to one legible brand before the network can thrive.

Users should be encouraged to have more than one side to them. The situation could be helped if Twitter were more encouraging of the use of alt accounts. In a way, the system I'm about to propose is a way of streamlining that mode of use.

As it currently stands, we can conceptualize Twitter as a kind of thick, slow web of trust for the overly broad content category of good tweet. This web's quality is not truly self-recognizing; the endorsements do not represent a transitive relation, and they do not conduct very far. If you travel just a few steps through your follows-of-follows, you will find mostly people you wouldn't want to follow. Only shitposts and the most general-interest news propagate well; everything else propagates depressingly incompletely. There is no strong agreement in most networks about what is good to post, and where there is no strong agreement, there is no truth about what is good to post. Nothing is good to post. We must simply log off.

What if, instead, we had many webs of trust that discuss and define the many different dimensions of interest that people can have, which users could choose to participate in or not? Most of these webs may have specific enough meanings that content could be automatically propagated fairly far through them, with confidence that everyone in them would be interested in most of it. Some of these webs might be nebulous or subjective in meaning; those would have lower recommended automatic propagation constants, and they would work too.
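One way to picture per-web propagation constants, as a hedged sketch: each web carries a recommended number of endorsement hops a tagging travels before clients stop auto-showing it. The web names and numbers here are invented for illustration:

```python
# Hypothetical per-web propagation constants: how many endorsement
# hops a tagging travels before the client stops auto-showing it.
PROPAGATION_HOPS = {
    "uses basic tags correctly": 10,  # objective meaning: travels far
    "good music taste": 3,            # subjective: stays closer to home
    "good webs": 1,                   # nebulous: barely propagates
}

def auto_show(web_name, hops_from_tagger):
    """Should the client surface a tagging made `hops_from_tagger`
    endorsement steps away in the named web? Unknown webs don't
    propagate at all by default."""
    return hops_from_tagger <= PROPAGATION_HOPS.get(web_name, 0)

print(auto_show("good music taste", 2))  # → True
print(auto_show("good webs", 2))         # → False
```

A specific, objective web can safely push content many hops out; a nebulous one only shows you what your near neighbors tagged.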

It is important that users are not required to present as a category of information, and it is important that categories of information can grow larger than any one curator. A person should not be a brand, and a brand should not be a person.


The system consists mostly of these four types of thing:

  • presence: A presence of a user in the web. A user can (and usually should) have many presences in different webs.

  • article: Content, posts, replies, records of actions and declarations.

  • tag: A property articles can be declared by presences to have (or, equivalently, can also be thought of as a set that articles can be put into.)

  • web: A network of endorsements over presences, generally about a type of quality that presences can have that tells us which sorts of tags they are assured to use agreeably.

To reiterate: presences apply tags to articles to organize them, filter them, and to alert interested parties of them. The webs of trust in which presences are organized speed and shape the propagation of updates about what articles have been tagged recently, and guide queries over the presences most worth visiting.
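As a sketch, the four primitives might look something like this as record types. The field names are my assumptions for illustration, not a finalized schema:

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    id: str
    body: str               # content, a post, a reply, a record of an action

@dataclass
class Presence:
    id: str
    user: str               # a user owns many presences in different webs
    web: str                # the web this presence lives in
    endorses: list = field(default_factory=list)   # ids of other presences

@dataclass
class Tag:
    name: str               # e.g. "good music"
    web: str                # whose presences are assured to use it agreeably
    taggings: list = field(default_factory=list)   # (presence id, article id)

@dataclass
class Web:
    name: str               # e.g. "good music taste"
    presences: dict = field(default_factory=dict)  # presence id -> Presence

# A presence applies a tag to an article:
tag = Tag(name="good music", web="good music taste")
tag.taggings.append(("presence-1", "article-9"))
```

The key relationships are that a user fans out into many presences, and a tag is anchored to the web whose presences curate it.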

That's pretty much it. The rest of the document will give you a clearer picture of how many things those primitives will enable.

Some tags will have simple, objective meanings. music, for instance. A tag like this would be affiliated with the uses basic tags correctly web (meaning that basically everyone would be able to use it). It would be useful for confirming that an article is music, but it might not be an especially useful tag to most people for finding or promoting attention-worthy examples of music. Here's where things get interesting:

Consider a tag called good music. Its meaning would, of course, need to be subjective, and webs of trust can support that! You would find a good music taste web, find someone you align with, and trust them along the good music taste dimension. You'll get their recommendations, and if they haven't recommended anything today, you'll see the recommendations of the people they trust, and so on, and it will immediately function as a music recommendation system that you and the musicfriends have complete control over. You would wake up every morning and have your client essentially run a query like "time:today tag:good music from:my(good music taste) min_similarity:0.04", and it would all be great stuff. Or if it's not, you can rearrange your endorsements and move towards a web where it is.
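That morning query could be sketched roughly like this, assuming the client already holds a map of transitive trust scores along the good music taste dimension. All names and scores here are illustrative:

```python
def recommend(taggings, trust, tag, min_score=0.04):
    """Articles tagged `tag` by presences we transitively trust,
    strongest endorsers first.

    `taggings` is a list of (presence, tag, article) records;
    `trust` maps presence -> trust/similarity score, as a web-of-trust
    traversal might produce."""
    best = {}
    for presence, tagged, article in taggings:
        if tagged == tag and trust.get(presence, 0) >= min_score:
            best[article] = max(best.get(article, 0), trust[presence])
    return sorted(best, key=best.get, reverse=True)

taggings = [
    ("bob", "good music", "song-a"),
    ("dan", "good music", "song-b"),
    ("zed", "good music", "song-c"),   # zed isn't in our taste web
]
trust = {"bob": 0.5, "dan": 0.25}
print(recommend(taggings, trust, "good music"))  # → ['song-a', 'song-b']
```

Rearranging your endorsements changes `trust`, which immediately changes what the query surfaces; that is the whole control surface.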

The crucial advantage this has over other recommender systems that use user similarity is that it is fully transparent, accountable, and controllable. You can see how it works, you know where the recommendations are coming from, and you can fix it yourself when there is too much bad or not enough good being recommended. It is not a black-box algorithm. You can trust it for a lot more, because it consists of people, whom you can see.

A taxonomy of very good webs of trust that should arise under a healthy culture of usage
  • There are a few basic webs that every human should be in. In most cases, they will be bundled together and reciprocated through a friend endorsement.

    Very silly abusive actions can sometimes result in people getting cut out of some of these. Since they're so important, we must try to make sure the punishment is suitable and proportionate to the crime, and not a complete disenfranchisement from the entire system; that's why we have to separate them into different specific categories when we can.

    • Isn't a spambot

    • Isn't a propagandist from a troll farm. Will, inevitably, be occasionally partially infiltrated by propagandists from troll farms, but it would still be a good thing to have for fighting them.

    • uses basic tags correctly. Lets people use any tag with truly simple, objective criteria, stuff like is a picture of a dog, is a book review.

    • suggests respectfully: allows its members to issue requests that an article be lifted up by widely endorsed tag curators who are open to suggestions.

  • Meta topics

    • curator curator curators: metametacurators

    • good webs: An intentionally vague and subjective tag for discovering a wide variety of places

    • Fancifully: it would be neat to have a web that maintains a light read that explains how people are supposed to use the system, along with proposals for alterations to that text as the culture of common usage evolves.

  • scholarly consensus: Wikipedian type people (or the type of people wikipedians ought to be) who are honest and informed about what reasonable people should be able to agree on. Trust only good scholars and you will be a good scholar too.

    Wikipedia generally works because no portion of the accountholding userbase brigades articles outside of their expertise. This will be a useful mechanism for maintaining that sort of condition.

    • Most users aren't going to want to post like this. The tags that scholarly consensus curates are of no use for speculation or for anything subjective. Upholding the norms of scholarly consensus requires you to post very conservatively. That is usually boring; a lot of the time it's not even intellectually productive.

    • Many users won't have the types of research skills or the network epistemic sense to post like this. Maybe they will aspire to learn.

    • Regardless, this web will perform many important functions that require soberly reporting widely accepted truths.

    • There would be controversies and web splits sometimes, of course, but unlike with most webs, these should always be considered potentially avoidable tragedies. Wherever scholarly consensus gets into commenting on things that might turn out to be deeply, importantly wrong, well, that never had any place in scholarly consensus in the first place.

  • good talk: Replies well, converses well. Use it to decide who you want to see first in replies and who you want to be able to receive reply notifications from.

    • The first thing is sort of a stricter requirement than the second thing. There should probably also be a replies okay web, meaning "I want to be told when these people are saying something to me, even if I don't necessarily want to see their replies first when they're replying to something else". Many people, though not all, have the level of responsibility to be allowed to begin a dialogue with a random stranger online. It would be a much broader web.
  • a general class: networks of -taste. When a post or comment is tagged with, for instance, good music, we will only see that post or comment in our queries if it was tagged by someone who is near to us in our good music taste network. For many types of tag, having curators turns out to be very important. Tagging/categorization systems currently only work reliably for very plain, objective qualities that everyone can pretty much agree on the meaning of and which few people have an incentive to lie about, and such categories tend to be boring. For almost any interesting category, things like important news, honest summary, interesting game, scientific breakthrough, there can be no widely agreeable shared objective understanding of which articles belong in it. In practice, such categories, when opened to the most general possible audience, will tend to become pretty mediocre. Many will leave, and the ones who remain will do so grudgingly. Given subjective inclusion criteria, though, networks of -taste could work wonderfully.

    • Real world examples of tag systems that suffer from being uncurated

      • reddit's subreddits, which provide no way of holding voters accountable for upvoting reposts, upvoting inane bullshit, or downvoting interesting or challenging things, so they tend to end up needing extensive moderation, to an extent that almost obviates the usefulness of reddit's vote sorting.

      • twitter's hashtags, which the people who get the most out of twitter never seem to use in earnest. I have seen more hashtags being brigaded than I have seen hashtags being used fruitfully. If I have seen any hashtags used fruitfully, I don't recall, they must not have been for me.

    • I can't wait to see if taste in humor would work. Shitposting twitter tends to be pretty powerful so it'd probably work really well.

  • I think it might turn out to be impossible to engage in nation-sized online political discourses without some kind of actually listens web.

    • Since the entire point of it would be to facilitate civil political dialogue between people who do not already agree about everything, I think this probably wouldn't result in totalizing echo chambers, especially if you factor in users changing their behaviour in response to knowing that this specific moderation pressure to be respectful and accessible is hanging over them.

I wish I could present a clear and complete design, but that will take some consideration. For now, I'll just throw some thoughts out.

  • A lot of it would just look like a feed

  • Some quick ways of responding to any articles

    • bookmark, for when something is important but large and should be considered again later. All platforms should have this!

    • relevant, a signal that means "I do agree that this belongs in these tags in this web". It should probably cause your presence to confirm the tag? But I dunno, I kind of want there to be a lower-stakes "I appreciate this" signal that just goes to the article submitter.

    • irrelevant, a signal that means "This should not be in these tags or this web". It may be used to prevent an article from propagating upstream to your endorsers, but it also makes a note locally for the client, to help you identify bad curators. If enough bad signals turn out to stem from a specific curator that you have trusted (with an inadequate proportion of relevant signals to balance them out), the client will ask if you'd like to unendorse that curator.

  • The suggestion queue.

    • If a user is in the basic suggests respectfully web, they can put things before their favored curators, who may then tag the article themselves. Curators may, if they choose, view those suggestions in their suggestion queue. And they may well choose to, because it helps them to find new things to post.

    • There should probably be some automated system that notices when an outside presence has submitted multiple successful articles, and recommends bringing them inside. This would have a lot of subtleties though. Hm. I guess that would be another governance process that curators would have to opt into, similar to the suggestion queue.

  • A way of viewing the feed of each presence the user owns (the articles tagged with the relevant tags by the presences that presence trusts).

  • Maybe allow seeing the view of presences they don't own, but I'm unsure. They could get an equivalent experience by creating a presence of their own that just trusts that other presence. This would have two advantages.

    • Encourage them to get engaged with building their own presence in the web, instead of relying on the submission process or not contributing articles at all

    • Lower the barrier to making a view that consists of multiple curators by trusting additional curators. If all a person knows how to do is select one curator, they won't get as much benefit from the software. We should encourage them to use it in a different way.

  • To consume information in a healthy way, the user needs to be able to control what they'll be seeing on any given session. Sometimes we don't want to face the notifications of one of our accounts. Sometimes we want to make sure we will be able to engage in a particular news-processing task or research project without being distracted. For such cases, users will be encouraged to define modes, or use profiles, which they can switch between.

    • Modes would mostly consist of a combination of the views of some of the user's presences, I think? Maybe it should support hand-written queries, in some cases.
  • Please can we not make it a web browser app. All web apps consume infinite memory and become slow, and the layout language of the web is actually pretty bad. The answer is no, though: there aren't any great alternatives to the web for multiplatform UI development, afaik, and the site will need to be able to render a lot of things for the web anyway, for the sake of making it widely accessible. I really wish we didn't have to. We'll just have to work really hard at doing things in the simplest, fastest, lightest way.
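The relevant/irrelevant bookkeeping described in the list above could be sketched like this, as a hypothetical client-side tally. The names and threshold are illustrative:

```python
from collections import defaultdict

def curators_to_unendorse(signals, threshold=0.5):
    """Tally the user's relevant/irrelevant signals against the trusted
    curator each article arrived through; return curators whose share of
    irrelevant signals exceeds `threshold`, so the client can ask whether
    to unendorse them."""
    tally = defaultdict(lambda: [0, 0])   # curator -> [relevant, irrelevant]
    for curator, signal in signals:
        tally[curator][signal == "irrelevant"] += 1
    return [c for c, (rel, irr) in tally.items()
            if irr / (rel + irr) > threshold]

signals = [
    ("carol", "irrelevant"),
    ("carol", "irrelevant"),
    ("carol", "relevant"),
    ("bob", "relevant"),
]
print(curators_to_unendorse(signals))  # → ['carol']
```

The tally stays local to the client; nothing about it needs to be published or centrally enforced, in keeping with the rest of the design.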

Getting this made

This is not going to happen at all if you leave it to me alone.

If you come to believe that a system like this would be good, reach out, and we'll get organized, and then maybe it will happen.

I should mention up-front that I'm not very interested in doing it in a for-profit way. Systems like this should be managed by non-profits that are constitutionally obligated to steward responsibly over any global political discourses they might come to host. They should not be designed around enriching their founders, nor around self-preservation.

(That said, of course, a good thing must fight to grow faster than the bad things that are growing now.)

For a bit of additional writing about webs of trust, and part of an idea for making them efficient to query, see Using neighborhood approximation to make trust queries more efficient.


Petrov Day celebration

September 6, 2020 - 04:09
Published on September 6, 2020 1:09 AM GMT

Come join us as we celebrate the world not having been destroyed, and raise a glass of Vodka to Stanislav Petrov.

We have chosen an outdoor venue for pandemic concerns, and will have a potluck for anyone willing to break bread during these troubled times. Masks are encouraged if you are not eating or drinking, and social distancing is recommended for anyone not already spending time together (some attendees have created quarantine circles, so don't be surprised if you see people ignoring typical precautions).

The format will mostly be casual, so feel free to drop in whenever.


Why would code/English or low-abstraction/high-abstraction simplicity or brevity correspond?

September 5, 2020 - 17:30
Published on September 4, 2020 7:46 PM GMT

Solomonoff Induction (SI) focuses on short code. What’s short in English is not necessarily short in code, and vice versa. Most of our intuition in favor of short, simple explanations is from our experience with English, not code. Is there literature arguing that code and English brevity usually or always correspond to each other? If not, then most of our reasons for accepting Occam’s Razor wouldn’t apply to SI.

Another way to think of the issue may be that coding is a low-level or reductionist way to deal with a problem, while English is a high-level approach that uses high-level tools like explanations. Ideas can be represented in many ways, including at different levels of abstraction. It’s unclear that length or simplicity is consistent for the same idea across different levels of abstraction. That is, if you have two ideas X and Y, and X is simpler and shorter when you compare both ideas at one level of abstraction, it may be unsafe to assume that X will also be simpler and shorter than Y when you compare them at a different level of abstraction. Is there any literature which addresses this?
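One way to make the worry concrete is a toy experiment. This is not Solomonoff Induction itself, just an illustrative proxy: the two “description languages” are (1) raw string literals and (2) short Python expressions that produce the same string. The ranking “X is shorter than Y” flips between the two, which is exactly the kind of non-correspondence the question is asking about:

```python
# Toy illustration: whether "X is shorter than Y" is stable across
# description languages. It is not, in this contrived example.
import random

random.seed(0)
x = "ab" * 50                                            # regular, 100 chars
y = "".join(random.choice("abcdef") for _ in range(60))  # irregular, 60 chars

# Language 1: the string itself is its own description.
literal_len_x, literal_len_y = len(x), len(y)

# Language 2: a Python expression that evaluates to the string.
prog_x = '"ab" * 50'   # 9 characters of code, exploits the pattern
prog_y = repr(y)       # no pattern to exploit: ~62 characters

print(literal_len_x > literal_len_y)  # True: x is LONGER as a literal
print(len(prog_x) < len(prog_y))      # True: x is SHORTER as a program
```

Of course, invariance theorems bound how much such rankings can differ between universal languages (up to an additive constant), but the constant can be large relative to the descriptions in question, which is why the question above still has bite.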


Li and Vitányi's bad scholarship

September 5, 2020 - 17:12
Published on September 5, 2020 8:07 AM GMT

In An Introduction to Kolmogorov Complexity and Its Applications (4th edition), Li and Vitányi claim that Solomonoff Induction solves the problem of induction. In Section 5.1.4 they write:

The philosopher D. Hume (1711–1776) argued that true induction is impossible because we can reach conclusions only by using known data and methods. Therefore, the conclusion is logically already contained in the start configuration. Consequently, the only form of induction possible is deduction. Philosophers have tried to find a way out of this deterministic conundrum by appealing to probabilistic reasoning such as using Bayes’s rule. One problem with this is where the prior probability one uses has to come from. Unsatisfactory solutions have been proposed by philosophers such as R. Carnap (1891–1970) and K.R. Popper.

What Hume actually wrote about induction was (Section IV, Part II, pp. 16-17):

Should it be said that, from a number of uniform experiments, we infer a connexion between the sensible qualities and the secret powers; this, I must confess, seems the same difficulty, couched in different terms. The question still recurs, on what process of argument this inference is founded? Where is the medium, the interposing ideas, which join propositions so very wide of each other? It is confessed that the colour, consistence, and other sensible qualities of bread appear not, of themselves, to have any connexion with the secret powers of nourishment and support. For otherwise we could infer these secret powers from the first appearance of these sensible qualities, without the aid of experience; contrary to the sentiment of all philosophers, and contrary to plain matter of fact. Here, then, is our natural state of ignorance with regard to the powers and influence of all objects. How is this remedied by experience? It only shows us a number of uniform effects, resulting from certain objects, and teaches us that those particular objects, at that particular time, were endowed with such powers and forces. When a new object, endowed with similar sensible qualities, is produced, we expect similar powers and forces, and look for a like effect. From a body of like colour and consistence with bread we expect like nourishment and support. But this surely is a step or progress of the mind, which wants to be explained. When a man says, I have found, in all past instances, such sensible qualities conjoined with such secret powers: And when he says, Similar sensible qualities will always be conjoined with similar secret powers, he is not guilty of a tautology, nor are these propositions in any respect the same. You say that the one proposition is an inference from the other. But you must confess that the inference is not intuitive; neither is it demonstrative: Of what nature is it, then? To say it is experimental, is begging the question. 
For all inferences from experience suppose, as their foundation, that the future will resemble the past, and that similar powers will be conjoined with similar sensible qualities. If there be any suspicion that the course of nature may change, and that the past may be no rule for the future, all experience becomes useless, and can give rise to no inference or conclusion. It is impossible, therefore, that any arguments from experience can prove this resemblance of the past to the future; since all these arguments are founded on the supposition of that resemblance. Let the course of things be allowed hitherto ever so regular; that alone, without some new argument or inference, proves not that, for the future, it will continue so. In vain do you pretend to have learned the nature of bodies from your past experience. Their secret nature, and consequently all their effects and influence, may change, without any change in their sensible qualities. This happens sometimes, and with regard to some objects: Why may it not happen always, and with regard to all objects? What logic, what process or argument secures you against this supposition? My practice, you say, refutes my doubts. But you mistake the purport of my question. As an agent, I am quite satisfied in the point; but as a philosopher, who has some share of curiosity, I will not say scepticism, I want to learn the foundation of this inference. No reading, no enquiry has yet been able to remove my difficulty, or give me satisfaction in a matter of such importance. Can I do better than propose the difficulty to the public, even though, perhaps, I have small hopes of obtaining a solution? We shall at least, by this means, be sensible of our ignorance, if we do not augment our knowledge.

What Hume wrote isn’t that we can only use known data and methods. Rather, he said that no argument can prove that the future will resemble the past. So drawing conclusions about what will happen in the future from past data is illogical. He didn’t say that the only possible form of induction is deduction.

In addition, the future always resembles the past in some respects and not others, so saying the future resembles the past is irrelevant to creating and assessing ideas.

Popper wasn't trying to solve the problem of how to make Bayesian induction work. He claimed that induction was impossible, not that he had a way of making it work by finding the right prior (Realism and the Aim of Science, Chapter I, Section 3, I):

It seems that almost everybody believes in induction; believes, that is, that we learn by the repetition of observations. Even Hume, in spite of his great discovery that a natural law can neither be established nor made ‘probable’ by induction, continued to believe firmly that animals and men do learn through repetition: through repeated observations as well as through the formation of habits, or the strengthening of habits, by repetition. And he upheld the theory that induction, though rationally indefensible and resulting in nothing better than unreasoned belief, was nevertheless reliable in the main—more reliable and useful at any rate than reason and the processes of reasoning; and that ‘experience’ was thus the unreasoned result of a (more or less passive) accumulation of observations.

As against all this, I happen to believe that in fact we never draw inductive inferences, or make use of what are now called ‘inductive procedures’. Rather, we always discover regularities by the essentially different method of trial and error, of conjecture and refutation, or of learning from our mistakes; a method which makes the discovery of regularities much more interesting than Hume thought.

Li and Vitányi want us to think they can solve the problem of induction, but they can’t even summarise the arguments against their position accurately.


Stop pressing the Try Harder button

September 5, 2020 - 12:10
Published on September 5, 2020 9:10 AM GMT

(This is a post from a daily blogging experiment I did at neelnanda.io, which I thought might also fit the tastes of LessWrong. This is very much in the spirit of Trying to Try)

I recently had a productivity coaching session, and at the end we agreed on a few action points that I’d do by the next session. These suggestions were good ideas, and I had no issue with implementing them; the problem was just that, come the next session, they had completely slipped my mind! (We then spent the second session debugging my ability to actually follow action points, and this was pretty successful!)

I think the error I made there is a really common one when planning, and one I observe often in myself and others. Often I’ll hear a cool book recommendation, offer to meet up with someone some time, hear about a new productivity technique, or notice an example sheet deadline looming. But I consistently fail to act on these. So this post is about what exactly went wrong, and the main solution I’ve found to this problem!

Planning, as I define it, is about ensuring that the future goes the way I currently want it to. And the error I made was that, implicitly, I was trying to will the future into going the way I wanted it to. That by committing to do things, wanting to do them, and just applying effort, things would happen. And the end result of this was that I totally forgot about it. Or sometimes, that I vaguely remembered the commitment or idea, and felt some guilt about it, but it never felt urgent or my highest priority. And every time I thought about the task, I resolved to Try Harder, and felt a stronger sense of motivation, but this never translated into action. I call this error Pressing the Try Harder button, and it’s characterised by feelings of guilt, obligation, motivation and optimism.

This is a classic case of failing to Be Deliberate. It feels good to try hard at something, it feels important and virtuous, and it’s easy to think that trying hard is what matters. But ultimately, trying hard is just a means to an end - my goal is to ensure that the task happens. If I can get it done in half the effort, or get somebody else to do it, that’s awesome! Because my true goal is the result. And pressing the Try Harder button is not an effective way of achieving the goal - you can tell, because it so often fails!

A good litmus test for whether you’re pressing the Try Harder button: Imagine it’s 2 weeks from now, and you never got round to doing the task. Are you surprised that this happened? Often my intuitions are well-calibrated when I phrase the question like this - on some level I know that I procrastinate on things and forget them all the time.

But just noticing yourself pushing the Try Harder button isn’t enough - you need to do something stronger to change this. You need to find strategies that actually work. This is pretty personal, and much easier said than done! But it can be done. Look for common trends, strategies that have worked for you in the past, and things that you can repurpose.

Strategies that work for me:

  • Scaffold systems - meta-systems that I check regularly
    • Calendars
    • Trello (my to-do list) - especially future reminders that result in an email
    • Getting friends to check in with me
  • Do it now, not later. Set a 5 minute timer, and see if you can finish the task. Or at least make a start!
    • You can get a surprising amount done in 5 minutes! (Say, writing a third of a blog post)
    • Often the bottleneck is that getting started takes a bit of energy. Doing something for 5 minutes can take much less energy, and I find that timers help me focus a lot
  • Make things concrete - often tasks feel overwhelming and fuzzy, so you put them off. Can you break it down into a concrete next action? Something that can be done in under 5 minutes?
  • Schedule time for it - often the bottleneck is that it doesn’t feel urgent - I care about the task getting done, but I always have something seemingly-higher-priority to do
    • This is terrible, because in the long-run I always have something that seems short-term higher priority, and I never make time for my long-term goals
    • So if I make time in my calendar for it, and make it feel important that I stick to these, that’s valuable
    • I find focusmate.com valuable for carving out an hour for a specific task
  • Add it to a queue
    • Ie a to-do list
    • I find Trello great for this - on Desktop I can add a new card from anywhere with CTRL+ALT+SPACE, it makes it really low friction
    • Important: It’s not enough to just add it to a list - the other half of a good to-do list system is having a regular time to process the list!
      • Having an eg weekly routine for this is important - a routine doesn’t involve decisions, it can just happen automatically. While if I just say “I’ll make time for it”, that’s pushing the Try Harder button!
  • Accountability
    • Message a friend saying that I commit to this task
    • Extreme: Give a friend some money, and tell them to only give it back when the task is done

These are just the strategies that work for me - I’d love to hear what works for others, and expect it to vary a lot between people. The message I want you to take from this post is just to notice when you next push the Try Harder button. And ask yourself: “am I just being virtuous and trying? Or am I trying to change what my future self actually does?”


Conflict, the Rules of Engagement, and Professionalism

September 5, 2020 - 08:04
Published on September 5, 2020 5:04 AM GMT

(Talk given at an event on Sunday 16th of August. habryka is responsible for the talk, Jacob Lagerros and Justis Mills edited the transcript. 

If you're a curated author and interested in giving a 5-min talk, which will then be transcribed and edited, sign up here.) 


habryka: I’m going to talk about three frames I have involving relationships and sociology. I’ll present the frames, some short justifications for why I believe them, and discuss how they connect to each other.

1. I think most relationships go better if you lean into conflict.

2. Most conflicts are hierarchically embedded within different rules, and maintaining the integrity of those rules is quite important.

3. Professionalism is really interesting. I like thinking about it, and I've gotten a bunch of value from thinking about it, because I didn't realize how much of my life has been shaped by professionalism.

Leaning into conflict

One of the things that has been pretty useful for me in life is a general heuristic of realizing that conflict in relationships is usually net positive. (It depends a bit on the exact type of conflict, but it works as a very rough heuristic.) I find it pretty valuable too, if I'm in a relationship, whether it's a working relationship, a romantic relationship, or a friendship, to pay a good amount of attention to where conflicts could happen in that relationship. And generally, I choose to steer towards those conflicts, to talk about them and seize them as substantial opportunities.

I think there are two reasons for this. 

First, if startups should fail fast, so should relationships. The number of people you could have relationships with is much greater than the number of people that you will have relationships with. So there is a selection problem here, and in order to get as much data as you want, I think going through relationships quickly and figuring out whether they will break or not is quite valuable.

Second, I've found that having past successful conflicts in a relationship is a very strong predictor for that relationship going well more generally, and for my ability to commit to the relationship and get things done within it.  In fact, I find it a better predictor of my capacity to coordinate with that person than the length of the relationship, the degree to which we even enjoy spending time with each other, or other obvious indicators. I have some models of why successful conflict is such a good predictor, which brings me to my second frame.

Honoring the rules of engagement

I have this frame of thinking about rules of conflict and hierarchically embedded conflict in a lot of different situations. The basic situation that happens relatively frequently is that I have been interfacing publicly with some organization or person on the internet, and they take some small action that hurts me in some way. They might write an angry comment, for instance, or they might try to insult me. 

A thing that I find useful to think about with these conflicts is: do we have any rules of engagement we could rely on? Rules that I and whoever I'm in conflict with could coordinate on together, in order to prevent the conflict from spreading into other parts of our lives? With online commenting, for example, there's a generally widely accepted rule that we will not let our conflict leak onto other platforms. If we're in a pseudonymous context, you won’t seek me out in a non-pseudonymous context and try to continue the conflict there.

Similarly, let's say LessWrong is in conflict with some other organization. There is a general expectation that if we have some kind of debate on the internet and there's a bunch of conflict happening there, we won't take that conflict to a broader venue like Twitter and then spark an angry mob to attack the other organization. The reason for that is pretty straightforward: the collateral damage of high levels of conflict tends to be quite large. I've often, in situations of conflict, got a lot of value out of asking myself, "What are the lowest collateral damage rules that I could embed and act on here in order to make this conflict go well?"

This leads me to frame three. 


Professionalism

I argue that this frame is the culmination of everything I’ve said thus far.

I’d like to talk about professionalism as a culture. Professionalism requires navigating conflict in lots of situations successfully, and it often encourages having important conflicts early on. Furthermore, it’s a really important component of people's lives.

I've found it useful to think about professionalism, or “presenting consistent APIs to other people”, as a safeguard against serious conflict. These APIs enable very narrow interfaces, which also enables very narrow potential for collateral damage. If I try to get a plumber, it's pretty easy. I don't have to think exactly about whether the plumber will like me or not, or other broader concerns about them as a person. Instead, I can pick a plumber based only on their fulfilling that specific role.

And also, another key component that I find really interesting to think about is the degree to which professional culture is optimized for replacing any individual within a given system. So that, for example, if you do professional writing, I would say that one key component is the ability to switch out writers in the middle of the writing process, which is false for 99% of other situations involving writing. 

More broadly, I find professionalism to be a modern religion whose details and culture are quite valuable to analyze.



jacobjacob: Regarding the first point: Nassim Taleb, who sure likes conflict, has a metaphor of forest fires. The basic idea is that if you prevent the small fires that come up naturally, it leads to an accumulation of excess flammable material, such that the next big fire is exceptionally bad. Does this resonate with your model? 

habryka: Sorry, my first reaction was just that Taleb is my key example of failing at point two, and escalating way too much. There's this thing where you disagree with Taleb on some topic and suddenly he calls you deeply disingenuous and sends an internet mob after you. That was my first reaction -- but that is mostly about Taleb as a person.

I do think that the idea of not letting conflict explode violently later on is pretty important here. But actually, most of the time we manage to avoid such explosions, and that’s interesting. Humans are pretty good at predicting when big conflict will happen, and most of the time we manage to prevent it. I think it’s true that small fires help guard against big ones, but I also think that big fires are rare in any case, because people are good at predicting and averting them.


mingyuan: I have previously heard you talk about professionalism in a more negative sense, and I'm wondering if you've changed your perspective over the past couple months?

habryka: My current relationship to professionalism is very similar to my relationship with Christianity, which is that it’s really damn important to understand how Christianity has influenced the West, because it's everywhere. There are obviously central concerns about Christianity, the dominant one being that it is horribly wrong. And my current relationship to professionalism is: "Oh god, it is doing massive amounts of damage in many different contexts." It's also quite useful in other contexts, in the same way that Christianity is.

mingyuan: So you're not a practicing professional?

habryka: I am in some dimensions. I definitely recognize that in certain situations it's really important to present a consistent API that's consistent with professional norms. I also try to avoid situations where that's truly necessary, because the constraints that come from professionalism are quite substantial.


Ozzie: For relationships, I was curious if you’ve thought about “anti-dates” or purposely difficult situations that you could put yourself in, to generate helpful conflict. Like taking care of a few kids that are difficult for a day or organizing a friend's wedding. Things that maybe should be common practice for people dating each other.

habryka: I mean, I don't know. The first thing that came to mind, I think it was someone in our community who wrote the “questions to ask if you wanted to end your relationship”. 

Maybe it's a good idea for everyone to ask themselves those questions early on. It’s well understood that a lot of typical relationship advice is actually about vulnerability, about finding places where there might be a potential source of conflict, and stuff like that, so I think all this is relatively entangled here.

It’d be hard to think of specific tasks that would expose potential conflicts across all relationships, though, since the situations that produce conflict are very different per relationship. If I want to work with someone at an organization, the dimensions in which I need to understand how future conflict will go differ a lot from if I want to have a Dungeons & Dragons group with someone. So it's very hard for me to come up with one thing that you would want to do in both, but it doesn't feel very hard to come up with stuff I'd want to do in each individual situation to stress-test.


jacobjacob: I'm pretty confused about the claim that professionalism does conflict fast. My impression is that many professional environments are incredibly risk averse, and they’re characterized by things like “putting your emotions to the side and just showing up to do the work”. This is an important norm for some situations, but it doesn't enable conflict. 

The outlier environment that has most embraced conflict seems to be Bridgewater, a large hedge fund centered around something like “having the most crucial conversations as early as possible”. (If people know what a Hamming Circle, or even a Doom Circle, is, my impression is that at Bridgewater they basically do that all the time.) 

So could you say a little more about professionalism’s relationship to conflict?

habryka: I think the key thing here is to distinguish professional norms within and between organizations. Professionalism has very different rules for these two cases. One of the core components of professionalism between organizations is contractualism. All parties agree on a contract, and negotiations on that contract frontload a bunch of conflict. All parties  figure out what they will do in the relevant different scenarios very early on. Compared to other norms, contractualism is really costly. If I hang out with a friend and we already have an existing trust relationship we usually don't need a contract, because we trust that we will later figure out how to negotiate if something weird happens or one of us falls through on an implied obligation.

In professional relationships, if you were to rely only on norms and ad hoc negotiations, it would often end very badly. So what we do is put a lot of the negotiation and conflict very early on, where we sign a contract and the contract is very well specified. All parties negotiate the rules of the contract, bring in some lawyers, and they figure out all the ways in which the contract could go wrong. Then,  once negotiations are done, the contract codifies a working relationship. Without that frontloaded conflict, a professional relationship would go very badly. Contractualism is really powerful here.


Anonymous: Where do professional hierarchies start and end? Does professionalism have some sort of consistent variation with power or scale? And is there a predictable way it changes as you go up a power hierarchy?

habryka: Let me restate the question to make sure I get it: “Let's say we have conflicts that are hierarchically embedded. What does that hierarchy actually look like?”

The top of the hierarchy would be something like outright war: you’re in conflict with each other and you're both just out to kill each other. And then, even above that, you're out to blackmail each other using threats to directly harm the other person's values. In the superintelligence example, this could look like torturing trillions and trillions of simulated beings that you really like. At the very lowest level, it's very minor conflict, like we're playing Scrabble, or we're playing a video game and killing each other's units. Even though there's some kind of damage we're causing each other, it's an extremely limited type of damage.

I feel like there’s a component of professionalism here that boils down to a bunch of very well-negotiated rules of conflict. These rules work pretty well when you're talking about tens of thousands to millions of dollars. I think it starts working substantially less well when you start talking about billions and trillions of dollars. Wars are not fought, for instance, within the professional culture, nor are other things with stakes in the trillions of dollars. There are still rules of conflict in these cases, but they are fought with different sets of norms.


Multivariate estimation & the Squiggly language

September 5, 2020 - 07:35
Published on September 5, 2020 4:35 AM GMT

(Talk given at an event on Sunday 16th of August. Ozzie Gooen is responsible for the talk, Jacob Lagerros and Justis Mills edited the transcript. 

If you're a curated author and interested in giving a 5-min talk, which will then be transcribed and edited, sign up here.) 

Ozzie: This image is my TLDR on probability distributions: 

Basically, distributions are kind of old school. People are used to estimating and predicting them. We don't want that. We want functions that return distributions -- those are way cooler. The future is functions, not distributions.

What do I mean by this? 

For an example, let's look at some of the existing COVID models. This is one of them, from the IHME:

You can see that it made projections for total deaths, daily deaths, and a bunch of other variables. And for each indicator, you could choose a country or a location, and it gives you a forecast of what that indicator may look like. 

So basically there's some function that for any parameter, which could be deaths or daily deaths or time or  whatever, outputs a probability density. That's the core thing that's happening.

So if you were able to parameterize the model in that way, and format it in these terms, you could basically wrap the function in some encoding. And then do the same forecast, but now using a centralized encoding. 

So right now, basically for people to make something like the COVID dashboard from before, they have to use this intense output and write some custom GUI. It's a whole custom process. Moreover, it's very difficult to write your own function that calls their underlying model.

But, hypothetically, if we had an encoding layer between the model and the output, these forecasters could basically write the results of their model into one function, or into one big file. Then that file could be interpreted and run on demand. That would be a much nicer format. 

Let’s take a look at Metaculus, which is about the best forecasting platform we have right now.

On Metaculus, everything is a point estimate, which is limiting. In general, it's great that we have good point estimates, but most people don't want to look at this. They’d rather look at the pretty dashboard from before, right?

So we need to figure out ways of getting our predictors to work together to make things that look more like the pretty graphs. And one of those questions is: how do we get predictors to write functions that return distributions? 

Ultimately, I think this is something that we obviously want. But it is kind of tricky to get there. 

So in Estimation Utopia, as I call it, we’d allow people to take the results of their data science models and convert them into a unified format. But also, humans could just intuitively go ahead and write in the unified format directly. And if we have unified formats that are portable and can be run in different environments with different programming languages, then it would be very easy to autogenerate GUIs for them, including aggregates which combine multiple models at the same time. We could also do scoring, which is something that we obviously want, as well as compose models together.
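To make the “unified format” idea concrete, here is a hedged sketch of what such an encoding might look like: a JSON document describing a distribution family whose parameters are simple expressions in the input variable. Every field name and the toy expression interpreter are invented for illustration; the talk doesn’t commit to any concrete format:

```python
# Hypothetical portable spec: a forecast as a function of t (days from now).
# This is a sketch, not a safe or complete interpreter.
import json

model_spec = json.dumps({
    "variable": "daily_deaths",
    "input": "t",
    "family": "lognormal",
    "params": {"mu": "2 + t / 4.0", "sigma": "0.3"},
})

def evaluate(spec_json, t):
    """Interpret the spec: compute the distribution's parameters at time t."""
    spec = json.loads(spec_json)
    # eval with stripped builtins is only a stand-in for a real, sandboxed
    # expression language.
    return {name: eval(expr, {"__builtins__": {}}, {"t": t})
            for name, expr in spec["params"].items()}

# Any GUI, aggregator, or scoring tool that understands the format can now
# query the model without touching the forecaster's original code.
print(evaluate(model_spec, 10))  # {'mu': 4.5, 'sigma': 0.3}
```

The point of the sketch is the separation of concerns: the forecaster writes one file, and rendering, aggregation, and scoring become generic operations over the format.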

So that's why I've been working on the Squiggly language. 

Let’s look at some quick examples! 

This is a classic normal distribution, but once you have this, some of the challenge is making it as easy as possible to make functions that return distributions. 

Here's a case for any t:

We're going to give you a normal, with t as a mean and the standard deviation of 3. This is a plot where it's basically showing bars at each one of the deciles. It gets a bit wider at the end. It's very easy once you have this to just create it for any specific combination of values.
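Squiggly’s own syntax isn’t shown in this transcript, so here is a plain-Python approximation of the example just described: a function that, for any t, returns a normal distribution with mean t and standard deviation 3, with the decile bars from the plot computed numerically:

```python
# Plain-Python approximation of the Squiggly example; not Squiggly syntax.
from statistics import NormalDist

def model(t):
    # For any t: a normal with mean t and standard deviation 3.
    return NormalDist(mu=t, sigma=3)

# The "bars at each one of the deciles" from the plot, here at t = 10:
dist = model(10)
deciles = [round(dist.inv_cdf(p / 10), 2) for p in range(1, 10)]
print(deciles)  # symmetric around the mean of 10, widest gaps at the ends
```

Evaluating `model` over a range of t values is what produces the fan-chart-style plot the talk refers to.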

It's also cool, because once you have it in this format, it's very easy to combine multiple models. For instance, here’s a lognormal. 

For example, if I have an estimate and my friend Jacob has an estimate, then we could write a function that for every time t, basically queries each one of our estimates and gives that as a combined result. 

This shows you a problem with fan charts: they don't show that all the probability mass gathers at the very top and the very bottom. That's an issue that we'll get over soon. Here’s what it looks like if I aggregate my model with Jacob’s.
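In ordinary code, that aggregation step might look something like the following sketch. The two forecasters' distribution shapes here are invented stand-ins (a normal and a lognormal, echoing the examples above), and the aggregate is a simple 50/50 mixture:

```python
import random

# Two hypothetical forecasters, each a function from time t to a sample
# from their forecast distribution at t.
def ozzie(t):
    return random.gauss(t, 3)                  # normal stand-in

def jacob(t):
    return random.lognormvariate(0, 0.5) * t   # lognormal stand-in

# Mixture aggregation: to sample the aggregate at t, pick one
# forecaster uniformly at random and sample from their distribution.
def aggregate(t):
    return random.choice([ozzie, jacob])(t)

random.seed(1)
draws = [aggregate(5) for _ in range(10_000)]
print(sum(draws) / len(draws))   # sample mean of the mixture at t = 5
```

A mixture keeps both forecasters' tails, which is exactly the information a fan chart tends to hide.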


Raemon: I had a little bit of excitement, and then fear, and then excitement again, when you talked about a unified format. The excitement was like, "Ah, a unified format, that sounds nice." Then I had an image of all of the giant coordination problems that result from failed attempts to create a new unified format, where the attempted unified format becomes yet another distinct format among all the preexisting options.

Then I got kind of excited again because to a first approximation, as far as I can tell, in the grand scheme of things currently, approximately zero people use prediction markets. You might actually be able to figure out the right format and get it right the first time. You also might run into the same problems that all the other people that tried to come up with unified formats did, which was that it was hard to figure that out right at the beginning. Maybe now I am scared again. Do you have any thoughts on this?

Ozzie: Yeah, I'd say in this case, I think there's no format that does this type of thing yet. This is a pretty unexplored space. Of course, writing the first format in a space is kind of scary, right? Maybe I should spend a huge amount of time making it great, because maybe it'll lock in. Maybe I should just iterate. I'm not too sure what to do there.

And there are also a few different ways that the format could go. I don't know who it's going to be the most useful for, which will be important. But right now, I'm just experimenting and seeing what's good for small communities. Well, specifically what’s good for me.

Raemon: Yeah, you can build the thing that seems good for you. That seems good. If you get to a point where you want to scale it up, making sure that whatever you're scaling up is reasonably flexible or something might be nice. I don't know.

Ozzie: Yeah. Right now, I'm aiming for something that's good at a bunch of things but not that great at any one of them. I'm also very curious to get outside opinions. Hopefully people could start playing with this, and I can get their thoughts.


habryka: This feels very similar to Guesstimate, which you also built, just in programming language as opposed to visual language. How does this project differ?

Ozzie: Basically, you could kind of think about this as “Guesstimate: The Language”. But it does come with a lot of advantages. The main one is that you could write functions. With Guesstimate you couldn't write functions. That was a gigantic limitation!

Really, a lot of Squiggly is me trying to atone for my sins with Guesstimate. With Guesstimate, if one person makes a model of the damage from bicycling (say, the micromorts they're taking on when they bike), that model only works for them. If you wanted it to match your situation, you’d have to go in and modify it manually. It's actually very difficult to port these models: if one person writes a good model, it's hard for somebody else to copy and paste it, let alone move it into another programming tool. It's not very portable.

So I think these new features are pretty fundamental. I think that this is a pretty big step in the right direction. In general text-based solutions have a lot of benefits when you can use them, but it is kind of tricky to use them.


Johnswentworth: I'm getting sort of mixed vibes about what exactly the use case here is. If we're thinking of this as a sort of standard for representing models, then I should be able to convert models in other formats, right?  Like, if I have a model in Excel or I have a model in Pyro, then there should be some easy way to turn it into this standard format?

On the other hand, if we're trying to create a language in which people write models, then that's a whole different use case where being a standard isn't really part of it at all (instead it looks more like the actual UI you showed us). 

So I'm sort of not sure what the picture is in your head for how someone is actually going to use this and what it's going to do for them, or what the value add is compared to Excel or Pyro.

Ozzie: Yeah, great question. I'd say that ideally I'd have both data scientists and judgemental forecasters using it, and those are two very distinct use cases, as you mentioned. It's very possible that they'd each want their own ideal format, and it doesn't make sense to have one format for both. I’m excited for users who don't currently have any intuitive way of making these kinds of models.

Suppose, for example, that you’re trying to forecast the GDP of the US for each year in the coming decades.

Step one is making sure that people on Metaculus or other existing forecasting platforms could write functions using this language and then submit those instead of just point forecasts. So you’d be able to say “given as input a specific year, and some other parameters, output this distribution” -- instead of having to make a new and separate forecast for each and every year. Then the whole rest of the forecasting pipeline would work with that (e.g. scoring, visualisations, and so forth).

Once you do that, though, it's pretty easy to take results from other, more advanced tools and put them into very simple functions. For instance, if there is a distribution over time (as in the GDP example), that may be something you could interpolate with a few different points. There could be some very simple setups where you take your Pyro model, or something else that actually did some intense computation, and put its results into a very simple function that just interpolates based on them and uses this new format.
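A minimal sketch of that interpolation wrapper, assuming the expensive model was run offline at a handful of years (all numbers here are invented for illustration):

```python
import bisect

# Precomputed offline by some expensive model; the portable artifact is
# just this table plus a light interpolation wrapper.
years       = [2025, 2030, 2040, 2050]
gdp_medians = [27.0, 31.0, 40.0, 52.0]   # made-up median GDP, $T

def gdp_median(year):
    """Linearly interpolate the precomputed medians; clamp at the ends."""
    if year <= years[0]:
        return gdp_medians[0]
    if year >= years[-1]:
        return gdp_medians[-1]
    i = bisect.bisect_right(years, year)
    frac = (year - years[i - 1]) / (years[i] - years[i - 1])
    return gdp_medians[i - 1] + frac * (gdp_medians[i] - gdp_medians[i - 1])

print(gdp_median(2035))  # halfway between the 2030 and 2040 points: 35.5
```

The wrapper is cheap to evaluate anywhere, while the expensive model only ever runs on its author's machine.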

Johnswentworth: What would be the advantage of that?

Ozzie: It’s complicated. If you made your model in Pyro and you wanted to export it and let someone play with it, that could be tricky, because your Pyro model might be computationally expensive to run. As opposed to exporting a representation that is basically a combination of a CSV and a light wrapper function. People can then run that, which is more convenient and facilitates more collaboration.

Johnswentworth: Why would people run that though? Why do people want that compressed model?

Ozzie: I mean, a lot of the COVID models are like that, where the running of the simulation was very time intensive and required one person's whole PC. But it would still be nice to be able to export the results and make them interactive, right?

Johnswentworth: Oh, I see. Okay, I buy that.

Ozzie: I also don't want to have to write all of the work to do all of the Pyro stuff in this language. It's way too much.

Johnswentworth: Usually, when I'm thinking about this sort of thing, and I look at someone's model, I really want to know what the underlying gears were behind it. Which is exactly the opposite of what you're talking about. So it's just a use case that I'm not used to thinking through. But I agree, it does make sense.


Ollie: Why call the language Squiggly? There was a surprising lack of squiggles in the language. I thought, "Ah, it makes sense, you just use the squiggles as the primary abstraction" -- but then you showed me your code editor and there were no squiggles, and I was very disappointed.


Ozzie: Yeah, so I haven't written my own parser yet. I've been using the one from math.js. When I write my own, it's possible I'll add some. I also am just really unsure about the name.


When should I be concerned about my Oura measurements indicating COVID-19?

September 5, 2020 - 01:15
Published on September 4, 2020 10:15 PM GMT

The Rockefeller Neuroscience Institute seems to have developed an app that can detect COVID-19 three days before symptoms appear, based on Oura data, but the app isn't publicly available. How should a user who doesn't have access to the app interpret their Oura measurements to know when they should self-isolate or get tested?


Hello ordinary folks, I'm the Chosen One

September 4, 2020 - 22:59
Published on September 4, 2020 7:59 PM GMT

This is an informal argument against the fine-tuned universe. The complete argument can be found on my website. The purpose is to show the importance of perspectives in reasoning, especially in anthropic topics. See my previous post for a summary.

Why I am the Chosen One

I know this sounds ridiculous. But bear with me for a minute. I will prove it to you. The clue is in the history.

I have 2 parents, 4 grandparents, and 8 great-grandparents. This number keeps multiplying. 50 generations back, the theoretical upper bound for the number of my ancestors is about 10^15, far more than the total number of humans who have ever lived. Of course, there would be significant overlaps in this mega family tree; at this scale, calling it a family web would no longer be a joke. Nonetheless, even with the overlapping, it still means that, going back far enough, say to around 1000 AD, my direct ancestors would cover a significant portion of the world population.

My existence is the result of all those people’s reproductive success. That means any historical event that affected the lives of even a moderate number of people must have unfolded exactly the way it did for me to be created. If Alexander had lost the Battle of Gaugamela, or Genghis Khan had failed to unite the Mongol tribes, or Columbus had not reached the New World in 1492, I would not be here.

Why stop there? It can go even further and be more disturbing. For my existence, it is not enough that all my ancestors successfully produced offspring; they had to produce the exact offspring. Meaning in each case the exact sperm had to fertilize the exact egg. Couple this with the exponential growth of my family tree, and the chance of my existence is unfathomably small. Yet, here I am.

This is either a statistical miracle, or an unknown force has guided every aspect of the past to ensure my existence. The odds are too overwhelming. History must be fine-tuned for me.

The Fine-Tuned Universe

I can imagine how people would react if I told them the above. They would say I am unbelievably egocentric and narcissistic. Why would history care whether you are produced or not? Something could very well have happened differently, causing you not to be born. Big deal? More importantly, if the past is fine-tuned for you, am I just a by-product of that fine-tuning? I could use the same argument from my perspective and say history is fine-tuned for me instead of you. Safe to say it is not going to be well received.

Foreseeing these criticisms, I decide to modify the argument a bit. I need more allies on this. So instead of focusing on the immediate “me”, extend it to “my kind”. Depending on how inclusive I want to be, that could mean humans, or life, or conscious beings, or complex physical systems, or maybe something even more general. On the other hand, to keep the probability low, extend the history further back. Consider the entire universe: all of its past, up to the initial conditions and how the fundamental parameters came to be.

Now we have the argument known as the fine-tuned universe. It looks at the fundamental parameters of our universe and makes an amazing discovery: they are all compatible with life's existence. Given the odds, "life" must be in some way significant to the universe. It is still the same egocentric and narcissistic argument. But this time, it is less obvious, because everybody discussing it is "life". We are all on the same side.

Perspective is the key

Both arguments take a first-person perspective while presenting their evidence. I analyzed the historical events based on their eventual effect on "my" existence. The fine-tuning argument analyzes all fundamental parameters according to their compatibility with life (our kind). There is nothing inherently wrong with doing so. However, we must realize this focus on oneself is perspective-dependent. E.g., if another person analyzes historical events the same way I just did, he would analyze them based on their effects on him rather than on me.

From here, if I ask "why are all past events compatible with my existence?", that is a perspective-dependent question, so it must accept a perspective-based answer. The answer is clearly "because I would always find myself existing". Reasoning from any perspective would always conclude the existence of its perspective center. That tautology is the Weak Anthropic Principle (WAP) response to fine-tuning.

However, proponents of fine-tuning argue that this response is not a causal explanation of the fundamental parameters' values. Others criticize it for being unscientific. To that, I would say: of course it is not scientific or causal, because it is not answering a scientific or causal question. For those answers, the question should be objective and impartial. It should not focus on "me", or be perspective-dependent.

There are multiple ways to be impartial, but perhaps the easiest is to take a god's-eye perspective, to reason with "a view from nowhere". I personally think that is not the best approach, but that is a topic for another day. The important thing here is to recognize that "why are the fundamental parameters compatible with life?" and "why do the parameters have these values?" are two different questions. They are formulated from different perspectives. The WAP response answers the former question, while an impartial/scientific explanation is needed for the latter.

Fine-tuning presents the first-person question, yet it demands an impartial explanation. It does not reason from a consistent perspective. It effectively assumes we are significant not just to ourselves, but also to the universe. That is why it usually ends up with teleological conclusions, e.g. "I" am the chosen one, or the universe is designed to support life.


Open & Welcome Thread - September 2020

September 4, 2020 - 21:14
Published on September 4, 2020 6:14 PM GMT

If it’s worth saying, but not worth its own post, here's a place to put it. (You can also make a shortform post)

And, if you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are welcome.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the new Concepts section.

The Open Thread tag is here.


Estimating the ROI of Insulation

September 3, 2020 - 23:30
Published on September 3, 2020 8:30 PM GMT

A few months ago I wrote about trying to decide what to do with our shed. It turns out that not only is our property too small for a tiny house ("backyard cottage"), but the only reason we're able to have the shed where it is is that it's grandfathered in. The shed turned out to have solid footings, so we decided to have it repaired.

I hired a mason to redo the cinderblock and a carpenter to replace the roof. The mason has finished, and I'm waiting for the carpenter to start:

When the carpenter is done, we'll have a weathertight structure, but we still need to figure out what we want to do with the inside. One thing I'm trying to figure out is how much insulation makes sense. Since it's a detached building with electricity, we would probably use resistive heat, which is relatively expensive (~$0.21/kWh).

The structure is:

Walls: 335 sqft
Roof: 145 sqft
Floor: 145 sqft
Door: 18 sqft
Side windows: 12 sqft
End window: 8 sqft

Most of the structure is exposed to air, which is generally much colder than the ground, so I'm going to ignore the floor.

Heat loss is proportional to area and temperature difference, and also varies by material. This per-material factor is called the U-factor (heat / area-degree). When dealing with conductive heat loss, which is the main kind of loss I expect us to see here, people generally use its reciprocal, the R-value. Attempts at estimating the R-values of the shell components, in F*sqft*hr/BTU:

Walls 8" cinderblock 1.11 Roof 3/4 plywood sheathing, asphault shingles 0.94 + 0.44 Door solid wood 2.17 Side windows single pane 0.91 End windows double pane 3.4

This gives me:

  335 sqft / 1.11 F*sqft*hr/BTU
+ 145 sqft / 1.38 F*sqft*hr/BTU
+ 18 sqft / 2.17 F*sqft*hr/BTU
+ 12 sqft / 0.91 F*sqft*hr/BTU
+ 8 sqft / 3.4 F*sqft*hr/BTU
= 430 BTU / F*hr

How many heating degree hours (F*h) should we count? If it is 50F outside for one hour and I want it to be 70F inside, that's 20F*h. What is it over the course of a year?

The standard approach is to look up heating degree days for your location. This is a standard number computed for many places, and assumes you heat your building to 65F. Per NationalGrid, over the last 30 years Boston has averaged 6,521 heating degree-days. We're not planning to live in the shed, though, and instead we would probably use it as a home office. So we only care about heating degree hours that fall during working hours. There aren't standard tables for this, as far as I can find, but we can calculate them from NOAA's Local Climatological Data summaries.

LCD offers hourly temperatures for each weather station as CSV, and then you can do your own processing. I wrote a script (github) that assumes we heat it to 68F, turning on the heat at 7am (since it cooled off at night and needs time to warm up) until 5pm.

It does tend to be warmer during the day than overall, though it's more of a difference in summer:

During working hours, 30% of the time we need no heat at all. The rest of the time we typically need a modest amount of heat, and occasionally it is very cold and we need a lot:

On average, over all the working hours including the ones when the heat is off, we need 7.09 heating degrees. If someone used it full time (9hr/d x 250d/y) that's 2,250hr/y and 16,000 heating degree hours.
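The degree-hour computation described above can be sketched in a few lines: sum how far each working-hour temperature falls below the setpoint. The toy temperature list here is invented; the real script pulls hourly temperatures from NOAA's LCD data.

```python
SETPOINT = 68               # heat to 68F, per the script described above
WORK_HOURS = range(7, 17)   # heat on from 7am until 5pm

def degree_hours(hourly_temps):
    """hourly_temps: (hour_of_day, temp_F) pairs; returns F*h of heating need."""
    return sum(max(0, SETPOINT - temp)
               for hour, temp in hourly_temps if hour in WORK_HOURS)

# A toy cold day: 40F for all 24 hours.
toy_day = [(h, 40) for h in range(24)]
print(degree_hours(toy_day))   # 10 working hours x 28F = 280 degree-hours
```

Running this over a year of hourly data and dividing by the working-hour count gives the average heating-degree figure quoted above.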

Going back to our calculation earlier, we can now estimate how much heat we need:

430 BTU / F*hr * 16,000 F*hr * 0.00029 kWh / BTU = 1990 kWh

At $0.21/kWh, this is $420/y in heating. What would it be with insulation?

Let's imagine we use R-15 insulation for the walls and R-30 for the ceiling, plus 1/2" drywall (R-0.45). Where does that leave us?

  335 sqft / (1.11+15+0.45) F*sqft*hr/BTU
+ 145 sqft / (1.38+30+0.45) F*sqft*hr/BTU
+ 18 sqft / 2.17 F*sqft*hr/BTU
+ 12 sqft / 0.91 F*sqft*hr/BTU
+ 8 sqft / 3.4 F*sqft*hr/BTU
= 49 BTU / F*hr

49 BTU / F*hr * 16,000 F*hr * 0.00029 kWh / BTU * $0.21 / kWh = $48/y
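The whole comparison fits in a few lines of code, using the areas, R-values, and unit conversions from the post:

```python
def heat_loss_rate(components):
    """Sum area / R-value over (sqft, R) pairs -> BTU per F per hour."""
    return sum(area / r for area, r in components)

# (area sqft, R-value) pairs from the estimates above; floor ignored.
uninsulated = [(335, 1.11), (145, 1.38), (18, 2.17), (12, 0.91), (8, 3.4)]
insulated   = [(335, 1.11 + 15 + 0.45), (145, 1.38 + 30 + 0.45),
               (18, 2.17), (12, 0.91), (8, 3.4)]

DEGREE_HOURS = 16_000     # heating degree-hours per year, working hours only
KWH_PER_BTU  = 0.00029
PRICE        = 0.21       # $/kWh for resistive heat

def annual_cost(components):
    return heat_loss_rate(components) * DEGREE_HOURS * KWH_PER_BTU * PRICE

print(round(annual_cost(uninsulated)))  # roughly $420/y, as above
print(round(annual_cost(insulated)))   # roughly $48/y
```

Swapping in different R-values or a different electricity price is a one-line change, which makes it easy to check the sensitivity of the payback estimate.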

It looks like it would save about $375/y, and bring heating costs down to a very reasonable $48/y. This isn't perfect because it ignores the effect of thermal mass (so too high) and heat loss through the floor (so too low), but I think it's probably in the right range.

How much would that much insulation cost? Fiberglass R-15 is about $0.70/sqft and R-30 is about $0.91/sqft; add $0.37/sqft for the drywall and another $0.37/sqft for the wall studs, and ignore my installation time (since this is something I would enjoy doing), and this is ~$668. That's a payback of under two years for something that should last decades, and it would make the space much nicer than having concrete walls. Seems worth it!


Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

September 3, 2020 - 21:27
Published on September 3, 2020 6:27 PM GMT

Tl;dr: We are attempting to make neural networks (NNs) modular and have GPT-N interpret each module for us, in order to catch mesa-alignment and inner-alignment failures.

Completed Project

Train a neural net with an added loss term that enforces the sort of modularity that we see in well-designed software projects. To use this paper's informal definition of modularity:

a network is modular to the extent that it can be partitioned into sets of neurons where each set is strongly internally connected, but only weakly connected to other sets.

Example of a “Modular” GPT. Each module should be densely connected w/ relatively larger weights. Interfaces between modules should be sparsely connected w/ relatively smaller weights.
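As a toy illustration of that dense-inside, sparse-between picture, here is a crude operationalization (my own made-up metric and weights, not the paper's loss term): the fraction of total weight magnitude that stays inside modules.

```python
# Toy weighted graph: two dense modules joined by one sparse interface edge.
edges = {("a1", "a2"): 5.0, ("a1", "a3"): 4.0, ("a2", "a3"): 6.0,  # module A
         ("b1", "b2"): 5.0, ("b2", "b3"): 5.5,                     # module B
         ("a3", "b1"): 0.5}                                        # interface
modules = {"a1": 0, "a2": 0, "a3": 0, "b1": 1, "b2": 1, "b3": 1}

# Share of weight that is intra-module; near 1.0 means highly modular.
intra = sum(w for (u, v), w in edges.items() if modules[u] == modules[v])
total = sum(edges.values())
print(round(intra / total, 3))
```

A modularity-encouraging loss term would, roughly speaking, push this ratio toward 1 while the task loss keeps the network useful.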

Once we have a Modular NN (for example, a GPT), we will use a normal GPT to map each module into a natural language description. Notice that there are two different GPT’s at work here.

GPT-N reads in each “Module” of the “Modular GPT”, outputting a natural language description for each module.

If successful, we could use GPT-N to interpret any modular NN in natural language. Not only should this help our understanding of what the model is doing, but it should also catch mesa-alignment and inner-alignment failures.


There are a few intuitions we have that run counter to others' intuitions. Below is an elaboration of our thoughts and why we think this project could work.

Finding a Loss function that Induces Modularity

We currently think a Gomory-Hu Tree (GH Tree) captures the relevant information. We will initially convert a NN to a GH Tree to calculate the new loss function. This conversion will be computationally costly, though with more progress the loss function could be calculated directly from the NN. See Appendix A for more details.

Small NN’s are Human Interpretable

We’re assuming humans can interpret small NN’s, given enough time. A “Modular” NN is just a collection of small NN’s connected by sparse weights. If humans could interpret each module in theory, then GPT-N could too. If humans can interpret the interfaces between each module, then GPT-N could too. This may require explicit examples of interpreting small NN’s (see Appendix A).

Examples from NN Playground are readily interpretable (such as the above example).

GPT-3 can already turn comments into code. We don't expect the reverse case to be fundamentally harder, and neural nets can be interpreted as just another programming language.

Microscope AI has had some success in interpreting large NN’s. Those NN’s should be much harder to interpret than the modular NN’s that we would be interpreting.

Technical Questions:

First question: Capabilities will likely be lost by adding a modularity loss term. Can we spot-check capability of GPT by looking at the loss of the original loss terms? Or would we need to run it through NLP metrics (like Winograd Schema Challenge questions)?

To create a modular GPT, we have two paths, but I'm unsure of which is better.

  1. Train from scratch with modified loss
  2. Train OpenAI’s gpt-2 on more data, but with added loss term. The intuition here is that it’s already capable, so optimizing for modularity starting here will preserve capabilities.
Help Wanted

If you are interested in the interpretability of GPT (even unrelated to our project), I can add you to a discord server full of GPT enthusiasts (just DM me). If you're interested in helping out our project specifically, DM me and we'll figure out a way to divvy up tasks.

Appendix A: Gomory-Hu Tree Contains Relevant Information on Modularity

Some readily accessible insights:

  1. The size of the minimum cut between two neurons can be used to measure the size of the interface between their modules.
  2. Call two graphs G and G’ on the same vertices equivalent if for every two u,v, the sizes of their minimum cuts are the same in G and G’. It turns out that there always exists a G’ which is a tree! (The Gomory-Hu tree.)
  3. It turns out that the minimum cut between two neurons within a module never needs to expose the innards of another module.

Therefore, the Gomory-Hu tree probably contains all the information needed to calculate the loss term and the hierarchy of software modules.
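To make point 1 concrete, here is a brute-force sketch on a toy graph with invented weights. A real implementation would use max-flow or an off-the-shelf Gomory-Hu routine rather than enumerating every bipartition, but for a six-node graph brute force is fine:

```python
from itertools import combinations

# Toy "network": two dense modules (a*, b*) joined by one sparse edge.
edges = {("a1", "a2"): 5.0, ("a1", "a3"): 4.0, ("a2", "a3"): 6.0,
         ("b1", "b2"): 5.0, ("b1", "b3"): 4.5, ("b2", "b3"): 5.5,
         ("a3", "b1"): 0.5}
nodes = sorted({n for e in edges for n in e})

def min_cut(s, t):
    """Smallest total weight crossing any partition separating s from t."""
    best = float("inf")
    others = [n for n in nodes if n not in (s, t)]
    for r in range(len(others) + 1):
        for side in combinations(others, r):
            S = set(side) | {s}   # s-side of the candidate cut
            cut = sum(w for (u, v), w in edges.items()
                      if (u in S) != (v in S))
            best = min(best, cut)
    return best

print(min_cut("a1", "b2"))   # 0.5 -> neurons in different modules
print(min_cut("a1", "a2"))   # 9.0 -> neurons in the same dense module
```

Cross-module pairs have small min-cuts (bounded by the interface weight), while within-module pairs have large ones, which is exactly the signal the Gomory-Hu tree compactly stores for all pairs at once.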

Interpret Small NN’s

We may be able to generate large amounts of training examples using very small NN’s. This way, we have a “ground truth” to compare against, as well as having many extra examples to train on. [I’m uncertain here, and more thought would be required to flesh this out]


Emotional valence vs RL reward: a video game analogy

September 3, 2020 - 18:28
Published on September 3, 2020 3:28 PM GMT

I recently read a book about emotions and neuroscience (brief review here) that talked about "valence and arousal" as two key ingredients of our interoception. Of these, arousal seems pretty comprehensible—the brain senses the body's cortisol level, heart rate, etc. But the valence of an emotion—what is that? What does it correspond to in the brain and body? My brief literature search didn't turn up anything that made sense to me, but after thinking about it a bit, here is what I came up with (with the usual caveat that it may be wrong or obvious). But first,

Definition of "the valence of an emotional state" (at least as I'm using the term)

Here's how I want to define the valence of an emotional state:

  • When I'm proud, that's a nice feeling, I like having that feeling, and I want that feeling to continue. That's positive valence.
  • When I have a feeling of guilt and dread, that's a bad feeling, I don't like having that feeling, and I want that feeling to end as soon as possible. That's negative valence.

There's a chance that I'm misusing the term; the psychological literature itself seems all over the place. For example, some people say anger is negative valence, but when I feel righteous anger, I like having that feeling, and I want that feeling to continue. (I don't want to want that feeling to continue, but I do want that feeling to continue!) So by my definition, righteous anger is positive valence!

There are some seemingly-paradoxical aspects of how valence does or doesn't drive behavior:

  • Sometimes I have an urge to snack, or to procrastinate, but doing so doesn't make me happy or put me in a positive-valence state; it makes my mood worse, and I know it's going to make my mood worse, but I do it anyway.
  • Conversely, sometimes it occurs to me that I should go meditate, and I know it will make me happy, but I feel an irresistible urge not to, and I don't.
  • ...and yet these are exceptions. I do tend to usually take actions that lead to more positive-valence states and fewer negative-valence states. For example, I personally go way out of my way to try to avoid future feelings of guilt.

(See also: Scott Alexander on Wanting vs Liking vs Approving)

How is emotional valence implemented computationally? A video game analogy

In Doom II (1994 ... I guess I'm showing my age), you could lose a bunch of health points all at once by getting hit by an enemy (left), or you could go running on lava and you'll lose a few health points every second until you get off the lava (right). By analogy, when I eat junk food, I get a big transient positive reward (a.k.a. "dopamine hit"); when I feel positive-valence emotions (happy, proud, righteous indignation, etc.) I claim that I'm getting a constant stream of positive reward as long as I'm in that state; and conversely when I feel negative-valence emotions (guilt, suffering, etc.), I claim that I'm getting a constant stream of negative reward as long as I'm in that state.  

Here's a simple picture I kinda like, based on an analogy to action-type video games. (Ha, I knew it, playing all those video games in middle school wasn't a waste of time after all!)

In many video games you control a character with a "health" level. It starts at 100 (or whatever), and if it ever gets to 0, you die. There are two ways to gain or lose health:

  • Event-based health changes: When you get hit by an enemy, you lose health points. When you fall from a great distance and hit the ground, you lose health points. When you pick up a health kit, you gain health points. Etc.
  • State-based health changes: In certain situations, you lose or gain a certain number of health points every second.
    • For example, maybe you can walk across lava, but if you don't have the appropriate protective gear, you keep losing health points at a fixed rate, for as long as you're in the lava. So you run across the lava as fast as you can, and with luck, you can make it to the other side before you die.

In the brain, I've come around to the reinforcement-learning-type view that the neocortex tries to maximize a "reward" signal (among other things). So in the above, replace "gain health points" with "get positive reward", replace "lose health points" with "get negative reward", then the "state-based" situation corresponds to my current working theory of what valence is. Pretty simple, right?

To be explicit:

  • If negative reward keeps flowing as long as you're in a state, you perceive that state as having negative valence. As a reward-maximizing system, you will feel an urge to get out of that state as quickly as possible, you will describe the state as aversive, and you will try to avoid it in the future (other things equal).
  • If positive reward keeps flowing as long as you're in a state, you perceive that state as having positive valence. As a reward-maximizing system, you will feel an urge to stay in that state as long as possible, you will describe the state as attractive, and you will try to get back into that state in the future (other things equal).

Worked examples

  • Other things equal, I seek out positive-valence emotional states and try to avoid negative-valence emotional states. Easy: I choose actions based in part on the predicted future rewards, and the rewards associated with the valence of my emotional state is one contributor to that total reward.
  • I have an urge to have a snack, even though I know eating it will make me unhappy: I predict that as I eat the snack, I'll get a bunch of positive reward right while I eat it. I also predict that the negative-valence feeling after the snack will dole out slightly negative rewards for a while afterwards. So I feel a bit torn. But if the positive reward is sufficiently large and soon, and the negative-valence feeling afterwards is sufficiently mild and short-lived, then this is appealing on net, so I eat the snack. In the video game analogy, it's a bit like jumping down onto a platform with a giant restorative health kit ... but then you need to run through lava for a while to get back to where you were. Well, if the health gain from the health kit is large enough to outweigh the health loss from needing to run through lava afterwards, then OK, maybe that's worth doing.
  • I have an urge to NOT meditate, even though I know meditating will make me happy: Just the opposite. Starting meditating involves stopping other things I'm doing, or turning down the opportunity to do other more-immediately-appealing things, and that gives me a bunch of negative reward all at once. That outweighs the steady drip of positive reward that I get from time spent being happy, in my brain's unconscious calculation.
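The worked examples above can be caricatured numerically. All numbers here are invented; the point is just the two reward channels (one-off event rewards vs a per-second stream tied to the valence of the current state):

```python
def total_reward(events, state_rate, seconds_in_state):
    """Event-based rewards plus a per-second stream while in a state."""
    return sum(events) + state_rate * seconds_in_state

# Snack: a big one-off "dopamine hit", then mild negative valence for a while.
snack = total_reward(events=[+10], state_rate=-0.1, seconds_in_state=60)

# Meditating: a one-off cost to start, then a steady drip of positive valence.
meditate = total_reward(events=[-8], state_rate=+0.05, seconds_in_state=60)

print(snack, meditate)   # the snack "wins" despite leading to a worse mood
```

With these made-up numbers the snack comes out ahead (4.0 vs -5.0), mirroring how a large immediate event reward can outweigh a longer-lasting valence stream.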


Covid 9/3: Meet the New CDC

September 3, 2020 - 16:40
Published on September 3, 2020 1:40 PM GMT

This week’s news all centers around policy decisions. The new data contains few important surprises, so attention shifts to what actions will be taken and how that will affect the path we follow going forward. The CDC’s fall and transformation into an arm of the White House reelection campaign is now complete. Others continue to come up with, suggest and criticize various policies. 

Before we get to all that, let’s run the numbers.

Positive Test Counts

Date              WEST     MIDWEST  SOUTH    NORTHEAST
July 9-July 15    108,395  53,229   250,072  20,276
July 16-July 22   117,506  57,797   265,221  20,917
July 23-July 29   110,219  67,903   240,667  26,008
July 30-Aug 5     91,002   64,462   212,945  23,784
Aug 6-Aug 12      93,042   61,931   188,486  21,569
Aug 13-Aug 19     80,887   63,384   156,998  20,857
Aug 20-Aug 26     67,545   66,540   132,322  18,707
Aug 27-Sep 2      55,000   75,401   127,414  21,056

Only the West’s number here is reassuring. The South’s number here is disappointing but reflects a rebound in the number of tests after a steep decline last week. The Midwest situation continues to get worse. The Northeast has some reason to worry, but the increase is mostly explained by increased testing.

Deaths

Date             WEST  MIDWEST  SOUTH  NORTHEAST
June 25-July 1   858   658      1285   818
July 2-July 8    894   559      1503   761
July 9-July 15   1380  539      2278   650
July 16-July 22  1469  674      3106   524
July 23-July 29  1707  700      4443   568
July 30-Aug 5    1831  719      4379   365
Aug 6-Aug 12     1738  663      4554   453
Aug 13-Aug 19    1576  850      4264   422
Aug 20-Aug 26    1503  745      3876   375
Aug 27-Sep 2     1245  759      3631   334

The Midwest number is bad news, the West and Northeast numbers are excellent news. The South’s is an improvement, but less of an improvement than expected, so it counts as bad news. Deaths are on a clear downward trend in general and that should continue for at least several weeks, as the overall situation continues to improve right now. 

Positive Test Percentages by Region

The Covid Tracking Project’s data has a very strange and very negative number of positive tests from Massachusetts this week, which I’ve corrected to a reasonable number. 

Percentages    Northeast  Midwest  South   West
7/16 to 7/22   2.49%      5.13%    13.29%  8.56%
7/23 to 7/29   2.54%      5.51%    12.32%  7.99%
7/30 to 8/5    2.58%      7.26%    12.35%  6.68%
8/6 to 8/13    2.30%      5.67%    14.67%  6.98%
8/13 to 8/20   2.06%      5.62%    9.41%   6.47%
8/20 to 8/26   1.86%      5.78%    9.93%   5.88%
8/27 to 9/2    1.87%      6.37%    9.38%   4.78%

This makes it clear the Midwest is getting worse and not merely testing more, and the West is rapidly improving. The South's situation remains ambiguous, but looking at the individual states makes it look like things are indeed improving slowly.
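The "mostly explained by increased testing" judgment is simple arithmetic on the two tables above: back out each week's implied test count from its positive rate, then compare growth in tests to growth in positives. A quick sketch using the Northeast rows (figures copied from the tables):

```python
# Northeast figures from the positive-counts and percentage tables above.
positives = {"Aug 20-Aug 26": 18707, "Aug 27-Sep 2": 21056}
positive_pct = {"Aug 20-Aug 26": 1.86, "Aug 27-Sep 2": 1.87}

# Implied tests each week: tests = positives / rate.
tests = {wk: positives[wk] / (positive_pct[wk] / 100) for wk in positives}

def growth(series):
    return series["Aug 27-Sep 2"] / series["Aug 20-Aug 26"] - 1

# Positives rose ~12.6% but implied tests rose ~12.0%, so the positive
# rate was nearly flat: the extra positives mostly track the extra testing.
print(f"positives up {growth(positives):.1%}, tests up {growth(tests):.1%}")
```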

Test Counts

Date             USA tests  Positive %  NY tests  Positive %  Cumulative Positives
June 25-July 1   4,352,981  7.1%        419,696   1.2%        0.82%
July 2-July 8    4,468,850  8.2%        429,804   1.1%        0.93%
July 9-July 15   5,209,243  8.4%        447,073   1.1%        1.06%
July 16-July 22  5,456,168  8.6%        450,115   1.1%        1.20%
July 23-July 29  5,746,056  7.9%        448,182   1.1%        1.34%
July 30-Aug 5    5,107,739  7.8%        479,613   1.0%        1.46%
Aug 6-Aug 12     5,121,011  7.3%        502,046   0.9%        1.58%
Aug 13-Aug 19    5,293,536  6.2%        543,922   0.8%        1.68%
Aug 20-Aug 26    4,785,056  6.0%        549,232   0.8%        1.77%
Aug 27-Sep 2     5,042,113  5.5%        606,842   0.8%        1.85%

New York’s positive percentage crept up substantially this week while the test count continued to rise, especially in the last few days. I am definitely worried that something has gone wrong and we are no longer on a slowly but steadily improving path. If things are suddenly getting worse here now, presumably it is a school problem, and that does not at all bode well.

The national picture here however is quite good. Our test numbers crept back up a bit and the positive percentage fell substantially. (Recorded) hospitalizations are down as well. Yesterday was the first day in a long time they didn’t decline day over day, but for now I’m treating that as a mere blip.

Center For Disease Control Sorta Partially Walks Back Its Opposition To Disease Control

After taking a pounding from all sides for several days, director Robert Redfield (who, alas, probably can’t be played by the newly retired Robert Redford in the inevitable HBO movie version, but I’m hoping he’ll make an exception because come on) ‘clarified’ the new guidelines that led to last week’s headline.

In a statement, Director Robert Redfield said those who come into contact with confirmed or probable COVID-19 patients could be tested themselves, even if they do not show symptoms of the virus.

“Testing is meant to drive actions and achieve specific public health objectives. Everyone who needs a COVID-19 test, can get a test. Everyone who wants a test does not necessarily need a test; the key is to engage the needed public health community in the decision with the appropriate follow-up action,” Redfield said.

So he allows for the possibility that people who come into contact with confirmed cases could be tested, in theory, I mean it’s a thing that happens from time to time. Very generous of him. And it’s great to hear that everyone who “needs” a test can get a test, especially considering the numerous reports that this is not the case for any meaningful value of getting a test, and the fact that this is not the case is the only good reason to revise the guidelines.

So… Do you feel clarified now? 

Me neither. This does not feel like a walk back to me. It feels like they’re doubling down.

Instead, it seems their strategy is to assert control over… evictions

I don’t want to get too deep into the economics of this move. I won’t discuss whether it is completely and totally insane, or how much it will permanently drive up rental costs since renting means the government might decide to seize your property outright and pay you nothing in return, while you maintain it under penalty of law at your own expense in the hopes that the government will one day give it back. 

I will instead say that this is completely and utterly unconstitutional and illegal and in no way something the CDC has any authority whatsoever to do. You are the Centers For Disease Control, not the Centers for Rent Control. 

So you know what? Fine. You did it. Congratulations. Burn it to the ground. CDC Delenda Est. 

Centers For Disease Control Advocates Disease Control

This just in: The CDC has also informed states that they should be ready to distribute one of two vaccine candidates by November 1.

Under normal circumstances this would be both the correct action and great news. It would mean that the two vaccine candidates have a substantial probability of being far enough along to be worth deploying soon, potentially heralding a swift end to the pandemic. Given “medical ethics” and the general overwhelming paranoia among all Very Serious People about deploying a vaccine, I have an extremely strong prior that any deployment would be too late rather than too early.

It’s certainly good news, even in these times, that they have the good sense to tell states to get ready to distribute whether or not there is any intent to actually distribute. We should get ready to distribute long before we expect to need distribution. Things will inevitably go wrong and cause delays, which we can address now before those delays cost lives.

Unfortunately on so many levels, these are not normal times. We have the president we have, who is facing a presidential election… on November 3, two days after the target date. That does not in any way feel like a coincidence.

I would be very surprised if this CDC announcement is not being made under, at a bare minimum, extreme pressure from the White House. This was a political decision, and together with other CDC news, it seems safe to respond as if the CDC is completely captured by the White House and is acting under its direct orders to serve the President’s political interests and whims, rather than as a center for the control of disease.

If we take as given that Trump is planning a big October Surprise, I’ll take ‘issues an order to distribute the vaccine early’ over every other alternative I can come up with, except for the possibility that it might actually work and win him the election.

The thing is, he’s right.

He’s not right for the right reasons. He’s not understanding the situation and doing the Bayesian calculus and realizing that early distribution of a known-to-be-safe vaccine is a huge net benefit to America and the world, and we should follow in the footsteps of China and Russia and get on that. Of course not. That’s not how he thinks. 

He will issue the order, if he issues it, because he thinks it will help him get reelected, full stop, without caring about whether it is a good idea.

That doesn’t make him wrong. If you think he’s wrong, as Tyler Cowen says, show your work.

And if and when he does issue that order, if you are Biden, how do you respond?

If Biden says ‘yes, that was the right thing to do’ then obviously it’s a huge Trump win (and also a win for the world, but in context neither side cares about that).

If Biden says ‘no, that’s not a responsible thing to do’ then Trump is the one who is doing the only action that matters to get us out of that, and Biden is the one not doing it because “medical ethics.” 

Thus, it would be a great play even if there were risks that made it a bad idea – it’s not like those risks could be properly communicated to the public. Nor could a lack of such risks be communicated to the public, especially over the objections of the Very Serious People, but also even with their full support. A huge percentage of Americans don’t want the vaccine, sight unseen, even under the best conditions. 

I wonder why the public has such distrust for public health authorities and doesn’t want to inject strange things into their bodies on such authorities’ say so. It’s not like they are constantly lying to us about pretty much everything.    

Health Experts Warn of Dangers of Ignoring Health Experts

What’s new with those vaccines in Russia and China? I can’t find any news on whether they’re working, but we do have news that the Very Serious People are Very Concerned.

Whenever people who will always have objections object to something, it’s important to remember that you should not expect to update your beliefs in any particular direction. Health experts will warn about the dangers of doing the thing their ‘ethics’ say not to do, with whatever case they think is the strongest, whether or not they have a good case. So when you see them make their case, you should update based on whether their case is stronger or weaker than expected. If they make terrible arguments that are worse than you expect, you should update in favor of there not being good objections.

In this case, it seems there are two concerns.

The first concern is that the vaccine is based on the common cold. Therefore, those who have had the wrong common cold will already have an immune response ready, and the vaccine won’t work on those people. This might reduce how often the vaccine is effective. 

That’s a reasonably good objection. It’s a great objection if you’re choosing which approach to use. As an objection to deploying the vaccine versus doing nothing, though, it’s rather weak. If the vaccine often does nothing, then the calculus on whether the vaccine is a net benefit is unlikely to change much. Every extra immune person helps, and the costs of deployment are trivial relative to that benefit. What you’re looking for is active downsides, not reduced frequency of upside.

The second objection is that a previous HIV vaccine that used some similar characteristics in its delivery ended up making people more vulnerable to HIV, so they warn that this too could make people more vulnerable to HIV.

I know complete and utter BS when I see it. The previous HIV vaccine put people at risk for HIV because it was trying to be an HIV vaccine and messed up. Not because it so generically forked with the immune system that it happened to make HIV worse. This vaccine is trying to be a Covid-19 vaccine. It could plausibly make Covid-19 worse. But if Very Serious People are talking about HIV risk here, it means they have no cards to play. Update accordingly.

University of Arizona Kind of Solves Covid-19

Seriously, it kind of did. Check this out.

It turns out, if you actually care about solving the problem, you can test waste water from each building, and then test everyone in the building when the water tests positive, thus catching cases before they have much chance to spread. Do that consistently, using the quick tests that are actually easy and dirt cheap, and it’s over. That doesn’t mean the University of Arizona is in the clear, because no one else is doing it and they therefore have to constantly worry about reintroduction. But if we all followed this procedure? It would all be over in a month.

This has been your periodic reminder of The Kinds of Things a Functional Civilization Would Do.

As opposed to, say, not telling people when a classmate tests positive.

What About Those Reinfection Cases?

This week’s periodic panic about lack of immunity was unique because it had actual bad news to consider. Normally people don’t need actual bad news, and mumble something about how we can’t be sure how long things will last in order to sound serious. In the past, this has somehow kept happening while there were actually zero reports of reinfections.

Now there are a non-zero number of reports of reinfections, which led to a moderately larger amount of panic and fear mongering. It turns out that the panic’s frequency and intensity do respond somewhat to actual news. So how worried should we be about these new reports?

As usual, the news article starts out with the scariest take it’s willing to dish out, with bullet points like “These reinfection cases demonstrate how immunity to the novel coronavirus is somewhat transient, especially with mild infections.” But overall, I’m actually very happy with the lack of mongering going on here from Business Insider, so positive reinforcement to them. 

They get to the right answer here, which is definitely ‘not very worried.’ 

What these cases show is not that immunity is short lived. They show that a very small number of people don’t get complete immunity when they are infected. 

But that is neither surprising nor particularly impactful. A system of containment doesn’t care much about a 1% failure rate given how this virus works. With a total of 6 known cases worldwide and large incentives to find them, there’s no way the number of people who don’t regain full immunity is enough to be worth worrying about. It shouldn’t impact how anyone lives their life at least until after they have symptoms again. And in most of these cases, the secondary infections were mild anyway. 
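A rough back-of-envelope makes the point; the world case count and the underdetection multiplier below are my own loose assumptions, chosen to be pessimistic:

```python
# Even multiplying the six known reinfections by a huge factor for
# undetected cases leaves the failure rate far below the ~1% that a
# containment system can shrug off.

known_reinfections = 6
confirmed_cases = 25_000_000  # assumed rough world total, early Sep 2020
underdetection = 1_000        # assume 999 of every 1000 reinfections are missed

observed_rate = known_reinfections / confirmed_cases
pessimistic_rate = observed_rate * underdetection

print(f"observed ~{observed_rate:.1e}, pessimistic ~{pessimistic_rate:.1e}")
```

Even on those deliberately pessimistic assumptions, the rate comes out around 0.024%, well under one percent.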

What this definitely doesn’t mean is that we now have to suddenly worry about immunity fading quickly. In these cases, the second infection happened quickly, often within a month or so. We know for sure that immunity almost always lasts far longer than that. So this isn’t people who got immunity and then lost it, it’s people in the small group who were never immune in the first place. Which we’d prefer didn’t happen, sure, but isn’t impactful.

If we suddenly had six new cases, all of which had their first infection in February or March and their second one in August, then I’d be much more worried that five or six months was enough to start to meaningfully degrade immunity. That’s not what we saw, so six months is insufficient to do this. We can assume that for practical purposes immunity lasts a minimum of seven months, and then apply Lindy, and assume that the end of that is where things begin to be a problem. Which should be enough time to get the vaccine online. Excellent.

This was worse immunity news than I expected this week. But overall, does this week make us think immunity is shorter (because we found some reinfection cases) or longer (because almost everyone stayed immune one more week)? I don’t think that is clear.

Physical World Does Not Think Six Feet Is a Magic Distance

People claiming with presumably straight faces to be ‘researchers’ used that authority to get into the paper that perhaps the six foot rule could use a bit of nuance. That it matters how long you’re there for, indoors or outdoors, poorly or well ventilated, silent versus spoken versus shouting or singing, dense versus sparse crowd. If I had to choose three additional considerations when measuring risk and deciding how far to keep away and whether to require masks, then those are probably the correct variables to consider. And all their directional assessments seem right. So, good job, I guess. As far as it goes.

If it makes people actually think about their physical situations a bit and optimize somewhat, that would be great. Hopefully the nuance is net helpful. 

If you want a lot of nuance on what to be doing and how to measure risk, the microCOVID project is one option. I had the chance to comment on their document and models a bit. They didn’t take every suggestion I made, but they are definitely trying to come up with reasonable answers and provide practical help. If that seems interesting or valuable, check it out for another opinion. 

A note for those who try the microCOVID project is that their basic system of ‘use a budget to allocate risk’ originates in the need to find a policy that roommates can all live with and follow, without anyone feeling cheated or causing anything too perverse. If you have different binding constraints, different strategies will make sense for you.
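As a toy illustration of that budget mechanism (all activity costs and the budget itself are hypothetical numbers, not values from the microCOVID model):

```python
# Sketch of a shared risk budget: each activity costs some number of
# microCOVIDs, and the house agrees nobody exceeds the weekly cap.

WEEKLY_BUDGET = 200  # microCOVIDs; an assumed house rule

activity_cost = {
    "masked grocery run": 30,
    "outdoor walk with a friend": 10,
    "indoor dinner party": 500,
}

def check_week(planned, budget=WEEKLY_BUDGET):
    """Return (total spent, whether the plan fits the budget)."""
    spent = sum(activity_cost[a] for a in planned)
    return spent, spent <= budget

print(check_week(["masked grocery run", "outdoor walk with a friend"]))
```

Here the two low-risk activities together cost 40 of the 200-point budget, while the hypothetical dinner party alone would blow through it; the point of the shared ledger is that everyone can see nobody is cheating.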

Important Things Are More Important

Periodically we see outrage like this about the hypocrisy of letting Very Important People like celebrities or the rich get away with doing things that the rest of us are told not to do. It seems that while mostly not allowing concerts, New York allowed the Video Music Awards to completely break a lot of the rules. 


If anything, the report shows a decided shortage of such hypocrisy. The event had to be spread out throughout the city, and extensive precautions were taken for spots that lasted only a few minutes. I am guessing that everyone involved was tested in advance, probably multiple times. And the result was then shown to millions of people. Not my thing, but in the same way that sports must go on, letting other things bring joy to millions in exchange for the exposure of dozens or hundreds is obviously a trade-off that we want to make.

People are so against doing things that make sense, and so unwilling to tolerate ‘hypocrisy’ or ‘inequality,’ that they think you not being allowed to have a private dance party means the VMAs should stop. That we shouldn’t look at the value of an activity in dollars or happiness, and compare it to the risks involved, when deciding what to do, to maybe help make this lockdown liveable for all and help the economy survive.

Or that we shouldn’t give extraordinary flexibility to those willing to take extraordinary precautions. If you have the time and money to test everyone and make something safe, I don’t care if it otherwise violates guidelines.

The key is that this needs consensus that the exception is a reasonable one. That it involves minimal risk given the benefits involved, that precautions were taken, that it is an efficient allocation of risk with a solid story attached. Otherwise, even if it’s a good idea, it erodes people’s willingness to follow the rules.

I would hope that the ‘it’s being broadcast to millions of people who want to see it’ rule together with the ‘it’s worth enough to spend what it takes to get everyone tested beforehand and take all the precautions’ rule would cover the right times to make an exception pretty well. 

If both of those apply, do it. If they don’t both apply, respect the rules.

Or, if there’s something you think is too important and has to be done anyway, understand that not doing so will undermine the rules themselves and decide whether it is worth it.

Contrast this with, say, Nancy Pelosi going to a hair salon and not taking precautions. There is zero excuse for that. The outrage is completely justified.


"How to Talk About Books You Haven't Read", by Pierre Bayard

September 3, 2020 - 16:05
Published on September 3, 2020 1:05 PM GMT

Salticidae Philosophiae is a series of abstracts, commentaries, and reviews on philosophical articles and books.

Somewhere out there is a universe where my first post here was How to Talk About Books You Haven't Read, and ours is flawed by comparison. Still, I've gotten to it at last, and here we are, with everything you need to know in order to talk about How to Talk About Books You Haven't Read, without having read it.

I can only hope that Pierre Bayard gets an inexplicable warm feeling in his chest at the moment that I publish this post.

  • We do not have access to, or an unfiltered "true" understanding of, any text.
  • The first reason for this is that our experience of any text, and our understanding of that text, is filtered by factors like our experiences with other books, our preconceptions, etc.
  • The second reason is that, even as we are reading a book, we fail to have a perfect recollection of what we have read, transforming it into a "book we have (partly) forgotten."
  • More important than having read a book is being able to understand its content, its relation to other books, and so on, which are all theoretically possible without even picking up the book.
  • Do not be afraid to talk about a book that you have not personally read.
  • Do, however, be upfront about the degree to which you are familiar with it, and in what ways.

The preface is worth noting for this passage:

As I will reveal through my own case, authors often refer to books of which we have only scanty knowledge, and so I will attempt to break with the misrepresentation of reading by specifying exactly why I know of each book.

The four abbreviations which Bayard uses are:

  1. UB, or books unknown to me.
  2. SB, or books I have skimmed.
  3. HB, or books I have heard of.
  4. FB, or books I have forgotten.

Bayard also uses the symbols --, -, +, and ++ to denote various degrees of negative and positive opinion. Together with the previous abbreviations (which will be elaborated on in the next section), Bayard would like to see this system more widely adopted.

  • Bayard could have been clearer (here or in the upcoming chapters) about the demarcations between each category, however. It's unclear to me where the dividing line should be drawn between UB and HB, or (to a lesser extent) SB and FB.
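For concreteness, here is one possible encoding of Bayard's notation as a small data type (my own sketch; nothing like this appears in the book):

```python
from dataclasses import dataclass
from enum import Enum

class Familiarity(Enum):
    """Bayard's four familiarity categories."""
    UB = "book unknown to me"
    SB = "book I have skimmed"
    HB = "book I have heard of"
    FB = "book I have forgotten"

OPINION_MARKS = ("--", "-", "+", "++")  # strongly negative to strongly positive

@dataclass
class Citation:
    title: str
    familiarity: Familiarity
    opinion: str  # one of OPINION_MARKS

    def label(self) -> str:
        return f"{self.title} ({self.familiarity.name} {self.opinion})"

print(Citation("Paradise Lost", Familiarity.HB, "++").label())
```

One could extend this with a set of `Familiarity` values per book, since (as noted below) a single book can be skimmed, heard of, and forgotten all at once.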
Ways of Not Knowing

The primary problem is that we have practical access only to a certain number of books (and the internet ultimately makes this problem worse, not better, because we can access more books than ever before but they also exist in a far greater number). Reading is also non-reading: every decision to pick up a book is also a decision to not pick up every other book.

Sometimes what we are talking about is not even the book itself but a fantasy of a book. We can exchange comments about a book and build beliefs (accurate or otherwise) about a book without having read it, or even build new beliefs about a book we have read in response to the comments of other people.

The collective library is the cultural discourse and context in which books exist. A book may be UB, or unknown to me, but I may still be able to place it in the collective library or understand its relevance. For example, I have never read Paradise Lost, but I know what's going on when someone says that it is better to reign in Hell than to serve in Heaven. I can, furthermore, not just understand references to it but make valid references of my own to that text. Another example might be the Arthurian mythos: Very few of us have read The Once and Future King, let alone any other Arthurian texts, but most of us can answer various questions about King Arthur.

SB, or skimmed books, are just that, but Bayard believes that this can be valuable, and sometimes even more valuable than reading the book more closely. We can skim linearly or circuitously, and someone who has skimmed may still get the essential facts. For example, one doesn't need to study the Meditations of Marcus Aurelius very long to understand that the central message is this: We need to focus on controlling ourselves rather than worrying about outside circumstances which we cannot control, and also Marcus Aurelius would really prefer to be dead.

Next are HB, or books which one has heard of. It is possible to get enough information about a book to meaningfully engage on it. Returning to a previous example, I am not only culturally literate with regard to Paradise Lost but could talk for a fair while on its plot, characters, themes, and so on, between what I have heard about the book in particular and what I know of its author, John Milton, and the ideas which would have appealed to him, etc etc. I can follow a conversation on Paradise Lost, and even start one.

Last of all there are the FB, or forgotten books. Only a few people have perfect memories, so for the majority of us, every book we have read is a book which we have, to one degree or another, forgotten. For the reasons described in this book, Bayard prefers to not refer to "books which I have read," or which he has "not read," but FB may be the abbreviation which most closely approximates the first.

It is worth noting that, after UB, the most common reference is to SB and HB together, and that there are also books which are SB, HB, and FB. Also, it is absolutely emblematic of this book that Bayard feels no shame in assigning negativity or positivity to UB in exactly the same manner as the other markings.

  • "For a true reader, one who cares about being able to reflect on literature, it is not any specific book that counts, but the totality of all books." pg 30 para 4
  • Though the object of this post is to let you get away without reading the book, I believe that it is worth reading just for the pages on Michel de Montaigne, who writes, "To compensate a little for the treachery and weakness of my memory, so extreme that it has happened to me more than once to pick up again, as recent and unknown to me, books which I had read carefully a few years before and scribbled over with my notes, I have adopted the habit for some time now of adding at the end of each book (I mean of those I intend to use only once) the time I finished reading it and the judgment I have derived of it as a whole, so that this may represent to me at least the sense and general idea I had conceived of the author in reading it."
Literary Confrontations

If we have both skimmed a book, then we might be talking about different books, really. It's possible for two people to skim different books and come away with the same general idea (consider how derivative and generic many fantasy novels are, for example), or to skim the same book but come away with different enough ideas about that book that, if not for the title, they might not realize in later conversation that they had skimmed the same book. 

The "inner library" is the set of books around which you in particular are constructed. Each of us is the sum of our own inner library. The inner book is a "fragmentary and reconstituted object" which is not the book itself, as an objective text existing outside yourself, but the book as you understood it: Roger Ebert and I may both go to the theater together (or we might have done, before he died) but we will, in this sense, be watching very different films.

The "inner book" is personal to us. It is the filter which encounters every new text and determines which elements we consciously perceive, and how we interpret those elements. The reason that Ebert and I will have watched different films is that we have different inner books. Writing is the act of bringing, in one form or another, our inner book into the world, but because our hands are imperfect and because everyone has their own inner book, this is usually unsuccessful.

  • One chapter is mostly a retelling of "Shakespeare in the Bush," which you can read in its complete form here.
Ways of Behaving

Do not be ashamed over a failure to have read a book. We need to be honest with ourselves and others about the degree to which we have not read things.

Talking about books is, of course, not reading, but the virtual library is the space in which we discuss books, and in which our inner books meet (or try to meet). Because it is a space of discussion, and none of us actually has direct access to the text itself (mediated, as the act of reading is, by our personal filters), the virtual library does not contain any "objectively existing" books but only a plethora of subjective experiences of books. There is currently a great resistance to the idea of (to use my own wording) calcifying the virtual library and acknowledging this fact.

The book is not the thing. The text can be changed by the conversation. The discussion surrounding the book is also part of the book. Books are reinvented in the reading. There are "Phantom books" based on mistaken recollections.

  • Sometimes reading is harmful. Oscar Wilde had three categories: books to read, books to reread, and books to convince people to not read.
  • Creators are critics and critics are creators. Talking about things is an act of creation.
  • "If it is true that he hasn't 'read' Hamlet, Ringbaum certainly has at his disposal a great deal of information about it and, in addition to Laurence Olivier's movie adaptation, is familiar with other plays by Shakespeare. Even without having had access to its contents, he is perfectly well equipped to gauge its position within the collective library." pg 124-125 para 2.
Favorite passage

"This encounter with the infinity of available books offers a certain encouragement not to read at all. Faced with a quantity of books so vast that nearly all of them must remain unknown, how can we escape the conclusion that even a lifetime of reading is utterly in vain?" [pg 6 | para 2]

Additional comments

It might be worth talking about the Bible (or rather, our ways of reading and talking about the Bible) in order to give further examples of what Bayard means.

First of all, any such discussion "of what Bayard means" is already running into forgotten books and virtual libraries and so forth. I'm only making this post several years after I originally purchased Bayard's book and read it for the first time, and I think I read it a couple more times after that, highlighting it and adding marginalia at least once in all those readings (and on another occasion I read bits and pieces, making it, at that point, a Skimmed Book). Then I began to read it again, almost two years ago, and this time took more complete notes until, two-thirds of the way through, something distracted me and I set the book aside till recently, when I finished the last part of the book at last and...waited several more weeks before I returned to my notes and created this post as you see it now.

How to Talk About Books You Haven't Read is very definitely a Forgotten Book for me, and in some ways it is also a Heard Book: I've only read one other blog post on this book, but all these notes which "I" made are from people who are, to varying degrees, arguably not myself. To what extent do they fill the same role as totally separate persons who are merely telling me things which they recall, and which, at this point, I no longer do?

This is a book which I have skimmed, heard about, and forgotten. It is a book which was at some point unknown to you, and which you have now heard about. When I talk about "what Bayard means," what you are getting is a filtered conception of my filtered conception of what Bayard means, and at every step of the way there has been a transformation of information, from the point that Bayard put pen to paper (or finger to keyboard) to the point that you are reading and interpreting these words.

Now, the Bible. Most English-speaking people are familiar with various passages and references. We know about swords turning to plowshares and lions laying down with lambs, what is meant when someone reminds you that some politician or priest does not "walk on water," and when someone refers to the Book of Genesis they are talking about one of the Bible's constituent parts. This is the Bible in the context of the cultural library.

We all have a personal interpretation of the Bible, to whatever extent we are familiar with it. Slaveholders and abolitionists both referred to the Bible in support of their respective positions, and we can say that there was outright willful misrepresentation to one extent or another, in some number of cases, but that leaves out the role of motivated reasoning in the production of a genuinely-held interpretation. At least some slaveholders truly believed that there was a Biblical mandate for that institution, even if they only came to that belief to support their position, rather than arriving at their position as a natural consequence of this interpretation. This is the "inner Bible" as it existed in each person's inner library, and something that many literalists simply do not understand or refuse to accept. Even a "literal" interpretation of the Bible in each of its original languages is going to result in multiple inner Bibles.

This discourse about "the Bible," when each of us is speaking according to our inner Bibles, then produces virtual Bibles, which may be very far removed from the objectively existing Bible. That objective Bible probably exists and may even be reachable, but if we reached it we could not know we had reached it, and we still could not transmit it to others. Our conversation about the Bible, and the virtual Bible which that conversation alters by its very existence, may in fact be more important than any objectively existing Bible could be.

Author biography

Pierre Bayard is a professor of French literature at the University of Paris VIII and a psychoanalyst. He is the author of Who Killed Roger Ackroyd?, and many other books.


What's the best overview of common Micromorts?

September 3, 2020 - 05:39
Published on September 3, 2020 2:39 AM GMT

I want to get generally oriented on how various common risks compare against each other. I've seen some of this come up in recent Covid discussion, but I'm interested in a good article that's like "Here's all the most dangerous stuff it's likely that you do, and here's how it breaks down for various sub-activities."

This question was triggered by the first few Google results not being that good.


Sunday September 6, 12pm (PT) — Casual hanging out with the LessWrong community

September 3, 2020 - 05:08
Published on September 3, 2020 2:08 AM GMT

This Sunday at 12pm (PDT), we will have another one of our weekly online meetups. This time we will simply hang out, without much of an introduction. In the future we will be back with our curated talks, but my guess is that a more casual hangout is what I would be most excited about this week.

If you're a curated author and interested in giving a 5-min talk at a future event, which will then be transcribed and edited, sign up here.


We will be meeting in gather.town at a URL that will be announced later. There will also be a backup Zoom meeting for anyone for whom gather.town doesn't work out.


When? Sunday September 6, 12pm (PT)

Where? gather.town (exact URL to be announced later), with backup Zoom room


Study Group for Progress – 50% off for LessWrongers

September 3, 2020 - 03:17
Published on September 3, 2020 12:17 AM GMT

Recently I announced the Study Group for Progress: a weekly discussion/Q&A on the history, economics and philosophy of progress, with featured guests including Robert Gordon (Rise & Fall of American Growth), Margaret Jacob, and Richard Nelson.

LessWrong members can now get a 50% discount on registration by using this link. Register by Monday, September 7.


[AN #115]: AI safety research problems in the AI-GA framework

September 2, 2020 - 20:10
Published on September 2, 2020 5:10 PM GMT

Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.
Audio version here (may not be up yet).

HIGHLIGHTS

Open Questions in Creating Safe Open-ended AI: Tensions Between Control and Creativity (Adrien Ecoffet et al) (summarized by Rohin): One potential pathway to powerful AI is through open-ended search, in which we use search algorithms to search for good architectures, learning algorithms, environments, etc. in addition to using them to find parameters for a particular architecture. See the AI-GA paradigm (AN #63) for more details. What do AI safety issues look like in such a paradigm?

Building on DeepMind’s framework (AN #26), the paper considers three levels of objectives: the ideal objective (what the designer intends), the explicit incentives (what the designer writes down), and the agent incentives (what the agent actually optimizes for). Safety issues can arise through differences between any of these levels.

The main difference that arises when considering open-ended search is that it’s much less clear to what extent we can control the result of an open-ended search, even if we knew what result we wanted. We can get evidence about this from existing complex systems, though unfortunately there are not any straightforward conclusions: several instances of convergent evolution might suggest that the results of the open-ended search run by evolution were predictable, but on the other hand, the effects of intervening on complex ecosystems are notoriously hard to predict.

Besides learning from existing complex systems, we can also empirically study the properties of open-ended search algorithms that we implement in computers. For example, we could run search for some time, and then fork the search into independent replicate runs with different random seeds, and see to what extent the results converge. We might also try to improve controllability by using meta learning to infer what learning algorithms, environments, or explicit incentives help induce controllability of the search.
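The fork-and-replicate experiment can be sketched in a few lines. Everything here (the toy objective, the hill-climbing "search", the seeds, the convergence measure) is an illustrative stand-in for whatever open-ended search one actually runs, not anything from the paper:

```python
import random

def hill_climb(state, rng, steps):
    """A stand-in for an open-ended search: hill climbing on a toy 1-D objective."""
    def score(x):
        return -(x - 3.0) ** 2  # toy objective with its optimum at x = 3
    for _ in range(steps):
        candidate = state + rng.gauss(0, 0.5)
        if score(candidate) > score(state):
            state = candidate
    return state

# Run the search for a while, then fork it into independent replicate
# runs that continue from the same point with different random seeds.
forked_state = hill_climb(0.0, random.Random(0), steps=50)
replicates = [hill_climb(forked_state, random.Random(seed), steps=200)
              for seed in range(1, 6)]

# If the replicates land close together, that is (weak) evidence that
# the search is predictable/controllable from the fork point onward.
spread = max(replicates) - min(replicates)
print(f"replicate results: {replicates}")
print(f"spread: {spread:.3f}")
```

On a convex toy objective like this the replicates converge tightly; the interesting empirical question is how large the spread gets for genuinely open-ended searches.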

The remaining suggestions will be familiar to most readers: they suggest work on interpretability (that now has to work with learned architectures), better benchmarks, human-in-the-loop search, safe exploration, and sim-to-real transfer.

Rohin's opinion: I’m glad that people are paying attention to safety in this AGI paradigm, and the problems they outline seem like reasonable problems to work on. I actually expect that the work needed for the open-ended search paradigm will end up looking very similar to the work needed by the “AGI via deep RL” paradigm: the differences I see are differences in difficulty, not differences in what problems qualitatively need to be solved. I’m particularly excited by the suggestion of studying how particular environments can help control the result of the open-ended search: it seems like even with deep RL based AGI, we would like to know how properties of the environment can influence properties of agents trained in that environment. For example, what property must an environment satisfy in order for agents trained in that environment to be risk-averse?


Model splintering: moving from one imperfect model to another (Stuart Armstrong) (summarized by Rohin): This post introduces the concept of model splintering, which seems to be an overarching problem underlying many other problems in AI safety. This is one way of more formally looking at the out-of-distribution problem in machine learning: instead of simply saying that we are out of distribution, we look at the model that the AI previously had, and see what model it transitions to in the new distribution, and analyze this transition.

Model splintering in particular refers to the phenomenon where a coarse-grained model is “splintered” into a more fine-grained model, with a one-to-many mapping between the environments that the coarse-grained model can distinguish between and the environments that the fine-grained model can distinguish between (this is what it means to be more fine-grained). For example, we may initially model all gases as ideal gases, defined by their pressure, volume and temperature. However, as we learn more, we may transition to the van der Waals equation, whose correction constants differ for different types of gases, and so an environment like “1 liter of gas at standard temperature and pressure (STP)” now splinters into “1 liter of nitrogen at STP”, “1 liter of oxygen at STP”, etc.
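To make the gas example concrete, the single coarse-grained law splinters into a family of gas-specific laws (this is just standard thermodynamics, not notation from the post):

```latex
% Coarse-grained model: one law for all gases
PV = nRT
% Fine-grained model: the van der Waals equation, where the constants
% a and b must be measured separately for each gas (nitrogen, oxygen, ...)
\left(P + \frac{a n^2}{V^2}\right)\left(V - nb\right) = nRT
```

The one-to-many mapping is visible in the constants: every environment the first equation could describe corresponds to many environments under the second, one per choice of (a, b).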

Model splintering can also apply to reward functions: for example, in the past people might have had a reward function with a term for “honor”, but at this point the “honor” concept has splintered into several more specific ideas, and it is not clear how a reward for “honor” should generalize to these new concepts.

The hope is that by analyzing splintering and detecting when it happens, we can solve a whole host of problems. For example, we can use this as a way to detect if we are out of distribution. The full post lists several other examples.

Rohin's opinion: I think that the problems of generalization and ambiguity out of distribution are extremely important and fundamental to AI alignment, so I’m glad to see work on them. It seems like model splintering could be a fruitful approach for those looking to take a more formal approach to these problems.

An Architectural Risk Analysis of Machine Learning Systems: Towards More Secure Machine Learning (Gary McGraw et al) (summarized by Rohin) (H/T Catherine Olsson): One systematic way of identifying potential issues in a system is to perform an architectural risk analysis, in which you draw an architecture diagram showing the various components of the system and how they interact, and then think about each component and interaction and how it could go wrong. (Last week’s highlight (AN #114) did this for Bayesian history-based RL agents.) This paper performs an architectural risk analysis for a generic ML system, resulting in a systematic list of potential problems that could occur.

Rohin's opinion: As far as I could tell, the problems identified were ones that we had seen before, but I’m glad someone has gone through the more systematic exercise, and the resulting list is more organized and easier to understand than previous lists.


Forecasting Thread: AI Timelines (Amanda Ngo et al) (summarized by Rohin): This post collects forecasts of timelines until human-level AGI, and (at the time of this writing) has twelve such forecasts.

Roadmap to a Roadmap: How Could We Tell When AGI is a ‘Manhattan Project’ Away? (John-Clark Levin et al) (summarized by Rohin): The key hypothesis of this paper is that once there is a clear “roadmap” or “runway” to AGI, it is likely that state actors would invest resources in achieving it on the scale of the Manhattan Project. The fact that we do not see signs of such investment now does not imply that it won’t happen in the future: currently, there is so little “surface area” on the problem of AGI that throwing vast amounts of money at it is unlikely to help much.

If this were true, then once such a runway is visible, incentives could change quite sharply: in particular, the current norms of openness may quickly change to norms of secrecy, as nations compete (or perceive themselves to be competing) with other nations to build AGI first. As a result, it would be good to have a good measure of whether we have reached the point where such a runway exists.

Read more: Import AI summary


State of AI Ethics (Abhishek Gupta et al) (summarized by Rohin): This report from the Montreal AI Ethics Institute has a wide variety of summaries on many different topics in AI ethics, quite similarly to this newsletter in fact.


Decision Points in AI Governance (Jessica Cussins Newman) (summarized by Rohin): While the last couple of years have seen a proliferation of “principles” for the implementation of AI systems in the real world, we are only now getting to the stage in which we turn these principles into practice. During this period, decision points are concrete actions taken by some AI stakeholder with the goal of shaping the development and use of AI. (These actions should not have been predetermined by existing law and practice.) Decision points are the actions that will have a disproportionately large influence on the field, and thus are important to analyze. This paper analyzes three case studies of decision points, and draws lessons for future decision points.

First, we have the Microsoft AETHER committee. Like many other companies, Microsoft has established a committee to help the company make responsible choices about its use of AI. Unlike e.g. Google’s AI ethics board, this committee has actually had an impact on Microsoft’s decisions, and has published several papers on AI governance along the way. The committee attributes its success in part to executive-level support, regular opportunities for employee and expert engagement, and integration with the company’s legal team.

Second, we have the GPT-2 (AN #46) staged release process. We’ve covered (AN #58) this (AN #55) before (AN #58), so I won’t retell the story here. However, this shows how a deviation from the norm (of always publishing) can lead to a large discussion about what publication norms are actually appropriate, leading to large changes in the field as a whole.

Finally, we have the OECD AI Policy Observatory, a resource that has been established to help countries implement the OECD AI principles. The author emphasizes that it was quite impressive for the AI principles to even get the support that they did, given the rhetoric about countries competing on AI. Now, as the AI principles have to be put into practice, the observatory provides several resources for countries that should help in ensuring that implementation actually happens.

Read more: MAIEI summary


Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Noam Brown, Anton Bakhtin et al) (summarized by Rohin): AlphaZero (AN #36) and its predecessors have achieved impressive results in zero-sum two-player perfect-information games, by using a combination of search (MCTS) and RL. This paper provides the first combination of search and deep RL for imperfect-information games like poker. (Prior work like Pluribus (AN #74) did use search, but didn’t combine it with deep RL, instead relying on significant expert information about poker.)

The key idea that makes AlphaZero work is that we can estimate the value of a state independently of other states without any interaction effects. For any given state s, we can simulate possible future rollouts of the game, and propagate the values of the resulting new states back up to s. In contrast, for imperfect information games, this approach does not work since you cannot estimate the value of a state independently of the policy you used to get to that state. The solution is to instead estimate values for public belief states, which capture the public common knowledge that all players have. Once this is done, it is possible to once again use the strategy of backing up values from simulated future states to the current state, and to train a value network and policy network based on this.
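A toy numeric sketch of why values must attach to beliefs rather than to raw states (the two "hands" and their payoffs are made up for illustration):

```python
def public_belief_value(belief, values):
    """Value of a public belief state: the expectation of the private-state
    values under the belief distribution over those private states."""
    return sum(p * v for p, v in zip(belief, values))

# Two possible private hands, with hypothetical values for the acting player.
hand_values = [1.0, -1.0]

# The same public state is worth different amounts under different beliefs,
# and the belief depends on the policy that led here -- so, unlike in
# perfect-information games, a raw state has no policy-independent value.
print(public_belief_value([0.5, 0.5], hand_values))  # uniform belief
print(public_belief_value([0.9, 0.1], hand_values))  # skewed belief
```

Backing up values over public belief states restores the property AlphaZero relies on: the quantity being estimated no longer secretly depends on how play arrived there, because that dependence has been folded into the belief.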


AI Governance Project Manager (Markus Anderljung) (summarized by Rohin): The Centre for the Governance of AI is hiring for a project manager role. The deadline to apply is September 30.

FEEDBACK: I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.

PODCAST: An audio podcast version of the Alignment Newsletter is available, recorded by Robert Miles. Subscribe here:

Copyright © 2020 Alignment Newsletter, All rights reserved.
