
We Still Don't Know If Masks Work

Published on July 5, 2021 4:52 AM GMT


A recently published paper got attention by estimating that 100% mask wearing in the population would cause a 25% reduction in the effective transmission number (shortened to "transmissibility" throughout).

The study was observational, so inferring causality is difficult. Thanks to the excellent data availability, I was able to replicate the model and attempt to validate it.

Based on my analysis, this study does not provide good evidence for such a causal link, and there is reason to believe that the observed correlation is spurious.

Model Details:

The paper uses a fairly simple model in combination with MCMC sampling to arrive at its estimates of the effectiveness of various interventions. The model's simplicity allows the authors to combine disparate data sources into a single estimate that accounts for multiple possible effects.

In order to arrive at its estimate of transmissibility the model considers the following parameters.

  • Various NPIs such as school closing or restrictions on gathering at a regional level.
  • Regional Mobility (based on Google Mobility Reports)
  • Self Reported Mask Wearing (based on Facebook surveys)
  • A custom regional factor.
  • Random variation in transmissibility over time.

It then computes the most likely distribution of weights for each of these factors based on the likelihood of matching the observed cases and deaths.
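The factor structure above can be sketched as a log-linear model. This is a minimal sketch with invented names, weights, and a toy baseline; the paper's actual model fits these weights by MCMC against observed cases and deaths.

```python
import numpy as np

def effective_R(R0_region, npi_flags, npi_weights, mobility, w_mobility,
                mask_fraction, w_mask, noise=0.0):
    # each factor contributes additively in log space, i.e. multiplicatively to R
    log_R = (np.log(R0_region)
             - npi_flags @ npi_weights   # which NPIs are active, with learned weights
             - w_mobility * mobility     # reduction in mobility
             - w_mask * mask_fraction    # self-reported mask wearing
             + noise)                    # random variation over time
    return np.exp(log_R)

# the claimed 25% reduction at 100% mask wearing corresponds to w_mask = -ln(0.75)
w_mask = -np.log(0.75)
R = effective_R(1.2, np.array([1, 0]), np.array([0.1, 0.2]), 0.0, 0.0, 1.0, w_mask)
```

In this parameterization the "custom regional factor" is the per-region baseline R0_region, which is what the regional-effects section below is about.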

Observational Window:

One common criticism of this paper has been that the window was chosen specifically to make masks look better: if the analysis were extended to include the winter spike, masks would come out looking worse.

This was the critique I initially started out to analyze, but it does not appear to be true. When I extended the analysis window to December, masks still appeared to be effective. In fact the claimed effectiveness increased to 40% over that interval. The data on NPIs in the model is not as high quality for the full interval, but the effect being observed seems robust to a wider time range.

Regional Effects:

If mask wearing causes a drop in transmissibility, then regions with higher levels of mask wearing should observe lower growth rates; a figure in the paper makes this implication clear. In fact, the model does not make that prediction. Instead (holding NPIs and mobility constant), regions with higher mask wearing are predicted to have higher growth.

This occurs because the custom regional factor learned by the model actually correlates positively with mask wearing.

If we apply the expected 25% reduction in transmissibility for complete mask wearing, we see that the higher-masked regions still have slightly higher transmissibility.

Relative vs Absolute Mask Wearing:

This analysis of the data leads to a seeming contradiction. Within a given region, increased mask usage is correlated with lower growth rates (the 25% claimed effectiveness), but when comparing across regions masks seem to be ineffective. Depending on the causal story you want to tell, either of these claims could be true.

It is possible that people wear masks most in places where COVID is most transmissible. This would explain why masks don't appear effective when comparing across regions.

However it is also possible that the same factor causes both mask wearing to increase and transmissibility to decrease. For instance, if people wear masks in response to an observed spike in cases, then the population immunity caused by the spike will make masks appear to be effective even if they are not.

Model Variations:

To determine whether the effect was likely to be real, I tried two variations on the experiment.

Uniform Regional Transmissibility:

The first experiment was to force all regions to share the same base transmissibility. This produced an estimate that masks had an effectiveness of -10% (less effective than nothing). It validates the basic concern, but does not address the confounder of high-transmissibility regions causing mask wearing (which in turn causes transmissibility to decrease).

No Mask Variation:

The next experiment was to force each region to use a constant value for mask wearing (its average over the time period). Although this adds noise to the estimate, it should distinguish between the two effects. In this experiment masks appeared to reduce transmission, but the point estimate was only ~8% and was not significantly different from 0.

Data Extrapolation:

Another experiment is to look at the difference between April and May. During this period mask usage did increase (by around 8 percentage points) and the growth rate did decline. But again there was little to no correlation between how much mask usage increased and how much the growth rate declined.
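A minimal sketch of that check, with synthetic numbers standing in for the real per-region survey and case data (the region count and distributions here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_regions = 50

# hypothetical per-region changes between April and May
mask_increase = rng.normal(8, 3, n_regions)         # percentage points, ~8pp average
growth_decline = rng.normal(0.05, 0.02, n_regions)  # drawn independently of masks here

# if masks drove the decline, these two series should correlate;
# with independent draws (as in the real data, per the post) r stays near zero
r = np.corrcoef(mask_increase, growth_decline)[0, 1]
```

Both averages move in the "right" direction, yet the cross-region correlation carries no signal, which is the distinction the post is drawing.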


The failure of large absolute differences in mask wearing across regions to meaningfully impact the observed growth rate should make us skeptical of large claimed effects within a particular region. Clever statistical methods can make observational studies more powerful, but they are not sufficient to prove a causal story. A randomized trial underway in Bangladesh found that it could increase mask wearing by 30 percentage points. This randomization allows us to infer causality with much more confidence, and should provide a more definitive answer.


Anthropic Effects in Estimating Evolution Difficulty

Published on July 5, 2021 4:02 AM GMT

Thanks to Linchuan Zhang, Mauricio Baker, Jack Ryan and Carl Shulman for helpful comments and suggestions. Remaining mistakes are my own.

Epistemic status: “There is something fascinating about [anthropics]. One gets such wholesale returns of conjecture out of such a trifling investment of fact.” (Mark Twain, Life on the Mississippi)

Attempts to predict the development of artificial general intelligence (AGI) sometimes use biological evolution to upper bound the amount of computation needed to produce human level intelligence, e.g. in Ajeya’s use of biological anchors. Such attempts have mostly ignored observation selection effects. Shulman and Bostrom’s 2012 paper How Hard is Artificial Intelligence? analyzes how evolutionary arguments interact with various ways of reasoning about observation selection effects, drawing evidence from timings of evolutionary milestones and instances of convergent evolution. This post is a summary of key arguments advanced by the paper; see the original paper for citations.

More concretely, suppose evolutionary processes produce human-level intelligence on 1/10 or 1/10^1000 planets that develop life. Call the former case “easy” and the latter case “hard.” The paper attempts to determine whether the facts about evolution on Earth can distinguish between evolving intelligence being easy or hard.

Recall two common forms of anthropic reasoning:[1]

  • Self-Sampling Assumption (SSA): Observers should reason as if they were a random sample from all actually existent observers in their reference class. Observers should reason as if they have an equal probability of being in any world with observers, regardless of the number of observers in that world. Worlds where a higher fraction of observers are “like you” are more probable.
  • Self-Indication Assumption (SIA): Observers should reason as if they were a random sample from all possible observers. Observers should reason as if their probability of being in a world is proportional to the number of observers it contains. Worlds where a higher number of observers are "like you" are more probable.

For more discussion, see Katja's Anthropic Principles or Bostrom’s Anthropic Bias.

Key takeaways:

  • Universes where evolution is easy have vastly more intelligent observers than universes in which intelligence is hard. Since SIA a priori favors universes with many observers over universes with few, SIA assigns almost dogmatically high prior credence to evolution being easy.
    • Caveat: Intelligent life could run cheap computer programs that simulate observers, so any universe with a few instances of intelligent life could have an immense number of observers. SIA cannot strongly distinguish between universes where nearly all resources are used for observers, which is possible whether intelligence arises in every solar system or only a few times per galaxy. SIA therefore advises that evolution is relatively easy, but it can still be some orders of magnitude more difficult than once per solar system.
  • Given SSA, our observation of humans having evolved cannot alone distinguish between evolution being easy or hard. However, under both SSA and SIA, whether or not intermediaries to intelligence evolved multiple times can provide evidence about evolution’s overall difficulty. If evolution is easy, we would expect predecessors to intelligence to have evolved more than once. Evolutionary developments that have occurred multiple times cannot be subject to massive anthropic distortion.
    • The Last Common Ancestor (LCA) between humans and octopuses, estimated to have lived 560 million years ago, had an extremely primitive nervous system. However, octopuses have extensive central nervous systems and display sophisticated behaviors like memory, visual communication, and tool use.
      • Other examples of complex learning and memory include corvids (crows and ravens, LCA about 300 million years ago) and elephants (LCA about 100 million years ago).
    • Caveat: Non-human intelligence might not be as scalable as human intelligence, despite appearances of generality. If this is the case, these examples might not represent an instance of convergent evolution since they and humans might be substantially different.
    • Caveat: The LCA of octopuses and humans already contained a nervous system, which only evolved once in Earth’s history and thus might be arbitrarily difficult to evolve.
    • Caveat: The LCA of various other intelligent organisms and humans might have some undetected property that predisposed both organisms towards intelligence. Similarly, the LCA of all animals with eyes contained opsin proteins, which might have been extremely difficult to evolve.
  • If a task requires many steps of varying difficulty to accomplish and you condition on the task being done in about the time expected for accomplishing a single step, each of the steps is expected to take about the same amount of time. In effect, conditioning on the task being done in a short time prohibits any of the tasks taking very long and the truncated distributions for tasks of varying difficulty are similar.
    • Example: Suppose you need to pick two locks, where the first takes uniform [0, 20] seconds and the second uniform [0, 1000] seconds. If you condition on both taking <20 seconds, then you know the second lock took less than 20 seconds, so you cannot distinguish between the second lock taking uniform [0, 1000] seconds or uniform [0, 20] seconds.
    • Crucially, since the conditional distributions are roughly uniform, the time gap between when the last of these steps was completed and the end of the interval is equal in expectation to the time any of the hard steps took. Given that the Earth remains habitable for about a billion years and is about 5 billion years old, the expected number of “hard steps” in evolution is around 5. We can rule out hypotheses that postulate large numbers of evolutionarily hard steps because they predict intelligent life evolving much later than a billion years before the Earth stops being habitable. The LCA between humans and chimps was 6 million years ago, making it improbable that scaling up brains contained any hard steps.
      • Caveat: This argument cannot distinguish between evolutionary steps being hard (~1 billion years) or extremely hard (~100 billion years), since anthropic conditioning implies these would have taken the same amount of time historically.
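The lock example is easy to check by simulation, using the numbers from the example above:

```python
import random

random.seed(1)
easy, hard = [], []
for _ in range(200_000):
    t1 = random.uniform(0, 20)    # easy lock: uniform [0, 20] s
    t2 = random.uniform(0, 1000)  # hard lock: uniform [0, 1000] s
    if t2 < 20:                   # condition on both locks opening within 20 s
        easy.append(t1)
        hard.append(t2)

mean_easy = sum(easy) / len(easy)
mean_hard = sum(hard) / len(hard)
# both conditional means come out near 10 s, so the observed times
# cannot reveal which lock was the hard one
```

This is the anthropic point in miniature: conditioning on early success truncates the hard step's distribution until it looks just like the easy one's.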
  1. Armstrong roughly argues that SSA corresponds to making decisions under average utilitarianism while SIA corresponds to making decisions under total utilitarianism. ↩︎


Mauhn Releases AI Safety Documentation

Published on July 3, 2021 9:23 PM GMT

Mauhn is a company doing AGI research with a capped-profit structure and an ethics board (made up of people from outside the company). While there is a significant amount of AI/AGI safety research, there is still a gap in how to put it into practice for organizations doing AGI research. We want to help close this gap with the following blog post (written for an audience not familiar with AI Safety) and associated links to relevant documents:

Mauhn AI Safety Vision
This summarizes the most important points Mauhn will commit to towards building safe (proto-)AGI systems

Ethics section of Mauhn’s statutes
The statutes of Mauhn define the legal structure of the ethics board

Declaration of Ethical Commitment
Every founder, investor, and employee signs the declaration of ethical commitment before starting a collaboration with Mauhn

We hope that other organizations will adopt similar principles or derivatives thereof. We were a bit short on bandwidth for this first version, but we want to incorporate more feedback from the AI safety community in future versions of these documents. Please drop me an e-mail (berg@mauhn.com) if you'd like to contribute to the next versions of this work. We'll probably update the documentation about once per year.


An Apprentice Experiment in Python Programming

Published on July 4, 2021 3:29 AM GMT

A couple weeks ago Zvi made an apprentice thread. I have always wanted to be someone's apprentice, but it didn't occur to me that I could just ...ask to do that. Mainly I was concerned about this being too big of an ask. I saw gilch's comment offering to mentor Python programming. I want to level up my Python skills, so I took gilch up on the offer. In a separate comment, gilch posed some questions about what mentors and/or the community get in return. I proposed that I, as the mentee, document what I have learned and share it publicly.

Yesterday we had our first session.


I had identified that I wanted to fill gaps in my Python knowledge, two of which were package management and decorators.

Map and Territory

Gilch started by saying that "even senior developers typically have noticeable gaps," but building an accurate map of the territory of programming would enable one to ask the right questions. They then listed three things to help with that:

Documentation on the Python standard library. "You should at least know what's in there, even if you don't know how to use all of it. Skimming the documentation is probably the fastest way to learn that. You should know what all the operators and builtin functions do."

Structure and Interpretation of Computer Programs for computer science foundation. There are some variants of the book in Python, if one does not want to use Scheme.

CODE as "more of a pop book" on the background.

In my case, I minored in CS, but did not take operating systems or compilers. I currently work as a junior Python developer, so reading the Python standard library seems to be the lowest hanging fruit here, with SICP on the side, CODE on the back burner.


The rest of the conversation consisted of gilch teaching me about decorators.

Gilch: Decorators are syntactic sugar.

@foo
def bar():
    ...

means the same thing as

def bar():
    ...
bar = foo(bar)

Decorators also work on classes.

@foo
class Bar:
    ...

is the same as

class Bar:
    ...
Bar = foo(Bar)

An Example from pytest

At this point I asked if decorators were more than that. I had seen decorators in pytest:

@pytest.fixture
def foo():
    ...

def test_bar(foo):  # foo automatically gets evaluated inside the test
    ...

Does this mean that, when foo is passed in test_bar as a variable, what gets passed in is actually something like pytest.fixture(foo)?

Gilch identified that there might be more than decorators involved in this example, so we left this for later and went back to decorators.

Decorators, Example 1

I started sharing my screen, gilch gave me the first instruction: Try making a decorator.

def test_decorator(foo):
    return 42

@test_decorator
def bar():
    print('hi')

print(bar())

Then, before I ran the program, gilch asked me what I expected to happen when I run this program, to which I answered that hi and 42 would be printed to console. At this point, gilch reminded me that decorators were sugar, and asked me to write out the un-sugared translation of the function above. I wrote:

def bar():
    bar = test_decorator(bar)
    return bar

I ran the program, and was surprised by the error TypeError: 'int' object is not callable. I expected bar to still be a function, not an integer.

Gilch asked me to correct my translation of my program based on the result I saw. It took me a few more tries, and eventually they showed me the correct translation:

def bar():
    print('hi')

bar = test_decorator(bar)

Then I realized why I was confused--I had the idea that decorators modify the function they decorate directly (in the sense of modifying function definitions), when in fact the actions happen outside of function definitions.

Gilch explained: A decorator could [modify the function], but this one doesn't. It ignores the function and returns something else. Which it then gives the function's old name.

Decorators, Example 2

Gilch: Can you make a function that subtracts two numbers?


def subtract(a, b): return a - b

Gilch: Now make a decorator that swaps the order of the arguments.

My first thought was to ask if there was any way for us to access function parameters the same way we use sys.argv to access command line arguments. But gilch steered me away from that path by pointing out that decorators could return anything. I was stuck, so gilch suggested that I try return lambda x, y: y-x.

Definition Time

My program looked like this at this point:

@swap_order
def subtract(a, b):
    return a - b

def swap_order(foo):
    return lambda x, y: y - x

PyCharm gave me a warning about referencing swap_order before defining it. Gilch explained that decoration happened at definition time, which made sense considering the un-sugared version.

Interactive Python

Up until this point, I had been running my programs with the command python3 <file>. Gilch suggested that I run python3 -i <file> to make it interactive, which made it easier to experiment with things.

Decorators, Example 2

Gilch: Now try an add function. Decorate it too.


def swap_order(foo):
    return lambda x, y: y - x

@swap_order
def subtract(a, b):
    return a - b

@swap_order
def add(a, b):
    return a + b

Gilch then asked, "What do you expect the add function to do after decoration?" To which I answered that the add function would return the value of its second argument subtracted by the first argument. The next question gilch asked was, "Can you modify the decorator to swap the arguments for both functions?"

I started to think about sys.argv again, then gilch hinted, "You have 'foo' as an argument." I then realized that I could rewrite the return value of the lambda function:

def swap_order(foo): return lambda x, y: foo(y, x)

I remarked that we'd see the same result from add with or without the decorator. Gilch asked, "Is addition commutative in Python?" and I immediately responded yes, then I realized that + is an overloaded operator that would work on strings too, and in that case it would not be commutative. We tried with string inputs, and indeed the resulting value was the reverse-ordered arguments concatenated together.
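That string experiment can be reproduced directly, using the swap_order and add definitions from the session:

```python
def swap_order(foo):
    # returns a new function that calls foo with its arguments reversed
    return lambda x, y: foo(y, x)

@swap_order
def add(a, b):
    return a + b

result_num = add(1, 2)            # 3: numeric + is commutative
result_str = add("spam", "eggs")  # "eggsspam": string + is not
```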

Gilch: Now can you write a decorator that converts its result to a string?

I wrote:

def convert_to_str(foo): return str(foo())

It was not right. I then tried

def convert_to_str(foo): return str

and it was still not right. Finally I got it:

def convert_to_str(foo): return lambda x, y: str(foo(x, y))

There was some pair debugging that gilch and I did before I reached the answer. Looking at the mistakes I've made here, I see that I still hadn't grasped the idea that decorators would return functions that transform the results of other functions, not the transformed result itself.

Gilch: Try adding a decorator that appends ", meow." to the result of the function.

I verbalized the code in my head out loud, then asked how we'd convert the types of the function return value to string before appending ", meow" to it. Gilch suggested f"{foo(x, y)}, meow" and we had our third decorator.

We then applied decorators in different orders to show that multiple decorators were allowed, and that the order of decorators decided the order of application.


When we were writing the convert_to_str decorator, I commented that this would only work for functions that take in exactly 2 arguments. So gilch asked me if I was familiar with the term "unpacking" or "splat." I knew it was something like ** but didn't have more knowledge than that.

How Many Arguments

Gilch asked me, "How many arguments can print() take?" To which I answered "infinite." They then pointed out that it was different from infinite--zero would be valid, or one, or two, and so on. So the answer is "any number," and the next challenge would be to make convert_to_str work with any number of arguments.


We tried passing different numbers of arguments into print(), and sure enough it took any number of arguments. Here, gilch pointed out that print actually printed out a newline character by default, and the default separator was a space. They also pointed out that I could use the help(print) command to access the doc in the terminal without switching to my browser.


Gilch pointed out that I could use the command type(_) to get the type of the previous value in the console, without having to copy and paste.


To illustrate how splat worked, gilch gave me a few commands to try. I'd say out loud what I expected the result to be before I ran the code. Sometimes I got what I expected; sometimes I was surprised by the result, and gilch would point out what I had missed. To illustrate splat in arrays, gilch gave two examples: print(1,2,3,*"spam", sep="~") and print(1,2,*"eggs",3,*"spam", sep="~"). Then they showed me how to use ** to construct a mapping: (lambda **kv: kv)(foo=1, bar=2)

Dictionary vs. Mapping

We went off on a small tangent on dictionary vs. mapping because gilch pointed out that a dictionary is not the only type of mapping and a tuple is not the only type of iterable. I asked if there were other types of mapping in Python, and they listed OrderedDict as a subtype and the Mapping abstract class.

Parameter vs. Argument, Packing vs. Unpacking

At this point gilch noticed that I kept using the word "unpacking." I also noticed that I was using the terms "argument" and "parameter" interchangeably. It turns out the distinction is important here--the splat operator used on a parameter packs values into a tuple; used on an argument, it unpacks an iterable into separate values. For example, in (lambda a, b, *cs: [a, b, cs])(1,2,3,4,5), cs is a parameter and * packs the values 3, 4, 5 into a tuple; in print(*"spam", sep="~"), "spam" is an argument and * unpacks it into individual characters.
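The same two examples, condensed, with the results noted in comments:

```python
# on a parameter, * packs the leftover arguments into a tuple
packed = (lambda a, b, *cs: [a, b, cs])(1, 2, 3, 4, 5)  # [1, 2, (3, 4, 5)]

# on an argument, * unpacks an iterable into separate values
chars = [*"spam"]  # ['s', 'p', 'a', 'm']
```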


Gilch gave me another example: Try {'x':1, **dict(foo=2,bar=3), 'y':4}. I answered that it would return a dictionary with four key-value pairs, with foo and bar also becoming keys. Gilch then asked, "in what order?" To which I answered "dictionaries are not ordered."

"Not true anymore," gilch pointed out, "Since Python 3.7, they're guaranteed to remember their insertion order." We looked up the Python documentation and it was indeed the case. We tried dict(foo=2, **{'x':1,'y':4}, bar=3) and got a dictionary in a different order.

Hashable Types

I asked if there was any difference in defining a dictionary using {} versus dict(). Gilch compared two examples: {42:'spam'} works and dict(42='spam') doesn't. They commented that keys could be any hashable type, but keyword arguments were always keyed by identifier strings. The builtin hash() only worked on hashable types.

I don't fully understand the connection between hashable types and identifier strings here; it's something that I'll clarify later.

Parameter vs. Argument, Packing vs. Unpacking

Gilch gave another example: a, b, *cs, z = "spameggs"

I made a guess that cs would be an argument here, so * would be unpacking, but then got stuck on what cs might be. I tried to run it:

>>> a, b, *cs, z = "spameggs"
>>> a
's'
>>> b
'p'
>>> cs
['a', 'm', 'e', 'g', 'g']
>>> z
's'

Gilch pointed out that cs was a store context, not a load context, which made it more like a parameter rather than an argument. Then I asked what store vs. load context was.


Gilch suggested: import ast, then def dump(code): return ast.dump(ast.parse(code)). Then something like dump("a = a") returns a nested object, in which we can locate the ctx value for each variable.

This reminded me of lvalue and rvalue in C++, so I asked if they were the same thing as store vs. load context. They were.


Gilch tied it all together, "So for a decorator to pass along all args and kwargs, you do something like lambda *args, **kwargs: foo(*args, **kwargs). Then it works regardless of their number. Arguments and keyword arguments in a tuple and dict by keyword. So you can add, remove, and reorder arguments by using decorators to wrap functions. You can also process return values. You can also return something completely different. But wrapping a function in another function is a very common use of decorators. You can also have definition-time side effects. When you first load the module, it runs all the definitions--This is still runtime in Python, but you define a function at a different time than when you call it. The decoration happens on definition, not on call."
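A minimal sketch of the wrapping pattern gilch describes, with a hypothetical shout decorator that forwards all arguments and post-processes the result:

```python
def shout(foo):
    # wrapper forwards any positional and keyword arguments unchanged,
    # then transforms whatever foo returns
    def wrapper(*args, **kwargs):
        return str(foo(*args, **kwargs)).upper()
    return wrapper

@shout
def greet(name, punctuation="!"):
    return f"hello {name}{punctuation}"
```

Because wrapper takes *args and **kwargs, the decorator works regardless of how many arguments the decorated function accepts.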

We wrapped up our call at this point.


  1. As we were working through the examples, we'd voice out what we expect to see when we run the code before actually running to verify. Several times gilch asked me to translate a decorated function into an undecorated one. This was helpful for me to check my understanding of things.
  2. Another thing I found valuable was the tips and tricks I picked up from gilch throughout the session, like interactive mode; and the clarification of concepts, like the distinction between parameter and argument.
  3. Gilch quizzed me throughout the session. This made things super fun! I haven't had the opportunity for someone to keep quizzing me purely for learning (as opposed to giving me a grade or deciding whether to hire me) for the longest time! I guess that reading through well-written text tends to be effective for familiarizing oneself with concepts, while asking/answering questions is effective at solidifying and synthesizing knowledge.
  4. In this post, I tried to replicate the structure of my conversation with gilch as much as possible (the fact that gilch's mic was broken so they typed while I talked made writing this post so much easier--I had their half of the transcript generated for me!) since we went off on some tangents and I wanted to provide context for those tangents. I think of a conversation as a tree structure--we start with a root topic and go from there. A branch would happen when we go off on a tangent and then later come back to where we left off before the tangent. Sometimes two sections of this post would have the same section headings; a second time a section heading is used indicates that we stopped the tangent and went back to where we branched off.