Bayesian inference of $R_0$ : A side product of the Rt.live generative model¶

Written by Michael Osthege (2020-06-28)

In this Jupyter notebook, we'll take a look at $R_0$ , the basic reproduction number of an infectious disease.

What is $R_0$ ?¶

The basic reproduction number $R_0$ ("R-naught") describes the average number of people that are infected by one infected person, in a population without immunity. It is often used by epidemiologists to calculate other important factors such as the fraction of effectively immune individuals to reach herd immunity ( $1 - \frac{1}{R_0}$ ).
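As a small worked example of the herd immunity formula above, the following sketch evaluates $1 - \frac{1}{R_0}$ for a few values. The function name `herd_immunity_threshold` and the $R_0$ values are chosen purely for illustration; they are not estimates from this notebook.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune so that one case
    infects, on average, at most one other person: 1 - 1/R0."""
    if r0 <= 1.0:
        # With R0 <= 1 an outbreak declines on its own, so no immunity is required.
        return 0.0
    return 1.0 - 1.0 / r0


# Illustrative R0 values only (assumptions, not results of the model).
for r0 in (1.5, 2.5, 4.0):
    print(f"R0 = {r0:.1f} -> threshold = {herd_immunity_threshold(r0):.0%}")
```

The threshold rises quickly with $R_0$: doubling $R_0$ from 2 to 4 moves it from 50% to 75%.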

While $R_0$ is influenced by some properties of the disease (infectiousness, incubation period, ...), it is not a constant, but also depends on the population. In other words, even in populations without immunity, $R_0$ can be quite different, depending on factors like population density, general hygiene or rituals.

At this point it should be noted that while these days we're all concerned about COVID-19, other diseases such as measles or chickenpox have much higher $R_0$ values (link).

How can we measure $R_0$ ?¶

In a textbook SIR model, $R_0$ is a parameter that could be obtained by fitting the model to data.
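To make the role of $R_0$ in the textbook SIR model concrete, here is a minimal sketch that integrates the SIR equations with a forward-Euler scheme. It uses the standard relation $R_0 = \beta / \gamma$ (transmission rate over recovery rate). The function name `simulate_sir` and all parameter values are assumptions for illustration and are not part of the Rt.live model.

```python
import numpy as np


def simulate_sir(r0, gamma=0.2, i0=1e-4, days=200, dt=0.1):
    """Forward-Euler integration of the SIR equations on population fractions.

    The transmission rate is derived as beta = r0 * gamma, so r0 is the
    only transmission parameter. Returns arrays (s, i, r) per time step.
    """
    beta = r0 * gamma
    n_steps = int(round(days / dt))
    s = np.empty(n_steps + 1)
    i = np.empty(n_steps + 1)
    r = np.empty(n_steps + 1)
    s[0], i[0], r[0] = 1.0 - i0, i0, 0.0
    for k in range(n_steps):
        new_infections = beta * s[k] * i[k] * dt
        new_recoveries = gamma * i[k] * dt
        s[k + 1] = s[k] - new_infections
        i[k + 1] = i[k] + new_infections - new_recoveries
        r[k + 1] = r[k] + new_recoveries
    return s, i, r


# Illustrative value of R0 (an assumption, not a fitted estimate).
s, i, r = simulate_sir(r0=2.5)
print(f"peak infected fraction: {i.max():.2f}")
print(f"final recovered fraction: {r[-1]:.2f}")
```

Fitting such a model to case counts would mean adjusting `r0` (and `gamma`) until the simulated curve matches the observations.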

Real outbreaks, however, don't work like textbook examples: countermeasures such as quarantine are taken in attempts to contain an outbreak. These changes in population behavior change $R$ over time, making the effective reproduction number $R_e(t)$, or $R_t$ for short, a more relevant parameter for measuring the effectiveness of countermeasures.

This analysis is based on the Rt.live model. For a detailed explanation of how it works, please refer to this blogpost by Thomas Wiecki.

tl;dr: $R_t$ is modeled via a Gaussian random walk that allows it to slowly drift over time.
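To illustrate what a Gaussian random walk looks like, the sketch below draws a single trajectory with NumPy. It is not the model code: placing the walk on $\log R_t$ (which keeps $R_t$ positive), the step size `sigma` and the starting value are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_days = 120
sigma = 0.035               # assumed standard deviation of the daily step
log_r_start = np.log(2.5)   # assumed starting value of log(R_t)

# Each day's value is the previous day's value plus a small Gaussian step.
steps = rng.normal(loc=0.0, scale=sigma, size=n_days - 1)
log_r_t = log_r_start + np.concatenate([[0.0], np.cumsum(steps)])
r_t = np.exp(log_r_t)

print(r_t[:5])
```

A small `sigma` only permits slow drift from day to day, which is what lets the model separate gradual changes in $R_t$ from noise in the daily case counts.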

We use the model fits to US data, because it has some characteristics that are ideal for this analysis:

- countermeasures were taken very late in international comparison, because
  - the early testing kits did not work as expected
  - recommendations by the WHO were largely dismissed
- there are 51 (independent) regions to compare
- the data quality is actually good!

For the purpose of this analysis, we can therefore consider the $R_t$ in early February as the undisturbed reproductive number in an essentially non-immune population without countermeasures.

For better readability, source code was moved to outsourced.py:

In [1]:

import outsourced

The generative model is a Bayesian model, so we get all predictions as samples from a posterior probability distribution $p(R_t \mid data)$.

In [2]:

outsourced.plot_r_t(region="NY")

We get $R_0$ by taking just the first timepoint: $p(R_0 \mid data)$

In [3]:

outsourced.plot_r_0(regions=["NY"]);
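In terms of arrays, "taking just the first timepoint" amounts to slicing the posterior samples along the time axis. The sketch below shows this on a synthetic stand-in; the array name `r_t_samples` and its `(samples, days)` layout are assumptions for illustration and do not reflect the internals of `outsourced.py`.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_samples, n_days = 4000, 100

# Synthetic stand-in for posterior samples of R_t (NOT real model output):
# one random-walk trajectory per row, one day per column.
log_walk = np.cumsum(rng.normal(0.0, 0.035, size=(n_samples, n_days)), axis=1)
r_t_samples = np.exp(np.log(2.5) + log_walk)

# The posterior of R_0 is the marginal distribution at the first timepoint.
r_0_samples = r_t_samples[:, 0]

print(r_0_samples.shape)
print(f"posterior median of R_0: {np.median(r_0_samples):.2f}")
```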

Comparing $R_0$ across regions¶

In the plot above, we've seen our estimated $R_0$ probability distribution for one region.

We can include all 51 regions in the same plot to get an idea of how different $R_0$ can be:

In [4]:

medians, samples = outsourced.plot_r_0(regions=outsourced.US_REGIONS)

Meaningful correlations?¶

Why is $R_0$ so different? As mentioned in the introduction, it also depends on properties of the population, such as hygiene, rituals or population density. The first two are hard to quantify, but we can easily find the population density of all US states.

Correlation does not imply causation, and some regions could have a low population density even though most inhabitants live in the same city. But in the absence of a more promising factor, we can still formulate a hypothesis:

Hypothesis: "Regions with lower population density have lower $R_0$"

To qualitatively examine the relationship between $R_0$ and population density, we can make a violin plot of the $R_0$ posterior distributions, positioned according to each region's population density:

In [5]:

outsourced.plot_scatter_r_0(regions=outsourced.US_REGIONS)

Conclusion¶

The $R_0$ of the SARS-CoV-2 outbreak ranges from 1 to 5, with clear differences between the 51 analyzed US regions. The lowest $R_0$ was observed in Alaska ($HDI_{94\%}=[0.99, 1.55]$) and the highest in New Jersey ($HDI_{94\%}=[3.20, 4.90]$).

Interestingly, while there are regions with high population density and low $R_0$ (Hawaii, Delaware, District of Columbia), there are no regions with low population density that have a high $R_0$.
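The $HDI_{94\%}$ values quoted above are highest density intervals: the narrowest interval that contains 94% of the posterior samples. The sketch below shows one simple way to compute such an interval from samples, assuming a unimodal distribution. The helper `hdi` and the synthetic samples are illustrative and are not the code that produced the numbers above.

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples.

    Assumes a unimodal distribution, so the interval is contiguous.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    n_in = int(np.ceil(prob * n))  # number of samples inside the interval
    # Width of every window of n_in consecutive sorted samples.
    widths = x[n_in - 1:] - x[: n - n_in + 1]
    start = int(np.argmin(widths))
    return x[start], x[start + n_in - 1]


# Synthetic, right-skewed stand-in for posterior samples (NOT model output).
rng = np.random.default_rng(seed=0)
samples = rng.lognormal(mean=1.0, sigma=0.2, size=5000)

lower, upper = hdi(samples, prob=0.94)
print(f"94% HDI: [{lower:.2f}, {upper:.2f}]")
```

For skewed posteriors the HDI is narrower than the equal-tailed interval between the 3% and 97% quantiles. In practice, a library routine such as ArviZ's `az.hdi` would be the usual choice.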
