Written by Michael Osthege (2020-06-28)
In this Jupyter notebook, we'll take a look at R0, the basic reproduction number of an infectious disease.
The basic reproduction number R0 ("R-naught") describes the average number of people that are infected by one infected person, in a population without immunity. It is often used by epidemiologists to calculate other important factors such as the fraction of effectively immune individuals needed to reach herd immunity (1 − 1/R0).
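As a quick illustration of the herd immunity formula 1 − 1/R0, here is a minimal sketch. The function name `herd_immunity_threshold` and the R0 values in the loop are illustrative assumptions, not estimates from this analysis.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of effectively immune individuals needed for herd immunity: 1 - 1/R0."""
    if r0 <= 1.0:
        # With R0 <= 1 an outbreak does not grow, so no immune fraction is required.
        return 0.0
    return 1.0 - 1.0 / r0


# Illustrative R0 values only (not results of the model fits).
for r0 in (1.5, 2.5, 4.0):
    print(f"R0 = {r0}: herd immunity threshold = {herd_immunity_threshold(r0):.0%}")
```

The threshold rises steeply with R0, which is why even moderate differences in R0 between populations matter.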
While R0 is influenced by some properties of the disease (infectiousness, incubation period, ...), it is not a constant, but also depends on the population. In other words, even in populations without immunity, R0 can be quite different, depending on factors like population density, general hygiene or rituals.
At this point it should be noted that while these days we're all concerned about COVID-19, other diseases such as measles or chickenpox have much higher R0 values (link).
In a textbook SIR model, R0 is a parameter that could be obtained by fitting the model to data.
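To make the role of R0 in a textbook SIR model concrete, the following sketch integrates the SIR equations with a simple forward Euler scheme, using the standard relation R0 = β/γ. The function `simulate_sir` and all parameter values (recovery rate, initial infected fraction, step size) are assumptions chosen for illustration; this is not the model used in this notebook.

```python
def simulate_sir(r0, gamma=0.1, i0=1e-4, days=300, dt=0.1):
    """Forward-Euler integration of the SIR model on population fractions.

    r0    : basic reproduction number (R0 = beta / gamma)
    gamma : recovery rate per day (assumed value)
    i0    : initially infected fraction (assumed value)
    """
    beta = r0 * gamma  # transmission rate implied by R0 and gamma
    s, i, r = 1.0 - i0, i0, 0.0
    s_hist, i_hist, r_hist = [s], [i], [r]
    for _ in range(round(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        s_hist.append(s)
        i_hist.append(i)
        r_hist.append(r)
    return s_hist, i_hist, r_hist


# With R0 > 1 the infected fraction grows before it declines;
# with R0 < 1 it only declines from its initial value.
for r0 in (0.8, 2.5):
    _, infected, _ = simulate_sir(r0)
    print(f"R0 = {r0}: peak infected fraction = {max(infected):.4f}")
```

Fitting such a model to case data would mean adjusting `r0` (and the other parameters) until the simulated curve matches the observations.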
Real outbreaks, however, don't work like textbook examples: countermeasures such as quarantine are taken in an attempt to contain the outbreak. These changes in population behavior change R over time, making the effective reproduction number Re(t), or Rt for short, a more relevant parameter for measuring the effectiveness of countermeasures.
This analysis is based on the Rt.live model. For a detailed explanation of how it works, please refer to this blogpost by Thomas Wiecki.
tl;dr: Rt is modeled via a Gaussian random walk that allows it to slowly drift over time.
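The idea of a Gaussian random walk can be sketched in a few lines of NumPy. This only simulates one possible trajectory; it does not fit anything. Placing the walk on log(Rt) to keep Rt positive, as well as the starting value and step size, are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

n_days = 120
step_sd = 0.035        # assumed standard deviation of the daily steps
log_r_start = np.log(3.0)  # assumed starting value, for illustration only

# Each day, log(Rt) changes by a small normally distributed step.
steps = rng.normal(loc=0.0, scale=step_sd, size=n_days - 1)
log_rt = log_r_start + np.concatenate([[0.0], np.cumsum(steps)])

# Exponentiating guarantees that Rt stays positive.
rt = np.exp(log_rt)
print(rt[:5])
```

A small step size means Rt can only drift slowly from one day to the next, which is the behavior described above.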
We use the model fits to US data, because it has some characteristics that are ideal for this analysis:
For the purpose of this analysis, we can therefore consider the Rt in early February as the undisturbed reproduction number in an essentially non-immune population without countermeasures.
For better readability, source code was moved to outsourced.py
import outsourced
The model is a generative Bayesian model, so we get all predictions as samples from a posterior probability distribution p(Rt∣data).
outsourced.plot_r_t(region="NY")
We get R0 by taking just the first timepoint: p(R0∣data)
outsourced.plot_r_0(regions=["NY"]);
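Taking the first timepoint amounts to slicing the posterior samples along the time axis. The sketch below assumes the samples are available as an array of shape (n_samples, n_days); the array `rt_samples` is filled with random placeholder values and is not the output of the model or of outsourced.py.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_days = 2000, 60

# Placeholder posterior samples of Rt (random numbers, NOT real model output).
# Assumed layout: one row per posterior sample, one column per day.
rt_samples = rng.lognormal(mean=np.log(3.0), sigma=0.15, size=(n_samples, n_days))

# p(R0 | data) is represented by the samples at the first timepoint.
r0_samples = rt_samples[:, 0]

# The median is one way to summarize the distribution.
r0_median = float(np.median(r0_samples))
print(f"{len(r0_samples)} samples, median of placeholder R0 = {r0_median:.2f}")
```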
In the plot above, we've seen our estimated R0 probability distribution for one region.
We can include all 51 regions in the same plot to get an idea of how different R0 can be:
medians, samples = outsourced.plot_r_0(regions=outsourced.US_REGIONS)
Why is R0 so different? As mentioned in the introduction, it also depends on properties of the population, such as hygiene, rituals or population density. The first two are hard to quantify, but we can easily find the population density of all US states.
Correlation does not imply causation, and some regions could have a low population density even though most inhabitants live in the same city. But in the absence of a more promising factor, we can still formulate a hypothesis:
Hypothesis: "Regions with lower population density have lower R0"
To qualitatively examine the relationship between R0 and population density, we can make a violin plot of the R0 posterior distributions, positioned according to each region's population density:
outsourced.plot_scatter_r_0(regions=outsourced.US_REGIONS)
The R0 of the SARS-CoV-2 outbreak ranges from about 1 to 5, with clear differences between the 51 analyzed US regions. The lowest R0 was observed in Alaska (HDI94%=[0.99,1.55]) and the highest in New Jersey (HDI94%=[3.20,4.90]).
Interestingly, while there are regions with high population density and low R0 (Hawaii, Delaware, District of Columbia), there are no regions with low population density that have a high R0.
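The HDI94% intervals quoted above are highest density intervals: the narrowest interval that contains 94% of the posterior samples. A minimal NumPy sketch of that computation follows; the function `hdi` is a hypothetical helper written for this illustration (not necessarily how outsourced.py computes it), and the input is random placeholder data.

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing at least `prob` of the samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))  # number of samples the interval must cover
    if k < 1 or k > n:
        raise ValueError("prob must be in (0, 1] and samples must be non-empty")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])


# Placeholder samples (random numbers, NOT real posterior samples).
rng = np.random.default_rng(1)
placeholder = rng.lognormal(mean=np.log(3.0), sigma=0.2, size=5000)
low, high = hdi(placeholder)
print(f"HDI94% of placeholder samples = [{low:.2f}, {high:.2f}]")
```

For skewed distributions the HDI differs from the equal-tailed interval between the 3rd and 97th percentiles, because it shifts toward the region where the samples are densest.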