Disease Outbreak Response Decision-making Under Uncertainty: A retrospective analysis of measles in Sao Paulo

In [15]:
%matplotlib inline
import pandas as pd
import numpy as np
import numpy.ma as ma
from datetime import datetime
import matplotlib.pyplot as plt
pd.set_option('max_columns', 20)
pd.set_option('max_rows', 25)

from IPython.core.display import HTML
def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)
css_styling()
Out[15]:
In [16]:
data_dir = "data/"

Import outbreak data

In [17]:
measles_data = pd.read_csv(data_dir+"measles.csv", index_col=0)
measles_data.NOTIFICATION = pd.to_datetime(measles_data.NOTIFICATION)
measles_data.BIRTH = pd.to_datetime(measles_data.BIRTH)
measles_data.ONSET = pd.to_datetime(measles_data.ONSET)
In [18]:
measles_data = measles_data.replace({'DISTRICT': {'BRASILANDIA':'BRAZILANDIA'}})

Sao Paulo population by district

In [19]:
sp_pop = pd.read_csv(data_dir+'sp_pop.csv', index_col=0)
In [20]:
_names = sp_pop.index.values
_names[_names=='BRASILANDIA'] = 'BRAZILANDIA'
sp_pop.set_index(_names, inplace = True)
In [21]:
sp_pop.head()
Out[21]:
0 a 4 anos 5 a 9 anos 10 a 14 anos 15 a 19 anos 20 a 24 anos 25 a 29 anos 30 a 34 anos 35 a 39 anos 40 a 44 anos 45 a 49 anos 50 a 54 anos 55 a 59 anos 60 a 64 anos 65 a 69 anos 70 a 74 anos 75 anos e + Total
AGUA RASA 5411 5750 6450 7122 7621 7340 6999 6984 6346 5608 4987 4212 4152 3595 2937 3637 89151
ALTO DE PINHEIROS 2070 2369 2953 3661 4612 4190 3539 3633 3448 3289 3040 2533 2298 1732 1305 1823 46495
ANHANGUERA 3068 3006 2755 2431 2426 2636 2695 2308 1653 1107 753 509 352 217 162 171 26249
ARICANDUVA 7732 7730 8373 8956 9182 8531 7813 7365 6551 5554 4887 3858 3320 2449 1611 1723 95635
ARTUR ALVIM 9031 9078 10000 11058 11387 10347 9125 8658 7830 7055 5919 4612 3756 2633 1727 1724 113940

Plot of cumulative cases by district

In [22]:
measles_onset_dist = measles_data.groupby(['DISTRICT','ONSET']).size().unstack(level=0).fillna(0)
measles_onset_dist.cumsum().plot(legend=False, grid=False)
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x11114df98>
In [23]:
total_district_cases = measles_onset_dist.sum()

Top 5 districts by number of cases

In [24]:
totals = measles_onset_dist.sum()
totals.sort(ascending=False)
totals[:5]
Out[24]:
DISTRICT
GRAJAU             1074
JARDIM ANGELA       944
CAPAO REDONDO       849
JARDIM SAO LUIZ     778
CAMPO LIMPO         692
dtype: float64

Age distribution of cases, by confirmation status

In [25]:
by_conclusion = measles_data.groupby(["YEAR_AGE", "CONCLUSION"])
counts_by_cause = by_conclusion.size().unstack().fillna(0)
ax = counts_by_cause.plot(kind='bar', stacked=True, xlim=(0,50), figsize=(15,5))

Chain Binomial Transmission Model

As a baseline for comparison, we can fit a model to all the clinically-confirmed cases, regardless of lab confirmation status. For this, we will use a simple chain binomial model, which will be fit using MCMC.

This model fits the series of 2-week infection totals as a set of Binomial models:

$$I_{t+1} \,|\, S_t, p_t \sim \text{Bin}(S_t, p_t)$$

Where the binomial probability is modeled as:

$$p_t = 1 - \exp(-\lambda_t)$$

$$\lambda_t = \frac{\beta_t I_t}{N}$$

We will assume here that the transmission rate is constant over time (and across districts):

$$\beta_t = \beta_0$$

which allows the effective reproductive number to be calculated as:

$$R_t = \frac{\beta S_t}{N}$$
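Before fitting, the dynamics above can be sketched as a direct forward simulation of the chain binomial recursion. All numbers below (population size, $\beta$, initial susceptibles and infecteds) are made up for illustration, not estimates from the Sao Paulo data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative values only -- not estimates from the outbreak data
N = 100000       # district population
beta = 30.0      # transmission parameter per 2-week interval
S, I = 5000, 10  # initial susceptibles and infecteds

for t in range(10):
    lam = beta * I / N           # force of infection
    p = 1 - np.exp(-lam)         # 2-week infection probability
    I_new = rng.binomial(S, p)   # chain binomial step: I_{t+1} ~ Bin(S_t, p_t)
    S -= I_new
    I = I_new
    R_t = beta * S / N           # effective reproduction number
    print(t, I, round(R_t, 3))
```

With a constant $\beta$, $R_t$ declines monotonically as susceptibles are depleted; the epidemic turns over once $R_t$ falls below 1.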

Confirmation Sub-model

Rather than assume all clinical cases are true cases, we can adjust the model to account for lab confirmation probability. This is done by including a sub-model that estimates age group-specific probabilities of confirmation, and using these probabilities to estimate the number of lab-confirmed cases. These estimates are then plugged into the model in place of the clinically-confirmed cases.
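The adjustment amounts to binomially thinning the clinical counts by the estimated confirmation probabilities. A minimal sketch, using hypothetical counts and probabilities (not values estimated by the model):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical bi-weekly clinical case counts for three age groups
clinical_cases = np.array([120, 45, 10])

# Hypothetical age-specific probabilities of lab confirmation
p_confirm = np.array([0.9, 0.6, 0.3])

# Thin the clinical counts to impute the number of true (confirmable) cases
estimated_true_cases = rng.binomial(clinical_cases, p_confirm)
print(estimated_true_cases)
```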

We specified a structured confirmation model to retrospectively determine the age group-specific probabilities of lab confirmation for measles, conditional on clinical diagnosis. Individual lab confirmation events $c_i$ were modeled as Bernoulli random variables, with the probability of confirmation being allowed to vary by age group:

$$c_i \sim \text{Bernoulli}(p_{a(i)})$$

where $a(i)$ denotes the appropriate age group for the individual indexed by $i$. There were 16 age groups, the first 15 of which were 5-year age intervals $[0,5), [5, 10), \ldots , [70, 75)$, with the 16th interval including all individuals 75 years and older.

Since the age interval choices were arbitrary, and the confirmation probabilities of adjacent groups likely correlated, we modeled the correlation structure directly, using a multivariate logit-normal model. Specifically, we allowed first-order autocorrelation among the age groups, whereby the variance-covariance matrix retained a tridiagonal structure.

$$\begin{aligned} \Sigma = \left[{ \begin{array}{c} {\sigma^2} & {\sigma^2 \rho} & 0& \ldots & {0} & {0} \\ {\sigma^2 \rho} & {\sigma^2} & \sigma^2 \rho & \ldots & {0} & {0} \\ {0} & \sigma^2 \rho & {\sigma^2} & \ldots & {0} & {0} \\ \vdots & \vdots & \vdots & & \vdots & \vdots\\ {0} & {0} & 0 & \ldots & {\sigma^2} & \sigma^2 \rho \\ {0} & {0} & 0 & \ldots & \sigma^2 \rho & {\sigma^2} \end{array} }\right] \end{aligned}$$

From this, the confirmation probabilities were specified as multivariate normal on the logit scale:

$$\text{logit}(p_a) = \theta_a, \quad \theta \sim N(\mu, \Sigma)$$

Priors for the confirmation sub-model were specified by:

$$\begin{aligned} \mu_i &\sim N(0, 100) \\ \sigma &\sim \text{HalfCauchy}(25) \\ \rho &\sim U(-1, 1) \end{aligned}$$
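The tridiagonal covariance matrix above is straightforward to construct in NumPy; this mirrors the `Sigma` deterministic defined in the model code below, with arbitrary illustrative values for $\sigma$ and $\rho$:

```python
import numpy as np

n_ages = 16   # number of age groups
sigma = 1.5   # illustrative marginal standard deviation
rho = 0.4     # illustrative first-order autocorrelation

# sigma^2 on the diagonal, sigma^2 * rho on the first off-diagonals, zero elsewhere
I = np.eye(n_ages) * sigma**2
E = np.diag(np.ones(n_ages - 1), k=-1) * sigma**2 * rho
Sigma = I + E + E.T

assert np.allclose(Sigma, Sigma.T)  # symmetric, as a covariance matrix must be
```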

Age classes are defined in 5-year intervals.

In [26]:
age_classes = [0,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,100]
measles_data.dropna(subset=['AGE'], inplace=True)
measles_data['AGE_GROUP'] = pd.cut(measles_data.AGE, age_classes, right=False)

Lab-checked observations are extracted for use in estimating lab confirmation probability.

In [27]:
CONFIRMED = measles_data.CONCLUSION == 'CONFIRMED'
CLINICAL = measles_data.CONCLUSION == 'CLINICAL'
DISCARDED = measles_data.CONCLUSION == 'DISCARDED'
In [28]:
lab_subset = measles_data[(CONFIRMED | DISCARDED) & measles_data.YEAR_AGE.notnull() & measles_data.COUNTY.notnull()].copy()
lab_subset.loc[lab_subset.YEAR_AGE > 75, 'YEAR_AGE'] = 75
age = lab_subset.YEAR_AGE.astype(int).values
ages = lab_subset.YEAR_AGE.astype(int).unique()
counties = lab_subset.COUNTY.unique()
y = (lab_subset.CONCLUSION=='CONFIRMED').values

Proportion of lab-confirmed cases older than 20 years

In [29]:
(measles_data[CONFIRMED].AGE>20).mean()
Out[29]:
0.6034981469764078

Extract cases by age and time.

In [30]:
age_group = pd.cut(age, age_classes, right=False)
age_index = np.array([age_group.categories.tolist().index(i) for i in age_group])
/usr/local/lib/python3.4/site-packages/pandas/core/categorical.py:462: FutureWarning: Accessing 'levels' is deprecated, use 'categories'
  warn("Accessing 'levels' is deprecated, use 'categories'", FutureWarning)
In [31]:
# Get index from full crosstabulation to use as index for each district
dates_index = measles_data.groupby(
        ['ONSET', 'AGE_GROUP']).size().unstack().index
In [32]:
unique_districts = measles_data.DISTRICT.dropna().unique()
In [33]:
N = sp_pop.ix[unique_districts, 'Total'].dropna()
In [34]:
sp_districts = N.index.values
len(sp_districts)
Out[34]:
93
In [35]:
all_district_data = []
all_confirmed_cases = []
for d in sp_districts:
    
    # All bi-weekly unconfirmed and confirmed cases
    district_data = measles_data[measles_data.DISTRICT==d]
    district_counts_2w = district_data.groupby(
        ['ONSET', 'AGE_GROUP']).size().unstack().reindex(dates_index).fillna(0).resample('2W', how='sum')
    all_district_data.append(district_counts_2w)
    
    # All confirmed cases, by district
    confirmed_data = district_data[district_data.CONCLUSION=='CONFIRMED']
    confirmed_counts = confirmed_data.groupby(
        ['ONSET', 'AGE_GROUP']).size().unstack().reindex(dates_index).fillna(0).sum()
    all_confirmed_cases.append(confirmed_counts.reindex_axis(measles_data['AGE_GROUP'].unique()).fillna(0))

Time series of cases by district, summarized in 2-week intervals

In [36]:
# Sum over ages for susceptibles
sp_cases_2w = [dist.sum(1) for dist in all_district_data]
In [37]:
# Ensure the age groups are ordered
I_obs = np.array([dist.reindex_axis(measles_data['AGE_GROUP'].unique(), 
                            axis=1).fillna(0).values.astype(int) for dist in all_district_data])
In [38]:
measles_data['AGE_GROUP'].unique()
Out[38]:
array(['[0, 5)', '[5, 10)', '[10, 15)', '[15, 20)', '[20, 25)', '[25, 30)',
       '[30, 35)', '[35, 40)', '[40, 45)', '[45, 50)', '[50, 55)',
       '[55, 60)', '[60, 65)', '[65, 70)', '[70, 75)', '[75, 100)'], dtype=object)

Check the shape of the case array

  • 93 districts, 28 bi-weekly intervals, 16 age groups
In [39]:
assert I_obs.shape == (93, 28, 16)

Spatial distance between districts

In [40]:
import geopandas as gpd

shp = gpd.GeoDataFrame.from_file("Sao Paulo/Brazil_full/BRA_adm3.shp")
In [41]:
district_names = N.index.unique()
In [42]:
import trans
shp['district_name'] = shp.NAME_3.apply(
    lambda x: trans.trans(x).upper())
In [43]:
sp_shp = shp[shp.NAME_2=='São Paulo'].set_index('district_name')
In [45]:
centroids = sp_shp.geometry.centroid
In [46]:
distance_matrix = pd.concat([sp_shp.geometry.distance(o) for o in sp_shp.geometry],
                     axis=1)
distance_matrix.columns = sp_shp.index
In [47]:
min_x, min_y = sp_shp.bounds.min()[:2]
max_x, max_y = sp_shp.bounds.max()[2:]
In [48]:
assert (distance_matrix.index == centroids.index).all()
In [49]:
centroid_xy = np.array([[c.x, c.y] for c in sp_shp.geometry.centroid])

As an example, here are exponential distance-decay weights for a single district.

In [50]:
_beta = -100
np.exp(_beta*distance_matrix).values.round(2)[0]
Out[50]:
array([ 1.  ,  0.  ,  0.  ,  0.01,  0.  ,  0.12,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.04,  0.  ,  0.  ,  0.  ,
        1.  ,  0.  ,  0.  ,  0.  ,  0.01,  0.01,  0.  ,  0.  ,  0.  ,
        0.01,  0.  ,  0.  ,  0.  ,  0.  ,  0.18,  0.  ,  0.  ,  0.  ,
        0.  ,  0.04,  1.  ,  0.  ,  0.  ,  0.  ,  0.1 ,  0.01,  0.  ,
        0.  ,  0.19,  0.  ,  0.  ,  0.  ,  0.  ,  0.04,  0.  ,  1.  ,
        0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.11,  0.  ,  1.  ,
        0.02,  0.  ,  1.  ,  0.  ,  1.  ,  0.  ,  0.01,  0.  ,  0.02,
        0.03,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,  0.  ,
        0.  ,  0.  ,  0.  ,  0.1 ,  0.  ,  0.  ,  0.  ,  0.  ,  0.65,
        0.  ,  0.01,  0.  ,  0.  ,  0.  ,  1.  ])

Spatial decision model

We attempt to estimate $R_t$ for a truncated subset of the data, to simulate a decision-maker's information during (rather than after) an outbreak. This essentially involves turning part of the time series into missing data, and running the model.

This is an example of creating a mask for data not observed by the decision date.

In [51]:
np.array([np.resize(all_district_data[0].index > '1997-06-15', I_obs[0].T.shape).T]*len(all_district_data))
Out[51]:
array([[[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       ..., 
       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        [False, False, False, ..., False, False, False],
        ..., 
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]]], dtype=bool)
In [52]:
from pymc import MCMC, Matplot
from pymc import (Uniform, DiscreteUniform, Beta, Lambda, Binomial, Normal, Poisson, 
                  NegativeBinomial, observed, negative_binomial_like, Lognormal, Exponential, binomial_like,
                  stochastic, potential, invlogit, TruncatedNormal, Binomial, Gamma)
from pymc import (HalfCauchy, deterministic, MvNormalCov, Bernoulli, potential, Uninformative, Multinomial,
                  rmultinomial, rbinomial, AdaptiveMetropolis)

from pymc.gp import *
from pymc.gp.cov_funs import matern
In [59]:
def measles_model(obs_date, confirmation=True, spatially_weighted=True, all_traces=False):
    
    ### Confirmation sub-model
    
    if confirmation:

        # Specify priors on age-specific means
        age_classes = np.unique(age_index)

        mu = Normal("mu", mu=0, tau=0.0001, value=[0]*len(age_classes))
        sig = HalfCauchy('sig', 0, 25, value=1)
        var = sig**2
        cor = Uniform('cor', -1, 1, value=0)


        # Build variance-covariance matrix with first-order correlation among age classes
        @deterministic
        def Sigma(var=var, cor=cor):
            I = np.eye(len(age_classes))*var
            E = np.diag(np.ones(len(age_classes)-1), k=-1)*var*cor
            return I + E + E.T


        # Age-specific probabilities of confirmation as multivariate normal random variables
        beta_age = MvNormalCov("beta_age", mu=mu, C=Sigma, value=[1]*len(age_classes))
        p_age = Lambda('p_age', lambda t=beta_age: invlogit(t))

        @deterministic(trace=False)
        def p_confirm(beta=beta_age):
            return invlogit(beta[age_index])


        # Confirmation likelihood
        lab_confirmed = Bernoulli('lab_confirmed', p=p_confirm, value=y, observed=True)


    '''
    Missing data sub-model 
    
    We treat observations later than the decision date as missing data. This is 
    implemented as a `masked_array` in NumPy, which requires a boolean mask to identify missing values.
    '''
    missing_mask = all_district_data[0].index > obs_date
    district_mask = np.resize(missing_mask, I_obs[0].T.shape).T
    
    # Index to the observation period
    current_index = (~missing_mask).sum()-1

    I_obs_masked = ma.masked_array(I_obs,
                                   mask=np.array([district_mask]*len(all_district_data)),
                                   fill_value=1)


    # Index for observation date, used to index out values of interest from the model.
    t_obs = (~missing_mask).sum() - 1


    # Imputed infecteds
    I_imp = DiscreteUniform('I_imp', 0, 2000, value=I_obs_masked, observed=True)

    ### Chain binomial model for disease dynamics

    if confirmation:
        
        # Confirmed cases
        @stochastic(trace=all_traces, dtype=int)
        def I(value=(I_imp.value*0.7).astype(int), n=I_imp, p=p_age):
            return np.sum([np.sum([binomial_like(vj[:,i], nj[:,i], p[i]) for i in range(len(p))]).sum(0) 
                                    for vj, nj in zip(value, n)])

    else:
        
        I = I_imp


    # Infecteds at time $t_{obs}$
    It = Lambda('It', lambda I=I: I.sum(0)[t_obs])


    # Calculate susceptibles from total number of infections
    @deterministic(trace=all_traces)
    def S(I=I):
        return np.array([Ij.sum() - np.array([Ij[:i].sum() for i in range(len(Ij))]) for Ij in I])


    # Total infecteds until time t_obs, by age
    I_total = Lambda('I_total', lambda I=I: I[:,:t_obs].sum(0).sum(0))
    # Age distribution of infecteds
    age_dist = Lambda('age_dist', lambda I=I_total: I/I.sum())
    
    # Susceptibles at time t
    S_t = Lambda('S_t', lambda S=S: S[:, t_obs])
    
    # Susceptibles at time t, by age
    @deterministic
    def S_age(S=S_t, p=age_dist):
        return np.array([rmultinomial(si, p) for si in S])  
    
    @deterministic
    def vacc_5(S=S_age):
        # Vaccination of children under 5
        p = [0.95] + [0]*15
        return rbinomial(S[current_index], p)
    
    # Proportion of susceptibles vaccinated
    pct_5 = Lambda('pct_5', lambda V=vacc_5, S=S_age: float(V.sum())/S[current_index].sum())
    
    @deterministic
    def vacc_15(S=S_age):
        # Vaccination of children under 15
        p = [0.95]*3 + [0]*13
        return rbinomial(S[current_index], p)
    
    # Proportion of susceptibles vaccinated
    pct_15 = Lambda('pct_15', lambda V=vacc_15, S=S_age: float(V.sum())/S[current_index].sum())
    
    @deterministic
    def vacc_30(S=S_age):
        # Vaccination of everyone under 30
        p = [0.95]*6 + [0]*10
        return rbinomial(S[current_index], p)
    
    # Proportion of 30 and under susceptibles vaccinated
    pct_30 = Lambda('pct_30', lambda V=vacc_30, S=S_age: float(V.sum())/S[current_index].sum())
    
    @deterministic
    def vacc_adult(S=S_age):
        # Vaccination of adults 20-30 (and children under 5)
        p = [0.95, 0, 0, 0, 0.95, 0.95] + [0]*10
        return rbinomial(S[current_index], p)
    
    # Proportion of susceptibles vaccinated (adults 20-30 and children under 5)
    pct_adult = Lambda('pct_adult', lambda V=vacc_adult, S=S_age: float(V.sum())/S[current_index].sum())
    
    # Transmission parameter
    beta = Gamma('beta', 1, 0.1, value=10) 


    '''
    Calculation of the effective reproduction number depends on whether we believe that the districts 
    are independent or not. If not (`spatially_weighted = True`), both the number of susceptibles and the 
    denominator population are calculated as a distance-weighted average of all the districts in Sao Paulo.
    '''
    if spatially_weighted:

        geo_mesh = np.transpose([np.linspace(min_x, max_x),
                                np.linspace(min_y, max_y)])

        # Vague prior for the overall mean
        m = Uninformative('m', value=0)

        def constant(x, val):
            return np.zeros(x.shape[:-1],dtype=float) + val

        @deterministic
        def M(m=m):
            return Mean(constant, val=m)

        # ==================
        # = The covariance =
        # ==================

        # Vague priors for the amplitude and scale
        amp = Exponential('amp', 1e-5, value=1)
        scale = Exponential('scale',1e-5, value=0.1)
        @deterministic
        def C(amp=amp, scale=scale):
            return FullRankCovariance(exponential.euclidean, amp = amp, scale = scale)

        # ===================
        # = The GP submodel =
        # ===================
        zeta = GPSubmodel('zeta', M, C, geo_mesh)

        z_eval = Lambda('z_eval', lambda z=zeta.f(centroid_xy): z)



    if spatially_weighted:
        alpha = Exponential('alpha', 1, value=0.0001)
        Rt = Lambda('Rt', lambda B=beta, S=S, z=z_eval, a=alpha: ((B * (1 + a*z) * S.T) / N.values).T)
    else:
        Rt = Lambda('Rt', lambda B=beta, S=S: ((B * S).T / N.values).T)


    # Effective reproduction number at time of observation, and implied vaccination target
    Rt_obs = Lambda('Rt_obs', lambda r=Rt: r[:, t_obs])
    vaccination_target = Lambda('vaccination_target', lambda r=Rt_obs: np.maximum(1-1./r, 0))


    # Force of infection, assuming mass action transmission
    if spatially_weighted:
        lam = Lambda('lam', lambda B=beta, I=I, z=z_eval, a=alpha: 
                     np.array([(B * (1 + zj) * Ij) / nj for Ij,nj,zj in zip(I, N.values, a*z)]), 
                     trace=False)
    else:
        lam = Lambda('lam', lambda B=beta, I=I: np.array([(B * Ij) / nj for Ij,nj in zip(I, N.values)]), 
                     trace=False)


    # 2-week infection probabilities
    p = Lambda('p', lambda lam=lam: 1. - np.exp(-lam) + 1e-6, trace=False)


    # Binomial likelihood for observed cases
    @potential
    def new_cases(p=p, I=I, S=S): 
        return np.sum([[binomial_like(i, s, pt) for pt,i,s in zip(pj[:t_obs], Ij[:t_obs], Sj[:t_obs])] 
                                                   for pj,Ij,Sj in zip(p, I, S)])

    return locals()

Run models for the June 15 and July 15 observation points, both with and without the lab confirmation sub-model.

In [60]:
db = 'txt'
n_iterations = 50000
n_burn = 40000

June 15, with lab confirmation

In [61]:
model_june = MCMC(measles_model('1997-06-15', spatially_weighted=False), 
                         db=db, dbname='model_june')
In [62]:
model_june.sample(n_iterations, n_burn)
 [-----------------100%-----------------] 50001 of 50000 complete in 6400.6 sec

July 15, with lab confirmation

In [59]:
model_july = MCMC(measles_model('1997-07-15', spatially_weighted=False), 
                         db=db, dbname='model_july')
In [60]:
model_july.sample(n_iterations, n_burn)
 [-----------------100%-----------------] 50000 of 50000 complete in 1974.3 sec

June 15, no lab confirmation

In [53]:
model_june_noconf = MCMC(measles_model('1997-06-15', 
                                       spatially_weighted=False, 
                                       confirmation=False), 
                         db=db, dbname='model_june_noconf')
In [54]:
model_june_noconf.sample(n_iterations, n_burn)
 [-----------------100%-----------------] 50001 of 50000 complete in 884.4 sec

July 15, no lab confirmation

In [55]:
model_july_noconf = MCMC(measles_model('1997-07-15', spatially_weighted=False, 
                                       confirmation=False), 
                         db=db, dbname='model_july_noconf')
In [56]:
model_july_noconf.sample(n_iterations, n_burn)
 [-----------------100%-----------------] 50000 of 50000 complete in 998.2 sec

Summary of model output

In [63]:
Rt_june = model_june.Rt.stats()
In [64]:
plt.plot(Rt_june['quantiles'][50].T, 'b-', alpha=0.4)
plt.ylabel('R(t)')
plt.xlabel('time (2-week periods)')
Out[64]:
<matplotlib.text.Text at 0x11307dc50>
In [65]:
Matplot.summary_plot(model_june.vacc_30)
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [66]:
Matplot.plot(model_june.pct_15)
Plotting pct_15
In [67]:
Matplot.plot(model_june.pct_30)
Plotting pct_30
In [121]:
june_Rt = pd.DataFrame(model_june.Rt_obs.trace()).unstack().reset_index()
june_Rt.columns = ('district', 'iteration', 'Rt')
june_Rt['month'] = 'June'
In [204]:
june_Rt_noconf = pd.DataFrame(model_june_noconf.Rt_obs.trace()).unstack().reset_index()
june_Rt_noconf.columns = ('district', 'iteration', 'Rt')
june_Rt_noconf['month'] = 'June'
In [122]:
july_Rt = pd.DataFrame(model_july.Rt_obs.trace()).unstack().reset_index()
july_Rt.columns = ('district', 'iteration', 'Rt')
july_Rt['month'] = 'July'
In [101]:
confirmed_Rt = june_Rt.append(july_Rt, ignore_index=True)
In [158]:
june_means = june_Rt.groupby('district')['Rt'].mean()
june_means.sort(ascending=False)
In [159]:
sorted_districts = june_means.index.values
In [203]:
import seaborn as sb

sb.set_context("talk", font_scale=1.1)
f, (ax_1, ax_2) = plt.subplots(2, 1, figsize=(12,6), sharey=True, sharex=True)

sb.boxplot('district', 'Rt', data=june_Rt, ax=ax_1, linewidth=0.5, 
           fliersize=0, color='r', order=sorted_districts)
ax_1.hlines(1, xmin=0, xmax=93, linestyles='dashed', linewidth=0.2)
ax_1.set_xticks([])
ax_1.set_xlabel('')
ax_1.set_ylabel('June')
ax_1.set_ylim(0, 6)
ax_1.set_yticks(range(7))
ax_1.set_title(r'$R_t$ estimates, ordered by June means')

sb.boxplot('district', 'Rt', data=july_Rt, ax=ax_2, linewidth=0.5, 
           fliersize=0, color='r', order=sorted_districts)
ax_2.hlines(1, xmin=0, xmax=93, linestyles='dashed', linewidth=0.2)
ax_2.set_xticks([])
ax_2.set_ylabel('July')

f.tight_layout()
In [217]:
sb.set_context("talk", font_scale=1.1)
f, (ax_1, ax_2) = plt.subplots(2, 1, figsize=(12,6), sharey=True, sharex=True)

sb.boxplot('district', 'Rt', data=june_Rt, ax=ax_1, linewidth=0.5, 
           fliersize=0, color='r', order=sorted_districts)
ax_1.hlines(1, xmin=0, xmax=93, linestyles='dotted', linewidth=0.75)
ax_1.set_xticks([])
ax_1.set_xlabel('')
ax_1.set_ylabel('Lab')
ax_1.set_yticks(np.arange(13, step=2))
ax_1.set_title(r'June $R_t$ estimates, ordered by confirmed means')

sb.boxplot('district', 'Rt', data=june_Rt_noconf, ax=ax_2, linewidth=0.5, 
           fliersize=0, color='r', order=sorted_districts)
ax_2.hlines(1, xmin=0, xmax=93, linestyles='dotted', linewidth=0.75)
ax_2.set_xticks([])
ax_2.set_ylabel('Clinical')

f.tight_layout()
In [66]:
model_june.summary(['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
pct_5:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.102            0.056            0.001                [ 0.   0.2]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.0              0.062           0.097          0.129         0.226
	

pct_15:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.369            0.087            0.002            [ 0.219  0.556]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.194            0.312           0.37           0.424         0.548
	

pct_30:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.922            0.047            0.001            [ 0.839  1.   ]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.815            0.893           0.926          0.963         1.0
	

pct_adult:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.593            0.09             0.003            [ 0.407  0.75 ]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.407            0.531           0.593          0.656         0.758
	
In [224]:
june_coverage = pd.DataFrame({name: model_june.trace(name)[:] for name in ['pct_5', 'pct_15', 'pct_30', 'pct_adult']})
In [233]:
axes = sb.boxplot(june_coverage, order=['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
axes.set_xticklabels(['Under 5', 'Under 15', 'Under 30', 'Under 5 + 20-30'])
plt.ylabel('% susceptibles vaccinated')
sb.despine(offset=10, trim=True);
/usr/local/lib/python3.4/site-packages/seaborn/categorical.py:1587: UserWarning: The boxplot API has been changed. Attempting to adjust your arguments for the new API (which might not work). Please update your code. See the version 0.6 release notes for more info.
  warnings.warn(msg, UserWarning)
In [67]:
model_june_noconf.summary(['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
pct_5:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.163            0.001            0.0              [ 0.162  0.165]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.161            0.163           0.163          0.164         0.165
	

pct_15:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.464            0.001            0.0              [ 0.462  0.466]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.462            0.463           0.464          0.465         0.466
	

pct_30:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.872            0.001            0.0              [ 0.871  0.874]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.871            0.872           0.872          0.873         0.874
	

pct_adult:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.502            0.001            0.0              [ 0.499  0.504]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.499            0.501           0.502          0.502         0.504
	
In [68]:
model_july.summary(['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
pct_5:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.11             0.05             0.001            [ 0.026  0.206]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.026            0.079           0.108          0.146         0.212
	

pct_15:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.292            0.073            0.002            [ 0.146  0.425]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.154            0.242           0.293          0.342         0.441
	

pct_30:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.899            0.049            0.001            [ 0.805  0.976]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.794            0.868           0.897          0.939         0.974
	

pct_adult:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.67             0.076            0.002            [ 0.526  0.818]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.514            0.618           0.667          0.727         0.816
	
In [69]:
model_july_noconf.summary(['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
pct_5:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.133            0.001            0.0              [ 0.131  0.135]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.131            0.132           0.133          0.133         0.135
	

pct_15:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.365            0.001            0.0              [ 0.362  0.367]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.362            0.364           0.365          0.366         0.367
	

pct_30:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.861            0.001            0.0              [ 0.859  0.863]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.859            0.86            0.861          0.862         0.863
	

pct_adult:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.568            0.001            0.0              [ 0.566  0.571]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.565            0.567           0.568          0.569         0.57
	
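The two summaries above are easier to read side by side. A minimal sketch, with the posterior means copied from the printed output; the column names are assumptions (the first summary taken as the confirmation-adjusted July model, In [69] as the no-confirmation variant):

```python
import pandas as pd

# Posterior mean age fractions copied from the two summaries above.
# Column names are assumed: 'adjusted' for the first summary,
# 'noconf' for the In [69] no-confirmation variant.
compare = pd.DataFrame({
    'adjusted': [0.11, 0.292, 0.899, 0.67],
    'noconf':   [0.133, 0.365, 0.861, 0.568],
}, index=['pct_5', 'pct_15', 'pct_30', 'pct_adult'])
compare['diff'] = compare['noconf'] - compare['adjusted']
print(compare)
```

Note also how much tighter the no-confirmation intervals are (SD of 0.001 versus 0.05–0.08): treating every notified case as confirmed discards the uncertainty about confirmation status.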
In [70]:
Rt_july = model_july.Rt.stats()
In [71]:
plt.plot(Rt_july['quantiles'][50].T, 'b-', alpha=0.4)
plt.ylabel('R(t)')
plt.xlabel('time (2-week periods)')
Out[71]:
<matplotlib.text.Text at 0x1172d5630>
In [72]:
Rt_june_noconf = model_june_noconf.Rt.stats()
In [73]:
plt.plot(Rt_june_noconf['quantiles'][50].T, 'b-', alpha=0.4)
plt.ylabel('R(t)')
plt.xlabel('time (2-week periods)')
Out[73]:
<matplotlib.text.Text at 0x117602c88>
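The median-only traces above hide the posterior spread. A sketch of a banded version, using the same quantiles-dict layout that PyMC's `.stats()` returns (keys 2.5, 25, 50, 75, 97.5); the draws here are synthetic lognormal samples standing in for a fitted R(t) trace:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Synthetic (samples x periods) stand-in for an R(t) trace.
rng = np.random.default_rng(1)
draws = rng.lognormal(mean=0.0, sigma=0.3, size=(5000, 20))
quantiles = {q: np.percentile(draws, q, axis=0) for q in (2.5, 50, 97.5)}

plt.fill_between(range(20), quantiles[2.5], quantiles[97.5],
                 color='b', alpha=0.2)        # central 95% band
plt.plot(quantiles[50], 'b-')                 # posterior median
plt.axhline(1, color='r', linestyle=':')      # epidemic threshold R(t)=1
plt.ylabel('R(t)')
plt.xlabel('time (2-week periods)')
```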
In [74]:
Matplot.summary_plot(model_june.p_age)
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [75]:
june_age_dist = model_june.age_dist.trace()[:]
june_age_dist.shape
Out[75]:
(10000, 16)
In [76]:
june_age_dist = pd.DataFrame(june_age_dist, columns=measles_data['AGE_GROUP'].unique())
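A compact numeric summary of such a trace DataFrame is the mean plus a central 95% interval per age group. The trace below is a synthetic Dirichlet stand-in with hypothetical group labels, since it only illustrates the reduction, not the fitted `age_dist`:

```python
import numpy as np
import pandas as pd

# Synthetic (samples x groups) stand-in for model_june.age_dist.trace();
# the group labels are hypothetical.
rng = np.random.default_rng(42)
trace = rng.dirichlet(np.ones(16), size=10000)
age_dist = pd.DataFrame(trace, columns=[f'group_{i}' for i in range(16)])

summary = pd.DataFrame({
    'mean': age_dist.mean(),
    'lo_2.5': age_dist.quantile(0.025),
    'hi_97.5': age_dist.quantile(0.975),
})
```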
In [154]:
plt.figure(figsize=(8,12))
Matplot.summary_plot(model_july.vaccination_target)
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [155]:
plt.figure(figsize=(8,12))
Matplot.summary_plot(model_july.vaccination_target)
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [157]:
Matplot.plot(model_june.alpha)
Plotting alpha

Mapping spatial effects

In [240]:
from mpl_toolkits.basemap import Basemap
import geopandas as gpd

lllat=-24
urlat=-23.3
lllon=-47
urlon=-46.3

SP_base = Basemap(ax=None, lon_0=(urlon + lllon) / 2, lat_0=(urlat + lllat) / 2,
        llcrnrlat=lllat, urcrnrlat=urlat, llcrnrlon=lllon, urcrnrlon=urlon, 
                  resolution='i',
                 epsg='4326')
In [241]:
SP_dist = gpd.GeoDataFrame.from_file('Sao Paulo/Brazil_full/BRA_adm3.shp').to_crs({'proj': 'longlat', 
                                                                               'ellps': 'WGS84', 
                                                                               'datum': 'WGS84'})
In [242]:
SP_dist['DIST_NAME'] = [trans.trans(_).upper() for _ in SP_dist.NAME_3]
In [243]:
SP_dist.head()
Out[243]:
ENGTYPE_3 HASC_3 ID_0 ID_1 ID_2 ID_3 ISO NAME_0 NAME_1 NAME_2 ... NL_NAME_3 REMARKS_3 Shape_Area Shape_Leng TYPE_3 VALIDFR_3 VALIDTO_3 VARNAME_3 geometry DIST_NAME
0 District None 32 434 4112 4868 BRA Brazil Bahia Piritiba ... None None 0.041872 1.033621 Distrito Unknown Present None POLYGON ((-40.38711999999998 -11.6031739999999... PIRITIBA
1 District None 32 434 4112 4869 BRA Brazil Bahia Piritiba ... None None 0.008318 0.375198 Distrito Unknown Present None POLYGON ((-40.62181199999998 -11.7807669999999... PORTO FELIZ
2 District None 32 434 4113 4870 BRA Brazil Bahia Planaltino ... None None 0.024796 0.751948 Distrito Unknown Present None POLYGON ((-40.11320899999998 -13.3009799999998... IBITIGUIRA
3 District None 32 434 4113 4871 BRA Brazil Bahia Planaltino ... None None 0.027577 0.776489 Distrito Unknown Present None POLYGON ((-40.15503999999993 -13.1252719999998... NOVA ITAIPE
4 District None 32 434 4113 4872 BRA Brazil Bahia Planaltino ... None None 0.025845 1.052880 Distrito Unknown Present None POLYGON ((-40.38104599999997 -13.2141219999999... PLANALTINO

5 rows × 21 columns

In [244]:
SP_dist.NAME_3.unique()
Out[244]:
array(['Piritiba', 'Porto Feliz', 'Ibitiguira', ..., 'Tupiratins',
       'Wanderlândia', 'Xambioá'], dtype=object)
In [245]:
district_names
Out[245]:
array(['BRAS', 'BARRA FUNDA', 'FREGUESIA DO O', 'CAMBUCI', 'PENHA',
       'BRAZILANDIA', 'SANTA CECILIA', 'CASA VERDE', 'CAPAO REDONDO',
       'CONSOLACAO', 'JAGUARE', 'CIDADE ADEMAR', 'CIDADE TIRADENTES',
       'SAPOPEMBA', 'MOOCA', 'CANGAIBA', 'SAUDE', 'SANTANA',
       'JARDIM ANGELA', 'CAMPO LIMPO', 'VILA MARIANA', 'VILA CURUCA',
       'CIDADE DUTRA', 'ARTUR ALVIM', 'JARDIM HELENA', 'SE', 'LAPA',
       'JARDIM PAULISTA', 'JABAQUARA', 'PINHEIROS', 'RIO PEQUENO',
       'GRAJAU', 'VILA SONIA', 'IGUATEMI', 'SANTO AMARO', 'LIMAO',
       'PERDIZES', 'SAO DOMINGOS', 'SAO LUCAS', 'PIRITUBA',
       'RAPOSO TAVARES', 'VILA FORMOSA', 'BELEM', 'REPUBLICA',
       'SAO MATEUS', 'TREMEMBE', 'VILA PRUDENTE', 'VILA LEOPOLDINA',
       'BOM RETIRO', 'JOSE BONIFACIO', 'PARELHEIROS', 'PERUS', 'PEDREIRA',
       'JARAGUA', 'LAJEADO', 'SACOMA', 'ITAIM PAULISTA', 'TUCURUVI',
       'VILA MEDEIROS', 'LIBERDADE', 'TATUAPE', 'CARRAO', 'BELA VISTA',
       'BUTANTA', 'SAO RAFAEL', 'MORUMBI', 'AGUA RASA', 'MOEMA',
       'VILA ANDRADE', 'ANHANGUERA', 'VILA MARIA', 'IPIRANGA', 'SOCORRO',
       'CACHOEIRINHA', 'ARICANDUVA', 'CAMPO GRANDE', 'MANDAQUI',
       'ITAQUERA', 'SAO MIGUEL', 'JAGUARA', 'PARQUE DO CARMO', 'JACANA',
       'CIDADE LIDER', 'CAMPO BELO', 'VILA JACUI', 'ITAIM BIBI',
       'VILA GUILHERME', 'CURSINO', 'MARSILAC', 'GUAIANASES',
       'VILA MATILDE', 'PONTE RASA', 'PARI'], dtype=object)
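Before merging case counts onto the shapefile, it is worth checking which outbreak district names lack a counterpart in `DIST_NAME`; mismatches (like the `BRASILANDIA`/`BRAZILANDIA` fix earlier) silently drop rows in the default inner merge. A sketch with small hypothetical name sets:

```python
# Hypothetical name sets standing in for district_names and
# SP_dist.DIST_NAME; the real check would use those arrays.
shapefile_names = {'AGUA RASA', 'BRAS', 'PARI', 'PLANALTINO'}
outbreak_names = {'AGUA RASA', 'BRAS', 'PARI', 'VILAILA SONIA'}
unmatched = sorted(outbreak_names - shapefile_names)
```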
In [307]:
Rt_june = pd.Series(model_june.Rt_obs.stats()['mean'], index=sp_districts)
In [308]:
Rt_june
Out[308]:
BRAS              1.027835
BARRA FUNDA       1.777664
FREGUESIA DO O    0.160857
CAMBUCI           0.897236
PENHA             0.167268
BRAZILANDIA       0.181805
SANTA CECILIA     0.423634
CASA VERDE        0.305393
CAPAO REDONDO     0.267271
CONSOLACAO        0.383250
...
CIDADE LIDER      0.216627
CAMPO BELO        0.375009
VILA JACUI        0.148078
ITAIM BIBI        0.424089
VILA GUILHERME    0.283753
CURSINO           0.177329
MARSILAC          3.106626
GUAIANASES        0.316312
VILA MATILDE      0.207121
PONTE RASA        0.205254
PARI              1.745154
Length: 93, dtype: float64
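An R(t) above 1 marks a district where transmission is self-sustaining, so a natural next step is ranking districts against that threshold. A sketch using a handful of the posterior means printed above:

```python
import pandas as pd

# Subset of the Rt_june posterior means printed above; the full
# analysis would use the complete 93-district series.
rt = pd.Series({'BRAS': 1.027835, 'BARRA FUNDA': 1.777664,
                'FREGUESIA DO O': 0.160857, 'PENHA': 0.167268,
                'MARSILAC': 3.106626, 'PARI': 1.745154})
above_threshold = rt[rt > 1].sort_values(ascending=False)
```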
In [311]:
SP_dist_merged = SP_dist.merge(pd.DataFrame(Rt_june, columns=['Rt']), left_on='DIST_NAME', right_index=True)
In [276]:
measles_onset_conf = measles_data[CONFIRMED].groupby(['DISTRICT','ONSET']).size().unstack(level=0).fillna(0).sum()
In [296]:
measles_onset_conf
Out[296]:
DISTRICT
AGUA RASA         37
ALTO PINHEIROS    19
AMERICANOPOLIS     1
ANHANGUERA        22
ARICANDUVA        32
ARTUR ALVIM       68
BARRA FUNDA       37
BELA VISTA        83
BELEM             16
BOM RETIRO        40
...
VILA.CARRAO         1
VILA.JAGUARA        1
VILA.LEOPOLDINA     1
VILAILA ANDRADE     1
VILAILA CURUCA      1
VILAILA FORMOSA     1
VILAILA MARIA       1
VILAILA MATILDE     1
VILAILA MEDEIROS    1
VILAILA SONIA       3
VILAL FORMOSA       1
Length: 153, dtype: float64
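The tail of that index shows malformed names ('VILA.CARRAO', 'VILAILA ANDRADE', 'VILAL FORMOSA') that will never join against the shapefile's `DIST_NAME`. A sketch of a normalization pass, assuming the intended prefix in each case is 'VILA':

```python
import pandas as pd

# A few of the malformed names from the index above, plus one clean one.
names = pd.Index(['VILA.CARRAO', 'VILAILA ANDRADE', 'VILAL FORMOSA',
                  'AGUA RASA'])
cleaned = (names.str.replace('VILAILA', 'VILA', regex=False)
                .str.replace('VILAL', 'VILA', regex=False)
                .str.replace('.', ' ', regex=False))
```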
In [298]:
_rates = measles_onset_conf/sp_pop.sum(1)
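`_rates` is a raw proportion; attack rates are more conventionally reported per 100,000 population. A sketch using AGUA RASA's confirmed count (37) and total population (89,151) from the earlier outputs:

```python
import pandas as pd

# One district's values, taken from the outputs above.
cases = pd.Series({'AGUA RASA': 37})
pop = pd.Series({'AGUA RASA': 89151})
rate_per_100k = cases / pop * 1e5
```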
In [299]:
SP_dist_conf = SP_dist.merge(pd.DataFrame(_rates, columns=['rate']), left_on='DIST_NAME', right_index=True)
In [320]:
from matplotlib.pyplot import cm

map_fig = plt.figure(figsize=(16,12))
map_ax = plt.gca()
SP_base.drawcoastlines()
SP_base.drawrivers()

SP_dist_merged.plot(column='Rt', colormap=cm.Reds, axes=map_ax)
Out[320]:
<matplotlib.axes._subplots.AxesSubplot at 0x4723f3be0>