Uterine fibroids follow-up treatment meta-analysis

Our goal is to estimate the probabilities of requiring one of a suite of candidate follow-up treatments following randomization to a given initial treatment for uterine fibroids. Specifically, we are interested in estimating:

$$Pr(I_2|I_1 =i,T=t)$$

where $I_1$ is an initial intervention, which take specific values $i = 1, 2, \ldots , K$ for each of $K$ candidate intervention types, $I_2$ is the followup intervention that also may take any of the same values of $i$, and $T$ is followup time in months, which will generally be either 6 or 12 months.

Our current set of candidate interventions include:

  • Myomectomy
  • Hysterectomy
  • Ablation
  • UAE
  • Magnetic resonance imaging-guided high-intensity focused ultrasound (MRIgFUS)
  • Ablation +/- hysteroscopic myomectomy
  • No intervention

Rather than model each conditional probability independently, we will instead model the outcomes for a treatment arm as a multinomial random variable. That is,

$$\{X_{I_2} \} ∼ \text{Multinomial}(N_{I_1}=i, \{\pi_i\})$$

where $\{X_{I_2}\}$ is the vector of outcomes corresponding to each of the possible followup interventions listed above, $N_{I_1}=i$ is the number of women randomized to the initial intervention i, and $\{\pi_i\}$ is a vector of conditional transition probabilities corresponding to $Pr(I_2|I_1 = i, T = t)$, as specified above. The multinomial distribution is a multivariate generalization of the categorical distribution, which is what the above simplifies to when modeling the outcome for a single patient. The multivariate formulation allows us to model study-arm-specific outcomes, incorporating covariates that are specific to that arm or study.

The quantities of interest are the vectors of transition probabilities $\{\pi_i\}$ corresponding to each of the initial candidate interventions. A naive approach to modeling these is to assign a vague Dirichlet prior distribution to each set, and perform Bayesian inference using the multinomial likelihood, with which the Dirichlet is conjugate, to yield posterior estimates for each probability. However, there may be additional information with which to model these probabilities, which may include:

  • followup time for each study
  • arm-specific demographic covariates (e.g. race, mean age)
  • study-specific random effects

hence, a given transition probability $\pi_{ijk}$ – the probability of transitioning from initial intervention $i$ to followup intervention $j$ in study $k$ – may be modeled as:

$$\text{logit}(\pi_{ijk})= \theta_{ij} + X_k \beta_{ij} + \epsilon_k$$

where $\theta_{ij}$ is a baseline transition probability (on the logit scale), $X_k$ a matrix of study(-arm)-specific covariates, $\beta_{ij}$ the corresponding coefficients, and $\epsilon_k$ a mean-zero random effect for study k. We will initially consider (1) follow-up time and (2) mean/median age as covariates.

An attractive benefit to using Bayesian inference to estimate this model is that it is easy to generate predictions from the model, via the posterior predictive distribution. For example, we could estimate the distribution of the expected proportion of women requiring a particular followup intervention; this estimate would factor in both the residual uncertainty in the transition probability estimates, as well as the sampling uncertainty of the intervention.

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import pymc as pm
import seaborn as sns
import pdb
sns.set()

Import data from worksheets in Excel spreadsheet.

In [2]:
data_file = 'UF Subsequent Interventions Data_Master_updated.xlsx'
In [3]:
missing = ['NA', 'NR', 'ND', '?', 'null']

misc_data = pd.read_excel('data/' + data_file, sheetname='MISC (SP)', na_values=missing)
misc_data = misc_data[~misc_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = misc_data.shape
print('Occlusion rows={0}, columns={1}, missing={2}'.format(rows, cols,
                                                        misc_data.isnull().sum().sum()))

med_vs_iac_data = pd.read_excel('data/' + data_file, sheetname='Med vs IAC JW', na_values=missing)
med_vs_iac_data = med_vs_iac_data[~med_vs_iac_data['trial_arm'].isnull()].drop('notes', axis=1)
rows, cols = med_vs_iac_data.shape
print('Med vs IAC rows={0}, columns={1}, missing={2}'.format(rows, cols, 
                                                            med_vs_iac_data.isnull().sum().sum()))

med_vs_med_data = pd.read_excel('data/' + data_file, sheetname='Med vs Med DVE', na_values=missing)
med_vs_med_data = med_vs_med_data[~med_vs_med_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = med_vs_med_data.shape
print('Med vs Med rows={0}, columns={1}, missing={2}'.format(rows, cols, 
                                                            med_vs_med_data.isnull().sum().sum()))

uae_data = pd.read_excel('data/' + data_file, sheetname='UAE SK')
uae_data = uae_data[~uae_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = uae_data.shape
print('UAE rows={0}, columns={1}, missing={2}'.format(rows, cols, 
                                                            uae_data.isnull().sum().sum()))

datasets = [misc_data, med_vs_iac_data, med_vs_med_data, uae_data]
Occlusion rows=31, columns=13, missing=6
Med vs IAC rows=49, columns=13, missing=46
Med vs Med rows=67, columns=13, missing=13
UAE rows=32, columns=13, missing=0
In [4]:
unique_inerventions = set(np.concatenate([d.intervention.values for d in datasets]))

Use the following lookup table to create "intervention category" field in each dataset.

In [5]:
# %load intervention_lookup.py
intervention_lookup = {'Ablation': 'ablation',
 'Ablation+/- hysteroscopic myomectomy': 'ablation',
 'Asoprisnil 10 mg': 'med_manage',
 'Asoprisnil 25 mg': 'med_manage',
 'Asoprisnil 5 mg': 'med_manage',
 'CD20 (Ulipristal)': 'med_manage',
 'CDB10 (Ulipristal)': 'med_manage',
 'Hysterectomy': 'hysterectomy',
 'LBCUV': 'uae',
 'LP + GnRH agonist plus raloxifene': 'med_manage',
 'LP + placebo': 'med_manage',
 'LPA+ MPA / LPA+placebo': 'med_manage',
 'LPA+ placebo / LPA+MPA': 'med_manage',
 'LUNA plus LBCUV': 'ablation',
 'Myomectomy': 'myomectomy',
 'No treatment': 'control',
 'No treatment (control)': 'control',
 'Placebo': 'control',
 'Raloxifene, 180mg/day': 'med_manage',
 'SC implant of 3.6 goserelin + placebo (3 months) then tibolone 2.5 mg daily (3 months)': 'med_manage',
 'SC implant of 3.6 goserelin + placebo (6 months)': 'med_manage',
 'SC implant of 3.6 goserelin + tibolone 2.5 mg daily (6 months)': 'med_manage',
 'Surgery': 'DROP',
 'Tibolone': 'med_manage',
 'UAE': 'uae',
 'UAE only': 'uae',
 'UAE plus goserelin acetate depot': 'uae',
 'buserelin + goserelin': 'med_manage',
 'buserelin, intranasal': 'med_manage',
 'cabergoline': 'med_manage',
 'diphereline': 'med_manage',
 'gestrinone, 2.5mg': 'med_manage',
 'gestrinone, 2.5mg oral + gestrinone, 5mg oral + gestrinone, 5mg vaginal': 'med_manage',
 'gestrinone, 5mg': 'med_manage',
 'gestrinone, 5mg vaginal': 'med_manage',
 'goserelin, subcutaneous': 'med_manage',
 'healthy controls': 'control',
 'hormone replacement therapy, transdermal': 'DROP',
 'hysterectomy or myomectomy': 'DROP',
 'letrozole, 2.5mg': 'med_manage',
 'leuprolide': 'med_manage',
 'leuprolide acetate depot (11.25 mg q 3 months) + Placebo': 'med_manage',
 'leuprolide acetate depot (11.25 mg q 3 months) + tibolone 2.5 mg/d orally': 'med_manage',
 'leuprolide acetate depot (3.75 mg/28 d) + placebo (B)': 'med_manage',
 'leuprolide plus (tibolone 2.5 mg daily) (A)': 'med_manage',
 'leuprolide plus MPA': 'med_manage',
 'leuprolide plus estrogen-progestin': 'med_manage',
 'leuprolide plus placebo': 'med_manage',
 'leuprolide plus progestin': 'med_manage',
 'leuprolide plus raloxifene 60 mg daily': 'med_manage',
 'leuprolide, 1.88mg': 'med_manage',
 'leuprolide, 3.75mg': 'med_manage',
 'mifepristone, 10mg': 'med_manage',
 'mifepristone, 10mg + mifepristone, 5mg': 'med_manage',
 'mifepristone, 2.5mg': 'med_manage',
 'mifepristone, 5mg': 'med_manage',
 'placebo': 'control',
 'raloxifene 180 mg daily': 'med_manage',
 'raloxifene 60 mg daily': 'med_manage',
 'tamoxifen 20 mg daily': 'med_manage',
 'tibolone': 'med_manage',
 'tibolone, 2.5mg': 'med_manage',
 'transdermal estrogen replacement therapy': 'med_manage',
 'triptorelin, 100ug': 'med_manage',
 'triptorelin, 100ug + triptorelin, 20ug + triptorelin, 5ug': 'med_manage',
 'triptorelin, 20ug': 'med_manage',
 'triptorelin, 3.6mg/mo': 'med_manage',
 'triptorelin, 5ug': 'med_manage',
 'ulipristal acetate followed by placebo': 'med_manage',
 'ulipristal acetate followed by progestin': 'med_manage',
 'ulipristal, 10mg': 'med_manage',
 'ulipristal, 5mg': 'med_manage',
 'HIFU': 'MRgFUS',
 'HIFU with CEUS': 'MRgFUS',
 'LUAO': 'uae',
 'UAE plus PVA': 'uae',
 'UAE plus TAG': 'uae',
 'UAE with PVA': 'uae',
 'UAE with PVA particles, large': 'uae',
 'UAE with PVA particles, small': 'uae',
 'UAE with SPA': 'uae',
 'UAE with SPVA': 'uae',
 'UAE with TAG': 'uae',
 'UAE with TAG microspheres': 'uae',
 'myomectomy': 'myomectomy',
 'myomectomy with vasopressin': 'myomectomy',
 'myomectomy, abdominal': 'myomectomy',
 'myomectomy, laparoscopic': 'myomectomy',
 'myomectomy, loop ligation with vasopressin': 'myomectomy',
 'myomectomy, minilaparotomic': 'myomectomy'}

Assign intervention categories to each arm

In [6]:
datasets = [d.assign(intervention_cat=d.intervention.replace(intervention_lookup)) for d in datasets]
In [7]:
intervention_categories = set(intervention_lookup.values())
intervention_categories
Out[7]:
{'DROP',
 'MRgFUS',
 'ablation',
 'control',
 'hysterectomy',
 'med_manage',
 'myomectomy',
 'uae'}

Import demographic information

In [8]:
demographics = pd.read_excel('data/' + data_file, sheetname='ALL_DEMO_DATA', na_values=missing)
demographics.columns
Out[8]:
Index(['study_id', 'Citation', 'FamCode', 'FamDesig', 'NCT', 'ArmsN',
       'ArmCategory', 'Group_Desc', 'New Grouping', 'Demo_Category',
       'Demo_specify', 'BL N', 'Denom_N', 'BL %', 'BL Mean', 'BL SD', 'BL_SE',
       'BL_Median', 'BL Min', 'BL Max', 'BL 95% L', 'BL 95% H',
       'BL_group_diff', 'Comments'],
      dtype='object')

Extract columns of interest

In [9]:
age_data = demographics.loc[demographics.Demo_Category=='Age', ['study_id', 'New Grouping', 'BL Mean', 'BL SD']]

Clean arm labels

In [10]:
age_data = age_data.assign(arm=age_data['New Grouping'].str.replace(':','')).drop('New Grouping', axis=1)
In [11]:
age_data.arm.unique()
Out[11]:
array(['G2', 'G1', 'G1b', 'G1a', 'G3', 'CG', 'G1c', 'G1+G2', 'G1a+G1b+G1c'], dtype=object)

Concatenate all datasets

In [12]:
all_data = pd.concat(datasets)

Clean up study arm field

In [13]:
all_arm = all_data.trial_arm.str.replace(':','').str.replace(' ', '').str.replace('Group', 'G')
all_data = all_data.assign(arm=all_arm).drop('trial_arm', axis=1)
In [14]:
all_data.arm.unique()
Out[14]:
array(['G1', 'G2', 'G3', 'CG', 'G1a', 'G1b', 'G1c', 'CG1', 'CG2', 'G1/CG',
       'CG/G1', 'G1a+G1b', 'G1a+G1b+G1c', 'G1+G2'], dtype=object)

Clean up study ID field. Currently contains non-numeric entries. Will strip out the first study ID from the compund labels, as this is the parent study ID.

In [15]:
all_data.study_id.unique()
Out[15]:
array([23, 347, 1400, 1529, 1806, 1889, 2375, 2967, 3382, 3690, 3785, 5186,
       5474, 414, 1849, 3016, 3181, 3324, 3674, 4258, 4468, 4858, 4960,
       5276, 5302, 6091, 6263, 6696, 7155, 7504, 7797, 7936, 95.0, 629.0,
       757.0, 1290.0, 2318.0, 2555.0, 2635.0, 3312.0, 3978.0, 4787.0,
       4961.0, 5721.0, 6393.0, 6903.0, 7139.0, 7309.0, 7530.0, 7589.0,
       7763.0, '3803_3052', 1546, '3365_2026_1657_986',
       '3819_815_1986_2759_2971_\n3120_3175_3192_3678_3721', 4789, 2006], dtype=object)
In [16]:
str_mask = all_data.study_id.str.isnumeric()==False
all_data.loc[str_mask, 'study_id'] = all_data.study_id[str_mask].apply(lambda x: x[:x.find('_')])
all_data.study_id = all_data.study_id.astype(int)
In [17]:
all_data.study_id.unique()
Out[17]:
array([  23,  347, 1400, 1529, 1806, 1889, 2375, 2967, 3382, 3690, 3785,
       5186, 5474,  414, 1849, 3016, 3181, 3324, 3674, 4258, 4468, 4858,
       4960, 5276, 5302, 6091, 6263, 6696, 7155, 7504, 7797, 7936,   95,
        629,  757, 1290, 2318, 2555, 2635, 3312, 3978, 4787, 4961, 5721,
       6393, 6903, 7139, 7309, 7530, 7589, 7763, 3803, 1546, 3365, 3819,
       4789, 2006])

Here is what the data look like after merging.

In [18]:
all_data.head()
Out[18]:
study_id intervention baseline_n followup_interval followup_n hysterectomy myomectomy uae MRIgFUS ablation iud no_treatment intervention_cat arm
0 23 HIFU with CEUS 17 12 17 0 0 0 1 0 0 16 MRgFUS G1
1 23 HIFU 16 12 16 0 0 0 3 0 0 13 MRgFUS G2
2 347 UAE with SPVA 30 12 27 1 0 0 0 0 0 26 uae G1
3 347 UAE with TAG 30 12 29 0 0 0 0 0 0 29 uae G2
4 1400 UAE 63 6 62 0 1 5 0 0 0 56 uae G1
In [19]:
all_data.groupby('intervention_cat')['study_id'].count()
Out[19]:
intervention_cat
DROP              8
MRgFUS            2
ablation          3
control          11
hysterectomy      7
med_manage      100
myomectomy       14
uae              34
Name: study_id, dtype: int64

Merge age data with outcomes

In [20]:
all_data_merged = pd.merge(all_data, age_data, on=['study_id', 'arm'])

For now, drop arms with no reported followup time (we may want to impute these):

In [21]:
all_data_merged = all_data_merged.dropna(subset=['followup_interval'])

Parse followup intervals that are ranges, creating fup_min and fup_max fields.

In [22]:
dataset = all_data_merged.assign(fup_min=0, fup_max=all_data.followup_interval.convert_objects(convert_numeric=True).max()+1)
range_index = dataset.followup_interval.str.contains('to').notnull()
range_vals = dataset[range_index].followup_interval.apply(lambda x: x.split(' '))
dataset.loc[range_index, ['fup_min']] = range_vals.apply(lambda x: float(x[0]))
dataset.loc[range_index, ['fup_max']] = range_vals.apply(lambda x: float(x[-1]))
dataset.loc[range_index, ['followup_interval']] = 30.33
dataset['followup_interval'] = dataset.followup_interval.astype(float)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:1: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  if __name__ == '__main__':
In [23]:
dataset.head()
Out[23]:
study_id intervention baseline_n followup_interval followup_n hysterectomy myomectomy uae MRIgFUS ablation iud no_treatment intervention_cat arm BL Mean BL SD fup_max fup_min
0 23 HIFU with CEUS 17 12 17 0 0 0 1 0 0 16 MRgFUS G1 43.1 5.3 61 0
1 23 HIFU 16 12 16 0 0 0 3 0 0 13 MRgFUS G2 42 5.4 61 0
2 347 UAE with SPVA 30 12 27 1 0 0 0 0 0 26 uae G1 43.9 5.0 61 0
3 347 UAE with TAG 30 12 29 0 0 0 0 0 0 29 uae G2 41.7 5.4 61 0
4 1400 UAE 63 6 62 0 1 5 0 0 0 56 uae G1 41 3.5 61 0

Fill missing values

In [24]:
dataset.loc[dataset.followup_n.isnull(), 'followup_n'] = dataset.loc[dataset.followup_n.isnull(), 'baseline_n']
In [25]:
dataset.loc[dataset.no_treatment.isnull(), 'no_treatment'] = dataset.followup_n - dataset[[ 'hysterectomy', 'myomectomy', 'uae',
                                                        'MRIgFUS', 'ablation', 'iud']].sum(1)[dataset.no_treatment.isnull()]
In [26]:
dataset.followup_interval.unique()
Out[26]:
array([ 12.  ,   6.  ,  30.33,  24.  ,   2.  ,   1.  ,   3.  ,   5.5 ,
         9.  ,  18.  ,   0.  ,   7.  ,  60.  ])
In [27]:
dataset['BL Mean'].unique()
Out[27]:
array([43.1, 42, 43.9, 41.7, 41, 43.5, 40.3, 42.7, 45, 44, 38.26, 32.1,
       34.3, 44.9, 42.5, 43.3, 38.4, 37.5, 45.9, 44.5, 33.97, 34, 41.3,
       42.9, 42.1, 43.4, 37.7, 43, 40.2, 41.1, 49.1, 48.6, 36.3, 35.9,
       37.2, 54.2, 51.2, 43.6, nan, 38.9, 37.1, 41.4, 36.9, 41.6, 39, 39.6,
       39.67, 36.87, 30.94, 31, 39.5, 42.8, 56.2, 57.9, 50.2, 50.6, 34.4,
       42.2, 49.2, 32.6, 48.4, 33.8, 38.1, 37, 32.3, 43.2, 44.6, 45.4,
       46.4, 48.5, 48.3], dtype=object)
In [28]:
crossover_studies = 7155, 3324, 414, 95, 7139, 6903, 3721, 3181, 4858, 4960, 4258, 4789, 2006, 2318
In [29]:
outcome_cats = [ 'hysterectomy', 'myomectomy', 'uae',
       'MRIgFUS', 'ablation', 'iud', 'no_treatment']
In [30]:
dataset.loc[dataset['BL Mean'].isnull(), 'BL Mean'] = 90
In [58]:
from numpy.ma import masked_values


def specify_model(intervention):
    
    '''
    Data preparation
    '''
    
    intervention_data = dataset[(dataset.intervention_cat==intervention)
                            & ~dataset[outcome_cats].isnull().sum(axis=1).astype(bool)]
    
    followup_masked = masked_values(intervention_data.followup_interval.values, 30.33)
    followup_min, followup_max = intervention_data[['fup_min', 'fup_max']].values.T

    outcomes = intervention_data[[ 'hysterectomy', 'myomectomy', 'uae',
           'MRIgFUS', 'ablation', 'iud', 'no_treatment']].values
    
    if np.isnan(outcomes).any():
        print('Missing values in outcomes for', intervention)

    followup_n = intervention_data.followup_n.values

    # Center age at 40
    age_masked = masked_values(intervention_data['BL Mean'].values - 40, 50)
    
    studies = intervention_data.study_id.unique()
    study_index = np.array([np.argwhere(studies==i).squeeze() for i in intervention_data.study_id])

    study_id = intervention_data.study_id.values
    
    n_studies = len(set(study_id))

    n_outcomes = 7
    arms = len(outcomes)
    
    '''
    Model specification
    '''
            
    
    # Impute followup times uniformly over the observed range
    if np.any(followup_masked.mask):
        followup_time = pm.Uniform('followup_time', followup_min.min(), followup_max.max(), 
                                   observed=True, 
                                   value=followup_masked)
    else:
        followup_time = followup_masked.data.astype(float)

    # Impute age using a T-distribution
    if np.any(age_masked.mask):
        nu = pm.Exponential('nu', 0.01, value=10)
        age_centered = pm.T('age_centered', nu, observed=True, value=age_masked)
    else:
        age_centered = age_masked.data.astype(float)

    # Mean probabilities (on logit scale)
    μ = pm.Normal('μ', 0, 0.01, value=[-1.]*n_outcomes)
    # Followup time covariates 
    β_fup = pm.Normal('β_fup', 0, 1e-6, value=[-0.5]*6 + [0])
    # Age covariate
    β_age = pm.Normal('β_age', 0, 1e-6, value=[-0.5]*6 + [0])

    # Study random effect
    σ = pm.Uniform('σ', 0, 100, value=1)
    ϵ = pm.Normal('ϵ', 0, σ**-2, value=np.zeros(n_studies))

    # Expected value (on logit scale)
    θ_uae = [pm.Lambda('θ_uae_%i' % i, lambda μ=μ, β_fup=β_fup, β_age=β_age, ϵ=ϵ, t=followup_time, a=age_centered: 
            np.exp(μ + β_fup*t[i] + β_age*a[i] + ϵ[study_index[i]])) 
                     for i in range(arms)]

    # Inverse-logit transformation to convert to probabilities
    π = [pm.Lambda('π_%i' % i, lambda t=t: t/t.sum()) for i,t in enumerate(θ_uae)]

    # Multinomial data likelihood
    y = [pm.Multinomial('y_%i' % i, outcomes[i].sum(), π[i], 
                                 value=outcomes[i], observed=True) for i in range(arms)]
    p_6 = pm.Lambda('p_6', lambda μ=μ, β_fup=β_fup: np.exp(μ + β_fup*6)/np.exp(μ + β_fup*6).sum())
    p_12 = pm.Lambda('p_12', lambda μ=μ, β_fup=β_fup: np.exp(μ + β_fup*12)/np.exp(μ + β_fup*12).sum())
    p_6_50 = pm.Lambda('p_6_50', lambda μ=μ, β_fup=β_fup, β_age=β_age: 
                       np.exp(μ + β_fup*6 + β_age*10)/np.exp(μ + β_fup*6 + β_age*10).sum())
        
    return locals()

Instantiate models

In [59]:
n_iterations = 50000
n_burn = 40000
n_chains = 2
In [ ]:
uae_model = pm.MCMC(specify_model('uae'))
for chain in range(n_chains):
    print('\nchain',chain+1)
    uae_model.sample(n_iterations, n_burn)
chain 1
 [-----------------100%-----------------] 50000 of 50000 complete in 2139.6 sec
chain 2
 [---------------- 43%                  ] 21706 of 50000 complete in 9302.4 sec
In [41]:
plot_labels = dataset.columns[5:12]
In [45]:
uae_model.followup_time.trace()
Out[45]:
array([[ 18.58231743,  28.32509498,  11.52847809,  58.35255279],
       [ 18.58231743,  28.32509498,  11.52847809,  58.35255279],
       [ 18.58231743,  28.32509498,  11.52847809,  58.35255279],
       ..., 
       [  6.41701153,  55.6964252 ,  19.74318293,  44.20362134],
       [  6.41701153,  55.6964252 ,  19.74318293,  44.20362134],
       [  6.41701153,  55.6964252 ,  19.74318293,  44.20362134]])
In [46]:
pm.Matplot.summary_plot(uae_model.followup_time)
In [47]:
pm.Matplot.summary_plot(uae_model.μ, custom_labels=plot_labels)

Follow-up time effect size estimates. Positive values indicate higher probability of event with increased follow-up time.

In [48]:
pm.Matplot.summary_plot(uae_model.β_fup, custom_labels=plot_labels)

Age effect size estimates. Positive values suggest higher probability of event with each year above age 40.

In [55]:
pm.forestplot(trace_uae, vars=['β_age'], ylabels=plot_labels)
Out[55]:
<matplotlib.gridspec.GridSpec at 0x12584a4e0>

Estimated probabilities of follow-up interventions for 6-month followup and age 40.

In [57]:
pm.forestplot(trace_uae, vars=['p_6'], ylabels=plot_labels)
Out[57]:
<matplotlib.gridspec.GridSpec at 0x128ada4a8>
In [58]:
pm.summary(trace_uae, vars=['p_6'])
p_6:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.023            0.007            0.000            [0.012, 0.034]
  0.033            0.007            0.000            [0.020, 0.048]
  0.016            0.004            0.000            [0.009, 0.025]
  0.000            0.002            0.000            [0.000, 0.001]
  0.000            0.001            0.000            [0.000, 0.001]
  0.000            0.004            0.000            [0.000, 0.001]
  0.927            0.013            0.001            [0.907, 0.948]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.013          0.019          0.022          0.026          0.036
  0.020          0.027          0.032          0.037          0.048
  0.009          0.013          0.016          0.019          0.026
  0.000          0.000          0.000          0.000          0.002
  0.000          0.000          0.000          0.000          0.002
  0.000          0.000          0.000          0.000          0.002
  0.903          0.921          0.928          0.934          0.945

Estimated probabilities of follow-up interventions for 12-month followup and age 40.

In [59]:
pm.forestplot(trace_uae, vars=['p_12'], ylabels=plot_labels)
Out[59]:
<matplotlib.gridspec.GridSpec at 0x11eb50a58>
In [60]:
pm.summary(trace_uae, vars=['p_12'])
p_12:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.029            0.008            0.000            [0.015, 0.042]
  0.032            0.006            0.000            [0.020, 0.044]
  0.019            0.005            0.000            [0.010, 0.028]
  0.000            0.002            0.000            [0.000, 0.001]
  0.000            0.001            0.000            [0.000, 0.001]
  0.000            0.004            0.000            [0.000, 0.001]
  0.920            0.013            0.001            [0.899, 0.940]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.016          0.024          0.028          0.033          0.044
  0.021          0.027          0.031          0.036          0.045
  0.011          0.016          0.019          0.022          0.029
  0.000          0.000          0.000          0.000          0.002
  0.000          0.000          0.000          0.000          0.002
  0.000          0.000          0.000          0.000          0.001
  0.897          0.914          0.921          0.927          0.938

Estimated probabilities of follow-up interventions for 12-month followup and age 50.

In [61]:
pm.forestplot(trace_uae, vars=['p_6_50'], ylabels=plot_labels)
Out[61]:
<matplotlib.gridspec.GridSpec at 0x1286a26a0>
In [62]:
pm.summary(trace_uae, vars=['p_6_50'])
p_6_50:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.089            0.043            0.003            [0.007, 0.168]
  0.001            0.002            0.000            [0.000, 0.002]
  0.012            0.007            0.000            [0.001, 0.026]
  0.001            0.003            0.000            [0.000, 0.005]
  0.025            0.048            0.004            [0.000, 0.111]
  0.173            0.255            0.024            [0.000, 0.788]
  0.699            0.219            0.020            [0.170, 0.917]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.014          0.061          0.085          0.112          0.185
  0.000          0.001          0.001          0.001          0.003
  0.002          0.007          0.011          0.016          0.030
  0.000          0.000          0.000          0.000          0.008
  0.000          0.001          0.007          0.026          0.156
  0.000          0.011          0.040          0.211          0.849
  0.131          0.637          0.791          0.847          0.902

In [45]:
med_manage_model = specify_model(med_manage_model, 'med_manage')
Applied log-transform to nu and added transformed nu_log to model.
Applied interval-transform to σ and added transformed σ_interval to model.
In [46]:
with med_manage_model:
        
    trace_med_manage = pm.sample(2000)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: ϵ
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:571: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
  handle_disconnected(rval[i])
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: ϵ
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:571: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
  handle_disconnected(rval[i])
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: σ_interval
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:571: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
  handle_disconnected(rval[i])
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: σ_interval
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:571: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
  handle_disconnected(rval[i])
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: β_age
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: β_age
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:571: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: <DisconnectedType>
  handle_disconnected(rval[i])
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: β_fup
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: μ
  handle_disconnected(elem)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gradient.py:545: UserWarning: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: age_centered_missing
  handle_disconnected(elem)
Assigned NUTS to followup_time_missing
Assigned NUTS to nu_log
Assigned NUTS to age_centered_missing
Assigned NUTS to μ
Assigned NUTS to β_fup
Assigned NUTS to β_age
Assigned NUTS to σ_interval
Assigned NUTS to ϵ
Assigned NUTS to followup_time_missing
Assigned NUTS to nu_log
Assigned NUTS to age_centered_missing
Assigned NUTS to μ
Assigned NUTS to β_fup
Assigned NUTS to β_age
Assigned NUTS to σ_interval
Assigned NUTS to ϵ
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-46-5f5abf71fe8b> in <module>()
      1 with med_manage_model:
      2 
----> 3     trace_med_manage = pm.sample(2000)

/Users/fonnescj/Repositories/pymc3/pymc3/sampling.py in sample(draws, step, start, trace, chain, njobs, tune, progressbar, model, random_seed)
    122     model = modelcontext(model)
    123 
--> 124     step = assign_step_methods(model, step)
    125 
    126     if njobs is None:

/Users/fonnescj/Repositories/pymc3/pymc3/sampling.py in assign_step_methods(model, step, methods)
     67 
     68     # Instantiate all selected step methods
---> 69     steps += [s(vars=selected_steps[s]) for s in selected_steps if selected_steps[s]]
     70 
     71     if len(steps)==1:

/Users/fonnescj/Repositories/pymc3/pymc3/sampling.py in <listcomp>(.0)
     67 
     68     # Instantiate all selected step methods
---> 69     steps += [s(vars=selected_steps[s]) for s in selected_steps if selected_steps[s]]
     70 
     71     if len(steps)==1:

/Users/fonnescj/Repositories/pymc3/pymc3/step_methods/nuts.py in __init__(self, vars, scaling, step_scale, is_cov, state, Emax, target_accept, gamma, k, t0, model, profile, **kwargs)
     67 
     68         if isinstance(scaling, dict):
---> 69             scaling = guess_scaling(Point(scaling, model=model), model=model, vars = vars)
     70 
     71 

/Users/fonnescj/Repositories/pymc3/pymc3/tuning/scaling.py in guess_scaling(point, vars, model)
    107     model = modelcontext(model)
    108     try:
--> 109         h = find_hessian_diag(point, vars, model=model)
    110     except NotImplementedError:
    111         h = fixed_hessian(point, vars, model=model)

/Users/fonnescj/Repositories/pymc3/pymc3/tuning/scaling.py in find_hessian_diag(point, vars, model)
    101     """
    102     model = modelcontext(model)
--> 103     H = model.fastfn(hessian_diag(model.logpt, vars))
    104     return H(Point(point, model=model))
    105 

/Users/fonnescj/Repositories/pymc3/pymc3/memoize.py in memoizer(*args, **kwargs)
     12 
     13         if key not in cache:
---> 14             cache[key] = obj(*args, **kwargs)
     15 
     16         return cache[key]

/Users/fonnescj/Repositories/pymc3/pymc3/theanof.py in hessian_diag(f, vars)
    101 
    102     if vars:
--> 103         return -t.concatenate([hessian_diag1(f, v) for v in vars], axis=0)
    104     else:
    105         return empty_gradient

/Users/fonnescj/Repositories/pymc3/pymc3/theanof.py in <listcomp>(.0)
    101 
    102     if vars:
--> 103         return -t.concatenate([hessian_diag1(f, v) for v in vars], axis=0)
    104     else:
    105         return empty_gradient

/Users/fonnescj/Repositories/pymc3/pymc3/theanof.py in hessian_diag1(f, v)
     92         return gradient1(g[i], v)[i]
     93 
---> 94     return theano.map(hess_ii, idx)[0]
     95 
     96 

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/scan_module/scan_views.py in map(fn, sequences, non_sequences, truncate_gradient, go_backwards, mode, name)
     67                      go_backwards=go_backwards,
     68                      mode=mode,
---> 69                      name=name)
     70 
     71 

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/scan_module/scan.py in scan(fn, sequences, outputs_info, non_sequences, n_steps, truncate_gradient, go_backwards, mode, name, profile, allow_gc, strict)
    473                 # If not we need to use copies, that will be replaced at
    474                 # each frame by the corresponding slice
--> 475                 actual_slice = seq['input'][k - mintap]
    476                 _seq_val = tensor.as_tensor_variable(seq['input'])
    477                 _seq_val_slice = _seq_val[k - mintap]

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/tensor/var.py in __getitem__(self, args)
    529                     self, *theano.tensor.subtensor.Subtensor.collapse(
    530                         args,
--> 531                         lambda entry: isinstance(entry, Variable)))
    532 
    533     def take(self, indices, axis=None, mode='raise'):

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    649                 thunk.outputs = [storage_map[v] for v in node.outputs]
    650 
--> 651                 required = thunk()
    652                 assert not required  # We provided all inputs
    653 

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gof/op.py in rval()
    852 
    853         def rval():
--> 854             fill_storage()
    855             for o in node.outputs:
    856                 compute_map[o][0] = True

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/theano/gof/cc.py in __call__(self)
   1712                 print(self.error_storage, file=sys.stderr)
   1713                 raise
-> 1714             reraise(exc_type, exc_value, exc_trace)
   1715 
   1716 

/Users/fonnescj/anaconda3/lib/python3.5/site-packages/six.py in reraise(tp, value, tb)
    684         if value.__traceback__ is not tb:
    685             raise value.with_traceback(tb)
--> 686         raise value
    687 
    688 else:

IndexError: index out of bounds
In [43]:
med_manage_model.vars
Out[43]:
[followup_time_missing,
 nu_log,
 age_centered_missing,
 μ,
 β_fup,
 β_age,
 σ_interval,
 ϵ]
In [44]:
pm.forestplot(trace_med_manage, vars=['age_centered_missing'], ylabels=plot_labels)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-44-ed6e9ecfa6a7> in <module>()
----> 1 pm.forestplot(trace_med_manage, vars=['age_centered_missing'], ylabels=plot_labels)

/Users/fonnescj/Repositories/pymc3/pymc3/plots.py in forestplot(trace_obj, vars, alpha, quartiles, rhat, main, xtitle, xrange, ylabels, chain_spacing, vline, gs)
    378 
    379             # Substitute HPD interval for quantile
--> 380             quants[0] = var_hpd[0].T
    381             quants[-1] = var_hpd[1].T
    382 

IndexError: index 0 is out of bounds for axis 0 with size 0
In [40]:
pm.summary(trace_med_manage, vars=['p_6', 'p_12'])
p_6:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.552            0.000            0.000            [0.552, 0.552]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.552          0.552          0.552          0.552          0.552


p_12:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.075            0.000            0.000            [0.075, 0.075]
  0.552            0.000            0.000            [0.552, 0.552]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.075          0.075          0.075          0.075          0.075
  0.552          0.552          0.552          0.552          0.552

In [39]:
pm.forestplot(trace_med_manage, vars=['μ'], ylabels=plot_labels)
Out[39]:
<matplotlib.gridspec.GridSpec at 0x116f27c18>
In [80]:
def get_data(intervention):
    
    intervention_data = dataset[(dataset.intervention_cat==intervention)
                            & ~dataset[outcome_cats].isnull().sum(axis=1).astype(bool)]
    
    followup_masked = masked_values(intervention_data.followup_interval.values, np.nan)
    followup_min, followup_max = intervention_data[['fup_min', 'fup_max']].values.T

    outcomes = intervention_data[[ 'hysterectomy', 'myomectomy', 'uae',
           'MRIgFUS', 'ablation', 'iud', 'no_treatment']].values
    
    if np.isnan(outcomes).any():
        print('Missing values in outcomes for', intervention)

    followup_n = intervention_data.followup_n.values

    # Center age at 40
    age_masked = masked_values(intervention_data['BL Mean'].values - 40, np.nan)
    
    studies = intervention_data.study_id.unique()
    study_index = np.array([np.argwhere(studies==i).squeeze() for i in intervention_data.study_id])

    study_id = intervention_data.study_id.values
    
    n_studies = len(set(study_id))

    n_outcomes = 7
    arms = len(outcomes)    
    intervention_data = dataset[(dataset.intervention_cat==intervention)
                            & ~dataset[outcome_cats].isnull().sum(axis=1).astype(bool)]
    
    followup_masked = masked_values(intervention_data.followup_interval.values, 17.33)
    followup_min, followup_max = intervention_data[['fup_min', 'fup_max']].values.T

    outcomes = intervention_data[[ 'hysterectomy', 'myomectomy', 'uae',
           'MRIgFUS', 'ablation', 'iud', 'no_treatment']].values

    followup_n = intervention_data.followup_n.values

    age_masked = masked_values(intervention_data['BL Mean'].values, 41.33)
    
    studies = intervention_data.study_id.unique()
    study_index = np.array([np.argwhere(studies==i).squeeze() for i in intervention_data.study_id])

    study_id = intervention_data.study_id.values
    
    n_studies = len(set(study_id))

    n_outcomes = 7
    arms = len(outcomes)
    
    return locals()
In [81]:
foo = get_data('med_manage')
In [84]:
foo['age_masked'] - 40
Out[84]:
masked_array(data = [2.5 4.5 3.6000000000000014 -- 1.6000000000000014 1.6000000000000014
 1.1000000000000014 1.1000000000000014 -1 -1 -1 -0.3999999999999986
 -0.3999999999999986 -0.3999999999999986 1.6000000000000014
 1.6000000000000014 1.6000000000000014 1.6000000000000014
 1.6000000000000014 1.6000000000000014 1.6000000000000014
 1.6000000000000014 1.2999999999999972 1.2999999999999972
 1.2999999999999972 1.2999999999999972 1.1000000000000014
 1.1000000000000014 1.1000000000000014 1.1000000000000014
 -0.3299999999999983 -3.1300000000000026 -9.059999999999999 -9
 16.200000000000003 10.200000000000003 10.600000000000001
 -5.600000000000001 1 4 2.200000000000003 2.200000000000003 2 2 --
 9.200000000000003 -7.399999999999999 -- 9.200000000000003
 -7.399999999999999 -0.5 -1.8999999999999986 -3 -3 -- -- -- -- -- --],
             mask = [False False False  True False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False  True False False  True
 False False False False False False  True  True  True  True  True  True],
       fill_value = ?)