Our goal is to estimate the probabilities of requiring one of a suite of candidate follow-up treatments following randomization to a given initial treatment for uterine fibroids. Specifically, we are interested in estimating:
$$Pr(I_2|I_1 =i,T=t)$$where $I_1$ is an initial intervention, which take specific values $i = 1, 2, \ldots , K$ for each of $K$ candidate intervention types, $I_2$ is the followup intervention that also may take any of the same values of $i$, and $T$ is followup time in months, which will generally be either 6 or 12 months.
Our current set of candidate interventions include:
Rather than model each conditional probability independently, we will instead model the outcomes for a treatment arm as a multinomial random variable. That is,
$$\{X_{I_2} \} ∼ \text{Multinomial}(N_{I_1}=i, \{\pi_i\})$$where $\{X_{I_2}\}$ is the vector of outcomes corresponding to each of the possible followup interventions listed above, $N_{I_1}=i$ is the number of women randomized to the initial intervention i, and $\{\pi_i\}$ is a vector of conditional transition probabilities corresponding to $Pr(I_2|I_1 = i, T = t)$, as specified above. The multinomial distribution is a multivariate generalization of the categorical distribution, which is what the above simplifies to when modeling the outcome for a single patient. The multivariate formulation allows us to model study-arm-specific outcomes, incorporating covariates that are specific to that arm or study.
The quantities of interest are the vectors of transition probabilities $\{\pi_i\}$ corresponding to each of the initial candidate interventions. A naive approach to modeling these is to assign a vague Dirichlet prior distribution to each set, and perform Bayesian inference using the multinomial likelihood, with which the Dirichlet is conjugate, to yield posterior estimates for each probability. However, there may be additional information with which to model these probabilities, which may include:
hence, a given transition probability $\pi_{ijk}$ – the probability of transitioning from initial intervention $i$ to followup intervention $j$ in study $k$ – may be modeled as:
$$\text{logit}(\pi_{ijk})= \theta_{ij} + X_k \beta_{ij} + \epsilon_k$$where $\theta_{ij}$ is a baseline transition probability (on the logit scale), $X_k$ a matrix of study(-arm)-specific covariates, $\beta_{ij}$ the corresponding coefficients, and $\epsilon_k$ a mean-zero random effect for study k. We will initially consider (1) follow-up time and (2) mean/median age as covariates.
An attractive benefit to using Bayesian inference to estimate this model is that it is easy to generate predictions from the model, via the posterior predictive distribution. For example, we could estimate the distribution of the expected proportion of women requiring a particular followup intervention; this estimate would factor in both the residual uncertainty in the transition probability estimates, as well as the sampling uncertainty of the intervention.
%matplotlib inline
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns
import matplotlib.pyplot as plt
import pdb
sns.set(style="ticks")
Import data from worksheets in Excel spreadsheet.
data_file = 'UF Subsequent Interventions Data_Master_updated.xlsx'
missing = ['NA', 'NR', 'ND', '?', 'null']
misc_data = pd.read_excel('data/' + data_file, sheetname='MISC (SP)', na_values=missing)
misc_data = misc_data[~misc_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = misc_data.shape
print('Occlusion rows={0}, columns={1}, missing={2}'.format(rows, cols,
misc_data.isnull().sum().sum()))
med_vs_iac_data = pd.read_excel('data/' + data_file, sheetname='Med vs IAC JW', na_values=missing)
med_vs_iac_data = med_vs_iac_data[~med_vs_iac_data['trial_arm'].isnull()].drop('notes', axis=1)
rows, cols = med_vs_iac_data.shape
print('Med vs IAC rows={0}, columns={1}, missing={2}'.format(rows, cols,
med_vs_iac_data.isnull().sum().sum()))
med_vs_med_data = pd.read_excel('data/' + data_file, sheetname='Med vs Med DVE', na_values=missing)
med_vs_med_data = med_vs_med_data[~med_vs_med_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = med_vs_med_data.shape
print('Med vs Med rows={0}, columns={1}, missing={2}'.format(rows, cols,
med_vs_med_data.isnull().sum().sum()))
uae_data = pd.read_excel('data/' + data_file, sheetname='UAE SK')
uae_data = uae_data[~uae_data['baseline_n'].isnull()].drop('notes', axis=1)
rows, cols = uae_data.shape
print('UAE rows={0}, columns={1}, missing={2}'.format(rows, cols,
uae_data.isnull().sum().sum()))
datasets = [misc_data, med_vs_iac_data, med_vs_med_data, uae_data]
Occlusion rows=31, columns=13, missing=6 Med vs IAC rows=49, columns=13, missing=46 Med vs Med rows=67, columns=13, missing=13 UAE rows=32, columns=13, missing=0
unique_inerventions = set(np.concatenate([d.intervention.values for d in datasets]))
Use the following lookup table to create "intervention category" field in each dataset.
# %load intervention_lookup.py
intervention_lookup = {'Ablation': 'ablation',
'Ablation+/- hysteroscopic myomectomy': 'ablation',
'Asoprisnil 10 mg': 'med_manage',
'Asoprisnil 25 mg': 'med_manage',
'Asoprisnil 5 mg': 'med_manage',
'Ulipristal (CD20)': 'med_manage',
'Ulipristal (CDB10)': 'med_manage',
'Hysterectomy': 'hysterectomy',
'LBCUV': 'uae',
'LP + GnRH agonist plus raloxifene': 'med_manage',
'LP + placebo': 'med_manage',
'LPA+ MPA / LPA+placebo': 'med_manage',
'LPA+ placebo / LPA+MPA': 'med_manage',
'LUNA plus LBCUV': 'ablation',
'Myomectomy': 'myomectomy',
'No treatment': 'control',
'No treatment (control)': 'control',
'Placebo': 'control',
'Raloxifene, 180mg/day': 'med_manage',
'SC implant of 3.6 goserelin + placebo (3 months) then tibolone 2.5 mg daily (3 months)': 'med_manage',
'SC implant of 3.6 goserelin + placebo (6 months)': 'med_manage',
'SC implant of 3.6 goserelin + tibolone 2.5 mg daily (6 months)': 'med_manage',
'Surgery': 'DROP',
'Tibolone': 'med_manage',
'UAE': 'uae',
'UAE only': 'uae',
'UAE plus goserelin acetate depot': 'uae',
'buserelin + goserelin': 'med_manage',
'buserelin, intranasal': 'med_manage',
'cabergoline': 'med_manage',
'diphereline': 'med_manage',
'gestrinone, 2.5mg': 'med_manage',
'gestrinone, 2.5mg oral + gestrinone, 5mg oral + gestrinone, 5mg vaginal': 'med_manage',
'gestrinone, 5mg': 'med_manage',
'gestrinone, 5mg vaginal': 'med_manage',
'goserelin, subcutaneous': 'med_manage',
'healthy controls': 'control',
'hormone replacement therapy, transdermal': 'DROP',
'hysterectomy or myomectomy': 'DROP',
'letrozole, 2.5mg': 'med_manage',
'leuprolide': 'med_manage',
'leuprolide acetate depot (11.25 mg q 3 months) + Placebo': 'med_manage',
'leuprolide acetate depot (11.25 mg q 3 months) + tibolone 2.5 mg/d orally': 'med_manage',
'leuprolide acetate depot (3.75 mg/28 d) + placebo (B)': 'med_manage',
'leuprolide plus (tibolone 2.5 mg daily) (A)': 'med_manage',
'leuprolide plus MPA': 'med_manage',
'leuprolide plus estrogen-progestin': 'med_manage',
'leuprolide plus placebo': 'med_manage',
'leuprolide plus progestin': 'med_manage',
'leuprolide plus raloxifene 60 mg daily': 'med_manage',
'leuprolide, 1.88mg': 'med_manage',
'leuprolide, 3.75mg': 'med_manage',
'mifepristone, 10mg': 'med_manage',
'mifepristone, 10mg + mifepristone, 5mg': 'med_manage',
'mifepristone, 2.5mg': 'med_manage',
'mifepristone, 5mg': 'med_manage',
'placebo': 'control',
'raloxifene 180 mg daily': 'med_manage',
'raloxifene 60 mg daily': 'med_manage',
'tamoxifen 20 mg daily': 'med_manage',
'tibolone': 'med_manage',
'tibolone, 2.5mg': 'med_manage',
'transdermal estrogen replacement therapy': 'med_manage',
'triptorelin, 100ug': 'med_manage',
'triptorelin, 100ug + triptorelin, 20ug + triptorelin, 5ug': 'med_manage',
'triptorelin, 20ug': 'med_manage',
'triptorelin, 3.6mg/mo': 'med_manage',
'triptorelin, 5ug': 'med_manage',
'ulipristal acetate followed by placebo': 'med_manage',
'ulipristal acetate followed by progestin': 'med_manage',
'ulipristal, 10mg': 'med_manage',
'ulipristal, 5mg': 'med_manage',
'HIFU': 'MRgFUS',
'HIFU with CEUS': 'MRgFUS',
'LUAO': 'uae',
'UAE plus PVA': 'uae',
'UAE plus TAG': 'uae',
'UAE with PVA': 'uae',
'UAE with PVA particles, large': 'uae',
'UAE with PVA particles, small': 'uae',
'UAE with SPA': 'uae',
'UAE with SPVA': 'uae',
'UAE with TAG': 'uae',
'UAE with TAG microspheres': 'uae',
'myomectomy': 'myomectomy',
'myomectomy with vasopressin': 'myomectomy',
'myomectomy, abdominal': 'myomectomy',
'myomectomy, laparoscopic': 'myomectomy',
'myomectomy, loop ligation with vasopressin': 'myomectomy',
'myomectomy, minilaparotomic': 'myomectomy'}
Assign intervention categories to each arm
datasets = [d.assign(intervention_cat=d.intervention.replace(intervention_lookup)) for d in datasets]
intervention_categories = set(intervention_lookup.values())
intervention_categories
{'DROP', 'MRgFUS', 'ablation', 'control', 'hysterectomy', 'med_manage', 'myomectomy', 'uae'}
Import demographic information
demographics = pd.read_excel('data/' + data_file, sheetname='ALL_DEMO_DATA', na_values=missing)
demographics.columns
Index(['study_id', 'Citation', 'FamCode', 'FamDesig', 'NCT', 'ArmsN', 'ArmCategory', 'Group_Desc', 'New Grouping', 'Demo_Category', 'Demo_specify', 'BL N', 'Denom_N', 'BL %', 'BL Mean', 'BL SD', 'BL_SE', 'BL_Median', 'BL Min', 'BL Max', 'BL 95% L', 'BL 95% H', 'BL_group_diff', 'Comments'], dtype='object')
Extract columns of interest
age_data = demographics.loc[demographics.Demo_Category=='Age', ['study_id', 'New Grouping', 'BL Mean', 'BL SD']]
Clean arm labels
age_data = age_data.assign(arm=age_data['New Grouping'].str.replace(':','')).drop('New Grouping', axis=1)
age_data.arm.unique()
array(['G2', 'G1', 'G1b', 'G1a', 'G3', 'CG', 'G1c', 'G1+G2', 'G1a+G1b+G1c'], dtype=object)
Concatenate all datasets
all_data = pd.concat(datasets)
Clean up study arm field
all_arm = all_data.trial_arm.str.replace(':','').str.replace(' ', '').str.replace('Group', 'G')
all_data = all_data.assign(arm=all_arm).drop('trial_arm', axis=1)
all_data.arm.unique()
array(['G1', 'G2', 'G3', 'CG', 'G1a', 'G1b', 'G1c', 'CG1', 'CG2', 'G1/CG', 'CG/G1', 'G1a+G1b', 'G1a+G1b+G1c', 'G1+G2'], dtype=object)
Clean up study ID field. Currently contains non-numeric entries. Will strip out the first study ID from the compund labels, as this is the parent study ID.
all_data.study_id.unique()
array([23, 347, 1400, 1529, 1806, 1889, 2375, 2967, 3382, 3690, 3785, 5186, 5474, 414, 1849, 3016, 3181, 3324, 3674, 4258, 4468, 4858, 4960, 5276, 5302, 6091, 6263, 6696, 7155, 7504, 7797, 7936, 95.0, 629.0, 757.0, 1290.0, 2318.0, 2555.0, 2635.0, 3312.0, 3978.0, 4787.0, 4961.0, 5721.0, 6393.0, 6903.0, 7139.0, 7309.0, 7530.0, 7589.0, 7763.0, '3803_3052', 1546, '3365_2026_1657_986', '3819_815_1986_2759_2971_\n3120_3175_3192_3678_3721', 4789, 2006], dtype=object)
str_mask = all_data.study_id.str.isnumeric()==False
all_data.loc[str_mask, 'study_id'] = all_data.study_id[str_mask].apply(lambda x: x[:x.find('_')])
all_data.study_id = all_data.study_id.astype(int)
all_data.study_id.unique()
array([ 23, 347, 1400, 1529, 1806, 1889, 2375, 2967, 3382, 3690, 3785, 5186, 5474, 414, 1849, 3016, 3181, 3324, 3674, 4258, 4468, 4858, 4960, 5276, 5302, 6091, 6263, 6696, 7155, 7504, 7797, 7936, 95, 629, 757, 1290, 2318, 2555, 2635, 3312, 3978, 4787, 4961, 5721, 6393, 6903, 7139, 7309, 7530, 7589, 7763, 3803, 1546, 3365, 3819, 4789, 2006])
Here is what the data look like after merging.
all_data.head()
study_id | intervention | baseline_n | followup_interval | followup_n | hysterectomy | myomectomy | uae | MRIgFUS | ablation | iud | no_treatment | intervention_cat | arm | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23 | HIFU with CEUS | 17 | 12 | 17.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 16.0 | MRgFUS | G1 |
1 | 23 | HIFU | 16 | 12 | 16.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 13.0 | MRgFUS | G2 |
2 | 347 | UAE with SPVA | 30 | 12 | 27.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 26.0 | uae | G1 |
3 | 347 | UAE with TAG | 30 | 12 | 29.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 29.0 | uae | G2 |
4 | 1400 | UAE | 63 | 6 | 62.0 | 0.0 | 1.0 | 5.0 | 0.0 | 0.0 | 0.0 | 56.0 | uae | G1 |
all_data.groupby('intervention_cat')['study_id'].count()
intervention_cat DROP 8 MRgFUS 2 ablation 3 control 11 hysterectomy 7 med_manage 100 myomectomy 14 uae 34 Name: study_id, dtype: int64
Merge age data with outcomes
all_data_merged = pd.merge(all_data, age_data, on=['study_id', 'arm'])
For now, drop arms with no reported followup time (we may want to impute these):
all_data_merged = all_data_merged.dropna(subset=['followup_interval'])
Parse followup intervals that are ranges, creating fup_min
and fup_max
fields.
dataset = all_data_merged.assign(fup_min=0, fup_max=all_data.followup_interval.convert_objects(convert_numeric=True).max()+1)
range_index = dataset.followup_interval.str.contains('to').notnull()
range_vals = dataset[range_index].followup_interval.apply(lambda x: x.split(' '))
dataset.loc[range_index, ['fup_min']] = range_vals.apply(lambda x: float(x[0]))
dataset.loc[range_index, ['fup_max']] = range_vals.apply(lambda x: float(x[-1]))
dataset.loc[range_index, ['followup_interval']] = np.nan
dataset['followup_interval'] = dataset.followup_interval.astype(float)
Fill missing values
dataset.loc[dataset.followup_n.isnull(), 'followup_n'] = dataset.loc[dataset.followup_n.isnull(), 'baseline_n']
dataset.loc[dataset.no_treatment.isnull(), 'no_treatment'] = dataset.followup_n - dataset[[ 'hysterectomy', 'myomectomy', 'uae',
'MRIgFUS', 'ablation', 'iud']].sum(1)[dataset.no_treatment.isnull()]
dataset.followup_interval.unique()
array([ 12. , 6. , nan, 24. , 2. , 1. , 3. , 5.5, 9. , 18. , 0. , 7. , 60. ])
dataset['BL Mean'].unique()
array([43.1, 42, 43.9, 41.7, 41, 43.5, 40.3, 42.7, 45, 44, 38.26, 32.1, 34.3, 44.9, 42.5, 43.3, 38.4, 37.5, 45.9, 44.5, 33.97, 34, 41.3, 42.9, 42.1, 43.4, 37.7, 43, 40.2, 41.1, 49.1, 48.6, 36.3, 35.9, 37.2, 54.2, 51.2, 43.6, nan, 38.9, 37.1, 41.4, 36.9, 41.6, 39, 39.6, 39.67, 36.87, 30.94, 31, 39.5, 42.8, 56.2, 57.9, 50.2, 50.6, 34.4, 42.2, 49.2, 32.6, 48.4, 33.8, 38.1, 37, 32.3, 43.2, 44.6, 45.4, 46.4, 48.5, 48.3], dtype=object)
Identify crossover studies
crossover_studies = 7155, 3324, 414, 95, 7139, 6903, 3721, 3181, 4858, 4960, 4258, 4789, 2006, 2318
dataset['crossover_study'] = dataset.study_id.isin(crossover_studies)
dataset.head()
study_id | intervention | baseline_n | followup_interval | followup_n | hysterectomy | myomectomy | uae | MRIgFUS | ablation | iud | no_treatment | intervention_cat | arm | BL Mean | BL SD | fup_max | fup_min | crossover_study | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23 | HIFU with CEUS | 17 | 12.0 | 17.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 16.0 | MRgFUS | G1 | 43.1 | 5.3 | 61.0 | 0 | False |
1 | 23 | HIFU | 16 | 12.0 | 16.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 13.0 | MRgFUS | G2 | 42 | 5.4 | 61.0 | 0 | False |
2 | 347 | UAE with SPVA | 30 | 12.0 | 27.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 26.0 | uae | G1 | 43.9 | 5.0 | 61.0 | 0 | False |
3 | 347 | UAE with TAG | 30 | 12.0 | 29.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 29.0 | uae | G2 | 41.7 | 5.4 | 61.0 | 0 | False |
4 | 1400 | UAE | 63 | 6.0 | 62.0 | 0.0 | 1.0 | 5.0 | 0.0 | 0.0 | 0.0 | 56.0 | uae | G1 | 41 | 3.5 | 61.0 | 0 | False |
Export data for posterity
dataset.to_csv('data/UF_interventions_clean.csv', na_rep=None)
outcome_cats = [ 'hysterectomy', 'myomectomy', 'uae',
'MRIgFUS', 'ablation', 'iud', 'no_treatment']
studies = dataset.study_id.unique()
study_index = np.array([np.argwhere(studies==i).squeeze() for i in dataset.study_id])
def append(tensor, value):
return T.concatenate([tensor, T.stack([value], axis=0)])
import theano.tensor as T
from numpy.ma import masked_values
SumTo1 = pm.transforms.SumTo1()
inverse_logit = pm.transforms.inverse_logit
def specify_model(model, intervention):
intervention_data = dataset[(dataset.intervention_cat==intervention)
& ~dataset[outcome_cats].isnull().sum(axis=1).astype(bool)].copy()
intervention_data.loc[intervention_data['followup_interval'].isnull(), 'followup_interval'] = 17.33
followup_masked = masked_values(intervention_data.followup_interval.values, 17.33)
followup_min, followup_max = intervention_data[['fup_min', 'fup_max']].values.T
outcomes = intervention_data[[ 'hysterectomy', 'myomectomy', 'uae',
'MRIgFUS', 'ablation', 'iud', 'no_treatment']].values
if np.isnan(outcomes).any():
print('Missing values in outcomes for', intervention)
followup_n = intervention_data.followup_n.values
# Center age at 40
intervention_data.loc[intervention_data['BL Mean'].isnull(), 'BL Mean'] = 90
age_masked = masked_values(intervention_data['BL Mean'].values - 40, 50)
studies = intervention_data.study_id.unique()
study_index = np.array([np.argwhere(studies==i).squeeze() for i in intervention_data.study_id])
study_id = intervention_data.study_id.values
n_studies = len(set(study_id))
n_outcomes = 7
arms = len(outcomes)
with model:
# Impute followup times uniformly over the observed range
if np.any(followup_masked.mask):
followup_time = pm.Uniform('followup_time', followup_min.min(), followup_max.max(),
shape=len(followup_min),
observed=followup_masked)
else:
followup_time = followup_masked.data.astype(float)
# Impute age using a T-distribution
if np.any(age_masked.mask):
nu = pm.Exponential('nu', 0.01, testval=10)
age_centered = pm.StudentT('age_centered', nu, shape=len(age_masked), observed=age_masked)
else:
age_centered = age_masked.data.astype(float)
# Mean probabilities (on log scale)
μ = pm.Normal('μ', np.ones(n_outcomes), sd=1000, shape=n_outcomes)
# Followup time covariates
θ_fup = pm.Normal('θ_fup', 0, sd=1000, shape=n_outcomes-1)
# Append a zero for no treatmnet
β_fup = append(θ_fup, 0)
# Age covariate
θ_age = pm.Normal('θ_age', 0, sd=1000, shape=n_outcomes-1)
# Append a zero for no treatment
β_age = append(θ_age, 0)
# Study random effect
η = pm.Normal('η', 0, 1, shape=n_studies)
σ = pm.HalfCauchy('σ', 5)
ϵ = η*σ
# Poisson rate
λ = [T.exp(μ + β_fup*followup_time[i] + β_age*age_centered[i] + ϵ[study_index[i]])
for i in range(arms)]
# Multinomial data likelihood
likelihood = [pm.Poisson('likelihood_%i' % i, λ[i]*followup_n[i], observed=outcomes[i]) for i in range(arms)]
'''
Scenario predictions
'''
# 40-years old, 6-month followup
p_6_40 = pm.Deterministic('p_6_40', T.exp(μ + β_fup*6)/T.exp(μ + β_fup*6).sum())
# 40-years old, 12-month followup
p_12_40 = pm.Deterministic('p_12_40', T.exp(μ + β_fup*12)/T.exp(μ + β_fup*12).sum())
# 40-years old, 24-month followup
p_24_40 = pm.Deterministic('p_24_40', T.exp(μ + β_fup*24)/T.exp(μ + β_fup*24).sum())
# 50-years old, 6-month followup
p_6_50 = pm.Deterministic('p_6_50',
T.exp(μ + β_fup*6 + β_age*10)/T.exp(μ + β_fup*6 + β_age*10).sum())
# 50-years old, 12-month followup
p_12_50 = pm.Deterministic('p_12_50',
T.exp(μ + β_fup*12 + β_age*10)/T.exp(μ + β_fup*12 + β_age*10).sum())
# 50-years old, 6-month followup
p_6_30 = pm.Deterministic('p_6_30',
T.exp(μ + β_fup*6 + β_age*(-10))/T.exp(μ + β_fup*6 + β_age*(-10)).sum())
# 50-years old, 24-month followup
p_24_50 = pm.Deterministic('p_24_50',
T.exp(μ + β_fup*24 + β_age*10)/T.exp(μ + β_fup*24 + β_age*10).sum())
# 50-years old, 24-month followup
p_24_30 = pm.Deterministic('p_24_30',
T.exp(μ + β_fup*24 + β_age*(-10))/T.exp(μ + β_fup*24 + β_age*(-10)).sum())
# 30-years old, 12-month followup
p_12_30 = pm.Deterministic('p_12_30',
T.exp(μ + β_fup*12 + β_age*-10)/T.exp(μ + β_fup*12 + β_age*-10).sum())
return model
use_NUTS = False
if use_NUTS:
n_iterations, n_burn = 2000, 0
else:
n_iterations, n_burn = 100000, 90000
uae_model = specify_model(pm.Model(), 'uae')
Applied log-transform to σ and added transformed σ_log_ to model.
with uae_model:
# start = {'θ_fup':np.zeros(n_outcomes-1), 'θ_age':np.zeros(n_outcomes-1)}
trace_uae = pm.sample(n_iterations, step=pm.Metropolis(), random_seed=20140925)
### Model output
[-----------------100%-----------------] 100000 of 100000 complete in 233.6 sec
trace_uae = trace_uae[n_burn:]
plot_labels = dataset.columns[5:12]
Imputed follow-up times
pm.forestplot(trace_uae, varnames=['followup_time_missing'])
<matplotlib.gridspec.GridSpec at 0x7fc40dd73390>
Baseline estimates for each outcome, on the log scale.
pm.forestplot(trace_uae, varnames=['μ'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc40d90f2e8>
pm.traceplot(trace_uae, varnames=['μ'])
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc408efbc18>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fc408eaf668>]], dtype=object)
Follow-up time effect size estimates. Positive values indicate higher probability of event with increased follow-up time.
pm.forestplot(trace_uae, varnames=['θ_fup'], ylabels=plot_labels[:-1])
<matplotlib.gridspec.GridSpec at 0x7fc408eca470>
pm.traceplot(trace_uae, varnames=['θ_fup'])
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7fc408e18b00>, <matplotlib.axes._subplots.AxesSubplot object at 0x7fc408de1a20>]], dtype=object)
Age effect size estimates. Positive values suggest higher probability of event with each year above age 40.
pm.forestplot(trace_uae, varnames=['θ_age'], ylabels=plot_labels[:-1])
<matplotlib.gridspec.GridSpec at 0x7fc408ea2240>
Estimated probabilities of follow-up interventions for 6-month followup and age 40.
pm.forestplot(trace_uae, varnames=['p_6_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408d58f60>
Estimated probabilities of follow-up interventions for 12-month followup and age 40.
pm.forestplot(trace_uae, varnames=['p_12_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408ce5c88>
Estimated probabilities of follow-up interventions for 24-month followup and age 40.
pm.forestplot(trace_uae, varnames=['p_24_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408cdd630>
Estimated probabilities of follow-up interventions for 6-month followup and age 30.
pm.forestplot(trace_uae, varnames=['p_6_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408bd4390>
Estimated probabilities of follow-up interventions for 6-month followup and age 50.
pm.forestplot(trace_uae, varnames=['p_6_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408bbe630>
Estimated probabilities of follow-up interventions for 24-month followup and age 30.
pm.forestplot(trace_uae, varnames=['p_24_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408ace2b0>
Estimated probabilities of follow-up interventions for 24-month followup and age 50.
pm.forestplot(trace_uae, varnames=['p_24_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc408d48dd8>
def generate_table(model, trace):
tables = []
for scenario in uae_model.deterministics[1:]:
table = pm.df_summary(trace, varnames=[scenario]).round(3).drop('mc_error', axis=1)
table.index = plot_labels.values.tolist()[:-1] + ['none']
tokens = scenario.name.split('_')
fup = int(tokens[1])
age = int(tokens[2])
table['age'] = age
table['followup'] = fup
table.index.name = 'next intervention'
tables.append(table)
return (pd.concat(tables).set_index(['age', 'followup'], append=True)
.reorder_levels([1,2,0])
.sort_index(level='age'))
uae_table = generate_table(uae_model, trace_uae)
uae_table
mean | sd | hpd_2.5 | hpd_97.5 | |||
---|---|---|---|---|---|---|
age | followup | next intervention | ||||
30 | 6 | MRIgFUS | 0.000 | 0.000 | 0.000 | 0.000 |
ablation | 0.000 | 0.000 | 0.000 | 0.000 | ||
hysterectomy | 0.002 | 0.002 | 0.000 | 0.006 | ||
iud | 0.000 | 0.000 | 0.000 | 0.000 | ||
myomectomy | 0.397 | 0.053 | 0.292 | 0.498 | ||
none | 0.591 | 0.052 | 0.493 | 0.696 | ||
uae | 0.010 | 0.008 | 0.000 | 0.025 | ||
12 | MRIgFUS | 0.000 | 0.000 | 0.000 | 0.000 | |
ablation | 0.000 | 0.000 | 0.000 | 0.000 | ||
hysterectomy | 0.003 | 0.002 | 0.000 | 0.008 | ||
iud | 0.000 | 0.000 | 0.000 | 0.000 | ||
myomectomy | 0.385 | 0.052 | 0.285 | 0.485 | ||
none | 0.601 | 0.051 | 0.499 | 0.698 | ||
uae | 0.011 | 0.009 | 0.000 | 0.029 | ||
24 | MRIgFUS | 0.000 | 0.000 | 0.000 | 0.000 | |
ablation | 0.000 | 0.000 | 0.000 | 0.000 | ||
hysterectomy | 0.005 | 0.004 | 0.000 | 0.013 | ||
iud | 0.000 | 0.000 | 0.000 | 0.000 | ||
myomectomy | 0.361 | 0.066 | 0.227 | 0.484 | ||
none | 0.618 | 0.064 | 0.494 | 0.744 | ||
uae | 0.015 | 0.012 | 0.001 | 0.041 | ||
40 | 6 | MRIgFUS | 0.000 | 0.000 | 0.000 | 0.000 |
ablation | 0.000 | 0.000 | 0.000 | 0.000 | ||
hysterectomy | 0.018 | 0.005 | 0.008 | 0.027 | ||
iud | 0.000 | 0.000 | 0.000 | 0.000 | ||
myomectomy | 0.035 | 0.008 | 0.022 | 0.053 | ||
none | 0.933 | 0.011 | 0.913 | 0.954 | ||
uae | 0.014 | 0.005 | 0.005 | 0.023 | ||
12 | MRIgFUS | 0.000 | 0.000 | 0.000 | 0.000 | |
ablation | 0.000 | 0.000 | 0.000 | 0.001 | ||
... | ... | ... | ... | ... | ||
... | 0.929 | 0.010 | 0.908 | 0.948 | ||
none | 0.016 | 0.005 | 0.007 | 0.025 | ||
12 | uae | 0.000 | 0.000 | 0.000 | 0.000 | |
MRIgFUS | 0.000 | 0.000 | 0.000 | 0.001 | ||
ablation | 0.035 | 0.009 | 0.017 | 0.050 | ||
hysterectomy | 0.000 | 0.000 | 0.000 | 0.000 | ||
iud | 0.029 | 0.006 | 0.019 | 0.042 | ||
myomectomy | 0.915 | 0.012 | 0.893 | 0.939 | ||
none | 0.021 | 0.005 | 0.010 | 0.031 | ||
40 | 24 | uae | 0.000 | 0.000 | 0.000 | 0.000 |
MRIgFUS | 0.039 | 0.085 | 0.000 | 0.212 | ||
ablation | 0.067 | 0.039 | 0.003 | 0.139 | ||
hysterectomy | 0.342 | 0.270 | 0.002 | 0.857 | ||
iud | 0.001 | 0.001 | 0.000 | 0.003 | ||
myomectomy | 0.542 | 0.234 | 0.095 | 0.872 | ||
none | 0.010 | 0.007 | 0.000 | 0.023 | ||
6 | uae | 0.000 | 0.000 | 0.000 | 0.000 | |
MRIgFUS | 0.043 | 0.083 | 0.000 | 0.214 | ||
ablation | 0.091 | 0.043 | 0.017 | 0.179 | ||
hysterectomy | 0.265 | 0.203 | 0.003 | 0.685 | ||
iud | 0.001 | 0.001 | 0.000 | 0.003 | ||
myomectomy | 0.588 | 0.176 | 0.219 | 0.858 | ||
none | 0.012 | 0.007 | 0.002 | 0.026 | ||
12 | uae | 0.000 | 0.000 | 0.000 | 0.000 | |
MRIgFUS | 0.050 | 0.083 | 0.000 | 0.218 | ||
ablation | 0.151 | 0.057 | 0.055 | 0.275 | ||
hysterectomy | 0.154 | 0.123 | 0.001 | 0.404 | ||
iud | 0.001 | 0.001 | 0.000 | 0.002 | ||
myomectomy | 0.627 | 0.110 | 0.419 | 0.832 | ||
none | 0.018 | 0.009 | 0.004 | 0.034 |
63 rows × 4 columns
def factorplot(table):
table_flat = table.reset_index().rename(columns = {'mean':'probability'})
sns.factorplot(x="followup", y="probability", hue="next intervention", col="age",
data=table_flat[table_flat['next intervention']!='none'],
facet_kws={'ylim':(0,0.6)})
factorplot(uae_table)
uae_table_flat = uae_table.reset_index().rename(columns = {'mean':'probability'}).drop(['sd'],
axis=1)
myomectomy_model = specify_model(pm.Model(), 'myomectomy')
Applied log-transform to σ and added transformed σ_log_ to model.
with myomectomy_model:
trace_myomectomy = pm.sample(n_iterations, step=pm.Metropolis(), random_seed=20140925)
[-----------------100%-----------------] 100000 of 100000 complete in 140.6 sec
Baseline estimates on log scale
pm.forestplot(trace_myomectomy, varnames=['μ'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc3fdd5ce80>
Followup time effect
pm.forestplot(trace_myomectomy, varnames=['θ_fup'], ylabels=plot_labels[:-1])
<matplotlib.gridspec.GridSpec at 0x7fc3fdd67ef0>
Age over 40 effect
pm.forestplot(trace_myomectomy, varnames=['θ_age'], ylabels=plot_labels[:-1])
<matplotlib.gridspec.GridSpec at 0x7fc431e632e8>
Predicted probabilities for 6 months after followup, 40 years of age.
pm.forestplot(trace_myomectomy, varnames=['p_6_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc3fedcda58>
Estimated probabilities of follow-up interventions for 12-month followup and age 40.
pm.forestplot(trace_myomectomy, varnames=['p_12_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc3fedc2160>
Estimated probabilities of follow-up interventions for 24-month followup and age 40.
pm.forestplot(trace_myomectomy, varnames=['p_24_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc3fede6630>
Estimated probabilities of follow-up interventions for 6-month followup and age 30.
pm.forestplot(trace_myomectomy, varnames=['p_6_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc42170b6a0>
Estimated probabilities of follow-up interventions for 6-month followup and age 50.
pm.forestplot(trace_myomectomy, varnames=['p_6_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc427541898>
Estimated probabilities of follow-up interventions for 24-month followup and age 30.
pm.forestplot(trace_myomectomy, varnames=['p_24_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc4277fb080>
Estimated probabilities of follow-up interventions for 24-month followup and age 50.
pm.forestplot(trace_myomectomy, varnames=['p_24_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc43e334048>
myomectomy_table = generate_table(myomectomy_model, trace_myomectomy)
myomectomy_table_flat = myomectomy_table.reset_index().rename(columns = {'mean':'probability'}).drop(['sd'],
axis=1)
factorplot(myomectomy_table)
med_manage_model = specify_model(pm.Model(), 'med_manage')
Applied log-transform to nu and added transformed nu_log_ to model. Applied log-transform to σ and added transformed σ_log_ to model.
with med_manage_model:
trace_med_manage = pm.sample(n_iterations, step=pm.Metropolis(), random_seed=20140925)
[-----------------100%-----------------] 100000 of 100000 complete in 655.5 sec
Baseline log-probabilities
pm.forestplot(trace_med_manage, varnames=['μ'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc43d0dbf28>
Estimated probabilities of follow-up interventions for 6-month followup and age 40.
pm.forestplot(trace_med_manage, varnames=['p_6_40'])
<matplotlib.gridspec.GridSpec at 0x7fc414499390>
Estimated probabilities of follow-up interventions for 12-month followup and age 40.
pm.forestplot(trace_med_manage, varnames=['p_12_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc414472cf8>
Estimated probabilities of follow-up interventions for 24-month followup and age 40.
pm.forestplot(trace_med_manage, varnames=['p_24_40'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc43319dfd0>
Estimated probabilities of follow-up interventions for 6-month followup and age 30.
pm.forestplot(trace_med_manage, varnames=['p_6_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc433ec8470>
Estimated probabilities of follow-up interventions for 6-month followup and age 50.
pm.forestplot(trace_med_manage, varnames=['p_6_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc425e2f550>
Estimated probabilities of follow-up interventions for 24-month followup and age 30.
pm.forestplot(trace_med_manage, varnames=['p_24_30'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc423d47518>
Estimated probabilities of follow-up interventions for 24-month followup and age 50.
pm.forestplot(trace_med_manage, varnames=['p_24_50'], ylabels=plot_labels)
<matplotlib.gridspec.GridSpec at 0x7fc432b369e8>
med_manage_table = generate_table(med_manage_model, trace_med_manage)
med_manage_table_flat = med_manage_table.reset_index().rename(columns = {'mean':'probability'}).drop(['sd'],
axis=1)
factorplot(med_manage_table)
med_manage_table_flat['initial_treatment'] = 'medical management'
uae_table_flat['initial_treatment'] = 'UAE'
myomectomy_table_flat['initial_treatment'] = 'myomectomy'
all_tables = [med_manage_table_flat, uae_table_flat, myomectomy_table_flat]
full_table = pd.concat(all_tables)
full_table.head()
age | followup | next intervention | probability | hpd_2.5 | hpd_97.5 | initial_treatment | |
---|---|---|---|---|---|---|---|
0 | 30 | 6 | MRIgFUS | 0.000 | 0.000 | 0.000 | medical management |
1 | 30 | 6 | ablation | 0.000 | 0.000 | 0.000 | medical management |
2 | 30 | 6 | hysterectomy | 0.016 | 0.008 | 0.026 | medical management |
3 | 30 | 6 | iud | 0.000 | 0.000 | 0.000 | medical management |
4 | 30 | 6 | myomectomy | 0.008 | 0.002 | 0.014 | medical management |
full_table['probability (95% CI)'] = full_table.apply(lambda x: '{0:1.2f} ({1:1.2f}, {2:1.2f})'.format(x['probability'],
x['hpd_2.5'],
x['hpd_97.5']), axis=1)
full_table.head()
age | followup | next intervention | probability | hpd_2.5 | hpd_97.5 | initial_treatment | probability (95% CI) | |
---|---|---|---|---|---|---|---|---|
0 | 30 | 6 | MRIgFUS | 0.000 | 0.000 | 0.000 | medical management | 0.00 (0.00, 0.00) |
1 | 30 | 6 | ablation | 0.000 | 0.000 | 0.000 | medical management | 0.00 (0.00, 0.00) |
2 | 30 | 6 | hysterectomy | 0.016 | 0.008 | 0.026 | medical management | 0.02 (0.01, 0.03) |
3 | 30 | 6 | iud | 0.000 | 0.000 | 0.000 | medical management | 0.00 (0.00, 0.00) |
4 | 30 | 6 | myomectomy | 0.008 | 0.002 | 0.014 | medical management | 0.01 (0.00, 0.01) |
formatted_tables = {}
for inter,table in full_table.groupby('initial_treatment'):
table_pivot = table.assign(age_followup=table.age.astype(str)+'-'+table.followup.astype(str)).pivot(index='age_followup',
columns='next intervention',
values='probability (95% CI)')
age, followup = np.transpose([(int(a), int(f)) for a,f in table_pivot.index.str.split('-').values])
table_pivot = table_pivot.set_index([age, followup])
table_pivot.index.names = 'age', 'followup'
formatted_tables[inter] = (table_pivot[['none', 'uae', 'iud', 'myomectomy', 'hysterectomy', 'MRIgFUS']]
.sortlevel([0,1]))
formatted_tables['UAE']
next intervention | none | uae | iud | myomectomy | hysterectomy | MRIgFUS | |
---|---|---|---|---|---|---|---|
age | followup | ||||||
30 | 6 | 0.59 (0.49, 0.70) | 0.01 (0.00, 0.03) | 0.00 (0.00, 0.00) | 0.40 (0.29, 0.50) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) |
12 | 0.60 (0.50, 0.70) | 0.01 (0.00, 0.03) | 0.00 (0.00, 0.00) | 0.39 (0.28, 0.48) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) | |
24 | 0.62 (0.49, 0.74) | 0.01 (0.00, 0.04) | 0.00 (0.00, 0.00) | 0.36 (0.23, 0.48) | 0.01 (0.00, 0.01) | 0.00 (0.00, 0.00) | |
40 | 6 | 0.93 (0.91, 0.95) | 0.01 (0.01, 0.02) | 0.00 (0.00, 0.00) | 0.04 (0.02, 0.05) | 0.02 (0.01, 0.03) | 0.00 (0.00, 0.00) |
12 | 0.93 (0.91, 0.95) | 0.02 (0.01, 0.03) | 0.00 (0.00, 0.00) | 0.03 (0.02, 0.05) | 0.02 (0.01, 0.03) | 0.00 (0.00, 0.00) | |
24 | 0.92 (0.89, 0.94) | 0.02 (0.01, 0.03) | 0.00 (0.00, 0.00) | 0.03 (0.02, 0.04) | 0.04 (0.02, 0.05) | 0.00 (0.00, 0.00) | |
50 | 6 | 0.54 (0.10, 0.87) | 0.01 (0.00, 0.02) | 0.34 (0.00, 0.86) | 0.00 (0.00, 0.00) | 0.07 (0.00, 0.14) | 0.00 (0.00, 0.00) |
12 | 0.59 (0.22, 0.86) | 0.01 (0.00, 0.03) | 0.27 (0.00, 0.69) | 0.00 (0.00, 0.00) | 0.09 (0.02, 0.18) | 0.00 (0.00, 0.00) | |
24 | 0.63 (0.42, 0.83) | 0.02 (0.00, 0.03) | 0.15 (0.00, 0.40) | 0.00 (0.00, 0.00) | 0.15 (0.06, 0.28) | 0.00 (0.00, 0.00) |
formatted_tables['myomectomy']
next intervention | none | uae | iud | myomectomy | hysterectomy | MRIgFUS | |
---|---|---|---|---|---|---|---|
age | followup | ||||||
30 | 6 | 0.58 (0.00, 0.97) | 0.10 (0.00, 1.00) | 0.15 (0.00, 1.00) | 0.09 (0.00, 0.37) | 0.00 (0.00, 0.01) | 0.05 (0.00, 0.37) |
12 | 0.84 (0.49, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.16 (0.00, 0.50) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | |
24 | 0.72 (0.25, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.28 (0.00, 0.75) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | |
40 | 6 | 0.99 (0.98, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.02) | 0.00 (0.00, 0.00) |
12 | 1.00 (0.99, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) | |
24 | 1.00 (0.99, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) | |
50 | 6 | 0.40 (0.00, 0.99) | 0.13 (0.00, 1.00) | 0.09 (0.00, 1.00) | 0.00 (0.00, 0.00) | 0.04 (0.00, 0.23) | 0.11 (0.00, 1.00) |
12 | 0.85 (0.00, 1.00) | 0.05 (0.00, 0.33) | 0.03 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.02 (0.00, 0.10) | 0.03 (0.00, 0.00) | |
24 | 0.99 (0.95, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.05) | 0.00 (0.00, 0.00) |
formatted_tables['medical management']
next intervention | none | uae | iud | myomectomy | hysterectomy | MRIgFUS | |
---|---|---|---|---|---|---|---|
age | followup | ||||||
30 | 6 | 0.98 (0.96, 0.99) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.01) | 0.02 (0.01, 0.03) | 0.00 (0.00, 0.00) |
12 | 0.99 (0.98, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) | |
24 | 0.99 (0.98, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.01) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.00) | |
40 | 6 | 0.99 (0.98, 0.99) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.01) | 0.01 (0.01, 0.01) | 0.00 (0.00, 0.00) |
12 | 1.00 (0.99, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | |
24 | 0.99 (0.99, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | |
50 | 6 | 0.99 (0.99, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.01) | 0.00 (0.00, 0.00) |
12 | 1.00 (1.00, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | |
24 | 0.99 (1.00, 1.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.00 (0.00, 0.00) | 0.01 (0.00, 0.00) |
Posterior predictive checks for models. We generate simulated datasets using the model, and check where the observed data lies in the distribution of simulated values. Extreme values of the data percentile is suggestive of problems.
ppc_uae = pm.sample_ppc(trace_uae, model=uae_model, samples=500)
intervention_data = dataset[(dataset.intervention_cat=='uae')
& ~dataset[outcome_cats].isnull().sum(axis=1).astype(bool)].copy()
outcomes = intervention_data[[ 'hysterectomy', 'myomectomy', 'uae',
'MRIgFUS', 'ablation', 'iud', 'no_treatment']].values
from scipy.stats import percentileofscore
Calculate percentiles of each observation relative to simulated data.
percentiles = []
for label in ppc_uae:
if label.startswith('likelihood'):
tokens = label.split('_')
index = int(tokens[1])
simvals = ppc_uae[label]
obsvals = outcomes[index]
p = [percentileofscore(s, o).round(2) for s,o in zip(simvals.T, obsvals)]
percentiles.append(pd.Series(p, index=plot_labels))
pd.concat(percentiles, axis=1).T.hist();