All studies with successful outcomes reported for early-identified children who are deaf or hard of hearing have intervention provided by specialists who are trained in parent-infant intervention services.[12,90,97] Early intervention programs should develop mechanisms to ensure that early intervention professionals have special skills necessary for providing families with the highest quality of service specific to children with hearing loss. Professionals with a background in deaf education, audiology, and speech-language pathology will typically have the skills needed for providing intervention services. Professionals should be highly qualified in their respective fields and should be skilled communicators who are knowledgeable and sensitive to the importance of enhancing families' strengths and supporting their priorities. When early intervention professionals have knowledge of the principles of adult learning, it increases their success with parents and other professionals.
TJC must be competent to provide services.
References:
Purpose of the study is to examine the relationship between highly qualified providers, age of enrollment in intervention, and speech and language outcomes at 4 years of age in a group of children enrolled in school programs that specialize in the development of listening and spoken language.
# Import modules and set options
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set()
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)
Import data
lsl_dr = pd.read_csv('lsl_dr.csv', index_col=0)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py:2723: DtypeWarning: Columns (31,41) have mixed types. Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
lsl_dr.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 0 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 1 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 2 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 3 initial_assessment_arm_1 2008-2009 0.0 0.0 0.0 0.0 2.0 4 initial_assessment_arm_1 2008-2009 0.0 0.0 0.0 0.0 2.0 _mother_ed father_ed premature_age ... age_test \ 0 6.0 6.0 9.0 ... 54.0 1 6.0 6.0 9.0 ... 54.0 2 6.0 6.0 9.0 ... 54.0 3 5.0 5.0 7.0 ... 53.0 4 5.0 5.0 7.0 ... 53.0 domain school score test_name test_type \ 0 Expressive Vocabulary 101.0 58.0 NaN EOWPVT 1 Language 101.0 51.0 PLS receptive 2 Language 101.0 60.0 PLS expressive 3 Articulation 628.0 107.0 NaN Goldman 4 Language 628.0 103.0 CELF-P2 receptive academic_year_start tech_class age_year non_profound 0 2002.0 Bimodal 4.0 False 1 2002.0 Bimodal 4.0 False 2 2002.0 Bimodal 4.0 False 3 2008.0 Bilateral HA 1.0 True 4 2008.0 Bilateral HA 1.0 True [5 rows x 77 columns]
Indicator for non-profound hearing loss
lsl_dr['deg_hl_below6'] = lsl_dr.degree_hl<6
lsl_dr.deg_hl_below6[lsl_dr.degree_hl.isnull()] = np.nan
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy from ipykernel import kernelapp as app
Indicator for first intervention outside OPTION
lsl_dr['int_outside_option'] = lsl_dr.age > lsl_dr.age_int
lsl_dr.int_outside_option[lsl_dr.age < lsl_dr.age_int] = np.nan
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy from ipykernel import kernelapp as app
Indicator for high school graduation of mother
lsl_dr['mother_hs'] = lsl_dr.mother_ed > 1
lsl_dr.mother_hs[lsl_dr.mother_ed.isnull()] = None
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy from ipykernel import kernelapp as app
Joint commission benchmark
lsl_dr.onset_1.unique()
array([ 15. , 4. , 0. , 26. , 36. , 24. , 80. , 14. , 62. , 2. , 49. , 19. , 23. , 18. , 9. , nan, 10. , 12. , 1. , 5. , 30. , 7. , 51. , 8. , 3. , 17. , 50. , 31. , 34. , 28. , 35. , 38. , 95. , 42. , 13. , 16. , 61. , 46. , 22. , 53. , 59. , 88. , 6. , 37. , 96. , 52. , 64. , 65. , 48. , 97. , 25. , 47. , 79. , 107. , 74. , 77. , 84. , 60. , 41. , 33. , 39. , 27. , 11. , 20. , 21. , 45. , 29. , 32. , 81. , 1.5, 55. , 70. , 58. , 154. , 54. , 78. , 43. , 57. , 83. , 44. , 72. , 116. , 40. , 119. , 63. , 66. , 56. , 87. , 76. , 68. , 92. , 140. , 86. , 126. , 85. , 133. , 103. , 67. , 71. , 2.5, 98. , 75. , 0.5, 152. , 89. ])
ident_by_3 = lsl_dr.onset_1 <= 3
program_by_6 = lsl_dr.age <= 6
lsl_dr['jcih'] = ident_by_3 | program_by_6
lsl_dr.jcih[lsl_dr.onset_1.isnull() | lsl_dr.age.isnull()] = None
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Create age in years variable
lsl_dr['age_years'] = lsl_dr.age/12.
Retain only 4-year-olds
lsl_dr = lsl_dr[(lsl_dr.age_test>=48) & (lsl_dr.age_test<60)]
lsl_dr.shape
(7130, 82)
lsl_dr.study_id.unique().shape
(1670,)
unique_students = lsl_dr.drop_duplicates(subset='study_id')
unique_students.shape
(1670, 82)
unique_students.age.describe()
count 1648.000000 mean 29.581311 std 15.886114 min 0.000000 25% 18.000000 50% 32.000000 75% 41.000000 max 68.000000 Name: age, dtype: float64
unique_students.male.mean()
0.5243535778713169
unique_students.sib.describe()
count 1553.000000 mean 1.167418 std 0.933508 min 0.000000 25% 0.000000 50% 1.000000 75% 2.000000 max 3.000000 Name: sib, dtype: float64
(unique_students.degree_hl == 6).describe()
count 1670 unique 2 top False freq 919 Name: degree_hl, dtype: object
((unique_students.degree_hl < 6) & (unique_students.degree_hl > 1)).describe()
count 1670 unique 2 top False freq 1032 Name: degree_hl, dtype: object
unique_students.mother_hs.describe()
count 1044.000000 mean 0.576628 std 0.494330 min 0.000000 25% 0.000000 50% 1.000000 75% 1.000000 max 1.000000 Name: mother_hs, dtype: float64
unique_students.synd_or_disab.describe()
count 1469.000000 mean 0.208986 std 0.406723 min 0.000000 25% 0.000000 50% 0.000000 75% 0.000000 max 1.000000 Name: synd_or_disab, dtype: float64
grouped_age = unique_students.replace({'jcih':{1:'JCIH', 0:'non-JCIH'}}).groupby('jcih')['age_years']
fig, ax = plt.subplots()
for k, v in grouped_age:
v.hist(label=k, alpha=.75, ax=ax, grid=False)
ax.legend()
plt.xlabel('Age (years)'); plt.ylabel('Frequency');
# Filter domain and English-only, and appropriate bilateral subset
receptive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='receptive') &
(lsl_dr.non_english==False)]
receptive_language_dataset.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 1 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 4 initial_assessment_arm_1 2008-2009 0.0 0.0 0.0 0.0 2.0 67 initial_assessment_arm_1 2009-2010 0.0 0.0 0.0 0.0 1.0 74 initial_assessment_arm_1 2009-2010 0.0 0.0 2.0 0.0 3.0 103 initial_assessment_arm_1 2009-2010 0.0 1.0 0.0 0.0 3.0 _mother_ed father_ed premature_age ... test_type \ 1 6.0 6.0 9.0 ... receptive 4 5.0 5.0 7.0 ... receptive 67 4.0 3.0 9.0 ... receptive 74 2.0 2.0 8.0 ... receptive 103 2.0 4.0 8.0 ... receptive academic_year_start tech_class age_year non_profound deg_hl_below6 \ 1 2002.0 Bimodal 4.0 False 0.0 4 2008.0 Bilateral HA 1.0 True 1.0 67 2009.0 Bilateral CI 0.0 False 0.0 74 2009.0 Bilateral HA 2.0 True 1.0 103 2009.0 Bilateral HA 1.0 True 1.0 int_outside_option mother_hs jcih age_years 1 0.0 NaN 0.0 4.333333 4 1.0 1.0 0.0 1.416667 67 0.0 1.0 1.0 0.250000 74 0.0 0.0 1.0 2.166667 103 1.0 0.0 1.0 1.416667 [5 rows x 82 columns]
This is the size of the resulting dataset to be used in this analysis
receptive_language_dataset.shape
(1090, 82)
mean_score = receptive_language_dataset.groupby('study_id')['score'].mean()
receptive_language_dataset = receptive_language_dataset.drop_duplicates('study_id')
receptive_language_dataset = receptive_language_dataset.set_index('study_id')
receptive_language_dataset['mean_score'] = mean_score
Proportion of missing values for each variable
covs = ['age_years', 'school', 'score', 'male',
'family_inv', 'sib', 'deg_hl_below6', 'mother_hs', 'synd_or_disab', 'jcih']
(receptive_language_dataset[covs].isnull().sum() / receptive_language_dataset.shape[0]).round(2)
age_years 0.00 school 0.00 score 0.00 male 0.00 family_inv 0.16 sib 0.06 deg_hl_below6 0.05 mother_hs 0.31 synd_or_disab 0.14 jcih 0.24 dtype: float64
Drop individuals with missing age
, since there is <1% of them
receptive_language_dataset = receptive_language_dataset[receptive_language_dataset.age.notnull()]
receptive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
receptive_language_dataset.mother_ed.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x1090f77b8>
Final analysis dataset size
receptive_language_dataset.shape
(999, 82)
missing_pct = (receptive_language_dataset[covs].isnull().mean()*100).round(1)
missing_pct.sort(ascending=False)
missing_pct
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/ipykernel/__main__.py:2: FutureWarning: sort is deprecated, use sort_values(inplace=True) for INPLACE sorting from ipykernel import kernelapp as app
mother_hs 30.9 jcih 23.1 family_inv 16.2 synd_or_disab 13.4 sib 5.5 deg_hl_below6 5.0 male 0.0 score 0.0 school 0.0 age_years 0.0 dtype: float64
receptive_language_dataset['age_amp_missing'] = receptive_language_dataset.age_amp.isnull()
by_age_amp_missing = receptive_language_dataset.groupby('age_amp_missing')
by_age_amp_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
'non_english'].mean()
male synd_or_disab deg_hl_below6 mother_hs age_amp_missing False 0.507343 0.223926 0.480218 0.596558 True 0.516000 0.314554 0.453704 0.610778
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
try:
_ = by_age_amp_missing.boxplot(column=col)
except:
pass
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py:3079: FutureWarning: The default value for 'return_type' will change to 'axes' in a future release. To use the future behavior now, set return_type='axes'. To keep the previous behavior and silence this warning, set return_type='dict'. rot=rot, grid=grid, **kwds)
receptive_language_dataset['jcih_missing'] = receptive_language_dataset.jcih.isnull()
by_jcih_missing = receptive_language_dataset.groupby('jcih_missing')
by_jcih_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
'non_english'].mean()
male synd_or_disab deg_hl_below6 mother_hs jcih_missing False 0.510417 0.220390 0.474104 0.604824 True 0.506494 0.333333 0.474490 0.582781
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
_ = by_jcih_missing.boxplot(column=col)
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/pandas/tools/plotting.py:3079: FutureWarning: The default value for 'return_type' will change to 'axes' in a future release. To use the future behavior now, set return_type='axes'. To keep the previous behavior and silence this warning, set return_type='dict'. rot=rot, grid=grid, **kwds)
receptive_language_dataset.age_years.mean()
2.2938772105438825
from pymc import Bernoulli, Normal, Uniform, invlogit, logit, deterministic, MCMC, Matplot, MAP
from pymc import Lambda, Dirichlet, Categorical, Beta, NegativeBinomial, Exponential
from numpy.ma import masked_values
def generate_model(dataset):
# Extract data to local variables
age_years = dataset.age_years.values
school = dataset.school.values
score = dataset.score.values
non_english = dataset.non_english.astype(int).values
male = dataset.male.values
sib = dataset.sib.values
synd_or_disab = dataset.synd_or_disab.values
int_outside_option = dataset.int_outside_option.values
synd_or_disab = dataset.synd_or_disab.values
jcih = dataset.jcih.values
# Transform some data
age_std = age_years - age_years.mean()
# Imputation of family involvement
n_family_inv_cats = len(dataset.family_inv.unique())
p_family_inv = Dirichlet("p_family_inv", np.ones(n_family_inv_cats), value=[1./n_family_inv_cats]*(n_family_inv_cats-1))
family_inv_masked = masked_values(dataset.family_inv.fillna(0.5), value=0.5)
x_family_inv = Categorical('x_family_inv', p_family_inv, value=family_inv_masked, observed=True)
# Imputation of hearing loss
p_jcih = Beta("p_jcih", 1, 1, value=0.5)
jcih_masked = masked_values(dataset.jcih.values, value=np.nan)
x_jcih = Bernoulli('x_jcih', p_jcih, value=jcih_masked, observed=True)
# Imputation of hearing loss
p_hl = Beta("p_hl", 1, 1, value=0.5)
hl_masked = masked_values(dataset.deg_hl_below6.values, value=np.nan)
x_hl = Bernoulli('x_hl', p_hl, value=hl_masked, observed=True)
# Imputation of maternal education
p_mother_hs = Beta("p_mother_hs", 1, 1, value=0.5)
mother_hs_masked = masked_values(dataset.mother_hs.values, value=np.nan)
x_mother_hs = Bernoulli('x_mother_hs', p_mother_hs, value=mother_hs_masked, observed=True)
# Imputation of previous disability
p_synd_or_disab = Beta("p_synd_or_disab", 1, 1, value=0.5)
synd_or_disab_masked = masked_values(dataset.synd_or_disab.values, value=np.nan)
x_synd_or_disab = Bernoulli('x_synd_or_disab', p_synd_or_disab, value=synd_or_disab_masked, observed=True)
# Imputation of siblings
n_sib_cats = len(dataset.sib.unique())
p_sib = Dirichlet("p_sib", np.ones(n_sib_cats), value=[1./n_sib_cats]*(n_sib_cats-1))
sib_masked = masked_values(dataset.sib.fillna(0.5), value=0.5)
x_sib = Categorical('x_sib', p_sib, value=sib_masked, observed=True)
mean_score = Normal("mean_score", 100, 0.001, value=80)
# Indices to school random effects
unique_schools = np.unique(school)
school_index = [list(unique_schools).index(s) for s in school]
# School random effect
sigma_school = Uniform("sigma_school", 0, 1000, value=10)
tau_school = sigma_school**-2
alpha_school = Normal("alpha_school", mu=0, tau=tau_school, value=np.zeros(len(unique_schools)))
# Indices to child random effects
# unique_child = np.unique(study_id)
# child_index = [list(unique_child).index(s) for s in study_id]
# Individual random effect
# sigma_child = Uniform("sigma_child", 0, 1000, value=10)
# tau_child = sigma_child**-2
# alpha_child = Normal("alpha_child", mu=0, tau=tau_child, value=np.zeros(len(unique_child)))
@deterministic
def intercept(a0=mean_score, a1=alpha_school):
"""Random intercepts"""
return a0 + a1[school_index]
# Fixed effects
beta = Normal("beta", 0, 0.001, value=[0]*8)
@deterministic
def theta(b0=intercept, b=beta, x_family_inv=x_family_inv, x_sib=x_sib,
x_hl=x_hl, x_mother_hs=x_mother_hs, x_synd_or_disab=x_synd_or_disab, x_jcih=x_jcih):
# Complete predictors
X = [age_std, male]
# Partially-imputed predictors
X += [x_family_inv, x_sib, x_hl, x_mother_hs, x_synd_or_disab, x_jcih]
return b0 + np.dot(b, X)
sigma = Uniform("sigma", 0, 1000, value=10)
tau = sigma**-2
# Data likelihood
score_like = Normal("score_like", mu=theta, tau=tau, value=score, observed=True)
#score_like_pred = Normal("score_like_pred", mu=theta, tau=tau, value=score)
# Predictions:
# Girl, 1 year older than average, normal family involvement, 2 siblings, non-profound hearing loss,
# mother with HS, no previous disability, JCIH
x1 = [0, 0, 0, 2, 1, 1, 0, 1]
# Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss,
# mother without HS diploma, previous disability, JCIH
x2 = [-0.5, 1, 0, 0, 0, 0, 1, 1]
# Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss,
# mother without HS diploma, previous disability, no JCIH
x3 = [1.5, 1, 2, 1, 1, 0, 1, 0]
# Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss,
# mother with HS diploma, no previous disability, JCIH
x4 = [0.5, 1, 1, 3, 0, 1, 0, 1]
X = np.c_[x1, x2, x3, x4]
# import pdb; pdb.set_trace()
theta_pred = Lambda('theta_pred', lambda b=beta, mu=mean_score: mu + np.dot(X.T, b))
predictions = Normal('predictions', mu=theta_pred, tau=tau)
return locals()
Couldn't import dot_parser, loading of dot files will not be possible.
# M_reclang = MCMC(generate_model(receptive_language_dataset))
M_reclang = MCMC(generate_model(receptive_language_dataset), db='sqlite', name='reclang', dbmode='w')
iterations = 100000
burn = 90000
M_reclang.sample(iterations, burn)
[-----------------100%-----------------] 200000 of 200000 complete in 1469.0 sec
labels = ['Age of Enrollment', 'Male',
'Reduced Family Involvement', 'Sibling Count', 'Non-profound Hearing Loss',
'Mother with HS Diploma','Additional Disability', 'JCIH']
Matplot.summary_plot([M_reclang.beta], rhat=False,
custom_labels=labels, x_range=(-10,10), main='Receptive Language')
M_reclang.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.221 0.438 0.025 [-4.016 -2.261] 0.668 1.104 0.088 [-1.493 2.839] -5.274 0.521 0.031 [-6.314 -4.296] -0.291 0.578 0.036 [-1.486 0.747] 8.146 1.151 0.087 [ 5.822 10.46 ] 5.597 1.109 0.088 [ 3.374 7.684] -6.361 1.049 0.076 [-8.442 -4.41 ] 3.318 1.136 0.09 [ 1.108 5.474] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -4.122 -3.503 -3.217 -2.935 -2.335 -1.44 -0.1 0.64 1.443 2.942 -6.283 -5.626 -5.288 -4.927 -4.231 -1.413 -0.678 -0.308 0.087 0.86 5.843 7.37 8.073 8.927 10.517 3.208 4.876 5.682 6.361 7.636 -8.418 -7.089 -6.381 -5.652 -4.329 1.049 2.622 3.351 4.038 5.421
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
M_reclang.sigma_school.summary()
sigma_school: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ 5.832 0.921 0.063 [ 4.145 7.668] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| 4.239 5.185 5.767 6.409 7.876
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
#Matplot.gof_plot(M_reclang.score_like_pred.trace(), M_reclang.score, verbose=0)
expressive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='expressive') &
(lsl_dr.non_english==False)]
expressive_language_dataset.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 2 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 5 initial_assessment_arm_1 2008-2009 0.0 0.0 0.0 0.0 2.0 68 initial_assessment_arm_1 2009-2010 0.0 0.0 0.0 0.0 1.0 75 initial_assessment_arm_1 2009-2010 0.0 0.0 2.0 0.0 3.0 104 initial_assessment_arm_1 2009-2010 0.0 1.0 0.0 0.0 3.0 _mother_ed father_ed premature_age ... test_type \ 2 6.0 6.0 9.0 ... expressive 5 5.0 5.0 7.0 ... expressive 68 4.0 3.0 9.0 ... expressive 75 2.0 2.0 8.0 ... expressive 104 2.0 4.0 8.0 ... expressive academic_year_start tech_class age_year non_profound deg_hl_below6 \ 2 2002.0 Bimodal 4.0 False 0.0 5 2008.0 Bilateral HA 1.0 True 1.0 68 2009.0 Bilateral CI 0.0 False 0.0 75 2009.0 Bilateral HA 2.0 True 1.0 104 2009.0 Bilateral HA 1.0 True 1.0 int_outside_option mother_hs jcih age_years 2 0.0 NaN 0.0 4.333333 5 1.0 1.0 0.0 1.416667 68 0.0 1.0 1.0 0.250000 75 0.0 0.0 1.0 2.166667 104 1.0 0.0 1.0 1.416667 [5 rows x 82 columns]
mean_score = expressive_language_dataset.groupby('study_id')['score'].mean()
expressive_language_dataset = expressive_language_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_language_dataset['mean_score'] = mean_score
(expressive_language_dataset[covs].isnull().sum() / expressive_language_dataset.shape[0]).round(2)
age_years 0.00 school 0.00 score 0.00 male 0.00 family_inv 0.16 sib 0.06 deg_hl_below6 0.05 mother_hs 0.31 synd_or_disab 0.14 jcih 0.23 dtype: float64
expressive_language_dataset = expressive_language_dataset[expressive_language_dataset.age.notnull()
& expressive_language_dataset.int_outside_option.notnull()]
expressive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_explang = MCMC(generate_model(expressive_language_dataset), db='sqlite', name='explang', dbmode='w')
M_explang.sample(iterations, burn)
[-----------------100%-----------------] 200000 of 200000 complete in 1215.0 sec
M_explang.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.601 0.462 0.023 [-4.579 -2.78 ] 0.117 1.008 0.076 [-1.779 2.071] -4.607 0.558 0.035 [-5.677 -3.56 ] -0.222 0.557 0.031 [-1.369 0.865] 8.775 1.194 0.092 [ 6.285 11.108] 7.282 1.246 0.103 [ 4.844 9.591] -6.211 1.151 0.085 [-8.353 -3.948] 4.895 1.095 0.086 [ 2.792 7.068] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -4.493 -3.907 -3.595 -3.294 -2.66 -1.972 -0.558 0.093 0.824 2.017 -5.671 -4.992 -4.6 -4.226 -3.532 -1.354 -0.574 -0.219 0.141 0.898 6.307 8.01 8.81 9.576 11.182 4.853 6.415 7.299 8.162 9.635 -8.336 -6.983 -6.223 -5.484 -3.894 2.793 4.216 4.89 5.574 7.085
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
Matplot.summary_plot([M_explang.beta], custom_labels=labels, rhat=False, main='Expressive Language', x_range=(-10,10))
Matplot.summary_plot([M_explang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
Matplot.plot(M_explang.sigma_school)
Plotting sigma_school
articulation_dataset = lsl_dr[(lsl_dr.domain=='Articulation') &
(lsl_dr.non_english==False)]
articulation_dataset.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 3 initial_assessment_arm_1 2008-2009 0.0 0.0 0.0 0.0 2.0 6 initial_assessment_arm_1 2009-2010 0.0 1.0 2.0 0.0 3.0 66 initial_assessment_arm_1 2009-2010 0.0 0.0 0.0 0.0 1.0 73 initial_assessment_arm_1 2009-2010 0.0 0.0 2.0 0.0 3.0 102 initial_assessment_arm_1 2009-2010 0.0 1.0 0.0 0.0 3.0 _mother_ed father_ed premature_age ... test_type \ 3 5.0 5.0 7.0 ... Goldman 6 6.0 6.0 8.0 ... Goldman 66 4.0 3.0 9.0 ... Goldman 73 2.0 2.0 8.0 ... Goldman 102 2.0 4.0 8.0 ... Goldman academic_year_start tech_class age_year non_profound deg_hl_below6 \ 3 2008.0 Bilateral HA 1.0 True 1.0 6 2009.0 Bimodal 0.0 False 0.0 66 2009.0 Bilateral CI 0.0 False 0.0 73 2009.0 Bilateral HA 2.0 True 1.0 102 2009.0 Bilateral HA 1.0 True 1.0 int_outside_option mother_hs jcih age_years 3 1.0 1.0 0.0 1.416667 6 0.0 NaN 1.0 0.083333 66 0.0 1.0 1.0 0.250000 73 0.0 0.0 1.0 2.166667 102 1.0 0.0 1.0 1.416667 [5 rows x 82 columns]
mean_score = articulation_dataset.groupby('study_id')['score'].mean()
articulation_dataset = articulation_dataset.drop_duplicates('study_id').set_index('study_id')
articulation_dataset['mean_score'] = mean_score
(articulation_dataset[covs].isnull().sum() / articulation_dataset.shape[0]).round(2)
age_years 0.01 school 0.00 score 0.00 male 0.00 family_inv 0.18 sib 0.07 deg_hl_below6 0.16 mother_hs 0.36 synd_or_disab 0.12 jcih 0.34 dtype: float64
articulation_dataset = articulation_dataset[articulation_dataset.age.notnull()
& articulation_dataset.int_outside_option.notnull()]
articulation_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_articulation = MCMC(generate_model(articulation_dataset), db='sqlite', name='articulation', dbmode='w')
M_articulation.sample(iterations, burn)
[-----------------100%-----------------] 100000 of 100000 complete in 794.5 sec
Matplot.summary_plot([M_articulation.beta],
custom_labels=labels, rhat=False, main='Articulation', x_range=(-10,10))
Matplot.summary_plot([M_articulation.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
M_articulation.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -2.46 0.526 0.028 [-3.456 -1.387] 1.136 1.118 0.081 [-0.896 3.474] -3.817 0.603 0.035 [-5.011 -2.653] 0.025 0.649 0.036 [-1.278 1.253] 5.299 1.346 0.103 [ 2.844 8.168] 6.097 1.424 0.117 [ 3.169 8.679] -8.293 1.358 0.101 [-10.727 -5.582] 6.625 1.452 0.121 [ 4.143 9.456] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -3.518 -2.824 -2.469 -2.091 -1.436 -1.02 0.343 1.098 1.934 3.4 -5.002 -4.227 -3.822 -3.423 -2.616 -1.242 -0.423 0.034 0.476 1.327 2.719 4.375 5.301 6.161 8.102 3.309 5.157 6.167 7.079 8.875 -10.939 -9.279 -8.27 -7.317 -5.725 3.584 5.625 6.578 7.719 9.229
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
expressive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Expressive Vocabulary') &
(lsl_dr.non_english==False)]
expressive_vocab_dataset.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 0 initial_assessment_arm_1 2002-2003 0.0 0.0 0.0 0.0 1.0 124 initial_assessment_arm_1 2014-2015 0.0 0.0 6.0 0.0 2.0 205 initial_assessment_arm_1 2013-2014 0.0 0.0 0.0 0.0 1.0 213 initial_assessment_arm_1 2012-2013 0.0 1.0 0.0 0.0 2.0 219 initial_assessment_arm_1 2012-2013 0.0 0.0 3.0 0.0 0.0 _mother_ed father_ed premature_age ... test_type \ 0 6.0 6.0 9.0 ... EOWPVT 124 NaN NaN 8.0 ... EVT 205 4.0 5.0 8.0 ... EOWPVT 213 4.0 5.0 8.0 ... EOWPVT 219 6.0 6.0 9.0 ... EOWPVT academic_year_start tech_class age_year non_profound deg_hl_below6 \ 0 2002.0 Bimodal 4.0 False 0.0 124 2014.0 Bilateral HA 4.0 True 1.0 205 2013.0 Bimodal 4.0 False 0.0 213 2012.0 Bimodal 4.0 False 0.0 219 2012.0 Bimodal 4.0 False 0.0 int_outside_option mother_hs jcih age_years 0 0.0 NaN 0.0 4.333333 124 0.0 NaN 0.0 4.833333 205 1.0 1.0 1.0 4.916667 213 1.0 1.0 1.0 4.583333 219 0.0 NaN 0.0 4.583333 [5 rows x 82 columns]
mean_score = expressive_vocab_dataset.groupby('study_id')['score'].mean()
expressive_vocab_dataset = expressive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_vocab_dataset['mean_score'] = mean_score
expressive_vocab_dataset[covs].isnull().sum()
age_years 7 school 0 score 0 male 0 family_inv 216 sib 75 deg_hl_below6 178 mother_hs 426 synd_or_disab 138 jcih 395 dtype: int64
expressive_vocab_dataset = expressive_vocab_dataset[expressive_vocab_dataset.age.notnull()
& expressive_vocab_dataset.male.notnull()
& expressive_vocab_dataset.int_outside_option.notnull()]
expressive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_expressive_vocab = MCMC(generate_model(expressive_vocab_dataset), db='sqlite', name='expressive_vocab', dbmode='w')
M_expressive_vocab.sample(iterations, burn)
[-----------------100%-----------------] 100001 of 100000 complete in 904.7 sec
Matplot.summary_plot([M_expressive_vocab.beta],
custom_labels=labels, rhat=False, main='Expressive Vocabulary', x_range=(-10,10))
M_expressive_vocab.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.355 0.439 0.025 [-4.241 -2.531] 1.485 1.041 0.08 [-0.636 3.439] -5.252 0.538 0.034 [-6.367 -4.273] -1.018 0.581 0.038 [-2.112 0.148] 6.463 1.066 0.082 [ 4.524 8.605] 6.901 1.395 0.119 [ 4.176 9.645] -5.977 1.185 0.092 [-8.412 -3.733] 3.873 1.264 0.106 [ 1.431 6.319] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -4.233 -3.644 -3.35 -3.064 -2.502 -0.421 0.783 1.417 2.176 3.678 -6.324 -5.612 -5.235 -4.88 -4.212 -2.112 -1.42 -1.02 -0.638 0.148 4.534 5.768 6.422 7.173 8.649 4.191 5.925 6.939 7.851 9.713 -8.294 -6.781 -5.967 -5.18 -3.58 1.527 3.003 3.844 4.699 6.579
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
Matplot.summary_plot(M_explang.alpha_school, custom_labels=[' ']*42, main='Expressive language school effects',
x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
Matplot.summary_plot(M_expressive_vocab.alpha_school, custom_labels=[' ']*42, main='Expressive vocabulary school effects',
x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
receptive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Receptive Vocabulary') &
(lsl_dr.non_english==False)]
receptive_vocab_dataset.head()
redcap_event_name academic_year hl male _race prim_lang sib \ 125 initial_assessment_arm_1 2014-2015 0.0 0.0 6.0 0.0 2.0 206 initial_assessment_arm_1 2013-2014 0.0 0.0 0.0 0.0 1.0 214 initial_assessment_arm_1 2012-2013 0.0 1.0 0.0 0.0 2.0 217 initial_assessment_arm_1 2012-2013 0.0 0.0 0.0 0.0 1.0 220 initial_assessment_arm_1 2012-2013 0.0 0.0 3.0 0.0 0.0 _mother_ed father_ed premature_age ... test_type \ 125 NaN NaN 8.0 ... PPVT 206 4.0 5.0 8.0 ... ROWPVT 214 4.0 5.0 8.0 ... ROWPVT 217 5.0 5.0 8.0 ... ROWPVT 220 6.0 6.0 9.0 ... PPVT academic_year_start tech_class age_year non_profound deg_hl_below6 \ 125 2014.0 Bilateral HA 4.0 True 1.0 206 2013.0 Bimodal 4.0 False 0.0 214 2012.0 Bimodal 4.0 False 0.0 217 2012.0 Bilateral HA 4.0 True 1.0 220 2012.0 Bimodal 4.0 False 0.0 int_outside_option mother_hs jcih age_years 125 0.0 NaN 0.0 4.833333 206 1.0 1.0 1.0 4.916667 214 1.0 1.0 1.0 4.583333 217 1.0 1.0 1.0 4.083333 220 0.0 NaN 0.0 4.583333 [5 rows x 82 columns]
mean_score = receptive_vocab_dataset.groupby('study_id')['score'].mean()
receptive_vocab_dataset = receptive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
receptive_vocab_dataset['mean_score'] = mean_score
receptive_vocab_dataset[covs].isnull().sum()
age_years 7 school 0 score 0 male 0 family_inv 220 sib 74 deg_hl_below6 180 mother_hs 430 synd_or_disab 135 jcih 399 dtype: int64
receptive_vocab_dataset = receptive_vocab_dataset[receptive_vocab_dataset.age.notnull() &
receptive_vocab_dataset.male.notnull() &
receptive_vocab_dataset.int_outside_option.notnull()]
receptive_vocab_dataset.age_int.hist(bins=40)
<matplotlib.axes._subplots.AxesSubplot at 0x10919c358>
receptive_vocab_dataset.shape
(1171, 82)
receptive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_receptive_vocab = MCMC(generate_model(receptive_vocab_dataset), db='sqlite', name='receptive_vocab', dbmode='w')
M_receptive_vocab.sample(iterations, burn)
[-----------------100%-----------------] 100000 of 100000 complete in 900.8 sec
#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_receptive_vocab.beta], rhat=False,
custom_labels=labels, main='Receptive Vocabulary', x_range=(-10,10))
M_receptive_vocab.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.091 0.435 0.024 [-3.933 -2.28 ] 0.654 0.989 0.077 [-1.31 2.667] -4.541 0.493 0.029 [-5.534 -3.615] -0.837 0.538 0.032 [-1.857 0.228] 5.958 1.091 0.087 [ 3.857 8.068] 5.369 1.374 0.122 [ 2.592 7.699] -7.058 1.147 0.09 [-9.313 -4.939] 3.526 1.217 0.102 [ 1.36 5.848] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -3.909 -3.399 -3.089 -2.796 -2.245 -1.31 0.032 0.62 1.278 2.667 -5.521 -4.877 -4.542 -4.21 -3.592 -1.834 -1.189 -0.869 -0.492 0.281 3.715 5.228 6.008 6.687 7.928 2.537 4.421 5.541 6.385 7.679 -9.302 -7.85 -7.003 -6.224 -4.909 1.368 2.644 3.487 4.395 5.907
/Users/fonnescj/anaconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:225: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future return reshape(newshape, order=order)
import sqlite3
db_files = !ls *sqlite
articulation = sqlite3.connect("articulation.sqlite")
c_artic = articulation.cursor()
c_artic.execute('SELECT v1 FROM sigma_school WHERE trace==0')
<sqlite3.Cursor at 0x10d852e30>
sigma_school_artic = np.ravel(c_artic.fetchall())
sigma_school_artic.mean()
5.4121976771999991
for dbname in db_files:
db = sqlite3.connect(dbname)
cur = db.cursor()
cur.execute('SELECT v1 FROM sigma_school WHERE trace==0')
trace = np.ravel(cur.fetchall())
trace.sort()
print('\n'+dbname)
print(np.median(trace))
print(trace[[int(len(trace)*0.025), int(len(trace)*0.975)]])
articulation.sqlite 5.3787495 [ 2.726391 8.125967] explang.sqlite 5.638327 [ 4.07999 8.008275] expressive_vocab.sqlite 8.387962 [ 6.401984 11.66551 ] receptive_vocab.sqlite 6.903193 [ 5.176159 9.135123] reclang.sqlite 5.766925 [ 4.239235 7.87621 ]