All studies with successful outcomes reported for early-identified children who are deaf or hard of hearing have intervention provided by specialists who are trained in parent-infant intervention services.[12,90,97] Early intervention programs should develop mechanisms to ensure that early intervention professionals have special skills necessary for providing families with the highest quality of service specific to children with hearing loss. Professionals with a background in deaf education, audiology, and speech-language pathology will typically have the skills needed for providing intervention services. Professionals should be highly qualified in their respective fields and should be skilled communicators who are knowledgeable and sensitive to the importance of enhancing families' strengths and supporting their priorities. When early intervention professionals have knowledge of the principles of adult learning, it increases their success with parents and other professionals.
TJC must be competent to provide services.
References:
Purpose of the study is to examine the relationship between highly qualified providers, age of enrollment in intervention, and speech and language outcomes at 4 years of age in a group of children enrolled in school programs that specialize in the development of listening and spoken language.
# Import modules and set options
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)
Import data
lsl_dr = pd.read_csv('lsl_dr.csv', index_col=0)
/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (34,42) have mixed types. Specify dtype option on import or set low_memory=False. data = self._reader.read(nrows)
lsl_dr.head()
index redcap_event_name academic_year hl male _race \ 0 0 Initial Assessment/Enrollment 2002-2003 0 0 0 1 0 Initial Assessment/Enrollment 2002-2003 0 0 0 2 1 Year 1 Complete (7/1 - 6/30) 2003-2004 0 0 0 3 2 Year 2 Complete (7/1 - 6/30) 2004-2005 0 0 0 4 2 Year 2 Complete (7/1 - 6/30) 2004-2005 0 0 0 prim_lang sib _mother_ed father_ed ... age_diag known_synd \ 0 0 1 6 6 ... 15 0 1 0 1 6 6 ... 15 0 2 0 1 6 6 ... 15 0 3 0 1 6 6 ... 15 0 4 0 1 6 6 ... 15 0 synd_or_disab race age_test domain school score \ 0 0 0 54 Expressive Vocabulary 101 60 1 0 0 54 Language 101 58 2 0 0 80 Articulation 101 78 3 0 0 80 Expressive Vocabulary 101 84 4 0 0 80 Receptive Vocabulary 101 90 test_name test_type 0 NaN EOWPVT 1 PLS expressive 2 NaN Goldman 3 NaN EOWPVT 4 NaN PPVT [5 rows x 72 columns]
Indicator for non-profound hearing loss
lsl_dr['deg_hl_below6'] = lsl_dr.degree_hl<6
lsl_dr.deg_hl_below6[lsl_dr.degree_hl.isnull()] = np.nan
Indicator for first intervention outside OPTION
lsl_dr['int_outside_option'] = lsl_dr.age > lsl_dr.age_int
lsl_dr.int_outside_option[lsl_dr.age < lsl_dr.age_int] = np.nan
Indicator for high school graduation of mother
lsl_dr['mother_hs'] = lsl_dr.mother_ed > 1
lsl_dr.mother_hs[lsl_dr.mother_ed.isnull()] = None
Joint commission benchmark
lsl_dr.onset_1.unique()
array([ 15. , 32. , 2. , nan, 0. , 4. , 1. , 24. , 3. , 7. , 13. , 10. , 42. , 34. , 12. , 1.5, 22. , 21. , 23. , 5. , 28. , 29. , 51. , 44. , 16. , 14. , 35. , 18. , 41. , 33. , 37. , 6. , 46. , 40. , 47. , 31. , 30. , 48. , 11. , 20. , 26. , 36. , 25. , 19. , 9. , 39. , 86. , 17. , 38. , 27. , 8. , 64. , 58. , 70. , 49. , 55. , 60. , 50. , 116. , 72. , 57. , 43. , 65. , 78. , 83. , 61. , 107. , 74. , 77. , 62. , 80. , 88. , 59. , 53. , 67. , 2.5, 52. , 97. , 92. , 69. , 103. , 133. , 126. , 168. , 140. , 68. , 120. , 56. ])
ident_by_3 = lsl_dr.onset_1 <= 3
program_by_6 = lsl_dr.age <= 6
lsl_dr['jcih'] = ident_by_3 | program_by_6
lsl_dr.jcih[lsl_dr.onset_1.isnull() | lsl_dr.age.isnull()] = None
Create age in years variable
lsl_dr['age_years'] = lsl_dr.age/12.
Retain only 4-year-olds
lsl_dr = lsl_dr[(lsl_dr.age_test>=48) & (lsl_dr.age_test<60)]
lsl_dr.shape
(5076, 77)
lsl_dr.study_id.unique().shape
(1203,)
# Filter domain and English-only, and appropriate bilateral subset
receptive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='receptive') &
(lsl_dr.non_english==False)]
receptive_language_dataset.head()
index redcap_event_name academic_year hl male _race \ 20 10 Year 1 Complete (7/1 - 6/30) 2003-2004 0 0 0 32 14 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 52 20 Year 3 Complete (7/1 - 6/30) 2007-2008 0 1 0 77 32 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 80 33 Year 3 Complete (7/1 - 6/30) 2008-2009 0 1 0 prim_lang sib _mother_ed father_ed ... domain school score \ 20 0 1 2 2 ... Language 101 77 32 0 0 6 6 ... Language 101 85 52 0 1 4 4 ... Language 101 86 77 0 2 5 4 ... Language 101 92 80 0 2 5 4 ... Language 101 92 test_name test_type deg_hl_below6 int_outside_option mother_hs jcih \ 20 PLS receptive 1 1 0 0 32 CELF-P2 receptive 1 1 NaN 1 52 CELF-P2 receptive 0 0 1 1 77 CELF-P2 receptive NaN 0 1 0 80 CELF-P2 receptive NaN 0 1 0 age_years 20 3.416667 32 2.916667 52 0.083333 77 2.416667 80 2.416667 [5 rows x 77 columns]
This is the size of the resulting dataset to be used in this analysis
receptive_language_dataset.shape
(749, 77)
Proportion of missing values for each variable
covs = ['age_years', 'school', 'score', 'male',
'family_inv', 'sib', 'deg_hl_below6', 'mother_hs', 'synd_or_disab', 'jcih']
(receptive_language_dataset[covs].isnull().sum() / receptive_language_dataset.shape[0]).round(2)
age_years 0.01 school 0.00 score 0.00 male 0.00 family_inv 0.24 sib 0.08 deg_hl_below6 0.09 mother_hs 0.34 synd_or_disab 0.12 jcih 0.44 dtype: float64
Drop individuals with missing age
, since there is <1% of them
receptive_language_dataset = receptive_language_dataset[receptive_language_dataset.age.notnull()]
receptive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
receptive_language_dataset.mother_ed.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x10724e0d0>
Final analysis dataset size
receptive_language_dataset.shape
(743, 77)
receptive_language_dataset[covs].isnull().sum()
age_years 0 school 0 score 0 male 0 family_inv 180 sib 57 deg_hl_below6 62 mother_hs 246 synd_or_disab 88 jcih 324 dtype: int64
receptive_language_dataset['age_amp_missing'] = receptive_language_dataset.age_amp.isnull()
by_age_amp_missing = receptive_language_dataset.groupby('age_amp_missing')
by_age_amp_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
'non_english'].mean()
male synd_or_disab deg_hl_below6 mother_hs age_amp_missing False 0.495146 0.235955 0.466667 0.588652 True 0.531722 0.274247 0.487973 0.651163
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
try:
_ = by_age_amp_missing.boxplot(column=col)
except:
pass
/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py:2385: FutureWarning: The default value for 'return_type' will change to 'axes' in a future release. To use the future behavior now, set return_type='axes'. To keep the previous behavior and silence this warning, set return_type='dict'. warnings.warn(msg, FutureWarning)
receptive_language_dataset['jcih_missing'] = receptive_language_dataset.jcih.isnull()
by_jcih_missing = receptive_language_dataset.groupby('jcih_missing')
by_jcih_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
'non_english'].mean()
male synd_or_disab deg_hl_below6 mother_hs jcih_missing False 0.498807 0.239669 0.468514 0.582456 True 0.527778 0.270548 0.485915 0.660377
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
_ = by_jcih_missing.boxplot(column=col)
from pymc import Bernoulli, Normal, Uniform, invlogit, logit, deterministic, MCMC, Matplot, MAP
from pymc import Lambda, Dirichlet, Categorical, Beta, NegativeBinomial, Exponential
from numpy.ma import masked_values
def generate_model(dataset):
# Extract data to local variables
age_years = dataset.age_years.values
school = dataset.school.values
study_id = dataset.study_id.values
score = dataset.score.values
non_english = dataset.non_english.astype(int).values
male = dataset.male.values
sib = dataset.sib.values
synd_or_disab = dataset.synd_or_disab.values
int_outside_option = dataset.int_outside_option.values
synd_or_disab = dataset.synd_or_disab.values
jcih = dataset.jcih.values
# Transform some data
age_std = age_years - age_years.mean()
# Imputation of family involvement
n_family_inv_cats = len(dataset.family_inv.unique())
p_family_inv = Dirichlet("p_family_inv", np.ones(n_family_inv_cats), value=[1./n_family_inv_cats]*(n_family_inv_cats-1))
family_inv_masked = masked_values(dataset.family_inv.fillna(0.5), value=0.5)
x_family_inv = Categorical('x_family_inv', p_family_inv, value=family_inv_masked, observed=True)
# Imputation of hearing loss
p_jcih = Beta("p_jcih", 1, 1, value=0.5)
jcih_masked = masked_values(dataset.jcih.values, value=np.nan)
x_jcih = Bernoulli('x_jcih', p_jcih, value=jcih_masked, observed=True)
# Imputation of hearing loss
p_hl = Beta("p_hl", 1, 1, value=0.5)
hl_masked = masked_values(dataset.deg_hl_below6.values, value=np.nan)
x_hl = Bernoulli('x_hl', p_hl, value=hl_masked, observed=True)
# Imputation of maternal education
p_mother_hs = Beta("p_mother_hs", 1, 1, value=0.5)
mother_hs_masked = masked_values(dataset.mother_hs.values, value=np.nan)
x_mother_hs = Bernoulli('x_mother_hs', p_mother_hs, value=mother_hs_masked, observed=True)
# Imputation of previous disability
p_synd_or_disab = Beta("p_synd_or_disab", 1, 1, value=0.5)
synd_or_disab_masked = masked_values(dataset.synd_or_disab.values, value=np.nan)
x_synd_or_disab = Bernoulli('x_synd_or_disab', p_synd_or_disab, value=synd_or_disab_masked, observed=True)
# Imputation of age at intervention
# mu_ai = Uniform("mu_ai", 0, 1000, value=10)
# a_ai = Exponential("a_ai", 1, value=1)
# age_int_masked = masked_values(dataset.age_int.fillna(0.5), value=0.5)
# x_age_int = NegativeBinomial('x_age_int', mu_ai, a_ai, value=age_int_masked, observed=True)
# Imputation of siblings
n_sib_cats = len(dataset.sib.unique())
p_sib = Dirichlet("p_sib", np.ones(n_sib_cats), value=[1./n_sib_cats]*(n_sib_cats-1))
sib_masked = masked_values(dataset.sib.fillna(0.5), value=0.5)
x_sib = Categorical('x_sib', p_sib, value=sib_masked, observed=True)
mean_score = Normal("mean_score", 100, 0.001, value=80)
# Indices to school random effects
unique_schools = np.unique(school)
school_index = [list(unique_schools).index(s) for s in school]
# School random effect
sigma_school = Uniform("sigma_school", 0, 1000, value=10)
tau_school = sigma_school**-2
alpha_school = Normal("alpha_school", mu=0, tau=tau_school, value=np.zeros(len(unique_schools)))
# Indices to child random effects
unique_child = np.unique(study_id)
child_index = [list(unique_child).index(s) for s in study_id]
# Individual random effect
sigma_child = Uniform("sigma_child", 0, 1000, value=10)
tau_child = sigma_child**-2
alpha_child = Normal("alpha_child", mu=0, tau=tau_child, value=np.zeros(len(unique_child)))
@deterministic
def intercept(a0=mean_score, a1=alpha_school, a2=alpha_child):
"""Random intercepts"""
return a0 + a1[school_index] + a2[child_index]
# Fixed effects
beta = Normal("beta", 0, 0.001, value=[0]*8)
@deterministic
def theta(b0=intercept, b=beta, x_family_inv=x_family_inv, x_sib=x_sib,
x_hl=x_hl, x_mother_hs=x_mother_hs, x_synd_or_disab=x_synd_or_disab, x_jcih=x_jcih):
# Complete predictors
X = [age_std, male]
# Partially-imputed predictors
X += [x_family_inv, x_sib, x_hl, x_mother_hs, x_synd_or_disab, x_jcih]
return b0 + np.dot(b, X)
sigma = Uniform("sigma", 0, 1000, value=10)
tau = sigma**-2
# Data likelihood
score_like = Normal("score_like", mu=theta, tau=tau, value=score, observed=True)
score_like_pred = Normal("score_like_pred", mu=theta, tau=tau, value=score)
# Predictions:
# Girl, 1 year older than average, normal family involvement, 2 siblings, non-profound hearing loss,
# mother with HS, no previous disability, JCIH
x1 = [0, 0, 0, 2, 1, 1, 0, 1]
# Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss,
# mother without HS diploma, previous disability, JCIH
x2 = [-0.5, 1, 0, 0, 0, 0, 1, 1]
# Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss,
# mother without HS diploma, previous disability, no JCIH
x3 = [1.5, 1, 2, 1, 1, 0, 1, 0]
# Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss,
# mother with HS diploma, no previous disability, JCIH
x4 = [0.5, 1, 1, 3, 0, 1, 0, 1]
X = np.c_[x1, x2, x3, x4]
# import pdb; pdb.set_trace()
theta_pred = Lambda('theta_pred', lambda b=beta, mu=mean_score: mu + np.dot(X.T, b))
predictions = Normal('predictions', mu=theta_pred, tau=tau)
return locals()
M_reclang = MCMC(generate_model(receptive_language_dataset))
M_reclang.sample(200000, 190000)
[-----------------100%-----------------] 200000 of 200000 complete in 2529.8 sec
labels = ['Age of Enrollment', 'Male',
'Reduced Family Involvement', 'Sibling Count', 'Non-profound Hearing Loss',
'Mother with HS Diploma','Additional Disability', 'JCIH']
Matplot.summary_plot([M_reclang.beta], rhat=False,
custom_labels=labels)
M_reclang.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.133 0.505 0.034 [-4.16 -2.174] -0.19 1.308 0.114 [-2.453 2.53 ] -5.453 0.559 0.036 [-6.557 -4.305] -0.684 0.62 0.044 [-1.881 0.509] 6.313 1.322 0.117 [ 3.841 9.039] 5.139 1.408 0.126 [ 2.154 7.753] -6.381 1.361 0.121 [-8.609 -3.807] 4.252 1.658 0.153 [ 0.91 7.616] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -4.131 -3.479 -3.124 -2.804 -2.124 -2.566 -1.09 -0.217 0.669 2.496 -6.629 -5.818 -5.439 -5.086 -4.333 -1.907 -1.089 -0.652 -0.271 0.504 3.658 5.376 6.238 7.273 8.929 2.401 4.227 5.202 6.032 8.026 -8.703 -7.368 -6.523 -5.387 -3.861 0.598 3.36 4.339 5.282 7.364
Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
Matplot.gof_plot(M_reclang.score_like_pred.trace(), M_reclang.score, verbose=0)
expressive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='expressive') &
(lsl_dr.non_english==False)]
expressive_language_dataset.head()
index redcap_event_name academic_year hl male _race \ 1 0 Initial Assessment/Enrollment 2002-2003 0 0 0 21 10 Year 1 Complete (7/1 - 6/30) 2003-2004 0 0 0 33 14 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 53 20 Year 3 Complete (7/1 - 6/30) 2007-2008 0 1 0 78 32 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 prim_lang sib _mother_ed father_ed ... domain school score \ 1 0 1 6 6 ... Language 101 58 21 0 1 2 2 ... Language 101 77 33 0 0 6 6 ... Language 101 67 53 0 1 4 4 ... Language 101 81 78 0 2 5 4 ... Language 101 105 test_name test_type deg_hl_below6 int_outside_option mother_hs jcih \ 1 PLS expressive 0 1 NaN 0 21 PLS expressive 1 1 0 0 33 CELF-P2 expressive 1 1 NaN 1 53 CELF-P2 expressive 0 0 1 1 78 CELF-P2 expressive NaN 0 1 0 age_years 1 4.333333 21 3.416667 33 2.916667 53 0.083333 78 2.416667 [5 rows x 77 columns]
(expressive_language_dataset[covs].isnull().sum() / expressive_language_dataset.shape[0]).round(2)
age_years 0.01 school 0.00 score 0.00 male 0.00 family_inv 0.24 sib 0.08 deg_hl_below6 0.09 mother_hs 0.34 synd_or_disab 0.12 jcih 0.43 dtype: float64
expressive_language_dataset = expressive_language_dataset[expressive_language_dataset.age.notnull()
& expressive_language_dataset.int_outside_option.notnull()]
expressive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_explang = MCMC(generate_model(expressive_language_dataset))
M_explang.sample(200000, 190000)
[-----------------100%-----------------] 200000 of 200000 complete in 2007.5 sec
Matplot.summary_plot([M_explang.beta], custom_labels=labels, rhat=False)
Matplot.summary_plot([M_explang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
articulation_dataset = lsl_dr[(lsl_dr.domain=='Articulation') &
(lsl_dr.non_english==False)]
articulation_dataset.head()
index redcap_event_name academic_year hl male _race \ 18 10 Year 1 Complete (7/1 - 6/30) 2003-2004 0 0 0 29 14 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 49 20 Year 3 Complete (7/1 - 6/30) 2007-2008 0 1 0 93 43 Year 1 Complete (7/1 - 6/30) NaN 0 0 0 139 69 Year 2 Complete (7/1 - 6/30) 2012-2013 0 1 0 prim_lang sib _mother_ed father_ed ... domain school \ 18 0 1 2 2 ... Articulation 101 29 0 0 6 6 ... Articulation 101 49 0 1 4 4 ... Articulation 101 93 0 2 6 6 ... Articulation 101 139 0 0 5 6 ... Articulation 101 score test_name test_type deg_hl_below6 int_outside_option \ 18 97 NaN Goldman 1 1 29 75 NaN Goldman 1 1 49 100 NaN Goldman 0 0 93 120 NaN Goldman 1 0 139 65 NaN Goldman 0 1 mother_hs jcih age_years 18 0 0 3.416667 29 NaN 1 2.916667 49 1 1 0.083333 93 NaN NaN 4.000000 139 1 1 2.083333 [5 rows x 77 columns]
(articulation_dataset[covs].isnull().sum() / articulation_dataset.shape[0]).round(2)
age_years 0.01 school 0.00 score 0.00 male 0.00 family_inv 0.27 sib 0.09 deg_hl_below6 0.21 mother_hs 0.40 synd_or_disab 0.10 jcih 0.54 dtype: float64
articulation_dataset = articulation_dataset[articulation_dataset.age.notnull()
& articulation_dataset.int_outside_option.notnull()]
articulation_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_articulation = MCMC(generate_model(articulation_dataset))
M_articulation.sample(200000, 190000)
[-----------------100%-----------------] 200000 of 200000 complete in 3985.2 sec
Matplot.summary_plot([M_articulation.beta],
custom_labels=labels, rhat=False)
Matplot.summary_plot([M_articulation.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
M_articulation.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -2.578 0.55 0.035 [-3.627 -1.475] 0.209 1.37 0.117 [-2.772 2.703] -4.463 0.738 0.059 [-5.864 -3.023] -1.243 0.752 0.056 [-2.733 0.216] 5.76 1.366 0.117 [ 3.055 8.39 ] 5.955 1.534 0.136 [ 3.237 9.183] -6.041 1.219 0.103 [-8.614 -3.82 ] 4.849 1.556 0.138 [ 1.255 7.454] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -3.649 -2.935 -2.57 -2.217 -1.475 -2.89 -0.555 0.276 1.103 2.633 -5.836 -5.014 -4.493 -3.957 -2.91 -2.725 -1.741 -1.228 -0.735 0.251 3.122 4.775 5.727 6.702 8.511 2.881 4.879 6.02 6.995 8.927 -8.568 -6.824 -5.981 -5.268 -3.713 1.725 3.812 4.79 5.921 8.198
expressive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Expressive Vocabulary') &
(lsl_dr.non_english==False)]
expressive_vocab_dataset.head()
index redcap_event_name academic_year hl male _race \ 0 0 Initial Assessment/Enrollment 2002-2003 0 0 0 30 14 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 50 20 Year 3 Complete (7/1 - 6/30) 2007-2008 0 1 0 76 32 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 79 33 Year 3 Complete (7/1 - 6/30) 2008-2009 0 1 0 prim_lang sib _mother_ed father_ed ... domain \ 0 0 1 6 6 ... Expressive Vocabulary 30 0 0 6 6 ... Expressive Vocabulary 50 0 1 4 4 ... Expressive Vocabulary 76 0 2 5 4 ... Expressive Vocabulary 79 0 2 5 4 ... Expressive Vocabulary school score test_name test_type deg_hl_below6 int_outside_option \ 0 101 60 NaN EOWPVT 0 1 30 101 90 NaN EOWPVT 1 1 50 101 102 NaN EOWPVT 0 0 76 101 89 NaN EOWPVT NaN 0 79 101 89 NaN EOWPVT NaN 0 mother_hs jcih age_years 0 NaN 0 4.333333 30 NaN 1 2.916667 50 1 1 0.083333 76 1 0 2.416667 79 1 0 2.416667 [5 rows x 77 columns]
expressive_vocab_dataset[covs].isnull().sum()
age_years 6 school 0 score 0 male 0 family_inv 241 sib 67 deg_hl_below6 182 mother_hs 354 synd_or_disab 86 jcih 476 dtype: int64
expressive_vocab_dataset = expressive_vocab_dataset[expressive_vocab_dataset.age.notnull()
& expressive_vocab_dataset.male.notnull()
& expressive_vocab_dataset.int_outside_option.notnull()]
expressive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_expressive_vocab = MCMC(generate_model(expressive_vocab_dataset))
M_expressive_vocab.sample(200000, 190000)
[-----------------100%-----------------] 200000 of 200000 complete in 2645.0 sec
Matplot.summary_plot([M_expressive_vocab.beta],
custom_labels=labels, rhat=False)
#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_expressive_vocab.beta], rhat=False,
custom_labels=labels)
M_expressive_vocab.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -2.961 0.511 0.035 [-3.922 -1.983] 0.873 1.168 0.098 [-1.502 3.261] -5.562 0.63 0.045 [-6.599 -4.294] -1.833 0.731 0.056 [-3.223 -0.299] 5.595 1.421 0.127 [ 2.817 8.309] 7.266 1.644 0.152 [ 4.215 10.308] -7.076 1.31 0.113 [-9.646 -4.54 ] 2.393 1.492 0.136 [-0.31 5.313] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -3.921 -3.31 -2.969 -2.59 -1.977 -1.502 0.115 0.828 1.58 3.27 -6.692 -6.004 -5.571 -5.122 -4.325 -3.271 -2.34 -1.844 -1.349 -0.323 2.848 4.582 5.634 6.527 8.377 3.793 6.109 7.327 8.564 10.098 -9.671 -7.925 -7.017 -6.228 -4.547 -0.271 1.381 2.307 3.352 5.487
receptive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Receptive Vocabulary') &
(lsl_dr.non_english==False)]
receptive_vocab_dataset.head()
index redcap_event_name academic_year hl male _race \ 19 10 Year 1 Complete (7/1 - 6/30) 2003-2004 0 0 0 31 14 Year 2 Complete (7/1 - 6/30) 2008-2009 0 1 0 51 20 Year 3 Complete (7/1 - 6/30) 2007-2008 0 1 0 95 43 Year 1 Complete (7/1 - 6/30) NaN 0 0 0 144 70 Year 3 Complete (7/1 - 6/30) 2013-2014 0 1 0 prim_lang sib _mother_ed father_ed ... domain \ 19 0 1 2 2 ... Receptive Vocabulary 31 0 0 6 6 ... Receptive Vocabulary 51 0 1 4 4 ... Receptive Vocabulary 95 0 2 6 6 ... Receptive Vocabulary 144 0 0 5 6 ... Receptive Vocabulary school score test_name test_type deg_hl_below6 int_outside_option \ 19 101 80 NaN PPVT 1 1 31 101 96 NaN PPVT 1 1 51 101 93 NaN PPVT 0 0 95 101 114 NaN PPVT 1 0 144 101 94 NaN PPVT 0 1 mother_hs jcih age_years 19 0 0 3.416667 31 NaN 1 2.916667 51 1 1 0.083333 95 NaN NaN 4.000000 144 1 1 2.083333 [5 rows x 77 columns]
receptive_vocab_dataset[covs].isnull().sum()
age_years 7 school 0 score 0 male 0 family_inv 254 sib 71 deg_hl_below6 186 mother_hs 373 synd_or_disab 92 jcih 495 dtype: int64
receptive_vocab_dataset = receptive_vocab_dataset[receptive_vocab_dataset.age.notnull() &
receptive_vocab_dataset.male.notnull() &
receptive_vocab_dataset.int_outside_option.notnull()]
receptive_vocab_dataset.age_int.hist(bins=40)
<matplotlib.axes._subplots.AxesSubplot at 0x107abe590>
receptive_vocab_dataset.shape
(894, 77)
receptive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
M_receptive_vocab = MCMC(generate_model(receptive_vocab_dataset))
M_receptive_vocab.sample(200000, 190000)
[-----------------100%-----------------] 200000 of 200000 complete in 14702.4 sec
#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_receptive_vocab.beta], rhat=False,
custom_labels=labels)
M_receptive_vocab.beta.summary()
beta: Mean SD MC Error 95% HPD interval ------------------------------------------------------------------ -3.094 0.515 0.036 [-4.077 -2.033] 0.615 1.239 0.11 [-2.098 2.796] -5.179 0.574 0.043 [-6.231 -4.086] -1.701 0.61 0.046 [-2.827 -0.505] 5.038 1.341 0.123 [ 2.507 7.692] 5.874 1.218 0.107 [ 3.402 8.194] -7.876 1.192 0.104 [-10.06 -5.343] 2.687 1.38 0.125 [ 0.393 5.995] Posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| -4.058 -3.443 -3.119 -2.759 -2.008 -1.932 -0.151 0.631 1.407 3.17 -6.188 -5.596 -5.212 -4.774 -4.03 -2.852 -2.122 -1.723 -1.278 -0.515 2.239 4.161 5.019 6.002 7.513 3.542 5.102 5.861 6.649 8.365 -10.59 -8.597 -7.823 -7.058 -5.642 0.22 1.807 2.527 3.452 5.871