Pediatrics Paper Analysis

All studies reporting successful outcomes for early-identified children who are deaf or hard of hearing involved intervention provided by specialists trained in parent-infant intervention services.[12,90,97] Early intervention programs should develop mechanisms to ensure that early intervention professionals have the special skills necessary for providing families with the highest quality of service specific to children with hearing loss. Professionals with a background in deaf education, audiology, and speech-language pathology will typically have the skills needed for providing intervention services. Professionals should be highly qualified in their respective fields and should be skilled communicators who are knowledgeable about, and sensitive to, the importance of enhancing families' strengths and supporting their priorities. Early intervention professionals who understand the principles of adult learning are also more successful in working with parents and other professionals.

TJC requires that providers be competent to provide services.

References:

  1. Yoshinaga-Itano C, Sedey AL, Coulter DK, Mehl AL. Language of early- and later-identified children with hearing loss. Pediatrics. 1998;102:1161–1171
  2. Moeller MP. Early intervention and language development in children who are deaf and hard of hearing. Pediatrics. 2000;106(3):e43. Available at: www.pediatrics.org/cgi/content/full/106/3/e43
  3. Calderon R. Parental involvement in deaf children's education programs as a predictor of child's language, early reading, and social-emotional development. J Deaf Stud Deaf Educ. 2000;5:140–155

The purpose of the study is to examine the relationship between highly qualified providers, age of enrollment in intervention, and speech and language outcomes at 4 years of age in a group of children enrolled in school programs that specialize in the development of listening and spoken language.

  • Does age of enrollment in intervention impact articulation, vocabulary, and language scores? (age_int)
  • Does place of enrollment in intervention impact articulation, vocabulary, and language scores? (age and age_int)
  • Is there an interaction effect between place and age?
  • What other factors contribute to the variance?
      • Parental involvement
      • Degree of hearing loss
      • Mother's education level (mother_ed)
      • Prematurity (premature_weeks)
      • Gender (gender)
      • Language (prim_lang)
      • Number of children in the home (sib)
      • Age when first amplified (age_amp)
      • Additional disability (secondary_diagnosis, known_synd)
In [1]:
# Import modules and set options
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

Import data

In [2]:
lsl_dr = pd.read_csv('lsl_dr.csv', index_col=0)
/usr/local/lib/python3.4/site-packages/pandas/io/parsers.py:1159: DtypeWarning: Columns (31,33,41) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
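The DtypeWarning above can be avoided by declaring column dtypes at read time (or by passing low_memory=False, as the message suggests). A minimal sketch with invented column names, not the actual lsl_dr.csv schema:

```python
import io
import pandas as pd

# Hypothetical two-column CSV; declaring dtypes up front prevents the
# chunked mixed-type inference that triggers DtypeWarning.
csv = io.StringIO("id,score\n1,40\n2,NA\n3,55\n")
df = pd.read_csv(csv, dtype={"id": "int64", "score": "float64"},
                 na_values=["NA"])
```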
In [3]:
lsl_dr.head()
Out[3]:
          redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
0  initial_assessment_arm_1     2005-2006   0     0      0          0    0   
1  initial_assessment_arm_1     2007-2008   0     0      0          0    2   
2  initial_assessment_arm_1           NaN   0   NaN    NaN        NaN  NaN   
3  initial_assessment_arm_1          2014   0   NaN    NaN        NaN  NaN   
4  initial_assessment_arm_1     2008-2009   0     0      0          0    1   

   _mother_ed  father_ed  premature_age         ...           known_synd  \
0           6          6              8         ...                    0   
1           6          6              2         ...                  NaN   
2         NaN        NaN            NaN         ...                    0   
3         NaN        NaN            NaN         ...                    0   
4           2          4              9         ...                  NaN   

   synd_or_disab  race  age_test        domain  school  score  test_name  \
0              1     0       NaN           NaN     NaN    NaN        NaN   
1            NaN     0       NaN           NaN     NaN    NaN        NaN   
2              0   NaN       NaN           NaN     NaN    NaN        NaN   
3              0   NaN       NaN           NaN     NaN    NaN        NaN   
4            NaN     0        62  Articulation     522     40        NaN   

   test_type  academic_year_start  
0        NaN                 2005  
1        NaN                 2007  
2        NaN                  NaN  
3        NaN                 2014  
4    Goldman                 2008  

[5 rows x 73 columns]

Indicator for non-profound hearing loss

In [4]:
lsl_dr['deg_hl_below6'] = lsl_dr.degree_hl<6
lsl_dr.loc[lsl_dr.degree_hl.isnull(), 'deg_hl_below6'] = np.nan
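The two-step pattern above is needed because a comparison against NaN evaluates to False rather than missing; a toy sketch with invented values:

```python
import numpy as np
import pandas as pd

# NaN < 6 is False, not missing, so the comparison silently converts
# missing values to 0; re-mask them explicitly afterwards.
degree = pd.Series([3.0, 6.0, np.nan, 5.0])
below6 = (degree < 6).astype(float)   # 1.0 / 0.0 indicator
below6[degree.isnull()] = np.nan      # restore missingness
```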

Indicator for first intervention outside OPTION

In [5]:
lsl_dr['int_outside_option'] = lsl_dr.age > lsl_dr.age_int
lsl_dr.loc[lsl_dr.age < lsl_dr.age_int, 'int_outside_option'] = np.nan

Indicator for high school graduation of mother

In [6]:
lsl_dr['mother_hs'] = lsl_dr.mother_ed > 1
lsl_dr.loc[lsl_dr.mother_ed.isnull(), 'mother_hs'] = None

JCIH benchmark: identified by 3 months and enrolled in intervention by 6 months

In [7]:
lsl_dr.onset_1.unique()
Out[7]:
array([  21. ,    0. ,    nan,   25. ,    1. ,    3. ,   12. ,   24. ,
          7. ,   46. ,    9. ,    2. ,   15. ,   17. ,   36. ,   18. ,
         28. ,   10. ,   48. ,   29. ,   23. ,   30. ,    6. ,    4. ,
         20. ,   19. ,   11. ,    8. ,   14. ,   31. ,   16. ,   27. ,
         33. ,   22. ,   35. ,    5. ,   13. ,   26. ,   60. ,   32. ,
         44. ,   40. ,    1.5,   52. ,   43. ,   38. ,   34. ,   54. ,
         41. ,   50. ,   51. ,  116. ,   72. ,   42. ,   57. ,   65. ,
         78. ,   56. ,   59. ,   83. ,   61. ,  107. ,   64. ,   74. ,
         37. ,   77. ,   62. ,  119. ,   88. ,   84. ,    0.5,   66. ,
         80. ,   58. ,   53. ,   47. ,   81. ,   39. ,   63. ,  140. ,
         45. ,   67. ,   49. ,    2.5,   86. ,  126. ,   85. ,  133. ,
        103. ,   98. ,   87. ,   76. ,   68. ,   92. ,   97. ,   71. ,
         55. ,   70. ,   75. ,  152. ,   89. ,  154. ])
In [8]:
lsl_dr['ident_by_3'] = ident_by_3 = lsl_dr.onset_1 <= 3
lsl_dr['program_by_6'] = program_by_6 = lsl_dr.age <= 6
lsl_dr['jcih'] = ident_by_3 & program_by_6
lsl_dr.loc[lsl_dr.onset_1.isnull() | lsl_dr.age.isnull(), 'jcih'] = None
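The benchmark flag above combines two indicators and then propagates missingness; a toy sketch with invented values:

```python
import numpy as np
import pandas as pd

# Identified by 3 months AND enrolled by 6 months; rows missing either
# input get a missing flag rather than False.
kids = pd.DataFrame({"onset_1": [2.0, 5.0, np.nan],
                     "age":     [4.0, 4.0, 5.0]})
kids["jcih"] = ((kids.onset_1 <= 3) & (kids.age <= 6)).astype(float)
kids.loc[kids.onset_1.isnull() | kids.age.isnull(), "jcih"] = np.nan
```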

Create age in years variable

In [9]:
lsl_dr['age_years'] = lsl_dr.age/12.

Retain only 4-year-olds

In [10]:
lsl_dr = lsl_dr[(lsl_dr.age_test>=48) & (lsl_dr.age_test<60)]
In [11]:
lsl_dr.shape
Out[11]:
(5847, 80)
In [12]:
lsl_dr.study_id.unique().shape
Out[12]:
(1369,)

Summary Statistics for Paper

In [13]:
unique_students = lsl_dr.drop_duplicates(subset='study_id')
In [14]:
unique_students.shape
Out[14]:
(1369, 80)
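The one-row-per-student frame relies on drop_duplicates keeping the first record per key; a toy sketch with invented ids:

```python
import pandas as pd

# drop_duplicates(subset=...) keeps the first row for each study_id,
# so repeated test records collapse to one row per student.
rows = pd.DataFrame({"study_id": [1, 1, 2, 3, 3],
                     "score":    [90, 95, 80, 70, 75]})
unique = rows.drop_duplicates(subset="study_id")
```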
In [15]:
unique_students.age.describe()
Out[15]:
count    1351.00000
mean       29.73131
std        15.62684
min         1.00000
25%        18.00000
50%        32.00000
75%        40.50000
max        68.00000
Name: age, dtype: float64
In [16]:
unique_students.male.mean()
Out[16]:
0.52126099706744866
In [17]:
unique_students.sib.describe()
Out[17]:
count    1261.000000
mean        1.175258
std         0.930211
min         0.000000
25%         1.000000
50%         1.000000
75%         2.000000
max         3.000000
Name: sib, dtype: float64
In [18]:
(unique_students.degree_hl == 6).describe()
Out[18]:
count         1369
mean     0.4704164
std      0.4993064
min          False
25%              0
50%              0
75%              1
max           True
Name: degree_hl, dtype: object
In [19]:
((unique_students.degree_hl < 6) & (unique_students.degree_hl > 1)).describe()
Out[19]:
count         1369
mean     0.3798393
std       0.485524
min          False
25%              0
50%              0
75%              1
max           True
Name: degree_hl, dtype: object
In [20]:
unique_students.mother_hs.describe()
Out[20]:
count    846.000000
mean       0.569740
std        0.495405
min        0.000000
25%        0.000000
50%        1.000000
75%        1.000000
max        1.000000
Name: mother_hs, dtype: float64
In [21]:
unique_students.synd_or_disab.describe()
Out[21]:
count    1207.000000
mean        0.218724
std         0.413552
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: synd_or_disab, dtype: float64
In [22]:
grouped_age = unique_students.replace({'jcih':{1:'JCIH', 0:'non-JCIH'}}).groupby('jcih')['age_years']
fig, ax = plt.subplots()

for k, v in grouped_age:
    v.hist(label=k, alpha=.75, ax=ax, grid=False)

ax.legend()
plt.xlabel('Age (years)'); plt.ylabel('Frequency');

Receptive Language Test Score Model

In [23]:
# Filter to receptive-language-domain scores from English-only students
receptive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='receptive') &
                                     (lsl_dr.non_english==False)]

receptive_language_dataset.head()
Out[23]:
            redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
24   initial_assessment_arm_1     2007-2008   0     0      0          0    0   
77   initial_assessment_arm_1     2010-2011   0     1      0          0    0   
126  initial_assessment_arm_1     2011-2012   0     1      0          0    2   
139  initial_assessment_arm_1     2011-2012   0     1      0          0    0   
156  initial_assessment_arm_1     2011-2012   0     1      1          0    2   

     _mother_ed  father_ed  premature_age    ...      test_name  test_type  \
24            6          6              8    ...            PLS  receptive   
77            3          1              8    ...            PLS  receptive   
126           6          6              8    ...        CELF-P2  receptive   
139           6          6              3    ...            PLS  receptive   
156           0          6              8    ...            PLS  receptive   

     academic_year_start  deg_hl_below6  int_outside_option  mother_hs  \
24                  2007              0                   1        NaN   
77                  2010              0                   1          0   
126                 2011              0                   1        NaN   
139                 2011              0                   1        NaN   
156                 2011              0                   1          0   

     ident_by_3  program_by_6  jcih  age_years  
24        False         False     0   2.250000  
77        False         False     0   4.166667  
126        True         False     0   4.166667  
139       False         False     0   3.666667  
156        True         False     0   3.333333  

[5 rows x 80 columns]

This is the size of the resulting dataset to be used in this analysis

In [24]:
receptive_language_dataset.shape
Out[24]:
(901, 80)
In [25]:
mean_score = receptive_language_dataset.groupby('study_id')['score'].mean()
In [26]:
receptive_language_dataset = receptive_language_dataset.drop_duplicates('study_id')
In [27]:
receptive_language_dataset = receptive_language_dataset.set_index('study_id')
In [28]:
receptive_language_dataset['mean_score'] = mean_score
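The three cells above compute a per-student mean before deduplicating, then assign it back; pandas aligns the assignment on the study_id index, not on row order. A toy sketch with invented ids:

```python
import pandas as pd

# Mean score per student, computed before deduplication so all records
# contribute, then attached to the one-row-per-student frame by index.
scores = pd.DataFrame({"study_id": ["a", "a", "b"],
                       "score":    [80.0, 100.0, 60.0]})
mean_score = scores.groupby("study_id")["score"].mean()
first = scores.drop_duplicates("study_id").set_index("study_id")
first["mean_score"] = mean_score   # index-aligned assignment
```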

Specify criterion to use: jcih, ident_by_3, program_by_6

In [31]:
criterion_labels = {'jcih': 'JCIH',
                   'ident_by_3': 'Diagnosed by 3 months',
                   'program_by_6': 'Intervention by 6 months'}

criterion = 'ident_by_3'

Proportion of missing values for each variable

In [32]:
covs = ['age_years', 'school', 'score', 'male', 
        'family_inv', 'sib', 'deg_hl_below6', 'mother_hs', 'synd_or_disab'] + [criterion]
(receptive_language_dataset[covs].isnull().sum() / receptive_language_dataset.shape[0]).round(2)
Out[32]:
age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.19
sib              0.07
deg_hl_below6    0.05
mother_hs        0.32
synd_or_disab    0.13
ident_by_3       0.00
dtype: float64
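Dividing the null count by the row count is the same as taking the mean of the null mask, the spelling used later for missing_pct; a toy check:

```python
import numpy as np
import pandas as pd

# One missing value in four: both spellings give the fraction 0.25.
s = pd.Series([1.0, np.nan, 3.0, 4.0])
frac = s.isnull().mean()
```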

Drop individuals with missing age, since they account for <1% of the sample

In [33]:
receptive_language_dataset = receptive_language_dataset[receptive_language_dataset.age.notnull()]
In [34]:
receptive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [35]:
receptive_language_dataset.mother_ed.hist()
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d358f98>

Final analysis dataset size

In [36]:
receptive_language_dataset.shape
Out[36]:
(823, 80)
In [37]:
missing_pct = (receptive_language_dataset[covs].isnull().mean()*100).round(1)
missing_pct.sort(ascending=False)
missing_pct
Out[37]:
mother_hs        32.1
family_inv       19.4
synd_or_disab    13.1
sib               6.3
deg_hl_below6     4.6
ident_by_3        0.0
male              0.0
score             0.0
school            0.0
age_years         0.0
dtype: float64
In [38]:
receptive_language_dataset['age_amp_missing'] = receptive_language_dataset.age_amp.isnull()
by_age_amp_missing = receptive_language_dataset.groupby('age_amp_missing')
In [39]:
by_age_amp_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                            'non_english'].mean()
Out[39]:
                     male  synd_or_disab  deg_hl_below6  mother_hs
age_amp_missing                                                   
False            0.511986       0.238748       0.483304   0.600985
True             0.514644       0.308824       0.458333   0.607843
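The check above groups on a missingness indicator and compares covariate means across the two groups; a toy sketch with invented values:

```python
import numpy as np
import pandas as pd

# Group by whether age_amp is missing and compare the mean of a covariate
# between the missing and non-missing groups.
df = pd.DataFrame({"age_amp": [12.0, np.nan, 6.0, np.nan],
                   "male":    [1, 0, 1, 1]})
by_missing = df.groupby(df.age_amp.isnull())["male"].mean()
```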
In [40]:
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    try:
        _ = by_age_amp_missing.boxplot(column=col)
    except Exception:
        # Some columns cannot be plotted for both groups; skip them
        pass
/usr/local/lib/python3.4/site-packages/pandas/tools/plotting.py:2625: FutureWarning: 
The default value for 'return_type' will change to 'axes' in a future release.
 To use the future behavior now, set return_type='axes'.
 To keep the previous behavior and silence this warning, set return_type='dict'.
  warnings.warn(msg, FutureWarning)
In [41]:
receptive_language_dataset['jcih_missing'] = receptive_language_dataset.jcih.isnull()
by_jcih_missing = receptive_language_dataset.groupby('jcih_missing')
In [42]:
by_jcih_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                            'non_english'].mean()
Out[42]:
                  male  synd_or_disab  deg_hl_below6  mother_hs
jcih_missing                                                   
False         0.517471       0.237094       0.477002   0.610048
True          0.500000       0.317708       0.474747   0.581560
In [43]:
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    _ = by_jcih_missing.boxplot(column=col)
/usr/local/lib/python3.4/site-packages/pandas/tools/plotting.py:2625: FutureWarning: 
The default value for 'return_type' will change to 'axes' in a future release.
 To use the future behavior now, set return_type='axes'.
 To keep the previous behavior and silence this warning, set return_type='dict'.
  warnings.warn(msg, FutureWarning)
In [44]:
receptive_language_dataset.age_years.mean()
Out[44]:
2.3378898339408671
In [45]:
from pymc import Bernoulli, Normal, Uniform, invlogit, logit, deterministic, MCMC, Matplot, MAP
from pymc import Lambda, Dirichlet, Categorical, Beta, NegativeBinomial, Exponential, T
from numpy.ma import masked_values

def generate_model(dataset):
    
    # Extract data to local variables
    age_years = dataset.age_years.values
    school = dataset.school.values
    score = dataset.score.values
    non_english = dataset.non_english.astype(int).values
    male = dataset.male.values
    sib = dataset.sib.values
    synd_or_disab = dataset.synd_or_disab.values
    int_outside_option = dataset.int_outside_option.values
    jcih = dataset.jcih.values
    study_id = dataset.index.values
    
    # Transform some data
    age_std = age_years - age_years.mean()
    
    # Imputation of family involvement
    n_family_inv_cats = len(dataset.family_inv.unique())
    p_family_inv = Dirichlet("p_family_inv", np.ones(n_family_inv_cats), value=[1./n_family_inv_cats]*(n_family_inv_cats-1))
    family_inv_masked = masked_values(dataset.family_inv.fillna(0.5), value=0.5)
    x_family_inv = Categorical('x_family_inv', p_family_inv, value=family_inv_masked, observed=True)
    
    # Imputation of identification criterion
    p_crit = Beta("p_%s" % criterion, 1, 1, value=0.5)
    crit_masked = masked_values(dataset[criterion].values, value=np.nan)
    x_crit = Bernoulli('x_%s' % criterion, p_crit, value=crit_masked, observed=True)
    
    # Imputation of hearing loss
    p_hl = Beta("p_hl", 1, 1, value=0.5)
    hl_masked = masked_values(dataset.deg_hl_below6.values, value=np.nan)
    x_hl = Bernoulli('x_hl', p_hl, value=hl_masked, observed=True)
    
    # Imputation of maternal education
    p_mother_hs = Beta("p_mother_hs", 1, 1, value=0.5)
    mother_hs_masked = masked_values(dataset.mother_hs.values, value=np.nan)
    x_mother_hs = Bernoulli('x_mother_hs', p_mother_hs, value=mother_hs_masked, observed=True)
    
    # Imputation of previous disability
    p_synd_or_disab = Beta("p_synd_or_disab", 1, 1, value=0.5)
    synd_or_disab_masked = masked_values(dataset.synd_or_disab.values, value=np.nan)
    x_synd_or_disab = Bernoulli('x_synd_or_disab', p_synd_or_disab, value=synd_or_disab_masked, observed=True)
    
    # Imputation of siblings
    n_sib_cats = len(dataset.sib.unique())
    p_sib = Dirichlet("p_sib", np.ones(n_sib_cats), value=[1./n_sib_cats]*(n_sib_cats-1))
    sib_masked = masked_values(dataset.sib.fillna(0.5), value=0.5)
    x_sib = Categorical('x_sib', p_sib, value=sib_masked, observed=True)
    
    mean_score = Normal("mean_score", 100, 0.001, value=80)
    
    # Indices to school random effects
    unique_schools = np.unique(school)
    school_index = [list(unique_schools).index(s) for s in school]

    # School random effect
    sigma_school = Uniform("sigma_school", 0, 1000, value=10)
    tau_school = sigma_school**-2
    alpha_school = Normal("alpha_school", mu=0, tau=tau_school, value=np.zeros(len(unique_schools)))
    
    # Indices to child random effects
    unique_child = np.unique(study_id)
    child_index = [list(unique_child).index(s) for s in study_id]
    
    # Robust individual random effect
    nu_child = Exponential("nu_child", 0.1, value=10)
    alpha_child = T("alpha_child", nu=nu_child, value=np.zeros(len(unique_child)))
    
    @deterministic
    def intercept(a0=mean_score, a1=alpha_school):
        """Random intercepts"""
        return a0 + a1[school_index]
    
    # Fixed effects
    beta = Normal("beta", 0, 0.001, value=[0]*8)

    @deterministic
    def theta(b0=intercept, b=beta, x_family_inv=x_family_inv, x_sib=x_sib, 
              x_hl=x_hl, x_mother_hs=x_mother_hs, x_synd_or_disab=x_synd_or_disab, 
              x_crit=x_crit,
             a=alpha_child):
        
        # Complete predictors
        X = [age_std, male]
        # Partially-imputed predictors
        X += [x_family_inv, x_sib, x_hl, x_mother_hs, x_synd_or_disab, x_crit]
        return b0 + np.dot(b, X) + a[child_index]
    
    sigma = Uniform("sigma", 0, 1000, value=10)
    tau = sigma**-2
    
    # Data likelihood
    score_like = Normal("score_like", mu=theta, tau=tau, value=score, observed=True)
    #score_like_pred = Normal("score_like_pred", mu=theta, tau=tau, value=score)
    
    # Predictions:
    # Girl, average age, normal family involvement, 2 siblings, non-profound hearing loss,
    # mother with HS, no previous disability, JCIH
    x1 = [0, 0, 0, 2, 1, 1, 0, 1]
    
    # Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss,
    # mother without HS diploma, previous disability, JCIH
    x2 = [-0.5, 1, 0, 0, 0, 0, 1, 1]
    
    # Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss,
    # mother without HS diploma, previous disability, no JCIH
    x3 = [1.5, 1, 2, 1, 1, 0, 1, 0]
    
    # Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss,
    # mother with HS diploma, no previous disability, JCIH
    x4 = [0.5, 1, 1, 3, 0, 1, 0, 1]
    
    X = np.c_[x1, x2, x3, x4]

    theta_pred = Lambda('theta_pred', lambda b=beta, mu=mean_score: mu + np.dot(X.T, b))
    predictions = Normal('predictions', mu=theta_pred, tau=tau)
    
    
    return locals()
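Each imputation block in generate_model relies on numpy masked arrays: missing entries are filled with a sentinel value, then that sentinel is masked so PyMC treats the masked entries as latent values to impute. A minimal sketch of just the masking step (values invented, 0.5 as the sentinel, as for family_inv above):

```python
import numpy as np
from numpy.ma import masked_values

# Fill missing entries with a sentinel, then mask that sentinel; the
# masked positions become latent variables in the PyMC model.
x = np.array([1.0, 0.5, 0.0, 0.5])   # 0.5 marks "missing"
masked = masked_values(x, value=0.5)
```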
In [46]:
M_reclang = MCMC(generate_model(receptive_language_dataset), db='sqlite', name='reclang', dbmode='w')
In [47]:
iterations = 100000
burn = 90000
In [48]:
M_reclang.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 365.7 sec
In [49]:
labels = ['Age of Enrollment', 'Male', 
                       'Reduced Family Involvement', 'Sibling Count', 'Non-profound Hearing Loss',
                       'Mother with HS Diploma','Additional Disability', 
                      criterion_labels[criterion]]
In [50]:
Matplot.summary_plot([M_reclang.beta], rhat=False,
        custom_labels=labels, x_range=(-10,10), main='Receptive Language')
In [52]:
M_reclang.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.428           0.517            0.03             [-4.475 -2.486]
	-0.081           1.283            0.098            [-2.444  2.36 ]
	-5.33            0.577            0.034            [-6.443 -4.235]
	-0.62            0.662            0.043            [-2.034  0.524]
	7.259            1.301            0.103            [ 4.596  9.55 ]
	5.071            1.323            0.109            [ 2.73   7.842]
	-6.063           1.329            0.106            [-8.643 -3.514]
	2.887            1.271            0.096            [ 0.31   5.279]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.512           -3.761          -3.409         -3.047        -2.495
	-2.427           -1.03           -0.041         0.784         2.461
	-6.436           -5.742          -5.318         -4.93         -4.206
	-1.97            -1.067          -0.602         -0.142        0.617
	4.676            6.444           7.306          8.13          9.703
	2.456            4.186           5.089          5.885         7.668
	-8.623           -6.904          -6.023         -5.169        -3.493
	0.31             2.066           2.898          3.731         5.279
	
In [53]:
Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])

Expressive Language Model

In [54]:
expressive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='expressive') &
                                     (lsl_dr.non_english==False)]

expressive_language_dataset.head()
Out[54]:
            redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
25   initial_assessment_arm_1     2007-2008   0     0      0          0    0   
78   initial_assessment_arm_1     2010-2011   0     1      0          0    0   
127  initial_assessment_arm_1     2011-2012   0     1      0          0    2   
140  initial_assessment_arm_1     2011-2012   0     1      0          0    0   
157  initial_assessment_arm_1     2011-2012   0     1      1          0    2   

     _mother_ed  father_ed  premature_age    ...      test_name   test_type  \
25            6          6              8    ...            PLS  expressive   
78            3          1              8    ...            PLS  expressive   
127           6          6              8    ...        CELF-P2  expressive   
140           6          6              3    ...            PLS  expressive   
157           0          6              8    ...            PLS  expressive   

     academic_year_start  deg_hl_below6  int_outside_option  mother_hs  \
25                  2007              0                   1        NaN   
78                  2010              0                   1          0   
127                 2011              0                   1        NaN   
140                 2011              0                   1        NaN   
157                 2011              0                   1          0   

     ident_by_3  program_by_6  jcih  age_years  
25        False         False     0   2.250000  
78        False         False     0   4.166667  
127        True         False     0   4.166667  
140       False         False     0   3.666667  
157        True         False     0   3.333333  

[5 rows x 80 columns]
In [55]:
mean_score = expressive_language_dataset.groupby('study_id')['score'].mean()
expressive_language_dataset = expressive_language_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_language_dataset['mean_score'] = mean_score
In [56]:
(expressive_language_dataset[covs].isnull().sum() / expressive_language_dataset.shape[0]).round(2)
Out[56]:
age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.19
sib              0.07
deg_hl_below6    0.05
mother_hs        0.32
synd_or_disab    0.14
ident_by_3       0.00
dtype: float64
In [57]:
expressive_language_dataset = expressive_language_dataset[expressive_language_dataset.age.notnull() 
                                                        & expressive_language_dataset.int_outside_option.notnull()]
In [58]:
expressive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [59]:
M_explang = MCMC(generate_model(expressive_language_dataset), db='sqlite', name='explang', dbmode='w')
In [60]:
M_explang.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 370.3 sec
In [61]:
M_explang.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.754           0.517            0.026            [-4.78  -2.784]
	-0.515           1.266            0.097            [-2.813  2.076]
	-4.479           0.64             0.04             [-5.71  -3.163]
	-0.738           0.645            0.039            [-2.021  0.469]
	8.289            1.216            0.091          [  6.023  10.813]
	7.108            1.332            0.105            [ 4.38   9.771]
	-6.047           1.36             0.104            [-8.309 -3.109]
	4.425            1.439            0.119            [ 1.759  7.434]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.766           -4.112          -3.755         -3.397        -2.746
	-2.834           -1.372          -0.626         0.36          2.066
	-5.802           -4.887          -4.468         -4.083        -3.208
	-2.067           -1.168          -0.708         -0.301        0.461
	5.807            7.501           8.288          9.149         10.611
	4.236            6.336           7.139          7.956         9.728
	-8.518           -7.026          -6.126         -5.147        -3.285
	1.763            3.468           4.351          5.366         7.444
	
In [62]:
Matplot.summary_plot([M_explang.beta], custom_labels=labels, rhat=False, main='Expressive Language', x_range=(-10,10))

Making predictions

Student 1 Girl, average age, normal family involvement, 2 siblings, non-profound hearing loss, mother with HS diploma, no previous disability, JCIH

Student 2 Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss, mother without HS diploma, previous disability, JCIH

Student 3 Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss, mother without HS diploma, previous disability, no JCIH

Student 4 Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss, mother with HS diploma, no previous disability, JCIH

In [63]:
M_reclang.predictions.summary()
predictions:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	101.115          16.217           0.197        [  69.935  133.267]
	86.021           16.319           0.247        [  54.886  117.944]
	72.165           16.292           0.224        [  39.335  103.286]
	86.637           16.299           0.222        [  54.807  117.918]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	69.967           90.362          101.032        111.894       133.345
	54.939           74.795          85.932         97.033        118.039
	40.229           61.379          72.064         83.122        104.486
	55.07            75.482          86.847         97.641        118.27
	
In [64]:
Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
In [65]:
M_explang.predictions.summary()
predictions:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	97.373           16.974           0.225        [  65.488  131.935]
	79.11            16.863           0.284        [  47.281  113.033]
	65.741           16.895           0.27           [ 32.002  97.796]
	81.518           16.802           0.233        [  50.735  116.322]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	63.681           85.855          97.487         108.804       130.158
	46.146           67.758          79.102         90.373        112.017
	32.47            54.364          65.759         77.228        98.611
	48.715           69.992          81.759         92.8          114.343
	
In [66]:
Matplot.summary_plot([M_explang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])

Articulation Model

In [67]:
articulation_dataset = lsl_dr[(lsl_dr.domain=='Articulation') & 
                                     (lsl_dr.non_english==False)]
articulation_dataset.head()
Out[67]:
            redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
15   initial_assessment_arm_1     2011-2012   1     1      0          0  NaN   
21   initial_assessment_arm_1     2007-2008   0     0      0          0    0   
26   initial_assessment_arm_1     2011-2012   0     0      0          0    2   
123  initial_assessment_arm_1     2011-2012   0     1      0          0    2   
136  initial_assessment_arm_1     2011-2012   0     1      0          0    0   

     _mother_ed  father_ed  premature_age    ...      test_name  test_type  \
15            6          6              9    ...            NaN    Goldman   
21            6          6              8    ...            NaN    Goldman   
26            6          6              9    ...            NaN    Goldman   
123           6          6              8    ...            NaN    Goldman   
136           6          6              3    ...            NaN    Goldman   

     academic_year_start  deg_hl_below6  int_outside_option  mother_hs  \
15                  2011            NaN                   0        NaN   
21                  2007              0                   1        NaN   
26                  2011              1                   1        NaN   
123                 2011              0                   1        NaN   
136                 2011              0                   1        NaN   

     ident_by_3  program_by_6  jcih  age_years  
15        False         False   NaN   4.166667  
21        False         False     0   2.250000  
26         True         False     0   4.833333  
123        True         False     0   4.166667  
136       False         False     0   3.666667  

[5 rows x 80 columns]
In [68]:
mean_score = articulation_dataset.groupby('study_id')['score'].mean()
articulation_dataset = articulation_dataset.drop_duplicates('study_id').set_index('study_id')
articulation_dataset['mean_score'] = mean_score
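The three lines above collapse repeated test records into one row per student: average the scores by `study_id`, keep each student's first row, then attach the per-student mean. A minimal sketch of the same pattern on hypothetical toy data:

```python
import pandas as pd

# Hypothetical data: student 'a' was tested twice, student 'b' once.
scores = pd.DataFrame({
    'study_id': ['a', 'a', 'b'],
    'score':    [80, 100, 90],
    'male':     [1, 1, 0],
})

# Average repeated scores per student.
mean_score = scores.groupby('study_id')['score'].mean()

# Keep one row per student; mean_score aligns on the study_id index.
collapsed = scores.drop_duplicates('study_id').set_index('study_id')
collapsed['mean_score'] = mean_score

print(collapsed['mean_score'].to_dict())  # {'a': 90.0, 'b': 90.0}
```

Note that `drop_duplicates` keeps the first record's covariates (here `score` 80 for student 'a'), while `mean_score` carries the average across all of that student's records.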
In [69]:
(articulation_dataset[covs].isnull().sum() / articulation_dataset.shape[0]).round(2)
Out[69]:
age_years        0.00
school           0.00
score            0.00
male             0.00
family_inv       0.22
sib              0.07
deg_hl_below6    0.13
mother_hs        0.36
synd_or_disab    0.12
ident_by_3       0.00
dtype: float64
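The cell above reports, for each covariate, the fraction of students with a missing value (e.g. `mother_hs` is missing for 36% of the articulation sample). A small sketch of the same missingness check on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'x' is missing in 2 of 4 rows, 'y' is complete.
toy = pd.DataFrame({'x': [1.0, np.nan, 3.0, np.nan],
                    'y': [1, 2, 3, 4]})

# Fraction of missing values per column, rounded as in the cell above.
missing_frac = (toy.isnull().sum() / toy.shape[0]).round(2)
print(missing_frac['x'], missing_frac['y'])  # 0.5 0.0
```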
In [70]:
articulation_dataset = articulation_dataset[articulation_dataset.age.notnull()
                                            & articulation_dataset.int_outside_option.notnull()]
In [71]:
articulation_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [72]:
M_articulation = MCMC(generate_model(articulation_dataset), db='sqlite', name='articulation', dbmode='w')
In [73]:
M_articulation.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 372.0 sec
In [74]:
Matplot.summary_plot([M_articulation.beta], 
        custom_labels=labels, rhat=False, main='Articulation', x_range=(-10,10))
In [75]:
Matplot.summary_plot([M_articulation.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
In [76]:
M_articulation.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.055           0.564            0.032            [-4.118 -1.936]
	1.027            1.38             0.101            [-1.463  3.706]
	-4.268           0.651            0.039            [-5.461 -2.953]
	0.009            0.713            0.04             [-1.475  1.327]
	6.518            1.331            0.104            [ 4.096  9.31 ]
	5.219            1.631            0.134            [ 2.221  8.196]
	-7.6             1.416            0.109          [-10.425  -4.826]
	2.372            1.489            0.114            [-0.427  5.113]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.138           -3.437          -3.061         -2.679        -1.949
	-1.463           0.065           1.032          1.916         3.73
	-5.51            -4.69           -4.267         -3.836        -2.966
	-1.394           -0.459          0.013          0.474         1.448
	3.845            5.591           6.518          7.404         9.134
	2.16             4.011           5.228          6.365         8.182
	-10.545          -8.55           -7.533         -6.612        -4.838
	-0.363           1.293           2.367          3.432         5.299
	

Expressive Vocabulary Model

In [77]:
expressive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Expressive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
expressive_vocab_dataset.head()
Out[77]:
           redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
16  initial_assessment_arm_1     2011-2012   1     1      0          0  NaN   
22  initial_assessment_arm_1     2007-2008   0     0      0          0    0   
27  initial_assessment_arm_1     2011-2012   0     0      0          0    2   
51  initial_assessment_arm_1     2012-2013   0     0      0          0    2   
75  initial_assessment_arm_1     2010-2011   0     1      0          0    0   

    _mother_ed  father_ed  premature_age    ...      test_name  test_type  \
16           6          6              9    ...            NaN        EVT   
22           6          6              8    ...            NaN     EOWPVT   
27           6          6              9    ...            NaN        EVT   
51           6          6              8    ...            NaN        EVT   
75           3          1              8    ...            NaN        EVT   

    academic_year_start  deg_hl_below6  int_outside_option  mother_hs  \
16                 2011            NaN                   0        NaN   
22                 2007              0                   1        NaN   
27                 2011              1                   1        NaN   
51                 2012              1                   1        NaN   
75                 2010              0                   1          0   

    ident_by_3  program_by_6  jcih  age_years  
16       False         False   NaN   4.166667  
22       False         False     0   2.250000  
27        True         False     0   4.833333  
51       False         False     0   4.000000  
75       False         False     0   4.166667  

[5 rows x 80 columns]
In [78]:
mean_score = expressive_vocab_dataset.groupby('study_id')['score'].mean()
expressive_vocab_dataset = expressive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_vocab_dataset['mean_score'] = mean_score
In [79]:
expressive_vocab_dataset[covs].isnull().sum()
Out[79]:
age_years          6
school             0
score              0
male               0
family_inv       219
sib               70
deg_hl_below6    123
mother_hs        351
synd_or_disab    115
ident_by_3         0
dtype: int64
In [80]:
expressive_vocab_dataset = expressive_vocab_dataset[expressive_vocab_dataset.age.notnull() 
                                                    & expressive_vocab_dataset.male.notnull()
                                                    & expressive_vocab_dataset.int_outside_option.notnull()]
In [81]:
expressive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [82]:
M_expressive_vocab = MCMC(generate_model(expressive_vocab_dataset), db='sqlite', name='expressive_vocab', dbmode='w')
In [83]:
M_expressive_vocab.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 408.6 sec
In [84]:
Matplot.summary_plot([M_expressive_vocab.beta], 
        custom_labels=labels, rhat=False, main='Expressive Vocabulary', x_range=(-10,10))
In [85]:
M_expressive_vocab.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.21            0.521            0.029            [-4.197 -2.212]
	1.747            1.092            0.082            [-0.097  3.974]
	-5.124           0.658            0.046            [-6.45  -3.894]
	-1.359           0.656            0.042            [-2.706 -0.152]
	6.911            1.242            0.098            [ 4.565  9.378]
	6.528            1.472            0.127            [ 3.497  9.217]
	-5.709           1.259            0.098            [-8.155 -3.362]
	3.273            1.211            0.088            [ 0.916  5.556]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.227           -3.587          -3.196         -2.85         -2.226
	-0.275           0.963           1.748          2.506         3.886
	-6.366           -5.577          -5.145         -4.67         -3.77
	-2.648           -1.802          -1.356         -0.911        -0.086
	4.595            6.016           6.874          7.759         9.418
	3.331            5.593           6.634          7.529         9.152
	-8.225           -6.587          -5.64          -4.841        -3.37
	0.916            2.438           3.27           4.133         5.588
	
In [86]:
Matplot.summary_plot(M_explang.alpha_school, custom_labels=[' ']*42, main='Expressive language school effects', 
                     x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [87]:
Matplot.summary_plot(M_expressive_vocab.alpha_school, custom_labels=[' ']*42, main='Expressive vocabulary school effects',
                     x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.

Receptive Vocabulary Model

In [88]:
receptive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Receptive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
receptive_vocab_dataset.head()
Out[88]:
           redcap_event_name academic_year  hl  male  _race  prim_lang  sib  \
17  initial_assessment_arm_1     2011-2012   1     1      0          0  NaN   
23  initial_assessment_arm_1     2007-2008   0     0      0          0    0   
28  initial_assessment_arm_1     2011-2012   0     0      0          0    2   
52  initial_assessment_arm_1     2012-2013   0     0      0          0    2   
76  initial_assessment_arm_1     2010-2011   0     1      0          0    0   

    _mother_ed  father_ed  premature_age    ...      test_name  test_type  \
17           6          6              9    ...            NaN       PPVT   
23           6          6              8    ...            NaN       PPVT   
28           6          6              9    ...            NaN       PPVT   
52           6          6              8    ...            NaN       PPVT   
76           3          1              8    ...            NaN       PPVT   

    academic_year_start  deg_hl_below6  int_outside_option  mother_hs  \
17                 2011            NaN                   0        NaN   
23                 2007              0                   1        NaN   
28                 2011              1                   1        NaN   
52                 2012              1                   1        NaN   
76                 2010              0                   1          0   

    ident_by_3  program_by_6  jcih  age_years  
17       False         False   NaN   4.166667  
23       False         False     0   2.250000  
28        True         False     0   4.833333  
52       False         False     0   4.000000  
76       False         False     0   4.166667  

[5 rows x 80 columns]
In [89]:
mean_score = receptive_vocab_dataset.groupby('study_id')['score'].mean()
receptive_vocab_dataset = receptive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
receptive_vocab_dataset['mean_score'] = mean_score
In [90]:
receptive_vocab_dataset[covs].isnull().sum()
Out[90]:
age_years          5
school             0
score              0
male               0
family_inv       224
sib               69
deg_hl_below6    126
mother_hs        357
synd_or_disab    115
ident_by_3         0
dtype: int64
In [91]:
receptive_vocab_dataset = receptive_vocab_dataset[receptive_vocab_dataset.age.notnull() & 
                                                  receptive_vocab_dataset.male.notnull() &
                                                  receptive_vocab_dataset.int_outside_option.notnull()]
In [92]:
receptive_vocab_dataset.age_int.hist(bins=40)
Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x10ef79860>
In [93]:
receptive_vocab_dataset.shape
Out[93]:
(961, 80)
In [94]:
receptive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [95]:
M_receptive_vocab = MCMC(generate_model(receptive_vocab_dataset), db='sqlite', name='receptive_vocab', dbmode='w')
In [96]:
M_receptive_vocab.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 436.2 sec
In [97]:
#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_receptive_vocab.beta], rhat=False,
        custom_labels=labels, main='Receptive Vocabulary', x_range=(-10,10))
In [98]:
M_receptive_vocab.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.27            0.475            0.025            [-4.204 -2.341]
	0.595            1.053            0.076            [-1.559  2.641]
	-4.967           0.565            0.037            [-6.055 -3.866]
	-1.312           0.664            0.042            [-2.55   0.018]
	5.93             1.118            0.084            [ 3.613  8.03 ]
	4.919            1.445            0.124            [ 2.007  7.778]
	-6.64            1.166            0.089            [-8.751 -4.289]
	1.876            1.324            0.109            [-0.558  4.444]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.157           -3.584          -3.291         -2.954        -2.258
	-1.662           -0.002          0.635          1.281         2.601
	-6.038           -5.338          -4.976         -4.594        -3.829
	-2.559           -1.753          -1.349         -0.877        0.018
	3.646            5.177           5.998          6.674         8.117
	2.079            3.977           4.878          5.816         7.97
	-8.965           -7.455          -6.645         -5.789        -4.463
	-0.731           0.913           1.868          2.835         4.364
	

Database queries

In [99]:
import sqlite3
In [100]:
db_files = !ls *sqlite
In [101]:
articulation = sqlite3.connect("articulation.sqlite")
In [102]:
c_artic = articulation.cursor()
In [103]:
c_artic.execute('SELECT v1 FROM sigma_school WHERE trace==0')
Out[103]:
<sqlite3.Cursor at 0x114568110>
In [104]:
sigma_school_artic = np.ravel(c_artic.fetchall())
In [105]:
sigma_school_artic.mean()
Out[105]:
5.7395983710999996
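The query above works because PyMC's sqlite backend stores one row per posterior sample, with the sampled value in `v1` and the chain index in `trace`; selecting `trace==0` pulls the first chain. A self-contained sketch against a hypothetical in-memory table with the same layout:

```python
import sqlite3
import numpy as np

# Hypothetical mini-version of the sigma_school table PyMC writes:
# one row per sample, v1 = sampled value, trace = chain index.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE sigma_school (trace INTEGER, v1 REAL)')
cur.executemany('INSERT INTO sigma_school VALUES (?, ?)',
                [(0, 5.0), (0, 6.0), (0, 7.0), (1, 99.0)])

# Pull only chain 0 and flatten the list of 1-tuples into an array.
cur.execute('SELECT v1 FROM sigma_school WHERE trace==0')
trace = np.ravel(cur.fetchall())
print(trace.mean())  # 6.0
```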
In [106]:
for dbname in db_files:
    db = sqlite3.connect(dbname)
    cur = db.cursor()
    cur.execute('SELECT v1 FROM sigma_school WHERE trace==0')
    trace = np.ravel(cur.fetchall())
    trace.sort()
    print('\n'+dbname)
    print(np.median(trace))
    print(trace[[int(len(trace)*0.025), int(len(trace)*0.975)]])
articulation.sqlite
5.606214
[ 3.668585  8.544194]

explang.sqlite
5.3168805
[ 3.607359  7.595157]

expressive_vocab.sqlite
8.525305
[  6.582717  11.287013]

receptive_vocab.sqlite
6.660398
[ 5.19439   8.751019]

reclang.sqlite
5.473053
[ 4.032111  7.406457]
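The loop above summarizes each model's between-school standard deviation by sorting the trace and indexing at the 2.5% and 97.5% positions to get an approximate 95% credible interval. A sketch on a synthetic trace, showing that this manual indexing agrees closely with `np.percentile`:

```python
import numpy as np

# Synthetic "trace" of 1000 sorted samples.
trace = np.arange(1000, dtype=float)
trace.sort()

# Manual 95% interval, as in the loop above.
lo, hi = trace[[int(len(trace) * 0.025), int(len(trace) * 0.975)]]
print(lo, hi)  # 25.0 975.0

# np.percentile gives (nearly) the same interval without manual indexing.
lo2, hi2 = np.percentile(trace, [2.5, 97.5])
```

The two differ only by interpolation at the cut points, which is negligible for traces of this length.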
In [107]:
expressive_vocab_data = lsl_dr[lsl_dr.domain=='Expressive Vocabulary']
In [108]:
expressive_vocab_data.score.hist()
Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x1146446d8>