Pediatrics Paper Analysis¶

All studies with successful outcomes reported for early-identified children who are deaf or hard of hearing have intervention provided by specialists who are trained in parent-infant intervention services.[12,90,97] Early intervention programs should develop mechanisms to ensure that early intervention professionals have special skills necessary for providing families with the highest quality of service specific to children with hearing loss. Professionals with a background in deaf education, audiology, and speech-language pathology will typically have the skills needed for providing intervention services. Professionals should be highly qualified in their respective fields and should be skilled communicators who are knowledgeable and sensitive to the importance of enhancing families' strengths and supporting their priorities. When early intervention professionals have knowledge of the principles of adult learning, it increases their success with parents and other professionals.

TJC must be competent to provide services.

References:

Yoshinaga-Itano C, Sedey AL, Coulter DK, Mehl AL. Language of early- and later-identified children with hearing loss. Pediatrics.1998;102 :1161– 1171
Moeller MP. Early intervention and language development in children who are deaf and hard of hearing. Pediatrics.2000;106(3) :e43 . Available at: www.pediatrics.org/cgi/content/full/106/3/e43
Calderon R. Parental involvement in deaf children's education programs as a predictor of child's language, early reading, and social-emotional development. J Deaf Stud Deaf Educ.2000;5 :140– 155

Purpose of the study is to examine the relationship between highly qualified providers, age of enrollment in intervention, and speech and language outcomes at 4 years of age in a group of children enrolled in school programs that specialize in the development of listening and spoken language.

Does age of enrollment in intervention impact articulation, vocabulary, and language scores? (age_int)
Does place of enrollment in intervention impact articulation, vocabulary, and language scores? (age and age_int)
Is there an interaction effect between the place and age?
What other factors contribute to the variance?
Parental involvement
Degree of hearing loss
Mother’s education level (mother_ed)
Prematurity (premature_weeks)
Gender (gender)
Language (prim_lang)
Number of children in the home (sib)
Age when first amplified (age_amp)
Additional disability (secondary_diagnosis, known_synd)

In [1]:

# Import modules and set options
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
pd.set_option('display.notebook_repr_html', False)

pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

Import data

In [2]:

lsl_dr = pd.read_csv('lsl_dr.csv', index_col=0)

/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (34,42) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)

In [3]:

lsl_dr.head()

Out[3]:

   index               redcap_event_name academic_year  hl  male  _race  \
0      0  Initial Assessment/Enrollment      2002-2003   0     0      0   
1      0  Initial Assessment/Enrollment      2002-2003   0     0      0   
2      1    Year 1 Complete (7/1 - 6/30)     2003-2004   0     0      0   
3      2    Year 2 Complete (7/1 - 6/30)     2004-2005   0     0      0   
4      2    Year 2 Complete (7/1 - 6/30)     2004-2005   0     0      0   

   prim_lang  sib  _mother_ed  father_ed    ...      age_diag  known_synd  \
0          0    1           6          6    ...            15           0   
1          0    1           6          6    ...            15           0   
2          0    1           6          6    ...            15           0   
3          0    1           6          6    ...            15           0   
4          0    1           6          6    ...            15           0   

   synd_or_disab  race  age_test                 domain  school  score  \
0              0     0        54  Expressive Vocabulary     101     60   
1              0     0        54               Language     101     58   
2              0     0        80           Articulation     101     78   
3              0     0        80  Expressive Vocabulary     101     84   
4              0     0        80   Receptive Vocabulary     101     90   

   test_name   test_type  
0        NaN      EOWPVT  
1        PLS  expressive  
2        NaN     Goldman  
3        NaN      EOWPVT  
4        NaN        PPVT  

[5 rows x 72 columns]

Indicator for non-profound hearing loss

In [4]:

lsl_dr['deg_hl_below6'] = lsl_dr.degree_hl<6
lsl_dr.deg_hl_below6[lsl_dr.degree_hl.isnull()] = np.nan

Indicator for first intervention outside OPTION

In [5]:

lsl_dr['int_outside_option'] = lsl_dr.age > lsl_dr.age_int
lsl_dr.int_outside_option[lsl_dr.age < lsl_dr.age_int] = np.nan

Indicator for high school graduation of mother

In [6]:

lsl_dr['mother_hs'] = lsl_dr.mother_ed > 1
lsl_dr.mother_hs[lsl_dr.mother_ed.isnull()] = None

Joint commission benchmark

In [7]:

lsl_dr.onset_1.unique()

Out[7]:

array([  15. ,   32. ,    2. ,    nan,    0. ,    4. ,    1. ,   24. ,
          3. ,    7. ,   13. ,   10. ,   42. ,   34. ,   12. ,    1.5,
         22. ,   21. ,   23. ,    5. ,   28. ,   29. ,   51. ,   44. ,
         16. ,   14. ,   35. ,   18. ,   41. ,   33. ,   37. ,    6. ,
         46. ,   40. ,   47. ,   31. ,   30. ,   48. ,   11. ,   20. ,
         26. ,   36. ,   25. ,   19. ,    9. ,   39. ,   86. ,   17. ,
         38. ,   27. ,    8. ,   64. ,   58. ,   70. ,   49. ,   55. ,
         60. ,   50. ,  116. ,   72. ,   57. ,   43. ,   65. ,   78. ,
         83. ,   61. ,  107. ,   74. ,   77. ,   62. ,   80. ,   88. ,
         59. ,   53. ,   67. ,    2.5,   52. ,   97. ,   92. ,   69. ,
        103. ,  133. ,  126. ,  168. ,  140. ,   68. ,  120. ,   56. ])

In [8]:

ident_by_3 = lsl_dr.onset_1 <= 3
program_by_6 = lsl_dr.age <= 6
lsl_dr['jcih'] = ident_by_3 | program_by_6
lsl_dr.jcih[lsl_dr.onset_1.isnull() | lsl_dr.age.isnull()] = None

Create age in years variable

In [9]:

lsl_dr['age_years'] = lsl_dr.age/12.

Retain only 4-year-olds

In [10]:

lsl_dr = lsl_dr[(lsl_dr.age_test>=48) & (lsl_dr.age_test<60)]

In [11]:

lsl_dr.shape

Out[11]:

(5076, 77)

In [12]:

lsl_dr.study_id.unique().shape

Out[12]:

(1203,)

Receptive Language Test Score Model¶

In [13]:

# Filter domain and English-only, and appropriate bilateral subset
receptive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='receptive') &
                                     (lsl_dr.non_english==False)]

receptive_language_dataset.head()

Out[13]:

    index             redcap_event_name academic_year  hl  male  _race  \
20     10  Year 1 Complete (7/1 - 6/30)     2003-2004   0     0      0   
32     14  Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
52     20  Year 3 Complete (7/1 - 6/30)     2007-2008   0     1      0   
77     32  Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
80     33  Year 3 Complete (7/1 - 6/30)     2008-2009   0     1      0   

    prim_lang  sib  _mother_ed  father_ed    ...        domain  school  score  \
20          0    1           2          2    ...      Language     101     77   
32          0    0           6          6    ...      Language     101     85   
52          0    1           4          4    ...      Language     101     86   
77          0    2           5          4    ...      Language     101     92   
80          0    2           5          4    ...      Language     101     92   

    test_name  test_type  deg_hl_below6  int_outside_option  mother_hs  jcih  \
20        PLS  receptive              1                   1          0     0   
32    CELF-P2  receptive              1                   1        NaN     1   
52    CELF-P2  receptive              0                   0          1     1   
77    CELF-P2  receptive            NaN                   0          1     0   
80    CELF-P2  receptive            NaN                   0          1     0   

    age_years  
20   3.416667  
32   2.916667  
52   0.083333  
77   2.416667  
80   2.416667  

[5 rows x 77 columns]

This is the size of the resulting dataset to be used in this analysis

In [14]:

receptive_language_dataset.shape

Out[14]:

(749, 77)

Proportion of missing values for each variable

In [15]:

covs = ['age_years', 'school', 'score', 'male', 
        'family_inv', 'sib', 'deg_hl_below6', 'mother_hs', 'synd_or_disab', 'jcih']
(receptive_language_dataset[covs].isnull().sum() / receptive_language_dataset.shape[0]).round(2)

Out[15]:

age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.24
sib              0.08
deg_hl_below6    0.09
mother_hs        0.34
synd_or_disab    0.12
jcih             0.44
dtype: float64

Drop individuals with missing age, since there is <1% of them

In [16]:

receptive_language_dataset = receptive_language_dataset[receptive_language_dataset.age.notnull()]

In [17]:

receptive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');

In [18]:

receptive_language_dataset.mother_ed.hist()

Out[18]:

<matplotlib.axes._subplots.AxesSubplot at 0x10724e0d0>

Final analysis dataset size

In [19]:

receptive_language_dataset.shape

Out[19]:

(743, 77)

In [20]:

receptive_language_dataset[covs].isnull().sum()

Out[20]:

age_years          0
school             0
score              0
male               0
family_inv       180
sib               57
deg_hl_below6     62
mother_hs        246
synd_or_disab     88
jcih             324
dtype: int64

In [21]:

receptive_language_dataset['age_amp_missing'] = receptive_language_dataset.age_amp.isnull()
by_age_amp_missing = receptive_language_dataset.groupby('age_amp_missing')

In [22]:

by_age_amp_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                            'non_english'].mean()

Out[22]:

                     male  synd_or_disab  deg_hl_below6  mother_hs
age_amp_missing                                                   
False            0.495146       0.235955       0.466667   0.588652
True             0.531722       0.274247       0.487973   0.651163

In [23]:

for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    try:
        _ = by_age_amp_missing.boxplot(column=col)
    except:
        pass

/usr/local/lib/python2.7/site-packages/pandas/tools/plotting.py:2385: FutureWarning: 
The default value for 'return_type' will change to 'axes' in a future release.
 To use the future behavior now, set return_type='axes'.
 To keep the previous behavior and silence this warning, set return_type='dict'.
  warnings.warn(msg, FutureWarning)

In [24]:

receptive_language_dataset['jcih_missing'] = receptive_language_dataset.jcih.isnull()
by_jcih_missing = receptive_language_dataset.groupby('jcih_missing')

In [25]:

by_jcih_missing['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                            'non_english'].mean()

Out[25]:

                  male  synd_or_disab  deg_hl_below6  mother_hs
jcih_missing                                                   
False         0.498807       0.239669       0.468514   0.582456
True          0.527778       0.270548       0.485915   0.660377

In [26]:

for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    _ = by_jcih_missing.boxplot(column=col)

In [27]:

from pymc import Bernoulli, Normal, Uniform, invlogit, logit, deterministic, MCMC, Matplot, MAP
from pymc import Lambda, Dirichlet, Categorical, Beta, NegativeBinomial, Exponential
from numpy.ma import masked_values

def generate_model(dataset):
    
    # Extract data to local variables
    age_years = dataset.age_years.values
    school = dataset.school.values
    study_id = dataset.study_id.values
    score = dataset.score.values
    non_english = dataset.non_english.astype(int).values
    male = dataset.male.values
    sib = dataset.sib.values
    synd_or_disab = dataset.synd_or_disab.values
    int_outside_option = dataset.int_outside_option.values
    synd_or_disab = dataset.synd_or_disab.values
    jcih = dataset.jcih.values
    
    # Transform some data
    age_std = age_years - age_years.mean()
    
    # Imputation of family involvement
    n_family_inv_cats = len(dataset.family_inv.unique())
    p_family_inv = Dirichlet("p_family_inv", np.ones(n_family_inv_cats), value=[1./n_family_inv_cats]*(n_family_inv_cats-1))
    family_inv_masked = masked_values(dataset.family_inv.fillna(0.5), value=0.5)
    x_family_inv = Categorical('x_family_inv', p_family_inv, value=family_inv_masked, observed=True)
    
    # Imputation of hearing loss
    p_jcih = Beta("p_jcih", 1, 1, value=0.5)
    jcih_masked = masked_values(dataset.jcih.values, value=np.nan)
    x_jcih = Bernoulli('x_jcih', p_jcih, value=jcih_masked, observed=True)
    
    # Imputation of hearing loss
    p_hl = Beta("p_hl", 1, 1, value=0.5)
    hl_masked = masked_values(dataset.deg_hl_below6.values, value=np.nan)
    x_hl = Bernoulli('x_hl', p_hl, value=hl_masked, observed=True)
    
    # Imputation of maternal education
    p_mother_hs = Beta("p_mother_hs", 1, 1, value=0.5)
    mother_hs_masked = masked_values(dataset.mother_hs.values, value=np.nan)
    x_mother_hs = Bernoulli('x_mother_hs', p_mother_hs, value=mother_hs_masked, observed=True)
    
    # Imputation of previous disability
    p_synd_or_disab = Beta("p_synd_or_disab", 1, 1, value=0.5)
    synd_or_disab_masked = masked_values(dataset.synd_or_disab.values, value=np.nan)
    x_synd_or_disab = Bernoulli('x_synd_or_disab', p_synd_or_disab, value=synd_or_disab_masked, observed=True)
    
    # Imputation of age at intervention
#     mu_ai = Uniform("mu_ai", 0, 1000, value=10)
#     a_ai = Exponential("a_ai", 1, value=1)
#     age_int_masked = masked_values(dataset.age_int.fillna(0.5), value=0.5)
#     x_age_int = NegativeBinomial('x_age_int', mu_ai, a_ai, value=age_int_masked, observed=True)
    
    # Imputation of siblings
    n_sib_cats = len(dataset.sib.unique())
    p_sib = Dirichlet("p_sib", np.ones(n_sib_cats), value=[1./n_sib_cats]*(n_sib_cats-1))
    sib_masked = masked_values(dataset.sib.fillna(0.5), value=0.5)
    x_sib = Categorical('x_sib', p_sib, value=sib_masked, observed=True)
    
    mean_score = Normal("mean_score", 100, 0.001, value=80)
    
    # Indices to school random effects
    unique_schools = np.unique(school)
    school_index = [list(unique_schools).index(s) for s in school]

    # School random effect
    sigma_school = Uniform("sigma_school", 0, 1000, value=10)
    tau_school = sigma_school**-2
    alpha_school = Normal("alpha_school", mu=0, tau=tau_school, value=np.zeros(len(unique_schools)))
    
    # Indices to child random effects
    unique_child = np.unique(study_id)
    child_index = [list(unique_child).index(s) for s in study_id]
    
    # Individual random effect
    sigma_child = Uniform("sigma_child", 0, 1000, value=10)
    tau_child = sigma_child**-2    
    alpha_child = Normal("alpha_child", mu=0, tau=tau_child, value=np.zeros(len(unique_child)))
    
    @deterministic
    def intercept(a0=mean_score, a1=alpha_school, a2=alpha_child):
        """Random intercepts"""
        return a0 + a1[school_index] + a2[child_index]
    
    # Fixed effects
    beta = Normal("beta", 0, 0.001, value=[0]*8)

    @deterministic
    def theta(b0=intercept, b=beta, x_family_inv=x_family_inv, x_sib=x_sib, 
              x_hl=x_hl, x_mother_hs=x_mother_hs, x_synd_or_disab=x_synd_or_disab, x_jcih=x_jcih):
        
        # Complete predictors
        X = [age_std, male]
        # Partially-imputed predictors
        X += [x_family_inv, x_sib, x_hl, x_mother_hs, x_synd_or_disab, x_jcih]
        return b0 + np.dot(b, X)
    
    sigma = Uniform("sigma", 0, 1000, value=10)
    tau = sigma**-2
    
    # Data likelihood
    score_like = Normal("score_like", mu=theta, tau=tau, value=score, observed=True)
    score_like_pred = Normal("score_like_pred", mu=theta, tau=tau, value=score)
    
    # Predictions:
    # Girl, 1 year older than average, normal family involvement, 2 siblings, non-profound hearing loss,
    # mother with HS, no previous disability, JCIH
    x1 = [0, 0, 0, 2, 1, 1, 0, 1]
    # Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss,
    # mother without HS diploma, previous disability, JCIH
    x2 = [-0.5, 1, 0, 0, 0, 0, 1, 1]
    # Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss,
    # mother without HS diploma, previous disability, no JCIH
    x3 = [1.5, 1, 2, 1, 1, 0, 1, 0]
    # Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss,
    # mother with HS diploma, no previous disability, JCIH
    x4 = [0.5, 1, 1, 3, 0, 1, 0, 1]
    X = np.c_[x1, x2, x3, x4]
#     import pdb; pdb.set_trace()
    theta_pred = Lambda('theta_pred', lambda b=beta, mu=mean_score: mu + np.dot(X.T, b))
    predictions = Normal('predictions', mu=theta_pred, tau=tau)
    
    
    return locals()

In [28]:

M_reclang = MCMC(generate_model(receptive_language_dataset))

In [30]:

M_reclang.sample(200000, 190000)

 [-----------------100%-----------------] 200000 of 200000 complete in 2529.8 sec

In [31]:

labels = ['Age of Enrollment', 'Male', 
                       'Reduced Family Involvement', 'Sibling Count', 'Non-profound Hearing Loss',
                       'Mother with HS Diploma','Additional Disability', 'JCIH']
Matplot.summary_plot([M_reclang.beta], rhat=False,
        custom_labels=labels)

In [32]:

M_reclang.beta.summary()

beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.133           0.505            0.034            [-4.16  -2.174]
	-0.19            1.308            0.114            [-2.453  2.53 ]
	-5.453           0.559            0.036            [-6.557 -4.305]
	-0.684           0.62             0.044            [-1.881  0.509]
	6.313            1.322            0.117            [ 3.841  9.039]
	5.139            1.408            0.126            [ 2.154  7.753]
	-6.381           1.361            0.121            [-8.609 -3.807]
	4.252            1.658            0.153            [ 0.91   7.616]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.131           -3.479          -3.124         -2.804        -2.124
	-2.566           -1.09           -0.217         0.669         2.496
	-6.629           -5.818          -5.439         -5.086        -4.333
	-1.907           -1.089          -0.652         -0.271        0.504
	3.658            5.376           6.238          7.273         8.929
	2.401            4.227           5.202          6.032         8.026
	-8.703           -7.368          -6.523         -5.387        -3.861
	0.598            3.36            4.339          5.282         7.364

In [33]:

Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])

In [ ]:

Matplot.gof_plot(M_reclang.score_like_pred.trace(), M_reclang.score, verbose=0)

Expressive Language Model¶

In [35]:

expressive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='expressive') &
                                     (lsl_dr.non_english==False)]

expressive_language_dataset.head()

Out[35]:

    index               redcap_event_name academic_year  hl  male  _race  \
1       0  Initial Assessment/Enrollment      2002-2003   0     0      0   
21     10    Year 1 Complete (7/1 - 6/30)     2003-2004   0     0      0   
33     14    Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
53     20    Year 3 Complete (7/1 - 6/30)     2007-2008   0     1      0   
78     32    Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   

    prim_lang  sib  _mother_ed  father_ed    ...        domain  school  score  \
1           0    1           6          6    ...      Language     101     58   
21          0    1           2          2    ...      Language     101     77   
33          0    0           6          6    ...      Language     101     67   
53          0    1           4          4    ...      Language     101     81   
78          0    2           5          4    ...      Language     101    105   

    test_name   test_type  deg_hl_below6  int_outside_option  mother_hs  jcih  \
1         PLS  expressive              0                   1        NaN     0   
21        PLS  expressive              1                   1          0     0   
33    CELF-P2  expressive              1                   1        NaN     1   
53    CELF-P2  expressive              0                   0          1     1   
78    CELF-P2  expressive            NaN                   0          1     0   

    age_years  
1    4.333333  
21   3.416667  
33   2.916667  
53   0.083333  
78   2.416667  

[5 rows x 77 columns]

In [36]:

(expressive_language_dataset[covs].isnull().sum() / expressive_language_dataset.shape[0]).round(2)

Out[36]:

age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.24
sib              0.08
deg_hl_below6    0.09
mother_hs        0.34
synd_or_disab    0.12
jcih             0.43
dtype: float64

In [37]:

expressive_language_dataset = expressive_language_dataset[expressive_language_dataset.age.notnull() 
                                                        & expressive_language_dataset.int_outside_option.notnull()]

In [38]:

expressive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');

In [39]:

M_explang = MCMC(generate_model(expressive_language_dataset))

In [40]:

M_explang.sample(200000, 190000)

 [-----------------100%-----------------] 200000 of 200000 complete in 2007.5 sec

In [42]:

Matplot.summary_plot([M_explang.beta], custom_labels=labels, rhat=False)

In [45]:

Matplot.summary_plot([M_explang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])

Articulation Model¶

In [46]:

articulation_dataset = lsl_dr[(lsl_dr.domain=='Articulation') & 
                                     (lsl_dr.non_english==False)]
articulation_dataset.head()

Out[46]:

     index             redcap_event_name academic_year  hl  male  _race  \
18      10  Year 1 Complete (7/1 - 6/30)     2003-2004   0     0      0   
29      14  Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
49      20  Year 3 Complete (7/1 - 6/30)     2007-2008   0     1      0   
93      43  Year 1 Complete (7/1 - 6/30)           NaN   0     0      0   
139     69  Year 2 Complete (7/1 - 6/30)     2012-2013   0     1      0   

     prim_lang  sib  _mother_ed  father_ed    ...            domain  school  \
18           0    1           2          2    ...      Articulation     101   
29           0    0           6          6    ...      Articulation     101   
49           0    1           4          4    ...      Articulation     101   
93           0    2           6          6    ...      Articulation     101   
139          0    0           5          6    ...      Articulation     101   

     score  test_name  test_type  deg_hl_below6  int_outside_option  \
18      97        NaN    Goldman              1                   1   
29      75        NaN    Goldman              1                   1   
49     100        NaN    Goldman              0                   0   
93     120        NaN    Goldman              1                   0   
139     65        NaN    Goldman              0                   1   

     mother_hs  jcih  age_years  
18           0     0   3.416667  
29         NaN     1   2.916667  
49           1     1   0.083333  
93         NaN   NaN   4.000000  
139          1     1   2.083333  

[5 rows x 77 columns]

In [47]:

(articulation_dataset[covs].isnull().sum() / articulation_dataset.shape[0]).round(2)

Out[47]:

age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.27
sib              0.09
deg_hl_below6    0.21
mother_hs        0.40
synd_or_disab    0.10
jcih             0.54
dtype: float64

In [48]:

articulation_dataset = articulation_dataset[articulation_dataset.age.notnull()
                                            & articulation_dataset.int_outside_option.notnull()]

In [49]:

articulation_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');

In [50]:

M_articulation = MCMC(generate_model(articulation_dataset))

In [51]:

M_articulation.sample(200000, 190000)

 [-----------------100%-----------------] 200000 of 200000 complete in 3985.2 sec

In [53]:

Matplot.summary_plot([M_articulation.beta], 
        custom_labels=labels, rhat=False)

In [54]:

Matplot.summary_plot([M_articulation.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])

In [55]:

M_articulation.beta.summary()

beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-2.578           0.55             0.035            [-3.627 -1.475]
	0.209            1.37             0.117            [-2.772  2.703]
	-4.463           0.738            0.059            [-5.864 -3.023]
	-1.243           0.752            0.056            [-2.733  0.216]
	5.76             1.366            0.117            [ 3.055  8.39 ]
	5.955            1.534            0.136            [ 3.237  9.183]
	-6.041           1.219            0.103            [-8.614 -3.82 ]
	4.849            1.556            0.138            [ 1.255  7.454]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-3.649           -2.935          -2.57          -2.217        -1.475
	-2.89            -0.555          0.276          1.103         2.633
	-5.836           -5.014          -4.493         -3.957        -2.91
	-2.725           -1.741          -1.228         -0.735        0.251
	3.122            4.775           5.727          6.702         8.511
	2.881            4.879           6.02           6.995         8.927
	-8.568           -6.824          -5.981         -5.268        -3.713
	1.725            3.812           4.79           5.921         8.198

Expressive Vocabulary Model¶

In [56]:

expressive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Expressive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
expressive_vocab_dataset.head()

Out[56]:

    index               redcap_event_name academic_year  hl  male  _race  \
0       0  Initial Assessment/Enrollment      2002-2003   0     0      0   
30     14    Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
50     20    Year 3 Complete (7/1 - 6/30)     2007-2008   0     1      0   
76     32    Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
79     33    Year 3 Complete (7/1 - 6/30)     2008-2009   0     1      0   

    prim_lang  sib  _mother_ed  father_ed    ...                     domain  \
0           0    1           6          6    ...      Expressive Vocabulary   
30          0    0           6          6    ...      Expressive Vocabulary   
50          0    1           4          4    ...      Expressive Vocabulary   
76          0    2           5          4    ...      Expressive Vocabulary   
79          0    2           5          4    ...      Expressive Vocabulary   

    school  score  test_name  test_type  deg_hl_below6  int_outside_option  \
0      101     60        NaN     EOWPVT              0                   1   
30     101     90        NaN     EOWPVT              1                   1   
50     101    102        NaN     EOWPVT              0                   0   
76     101     89        NaN     EOWPVT            NaN                   0   
79     101     89        NaN     EOWPVT            NaN                   0   

    mother_hs  jcih  age_years  
0         NaN     0   4.333333  
30        NaN     1   2.916667  
50          1     1   0.083333  
76          1     0   2.416667  
79          1     0   2.416667  

[5 rows x 77 columns]

In [57]:

expressive_vocab_dataset[covs].isnull().sum()

Out[57]:

age_years          6
school             0
score              0
male               0
family_inv       241
sib               67
deg_hl_below6    182
mother_hs        354
synd_or_disab     86
jcih             476
dtype: int64

In [58]:

expressive_vocab_dataset = expressive_vocab_dataset[expressive_vocab_dataset.age.notnull() 
                                                    & expressive_vocab_dataset.male.notnull()
                                                    & expressive_vocab_dataset.int_outside_option.notnull()]

In [59]:

expressive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');

In [60]:

M_expressive_vocab = MCMC(generate_model(expressive_vocab_dataset))

In [61]:

M_expressive_vocab.sample(200000, 190000)

 [-----------------100%-----------------] 200000 of 200000 complete in 2645.0 sec

In [63]:

Matplot.summary_plot([M_expressive_vocab.beta], 
        custom_labels=labels, rhat=False)

In [65]:

#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_expressive_vocab.beta], rhat=False, 
        custom_labels=labels)

In [66]:

M_expressive_vocab.beta.summary()

beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-2.961           0.511            0.035            [-3.922 -1.983]
	0.873            1.168            0.098            [-1.502  3.261]
	-5.562           0.63             0.045            [-6.599 -4.294]
	-1.833           0.731            0.056            [-3.223 -0.299]
	5.595            1.421            0.127            [ 2.817  8.309]
	7.266            1.644            0.152          [  4.215  10.308]
	-7.076           1.31             0.113            [-9.646 -4.54 ]
	2.393            1.492            0.136            [-0.31   5.313]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-3.921           -3.31           -2.969         -2.59         -1.977
	-1.502           0.115           0.828          1.58          3.27
	-6.692           -6.004          -5.571         -5.122        -4.325
	-3.271           -2.34           -1.844         -1.349        -0.323
	2.848            4.582           5.634          6.527         8.377
	3.793            6.109           7.327          8.564         10.098
	-9.671           -7.925          -7.017         -6.228        -4.547
	-0.271           1.381           2.307          3.352         5.487

Receptive Vocabulary Model¶

In [67]:

receptive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Receptive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
receptive_vocab_dataset.head()

Out[67]:

     index             redcap_event_name academic_year  hl  male  _race  \
19      10  Year 1 Complete (7/1 - 6/30)     2003-2004   0     0      0   
31      14  Year 2 Complete (7/1 - 6/30)     2008-2009   0     1      0   
51      20  Year 3 Complete (7/1 - 6/30)     2007-2008   0     1      0   
95      43  Year 1 Complete (7/1 - 6/30)           NaN   0     0      0   
144     70  Year 3 Complete (7/1 - 6/30)     2013-2014   0     1      0   

     prim_lang  sib  _mother_ed  father_ed    ...                    domain  \
19           0    1           2          2    ...      Receptive Vocabulary   
31           0    0           6          6    ...      Receptive Vocabulary   
51           0    1           4          4    ...      Receptive Vocabulary   
95           0    2           6          6    ...      Receptive Vocabulary   
144          0    0           5          6    ...      Receptive Vocabulary   

     school  score  test_name  test_type  deg_hl_below6  int_outside_option  \
19      101     80        NaN       PPVT              1                   1   
31      101     96        NaN       PPVT              1                   1   
51      101     93        NaN       PPVT              0                   0   
95      101    114        NaN       PPVT              1                   0   
144     101     94        NaN       PPVT              0                   1   

     mother_hs  jcih  age_years  
19           0     0   3.416667  
31         NaN     1   2.916667  
51           1     1   0.083333  
95         NaN   NaN   4.000000  
144          1     1   2.083333  

[5 rows x 77 columns]

In [68]:

receptive_vocab_dataset[covs].isnull().sum()

Out[68]:

age_years          7
school             0
score              0
male               0
family_inv       254
sib               71
deg_hl_below6    186
mother_hs        373
synd_or_disab     92
jcih             495
dtype: int64

In [69]:

receptive_vocab_dataset = receptive_vocab_dataset[receptive_vocab_dataset.age.notnull() & 
                                                  receptive_vocab_dataset.male.notnull() &
                                                  receptive_vocab_dataset.int_outside_option.notnull()]

In [70]:

receptive_vocab_dataset.age_int.hist(bins=40)

Out[70]:

<matplotlib.axes._subplots.AxesSubplot at 0x107abe590>

In [71]:

receptive_vocab_dataset.shape

Out[71]:

(894, 77)

In [72]:

receptive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');

In [73]:

M_receptive_vocab = MCMC(generate_model(receptive_vocab_dataset))

In [74]:

M_receptive_vocab.sample(200000, 190000)

 [-----------------100%-----------------] 200000 of 200000 complete in 14702.4 sec

In [76]:

#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_receptive_vocab.beta], rhat=False,
        custom_labels=labels)

In [77]:

M_receptive_vocab.beta.summary()

beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.094           0.515            0.036            [-4.077 -2.033]
	0.615            1.239            0.11             [-2.098  2.796]
	-5.179           0.574            0.043            [-6.231 -4.086]
	-1.701           0.61             0.046            [-2.827 -0.505]
	5.038            1.341            0.123            [ 2.507  7.692]
	5.874            1.218            0.107            [ 3.402  8.194]
	-7.876           1.192            0.104          [-10.06   -5.343]
	2.687            1.38             0.125            [ 0.393  5.995]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.058           -3.443          -3.119         -2.759        -2.008
	-1.932           -0.151          0.631          1.407         3.17
	-6.188           -5.596          -5.212         -4.774        -4.03
	-2.852           -2.122          -1.723         -1.278        -0.515
	2.239            4.161           5.019          6.002         7.513
	3.542            5.102           5.861          6.649         8.365
	-10.59           -8.597          -7.823         -7.058        -5.642
	0.22             1.807           2.527          3.452         5.871