Pediatrics Paper Analysis

All studies reporting successful outcomes for early-identified children who are deaf or hard of hearing have involved intervention provided by specialists trained in parent-infant intervention services.[12,90,97] Early intervention programs should develop mechanisms to ensure that early intervention professionals have the special skills necessary to provide families with the highest quality of service specific to children with hearing loss. Professionals with a background in deaf education, audiology, and speech-language pathology will typically have the skills needed to provide intervention services. Professionals should be highly qualified in their respective fields and should be skilled communicators who are knowledgeable about, and sensitive to, the importance of enhancing families' strengths and supporting their priorities. Knowledge of the principles of adult learning also increases early intervention professionals' success with parents and other professionals.

Per the Joint Committee on Infant Hearing (JCIH), providers must be competent to provide these services.

References:

  1. Yoshinaga-Itano C, Sedey AL, Coulter DK, Mehl AL. Language of early- and later-identified children with hearing loss. Pediatrics. 1998;102:1161–1171.
  2. Moeller MP. Early intervention and language development in children who are deaf and hard of hearing. Pediatrics. 2000;106(3):e43. Available at: www.pediatrics.org/cgi/content/full/106/3/e43
  3. Calderon R. Parental involvement in deaf children's education programs as a predictor of child's language, early reading, and social-emotional development. J Deaf Stud Deaf Educ. 2000;5:140–155.

The purpose of this study is to examine the relationship between highly qualified providers, age of enrollment in intervention, and speech and language outcomes at 4 years of age in a group of children enrolled in school programs that specialize in the development of listening and spoken language. Specific questions:

  • Does age of enrollment in intervention impact articulation, vocabulary, and language scores? (age_int)
  • Does place of enrollment in intervention impact articulation, vocabulary, and language scores? (age and age_int)
  • Is there an interaction effect between the place and age?
  • What other factors contribute to the variance?
      • Parental involvement (family_inv)
      • Degree of hearing loss (degree_hl)
      • Mother’s education level (mother_ed)
      • Prematurity (premature_weeks)
      • Gender (gender)
      • Language (prim_lang)
      • Number of children in the home (sib)
      • Age when first amplified (age_amp)
      • Additional disability (secondary_diagnosis, known_synd)
In [1]:
# Import modules and set options
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
sns.set()

pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 100)

Import data

In [2]:
lsl_dr = pd.read_csv('lsl_dr.csv', index_col=0, low_memory=False)
In [3]:
lsl_dr.head()
Out[3]:
          redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
0  initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
1  initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
2  initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
3  initial_assessment_arm_1     2008-2009  0.0   0.0    0.0        0.0  2.0   
4  initial_assessment_arm_1     2008-2009  0.0   0.0    0.0        0.0  2.0   

   _mother_ed  father_ed  premature_age      ...       age_test  \
0         6.0        6.0            9.0      ...           54.0   
1         6.0        6.0            9.0      ...           54.0   
2         6.0        6.0            9.0      ...           54.0   
3         5.0        5.0            7.0      ...           53.0   
4         5.0        5.0            7.0      ...           53.0   

                  domain  school  score  test_name   test_type  \
0  Expressive Vocabulary   101.0   58.0        NaN      EOWPVT   
1               Language   101.0   51.0        PLS   receptive   
2               Language   101.0   60.0        PLS  expressive   
3           Articulation   628.0  107.0        NaN     Goldman   
4               Language   628.0  103.0    CELF-P2   receptive   

   academic_year_start    tech_class  age_year  non_profound  
0               2002.0       Bimodal       4.0         False  
1               2002.0       Bimodal       4.0         False  
2               2002.0       Bimodal       4.0         False  
3               2008.0  Bilateral HA       1.0          True  
4               2008.0  Bilateral HA       1.0          True  

[5 rows x 77 columns]

Indicator for non-profound hearing loss

In [4]:
lsl_dr['deg_hl_below6'] = lsl_dr.degree_hl < 6
lsl_dr.loc[lsl_dr.degree_hl.isnull(), 'deg_hl_below6'] = np.nan

Indicator for first intervention outside OPTION

In [5]:
lsl_dr['int_outside_option'] = lsl_dr.age > lsl_dr.age_int
lsl_dr.loc[lsl_dr.age < lsl_dr.age_int, 'int_outside_option'] = np.nan

Indicator for high school graduation of mother

In [6]:
lsl_dr['mother_hs'] = lsl_dr.mother_ed > 1
lsl_dr.loc[lsl_dr.mother_ed.isnull(), 'mother_hs'] = None

JCIH benchmark indicator, based on identification by 3 months of age or enrollment in a program by 6 months

In [7]:
lsl_dr.onset_1.unique()
Out[7]:
array([  15. ,    4. ,    0. ,   26. ,   36. ,   24. ,   80. ,   14. ,
         62. ,    2. ,   49. ,   19. ,   23. ,   18. ,    9. ,    nan,
         10. ,   12. ,    1. ,    5. ,   30. ,    7. ,   51. ,    8. ,
          3. ,   17. ,   50. ,   31. ,   34. ,   28. ,   35. ,   38. ,
         95. ,   42. ,   13. ,   16. ,   61. ,   46. ,   22. ,   53. ,
         59. ,   88. ,    6. ,   37. ,   96. ,   52. ,   64. ,   65. ,
         48. ,   97. ,   25. ,   47. ,   79. ,  107. ,   74. ,   77. ,
         84. ,   60. ,   41. ,   33. ,   39. ,   27. ,   11. ,   20. ,
         21. ,   45. ,   29. ,   32. ,   81. ,    1.5,   55. ,   70. ,
         58. ,  154. ,   54. ,   78. ,   43. ,   57. ,   83. ,   44. ,
         72. ,  116. ,   40. ,  119. ,   63. ,   66. ,   56. ,   87. ,
         76. ,   68. ,   92. ,  140. ,   86. ,  126. ,   85. ,  133. ,
        103. ,   67. ,   71. ,    2.5,   98. ,   75. ,    0.5,  152. ,
         89. ])
In [8]:
ident_by_3 = lsl_dr.onset_1 <= 3
program_by_6 = lsl_dr.age <= 6
lsl_dr['jcih'] = ident_by_3 | program_by_6
lsl_dr.loc[lsl_dr.onset_1.isnull() | lsl_dr.age.isnull(), 'jcih'] = None

Create age in years variable

In [9]:
lsl_dr['age_years'] = lsl_dr.age/12.

Retain only 4-year-olds

In [10]:
lsl_dr = lsl_dr[(lsl_dr.age_test>=48) & (lsl_dr.age_test<60)]
In [11]:
lsl_dr.shape
Out[11]:
(7130, 82)
In [12]:
lsl_dr.study_id.unique().shape
Out[12]:
(1670,)

Summary Statistics for Paper

In [13]:
unique_students = lsl_dr.drop_duplicates(subset='study_id')
In [14]:
unique_students.shape
Out[14]:
(1670, 82)
In [15]:
unique_students.age.describe()
Out[15]:
count    1648.000000
mean       29.581311
std        15.886114
min         0.000000
25%        18.000000
50%        32.000000
75%        41.000000
max        68.000000
Name: age, dtype: float64
In [16]:
unique_students.male.mean()
Out[16]:
0.5243535778713169
In [17]:
unique_students.sib.describe()
Out[17]:
count    1553.000000
mean        1.167418
std         0.933508
min         0.000000
25%         0.000000
50%         1.000000
75%         2.000000
max         3.000000
Name: sib, dtype: float64
In [18]:
(unique_students.degree_hl == 6).describe()
Out[18]:
count      1670
unique        2
top       False
freq        919
Name: degree_hl, dtype: object
In [19]:
((unique_students.degree_hl < 6) & (unique_students.degree_hl > 1)).describe()
Out[19]:
count      1670
unique        2
top       False
freq       1032
Name: degree_hl, dtype: object
In [20]:
unique_students.mother_hs.describe()
Out[20]:
count    1044.000000
mean        0.576628
std         0.494330
min         0.000000
25%         0.000000
50%         1.000000
75%         1.000000
max         1.000000
Name: mother_hs, dtype: float64
In [21]:
unique_students.synd_or_disab.describe()
Out[21]:
count    1469.000000
mean        0.208986
std         0.406723
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: synd_or_disab, dtype: float64
In [22]:
grouped_age = unique_students.replace({'jcih':{1:'JCIH', 0:'non-JCIH'}}).groupby('jcih')['age_years']
fig, ax = plt.subplots()

for k, v in grouped_age:
    v.hist(label=k, alpha=.75, ax=ax, grid=False)

ax.legend()
plt.xlabel('Age (years)'); plt.ylabel('Frequency');

Receptive Language Test Score Model

In [23]:
# Filter for the receptive Language domain and English-only records
receptive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='receptive') &
                                     (lsl_dr.non_english==False)]

receptive_language_dataset.head()
Out[23]:
            redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
1    initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
4    initial_assessment_arm_1     2008-2009  0.0   0.0    0.0        0.0  2.0   
67   initial_assessment_arm_1     2009-2010  0.0   0.0    0.0        0.0  1.0   
74   initial_assessment_arm_1     2009-2010  0.0   0.0    2.0        0.0  3.0   
103  initial_assessment_arm_1     2009-2010  0.0   1.0    0.0        0.0  3.0   

     _mother_ed  father_ed  premature_age    ...      test_type  \
1           6.0        6.0            9.0    ...      receptive   
4           5.0        5.0            7.0    ...      receptive   
67          4.0        3.0            9.0    ...      receptive   
74          2.0        2.0            8.0    ...      receptive   
103         2.0        4.0            8.0    ...      receptive   

     academic_year_start    tech_class  age_year  non_profound  deg_hl_below6  \
1                 2002.0       Bimodal       4.0         False            0.0   
4                 2008.0  Bilateral HA       1.0          True            1.0   
67                2009.0  Bilateral CI       0.0         False            0.0   
74                2009.0  Bilateral HA       2.0          True            1.0   
103               2009.0  Bilateral HA       1.0          True            1.0   

     int_outside_option  mother_hs  jcih  age_years  
1                   0.0        NaN   0.0   4.333333  
4                   1.0        1.0   0.0   1.416667  
67                  0.0        1.0   1.0   0.250000  
74                  0.0        0.0   1.0   2.166667  
103                 1.0        0.0   1.0   1.416667  

[5 rows x 82 columns]

This is the size of the resulting dataset to be used in this analysis

In [24]:
receptive_language_dataset.shape
Out[24]:
(1090, 82)
In [25]:
mean_score = receptive_language_dataset.groupby('study_id')['score'].mean()
In [26]:
receptive_language_dataset = receptive_language_dataset.drop_duplicates('study_id')
In [27]:
receptive_language_dataset = receptive_language_dataset.set_index('study_id')
In [28]:
receptive_language_dataset['mean_score'] = mean_score

Proportion of missing values for each variable

In [29]:
covs = ['age_years', 'school', 'score', 'male', 
        'family_inv', 'sib', 'deg_hl_below6', 'mother_hs', 'synd_or_disab', 'jcih']
(receptive_language_dataset[covs].isnull().sum() / receptive_language_dataset.shape[0]).round(2)
Out[29]:
age_years        0.00
school           0.00
score            0.00
male             0.00
family_inv       0.16
sib              0.06
deg_hl_below6    0.05
mother_hs        0.31
synd_or_disab    0.14
jcih             0.24
dtype: float64

Drop individuals with missing age, since they make up less than 1% of the dataset

In [30]:
receptive_language_dataset = receptive_language_dataset[receptive_language_dataset.age.notnull()]
In [31]:
receptive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [32]:
receptive_language_dataset.mother_ed.hist()
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x1090f77b8>

Final analysis dataset size

In [33]:
receptive_language_dataset.shape
Out[33]:
(999, 82)
In [34]:
missing_pct = (receptive_language_dataset[covs].isnull().mean()*100).round(1)
missing_pct = missing_pct.sort_values(ascending=False)
missing_pct
Out[34]:
mother_hs        30.9
jcih             23.1
family_inv       16.2
synd_or_disab    13.4
sib               5.5
deg_hl_below6     5.0
male              0.0
score             0.0
school            0.0
age_years         0.0
dtype: float64
In [35]:
receptive_language_dataset['age_amp_missing'] = receptive_language_dataset.age_amp.isnull()
by_age_amp_missing = receptive_language_dataset.groupby('age_amp_missing')
In [36]:
by_age_amp_missing[['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                    'non_english']].mean()
Out[36]:
                     male  synd_or_disab  deg_hl_below6  mother_hs
age_amp_missing                                                   
False            0.507343       0.223926       0.480218   0.596558
True             0.516000       0.314554       0.453704   0.610778
In [37]:
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    try:
        _ = by_age_amp_missing.boxplot(column=col, return_type='axes')
    except Exception:
        pass
In [38]:
receptive_language_dataset['jcih_missing'] = receptive_language_dataset.jcih.isnull()
by_jcih_missing = receptive_language_dataset.groupby('jcih_missing')
In [39]:
by_jcih_missing[['male', 'synd_or_disab', 'deg_hl_below6', 'mother_hs',
                 'non_english']].mean()
Out[39]:
                  male  synd_or_disab  deg_hl_below6  mother_hs
jcih_missing                                                   
False         0.510417       0.220390       0.474104   0.604824
True          0.506494       0.333333       0.474490   0.582781
In [40]:
for col in ['age', 'score', 'degree_hl', 'family_inv', 'age_int']:
    _ = by_jcih_missing.boxplot(column=col, return_type='axes')
In [41]:
receptive_language_dataset.age_years.mean()
Out[41]:
2.2938772105438825
In [42]:
from pymc import Bernoulli, Normal, Uniform, invlogit, logit, deterministic, MCMC, Matplot, MAP
from pymc import Lambda, Dirichlet, Categorical, Beta, NegativeBinomial, Exponential
from numpy.ma import masked_values

def generate_model(dataset):
    
    # Extract data to local variables
    age_years = dataset.age_years.values
    school = dataset.school.values
    score = dataset.score.values
    non_english = dataset.non_english.astype(int).values
    male = dataset.male.values
    sib = dataset.sib.values
    synd_or_disab = dataset.synd_or_disab.values
    int_outside_option = dataset.int_outside_option.values
    jcih = dataset.jcih.values
    
    # Transform some data
    age_std = age_years - age_years.mean()
    
    # Imputation of family involvement
    n_family_inv_cats = len(dataset.family_inv.unique())
    p_family_inv = Dirichlet("p_family_inv", np.ones(n_family_inv_cats), value=[1./n_family_inv_cats]*(n_family_inv_cats-1))
    family_inv_masked = masked_values(dataset.family_inv.fillna(0.5), value=0.5)
    x_family_inv = Categorical('x_family_inv', p_family_inv, value=family_inv_masked, observed=True)
    
    # Imputation of JCIH benchmark status
    p_jcih = Beta("p_jcih", 1, 1, value=0.5)
    jcih_masked = masked_values(dataset.jcih.values, value=np.nan)
    x_jcih = Bernoulli('x_jcih', p_jcih, value=jcih_masked, observed=True)
    
    # Imputation of hearing loss
    p_hl = Beta("p_hl", 1, 1, value=0.5)
    hl_masked = masked_values(dataset.deg_hl_below6.values, value=np.nan)
    x_hl = Bernoulli('x_hl', p_hl, value=hl_masked, observed=True)
    
    # Imputation of maternal education
    p_mother_hs = Beta("p_mother_hs", 1, 1, value=0.5)
    mother_hs_masked = masked_values(dataset.mother_hs.values, value=np.nan)
    x_mother_hs = Bernoulli('x_mother_hs', p_mother_hs, value=mother_hs_masked, observed=True)
    
    # Imputation of previous disability
    p_synd_or_disab = Beta("p_synd_or_disab", 1, 1, value=0.5)
    synd_or_disab_masked = masked_values(dataset.synd_or_disab.values, value=np.nan)
    x_synd_or_disab = Bernoulli('x_synd_or_disab', p_synd_or_disab, value=synd_or_disab_masked, observed=True)
    
    # Imputation of siblings
    n_sib_cats = len(dataset.sib.unique())
    p_sib = Dirichlet("p_sib", np.ones(n_sib_cats), value=[1./n_sib_cats]*(n_sib_cats-1))
    sib_masked = masked_values(dataset.sib.fillna(0.5), value=0.5)
    x_sib = Categorical('x_sib', p_sib, value=sib_masked, observed=True)
    
    mean_score = Normal("mean_score", 100, 0.001, value=80)
    
    # Indices to school random effects
    unique_schools = np.unique(school)
    school_index = [list(unique_schools).index(s) for s in school]

    # School random effect
    sigma_school = Uniform("sigma_school", 0, 1000, value=10)
    tau_school = sigma_school**-2
    alpha_school = Normal("alpha_school", mu=0, tau=tau_school, value=np.zeros(len(unique_schools)))
    
    # Indices to child random effects
#     unique_child = np.unique(study_id)
#     child_index = [list(unique_child).index(s) for s in study_id]
    
    # Individual random effect
#     sigma_child = Uniform("sigma_child", 0, 1000, value=10)
#     tau_child = sigma_child**-2    
#     alpha_child = Normal("alpha_child", mu=0, tau=tau_child, value=np.zeros(len(unique_child)))
    
    @deterministic
    def intercept(a0=mean_score, a1=alpha_school):
        """Random intercepts"""
        return a0 + a1[school_index]
    
    # Fixed effects
    beta = Normal("beta", 0, 0.001, value=[0]*8)

    @deterministic
    def theta(b0=intercept, b=beta, x_family_inv=x_family_inv, x_sib=x_sib, 
              x_hl=x_hl, x_mother_hs=x_mother_hs, x_synd_or_disab=x_synd_or_disab, x_jcih=x_jcih):
        
        # Complete predictors
        X = [age_std, male]
        # Partially-imputed predictors
        X += [x_family_inv, x_sib, x_hl, x_mother_hs, x_synd_or_disab, x_jcih]
        return b0 + np.dot(b, X)
    
    sigma = Uniform("sigma", 0, 1000, value=10)
    tau = sigma**-2
    
    # Data likelihood
    score_like = Normal("score_like", mu=theta, tau=tau, value=score, observed=True)
    #score_like_pred = Normal("score_like_pred", mu=theta, tau=tau, value=score)
    
    # Predictions:
    # Girl, 1 year older than average, normal family involvement, 2 siblings, non-profound hearing loss,
    # mother with HS, no previous disability, JCIH
    x1 = [0, 0, 0, 2, 1, 1, 0, 1]
    # Boy, 6 months younger than average, normal family involvement, no siblings, profound hearing loss,
    # mother without HS diploma, previous disability, JCIH
    x2 = [-0.5, 1, 0, 0, 0, 0, 1, 1]
    # Boy, 1.5 years older than average, poor family involvement, 1 sibling, non-profound hearing loss,
    # mother without HS diploma, previous disability, no JCIH
    x3 = [1.5, 1, 2, 1, 1, 0, 1, 0]
    # Girl, 6 months older than average, impaired family involvement, 3 siblings, profound hearing loss,
    # mother with HS diploma, no previous disability, JCIH
    x4 = [0.5, 1, 1, 3, 0, 1, 0, 1]
    X = np.c_[x1, x2, x3, x4]
    theta_pred = Lambda('theta_pred', lambda b=beta, mu=mean_score: mu + np.dot(X.T, b))
    predictions = Normal('predictions', mu=theta_pred, tau=tau)
    
    
    return locals()
Couldn't import dot_parser, loading of dot files will not be possible.
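
For reference, the model constructed by generate_model corresponds in outline to the following hierarchical regression. This is a summary of the code above, written with variances rather than the precisions PyMC uses, and omitting the imputation priors placed on the partially observed covariates:

\begin{aligned}
\text{score}_i &\sim \mathrm{Normal}(\theta_i,\ \sigma^2) \\
\theta_i &= \mu + \alpha_{\text{school}[i]} + \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
\alpha_k &\sim \mathrm{Normal}(0,\ \sigma_{\text{school}}^2) \\
\mu &\sim \mathrm{Normal}(100,\ 1000), \qquad \beta_j \sim \mathrm{Normal}(0,\ 1000) \\
\sigma,\ \sigma_{\text{school}} &\sim \mathrm{Uniform}(0,\ 1000)
\end{aligned}

Here x_i collects the eight predictors (centered age, male, family involvement, sibling count, non-profound hearing loss, maternal high-school completion, additional disability, and JCIH status), and alpha_k is the random intercept for school k.
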
In [43]:
# M_reclang = MCMC(generate_model(receptive_language_dataset))
M_reclang = MCMC(generate_model(receptive_language_dataset), db='sqlite', name='reclang', dbmode='w')
In [64]:
iterations = 100000
burn = 90000
In [45]:
M_reclang.sample(iterations, burn)
 [-----------------100%-----------------] 200000 of 200000 complete in 1469.0 sec
In [46]:
labels = ['Age of Enrollment', 'Male', 'Reduced Family Involvement', 'Sibling Count',
          'Non-profound Hearing Loss', 'Mother with HS Diploma', 'Additional Disability', 'JCIH']
In [47]:
Matplot.summary_plot([M_reclang.beta], rhat=False,
        custom_labels=labels, x_range=(-10,10), main='Receptive Language')
In [48]:
M_reclang.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.221           0.438            0.025            [-4.016 -2.261]
	0.668            1.104            0.088            [-1.493  2.839]
	-5.274           0.521            0.031            [-6.314 -4.296]
	-0.291           0.578            0.036            [-1.486  0.747]
	8.146            1.151            0.087          [  5.822  10.46 ]
	5.597            1.109            0.088            [ 3.374  7.684]
	-6.361           1.049            0.076            [-8.442 -4.41 ]
	3.318            1.136            0.09             [ 1.108  5.474]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.122           -3.503          -3.217         -2.935        -2.335
	-1.44            -0.1            0.64           1.443         2.942
	-6.283           -5.626          -5.288         -4.927        -4.231
	-1.413           -0.678          -0.308         0.087         0.86
	5.843            7.37            8.073          8.927         10.517
	3.208            4.876           5.682          6.361         7.636
	-8.418           -7.089          -6.381         -5.652        -4.329
	1.049            2.622           3.351          4.038         5.421
	
In [49]:
M_reclang.sigma_school.summary()
sigma_school:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	5.832            0.921            0.063            [ 4.145  7.668]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	4.239            5.185           5.767          6.409         7.876
	
In [50]:
Matplot.summary_plot([M_reclang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
In [51]:
#Matplot.gof_plot(M_reclang.score_like_pred.trace(), M_reclang.score, verbose=0)

Expressive Language Model

In [52]:
expressive_language_dataset = lsl_dr[(lsl_dr.domain=='Language') & (lsl_dr.test_type=='expressive') &
                                     (lsl_dr.non_english==False)]

expressive_language_dataset.head()
Out[52]:
            redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
2    initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
5    initial_assessment_arm_1     2008-2009  0.0   0.0    0.0        0.0  2.0   
68   initial_assessment_arm_1     2009-2010  0.0   0.0    0.0        0.0  1.0   
75   initial_assessment_arm_1     2009-2010  0.0   0.0    2.0        0.0  3.0   
104  initial_assessment_arm_1     2009-2010  0.0   1.0    0.0        0.0  3.0   

     _mother_ed  father_ed  premature_age    ...       test_type  \
2           6.0        6.0            9.0    ...      expressive   
5           5.0        5.0            7.0    ...      expressive   
68          4.0        3.0            9.0    ...      expressive   
75          2.0        2.0            8.0    ...      expressive   
104         2.0        4.0            8.0    ...      expressive   

     academic_year_start    tech_class  age_year  non_profound  deg_hl_below6  \
2                 2002.0       Bimodal       4.0         False            0.0   
5                 2008.0  Bilateral HA       1.0          True            1.0   
68                2009.0  Bilateral CI       0.0         False            0.0   
75                2009.0  Bilateral HA       2.0          True            1.0   
104               2009.0  Bilateral HA       1.0          True            1.0   

     int_outside_option  mother_hs  jcih  age_years  
2                   0.0        NaN   0.0   4.333333  
5                   1.0        1.0   0.0   1.416667  
68                  0.0        1.0   1.0   0.250000  
75                  0.0        0.0   1.0   2.166667  
104                 1.0        0.0   1.0   1.416667  

[5 rows x 82 columns]
In [53]:
mean_score = expressive_language_dataset.groupby('study_id')['score'].mean()
expressive_language_dataset = expressive_language_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_language_dataset['mean_score'] = mean_score
In [54]:
(expressive_language_dataset[covs].isnull().sum() / expressive_language_dataset.shape[0]).round(2)
Out[54]:
age_years        0.00
school           0.00
score            0.00
male             0.00
family_inv       0.16
sib              0.06
deg_hl_below6    0.05
mother_hs        0.31
synd_or_disab    0.14
jcih             0.23
dtype: float64
In [55]:
expressive_language_dataset = expressive_language_dataset[expressive_language_dataset.age.notnull() 
                                                        & expressive_language_dataset.int_outside_option.notnull()]
In [56]:
expressive_language_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [57]:
M_explang = MCMC(generate_model(expressive_language_dataset), db='sqlite', name='explang', dbmode='w')
In [58]:
M_explang.sample(iterations, burn)
 [-----------------100%-----------------] 200000 of 200000 complete in 1215.0 sec
In [59]:
M_explang.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.601           0.462            0.023            [-4.579 -2.78 ]
	0.117            1.008            0.076            [-1.779  2.071]
	-4.607           0.558            0.035            [-5.677 -3.56 ]
	-0.222           0.557            0.031            [-1.369  0.865]
	8.775            1.194            0.092          [  6.285  11.108]
	7.282            1.246            0.103            [ 4.844  9.591]
	-6.211           1.151            0.085            [-8.353 -3.948]
	4.895            1.095            0.086            [ 2.792  7.068]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.493           -3.907          -3.595         -3.294        -2.66
	-1.972           -0.558          0.093          0.824         2.017
	-5.671           -4.992          -4.6           -4.226        -3.532
	-1.354           -0.574          -0.219         0.141         0.898
	6.307            8.01            8.81           9.576         11.182
	4.853            6.415           7.299          8.162         9.635
	-8.336           -6.983          -6.223         -5.484        -3.894
	2.793            4.216           4.89           5.574         7.085
	
In [60]:
Matplot.summary_plot([M_explang.beta], custom_labels=labels, rhat=False, main='Expressive Language', x_range=(-10,10))
In [61]:
Matplot.summary_plot([M_explang.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
In [66]:
Matplot.plot(M_explang.sigma_school)
Plotting sigma_school

Articulation Model

In [67]:
articulation_dataset = lsl_dr[(lsl_dr.domain=='Articulation') & 
                                     (lsl_dr.non_english==False)]
articulation_dataset.head()
Out[67]:
            redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
3    initial_assessment_arm_1     2008-2009  0.0   0.0    0.0        0.0  2.0   
6    initial_assessment_arm_1     2009-2010  0.0   1.0    2.0        0.0  3.0   
66   initial_assessment_arm_1     2009-2010  0.0   0.0    0.0        0.0  1.0   
73   initial_assessment_arm_1     2009-2010  0.0   0.0    2.0        0.0  3.0   
102  initial_assessment_arm_1     2009-2010  0.0   1.0    0.0        0.0  3.0   

     _mother_ed  father_ed  premature_age    ...      test_type  \
3           5.0        5.0            7.0    ...        Goldman   
6           6.0        6.0            8.0    ...        Goldman   
66          4.0        3.0            9.0    ...        Goldman   
73          2.0        2.0            8.0    ...        Goldman   
102         2.0        4.0            8.0    ...        Goldman   

     academic_year_start    tech_class  age_year  non_profound  deg_hl_below6  \
3                 2008.0  Bilateral HA       1.0          True            1.0   
6                 2009.0       Bimodal       0.0         False            0.0   
66                2009.0  Bilateral CI       0.0         False            0.0   
73                2009.0  Bilateral HA       2.0          True            1.0   
102               2009.0  Bilateral HA       1.0          True            1.0   

     int_outside_option  mother_hs  jcih  age_years  
3                   1.0        1.0   0.0   1.416667  
6                   0.0        NaN   1.0   0.083333  
66                  0.0        1.0   1.0   0.250000  
73                  0.0        0.0   1.0   2.166667  
102                 1.0        0.0   1.0   1.416667  

[5 rows x 82 columns]
In [68]:
mean_score = articulation_dataset.groupby('study_id')['score'].mean()
articulation_dataset = articulation_dataset.drop_duplicates('study_id').set_index('study_id')
articulation_dataset['mean_score'] = mean_score
In [69]:
(articulation_dataset[covs].isnull().sum() / articulation_dataset.shape[0]).round(2)
Out[69]:
age_years        0.01
school           0.00
score            0.00
male             0.00
family_inv       0.18
sib              0.07
deg_hl_below6    0.16
mother_hs        0.36
synd_or_disab    0.12
jcih             0.34
dtype: float64
In [70]:
articulation_dataset = articulation_dataset[articulation_dataset.age.notnull()
                                            & articulation_dataset.int_outside_option.notnull()]
In [71]:
articulation_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [72]:
M_articulation = MCMC(generate_model(articulation_dataset), db='sqlite', name='articulation', dbmode='w')
In [73]:
M_articulation.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 794.5 sec
In [74]:
Matplot.summary_plot([M_articulation.beta], 
        custom_labels=labels, rhat=False, main='Articulation', x_range=(-10,10))
In [75]:
Matplot.summary_plot([M_articulation.predictions], rhat=False, custom_labels=['Student {}'.format(i) for i in range(1,5)])
In [76]:
M_articulation.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-2.46            0.526            0.028            [-3.456 -1.387]
	1.136            1.118            0.081            [-0.896  3.474]
	-3.817           0.603            0.035            [-5.011 -2.653]
	0.025            0.649            0.036            [-1.278  1.253]
	5.299            1.346            0.103            [ 2.844  8.168]
	6.097            1.424            0.117            [ 3.169  8.679]
	-8.293           1.358            0.101          [-10.727  -5.582]
	6.625            1.452            0.121            [ 4.143  9.456]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-3.518           -2.824          -2.469         -2.091        -1.436
	-1.02            0.343           1.098          1.934         3.4
	-5.002           -4.227          -3.822         -3.423        -2.616
	-1.242           -0.423          0.034          0.476         1.327
	2.719            4.375           5.301          6.161         8.102
	3.309            5.157           6.167          7.079         8.875
	-10.939          -9.279          -8.27          -7.317        -5.725
	3.584            5.625           6.578          7.719         9.229
	

Expressive Vocabulary Model

In [77]:
expressive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Expressive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
expressive_vocab_dataset.head()
Out[77]:
            redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
0    initial_assessment_arm_1     2002-2003  0.0   0.0    0.0        0.0  1.0   
124  initial_assessment_arm_1     2014-2015  0.0   0.0    6.0        0.0  2.0   
205  initial_assessment_arm_1     2013-2014  0.0   0.0    0.0        0.0  1.0   
213  initial_assessment_arm_1     2012-2013  0.0   1.0    0.0        0.0  2.0   
219  initial_assessment_arm_1     2012-2013  0.0   0.0    3.0        0.0  0.0   

     _mother_ed  father_ed  premature_age    ...      test_type  \
0           6.0        6.0            9.0    ...         EOWPVT   
124         NaN        NaN            8.0    ...            EVT   
205         4.0        5.0            8.0    ...         EOWPVT   
213         4.0        5.0            8.0    ...         EOWPVT   
219         6.0        6.0            9.0    ...         EOWPVT   

     academic_year_start    tech_class  age_year  non_profound  deg_hl_below6  \
0                 2002.0       Bimodal       4.0         False            0.0   
124               2014.0  Bilateral HA       4.0          True            1.0   
205               2013.0       Bimodal       4.0         False            0.0   
213               2012.0       Bimodal       4.0         False            0.0   
219               2012.0       Bimodal       4.0         False            0.0   

     int_outside_option  mother_hs  jcih  age_years  
0                   0.0        NaN   0.0   4.333333  
124                 0.0        NaN   0.0   4.833333  
205                 1.0        1.0   1.0   4.916667  
213                 1.0        1.0   1.0   4.583333  
219                 0.0        NaN   0.0   4.583333  

[5 rows x 82 columns]
In [78]:
mean_score = expressive_vocab_dataset.groupby('study_id')['score'].mean()
expressive_vocab_dataset = expressive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
expressive_vocab_dataset['mean_score'] = mean_score
In [79]:
expressive_vocab_dataset[covs].isnull().sum()
Out[79]:
age_years          7
school             0
score              0
male               0
family_inv       216
sib               75
deg_hl_below6    178
mother_hs        426
synd_or_disab    138
jcih             395
dtype: int64
In [80]:
expressive_vocab_dataset = expressive_vocab_dataset[expressive_vocab_dataset.age.notnull() 
                                                    & expressive_vocab_dataset.male.notnull()
                                                    & expressive_vocab_dataset.int_outside_option.notnull()]
In [81]:
expressive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [82]:
M_expressive_vocab = MCMC(generate_model(expressive_vocab_dataset), db='sqlite', name='expressive_vocab', dbmode='w')
In [83]:
M_expressive_vocab.sample(iterations, burn)
 [-----------------100%-----------------] 100001 of 100000 complete in 904.7 sec
In [84]:
Matplot.summary_plot([M_expressive_vocab.beta], 
        custom_labels=labels, rhat=False, main='Expressive Vocabulary', x_range=(-10,10))
In [85]:
M_expressive_vocab.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.355           0.439            0.025            [-4.241 -2.531]
	1.485            1.041            0.08             [-0.636  3.439]
	-5.252           0.538            0.034            [-6.367 -4.273]
	-1.018           0.581            0.038            [-2.112  0.148]
	6.463            1.066            0.082            [ 4.524  8.605]
	6.901            1.395            0.119            [ 4.176  9.645]
	-5.977           1.185            0.092            [-8.412 -3.733]
	3.873            1.264            0.106            [ 1.431  6.319]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-4.233           -3.644          -3.35          -3.064        -2.502
	-0.421           0.783           1.417          2.176         3.678
	-6.324           -5.612          -5.235         -4.88         -4.212
	-2.112           -1.42           -1.02          -0.638        0.148
	4.534            5.768           6.422          7.173         8.649
	4.191            5.925           6.939          7.851         9.713
	-8.294           -6.781          -5.967         -5.18         -3.58
	1.527            3.003           3.844          4.699         6.579
	
In [86]:
Matplot.summary_plot(M_explang.alpha_school, custom_labels=[' ']*42, main='Expressive language school effects', 
                     x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.
In [87]:
Matplot.summary_plot(M_expressive_vocab.alpha_school, custom_labels=[' ']*42, main='Expressive vocabulary school effects',
                     x_range=(-25, 25))
Could not calculate Gelman-Rubin statistics. Requires multiple chains of equal length.

Receptive Vocabulary Model

In [88]:
receptive_vocab_dataset = lsl_dr[(lsl_dr.domain=='Receptive Vocabulary') & 
                                     (lsl_dr.non_english==False)]
receptive_vocab_dataset.head()
Out[88]:
            redcap_event_name academic_year   hl  male  _race  prim_lang  sib  \
125  initial_assessment_arm_1     2014-2015  0.0   0.0    6.0        0.0  2.0   
206  initial_assessment_arm_1     2013-2014  0.0   0.0    0.0        0.0  1.0   
214  initial_assessment_arm_1     2012-2013  0.0   1.0    0.0        0.0  2.0   
217  initial_assessment_arm_1     2012-2013  0.0   0.0    0.0        0.0  1.0   
220  initial_assessment_arm_1     2012-2013  0.0   0.0    3.0        0.0  0.0   

     _mother_ed  father_ed  premature_age    ...      test_type  \
125         NaN        NaN            8.0    ...           PPVT   
206         4.0        5.0            8.0    ...         ROWPVT   
214         4.0        5.0            8.0    ...         ROWPVT   
217         5.0        5.0            8.0    ...         ROWPVT   
220         6.0        6.0            9.0    ...           PPVT   

     academic_year_start    tech_class  age_year  non_profound  deg_hl_below6  \
125               2014.0  Bilateral HA       4.0          True            1.0   
206               2013.0       Bimodal       4.0         False            0.0   
214               2012.0       Bimodal       4.0         False            0.0   
217               2012.0  Bilateral HA       4.0          True            1.0   
220               2012.0       Bimodal       4.0         False            0.0   

     int_outside_option  mother_hs  jcih  age_years  
125                 0.0        NaN   0.0   4.833333  
206                 1.0        1.0   1.0   4.916667  
214                 1.0        1.0   1.0   4.583333  
217                 1.0        1.0   1.0   4.083333  
220                 0.0        NaN   0.0   4.583333  

[5 rows x 82 columns]
In [89]:
mean_score = receptive_vocab_dataset.groupby('study_id')['score'].mean()
receptive_vocab_dataset = receptive_vocab_dataset.drop_duplicates('study_id').set_index('study_id')
receptive_vocab_dataset['mean_score'] = mean_score
In [90]:
receptive_vocab_dataset[covs].isnull().sum()
Out[90]:
age_years          7
school             0
score              0
male               0
family_inv       220
sib               74
deg_hl_below6    180
mother_hs        430
synd_or_disab    135
jcih             399
dtype: int64
In [91]:
receptive_vocab_dataset = receptive_vocab_dataset[receptive_vocab_dataset.age.notnull() & 
                                                  receptive_vocab_dataset.male.notnull() &
                                                  receptive_vocab_dataset.int_outside_option.notnull()]
In [92]:
receptive_vocab_dataset.age_int.hist(bins=40)
Out[92]:
<matplotlib.axes._subplots.AxesSubplot at 0x10919c358>
In [93]:
receptive_vocab_dataset.shape
Out[93]:
(1171, 82)
In [94]:
receptive_vocab_dataset.score.hist(grid=False, bins=25, figsize=(10, 6))
plt.xlabel('Standard Scores'); plt.ylabel('Frequency');
In [95]:
M_receptive_vocab = MCMC(generate_model(receptive_vocab_dataset), db='sqlite', name='receptive_vocab', dbmode='w')
In [96]:
M_receptive_vocab.sample(iterations, burn)
 [-----------------100%-----------------] 100000 of 100000 complete in 900.8 sec
In [97]:
#order = np.argsort(M.beta.stats()['mean'])[::-1]
Matplot.summary_plot([M_receptive_vocab.beta], rhat=False,
        custom_labels=labels, main='Receptive Vocabulary', x_range=(-10,10))
In [98]:
M_receptive_vocab.beta.summary()
beta:
 
	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	-3.091           0.435            0.024            [-3.933 -2.28 ]
	0.654            0.989            0.077            [-1.31   2.667]
	-4.541           0.493            0.029            [-5.534 -3.615]
	-0.837           0.538            0.032            [-1.857  0.228]
	5.958            1.091            0.087            [ 3.857  8.068]
	5.369            1.374            0.122            [ 2.592  7.699]
	-7.058           1.147            0.09             [-9.313 -4.939]
	3.526            1.217            0.102            [ 1.36   5.848]
	
	
	Posterior quantiles:
	
	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	-3.909           -3.399          -3.089         -2.796        -2.245
	-1.31            0.032           0.62           1.278         2.667
	-5.521           -4.877          -4.542         -4.21         -3.592
	-1.834           -1.189          -0.869         -0.492        0.281
	3.715            5.228           6.008          6.687         7.928
	2.537            4.421           5.541          6.385         7.679
	-9.302           -7.85           -7.003         -6.224        -4.909
	1.368            2.644           3.487          4.395         5.907
	

Database queries

In [99]:
import sqlite3
In [100]:
db_files = !ls *sqlite
In [101]:
articulation = sqlite3.connect("articulation.sqlite")
In [102]:
c_artic = articulation.cursor()
In [103]:
c_artic.execute('SELECT v1 FROM sigma_school WHERE trace==0')
Out[103]:
<sqlite3.Cursor at 0x10d852e30>
In [104]:
sigma_school_artic = np.ravel(c_artic.fetchall())
In [105]:
sigma_school_artic.mean()
Out[105]:
5.4121976771999991
In [106]:
for dbname in db_files:
    db = sqlite3.connect(dbname)
    cur = db.cursor()
    cur.execute('SELECT v1 FROM sigma_school WHERE trace==0')
    trace = np.ravel(cur.fetchall())
    trace.sort()
    print('\n'+dbname)
    print(np.median(trace))
    print(trace[[int(len(trace)*0.025), int(len(trace)*0.975)]])
articulation.sqlite
5.3787495
[ 2.726391  8.125967]

explang.sqlite
5.638327
[ 4.07999   8.008275]

expressive_vocab.sqlite
8.387962
[  6.401984  11.66551 ]

receptive_vocab.sqlite
6.903193
[ 5.176159  9.135123]

reclang.sqlite
5.766925
[ 4.239235  7.87621 ]
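
As a cross-check, the posterior median and 95% interval for each model's sigma_school trace can also be computed with np.percentile instead of sorting the trace and indexing it by hand. This is a minimal sketch assuming the same .sqlite trace files and sigma_school table layout used above:

# Equivalent summary of the school-level SD using np.percentile
for dbname in db_files:
    db = sqlite3.connect(dbname)
    cur = db.cursor()
    cur.execute('SELECT v1 FROM sigma_school WHERE trace==0')
    trace = np.ravel(cur.fetchall())
    print('\n' + dbname)
    # 2.5th, 50th, and 97.5th percentiles of the posterior sample
    print(np.percentile(trace, [2.5, 50, 97.5]))
    db.close()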