This notebook contains a "one-day paper", my attempt to pose a research question, answer it, and publish the results in one work day.
Copyright 2016 Allen B. Downey
MIT License: https://opensource.org/licenses/MIT
from __future__ import print_function, division
import thinkstats2
import thinkplot
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
%matplotlib inline
According to Wikipedia, the Trivers-Willard hypothesis:
"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition)."
For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies have reported evidence for this hypothesis, but based on my very casual survey of the literature, the evidence is not persuasive.
To test whether the T-W hypothesis holds up in humans, I downloaded birth data for the nearly 4 million babies born in the U.S. in 2014.
I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio.
Summary of results
Running regressions with one variable at a time, I find that many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.
However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.
Contrary to other reports, the age of the parents seems to have no predictive power.
Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although prenatal visits seem like an obvious proxy for quality of health care and general socioeconomic status, the sign of the effect is the opposite of what T-W predicts: more prenatal visits strongly predict a lower sex ratio (more girls).
Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls.
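The conversion from counts to this ratio is simple arithmetic; using the 2014 totals that appear later in the notebook (2,045,902 boys and 1,952,273 girls):

```python
# sex ratio = boys per 100 girls, using the 2014 U.S. birth counts
boys, girls = 2045902, 1952273
ratio = round(100 * boys / girls)
assert ratio == 105
```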
Here's how I loaded the data:
names = ['year', 'mager9', 'mnativ', 'restatus', 'mbrace', 'mhisp_r',
'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc',
'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']
colspecs = [(9, 12),
(79, 79),
(84, 84),
(104, 104),
(110, 110),
(115, 115),
(119, 119),
(120, 120),
(124, 124),
(149, 150),
(156, 156),
(160, 160),
(163, 163),
(179, 179),
(242, 243),
(251, 251),
(280, 281),
(287, 287),
(436, 436),
(475, 475),
]
colspecs = [(start-1, end) for start, end in colspecs]
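The positions in the NCHS data dictionary are 1-based and inclusive, while `read_fwf` expects 0-based, half-open intervals; that is what the adjustment above does. A minimal illustration with a made-up fixed-width record:

```python
# 1-based inclusive columns (9, 12) become the 0-based half-open slice [8:12]
record = "XXXXXXXX2014XXXX"   # hypothetical record with the year in positions 9-12
start, end = 9, 12
assert record[start - 1:end] == "2014"
```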
df = None
filename = 'Nat2014PublicUS.c20150514.r20151022.txt.gz'
#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)
#df.head()
# store the dataframe for faster loading
#store = pd.HDFStore('store.h5')
#store['births2014'] = df
#store.close()
# load the dataframe
store = pd.HDFStore('store.h5')
df = store['births2014']
store.close()
def series_to_ratio(series):
    """Takes a boolean series and computes sex ratio (boys per 100 girls)."""
    boys = np.mean(series)
    return np.round(100 * boys / (1 - boys)).astype(int)
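As a self-contained sanity check (re-stating the function so it runs on its own), a hypothetical series with 105 boys and 100 girls should yield a ratio of exactly 105:

```python
import numpy as np
import pandas as pd

def series_to_ratio(series):
    """Takes a boolean series and computes sex ratio (boys per 100 girls)."""
    boys = np.mean(series)
    return np.round(100 * boys / (1 - boys)).astype(int)

# made-up data: 105 boys, 100 girls
s = pd.Series([True] * 105 + [False] * 100)
assert series_to_ratio(s) == 105
```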
I have to recode sex as 0 or 1 to make logit happy.
df['boy'] = (df.sex=='M').astype(int)
df.boy.value_counts().sort_index()
0 1952273 1 2045902 Name: boy, dtype: int64
All births are from 2014.
df.year.value_counts().sort_index()
2014 3998175 Name: year, dtype: int64
Mother's age:
df.mager9.value_counts().sort_index()
1 2777 2 249581 3 884246 4 1148469 5 1084064 6 510214 7 110318 8 7750 9 756 Name: mager9, dtype: int64
var = 'mager9'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mager9 | |
1 | 109 |
2 | 105 |
3 | 105 |
4 | 105 |
5 | 105 |
6 | 105 |
7 | 104 |
8 | 104 |
9 | 102 |
df.mager9.isnull().mean()
0.0
df['youngm'] = df.mager9<=2
df['oldm'] = df.mager9>=7
df.youngm.mean(), df.oldm.mean()
(0.06311829772333627, 0.029719559549044251)
Mother's nativity (1 = born in the U.S.)
df.mnativ.replace([3], np.nan, inplace=True)
df.mnativ.value_counts().sort_index()
1 3106689 2 881662 Name: mnativ, dtype: int64
var = 'mnativ'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mnativ | |
1 | 105 |
2 | 105 |
Residence status (1=resident)
df.restatus.value_counts().sort_index()
1 2873404 2 1025766 3 88906 4 10099 Name: restatus, dtype: int64
var = 'restatus'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
restatus | |
1 | 105 |
2 | 105 |
3 | 106 |
4 | 106 |
Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)
df.mbrace.value_counts().sort_index()
1 3029013 2 641089 3 44962 4 283111 Name: mbrace, dtype: int64
var = 'mbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mbrace | |
1 | 105 |
2 | 103 |
3 | 103 |
4 | 106 |
Mother's Hispanic origin (0=Non-Hispanic)
df.mhisp_r.replace([9], np.nan, inplace=True)
df.mhisp_r.value_counts().sort_index()
0 3045419 1 553738 2 69894 3 20165 4 136785 5 141497 Name: mhisp_r, dtype: int64
def copy_null(df, oldvar, newvar):
    """Propagate missing values from oldvar into the derived column newvar."""
    df.loc[df[oldvar].isnull(), newvar] = np.nan
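This helper is needed because a comparison like `df.mhisp_r > 0` silently turns NaN into False; `copy_null` restores the missingness in the derived column. A small self-contained sketch with made-up data:

```python
import numpy as np
import pandas as pd

def copy_null(df, oldvar, newvar):
    """Propagate missing values from oldvar into the derived column newvar."""
    df.loc[df[oldvar].isnull(), newvar] = np.nan

df = pd.DataFrame({'mhisp_r': [0.0, 2.0, np.nan]})
df['mhisp'] = df.mhisp_r > 0     # NaN compares as False here
copy_null(df, 'mhisp_r', 'mhisp')

assert bool(df.mhisp.iloc[1])         # true value preserved
assert df.mhisp.isnull().iloc[2]      # missingness restored
```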
df['mhisp'] = df.mhisp_r > 0
copy_null(df, 'mhisp_r', 'mhisp')
df.mhisp.isnull().mean(), df.mhisp.mean()
(0.0076727506925034546, 0.23240818268843488)
var = 'mhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mhisp | |
0 | 105 |
1 | 104 |
Marital status (1=Married)
df.dmar.value_counts().sort_index()
1 2390630 2 1607545 Name: dmar, dtype: int64
var = 'dmar'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
dmar | |
1 | 105 |
2 | 104 |
Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).
I recode X (not applicable because married) as Y (paternity acknowledged).
df.mar_p.replace(['U'], np.nan, inplace=True)
df.mar_p.replace(['X'], 'Y', inplace=True)
df.mar_p.value_counts().sort_index()
N 462627 Y 3386542 Name: mar_p, dtype: int64
var = 'mar_p'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mar_p | |
N | 103 |
Y | 105 |
Mother's education level
df.meduc.replace([9], np.nan, inplace=True)
df.meduc.value_counts().sort_index()
1 138589 2 437081 3 957265 4 815688 5 308384 6 732661 7 326800 8 94057 Name: meduc, dtype: int64
var = 'meduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
meduc | |
1 | 104 |
2 | 104 |
3 | 105 |
4 | 105 |
5 | 105 |
6 | 105 |
7 | 105 |
8 | 104 |
df['lowed'] = df.meduc <= 2
copy_null(df, 'meduc', 'lowed')
df.lowed.isnull().mean(), df.lowed.mean()
(0.046933913598079122, 0.15107367095085322)
Father's age, in 10 ranges
df.fagerrec11.replace([11], np.nan, inplace=True)
df.fagerrec11.value_counts().sort_index()
1 277 2 84852 3 498779 4 869280 5 1025631 6 631685 7 262169 8 87432 9 28465 10 12490 Name: fagerrec11, dtype: int64
var = 'fagerrec11'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
fagerrec11 | |
1 | 102 |
2 | 106 |
3 | 106 |
4 | 105 |
5 | 105 |
6 | 105 |
7 | 105 |
8 | 105 |
9 | 104 |
10 | 109 |
df['youngf'] = df.fagerrec11<=2
copy_null(df, 'fagerrec11', 'youngf')
df.youngf.isnull().mean(), df.youngf.mean()
(0.12433547806186572, 0.024315207394332003)
df['oldf'] = df.fagerrec11>=8
copy_null(df, 'fagerrec11', 'oldf')
df.oldf.isnull().mean(), df.oldf.mean()
(0.12433547806186572, 0.036670893957829916)
Father's race
df.fbrace.replace([9], np.nan, inplace=True)
df.fbrace.value_counts().sort_index()
1 2497901 2 482433 3 35408 4 238394 Name: fbrace, dtype: int64
var = 'fbrace'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
fbrace | |
1 | 105 |
2 | 103 |
3 | 103 |
4 | 107 |
Father's Hispanic origin (0=Non-Hispanic; other values indicate country or region of origin)
df.fhisp_r.replace([9], np.nan, inplace=True)
df.fhisp_r.value_counts().sort_index()
0 2649007 1 493497 2 59137 3 19128 4 108111 5 124172 Name: fhisp_r, dtype: int64
df['fhisp'] = df.fhisp_r > 0
copy_null(df, 'fhisp_r', 'fhisp')
df.fhisp.isnull().mean(), df.fhisp.mean()
(0.13634295647389122, 0.23285053338322156)
var = 'fhisp'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
fhisp | |
0 | 105 |
1 | 104 |
Father's education level
df.feduc.replace([9], np.nan, inplace=True)
df.feduc.value_counts().sort_index()
1 141654 2 342061 3 951980 4 643118 5 232622 6 616187 7 242022 8 109482 Name: feduc, dtype: int64
var = 'feduc'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
feduc | |
1 | 104 |
2 | 105 |
3 | 105 |
4 | 105 |
5 | 106 |
6 | 105 |
7 | 105 |
8 | 105 |
Live birth order.
df.lbo_rec.replace([9], np.nan, inplace=True)
df.lbo_rec.value_counts().sort_index()
1 1555006 2 1270496 3 669016 4 284435 5 110708 6 46093 7 20786 8 21610 Name: lbo_rec, dtype: int64
var = 'lbo_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
lbo_rec | |
1 | 105 |
2 | 105 |
3 | 105 |
4 | 105 |
5 | 104 |
6 | 104 |
7 | 104 |
8 | 102 |
df['highbo'] = df.lbo_rec >= 5
copy_null(df, 'lbo_rec', 'highbo')
df.highbo.isnull().mean(), df.highbo.mean()
(0.0050085351441595226, 0.050072772519889897)
Number of prenatal visits, in 11 ranges
df.previs_rec.replace([12], np.nan, inplace=True)
df.previs_rec.value_counts().sort_index()
1 59670 2 44923 3 98141 4 201032 5 366887 6 826908 7 998330 8 684997 9 379305 10 99067 11 128805 Name: previs_rec, dtype: int64
df.previs_rec.mean()
df['previs'] = df.previs_rec - 7  # center near the mean so the intercept is easier to interpret
var = 'previs'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
previs | |
-6 | 105 |
-5 | 107 |
-4 | 107 |
-3 | 108 |
-2 | 107 |
-1 | 106 |
0 | 105 |
1 | 103 |
2 | 102 |
3 | 102 |
4 | 102 |
df['no_previs'] = df.previs_rec <= 1
copy_null(df, 'previs_rec', 'no_previs')
df.no_previs.isnull().mean(), df.no_previs.mean()
(0.027540065154726845, 0.015346965650008423)
Whether the mother received WIC benefits (the Special Supplemental Nutrition Program for Women, Infants, and Children)
df.wic.replace(['U'], np.nan, inplace=True)
df.wic.value_counts().sort_index()
N 2124143 Y 1634978 Name: wic, dtype: int64
var = 'wic'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
wic | |
N | 105 |
Y | 104 |
Mother's height in inches
df.height.replace([99], np.nan, inplace=True)
df.height.value_counts().sort_index()
30 28 31 1 34 2 36 14 37 7 38 7 39 7 40 6 41 10 42 13 43 3 44 8 45 11 46 14 47 22 48 857 49 544 50 357 51 422 52 493 53 1503 54 1414 55 2762 56 6678 57 18359 58 21019 59 81588 60 209490 61 269142 62 474306 63 485840 64 559249 65 453503 66 429253 67 334485 68 189690 69 127789 70 62364 71 33428 72 15323 73 5200 74 2538 75 1019 76 590 77 593 78 941 Name: height, dtype: int64
df['mshort'] = df.height<60
copy_null(df, 'height', 'mshort')
df.mshort.isnull().mean(), df.mshort.mean()
(0.051844404009329256, 0.0359147662344377)
df['mtall'] = df.height>=70
copy_null(df, 'height', 'mtall')
df.mtall.isnull().mean(), df.mtall.mean()
(0.051844404009329256, 0.03218134412692316)
var = 'mshort'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mshort | |
0 | 105 |
1 | 104 |
var = 'mtall'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
mtall | |
0 | 105 |
1 | 104 |
Mother's BMI in 6 ranges
df.bmi_r.replace([9], np.nan, inplace=True)
df.bmi_r.value_counts().sort_index()
1 140142 2 1702519 3 949075 4 506017 5 242957 6 168515 Name: bmi_r, dtype: int64
var = 'bmi_r'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
bmi_r | |
1 | 105 |
2 | 105 |
3 | 105 |
4 | 104 |
5 | 104 |
6 | 104 |
df['obese'] = df.bmi_r >= 4
copy_null(df, 'bmi_r', 'obese')
df.obese.isnull().mean(), df.obese.mean()
(0.07227047340349034, 0.2473532880857861)
Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)
df.pay_rec.replace([9], np.nan, inplace=True)
df.pay_rec.value_counts().sort_index()
1 1665161 2 1824151 3 162650 4 167806 Name: pay_rec, dtype: int64
var = 'pay_rec'
df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
boy | |
---|---|
pay_rec | |
1 | 104 |
2 | 105 |
3 | 107 |
4 | 105 |
Sex of baby
df.sex.value_counts().sort_index()
F 1952273 M 2045902 Name: sex, dtype: int64
Here are some functions I'll use to interpret the results of logistic regression:
def logodds_to_ratio(logodds):
    """Convert log odds to sex ratio (boys per 100 girls)."""
    odds = np.exp(logodds)
    return 100 * odds

def summarize(results):
    """Summarize parameters in terms of sex ratio."""
    inter_lo = results.params['Intercept']
    inter_rat = logodds_to_ratio(inter_lo)
    for value, lor in results.params.items():
        if value == 'Intercept':
            continue
        rat = logodds_to_ratio(inter_lo + lor)
        code = '*' if results.pvalues[value] < 0.05 else ' '
        print('%-20s %0.1f %0.1f' % (value, inter_rat, rat), code)
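As a quick check of the conversion (re-stating the helper so it runs on its own): an intercept log-odds of 0.0496, the value reported for the first model below, corresponds to a ratio of about 105.1.

```python
import numpy as np

def logodds_to_ratio(logodds):
    """Convert log odds to sex ratio (boys per 100 girls)."""
    return 100 * np.exp(logodds)

ratio = logodds_to_ratio(0.0496)
assert abs(ratio - 105.1) < 0.05
```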
Now I'll run models with each variable, one at a time.
Mother's age seems to have no predictive value:
model = smf.logit('boy ~ mager9', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692873 Iterations 3 mager9 105.1 105.0
Dep. Variable: | boy | No. Observations: | 3998175 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3998173 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.129e-07 |
Time: | 14:18:28 | Log-Likelihood: | -2.7702e+06 |
converged: | True | LL-Null: | -2.7702e+06 |
LLR p-value: | 0.4290 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0496 | 0.004 | 13.550 | 0.000 | 0.042 0.057 |
mager9 | -0.0007 | 0.001 | -0.791 | 0.429 | -0.002 0.001 |
The estimated ratio for young mothers is higher, and the ratio for older mothers is lower, but neither effect is statistically significant.
model = smf.logit('boy ~ youngm + oldm', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692873 Iterations 3 youngm[T.True] 104.8 104.9 oldm[T.True] 104.8 103.9
Dep. Variable: | boy | No. Observations: | 3998175 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3998172 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 3.813e-07 |
Time: | 14:18:33 | Log-Likelihood: | -2.7702e+06 |
converged: | True | LL-Null: | -2.7702e+06 |
LLR p-value: | 0.3478 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0470 | 0.001 | 44.772 | 0.000 | 0.045 0.049 |
youngm[T.True] | 0.0010 | 0.004 | 0.240 | 0.810 | -0.007 0.009 |
oldm[T.True] | -0.0084 | 0.006 | -1.421 | 0.155 | -0.020 0.003 |
Whether the mother was born in the U.S. has no predictive value.
model = smf.logit('boy ~ C(mnativ)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692873 Iterations 3 C(mnativ)[T.2.0] 104.8 104.9
Dep. Variable: | boy | No. Observations: | 3988351 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3988349 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 4.566e-08 |
Time: | 14:19:00 | Log-Likelihood: | -2.7634e+06 |
converged: | True | LL-Null: | -2.7634e+06 |
LLR p-value: | 0.6154 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0466 | 0.001 | 41.050 | 0.000 | 0.044 0.049 |
C(mnativ)[T.2.0] | 0.0012 | 0.002 | 0.502 | 0.615 | -0.004 0.006 |
Neither does residence status.
model = smf.logit('boy ~ C(restatus)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692872 Iterations 3 C(restatus)[T.2] 104.8 104.7 C(restatus)[T.3] 104.8 106.0 C(restatus)[T.4] 104.8 106.2
Dep. Variable: | boy | No. Observations: | 3998175 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3998171 |
Method: | MLE | Df Model: | 3 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 6.716e-07 |
Time: | 14:19:28 | Log-Likelihood: | -2.7702e+06 |
converged: | True | LL-Null: | -2.7702e+06 |
LLR p-value: | 0.2932 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0468 | 0.001 | 39.653 | 0.000 | 0.044 0.049 |
C(restatus)[T.2] | -0.0010 | 0.002 | -0.418 | 0.676 | -0.005 0.004 |
C(restatus)[T.3] | 0.0117 | 0.007 | 1.718 | 0.086 | -0.002 0.025 |
C(restatus)[T.4] | 0.0132 | 0.020 | 0.663 | 0.507 | -0.026 0.052 |
Mother's race seems to have predictive value. Relative to whites, black and Native American mothers have more girls; Asians have more boys.
model = smf.logit('boy ~ C(mbrace)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692863 Iterations 3 C(mbrace)[T.2] 105.1 102.9 * C(mbrace)[T.3] 105.1 103.1 * C(mbrace)[T.4] 105.1 106.3 *
Dep. Variable: | boy | No. Observations: | 3998175 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3998171 |
Method: | MLE | Df Model: | 3 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.401e-05 |
Time: | 14:19:55 | Log-Likelihood: | -2.7702e+06 |
converged: | True | LL-Null: | -2.7702e+06 |
LLR p-value: | 1.007e-16 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0497 | 0.001 | 43.250 | 0.000 | 0.047 0.052 |
C(mbrace)[T.2] | -0.0214 | 0.003 | -7.770 | 0.000 | -0.027 -0.016 |
C(mbrace)[T.3] | -0.0195 | 0.010 | -2.049 | 0.041 | -0.038 -0.001 |
C(mbrace)[T.4] | 0.0109 | 0.004 | 2.777 | 0.005 | 0.003 0.019 |
Hispanic mothers have more girls.
model = smf.logit('boy ~ mhisp', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692874 Iterations 3 mhisp 105.0 104.1 *
Dep. Variable: | boy | No. Observations: | 3967498 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3967496 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.998e-06 |
Time: | 14:19:59 | Log-Likelihood: | -2.7490e+06 |
converged: | True | LL-Null: | -2.7490e+06 |
LLR p-value: | 0.0009174 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0485 | 0.001 | 42.263 | 0.000 | 0.046 0.051 |
mhisp | -0.0079 | 0.002 | -3.315 | 0.001 | -0.013 -0.003 |
If the mother is married, or unmarried with paternity acknowledged, the sex ratio is higher (more boys).
model = smf.logit('boy ~ C(mar_p)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692864 Iterations 3 C(mar_p)[T.Y] 102.8 105.1 *
Dep. Variable: | boy | No. Observations: | 3849169 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3849167 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 9.129e-06 |
Time: | 14:20:27 | Log-Likelihood: | -2.6670e+06 |
converged: | True | LL-Null: | -2.6670e+06 |
LLR p-value: | 2.990e-12 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0278 | 0.003 | 9.446 | 0.000 | 0.022 0.034 |
C(mar_p)[T.Y] | 0.0219 | 0.003 | 6.978 | 0.000 | 0.016 0.028 |
Being unmarried predicts more girls.
model = smf.logit('boy ~ C(dmar)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692871 Iterations 3 C(dmar)[T.2] 105.1 104.3 *
Dep. Variable: | boy | No. Observations: | 3998175 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3998173 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 3.001e-06 |
Time: | 14:20:54 | Log-Likelihood: | -2.7702e+06 |
converged: | True | LL-Null: | -2.7702e+06 |
LLR p-value: | 4.555e-05 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0502 | 0.001 | 38.789 | 0.000 | 0.048 0.053 |
C(dmar)[T.2] | -0.0083 | 0.002 | -4.077 | 0.000 | -0.012 -0.004 |
Each level of mother's education predicts a small increase in the probability of a boy.
model = smf.logit('boy ~ meduc', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692874 Iterations 3 meduc 104.1 104.2 *
Dep. Variable: | boy | No. Observations: | 3810525 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3810523 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.416e-06 |
Time: | 14:20:59 | Log-Likelihood: | -2.6402e+06 |
converged: | True | LL-Null: | -2.6402e+06 |
LLR p-value: | 0.006248 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0398 | 0.003 | 14.711 | 0.000 | 0.034 0.045 |
meduc | 0.0016 | 0.001 | 2.734 | 0.006 | 0.000 0.003 |
model = smf.logit('boy ~ lowed', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692874 Iterations 3 lowed 104.9 104.1 *
Dep. Variable: | boy | No. Observations: | 3810525 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3810523 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.431e-06 |
Time: | 14:21:03 | Log-Likelihood: | -2.6402e+06 |
converged: | True | LL-Null: | -2.6402e+06 |
LLR p-value: | 0.005983 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0478 | 0.001 | 43.002 | 0.000 | 0.046 0.050 |
lowed | -0.0079 | 0.003 | -2.749 | 0.006 | -0.013 -0.002 |
Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance).
model = smf.logit('boy ~ fagerrec11', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692840 Iterations 3 fagerrec11 105.9 105.7 *
Dep. Variable: | boy | No. Observations: | 3501060 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3501058 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.226e-07 |
Time: | 14:21:08 | Log-Likelihood: | -2.4257e+06 |
converged: | True | LL-Null: | -2.4257e+06 |
LLR p-value: | 0.04575 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0570 | 0.004 | 14.707 | 0.000 | 0.049 0.065 |
fagerrec11 | -0.0015 | 0.001 | -1.998 | 0.046 | -0.003 -2.9e-05 |
model = smf.logit('boy ~ youngf + oldf', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692840 Iterations 3 youngf 105.1 106.3 oldf 105.1 105.0
Dep. Variable: | boy | No. Observations: | 3501060 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3501057 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 5.807e-07 |
Time: | 14:21:12 | Log-Likelihood: | -2.4257e+06 |
converged: | True | LL-Null: | -2.4257e+06 |
LLR p-value: | 0.2445 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0493 | 0.001 | 44.656 | 0.000 | 0.047 0.051 |
youngf | 0.0116 | 0.007 | 1.673 | 0.094 | -0.002 0.025 |
oldf | -0.0005 | 0.006 | -0.086 | 0.932 | -0.012 0.011 |
Predictions based on father's race are similar to those based on mother's race: more girls for black and Native American fathers; more boys for Asian fathers.
model = smf.logit('boy ~ C(fbrace)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692818 Iterations 3 C(fbrace)[T.2.0] 105.5 103.1 * C(fbrace)[T.3.0] 105.5 102.9 * C(fbrace)[T.4.0] 105.5 106.6 *
Dep. Variable: | boy | No. Observations: | 3254136 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3254132 |
Method: | MLE | Df Model: | 3 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.504e-05 |
Time: | 14:21:38 | Log-Likelihood: | -2.2545e+06 |
converged: | True | LL-Null: | -2.2546e+06 |
LLR p-value: | 1.256e-14 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0533 | 0.001 | 42.144 | 0.000 | 0.051 0.056 |
C(fbrace)[T.2.0] | -0.0227 | 0.003 | -7.221 | 0.000 | -0.029 -0.017 |
C(fbrace)[T.3.0] | -0.0250 | 0.011 | -2.335 | 0.020 | -0.046 -0.004 |
C(fbrace)[T.4.0] | 0.0106 | 0.004 | 2.479 | 0.013 | 0.002 0.019 |
If the father is Hispanic, that predicts more girls.
model = smf.logit('boy ~ fhisp', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692839 Iterations 3 fhisp 105.4 104.0 *
Dep. Variable: | boy | No. Observations: | 3453052 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3453050 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 5.800e-06 |
Time: | 14:21:42 | Log-Likelihood: | -2.3924e+06 |
converged: | True | LL-Null: | -2.3924e+06 |
LLR p-value: | 1.378e-07 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0525 | 0.001 | 42.696 | 0.000 | 0.050 0.055 |
fhisp | -0.0134 | 0.003 | -5.268 | 0.000 | -0.018 -0.008 |
Father's education level might predict more boys, but the apparent effect could be due to chance.
model = smf.logit('boy ~ feduc', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692840 Iterations 3 feduc 104.6 104.7
Dep. Variable: | boy | No. Observations: | 3279126 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3279124 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.046e-07 |
Time: | 14:21:46 | Log-Likelihood: | -2.2719e+06 |
converged: | True | LL-Null: | -2.2719e+06 |
LLR p-value: | 0.05587 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0445 | 0.003 | 15.630 | 0.000 | 0.039 0.050 |
feduc | 0.0012 | 0.001 | 1.912 | 0.056 | -3.02e-05 0.002 |
Babies with high birth order are slightly more likely to be girls.
model = smf.logit('boy ~ lbo_rec', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692872 Iterations 3 lbo_rec 105.3 105.1 *
Dep. Variable: | boy | No. Observations: | 3978150 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3978148 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.576e-06 |
Time: | 14:21:51 | Log-Likelihood: | -2.7563e+06 |
converged: | True | LL-Null: | -2.7564e+06 |
LLR p-value: | 0.003206 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0518 | 0.002 | 26.529 | 0.000 | 0.048 0.056 |
lbo_rec | -0.0023 | 0.001 | -2.947 | 0.003 | -0.004 -0.001 |
model = smf.logit('boy ~ highbo', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692872 Iterations 3 highbo 104.9 103.4 *
Dep. Variable: | boy | No. Observations: | 3978150 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3978148 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.647e-06 |
Time: | 14:21:56 | Log-Likelihood: | -2.7563e+06 |
converged: | True | LL-Null: | -2.7564e+06 |
LLR p-value: | 0.002584 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0475 | 0.001 | 46.200 | 0.000 | 0.046 0.050 |
highbo | -0.0139 | 0.005 | -3.013 | 0.003 | -0.023 -0.005 |
Strangely, prenatal visits are associated with an increased probability of girls.
model = smf.logit('boy ~ previs', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692847 Iterations 3 previs 104.6 103.8 *
Dep. Variable: | boy | No. Observations: | 3888065 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3888063 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 3.975e-05 |
Time: | 14:22:01 | Log-Likelihood: | -2.6938e+06 |
converged: | True | LL-Null: | -2.6939e+06 |
LLR p-value: | 1.677e-48 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0449 | 0.001 | 43.933 | 0.000 | 0.043 0.047 |
previs | -0.0079 | 0.001 | -14.634 | 0.000 | -0.009 -0.007 |
The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits.
model = smf.logit('boy ~ no_previs + previs', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692842 Iterations 3 no_previs 104.6 98.9 * previs 104.6 103.7 *
Dep. Variable: | boy | No. Observations: | 3888065 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3888062 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 4.717e-05 |
Time: | 14:22:07 | Log-Likelihood: | -2.6938e+06 |
converged: | True | LL-Null: | -2.6939e+06 |
LLR p-value: | 6.538e-56 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0454 | 0.001 | 44.310 | 0.000 | 0.043 0.047 |
no_previs | -0.0564 | 0.009 | -6.322 | 0.000 | -0.074 -0.039 |
previs | -0.0093 | 0.001 | -15.938 | 0.000 | -0.010 -0.008 |
If the mother received WIC benefits, she is more likely to have a girl.
model = smf.logit('boy ~ wic', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692869 Iterations 3 wic[T.Y] 105.2 104.3 *
Dep. Variable: | boy | No. Observations: | 3759121 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3759119 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 3.051e-06 |
Time: | 14:22:35 | Log-Likelihood: | -2.6046e+06 |
converged: | True | LL-Null: | -2.6046e+06 |
LLR p-value: | 6.700e-05 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0506 | 0.001 | 36.886 | 0.000 | 0.048 0.053 |
wic[T.Y] | -0.0083 | 0.002 | -3.987 | 0.000 | -0.012 -0.004 |
Mother's height seems to have no predictive value.
model = smf.logit('boy ~ height', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692873 Iterations 3 height 102.4 102.5
Dep. Variable: | boy | No. Observations: | 3790892 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3790890 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.853e-07 |
Time: | 14:22:39 | Log-Likelihood: | -2.6266e+06 |
converged: | True | LL-Null: | -2.6266e+06 |
LLR p-value: | 0.3238 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0240 | 0.023 | 1.038 | 0.299 | -0.021 0.069 |
height | 0.0004 | 0.000 | 0.987 | 0.324 | -0.000 0.001 |
model = smf.logit('boy ~ mtall + mshort', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692872 Iterations 3 mtall 104.8 104.1 mshort 104.8 104.3
Dep. Variable: | boy | No. Observations: | 3790892 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3790889 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 4.560e-07 |
Time: | 14:22:43 | Log-Likelihood: | -2.6266e+06 |
converged: | True | LL-Null: | -2.6266e+06 |
LLR p-value: | 0.3019 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | 0.0473 | 0.001 | 44.433 | 0.000 | 0.045 0.049 |
mtall | -0.0071 | 0.006 | -1.212 | 0.226 | -0.018 0.004 |
mshort | -0.0056 | 0.006 | -1.005 | 0.315 | -0.016 0.005 |
Mothers with higher BMI are more likely to have girls.
model = smf.logit('boy ~ bmi_r', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully. Current function value: 0.692870 Iterations 3 bmi_r 105.7 105.4 *
Dep. Variable: | boy | No. Observations: | 3709225 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3709223 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.168e-06 |
Time: | 14:22:48 | Log-Likelihood: | -2.5700e+06 |
converged: | True | LL-Null: | -2.5700e+06 |
LLR p-value: | 0.0008442 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0554 | 0.003 | 20.336 | 0.000 | 0.050 0.061 |
bmi_r | -0.0029 | 0.001 | -3.338 | 0.001 | -0.005 -0.001 |
model = smf.logit('boy ~ obese', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692870
Iterations 3

obese 105.0 104.2 *
Dep. Variable: | boy | No. Observations: | 3709225 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3709223 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.347e-06 |
Time: | 14:22:53 | Log-Likelihood: | -2.5700e+06 |
converged: | True | LL-Null: | -2.5700e+06 |
LLR p-value: | 0.0005139 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0491 | 0.001 | 40.976 | 0.000 | 0.047 0.051 |
obese | -0.0084 | 0.002 | -3.473 | 0.001 | -0.013 -0.004 |
If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance, self-payment, and other payment methods are associated with more boys.
model = smf.logit('boy ~ C(pay_rec)', data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692869
Iterations 3

C(pay_rec)[T.2.0] 104.2 105.1 *
C(pay_rec)[T.3.0] 104.2 106.6 *
C(pay_rec)[T.4.0] 104.2 104.7
Dep. Variable: | boy | No. Observations: | 3819768 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3819764 |
Method: | MLE | Df Model: | 3 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 5.306e-06 |
Time: | 14:23:19 | Log-Likelihood: | -2.6466e+06 |
converged: | True | LL-Null: | -2.6466e+06 |
LLR p-value: | 3.482e-06 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0416 | 0.002 | 26.840 | 0.000 | 0.039 0.045 |
C(pay_rec)[T.2.0] | 0.0085 | 0.002 | 3.982 | 0.000 | 0.004 0.013 |
C(pay_rec)[T.3.0] | 0.0222 | 0.005 | 4.272 | 0.000 | 0.012 0.032 |
C(pay_rec)[T.4.0] | 0.0047 | 0.005 | 0.925 | 0.355 | -0.005 0.015 |
However, none of the previous results should be taken too seriously. Because we tested only one variable at a time, many of these apparent effects disappear when we add control variables.
In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value.
formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692816
Iterations 3

C(fbrace)[T.2.0] 105.8 103.1 *
C(fbrace)[T.3.0] 105.8 103.5
C(fbrace)[T.4.0] 105.8 106.9
C(mbrace)[T.2] 105.8 105.9
C(mbrace)[T.3] 105.8 104.5
C(mbrace)[T.4] 105.8 105.6
fhisp 105.8 104.2 *
mhisp 105.8 106.0
Dep. Variable: | boy | No. Observations: | 3231530 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3231521 |
Method: | MLE | Df Model: | 8 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.087e-05 |
Time: | 14:24:08 | Log-Likelihood: | -2.2389e+06 |
converged: | True | LL-Null: | -2.2389e+06 |
LLR p-value: | 9.292e-17 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0566 | 0.001 | 38.234 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0260 | 0.006 | -4.668 | 0.000 | -0.037 -0.015 |
C(fbrace)[T.3.0] | -0.0221 | 0.012 | -1.793 | 0.073 | -0.046 0.002 |
C(fbrace)[T.4.0] | 0.0097 | 0.007 | 1.344 | 0.179 | -0.004 0.024 |
C(mbrace)[T.2] | 0.0004 | 0.006 | 0.075 | 0.940 | -0.011 0.012 |
C(mbrace)[T.3] | -0.0130 | 0.013 | -0.994 | 0.320 | -0.039 0.013 |
C(mbrace)[T.4] | -0.0026 | 0.007 | -0.375 | 0.708 | -0.016 0.011 |
fhisp | -0.0156 | 0.004 | -3.591 | 0.000 | -0.024 -0.007 |
mhisp | 0.0018 | 0.004 | 0.422 | 0.673 | -0.007 0.010 |
In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity.
formula = ('boy ~ C(fbrace) + fhisp + mar_p')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692814
Iterations 3

C(fbrace)[T.2.0] 108.2 105.5 *
C(fbrace)[T.3.0] 108.2 105.2 *
C(fbrace)[T.4.0] 108.2 109.1
mar_p[T.Y] 108.2 105.8
fhisp 108.2 106.7 *
Dep. Variable: | boy | No. Observations: | 3112362 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3112356 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.117e-05 |
Time: | 14:24:56 | Log-Likelihood: | -2.1563e+06 |
converged: | True | LL-Null: | -2.1563e+06 |
LLR p-value: | 3.558e-18 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0792 | 0.015 | 5.155 | 0.000 | 0.049 0.109 |
C(fbrace)[T.2.0] | -0.0258 | 0.003 | -7.860 | 0.000 | -0.032 -0.019 |
C(fbrace)[T.3.0] | -0.0283 | 0.011 | -2.594 | 0.009 | -0.050 -0.007 |
C(fbrace)[T.4.0] | 0.0074 | 0.004 | 1.662 | 0.097 | -0.001 0.016 |
mar_p[T.Y] | -0.0225 | 0.015 | -1.464 | 0.143 | -0.053 0.008 |
fhisp | -0.0148 | 0.003 | -4.982 | 0.000 | -0.021 -0.009 |
Being married still predicts more boys.
formula = ('boy ~ C(fbrace) + fhisp + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692814
Iterations 3

C(fbrace)[T.2.0] 105.0 102.2 *
C(fbrace)[T.3.0] 105.0 101.9 *
C(fbrace)[T.4.0] 105.0 105.9
fhisp 105.0 103.4 *
dmar 105.0 105.7 *
Dep. Variable: | boy | No. Observations: | 3235798 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3235792 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.183e-05 |
Time: | 14:25:22 | Log-Likelihood: | -2.2418e+06 |
converged: | True | LL-Null: | -2.2419e+06 |
LLR p-value: | 1.485e-19 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0492 | 0.003 | 14.375 | 0.000 | 0.042 0.056 |
C(fbrace)[T.2.0] | -0.0278 | 0.003 | -8.324 | 0.000 | -0.034 -0.021 |
C(fbrace)[T.3.0] | -0.0301 | 0.011 | -2.778 | 0.005 | -0.051 -0.009 |
C(fbrace)[T.4.0] | 0.0081 | 0.004 | 1.871 | 0.061 | -0.000 0.017 |
fhisp | -0.0156 | 0.003 | -5.270 | 0.000 | -0.021 -0.010 |
dmar | 0.0062 | 0.003 | 2.416 | 0.016 | 0.001 0.011 |
The effect of education disappears.
formula = ('boy ~ C(fbrace) + fhisp + lowed')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692816
Iterations 3

C(fbrace)[T.2.0] 105.8 103.1 *
C(fbrace)[T.3.0] 105.8 102.8 *
C(fbrace)[T.4.0] 105.8 106.5
fhisp 105.8 104.2 *
lowed 105.8 106.0
Dep. Variable: | boy | No. Observations: | 3091385 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3091379 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.076e-05 |
Time: | 14:25:47 | Log-Likelihood: | -2.1418e+06 |
converged: | True | LL-Null: | -2.1418e+06 |
LLR p-value: | 1.130e-17 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0566 | 0.001 | 37.993 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0259 | 0.003 | -7.838 | 0.000 | -0.032 -0.019 |
C(fbrace)[T.3.0] | -0.0287 | 0.011 | -2.624 | 0.009 | -0.050 -0.007 |
C(fbrace)[T.4.0] | 0.0067 | 0.004 | 1.487 | 0.137 | -0.002 0.015 |
fhisp | -0.0152 | 0.003 | -4.927 | 0.000 | -0.021 -0.009 |
lowed | 0.0017 | 0.004 | 0.462 | 0.644 | -0.006 0.009 |
The effect of birth order disappears.
formula = ('boy ~ C(fbrace) + fhisp + highbo')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692816
Iterations 3

C(fbrace)[T.2.0] 105.8 103.2 *
C(fbrace)[T.3.0] 105.8 102.9 *
C(fbrace)[T.4.0] 105.8 106.6
fhisp 105.8 104.4 *
highbo 105.8 105.6
Dep. Variable: | boy | No. Observations: | 3221819 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3221813 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.029e-05 |
Time: | 14:26:13 | Log-Likelihood: | -2.2321e+06 |
converged: | True | LL-Null: | -2.2322e+06 |
LLR p-value: | 5.072e-18 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0566 | 0.001 | 38.815 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0253 | 0.003 | -7.841 | 0.000 | -0.032 -0.019 |
C(fbrace)[T.3.0] | -0.0284 | 0.011 | -2.616 | 0.009 | -0.050 -0.007 |
C(fbrace)[T.4.0] | 0.0077 | 0.004 | 1.758 | 0.079 | -0.001 0.016 |
fhisp | -0.0139 | 0.003 | -4.785 | 0.000 | -0.020 -0.008 |
highbo | -0.0026 | 0.005 | -0.483 | 0.629 | -0.013 0.008 |
WIC is no longer associated with more girls.
formula = ('boy ~ C(fbrace) + fhisp + wic')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692813
Iterations 3

C(fbrace)[T.2.0] 105.8 103.0 *
C(fbrace)[T.3.0] 105.8 103.0 *
C(fbrace)[T.4.0] 105.8 106.6
wic[T.Y] 105.8 106.1
fhisp 105.8 104.1 *
Dep. Variable: | boy | No. Observations: | 3040527 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3040521 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.175e-05 |
Time: | 14:27:01 | Log-Likelihood: | -2.1065e+06 |
converged: | True | LL-Null: | -2.1066e+06 |
LLR p-value: | 3.031e-18 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0564 | 0.002 | 34.772 | 0.000 | 0.053 0.060 |
C(fbrace)[T.2.0] | -0.0271 | 0.003 | -7.892 | 0.000 | -0.034 -0.020 |
C(fbrace)[T.3.0] | -0.0267 | 0.011 | -2.405 | 0.016 | -0.048 -0.005 |
C(fbrace)[T.4.0] | 0.0076 | 0.005 | 1.670 | 0.095 | -0.001 0.016 |
wic[T.Y] | 0.0025 | 0.003 | 0.975 | 0.330 | -0.002 0.007 |
fhisp | -0.0161 | 0.003 | -5.153 | 0.000 | -0.022 -0.010 |
The effect of obesity disappears.
formula = ('boy ~ C(fbrace) + fhisp + obese')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692815
Iterations 3

C(fbrace)[T.2.0] 105.9 103.3 *
C(fbrace)[T.3.0] 105.9 103.1 *
C(fbrace)[T.4.0] 105.9 106.5
fhisp 105.9 104.3 *
obese 105.9 105.7
Dep. Variable: | boy | No. Observations: | 3005073 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3005067 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.947e-05 |
Time: | 14:27:26 | Log-Likelihood: | -2.0820e+06 |
converged: | True | LL-Null: | -2.0820e+06 |
LLR p-value: | 5.013e-16 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0571 | 0.002 | 35.622 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0247 | 0.003 | -7.305 | 0.000 | -0.031 -0.018 |
C(fbrace)[T.3.0] | -0.0266 | 0.011 | -2.410 | 0.016 | -0.048 -0.005 |
C(fbrace)[T.4.0] | 0.0056 | 0.005 | 1.217 | 0.224 | -0.003 0.015 |
fhisp | -0.0151 | 0.003 | -4.996 | 0.000 | -0.021 -0.009 |
obese | -0.0014 | 0.003 | -0.524 | 0.600 | -0.007 0.004 |
The effect of payment method is diminished, but self-payment is still associated with more boys.
formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692812
Iterations 3

C(fbrace)[T.2.0] 106.1 103.3 *
C(fbrace)[T.3.0] 106.1 103.0 *
C(fbrace)[T.4.0] 106.1 106.7
C(pay_rec)[T.2.0] 106.1 105.7
C(pay_rec)[T.3.0] 106.1 108.3 *
C(pay_rec)[T.4.0] 106.1 105.4
fhisp 106.1 104.4 *
Dep. Variable: | boy | No. Observations: | 3086812 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3086804 |
Method: | MLE | Df Model: | 7 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.500e-05 |
Time: | 14:28:14 | Log-Likelihood: | -2.1386e+06 |
converged: | True | LL-Null: | -2.1386e+06 |
LLR p-value: | 3.965e-20 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0593 | 0.002 | 25.249 | 0.000 | 0.055 0.064 |
C(fbrace)[T.2.0] | -0.0271 | 0.003 | -7.980 | 0.000 | -0.034 -0.020 |
C(fbrace)[T.3.0] | -0.0297 | 0.011 | -2.696 | 0.007 | -0.051 -0.008 |
C(fbrace)[T.4.0] | 0.0056 | 0.004 | 1.239 | 0.216 | -0.003 0.014 |
C(pay_rec)[T.2.0] | -0.0043 | 0.003 | -1.680 | 0.093 | -0.009 0.001 |
C(pay_rec)[T.3.0] | 0.0203 | 0.006 | 3.331 | 0.001 | 0.008 0.032 |
C(pay_rec)[T.4.0] | -0.0063 | 0.006 | -1.094 | 0.274 | -0.018 0.005 |
fhisp | -0.0167 | 0.003 | -5.378 | 0.000 | -0.023 -0.011 |
But the number of prenatal visits is still a strong predictor of more girls.
formula = ('boy ~ C(fbrace) + fhisp + previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692778
Iterations 3

C(fbrace)[T.2.0] 105.8 102.8 *
C(fbrace)[T.3.0] 105.8 102.3 *
C(fbrace)[T.4.0] 105.8 106.4
fhisp 105.8 104.0 *
previs 105.8 104.8 *
Dep. Variable: | boy | No. Observations: | 3155440 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3155434 |
Method: | MLE | Df Model: | 5 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 7.997e-05 |
Time: | 14:28:40 | Log-Likelihood: | -2.1860e+06 |
converged: | True | LL-Null: | -2.1862e+06 |
LLR p-value: | 2.081e-73 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0567 | 0.001 | 38.800 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0295 | 0.003 | -9.008 | 0.000 | -0.036 -0.023 |
C(fbrace)[T.3.0] | -0.0341 | 0.011 | -3.114 | 0.002 | -0.056 -0.013 |
C(fbrace)[T.4.0] | 0.0058 | 0.004 | 1.314 | 0.189 | -0.003 0.014 |
fhisp | -0.0172 | 0.003 | -5.862 | 0.000 | -0.023 -0.011 |
previs | -0.0102 | 0.001 | -16.235 | 0.000 | -0.011 -0.009 |
And the effect is even stronger if we add a boolean to capture the nonlinearity at 0 visits.
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692776
Iterations 3

C(fbrace)[T.2.0] 105.9 102.8 *
C(fbrace)[T.3.0] 105.9 102.3 *
C(fbrace)[T.4.0] 105.9 106.5
fhisp 105.9 104.1 *
previs 105.9 104.7 *
no_previs 105.9 101.0 *
Dep. Variable: | boy | No. Observations: | 3155440 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3155433 |
Method: | MLE | Df Model: | 6 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.351e-05 |
Time: | 14:29:06 | Log-Likelihood: | -2.1860e+06 |
converged: | True | LL-Null: | -2.1862e+06 |
LLR p-value: | 8.674e-76 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0570 | 0.001 | 38.973 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0294 | 0.003 | -8.984 | 0.000 | -0.036 -0.023 |
C(fbrace)[T.3.0] | -0.0342 | 0.011 | -3.123 | 0.002 | -0.056 -0.013 |
C(fbrace)[T.4.0] | 0.0056 | 0.004 | 1.270 | 0.204 | -0.003 0.014 |
fhisp | -0.0171 | 0.003 | -5.817 | 0.000 | -0.023 -0.011 |
previs | -0.0111 | 0.001 | -16.625 | 0.000 | -0.012 -0.010 |
no_previs | -0.0469 | 0.012 | -3.936 | 0.000 | -0.070 -0.024 |
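The `previs` and `no_previs` variables are defined earlier in the notebook; the output above suggests `previs` is centered near zero and `no_previs` flags births with no recorded prenatal visits. A hypothetical reconstruction (the raw column name `previs_raw` and the mean-centering are assumptions, not the notebook's actual code):

```python
import pandas as pd

def add_previs_vars(df, raw_col='previs_raw'):
    """Center the visit count and flag zero visits (hypothetical names)."""
    df = df.copy()
    # Indicator for no prenatal care at all.
    df['no_previs'] = (df[raw_col] == 0).astype(int)
    # Centered visit count, so 0 represents an average number of visits.
    df['previs'] = df[raw_col] - df[raw_col].mean()
    return df
```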
Now if we control for father's race and Hispanic origin as well as the number of prenatal visits, the effect of marriage disappears.
formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692778
Iterations 3

C(fbrace)[T.2.0] 105.3 102.1 *
C(fbrace)[T.3.0] 105.3 101.7 *
C(fbrace)[T.4.0] 105.3 106.0
fhisp 105.3 103.5 *
previs 105.3 104.3 *
dmar 105.3 105.7
Dep. Variable: | boy | No. Observations: | 3155440 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3155433 |
Method: | MLE | Df Model: | 6 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.045e-05 |
Time: | 14:29:32 | Log-Likelihood: | -2.1860e+06 |
converged: | True | LL-Null: | -2.1862e+06 |
LLR p-value: | 6.525e-73 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0521 | 0.003 | 15.015 | 0.000 | 0.045 0.059 |
C(fbrace)[T.2.0] | -0.0309 | 0.003 | -9.058 | 0.000 | -0.038 -0.024 |
C(fbrace)[T.3.0] | -0.0353 | 0.011 | -3.210 | 0.001 | -0.057 -0.014 |
C(fbrace)[T.4.0] | 0.0062 | 0.004 | 1.394 | 0.163 | -0.002 0.015 |
fhisp | -0.0181 | 0.003 | -6.033 | 0.000 | -0.024 -0.012 |
previs | -0.0102 | 0.001 | -16.122 | 0.000 | -0.011 -0.009 |
dmar | 0.0037 | 0.003 | 1.446 | 0.148 | -0.001 0.009 |
The effect of payment method disappears.
formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692777
Iterations 3

C(fbrace)[T.2.0] 105.8 102.8 *
C(fbrace)[T.3.0] 105.8 102.2 *
C(fbrace)[T.4.0] 105.8 106.3
C(pay_rec)[T.2.0] 105.8 105.9
C(pay_rec)[T.3.0] 105.8 106.9
C(pay_rec)[T.4.0] 105.8 105.0
fhisp 105.8 104.0 *
previs 105.8 104.8 *
Dep. Variable: | boy | No. Observations: | 3009712 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3009703 |
Method: | MLE | Df Model: | 8 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.163e-05 |
Time: | 14:30:20 | Log-Likelihood: | -2.0851e+06 |
converged: | True | LL-Null: | -2.0852e+06 |
LLR p-value: | 1.004e-68 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0566 | 0.002 | 23.765 | 0.000 | 0.052 0.061 |
C(fbrace)[T.2.0] | -0.0295 | 0.003 | -8.509 | 0.000 | -0.036 -0.023 |
C(fbrace)[T.3.0] | -0.0345 | 0.011 | -3.090 | 0.002 | -0.056 -0.013 |
C(fbrace)[T.4.0] | 0.0046 | 0.005 | 1.012 | 0.312 | -0.004 0.014 |
C(pay_rec)[T.2.0] | 0.0005 | 0.003 | 0.174 | 0.862 | -0.005 0.006 |
C(pay_rec)[T.3.0] | 0.0100 | 0.006 | 1.619 | 0.105 | -0.002 0.022 |
C(pay_rec)[T.4.0] | -0.0074 | 0.006 | -1.260 | 0.208 | -0.019 0.004 |
fhisp | -0.0178 | 0.003 | -5.687 | 0.000 | -0.024 -0.012 |
previs | -0.0101 | 0.001 | -15.540 | 0.000 | -0.011 -0.009 |
Here, again, is the version with the boolean for no prenatal visits; it serves as the baseline for the next models.
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692776
Iterations 3

C(fbrace)[T.2.0] 105.9 102.8 *
C(fbrace)[T.3.0] 105.9 102.3 *
C(fbrace)[T.4.0] 105.9 106.5
fhisp 105.9 104.1 *
previs 105.9 104.7 *
no_previs 105.9 101.0 *
Dep. Variable: | boy | No. Observations: | 3155440 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3155433 |
Method: | MLE | Df Model: | 6 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.351e-05 |
Time: | 14:30:47 | Log-Likelihood: | -2.1860e+06 |
converged: | True | LL-Null: | -2.1862e+06 |
LLR p-value: | 8.674e-76 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0570 | 0.001 | 38.973 | 0.000 | 0.054 0.060 |
C(fbrace)[T.2.0] | -0.0294 | 0.003 | -8.984 | 0.000 | -0.036 -0.023 |
C(fbrace)[T.3.0] | -0.0342 | 0.011 | -3.123 | 0.002 | -0.056 -0.013 |
C(fbrace)[T.4.0] | 0.0056 | 0.004 | 1.270 | 0.204 | -0.003 0.014 |
fhisp | -0.0171 | 0.003 | -5.817 | 0.000 | -0.023 -0.011 |
previs | -0.0111 | 0.001 | -16.625 | 0.000 | -0.012 -0.010 |
no_previs | -0.0469 | 0.012 | -3.936 | 0.000 | -0.070 -0.024 |
Now, surprisingly, the mother's age has a small effect.
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692775
Iterations 3

C(fbrace)[T.2.0] 106.8 103.6 *
C(fbrace)[T.3.0] 106.8 103.1 *
C(fbrace)[T.4.0] 106.8 107.4
fhisp 106.8 104.9 *
previs 106.8 105.6 *
no_previs 106.8 101.9 *
mager9 106.8 106.6 *
Dep. Variable: | boy | No. Observations: | 3155440 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3155432 |
Method: | MLE | Df Model: | 7 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.440e-05 |
Time: | 14:31:14 | Log-Likelihood: | -2.1860e+06 |
converged: | True | LL-Null: | -2.1862e+06 |
LLR p-value: | 1.043e-75 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0656 | 0.005 | 14.344 | 0.000 | 0.057 0.075 |
C(fbrace)[T.2.0] | -0.0300 | 0.003 | -9.123 | 0.000 | -0.036 -0.024 |
C(fbrace)[T.3.0] | -0.0351 | 0.011 | -3.200 | 0.001 | -0.057 -0.014 |
C(fbrace)[T.4.0] | 0.0062 | 0.004 | 1.413 | 0.158 | -0.002 0.015 |
fhisp | -0.0176 | 0.003 | -5.974 | 0.000 | -0.023 -0.012 |
previs | -0.0110 | 0.001 | -16.456 | 0.000 | -0.012 -0.010 |
no_previs | -0.0468 | 0.012 | -3.926 | 0.000 | -0.070 -0.023 |
mager9 | -0.0019 | 0.001 | -1.970 | 0.049 | -0.004 -9.69e-06 |
So does the father's age. But both age effects are small and borderline significant.
formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')
model = smf.logit(formula, data=df)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692775
Iterations 3

C(fbrace)[T.2.0] 106.9 103.7 *
C(fbrace)[T.3.0] 106.9 103.2 *
C(fbrace)[T.4.0] 106.9 107.6
fhisp 106.9 105.0 *
previs 106.9 105.7 *
no_previs 106.9 101.8 *
fagerrec11 106.9 106.7 *
Dep. Variable: | boy | No. Observations: | 3148537 |
---|---|---|---|
Model: | Logit | Df Residuals: | 3148529 |
Method: | MLE | Df Model: | 7 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 8.517e-05 |
Time: | 14:32:34 | Log-Likelihood: | -2.1812e+06 |
converged: | True | LL-Null: | -2.1814e+06 |
LLR p-value: | 2.924e-76 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0663 | 0.004 | 15.399 | 0.000 | 0.058 0.075 |
C(fbrace)[T.2.0] | -0.0299 | 0.003 | -9.100 | 0.000 | -0.036 -0.023 |
C(fbrace)[T.3.0] | -0.0348 | 0.011 | -3.170 | 0.002 | -0.056 -0.013 |
C(fbrace)[T.4.0] | 0.0067 | 0.004 | 1.518 | 0.129 | -0.002 0.015 |
fhisp | -0.0176 | 0.003 | -5.974 | 0.000 | -0.023 -0.012 |
previs | -0.0110 | 0.001 | -16.545 | 0.000 | -0.012 -0.010 |
no_previs | -0.0483 | 0.012 | -4.039 | 0.000 | -0.072 -0.025 |
fagerrec11 | -0.0019 | 0.001 | -2.278 | 0.023 | -0.003 -0.000 |
The predictive power of prenatal visits is still surprising to me. To make sure we've controlled for race, I'll select cases where both parents are white:
white = df[(df.mbrace==1) & (df.fbrace==1)]
len(white)
2400787
And compute sex ratios for each level of previs:
var = 'previs'
white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)
previs | boy
---|---
-6 | 107
-5 | 110
-4 | 108
-3 | 110
-2 | 108
-1 | 107
0 | 105
1 | 103
2 | 103
3 | 102
4 | 103
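`series_to_ratio` is defined earlier in the notebook; based on the table above, it evidently converts a series of boy/girl indicators into boys per 100 girls. A minimal reconstruction might look like:

```python
def series_to_ratio(series):
    """Sex ratio: boys per 100 girls, rounded to the nearest integer.

    `series` holds 0/1 indicators (1 = boy), like the `boy` column.
    """
    boys = sum(series)
    girls = len(series) - boys
    return round(100 * boys / girls)
```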
The effect holds up. People with a below-average number of prenatal visits are substantially more likely to have boys.
formula = ('boy ~ previs + no_previs')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692749
Iterations 3

previs 105.5 104.3 *
no_previs 105.5 100.4 *
Dep. Variable: | boy | No. Observations: | 2346785 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2346782 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 6.418e-05 |
Time: | 14:40:39 | Log-Likelihood: | -1.6257e+06 |
converged: | True | LL-Null: | -1.6258e+06 |
LLR p-value: | 4.790e-46 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0534 | 0.001 | 40.728 | 0.000 | 0.051 0.056 |
previs | -0.0113 | 0.001 | -14.378 | 0.000 | -0.013 -0.010 |
no_previs | -0.0490 | 0.015 | -3.352 | 0.001 | -0.078 -0.020 |
inter = results.params['Intercept']
slope = results.params['previs']
inter, slope
(0.053449172473506806, -0.011302385985286368)
previs = np.arange(-5, 5)
logodds = inter + slope * previs
odds = np.exp(logodds)
odds * 100
array([ 111.62346508, 110.36895641, 109.12854687, 107.90207798, 106.68939307, 105.49033723, 104.30475728, 103.13250177, 101.97342096, 100.82736677])
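Another way to read the slope: exponentiating it gives the multiplicative change in the odds of a boy per additional prenatal visit, roughly a 1.1% drop per visit.

```python
import numpy as np

slope = -0.0113              # logit coefficient for previs, from above
factor = np.exp(slope)       # odds multiplier per additional visit
pct_drop = (1 - factor) * 100
print(round(pct_drop, 1))    # about 1.1 percent lower odds per visit
```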
formula = ('boy ~ dmar')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692788
Iterations 3

dmar 105.3 105.5
Dep. Variable: | boy | No. Observations: | 2400787 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2400785 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 7.406e-08 |
Time: | 15:27:21 | Log-Likelihood: | -1.6632e+06 |
converged: | True | LL-Null: | -1.6632e+06 |
LLR p-value: | 0.6196 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0518 | 0.004 | 13.234 | 0.000 | 0.044 0.059 |
dmar | 0.0014 | 0.003 | 0.496 | 0.620 | -0.004 0.007 |
formula = ('boy ~ lowed')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692788
Iterations 3

lowed 105.6 105.0
Dep. Variable: | boy | No. Observations: | 2301234 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2301232 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 4.759e-07 |
Time: | 15:28:01 | Log-Likelihood: | -1.5943e+06 |
converged: | True | LL-Null: | -1.5943e+06 |
LLR p-value: | 0.2180 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0542 | 0.001 | 38.603 | 0.000 | 0.051 0.057 |
lowed | -0.0051 | 0.004 | -1.232 | 0.218 | -0.013 0.003 |
formula = ('boy ~ highbo')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692788
Iterations 3

highbo 105.5 105.6
Dep. Variable: | boy | No. Observations: | 2391630 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2391628 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 4.564e-09 |
Time: | 15:28:25 | Log-Likelihood: | -1.6569e+06 |
converged: | True | LL-Null: | -1.6569e+06 |
LLR p-value: | 0.9021 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0535 | 0.001 | 40.493 | 0.000 | 0.051 0.056 |
highbo | 0.0008 | 0.006 | 0.123 | 0.902 | -0.012 0.013 |
formula = ('boy ~ wic')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692786
Iterations 3

wic[T.Y] 105.6 105.3
Dep. Variable: | boy | No. Observations: | 2266424 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2266422 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 3.840e-07 |
Time: | 15:28:57 | Log-Likelihood: | -1.5701e+06 |
converged: | True | LL-Null: | -1.5701e+06 |
LLR p-value: | 0.2721 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0548 | 0.002 | 33.369 | 0.000 | 0.052 0.058 |
wic[T.Y] | -0.0031 | 0.003 | -1.098 | 0.272 | -0.009 0.002 |
formula = ('boy ~ obese')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692788
Iterations 3

obese 105.6 105.3
Dep. Variable: | boy | No. Observations: | 2244349 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2244347 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.725e-07 |
Time: | 15:29:20 | Log-Likelihood: | -1.5549e+06 |
converged: | True | LL-Null: | -1.5549e+06 |
LLR p-value: | 0.4639 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0542 | 0.002 | 35.607 | 0.000 | 0.051 0.057 |
obese | -0.0023 | 0.003 | -0.732 | 0.464 | -0.009 0.004 |
formula = ('boy ~ C(pay_rec)')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692786
Iterations 3

C(pay_rec)[T.2.0] 105.4 105.5
C(pay_rec)[T.3.0] 105.4 107.1 *
C(pay_rec)[T.4.0] 105.4 105.3
Dep. Variable: | boy | No. Observations: | 2295681 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2295677 |
Method: | MLE | Df Model: | 3 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.666e-06 |
Time: | 15:30:06 | Log-Likelihood: | -1.5904e+06 |
converged: | True | LL-Null: | -1.5904e+06 |
LLR p-value: | 0.1511 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0529 | 0.002 | 23.356 | 0.000 | 0.048 0.057 |
C(pay_rec)[T.2.0] | 0.0004 | 0.003 | 0.147 | 0.883 | -0.005 0.006 |
C(pay_rec)[T.3.0] | 0.0159 | 0.007 | 2.235 | 0.025 | 0.002 0.030 |
C(pay_rec)[T.4.0] | -0.0013 | 0.007 | -0.197 | 0.844 | -0.015 0.012 |
formula = ('boy ~ mager9')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692786
Iterations 3

mager9 107.0 106.7 *
Dep. Variable: | boy | No. Observations: | 2400787 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2400785 |
Method: | MLE | Df Model: | 1 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.516e-06 |
Time: | 15:30:32 | Log-Likelihood: | -1.6632e+06 |
converged: | True | LL-Null: | -1.6632e+06 |
LLR p-value: | 0.003813 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0677 | 0.005 | 13.452 | 0.000 | 0.058 0.078 |
mager9 | -0.0032 | 0.001 | -2.893 | 0.004 | -0.005 -0.001 |
formula = ('boy ~ youngm + oldm')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692787
Iterations 3

youngm[T.True] 105.6 105.5
oldm[T.True] 105.6 103.8 *
Dep. Variable: | boy | No. Observations: | 2400787 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2400784 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 1.549e-06 |
Time: | 15:31:04 | Log-Likelihood: | -1.6632e+06 |
converged: | True | LL-Null: | -1.6632e+06 |
LLR p-value: | 0.07608 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0542 | 0.001 | 40.370 | 0.000 | 0.052 0.057 |
youngm[T.True] | -0.0011 | 0.006 | -0.170 | 0.865 | -0.013 0.011 |
oldm[T.True] | -0.0173 | 0.008 | -2.268 | 0.023 | -0.032 -0.002 |
formula = ('boy ~ youngf + oldf')
model = smf.logit(formula, data=white)
results = model.fit()
summarize(results)
results.summary()
Optimization terminated successfully.
Current function value: 0.692787
Iterations 3

youngf 105.5 106.4
oldf 105.5 105.7
Dep. Variable: | boy | No. Observations: | 2396141 |
---|---|---|---|
Model: | Logit | Df Residuals: | 2396138 |
Method: | MLE | Df Model: | 2 |
Date: | Tue, 17 May 2016 | Pseudo R-squ.: | 2.717e-07 |
Time: | 15:31:50 | Log-Likelihood: | -1.6600e+06 |
converged: | True | LL-Null: | -1.6600e+06 |
LLR p-value: | 0.6370 |
 | coef | std err | z | P>\|z\| | [95.0% Conf. Int.]
---|---|---|---|---|---
Intercept | 0.0534 | 0.001 | 40.229 | 0.000 | 0.051 0.056 |
youngf | 0.0082 | 0.009 | 0.924 | 0.355 | -0.009 0.026 |
oldf | 0.0018 | 0.008 | 0.242 | 0.809 | -0.013 0.017 |