IV: SOX Policies | DV: Donor Advisory | N | Notes | TO DO |
---|---|---|---|---|
2011 | 2016 | 4,857 | 47 donor advisories on these organizations; associational test (we don't know when the SOX policies were added); also, DV is 'current donor advisory' | ready to run |
2011 | 2012-2016 | 4,857 | 47 2016 advisories plus probably another dozen or so advisories over the 2012-2015 period; associational test as above, but adds in donor advisories that were put in place then dropped between 2012 and 2015. | some minor work creating this new DV but not very burdensome |
2011 | 2011 | 5,439 | 39 donor advisories; pure cross-sectional test |
Download the '2011' 990 data (SOX policies + controls) for the 39 orgs with a 2011 donor advisory; a few hours work to download and enter the data |
2016 | 2016 | 8,304 | 328 donor advisories; pure cross-sectional test | ready to run |
change 2011-2016 | 2016 | 4,857 | 'Divide 4,857 orgs into three groups: i) those with no SOX policies in 2011 and still no SOX policies in 2016; ii) those with SOX policies in 2011 and 2016; and iii) those with no SOX policies in 2011 but SOX policies in 2016. Create dummy variables for each group and see whether those in group iii) do better than i) or ii). This is a relatively low cost 'pre-post' test. | moderate amount of work to create the new dummies but not too burdensome |
change 2011-2016 | 2012-2016 | TBD | Similar to above option, but would need to take a sample of organizations in group iii) and go through their 990s to find out exactly when they added the SOX policies | Resource-intensive 990 searches |
Notes from Meeting with Dan:
To Do (beyond notes listed in table above):
First, we will import several necessary Python packages. We will be using the Python Data Analysis Library, or PANDAS, extensively for our data manipulations. It is invaluable for analyzing datasets.
Import of basic elements of PANDAS and numpy
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series
We can check which version of various packages we're using. You can see I'm running PANDAS 0.17 here.
print pd.__version__
0.18.1
#http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', 500)
Let's read in the merged historical/current/2011 dataset we created in the last notebook. First we'll change the working directory.
cd '/Users/gregorysaxton/Google Drive/SOX'
/Users/gregorysaxton/Google Drive/SOX
df_2011 = pd.read_pickle('Tests 1-2 data.pkl')
print len(df_2011.columns)
print len(df_2011)
df_2011.head(1)
35 4863
donor_advisory | donor_advisory_2016 | donor_advisory_2011_to_2016 | org_id | EIN | FYE | Form 990 FYE | ratings_system | 2011_data | 2016_data | conflict_of_interest_policy_v2 | records_retention_policy_v2 | whistleblower_policy_v2 | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | program_efficiency | complexity | complexity_2011 | age | total_revenue_logged | category | state | tot_rev | category_Animals | category_Arts, Culture, Humanities | category_Community Development | category_Education | category_Environment | category_Health | category_Human Services | category_Human and Civil Rights | category_International | category_Religion | category_Research and Public Policy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50715 | 0.0 | 0.0 | 0.0 | 5954 | 010202467 | FY2009 | 2009-12 | CN 2.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 3.0 | 1.0 | 1.0 | 0.788895 | 0.0 | 3.0 | 62.0 | 15.947563 | Research and Public Policy | ME | 8432154.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
print df_2011.columns.tolist()
['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']
print df_2011['donor_advisory_2016'].value_counts()
print df_2011['donor_advisory_2011_to_2016'].value_counts()
0.0 4816 1.0 47 Name: donor_advisory_2016, dtype: int64 0.0 4755 1.0 108 Name: donor_advisory_2011_to_2016, dtype: int64
#DVs = ['donor_advisory',
DVs = ['donor_advisory_2016', 'donor_advisory_2011_to_2016']
indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data']
IVs = ['conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2',
'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']
controls = ['program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']
fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development',
'category_Education', 'category_Environment', 'category_Health', 'category_Human Services',
'category_Human and Civil Rights', 'category_International', 'category_Religion',
'category_Research and Public Policy']
SOI_check = ['tot_rev']
merge_cols = ['_merge_v1', '_merge_v2', '_merge_v3', '_merge_v4', '_merge_47', '_merge_efile']
#+ SOI_check
logit_cols = DVs + indicators + IVs + controls + fixed_effects
print logit_cols
['donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']
df_2011[df_2011.duplicated()]
donor_advisory | donor_advisory_2016 | donor_advisory_2011_to_2016 | org_id | EIN | FYE | Form 990 FYE | ratings_system | 2011_data | 2016_data | conflict_of_interest_policy_v2 | records_retention_policy_v2 | whistleblower_policy_v2 | SOX_policies | SOX_policies_binary | SOX_policies_all_binary | program_efficiency | complexity | complexity_2011 | age | total_revenue_logged | category | state | tot_rev | category_Animals | category_Arts, Culture, Humanities | category_Community Development | category_Education | category_Environment | category_Health | category_Human Services | category_Human and Civil Rights | category_International | category_Religion | category_Research and Public Policy |
---|
#df_2011.to_excel('df_2011.xls')
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS
print statsmodels.__version__
0.6.1
#IVs = '%s + ' % IV
#DV = '%s ~ ' % DV
IVs = 'SOX_policies '
#IVs = 'SOX_policies_binary'
#DV = 'advisory ~ '
#DV = 'donor_advisory_2016 ~ '
DV = 'donor_advisory_2011_to_2016 ~ '
controls = '+ total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)'
#admin_expense_percent + leader_comp_percent + budget_surplus
logit_formula = DV+IVs+controls
print logit_formula
#globals()["mod%s" % model_num] = smf.logit(formula=logit_formula, data=df).fit()
#print globals()["mod%s" % model_num].summary()
# #print model_num.summary()
#print '\n', "Chi-squared value:", globals()["mod%s" % model_num].llr, '\n' #TO GET THE CHI-SQUARED VALUE
donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)
print df_2011.columns.tolist()
['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']
logit_cols1 = ['donor_advisory_2016', 'donor_advisory_2011_to_2016',
'SOX_policies', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']
len(df_2011[logit_cols1].dropna())
4838
len(df_2011.dropna())
1242
logit = smf.logit(formula=logit_formula, data=df_2011).fit()
logit.summary()
Warning: Maximum number of iterations has been exceeded. Current function value: 0.075203 Iterations: 35
//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning)
Dep. Variable: | donor_advisory_2011_to_2016 | No. Observations: | 4833 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4817 |
Method: | MLE | Df Model: | 15 |
Date: | Tue, 06 Sep 2016 | Pseudo R-squ.: | 0.08962 |
Time: | 12:35:26 | Log-Likelihood: | -363.45 |
converged: | False | LL-Null: | -399.24 |
LLR p-value: | 2.351e-09 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -2.6828 | 1.438 | -1.865 | 0.062 | -5.502 0.136 |
C(category)[T.Arts, Culture, Humanities] | -0.5078 | 0.515 | -0.986 | 0.324 | -1.517 0.502 |
C(category)[T.Community Development] | -0.3612 | 0.575 | -0.629 | 0.530 | -1.487 0.765 |
C(category)[T.Education] | -1.2152 | 0.794 | -1.531 | 0.126 | -2.771 0.340 |
C(category)[T.Environment] | -18.8719 | 4856.123 | -0.004 | 0.997 | -9536.698 9498.954 |
C(category)[T.Health] | -1.0195 | 0.548 | -1.862 | 0.063 | -2.093 0.054 |
C(category)[T.Human Services] | -0.1790 | 0.415 | -0.431 | 0.666 | -0.993 0.635 |
C(category)[T.Human and Civil Rights] | -0.6823 | 0.687 | -0.993 | 0.321 | -2.029 0.664 |
C(category)[T.International] | -0.3676 | 0.511 | -0.719 | 0.472 | -1.369 0.634 |
C(category)[T.Religion] | 0.3141 | 0.449 | 0.700 | 0.484 | -0.566 1.194 |
C(category)[T.Research and Public Policy] | -0.5298 | 0.796 | -0.665 | 0.506 | -2.090 1.031 |
SOX_policies | -0.4256 | 0.110 | -3.876 | 0.000 | -0.641 -0.210 |
total_revenue_logged | 0.2806 | 0.101 | 2.785 | 0.005 | 0.083 0.478 |
program_efficiency | -3.0563 | 0.767 | -3.982 | 0.000 | -4.561 -1.552 |
age | -0.0065 | 0.007 | -0.981 | 0.327 | -0.019 0.006 |
complexity_2011 | -0.7391 | 0.274 | -2.700 | 0.007 | -1.275 -0.203 |
df_2011['donor_advisory_2016'] = df_2011['donor_advisory_2016'].astype('int')
df_2011['donor_advisory_2011_to_2016'] = df_2011['donor_advisory_2011_to_2016'].astype('int')
df_2011.dtypes
donor_advisory float64 donor_advisory_2016 int64 donor_advisory_2011_to_2016 int64 org_id object EIN object FYE object Form 990 FYE object ratings_system object 2011_data float64 2016_data float64 conflict_of_interest_policy_v2 float64 records_retention_policy_v2 float64 whistleblower_policy_v2 float64 SOX_policies float64 SOX_policies_binary float64 SOX_policies_all_binary float64 program_efficiency float64 complexity float64 age float64 total_revenue_logged float64 category object state object tot_rev float64 category_Animals float64 category_Arts, Culture, Humanities float64 category_Community Development float64 category_Education float64 category_Environment float64 category_Health float64 category_Human Services float64 category_Human and Civil Rights float64 category_International float64 category_Religion float64 category_Research and Public Policy float64 dtype: object
#def new_logit(IV,model_num):
def new_logit_clustered(data, DV, columns, FE, model_num):
#IVs = '%s + ' % IV
#DV = 'RTs_binary ~ '
DV = '%s ~ ' % DV
#controls = 'from_user_followers_count + time_on_twitter_days + CSR_sustainability + \
# URLs_binary + photo'
IVs = ' + '.join(columns)
FE = '%s ' % FE
logit_formula = DV+IVs+FE
print logit_formula
globals()["mod%s" % model_num] = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',
cov_kwds={'groups': df['firm_from_user_screen_name']})
print globals()["mod%s" % model_num].summary()
#print model_num.summary()
print '\n', "Chi-squared value:", globals()["mod%s" % model_num].llr, '\n' #TO GET THE CHI-SQUARED VALUE
#print '\n', "Pseudo R-squared:", globals()["mod%s" % model_num].prsquared #TO GET THE PSEUDO-R-SQUARED
donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)
logit_variables = ['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged',
'program_efficiency', 'age', 'complexity', 'state', 'category']
df_2011 = df[logit_variables]
df_2011 = df_2011[df_2011['2011_data']==1]
print len(df_2011)
len(df_2011.dropna())
4863
4813
df[['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged',
'program_efficiency', 'age', 'complexity', 'complexity_2011', 'state', 'category']].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
2011_data | 84958 | 0.057240 | 0.232302 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
donor_advisory_2016 | 84958 | 0.004332 | 0.065672 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
SOX_policies | 21894 | 2.724582 | 0.689867 | 0.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 |
total_revenue_logged | 21825 | 15.911470 | 1.458552 | 11.616123 | 14.781636 | 15.706608 | 16.864409 | 22.042788 |
program_efficiency | 21894 | 0.805400 | 0.103635 | 0.000000 | 0.756568 | 0.817758 | 0.871105 | 1.010186 |
age | 83830 | 39.508147 | 19.310175 | 0.000000 | 24.000000 | 35.000000 | 52.000000 | 108.000000 |
complexity | 84958 | 0.373031 | 1.220945 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 8.000000 |
complexity_2011 | 4833 | 2.466791 | 0.514468 | 1.000000 | 2.000000 | 2.000000 | 3.000000 | 3.000000 |
logit = smf.logit(formula=logit_formula, data=df[df['2011_data']==1]).fit()
logit.summary()
Warning: Maximum number of iterations has been exceeded. Current function value: 0.075575 Iterations: 35
//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning)
Dep. Variable: | donor_advisory_2011_to_2016 | No. Observations: | 4808 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4792 |
Method: | MLE | Df Model: | 15 |
Date: | Fri, 02 Sep 2016 | Pseudo R-squ.: | 0.08892 |
Time: | 12:11:05 | Log-Likelihood: | -363.36 |
converged: | False | LL-Null: | -398.83 |
LLR p-value: | 3.049e-09 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -2.6649 | 1.444 | -1.845 | 0.065 | -5.495 0.166 |
C(category)[T.Arts, Culture, Humanities] | -0.5071 | 0.515 | -0.985 | 0.325 | -1.516 0.502 |
C(category)[T.Community Development] | -0.3595 | 0.575 | -0.626 | 0.532 | -1.486 0.767 |
C(category)[T.Education] | -1.2142 | 0.794 | -1.530 | 0.126 | -2.770 0.341 |
C(category)[T.Environment] | -20.2785 | 9810.033 | -0.002 | 0.998 | -1.92e+04 1.92e+04 |
C(category)[T.Health] | -1.0181 | 0.548 | -1.859 | 0.063 | -2.092 0.055 |
C(category)[T.Human Services] | -0.1780 | 0.415 | -0.429 | 0.668 | -0.992 0.636 |
C(category)[T.Human and Civil Rights] | -0.6780 | 0.687 | -0.987 | 0.324 | -2.025 0.669 |
C(category)[T.International] | -0.3664 | 0.511 | -0.717 | 0.474 | -1.368 0.636 |
C(category)[T.Religion] | 0.3200 | 0.449 | 0.712 | 0.476 | -0.560 1.200 |
C(category)[T.Research and Public Policy] | -0.5291 | 0.796 | -0.665 | 0.506 | -2.090 1.031 |
SOX_policies | -0.4264 | 0.110 | -3.882 | 0.000 | -0.642 -0.211 |
total_revenue_logged | 0.2790 | 0.101 | 2.753 | 0.006 | 0.080 0.478 |
program_efficiency | -3.0518 | 0.768 | -3.976 | 0.000 | -4.556 -1.547 |
age | -0.0064 | 0.007 | -0.977 | 0.329 | -0.019 0.006 |
complexity_2011 | -0.7374 | 0.274 | -2.695 | 0.007 | -1.274 -0.201 |
logit_formula
'donor_advisory_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)'
cols1 = ['donor_advisory_2011_to_2016', 'SOX_policies', 'total_revenue_logged', 'program_efficiency', 'age',
'complexity_2011', 'category', 'state']
logit = smf.logit(formula=logit_formula, data=df_2011[cols1].dropna()).fit(cov_type='cluster',
cov_kwds={'groups': df_2011[cols1].dropna()['state']})
logit.summary()
Warning: Maximum number of iterations has been exceeded. Current function value: 0.075203 Iterations: 35
//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning)
Dep. Variable: | donor_advisory_2011_to_2016 | No. Observations: | 4833 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4817 |
Method: | MLE | Df Model: | 15 |
Date: | Tue, 06 Sep 2016 | Pseudo R-squ.: | 0.08962 |
Time: | 12:35:43 | Log-Likelihood: | -363.45 |
converged: | False | LL-Null: | -399.24 |
LLR p-value: | 2.351e-09 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -2.6828 | 1.038 | -2.585 | 0.010 | -4.717 -0.649 |
C(category)[T.Arts, Culture, Humanities] | -0.5078 | 0.465 | -1.092 | 0.275 | -1.419 0.404 |
C(category)[T.Community Development] | -0.3612 | 0.700 | -0.516 | 0.606 | -1.733 1.010 |
C(category)[T.Education] | -1.2152 | 0.683 | -1.779 | 0.075 | -2.554 0.123 |
C(category)[T.Environment] | -18.8719 | 0.433 | -43.552 | 0.000 | -19.721 -18.023 |
C(category)[T.Health] | -1.0195 | 0.520 | -1.959 | 0.050 | -2.039 0.000 |
C(category)[T.Human Services] | -0.1790 | 0.370 | -0.484 | 0.628 | -0.904 0.546 |
C(category)[T.Human and Civil Rights] | -0.6823 | 0.483 | -1.412 | 0.158 | -1.629 0.265 |
C(category)[T.International] | -0.3676 | 0.487 | -0.754 | 0.451 | -1.323 0.588 |
C(category)[T.Religion] | 0.3141 | 0.470 | 0.668 | 0.504 | -0.607 1.235 |
C(category)[T.Research and Public Policy] | -0.5298 | 0.526 | -1.007 | 0.314 | -1.561 0.501 |
SOX_policies | -0.4256 | 0.093 | -4.588 | 0.000 | -0.607 -0.244 |
total_revenue_logged | 0.2806 | 0.074 | 3.776 | 0.000 | 0.135 0.426 |
program_efficiency | -3.0563 | 0.816 | -3.744 | 0.000 | -4.656 -1.456 |
age | -0.0065 | 0.005 | -1.199 | 0.231 | -0.017 0.004 |
complexity_2011 | -0.7391 | 0.263 | -2.814 | 0.005 | -1.254 -0.224 |
Here are the two variables that serve as indicators of '2016' and '2011' CN data. These 8,304 and 4,863 rows will serve as the base for conducting the logit regressions. Design variable creation solutions are these subsets of data.
print len(df[df['latest_entry']=='True'])
print len(df[df['2011 data']==1])
8304 4863
- Control variables:
- Size: total revenues best (probably logged)
- will need 2011 and 2016 versions for the 4th and 5th tests
- efficiency ratio
- complexity (could be a good control from Erica's paper)
- fixed effects:
- state
- category
- I need to scrape the category dummies for the new orgs in the 2016 database
- CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area
- The focus of our paper is on SOX policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:
- *independent board* --> related to Erica's *independence of key actors" concept
- *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept
- we could include other governance variables as needed.
- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.
- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.
cols_2011 = [col for col in list(df) if col.endswith('_2011')]
print cols_2011
['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']
df[cols_2011].dtypes
charity_name_2011 object category_2011 object city_2011 object state_2011 object cause_2011 object tag_line_2011 object url_2011 object ein_2011 object fye_2011 object overall_rating_2011 float64 overall_rating_star_2011 float64 efficiency_rating_2011 float64 AT_rating_2011 float64 financial_rating_star_2011 float64 AT_rating_star_2011 float64 program_expense_percent_2011 float64 admin_expense_percent_2011 float64 fund_expense_percent_2011 float64 fund_efficiency_2011 float64 primary_revenue_growth_2011 float64 program_expense_growth_2011 float64 working_capital_ratio_2011 float64 independent_board_2011 object no_material_division_2011 object audited_financials_2011 object no_loans_related_2011 object documents_minutes_2011 object form_990_2011 object conflict_of_interest_policy_2011 object whistleblower_policy_2011 object records_retention_policy_2011 object CEO_listed_2011 object process_CEO_compensation_2011 object no_board_compensation_2011 object donor_privacy_policy_2011 object board_listed_2011 object audited_financials_web_2011 object form_990_web_2011 object staff_listed_2011 object primary_revenue_2011 float64 other_revenue_2011 float64 total_revenue_2011 float64 govt_revenue_2011 object program_expense_2011 float64 admin_expense_2011 float64 fund_expense_2011 float64 total_functional_expense_2011 float64 affiliate_payments_2011 float64 budget_surplus_2011 float64 net_assets_2011 float64 leader_comp_2011 float64 leader_comp_percent_2011 float64 email_2011 object website_2011 object dtype: object
df[cols_2011].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
overall_rating_2011 | 4833 | 53.547455 | 8.933975e+00 | 8.040000e+00 | 48.73 | 54.61 | 59.97 | 6.996000e+01 |
overall_rating_star_2011 | 4833 | 2.869646 | 8.916785e-01 | 0.000000e+00 | 2.00 | 3.00 | 3.00 | 4.000000e+00 |
efficiency_rating_2011 | 4833 | 53.430780 | 1.048261e+01 | 0.000000e+00 | 46.53 | 54.88 | 61.73 | 6.997000e+01 |
AT_rating_2011 | 4833 | 56.446100 | 1.152050e+01 | -2.000000e+00 | 52.00 | 59.00 | 63.00 | 7.000000e+01 |
financial_rating_star_2011 | 4833 | 2.850403 | 1.007304e+00 | 0.000000e+00 | 2.00 | 3.00 | 4.00 | 4.000000e+00 |
AT_rating_star_2011 | 4833 | 3.128078 | 1.001005e+00 | 0.000000e+00 | 3.00 | 3.00 | 4.00 | 4.000000e+00 |
program_expense_percent_2011 | 4833 | 80.416325 | 1.055343e+01 | 2.200000e+00 | 75.50 | 81.60 | 87.00 | 9.970000e+01 |
admin_expense_percent_2011 | 4833 | 10.320608 | 6.665639e+00 | -5.000000e-01 | 5.80 | 9.20 | 13.20 | 6.790000e+01 |
fund_expense_percent_2011 | 4833 | 9.110180 | 8.056593e+00 | 0.000000e+00 | 4.10 | 7.50 | 11.90 | 9.070000e+01 |
fund_efficiency_2011 | 4833 | 0.129679 | 5.190376e-01 | 0.000000e+00 | 0.05 | 0.10 | 0.16 | 3.533000e+01 |
primary_revenue_growth_2011 | 4833 | 4.403580 | 1.554988e+01 | -7.960000e+01 | -3.50 | 2.60 | 10.20 | 2.452000e+02 |
program_expense_growth_2011 | 4833 | 6.314918 | 2.073699e+01 | -5.320000e+01 | -1.10 | 4.10 | 11.10 | 1.007400e+03 |
working_capital_ratio_2011 | 4833 | 1.788438 | 2.455707e+00 | -3.260000e+00 | 0.44 | 1.01 | 2.20 | 5.842000e+01 |
primary_revenue_2011 | 4833 | 16525465.992551 | 7.313324e+07 | 1.498700e+04 | 1989838.00 | 4430280.00 | 11346855.00 | 3.502077e+09 |
other_revenue_2011 | 4833 | 651129.463066 | 6.235616e+06 | -6.612359e+07 | 5922.00 | 99129.00 | 443689.00 | 2.392543e+08 |
total_revenue_2011 | 4833 | 17176106.777985 | 7.570407e+07 | -4.263887e+07 | 2103386.00 | 4673878.00 | 11721565.00 | 3.587775e+09 |
program_expense_2011 | 4833 | 14784010.616387 | 6.451777e+07 | 2.848300e+04 | 1694422.00 | 3808132.00 | 9557716.00 | 3.091879e+09 |
admin_expense_2011 | 4833 | 1477281.476929 | 7.805211e+06 | -8.585600e+04 | 188863.00 | 412603.00 | 1010869.00 | 4.323913e+08 |
fund_expense_2011 | 4833 | 1133717.688599 | 5.382822e+06 | 0.000000e+00 | 160486.00 | 359811.00 | 830250.00 | 2.231224e+08 |
total_functional_expense_2011 | 4833 | 17395155.368922 | 7.292829e+07 | 1.507310e+05 | 2188637.00 | 4769351.00 | 11766482.00 | 3.354177e+09 |
affiliate_payments_2011 | 4833 | 60756.626733 | 1.812179e+06 | -4.059500e+05 | 0.00 | 0.00 | 0.00 | 1.235951e+08 |
budget_surplus_2011 | 4833 | -219048.159321 | 9.895040e+06 | -3.008552e+08 | -477470.00 | 1652.00 | 452604.00 | 2.335980e+08 |
net_assets_2011 | 4833 | 33372071.914339 | 1.651812e+08 | -1.691832e+07 | 2013666.00 | 6004379.00 | 19206628.00 | 7.002755e+09 |
leader_comp_2011 | 4657 | 153519.535753 | 1.313950e+05 | 0.000000e+00 | 80808.00 | 126250.00 | 191203.00 | 2.257910e+06 |
leader_comp_percent_2011 | 4657 | 3.166133 | 3.015173e+00 | 0.000000e+00 | 1.00 | 2.33 | 4.44 | 3.090000e+01 |
from __future__ import division
efficiency = ProgExp/TotExp
df[df['org_id']=='12123']['program_expense_2011']/df[df['org_id']=='12123']['total_functional_expense_2011']
34 NaN 35 NaN 36 NaN 37 NaN 38 NaN 39 0.824939 40 NaN 41 NaN 42 NaN dtype: float64
df[df['org_id']=='12123'][cols_2011]
charity_name_2011 | category_2011 | city_2011 | state_2011 | cause_2011 | tag_line_2011 | url_2011 | ein_2011 | fye_2011 | overall_rating_2011 | overall_rating_star_2011 | efficiency_rating_2011 | AT_rating_2011 | financial_rating_star_2011 | AT_rating_star_2011 | program_expense_percent_2011 | admin_expense_percent_2011 | fund_expense_percent_2011 | fund_efficiency_2011 | primary_revenue_growth_2011 | program_expense_growth_2011 | working_capital_ratio_2011 | independent_board_2011 | no_material_division_2011 | audited_financials_2011 | no_loans_related_2011 | documents_minutes_2011 | form_990_2011 | conflict_of_interest_policy_2011 | whistleblower_policy_2011 | records_retention_policy_2011 | CEO_listed_2011 | process_CEO_compensation_2011 | no_board_compensation_2011 | donor_privacy_policy_2011 | board_listed_2011 | audited_financials_web_2011 | form_990_web_2011 | staff_listed_2011 | primary_revenue_2011 | other_revenue_2011 | total_revenue_2011 | govt_revenue_2011 | program_expense_2011 | admin_expense_2011 | fund_expense_2011 | total_functional_expense_2011 | affiliate_payments_2011 | budget_surplus_2011 | net_assets_2011 | leader_comp_2011 | leader_comp_percent_2011 | email_2011 | website_2011 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
35 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
36 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
37 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
38 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
39 | 100 Club of Arizona | Human Services | Phoenix | AZ | Multipurpose Human Service Organizations | Supporting families of public safety | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=12123 | 23-7172077 | 12/2009 | 63.84 | 4 | 66.58 | 62 | 4 | 4 | 82.4 | 14.4 | 3 | 0.03 | 6.2 | 6.3 | 1.06 | yes | yes | yes | yes | yes | yes | yes | yes | NO | yes | yes | yes | yes | yes | NO | yes | yes | 1212051 | -227543 | 984508 | Note: This organization receives $0 in government support. | 1019191 | 178385 | 37899 | 1235475 | 0 | -250967 | 1316781 | 122623 | 9.92 | info@100club.org | http://www.100club.org |
40 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
41 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
42 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
df[df['2011 data']==1]['program_expense_2011'][:5]/df[df['2011 data']==1]['total_functional_expense_2011'][:5]
9 NaN 21 0.797448 39 0.824939 52 0.854655 63 0.786945 dtype: float64
Number of revenue sources (Donations, Government Grants, Program Service Revenues)
print len(df[df['2011 data']==1])
print len(df[df['govt_revenue_2011'].notnull()])
df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'].value_counts()
4863 4833
Note: This organization receives $0 in government support. 2531 GOVERNMENT SUPPORT MUST BE RECEIVED 2302 Name: govt_revenue_2011, dtype: int64
df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'][5:12]
161 GOVERNMENT SUPPORT MUST BE RECEIVED 185 Note: This organization receives $0 in government support. 208 GOVERNMENT SUPPORT MUST BE RECEIVED 228 Note: This organization receives $0 in government support. 244 Note: This organization receives $0 in government support. 255 Note: This organization receives $0 in government support. 276 Note: This organization receives $0 in government support. Name: govt_revenue_2011, dtype: object
df['govt_revenue_2011_binary'] = np.nan
df['govt_revenue_2011_binary'] = np.where(
df['govt_revenue_2011'] == 'Note: This organization receives $0 in government support.', 0,
df['govt_revenue_2011_binary'])
df['govt_revenue_2011_binary'] = np.where(
df['govt_revenue_2011'] == 'GOVERNMENT SUPPORT MUST BE RECEIVED', 1,
df['govt_revenue_2011_binary'])
df['govt_revenue_2011_binary'].value_counts()
0 2531 1 2302 Name: govt_revenue_2011_binary, dtype: int64
print len(df[df['2011 data']==1])
print len(df[df['primary_revenue_2011'].notnull()])
print df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'][:5]
df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'].value_counts().sum()
4863 4833 21 3914222 39 1212051 52 762512 63 1140158 148 1375169 Name: primary_revenue_2011, dtype: float64
4833
df[df['other_revenue_2011']<1000]['other_revenue_2011']
39 -227543 63 -44041 312 -572243 362 -2666 385 -7531 477 -4653961 558 -176202 907 -3543041 942 -1726139 1161 -831757 1317 -7494 1352 -651915 1372 -288151 1390 0 1604 -7067577 1674 -271688 1689 -11111670 1705 -46280 1927 -1702433 1955 -6955 1982 -77563 2013 -388287 2027 -76537 2099 -1543 2196 -162612 2249 -2860 2352 -113036 2436 678 2597 -7227 2663 -456651 ... 82598 -1308477 82683 0 82914 -212856 82971 -1358658 82987 -29502 83033 -1003245 83098 -356419 83115 -10947728 83366 991 83423 -18112 83590 -173857 83632 -1810 83713 -4529558 83746 0 83758 251 83841 -364815 83873 -313582 83898 0 83910 300 83911 0 83912 56 83913 28 83917 -4437 83920 -3318 83922 263 83925 748 83927 0 83928 30 83932 -228354 83940 -6663315 Name: other_revenue_2011, dtype: float64
len(df[df['other_revenue_2011']==0])
46
print len(df[df['2011 data']==1])
print len(df[df['other_revenue_2011'].notnull()])
print df[df['other_revenue_2011'].notnull()]['other_revenue_2011'][:5]
df[df['other_revenue_2011'].notnull()]['other_revenue_2011'].value_counts().sum()
4863 4833 21 216503 39 -227543 52 21340 63 -44041 148 5061 Name: other_revenue_2011, dtype: float64
4833
print len(df[df['other_revenue_2011'].notnull()])
df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']>0, 1, df['other_revenue_2011'] )
df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']<0, 1, df['other_revenue_2011_binary'] )
print len(df[df['other_revenue_2011_binary'].notnull()])
df['other_revenue_2011_binary'].value_counts()
4833 4833
1 4787 0 46 Name: other_revenue_2011_binary, dtype: int64
df['complexity_2011'] = 1 + df['other_revenue_2011_binary'] + df['govt_revenue_2011_binary']
print len(df[df['complexity_2011'].notnull()])
df['complexity_2011'].value_counts()
4833
2 2501 3 2294 1 38 Name: complexity_2011, dtype: int64
cols_2011 = [col for col in list(df) if col.endswith('_2011')]
print cols_2011
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS
print statsmodels.__version__
0.6.1
df['SOX_policies_2011'] = df['conflict_of_interest_policy_2011'] + ' ' + df['whistleblower_policy_2011'] + ' ' + df['records_retention_policy_2011']
df['SOX_policies_2011'].value_counts()
yes yes yes 3548 yes NO NO 416 yes yes NO 284 yes NO yes 265 NO NO NO 255 NO NO yes 30 NO yes yes 21 NO yes NO 14 Name: SOX_policies_2011, dtype: int64
df['SOX_policies_2011'] = df['SOX_policies_2011'].str.count('yes')
df['SOX_policies_2011'].value_counts()
3 3548 2 570 1 460 0 255 Name: SOX_policies_2011, dtype: int64
df['total_revenue_2011_logged'] = np.log(df['total_revenue_2011'])
df[['total_revenue_2011_logged', 'total_revenue_2011']].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
total_revenue_2011_logged | 4811 | 15.530981 | 1.282845 | 12.586466 | 14.569243 | 15.36338 | 16.27977 | 2.200080e+01 |
total_revenue_2011 | 4833 | 17176106.777985 | 75704074.259743 | -42638874.000000 | 2103386.000000 | 4673878.00000 | 11721565.00000 | 3.587775e+09 |
df['advisory'] = df['']
#IVs = '%s + ' % IV
#DV = '%s ~ ' % DV
IVs = 'SOX_policies_2011 '
#DV = 'advisory ~ '
DV = 'donor_advisory_2016 ~ '
controls = '+ total_revenue_2011_logged + program_expense_percent_2011 + age + \
complexity_2011 + C(category)'
#admin_expense_percent + leader_comp_percent + budget_surplus
logit_formula = DV+IVs+controls
print logit_formula
#globals()["mod%s" % model_num] = smf.logit(formula=logit_formula, data=df).fit()
#print globals()["mod%s" % model_num].summary()
# #print model_num.summary()
#print '\n', "Chi-squared value:", globals()["mod%s" % model_num].llr, '\n' #TO GET THE CHI-SQUARED VALUE
donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged + program_expense_percent_2011 + age + complexity_2011 + C(category)
df.to_csv('df.csv', encoding='utf-8')
logit_variables = ['donor_advisory_2016', 'SOX_policies_2011', 'total_revenue_2011_logged',
'program_expense_percent_2011', 'age', 'complexity_2011', 'state_2011', 'category']
df_2011 = df[logit_variables]
print len(df_2011)
len(df_2011.dropna())
84958
4808
#IVs = '%s + ' % IV
#DV = '%s ~ ' % DV
IVs = 'SOX_policies_2011 '
#DV = 'advisory ~ '
DV = 'donor_advisory_2016 ~ '
controls = '+ total_revenue_2011_logged + program_expense_percent_2011 + age + \
complexity_2011'
#admin_expense_percent + leader_comp_percent + budget_surplus
logit_formula = DV+IVs+controls
print logit_formula
#globals()["mod%s" % model_num] = smf.logit(formula=logit_formula, data=df).fit()
#print globals()["mod%s" % model_num].summary()
# #print model_num.summary()
#print '\n', "Chi-squared value:", globals()["mod%s" % model_num].llr, '\n' #TO GET THE CHI-SQUARED VALUE
donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged + program_expense_percent_2011 + age + complexity_2011
logit = smf.logit(formula=logit_formula, data=df_2011).fit()
logit.summary()
Optimization terminated successfully. Current function value: 0.044916 Iterations 10
Dep. Variable: | donor_advisory_2016 | No. Observations: | 4808 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4802 |
Method: | MLE | Df Model: | 5 |
Date: | Wed, 31 Aug 2016 | Pseudo R-squ.: | 0.08557 |
Time: | 21:43:38 | Log-Likelihood: | -215.96 |
converged: | True | LL-Null: | -236.17 |
LLR p-value: | 1.229e-07 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -3.0296 | 1.925 | -1.574 | 0.115 | -6.802 0.743 |
SOX_policies_2011 | -0.4992 | 0.145 | -3.448 | 0.001 | -0.783 -0.215 |
total_revenue_2011_logged | 0.3113 | 0.138 | 2.260 | 0.024 | 0.041 0.581 |
program_expense_percent_2011 | -0.0285 | 0.009 | -3.068 | 0.002 | -0.047 -0.010 |
age | -0.0104 | 0.009 | -1.130 | 0.258 | -0.028 0.008 |
complexity_2011 | -1.2189 | 0.358 | -3.402 | 0.001 | -1.921 -0.517 |
#def new_logit(IV,model_num):
def new_logit_clustered(data, DV, columns, FE, model_num):
#IVs = '%s + ' % IV
#DV = 'RTs_binary ~ '
DV = '%s ~ ' % DV
#controls = 'from_user_followers_count + time_on_twitter_days + CSR_sustainability + \
# URLs_binary + photo'
IVs = ' + '.join(columns)
FE = '%s ' % FE
logit_formula = DV+IVs+FE
print logit_formula
globals()["mod%s" % model_num] = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',
cov_kwds={'groups': df['firm_from_user_screen_name']})
print globals()["mod%s" % model_num].summary()
#print model_num.summary()
print '\n', "Chi-squared value:", globals()["mod%s" % model_num].llr, '\n' #TO GET THE CHI-SQUARED VALUE
#print '\n', "Pseudo R-squared:", globals()["mod%s" % model_num].prsquared #TO GET THE PSEUDO-R-SQUARED
logit = smf.logit(formula=logit_formula, data=df).fit()
logit.summary()
Optimization terminated successfully. Current function value: 0.044916 Iterations 10
Dep. Variable: | donor_advisory_2016 | No. Observations: | 4808 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4802 |
Method: | MLE | Df Model: | 5 |
Date: | Wed, 31 Aug 2016 | Pseudo R-squ.: | 0.08557 |
Time: | 21:46:12 | Log-Likelihood: | -215.96 |
converged: | True | LL-Null: | -236.17 |
LLR p-value: | 1.229e-07 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -3.0296 | 1.925 | -1.574 | 0.115 | -6.802 0.743 |
SOX_policies_2011 | -0.4992 | 0.145 | -3.448 | 0.001 | -0.783 -0.215 |
total_revenue_2011_logged | 0.3113 | 0.138 | 2.260 | 0.024 | 0.041 0.581 |
program_expense_percent_2011 | -0.0285 | 0.009 | -3.068 | 0.002 | -0.047 -0.010 |
age | -0.0104 | 0.009 | -1.130 | 0.258 | -0.028 0.008 |
complexity_2011 | -1.2189 | 0.358 | -3.402 | 0.001 | -1.921 -0.517 |
len(df[df['2011 data']==1])
4863
df[df['2011 data']==1][['SOX_policies_2011', 'total_revenue_2011_logged', 'program_expense_percent_2011',
'age', 'complexity_2011']].describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
SOX_policies_2011 | 4833 | 2.533416 | 0.869466 | 0.000000 | 2.000000 | 3.00000 | 3.00000 | 3.000000 |
total_revenue_2011_logged | 4811 | 15.530981 | 1.282845 | 12.586466 | 14.569243 | 15.36338 | 16.27977 | 22.000798 |
program_expense_percent_2011 | 4833 | 80.416325 | 10.553429 | 2.200000 | 75.500000 | 81.60000 | 87.00000 | 99.700000 |
age | 4860 | 40.051029 | 19.240216 | 0.000000 | 25.000000 | 35.00000 | 52.00000 | 108.000000 |
complexity_2011 | 4833 | 2.466791 | 0.514468 | 1.000000 | 2.000000 | 2.00000 | 3.00000 | 3.000000 |
len(df[df['state_2011'].notnull()])
4863
df['state_2011'].value_counts()
NY 672 CA 649 DC 333 FL 283 TX 239 VA 187 MA 183 IL 172 PA 169 OH 127 CO 126 GA 114 WA 110 MD 108 MI 107 MN 94 NC 93 MO 92 NJ 86 TN 80 OR 79 AZ 75 CT 71 WI 64 IN 52 KY 37 SC 35 NE 33 LA 29 OK 29 ME 27 UT 26 AL 26 KS 24 NM 24 IA 23 MT 22 HI 20 NH 17 NV 17 RI 15 MS 15 VT 14 AR 14 DE 13 WY 9 AK 8 ID 7 SD 6 WV 5 ND 2 PR 1 Name: state_2011, dtype: int64
len(df[ (df['2011 data']==1) & (df['state_2011'].isnull())])
0
df_2011.dropna().describe().T
count | mean | std | min | 25% | 50% | 75% | max | |
---|---|---|---|---|---|---|---|---|
donor_advisory_2016 | 4808 | 0.008527 | 0.091959 | 0.000000 | 0.000000 | 0.000000 | 0.0000 | 1.000000 |
SOX_policies_2011 | 4808 | 2.534110 | 0.869042 | 0.000000 | 2.000000 | 3.000000 | 3.0000 | 3.000000 |
total_revenue_2011_logged | 4808 | 15.532027 | 1.282551 | 12.586466 | 14.569824 | 15.364199 | 16.2804 | 22.000798 |
program_expense_percent_2011 | 4808 | 80.427787 | 10.558554 | 2.200000 | 75.500000 | 81.600000 | 87.0000 | 99.700000 |
age | 4808 | 40.016015 | 19.198956 | 0.000000 | 25.000000 | 35.000000 | 52.0000 | 108.000000 |
complexity_2011 | 4808 | 2.466722 | 0.514543 | 1.000000 | 2.000000 | 2.000000 | 3.0000 | 3.000000 |
logit = smf.logit(formula=logit_formula, data=df_2011.dropna()).fit(cov_type='cluster',
cov_kwds={'groups': df_2011.dropna()['state_2011']})
logit.summary()
Warning: Maximum number of iterations has been exceeded. Current function value: 0.043361 Iterations: 35
//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning)
Dep. Variable: | donor_advisory_2016 | No. Observations: | 4808 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4792 |
Method: | MLE | Df Model: | 15 |
Date: | Wed, 31 Aug 2016 | Pseudo R-squ.: | 0.1172 |
Time: | 21:48:06 | Log-Likelihood: | -208.48 |
converged: | False | LL-Null: | -236.17 |
LLR p-value: | 1.546e-06 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -3.4848 | 1.884 | -1.849 | 0.064 | -7.178 0.208 |
C(category)[T.Arts, Culture, Humanities] | -0.7432 | 0.965 | -0.770 | 0.441 | -2.635 1.149 |
C(category)[T.Community Development] | 0.2960 | 0.856 | 0.346 | 0.730 | -1.382 1.974 |
C(category)[T.Education] | -0.1108 | 0.901 | -0.123 | 0.902 | -1.877 1.655 |
C(category)[T.Environment] | -17.5510 | 0.614 | -28.598 | 0.000 | -18.754 -16.348 |
C(category)[T.Health] | -0.1414 | 0.744 | -0.190 | 0.849 | -1.600 1.317 |
C(category)[T.Human Services] | -0.2230 | 0.760 | -0.293 | 0.769 | -1.713 1.268 |
C(category)[T.Human and Civil Rights] | -0.0320 | 0.774 | -0.041 | 0.967 | -1.550 1.486 |
C(category)[T.International] | -0.0034 | 0.766 | -0.004 | 0.996 | -1.505 1.498 |
C(category)[T.Religion] | 1.1258 | 0.708 | 1.589 | 0.112 | -0.263 2.514 |
C(category)[T.Research and Public Policy] | 0.5617 | 0.799 | 0.703 | 0.482 | -1.004 2.127 |
SOX_policies_2011 | -0.4728 | 0.124 | -3.805 | 0.000 | -0.716 -0.229 |
total_revenue_2011_logged | 0.2978 | 0.126 | 2.356 | 0.018 | 0.050 0.546 |
program_expense_percent_2011 | -0.0330 | 0.011 | -2.998 | 0.003 | -0.055 -0.011 |
age | -0.0104 | 0.007 | -1.499 | 0.134 | -0.024 0.003 |
complexity_2011 | -0.8049 | 0.465 | -1.730 | 0.084 | -1.717 0.107 |
logit = smf.logit(formula=logit_formula, data=df).fit()
logit.summary()
Warning: Maximum number of iterations has been exceeded. Current function value: 0.043361 Iterations: 35
//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals "Check mle_retvals", ConvergenceWarning)
Dep. Variable: | donor_advisory_2016 | No. Observations: | 4808 |
---|---|---|---|
Model: | Logit | Df Residuals: | 4792 |
Method: | MLE | Df Model: | 15 |
Date: | Wed, 31 Aug 2016 | Pseudo R-squ.: | 0.1172 |
Time: | 21:48:36 | Log-Likelihood: | -208.48 |
converged: | False | LL-Null: | -236.17 |
LLR p-value: | 1.546e-06 |
coef | std err | z | P>|z| | [95.0% Conf. Int.] | |
---|---|---|---|---|---|
Intercept | -3.4848 | 2.010 | -1.734 | 0.083 | -7.424 0.454 |
C(category)[T.Arts, Culture, Humanities] | -0.7432 | 0.940 | -0.790 | 0.429 | -2.586 1.100 |
C(category)[T.Community Development] | 0.2960 | 0.835 | 0.354 | 0.723 | -1.341 1.933 |
C(category)[T.Education] | -0.1108 | 0.929 | -0.119 | 0.905 | -1.931 1.709 |
C(category)[T.Environment] | -17.5510 | 4367.028 | -0.004 | 0.997 | -8576.769 8541.667 |
C(category)[T.Health] | -0.1414 | 0.755 | -0.187 | 0.851 | -1.620 1.338 |
C(category)[T.Human Services] | -0.2230 | 0.707 | -0.315 | 0.753 | -1.609 1.163 |
C(category)[T.Human and Civil Rights] | -0.0320 | 0.934 | -0.034 | 0.973 | -1.863 1.799 |
C(category)[T.International] | -0.0034 | 0.790 | -0.004 | 0.997 | -1.552 1.545 |
C(category)[T.Religion] | 1.1258 | 0.673 | 1.672 | 0.095 | -0.194 2.445 |
C(category)[T.Research and Public Policy] | 0.5617 | 0.930 | 0.604 | 0.546 | -1.260 2.384 |
SOX_policies_2011 | -0.4728 | 0.146 | -3.240 | 0.001 | -0.759 -0.187 |
total_revenue_2011_logged | 0.2978 | 0.139 | 2.150 | 0.032 | 0.026 0.569 |
program_expense_percent_2011 | -0.0330 | 0.010 | -3.336 | 0.001 | -0.052 -0.014 |
age | -0.0104 | 0.010 | -1.081 | 0.280 | -0.029 0.008 |
complexity_2011 | -0.8049 | 0.394 | -2.045 | 0.041 | -1.576 -0.034 |
df_2011.dropna()['category'].value_counts()
Human Services 1188 Arts, Culture, Humanities 670 Health 574 International 428 Community Development 381 Animals 372 Environment 317 Religion 295 Education 280 Human and Civil Rights 181 Research and Public Policy 122 Name: category, dtype: int64
logit = smf.logit(formula=logit_formula, data=df[df['2011 data']==1]).fit(cov_type='cluster',
cov_kwds={'groups': df[df['2011 data']]['state_2011']})
logit.summary()
Optimization terminated successfully. Current function value: 0.044916 Iterations 10
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-513-f1b037f5d426> in <module>() 1 logit = smf.logit(formula=logit_formula, data=df[df['2011 data']==1]).fit(cov_type='cluster', ----> 2 cov_kwds={'groups': df[df['2011 data']==1]['state_2011']}) 3 logit.summary() //anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 1374 bnryfit = super(Logit, self).fit(start_params=start_params, 1375 method=method, maxiter=maxiter, full_output=full_output, -> 1376 disp=disp, callback=callback, **kwargs) 1377 1378 discretefit = LogitResults(self, bnryfit) //anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 201 mlefit = super(DiscreteModel, self).fit(start_params=start_params, 202 method=method, maxiter=maxiter, full_output=full_output, --> 203 disp=disp, callback=callback, **kwargs) 204 205 return mlefit # up to subclasses to wrap results //anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs) 455 #print('kwds inLikelihoodModel.fit', kwds) 456 #TODO: add Hessian approximation and change the above if needed --> 457 mlefit = LikelihoodModelResults(self, xopt, Hinv, scale=1., **kwds) 458 459 #TODO: hardcode scale? //anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc in __init__(self, model, params, normalized_cov_params, scale, **kwargs) 940 # TODO: we shouldn't need use_t in get_robustcov_results 941 get_robustcov_results(self, cov_type=cov_type, use_self=True, --> 942 use_t=use_t, **cov_kwds) 943 944 //anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc in get_robustcov_results(self, cov_type, use_t, **kwds) 193 self.n_groups = n_groups = len(np.unique(groups)) 194 res.cov_params_default = sw.cov_cluster(self, groups, --> 195 use_correction=use_correction) 196 197 elif groups.ndim == 2: //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in cov_cluster(results, group, use_correction) 535 clusters = np.unique(group) 536 --> 537 scale = S_crosssection(xu, group) 538 539 nobs, k_params = xu.shape //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in S_crosssection(x, group) 489 490 ''' --> 491 x_group_sums = group_sums(x, group).T #TODO: why transposed 492 493 return S_white_simple(x_group_sums) //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in group_sums(x, group) 435 #TODO: transpose return in group_sum, need test coverage first 436 return np.array([np.bincount(group, weights=x[:, col]) --> 437 for col in range(x.shape[1])]) 438 439 ValueError: The weights and list don't have the same length.
logit = smf.logit(formula=logit_formula, data=df[df['total_revenue_2011_logged'].notnull()]).fit(cov_type='cluster',
cov_kwds={'groups': df[df['total_revenue_2011_logged'].notnull()]['state_2011']})
logit.summary()
Optimization terminated successfully. Current function value: 0.044916 Iterations 10
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-526-17812cb557f5> in <module>() 1 logit = smf.logit(formula=logit_formula, data=df[df['total_revenue_2011_logged'].notnull()]).fit(cov_type='cluster', ----> 2 cov_kwds={'groups': df[df['total_revenue_2011_logged'].notnull()]['state_2011']}) 3 logit.summary() //anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 1374 bnryfit = super(Logit, self).fit(start_params=start_params, 1375 method=method, maxiter=maxiter, full_output=full_output, -> 1376 disp=disp, callback=callback, **kwargs) 1377 1378 discretefit = LogitResults(self, bnryfit) //anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs) 201 mlefit = super(DiscreteModel, self).fit(start_params=start_params, 202 method=method, maxiter=maxiter, full_output=full_output, --> 203 disp=disp, callback=callback, **kwargs) 204 205 return mlefit # up to subclasses to wrap results //anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs) 455 #print('kwds inLikelihoodModel.fit', kwds) 456 #TODO: add Hessian approximation and change the above if needed --> 457 mlefit = LikelihoodModelResults(self, xopt, Hinv, scale=1., **kwds) 458 459 #TODO: hardcode scale? //anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc in __init__(self, model, params, normalized_cov_params, scale, **kwargs) 940 # TODO: we shouldn't need use_t in get_robustcov_results 941 get_robustcov_results(self, cov_type=cov_type, use_self=True, --> 942 use_t=use_t, **cov_kwds) 943 944 //anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc in get_robustcov_results(self, cov_type, use_t, **kwds) 193 self.n_groups = n_groups = len(np.unique(groups)) 194 res.cov_params_default = sw.cov_cluster(self, groups, --> 195 use_correction=use_correction) 196 197 elif groups.ndim == 2: //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in cov_cluster(results, group, use_correction) 535 clusters = np.unique(group) 536 --> 537 scale = S_crosssection(xu, group) 538 539 nobs, k_params = xu.shape //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in S_crosssection(x, group) 489 490 ''' --> 491 x_group_sums = group_sums(x, group).T #TODO: why transposed 492 493 return S_white_simple(x_group_sums) //anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc in group_sums(x, group) 435 #TODO: transpose return in group_sum, need test coverage first 436 return np.array([np.bincount(group, weights=x[:, col]) --> 437 for col in range(x.shape[1])]) 438 439 ValueError: The weights and list don't have the same length.
print len(df[(df['2011 data']==1) & (df['donor_advisory_2016'].notnull())])
print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==0)])
print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==1)])
4863 4816 47
print cols_2011
['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']
BMF_columns = ['NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF',
'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF',
'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL',
'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL',
'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL',
'rule_date_v1', 'taxpd']
df['']df[(df['SOX_policies_2011'].notnull())]
print len(df[df['current_donor_advisory']==1])
DA_2016 = df[df['current_donor_advisory']==1]['org_id'].tolist()
print len(DA_2016), len(set(DA_2016))
DA_2016[:3]
321 321 321
['16722', '14954', '16155']
df['donor_advisory_2016'] = np.nan
df['donor_advisory_2016'] = np.where( df['org_id'].isin(DA_2016), 1,0
)
df['donor_advisory_2016'].value_counts()
0 84590 1 368 Name: donor_advisory_2016, dtype: int64
df[(df['SOX_policies_2011'].notnull())][['name', 'org_id', 'FYE', '2011 data', #'current_or_past_donor_advisory',
'current_donor_advisory', 'past_donor_advisory', 'SOX_policies_2011']][:24]
name | org_id | FYE | 2011 data | current_donor_advisory | past_donor_advisory | SOX_policies_2011 | |
---|---|---|---|---|---|---|---|
21 | 10,000 Degrees | 6466 | FY2009 | 1 | 0 | 0 | 3 |
39 | 100 Club of Arizona | 12123 | FY2009 | 1 | 0 | 0 | 2 |
52 | 1000 Friends of Florida | 10092 | FY2008 | 1 | 0 | 0 | 0 |
63 | 1000 Friends of Oregon | 8770 | FY2010 | 1 | 0 | 0 | 3 |
148 | 4 Paws for Ability | 13055 | FY2009 | 1 | 0 | 0 | 3 |
161 | The 92nd Street Y | 4792 | FY2010 | 1 | 0 | 0 | 3 |
185 | A Better Chance | 6082 | FY2010 | 1 | 0 | 0 | 3 |
208 | A Contemporary Theatre | 3634 | FY2009 | 1 | 0 | 0 | 3 |
228 | A Kid Again | 9239 | FY2009 | 1 | 0 | 0 | 3 |
244 | A Noise Within | 10176 | FY2010 | 1 | 0 | 0 | 0 |
255 | A Place Called Home | 8040 | FY2010 | 1 | 0 | 0 | 3 |
276 | A.J. Muste Memorial Institute | 6096 | FY2009 | 1 | 0 | 0 | 3 |
292 | AAA Foundation for Traffic Safety | 8302 | FY2009 | 1 | 0 | 0 | 3 |
312 | Aaron Diamond AIDS Research Center | 4991 | FY2010 | 1 | 0 | 0 | 3 |
328 | AARP Foundation | 3205 | FY2009 | 1 | 0 | 0 | 3 |
337 | AAUW - American Association of University Women | 3240 | FY2010 | 1 | 0 | 0 | 3 |
362 | Abilities United | 7940 | FY2010 | 1 | 0 | 0 | 3 |
373 | The Ability Experience | 7632 | FY2010 | 1 | 0 | 0 | 3 |
385 | Abode Services | 9182 | FY2010 | 1 | 0 | 0 | 3 |
399 | The Abraham Fund Initiatives | 9371 | FY2009 | 1 | 0 | 0 | 3 |
426 | Abused Deaf Women's Advocacy Services | 12762 | FY2009 | 1 | 0 | 0 | 3 |
441 | Academy of Achievement | 5705 | FY2009 | 1 | 0 | 0 | 1 |
458 | Academy of American Poets | 9256 | FY2009 | 1 | 0 | 0 | 3 |
477 | The Academy of Natural Sciences of Drexel University | 3209 | FY2009 | 1 | 0 | 0 | 3 |
#df['advisory'] = np.where(~df['advisory text - current advisory'].isnull(), 1,0)
#df['advisory'].value_counts()
0 84637 1 321 Name: advisory, dtype: int64
#df[(df['2011 data']==1) & (df['past_donor_advisory']==1)][:5]
File "<unknown>", line 1 2016 _donor_advisory ^ SyntaxError: invalid syntax
controls = ['total_revenue_2011', 'program_expense_percent_2011',
]
len(df[(df['2011 data']==1) & (df['total_revenue_2011'].isnull())])#[controls]
df[df['2011 data']==1][:2]
org_id | EIN | org_url | name | category | category-full | Date Published | Form 990 FYE | Form 990 FYE, v2 | FYE | Earliest Rating Publication Date | ratings_system | Overall Score | Overall Rating | advisory text - current advisory | advisory text - past advisory | current_or_past_donor_advisory | current_donor_advisory | past_donor_advisory | latest_entry | current_ratings_url | ein_2016 | Publication_date_and_FY_2016 | Publication Date_2016 | FYE_2016 | donor_alert_2016 | overall_rating_2016 | efficiency_rating_rating_2016 | AT_rating_2016 | overall_rating_star_2016 | financial_rating_star_2016 | AT_rating_star_2016 | program_expense_percent_2016 | admin_expense_percent_2016 | fund_expense_percent_2016 | fund_efficiency_2016 | working_capital_ratio_2016 | program_expense_growth_2016 | liabilities_to_assets_2016 | independent_board_2016 | no_material_division_2016 | audited_financials_2016 | no_loans_related_2016 | documents_minutes_2016 | form_990_2016 | conflict_of_interest_policy_2016 | whistleblower_policy_2016 | records_retention_policy_2016 | CEO_listed_2016 | process_CEO_compensation_2016 | no_board_compensation_2016 | donor_privacy_policy_2016 | board_listed_2016 | audited_financials_web_2016 | form_990_web_2016 | staff_listed_2016 | contributions_gifts_grants_2016 | federated_campaigns_2016 | membership_dues_2016 | fundraising_events_2016 | related_organizations_2016 | government_grants_2016 | total_contributions_2016 | program_service_revenue_2016 | total_primary_revenue_2016 | other_revenue_2016 | total_revenue_2016 | program_expenses_2016 | administrative_expenses_2016 | fundraising_expenses_2016 | total_functional_expenses_2016 | payments_to_affiliates_2016 | excess_or_deficit_2016 | net_assets_2016 | comp_2016 | cp_2016 | mission_2016 | 2011 data | charity_name_2011 | category_2011 | city_2011 | state_2011 | cause_2011 | tag_line_2011 | url_2011 | ein_2011 | fye_2011 | overall_rating_2011 | overall_rating_2011_plus_30 | overall_rating_2011_plus_30_v2 | overall_rating_star_2011 | overall_rating_star_2011_text | efficiency_rating_2011 | AT_rating_2011 | financial_rating_star_2011 | AT_rating_star_2011 | program_expense_percent_2011 | admin_expense_percent_2011 | fund_expense_percent_2011 | fund_efficiency_2011 | primary_revenue_growth_2011 | program_expense_growth_2011 | working_capital_ratio_2011 | independent_board_2011 | no_material_division_2011 | audited_financials_2011 | no_loans_related_2011 | documents_minutes_2011 | form_990_2011 | conflict_of_interest_policy_2011 | whistleblower_policy_2011 | records_retention_policy_2011 | CEO_listed_2011 | process_CEO_compensation_2011 | no_board_compensation_2011 | donor_privacy_policy_2011 | board_listed_2011 | audited_financials_web_2011 | form_990_web_2011 | staff_listed_2011 | primary_revenue_2011 | other_revenue_2011 | total_revenue_2011 | govt_revenue_2011 | program_expense_2011 | admin_expense_2011 | fund_expense_2011 | total_functional_expense_2011 | affiliate_payments_2011 | budget_surplus_2011 | net_assets_2011 | leader_comp_2011 | leader_comp_percent_2011 | email_2011 | website_2011 | 2016 Advisory - Date Posted | 2016 Advisory - Charity Name | 2016 Advisory - advisory_url | 2016 Advisory - advisory | _merge_v1 | to_be_merged | NEW ROW | NAME_2015_BMF | STREET_2015_BMF | CITY_2015_BMF | STATE_2015_BMF | ZIP_2015_BMF | RULING_2015_BMF | ACTIVITY_2015_BMF | TAX_PERIOD_2015_BMF | ASSET_AMT_2015_BMF | INCOME_AMT_2015_BMF | REVENUE_AMT_2015_BMF | NTEE_CD_2015_BMF | 2015 BMF | ruledate_2004_BMF | name_MSTRALL | state_MSTRALL | NTEE1_MSTRALL | nteecc_MSTRALL | zip_MSTRALL | fips_MSTRALL | taxper_MSTRALL | income_MSTRALL | F990REV_MSTRALL | assets_MSTRALL | ruledate_MSTRALL | deductcd_MSTRALL | accper_MSTRALL | rule_date_v1 | taxpd | NAME_SOI | yr_frmtn | pt1_num_vtng_gvrn_bdy_mems | pt1_num_ind_vtng_mems | num_vtng_gvrn_bdy_mems | num_ind_vtng_mems | tot_num_empls | tot_num_vlntrs | contri_grnts_cy | prog_srvc_rev_cy | invst_incm_cy | oth_rev_cy | grnts_and_smlr_amts_cy | tot_prof_fndrsng_exp_cy | tot_fndrsng_exp_cy | pt1_tot_asts_eoy | aud_fincl_stmts | mtrl_divrsn_or_misuse | cnflct_int_plcy | whistleblower_plcy | doc_retention_plcy | federated_campaigns | memshp_dues | rltd_orgs | govt_grnts | all_oth_contri | nncsh_contri | tot_contri | psr_tot | inv_incm_tot_rev | bonds_tot_rev | roylrev_tot_rev | net_rent_tot_rev | gain_or_loss_sec | gain_or_loss_oth | oth_rev_tot | tot_rev | mgmt_srvc_fee_tot | fee_for_srvc_leg_tot | fee_for_srvc_acct_tot | fee_for_srvc_lbby_tot | fee_for_srvc_prof_tot | fee_for_srvc_invst_tot | fee_for_srvc_oth_tot | fs_audited | audit_committee | vlntr_hrs | _merge | rule_date | ruledate_2004_BMF_v2 | ruledate_MSTRALL_v2 | yr_frmtn_v2 | age | category_Animals | category_Arts, Culture, Humanities | category_Community Development | category_Education | category_Environment | category_Health | category_Human Services | category_Human and Civil Rights | category_International | category_Religion | category_Research and Public Policy | govt_revenue_2011_binary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | 10166 | 043314346 | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166 | Angel Flight Northeast | Health | Health : Patient and Family Support | 2011-01-05 00:00:00 | 2009-12 | 2009-12-01 | FY2009 | 2005-12-01 | CN 1.0 | -- | Donor Advisory | NaN | This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this charity because we became aware of the following informati... | 1 | 0 | 1 | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | Angel Flight Northeast | Health | North Andover | MA | Patient and Family Support | Providing free flights so children and adults can access medical care since 1996 | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166 | 04-3314346 | NaN | NaN | -- | -- | NaN | Missing - Apparent Donor Advisory | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | left_only | 1 | NaN | ANGEL FLIGHT OF NEW ENGLAND INC | LAWRENCE MUNICIPAL AIRPORT | NORTH ANDOVER | MA | 01845-0000 | 200812 | 994179000 | 201312 | 869310 | 896259 | 3877845 | E99 | 1 | 199608 | ANGEL FLIGHT NEW ENG | MA | E | E87 | 01867-1110 | 25017 | 200012 | 539450 | 520862 | 318758 | 199608 | 1 | 12 | 1996 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | left_only | 2008 | 1996 | 1996 | nan | 8 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | NaN |
21 | 6466 | 953667812 | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466 | 10,000 Degrees | Education | Education : Scholarship and Financial Support | 2011-09-20 00:00:00 | 2009-06 | 2009-06-01 | FY2009 | 2003-09-01 | CN 2.0 | 85.33 | 3 stars | NaN | NaN | 0 | 0 | 0 | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 10,000 Degrees | Education | San Rafael | CA | Other Education Programs and Services | Creating College Graduates Who Change the World | http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466 | 95-3667812 | 06/2009 | 55.33 | 85.33 | 85.33 | 3 | 3 stars | 52.42 | 59 | 3 | 3 | 79.7 | 7.3 | 12.8 | 0.11 | 3.4 | 0 | 0.67 | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | NO | yes | NO | NO | yes | 3914222 | 216503 | 4130725 | Note: This organization receives $0 in government support. | 2813532 | 260007 | 454629 | 3528168 | 0 | 602557 | 3389166 | 154300 | 4.37 | info@10000degrees.org | http://www.10000degrees.org | NaN | NaN | NaN | NaN | both | 0 | NaN | 10000 DEGREES | 1650 LOS GAMOS SUITE 110 | SAN RAFAEL | CA | 94903-1838 | 198105 | 40000000 | 201506 | 8611662 | 7760209 | 7627694 | B11 | 1 | 198211 | MARIN EDUC FND | CA | B | B20 | 94901-2920 | 06041 | 200106 | 3958011 | 3958011 | 1958251 | 198211 | 1 | 06 | 1982 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | left_only | 1981 | 1982 | 1982 | nan | 35 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
print df.columns.tolist()
['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011 data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']
SOI_columns = ['taxpd', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems',
'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy',
'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy',
'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse',
'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues',
'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot',
'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec',
'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot',
'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot',
'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', 'NAME_SOI']
len(SOI_columns)