This is a second post exploring the data that I collected from the IGM Experts Panel, which surveys a group of leading economists on a variety of policy questions. A CSV of all the data is available here, as are separate datasets of the questions and responses.

One of the interesting things that Gordon and Dahl looked at in their 2012 paper [1, PDF] was how individual characteristics of economists might influence their responses. They compiled information on each economist including the institution of study, graduation year, current university, field of specialization, gender and NBER classification. Ten economists have been added to the group since 2012, so I did my best to gather this information from their CVs and the NBER database. The final CSV of individual characteristics is available here.

Below, I join this dataset of individual characteristics with the responses data from an earlier post, and look at some of the relationships graphically. I don't have a background in economics or a very good understanding of regression analysis, so I stick to plotting trends rather than claiming statistical significance.

In [8]:

```
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
#pd.set_option('max_colwidth', 30)
pd.set_option('max_colwidth', 400)
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
df_responses = pd.read_csv('output_all.csv')
```

In [9]:

```
r_list = ['Strongly Disagree', 'Disagree', 'Uncertain', 'Agree', 'Strongly Agree']
def indicator(x):
    # Map a response label to its 0-4 position in r_list; anything else becomes None
    if x in r_list:
        return r_list.index(x)
    else:
        return None
df_responses['vote_num'] = df_responses['vote'].apply(indicator)
df_responses['median_num'] = df_responses['median_vote'].apply(indicator)
df_responses['vote_distance'] = abs(df_responses['median_num'] - df_responses['vote_num'])
# Construct a continuous column, incorporating confidence into vote_num
# Divide by 11 so 10 confidence of agree < 0 confidence of strongly agree
df_responses['incr_votenum'] = df_responses['vote_num'] + df_responses['confidence'] / 11.0
# Median incr_votenum for each question:
df_responses['median_incrvotenum'] = df_responses.groupby(
    ['qtitle','subquestion'])['incr_votenum'].transform('median')
# Calculate distance from median for each econ vote, less biased by outliers.
df_responses['distance_median'] = abs(df_responses['median_incrvotenum'] -
                                      df_responses['incr_votenum'])
df_responses.shape
```

Out[9]:
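
To see why the cell above divides by 11 rather than 10, here is a small standalone sketch of the same scoring idea. It assumes confidence is a whole number from 0 to 10, and the helper name `incr_score` is mine, not part of the notebook.

```python
# Standalone sketch of the vote + confidence/11 score; `incr_score` is a
# hypothetical helper name, and confidence is assumed to be an integer 0-10.
R_LIST = ['Strongly Disagree', 'Disagree', 'Uncertain', 'Agree', 'Strongly Agree']

def incr_score(vote, confidence):
    """Return the vote's position in R_LIST plus confidence / 11."""
    return R_LIST.index(vote) + confidence / 11.0

most_confident_agree = incr_score('Agree', 10)            # 3 + 10/11
least_confident_strong = incr_score('Strongly Agree', 0)  # 4 + 0

# Dividing by 11 keeps the confidence term strictly below 1, so a vote can
# never catch up with the next category. Dividing by 10 would produce a tie.
print(most_confident_agree < least_confident_strong)
print(R_LIST.index('Agree') + 10 / 10.0 == R_LIST.index('Strongly Agree'))
```

The ordering of the five categories is therefore preserved, and confidence only ranks votes within a category.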

Next, I load and process the individual variables, getting them into the correct format to join with my responses dataset. I also load the individual characteristics for the ten new economists, and concatenate those with the original individual characteristics from Gordon and Dahl.

The result of the join is a dataset with `8402` rows and `26` columns.

In [12]:

```
df_indvars = pd.read_csv('individual-vars.csv')
df_indvars['name'] = df_indvars['name'].str.split(', ').map(lambda row: row[1] + ' ' + row[0])
institutions = ['Berkeley', 'Chicago', 'Harvard', 'MIT', 'Princeton', 'Stanford', 'Yale']
def inst_ind(element):
    # Institution codes are 1-based positions in the institutions list
    return institutions[int(element) - 1]
df_indvars['phdfrom'] = df_indvars['phdfrom'].apply(inst_ind)
df_indvars['currentuniv'] = df_indvars['currentuniv'].apply(inst_ind)
def age_cohort(element):
    # 116 = 2016 - 1900; this assumes phdyear is stored as years since 1900
    cur_year = 116
    cohort = cur_year - int(element)
    if cohort <= 15:
        return '0 - 15 Years'
    elif cohort >= 30:
        return '30+ Years'
    else:
        return '15 - 30 Years'
df_indvars['cohort'] = df_indvars['phdyear'].apply(age_cohort)
ifields = {8:'Industrial Org', 7:'Public Finance', 6:'Labor', 5:'Finance', 4:'Macro', 3:'International'}
def ifield_ind(element):
    # Translate the numeric field code into a readable label
    return ifields[int(element)]
df_indvars['ifield'] = df_indvars['ifield'].apply(ifield_ind)
# Their coding error? Alesina is listed as MAC in appendix, but has ifield of 3?
# df_indvars[] change it?
# Load individual variables from 10 new economists
# Self coded based on personal websites, NBER information
# http://www.nber.org/programs/
df_indvarsnew = pd.read_csv('individual_vars_new.csv')
# Add cohort column
df_indvarsnew['cohort'] = df_indvarsnew['phdyear'].apply(age_cohort)
# Concat both old and new individual vars datasets
df_indvarsall = pd.concat([df_indvars, df_indvarsnew], ignore_index=True)
#df_indvarsall.to_csv('indvars_2016.csv', encoding='utf-8', index=False)
df_indvarsall.head()
# Gordon Online Appendix and Data:
# http://econweb.ucsd.edu/~gdahl/views-among-economists-code.html
# http://econweb.ucsd.edu/~gdahl/papers/views-among-economists-online-appendix.pdf
# Notes: PhD From and Current University categories are BER=Berkeley; CHI=Chicago, Rochester; HAR=Harvard,
# Cambridge, LSE, Wisconsin; MIT=MIT, Oxford; PRI=Princeton; STA=Stanford; YAL=Yale.
# Field categories are defined by primary NBER affiliation: MAC=macro (EFG, ME, POL); INT=international (IFM, ITI);
# FIN=finance (AP, CF); LAB=labor (LS, ED, AG, DAE, DEV); PF=public finance (PF, EEE);
# IO=industrial organization (IO, LE). Three panel members are not in the NBER;
# Ray Fair and James Stock are assigned to MAC, Eric Maskin is assigned to FIN.
# Female is an indicator equal to 1 for women. Wash is an indicator for experience serving in Washington.
```

Out[12]:
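
The `age_cohort` function can also be expressed with `pd.cut`, which bins a whole column at once. The sketch below is an alternative, not what the notebook runs. It assumes `phdyear` holds whole years counted from 1900 (as `cur_year = 116` suggests), and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Made-up sample: PhD years counted from 1900 (an assumption about the encoding)
phdyear = pd.Series([72, 86, 87, 100, 101, 110])
years_since_phd = 116 - phdyear

# Bin edges chosen to match age_cohort for whole-number inputs:
# <= 15 -> '0 - 15 Years', 16..29 -> '15 - 30 Years', >= 30 -> '30+ Years'
cohort = pd.cut(
    years_since_phd,
    bins=[-np.inf, 15, 29, np.inf],
    labels=['0 - 15 Years', '15 - 30 Years', '30+ Years'],
)
print(pd.DataFrame({'phdyear': phdyear, 'cohort': cohort}))
```

Because `pd.cut` returns an ordered categorical, grouped results come out in cohort order rather than alphabetical order.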

In [13]:

```
# Inner join, on name column
df_all = pd.merge(df_responses, df_indvarsall, on=['name'], how='inner')
print(df_all.shape)
# Remove middle initials for Brunnermeier, Kaplan, to get right shape of (8402, 26)
```
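
Name mismatches like the middle initials mentioned in the cell above silently drop rows from an inner join. One way to surface them, sketched here on made-up names, is an outer merge with `indicator=True`. The two small frames and their contents are hypothetical stand-ins for `df_responses` and `df_indvarsall`.

```python
import pandas as pd

# Hypothetical stand-ins for the responses and individual-characteristics data
responses = pd.DataFrame({
    'name': ['Jane Q. Doe', 'John Smith', 'John Smith'],
    'vote': ['Agree', 'Uncertain', 'Disagree'],
})
indvars = pd.DataFrame({
    'name': ['Jane Doe', 'John Smith'],
    'phdfrom': ['MIT', 'Chicago'],
})

# An outer merge keeps unmatched rows; the _merge column says which side they came from
check = pd.merge(responses, indvars, on='name', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['name', '_merge']]
print(unmatched)

# After removing the middle initial, the inner join keeps every response row
responses['name'] = responses['name'].replace({'Jane Q. Doe': 'Jane Doe'})
merged = pd.merge(responses, indvars, on='name', how='inner')
print(merged.shape)
```

Any row flagged `left_only` or `right_only` is a name that appears in just one of the two datasets and would be lost in the inner join.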

One of the interesting things Gordon and Dahl found was that economists who were educated at MIT and the University of Chicago seemed to be more confident. I find less evidence of this in the newer data, although this is just a boxplot, not a regression analysis.

In [14]:

```
df_all.boxplot(column='confidence', by='phdfrom', whis=[5.0,95.0])
# Summary statistics per PhD institution, most confident first
df_all.groupby('phdfrom').agg(
    {'confidence': ['mean', 'median', 'std', 'count']}).sort_values(
        by=('confidence', 'mean'),
        ascending=False)
```

Out[14]:

Another interesting thing to look at is whether any institutions produce graduates with views further from the median view. There are clear differences in the means and medians below, but I doubt they would rise to the level of statistical significance, because the standard deviations overlap.

It's also important to note that the vote counts by institution vary from `2414` (MIT) to `28` (Berkeley), so these responses don't represent the institution as a whole.
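
Group size matters for how much to read into these means. A rough way to see that, sketched here on synthetic data, is the standard error of each group mean (std / sqrt(count)). The values below are randomly generated, not the panel data; only the group sizes 2414 and 28 come from the text, and this is not a significance test.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

# Synthetic distances: same spread for both groups, only the sizes differ
toy = pd.DataFrame({
    'phdfrom': ['MIT'] * 2414 + ['Berkeley'] * 28,
    'distance_median': np.abs(rng.normal(0.0, 1.0, size=2414 + 28)),
})

summary = toy.groupby('phdfrom')['distance_median'].agg(['mean', 'std', 'count'])
# Standard error of the mean: a rough gauge of how precisely each mean is estimated
summary['sem'] = summary['std'] / np.sqrt(summary['count'])
print(summary)
```

With a similar spread in both groups, the mean of a 28-vote group is roughly nine times less precise than the mean of a 2414-vote group, since sqrt(2414 / 28) is about 9.3.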

In [8]:

```
df_all.boxplot(column='distance_median', by='phdfrom', whis=[5.0,95.0])
df_all.hist(column='distance_median', by='phdfrom', bins=20, figsize=(12,12))
# Summary statistics per PhD institution, furthest from the median first
df_all.groupby('phdfrom').agg(
    {'distance_median': ['mean', 'median', 'std', 'count']}).sort_values(
        by=('distance_median', 'mean'),
        ascending=False)
#Only 28 votes from Berkeley
#len(df_all[(df_all['phdfrom'] == 'Berkeley') & (df_all['vote'].isin(r_list))])
```

Out[8]:

Spending time in Washington doesn't seem to have much of an effect on confidence. Those who haven't spent time in Washington may have a slightly longer tail when it comes to controversial responses, but overall both groups are very similar.

In [18]:

```
df_all.boxplot(column='confidence', by='washington', whis=[5.0,95.0])
df_all.boxplot(column='distance_median', by='washington', whis=[5.0,95.0])
```

Out[18]:

Gordon and Dahl also used the NBER classification of each economist to look at how responses changed by field of study. I do something similar below, looking at `confidence` and `distance_median` grouped by field of study. The Finance and Public Finance groups seem a little more confident. The International and Labor groups seem to have a slightly higher `distance_median`, but it's unlikely any of this rises to the level of significance.

In [16]:

```
# http://econweb.ucsd.edu/~gdahl/papers/views-among-economists-online-appendix.pdf
# ifield is grouped nber classifications:
# Field categories are defined by primary NBER affiliation: MAC=macro (EFG, ME, POL); INT=international (IFM, ITI);
# FIN=finance (AP, CF); LAB=labor (LS, ED, AG, DAE, DEV); PF=public finance (PF, EEE);
# IO=industrial organization (IO, LE). Three panel members are not in the NBER;
# Ray Fair and James Stock are assigned to MAC, Eric Maskin is assigned to FIN.
df_all.boxplot(column='confidence', by='ifield', rot=90, whis=[5.0,95.0])
df_all.boxplot(column='distance_median', by='ifield', rot=90, whis=[5.0,95.0])
```

Out[16]:

One final thing to do is look at these characteristics by age cohort, defined as years since their PhD. Confidence doesn't seem to be too different by age cohort, but it does seem like there are more outlier responses in the older groups. Perhaps the older economists have earned the right to say controversial things?

In [19]:

```
df_all.boxplot(column='confidence', by='cohort', whis=[5.0,95.0])
df_all.boxplot(column='distance_median', by='cohort', whis=[5.0,95.0])
df_all.hist(column='distance_median', by='cohort', bins=20)
# Summary statistics per cohort, furthest from the median first
df_all.groupby('cohort').agg(
    {'distance_median': ['mean', 'median', 'std', 'count'],
     'confidence': ['mean', 'median', 'std']}).sort_values(
        by=('distance_median', 'mean'),
        ascending=False)
```

Out[19]: