**This is the stripped-down dw-nominate notebook.
If you're new to Python or want thorough documentation, please view the dw-nominate-detail notebook.**
____________
The Nominate scoring scale was first developed by Keith T. Poole and Howard Rosenthal in the late 1980s.
Since then, it has undergone several iterations, with the DW-series being the latest.
Scores are derived from roll call votes and contain two dimensions.
Read more about these metrics on the Voteview website.
The following notebook builds on a visualization made by the Pew Research Center by making a gif of changes in House ideology across Congresses.
Here's the finished product:
GIF hosted on GitHub.
We're going to use the following packages:
%matplotlib inline
import os
import glob
import us
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import imageio
The raw files are hosted at the following addresses:
senate_dl = 'ftp://k7moa.com/junkord/SL01113D21_BSSE.dat'
house_dl = 'ftp://k7moa.com/junkord/HL01113D21_BSSE.DAT'
We can store these links in a list of tuples; later we'll iterate through args to download the correct files.
args = [('senate', senate_dl),
        ('house', house_dl)]
for arg in args:
    print(arg[1])
ftp://k7moa.com/junkord/SL01113D21_BSSE.dat
ftp://k7moa.com/junkord/HL01113D21_BSSE.DAT
The column names below are copied and pasted from the Voteview docs.
cols = '''Congress Number
ICPSR ID Number
State Code
Congressional District Number
State Name
Party Code
Name
1st Dimension Coordinate
2nd Dimension Coordinate
1st Dimension Bootstrapped Standard Error
2nd Dimension Bootstrapped Standard Error
Correlation Between 1st and 2nd Dimension
Log-Likelihood
Number of Votes
Number of Classification Errors
Geometric Mean Probability'''.split('\n')
"We split the columns by a newline character (\n), to get {} columns.".format(len(cols))
'We split the columns by a newline character (\n), to get 16 columns.'
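# (start, end) character positions of each fixed-width column.
# Note: only 15 pairs are listed for the 16 column names, so the last
# column ('Geometric Mean Probability') shows up as NaN in the output below;
# newer pandas releases may reject the mismatch with a ValueError.
col_widths = [(0,4), (4,10), (10,13), (13,15),
              (15,23), (23,28), (28,40), (40,50),
              (50,60), (60,70), (70,80), (80,90),
              (90,102), (102,107), (107, 112)]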
# This dict is for compatibility with the ProPublica Congress API.
col_mapping = {'Name' : 'last_name',
               'Congress Number': 'Congress'}
# dict comprehension to convert state names to abbreviations.
state_dict = {state.name.upper()[:7] : state.abbr for state in us.states.STATES}
df = pd.read_fwf(house_dl, names=cols, colspecs=col_widths)
df.head(3)
|   | Congress Number | ICPSR ID Number | State Code | Congressional District Number | State Name | Party Code | Name | 1st Dimension Coordinate | 2nd Dimension Coordinate | 1st Dimension Bootstrapped Standard Error | 2nd Dimension Bootstrapped Standard Error | Correlation Between 1st and 2nd Dimension | Log-Likelihood | Number of Votes | Number of Classification Errors | Geometric Mean Probability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9062 | 1 | 98 | CONNECT | 5000 | STURGES | 0.648 | 0.330 | 0.1129 | 0.1204 | -0.1329 | -26.87624 | 80 | 12 | NaN |
| 1 | 1 | 9706 | 1 | 98 | CONNECT | 5000 | WADSWORTH | 0.759 | 0.136 | 0.0751 | 0.1067 | -0.2138 | -18.20466 | 86 | 4 | NaN |
| 2 | 1 | 8457 | 1 | 98 | CONNECT | 5000 | SHERMAN | 0.715 | 0.203 | 0.0759 | 0.2234 | -0.2127 | -39.45672 | 107 | 18 | NaN |
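To see how the `(start, end)` column specs carve up each line, here is a minimal sketch that runs `pd.read_fwf` on an in-memory string instead of the FTP file. The two records, the four column names, and the widths are made up for illustration and are not the Voteview layout.

```python
import io
import pandas as pd

# Hypothetical records, padded to fixed widths: 4 + 6 + 8 + 11 characters.
rows = [(1, 9062, 'CONNECT', 'STURGES'),
        (1, 9706, 'CONNECT', 'WADSWORTH')]
sample = ''.join('{:>4}{:>6}{:<8}{:<11}\n'.format(*r) for r in rows)

# One (start, end) pair per column name; end is exclusive.
names = ['Congress', 'ICPSR', 'State', 'Name']
colspecs = [(0, 4), (4, 10), (10, 18), (18, 29)]

demo = pd.read_fwf(io.StringIO(sample), names=names, colspecs=colspecs)
print(demo)
```

Each pair slices the same character range out of every line, and the surrounding padding is stripped before the value is parsed.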
Let's read both files into one dataframe.
frames = []
for chamber in args:
    df = pd.read_fwf(chamber[1], names=cols, colspecs=col_widths)
    # replace column names from col_mapping dict
    df.columns = [col_mapping.get(col, col) for col in df.columns]
    # create column for senate or house
    df['chamber'] = chamber[0]
    # convert state names to state abbreviations.
    df['state'] = df['State Name'].replace(state_dict)
    frames.append(df)
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same combined frame.
df_congress = pd.concat(frames, ignore_index=True)
df_congress.chamber.unique()
array(['senate', 'house'], dtype=object)
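The state lookup relies on the data file storing state names in upper case and truncated to seven characters, which is why `state_dict` is keyed on `state.name.upper()[:7]`. Here is a minimal sketch of the same idea without the `us` package; the three-state dictionary `sample_states` is a hypothetical stand-in for `us.states.STATES`.

```python
import pandas as pd

# Hypothetical stand-in for the full list of states from the `us` package.
sample_states = {'Connecticut': 'CT', 'New York': 'NY', 'Massachusetts': 'MA'}

# Key each abbreviation on the upper-cased name truncated to 7 characters.
lookup = {name.upper()[:7]: abbr for name, abbr in sample_states.items()}

# Series.replace swaps matching values and leaves everything else untouched.
raw = pd.Series(['CONNECT', 'NEW YOR', 'MASSACH', 'USA'])
abbrs = raw.replace(lookup)
print(abbrs.tolist())
```

Values with no match, such as the `'USA'` entry here, pass through unchanged, so it is worth checking the result for leftover long names.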
Let's set some variables for the next graph.
first_congress_yr = 1789
colors = ['b', 'r']
annotation = ('Ideology score from 1st Dimensional Coordinate of DW-Nominate\n'
              'Source: voteview.com/dwnomin.htm\nAuthor: @leonyin')
col_name = 'More Liberal' + (57 * ' ') + 'More Conservative'
house_plot = {
    'ylim' : [0, 80],
    'y_maj' : 20,
    'y_min' : 10,
    'ylabel': '# House Reps'
}
senate_plot = {
    'ylim' : [0, 24],
    'y_maj' : 8,
    'y_min' : 4,
    'ylabel' : '# Senators'
}
def get_nominal(x):
    '''
    Returns the ordinal suffix ('st', 'nd', 'rd' or 'th') for a congress number.
    '''
    # numbers ending in 11, 12 or 13 take 'th' (11th, 112th), not 'st', 'nd' or 'rd'.
    if 11 <= x % 100 <= 13:
        return 'th'
    return {1: 'st', 2: 'nd', 3: 'rd'}.get(x % 10, 'th')
def rep_dem_indie(row):
    '''
    Returns 'D' for Democrats (party code 100), 'R' for Republicans
    (party code 200), and None for any other party in a given Pandas row.
    '''
    if row['Party Code'] == 100:
        return 'D'
    elif row['Party Code'] == 200:
        return 'R'
    else:
        return None
def plot_polarity(congress, chamber='senate', how='area',
                  col='1st Dimension Coordinate'):
    '''
    Plot the liberal-conservative polarity for a congress
    for either house of reps or senate.
    The default metric plotted is the 1st dim coordinate.
    '''
    if chamber == 'house':
        plot_vars = house_plot
    else:
        plot_vars = senate_plot
    # Set variables for title
    congress_year = first_congress_yr + (2 * congress) - 2
    congress_nominal = str(congress) + get_nominal(congress)
    # create supertitle and subtitle.
    sup_title = '{} Ideology'.format(chamber.title())
    title = '{} Congress {}-{}'.format(congress_nominal,
                                       congress_year,
                                       congress_year + 1)
    # filter the df to the correct congress and chamber.
    # .copy() gives an independent frame, so adding columns below
    # does not trigger pandas' SettingWithCopyWarning.
    df_c = df_congress[(df_congress['Congress'] == congress) &
                       (df_congress['chamber'] == chamber)].copy()
    # fix up the party names.
    df_c['Party'] = df_c.apply(rep_dem_indie, axis=1)
    # round ideology dimension.
    df_c[col_name] = df_c[col].round(1)
    # this groupby creates a multi-index;
    # to remove one index level, we use the unstack() function,
    # turning each Party into its own column.
    df2 = df_c.groupby(['Party', col_name])[col_name].count() \
              .unstack('Party').fillna(0)
    # below is all matplotlib functions.
    fig = plt.figure()
    fig.suptitle(sup_title, fontsize=17)
    ax = fig.add_subplot(111)
    # plot one line per party
    df2[['D','R']].plot(kind=how, stacked=False,
                        color=colors, alpha=.46,
                        xlim=[-1,1], ylim=plot_vars['ylim'],
                        ax=ax, title=title)
    # add y label
    ax.set_ylabel(plot_vars['ylabel'])
    # vertical line at x = 0
    plt.axvline(0, color='k', linestyle='dotted')
    # plot legend
    plt.legend(loc='upper right', frameon=False)
    # tailor x and y axis ticks
    ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
    ax.xaxis.set_minor_locator(ticker.MultipleLocator(.1))
    ax.yaxis.set_major_locator(ticker.MultipleLocator(plot_vars['y_maj']))
    ax.yaxis.set_minor_locator(ticker.MultipleLocator(plot_vars['y_min']))
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
    # write annotations
    plt.annotate(annotation, (0,0), (0, -32), xycoords='axes fraction',
                 textcoords='offset points', va='top')
    plt.subplots_adjust(top=0.86)
    # save the figure as a png, creating the output directory if needed.
    out_dir = 'figs/{}'.format(chamber)
    os.makedirs(out_dir, exist_ok=True)
    plt.savefig('{}/{}.png'.format(out_dir, congress),
                bbox_inches='tight', dpi=100)
plot_polarity(87, chamber='house', how='area')
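The groupby-then-unstack step inside `plot_polarity` is easier to follow on a toy table. The sketch below uses made-up scores and a hypothetical column name `score`. It also swaps in `size()` for the `count()` call in the function, which gives the same numbers when there are no missing values.

```python
import pandas as pd

# Made-up, already-rounded ideology scores for six legislators.
toy = pd.DataFrame({
    'Party': ['D', 'D', 'R', 'R', 'R', 'D'],
    'score': [-0.4, -0.4, 0.3, 0.3, 0.5, -0.2],
})

# Grouping on two keys yields a Series indexed by (Party, score).
counts = toy.groupby(['Party', 'score']).size()

# unstack('Party') moves the Party level into the columns, one per party;
# score bins a party never hits become NaN, which fillna(0) turns into 0.
wide = counts.unstack('Party').fillna(0)
print(wide)
```

The result has one row per score bin and one column per party, which is the shape the area plot expects.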
We can do this for many congresses and use the images to generate a nice gif.
For this visualization, I chose to begin at the 87th Congress, when JFK was in office.
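As a quick check on that starting point, the year arithmetic used in `plot_polarity` (first Congress in 1789, two years per Congress) can be sketched on its own. The helper name `congress_start_year` is hypothetical and not part of the notebook.

```python
FIRST_CONGRESS_YEAR = 1789  # same value as first_congress_yr in the notebook


def congress_start_year(congress):
    """Return the year a given Congress began (hypothetical helper)."""
    # Equivalent to first_congress_yr + (2 * congress) - 2.
    return FIRST_CONGRESS_YEAR + 2 * (congress - 1)


for n in (1, 87, 113):
    print(n, congress_start_year(n))
```

Under this formula the 87th Congress starts in 1961, which lines up with the start of the Kennedy administration.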
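def make_gif(congress, congress_start, congress_end):
    '''
    Saves a png for each congress into the figs subdirectory.
    Uses ImageIO to combine images into a gif.
    Deletes each png file once it has been read.
    Here congress is the chamber name ('house' or 'senate'),
    and congress_end is exclusive.
    '''
    for i in range(congress_start, congress_end):
        plot_polarity(i, chamber=congress, how='area')
        # close the figure so dozens of figures don't stay open in memory.
        plt.close('all')
    # glob returns files in arbitrary order, so sort by congress number
    # to keep the frames in chronological order.
    filenames = sorted(glob.glob('figs/{}/*.png'.format(congress)),
                       key=lambda f: int(os.path.splitext(os.path.basename(f))[0]))
    images = []
    for filename in filenames:
        images.append(imageio.imread(filename))
        os.remove(filename)
    kwargs = { 'duration': .23 }
    imageio.mimsave('figs/{}/movie.gif'.format(congress),
                    images, **kwargs)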
make_gif(congress='house',
         congress_start=87,
         congress_end=114)
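One thing to watch when frames are collected by filename: `glob.glob` makes no promise about ordering, and a plain alphabetical sort puts `100.png` before `87.png`. A minimal sketch of sorting by the congress number embedded in the filename follows; the paths are hypothetical examples, and `frame_number` is an illustrative helper name.

```python
import os

# Hypothetical frame paths, in the order a plain string sort would give.
paths = ['figs/house/100.png', 'figs/house/87.png', 'figs/house/9.png']


def frame_number(path):
    """Extract the integer congress number from a path like 'figs/house/87.png'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem)


alphabetical = sorted(paths)
chronological = sorted(paths, key=frame_number)
print(alphabetical)
print(chronological)
```

Sorting on the integer keeps the gif frames in chronological order once the congress numbers reach three digits.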
Here's the finished product:
GIF hosted on GitHub.
Notice that scores on the liberal side of the spectrum are almost never less than -0.7.
Also notice the gradual polarization between the two parties.
This is the beginning of what we're going to use the Nominate dataset for.
In the next notebook, we'll examine how we can extend the use of the Nominate dataset to the ProPublica Congress API.