Heat and Violence in Chicago

Brian Keegan (@bkeegan), College of Social Sciences and Humanities, Northeastern University

Can we attribute a fall in violent crime in Chicago to its new conceal and carry laws? This was the argument many conservative blogs and news outlets were making in early April after statistics were released showing a marked drop in the murder rate in Chicago. These articles attributed "Chicago's first-quarter murder total [hitting] its lowest number since 1958" to the deterrent effects of Chicago's conceal and carry permits being issued in late February (RedState). Other conservative outlets latched onto the news to argue the policy "is partly responsible for Chicago's across-the-board drop in the crime" (TheBlaze) or that the policy contributed to the "murder rate promptly [falling] to 1958 levels" (TownHall).

Several articles hedged about the causal direction of any relationship and pointed out that this change is hard to separate from generally falling crime rates as well as the atrocious winter weather this season (PJMedia, Wonkette, HuffPo). The central question is whether the adoption of the conceal and carry policy in March 2014 contributed to significant changes in crime rates, rather than other social, historical, or environmental factors.

However, an April 7 feature story by David Bernstein and Noah Isackson in Chicago magazine found substantial evidence of violent crimes like homicides, robberies, burglaries, and assaults being reclassified, downgraded to more minor crimes, and even closed as noncriminal incidents. They argue that since Police Superintendent Garry McCarthy arrived in May 2011, crime rates have improbably plummeted in spite of high unemployment and a significant contraction in the ranks of the Chicago Police Department's beat cops. An audit by Chicago's inspector general into these crime numbers suggests assaults and batteries may have been underreported by more than 24%. This raises a second question: can we attribute the fall in violent crime in Chicago to systematic underreporting of criminal statistics?

In this post, I do four things:

  • First, I demonstrate the relationship crime has with environmental factors like temperature as well as temporal factors like the hour of the day and the day of the week. I use a common technique in signal processing to identify that criminal activity not only follows an annual pattern, but also patterns by day of the week.
In [381]:
Image(filename='homicides_month_hour_heatmap.png')
Out[381]:
  • Second, I estimate a simple statistical model based on the findings above. This model combines temperature, the day of the week, the week of the year, and longer-term historical trends and despite its simplicity (relative to more advanced types of time series models that could be estimated), does a very good job explaining the dynamics of crime in Chicago over the past 13 years.
In [382]:
Image(filename='personal_model_comparison.png')
Out[382]:
  • Third, I use this statistical model to make predictions about crime rates for the rest of 2014. If there's a significant fall-off in violent crime following the introduction of the conceal and carry policy in March 2014, this could be evidence of its success as a deterrent (or that this is a bad model). But if the actual crime data matches the model's forecasted trends, it suggests the new conceal and carry policy has had no effect. There are no findings here yet, but I expect that as the data comes in there will be no significant changes after March 2014.
In [383]:
Image(filename='2014_homicide_predictions.png')
Out[383]:
  • Fourth, I find evidence of substantial discrepancies in the reporting of some crime data since 2013. This obviously imperils the findings of the analyses done above, but also replicates the findings reported by Bernstein and Isackson. The statistical model above expected that property crimes such as arson, burglary, theft, and robbery should follow a particular pattern, from which the observed data significantly deviates after 2013. I perform some additional analyses to uncover which crimes and reporting districts are driving this discrepancy, as well as how severe it is.
In [384]:
Image(filename='personal_crime_rates_down.png')
Out[384]:
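The "common technique in signal processing" behind the first point is a periodogram: take the Fourier transform of the daily count series and look for peaks in the power spectrum. A minimal sketch of the idea on synthetic daily counts (the real crime series is built later in the notebook; the weekly and annual amplitudes here are invented):

```python
# Sketch: detecting weekly periodicity in a daily count series with a
# periodogram. The series is synthetic -- a weekly cycle, an annual cycle,
# and noise -- standing in for the daily crime counts built below.
import numpy as np

rng = np.random.RandomState(42)
days = np.arange(7 * 156)                        # ~three years of daily data
weekly = 10 * np.sin(2 * np.pi * days / 7.0)     # day-of-week cycle
annual = 25 * np.sin(2 * np.pi * days / 365.25)  # seasonal cycle
counts = 100 + weekly + annual + rng.normal(0, 5, len(days))

# Periodogram: squared magnitude of the FFT at each frequency (cycles/day)
detrended = counts - counts.mean()
power = np.abs(np.fft.rfft(detrended)) ** 2
freqs = np.fft.rfftfreq(len(detrended), d=1.0)   # d=1 day sampling interval

# Mask out the slow annual band and find the dominant remaining frequency;
# it should sit at ~1/7 cycles/day, i.e. a 7-day period.
mask = freqs > 0.05
peak_freq = freqs[mask][np.argmax(power[mask])]
peak_period = 1.0 / peak_freq
```

The same transform applied to the actual homicide series is what surfaces the day-of-week structure shown in the heatmap above.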

Start your kernels!

In [271]:
import urllib2
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.sandbox.regression.predstd import wls_prediction_std
import seaborn as sb
from collections import Counter
from IPython.core.display import Image
import scipy.stats as stats

Scrape data

This will scrape historical weather data from Weather Underground between 2001 and 2014 and save each year's data as a CSV file.

In [ ]:
for i in range(2001,2015):
    with open('{0}.csv'.format(str(i)),'wb') as f:
        data = urllib2.urlopen('http://www.wunderground.com/history/airport/ORD/{0}/1/1/CustomHistory.html?dayend=1&monthend=1&yearend={1}&format=1'.format(str(i),str(i+1))).read()
        f.write(data)

Next we want to combine each year's CSV file into a master CSV file for all years. We do this by loading each CSV and saving it into the weather list of DataFrames and then using pd.concat to combine them all together. Then we save the resulting data as weather.csv to load up later.

In [ ]:
weather = list()

for i in range(2001,2015):
    weather.append(pd.read_csv('{0}.csv'.format(str(i)),skiprows=1))
    
weather_df = pd.concat(weather)
weather_df.set_index('CST',inplace=True)
weather_df.index = pd.to_datetime(weather_df.index,unit='D')
weather_df.to_csv('weather.csv')

Read data

Weather

Read in weather.csv and clean up the data so that it's indexed by a proper DateTimeIndex and junk in the rainfall data is removed (in case we wanted to use it later).

In [2]:
weather_df = pd.read_csv('weather.csv',low_memory=False,index_col=0)
weather_df.index = pd.to_datetime(weather_df.index,unit='D')
weather_df['PrecipitationIn'] = weather_df['PrecipitationIn'].replace('T',np.nan)
weather_df['PrecipitationIn'] = weather_df['PrecipitationIn'].dropna().astype(float)
weather_df.tail()
Out[2]:
Max TemperatureF Mean TemperatureF Min TemperatureF Max Dew PointF MeanDew PointF Min DewpointF Max Humidity Mean Humidity Min Humidity Max Sea Level PressureIn Mean Sea Level PressureIn Min Sea Level PressureIn Max VisibilityMiles Mean VisibilityMiles Min VisibilityMiles Max Wind SpeedMPH Mean Wind SpeedMPH Max Gust SpeedMPH PrecipitationIn CloudCover
2014-04-02 45 40 34 29 26 24 70 60 49 30.17 30.12 30.05 10 10 10 17 10 29 NaN 7 ...
2014-04-03 43 39 35 36 32 28 89 75 60 30.04 29.86 29.67 10 6 0 29 14 35 0.53 8 ...
2014-04-04 48 41 34 41 33 21 92 73 54 30.00 29.60 29.45 10 6 0 30 17 40 NaN 8 ...
2014-04-05 51 41 31 25 22 17 69 49 29 30.20 30.13 30.02 10 10 10 13 6 16 0.00 3 ...
2014-04-06 57 44 31 29 25 19 63 49 35 30.17 30.08 29.96 10 10 10 14 7 20 0.00 4 ...

5 rows × 22 columns

List out all the different variables in the weather data.

In [3]:
weather_df.columns
Out[3]:
Index([u'Max TemperatureF', u'Mean TemperatureF', u'Min TemperatureF', u'Max Dew PointF', u'MeanDew PointF', u'Min DewpointF', u'Max Humidity', u' Mean Humidity', u' Min Humidity', u' Max Sea Level PressureIn', u' Mean Sea Level PressureIn', u' Min Sea Level PressureIn', u' Max VisibilityMiles', u' Mean VisibilityMiles', u' Min VisibilityMiles', u' Max Wind SpeedMPH', u' Mean Wind SpeedMPH', u' Max Gust SpeedMPH', u'PrecipitationIn', u' CloudCover', u' Events', u' WindDirDegrees<br />'], dtype='object')

All crime data

In [4]:
crime_df = pd.read_csv('all_crime.csv')
crime_df['Datetime'] = pd.to_datetime(crime_df['Date'],format="%m/%d/%Y %I:%M:%S %p")
crime_df['Date'] = crime_df['Datetime'].apply(lambda x:x.date())
crime_df['Weekday'] = crime_df['Datetime'].apply(lambda x:x.weekday())
crime_df['Hour'] = crime_df['Datetime'].apply(lambda x:x.hour)
crime_df['Day'] = crime_df['Datetime'].apply(lambda x:x.day)
crime_df['Week'] = crime_df['Datetime'].apply(lambda x:x.week)
crime_df['Month'] = crime_df['Datetime'].apply(lambda x:x.month)

crime_df.head()
C:\Anaconda\lib\site-packages\pandas\io\parsers.py:1070: DtypeWarning: Columns (11,13) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
Out[4]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic Beat District Ward Community Area FBI Code X Coordinate Y Coordinate Year Updated On Latitude
0 9558116 HX209604 2014-04-03 026XX W LOGAN BLVD 0610 BURGLARY FORCIBLE ENTRY OTHER False False 1411 14 35 22 05 1158555 1917229 2014 04/06/2014 12:38:43 AM 41.928617 ...
1 9558320 HX209887 2014-04-03 008XX N LONG AVE 0610 BURGLARY FORCIBLE ENTRY APARTMENT False False 1524 15 37 25 05 1140117 1905254 2014 04/06/2014 12:38:43 AM 41.896114 ...
2 9557107 HX208940 2014-04-03 070XX S RACINE AVE 0495 BATTERY AGGRAVATED OF A SENIOR CITIZEN APARTMENT True True 734 7 17 67 04B 1169494 1858212 2014 04/06/2014 12:38:43 AM 41.766438 ...
3 9557134 HX208947 2014-04-03 025XX N MILWAUKEE AVE 0810 THEFT OVER $500 STREET False False 1414 14 35 22 06 1155439 1916718 2014 04/06/2014 12:38:43 AM 41.927278 ...
4 9557154 HX208991 2014-04-03 070XX S MERRILL AVE 2820 OTHER OFFENSE TELEPHONE THREAT RESIDENCE False True 331 3 5 43 26 1191720 1858787 2014 04/05/2014 12:39:46 AM 41.767505 ...

5 rows × 28 columns

In [5]:
dict(Counter(crime_df['Primary Type']))
Out[5]:
{'ARSON': 9065,
 'ASSAULT': 330270,
 'BATTERY': 1006149,
 'BURGLARY': 325403,
 'CONCEALED CARRY LICENSE VIOLATION': 3,
 'CRIM SEXUAL ASSAULT': 19481,
 'CRIMINAL DAMAGE': 636302,
 'CRIMINAL TRESPASS': 162053,
 'DECEPTIVE PRACTICE': 176914,
 'DOMESTIC VIOLENCE': 1,
 'GAMBLING': 13142,
 'HOMICIDE': 6581,
 'INTERFERE WITH PUBLIC OFFICER': 6172,
 'INTERFERENCE WITH PUBLIC OFFICER': 3365,
 'INTIMIDATION': 3315,
 'KIDNAPPING': 5802,
 'LIQUOR LAW VIOLATION': 12786,
 'MOTOR VEHICLE THEFT': 264724,
 'NARCOTICS': 628009,
 'NON - CRIMINAL': 2,
 'NON-CRIMINAL': 14,
 'NON-CRIMINAL (SUBJECT SPECIFIED)': 3,
 'OBSCENITY': 276,
 'OFFENSE INVOLVING CHILDREN': 33480,
 'OFFENSES INVOLVING CHILDREN': 439,
 'OTHER NARCOTIC VIOLATION': 95,
 'OTHER OFFENSE': 338382,
 'OTHER OFFENSE ': 7,
 'PROSTITUTION': 63493,
 'PUBLIC INDECENCY': 110,
 'PUBLIC PEACE VIOLATION': 38695,
 'RITUALISM': 23,
 'ROBBERY': 205745,
 'SEX OFFENSE': 20011,
 'STALKING': 2566,
 'THEFT': 1123449,
 'WEAPONS VIOLATION': 51793}
In [10]:
personal_crimes = ['ASSAULT','BATTERY','CRIM SEXUAL ASSAULT','HOMICIDE']
property_crimes = ['ARSON','BURGLARY','MOTOR VEHICLE THEFT','ROBBERY','THEFT']

Group data by date and join crime and weather data together

Join the temperature and crime data together based on their sharing a common DateTimeIndex.

In [19]:
arson_gb = crime_df[crime_df['Primary Type'] == 'ARSON'].groupby('Date')['ID'].agg(len)
assault_gb = crime_df[crime_df['Primary Type'] == 'ASSAULT'].groupby('Date')['ID'].agg(len)
battery_gb = crime_df[crime_df['Primary Type'] == 'BATTERY'].groupby('Date')['ID'].agg(len)
burglary_gb = crime_df[crime_df['Primary Type'] == 'BURGLARY'].groupby('Date')['ID'].agg(len)
homicide_gb = crime_df[crime_df['Primary Type'] == 'HOMICIDE'].groupby('Date')['ID'].agg(len)
sexual_assault_gb = crime_df[crime_df['Primary Type'] == 'CRIM SEXUAL ASSAULT'].groupby('Date')['ID'].agg(len)
robbery_gb = crime_df[crime_df['Primary Type'] == 'ROBBERY'].groupby('Date')['ID'].agg(len)
theft_gb = crime_df[crime_df['Primary Type'] == 'THEFT'].groupby('Date')['ID'].agg(len)
vehicle_theft_gb = crime_df[crime_df['Primary Type'] == 'MOTOR VEHICLE THEFT'].groupby('Date')['ID'].agg(len)
personal_gb = crime_df[crime_df['Primary Type'].isin(personal_crimes)].groupby('Date')['ID'].agg(len)
property_gb = crime_df[crime_df['Primary Type'].isin(property_crimes)].groupby('Date')['ID'].agg(len)

arson_gb.index = pd.to_datetime(arson_gb.index,unit='D')
assault_gb.index = pd.to_datetime(assault_gb.index,unit='D')
battery_gb.index = pd.to_datetime(battery_gb.index,unit='D')
burglary_gb.index = pd.to_datetime(burglary_gb.index,unit='D')
homicide_gb.index = pd.to_datetime(homicide_gb.index,unit='D')
sexual_assault_gb.index = pd.to_datetime(sexual_assault_gb.index,unit='D')
robbery_gb.index = pd.to_datetime(robbery_gb.index,unit='D')
theft_gb.index = pd.to_datetime(theft_gb.index,unit='D')
vehicle_theft_gb.index = pd.to_datetime(vehicle_theft_gb.index,unit='D')
personal_gb.index = pd.to_datetime(personal_gb.index,unit='D')
property_gb.index = pd.to_datetime(property_gb.index,unit='D')


ts = pd.DataFrame({'Arson':arson_gb.ix[:'2014-3-31'],
                   'Assault':assault_gb.ix[:'2014-3-31'],
                   'Battery':battery_gb.ix[:'2014-3-31'],
                   'Burglary':burglary_gb.ix[:'2014-3-31'],
                   'Homicide':homicide_gb.ix[:'2014-3-31'],
                   'Sexual_assault':sexual_assault_gb.ix[:'2014-3-31'],
                   'Robbery':robbery_gb.ix[:'2014-3-31'],
                   'Vehicle_theft':vehicle_theft_gb.ix[:'2014-3-31'],
                   'Theft':theft_gb.ix[:'2014-3-31'],
                   'Personal':personal_gb.ix[:'2014-3-31'],
                   'Property':property_gb.ix[:'2014-3-31'],
                   'Temperature':weather_df['Mean TemperatureF'].ix[:'2014-3-31'],
                   'Binned temperature':weather_df['Mean TemperatureF'].ix[:'2014-3-31']//10.*10,
                   'Humidity':weather_df[' Mean Humidity'].ix[:'2014-3-31'],
                   'Precipitation':weather_df['PrecipitationIn'].ix[:'2014-3-31']
                   })

ts['Time'] = range((max(ts.index)-min(ts.index)).days+1)

ts.reset_index(inplace=True)
ts.set_index('index',drop=False,inplace=True)
ts['Weekday'] = ts['index'].apply(lambda x:x.weekday())
ts['Hour'] = ts['index'].apply(lambda x:x.hour)
ts['Week'] = ts['index'].apply(lambda x:x.week)
ts['Month'] = ts['index'].apply(lambda x:x.month)
ts['Year'] = ts['index'].apply(lambda x:x.year)
ts['Weekend'] = ts['Weekday'].isin([5,6]).astype(int)
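The eleven separate groupby calls above can also be collapsed into a single groupby/unstack that produces daily counts for every crime type at once. A sketch on a toy frame (the column names match the notebook's; the rows are invented):

```python
# Sketch: one groupby/unstack in place of many per-crime groupby calls.
# Toy data using the notebook's column names; the values are made up.
import pandas as pd

toy = pd.DataFrame({
    'Date': ['2014-04-01', '2014-04-01', '2014-04-02', '2014-04-02'],
    'Primary Type': ['ARSON', 'THEFT', 'THEFT', 'HOMICIDE'],
    'ID': [1, 2, 3, 4],
})

# Rows are dates, columns are Primary Type values; combinations with no
# incidents become 0 rather than NaN.
daily = (toy.groupby(['Date', 'Primary Type'])['ID']
            .count()
            .unstack('Primary Type')
            .fillna(0))
daily.index = pd.to_datetime(daily.index)

# Aggregate categories by summing their member columns
personal_crimes = ['HOMICIDE']  # subset present in the toy data
daily['Personal'] = daily[personal_crimes].sum(axis=1)
```

This yields the same per-day counts as the repeated `groupby('Date')['ID'].agg(len)` pattern, with one pass over the data.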

PART 1: The dynamics of crime, time, and weather

Define a helper function that we'll be using later on.

In [20]:
# adapted from http://matplotlib.org/examples/api/barchart_demo.html
def autolabel(rects):
    max_height = max([rect.get_height() for rect in rects if hasattr(rect,'get_height') and not np.isnan(rect.get_height())])
    for rect in rects:
        if hasattr(rect,'get_height'):
            height = rect.get_height()
            if not np.isnan(height):
                ax.text(rect.get_x()+rect.get_width()/2., height-.05*max_height, '%d'%int(height),
                ha='center', va='bottom',color='w')

Plot the occurrence of crimes over time. There are several apparent features:

  1. Many crimes have very strong annual seasonality: rates are higher in the summer months and lower in the winter months.

  2. There's a decreasing trend in many types of crimes over time.

I've marked March 1, 2014 with a vertical black dotted line to indicate when the gun policy went into effect.

In [25]:
figsize(12,6)
ts2 = pd.DataFrame({'Arson':arson_gb/float(arson_gb.max()),
                   'Assault':assault_gb/float(assault_gb.max()),
                   'Battery':battery_gb/float(battery_gb.max()),
                   'Burglary':burglary_gb/float(burglary_gb.max()),
                   'Homicide':homicide_gb/float(homicide_gb.max()),
                   'Sexual assault':sexual_assault_gb/float(sexual_assault_gb.max()),
                   'Robbery':robbery_gb/float(robbery_gb.max()),
                   'Theft':theft_gb/float(theft_gb.max()),})
ax = ts2.resample('M').plot(lw=4,alpha=.75,colormap='hsv')
ax.set_ylabel('Incidents (normalized)',fontsize=18)
#ax.right_ax.set_ylabel('Temperature  (F)',fontsize=18)

ax.set_xlabel('Time',fontsize=18)
ax.set_ylim((0,1))
#ax.set_yscale('log')
ax.grid(False,which='minor')

plt.axvline('2014-3-1',c='k',ls='--')
ax.legend(loc='upper center',ncol=4)
Out[25]:
<matplotlib.legend.Legend at 0x8875f080>

We can plot the strength of the correlations between these crime statistics and some other variables as well. We add in Temperature, Humidity, and Precipitation. Redder colors are stronger positive correlations, bluer colors are weaker or negative correlations. The strongest positive correlation we find (darkest red) is between Battery and Assault: when batteries are high, assaults are also high. The strongest negative correlation is between Humidity and Temperature: when temperature is high, humidity is low. Note that I've set the diagonal to zero (Arson is perfectly correlated with Arson).

In [28]:
figsize(8,6)
#ts.corr().columns
a = np.array(ts[['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual_assault','Temperature','Humidity','Precipitation']].resample('M').corr())
np.fill_diagonal(a,0)
plt.pcolor(a,cmap='RdBu_r',edgecolors='k')

plt.xlim((0,11))
plt.ylim((0,11))

plt.xticks(arange(.5,11.5),['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual assault','Temperature','Humidity','Precipitation'],rotation=90,fontsize=15)
plt.yticks(arange(.5,11.5),['Arson','Burglary','Robbery','Theft','Assault','Battery','Homicide','Sexual assault','Temperature','Humidity','Precipitation'],fontsize=15)

plt.title('Correlation between crime occurences',fontsize=20)
plt.colorbar()
plt.grid(b=True,which='major',alpha=.5)
In [29]:
figsize(12,6)
ts3 = pd.DataFrame({'Personal':personal_gb/float(personal_gb.max()),
                    'Property':property_gb/float(property_gb.max())
                    })
ax = ts3.resample('M').plot(lw=4,alpha=.75,colormap='jet')
ax.set_ylabel('Incidents (normalized)',fontsize=18)
#ax.right_ax.set_ylabel('Temperature  (F)',fontsize=18)

ax.set_xlabel('Time',fontsize=18)
ax.set_ylim((0,1))
#ax.set_yscale('log')
ax.grid(False,which='minor')
ax.legend(loc='upper center',ncol=4,fontsize=18)
Out[29]:
<matplotlib.legend.Legend at 0x4a961400>

Crime as a function of temperature

Next we explore the correlation between temperature and crime observed above. The figure below plots the distribution of daily personal and property crime counts within each 10-degree temperature bin.

In [393]:
ax = ts.boxplot(['Personal','Property'],by='Binned temperature')
ax[0].set_ylabel('Number of crimes',fontsize=18)
ax[0].set_xlabel('Temperature (F)',fontsize=18)
ax[0].set_title('Personal crimes',fontsize=15)

ax[1].set_xlabel('Temperature (F)',fontsize=18)
ax[1].set_title('Property crimes',fontsize=15)

plt.suptitle('Crime increases with temperature',fontsize=20)

#plt.xticks(plt.xticks()[0],arange(-10,100,10),fontsize=15)
Out[393]:
<matplotlib.text.Text at 0x1234e1780>

We can also examine the relationship between Temperature, Humidity, and the number of crimes. First, we extract the total number of crimes for each observed combination of Temperature and Humidity and store this in array1. Then we extract the total number of observations of each Temperature and Humidity combination and store this in array2. The latter normalizes the former in case some combinations of temperature and humidity occur more frequently than others, which would otherwise overcount the crime statistics.

Moving from bottom to top, we see the effect observed above: temperature increases the frequency of crimes (bluer is less crime, redder is more crime). Moving from left to right, we see crime doesn't vary substantially as a function of humidity, with the exception that very high levels of humidity (above 60%) might have higher rates of crime.

In [31]:
var = 'Personal'
ct1 = ts.groupby(['Temperature','Humidity'])[var].agg(np.sum).reset_index()
array1 = np.array(pd.pivot_table(ct1,values=var,rows='Temperature',cols='Humidity').fillna(0))

ct2 = ts.groupby(['Temperature','Humidity'])[var].agg(len).reset_index()
array2 = np.array(pd.pivot_table(ct2,values=var,rows='Temperature',cols='Humidity').fillna(0))

normalized_crime = array1/array2

plt.imshow(normalized_crime,cmap='RdBu_r',origin='lower',label='Count')
#plt.legend(loc='upper right',bbox_to_anchor=(1,.5))
plt.xlabel('Humidity',fontsize=18)
plt.ylabel('Temperature',fontsize=18)
plt.title('{0} crimes by temperature and humidity'.format(var),fontsize=24)
plt.colorbar()
Out[31]:
<matplotlib.colorbar.Colorbar instance at 0x0000000031055888>