Disclaimer: This is my Coursera Capstone Project for Data Science. Also, I will be using the term "Black" rather than "African American" because that is the race indicator provided in the dataset, and the category also includes persons of African origin (a substantial minority).
I'm a Frogtown resident, community advocate, and programmer. I would like to use open-source data to share stories and create action. Please follow me on GitHub.
Currently in the United States, there is a lot of tension between law enforcement and the public. I will be looking into the Traffic Stop Data for Saint Paul, Minnesota (USA) provided by the Saint Paul Police Department (SPPD). Analyzing the traffic stop data can provide evidence (or lack thereof) of systemic biases. The goal of my report is to add insight into what is happening in my community, the Thomas-Dale neighborhood aka Frogtown, as well as to encourage citizens to use open-source data and/or demand that their public agencies provide such data.
Certain parts of Frogtown have a greater frequency of traffic stops than the rest of the neighborhood, particularly along University Avenue. The data suggests targeting of Black drivers, given that they are stopped more frequently, searched more frequently per stop, and less likely to receive a citation. Other insights include that moving-violation stops occur mostly in the morning and have a higher citation rate. In contrast, during the late-night hours, there are more equipment-violation stops. Furthermore, many communities, including Frogtown, have considerable numbers of equipment-violation stops.
The results may be obvious, but proving them may be more challenging. As a researcher, my goal is to measure the impact, seek the truth, explore, and challenge my expectations. Data can be a great equalizer, challenging our worldviews and/or reinforcing our existing perspectives. Data reports can be used to share stories and information effectively. Furthermore, data is used as an evaluation tool to determine the effectiveness of programs and policies.
Thus data practitioners, and more broadly institutions, hold strong responsibility and influence in shaping data to support a certain narrative. In our current political climate, public perception of law enforcement is polarized, and I hope these studies can shed light on the issues. This report and others will be available via open source, allowing others to contribute, replicate, and reuse the code for their own respective neighborhoods.
The City of Saint Paul is the second-largest city in Minnesota, USA, and the state capital. Saint Paul is often paired with its nearby city, Minneapolis; together they are aptly named the Twin Cities. It has roughly 300,000 people, and the city itself is quite diverse. Minnesota has a high level of racial inequity, ranking 47th of 51 compared to the rest of the United States. Saint Paul is broken down into seventeen Planning Districts, created in 1979 to allow neighborhoods to participate in governance and use Community Development Block Grants. The Thomas/Dale neighborhood is one of the district planning councils. A few years ago, the tragic fatal police shooting of an African-American man, Philando Castile, occurred during a traffic stop in Falcon Heights, a suburb of Saint Paul, which has increased tension between law enforcement and the community.
The Frogtown community has historically been a transitional community, with new immigrant/refugee communities living in the neighborhood for short periods of time. From my experience, Frogtown boasts considerable diversity with respect to language, culture, and ethnicity. It has also historically been poor. Here is a snapshot of the community exported from Minnesota Compass, based on 2017 Census demographic data.
The image below displays the Frogtown community using the police grid (which matches the actual boundaries well). On the southern boundary of Frogtown is University Avenue, where the light rail runs along the boundary; it is a heavily residential street as well. I will emphasize more noticeable landmarks once the Foursquare data is plugged in. The two rightmost sectors are, respectively, Mt. Airy and Capitol Heights. These two communities are considered distinct by community members.
The dataset contains traffic stop records collected by the SPPD from 2000 to 2018 via an agreement with the Saint Paul chapter of the NAACP and can be accessed here. The website has a lot of features and visualizations for basic analysis, but for advanced users data transformations are limited or unavailable. I have chosen the years 2017 to 2018 based on data limitations.
Data Features:
Data limitations as explained on the website:
Reason for stop (available starting in 2017)
Data reflects traffic stops originating by St. Paul Police Officers
Race is based on officers’ perceptions
Fields indicating “No Data” may be due to a variety of factors, including:
Supplemental Info Suggested by Author
The dataset consists of one record per driver stopped, but the location coordinates are limited to the police grid. There are over 90 police grids!
I'll be using the Foursquare API to get information on local businesses. Some streets and areas might be more active than others.
Minnesota Compass, a non-partisan group, offers raw data for both Minneapolis and Saint Paul districts.
The primary data will range from 2017-18. The longitudinal analysis will have data from 2001 to 2018.
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
warnings.filterwarnings('ignore')
import plotly
from IPython.display import HTML
from IPython.display import display
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
from sodapy import Socrata
#New Upload Method Get Information from Socrata API
client = Socrata("information.stpaul.gov", None)
#Data Load
#df = pd.read_csv('Data/Traffic_Stop_Dataset.csv')
#More familiar column names
#cols= ['Year','Date','Race','Gender','Driver_search','Vehicle_search','Citation','Age','Reason','Grid','GridLocation', 'Count']
#df.columns= cols
results = client.get("kkd6-vvns", limit=1000000)
results_df = pd.DataFrame.from_records(results)
#rename columns
cols= ['Gunk', 'Gunk2', 'Gunk3','Age','Citation', 'Count', 'Date','Driver_search','Gender','GridLocation', 'Grid','Race','Reason','Vehicle_search','Year']
results_df.columns= cols
results_df =results_df.iloc[:,3:]
#change datatypes
results_df = results_df.astype({"Year": int, "Age": float, "Grid":float, "Count":int})
#We will be choosing 2017 onward because the reason for the stop is only recorded starting in 2017
df= results_df.query('Year in [2018,2017,2019,2020,2021]')
df.shape
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
(106978, 12)
df
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | Vehicle_search | Year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | No | 1 | 2018-05-11T20:47:00.000 | No | Male | NaN | 991.0 | Black | Moving Violation | No | 2018 |
1 | NaN | No | 1 | 2019-03-03T00:15:00.000 | No | Male | NaN | 991.0 | Black | Equipment Violation | No | 2019 |
2 | 28.0 | Yes | 1 | 2019-02-13T13:21:00.000 | No | Male | NaN | 991.0 | Black | Moving Violation | No | 2019 |
3 | NaN | No | 1 | 2019-07-27T21:23:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
4 | NaN | No | 1 | 2018-01-20T21:19:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
5 | NaN | No | 1 | 2018-12-27T14:02:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
6 | 24.0 | Yes | 1 | 2018-03-13T13:51:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
7 | NaN | Yes | 1 | 2019-03-02T22:12:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
8 | 25.0 | Yes | 1 | 2019-06-07T22:08:00.000 | No | Female | NaN | 991.0 | Other | Moving Violation | No | 2019 |
9 | NaN | No | 1 | 2019-01-19T22:08:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
10 | NaN | No | 1 | 2019-06-07T20:18:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2019 |
11 | 22.0 | Yes | 1 | 2018-05-01T14:11:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
12 | 21.0 | Yes | 1 | 2019-09-27T19:13:00.000 | No | Female | NaN | 991.0 | Black | Moving Violation | No | 2019 |
13 | NaN | No | 1 | 2019-07-20T21:10:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
14 | NaN | No | 1 | 2019-05-09T16:56:00.000 | No | Female | NaN | 999.0 | White | Moving Violation | No | 2019 |
15 | NaN | No | 1 | 2018-11-21T18:56:00.000 | No | Male | NaN | 991.0 | Asian | Moving Violation | No | 2018 |
16 | NaN | No | 1 | 2018-08-01T14:14:00.000 | No | Male | NaN | 993.0 | White | Moving Violation | No | 2018 |
17 | NaN | No | 1 | 2018-10-27T20:43:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2018 |
18 | NaN | No | 1 | 2018-02-18T00:36:00.000 | No | Male | NaN | 996.0 | Black | Moving Violation | No | 2018 |
19 | NaN | No | 1 | 2018-06-23T00:46:00.000 | No | Male | NaN | 993.0 | Other | Moving Violation | No | 2018 |
20 | NaN | No | 1 | 2018-11-21T20:06:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
21 | 54.0 | Yes | 1 | 2018-04-09T14:20:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
22 | NaN | No | 1 | 2018-10-05T20:49:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
23 | NaN | No | 1 | 2018-10-05T20:35:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
24 | NaN | No | 1 | 2018-10-27T21:29:00.000 | No | Male | NaN | 991.0 | Latino | Moving Violation | No | 2018 |
25 | NaN | Yes | 1 | 2018-06-30T22:35:00.000 | No | Male | NaN | 994.0 | White | Moving Violation | No | 2018 |
26 | NaN | No | 1 | 2018-01-13T00:51:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2018 |
27 | NaN | No | 1 | 2018-05-11T21:56:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
28 | NaN | No | 1 | 2018-02-17T21:37:00.000 | No | Female | NaN | 991.0 | Black | Equipment Violation | No | 2018 |
29 | 23.0 | Yes | 1 | 2018-04-12T11:34:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
784687 | NaN | No | 1 | 2017-08-30T17:45:00.000 | No | Female | {'latitude': '44.959171113', 'longitude': '-93... | 94.0 | Other | Moving Violation | No | 2017 |
784688 | NaN | No | 1 | 2017-11-02T22:46:00.000 | No | Male | {'latitude': '44.959369203', 'longitude': '-93... | 87.0 | Black | Moving Violation | No | 2017 |
784689 | NaN | No | 1 | 2017-10-25T03:00:00.000 | No | Female | {'latitude': '44.96658643', 'longitude': '-93.... | 68.0 | Black | Moving Violation | No | 2017 |
784690 | NaN | No | 1 | 2017-12-07T15:30:00.000 | No | Male | {'latitude': '44.988221181', 'longitude': '-93... | 11.0 | White | Moving Violation | No | 2017 |
784691 | 34.0 | Yes | 1 | 2017-09-07T08:30:00.000 | No | Female | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | No | 2017 |
784692 | 30.0 | Yes | 1 | 2017-09-29T18:33:00.000 | No | Male | {'latitude': '44.959515062', 'longitude': '-93... | 91.0 | White | Moving Violation | No | 2017 |
784693 | NaN | Yes | 1 | 2017-09-13T10:42:00.000 | No | Female | {'latitude': '44.973827765', 'longitude': '-93... | 51.0 | Black | Moving Violation | No | 2017 |
784694 | NaN | Yes | 1 | 2017-07-27T08:06:00.000 | No | Male | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Black | Moving Violation | No | 2017 |
784695 | NaN | No | 1 | 2017-09-20T22:34:00.000 | No | Female | {'latitude': '44.988314242', 'longitude': '-93... | 14.0 | Black | Moving Violation | No | 2017 |
784696 | 37.0 | Yes | 1 | 2017-09-28T09:39:00.000 | No | Male | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Moving Violation | No | 2017 |
784697 | NaN | No | 1 | 2017-07-25T13:50:00.000 | No | Male | {'latitude': '44.954914788', 'longitude': '-93... | 112.0 | Black | Moving Violation | No | 2017 |
784698 | NaN | Yes | 1 | 2017-12-06T10:20:00.000 | No | Female | {'latitude': '44.980999933', 'longitude': '-93... | 31.0 | Black | Moving Violation | No | 2017 |
784699 | NaN | Yes | 1 | 2017-10-05T10:03:00.000 | No | Female | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Latino | Moving Violation | No | 2017 |
784700 | NaN | Yes | 1 | 2017-12-01T11:21:00.000 | No | Male | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | No | 2017 |
784701 | NaN | No | 1 | 2017-10-11T15:12:00.000 | No | Female | {'latitude': '44.966688357', 'longitude': '-93... | 75.0 | Black | Moving Violation | No | 2017 |
784702 | NaN | No | 1 | 2017-10-03T17:40:00.000 | Yes | Male | {'latitude': '44.959923808', 'longitude': '-93... | 99.0 | White | Moving Violation | Yes | 2017 |
784703 | NaN | No | 1 | 2017-09-29T17:45:00.000 | No | Female | {'latitude': '44.95086001', 'longitude': '-93.... | 132.0 | White | Moving Violation | No | 2017 |
784704 | 27.0 | Yes | 1 | 2017-08-17T21:35:00.000 | No | Male | {'latitude': '44.953679624', 'longitude': '-93... | 111.0 | Other | Equipment Violation | No | 2017 |
784705 | NaN | No | 1 | 2017-07-29T01:17:00.000 | No | Male | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Equipment Violation | No | 2017 |
784706 | 28.0 | Yes | 1 | 2017-09-25T08:37:00.000 | No | Female | {'latitude': '44.965443164', 'longitude': '-93... | 269.0 | Black | Moving Violation | No | 2017 |
784707 | NaN | No | 1 | 2017-11-09T10:51:00.000 | No | Female | {'latitude': '44.952068692', 'longitude': '-93... | 105.0 | White | Moving Violation | No | 2017 |
784708 | NaN | No | 1 | 2017-12-04T00:15:00.000 | Yes | Male | {'latitude': '44.966741984', 'longitude': '-93... | 73.0 | Black | Moving Violation | Yes | 2017 |
784709 | NaN | No | 1 | 2017-08-31T23:28:00.000 | No | Female | {'latitude': '44.973798199', 'longitude': '-93... | 50.0 | White | Equipment Violation | No | 2017 |
784710 | NaN | Yes | 1 | 2017-09-07T10:09:00.000 | No | Female | {'latitude': '44.988189836', 'longitude': '-93... | 9.0 | White | Moving Violation | No | 2017 |
784711 | 25.0 | Yes | 1 | 2017-10-24T08:45:00.000 | No | Male | {'latitude': '44.952726847', 'longitude': '-93... | 119.0 | Black | Moving Violation | No | 2017 |
784712 | 26.0 | Yes | 1 | 2017-08-22T07:39:00.000 | No | Male | {'latitude': '44.952131905', 'longitude': '-93... | 107.0 | Other | Moving Violation | No | 2017 |
784713 | 82.0 | Yes | 1 | 2017-11-02T12:01:00.000 | No | Female | {'latitude': '44.916259517', 'longitude': '-93... | 206.0 | White | Moving Violation | No | 2017 |
784714 | NaN | Yes | 1 | 2017-10-31T09:05:00.000 | No | Female | {'latitude': '44.908307731', 'longitude': '-93... | 223.0 | Latino | Moving Violation | No | 2017 |
784715 | NaN | No | 1 | 2017-09-14T22:42:00.000 | No | Male | NaN | 24.0 | White | Equipment Violation | No | 2017 |
784716 | NaN | No | 1 | 2017-11-22T11:10:00.000 | No | Female | {'latitude': '44.948648967', 'longitude': '-93... | 131.0 | White | Moving Violation | No | 2017 |
106978 rows × 12 columns
First, I will be creating some functions that will make data transformations easier, such as an age bin, a neighborhood designation, and a policing district designation.
#Community function
def commun(x):
if x in [67,68,87,88,89,90,91,92]:
return 'Thomas_Frogtown'
elif x in [5,6,7,8,25,26,27,28,45,46,47,48]:
return 'Como'
elif x in [107, 108, 109, 110,127,128,129,130]:
return 'Summit_University'
elif x in [101,102,103,104,105,106,122,123,124,125,126]:
return 'Union_Park'
elif x in [63,64,65,66,83,84,85,86]:
return 'Midway'
elif x in [142,143,144,145,146,162,163,164,165,166]:
return 'Macalester_Groveland'
elif x in [182,183,184,185,186,202,203,204,205,206,223,224,225,242,243,244,245,246]:
return 'Highland_Park'
elif x in [147,148,149,167,168]:
return 'Summit_Hill'
elif x in [1,2,21,22,43,44,61,62,81,82]:
return 'St_Anthony'
elif x in [226,207,187,188,189,267,268,169,170,171,249,150,151,230,367]:
return 'West_7th'
elif x in [209,210,211,212,213,214,215,192,193,194,195,172,173,174,175]:
return 'West_Side'
elif x in [111,112,131,132,133,152,153]:
return 'Capital_River'
elif x in [98,99,100,118,119,119,120,137,138,139,140,160,197,180,200,240,280]:
return 'Battle_Creek'
elif x in [76,95,96,97,115,116,117,138,114,136]:
return 'Dayton_Bluff'
elif x in [9,10,11,12,29,30,31,32,49,50,51,52,269,69,70,71,72]:
return 'North_End'
elif x in [13,14,15,16,33,34,35,36,53,54,55,56,73,74,75,93,94]:
return 'Payne_Phalen'
elif x in [17,18,19,20,37,38,39,40,56,57,58,59,60,77,78,79,80]:
return 'Greater_East_Side'
else:
return 'NaN'
#District Function
def district(x):
if x in [1,2,3,4,5,6,7,8,21,22,23,24,25,26,27,28,43,44,45,46,47,48,\
61,62,63,64,65,66,67,68,269,81,82,83,84,85,86,87,88,89,\
101,102,103,104,105,106,107,108,109,110,122,123,124,125,126,\
127,128,129,130,142,143,144,145,146,147,148,149,149,\
162,163,164,165,166,167,168,182,183,184,185,186,\
202,203,204,205,206,223,224,225,242,243,244,245,246]:
return 'Western'
elif x in [9,10,11,12,29,30,32,31,49,50,51,52,69,70,71,72,\
90,91,92,111,112,131,132,133,267,268,249,130,230,\
150,151,152,153,169,170,171,172,173,174,175,207,209,226,\
187,188,189,192,193,194,195,210,211,212,213,214,215]:
return 'Central'
elif x in [13,14,15,16,17,18,19,20,33,34,35,36,37,38,39,40,\
53,54,55,56,57,58,59,60,73,74,75,76,77,78,79,80,\
93,94,95,96,97,98,99,100,114,115,116,117,118,119,120,\
136,137,138,139,140,160,197,180,200,240,280]:
return 'Eastern'
else:
return 'NaN'
# Define Age Function bins
def agef(x):
if x<19:
return 'Teen <19'
elif x>18 and x<25:
return 'Young Adult 19-24'
elif x>24 and x<31:
return 'Adult 25-30'
elif x>30 and x<46:
return 'Middle Adult 31-45'
elif x>45:
return 'Older Adult 46+'
else:
return 'NaN'
I will be constructing several variables. Originally, I wanted to extract the GridLocation coordinates from the dataset, but it makes more sense to connect the grid to a json file. The manipulations and additions are listed below:
Initial Omissions
df.Reason.value_counts()
Moving Violation               77989
Equipment Violation            24475
Investigative Stop              4191
911 Call / Citizen Reported      183
No Data                          140
Name: Reason, dtype: int64
df.Year.value_counts()
2017    31290
2018    29685
2019    22843
2020    18164
Name: Year, dtype: int64
#Prepping the Primary Dataset
#Add District Plannning Council and District columns from Functions
df['Community']= df['Grid'].apply(commun)
df['District']= df['Grid'].apply(district)
df['AgeDemo']= df['Age'].apply(agef)
#Add Time Variables
df['Date']= pd.to_datetime(df['Date'])
df['DayofWeek']=df['Date'].dt.dayofweek
df['Weekend'] = df['DayofWeek'].apply(lambda x: 1 if (x>4) else 0)
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Hour'] = df['Date'].dt.hour
df['LateNight'] = df['Hour'].apply(lambda x: 1 if (x>21 or x<5) else 0)
#Screening
df= df[df.Reason != '911 Call / Citizen Reported']
df= df[df.Reason != 'No Data']
df= df.loc[df['Reason'] != 'Investigative Stop']
df= df.loc[df['Race'] != 'Native American']
#Replace Yes/No variables with 0/1 dummies (run once on a freshly loaded df; re-running after the columns are already numeric will raise a type-comparison error)
df['Driver_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Vehicle_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Citation'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Gender'].replace(to_replace=['Male','Female'], value=[0,1],inplace=True) #FEMALE is 1
df= pd.concat([df,pd.get_dummies(df['Reason'])], axis=1)
df= pd.concat([df,pd.get_dummies(df['Race'])], axis=1)
#Get Latitude and Longitude
## I couldn't figure out how to utilize the dictionary, so I just converted it to a string
# Separate Latitude and Longitude
df['GridLocation'] = df['GridLocation'].astype('str')
#Get Latitude
new=df['GridLocation'].str.split("',", n = 1, expand = True)
# take the latitude piece from the split string
lat=new[0].str.split(" '", n = 1, expand = True)
df['Latitude']= pd.to_numeric(lat[1])
#Get Longitude
long= new[1].str.split(": '", n = 1, expand = True)
#long[1]
df['Longitude']= pd.to_numeric(long[1].str.rstrip("'}"))
#Use if needed to change variables to integers
#df[['Dr_search', 'V_search', 'Citation']] = df[['Dr_search', 'V_search', 'Gender','Citation']].astype(int)
# Remove any missing community data entries
df= df[df.Community != 'NaN']
df.head()
df.head()
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | ... | Hour | LateNight | Equipment Violation | Moving Violation | Asian | Black | Latino | Other | White | Latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | 0 | 1 | 2018-05-11 20:47:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
1 | NaN | 0 | 1 | 2019-03-03 00:15:00 | 0 | 0 | nan | 991.0 | Black | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
2 | 28.0 | 1 | 1 | 2019-02-13 13:21:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
3 | NaN | 0 | 1 | 2019-07-27 21:23:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
4 | NaN | 0 | 1 | 2018-01-20 21:19:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
5 rows × 29 columns
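Since Socrata actually returns GridLocation as a dictionary (as the raw rows above show), the coordinates can also be pulled out without string splitting. Below is a minimal sketch of that alternative; it assumes the column still holds dicts, i.e., it would have to run before the astype('str') conversion in the prep cell, and extract_coord is a hypothetical helper, not part of the original notebook.
#Alternative to the string splitting: read coordinates straight from the GridLocation dicts
#(np and pd come from the imports at the top of the notebook)
def extract_coord(loc, key):
    # Return the requested coordinate as a float, or NaN when the location is missing
    if isinstance(loc, dict) and loc.get(key) is not None:
        return float(loc[key])
    return np.nan
#Example usage, before GridLocation is cast to str:
#df['Latitude'] = df['GridLocation'].apply(lambda loc: extract_coord(loc, 'latitude'))
#df['Longitude'] = df['GridLocation'].apply(lambda loc: extract_coord(loc, 'longitude'))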
a=df.query("Year==2020")
a.Race.value_counts()
df20x=df
df20x
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | ... | Hour | LateNight | Equipment Violation | Moving Violation | Asian | Black | Latino | Other | White | Latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | 0 | 1 | 2018-05-11 20:47:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
1 | NaN | 0 | 1 | 2019-03-03 00:15:00 | 0 | 0 | nan | 991.0 | Black | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
2 | 28.0 | 1 | 1 | 2019-02-13 13:21:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
3 | NaN | 0 | 1 | 2019-07-27 21:23:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
4 | NaN | 0 | 1 | 2018-01-20 21:19:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
5 | NaN | 0 | 1 | 2018-12-27 14:02:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
6 | 24.0 | 1 | 1 | 2018-03-13 13:51:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
7 | NaN | 1 | 1 | 2019-03-02 22:12:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
8 | 25.0 | 1 | 1 | 2019-06-07 22:08:00 | 0 | 1 | nan | 991.0 | Other | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | NaN |
9 | NaN | 0 | 1 | 2019-01-19 22:08:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
10 | NaN | 0 | 1 | 2019-06-07 20:18:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
11 | 22.0 | 1 | 1 | 2018-05-01 14:11:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
12 | 21.0 | 1 | 1 | 2019-09-27 19:13:00 | 0 | 1 | nan | 991.0 | Black | Moving Violation | ... | 19 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
13 | NaN | 0 | 1 | 2019-07-20 21:10:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
14 | NaN | 0 | 1 | 2019-05-09 16:56:00 | 0 | 1 | nan | 999.0 | White | Moving Violation | ... | 16 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
15 | NaN | 0 | 1 | 2018-11-21 18:56:00 | 0 | 0 | nan | 991.0 | Asian | Moving Violation | ... | 18 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | NaN |
16 | NaN | 0 | 1 | 2018-08-01 14:14:00 | 0 | 0 | nan | 993.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
17 | NaN | 0 | 1 | 2018-10-27 20:43:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 20 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
18 | NaN | 0 | 1 | 2018-02-18 00:36:00 | 0 | 0 | nan | 996.0 | Black | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
19 | NaN | 0 | 1 | 2018-06-23 00:46:00 | 0 | 0 | nan | 993.0 | Other | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | NaN |
20 | NaN | 0 | 1 | 2018-11-21 20:06:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
21 | 54.0 | 1 | 1 | 2018-04-09 14:20:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
22 | NaN | 0 | 1 | 2018-10-05 20:49:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
23 | NaN | 0 | 1 | 2018-10-05 20:35:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
24 | NaN | 0 | 1 | 2018-10-27 21:29:00 | 0 | 0 | nan | 991.0 | Latino | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NaN |
25 | NaN | 1 | 1 | 2018-06-30 22:35:00 | 0 | 0 | nan | 994.0 | White | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
26 | NaN | 0 | 1 | 2018-01-13 00:51:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
27 | NaN | 0 | 1 | 2018-05-11 21:56:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
28 | NaN | 0 | 1 | 2018-02-17 21:37:00 | 0 | 1 | nan | 991.0 | Black | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
29 | 23.0 | 1 | 1 | 2018-04-12 11:34:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
784687 | NaN | 0 | 1 | 2017-08-30 17:45:00 | 0 | 1 | {'latitude': '44.959171113', 'longitude': '-93... | 94.0 | Other | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 44.959171 |
784688 | NaN | 0 | 1 | 2017-11-02 22:46:00 | 0 | 0 | {'latitude': '44.959369203', 'longitude': '-93... | 87.0 | Black | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.959369 |
784689 | NaN | 0 | 1 | 2017-10-25 03:00:00 | 0 | 1 | {'latitude': '44.96658643', 'longitude': '-93.... | 68.0 | Black | Moving Violation | ... | 3 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966586 |
784690 | NaN | 0 | 1 | 2017-12-07 15:30:00 | 0 | 0 | {'latitude': '44.988221181', 'longitude': '-93... | 11.0 | White | Moving Violation | ... | 15 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.988221 |
784691 | 34.0 | 1 | 1 | 2017-09-07 08:30:00 | 0 | 1 | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.928875 |
784692 | 30.0 | 1 | 1 | 2017-09-29 18:33:00 | 0 | 0 | {'latitude': '44.959515062', 'longitude': '-93... | 91.0 | White | Moving Violation | ... | 18 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.959515 |
784693 | NaN | 1 | 1 | 2017-09-13 10:42:00 | 0 | 1 | {'latitude': '44.973827765', 'longitude': '-93... | 51.0 | Black | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.973828 |
784694 | NaN | 1 | 1 | 2017-07-27 08:06:00 | 0 | 0 | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.973125 |
784695 | NaN | 0 | 1 | 2017-09-20 22:34:00 | 0 | 1 | {'latitude': '44.988314242', 'longitude': '-93... | 14.0 | Black | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.988314 |
784696 | 37.0 | 1 | 1 | 2017-09-28 09:39:00 | 0 | 0 | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Moving Violation | ... | 9 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.980704 |
784697 | NaN | 0 | 1 | 2017-07-25 13:50:00 | 0 | 0 | {'latitude': '44.954914788', 'longitude': '-93... | 112.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.954915 |
784698 | NaN | 1 | 1 | 2017-12-06 10:20:00 | 0 | 1 | {'latitude': '44.980999933', 'longitude': '-93... | 31.0 | Black | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.981000 |
784699 | NaN | 1 | 1 | 2017-10-05 10:03:00 | 0 | 1 | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Latino | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 44.973125 |
784700 | NaN | 1 | 1 | 2017-12-01 11:21:00 | 0 | 0 | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.928875 |
784701 | NaN | 0 | 1 | 2017-10-11 15:12:00 | 0 | 1 | {'latitude': '44.966688357', 'longitude': '-93... | 75.0 | Black | Moving Violation | ... | 15 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966688 |
784702 | NaN | 0 | 1 | 2017-10-03 17:40:00 | 1 | 0 | {'latitude': '44.959923808', 'longitude': '-93... | 99.0 | White | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.959924 |
784703 | NaN | 0 | 1 | 2017-09-29 17:45:00 | 0 | 1 | {'latitude': '44.95086001', 'longitude': '-93.... | 132.0 | White | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.950860 |
784704 | 27.0 | 1 | 1 | 2017-08-17 21:35:00 | 0 | 0 | {'latitude': '44.953679624', 'longitude': '-93... | 111.0 | Other | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 44.953680 |
784705 | NaN | 0 | 1 | 2017-07-29 01:17:00 | 0 | 0 | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Equipment Violation | ... | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 44.980704 |
784706 | 28.0 | 1 | 1 | 2017-09-25 08:37:00 | 0 | 1 | {'latitude': '44.965443164', 'longitude': '-93... | 269.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.965443 |
784707 | NaN | 0 | 1 | 2017-11-09 10:51:00 | 0 | 1 | {'latitude': '44.952068692', 'longitude': '-93... | 105.0 | White | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.952069 |
784708 | NaN | 0 | 1 | 2017-12-04 00:15:00 | 1 | 0 | {'latitude': '44.966741984', 'longitude': '-93... | 73.0 | Black | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966742 |
784709 | NaN | 0 | 1 | 2017-08-31 23:28:00 | 0 | 1 | {'latitude': '44.973798199', 'longitude': '-93... | 50.0 | White | Equipment Violation | ... | 23 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 44.973798 |
784710 | NaN | 1 | 1 | 2017-09-07 10:09:00 | 0 | 1 | {'latitude': '44.988189836', 'longitude': '-93... | 9.0 | White | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.988190 |
784711 | 25.0 | 1 | 1 | 2017-10-24 08:45:00 | 0 | 0 | {'latitude': '44.952726847', 'longitude': '-93... | 119.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.952727 |
784712 | 26.0 | 1 | 1 | 2017-08-22 07:39:00 | 0 | 0 | {'latitude': '44.952131905', 'longitude': '-93... | 107.0 | Other | Moving Violation | ... | 7 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 44.952132 |
784713 | 82.0 | 1 | 1 | 2017-11-02 12:01:00 | 0 | 1 | {'latitude': '44.916259517', 'longitude': '-93... | 206.0 | White | Moving Violation | ... | 12 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.916260 |
784714 | NaN | 1 | 1 | 2017-10-31 09:05:00 | 0 | 1 | {'latitude': '44.908307731', 'longitude': '-93... | 223.0 | Latino | Moving Violation | ... | 9 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 44.908308 |
784715 | NaN | 0 | 1 | 2017-09-14 22:42:00 | 0 | 0 | nan | 24.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
784716 | NaN | 0 | 1 | 2017-11-22 11:10:00 | 0 | 1 | {'latitude': '44.948648967', 'longitude': '-93... | 131.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.948649 |
101982 rows × 29 columns
I'll be exploring the dataset from multiple angles: longitudinal, geo-spatial, and an in-depth analysis of 2017-18. I will be focusing primarily on racial discrimination. I will mostly be using data visualization; a predictive model would be inappropriate as the data is mostly binary.
There are several methods/considerations/limitations in testing for racial biases:
The primary analysis will focus on the second layer. I'll create a master table that collects the groupby values, conditioned on race.
My analysis will be focusing primarily on treatment:
'Eq' stands for Equipment Violation, and 'Mov' indicates Moving Violation.
Most values are normalized to [0, 1] and conditioned on racial identity; examples are provided below.
How to read the results:
Eq_Margin for the Asian group indicates the percentage of 'Equipment Violation' stops with respect to all stops of Asian drivers. So a value of 0.24 would indicate that 24% of stops of Asian drivers were for equipment violations.
Eq_Citation of 0.4 for Asian drivers indicates that 40% of Asian drivers stopped for an equipment violation received a citation.
Mov_DriverSearch of 0.15 for Asians indicates that 15% of Asian drivers stopped for a moving violation were searched.
Mov_Gender_F of 0.55 for Asians indicates that 55% of Asian drivers stopped for moving violations were women.
Eq_LateNight of 0.25 for Asians indicates that 25% of Asian drivers stopped for equipment violations were stopped late at night.
Morn_Citation of 0.2 for Asians indicates that 20% of Asian drivers stopped during daytime (non-late-night) hours received a citation.
Are there any trends throughout the years?
Does the number of stores and type of stores influence the neighborhood? We'll see this visually.
How does Frogtown compare to its neighbors?
#Empty data table setup (placeholder values; every cell is overwritten below)
data = [['Asian',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Black', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Latino', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Other', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['White',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Total/Average',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5]]
Race_Grp = pd.DataFrame(data, columns= ['Race','Tot_Count','Eq_Count','Mov_Count','Eq_Margin','Mov_Margin','Citation_Count','Eq_Citation',\
'Mov_Citation','Driversearch_Count','Eq_DriverSearch','Mov_DriverSearch',\
'Eq_Gender_F','Mov_Gender_F','LateNight_Count','Eq_LateNight','Mov_LateNight',\
'Morn_Citation','Late_Citation'])
Race_Grp.set_index('Race', inplace=True)
#Specify the subsample to analyze (the Frogtown-only query is commented out below)
#rf=df.query("Community=='Thomas_Frogtown'")
rf=df.query("Year==2020")
Note: The code is sensitive to the ordering of columns
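For readers on newer pandas, note that set_value (used in the loop below) was deprecated and later removed; DataFrame.at is the replacement. Below is a sketch of a label-based alternative that reproduces a few of the columns above without relying on positional offsets; it assumes Citation, Driver_search, and LateNight are already 0/1, as set up in the prep cell.
#Label-based sketch of a few of the same conditional rates (order-insensitive)
eq = rf[rf['Reason'] == 'Equipment Violation']
mov = rf[rf['Reason'] == 'Moving Violation']
alt = pd.DataFrame({
    'Tot_Count': rf.groupby('Race')['Count'].sum(),
    'Eq_Margin': rf.groupby('Race')['Reason'].value_counts(normalize=True).unstack()['Equipment Violation'],
    'Eq_Citation': eq.groupby('Race')['Citation'].mean(),
    'Mov_Citation': mov.groupby('Race')['Citation'].mean(),
    'Mov_DriverSearch': mov.groupby('Race')['Driver_search'].mean(),
    'Eq_LateNight': eq.groupby('Race')['LateNight'].mean(),
})
#alt.round(4)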
#procedure: run this line first to check the value ordering used by the positional indices below
#round(rf['Race'].value_counts().sort_index(level=1),4)
RR= ['Asian','Black','Latino','Other','White']
for i,j in enumerate(RR):
Race_Grp.set_value(j,'Tot_Count', round(rf['Race'].value_counts().sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Eq_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Mov_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_Margin', round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Mov_Margin', round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)[i +len(RR)])
Race_Grp.set_value(j,'Citation_Count',round(rf.groupby(['Race'])['Citation'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_Citation', round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[2*i+10])
Race_Grp.set_value(j,'Mov_Citation', round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[2*i+11])
Race_Grp.set_value(j,'Driversearch_Count', round(rf.groupby(['Race'])['Driver_search'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_DriverSearch', round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_DriverSearch', round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'Eq_Gender_F', round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_Gender_F', round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'LateNight_Count', round(rf.groupby(['Race'])['LateNight'].value_counts().sort_index(level=1),4)[i +len(RR)])
Race_Grp.set_value(j,'Eq_LateNight', round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_LateNight', round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'Morn_Citation', round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Late_Citation', round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
#Include Total/Average variables; NOTE: the positional indices below depend on the column order of rf
E=rf[rf.Reason == 'Equipment Violation'].sum() #2 /3
M=rf[rf.Reason == 'Moving Violation'].sum()#2 /3
Mo= rf[rf.LateNight == 0].sum()
L=rf[rf.LateNight == 1].sum()
Race_Grp.set_value('Total/Average','Tot_Count',rf.count()[1])
Race_Grp.set_value('Total/Average','Eq_Count',rf[rf.Reason == 'Equipment Violation'].count()[1])
Race_Grp.set_value('Total/Average','Mov_Count',rf[rf.Reason == 'Moving Violation'].count()[1])
Race_Grp.set_value('Total/Average','LateNight_Count',rf[rf.LateNight == 1].count()[1])
Race_Grp.set_value('Total/Average','Citation_Count',rf[rf.Citation == 1].count()[1])
Race_Grp.set_value('Total/Average','Driversearch_Count',rf[rf.Driver_search == 1].count()[1])
Race_Grp.set_value('Total/Average','Eq_Margin',E[2]/rf.count()[1])
Race_Grp.set_value('Total/Average','Mov_Margin',M[2]/rf.count()[1])
Race_Grp.set_value('Total/Average','Eq_Citation',E[1]/E[2])
Race_Grp.set_value('Total/Average','Mov_Citation',M[1]/M[2])
Race_Grp.set_value('Total/Average','Eq_DriverSearch',E[3]/E[2])
Race_Grp.set_value('Total/Average','Mov_DriverSearch',M[3]/M[2])
Race_Grp.set_value('Total/Average','Eq_Gender_F',E[4]/E[2])
# Race_Grp.set_value('Total/Average','Mov_Gender_F',M[4]/M[2])
Race_Grp.set_value('Total/Average','Eq_LateNight',E[19]/E[2])
Race_Grp.set_value('Total/Average','Mov_LateNight',M[19]/M[2])
Race_Grp.set_value('Total/Average','Morn_Citation',Mo[1]/Mo[2])
Race_Grp.set_value('Total/Average','Late_Citation',L[1]/L[2])
Race_Grp
#round(rf['Race'].value_counts().sort_index(level=1),4)
Tot_Count | Eq_Count | Mov_Count | Eq_Margin | Mov_Margin | Citation_Count | Eq_Citation | Mov_Citation | Driversearch_Count | Eq_DriverSearch | Mov_DriverSearch | Eq_Gender_F | Mov_Gender_F | LateNight_Count | Eq_LateNight | Mov_LateNight | Morn_Citation | Late_Citation | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Race | ||||||||||||||||||
Asian | 1592 | 458 | 1134 | 0.287700 | 0.712300 | 594 | 0.168100 | 0.455900 | 138 | 0.122300 | 0.072300 | 0.229300 | 0.2681 | 429 | 0.397400 | 0.2178 | 0.458300 | 0.142200 |
Black | 7780 | 2610 | 5170 | 0.335500 | 0.664500 | 2902 | 0.206100 | 0.457300 | 781 | 0.131000 | 0.084900 | 0.296200 | 0.3124 | 2286 | 0.394600 | 0.2429 | 0.460900 | 0.161900 |
Latino | 1112 | 350 | 762 | 0.314700 | 0.685300 | 486 | 0.251400 | 0.522300 | 69 | 0.088600 | 0.049900 | 0.265700 | 0.2822 | 262 | 0.360000 | 0.1785 | 0.521200 | 0.164100 |
Other | 330 | 85 | 245 | 0.257600 | 0.742400 | 153 | 0.305900 | 0.518400 | 13 | 0.070600 | 0.028600 | 0.305900 | 0.2857 | 74 | 0.352900 | 0.1796 | 0.558600 | 0.135100 |
White | 7350 | 1925 | 5425 | 0.261900 | 0.738100 | 3478 | 0.328300 | 0.524600 | 300 | 0.058700 | 0.034500 | 0.366200 | 0.3808 | 1320 | 0.264900 | 0.1493 | 0.543000 | 0.154500 |
Total/Average | 18164 | 5428 | 12736 | 0.298833 | 0.701167 | 7613 | 0.250737 | 0.490892 | 1301 | 0.100958 | 0.059124 | 0.313559 | 0.5000 | 4371 | 0.345984 | 0.0000 | 0.502066 | 0.157401 |
F=['Tot_Count','Eq_Count','Mov_Count','Eq_Margin','Mov_Margin', 'Eq_Gender_F','Mov_Gender_F']
Race_Grp[F]
Tot_Count | Eq_Count | Mov_Count | Eq_Margin | Mov_Margin | Eq_Gender_F | Mov_Gender_F | |
---|---|---|---|---|---|---|---|
Race | |||||||
Asian | 1592 | 458 | 1134 | 0.287700 | 0.712300 | 0.229300 | 0.2681 |
Black | 7780 | 2610 | 5170 | 0.335500 | 0.664500 | 0.296200 | 0.3124 |
Latino | 1112 | 350 | 762 | 0.314700 | 0.685300 | 0.265700 | 0.2822 |
Other | 330 | 85 | 245 | 0.257600 | 0.742400 | 0.305900 | 0.2857 |
White | 7350 | 1925 | 5425 | 0.261900 | 0.738100 | 0.366200 | 0.3808 |
Total/Average | 18164 | 5428 | 12736 | 0.298833 | 0.701167 | 0.313559 | 0.5000 |
G=['Citation_Count','Eq_Citation','Mov_Citation','Driversearch_Count','Eq_DriverSearch','Mov_DriverSearch',]
Race_Grp[G]
Citation_Count | Eq_Citation | Mov_Citation | Driversearch_Count | Eq_DriverSearch | Mov_DriverSearch | |
---|---|---|---|---|---|---|
Race | ||||||
Asian | 594 | 0.168100 | 0.455900 | 138 | 0.122300 | 0.072300 |
Black | 2902 | 0.206100 | 0.457300 | 781 | 0.131000 | 0.084900 |
Latino | 486 | 0.251400 | 0.522300 | 69 | 0.088600 | 0.049900 |
Other | 153 | 0.305900 | 0.518400 | 13 | 0.070600 | 0.028600 |
White | 3478 | 0.328300 | 0.524600 | 300 | 0.058700 | 0.034500 |
Total/Average | 7613 | 0.250737 | 0.490892 | 1301 | 0.100958 | 0.059124 |
H=['LateNight_Count','Eq_LateNight','Mov_LateNight','Morn_Citation','Late_Citation']
Race_Grp[H]
LateNight_Count | Eq_LateNight | Mov_LateNight | Morn_Citation | Late_Citation | |
---|---|---|---|---|---|
Race | |||||
Asian | 429 | 0.397400 | 0.2178 | 0.458300 | 0.142200 |
Black | 2286 | 0.394600 | 0.2429 | 0.460900 | 0.161900 |
Latino | 262 | 0.360000 | 0.1785 | 0.521200 | 0.164100 |
Other | 74 | 0.352900 | 0.1796 | 0.558600 | 0.135100 |
White | 1320 | 0.264900 | 0.1493 | 0.543000 | 0.154500 |
Total/Average | 4371 | 0.345984 | 0.0000 | 0.502066 | 0.157401 |
We will be taking the results from the previous section and graphing them.
# Graph Based on Counts
#Select Features
Features= ['Tot_Count','Eq_Count', 'Mov_Count','LateNight_Count', 'Citation_Count','Driversearch_Count']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax= df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Total Traffic Stops in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph on Equipment Violations
#Select Features
Features= ['Eq_Margin','Eq_LateNight','Eq_Citation','Eq_DriverSearch','Eq_Gender_F']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Equipment Violations in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('Percent')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph Based on Moving Violations
#Select Features
Features= ['Mov_Margin','Mov_Citation','Mov_DriverSearch','Mov_Gender_F','Mov_LateNight' ]
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Moving Violations in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('Percent')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph Based on Morning and Latenight Citations
#Select Features
Features= ['Morn_Citation','Late_Citation']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Morning vs LateNight Citations Margin in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Age Breakdown by Citation
#bf= rf.query("AgeDemo != 'NaN'")
Features= ['Citation','AgeDemo']
B= rf[Features].groupby(['AgeDemo']).sum()
B=B.transpose()
#print(B)
ax=B.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Driver Age Citation Breakdown in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Insights
Recall that in Frogtown the racial distribution is roughly 1/3 Black, 1/3 Asian, and 1/5 White. With that said, there is a greater proportion of both Black and White drivers being stopped.
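A rough way to put that comparison in numbers is to divide each group's share of stops by its approximate share of residents. A minimal sketch, using the rough population fractions quoted in the sentence above (not exact census figures) and the rf subsample:
#Stop share vs. rough population share (population fractions taken from the text above)
pop_share = {'Black': 1/3, 'Asian': 1/3, 'White': 1/5}
stop_share = rf['Race'].value_counts(normalize=True)
for race, pop in pop_share.items():
    share = stop_share.get(race, 0)
    print('{}: {:.2f} of stops vs ~{:.2f} of residents (ratio {:.2f})'.format(race, share, pop, share / pop))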
I will be checking out patterns by month, day of the week, and hour of the day.
#Traffic Stops by Day of Week (Count)
Features= ['Count','LateNight','DayofWeek']
B= rf[Features].groupby(['DayofWeek']).sum() #group by Function
#print(B)
ax=B.plot(kind='line')
plt.title('Traffic Stops (Count) by DayofWeek in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Day of Week (Starting Monday)')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Day of Week
Features= ['Count','Citation','Equipment Violation','Moving Violation','LateNight','DayofWeek']
B= rf[Features].groupby(['DayofWeek']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax=B.plot(kind='line')
plt.title('Traffic Stops (Margin) by DayofWeek in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Day of Week (Starting Monday)')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Month
Features= ['Count', 'LateNight','Month']
B= rf[Features].groupby(['Month']).sum()
#print(B)
ax=B.plot(kind='line')
plt.title('Monthly Traffic Stops (Count) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Month')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Month
Features= ['Count','Citation','Equipment Violation','Moving Violation','LateNight','Month']
B= rf[Features].groupby(['Month']).sum()
B=B.div(B['Count'].values,axis=0)
B.drop(B.columns[[0]], axis=1, inplace=True)
#print(B)
ax=B.plot(kind='line')
plt.title('Monthly Traffic Stops (margin) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Month')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Hour
Features= ['Count','Driver_search','Hour']
B= rf[Features].groupby(['Hour']).sum()
#print(B)
ax= B.plot(kind='line')
plt.title('Traffic Stops Hourly (counts) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Hour')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Hour
Features= ['Count','Citation','Equipment Violation','Moving Violation','Driver_search','Hour']
B= rf[Features].groupby(['Hour']).sum()
B=B.div(B['Count'].values,axis=0)
B.drop(B.columns[[0]], axis=1, inplace=True)
#print(B)
ax= B.plot(kind='line')
plt.title('Traffic Stops (margin) Hourly in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Hour')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
The full dataset ranges from 2001 to 2018 and has many missing components. For some years, 50% of the collected records are missing key information. The 'total count' will include all instances of traffic stops in Frogtown, even when supplemental information is missing. Note that the sum function ignores missing values.
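That missingness can be quantified per year before the prep. A short sketch using the renamed full table (results_df), counting the 'No Data' entries in a few key fields:
#Fraction of 'No Data' entries per year in a few key fields (sketch)
key_fields = ['Race', 'Driver_search', 'Gender']
missing = (results_df[key_fields] == 'No Data').groupby(results_df['Year']).mean()
#missing.round(3)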
#Data Prep
#From the masterdataset get subsample
long= results_df.query('Grid in [87.0,88.0,89.0,67.0,68.0,90.0,91.0,92.0]')
#Add Time Variables
long['Date']= pd.to_datetime(long['Date'])
long['DayofWeek']=long['Date'].dt.dayofweek
long['Weekend'] = long['DayofWeek'].apply(lambda x: 1 if (x>4) else 0)
long['Month'] = long['Date'].dt.month
long['Day'] = long['Date'].dt.day
long['Hour'] = long['Date'].dt.hour
long['LateNight'] = long['Hour'].apply(lambda x: 1 if (x>21 or x<5) else 0)
#Screening
long=long.query("Race!='No Data'")
long=long.query("Driver_search !='No Data'")
long=long.query("Gender !='No Data'")
#Replace variables with dummies
long['Driver_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Vehicle_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Citation'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Gender'].replace(to_replace=['Male','Female'], value=[0,1],inplace=True) #FEMALE is 1
long[['Driver_search','Gender']] = long[['Driver_search', 'Gender']].astype(int)
long= pd.concat([long,pd.get_dummies(long['Reason'])], axis=1)
long= pd.concat([long,pd.get_dummies(long['Race'])], axis=1)
FG_long= long
#Line Graph Total Stops In Frogtown
Features= ['Count','Year']
B= FG_long[Features].groupby(['Year']).sum()
ax= B.plot(kind='line')
plt.title('Total Traffic Stops in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Line graph Count
Features= ['Count','Citation','LateNight','Driver_search','Gender','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
#print(B)
ax= B.plot(kind='line')
plt.title('Trends Count in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Line Graph Margin
Features= ['Count','Citation','LateNight','Driver_search','Gender','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax= B.plot(kind='line')
plt.title('Margin Trends in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Features= ['Count','Asian','Black','Latino','Other','White','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax= B.plot(kind='line')
plt.title('Racial Breakdown Margin of Traffic Stops in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Preliminary Longitudinal Analysis:
I will be using the Foursquare API to get the nearby venues within a radius of each police grid. I've separated the venues into three categories of interest. The map below indicates the venues within Frogtown (upon running it again, I may have exhausted my queries):
#Function that extracts venue information for many locations (requires the Foursquare CLIENT_ID, CLIENT_SECRET, VERSION, and LIMIT to be defined)
def getNearbyVenues(names, latitudes, longitudes, radius=500):
venues_list=[]
for name, lat, lng in zip(names, latitudes, longitudes):
print(name)
# create the API request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
lat,
lng,
radius,
LIMIT)
# make the GET request
results = requests.get(url).json()["response"]['groups'][0]['items']
# return only relevant information for each nearby venue
venues_list.append([(
name,
lat,
lng,
v['venue']['name'],
v['venue']['location']['lat'],
v['venue']['location']['lng'],
v['venue']['categories'][0]['name']) for v in results])
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Neighborhood',
'Neighborhood Latitude',
'Neighborhood Longitude',
'Venue',
'Venue_Latitude',
'Venue_Longitude',
'Venue_Category']
return(nearby_venues)
#Simplify the table
ab=rf.groupby(['Grid','Latitude','Longitude'], as_index=False).count()
Features= ['Grid','Latitude','Longitude']
ab= ab[Features]
# Get the location of Venues for bunch of locations
Frogtown_venues = getNearbyVenues(names=ab['Grid'],
latitudes=ab['Latitude'],
longitudes=ab['Longitude']
)
Bar= ['Dive Bar','Bar','Liquor Store']
Restaurant= ['Asian Restaurant', 'Thai Restaurant','Ethiopian Restaurant','Vietnamese Restaurant','Middle Eastern Restaurant'\
,'Chinese Restaurant','Ramen Restaurant','BBQ Joint','Fast Food Restaurant','Noodle House','Restaurant','Café']
Convenience= ['Convenience Store','Grocery Store']
#Select on multiple values
Frogtown_venuesB= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Bar)]
Frogtown_venuesR= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Restaurant)]
Frogtown_venuesC= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Convenience)]
67.0 68.0 87.0 88.0 89.0 90.0 91.0 92.0
Frogtown_venues.head()
Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue_Latitude | Venue_Longitude | Venue_Category | |
---|---|---|---|---|---|---|---|
0 | 67.0 | 44.966603 | -93.141545 | Half Time Rec | 44.970362 | -93.143015 | Dive Bar |
1 | 67.0 | 44.966603 | -93.141545 | Tobasi Stop | 44.963243 | -93.139395 | Convenience Store |
2 | 67.0 | 44.966603 | -93.141545 | Pro-Image Beauty School | 44.963986 | -93.140867 | Cosmetics Shop |
3 | 67.0 | 44.966603 | -93.141545 | Minnehaha Liquors | 44.963355 | -93.140359 | Liquor Store |
4 | 67.0 | 44.966603 | -93.141545 | Sunrise Creative Gourmet | 44.964883 | -93.137208 | Market |
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
#Create Frogtown GeoMap
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
# For Bars
for lat, lng, borough, neighborhood in zip(Frogtown_venuesB['Venue_Latitude'], Frogtown_venuesB['Venue_Longitude'],Frogtown_venuesB['Venue'], Frogtown_venuesB['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='blue',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
#Convenience Stores
for lat, lng, borough, neighborhood in zip(Frogtown_venuesC['Venue_Latitude'], Frogtown_venuesC['Venue_Longitude'],Frogtown_venuesC['Venue'], Frogtown_venuesC['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='yellow',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
# Restaurants
for lat, lng, borough, neighborhood in zip(Frogtown_venuesR['Venue_Latitude'], Frogtown_venuesR['Venue_Longitude'],Frogtown_venuesR['Venue'], Frogtown_venuesR['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='green',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
FG_map
The map above shows the venues within the Frogtown area. University Avenue has a heavy concentration of commercial activity. There are also some neighborhood bars and convenience stores near the residential homes; the geospatial data below will give a clearer picture.
The Saint Paul Police Department provides a GeoJSON file that maps out the police grid.
To prepare the data for geospatial visualization, I need to group the stops by grid. Because I converted the categorical variables to dummies early on, I can simply sum them during aggregation and divide by the stop count to get rates.
For the neighborhood visualizations, I also pulled Census data on racial distribution.
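As a minimal sketch of that aggregation step (the toy data below is hypothetical; the real stop-level dataframe and its dummy columns are built earlier in the report), summing the 0/1 dummies within each grid yields counts, and dividing by the per-grid stop total turns them into rates:
import pandas as pd
# Toy stop-level frame: one row per stop, with dummy columns already encoded as 0/1
stops = pd.DataFrame({'Grid': [67, 67, 68], 'Citation': [1, 0, 1], 'Driver_search': [0, 1, 0]})
stops['Count'] = 1                                       # so groupby-sum also produces the stop total
grid_sums = stops.groupby('Grid').sum()                  # dummy sums = counts per grid
grid_rates = grid_sums.div(grid_sums['Count'], axis=0)   # divide each row by its stop total
grid_rates['Count'] = grid_sums['Count']                 # keep the raw stop count alongside the rates
print(grid_rates)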
sp_geo1
'https://information.stpaul.gov/resource/kkd6-vvns.geojson'
#import Programs and police grid
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library
sp_geo = r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson'
#Socrata API IS NOT that good
#sp_geo= 'https://information.stpaul.gov/resource/xfxz-iqwn.json'
# Programmer Note: One of the challenges in getting the choropleth to connect properly was matching the Grid attribute to the GeoJSON property gridnum.
#All of Saint Paul Grid
# The graph is determined by Grid so we will group along that
Features= ['Grid','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
# Sum the dummy columns within each grid, then divide by Count to get rates
B= df[Features].groupby(['Grid']).sum()
#Save Sum Values
C= B[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
B=B.div(B['Count'].values,axis=0)
B['Count']=C.iloc[:,0]
B['Citation_count'] = C.iloc[:,1]
B['Equipment Violation_count'] = C.iloc[:,2]
B['Moving Violation_count'] = C.iloc[:,3]
B['LateNight_count'] = C.iloc[:,4]
# The GRID needs to follow exactly like the json file
B=B.reset_index()
B.Grid = B.Grid.astype(int)
B.Grid = B.Grid.astype(str)
#print(B.head())
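A quick, optional sanity check for that note (a sketch, assuming the same local GeoJSON path and the gridnum property used by the choropleth calls below; if the ids don't line up, cast one side to match the other) is to compare the grid ids in the file with the Grid keys in the table:
import json
with open(r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson') as f:
    grid_geo = json.load(f)
geo_ids = {str(feat['properties']['gridnum']) for feat in grid_geo['features']}
unmatched = set(B['Grid']) - geo_ids
print(unmatched)  # any Grid value listed here would show up as an unshaded polygon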
#Frogtown Only:
#FROGTOWN Value
F= rf[Features].groupby(['Grid']).sum()
C= F['Count']
F=F.div(F['Count'].values,axis=0)
F['Count']=C
# The GRID needs to follow exactly like the json file
F=F.reset_index()
F.Grid = F.Grid.astype(int)
F.Grid = F.Grid.astype(str)
#Add demographic Data
SPDem = pd.read_csv('Data\MSP Neighborhoods_2013-2017.csv', skiprows=[1])
SPDem.columns = SPDem.columns.str.replace(' ', '')
SPDem['City'].value_counts().to_frame()
#Get Saint Paul only
SPDem= SPDem.query('City in ["St. Paul"]')
#SPDem.columns.values[i]
#for i,square in enumerate(SPDem.columns.values):
# print(str(i) + ' '+ square)
#print(SPDem)
#Specify columns of interest
SPDem = SPDem.iloc[:,[1,2,27,39,43,47,63,71,107,123,127,131,135,159,215,243,267,271,275,279,363,370,374,378,\
382,386,390,394,399,403,471,475,479,495,511,515,519,523]]
col_names= ['TotHH','Neigh','Unemploy','<HighSDip','HighSDip','AssDeg','BacDeg+','Foreign%','Child<18','Inc<35k','Inc_35to50','Inc_50to75','Inc75to100','Age65+','Renter','LangNotEng','Age18_24','Age25_34','Age35_44','Age45_54','Male','Race_Black','Race_Native','Race_Asian','Race_Other','Race_Two+','Race_His','Race_White','Race_POC','Poverty','Poverty_At','Poverty_100to150','Poverty_150to200','CarOwn','Commute<10','Commute_10to19','Commute_20-29','Commute_30+']
SPDem.columns= col_names
#Rename certain values so they can be matched/merged correctly with the other dataframe.
SPDem.Neigh.unique()
SPDem.iat[2,1] = 'Capital'
SPDem.iat[4,1] = 'Thomas'
SPDem.sort_values(['Neigh'],inplace=True)
SPDem =SPDem[['Race_White','Race_Black','Race_Asian']]
#All of SaintPaul Neighborhood
# The graph is determined by Grid so we will group along that
Features= ['Community','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
# Sum the dummy columns within each community, then divide by Count to get rates
N= df[Features].groupby(['Community']).sum()
#Save Sum Values
C= N[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
N=N.div(N['Count'].values,axis=0)
N['Count']=C.iloc[:,0]
N['Citation_count'] = C.iloc[:,1]
N['Equipment Violation_count'] = C.iloc[:,2]
N['Moving Violation_count'] = C.iloc[:,3]
N['LateNight_count'] = C.iloc[:,4]
# Add the demographic info!
N['White_Demo']= SPDem.iloc[:,0].values #.values was used
N['Black_Demo']= SPDem.iloc[:,1].values
N['Asian_Demo']= SPDem.iloc[:,2].values
N=N.reset_index()
#Merging two dataframes: create a distinct dataframe of just Grid and Community, then merge it into the neighborhood table
A= df.groupby(['Grid','Community']).size().reset_index(name='Blah')
#A= df[['Grid','Community',]].groupby('Community').reset_index()
C=A
N=pd.merge(N, C, on='Community')
# The GRID needs to follow exactly like the json file
N=N.reset_index()
N.Grid = N.Grid.astype(int)
N.Grid = N.Grid.astype(str)
N
 | index | Community | Count | Citation | Equipment Violation | Moving Violation | Driver_search | Vehicle_search | LateNight | Asian | ... | Weekend | Citation_count | Equipment Violation_count | Moving Violation_count | LateNight_count | White_Demo | Black_Demo | Asian_Demo | Grid | Blah
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 98 | 268 |
1 | 1 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 99 | 599 |
2 | 2 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 100 | 32 |
3 | 3 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 118 | 144 |
4 | 4 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 119 | 682 |
5 | 5 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 120 | 152 |
6 | 6 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 137 | 39 |
7 | 7 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 138 | 87 |
8 | 8 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 139 | 65 |
9 | 9 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 140 | 42 |
10 | 10 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 160 | 70 |
11 | 11 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 180 | 48 |
12 | 12 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 197 | 1 |
13 | 13 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 200 | 3 |
14 | 14 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 240 | 15 |
15 | 15 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 280 | 43 |
16 | 16 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 111 | 148 |
17 | 17 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 112 | 1007 |
18 | 18 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 131 | 452 |
19 | 19 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 132 | 230 |
20 | 20 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 133 | 2307 |
21 | 21 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 152 | 302 |
22 | 22 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 153 | 465 |
23 | 23 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 5 | 33 |
24 | 24 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 6 | 29 |
25 | 25 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 7 | 13 |
26 | 26 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 8 | 54 |
27 | 27 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 25 | 100 |
28 | 28 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 26 | 29 |
29 | 29 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 27 | 61 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
171 | 171 | Union_Park | 3137 | 0.659229 | 0.141536 | 0.858464 | 0.031559 | 0.026777 | 0.184252 | 0.039528 | ... | 0.174370 | 2068 | 444.0 | 2693.0 | 578 | 0.772243 | 0.098012 | 0.037718 | 126 | 189 |
172 | 172 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 150 | 274 |
173 | 173 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 151 | 42 |
174 | 174 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 169 | 463 |
175 | 175 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 170 | 266 |
176 | 176 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 171 | 73 |
177 | 177 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 187 | 59 |
178 | 178 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 188 | 215 |
179 | 179 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 189 | 9 |
180 | 180 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 207 | 433 |
181 | 181 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 226 | 46 |
182 | 182 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 230 | 216 |
183 | 183 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 249 | 80 |
184 | 184 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 267 | 393 |
185 | 185 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 268 | 469 |
186 | 186 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 172 | 61 |
187 | 187 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 173 | 193 |
188 | 188 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 174 | 245 |
189 | 189 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 175 | 1 |
190 | 190 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 192 | 97 |
191 | 191 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 193 | 154 |
192 | 192 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 194 | 202 |
193 | 193 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 195 | 35 |
194 | 194 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 209 | 6 |
195 | 195 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 210 | 5 |
196 | 196 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 211 | 22 |
197 | 197 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 212 | 34 |
198 | 198 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 213 | 42 |
199 | 199 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 214 | 77 |
200 | 200 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 215 | 18 |
201 rows × 25 columns
#Create Frogtown GeoMap
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
# I've included the markers from previous graph
for lat, lng, borough, neighborhood in zip(Frogtown_venues['Venue_Latitude'], Frogtown_venues['Venue_Longitude'],Frogtown_venues['Venue'], Frogtown_venues['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='green',
fill=True,
fill_color='#3186cc',
fill_opacity=0.1,
parse_html=False).add_to(FG_map)
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-37-e561985be0e4> in <module>
      5 # I've included the markers from previous graph
----> 6 for lat, lng, borough, neighborhood in zip(Frogtown_venues['Venue_Latitude'], Frogtown_venues['Venue_Longitude'], Frogtown_venues['Venue'], Frogtown_venues['Venue_Category']):
      7     label = '{}, {}'.format(borough, neighborhood)
      8     label = folium.Popup(label, parse_html=True)
NameError: name 'Frogtown_venues' is not defined
#sp_geo = r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson'
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Count'],
#nan_fill_color='purple',
#nan_fill_opacity=0.4,
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
#messed_up_data.loc[4, 'Count'] = float('nan')
C=B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Count'],
#nan_fill_color='purple',
#nan_fill_opacity=0.4,
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Citation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Counts',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Equipment Violation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Equipment Violation Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Moving Violation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Moving Violation Counts',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Traffic Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Traffic Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','White_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the White share of the population and the White share of drivers stopped (White_Demo - White). In this legend, lighter shades indicate over-representation of White drivers relative to population and darker shades indicate under-representation.
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['White_Demo']- N['White']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','White'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Black_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the Black share of the population and the Black share of drivers stopped (Black_Demo - Black). Reading the same legend, Black drivers are over-represented by default!
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['Black_Demo']- N['Black']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Black'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Asian_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the Asian share of the population and the Asian share of drivers stopped (Asian_Demo - Asian). Reading the same legend, Asian drivers are under-represented by default.
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['Asian_Demo']- N['Asian']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Asian'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Quick Analysis
After digging into the data, we get a better grasp of how traffic stops are administered and how often they result in citations. Within Frogtown, moving violations are more frequent during the daytime and carry a high citation rate, while equipment violations are more frequent late at night. Expanding outward to all of Saint Paul, traffic stops are most frequent along University Avenue, which is a commercial corridor and carries the light-rail line. The Frogtown area has more traffic stops than its neighboring communities, with the greatest concentration between Western Avenue and Lexington Avenue. Elsewhere in Saint Paul there are communities with patterns similar to Frogtown's.
Regarding race within Frogtown, Black and White drivers are over-represented, while Asian drivers are under-represented, in total citations given. Black drivers are at least twice as likely to be searched as their White counterparts for moving violations, despite having lower citation rates. During the late-night hours, Black drivers are more likely to be stopped and show greater rates of moving violations; White drivers, by contrast, have a greater citation rate during the morning. Broken down by gender, White female drivers also get stopped, though it's worth noting that this may partly reflect the wealth gap across racial lines. At the city level, Black drivers are over-represented in total traffic stops in every neighborhood, while Asian drivers are under-represented relative to total traffic stops.
This report raised more questions than answers. Why are there discrepancies in the data? What are the criteria for an equipment violation, and do crashes count? There is simply not enough volume of 911 calls and investigative stops to account for the imbalance.
There is a lot more information that can be gleaned from the results. Please feel free to email me if you have any questions or thoughts.
Please check out the numbers for your respective community in the Appendix.
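For readers who want to check the search-rate comparison above against the data, here is a rough sketch using the stop-level df and the Race, Moving Violation, and Driver_search columns built earlier (treat it as illustrative, not the exact calculation behind the numbers quoted):
# Driver-search rate by race among moving-violation stops, plus the Black/White ratio
moving = df[df['Moving Violation'] == 1]
search_rate = moving.groupby('Race')['Driver_search'].mean()
print(search_rate)
print(search_rate['Black'] / search_rate['White'])  # >= 2 would match the "twice as likely" claim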
Features= ['Community','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
#pd.options.display.float_format = '{:.4f}%'.format
# Sum the dummy columns within each community, then divide by Count to get rates
Ne= df[Features].groupby(['Community']).sum()
#Save Sum Values
C= Ne[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
Ne=Ne.div(Ne['Count'].values,axis=0)
Ne['Count']=C.iloc[:,0]
Ne['Citation_count'] = C.iloc[:,1]
Ne['Equipment Violation_count'] = C.iloc[:,2]
Ne['Moving Violation_count'] = C.iloc[:,3]
Ne['LateNight_count'] = C.iloc[:,4]
Ne['White_Demo']= SPDem.iloc[:,0].values #.values was used
#print(N['White_Demo'])
Ne['Black_Demo']= SPDem.iloc[:,1].values
Ne['Asian_Demo']= SPDem.iloc[:,2].values
Ne=Ne.reset_index()
Ne[['Community','Count', 'Citation_count','Citation','Equipment Violation_count','Equipment Violation','Moving Violation_count','Moving Violation']].set_index('Community')
#df
(Counts are raw numbers of stops/citations; the Citation, Equipment Violation, and Moving Violation columns are proportions of stops within each community.)

Community | Count | Citation_count | Citation | Equipment Violation_count | Equipment Violation | Moving Violation_count | Moving Violation
---|---|---|---|---|---|---|---
Battle_Creek | 2290 | 1654 | 0.7223 | 269 | 0.1175 | 2021 | 0.8825
Capital_River | 4911 | 3423 | 0.6970 | 534 | 0.1087 | 4377 | 0.8913
Como | 1328 | 851 | 0.6408 | 187 | 0.1408 | 1141 | 0.8592
Dayton_Bluff | 2568 | 934 | 0.3637 | 753 | 0.2932 | 1815 | 0.7068
Greater_East_Side | 3968 | 2073 | 0.5224 | 810 | 0.2041 | 3158 | 0.7959
Highland_Park | 3372 | 2960 | 0.8778 | 76 | 0.0225 | 3296 | 0.9775
Macalester_Groveland | 1851 | 1437 | 0.7763 | 130 | 0.0702 | 1721 | 0.9298
Midway | 2476 | 1288 | 0.5202 | 491 | 0.1983 | 1985 | 0.8017
North_End | 8561 | 4883 | 0.5704 | 1854 | 0.2166 | 6707 | 0.7834
Payne_Phalen | 12136 | 4185 | 0.3448 | 3573 | 0.2944 | 8563 | 0.7056
St_Anthony | 347 | 223 | 0.6427 | 94 | 0.2709 | 253 | 0.7291
Summit_Hill | 851 | 585 | 0.6874 | 74 | 0.0870 | 777 | 0.9130
Summit_University | 2477 | 1133 | 0.4574 | 573 | 0.2313 | 1904 | 0.7687
Thomas_Frogtown | 5458 | 2219 | 0.4066 | 1654 | 0.3030 | 3804 | 0.6970
Union_Park | 3137 | 2068 | 0.6592 | 444 | 0.1415 | 2693 | 0.8585
West_7th | 3038 | 2210 | 0.7275 | 291 | 0.0958 | 2747 | 0.9042
West_Side | 1192 | 517 | 0.4337 | 333 | 0.2794 | 859 | 0.7206
Ne[['Community','Driver_search','Vehicle_search', 'Citation_count','LateNight','Gender','Weekend']].set_index('Community')
(All columns except Citation_count are proportions of stops within each community.)

Community | Driver_search | Vehicle_search | Citation_count | LateNight | Gender | Weekend
---|---|---|---|---|---|---
Battle_Creek | 0.03 | 0.03 | 1654 | 0.15 | 0.42 | 0.11
Capital_River | 0.03 | 0.03 | 3423 | 0.15 | 0.41 | 0.16
Como | 0.03 | 0.03 | 851 | 0.19 | 0.43 | 0.15
Dayton_Bluff | 0.09 | 0.09 | 934 | 0.35 | 0.33 | 0.23
Greater_East_Side | 0.07 | 0.06 | 2073 | 0.27 | 0.38 | 0.17
Highland_Park | 0.00 | 0.00 | 2960 | 0.02 | 0.44 | 0.03
Macalester_Groveland | 0.01 | 0.01 | 1437 | 0.09 | 0.42 | 0.09
Midway | 0.06 | 0.05 | 1288 | 0.25 | 0.40 | 0.18
North_End | 0.06 | 0.05 | 4883 | 0.24 | 0.37 | 0.18
Payne_Phalen | 0.11 | 0.10 | 4185 | 0.38 | 0.31 | 0.24
St_Anthony | 0.03 | 0.02 | 223 | 0.39 | 0.28 | 0.39
Summit_Hill | 0.03 | 0.03 | 585 | 0.13 | 0.42 | 0.14
Summit_University | 0.08 | 0.07 | 1133 | 0.32 | 0.35 | 0.25
Thomas_Frogtown | 0.10 | 0.09 | 2219 | 0.40 | 0.34 | 0.22
Union_Park | 0.03 | 0.03 | 2068 | 0.18 | 0.38 | 0.17
West_7th | 0.02 | 0.01 | 2210 | 0.11 | 0.39 | 0.14
West_Side | 0.08 | 0.08 | 517 | 0.35 | 0.32 | 0.28
Ne[['Community','Asian','Asian_Demo', 'Black','Black_Demo','White','White_Demo']].set_index('Community')
Community | Asian | Asian_Demo | Black | Black_Demo | White | White_Demo
---|---|---|---|---|---|---
Battle_Creek | 0.143668 | 0.231585 | 0.292140 | 0.203079 | 0.430131 | 0.398383 |
Capital_River | 0.083690 | 0.089509 | 0.302993 | 0.132114 | 0.515985 | 0.675663 |
Como | 0.100151 | 0.054048 | 0.306476 | 0.089230 | 0.491717 | 0.762898 |
Dayton_Bluff | 0.146417 | 0.323817 | 0.419393 | 0.140553 | 0.306075 | 0.348776 |
Greater_East_Side | 0.184476 | 0.297678 | 0.336946 | 0.155262 | 0.336694 | 0.358690 |
Highland_Park | 0.029359 | 0.071076 | 0.147687 | 0.140959 | 0.726572 | 0.652471 |
Macalester_Groveland | 0.026472 | 0.037623 | 0.127499 | 0.126550 | 0.740681 | 0.768102 |
Midway | 0.050889 | 0.039393 | 0.341276 | 0.020893 | 0.498384 | 0.871761 |
North_End | 0.160963 | 0.355203 | 0.397851 | 0.231465 | 0.338512 | 0.259136 |
Payne_Phalen | 0.211519 | 0.358401 | 0.400132 | 0.128624 | 0.276038 | 0.307874 |
St_Anthony | 0.083573 | 0.112020 | 0.221902 | 0.094220 | 0.550432 | 0.711481 |
Summit_Hill | 0.039953 | 0.027451 | 0.132785 | 0.050839 | 0.740306 | 0.839548 |
Summit_University | 0.061365 | 0.082662 | 0.480420 | 0.324349 | 0.346790 | 0.491559 |
Thomas_Frogtown | 0.125870 | 0.327318 | 0.492305 | 0.329769 | 0.272261 | 0.223661 |
Union_Park | 0.039528 | 0.037718 | 0.259165 | 0.098012 | 0.599936 | 0.772243 |
West_7th | 0.035879 | 0.029926 | 0.180053 | 0.108826 | 0.689928 | 0.721428 |
West_Side | 0.041946 | 0.064984 | 0.281040 | 0.135823 | 0.453020 | 0.453392 |
#Long stretch of code
#Programmer Note: sorting the index makes the order of the value_counts output predictable, which the positional indexing below relies on
#Group by Race to get count
A=round(rf['Race'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Tot_Count', A[0])
Race_Grp.set_value('Black','Tot_Count', A[1])
Race_Grp.set_value('Latino','Tot_Count', A[2])
Race_Grp.set_value('Other','Tot_Count', A[3])
Race_Grp.set_value('White','Tot_Count', A[4])
#Group by Race and Reason to get counts
A= round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Eq_Count', A[0])
Race_Grp.set_value('Black','Eq_Count', A[1])
Race_Grp.set_value('Latino','Eq_Count', A[2])
Race_Grp.set_value('Other','Eq_Count', A[3])
Race_Grp.set_value('White','Eq_Count', A[4])
Race_Grp.set_value('Asian','Mov_Count', A[5])
Race_Grp.set_value('Black','Mov_Count', A[6])
Race_Grp.set_value('Latino','Mov_Count', A[7])
Race_Grp.set_value('Other','Mov_Count', A[8])
Race_Grp.set_value('White','Mov_Count', A[9])
#Group by Race and Reason but normalized counts
A= round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)
Race_Grp.set_value('Asian','Eq_Margin', A[0])
Race_Grp.set_value('Black','Eq_Margin', A[1])
Race_Grp.set_value('Latino','Eq_Margin', A[2])
Race_Grp.set_value('Other','Eq_Margin', A[3])
Race_Grp.set_value('White','Eq_Margin', A[4])
Race_Grp.set_value('Asian','Mov_Margin', A[5])
Race_Grp.set_value('Black','Mov_Margin', A[6])
Race_Grp.set_value('Latino','Mov_Margin', A[7])
Race_Grp.set_value('Other','Mov_Margin', A[8])
Race_Grp.set_value('White','Mov_Margin', A[9])
#Group by Race and Citation Counts
A=round(rf.groupby(['Race'])['Citation'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Citation_Count', A[5])
Race_Grp.set_value('Black','Citation_Count', A[6])
Race_Grp.set_value('Latino','Citation_Count', A[7])
Race_Grp.set_value('Other','Citation_Count', A[8])
Race_Grp.set_value('White','Citation_Count', A[9])
#Group by Race, Reason, and Citation Normalized Counts
A= round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_Citation', A[10])
Race_Grp.set_value('Black','Eq_Citation', A[12])
Race_Grp.set_value('Latino','Eq_Citation', A[14])
Race_Grp.set_value('Other','Eq_Citation', A[16])
Race_Grp.set_value('White','Eq_Citation', A[18])
Race_Grp.set_value('Asian','Mov_Citation', A[11])
Race_Grp.set_value('Black','Mov_Citation', A[13])
Race_Grp.set_value('Latino','Mov_Citation', A[15])
Race_Grp.set_value('Other','Mov_Citation', A[17])
Race_Grp.set_value('White','Mov_Citation', A[19])
#Group by Race and Search Counts
A=round(rf.groupby(['Race'])['Driver_search'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Driversearch_Count', A[5])
Race_Grp.set_value('Black','Driversearch_Count', A[6])
Race_Grp.set_value('Latino','Driversearch_Count', A[7])
Race_Grp.set_value('Other','Driversearch_Count', A[8])
Race_Grp.set_value('White','Driversearch_Count', A[9])
#Group by Race and Citation Counts
A= round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_DriverSearch', A[10])
Race_Grp.set_value('Black','Eq_DriverSearch', A[12])
Race_Grp.set_value('Latino','Eq_DriverSearch', A[14])
Race_Grp.set_value('Other','Eq_DriverSearch', A[16])
Race_Grp.set_value('White','Eq_DriverSearch', A[18])
Race_Grp.set_value('Asian','Mov_DriverSearch', A[11])
Race_Grp.set_value('Black','Mov_DriverSearch', A[13])
Race_Grp.set_value('Latino','Mov_DriverSearch', A[15])
Race_Grp.set_value('Other','Mov_DriverSearch', A[17])
Race_Grp.set_value('White','Mov_DriverSearch', A[19])
#Group by Race,Reason, and Gender Normalized Counts
A=round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_Gender_F', A[10])
Race_Grp.set_value('Black','Eq_Gender_F', A[12])
Race_Grp.set_value('Latino','Eq_Gender_F', A[14])
Race_Grp.set_value('Other','Eq_Gender_F', A[16])
Race_Grp.set_value('White','Eq_Gender_F', A[18])
Race_Grp.set_value('Asian','Mov_Gender_F', A[11])
Race_Grp.set_value('Black','Mov_Gender_F', A[13])
Race_Grp.set_value('Latino','Mov_Gender_F', A[15])
Race_Grp.set_value('Other','Mov_Gender_F', A[17])
Race_Grp.set_value('White','Mov_Gender_F', A[19])
#Group by Race,LateNight Counts
A=round(rf.groupby(['Race'])['LateNight'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','LateNight_Count', A[5])
Race_Grp.set_value('Black','LateNight_Count', A[6])
Race_Grp.set_value('Latino','LateNight_Count', A[7])
Race_Grp.set_value('Other','LateNight_Count', A[8])
Race_Grp.set_value('White','LateNight_Count', A[9])
#Group by Race,Reason, and Latenight Normalized Counts
A=round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_LateNight', A[10])
Race_Grp.set_value('Black','Eq_LateNight', A[12])
Race_Grp.set_value('Latino','Eq_LateNight', A[14])
Race_Grp.set_value('Other','Eq_LateNight', A[16])
Race_Grp.set_value('White','Eq_LateNight', A[18])
Race_Grp.set_value('Asian','Mov_LateNight', A[11])
Race_Grp.set_value('Black','Mov_LateNight', A[13])
Race_Grp.set_value('Latino','Mov_LateNight', A[15])
Race_Grp.set_value('Other','Mov_LateNight', A[17])
Race_Grp.set_value('White','Mov_LateNight', A[19])
#Group by Race,Latenight, and Citation Normalized Counts
#1/3 of data is done during latenight activities
A=round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Morn_Citation', A[10])
Race_Grp.set_value('Black','Morn_Citation', A[12])
Race_Grp.set_value('Latino','Morn_Citation', A[14])
Race_Grp.set_value('Other','Morn_Citation', A[16])
Race_Grp.set_value('White','Morn_Citation', A[18])
Race_Grp.set_value('Asian','Late_Citation', A[11])
Race_Grp.set_value('Black','Late_Citation', A[13])
Race_Grp.set_value('Latino','Late_Citation', A[15])
Race_Grp.set_value('Other','Late_Citation', A[17])
Race_Grp.set_value('White','Late_Citation', A[19])
# Construct a table for every neighborhood
# (use .copy() so each variable holds its own DataFrame rather than a reference to the same Race_Grp object)
d_North_End=Race_Grp.copy()
d_West_Side=Race_Grp.copy()
d_Battle_Creek=Race_Grp.copy()
d_Dayton_Bluff=Race_Grp.copy()
d_Payne_Phalen=Race_Grp.copy()
d_Capital_River=Race_Grp.copy()
d_Union_Park=Race_Grp.copy()
d_Summit_University=Race_Grp.copy()
d_Greater_East_Side=Race_Grp.copy()
d_Thomas_Frogtown=Race_Grp.copy()
d_St_Anthony=Race_Grp.copy()
d_Summit_Hill=Race_Grp.copy()
d_Midway=Race_Grp.copy()
d_West_7th=Race_Grp.copy()
d_Como=Race_Grp.copy()
d_Highland_Park=Race_Grp.copy()
d_Macalester_Groveland=Race_Grp.copy()
A= df.Community.unique()
Dict= {A[0]:d_North_End, A[1]:d_West_Side, A[2]:d_Battle_Creek, A[3]:d_Dayton_Bluff, A[4]:d_Payne_Phalen, A[5]:d_Capital_River,
A[6]:d_Union_Park, A[7]:d_Summit_University, A[8]:d_Greater_East_Side, A[9]:d_Thomas_Frogtown, A[10]:d_St_Anthony,
A[11]:d_Summit_Hill, A[12]:d_Midway, A[13]:d_West_7th, A[14]:d_Como, A[15]:d_Highland_Park, A[16]:d_Macalester_Groveland}
RR= ['Asian','Black','Latino','Other','White']
for k,p in enumerate(Dict):
rf= df[df['Community']==p]
for i,j in enumerate(RR):
Dict[p].set_value(j,'Tot_Count', round(rf['Race'].value_counts().sort_index(level=1),4)[i])
Dict[p].set_value(j,'Eq_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i])
Dict[p].set_value(j,'Mov_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i+len(RR)])
#print(Dict['Capital_River'])
print(Dict['West_Side'])
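A note on the two cells above: DataFrame.set_value has been deprecated (and removed in newer pandas), and the positional lookups (A[0], A[1], ...) only work because of the sort_index calls. A hedged sketch of the same idea with .at and label-based lookups, assuming the Reason labels are 'Equipment Violation' and 'Moving Violation' (check rf['Reason'].unique() first), would look like:
# Modern-pandas version of the per-race counts (same idea, label-based instead of positional)
reason_counts = rf.groupby('Race')['Reason'].value_counts()
for race in ['Asian', 'Black', 'Latino', 'Other', 'White']:
    Race_Grp.at[race, 'Tot_Count'] = rf['Race'].value_counts().get(race, 0)
    Race_Grp.at[race, 'Eq_Count'] = reason_counts.get((race, 'Equipment Violation'), 0)
    Race_Grp.at[race, 'Mov_Count'] = reason_counts.get((race, 'Moving Violation'), 0)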
#DR[1]
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
grid_latitude,
grid_longitude,
radius,
LIMIT)
url # display URL
#Get respective json
results = requests.get(url).json()
# Find venues in one location
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
#nearby_venues
Frogtown_venues.head()
#Frogtown_venues.groupby('Neighborhood').count()
Frogtown_venues.Venue_Category.unique()
#Specify the Venues of interest to store and display
x=['Dive Bar', 'Convenience Store', 'Liquor Store', 'Restaurant', 'Middle Eastern Restaurant','Vietnamese Restaurant',\
'Asian Restaurant', 'Thai Restaurant','Café','Fast Food Restaurant','Noodle House','Ethiopian Restaurant','Bar',\
'Ramen Restaurant', 'BBQ Joint', 'Grocery Store', 'Chinese Restaurant']
#Frogtown_venues[Frogtown_venues['Venue_Category'].isin(x)]