Disclaimer: This is my Coursera Capstone Project for Data Science. Also, I will be using the term "Black" rather than "African American" because that is the race indicator provided in the dataset, and the category also includes persons of African origin (a substantial minority).
I'm a Frogtown resident, community advocate, and programmer. I would like to use open-source data to share stories and create action. Please follow me on GitHub.
Currently in the United States, there is a lot of tension between law enforcement and the public. I will be looking into the Traffic Stop Data for Saint Paul, Minnesota (USA) provided by the Saint Paul Police Department (SPPD). Analyzing the traffic stop data can provide evidence (or lack thereof) of systemic biases. The goal of my report is to add insight into what is happening in my community, the Thomas-Dale neighborhood aka Frogtown, as well as to encourage citizens to use open-source data and/or demand that their public agencies provide such data.
Certain parts of Frogtown have a greater frequency of traffic stops than the rest of the neighborhood, particularly along University Avenue. The data suggests targeting of Black drivers, given that they are stopped more frequently, searched more frequently per stop, and less likely to receive a citation. Other insights include that moving-violation stops occur mostly in the morning and have a higher citation rate. In contrast, during the late-night hours, there are more equipment-violation stops. Furthermore, many communities, including Frogtown, have considerable numbers of equipment-violation stops.
The results may be obvious, but proving them may be more challenging. As a researcher, my goal is to measure the impact, seek the truth, explore, and challenge my expectations. Data can be a great equalizer, challenging our worldviews and/or reinforcing our existing perspectives. Data reports can be used to share stories and information effectively. Furthermore, data is used as an evaluation tool to determine the effectiveness of programs and policies.
Thus data practitioners, and more broadly institutions, hold strong responsibility and influence in shaping data to support a certain narrative. In our current political climate, public perception of law enforcement is polarized, and I hope these studies can shed light on the issues. This report and others will be available via open source, allowing others to contribute, replicate, and reuse the code for their own respective neighborhoods.
The City of Saint Paul is the second-largest city in Minnesota, USA, and the state capital. Saint Paul is often paired with its nearby city, Minneapolis; together they are aptly named the Twin Cities. It has roughly 300,000 people, and the city itself is quite diverse. Minnesota has a high level of racial inequity, ranking 47th of 51 compared to the rest of the United States. Saint Paul is broken down into seventeen Planning Districts, created in 1979 to allow neighborhoods to participate in governance and use Community Development Block Grants. The Thomas/Dale neighborhood is one of the district planning councils. A few years ago, the tragic fatal police shooting of an African-American man, Philando Castile, occurred during a traffic stop in Falcon Heights, a suburb of Saint Paul, which has increased tension between law enforcement and the community.
The Frogtown community has historically been a transitional community, with new immigrant/refugee communities living in the neighborhood for short periods of time. From my experience, Frogtown boasts considerable diversity with respect to language, culture, and ethnicity. It has also historically been poor. Here is a snapshot of the community exported from Minnesota Compass, based on 2017 Census demographic data.
The image below displays the Frogtown community using the police grid (which matches the actual boundaries well). On the southern boundary of Frogtown is University Avenue, where the light rail runs along the boundary; it is a heavily residential street as well. I will emphasize more noticeable landmarks once the Foursquare data is plugged in. The two rightmost sectors are, respectively, Mt. Airy and Capitol Heights. These two communities are considered distinct by community members.
The dataset contains traffic stop records collected by the SPPD from 2000 to 2018 via an agreement with the Saint Paul chapter of the NAACP and can be accessed here. The website has a lot of features and visualizations for basic analysis, but for advanced users data transformations are limited or unavailable. I have chosen the years 2017 to 2018 based on data limitations.
Data Features:
Data limitations as explained on the website:
Reason for stop (available starting in 2017)
Data reflects traffic stops originating by St. Paul Police Officers
Race is based on officers’ perceptions
Fields indicating “No Data” may be due to a variety of factors, including:
Supplemental Info Suggested by Author
The dataset consists of one record per driver stopped, but the location coordinates are limited to the police grid. There are over 90 police grids!
I'll be using the Foursquare API to get information on local businesses. Some streets and areas might be more active than others.
Minnesota Compass, a non-partisan group, offers raw data for both Minneapolis and Saint Paul districts.
The primary data will range from 2017-18. The longitudinal analysis will have data from 2001 to 2018.
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
warnings.filterwarnings('ignore')
import plotly
from IPython.display import HTML
from IPython.display import display
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
from sodapy import Socrata
#New Upload Method Get Information from Socrata API
client = Socrata("information.stpaul.gov", None)
#Data Load
#df = pd.read_csv('Data/Traffic_Stop_Dataset.csv')
#More familiar column names
#cols= ['Year','Date','Race','Gender','Driver_search','Vehicle_search','Citation','Age','Reason','Grid','GridLocation', 'Count']
#df.columns= cols
results = client.get("kkd6-vvns", limit=1000000)
results_df = pd.DataFrame.from_records(results)
#rename columns
cols= ['Gunk', 'Gunk2', 'Gunk3','Age','Citation', 'Count', 'Date','Driver_search','Gender','GridLocation', 'Grid','Race','Reason','Vehicle_search','Year']
results_df.columns= cols
results_df =results_df.iloc[:,3:]
#change datatypes
results_df = results_df.astype({"Year": int, "Age": float, "Grid":float, "Count":int})
#We will be choosing 2017 onward because the reason for the stop is only recorded starting in 2017
df= results_df.query('Year in [2018,2017,2019,2020,2021]')
df.shape
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
(106978, 12)
df
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | Vehicle_search | Year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | No | 1 | 2018-05-11T20:47:00.000 | No | Male | NaN | 991.0 | Black | Moving Violation | No | 2018 |
1 | NaN | No | 1 | 2019-03-03T00:15:00.000 | No | Male | NaN | 991.0 | Black | Equipment Violation | No | 2019 |
2 | 28.0 | Yes | 1 | 2019-02-13T13:21:00.000 | No | Male | NaN | 991.0 | Black | Moving Violation | No | 2019 |
3 | NaN | No | 1 | 2019-07-27T21:23:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
4 | NaN | No | 1 | 2018-01-20T21:19:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
5 | NaN | No | 1 | 2018-12-27T14:02:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
6 | 24.0 | Yes | 1 | 2018-03-13T13:51:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
7 | NaN | Yes | 1 | 2019-03-02T22:12:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
8 | 25.0 | Yes | 1 | 2019-06-07T22:08:00.000 | No | Female | NaN | 991.0 | Other | Moving Violation | No | 2019 |
9 | NaN | No | 1 | 2019-01-19T22:08:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
10 | NaN | No | 1 | 2019-06-07T20:18:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2019 |
11 | 22.0 | Yes | 1 | 2018-05-01T14:11:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
12 | 21.0 | Yes | 1 | 2019-09-27T19:13:00.000 | No | Female | NaN | 991.0 | Black | Moving Violation | No | 2019 |
13 | NaN | No | 1 | 2019-07-20T21:10:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2019 |
14 | NaN | No | 1 | 2019-05-09T16:56:00.000 | No | Female | NaN | 999.0 | White | Moving Violation | No | 2019 |
15 | NaN | No | 1 | 2018-11-21T18:56:00.000 | No | Male | NaN | 991.0 | Asian | Moving Violation | No | 2018 |
16 | NaN | No | 1 | 2018-08-01T14:14:00.000 | No | Male | NaN | 993.0 | White | Moving Violation | No | 2018 |
17 | NaN | No | 1 | 2018-10-27T20:43:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2018 |
18 | NaN | No | 1 | 2018-02-18T00:36:00.000 | No | Male | NaN | 996.0 | Black | Moving Violation | No | 2018 |
19 | NaN | No | 1 | 2018-06-23T00:46:00.000 | No | Male | NaN | 993.0 | Other | Moving Violation | No | 2018 |
20 | NaN | No | 1 | 2018-11-21T20:06:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
21 | 54.0 | Yes | 1 | 2018-04-09T14:20:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
22 | NaN | No | 1 | 2018-10-05T20:49:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
23 | NaN | No | 1 | 2018-10-05T20:35:00.000 | No | Female | NaN | 991.0 | White | Moving Violation | No | 2018 |
24 | NaN | No | 1 | 2018-10-27T21:29:00.000 | No | Male | NaN | 991.0 | Latino | Moving Violation | No | 2018 |
25 | NaN | Yes | 1 | 2018-06-30T22:35:00.000 | No | Male | NaN | 994.0 | White | Moving Violation | No | 2018 |
26 | NaN | No | 1 | 2018-01-13T00:51:00.000 | No | Male | NaN | 991.0 | White | Equipment Violation | No | 2018 |
27 | NaN | No | 1 | 2018-05-11T21:56:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
28 | NaN | No | 1 | 2018-02-17T21:37:00.000 | No | Female | NaN | 991.0 | Black | Equipment Violation | No | 2018 |
29 | 23.0 | Yes | 1 | 2018-04-12T11:34:00.000 | No | Male | NaN | 991.0 | White | Moving Violation | No | 2018 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
784687 | NaN | No | 1 | 2017-08-30T17:45:00.000 | No | Female | {'latitude': '44.959171113', 'longitude': '-93... | 94.0 | Other | Moving Violation | No | 2017 |
784688 | NaN | No | 1 | 2017-11-02T22:46:00.000 | No | Male | {'latitude': '44.959369203', 'longitude': '-93... | 87.0 | Black | Moving Violation | No | 2017 |
784689 | NaN | No | 1 | 2017-10-25T03:00:00.000 | No | Female | {'latitude': '44.96658643', 'longitude': '-93.... | 68.0 | Black | Moving Violation | No | 2017 |
784690 | NaN | No | 1 | 2017-12-07T15:30:00.000 | No | Male | {'latitude': '44.988221181', 'longitude': '-93... | 11.0 | White | Moving Violation | No | 2017 |
784691 | 34.0 | Yes | 1 | 2017-09-07T08:30:00.000 | No | Female | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | No | 2017 |
784692 | 30.0 | Yes | 1 | 2017-09-29T18:33:00.000 | No | Male | {'latitude': '44.959515062', 'longitude': '-93... | 91.0 | White | Moving Violation | No | 2017 |
784693 | NaN | Yes | 1 | 2017-09-13T10:42:00.000 | No | Female | {'latitude': '44.973827765', 'longitude': '-93... | 51.0 | Black | Moving Violation | No | 2017 |
784694 | NaN | Yes | 1 | 2017-07-27T08:06:00.000 | No | Male | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Black | Moving Violation | No | 2017 |
784695 | NaN | No | 1 | 2017-09-20T22:34:00.000 | No | Female | {'latitude': '44.988314242', 'longitude': '-93... | 14.0 | Black | Moving Violation | No | 2017 |
784696 | 37.0 | Yes | 1 | 2017-09-28T09:39:00.000 | No | Male | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Moving Violation | No | 2017 |
784697 | NaN | No | 1 | 2017-07-25T13:50:00.000 | No | Male | {'latitude': '44.954914788', 'longitude': '-93... | 112.0 | Black | Moving Violation | No | 2017 |
784698 | NaN | Yes | 1 | 2017-12-06T10:20:00.000 | No | Female | {'latitude': '44.980999933', 'longitude': '-93... | 31.0 | Black | Moving Violation | No | 2017 |
784699 | NaN | Yes | 1 | 2017-10-05T10:03:00.000 | No | Female | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Latino | Moving Violation | No | 2017 |
784700 | NaN | Yes | 1 | 2017-12-01T11:21:00.000 | No | Male | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | No | 2017 |
784701 | NaN | No | 1 | 2017-10-11T15:12:00.000 | No | Female | {'latitude': '44.966688357', 'longitude': '-93... | 75.0 | Black | Moving Violation | No | 2017 |
784702 | NaN | No | 1 | 2017-10-03T17:40:00.000 | Yes | Male | {'latitude': '44.959923808', 'longitude': '-93... | 99.0 | White | Moving Violation | Yes | 2017 |
784703 | NaN | No | 1 | 2017-09-29T17:45:00.000 | No | Female | {'latitude': '44.95086001', 'longitude': '-93.... | 132.0 | White | Moving Violation | No | 2017 |
784704 | 27.0 | Yes | 1 | 2017-08-17T21:35:00.000 | No | Male | {'latitude': '44.953679624', 'longitude': '-93... | 111.0 | Other | Equipment Violation | No | 2017 |
784705 | NaN | No | 1 | 2017-07-29T01:17:00.000 | No | Male | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Equipment Violation | No | 2017 |
784706 | 28.0 | Yes | 1 | 2017-09-25T08:37:00.000 | No | Female | {'latitude': '44.965443164', 'longitude': '-93... | 269.0 | Black | Moving Violation | No | 2017 |
784707 | NaN | No | 1 | 2017-11-09T10:51:00.000 | No | Female | {'latitude': '44.952068692', 'longitude': '-93... | 105.0 | White | Moving Violation | No | 2017 |
784708 | NaN | No | 1 | 2017-12-04T00:15:00.000 | Yes | Male | {'latitude': '44.966741984', 'longitude': '-93... | 73.0 | Black | Moving Violation | Yes | 2017 |
784709 | NaN | No | 1 | 2017-08-31T23:28:00.000 | No | Female | {'latitude': '44.973798199', 'longitude': '-93... | 50.0 | White | Equipment Violation | No | 2017 |
784710 | NaN | Yes | 1 | 2017-09-07T10:09:00.000 | No | Female | {'latitude': '44.988189836', 'longitude': '-93... | 9.0 | White | Moving Violation | No | 2017 |
784711 | 25.0 | Yes | 1 | 2017-10-24T08:45:00.000 | No | Male | {'latitude': '44.952726847', 'longitude': '-93... | 119.0 | Black | Moving Violation | No | 2017 |
784712 | 26.0 | Yes | 1 | 2017-08-22T07:39:00.000 | No | Male | {'latitude': '44.952131905', 'longitude': '-93... | 107.0 | Other | Moving Violation | No | 2017 |
784713 | 82.0 | Yes | 1 | 2017-11-02T12:01:00.000 | No | Female | {'latitude': '44.916259517', 'longitude': '-93... | 206.0 | White | Moving Violation | No | 2017 |
784714 | NaN | Yes | 1 | 2017-10-31T09:05:00.000 | No | Female | {'latitude': '44.908307731', 'longitude': '-93... | 223.0 | Latino | Moving Violation | No | 2017 |
784715 | NaN | No | 1 | 2017-09-14T22:42:00.000 | No | Male | NaN | 24.0 | White | Equipment Violation | No | 2017 |
784716 | NaN | No | 1 | 2017-11-22T11:10:00.000 | No | Female | {'latitude': '44.948648967', 'longitude': '-93... | 131.0 | White | Moving Violation | No | 2017 |
106978 rows × 12 columns
First, I will be creating some functions that will make data transformations easier, such as an age bin, a neighborhood designation, and a policing district designation.
#Community function
def commun(x):
if x in [67,68,87,88,89,90,91,92]:
return 'Thomas_Frogtown'
elif x in [5,6,7,8,25,26,27,28,45,46,47,48]:
return 'Como'
elif x in [107, 108, 109, 110,127,128,129,130]:
return 'Summit_University'
elif x in [101,102,103,104,105,106,122,123,124,125,126]:
return 'Union_Park'
elif x in [63,64,65,66,83,84,85,86]:
return 'Midway'
elif x in [142,143,144,145,146,162,163,164,165,166]:
return 'Macalester_Groveland'
elif x in [182,183,184,185,186,202,203,204,205,206,223,224,225,242,243,244,245,246]:
return 'Highland_Park'
elif x in [147,148,149,167,168]:
return 'Summit_Hill'
elif x in [1,2,21,22,43,44,61,62,81,82]:
return 'St_Anthony'
elif x in [226,207,187,188,189,267,268,169,170,171,249,150,151,230,367]:
return 'West_7th'
elif x in [209,210,211,212,213,214,215,192,193,194,195,172,173,174,175]:
return 'West_Side'
elif x in [111,112,131,132,133,152,153]:
return 'Capital_River'
elif x in [98,99,100,118,119,119,120,137,138,139,140,160,197,180,200,240,280]:
return 'Battle_Creek'
elif x in [76,95,96,97,115,116,117,138,114,136]:
return 'Dayton_Bluff'
elif x in [9,10,11,12,29,30,31,32,49,50,51,52,269,69,70,71,72]:
return 'North_End'
elif x in [13,14,15,16,33,34,35,36,53,54,55,56,73,74,75,93,94]:
return 'Payne_Phalen'
elif x in [17,18,19,20,37,38,39,40,56,57,58,59,60,77,78,79,80]:
return 'Greater_East_Side'
else:
return 'NaN'
#District Function
def district(x):
if x in [1,2,3,4,5,6,7,8,21,22,23,24,25,26,27,28,43,44,45,46,47,48,\
61,62,63,64,65,66,67,68,269,81,82,83,84,85,86,87,88,89,\
101,102,103,104,105,106,107,108,109,110,122,123,124,125,126,\
127,128,129,130,142,143,144,145,146,147,148,149,149,\
162,163,164,165,166,167,168,182,183,184,185,186,\
202,203,204,205,206,223,224,225,242,243,244,245,246]:
return 'Western'
elif x in [9,10,11,12,29,30,32,31,49,50,51,52,69,70,71,72,\
90,91,92,111,112,131,132,133,267,268,249,130,230,\
150,151,152,153,169,170,171,172,173,174,175,207,209,226,\
187,188,189,192,193,194,195,210,211,212,213,214,215]:
return 'Central'
elif x in [13,14,15,16,17,18,19,20,33,34,35,36,37,38,39,40,\
53,54,55,56,57,58,59,60,73,74,75,76,77,78,79,80,\
93,94,95,96,97,98,99,100,114,115,116,117,118,119,120,\
136,137,138,139,140,160,197,180,200,240,280]:
return 'Eastern'
else:
return 'NaN'
# Define Age Function bins
def agef(x):
if x<19:
return 'Teen <19'
elif x>18 and x<25:
return 'Young Adult 19-24'
elif x>24 and x<31:
return 'Adult 25-30'
elif x>30 and x<46:
return 'Middle Adult 31-45'
elif x>45:
return 'Older Adult 46+'
else:
return 'NaN'
I will be constructing several variables. Originally, I wanted to extract the GridLocation coordinates from the dataset, but it makes more sense to connect the grid to a json file. The manipulations and additions are listed below:
Initial Omissions
df.Reason.value_counts()
Moving Violation               77989
Equipment Violation            24475
Investigative Stop              4191
911 Call / Citizen Reported      183
No Data                          140
Name: Reason, dtype: int64
df.Year.value_counts()
2017    31290
2018    29685
2019    22843
2020    18164
Name: Year, dtype: int64
#Prepping the Primary Dataset
#Add District Plannning Council and District columns from Functions
df['Community']= df['Grid'].apply(commun)
df['District']= df['Grid'].apply(district)
df['AgeDemo']= df['Age'].apply(agef)
#Add Time Variables
df['Date']= pd.to_datetime(df['Date'])
df['DayofWeek']=df['Date'].dt.dayofweek
df['Weekend'] = df['DayofWeek'].apply(lambda x: 1 if (x>4) else 0)
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Hour'] = df['Date'].dt.hour
df['LateNight'] = df['Hour'].apply(lambda x: 1 if (x>21 or x<5) else 0)
#Screening
df= df[df.Reason != '911 Call / Citizen Reported']
df= df[df.Reason != 'No Data']
df= df.loc[df['Reason'] != 'Investigative Stop']
df= df.loc[df['Race'] != 'Native American']
#Replace Yes/No variables with 0/1 dummies (run once on a freshly loaded df; re-running after the columns are already numeric will raise a type-comparison error)
df['Driver_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Vehicle_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Citation'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
df['Gender'].replace(to_replace=['Male','Female'], value=[0,1],inplace=True) #FEMALE is 1
df= pd.concat([df,pd.get_dummies(df['Reason'])], axis=1)
df= pd.concat([df,pd.get_dummies(df['Race'])], axis=1)
#Get Latitude and Longitude
## I couldn't figure out how to utilize the dictionary, so I just converted it to a string
# Separate Latitude and Longitude
df['GridLocation'] = df['GridLocation'].astype('str')
#Get Latitude
new=df['GridLocation'].str.split("',", n = 1, expand = True)
# take the latitude piece from the split string
lat=new[0].str.split(" '", n = 1, expand = True)
df['Latitude']= pd.to_numeric(lat[1])
#Get Longitude
long= new[1].str.split(": '", n = 1, expand = True)
#long[1]
df['Longitude']= pd.to_numeric(long[1].str.rstrip("'}"))
#Use if needed to change variables to integers
#df[['Dr_search', 'V_search', 'Citation']] = df[['Dr_search', 'V_search', 'Gender','Citation']].astype(int)
# Remove any missing community data entries
df= df[df.Community != 'NaN']
df.head()
df.head()
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | ... | Hour | LateNight | Equipment Violation | Moving Violation | Asian | Black | Latino | Other | White | Latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | 0 | 1 | 2018-05-11 20:47:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
1 | NaN | 0 | 1 | 2019-03-03 00:15:00 | 0 | 0 | nan | 991.0 | Black | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
2 | 28.0 | 1 | 1 | 2019-02-13 13:21:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
3 | NaN | 0 | 1 | 2019-07-27 21:23:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
4 | NaN | 0 | 1 | 2018-01-20 21:19:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
5 rows × 29 columns
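Since Socrata actually returns GridLocation as a dictionary (as the raw rows above show), the coordinates can also be pulled out without string splitting. Below is a minimal sketch of that alternative; it assumes the column still holds dicts, i.e., it would have to run before the astype('str') conversion in the prep cell, and extract_coord is a hypothetical helper, not part of the original notebook.
#Alternative to the string splitting: read coordinates straight from the GridLocation dicts
#(np and pd come from the imports at the top of the notebook)
def extract_coord(loc, key):
    # Return the requested coordinate as a float, or NaN when the location is missing
    if isinstance(loc, dict) and loc.get(key) is not None:
        return float(loc[key])
    return np.nan
#Example usage, before GridLocation is cast to str:
#df['Latitude'] = df['GridLocation'].apply(lambda loc: extract_coord(loc, 'latitude'))
#df['Longitude'] = df['GridLocation'].apply(lambda loc: extract_coord(loc, 'longitude'))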
a=df.query("Year==2020")
a.Race.value_counts()
df20x=df
df20x
Age | Citation | Count | Date | Driver_search | Gender | GridLocation | Grid | Race | Reason | ... | Hour | LateNight | Equipment Violation | Moving Violation | Asian | Black | Latino | Other | White | Latitude | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | 0 | 1 | 2018-05-11 20:47:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
1 | NaN | 0 | 1 | 2019-03-03 00:15:00 | 0 | 0 | nan | 991.0 | Black | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
2 | 28.0 | 1 | 1 | 2019-02-13 13:21:00 | 0 | 0 | nan | 991.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
3 | NaN | 0 | 1 | 2019-07-27 21:23:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
4 | NaN | 0 | 1 | 2018-01-20 21:19:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
5 | NaN | 0 | 1 | 2018-12-27 14:02:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
6 | 24.0 | 1 | 1 | 2018-03-13 13:51:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
7 | NaN | 1 | 1 | 2019-03-02 22:12:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
8 | 25.0 | 1 | 1 | 2019-06-07 22:08:00 | 0 | 1 | nan | 991.0 | Other | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | NaN |
9 | NaN | 0 | 1 | 2019-01-19 22:08:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
10 | NaN | 0 | 1 | 2019-06-07 20:18:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
11 | 22.0 | 1 | 1 | 2018-05-01 14:11:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
12 | 21.0 | 1 | 1 | 2019-09-27 19:13:00 | 0 | 1 | nan | 991.0 | Black | Moving Violation | ... | 19 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
13 | NaN | 0 | 1 | 2019-07-20 21:10:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
14 | NaN | 0 | 1 | 2019-05-09 16:56:00 | 0 | 1 | nan | 999.0 | White | Moving Violation | ... | 16 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
15 | NaN | 0 | 1 | 2018-11-21 18:56:00 | 0 | 0 | nan | 991.0 | Asian | Moving Violation | ... | 18 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | NaN |
16 | NaN | 0 | 1 | 2018-08-01 14:14:00 | 0 | 0 | nan | 993.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
17 | NaN | 0 | 1 | 2018-10-27 20:43:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 20 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
18 | NaN | 0 | 1 | 2018-02-18 00:36:00 | 0 | 0 | nan | 996.0 | Black | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | NaN |
19 | NaN | 0 | 1 | 2018-06-23 00:46:00 | 0 | 0 | nan | 993.0 | Other | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | NaN |
20 | NaN | 0 | 1 | 2018-11-21 20:06:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
21 | 54.0 | 1 | 1 | 2018-04-09 14:20:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 14 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
22 | NaN | 0 | 1 | 2018-10-05 20:49:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
23 | NaN | 0 | 1 | 2018-10-05 20:35:00 | 0 | 1 | nan | 991.0 | White | Moving Violation | ... | 20 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
24 | NaN | 0 | 1 | 2018-10-27 21:29:00 | 0 | 0 | nan | 991.0 | Latino | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | NaN |
25 | NaN | 1 | 1 | 2018-06-30 22:35:00 | 0 | 0 | nan | 994.0 | White | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
26 | NaN | 0 | 1 | 2018-01-13 00:51:00 | 0 | 0 | nan | 991.0 | White | Equipment Violation | ... | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
27 | NaN | 0 | 1 | 2018-05-11 21:56:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 21 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
28 | NaN | 0 | 1 | 2018-02-17 21:37:00 | 0 | 1 | nan | 991.0 | Black | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | NaN |
29 | 23.0 | 1 | 1 | 2018-04-12 11:34:00 | 0 | 0 | nan | 991.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
784687 | NaN | 0 | 1 | 2017-08-30 17:45:00 | 0 | 1 | {'latitude': '44.959171113', 'longitude': '-93... | 94.0 | Other | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 44.959171 |
784688 | NaN | 0 | 1 | 2017-11-02 22:46:00 | 0 | 0 | {'latitude': '44.959369203', 'longitude': '-93... | 87.0 | Black | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.959369 |
784689 | NaN | 0 | 1 | 2017-10-25 03:00:00 | 0 | 1 | {'latitude': '44.96658643', 'longitude': '-93.... | 68.0 | Black | Moving Violation | ... | 3 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966586 |
784690 | NaN | 0 | 1 | 2017-12-07 15:30:00 | 0 | 0 | {'latitude': '44.988221181', 'longitude': '-93... | 11.0 | White | Moving Violation | ... | 15 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.988221 |
784691 | 34.0 | 1 | 1 | 2017-09-07 08:30:00 | 0 | 1 | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.928875 |
784692 | 30.0 | 1 | 1 | 2017-09-29 18:33:00 | 0 | 0 | {'latitude': '44.959515062', 'longitude': '-93... | 91.0 | White | Moving Violation | ... | 18 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.959515 |
784693 | NaN | 1 | 1 | 2017-09-13 10:42:00 | 0 | 1 | {'latitude': '44.973827765', 'longitude': '-93... | 51.0 | Black | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.973828 |
784694 | NaN | 1 | 1 | 2017-07-27 08:06:00 | 0 | 0 | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.973125 |
784695 | NaN | 0 | 1 | 2017-09-20 22:34:00 | 0 | 1 | {'latitude': '44.988314242', 'longitude': '-93... | 14.0 | Black | Moving Violation | ... | 22 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.988314 |
784696 | 37.0 | 1 | 1 | 2017-09-28 09:39:00 | 0 | 0 | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Moving Violation | ... | 9 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.980704 |
784697 | NaN | 0 | 1 | 2017-07-25 13:50:00 | 0 | 0 | {'latitude': '44.954914788', 'longitude': '-93... | 112.0 | Black | Moving Violation | ... | 13 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.954915 |
784698 | NaN | 1 | 1 | 2017-12-06 10:20:00 | 0 | 1 | {'latitude': '44.980999933', 'longitude': '-93... | 31.0 | Black | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.981000 |
784699 | NaN | 1 | 1 | 2017-10-05 10:03:00 | 0 | 1 | {'latitude': '44.973124945', 'longitude': '-93... | 47.0 | Latino | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 44.973125 |
784700 | NaN | 1 | 1 | 2017-12-01 11:21:00 | 0 | 0 | {'latitude': '44.928875372', 'longitude': '-93... | 267.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.928875 |
784701 | NaN | 0 | 1 | 2017-10-11 15:12:00 | 0 | 1 | {'latitude': '44.966688357', 'longitude': '-93... | 75.0 | Black | Moving Violation | ... | 15 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966688 |
784702 | NaN | 0 | 1 | 2017-10-03 17:40:00 | 1 | 0 | {'latitude': '44.959923808', 'longitude': '-93... | 99.0 | White | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.959924 |
784703 | NaN | 0 | 1 | 2017-09-29 17:45:00 | 0 | 1 | {'latitude': '44.95086001', 'longitude': '-93.... | 132.0 | White | Moving Violation | ... | 17 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.950860 |
784704 | 27.0 | 1 | 1 | 2017-08-17 21:35:00 | 0 | 0 | {'latitude': '44.953679624', 'longitude': '-93... | 111.0 | Other | Equipment Violation | ... | 21 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 44.953680 |
784705 | NaN | 0 | 1 | 2017-07-29 01:17:00 | 0 | 0 | {'latitude': '44.980704001', 'longitude': '-93... | 32.0 | Black | Equipment Violation | ... | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 44.980704 |
784706 | 28.0 | 1 | 1 | 2017-09-25 08:37:00 | 0 | 1 | {'latitude': '44.965443164', 'longitude': '-93... | 269.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.965443 |
784707 | NaN | 0 | 1 | 2017-11-09 10:51:00 | 0 | 1 | {'latitude': '44.952068692', 'longitude': '-93... | 105.0 | White | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.952069 |
784708 | NaN | 0 | 1 | 2017-12-04 00:15:00 | 1 | 0 | {'latitude': '44.966741984', 'longitude': '-93... | 73.0 | Black | Moving Violation | ... | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.966742 |
784709 | NaN | 0 | 1 | 2017-08-31 23:28:00 | 0 | 1 | {'latitude': '44.973798199', 'longitude': '-93... | 50.0 | White | Equipment Violation | ... | 23 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 44.973798 |
784710 | NaN | 1 | 1 | 2017-09-07 10:09:00 | 0 | 1 | {'latitude': '44.988189836', 'longitude': '-93... | 9.0 | White | Moving Violation | ... | 10 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.988190 |
784711 | 25.0 | 1 | 1 | 2017-10-24 08:45:00 | 0 | 0 | {'latitude': '44.952726847', 'longitude': '-93... | 119.0 | Black | Moving Violation | ... | 8 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 44.952727 |
784712 | 26.0 | 1 | 1 | 2017-08-22 07:39:00 | 0 | 0 | {'latitude': '44.952131905', 'longitude': '-93... | 107.0 | Other | Moving Violation | ... | 7 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 44.952132 |
784713 | 82.0 | 1 | 1 | 2017-11-02 12:01:00 | 0 | 1 | {'latitude': '44.916259517', 'longitude': '-93... | 206.0 | White | Moving Violation | ... | 12 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.916260 |
784714 | NaN | 1 | 1 | 2017-10-31 09:05:00 | 0 | 1 | {'latitude': '44.908307731', 'longitude': '-93... | 223.0 | Latino | Moving Violation | ... | 9 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 44.908308 |
784715 | NaN | 0 | 1 | 2017-09-14 22:42:00 | 0 | 0 | nan | 24.0 | White | Equipment Violation | ... | 22 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | NaN |
784716 | NaN | 0 | 1 | 2017-11-22 11:10:00 | 0 | 1 | {'latitude': '44.948648967', 'longitude': '-93... | 131.0 | White | Moving Violation | ... | 11 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 44.948649 |
101982 rows × 29 columns
I'll be exploring the dataset from multiple angles: longitudinal, geo-spatial, and an in-depth analysis of 2017-18. I will be focusing primarily on racial discrimination. I will mostly be using data visualization; a predictive model would be inappropriate as the data is mostly binary.
There are several methods/considerations/limitations in testing for racial biases:
The primary analysis will focus on the second layer. I'll create a master table that collects the groupby values, conditioned on race.
My analysis will be focusing primarily on treatment:
'Eq' stands for Equipment Violation, and 'Mov' indicates Moving Violation.
Most values are normalized to [0, 1] and conditioned on racial identity; examples are provided below.
How to read the results:
Eq_Margin for the Asian group indicates the percentage of 'Equipment Violation' stops with respect to all stops of Asian drivers. So a value of 0.24 would indicate that 24% of stops of Asian drivers were for equipment violations.
Eq_Citation of 0.4 for Asian drivers indicates that 40% of Asian drivers stopped for an equipment violation received a citation.
Mov_DriverSearch of 0.15 for Asians indicates that 15% of Asian drivers stopped for a moving violation were searched.
Mov_Gender_F of 0.55 for Asians indicates that 55% of Asian drivers stopped for moving violations were women.
Eq_LateNight of 0.25 for Asians indicates that 25% of Asian drivers stopped for equipment violations were stopped late at night.
Morn_Citation of 0.2 for Asians indicates that 20% of Asian drivers stopped during daytime (non-late-night) hours received a citation.
Are there any trends throughout the years?
Does the number of stores and type of stores influence the neighborhood? We'll see this visually.
How does Frogtown compare to its neighbors?
#Empty data table setup (placeholder values; every cell is overwritten below)
data = [['Asian',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Black', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Latino', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Other', 15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['White',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5],\
['Total/Average',15,15,15,0.5,0.5,10,0.5,0.5,10,0.5,0.5,0.5,0.5,10,0.5,0.5,0.5,0.5]]
Race_Grp = pd.DataFrame(data, columns= ['Race','Tot_Count','Eq_Count','Mov_Count','Eq_Margin','Mov_Margin','Citation_Count','Eq_Citation',\
'Mov_Citation','Driversearch_Count','Eq_DriverSearch','Mov_DriverSearch',\
'Eq_Gender_F','Mov_Gender_F','LateNight_Count','Eq_LateNight','Mov_LateNight',\
'Morn_Citation','Late_Citation'])
Race_Grp.set_index('Race', inplace=True)
#Specify the subsample to analyze (the Frogtown-only query is commented out below)
#rf=df.query("Community=='Thomas_Frogtown'")
rf=df.query("Year==2020")
Note: The code is sensitive to the ordering of columns
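For readers on newer pandas, note that set_value (used in the loop below) was deprecated and later removed; DataFrame.at is the replacement. Below is a sketch of a label-based alternative that reproduces a few of the columns above without relying on positional offsets; it assumes Citation, Driver_search, and LateNight are already 0/1, as set up in the prep cell.
#Label-based sketch of a few of the same conditional rates (order-insensitive)
eq = rf[rf['Reason'] == 'Equipment Violation']
mov = rf[rf['Reason'] == 'Moving Violation']
alt = pd.DataFrame({
    'Tot_Count': rf.groupby('Race')['Count'].sum(),
    'Eq_Margin': rf.groupby('Race')['Reason'].value_counts(normalize=True).unstack()['Equipment Violation'],
    'Eq_Citation': eq.groupby('Race')['Citation'].mean(),
    'Mov_Citation': mov.groupby('Race')['Citation'].mean(),
    'Mov_DriverSearch': mov.groupby('Race')['Driver_search'].mean(),
    'Eq_LateNight': eq.groupby('Race')['LateNight'].mean(),
})
#alt.round(4)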
#procedure: run this line first to check the value ordering used by the positional indices below
#round(rf['Race'].value_counts().sort_index(level=1),4)
RR= ['Asian','Black','Latino','Other','White']
for i,j in enumerate(RR):
Race_Grp.set_value(j,'Tot_Count', round(rf['Race'].value_counts().sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Eq_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Mov_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_Margin', round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)[i])
Race_Grp.set_value(j,'Mov_Margin', round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)[i +len(RR)])
Race_Grp.set_value(j,'Citation_Count',round(rf.groupby(['Race'])['Citation'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_Citation', round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[2*i+10])
Race_Grp.set_value(j,'Mov_Citation', round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[2*i+11])
Race_Grp.set_value(j,'Driversearch_Count', round(rf.groupby(['Race'])['Driver_search'].value_counts().sort_index(level=1),4)[i+len(RR)])
Race_Grp.set_value(j,'Eq_DriverSearch', round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_DriverSearch', round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'Eq_Gender_F', round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_Gender_F', round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'LateNight_Count', round(rf.groupby(['Race'])['LateNight'].value_counts().sort_index(level=1),4)[i +len(RR)])
Race_Grp.set_value(j,'Eq_LateNight', round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Mov_LateNight', round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
Race_Grp.set_value(j,'Morn_Citation', round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[i*2+10])
Race_Grp.set_value(j,'Late_Citation', round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)[i*2+11])
#Include Total/Average variables; NOTE: the positional indices below depend on the column order of rf
E=rf[rf.Reason == 'Equipment Violation'].sum() #2 /3
M=rf[rf.Reason == 'Moving Violation'].sum()#2 /3
Mo= rf[rf.LateNight == 0].sum()
L=rf[rf.LateNight == 1].sum()
Race_Grp.set_value('Total/Average','Tot_Count',rf.count()[1])
Race_Grp.set_value('Total/Average','Eq_Count',rf[rf.Reason == 'Equipment Violation'].count()[1])
Race_Grp.set_value('Total/Average','Mov_Count',rf[rf.Reason == 'Moving Violation'].count()[1])
Race_Grp.set_value('Total/Average','LateNight_Count',rf[rf.LateNight == 1].count()[1])
Race_Grp.set_value('Total/Average','Citation_Count',rf[rf.Citation == 1].count()[1])
Race_Grp.set_value('Total/Average','Driversearch_Count',rf[rf.Driver_search == 1].count()[1])
Race_Grp.set_value('Total/Average','Eq_Margin',E[2]/rf.count()[1])
Race_Grp.set_value('Total/Average','Mov_Margin',M[2]/rf.count()[1])
Race_Grp.set_value('Total/Average','Eq_Citation',E[1]/E[2])
Race_Grp.set_value('Total/Average','Mov_Citation',M[1]/M[2])
Race_Grp.set_value('Total/Average','Eq_DriverSearch',E[3]/E[2])
Race_Grp.set_value('Total/Average','Mov_DriverSearch',M[3]/M[2])
Race_Grp.set_value('Total/Average','Eq_Gender_F',E[4]/E[2])
# Race_Grp.set_value('Total/Average','Mov_Gender_F',M[4]/M[2])
Race_Grp.set_value('Total/Average','Eq_LateNight',E[19]/E[2])
Race_Grp.set_value('Total/Average','Mov_LateNight',M[19]/M[2])
Race_Grp.set_value('Total/Average','Morn_Citation',Mo[1]/Mo[2])
Race_Grp.set_value('Total/Average','Late_Citation',L[1]/L[2])
Race_Grp
#round(rf['Race'].value_counts().sort_index(level=1),4)
Tot_Count | Eq_Count | Mov_Count | Eq_Margin | Mov_Margin | Citation_Count | Eq_Citation | Mov_Citation | Driversearch_Count | Eq_DriverSearch | Mov_DriverSearch | Eq_Gender_F | Mov_Gender_F | LateNight_Count | Eq_LateNight | Mov_LateNight | Morn_Citation | Late_Citation | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Race | ||||||||||||||||||
Asian | 1592 | 458 | 1134 | 0.287700 | 0.712300 | 594 | 0.168100 | 0.455900 | 138 | 0.122300 | 0.072300 | 0.229300 | 0.2681 | 429 | 0.397400 | 0.2178 | 0.458300 | 0.142200 |
Black | 7780 | 2610 | 5170 | 0.335500 | 0.664500 | 2902 | 0.206100 | 0.457300 | 781 | 0.131000 | 0.084900 | 0.296200 | 0.3124 | 2286 | 0.394600 | 0.2429 | 0.460900 | 0.161900 |
Latino | 1112 | 350 | 762 | 0.314700 | 0.685300 | 486 | 0.251400 | 0.522300 | 69 | 0.088600 | 0.049900 | 0.265700 | 0.2822 | 262 | 0.360000 | 0.1785 | 0.521200 | 0.164100 |
Other | 330 | 85 | 245 | 0.257600 | 0.742400 | 153 | 0.305900 | 0.518400 | 13 | 0.070600 | 0.028600 | 0.305900 | 0.2857 | 74 | 0.352900 | 0.1796 | 0.558600 | 0.135100 |
White | 7350 | 1925 | 5425 | 0.261900 | 0.738100 | 3478 | 0.328300 | 0.524600 | 300 | 0.058700 | 0.034500 | 0.366200 | 0.3808 | 1320 | 0.264900 | 0.1493 | 0.543000 | 0.154500 |
Total/Average | 18164 | 5428 | 12736 | 0.298833 | 0.701167 | 7613 | 0.250737 | 0.490892 | 1301 | 0.100958 | 0.059124 | 0.313559 | 0.5000 | 4371 | 0.345984 | 0.0000 | 0.502066 | 0.157401 |
F=['Tot_Count','Eq_Count','Mov_Count','Eq_Margin','Mov_Margin', 'Eq_Gender_F','Mov_Gender_F']
Race_Grp[F]
Tot_Count | Eq_Count | Mov_Count | Eq_Margin | Mov_Margin | Eq_Gender_F | Mov_Gender_F | |
---|---|---|---|---|---|---|---|
Race | |||||||
Asian | 1592 | 458 | 1134 | 0.287700 | 0.712300 | 0.229300 | 0.2681 |
Black | 7780 | 2610 | 5170 | 0.335500 | 0.664500 | 0.296200 | 0.3124 |
Latino | 1112 | 350 | 762 | 0.314700 | 0.685300 | 0.265700 | 0.2822 |
Other | 330 | 85 | 245 | 0.257600 | 0.742400 | 0.305900 | 0.2857 |
White | 7350 | 1925 | 5425 | 0.261900 | 0.738100 | 0.366200 | 0.3808 |
Total/Average | 18164 | 5428 | 12736 | 0.298833 | 0.701167 | 0.313559 | 0.5000 |
G=['Citation_Count','Eq_Citation','Mov_Citation','Driversearch_Count','Eq_DriverSearch','Mov_DriverSearch',]
Race_Grp[G]
Citation_Count | Eq_Citation | Mov_Citation | Driversearch_Count | Eq_DriverSearch | Mov_DriverSearch | |
---|---|---|---|---|---|---|
Race | ||||||
Asian | 594 | 0.168100 | 0.455900 | 138 | 0.122300 | 0.072300 |
Black | 2902 | 0.206100 | 0.457300 | 781 | 0.131000 | 0.084900 |
Latino | 486 | 0.251400 | 0.522300 | 69 | 0.088600 | 0.049900 |
Other | 153 | 0.305900 | 0.518400 | 13 | 0.070600 | 0.028600 |
White | 3478 | 0.328300 | 0.524600 | 300 | 0.058700 | 0.034500 |
Total/Average | 7613 | 0.250737 | 0.490892 | 1301 | 0.100958 | 0.059124 |
H=['LateNight_Count','Eq_LateNight','Mov_LateNight','Morn_Citation','Late_Citation']
Race_Grp[H]
LateNight_Count | Eq_LateNight | Mov_LateNight | Morn_Citation | Late_Citation | |
---|---|---|---|---|---|
Race | |||||
Asian | 429 | 0.397400 | 0.2178 | 0.458300 | 0.142200 |
Black | 2286 | 0.394600 | 0.2429 | 0.460900 | 0.161900 |
Latino | 262 | 0.360000 | 0.1785 | 0.521200 | 0.164100 |
Other | 74 | 0.352900 | 0.1796 | 0.558600 | 0.135100 |
White | 1320 | 0.264900 | 0.1493 | 0.543000 | 0.154500 |
Total/Average | 4371 | 0.345984 | 0.0000 | 0.502066 | 0.157401 |
We will be taking the results from the previous section and graphing them.
# Graph Based on Counts
#Select Features
Features= ['Tot_Count','Eq_Count', 'Mov_Count','LateNight_Count', 'Citation_Count','Driversearch_Count']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax= df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Total Traffic Stops in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph on Equipment Violations
#Select Features
Features= ['Eq_Margin','Eq_LateNight','Eq_Citation','Eq_DriverSearch','Eq_Gender_F']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Equipment Violations in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('Percent')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph Based on Moving Violations
#Select Features
Features= ['Mov_Margin','Mov_Citation','Mov_DriverSearch','Mov_Gender_F','Mov_LateNight' ]
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Moving Violations in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('Percent')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# Graph Based on Morning and Latenight Citations
#Select Features
Features= ['Morn_Citation','Late_Citation']
df_t= Race_Grp[Features][:-1]
df_t = df_t.transpose()
#print(df_t)
#Plotting
ax=df_t.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Morning vs LateNight Citations Margin in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Age Breakdown by Citation
#bf= rf.query("AgeDemo != 'NaN'")
Features= ['Citation','AgeDemo']
B= rf[Features].groupby(['AgeDemo']).sum()
B=B.transpose()
#print(B)
ax=B.plot(kind='barh', figsize=(10, 6),alpha= 0.6)
plt.title('Driver Age Citation Breakdown in Saint Paul from 2017-18')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Insights
Recall that in Frogtown the racial distribution is roughly 1/3 Black, 1/3 Asian, and 1/5 White. With that said, there is a greater proportion of both Black and White drivers being stopped.
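A rough way to put that comparison in numbers is to divide each group's share of stops by its approximate share of residents. A minimal sketch, using the rough population fractions quoted in the sentence above (not exact census figures) and the rf subsample:
#Stop share vs. rough population share (population fractions taken from the text above)
pop_share = {'Black': 1/3, 'Asian': 1/3, 'White': 1/5}
stop_share = rf['Race'].value_counts(normalize=True)
for race, pop in pop_share.items():
    share = stop_share.get(race, 0)
    print('{}: {:.2f} of stops vs ~{:.2f} of residents (ratio {:.2f})'.format(race, share, pop, share / pop))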
I will be checking out patterns by month, day of the week, and hour of the day.
#Traffic Stops by Day of Week (Count)
Features= ['Count','LateNight','DayofWeek']
B= rf[Features].groupby(['DayofWeek']).sum() #group by Function
#print(B)
ax=B.plot(kind='line')
plt.title('Traffic Stops (Count) by DayofWeek in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Day of Week (Starting Monday)')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Day of Week
Features= ['Count','Citation','Equipment Violation','Moving Violation','LateNight','DayofWeek']
B= rf[Features].groupby(['DayofWeek']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax=B.plot(kind='line')
plt.title('Traffic Stops (Margin) by DayofWeek in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Day of Week (Starting Monday)')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Month
Features= ['Count', 'LateNight','Month']
B= rf[Features].groupby(['Month']).sum()
#print(B)
ax=B.plot(kind='line')
plt.title('Monthly Traffic Stops (Count) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Month')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Month
Features= ['Count','Citation','Equipment Violation','Moving Violation','LateNight','Month']
B= rf[Features].groupby(['Month']).sum()
B=B.div(B['Count'].values,axis=0)
B.drop(B.columns[[0]], axis=1, inplace=True)
#print(B)
ax=B.plot(kind='line')
plt.title('Monthly Traffic Stops (margin) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Month')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Hour
Features= ['Count','Driver_search','Hour']
B= rf[Features].groupby(['Hour']).sum()
#print(B)
ax= B.plot(kind='line')
plt.title('Traffic Stops Hourly (counts) in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Hour')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Traffic Stops by Hour
Features= ['Count','Citation','Equipment Violation','Moving Violation','Driver_search','Hour']
B= rf[Features].groupby(['Hour']).sum()
B=B.div(B['Count'].values,axis=0)
B.drop(B.columns[[0]], axis=1, inplace=True)
#print(B)
ax= B.plot(kind='line')
plt.title('Traffic Stops (margin) Hourly in Frogtown from 2017-19')
plt.ylabel('')
plt.xlabel('Hour')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
The full dataset ranges from 2001 to 2018 and has many missing components. For some years, 50% of the collected records are missing key information. The 'total count' will include all instances of traffic stops in Frogtown, even when supplemental information is missing. Note that the sum function ignores missing values.
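That missingness can be quantified per year before the prep. A short sketch using the renamed full table (results_df), counting the 'No Data' entries in a few key fields:
#Fraction of 'No Data' entries per year in a few key fields (sketch)
key_fields = ['Race', 'Driver_search', 'Gender']
missing = (results_df[key_fields] == 'No Data').groupby(results_df['Year']).mean()
#missing.round(3)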
#Data Prep
#From the masterdataset get subsample
long= results_df.query('Grid in [87.0,88.0,89.0,67.0,68.0,90.0,91.0,92.0]')
#Add Time Variables
long['Date']= pd.to_datetime(long['Date'])
long['DayofWeek']=long['Date'].dt.dayofweek
long['Weekend'] = long['DayofWeek'].apply(lambda x: 1 if (x>4) else 0)
long['Month'] = long['Date'].dt.month
long['Day'] = long['Date'].dt.day
long['Hour'] = long['Date'].dt.hour
long['LateNight'] = long['Hour'].apply(lambda x: 1 if (x>21 or x<5) else 0)
#Screening
long=long.query("Race!='No Data'")
long=long.query("Driver_search !='No Data'")
long=long.query("Gender !='No Data'")
#Replace variables with dummies
long['Driver_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Vehicle_search'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Citation'].replace(to_replace=['No','Yes'], value=[0,1],inplace=True)
long['Gender'].replace(to_replace=['Male','Female'], value=[0,1],inplace=True) #FEMALE is 1
long[['Driver_search','Gender']] = long[['Driver_search', 'Gender']].astype(int)
long= pd.concat([long,pd.get_dummies(long['Reason'])], axis=1)
long= pd.concat([long,pd.get_dummies(long['Race'])], axis=1)
FG_long= long
#Line Graph Total Stops In Frogtown
Features= ['Count','Year']
B= FG_long[Features].groupby(['Year']).sum()
ax= B.plot(kind='line')
plt.title('Total Traffic Stops in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Line graph Count
Features= ['Count','Citation','LateNight','Driver_search','Gender','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
#print(B)
ax= B.plot(kind='line')
plt.title('Trends Count in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
#Line Graph Margin
Features= ['Count','Citation','LateNight','Driver_search','Gender','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax= B.plot(kind='line')
plt.title('Margin Trends in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Features= ['Count','Asian','Black','Latino','Other','White','Year']
B= FG_long[Features].groupby(['Year']).sum() #group by Function
B=B.div(B['Count'].values,axis=0) #divide by count to get normalization
B.drop(B.columns[[0]], axis=1, inplace=True) #drop first group
#print(B)
ax= B.plot(kind='line')
plt.title('Racial Breakdown Margin of Traffic Stops in Frogtown from 2001-19')
plt.ylabel('')
plt.xlabel('')
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
Preliminary Longitudinal Analysis:
I will be using the Foursquare API to get the nearby venues within a radius of each police grid. I've separated the venues into three categories of interest. The map below indicates the venues within Frogtown (upon running it again, I may have exhausted my queries):
#Function that extracts venue information for many locations (requires the Foursquare CLIENT_ID, CLIENT_SECRET, VERSION, and LIMIT to be defined)
def getNearbyVenues(names, latitudes, longitudes, radius=500):
venues_list=[]
for name, lat, lng in zip(names, latitudes, longitudes):
print(name)
# create the API request URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
lat,
lng,
radius,
LIMIT)
# make the GET request
results = requests.get(url).json()["response"]['groups'][0]['items']
# return only relevant information for each nearby venue
venues_list.append([(
name,
lat,
lng,
v['venue']['name'],
v['venue']['location']['lat'],
v['venue']['location']['lng'],
v['venue']['categories'][0]['name']) for v in results])
nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nearby_venues.columns = ['Neighborhood',
'Neighborhood Latitude',
'Neighborhood Longitude',
'Venue',
'Venue_Latitude',
'Venue_Longitude',
'Venue_Category']
return(nearby_venues)
#Simplify the table
ab=rf.groupby(['Grid','Latitude','Longitude'], as_index=False).count()
Features= ['Grid','Latitude','Longitude']
ab= ab[Features]
# Get the location of Venues for bunch of locations
Frogtown_venues = getNearbyVenues(names=ab['Grid'],
latitudes=ab['Latitude'],
longitudes=ab['Longitude']
)
Bar= ['Dive Bar','Bar','Liquor Store']
Restaurant= ['Asian Restaurant', 'Thai Restaurant','Ethiopian Restaurant','Vietnamese Restaurant','Middle Eastern Restaurant'\
,'Chinese Restaurant','Ramen Restaurant','BBQ Joint','Fast Food Restaurant','Noodle House','Restaurant','Café']
Convenience= ['Convenience Store','Grocery Store']
#Select on multiple values
Frogtown_venuesB= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Bar)]
Frogtown_venuesR= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Restaurant)]
Frogtown_venuesC= Frogtown_venues.loc[Frogtown_venues['Venue_Category'].isin(Convenience)]
67.0 68.0 87.0 88.0 89.0 90.0 91.0 92.0
Frogtown_venues.head()
Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue_Latitude | Venue_Longitude | Venue_Category | |
---|---|---|---|---|---|---|---|
0 | 67.0 | 44.966603 | -93.141545 | Half Time Rec | 44.970362 | -93.143015 | Dive Bar |
1 | 67.0 | 44.966603 | -93.141545 | Tobasi Stop | 44.963243 | -93.139395 | Convenience Store |
2 | 67.0 | 44.966603 | -93.141545 | Pro-Image Beauty School | 44.963986 | -93.140867 | Cosmetics Shop |
3 | 67.0 | 44.966603 | -93.141545 | Minnehaha Liquors | 44.963355 | -93.140359 | Liquor Store |
4 | 67.0 | 44.966603 | -93.141545 | Sunrise Creative Gourmet | 44.964883 | -93.137208 | Market |
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
#Create Frogtown GeoMap
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
# For Bars
for lat, lng, borough, neighborhood in zip(Frogtown_venuesB['Venue_Latitude'], Frogtown_venuesB['Venue_Longitude'],Frogtown_venuesB['Venue'], Frogtown_venuesB['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='blue',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
#Convenience Stores
for lat, lng, borough, neighborhood in zip(Frogtown_venuesC['Venue_Latitude'], Frogtown_venuesC['Venue_Longitude'],Frogtown_venuesC['Venue'], Frogtown_venuesC['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='yellow',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
# Restaurants
for lat, lng, borough, neighborhood in zip(Frogtown_venuesR['Venue_Latitude'], Frogtown_venuesR['Venue_Longitude'],Frogtown_venuesR['Venue'], Frogtown_venuesR['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='green',
fill=True,
fill_color='#3186cc',
fill_opacity=0.7,
parse_html=False).add_to(FG_map)
FG_map
The map above shows the venues within the Frogtown area. University Avenue has a heavy concentration of commercial activity. There are also some neighborhood bars and convenience stores near the residential homes; the geospatial data below will give a clearer picture.
The Saint Paul Police Department provides a GeoJSON file that maps out the police grid.
To prepare the data for geospatial visualization, I need to group the stops by grid. Because I converted the categorical variables to dummies early on, I can simply sum them during aggregation and divide by the stop count to get rates.
For the neighborhood visualizations, I also pulled Census data on racial distribution.
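As a minimal sketch of that aggregation step (the toy data below is hypothetical; the real stop-level dataframe and its dummy columns are built earlier in the report), summing the 0/1 dummies within each grid yields counts, and dividing by the per-grid stop total turns them into rates:
import pandas as pd
# Toy stop-level frame: one row per stop, with dummy columns already encoded as 0/1
stops = pd.DataFrame({'Grid': [67, 67, 68], 'Citation': [1, 0, 1], 'Driver_search': [0, 1, 0]})
stops['Count'] = 1                                       # so groupby-sum also produces the stop total
grid_sums = stops.groupby('Grid').sum()                  # dummy sums = counts per grid
grid_rates = grid_sums.div(grid_sums['Count'], axis=0)   # divide each row by its stop total
grid_rates['Count'] = grid_sums['Count']                 # keep the raw stop count alongside the rates
print(grid_rates)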
sp_geo1
'https://information.stpaul.gov/resource/kkd6-vvns.geojson'
#import Programs and police grid
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library
sp_geo = r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson'
#Socrata API IS NOT that good
#sp_geo= 'https://information.stpaul.gov/resource/xfxz-iqwn.json'
# Programmer Note: One of the challenges in getting the choropleth to connect properly was matching the Grid attribute to the GeoJSON property gridnum.
#All of Saint Paul Grid
# The graph is determined by Grid so we will group along that
Features= ['Grid','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
# Sum the dummy columns within each grid, then divide by Count to get rates
B= df[Features].groupby(['Grid']).sum()
#Save Sum Values
C= B[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
B=B.div(B['Count'].values,axis=0)
B['Count']=C.iloc[:,0]
B['Citation_count'] = C.iloc[:,1]
B['Equipment Violation_count'] = C.iloc[:,2]
B['Moving Violation_count'] = C.iloc[:,3]
B['LateNight_count'] = C.iloc[:,4]
# The GRID needs to follow exactly like the json file
B=B.reset_index()
B.Grid = B.Grid.astype(int)
B.Grid = B.Grid.astype(str)
#print(B.head())
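A quick, optional sanity check for that note (a sketch, assuming the same local GeoJSON path and the gridnum property used by the choropleth calls below; if the ids don't line up, cast one side to match the other) is to compare the grid ids in the file with the Grid keys in the table:
import json
with open(r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson') as f:
    grid_geo = json.load(f)
geo_ids = {str(feat['properties']['gridnum']) for feat in grid_geo['features']}
unmatched = set(B['Grid']) - geo_ids
print(unmatched)  # any Grid value listed here would show up as an unshaded polygon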
#Frogtown Only:
#FROGTOWN Value
F= rf[Features].groupby(['Grid']).sum()
C= F['Count']
F=F.div(F['Count'].values,axis=0)
F['Count']=C
# The GRID needs to follow exactly like the json file
F=F.reset_index()
F.Grid = F.Grid.astype(int)
F.Grid = F.Grid.astype(str)
#Add demographic Data
SPDem = pd.read_csv('Data\MSP Neighborhoods_2013-2017.csv', skiprows=[1])
SPDem.columns = SPDem.columns.str.replace(' ', '')
SPDem['City'].value_counts().to_frame()
#Get Saint Paul only
SPDem= SPDem.query('City in ["St. Paul"]')
#SPDem.columns.values[i]
#for i,square in enumerate(SPDem.columns.values):
# print(str(i) + ' '+ square)
#print(SPDem)
#Specify columns of interest
SPDem = SPDem.iloc[:,[1,2,27,39,43,47,63,71,107,123,127,131,135,159,215,243,267,271,275,279,363,370,374,378,\
382,386,390,394,399,403,471,475,479,495,511,515,519,523]]
col_names= ['TotHH','Neigh','Unemploy','<HighSDip','HighSDip','AssDeg','BacDeg+','Foreign%','Child<18','Inc<35k','Inc_35to50','Inc_50to75','Inc75to100','Age65+','Renter','LangNotEng','Age18_24','Age25_34','Age35_44','Age45_54','Male','Race_Black','Race_Native','Race_Asian','Race_Other','Race_Two+','Race_His','Race_White','Race_POC','Poverty','Poverty_At','Poverty_100to150','Poverty_150to200','CarOwn','Commute<10','Commute_10to19','Commute_20-29','Commute_30+']
SPDem.columns= col_names
#Rename certain values so they can be matched/merged correctly with the other dataframe.
SPDem.Neigh.unique()
SPDem.iat[2,1] = 'Capital'
SPDem.iat[4,1] = 'Thomas'
SPDem.sort_values(['Neigh'],inplace=True)
SPDem =SPDem[['Race_White','Race_Black','Race_Asian']]
#All of SaintPaul Neighborhood
# The graph is determined by Grid so we will group along that
Features= ['Community','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
# Sum the dummy columns within each community, then divide by Count to get rates
N= df[Features].groupby(['Community']).sum()
#Save Sum Values
C= N[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
N=N.div(N['Count'].values,axis=0)
N['Count']=C.iloc[:,0]
N['Citation_count'] = C.iloc[:,1]
N['Equipment Violation_count'] = C.iloc[:,2]
N['Moving Violation_count'] = C.iloc[:,3]
N['LateNight_count'] = C.iloc[:,4]
# Add the demographic info!
N['White_Demo']= SPDem.iloc[:,0].values #.values was used
N['Black_Demo']= SPDem.iloc[:,1].values
N['Asian_Demo']= SPDem.iloc[:,2].values
N=N.reset_index()
#Merging two dataframes: create a distinct dataframe of just Grid and Community, then merge it into the neighborhood table
A= df.groupby(['Grid','Community']).size().reset_index(name='Blah')
#A= df[['Grid','Community',]].groupby('Community').reset_index()
C=A
N=pd.merge(N, C, on='Community')
# The GRID needs to follow exactly like the json file
N=N.reset_index()
N.Grid = N.Grid.astype(int)
N.Grid = N.Grid.astype(str)
N
 | index | Community | Count | Citation | Equipment Violation | Moving Violation | Driver_search | Vehicle_search | LateNight | Asian | ... | Weekend | Citation_count | Equipment Violation_count | Moving Violation_count | LateNight_count | White_Demo | Black_Demo | Asian_Demo | Grid | Blah
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 0 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 98 | 268 |
1 | 1 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 99 | 599 |
2 | 2 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 100 | 32 |
3 | 3 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 118 | 144 |
4 | 4 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 119 | 682 |
5 | 5 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 120 | 152 |
6 | 6 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 137 | 39 |
7 | 7 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 138 | 87 |
8 | 8 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 139 | 65 |
9 | 9 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 140 | 42 |
10 | 10 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 160 | 70 |
11 | 11 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 180 | 48 |
12 | 12 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 197 | 1 |
13 | 13 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 200 | 3 |
14 | 14 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 240 | 15 |
15 | 15 | Battle_Creek | 2290 | 0.722271 | 0.117467 | 0.882533 | 0.032751 | 0.030568 | 0.154148 | 0.143668 | ... | 0.110480 | 1654 | 269.0 | 2021.0 | 353 | 0.398383 | 0.203079 | 0.231585 | 280 | 43 |
16 | 16 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 111 | 148 |
17 | 17 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 112 | 1007 |
18 | 18 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 131 | 452 |
19 | 19 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 132 | 230 |
20 | 20 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 133 | 2307 |
21 | 21 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 152 | 302 |
22 | 22 | Capital_River | 4911 | 0.697007 | 0.108735 | 0.891265 | 0.029933 | 0.025657 | 0.145795 | 0.083690 | ... | 0.159845 | 3423 | 534.0 | 4377.0 | 716 | 0.675663 | 0.132114 | 0.089509 | 153 | 465 |
23 | 23 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 5 | 33 |
24 | 24 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 6 | 29 |
25 | 25 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 7 | 13 |
26 | 26 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 8 | 54 |
27 | 27 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 25 | 100 |
28 | 28 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 26 | 29 |
29 | 29 | Como | 1328 | 0.640813 | 0.140813 | 0.859187 | 0.031627 | 0.026355 | 0.192018 | 0.100151 | ... | 0.146084 | 851 | 187.0 | 1141.0 | 255 | 0.762898 | 0.089230 | 0.054048 | 27 | 61 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
171 | 171 | Union_Park | 3137 | 0.659229 | 0.141536 | 0.858464 | 0.031559 | 0.026777 | 0.184252 | 0.039528 | ... | 0.174370 | 2068 | 444.0 | 2693.0 | 578 | 0.772243 | 0.098012 | 0.037718 | 126 | 189 |
172 | 172 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 150 | 274 |
173 | 173 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 151 | 42 |
174 | 174 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 169 | 463 |
175 | 175 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 170 | 266 |
176 | 176 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 171 | 73 |
177 | 177 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 187 | 59 |
178 | 178 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 188 | 215 |
179 | 179 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 189 | 9 |
180 | 180 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 207 | 433 |
181 | 181 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 226 | 46 |
182 | 182 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 230 | 216 |
183 | 183 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 249 | 80 |
184 | 184 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 267 | 393 |
185 | 185 | West_7th | 3038 | 0.727452 | 0.095787 | 0.904213 | 0.017117 | 0.011521 | 0.108624 | 0.035879 | ... | 0.143186 | 2210 | 291.0 | 2747.0 | 330 | 0.721428 | 0.108826 | 0.029926 | 268 | 469 |
186 | 186 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 172 | 61 |
187 | 187 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 173 | 193 |
188 | 188 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 174 | 245 |
189 | 189 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 175 | 1 |
190 | 190 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 192 | 97 |
191 | 191 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 193 | 154 |
192 | 192 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 194 | 202 |
193 | 193 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 195 | 35 |
194 | 194 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 209 | 6 |
195 | 195 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 210 | 5 |
196 | 196 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 211 | 22 |
197 | 197 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 212 | 34 |
198 | 198 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 213 | 42 |
199 | 199 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 214 | 77 |
200 | 200 | West_Side | 1192 | 0.433725 | 0.279362 | 0.720638 | 0.083893 | 0.078020 | 0.353188 | 0.041946 | ... | 0.283557 | 517 | 333.0 | 859.0 | 421 | 0.453392 | 0.135823 | 0.064984 | 215 | 18 |
201 rows × 25 columns
#Create Frogtown GeoMap
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
# I've included the markers from previous graph
for lat, lng, borough, neighborhood in zip(Frogtown_venues['Venue_Latitude'], Frogtown_venues['Venue_Longitude'],Frogtown_venues['Venue'], Frogtown_venues['Venue_Category']):
label = '{}, {}'.format(borough, neighborhood)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
[lat, lng],
radius=5,
popup=label,
color='green',
fill=True,
fill_color='#3186cc',
fill_opacity=0.1,
parse_html=False).add_to(FG_map)
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-37-e561985be0e4> in <module>
      5 # I've included the markers from previous graph
----> 6 for lat, lng, borough, neighborhood in zip(Frogtown_venues['Venue_Latitude'], Frogtown_venues['Venue_Longitude'], Frogtown_venues['Venue'], Frogtown_venues['Venue_Category']):
      7     label = '{}, {}'.format(borough, neighborhood)
      8     label = folium.Popup(label, parse_html=True)
NameError: name 'Frogtown_venues' is not defined
#sp_geo = r'Geo-Json\Saint Paul Police Grid - Shapefile.geojson'
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
FG_map = folium.Map(location=[44.958326, -93.122926], zoom_start=14,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=F,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Count'],
#nan_fill_color='purple',
#nan_fill_opacity=0.4,
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
#messed_up_data.loc[4, 'Count'] = float('nan')
C=B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Count'],
#nan_fill_color='purple',
#nan_fill_opacity=0.4,
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Traffic Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Citation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Counts',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Citation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Citation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Equipment Violation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Equipment Violation Stops',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Equipment Violation'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Equipment Violation Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count < 2000')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Moving Violation_count'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Total Moving Violation Counts',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Traffic Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count > 100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','LateNight'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='LateNight Traffic Stop Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Driver_search'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Driver Search Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','White_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the White share of the population and the White share of drivers stopped (White_Demo - White). In this legend, lighter shades indicate over-representation of White drivers relative to population and darker shades indicate under-representation.
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['White_Demo']- N['White']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','White'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='White Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Black_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the Black share of the population and the Black share of drivers stopped (Black_Demo - Black). Reading the same legend, Black drivers are over-represented by default!
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['Black_Demo']- N['Black']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Black'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Black Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Asian_Demo'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Below I map the difference between the Asian share of the population and the Asian share of drivers stopped (Asian_Demo - Asian). Reading the same legend, Asian drivers are under-represented by default.
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
N['Diff']= N['Asian_Demo']- N['Asian']
FG_map.choropleth(
geo_data=sp_geo,
data=N,
columns=['Grid','Diff'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian: Difference',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
#Create Saint Paul Population Map
FG_map = folium.Map(location=[44.958326, -93.132926], zoom_start=12,tiles="OpenStreetMap")
C= B.query('Count >100')
FG_map.choropleth(
geo_data=sp_geo,
data=C,
columns=['Grid','Asian'],
key_on="feature.properties.gridnum",
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Asian Margin',
highlight= True,
name= 'Clean Map'
)
folium.LayerControl().add_to(FG_map)
# display map
FG_map
Quick Analysis
After digging into the data, we get a better grasp of how traffic stops are administered and how often they result in citations. Within Frogtown, moving violations are more frequent during the daytime and carry a high citation rate, while equipment violations are more frequent late at night. Expanding outward to all of Saint Paul, traffic stops are most frequent along University Avenue, which is a commercial corridor and carries the light-rail line. The Frogtown area has more traffic stops than its neighboring communities, with the greatest concentration between Western Avenue and Lexington Avenue. Elsewhere in Saint Paul there are communities with patterns similar to Frogtown's.
Regarding race within Frogtown, Black and White drivers are over-represented, while Asian drivers are under-represented, in total citations given. Black drivers are at least twice as likely to be searched as their White counterparts for moving violations, despite having lower citation rates. During the late-night hours, Black drivers are more likely to be stopped and show greater rates of moving violations; White drivers, by contrast, have a greater citation rate during the morning. Broken down by gender, White female drivers also get stopped, though it's worth noting that this may partly reflect the wealth gap across racial lines. At the city level, Black drivers are over-represented in total traffic stops in every neighborhood, while Asian drivers are under-represented relative to total traffic stops.
This report raised more questions than answers. Why are there discrepancies in the data? What are the criteria for an equipment violation, and do crashes count? There is simply not enough volume of 911 calls and investigative stops to account for the imbalance.
There is a lot more information that can be gleaned from the results. Please feel free to email me if you have any questions or thoughts.
Please check out the numbers for your respective community in the Appendix.
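For readers who want to check the search-rate comparison above against the data, here is a rough sketch using the stop-level df and the Race, Moving Violation, and Driver_search columns built earlier (treat it as illustrative, not the exact calculation behind the numbers quoted):
# Driver-search rate by race among moving-violation stops, plus the Black/White ratio
moving = df[df['Moving Violation'] == 1]
search_rate = moving.groupby('Race')['Driver_search'].mean()
print(search_rate)
print(search_rate['Black'] / search_rate['White'])  # >= 2 would match the "twice as likely" claim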
Features= ['Community','Count','Citation','Equipment Violation','Moving Violation','Driver_search','Vehicle_search','LateNight',\
'Asian','Black','Latino','White','Other','Gender','Weekend']
#pd.options.display.float_format = '{:.4f}%'.format
# Sum the dummy columns within each community, then divide by Count to get rates
Ne= df[Features].groupby(['Community']).sum()
#Save Sum Values
C= Ne[['Count','Citation','Equipment Violation','Moving Violation','LateNight']]
#Divide by Count and then add new columns to the transformed table
Ne=Ne.div(Ne['Count'].values,axis=0)
Ne['Count']=C.iloc[:,0]
Ne['Citation_count'] = C.iloc[:,1]
Ne['Equipment Violation_count'] = C.iloc[:,2]
Ne['Moving Violation_count'] = C.iloc[:,3]
Ne['LateNight_count'] = C.iloc[:,4]
Ne['White_Demo']= SPDem.iloc[:,0].values #.values was used
#print(N['White_Demo'])
Ne['Black_Demo']= SPDem.iloc[:,1].values
Ne['Asian_Demo']= SPDem.iloc[:,2].values
Ne=Ne.reset_index()
Ne[['Community','Count', 'Citation_count','Citation','Equipment Violation_count','Equipment Violation','Moving Violation_count','Moving Violation']].set_index('Community')
#df
(Counts are raw numbers of stops/citations; the Citation, Equipment Violation, and Moving Violation columns are proportions of stops within each community.)

Community | Count | Citation_count | Citation | Equipment Violation_count | Equipment Violation | Moving Violation_count | Moving Violation
---|---|---|---|---|---|---|---
Battle_Creek | 2290 | 1654 | 0.7223 | 269 | 0.1175 | 2021 | 0.8825
Capital_River | 4911 | 3423 | 0.6970 | 534 | 0.1087 | 4377 | 0.8913
Como | 1328 | 851 | 0.6408 | 187 | 0.1408 | 1141 | 0.8592
Dayton_Bluff | 2568 | 934 | 0.3637 | 753 | 0.2932 | 1815 | 0.7068
Greater_East_Side | 3968 | 2073 | 0.5224 | 810 | 0.2041 | 3158 | 0.7959
Highland_Park | 3372 | 2960 | 0.8778 | 76 | 0.0225 | 3296 | 0.9775
Macalester_Groveland | 1851 | 1437 | 0.7763 | 130 | 0.0702 | 1721 | 0.9298
Midway | 2476 | 1288 | 0.5202 | 491 | 0.1983 | 1985 | 0.8017
North_End | 8561 | 4883 | 0.5704 | 1854 | 0.2166 | 6707 | 0.7834
Payne_Phalen | 12136 | 4185 | 0.3448 | 3573 | 0.2944 | 8563 | 0.7056
St_Anthony | 347 | 223 | 0.6427 | 94 | 0.2709 | 253 | 0.7291
Summit_Hill | 851 | 585 | 0.6874 | 74 | 0.0870 | 777 | 0.9130
Summit_University | 2477 | 1133 | 0.4574 | 573 | 0.2313 | 1904 | 0.7687
Thomas_Frogtown | 5458 | 2219 | 0.4066 | 1654 | 0.3030 | 3804 | 0.6970
Union_Park | 3137 | 2068 | 0.6592 | 444 | 0.1415 | 2693 | 0.8585
West_7th | 3038 | 2210 | 0.7275 | 291 | 0.0958 | 2747 | 0.9042
West_Side | 1192 | 517 | 0.4337 | 333 | 0.2794 | 859 | 0.7206
Ne[['Community','Driver_search','Vehicle_search', 'Citation_count','LateNight','Gender','Weekend']].set_index('Community')
(All columns except Citation_count are proportions of stops within each community.)

Community | Driver_search | Vehicle_search | Citation_count | LateNight | Gender | Weekend
---|---|---|---|---|---|---
Battle_Creek | 0.03 | 0.03 | 1654 | 0.15 | 0.42 | 0.11
Capital_River | 0.03 | 0.03 | 3423 | 0.15 | 0.41 | 0.16
Como | 0.03 | 0.03 | 851 | 0.19 | 0.43 | 0.15
Dayton_Bluff | 0.09 | 0.09 | 934 | 0.35 | 0.33 | 0.23
Greater_East_Side | 0.07 | 0.06 | 2073 | 0.27 | 0.38 | 0.17
Highland_Park | 0.00 | 0.00 | 2960 | 0.02 | 0.44 | 0.03
Macalester_Groveland | 0.01 | 0.01 | 1437 | 0.09 | 0.42 | 0.09
Midway | 0.06 | 0.05 | 1288 | 0.25 | 0.40 | 0.18
North_End | 0.06 | 0.05 | 4883 | 0.24 | 0.37 | 0.18
Payne_Phalen | 0.11 | 0.10 | 4185 | 0.38 | 0.31 | 0.24
St_Anthony | 0.03 | 0.02 | 223 | 0.39 | 0.28 | 0.39
Summit_Hill | 0.03 | 0.03 | 585 | 0.13 | 0.42 | 0.14
Summit_University | 0.08 | 0.07 | 1133 | 0.32 | 0.35 | 0.25
Thomas_Frogtown | 0.10 | 0.09 | 2219 | 0.40 | 0.34 | 0.22
Union_Park | 0.03 | 0.03 | 2068 | 0.18 | 0.38 | 0.17
West_7th | 0.02 | 0.01 | 2210 | 0.11 | 0.39 | 0.14
West_Side | 0.08 | 0.08 | 517 | 0.35 | 0.32 | 0.28
Ne[['Community','Asian','Asian_Demo', 'Black','Black_Demo','White','White_Demo']].set_index('Community')
Community | Asian | Asian_Demo | Black | Black_Demo | White | White_Demo
---|---|---|---|---|---|---
Battle_Creek | 0.143668 | 0.231585 | 0.292140 | 0.203079 | 0.430131 | 0.398383 |
Capital_River | 0.083690 | 0.089509 | 0.302993 | 0.132114 | 0.515985 | 0.675663 |
Como | 0.100151 | 0.054048 | 0.306476 | 0.089230 | 0.491717 | 0.762898 |
Dayton_Bluff | 0.146417 | 0.323817 | 0.419393 | 0.140553 | 0.306075 | 0.348776 |
Greater_East_Side | 0.184476 | 0.297678 | 0.336946 | 0.155262 | 0.336694 | 0.358690 |
Highland_Park | 0.029359 | 0.071076 | 0.147687 | 0.140959 | 0.726572 | 0.652471 |
Macalester_Groveland | 0.026472 | 0.037623 | 0.127499 | 0.126550 | 0.740681 | 0.768102 |
Midway | 0.050889 | 0.039393 | 0.341276 | 0.020893 | 0.498384 | 0.871761 |
North_End | 0.160963 | 0.355203 | 0.397851 | 0.231465 | 0.338512 | 0.259136 |
Payne_Phalen | 0.211519 | 0.358401 | 0.400132 | 0.128624 | 0.276038 | 0.307874 |
St_Anthony | 0.083573 | 0.112020 | 0.221902 | 0.094220 | 0.550432 | 0.711481 |
Summit_Hill | 0.039953 | 0.027451 | 0.132785 | 0.050839 | 0.740306 | 0.839548 |
Summit_University | 0.061365 | 0.082662 | 0.480420 | 0.324349 | 0.346790 | 0.491559 |
Thomas_Frogtown | 0.125870 | 0.327318 | 0.492305 | 0.329769 | 0.272261 | 0.223661 |
Union_Park | 0.039528 | 0.037718 | 0.259165 | 0.098012 | 0.599936 | 0.772243 |
West_7th | 0.035879 | 0.029926 | 0.180053 | 0.108826 | 0.689928 | 0.721428 |
West_Side | 0.041946 | 0.064984 | 0.281040 | 0.135823 | 0.453020 | 0.453392 |
#Long stretch of code
#Programmer Note: sorting the index makes the order of the value_counts output predictable, which the positional indexing below relies on
#Group by Race to get count
A=round(rf['Race'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Tot_Count', A[0])
Race_Grp.set_value('Black','Tot_Count', A[1])
Race_Grp.set_value('Latino','Tot_Count', A[2])
Race_Grp.set_value('Other','Tot_Count', A[3])
Race_Grp.set_value('White','Tot_Count', A[4])
#Group by Race and Reason to get counts
A= round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Eq_Count', A[0])
Race_Grp.set_value('Black','Eq_Count', A[1])
Race_Grp.set_value('Latino','Eq_Count', A[2])
Race_Grp.set_value('Other','Eq_Count', A[3])
Race_Grp.set_value('White','Eq_Count', A[4])
Race_Grp.set_value('Asian','Mov_Count', A[5])
Race_Grp.set_value('Black','Mov_Count', A[6])
Race_Grp.set_value('Latino','Mov_Count', A[7])
Race_Grp.set_value('Other','Mov_Count', A[8])
Race_Grp.set_value('White','Mov_Count', A[9])
#Group by Race and Reason but normalized counts
A= round(rf.groupby(['Race'])['Reason'].value_counts(normalize=True).sort_index(level=1),4)
Race_Grp.set_value('Asian','Eq_Margin', A[0])
Race_Grp.set_value('Black','Eq_Margin', A[1])
Race_Grp.set_value('Latino','Eq_Margin', A[2])
Race_Grp.set_value('Other','Eq_Margin', A[3])
Race_Grp.set_value('White','Eq_Margin', A[4])
Race_Grp.set_value('Asian','Mov_Margin', A[5])
Race_Grp.set_value('Black','Mov_Margin', A[6])
Race_Grp.set_value('Latino','Mov_Margin', A[7])
Race_Grp.set_value('Other','Mov_Margin', A[8])
Race_Grp.set_value('White','Mov_Margin', A[9])
#Group by Race and Citation Counts
A=round(rf.groupby(['Race'])['Citation'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Citation_Count', A[5])
Race_Grp.set_value('Black','Citation_Count', A[6])
Race_Grp.set_value('Latino','Citation_Count', A[7])
Race_Grp.set_value('Other','Citation_Count', A[8])
Race_Grp.set_value('White','Citation_Count', A[9])
#Group by Race, Reason, and Citation Normalized Counts
A= round(rf.groupby(['Race','Reason'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_Citation', A[10])
Race_Grp.set_value('Black','Eq_Citation', A[12])
Race_Grp.set_value('Latino','Eq_Citation', A[14])
Race_Grp.set_value('Other','Eq_Citation', A[16])
Race_Grp.set_value('White','Eq_Citation', A[18])
Race_Grp.set_value('Asian','Mov_Citation', A[11])
Race_Grp.set_value('Black','Mov_Citation', A[13])
Race_Grp.set_value('Latino','Mov_Citation', A[15])
Race_Grp.set_value('Other','Mov_Citation', A[17])
Race_Grp.set_value('White','Mov_Citation', A[19])
#Group by Race and Search Counts
A=round(rf.groupby(['Race'])['Driver_search'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','Driversearch_Count', A[5])
Race_Grp.set_value('Black','Driversearch_Count', A[6])
Race_Grp.set_value('Latino','Driversearch_Count', A[7])
Race_Grp.set_value('Other','Driversearch_Count', A[8])
Race_Grp.set_value('White','Driversearch_Count', A[9])
#Group by Race and Citation Counts
A= round(rf.groupby(['Race','Reason'])['Driver_search'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_DriverSearch', A[10])
Race_Grp.set_value('Black','Eq_DriverSearch', A[12])
Race_Grp.set_value('Latino','Eq_DriverSearch', A[14])
Race_Grp.set_value('Other','Eq_DriverSearch', A[16])
Race_Grp.set_value('White','Eq_DriverSearch', A[18])
Race_Grp.set_value('Asian','Mov_DriverSearch', A[11])
Race_Grp.set_value('Black','Mov_DriverSearch', A[13])
Race_Grp.set_value('Latino','Mov_DriverSearch', A[15])
Race_Grp.set_value('Other','Mov_DriverSearch', A[17])
Race_Grp.set_value('White','Mov_DriverSearch', A[19])
#Group by Race,Reason, and Gender Normalized Counts
A=round(rf.groupby(['Race','Reason'])['Gender'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_Gender_F', A[10])
Race_Grp.set_value('Black','Eq_Gender_F', A[12])
Race_Grp.set_value('Latino','Eq_Gender_F', A[14])
Race_Grp.set_value('Other','Eq_Gender_F', A[16])
Race_Grp.set_value('White','Eq_Gender_F', A[18])
Race_Grp.set_value('Asian','Mov_Gender_F', A[11])
Race_Grp.set_value('Black','Mov_Gender_F', A[13])
Race_Grp.set_value('Latino','Mov_Gender_F', A[15])
Race_Grp.set_value('Other','Mov_Gender_F', A[17])
Race_Grp.set_value('White','Mov_Gender_F', A[19])
#Group by Race,LateNight Counts
A=round(rf.groupby(['Race'])['LateNight'].value_counts().sort_index(level=1),4)
Race_Grp.set_value('Asian','LateNight_Count', A[5])
Race_Grp.set_value('Black','LateNight_Count', A[6])
Race_Grp.set_value('Latino','LateNight_Count', A[7])
Race_Grp.set_value('Other','LateNight_Count', A[8])
Race_Grp.set_value('White','LateNight_Count', A[9])
#Group by Race,Reason, and Latenight Normalized Counts
A=round(rf.groupby(['Race','Reason'])['LateNight'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Eq_LateNight', A[10])
Race_Grp.set_value('Black','Eq_LateNight', A[12])
Race_Grp.set_value('Latino','Eq_LateNight', A[14])
Race_Grp.set_value('Other','Eq_LateNight', A[16])
Race_Grp.set_value('White','Eq_LateNight', A[18])
Race_Grp.set_value('Asian','Mov_LateNight', A[11])
Race_Grp.set_value('Black','Mov_LateNight', A[13])
Race_Grp.set_value('Latino','Mov_LateNight', A[15])
Race_Grp.set_value('Other','Mov_LateNight', A[17])
Race_Grp.set_value('White','Mov_LateNight', A[19])
#Group by Race,Latenight, and Citation Normalized Counts
#1/3 of data is done during latenight activities
A=round(rf.groupby(['Race','LateNight'])['Citation'].value_counts(normalize=True).sort_index(level=2),4)
Race_Grp.set_value('Asian','Morn_Citation', A[10])
Race_Grp.set_value('Black','Morn_Citation', A[12])
Race_Grp.set_value('Latino','Morn_Citation', A[14])
Race_Grp.set_value('Other','Morn_Citation', A[16])
Race_Grp.set_value('White','Morn_Citation', A[18])
Race_Grp.set_value('Asian','Late_Citation', A[11])
Race_Grp.set_value('Black','Late_Citation', A[13])
Race_Grp.set_value('Latino','Late_Citation', A[15])
Race_Grp.set_value('Other','Late_Citation', A[17])
Race_Grp.set_value('White','Late_Citation', A[19])
# Construct a table for every neighborhood
# (use .copy() so each variable holds its own DataFrame rather than a reference to the same Race_Grp object)
d_North_End=Race_Grp.copy()
d_West_Side=Race_Grp.copy()
d_Battle_Creek=Race_Grp.copy()
d_Dayton_Bluff=Race_Grp.copy()
d_Payne_Phalen=Race_Grp.copy()
d_Capital_River=Race_Grp.copy()
d_Union_Park=Race_Grp.copy()
d_Summit_University=Race_Grp.copy()
d_Greater_East_Side=Race_Grp.copy()
d_Thomas_Frogtown=Race_Grp.copy()
d_St_Anthony=Race_Grp.copy()
d_Summit_Hill=Race_Grp.copy()
d_Midway=Race_Grp.copy()
d_West_7th=Race_Grp.copy()
d_Como=Race_Grp.copy()
d_Highland_Park=Race_Grp.copy()
d_Macalester_Groveland=Race_Grp.copy()
A= df.Community.unique()
Dict= {A[0]:d_North_End, A[1]:d_West_Side, A[2]:d_Battle_Creek, A[3]:d_Dayton_Bluff, A[4]:d_Payne_Phalen, A[5]:d_Capital_River,
A[6]:d_Union_Park, A[7]:d_Summit_University, A[8]:d_Greater_East_Side, A[9]:d_Thomas_Frogtown, A[10]:d_St_Anthony,
A[11]:d_Summit_Hill, A[12]:d_Midway, A[13]:d_West_7th, A[14]:d_Como, A[15]:d_Highland_Park, A[16]:d_Macalester_Groveland}
RR= ['Asian','Black','Latino','Other','White']
for k,p in enumerate(Dict):
rf= df[df['Community']==p]
for i,j in enumerate(RR):
Dict[p].set_value(j,'Tot_Count', round(rf['Race'].value_counts().sort_index(level=1),4)[i])
Dict[p].set_value(j,'Eq_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i])
Dict[p].set_value(j,'Mov_Count', round(rf.groupby(['Race'])['Reason'].value_counts().sort_index(level=1),4)[i+len(RR)])
#print(Dict['Capital_River'])
print(Dict['West_Side'])
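A note on the two cells above: DataFrame.set_value has been deprecated (and removed in newer pandas), and the positional lookups (A[0], A[1], ...) only work because of the sort_index calls. A hedged sketch of the same idea with .at and label-based lookups, assuming the Reason labels are 'Equipment Violation' and 'Moving Violation' (check rf['Reason'].unique() first), would look like:
# Modern-pandas version of the per-race counts (same idea, label-based instead of positional)
reason_counts = rf.groupby('Race')['Reason'].value_counts()
for race in ['Asian', 'Black', 'Latino', 'Other', 'White']:
    Race_Grp.at[race, 'Tot_Count'] = rf['Race'].value_counts().get(race, 0)
    Race_Grp.at[race, 'Eq_Count'] = reason_counts.get((race, 'Equipment Violation'), 0)
    Race_Grp.at[race, 'Mov_Count'] = reason_counts.get((race, 'Moving Violation'), 0)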
#DR[1]
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
grid_latitude,
grid_longitude,
radius,
LIMIT)
url # display URL
#Get respective json
results = requests.get(url).json()
# Find venues in one location
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
#nearby_venues
Frogtown_venues.head()
#Frogtown_venues.groupby('Neighborhood').count()
Frogtown_venues.Venue_Category.unique()
#Specify the Venues of interest to store and display
x=['Dive Bar', 'Convenience Store', 'Liquor Store', 'Restaurant', 'Middle Eastern Restaurant','Vietnamese Restaurant',\
'Asian Restaurant', 'Thai Restaurant','Café','Fast Food Restaurant','Noodle House','Ethiopian Restaurant','Bar',\
'Ramen Restaurant', 'BBQ Joint', 'Grocery Store', 'Chinese Restaurant']
#Frogtown_venues[Frogtown_venues['Venue_Category'].isin(x)]