Disclaimer: This is my Coursera Capstone Project for Data Science. I will be using the term "Black" instead of "African American" because that is the race indicator provided in the dataset, and the category also includes persons of African origin (a substantial minority).
I'm a Frogtown resident, community advocate, and programmer. I would like to use open-source data to share stories and create action. Please follow me on GitHub.
Currently in the United States, there is a lot of tension between law enforcement and the public. I will be looking into the traffic stop data for Saint Paul, Minnesota (USA), provided by the Saint Paul Police Department (SPPD). Analyzing the traffic stop data can provide evidence (or lack thereof) of systemic biases. The goal of my report is to add insight into what is happening in my community, the Thomas-Dale neighborhood (aka Frogtown), and to encourage citizens to use open-source data and/or demand that their public agencies provide such data.
Certain parts of Frogtown have a greater frequency of traffic stops than the rest of the neighborhood, particularly along University Avenue. The data suggests targeting of Black drivers, given that they are stopped more frequently, searched more frequently per stop, and less likely to receive a citation. Other insights: moving violation stops occur mostly in the morning and have a higher citation rate. In contrast, during the late-night hours there are more equipment violation stops. Furthermore, there seem to be many communities, including Frogtown, that have a considerable number of equipment violation stops.
The results may be obvious, but proving them may be more challenging. As a researcher, my goal is to measure the impact, seek the truth, explore, and challenge my expectations. Data can be the great equalizer, challenging our worldviews and/or reinforcing our existing perspectives. Data reports can be used to share stories and information effectively. Furthermore, data is used as an evaluation tool to determine the effectiveness of programs and policies.
Thus data practitioners, and more broadly institutions, hold strong responsibility and influence in shaping the data in support of a certain narrative. In our current political climate, public perception of law enforcement is polarized, and I hope these studies can shed light on the issues. This report and others will be available as open source, allowing others to contribute, replicate, and use the code for their own neighborhoods.
The City of Saint Paul is the second-largest city in Minnesota, USA, and is the state capital. Saint Paul is often paired with its neighbor, Minneapolis; together they are aptly named the Twin Cities. Saint Paul has a little over 300,000 people and the city itself is quite diverse. Minnesota has a high level of racial inequity, ranking 47th out of 51 compared to the rest of the United States. Saint Paul is divided into seventeen Planning Districts, created in 1979 to allow neighborhoods to participate in governance and use Community Development Block Grants. The Thomas/Dale neighborhood is one of the district planning councils. A few years ago, a tragic police shooting occurred during a traffic stop in Falcon Heights, a suburb of Saint Paul, which has increased tension between law enforcement and citizens in the community.
The Frogtown community has historically been a transitional community, with new immigrant and refugee communities living in the neighborhood for short periods of time. From my experience, Frogtown boasts considerable diversity with respect to language, culture, and ethnicity. It has also been a historically poor neighborhood, including in recent times. Here is a snapshot of the community exported from Minnesota Compass based on 2017 Census Demographic Data.
The image below displays the Frogtown community using the police grid (which matches the actual boundaries well). University Avenue forms the southern boundary of Frogtown; the light rail runs along it, and it is a heavily residential street as well. I will point out more noticeable landmarks once the Foursquare data is plugged in. The two rightmost sectors are, respectively, Mt. Airy and Capitol Heights. Community members consider these two communities distinct.
The dataset contains traffic stops collected by SPPD from 2000 to 2018 via an agreement with the Saint Paul chapter of the NAACP, and can be accessed here. The website has a lot of features and visualizations for basic analysis, but data transformations for advanced users are limited or unavailable. I have chosen the years 2017 to 2018 because of data limitations.
Data Features:
Data limitations as explained on the website:
Reason for stop (available starting in 2017)
Data reflects traffic stops originated by St. Paul Police Officers
Race is based on officers’ perceptions
Fields indicating “No Data” may be due to a variety of factors, including:
Supplemental Info Suggested by Author
The dataset consists of individual traffic stop records, but the location coordinates are limited to the police grid. There may be over 200 police grids!
I'll be using the Foursquare API to get information on local businesses. Some streets and areas might be more active than others.
The Minnesota Compass offers raw data for both Minneapolis and Saint Paul districts. They are a non-partisan group.
The primary data will range from 2017-18. The longitudinal analysis will have data from 2001 to 2018.
I will be constructing several variables. Originally, I wanted to extract grid location coordinates from the dataset, but it makes more sense to connect the grid to a JSON file. The manipulations and additions are listed below:
Initial Omissions
df.Reason.value_counts()
Moving Violation               48859
Equipment Violation            12362
Investigative Stop              2245
No Data                          139
911 Call / Citizen Reported      138
Name: Reason, dtype: int64
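The raw counts above are easier to compare as shares of all stops. Here is a minimal sketch that rebuilds the printed counts by hand as a Series; in the notebook itself, `df.Reason.value_counts(normalize=True)` should give the same shares directly. The variable names are mine, for illustration only.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
reason_counts = pd.Series(
    {
        "Moving Violation": 48859,
        "Equipment Violation": 12362,
        "Investigative Stop": 2245,
        "No Data": 139,
        "911 Call / Citizen Reported": 138,
    },
    name="Reason",
)

# Share of all stops for each reason, rounded for display.
reason_share = (reason_counts / reason_counts.sum()).round(4)
print(reason_share)
```

Moving and equipment violations together make up the overwhelming majority of stops, which is why the rest of the analysis concentrates on those two categories.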
I'll be exploring the dataset from multiple angles: longitudinal, geospatial, and an in-depth analysis of 2017-18. I will be focusing primarily on racial discrimination. I will mostly be using data visualization; a predictive model would be inappropriate as the data is mostly binary.
There are several methods, considerations, and limitations in testing for racial biases:
The primary analysis will focus on the second. Later, I'll create a master table that collects the groupby values, conditioned on race.
My analysis will be focusing primarily on treatment:
'Eq' stands for Equipment Violation, and 'Mov' indicates Moving Violation.
Most of the values are normalized to the range [0, 1] and conditioned on racial identity. Examples are provided below.
How to read results:
Eq_Margin for the Asian group indicates the percentage of 'Equipment Violation' stops with respect to all stops, conditioned on being Asian. So a value of 0.24 would indicate that 24% of stops of Asian drivers were for Equipment Violations.
Eq_Citation of 0.4 for Asian drivers indicates that 40% of Asian drivers received a citation for Equipment Violations, conditioned on being Asian.
Mov_DriverSearch of 0.15 for Asians indicates that 15% of Asian drivers were searched during a Moving Violation stop, conditioned on being Asian.
Mov_Gender_F of 0.55 for Asians indicates that 55% of Asian women are stopped for Moving Violations, conditioned on being women.
Eq_LateNight of 0.25 for Asians indicates that 25% of Asian drivers are stopped for Equipment Violations during late night, conditioned on being Asian.
Morn_Citation of 0.2 for Asians indicates that 20% of Asian drivers received a citation during the daytime, conditioned on being Asian.
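To make the conditioning concrete, here is a minimal sketch of how a few of these columns could be computed with pandas. It runs on toy data, and the column names (`Race`, `Reason`, `Citation`, `DriverSearched`) are assumptions for illustration, not the actual SPPD schema. It also takes one plausible reading of the definitions above: each rate is computed within a racial group, and the 'Eq'/'Mov' rates are further restricted to stops of that type.

```python
import pandas as pd

# Toy data; column names are assumptions, not the real dataset's schema.
stops = pd.DataFrame(
    {
        "Race": ["Asian", "Asian", "Asian", "Asian",
                 "Black", "Black", "Black", "White"],
        "Reason": ["Equipment Violation", "Moving Violation",
                   "Moving Violation", "Moving Violation",
                   "Equipment Violation", "Equipment Violation",
                   "Moving Violation", "Moving Violation"],
        "Citation": [True, False, True, True, False, False, True, True],
        "DriverSearched": [False, False, True, False, True, False, True, False],
    }
)

is_eq = stops["Reason"].eq("Equipment Violation")
is_mov = stops["Reason"].eq("Moving Violation")

master = pd.DataFrame(
    {
        # Share of each group's stops that were equipment violations.
        "Eq_Margin": is_eq.groupby(stops["Race"]).mean(),
        # Citation rate among each group's equipment-violation stops.
        "Eq_Citation": stops.loc[is_eq].groupby("Race")["Citation"].mean(),
        # Search rate among each group's moving-violation stops.
        "Mov_DriverSearch": stops.loc[is_mov]
        .groupby("Race")["DriverSearched"]
        .mean(),
    }
)
print(master.round(2))
```

A group with no stops of a given type (White drivers and equipment violations in this toy data) shows up as NaN rather than 0, which keeps "no data" visually distinct from "a rate of zero".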
Are there any trends throughout the years?
Does the number of stores and type of stores influence the neighborhood? We'll see this visually.
How does Frogtown compare to its neighbors?
We will take the results from the previous section and graph them.
Insights
Recall that Frogtown's racial distribution is roughly 1/3 Black, 1/3 Asian, and 1/5 White. With that said, both Black and White drivers make up a greater proportion of the drivers stopped.
I will be checking for patterns by month, day of the week, and hour of the day.
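A minimal sketch of how those three breakdowns could be derived from a timestamp column, assuming the stops carry a single datetime field (called `Date` here, which is a hypothetical name) and using toy timestamps:

```python
import pandas as pd

# Toy timestamps; the column name "Date" is an assumption.
stops = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            [
                "2017-01-02 08:15",
                "2017-01-02 23:40",
                "2017-06-10 09:05",
                "2018-06-11 01:30",
            ]
        )
    }
)

# Derive the three calendar features from the timestamp.
stops["Month"] = stops["Date"].dt.month
stops["DayOfWeek"] = stops["Date"].dt.day_name()
stops["Hour"] = stops["Date"].dt.hour

# Count stops along each dimension.
by_month = stops["Month"].value_counts().sort_index()
by_day = stops["DayOfWeek"].value_counts()
by_hour = stops["Hour"].value_counts().sort_index()
print(by_month, by_day, by_hour, sep="\n\n")
```

The same derived columns can be passed to a groupby together with race or stop reason to compare, for example, late-night and morning stops.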
The full dataset ranges from 2001 to 2018 and has many missing components. For some years, 50% of the records are missing key information. The 'total count' includes all traffic stops in Frogtown, even those with missing supplemental information.
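One way to quantify the missing information per year is sketched below on toy data. The column names `Year` and `Race` are assumptions, and the sketch assumes missing fields are stored as the literal string "No Data", as the website's notes suggest.

```python
import pandas as pd

# Toy data; "Year" and "Race" are assumed column names.
stops = pd.DataFrame(
    {
        "Year": [2001, 2001, 2001, 2001, 2018, 2018],
        "Race": ["No Data", "No Data", "Black", "White", "Asian", "Black"],
    }
)

summary = stops.groupby("Year").agg(
    # Every stop counts toward the total, even with missing fields.
    total_count=("Race", "size"),
    # Share of that year's stops where the race field is missing.
    missing_share=("Race", lambda s: s.eq("No Data").mean()),
)
print(summary)
```

Keeping the total count separate from the share of complete records makes it possible to plot overall stop volume for every year while limiting the race-conditioned analysis to years with usable data.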
Preliminary Longitudinal Analysis:
I will be using the Foursquare API to get the nearby venues within a radius of each police grid. I've separated the venues into three categories of interest.
The graph above shows the stores within the Frogtown area. University Avenue has concentrated traffic. There are some neighborhood bars and convenience stores near the residential homes; we'll see a clearer picture with the geospatial data.
The Saint Paul Police Department has a JSON file that maps out the police grid. Note that the data is now grouped by either Grid or Neighborhood.
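As a rough sketch of what grouping by grid and connecting it to the grid file might look like: the snippet below uses a tiny stand-in GeoJSON rather than the real SPPD file, and the property name `gridnum` and the column name `Grid` are hypothetical.

```python
import json

import pandas as pd

# Tiny stand-in for the police grid file; "gridnum" is a hypothetical property.
grid_geojson = json.loads(
    """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"gridnum": 87},
   "geometry": {"type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
  {"type": "Feature", "properties": {"gridnum": 88},
   "geometry": {"type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}}
]}
"""
)

# Toy stops, one row per stop; "Grid" is an assumed column name.
stops = pd.DataFrame({"Grid": [87, 87, 88, 87]})
stops_per_grid = stops.groupby("Grid").size().to_dict()

# Attach the per-grid stop count to each polygon's properties.
for feature in grid_geojson["features"]:
    grid = feature["properties"]["gridnum"]
    feature["properties"]["stop_count"] = int(stops_per_grid.get(grid, 0))

print([f["properties"] for f in grid_geojson["features"]])
```

Once the counts live in the feature properties, any choropleth tool that reads GeoJSON can color the grid cells by them.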
I map out the difference between the actual demographic share of White residents and the margin (share) of White drivers stopped. In the legend, lighter indicates over-representation and darker indicates under-representation.
I map out the difference between the neighborhood share of Black residents and the margin of Black drivers stopped. For the legend, note that Black drivers are over-represented by default!
I map out the difference between the demographic share of Asian residents and the margin of Asian drivers stopped for traffic-related incidents. For the legend, note that Asian drivers are mostly under-represented.
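The quantity mapped in these three figures can be sketched as a simple difference of shares. The population shares below are the approximate Frogtown figures recalled earlier in the report; the stop shares are made-up numbers purely for illustration, not results from the dataset.

```python
import pandas as pd

# Approximate Frogtown population shares recalled earlier in the report.
population_share = pd.Series({"Black": 1 / 3, "Asian": 1 / 3, "White": 1 / 5})

# Hypothetical shares of stopped drivers (illustrative only).
stop_share = pd.Series({"Black": 0.45, "Asian": 0.15, "White": 0.30})

# Positive gap: over-represented among stops; negative: under-represented.
gap = stop_share - population_share
over_represented = gap[gap > 0].index.tolist()
print(gap.round(3))
print(over_represented)
```

One caveat with this measure: it compares drivers stopped in a neighborhood against the people who live there, so through-traffic on busy corridors can shift the gap on its own.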
Quick Analysis
After digging into the data, we get a better grasp of how traffic stops are administered and of the citation rate. Within Frogtown, the data shows that moving violations are more frequent during the daytime and have a high citation rate. Late at night, equipment violations are more frequent in the community. When expanding outward to Saint Paul as a whole, we see that traffic stops are more frequent along University Avenue, which is commercial and is also where the light rail runs. The Frogtown area has more traffic stops than its neighboring communities, with a greater concentration from Western Ave to Lexington Ave. Within Saint Paul, there are communities similar to Frogtown.
With regard to the racial question within Frogtown, Black and White drivers are over-represented, while Asian drivers are under-represented in total citations given. Black drivers are at least twice as likely to be searched as their white counterparts for moving violations, despite having low citation rates. During the late-night hours, Black drivers are more likely to be stopped and have greater rates of moving violations. On the other hand, white drivers have a greater citation rate during the morning. Within gender, white female drivers get stopped. Though it's worth noting that this can be due to the wealth gap across racial lines. At the city level, we see that Black drivers are over-represented in total traffic stops in all neighborhoods. In contrast, Asian drivers are under-represented with respect to total traffic stops.
This report created more questions than answers. Why are there discrepancies in the data? What are the criteria for an equipment violation; do crashes count? There is simply not enough volume of 911 calls and investigative stops to account for the imbalance.
There is a lot more information that can be gleaned from the results. Please feel free to email me if you have any questions or thoughts.
Please check out the numbers for your respective community in the Appendix.