
Frogtown/Saint Paul Traffic Stop Report; 04/07/18

By Frogtown Crusader (Abu Nayeem)

Introduction

Disclaimer: This is my Coursera Capstone Project for Data Science. Also, I will be using the term "Black" instead of "African American" because that is the race category provided in the dataset, and the category also includes persons of African origin (a substantial minority).

About Me:

I'm a Frogtown resident, community advocate, and programmer. I would like to use open-source data to share stories and spur action. Please follow me on GitHub.

Purpose:

Currently in the United States, there is a lot of tension between law enforcement and the public. I will be looking into the traffic stop data for Saint Paul, Minnesota (USA) provided by the Saint Paul Police Department (SPPD). Analyzing traffic stop data can provide evidence (or a lack thereof) of systemic bias. The goal of my report is to add insight into what is happening in my community, the Thomas-Dale neighborhood (aka Frogtown), and to encourage citizens to use open data and/or demand that their public agencies provide such data.

Executive Summary:

There are certain parts of Frogtown that have a greater frequency of traffic stops than the rest of the neighborhood, particularly along University Avenue. The data suggests targeting of Black drivers, given that they are stopped more frequently, searched more frequently per stop, and less likely to receive a citation. Other insights include that moving violation stops occur mostly in the morning and have a higher citation rate. In contrast, during the late night hours there are more equipment violations. Furthermore, many communities, including Frogtown, have a considerable number of equipment violations.

Why prove the obvious?

The results may be obvious, but proving them may be more challenging. As a researcher, my goal is to measure the impact, seek the truth, explore, and challenge my expectations. Data can be a great equalizer, challenging our worldviews and/or reinforcing our existing perspectives. Data reports can be used to share stories and information effectively. Furthermore, data is used as an evaluation tool to determine the effectiveness of programs and policies.

Thus data practitioners, and institutions more broadly, hold a strong responsibility and influence, since data can be shaped to support a certain narrative. In our current political climate, public perception of law enforcement is polarized, and I hope these studies can shed light on the issues. This report and others will be available as open source, allowing others to contribute, replicate the work, and use the code for their own neighborhoods.

About Saint Paul

The City of Saint Paul is the second largest city in Minnesota, USA, and is the state capital. Saint Paul is often paired with its neighbor, Minneapolis; together they are aptly named the Twin Cities. It has just over 300,000 residents, and the city itself is quite diverse. Minnesota has a high level of racial inequity, ranking 47th of 51 compared to the rest of the United States. Saint Paul is divided into seventeen Planning Districts, created in 1979 to allow neighborhoods to participate in governance and use Community Development Block Grants. The Thomas/Dale neighborhood is one of the district planning councils. A few years ago, a tragic police shooting occurred during a traffic stop in Falcon Heights, a suburb of Saint Paul, which has increased tension between law enforcement and citizens in the community.

About Thomas-Dale-Frogtown Neighborhood

The Frogtown community has historically been a transitional community, with new immigrant/refugee communities living in the neighborhood for a short period of time. From my experience, Frogtown boasts considerable diversity with respect to language, culture, and ethnicity. It has also historically been a poor neighborhood. Here is a snapshot of the community exported from Minnesota Compass, based on 2017 Census demographic data.

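[Figure: Frogtown community snapshot from Minnesota Compass, part 1]

[Figure: Frogtown community snapshot from Minnesota Compass, part 2]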

Frogtown Community Information

The image below displays the Frogtown community using the police grid (which matches the actual boundaries well). Along the southern boundary of Frogtown is University Avenue, where the light rail line runs, and it is a heavily residential street as well. I will highlight more noticeable landmarks once the Foursquare data is added. The two rightmost sectors are, respectively, Mt. Airy and Capitol Heights. These two communities are considered distinct by community members.

[Figure: Frogtown police grid map]

About the Datasets:

The dataset contains SPPD traffic stops collected by SPPD from 2000 to 2018 via an agreement with the Saint Paul chapter of the NAACP, and can be accessed here. The website has many features and visualizations for basic analysis, but for advanced users, data transformations are limited or unavailable. I have chosen to select the years 2017 to 2018 based on data limitations.

Data Features:

  • Individual Traffic Stop Data
  • Driver characteristics
    • Gender, Age (only if a citation was issued), and Race
    • Was the driver searched, was the vehicle searched, and/or did the driver receive a citation?
  • GeoCoordinates of the center of the police grid, and a timestamp for each stop

Data limitations as explained on the website:

  • Reason for stop (available starting in 2017)
    • Includes Moving Violation, Equipment Violation, Investigative Stop, and 911 Call
  • Data reflects traffic stops originated by St. Paul Police officers
  • Race is based on officers’ perceptions
  • Fields indicating “No Data” may be due to a variety of factors, including:

    • Age data is only collected when a citation is issued
    • Reason for stop data was not collected before 2017
    • Technology changes over time / technical errors / lack of available information
  • Supplemental Info Suggested by Author

The dataset consists of individual traffic stop records, but the location coordinates are limited to police grid coordinates. There are probably over 200 police grids!

[Figure: Saint Paul police grids]

Foursquare API Dataset

I'll be using the Foursquare API to get information on local businesses, since some streets and areas might be more active than others.

Minnesota Compass 2017 Census Survey Data

Minnesota Compass offers raw data for both Minneapolis and Saint Paul districts. It is a non-partisan group.

Data Prep

The primary analysis uses data from 2017-18. The longitudinal analysis uses data from 2001 to 2018.

Data Cleaning/Wrangling

I will be constructing several variables. Originally, I wanted to extract grid location coordinates from the dataset, but it makes more sense to connect each grid to a JSON file. The manipulations and additions are listed below:

  1. Convert the time variable to datetime; extract Month, DayofWeek, Weekend, and Hour
  2. Construct a variable LateNight, which denotes whether a stop occurred between 10:00 PM and 5:00 AM
  3. Convert several variables to integers; note: Female is designated as 1
  4. Convert some descriptive columns to dummy variables
  5. Extract Latitude and Longitude into separate columns for each police grid
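The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal sketch, not the notebook's actual code: the column names (`DateTime`, `Gender`, `Reason`, `GridLocation`), the `"(lat, lon)"` string format of the grid coordinates, and the toy rows are all assumptions. `Weekend` is assumed to mean Saturday and Sunday only.

```python
import pandas as pd

# Toy rows standing in for the SPPD export; column names are assumptions
raw = pd.DataFrame({
    "DateTime": ["2017-03-04 23:15:00", "2017-03-06 08:40:00", "2018-07-13 02:05:00"],
    "Gender": ["Female", "Male", "Male"],
    "Race": ["Black", "Asian", "White"],
    "Reason": ["Equipment Violation", "Moving Violation", "Equipment Violation"],
    "GridLocation": ["(44.9573, -93.1205)", "(44.9561, -93.1302)", "(44.9573, -93.1205)"],
})

def clean_stops(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1. Parse timestamps and derive calendar/time features
    out["DateTime"] = pd.to_datetime(out["DateTime"])
    out["Month"] = out["DateTime"].dt.month
    out["DayofWeek"] = out["DateTime"].dt.dayofweek  # Monday=0 ... Sunday=6
    out["Weekend"] = (out["DayofWeek"] >= 5).astype(int)
    out["Hour"] = out["DateTime"].dt.hour
    # 2. LateNight = 1 for stops from 10:00 PM up to 5:00 AM
    out["LateNight"] = ((out["Hour"] >= 22) | (out["Hour"] < 5)).astype(int)
    # 3. Integer gender flag with Female = 1
    out["Female"] = (out["Gender"] == "Female").astype(int)
    # 4. One 0/1 dummy column per stop reason
    dummies = pd.get_dummies(out["Reason"], prefix="Reason", dtype=int)
    out = pd.concat([out, dummies], axis=1)
    # 5. Split the "(lat, lon)" grid string into numeric columns
    coords = out["GridLocation"].str.strip("()").str.split(",", expand=True)
    out["Latitude"] = coords[0].str.strip().astype(float)
    out["Longitude"] = coords[1].str.strip().astype(float)
    return out

stops = clean_stops(raw)
print(stops[["Hour", "LateNight", "Weekend", "Female", "Latitude", "Longitude"]])
```

Keeping every step inside one function makes it easy to rerun the same cleaning on the 2017-18 subset and on the full 2001-2018 file.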

Initial Omissions

  • The demographic 'Native American' was excluded because the numbers were too small.
  • Empty cells fall under the 'No Data' category
  • The two reasons '911 Call' and 'Investigative Stop' were excluded because their numbers were small and they are not relevant to this study; see below
  • Finally, any data entries not belonging to a community were excluded, since some stops may have occurred outside Saint Paul's jurisdiction
  • Whether the driver was searched and whether the vehicle was searched are strongly correlated, so vehicle searches are excluded from the analysis
```python
# Count stops by recorded reason (df is the traffic stop DataFrame loaded earlier)
df.Reason.value_counts()
```

```
Moving Violation               48859
Equipment Violation            12362
Investigative Stop              2245
No Data                          139
911 Call / Citizen Reported      138
Name: Reason, dtype: int64
```
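A minimal sketch of the omissions listed above, plus the check behind dropping vehicle searches. The column names (`Race`, `Reason`, `Neighborhood`, `DriverSearched`, `VehicleSearched`), the helper `apply_omissions`, and the toy rows are hypothetical. For two 0/1 columns, pandas' Pearson correlation is the phi coefficient, a common way to measure how strongly two binary flags move together.

```python
import pandas as pd

# Toy stand-in for the cleaned stops table; columns are assumptions
stops = pd.DataFrame({
    "Race": ["Black", "White", "Native American", "Asian", "Black", "Asian"],
    "Reason": ["Moving Violation", "911 Call / Citizen Reported", "Moving Violation",
               "Equipment Violation", "Investigative Stop", "Moving Violation"],
    "Neighborhood": ["Frogtown", "Frogtown", "Frogtown", None, "Summit-University", "Frogtown"],
    "DriverSearched": [1, 0, 0, 0, 1, 0],
    "VehicleSearched": [1, 0, 0, 0, 1, 1],
})

KEEP_REASONS = {"Moving Violation", "Equipment Violation"}

def apply_omissions(df: pd.DataFrame) -> pd.DataFrame:
    mask = (
        df["Reason"].isin(KEEP_REASONS)           # drops 911 calls, investigative stops, "No Data"
        & (df["Race"] != "Native American")       # too few stops to analyze
        & df["Neighborhood"].notna()              # stops not matched to a community
    )
    return df[mask]

kept = apply_omissions(stops)
# Pearson correlation of two 0/1 columns equals the phi coefficient
phi = stops["DriverSearched"].corr(stops["VehicleSearched"])
print(len(kept), round(phi, 3))
```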

Data Methodology

I'll be exploring the dataset from multiple angles: longitudinal, geo-spatial, and an in-depth analysis of 2017-18. I will be focusing primarily on racial discrimination. I will mostly be using data visualization; a predictive model would be inappropriate as the data is mostly binary.

Standard Analysis

There are several methods, considerations, and limitations in testing for racial bias:

  1. At the first (shallow) layer, are certain drivers being selected more than others? There can be many explanations for discrepancies that are not discrimination
  2. At the second layer, are certain drivers being treated differently? For example, do women drivers get fewer citations?
  3. Does the treatment of certain racial groups stand out compared to their peers?
  4. Are there external factors, such as venues, local bars, and congested traffic areas, that may influence the outcome?
  5. Racial identification is imperfect given that it is determined by officers and certain persons are of mixed heritage

The primary analysis will focus on the second layer. I'll create a master table that collects the groupby values, conditioned on race.

My analysis will be focusing primarily on treatment:

  • 'Eq' stands for Equipment Violation and 'Mov' for Moving Violation

  • Most of the values are normalized to [0, 1] and conditioned on racial identity; examples are provided below

How to read results:

  • Eq_Margin for the Asian group indicates the percentage of 'Equipment Violation' stops out of all stops, conditioned on being Asian. So a value of 0.24 indicates that 24% of stops of Asian drivers were for equipment violations.

  • Eq_Citation of 0.4 for Asian drivers indicates that 40% of Asian drivers received a citation for equipment violations, conditioned on being Asian.

  • Mov_DriverSearch of 0.15 for Asians indicates that 15% of Asian drivers were searched during a moving violation stop, conditioned on being Asian.

  • Mov_Gender_F of 0.55 for Asians indicates that 55% of Asian women are stopped for moving violations, conditioned on being women.

  • Eq_LateNight of 0.25 for Asians indicates that 25% of Asian drivers are stopped for an equipment violation during late night, conditioned on being Asian.

  • Morn_Citation of 0.2 for Asians indicates that 20% of Asian drivers received a citation during the daytime, conditioned on being Asian.
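To make these definitions concrete, here is a hedged pandas sketch of how a few of the margins could be computed with `groupby`. The indicator columns (`Eq`, `Mov`, `Citation`, `DriverSearched`, `LateNight`) and toy rows are assumptions. The descriptions above do not fully pin down every denominator, so this sketch assumes Eq_Citation is the citation rate among a group's equipment-violation stops, Mov_DriverSearch is the search rate among its moving-violation stops, and Eq_LateNight is the share of all of the group's stops that were late-night equipment violations.

```python
import pandas as pd

# Toy cleaned data; 0/1 indicator columns are assumed to exist
stops = pd.DataFrame({
    "Race":           ["Asian", "Asian", "Asian", "Asian", "Black", "Black"],
    "Eq":             [1, 0, 0, 0, 1, 0],
    "Citation":       [1, 1, 0, 0, 0, 0],
    "DriverSearched": [0, 1, 0, 0, 1, 0],
    "LateNight":      [1, 0, 0, 1, 1, 0],
})
stops["Mov"] = 1 - stops["Eq"]  # only Eq and Mov stops remain after the omissions

by_race = stops.groupby("Race")
summary = pd.DataFrame({
    # Share of each group's stops that were equipment violations
    "Eq_Margin": by_race["Eq"].mean(),
    # Citation rate among each group's equipment-violation stops
    "Eq_Citation": stops[stops["Eq"] == 1].groupby("Race")["Citation"].mean(),
    # Search rate among each group's moving-violation stops
    "Mov_DriverSearch": stops[stops["Mov"] == 1].groupby("Race")["DriverSearched"].mean(),
    # Share of each group's stops that were late-night equipment violations
    "Eq_LateNight": (stops["Eq"] * stops["LateNight"]).groupby(stops["Race"]).mean(),
})
print(summary.round(2))
```

Because every column is a mean of a 0/1 flag within a race group, each value lands in [0, 1] and can be read as a percentage conditioned on race.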

Longitudinal Analysis

Are there any trends throughout the years?

Commercial Analysis

Do the number and type of stores influence traffic stops in the neighborhood? We'll explore this visually.

Geo-Spatial Analysis

How does Frogtown compare to its neighbors?

Analysis

Standard Analysis

My strategy below is to save the groupby values into a single table; sorting the index formats the data in a way that makes patterns easier to see.
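One way to collect several groupby results into a single master table is `pd.concat` over a dict of Series followed by `sort_index()`. This is a sketch with hypothetical column names and toy data, not the notebook's code; indexing on (Reason, Race) keeps each reason's rows adjacent so the racial groups can be compared line by line.

```python
import pandas as pd

stops = pd.DataFrame({
    "Race":           ["Asian", "Black", "Black", "White", "Asian", "White"],
    "Reason":         ["Mov", "Eq", "Mov", "Mov", "Eq", "Mov"],
    "Citation":       [1, 0, 0, 1, 0, 1],
    "DriverSearched": [0, 1, 1, 0, 0, 0],
})

keys = ["Reason", "Race"]
# Each groupby result becomes one column of the master table
pieces = {
    "Stops": stops.groupby(keys).size(),
    "CitationRate": stops.groupby(keys)["Citation"].mean(),
    "SearchRate": stops.groupby(keys)["DriverSearched"].mean(),
}
master = pd.concat(pieces, axis=1).sort_index()
print(master)
```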

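[Figure: master table of groupby values, part 1]

[Figure: master table of groupby values, part 2]

[Figure: master table of groupby values, part 3]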

Plotting

We will be taking the results from the previous section and graphing them.

[Figures: five plots of the master table results]

Insights

Recall that in Frogtown the racial distribution is roughly 1/3 Black, 1/3 Asian, and 1/5 White. With that in mind, Black and White drivers make up a greater proportion of stops than their share of the population.

  • Black drivers were stopped the most and proportionally have a greater likelihood of being stopped for an equipment violation
  • Asian drivers were stopped less relative to their share of the population
  • Black drivers were less likely to receive citations despite being pulled over frequently
  • White drivers were more likely to receive a citation for moving violations compared to other groups
  • Black drivers are searched much more often than their peers despite a low citation count
  • Female drivers are proportionally less likely to be stopped, though white females have the highest proportion
  • There are considerably more equipment violations and fewer moving violations during late night than in the morning
  • Citation rates are much higher during the daytime than at night. This may be because there are fewer drivers late at night, though 1/3 of stops occur during late night.
  • For the age distribution, nearly 1/3 of citations don't include the driver's age. Based on the available data, middle-aged adults are the most likely to get citations.

Time Based Analysis

I will be checking patterns by month, day of the week, and hour of the day.
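A hedged sketch of these time-based breakdowns: counts by weekday, the reason mix within each hour via `pd.crosstab(..., normalize="index")`, and a weekday-level correlation between the moving-violation share and the citation rate (the kind of relationship noted in the day-of-week observations). Column names and rows are made up for illustration.

```python
import pandas as pd

stops = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2017-05-02 08:10", "2017-05-02 09:30", "2017-05-05 23:45",
        "2017-05-06 01:20", "2017-05-06 14:00", "2017-05-09 07:55",
    ]),
    "Reason": ["Moving Violation", "Moving Violation", "Equipment Violation",
               "Equipment Violation", "Moving Violation", "Moving Violation"],
    "Citation": [1, 1, 0, 0, 1, 1],
})
stops["DayofWeek"] = stops["DateTime"].dt.day_name()
stops["Hour"] = stops["DateTime"].dt.hour

# Stops per weekday
per_day = stops["DayofWeek"].value_counts()
# Reason mix within each hour: each row sums to 1
reason_by_hour = pd.crosstab(stops["Hour"], stops["Reason"], normalize="index")
# Per-weekday moving-violation share and citation rate, then their correlation
daily = stops.groupby("DayofWeek").agg(
    MovShare=("Reason", lambda s: (s == "Moving Violation").mean()),
    CitationRate=("Citation", "mean"),
)
print(per_day, reason_by_hour, daily.corr(), sep="\n\n")
```

With only seven weekdays (or twelve months, or 24 hours) per correlation, these coefficients describe the pattern in the charts rather than test it.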

[Figures: traffic stop patterns by day of the week]

  • Traffic stops are most frequent on Tuesday and least frequent on the weekend
  • Late night traffic stops increase on weekend days, including Friday (which makes sense)
  • There seems to be a strong correlation between moving violations and the number of citations
  • There seems to be a strong correlation between equipment violations and late night traffic stops
  • Proportionally, fewer citations are given during the weekend

[Figures: traffic stop patterns by month]

  • There are fewer traffic stops during the winter months, and steadily more in spring and fall
  • There is a significant drop in equipment and late night traffic stops during the summer months
  • There is an increased proportion of moving violations and citations during the summer months
  • Fewer citations are given in late fall and early winter

[Figures: traffic stop patterns by hour of the day]

  • There are significantly more traffic stops during the night hours
  • Moving violations occur during the day, equipment violations during the night
  • Citations are very frequent during the daytime hours, matching the moving violations

Longitudinal Analysis of Frogtown

The full dataset ranges from 2001 to 2018 and has many missing components. For some years, 50% of the records are missing key information. The 'total count' includes all instances of traffic stops in Frogtown, even if supplemental information is missing.
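Per-year totals and missing-data shares can be tabulated along these lines. This sketch assumes a `Year` column and that missing race is recorded as the string "No Data"; the rows are invented. `TotalStops` counts every stop, matching the 'total count' described above, while `mean()` skips missing citation values, so rates are computed only over records that have the field.

```python
import pandas as pd

stops = pd.DataFrame({
    "Year": [2003, 2003, 2003, 2010, 2010, 2017, 2017, 2017, 2017],
    "Race": ["No Data", "Black", "No Data", "White", "Black", "Black", "Asian", "White", "Black"],
    "Citation": [None, 1, None, 0, 1, 1, 0, 1, 0],  # None becomes NaN
})

yearly = stops.groupby("Year").agg(
    TotalStops=("Race", "size"),                                   # every recorded stop
    MissingRace=("Race", lambda s: (s == "No Data").mean()),       # share lacking race
    CitationRate=("Citation", "mean"),                             # NaN values are skipped
)
print(yearly)
```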

[Figures: longitudinal trends of traffic stops in Frogtown, 2001 to 2018 (four charts)]

Preliminary Longitudinal Analysis:

  • There was plenty of past data indicating a traffic stop, but with no further information given. Thus the graphs presented are missing many data points and may give a skewed picture
  • Around 2004 to 2005, a change in data practices/collection probably occurred
  • The citation rate has increased in the last couple of years
  • The racial demographics of traffic stops have been steady, with Black drivers being over-represented despite some demographic shift in the neighborhood
  • The driver search rate has declined over time

Commercial Insight/Analysis

I will be using the Foursquare API to get nearby venues within a radius of each police grid. I've separated the venues into three categories of interest.

  • Green indicates Restaurant
  • Blue indicates Bars
  • Yellow indicates Convenience/Corner Stores
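A sketch of the venue lookup, assuming the Foursquare v2 `venues/explore` endpoint (with `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters) that was commonly used around 2018; Foursquare has since moved to newer Places API versions, so this URL may no longer work. The credentials, the 500-meter default radius, and the keyword rules in `venue_color` are hypothetical. Only URL construction and categorization are shown, so no network request is made.

```python
from urllib.parse import urlencode

# Foursquare v2 "venues/explore" endpoint as commonly used in 2018 (may be retired)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def explore_url(lat, lng, radius=500, limit=100,
                client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET",
                version="20180605"):
    """Build a venue-search URL centered on one police grid (placeholder credentials)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date
        "ll": f"{lat},{lng}",   # grid center
        "radius": radius,       # meters
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

def venue_color(category_name):
    """Hypothetical keyword rules mapping a venue category to the report's colors."""
    words = set(category_name.lower().replace("/", " ").split())
    if "restaurant" in words:
        return "green"   # restaurants
    if words & {"bar", "pub", "brewery"}:
        return "blue"    # bars
    if words & {"convenience", "bodega"}:
        return "yellow"  # convenience/corner stores
    return None          # other venue types are not plotted

print(explore_url(44.9573, -93.1205))
print([venue_color(c) for c in ["Vietnamese Restaurant", "Dive Bar", "Convenience Store", "Barbershop"]])
```

Matching whole words rather than substrings keeps categories such as "Barbershop" from being counted as bars.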

[Figures: Frogtown venues by category]

The graph above shows the stores within the Frogtown area. University Avenue has concentrated traffic. There are some neighborhood bars and convenience stores near residential homes; we'll see a clearer picture with the geo-spatial data.

Geo-Spatial Prep

The Saint Paul Police Department has a JSON file that maps out the police grids. Note that the data is now grouped either by grid or by neighborhood.
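To color grids on a map, the per-grid aggregates have to be attached to the police-grid GeoJSON. A minimal sketch using only `json` and pandas; the `gridnum` property name, the simplified point geometries, and the toy stops are hypothetical (a real grid file would normally hold polygon boundaries).

```python
import json
import pandas as pd

# Tiny stand-in for the police-grid GeoJSON; "gridnum" is a hypothetical property name
grid_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"gridnum": 87},
         "geometry": {"type": "Point", "coordinates": [-93.12, 44.957]}},
        {"type": "Feature", "properties": {"gridnum": 88},
         "geometry": {"type": "Point", "coordinates": [-93.13, 44.956]}},
    ],
}

stops = pd.DataFrame({"Grid": [87, 87, 88, 87], "Citation": [1, 0, 0, 1]})
by_grid = stops.groupby("Grid").agg(
    Stops=("Citation", "size"),
    CitationMargin=("Citation", "mean"),
)

# Copy per-grid numbers into each feature so a choropleth tool can color by them
for feature in grid_geojson["features"]:
    grid = feature["properties"]["gridnum"]
    if grid in by_grid.index:
        feature["properties"]["Stops"] = int(by_grid.at[grid, "Stops"])
        feature["properties"]["CitationMargin"] = float(by_grid.at[grid, "CitationMargin"])
    else:
        feature["properties"]["Stops"] = 0  # a grid with no recorded stops

print(json.dumps(grid_geojson["features"][0]["properties"]))
```

The same per-grid table can be regrouped by neighborhood to produce the neighborhood-level maps.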

Frogtown Geo-Spatial Data

Total Traffic Incidents in Frogtown by Grid (with venue markers)

[Map]

Citation Margin in Frogtown by Grid

[Map]

Equipment Violation Margin in Frogtown by Grid

[Map]

LateNight Traffic Stop Margin in Frogtown by Grid

[Map]

Driver Searched Margin in Frogtown by Grid

[Map]

Frogtown Summary Stats

  • For total incidents, there is a high volume of incidents along University Ave around Dale
  • The citation margin is not very high, even in areas with a greater frequency of stops
  • The equipment violation density is high and located in the same high-density areas
  • The late night stop margin is very high in the University area, and the search rate is also high

Saint Paul Geospatial Data

  • For margins, a police grid must have more than 100 total traffic stops; smaller counts produce larger swings in the margins
  • The downtown district was excluded when graphing total counts by grid because its very high frequency of stops would skew the legend gradient
  • The data is from 2017 to 2018
  • There is one police grid with zero traffic stops (try to find it)
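The 100-stop rule above can be applied before computing any grid margin. A sketch assuming a `Grid` column; the helper name `grid_margins` is hypothetical and the data is synthetic.

```python
import pandas as pd

MIN_STOPS = 100  # the report's threshold: a grid needs more than 100 stops

def grid_margins(stops: pd.DataFrame, col: str, min_stops: int = MIN_STOPS) -> pd.Series:
    """Mean of a 0/1 column per grid, keeping only grids with more than min_stops stops."""
    g = stops.groupby("Grid")[col].agg(["size", "mean"])
    return g.loc[g["size"] > min_stops, "mean"]

# Synthetic example: grid 1 has 150 stops, grid 2 only 20
stops = pd.DataFrame({
    "Grid": [1] * 150 + [2] * 20,
    "Citation": [1] * 30 + [0] * 120 + [1] * 10 + [0] * 10,
})
print(grid_margins(stops, "Citation"))
```

Grid 2's 50% citation margin comes from just 20 stops, which is the kind of unstable value the threshold is meant to keep off the maps.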

Total Traffic Incidents in Saint Paul by Neighborhood

[Map]

Total Traffic Incidents in Saint Paul by Grid

[Map]

Total Citation Count in Saint Paul by Grid

[Map]

Citation Margin in Saint Paul by Neighborhood

[Map]

Citation Margin in Saint Paul by Grid

[Map]

Equipment Violation Count in Saint Paul by Grid

[Map]

Equipment Violation Margin in Saint Paul by Neighborhood

[Map]

Equipment Violation Margin in Saint Paul by Grid

[Map]

Moving Violation Count in Saint Paul by Grid

[Map]

LateNight Traffic Stop Margin in Saint Paul by Neighborhood

[Map]

LateNight Traffic Stop Margin in Saint Paul by Grid

[Map]

Driver Searched Margin in Saint Paul by Neighborhood

[Map]

Driver Searched Margin in Saint Paul by Grid

[Map]

White proportion in Saint Paul by Neighborhood (Census)

[Map]

White Driver and Census Margin Difference in Saint Paul by Neighborhood

I map out the difference between the White share of each neighborhood's census population and the margin of White drivers stopped. In the legend, lighter indicates over-representation and darker indicates under-representation.
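The margin difference can be computed as the share of stopped drivers who are White minus the White share of residents from the census. A sketch with hypothetical neighborhoods ("Neighborhood A", "Neighborhood B") and made-up census shares; the sign convention (positive means over-representation) is an assumption.

```python
import pandas as pd

# Hypothetical census shares, for illustration only
census_share = pd.Series({"Neighborhood A": 0.20, "Neighborhood B": 0.80}, name="Census")
stops = pd.DataFrame({
    "Neighborhood": ["Neighborhood A"] * 4 + ["Neighborhood B"] * 5,
    "Race": ["White", "Black", "Black", "Asian", "White", "White", "White", "Black", "Black"],
})

# Share of stopped drivers who are White, per neighborhood
driver_share = (stops["Race"] == "White").groupby(stops["Neighborhood"]).mean().rename("Drivers")
comparison = pd.concat([driver_share, census_share], axis=1)
# Positive: White drivers are a larger share of stops than of residents
comparison["Difference"] = comparison["Drivers"] - comparison["Census"]
print(comparison)
```

The same calculation, with the race label swapped, gives the Black and Asian difference maps.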

[Map]

White Driver Margin in Saint Paul by Grid

[Map]

Black Proportion in Saint Paul by Neighborhood (Census)

[Map]

Black Drivers and Census Margin Difference in Saint Paul by Neighborhood

I map out the difference between the Black share of each neighborhood's population and the margin of Black drivers stopped. In the legend, note that Black drivers are over-represented in every neighborhood!

[Map]

Black Driver Proportion in Saint Paul by Grid

[Map]

Asian Proportion in Saint Paul by Neighborhood (Census)

[Map]

Asian Driver and Census Margin Difference in Saint Paul by Neighborhood

I map out the difference between the Asian share of each neighborhood's population and the margin of Asian drivers stopped for traffic-related incidents. In the legend, Asian drivers are mostly under-represented.

[Map]

Asian Driver Proportion in Saint Paul by Grid

[Map]

Quick Analysis

  • The Frogtown community does not have the heaviest density of traffic stops, but its cluster is very apparent compared to nearby grids
  • Despite a greater frequency of traffic stops in certain areas, the citation margin is lower
  • Many lower socioeconomic areas have more equipment violations, which is also reflected in their margin
  • Drivers in many lower socioeconomic areas have higher search margins and are stopped more regularly late at night
  • Black drivers are over-represented, while Asian drivers are under-represented. This is true across the city.

Conclusions/Summary

After digging into the data, we get a better grasp of how traffic stops are administered and of the citation rate. Within Frogtown, the data shows that moving violations are more frequent during the daytime and have a high citation rate. During late night, equipment violations are more frequent in the community. Expanding outward to Saint Paul, we see that traffic stops are more frequent along University Avenue, which is commercial and also carries the light rail line. The Frogtown area has more traffic stops than its neighboring communities, with the greatest concentration from Western Ave to Lexington Ave. Within Saint Paul, there are other communities similar to Frogtown.

In regards to the racial question within Frogtown, Black and White drivers are over-represented, while Asians are under-represented, for total citations given. Black drivers are at least twice as likely as their White counterparts to be searched for moving violations, despite having low citation rates. During the late night hours, Black drivers are more likely to be stopped and have greater rates of moving violations. On the other hand, White drivers have a greater citation rate during the morning. Within gender, white female drivers are stopped at the highest proportion, though it's worth noting that this may be due to the wealth gap along racial lines. At the city level, we see that Black drivers are over-represented in total traffic stops in all neighborhoods. In contrast, Asian drivers are under-represented relative to total traffic stops.

This report raised more questions than answers. Why are there discrepancies in the data? What are the criteria for an equipment violation; do crashes count? There are simply not enough 911 calls and investigative stops to account for the imbalance.

There is a lot more information that can be gleaned from the results. Please feel free to email me if you have any questions or thoughts.

Please check out the numbers for your own community in the Appendix.

Appendix (Neighborhood Tables)

[Figure: neighborhood tables, part 1]

[Figure: neighborhood tables, part 2]

[Figure: neighborhood tables, part 3]