In [11]:
# Data Science Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind
%matplotlib inline
# Project Helper Files
from constants import *

Quantifying the School To Prison Pipeline -- a high level analysis of discipline-based inequities America's Public Schools

Introduction

America's 96,000 public schools provide the foundation for the next generation's education and, ultimately, future success.

Yet, educational researchers and practitioners worry that America's public schools do not provide all students equal opportunity for future success.

Some stakeholders feel the inequitable treatment of students is so severe -- particularly poor and minority students -- they've coined the term School to Prison Pipeline. That is, some schools and students are pushed, not to success, but towards the criminal justice system.

Overview of Investigation Foci

This analysis' following 3 questions will center on exploring this prominent and debated concept of "The School to Prison Pipeline" in US schools, generally defined as the disproportionate tendency of minors and young adults from disadvantaged backgrounds to become incarcerated, because of increasingly harsh school and municipal policies.

Discussions of "The School to Prison Pipeline" and its causes largely center on a few interrelated factors:

  1. Disproportionate suspensions, particularly towards black and disabled students.
  2. Police Presence on campus, especially in contrast to lack of counselors and mental health professionals, meaning troubled students may be more likely to be received by law enforcement than counselors who are trained in helping students cope, reform, and stay in school.
  3. "Zero tolerance" policies in schools increasing suspension and expulsion rates as schools deliver harsh discipline for even first-time or relatively minor offenses in an aim to stop disruptive behavior before it gets worse.

We'll explore these key factors via desegregated data on all 96,000 US public schools from the 2016/15 school year CRDC data. Because revealing causality in these factors is impossible to determine given just this data, this investigation centers on the extent and nature of these key measures at a national level.

Overview of the Dataset

The Civil Rights Data Collection, school-level data for 2015-2016

The Civil Rights Data Collection (CRDC) is a biennial survey required by the U.S. Department of Education’s (Department) Office for Civil Rights (OCR) since 1968. (Note, however, that survey content changes over time.)

The 2015–16 CRDC (the most recent year published) collects data from all public local educational agencies (LEAs, ie School Districts) and schools, including

  • long-term secure juvenile justice facilities
  • charter schools
  • alternative schools
  • and schools serving students with disabilities

with a response rate of 99.8% from 17,337 LEAs and 96,360 schools. Specifically, I will be looking at the finer-grained data disaggregated by school.

Data content and format

Each school (row) in the dataset includes 1,800 columns (typically a student count disaggregated by race and gender for some school measure) regarding 32 general topics, comprising a 460 MB csv. The topics I will investigate utilize only 50 columns pertaining to suspensions, expulsions, and school population. I will only look at white, black, and hispanic students, who form the majority of students at nearly all schools.

Known issues with dataset

  • From the CRDC report: "An important consideration for response rates is that the reporting process requires all schools and LEAs to respond to each survey item on the CRDC. Some LEAs, that did not have complete data, reported a zero value. It is not possible to determine all possible situations where this may have occurred. As such, it may be the case that the item response rates may be positively biased. For the large majority of CRDC survey items, the rate of missing data ranged from 0-5% of reported values."
  • In the separate Data Prep notebook, IDs in the original dataset greater than 10 significant digits had been cast to scientific notation format and stored as strings. I recreated the IDs for those, which are a combination of State, local, and school identifiers.
  • CRDC does not provide Lat Long coordinates, so I merged in school Lat Longs from the National Center for Educational Statistics in the Data Prep notebook. When schools did not have lat-long coordinates available, I assigned them the coordinate of their school district (LEA) office.

Load All Schools Data (after downlaoding)

Find this data in this Google Drive Folder.

In [5]:
DATA_FILE = 'data (download CSVs here)/crdc-data-with-lat-long.csv'
crdc_data = pd.read_csv(
    DATA_FILE,
    usecols=COLS_WITH_NEEDED_DATA,
    low_memory=False,
    encoding="ISO-8859-1"
)
Out[5]:
LEA_STATE_NAME SCH_NAME SCH_ENR_HI_M SCH_ENR_HI_F SCH_ENR_BL_M SCH_ENR_BL_F SCH_ENR_WH_M SCH_ENR_WH_F TOT_ENR_M TOT_ENR_F ... TOT_DISCWODIS_EXPZT_M TOT_DISCWODIS_EXPZT_F SCH_FTESECURITY_LEO SCH_FTESECURITY_GUA SCH_FTESERVICES_NUR SCH_FTESERVICES_PSY SCH_FTESERVICES_SOC SCH_JJTYPE LAT1516 LON1516
0 ALABAMA Wallace Sch - Mt Meigs Campus 5 0 71 0 50 0 128 0 ... 0 0 -9.00 2.0 0.00 2.00 0.0 -7 32.374812 -86.082360
1 ALABAMA McNeel Sch - Vacca Campus 0 0 38 0 14 0 52 0 ... 0 0 -9.00 2.0 0.00 1.00 0.0 -7 33.583385 -86.710058
2 ALABAMA Alabama Youth Services 0 0 554 0 323 0 908 0 ... 0 0 -9.00 2.0 0.00 0.00 0.0 -9 32.374847 -86.082332
3 ALABAMA AUTAUGA CAMPUS 2 0 17 0 14 0 38 0 ... 0 0 -9.00 0.0 0.00 0.00 0.0 -7 NaN NaN
4 ALABAMA Albertville Middle School 140 143 11 5 194 185 358 346 ... 0 0 1.00 0.0 1.00 0.00 0.0 -9 34.260194 -86.206174
5 ALABAMA Albertville High Sch 260 221 20 20 350 398 645 650 ... 0 0 1.00 1.0 1.00 0.00 0.0 -9 34.262154 -86.204863
6 ALABAMA Evans Elem Sch 161 173 17 14 194 191 381 389 ... 0 0 1.00 0.0 1.00 0.00 0.0 -9 34.273161 -86.220086
7 ALABAMA Albertville Elem Sch 218 215 11 8 188 176 430 417 ... 0 0 1.00 0.0 1.00 0.00 0.0 -9 34.253251 -86.221834
8 ALABAMA Big Spring Lake Kinderg Sch 134 128 11 5 110 92 264 234 ... 0 0 1.00 0.0 1.00 0.00 0.0 -9 34.290220 -86.192490
9 ALABAMA Albertville Primary Sch 281 269 20 17 227 230 555 534 ... 0 0 1.00 0.0 1.00 0.00 0.0 -9 34.253251 -86.221834
10 ALABAMA Kate Duncan Smith DAR Middle 8 5 2 2 218 188 235 210 ... 0 0 0.33 0.0 0.33 0.00 0.0 -9 34.533721 -86.253681
11 ALABAMA Asbury Sch 92 95 0 2 191 149 289 250 ... 0 0 0.50 0.0 0.50 0.00 0.0 -9 34.362770 -86.142240
12 ALABAMA Claysville Jr High Sch 8 8 5 0 44 53 64 65 ... 0 0 -9.00 0.0 0.25 0.00 0.0 -9 34.406429 -86.270689
13 ALABAMA Douglas Elem Sch 95 83 2 2 164 155 261 240 ... 0 0 0.25 0.0 0.25 0.00 0.0 -9 34.176234 -86.321259
14 ALABAMA Douglas High Sch 77 65 5 5 224 212 310 284 ... 0 0 0.25 0.0 0.25 0.00 0.0 -9 34.178157 -86.319947
15 ALABAMA Brindlee Mountain Elementary School 11 8 2 2 116 113 131 123 ... 0 0 0.25 0.0 0.25 0.00 0.0 -9 34.344388 -86.442199
16 ALABAMA Kate D Smith DAR High Sch 2 2 2 2 230 215 236 223 ... 0 0 0.34 0.0 0.33 0.00 0.0 -9 34.533721 -86.253681
17 ALABAMA Brindlee Mountain Primary School 5 5 2 2 119 95 128 102 ... 0 0 0.33 0.0 0.25 0.00 0.0 -9 34.399966 -86.446812
18 ALABAMA Robert D Sloman Primary 104 89 2 5 146 140 258 238 ... 0 0 0.25 0.0 25.25 0.00 0.0 -9 34.176713 -86.323279
19 ALABAMA Brindlee Mt Middle Sch 11 5 2 2 113 122 130 129 ... 0 0 0.25 0.0 0.33 0.00 0.0 -9 34.377158 -86.422337
20 ALABAMA Brindlee Mt High Sch 11 8 5 2 167 164 187 176 ... 0 0 0.34 0.0 0.34 0.00 0.0 -9 34.376400 -86.421876
21 ALABAMA Kate D Smith DAR Elem Sch 2 2 0 0 200 212 215 223 ... 0 0 0.33 0.0 0.33 0.00 0.0 -9 34.533721 -86.253681
22 ALABAMA Douglas Middle Sch 89 71 2 0 155 143 250 218 ... 0 0 0.25 0.0 0.25 0.00 0.0 -9 34.176234 -86.321259
23 ALABAMA Asbury Elem Sch 98 101 2 2 137 152 237 259 ... 0 0 0.50 0.0 0.50 0.00 0.0 -9 34.362794 -86.142507
24 ALABAMA Trace Crossings Elem Sch 116 56 74 80 101 74 334 238 ... 0 0 0.00 0.0 2.00 0.00 0.0 -9 33.340886 -86.844733
25 ALABAMA Greystone Elem Sch 20 20 23 20 227 173 307 256 ... 0 0 0.00 0.0 1.00 0.00 0.0 -9 33.413047 -86.658547
26 ALABAMA Hoover High Sch 113 116 428 398 860 797 1518 1449 ... 0 0 0.00 0.0 3.00 0.00 1.0 -9 33.344370 -86.837683
27 ALABAMA Berry Middle Sch 35 44 119 122 368 347 586 582 ... 0 0 0.00 0.0 3.00 0.00 0.0 -9 33.395648 -86.732180
28 ALABAMA South Shades Crest Elem Sch 29 20 80 53 173 179 318 295 ... 0 0 0.00 0.0 1.00 0.00 0.0 -9 33.337527 -86.878390
29 ALABAMA Robert F Bumpus Middle Sch 35 29 125 122 209 212 414 414 ... 0 0 0.00 0.0 1.00 0.00 0.0 -9 33.330911 -86.852477
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
96330 WYOMING Washington Elementary 23 17 0 0 92 83 120 102 ... 0 0 0.20 0.0 0.33 1.00 0.0 -9 41.510680 -109.465821
96331 WYOMING Lincoln Middle School 38 41 2 2 149 176 203 230 ... 0 0 0.33 0.0 0.50 0.00 1.0 -9 41.510680 -109.465821
96332 WYOMING Jackson Elementary 23 29 0 2 86 104 116 141 ... 0 0 0.20 0.0 0.34 0.00 0.0 -9 41.510680 -109.465821
96333 WYOMING Truman Elementary 20 20 0 2 131 116 155 142 ... 0 0 0.20 0.0 0.33 0.00 0.0 -9 41.510680 -109.465821
96334 WYOMING Harrison Elementary 8 11 2 2 122 125 137 143 ... 0 0 0.20 0.0 1.00 0.50 1.0 -9 41.510680 -109.465821
96335 WYOMING Thoman Ranch Elementary 0 2 0 0 0 0 0 2 ... 0 0 -9.00 0.0 0.00 0.00 0.0 -9 41.510680 -109.465821
96336 WYOMING Ten Sleep K-12 0 0 2 0 56 50 60 50 ... 0 0 0.00 0.0 1.00 0.00 0.0 -9 44.036012 -107.447922
96337 WYOMING Colter Elementary 113 119 0 0 167 149 290 270 ... 0 0 -9.00 0.0 1.45 0.67 0.0 -9 43.462312 -110.797767
96338 WYOMING Jackson Elementary 119 128 2 0 170 158 301 290 ... 0 0 -9.00 0.0 1.00 1.00 1.0 -9 43.462312 -110.797767
96339 WYOMING Jackson Hole High School 95 89 2 0 212 215 324 316 ... 0 0 0.75 0.0 0.95 1.00 0.0 -9 43.462312 -110.797767
96340 WYOMING Jackson Hole Middle School 92 92 2 2 194 182 298 288 ... 0 0 1.00 0.0 0.45 1.00 0.0 -9 43.462312 -110.797767
96341 WYOMING Alta Elementary 0 0 0 0 26 17 26 19 ... 0 0 -9.00 0.0 0.00 0.00 0.0 -9 43.462312 -110.797767
96342 WYOMING Kelly Elementary 0 0 0 0 26 20 26 20 ... 0 0 -9.00 0.0 0.05 0.00 0.0 -9 43.462312 -110.797767
96343 WYOMING Moran Elementary 2 2 0 0 8 8 10 10 ... 0 0 -9.00 0.0 0.05 0.00 0.0 -9 43.462312 -110.797767
96344 WYOMING Wilson Elementary 2 8 0 0 101 107 110 125 ... 0 0 -9.00 0.0 0.00 0.00 0.0 -9 43.462312 -110.797767
96345 WYOMING Summit High School 20 8 0 0 14 8 36 18 ... 0 0 0.25 0.0 0.05 0.00 1.0 -9 43.462312 -110.797767
96346 WYOMING Upton Middle School 0 0 0 0 32 20 34 24 ... 0 0 -9.00 0.0 0.20 0.00 0.0 -9 44.101000 -104.623594
96347 WYOMING Upton Elementary 0 0 0 2 59 56 63 66 ... 0 0 -9.00 0.0 0.45 0.00 0.0 -9 44.101000 -104.623594
96348 WYOMING Upton High School 0 0 0 0 47 47 49 51 ... 0 0 -9.00 0.0 0.35 0.00 0.0 -9 44.101000 -104.623594
96349 WYOMING Worland High School 53 38 0 0 146 155 203 197 ... 0 0 0.20 0.0 0.20 0.00 0.2 -9 44.011520 -107.943721
96350 WYOMING Worland Middle School 41 44 0 0 125 113 171 159 ... 0 0 0.20 0.0 0.20 0.00 0.2 -9 44.011520 -107.943721
96351 WYOMING East Side Elementary 17 23 0 0 74 92 93 120 ... 0 0 0.20 0.0 0.20 0.00 0.2 -9 44.011520 -107.943721
96352 WYOMING South Side Elementary 26 20 0 0 71 86 103 111 ... 0 0 0.20 0.0 0.20 0.00 0.2 -9 44.011520 -107.943721
96353 WYOMING West Side Elementary 35 41 0 0 65 53 104 98 ... 0 0 0.20 0.0 0.20 0.00 0.2 -9 44.011520 -107.943721
96354 WYOMING Powder River Basin Children's Center 0 2 0 0 26 11 28 13 ... 0 0 -9.00 0.0 1.00 0.50 1.0 -9 44.297605 -105.494905
96355 WYOMING C-Bar-V Ranch 5 2 0 0 26 5 41 9 ... 0 0 -9.00 0.0 1.00 2.50 3.5 -9 43.535575 -110.830607
96356 WYOMING Wyoming Girls School 0 8 0 2 0 53 0 82 ... 0 0 -9.00 -9.0 2.00 0.00 1.0 Post 41.138600 -104.819200
96357 WYOMING Wyoming Boys School 23 0 5 0 146 0 187 0 ... 0 0 -9.00 -9.0 0.00 0.00 0.0 Post 41.138600 -104.819200
96358 WYOMING Youth Emergency Services Inc. 2 2 0 0 17 14 21 18 ... 0 0 -9.00 0.0 0.00 0.00 1.0 -9 44.296500 -105.494900
96359 WYOMING Saint Stephen's Indian School 0 0 0 0 0 0 110 107 ... 0 0 -9.00 1.0 1.00 0.00 0.0 -9 42.985268 -108.420787

96360 rows × 50 columns

Geographic distribution of all schools

To get a sense for the geographic distrubtion of the 96,000 schools, we can plot their lat-long coordinates. Schools are colored simply by their latitude.

Note Alaska and Hawaii faintly on the left, with lower population densities.

In [6]:
plt.scatter(x=crdc_data['LON1516'], y=crdc_data['LAT1516'], c=crdc_data['LAT1516'], s=0.001, cmap='ocean')
plt.show()

Analytic Questions

Question 1: How disproportionate are suspensions across race and gender?

In [17]:
df = crdc_data
In [18]:
RACES = ['BL', 'WH', 'HI']
SEXES = ['M', 'F']
POP_LOWER_BOUND = 50 # Remove populations (e.g. white male) smaller than this threshold


# 1. Plotting

def plot_measure_accross_all_demographics(df, calculation, measure, bounds=[0,1]):
    figure_num = 0
    plt.figure(figsize=(20,6))
    for sex_index, sex in enumerate(SEXES):
        for race_index, race in enumerate(RACES):
            figure_num += 1
            likelyhood = f'{calculation}_{measure}_{race}_{sex}'
            curr_dem_data = df[pd.notnull(data[likelyhood])]
            
            plt.subplot(len(SEXES), len(RACES), figure_num)
            plt.scatter(x=curr_dem_data['LON1516'], y=curr_dem_data['LAT1516'], c=curr_dem_data[likelyhood], s=1, alpha=1, cmap='coolwarm')
            plt.title(f'{race}_{sex}, avg: {round(curr_dem_data[likelyhood].mean(), 2)}, n: {curr_dem_data[likelyhood].count()}')
            plt.colorbar()
            plt.clim(*bounds)
            plt.axis('off')
    plt.subplots_adjust(wspace=0.8, hspace=0.6)
    plt.show()
    

# 2. Calculations

# ITERATIVE FUNCTION which appends likelyhood columns to the df for all demographics
# Flag parameter 'comarison_race' lets you compare how many times the first races is likely to be
# affected as the second race.
def calculate_likelyhood_comparisons(df, measure, comparison_race=None, races=RACES, sexes=SEXES, lower_bound=POP_LOWER_BOUND):
    df = remove_schools_with_pop_less_than(lower_bound)
    for sex in sexes:
        for race in races:
            df = calculate_likelyhood_comparison(df, measure, race, sex, comparison_race, sex)
    return df


def remove_schools_with_pop_less_than(lower_bound):
    filter_col_df = df[DEMOGRAPHIC_COUNT_COLS]
    filtered_df = filter_col_df[filter_col_df >= lower_bound].dropna()
    return df.merge(filtered_df)


def calculate_likelyhood_comparison(df, measure, race, sex, comparison_race, comparison_sex):
    likelyhood = get_percentage_affected(df, measure, race, sex)
    column_name = f'PERCENT_AFFECTED_{measure}_{race}_{sex}'
    if comparison_race:
        likelyhood = likelyhood / get_percentage_affected(df, measure, comparison_race, comparison_sex)
        column_name = f'LH_COMPARED_TO_WH_FOR_{measure}_{race}_{sex}'
    likelyhood = likelyhood[(likelyhood != np.inf) & (pd.notnull(likelyhood)) & (likelyhood > 0)]  # Filter out infinity and NaN
    return df.merge(
        likelyhood.to_frame(column_name),
        how='left',
        left_index=True,
        right_index=True,
    )


def get_percentage_affected(df, measure, race, sex):
    affected = f'{measure}_{race}_{sex}'  # e.g. 'SCH_DISCWODIS_MULTOOS_BL_M'
    pop_total = f'SCH_ENR_{race}_{sex}'  # e.g. 'SCH_ENR_TR_M' 
    return df[affected] / df[pop_total]      

What percentage of each demographic population recieve suspensions?

In [20]:
data = calculate_likelyhood_comparisons(df, 'SCH_DISCWODIS_MULTOOS')  # "more than one out of school suspension"
calculation = 'PERCENT_AFFECTED' 
measure = 'SCH_DISCWODIS_MULTOOS'
plot_measure_accross_all_demographics(data, calculation, measure, bounds=[0, 0.15])

How many times more likely to be suspended are blank and latino students than white students?

For the above "percent of population affected" plots, it's near impossible to compare the severity across a single school. One way to zero on on this is to color schools by how much more likely a certain population is to be affeced compared to the least affected population. This measure might reveal schools where, even if a demogrpahic is severely affected, so were other demographics.

How likelyhood comparisons are calculated:

Percent of X pop affected / Percent of White counterpart population affected

In [21]:
data = calculate_likelyhood_comparisons(df, 'SCH_DISCWODIS_MULTOOS', comparison_race='WH')  # "more than one out of school suspension"
calculation = 'LH_COMPARED_TO_WH_FOR' 
measure = 'SCH_DISCWODIS_MULTOOS'
plot_measure_accross_all_demographics(data, calculation, measure, bounds=[1, 4])

Question 2: What impact have "Zero-tolerance" policies had on expulsion?

All analysis look at "students without disabilities expelled on zero tolerance policy".

Zero Tolerance policies impact 2% of white students, but about 4% and 5% of black and latino students, respectively, for schools who gave out at least one Zero Tolerance based expulsion for that demographic.

In [22]:
measure = 'SCH_DISCWODIS_EXPZT'
calculation = 'PERCENT_AFFECTED'
data = calculate_likelyhood_comparisons(df, measure, lower_bound=1)
plot_measure_accross_all_demographics(data, calculation, measure, bounds=[0,0.09])

Question 3: Does the proportion of Campus Counselors (psychologists) vs Campus Police Officers correlate with suspensions?

Data notes:

  1. It's possible to have fractional staff recorded if they are not full-time.

  2. A Data entry, system-level error on the form filled in by schools caused only 22,000 schools (after corrections 25000) to correctly enter the number of Law Enforcement Officers on campus. Where this system error occurred, the value is -9. Due this fact, there are only 17,500 schools with both Police and Counselor counts. We'll do investigate Juvenile Justice Facilities seperately: first, because we'd expect different counselor and police presence there, but second because none of them actually recorded the number of police.

  3. #### Where are the police data? Notably, none of the 608 Juvenile Justice facilities which have a count enterered for Police officers. This may be because the police staff at JJ facilities do not map to the categories on the survey.

We could try to use Security Guards as a proxy for Police. However, only 64 of the 608 JJ facilities have data for both Counselors and Security Guards, so we've removed all JJ schools from the following analysis.

In [23]:
POLICE = 'SCH_FTESECURITY_LEO'
COUNSELORS = 'SCH_FTESERVICES_PSY'
SECURITY_GUARDS = 'SCH_FTESECURITY_GUA'

SUSPENSIONS = 'TOTAL_SUSPENSIONS'


def plot_ratio_to_students_of(
                data=crdc_data,
                x_name=POLICE,
                y_name=COUNSELORS,
                xlabel=None,
                ylabel=None,
                xlim=None,
                ylim=None,
                dot_size=0.5,
                show_hist=False
):
    # Filter out any negative numbers, which signal data errors
    schools_with_correctly_documented_staff = (data[[y_name, x_name]] >= 0).all(axis='columns')
    staff_df = data.loc[schools_with_correctly_documented_staff,]

    # Get ratio to student population
    total_pop = staff_df['TOT_ENR_M'] + staff_df['TOT_ENR_F']
    staff_to_students_df = pd.DataFrame()
    staff_to_students_df[y_name] = staff_df[y_name] / total_pop
    staff_to_students_df[x_name] = staff_df[x_name] / total_pop

    # Total Suspensions, for dot coloring
    staff_to_students_df[SUSPENSIONS] = staff_df[['TOT_DISCWODIS_MULTOOS_M', 'TOT_DISCWODIS_MULTOOS_F']].sum(
        axis='columns') / total_pop

    # Remove anomalies
    staff_to_students_df = staff_to_students_df[staff_to_students_df[SUSPENSIONS] > 0]

    plt.scatter(x=staff_to_students_df[x_name], y=staff_to_students_df[y_name], alpha=0.9, s=dot_size,
                c=staff_to_students_df[SUSPENSIONS], cmap='coolwarm')
    plt.clim(0, 0.05)
    plt.ylabel(ylabel)
    plt.xlabel(xlabel)
    plt.title(f'{xlabel} vs {ylabel}, color=Long Term Suspension Percentage, n={len(staff_to_students_df)}')
    plt.colorbar()
    axes = plt.gca()
    axes.set_xlim(xlim)
    axes.set_ylim(ylim)

    plt.show()

    if show_hist:
        # Hist of x axis
        plot_hist(staff_to_students_df, x_name, xlabel)
        # Hist of y axis
        plot_hist(staff_to_students_df, y_name, ylabel)

    return staff_to_students_df


def plot_hist(data, name, label, x_range=[0, 0.008], y_range=[None, None]):
    plt.hist(data[name], range=x_range, bins=100)
    plt.xlabel(label)
    plt.ylabel('Number of Schools')
    plt.title(f'Distribution of {label}, n={len(data)}, mean={round(data[name].mean(), 3)}')
    plt.grid(True)
    plt.show()
    return

Non Juvenile Justice Schools:

Long Term Suspensions correlate with Police Presence

Plotting Campus Police against Campus Counselors per student, and color schools by we see a a hotspot of suspensions at shcools with low-counselor levels and high police levels.

You can also see how schools seem to hire with a predetermined ratio in mind: consistent ratio lines jut outward, most notably one marking the a 1-to-1 ratio.

This supports existing intution in the "School to Prison Pipeline" concept -- schools with high suspension rates confront troubled kids with police with higher likelyhood than trained counselors. However, it remains unclear if high levels of suspension-worthy activity triggered hiring more Police, or if increased Police (and a lack of counseling) escalate suspension counts.

Which leads us to two questions we can investigate: do Police correlate positively alone with suspensions, or does it worsen when mixed with low counseling?

In [24]:
staff_to_students_df = plot_ratio_to_students_of( 
    x_name=POLICE, 
    y_name=COUNSELORS, 
    xlabel='Police/student', 
    ylabel='Counselor/student', 
    xlim=[0, 0.01], 
    ylim=[0, 0.01],
    show_hist=True
)

Does lack of counselors alone account for increased suspensions?

This, of course, may result from schools with a high rates of suspensions and misbehaviour attempting to mediate rates via increased counselors. But further research is required.

In [25]:
insufficient_counselors = staff_to_students_df[staff_to_students_df[COUNSELORS] < 0.004]
plot_hist(insufficient_counselors, SUSPENSIONS, label=f'Suspension Rates when Counselors/Student < 0.004', x_range=[0, 0.2])

sufficient_counselors = staff_to_students_df[staff_to_students_df[COUNSELORS] >= 0.004]
plot_hist(sufficient_counselors, SUSPENSIONS, label=f'Suspension Rates when Counselors/Student >= 0.004', x_range=[0, 0.2])

Does high levels of Police alone correlate with increased suspensions?

Here, results line up predictably with the STTP: Schools with the Police/Student ration >= 0.004 have actually have ~10x the rate of suspensions of those who had less police presence

This, of course, may result from schools with a high rates of suspensions and misbehaviour attempting to mediate rates via increased counselors. But further research is required.

Further research is also necessary to observe which schools have such high police presence, given that there are only 1,000 of them.

In [26]:
low_police = staff_to_students_df[staff_to_students_df[POLICE] < 0.004]
plot_hist(low_police, POLICE, label=f'Suspension Rates when Police/Student < 0.004', x_range=[0, 0.2])

high_police = staff_to_students_df[staff_to_students_df[POLICE] >= 0.004]
plot_hist(high_police, POLICE, label=f'Suspension Rates when Police/Student >= 0.004', x_range=[0, 0.2])

Q:What's the correlation with having more Police than Counselors?

A: A 2x increase in suspensions: from 0.001 to 0.002

In [148]:
more_police = staff_to_students_df[staff_to_students_df[COUNSELORS] < staff_to_students_df[POLICE]]
plot_hist(more_police, POLICE, label=f'Suspension Rates when Police/Student > Counselor/Student', x_range=[0, 0.2])

less_police = staff_to_students_df[staff_to_students_df[COUNSELORS] >= staff_to_students_df[POLICE]]
plot_hist(less_police, POLICE, label=f'Suspension Rates when Police/Student <= Counselor/Student', x_range=[0, 0.2])

Future questions to investigate

Digging deeper into Zero Tolerance Policies

With proper data on which schools implement zero-tolerance (rather than just knowing which schools have given at least one suspension via the policies), we could more accurately measure the impact of these policies.

Does zero-tolerance policy adoption correlate with other factors? Do ZT policy schools have a higher rate in general of expulsions?

Particularly of interest, can we see schools increasing police or at the cost of counselors? And does that impact change in suspension rates?

Further, we could takes differences between years to identify schools underwent large changes, and investigate them further for signs of the cuase.

Expanding my analysis to more demographic and discipline categories

More obviously, this investigation looked only at students from three races without disabilities and in terms of long term suspensions. More rigorous analsysis would include more expansive demographics as well as comparison against and aggregation with short term suspensions.

Incorperate Socio-Economic Status and Test Scores

The NYT's Upshot did a data vis article comparing race, academic performance, and income utilizing the Stanford Education Data Archive (SEDA). The positive trend of wealth and performance were stunning, as were how wealth seems not to nuetralize the race-based gaps in achievement-- do these hold for discipline as well?

In [ ]: