In this part of the technical narrative we analyze the data given to us by the Monarch school. The narrative follows the thinking and coding we did during this week-long Datathon. Though we have left out some uninteresting results, we have kept most of our findings to illustrate what led us to think the way we did. Along the way, you will see some patterns that we found. However, as a heads-up, our previous data science experience tells us that most of these patterns are not causal, but rather intuitive correlations that can be explained. As a result, we walk through explaining the data (perhaps not as powerful as hoped), and still end up giving advice to the Monarch school.
The general structure of our approach to the Monarch Data Set:
1. We first ran some general tests on the data to get a feel for it.
2. We then attacked a hypothesis and delved deeper into the analysis to see if it was true. This hypothesis is described in more detail throughout the narrative.
Interesting statistic that we found
Teachers seem more biased on the surface, in terms of giving referrals, than they really are. We will use different comparisons, frequencies, proportions, and variances to show why they aren't as mean as they may look.
Summary of Advice to Monarch
Think about creating a fun, interactive, role-play based "behavior" class: EITHER one for the lower grades (Grades 2-4) focusing on emotional health (such as anger management), OR one on how to deal with a lack of attention in large crowds. The source of this advice and the reasoning behind the "either/or" are explained later in the narrative.
Further Analysis of the Education Problem
We have chosen this document as our main technical narrative. However, please also check out the Python notebook we submitted, which goes over our analysis of the California Suspensions data set. We consider that analysis less interesting, but still a valuable extension of our work.
import pandas as pd
import numpy as np
import statsmodels.api as sm #used for statistical modeling
import matplotlib.pyplot as plt
#to have plots show up automatically in notebook
%matplotlib inline
#FreqDist gives us the frequency distribution plot used later on
from nltk import FreqDist
import statistics
from statistics import median
from collections import Counter
from IPython.core.display import Image
Importing the Data Files
#reading in file
diref_data = pd.read_csv('disciplinaryreferrals.csv')
#renaming columns
diref_data.rename(columns={'c03I02001_id': 'CaseNumber',
'c03I02002_timestamp': 'TimeStamp',
'c03I02003_student_id': 'StudentID',
'c03I02004_grade': 'Grade',
'c03I02005_date_and_time_of_misbehavior': 'TimeofIncident',
'c03I02006_location_of_misbehavior': 'Location',
'c03I02007_documenting_staff_id': 'DocumentingStaffID',
'c03I02008_documenting_staff': 'DocumentingStaff',
'c03I02009_classroom_or_administrative_managed': 'ClassroomOrAdministrative',
'c03I0200a_type_of_misbehavior': 'Type',
'c03I0200b_narrative_description_of_misbehavior': 'Description',
'c03I0200c_consequence': 'Consequence',
'c03I0200d_reporting_staff_id': 'ReportingStaffID',
'c03I0200e_reporting_staff': 'ReportingStaff',
'c03I0200f_d12_planning_completed': 'D12Planning',
'c03I0200g_narrative_of_consequence': 'NarrativeOfConsequence'
}, inplace=True)
# Deleting unnecessary information (case number: doesn't give new info,
# teacher full names: we have IDs, D12Planning: all say ABCD,
# narrative of consequence: missing for most, Classroom or Administrative: mostly Administrative)
# CaseNumber is kept for now; uncomment the "del" below to drop it as well
#del diref_data['CaseNumber']
del diref_data['DocumentingStaff']
del diref_data['ReportingStaff']
del diref_data['ClassroomOrAdministrative']
del diref_data['D12Planning']
del diref_data['NarrativeOfConsequence']
most_common_offense = 'null' #to store the most common offense
offense_amount = -1 #to store the number of times that offense was committed
#for loop to see how many times each unique offense was committed
offense_cases = {} #creating dictionary to store the count of each offense
for offense in diref_data['Type']:
    if not offense in offense_cases:
        offense_cases[offense] = 1
    else:
        offense_cases[offense] = offense_cases[offense]+1
#some referrals list several types of disruption at once; each such combination
#is counted here as its own category rather than being split up
#let's see which offense has the most referrals
for offense in offense_cases:
    if offense_cases[offense]>offense_amount:
        offense_amount = offense_cases[offense]
        most_common_offense = offense
print("Top Offense has been commited: ",offense_amount, "times")
print('Top 5 offenses a kid is most likely going to commit:\n',
      sorted(offense_cases, key=offense_cases.__getitem__, reverse = True)[0:5])
#plot a representative bar graph
plt.bar(range(len(offense_cases)), list(offense_cases.values()), align='center')
plt.xlabel('Offense')
plt.ylabel('Offense Cases')
plt.title("General Distribution of Types of Offenses")
#The bar graph
plt.show()
Top Offense has been commited: 130 times
Top 5 offenses a kid is most likely going to commit:
['Major Defiances (D12), Major Disruptions (D12)', 'Verbal/Physical Intimidation (D12)', 'Minor Disruptions (CM), Non-compliance/off-task (CM), Recurring teacher managed behavior checked above (D12)', 'Major Defiances (D12)', 'Minor Disruptions (CM)']
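Several of the top categories printed above are comma-joined combinations of individual offense labels. As a minimal sketch (not part of our original analysis), the combinations could be split so that each label is counted on its own. The helper name `count_individual_offenses` and the toy rows are assumptions for illustration, and the sketch assumes that no single label contains a comma.

```python
from collections import Counter

def count_individual_offenses(type_values):
    """Count each offense label separately, splitting comma-joined entries."""
    counts = Counter()
    for entry in type_values:
        if not isinstance(entry, str):
            continue  # skip missing values such as NaN
        for label in entry.split(","):
            label = label.strip()
            if label:
                counts[label] += 1
    return counts

# Toy rows modeled on the 'Type' strings printed above (not the real data).
sample_types = [
    "Major Defiances (D12), Major Disruptions (D12)",
    "Major Defiances (D12)",
    "Minor Disruptions (CM)",
    float("nan"),
]
offense_counts = count_individual_offenses(sample_types)
print(offense_counts.most_common(2))
```

On the real data this would be called as `count_individual_offenses(diref_data['Type'])`.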
most_common_location = 'null' #to store the most common location a referral was given in
location_amount = -1 #to store the number of referrals given in that location
#for loop to see how many referrals were given out in each location
location_cases = {} #creating dictionary to store the count for each location
for location in diref_data['Location']:
    if not location in location_cases:
        location_cases[location] = 1
    else:
        location_cases[location] = location_cases[location]+1
#let's see which location provokes the most referrals
for location in location_cases:
    if location_cases[location]>location_amount:
        location_amount = location_cases[location]
        most_common_location = location
print("Location:", most_common_location,"has provoked highest amount of referrals: ",location_amount)
print('Top 5 places a kid is most likely going to get a referral:\n',
      sorted(location_cases, key=location_cases.__getitem__, reverse = True)[0:5])
#plot a representative bar graph
plt.bar(range(len(location_cases)), list(location_cases.values()), align='center')
plt.xlabel('Location')
plt.ylabel('Referral Cases')
plt.title("Location Correlation?")
#The bar graph
plt.show()
Location: Classroom has provoked highest amount of referrals: 290
Top 5 places a kid is most likely going to get a referral:
['Classroom', 'English classroom', 'Science Classroom', 'PE', 'Yoga']
highest_offense = 0 #to store the highest number of referrals a student has received
highest_offender = -1 #to store the student with the most referrals
#for loop to see how many times each student has received a referral
all_student_cases = {} #creating dictionary to store the number of times they've been cited
for student in diref_data['StudentID']:
    if not student in all_student_cases:
        all_student_cases[student] = 1
    else:
        all_student_cases[student] = all_student_cases[student]+1
#plot a representative bar graph
plt.bar(range(len(all_student_cases)), sorted(all_student_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases')
plt.title("Distribution of Referred Students")
#The bars are sorted by referral count, so they do not line up with StudentIDs (some IDs are also missing)
#This is fixed in later graphs where StudentIDs are important; here it doesn't matter yet
plt.show()
highest_offense = 0 #to store the highest number of referrals a teacher has given out
highest_offender = -1 #to store the teacher with the most referrals
#for loop to see how many times each teacher has given out a referral
teacher_cases = {} #creating dictionary to store the number of referrals each teacher has given
for teacher in diref_data['ReportingStaffID']:
    if not teacher in teacher_cases:
        teacher_cases[teacher] = 1
    else:
        teacher_cases[teacher] = teacher_cases[teacher]+1
#let's see which teacher gives out the most referrals
for teacher in teacher_cases:
    if teacher_cases[teacher]>highest_offense:
        highest_offense = teacher_cases[teacher]
        highest_offender = teacher
#let's find the "median" case to use later on for analysis
#NOTE: median() over a dictionary iterates its keys, so this is the median teacher ID
print("Teacher", median(teacher_cases), "has given out the median amount of referrals.")
print("Teacher", highest_offender,"has given the highest amount of referrals: ",highest_offense)
#plot a representative bar graph
plt.bar(range(len(teacher_cases)), list(teacher_cases.values()), align='center')
plt.xlabel('Teacher')
plt.ylabel('Referral Cases')
plt.title("How many referrals has each teacher given?")
#The bar positions don't match the teacher IDs exactly because some IDs are missing
plt.show()
Teacher 92 has given out the median amount of referrals.
Teacher 127 has given the highest amount of referrals: 70
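Calling `median` on a dictionary takes the median of its keys, which here are teacher IDs. As a minimal sketch of an alternative, the median can be taken over the referral counts and then mapped back to the teacher(s) closest to it. The helper `teachers_at_median` is hypothetical; in the toy dictionary the counts for teachers 92, 112 and 127 are the ones reported in this narrative, and the other two entries are made up for illustration.

```python
from statistics import median

def teachers_at_median(referral_counts):
    """Return (median_count, teachers whose count is closest to it)."""
    med = median(referral_counts.values())
    closest = min(abs(count - med) for count in referral_counts.values())
    teachers = sorted(t for t, count in referral_counts.items()
                      if abs(count - med) == closest)
    return med, teachers

# Counts keyed by teacher ID; IDs 3 and 140 are invented for illustration.
toy_counts = {3: 1, 92: 6, 112: 47, 127: 70, 140: 2}

median_of_ids = median(toy_counts)           # median over the keys (IDs)
median_count, median_teachers = teachers_at_median(toy_counts)
print(median_of_ids, median_count, median_teachers)
```

In this toy example the two approaches disagree: the median ID is not the ID of the teacher with the median count.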
At this point a hypothesis suggests itself. There seem to be a good number of students and teachers who are "repeat offenders." From personal experience, it always seems that certain teachers are just "meaner" and therefore give out more referrals. To test this theory (and bias), we will analyze correlations, and causation if any, between the teachers handing out referrals and the students receiving them. Could it be that the referral pool is dominated by a certain teacher who keeps giving referrals to the same students? If so, could the referral situation be alleviated by approaching that teacher rather than all students?
Disclaimer: We do work with a bias in this hypothesis. However, we make sure that the bias does not affect the way we analyze. It does affect the questions we pose and the data we choose to look at, but it doesn't make us see things that aren't really there.
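One way to make the "dominated by a certain teacher" question concrete is to count referrals per (teacher, student) pair. The following is a minimal sketch on made-up IDs, not the analysis we actually ran; `teacher_concentration` and its output keys are hypothetical names.

```python
from collections import Counter

def teacher_concentration(teacher_ids, student_ids):
    """Summarize how concentrated referrals are around the most active teacher."""
    per_teacher = Counter(teacher_ids)
    top_teacher, top_count = per_teacher.most_common(1)[0]
    # Count referrals for each (teacher, student) pair.
    pair_counts = Counter(zip(teacher_ids, student_ids))
    top_pairs = {s: c for (t, s), c in pair_counts.items() if t == top_teacher}
    return {
        "teacher": top_teacher,
        "share_of_all_referrals": top_count / len(teacher_ids),
        "distinct_students": len(top_pairs),
        "most_referrals_to_one_student": max(top_pairs.values()),
    }

# Toy referral log (illustrative only): one entry per referral.
toy_teachers = [127, 127, 127, 112, 92, 127]
toy_students = [5, 5, 8, 5, 9, 11]
summary = teacher_concentration(toy_teachers, toy_students)
print(summary)
```

A high share combined with few distinct students would point toward the "same teacher, same students" pattern; a high share spread over many students would not.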
global student_cases,teacher_cases
# Let's see if students and teachers are strongly correlated (could personality conflicts
# be playing a role)?
temp_teacher = [0,0]
temp_student = [0,0]
#response: ID of the student who received each referral
y = diref_data['StudentID']
#predictor: ID of the staff member who reported each referral
X = diref_data['ReportingStaffID']
X = sm.add_constant(X) # Add a constant term to the predictor
# The actual fitting happens here
est = sm.OLS(y, X) #set up the least squares model
est = est.fit()
est.summary()
Dep. Variable: | StudentID | R-squared: | 0.003 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.002 |
Method: | Least Squares | F-statistic: | 3.275 |
Date: | Sun, 08 Mar 2015 | Prob (F-statistic): | 0.0707 |
Time: | 00:11:48 | Log-Likelihood: | -5704.8 |
No. Observations: | 983 | AIC: | 1.141e+04 |
Df Residuals: | 981 | BIC: | 1.142e+04 |
Df Model: | 1 |
 | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
---|---|---|---|---|---|
const | 131.7158 | 6.257 | 21.050 | 0.000 | 119.436 143.995 |
ReportingStaffID | 0.1036 | 0.057 | 1.810 | 0.071 | -0.009 0.216 |
Omnibus: | 1310.951 | Durbin-Watson: | 0.861 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 65.982 |
Skew: | 0.058 | Prob(JB): | 4.70e-15 |
Kurtosis: | 1.736 | Cond. No. | 267. |
# Let's plot the regression line on top of the data
x_ = np.array([X.min(), X.max()])
y_ = est.predict(x_)
diref_data.plot(x='ReportingStaffID', y='StudentID', kind='scatter')
plt.plot(x_[:, 1], y_, 'r-')
plt.title("Student - Teacher Correlation")
#Now we know from the previous data that Teacher 127 (aka Ms.Tyler) gives out the most referrals...
#So let's see which students she has given referrals to
highest_offense = 0 #to store the highest number of referrals a student has received
highest_offender = -1 #to store the student with the most referrals handed out by Tyler
highest_student_ID = 0 #to store highest student ID for plotting purposes
#for loop to see how many times each student has received a referral from Tyler
student_cases = {} #creating dictionary to store the number of times they've been cited
for (student,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(teacher == 127):
        if not student in student_cases:
            student_cases[student] = 1
        else:
            student_cases[student] = student_cases[student]+1
#let's see what the highest StudentID is for plotting purposes
for student in student_cases:
    if student>highest_student_ID:
        highest_student_ID = student
#for loop to see how many times each student has received a referral NOT from Tyler
referral_cases = {} #creating dictionary to store the number of times they've been cited by others
for (all_students,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(all_students in student_cases): #checking if that student is in Tyler student cases
        if(teacher != 127):#counting only teachers other than Tyler
            if not all_students in referral_cases:
                referral_cases[all_students] = 1
            else:
                referral_cases[all_students] = referral_cases[all_students]+1
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
percentage_keeper={}
#NOTE: zip pairs the two dictionaries by insertion order, not by student ID
for student,referral in zip(student_cases,referral_cases):
    if(student_cases[student] != 0): #if the student received a referral from Tyler
        percentage_keeper[student] =(student_cases[student]/
                                     (student_cases[student]+ referral_cases[referral]))*100
frequency = Counter()
for percentage in percentage_keeper:
    frequency[percentage_keeper[percentage]]+=1
fdist = FreqDist(frequency)
plt.title("Which proportion of referrals received by a student were given by Ms.Tyler?")
fdist.plot()
print(frequency.most_common(4))
[(100.0, 14), (33.33333333333333, 5), (50.0, 5), (25.0, 2)]
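The percentage loop above walks `student_cases` and `referral_cases` together with `zip`, which only lines up if both dictionaries iterate in the same student order. As a minimal sketch, assuming the goal is the share of each student's referrals that came from one teacher, both counts can instead be looked up by student ID. The helper `share_from_teacher` and the toy IDs are hypothetical, and results from this sketch may differ from the printed output above.

```python
from collections import Counter

def share_from_teacher(student_ids, teacher_ids, teacher):
    """Percent of each student's referrals that were written by `teacher`."""
    from_teacher = Counter()
    total = Counter()
    for student, reporter in zip(student_ids, teacher_ids):
        total[student] += 1
        if reporter == teacher:
            from_teacher[student] += 1
    # Only students with at least one referral from `teacher` are included.
    return {s: 100 * from_teacher[s] / total[s] for s in from_teacher}

# Toy referral log (illustrative only): one entry per referral.
toy_students = [1, 1, 2, 3, 3, 3, 4]
toy_teachers = [127, 127, 127, 127, 90, 90, 90]
shares = share_from_teacher(toy_students, toy_teachers, 127)
print(shares)
```

With the real data the call would be `share_from_teacher(diref_data['StudentID'], diref_data['ReportingStaffID'], 127)`, and `Counter(shares.values())` gives the same kind of frequency table as above.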
#filling in missing student IDs with 0 for plotting purposes
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
key_dictionary = {}
for student,i in zip(student_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = student
#plot a representative bar graph
plt.bar(list(key_dictionary), list(student_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases from Tyler')
plt.title("Tyler referral cases")
#Showing how many referrals she's given out to different students
plt.show()
#filling in missing student IDs with 0
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
key_dictionary = {}
for referral,i in zip(referral_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = referral
#plot a representative bar graph
plt.bar(list(key_dictionary), list(referral_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases NOT from Tyler')
plt.title("Non-Tyler referral cases")
#Showing how many referrals her students received from other teachers
plt.show()
#Let's analyze Ms.Naechia's referral tendencies...
highest_offense = 0 #to store the highest number of referrals a student has received
highest_offender = -1 #to store the student with the most referrals handed out by Naechia
highest_student_ID = 0 #to store highest student ID for plotting purposes
#for loop to see how many times each student has received a referral from Naechia
student_cases = {} #creating dictionary to store the number of times they've been cited
for (student,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(teacher == 112):
        if not student in student_cases:
            student_cases[student] = 1
        else:
            student_cases[student] = student_cases[student]+1
#let's see what the highest StudentID is for plotting purposes
for student in student_cases:
    if student>highest_student_ID:
        highest_student_ID = student
#for loop to see how many times each student has received a referral NOT from Naechia
referral_cases = {} #creating dictionary to store the number of times they've been cited by others
for (all_students,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(all_students in student_cases): #checking if that student is in Naechia student cases
        if(teacher != 112):#counting only teachers other than Naechia
            if not all_students in referral_cases:
                referral_cases[all_students] = 1
            else:
                referral_cases[all_students] = referral_cases[all_students]+1
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
percentage_keeper={}
#NOTE: zip pairs the two dictionaries by insertion order, not by student ID
for student,referral in zip(student_cases,referral_cases):
    if(student_cases[student] != 0): #if the student received a referral from Naechia
        percentage_keeper[student] =(student_cases[student]/
                                     (student_cases[student]+ referral_cases[referral]))*100
frequency = Counter()
for percentage in percentage_keeper:
    frequency[percentage_keeper[percentage]]+=1
fdist = FreqDist(frequency)
plt.title("Which proportion of referrals received by a student were given by Ms.Naechia?")
fdist.plot()
print(frequency.most_common(4))
#filling in missing student IDs with 0 for plotting purposes
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
key_dictionary = {}
for student,i in zip(student_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = student
#plot a representative bar graph
plt.bar(list(key_dictionary), list(student_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases from Naechia')
plt.title("Naechia referral cases")
#Showing how many referrals she's given out to different students
plt.show()
#filling in missing student IDs with 0
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
key_dictionary = {}
for referral,i in zip(referral_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = referral
#plot a representative bar graph
plt.bar(list(key_dictionary), list(referral_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases NOT from Naechia')
plt.title("Non-Naechia referral cases")
#Showing how many referrals her students received from other teachers
plt.show()
[(100.0, 5), (14.285714285714285, 3), (20.0, 3), (6.896551724137931, 2)]
#Let's analyze the median case: Teacher 92, Ms.Crystal
highest_offense = 0 #to store the highest number of referrals a student has received
highest_offender = -1 #to store the student with the most referrals handed out by Ms.Crystal
highest_student_ID = 0 #to store highest student ID for plotting purposes
#for loop to see how many times each student has received a referral from Ms.Crystal
student_cases = {} #creating dictionary to store the number of times they've been cited
for (student,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(teacher == 92):
        if not student in student_cases:
            student_cases[student] = 1
        else:
            student_cases[student] = student_cases[student]+1
#let's see what the highest StudentID is for plotting purposes
for student in student_cases:
    if student>highest_student_ID:
        highest_student_ID = student
#for loop to see how many times each student has received a referral NOT from Ms.Crystal
referral_cases = {} #creating dictionary to store the number of times they've been cited by others
for (all_students,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
    if(all_students in student_cases): #checking if that student is in Ms.Crystal student cases
        if(teacher != 92):#counting only teachers other than Ms.Crystal
            if not all_students in referral_cases:
                referral_cases[all_students] = 1
            else:
                referral_cases[all_students] = referral_cases[all_students]+1
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
#filling in missing student IDs with 0 to keep things uniform
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
percentage_keeper={}
#NOTE: zip pairs the two dictionaries by insertion order, not by student ID
for student,referral in zip(student_cases,referral_cases):
    if(student_cases[student] != 0): #if the student received a referral from Ms.Crystal
        percentage_keeper[student] =(student_cases[student]/
                                     (student_cases[student]+ referral_cases[referral]))*100
frequency = Counter()
for percentage in percentage_keeper:
    frequency[percentage_keeper[percentage]]+=1
fdist = FreqDist(frequency)
plt.title("Which proportion of referrals received by a student were given by Ms.Crystal?")
fdist.plot()
print(frequency.most_common(4))
#filling in missing student IDs with 0 for plotting purposes
for i in range (0,highest_student_ID):
    if i not in student_cases:
        student_cases[i] = 0
key_dictionary = {}
for student,i in zip(student_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = student
#plot a representative bar graph
plt.bar(list(key_dictionary), list(student_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases from Ms.Crystal')
plt.title("Ms.Crystal referral cases")
#Showing how many referrals she's given out to different students
plt.show()
#filling in missing student IDs with 0
for i in range (0,highest_student_ID+1):
    if i not in referral_cases:
        referral_cases[i] = 0
key_dictionary = {}
for referral,i in zip(referral_cases,range(0,highest_student_ID+1)):
    key_dictionary[i] = referral
#plot a representative bar graph
plt.bar(list(key_dictionary), list(referral_cases.values()), align='center')
plt.xlabel('Student')
plt.ylabel('Referral Cases NOT from Ms.Crystal')
plt.title("Non-Ms.Crystal referral cases")
#Showing how many referrals her students received from other teachers
plt.show()
[(9.090909090909092, 2), (100.0, 1), (33.33333333333333, 1)]
#Ms.Crystal:
total_referrals = 6
total_sole_students = 1 #amount of 100% frequencies
print("Ms.Crystal:",total_sole_students/total_referrals)
#Ms.Naechia:
total_referrals = 47
total_sole_students = 5 #amount of 100% frequencies
print("Ms.Naechia:",total_sole_students/total_referrals)
#Ms.Tyler:
total_referrals = 70
total_sole_students = 13 #amount of 100% frequencies
print("Ms.Tyler:",total_sole_students/total_referrals)
Ms.Crystal: 0.16666666666666666
Ms.Naechia: 0.10638297872340426
Ms.Tyler: 0.18571428571428572
global teacher_cases
proportion_counter = {} #dictionary to keep the number of sole-referral students for each teacher
#filling in missing Teacher IDs with 0 to keep things uniform
for i in range (0,175):
    if i not in teacher_cases:
        teacher_cases[i] = 0
#for loop to count, per teacher, the students who received referrals from that teacher only
for this_teacher in range(0,175):
    if(teacher_cases[this_teacher] == 0):#means this is a non-existent teacher ID
        continue
    else:
        #for loop to see how many times each student has received a referral from this teacher
        student_cases = {} #creating dictionary to store the number of times they've been cited
        for (student,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
            if(teacher == this_teacher):
                if not student in student_cases:
                    student_cases[student] = 1
                else:
                    student_cases[student] = student_cases[student]+1
        #filling in missing student IDs with 0 to keep things uniform
        for i in range (0,highest_student_ID):
            if i not in student_cases:
                student_cases[i] = 0
        #creating dictionary to store the number of times they've been cited by others
        referral_cases = {}
        for (all_students,teacher) in zip(diref_data['StudentID'],diref_data['ReportingStaffID']):
            if(all_students in student_cases): #checking if that student is in this teacher's student cases
                if(teacher != this_teacher):#counting only teachers other than this teacher
                    if not all_students in referral_cases:
                        referral_cases[all_students] = 1
                    else:
                        referral_cases[all_students] = referral_cases[all_students]+1
        #filling in missing student IDs with 0 to keep things uniform
        for i in range (0,highest_student_ID+1):
            if i not in referral_cases:
                referral_cases[i] = 0
        #keeping track of the proportions
        percentage_keeper={}
        #NOTE: zip pairs the two dictionaries by insertion order, not by student ID
        for student,referral in zip(student_cases,referral_cases):
            if(student_cases[student] != 0): #if the student received a referral from this teacher
                percentage_keeper[student] =(student_cases[student]/
                                             (student_cases[student]+ referral_cases[referral]))*100
        frequency = Counter()
        for percentage in percentage_keeper:
            frequency[percentage_keeper[percentage]]+=1
        #assign the number of 100% frequencies to that teacher
        proportion_counter[this_teacher] = frequency[100.0]
final_proportions = {}
#now find all the proportions
for teacher in teacher_cases:
    if(teacher_cases[teacher] == 0):#means this is a non-existent teacher ID
        continue
    elif(proportion_counter[teacher] == 0):
        final_proportions[teacher] = 0
    else: # store the number of 100% frequencies/total referrals
        final_proportions[teacher] = proportion_counter[teacher]/teacher_cases[teacher]
#let's visually represent this data
#plot a representative bar graph
plt.bar(range(len(final_proportions)), sorted(final_proportions.values()), align='center')
plt.xlabel('Teacher')
plt.ylabel('Proportion of sole referrals')
plt.title("Proportion of Students Who Only Get Referrals From One Teacher")
#The bar graph
plt.show()
print("The variance for the proportions is:",statistics.variance(final_proportions.values()))
The variance for the proportions is: 0.0852883388144697
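The quantity behind this variance is, for each teacher, the number of students who received referrals from that teacher only, divided by the teacher's total number of referrals. A compact sketch of the same idea follows, under the assumption that a "sole" student is one whose referrals all come from a single teacher. The helper `sole_referral_proportions` and the toy IDs are hypothetical, and this is not the exact code that produced the figure above.

```python
from collections import Counter, defaultdict
from statistics import variance

def sole_referral_proportions(student_ids, teacher_ids):
    """Per teacher: (# students referred only by that teacher) / (# referrals given)."""
    referrals_per_teacher = Counter(teacher_ids)
    teachers_per_student = defaultdict(set)
    for student, teacher in zip(student_ids, teacher_ids):
        teachers_per_student[student].add(teacher)
    sole_students = Counter()
    for student, teachers in teachers_per_student.items():
        if len(teachers) == 1:
            sole_students[next(iter(teachers))] += 1
    return {t: sole_students[t] / n for t, n in referrals_per_teacher.items()}

# Toy referral log (illustrative only): one entry per referral.
toy_students = [1, 1, 2, 3, 3, 4, 3]
toy_teachers = [10, 10, 10, 10, 20, 20, 20]
proportions = sole_referral_proportions(toy_students, toy_teachers)
spread = variance(proportions.values())
print(proportions, spread)
```

A small variance means teachers look alike on this measure, which is the sense in which we argue that no single teacher stands out.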
highest_referral_number = 0 # top number of referrals per grade
highest_offending_grade = -1 # to store the grade with the highest number of referrals
#for loop to see how many referrals each grade has received
grade_cases = {} #creating dictionary to store the number of referrals per grade
for grade in diref_data['Grade']:
    if not grade in grade_cases:
        grade_cases[grade] = 1
    else:
        grade_cases[grade] = grade_cases[grade]+1
print(grade_cases)
#let's see which grade receives the most referrals
for grade in grade_cases:
    if grade_cases[grade]>highest_referral_number:
        highest_referral_number = grade_cases[grade]
        highest_offending_grade = grade
print("Grade", highest_offending_grade,"has recieved the highest amount of referrals: ",
      highest_referral_number)
#plot a representative bar graph
plt.bar(range(len(grade_cases)), list(grade_cases.values()), align='center')
plt.xlabel('Grade')
plt.ylabel('Referral Cases')
plt.title("What's the Worst Disciplined Grade?")
#The bar positions don't match the grade numbers exactly because of the missing (nan) grades
plt.show()
{nan: 1, 1.0: 60, 2.0: 135, 3.0: 119, 4.0: 134, 5.0: 93, 6.0: 74, 7.0: 72, 8.0: 77, 9.0: 62, 10.0: 90, 11.0: 48, 12.0: 16, nan: 1, nan: 1}
Grade 2.0 has recieved the highest amount of referrals: 135
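The three separate `nan: 1` entries in the printed dictionary appear because NaN never compares equal to itself, so each missing grade value ends up as its own dictionary key. A minimal sketch, assuming missing grades should simply be tallied apart from the real ones (`count_grades` and the toy values are hypothetical):

```python
import math
from collections import Counter

def count_grades(grades):
    """Count referrals per grade, tallying missing (NaN) grades separately."""
    counts = Counter()
    missing = 0
    for grade in grades:
        if isinstance(grade, float) and math.isnan(grade):
            missing += 1
        else:
            counts[int(grade)] += 1
    return counts, missing

# Toy grade column (illustrative only), with two missing entries.
toy_grades = [2.0, 2.0, 4.0, float("nan"), float("nan")]

# Naive dictionary counting: every NaN becomes a separate key.
naive = {}
for grade in toy_grades:
    naive[grade] = naive.get(grade, 0) + 1

grade_counts, missing_count = count_grades(toy_grades)
print(len(naive), dict(grade_counts), missing_count)
```

On the real data the call would be `count_grades(diref_data['Grade'])`.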
#Let's create dictionaries storing which grades Tyler and Naechia work with
Tyler_cases = {} #creating dictionary to store the number of referrals Tyler gave per grade
for (grade,teacher) in zip(diref_data['Grade'],diref_data['ReportingStaffID']):
    if(teacher == 127):
        if not grade in Tyler_cases:
            Tyler_cases[grade] = 1
        else:
            Tyler_cases[grade] = Tyler_cases[grade]+1
Naechia_cases = {} #creating dictionary to store the number of referrals Naechia gave per grade
for (grade,teacher) in zip(diref_data['Grade'],diref_data['ReportingStaffID']):
    if(teacher == 112):
        if not grade in Naechia_cases:
            Naechia_cases[grade] = 1
        else:
            Naechia_cases[grade] = Naechia_cases[grade]+1
print("Tyler:",Tyler_cases)
print("Naechia:",Naechia_cases)
Tyler: {9.0: 19, 10.0: 28, 11.0: 16, 12.0: 7}
Naechia: {2.0: 2, 3.0: 9, 4.0: 27, 5.0: 8, 6.0: 1}
Looking at the above data, we suspect that this could just be another "population density" correlation (i.e., the grades with the largest enrollment naturally receive the largest number of referrals). The data set we were given doesn't include enrollment per grade, however, which would have been useful for comparison. (We did try to find enrollment data for Monarch, but the sources contradicted each other and only covered the period before Monarch's expansion.) Nonetheless, it is understandable that this data isn't explicitly available since, according to the Monarch website, the average student attends Monarch for only 6 months.
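If per-grade enrollment were available, the comparison would amount to dividing each grade's referral count by its enrollment. The sketch below only illustrates that arithmetic: the referral counts are copied from the grade tally printed above, while the enrollment figures are HYPOTHETICAL placeholders invented for the example, not Monarch data. The helper name is also hypothetical.

```python
def referrals_per_enrolled_student(referrals_by_grade, enrollment_by_grade):
    """Divide each grade's referral count by its enrollment, where known."""
    rates = {}
    for grade, referrals in referrals_by_grade.items():
        enrolled = enrollment_by_grade.get(grade)
        if enrolled:  # skip grades with unknown or zero enrollment
            rates[grade] = referrals / enrolled
    return rates

# Referral counts for three grades, taken from the tally printed above.
referrals_by_grade = {2: 135, 4: 134, 12: 16}
# HYPOTHETICAL enrollment numbers, invented purely for illustration.
hypothetical_enrollment = {2: 45, 4: 67, 12: 8}

rates = referrals_per_enrolled_student(referrals_by_grade, hypothetical_enrollment)
print(rates)
```

If the rates came out roughly equal across grades, the raw counts would mostly reflect enrollment; large differences would point to something grade-specific.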
Regardless of the lack of data, this is something we recommend Monarch consider. If the number of referrals per grade does correlate with enrollment, then referral situations seem more likely when a larger crowd is present. Perhaps, then, students should be taught how to look after themselves and behave when there is less individualized attention. Perhaps these students begin to feel lost and act out from a subconscious desire to be noticed.
However, if the above distribution is not explained by population density, then maybe the younger students in particular need more guidance on behavior. Perhaps a fun class on appropriate behavior (such as anger management) needs to be introduced? This class would have a large potential for being "fun," as it doesn't require the heavy thinking that math or science does. Lessons could be interactive and role-play based, so students could have a lot of fun acting out different techniques for living a more emotionally healthy life.
Image(filename='Action_wordcloud.png')