Analyzing NYC High School Data¶

One of the most controversial issues in the U.S. educational system is the effectiveness of standardized tests, and whether they're unfair to certain groups.

One of such exam is SAT. The SAT, or Scholastic Aptitude Test, is an exam that U.S. high school students take before applying to college. Colleges take the test scores into account when deciding who to admit, so it's fairly important to perform well on it.

In this project, we will collect data from various sources and determine whether SAT scores are unfair to certain groups.

New York City has a significant immigrant population and is very diverse, so comparing demographic factors such as race, income, and gender with SAT scores is a good way to determine whether the SAT is a fair test. For example, if certain racial groups consistently perform better on the SAT, we would have some evidence that the SAT is unfair.

For the purposes of this project, we'll be using data about New York City public schools, which can be found here.
Below are the datasets that we will be using:

SAT scores by school - SAT scores for each high school in New York City
School attendance - Attendance information for each school in New York City
Class size - Information on class size for each school
AP test results - Advanced Placement (AP) exam results for each high school (passing an optional AP exam in a particular subject can earn a student college credit in that subject)
Graduation outcomes - The percentage of students who graduated, and other outcome information
Demographics - Demographic information for each school
School survey - Surveys of parents, teachers, and students at each school

All of these data sets are interrelated. We'll combine all of them into a single data set combined which will be used in our analysis, to find correlations.

In [1]:
import pandas as pd
import numpy
import re

data_files = [
"ap_2010.csv",
"class_size.csv",
"demographics.csv",
"hs_directory.csv",
"sat_results.csv"
]

data = {}

for f in data_files:
data[f.replace(".csv", "")] = d

In [2]:
for item in data:
print(item)

ap_2010
class_size
demographics
hs_directory
sat_results

In [3]:
data['ap_2010'].head()

Out[3]:
DBN SchoolName AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5
0 01M448 UNIVERSITY NEIGHBORHOOD H.S. 39 49 10
1 01M450 EAST SIDE COMMUNITY HS 19 21 s
2 01M515 LOWER EASTSIDE PREP 24 26 24
3 01M539 NEW EXPLORATIONS SCI,TECH,MATH 255 377 191
4 02M296 High School of Hospitality Management s s s
In [4]:
data['class_size'].head()

Out[4]:
CSD BOROUGH SCHOOL CODE SCHOOL NAME GRADE PROGRAM TYPE CORE SUBJECT (MS CORE and 9-12 ONLY) CORE COURSE (MS CORE and 9-12 ONLY) SERVICE CATEGORY(K-9* ONLY) NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS DATA SOURCE SCHOOLWIDE PUPIL-TEACHER RATIO
0 1 M M015 P.S. 015 Roberto Clemente 0K GEN ED - - - 19.0 1.0 19.0 19.0 19.0 ATS NaN
1 1 M M015 P.S. 015 Roberto Clemente 0K CTT - - - 21.0 1.0 21.0 21.0 21.0 ATS NaN
2 1 M M015 P.S. 015 Roberto Clemente 01 GEN ED - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
3 1 M M015 P.S. 015 Roberto Clemente 01 CTT - - - 17.0 1.0 17.0 17.0 17.0 ATS NaN
4 1 M M015 P.S. 015 Roberto Clemente 02 GEN ED - - - 15.0 1.0 15.0 15.0 15.0 ATS NaN
In [5]:
data['demographics'].head()

Out[5]:
DBN Name schoolyear fl_percent frl_percent total_enrollment prek k grade1 grade2 ... black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per
0 01M015 P.S. 015 ROBERTO CLEMENTE 20052006 89.4 NaN 281 15 36 40 33 ... 74 26.3 189 67.3 5 1.8 158.0 56.2 123.0 43.8
1 01M015 P.S. 015 ROBERTO CLEMENTE 20062007 89.4 NaN 243 15 29 39 38 ... 68 28.0 153 63.0 4 1.6 140.0 57.6 103.0 42.4
2 01M015 P.S. 015 ROBERTO CLEMENTE 20072008 89.4 NaN 261 18 43 39 36 ... 77 29.5 157 60.2 7 2.7 143.0 54.8 118.0 45.2
3 01M015 P.S. 015 ROBERTO CLEMENTE 20082009 89.4 NaN 252 17 37 44 32 ... 75 29.8 149 59.1 7 2.8 149.0 59.1 103.0 40.9
4 01M015 P.S. 015 ROBERTO CLEMENTE 20092010 96.5 208 16 40 28 32 ... 67 32.2 118 56.7 6 2.9 124.0 59.6 84.0 40.4

5 rows × 38 columns

In [6]:
data['graduation'].head()

Out[6]:
Demographic DBN School Name Cohort Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n Total Regents - % of cohort Total Regents - % of grads ... Regents w/o Advanced - n Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads Local - n Local - % of cohort Local - % of grads Still Enrolled - n Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
0 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2003 5 s s s s s ... s s s s s s s s s s
1 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2004 55 37 67.3% 17 30.9% 45.9% ... 17 30.9% 45.9% 20 36.4% 54.1% 15 27.3% 3 5.5%
2 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2005 64 43 67.2% 27 42.2% 62.8% ... 27 42.2% 62.8% 16 25% 37.200000000000003% 9 14.1% 9 14.1%
3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78 43 55.1% 36 46.2% 83.7% ... 36 46.2% 83.7% 7 9% 16.3% 16 20.5% 11 14.1%
4 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 Aug 78 44 56.4% 37 47.4% 84.1% ... 37 47.4% 84.1% 7 9% 15.9% 15 19.2% 11 14.1%

5 rows × 23 columns

In [7]:
data['hs_directory'].head()

Out[7]:
0 17K548 Brooklyn School for Music & Theatre Brooklyn K440 718-230-6250 718-230-6262 9 12 NaN NaN ... Then to New York City residents NaN NaN NaN NaN NaN NaN NaN NaN 883 Classon Avenue\nBrooklyn, NY 11225\n(40.67...
1 09X543 High School for Violin and Dance Bronx X400 718-842-0687 718-589-9849 9 12 NaN NaN ... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 1110 Boston Road\nBronx, NY 10456\n(40.8276026...
2 09X327 Comprehensive Model School Project M.S. 327 Bronx X240 718-294-8111 718-294-8109 6 12 NaN NaN ... Then to Bronx students or residents who attend... Then to New York City residents who attend an ... Then to Bronx students or residents Then to New York City residents NaN NaN NaN NaN NaN 1501 Jerome Avenue\nBronx, NY 10452\n(40.84241...
3 02M280 Manhattan Early College School for Advertising Manhattan M520 718-935-3477 NaN 9 10 9 14.0 ... Then to New York City residents who attend an ... Then to Manhattan students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 411 Pearl Street\nNew York, NY 10038\n(40.7106...
4 28Q680 Queens Gateway to Health Sciences Secondary Sc... Queens Q695 718-969-3155 718-969-3552 6 12 NaN NaN ... Then to Districts 28 and 29 students or residents Then to Queens students or residents Then to New York City residents NaN NaN NaN NaN NaN NaN 160-20 Goethals Avenue\nJamaica, NY 11432\n(40...

5 rows × 58 columns

In [8]:
data['sat_results'].head()

Out[8]:
DBN SCHOOL NAME Num of SAT Test Takers SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355 404 363
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383 423 366
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377 402 370
3 01M458 FORSYTH SATELLITE ACADEMY 7 414 401 359
4 01M509 MARTA VALLE HIGH SCHOOL 44 390 433 384

Each data set appears to either have a DBN column, or the information we need to create one. After some processing and making DBN column in each dataset, we can combine columns of each dataset together in a single data set.

In [9]:
all_survey = pd.read_csv("schools/survey_all.txt", delimiter="\t", encoding='windows-1252')
survey = pd.concat([all_survey, d75_survey], axis=0, sort=True)

Out[9]:
N_p N_s N_t aca_p_11 aca_s_11 aca_t_11 aca_tot_11 bn com_p_11 com_s_11 ... t_q8c_1 t_q8c_2 t_q8c_3 t_q8c_4 t_q9 t_q9_1 t_q9_2 t_q9_3 t_q9_4 t_q9_5
0 90.0 NaN 22.0 7.8 NaN 7.9 7.9 M015 7.6 NaN ... 29.0 67.0 5.0 0.0 NaN 5.0 14.0 52.0 24.0 5.0
1 161.0 NaN 34.0 7.8 NaN 9.1 8.4 M019 7.6 NaN ... 74.0 21.0 6.0 0.0 NaN 3.0 6.0 3.0 78.0 9.0
2 367.0 NaN 42.0 8.6 NaN 7.5 8.0 M020 8.3 NaN ... 33.0 35.0 20.0 13.0 NaN 3.0 5.0 16.0 70.0 5.0
3 151.0 145.0 29.0 8.5 7.4 7.8 7.9 M034 8.2 5.9 ... 21.0 45.0 28.0 7.0 NaN 0.0 18.0 32.0 39.0 11.0
4 90.0 NaN 23.0 7.9 NaN 8.1 8.0 M063 7.9 NaN ... 59.0 36.0 5.0 0.0 NaN 10.0 5.0 10.0 60.0 15.0

5 rows × 2773 columns

In [10]:
survey['dbn'].head()

Out[10]:
0    01M015
1    01M019
2    01M020
3    01M034
4    01M063
Name: dbn, dtype: object

There are two immediate facts that we can see in the data:

• There are over 2000 columns, nearly all of which we don't need. We'll have to filter the data to remove the unnecessary ones. Working with fewer columns will make it easier to print the dataframe out and find correlations within it.

We'll need to filter the columns to remove the ones we don't need. Luckily, there's a data dictionary at the original data download location. The dictionary tells us what each column represents. We'll pick columns having aggregate survey data about how parents, teachers, and students feel about school safety, academic performance, and more.

• The survey data has a dbn column that we'll want to convert to uppercase (DBN). The conversion will make the column name consistent with the other data sets.
In [11]:
survey["DBN"] = survey["dbn"]

survey_fields = [
"DBN",
"rr_s",
"rr_t",
"rr_p",
"N_s",
"N_t",
"N_p",
"saf_p_11",
"com_p_11",
"eng_p_11",
"aca_p_11",
"saf_t_11",
"com_t_11",
"eng_t_11",
"aca_t_11",
"saf_s_11",
"com_s_11",
"eng_s_11",
"aca_s_11",
"saf_tot_11",
"com_tot_11",
"eng_tot_11",
"aca_tot_11",
]
survey = survey.loc[:,survey_fields]
data["survey"] = survey


Out[11]:
DBN rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 ... eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11
0 01M015 NaN 88 60 NaN 22.0 90.0 8.5 7.6 7.5 ... 7.6 7.9 NaN NaN NaN NaN 8.0 7.7 7.5 7.9
1 01M019 NaN 100 60 NaN 34.0 161.0 8.4 7.6 7.6 ... 8.9 9.1 NaN NaN NaN NaN 8.5 8.1 8.2 8.4
2 01M020 NaN 88 73 NaN 42.0 367.0 8.9 8.3 8.3 ... 6.8 7.5 NaN NaN NaN NaN 8.2 7.3 7.5 8.0
3 01M034 89.0 73 50 145.0 29.0 151.0 8.8 8.2 8.0 ... 6.8 7.8 6.2 5.9 6.5 7.4 7.3 6.7 7.1 7.9
4 01M063 NaN 100 60 NaN 23.0 90.0 8.7 7.9 8.1 ... 7.8 8.1 NaN NaN NaN NaN 8.5 7.6 7.9 8.0

5 rows × 23 columns

Add/Modify DBN columns in class_size and hs_directory¶

When we explored all of the data sets, we noticed that some of them, like class_size and hs_directory, don't have a DBN column.

• hs_directory does have a dbn column, though, so we can just rename it.
• However, class_size doesn't appear to have the column at all. For making up DBN column, we will add CSD and SCHOOL CODE columns and add a leading 0 to the CSD if the CSD is less than two digits long
In [12]:
data["hs_directory"]["DBN"] = data["hs_directory"]["dbn"]

string_representation = str(num)
if len(string_representation) > 1:
return string_representation
else:
return "0" + string_representation

data["class_size"]["DBN"] = data["class_size"]["padded_csd"] + data["class_size"]["SCHOOL CODE"]


0    17K548
1    09X543
2    09X327
3    02M280
4    28Q680
Name: DBN, dtype: object

Out[12]:
0    01M015
1    01M015
2    01M015
3    01M015
4    01M015
Name: DBN, dtype: object

Convert numeric columns from Object type to numeric and Add Longitude and Latitude Columns¶

• Before procedding further, we need to change columns in ap_2010 from object (string) data type to numeric data type.
• Same needs to be done with SAT scores columns in sat_results. And then make up a new column that totals up the SAT scores.
• We also want to parse the latitude and longitude coordinates for each school. This will enable us to map the schools and uncover any geographic patterns in the data. The coordinates are currently in the text field Location 1 in the hs_directory data set. We will need to use some regex alongwith string manipulation to extract them in two new columns. And then convert them to numeric format.
In [13]:
cols = ['SAT Math Avg. Score', 'SAT Critical Reading Avg. Score', 'SAT Writing Avg. Score']
for c in cols:
data["sat_results"][c] = pd.to_numeric(data["sat_results"][c], errors="coerce")

data['sat_results']['sat_score'] = data['sat_results'][cols[0]] + data['sat_results'][cols[1]] + data['sat_results'][cols[2]]

cols = ['AP Test Takers ', 'Total Exams Taken', 'Number of Exams with scores 3 4 or 5']
for col in cols:
data["ap_2010"][col] = pd.to_numeric(data["ap_2010"][col], errors="coerce")

def find_lat(loc):
coords = re.findall("$.+, .+$", loc)
lat = coords[0].split(",")[0].replace("(", "")
return lat

def find_lon(loc):
coords = re.findall("$.+, .+$", loc)
lon = coords[0].split(",")[1].replace(")", "").strip()
return lon

data["hs_directory"]["lat"] = data["hs_directory"]["Location 1"].apply(find_lat)
data["hs_directory"]["lon"] = data["hs_directory"]["Location 1"].apply(find_lon)

data["hs_directory"]["lat"] = pd.to_numeric(data["hs_directory"]["lat"], errors="coerce")
data["hs_directory"]["lon"] = pd.to_numeric(data["hs_directory"]["lon"], errors="coerce")


Condense datasets¶

• Further, we will clean our datasets more to have a single row for each DBN value per dataset. This will help us when while combining the data sets.
• We'll perform cleaning for three datasets:

class_size : each school has multiple values for GRADE, PROGRAM TYPE, CORE SUBJECT (MS CORE and 9-12 ONLY), and CORE COURSE (MS CORE and 9-12 ONLY).

- We will filter on GRADE(09-12) and PROGRAM TYPE('GEN ED') to get information about only classes where SAT has been taken.
- And, then we will use groupby method on DBN to get unique groups per school. Taking average on the resulting groupby object using agg() method will give unique values per DBN.


demographics: Here, only column that prevents a given DBN from being unique is schoolyear. We only want to select rows where schoolyear is 20112012. This will give us the most recent year of data, and also match our SAT results data.

graduation: The Demographic and Cohort columns are what prevent DBN from being unique in the graduation data. A Cohort appears to refer to the year the data represents, and the Demographic appears to refer to a specific demographic group. In this case, we want to pick data from the most recent Cohort available, which is 2006. We also want data from the full cohort, so we'll only pick rows where Demographic is Total Cohort.

In [14]:
class_size = data["class_size"]
class_size = class_size[class_size["GRADE "] == "09-12"]
class_size = class_size[class_size["PROGRAM TYPE"] == "GEN ED"]

class_size = class_size.groupby("DBN").agg(numpy.mean)
class_size.reset_index(inplace=True)
data["class_size"] = class_size

data["demographics"] = data["demographics"][data["demographics"]["schoolyear"] == 20112012]



Combine the datasets¶

• We'll merge two data sets at a time. For example, we'll merge sat_results with one dataset, then with another, then the result of that with another dataset. We'll continue combining data sets in this way until we've merged all of them. Afterwards, we'll have roughly the same number of rows, but each row will have columns from all of the data sets.
• Because we're concerned with determing demographic factors correlating with SAT score, we want to preserve as many rows as possible from sat_results while minimizing null values. Therefore, we will use different merge strategies.
• Data sets having lot of missing DBN values will be merged using a left join.
• Data sets having DBN values identical to those in sat_results will be merged using inner join. These data sets also have information we need to keep. Most of our analysis would be impossible if a significant number of rows were missing from demographics, for example.
In [15]:
combined = data["sat_results"]

combined = combined.merge(data["ap_2010"], on="DBN", how="left")

to_merge = ["class_size", "demographics", "survey", "hs_directory"]

for m in to_merge:
combined = combined.merge(data[m], on="DBN", how="inner")

combined = combined.fillna(combined.mean())
combined = combined.fillna(0)

In [16]:
pd.set_option('display.max_columns', 500)

Out[16]:
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355.0 404.0 363.0 1122.0 0 129.028846 197.038462 153.45 Total Cohort HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78.0 43 55.1% 36 46.2% 83.7% 0 0% 0% 36 46.2% 83.7% 7 9% 16.3% 16 20.5% 11 14.1% 1 88.000000 4.000000 22.564286 18.500000 26.571429 0.0 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 20112012 0 88.6 422 32 33 50 98 79 80 50 94.0 22.3 105.0 24.9 34 35 59 14.0 123 29.1 227 53.8 7 1.7 259.0 61.4 163.0 38.6 89.0 70 39 379.000000 26.0 151.0 7.8 7.7 7.4 7.6 6.3 5.3 6.1 6.5 6.000000 5.600000 6.100000 6.700000 6.7 6.2 6.6 7.0 01M292 Henry Street School for International Studies Manhattan M056 212-406-9411 212-406-9417 6 12 0 12.0 B39, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F to East Broadway ; J, M, ... 220 Henry Street New York NY 10002 http://schools.nyc.gov/schoolportals/01/M292 323.0 0 0 Henry Street School for International Studies ... Global/International Studies in core subjects,... Chinese (Mandarin), Spanish Psychology Chinese Language and Culture, Spanish Literatu... Chinese (Mandarin), Spanish Math through Card Play; Art, Poetry/Spoken Wor... Basketball Softball Soccer Boxing, Track, CHAMPS, Tennis, Flag Football, ... The Henry Street Settlement; Asia Society; Ame... Gouverneur Hospital (Turning Points) New York University Asia Society Heart of America Foundation 0 0 United Nations 0 0 8:30 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to continuing 8th graders Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to Manhattan students or residents Then to New York City residents 0 0 0 0 0 220 Henry Street\nNew York, NY 10002\n(40.7137... 40.713764 -73.985260
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383.0 423.0 366.0 1172.0 UNIVERSITY NEIGHBORHOOD H.S. 39.000000 49.000000 10.00 Total Cohort UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2006 124.0 53 42.7% 42 33.9% 79.2% 8 6.5% 15.1% 34 27.4% 64.2% 11 8.9% 20.8% 46 37.1% 20 16.100000000000001% 1 105.687500 4.750000 22.231250 18.250000 27.062500 0.0 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 20112012 0 71.8 394 109 97 93 95 83.0 21.1 86.0 21.8 55 10 115 29.2 89 22.6 181 45.9 9 2.3 226.0 57.4 168.0 42.6 84.0 95 10 385.000000 37.0 46.0 7.9 7.4 7.2 7.3 6.6 5.8 6.6 7.3 6.000000 5.700000 6.300000 7.000000 6.8 6.3 6.7 7.2 01M448 University Neighborhood High School Manhattan M446 212-962-4341 212-267-5611 9 12 0 12.0 M14A, M14D, M15, M21, M22, M9 F to East Broadway ; J, M, Z to Delancey St-Es... 200 Monroe Street New York NY 10002 www.universityneighborhoodhs.com 299.0 0 0 University Neighborhood High School (UNHS) is ... While attending UNHS, students can earn up to ... Chinese, Spanish Calculus AB, Chinese Language and Culture, Eng... 0 Chinese (Cantonese), Chinese (Mandarin), Spanish Basketball, Badminton, Handball, Glee, Dance, ... Baseball, Basketball, Bowling, Cross Country, ... Basketball, Bowling, Cross Country, Softball, ... 0 0 Grand Street Settlement, Henry Street Settleme... Gouverneur Hospital, The Door, The Mount Sinai... New York University, CUNY Baruch College, Pars... Dance Film Association, Dance Makers Film Work... W!SE, Big Brothers Big Sisters, Peer Health Ex... Deloitte LLP Consulting and Financial Services... 0 Movement Research Incoming students are expected to attend schoo... Community Service Requirement, Dress Code Requ... 8:15 AM 3:15 PM This school will provide students with disabil... ESL Not Functionally Accessible 3 Open to New York City residents For M35B only: Open only to students whose hom... 0 0 0 0 0 0 0 0 200 Monroe Street\nNew York, NY 10002\n(40.712... 40.712332 -73.984797
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377.0 402.0 370.0 1149.0 EAST SIDE COMMUNITY HS 19.000000 21.000000 153.45 Total Cohort EAST SIDE COMMUNITY SCHOOL 2006 90.0 70 77.8% 67 74.400000000000006% 95.7% 0 0% 0% 67 74.400000000000006% 95.7% 3 3.3% 4.3% 15 16.7% 5 5.6% 1 57.600000 2.733333 21.200000 19.400000 22.866667 0.0 EAST SIDE COMMUNITY HIGH SCHOOL 20112012 0 71.8 598 92 73 76 101 93 77 86 30.0 5.0 158.0 26.4 91 19 58 9.7 143 23.9 331 55.4 62 10.4 327.0 54.7 271.0 45.3 0.0 98 28 598.208333 42.0 150.0 8.7 8.2 8.1 8.4 7.3 8.0 8.0 8.8 6.611667 6.094722 6.620278 7.381389 7.9 7.9 7.9 8.4 01M450 East Side Community School Manhattan M060 212-460-8467 212-260-9657 6 12 0 12.0 M101, M102, M103, M14A, M14D, M15, M15-SBS, M2... 6 to Astor Place ; L to 1st Ave 420 East 12 Street New York NY 10009 www.eschs.org 649.0 0 Consortium School We are a small 6-12 secondary school that prep... Our Advisory System ensures that we can effect... 0 Calculus AB, English Literature and Composition 0 American Sign Language, Arabic, Chinese (Manda... After-School Tutoring, Art Portfolio Classes, ... Baseball, Basketball, Soccer Basketball, Soccer, Softball 0 Basketball, Bicycling, Fitness, Flag Football,... University Settlement, Big Brothers Big Sister... 0 Columbia Teachers College, New York University... , Internship Program, Loisaida Art Gallery loc... College Bound Initiative, Center for Collabora... Prudential Securities, Moore Capital, Morgan S... 0 Brooklyn Boulders (Rock Climbing) Students present and defend their work to comm... Our school requires an Academic Portfolio for ... 8:30 AM 3:30 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to New York City residents 0 0 0 0 0 0 0 0 420 East 12 Street\nNew York, NY 10009\n(40.72... 40.729783 -73.983041
3 01M509 MARTA VALLE HIGH SCHOOL 44 390.0 433.0 384.0 1207.0 0 129.028846 197.038462 153.45 Total Cohort MARTA VALLE HIGH SCHOOL 2006 84.0 47 56% 40 47.6% 85.1% 17 20.2% 36.200000000000003% 23 27.4% 48.9% 7 8.300000000000001% 14.9% 25 29.8% 5 6% 1 69.642857 3.000000 23.571429 20.000000 27.357143 0.0 MARTA VALLE SECONDARY SCHOOL 20112012 0 80.7 367 143 100 51 73 41.0 11.2 95.0 25.9 28 36 34 9.3 116 31.6 209 56.9 6 1.6 170.0 46.3 197.0 53.7 90.0 100 21 306.000000 29.0 69.0 7.7 7.4 7.2 7.3 6.4 5.3 6.1 6.8 6.400000 5.900000 6.400000 7.000000 6.9 6.2 6.6 7.0 01M509 Marta Valle High School Manhattan M025 212-473-8152 212-475-7588 9 12 0 12.0 B39, M103, M14A, M14D, M15, M15-SBS, M21, M22,... B, D to Grand St ; F, J, M, Z to Delancey St-E... 145 Stanton Street New York NY 10002 www.martavalle.org 401.0 0 0 Marta Valle High School (MVHS) offers a strong... Advanced Regents Diploma, Early Graduation, up... French, Spanish English Literature and Composition, Studio Art... 0 Spanish Model Peer Leadership Program, 'The Vine' Stud... Basketball, Rugby Rugby, Volleyball Rugby Volleyball, Zumba NYCDOE Innovation Zone Lab Site, Grand Street ... Gouvenuer's Hospital New York University (NYU), Sarah Lawrence Coll... Young Audiences, The National Arts Club, Educa... College for Every Student (CFES), Morningside ... Estée Lauder Bank of America CASALEAP, Beacon Students Dress for Success, Summer Bridge to S... Community Service Requirement, Extended Day Pr... 8:00 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to District 1 students or residents Then to Manhattan students or residents Then to New York City residents 0 0 0 0 0 0 0 145 Stanton Street\nNew York, NY 10002\n(40.72... 40.720569 -73.985673
4 01M539 NEW EXPLORATIONS INTO SCIENCE, TECHNOLOGY AND ... 159 522.0 574.0 525.0 1621.0 NEW EXPLORATIONS SCI,TECH,MATH 255.000000 377.000000 191.00 Total Cohort NEW EXPLORATIONS INTO SCIENCE TECHNO 2006 46.0 46 100% 46 100% 100% 31 67.400000000000006% 67.400000000000006% 15 32.6% 32.6% 0 0% 0% 0 0% 0 0% 1 156.368421 6.157895 25.510526 19.473684 31.210526 0.0 NEW EXPLORATIONS INTO SCIENCE TECH AND MATH 20112012 0 23.0 1613 100 107 139 110 114 107 149 126 117 117 123 147 157 4.0 0.2 43.0 2.7 2 0 448 27.8 189 11.7 229 14.2 725 44.9 794.0 49.2 819.0 50.8 98.0 68 51 923.000000 67.0 736.0 8.5 7.9 7.9 8.4 7.6 5.6 5.9 7.3 7.300000 6.400000 7.000000 7.700000 7.8 6.7 6.9 7.8 01M539 New Explorations into Science, Technology and ... Manhattan M022 212-677-5190 212-260-8124 K 12 0 12.0 B39, M14A, M14D, M21, M22, M8, M9 F, J, M, Z to Delancey St-Essex St 111 Columbia Street New York NY 10002 www.nestmk12.net 1725.0 0 0 New Explorations into Science, Technology and ... 1st level science sequence - 9th grade: Regent... Chinese (Mandarin), French, Italian, Latin, Sp... Biology, Calculus AB, Calculus BC, Chemistry, ... 0 0 After-school Jazz Band, Annual Coffee House Co... Basketball, Fencing, Indoor Track Basketball, Fencing, Indoor Track 0 Badminton, Baseball, Cross-Country, Dance, Out... 7th Precinct Community Affairs, NYCWastele, ... 0 Hunter College, New York University, Cornell U... VH1, Dancing Classrooms, Center for Arts Educa... After 3 Time Warner Cable, Google, IBM, MET Project, S... 0 0 Dress Code Required: Business Casual - shirt/b... 0 8:15 AM 4:00 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to New York City residents 0 0 0 0 0 0 0 0 111 Columbia Street\nNew York, NY 10002\n(40.7... 40.718725 -73.979426

Add a school district column for mapping¶

Now, we have a clean data set on which we can base our analysis. Mapping the statistics out on a school district level might be an interesting way to analyze them. Adding a column to the data set that specifies the school district will help us accomplish this.

In [17]:
def get_first_two_chars(dbn):
return dbn[0:2]

combined["school_dist"] = combined["DBN"].apply(get_first_two_chars)


Out[17]:
0 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29 355.0 404.0 363.0 1122.0 0 129.028846 197.038462 153.45 Total Cohort HENRY STREET SCHOOL FOR INTERNATIONAL 2006 78.0 43 55.1% 36 46.2% 83.7% 0 0% 0% 36 46.2% 83.7% 7 9% 16.3% 16 20.5% 11 14.1% 1 88.0000 4.000000 22.564286 18.50 26.571429 0.0 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 20112012 0 88.6 422 32 33 50 98 79 80 50 94.0 22.3 105.0 24.9 34 35 59 14.0 123 29.1 227 53.8 7 1.7 259.0 61.4 163.0 38.6 89.0 70 39 379.000000 26.0 151.0 7.8 7.7 7.4 7.6 6.3 5.3 6.1 6.5 6.000000 5.600000 6.100000 6.700000 6.7 6.2 6.6 7.0 01M292 Henry Street School for International Studies Manhattan M056 212-406-9411 212-406-9417 6 12 0 12.0 B39, M14A, M14D, M15, M15-SBS, M21, M22, M9 B, D to Grand St ; F to East Broadway ; J, M, ... 220 Henry Street New York NY 10002 http://schools.nyc.gov/schoolportals/01/M292 323.0 0 0 Henry Street School for International Studies ... Global/International Studies in core subjects,... Chinese (Mandarin), Spanish Psychology Chinese Language and Culture, Spanish Literatu... Chinese (Mandarin), Spanish Math through Card Play; Art, Poetry/Spoken Wor... Basketball Softball Soccer Boxing, Track, CHAMPS, Tennis, Flag Football, ... The Henry Street Settlement; Asia Society; Ame... Gouverneur Hospital (Turning Points) New York University Asia Society Heart of America Foundation 0 0 United Nations 0 0 8:30 AM 3:30 PM This school will provide students with disabil... ESL Functionally Accessible 1 Priority to continuing 8th graders Then to Manhattan students or residents who at... Then to New York City residents who attend an ... Then to Manhattan students or residents Then to New York City residents 0 0 0 0 0 220 Henry Street\nNew York, NY 10002\n(40.7137... 40.713764 -73.985260 01
1 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91 383.0 423.0 366.0 1172.0 UNIVERSITY NEIGHBORHOOD H.S. 39.000000 49.000000 10.00 Total Cohort UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2006 124.0 53 42.7% 42 33.9% 79.2% 8 6.5% 15.1% 34 27.4% 64.2% 11 8.9% 20.8% 46 37.1% 20 16.100000000000001% 1 105.6875 4.750000 22.231250 18.25 27.062500 0.0 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 20112012 0 71.8 394 109 97 93 95 83.0 21.1 86.0 21.8 55 10 115 29.2 89 22.6 181 45.9 9 2.3 226.0 57.4 168.0 42.6 84.0 95 10 385.000000 37.0 46.0 7.9 7.4 7.2 7.3 6.6 5.8 6.6 7.3 6.000000 5.700000 6.300000 7.000000 6.8 6.3 6.7 7.2 01M448 University Neighborhood High School Manhattan M446 212-962-4341 212-267-5611 9 12 0 12.0 M14A, M14D, M15, M21, M22, M9 F to East Broadway ; J, M, Z to Delancey St-Es... 200 Monroe Street New York NY 10002 www.universityneighborhoodhs.com 299.0 0 0 University Neighborhood High School (UNHS) is ... While attending UNHS, students can earn up to ... Chinese, Spanish Calculus AB, Chinese Language and Culture, Eng... 0 Chinese (Cantonese), Chinese (Mandarin), Spanish Basketball, Badminton, Handball, Glee, Dance, ... Baseball, Basketball, Bowling, Cross Country, ... Basketball, Bowling, Cross Country, Softball, ... 0 0 Grand Street Settlement, Henry Street Settleme... Gouverneur Hospital, The Door, The Mount Sinai... New York University, CUNY Baruch College, Pars... Dance Film Association, Dance Makers Film Work... W!SE, Big Brothers Big Sisters, Peer Health Ex... Deloitte LLP Consulting and Financial Services... 0 Movement Research Incoming students are expected to attend schoo... Community Service Requirement, Dress Code Requ... 8:15 AM 3:15 PM This school will provide students with disabil... ESL Not Functionally Accessible 3 Open to New York City residents For M35B only: Open only to students whose hom... 0 0 0 0 0 0 0 0 200 Monroe Street\nNew York, NY 10002\n(40.712... 40.712332 -73.984797 01
2 01M450 EAST SIDE COMMUNITY SCHOOL 70 377.0 402.0 370.0 1149.0 EAST SIDE COMMUNITY HS 19.000000 21.000000 153.45 Total Cohort EAST SIDE COMMUNITY SCHOOL 2006 90.0 70 77.8% 67 74.400000000000006% 95.7% 0 0% 0% 67 74.400000000000006% 95.7% 3 3.3% 4.3% 15 16.7% 5 5.6% 1 57.6000 2.733333 21.200000 19.40 22.866667 0.0 EAST SIDE COMMUNITY HIGH SCHOOL 20112012 0 71.8 598 92 73 76 101 93 77 86 30.0 5.0 158.0 26.4 91 19 58 9.7 143 23.9 331 55.4 62 10.4 327.0 54.7 271.0 45.3 0.0 98 28 598.208333 42.0 150.0 8.7 8.2 8.1 8.4 7.3 8.0 8.0 8.8 6.611667 6.094722 6.620278 7.381389 7.9 7.9 7.9 8.4 01M450 East Side Community School Manhattan M060 212-460-8467 212-260-9657 6 12 0 12.0 M101, M102, M103, M14A, M14D, M15, M15-SBS, M2... 6 to Astor Place ; L to 1st Ave 420 East 12 Street New York NY 10009 www.eschs.org 649.0 0 Consortium School We are a small 6-12 secondary school that prep... Our Advisory System ensures that we can effect... 0 Calculus AB, English Literature and Composition 0 American Sign Language, Arabic, Chinese (Manda... After-School Tutoring, Art Portfolio Classes, ... Baseball, Basketball, Soccer Basketball, Soccer, Softball 0 Basketball, Bicycling, Fitness, Flag Football,... University Settlement, Big Brothers Big Sister... 0 Columbia Teachers College, New York University... , Internship Program, Loisaida Art Gallery loc... College Bound Initiative, Center for Collabora... Prudential Securities, Moore Capital, Morgan S... 0 Brooklyn Boulders (Rock Climbing) Students present and defend their work to comm... Our school requires an Academic Portfolio for ... 8:30 AM 3:30 PM This school will provide students with disabil... ESL Not Functionally Accessible 1 Priority to continuing 8th graders Then to New York City residents 0 0 0 0 0 0 0 0 420 East 12 Street\nNew York, NY 10009\n(40.72... 40.729783 -73.983041 01

Find correlations¶

In [18]:
correlations = combined.corr()
correlations = correlations["sat_score"]
print(correlations)

SAT Critical Reading Avg. Score         0.986820
SAT Math Avg. Score                     0.972643
SAT Writing Avg. Score                  0.987771
sat_score                               1.000000
AP Test Takers                          0.523140
Total Exams Taken                       0.514333
Number of Exams with scores 3 4 or 5    0.463245
Total Cohort                            0.325144
CSD                                     0.042948
NUMBER OF STUDENTS / SEATS FILLED       0.394626
NUMBER OF SECTIONS                      0.362673
AVERAGE CLASS SIZE                      0.381014
SIZE OF SMALLEST CLASS                  0.249949
SIZE OF LARGEST CLASS                   0.314434
SCHOOLWIDE PUPIL-TEACHER RATIO               NaN
schoolyear                                   NaN
fl_percent                                   NaN
frl_percent                            -0.722225
total_enrollment                        0.367857
ell_num                                -0.153778
ell_percent                            -0.398750
sped_num                                0.034933
sped_percent                           -0.448170
asian_num                               0.475445
asian_per                               0.570730
black_num                               0.027979
black_per                              -0.284139
hispanic_num                            0.025744
hispanic_per                           -0.396985
white_num                               0.449559
...
rr_p                                    0.047925
N_s                                     0.423463
N_t                                     0.291463
N_p                                     0.421530
saf_p_11                                0.122913
com_p_11                               -0.115073
eng_p_11                                0.020254
aca_p_11                                0.035155
saf_t_11                                0.313810
com_t_11                                0.082419
eng_t_11                                0.036906
aca_t_11                                0.132348
saf_s_11                                0.337639
com_s_11                                0.187370
eng_s_11                                0.213822
aca_s_11                                0.339435
saf_tot_11                              0.318753
com_tot_11                              0.077310
eng_tot_11                              0.100102
aca_tot_11                              0.190966
zip                                    -0.063977
total_students                          0.407827
number_programs                         0.117012
priority08                                   NaN
priority09                                   NaN
priority10                                   NaN
lat                                    -0.121029
lon                                    -0.132222
Name: sat_score, Length: 67, dtype: float64


Plotting survey correlations¶

• There are several fields in combined that originally came from a survey of parents, teachers, and students. We will make a bar plot of the correlations between these fields and sat_score.
In [19]:
# Remove DBN since it's a unique identifier, not a useful numerical value for correlation.
survey_fields.remove("DBN")

import matplotlib.pyplot as plt
combined.corr()['sat_score'][survey_fields].plot.bar()
plt.show()

<Figure size 640x480 with 1 Axes>

Key Observations:

• N_s, N_t, N_p correlate highly with sat_score. Since these fields are directly related to total enrollment, this can be understood.
• rr_s (Student Response Rate) however is the more interesting point. Since, students who excel academically are more likely to respond to a survey regarding sat_score.
• saf_t_11, saf_s_11, saf_tot_11 gives us another good perspective about the relationship between a safe environment and academic brilliance. Schools where Students and Teachers feel safe are more likely to have good sat_score.
• aca_s_11 i.e. how students percieve academic standards correlates highly for sat_score. However, same is not true for aca_p_11 and aca_t_11, i.e. how Parents and Teachers perceive academic standards.

Exploring Safety and SAT Scores¶

• We will investigating safety scores by making a scatter plot of the saf_s_11 column vs. the sat_score in combined.
• Map out safety scores :
• Compute the average safety score for each district.
• Make a map that shows safety scores by district.
In [20]:
combined.plot.scatter('saf_s_11', 'sat_score')
plt.show()


From the Scatter Plot, we can observe that there seems to be a positive correlation between the sat_score and saf_s_11, although its not that strong.

In [21]:
districts = combined.groupby('school_dist').agg(numpy.mean)
districts.reset_index(inplace=True)
districts

Out[21]:
school_dist SAT Critical Reading Avg. Score SAT Math Avg. Score SAT Writing Avg. Score sat_score AP Test Takers Total Exams Taken Number of Exams with scores 3 4 or 5 Total Cohort CSD NUMBER OF STUDENTS / SEATS FILLED NUMBER OF SECTIONS AVERAGE CLASS SIZE SIZE OF SMALLEST CLASS SIZE OF LARGEST CLASS SCHOOLWIDE PUPIL-TEACHER RATIO schoolyear fl_percent frl_percent total_enrollment ell_num ell_percent sped_num sped_percent asian_num asian_per black_num black_per hispanic_num hispanic_per white_num white_per male_num male_per female_num female_per rr_s rr_t rr_p N_s N_t N_p saf_p_11 com_p_11 eng_p_11 aca_p_11 saf_t_11 com_t_11 eng_t_11 aca_t_11 saf_s_11 com_s_11 eng_s_11 aca_s_11 saf_tot_11 com_tot_11 eng_tot_11 aca_tot_11 grade_span_max expgrade_span_max zip total_students number_programs priority08 priority09 priority10 lat lon
0 01 441.833333 473.333333 439.333333 1354.500000 116.681090 173.019231 135.800000 93.500000 1.0 115.244241 5.148538 22.675415 18.798392 26.553044 0.0 20112012.0 0.0 58.983333 668.500000 42.166667 10.000000 82.000000 17.083333 134.500000 17.516667 125.500000 22.333333 214.833333 40.733333 186.000000 18.450000 328.166667 50.050000 340.333333 49.950000 76.500000 85.333333 33.166667 525.368056 38.500000 239.166667 8.233333 7.800000 7.683333 8.016667 7.066667 6.233333 6.700000 7.500000 6.768611 6.165787 6.736713 7.446898 7.433333 6.816667 7.116667 7.683333 12.0 12.0 10003.166667 659.500000 1.333333 0.0 0.0 0.0 40.719022 -73.982377
1 02 426.619092 444.186256 424.832836 1295.638184 128.908454 201.516827 157.495833 158.647849 2.0 149.818949 5.686360 25.038118 20.662667 28.593861 0.0 20112012.0 0.0 63.164583 605.604167 51.625000 13.006250 70.770833 13.212500 118.395833 14.479167 141.145833 24.733333 271.458333 49.554167 68.541667 10.056250 266.500000 44.718750 339.104167 55.281250 83.312500 86.416667 38.333333 495.171007 31.687500 190.000000 8.254167 7.495833 7.379167 7.704167 7.404167 6.520833 7.120833 7.564583 6.910660 6.204057 6.650422 7.385029 7.520833 6.720833 7.037500 7.541667 12.0 12.0 10023.770833 621.395833 1.416667 0.0 0.0 0.0 40.739699 -73.991386
2 03 428.529851 437.997512 426.915672 1293.443035 156.183494 244.522436 193.087500 183.384409 3.0 156.005994 5.839200 23.716311 19.737593 27.122831 0.0 20112012.0 0.0 58.050000 661.416667 36.916667 9.025000 63.583333 14.500000 68.000000 6.408333 181.833333 34.000000 228.250000 44.533333 172.000000 13.725000 272.166667 47.950000 389.250000 52.050000 83.166667 80.833333 36.166667 519.250000 28.166667 206.416667 8.316667 7.541667 7.508333 7.758333 6.675000 5.991667 6.533333 6.991667 6.716667 6.258333 6.633333 7.375000 7.233333 6.616667 6.891667 7.391667 12.0 12.0 10023.750000 717.916667 2.000000 0.0 0.0 0.0 40.781574 -73.977370
3 04 402.142857 416.285714 405.714286 1224.142857 129.016484 183.879121 151.035714 113.857143 4.0 132.362265 5.192610 24.101048 19.740816 27.460291 0.0 20112012.0 0.0 71.000000 569.285714 21.571429 4.557143 58.571429 13.214286 61.142857 6.128571 151.714286 30.028571 341.571429 61.400000 9.285714 1.314286 193.285714 30.514286 376.000000 69.485714 87.571429 92.714286 41.285714 504.857143 32.428571 193.142857 8.385714 7.657143 7.500000 7.928571 7.871429 7.042857 7.428571 7.885714 6.885714 6.114286 6.685714 7.571429 7.714286 6.942857 7.185714 7.785714 12.0 12.0 10029.857143 580.857143 1.142857 0.0 0.0 0.0 40.793449 -73.943215
4 05 427.159915 438.236674 419.666098 1285.062687 85.722527 115.725275 142.464286 143.677419 5.0 120.623901 4.845101 24.290247 20.447527 27.746703 0.0 20112012.0 0.0 58.671429 605.857143 21.571429 4.157143 59.714286 10.428571 32.714286 7.300000 342.571429 49.000000 195.571429 36.214286 31.285714 6.714286 300.000000 48.471429 305.857143 51.528571 82.857143 79.285714 40.428571 442.428571 27.714286 184.857143 8.085714 7.342857 7.357143 7.685714 6.857143 5.885714 6.242857 6.957143 6.314286 6.000000 6.428571 7.285714 7.085714 6.414286 6.671429 7.314286 12.0 12.0 10030.142857 609.857143 1.142857 0.0 0.0 0.0 40.817077 -73.949251
5 06 382.011940 400.565672 382.066269 1164.643881 108.711538 159.715385 105.425000 180.848387 6.0 139.041709 5.285860 25.324199 20.556126 29.217872 0.0 20112012.0 0.0 84.250000 621.800000 165.700000 30.040000 80.100000 14.950000 7.800000 0.810000 82.000000 11.470000 523.900000 86.380000 5.000000 0.720000 333.600000 53.790000 288.200000 46.210000 85.000000 83.500000 54.600000 482.300000 33.000000 261.500000 8.530000 7.990000 7.750000 8.050000 7.490000 6.610000 7.060000 7.660000 6.950000 6.150000 6.700000 7.510000 7.660000 6.920000 7.170000 7.760000 12.0 12.0 10036.200000 628.900000 1.300000 0.0 0.0 0.0 40.848970 -73.932502
6 07 376.461538 380.461538 371.923077 1128.846154 73.703402 112.476331 105.276923 105.605459 7.0 97.597416 4.146610 23.560037 19.394465 27.004052 0.0 20112012.0 0.0 79.630769 481.000000 72.000000 16.661538 84.846154 17.446154 8.000000 1.707692 144.461538 29.253846 322.000000 67.607692 3.769231 0.800000 261.769231 53.138462 219.230769 46.861538 80.538462 77.153846 42.923077 376.230769 25.846154 187.615385 8.423077 7.761538 7.600000 7.907692 6.969231 6.315385 6.784615 7.284615 6.800000 6.184615 6.646154 7.453846 7.392308 6.746154 7.007692 7.553846 12.0 12.0 10452.692308 465.846154 1.461538 0.0 0.0 0.0 40.816815 -73.919971
7 08 386.214383 395.542741 377.908005 1159.665129 118.379371 168.020979 144.731818 215.510264 8.0 129.765099 5.165710 23.813192 19.385355 27.362141 0.0 20112012.0 0.0 76.663636 684.545455 72.909091 15.790909 140.545455 19.609091 30.272727 2.809091 179.818182 27.809091 433.545455 66.109091 36.181818 2.554545 355.000000 45.454545 329.545455 54.545455 69.454545 77.636364 36.636364 459.837121 34.000000 148.545455 8.172727 7.636364 7.454545 7.718182 7.109091 6.254545 6.781818 7.390909 6.364697 5.954066 6.401843 7.143763 7.218182 6.645455 6.863636 7.409091 12.0 12.0 10467.000000 547.636364 1.272727 0.0 0.0 0.0 40.823803 -73.866087
8 09 373.755970 383.582836 374.633134 1131.971940 71.411538 104.265385 98.470000 113.330645 9.0 100.118588 4.335504 23.986379 19.543005 28.447128 0.0 20112012.0 0.0 76.800000 442.800000 78.900000 19.945000 74.150000 16.655000 7.000000 1.480000 151.900000 33.110000 278.250000 64.135000 3.000000 0.665000 220.700000 48.995000 222.050000 50.995000 84.200000 84.650000 40.650000 349.150000 25.450000 162.300000 8.510000 7.930000 7.755000 8.080000 6.935000 6.305000 6.905000 7.355000 6.655000 6.145000 6.705000 7.490000 7.365000 6.795000 7.110000 7.665000 12.0 12.0 10456.100000 449.700000 1.150000 0.0 0.0 0.0 40.836349 -73.906240
9 10 403.363636 418.000000 400.863636 1222.227273 132.231206 226.914336 191.618182 161.318182 10.0 168.876526 6.512691 24.244350 18.897297 28.583860 0.0 20112012.0 0.0 72.268182 818.545455 129.090909 18.177273 99.045455 13.622727 119.681818 6.631818 169.090909 23.027273 453.545455 63.272727 71.409091 6.527273 413.181818 48.586364 405.363636 51.413636 82.727273 82.136364 43.000000 648.772727 43.500000 303.590909 8.272727 7.768182 7.622727 7.927273 6.972727 6.154545 6.700000 7.309091 6.577273 6.040909 6.654545 7.409091 7.268182 6.645455 6.986364 7.536364 12.0 12.0 10463.181818 757.863636 1.500000 0.0 0.0 0.0 40.870345 -73.898360
10 11 389.866667 394.533333 380.600000 1165.000000 83.813462 122.484615 108.833333 122.866667 11.0 129.031031 5.037052 24.956523 19.784469 29.523549 0.0 20112012.0 0.0 66.960000 562.466667 61.666667 12.133333 105.600000 17.380000 19.800000 3.846667 243.000000 41.346667 278.266667 50.846667 16.400000 3.113333 319.866667 55.900000 242.600000 44.100000 80.400000 79.933333 42.466667 400.733333 27.466667 190.800000 8.053333 7.693333 7.546667 7.826667 7.026667 6.766667 7.000000 7.540000 6.186667 5.746667 6.260000 7.213333 7.086667 6.740000 6.940000 7.526667 12.0 12.0 10467.933333 563.666667 1.533333 0.0 0.0 0.0 40.873138 -73.856120
11 12 364.769900 379.109453 357.943781 1101.823134 93.102564 139.442308 153.450000 110.467742 12.0 91.684504 4.151865 22.212078 18.487312 25.804834 0.0 20112012.0 0.0 81.733333 391.000000 105.500000 27.141667 70.666667 17.216667 5.500000 1.591667 109.750000 28.075000 271.750000 69.275000 3.000000 0.825000 199.916667 50.133333 191.083333 49.866667 80.000000 79.750000 49.166667 277.333333 21.166667 163.083333 8.650000 8.008333 7.808333 8.166667 7.266667 6.758333 7.250000 7.816667 7.116667 6.566667 7.116667 7.783333 7.666667 7.100000 7.391667 7.916667 12.0 12.0 10463.166667 409.000000 1.083333 0.0 0.0 0.0 40.831412 -73.886946
12 13 409.393800 424.127440 403.666361 1237.187600 232.931953 382.704142 320.773077 224.595533 13.0 218.306055 8.003842 24.621784 19.625241 28.672265 0.0 20112012.0 0.0 63.861538 902.000000 35.461538 8.723077 65.692308 12.084615 272.461538 10.453846 416.000000 68.476923 110.000000 16.276923 96.230769 3.623077 482.769231 48.707692 419.230769 51.292308 84.153846 77.615385 35.769231 759.846154 40.000000 318.384615 8.330769 7.700000 7.692308 7.884615 7.092308 6.830769 7.069231 7.600000 6.407692 6.069231 6.576923 7.376923 7.269231 6.869231 7.123077 7.607692 12.0 12.0 11207.153846 895.153846 2.076923 0.0 0.0 0.0 40.692865 -73.977016
13 14 395.937100 398.189765 385.333049 1179.459915 77.798077 114.873626 123.282143 112.347926 14.0 123.643728 4.905127 24.311291 19.554945 28.511594 0.0 20112012.0 0.0 75.928571 556.214286 57.785714 9.485714 97.714286 18.600000 19.500000 3.871429 222.428571 42.228571 291.928571 49.664286 18.714286 3.507143 324.714286 58.050000 231.500000 41.942857 77.714286 85.500000 35.071429 395.642857 32.428571 158.214286 8.357143 7.864286 7.735714 7.971429 7.271429 6.650000 7.257143 7.735714 6.685714 6.207143 6.742857 7.507143 7.435714 6.914286 7.250000 7.750000 12.0 12.0 11210.785714 545.357143 2.000000 0.0 0.0 0.0 40.711599 -73.948360
14 15 395.679934 404.628524 390.295854 1190.604312 94.574786 141.581197 153.450000 104.207885 15.0 135.707319 5.236588 25.423643 20.900953 29.287015 0.0 20112012.0 0.0 70.800000 562.666667 48.888889 8.844444 103.333333 17.866667 26.333333 4.966667 251.777778 47.000000 246.111111 41.266667 36.444444 6.377778 270.666667 48.211111 292.000000 51.788889 81.444444 82.888889 32.333333 433.000000 34.777778 170.000000 8.077778 7.577778 7.477778 7.844444 6.566667 5.877778 6.288889 7.011111 6.177778 5.900000 6.333333 7.222222 6.933333 6.444444 6.700000 7.366667 12.0 12.0 11214.222222 573.111111 1.666667 0.0 0.0 0.0 40.675972 -73.989255
15 16 371.529851 379.164179 369.415672 1120.109701 82.264423 126.519231 153.450000 247.185484 16.0 177.501282 6.970513 24.234006 18.070192 29.308333 0.0 20112012.0 0.0 66.150000 658.500000 18.750000 3.725000 131.750000 19.775000 4.750000 1.075000 562.000000 80.150000 80.750000 16.900000 4.500000 0.925000 377.750000 53.400000 280.750000 46.600000 68.500000 75.250000 19.500000 458.000000 31.500000 122.000000 7.375000 6.900000 6.875000 7.075000 5.750000 5.250000 6.025000 6.275000 5.875000 5.775000 6.225000 7.100000 6.325000 5.975000 6.375000 6.825000 12.0 12.0 11219.000000 440.250000 1.750000 0.0 0.0 0.0 40.688008 -73.929686
16 17 386.571429 394.071429 380.785714 1161.428571 105.583791 163.087912 111.360714 121.357143 17.0 130.246192 5.135694 25.408854 20.284341 29.620046 0.0 20112012.0 0.0 67.635714 554.285714 52.357143 11.285714 54.571429 10.621429 16.857143 3.607143 474.285714 83.071429 50.642857 10.664286 9.000000 2.050000 242.857143 48.000000 311.428571 52.000000 78.785714 77.714286 41.214286 426.428571 27.357143 209.857143 7.835714 7.642857 7.550000 7.828571 6.957143 6.600000 7.192857 7.642857 6.007143 5.885714 6.400000 7.228571 6.935714 6.707143 7.035714 7.571429 12.0 12.0 11220.642857 547.071429 1.642857 0.0 0.0 0.0 40.660313 -73.955636
17 18 373.454545 373.090909 371.454545 1118.000000 129.028846 197.038462 153.450000 72.771261 18.0 72.209438 3.317955 22.102538 18.814864 25.448084 0.0 20112012.0 0.0 65.900000 332.272727 29.545455 8.754545 53.363636 16.363636 4.181818 1.236364 290.272727 87.327273 32.363636 9.718182 3.727273 1.163636 164.909091 51.227273 167.363636 48.772727 78.181818 83.363636 31.454545 222.181818 18.454545 89.181818 8.272727 7.718182 7.572727 7.963636 6.890909 6.400000 6.936364 7.390909 6.190909 6.018182 6.409091 7.209091 7.109091 6.718182 6.963636 7.527273 12.0 12.0 11224.000000 344.000000 1.090909 0.0 0.0 0.0 40.641863 -73.914726
18 19 367.083333 377.583333 359.166667 1103.833333 88.097756 124.769231 120.670833 114.322581 19.0 105.752625 4.247924 24.314442 19.921011 28.148996 0.0 20112012.0 0.0 71.833333 492.500000 62.833333 14.941667 75.666667 15.291667 15.750000 3.191667 285.000000 55.066667 182.666667 39.916667 5.833333 1.150000 284.500000 53.725000 208.000000 46.275000 70.833333 89.166667 34.083333 311.916667 28.500000 141.333333 7.591667 7.575000 7.375000 7.658333 6.516667 6.400000 6.766667 7.308333 6.150000 5.933333 6.458333 7.141667 6.758333 6.641667 6.866667 7.350000 12.0 12.0 11207.500000 440.416667 1.916667 0.0 0.0 0.0 40.676547 -73.882158
19 20 406.223881 465.731343 401.732537 1273.687761 227.805769 359.407692 177.690000 591.374194 20.0 420.029766 14.721019 25.423366 17.681562 29.745831 0.0 20112012.0 0.0 65.840000 2462.600000 601.400000 20.800000 341.200000 15.140000 844.600000 32.920000 186.800000 12.620000 766.000000 32.300000 658.200000 21.840000 1345.200000 44.920000 1117.400000 55.080000 64.400000 78.200000 25.000000 1360.200000 107.000000 602.400000 7.920000 7.180000 7.240000 7.460000 7.820000 7.660000 7.800000 8.280000 7.120000 6.340000 7.040000 7.640000 7.620000 7.080000 7.380000 7.800000 12.0 12.0 11210.200000 2521.400000 3.800000 0.0 0.0 0.0 40.626751 -74.006191
20 21 395.283582 421.786974 389.242062 1206.312619 135.467657 203.835664 142.377273 275.351906 21.0 224.702989 8.191627 25.284294 20.167110 28.885220 0.0 20112012.0 0.0 62.227273 1160.000000 160.909091 17.163636 157.363636 14.454545 262.090909 17.272727 364.909091 33.409091 230.454545 23.409091 297.454545 25.527273 616.090909 58.172727 543.909091 41.827273 79.272727 89.181818 34.545455 930.545455 64.454545 336.181818 7.818182 7.336364 7.263636 7.527273 7.072727 6.400000 6.963636 7.472727 6.390909 6.081818 6.500000 7.163636 7.090909 6.609091 6.909091 7.390909 12.0 12.0 11221.000000 1098.272727 3.272727 0.0 0.0 0.0 40.593596 -73.978465
21 22 473.500000 502.750000 474.250000 1450.500000 391.007212 614.509615 370.362500 580.250000 22.0 495.279369 15.777391 30.077636 22.509675 33.464740 0.0 20112012.0 0.0 44.225000 2142.500000 138.250000 4.400000 166.250000 6.650000 507.750000 17.975000 622.250000 35.100000 271.000000 12.350000 732.000000 33.975000 1015.750000 44.300000 1126.750000 55.700000 90.250000 87.500000 48.750000 1887.250000 89.500000 1106.250000 8.300000 7.325000 7.525000 7.775000 8.175000 7.400000 7.600000 8.200000 7.025000 6.050000 6.775000 7.500000 7.850000 6.900000 7.300000 7.850000 12.0 12.0 11223.000000 2149.000000 2.250000 0.0 0.0 0.0 40.618285 -73.952288
22 23 380.666667 398.666667 378.000000 1157.333333 29.000000 31.000000 153.450000 87.000000 23.0 120.113095 4.985119 24.299702 18.654762 28.994048 0.0 20112012.0 0.0 70.400000 503.666667 8.000000 1.600000 68.666667 14.466667 6.000000 1.133333 438.000000 87.133333 53.333333 10.600000 3.333333 0.566667 221.333333 44.433333 282.333333 55.566667 84.000000 73.666667 40.666667 420.333333 25.333333 175.000000 7.900000 7.533333 7.566667 7.700000 7.033333 6.433333 6.666667 7.366667 6.233333 6.066667 6.766667 7.566667 7.066667 6.700000 7.000000 7.566667 12.0 12.0 11219.000000 391.000000 1.333333 0.0 0.0 0.0 40.668586 -73.912298
23 24 405.846154 434.000000 402.153846 1242.000000 126.474852 179.094675 115.165385 234.682382 24.0 213.471903 8.613529 23.865936 18.565040 28.251471 0.0 20112012.0 0.0 62.584615 991.000000 179.769231 21.069231 97.076923 9.607692 182.692308 17.176923 74.769231 7.800000 616.615385 63.415385 113.461538 11.284615 572.769231 51.646154 418.230769 48.353846 83.846154 83.230769 35.692308 765.461538 47.769231 247.692308 8.461538 7.684615 7.569231 7.869231 7.669231 6.569231 7.230769 7.723077 7.107692 6.169231 6.784615 7.500000 7.753846 6.807692 7.192308 7.707692 12.0 12.0 11206.153846 962.461538 2.230769 0.0 0.0 0.0 40.740621 -73.911518
24 25 437.250000 483.500000 436.250000 1357.000000 205.260817 279.889423 174.793750 268.733871 25.0 280.576007 9.990385 27.106270 21.213189 31.371220 0.0 20112012.0 0.0 52.950000 1356.750000 257.500000 19.737500 136.750000 9.975000 465.250000 40.600000 240.375000 13.062500 494.000000 29.237500 148.875000 16.537500 669.500000 48.500000 687.250000 51.500000 87.000000 78.500000 37.625000 1028.250000 56.000000 370.875000 8.175000 7.537500 7.412500 7.687500 7.487500 6.825000 7.200000 7.712500 6.912500 6.150000 6.712500 7.400000 7.525000 6.837500 7.112500 7.612500 12.0 12.0 11361.000000 1288.875000 1.875000 0.0 0.0 0.0 40.745414 -73.815558
25 26 445.200000 487.600000 444.800000 1377.600000 410.605769 632.407692 392.090000 825.600000 26.0 595.953216 19.240117 29.971140 20.315789 33.554327 0.0 20112012.0 0.0 43.800000 2991.600000 248.600000 7.480000 350.000000 11.960000 1259.200000 38.140000 676.400000 28.420000 604.800000 19.380000 432.400000 13.420000 1475.000000 49.220000 1516.600000 50.780000 67.000000 83.400000 23.800000 1930.800000 129.800000 684.200000 7.720000 6.980000 7.220000 7.260000 7.000000 6.580000 6.840000 7.260000 6.760000 6.060000 6.660000 7.380000 7.140000 6.540000 6.900000 7.300000 12.0 12.0 11388.600000 2837.400000 4.600000 0.0 0.0 0.0 40.748507 -73.759176
26 27 407.800000 422.200000 394.300000 1224.300000 100.611538 145.315385 95.125000 288.961290 27.0 249.324536 9.019043 26.649659 20.757535 30.938081 0.0 20112012.0 0.0 61.340000 1148.200000 117.800000 6.820000 134.600000 11.950000 263.100000 16.890000 365.700000 39.760000 394.100000 31.380000 103.900000 10.750000 610.700000 52.400000 537.500000 47.600000 76.900000 86.500000 40.900000 775.100000 59.200000 276.800000 7.830000 7.400000 7.370000 7.640000 6.930000 6.780000 7.060000 7.630000 6.390000 6.000000 6.500000 7.310000 7.010000 6.710000 6.950000 7.510000 12.0 12.0 11556.300000 1072.000000 2.500000 0.0 0.0 0.0 40.638828 -73.807823
27 28 445.941655 465.997286 435.908005 1347.846947 182.010490 273.559441 175.336364 351.214076 28.0 255.381164 8.719058 26.770807 21.349816 30.172038 0.0 20112012.0 0.0 53.790909 1210.636364 87.363636 4.963636 123.454545 9.827273 403.727273 31.345455 343.727273 37.509091 283.454545 19.954545 165.636364 9.400000 590.363636 45.863636 620.272727 54.136364 85.090909 85.545455 40.545455 872.545455 54.727273 362.636364 8.172727 7.590909 7.472727 7.781818 7.600000 7.145455 7.363636 7.972727 6.627273 6.081818 6.618182 7.454545 7.463636 6.936364 7.145455 7.745455 12.0 12.0 11422.000000 1304.272727 2.545455 0.0 0.0 0.0 40.709344 -73.806367
28 29 395.764925 399.457090 386.707836 1181.929851 63.385817 96.514423 135.268750 98.108871 29.0 88.372155 3.454193 25.355632 21.409201 28.672730 0.0 20112012.0 0.0 52.362500 428.250000 12.625000 2.912500 56.125000 13.037500 20.750000 4.825000 359.875000 83.987500 39.000000 9.150000 4.125000 1.050000 219.875000 51.862500 208.375000 48.137500 77.625000 90.625000 44.750000 294.500000 21.375000 146.500000 7.762500 7.362500 7.325000 7.562500 7.050000 6.912500 7.162500 7.575000 6.075000 5.962500 6.487500 7.250000 6.962500 6.737500 7.000000 7.475000 12.0 12.0 11413.625000 474.125000 1.250000 0.0 0.0 0.0 40.685276 -73.752740
29 30 430.679934 465.961857 429.740299 1326.382090 157.231838 252.123932 115.150000 310.526882 30.0 251.803744 9.260486 25.715644 19.312273 29.760726 0.0 20112012.0 0.0 54.211111 1221.555556 220.555556 16.133333 125.111111 7.988889 281.777778 24.866667 124.222222 10.444444 607.222222 43.522222 204.444444 20.788889 604.111111 44.777778 617.444444 55.222222 89.000000 85.444444 42.333333 963.444444 63.111111 318.777778 8.222222 7.444444 7.433333 7.700000 7.500000 6.655556 6.911111 7.533333 7.033333 6.166667 6.844444 7.511111 7.577778 6.744444 7.077778 7.588889 12.0 12.0 11103.000000 1123.333333 2.555556 0.0 0.0 0.0 40.755398 -73.932306
30 31 457.500000 472.500000 452.500000 1382.500000 228.908654 355.111538 194.435000 450.787097 31.0 380.528319 13.251284 28.119729 19.995494 32.785641 0.0 20112012.0 0.0 38.310000 1850.600000 64.200000 2.860000 301.500000 16.710000 177.700000 10.300000 307.300000 18.270000 418.700000 22.170000 938.700000 48.810000 948.500000 52.090000 902.100000 47.910000 88.900000 91.300000 43.500000 1476.600000 88.900000 593.300000 7.800000 7.390000 7.460000 7.620000 7.210000 7.140000 7.390000 7.860000 6.530000 6.070000 6.780000 7.330000 7.200000 6.870000 7.200000 7.610000 12.0 12.0 10307.100000 1847.500000 5.000000 0.0 0.0 0.0 40.595680 -74.125726
31 32 371.500000 385.833333 362.166667 1119.500000 70.342949 100.179487 83.558333 105.333333 32.0 100.525613 4.450572 22.479804 17.936520 26.816013 0.0 20112012.0 0.0 82.866667 420.833333 79.500000 18.533333 67.500000 15.650000 6.333333 1.750000 82.000000 19.833333 326.166667 76.850000 3.833333 0.966667 217.833333 51.716667 203.000000 48.283333 75.666667 72.333333 34.666667 312.833333 22.333333 137.833333 8.350000 8.050000 7.883333 8.216667 7.050000 6.166667 6.900000 7.483333 6.766667 6.033333 6.783333 7.483333 7.383333 6.750000 7.200000 7.733333 12.0 12.0 11231.666667 381.500000 1.000000 0.0 0.0 0.0 40.696295 -73.917124
In [61]:
import numpy as np
from mpl_toolkits.basemap import Basemap
m = Basemap(
projection='merc',
llcrnrlat=40.496044,
urcrnrlat=40.915256,
llcrnrlon=-74.255735,
urcrnrlon=-73.700272,
resolution='f'
)

m.drawmapboundary(fill_color='#ABD1FF')
m.drawcoastlines(color='black', linewidth=.4)
m.drawrivers(color='#ABD1FF', linewidth=.4)
m.fillcontinents(color= '#F2F1EF',lake_color='#ABD1FF')
longitudes = districts['lon'].tolist()
latitudes = districts['lat'].tolist()

m.scatter(longitudes, latitudes, s=50, zorder=2, latlon=True, c=districts['saf_tot_11'], cmap='viridis')
plt.axis(aspect='equal')
plt.xlabel('longitude')
plt.ylabel('latitude')
plt.colorbar(label='Safety Score')
plt.title('New York City: Schools Safety Score')
plt.savefig('new_york_schools.png')
plt.show()


From the Map, we can observe that parts of Manhattan, Bronx and Queens have relatively higher safety scores. While Brooklyn has mostly lower safety scores.

Racial differences in SAT Scores¶

There are a few columns that indicate the percentage of each race at a given school:

• white_per
• asian_per
• black_per
• hispanic_per

By plotting out the correlations between these columns and sat_score, we can determine whether there are any racial differences in SAT performance. We will investigate racial differences in SAT scores by making a bar plot of the correlations between the columns above and sat_score.

In [23]:
races_per = ['white_per', 'asian_per', 'black_per', 'hispanic_per']
combined.corr()['sat_score'][races_per].plot.bar()
plt.show()


From the bar plot above we can notice that higher percentage of White or Asian students at a school correlates with higher SAT Scores. And vice versa for Black and Hispanic Students. It can be due to various external factors, like economic background of students, funding to Schools, etc.

We will continue exploration regarding race and SAT scores by makign a scatter plot of hispanic_per vs. sat_score.

In [24]:
combined.plot.scatter(x='hispanic_per', y='sat_score')
plt.show()


There seems to be a negative correlation between hispanic students percentage and SAT Score. This may be due to a lack of funding for schools in certain areas, which are more likely to have a higher percentage of black or hispanic students.

Filtering down on hispanic population by researching schools with a hispanic_per greater than 95%.

In [25]:
combined[combined['hispanic_per']>95]['SCHOOL NAME']

Out[25]:
44                         MANHATTAN BRIDGES HIGH SCHOOL
82      WASHINGTON HEIGHTS EXPEDITIONARY LEARNING SCHOOL
89     GREGORIO LUPERON HIGH SCHOOL FOR SCIENCE AND M...
125                  ACADEMY FOR LANGUAGE AND TECHNOLOGY
141                INTERNATIONAL SCHOOL FOR LIBERAL ARTS
176     PAN AMERICAN INTERNATIONAL HIGH SCHOOL AT MONROE
253                            MULTICULTURAL HIGH SCHOOL
286               PAN AMERICAN INTERNATIONAL HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

We can observe from our findings that the above schools are catering mostly to the immigrant community, who have recently migrated to USA from neighboring hispanic countries. And are learning English, which can be correlated to lower SAT Scores.

Researching schools with a hispanic_per less than 10% and an average SAT score greater than 1800.

In [26]:
combined[(combined['hispanic_per'] < 10) & (combined['sat_score'] > 1800)]['SCHOOL NAME']

Out[26]:
37                                STUYVESANT HIGH SCHOOL
151                         BRONX HIGH SCHOOL OF SCIENCE
187                       BROOKLYN TECHNICAL HIGH SCHOOL
327    QUEENS HIGH SCHOOL FOR THE SCIENCES AT YORK CO...
356                  STATEN ISLAND TECHNICAL HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

From our findings, we observe that the above schools are sepcialized in Science & Technology. Plus, have a criteria (clearing specific entrance exams) for admitting students, which explaining the High SAT Scores. But, there is no specific reason for low Hispanic proportion of students.

Gender differences in SAT Scores¶

There are two columns that indicate the percentage of each gender at a school:

• male_per
• female_per

We can plot out the correlations between each percentage and sat_score by making a bar plot.

In [27]:
gender_per = ['male_per', 'female_per']
combined.corr()['sat_score'][gender_per].plot.bar()
plt.show()


Here, from the bar plot we can observe that higher percentage of Female Students correlates positively with SAT Scores and vice versa for Male Students. However neither of the correlations are strong.

Investigating schools with high SAT scores and a high female_per by making a scatter plot of female_per vs. sat_score.

In [28]:
combined.plot.scatter(x='female_per', y='sat_score')
plt.show()


From the scatter plot above, there doesn't seem to be any strong correlation among the SAT Scores and Female Percentage.

Researching schools with a female_per greater than 60% and an average SAT score greater than 1700.

In [29]:
combined[(combined['sat_score'] > 1700) & (combined['female_per'] > 60)]['SCHOOL NAME']

Out[29]:
5                         BARD HIGH SCHOOL EARLY COLLEGE
26                         ELEANOR ROOSEVELT HIGH SCHOOL
60                                    BEACON HIGH SCHOOL
61     FIORELLO H. LAGUARDIA HIGH SCHOOL OF MUSIC & A...
302                          TOWNSEND HARRIS HIGH SCHOOL
Name: SCHOOL NAME, dtype: object

From our findings, we notice that the above schools have their major focus primarily on two things:

• Arts
• Preparing students for college

These appear to be very selective liberal arts schools that have high academic standards.

Conclusion:¶

There isn't enough evidence to claim that the SAT scores are unfair to certain groups. On surface, it looks like the SAT Scores are unfair. But on further investigation, we can see that there are multiple other factors (like immigration, funding, etc.) which may influence academic performance.

In [ ]: