This guided project shows how the gender gap in U.S. bachelor's degrees has changed over time, comparing fields of study from 1970 (the first year in the data) to 2011. The data was taken from The Department of Education Statistics and can be found here. Since this project was designed to be published, it uses the Color Blind 10 palette from Tableau, so that the line colors in the graphs are easy for everyone to tell apart.
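As a small illustration of how the palette is used: its colors are usually quoted as RGB values from 0 to 255, while matplotlib expects each channel as a float between 0 and 1. The sketch below converts the three Color Blind 10 colors that appear in this project's plotting code. The helper name `to_mpl_rgb` and the variable names are made up for this example.

```python
def to_mpl_rgb(r, g, b):
    """Scale 0-255 RGB channels to the 0-1 floats that matplotlib accepts."""
    return (r / 255, g / 255, b / 255)

# The three palette colors used in this project (names are illustrative)
cb_dark_blue = to_mpl_rgb(0, 107, 164)     # women's line
cb_orange = to_mpl_rgb(255, 128, 14)       # men's line
cb_light_gray = to_mpl_rgb(171, 171, 171)  # 50% reference line

print(cb_dark_blue, cb_orange, cb_light_gray)
```

These tuples can be passed directly as the `c` argument of `ax.plot` or `ax.axhline`.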
First, let's import the libraries we are going to need and read our data into a pandas DataFrame.
# displaying the graphs inline
%matplotlib inline
# importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt
# reading our data
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
women_degrees.head()
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
| 1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
| 2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
| 3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
| 4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
After this initial setup, we are going to display our data grouped by field of study. The groups are STEM (Science, Technology, Engineering, and Mathematics), liberal arts, and other.
# Creating the new plots
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences', 'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture','Business', 'Architecture']
# creating the plot figure
fig = plt.figure(figsize=(16, 20))
# generating the first column of the grid: STEM courses
for sp in range(0, 18, 3):
    # creating the category index
    cat_index = int(sp/3)
    # positioning each graph in a different place
    ax = fig.add_subplot(6, 3, sp+1)
    # creating two lines in the same graph, with special configuration (color and linewidth)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=(0, 107/255, 164/255), linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=(1, 128/255, 14/255), linewidth=3)
    # removing the spines
    for spine in ax.spines.values():
        spine.set_visible(False)
    # setting the same limits for all graphs
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    # reducing the amount of ticks in the y axis to improve readability
    ax.set_yticks([0, 100])
    # putting a horizontal line to show where the 50 is, also change the color and transparency (alpha)
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    # setting the titles for each graph
    ax.set_title(stem_cats[cat_index])
    # removing the ticks and bottom labels
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # annotating the legend of the lines in the first and last graph (women -> blue)
    if cat_index == 0:
        ax.text(2005, 83, 'Women')
        ax.text(2006, 11, 'Men')
    elif cat_index == 5:
        ax.text(2007, 88, 'Men')
        ax.text(2005, 6, 'Women')
        # setting the label bottom only in the last one
        ax.tick_params(labelbottom=True)
# generating the second column of the grid: liberal arts courses
for sp in range(1, 16, 3):
    # creating the category index
    cat_index = int((sp-1)/3)
    # positioning each graph in a different place
    ax = fig.add_subplot(6, 3, sp+1)
    # creating two lines in the same graph, with special configuration (color and linewidth)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=(0, 107/255, 164/255), linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=(1, 128/255, 14/255), linewidth=3)
    # removing the spines
    for spine in ax.spines.values():
        spine.set_visible(False)
    # setting the same limits for all graphs
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    # reducing the amount of ticks in the y axis to improve readability
    ax.set_yticks([0, 100])
    # putting a horizontal line to show where the 50 is, also change the color and transparency (alpha)
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    # setting the titles for each graph
    ax.set_title(lib_arts_cats[cat_index])
    # removing the ticks and bottom labels
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # annotating the legend of the lines in the first and last graph (women -> blue)
    if cat_index == 0:
        ax.text(2004, 75, 'Women')
        ax.text(2006, 21, 'Men')
    elif cat_index == 4:
        ax.text(2005, 55, 'Men')
        ax.text(2005, 40, 'Women')
        # setting the label bottom only in the last one
        ax.tick_params(labelbottom=True)
# generating the third column of the grid: other courses
for sp in range(2, 20, 3):
    # creating the category index
    cat_index = int((sp-2)/3)
    # positioning each graph in a different place
    ax = fig.add_subplot(6, 3, sp+1)
    # creating two lines in the same graph, with special configuration (color and linewidth)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=(0, 107/255, 164/255), linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=(1, 128/255, 14/255), linewidth=3)
    # removing the spines
    for spine in ax.spines.values():
        spine.set_visible(False)
    # setting the same limits for all graphs
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    # reducing the amount of ticks in the y axis to improve readability
    ax.set_yticks([0, 100])
    # putting a horizontal line to show where the 50 is, also change the color and transparency (alpha)
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    # setting the titles for each graph
    ax.set_title(other_cats[cat_index])
    # removing the ticks and bottom labels
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # annotating the legend of the lines in the first and last graph (women -> blue)
    if cat_index == 0:
        ax.text(2005, 89, 'Women')
        ax.text(2006, 7, 'Men')
    elif cat_index == 5:
        ax.text(2006, 62, 'Men')
        ax.text(2005, 34, 'Women')
        # setting the label bottom only in the last one
        ax.tick_params(labelbottom=True)
# saving our image - fig.savefig() also works
# plt.savefig('gender_degrees.png')
# showing the graphs
plt.show()
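The three loops above differ only in the category list and the grid column they fill, so the shared styling could be factored into one helper. The following is a minimal sketch of that idea, not the project's actual code: the function name `plot_column`, the color constants, and the `demo` DataFrame are assumptions made for illustration, and `demo` holds placeholder values rather than the real percentages. The 'Women' and 'Men' text labels are left out because their positions are specific to each graph.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

CB_DARK_BLUE = (0, 107/255, 164/255)
CB_ORANGE = (1, 128/255, 14/255)
CB_GRAY = (171/255, 171/255, 171/255)

def plot_column(fig, df, cats, col, n_rows=6, n_cols=3):
    """Draw one column of the grid (col is 0-based) and return its axes."""
    axes = []
    for row, cat in enumerate(cats):
        # subplot positions are 1-based and numbered row by row
        ax = fig.add_subplot(n_rows, n_cols, row * n_cols + col + 1)
        ax.plot(df['Year'], df[cat], c=CB_DARK_BLUE, linewidth=3)
        ax.plot(df['Year'], 100 - df[cat], c=CB_ORANGE, linewidth=3)
        for spine in ax.spines.values():
            spine.set_visible(False)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0, 100)
        ax.set_yticks([0, 100])
        ax.axhline(50, c=CB_GRAY, alpha=0.3)
        ax.set_title(cat)
        # show the year labels only on the last graph of the column
        is_last = row == len(cats) - 1
        ax.tick_params(bottom=False, top=False, left=False, right=False,
                       labelbottom=is_last)
        axes.append(ax)
    return axes

stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences', 'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture', 'Business', 'Architecture']

# Placeholder data only (NOT the real percentages), so the sketch runs without the CSV
years = np.arange(1970, 2012)
demo = pd.DataFrame({'Year': years})
for cat in stem_cats + lib_arts_cats + other_cats:
    demo[cat] = np.linspace(20, 60, len(years))

fig = plt.figure(figsize=(16, 20))
all_axes = []
for col, cats in enumerate([stem_cats, lib_arts_cats, other_cats]):
    all_axes += plot_column(fig, demo, cats, col)
```

With the real data, `demo` would be replaced by `women_degrees`, and the text annotations could be added afterwards on the returned axes.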
After plotting these graphs, we can see that the gap between women's and men's share of college degrees behaves differently across the categories.
Of all the fields shown here, just a couple still present a large gap: Computer Science and Engineering, both in the STEM category.
The fields that changed the most over time are Physical Sciences, Agriculture, Business, and Architecture, which started with a large gap in the early 1970s and had moved much closer to an even split by 2010.
This small analysis shows that we still need to understand why these two STEM fields (Computer Science and Engineering) have such a large gap. Only then can we identify the difficulties that discourage women from choosing these areas.
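One way to make "large gap" concrete is to measure it. Since the men's share is 100 minus the women's share w, the gap in percentage points is |2w - 100|. The sketch below applies this to the 1970 to 1974 rows shown in the table earlier, for four of the fields. The helper name `gender_gap` is an assumption for this example; calling it on the full `women_degrees` DataFrame with a later year would give the figures behind the observations above.

```python
import pandas as pd

# Values copied from the first five rows displayed by women_degrees.head()
sample = pd.DataFrame({
    'Year': [1970, 1971, 1972, 1973, 1974],
    'Computer Science': [13.6, 13.6, 14.9, 16.4, 18.9],
    'Engineering': [0.8, 1.0, 1.2, 1.6, 2.2],
    'Business': [9.064439, 9.503187, 10.558962, 12.804602, 16.204850],
    'Psychology': [44.4, 46.2, 47.6, 50.4, 52.6],
})

def gender_gap(df, year):
    """Absolute gap (percentage points) between women's and men's share, largest first."""
    row = df.loc[df['Year'] == year].drop(columns='Year').iloc[0]
    # women's share minus men's share = w - (100 - w) = 2w - 100
    return (2 * row - 100).abs().sort_values(ascending=False)

gaps_1970 = gender_gap(sample, 1970)
print(gaps_1970)
```

In 1970, Engineering has the largest gap of the four fields and Psychology the smallest, which matches what the corresponding graphs show at their left edge.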