The gender gap in college education is not a new topic. It is often reported on in the news, and not everyone agrees that a gap exists.
The National Center for Education Statistics, part of the U.S. Department of Education, releases the Digest of Education Statistics annually. The data set that we are going to explore contains the percentage of bachelor's degrees granted to women from 1970 to 2012 and is broken up into 17 categories of degrees, with each column as a separate category.
Randal Olson, a data scientist at University of Pennsylvania, has cleaned the data set and made it available for download on his personal website.
In this project, we'll explore how we can communicate the nuanced narrative of the gender gap using effective data visualization.
Before anything else, let's import the required libraries to set up the environment. The %matplotlib inline magic command is run so that the plots are displayed in the Jupyter notebook itself.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
Read percent-bachelors-degrees-women-usa.csv into pandas and assign the resulting DataFrame to a variable named women_degrees.
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
Let's read the first row of the dataset to get an idea of the data in each column.
women_degrees.iloc[0] # Read the first row of the dataset
Year                             1970.000000
Agriculture                         4.229798
Architecture                       11.921005
Art and Performance                59.700000
Biology                            29.088363
Business                            9.064439
Communications and Journalism      35.300000
Computer Science                   13.600000
Education                          74.535328
Engineering                         0.800000
English                            65.570923
Foreign Languages                  73.800000
Health Professions                 77.100000
Math and Statistics                38.000000
Physical Sciences                  13.800000
Psychology                         44.400000
Public Administration              68.400000
Social Sciences and History        36.800000
Name: 0, dtype: float64
Then let's look at the top five rows of the dataset to understand how the data is structured.
women_degrees.head() #Read top five rows of the dataset
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
| 1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
| 2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
| 3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
| 4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
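The dataset represents the percentage of bachelor's degrees granted to women in 17 disciplines from 1970 to 2012 and is organized in ascending order of year.
The degree percentages are stored as floating-point numbers. The Year column holds whole numbers, as the table shows; it appears as 1970.000000 in the single-row output because selecting one row converts every value to a common float64 type.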
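The column types can be checked directly rather than inferred from the printed output. The sketch below is only an illustration: it rebuilds two rows and three columns from the table above in memory (the names `sample_csv` and `sample` are not part of the original notebook), so it runs without the CSV file.

```python
import io

import pandas as pd

# Two rows and three columns copied from the table above, kept small for brevity
sample_csv = io.StringIO(
    "Year,Agriculture,Engineering\n"
    "1970,4.229798,0.8\n"
    "1971,5.452797,1.0\n"
)
sample = pd.read_csv(sample_csv)

# One dtype per column: Year is parsed as an integer, the percentages as floats
column_dtypes = sample.dtypes

# Selecting a single row yields a Series with one common dtype for all values
first_row_dtype = sample.iloc[0].dtype

print(column_dtypes)
print(first_row_dtype)
```

On the full data set, the same check would be `women_degrees.dtypes`.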
Set the line colors for the line charts to the dark blue and orange from the Color Blind 10 palette.
#The color for each line is assigned here as an RGB tuple scaled to the 0-1 range
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
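The tuples above are RGB values rescaled from the 0-255 range to the 0-1 range that Matplotlib accepts for color tuples. As a small sketch, a hypothetical helper named `rgb255_to_unit` (not part of the original notebook) makes that conversion explicit:

```python
def rgb255_to_unit(red, green, blue):
    """Convert 0-255 RGB components to the 0-1 floats used for color tuples."""
    return (red / 255, green / 255, blue / 255)


# Same two colors as above, built through the helper
cb_dark_blue = rgb255_to_unit(0, 107, 164)
cb_orange = rgb255_to_unit(255, 128, 14)

print(cb_dark_blue)
print(cb_orange)
```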
There are seventeen degrees for which we need to generate line charts. We can group them into STEM, liberal arts, and other, in the following way:
#The disciplines in each category are assigned here
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance',
                 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture', 'Business', 'Architecture']
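Because the grouping is typed by hand, it is worth confirming that every degree column lands in exactly one group. The following sketch does that check against the column names shown in the table above; `expected_columns`, `missing`, and `duplicated` are names introduced here for illustration.

```python
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology',
             'Physical Sciences', 'Math and Statistics']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism',
                 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education',
              'Agriculture', 'Business', 'Architecture']

# Degree columns as they appear in the table header (Year excluded)
expected_columns = [
    'Agriculture', 'Architecture', 'Art and Performance', 'Biology', 'Business',
    'Communications and Journalism', 'Computer Science', 'Education',
    'Engineering', 'English', 'Foreign Languages', 'Health Professions',
    'Math and Statistics', 'Physical Sciences', 'Psychology',
    'Public Administration', 'Social Sciences and History',
]

all_cats = stem_cats + lib_arts_cats + other_cats

# Columns that no group covers, and names that appear in more than one group
missing = set(expected_columns) - set(all_cats)
duplicated = {name for name in all_cats if all_cats.count(name) > 1}

print(len(all_cats), missing, duplicated)
```

With the real DataFrame, `expected_columns` could instead be taken from `women_degrees.columns[1:]`.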
As we have already categorized the seventeen degrees into three groups, and the largest group contains six degrees, we now generate a 6-row by 3-column grid of subplots. Then we generate line charts for both the male and female percentages for every degree in each category and add text annotations for Women and Men in the topmost and bottommost plots.
#Generate a figure for all plots
fig = plt.figure(figsize=(18, 30))
#Generate the line charts for STEM degrees
for sp1 in range(0,6): #Loop over the six STEM degrees, one subplot each
    ax1 = fig.add_subplot(6,3,(sp1*3)+1) #Add the subplot to the first column of the figure
    ax1.plot(women_degrees['Year'], women_degrees[stem_cats[sp1]], c=cb_dark_blue, label='Women', linewidth=3)
    ax1.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp1]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the first column
    if sp1 == 0:
        ax1.text(2008, 87, 'Men')
        ax1.text(2006, 10, 'Women')
    elif sp1 == 5:
        ax1.text(2008, 61, 'Men')
        ax1.text(2006, 37, 'Women')
    #Set the title of each plot in the STEM category
    ax1.set_title(stem_cats[sp1])
#Generate line charts for Liberal arts degrees
for sp2 in range(0,5): #Loop over the five Liberal arts degrees, one subplot each
    ax2 = fig.add_subplot(6,3,(sp2*3)+2) #Add the subplot to the second column of the figure
    ax2.plot(women_degrees['Year'], women_degrees[lib_arts_cats[sp2]], c=cb_dark_blue, label='Women', linewidth=3)
    ax2.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[sp2]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the second column
    if sp2 == 0:
        ax2.text(2008, 24, 'Men')
        ax2.text(2006, 74, 'Women')
    elif sp2 == 4:
        ax2.text(2008, 54, 'Men')
        ax2.text(2006, 43, 'Women')
    #Set the title of each plot in the Liberal arts category
    ax2.set_title(lib_arts_cats[sp2])
#Generate line charts for other degrees
for sp3 in range(0,6): #Loop over the six other degrees, one subplot each
    ax3 = fig.add_subplot(6,3,(sp3*3)+3) #Add the subplot to the third column of the figure
    ax3.plot(women_degrees['Year'], women_degrees[other_cats[sp3]], c=cb_dark_blue, label='Women', linewidth=3)
    ax3.plot(women_degrees['Year'], 100-women_degrees[other_cats[sp3]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the third column
    if sp3 == 0:
        ax3.text(2008, 11, 'Men')
        ax3.text(2006, 87, 'Women')
    elif sp3 == 5:
        ax3.text(2008, 62, 'Men')
        ax3.text(2006, 36, 'Women')
    #Set the title of each plot in the Other category
    ax3.set_title(other_cats[sp3])
#Display all figures
plt.show()
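The third argument to add_subplot in the loops above is a 1-based position counted row by row, which is why each column uses (sp*3)+1, (sp*3)+2, or (sp*3)+3. A minimal sketch of that arithmetic, using a hypothetical helper `subplot_index` that is not part of the original notebook:

```python
def subplot_index(row, col, ncols=3):
    """Return the 1-based, row-major position of a cell in a subplot grid."""
    return row * ncols + col + 1


# Positions used by each column of the 6 x 3 grid
first_column = [subplot_index(row, 0) for row in range(6)]   # STEM
second_column = [subplot_index(row, 1) for row in range(5)]  # Liberal arts
third_column = [subplot_index(row, 2) for row in range(6)]   # Other

print(first_column)
print(second_column)
print(third_column)
```

The liberal arts column has only five degrees, so position 17 (bottom of the middle column) is left empty.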
To improve readability, let's remove the non-data elements (tick marks and spines) from the plots.
#Generate a figure for all plots
fig = plt.figure(figsize=(18, 30))
#Generate line charts for STEM degrees
for sp1 in range(0,6): #Loop over the six STEM degrees, one subplot each
    ax1 = fig.add_subplot(6,3,(sp1*3)+1) #Add the subplot to the first column of the figure
    ax1.plot(women_degrees['Year'], women_degrees[stem_cats[sp1]], c=cb_dark_blue, label='Women', linewidth=3)
    ax1.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp1]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the first column
    if sp1 == 0:
        ax1.text(2008, 87, 'Men')
        ax1.text(2006, 10, 'Women')
    elif sp1 == 5:
        ax1.text(2008, 61, 'Men')
        ax1.text(2006, 37, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the first column
    #(booleans are used because string values such as "off" are not treated as False)
    ax1.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp1 == 5:
        ax1.tick_params(labelbottom=True)
    #Remove the spines on all four sides of every plot in the first column
    ax1.spines["top"].set_visible(False)
    ax1.spines["right"].set_visible(False)
    ax1.spines["bottom"].set_visible(False)
    ax1.spines["left"].set_visible(False)
    #Set the title of each plot in the STEM category
    ax1.set_title(stem_cats[sp1])
#Generate line charts for Liberal arts degrees
for sp2 in range(0,5): #Loop over the five Liberal arts degrees, one subplot each
    ax2 = fig.add_subplot(6,3,(sp2*3)+2) #Add the subplot to the second column of the figure
    ax2.plot(women_degrees['Year'], women_degrees[lib_arts_cats[sp2]], c=cb_dark_blue, label='Women', linewidth=3)
    ax2.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[sp2]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the second column
    if sp2 == 0:
        ax2.text(2008, 24, 'Men')
        ax2.text(2006, 74, 'Women')
    elif sp2 == 4:
        ax2.text(2008, 54, 'Men')
        ax2.text(2006, 43, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the second column
    ax2.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp2 == 4:
        ax2.tick_params(labelbottom=True)
    #Remove the spines on all four sides of every plot in the second column
    ax2.spines["top"].set_visible(False)
    ax2.spines["right"].set_visible(False)
    ax2.spines["bottom"].set_visible(False)
    ax2.spines["left"].set_visible(False)
    #Set the title of each plot in the Liberal arts category
    ax2.set_title(lib_arts_cats[sp2])
#Generate line charts for other degrees
for sp3 in range(0,6): #Loop over the six other degrees, one subplot each
    ax3 = fig.add_subplot(6,3,(sp3*3)+3) #Add the subplot to the third column of the figure
    ax3.plot(women_degrees['Year'], women_degrees[other_cats[sp3]], c=cb_dark_blue, label='Women', linewidth=3)
    ax3.plot(women_degrees['Year'], 100-women_degrees[other_cats[sp3]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the third column
    if sp3 == 0:
        ax3.text(2008, 8, 'Men')
        ax3.text(2006, 90, 'Women')
    elif sp3 == 5:
        ax3.text(2008, 62, 'Men')
        ax3.text(2006, 36, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the third column
    ax3.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp3 == 5:
        ax3.tick_params(labelbottom=True)
    #Remove the spines on all four sides of every plot in the third column
    ax3.spines["top"].set_visible(False)
    ax3.spines["right"].set_visible(False)
    ax3.spines["bottom"].set_visible(False)
    ax3.spines["left"].set_visible(False)
    #Set the title of each plot in the Other category
    ax3.set_title(other_cats[sp3])
#Display all figures
plt.show()
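The tick and spine clean-up is identical in all three loops, so it could be collected into one function. The sketch below assumes a hypothetical helper named `strip_non_data_ink` (not part of the original notebook) and passes booleans to tick_params, which is what recent Matplotlib releases document for these switches. The `matplotlib.use("Agg")` line is only there so the sketch runs outside a notebook; it can be dropped when `%matplotlib inline` is active.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def strip_non_data_ink(ax, show_x_labels=False):
    """Hide tick marks and all four spines; optionally keep the x-axis labels."""
    ax.tick_params(bottom=False, top=False, left=False, right=False,
                   labelbottom=show_x_labels)
    for side in ("top", "right", "bottom", "left"):
        ax.spines[side].set_visible(False)


# Two stacked axes: only the bottom one keeps its x-axis labels
fig, axes = plt.subplots(2, 1)
strip_non_data_ink(axes[0])
strip_non_data_ink(axes[1], show_x_labels=True)

spines_hidden = all(
    not spine.get_visible() for ax in axes for spine in ax.spines.values()
)
print(spines_hidden)
```

Inside the loops above, the call would look like `strip_non_data_ink(ax1, show_x_labels=(sp1 == 5))`.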
Let's simplify the y-axis labels by keeping just the starting and ending labels (0 and 100). Let's also limit the x-axis range to improve the readability of the plots.
#Generate a figure for all plots
fig = plt.figure(figsize=(18, 30))
#Generate line charts for STEM degrees
for sp1 in range(0,6): #Loop over the six STEM degrees, one subplot each
    ax1 = fig.add_subplot(6,3,(sp1*3)+1) #Add the subplot to the first column of the figure
    ax1.plot(women_degrees['Year'], women_degrees[stem_cats[sp1]], c=cb_dark_blue, label='Women', linewidth=3)
    ax1.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp1]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the first column
    if sp1 == 0:
        ax1.text(2008, 87, 'Men')
        ax1.text(2006, 10, 'Women')
    elif sp1 == 5:
        ax1.text(2008, 61, 'Men')
        ax1.text(2006, 37, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the first column
    ax1.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp1 == 5:
        ax1.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax1.set_yticks([0,100])
    #Remove the spines on all four sides of every plot in the first column
    ax1.spines["top"].set_visible(False)
    ax1.spines["right"].set_visible(False)
    ax1.spines["bottom"].set_visible(False)
    ax1.spines["left"].set_visible(False)
    #Set the title of each plot in the STEM category
    ax1.set_title(stem_cats[sp1])
    #Limit the x-axis range of the first column
    ax1.set_xlim(1968,2012)
#Generate line charts for Liberal arts degrees
for sp2 in range(0,5): #Loop over the five Liberal arts degrees, one subplot each
    ax2 = fig.add_subplot(6,3,(sp2*3)+2) #Add the subplot to the second column of the figure
    ax2.plot(women_degrees['Year'], women_degrees[lib_arts_cats[sp2]], c=cb_dark_blue, label='Women', linewidth=3)
    ax2.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[sp2]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the second column
    if sp2 == 0:
        ax2.text(2008, 24, 'Men')
        ax2.text(2006, 74, 'Women')
    elif sp2 == 4:
        ax2.text(2008, 54, 'Men')
        ax2.text(2006, 43, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the second column
    ax2.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp2 == 4:
        ax2.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax2.set_yticks([0,100])
    #Remove the spines on all four sides of every plot in the second column
    ax2.spines["top"].set_visible(False)
    ax2.spines["right"].set_visible(False)
    ax2.spines["bottom"].set_visible(False)
    ax2.spines["left"].set_visible(False)
    #Set the title of each plot in the Liberal arts category
    ax2.set_title(lib_arts_cats[sp2])
    #Limit the x-axis range of the second column
    ax2.set_xlim(1968,2012)
#Generate line charts for other degrees
for sp3 in range(0,6): #Loop over the six other degrees, one subplot each
    ax3 = fig.add_subplot(6,3,(sp3*3)+3) #Add the subplot to the third column of the figure
    ax3.plot(women_degrees['Year'], women_degrees[other_cats[sp3]], c=cb_dark_blue, label='Women', linewidth=3)
    ax3.plot(women_degrees['Year'], 100-women_degrees[other_cats[sp3]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the third column
    if sp3 == 0:
        ax3.text(2008, 8, 'Men')
        ax3.text(2006, 90, 'Women')
    elif sp3 == 5:
        ax3.text(2008, 62, 'Men')
        ax3.text(2006, 36, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the third column
    ax3.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp3 == 5:
        ax3.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax3.set_yticks([0,100])
    #Remove the spines on all four sides of every plot in the third column
    ax3.spines["top"].set_visible(False)
    ax3.spines["right"].set_visible(False)
    ax3.spines["bottom"].set_visible(False)
    ax3.spines["left"].set_visible(False)
    #Set the title of each plot in the Other category
    ax3.set_title(other_cats[sp3])
    #Limit the x-axis range of the third column
    ax3.set_xlim(1968,2012)
#Display all figures
plt.show()
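To see what set_yticks and set_xlim do in isolation, here is a minimal sketch on a single chart. It plots only the five Engineering values shown in the table earlier, and the variable names are illustrative. The `matplotlib.use("Agg")` line is only needed when running outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Engineering percentages for 1970-1974, copied from the table earlier
years = [1970, 1971, 1972, 1973, 1974]
engineering_women = [0.8, 1.0, 1.2, 1.6, 2.2]

fig, ax = plt.subplots()
ax.plot(years, engineering_women)

# Keep tick labels only at the two ends of the percentage scale
ax.set_yticks([0, 100])
# Start the x-axis slightly before the first year so the line does not touch the edge
ax.set_xlim(1968, 2012)

y_ticks = [float(tick) for tick in ax.get_yticks()]
x_limits = tuple(float(value) for value in ax.get_xlim())
print(y_ticks, x_limits)
```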
Let's draw a horizontal line across each line chart where the y-axis label 50 would have been. This helps show which degrees have close to a 50-50 gender breakdown. In addition, we export the figure containing all of the line charts to a .png file.
#Generate a figure for all plots
fig = plt.figure(figsize=(18, 30))
#Generate line charts for STEM degrees
for sp1 in range(0,6): #Loop over the six STEM degrees, one subplot each
    ax1 = fig.add_subplot(6,3,(sp1*3)+1) #Add the subplot to the first column of the figure
    ax1.plot(women_degrees['Year'], women_degrees[stem_cats[sp1]], c=cb_dark_blue, label='Women', linewidth=3)
    ax1.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp1]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the first column
    if sp1 == 0:
        ax1.text(2008, 87, 'Men')
        ax1.text(2006, 10, 'Women')
    elif sp1 == 5:
        ax1.text(2008, 61, 'Men')
        ax1.text(2006, 37, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the first column
    ax1.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp1 == 5:
        ax1.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax1.set_yticks([0,100])
    #Draw a light grey horizontal reference line at the y-axis position 50
    ax1.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    #Remove the spines on all four sides of every plot in the first column
    ax1.spines["top"].set_visible(False)
    ax1.spines["right"].set_visible(False)
    ax1.spines["bottom"].set_visible(False)
    ax1.spines["left"].set_visible(False)
    #Set the title of each plot in the STEM category
    ax1.set_title(stem_cats[sp1])
    #Limit the x-axis range of the first column
    ax1.set_xlim(1968,2012)
#Generate line charts for Liberal arts degrees
for sp2 in range(0,5): #Loop over the five Liberal arts degrees, one subplot each
    ax2 = fig.add_subplot(6,3,(sp2*3)+2) #Add the subplot to the second column of the figure
    ax2.plot(women_degrees['Year'], women_degrees[lib_arts_cats[sp2]], c=cb_dark_blue, label='Women', linewidth=3)
    ax2.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[sp2]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the second column
    if sp2 == 0:
        ax2.text(2008, 24, 'Men')
        ax2.text(2006, 74, 'Women')
    elif sp2 == 4:
        ax2.text(2008, 54, 'Men')
        ax2.text(2006, 43, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the second column
    ax2.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp2 == 4:
        ax2.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax2.set_yticks([0,100])
    #Draw a light grey horizontal reference line at the y-axis position 50
    ax2.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    #Remove the spines on all four sides of every plot in the second column
    ax2.spines["top"].set_visible(False)
    ax2.spines["right"].set_visible(False)
    ax2.spines["bottom"].set_visible(False)
    ax2.spines["left"].set_visible(False)
    #Set the title of each plot in the Liberal arts category
    ax2.set_title(lib_arts_cats[sp2])
    #Limit the x-axis range of the second column
    ax2.set_xlim(1968,2012)
#Generate line charts for other degrees
for sp3 in range(0,6): #Loop over the six other degrees, one subplot each
    ax3 = fig.add_subplot(6,3,(sp3*3)+3) #Add the subplot to the third column of the figure
    ax3.plot(women_degrees['Year'], women_degrees[other_cats[sp3]], c=cb_dark_blue, label='Women', linewidth=3)
    ax3.plot(women_degrees['Year'], 100-women_degrees[other_cats[sp3]], c=cb_orange, label='Men', linewidth=3)
    #Add text annotations for Women and Men in the topmost and bottommost plots of the third column
    if sp3 == 0:
        ax3.text(2008, 8, 'Men')
        ax3.text(2006, 90, 'Women')
    elif sp3 == 5:
        ax3.text(2008, 62, 'Men')
        ax3.text(2006, 36, 'Women')
    #Remove all tick marks, and hide the x-axis labels on every chart except the bottommost one in the third column
    ax3.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    if sp3 == 5:
        ax3.tick_params(labelbottom=True)
    #Keep only the y-axis labels at 0 and 100
    ax3.set_yticks([0,100])
    #Draw a light grey horizontal reference line at the y-axis position 50
    ax3.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
    #Remove the spines on all four sides of every plot in the third column
    ax3.spines["top"].set_visible(False)
    ax3.spines["right"].set_visible(False)
    ax3.spines["bottom"].set_visible(False)
    ax3.spines["left"].set_visible(False)
    #Set the title of each plot in the Other category
    ax3.set_title(other_cats[sp3])
    #Limit the x-axis range of the third column
    ax3.set_xlim(1968,2012)
#Export the figure containing all of the line charts to "gender_degrees.png"
plt.savefig('gender_degrees.png')
#Display all figures
plt.show()
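The reference line and the export step can also be tried on a single chart. The sketch below plots the five Psychology values from the table earlier, which cross the 50 mark in 1973, and writes the PNG to an in-memory buffer instead of a file so that nothing is left on disk. The buffer and variable names are illustrative, and `matplotlib.use("Agg")` is only needed outside a notebook.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Psychology percentages for 1970-1974, copied from the table earlier
years = [1970, 1971, 1972, 1973, 1974]
psychology_women = [44.4, 46.2, 47.6, 50.4, 52.6]

fig, ax = plt.subplots()
ax.plot(years, psychology_women, label='Women')
ax.plot(years, [100 - pct for pct in psychology_women], label='Men')

# Light grey line marking an even 50-50 split
reference_line = ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)

# Save to memory; passing a filename such as 'gender_degrees.png' works the same way
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

# PNG files start with a fixed signature, which confirms the export format
is_png = png_bytes.startswith(b"\x89PNG")
print(is_png, len(png_bytes) > 0)
```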
According to the line charts above, we can see that:
The gender gap in the majority of the degrees has decreased over the period, and degrees such as Social Sciences and History, Agriculture and Business have narrowed the gap to almost zero. However, a significant gender gap still exists in most of the degrees.
The dominant gender in most of the degrees remains the same, while in a few degrees, such as Psychology, Biology and Communications and Journalism, it has changed during the period. Furthermore, women are the dominant gender in most of the degrees.
Overall, liberal arts is the category with the lowest gender gap.
Considering all of the above observations, we can conclude that there is still significant gender inequality in college degrees, even after many initiatives to reduce gender gaps in higher education.
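The observations above are read off the charts by eye. One way to put a number on them is to define the gap as the absolute difference between the women's and men's percentages. The sketch below applies that definition to a few 1970 values from the first row shown earlier, since the later rows are not reproduced in this write-up; `gender_gap` is a hypothetical helper, not part of the original notebook.

```python
def gender_gap(women_pct):
    """Absolute difference, in percentage points, between women's and men's shares."""
    men_pct = 100 - women_pct
    return abs(women_pct - men_pct)


# Share of degrees awarded to women in 1970, copied from the first row of the data
women_1970 = {
    'Engineering': 0.8,
    'Psychology': 44.4,
    'Health Professions': 77.1,
    'Math and Statistics': 38.0,
}

# Gap per degree, rounded to one decimal place
gaps_1970 = {degree: round(gender_gap(pct), 1) for degree, pct in women_1970.items()}

# Degree closest to an even split among the four listed
most_balanced = min(gaps_1970, key=gaps_1970.get)

print(gaps_1970)
print(most_balanced)
```

Applying the same function to the last row of women_degrees would give the gaps at the end of the period, which is the comparison the conclusions above rely on.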