The gender gap in college education is not a new topic. It is reported on often in the news, yet not everyone agrees that a gap exists.
The U.S. Department of Education's National Center for Education Statistics releases the Digest of Education Statistics annually. The data set that we are going to explore contains the percentage of bachelor's degrees granted to women from 1970 to 2012, broken up into 17 categories of degrees, with each category in a separate column.
Randal Olson, a data scientist at the University of Pennsylvania, has cleaned the data set and made it available on his personal website, where it can be downloaded.
In this mission, we'll explore how to communicate the nuanced narrative of the gender gap using effective data visualization.
Before anything else, let's import the required libraries to set up the environment. We run %matplotlib inline so that the plots are displayed in the Jupyter notebook itself.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
Read percent-bachelors-degrees-women-usa.csv into pandas and assign the resulting DataFrame to a variable named women_degrees.
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
Let's read the first row of the data set to get an idea of the data in each column.
women_degrees.iloc[0] # Read the first row of the dataset
Year                             1970.000000
Agriculture                         4.229798
Architecture                       11.921005
Art and Performance                59.700000
Biology                            29.088363
Business                            9.064439
Communications and Journalism      35.300000
Computer Science                   13.600000
Education                          74.535328
Engineering                         0.800000
English                            65.570923
Foreign Languages                  73.800000
Health Professions                 77.100000
Math and Statistics                38.000000
Physical Sciences                  13.800000
Psychology                         44.400000
Public Administration              68.400000
Social Sciences and History        36.800000
Name: 0, dtype: float64
Next, let's look at the top five rows of the data set to understand how the data is structured.
women_degrees.head() #Read top five rows of the dataset
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
| 1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
| 2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
| 3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
| 4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
The data set gives the percentage of bachelor's degrees granted to women in 17 disciplines from 1970 to 2012, with the rows in ascending order by year.
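The percentages are stored as floating-point numbers. The Year column is displayed as a whole number in the table above, which suggests it is stored as an integer; it shows up as a float in the single-row output only because pandas converts a mixed row to one common type.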
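The float64 reported for a single row does not necessarily describe every column. As a minimal sketch, assuming the Year column is parsed as an integer (the head() output suggests this, but the text does not confirm it), the small hand-built frame below shows how pandas upcasts a mixed row to float64. The `sample` frame is illustrative only and uses a few values copied from the output above.

```python
import pandas as pd

# Two rows copied from the head() output above; only three columns are kept for brevity.
sample = pd.DataFrame({
    'Year': [1970, 1971],
    'Agriculture': [4.229798, 5.452797],
    'Engineering': [0.8, 1.0],
})

# Column-wise dtypes: Year is an integer column, the percentages are floats.
print(sample.dtypes)

# A single row mixes integer and float values, so pandas upcasts it to float64.
first_row = sample.iloc[0]
print(first_row.dtype)
```

Running `women_degrees.dtypes` on the full data set is the direct way to check how each column is actually stored.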
Set the colors for the line charts to the dark blue and the orange from the Color Blind 10 palette.
#The color for each line is assigned here
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
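Matplotlib accepts RGB colors as tuples of floats between 0 and 1, which is why each 0-255 component above is divided by 255. As a small sketch, the division can be wrapped in a helper; the name `rgb255_to_unit` is hypothetical and not part of matplotlib.

```python
def rgb255_to_unit(red, green, blue):
    """Scale 0-255 RGB components to the 0-1 floats that matplotlib accepts."""
    return (red / 255, green / 255, blue / 255)

# Same two Color Blind 10 colors as above, built through the helper.
cb_dark_blue = rgb255_to_unit(0, 107, 164)
cb_orange = rgb255_to_unit(255, 128, 14)

print(cb_dark_blue)
print(cb_orange)
```

This keeps the familiar 0-255 values visible in the code while still passing matplotlib the scaled tuple.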
There are seventeen degrees for which we need to generate line charts. To keep the charts organized, we group the degrees into STEM, liberal arts, and other, in the following way:
# The disciplines in each category are assigned here
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance',
'Social Sciences and History'
]
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture','Business', 'Architecture']
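With seventeen names spread over three hand-typed lists, it is easy to drop or repeat a degree. The following is a minimal sanity check, not part of the original analysis: it compares the three lists against the column names shown in the head() output. The `all_degrees` list is typed out here so the snippet runs on its own; with the data loaded, `women_degrees.columns[1:]` could presumably be used instead.

```python
# Degree columns as they appear in the head() output (Year excluded).
all_degrees = [
    'Agriculture', 'Architecture', 'Art and Performance', 'Biology', 'Business',
    'Communications and Journalism', 'Computer Science', 'Education', 'Engineering',
    'English', 'Foreign Languages', 'Health Professions', 'Math and Statistics',
    'Physical Sciences', 'Psychology', 'Public Administration',
    'Social Sciences and History',
]

stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology',
             'Physical Sciences', 'Math and Statistics']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism',
                 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education',
              'Agriculture', 'Business', 'Architecture']

grouped = stem_cats + lib_arts_cats + other_cats

# Degrees that were left out of every group, and degrees listed more than once.
missing = sorted(set(all_degrees) - set(grouped))
duplicated = sorted({name for name in grouped if grouped.count(name) > 1})

print('missing:', missing)
print('duplicated:', duplicated)
```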
Since the seventeen degrees are now in three groups and the largest group holds six degrees, we generate a 6-row by 3-column grid of subplots. We then generate line charts of both the male and female percentages for every degree in each category (the male percentage is 100 minus the female percentage), and add Women and Men text annotations to the topmost and bottommost plots of each column.
To reduce clutter, let's remove the non-data elements from the plots. Let's also simplify the y-axis labels by keeping just the starting and ending labels (0 and 100), and limit the x-axis values to improve the readability of the plots.
Let's also draw a horizontal line across each line chart where the y-axis label 50 would have been. This makes it easier to see which degrees are close to a 50-50 gender breakdown. Finally, we export the figure containing all of the line charts to a .png file.
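For the 6-by-3 grid described above, the plotting code places each chart with `fig.add_subplot(6,3,3*ct+col)`. Subplot positions are numbered from 1, left to right and then top to bottom, so a 0-based row `ct` and a 1-based column `col` map to position `3*ct+col`. The sketch below only prints that mapping; the helper name `subplot_index` is made up for illustration.

```python
N_ROWS, N_COLS = 6, 3

def subplot_index(row, col):
    """Return the 1-based subplot position for a 0-based row and a 1-based column."""
    return N_COLS * row + col

# Print the grid of positions, one list per row of subplots.
for row in range(N_ROWS):
    print([subplot_index(row, col) for col in range(1, N_COLS + 1)])
```

Each column of the printed grid corresponds to one category, which is why the outer loop runs over columns and the inner loop over rows.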
# Generate a figure to hold all of the plots
fig = plt.figure(figsize=(18, 30))
# Collect the three categories in a list, one per column
cats = [stem_cats, lib_arts_cats, other_cats]
# Loop over the three columns of subplots
for col in range(1, 4):
    # Loop over the degrees in the current category, one per row
    for ct in range(0, len(cats[col-1])):
        ax = fig.add_subplot(6, 3, 3*ct+col)
        # Generate line charts for each degree (men = 100 - women)
        ax.plot(women_degrees['Year'], women_degrees[cats[col-1][ct]], c=cb_dark_blue, label='Women', linewidth=3)
        ax.plot(women_degrees['Year'], 100-women_degrees[cats[col-1][ct]], c=cb_orange, label='Men', linewidth=3)
        # Set the title of each plot to the degree name
        ax.set_title(cats[col-1][ct])
        # Keep just the y-axis labels at 0 and 100
        ax.set_yticks([0, 100])
        # Draw a horizontal line at the y-axis position 50
        ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
        # Add text annotations for Women and Men in the topmost and bottommost plots of all columns
        if (col == 1) and (ct == 0):
            ax.text(2008, 87, 'Men')
            ax.text(2006, 10, 'Women')
        if (col == 1) and (ct == 5):
            ax.text(2008, 61, 'Men')
            ax.text(2006, 37, 'Women')
        if (col == 2) and (ct == 0):
            ax.text(2008, 24, 'Men')
            ax.text(2006, 74, 'Women')
        if (col == 2) and (ct == 4):
            ax.text(2008, 54, 'Men')
            ax.text(2006, 43, 'Women')
        if (col == 3) and (ct == 0):
            ax.text(2008, 8, 'Men')
            ax.text(2006, 90, 'Women')
        if (col == 3) and (ct == 5):
            ax.text(2008, 62, 'Men')
            ax.text(2006, 36, 'Women')
        # Remove all four spines
        for spine in ax.spines.values():
            spine.set_visible(False)
        # Remove all tick marks and hide the x-axis labels
        # (booleans are used because recent matplotlib versions no longer accept "off"/"on")
        ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
        # Show the x-axis labels only on the bottommost chart of each column
        if ct == len(cats[col-1]) - 1:
            ax.tick_params(labelbottom=True)
        # Limit the x-axis values
        ax.set_xlim(1968, 2012)
# Export the figure containing all of the line charts to "gender_degrees.png"
plt.savefig('gender_degrees.png')
# Display the figure
plt.show()
According to the line charts above, we can see that:
The gender gap has narrowed in the majority of degrees over the period, and in degrees such as Social Sciences and History, Agriculture, and Business it has narrowed to almost zero. However, a significant gender gap still exists in most degrees.
The dominant gender has stayed the same in most degrees, but it has changed during the period in a few, such as Psychology, Biology, and Communications and Journalism. Furthermore, women are the dominant gender in most of the degrees.
Overall, liberal arts is the category with the smallest gender gap.
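The observations above are read off the charts by eye. One way to put a number on "gender gap", offered here as an assumption rather than the metric used in this analysis, is the absolute difference in percentage points between the women's share and the men's share. The sketch below applies it to a few 1970 values copied from the first row shown earlier; `gender_gap` is a hypothetical helper.

```python
def gender_gap(women_pct):
    """Absolute gap, in percentage points, between the women's and men's shares."""
    men_pct = 100 - women_pct
    return abs(women_pct - men_pct)

# 1970 percentages of degrees granted to women, taken from the first row of the data set.
women_1970 = {'Engineering': 0.8, 'Psychology': 44.4, 'Education': 74.535328}

for degree, pct in women_1970.items():
    print(degree, round(gender_gap(pct), 1))
```

A gap of 0 corresponds to the 50-50 line drawn across each chart, and applying the same function to the last row of the data set would show how far each degree has moved toward it.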
Taking all of these observations together, we can conclude that significant gender inequality in college degrees remains, even after many initiatives to reduce gender gaps in higher education.