To determine whether a gender gap exists in certain academic fields, we will examine the percentage of college degrees in those fields that were earned by women.
The data originally comes from the Department of Education Statistics. It contains the percentage of bachelor's degrees granted to women from 1970 to 2012, broken into 17 degree categories, with each category in its own column.
Randal Olson, a data scientist at the University of Pennsylvania, has cleaned the data set and made it available on his personal website. Here's a preview of the first few rows:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
women_degrees.head()
 | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
A gap between the numbers of women and men in the fields of Science, Technology, Engineering and Math (STEM) has frequently been reported in the news. How severe is it, and has it improved over time? Let's take a look by producing line charts showing the breakdown of college degrees earned by women versus men in STEM fields. Our dataset doesn't include the percentage of degrees earned by men explicitly, but we can derive it by subtracting the percentage for women from 100. We will order our line charts by decreasing gender gap.
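As a minimal sketch of that derivation, the snippet below computes the men's share as 100 minus the women's share and ranks categories by the absolute difference between the two. The helper name `rank_by_gap` is hypothetical, and the sample frame holds only the 1970 STEM values copied from the preview rows, so its ranking reflects 1970 alone and need not match the order used in the charts.

```python
import pandas as pd

# 1970 values copied from the preview rows (STEM columns only)
sample = pd.DataFrame({
    'Year': [1970],
    'Engineering': [0.8],
    'Computer Science': [13.6],
    'Psychology': [44.4],
    'Biology': [29.088363],
    'Physical Sciences': [13.8],
    'Math and Statistics': [38.0],
})

def rank_by_gap(df, year, categories):
    """Hypothetical helper: gender gap per category in one year, largest first."""
    women = df.loc[df['Year'] == year, categories].iloc[0]
    men = 100 - women              # men's share is the complement of women's share
    gap = (men - women).abs()      # size of the gap, regardless of direction
    return gap.sort_values(ascending=False)

stem_columns = [col for col in sample.columns if col != 'Year']
gaps_1970 = rank_by_gap(sample, 1970, stem_columns)
print(gaps_1970)
```

With the full dataset loaded as `women_degrees`, the same helper could be called with a later year to see how the ranking shifts.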
# STEM categories in order of decreasing gender gap
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
fig = plt.figure(figsize=(18, 3))
for sp in range(0,6):
    ax = fig.add_subplot(1,6,sp+1)
    # Create line chart showing percent of women earning degrees in blue with slightly heavier line than default
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[sp]], c='blue', label='Women', linewidth=3)
    # Create line chart showing percent of men earning degrees in red with slightly heavier line than default.
    # To get the percent of men earning degrees, we just subtract the percent of women from 100
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp]], c='red', label='Men', linewidth=3)
    # Remove all spines on charts
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    # Set x-axis limits to cover the years charted, with a small margin on the left
    ax.set_xlim(1968, 2011)
    # Set y-axis limits to 0 and 100 since we are charting percentages
    ax.set_ylim(0,100)
    # Set title to current STEM category
    ax.set_title(stem_cats[sp])
    # Remove tick marks (booleans rather than the string "off", which is truthy and would not hide the ticks)
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    # Label lines in first and last charts at specific x,y points
    if sp == 0:
        ax.text(2005, 87, 'Men')
        ax.text(2002, 8, 'Women')
    elif sp == 5:
        ax.text(2005, 62, 'Men')
        ax.text(2001, 35, 'Women')
plt.show()
These line charts show that the gender gap in STEM fields narrowed considerably over the four decades in our dataset. By 2012, nearly as many Math and Statistics degrees were granted to women as to men. The gender gap also narrowed significantly in the Physical Sciences, and it actually flipped in Psychology and Biology, which became female-dominated. Engineering continues to narrow its gender gap, but very slowly. Unfortunately, Computer Science bucks the positive trend shown in the other STEM fields: it has reversed the gains it made in the 1980s, and its gender gap widened in the last decade of the dataset.
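One way to check claims like these numerically is to compare each field's first value, last value, and peak. The sketch below is only illustrative: `summarize_trend` is a hypothetical helper, and the sample frame holds just the Computer Science values from the five preview rows (1970 to 1974), so it cannot reproduce the rise in the 1980s or the later decline described above.

```python
import pandas as pd

# Computer Science values from the five preview rows only (1970-1974)
preview = pd.DataFrame({
    'Year': [1970, 1971, 1972, 1973, 1974],
    'Computer Science': [13.6, 13.6, 14.9, 16.4, 18.9],
})

def summarize_trend(df, category):
    """Hypothetical helper: first, last, and peak share of degrees earned by women."""
    series = df.set_index('Year')[category]
    return {
        'first': series.iloc[0],
        'last': series.iloc[-1],
        'peak': series.max(),
        'peak_year': int(series.idxmax()),
        # True when the most recent value sits below the historical peak
        'below_peak': bool(series.iloc[-1] < series.max()),
    }

cs_trend = summarize_trend(preview, 'Computer Science')
print(cs_trend)
```

Run against the full `women_degrees` frame, a field whose `below_peak` is true and whose `peak_year` falls well before the final year would be a candidate for the kind of reversal described for Computer Science.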
Now let's take a look at all the degrees in this dataset. We'll split them into three columns: STEM majors, liberal arts majors and majors in other fields, and we'll rank them in descending order of percentage of degrees earned by women.
Reducing Non-Data Ink
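In keeping with the principles of elegant design advocated by Edward Tufte, we will eliminate as much non-data ink as possible in our data visualization where appropriate: removing the chart spines and tick marks, limiting the y-axis labels to 0 and 100, showing x-axis labels only on the lowest chart in each column, and labeling the lines directly on the charts.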
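A minimal sketch of these reductions as a reusable helper, assuming a standard matplotlib `Axes` object. The function name `strip_non_data_ink` is hypothetical and not part of this notebook; the sample values are the Engineering percentages from the preview rows.

```python
import matplotlib.pyplot as plt

def strip_non_data_ink(ax):
    """Hypothetical helper: hide spines and tick marks, keep only 0 and 100 on the y-axis."""
    for spine in ax.spines.values():
        spine.set_visible(False)
    # Hide the tick marks themselves; tick labels are left in place
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    # Percentages run from 0 to 100, so label only the two endpoints
    ax.set_ylim(0, 100)
    ax.set_yticks([0, 100])

fig, ax = plt.subplots()
# Engineering values from the preview rows (1970-1974)
ax.plot([1970, 1971, 1972, 1973, 1974], [0.8, 1.0, 1.2, 1.6, 2.2], linewidth=3)
strip_non_data_ink(ax)
```

Collecting the calls in one function keeps the styling identical across every subplot in a grid.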
Accessibility
We also need to make sure that our visualizations can be consumed by as wide an audience as possible. Across the world, color blindness affects 8% of all men and 0.5% of women. So for accessibility, we'll use colorblind-friendly colors from the Tableau Colorblind 10 color palette.
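matplotlib expects RGB color tuples with each channel scaled to the range 0 to 1, while palettes such as this one are usually published as 0 to 255 values. As a small sketch, the hypothetical helper `to_unit_rgb` below does that scaling in one place; the RGB triples are the ones used in this notebook's own color definitions.

```python
def to_unit_rgb(rgb255):
    """Hypothetical helper: scale an (R, G, B) tuple from 0-255 to the 0-1 range."""
    return tuple(channel / 255 for channel in rgb255)

# RGB triples taken from the color definitions used in this notebook
cb_dark_blue = to_unit_rgb((0, 107, 164))
cb_orange = to_unit_rgb((255, 128, 14))
cb_gray = to_unit_rgb((171, 171, 171))
print(cb_dark_blue, cb_orange, cb_gray)
```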
# Colorblind-friendly dark blue and orange
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
# College majors in descending order of percentage earned by women
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences', 'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture','Business', 'Architecture']
fig = plt.figure(figsize=(15, 18))
for r in range(0,6):
    for c in range(0,3):
        # Skip the empty grid cell: there are only five liberal arts categories
        if c == 1 and r == 5:
            continue
        ax = fig.add_subplot(6,3,(r*3)+c+1)
        # First column STEM degrees
        if c == 0:
            ax.plot(women_degrees['Year'], women_degrees[stem_cats[r]], c=cb_dark_blue, label='Women', linewidth=3)
            ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[r]], c=cb_orange, label='Men', linewidth=3)
            ax.set_title(stem_cats[r])
        # Second column liberal arts degrees
        elif c == 1:
            ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[r]], c=cb_dark_blue, label='Women', linewidth=3)
            ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[r]], c=cb_orange, label='Men', linewidth=3)
            ax.set_title(lib_arts_cats[r])
        # Third column other degrees
        elif c == 2:
            ax.plot(women_degrees['Year'], women_degrees[other_cats[r]], c=cb_dark_blue, label='Women', linewidth=3)
            ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[r]], c=cb_orange, label='Men', linewidth=3)
            ax.set_title(other_cats[r])
        ax.spines["right"].set_visible(False)
        ax.spines["left"].set_visible(False)
        ax.spines["top"].set_visible(False)
        ax.spines["bottom"].set_visible(False)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0,100)
        ax.tick_params(bottom=False, top=False, left=False, right=False)
        # Limit y-axis labels to just 0 and 100
        ax.set_yticks([0,100])
        # Add midpoint line in a colorblind-friendly shade of gray
        ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3)
        # Add line labels at an x,y point
        if r == 0:
            # Psychology
            if c == 0:
                ax.text(2000, 85, 'Women')
                ax.text(2002, 10, 'Men')
            # Foreign Languages
            elif c == 1:
                ax.text(1998, 78, 'Women')
                ax.text(2002, 20, 'Men')
            # Health Professions
            elif c == 2:
                ax.text(2000, 92, 'Women')
                ax.text(2002, 5, 'Men')
        elif r == 5:
            # Engineering
            if c == 0:
                ax.text(2005, 90, 'Men')
                ax.text(2000, 7, 'Women')
            # Architecture
            elif c == 2:
                ax.text(2005, 68, 'Men')
                ax.text(2002, 30, 'Women')
        # Remove x-axis labels in all but lowest charts
        if (c == 0 and r < 5) or (c == 1 and r < 4) or (c == 2 and r < 5):
            ax.tick_params(labelbottom=False)
# Save plot area as .png
plt.savefig('gender_degrees.png')
plt.show()
The line charts above illustrate that the gender gap has stayed relatively constant in several of the liberal arts majors, while changing significantly in selected majors in the Other category. Foreign Languages, English, Art and Performance and Education still have about the same majority of female graduates as they did in the 1970s. In contrast, women have gained the most ground in Agriculture, Business and Architecture, narrowing the gender gap to almost nothing in these fields. Women continue to gain ground in Health Professions and Public Administration. The liberal arts field of Communications and Journalism joins Psychology and Biology in now being female-dominated, whereas the opposite was true in 1970.