The aim of this project is to investigate how the gender gap has changed over the years in 17 different degree courses.
The Department of Education Statistics released a dataset showing the percentage of bachelor's degrees awarded to women in 17 fields of study, from 1970 to 2011. This dataset was cleaned by Randal Olson, a data scientist from the University of Pennsylvania.
Main results:
Let's start by importing the required libraries and getting an initial overview of the data.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
# Read the dataset
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
# Explore the data
women_degrees.head()
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
# Summary statistics
women_degrees.iloc[:, 1:].describe()
| | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 |
mean | 33.848165 | 33.685540 | 61.100000 | 49.429864 | 40.653471 | 56.216667 | 25.809524 | 76.356236 | 12.892857 | 66.186680 | 71.723810 | 82.983333 | 44.478571 | 31.304762 | 68.776190 | 76.085714 | 45.407143 |
std | 12.552731 | 9.574057 | 1.305336 | 10.087725 | 13.116109 | 8.698610 | 6.688753 | 2.212641 | 5.670824 | 1.950990 | 1.926682 | 2.914967 | 2.646262 | 9.000866 | 9.705463 | 5.879504 | 4.763653 |
min | 4.229798 | 11.921005 | 58.600000 | 29.088363 | 9.064439 | 35.300000 | 13.600000 | 72.166525 | 0.800000 | 61.647206 | 69.000000 | 75.500000 | 38.000000 | 13.800000 | 44.400000 | 62.600000 | 36.100000 |
25% | 30.840814 | 28.520709 | 60.200000 | 44.311821 | 37.390851 | 55.125000 | 19.125000 | 74.994573 | 10.625000 | 65.583807 | 70.125000 | 81.825000 | 42.875000 | 24.875000 | 65.550000 | 74.625000 | 43.825000 |
50% | 33.317552 | 35.994852 | 61.300000 | 50.971469 | 47.210123 | 59.850000 | 27.300000 | 75.937020 | 14.100000 | 66.112018 | 71.150000 | 83.700000 | 44.900000 | 32.100000 | 72.750000 | 77.450000 | 45.300000 |
75% | 45.663953 | 40.790605 | 62.000000 | 58.679194 | 48.876139 | 62.125000 | 29.775000 | 78.619420 | 16.950000 | 67.861247 | 73.875000 | 85.175000 | 46.500000 | 40.200000 | 76.925000 | 81.100000 | 49.375000 |
max | 50.037182 | 44.499331 | 63.400000 | 62.169456 | 50.552335 | 64.600000 | 37.100000 | 79.618625 | 19.000000 | 68.894487 | 75.300000 | 86.500000 | 48.300000 | 42.200000 | 77.800000 | 82.100000 | 51.800000 |
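Before plotting, it can be worth confirming that the table is complete and that every value is a valid percentage. The sketch below is an illustrative check and not part of the original analysis. It assumes a small hypothetical frame `sample`, built from two columns of the first rows shown above, and a hypothetical helper named `check_percentages`.

```python
import pandas as pd

# Hypothetical sample: the first three rows of two degree columns from the table above.
sample = pd.DataFrame({
    'Year': [1970, 1971, 1972],
    'Agriculture': [4.229798, 5.452797, 7.420710],
    'Engineering': [0.8, 1.0, 1.2],
})

def check_percentages(df, year_col='Year'):
    '''Return True if no value is missing and every degree column lies in [0, 100].'''
    values = df.drop(columns=year_col)
    no_missing = not df.isna().any().any()
    in_range = bool(((values >= 0) & (values <= 100)).all().all())
    return no_missing and in_range

print(check_percentages(sample))
```

The same function could be called on `women_degrees` after loading the CSV, assuming its first column is named `Year` as in the output above.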
Now we will plot a line chart of the gender gap for each degree over the years. The first column of the figure shows the STEM (science, technology, engineering, and maths) degrees, the second shows the liberal arts degrees, and the last shows the remaining courses. Within each column, the degrees are ordered by descending share of degrees awarded to women in 2011. Then we will be able to answer questions like:
# STEM degrees
stem_raw = ['Biology', 'Computer Science', 'Engineering', 'Math and Statistics', 'Physical Sciences', 'Psychology']
# Liberal Arts degrees
lib_arts_raw = ['Art and Performance', 'Communications and Journalism', 'English', 'Foreign Languages', 'Social Sciences and History']
# Other degrees
others_raw = ['Agriculture', 'Architecture', 'Business', 'Education', 'Health Professions', 'Public Administration']
# Order the degrees by descending women proportion in 2011.
def order_degrees(subject_category):
    '''
    Function that orders the degree names of each subject category in descending order of women graduate percentages
    in 2011, which represents the final year of the data.
    '''
    women_2011 = []
    for degree in subject_category:
        # The last row of the dataset holds the 2011 values.
        proportion_2011 = women_degrees[degree].iloc[-1]
        women_2011.append([degree, proportion_2011])
    women_2011 = sorted(women_2011, key = lambda x: x[1], reverse = True)
    return [item[0] for item in women_2011]
stem = order_degrees(stem_raw)
lib_arts = order_degrees(lib_arts_raw)
others = order_degrees(others_raw)
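The same ordering can also be obtained with pandas alone. As a hedged alternative sketch (not the code used in this project), the last row can be selected and sorted with `sort_values`. Here `sample` and `order_by_last_year` are hypothetical names, and the values are the 1973 and 1974 rows shown earlier rather than the 2011 figures.

```python
import pandas as pd

# Hypothetical sample: 1973 and 1974 values for four degrees from the first table.
sample = pd.DataFrame({
    'Year': [1973, 1974],
    'Biology': [31.147915, 32.996183],
    'Computer Science': [16.4, 18.9],
    'Engineering': [1.6, 2.2],
    'Psychology': [50.4, 52.6],
})

def order_by_last_year(df, degrees):
    '''Order degree names by the share of women in the last row, highest first.'''
    last_row = df[degrees].iloc[-1]
    return last_row.sort_values(ascending=False).index.tolist()

ordered = order_by_last_year(sample, ['Engineering', 'Biology', 'Psychology', 'Computer Science'])
print(ordered)
```

Sorting a single row as a Series avoids building the intermediate list of pairs, at the cost of being slightly less explicit than the loop.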
# Save the color-blind-friendly colors.
dark_blue = (0/255,107/255,164/255)
orange = (255/255, 128/255, 14/255)
grey = (171/255, 171/255, 171/255)
# Create the figure, then define a function that draws one column of plots.
fig = plt.figure(figsize = (18, 24))
def plot_gender_gap(subject_category, plot_positions):
    '''
    Function that plots line charts of percentages of men and women graduates from a range of degrees from 1970 to 2011.
    '''
    # Index of the bottom plot in this column (4 for liberal arts, 5 otherwise).
    last = len(subject_category) - 1
    for i in range(len(subject_category)):
        ax = fig.add_subplot(6, 3, plot_positions[i])
        ax.plot(women_degrees['Year'], women_degrees[subject_category[i]], c = dark_blue, label = 'Women', linewidth = 3)
        ax.plot(women_degrees['Year'], 100 - women_degrees[subject_category[i]], c = orange, label = 'Men', linewidth = 3)
        for key, spine in ax.spines.items():
            spine.set_visible(False)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0, 100)
        ax.set_yticks([0, 100])
        ax.set_title(subject_category[i])
        # Use booleans: recent matplotlib versions no longer interpret the strings 'on'/'off'.
        ax.tick_params(bottom = False, top = False, left = False, right = False, labelbottom = False)
        ax.axhline(50, c = grey, alpha = 0.3)
        # Text annotations
        if subject_category == stem:
            if i == 0:
                ax.text(2003, 15, 'Men')
                ax.text(2002, 83, 'Women')
            elif i == last:
                ax.text(2003, 88, 'Men')
                ax.text(2001, 10, 'Women')
                ax.tick_params(labelbottom = True)
        elif subject_category == lib_arts:
            if i == 0:
                ax.text(2003, 20, 'Men')
                ax.text(2002, 80, 'Women')
            elif i == last:
                # This column has only five plots, so the original check i == 5 never matched.
                ax.tick_params(labelbottom = True)
        else:
            if i == 0:
                ax.text(2003, 5, 'Men')
                ax.text(2002, 90, 'Women')
            elif i == last:
                ax.text(2003, 62, 'Men')
                ax.text(2002, 35, 'Women')
                ax.tick_params(labelbottom = True)
# Plot the line charts
stem_positions = [1, 4, 7, 10, 13, 16]
plot_gender_gap(stem, stem_positions)
lib_arts_positions = [2, 5, 8, 11, 14]
plot_gender_gap(lib_arts, lib_arts_positions)
others_positions = [3, 6, 9, 12, 15, 18]
plot_gender_gap(others, others_positions)
plt.savefig('gender_gap_plot.png')
plt.show()
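The position lists above follow from how `add_subplot(6, 3, k)` numbers a grid with three columns: row by row, starting at 1, so column `c` (counting from 0) holds positions `c + 1`, `c + 4`, `c + 7`, and so on. As a small sketch, with `column_positions` as a hypothetical helper that is not part of the original notebook, the lists could be generated rather than typed by hand:

```python
def column_positions(column, n_plots, n_cols=3):
    '''Subplot indices (1-based, row-major) for one column of a grid with n_cols columns.'''
    return [row * n_cols + column + 1 for row in range(n_plots)]

stem_positions = column_positions(0, 6)
lib_arts_positions = column_positions(1, 5)
others_positions = column_positions(2, 6)
print(stem_positions, lib_arts_positions, others_positions)
```

This keeps the layout correct if a category gains or loses a degree, as long as the number of rows passed to `add_subplot` is adjusted to match.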
Liberal Arts Observations: Liberal Arts is the most consistent category in terms of the gender gap. Art and Performance and English have remained almost constant and lean towards women, who on average represent 61% and 66% of graduates respectively. Communications and Journalism has also been almost static, with a women majority of about 60% since 1978, but before that men were the majority, with 65% in 1970; it is the only degree in this category to see a significant change of leading gender. Foreign Languages has been gradually narrowing the gap, with the share of women falling from 74% in 1970 to 69% in 2011. Finally, the gender gap has completely disappeared in Social Sciences and History, where men made up 63% of graduates in 1970.
STEM Observations: STEM fields mostly lean towards men. Four of the six degrees considered have men majorities, and among them the largest differences are in Computer Science and Engineering. In 1970, Computer Science had a very large share of men, about 86%; the gap then closed sharply until 1982, but since then it has been widening again and has almost returned to its initial distribution. In Engineering, although the share of women graduates has risen over time, men remain a heavy majority. The other two degrees with men majorities are Physical Sciences, where the gap closed very fast from 1970 to 2000 and then remained static with men at about 58%, and Math and Statistics, which has not seen drastic changes and where men lead with around 55%. The remaining two currently have women majorities. Psychology had nearly equal shares in 1972 and has since seen a steady increase in the share of women, reaching 78% in 2011. Lastly, Biology is the only one where the leading gender has changed: the share of women has risen steadily to around 60% of graduates, compared with only 29% in 1970.
Others Observations: The remaining six courses divide evenly into two tendencies. Three have had consistent gender gaps with heavy women majorities: Health Professions, Public Administration, and Education, with mean shares of 83%, 76%, and 76% respectively. The other three have seen large increases in the share of women, leading to equal representation in Agriculture and Business, up from 4% and 9% in 1970, and to a similar distribution of genders in Architecture.
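Statements about when a degree changed its leading gender can be checked directly against the data. The following is a minimal sketch, assuming a hypothetical helper `first_majority_year` and a `sample` frame that holds only the Psychology and Engineering values for 1970 to 1974 shown in the first table.

```python
import pandas as pd

# Hypothetical sample: 1970 to 1974 values copied from the first table.
sample = pd.DataFrame({
    'Year': [1970, 1971, 1972, 1973, 1974],
    'Psychology': [44.4, 46.2, 47.6, 50.4, 52.6],
    'Engineering': [0.8, 1.0, 1.2, 1.6, 2.2],
})

def first_majority_year(df, degree, threshold=50.0):
    '''Return the first year in which the share of women exceeds the threshold, or None.'''
    above = df.loc[df[degree] > threshold, 'Year']
    return int(above.iloc[0]) if not above.empty else None

print(first_majority_year(sample, 'Psychology'))
print(first_majority_year(sample, 'Engineering'))
```

Applied to the full `women_degrees` frame, the same function would give the crossover year for any degree, assuming the rows are sorted by year as in the source file.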