Dataset Description: The Department of Education Statistics releases a data set annually containing the percentage of bachelor's degrees granted to women from 1970 to 2012. The data set is broken up into 17 categories of degrees, with each column as a separate category. It was compiled to explore the gender gap in STEM (science, technology, engineering, and mathematics) fields. This gap is often reported in the news, and not everyone agrees that it exists. Here is the link to the dataset
**Aim**: Create easy-to-understand plots from this dataset that expose the gender disparity across the various college degrees.
# let's import the libraries and read the dataset into a dataframe:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
women_degrees.head()
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
women_degrees.tail()
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
37 | 2007 | 47.605026 | 43.100459 | 61.4 | 59.411993 | 49.000459 | 62.5 | 17.6 | 78.721413 | 16.8 | 67.874923 | 70.2 | 85.4 | 44.1 | 40.7 | 77.1 | 82.1 | 49.3 |
38 | 2008 | 47.570834 | 42.711730 | 60.7 | 59.305765 | 48.888027 | 62.4 | 17.8 | 79.196327 | 16.5 | 67.594028 | 70.2 | 85.2 | 43.3 | 40.7 | 77.2 | 81.7 | 49.4 |
39 | 2009 | 48.667224 | 43.348921 | 61.0 | 58.489583 | 48.840474 | 62.8 | 18.1 | 79.532909 | 16.8 | 67.969792 | 69.3 | 85.1 | 43.3 | 40.7 | 77.1 | 82.0 | 49.4 |
40 | 2010 | 48.730042 | 42.066721 | 61.3 | 59.010255 | 48.757988 | 62.5 | 17.6 | 79.618625 | 17.2 | 67.928106 | 69.0 | 85.0 | 43.1 | 40.2 | 77.0 | 81.7 | 49.3 |
41 | 2011 | 50.037182 | 42.773438 | 61.2 | 58.742397 | 48.180418 | 62.2 | 18.2 | 79.432812 | 17.5 | 68.426730 | 69.5 | 84.8 | 43.1 | 40.1 | 76.7 | 81.9 | 49.2 |
For each field of study, the dataset gives the percentage of bachelor's degrees granted to women in each year from 1970 to 2011.
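As a quick sanity check on that description, the sketch below confirms the year range and that every value is a valid percentage. It is only an illustration: `sample` is a hypothetical stand-in holding three columns copied from the first and last rows shown above, so that the check runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for women_degrees: three columns copied from the
# head() and tail() output above, so the check runs without the CSV.
sample = pd.DataFrame({
    "Year": [1970, 1971, 2010, 2011],
    "Engineering": [0.8, 1.0, 17.2, 17.5],
    "Computer Science": [13.6, 13.6, 17.6, 18.2],
    "Psychology": [44.4, 46.2, 77.0, 76.7],
})

first_year, last_year = sample["Year"].min(), sample["Year"].max()

# Every non-Year column should hold a percentage between 0 and 100.
shares = sample.drop(columns="Year")
all_valid = bool(((shares >= 0) & (shares <= 100)).all().all())

print(first_year, last_year, all_valid)
```

On the real data, the same three statements can be run with `women_degrees` in place of `sample`.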
Let's first look at the STEM category for the gender gap analysis.
## STEM includes 'Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', and 'Math and Statistics':
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
fig = plt.figure(figsize=(18, 3))
for sp in range(0, 6):
    ax = fig.add_subplot(1, 6, sp+1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[sp]], c=cb_dark_blue, label='Women', linewidth=3)
    # men's share is the complement of women's share
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    ax.set_title(stem_cats[sp])
    # pass booleans: the "off"/"on" strings used by older matplotlib releases are not honoured by recent ones
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    # label the lines only on the leftmost and rightmost charts
    if sp == 0:
        ax.text(2005, 87, 'Men')
        ax.text(2002, 8, 'Women')
    elif sp == 5:
        ax.text(2005, 62, 'Men')
        ax.text(2001, 35, 'Women')
plt.show()
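The per-panel steps in the loop above (plot the women's share, plot the men's share as its complement, hide the spines and ticks, fix the axis limits) could be folded into one helper. This is a minimal sketch rather than part of the original notebook: `plot_gender_lines` is a hypothetical name, the demo uses four Engineering values copied from the tables above, and the non-interactive Agg backend is selected only so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

CB_DARK_BLUE = (0/255, 107/255, 164/255)  # women
CB_ORANGE = (255/255, 128/255, 14/255)    # men

def plot_gender_lines(ax, years, women_pct, title):
    """Hypothetical helper: draw women's and men's share on one axes."""
    men_pct = [100 - w for w in women_pct]  # the two shares sum to 100
    ax.plot(years, women_pct, c=CB_DARK_BLUE, label="Women", linewidth=3)
    ax.plot(years, men_pct, c=CB_ORANGE, label="Men", linewidth=3)
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    ax.set_title(title)
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    return ax

# Demo with Engineering values taken from the tables above.
years = [1970, 1974, 2007, 2011]
engineering = [0.8, 2.2, 16.8, 17.5]
fig, ax = plt.subplots(figsize=(3, 3))
plot_gender_lines(ax, years, engineering, "Engineering")
```

With such a helper, each loop body would reduce to one call plus the text annotations.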
## Let's visualise the data in other categories too :
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences',
'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism',
'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education',
'Agriculture','Business', 'Architecture']
cb_dark_blue = (0/255,107/255,164/255) #For women
cb_orange = (255/255, 128/255, 14/255) #For men
fig = plt.figure(figsize=(18, 18))
for sp in range(0, 18):
    ax = fig.add_subplot(6, 3, sp+1)
    # pass booleans: the "off"/"on" strings used by older matplotlib releases are not honoured by recent ones
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    ## for STEM category:
    if (sp % 3 == 0):
        loc = sp // 3  # row index within the STEM column
        ax.plot(women_degrees['Year'], women_degrees[stem_cats[loc]], c=cb_dark_blue, label='Women', linewidth=3)
        ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[loc]], c=cb_orange, label='Men', linewidth=3)
        ax.set_title(stem_cats[loc])
        # text annotations
        if loc == 0:
            ax.text(2003, 85, 'Women')
            ax.text(2005, 10, 'Men')
        elif loc == 5:
            ax.text(2005, 87, 'Men')
            ax.text(2003, 7, 'Women')
            ax.tick_params(labelbottom=True) # enabling the x-axis labels for the bottommost chart
        ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Generating a horizontal line of reference
    ## for Liberal arts category:
    # when (sp = 16) we reach the end of liberal arts category, hence we need to skip that plot
    elif ((sp % 3 == 1) and (sp != 16)):
        loc = (sp-1) // 3  # row index within the liberal arts column
        ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[loc]], c=cb_dark_blue, label='Women', linewidth=3)
        ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[loc]], c=cb_orange, label='Men', linewidth=3)
        ax.set_title(lib_arts_cats[loc])
        # text annotations
        if loc == 0:
            ax.text(2003, 78, 'Women')
            ax.text(2005, 18, 'Men')
        elif loc == 4:
            ax.tick_params(labelbottom=True) # enabling the x-axis labels for the bottommost chart
        ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Generating a horizontal line of reference
    ## for Other category:
    elif (sp % 3 == 2):
        loc = (sp-2) // 3  # row index within the Other column
        ax.plot(women_degrees['Year'], women_degrees[other_cats[loc]], c=cb_dark_blue, label='Women', linewidth=3)
        ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[loc]], c=cb_orange, label='Men', linewidth=3)
        ax.set_title(other_cats[loc])
        # text annotations
        if loc == 0:
            ax.text(2003, 90, 'Women')
            ax.text(2005, 5, 'Men')
        elif loc == 5:
            ax.text(2005, 62, 'Men')
            ax.text(2003, 30, 'Women')
            ax.tick_params(labelbottom=True) # enabling the x-axis labels for the bottommost chart
        ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Generating a horizontal line of reference
    ## for (sp == 16): no category left for this slot, so nothing is drawn
    else:
        ax.plot()
    # styling shared by every panel
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0, 100)
    ax.set_yticks([0, 100]) # simplifying the Y-axis labels
# Export file before calling pyplot.show()
fig.savefig("gender_degrees.png")
plt.show()
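The index arithmetic in the loop above decides which category lands in which of the 18 grid slots: the column is `sp % 3`, the row is `sp // 3`, and slot 16 stays empty because the liberal arts list has only five entries. The mapping can be sketched in plain Python; the function name `category_for_slot` is hypothetical and not part of the notebook.

```python
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences',
             'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism',
                 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education',
              'Agriculture', 'Business', 'Architecture']

def category_for_slot(sp):
    """Return (group, category) for slot sp of the 6x3 grid, or None if blank."""
    columns = [("STEM", stem_cats),
               ("Liberal arts", lib_arts_cats),
               ("Other", other_cats)]
    group, cats = columns[sp % 3]  # the column picks the category list
    row = sp // 3                  # the row picks the entry within that list
    if row >= len(cats):           # happens only where a list runs out
        return None
    return group, cats[row]

layout = [category_for_slot(sp) for sp in range(18)]
blank_slots = [sp for sp, item in enumerate(layout) if item is None]
print(blank_slots)  # → [16]
```

Looking the category up this way would also let the three near-identical plotting branches share one code path.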
- Psychology and Biology: Both fields were initially dominated by men, but women now earn the majority of bachelor's degrees in each.
- Math and Statistics: Though the field is dominated by men, the gender disparity isn't as large.
- Physical Sciences: The trend lines have moved closer together from what was, at first, a huge disparity.
- Computer Science: Though the trend lines came close during the 90's, the gender gap in this field has grown again in the 2000's, nearly returning to the values it had during the 80's.
- Engineering: The field has always been dominated by men, which may point to either a lack of interest or a stereotype that is deep-rooted in our society.
- Social Sciences and History: Almost no gender disparity.
- Health Professions, Public Administration and Education: Women earn a large majority of the degrees in all three (roughly 79% to 85% in 2011, per the table above).
- Architecture: The only field in the Other category with just a slight gender disparity.
- Agriculture, Business: These fields have almost no gender disparity, as the trend lines meet.

We see that overall, the STEM category holds the highest gender disparity amongst all the categories, with Engineering and Computer Science having the largest gender gap.
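To put numbers on that closing comparison, the sketch below ranks the fields by the signed gap (men's share minus women's share) in 2011. It is an illustration rather than part of the original analysis: the values are copied from the last row of the table above, and the names `women_2011` and `gap_2011` are introduced here only for this example.

```python
import pandas as pd

# Women's share of bachelor's degrees in 2011, copied from the last row
# of the tail() output above.
women_2011 = pd.Series({
    "Agriculture": 50.037182, "Architecture": 42.773438,
    "Art and Performance": 61.2, "Biology": 58.742397,
    "Business": 48.180418, "Communications and Journalism": 62.2,
    "Computer Science": 18.2, "Education": 79.432812,
    "Engineering": 17.5, "English": 68.426730,
    "Foreign Languages": 69.5, "Health Professions": 84.8,
    "Math and Statistics": 43.1, "Physical Sciences": 40.1,
    "Psychology": 76.7, "Public Administration": 81.9,
    "Social Sciences and History": 49.2,
})

# Men's share is 100 - women's share, so (men - women) = 100 - 2 * women.
# Positive: men are the majority. Negative: women are the majority.
gap_2011 = (100 - 2 * women_2011).sort_values(ascending=False)

print(gap_2011.round(1))
```

By this measure, Engineering and Computer Science show the widest gaps in which men are the majority in 2011. Negative values mark fields where women are the majority, with Health Professions at the other end of the ranking.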