Almost everyone talks about gender inequality. While some base the discussion on general opinions and what they have heard elsewhere, others base it on solid data. Here we take the second approach and look at gender inequality/equality through a dataset on the percentage of bachelor's degrees granted to women from 1970 to 2012.
The Institute of Education Sciences (IES) is the statistics, research, and evaluation arm of the U.S. Department of Education. The Department annually releases a data set containing the percentage of bachelor's degrees granted to women since 1970. The data set is broken up into 17 categories of degrees, with each column as a separate category.
Randal Olson, a data scientist at the University of Pennsylvania, has cleaned the data set and made it available on his personal website. You can download the dataset Randal compiled here.
We use this extensive data set to tell the nuanced story of the gender gap through effective data visualization.
So let's get started. We begin by importing the libraries needed for the visualization and reading the data set.
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
# to show the plots within the notebook
%matplotlib inline
# Reading the data set
women_degrees = pd.read_csv(r'C:\Users\Surface GO\Downloads\percent-bachelors-degrees-women-usa.csv')
Now that the data set has been loaded, let us go ahead and check its contents.
print(women_degrees.columns)
women_degrees.head(5)
Index(['Year', 'Agriculture', 'Architecture', 'Art and Performance', 'Biology', 'Business', 'Communications and Journalism', 'Computer Science', 'Education', 'Engineering', 'English', 'Foreign Languages', 'Health Professions', 'Math and Statistics', 'Physical Sciences', 'Psychology', 'Public Administration', 'Social Sciences and History'], dtype='object')
| Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
We can see that there are 17 majors listed here.
'Agriculture', 'Architecture', 'Art and Performance', 'Biology', 'Business', 'Communications and Journalism', 'Computer Science', 'Education', 'Engineering', 'English', 'Foreign Languages', 'Health Professions', 'Math and Statistics', 'Physical Sciences', 'Psychology', 'Public Administration', 'Social Sciences and History'
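Beyond printing the columns, a few quick checks can confirm that the data looks the way we expect. The following is a minimal sketch, not part of the original notebook: it uses a small excerpt of the rows printed above (three years, three majors) in place of the full file, and the names `csv_excerpt`, `sample`, `major_cols`, `n_missing` and `in_range` are introduced only for illustration.

```python
import io

import pandas as pd

# Excerpt of the rows printed above (three years, three majors).
# The real file has 18 columns; this is only a stand-in for the sketch.
csv_excerpt = """Year,Agriculture,Architecture,Engineering
1970,4.229798,11.921005,0.8
1971,5.452797,12.003106,1.0
1972,7.420710,13.214594,1.2
"""

sample = pd.read_csv(io.StringIO(csv_excerpt))

# Every column except 'Year' is treated as a major.
major_cols = [col for col in sample.columns if col != 'Year']

# Total number of missing cells in the table.
n_missing = int(sample.isna().sum().sum())

# Percentages should all lie between 0 and 100.
in_range = bool(sample[major_cols].apply(lambda s: s.between(0, 100)).all().all())

print(len(major_cols), n_missing, in_range)
```

The same three checks can be run on `women_degrees` directly once the full file is loaded.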
These majors can be divided into 3 groups.
Major | Category |
---|---|
Psychology | STEM |
Biology | STEM |
Math and Statistics | STEM |
Physical Sciences | STEM |
Computer Science | STEM |
Engineering | STEM |
Foreign Languages | Liberal Arts |
English | Liberal Arts |
Communications and Journalism | Liberal Arts |
Art and Performance | Liberal Arts |
Social Sciences and History | Liberal Arts |
Health Professions | Others |
Public Administration | Others |
Education | Others |
Agriculture | Others |
Business | Others |
Architecture | Others |
So let us create three lists, one per category, for more efficient analysis.
# Creating STEM Category list
stem_cats = ['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences', 'Computer Science', 'Engineering']
# Creating Liberal Category list
lib_arts_cats = [
'Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance',
'Social Sciences and History']
# Creating Other Category list
other_cats = ['Health Professions', 'Public Administration', 'Education', 'Agriculture','Business', 'Architecture']
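Because these lists are later used to index the DataFrame, a misspelled or forgotten major would only surface as a `KeyError` or a missing plot. As a small sanity check (a sketch, with `all_columns` copied by hand from the printed `Index` above and the other helper names made up for illustration), we can confirm that the three lists cover every major exactly once:

```python
# Column names copied from the printed Index above; 'Year' is not a major.
all_columns = [
    'Year', 'Agriculture', 'Architecture', 'Art and Performance', 'Biology',
    'Business', 'Communications and Journalism', 'Computer Science', 'Education',
    'Engineering', 'English', 'Foreign Languages', 'Health Professions',
    'Math and Statistics', 'Physical Sciences', 'Psychology',
    'Public Administration', 'Social Sciences and History',
]

stem_cats = ['Psychology', 'Biology', 'Math and Statistics',
             'Physical Sciences', 'Computer Science', 'Engineering']
lib_arts_cats = ['Foreign Languages', 'English', 'Communications and Journalism',
                 'Art and Performance', 'Social Sciences and History']
other_cats = ['Health Professions', 'Public Administration', 'Education',
              'Agriculture', 'Business', 'Architecture']

majors = [col for col in all_columns if col != 'Year']
grouped = stem_cats + lib_arts_cats + other_cats

missing = sorted(set(majors) - set(grouped))    # majors not assigned to any group
unknown = sorted(set(grouped) - set(majors))    # names that are not columns
has_duplicates = len(grouped) != len(set(grouped))

print(missing, unknown, has_duplicates)  # [] [] False
```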
Now that our data is ready to be visualised, let us look into some of the plot aesthetics. Our aim is to show anyone, through a few plots, how the gender gap in education has changed.
To create easily understandable visual cues, we need to colour code our plot lines. While choosing the colours we also take into consideration people who are colour blind, so we pick colours that cause less confusion for them.
# Choosing the colours
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
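Matplotlib accepts RGB colours as tuples of floats between 0 and 1, which is why each channel above is divided by 255. If more colours are needed, a tiny helper keeps that conversion in one place. This is only a sketch, and `rgb255_to_unit` is a name made up for this example:

```python
def rgb255_to_unit(red, green, blue):
    """Scale 0-255 RGB channels to the 0-1 floats that matplotlib accepts."""
    return (red / 255, green / 255, blue / 255)


cb_dark_blue = rgb255_to_unit(0, 107, 164)
cb_orange = rgb255_to_unit(255, 128, 14)

print(cb_dark_blue)
print(cb_orange)
```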
Now we need to create a canvas big enough to include all our plots. We have 17 plots in total, divided into 3 groups of 6, 5 and 6 plots respectively, so we need a grid of 6 rows by 3 columns.
Each group gets its own column, with a maximum of 6 rows.
To create the subplots we loop over them, and to do that effectively we need to know how subplots are numbered. Matplotlib numbers the positions starting from 1, going left to right and then top to bottom.
Our plan is to place the subplots of each group in one column. For the first column we therefore need to access (6,3,1), (6,3,4), (6,3,7) and so on, so we have to construct our for loop accordingly.
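The numbering described above can be checked with a few lines of plain Python. This is a small sketch: `subplot_index` is a helper introduced here for illustration, using 0-based row and column numbers.

```python
N_ROWS, N_COLS = 6, 3


def subplot_index(row, col, n_cols=N_COLS):
    """Return the 1-based add_subplot index for a 0-based (row, col) cell."""
    return row * n_cols + col + 1


first_column = [subplot_index(row, 0) for row in range(N_ROWS)]
second_column = [subplot_index(row, 1) for row in range(N_ROWS)]
third_column = [subplot_index(row, 2) for row in range(N_ROWS)]

print(first_column)   # [1, 4, 7, 10, 13, 16]
print(second_column)  # [2, 5, 8, 11, 14, 17]
print(third_column)   # [3, 6, 9, 12, 15, 18]
```

These are the same values that the expressions `i*3 + 1`, `i*3 + 2` and `i*3 + 3` produce in the loops that follow.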
# Creating the plotting area
fig = plt.figure(figsize=(18, 30))
fig.suptitle('Gender Gap in various Majors over the period of 1970 to 2010', fontsize=16)
##### Creating the plot for STEM group ##########
for i in range(0,6): # For loop to go through each 6 subplots
    ax = fig.add_subplot(6, 3, (i*3 + 1)) # Finds 1st column in each row
    x = women_degrees['Year'] # x value is the Year
    y = women_degrees[stem_cats[i]] # y value is the major
    y_men = 100 - y # y men is found by subtracting women % from 100
    ax.plot(x, y, c=cb_dark_blue, label='Women', linewidth=3) # Plotting the data of women
    ax.plot(x, y_men, c=cb_orange, label='Men', linewidth=3) # Plotting the data of men
    ax.set_title(stem_cats[i]) # Giving the plot its title
    #### Improving the Plot Aesthetics #####
    ax.set_xlim(1968, 2011) # Setting the x limit
    ax.set_ylim(0, 100) # Setting the y limit
    for spine in ax.spines.values(): # Removing all 4 spines of a plot
        spine.set_visible(False)
    ax.set_yticks([0, 100]) # Setting the y ticks to 0 and 100
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Creating a 50% line to give visual clue
    # Removing all the other ticks from the plot
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # Setting the labels for easy understanding
    if i == 0: # Condition to set the label only on the top plot
        ax.text(2005, 15, 'Men')
        ax.text(2004, 82, 'Women')
    elif i == 5: # Condition to set the label only on the bottom plot
        ax.text(2005, 90, 'Men')
        ax.text(2004, 10, 'Women')
        ax.tick_params(labelbottom=True) # Enabling x ticks only on the bottom plot
##### Creating the plot for Liberal arts group #############
for i in range(0,5): # For loop to go through each 5 subplots
    ax = fig.add_subplot(6, 3, (i*3 + 2)) # Finds 2nd column in each row
    x = women_degrees['Year'] # x value is the Year
    y = women_degrees[lib_arts_cats[i]] # y value is the major
    y_men = 100 - y # y men is found by subtracting women % from 100
    ax.plot(x, y, c=cb_dark_blue, label='Women', linewidth=3) # Plotting the data of women
    ax.plot(x, y_men, c=cb_orange, label='Men', linewidth=3) # Plotting the data of men
    ax.set_title(lib_arts_cats[i]) # Giving the plot its title
    #### Improving the Plot Aesthetics #####
    ax.set_xlim(1968, 2011) # Setting the x limit
    ax.set_ylim(0, 100) # Setting the y limit
    for spine in ax.spines.values(): # Removing all 4 spines of a plot
        spine.set_visible(False)
    ax.set_yticks([0, 100]) # Setting the y ticks to 0 and 100
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Creating a 50% line to give visual clue
    # Removing all the other ticks from the plot
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # Setting the labels for easy understanding
    if i == 0: # Condition to set the label only on the top plot
        ax.text(2005, 20, 'Men')
        ax.text(2004, 75, 'Women')
    elif i == 4: # Condition to set the label only on the bottom plot
        ax.text(2005, 58, 'Men')
        ax.text(2004, 40, 'Women')
        ax.tick_params(labelbottom=True) # Enabling x ticks only on the bottom plot
##### Creating the plot for Other group ############
for i in range(0,6): # For loop to go through each 6 subplots
    ax = fig.add_subplot(6, 3, (i*3 + 3)) # Finds 3rd column in each row
    x = women_degrees['Year'] # x value is the Year
    y = women_degrees[other_cats[i]] # y value is the major
    y_men = 100 - y # y men is found by subtracting women % from 100
    ax.plot(x, y, c=cb_dark_blue, label='Women', linewidth=3) # Plotting the data of women
    ax.plot(x, y_men, c=cb_orange, label='Men', linewidth=3) # Plotting the data of men
    ax.set_title(other_cats[i]) # Giving the plot its title
    #### Improving the Plot Aesthetics #####
    ax.set_xlim(1968, 2011) # Setting the x limit
    ax.set_ylim(0, 100) # Setting the y limit
    for spine in ax.spines.values(): # Removing all 4 spines of a plot
        spine.set_visible(False)
    ax.set_yticks([0, 100]) # Setting the y ticks to 0 and 100
    ax.axhline(50, c=(171/255, 171/255, 171/255), alpha=0.3) # Creating a 50% line to give visual clue
    # Removing all the other ticks from the plot
    ax.tick_params(bottom=False, top=False, left=False, right=False, labelbottom=False)
    # Setting the labels for easy understanding
    if i == 0: # Condition to set the label only on the top plot
        ax.text(2006, 7, 'Men')
        ax.text(2004, 93, 'Women')
    elif i == 5: # Condition to set the label only on the bottom plot
        ax.text(2005, 65, 'Men')
        ax.text(2004, 35, 'Women')
        ax.tick_params(labelbottom=True) # Enabling x ticks only on the bottom plot
###### Plot Printing #######
plt.savefig("gender_degrees.png") # To Save the plot as .png image
plt.show() # To show the plot
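The three loops above differ only in the list of majors and in the column they draw into, so the shared part could be moved into a function. The following is one possible sketch, not the code used to produce the figure: `plot_group` and the constant names are made up for this example, the 'Men'/'Women' text labels are left out, and a tiny excerpt of the table printed earlier stands in for the full DataFrame (the helper only needs an object that can be indexed by column name, so `women_degrees` would work as well).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

CB_DARK_BLUE = (0 / 255, 107 / 255, 164 / 255)
CB_ORANGE = (255 / 255, 128 / 255, 14 / 255)
CB_GREY = (171 / 255, 171 / 255, 171 / 255)


def plot_group(fig, data, categories, column, n_rows=6, n_cols=3):
    """Draw one subplot per category down a single (0-based) column of the grid."""
    axes = []
    for row, category in enumerate(categories):
        ax = fig.add_subplot(n_rows, n_cols, row * n_cols + column + 1)
        years = data['Year']
        women = data[category]
        men = [100 - value for value in women]  # men's share is the remainder
        ax.plot(years, women, c=CB_DARK_BLUE, label='Women', linewidth=3)
        ax.plot(years, men, c=CB_ORANGE, label='Men', linewidth=3)
        ax.set_title(category)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0, 100)
        ax.set_yticks([0, 100])
        ax.axhline(50, c=CB_GREY, alpha=0.3)  # 50% reference line
        for spine in ax.spines.values():
            spine.set_visible(False)
        # Only the last subplot in the column keeps its x-axis labels.
        is_last = row == len(categories) - 1
        ax.tick_params(bottom=False, top=False, left=False, right=False,
                       labelbottom=is_last)
        axes.append(ax)
    return axes


# Tiny excerpt of the table printed earlier, used only to exercise the helper.
sample = {
    'Year': [1970, 1971, 1972],
    'Engineering': [0.8, 1.0, 1.2],
    'Computer Science': [13.6, 13.6, 14.9],
}

fig = plt.figure(figsize=(18, 30))
stem_axes = plot_group(fig, sample, ['Engineering', 'Computer Science'], column=0)
print(len(stem_axes))
```

With the full data, the three groups would then be drawn with three calls, for example `plot_group(fig, women_degrees, stem_cats, column=0)`, with `column=1` and `column=2` for the other two lists.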
Things have changed drastically from 1970 to 2010. In 1970, women's share was below the 50% line in 11 out of 17 majors, but by 2010 that number had dropped to 8 out of 17. In 3 of these 8 majors, women trail men by only a very small fraction. In other words, in 12 out of the 17 majors we have analysed, women are either ahead of men or neck and neck with them.
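The 1970 figure can be reproduced from the first row of the table printed earlier. The sketch below copies that row by hand into a dictionary (the names `women_1970` and `below_half` are introduced here for illustration) and counts the majors where women's share is under 50%:

```python
# Percentage of degrees awarded to women in 1970,
# copied from the first row of the table printed earlier.
women_1970 = {
    'Agriculture': 4.229798, 'Architecture': 11.921005,
    'Art and Performance': 59.7, 'Biology': 29.088363,
    'Business': 9.064439, 'Communications and Journalism': 35.3,
    'Computer Science': 13.6, 'Education': 74.535328,
    'Engineering': 0.8, 'English': 65.570923,
    'Foreign Languages': 73.8, 'Health Professions': 77.1,
    'Math and Statistics': 38.0, 'Physical Sciences': 13.8,
    'Psychology': 44.4, 'Public Administration': 68.4,
    'Social Sciences and History': 36.8,
}

# Majors where women received fewer than half of the degrees.
below_half = sorted(major for major, pct in women_1970.items() if pct < 50)

print(len(below_half), 'of', len(women_1970))  # 11 of 17
```

The same comparison applied to the row for a later year gives the corresponding count for that year.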
By analysing the data we can see that in almost all majors, women have increased their presence over the years. In most of the STEM majors, however, women still have ground to make up.
This data set gives us a picture of the gender gap in US education over the years. Since education shapes how a society functions as a whole, we can try to extrapolate these observations to society at large. If that holds, we can say that the gender gap is not a myth; it really exists, but things are changing for the better. In the near future, we can hope to have equal opportunity irrespective of gender.