#!/usr/bin/env python
# coding: utf-8

# # Visualizing The Gender Gap Evolution In College Degrees
# 
# ## Introduction
# 
# The aim of this project is to investigate how the gender gap has changed over the years in 17 different degree courses.
# 
# [The National Center for Education Statistics](https://nces.ed.gov/programs/digest/2013menu_tables.asp) released a dataset which shows the percentage of women graduates in 17 bachelor's degrees, from 1970 to 2011. This dataset was cleaned by Randal Olson, a data scientist from the University of Pennsylvania.
# 
# Main results:
# 
# - The gender gap is the most consistent in Liberal Arts degrees, where all 5 courses analyzed have constant women majorities.
# - STEM degrees are in general more popular among men, and Engineering and Computer Science have very large men majorities.
# - 4 of the 17 degrees considered have seen significant increases in the proportion of women, leading them to close the gender gap.

# ## Exploring the data
# 
# Let's start by importing the required libraries and getting an initial overview of the data.

# In[1]:


get_ipython().run_line_magic('matplotlib', 'inline')
import pandas as pd
import matplotlib.pyplot as plt

# Read the dataset
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')

# Explore the data
women_degrees.head()


# In[2]:


# Summary statistics (every column except 'Year')
women_degrees.iloc[:, 1:].describe()


# ## Development of the gender gap
# 
# Now, we will plot a line chart of the gender gap of each degree over the years. The first column of the plot shows the STEM (science, technology, engineering, and maths) degrees, the second one shows the liberal arts category, and the last one has the remaining courses. *Each column of the figure is ordered by descending proportion of degrees awarded to women in 2011.* Then we will be able to answer questions like:
# 
# - Which degree has the most consistent gap?
# - Which degree has seen the biggest changes in gender gap over the years?
# - Which category is the most popular for men?

# In[5]:


# STEM degrees
stem_raw = ['Biology', 'Computer Science', 'Engineering', 'Math and Statistics',
            'Physical Sciences', 'Psychology']
# Liberal Arts degrees
lib_arts_raw = ['Art and Performance', 'Communications and Journalism', 'English',
                'Foreign Languages', 'Social Sciences and History']
# Other degrees
others_raw = ['Agriculture', 'Architecture', 'Business', 'Education',
              'Health Professions', 'Public Administration']

# Order the degrees by descending women proportion in 2011.
def order_degrees(subject_category):
    '''
    Function that orders the degree names of each subject category in descending
    order of women graduate percentages in 2011, which represents the final year
    of the data.
    '''
    women_2011 = []
    for degree in subject_category:
        # The last row of the dataset holds the 2011 values.
        proportion_2011 = women_degrees[degree].iloc[-1]
        women_2011.append([degree, proportion_2011])
    women_2011 = sorted(women_2011, key = lambda x: x[1], reverse = True)
    return [item[0] for item in women_2011]

stem = order_degrees(stem_raw)
lib_arts = order_degrees(lib_arts_raw)
others = order_degrees(others_raw)

# Save the colorblind-friendly colors.
dark_blue = (0/255, 107/255, 164/255)
orange = (255/255, 128/255, 14/255)
grey = (171/255, 171/255, 171/255)

# Make a function to make the plots.
fig = plt.figure(figsize = (18, 24))

def plot_gender_gap(subject_category, plot_positions):
    '''
    Function that plots line charts of percentages of men and women graduates
    from a range of degrees from 1970 to 2011.
    '''
    # Index of the bottom chart in this column (4 for Liberal Arts, 5 otherwise).
    last = len(subject_category) - 1
    for i in range(len(subject_category)):
        ax = fig.add_subplot(6, 3, plot_positions[i])
        ax.plot(women_degrees['Year'], women_degrees[subject_category[i]],
                c = dark_blue, label = 'Women', linewidth = 3)
        ax.plot(women_degrees['Year'], 100 - women_degrees[subject_category[i]],
                c = orange, label = 'Men', linewidth = 3)
        for key, spine in ax.spines.items():
            spine.set_visible(False)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0, 100)
        ax.set_yticks([0, 100])
        ax.set_title(subject_category[i])
        # Booleans instead of 'on'/'off' strings, which recent Matplotlib
        # versions no longer interpret as switches.
        ax.tick_params(bottom = False, top = False, left = False, right = False,
                       labelbottom = False)
        ax.axhline(50, c = grey, alpha = 0.3)

        # Text annotations
        if subject_category == stem:
            if i == 0:
                ax.text(2003, 15, 'Men')
                ax.text(2002, 83, 'Women')
            elif i == last:
                ax.text(2003, 88, 'Men')
                ax.text(2001, 10, 'Women')
                ax.tick_params(labelbottom = True)
        elif subject_category == lib_arts:
            if i == 0:
                ax.text(2003, 20, 'Men')
                ax.text(2002, 80, 'Women')
            elif i == last:
                # The original check (i == 5) never matched this 5-item column,
                # so the bottom chart was left without x-axis labels.
                ax.tick_params(labelbottom = True)
        else:
            if i == 0:
                ax.text(2003, 5, 'Men')
                ax.text(2002, 90, 'Women')
            elif i == last:
                ax.text(2003, 62, 'Men')
                ax.text(2002, 35, 'Women')
                ax.tick_params(labelbottom = True)

# Plot the line charts
stem_positions = [1, 4, 7, 10, 13, 16]
plot_gender_gap(stem, stem_positions)
lib_arts_positions = [2, 5, 8, 11, 14]
plot_gender_gap(lib_arts, lib_arts_positions)
others_positions = [3, 6, 9, 12, 15, 18]
plot_gender_gap(others, others_positions)

plt.savefig('gender_gap_plot.png')
plt.show()


# **Liberal Arts Observations**:
# The most consistent category in terms of gender gap is Liberal Arts. Art and Performance and English have remained almost constant, and lean only moderately towards women, who on average represent 61% and 66% of the graduates respectively. Communications and Journalism has remained almost static as well, with a women majority of about 60% since 1978, but before that it had a men majority of up to 65% in 1970, and it is the only degree in this category to see a significant change of leading gender. Foreign Languages has been gradually closing the gap, going from a women representation of 75% in 1970 to 69% in 2011. Finally, the gender gap has completely disappeared in Social Sciences and History, which had a men proportion of 64% in 1970.
# 
# **STEM Observations**:
# STEM fields mostly lean towards men. Four of the 6 degrees considered have men majorities, and among them the biggest differences are seen in Computer Science and Engineering. In 1970, Computer Science had a very large proportion of men, about 86%. The gap then closed sharply until 1982, but since then it has been widening again and has almost returned to its initial distribution. In Engineering, although the proportion of women graduates has risen over time, there is still a heavy men majority. The other 2 degrees with men majorities are Physical Sciences, where the gap closed very fast from 1970 to 2000 and then remained static at a men representation of 58%, and Math and Statistics, which has not seen drastic changes and where men lead with around 55%. The remaining 2 currently have women majorities. Psychology had equal shares back in 1972, and has since seen a constant increase in the proportion of women, reaching 78% in 2011. Lastly, Biology is the only STEM degree where the leading gender has changed: women representation has constantly increased and reached around 60% of graduates, compared to only 29% in 1970.
# 
# **Others Observations**:
# The remaining six courses divide equally into two tendencies. Three have had consistent gender gaps with heavy women majorities: Health Professions, Public Administration, and Education, with mean shares of 82%, 76%, and 76% respectively. The other three have had very large increases in the proportion of women, leading to equal representation in Agriculture and Business, up from 4% and 9% in 1970, and to a similar distribution of genders in Architecture.
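
# The figures quoted in the observations above (first and last shares, mean shares, and the year in which the leading gender changed) can be read off the data as well as off the charts. The cell below is a minimal sketch of one way to do that. The helper `summarize_gap` and the `toy` DataFrame are hypothetical and are not part of the original notebook; the toy values are invented for illustration and are not taken from the dataset. The sketch assumes a `Year` column and one column of women percentages per degree, as in `women_degrees`.

```python
import pandas as pd


def summarize_gap(df, degree, year_col='Year'):
    '''
    Hypothetical helper: summarize the women share (0-100) of one degree.
    Returns the first and last values, the mean, and the years in which
    the majority switched from one gender to the other.
    '''
    share = df.set_index(year_col)[degree]
    # 1 where women are the majority, 0 otherwise.
    women_majority = (share > 50).astype(int)
    # A non-zero difference between consecutive years marks a switch.
    switched = women_majority.diff().fillna(0).ne(0)
    return {
        'first': float(share.iloc[0]),
        'last': float(share.iloc[-1]),
        'mean': float(share.mean()),
        'flip_years': [int(year) for year in switched[switched].index],
    }


# Toy values, invented for illustration only (not from the dataset).
toy = pd.DataFrame({
    'Year': [1970, 1971, 1972, 1973],
    'Toy Degree': [40.0, 48.0, 52.0, 60.0],
})

summary = summarize_gap(toy, 'Toy Degree')
print(summary)
```

# With the real data loaded, the same helper could be called as `summarize_gap(women_degrees, 'Biology')`, assuming the column names used earlier in this notebook.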