Welcome to my notebook for the fifth guided project in Dataquest's Data Scientist in Python path. This time, we're going to practice improving plot aesthetics and making our visualizations as effective as possible.
We will be using data compiled by Randal Olson on the percentage of bachelor's degrees awarded to women in the USA. The raw data can be found on the website of the National Center for Education Statistics.
First, we load the modules and the data.
# Loading modules
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
# Loading in data
women_degrees = pd.read_csv('percent-bachelors-degrees-women-usa.csv')
When presenting colored visualizations, we should consider whether the colors we use can be told apart by color-blind readers. To take this into account, we'll use a color-blind-friendly color palette. Matplotlib expects each RGB component as a float between 0 and 1, so we divide the 0-255 values by 255.
# Setting RGB values for our color palette
cb_dark_blue = (0/255,107/255,164/255)
cb_orange = (255/255, 128/255, 14/255)
cb_gray = (171/255, 171/255, 171/255)
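The division by 255 can also be wrapped in a small helper. This is only a minimal sketch: `rgb255_to_unit` is a hypothetical name introduced here for illustration and is not part of Matplotlib.

```python
def rgb255_to_unit(r, g, b):
    """Scale 0-255 RGB components to the 0-1 floats that Matplotlib accepts."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError(f"RGB component out of range: {value}")
    return (r / 255, g / 255, b / 255)


# Same palette as above, built through the hypothetical helper
cb_dark_blue = rgb255_to_unit(0, 107, 164)
cb_orange = rgb255_to_unit(255, 128, 14)
cb_gray = rgb255_to_unit(171, 171, 171)
```

The range check is an addition of this sketch; it catches a mistyped component before it reaches a plotting call.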
Now, let's begin generating our first graph. We start with only the STEM degrees so that we can set up the plotting code on a small example.
stem_cats = ['Engineering', 'Computer Science', 'Psychology', 'Biology', 'Physical Sciences', 'Math and Statistics']
fig = plt.figure(figsize=(18, 3))
for sp in range(0,6):
    ax = fig.add_subplot(1,6,sp+1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[sp]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[sp]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[sp])
    # Use booleans: newer Matplotlib versions do not treat the string 'off' as False
    ax.tick_params(right=False, left=False, bottom=False, top=False)
    if sp == 0:
        ax.text(2005, 87, 'Men')
        ax.text(2002, 8, 'Women')
    elif sp == 5:
        ax.text(2005, 62, 'Men')
        ax.text(2001, 35, 'Women')
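The four `set_visible(False)` calls can be shortened by looping over the spines. The sketch below shows this on a throwaway figure with made-up data points; the `Agg` backend line is an assumption for running it as a standalone script and should be left out inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1970, 2011], [10, 50], linewidth=3)  # made-up values for illustration

# Hide every spine in one pass instead of naming each side
for spine in ax.spines.values():
    spine.set_visible(False)

# Boolean arguments switch the tick marks off on all four sides
ax.tick_params(left=False, right=False, top=False, bottom=False)
```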
Let's also take a quick look at our data set before moving forward.
women_degrees.head()
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1970 | 4.229798 | 11.921005 | 59.7 | 29.088363 | 9.064439 | 35.3 | 13.6 | 74.535328 | 0.8 | 65.570923 | 73.8 | 77.1 | 38.0 | 13.8 | 44.4 | 68.4 | 36.8 |
1 | 1971 | 5.452797 | 12.003106 | 59.9 | 29.394403 | 9.503187 | 35.5 | 13.6 | 74.149204 | 1.0 | 64.556485 | 73.9 | 75.5 | 39.0 | 14.9 | 46.2 | 65.5 | 36.2 |
2 | 1972 | 7.420710 | 13.214594 | 60.4 | 29.810221 | 10.558962 | 36.6 | 14.9 | 73.554520 | 1.2 | 63.664263 | 74.6 | 76.9 | 40.2 | 14.8 | 47.6 | 62.6 | 36.1 |
3 | 1973 | 9.653602 | 14.791613 | 60.2 | 31.147915 | 12.804602 | 38.4 | 16.4 | 73.501814 | 1.6 | 62.941502 | 74.9 | 77.4 | 40.9 | 16.5 | 50.4 | 64.3 | 36.4 |
4 | 1974 | 14.074623 | 17.444688 | 61.9 | 32.996183 | 16.204850 | 40.5 | 18.9 | 73.336811 | 2.2 | 62.413412 | 75.3 | 77.9 | 41.8 | 18.2 | 52.6 | 66.1 | 37.3 |
print(women_degrees.shape)
women_degrees.describe()
(42, 18)
| | Year | Agriculture | Architecture | Art and Performance | Biology | Business | Communications and Journalism | Computer Science | Education | Engineering | English | Foreign Languages | Health Professions | Math and Statistics | Physical Sciences | Psychology | Public Administration | Social Sciences and History |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 | 42.000000 |
mean | 1990.500000 | 33.848165 | 33.685540 | 61.100000 | 49.429864 | 40.653471 | 56.216667 | 25.809524 | 76.356236 | 12.892857 | 66.186680 | 71.723810 | 82.983333 | 44.478571 | 31.304762 | 68.776190 | 76.085714 | 45.407143 |
std | 12.267844 | 12.552731 | 9.574057 | 1.305336 | 10.087725 | 13.116109 | 8.698610 | 6.688753 | 2.212641 | 5.670824 | 1.950990 | 1.926682 | 2.914967 | 2.646262 | 9.000866 | 9.705463 | 5.879504 | 4.763653 |
min | 1970.000000 | 4.229798 | 11.921005 | 58.600000 | 29.088363 | 9.064439 | 35.300000 | 13.600000 | 72.166525 | 0.800000 | 61.647206 | 69.000000 | 75.500000 | 38.000000 | 13.800000 | 44.400000 | 62.600000 | 36.100000 |
25% | 1980.250000 | 30.840814 | 28.520709 | 60.200000 | 44.311821 | 37.390851 | 55.125000 | 19.125000 | 74.994573 | 10.625000 | 65.583807 | 70.125000 | 81.825000 | 42.875000 | 24.875000 | 65.550000 | 74.625000 | 43.825000 |
50% | 1990.500000 | 33.317552 | 35.994852 | 61.300000 | 50.971469 | 47.210123 | 59.850000 | 27.300000 | 75.937020 | 14.100000 | 66.112018 | 71.150000 | 83.700000 | 44.900000 | 32.100000 | 72.750000 | 77.450000 | 45.300000 |
75% | 2000.750000 | 45.663953 | 40.790605 | 62.000000 | 58.679194 | 48.876139 | 62.125000 | 29.775000 | 78.619420 | 16.950000 | 67.861247 | 73.875000 | 85.175000 | 46.500000 | 40.200000 | 76.925000 | 81.100000 | 49.375000 |
max | 2011.000000 | 50.037182 | 44.499331 | 63.400000 | 62.169456 | 50.552335 | 64.600000 | 37.100000 | 79.618625 | 19.000000 | 68.894487 | 75.300000 | 86.500000 | 48.300000 | 42.200000 | 77.800000 | 82.100000 | 51.800000 |
We see that we have 17 college degrees with varying patterns of female representation. In the next section, we will generate a similar graph that covers all the degrees in our data set.
Before we begin generating the charts for all seventeen majors, we will categorize them into three groups: STEM, liberal arts, and other.
After organizing them into the different categories, we will arrange them in descending order based on the percentage of women for the last year in our data set.
# Classifying degrees into categories
stem = ['Biology', 'Computer Science', 'Engineering', 'Math and Statistics', 'Physical Sciences', 'Psychology']
lib_arts = ['Art and Performance', 'Communications and Journalism', 'English', 'Foreign Languages', 'Social Sciences and History']
other = ['Agriculture', 'Architecture', 'Business', 'Education', 'Health Professions', 'Public Administration']
# Sorting the degrees within each category based on ending share of degrees awarded to women
stem_sorted = women_degrees[stem].sort_values(women_degrees[stem].shape[0] - 1, axis=1, ascending=False)
lib_arts_sorted = women_degrees[lib_arts].sort_values(women_degrees[lib_arts].shape[0] - 1, axis=1, ascending=False)
other_sorted = women_degrees[other].sort_values(women_degrees[other].shape[0] - 1, axis=1, ascending=False)
# Creating sorted list of degree categories
stem_cats = list(stem_sorted.columns)
lib_arts_cats = list(lib_arts_sorted.columns)
other_cats = list(other_sorted.columns)
Let's see the categories and their ordering.
print(stem_cats)
print(lib_arts_cats)
print(other_cats)
['Psychology', 'Biology', 'Math and Statistics', 'Physical Sciences', 'Computer Science', 'Engineering']
['Foreign Languages', 'English', 'Communications and Journalism', 'Art and Performance', 'Social Sciences and History']
['Health Professions', 'Public Administration', 'Education', 'Agriculture', 'Business', 'Architecture']
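To make the sorting step easier to follow, here is a small sketch on made-up data. The `toy` DataFrame and its values are assumptions for illustration only. It orders columns by their value in the last row in two ways: with `sort_values(..., axis=1)` as in the cell above, and by sorting the last row as a Series.

```python
import pandas as pd

# Toy data standing in for women_degrees; the values are made up
toy = pd.DataFrame({
    'Year': [2010, 2011],
    'A': [40.0, 45.0],
    'B': [80.0, 85.0],
    'C': [60.0, 55.0],
})
cols = ['A', 'B', 'C']

# Approach used above: sort the columns by the values in the last row
last_label = toy.index[-1]
by_sort_values = list(
    toy[cols].sort_values(by=last_label, axis=1, ascending=False).columns
)

# Alternative: take the last row as a Series and sort it directly
by_last_row = toy[cols].iloc[-1].sort_values(ascending=False).index.tolist()

print(by_sort_values)
print(by_last_row)
```

Both approaches should give the same ordering; the second one reads a little more directly because it never sorts the whole frame.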
Let's now generate our figure with subplots arranged in six rows and three columns (6 by 3). The first column is where we will graph the STEM degrees, the second column is for the liberal arts degrees, and the third column for the other degrees.
# Generating the figures
fig = plt.figure(figsize=(18, 12))
# Generating plots for first column (STEM degrees)
for sp in range(0, 18, 3):
    cat_index = int(sp / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False)
    if cat_index == 0:
        ax.text(2005, 10, 'Men')
        ax.text(2005, 85, 'Women')
    if cat_index == 5:
        ax.text(2005, 90, 'Men')
        ax.text(2005, 5, 'Women')
# Generating plots for second column (liberal arts degrees)
for sp in range(1, 16, 3):
    cat_index = int((sp - 1) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(lib_arts_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False)
    if cat_index == 0:
        ax.text(2005, 15, 'Men')
        ax.text(2005, 80, 'Women')
# Generating plots for third column (other degrees)
for sp in range(2, 20, 3):
    cat_index = int((sp - 2) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(other_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False)
    if cat_index == 0:
        ax.text(2005, 0, 'Men')
        ax.text(2005, 95, 'Women')
    if cat_index == 5:
        ax.text(2005, 65, 'Men')
        ax.text(2005, 30, 'Women')
fig.tight_layout()
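The loop ranges used for the three columns follow from how `add_subplot` numbers a grid: row by row, starting at 1. The sketch below spells out that arithmetic in plain Python; `subplot_number` is a hypothetical helper written for this illustration.

```python
N_ROWS, N_COLS = 6, 3


def subplot_number(row, col, n_cols=N_COLS):
    """Return the 1-based, row-major subplot number for a zero-based (row, col)."""
    return row * n_cols + col + 1


# Subplot numbers for each column of the 6 by 3 grid
stem_numbers = [subplot_number(row, 0) for row in range(N_ROWS)]
lib_arts_numbers = [subplot_number(row, 1) for row in range(5)]  # only five liberal arts degrees
other_numbers = [subplot_number(row, 2) for row in range(N_ROWS)]

# The notebook's loops pass sp + 1 to add_subplot, so each sp is one less
stem_from_loop = [sp + 1 for sp in range(0, 18, 3)]
lib_arts_from_loop = [sp + 1 for sp in range(1, 16, 3)]
other_from_loop = [sp + 1 for sp in range(2, 20, 3)]

print(stem_numbers)
print(lib_arts_numbers)
print(other_numbers)
```

Seen this way, `cat_index` is simply the zero-based row, which is why the liberal arts column stops at row 4 and leaves the last cell of the middle column empty.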
The repeated x-axis labels (years) on every subplot clutter our figure, so we will remove them from all but the bottom subplot of each column.
# Generating the figures
fig = plt.figure(figsize=(18, 12))
# Generating plots for first column (STEM degrees)
for sp in range(0, 18, 3):
    cat_index = int(sp / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[cat_index])
    # Added labelbottom=False to the arguments in ax.tick_params
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Adjusted annotation positions
    if cat_index == 0:
        ax.text(2005, 10, 'Men')
        ax.text(2005, 85, 'Women')
    if cat_index == 5:
        ax.text(2005, 90, 'Men')
        ax.text(2005, 5, 'Women')
        # Adding back the x labels for the bottom subplot
        ax.tick_params(labelbottom=True)
# Generating plots for second column (liberal arts degrees)
for sp in range(1, 16, 3):
    cat_index = int((sp - 1) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(lib_arts_cats[cat_index])
    # Added labelbottom=False to the arguments in ax.tick_params
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Adjusted annotation positions
    if cat_index == 0:
        ax.text(2005, 15, 'Men')
        ax.text(2005, 80, 'Women')
    if cat_index == 4:
        # Adding back the x labels for the bottom subplot
        ax.tick_params(labelbottom=True)
# Generating plots for third column (other degrees)
for sp in range(2, 20, 3):
    cat_index = int((sp - 2) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(other_cats[cat_index])
    # Added labelbottom=False to the arguments in ax.tick_params
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Adjusted annotation positions
    if cat_index == 0:
        ax.text(2005, 0, 'Men')
        ax.text(2005, 95, 'Women')
    if cat_index == 5:
        ax.text(2005, 65, 'Men')
        ax.text(2005, 30, 'Women')
        # Adding back the x labels for the bottom subplot
        ax.tick_params(labelbottom=True)
fig.tight_layout()
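The idea of showing x tick labels only on the bottom subplot can be tried on a much smaller figure. The following is a sketch on made-up data with a single column of three subplots; the `Agg` backend line is an assumption for running it outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(4, 6))
n_rows = 3
axes = []
for row in range(n_rows):
    ax = fig.add_subplot(n_rows, 1, row + 1)
    ax.plot([1970, 1990, 2011], [20, 40, 50], linewidth=3)  # made-up values
    # Hide the tick marks everywhere; keep tick labels only on the bottom row
    ax.tick_params(bottom=False, labelbottom=(row == n_rows - 1))
    axes.append(ax)

fig.tight_layout()
```

Passing the comparison directly as `labelbottom` avoids a second `tick_params` call for the bottom subplot.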
To further reduce the clutter in our figure, we will also remove most of the y-axis tick labels and retain only zero (0) and one hundred (100).
# Generating the figures
fig = plt.figure(figsize=(18, 12))
# Generating plots for first column (STEM degrees)
for sp in range(0, 18, 3):
    cat_index = int(sp / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Removing y tick labels except 0 and 100
    ax.set_yticks([0,100])
    if cat_index == 0:
        ax.text(2005, 10, 'Men')
        ax.text(2005, 85, 'Women')
    if cat_index == 5:
        ax.text(2005, 90, 'Men')
        ax.text(2005, 5, 'Women')
        ax.tick_params(labelbottom=True)
# Generating plots for second column (liberal arts degrees)
for sp in range(1, 16, 3):
    cat_index = int((sp - 1) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(lib_arts_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Removing y tick labels except 0 and 100
    ax.set_yticks([0,100])
    if cat_index == 0:
        ax.text(2005, 15, 'Men')
        ax.text(2005, 80, 'Women')
    if cat_index == 4:
        ax.tick_params(labelbottom=True)
# Generating plots for third column (other degrees)
for sp in range(2, 20, 3):
    cat_index = int((sp - 2) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(other_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    # Removing y tick labels except 0 and 100
    ax.set_yticks([0,100])
    if cat_index == 0:
        ax.text(2005, 0, 'Men')
        ax.text(2005, 95, 'Women')
    if cat_index == 5:
        ax.text(2005, 65, 'Men')
        ax.text(2005, 30, 'Women')
        ax.tick_params(labelbottom=True)
fig.tight_layout()
Since we removed most of the y-axis tick labels, it's now more difficult to eyeball the values for the line graphs. To remedy this, we will add a horizontal gray line at the 50 mark on the y-axis. This makes it much easier for readers to see how large the gender gap is in each degree.
# Generating the figures
fig = plt.figure(figsize=(18, 12))
# Generating plots for first column (STEM degrees)
for sp in range(0, 18, 3):
    cat_index = int(sp / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    # Added horizontal line at y=50
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 10, 'Men')
        ax.text(2005, 85, 'Women')
    if cat_index == 5:
        ax.text(2005, 90, 'Men')
        ax.text(2005, 5, 'Women')
        ax.tick_params(labelbottom=True)
# Generating plots for second column (liberal arts degrees)
for sp in range(1, 16, 3):
    cat_index = int((sp - 1) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(lib_arts_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    # Added horizontal line at y=50
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 15, 'Men')
        ax.text(2005, 80, 'Women')
    if cat_index == 4:
        ax.tick_params(labelbottom=True)
# Generating plots for third column (other degrees)
for sp in range(2, 20, 3):
    cat_index = int((sp - 2) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(other_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    # Added horizontal line at y=50
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 0, 'Men')
        ax.text(2005, 95, 'Women')
    if cat_index == 5:
        ax.text(2005, 65, 'Men')
        ax.text(2005, 30, 'Women')
        ax.tick_params(labelbottom=True)
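The two y-axis adjustments, keeping only the 0 and 100 ticks and adding a faint reference line at 50, can be checked in isolation on a single axes. This is a sketch on made-up data; the `Agg` backend line is an assumption for running it outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt

cb_gray = (171 / 255, 171 / 255, 171 / 255)

fig, ax = plt.subplots()
ax.plot([1970, 2011], [30, 60], linewidth=3)  # made-up values
ax.set_ylim(0, 100)

# Keep only the end points of the percentage scale as ticks
ax.set_yticks([0, 100])

# A light horizontal line marks the 50 percent level where both groups are equal
midline = ax.axhline(50, c=cb_gray, alpha=0.3)
```

`axhline` returns the line it draws, which is handy if the reference line needs to be restyled later.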
# Generating the figures
fig = plt.figure(figsize=(18, 12))
# Generating plots for first column (STEM degrees)
for sp in range(0, 18, 3):
    cat_index = int(sp / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[stem_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[stem_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(stem_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 10, 'Men')
        ax.text(2005, 85, 'Women')
    if cat_index == 5:
        ax.text(2005, 90, 'Men')
        ax.text(2005, 5, 'Women')
        ax.tick_params(labelbottom=True)
# Generating plots for second column (liberal arts degrees)
for sp in range(1, 16, 3):
    cat_index = int((sp - 1) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[lib_arts_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[lib_arts_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(lib_arts_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 15, 'Men')
        ax.text(2005, 80, 'Women')
    if cat_index == 4:
        ax.tick_params(labelbottom=True)
# Generating plots for third column (other degrees)
for sp in range(2, 20, 3):
    cat_index = int((sp - 2) / 3)
    ax = fig.add_subplot(6, 3, sp + 1)
    ax.plot(women_degrees['Year'], women_degrees[other_cats[cat_index]], c=cb_dark_blue, label='Women', linewidth=3)
    ax.plot(women_degrees['Year'], 100-women_degrees[other_cats[cat_index]], c=cb_orange, label='Men', linewidth=3)
    ax.spines["right"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.set_xlim(1968, 2011)
    ax.set_ylim(0,100)
    ax.set_title(other_cats[cat_index])
    ax.tick_params(right=False, left=False, bottom=False, top=False, labelbottom=False)
    ax.set_yticks([0,100])
    ax.axhline(50, c=cb_gray, alpha=0.3)
    if cat_index == 0:
        ax.text(2005, 0, 'Men')
        ax.text(2005, 95, 'Women')
    if cat_index == 5:
        ax.text(2005, 65, 'Men')
        ax.text(2005, 30, 'Women')
        ax.tick_params(labelbottom=True)
# Saving figure
fig.savefig('gender_gaps_college.png', dpi=300)
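The three loops differ only in the category list and the grid column, so they could be folded into one function. The following is only a sketch of that idea on made-up data: `plot_column` and the `toy` DataFrame are hypothetical and not part of the original project, and the colors and the Men/Women annotations are left out to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_column(fig, data, cats, col, n_rows, n_cols):
    """Hypothetical helper: draw one grid column, one subplot per category."""
    axes = []
    for row, cat in enumerate(cats):
        ax = fig.add_subplot(n_rows, n_cols, row * n_cols + col + 1)
        ax.plot(data['Year'], data[cat], label='Women', linewidth=3)
        ax.plot(data['Year'], 100 - data[cat], label='Men', linewidth=3)
        for spine in ax.spines.values():
            spine.set_visible(False)
        ax.set_xlim(1968, 2011)
        ax.set_ylim(0, 100)
        ax.set_yticks([0, 100])
        ax.set_title(cat)
        # Only the last subplot in the column keeps its x tick labels
        is_bottom = row == len(cats) - 1
        ax.tick_params(right=False, left=False, bottom=False, top=False,
                       labelbottom=is_bottom)
        axes.append(ax)
    return axes


# Made-up data with two categories, standing in for women_degrees
toy = pd.DataFrame({
    'Year': [1970, 1990, 2011],
    'A': [10.0, 30.0, 50.0],
    'B': [60.0, 65.0, 70.0],
})
fig = plt.figure(figsize=(6, 4))
axes = plot_column(fig, toy, ['A', 'B'], col=0, n_rows=2, n_cols=1)
```

With the real data, one call per category list (with `col` set to 0, 1, and 2, `n_rows=6`, and `n_cols=3`) would presumably reproduce the layout of the grid above.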
In this project, we learned how to adjust various chart elements to improve the readability of our visualizations. As for the data itself, we showed that gender gaps vary across degrees, although the general trend is that the percentage of degrees awarded to women in different majors has been increasing somewhat.