This project uses a dataset of job outcomes for students who graduated from college between 2010 and 2012. The original data on job outcomes was released by the American Community Survey, which conducts surveys and aggregates the data. FiveThirtyEight cleaned the dataset and released it on their GitHub repo. You can find the data file here.
This project focuses on learning how to display data in various types of graphs. To do this, we import matplotlib and read the data file with pandas. Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
It is difficult to discern patterns or trends in data presented as a table. That is where plotting the data in an appropriate type of graph helps. I'm sure you've heard the saying, "A picture is worth a thousand numbers", or is it "a thousand words"? Whatever the wording, a visual display of data can be quite revealing.
# import pandas for reading and handling the dataset.
# import matplotlib's pyplot module for plotting.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
recent_grads = pd.read_csv('recent-grads.csv')
recent_grads.iloc[0]
Rank                    1
Major_code              2419
Major                   PETROLEUM ENGINEERING
Total                   2339
Men                     2057
Women                   282
Major_category          Engineering
ShareWomen              0.120564
Sample_size             36
Employed                1976
Full_time               1849
Part_time               270
Full_time_year_round    1207
Unemployed              37
Unemployment_rate       0.0183805
Median                  110000
P25th                   95000
P75th                   125000
College_jobs            1534
Non_college_jobs        364
Low_wage_jobs           193
Name: 0, dtype: object
recent_grads.head()
| | Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
5 rows × 21 columns
recent_grads.tail()
| | Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |
5 rows × 21 columns
# create a secondary DataFrame with only the numeric columns.
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
recent_grads_num = recent_grads.select_dtypes(include=numerics)
recent_grads_num.head()
print("Qty.", len(recent_grads_num), "rows in dataset.")
Qty. 173 rows in dataset.
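You do not have to list every numeric dtype by name. `select_dtypes` also accepts the generic selector `'number'`, which matches integer and floating-point columns. Here is a minimal sketch on a small hypothetical DataFrame. Its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame mixing text and numeric columns (illustrative values only).
toy = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Total": [100, 250, 75],            # int64
    "ShareWomen": [0.40, 0.55, 0.70],   # float64
    "Major_category": ["X", "Y", "X"],  # object (text)
})

# 'number' selects all integer and floating-point columns in one go.
toy_num = toy.select_dtypes(include="number")
print(list(toy_num.columns))  # ['Total', 'ShareWomen']
```

In the notebook this would be `recent_grads.select_dtypes(include='number')`. It should keep the same columns as the explicit list, assuming the CSV loads its numeric columns as int64/float64.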
recent_grads_num.describe()
| | Rank | Major_code | Total | Men | Women | ShareWomen | Sample_size | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 173.000000 | 173.000000 | 172.000000 | 172.000000 | 172.000000 | 172.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 |
mean | 87.000000 | 3879.815029 | 39370.081395 | 16723.406977 | 22646.674419 | 0.522223 | 356.080925 | 31192.763006 | 26029.306358 | 8832.398844 | 19694.427746 | 2416.329480 | 0.068191 | 40151.445087 | 29501.445087 | 51494.219653 | 12322.635838 | 13284.497110 | 3859.017341 |
std | 50.084928 | 1687.753140 | 63483.491009 | 28122.433474 | 41057.330740 | 0.231205 | 618.361022 | 50675.002241 | 42869.655092 | 14648.179473 | 33160.941514 | 4112.803148 | 0.030331 | 11470.181802 | 9166.005235 | 14906.279740 | 21299.868863 | 23789.655363 | 6944.998579 |
min | 1.000000 | 1100.000000 | 124.000000 | 119.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 111.000000 | 0.000000 | 111.000000 | 0.000000 | 0.000000 | 22000.000000 | 18500.000000 | 22000.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 44.000000 | 2403.000000 | 4549.750000 | 2177.500000 | 1778.250000 | 0.336026 | 39.000000 | 3608.000000 | 3154.000000 | 1030.000000 | 2453.000000 | 304.000000 | 0.050306 | 33000.000000 | 24000.000000 | 42000.000000 | 1675.000000 | 1591.000000 | 340.000000 |
50% | 87.000000 | 3608.000000 | 15104.000000 | 5434.000000 | 8386.500000 | 0.534024 | 130.000000 | 11797.000000 | 10048.000000 | 3299.000000 | 7413.000000 | 893.000000 | 0.067961 | 36000.000000 | 27000.000000 | 47000.000000 | 4390.000000 | 4595.000000 | 1231.000000 |
75% | 130.000000 | 5503.000000 | 38909.750000 | 14631.000000 | 22553.750000 | 0.703299 | 338.000000 | 31433.000000 | 25147.000000 | 9948.000000 | 16891.000000 | 2393.000000 | 0.087557 | 45000.000000 | 33000.000000 | 60000.000000 | 14444.000000 | 11783.000000 | 3466.000000 |
max | 173.000000 | 6403.000000 | 393735.000000 | 173809.000000 | 307087.000000 | 0.968954 | 4212.000000 | 307933.000000 | 251540.000000 | 115172.000000 | 199897.000000 | 28169.000000 | 0.177226 | 110000.000000 | 95000.000000 | 125000.000000 | 151643.000000 | 148395.000000 | 48207.000000 |
raw_data_count = len(recent_grads_num)
print("Qty.", raw_data_count, "rows in dataset.")
Qty. 173 rows in dataset.
# remove any rows with missing values.
recent_grads_num = recent_grads_num.dropna()
recent_grads_original = recent_grads
recent_grads = recent_grads_num
cleaned_data_count = len(recent_grads)
print("Qty.", cleaned_data_count, "rows after removing rows of missing data.")
Qty. 172 rows after removing rows of missing data.
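Before dropping rows, it can help to see which ones `dropna()` will discard. The sketch below uses a small hypothetical frame with one incomplete row. In the notebook, the same mask could be applied to `recent_grads_num`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the second row is missing its Total and Men values.
toy = pd.DataFrame({
    "Total": [2339.0, np.nan, 856.0],
    "Men": [2057.0, np.nan, 725.0],
    "Median": [110000, 40000, 73000],
})

# True for every row that contains at least one missing value.
missing_mask = toy.isnull().any(axis=1)
print(toy[missing_mask])  # the rows dropna() would discard

cleaned = toy.dropna()
print(len(toy), "->", len(cleaned), "rows")
```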
# use appropriate commands to generate a scatterplot.
ax = recent_grads.plot(x='Sample_size', y='Employed', kind='scatter', figsize=(7,5))
ax.set_title('Employed vs. Sample_size')
ax.set_xlim(-200, 4500)
ax.set_ylim(-25000, 325000)
plt.show()
ax = recent_grads.plot(x='Sample_size', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Sample_size')
ax.set_xlim(-200, 4500)
ax.set_ylim(10000, 120000)
plt.show()
ax = recent_grads.plot(x='Sample_size', y='Unemployment_rate', kind='scatter', figsize=(7,5))
ax.set_title('Unemployment_rate vs. Sample_size')
ax.set_xlim(-200, 4500)
ax.set_ylim(-0.02, 0.20)
plt.show()
ax = recent_grads.plot(x='Full_time', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Full_time')
ax.set_xlim(-25000, 275000)
ax.set_ylim(10000, 120000)
plt.show()
ax = recent_grads.plot(x='Total', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Total')
ax.set_xlim(-25000, 400000)
ax.set_ylim(10000, 120000)
plt.show()
In the scatterplot labeled "Median vs. Total" above, there is no obvious upward or downward trend.
About 80 percent of the majors have fewer than 50,000 graduates in total. If we remove the majors with 50,000 or more graduates, a trend might become visible.
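The "80 percent" figure can be computed rather than read off the plot. Comparing a column with a threshold gives a boolean Series, and the mean of that Series is the fraction of `True` values. Here is a sketch with a hypothetical helper `share_below` and five made-up totals.

```python
import pandas as pd


def share_below(values: pd.Series, threshold: float) -> float:
    """Fraction of non-missing values strictly below `threshold` (hypothetical helper)."""
    values = values.dropna()
    # Booleans average to the proportion of True values.
    return float((values < threshold).mean())


# Hypothetical 'Total' values for five majors.
totals = pd.Series([2339, 756, 32260, 120000, 45000])
print(share_below(totals, 50000))  # 0.8
```

Applied to `recent_grads['Total']` with a threshold of 50,000, this corresponds to the 136 of 172 majors counted in the next cell, or about 0.79.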
under_50000_grads = recent_grads[(recent_grads["Total"] < 50000)]
print(len(under_50000_grads))
ax = under_50000_grads.plot(x='Total', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Total')
ax.set_xlim(-1000, 50000)
ax.set_ylim(10000, 120000)
plt.show()
136
Still no obvious trend for majors with fewer than 50,000 graduates.
ax = recent_grads.plot(x='ShareWomen', y='Unemployment_rate', kind='scatter', figsize=(7,5))
ax.set_title('Unemployment_rate vs. ShareWomen')
ax.set_xlim(-0.1, 1.1)
ax.set_ylim(-0.02, 0.2)
plt.show()
ax = recent_grads.plot(x='ShareWomen', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. ShareWomen')
ax.set_xlim(-0.1, 1.1)
ax.set_ylim(10000, 120000)
plt.show()
In the scatterplot labeled "Median vs. ShareWomen" above, there is no obvious upward or downward trend.
There may be a very weak relationship between median salary and "ShareWomen". If there is one, it is inverse: as the proportion of women in a major increases, the median salary decreases.
There is one outlier with a median salary of $110,000: the major "Petroleum Engineering".
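The Pearson correlation coefficient r puts a number on a "very weak" relationship. Values near 0 indicate no linear relationship, and values near -1 indicate a strong inverse one. Here is a minimal sketch with a hypothetical helper `pearson_r` and made-up values, checked against pandas' built-in `Series.corr`.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y) -> float:
    """Pearson correlation from its definition: covariance / (spread_x * spread_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical values: median salary falls as the share of women rises.
share_women = pd.Series([0.1, 0.3, 0.5, 0.7, 0.9])
median = pd.Series([60000, 50000, 42000, 38000, 30000])

r = pearson_r(share_women, median)
print(round(r, 3))
# Series.corr uses the Pearson method by default, so it should agree.
print(round(share_women.corr(median), 3))
```

On the notebook data, `recent_grads['ShareWomen'].corr(recent_grads['Median'])` gives the coefficient directly.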
ax = recent_grads.plot(x='Men', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Men')
ax.set_xlim(-10000, 200000)
ax.set_ylim(10000, 120000)
plt.show()
ax = recent_grads.plot(x='Women', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Women')
ax.set_xlim(-10000, 320000)
ax.set_ylim(10000, 120000)
plt.show()
ax = recent_grads.plot(x='Full_time_year_round', y='Median', kind='scatter', figsize=(7,5))
ax.set_title('Median vs. Full_time_year_round')
ax.set_xlim(-10000, 220000)
ax.set_ylim(10000, 120000)
plt.show()
In the scatterplot labeled "Median vs. Full_time_year_round" above, there is no obvious upward or downward trend.
About 80 percent of the majors have fewer than 22,000 full-time, year-round workers.
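You can also check a statement such as "80 percent are below X" the other way round. `Series.quantile` returns the value below which a given fraction of the data falls, using linear interpolation by default. Here is a sketch on ten hypothetical counts.

```python
import pandas as pd

# Hypothetical Full_time_year_round counts for ten majors.
fty = pd.Series([1207, 388, 340, 692, 16697, 2000, 5000, 9000, 30000, 120000])

# Value below which 80% of these majors fall (linear interpolation between
# the 8th and 9th smallest values here).
p80 = fty.quantile(0.8)
print(p80)
```

In the notebook, `recent_grads['Full_time_year_round'].quantile(0.8)` would give the cutoff to compare with the 22,000 figure.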
Histograms are often used to display the distribution of a variable to express:
# use the ".hist" command to generate a histogram.
recent_grads['Sample_size'].hist(bins=25, range=(0,5000))
plt.title("Histogram for 'Sample_size'")
plt.ylabel("Frequency")
plt.xlabel('Sample_size of Full_time')
The distribution above is clearly non-normal. It is strongly right-skewed, a shape that is often approximated by a lognormal distribution.
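A quick way to check the lognormal description is to compare skewness before and after taking logarithms. If the data are roughly lognormal, the logged values should be close to symmetric, with skewness near 0. The notebook's values are not reproduced here, so this sketch uses synthetic lognormal data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed sample drawn from a lognormal distribution (illustration only).
sizes = pd.Series(rng.lognormal(mean=4.5, sigma=1.2, size=500))

raw_skew = sizes.skew()          # strongly positive for right-skewed data
log_skew = np.log(sizes).skew()  # near 0 if the data are roughly lognormal
print(f"skew before log: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

On the notebook data this would be `np.log(recent_grads['Sample_size']).skew()`. `Sample_size` has a minimum of 2, so the logarithm is defined. Columns that contain zeros (such as `Employed`) would need `np.log1p` instead.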
recent_grads['Median'].hist(bins=25, range=(0,120000))
plt.title("Histogram for 'Median'")
plt.ylabel("Frequency")
plt.xlabel('Median Salary of Full_time_year_round_workers by Major')
recent_grads['Employed'].hist(bins=25, range=(0,320000))
plt.title("Histogram for 'Employed'")
plt.ylabel("Frequency")
plt.xlabel('Quantity Employed by Major')
recent_grads['Full_time'].hist(bins=25, range=(0,265000))
plt.title("Histogram for 'Full_time'")
plt.ylabel("Frequency")
plt.xlabel('Number Employed 35 Hrs or More by Major')
recent_grads['ShareWomen'].hist(bins=20, range=(0,1.2))
plt.title("Histogram for 'ShareWomen'")
plt.ylabel("Frequency")
plt.xlabel('Proportion of Women as Share of Total by Major')
The histogram above shows at least two peaks. This may be a genuinely bimodal distribution, which would suggest two distinct populations of majors. There are hypothesis tests for whether a distribution is unimodal, bimodal, or multimodal, but I'm not sure whether one is available in the standard Python libraries.
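This sketch is not a formal test of modality. As a rough exploratory check, you can smooth the data with a Gaussian kernel density estimate and count its local maxima. The helper `count_kde_modes` below is hypothetical and uses only NumPy. Its result depends strongly on the chosen bandwidth, so treat it as a hint rather than as evidence.

```python
import numpy as np


def count_kde_modes(values, bandwidth, grid_size=512):
    """Count the local maxima of a Gaussian kernel density estimate.

    Exploratory heuristic only, not a hypothesis test: the count
    depends heavily on `bandwidth`.
    """
    values = np.asarray(values, dtype=float)
    # Evaluation grid extending three bandwidths beyond the data on each side.
    grid = np.linspace(values.min() - 3 * bandwidth,
                       values.max() + 3 * bandwidth, grid_size)
    # Sum one Gaussian bump per observation; the normalising constant is
    # omitted because it does not move the peaks.
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    # A mode is a grid point above its left neighbour and not below its right
    # one (the >= counts a two-point plateau only once).
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner >= density[2:])
    return int(is_peak.sum())


# Hypothetical proportions forming two clusters, around 0.30 and 0.75.
two_groups = np.array([0.28, 0.30, 0.30, 0.32, 0.73, 0.75, 0.75, 0.77])
one_group = np.array([0.45, 0.50, 0.50, 0.55])

print(count_kde_modes(two_groups, bandwidth=0.05))  # 2
print(count_kde_modes(one_group, bandwidth=0.05))   # 1
```

For example, you could try `count_kde_modes(recent_grads['ShareWomen'], bandwidth=0.05)` with a few different bandwidths to see how stable the count is.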
recent_grads['Unemployment_rate'].hist(bins=20, range=(0,0.2))
plt.title("Histogram for 'Unemployment_rate'")
plt.ylabel("Frequency")
plt.xlabel('Proportion of Unemployed by Major')
recent_grads['Men'].hist(bins=30, range=(0,200000))
plt.title("Histogram for 'Men'")
plt.ylabel("Frequency")
plt.xlabel('Qty of Male Graduates by Major')
recent_grads['Women'].hist(bins=30, range=(0,320000))
plt.title("Histogram for 'Women'")
plt.ylabel("Frequency")
plt.xlabel('Qty of Female Graduates by Major')
A scatter matrix plot combines scatter plots and histograms into one grid, so we can explore potential relationships and distributions at the same time. For n columns, the grid holds n × n plots. The plots on the diagonal are histograms of each column, and the off-diagonal plots are scatter plots of each pair of columns.
pd.plotting.scatter_matrix(recent_grads[['Sample_size', 'Median']], figsize=(10,10))
pd.plotting.scatter_matrix(recent_grads[['Sample_size', 'Median', 'Unemployment_rate']], figsize=(10,10))
pd.plotting.scatter_matrix(recent_grads[['Total', 'Median']], figsize=(10,10))
In the scatter matrix for "Total" and "Median" above, I don't see any noteworthy trend or relationship between the two variables.
Both distributions are right-skewed: most of the "Total" and "Median" values sit at the low end of their ranges.
pd.plotting.scatter_matrix(recent_grads[['ShareWomen', 'Median']], figsize=(10,10))
In the scatter matrix for "ShareWomen" and "Median" above, I don't see any noteworthy trend or relationship between the two variables.
pd.plotting.scatter_matrix(recent_grads[['Full_time_year_round', 'Median']], figsize=(10,10))
In the scatter matrix for "Full_time_year_round" and "Median" above, I don't see any noteworthy trend or relationship between the two variables.
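Several of the columns compared above are strongly right-skewed, so a few very large majors can dominate a Pearson correlation. A rank-based measure such as Spearman's rho is one way to quantify "no noteworthy relationship" more robustly. Here is a minimal sketch with a hypothetical helper `spearman_rho`, computed as the Pearson correlation of the ranks.

```python
import pandas as pd


def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    # rank() assigns tied values their average rank by default.
    return float(x.rank().corr(y.rank()))


# Hypothetical data: y always increases with x, but one value is extreme.
x = pd.Series([1, 2, 3, 4, 5, 6])
y = pd.Series([1, 4, 9, 16, 25, 1000])

print(round(x.corr(y), 3))           # Pearson: below 1, since the relationship is not linear
print(round(spearman_rho(x, y), 3))  # 1.0, because the ordering is perfectly monotonic
```

In the notebook, `spearman_rho(recent_grads['Total'], recent_grads['Median'])` would give the rank correlation for the first pair above. A value close to 0 would support the visual impression.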
recent_grads[:10]['ShareWomen'].plot(kind='bar')
recent_grads[-10:]['ShareWomen'].plot(kind='bar')
recent_grads[:10]['Unemployment_rate'].plot(kind='bar')
recent_grads[-10:]['Unemployment_rate'].plot(kind='bar')
print(recent_grads_original.columns)
majors = recent_grads_original["Major"]
first_ten_majors = majors[:10]
print(first_ten_majors)
Index(['Rank', 'Major_code', 'Major', 'Total', 'Men', 'Women', 'Major_category', 'ShareWomen', 'Sample_size', 'Employed', 'Full_time', 'Part_time', 'Full_time_year_round', 'Unemployed', 'Unemployment_rate', 'Median', 'P25th', 'P75th', 'College_jobs', 'Non_college_jobs', 'Low_wage_jobs'], dtype='object')
0    PETROLEUM ENGINEERING
1    MINING AND MINERAL ENGINEERING
2    METALLURGICAL ENGINEERING
3    NAVAL ARCHITECTURE AND MARINE ENGINEERING
4    CHEMICAL ENGINEERING
5    NUCLEAR ENGINEERING
6    ACTUARIAL SCIENCE
7    ASTRONOMY AND ASTROPHYSICS
8    MECHANICAL ENGINEERING
9    ELECTRICAL ENGINEERING
Name: Major, dtype: object
recent_grads_original[:10].plot.bar(x='Major', y='Women')
recent_grads_original[:10].plot.bar(x='Major', y='Men')
I suppose the only noteworthy point about the two bar plots above is that the same three engineering majors have the most graduates among both men and women:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.boxplot(recent_grads['Median'])
ax.set_ylim(10000,130000)
ax.set_xticklabels(["Median salary of full-time, year-round workers"])
plt.show()
print(recent_grads["Median"].describe())
count       172.000000
mean      40076.744186
std       11461.388773
min       22000.000000
25%       33000.000000
50%       36000.000000
75%       45000.000000
max      110000.000000
Name: Median, dtype: float64
About 50% of the median salaries fall between $33,000 and $45,000 (the 25th and 75th percentiles). The mean of the median salaries across all majors is about $40,000. A few outliers lie above roughly $63,000, the upper whisker limit (Q3 + 1.5 × IQR).
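By default (`whis=1.5`), matplotlib's `boxplot` extends each whisker to the most extreme data point that lies within Q1 - 1.5 × IQR or Q3 + 1.5 × IQR. Points beyond those limits are drawn as outliers. Here is a small sketch, using a hypothetical helper `iqr_fences`, that computes the limits from the quartiles in the `describe()` output above.

```python
from typing import Tuple


def iqr_fences(q1: float, q3: float, whis: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) outlier limits: Q1 - whis*IQR and Q3 + whis*IQR."""
    iqr = q3 - q1
    return q1 - whis * iqr, q3 + whis * iqr


# Quartiles taken from the describe() output for 'Median' above.
low, high = iqr_fences(33000, 45000)
print(low, high)  # 15000.0 63000.0
```

On the live data, the quartiles could come from `q1, q3 = recent_grads['Median'].quantile([0.25, 0.75])` instead of being typed in.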
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.boxplot(recent_grads['Unemployment_rate'])
ax.set_ylim(-0.05,0.2)
ax.set_xticklabels(["Unemployment Proportion"])
plt.show()
print(recent_grads["Unemployment_rate"].describe())
count    172.000000
mean       0.068024
std        0.030340
min        0.000000
25%        0.050261
50%        0.067544
75%        0.087247
max        0.177226
Name: Unemployment_rate, dtype: float64
About 50% of the unemployment rates fall between 5.0% and 8.7%. The mean unemployment rate across all majors is about 6.8%. There are a few outliers above about 12.5%.
pd.plotting.scatter_matrix(recent_grads[['ShareWomen', 'Unemployment_rate']], figsize=(10,10))
# consolidate University Majors into Major Categories.
recent_grads_original['Major_category'].value_counts()
Engineering                            29
Education                              16
Humanities & Liberal Arts              15
Biology & Life Science                 14
Business                               13
Health                                 12
Computers & Mathematics                11
Agriculture & Natural Resources        10
Physical Sciences                      10
Psychology & Social Work                9
Social Science                          9
Arts                                    8
Industrial Arts & Consumer Services     7
Law & Public Policy                     5
Communications & Journalism             4
Interdisciplinary                       1
Name: Major_category, dtype: int64
The output above shows that 29 different engineering majors fall under the single category "Engineering".
Rather than looking at each of the 173 majors individually, I think it is worth grouping them into the 16 categories listed above.
That way we can check whether major category accounts for noticeable differences in median salary.
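A numeric summary per category can make the comparison concrete before drawing boxplots. Here is a minimal sketch using `groupby` on a hypothetical five-row subset. The salaries are illustrative, not a faithful extract of the dataset.

```python
import pandas as pd

# Hypothetical subset: a few majors with their category and median salary.
toy = pd.DataFrame({
    "Major_category": ["Engineering", "Engineering", "Education", "Education", "Arts"],
    "Median": [110000, 65000, 34000, 32000, 30000],
})

# One row per category: how many majors it contains and the median of
# their median salaries, highest first.
summary = (
    toy.groupby("Major_category")["Median"]
    .agg(["count", "median"])
    .sort_values("median", ascending=False)
)
print(summary)
```

On the notebook data the same chain would start from `recent_grads_original` and produce one row for each of the 16 categories.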
# compare median salaries across major categories, using the original DataFrame (it keeps the text columns).
cols = ['Major_category', 'Median']
category_compare = recent_grads_original[cols]
category_compare.boxplot(column='Median', by='Major_category', figsize=(10,6))
plt.xticks(rotation=90)
plt.ylabel('Median Salary')
plt.title('Boxplots for Major Categories by Median Salary')
plt.suptitle('')  # remove the automatic "Boxplot grouped by Major_category" title
plt.show()
The boxplots above show that "Engineering" stands out as the major category with the highest median salaries, regardless of how many students graduated from each engineering major.
At the other end of the scale, "Psychology & Social Work" appears to have the lowest median salaries.
The spread of median salaries also looks different from one category to another. If we ran a statistical test for equal variances across the categories, I expect it would reject the hypothesis of equal variances, meaning the spread of median salaries is not the same for every category.
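One common choice for such a test is the Brown-Forsythe variant of Levene's test. It compares the groups' absolute deviations from their own medians, which makes it less sensitive to skew. The sketch below computes only the W statistic with NumPy, and the function name `brown_forsythe_w` is hypothetical. `scipy.stats.levene(*groups, center='median')` should return the same statistic together with a p-value.

```python
import numpy as np


def brown_forsythe_w(*groups):
    """Brown-Forsythe (median-centred Levene) W statistic for equal variances.

    Larger values mean the groups' spreads differ more. Under the null
    hypothesis of equal variances, W approximately follows an
    F(k - 1, N - k) distribution.
    """
    # Absolute deviation of every value from its own group's median.
    z = [np.abs(np.asarray(g, dtype=float) - np.median(g)) for g in groups]
    k = len(z)
    n_total = sum(len(zi) for zi in z)
    grand_mean = np.concatenate(z).mean()
    # Between-group and within-group sums of squares of those deviations.
    between = sum(len(zi) * (zi.mean() - grand_mean) ** 2 for zi in z)
    within = sum(((zi - zi.mean()) ** 2).sum() for zi in z)
    return float((n_total - k) / (k - 1) * between / within)


# Hypothetical groups: the first pair has identical spread, the second does not.
same_spread = brown_forsythe_w([1, 2, 3], [11, 12, 13])
diff_spread = brown_forsythe_w([1, 2, 3], [0, 10, 20, 30])
print(same_spread)  # 0.0
print(diff_spread)  # positive
```

With the notebook data, the groups could be built as `[g['Median'].to_numpy() for _, g in recent_grads_original.groupby('Major_category')]` and passed in as `brown_forsythe_w(*groups)`. "Interdisciplinary" contains a single major, so it adds nothing to the within-group variation.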
This project provided some great opportunities to get experience with generating various types of graphs and to make visual observations.
Graph Type:
None of the scatterplots revealed a strong relationship between any of the variables compared. Only major category showed clear differences in median salary; the number of graduates per major did not.
Other than that, it was fun to explore the data visually!