We'll be working with a dataset on the job outcomes of students who graduated from college between 2010 and 2012. The original data on job outcomes was released by the American Community Survey, which conducts surveys and aggregates the data. FiveThirtyEight cleaned the dataset and released it on their GitHub repo.
Each row in the dataset represents a different major in college and contains information on gender diversity, employment rates, median salaries, and more. Here are some of the columns in the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
rg = pd.read_csv('recent-grads.csv')
rg
Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |
173 rows × 21 columns
At first glance, the highest-earning majors appear to be in the Engineering category, which also has a markedly lower share of women. The lowest median earnings in the preview belong to categories such as Psychology & Social Work and Education, which also have the highest share of women. An initial question arises: are there fewer women in the highest-paying fields? The latter categories also seem to have higher unemployment rates, but this is only an initial observation.
rg.iloc[0] # 1st row preview
Rank 1 Major_code 2419 Major PETROLEUM ENGINEERING Total 2339 Men 2057 Women 282 Major_category Engineering ShareWomen 0.120564 Sample_size 36 Employed 1976 Full_time 1849 Part_time 270 Full_time_year_round 1207 Unemployed 37 Unemployment_rate 0.0183805 Median 110000 P25th 95000 P75th 125000 College_jobs 1534 Non_college_jobs 364 Low_wage_jobs 193 Name: 0, dtype: object
rg.head()
Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
5 rows × 21 columns
rg.tail()
Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |
5 rows × 21 columns
rg.describe()
Rank | Major_code | Total | Men | Women | ShareWomen | Sample_size | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 173.000000 | 173.000000 | 172.000000 | 172.000000 | 172.000000 | 172.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 |
mean | 87.000000 | 3879.815029 | 39370.081395 | 16723.406977 | 22646.674419 | 0.522223 | 356.080925 | 31192.763006 | 26029.306358 | 8832.398844 | 19694.427746 | 2416.329480 | 0.068191 | 40151.445087 | 29501.445087 | 51494.219653 | 12322.635838 | 13284.497110 | 3859.017341 |
std | 50.084928 | 1687.753140 | 63483.491009 | 28122.433474 | 41057.330740 | 0.231205 | 618.361022 | 50675.002241 | 42869.655092 | 14648.179473 | 33160.941514 | 4112.803148 | 0.030331 | 11470.181802 | 9166.005235 | 14906.279740 | 21299.868863 | 23789.655363 | 6944.998579 |
min | 1.000000 | 1100.000000 | 124.000000 | 119.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 111.000000 | 0.000000 | 111.000000 | 0.000000 | 0.000000 | 22000.000000 | 18500.000000 | 22000.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 44.000000 | 2403.000000 | 4549.750000 | 2177.500000 | 1778.250000 | 0.336026 | 39.000000 | 3608.000000 | 3154.000000 | 1030.000000 | 2453.000000 | 304.000000 | 0.050306 | 33000.000000 | 24000.000000 | 42000.000000 | 1675.000000 | 1591.000000 | 340.000000 |
50% | 87.000000 | 3608.000000 | 15104.000000 | 5434.000000 | 8386.500000 | 0.534024 | 130.000000 | 11797.000000 | 10048.000000 | 3299.000000 | 7413.000000 | 893.000000 | 0.067961 | 36000.000000 | 27000.000000 | 47000.000000 | 4390.000000 | 4595.000000 | 1231.000000 |
75% | 130.000000 | 5503.000000 | 38909.750000 | 14631.000000 | 22553.750000 | 0.703299 | 338.000000 | 31433.000000 | 25147.000000 | 9948.000000 | 16891.000000 | 2393.000000 | 0.087557 | 45000.000000 | 33000.000000 | 60000.000000 | 14444.000000 | 11783.000000 | 3466.000000 |
max | 173.000000 | 6403.000000 | 393735.000000 | 173809.000000 | 307087.000000 | 0.968954 | 4212.000000 | 307933.000000 | 251540.000000 | 115172.000000 | 199897.000000 | 28169.000000 | 0.177226 | 110000.000000 | 95000.000000 | 125000.000000 | 151643.000000 | 148395.000000 | 48207.000000 |
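The `count` row above is 172 rather than 173 for `Total`, `Men`, `Women` and `ShareWomen`, which points to missing values. Here is a minimal sketch of how they could be located before dropping anything. The three-row `sample` frame is a hypothetical stand-in for `rg` (its last row is invented for illustration), so the snippet runs without the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `rg`; "EXAMPLE MAJOR" is an invented row with missing counts.
sample = pd.DataFrame({
    "Major": ["PETROLEUM ENGINEERING", "LIBRARY SCIENCE", "EXAMPLE MAJOR"],
    "Total": [2339.0, 1098.0, np.nan],
    "Men": [2057.0, 134.0, np.nan],
    "Median": [110000, 22000, 40000],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Rows that contain at least one missing value.
rows_with_missing = sample[sample.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

In the notebook, the same two expressions can be applied to `rg` directly.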
# Drop rows that contain missing values (only one row is affected)
rg = rg.dropna()
rg
Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |
172 rows × 21 columns
# Share of men in each major
rg['Share_men'] = rg['Men'] / rg['Total']
rg['Share_men']
<ipython-input-7-8068199f487c>:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rg['Share_men'] = rg['Men'] / rg['Total']
0 0.879436 1 0.898148 2 0.846963 3 0.892687 4 0.658369 ... 168 0.362707 169 0.182901 170 0.200141 171 0.201254 172 0.122040 Name: Share_men, Length: 172, dtype: float64
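pandas raises the `SettingWithCopyWarning` shown above when it suspects that an assignment targets a DataFrame derived from another one. One common way to avoid it, sketched here on a hypothetical stand-in for `rg`, is to take an explicit copy of the `dropna()` result before adding the new column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `rg`; "EXAMPLE MAJOR" is an invented row with missing counts.
sample = pd.DataFrame({
    "Major": ["PETROLEUM ENGINEERING", "LIBRARY SCIENCE", "EXAMPLE MAJOR"],
    "Total": [2339.0, 1098.0, np.nan],
    "Men": [2057.0, 134.0, np.nan],
})

# .copy() makes the cleaned frame independent of the original,
# so adding a column to it is unambiguous.
clean = sample.dropna().copy()
clean["Share_men"] = clean["Men"] / clean["Total"]

print(clean)
```

In the notebook this would amount to `rg = rg.dropna().copy()` before the `Share_men` assignment.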
rg
Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs | Share_men | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 | 0.879436 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 | 0.898148 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 | 0.846963 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 | 0.892687 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 | 0.658369 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 | 0.362707 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 | 0.182901 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 | 0.200141 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 | 0.201254 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 | 0.122040 |
172 rows × 22 columns
Do students in more popular majors make more money? - not necessarily (Fig. 1.2 and 1.3)
Do students who majored in majority-female subjects make more money? - no (Fig. 1.5 and 1.6)
Is there any link between the number of full-time employees and median salary? - some (Fig. 1.3)
rg.plot(x="Sample_size", y="Median", kind="scatter", xlim=(0,3000))
The smaller the sample size, the wider the spread of Median values, as expected. As the sample size grows, the median earnings (of those employed full time, year-round) converge to around 40000. Assuming that the sample size is proportional to the Total number of people with a major, we could say that more popular majors do not generate more income.
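The proportionality assumption between `Sample_size` and `Total` can be checked rather than assumed. The sketch below uses only the ten rows visible in the preview tables as a stand-in for `rg`, so its numbers describe that small sample and not the full dataset.

```python
import pandas as pd

# Stand-in for `rg`: the ten rows shown in the head/tail previews.
sample = pd.DataFrame({
    "Total": [2339, 756, 856, 1258, 32260, 8409, 2854, 2838, 4626, 1098],
    "Sample_size": [36, 7, 3, 16, 289, 47, 7, 13, 21, 2],
})

# If Sample_size were strictly proportional to Total, this ratio would be constant.
sampling_fraction = sample["Sample_size"] / sample["Total"]

# Pearson correlation between the two columns.
r = sample["Sample_size"].corr(sample["Total"])

print(sampling_fraction.describe())
print(f"Pearson r = {r:.3f}")
```

A ratio that varies little, together with a correlation close to 1, would support reading `Sample_size` as a proxy for popularity.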
rg.plot(x="Sample_size", y="Unemployment_rate", kind="scatter")
The smaller the sample size, the wider the spread of Unemployment_rate, as expected. As Sample_size grows, the Unemployment_rate converges to around 0.075.
rg.plot(x="Median", y="Full_time", kind="scatter", xlim=(0,80000))
rg.plot(x="Total", y="Full_time", kind="scatter")
The smaller the number of people employed full time, the larger the range of Median earnings. As this number grows, the Median converges to roughly 40000.
Since "Full_time" (= number of people employed full time for a given major) is roughly proportional to "Total" (= total number of people with that major), we could say that less popular majors tend to have a wider spread of Median earnings, while for more popular majors the Median earnings converge to roughly 45000.
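The claim that less popular majors show a wider range of Median earnings can also be expressed numerically by splitting majors into popularity groups and comparing the range within each. This is a sketch on the ten preview rows only. The two-group split and the group labels are choices made for illustration.

```python
import pandas as pd

# Stand-in for `rg`: the ten rows shown in the head/tail previews.
sample = pd.DataFrame({
    "Total": [2339, 756, 856, 1258, 32260, 8409, 2854, 2838, 4626, 1098],
    "Median": [110000, 75000, 73000, 70000, 65000, 26000, 25000, 25000, 23400, 22000],
})

# Split majors into two equally sized groups by Total.
sample["popularity"] = pd.qcut(
    sample["Total"], q=2, labels=["less popular", "more popular"]
)

# Range of Median earnings inside each group.
spread = sample.groupby("popularity", observed=True)["Median"].agg(["min", "max"])
spread["range"] = spread["max"] - spread["min"]

print(spread)
```

On the full `rg` frame, a larger `q` (for example quartiles) would give a finer picture of how the range changes with popularity.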
rg.plot(x="ShareWomen", y="Unemployment_rate", kind="scatter")
There is almost no correlation between the share of women in a major and the unemployment rate. Regardless of the share of women, the unemployment rate stays at roughly 0.070.
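A correlation coefficient gives a number to go with the visual impression from the scatter plot. The sketch below computes both the Pearson coefficient and a rank-based (Spearman) coefficient, obtained here by correlating the ranks. It runs on the ten preview rows only, so the printed values say nothing reliable about the full dataset.

```python
import pandas as pd

# Stand-in for `rg`: the ten rows shown in the head/tail previews.
sample = pd.DataFrame({
    "ShareWomen": [0.120564, 0.101852, 0.153037, 0.107313, 0.341631,
                   0.637293, 0.817099, 0.799859, 0.798746, 0.877960],
    "Unemployment_rate": [0.018381, 0.117241, 0.024096, 0.050125, 0.061098,
                          0.046320, 0.065112, 0.149048, 0.053621, 0.104946],
})

# Pearson correlation measures linear association.
pearson_r = sample["ShareWomen"].corr(sample["Unemployment_rate"])

# Spearman correlation is the Pearson correlation of the ranks.
spearman_r = sample["ShareWomen"].rank().corr(sample["Unemployment_rate"].rank())

print(f"Pearson r  = {pearson_r:.3f}")
print(f"Spearman r = {spearman_r:.3f}")
```

Values near 0 on the full `rg` frame would back up the "almost no correlation" reading.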
rg.plot(x="Women", y="Median", kind="scatter", xlim=(0,125000))
rg.plot(x="ShareWomen", y="Median", kind="scatter")
rg.plot(x="Men", y="Median", kind="scatter", xlim=(0,120000))
rg.plot(x="Share_men", y="Median", kind="scatter")
CONCLUSION Fig. 1.5 and 1.6: Looking just at the two scatter plots of Women/Men vs. Median, we can see that as the number of women enrolled in a major grows, the median earnings converge to around 35000, while for men this number is somewhere around 42000. The ShareWomen/Share_men vs. Median plots confirm this observation: as the percentage of women in a major grows, the median falls, while the opposite holds for men.
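Because `Share_men` is computed as `Men / Total` and the preview rows satisfy `Men + Women = Total`, `Share_men` is simply `1 - ShareWomen`. The two share-vs-Median plots are therefore mirror images and carry the same information. A short sketch on six of the preview rows illustrates this.

```python
import numpy as np
import pandas as pd

# Stand-in for `rg`: six rows taken from the head/tail previews.
sample = pd.DataFrame({
    "Total": [2339, 756, 32260, 8409, 2854, 1098],
    "Men": [2057, 679, 21239, 3050, 522, 134],
    "ShareWomen": [0.120564, 0.101852, 0.341631, 0.637293, 0.817099, 0.877960],
    "Median": [110000, 75000, 65000, 26000, 25000, 22000],
})
sample["Share_men"] = sample["Men"] / sample["Total"]

# The two shares add up to 1 (up to the rounding of the displayed values).
shares_sum_to_one = np.allclose(sample["Share_men"] + sample["ShareWomen"], 1.0, atol=1e-5)

# Correlations with Median are equal in size and opposite in sign.
r_women = sample["ShareWomen"].corr(sample["Median"])
r_men = sample["Share_men"].corr(sample["Median"])

print(shares_sum_to_one)
print(f"r(ShareWomen, Median) = {r_women:.3f}")
print(f"r(Share_men, Median)  = {r_men:.3f}")
```

Note that a correlation between share and median earnings describes majors, not individuals, and does not by itself explain why the pattern exists.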
Sample_size
Median
Employed
Full_time
ShareWomen
Unemployment_rate
Men
Women
rg["Sample_size"].plot(kind='hist')
rg['Sample_size'].hist(bins=15, range=(0,5000))
rg['Median'].hist(bins=30)
rg['Employed'].hist(bins=20, xrot=30)
rg['Full_time'].hist(bins = 20)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.hist(rg["ShareWomen"],bins=10)
plt.show()
rg['ShareWomen'].hist(bins = 10)
~10% of majors have a share of women below 20%
~32% of majors have a share of women below 40% --> we can say that 32% of majors are predominantly male
~13% of majors have a share of women above 80%
~45% of majors have a share of women above 60% --> we can say that 45% of majors are predominantly female
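Percentages like the ones above can be read off a histogram only approximately. They can also be computed directly, since the mean of a boolean Series is the fraction of `True` values. The sketch below uses the ten preview rows as a stand-in for `rg`, so its output differs from the figures quoted for the full dataset.

```python
import pandas as pd

# Stand-in for rg["ShareWomen"]: values from the ten rows in the head/tail previews.
share_women = pd.Series([0.120564, 0.101852, 0.153037, 0.107313, 0.341631,
                         0.637293, 0.817099, 0.799859, 0.798746, 0.877960])

# Fraction of majors on each side of the thresholds used in the text.
fractions = {
    "below 20%": (share_women < 0.2).mean(),
    "below 40%": (share_women < 0.4).mean(),
    "above 60%": (share_women > 0.6).mean(),
    "above 80%": (share_women > 0.8).mean(),
}

for label, value in fractions.items():
    print(f"{label}: {value:.0%} of majors")
```

Replacing `share_women` with `rg["ShareWomen"]` would give the exact values for all majors.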
rg["Unemployment_rate"].hist(bins = 25, xrot=45)
rg["Men"].hist(bins = 15, xrot=30)
rg["Women"].hist(bins = 15, xrot=30)
from pandas.plotting import scatter_matrix
scatter_matrix(rg[["Sample_size", "Median"]], figsize=(10,10))
The larger the Sample_size, the closer the Median gets to a value of around 40000.
scatter_matrix(rg[["Sample_size", "Median", "Unemployment_rate"]], figsize=(15,15))
scatter_matrix(rg[["Median", "ShareWomen", "Share_men"]], figsize=(15,15))
scatter_matrix(rg[["Median", "Full_time"]], figsize=(10,10))
These scatter matrix plots only confirm our understanding so far:
rg[:10].plot.bar(x='Major', y='ShareWomen')
rg[-10:].plot.bar(x='Major', y='ShareWomen')
As we can see from the plots in this figure, the 10 highest-ranked majors by earnings have a markedly lower share of women, while the majors with the lowest median earnings (several of them in the Psychology & Social Work category) have the highest share of women.
rg[:10].plot.bar(x='Major', y='Unemployment_rate')
rg[-10:].plot.bar(x='Major', y='Unemployment_rate')
It is hard to tell from these plots whether the lowest- or the highest-ranked majors have a higher or lower unemployment rate. The unemployment rate may be spread fairly evenly across majors, regardless of their rank or popularity.
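When bar charts are inconclusive, summary statistics for the two groups can help. The sketch below compares the mean unemployment rate of the top and bottom rows of a rank-ordered frame. It uses the five top and five bottom rows visible in the previews as a stand-in for `rg`, so the result covers those ten majors only.

```python
import pandas as pd

# Stand-in for `rg`, ordered by Rank: five top and five bottom rows from the previews.
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 169, 170, 171, 172, 173],
    "Unemployment_rate": [0.018381, 0.117241, 0.024096, 0.050125, 0.061098,
                          0.046320, 0.065112, 0.149048, 0.053621, 0.104946],
})

n = 5  # group size; the bar charts above use 10 rows of the full dataset
top_mean = sample.head(n)["Unemployment_rate"].mean()
bottom_mean = sample.tail(n)["Unemployment_rate"].mean()

print(f"Mean unemployment rate, top {n}:    {top_mean:.4f}")
print(f"Mean unemployment rate, bottom {n}: {bottom_mean:.4f}")
```

With groups this small, a difference in means is only suggestive. On the full frame, `rg.head(10)` and `rg.tail(10)` would match the bar charts above.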
Further analyses needed!
Here are some ideas: