We'll be working with a dataset on the job outcomes of students who graduated from college between 2010 and 2012. The original data on job outcomes was released by the American Community Survey, which conducts surveys and aggregates the data. FiveThirtyEight cleaned the dataset and released it on their GitHub repo.
Each row in the dataset represents a different major in college and contains information on gender diversity, employment rates, median salaries, and more. Here are some of the columns in the dataset:
- Rank: Rank by median earnings (the dataset is ordered by this column).
- Major_code: Major code.
- Major: Major description.
- Major_category: Category of major.
- Total: Total number of people with major.
- Sample_size: Sample size (unweighted) of full-time, year-round workers.
- Men: Male graduates.
- Women: Female graduates.
- ShareWomen: Women as share of total.
- Employed: Number employed.
- Median: Median salary of full-time, year-round workers.
- Low_wage_jobs: Number in low-wage service jobs.
- Full_time: Number employed 35 hours or more.
- Part_time: Number employed less than 35 hours.

Using visualizations, we can start to explore questions about the dataset, such as whether students in more popular majors make more money.
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
recent_grads = pd.read_csv('recent-grads.csv')
print(recent_grads.iloc[0])
print(recent_grads.head())
print(recent_grads.tail())
recent_grads.describe()
Rank                                        1
Major_code                               2419
Major                   PETROLEUM ENGINEERING
Total                                    2339
Men                                      2057
Women                                     282
Major_category                    Engineering
ShareWomen                           0.120564
Sample_size                                36
Employed                                 1976
Full_time                                1849
Part_time                                 270
Full_time_year_round                     1207
Unemployed                                 37
Unemployment_rate                   0.0183805
Median                                 110000
P25th                                   95000
P75th                                  125000
College_jobs                             1534
Non_college_jobs                          364
Low_wage_jobs                             193
Name: 0, dtype: object

| | Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |

[5 rows x 21 columns]

| | Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |

[5 rows x 21 columns]
| | Rank | Major_code | Total | Men | Women | ShareWomen | Sample_size | Employed | Full_time | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 173.000000 | 173.000000 | 172.000000 | 172.000000 | 172.000000 | 172.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 | 173.000000 |
mean | 87.000000 | 3879.815029 | 39370.081395 | 16723.406977 | 22646.674419 | 0.522223 | 356.080925 | 31192.763006 | 26029.306358 | 8832.398844 | 19694.427746 | 2416.329480 | 0.068191 | 40151.445087 | 29501.445087 | 51494.219653 | 12322.635838 | 13284.497110 | 3859.017341 |
std | 50.084928 | 1687.753140 | 63483.491009 | 28122.433474 | 41057.330740 | 0.231205 | 618.361022 | 50675.002241 | 42869.655092 | 14648.179473 | 33160.941514 | 4112.803148 | 0.030331 | 11470.181802 | 9166.005235 | 14906.279740 | 21299.868863 | 23789.655363 | 6944.998579 |
min | 1.000000 | 1100.000000 | 124.000000 | 119.000000 | 0.000000 | 0.000000 | 2.000000 | 0.000000 | 111.000000 | 0.000000 | 111.000000 | 0.000000 | 0.000000 | 22000.000000 | 18500.000000 | 22000.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 44.000000 | 2403.000000 | 4549.750000 | 2177.500000 | 1778.250000 | 0.336026 | 39.000000 | 3608.000000 | 3154.000000 | 1030.000000 | 2453.000000 | 304.000000 | 0.050306 | 33000.000000 | 24000.000000 | 42000.000000 | 1675.000000 | 1591.000000 | 340.000000 |
50% | 87.000000 | 3608.000000 | 15104.000000 | 5434.000000 | 8386.500000 | 0.534024 | 130.000000 | 11797.000000 | 10048.000000 | 3299.000000 | 7413.000000 | 893.000000 | 0.067961 | 36000.000000 | 27000.000000 | 47000.000000 | 4390.000000 | 4595.000000 | 1231.000000 |
75% | 130.000000 | 5503.000000 | 38909.750000 | 14631.000000 | 22553.750000 | 0.703299 | 338.000000 | 31433.000000 | 25147.000000 | 9948.000000 | 16891.000000 | 2393.000000 | 0.087557 | 45000.000000 | 33000.000000 | 60000.000000 | 14444.000000 | 11783.000000 | 3466.000000 |
max | 173.000000 | 6403.000000 | 393735.000000 | 173809.000000 | 307087.000000 | 0.968954 | 4212.000000 | 307933.000000 | 251540.000000 | 115172.000000 | 199897.000000 | 28169.000000 | 0.177226 | 110000.000000 | 95000.000000 | 125000.000000 | 151643.000000 | 148395.000000 | 48207.000000 |
recent_grads
| | Rank | Major_code | Major | Total | Men | Women | Major_category | ShareWomen | Sample_size | Employed | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2419 | PETROLEUM ENGINEERING | 2339.0 | 2057.0 | 282.0 | Engineering | 0.120564 | 36 | 1976 | ... | 270 | 1207 | 37 | 0.018381 | 110000 | 95000 | 125000 | 1534 | 364 | 193 |
1 | 2 | 2416 | MINING AND MINERAL ENGINEERING | 756.0 | 679.0 | 77.0 | Engineering | 0.101852 | 7 | 640 | ... | 170 | 388 | 85 | 0.117241 | 75000 | 55000 | 90000 | 350 | 257 | 50 |
2 | 3 | 2415 | METALLURGICAL ENGINEERING | 856.0 | 725.0 | 131.0 | Engineering | 0.153037 | 3 | 648 | ... | 133 | 340 | 16 | 0.024096 | 73000 | 50000 | 105000 | 456 | 176 | 0 |
3 | 4 | 2417 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | 1258.0 | 1123.0 | 135.0 | Engineering | 0.107313 | 16 | 758 | ... | 150 | 692 | 40 | 0.050125 | 70000 | 43000 | 80000 | 529 | 102 | 0 |
4 | 5 | 2405 | CHEMICAL ENGINEERING | 32260.0 | 21239.0 | 11021.0 | Engineering | 0.341631 | 289 | 25694 | ... | 5180 | 16697 | 1672 | 0.061098 | 65000 | 50000 | 75000 | 18314 | 4440 | 972 |
5 | 6 | 2418 | NUCLEAR ENGINEERING | 2573.0 | 2200.0 | 373.0 | Engineering | 0.144967 | 17 | 1857 | ... | 264 | 1449 | 400 | 0.177226 | 65000 | 50000 | 102000 | 1142 | 657 | 244 |
6 | 7 | 6202 | ACTUARIAL SCIENCE | 3777.0 | 2110.0 | 1667.0 | Business | 0.441356 | 51 | 2912 | ... | 296 | 2482 | 308 | 0.095652 | 62000 | 53000 | 72000 | 1768 | 314 | 259 |
7 | 8 | 5001 | ASTRONOMY AND ASTROPHYSICS | 1792.0 | 832.0 | 960.0 | Physical Sciences | 0.535714 | 10 | 1526 | ... | 553 | 827 | 33 | 0.021167 | 62000 | 31500 | 109000 | 972 | 500 | 220 |
8 | 9 | 2414 | MECHANICAL ENGINEERING | 91227.0 | 80320.0 | 10907.0 | Engineering | 0.119559 | 1029 | 76442 | ... | 13101 | 54639 | 4650 | 0.057342 | 60000 | 48000 | 70000 | 52844 | 16384 | 3253 |
9 | 10 | 2408 | ELECTRICAL ENGINEERING | 81527.0 | 65511.0 | 16016.0 | Engineering | 0.196450 | 631 | 61928 | ... | 12695 | 41413 | 3895 | 0.059174 | 60000 | 45000 | 72000 | 45829 | 10874 | 3170 |
10 | 11 | 2407 | COMPUTER ENGINEERING | 41542.0 | 33258.0 | 8284.0 | Engineering | 0.199413 | 399 | 32506 | ... | 5146 | 23621 | 2275 | 0.065409 | 60000 | 45000 | 75000 | 23694 | 5721 | 980 |
11 | 12 | 2401 | AEROSPACE ENGINEERING | 15058.0 | 12953.0 | 2105.0 | Engineering | 0.139793 | 147 | 11391 | ... | 2724 | 8790 | 794 | 0.065162 | 60000 | 42000 | 70000 | 8184 | 2425 | 372 |
12 | 13 | 2404 | BIOMEDICAL ENGINEERING | 14955.0 | 8407.0 | 6548.0 | Engineering | 0.437847 | 79 | 10047 | ... | 2694 | 5986 | 1019 | 0.092084 | 60000 | 36000 | 70000 | 6439 | 2471 | 789 |
13 | 14 | 5008 | MATERIALS SCIENCE | 4279.0 | 2949.0 | 1330.0 | Engineering | 0.310820 | 22 | 3307 | ... | 878 | 1967 | 78 | 0.023043 | 60000 | 39000 | 65000 | 2626 | 391 | 81 |
14 | 15 | 2409 | ENGINEERING MECHANICS PHYSICS AND SCIENCE | 4321.0 | 3526.0 | 795.0 | Engineering | 0.183985 | 30 | 3608 | ... | 811 | 2004 | 23 | 0.006334 | 58000 | 25000 | 74000 | 2439 | 947 | 263 |
15 | 16 | 2402 | BIOLOGICAL ENGINEERING | 8925.0 | 6062.0 | 2863.0 | Engineering | 0.320784 | 55 | 6170 | ... | 1983 | 3413 | 589 | 0.087143 | 57100 | 40000 | 76000 | 3603 | 1595 | 524 |
16 | 17 | 2412 | INDUSTRIAL AND MANUFACTURING ENGINEERING | 18968.0 | 12453.0 | 6515.0 | Engineering | 0.343473 | 183 | 15604 | ... | 2243 | 11326 | 699 | 0.042876 | 57000 | 37900 | 67000 | 8306 | 3235 | 640 |
17 | 18 | 2400 | GENERAL ENGINEERING | 61152.0 | 45683.0 | 15469.0 | Engineering | 0.252960 | 425 | 44931 | ... | 7199 | 33540 | 2859 | 0.059824 | 56000 | 36000 | 69000 | 26898 | 11734 | 3192 |
18 | 19 | 2403 | ARCHITECTURAL ENGINEERING | 2825.0 | 1835.0 | 990.0 | Engineering | 0.350442 | 26 | 2575 | ... | 343 | 1848 | 170 | 0.061931 | 54000 | 38000 | 65000 | 1665 | 649 | 137 |
19 | 20 | 3201 | COURT REPORTING | 1148.0 | 877.0 | 271.0 | Law & Public Policy | 0.236063 | 14 | 930 | ... | 223 | 808 | 11 | 0.011690 | 54000 | 50000 | 54000 | 402 | 528 | 144 |
20 | 21 | 2102 | COMPUTER SCIENCE | 128319.0 | 99743.0 | 28576.0 | Computers & Mathematics | 0.222695 | 1196 | 102087 | ... | 18726 | 70932 | 6884 | 0.063173 | 53000 | 39000 | 70000 | 68622 | 25667 | 5144 |
21 | 22 | 1104 | FOOD SCIENCE | NaN | NaN | NaN | Agriculture & Natural Resources | NaN | 36 | 3149 | ... | 1121 | 1735 | 338 | 0.096931 | 53000 | 32000 | 70000 | 1183 | 1274 | 485 |
22 | 23 | 2502 | ELECTRICAL ENGINEERING TECHNOLOGY | 11565.0 | 8181.0 | 3384.0 | Engineering | 0.292607 | 97 | 8587 | ... | 1873 | 5681 | 824 | 0.087557 | 52000 | 35000 | 60000 | 5126 | 2686 | 696 |
23 | 24 | 2413 | MATERIALS ENGINEERING AND MATERIALS SCIENCE | 2993.0 | 2020.0 | 973.0 | Engineering | 0.325092 | 22 | 2449 | ... | 1040 | 1151 | 70 | 0.027789 | 52000 | 35000 | 62000 | 1911 | 305 | 70 |
24 | 25 | 6212 | MANAGEMENT INFORMATION SYSTEMS AND STATISTICS | 18713.0 | 13496.0 | 5217.0 | Business | 0.278790 | 278 | 16413 | ... | 2420 | 13017 | 1015 | 0.058240 | 51000 | 38000 | 60000 | 6342 | 5741 | 708 |
25 | 26 | 2406 | CIVIL ENGINEERING | 53153.0 | 41081.0 | 12072.0 | Engineering | 0.227118 | 565 | 43041 | ... | 10080 | 29196 | 3270 | 0.070610 | 50000 | 40000 | 60000 | 28526 | 9356 | 2899 |
26 | 27 | 5601 | CONSTRUCTION SERVICES | 18498.0 | 16820.0 | 1678.0 | Industrial Arts & Consumer Services | 0.090713 | 295 | 16318 | ... | 1751 | 12313 | 1042 | 0.060023 | 50000 | 36000 | 60000 | 3275 | 5351 | 703 |
27 | 28 | 6204 | OPERATIONS LOGISTICS AND E-COMMERCE | 11732.0 | 7921.0 | 3811.0 | Business | 0.324838 | 156 | 10027 | ... | 1183 | 7724 | 504 | 0.047859 | 50000 | 40000 | 60000 | 1466 | 3629 | 285 |
28 | 29 | 2499 | MISCELLANEOUS ENGINEERING | 9133.0 | 7398.0 | 1735.0 | Engineering | 0.189970 | 118 | 7428 | ... | 1662 | 5476 | 597 | 0.074393 | 50000 | 39000 | 65000 | 3445 | 2426 | 365 |
29 | 30 | 5402 | PUBLIC POLICY | 5978.0 | 2639.0 | 3339.0 | Law & Public Policy | 0.558548 | 55 | 4547 | ... | 1306 | 2776 | 670 | 0.128426 | 50000 | 35000 | 70000 | 1550 | 1871 | 340 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
143 | 144 | 1105 | PLANT SCIENCE AND AGRONOMY | 7416.0 | 4897.0 | 2519.0 | Agriculture & Natural Resources | 0.339671 | 110 | 6594 | ... | 1246 | 4522 | 314 | 0.045455 | 32000 | 22900 | 40000 | 2089 | 3545 | 1231 |
144 | 145 | 2308 | SCIENCE AND COMPUTER TEACHER EDUCATION | 6483.0 | 2049.0 | 4434.0 | Education | 0.683943 | 59 | 5362 | ... | 1227 | 3247 | 266 | 0.047264 | 32000 | 28000 | 39000 | 4214 | 1106 | 591 |
145 | 146 | 5200 | PSYCHOLOGY | 393735.0 | 86648.0 | 307087.0 | Psychology & Social Work | 0.779933 | 2584 | 307933 | ... | 115172 | 174438 | 28169 | 0.083811 | 31500 | 24000 | 41000 | 125148 | 141860 | 48207 |
146 | 147 | 6002 | MUSIC | 60633.0 | 29909.0 | 30724.0 | Arts | 0.506721 | 419 | 47662 | ... | 24943 | 21425 | 3918 | 0.075960 | 31000 | 22300 | 42000 | 13752 | 28786 | 9286 |
147 | 148 | 2306 | PHYSICAL AND HEALTH EDUCATION TEACHING | 28213.0 | 15670.0 | 12543.0 | Education | 0.444582 | 259 | 23794 | ... | 7230 | 13651 | 1920 | 0.074667 | 31000 | 24000 | 40000 | 12777 | 9328 | 2042 |
148 | 149 | 6006 | ART HISTORY AND CRITICISM | 21030.0 | 3240.0 | 17790.0 | Humanities & Liberal Arts | 0.845934 | 204 | 17579 | ... | 6140 | 9965 | 1128 | 0.060298 | 31000 | 23000 | 40000 | 5139 | 9738 | 3426 |
149 | 150 | 6000 | FINE ARTS | 74440.0 | 24786.0 | 49654.0 | Arts | 0.667034 | 623 | 59679 | ... | 23656 | 31877 | 5486 | 0.084186 | 30500 | 21000 | 41000 | 20792 | 32725 | 11880 |
150 | 151 | 2901 | FAMILY AND CONSUMER SCIENCES | 58001.0 | 5166.0 | 52835.0 | Industrial Arts & Consumer Services | 0.910933 | 518 | 46624 | ... | 15872 | 26906 | 3355 | 0.067128 | 30000 | 22900 | 40000 | 20985 | 20133 | 5248 |
151 | 152 | 5404 | SOCIAL WORK | 53552.0 | 5137.0 | 48415.0 | Psychology & Social Work | 0.904075 | 374 | 45038 | ... | 13481 | 27588 | 3329 | 0.068828 | 30000 | 25000 | 35000 | 27449 | 14416 | 4344 |
152 | 153 | 1103 | ANIMAL SCIENCES | 21573.0 | 5347.0 | 16226.0 | Agriculture & Natural Resources | 0.752144 | 255 | 17112 | ... | 5353 | 10824 | 917 | 0.050862 | 30000 | 22000 | 40000 | 5443 | 9571 | 2125 |
153 | 154 | 6003 | VISUAL AND PERFORMING ARTS | 16250.0 | 4133.0 | 12117.0 | Arts | 0.745662 | 132 | 12870 | ... | 6253 | 6322 | 1465 | 0.102197 | 30000 | 22000 | 40000 | 3849 | 7635 | 2840 |
154 | 155 | 2312 | TEACHER EDUCATION: MULTIPLE LEVELS | 14443.0 | 2734.0 | 11709.0 | Education | 0.810704 | 142 | 13076 | ... | 2214 | 8457 | 496 | 0.036546 | 30000 | 24000 | 37000 | 10766 | 1949 | 722 |
155 | 156 | 5299 | MISCELLANEOUS PSYCHOLOGY | 9628.0 | 1936.0 | 7692.0 | Psychology & Social Work | 0.798920 | 60 | 7653 | ... | 3221 | 3838 | 419 | 0.051908 | 30000 | 20800 | 40000 | 2960 | 3948 | 1650 |
156 | 157 | 5403 | HUMAN SERVICES AND COMMUNITY ORGANIZATION | 9374.0 | 885.0 | 8489.0 | Psychology & Social Work | 0.905590 | 89 | 8294 | ... | 2405 | 5061 | 326 | 0.037819 | 30000 | 24000 | 35000 | 2878 | 4595 | 724 |
157 | 158 | 3402 | HUMANITIES | 6652.0 | 2013.0 | 4639.0 | Humanities & Liberal Arts | 0.697384 | 49 | 5052 | ... | 2225 | 2661 | 372 | 0.068584 | 30000 | 20000 | 49000 | 1168 | 3354 | 1141 |
158 | 159 | 4901 | THEOLOGY AND RELIGIOUS VOCATIONS | 30207.0 | 18616.0 | 11591.0 | Humanities & Liberal Arts | 0.383719 | 310 | 24202 | ... | 8767 | 13944 | 1617 | 0.062628 | 29000 | 22000 | 38000 | 9927 | 12037 | 3304 |
159 | 160 | 6007 | STUDIO ARTS | 16977.0 | 4754.0 | 12223.0 | Arts | 0.719974 | 182 | 13908 | ... | 5673 | 7413 | 1368 | 0.089552 | 29000 | 19200 | 38300 | 3948 | 8707 | 3586 |
160 | 161 | 2201 | COSMETOLOGY SERVICES AND CULINARY ARTS | 10510.0 | 4364.0 | 6146.0 | Industrial Arts & Consumer Services | 0.584776 | 117 | 8650 | ... | 2064 | 5949 | 510 | 0.055677 | 29000 | 20000 | 36000 | 563 | 7384 | 3163 |
161 | 162 | 1199 | MISCELLANEOUS AGRICULTURE | 1488.0 | 404.0 | 1084.0 | Agriculture & Natural Resources | 0.728495 | 24 | 1290 | ... | 335 | 936 | 82 | 0.059767 | 29000 | 23000 | 42100 | 483 | 626 | 31 |
162 | 163 | 5502 | ANTHROPOLOGY AND ARCHEOLOGY | 38844.0 | 11376.0 | 27468.0 | Humanities & Liberal Arts | 0.707136 | 247 | 29633 | ... | 14515 | 13232 | 3395 | 0.102792 | 28000 | 20000 | 38000 | 9805 | 16693 | 6866 |
163 | 164 | 6102 | COMMUNICATION DISORDERS SCIENCES AND SERVICES | 38279.0 | 1225.0 | 37054.0 | Health | 0.967998 | 95 | 29763 | ... | 13862 | 14460 | 1487 | 0.047584 | 28000 | 20000 | 40000 | 19957 | 9404 | 5125 |
164 | 165 | 2307 | EARLY CHILDHOOD EDUCATION | 37589.0 | 1167.0 | 36422.0 | Education | 0.968954 | 342 | 32551 | ... | 7001 | 20748 | 1360 | 0.040105 | 28000 | 21000 | 35000 | 23515 | 7705 | 2868 |
165 | 166 | 2603 | OTHER FOREIGN LANGUAGES | 11204.0 | 3472.0 | 7732.0 | Humanities & Liberal Arts | 0.690111 | 56 | 7052 | ... | 3685 | 3214 | 846 | 0.107116 | 27500 | 22900 | 38000 | 2326 | 3703 | 1115 |
166 | 167 | 6001 | DRAMA AND THEATER ARTS | 43249.0 | 14440.0 | 28809.0 | Arts | 0.666119 | 357 | 36165 | ... | 15994 | 16891 | 3040 | 0.077541 | 27000 | 19200 | 35000 | 6994 | 25313 | 11068 |
167 | 168 | 3302 | COMPOSITION AND RHETORIC | 18953.0 | 7022.0 | 11931.0 | Humanities & Liberal Arts | 0.629505 | 151 | 15053 | ... | 6612 | 7832 | 1340 | 0.081742 | 27000 | 20000 | 35000 | 4855 | 8100 | 3466 |
168 | 169 | 3609 | ZOOLOGY | 8409.0 | 3050.0 | 5359.0 | Biology & Life Science | 0.637293 | 47 | 6259 | ... | 2190 | 3602 | 304 | 0.046320 | 26000 | 20000 | 39000 | 2771 | 2947 | 743 |
169 | 170 | 5201 | EDUCATIONAL PSYCHOLOGY | 2854.0 | 522.0 | 2332.0 | Psychology & Social Work | 0.817099 | 7 | 2125 | ... | 572 | 1211 | 148 | 0.065112 | 25000 | 24000 | 34000 | 1488 | 615 | 82 |
170 | 171 | 5202 | CLINICAL PSYCHOLOGY | 2838.0 | 568.0 | 2270.0 | Psychology & Social Work | 0.799859 | 13 | 2101 | ... | 648 | 1293 | 368 | 0.149048 | 25000 | 25000 | 40000 | 986 | 870 | 622 |
171 | 172 | 5203 | COUNSELING PSYCHOLOGY | 4626.0 | 931.0 | 3695.0 | Psychology & Social Work | 0.798746 | 21 | 3777 | ... | 965 | 2738 | 214 | 0.053621 | 23400 | 19200 | 26000 | 2403 | 1245 | 308 |
172 | 173 | 3501 | LIBRARY SCIENCE | 1098.0 | 134.0 | 964.0 | Education | 0.877960 | 2 | 742 | ... | 237 | 410 | 87 | 0.104946 | 22000 | 20000 | 22000 | 288 | 338 | 192 |
173 rows × 21 columns
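As a quick sanity check on the column definitions, ShareWomen should equal Women divided by Total. The sketch below rebuilds three rows by hand from the table above and compares the recomputed share with the stored one. The `sample` frame is only a stand-in for the full `recent_grads` data, so that the block runs without the CSV file.

```python
import pandas as pd

# Three rows copied from the table above; a stand-in for the full dataset.
sample = pd.DataFrame({
    "Major": ["PETROLEUM ENGINEERING", "CHEMICAL ENGINEERING", "LIBRARY SCIENCE"],
    "Total": [2339.0, 32260.0, 1098.0],
    "Women": [282.0, 11021.0, 964.0],
    "ShareWomen": [0.120564, 0.341631, 0.877960],
})

# Recompute the share of women and compare it with the stored column.
recomputed = sample["Women"] / sample["Total"]
max_abs_diff = (recomputed - sample["ShareWomen"]).abs().max()

# The stored values are rounded to six decimals, so allow a small tolerance.
print(max_abs_diff < 1e-6)
```

On the full dataset the same comparison could be run on `recent_grads` directly, after dropping rows with missing values.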
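raw_data_count = recent_grads.shape[0]  # 173 rows before cleaning
recent_grads = recent_grads.dropna()  # drop rows that contain missing values
cleaned_data_count = recent_grads.shape[0]  # 172 rows after cleaning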
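To see which rows `dropna()` removes, it can help to count missing values per column first. The following is a minimal sketch on a three-row stand-in frame (the name `sample` is ours), with values copied from the table above, where FOOD SCIENCE is the row that shows NaN values.

```python
import pandas as pd

nan = float("nan")

# Stand-in frame with three rows copied from the table above.
sample = pd.DataFrame({
    "Major": ["COMPUTER SCIENCE", "FOOD SCIENCE", "ELECTRICAL ENGINEERING TECHNOLOGY"],
    "Total": [128319.0, nan, 11565.0],
    "Men": [99743.0, nan, 8181.0],
    "Women": [28576.0, nan, 3384.0],
    "Median": [53000, 53000, 52000],
})

missing_per_column = sample.isna().sum()       # number of NaN values in each column
rows_before = len(sample)
rows_after = len(sample.dropna())              # rows that have no missing values
# Majors that would be removed by dropna().
dropped = sample.loc[sample.isna().any(axis=1), "Major"].tolist()

print(missing_per_column)
print(rows_before, rows_after, dropped)
```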
Generate scatter plots in separate Jupyter notebook cells to explore the following relationships:
# Sample_size and Median
ax1 = recent_grads.plot(x='Sample_size', y='Median', kind='scatter')
ax1.set_title('Sample_size and Median')
ax1.set_xlim(0,3000)
# Sample_size and Unemployment_rate
ax2 = recent_grads.plot(x='Sample_size', y='Unemployment_rate', kind='scatter')
ax2.set_title('Sample_size and Unemployment_rate')
ax2.set_xlim(0,3000)
# Full_time and Median
ax3 = recent_grads.plot(x='Full_time', y='Median', kind='scatter')
ax3.set_title('Full_time and Median')
ax3.set_xlim(0,50000)
ax3.set_ylim(20000,80000)
# ShareWomen and Unemployment_rate
ax4 = recent_grads.plot(x='ShareWomen', y='Unemployment_rate', kind='scatter')
ax4.set_title('ShareWomen and Unemployment_rate')
# Men and Median
ax5 = recent_grads.plot(x='Men', y='Median', kind='scatter')
ax5.set_title('Men and Median')
ax5.set_xlim(0,150000)
# Women and Median
ax6 = recent_grads.plot(x='Women', y='Median', kind='scatter')
ax6.set_title('Women and Median')
ax6.set_xlim(0,150000)
# Total and Median
ax7 = recent_grads.plot(x='Total', y='Median', kind='scatter')
ax7.set_xlim(0,50000)
ax7.set_ylim(20000,80000)
The scatter plot above shows the relationship between Median (median salary of full-time, year-round workers) and Total (total number of people with the major). We can observe that as Median increases, Total tends to go down. So, according to this scatter plot, students in more popular majors do not make more money.
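A visual impression like this can be backed up with a correlation coefficient. The sketch below uses `Series.corr`, which computes the Pearson coefficient by default, on six rows copied from the table above. This is not the full dataset, so the resulting number is only illustrative; on the real data the same call would be made on `recent_grads['Total']` and `recent_grads['Median']`.

```python
import pandas as pd

# Six rows copied from the table above; NOT the full dataset.
sample = pd.DataFrame({
    "Total": [2339.0, 32260.0, 128319.0, 393735.0, 60633.0, 1098.0],
    "Median": [110000, 65000, 53000, 31500, 31000, 22000],
})

# Pearson correlation coefficient between the two columns (between -1 and 1).
r = sample["Total"].corr(sample["Median"])
print(round(r, 2))
```

A value close to zero would indicate a weak linear link, a value close to -1 a strong negative one.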
# ShareWomen and Median
ax8 = recent_grads.plot(x='ShareWomen', y='Median', kind='scatter')
ax8.set_xlim(0,1)
ax8.set_ylim(20000,80000)
We can observe a negative correlation between Median (median salary of full-time, year-round workers) and ShareWomen (women as share of total): as Median goes up, ShareWomen goes down. Students who majored in subjects that were majority female tend to make less money.
The ax3 scatter plot shows the relationship between Median (median salary of full-time, year-round workers) and Full_time (number employed 35 hours or more). We can see a slight negative correlation: as Median goes up, Full_time tends to go down, although there are some outliers. The link between the two is therefore moderate at most.
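Because the scatter plots contain outliers, a rank-based (Spearman) correlation is one way to summarize these relationships without letting a few very large majors dominate. The sketch below computes it as the Pearson correlation of the ranks, on six rows copied from the table above. Full_time is left out because the displayed table elides that column, and the small sample makes the numbers illustrative only.

```python
import pandas as pd

# Six rows copied from the table above; NOT the full dataset.
sample = pd.DataFrame({
    "Total": [2339.0, 32260.0, 128319.0, 393735.0, 60633.0, 1098.0],
    "ShareWomen": [0.120564, 0.341631, 0.222695, 0.779933, 0.506721, 0.877960],
    "Median": [110000, 65000, 53000, 31500, 31000, 22000],
})

# Spearman correlation is the Pearson correlation computed on the ranks.
spearman = sample.rank().corr(method="pearson")

# Correlation of every column with Median.
print(spearman["Median"].round(2))
```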
Generate histograms in separate Jupyter notebook cells to explore the distributions of the following columns:
# Sample_size
recent_grads['Sample_size'].hist(bins=25, range=(0,2500))
# Median
recent_grads['Median'].hist(bins=20, range=(20000,80000))
# Employed
recent_grads['Employed'].hist(bins=25, range=(0,60000))
# Full_time
recent_grads['Full_time'].hist(bins=25, range=(0,50000))
# ShareWomen
recent_grads['ShareWomen'].hist(bins=10, range=(0,1))
# Unemployment_rate
recent_grads['Unemployment_rate'].hist(bins=25, range=(0,0.19))
# Men
recent_grads['Men'].hist(bins=30, range=(0,40000))
# Women
recent_grads['Women'].hist(bins=30, range=(0,40000))
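Looking at the ShareWomen histogram, which shows the distribution of the share of women, we can see that the highest concentration of majors is in the range between 60% and 80%. The histogram alone does not tell us how many majors are predominantly female, but the median ShareWomen of about 0.53 in the summary table above indicates that at least half of the majors are predominantly female, and the rest are predominantly male.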
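The number of predominantly female majors can also be counted directly instead of being read off the histogram. A minimal sketch: the `share_women` Series below holds ten values copied from the first and last five rows of the table above, as a stand-in for `recent_grads['ShareWomen']`, so the resulting fraction describes only this small sample.

```python
import pandas as pd

# ShareWomen for the first five and last five rows of the table above.
share_women = pd.Series(
    [0.120564, 0.101852, 0.153037, 0.107313, 0.341631,
     0.637293, 0.817099, 0.799859, 0.798746, 0.877960],
    name="ShareWomen",
)

# A major counts as predominantly female when more than half its graduates are women.
predominantly_female = share_women > 0.5
count_female = int(predominantly_female.sum())
fraction_female = predominantly_female.mean()   # mean of booleans = fraction of True

print(count_female, fraction_female)
```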
If we take a look at the Median histogram, we can see that the most common median salary range is between 30,000 and 40,000.
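The bin counts behind a histogram can be inspected numerically with `pd.cut`. The sketch below uses twelve Median values copied from rows of the table above as a stand-in for the full column, and 10,000-wide bins rather than the 3,000-wide bins of the histogram, so the counts are illustrative only.

```python
import pandas as pd

# Twelve Median values copied from the table above; NOT the full column.
medians = pd.Series(
    [110000, 65000, 53000, 32000, 32000, 31500,
     31000, 31000, 31000, 26000, 25000, 22000],
    name="Median",
)

# Left-closed bins: [20000, 30000), [30000, 40000), ..., [110000, 120000).
edges = list(range(20000, 120001, 10000))
binned = pd.cut(medians, bins=edges, right=False)

counts = binned.value_counts(sort=False)   # one count per bin, in bin order
most_common = counts.idxmax()              # the interval holding the most values

print(counts)
print(most_common)
```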
from pandas.plotting import scatter_matrix
scatter_matrix(recent_grads[['Sample_size','Median']], figsize=(9,9))
scatter_matrix(recent_grads[['Sample_size','Median', 'Unemployment_rate']], figsize=(11,11))
scatter_matrix(recent_grads[['Total','Median']], figsize=(10,10))
The scatter matrix above shows the negative correlation between Median and Total: as one goes down, the other goes up. So students in popular majors do not make more money.
scatter_matrix(recent_grads[['ShareWomen','Median']], figsize=(10,10))
The scatter matrix above shows the negative correlation between Median and ShareWomen: as one goes down, the other goes up. So students who majored in subjects that were majority female do not make more money.
scatter_matrix(recent_grads[['Full_time','Median']], figsize=(10,10))
There is a slight negative correlation: as the Median goes up, the Full_time goes down, but there are some outliers as well.
recent_grads[:10].plot.bar(x='Major', y='ShareWomen')
recent_grads[-10:].plot.bar(x='Major', y='ShareWomen')
The share of women in the first ten rows, which are mainly engineering majors, is relatively low, with the highest value at about 0.54. In the last ten rows, which represent mostly social sciences and arts majors, the share of women is considerably higher, ranging from about 0.63 to 0.97.
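The ranges read off the two bar plots can be checked numerically. The sketch below copies the ShareWomen values of the first ten and last ten rows from the table above into two Series (the names `first_ten` and `last_ten` are ours) and reports their minimum and maximum. With the full data, `recent_grads['ShareWomen'].head(10)` and `.tail(10)` would take their place.

```python
import pandas as pd

# ShareWomen for rows 0-9 of the table above.
first_ten = pd.Series([0.120564, 0.101852, 0.153037, 0.107313, 0.341631,
                       0.144967, 0.441356, 0.535714, 0.119559, 0.196450])
# ShareWomen for rows 163-172 of the table above.
last_ten = pd.Series([0.967998, 0.968954, 0.690111, 0.666119, 0.629505,
                      0.637293, 0.817099, 0.799859, 0.798746, 0.877960])

# Minimum and maximum share of women in each group.
first_range = first_ten.agg(["min", "max"])
last_range = last_ten.agg(["min", "max"])

print(first_range)
print(last_range)
```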
recent_grads[:10].plot.bar(x='Major', y='Unemployment_rate')
recent_grads[-10:].plot.bar(x='Major', y='Unemployment_rate')
The bar plots above show the Unemployment_rate for the first 10 and the last 10 majors in the dataset. The first 10 majors are mostly associated with engineering and physics, whereas the last 10 are majors in social sciences, education, and arts.

- Petroleum Engineering, Metallurgical Engineering, and Astronomy and Astrophysics graduates find jobs most easily, since their unemployment rates are the lowest among these majors (up to 3%).
- Nuclear Engineering and Clinical Psychology majors are hired the least, since their unemployment rates are the highest among these majors (up to 18%).

Use a grouped bar plot to compare the number of men with the number of women in each category of majors:
df = pd.DataFrame(recent_grads,columns=["Major_category","Men","Women"])
df.head(10).plot.bar(x = 'Major_category')
df.tail(10).plot.bar(x = 'Major_category')
We see the same pattern as before: male students dominate the Engineering major category, whereas female students prevail in major categories like Health, Education, and Arts.
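The bar plots above show individual majors labeled by their category. To compare men and women per category, one option is to aggregate with `groupby` first. The sketch below does this on six rows copied from the table above. The `sample` frame is a stand-in, so the totals cover only those six majors and not the whole categories.

```python
import pandas as pd

# Six majors copied from the table above (two per category); NOT the full dataset.
sample = pd.DataFrame({
    "Major_category": ["Engineering", "Engineering", "Education",
                       "Education", "Arts", "Arts"],
    "Men": [2057.0, 21239.0, 1167.0, 134.0, 29909.0, 24786.0],
    "Women": [282.0, 11021.0, 36422.0, 964.0, 30724.0, 49654.0],
})

# Sum the number of men and women within each major category.
by_category = sample.groupby("Major_category")[["Men", "Women"]].sum()
print(by_category)

# In the notebook, by_category.plot.bar() would then draw one pair of bars
# (Men, Women) per category.
```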
Use a box plot to explore the distributions of median salaries and unemployment rate:
fig, ax = plt.subplots()
num_cols = ['Median', 'Unemployment_rate']
ax.boxplot(recent_grads[num_cols].values)
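The box plot above represents the distributions of median salaries and unemployment rates. Because both columns share one axis, and Median is in the tens of thousands while Unemployment_rate stays below 0.18, the two boxes are on very different scales and the Unemployment_rate box is squeezed close to zero.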
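One way to make both distributions readable is to give each column its own axis. The sketch below draws the two box plots side by side with `plt.subplots(1, 2)`. It uses six rows copied from the table above as a stand-in for `recent_grads`, and the figure size is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Six rows copied from the table above; NOT the full dataset.
sample = pd.DataFrame({
    "Median": [110000, 65000, 53000, 31500, 31000, 22000],
    "Unemployment_rate": [0.018381, 0.061098, 0.063173,
                          0.083811, 0.075960, 0.104946],
})

num_cols = ["Median", "Unemployment_rate"]

# One subplot per column, so each box plot gets its own y-axis scale.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, col in zip(axes, num_cols):
    ax.boxplot(sample[col].values)
    ax.set_title(col)
    ax.set_xticks([])   # a single box needs no x tick labels
fig.tight_layout()
```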
Use a hexagonal bin plot to visualize the columns that had dense scatter plots from earlier in the project:
# Total Vs Median
recent_grads.plot.hexbin(x='Total', y='Median', gridsize=25)
# ShareWomen Vs Median
recent_grads.plot.hexbin(x='ShareWomen', y='Median', gridsize=25)
# Full_time Vs Median
recent_grads.plot.hexbin(x='Full_time', y='Median', gridsize=25)
We visualized the salaries of graduates based on college majors with the help of scatter plots, bar plots, histograms, and scatter matrix plots in pandas. Different visualization methods led us to similar conclusions: