Suppose we are data analysts for an online e-learning company that specializes in programming courses. We cover domains such as data science and game development, but our primary focus is web and mobile development. Our goal is to promote our products and invest more money in advertising, but to do that we need to know which markets to advertise in. We used three surveys related to programming and web development from FreeCodeCamp and Stack Overflow.
These surveys were conducted online with participants worldwide in 2016, 2017, and 2018. FreeCodeCamp's surveys targeted new programmers and asked about career interests, income expectations, age, gender, home country, time spent programming, and so on. Stack Overflow's 2018 survey was aimed primarily at people already in the developer community and covered topics from favorite technologies to job preferences.
We discovered that new programmers are interested in a wide variety of career fields, including web development, data science, data engineering, game development, QA engineering, and machine learning. We found that the likely motivation for their programming journey was to improve their income and career opportunities. With this knowledge, we need to ensure that our courses stay up to date, relevant, and beneficial for our customers.
Most importantly, after exploring the surveys we found that the two most promising countries for our advertising are the United States and India. First, these two countries had the highest numbers of survey participants, which suggests that new programmers are most numerous there. Second, the US has the highest average monthly spending on programming education. India's average is lower, but it is still around the same amount as our monthly subscription ($59 per month).
In short, the two best markets for advertising are the United States and India, and we recommend that the marketing team focus its efforts on these two countries.
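The comparison behind this recommendation can be sketched on a toy table. The column names `CountryLive`, `MoneyForLearning`, and `MonthsProgramming` come from the FreeCodeCamp survey, but every row below is invented for illustration. Dividing total money spent by months of programming is one plausible way to estimate monthly spending, and it is an assumption here, as is treating zero months as one month.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names match the survey
toy = pd.DataFrame({
    "CountryLive": ["United States of America", "United States of America",
                    "India", "India", "Canada"],
    "MoneyForLearning": [1000.0, 200.0, 300.0, 120.0, 150.0],
    "MonthsProgramming": [5.0, 0.0, 5.0, 2.0, 6.0],
})

# Treat 0 months as 1 month to avoid dividing by zero (an assumption)
months = toy["MonthsProgramming"].replace(0, 1)
toy["MoneyPerMonth"] = toy["MoneyForLearning"] / months

# Number of respondents per country
respondents = toy["CountryLive"].value_counts()

# Average monthly spending per country
avg_spend = toy.groupby("CountryLive")["MoneyPerMonth"].mean()

# Keep countries whose average spending covers the $59 monthly subscription
subscription = 59
affordable = avg_spend[avg_spend >= subscription]
print(respondents)
print(affordable)
```

A country is a stronger candidate when it ranks high on both measures: many respondents and an average spend at or above the subscription price.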
We want to answer questions about a population of new coders that are interested in the subjects we teach. We'd like to know:
FreeCodeCamp Survey: https://www.freecodecamp.org/news/we-asked-20-000-people-who-they-are-and-how-theyre-learning-to-code-fff5d668969
GitHub repositories:
Survey Year 2017: https://github.com/freeCodeCamp/2017-new-coder-survey/tree/master/clean-data
Survey Year 2016: https://github.com/freeCodeCamp/2016-new-coder-survey#about-the-data
Stack Overflow Survey: https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2018-developer-survey
Some limitations for analyzing survey data:
Method
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
#style.use("fivethirtyeight")
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
#pd.options.display.float_format = '{:20,.2f}'.format
pd.options.display.max_columns = 150 # to avoid truncated output
# Freecodecamp survey 2017
csv = pd.read_csv("2017-fCC-New-Coders-Survey-Data.csv", low_memory= False)
# Freecodecamp survey 2016
csv2016 = pd.read_csv("2016-fCC-New-Coders-Survey-Data.csv", low_memory = False)
# Stack Overflow 2018 developer survey
exchange = pd.read_csv("survey_results_public.csv", low_memory= False)
csv.head()
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventConferences | CodeEventDjangoGirls | CodeEventFCC | CodeEventGameJam | CodeEventGirlDev | CodeEventHackathons | CodeEventMeetup | CodeEventNodeSchool | CodeEventNone | CodeEventOther | CodeEventRailsBridge | CodeEventRailsGirls | CodeEventStartUpWknd | CodeEventWkdBootcamps | CodeEventWomenCode | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | NetworkID | Part1EndTime | Part1StartTime | Part2EndTime | Part2StartTime | PodcastChangeLog | PodcastCodeNewbie | PodcastCodePen | PodcastDevTea | PodcastDotNET | PodcastGiantRobots | PodcastJSAir | PodcastJSJabber | PodcastNone | PodcastOther | PodcastProgThrowdown | PodcastRubyRogues | PodcastSEDaily | PodcastSERadio | PodcastShopTalk | PodcastTalkPython | PodcastTheWebAhead | ResourceCodecademy | ResourceCodeWars | ResourceCoursera | ResourceCSS | ResourceEdX | ResourceEgghead | ResourceFCC | ResourceHackerRank | ResourceKA | ResourceLynda | ResourceMDN | ResourceOdinProj | ResourceOther | ResourcePluralSight | ResourceSkillcrush | ResourceSO | ResourceTreehouse | ResourceUdacity | ResourceUdemy | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | YouTubeCodeCourse | YouTubeCodingTrain | YouTubeCodingTut360 | YouTubeComputerphile | YouTubeDerekBanas | YouTubeDevTips | YouTubeEngineeredTruth | YouTubeFCC | YouTubeFunFunFunction | YouTubeGoogleDev | YouTubeLearnCode | YouTubeLevelUpTuts | YouTubeMIT | YouTubeMozillaHacks | YouTubeOther | YouTubeSimplilearn | YouTubeTheNewBoston | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | Canada | Canada | software development and IT | NaN | Employed for wages | NaN | NaN | NaN | NaN | female | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 15.0 | 02d9465b21e8bd09374b0066fb2d5614 | eb78c1c3ac6cd9052aec557065070fbf | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | start your own business | NaN | NaN | NaN | English | married or domestic partnership | 150.0 | 6.0 | 6f1fbc6b2b | 2017-03-09 00:36:22 | 2017-03-09 00:32:59 | 2017-03-09 00:59:46 | 2017-03-09 00:36:26 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 34.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | less than 100,000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 35000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 10.0 | 5bfef9ecb211ec4f518cfc1d2a6f3e0c | 21db37adb60cdcafadfa7dca1b13b6b1 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a nonprofit | 1.0 | Full-Stack Web Developer | in an office with other developers | English | single, never married | 80.0 | 6.0 | f8f8be6910 | 2017-03-09 00:37:07 | 2017-03-09 00:33:26 | 2017-03-09 00:38:59 | 2017-03-09 00:37:10 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | 1.0 | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 21.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15 to 29 minutes | United States of America | United States of America | software development and IT | NaN | Employed for wages | NaN | 70000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 25.0 | 14f1863afa9c7de488050b82eb3edd96 | 21ba173828fbe9e27ccebaf4d5166a55 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | work for a medium-sized company | 1.0 | Front-End Web Developer, Back-End Web Develo... | no preference | Spanish | single, never married | 1000.0 | 5.0 | 2ed189768e | 2017-03-09 00:37:58 | 2017-03-09 00:33:53 | 2017-03-09 00:40:14 | 2017-03-09 00:38:02 | 1.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | Codenewbie | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | high school diploma or equivalent (GED) | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN |
3 | 26.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | I work from home | Brazil | Brazil | software development and IT | NaN | Employed for wages | NaN | 40000.0 | 0.0 | NaN | male | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 40000.0 | 14.0 | 91756eb4dc280062a541c25a3d44cfb0 | 3be37b558f02daae93a6da10f83f0c77 | 24000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a medium-sized company | NaN | Front-End Web Developer, Full-Stack Web Deve... | from home | Portuguese | married or domestic partnership | 0.0 | 5.0 | dbdc0664d1 | 2017-03-09 00:40:13 | 2017-03-09 00:37:45 | 2017-03-09 00:42:26 | 2017-03-09 00:40:18 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | some college credit, no degree | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN |
4 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Portugal | Portugal | NaN | NaN | Not working but looking for work | NaN | 140000.0 | NaN | NaN | female | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 10.0 | aa3f061a1949a90b27bef7411ecd193f | d7c56bbf2c7b62096be9db010e86d96d | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | work for a multinational corporation | 1.0 | Full-Stack Web Developer, Information Security... | in an office with other developers | Portuguese | single, never married | 0.0 | 24.0 | 11b0f2d8a9 | 2017-03-09 00:42:45 | 2017-03-09 00:39:44 | 2017-03-09 00:45:42 | 2017-03-09 00:42:50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | bachelor's degree | Information Technology | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
exchange.head()
Respondent | Hobby | OpenSource | Country | Student | Employment | FormalEducation | UndergradMajor | CompanySize | DevType | YearsCoding | YearsCodingProf | JobSatisfaction | CareerSatisfaction | HopeFiveYears | JobSearchStatus | LastNewJob | AssessJob1 | AssessJob2 | AssessJob3 | AssessJob4 | AssessJob5 | AssessJob6 | AssessJob7 | AssessJob8 | AssessJob9 | AssessJob10 | AssessBenefits1 | AssessBenefits2 | AssessBenefits3 | AssessBenefits4 | AssessBenefits5 | AssessBenefits6 | AssessBenefits7 | AssessBenefits8 | AssessBenefits9 | AssessBenefits10 | AssessBenefits11 | JobContactPriorities1 | JobContactPriorities2 | JobContactPriorities3 | JobContactPriorities4 | JobContactPriorities5 | JobEmailPriorities1 | JobEmailPriorities2 | JobEmailPriorities3 | JobEmailPriorities4 | JobEmailPriorities5 | JobEmailPriorities6 | JobEmailPriorities7 | UpdateCV | Currency | Salary | SalaryType | ConvertedSalary | CurrencySymbol | CommunicationTools | TimeFullyProductive | EducationTypes | SelfTaughtTypes | TimeAfterBootcamp | HackathonReasons | AgreeDisagree1 | AgreeDisagree2 | AgreeDisagree3 | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | FrameworkWorkedWith | FrameworkDesireNextYear | IDE | OperatingSystem | NumberMonitors | Methodology | VersionControl | CheckInCode | AdBlocker | AdBlockerDisable | AdBlockerReasons | AdsAgreeDisagree1 | AdsAgreeDisagree2 | AdsAgreeDisagree3 | AdsActions | AdsPriorities1 | AdsPriorities2 | AdsPriorities3 | AdsPriorities4 | AdsPriorities5 | AdsPriorities6 | AdsPriorities7 | AIDangerous | AIInteresting | AIResponsible | AIFuture | EthicsChoice | EthicsReport | EthicsResponsible | EthicalImplications | StackOverflowRecommend | StackOverflowVisit | StackOverflowHasAccount | StackOverflowParticipate | StackOverflowJobs | StackOverflowDevStory | StackOverflowJobsRecommend | StackOverflowConsiderMember | HypotheticalTools1 | HypotheticalTools2 | HypotheticalTools3 | HypotheticalTools4 | HypotheticalTools5 | WakeTime | HoursComputer | HoursOutside | SkipMeals | ErgonomicDevices | Exercise | Gender | SexualOrientation | EducationParents | RaceEthnicity | Age | Dependents | MilitaryUS | SurveyTooLong | SurveyEasy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Yes | No | Kenya | No | Employed part-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Mathematics or statistics | 20 to 99 employees | Full-stack developer | 3-5 years | 3-5 years | Extremely satisfied | Extremely satisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | 10.0 | 7.0 | 8.0 | 1.0 | 2.0 | 5.0 | 3.0 | 4.0 | 9.0 | 6.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | 1.0 | 4.0 | 2.0 | 5.0 | 5.0 | 6.0 | 7.0 | 2.0 | 1.0 | 4.0 | 3.0 | My job status or other personal status changed | NaN | NaN | Monthly | NaN | KES | Slack | One to three months | Taught yourself a new language, framework, or ... | The official documentation and/or standards fo... | NaN | To build my professional network | Strongly agree | Strongly agree | Neither Agree nor Disagree | JavaScript;Python;HTML;CSS | JavaScript;Python;HTML;CSS | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... | AWS;Azure;Linux;Firebase | AWS;Azure;Linux;Firebase | Django;React | Django;React | Komodo;Vim;Visual Studio Code | Linux-based | 1 | Agile;Scrum | Git | Multiple times per day | Yes | No | NaN | Strongly agree | Strongly agree | Strongly agree | Saw an online advertisement and then researche... | 1.0 | 5.0 | 4.0 | 7.0 | 2.0 | 6.0 | 3.0 | Artificial intelligence surpassing human intel... | Algorithms making important decisions | The developers or the people creating the AI | I'm excited about the possibilities more than ... | No | Yes, and publicly | Upper management at the company/organization | Yes | 10 (Very Likely) | Multiple times per day | Yes | I have never participated in Q&A on Stack Over... | No, I knew that Stack Overflow had a jobs boar... | Yes | NaN | Yes | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Between 5:00 - 6:00 AM | 9 - 12 hours | 1 - 2 hours | Never | Standing desk | 3 - 4 times per week | Male | Straight or heterosexual | Bachelor’s degree (BA, BS, B.Eng., etc.) | Black or of African descent | 25 - 34 years old | Yes | NaN | The survey was an appropriate length | Very easy |
1 | 3 | Yes | Yes | United Kingdom | No | Employed full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | A natural science (ex. biology, chemistry, phy... | 10,000 or more employees | Database administrator;DevOps specialist;Full-... | 30 or more years | 18-20 years | Moderately dissatisfied | Neither satisfied nor dissatisfied | Working in a different or more specialized tec... | I am actively looking for a job | More than 4 years ago | 1.0 | 7.0 | 10.0 | 8.0 | 2.0 | 5.0 | 4.0 | 3.0 | 6.0 | 9.0 | 1.0 | 5.0 | 3.0 | 7.0 | 10.0 | 4.0 | 11.0 | 9.0 | 6.0 | 2.0 | 8.0 | 3.0 | 1.0 | 5.0 | 2.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 2.0 | 6.0 | 7.0 | I saw an employer’s advertisement | British pounds sterling (£) | 51000 | Yearly | 70841.0 | GBP | Confluence;Office / productivity suite (Micros... | One to three months | Taught yourself a new language, framework, or ... | The official documentation and/or standards fo... | NaN | NaN | Agree | Agree | Neither Agree nor Disagree | JavaScript;Python;Bash/Shell | Go;Python | Redis;PostgreSQL;Memcached | PostgreSQL | Linux | Linux | Django | React | IPython / Jupyter;Sublime Text;Vim | Linux-based | 2 | NaN | Git;Subversion | A few times per week | Yes | Yes | The website I was visiting asked me to disable it | Somewhat agree | Neither agree nor disagree | Neither agree nor disagree | NaN | 3.0 | 5.0 | 1.0 | 4.0 | 6.0 | 7.0 | 2.0 | Increasing automation of jobs | Increasing automation of jobs | The developers or the people creating the AI | I'm excited about the possibilities more than ... | Depends on what it is | Depends on what it is | Upper management at the company/organization | Yes | 10 (Very Likely) | A few times per month or weekly | Yes | A few times per month or weekly | Yes | No, I have one but it's out of date | 7 | Yes | A little bit interested | A little bit interested | A little bit interested | A little bit interested | A little bit interested | Between 6:01 - 7:00 AM | 5 - 8 hours | 30 - 59 minutes | Never | Ergonomic keyboard or mouse | Daily or almost every day | Male | Straight or heterosexual | Bachelor’s degree (BA, BS, B.Eng., etc.) | White or of European descent | 35 - 44 years old | Yes | NaN | The survey was an appropriate length | Somewhat easy |
2 | 4 | Yes | Yes | United States | No | Employed full-time | Associate degree | Computer science, computer engineering, or sof... | 20 to 99 employees | Engineering manager;Full-stack developer | 24-26 years | 6-8 years | Moderately satisfied | Moderately satisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 5 | No | No | United States | No | Employed full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | 100 to 499 employees | Full-stack developer | 18-20 years | 12-14 years | Neither satisfied nor dissatisfied | Slightly dissatisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A recruiter contacted me | U.S. dollars ($) | NaN | NaN | NaN | NaN | NaN | Three to six months | Completed an industry certification program (e... | The official documentation and/or standards fo... | NaN | NaN | Disagree | Disagree | Strongly disagree | C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell | C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell | SQL Server;Microsoft Azure (Tables, CosmosDB, ... | SQL Server;Microsoft Azure (Tables, CosmosDB, ... | Azure | Azure | NaN | Angular;.NET Core;React | Visual Studio;Visual Studio Code | Windows | 2 | Agile;Kanban;Scrum | Git | Multiple times per day | Yes | Yes | The ad-blocking software was causing display i... | Neither agree nor disagree | Somewhat agree | Somewhat agree | Stopped going to a website because of their ad... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Artificial intelligence surpassing human intel... | Artificial intelligence surpassing human intel... | A governmental or other regulatory body | I don't care about it, or I haven't thought ab... | No | Yes, but only within the company | Upper management at the company/organization | Yes | 10 (Very Likely) | A few times per week | Yes | A few times per month or weekly | Yes | No, I have one but it's out of date | 8 | Yes | Somewhat interested | Somewhat interested | Somewhat interested | Somewhat interested | Somewhat interested | Between 6:01 - 7:00 AM | 9 - 12 hours | Less than 30 minutes | 3 - 4 times per week | NaN | I don't typically exercise | Male | Straight or heterosexual | Some college/university study without earning ... | White or of European descent | 35 - 44 years old | No | No | The survey was an appropriate length | Somewhat easy |
4 | 7 | Yes | No | South Africa | Yes, part-time | Employed full-time | Some college/university study without earning ... | Computer science, computer engineering, or sof... | 10,000 or more employees | Data or business analyst;Desktop or enterprise... | 6-8 years | 0-2 years | Slightly satisfied | Moderately satisfied | Working in a different or more specialized tec... | I’m not actively looking, but I am open to new... | Between 1 and 2 years ago | 8.0 | 5.0 | 7.0 | 1.0 | 2.0 | 6.0 | 4.0 | 3.0 | 10.0 | 9.0 | 1.0 | 10.0 | 2.0 | 4.0 | 8.0 | 3.0 | 11.0 | 7.0 | 5.0 | 9.0 | 6.0 | 2.0 | 1.0 | 4.0 | 5.0 | 3.0 | 7.0 | 3.0 | 6.0 | 2.0 | 1.0 | 4.0 | 5.0 | My job status or other personal status changed | South African rands (R) | 260000 | Yearly | 21426.0 | ZAR | Office / productivity suite (Microsoft Office,... | Three to six months | Taken a part-time in-person course in programm... | The official documentation and/or standards fo... | NaN | NaN | Strongly agree | Agree | Strongly disagree | C;C++;Java;Matlab;R;SQL;Bash/Shell | Assembly;C;C++;Matlab;SQL;Bash/Shell | SQL Server;PostgreSQL;Oracle;IBM Db2 | PostgreSQL;Oracle;IBM Db2 | Arduino;Windows Desktop or Server | Arduino;Windows Desktop or Server | NaN | NaN | Notepad++;Visual Studio;Visual Studio Code | Windows | 2 | Evidence-based software engineering;Formal sta... | Zip file back-ups | Weekly or a few times per month | No | NaN | NaN | Somewhat agree | Somewhat agree | Somewhat disagree | Clicked on an online advertisement;Saw an onli... | 2.0 | 3.0 | 4.0 | 6.0 | 1.0 | 7.0 | 5.0 | Algorithms making important decisions | Algorithms making important decisions | The developers or the people creating the AI | I'm excited about the possibilities more than ... | No | Yes, but only within the company | Upper management at the company/organization | Yes | 10 (Very Likely) | Daily or almost daily | Yes | Less than once per month or monthly | No, I knew that Stack Overflow had a jobs boar... | No, I know what it is but I don't have one | NaN | Yes | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Before 5:00 AM | Over 12 hours | 1 - 2 hours | Never | NaN | 3 - 4 times per week | Male | Straight or heterosexual | Some college/university study without earning ... | White or of European descent | 18 - 24 years old | Yes | NaN | The survey was an appropriate length | Somewhat easy |
The first step in our analysis is to identify the relevant columns. Unfortunately, there are over 100 columns, which is far too many for a practical analysis.
We identified a few columns for analysis using datapackage.json. This JSON file describes each column for FreeCodeCamp's new coder surveys.
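As a rough sketch of how such a file can be turned into a column lookup, the snippet below parses a small inline excerpt. The `resources`, `schema`, and `fields` layout and the description strings are assumptions made for illustration (they follow the common Data Package convention), so the real datapackage.json may be structured differently.

```python
import json

# Hypothetical excerpt; layout and descriptions are invented for illustration
package_text = """
{
  "resources": [
    {
      "schema": {
        "fields": [
          {"name": "Age", "description": "Respondent's age in years"},
          {"name": "CountryLive", "description": "Country the respondent lives in"}
        ]
      }
    }
  ]
}
"""

package = json.loads(package_text)

# Map each column name to its description
descriptions = {
    field["name"]: field.get("description", "")
    for resource in package["resources"]
    for field in resource["schema"]["fields"]
}

for name, text in descriptions.items():
    print(f"{name}: {text}")
```

With the real file, `json.load` on an open file handle would replace `json.loads(package_text)`.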
# Index location of the first set of columns to drop
print(csv.columns.get_loc("CodeEventConferences"))
print(csv.columns.get_loc("CodeEventWorkshops"))
8 23
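# Drop the columns at positions 8 through 22; the slice end is exclusive,
# so CodeEventWorkshops (position 23) is kept
csv = csv.drop(columns=csv.columns[8:23])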
# Index location of the next set of columns to drop
print(csv.columns.get_loc("NetworkID"))
print(csv.columns.get_loc("ResourceW3S"))
59 100
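# Drop the columns at positions 59 through 99; the slice end is exclusive,
# so ResourceW3S (position 100) is kept
csv = csv.drop(columns=csv.columns[59:100])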
print(csv.columns.get_loc("YouTubeCodeCourse"))
63
# Drop the remaining columns from index position 63 onward
csv = csv.drop(columns=csv.columns[63:])
csv.head()
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | GenderOther | HasChildren | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasHomeMortgage | HasServedInMilitary | HasStudentDebt | HomeMortgageOwe | HoursLearning | ID.x | ID.y | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobInterestBackEnd | JobInterestDataEngr | JobInterestDataSci | JobInterestDevOps | JobInterestFrontEnd | JobInterestFullStack | JobInterestGameDev | JobInterestInfoSec | JobInterestMobile | JobInterestOther | JobInterestProjMngr | JobInterestQAEngr | JobInterestUX | JobPref | JobRelocateYesNo | JobRoleInterest | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | ResourceW3S | SchoolDegree | SchoolMajor | StudentDebtOwe | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | 15 to 29 minutes | Canada | Canada | software development and IT | NaN | Employed for wages | NaN | NaN | NaN | NaN | female | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | NaN | 15.0 | 02d9465b21e8bd09374b0066fb2d5614 | eb78c1c3ac6cd9052aec557065070fbf | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | start your own business | NaN | NaN | NaN | English | married or domestic partnership | 150.0 | 6.0 | 1.0 | some college credit, no degree | NaN | NaN |
1 | 34.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | less than 100,000 | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 35000.0 | NaN | NaN | male | NaN | NaN | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | 10.0 | 5bfef9ecb211ec4f518cfc1d2a6f3e0c | 21db37adb60cdcafadfa7dca1b13b6b1 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | NaN | NaN | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a nonprofit | 1.0 | Full-Stack Web Developer | in an office with other developers | English | single, never married | 80.0 | 6.0 | 1.0 | some college credit, no degree | NaN | NaN |
2 | 21.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | 15 to 29 minutes | United States of America | United States of America | software development and IT | NaN | Employed for wages | NaN | 70000.0 | NaN | NaN | male | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 25.0 | 14f1863afa9c7de488050b82eb3edd96 | 21ba173828fbe9e27ccebaf4d5166a55 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | 1.0 | NaN | NaN | 1.0 | 1.0 | 1.0 | NaN | NaN | 1.0 | NaN | NaN | NaN | NaN | work for a medium-sized company | 1.0 | Front-End Web Developer, Back-End Web Develo... | no preference | Spanish | single, never married | 1000.0 | 5.0 | NaN | high school diploma or equivalent (GED) | NaN | NaN |
3 | 26.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | I work from home | Brazil | Brazil | software development and IT | NaN | Employed for wages | NaN | 40000.0 | 0.0 | NaN | male | NaN | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 40000.0 | 14.0 | 91756eb4dc280062a541c25a3d44cfb0 | 3be37b558f02daae93a6da10f83f0c77 | 24000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | work for a medium-sized company | NaN | Front-End Web Developer, Full-Stack Web Deve... | from home | Portuguese | married or domestic partnership | 0.0 | 5.0 | NaN | some college credit, no degree | NaN | NaN |
4 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | Portugal | Portugal | NaN | NaN | Not working but looking for work | NaN | 140000.0 | NaN | NaN | female | NaN | NaN | 0.0 | 0.0 | 1.0 | NaN | 0.0 | NaN | NaN | 10.0 | aa3f061a1949a90b27bef7411ecd193f | d7c56bbf2c7b62096be9db010e86d96d | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | 1.0 | NaN | NaN | NaN | 1.0 | 1.0 | NaN | 1.0 | 1.0 | NaN | NaN | NaN | NaN | work for a multinational corporation | 1.0 | Full-Stack Web Developer, Information Security... | in an office with other developers | Portuguese | single, never married | 0.0 | 24.0 | NaN | bachelor's degree | Information Technology | NaN |
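Positional slices are end-exclusive, which is why `CodeEventWorkshops` and `ResourceW3S` still appear in the output above. A possible alternative, sketched here on a toy frame with invented values, is to select the unwanted columns by name prefix instead of by position. The prefixes are chosen only for illustration.

```python
import pandas as pd

# Toy frame with a few column names from the survey; values are invented
toy = pd.DataFrame({
    "Age": [27.0, 34.0],
    "CodeEventConferences": [None, 1.0],
    "CodeEventWorkshops": [None, None],
    "PodcastCodeNewbie": [1.0, None],
    "CountryLive": ["Canada", "United States of America"],
})

# Prefixes of the column groups to discard (chosen for illustration)
prefixes = ("CodeEvent", "Podcast")

# str.startswith accepts a tuple and matches any of its entries
to_drop = [col for col in toy.columns if col.startswith(prefixes)]
trimmed = toy.drop(columns=to_drop)
print(list(trimmed.columns))
```

Selecting by name keeps working if the column order changes between survey years, whereas hard-coded positions would silently drop the wrong columns.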
csv.iloc[:,:20]
Age | AttendedBootcamp | BootcampFinish | BootcampLoanYesNo | BootcampName | BootcampRecommend | ChildrenNumber | CityPopulation | CodeEventWorkshops | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentFieldOther | EmploymentStatus | EmploymentStatusOther | ExpectedEarning | FinanciallySupporting | FirstDevJob | Gender | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | 15 to 29 minutes | Canada | Canada | software development and IT | NaN | Employed for wages | NaN | NaN | NaN | NaN | female |
1 | 34.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | less than 100,000 | NaN | NaN | United States of America | United States of America | NaN | NaN | Not working but looking for work | NaN | 35000.0 | NaN | NaN | male |
2 | 21.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | more than 1 million | NaN | 15 to 29 minutes | United States of America | United States of America | software development and IT | NaN | Employed for wages | NaN | 70000.0 | NaN | NaN | male |
3 | 26.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | I work from home | Brazil | Brazil | software development and IT | NaN | Employed for wages | NaN | 40000.0 | 0.0 | NaN | male |
4 | 20.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | between 100,000 and 1 million | NaN | NaN | Portugal | Portugal | NaN | NaN | Not working but looking for work | NaN | 140000.0 | NaN | NaN | female |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18170 | 41.0 | 0.0 | NaN | NaN | NaN | NaN | 1.0 | more than 1 million | NaN | I work from home | Indonesia | Indonesia | software development and IT | NaN | Self-employed freelancer | NaN | NaN | 0.0 | NaN | male |
18171 | 31.0 | 0.0 | NaN | NaN | NaN | NaN | 1.0 | more than 1 million | NaN | Less than 15 minutes | Nigeria | Nigeria | transportation | NaN | Self-employed freelancer | NaN | 70000.0 | 1.0 | NaN | male |
18172 | 39.0 | 0.0 | NaN | NaN | NaN | NaN | 3.0 | more than 1 million | 1.0 | 45 to 60 minutes | South Africa | South Africa | NaN | IT support and website update | Employed for wages | NaN | NaN | 0.0 | 1.0 | male |
18173 | 54.0 | 0.0 | NaN | NaN | NaN | NaN | 3.0 | between 100,000 and 1 million | NaN | Less than 15 minutes | United Kingdom | United Kingdom | education | NaN | Employed for wages | NaN | NaN | 0.0 | NaN | male |
18174 | 50.0 | 0.0 | NaN | NaN | NaN | NaN | 2.0 | less than 100,000 | NaN | 15 to 29 minutes | United Kingdom | United Kingdom | health care | NaN | Employed for wages | NaN | NaN | 0.0 | NaN | male |
18175 rows × 20 columns
The code below shows how much data is missing in each column. Respondents frequently skipped questions during the survey, so many columns have missing values, and it would be difficult to clean the dataset by dropping incomplete rows without removing nearly every row.
# Percentage of missing data per column
series = csv.apply(pd.isnull).sum()/csv.shape[0] * 100
# Columns with less than or equal to 60% missing data points
# (named cols_under_60 to avoid shadowing the built-in `list`)
cols_under_60 = series[series <= 60].index
print(series)
Age 15.449794
AttendedBootcamp 2.563961
BootcampFinish 94.118294
BootcampLoanYesNo 94.063274
BootcampName 94.778542
...
MonthsProgramming 6.002751
ResourceW3S 46.272352
SchoolDegree 15.444292
SchoolMajor 51.983494
StudentDebtOwe 81.502063
Length: 63, dtype: float64
# Converts the columns we want to use from a pandas Index to a list
cols_to_use = cols_under_60.tolist()
# Adds two columns that sit just above the 60% threshold but are needed for the analysis
cols_to_use.extend(["JobRoleInterest", "ExpectedEarning"])
# Isolates the dataframe down to only preferred columns
csv = csv[cols_to_use]
# Drop the ID.x and ID.y identifier columns, plus ResourceW3S
csv = csv.drop(columns=["ID.x","ID.y","ResourceW3S"])
csv
Age | AttendedBootcamp | CityPopulation | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentStatus | Gender | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasServedInMilitary | HoursLearning | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobPref | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | SchoolDegree | SchoolMajor | JobRoleInterest | ExpectedEarning | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 27.0 | 0.0 | more than 1 million | 15 to 29 minutes | Canada | Canada | software development and IT | Employed for wages | female | 1.0 | 0.0 | 1.0 | 0.0 | 15.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | NaN | start your own business | NaN | English | married or domestic partnership | 150.0 | 6.0 | some college credit, no degree | NaN | NaN | NaN |
1 | 34.0 | 0.0 | less than 100,000 | NaN | United States of America | United States of America | NaN | Not working but looking for work | male | 1.0 | 0.0 | 1.0 | 0.0 | 10.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | work for a nonprofit | in an office with other developers | English | single, never married | 80.0 | 6.0 | some college credit, no degree | NaN | Full-Stack Web Developer | 35000.0 |
2 | 21.0 | 0.0 | more than 1 million | 15 to 29 minutes | United States of America | United States of America | software development and IT | Employed for wages | male | 0.0 | 0.0 | 1.0 | 0.0 | 25.0 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | work for a medium-sized company | no preference | Spanish | single, never married | 1000.0 | 5.0 | high school diploma or equivalent (GED) | NaN | Front-End Web Developer, Back-End Web Develo... | 70000.0 |
3 | 26.0 | 0.0 | between 100,000 and 1 million | I work from home | Brazil | Brazil | software development and IT | Employed for wages | male | 1.0 | 1.0 | 1.0 | 0.0 | 14.0 | 24000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | work for a medium-sized company | from home | Portuguese | married or domestic partnership | 0.0 | 5.0 | some college credit, no degree | NaN | Front-End Web Developer, Full-Stack Web Deve... | 40000.0 |
4 | 20.0 | 0.0 | between 100,000 and 1 million | NaN | Portugal | Portugal | NaN | Not working but looking for work | female | 0.0 | 0.0 | 1.0 | 0.0 | 10.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | work for a multinational corporation | in an office with other developers | Portuguese | single, never married | 0.0 | 24.0 | bachelor's degree | Information Technology | Full-Stack Web Developer, Information Security... | 140000.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
18170 | 41.0 | 0.0 | more than 1 million | I work from home | Indonesia | Indonesia | software development and IT | Self-employed freelancer | male | 1.0 | 1.0 | 0.0 | 0.0 | 10.0 | 60000.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | start your own business | NaN | Indonesian | married or domestic partnership | 10.0 | 1.0 | bachelor's degree | Telecommunications Technician | NaN | NaN |
18171 | 31.0 | 0.0 | more than 1 million | Less than 15 minutes | Nigeria | Nigeria | transportation | Self-employed freelancer | male | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 60000.0 | 0.0 | 0.0 | 0.0 | 1.0 | more than 12 months from now | work for a nonprofit | no preference | English | divorced | 10000.0 | 1.0 | high school diploma or equivalent (GED) | NaN | DevOps / SysAdmin, Mobile Developer, Pro... | 70000.0 |
18172 | 39.0 | 0.0 | more than 1 million | 45 to 60 minutes | South Africa | South Africa | NaN | Employed for wages | male | 1.0 | 1.0 | 0.0 | 0.0 | 10.0 | 1000000.0 | 0.0 | 0.0 | 1.0 | 1.0 | NaN | NaN | NaN | Zulu | married or domestic partnership | 19.0 | 3.0 | some high school | NaN | NaN | NaN |
18173 | 54.0 | 0.0 | between 100,000 and 1 million | Less than 15 minutes | United Kingdom | United Kingdom | education | Employed for wages | male | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1000000.0 | 0.0 | 0.0 | 0.0 | 1.0 | NaN | freelance | NaN | English | divorced | 0.0 | 5.0 | trade, technical, or vocational training | NaN | NaN | NaN |
18174 | 50.0 | 0.0 | less than 100,000 | 15 to 29 minutes | United Kingdom | United Kingdom | health care | Employed for wages | male | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 1000000.0 | 0.0 | 0.0 | 0.0 | 1.0 | I haven't decided | work for a government | no preference | English | married or domestic partnership | NaN | 10.0 | bachelor's degree | Computer and Information Studies | Back-End Web Developer, Data Engineer, Data ... | NaN |
18175 rows × 30 columns
# Count missing data
nulls = csv.apply(pd.isnull).sum()/csv.shape[0] * 100
nulls = nulls.sort_values()
nulls
IsSoftwareDev 0.588721
AttendedBootcamp 2.563961
MonthsProgramming 6.002751
HoursLearning 8.038514
MoneyForLearning 8.792297
Gender 14.971114
CountryCitizen 15.367263
HasHighSpdInternet 15.378267
SchoolDegree 15.444292
Age 15.449794
CityPopulation 15.521320
LanguageAtHome 15.576341
CountryLive 15.620358
MaritalStatus 15.625860
HasFinancialDependents 15.658872
IsEthnicMinority 15.856946
HasDebt 15.867950
HasServedInMilitary 16.060523
IsReceiveDisabilitiesBenefits 16.247593
EmploymentStatus 21.072902
JobPref 25.815681
CommuteTime 49.127923
IsUnderEmployed 49.254470
SchoolMajor 51.983494
JobApplyWhen 55.224209
JobWherePref 55.334250
EmploymentField 55.345254
Income 58.057772
ExpectedEarning 60.385144
JobRoleInterest 61.529574
dtype: float64
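As a side note, the same missing-data percentages can be computed a little more directly. The sketch below is only an illustration on a made-up toy frame (the `toy` data, the `threshold` name, and the values are assumptions, not part of the survey): it relies on the fact that the mean of a boolean column is the share of `True` values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are made up for illustration)
toy = pd.DataFrame({
    "Age": [27.0, np.nan, 21.0, 26.0],
    "Income": [np.nan, np.nan, 13000.0, np.nan],
    "Gender": ["female", "male", "male", "male"],
})

# isna() marks missing cells; the column-wise mean of booleans is the missing fraction
missing_pct = toy.isna().mean().mul(100).sort_values()

# Keep only columns whose missing share is at or below a chosen threshold (hypothetical value)
threshold = 60
kept_columns = missing_pct[missing_pct <= threshold].index.tolist()

print(missing_pct)
print(kept_columns)
```

On the real data, `csv.isna().mean() * 100` should give the same numbers as the `apply(pd.isnull).sum() / csv.shape[0] * 100` expression used in this notebook.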
csv.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18175 entries, 0 to 18174
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 15367 non-null float64
1 AttendedBootcamp 17709 non-null float64
2 CityPopulation 15354 non-null object
3 CommuteTime 9246 non-null object
4 CountryCitizen 15382 non-null object
5 CountryLive 15336 non-null object
6 EmploymentField 8116 non-null object
7 EmploymentStatus 14345 non-null object
8 Gender 15454 non-null object
9 HasDebt 15291 non-null float64
10 HasFinancialDependents 15329 non-null float64
11 HasHighSpdInternet 15380 non-null float64
12 HasServedInMilitary 15256 non-null float64
13 HoursLearning 16714 non-null float64
14 Income 7623 non-null float64
15 IsEthnicMinority 15293 non-null float64
16 IsReceiveDisabilitiesBenefits 15222 non-null float64
17 IsSoftwareDev 18068 non-null float64
18 IsUnderEmployed 9223 non-null float64
19 JobApplyWhen 8138 non-null object
20 JobPref 13483 non-null object
21 JobWherePref 8118 non-null object
22 LanguageAtHome 15344 non-null object
23 MaritalStatus 15335 non-null object
24 MoneyForLearning 16577 non-null float64
25 MonthsProgramming 17084 non-null float64
26 SchoolDegree 15368 non-null object
27 SchoolMajor 8727 non-null object
28 JobRoleInterest 6992 non-null object
29 ExpectedEarning 7200 non-null float64
dtypes: float64(15), object(15)
memory usage: 4.2+ MB
# New column to indicate year of survey completion
csv["Year"] = 2017
csv2016["Year"] = 2016
# Columns of interest
column_lists = csv.columns.to_list()
column_lists
# Apply column filtering to survey 2016
survey_2016 = csv2016[column_lists]
# Merge dataframes
combined_survey = pd.concat([csv, survey_2016])
# Merged dataframe length (rows)
print("Number of Rows:")
print(combined_survey.shape[0])
Number of Rows: 33795
JobRoleInterest: "Which one of these careers are you interested in?"
Most of the courses offered on our e-learning platform are for web and mobile development. We need to identify whether the sample from the dataset is representative of the population of new coders. One significant limitation of this survey is the number of rows with missing information for JobRoleInterest. Roughly 6 out of 10 observations do not have a response to this question.
It's strange that so many people who took the survey neglected to answer this question. In addition to this question, perhaps another should have been asked: "What are your goals for learning programming?" or something similar.
After merging both dataframes we ended up with 33,795 rows. For the analysis we're going to remove all observations that did not answer this question. The final dataframe will include only 13,495 rows.
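To make the filtering step concrete, here is a minimal sketch on a made-up toy frame (the `toy_survey` rows are assumptions for illustration only). It shows one way to measure how many respondents skipped the question and to keep only those who answered, and it checks the row counts quoted above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the combined survey (made-up rows, not the real data)
toy_survey = pd.DataFrame({
    "JobRoleInterest": ["Full-Stack Web Developer", np.nan, np.nan, "Game Developer", np.nan],
    "Year": [2017, 2017, 2016, 2016, 2016],
})

# Share of respondents who skipped the question
pct_missing = toy_survey["JobRoleInterest"].isna().mean() * 100

# Keep only rows that contain an answer
answered = toy_survey.dropna(subset=["JobRoleInterest"])

# Row counts reported in the text: 13,495 kept out of 33,795
pct_kept_reported = 13495 / 33795 * 100

print(f"Toy data missing: {pct_missing:.1f}%")
print(f"Reported share of rows kept: {pct_kept_reported:.1f}%")
```

The reported counts work out to roughly 40% of rows kept, which matches the "6 out of 10 missing" estimate.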
Of these observations, career interest leans heavily toward web development (including full stack, front end, and back end web development). Many observations also include multiple categories rather than just one. We can split the string in each row of the JobRoleInterest column. This will help us understand the number of choices that each person selected.
To split each occurrence of a job category for rows containing multiple categories, we'll use pandas.Series.str.split. This approach will help us count every individual job category.
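As a rough illustration of the idea, the sketch below splits a few made-up responses (the `roles` Series is an assumption, not survey data), counts how many options each respondent chose, and then counts each individual role. It uses `Series.explode` rather than the dictionary loop used later in this notebook, so treat it as an alternative sketch rather than the method applied here.

```python
import pandas as pd

# Toy responses standing in for the JobRoleInterest column (made-up rows)
roles = pd.Series([
    "Full-Stack Web Developer",
    "Front-End Web Developer,   Back-End Web Developer",
    "  Mobile Developer, Full-Stack Web Developer",
])

# Split on commas: one list of roles per respondent
split_roles = roles.str.split(",")

# Number of options each respondent selected
n_choices = split_roles.str.len()

# One row per (respondent, role), with stray whitespace trimmed, then counted
role_counts = split_roles.explode().str.strip().value_counts()

print(n_choices.tolist())
print(role_counts)
```

Trimming whitespace before counting matters because splitting on "," leaves a leading space on every role after the first.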
interests = combined_survey["JobRoleInterest"].value_counts(normalize=True) * 100
interests.head(20)
Full-Stack Web Developer 25.150056
Front-End Web Developer 13.553168
Back-End Web Developer 6.268989
Data Scientist / Data Engineer 4.786958
Mobile Developer 3.934791
User Experience Designer 2.423120
DevOps / SysAdmin 1.889589
Product Manager 1.822897
Data Scientist 1.126343
Quality Assurance Engineer 0.881808
Game Developer 0.844757
Information Security 0.681734
Full-Stack Web Developer, Front-End Web Developer 0.474250
Front-End Web Developer, Full-Stack Web Developer 0.414969
Data Engineer 0.392738
User Experience Designer, Front-End Web Developer 0.318637
Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.288996
Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.266765
Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.266765
Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.229715
Name: JobRoleInterest, dtype: float64
# Number of distinct answer combinations in JobRoleInterest
len(interests)
3214
# New dataframe excluding any missing data from JobRoleInterest column
survey = combined_survey[combined_survey["JobRoleInterest"].notnull()].copy()
# Splits each occurrence of a job category
survey["JobRoleInterest"] = survey["JobRoleInterest"].str.split(",")
# Combined dataset (survey) missing values in percentage
(survey.apply(pd.isnull).sum()/survey.shape[0] * 100).sort_values(ascending = False)
EmploymentField 61.645054
Income 59.147833
CommuteTime 53.864394
IsUnderEmployed 53.093738
SchoolMajor 47.143386
MaritalStatus 39.066321
EmploymentStatus 14.071878
ExpectedEarning 10.596517
IsReceiveDisabilitiesBenefits 7.773249
HasServedInMilitary 7.654687
IsEthnicMinority 7.476843
LanguageAtHome 7.476843
HasDebt 7.454613
Age 7.387921
CityPopulation 7.365691
CountryLive 7.321230
HasFinancialDependents 7.306410
SchoolDegree 7.128566
CountryCitizen 7.121156
HasHighSpdInternet 7.054465
Gender 6.595035
MoneyForLearning 6.587625
HoursLearning 5.779918
MonthsProgramming 4.579474
AttendedBootcamp 1.237495
JobPref 0.955910
JobWherePref 0.652093
JobApplyWhen 0.548351
IsSoftwareDev 0.229715
JobRoleInterest 0.000000
Year 0.000000
dtype: float64
# Fill missing data points with the median
survey["ExpectedEarning"] = survey["ExpectedEarning"].fillna(survey["ExpectedEarning"].median())
survey
Age | AttendedBootcamp | CityPopulation | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentStatus | Gender | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasServedInMilitary | HoursLearning | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobPref | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | SchoolDegree | SchoolMajor | JobRoleInterest | ExpectedEarning | Year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 34.0 | 0.0 | less than 100,000 | NaN | United States of America | United States of America | NaN | Not working but looking for work | male | 1.0 | 0.0 | 1.0 | 0.0 | 10.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | work for a nonprofit | in an office with other developers | English | single, never married | 80.0 | 6.0 | some college credit, no degree | NaN | [Full-Stack Web Developer] | 35000.0 | 2017 |
2 | 21.0 | 0.0 | more than 1 million | 15 to 29 minutes | United States of America | United States of America | software development and IT | Employed for wages | male | 0.0 | 0.0 | 1.0 | 0.0 | 25.0 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | work for a medium-sized company | no preference | Spanish | single, never married | 1000.0 | 5.0 | high school diploma or equivalent (GED) | NaN | [ Front-End Web Developer, Back-End Web Deve... | 70000.0 | 2017 |
3 | 26.0 | 0.0 | between 100,000 and 1 million | I work from home | Brazil | Brazil | software development and IT | Employed for wages | male | 1.0 | 1.0 | 1.0 | 0.0 | 14.0 | 24000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | work for a medium-sized company | from home | Portuguese | married or domestic partnership | 0.0 | 5.0 | some college credit, no degree | NaN | [ Front-End Web Developer, Full-Stack Web De... | 40000.0 | 2017 |
4 | 20.0 | 0.0 | between 100,000 and 1 million | NaN | Portugal | Portugal | NaN | Not working but looking for work | female | 0.0 | 0.0 | 1.0 | 0.0 | 10.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | work for a multinational corporation | in an office with other developers | Portuguese | single, never married | 0.0 | 24.0 | bachelor's degree | Information Technology | [Full-Stack Web Developer, Information Securi... | 140000.0 | 2017 |
6 | 29.0 | 0.0 | between 100,000 and 1 million | 30 to 44 minutes | United Kingdom | United Kingdom | NaN | Employed for wages | female | 1.0 | 0.0 | 1.0 | 0.0 | 16.0 | 40000.0 | NaN | 0.0 | 0.0 | 0.0 | I'm already applying | work for a medium-sized company | no preference | English | married or domestic partnership | 0.0 | 12.0 | some college credit, no degree | NaN | [Full-Stack Web Developer] | 30000.0 | 2017 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15585 | 32.0 | 0.0 | more than 1 million | 40.0 | Ukraine | Ukraine | health care | Employed for wages | female | 0.0 | 1.0 | 1.0 | 0.0 | 5.0 | 36000.0 | 1.0 | 0.0 | 0.0 | 1.0 | Within the next 6 months | work for a multinational corporation | in an office with other developers | Russian | married or domestic partnership | 5.0 | 2.0 | bachelor's degree | Linguistics | [ Front-End Web Developer] | 8400.0 | 2016 |
15598 | 51.0 | 0.0 | less than 100,000 | 30.0 | United States of America | United States of America | finance | Employed for wages | male | 1.0 | 1.0 | 1.0 | 1.0 | 30.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | work for a medium-sized company | in an office with other developers | English | married or domestic partnership | 100.0 | 12.0 | professional degree (MBA, MD, JD, etc.) | Investments and Securities | [Full-Stack Web Developer] | 100000.0 | 2016 |
15600 | 38.0 | 0.0 | more than 1 million | 90.0 | United States of America | United States of America | finance | Employed for wages | male | 0.0 | 1.0 | 1.0 | 0.0 | 6.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | work for a startup | no preference | English | married or domestic partnership | 500.0 | 12.0 | bachelor's degree | Finance | [Full-Stack Web Developer] | 150000.0 | 2016 |
15608 | 40.0 | 0.0 | more than 1 million | 60.0 | Australia | Australia | software development and IT | Employed for wages | male | 1.0 | 1.0 | 0.0 | 0.0 | 10.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | work for a multinational corporation | in an office with other developers | English | married or domestic partnership | 0.0 | 2.0 | bachelor's degree | Computer Systems Analysis | [ DevOps / SysAdmin] | 80000.0 | 2016 |
15615 | 28.0 | 0.0 | less than 100,000 | 7.0 | United States of America | United States of America | food and beverage | Employed for wages | male | 1.0 | 1.0 | 1.0 | 0.0 | 20.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 1.0 | I'm already applying | work for a medium-sized company | from home | English | married or domestic partnership | 1400.0 | 7.0 | associate's degree | Computer and Information Systems Security | [Full-Stack Web Developer] | 50000.0 | 2016 |
13495 rows × 31 columns
# Counts each occurrence of a particular category
category_count = dict()
# For loop for counting each individual category in the JobRoleInterest column
for categories in survey["JobRoleInterest"]:
    for category in categories:
        if category in category_count:
            category_count[category] += 1  # increments the count if the key is already present
        else:
            category_count[category] = 1  # adds the category key if not already present
# Transforms dictionary to dataframe
category_count = pd.DataFrame.from_dict(category_count, orient="index", columns= ["Count"])
category_count = category_count.reset_index(level = 0)
category_count = category_count.rename(columns = {"index":"Interests"})
category_count["Interests"].unique()
array(['Full-Stack Web Developer', ' Front-End Web Developer', ' Back-End Web Developer', ' DevOps / SysAdmin', ' Mobile Developer', ' Full-Stack Web Developer', ' Information Security', ' Front-End Web Developer', ' Quality Assurance Engineer', ' Game Developer', ' User Experience Designer', ' DevOps / SysAdmin', ' Data Scientist', ' Data Engineer', 'Back-End Web Developer', 'Information Security', ' Data Scientist', ' Mobile Developer', ' Product Manager', 'Data Engineer', 'Game Developer', ' Product Manager', ' User Experience Designer', ' Quality Assurance Engineer', 'Ethical Hacker', ' security expert', ' Technical Writer', ' Researcher', 'Systems Engineer', 'Desktop Applications Programmer', ' Robotics', 'Non technical ', ' UI Design', 'Software engineer ', 'email coder', ' Data analyst', ' I dont yet know', ' UX developer/designer', ' support scientific resaerch ', ' AI and neuroscience', 'Full Stack Software Engineer', ' Program Manager', ' Application Support Analyst', " This futurist's dream of using some tech in a way that inspires critical amounts of people to influence the changes we need to protect ", ' Information Architect', 'Physicist ', 'Security Business Analyst ', ' Bioinformatics/science ', ' creative coder / generative artist/designer', ' a job in which I can use coding skills to create valuable portals to advance human rights', 'Research ', ' Bitcoin/Crypto', 'Embedded hardware', 'Data/Interactive Journalist', 'Software Engineering', ' Software Engineer', ' Business Analyst', 'Network Engineer', 'Information Developer', 'Java developer', ' Project Management', 'Machine learning engineer', 'Real-time systems', ' Cybersecurity', ' software engineer', 'GIS Developer', 'Research and education', ' System Software', 'Full Stack Developer ', 'AI', ' Bioinformatics ', ' Data Analyst', 'Urban Planner', 'Software Engineer', 'full stack developer', ' SWE', ' Embedded Developer', ' virtual reality developer', ' Journalist/Graphic Designer/Marketing', ' 
Web Designer', 'Computer Architect', ' Networking', 'Software Developer', ' Software Developer', ' Machine Learning Engineer', ' data analyst', ' AI and Machine Learning', ' computer engineer', ' Artificial Intelligence', 'Systems Programming', 'Software Engineer (Computer Science Based)', 'Technology Management', 'full-stack developer', ' Software developer', 'BA or developer', ' User Interface Design', 'System Engineer', 'Network', ' Analyst', ' Machine Learning ', 'Pharmacy tech', 'data journalist / data visualist', 'Desings', ' Infrastructure Architect ', ' Tech art', ' Technology-Business Liaison', ' Product Designer', 'Front-End Web Designer', 'Document Controller', ' Software enginner', ' programmer', 'undeceided', 'Pharmaceutical industry', ' Information Technology', ' Library Developer', ' Desktop Application Developer', ' Machine Learning', ' Operating Systems', ' Compilers', ' etc...', ' GIS Database Admin', ' designer', 'Support Engineer or API Support', ' Software engineer', ' Python Developer', ' Bioinformatics', 'Robotics Process Automation Specialist', 'Data visualisation', ' Desktop applications developer', 'All - whatever is required to develop tools to revolutionize the mechanical engineering process', 'Digital Humanitites', ' User Interface Designer', 'Artificial Intelligence', ' Software Development', 'Programming', 'Web development ', ' Marketing', 'Financial Services', 'software developer', 'Natural Language Processing', ' Entreprenuer / Web Dev Hustler ', ' Machine Learning Engineer ', 'Marketing Automation ', 'AI Developer', ' network admin', 'Front end', ' back end', ' game', ' web', ' mobile developer', 'Not sure!', ' Anything that engages me', "i don't know what the difference is between most of these soz lol", 'Unsure', 'Any of them.', 'Not sure yet', 'Not Sure Yet', 'Not sure', ' i dunno!!!!', ' milatary engineer', ' SEO', 'Software engineer', 'Astrophysicist', ' Journalist', 'philosopher', ' Java developer', 'Desktop Applications', ' 
Programmer', 'IoT Developer', 'Systems Programmer', 'Web Designer', "Don't know yet", ' Artificial intelligence', ' Artificial Intelligence Engineer', 'Developer Evangelist', ' Bioinformatitian', ' IoT', ' Entrepreneur', ' I am interested in Game Development', ' Mobile Development', ' Web Design', ' Front End Web Development', 'programmer', 'Data Reporter', 'Not Sure', 'Web developer', 'User Interface Designer', 'Robotics and AI Engineer', ' Ethical Hacker', ' Artificial Intelligence engineer', ' Scientific Programming', ' Software Developer or Front-End Web Developer', ' UI Designer', ' Campaign Manager', ' AI Engineer', 'Software Specialist ', ' Project Manager', ' Growth Hacker', 'Research', 'idk', ' Founder', 'Software Engineers', 'VR Technology developer', ' developer', ' plc', 'Ceo', ' Tech lobbiest', 'Quant (Algorithmic Trader)', 'Machine learning and AI ', 'Project manager', 'undecided', ' Databases', 'Project Manager', 'Cloud computing ', 'Software Developper', 'College professor', ' System Administrator/Network', ' Software Projects Manager', 'Teacher. Teaching students to code. ', 'Education', 'code developer...in whatever format', ' front-end', ' back-end', ' app dev etc.', 'improving in my current career as a Learning technologist', 'Informatician', ' Artificial Intelligence ', 'lab scientist', 'Data Visualization Specialist', "I don't know yet!", "I'm just learning code to increase my skill-set. I see it as a literacy issue.", ' Teacher', ' Criminal Defense Attorney-- focusing on cyber crimes ', 'Remote Support', 'non-programmer', ' IT specialist ', ' Data Scientist / Data Engineer'], dtype=object)
There are many different "job interests" throughout the survey, and it's clear that respondents were able to write in their own response to the question. The biggest drawback of this approach is that we end up with many variations of the same career, inconsistent spelling and capitalization, and unclear responses.
pandas counts each of these as a unique value, so it is harder to get a completely accurate count. For example, there are several variations of "Front-End Web Developer". Some values also carry extra whitespace. To clean up the values in this dataframe we'll strip the extra whitespace and convert everything to lowercase.
# Strips whitespace, changes to lower case
category_count["Interests"] = category_count["Interests"].str.strip().str.lower()
# Group by interests and add up the number of occurrences
category_count.groupby("Interests").sum().sort_values(by = "Count", ascending= False).head(50)
Interests | Count |
---|---|
full-stack web developer | 6769 |
front-end web developer | 4912 |
back-end web developer | 3476 |
mobile developer | 2719 |
user experience designer | 1744 |
data scientist | 1643 |
game developer | 1628 |
information security | 1326 |
data engineer | 1248 |
devops / sysadmin | 1146 |
product manager | 1005 |
data scientist / data engineer | 646 |
quality assurance engineer | 602 |
software engineer | 16 |
software developer | 8 |
artificial intelligence | 5 |
data analyst | 5 |
programmer | 4 |
machine learning engineer | 4 |
desktop application developer | 3 |
not sure | 3 |
not sure yet | 3 |
project manager | 3 |
machine learning | 2 |
product designer | 2 |
web designer | 2 |
full stack developer | 2 |
research | 2 |
ethical hacker | 2 |
user interface designer | 2 |
researcher | 2 |
business analyst | 2 |
bioinformatics | 2 |
undecided | 2 |
unsure | 2 |
java developer | 2 |
artificial intelligence engineer | 2 |
python developer | 1 |
quant (algorithmic trader) | 1 |
project management | 1 |
remote support | 1 |
research and education | 1 |
real-time systems | 1 |
philosopher | 1 |
programming | 1 |
program manager | 1 |
mobile development | 1 |
natural language processing | 1 |
network | 1 |
network admin | 1 |
# Career interest frequency
group_category = category_count.groupby("Interests").sum().sort_values(by = "Count", ascending= False).head(50)
# Plot results
fig, ax = plt.subplots(figsize = (10,8))
plt.barh(group_category.index[:15], group_category["Count"][:15], height = .6, color = "grey")
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()
# Title
plt.title("Career Interests", size = 20, loc = "left", x = -0.28, y = 1.08)
# X label
plt.text(-1950, -1.6,"Frequency", size = 14, color = "grey")
plt.show()
After some data cleaning the result isn't perfect, but we can clearly see a wide range of interests, from primarily web development to data science, game development, and many others.
Although the interests are mixed, this shows that individuals may be interested in topics beyond web development. We also see that some individuals responded with different versions of "I don't know". While it would be possible to remove the rows with these answers, there are so few of them that it's unlikely to affect our analysis either way.
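To check how small the undecided group really is, one option is to collapse the different "I don't know" spellings into a single label. The sketch below is illustrative only: the `toy_counts` frame copies a handful of counts from the table above, and the `undecided_labels` set is a hypothetical, hand-picked list rather than an exhaustive one.

```python
import pandas as pd

# A few rows shaped like category_count after cleaning (counts copied from the table above)
toy_counts = pd.DataFrame({
    "Interests": ["not sure", "not sure yet", "unsure", "undecided", "game developer"],
    "Count": [3, 3, 2, 2, 1628],
})

# Hypothetical, hand-picked set of labels to treat as "undecided"
undecided_labels = {"not sure", "not sure yet", "unsure", "undecided", "idk"}

# Replace every undecided variant with one shared label, then re-aggregate
is_undecided = toy_counts["Interests"].isin(undecided_labels)
merged = (
    toy_counts
    .assign(Interests=toy_counts["Interests"].where(~is_undecided, "undecided"))
    .groupby("Interests", as_index=False)["Count"]
    .sum()
)

# Share of these toy rows that fall into the undecided bucket
share_undecided = toy_counts.loc[is_undecided, "Count"].sum() / toy_counts["Count"].sum() * 100

print(merged)
print(f"{share_undecided:.2f}% undecided")
```

Even against a single mid-sized category such as game developer, the undecided answers amount to well under one percent, which supports leaving them in.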
# Gender frequency (FreeCodeCamp)
genders = survey["Gender"].value_counts(normalize=True, dropna=False) * 100
# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
genders.plot(kind = "bar", color = "grey", width = .58)
# Title
plt.title("Gender representation (FreeCodeCamp)", size = 19, loc = "left", x = -0.1, y = 1.02)
# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)
# X and Y labels
plt.ylabel("Frequency (percent)", color = "grey", size = 14, loc = "top")
plt.xlabel("Gender", color = "grey", size = 14, loc = "left")
# X and Y ticks
plt.yticks(size = 12)
plt.xticks(rotation = 0, size = 12)
plt.show()
We'll introduce a similar survey conducted in 2018 by Stack Overflow, part of the Stack Exchange network (a popular set of forums for asking and answering software and programming questions). We'll perform data cleaning on this dataset shortly, but first we can get an overview of its contents and how its demographics compare to FreeCodeCamp's.
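Besides plotting each survey separately, the two distributions can also be placed side by side in one table. This is a minimal sketch on made-up responses (the `fcc_gender` and `so_gender` Series and the "no response" label are assumptions for illustration); it lowercases the labels so that both surveys share the same categories.

```python
import pandas as pd

# Made-up responses standing in for the two surveys' Gender columns
fcc_gender = pd.Series(["male", "male", "female", None])
so_gender = pd.Series(["Male", "Male", "Male", "Female"])

def gender_share(responses: pd.Series) -> pd.Series:
    """Percentage of each gender label, with skipped answers shown explicitly."""
    cleaned = responses.str.lower().fillna("no response")
    return cleaned.value_counts(normalize=True) * 100

# Align both distributions on the gender label, one column per survey
comparison = pd.concat(
    [gender_share(fcc_gender), gender_share(so_gender)],
    axis=1,
    keys=["FreeCodeCamp", "Stack Overflow"],
)

print(comparison)
```

Categories that appear in only one survey show up as NaN in the other column, which makes gaps in coverage easy to spot.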
# Gender frequency (Stack Exchange)
genders_stk_exchange = exchange["Gender"].value_counts(normalize=True, dropna=False) * 100
# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
genders_stk_exchange[:3].plot(kind = "bar", color = "grey", width = .57)
# Title
plt.title("Gender representation (Stack Exchange)", size = 19, loc = "left", x = -0.1, y = 1.02)
# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)
# X and Y labels
plt.ylabel("Frequency (percent)", color = "grey", size = 14, loc = "top")
plt.xlabel("Gender", color = "grey", size = 14, loc = "left")
# X and Y ticks
plt.yticks(size = 12)
plt.xticks(rotation = 0, size = 12)
plt.show()
# Age distribution plotted
fig, ax = plt.subplots(figsize = (12,8))
survey["Age"].hist(bins = 20, color = "grey")
# Title
plt.title("Age Groups (FreeCodeCamp)", size = 19, loc = "left", x = -0.1, y = 1.02)
# Remove gridlines
ax.grid(False)
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# X and Y labels
plt.ylabel("# of observations", color = "grey", size = 14, loc = "top")
plt.xlabel("Age", color = "grey", size = 14, loc = "left")
# X and Y ticks
plt.yticks(size = 12)
plt.xticks(size = 12)
# Text
plt.text(32.5,2700,"Most new programmers\nare in their early 20s to early 30s", size = 14, color = "maroon")
# Main demographic highlighted (interquartile range; ymax is a fraction of the axes height, so 1 spans the full plot)
plt.axvspan(survey["Age"].quantile(0.25), survey["Age"].quantile(0.75), ymax=1, color = "maroon", alpha = 0.4)
plt.show()
# Stack exchange age groups
# Color assignment
colors = ["grey","grey", "maroon", "grey", "grey", "grey"]
# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
ages = exchange["Age"].value_counts().iloc[[4,1,0,2,3,5]].plot.bar(width = 0.65, color = colors)
# Remove spines
plt.gca().spines[["top", "left", "right"]].set_visible(False)
# Title
plt.title("Age Groups (Stack Exchange)", size = 19, loc = "left",x = -0.1, y = 1.02)
# X and Y labels
plt.ylabel("# of observations", color = "grey", size = 14, loc = "top")
plt.xlabel("Age", color = "grey", size = 14, loc = "left")
# X and Y ticks
plt.yticks(size = 12, color = "grey")
plt.xticks(size = 11, rotation = 0, color = "grey")
# Most frequent age group highlighted
plt.gca().get_xticklabels()[2].set_color("maroon")
plt.show()
# Freecodecamp countries
# Country frequency (freecodecamp)
countries = survey["CountryLive"].value_counts(normalize=True) * 100
# Frequency table to dataframe with explicit column names (independent of the pandas version's default names)
countries = countries.rename_axis("Country").reset_index(name="Percentage")
#------------------------------------------------------------------------------------------------#
# Stack Exchange Countries
# Country frequency (Stack Exchange)
countries_stack = exchange["Country"].value_counts(normalize=True) * 100
# Frequency table to dataframe with explicit column names (independent of the pandas version's default names)
countries_stack = countries_stack.rename_axis("Country").reset_index(name="Percentage")
#---------------------------------------------------------------------------------------------------#
# Plot results (FreeCodeCamp)
# Color assignment
colors = ["maroon","maroon","maroon","maroon","grey","grey","grey","grey","grey","grey"]
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(countries["Country"][:10], countries["Percentage"][:10], color = colors, height= 0.65)
# Title
plt.title("Country Representation (FreeCodeCamp)", loc = "left", size = 18, x = -0.3, y = 1.08)
# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Text
plt.text(-15.2, -1.4,"Frequency (in percent)", size = 14, color = "grey")
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")
# Top 4 countries highlighted
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.gca().get_yticklabels()[2].set_color("maroon")
plt.gca().get_yticklabels()[3].set_color("maroon")
plt.show()
# Plot results (Stack Exchange)
# Color Assignment
colors = ["maroon","maroon","#D6A0A9","maroon","maroon","grey","grey","grey","grey","grey"]
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(countries_stack["Country"][:10], countries_stack["Percentage"][:10], color = colors, height= 0.6)
# Title
plt.title("Country Representation (Stack Exchange)", loc = "left", size = 18, x = -0.23, y = 1.09)
# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Text
plt.text(-4.9, -1.4,"Frequency (in percent)", size = 14, color = "grey")
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")
# Highlight top 5 countries
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.gca().get_yticklabels()[2].set_color("#D6A0A9") # Germany
plt.gca().get_yticklabels()[3].set_color("maroon")
plt.gca().get_yticklabels()[4].set_color("maroon")
plt.show()
# FreeCodeCamp
# School degree frequency (Freecodecamp)
code_camp_edu = survey["SchoolDegree"].value_counts(normalize=True) * 100
# Frequency table to dataframe with explicit column names (independent of the pandas version's default names)
code_camp_edu = code_camp_edu.rename_axis("School Degree").reset_index(name="Percentage")
# Color assignment
colors = ["maroon","maroon","grey","grey","grey","grey","grey","grey","grey","grey"]
# Plot results
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(code_camp_edu["School Degree"][:10], code_camp_edu["Percentage"][:10], color = colors, height= 0.62)
# Title
plt.title("School Degree Representation (FreeCodeCamp)", loc = "left", size = 18, x = -0.52, y = 1.1)
# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")
# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Text
plt.text(-12, -1.6,"Frequency (in percent)", size = 14, color = "grey")
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")
# Highlight top 2 degrees
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.show()
# Stack Exchange
# Replace string values
exchange["FormalEducation"] = exchange["FormalEducation"].replace({"Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)":"High School"})
# School degree frequency (Stack Exchange)
stk_exchange_edu = exchange["FormalEducation"].value_counts(normalize=True) * 100
# Frequency table to dataframe with explicit column names (independent of the pandas version's default names)
stk_exchange_edu = stk_exchange_edu.rename_axis("School Degree").reset_index(name="Percentage")
# Color assignment
colors = ["maroon","maroon","grey","grey","grey","grey","grey","grey","grey","grey"]
# Plot results
fig, ax = plt.subplots(figsize = (10, 8))
plt.barh(stk_exchange_edu["School Degree"], stk_exchange_edu["Percentage"], color = colors, height= 0.62)
# Title
plt.title("School Degree Representation (Stack Exchange)", loc = "left", size = 18, x = -0.72, y = 1.1)
# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")
# Invert data, x ticks to top
plt.gca().invert_yaxis()
ax.xaxis.tick_top()
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Text
plt.text(-12, -1.5,"Frequency (in percent)", size = 14, color = "grey")
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14, color = "grey")
# Highlight top 2 degrees
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[1].set_color("maroon")
plt.show()
Thus far we have explored the JobRoleInterest input and its categories.
Both datasets share a similar distribution concerning age and gender. Men make up the majority of new-programmer respondents (about 70%, with women at about 20%).
The Stack Exchange survey consists primarily of respondents in STEM careers, and its gender imbalance is even more pronounced: men represent nearly 60% of respondents, NaNs (unknown or missing data) roughly 35%, and women only around 5%.
Age distribution is roughly the same too. New programmers are most likely to be in their early 20s to early 30s, and Stack Exchange survey participants are most often 25 to 34 years old.
Country representation is about the same in both surveys. The United States accounts for the largest share of participants, followed by India, in both cases. The countries with the highest participation are English-speaking countries (except for Germany in the Stack Exchange survey).
A bachelor's degree is the most common degree held by respondents in both surveys.
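To make this kind of comparison easier to read, the shares from both surveys can be placed side by side in one table. The sketch below uses made-up answers rather than the real survey data; the names `toy_fcc`, `toy_stack`, and `shares` are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical country answers from two surveys (not real survey data)
toy_fcc = pd.Series(["United States", "India", "United States", "Canada"], name="Country")
toy_stack = pd.Series(["United States", "India", "India", "Germany", "United States"], name="Country")

# Percentage share of each country within each survey, aligned on country name.
# Countries missing from one survey get a share of 0.
shares = pd.concat(
    {
        "FreeCodeCamp": toy_fcc.value_counts(normalize=True) * 100,
        "Stack Exchange": toy_stack.value_counts(normalize=True) * 100,
    },
    axis=1,
).fillna(0)

print(shares)
```

Aligning on the index keeps countries that appear in only one survey, which makes gaps in representation visible instead of silently dropping them.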
We've seen a high-level overview of the data. To provide customers with the most relevant training possible, we need to discover why people decide to learn a new skill like programming.
We'll provide several charts and data that we believe support the idea that new programmers are motivated by income and career opportunities. While only 40% of respondents answered the JobRoleInterest question, 13,495 observations is still a large enough sample to work with. Respondents are interested in many different career paths that use programming and tech skills.
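A response rate like the one quoted above can be computed directly from the missing values in a column. This is a minimal sketch on a made-up five-row frame; `toy_survey` is a hypothetical stand-in for the real dataset, and only the column name comes from the survey.

```python
import pandas as pd

# Hypothetical toy data; None marks a respondent who skipped the question
toy_survey = pd.DataFrame({
    "JobRoleInterest": ["Web Developer", None, "Data Scientist", None, None],
})

# Number of respondents who answered the question
answered = toy_survey["JobRoleInterest"].notna().sum()

# Share of respondents who answered, in percent
response_rate = toy_survey["JobRoleInterest"].notna().mean() * 100

print(answered, response_rate)
```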
Participants were asked the following questions regarding employment opportunities:
"Imagine that you are assessing a potential job opportunity. Please rank the following aspects of the job opportunity in order of importance, where 1 is the most important and 10 is the least important."
"Now, imagine you are assessing a job's benefits package. Please rank the following aspects of a job's benefits package from most to least important to you, where 1 is most important and 11 is least important."
Because 1 is the most important rank and 10 (or 11) the least important, averaging the rankings of each job aspect and benefit gives the most important items the lowest average scores. Before this calculation, we'll perform a bit of data cleaning on the Stack Exchange dataset.
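To illustrate the scoring logic described above, here is a minimal sketch on made-up rankings from three respondents. The frame `toy_ranks` and its values are hypothetical; only the idea (lowest average rank means most important) comes from the text.

```python
import pandas as pd

# Hypothetical rankings from three respondents (1 = most important)
toy_ranks = pd.DataFrame({
    "Compensation": [1, 2, 1],
    "Company_culture": [2, 1, 3],
    "Company_diversity": [3, 3, 2],
})

# Average rank per column, sorted so the most important item comes first
average_rank = toy_ranks.mean(axis=0).sort_values()

# The item with the lowest average rank is the most important one
most_important = average_rank.index[0]

print(average_rank)
print(most_important)
```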
# Rename current job related columns from stack exchange dataset
# Currency related columns
currency = exchange.columns[51:56].tolist()
# Columns up to index 38
columns = exchange.columns[:38].tolist()
# Age and gender columns
columns.extend(["Gender", "Age"])
# Add currency related columns to list
columns.extend(currency)
# Isolates dataframe down to columns from list "columns"
stk_exchange = exchange[columns].copy()
# Rename job aspects and job benefits columns for easier comprehension
rename_cols = {
"AssessJob1":"Industry_working_in",
"AssessJob2":"Company_funding",
"AssessJob3":"Department_working_in",
"AssessJob4":"Technologies/Frameworks",
"AssessJob5":"Compensation_and_benefits",
"AssessJob6":"Company_culture",
"AssessJob7":"WFH",
"AssessJob8":"Professional_development",
"AssessJob9":"Company_diversity",
"AssessJob10":"Product_impact",
"AssessBenefits1":"Compensation",
"AssessBenefits2":"Stock_options",
"AssessBenefits3":"Health_insurance",
"AssessBenefits4":"Parental_leave",
"AssessBenefits5":"Fitness_wellness_benefit",
"AssessBenefits6":"Retirement",
"AssessBenefits7":"Meals/snacks",
"AssessBenefits8":"Computer/office_equipment",
"AssessBenefits9":"Childcare_benefit",
"AssessBenefits10":"Transportaion_benefit",
"AssessBenefits11":"Conference/education_budget"
}
exchange = exchange.rename(columns=rename_cols)
# Isolate rows only containing following countries listed below
stk_countries = stk_exchange[stk_exchange["Country"].str.contains("United States|India|United Kingdom|Canada", na = False)]
len(stk_countries["Country"])
43644
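One caveat with `str.contains` is that it matches substrings, so a pattern such as "India" would also match any longer country name that contains it. The sketch below contrasts substring matching with exact matching via `isin` on a made-up frame; the values in `toy_countries` are hypothetical, and whether the real dataset contains such look-alike names has not been checked here.

```python
import pandas as pd

# Hypothetical country values; the last one only contains "India" as a substring
toy_countries = pd.DataFrame({
    "Country": ["United States", "India", "Germany", None, "British Indian Ocean Territory"],
})

target_countries = ["United States", "India", "United Kingdom", "Canada"]

# Substring match, in the same style as the cell above
by_substring = toy_countries[
    toy_countries["Country"].str.contains("|".join(target_countries), na=False)
]

# Exact match: keeps only rows whose country equals one of the targets
by_exact = toy_countries[toy_countries["Country"].isin(target_countries)]

print(len(by_substring), len(by_exact))
```

If the two counts differ on the real data, the exact match is probably the safer choice for a country filter.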
exchange
Respondent | Hobby | OpenSource | Country | Student | Employment | FormalEducation | UndergradMajor | CompanySize | DevType | YearsCoding | YearsCodingProf | JobSatisfaction | CareerSatisfaction | HopeFiveYears | JobSearchStatus | LastNewJob | Industry_working_in | Company_funding | Department_working_in | Technologies/Frameworks | Compensation_and_benefits | Company_culture | WFH | Professional_development | Company_diversity | Product_impact | Compensation | Stock_options | Health_insurance | Parental_leave | Fitness_wellness_benefit | Retirement | Meals/snacks | Computer/office_equipment | Childcare_benefit | Transportaion_benefit | Conference/education_budget | JobContactPriorities1 | JobContactPriorities2 | JobContactPriorities3 | JobContactPriorities4 | JobContactPriorities5 | JobEmailPriorities1 | JobEmailPriorities2 | JobEmailPriorities3 | JobEmailPriorities4 | JobEmailPriorities5 | JobEmailPriorities6 | JobEmailPriorities7 | UpdateCV | Currency | Salary | SalaryType | ConvertedSalary | CurrencySymbol | CommunicationTools | TimeFullyProductive | EducationTypes | SelfTaughtTypes | TimeAfterBootcamp | HackathonReasons | AgreeDisagree1 | AgreeDisagree2 | AgreeDisagree3 | LanguageWorkedWith | LanguageDesireNextYear | DatabaseWorkedWith | DatabaseDesireNextYear | PlatformWorkedWith | PlatformDesireNextYear | FrameworkWorkedWith | FrameworkDesireNextYear | IDE | OperatingSystem | NumberMonitors | Methodology | VersionControl | CheckInCode | AdBlocker | AdBlockerDisable | AdBlockerReasons | AdsAgreeDisagree1 | AdsAgreeDisagree2 | AdsAgreeDisagree3 | AdsActions | AdsPriorities1 | AdsPriorities2 | AdsPriorities3 | AdsPriorities4 | AdsPriorities5 | AdsPriorities6 | AdsPriorities7 | AIDangerous | AIInteresting | AIResponsible | AIFuture | EthicsChoice | EthicsReport | EthicsResponsible | EthicalImplications | StackOverflowRecommend | StackOverflowVisit | StackOverflowHasAccount | StackOverflowParticipate | StackOverflowJobs | StackOverflowDevStory | 
StackOverflowJobsRecommend | StackOverflowConsiderMember | HypotheticalTools1 | HypotheticalTools2 | HypotheticalTools3 | HypotheticalTools4 | HypotheticalTools5 | WakeTime | HoursComputer | HoursOutside | SkipMeals | ErgonomicDevices | Exercise | Gender | SexualOrientation | EducationParents | RaceEthnicity | Age | Dependents | MilitaryUS | SurveyTooLong | SurveyEasy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Yes | No | Kenya | No | Employed part-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Mathematics or statistics | 20 to 99 employees | Full-stack developer | 3-5 years | 3-5 years | Extremely satisfied | Extremely satisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | 10.0 | 7.0 | 8.0 | 1.0 | 2.0 | 5.0 | 3.0 | 4.0 | 9.0 | 6.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3.0 | 1.0 | 4.0 | 2.0 | 5.0 | 5.0 | 6.0 | 7.0 | 2.0 | 1.0 | 4.0 | 3.0 | My job status or other personal status changed | NaN | NaN | Monthly | NaN | KES | Slack | One to three months | Taught yourself a new language, framework, or ... | The official documentation and/or standards fo... | NaN | To build my professional network | Strongly agree | Strongly agree | Neither Agree nor Disagree | JavaScript;Python;HTML;CSS | JavaScript;Python;HTML;CSS | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... | Redis;SQL Server;MySQL;PostgreSQL;Amazon RDS/A... | AWS;Azure;Linux;Firebase | AWS;Azure;Linux;Firebase | Django;React | Django;React | Komodo;Vim;Visual Studio Code | Linux-based | 1 | Agile;Scrum | Git | Multiple times per day | Yes | No | NaN | Strongly agree | Strongly agree | Strongly agree | Saw an online advertisement and then researche... | 1.0 | 5.0 | 4.0 | 7.0 | 2.0 | 6.0 | 3.0 | Artificial intelligence surpassing human intel... | Algorithms making important decisions | The developers or the people creating the AI | I'm excited about the possibilities more than ... | No | Yes, and publicly | Upper management at the company/organization | Yes | 10 (Very Likely) | Multiple times per day | Yes | I have never participated in Q&A on Stack Over... | No, I knew that Stack Overflow had a jobs boar... 
| Yes | NaN | Yes | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Between 5:00 - 6:00 AM | 9 - 12 hours | 1 - 2 hours | Never | Standing desk | 3 - 4 times per week | Male | Straight or heterosexual | Bachelor’s degree (BA, BS, B.Eng., etc.) | Black or of African descent | 25 - 34 years old | Yes | NaN | The survey was an appropriate length | Very easy |
1 | 3 | Yes | Yes | United Kingdom | No | Employed full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | A natural science (ex. biology, chemistry, phy... | 10,000 or more employees | Database administrator;DevOps specialist;Full-... | 30 or more years | 18-20 years | Moderately dissatisfied | Neither satisfied nor dissatisfied | Working in a different or more specialized tec... | I am actively looking for a job | More than 4 years ago | 1.0 | 7.0 | 10.0 | 8.0 | 2.0 | 5.0 | 4.0 | 3.0 | 6.0 | 9.0 | 1.0 | 5.0 | 3.0 | 7.0 | 10.0 | 4.0 | 11.0 | 9.0 | 6.0 | 2.0 | 8.0 | 3.0 | 1.0 | 5.0 | 2.0 | 4.0 | 1.0 | 3.0 | 4.0 | 5.0 | 2.0 | 6.0 | 7.0 | I saw an employer’s advertisement | British pounds sterling (£) | 51000 | Yearly | 70841.0 | GBP | Confluence;Office / productivity suite (Micros... | One to three months | Taught yourself a new language, framework, or ... | The official documentation and/or standards fo... | NaN | NaN | Agree | Agree | Neither Agree nor Disagree | JavaScript;Python;Bash/Shell | Go;Python | Redis;PostgreSQL;Memcached | PostgreSQL | Linux | Linux | Django | React | IPython / Jupyter;Sublime Text;Vim | Linux-based | 2 | NaN | Git;Subversion | A few times per week | Yes | Yes | The website I was visiting asked me to disable it | Somewhat agree | Neither agree nor disagree | Neither agree nor disagree | NaN | 3.0 | 5.0 | 1.0 | 4.0 | 6.0 | 7.0 | 2.0 | Increasing automation of jobs | Increasing automation of jobs | The developers or the people creating the AI | I'm excited about the possibilities more than ... 
| Depends on what it is | Depends on what it is | Upper management at the company/organization | Yes | 10 (Very Likely) | A few times per month or weekly | Yes | A few times per month or weekly | Yes | No, I have one but it's out of date | 7 | Yes | A little bit interested | A little bit interested | A little bit interested | A little bit interested | A little bit interested | Between 6:01 - 7:00 AM | 5 - 8 hours | 30 - 59 minutes | Never | Ergonomic keyboard or mouse | Daily or almost every day | Male | Straight or heterosexual | Bachelor’s degree (BA, BS, B.Eng., etc.) | White or of European descent | 35 - 44 years old | Yes | NaN | The survey was an appropriate length | Somewhat easy |
2 | 4 | Yes | Yes | United States | No | Employed full-time | Associate degree | Computer science, computer engineering, or sof... | 20 to 99 employees | Engineering manager;Full-stack developer | 24-26 years | 6-8 years | Moderately satisfied | Moderately satisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 5 | No | No | United States | No | Employed full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | 100 to 499 employees | Full-stack developer | 18-20 years | 12-14 years | Neither satisfied nor dissatisfied | Slightly dissatisfied | Working as a founder or co-founder of my own c... | I’m not actively looking, but I am open to new... | Less than a year ago | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A recruiter contacted me | U.S. dollars ($) | NaN | NaN | NaN | NaN | NaN | Three to six months | Completed an industry certification program (e... | The official documentation and/or standards fo... | NaN | NaN | Disagree | Disagree | Strongly disagree | C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell | C#;JavaScript;SQL;TypeScript;HTML;CSS;Bash/Shell | SQL Server;Microsoft Azure (Tables, CosmosDB, ... | SQL Server;Microsoft Azure (Tables, CosmosDB, ... | Azure | Azure | NaN | Angular;.NET Core;React | Visual Studio;Visual Studio Code | Windows | 2 | Agile;Kanban;Scrum | Git | Multiple times per day | Yes | Yes | The ad-blocking software was causing display i... | Neither agree nor disagree | Somewhat agree | Somewhat agree | Stopped going to a website because of their ad... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Artificial intelligence surpassing human intel... | Artificial intelligence surpassing human intel... | A governmental or other regulatory body | I don't care about it, or I haven't thought ab... 
| No | Yes, but only within the company | Upper management at the company/organization | Yes | 10 (Very Likely) | A few times per week | Yes | A few times per month or weekly | Yes | No, I have one but it's out of date | 8 | Yes | Somewhat interested | Somewhat interested | Somewhat interested | Somewhat interested | Somewhat interested | Between 6:01 - 7:00 AM | 9 - 12 hours | Less than 30 minutes | 3 - 4 times per week | NaN | I don't typically exercise | Male | Straight or heterosexual | Some college/university study without earning ... | White or of European descent | 35 - 44 years old | No | No | The survey was an appropriate length | Somewhat easy |
4 | 7 | Yes | No | South Africa | Yes, part-time | Employed full-time | Some college/university study without earning ... | Computer science, computer engineering, or sof... | 10,000 or more employees | Data or business analyst;Desktop or enterprise... | 6-8 years | 0-2 years | Slightly satisfied | Moderately satisfied | Working in a different or more specialized tec... | I’m not actively looking, but I am open to new... | Between 1 and 2 years ago | 8.0 | 5.0 | 7.0 | 1.0 | 2.0 | 6.0 | 4.0 | 3.0 | 10.0 | 9.0 | 1.0 | 10.0 | 2.0 | 4.0 | 8.0 | 3.0 | 11.0 | 7.0 | 5.0 | 9.0 | 6.0 | 2.0 | 1.0 | 4.0 | 5.0 | 3.0 | 7.0 | 3.0 | 6.0 | 2.0 | 1.0 | 4.0 | 5.0 | My job status or other personal status changed | South African rands (R) | 260000 | Yearly | 21426.0 | ZAR | Office / productivity suite (Microsoft Office,... | Three to six months | Taken a part-time in-person course in programm... | The official documentation and/or standards fo... | NaN | NaN | Strongly agree | Agree | Strongly disagree | C;C++;Java;Matlab;R;SQL;Bash/Shell | Assembly;C;C++;Matlab;SQL;Bash/Shell | SQL Server;PostgreSQL;Oracle;IBM Db2 | PostgreSQL;Oracle;IBM Db2 | Arduino;Windows Desktop or Server | Arduino;Windows Desktop or Server | NaN | NaN | Notepad++;Visual Studio;Visual Studio Code | Windows | 2 | Evidence-based software engineering;Formal sta... | Zip file back-ups | Weekly or a few times per month | No | NaN | NaN | Somewhat agree | Somewhat agree | Somewhat disagree | Clicked on an online advertisement;Saw an onli... | 2.0 | 3.0 | 4.0 | 6.0 | 1.0 | 7.0 | 5.0 | Algorithms making important decisions | Algorithms making important decisions | The developers or the people creating the AI | I'm excited about the possibilities more than ... | No | Yes, but only within the company | Upper management at the company/organization | Yes | 10 (Very Likely) | Daily or almost daily | Yes | Less than once per month or monthly | No, I knew that Stack Overflow had a jobs boar... 
| No, I know what it is but I don't have one | NaN | Yes | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Extremely interested | Before 5:00 AM | Over 12 hours | 1 - 2 hours | Never | NaN | 3 - 4 times per week | Male | Straight or heterosexual | Some college/university study without earning ... | White or of European descent | 18 - 24 years old | Yes | NaN | The survey was an appropriate length | Somewhat easy |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
98850 | 101513 | Yes | Yes | United States | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
98851 | 101531 | No | Yes | Spain | Yes, full-time | Not employed, but looking for work | NaN | NaN | NaN | Back-end developer;Front-end developer | 0-2 years | 0-2 years | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
98852 | 101541 | Yes | Yes | India | Yes, full-time | Employed full-time | Bachelor’s degree (BA, BS, B.Eng., etc.) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
98853 | 101544 | Yes | No | Russian Federation | No | Independent contractor, freelancer, or self-em... | Some college/university study without earning ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
98854 | 101548 | Yes | Yes | Cambodia | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
98855 rows × 129 columns
These benefits and aspects were rated by people currently working in STEM fields, so we have to be careful not to assume that the ratings apply directly to the new programmers who took part in FreeCodeCamp's survey (many of whom do not work in software or tech jobs).
However, had FreeCodeCamp asked the same questions, we would probably see similar results. Using the Stack Exchange survey as a proxy, compensation and health insurance are the most important benefits to job applicants and to those interested in switching jobs. Some of the least important benefits include childcare, parental leave, and a fitness/wellness benefit.
Job aspects describe how candidates view a potential job opportunity and the particular make-up of an organization. Respondents ranked pay and benefits (which, for some reason, is listed as both a benefit and an aspect), the technologies or programs used, career mobility, and company culture as more important than the other aspects.
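The cells that follow pick the aspect and benefit columns by position. As an alternative sketch, the same kind of selection can be made by column label, which keeps working if columns are added or reordered elsewhere. The frame `toy_exchange` below is a hypothetical miniature of the renamed dataset, with made-up values.

```python
import pandas as pd

# Hypothetical miniature of the renamed dataset (values are made up)
toy_exchange = pd.DataFrame({
    "Respondent": [1, 2],
    "Industry_working_in": [10.0, 1.0],
    "Compensation_and_benefits": [2.0, 2.0],
    "Product_impact": [6.0, 9.0],
    "Compensation": [1.0, None],
})

# Label-based slice: unlike iloc, both endpoints are included
toy_aspects = toy_exchange.loc[:, "Industry_working_in":"Product_impact"]

# Column means (missing values are skipped by default), largest average first
toy_aspect_means = toy_aspects.mean(axis=0).sort_values(ascending=False)

print(toy_aspect_means)
```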
# Slice dataset to contain only job aspect columns
job_assessment = exchange.iloc[:,17:27]
# Constructs new dataframe of column averages
assessments = pd.Series.to_frame(job_assessment.mean(axis=0).sort_values(ascending=False)) # Calculate averages along each column
# Assign index name (the index is kept, since the plot below uses it for the bar labels)
assessments.index.name = "Aspects"
#---------------------------------------------------------------------------------------------------------------------------------#
# Slice dataset to contain only job benefit columns
benefits = exchange.iloc[:,27:38]
# Constructs new dataframe of column averages
job_benefits = pd.Series.to_frame(benefits.mean(axis=0).sort_values(ascending=False)) # Calculate averages along each column
# Assign index name (the index is kept, since the plot below uses it for the bar labels)
job_benefits.index.name = "Benefits"
#----------------------------------------------------------------------------------------------------------------------------------#
# Plot results
# If looking for a new job, rate importance of job benefits from 1 (most important) to 11 (least important)
# Color assignment
colors = ["grey","grey","grey","grey","grey","grey","grey","grey","grey","maroon","maroon"]
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(job_benefits.index, job_benefits[0], color = colors, height= 0.62)
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# X axis top
ax.xaxis.tick_top()
# Title
plt.title("Job benefits", size = 19, loc = "left", x= -0.35, y = 1.16)
# Text
plt.text(-3.2,12,"Rating (1 most important, 11 least important), average", color = "grey", size = 14)
# X and Y ticks
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")
# Highlight top 2 benefits
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.show()
# Plot results
# If looking for a new job, rate importance of job aspects from 1(most important) to 10(least important)
# Color assignment
colors = ["grey","grey","grey","grey","grey","grey","maroon","maroon","maroon","maroon"]
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(assessments.index, assessments[0], color = colors, height= 0.6)
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# X axis top
ax.xaxis.tick_top()
# Title
plt.title("Job aspects", size = 19, loc = "left", x= -0.4, y = 1.16)
# Text
plt.text(-3.2,11,"Rating (1 most important, 10 least important), average", color = "grey", size = 14)
# X and Y ticks
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")
# Highlight top 4 job aspects
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.gca().get_yticklabels()[-3].set_color("maroon")
plt.gca().get_yticklabels()[-4].set_color("maroon")
plt.show()
Income
: Respondents were asked their current yearly income.
ExpectedEarning
: "About how much money do you expect to earn per year at your first developer job, in US dollars?"
HasDebt
: The question asked was "Do you have any debt?"
At a high level, the median and average salaries of new programmers are both below $50,000 (US). New programmers expect to earn about $15,000 to $20,000 more in their new tech/software careers than they currently earn.
# Income distribution
# Plot results
fig, ax = plt.subplots(figsize = (14,10))
survey["Income"].plot.hist(bins = 120, color = "grey", xlim = (0,250000))
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# Title
plt.title("Income distribution of survey respondents\n(All countries)",loc= "left", size = 18, y = 1.02)
# Average and median income
plt.axvline(survey["Income"].mean(), color = "red", alpha = 0.5, linewidth = 3)
plt.axvline(survey["Income"].median(), color = "blue", alpha = 0.5, linewidth = 3)
# Misc. Text
plt.text(41000, 850, " Average \n Income", size = 15, color = "red")
plt.text(14000, 850, " Median \n Income", size = 15, color = "blue")
# X and Y labels
plt.ylabel("Frequency",size = 15, loc = "top", color ="grey")
plt.xlabel("Income, Yearly (US dollars)", size = 15, loc = "left", color ="grey")
# X and Y ticks
plt.yticks(size = 14)
plt.xticks(size = 13)
plt.show()
# Difference between current income and expected income
fig, ax = plt.subplots(figsize = (13,10))
# Freecodecamp survey expected earning distribution
survey["ExpectedEarning"].plot.kde(xlim = (0, 200000), color = "#ED7E00", linewidth = 3)
# Freecodecamp survey current income distribution
survey["Income"].plot.kde(color = "#4B86C1", linewidth = 3)
# Title
plt.title("Current earnings vs. Expected earnings\n(All countries)", size = 18, loc = "left", y = 1.05)
# X and Y ticks
plt.xticks(size = 14)
plt.yticks(size = 14)
# X and Y labels
plt.ylabel("Density (Probability)", size = 14, color = "grey", loc = "top")
plt.xlabel("Income, Yearly (US dollars)", size = 14, color = "grey", loc = "left")
# Remove spines
plt.gca().spines[["right", "top"]].set_visible(False)
# Misc. text
plt.text(x = 0.01, y = 0.84, s="Income: Freecodecamp", color = "#4B86C1", size = 13, transform=ax.transAxes)
plt.text(x = 0.29, y = .90, s="Desired Income", color = "#ED7E00", size = 13, transform=ax.transAxes)
# Dollar signs are escaped as \\$ so matplotlib does not read them as math delimiters
plt.text(0.55,0.85,"""Typically, survey participants expect to earn\n\\$15,000 to \\$20,000 more in their new career,
compared to their current income""", color = "grey", size = 14, transform=ax.transAxes)
# X and Y ticks
plt.yticks(size = 13)
plt.xticks(size = 13)
plt.show()
We can find each person's desired salary increase (relative to their current income, as a percentage) using the following formula:
Increase = New Number - Original Number
% increase = Increase / Original Number x 100
Here the original number is the current income and the new number is the expected earning. Respondents who expect to earn less than their current income produce negative percentages in the new column, and a missing value in either column produces a missing (NaN) percentage. Missing data won't be dropped; instead, we'll ignore any percentages below 0.
We'll notice that most often, respondents desire a salary increase in the range of 0% to 120%.
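As a minimal sketch of the formula on made-up numbers (the `percent_increase` helper and the three respondents are illustrative assumptions, not part of the survey code), the example below shows the three cases that can occur: a desired raise, an expected pay cut, and a missing income.

```python
import math

def percent_increase(original, new):
    """Percentage change from `original` to `new` (same currency for both)."""
    increase = new - original
    return increase / original * 100

# Hypothetical respondents: (current income, expected earning)
raise_wanted = percent_increase(40000, 60000)   # expects to earn more
pay_cut = percent_increase(50000, 40000)        # expects to earn less
missing = percent_increase(math.nan, 55000)     # current income not reported

print("raise:", raise_wanted)
print("pay cut:", pay_cut)
print("missing income:", missing)
```

The pay-cut case is negative and the missing-income case is NaN, which is why the two are handled differently when reading the new column.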
# Column creation using formula above
survey["Percent_Increase"] = (survey["ExpectedEarning"] - survey["Income"]) / survey["Income"] * 100
# Frequency distribution
survey["Percent_Increase"].value_counts(bins = 20, normalize= True) * 100
(-115.647, 734.302]       40.192664
(734.302, 1567.585]        0.570582
(1567.585, 2400.867]       0.044461
(15733.384, 16566.667]     0.014820
(4067.432, 4900.714]       0.007410
(5733.996, 6567.279]       0.007410
(2400.867, 3234.149]       0.007410
(14066.82, 14900.102]      0.007410
(3234.149, 4067.432]       0.000000
(4900.714, 5733.996]       0.000000
(6567.279, 7400.561]       0.000000
(7400.561, 8233.843]       0.000000
(9067.126, 9900.408]       0.000000
(9900.408, 10733.69]       0.000000
(10733.69, 11566.973]      0.000000
(11566.973, 12400.255]     0.000000
(12400.255, 13233.537]     0.000000
(13233.537, 14066.82]      0.000000
(14900.102, 15733.384]     0.000000
(8233.843, 9067.126]       0.000000
Name: Percent_Increase, dtype: float64
fig, ax = plt.subplots(figsize = (13,9))
# Expected salary increase (in a percentage) histogram
survey[survey["Percent_Increase"] <= 500]["Percent_Increase"].plot.hist(bins = 15, color = "grey")
# Boolean masking above keeps only values less than or equal to 500%
# Interquartile range (25% to 75%)
plt.axvspan(survey["Percent_Increase"].quantile(0.25), survey["Percent_Increase"].quantile(0.75), color = "maroon", alpha = 0.4)
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# Title
plt.title("Desired salary increase (in percent)", loc="left", size = 20, y = 1.05)
# X and Y labels
plt.ylabel("Frequency", size = 15, color = "grey", loc = "top")
plt.xlabel("Percent Increase", size = 15, loc = "left", color = "grey")
# X and Y ticks
plt.xticks(size = 13)
plt.yticks(size = 13)
# Text
plt.text(126,1000,"Typical range of expected salary increase (in percent)", size = 14, color = "maroon")
plt.show()
Most respondents do not have financial dependents to care for, and less than half (about 42%) have debts to pay off.
# Replaces following columns with True/False values
survey["HasDebt"] = survey["HasDebt"].replace({1.0:"True", 0.0: "False"})
survey["HasFinancialDependents"] = survey["HasFinancialDependents"].replace({1.0:"True", 0.0: "False"})
# Financial dependents
print("Financial Dependents:","\n", survey["HasFinancialDependents"].value_counts(normalize = True, dropna=False) * 100)
print("\n")
# Has debt of any kind
print("Has Debt:", "\n", survey["HasDebt"].value_counts(normalize = True, dropna=False) * 100)
print("\n")
Financial Dependents:
False    71.619118
True     21.074472
NaN       7.306410
Name: HasFinancialDependents, dtype: float64

Has Debt:
False    50.596517
True     41.948870
NaN       7.454613
Name: HasDebt, dtype: float64
EmploymentStatus
: "Regarding employment status, are you currently..."
Respondents were asked to select their current employment status; examples include not working, employed for wages, self-employed, and military.
About half of respondents answered that they are actively working in some manner for their income. A smaller percentage did not answer, and the remaining participants are not working but actively looking for work, not working and not looking for work, or in another category such as stay-at-home parent.
"Employed for wages" is the most common employment status, but this group has one of the lowest median hours spent learning per week (10 hours), tied with "Military". The groups "Not working but looking for work" and "Self-employed freelancer" have the highest median (20 hours). Typically, respondents spend about 12 hours per week (median), or about 1.7 hours per day, learning programming. We did not use the weekly average because the data contains many outliers, in the range of 30 to 168 hours per week, that significantly skew the distribution.
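To illustrate why the median is preferred here, the following sketch uses a small, made-up list of weekly study hours (the values are hypothetical, not taken from the survey). Two extreme answers are enough to pull the mean far away from what a typical respondent reports, while the median stays put.

```python
from statistics import mean, median

# Hypothetical weekly study hours; the last two values are deliberate outliers
hours = [5, 8, 10, 12, 12, 15, 20, 120, 168]
typical = hours[:-2]  # same list without the two outliers

print("with outliers:    mean =", round(mean(hours), 1), "median =", median(hours))
print("without outliers: mean =", round(mean(typical), 1), "median =", median(typical))
```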
# Fills in missing data from hours learning column
survey["HoursLearning"] = survey["HoursLearning"].fillna(survey["HoursLearning"].median())
# Hours spent learning distribution
fig, ax = plt.subplots(figsize = (12, 6))
sns.boxplot(x= "HoursLearning", data = survey, color = "grey", medianprops=dict(color="maroon", alpha=1))
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Title
plt.title("Hours spent learning per week", loc="left", size = 20, y = 1.05)
# Misc. Text
plt.text(80,-0.04,"Outliers", size = 14)
plt.text(0, -0.42, "Median (12 hours/week)", size = 14, color = "maroon")
plt.text(100, -0.3, "75% of participants spent\n20 hours or less per week learning", size = 14, color = "grey")
plt.text(100, -0.15, "25% of participants spent\n7 hours or less per week learning", size = 14, color = "grey")
# X label
plt.xlabel("Hours", size = 15, loc = "left", color = "grey")
# X ticks
plt.xticks(size = 13, color = "grey")
plt.show()
# Print stats
print(survey["HoursLearning"].describe())
count    13495.000000
mean        16.955761
std         14.573179
min          0.000000
25%          7.000000
50%         12.000000
75%         20.000000
max        168.000000
Name: HoursLearning, dtype: float64
# Frequency table employment status
survey["EmploymentStatus"].value_counts(dropna=False)
Employed for wages                      5564
Not working but looking for work        3395
NaN                                     1899
Not working and not looking for work     948
Self-employed freelancer                 644
Doing an unpaid internship               294
Unable to work                           235
A stay-at-home parent or homemaker       220
Self-employed business owner             210
Military                                  62
Retired                                   24
Name: EmploymentStatus, dtype: int64
# Hours spent per week by employment status
survey.groupby("EmploymentStatus")["HoursLearning"].median().sort_values(ascending=False)
EmploymentStatus
Not working but looking for work        20.0
Self-employed freelancer                20.0
Doing an unpaid internship              15.0
Self-employed business owner            15.0
Not working and not looking for work    14.0
A stay-at-home parent or homemaker      13.0
Retired                                 12.0
Unable to work                          12.0
Employed for wages                      10.0
Military                                10.0
Name: HoursLearning, dtype: float64
MonthsProgramming
: "About how many months have you been programming for?" ("Programming experience")
There is some evidence suggesting that career field has a relatively weak influence on an individual's motivation to learn programming.
Farming/fishing/forestry and education (career fields we would not typically associate with programming/software development) have the greatest average number of months programming. After these two, the IT/software development field has the third-highest average amount of experience. Presumably, respondents in IT/software development were spending time outside of work learning, or had just been hired.
Farming/fishing/forestry and education are some of the lowest-paid career fields in this survey, and, on average, their respondents expected a lower income than those in other career fields. Instead, we see higher-paying careers with less "programming experience" expecting higher income after switching to tech/software related jobs.
A stronger argument may be that education level has more influence over a person's decision to begin learning a skill like programming in pursuit of more career opportunities.
# Salary and experience comparison for employment fields
# Assign groupby objects for plotting using EmploymentField
# numeric_only=True skips text columns, which recent pandas versions refuse to average
empfld_months_prg = survey.groupby("EmploymentField").mean(numeric_only=True).sort_values(by="MonthsProgramming") # sort by the average number of months programming
empfld_income = survey.groupby("EmploymentField").mean(numeric_only=True).sort_values(by="Income") # sort by the average income
empfld_expected_salary = survey.groupby("EmploymentField").mean(numeric_only=True).sort_values(by="ExpectedEarning") # sort by the average expected earning
#-------------------------------------------------------------------------------------------------------------------------------------#
# Color assignment
colors = ["grey", "grey", "grey", "grey", "grey","grey", "grey", "grey", "grey", "grey","grey", "grey", "maroon", "maroon", "maroon",]
# Plot results experience
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_months_prg.index, empfld_months_prg["MonthsProgramming"], color = colors, height = 0.6)
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")
# Text
plt.text(-8,16.3,"Average number of months", size = 14, color = "grey")
# Title
plt.title("New programmer experience by career field", size = 20, loc = "left", x = -0.65, y = 1.12)
# X axis to top
ax.xaxis.tick_top()
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)
# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[-1].set_color("maroon")
plt.gca().get_yticklabels()[-2].set_color("maroon")
plt.gca().get_yticklabels()[-3].set_color("maroon")
plt.show()
#---------------------------------------------------------------------------------------------------------------------------------------#
# Salary
# Color assignment
colors = ["maroon", "grey", "grey", "grey", "grey","maroon", "grey", "maroon", "grey", "grey","grey", "grey", "grey", "grey", "grey",]
# Plot results income
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_income.index, empfld_income["Income"], color = colors, height = 0.6)
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")
# Text
plt.text(-25000,16.4,"Average Salary (US dollars)", size = 14, color = "grey")
plt.text(30000,-0.5, "Career fields shaded in red\nhave the highest average number of months\nspent learning programming", color = "grey")
# Title
plt.title("Salary by career field", size = 20, loc = "left", x = -0.65, y = 1.12)
# X axis to top
ax.xaxis.tick_top()
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)
# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[5].set_color("maroon")
plt.gca().get_yticklabels()[-8].set_color("maroon")
plt.show()
#--------------------------------------------------------------------------------------------------------------------------------------------#
# Color assignment
colors_salary = ["maroon", "grey", "grey", "grey", "grey","grey", "grey", "maroon", "grey", "grey","maroon", "grey", "grey", "grey", "grey"]
# Plot results expected earning
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(empfld_expected_salary.index, empfld_expected_salary["ExpectedEarning"].sort_values(), color = colors_salary, height = 0.6)
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Y label
plt.ylabel("Career Field", loc = "top", size = 14, color = "grey")
# Text
plt.text(-20000, 16.3,"Average (US dollars)", size = 14, color = "grey")
# Title
plt.title("Expected annual salary by career field", size = 20, loc = "left", x = -0.65, y = 1.12)
# X axis to top
ax.xaxis.tick_top()
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)
# Highlight top 3 career fields most experience
plt.gca().get_yticklabels()[0].set_color("maroon")
plt.gca().get_yticklabels()[-5].set_color("maroon")
plt.gca().get_yticklabels()[7].set_color("maroon")
plt.show()
Earlier we noticed that a person's career field may have a weaker influence on their motivation to start learning programming. Instead, a person's education level may be a more significant factor. The data suggests that individuals with less than a bachelor's degree have some of the greatest amounts of "programming experience".
Three education levels stand out (highlighted in blue in the charts that follow).
They have the highest median and average expected salary increase (in percent), and in terms of yearly income, respondents in these groups are some of the lowest earning. However, in absolute dollar terms, holders of Ph.D.s, professional degrees, and bachelor's degrees generally expect a higher salary. The exception among the lower education levels is associate's degree holders, who have the second-highest average expected earning.
Neither career type nor education level is a perfect indicator of whether someone is more motivated to learn new programming/tech skills. We think it's reasonable to argue that the data suggests survey participants are generally interested in programming for the career and income opportunities.
# Average expected earning by school degree
round(survey.groupby("SchoolDegree")["ExpectedEarning"].mean().sort_values(ascending=False), 2)
SchoolDegree
Ph.D.                                       61165.49
associate's degree                          60870.16
professional degree (MBA, MD, JD, etc.)     56383.49
bachelor's degree                           55103.32
no high school (secondary school)           54756.51
some high school                            53954.36
some college credit, no degree              53603.29
master's degree (non-professional)          52670.46
trade, technical, or vocational training    49398.36
high school diploma or equivalent (GED)     48131.91
Name: ExpectedEarning, dtype: float64
# Median expected salary increase (percent)
salary_increase_median = round(survey.groupby("SchoolDegree")["Percent_Increase"].median().sort_values(ascending=False),2)
salary_increase_median = pd.Series.to_frame(salary_increase_median).reset_index()
salary_increase_median = salary_increase_median.rename(columns={"index":"SchoolDegree","Percent_Increase":"Percentage"})
# Average expected salary increase (percent)
salary_increase = round(survey.groupby("SchoolDegree")["Percent_Increase"].mean().sort_values(ascending=False),2)
salary_increase = pd.Series.to_frame(salary_increase).reset_index()
salary_increase = salary_increase.rename(columns={"index":"SchoolDegree","Percent_Increase":"Percentage"})
#------------------------------------------------------------------------------------------------------------------------#
# Color assignment
colors = ["#145DDE", "#145DDE","#145DDE", "grey", "grey", "grey","grey", "grey", "grey", "grey"]
# Plot (1) results average and median salary increase (percent)
fig, ax = plt.subplots(figsize = (8, 6))
# Title
plt.title("Expected salary raise by education level", size = 19, loc = "left", x= -0.65, y = 1.16)
# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")
# Misc. text
plt.text(-4,-2,"Median", color = "#4B86C1", size = 14)
plt.text(35,-2,"Average", color = "grey", size = 14)
plt.text(80,-2,"(Percent)", size = 14)
plt.text(160, 2.5,"Education levels below bachelor's\ndegree have the highest average\nand median expected salary increase", color = "grey")
# Average plotted
plt.barh(salary_increase["SchoolDegree"], salary_increase["Percentage"], color = colors, height = 0.62)
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
ax.xaxis.tick_top()
# Median plotted
plt.barh(salary_increase_median["SchoolDegree"], salary_increase_median["Percentage"], color = "#4B86C1", height = 0.62)
plt.yticks(size = 14, color = "grey")
plt.xticks(size = 13, color = "grey")
plt.gca().invert_yaxis()
# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[0].set_color("#145DDE")
plt.gca().get_yticklabels()[1].set_color("#145DDE")
plt.gca().get_yticklabels()[2].set_color("#145DDE")
plt.show()
#-----------------------------------------------------------------------------------------------------------------------------------------#
# Assign groupby objects for plotting using SchoolDegree
# numeric_only=True skips text columns, which recent pandas versions refuse to average
schl_dgree = survey.groupby("SchoolDegree").mean(numeric_only=True).sort_values(by = "MonthsProgramming") # sort by the average number of months programming
degree_income = survey.groupby("SchoolDegree").mean(numeric_only=True).sort_values(by = "Income") # sort by the average income
# Plot results income by school education level
colors_degree_income = ["#145DDE", "grey","#145DDE", "grey", "grey", "#145DDE","grey", "grey", "grey", "grey"]
# Plot (2) school degree income
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(degree_income.index, degree_income["Income"], color = colors_degree_income, height = 0.62)
# Title
plt.title("Salary by education", size = 20, loc = "left", x = -0.7, y = 1.12)
# Text
plt.text(-23000,10.7,"Average Salary (US dollars)", size = 14, color = "grey")
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")
# X axis to top
ax.xaxis.tick_top()
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)
# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[0].set_color("#145DDE")
plt.gca().get_yticklabels()[2].set_color("#145DDE")
plt.gca().get_yticklabels()[5].set_color("#145DDE")
plt.show()
#-----------------------------------------------------------------------------------------------------------------------------------------#
# Plot results number of months programming by education level
colors_schl_dgree = ["grey", "grey","grey", "grey", "grey", "#145DDE","#145DDE", "grey", "#145DDE", "grey"]
# Plot (3) school degree number of months programming
fig, ax = plt.subplots(figsize = (8, 6))
plt.barh(schl_dgree.index, schl_dgree["MonthsProgramming"], color = colors_schl_dgree, height = 0.62)
# Title
plt.title("New programmer experience by education", size = 20, loc = "left", x = -0.7, y = 1.12)
# Text
plt.text(-11,10.7,"Average number of months", size = 14, color = "grey")
# Remove spines
plt.gca().spines[["right", "left", "top", "bottom"]].set_visible(False)
# Y label
plt.ylabel("School Degree", loc = "top", size = 14, color = "grey")
# X axis to top
ax.xaxis.tick_top()
# X and Y ticks
plt.xticks(size = 13, color = "grey")
plt.yticks(size = 14)
# Top 3 education levels by expected salary raise highlighted
plt.gca().get_yticklabels()[-2].set_color("#145DDE")
plt.gca().get_yticklabels()[-4].set_color("#145DDE")
plt.gca().get_yticklabels()[-5].set_color("#145DDE")
plt.show()
# Calculate median to avoid most of the skewness from outliers
# Median hours spent studying programming
survey.groupby("SchoolDegree")["HoursLearning"].median().sort_values(ascending=False)
SchoolDegree
trade, technical, or vocational training    13.0
Ph.D.                                       12.0
associate's degree                          12.0
bachelor's degree                           12.0
high school diploma or equivalent (GED)     12.0
master's degree (non-professional)          12.0
professional degree (MBA, MD, JD, etc.)     12.0
some college credit, no degree              12.0
some high school                            12.0
no high school (secondary school)           10.0
Name: HoursLearning, dtype: float64
A vast majority of respondents reside in the United States, followed by India at about 7% and the United Kingdom at 5%. Before making a decision, we need to find out how much new programmers are willing to spend on education. If we advertise in markets that are only interested in free learning, we're unlikely to be profitable.
The MoneyForLearning column describes the amount of money that survey participants have spent since the beginning of their programming journey. Since our business model operates on a monthly subscription, we are interested in how much customers are willing to spend per month. To find that information we need to create a new column.
Formula: MoneyForLearning / MonthsProgramming
We may need to limit our analysis to the following countries: US, India, UK, and Canada. Two reasons for this decision are:
# Months programming frequency
survey["MonthsProgramming"].value_counts().head(20)
1.0     1373
6.0     1371
12.0    1334
3.0     1273
2.0     1228
24.0     821
4.0      733
5.0      557
36.0     441
0.0      421
8.0      412
10.0     320
18.0     288
7.0      246
9.0      229
20.0     194
48.0     190
30.0     149
60.0     143
15.0     143
Name: MonthsProgramming, dtype: int64
To avoid dividing by zero, we'll need to replace the value 0 with 1. We can reasonably assume that respondents who answered with 0 months of experience had just started and had only a few weeks of experience, so for simplicity we'll count them as 1 month.
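The arithmetic behind the new column can be sketched with a small helper. This is only an illustration under stated assumptions: `monthly_spending` is a hypothetical function name and the three input pairs are made up, so it is not the code used on the survey itself.

```python
def monthly_spending(money_for_learning, months_programming):
    """Average spend per month, counting "0 months" of experience as 1 month."""
    months = 1 if months_programming == 0 else months_programming
    return money_for_learning / months

# Hypothetical respondents: (total money spent, months programming)
print(monthly_spending(1000, 5))  # spread over five months
print(monthly_spending(300, 0))   # just started: treated as one month
print(monthly_spending(0, 12))    # free learner
```

Without the replacement, the second call would raise a ZeroDivisionError in plain Python, and the equivalent pandas division would yield an infinite value.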
# Set new copy
spending = survey.copy()
# Replaces any instances of "zero months programming" (0) with (1) for proper calculation
spending["MonthsProgramming"] = spending["MonthsProgramming"].replace({0:1})
# Calculates monthly spending by dividing money for learning with number of months programming
spending["Monthly_spending"] = spending["MoneyForLearning"] / spending["MonthsProgramming"]
spending["Monthly_spending"].value_counts(dropna=False)
0.000000        5769
NaN             1140
16.666667        297
50.000000        264
100.000000       246
                ...
130.000000         1
80000.000000       1
76.000000          1
47.222222          1
1600.000000        1
Name: Monthly_spending, Length: 707, dtype: int64
# Total number of missing data points in monthly_spending column
spending["Monthly_spending"].isna().sum()
1140
# Drop missing data from following columns
spending = spending.dropna(subset=["CountryLive","Monthly_spending"])
# Groupby and calculate mean
avg_month = spending.groupby("CountryLive").mean(numeric_only=True)  # numeric_only skips text columns
# Shows only four countries selected below
avg_month["Monthly_spending"][["United States of America", "India","United Kingdom", "Canada"]]
CountryLive
United States of America    256.969675
India                       100.449884
United Kingdom               93.828988
Canada                      141.571630
Name: Monthly_spending, dtype: float64
# Assigns new variable for countries listed below
four_countries = spending[spending["CountryLive"].str.contains("United States of America|India|United Kingdom|Canada")]
# Plot results of outliers in USA, India, UK, and Canada
fig, ax = plt.subplots(figsize = (12, 8))
sns.boxplot(x = "CountryLive", y = "Monthly_spending", data = four_countries)
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# X ticks
plt.xticks(size = 13, color = "grey")
# X and Y labels
plt.ylabel("US dollars", loc = "top", size = 14, color = "grey")
plt.xlabel("")
# Title
plt.title("Money spent per month", loc = "left", size = 20)
plt.show()
With so many outliers in each country, it's still difficult to tell whether the data is wrong. There are far too many data points with Monthly_spending values exceeding several thousand dollars. These outliers skew the distribution of monthly spending.
Using the .value_counts() method with bins set to 20 should give a clearer picture of the distribution of Monthly_spending. With this we should be able to decide where to isolate the data further.
# Value counts method shows distribution
spending["Monthly_spending"].value_counts(bins = 20, normalize= True) * 100
(-80.001, 4000.0]     99.210412
(4000.0, 8000.0]       0.442516
(8000.0, 12000.0]      0.182213
(16000.0, 20000.0]     0.052061
(12000.0, 16000.0]     0.043384
(24000.0, 28000.0]     0.017354
(36000.0, 40000.0]     0.008677
(68000.0, 72000.0]     0.008677
(48000.0, 52000.0]     0.008677
(76000.0, 80000.0]     0.008677
(32000.0, 36000.0]     0.008677
(28000.0, 32000.0]     0.008677
(44000.0, 48000.0]     0.000000
(52000.0, 56000.0]     0.000000
(56000.0, 60000.0]     0.000000
(60000.0, 64000.0]     0.000000
(64000.0, 68000.0]     0.000000
(20000.0, 24000.0]     0.000000
(72000.0, 76000.0]     0.000000
(40000.0, 44000.0]     0.000000
Name: Monthly_spending, dtype: float64
$4,000 (US) per month is higher than even the average college tuition in the United States. Nonetheless, we'll use this amount as a cutoff for re-calculating the monthly spending of the United States, India, the UK, and Canada. This should result in slightly less skewed calculations.
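To show what a cutoff does to an average, here is a small sketch on made-up monthly spending values (the list and the `CUTOFF` name are illustrative assumptions, and only the $4,000 threshold comes from the text above). A single extreme value dominates the mean until it is filtered out.

```python
from statistics import mean

# Hypothetical monthly spending in US dollars; the last value is an extreme outlier
monthly = [0, 0, 20, 50, 100, 200, 80000]
CUTOFF = 4000  # threshold chosen in the text above

trimmed = [amount for amount in monthly if amount <= CUTOFF]

print("mean with outlier:   ", round(mean(monthly), 2))
print("mean at or below cutoff:", round(mean(trimmed), 2))
print("rows dropped:", len(monthly) - len(trimmed))
```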
four_countries
| | Age | AttendedBootcamp | CityPopulation | CommuteTime | CountryCitizen | CountryLive | EmploymentField | EmploymentStatus | Gender | HasDebt | HasFinancialDependents | HasHighSpdInternet | HasServedInMilitary | HoursLearning | Income | IsEthnicMinority | IsReceiveDisabilitiesBenefits | IsSoftwareDev | IsUnderEmployed | JobApplyWhen | JobPref | JobWherePref | LanguageAtHome | MaritalStatus | MoneyForLearning | MonthsProgramming | SchoolDegree | SchoolMajor | JobRoleInterest | ExpectedEarning | Year | Percent_Increase | Monthly_spending |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 34.0 | 0.0 | less than 100,000 | NaN | United States of America | United States of America | NaN | Not working but looking for work | male | True | False | 1.0 | 0.0 | 10.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | Within 7 to 12 months | work for a nonprofit | in an office with other developers | English | single, never married | 80.0 | 6.0 | some college credit, no degree | NaN | [Full-Stack Web Developer] | 35000.0 | 2017 | NaN | 13.333333 |
2 | 21.0 | 0.0 | more than 1 million | 15 to 29 minutes | United States of America | United States of America | software development and IT | Employed for wages | male | False | False | 1.0 | 0.0 | 25.0 | 13000.0 | 1.0 | 0.0 | 0.0 | 0.0 | Within 7 to 12 months | work for a medium-sized company | no preference | Spanish | single, never married | 1000.0 | 5.0 | high school diploma or equivalent (GED) | NaN | [ Front-End Web Developer, Back-End Web Deve... | 70000.0 | 2017 | 438.461538 | 200.000000 |
6 | 29.0 | 0.0 | between 100,000 and 1 million | 30 to 44 minutes | United Kingdom | United Kingdom | NaN | Employed for wages | female | True | False | 1.0 | 0.0 | 16.0 | 40000.0 | NaN | 0.0 | 0.0 | 0.0 | I'm already applying | work for a medium-sized company | no preference | English | married or domestic partnership | 0.0 | 12.0 | some college credit, no degree | NaN | [Full-Stack Web Developer] | 30000.0 | 2017 | -25.000000 | 0.000000 |
15 | 32.0 | 0.0 | less than 100,000 | 30 to 44 minutes | United States of America | United States of America | sales | Employed for wages | male | True | False | 1.0 | 0.0 | 1.0 | 20000.0 | 0.0 | 0.0 | 0.0 | 1.0 | more than 12 months from now | work for a nonprofit | in an office with other developers | English | single, never married | 0.0 | 1.0 | master's degree (non-professional) | English | [Full-Stack Web Developer] | 40000.0 | 2017 | 100.000000 | 0.000000 |
16 | 29.0 | 0.0 | between 100,000 and 1 million | 30 to 44 minutes | Lithuania | United States of America | finance | Employed for wages | male | False | False | 1.0 | 0.0 | 6.0 | 60000.0 | 0.0 | 0.0 | 0.0 | 0.0 | Within the next 6 months | work for a medium-sized company | in an office with other developers | English | married or domestic partnership | 200.0 | 12.0 | master's degree (non-professional) | Political Science | [Full-Stack Web Developer] | 60000.0 | 2017 | 0.000000 | 16.666667 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
15571 | 61.0 | 0.0 | less than 100,000 | 20.0 | United States of America | United States of America | farming, fishing, and forestry | Employed for wages | male | False | True | 1.0 | 0.0 | 40.0 | 60000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within 7 to 12 months | work for a medium-sized company | no preference | English | married or domestic partnership | 500.0 | 240.0 | bachelor's degree | Computer Programming | [Full-Stack Web Developer] | 80000.0 | 2016 | 33.333333 | 2.083333 |
15578 | 42.0 | 0.0 | between 100,000 and 1 million | 60.0 | United States of America | United States of America | NaN | Self-employed business owner | female | True | True | 1.0 | 0.0 | 25.0 | 60000.0 | 0.0 | 0.0 | 0.0 | 1.0 | Within 7 to 12 months | work for a medium-sized company | no preference | English | married or domestic partnership | 0.0 | 1.0 | bachelor's degree | Film and Video Studies | [Full-Stack Web Developer] | 60000.0 | 2016 | 0.000000 | 0.000000 |
15598 | 51.0 | 0.0 | less than 100,000 | 30.0 | United States of America | United States of America | finance | Employed for wages | male | True | True | 1.0 | 1.0 | 30.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | work for a medium-sized company | in an office with other developers | English | married or domestic partnership | 100.0 | 12.0 | professional degree (MBA, MD, JD, etc.) | Investments and Securities | [Full-Stack Web Developer] | 100000.0 | 2016 | -50.000000 | 8.333333 |
15600 | 38.0 | 0.0 | more than 1 million | 90.0 | United States of America | United States of America | finance | Employed for wages | male | False | True | 1.0 | 0.0 | 6.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 0.0 | more than 12 months from now | work for a startup | no preference | English | married or domestic partnership | 500.0 | 12.0 | bachelor's degree | Finance | [Full-Stack Web Developer] | 150000.0 | 2016 | -25.000000 | 41.666667 |
15615 | 28.0 | 0.0 | less than 100,000 | 7.0 | United States of America | United States of America | food and beverage | Employed for wages | male | True | True | 1.0 | 0.0 | 20.0 | 200000.0 | 0.0 | 0.0 | 0.0 | 1.0 | I'm already applying | work for a medium-sized company | from home | English | married or domestic partnership | 1400.0 | 7.0 | associate's degree | Computer and Information Systems Security | [Full-Stack Web Developer] | 50000.0 | 2016 | -75.000000 | 200.000000 |
7536 rows × 33 columns
# Money spent less than or equal to $4,000
four_countries = four_countries[four_countries["Monthly_spending"] <= 4000]
# Dataframe length
four_countries.shape
(7469, 33)
Respondents who indicated that they paid for learning will help us decide which countries to advertise in. Over 92% of survey participants from the United States, India, the United Kingdom, and Canada did not attend a programming bootcamp (an online or in-person training program that teaches the fundamentals of programming within a limited timeframe, not to be confused with our e-learning platform, which is accessible 24/7 at the customer's own pace).
Average monthly spending grouped by bootcamp attendance shows that bootcamp attendees spend far more than those who did not attend a programming bootcamp. This skews the data and may distort the results for spending by country.
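One way to check for the skew described here is to compare the mean with the median for each attendance group: a mean far above the median suggests that a few large values are pulling the average up. The sketch below is only an illustration. It uses a small made-up frame (the values are hypothetical, not survey data) that borrows the AttendedBootcamp and Monthly_spending column names.

```python
import pandas as pd

# Hypothetical toy data; these values are NOT taken from the survey
toy = pd.DataFrame({
    "AttendedBootcamp": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    "Monthly_spending": [0.0, 10.0, 20.0, 50.0, 100.0, 500.0, 3000.0],
})

# Mean and median of monthly spending for each attendance group
summary = toy.groupby("AttendedBootcamp")["Monthly_spending"].agg(["mean", "median"])

# Gap between mean and median; a large positive gap suggests right skew
summary["mean_minus_median"] = summary["mean"] - summary["median"]
print(summary)
```

The same aggregation could be run on four_countries in place of the toy frame.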
# Bootcamp attendance frequency
# 1 = Yes
# 0 = No
four_countries["AttendedBootcamp"].value_counts(normalize= True, dropna= False) * 100
0.0    92.703173
1.0     6.868389
NaN     0.428438
Name: AttendedBootcamp, dtype: float64
# Average money spent grouped by bootcamp attendance
four_countries.groupby("AttendedBootcamp")["Monthly_spending"].mean()
AttendedBootcamp
0.0     72.306148
1.0    952.885023
Name: Monthly_spending, dtype: float64
# Number of observations where individuals did not spend money
print("Free learning:", len(four_countries[four_countries["Monthly_spending"] <= 0]), "observations")
# Number of observations where individuals did spend money
print("Paid learning", len(four_countries[four_countries["Monthly_spending"] > 0]), "observations")
Free learning: 3271 observations
Paid learning 4198 observations
# Bootcamp attendance frequency among respondents that did not spend money
# Boolean masking
free = (four_countries[four_countries["Monthly_spending"] == 0])
free["AttendedBootcamp"].value_counts(dropna=False, normalize= True) * 100
0.0    98.135127
1.0     1.284011
NaN     0.580862
Name: AttendedBootcamp, dtype: float64
# Assigns variable that isolates respondents that only paid for learning
attended_bc = four_countries[four_countries["Monthly_spending"] > 0]
# Returns frequency of all paid learning based on bootcamp attendance
attended_bc["AttendedBootcamp"].value_counts(dropna=False, normalize= True) * 100
0.0    88.470700
1.0    11.219628
NaN     0.309671
Name: AttendedBootcamp, dtype: float64
# Variable assignment for boolean masking, both values must meet criteria below
# Paid learning, and bootcamp attendance is true
group_a = (four_countries["Monthly_spending"] > 0) & (four_countries["AttendedBootcamp"] == 1)
a = four_countries[group_a]
# Paid learning, and bootcamp attendance is false
group_b = (four_countries["Monthly_spending"] > 0) & (four_countries["AttendedBootcamp"] == 0)
b = four_countries[group_b]
We'll see that group a (paid learners that also attended a programming bootcamp) has a greater range of monthly spending: any amount from $1 to $4,000 is common. As programming bootcamps are generally expensive, this does not come as a surprise. The opposite is true for group b (paid learners that did not attend a bootcamp). Group b could include any other paid learning service or subscription, so most people in this group spent less than $500.
Bootcamp attendees clearly skew the data for Monthly_spending; however, we cannot rule out that they pursued other means of learning as well. Instead, we need to understand that these bootcamps are expensive, whereas other means of learning (outside of community colleges and universities) are cheaper.
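The claim that most non-bootcamp paid learners spent less than $500 can be quantified as a share per group. The following is a minimal sketch on made-up values (hypothetical, not survey data), assuming the same Monthly_spending and AttendedBootcamp column names; the $500 threshold comes from the text above.

```python
import pandas as pd

# Hypothetical toy data; these values are NOT taken from the survey
toy = pd.DataFrame({
    "AttendedBootcamp": [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "Monthly_spending": [250.0, 800.0, 1500.0, 4000.0, 5.0, 20.0, 60.0, 100.0, 900.0],
})

# Keep paid learners only, mirroring the Monthly_spending > 0 filter
paid = toy[toy["Monthly_spending"] > 0]

# Flag rows below the $500 threshold, then average the flags per group
# (the mean of a boolean column is the share of True values)
under_500 = paid["Monthly_spending"].lt(500).groupby(paid["AttendedBootcamp"]).mean() * 100
print(under_500)
```

Taking the mean of a boolean flag avoids counting rows by hand and yields a percentage for each attendance group in one step.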
We'll demonstrate that average monthly spending by country is significantly higher when only bootcamp attendees are considered, whereas the average is more reasonable when all spending habits are considered.
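Before plotting, the two sets of country averages can be placed side by side in one table. The sketch below is an illustration under assumptions: the data is made up (hypothetical, not survey data), and the column labels all_respondents and bootcamp_paid are invented for this example. It aligns both averages on the country index with pd.concat, so the comparison does not depend on the two series happening to be sorted the same way.

```python
import pandas as pd

# Hypothetical toy data; these values are NOT taken from the survey
toy = pd.DataFrame({
    "CountryLive": ["India", "India", "Canada", "Canada", "Canada"],
    "AttendedBootcamp": [0.0, 1.0, 0.0, 1.0, 1.0],
    "Monthly_spending": [10.0, 300.0, 0.0, 500.0, 700.0],
})

# Average over every respondent, free learners included
all_mean = toy.groupby("CountryLive")["Monthly_spending"].mean()

# Average over paid learners who attended a bootcamp
mask = (toy["Monthly_spending"] > 0) & (toy["AttendedBootcamp"] == 1)
bootcamp_mean = toy[mask].groupby("CountryLive")["Monthly_spending"].mean()

# Align both averages on the country index so each row compares like with like
comparison = pd.concat(
    {"all_respondents": all_mean, "bootcamp_paid": bootcamp_mean}, axis=1
)
print(comparison)
```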
# Plot results of monthly spending for group a
fig, ax = plt.subplots(figsize = (13,9))
four_countries[group_a]["Monthly_spending"].plot.hist(bins = 20, color = "grey")
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# Title
plt.title("Monthly spending of programming bootcamp attendees", loc="left", size = 20, y = 1.05)
# X and Y labels
plt.ylabel("Frequency", size = 16, color = "grey", loc = "top")
plt.xlabel("US dollars", size = 14, loc = "left", color = "grey")
# X and Y ticks
plt.xticks(size = 13)
plt.yticks(size = 14)
# Text
plt.text(1500, 80,"Respondents that reported attendance\nof a programming bootcamp spent any amount up to $4,000", color = "grey", size = 14)
plt.show()
# Plot results of monthly spending for group b
fig, ax = plt.subplots(figsize = (13,9))
four_countries[group_b]["Monthly_spending"].plot.hist(bins = 15, color = "grey")
# Remove spines
plt.gca().spines[["right","top"]].set_visible(False)
# Title
plt.title("Monthly spending (excluding bootcamp attendees)", loc="left", size = 20, y = 1.05)
# X and Y labels
plt.ylabel("Frequency", size = 16, color = "grey", loc = "top")
plt.xlabel("US dollars", size = 14, loc = "left", color = "grey")
# X and Y ticks
plt.xticks(size = 13)
plt.yticks(size = 14)
# Text
plt.text(2000,2000,"Excluding programming bootcamp attendees,\nmost participants spent less than $500", color = "grey", size = 14)
plt.show()
# Isolate rows to include only monthly spending less than or equal to $4000
spending = spending[spending["Monthly_spending"] <= 4000]
# Monthly average spending by country
avg_month = spending.groupby("CountryLive").mean(numeric_only=True)
country_spends = avg_month["Monthly_spending"][["United States of America", "India","United Kingdom", "Canada"]].sort_index(ascending=False)
# Respondents that have spent money for learning, and did attend a bootcamp
over_zero_and_bootcamp = four_countries[group_a].groupby("CountryLive")["Monthly_spending"].mean().sort_index(ascending=False)
# X labels
labels = ["United States", "United Kingdom", "India", "Canada"]
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
rects1 = ax.bar(x - width/2, country_spends, width, label="Includes Free Learners",color = "#4B86C1")
rects2 = ax.bar(x + width/2, over_zero_and_bootcamp, width, label= "Bootcamp attendees only", color = "#ED7E00")
# Subscription price
plt.axhline(59, color = "black", alpha = 0.5, label = "Subscription Price ($59)", linewidth = 2, linestyle = "--")
# Text
plt.text(-0.5,1110,"Country averages only include bootcamp\nattendees that paid for learning", size = 14, color = "grey")
# Labels and title
ax.set_ylabel("US dollars", loc = "top", size = 14, color = "grey")
ax.set_title("Average Monthly Spending", size = 20, loc = "left")
ax.set_xticks(x, labels, size = 14, color = "grey")
# Legend
ax.legend(loc = "center left")
ax.spines[["right", "left", "top", "bottom"]].set_visible(False)
# Bar labels
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
# Apply tight layout
fig.tight_layout()
plt.show()
#------------------------------------------------------------------------------------------------------------------------------------------------#
# Isolate rows to include only monthly spending less than or equal to $4000
spending = spending[spending["Monthly_spending"] <= 4000]
# Monthly average spending by country
avg_month = spending.groupby("CountryLive").mean(numeric_only=True)
country_spends = avg_month["Monthly_spending"][["United States of America", "India","United Kingdom", "Canada"]].sort_index(ascending=False)
# Respondents that have spent money for learning (regardless of bootcamp attendance)
over_zero = four_countries[four_countries["Monthly_spending"] > 0].groupby("CountryLive")["Monthly_spending"].mean().sort_index(ascending=False)
# X labels
labels = ["United States", "United Kingdom", "India", "Canada"]
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
# Plot results
fig, ax = plt.subplots(figsize = (12, 8))
rects1 = ax.bar(x - width/2, country_spends, width, label="Includes Free Learners",color = "#4B86C1")
rects2 = ax.bar(x + width/2, over_zero, width, label= "Excludes Free Learners", color = "#ED7E00")
# Subscription price
plt.axhline(59, color = "black", alpha = 0.5, label = "Subscription Price ($59)", linewidth = 2, linestyle = "--")
# Labels and title
ax.set_ylabel("US dollars", loc = "top", size = 14, color = "grey")
ax.set_title("Average Monthly Spending", size = 20, loc = "left")
ax.set_xticks(x, labels, size = 14, color = "grey")
# Legend
ax.legend(loc = "upper right")
# Remove spines
ax.spines[["right", "left", "top", "bottom"]].set_visible(False)
# Bar labels
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
# Apply tight layout
fig.tight_layout()
plt.show()
The analysis of freeCodeCamp's survey indicates that the United States should be our first choice for advertising, with India as the second choice.
We believe we have explained why people are learning and practicing a new skill like programming. The information in this survey points to a desire for upward mobility, career advancement, and higher income. We demonstrated that the gap between current and expected yearly salary was large enough to explain the decisions of survey participants. Many people indicated that they are interested in software or data science careers outside of their current one, with an expectation of increased salary.
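The salary gap mentioned above can be expressed as a percentage of the expected salary. The sketch below reuses the income and expected-earning values from the three sample rows displayed earlier in this section; the column names current_income, expected_income, and pct_gap are hypothetical, and the formula is an assumption inferred from the displayed values (-50, -25, -75), not a confirmed reproduction of the original calculation.

```python
import pandas as pd

# Values copied from the three sample rows shown earlier;
# the column names here are hypothetical
salaries = pd.DataFrame({
    "current_income": [100000.0, 150000.0, 50000.0],
    "expected_income": [200000.0, 200000.0, 200000.0],
})

# Percent gap of current income relative to expected income
# (assumed formula: negative values mean current income is below expectation)
salaries["pct_gap"] = (
    (salaries["current_income"] - salaries["expected_income"])
    / salaries["expected_income"] * 100
)
print(salaries)
```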