I used the recommended procedure shown below throughout the complete project to prevent the above Warning. It didn't always work.
"Try using .loc[row_indexer,col_indexer] = value instead"
In this guided project, we'll work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. You can find the TAFE exit survey here and the survey for the DETE here.
Some slight modifications were made to these datasets to make them easier to work with, including changing the encoding to UTF-8 (the original ones are encoded using cp1252.)
For this project, the questions posed are:
1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
2. Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
I'm going to deviate slightly from the project guidelines. The guidelines suggest that after we clean and organize the two data sets, combine them and answer the questions "across" the two institutes.
I would agree with this if I knew at this point of the analysis that the answers to the questions for each of the institutes independent of each other were close enough to be considered the same. I choose to not assume that, and will therefore answer the questions for each of the Institutes separately and compare the results. After that, I will still combine the data and analyze the results of that as well.
MY CHOSEN PROCESS:
# Import both pandas, numpy and matplotlib to cover all needed code for this project.
# Read in and save the DETE Institute *.csv data file as a DataFrame.
# Not all missing data is signified as NaN, some are as 'Not Stated'.
# Add na_values function to standardize missing data as NaN.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dete_survey = pd.read_csv('dete_survey.csv', na_values=['Not Stated'])
# print column headings and file info to get a feel for the data file structure.
print(dete_survey.columns)
print('\n')
print(dete_survey.info)
Index(['ID', 'SeparationType', 'Cease Date', 'DETE Start Date', 'Role Start Date', 'Position', 'Classification', 'Region', 'Business Unit', 'Employment Status', 'Career move to public sector', 'Career move to private sector', 'Interpersonal conflicts', 'Job dissatisfaction', 'Dissatisfaction with the department', 'Physical work environment', 'Lack of recognition', 'Lack of job security', 'Work location', 'Employment conditions', 'Maternity/family', 'Relocation', 'Study/Travel', 'Ill Health', 'Traumatic incident', 'Work life balance', 'Workload', 'None of the above', 'Professional Development', 'Opportunities for promotion', 'Staff morale', 'Workplace issue', 'Physical environment', 'Worklife balance', 'Stress and pressure support', 'Performance of supervisor', 'Peer support', 'Initiative', 'Skills', 'Coach', 'Career Aspirations', 'Feedback', 'Further PD', 'Communication', 'My say', 'Information', 'Kept informed', 'Wellness programs', 'Health & Safety', 'Gender', 'Age', 'Aboriginal', 'Torres Strait', 'South Sea', 'Disability', 'NESB'], dtype='object') <bound method DataFrame.info of ID SeparationType Cease Date DETE Start Date \ 0 1 Ill Health Retirement 08/2012 1984.0 1 2 Voluntary Early Retirement (VER) 08/2012 NaN 2 3 Voluntary Early Retirement (VER) 05/2012 2011.0 3 4 Resignation-Other reasons 05/2012 2005.0 4 5 Age Retirement 05/2012 1970.0 .. ... ... ... ... 817 819 Age Retirement 02/2014 1977.0 818 820 Age Retirement 01/2014 1980.0 819 821 Resignation-Move overseas/interstate 01/2014 2009.0 820 822 Ill Health Retirement 12/2013 2001.0 821 823 Resignation-Move overseas/interstate 12/2013 NaN Role Start Date Position \ 0 2004.0 Public Servant 1 NaN Public Servant 2 2011.0 Schools Officer 3 2006.0 Teacher 4 1989.0 Head of Curriculum/Head of Special Education .. ... ... 817 1999.0 Teacher 818 1980.0 Teacher 819 2009.0 Public Servant 820 2009.0 Teacher 821 NaN Teacher Aide Classification Region \ 0 A01-A04 Central Office 1 AO5-AO7 Central Office 2 NaN Central Office 3 Primary Central Queensland 4 NaN South East .. ... ... 817 Primary Central Queensland 818 Secondary North Coast 819 A01-A04 Central Office 820 Secondary Darling Downs South West 821 NaN Metropolitan Business Unit Employment Status ... \ 0 Corporate Strategy and Peformance Permanent Full-time ... 1 Corporate Strategy and Peformance Permanent Full-time ... 2 Education Queensland Permanent Full-time ... 3 NaN Permanent Full-time ... 4 NaN Permanent Full-time ... .. ... ... ... 817 NaN Permanent Part-time ... 818 NaN Permanent Full-time ... 819 Education Queensland Permanent Full-time ... 820 NaN Permanent Full-time ... 821 NaN NaN ... Kept informed Wellness programs Health & Safety Gender Age \ 0 N N N Male 56-60 1 N N N Male 56-60 2 N N N Male 61 or older 3 A N A Female 36-40 4 N A M Female 61 or older .. ... ... ... ... ... 817 A A SA Female 56-60 818 N N N Male 51-55 819 A N A Female 31-35 820 A N A Female 41-45 821 NaN NaN NaN NaN NaN Aboriginal Torres Strait South Sea Disability NESB 0 NaN NaN NaN NaN Yes 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN .. ... ... ... ... ... 817 NaN NaN NaN NaN NaN 818 NaN NaN NaN NaN NaN 819 NaN NaN NaN NaN NaN 820 NaN NaN NaN NaN NaN 821 NaN NaN NaN NaN NaN [822 rows x 56 columns]>
There is a lot of variation on column heading formats and many columns with information that is not relevant to answering the questions posed for this project.
# Observe volume of missing data. by column.
print(dete_survey.isnull().sum())
ID 0 SeparationType 0 Cease Date 34 DETE Start Date 73 Role Start Date 98 Position 5 Classification 367 Region 105 Business Unit 696 Employment Status 5 Career move to public sector 0 Career move to private sector 0 Interpersonal conflicts 0 Job dissatisfaction 0 Dissatisfaction with the department 0 Physical work environment 0 Lack of recognition 0 Lack of job security 0 Work location 0 Employment conditions 0 Maternity/family 0 Relocation 0 Study/Travel 0 Ill Health 0 Traumatic incident 0 Work life balance 0 Workload 0 None of the above 0 Professional Development 14 Opportunities for promotion 87 Staff morale 6 Workplace issue 34 Physical environment 5 Worklife balance 7 Stress and pressure support 12 Performance of supervisor 9 Peer support 10 Initiative 9 Skills 11 Coach 55 Career Aspirations 76 Feedback 30 Further PD 54 Communication 8 My say 10 Information 6 Kept informed 9 Wellness programs 56 Health & Safety 29 Gender 24 Age 11 Aboriginal 806 Torres Strait 819 South Sea 815 Disability 799 NESB 790 dtype: int64
# There are many columns which are not relevant to answering the questions as described in the introduction.
# Use appropriate function to drop columns 28 to 49.
labels = dete_survey.columns[28:49]
dete_survey_updated = dete_survey.drop(columns=labels, axis=1)
dete_survey_updated.info
<bound method DataFrame.info of ID SeparationType Cease Date DETE Start Date \ 0 1 Ill Health Retirement 08/2012 1984.0 1 2 Voluntary Early Retirement (VER) 08/2012 NaN 2 3 Voluntary Early Retirement (VER) 05/2012 2011.0 3 4 Resignation-Other reasons 05/2012 2005.0 4 5 Age Retirement 05/2012 1970.0 .. ... ... ... ... 817 819 Age Retirement 02/2014 1977.0 818 820 Age Retirement 01/2014 1980.0 819 821 Resignation-Move overseas/interstate 01/2014 2009.0 820 822 Ill Health Retirement 12/2013 2001.0 821 823 Resignation-Move overseas/interstate 12/2013 NaN Role Start Date Position \ 0 2004.0 Public Servant 1 NaN Public Servant 2 2011.0 Schools Officer 3 2006.0 Teacher 4 1989.0 Head of Curriculum/Head of Special Education .. ... ... 817 1999.0 Teacher 818 1980.0 Teacher 819 2009.0 Public Servant 820 2009.0 Teacher 821 NaN Teacher Aide Classification Region \ 0 A01-A04 Central Office 1 AO5-AO7 Central Office 2 NaN Central Office 3 Primary Central Queensland 4 NaN South East .. ... ... 817 Primary Central Queensland 818 Secondary North Coast 819 A01-A04 Central Office 820 Secondary Darling Downs South West 821 NaN Metropolitan Business Unit Employment Status ... \ 0 Corporate Strategy and Peformance Permanent Full-time ... 1 Corporate Strategy and Peformance Permanent Full-time ... 2 Education Queensland Permanent Full-time ... 3 NaN Permanent Full-time ... 4 NaN Permanent Full-time ... .. ... ... ... 817 NaN Permanent Part-time ... 818 NaN Permanent Full-time ... 819 Education Queensland Permanent Full-time ... 820 NaN Permanent Full-time ... 821 NaN NaN ... Work life balance Workload None of the above Gender Age \ 0 False False True Male 56-60 1 False False False Male 56-60 2 False False True Male 61 or older 3 False False False Female 36-40 4 True False False Female 61 or older .. ... ... ... ... ... 817 False True False Female 56-60 818 False False True Male 51-55 819 True False False Female 31-35 820 False False False Female 41-45 821 False False False NaN NaN Aboriginal Torres Strait South Sea Disability NESB 0 NaN NaN NaN NaN Yes 1 NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN 4 NaN NaN NaN NaN NaN .. ... ... ... ... ... 817 NaN NaN NaN NaN NaN 818 NaN NaN NaN NaN NaN 819 NaN NaN NaN NaN NaN 820 NaN NaN NaN NaN NaN 821 NaN NaN NaN NaN NaN [822 rows x 35 columns]>
I removed many columns from the DETE dataset that were not needed to answer the questions. This reduces the volume of output from the many commands in this project and makes it a little easier to manipulate the file.
# Use the _pd.str.replace_ code to make column titles simplified and consistent.
print(dete_survey_updated.columns)
print('\n')
dete_survey_updated.columns = dete_survey_updated.columns.str.replace(' ', '_').str.strip().str.upper()
print(dete_survey_updated.columns)
Index(['ID', 'SeparationType', 'Cease Date', 'DETE Start Date', 'Role Start Date', 'Position', 'Classification', 'Region', 'Business Unit', 'Employment Status', 'Career move to public sector', 'Career move to private sector', 'Interpersonal conflicts', 'Job dissatisfaction', 'Dissatisfaction with the department', 'Physical work environment', 'Lack of recognition', 'Lack of job security', 'Work location', 'Employment conditions', 'Maternity/family', 'Relocation', 'Study/Travel', 'Ill Health', 'Traumatic incident', 'Work life balance', 'Workload', 'None of the above', 'Gender', 'Age', 'Aboriginal', 'Torres Strait', 'South Sea', 'Disability', 'NESB'], dtype='object') Index(['ID', 'SEPARATIONTYPE', 'CEASE_DATE', 'DETE_START_DATE', 'ROLE_START_DATE', 'POSITION', 'CLASSIFICATION', 'REGION', 'BUSINESS_UNIT', 'EMPLOYMENT_STATUS', 'CAREER_MOVE_TO_PUBLIC_SECTOR', 'CAREER_MOVE_TO_PRIVATE_SECTOR', 'INTERPERSONAL_CONFLICTS', 'JOB_DISSATISFACTION', 'DISSATISFACTION_WITH_THE_DEPARTMENT', 'PHYSICAL_WORK_ENVIRONMENT', 'LACK_OF_RECOGNITION', 'LACK_OF_JOB_SECURITY', 'WORK_LOCATION', 'EMPLOYMENT_CONDITIONS', 'MATERNITY/FAMILY', 'RELOCATION', 'STUDY/TRAVEL', 'ILL_HEALTH', 'TRAUMATIC_INCIDENT', 'WORK_LIFE_BALANCE', 'WORKLOAD', 'NONE_OF_THE_ABOVE', 'GENDER', 'AGE', 'ABORIGINAL', 'TORRES_STRAIT', 'SOUTH_SEA', 'DISABILITY', 'NESB'], dtype='object')
I standardized the column title format with all capitals as well as removing spaces and changing the format from camelcase to snakecase. Again, it's all about simplifying and standardizing.
# Observe the various types of reasons for resigning in column ['SEPEARATIONTYPE'].
# Extract only the resignation type reasons.
print(dete_survey_updated.loc[:, 'SEPARATIONTYPE'].value_counts())
dete_resignations = dete_survey_updated[dete_survey_updated.loc[:, 'SEPARATIONTYPE'].isin(['Resignation-Other reasons','Resignation-Other employer', 'Resignation-Move overseas/interstate'])]
print('\n')
print(dete_resignations.info())
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: SEPARATIONTYPE, dtype: int64 <class 'pandas.core.frame.DataFrame'> Int64Index: 311 entries, 3 to 821 Data columns (total 35 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 ID 311 non-null int64 1 SEPARATIONTYPE 311 non-null object 2 CEASE_DATE 300 non-null object 3 DETE_START_DATE 283 non-null float64 4 ROLE_START_DATE 271 non-null float64 5 POSITION 308 non-null object 6 CLASSIFICATION 161 non-null object 7 REGION 265 non-null object 8 BUSINESS_UNIT 32 non-null object 9 EMPLOYMENT_STATUS 307 non-null object 10 CAREER_MOVE_TO_PUBLIC_SECTOR 311 non-null bool 11 CAREER_MOVE_TO_PRIVATE_SECTOR 311 non-null bool 12 INTERPERSONAL_CONFLICTS 311 non-null bool 13 JOB_DISSATISFACTION 311 non-null bool 14 DISSATISFACTION_WITH_THE_DEPARTMENT 311 non-null bool 15 PHYSICAL_WORK_ENVIRONMENT 311 non-null bool 16 LACK_OF_RECOGNITION 311 non-null bool 17 LACK_OF_JOB_SECURITY 311 non-null bool 18 WORK_LOCATION 311 non-null bool 19 EMPLOYMENT_CONDITIONS 311 non-null bool 20 MATERNITY/FAMILY 311 non-null bool 21 RELOCATION 311 non-null bool 22 STUDY/TRAVEL 311 non-null bool 23 ILL_HEALTH 311 non-null bool 24 TRAUMATIC_INCIDENT 311 non-null bool 25 WORK_LIFE_BALANCE 311 non-null bool 26 WORKLOAD 311 non-null bool 27 NONE_OF_THE_ABOVE 311 non-null bool 28 GENDER 302 non-null object 29 AGE 306 non-null object 30 ABORIGINAL 7 non-null object 31 TORRES_STRAIT 0 non-null object 32 SOUTH_SEA 3 non-null object 33 DISABILITY 8 non-null object 34 NESB 9 non-null object dtypes: bool(18), float64(2), int64(1), object(14) memory usage: 49.2+ KB None
Resigning accounts for 38% of all the reasons for leaving the DETE Institute.
# Use the _pd.str.strip & replace & get_ functions to separate year in ['CEASE_DATE'] column.
dete_resignations = dete_resignations.copy()
print(dete_resignations.loc[:, 'CEASE_DATE'].value_counts())
print('\n')
CEASE_DATE_CLEANED = dete_resignations.loc[:, 'CEASE_DATE'].str.strip().str.replace('/', '/ ').str.split().str.get(-1)
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = CEASE_DATE_CLEANED
print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].value_counts)
# Convert years from string to numeric - float in order to execute subtraction.
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].astype(float)
print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].unique())
print('\n')
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].dropna()
dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].sort_index()
print(dete_resignations.loc[:, 'CEASE_DATE_CLEANED'].value_counts())
print('\n')
dete_resignations.tail()
print(dete_resignations.loc[:, 'DETE_START_DATE'].value_counts())
# Calculate difference between start and cease dates in order to determine years of service.
INSTITUTE_SERVICE = dete_resignations.loc[:, 'CEASE_DATE_CLEANED'] - dete_resignations.loc[:, 'DETE_START_DATE']
dete_resignations.loc[:, 'INSTITUTE_SERVICE'] = INSTITUTE_SERVICE
dete_resignations.loc[:, 'INSTITUTE_SERVICE'].value_counts()
2012 126 2013 74 01/2014 22 12/2013 17 06/2013 14 09/2013 11 11/2013 9 07/2013 9 10/2013 6 08/2013 4 05/2012 2 05/2013 2 07/2012 1 2010 1 09/2010 1 07/2006 1 Name: CEASE_DATE, dtype: int64 <bound method IndexOpsMixin.value_counts of 3 2012 5 2012 8 2012 9 2012 11 2012 ... 808 2013 815 2014 816 2014 819 2014 821 2013 Name: CEASE_DATE_CLEANED, Length: 311, dtype: object> [2012. 2013. 2010. 2014. nan 2006.] 2013.0 146 2012.0 129 2014.0 22 2010.0 2 2006.0 1 Name: CEASE_DATE_CLEANED, dtype: int64 2011.0 24 2008.0 22 2007.0 21 2012.0 21 2010.0 17 2005.0 15 2004.0 14 2009.0 13 2006.0 13 2013.0 10 2000.0 9 1999.0 8 1996.0 6 2002.0 6 1992.0 6 1998.0 6 2003.0 6 1994.0 6 1993.0 5 1990.0 5 1980.0 5 1997.0 5 1991.0 4 1989.0 4 1988.0 4 1995.0 4 2001.0 3 1985.0 3 1986.0 3 1983.0 2 1976.0 2 1974.0 2 1971.0 1 1972.0 1 1984.0 1 1982.0 1 1987.0 1 1975.0 1 1973.0 1 1977.0 1 1963.0 1 Name: DETE_START_DATE, dtype: int64
5.0 23 1.0 22 3.0 20 0.0 20 6.0 17 4.0 16 9.0 14 2.0 14 7.0 13 13.0 8 8.0 8 20.0 7 15.0 7 10.0 6 22.0 6 14.0 6 17.0 6 12.0 6 16.0 5 18.0 5 23.0 4 11.0 4 24.0 4 39.0 3 19.0 3 21.0 3 32.0 3 28.0 2 26.0 2 25.0 2 30.0 2 36.0 2 29.0 1 33.0 1 42.0 1 27.0 1 41.0 1 35.0 1 38.0 1 34.0 1 49.0 1 31.0 1 Name: INSTITUTE_SERVICE, dtype: int64
# create a function and use with _applymap_ to consolidate
# True and False results into one column ['dissatisfied'].
def update_vals(element):
if element == False:
return False
elif pd.isnull(element):
return np.nan
else:
return True
columns = ['JOB_DISSATISFACTION', 'DISSATISFACTION_WITH_THE_DEPARTMENT',
'PHYSICAL_WORK_ENVIRONMENT', 'LACK_OF_RECOGNITION',
'LACK_OF_JOB_SECURITY', 'WORK_LOCATION', 'EMPLOYMENT_CONDITIONS',
'WORK_LIFE_BALANCE', 'WORKLOAD',]
# Use _applymap_ function to consolidate True and False results into one column ['dissatisfied'].
dete_resignations.loc[:, 'dissatisfied'] = dete_resignations[columns].applymap(update_vals).any(1, skipna=False)
dete_resignations_up = dete_resignations.copy()
print("False means resigned for other than being dissatisfied.")
print("True means resigned due to being dissatisfied in some particular way.")
print('\n')
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
dete_resignations_up.head()
False means resigned for other than being dissatisfied. True means resigned due to being dissatisfied in some particular way. False 162 True 149 Name: dissatisfied, dtype: int64
ID | SEPARATIONTYPE | CEASE_DATE | DETE_START_DATE | ROLE_START_DATE | POSITION | CLASSIFICATION | REGION | BUSINESS_UNIT | EMPLOYMENT_STATUS | ... | GENDER | AGE | ABORIGINAL | TORRES_STRAIT | SOUTH_SEA | DISABILITY | NESB | CEASE_DATE_CLEANED | INSTITUTE_SERVICE | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 4 | Resignation-Other reasons | 05/2012 | 2005.0 | 2006.0 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | ... | Female | 36-40 | NaN | NaN | NaN | NaN | NaN | 2012.0 | 7.0 | False |
5 | 6 | Resignation-Other reasons | 05/2012 | 1994.0 | 1997.0 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | ... | Female | 41-45 | NaN | NaN | NaN | NaN | NaN | 2012.0 | 18.0 | True |
8 | 9 | Resignation-Other reasons | 07/2012 | 2009.0 | 2009.0 | Teacher | Secondary | North Queensland | NaN | Permanent Full-time | ... | Female | 31-35 | NaN | NaN | NaN | NaN | NaN | 2012.0 | 3.0 | False |
9 | 10 | Resignation-Other employer | 2012 | 1997.0 | 2008.0 | Teacher Aide | NaN | NaN | NaN | Permanent Part-time | ... | Female | 46-50 | NaN | NaN | NaN | NaN | NaN | 2012.0 | 15.0 | True |
11 | 12 | Resignation-Move overseas/interstate | 2012 | 2009.0 | 2009.0 | Teacher | Secondary | Far North Queensland | NaN | Permanent Full-time | ... | Male | 31-35 | NaN | NaN | NaN | NaN | NaN | 2012.0 | 3.0 | False |
5 rows × 38 columns
The cease_date had many different forms in the original datafile. It had to be standardized to year only in order to calculate years of service by subtracting start_date from it.
I also extracted data associated with "dissatisfied" as a reason for resigning from many different columns and consolidated it into one new column.
# Create a new column designated as ['institute'] with every row being 'DETE'.
dete_resignations_up.loc[:, 'institute'] = 'DETE'
dete_resignations_up.columns = dete_resignations_up.columns.str.strip().str.lower()
print(dete_resignations_up.head())
dete_resignations_up.loc[:, 'institute_service'].value_counts(dropna=False)
id separationtype cease_date dete_start_date \ 3 4 Resignation-Other reasons 05/2012 2005.0 5 6 Resignation-Other reasons 05/2012 1994.0 8 9 Resignation-Other reasons 07/2012 2009.0 9 10 Resignation-Other employer 2012 1997.0 11 12 Resignation-Move overseas/interstate 2012 2009.0 role_start_date position classification region \ 3 2006.0 Teacher Primary Central Queensland 5 1997.0 Guidance Officer NaN Central Office 8 2009.0 Teacher Secondary North Queensland 9 2008.0 Teacher Aide NaN NaN 11 2009.0 Teacher Secondary Far North Queensland business_unit employment_status ... age aboriginal \ 3 NaN Permanent Full-time ... 36-40 NaN 5 Education Queensland Permanent Full-time ... 41-45 NaN 8 NaN Permanent Full-time ... 31-35 NaN 9 NaN Permanent Part-time ... 46-50 NaN 11 NaN Permanent Full-time ... 31-35 NaN torres_strait south_sea disability nesb cease_date_cleaned \ 3 NaN NaN NaN NaN 2012.0 5 NaN NaN NaN NaN 2012.0 8 NaN NaN NaN NaN 2012.0 9 NaN NaN NaN NaN 2012.0 11 NaN NaN NaN NaN 2012.0 institute_service dissatisfied institute 3 7.0 False DETE 5 18.0 True DETE 8 3.0 False DETE 9 15.0 True DETE 11 3.0 False DETE [5 rows x 39 columns]
NaN 38 5.0 23 1.0 22 3.0 20 0.0 20 6.0 17 4.0 16 9.0 14 2.0 14 7.0 13 13.0 8 8.0 8 20.0 7 15.0 7 12.0 6 22.0 6 17.0 6 10.0 6 14.0 6 16.0 5 18.0 5 24.0 4 23.0 4 11.0 4 39.0 3 32.0 3 19.0 3 21.0 3 36.0 2 30.0 2 25.0 2 28.0 2 26.0 2 29.0 1 42.0 1 38.0 1 27.0 1 41.0 1 35.0 1 49.0 1 34.0 1 33.0 1 31.0 1 Name: institute_service, dtype: int64
The purpose of creating a new column "insitute" with 'DETE' as the input was to distinguish the DETE file from the TAFE file after they are combined.
I also changed the column titles to lower case to match the TAFE column titles.
# Allocate the years of service into four categorical groups:
# New < 3 years
# Experienced 3-6 years
# Established 7-10 yers
# Veteran > 10 years
def update_vals(element):
if element < 3.0:
return 'New'
elif 3 <= element < 7:
return 'Experienced'
elif 6 < element < 11:
return 'Established'
elif element > 10:
return 'Veteran'
elif pd.isnull(element):
return np.nan
dete_resignations_up.loc[:, 'service_cat'] = dete_resignations_up.loc[:, 'institute_service'].map(update_vals)
dete_resignations_up.head()
dete_resignations_up.loc[:, 'service_cat'].value_counts(dropna=False)
Veteran 100 Experienced 76 New 56 Established 41 NaN 38 Name: service_cat, dtype: int64
Rather than making a bar graph with as many distinct bars as there are distinct numbers of years of service, I grouped the years into categorical groups covering ranges of years of service. I condensed the number of groups to four as shown above in the output.
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each years of service group.
dete_resignations_up.loc[:, 'dissatisfied'] = dete_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(149/(154+149))
print("Percent Average DETE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')
diss_dete_pct1 = dete_resignations_up.pivot_table(index='service_cat', values='dissatisfied')
print(diss_dete_pct1.info)
False 162 True 149 Name: dissatisfied, dtype: int64 Percent Average DETE Employees Dissatisfied Regardless of Years of Service = 49.17491749174918 <bound method DataFrame.info of dissatisfied service_cat Established 0.609756 Experienced 0.460526 New 0.375000 Veteran 0.560000>
# Use matplotlib dunction to plt a bar chart to provide visual
# comparison between yrs of service groups for percent employees
# resigning due to some kind of dissatisfaction.
%matplotlib inline
diss_dete_pct1.plot(kind='bar', rot=30, ylim=(0,0.75))
for i in range(4):
plt.text(i-0.14, 0.2, round(diss_dete_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: DETE')
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: DETE')
Question 1:
1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
Answers:
The answer to both questions above is YES.
For employees classified as "New" (less than 3 years of service), about 38% resigned due to some kind of dissatisfaction. For employees with more than 6 years of service (Established and Veteran), about 61% and 56% respectively resigned due to some kind of dissatisfaction. The percent for employees classified as Experienced (between 3-6 years of srvice) had 46% dissatisfied.
From the graph results shown above, it appears that the percentage of employees who resigned for "dissatisfied" reasons, increases as years of service increases. I can surmise why that might be; such as, the longer one is in service, the greater chance to see significant changes in management (good or bad), changes in moral, changes in institute direction, etc.
I will not conclude anything further at this point but will wait until analysis is complete for both institutes.
# Consolidate the employee age into five age range groups:
print(dete_resignations_up.loc[:, 'age'].value_counts(dropna=False))
print('\n')
def update_vals(element):
if element == '20 or younger':
return 'Twenties'
if element == '21-25':
return 'Twenties'
if element == '26-30':
return 'Twenties'
if element == '31-35':
return 'Thirties'
if element == '36-40':
return 'Thirties'
if element == '41-45':
return 'Forties'
if element == '46-50':
return 'Forties'
if element == '51-55':
return 'Fifties'
if element == '56-60':
return 'Fifties'
if element == '61 or older':
return 'Above Sixty'
elif pd.isnull(element):
return np.nan
dete_resignations_up.loc[:, 'age_groups'] = dete_resignations_up.loc[:, 'age'].map(update_vals)
dete_resignations_up.head()
dete_resignations_up.loc[:, 'age_groups'].value_counts(dropna=False)
41-45 48 46-50 42 36-40 41 26-30 35 51-55 32 21-25 29 31-35 29 56-60 26 61 or older 23 NaN 5 20 or younger 1 Name: age, dtype: int64
Forties 90 Thirties 70 Twenties 65 Fifties 58 Above Sixty 23 NaN 5 Name: age_groups, dtype: int64
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each age group.
dete_resignations_up.loc[:, 'dissatisfied'] = dete_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(dete_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(149/(154+149))
print("Percent Average DETE Employees Dissatisfied Regardless Their Age =", Percent_Average)
print('\n')
diss_dete_pct2 = dete_resignations_up.pivot_table(index='age_groups', values='dissatisfied')
print(diss_dete_pct2.info)
False 162 True 149 Name: dissatisfied, dtype: int64 Percent Average DETE Employees Dissatisfied Regardless Their Age = 49.17491749174918 <bound method DataFrame.info of dissatisfied age_groups Above Sixty 0.521739 Fifties 0.586207 Forties 0.466667 Thirties 0.457143 Twenties 0.446154>
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.
%matplotlib inline
diss_dete_pct2.plot(kind='bar', rot=30, ylim=(0,0.75))
for i in range(5):
plt.text(i-0.22, 0.25, round(diss_dete_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: DETE')
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: DETE')
Question 2:
Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
Answers:
The answer to both questions above is YES.
There's very little difference in percent dissatisfied for employees in there twenties, thirties and forties: between 45% to 47%. There's a significant increase in percent for employees in their fifties: 59%, followed by a drop off for employees over 60: 52%.
The total range of percentage among the age groups is 14, which is much less than the total range for years of service groups: 23.5.
I will not write observations after each series of executions for the TAFE file in that they would basically be the same as the ones I wrote for the DETE file above.
# read in and save the TAFE Institution *.csv data files as DataFrame.
# not all missing data is signified as NaN, some are as 'Not Stated'.
# add na_values function to standardize missing data as NaN.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
tafe_survey = pd.read_csv('tafe_survey.csv', na_values=['Not Stated'])
print('\n')
print(tafe_survey.columns)
print('\n')
print(tafe_survey.head())
print('\n')
print(tafe_survey.info)
Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR', 'Reason for ceasing employment', 'Contributing Factors. Career Move - Public Sector ', 'Contributing Factors. Career Move - Private Sector ', 'Contributing Factors. Career Move - Self-employment', 'Contributing Factors. Ill Health', 'Contributing Factors. Maternity/Family', 'Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction', 'Contributing Factors. Interpersonal Conflict', 'Contributing Factors. Study', 'Contributing Factors. Travel', 'Contributing Factors. Other', 'Contributing Factors. NONE', 'Main Factor. Which of these was the main factor for leaving?', 'InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction', 'InstituteViews. Topic:2. I was given access to skills training to help me do my job better', 'InstituteViews. Topic:3. I was given adequate opportunities for personal development', 'InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL%', 'InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had', 'InstituteViews. Topic:6. The organisation recognised when staff did good work', 'InstituteViews. Topic:7. Management was generally supportive of me', 'InstituteViews. Topic:8. Management was generally supportive of my team', 'InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me', 'InstituteViews. Topic:10. Staff morale was positive within the Institute', 'InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly', 'InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently', 'InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly', 'WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit', 'WorkUnitViews. Topic:15. I worked well with my colleagues', 'WorkUnitViews. Topic:16. My job was challenging and interesting', 'WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work', 'WorkUnitViews. Topic:18. I had sufficient contact with other people in my job', 'WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job', 'WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job', 'WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT]', 'WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job', 'WorkUnitViews. Topic:23. My job provided sufficient variety', 'WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job', 'WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction', 'WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance', 'WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area', 'WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date', 'WorkUnitViews. Topic:29. There was adequate communication between staff in my unit', 'WorkUnitViews. Topic:30. Staff morale was positive within my work unit', 'Induction. Did you undertake Workplace Induction?', 'InductionInfo. Topic:Did you undertake a Corporate Induction?', 'InductionInfo. Topic:Did you undertake a Institute Induction?', 'InductionInfo. Topic: Did you undertake Team Induction?', 'InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted?', 'InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted?', 'InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction?', 'InductionInfo. Face to Face Topic:Did you undertake a Institute Induction?', 'InductionInfo. On-line Topic:Did you undertake a Institute Induction?', 'InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction?', 'InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category?', 'InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.]', 'InductionInfo. Induction Manual Topic: Did you undertake Team Induction?', 'Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)?', 'Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination?', 'Workplace. Topic:Does your workplace promote and practice the principles of employment equity?', 'Workplace. Topic:Does your workplace value the diversity of its employees?', 'Workplace. Topic:Would you recommend the Institute as an employer to others?', 'Gender. What is your Gender?', 'CurrentAge. Current Age', 'Employment Type. Employment Type', 'Classification. Classification', 'LengthofServiceOverall. Overall Length of Service at Institute (in years)', 'LengthofServiceCurrent. Length of Service at current workplace (in years)'], dtype='object') Record ID Institute \ 0 6.341330e+17 Southern Queensland Institute of TAFE 1 6.341337e+17 Mount Isa Institute of TAFE 2 6.341388e+17 Mount Isa Institute of TAFE 3 6.341399e+17 Mount Isa Institute of TAFE 4 6.341466e+17 Southern Queensland Institute of TAFE WorkArea CESSATION YEAR Reason for ceasing employment \ 0 Non-Delivery (corporate) 2010.0 Contract Expired 1 Non-Delivery (corporate) 2010.0 Retirement 2 Delivery (teaching) 2010.0 Retirement 3 Non-Delivery (corporate) 2010.0 Resignation 4 Delivery (teaching) 2010.0 Resignation Contributing Factors. Career Move - Public Sector \ 0 NaN 1 - 2 - 3 - 4 - Contributing Factors. Career Move - Private Sector \ 0 NaN 1 - 2 - 3 - 4 Career Move - Private Sector Contributing Factors. Career Move - Self-employment \ 0 NaN 1 - 2 - 3 - 4 - Contributing Factors. Ill Health Contributing Factors. Maternity/Family \ 0 NaN NaN 1 - - 2 - - 3 - - 4 - - ... \ 0 ... 1 ... 2 ... 3 ... 4 ... Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes Workplace. Topic:Does your workplace promote and practice the principles of employment equity? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes Workplace. Topic:Does your workplace value the diversity of its employees? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes Workplace. Topic:Would you recommend the Institute as an employer to others? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes Gender. What is your Gender? CurrentAge. Current Age \ 0 Female 26 30 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Male 41 45 Employment Type. Employment Type Classification. Classification \ 0 Temporary Full-time Administration (AO) 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Permanent Full-time Teacher (including LVT) LengthofServiceOverall. Overall Length of Service at Institute (in years) \ 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 LengthofServiceCurrent. Length of Service at current workplace (in years) 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 [5 rows x 72 columns] <bound method DataFrame.info of Record ID Institute \ 0 6.341330e+17 Southern Queensland Institute of TAFE 1 6.341337e+17 Mount Isa Institute of TAFE 2 6.341388e+17 Mount Isa Institute of TAFE 3 6.341399e+17 Mount Isa Institute of TAFE 4 6.341466e+17 Southern Queensland Institute of TAFE .. ... ... 697 6.350668e+17 Barrier Reef Institute of TAFE 698 6.350677e+17 Southern Queensland Institute of TAFE 699 6.350704e+17 Tropical North Institute of TAFE 700 6.350712e+17 Southbank Institute of Technology 701 6.350730e+17 Tropical North Institute of TAFE WorkArea CESSATION YEAR Reason for ceasing employment \ 0 Non-Delivery (corporate) 2010.0 Contract Expired 1 Non-Delivery (corporate) 2010.0 Retirement 2 Delivery (teaching) 2010.0 Retirement 3 Non-Delivery (corporate) 2010.0 Resignation 4 Delivery (teaching) 2010.0 Resignation .. ... ... ... 697 Delivery (teaching) 2013.0 Resignation 698 Non-Delivery (corporate) 2013.0 Resignation 699 Delivery (teaching) 2013.0 Resignation 700 Non-Delivery (corporate) 2013.0 Contract Expired 701 Non-Delivery (corporate) 2013.0 Resignation Contributing Factors. Career Move - Public Sector \ 0 NaN 1 - 2 - 3 - 4 - .. ... 697 Career Move - Public Sector 698 Career Move - Public Sector 699 - 700 NaN 701 - Contributing Factors. Career Move - Private Sector \ 0 NaN 1 - 2 - 3 - 4 Career Move - Private Sector .. ... 697 - 698 - 699 - 700 NaN 701 - Contributing Factors. Career Move - Self-employment \ 0 NaN 1 - 2 - 3 - 4 - .. ... 697 - 698 - 699 - 700 NaN 701 Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family \ 0 NaN NaN 1 - - 2 - - 3 - - 4 - - .. ... ... 697 - - 698 - - 699 - - 700 NaN NaN 701 - - ... \ 0 ... 1 ... 2 ... 3 ... 4 ... .. ... 697 ... 698 ... 699 ... 700 ... 701 ... Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes .. ... 697 Yes 698 NaN 699 Yes 700 No 701 Yes Workplace. Topic:Does your workplace promote and practice the principles of employment equity? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes .. ... 697 Yes 698 NaN 699 Yes 700 No 701 Yes Workplace. Topic:Does your workplace value the diversity of its employees? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes .. ... 697 Yes 698 NaN 699 Yes 700 Yes 701 Yes Workplace. Topic:Would you recommend the Institute as an employer to others? \ 0 Yes 1 Yes 2 Yes 3 Yes 4 Yes .. ... 697 Yes 698 NaN 699 Yes 700 No 701 Yes Gender. What is your Gender? CurrentAge. Current Age \ 0 Female 26 30 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Male 41 45 .. ... ... 697 Male 51-55 698 NaN NaN 699 Female 51-55 700 Female 41 45 701 Female 26 30 Employment Type. Employment Type Classification. Classification \ 0 Temporary Full-time Administration (AO) 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Permanent Full-time Teacher (including LVT) .. ... ... 697 Temporary Full-time Teacher (including LVT) 698 NaN NaN 699 Permanent Full-time Teacher (including LVT) 700 Temporary Full-time Professional Officer (PO) 701 Contract/casual Administration (AO) LengthofServiceOverall. Overall Length of Service at Institute (in years) \ 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 .. ... 697 1-2 698 NaN 699 5-6 700 1-2 701 3-4 LengthofServiceCurrent. Length of Service at current workplace (in years) 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 .. ... 697 1-2 698 NaN 699 1-2 700 1-2 701 1-2 [702 rows x 72 columns]>
# Observe volume of missing data by column.
print(tafe_survey.isnull().sum())
Record ID 0 Institute 0 WorkArea 0 CESSATION YEAR 7 Reason for ceasing employment 1 ... CurrentAge. Current Age 106 Employment Type. Employment Type 106 Classification. Classification 106 LengthofServiceOverall. Overall Length of Service at Institute (in years) 106 LengthofServiceCurrent. Length of Service at current workplace (in years) 106 Length: 72, dtype: int64
# There are many columns which are not relevant to answering the research questions as described in the introduction.
# Use appropriate function to drop the following columns.
labels = tafe_survey.columns[17:66]
tafe_survey_updated = tafe_survey.drop(columns=labels, axis=1)
print(tafe_survey_updated.info)
print('\n')
print(tafe_survey_updated.columns)
<bound method DataFrame.info of Record ID Institute \ 0 6.341330e+17 Southern Queensland Institute of TAFE 1 6.341337e+17 Mount Isa Institute of TAFE 2 6.341388e+17 Mount Isa Institute of TAFE 3 6.341399e+17 Mount Isa Institute of TAFE 4 6.341466e+17 Southern Queensland Institute of TAFE .. ... ... 697 6.350668e+17 Barrier Reef Institute of TAFE 698 6.350677e+17 Southern Queensland Institute of TAFE 699 6.350704e+17 Tropical North Institute of TAFE 700 6.350712e+17 Southbank Institute of Technology 701 6.350730e+17 Tropical North Institute of TAFE WorkArea CESSATION YEAR Reason for ceasing employment \ 0 Non-Delivery (corporate) 2010.0 Contract Expired 1 Non-Delivery (corporate) 2010.0 Retirement 2 Delivery (teaching) 2010.0 Retirement 3 Non-Delivery (corporate) 2010.0 Resignation 4 Delivery (teaching) 2010.0 Resignation .. ... ... ... 697 Delivery (teaching) 2013.0 Resignation 698 Non-Delivery (corporate) 2013.0 Resignation 699 Delivery (teaching) 2013.0 Resignation 700 Non-Delivery (corporate) 2013.0 Contract Expired 701 Non-Delivery (corporate) 2013.0 Resignation Contributing Factors. Career Move - Public Sector \ 0 NaN 1 - 2 - 3 - 4 - .. ... 697 Career Move - Public Sector 698 Career Move - Public Sector 699 - 700 NaN 701 - Contributing Factors. Career Move - Private Sector \ 0 NaN 1 - 2 - 3 - 4 Career Move - Private Sector .. ... 697 - 698 - 699 - 700 NaN 701 - Contributing Factors. Career Move - Self-employment \ 0 NaN 1 - 2 - 3 - 4 - .. ... 697 - 698 - 699 - 700 NaN 701 Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family \ 0 NaN NaN 1 - - 2 - - 3 - - 4 - - .. ... ... 697 - - 698 - - 699 - - 700 NaN NaN 701 - - ... Contributing Factors. Study Contributing Factors. Travel \ 0 ... NaN NaN 1 ... - Travel 2 ... - - 3 ... - Travel 4 ... - - .. ... ... ... 697 ... - - 698 ... - - 699 ... - - 700 ... NaN NaN 701 ... - Travel Contributing Factors. Other Contributing Factors. NONE \ 0 NaN NaN 1 - - 2 - NONE 3 - - 4 - - .. ... ... 697 - - 698 - - 699 Other - 700 NaN NaN 701 - - Gender. What is your Gender? CurrentAge. Current Age \ 0 Female 26 30 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Male 41 45 .. ... ... 697 Male 51-55 698 NaN NaN 699 Female 51-55 700 Female 41 45 701 Female 26 30 Employment Type. Employment Type Classification. Classification \ 0 Temporary Full-time Administration (AO) 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 Permanent Full-time Teacher (including LVT) .. ... ... 697 Temporary Full-time Teacher (including LVT) 698 NaN NaN 699 Permanent Full-time Teacher (including LVT) 700 Temporary Full-time Professional Officer (PO) 701 Contract/casual Administration (AO) LengthofServiceOverall. Overall Length of Service at Institute (in years) \ 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 .. ... 697 1-2 698 NaN 699 5-6 700 1-2 701 3-4 LengthofServiceCurrent. Length of Service at current workplace (in years) 0 1-2 1 NaN 2 NaN 3 NaN 4 3-4 .. ... 697 1-2 698 NaN 699 1-2 700 1-2 701 1-2 [702 rows x 23 columns]> Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR', 'Reason for ceasing employment', 'Contributing Factors. Career Move - Public Sector ', 'Contributing Factors. Career Move - Private Sector ', 'Contributing Factors. Career Move - Self-employment', 'Contributing Factors. Ill Health', 'Contributing Factors. Maternity/Family', 'Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction', 'Contributing Factors. Interpersonal Conflict', 'Contributing Factors. Study', 'Contributing Factors. Travel', 'Contributing Factors. Other', 'Contributing Factors. NONE', 'Gender. What is your Gender?', 'CurrentAge. Current Age', 'Employment Type. Employment Type', 'Classification. Classification', 'LengthofServiceOverall. Overall Length of Service at Institute (in years)', 'LengthofServiceCurrent. Length of Service at current workplace (in years)'], dtype='object')
# Use the _mapping_ code to make column titles simplified and consistent.
print(tafe_survey_updated.columns)
print('\n')
mapping = {'Record ID': 'id', 'CESSATION YEAR': 'cease_date', 'Reason for ceasing employment': 'separationtype',
'Gender. What is your Gender?': 'gender', 'CurrentAge. Current Age': 'age',
'Employment Type. Employment Type': 'employment_status', 'Classification. Classification': 'position',
'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service' }
tafe_survey_updated = tafe_survey_updated.rename(mapping, axis=1)
print(tafe_survey_updated.columns)
Index(['Record ID', 'Institute', 'WorkArea', 'CESSATION YEAR', 'Reason for ceasing employment', 'Contributing Factors. Career Move - Public Sector ', 'Contributing Factors. Career Move - Private Sector ', 'Contributing Factors. Career Move - Self-employment', 'Contributing Factors. Ill Health', 'Contributing Factors. Maternity/Family', 'Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction', 'Contributing Factors. Interpersonal Conflict', 'Contributing Factors. Study', 'Contributing Factors. Travel', 'Contributing Factors. Other', 'Contributing Factors. NONE', 'Gender. What is your Gender?', 'CurrentAge. Current Age', 'Employment Type. Employment Type', 'Classification. Classification', 'LengthofServiceOverall. Overall Length of Service at Institute (in years)', 'LengthofServiceCurrent. Length of Service at current workplace (in years)'], dtype='object') Index(['id', 'Institute', 'WorkArea', 'cease_date', 'separationtype', 'Contributing Factors. Career Move - Public Sector ', 'Contributing Factors. Career Move - Private Sector ', 'Contributing Factors. Career Move - Self-employment', 'Contributing Factors. Ill Health', 'Contributing Factors. Maternity/Family', 'Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction', 'Contributing Factors. Interpersonal Conflict', 'Contributing Factors. Study', 'Contributing Factors. Travel', 'Contributing Factors. Other', 'Contributing Factors. NONE', 'gender', 'age', 'employment_status', 'position', 'institute_service', 'role_service'], dtype='object')
# Observe the various types of reasons for resigning in column ['SEPEARATIONTYPE'].
# Extract only the resignation type reasons.
print(tafe_survey_updated.loc[:, 'separationtype'].value_counts())
tafe_resignations = tafe_survey_updated[tafe_survey_updated.loc[:, 'separationtype'] == 'Resignation']
tafe_resignations.copy()
print(tafe_resignations)
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 Name: separationtype, dtype: int64 id Institute \ 3 6.341399e+17 Mount Isa Institute of TAFE 4 6.341466e+17 Southern Queensland Institute of TAFE 5 6.341475e+17 Southern Queensland Institute of TAFE 6 6.341520e+17 Barrier Reef Institute of TAFE 7 6.341537e+17 Southern Queensland Institute of TAFE .. ... ... 696 6.350660e+17 Southern Queensland Institute of TAFE 697 6.350668e+17 Barrier Reef Institute of TAFE 698 6.350677e+17 Southern Queensland Institute of TAFE 699 6.350704e+17 Tropical North Institute of TAFE 701 6.350730e+17 Tropical North Institute of TAFE WorkArea cease_date separationtype \ 3 Non-Delivery (corporate) 2010.0 Resignation 4 Delivery (teaching) 2010.0 Resignation 5 Delivery (teaching) 2010.0 Resignation 6 Non-Delivery (corporate) 2010.0 Resignation 7 Delivery (teaching) 2010.0 Resignation .. ... ... ... 696 Non-Delivery (corporate) 2013.0 Resignation 697 Delivery (teaching) 2013.0 Resignation 698 Non-Delivery (corporate) 2013.0 Resignation 699 Delivery (teaching) 2013.0 Resignation 701 Non-Delivery (corporate) 2013.0 Resignation Contributing Factors. Career Move - Public Sector \ 3 - 4 - 5 - 6 - 7 - .. ... 696 - 697 Career Move - Public Sector 698 Career Move - Public Sector 699 - 701 - Contributing Factors. Career Move - Private Sector \ 3 - 4 Career Move - Private Sector 5 - 6 Career Move - Private Sector 7 - .. ... 696 Career Move - Private Sector 697 - 698 - 699 - 701 - Contributing Factors. Career Move - Self-employment \ 3 - 4 - 5 - 6 - 7 - .. ... 696 - 697 - 698 - 699 - 701 Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family \ 3 - - 4 - - 5 - - 6 - Maternity/Family 7 - - .. ... ... 696 - - 697 - - 698 - - 699 - - 701 - - ... Contributing Factors. Study Contributing Factors. Travel \ 3 ... - Travel 4 ... - - 5 ... - - 6 ... - - 7 ... - - .. ... ... ... 696 ... - - 697 ... - - 698 ... - - 699 ... - - 701 ... - Travel Contributing Factors. Other Contributing Factors. NONE gender \ 3 - - NaN 4 - - Male 5 Other - Female 6 Other - Male 7 Other - Male .. ... ... ... 696 - - Male 697 - - Male 698 - - NaN 699 Other - Female 701 - - Female age employment_status position \ 3 NaN NaN NaN 4 41 45 Permanent Full-time Teacher (including LVT) 5 56 or older Contract/casual Teacher (including LVT) 6 20 or younger Temporary Full-time Administration (AO) 7 46 50 Permanent Full-time Teacher (including LVT) .. ... ... ... 696 21 25 Temporary Full-time Operational (OO) 697 51-55 Temporary Full-time Teacher (including LVT) 698 NaN NaN NaN 699 51-55 Permanent Full-time Teacher (including LVT) 701 26 30 Contract/casual Administration (AO) institute_service role_service 3 NaN NaN 4 3-4 3-4 5 7-10 7-10 6 3-4 3-4 7 3-4 3-4 .. ... ... 696 5-6 5-6 697 1-2 1-2 698 NaN NaN 699 5-6 1-2 701 3-4 1-2 [340 rows x 23 columns]
# Create a function and use with _applymap_ to consolidate
# True and False results into one column ['dissatisfied'].
def update_vals(element):
if element == '-':
return False
elif pd.isnull(element):
return np.nan
else:
return True
# Use _applymap_ function to consolidate True and False results into one column ['dissatisfied'].
tafe_resignations.loc[:, 'dissatisfied'] = tafe_resignations.loc[:, ('Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction')].applymap(update_vals).any(1, skipna=False)
tafe_resignations_up = tafe_resignations
print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
False 241 True 91 NaN 8 Name: dissatisfied, dtype: int64
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:845: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy /dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
# Create a new column designated as ['institute'] with every row being 'TAFE'.
tafe_resignations_up.loc[:, 'institute'] = 'TAFE'
tafe_resignations_up.head()
tafe_resignations_up.loc[:, 'institute_service'].value_counts(dropna=False)
Less than 1 year 73 1-2 64 3-4 63 NaN 50 5-6 33 11-20 26 7-10 21 More than 20 years 10 Name: institute_service, dtype: int64
# Allocate the years of service into four categorical groups:
# New < 3 years
# Experienced 3-6 years
# Established 7-10 yers
# Veteran > 10 years
def update_vals(element):
if element == 'Less than 1 year':
return 'New'
elif element == '1-2':
return 'New'
elif element == '3-4':
return 'Experienced'
elif element == '5-6':
return 'Experienced'
elif element == '7-10':
return 'Established'
elif element == '11-20':
return 'Veteran'
elif element == 'More than 20 years':
return 'Veteran'
elif pd.isnull(element):
return np.nan
tafe_resignations_up.loc[:, 'service_cat'] = tafe_resignations_up.loc[:, 'institute_service'].map(update_vals)
tafe_resignations_up.head()
tafe_resignations_up.loc[:, 'service_cat'].value_counts(dropna=False)
New 137 Experienced 96 NaN 50 Veteran 36 Established 21 Name: service_cat, dtype: int64
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each years of service group.
tafe_resignations_up.loc[:, 'dissatisfied'] = tafe_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(91/(241+91))
print("Percent Average TAFE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')
diss_tafe_pct1 = tafe_resignations_up.pivot_table(index='service_cat', values='dissatisfied')
print(diss_tafe_pct1.info)
False 249 True 91 Name: dissatisfied, dtype: int64 Percent Average TAFE Employees Dissatisfied Regardless of Years of Service = 27.40963855421687 <bound method DataFrame.info of dissatisfied service_cat Established 0.333333 Experienced 0.250000 New 0.262774 Veteran 0.277778>
# Use matplotlib dunction to plot a bar chart to provide visual
# comparison between yrs of service groups for percent employees
# resigning due to some kind of dissatisfaction.
%matplotlib inline
diss_tafe_pct1.plot(kind='bar', rot=30, ylim=(0,0.40))
for i in range(4):
plt.text(i-0.16, 0.15, round(diss_tafe_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: TAFE')
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: TAFE')
Question 1:
1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
Answers:
The answer to both questions above is YES.
For employees classified as "New" (less than 3 years of service), about 26% resigned due to some kind of dissatisfaction, which is very close to Experienced and Veteran groups. The Established group ws slightly higher at 33%.
The overall range of percentages for the TAFE Institute being 8.3 is much lower than that of the DETE Institute: 23.5. Also the overall average across all years of service groups for TAFE is 27% which is much lower than that for DETE: 49%.
To see these significant differences between the two institutes tells me it was appopriate to evaluate the results for each institute separately first before combining the data and analyzing the grouped results. We shall see if that masks the signficant differences between the two institutes.
I will not conclude anything further at this point but wait until analysis is complete for both institutes.
# Observe the data stucture in the age column to determine how to organize into groups.
tafe_resignations_up.loc[:, 'age'].value_counts(dropna=False)
NaN 50 41 45 45 51-55 39 46 50 39 21 25 33 31 35 32 36 40 32 26 30 32 56 or older 29 20 or younger 9 Name: age, dtype: int64
# Use the _pd_str.strip and replace_ code to standardize the age group format:
tafe_resignations_up.copy()
tafe_resignations_up.loc[:, 'age_cleaned'] = tafe_resignations_up.loc[:, 'age'].str.strip().str.replace(' ', '-')
tafe_resignations_up.loc[:, 'age_cleaned'].value_counts(dropna=False)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:845: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy /dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
NaN 50 41-45 45 51-55 39 46-50 39 21-25 33 36-40 32 31-35 32 26-30 32 56 or older 29 20 or younger 9 Name: age_cleaned, dtype: int64
# Consolidate the employee age into five age range groups:
def update_vals(element):
if element == '20 or younger':
return 'Twenties'
if element == '21-25':
return 'Twenties'
if element == '26-30':
return 'Twenties'
if element == '31-35':
return 'Thirties'
if element == '36-40':
return 'Thirties'
if element == '41-45':
return 'Forties'
if element == '46-50':
return 'Forties'
if element == '51-55':
return 'Fifties'
if element == '56 or older':
return 'Above 55'
elif pd.isnull(element):
return np.nan
tafe_resignations_up.loc[:, 'age_groups'] = tafe_resignations_up.loc[:, 'age_cleaned'].map(update_vals)
tafe_resignations_up.head()
tafe_resignations_up.loc[:, 'age_groups'].value_counts(dropna=False)
Forties 84 Twenties 74 Thirties 64 NaN 50 Fifties 39 Above 55 29 Name: age_groups, dtype: int64
# Use _pivot_table_ to determine percentage of employees resigning
# due to dissatisfaction for each age group.
tafe_resignations_up.loc[:, 'dissatisfied'] = tafe_resignations_up.loc[:, 'dissatisfied'].fillna(False)
print(tafe_resignations_up.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(91/(241+91))
print("Percent Average TAFE Employees Dissatisfied Regardless of Years of Service =", Percent_Average)
print('\n')
diss_tafe_pct2 = tafe_resignations_up.pivot_table(index='age_groups', values='dissatisfied')
print(diss_tafe_pct2.info)
False 249 True 91 Name: dissatisfied, dtype: int64 Percent Average TAFE Employees Dissatisfied Regardless of Years of Service = 27.40963855421687 <bound method DataFrame.info of dissatisfied age_groups Above 55 0.206897 Fifties 0.282051 Forties 0.285714 Thirties 0.250000 Twenties 0.270270>
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.
%matplotlib inline
import matplotlib.pyplot as plt
diss_tafe_pct2.plot(kind='bar', rot=30, ylim=(0,0.33))
for i in range(5):
plt.text(i-0.22, 0.12, round(diss_tafe_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: TAFE')
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: TAFE')
Question 2:
Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
Answers:
The answer to both questions above is YES.
There's very little difference in percent dissatisfied for employees in there twenties, thirties, forties and fifties: between 25% to 29%. There's a slight decrease in percent for employees over 55: 21%.
The overall range of percentages for the TAFE Institute being 7.9 is much lower than that of the DETE Institute: 14. Also the overall average across all employee age groups for TAFE is 27% which is much lower than that for DETE: 49%.
# USE concatenate to combine the two data sets.
combined = pd.concat([dete_resignations_up, tafe_resignations_up], axis=0, ignore_index=True)
combined.head()
print(combined.info())
print('\n')
print(combined.loc[:, 'service_cat'].value_counts(dropna=False))
print('\n')
print(combined.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
print(combined.notnull().sum().sort_values())
print('\n')
<class 'pandas.core.frame.DataFrame'> RangeIndex: 651 entries, 0 to 650 Data columns (total 57 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 651 non-null float64 1 separationtype 651 non-null object 2 cease_date 635 non-null object 3 dete_start_date 283 non-null float64 4 role_start_date 271 non-null float64 5 position 598 non-null object 6 classification 161 non-null object 7 region 265 non-null object 8 business_unit 32 non-null object 9 employment_status 597 non-null object 10 career_move_to_public_sector 311 non-null object 11 career_move_to_private_sector 311 non-null object 12 interpersonal_conflicts 311 non-null object 13 job_dissatisfaction 311 non-null object 14 dissatisfaction_with_the_department 311 non-null object 15 physical_work_environment 311 non-null object 16 lack_of_recognition 311 non-null object 17 lack_of_job_security 311 non-null object 18 work_location 311 non-null object 19 employment_conditions 311 non-null object 20 maternity/family 311 non-null object 21 relocation 311 non-null object 22 study/travel 311 non-null object 23 ill_health 311 non-null object 24 traumatic_incident 311 non-null object 25 work_life_balance 311 non-null object 26 workload 311 non-null object 27 none_of_the_above 311 non-null object 28 gender 592 non-null object 29 age 596 non-null object 30 aboriginal 7 non-null object 31 torres_strait 0 non-null object 32 south_sea 3 non-null object 33 disability 8 non-null object 34 nesb 9 non-null object 35 cease_date_cleaned 300 non-null float64 36 institute_service 563 non-null object 37 dissatisfied 651 non-null bool 38 institute 651 non-null object 39 service_cat 563 non-null object 40 age_groups 596 non-null object 41 Institute 340 non-null object 42 WorkArea 340 non-null object 43 Contributing Factors. Career Move - Public Sector 332 non-null object 44 Contributing Factors. Career Move - Private Sector 332 non-null object 45 Contributing Factors. Career Move - Self-employment 332 non-null object 46 Contributing Factors. Ill Health 332 non-null object 47 Contributing Factors. Maternity/Family 332 non-null object 48 Contributing Factors. Dissatisfaction 332 non-null object 49 Contributing Factors. Job Dissatisfaction 332 non-null object 50 Contributing Factors. Interpersonal Conflict 332 non-null object 51 Contributing Factors. Study 332 non-null object 52 Contributing Factors. Travel 332 non-null object 53 Contributing Factors. Other 332 non-null object 54 Contributing Factors. NONE 332 non-null object 55 role_service 290 non-null object 56 age_cleaned 290 non-null object dtypes: bool(1), float64(4), object(52) memory usage: 285.6+ KB None New 193 Experienced 172 Veteran 136 NaN 88 Established 62 Name: service_cat, dtype: int64 False 411 True 240 Name: dissatisfied, dtype: int64 torres_strait 0 south_sea 3 aboriginal 7 disability 8 nesb 9 business_unit 32 classification 161 region 265 role_start_date 271 dete_start_date 283 role_service 290 age_cleaned 290 cease_date_cleaned 300 none_of_the_above 311 workload 311 work_life_balance 311 traumatic_incident 311 study/travel 311 relocation 311 maternity/family 311 ill_health 311 work_location 311 lack_of_job_security 311 lack_of_recognition 311 physical_work_environment 311 dissatisfaction_with_the_department 311 job_dissatisfaction 311 interpersonal_conflicts 311 career_move_to_private_sector 311 career_move_to_public_sector 311 employment_conditions 311 Contributing Factors. Career Move - Private Sector 332 Contributing Factors. Career Move - Public Sector 332 Contributing Factors. Maternity/Family 332 Contributing Factors. Dissatisfaction 332 Contributing Factors. Job Dissatisfaction 332 Contributing Factors. Travel 332 Contributing Factors. NONE 332 Contributing Factors. Other 332 Contributing Factors. Study 332 Contributing Factors. Interpersonal Conflict 332 Contributing Factors. Career Move - Self-employment 332 Contributing Factors. Ill Health 332 Institute 340 WorkArea 340 service_cat 563 institute_service 563 gender 592 age 596 age_groups 596 employment_status 597 position 598 cease_date 635 institute 651 dissatisfied 651 separationtype 651 id 651 dtype: int64
print(combined.loc[:, 'dissatisfied'].value_counts(dropna=False))
print('\n')
Percent_Average = 100*(240/(240+411))
print("Percent Average COMBINED Dissatisfied Regardless Their Age =", Percent_Average)
print('\n')
combined_updated = combined.dropna(axis=1, thresh = 400).copy()
not_missing = combined_updated.notnull().sum()
print(not_missing)
False 411 True 240 Name: dissatisfied, dtype: int64 Percent Average COMBINED Dissatisfied Regardless Their Age = 36.86635944700461 id 651 separationtype 651 cease_date 635 position 598 employment_status 597 gender 592 age 596 institute_service 563 dissatisfied 651 institute 651 service_cat 563 age_groups 596 dtype: int64
combined_updated_2 = combined_updated
combined_updated_2.loc[:, 'dissatisfied'] = combined_updated_2.loc[:, 'dissatisfied'].fillna(False)
diss_pct1 = combined_updated_2.pivot_table(index='service_cat', values='dissatisfied')
print(diss_pct1.info)
<bound method DataFrame.info of dissatisfied service_cat Established 0.516129 Experienced 0.343023 New 0.295337 Veteran 0.485294>
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.
%matplotlib inline
diss_pct1.plot(kind='bar', rot=30, ylim=(0,0.65))
for i in range(4):
plt.text(i-0.16, 0.15, round(diss_pct1.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Years of Service Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Service Categories: Combined')
Text(0.5, 1.0, 'Proportion Dissatisfied by Service Categories: Combined')
Question 1:
1. Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
Answers:
The answer to both questions above is YES.
For employees classified as "New" (less than 3 years of service), about 30% resigned due to some kind of dissatisfaction. Veteran and Established groups were the higher ones at 49% and 52% respectively.
So, if a task force was commissioned to focus on the group with the higher percentage, I suppose they wouldn't choose Veteran since they are very close to retirement age. If they chose to focus on the Established group, would it be correct to use the same strategy for both institutes, assuming the Established group has the same issues for both?
# Use the _pd_str.strip and replace_ code to standardize the age group format.
combined_updated_2.loc[:, 'age_groups'].value_counts(dropna=False)
age_2 = combined_updated_2.loc[:, 'age'].str.strip().str.replace(' ', '-')
combined_updated_2.loc[:, 'age_cleaned'] = age_2
print(combined_updated_2.loc[:, 'age_cleaned'].value_counts(dropna=False))
41-45 93 46-50 81 36-40 73 51-55 71 26-30 67 21-25 62 31-35 61 NaN 55 56 or older 29 56-60 26 61 or older 23 20 or younger 10 Name: age_cleaned, dtype: int64
# Consolidate the employee age into five age range groups:
def update_vals(element):
if element == '20 or younger':
return 'Twenties'
if element == '21-25':
return 'Twenties'
if element == '26-30':
return 'Twenties'
if element == '31-35':
return 'Thirties'
if element == '36-40':
return 'Thirties'
if element == '41-45':
return 'Forties'
if element == '46-50':
return 'Forties'
if element == '51-55':
return 'Fifties'
if element == '56-60':
return 'Above 55'
if element == '56 or older':
return 'Above 55'
if element == '61 or older':
return 'Above 55'
elif pd.isnull(element):
return np.nan
combined_updated_2.loc[:, 'age_groups'] = combined_updated_2.loc[:, 'age_cleaned'].map(update_vals)
print(combined_updated_2.loc[:, 'age_groups'].value_counts(dropna=False))
Forties 174 Twenties 139 Thirties 134 Above 55 78 Fifties 71 NaN 55 Name: age_groups, dtype: int64
print(combined_updated_2.loc[:, 'dissatisfied'].value_counts(dropna=False))
diss_pct2 = combined_updated_2.pivot_table(index='age_groups', values='dissatisfied')
print(diss_pct2.info)
False 411 True 240 Name: dissatisfied, dtype: int64 <bound method DataFrame.info of dissatisfied age_groups Above 55 0.423077 Fifties 0.422535 Forties 0.379310 Thirties 0.358209 Twenties 0.352518>
# Use matplotlib function to plot a bar chart to provide visual
# comparison between age groups for percent employees resigning
# due to some kind of dissatisfaction.
%matplotlib inline
import matplotlib.pyplot as plt
diss_pct2.plot(kind='bar', rot=30, ylim=(0,0.6))
for i in range(5):
plt.text(i-0.21, 0.2, round(diss_pct2.loc[:, 'dissatisfied'], 3).iloc[i], c='k')
plt.xlabel('Age Group Categories')
plt.ylabel('Proportion of Employees Dissatisfied')
plt.title('Proportion Dissatisfied by Age Group: Combined')
Text(0.5, 1.0, 'Proportion Dissatisfied by Age Group: Combined')
Question 2:
Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
Answers:
The answer to both questions above is YES.
There's very little difference in percent dissatisfied for employees in there twenties, thirties, forties: between 35% to 38%. The result for employees in the fifties and over 55 are the same: 42%.
The total range among the groups is only 7. This suggests that age of employee has no bearing on level of percent dissatisfied.
However, is that necessarily true within each of the institutes? Now, for the conclusion of this project.
In conclusion, I would say that it paid dividends to analyze the data by each institute, DETE and TAFE separately before combining the data.
If I was part of a Corporate Human Resource Team commissioned to reduce the percentage of employees resigning due to some dissatisfaction, our strategy would be much different if we had the data broken out by institute versus having only the combined data results. Seeing that the overall percent of dissatisfied employees within DETE (49.0%) is almost twice that of TAFE (27.4%), we would most likely begin establishing a strategy starting with the DETE Institute first.
If we happened to have two teams available with each one focusing on one of the two institutes, each would most likely establish a different strategy of attack based on the results in the tables below for each institute.
If we had only the combined data results to base our strategy on, it would be difficult to know where to begin.
% OF EMPLOYEES RESIGNING DUE TO SOME DISSATISFACTION
----------------------------------------------------
Yrs. Service Group | Years | DETE % | TAFE % | Difference | Combined % |
---|---|---|---|---|---|
New | < 3 | 37.5 | 26.3 | 11.2 | 29.5 |
Experienced | 3-6 | 46.1 | 25.0 | 21.1 | 34.3 |
Established | 7-10 | 61.0 | 33.3 | 27.7 | 51.6 |
Veteran | > 10 | 56.0 | 27.8 | 28.2 | 48.5 |
Employee Age Group | DETE % | TAFE % | Difference | Combined % |
---|---|---|---|---|
Under 30 | 44.6 | 27.0 | 17.6 | 35.3 |
Thirties | 45.7 | 25.0 | 20.7 | 35.8 |
Forties | 46.7 | 28.6 | 18.1 | 37.9 |
Fifties | 58.6 | 28.2 | 30.4 | 42.3 |
Over 56 | 52.2 | 20.7 | 31.5 | 42.3 |
It is interesting to see that in both tables above, the difference (in red) in percent between the two institutes increases as years of service or employee age increases.