In this project, we'll work with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. You can find the TAFE exit survey here and the survey for the DETE here.
To achieve our objectives, we combined the data sets from both surveys. However, although both used the same survey template, one of them customized some of the responses. In addition, we were not provided with information about the columns of the datasets, so we used our general knowledge to define the fields in both DataFrames.
Employees who resigned due to some type of dissatisfaction based on length of service at a company.
SERV. CAT. | DETE | DETE % | TAFE | TAFE % | Total | Total % |
---|---|---|---|---|---|---|
Veteran | 64 | 86.49 | 10 | 13.51 | 74 | 32.74 |
Established | 25 | 78.13 | 7 | 21.88 | 32 | 14.16 |
Experienced | 39 | 61.90 | 24 | 38.10 | 63 | 27.88 |
New | 21 | 36.84 | 36 | 63.16 | 57 | 25.22 |
TOTALS | 149 | - | 77 | - | 226 | 100.00 |
Employees who resigned due to some type of dissatisfaction based on age group.
AGE CAT. | DETE | DETE % | TAFE | TAFE % | Total | Total % |
---|---|---|---|---|---|---|
Senior | 65 | 69.15 | 29 | 30.85 | 94 | 41.59 |
Middle | 55 | 66.27 | 28 | 33.73 | 83 | 36.73 |
Junior | 29 | 59.18 | 20 | 40.82 | 49 | 21.68 |
TOTALS | 149 | - | 77 | - | 226 | 100.00 |
We'll import the necessary libraries to start the development of the project.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Now, we'll read both data sets and check their dimensions (number of rows and columns).
dete_survey = pd.read_csv('dete_survey.csv')
tafe_survey = pd.read_csv('tafe_survey.csv')
dim_dete = dete_survey.shape
dim_tafe = tafe_survey.shape
print('DETE-Survey:\nTotal Rows: {}\nTotal Columns: {}'.format(dim_dete[0], dim_dete[1]))
print('\nTAFE-Survey:\nTotal Rows: {}\nTotal Columns: {}'.format(dim_tafe[0], dim_tafe[1]))
DETE-Survey: Total Rows: 822 Total Columns: 56 TAFE-Survey: Total Rows: 702 Total Columns: 72
print('DETE Survey - Columns:', '-' * 22, *dete_survey.columns, sep='\n')
print('\nTAFE Survey - Columns:', '-' * 22, *tafe_survey.columns, sep='\n')
DETE Survey - Columns: ---------------------- ID SeparationType Cease Date DETE Start Date Role Start Date Position Classification Region Business Unit Employment Status Career move to public sector Career move to private sector Interpersonal conflicts Job dissatisfaction Dissatisfaction with the department Physical work environment Lack of recognition Lack of job security Work location Employment conditions Maternity/family Relocation Study/Travel Ill Health Traumatic incident Work life balance Workload None of the above Professional Development Opportunities for promotion Staff morale Workplace issue Physical environment Worklife balance Stress and pressure support Performance of supervisor Peer support Initiative Skills Coach Career Aspirations Feedback Further PD Communication My say Information Kept informed Wellness programs Health & Safety Gender Age Aboriginal Torres Strait South Sea Disability NESB TAFE Survey - Columns: ---------------------- Record ID Institute WorkArea CESSATION YEAR Reason for ceasing employment Contributing Factors. Career Move - Public Sector Contributing Factors. Career Move - Private Sector Contributing Factors. Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family Contributing Factors. Dissatisfaction Contributing Factors. Job Dissatisfaction Contributing Factors. Interpersonal Conflict Contributing Factors. Study Contributing Factors. Travel Contributing Factors. Other Contributing Factors. NONE Main Factor. Which of these was the main factor for leaving? InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction InstituteViews. Topic:2. I was given access to skills training to help me do my job better InstituteViews. Topic:3. I was given adequate opportunities for personal development InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had InstituteViews. Topic:6. The organisation recognised when staff did good work InstituteViews. Topic:7. Management was generally supportive of me InstituteViews. Topic:8. Management was generally supportive of my team InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me InstituteViews. Topic:10. Staff morale was positive within the Institute InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit WorkUnitViews. Topic:15. I worked well with my colleagues WorkUnitViews. Topic:16. My job was challenging and interesting WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work WorkUnitViews. Topic:18. I had sufficient contact with other people in my job WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job WorkUnitViews. Topic:23. My job provided sufficient variety WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date WorkUnitViews. Topic:29. There was adequate communication between staff in my unit WorkUnitViews. Topic:30. Staff morale was positive within my work unit Induction. Did you undertake Workplace Induction? InductionInfo. Topic:Did you undertake a Corporate Induction? InductionInfo. Topic:Did you undertake a Institute Induction? InductionInfo. Topic: Did you undertake Team Induction? InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? InductionInfo. On-line Topic:Did you undertake a Institute Induction? InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] InductionInfo. Induction Manual Topic: Did you undertake Team Induction? Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? Workplace. Topic:Does your workplace promote and practice the principles of employment equity? Workplace. Topic:Does your workplace value the diversity of its employees? Workplace. Topic:Would you recommend the Institute as an employer to others? Gender. What is your Gender? CurrentAge. Current Age Employment Type. Employment Type Classification. Classification LengthofServiceOverall. Overall Length of Service at Institute (in years) LengthofServiceCurrent. Length of Service at current workplace (in years)
Observation.- We notice that in both DataFrames there are many columns. So, in order to be able to observe the values in all of them, we'll modify some necessary options inside pandas.
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', 73)
pd.options.display.float_format = "{:,.2f}".format
Before continuing, we'll implement a function that allows us to better visualize the information in the datasets.
def df_info(df, title_df=''):
rows = df.shape[0]
cols = df.shape[1]
title = title_df + ' ' + str(type(df)) + ':'
col_names = list(df.columns)
info_df = pd.DataFrame(col_names, columns=['Column Name'])
info_df['Non-null Count'] = df.count().values
info_df['Missing Values Count'] = df.isnull().sum().values
info_df['Missing Values %'] = (info_df['Missing Values Count'] / rows) * 100
info_df['DType'] = df.dtypes.values
dtype_vc = info_df['DType'].value_counts()
dtype_, dt_total = dtype_vc.index, dtype_vc.values
print(title, '-' * len(title), sep='\n')
print('RangeIndex: {:,} entries, 0 to {:,}\nData columns (total {} columns):'.format(rows, rows-1, cols))
print('DTypes', end=': ')
for dt, val in zip(dtype_, dt_total):
print('{}({})'.format(dt, val), end=' ')
return info_df
Well, now we are going to visualize some values in both DataFrames and we'll also see the detail of the data contained in them.
info_dete = df_info(dete_survey, title_df='DETE - Survey')
info_dete
DETE - Survey <class 'pandas.core.frame.DataFrame'>: ---------------------------------------------------- RangeIndex: 822 entries, 0 to 821 Data columns (total 56 columns): DTypes: object(37) bool(18) int64(1)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | ID | 822 | 0 | 0.00 | int64 |
1 | SeparationType | 822 | 0 | 0.00 | object |
2 | Cease Date | 822 | 0 | 0.00 | object |
3 | DETE Start Date | 822 | 0 | 0.00 | object |
4 | Role Start Date | 822 | 0 | 0.00 | object |
5 | Position | 817 | 5 | 0.61 | object |
6 | Classification | 455 | 367 | 44.65 | object |
7 | Region | 822 | 0 | 0.00 | object |
8 | Business Unit | 126 | 696 | 84.67 | object |
9 | Employment Status | 817 | 5 | 0.61 | object |
10 | Career move to public sector | 822 | 0 | 0.00 | bool |
11 | Career move to private sector | 822 | 0 | 0.00 | bool |
12 | Interpersonal conflicts | 822 | 0 | 0.00 | bool |
13 | Job dissatisfaction | 822 | 0 | 0.00 | bool |
14 | Dissatisfaction with the department | 822 | 0 | 0.00 | bool |
15 | Physical work environment | 822 | 0 | 0.00 | bool |
16 | Lack of recognition | 822 | 0 | 0.00 | bool |
17 | Lack of job security | 822 | 0 | 0.00 | bool |
18 | Work location | 822 | 0 | 0.00 | bool |
19 | Employment conditions | 822 | 0 | 0.00 | bool |
20 | Maternity/family | 822 | 0 | 0.00 | bool |
21 | Relocation | 822 | 0 | 0.00 | bool |
22 | Study/Travel | 822 | 0 | 0.00 | bool |
23 | Ill Health | 822 | 0 | 0.00 | bool |
24 | Traumatic incident | 822 | 0 | 0.00 | bool |
25 | Work life balance | 822 | 0 | 0.00 | bool |
26 | Workload | 822 | 0 | 0.00 | bool |
27 | None of the above | 822 | 0 | 0.00 | bool |
28 | Professional Development | 808 | 14 | 1.70 | object |
29 | Opportunities for promotion | 735 | 87 | 10.58 | object |
30 | Staff morale | 816 | 6 | 0.73 | object |
31 | Workplace issue | 788 | 34 | 4.14 | object |
32 | Physical environment | 817 | 5 | 0.61 | object |
33 | Worklife balance | 815 | 7 | 0.85 | object |
34 | Stress and pressure support | 810 | 12 | 1.46 | object |
35 | Performance of supervisor | 813 | 9 | 1.09 | object |
36 | Peer support | 812 | 10 | 1.22 | object |
37 | Initiative | 813 | 9 | 1.09 | object |
38 | Skills | 811 | 11 | 1.34 | object |
39 | Coach | 767 | 55 | 6.69 | object |
40 | Career Aspirations | 746 | 76 | 9.25 | object |
41 | Feedback | 792 | 30 | 3.65 | object |
42 | Further PD | 768 | 54 | 6.57 | object |
43 | Communication | 814 | 8 | 0.97 | object |
44 | My say | 812 | 10 | 1.22 | object |
45 | Information | 816 | 6 | 0.73 | object |
46 | Kept informed | 813 | 9 | 1.09 | object |
47 | Wellness programs | 766 | 56 | 6.81 | object |
48 | Health & Safety | 793 | 29 | 3.53 | object |
49 | Gender | 798 | 24 | 2.92 | object |
50 | Age | 811 | 11 | 1.34 | object |
51 | Aboriginal | 16 | 806 | 98.05 | object |
52 | Torres Strait | 3 | 819 | 99.64 | object |
53 | South Sea | 7 | 815 | 99.15 | object |
54 | Disability | 23 | 799 | 97.20 | object |
55 | NESB | 32 | 790 | 96.11 | object |
ID
column is of data type int64.del info_dete
dete_survey.head(7)
ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | Career move to private sector | Interpersonal conflicts | Job dissatisfaction | Dissatisfaction with the department | Physical work environment | Lack of recognition | Lack of job security | Work location | Employment conditions | Maternity/family | Relocation | Study/Travel | Ill Health | Traumatic incident | Work life balance | Workload | None of the above | Professional Development | Opportunities for promotion | Staff morale | Workplace issue | Physical environment | Worklife balance | Stress and pressure support | Performance of supervisor | Peer support | Initiative | Skills | Coach | Career Aspirations | Feedback | Further PD | Communication | My say | Information | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Ill Health Retirement | 08/2012 | 1984 | 2004 | Public Servant | A01-A04 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | True | False | False | True | False | False | True | False | False | False | False | False | False | False | False | False | False | True | A | A | N | N | N | A | A | A | A | N | N | N | A | A | A | N | A | A | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | Yes |
1 | 2 | Voluntary Early Retirement (VER) | 08/2012 | Not Stated | Not Stated | Public Servant | AO5-AO7 | Central Office | Corporate Strategy and Peformance | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | A | A | N | N | N | N | A | A | A | N | N | N | A | A | A | N | A | A | N | N | N | Male | 56-60 | NaN | NaN | NaN | NaN | NaN |
2 | 3 | Voluntary Early Retirement (VER) | 05/2012 | 2011 | 2011 | Schools Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | N | N | N | N | N | N | N | N | N | N | N | N | N | N | N | A | A | N | N | N | N | Male | 61 or older | NaN | NaN | NaN | NaN | NaN |
3 | 4 | Resignation-Other reasons | 05/2012 | 2005 | 2006 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | A | N | N | N | A | A | N | N | A | A | A | A | A | A | A | A | A | A | A | N | A | Female | 36-40 | NaN | NaN | NaN | NaN | NaN |
4 | 5 | Age Retirement | 05/2012 | 1970 | 1989 | Head of Curriculum/Head of Special Education | NaN | South East | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | A | A | N | N | D | D | N | A | A | A | A | A | A | SA | SA | D | D | A | N | A | M | Female | 61 or older | NaN | NaN | NaN | NaN | NaN |
5 | 6 | Resignation-Other reasons | 05/2012 | 1994 | 1997 | Guidance Officer | NaN | Central Office | Education Queensland | Permanent Full-time | False | True | False | False | False | False | False | False | False | True | True | False | False | False | False | False | False | False | SD | SD | SD | D | SD | D | D | A | NaN | SA | A | A | N | N | D | A | D | D | D | D | NaN | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
6 | 7 | Age Retirement | 05/2012 | 1972 | 2007 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Part-time | False | False | False | True | True | False | False | False | False | False | False | False | False | False | False | False | False | False | SD | N | SD | D | A | A | D | D | SA | D | N | N | D | D | N | N | D | D | D | D | SD | Female | 56-60 | NaN | NaN | NaN | NaN | NaN |
dete_survey.tail(7)
ID | SeparationType | Cease Date | DETE Start Date | Role Start Date | Position | Classification | Region | Business Unit | Employment Status | Career move to public sector | Career move to private sector | Interpersonal conflicts | Job dissatisfaction | Dissatisfaction with the department | Physical work environment | Lack of recognition | Lack of job security | Work location | Employment conditions | Maternity/family | Relocation | Study/Travel | Ill Health | Traumatic incident | Work life balance | Workload | None of the above | Professional Development | Opportunities for promotion | Staff morale | Workplace issue | Physical environment | Worklife balance | Stress and pressure support | Performance of supervisor | Peer support | Initiative | Skills | Coach | Career Aspirations | Feedback | Further PD | Communication | My say | Information | Kept informed | Wellness programs | Health & Safety | Gender | Age | Aboriginal | Torres Strait | South Sea | Disability | NESB | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
815 | 817 | Resignation-Other employer | 01/2014 | 2012 | 2012 | Teacher | Primary | Far North Queensland | NaN | Permanent Full-time | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | A | N | A | A | A | A | A | A | A | A | A | N | SA | SA | SA | A | SA | SA | SA | SA | SA | Male | 21-25 | NaN | NaN | NaN | NaN | NaN |
816 | 818 | Resignation-Move overseas/interstate | 01/2014 | 2012 | 2012 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | SA | SA | A | A | SA | SA | N | N | A | A | A | A | A | N | N | N | A | N | A | N | A | Female | 21-25 | NaN | NaN | NaN | NaN | NaN |
817 | 819 | Age Retirement | 02/2014 | 1977 | 1999 | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | False | False | False | False | False | False | False | False | False | False | True | False | False | True | False | False | True | False | SA | N | D | D | A | N | N | D | A | N | A | A | N | SA | SA | N | D | A | A | A | SA | Female | 56-60 | NaN | NaN | NaN | NaN | NaN |
818 | 820 | Age Retirement | 01/2014 | 1980 | 1980 | Teacher | Secondary | North Coast | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | A | SA | D | D | D | A | A | N | A | N | A | A | N | A | N | N | A | A | N | N | N | Male | 51-55 | NaN | NaN | NaN | NaN | NaN |
819 | 821 | Resignation-Move overseas/interstate | 01/2014 | 2009 | 2009 | Public Servant | A01-A04 | Central Office | Education Queensland | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | True | True | False | False | False | True | False | False | A | A | A | A | A | D | N | A | A | A | A | A | A | A | A | A | A | A | A | N | A | Female | 31-35 | NaN | NaN | NaN | NaN | NaN |
820 | 822 | Ill Health Retirement | 12/2013 | 2001 | 2009 | Teacher | Secondary | Darling Downs South West | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | A | D | D | A | SD | SD | SD | A | D | SD | SD | D | A | A | N | N | N | SD | A | N | A | Female | 41-45 | NaN | NaN | NaN | NaN | NaN |
821 | 823 | Resignation-Move overseas/interstate | 12/2013 | Not Stated | Not Stated | Teacher Aide | NaN | Metropolitan | NaN | NaN | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
DETE start date
and Role start date
we observed Not stated values, which also correspond to values considered as null.Professional development
column to the Health and safety
column we observe denoted values, for which we have no information. These values are most likely to be discarded as irrelevant to our analysis.In the following, we'll see the frequency table of some of the fields that we consider important.
Let's start by examining the ones in the SeparationType
field:
dete_survey['SeparationType'].value_counts(dropna=False)
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: SeparationType, dtype: int64
With a frequency of approximately 35%, the value Age Retirement is the predominant value, followed by the value Resignation-Other reasons with approximately 18%. (53% in total)
Now, let's review the values of the Position
column
dete_survey['Position'].value_counts(dropna=False)
Teacher 324 Teacher Aide 137 Public Servant 126 Cleaner 97 Head of Curriculum/Head of Special Education 38 Schools Officer 24 School Administrative Staff 16 Guidance Officer 12 Technical Officer 11 Professional Officer 7 Other 7 School Principal 5 School Based Professional Staff (Therapist, nurse, etc) 5 NaN 5 Business Service Manager 4 Deputy Principal 4 Name: Position, dtype: int64
With a frequency of approximately 39%, the value Teacher is the predominant value, followed by the value Teacher Aide with approximately 17% (56% in total).
dete_survey['Region'].value_counts(dropna=False)
Metropolitan 135 Central Office 130 South East 108 Not Stated 105 Darling Downs South West 95 Central Queensland 84 North Coast 75 North Queensland 53 Far North Queensland 37 Name: Region, dtype: int64
In the Region
column, the predominant values are Metropolitan and Central Office, both with a frequency of 16% respectively. (32% in total). We also observed null values, which appear as Not Stated with a frequency of approximately 13%.
dete_survey['Employment Status'].value_counts(dropna=False)
Permanent Full-time 434 Permanent Part-time 308 Temporary Full-time 41 Temporary Part-time 24 Casual 10 NaN 5 Name: Employment Status, dtype: int64
In the Employment Status
column, the predominant values are Permanent Full-time and Permanent Part-time, both with a frequency of 53% and 37% respectively. (90% in total).
info_tafe = df_info(tafe_survey, title_df='TAFE - Survey')
info_tafe
TAFE - Survey <class 'pandas.core.frame.DataFrame'>: ---------------------------------------------------- RangeIndex: 702 entries, 0 to 701 Data columns (total 72 columns): DTypes: object(70) float64(2)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | Record ID | 702 | 0 | 0.00 | float64 |
1 | Institute | 702 | 0 | 0.00 | object |
2 | WorkArea | 702 | 0 | 0.00 | object |
3 | CESSATION YEAR | 695 | 7 | 1.00 | float64 |
4 | Reason for ceasing employment | 701 | 1 | 0.14 | object |
5 | Contributing Factors. Career Move - Public Sector | 437 | 265 | 37.75 | object |
6 | Contributing Factors. Career Move - Private Sector | 437 | 265 | 37.75 | object |
7 | Contributing Factors. Career Move - Self-employment | 437 | 265 | 37.75 | object |
8 | Contributing Factors. Ill Health | 437 | 265 | 37.75 | object |
9 | Contributing Factors. Maternity/Family | 437 | 265 | 37.75 | object |
10 | Contributing Factors. Dissatisfaction | 437 | 265 | 37.75 | object |
11 | Contributing Factors. Job Dissatisfaction | 437 | 265 | 37.75 | object |
12 | Contributing Factors. Interpersonal Conflict | 437 | 265 | 37.75 | object |
13 | Contributing Factors. Study | 437 | 265 | 37.75 | object |
14 | Contributing Factors. Travel | 437 | 265 | 37.75 | object |
15 | Contributing Factors. Other | 437 | 265 | 37.75 | object |
16 | Contributing Factors. NONE | 437 | 265 | 37.75 | object |
17 | Main Factor. Which of these was the main factor for leaving? | 113 | 589 | 83.90 | object |
18 | InstituteViews. Topic:1. I feel the senior leadership had a clear vis... | 608 | 94 | 13.39 | object |
19 | InstituteViews. Topic:2. I was given access to skills training to hel... | 613 | 89 | 12.68 | object |
20 | InstituteViews. Topic:3. I was given adequate opportunities for perso... | 610 | 92 | 13.11 | object |
21 | InstituteViews. Topic:4. I was given adequate opportunities for promo... | 608 | 94 | 13.39 | object |
22 | InstituteViews. Topic:5. I felt the salary for the job was right for ... | 615 | 87 | 12.39 | object |
23 | InstituteViews. Topic:6. The organisation recognised when staff did g... | 607 | 95 | 13.53 | object |
24 | InstituteViews. Topic:7. Management was generally supportive of me | 614 | 88 | 12.54 | object |
25 | InstituteViews. Topic:8. Management was generally supportive of my team | 608 | 94 | 13.39 | object |
26 | InstituteViews. Topic:9. I was kept informed of the changes in the or... | 610 | 92 | 13.11 | object |
27 | InstituteViews. Topic:10. Staff morale was positive within the Institute | 602 | 100 | 14.25 | object |
28 | InstituteViews. Topic:11. If I had a workplace issue it was dealt wit... | 601 | 101 | 14.39 | object |
29 | InstituteViews. Topic:12. If I had a workplace issue it was dealt wit... | 597 | 105 | 14.96 | object |
30 | InstituteViews. Topic:13. If I had a workplace issue it was dealt wit... | 601 | 101 | 14.39 | object |
31 | WorkUnitViews. Topic:14. I was satisfied with the quality of the mana... | 609 | 93 | 13.25 | object |
32 | WorkUnitViews. Topic:15. I worked well with my colleagues | 605 | 97 | 13.82 | object |
33 | WorkUnitViews. Topic:16. My job was challenging and interesting | 607 | 95 | 13.53 | object |
34 | WorkUnitViews. Topic:17. I was encouraged to use my initiative in the... | 610 | 92 | 13.11 | object |
35 | WorkUnitViews. Topic:18. I had sufficient contact with other people i... | 613 | 89 | 12.68 | object |
36 | WorkUnitViews. Topic:19. I was given adequate support and co-operatio... | 609 | 93 | 13.25 | object |
37 | WorkUnitViews. Topic:20. I was able to use the full range of my skill... | 609 | 93 | 13.25 | object |
38 | WorkUnitViews. Topic:21. I was able to use the full range of my abili... | 608 | 94 | 13.39 | object |
39 | WorkUnitViews. Topic:22. I was able to use the full range of my knowl... | 608 | 94 | 13.39 | object |
40 | WorkUnitViews. Topic:23. My job provided sufficient variety | 611 | 91 | 12.96 | object |
41 | WorkUnitViews. Topic:24. I was able to cope with the level of stress ... | 610 | 92 | 13.11 | object |
42 | WorkUnitViews. Topic:25. My job allowed me to balance the demands of ... | 611 | 91 | 12.96 | object |
43 | WorkUnitViews. Topic:26. My supervisor gave me adequate personal reco... | 606 | 96 | 13.68 | object |
44 | WorkUnitViews. Topic:27. My working environment was satisfactory e.g.... | 610 | 92 | 13.11 | object |
45 | WorkUnitViews. Topic:28. I was given the opportunity to mentor and co... | 609 | 93 | 13.25 | object |
46 | WorkUnitViews. Topic:29. There was adequate communication between sta... | 603 | 99 | 14.10 | object |
47 | WorkUnitViews. Topic:30. Staff morale was positive within my work unit | 606 | 96 | 13.68 | object |
48 | Induction. Did you undertake Workplace Induction? | 619 | 83 | 11.82 | object |
49 | InductionInfo. Topic:Did you undertake a Corporate Induction? | 432 | 270 | 38.46 | object |
50 | InductionInfo. Topic:Did you undertake a Institute Induction? | 483 | 219 | 31.20 | object |
51 | InductionInfo. Topic: Did you undertake Team Induction? | 440 | 262 | 37.32 | object |
52 | InductionInfo. Face to Face Topic:Did you undertake a Corporate Induc... | 555 | 147 | 20.94 | object |
53 | InductionInfo. On-line Topic:Did you undertake a Corporate Induction;... | 555 | 147 | 20.94 | object |
54 | InductionInfo. Induction Manual Topic:Did you undertake a Corporate I... | 555 | 147 | 20.94 | object |
55 | InductionInfo. Face to Face Topic:Did you undertake a Institute Induc... | 530 | 172 | 24.50 | object |
56 | InductionInfo. On-line Topic:Did you undertake a Institute Induction? | 555 | 147 | 20.94 | object |
57 | InductionInfo. Induction Manual Topic:Did you undertake a Institute I... | 553 | 149 | 21.23 | object |
58 | InductionInfo. Face to Face Topic: Did you undertake Team Induction; ... | 555 | 147 | 20.94 | object |
59 | InductionInfo. On-line Topic: Did you undertake Team Induction?proces... | 555 | 147 | 20.94 | object |
60 | InductionInfo. Induction Manual Topic: Did you undertake Team Induction? | 555 | 147 | 20.94 | object |
61 | Workplace. Topic:Did you and your Manager develop a Performance and P... | 608 | 94 | 13.39 | object |
62 | Workplace. Topic:Does your workplace promote a work culture free from... | 594 | 108 | 15.38 | object |
63 | Workplace. Topic:Does your workplace promote and practice the princip... | 587 | 115 | 16.38 | object |
64 | Workplace. Topic:Does your workplace value the diversity of its emplo... | 586 | 116 | 16.52 | object |
65 | Workplace. Topic:Would you recommend the Institute as an employer to ... | 581 | 121 | 17.24 | object |
66 | Gender. What is your Gender? | 596 | 106 | 15.10 | object |
67 | CurrentAge. Current Age | 596 | 106 | 15.10 | object |
68 | Employment Type. Employment Type | 596 | 106 | 15.10 | object |
69 | Classification. Classification | 596 | 106 | 15.10 | object |
70 | LengthofServiceOverall. Overall Length of Service at Institute (in ye... | 596 | 106 | 15.10 | object |
71 | LengthofServiceCurrent. Length of Service at current workplace (in ye... | 596 | 106 | 15.10 | object |
Record ID
and CESSATION YEAR
fields are of type float. The others are of type object.Record ID
, Institute
and WorkArea
.Reason for ceasing employment
with only one null value. Followed by the CESSATION YEAR
field with only 7 null values.Main Factor. Which of these was the main factor for leaving?
has the highest number of null values (589 approximately 84% of its values).del info_tafe
tafe_survey.head(6)
Record ID | Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | Contributing Factors. Job Dissatisfaction | Contributing Factors. Interpersonal Conflict | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction | InstituteViews. Topic:2. I was given access to skills training to help me do my job better | InstituteViews. Topic:3. I was given adequate opportunities for personal development | InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% | InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had | InstituteViews. Topic:6. The organisation recognised when staff did good work | InstituteViews. Topic:7. Management was generally supportive of me | InstituteViews. Topic:8. Management was generally supportive of my team | InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me | InstituteViews. Topic:10. Staff morale was positive within the Institute | InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly | InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently | InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly | WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit | WorkUnitViews. Topic:15. I worked well with my colleagues | WorkUnitViews. Topic:16. My job was challenging and interesting | WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work | WorkUnitViews. Topic:18. I had sufficient contact with other people in my job | WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job | WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job | WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] | WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job | WorkUnitViews. Topic:23. My job provided sufficient variety | WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job | WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction | WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance | WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area | WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date | WorkUnitViews. Topic:29. There was adequate communication between staff in my unit | WorkUnitViews. Topic:30. Staff morale was positive within my work unit | Induction. Did you undertake Workplace Induction? | InductionInfo. Topic:Did you undertake a Corporate Induction? | InductionInfo. Topic:Did you undertake a Institute Induction? | InductionInfo. Topic: Did you undertake Team Induction? | InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? | InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? | InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? | InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? | InductionInfo. On-line Topic:Did you undertake a Institute Induction? | InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? | InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? | InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] | InductionInfo. Induction Manual Topic: Did you undertake Team Induction? | Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 634,133,009,996,093,952.00 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2,010.00 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Agree | Agree | Agree | Neutral | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Neutral | Agree | Agree | Yes | Yes | Yes | Yes | Face to Face | - | - | Face to Face | - | - | Face to Face | - | - | Yes | Yes | Yes | Yes | Yes | Female | 26 30 | Temporary Full-time | Administration (AO) | 1-2 | 1-2 |
1 | 634,133,654,064,530,944.00 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2,010.00 | Retirement | - | - | - | - | - | - | - | - | - | Travel | - | - | NaN | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Strongly Agree | Agree | Strongly Agree | Agree | Agree | Agree | Disagree | Strongly Agree | Strongly Agree | Strongly Agree | Agree | Agree | Agree | Strongly Agree | Agree | Agree | Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Agree | Agree | Strongly Agree | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 634,138,845,606,562,944.00 | Mount Isa Institute of TAFE | Delivery (teaching) | 2,010.00 | Retirement | - | - | - | - | - | - | - | - | - | - | - | NONE | NaN | Agree | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Agree | Agree | Agree | Agree | Neutral | Neutral | Strongly Agree | Strongly Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 634,139,903,350,000,000.00 | Mount Isa Institute of TAFE | Non-Delivery (corporate) | 2,010.00 | Resignation | - | - | - | - | - | - | - | - | - | Travel | - | - | NaN | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Yes | No | Yes | Yes | - | - | - | NaN | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 634,146,578,511,788,032.00 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2,010.00 | Resignation | - | Career Move - Private Sector | - | - | - | - | - | - | - | - | - | - | NaN | Agree | Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Yes | Yes | Yes | Yes | - | - | Induction Manual | Face to Face | - | - | Face to Face | - | - | Yes | Yes | Yes | Yes | Yes | Male | 41 45 | Permanent Full-time | Teacher (including LVT) | 3-4 | 3-4 |
5 | 634,147,506,906,311,040.00 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2,010.00 | Resignation | - | - | - | - | - | - | - | - | - | - | Other | - | NaN | Neutral | Neutral | Neutral | Neutral | Agree | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | NaN | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Neutral | Disagree | Neutral | Disagree | Neutral | Neutral | Neutral | Yes | Yes | Yes | NaN | - | On-line | - | Face to Face | - | - | - | - | - | Yes | NaN | NaN | NaN | NaN | Female | 56 or older | Contract/casual | Teacher (including LVT) | 7-10 | 7-10 |
tafe_survey.tail(6)
Record ID | Institute | WorkArea | CESSATION YEAR | Reason for ceasing employment | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | Contributing Factors. Job Dissatisfaction | Contributing Factors. Interpersonal Conflict | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | Main Factor. Which of these was the main factor for leaving? | InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction | InstituteViews. Topic:2. I was given access to skills training to help me do my job better | InstituteViews. Topic:3. I was given adequate opportunities for personal development | InstituteViews. Topic:4. I was given adequate opportunities for promotion within %Institute]Q25LBL% | InstituteViews. Topic:5. I felt the salary for the job was right for the responsibilities I had | InstituteViews. Topic:6. The organisation recognised when staff did good work | InstituteViews. Topic:7. Management was generally supportive of me | InstituteViews. Topic:8. Management was generally supportive of my team | InstituteViews. Topic:9. I was kept informed of the changes in the organisation which would affect me | InstituteViews. Topic:10. Staff morale was positive within the Institute | InstituteViews. Topic:11. If I had a workplace issue it was dealt with quickly | InstituteViews. Topic:12. If I had a workplace issue it was dealt with efficiently | InstituteViews. Topic:13. If I had a workplace issue it was dealt with discreetly | WorkUnitViews. Topic:14. I was satisfied with the quality of the management and supervision within my work unit | WorkUnitViews. Topic:15. I worked well with my colleagues | WorkUnitViews. Topic:16. My job was challenging and interesting | WorkUnitViews. Topic:17. I was encouraged to use my initiative in the course of my work | WorkUnitViews. Topic:18. I had sufficient contact with other people in my job | WorkUnitViews. Topic:19. I was given adequate support and co-operation by my peers to enable me to do my job | WorkUnitViews. Topic:20. I was able to use the full range of my skills in my job | WorkUnitViews. Topic:21. I was able to use the full range of my abilities in my job. ; Category:Level of Agreement; Question:YOUR VIEWS ABOUT YOUR WORK UNIT] | WorkUnitViews. Topic:22. I was able to use the full range of my knowledge in my job | WorkUnitViews. Topic:23. My job provided sufficient variety | WorkUnitViews. Topic:24. I was able to cope with the level of stress and pressure in my job | WorkUnitViews. Topic:25. My job allowed me to balance the demands of work and family to my satisfaction | WorkUnitViews. Topic:26. My supervisor gave me adequate personal recognition and feedback on my performance | WorkUnitViews. Topic:27. My working environment was satisfactory e.g. sufficient space, good lighting, suitable seating and working area | WorkUnitViews. Topic:28. I was given the opportunity to mentor and coach others in order for me to pass on my skills and knowledge prior to my cessation date | WorkUnitViews. Topic:29. There was adequate communication between staff in my unit | WorkUnitViews. Topic:30. Staff morale was positive within my work unit | Induction. Did you undertake Workplace Induction? | InductionInfo. Topic:Did you undertake a Corporate Induction? | InductionInfo. Topic:Did you undertake a Institute Induction? | InductionInfo. Topic: Did you undertake Team Induction? | InductionInfo. Face to Face Topic:Did you undertake a Corporate Induction; Category:How it was conducted? | InductionInfo. On-line Topic:Did you undertake a Corporate Induction; Category:How it was conducted? | InductionInfo. Induction Manual Topic:Did you undertake a Corporate Induction? | InductionInfo. Face to Face Topic:Did you undertake a Institute Induction? | InductionInfo. On-line Topic:Did you undertake a Institute Induction? | InductionInfo. Induction Manual Topic:Did you undertake a Institute Induction? | InductionInfo. Face to Face Topic: Did you undertake Team Induction; Category? | InductionInfo. On-line Topic: Did you undertake Team Induction?process you undertook and how it was conducted.] | InductionInfo. Induction Manual Topic: Did you undertake Team Induction? | Workplace. Topic:Did you and your Manager develop a Performance and Professional Development Plan (PPDP)? | Workplace. Topic:Does your workplace promote a work culture free from all forms of unlawful discrimination? | Workplace. Topic:Does your workplace promote and practice the principles of employment equity? | Workplace. Topic:Does your workplace value the diversity of its employees? | Workplace. Topic:Would you recommend the Institute as an employer to others? | Gender. What is your Gender? | CurrentAge. Current Age | Employment Type. Employment Type | Classification. Classification | LengthofServiceOverall. Overall Length of Service at Institute (in years) | LengthofServiceCurrent. Length of Service at current workplace (in years) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
696 | 635,065,973,820,625,024.00 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2,013.00 | Resignation | - | Career Move - Private Sector | - | - | - | - | - | - | - | - | - | - | NaN | Neutral | Strongly Disagree | Strongly Disagree | Not Applicable | Disagree | Agree | Agree | Agree | Agree | Neutral | Neutral | Neutral | Neutral | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Agree | Yes | No | NaN | NaN | - | - | - | Face to Face | - | - | Face to Face | - | - | Yes | Yes | Yes | Yes | Yes | Male | 21 25 | Temporary Full-time | Operational (OO) | 5-6 | 5-6 |
697 | 635,066,785,175,197,056.00 | Barrier Reef Institute of TAFE | Delivery (teaching) | 2,013.00 | Resignation | Career Move - Public Sector | - | - | - | - | - | - | - | - | - | - | - | NaN | Neutral | Agree | Agree | Neutral | Disagree | Neutral | Agree | Agree | Agree | Disagree | Agree | Agree | Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Neutral | Neutral | Neutral | Agree | Agree | Neutral | Neutral | Agree | Neutral | Neutral | Neutral | No | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Yes | Yes | Yes | Yes | Yes | Male | 51-55 | Temporary Full-time | Teacher (including LVT) | 1-2 | 1-2 |
698 | 635,067,716,405,666,048.00 | Southern Queensland Institute of TAFE | Non-Delivery (corporate) | 2,013.00 | Resignation | Career Move - Public Sector | - | - | - | - | - | - | - | - | - | - | - | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
699 | 635,070,442,972,541,056.00 | Tropical North Institute of TAFE | Delivery (teaching) | 2,013.00 | Resignation | - | - | - | - | - | - | - | - | - | - | Other | - | NaN | Agree | Strongly Agree | Strongly Agree | Neutral | Agree | Agree | Agree | Agree | Agree | Neutral | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Strongly Agree | Agree | Agree | Agree | Agree | Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Agree | Agree | Agree | Yes | No | Yes | Yes | - | - | - | - | - | Induction Manual | Face to Face | - | - | Yes | Yes | Yes | Yes | Yes | Female | 51-55 | Permanent Full-time | Teacher (including LVT) | 5-6 | 1-2 |
700 | 635,071,153,170,978,944.00 | Southbank Institute of Technology | Non-Delivery (corporate) | 2,013.00 | Contract Expired | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Strongly Agree | Strongly Disagree | Strongly Disagree | Strongly Disagree | Disagree | Agree | Neutral | Neutral | Strongly Agree | Agree | Strongly Disagree | Strongly Disagree | Strongly Disagree | Strongly Disagree | Agree | Strongly Agree | Neutral | Agree | Disagree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Disagree | Strongly Disagree | Disagree | Agree | Strongly Agree | Agree | Neutral | Yes | No | Yes | Yes | - | On-line | - | Face to Face | - | - | Face to Face | - | - | No | No | No | Yes | No | Female | 41 45 | Temporary Full-time | Professional Officer (PO) | 1-2 | 1-2 |
701 | 635,073,030,973,790,976.00 | Tropical North Institute of TAFE | Non-Delivery (corporate) | 2,013.00 | Resignation | - | - | Career Move - Self-employment | - | - | - | - | - | - | Travel | - | - | Career Move - Self-employment | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Neutral | Strongly Agree | Strongly Agree | Agree | Disagree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Strongly Agree | Neutral | Strongly Agree | Strongly Agree | Yes | Yes | Yes | Yes | - | - | - | - | - | - | - | - | - | Yes | Yes | Yes | Yes | Yes | Female | 26 30 | Contract/casual | Administration (AO) | 3-4 | 1-2 |
The Record ID
column contains numeric values that are too large.
Previously, in the DETE survey, we saw that the values of the fields ranging from Professional develoment
to Health and safety
were denotative and we had no information about them. Here in the TAFE survey we seem to find their meanings. These appear in the field values ranging from InstituteViews. Topic: 1. I feel the senior leadership had a clear vision and direction
to the field WorkUnitViews. Topic: 30. Staff morale was positive within my work unit
.
Next, we review the frequency tables of some fields and determine if these will be necessary to carry out our analysis.
tafe_survey['WorkArea'].value_counts(dropna=False)
Non-Delivery (corporate) 432 Delivery (teaching) 270 Name: WorkArea, dtype: int64
In the column WorkArea
we only find two values. Being the value Non-Delivery (corporate) the one that predominates with approximately 62%.
tafe_survey['Reason for ceasing employment'].value_counts(dropna=False)
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 NaN 1 Name: Reason for ceasing employment, dtype: int64
In the column Reason for ceasing employment
the predominant value is Resignation with approximately 48%. Followed by the Contract Expired value with approximately 18% (making a total of 66%).
Now let's look at the values of some of the columns starting with the phrase Contributing Factors...
since together they have many null values.
tafe_survey['Contributing Factors. Career Move - Public Sector '].value_counts(dropna=False)
- 375 NaN 265 Career Move - Public Sector 62 Name: Contributing Factors. Career Move - Public Sector , dtype: int64
tafe_survey['Contributing Factors. Career Move - Private Sector '].value_counts(dropna=False)
- 336 NaN 265 Career Move - Private Sector 101 Name: Contributing Factors. Career Move - Private Sector , dtype: int64
tafe_survey['Contributing Factors. Career Move - Self-employment'].value_counts(dropna=False)
- 420 NaN 265 Career Move - Self-employment 17 Name: Contributing Factors. Career Move - Self-employment, dtype: int64
We see that in addition to having NaN values there are also many values equal to '-', which are also null. The rest practically shows whether the factor was the reason for the exit or not. It would have been more efficient to use a single column in which the respondent indicates the factor contributing to his or her exit.
tafe_survey['Employment Type. Employment Type'].value_counts(dropna=False)
Permanent Full-time 237 Temporary Full-time 177 NaN 106 Contract/casual 71 Permanent Part-time 59 Temporary Part-time 52 Name: Employment Type. Employment Type, dtype: int64
In the field Employment Type. Employment Type
the predominant value is Permanent Full-time with approximately 34%. Followed by the value Temporary Full-time with almost 25% (making a total of 59%).
Let us now review some of the fields that contained denotative values in the DETE dataset and compare them with some of the fields in the TAFE dataset.
dete_survey['Opportunities for promotion'].value_counts(dropna=False)
A 242 N 230 SA 100 NaN 87 D 83 SD 56 M 24 Name: Opportunities for promotion, dtype: int64
dete_survey['Information'].value_counts(dropna=False)
A 436 SA 165 N 120 D 45 SD 39 M 11 NaN 6 Name: Information, dtype: int64
tafe_survey['InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction'].value_counts(dropna=False)
Agree 233 Neutral 150 NaN 94 Strongly Agree 87 Disagree 84 Strongly Disagree 49 Not Applicable 5 Name: InstituteViews. Topic:1. I feel the senior leadership had a clear vision and direction, dtype: int64
tafe_survey['WorkUnitViews. Topic:30. Staff morale was positive within my work unit'].value_counts(dropna=False)
Agree 236 Neutral 119 Strongly Agree 104 NaN 96 Strongly Disagree 70 Disagree 69 Not Applicable 8 Name: WorkUnitViews. Topic:30. Staff morale was positive within my work unit, dtype: int64
Now that we see the values in parallel, we can state that the values in the TAFE dataset correspond to the meaning of the denoted values in the DETE dataset. However, these fields turn out to be irrelevant for carrying out our analysis, so they will be removed later.
Let's start by re-reading the values from the dete_survey.csv
dataset but this time we'll replace those values that appear as Not Stated with NaN instead.
dete_survey = pd.read_csv('dete_survey.csv', na_values='Not Stated')
In the DETE dataset, we're going to eliminate the columns containing the denoted values and also the last 5, in which null values predominate.
To carry out the first part, we can make use of the get_loc() method in order to obtain the index that occupies a given column. We can also go back to the dataset information view and see directly there the values of the indexes that make up the range we want to delete.
print('From index {}'.format(dete_survey.columns.get_loc('Professional Development')), end=' ')
print('to {}.'.format(dete_survey.columns.get_loc('Health & Safety')))
From index 28 to 48.
dete_survey_updated = dete_survey.drop(dete_survey.columns[28:49], axis=1)
# Dropping the last 5 columns
dete_survey_updated = dete_survey_updated.drop(dete_survey_updated.columns[-5:], axis=1)
# Verifying results
info_dete_updated = df_info(dete_survey_updated, title_df='DETE - Survey (Updated)')
info_dete_updated
DETE - Survey (Updated) <class 'pandas.core.frame.DataFrame'>: -------------------------------------------------------------- RangeIndex: 822 entries, 0 to 821 Data columns (total 30 columns): DTypes: bool(18) object(9) float64(2) int64(1)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | ID | 822 | 0 | 0.00 | int64 |
1 | SeparationType | 822 | 0 | 0.00 | object |
2 | Cease Date | 788 | 34 | 4.14 | object |
3 | DETE Start Date | 749 | 73 | 8.88 | float64 |
4 | Role Start Date | 724 | 98 | 11.92 | float64 |
5 | Position | 817 | 5 | 0.61 | object |
6 | Classification | 455 | 367 | 44.65 | object |
7 | Region | 717 | 105 | 12.77 | object |
8 | Business Unit | 126 | 696 | 84.67 | object |
9 | Employment Status | 817 | 5 | 0.61 | object |
10 | Career move to public sector | 822 | 0 | 0.00 | bool |
11 | Career move to private sector | 822 | 0 | 0.00 | bool |
12 | Interpersonal conflicts | 822 | 0 | 0.00 | bool |
13 | Job dissatisfaction | 822 | 0 | 0.00 | bool |
14 | Dissatisfaction with the department | 822 | 0 | 0.00 | bool |
15 | Physical work environment | 822 | 0 | 0.00 | bool |
16 | Lack of recognition | 822 | 0 | 0.00 | bool |
17 | Lack of job security | 822 | 0 | 0.00 | bool |
18 | Work location | 822 | 0 | 0.00 | bool |
19 | Employment conditions | 822 | 0 | 0.00 | bool |
20 | Maternity/family | 822 | 0 | 0.00 | bool |
21 | Relocation | 822 | 0 | 0.00 | bool |
22 | Study/Travel | 822 | 0 | 0.00 | bool |
23 | Ill Health | 822 | 0 | 0.00 | bool |
24 | Traumatic incident | 822 | 0 | 0.00 | bool |
25 | Work life balance | 822 | 0 | 0.00 | bool |
26 | Workload | 822 | 0 | 0.00 | bool |
27 | None of the above | 822 | 0 | 0.00 | bool |
28 | Gender | 798 | 24 | 2.92 | object |
29 | Age | 811 | 11 | 1.34 | object |
Dropping columns in the TAFE dataset.
del info_dete_updated
print('From index {}'.format(tafe_survey.columns.get_loc('Main Factor. Which of these was the main factor for leaving?')), end=' ')
print('to {}.'.format(tafe_survey.columns.get_loc('Workplace. Topic:Would you recommend the Institute as an employer to others?')))
From index 17 to 65.
tafe_survey_updated = tafe_survey.drop(tafe_survey.columns[17:66], axis=1)
# Verifying results
info_tafe_updated = df_info(tafe_survey_updated, title_df='TAFE - Survey (Updated)')
info_tafe_updated
TAFE - Survey (Updated) <class 'pandas.core.frame.DataFrame'>: -------------------------------------------------------------- RangeIndex: 702 entries, 0 to 701 Data columns (total 23 columns): DTypes: object(21) float64(2)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | Record ID | 702 | 0 | 0.00 | float64 |
1 | Institute | 702 | 0 | 0.00 | object |
2 | WorkArea | 702 | 0 | 0.00 | object |
3 | CESSATION YEAR | 695 | 7 | 1.00 | float64 |
4 | Reason for ceasing employment | 701 | 1 | 0.14 | object |
5 | Contributing Factors. Career Move - Public Sector | 437 | 265 | 37.75 | object |
6 | Contributing Factors. Career Move - Private Sector | 437 | 265 | 37.75 | object |
7 | Contributing Factors. Career Move - Self-employment | 437 | 265 | 37.75 | object |
8 | Contributing Factors. Ill Health | 437 | 265 | 37.75 | object |
9 | Contributing Factors. Maternity/Family | 437 | 265 | 37.75 | object |
10 | Contributing Factors. Dissatisfaction | 437 | 265 | 37.75 | object |
11 | Contributing Factors. Job Dissatisfaction | 437 | 265 | 37.75 | object |
12 | Contributing Factors. Interpersonal Conflict | 437 | 265 | 37.75 | object |
13 | Contributing Factors. Study | 437 | 265 | 37.75 | object |
14 | Contributing Factors. Travel | 437 | 265 | 37.75 | object |
15 | Contributing Factors. Other | 437 | 265 | 37.75 | object |
16 | Contributing Factors. NONE | 437 | 265 | 37.75 | object |
17 | Gender. What is your Gender? | 596 | 106 | 15.10 | object |
18 | CurrentAge. Current Age | 596 | 106 | 15.10 | object |
19 | Employment Type. Employment Type | 596 | 106 | 15.10 | object |
20 | Classification. Classification | 596 | 106 | 15.10 | object |
21 | LengthofServiceOverall. Overall Length of Service at Institute (in ye... | 596 | 106 | 15.10 | object |
22 | LengthofServiceCurrent. Length of Service at current workplace (in ye... | 596 | 106 | 15.10 | object |
del info_tafe_updated
Next, let's turn our attention to the column names. Each dataframe contains many of the same columns, but the column names are different. Below are some of the columns we'd like to use for our final analysis:
DETE_Survey | TAFE_Survey | Definition |
---|---|---|
ID | Record ID | An id used to identify the participant of the survey |
SeparationType | Reason for ceasing employment | The reason why the participant's employment ended |
Cease Date | CESSATION YEAR | The year or month the participant's employment ended |
DETE Start Date | The year the participant began employment with the DETE | |
LengthofServiceOverall. Overall Length of Service at Institute (in years) | The length of the person's employment (in years) | |
Age | CurrentAge. Current Age | The age of the participant |
Gender | Gender. What is your Gender? | The gender of the participant |
Because we eventually want to combine them, we'll have to standardize the column names.
Let's start by correcting the column names in the DETE-Survey dataset.
dete_survey_updated.columns = dete_survey_updated.columns.str.strip().str.lower().str.replace(' ', '_')
dete_survey_updated.rename({'separationtype': 'separation_type'}, axis=1, inplace=True)
print('DETE Survey (updated) - Columns:', '-' * 32, *dete_survey_updated.columns, sep='\n')
DETE Survey (updated) - Columns: -------------------------------- id separation_type cease_date dete_start_date role_start_date position classification region business_unit employment_status career_move_to_public_sector career_move_to_private_sector interpersonal_conflicts job_dissatisfaction dissatisfaction_with_the_department physical_work_environment lack_of_recognition lack_of_job_security work_location employment_conditions maternity/family relocation study/travel ill_health traumatic_incident work_life_balance workload none_of_the_above gender age
Now, we'll correct the column names in the TAFE-Survey dataset.
map_name_cols = {'Record ID': 'id', 'CESSATION YEAR': 'cease_date', 'Reason for ceasing employment': 'separation_type',
'Gender. What is your Gender?': 'gender', 'CurrentAge. Current Age': 'age',
'Employment Type. Employment Type': 'employment_status', 'Classification. Classification': 'position',
'LengthofServiceOverall. Overall Length of Service at Institute (in years)': 'institute_service',
'LengthofServiceCurrent. Length of Service at current workplace (in years)': 'role_service'
}
tafe_survey_updated.rename(map_name_cols, axis=1, inplace=True)
print('\nTAFE Survey (updated) - Columns:', '-' * 32, *tafe_survey_updated.columns, sep='\n')
TAFE Survey (updated) - Columns: -------------------------------- id Institute WorkArea cease_date separation_type Contributing Factors. Career Move - Public Sector Contributing Factors. Career Move - Private Sector Contributing Factors. Career Move - Self-employment Contributing Factors. Ill Health Contributing Factors. Maternity/Family Contributing Factors. Dissatisfaction Contributing Factors. Job Dissatisfaction Contributing Factors. Interpersonal Conflict Contributing Factors. Study Contributing Factors. Travel Contributing Factors. Other Contributing Factors. NONE gender age employment_status position institute_service role_service
For the development of this project we only need to focus on those people who resigned, that is why we'll filter both datasets taking as into account those values in the column separation_type
that contain the word 'Resignation'.
dete_survey_updated['separation_type'].value_counts(dropna=False)
Age Retirement 285 Resignation-Other reasons 150 Resignation-Other employer 91 Resignation-Move overseas/interstate 70 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: separation_type, dtype: int64
tafe_survey_updated['separation_type'].value_counts(dropna=False)
Resignation 340 Contract Expired 127 Retrenchment/ Redundancy 104 Retirement 82 Transfer 25 Termination 23 NaN 1 Name: separation_type, dtype: int64
In the DETE-Survey dataset we see that in the column separation_type
there are 3 values containing the word 'Resignation'. Whereas, in the TAFE-Survey dataset, there is only one.
Next, in the DETE-Survey dataset, we'll replace the values of the records in the separation_type
column that contain the word 'Resignation' and replace them only with that word.
dete_survey_updated['separation_type'] = dete_survey_updated['separation_type'].str.split('-').str[0]
# Check the values in the separationtype column were updated correctly
dete_survey_updated['separation_type'].value_counts()
Resignation 311 Age Retirement 285 Voluntary Early Retirement (VER) 67 Ill Health Retirement 61 Other 49 Contract Expired 34 Termination 15 Name: separation_type, dtype: int64
# Select only the resignation separation types from each dataframe
dete_resignations = dete_survey_updated[dete_survey_updated['separation_type'] == 'Resignation'].copy()
tafe_resignations = tafe_survey_updated[tafe_survey_updated['separation_type'] == 'Resignation'].copy()
dim_dete = dete_resignations.shape
dim_tafe = tafe_resignations.shape
print('DETE-Survey (Resignations):\nTotal Rows: {}\nTotal Columns: {}'.format(dim_dete[0], dim_dete[1]))
print('\nTAFE-Survey (Resignations):\nTotal Rows: {}\nTotal Columns: {}'.format(dim_tafe[0], dim_tafe[1]))
DETE-Survey (Resignations): Total Rows: 311 Total Columns: 30 TAFE-Survey (Resignations): Total Rows: 340 Total Columns: 23
Below, we clean and explore the cease_date
and dete_start_date
columns (the latter in the DETE dataset) to make sure all of the years make sense. We'll use the following criteria:
cease_date
is the last year of the person's employment and the dete_start_date
is the person's first year of employment, it wouldn't make sense to have years after the current date.dete_start_date
was before the year 1940.dete_resignations['cease_date'].value_counts(dropna=False)
2012 126 2013 74 01/2014 22 12/2013 17 06/2013 14 09/2013 11 NaN 11 11/2013 9 07/2013 9 10/2013 6 08/2013 4 05/2012 2 05/2013 2 07/2006 1 07/2012 1 2010 1 09/2010 1 Name: cease_date, dtype: int64
# Extract the years and convert them to a float type
dete_resignations['cease_date'] = dete_resignations['cease_date'].str[-4:].astype(float)
# Check the values again and look for outliers
dete_resignations['cease_date'].value_counts().sort_index()
2,006.00 1 2,010.00 2 2,012.00 129 2,013.00 146 2,014.00 22 Name: cease_date, dtype: int64
dete_resignations['dete_start_date'].value_counts(dropna=False).sort_index()
1,963.00 1 1,971.00 1 1,972.00 1 1,973.00 1 1,974.00 2 1,975.00 1 1,976.00 2 1,977.00 1 1,980.00 5 1,982.00 1 1,983.00 2 1,984.00 1 1,985.00 3 1,986.00 3 1,987.00 1 1,988.00 4 1,989.00 4 1,990.00 5 1,991.00 4 1,992.00 6 1,993.00 5 1,994.00 6 1,995.00 4 1,996.00 6 1,997.00 5 1,998.00 6 1,999.00 8 2,000.00 9 2,001.00 3 2,002.00 6 2,003.00 6 2,004.00 14 2,005.00 15 2,006.00 13 2,007.00 21 2,008.00 22 2,009.00 13 2,010.00 17 2,011.00 24 2,012.00 21 2,013.00 10 nan 28 Name: dete_start_date, dtype: int64
tafe_resignations['cease_date'].value_counts(dropna=False).sort_index()
2,009.00 2 2,010.00 68 2,011.00 116 2,012.00 94 2,013.00 55 nan 5 Name: cease_date, dtype: int64
dates = [dete_resignations['cease_date'], tafe_resignations['cease_date'], dete_resignations['dete_start_date']]
df_dates = pd.DataFrame(dates, index=['DETE_cease_date', 'TAFE_cease_date', 'DETE_start_date']).transpose()
ax = df_dates.plot.box(title='Date fields of the DETE and TAFE institute employees', figsize=(10, 8))
ax.yaxis.grid()
The values of the cease_date
field in both the DETE and TAFE datasets are within the expected range (despite some outliers). We note that the oldest year in DETE is 2006, while in TAFE it is 2009. Although the data show some disparity, such a situation would not affect our analysis.
We also observed normal values in the dete_start_date
field (despite registering some outliers). Being 1963 the oldest year.
To carry out our analysis, we need to have the service time of each employee available. In the TAFE dataset we already have these values in the field we call institute_service
. However, the DETE dataset lacks such a field.
To obtain the service time of each employee in the DETE dataset, we'll create a new column called institute_service
, in which we'll store the difference of the fields cease_date
and dete_start_date
.
del dates, df_dates
dete_resignations['institute_service'] = dete_resignations['cease_date'] - dete_resignations['dete_start_date']
# Checking the values
dete_resignations['institute_service'].value_counts(dropna=False).sort_index()
0.00 20 1.00 22 2.00 14 3.00 20 4.00 16 5.00 23 6.00 17 7.00 13 8.00 8 9.00 14 10.00 6 11.00 4 12.00 6 13.00 8 14.00 6 15.00 7 16.00 5 17.00 6 18.00 5 19.00 3 20.00 7 21.00 3 22.00 6 23.00 4 24.00 4 25.00 2 26.00 2 27.00 1 28.00 2 29.00 1 30.00 2 31.00 1 32.00 3 33.00 1 34.00 1 35.00 1 36.00 2 38.00 1 39.00 3 41.00 1 42.00 1 49.00 1 nan 38 Name: institute_service, dtype: int64
tafe_resignations['institute_service'].value_counts(dropna=False)
Less than 1 year 73 1-2 64 3-4 63 NaN 50 5-6 33 11-20 26 7-10 21 More than 20 years 10 Name: institute_service, dtype: int64
dete_resignations['institute_service'].value_counts(bins=9)
(-0.05, 5.444] 115 (5.444, 10.889] 58 (10.889, 16.333] 36 (16.333, 21.778] 24 (21.778, 27.222] 19 (27.222, 32.667] 9 (32.667, 38.111] 6 (38.111, 43.556] 5 (43.556, 49.0] 1 Name: institute_service, dtype: int64
Excluding the null values of the institute_service
field from the TAFE dataset, 25% (73) of the employees resigned due to some type of dissatisfaction and only worked for less than one year. This is followed by 22% (64) of those who worked between 1 and 2 years (making a total of 47%).
Similarly, excluding the null values of the institute_service
field from the DETE dataset, we observe that 42% (115) of the employees worked at most 5 years. This is followed by 21% (58) of those who worked for a period of 5 to 10 years. (making a total of 63%).
Now we'll change the values of the institute_service
field of the TAFE dataset. We'll replace the values of each range by their average (we'll keep the only numeric values in some cases). This will facilitate further sorting.
We'll implement a function that allows us to extract the numerical values and then obtain their average.
def range_to_unique(val):
if pd.notna(val):
if ' ' in val:
numbers = [int(n) for n in val.split() if n.isdigit()]
else:
numbers = [int(n) for n in val.split('-') if n.isdigit()]
return int(sum(numbers) / len(numbers))
return np.nan
tafe_resignations['institute_service'] = tafe_resignations['institute_service'].map(range_to_unique)
# Checking values
tafe_resignations['institute_service'].value_counts(dropna=False).sort_index()
1.00 137 3.00 63 5.00 33 8.00 21 15.00 26 20.00 10 nan 50 Name: institute_service, dtype: int64
Before starting with the identification of dissatisfied employees we will correct the values of the age
field in both data sets.
dete_resignations['age'].value_counts(dropna=False)
41-45 48 46-50 42 36-40 41 26-30 35 51-55 32 31-35 29 21-25 29 56-60 26 61 or older 23 NaN 5 20 or younger 1 Name: age, dtype: int64
tafe_resignations['age'].value_counts(dropna=False)
NaN 50 41 45 45 51-55 39 46 50 39 21 25 33 26 30 32 31 35 32 36 40 32 56 or older 29 20 or younger 9 Name: age, dtype: int64
The values are almost similar, the main difference is that in the DETE data set there is a different value *(56-60)* and an additional value *(61 or older), which represent the higher ranges. While in the TAFE* dataset the value that represents the highest range is *56 or older*.
In order to make it easier to sort the values later, we'll replace the range of ages by their average (in some cases by the single numeric value). To do this, we'll use the range_to_unique function that we implemented earlier.
dete_resignations['age'] = dete_resignations['age'].map(range_to_unique)
tafe_resignations['age'] = tafe_resignations['age'].map(range_to_unique)
# Checking values
print('Age column DETE (Resgnations):', '-' * 30, sep='\n')
print(dete_resignations['age'].value_counts(dropna=False).sort_index())
print('\nAge column TAFE (Resgnations):', '-' * 30, sep='\n')
print(tafe_resignations['age'].value_counts(dropna=False).sort_index())
Age column DETE (Resgnations): ------------------------------ 20.00 1 23.00 29 28.00 35 33.00 29 38.00 41 43.00 48 48.00 42 53.00 32 58.00 26 61.00 23 nan 5 Name: age, dtype: int64 Age column TAFE (Resgnations): ------------------------------ 20.00 9 23.00 33 28.00 32 33.00 32 38.00 32 43.00 45 48.00 39 53.00 39 56.00 29 nan 50 Name: age, dtype: int64
Now that we have completed the data cleansing process, we will proceed with the identification of dissatisfied employees.
Below are the columns we'll use to categorize employees as dissatisfied from each dataframe.
Contributing Factors. Dissatisfaction
Contributing Factors. Job Dissatisfaction
job_dissatisfaction
dissatisfaction_with_the_department
physical_work_environment
lack_of_recognition
lack_of_job_security
work_location
employment_conditions
work_life_balance
workload
If the employee indicated any of the factors above caused them to resign, we'll mark them as dissatisfied in a new column (True, False, or NaN values):
tafe_resignations['Contributing Factors. Dissatisfaction'].value_counts(dropna=False)
- 277 Contributing Factors. Dissatisfaction 55 NaN 8 Name: Contributing Factors. Dissatisfaction, dtype: int64
tafe_resignations['Contributing Factors. Job Dissatisfaction'].value_counts(dropna=False)
- 270 Job Dissatisfaction 62 NaN 8 Name: Contributing Factors. Job Dissatisfaction, dtype: int64
# Update the values in the contributing factors columns to be either True, False, or NaN
def update_vals(x):
if x == '-':
return False
elif pd.isna(x):
return np.nan
return True
tafe_resignations['dissatisfied'] = tafe_resignations[['Contributing Factors. Dissatisfaction', 'Contributing Factors. Job Dissatisfaction']].applymap(update_vals).any(axis=1, skipna=False)
tafe_resignations['dissatisfied'].value_counts(dropna=False)
False 241 True 91 NaN 8 Name: dissatisfied, dtype: int64
Now that we have updated the fields in the TAFE dataset, we'll do the same for the fields in the DETE dataset. To do this, we'll display the frequencies of all the fields and then see if we need to use the update_vals function before updating their values.
dete_cols_up = ['job_dissatisfaction', 'dissatisfaction_with_the_department',
'physical_work_environment', 'lack_of_recognition',
'lack_of_job_security', 'work_location',
'employment_conditions', 'work_life_balance', 'workload']
df_dete_cols_vc = pd.DataFrame()
for col in dete_cols_up:
df_dete_cols_vc[col] = dete_resignations[col].value_counts(dropna=False)
df_dete_cols_vc
job_dissatisfaction | dissatisfaction_with_the_department | physical_work_environment | lack_of_recognition | lack_of_job_security | work_location | employment_conditions | work_life_balance | workload | |
---|---|---|---|---|---|---|---|---|---|
False | 270 | 282 | 305 | 278 | 297 | 293 | 288 | 243 | 284 |
True | 41 | 29 | 6 | 33 | 14 | 18 | 23 | 68 | 27 |
All values are of data type bool so we'll go directly to update them.
del df_dete_cols_vc
dete_resignations['dissatisfied'] = dete_resignations[dete_cols_up].any(axis=1, skipna=False)
dete_resignations['dissatisfied'].value_counts(dropna=False)
False 162 True 149 Name: dissatisfied, dtype: int64
Now that we have obtained the records of employees who resigned due to some type of dissatisfaction. We can be left with only the columns we need to perform our analysis.
Next, we will review some values from both data sets and decide which columns to keep.
dete_resignations.sample(6)
id | separation_type | cease_date | dete_start_date | role_start_date | position | classification | region | business_unit | employment_status | career_move_to_public_sector | career_move_to_private_sector | interpersonal_conflicts | job_dissatisfaction | dissatisfaction_with_the_department | physical_work_environment | lack_of_recognition | lack_of_job_security | work_location | employment_conditions | maternity/family | relocation | study/travel | ill_health | traumatic_incident | work_life_balance | workload | none_of_the_above | gender | age | institute_service | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
662 | 663 | Resignation | 2,013.00 | nan | nan | Teacher | Primary | Central Queensland | NaN | Permanent Part-time | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | Female | 48.00 | nan | False |
517 | 518 | Resignation | 2,010.00 | 2,007.00 | 2,007.00 | Teacher Aide | NaN | Metropolitan | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | NaN | 33.00 | 3.00 | False |
562 | 563 | Resignation | 2,013.00 | 2,012.00 | 2,012.00 | Cleaner | NaN | Central Office | Education Queensland | Permanent Part-time | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | Female | 43.00 | 1.00 | False |
506 | 507 | Resignation | 2,013.00 | 2,009.00 | 2,010.00 | Public Servant | A01-A04 | North Coast | NaN | Permanent Full-time | True | False | False | False | True | False | False | False | False | False | False | False | False | True | False | False | False | False | Female | 23.00 | 4.00 | True |
410 | 411 | Resignation | 2,013.00 | 2,008.00 | 2,008.00 | Teacher | Primary | Central Queensland | NaN | Permanent Full-time | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | Female | 28.00 | 5.00 | False |
406 | 407 | Resignation | 2,013.00 | 1,991.00 | 1,992.00 | Teacher Aide | NaN | Darling Downs South West | NaN | Permanent Part-time | False | False | False | False | False | False | True | False | False | False | False | False | False | True | True | False | False | False | Female | 58.00 | 22.00 | True |
tafe_resignations.sample(6)
id | Institute | WorkArea | cease_date | separation_type | Contributing Factors. Career Move - Public Sector | Contributing Factors. Career Move - Private Sector | Contributing Factors. Career Move - Self-employment | Contributing Factors. Ill Health | Contributing Factors. Maternity/Family | Contributing Factors. Dissatisfaction | Contributing Factors. Job Dissatisfaction | Contributing Factors. Interpersonal Conflict | Contributing Factors. Study | Contributing Factors. Travel | Contributing Factors. Other | Contributing Factors. NONE | gender | age | employment_status | position | institute_service | role_service | dissatisfied | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
650 | 634,982,066,598,317,056.00 | Southern Queensland Institute of TAFE | Delivery (teaching) | 2,013.00 | Resignation | - | Career Move - Private Sector | - | - | - | - | - | - | - | - | - | - | Male | 53.00 | Permanent Full-time | Teacher (including LVT) | 8.00 | 7-10 | False |
684 | 635,037,453,598,125,056.00 | Brisbane North Institute of TAFE | Non-Delivery (corporate) | 2,013.00 | Resignation | Career Move - Public Sector | - | - | - | - | - | - | - | - | - | - | - | Male | 43.00 | Contract/casual | Administration (AO) | 3.00 | Less than 1 year | False |
545 | 634,825,691,015,155,968.00 | Brisbane North Institute of TAFE | Non-Delivery (corporate) | 2,012.00 | Resignation | - | Career Move - Private Sector | - | - | - | - | - | - | - | - | Other | - | Male | 53.00 | Temporary Full-time | Administration (AO) | 1.00 | 1-2 | False |
197 | 634,444,676,214,219,008.00 | Brisbane North Institute of TAFE | Non-Delivery (corporate) | 2,011.00 | Resignation | Career Move - Public Sector | - | - | - | - | - | - | - | - | - | Other | - | Female | 43.00 | Contract/casual | Administration (AO) | 3.00 | 5-6 | False |
55 | 634,218,492,665,218,944.00 | Southbank Institute of Technology | Non-Delivery (corporate) | 2,010.00 | Resignation | - | Career Move - Private Sector | - | - | - | - | - | - | - | - | - | - | Female | 33.00 | Temporary Full-time | Administration (AO) | 1.00 | Less than 1 year | False |
276 | 634,558,122,915,155,968.00 | SkillsTech Australia | Delivery (teaching) | 2,011.00 | Resignation | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | nan | NaN | NaN | nan | NaN | NaN |
In both DataFrames, we'll drop the columns we used to identify employees who resigned due to some type of dissatisfaction. We'll also drop the region
column from the DETE dataset and the WorkArea
column from the TAFE dataset.
Apart from the columns we mentioned, we'll also skip some others that turn out to be irrelevant for our analysis. However, it's worth noting that we'll no longer need the column
separation_type
since we know that our records correspond to employees who resigned due to some kind of dissatisfaction and this column only presents a single value for all records.
dete_resignations_up = dete_resignations[['id', 'cease_date', 'position', 'employment_status', 'gender', 'age', 'institute_service', 'dissatisfied']].copy()
tafe_resignations_up = tafe_resignations[['id', 'cease_date', 'gender', 'age', 'employment_status', 'position', 'institute_service', 'dissatisfied']].copy()
dete_resignations_up.columns
Index(['id', 'cease_date', 'position', 'employment_status', 'gender', 'age', 'institute_service', 'dissatisfied'], dtype='object')
tafe_resignations_up.columns
Index(['id', 'cease_date', 'gender', 'age', 'employment_status', 'position', 'institute_service', 'dissatisfied'], dtype='object')
Now, we'll combine both DataFrames. To do so, we'll add a column that we will call institute
to each one of them; this in order to be able to identify them when we perform our analysis. After that, we'll combine both DataFrames.
dete_resignations_up['institute'] = 'DETE'
tafe_resignations_up['institute'] = 'TAFE'
combined = pd.concat([dete_resignations_up, tafe_resignations_up], ignore_index=True)
combined.sample(6)
id | cease_date | position | employment_status | gender | age | institute_service | dissatisfied | institute | |
---|---|---|---|---|---|---|---|---|---|
20 | 43.00 | 2,012.00 | Cleaner | Permanent Part-time | Female | 43.00 | 1.00 | False | DETE |
49 | 139.00 | 2,012.00 | Teacher Aide | Permanent Part-time | Female | 43.00 | 8.00 | False | DETE |
83 | 264.00 | 2,013.00 | Teacher | Permanent Full-time | Female | 23.00 | 4.00 | False | DETE |
133 | 382.00 | 2,013.00 | Teacher | Permanent Full-time | Female | 28.00 | 5.00 | True | DETE |
156 | 435.00 | 2,013.00 | Head of Curriculum/Head of Special Education | Permanent Full-time | Female | 28.00 | 7.00 | True | DETE |
210 | 576.00 | 2,013.00 | Public Servant | Permanent Full-time | Female | 33.00 | 9.00 | True | DETE |
Next, we'll classify the employees based on their length of service. For this, we'll take into account the values of the column institute_service
under the following definitions:
# Creating a function for mapping values to the career stages
def service_category(val):
if pd.isna(val):
return np.nan
elif val < 3:
return 'New'
elif val < 7:
return 'Experienced'
elif val < 11:
return 'Established'
else:
return 'Veteran'
combined['service_cat'] = combined['institute_service'].apply(service_category)
combined['service_cat'].value_counts(dropna=False)
New 193 Experienced 172 Veteran 136 NaN 88 Established 62 Name: service_cat, dtype: int64
Now, we'll classify the employees according to their age. To do so, we'll take into account the values in the 'age' column under the following definitions:
def age_category(val):
if pd.isna(val):
return np.nan
elif val < 31:
return 'Junior'
elif val < 46:
return 'Middle'
else:
return 'Senior'
combined['age_cat'] = combined['age'].apply(age_category)
combined['age_cat'].value_counts(dropna=False)
Senior 230 Middle 227 Junior 139 NaN 55 Name: age_cat, dtype: int64
Now let's check how many missing values there are in the combined dataset and then we'll see how we can deal with them.
combined_info = df_info(combined, title_df='DETE & TAFE - Surveys')
combined_info
DETE & TAFE - Surveys <class 'pandas.core.frame.DataFrame'>: ------------------------------------------------------------ RangeIndex: 651 entries, 0 to 650 Data columns (total 11 columns): DTypes: object(7) float64(4)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | id | 651 | 0 | 0.00 | float64 |
1 | cease_date | 635 | 16 | 2.46 | float64 |
2 | position | 598 | 53 | 8.14 | object |
3 | employment_status | 597 | 54 | 8.29 | object |
4 | gender | 592 | 59 | 9.06 | object |
5 | age | 596 | 55 | 8.45 | float64 |
6 | institute_service | 563 | 88 | 13.52 | float64 |
7 | dissatisfied | 643 | 8 | 1.23 | object |
8 | institute | 651 | 0 | 0.00 | object |
9 | service_cat | 563 | 88 | 13.52 | object |
10 | age_cat | 596 | 55 | 8.45 | object |
In the column dissatisfied
, it has about 1% of its null values. These being few, we can replace them by the boolean value 'False'.
In the columns age
with its corresponding age_cat
and institute_service
with its corresponding service_cat
, we observe a significant amount of missing values. Here the strategy will be to consider only those records that present values other than null in the service_cat
and age_cat
fields simultaneously.
We also observed null values in the other columns, with the gender
column having the highest value among these (9%). However, these columns are only referential and we'll ignore them in our analysis; so, we can keep their null values intact.
del combined_info
combined['dissatisfied'].value_counts(dropna=False)
False 403 True 240 NaN 8 Name: dissatisfied, dtype: int64
combined['dissatisfied'] = combined['dissatisfied'].fillna(False)
combined['dissatisfied'].value_counts(dropna=False)
False 411 True 240 Name: dissatisfied, dtype: int64
serv_age_null_mask = (combined['service_cat'].isnull()) & (combined['age_cat'].isnull())
serv_age_null = combined[serv_age_null_mask]
print('Total rows:', serv_age_null.shape[0])
serv_age_null
Total rows: 53
id | cease_date | position | employment_status | gender | age | institute_service | dissatisfied | institute | service_cat | age_cat | |
---|---|---|---|---|---|---|---|---|---|---|---|
141 | 406.00 | 2,012.00 | Teacher | NaN | NaN | nan | nan | False | DETE | NaN | NaN |
301 | 804.00 | 2,013.00 | Teacher Aide | Permanent Part-time | NaN | nan | nan | False | DETE | NaN | NaN |
310 | 823.00 | 2,013.00 | Teacher Aide | NaN | NaN | nan | nan | False | DETE | NaN | NaN |
311 | 634,139,903,350,000,000.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
322 | 634,177,007,457,719,040.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
324 | 634,177,938,010,971,008.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
325 | 634,181,972,647,250,048.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
326 | 634,182,099,183,433,984.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
327 | 634,183,070,751,313,024.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
331 | 634,193,418,130,686,976.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
335 | 634,206,210,361,383,040.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
336 | 634,207,964,363,344,000.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
337 | 634,208,063,783,969,024.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
345 | 634,214,138,447,944,960.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
347 | 634,214,849,407,633,024.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
348 | 634,217,422,828,726,016.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
367 | 634,257,448,030,289,024.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
370 | 634,266,079,365,634,048.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
373 | 634,267,873,855,009,024.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
375 | 634,268,641,051,884,032.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
378 | 634,274,542,281,501,056.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
379 | 634,274,566,132,353,024.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
385 | 634,297,824,199,371,008.00 | nan | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
397 | 634,326,378,183,960,064.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
402 | 634,328,288,440,210,048.00 | nan | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
405 | 634,333,279,253,906,048.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
419 | 634,381,134,640,312,064.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
440 | 634,456,810,990,000,000.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
453 | 634,499,252,882,656,000.00 | 2,010.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
461 | 634,523,409,759,687,936.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
466 | 634,551,034,294,686,976.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
472 | 634,558,122,915,155,968.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
474 | 634,563,192,324,844,032.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
476 | 634,564,734,197,031,040.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
495 | 634,592,500,853,750,016.00 | 2,011.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
513 | 634,666,828,449,688,064.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
519 | 634,683,154,020,000,000.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
523 | 634,696,327,235,313,024.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
543 | 634,782,721,630,781,056.00 | nan | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
554 | 634,811,041,501,719,040.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
556 | 634,811,182,139,843,968.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
558 | 634,812,941,202,499,968.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
562 | 634,818,651,570,936,960.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
581 | 634,878,467,049,843,968.00 | 2,012.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
596 | 634,915,560,890,625,024.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
599 | 634,937,512,037,811,968.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | True | TAFE | NaN | NaN |
602 | 634,938,351,431,992,064.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
624 | 635,005,511,729,063,040.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
625 | 635,005,532,793,750,016.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
627 | 635,012,419,902,813,056.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
642 | 635,049,577,295,469,056.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
645 | 635,065,177,031,603,968.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
648 | 635,067,716,405,666,048.00 | 2,013.00 | NaN | NaN | NaN | nan | nan | False | TAFE | NaN | NaN |
We note that there are 53 records in total, which, have null values in both the service_cat
and age_cat
fields (about 8% of the total). If we keep these records, we'll only be increasing the bias in the final results. Therefore, to obtain a better approach, we'll remove these records.
combined.drop(combined[serv_age_null_mask].index, inplace=True)
del serv_age_null
combined.reset_index(drop=True, inplace=True)
combined_info_up = df_info(combined, title_df='DETE & TAFE - Surveys')
combined_info_up
DETE & TAFE - Surveys <class 'pandas.core.frame.DataFrame'>: ------------------------------------------------------------ RangeIndex: 598 entries, 0 to 597 Data columns (total 11 columns): DTypes: object(6) float64(4) bool(1)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | id | 598 | 0 | 0.00 | float64 |
1 | cease_date | 585 | 13 | 2.17 | float64 |
2 | position | 595 | 3 | 0.50 | object |
3 | employment_status | 596 | 2 | 0.33 | object |
4 | gender | 592 | 6 | 1.00 | object |
5 | age | 596 | 2 | 0.33 | float64 |
6 | institute_service | 563 | 35 | 5.85 | float64 |
7 | dissatisfied | 598 | 0 | 0.00 | bool |
8 | institute | 598 | 0 | 0.00 | object |
9 | service_cat | 563 | 35 | 5.85 | object |
10 | age_cat | 596 | 2 | 0.33 | object |
Now, we should note that some null values may still exist in the columns service_cat
and age_cat
but these will no longer occur simultaneously. That is, if this is the case, we could still fill in some null values, since, for each missing value in the service_cat
column there would be a value in the age_cat
column and vice versa.
For a better result, we should take into account the values in the institute
field, since they could have a very different distribution in the career stages.
del combined_info_up
combined[combined['institute'] == 'DETE']['service_cat'].value_counts(dropna=False)
Veteran 100 Experienced 76 New 56 Established 41 NaN 35 Name: service_cat, dtype: int64
combined[combined['institute'] == 'TAFE']['service_cat'].value_counts(dropna=False)
New 137 Experienced 96 Veteran 36 Established 21 Name: service_cat, dtype: int64
We see that all the missing values in the service_cat
column correspond to the DETE dataset. Therefore, the missing values in the institute_service
column also correspond to the same data set, since the values in the service_cat
column are based on the values in the institute_service
column.
combined[combined['institute'] == 'DETE']['institute_service'].value_counts(dropna=False)
nan 35 5.00 23 1.00 22 3.00 20 0.00 20 6.00 17 4.00 16 9.00 14 2.00 14 7.00 13 13.00 8 8.00 8 20.00 7 15.00 7 12.00 6 22.00 6 17.00 6 10.00 6 14.00 6 16.00 5 18.00 5 24.00 4 23.00 4 11.00 4 39.00 3 32.00 3 19.00 3 21.00 3 36.00 2 30.00 2 25.00 2 28.00 2 26.00 2 29.00 1 42.00 1 38.00 1 27.00 1 41.00 1 35.00 1 49.00 1 34.00 1 33.00 1 31.00 1 Name: institute_service, dtype: int64
combined[combined['institute'] == 'TAFE']['institute_service'].value_counts(dropna=False)
1.00 137 3.00 63 5.00 33 15.00 26 8.00 21 20.00 10 Name: institute_service, dtype: int64
Let's review the records with missing values in the institute_service
column and see how the values in the age_cat
field are. Based on the observations, we'll work out a strategy to fill in the values in the service_cat
column.
null_inst_serv = combined[combined['institute_service'].isnull()]
print('Total rows:', null_inst_serv.shape[0])
null_inst_serv
Total rows: 35
id | cease_date | position | employment_status | gender | age | institute_service | dissatisfied | institute | service_cat | age_cat | |
---|---|---|---|---|---|---|---|---|---|---|---|
7 | 17.00 | 2,012.00 | Teacher Aide | Permanent Part-time | Male | 61.00 | nan | True | DETE | NaN | Senior |
17 | 40.00 | 2,012.00 | Teacher | Permanent Full-time | Female | 23.00 | nan | True | DETE | NaN | Junior |
37 | 107.00 | 2,012.00 | Teacher Aide | Temporary Part-time | Female | 48.00 | nan | True | DETE | NaN | Senior |
50 | 141.00 | 2,012.00 | Teacher Aide | Permanent Part-time | Female | 53.00 | nan | False | DETE | NaN | Senior |
62 | 197.00 | 2,012.00 | Teacher Aide | Permanent Part-time | Female | 48.00 | nan | False | DETE | NaN | Senior |
95 | 289.00 | 2,013.00 | Public Servant | Permanent Full-time | Female | 28.00 | nan | True | DETE | NaN | Junior |
96 | 292.00 | 2,013.00 | Teacher Aide | Permanent Part-time | Female | 58.00 | nan | False | DETE | NaN | Senior |
97 | 294.00 | 2,012.00 | Schools Officer | Permanent Part-time | NaN | 61.00 | nan | False | DETE | NaN | Senior |
101 | 302.00 | 2,012.00 | School Administrative Staff | Permanent Part-time | Female | 48.00 | nan | False | DETE | NaN | Senior |
117 | 344.00 | 2,012.00 | School Administrative Staff | Permanent Part-time | Female | 43.00 | nan | False | DETE | NaN | Middle |
124 | 366.00 | 2,012.00 | Teacher | Permanent Full-time | Male | 28.00 | nan | False | DETE | NaN | Junior |
132 | 380.00 | 2,012.00 | Teacher | Permanent Part-time | Female | 28.00 | nan | False | DETE | NaN | Junior |
140 | 400.00 | 2,013.00 | Cleaner | NaN | Male | 58.00 | nan | True | DETE | NaN | Senior |
142 | 408.00 | 2,013.00 | Technical Officer | Temporary Part-time | Male | 61.00 | nan | False | DETE | NaN | Senior |
144 | 410.00 | 2,012.00 | Teacher Aide | Permanent Part-time | Female | 38.00 | nan | False | DETE | NaN | Middle |
157 | 439.00 | 2,013.00 | Teacher | Permanent Full-time | Female | 23.00 | nan | False | DETE | NaN | Junior |
160 | 450.00 | 2,012.00 | Teacher | Permanent Full-time | Male | 38.00 | nan | True | DETE | NaN | Middle |
170 | 472.00 | 2,013.00 | Teacher | Permanent Part-time | Female | 58.00 | nan | True | DETE | NaN | Senior |
179 | 490.00 | 2,012.00 | Cleaner | NaN | Female | 48.00 | nan | False | DETE | NaN | Senior |
191 | 532.00 | 2,013.00 | Cleaner | Permanent Part-time | Female | 58.00 | nan | True | DETE | NaN | Senior |
195 | 539.00 | 2,013.00 | Teacher | Permanent Full-time | Male | 28.00 | nan | True | DETE | NaN | Junior |
244 | 663.00 | 2,013.00 | Teacher | Permanent Part-time | Female | 48.00 | nan | False | DETE | NaN | Senior |
254 | 685.00 | nan | Teacher | Permanent Full-time | Male | 23.00 | nan | True | DETE | NaN | Junior |
262 | 696.00 | nan | Teacher Aide | Casual | Female | 48.00 | nan | False | DETE | NaN | Senior |
268 | 706.00 | nan | Teacher Aide | Permanent Full-time | Female | 43.00 | nan | False | DETE | NaN | Middle |
269 | 711.00 | nan | Teacher | Permanent Full-time | Female | 53.00 | nan | True | DETE | NaN | Senior |
270 | 714.00 | 2,013.00 | Teacher Aide | Permanent Part-time | Female | 61.00 | nan | False | DETE | NaN | Senior |
272 | 726.00 | nan | Teacher | Permanent Full-time | Female | 48.00 | nan | False | DETE | NaN | Senior |
288 | 772.00 | nan | Cleaner | Permanent Part-time | Female | 61.00 | nan | False | DETE | NaN | Senior |
290 | 776.00 | nan | Teacher Aide | Permanent Part-time | Female | 43.00 | nan | False | DETE | NaN | Middle |
293 | 790.00 | nan | Teacher | Permanent Full-time | Female | 43.00 | nan | False | DETE | NaN | Middle |
296 | 793.00 | nan | Public Servant | Permanent Part-time | Female | 48.00 | nan | True | DETE | NaN | Senior |
297 | 796.00 | 2,013.00 | Cleaner | Permanent Part-time | Female | 38.00 | nan | False | DETE | NaN | Middle |
298 | 799.00 | nan | Public Servant | Permanent Part-time | Female | 38.00 | nan | False | DETE | NaN | Middle |
299 | 800.00 | nan | Teacher Aide | Permanent Part-time | Female | 38.00 | nan | False | DETE | NaN | Middle |
We can see that the age_cat
field lacks null values in that subset, that is why we'll take them as a basis to fill the values of the service_cat
column in those records.
But first, let's see the frequency of the values of the age
column within the subset seen above.
null_inst_serv['age'].value_counts(dropna=False).sort_index()
23.00 3 28.00 4 38.00 5 43.00 4 48.00 8 53.00 2 58.00 4 61.00 5 Name: age, dtype: int64
Now, let's look at the frequency of values by age group (age_cat
column) for the subset DETE.
del null_inst_serv
age_cat_lst = ['Junior', 'Middle', 'Senior']
for ac in age_cat_lst:
condition = (combined['institute'] == 'DETE') & (combined['age_cat'] == ac)
print('DETE -', ac)
print(combined[condition]['service_cat'].value_counts(dropna=False))
print()
DETE - Junior Experienced 27 New 23 Established 8 NaN 7 Name: service_cat, dtype: int64 DETE - Middle Experienced 33 Veteran 32 New 24 Established 20 NaN 9 Name: service_cat, dtype: int64 DETE - Senior Veteran 67 NaN 19 Experienced 16 Established 13 New 8 Name: service_cat, dtype: int64
We see that the average minimum age in the subset is 23, while the average maximum age is 61 (passing through intermediate ages). Based on these results, we can pose the following:
service_cat
field, where the value of the age_cat
field is Junior, with the value Experienced, which comprises the working range of 3-6 years.service_cat
field, where the value of the age_cat
field is Middle, with the value Veteran, which comprises the working range of 7-10 years.We chose Veteran, the second most frequent value, because it would be the most appropriate based on length of service for that age group.
service_cat
field, where the value of the age_cat
field is Senior, with the value Veteran, which comprises the working range of 11 years and older.#age_cat_lst = ['Junior', 'Middle', 'Senior']
values = ['Experienced', 'Veteran', 'Veteran']
for ac, val in zip(age_cat_lst, values):
condition = (combined['institute'] == 'DETE') & (combined['age_cat'] == ac) & (combined['institute_service'].isnull())
combined.loc[condition, 'service_cat'] = val
print('Total null-values in service_cat column:', combined['service_cat'].isnull().sum())
Total null-values in service_cat column: 0
Now that we have zeroed out the null values in the service_cat
field, let's review the values in the age_cat
column.
combined[combined['age_cat'].isnull()]
id | cease_date | position | employment_status | gender | age | institute_service | dissatisfied | institute | service_cat | age_cat | |
---|---|---|---|---|---|---|---|---|---|---|---|
68 | 215.00 | 2,012.00 | School Administrative Staff | Permanent Part-time | Female | nan | 13.00 | False | DETE | Veteran | NaN |
93 | 286.00 | 2,012.00 | Cleaner | Permanent Full-time | Female | nan | 0.00 | False | DETE | New | NaN |
We see that there are only a couple of values left to fill the age_cat
column. We'll do it using the previous strategy, that is, based on the most frequent value of each group.
serv_cat_lst = ['Veteran', 'New']
for sc in serv_cat_lst:
condition = (combined['institute'] == 'DETE') & (combined['service_cat'] == sc)
print('DETE -', sc)
print(combined[condition]['age_cat'].value_counts(dropna=False))
print()
DETE - Veteran Senior 86 Middle 41 NaN 1 Name: age_cat, dtype: int64 DETE - New Middle 24 Junior 23 Senior 8 NaN 1 Name: age_cat, dtype: int64
#serv_cat_lst = ['Veteran', 'New']
values = ['Senior', 'Middle']
for sc, val in zip(serv_cat_lst, values):
conditions = (combined['institute'] == 'DETE') & (combined['service_cat'] == sc) & (combined['age'].isnull())
combined.loc[conditions, 'age_cat'] = val
print('Total null-values in age_cat column:', combined['age_cat'].isnull().sum())
Total null-values in age_cat column: 0
Let's check the number of null values for each column of our combined dataset. Note that the columns dissatisfied
, service_cat
and age_cat
are relevant to our analysis. If they lack null values, we are ready.
combined_final = df_info(combined, title_df='DETE & TAFE - Surveys')
combined_final
DETE & TAFE - Surveys <class 'pandas.core.frame.DataFrame'>: ------------------------------------------------------------ RangeIndex: 598 entries, 0 to 597 Data columns (total 11 columns): DTypes: object(6) float64(4) bool(1)
Column Name | Non-null Count | Missing Values Count | Missing Values % | DType | |
---|---|---|---|---|---|
0 | id | 598 | 0 | 0.00 | float64 |
1 | cease_date | 585 | 13 | 2.17 | float64 |
2 | position | 595 | 3 | 0.50 | object |
3 | employment_status | 596 | 2 | 0.33 | object |
4 | gender | 592 | 6 | 1.00 | object |
5 | age | 596 | 2 | 0.33 | float64 |
6 | institute_service | 563 | 35 | 5.85 | float64 |
7 | dissatisfied | 598 | 0 | 0.00 | bool |
8 | institute | 598 | 0 | 0.00 | object |
9 | service_cat | 598 | 0 | 0.00 | object |
10 | age_cat | 598 | 0 | 0.00 | object |
combined['dissatisfied'].value_counts()
False 372 True 226 Name: dissatisfied, dtype: int64
dis_pct = combined.pivot_table(index='service_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False).to_frame()
dete_service = combined[combined['institute'] == 'DETE'].reset_index(drop=True)
dis_pct['DETE'] = dete_service.pivot_table(index='service_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False)
tafe_service = combined[combined['institute'] == 'TAFE'].reset_index(drop=True)
dis_pct['TAFE'] = tafe_service.pivot_table(index='service_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False)
dis_pct.rename(columns={'dissatisfied':'Total'}, inplace=True)
dis_pct.index.names = ['S. CATEGORY']
dis_pct.plot(kind='bar', rot=30)
plt.title('Dissatisfied by Service Category', fontsize=14)
plt.xlabel('Service category', fontsize=12)
plt.ylabel('Dissatisfied employees number', fontsize=12)
plt.legend(frameon=False)
sns.despine()
dis_pct
Total | DETE | TAFE | |
---|---|---|---|
S. CATEGORY | |||
Veteran | 74.00 | 64.00 | 10.00 |
Experienced | 63.00 | 39.00 | 24.00 |
New | 57.00 | 21.00 | 36.00 |
Established | 32.00 | 25.00 | 7.00 |
Now, in the same way as we did before, we'll see the number of dissatisfied employees who left the institute by age group.
dis_age = combined.pivot_table(index='age_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False).to_frame()
dete_serv_age = combined[combined['institute'] == 'DETE'].reset_index(drop=True)
dis_age['DETE'] = dete_serv_age.pivot_table(index='age_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False)
tafe_serv_age = combined[combined['institute'] == 'TAFE'].reset_index(drop=True)
dis_age['TAFE'] = tafe_serv_age.pivot_table(index='age_cat', values='dissatisfied', aggfunc=np.sum)['dissatisfied'].sort_values(ascending=False)
dis_age.rename(columns={'dissatisfied':'Total'}, inplace=True)
dis_age.index.names = ['AGE CATEGORY']
dis_age.plot(kind='bar', rot=30)
plt.title('Dissatisfied by Age Group', fontsize=14)
plt.xlabel('Age category', fontsize=12)
plt.ylabel('Dissatisfied employees number', fontsize=12)
plt.legend(frameon=False)
sns.despine()
dis_age
Total | DETE | TAFE | |
---|---|---|---|
AGE CATEGORY | |||
Senior | 94.00 | 65.00 | 29.00 |
Middle | 83.00 | 55.00 | 28.00 |
Junior | 49.00 | 29.00 | 20.00 |
Of the employees who resigned due to some type of dissatisfaction, those who have worked longer are the most likely to resign due to dissatisfaction issues (33%) than those who worked for less time (26% on average for Experienced* and New. 14% for Established)*.
Similarly, of the employees who resigned due to some form of dissatisfaction, those in the oldest age group (Senior* category)* are the most likely to resign due to dissatisfaction issues (42%) than those younger employees (22% for Junior).
In total, 226 employees resigned due to some form of dissatisfaction. Summarizing by length of service we obtain:
Finally, we found that the institute with the highest number of employees leaving due to some type of dissatisfaction is DETE, with almost 66%.