I analyzed exit survy data from Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia.
The cleaned dataset had 651 rows attributed to either TAFE, or DETE.
I found that:
Resignations increased by more than a factor of 100 starting in 2010.
Employees with 3 years of service or fewer were least likely to resign due to some kind of job dissatisfaction, vs other reasons.
Employees with 7-11 years of service are most likely to resign due to some kind of job dissatisfaction vs other reasons.
Across all age ranges, 38% of resignations involve some sort of job dissatisfaction.
Employees at DETE resign for reasons involving job dissatisfaction 48% of the time, while the figure for TAFE is 27%.
This notebook details the analysis of employee exit surveys from DETE and TAFE.
Data preparation and cleaning for this analysis was performed using preparation.ipynb.
In this project I'll try to answer the following questions:
My analysis draws on exit surveys from Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia.
I don't have a complete data dictionary, so I'll have to rely on some general knowledge in interpreting the data.
Below is a preview of a couple columns we'll work with from the dete_survey.csv:
ID
: An id used to identify the participant of the surveySeparationType
: The reason why the person's employment endedCease Date
: The year or month the person's employment endedDETE Start Date
: The year the person began employment with the DETEBelow is a preview of a couple columns we'll work with from the tafe_survey.csv:
Record ID
: An id used to identify the participant of the surveyReason for ceasing employment
: The reason why the person's employment endedLengthofServiceOverall. Overall Length of Service at Institute (in years)
: The length of the person's employment (in years)import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
%config InlineBackend.figure_format='retina' # Doubles chart resolution
combined_updated = pd.read_pickle('combined_updated.pickle')
len(combined_updated)
651
(combined_updated['dissatisfied'] | combined_updated['other_reasons']).sum()
585
combined_updated['dissatisfied'].sum()
240
combined_updated['other_reasons'].sum()
486
(combined_updated['dissatisfied'] & combined_updated['other_reasons']).sum()
141
(combined_updated['dissatisfied'] & ~combined_updated['other_reasons']).sum()
99
The dataset has:
combined_updated['cease_date'].value_counts().sort_index()
2006.0 1 2009.0 2 2010.0 70 2011.0 116 2012.0 223 2013.0 201 2014.0 22 Name: cease_date, dtype: int64
fig, [ax1, ax2, ax3] = plt.subplots(1,3, figsize =(16,4))
combined_updated['cease_date'].plot.hist(ax=ax1, bins=9, xlim=(2006,2014))
ax1.spines['top'].set_visible(False)
ax1.spines['left'].set_visible(False)
ax1.spines['right'].set_visible(False)
ax1.tick_params(left=False, bottom=False)
ax1.set_title("All")
combined_updated[combined_updated['institute'] == 'TAFE']['cease_date'].plot.hist(ax=ax2, bins=9, xlim=(2006,2014))
ax2.spines['top'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax2.spines['right'].set_visible(False)
ax2.tick_params(left=False, bottom=False)
ax2.set_title("TAFE")
combined_updated[combined_updated['institute'] == 'DETE']['cease_date'].plot.hist(ax=ax3, bins=9, xlim=(2006,2014))
ax3.spines['top'].set_visible(False)
ax3.spines['left'].set_visible(False)
ax3.spines['right'].set_visible(False)
ax3.tick_params(left=False, bottom=False)
ax3.set_title("DETE")
plt.show()
This seems significant, but their meaning is unclear.
They suggest that either:
OR
If its the latter, then much of the remaining analysis and any conclusions are likely to be flawed.
combined_updated['service_cat'].notnull().sum()
563
563 rows had enough information to determine the duration of the employees service in their last position before their resignation.
I categorized this information as follows:
These categories were used for further aggregation and analysis.
rows_in_order = ['New', 'Experienced', 'Established', 'Veteran', 'All']
table = combined_updated.pivot_table('dissatisfied', index='service_cat', aggfunc=['mean', 'sum'], margins=True)
table = table.reindex(rows_in_order)
table
mean | sum | |
---|---|---|
dissatisfied | dissatisfied | |
service_cat | ||
New | 0.295337 | 57 |
Experienced | 0.343023 | 59 |
Established | 0.516129 | 32 |
Veteran | 0.485294 | 66 |
All | 0.380107 | 214 |
# Index order isn't preserved, so I have to set it again.
ax = (table['mean']*100).reindex(rows_in_order).plot.bar(legend=False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.set_xlabel("")
ax.set_ylabel("percent")
ax.set_title("Dissatisfied Resignations by Tenure")
plt.show()
Across all service durations, 38% of resignations involve job dissatisfaction. Employees with six years or fewer of service are less likely than average to cite job dissatisfaction as a reason for their resignation, newest employees being the least likely. Employees with seven years or more of service are more likely than average to depart for reasons that include job dissatisfaction.
If we look at the total number of people, rather than category percentages, we see that more than half those whose reasons for resignation include job dissatisfaction, are either New or Experienced.
table = combined_updated.pivot_table('dissatisfied', index='age', aggfunc=['mean', 'sum'], margins=True)
table
mean | sum | |
---|---|---|
dissatisfied | dissatisfied | |
age | ||
20 or younger | 0.200000 | 2 |
21-25 | 0.306452 | 19 |
26-30 | 0.417910 | 28 |
31-35 | 0.377049 | 23 |
36-40 | 0.342466 | 25 |
41-45 | 0.376344 | 35 |
46-50 | 0.382716 | 31 |
51-55 | 0.422535 | 30 |
56 or older | 0.423077 | 33 |
All | 0.379195 | 226 |
ax = (table['mean']*100).plot.bar(legend=False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.set_xlabel("")
ax.set_ylabel("percent")
ax.set_title("Dissatisfied Resignations by Employee Age")
plt.show()
combined_updated.pivot_table('dissatisfied', index='institute', aggfunc=['mean', 'sum'], margins=True)
mean | sum | |
---|---|---|
dissatisfied | dissatisfied | |
institute | ||
DETE | 0.479100 | 149 |
TAFE | 0.267647 | 91 |
All | 0.368664 | 240 |
Nearly one half of resignations from DETE involved job dissatisfaction, while only just over a quarter of resignations from TAFE did.
Resignations at DETE and TAFE exploded 100-fold. after 2009 suggesting significant changes at both institutions. Unfortunately the dataset includes just three resignations from before 2010, making a meaningful comparison of the two periods impossible.
Job dissatisfaction is a factor in almost 40% of resignations. It is the exclusive factor in 17% of resignations.
While the % of resignations involving job dissatisfaction varies by age and tenure of the departing employee, it is a significant factor in resignations in every group. At least 30% of resigning employees cite job dissatisfaction as a reason for every duration of tenure, and also for every age group save those twenty years old or younger.
The significance of job dissatisfaction in resignations varies substantially between DETE and TAFE. Efforts to reduce job dissatisfaction, and resulting resignations, should seek to understand the reasons for the difference between institutions.